Methods and systems for targeted evaluations of clinical machine learning models on the deployment population