On the evaluation of deep generative models