Deep episodic value iteration: a theory of sample efficient learning for machine learning and cognitive (neuro)science