Reinforcement learning: when can we do sample-efficient exploration?