Efficient exploration in bandit and reinforcement learning