Information-directed sampling for reinforcement learning