Purposive visual imitation for learning structured tasks from videos