Visual learning with weakly labeled video