RLVF: Learning from Verbal Feedback without Overgeneralization