An Unexpected Challenge in Using Forward-Mode Automatic Differentiation for Low-Memory Deep Learning