Multimodal object representation learning in haptic, auditory, and visual domains