Multilingual multimodal integration of sketch and speech: A generic speech representation model for spatial description