Text this: On the audio-visual emotion recognition using convolutional neural networks and extreme learning machine