A framework for human action detection via extraction of multimodal features.