A crowd video retrieval framework using generic descriptors