A framework for extracting, classifying, analyzing, and presenting information from semi-structured web data sources