Model-based viewpoint invariant human activity recognition from uncalibrated monocular video sequence