Enhancing document clustering by integrating semantic background knowledge and syntactic features into the bag of words representation