Source code classification using latent semantic indexing with structural and frequency term weighting