Improving the efficiency of clustering algorithm for duplicates detection