Data classification with k-NN using novel character frequency-direct word frequency (CF-DWF) similarity formula

k-NN is one of the most popular and easiest-to-implement algorithms for classifying data. One of its strengths is that it readily accommodates improved variants. Despite its many advantages, k-NN also faces several issues: distance/similarity calculation complexity,...
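
As a rough illustration of where the distance/similarity calculation sits in k-NN string classification, the following minimal Python sketch ranks training strings by a pluggable similarity function and takes a majority vote. This record does not give the CF-DWF formula, so it is not reproduced here; cosine similarity and Euclidean distance over character-frequency vectors stand in as the conventional baselines the abstract mentions. The function names (`char_freq`, `knn_classify`) and the training data are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import sqrt


def char_freq(text):
    """Count how often each character occurs in a string (case-insensitive)."""
    return Counter(text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the character-frequency vectors of two strings."""
    fa, fb = char_freq(a), char_freq(b)
    dot = sum(fa[ch] * fb[ch] for ch in fa.keys() & fb.keys())
    norm = sqrt(sum(v * v for v in fa.values())) * sqrt(sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0


def euclidean_distance(a, b):
    """Euclidean distance between the character-frequency vectors of two strings."""
    fa, fb = char_freq(a), char_freq(b)
    return sqrt(sum((fa[ch] - fb[ch]) ** 2 for ch in fa.keys() | fb.keys()))


def knn_classify(query, training, k=3, similarity=cosine_similarity):
    """Return the majority label among the k training strings most similar to query.

    training is a list of (string, label) pairs; similarity is any function
    that returns a larger value for more similar strings.
    """
    ranked = sorted(training, key=lambda item: similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data, for illustration only.
training = [
    ("invoice payment due", "finance"),
    ("bank transfer receipt", "finance"),
    ("football match score", "sports"),
    ("tennis final result", "sports"),
]

label = knn_classify("payment invoice", training, k=1)
```

To rank by Euclidean distance instead, pass a negated distance so that larger values still mean "more similar", for example `similarity=lambda a, b: -euclidean_distance(a, b)`. Every query is compared against every training string, which is why the cost of the similarity function dominates the classification phase.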

Bibliographic Details
Main Authors: Zardari, M.A., Jung, L.T.
Format: Conference or Workshop Item
Published: Institute of Electrical and Electronics Engineers Inc. 2016
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84995551134&doi=10.1109%2fISMSC.2015.7594066&partnerID=40&md5=449ec4f765f99240969706e2a6057759
http://eprints.utp.edu.my/30930/
id my.utp.eprints.30930
record_format eprints
spelling my.utp.eprints.30930 2022-03-25T07:43:54Z NonPeerReviewed Zardari, M.A. and Jung, L.T. (2016) Data classification with k-NN using novel character frequency-direct word frequency (CF-DWF) similarity formula. In: UNSPECIFIED.
institution Universiti Teknologi Petronas
building UTP Resource Centre
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Petronas
content_source UTP Institutional Repository
url_provider http://eprints.utp.edu.my/
description k-NN is one of the most popular and easiest-to-implement algorithms for classifying data. One of its strengths is that it readily accommodates improved variants. Despite its many advantages, k-NN also faces several issues: distance/similarity calculation complexity, training dataset complexity at the classification phase, proper selection of k, and duplicate values when the training dataset contains a single class. This paper focuses only on the issue of distance/similarity calculation complexity. To avoid this complexity, a new distance formula, CF-DWF, is proposed. The CF-DWF formula applies only to strings; it is not applicable to other data types. The F1-score and precision of k-NN with CF-DWF are higher than those of traditional k-NN. The proposed similarity formula is also more efficient than Euclidean Distance (E.D) and Cosine Similarity (C.S). The results section shows that k-NN with CF-DWF reduced the computational complexity of k-NN with E.D and C.S from 4.77 to 43.69 and improved the F1-score of traditional k-NN from 12 to 19. © 2015 IEEE.
format Conference or Workshop Item
author Zardari, M.A.
Jung, L.T.
title Data classification with k-NN using novel character frequency-direct word frequency (CF-DWF) similarity formula
publisher Institute of Electrical and Electronics Engineers Inc.
publishDate 2016