Data classification with k-NN using novel character frequency-direct word frequency (CF-DWF) similarity formula
k-NN is one of the most popular and easy-to-implement algorithms for classifying data. One of its strengths is that it readily accommodates modifications and improved variants. Despite its many advantages, k-NN also faces several issues. These issues are: distance/similarity calculation complexity,...
Main Authors: | Zardari, M.A., Jung, L.T. |
---|---|
Format: | Conference or Workshop Item |
Published: | Institute of Electrical and Electronics Engineers Inc. 2016 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84995551134&doi=10.1109%2fISMSC.2015.7594066&partnerID=40&md5=449ec4f765f99240969706e2a6057759 http://eprints.utp.edu.my/30930/ |
id |
my.utp.eprints.30930 |
---|---|
record_format |
eprints |
spelling |
my.utp.eprints.30930 2022-03-25T07:43:54Z NonPeerReviewed Zardari, M.A. and Jung, L.T. (2016) Data classification with k-NN using novel character frequency-direct word frequency (CF-DWF) similarity formula. In: UNSPECIFIED. http://eprints.utp.edu.my/30930/ |
institution |
Universiti Teknologi Petronas |
building |
UTP Resource Centre |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Petronas |
content_source |
UTP Institutional Repository |
url_provider |
http://eprints.utp.edu.my/ |
description |
k-NN is one of the most popular and easy-to-implement algorithms for classifying data. One of its strengths is that it readily accommodates modifications and improved variants. Despite its many advantages, k-NN also faces several issues: distance/similarity calculation complexity, training dataset complexity at the classification phase, proper selection of k, and duplicate values when the training dataset contains a single class. This paper focuses only on the issue of distance/similarity calculation complexity. To avoid this complexity, a new distance formula is proposed. The CF-DWF formula applies only to strings; it is not applicable to other data types. The F1-Score and precision of k-NN with CF-DWF are higher than those of traditional k-NN. The proposed similarity formula is also more efficient than Euclidean Distance (E.D) and Cosine Similarity (C.S). The results show that k-NN with CF-DWF reduced the computational complexity of k-NN with E.D and C.S from 4.77 to 43.69 and improved the F1-Score of traditional k-NN from 12 to 19. © 2015 IEEE. |
format |
Conference or Workshop Item |
author |
Zardari, M.A. Jung, L.T. |
title |
Data classification with k-NN using novel character frequency-direct word frequency (CF-DWF) similarity formula |
publisher |
Institute of Electrical and Electronics Engineers Inc. |
publishDate |
2016 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-84995551134&doi=10.1109%2fISMSC.2015.7594066&partnerID=40&md5=449ec4f765f99240969706e2a6057759 http://eprints.utp.edu.my/30930/ |
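
The description compares CF-DWF against k-NN with Cosine Similarity (C.S) on string data. This record does not give the CF-DWF formula, so the sketch below does not implement it. It only illustrates the kind of baseline the description mentions: k-NN over strings using cosine similarity on character-frequency vectors. The function names, the choice of character counts as features, and the example data are all assumptions made for illustration, not details from the paper.

```python
from collections import Counter
from math import sqrt


def char_freq(text):
    """Count each character's occurrences (case-insensitive, whitespace skipped).

    Using raw character counts as features is an assumption for this sketch.
    """
    return Counter(ch for ch in text.lower() if not ch.isspace())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[ch] for ch, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string is treated as similar to nothing
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k most similar training strings."""
    q = char_freq(query)
    scored = sorted(
        ((cosine_similarity(q, char_freq(text)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,  # highest similarity first
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, invented purely to exercise the functions above.
EXAMPLE_TRAINING = [
    ("aaaa", "a-heavy"),
    ("aaab", "a-heavy"),
    ("bbbb", "b-heavy"),
    ("abbb", "b-heavy"),
]
```

Calling `knn_classify("aaba", EXAMPLE_TRAINING, k=3)` scores the query against every training string, which is the per-query similarity cost that the description identifies as the issue the paper targets.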