Identifying the Dominant Language of Web Page Using Supervised N-grams

Natural language processing is an emerging technology in the linguistic industry and an aid to human-computer interaction in computer science. Language identification, on the other hand, is a form of pattern recognition that helps to identify the predefined language of a web page and to predict the unknow...

Bibliographic Details
Main Authors: Ng, Choon-Ching; Siau-Chuin, Liew; Wan Muhammad Syahrir, Wan Hussin; Tutut, Herawan
Format: Article
Language: en
Published: Conference Publishing Services (CPS) 2013
Subjects: QA76 Computer software
Online Access: http://umpir.ump.edu.my/id/eprint/6869/1/dentifying_the_Dominant_Language_of_Web_Page_Using_Supervised_N-grams.pdf
http://umpir.ump.edu.my/id/eprint/6869/
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516378
_version_ 1831522064108879872
author Ng, Choon-Ching
Siau-Chuin, Liew
Wan Muhammad Syahrir, Wan Hussin
Tutut, Herawan
author_sort Ng, Choon-Ching
building UMPSA Library
collection Institutional Repository
content_provider Universiti Malaysia Pahang Al-Sultan Abdullah
content_source UMPSA Institutional Repository
continent Asia
country Malaysia
description Natural language processing is an emerging technology in the linguistic industry and an aid to human-computer interaction in computer science. Language identification, on the other hand, is a form of pattern recognition that helps to identify the predefined language of a web page and to predict the unknown language of a particular text. Written texts are built from common features such as characters, words and n-grams, and these features are unique to each language. In the experimental results, the supervised n-gram approach achieves accurate identification and outperforms the distance measurement on Arabic script web pages.
format Article
id my.ump.umpir.6869
institution Universiti Malaysia Pahang
language en
publishDate 2013
publisher Conference Publishing Services (CPS)
record_format eprints
spelling my.ump.umpir.6869 2018-04-27T01:15:23Z http://umpir.ump.edu.my/id/eprint/6869/ Ng, Choon-Ching and Siau-Chuin, Liew and Wan Muhammad Syahrir, Wan Hussin and Tutut, Herawan (2013) Identifying the Dominant Language of Web Page Using Supervised N-grams. 2012 International Conference on Advanced Computer Science Applications and Technologies. pp. 344-348. (Published) QA76 Computer software. Conference Publishing Services (CPS) 2013. Article, PeerReviewed, application/pdf, en. http://umpir.ump.edu.my/id/eprint/6869/1/dentifying_the_Dominant_Language_of_Web_Page_Using_Supervised_N-grams.pdf http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516378 DOI: 10.1109/ACSAT.2012.74
spellingShingle QA76 Computer software
Ng, Choon-Ching
Siau-Chuin, Liew
Wan Muhammad Syahrir, Wan Hussin
Tutut, Herawan
Identifying the Dominant Language of Web Page Using Supervised N-grams
title Identifying the Dominant Language of Web Page Using Supervised N-grams
title_sort identifying the dominant language of web page using supervised n-grams
topic QA76 Computer software
url http://umpir.ump.edu.my/id/eprint/6869/1/dentifying_the_Dominant_Language_of_Web_Page_Using_Supervised_N-grams.pdf
http://umpir.ump.edu.my/id/eprint/6869/
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516378
url_provider http://umpir.ump.edu.my/