UTILISING SOCIAL MEDIA THROUGH CROWDSOURCING FOR MORPHOLOGICAL RESOURCES ACQUISITION OF UNDER-RESOURCED LANGUAGE (U-RL): MELANAU
Morphological analyser is the first processing tool required in Natural Language Processing. To analyse the structure of a word, the analyser needs morphological resources. The resources are from dictionary, grammar book(s), and written texts. Yet, how to acquire morphological resources for under-resou...
Main Author: | |
---|---|
Format: | Final Year Project Report |
Language: | English |
Published: |
Universiti Malaysia Sarawak, (UNIMAS)
2015
|
Subjects: | |
Online Access: | http://ir.unimas.my/id/eprint/40178/1/Voon%20Mei%20Wei%2024pgs.pdf http://ir.unimas.my/id/eprint/40178/5/Voon%20Mei%20Wei%20latestft.pdf http://ir.unimas.my/id/eprint/40178/ |
id |
my.unimas.ir.40178 |
---|---|
record_format |
eprints |
spelling |
my.unimas.ir.40178 2024-09-17T08:50:26Z http://ir.unimas.my/id/eprint/40178/ UTILISING SOCIAL MEDIA THROUGH CROWDSOURCING FOR MORPHOLOGICAL RESOURCES ACQUISITION OF UNDER-RESOURCED LANGUAGE (U-RL): MELANAU VOON, MEI WEI QA Mathematics Morphological analyser is the first processing tool required in Natural Language Processing. To analyse the structure of a word, the analyser needs morphological resources. The resources are from dictionary, grammar book(s), and written texts. Yet, how to acquire morphological resources for under-resourced languages knowing that the languages are critically lacking of materials? In current approach, morphological resources are acquired from hardcopy versions whereby one needs to digitise the documents into softcopy versions. Due to difficulty in digitisation as it is time consuming and expensive, this project is proposing a workflow of acquiring morphological resources for under-resourced languages, in the case of Melanau language, by utilising social media. Three main stages in the work are: i) crowdsourcing the social media by using a web crawler Spider 1.1 and Jsoup method; ii) performing hybrid normalisation to transform the crawled data with informal and noisy nature into a cleaned wordlist; iii) validating the wordlist, is a crucial stage due to languages mixing that causes uncertainty of spelling standard. At this stage, edit distance similarity algorithms, Jaro-Winkler distance, Levenshtein-based distance, and N-gram distance, are applied to identify the spelling standard between a source word from the wordlist and a target word in the dictionary. The results show that Jaro-Winkler performs the best compared to the other two algorithms because it returns the highest F-score and the longest validated wordlist. The validated wordlists are then considered as the Melanau morphological resources that can be applied by computational linguists in the computational morphology. Indirectly, the proposed workflow can also be used to acquire morphological resources for other under-resourced languages in Sarawak. Universiti Malaysia Sarawak, (UNIMAS) 2015 Final Year Project Report NonPeerReviewed text en http://ir.unimas.my/id/eprint/40178/1/Voon%20Mei%20Wei%2024pgs.pdf text en http://ir.unimas.my/id/eprint/40178/5/Voon%20Mei%20Wei%20latestft.pdf VOON, MEI WEI (2015) UTILISING SOCIAL MEDIA THROUGH CROWDSOURCING FOR MORPHOLOGICAL RESOURCES ACQUISITION OF UNDER-RESOURCED LANGUAGE (U-RL): MELANAU. [Final Year Project Report] (Unpublished) |
institution |
Universiti Malaysia Sarawak |
building |
Centre for Academic Information Services (CAIS) |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Sarawak |
content_source |
UNIMAS Institutional Repository |
url_provider |
http://ir.unimas.my/ |
language |
English |
topic |
QA Mathematics |
spellingShingle |
QA Mathematics VOON, MEI WEI UTILISING SOCIAL MEDIA THROUGH CROWDSOURCING FOR MORPHOLOGICAL RESOURCES ACQUISITION OF UNDER-RESOURCED LANGUAGE (U-RL): MELANAU |
description |
Morphological analyser is the first processing tool required in Natural Language Processing. To analyse the structure of a word, the analyser needs morphological resources. The resources are from dictionary, grammar book(s), and written texts. Yet, how to acquire morphological resources for under-resourced languages knowing that the languages are critically lacking of materials? In current approach, morphological resources are acquired from hardcopy versions whereby one needs to digitise the documents into softcopy versions. Due to difficulty in digitisation as it is time consuming and expensive, this project is proposing a workflow of acquiring morphological resources for under-resourced languages, in the case of Melanau language, by utilising social media. Three main stages in the work are: i) crowdsourcing the social media by using a web crawler Spider 1.1 and Jsoup method; ii) performing hybrid normalisation to transform the crawled data with informal and noisy nature into a cleaned wordlist; iii) validating the wordlist, is a crucial stage due to languages mixing that causes uncertainty of spelling standard. At this stage, edit distance similarity algorithms, Jaro-Winkler distance, Levenshtein-based distance, and N-gram distance, are applied to identify the spelling standard between a source word from the wordlist and a target word in the dictionary. The results show that Jaro-Winkler performs the best compared to the other two algorithms because it returns the highest F-score and the longest validated wordlist. The validated wordlists are then considered as the Melanau morphological resources that can be applied by computational linguists in the computational morphology. Indirectly, the proposed workflow can also be used to acquire morphological resources for other under-resourced languages in Sarawak. |
format |
Final Year Project Report |
author |
VOON, MEI WEI |
author_facet |
VOON, MEI WEI |
author_sort |
VOON, MEI WEI |
title |
UTILISING SOCIAL MEDIA THROUGH CROWDSOURCING FOR MORPHOLOGICAL RESOURCES ACQUISITION OF UNDER-RESOURCED LANGUAGE (U-RL): MELANAU |
title_short |
UTILISING SOCIAL MEDIA THROUGH CROWDSOURCING FOR MORPHOLOGICAL RESOURCES ACQUISITION OF UNDER-RESOURCED LANGUAGE (U-RL): MELANAU |
title_full |
UTILISING SOCIAL MEDIA THROUGH CROWDSOURCING FOR MORPHOLOGICAL RESOURCES ACQUISITION OF UNDER-RESOURCED LANGUAGE (U-RL): MELANAU |
title_fullStr |
UTILISING SOCIAL MEDIA THROUGH CROWDSOURCING FOR MORPHOLOGICAL RESOURCES ACQUISITION OF UNDER-RESOURCED LANGUAGE (U-RL): MELANAU |
title_full_unstemmed |
UTILISING SOCIAL MEDIA THROUGH CROWDSOURCING FOR MORPHOLOGICAL RESOURCES ACQUISITION OF UNDER-RESOURCED LANGUAGE (U-RL): MELANAU |
title_sort |
utilising social media through crowdsourcing for morphological resources acquisition of under-resourced language (u-rl): melanau |
publisher |
Universiti Malaysia Sarawak, (UNIMAS) |
publishDate |
2015 |
url |
http://ir.unimas.my/id/eprint/40178/1/Voon%20Mei%20Wei%2024pgs.pdf http://ir.unimas.my/id/eprint/40178/5/Voon%20Mei%20Wei%20latestft.pdf http://ir.unimas.my/id/eprint/40178/ |
_version_ |
1811600372768178176 |
score |
13.211869 |
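
Illustrative note on the validation stage: the abstract describes matching each source word from the cleaned wordlist against target words in a dictionary using edit-distance similarity measures. The sketch below shows one way such matching might look in Python for two of the named measures (Levenshtein and Jaro-Winkler). It is not the report's implementation. The function names, the 0.85 threshold, and the English placeholder words are all assumptions made for illustration, and no Melanau data is used.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Distance normalised to [0, 1] by the longer word's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(a: str, b: str) -> float:
    """Jaro similarity: matches within a sliding window, minus transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    k = 0
    out_of_order = 0
    for i in range(len(a)):
        if a_hit[i]:
            while not b_hit[k]:
                k += 1
            if a[i] != b[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len(a) + matches / len(b) + (matches - t) / matches) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return base + prefix * p * (1.0 - base)


def best_match(source, dictionary, similarity, threshold=0.85):
    """Return (target, score) for the closest dictionary word, or (None, score)
    if nothing reaches the threshold. The threshold is an assumed value."""
    if not dictionary:
        return None, 0.0
    score, target = max((similarity(source, t), t) for t in dictionary)
    return (target, score) if score >= threshold else (None, score)


# Placeholder English words, purely hypothetical stand-ins for a real dictionary.
dictionary = ["morphology", "language", "dictionary"]
print(best_match("langauge", dictionary, jaro_winkler))
print(best_match("langauge", dictionary, levenshtein_similarity))
```

With the same threshold, the two measures can disagree on whether a misspelt source word is accepted, which is why comparing them by F-score, as the abstract describes, is a reasonable way to choose between them.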