Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani
Summarization is a process to select important information from a source text. Summarizing strategies are the core of the cognitive processes involved in the summarization activity. Summarizing strategies include a set of conscious tasks that are used to determine important information and extrac...
Main Author: | Seyed Asadollah, Abdiesfandani |
---|---|
Format: | Thesis |
Published: | 2016 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://studentsrepo.um.edu.my/6400/4/seyed.pdf http://studentsrepo.um.edu.my/6400/ |
id |
my.um.stud.6400 |
---|---|
record_format |
eprints |
spelling |
my.um.stud.6400 2019-10-23T19:06:07Z Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani Seyed Asadollah, Abdiesfandani QA75 Electronic computers. Computer science 2016 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/6400/4/seyed.pdf Seyed Asadollah, Abdiesfandani (2016) Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/6400/ |
institution |
Universiti Malaya |
building |
UM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaya |
content_source |
UM Student Repository |
url_provider |
http://studentsrepo.um.edu.my/ |
topic |
QA75 Electronic computers. Computer science |
description |
Summarization is the process of selecting important information from a source text.
Summarizing strategies are the core of the cognitive processes involved in
summarization. They comprise a set of conscious tasks that are
used to determine important information and extract the main idea of a source text.
In this research project, we conducted a study of students’ summaries. The findings
show that there is a strong relationship between students’ summary writing
proficiency and the summarizing strategies that they used. We then developed
a new algorithm to address the summarizing strategies identification problem. The
algorithm simulates two important tasks that human experts frequently perform
to identify the summarizing strategies used to produce summary sentences: 1) sentence
relevance identification; and 2) summarizing strategies identification.
The sentence relevance identification module uses a statistical approach, such as the
vector space model (VSM), to represent sentences and computes the similarity between
source sentences and summary sentences using the cosine similarity measure. It then
integrates semantic and syntactic similarity measures in a linear equation to
capture meaning when comparing two sentences. The aim is to distinguish
two sentences that have the same surface form or share a similar
bag-of-words (BOW) but differ in meaning. The module also employs a
word semantic similarity measure to overcome the vocabulary mismatch problem
in sentence comparison; this measure bridges the lexical gap between semantically similar
contexts that are expressed in different wording. In addition, the sentence relevance
identification module requires some linguistic pre-processing, including
part-of-speech (POS) tagging, word stemming and stop-word removal.
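
The sentence comparison described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the stop-word list, the word-order measure used as a stand-in for the syntactic component, and the weight `alpha` are all assumptions made for illustration, and the abstract does not state the actual linear equation or its coefficients.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the thesis's list).
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "and"}


def tokenize(sentence):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors (VSM)."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def word_order_similarity(tokens_a, tokens_b):
    """Hypothetical stand-in for a syntactic measure: compares the
    first position of each shared-vocabulary word in both sentences."""
    vocab = sorted(set(tokens_a) | set(tokens_b))

    def positions(tokens):
        # 1-based position of the word, or 0 if the word is absent.
        return [tokens.index(w) + 1 if w in tokens else 0 for w in vocab]

    r_a, r_b = positions(tokens_a), positions(tokens_b)
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r_a, r_b)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r_a, r_b)))
    return 1.0 if total == 0 else 1.0 - diff / total


def combined_similarity(sentence_a, sentence_b, alpha=0.8):
    """Linear combination of the two measures; alpha is an assumed weight."""
    ta, tb = tokenize(sentence_a), tokenize(sentence_b)
    return (alpha * cosine_similarity(ta, tb)
            + (1 - alpha) * word_order_similarity(ta, tb))
```

For "the dog bites the man" and "the man bites the dog", the bag-of-words cosine alone is 1.0, whereas the combined score falls below 1 because the word order differs. This is the kind of case the abstract refers to when it mentions sentences that share a bag-of-words but differ in meaning.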
The summarizing strategies identification module relies on a set of heuristic rules and
statistical and linguistic methods, such as the position-based, title-based,
cue-phrase and word-frequency methods, to identify the summarizing strategies
employed by students.
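
The four methods named above can be sketched as simple scoring functions. This is only an illustration under stated assumptions: the cue-phrase list, the scoring rules, and every function name below are hypothetical, since the abstract does not give the thesis's actual heuristic rules.

```python
from collections import Counter

# Illustrative cue phrases only (an assumption, not taken from the thesis).
CUE_PHRASES = ("in conclusion", "in summary", "the main idea")


def words(text):
    """Lowercased words with surrounding punctuation stripped."""
    stripped = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in stripped if w]


def position_score(index, n_sentences):
    """Position-based: 1.0 for the first or last sentence, else 0.0."""
    return 1.0 if index in (0, n_sentences - 1) else 0.0


def title_score(sentence, title):
    """Title-based: fraction of title words that occur in the sentence."""
    title_words = set(words(title))
    if not title_words:
        return 0.0
    return len(title_words & set(words(sentence))) / len(title_words)


def cue_score(sentence):
    """Cue-phrase: 1.0 if the sentence contains any listed cue phrase."""
    lowered = sentence.lower()
    return 1.0 if any(cue in lowered for cue in CUE_PHRASES) else 0.0


def frequency_score(sentence, document_sentences, top_k=3):
    """Word-frequency: share of the document's top_k most frequent
    words that occur in the sentence."""
    counts = Counter(w for s in document_sentences for w in words(s))
    keywords = {w for w, _ in counts.most_common(top_k)}
    if not keywords:
        return 0.0
    return len(keywords & set(words(sentence))) / len(keywords)
```

A real system would likely filter stop words before counting frequencies and would combine the scores with rules or weights; both are left out here to keep the sketch short.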
To evaluate the algorithm, we conducted two experiments. In the first experiment, we
examined the functionality of the system, that is, whether it is able to identify the
summarizing strategies used by students in summary writing. The results of the first
experiment show that the system is able to identify several summarizing strategies:
deletion, sentence combination, paraphrase and topic sentence selection. The
system is also able to detect the copy-verbatim strategy, the strategy most commonly used
by students. In addition, the system can identify four methods used in the topic sentence
selection strategy: 1) cue method;
2) title method; 3) keyword method; and 4) location method. In the second experiment,
we measured the performance of the algorithm against human judgment in
identifying the summarizing strategies, using precision, recall, F-measure and
accuracy. The experimental results show that the proposed algorithm achieved
acceptable results in comparison to human judgment: an
average of 87% precision, 83% recall, 85% F-score and 82% accuracy. |
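
The evaluation measures named in the abstract have standard definitions, sketched below in Python. The function names and the confusion counts in the usage note are hypothetical; the abstract reports only the averaged percentages, not the underlying counts.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(tp, fp, fn, tn):
    """Precision, recall, F-measure and accuracy from confusion counts,
    where the human judgment is treated as the reference labelling."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
        "accuracy": accuracy,
    }
```

As a usage example, `evaluate(8, 2, 2, 8)` gives 0.8 for all four measures. Applying `f_measure` to the reported averages, `f_measure(0.87, 0.83)`, gives roughly 0.85, which agrees with the reported F-score, although an average of per-strategy F-scores would not in general equal the F-measure of the averaged precision and recall.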
format |
Thesis |
author |
Seyed Asadollah, Abdiesfandani |
title |
Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani
|
publishDate |
2016 |
url |
http://studentsrepo.um.edu.my/6400/4/seyed.pdf http://studentsrepo.um.edu.my/6400/ |