Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights
Plagiarism detection (PD) in natural language processing involves locating similar words in two distinct sources. The paper introduces a new approach to plagiarism detection utilizing bidirectional encoder representations from transformers (BERT)-generated embedding, an enhanced artificial bee colo...
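Main Authors: | Xiong, Jiale; Yang, Jing; Yang, Lei; Awais, Muhammad; Khan, Abdullah Ayub; Alizadehsani, Roohallah; Acharya, U. Rajendra |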
---|---|
Format: | Article |
Published: | Elsevier 2024 |
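Subjects: | QA75 Electronic computers. Computer science; TK Electrical engineering. Electronics Nuclear engineering |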
Online Access: | http://eprints.um.edu.my/44306/ https://doi.org/10.1016/j.eswa.2023.122088 |
id |
my.um.eprints.44306 |
---|---|
record_format |
eprints |
spelling |
my.um.eprints.44306 2024-07-05T03:06:52Z http://eprints.um.edu.my/44306/ Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights. Xiong, Jiale; Yang, Jing; Yang, Lei; Awais, Muhammad; Khan, Abdullah Ayub; Alizadehsani, Roohallah; Acharya, U. Rajendra. QA75 Electronic computers. Computer science; TK Electrical engineering. Electronics Nuclear engineering. Elsevier 2024-03-15 Article PeerReviewed. Xiong, Jiale and Yang, Jing and Yang, Lei and Awais, Muhammad and Khan, Abdullah Ayub and Alizadehsani, Roohallah and Acharya, U. Rajendra (2024) Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights. Expert Systems with Applications, 238 (E). ISSN 0957-4174, DOI https://doi.org/10.1016/j.eswa.2023.122088 |
institution |
Universiti Malaya |
building |
UM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaya |
content_source |
UM Research Repository |
url_provider |
http://eprints.um.edu.my/ |
topic |
QA75 Electronic computers. Computer science TK Electrical engineering. Electronics Nuclear engineering |
description |
Plagiarism detection (PD) in natural language processing involves locating similar words in two distinct sources. The paper introduces a new approach to plagiarism detection utilizing bidirectional encoder representations from transformers (BERT)-generated embeddings, an enhanced artificial bee colony (ABC) optimization algorithm for pre-training, and a training process based on reinforcement learning (RL). The BERT model can be incorporated into a subsequent task and meticulously refined to function as a model, enabling it to apprehend a variety of linguistic characteristics. Imbalanced classification is one of the fundamental obstacles to PD. To handle this predicament, we present a novel methodology utilizing RL, in which the problem is framed as a series of sequential decisions in which an agent receives a reward at each level for classifying a received instance. To address the disparity between classes, it is determined that the majority class will receive a lower reward than the minority class. We also focus on the training stage, which often utilizes gradient-based learning techniques like backpropagation (BP), leading to certain drawbacks such as sensitivity to initialization. In our proposed model, we utilize a mutual learning-based ABC (ML-ABC) approach that adjusts the food source with the most beneficial results for the candidate by considering a mutual learning factor that incorporates the initial weight. We evaluated the efficacy of our novel approach by contrasting its results with those of population-based techniques using three standard datasets, namely Stanford Natural Language Inference (SNLI), Microsoft Research Paraphrase Corpus (MSRP), and Semantic Evaluation Database (SemEval2014). Our model attained excellent results that outperformed state-of-the-art models. Optimal values for important parameters, including the reward function, are identified for the model based on experiments on the study dataset. Ablation studies that exclude the proposed ML-ABC and reinforcement learning from the model confirm the independent positive incremental impact of these components on model performance. |
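The class-weighted reward described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the label convention (1 as the minority class), and the scale factor 0.2 for the majority class are all assumptions chosen for illustration. The only property taken from the abstract is that the majority class receives a smaller reward than the minority class.

```python
# Illustrative sketch only; names and constants are assumptions,
# not taken from the paper.

def reward(true_label: int, predicted_label: int,
           minority_label: int = 1, majority_scale: float = 0.2) -> float:
    """Reward for classifying one instance.

    Minority-class instances earn +1 / -1 for a correct / wrong decision;
    majority-class instances earn the smaller +majority_scale / -majority_scale.
    """
    magnitude = 1.0 if true_label == minority_label else majority_scale
    return magnitude if predicted_label == true_label else -magnitude


def episode_return(true_labels, predicted_labels, **kwargs) -> float:
    """Sum of per-instance rewards over a sequence of classification decisions."""
    return sum(reward(t, p, **kwargs)
               for t, p in zip(true_labels, predicted_labels))
```

With these assumed values, a mistake on a minority instance costs five times as much as a mistake on a majority instance, which pushes an agent trained on the summed reward away from always predicting the majority class.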
format |
Article |
author |
Xiong, Jiale; Yang, Jing; Yang, Lei; Awais, Muhammad; Khan, Abdullah Ayub; Alizadehsani, Roohallah; Acharya, U. Rajendra |
author_facet |
Xiong, Jiale; Yang, Jing; Yang, Lei; Awais, Muhammad; Khan, Abdullah Ayub; Alizadehsani, Roohallah; Acharya, U. Rajendra |
author_sort |
Xiong, Jiale |
title |
Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights |
publisher |
Elsevier |
publishDate |
2024 |
url |
http://eprints.um.edu.my/44306/ https://doi.org/10.1016/j.eswa.2023.122088 |