VulBERTa: simplified source code pre-training for vulnerability detection
This paper presents VulBERTa, a deep learning approach to detect security vulnerabilities in source code. Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects. The model learns a deep knowledge representation of the code synta...
Main Authors: | Hanif, Hazim; Maffeis, Sergio |
---|---|
Format: | Conference or Workshop Item |
Published: | IEEE, 2022 |
Subjects: | QA75 Electronic computers. Computer science; QA76 Computer software |
Online Access: | http://eprints.um.edu.my/40469/ |
id | my.um.eprints.40469 |
---|---|
record_format | eprints |
spelling | my.um.eprints.40469; 2025-02-13T04:31:54Z; http://eprints.um.edu.my/40469/; PeerReviewed; Hanif, Hazim and Maffeis, Sergio (2022) VulBERTa: simplified source code pre-training for vulnerability detection. In: 2022 International Joint Conference on Neural Networks, IJCNN 2022, 18-23 July 2022, Padua. |
institution | Universiti Malaya |
building | UM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Malaya |
content_source | UM Research Repository |
url_provider | http://eprints.um.edu.my/ |
topic | QA75 Electronic computers. Computer science; QA76 Computer software |
description | This paper presents VulBERTa, a deep learning approach to detect security vulnerabilities in source code. Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects. The model learns a deep knowledge representation of the code syntax and semantics, which we leverage to train vulnerability detection classifiers. We evaluate our approach on binary and multi-class vulnerability detection tasks across several datasets (VulDeePecker, Draper, ReVeal and μVulDeePecker) and benchmarks (CodeXGLUE and D2A). The evaluation results show that VulBERTa achieves state-of-the-art performance and outperforms existing approaches across different datasets, despite its conceptual simplicity and its limited cost in terms of training data size and number of model parameters. |
format | Conference or Workshop Item |
author | Hanif, Hazim; Maffeis, Sergio |
title | VulBERTa: simplified source code pre-training for vulnerability detection |
publisher | IEEE |
publishDate | 2022 |
url | http://eprints.um.edu.my/40469/ |