Ground truth dataset: Objectionable web content
Main Authors: | , |
---|---|
Format: | Article |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | http://eprints.um.edu.my/43853/ |
Summary: | Cyber parental control aims to filter objectionable web content and prevent children from being exposed to harmful content. Success in detecting and blocking objectionable content depends heavily on the accuracy of the topic model. A reliable ground truth dataset is essential for building effective cyber parental control models and for validating new detection methods. The ground truth is the measurement used to label websites in the cyber parental control dataset as objectionable or unobjectionable. The lack of publicly accessible datasets with a reliable ground truth has prevented a fair and coherent comparison of the different methods proposed in the field of cyber parental control. This paper presents a ground truth dataset that contains 8000 labeled websites: 4000 objectionable and 4000 unobjectionable. These websites consist of more than 2 million web pages. Creating the ground truth objectionable web content dataset involved several phases, including data collection, extraction, and labeling. Finally, the presence of bias is addressed using kappa coefficient measurement. The ground truth dataset is publicly available in the Mendeley repository. Dataset: 10.17632/f239556fkr.2; https://data.mendeley.com/datasets/f239556fkr. Dataset License: CC BY 4.0. © 2022 by the authors. |
---|
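
The summary says labeling bias is addressed with a kappa coefficient but does not say which variant or how it was computed. As a rough illustration only, the sketch below computes Cohen's kappa for two annotators; the choice of Cohen's kappa, the function name `cohen_kappa`, and the toy labels are all assumptions and are not taken from the paper or the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Hypothetical helper for illustration; the paper's actual
    procedure is not described in this record.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled alike.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: computed from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (invented): two annotators disagree on one of ten websites.
rater_1 = ["objectionable"] * 5 + ["unobjectionable"] * 5
rater_2 = ["objectionable"] * 4 + ["unobjectionable"] * 6
kappa = cohen_kappa(rater_1, rater_2)
```

With these toy labels the observed agreement is 0.9 and the chance agreement is 0.5, so kappa comes out at about 0.8. Values near 1 indicate agreement well beyond chance, while values near 0 indicate agreement no better than chance.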