Leveraging Web Scraping to Gather Tourism Information Data

Bibliographic Details
Main Authors: Kamarazaman, Nadzirah; Mohamad Ali, Nazlena; Arshad, Haslina
Format: Article
Language: English
Published: UUM PRESS 2024
Online Access: https://repo.uum.edu.my/id/eprint/32088/1/JETH%2004%202024%2016-29.pdf
https://repo.uum.edu.my/id/eprint/32088/
https://e-journal.uum.edu.my/index.php/jeth/
Description
Summary: Information and Communication Technologies (ICT) significantly influence both individuals' daily lives and the economy. In this context, the tourism industry plays a crucial role, and it is essential to recognise the contributions tourists make by sharing their experiences on tourism websites. Analysing this data is key to improving future tourists' experiences. The objective of this study is therefore to employ web scraping to gather data on places of interest (POI) and user attributes from the TripAdvisor website, specifically for the state of Melaka. Melaka was chosen because it is one of the places recognised by the United Nations Educational, Scientific and Cultural Organization (UNESCO). The study focuses on the 200 POI locations on the UNESCO map, encompassing both Melaka's core and buffer zones. These POIs are categorised into four heritage types: built heritage, natural heritage, personal heritage, and living heritage, with some belonging to more than one category. A total of 14 attributes were extracted from the TripAdvisor website. Specifically, 27,282 user data entries were collected from 163 POIs in the core zone and 8,305 data entries from 37 POIs in the buffer zone. The data is stored in the repository in several formats, including CSV, JSON, and Excel files. The data supports the development of a tourism application. Furthermore, the tourism industry can benefit from this study by enhancing its services and conserving cultural heritage.
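
The collection step described in the summary (extract attributes from review pages, then store them as CSV and JSON) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the HTML snippet, the class names, the `data-poi` and `data-zone` attributes, the POI names, and the four fields shown are all hypothetical stand-ins, since the record lists neither the 14 attributes nor the page structure that was scraped. Only the Python standard library is used.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical markup for illustration only; this is NOT TripAdvisor's
# real page structure, and the POI names are placeholders.
SAMPLE_HTML = """
<div class="review" data-poi="Example POI A" data-zone="core">
  <span class="user">alice</span>
  <span class="rating">5</span>
</div>
<div class="review" data-poi="Example POI B" data-zone="buffer">
  <span class="user">bob</span>
  <span class="rating">4</span>
</div>
"""

# Assumed field names; the study extracted 14 attributes that are not listed here.
FIELDS = ["poi", "zone", "user", "rating"]


class ReviewParser(HTMLParser):
    """Collects one dict per <div class="review"> block."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being built, or None outside a review
        self._field = None    # name of the span currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "review":
            self._current = {"poi": attrs.get("data-poi"),
                             "zone": attrs.get("data-zone")}
        elif tag == "span" and self._current is not None:
            self._field = attrs.get("class")

    def handle_data(self, data):
        # Only text inside a <span> within a review is stored.
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def parse_reviews(html):
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.records


def to_csv(records):
    buf = io.StringIO()
    # extrasaction="ignore" drops any span whose class is not in FIELDS.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records):
    return json.dumps(records, indent=2)


records = parse_reviews(SAMPLE_HTML)
csv_text = to_csv(records)
json_text = to_json(records)
```

Here `records` holds one dictionary per review, and `csv_text` and `json_text` hold the same rows serialised in two of the formats the summary mentions. A real scraper would additionally need to fetch pages, follow pagination, and respect the site's terms of use, none of which is covered by this sketch.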