Fundamental stock analysis with LLMs and qualitative data: Development of ontology-grounded, graph-based RAG with text-to-Cypher retrieval for Malaysian listed companies

Bibliographic Details
Main Author: Kam, Chee Qin
Format: Final Year Project / Dissertation / Thesis
Published: 2025
Subjects: T Technology (General)
Online Access: http://eprints.utar.edu.my/7106/1/fyp_CS_2025_KCQ.pdf
http://eprints.utar.edu.my/7106/
building UTAR Library
collection Institutional Repository
content_provider Universiti Tunku Abdul Rahman
content_source UTAR Institutional Repository
continent Asia
country Malaysia
description Fundamental analysis is essential for retail investors pursuing long-term investment, as a company’s profitability ultimately drives its intrinsic value. At its core, fundamental analysis relies on deriving implicit insights (such as operational resilience, governance quality, or future growth potential) from explicit data, including financial disclosures and corporate announcements. Retail investors, however, often lack the expertise, resources, and analytical experience required to perform such analysis effectively. To address this challenge, this study proposes a corporate insight derivation module powered by Large Language Models (LLMs) that systematically transforms explicit corporate disclosures into actionable implicit insights. The module employs a novel ontology-grounded, graph-based Retrieval-Augmented Generation (RAG) pipeline with text-to-Cypher retrieval. It comprises three sub-modules: (i) an Automated Ontology Construction Module, which formalises domain-specific entities and their relationships; (ii) a Graph Construction Module, which integrates heterogeneous corporate data into a coherent knowledge graph capable of multi-hop reasoning; and (iii) a Text-to-Cypher Retrieval Module, which lets natural language queries access the knowledge graph efficiently. As a proof of concept, the system uses disclosures from five ACE Market-listed technology companies on Bursa Malaysia. Evaluation results demonstrate that the proposed pipeline successfully derives implicit insights, with the Entity Deduplication process achieving a maximum deduplication rate of 73.0% and an overall rate of 66.5%, producing a compact and coherent knowledge graph. Despite limitations in ontology scalability, dynamic adaptability, and prompt robustness, the pipeline establishes a strong foundation for further refinement. The proposed module holds potential as a practical tool for retail investors, supporting more informed and rational decision-making by bridging the gap between explicit corporate data and implicit investment insights.
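The abstract does not say how the pipeline keeps generated Cypher consistent with the ontology. One plausible reading of "ontology-grounded" text-to-Cypher retrieval is that each generated query is checked against the ontology's node labels and relationship types before it is run. The Python sketch below illustrates only that idea and is not taken from the thesis: the toy ontology, the label and relationship names, and the function `ontology_violations` are all hypothetical, and the regular expressions cover only simple `(var:Label)` and `[var:TYPE]` patterns.

```python
from __future__ import annotations

import re

# Hypothetical toy ontology (NOT from the thesis): allowed node labels and
# allowed (source label, relationship type, target label) triples.
ONTOLOGY = {
    "labels": {"Company", "Person", "Product"},
    "relations": {
        ("Person", "DIRECTOR_OF", "Company"),
        ("Company", "OFFERS", "Product"),
    },
}

# Simple patterns for "(var:Label" and "[var:TYPE"; the variable is optional.
NODE_LABEL = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
REL_TYPE = re.compile(r"\[\s*\w*\s*:\s*(\w+)")


def ontology_violations(cypher: str, ontology: dict) -> list[str]:
    """Return the labels and relationship types in `cypher` that the ontology lacks."""
    problems = []
    for label in NODE_LABEL.findall(cypher):
        if label not in ontology["labels"]:
            problems.append(f"unknown label: {label}")
    known_rels = {rel for _, rel, _ in ontology["relations"]}
    for rel in REL_TYPE.findall(cypher):
        if rel not in known_rels:
            problems.append(f"unknown relationship: {rel}")
    return problems


if __name__ == "__main__":
    # Stand-in for the output of an LLM text-to-Cypher step (assumed, not real output).
    generated = (
        "MATCH (p:Person)-[:DIRECTOR_OF]->(c:Company {name: $name}) "
        "RETURN p.name"
    )
    print(ontology_violations(generated, ONTOLOGY))
```

A query that passes this check could then be sent to the graph database, while a failing one could be returned to the model with the list of violations. Whether the thesis uses a check of this kind is not stated in the record.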
id my-utar-eprints.7106
institution Universiti Tunku Abdul Rahman
record_format eprints
spelling my-utar-eprints.7106 2025-12-28T15:57:38Z
2025-06 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf
Kam, Chee Qin (2025) Fundamental stock analysis with LLMs and qualitative data: Development of ontology-grounded, graph-based RAG with text-to-Cypher retrieval for Malaysian listed companies. Final Year Project, UTAR. http://eprints.utar.edu.my/7106/
title Fundamental stock analysis with LLMs and qualitative data: Development of ontology-grounded, graph-based RAG with text-to-Cypher retrieval for Malaysian listed companies
topic T Technology (General)
url_provider http://eprints.utar.edu.my