Fundamental stock analysis with LLMs and qualitative data: Development of ontology-grounded, graph-based RAG with text-to-Cypher retrieval for Malaysian listed companies
| Main Author: | |
|---|---|
| Format: | Final Year Project / Dissertation / Thesis |
| Published: | 2025 |
| Subjects: | |
| Online Access: | http://eprints.utar.edu.my/7106/1/fyp_CS_2025_KCQ.pdf http://eprints.utar.edu.my/7106/ |
| Summary: | Fundamental analysis is essential for retail investors pursuing long-term investment, as a company’s profitability ultimately drives its intrinsic value. At its core, fundamental analysis derives implicit insights (such as operational resilience, governance quality, or future growth potential) from explicit data, including financial disclosures and corporate announcements. Retail investors, however, often lack the expertise, resources, and analytical experience required to perform such analysis effectively. To address this challenge, this study proposes a corporate insight derivation module powered by Large Language Models (LLMs) that systematically transforms explicit corporate disclosures into actionable implicit insights. The module employs a novel ontology-grounded, graph-based Retrieval-Augmented Generation (RAG) pipeline with text-to-Cypher retrieval. It comprises three sub-modules: (i) an Automated Ontology Construction Module, which formalises domain-specific entities and their relationships; (ii) a Graph Construction Module, which integrates heterogeneous corporate data into a coherent knowledge graph capable of multi-hop reasoning; and (iii) a Text-to-Cypher Retrieval Module, which lets natural language queries access the knowledge graph efficiently. As a proof of concept, the system uses disclosures from five technology companies listed on the ACE Market of Bursa Malaysia. Evaluation results demonstrate that the proposed pipeline successfully derives implicit insights, with the Entity Deduplication process achieving a maximum deduplication rate of 73.0% and an overall rate of 66.5%, producing a compact and coherent knowledge graph. Despite limitations in ontology scalability, dynamic adaptability, and prompt robustness, the pipeline establishes a strong foundation for further refinement. The proposed module holds potential as a practical tool for retail investors, supporting more informed and rational decision-making by bridging the gap between explicit corporate data and implicit investment insights. |
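
The ontology-grounded text-to-Cypher step described in the summary could be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the mini-ontology, the labels `Company`, `Person`, and `Announcement`, the relationship types, and the helper names `build_prompt` and `unknown_terms` are all hypothetical, and the LLM call itself is left out. The sketch shows two ideas only: restricting the prompt to patterns the ontology defines, and checking a generated query for labels or relationship types that fall outside the ontology.

```python
import re

# Hypothetical mini-ontology (an assumption for illustration, not taken from the thesis):
# node labels plus the relationship types allowed between them.
ONTOLOGY = {
    "nodes": {"Company", "Person", "Announcement"},
    "relationships": {
        ("Person", "DIRECTOR_OF", "Company"),
        ("Company", "ISSUED", "Announcement"),
    },
}


def build_prompt(ontology, question):
    """Assemble an LLM prompt that limits Cypher generation to ontology patterns."""
    patterns = [
        f"(:{source})-[:{rel}]->(:{target})"
        for source, rel, target in sorted(ontology["relationships"])
    ]
    return (
        "Translate the question into a Cypher query.\n"
        "Use only these graph patterns:\n"
        + "\n".join(patterns)
        + f"\nQuestion: {question}\nCypher:"
    )


def unknown_terms(ontology, cypher):
    """Return node labels and relationship types that the ontology does not define.

    This is a simple regex check, not a Cypher parser: it captures only the
    first label of each node pattern and the first type of each relationship.
    """
    labels = set(re.findall(r"\(\s*\w*\s*:\s*(\w+)", cypher))
    rel_types = set(re.findall(r"\[\s*\w*\s*:\s*(\w+)", cypher))
    known_rels = {rel for _, rel, _ in ontology["relationships"]}
    return (labels - ontology["nodes"]) | (rel_types - known_rels)


if __name__ == "__main__":
    print(build_prompt(ONTOLOGY, "Who are the directors of the company?"))
    # A query that stays inside the ontology yields an empty set.
    print(unknown_terms(ONTOLOGY, "MATCH (p:Person)-[:DIRECTOR_OF]->(c:Company) RETURN p"))
    # A query using undefined terms is flagged before it reaches the graph.
    print(unknown_terms(ONTOLOGY, "MATCH (c:Company)-[:OWNS]->(s:Subsidiary) RETURN s"))
```

In a pipeline of this kind, a query flagged by such a check could be rejected or sent back to the model for regeneration before it is run against the knowledge graph; whether the thesis does this is not stated in the summary.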
