Fundamental stock analysis with LLMs and qualitative data: Development of ontology-grounded, graph-based RAG with text-to-Cypher retrieval for Malaysian listed companies
Fundamental analysis is essential for retail investors pursuing long-term investment, as a company’s profitability ultimately drives its intrinsic value. At its core, fundamental analysis relies on deriving implicit insights—such as operational resilience, governance quality, or future growth...
| Main Author: | Kam, Chee Qin |
|---|---|
| Format: | Final Year Project / Dissertation / Thesis |
| Published: | 2025 |
| Subjects: | T Technology (General) |
| Online Access: | http://eprints.utar.edu.my/7106/1/fyp_CS_2025_KCQ.pdf http://eprints.utar.edu.my/7106/ |
| _version_ | 1854094474820452352 |
|---|---|
| author | Kam, Chee Qin |
| building | UTAR Library |
| collection | Institutional Repository |
| content_provider | Universiti Tunku Abdul Rahman |
| content_source | UTAR Institutional Repository |
| continent | Asia |
| country | Malaysia |
| description | Fundamental analysis is essential for retail investors pursuing long-term investment, as a company’s profitability ultimately drives its intrinsic value. At its core, fundamental analysis relies on deriving implicit insights (such as operational resilience, governance quality, or future growth potential) from explicit data, including financial disclosures and corporate announcements. Retail investors, however, often lack the expertise, resources, and analytical experience required to perform such analysis effectively. To address this challenge, this study proposes a corporate insight derivation module powered by Large Language Models (LLMs) that systematically transforms explicit corporate disclosures into actionable implicit insights. The module employs a novel ontology-grounded, graph-based Retrieval-Augmented Generation (RAG) pipeline with text-to-Cypher retrieval. It comprises three sub-modules: (i) an Automated Ontology Construction Module, which formalises domain-specific entities and their relationships; (ii) a Graph Construction Module, which integrates heterogeneous corporate data into a coherent knowledge graph capable of multi-hop reasoning; and (iii) a Text-to-Cypher Retrieval Module, which lets natural language queries access the knowledge graph efficiently. As a proof of concept, the system uses disclosures from five ACE Market-listed technology companies on Bursa Malaysia. Evaluation results demonstrate that the proposed pipeline successfully derives implicit insights, with the Entity Deduplication process achieving a maximum deduplication rate of 73.0% and an overall rate of 66.5%, producing a compact and coherent knowledge graph. Despite limitations in ontology scalability, dynamic adaptability, and prompt robustness, the pipeline establishes a strong foundation for further refinement. The proposed module holds potential as a practical tool for retail investors, supporting more informed and rational decision-making by bridging the gap between explicit corporate data and implicit investment insights. |
| format | Final Year Project / Dissertation / Thesis |
| id | my-utar-eprints.7106 |
| institution | Universiti Tunku Abdul Rahman |
| publishDate | 2025 |
| record_format | eprints |
| spelling | my-utar-eprints.7106 2025-12-28T15:57:38Z T Technology (General) 2025-06 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/7106/1/fyp_CS_2025_KCQ.pdf Kam, Chee Qin (2025) Fundamental stock analysis with LLMs and qualitative data: Development of ontology-grounded, graph-based RAG with text-to-Cypher retrieval for Malaysian listed companies. Final Year Project, UTAR. http://eprints.utar.edu.my/7106/ |
| title | Fundamental stock analysis with LLMs and qualitative data: Development of ontology-grounded, graph-based RAG with text-to-Cypher retrieval for Malaysian listed companies |
| topic | T Technology (General) |
| url | http://eprints.utar.edu.my/7106/1/fyp_CS_2025_KCQ.pdf http://eprints.utar.edu.my/7106/ |
| url_provider | http://eprints.utar.edu.my |
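
The abstract describes ontology-grounded text-to-Cypher retrieval but the record gives no implementation detail. The following is a minimal illustrative sketch, not the thesis's actual code: the ontology contents, the function names `build_prompt` and `uses_only_ontology_terms`, and the regex-based check are all assumptions made for illustration. It shows one plausible way an ontology could both constrain the prompt sent to an LLM and screen the generated Cypher before it reaches the graph database.

```python
import re

# Hypothetical ontology (assumed for illustration; not taken from the thesis).
ONTOLOGY = {
    "labels": {"Company", "Person", "Product"},
    "relationships": {"HAS_DIRECTOR", "OFFERS"},
}


def build_prompt(question: str, ontology: dict) -> str:
    """Build an LLM prompt that lists the only labels and relationship types allowed."""
    labels = ", ".join(sorted(ontology["labels"]))
    rels = ", ".join(sorted(ontology["relationships"]))
    return (
        "Translate the question into a single Cypher query.\n"
        f"Allowed node labels: {labels}\n"
        f"Allowed relationship types: {rels}\n"
        f"Question: {question}\n"
        "Cypher:"
    )


def uses_only_ontology_terms(cypher: str, ontology: dict) -> bool:
    """Return True if every node label and relationship type in the query is in the ontology.

    This is a rough regex screen for simple patterns such as (c:Company) and
    [:HAS_DIRECTOR]; it is not a full Cypher parser.
    """
    labels = set(re.findall(r"\(\s*\w*\s*:\s*(\w+)", cypher))
    rels = set(re.findall(r"\[\s*\w*\s*:\s*(\w+)", cypher))
    return labels <= ontology["labels"] and rels <= ontology["relationships"]
```

In such a setup, `build_prompt("Who are the directors of company X?", ONTOLOGY)` would be passed to the LLM, and a generated query would be executed only if `uses_only_ontology_terms` accepts it; a rejected query could be retried or reported back to the user.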
