Fundamental stock analysis with LLMs and qualitative data: development of a vector database for company reports

Bibliographic Details
Main Author: Ting, Jun Jing
Format: Final Year Project / Dissertation / Thesis
Published: 2025
Online Access: http://eprints.utar.edu.my/7241/1/fyp_CS_2025_TJJ.pdf
http://eprints.utar.edu.my/7241/
Description
Summary: This project explores the use of natural language processing techniques, specifically Large Language Models (LLMs), for fundamental stock analysis based on the qualitative data in corporate financial reports and disclosures. It addresses the information overload faced by retail investors by automating the collection, processing, and interpretation of fundamental data. The system uses a multi-agent architecture that combines web scraping of financial reports, LLM-based processing of lengthy documents, and embedding of the processed output into a vector database to enable semantic search and efficient information retrieval. Using vector embeddings and retrieval-augmented generation, the system acts as a “virtual analyst” that retrieves relevant information and synthesizes coherent responses to complex investor queries about a company’s fundamentals. The results indicate that this LLM-driven approach efficiently distills key insights from large volumes of unstructured text, making qualitative analysis more accessible and helping to bridge the gap in analytical capability for retail investors. The project delivers a functional proof of concept and identifies opportunities for further improvement, including expanding data sources, improving summary accuracy, and strengthening the system’s real-time information integration.
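
The retrieval step described in the summary (embed processed report text, store it in a vector database, search it semantically, then pass the hits to an LLM) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record does not name the embedding model, vector database, or prompt format used in the project, so the sketch substitutes a toy term-frequency embedding, an in-memory store, and invented sample snippets that are not taken from any real report. The names `embed`, `VectorStore`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stopword list so common words do not dominate similarity.
STOPWORDS = {"the", "a", "and", "in", "for", "by", "was", "what", "does", "of", "to"}


def embed(text):
    """Toy embedding: sparse term-frequency vector (stand-in for a real embedding model)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database (hypothetical, for illustration)."""

    def __init__(self):
        self.rows = []  # list of (vector, text, metadata)

    def add(self, text, metadata):
        self.rows.append((embed(text), text, metadata))

    def search(self, query, k=2):
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]


def build_prompt(question, hits):
    """Assemble a retrieval-augmented prompt; a real system would send this to an LLM."""
    context = "\n".join(f"[{meta['source']}] {text}" for text, meta in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Invented sample snippets (not from any real company report).
store = VectorStore()
store.add("Revenue growth was driven by higher demand in the cloud services segment.",
          {"source": "sample-growth"})
store.add("The board declared a final dividend for the financial year.",
          {"source": "sample-dividend"})
store.add("Key risks include foreign exchange volatility and rising raw material costs.",
          {"source": "sample-risk"})

question = "What risks does the company face?"
hits = store.search(question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a fuller system, `embed` would presumably call a learned embedding model and `VectorStore` would be backed by a dedicated vector database, while the overall flow of retrieving first and then generating from the retrieved context would stay the same.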