Fundamental stock analysis with LLMs and qualitative data: development of a vector database for company reports
| Main Author: | |
|---|---|
| Format: | Final Year Project / Dissertation / Thesis |
| Published: | 2025 |
| Subjects: | |
| Online Access: | http://eprints.utar.edu.my/7241/1/fyp_CS_2025_TJJ.pdf, http://eprints.utar.edu.my/7241/ |
| Summary: | This project explores the use of natural language processing techniques, specifically Large Language Models (LLMs), for fundamental stock analysis by leveraging qualitative data in corporate financial reports and disclosures. It addresses the information overload faced by retail investors by automating the collection, processing, and interpretation of fundamental data. The system employs a multi-agent architecture that integrates web scraping of financial reports, LLM-based processing of lengthy report documents, and embedding of the processed data into a vector database to enable semantic search and efficient information retrieval. Using vector embeddings and retrieval-augmented generation, the system acts as a “virtual analyst” that retrieves relevant information and synthesizes coherent responses to complex investor queries about a company’s fundamentals. The results demonstrate that this LLM-driven approach efficiently distills key insights from large volumes of unstructured text, making qualitative analysis more accessible and helping to bridge the gap in analytical capability for retail investors. The project delivered a functional proof of concept and identified opportunities for further improvement, including expanding data sources, improving summary accuracy, and strengthening the system’s real-time information integration. |
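
The summary describes embedding processed report text into a vector database and retrieving relevant passages for retrieval-augmented generation. The sketch below is a minimal, self-contained illustration of that retrieval step under stated assumptions, not the project's implementation: `embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, `VectorStore` is a toy in-memory store rather than the vector database used in the project, and the chunk size, overlap, prompt wording, and the "ExampleCo" report snippets are illustrative choices.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # size of the toy embedding space (illustrative assumption)


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split a long report into overlapping word windows before embedding."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


class VectorStore:
    """Toy in-memory vector store holding (embedding, text, metadata) triples."""

    def __init__(self) -> None:
        self.entries: list[tuple[list[float], str, dict]] = []

    def add(self, text: str, metadata: dict) -> None:
        # Chunk the document, then store one embedding per chunk.
        for piece in chunk(text):
            self.entries.append((embed(piece), piece, metadata))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, dict]]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), text, meta)
                  for vec, text, meta in self.entries]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, hits: list[tuple[float, str, dict]]) -> str:
    """Assemble retrieved excerpts into a grounded prompt for an LLM (the RAG step)."""
    context = "\n\n".join(f"[{meta['company']}, {meta['source']}] {text}"
                          for _, text, meta in hits)
    return (f"Answer using only the report excerpts below.\n\n{context}\n\n"
            f"Question: {question}")


if __name__ == "__main__":
    store = VectorStore()
    store.add("The board proposed a final dividend and maintains a dividend "
              "payout policy of 40 percent.",
              {"company": "ExampleCo", "source": "annual report"})
    store.add("Revenue from the plantation segment declined due to lower palm oil prices.",
              {"company": "ExampleCo", "source": "quarterly report"})
    question = "What is the dividend policy?"
    print(build_prompt(question, store.search(question, k=1)))
```

In a full system the prompt would be sent to an LLM, which synthesizes the answer from the retrieved excerpts; chunking with overlap keeps sentences that straddle a window boundary retrievable from at least one chunk.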
