A hybrid SQL-RAG-LLM question answering system for plastic waste management

Bibliographic Details
Main Authors: Sivan, Dawn, Wei, Yi-Lun, Kumar, K. Satheesh, Chen, Yen-Jen, Jose, Rajan
Format: Conference or Workshop Item
Language: English
Published: IEEE 2026
Subjects:
Online Access: https://umpir.ump.edu.my/id/eprint/46286/1/A%20Hybrid%20SQL-RAG-LLM%20Question%20Answering%20System.pdf
https://umpir.ump.edu.my/id/eprint/46286/
https://doi.org/10.1109/ICICSE67247.2025.11390794
Description
Summary: Plastic waste management is a critical global challenge, demanding rapid synthesis of large and heterogeneous research outputs. Conventional literature review methods are labour-intensive, while retrieval-augmented generation (RAG) pipelines, though flexible, often introduce hallucinations. Here, a hybrid question-answering framework is presented that integrates structured SQL querying with large language model (LLM)-based generation to deliver both accuracy and coverage. A corpus of 10,350 research publications was processed using an automated pipeline that extracts metadata and full text, applies optical character recognition to non-extractable content, and stores the results in a Chroma vector store with rich metadata. Structured facts were parsed into an SQLite database to enable direct query execution. An adaptive router directs fact-based and aggregate queries to SQL, while conceptual or cross-paper synthesis questions trigger a hybrid mode that fuses SQL results with top-k document retrieval. A validator LLM selects or merges the outputs, followed by a fact-checking pass to ensure alignment with the evidence. User feedback drives an interactive fine-tuning loop that logs accepted and rejected answers for supervised model updates. Preliminary results indicate that the SQL pathway delivers high precision for structured questions, while the hybrid route improves coverage for broader queries. This approach offers a scalable solution for decision support in environmental policy, recycling optimization and scientific discovery in plastic waste management.
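The routing idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the cue-word router, the `papers` table schema, the sample rows, and the word-overlap ranking (a toy stand-in for Chroma top-k retrieval) are all assumptions made for illustration, and the validator and fact-checking LLM passes are omitted.

```python
import re
import sqlite3

# Hypothetical cue words; the record does not describe the actual routing rules.
AGGREGATE_CUES = re.compile(
    r"\b(how many|count|number of|average|total|which year)\b", re.I
)


def route(question: str) -> str:
    """Return 'sql' for fact/aggregate questions, otherwise 'hybrid'."""
    return "sql" if AGGREGATE_CUES.search(question) else "hybrid"


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory SQLite table with made-up rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, method TEXT)"
    )
    conn.executemany(
        "INSERT INTO papers (title, year, method) VALUES (?, ?, ?)",
        [
            ("Pyrolysis of mixed plastics", 2021, "pyrolysis"),
            ("Enzymatic PET degradation", 2022, "enzymatic"),
            ("Catalytic pyrolysis of polyolefins", 2022, "pyrolysis"),
        ],
    )
    return conn


def count_by_method(conn: sqlite3.Connection, method: str) -> int:
    """Structured path: answer an aggregate question directly in SQL."""
    row = conn.execute(
        "SELECT COUNT(*) FROM papers WHERE method = ?", (method,)
    ).fetchone()
    return row[0]


def top_k_titles(conn: sqlite3.Connection, question: str, k: int = 2) -> list[str]:
    """Toy stand-in for vector retrieval: rank titles by word overlap."""
    words = set(re.findall(r"\w+", question.lower()))
    rows = conn.execute("SELECT title FROM papers ORDER BY id").fetchall()
    ranked = sorted(
        rows, key=lambda r: -len(words & set(re.findall(r"\w+", r[0].lower())))
    )
    return [r[0] for r in ranked[:k]]


def answer(conn: sqlite3.Connection, question: str, method: str) -> dict:
    """SQL result always; retrieved documents are fused in only for hybrid mode."""
    mode = route(question)
    result = {"mode": mode, "sql_count": count_by_method(conn, method)}
    if mode == "hybrid":
        result["documents"] = top_k_titles(conn, question)
    return result
```

In this sketch, a question such as "How many papers use pyrolysis?" is answered from SQL alone, while "Why is pyrolysis promising for mixed plastics?" also returns the top-ranked documents, which a downstream LLM could then synthesize.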