A microservice-based RAG + LLM evaluation system designed to retrieve information, generate answers, and automatically evaluate answer quality using asynchronous processing.
This project implements a distributed Retrieval-Augmented Generation (RAG) pipeline with a dedicated LLM evaluation service that automatically validates answer quality and detects hallucinations.
The retrieval, generation, and evaluation components run as separate services that communicate through internal APIs and background workers.
Backend & Infrastructure
Designed a microservice-based pipeline in which the retrieval and evaluation services run independently and communicate through internal APIs, keeping each component modular and independently deployable.
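The internal API between services can be pictured as a small typed contract. The shapes and the handler below are illustrative assumptions (the project's real schema is not shown here); the handler is written as a pure in-process function standing in for what would be an HTTP endpoint of the retrieval service.

```typescript
// Hypothetical internal API contract between services; field names are
// assumptions for illustration, not the project's actual schema.
interface RetrieveRequest {
  query: string;
  topK: number;
}

interface RetrievedDoc {
  id: string;
  text: string;
  score: number; // similarity score from the vector search
}

// Minimal stand-in for the retrieval service's internal endpoint:
// returns the topK highest-scoring documents for a request.
function handleRetrieve(req: RetrieveRequest, corpus: RetrievedDoc[]): RetrievedDoc[] {
  return corpus
    .slice() // avoid mutating the caller's array
    .sort((a, b) => b.score - a.score)
    .slice(0, req.topK);
}
```

In the real system this function body would sit behind an internal HTTP route and query the vector store instead of an in-memory corpus.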
Implemented an asynchronous LLM evaluation workflow using BullMQ workers, allowing generated answers to be processed and scored in background jobs.
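The background scoring step can be sketched as a processor function of the kind BullMQ's `Worker` accepts. The job payload shape and the groundedness heuristic below are illustrative assumptions, and the function is kept pure so it can run without Redis; wiring it into a real queue would look like `new Worker("evaluation", job => evaluateAnswer(job.data), { connection })`.

```typescript
// Assumed job payload for the evaluation queue.
interface EvalJobData {
  answer: string;       // the generated answer to score
  contextDocs: string[]; // retrieved documents the answer should be grounded in
}

// Toy scoring heuristic: the fraction of substantive answer tokens that
// appear in the retrieved context. A real evaluator would instead call an
// LLM judge; this stands in for that step so the flow is testable offline.
async function evaluateAnswer(data: EvalJobData): Promise<{ grounded: number }> {
  const context = data.contextDocs.join(" ").toLowerCase();
  const tokens = data.answer
    .toLowerCase()
    .split(/\W+/)
    .filter(t => t.length > 3); // drop short stopword-like tokens
  if (tokens.length === 0) return { grounded: 0 };
  const supported = tokens.filter(t => context.includes(t)).length;
  return { grounded: supported / tokens.length };
}
```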
Integrated pgvector-based semantic retrieval to fetch relevant documents and built an evaluation system to assess groundedness and hallucination risk of generated responses.
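A pgvector retrieval query of this kind typically orders documents by cosine distance. The table and column names below (`documents`, `embedding`, `content`) are assumptions, not the project's actual schema; `<=>` is pgvector's cosine-distance operator.

```typescript
// Builds a parameterized similarity query for pgvector.
// $1 is the query embedding (as a vector literal), $2 is the result limit.
function buildRetrievalQuery(): string {
  return `
    SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> $1::vector
    LIMIT $2`;
}

// With node-postgres, usage would be roughly:
//   pool.query(buildRetrievalQuery(), [JSON.stringify(queryEmbedding), topK])
```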
Containerized all components with Docker Compose, orchestrating the application services, background workers, Postgres (with pgvector), and Redis (backing BullMQ) for scalable and reproducible deployments.
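A Compose layout for this stack might look like the sketch below. Service names, build paths, images, and environment variables are illustrative assumptions; only the need for Postgres-with-pgvector and Redis follows from the stack described above.

```yaml
# Illustrative docker-compose sketch; names and values are assumptions.
services:
  api:
    build: ./api            # retrieval + generation service
    depends_on: [postgres, redis]
    environment:
      DATABASE_URL: postgres://app:app@postgres:5432/app
      REDIS_URL: redis://redis:6379
  eval-worker:
    build: ./worker         # BullMQ evaluation workers
    depends_on: [postgres, redis]
    environment:
      DATABASE_URL: postgres://app:app@postgres:5432/app
      REDIS_URL: redis://redis:6379
  postgres:
    image: pgvector/pgvector:pg16   # Postgres with the pgvector extension
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
  redis:
    image: redis:7                  # job queue backend for BullMQ
```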