
RAG-Based User Content Summariser
The Brief
A system built on retrieval-augmented generation (RAG) that aggregates and distils user-generated content (UGC) into coherent summaries, achieving 100% Recall@5 and 89.44% Precision@5 on Rednote data.
Problem
UGC platforms are key sources for product recommendations and technical problem-solving, yet the information users find there is scattered, disorganised, and sometimes contradictory.
Significance
Information overload on platforms like Reddit, Stack Overflow, and Rednote makes it difficult for users to extract reliable, concise answers without manually reviewing hundreds of posts.
My Contribution
Led the development of RBUCS, a system that pairs RAG with an LLM. It first classifies each query to select a processing strategy: recommendation queries receive ranked recommendations with supporting evidence, and general queries receive a comprehensive summary. The hybrid retrieval mechanism combines FAISS semantic search, BM25 text matching, and content quality signals (e.g., user votes and likes). Evaluated on 150 labeled query-document samples, the system achieved 100% Recall@5 and 89.44% Precision@5 on Rednote data.
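The hybrid retrieval and the @5 metrics can be illustrated with a minimal sketch. It is not the RBUCS implementation: it assumes the three signals are fused by a weighted sum of min-max-normalised scores, the weights (0.5, 0.3, 0.2) and the function names are hypothetical, and the per-document scores are passed in directly rather than computed by FAISS or BM25. Precision@k and Recall@k follow their standard definitions, which may differ from the exact ones used in the evaluation.

```python
from typing import Dict, List, Set, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant signal maps to 0.0 for every document."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    semantic: Dict[str, float],   # e.g. vector-search similarity per document
    bm25: Dict[str, float],       # e.g. BM25 score per document
    quality: Dict[str, float],    # e.g. like or vote count per document
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical weights
    k: int = 5,
) -> List[str]:
    """Return the top-k document ids by weighted sum of the normalised signals.

    Assumes all three dicts score the same, non-empty candidate set.
    """
    sem, lex, qual = min_max(semantic), min_max(bm25), min_max(quality)
    w_sem, w_lex, w_qual = weights
    fused = {
        doc: w_sem * sem[doc] + w_lex * lex[doc] + w_qual * qual[doc]
        for doc in sem
    }
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))[:k]


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k results that are labeled relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)


# Toy scores for four candidate posts (made-up values for illustration only).
semantic = {"a": 0.9, "b": 0.7, "c": 0.2, "d": 0.1}
bm25 = {"a": 2.0, "b": 8.0, "c": 1.0, "d": 0.0}
likes = {"a": 10, "b": 50, "c": 500, "d": 0}

top = hybrid_rank(semantic, bm25, likes, k=3)
print(top)
print(precision_at_k(top, {"a", "b"}, k=3), recall_at_k(top, {"a", "b"}, k=3))
```

In this toy example post "b" ranks first because it scores well on both semantic and BM25 signals, while the heavily liked post "c" still enters the top three through the quality signal alone.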