WENYUFAN
RAG-Based User Content Summariser

PM · Tech · Finalist, USYD Coding Fest 2025 · 100% Recall@5

The Brief

A retrieval-augmented generation (RAG) system that aggregates and distils user-generated content (UGC) into coherent summaries, achieving 100% Recall@5 and 89.44% Precision@5 on Rednote data.

Category: Computer Science / Tech
Role: Team Lead
Year: 2025
Tech Stack: Python, LangChain, FAISS, BM25, Django
Achievement: Finalist, USYD Coding Fest 2025

Problem

UGC platforms are key sources for product recommendations and technical problem-solving, yet users encounter scattered, disorganized, and sometimes contradictory information.

Significance

Information overload on platforms like Reddit, Stack Overflow, and Rednote makes it difficult for users to extract reliable, concise answers without manually reviewing hundreds of posts.

My Contribution

Led the development of RBUCS, a system that pairs RAG with an LLM and uses text classification to select a processing strategy for each query: recommendation queries receive ranked recommendations with supporting evidence, and general queries receive comprehensive summaries. Our hybrid retrieval mechanism combines FAISS semantic search, BM25 text matching, and content-quality signals (e.g., user votes and likes). In an evaluation on 150 labeled query-document samples, the system achieved 100% Recall@5 and 89.44% Precision@5 on Rednote data.
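
To make the hybrid retrieval idea concrete, here is a minimal sketch of one way the three signals could be fused and then scored with Precision@k and Recall@k. It is not the RBUCS implementation: the function names, the min-max normalisation, and the weights (0.5, 0.3, 0.2) are hypothetical choices made for illustration, and the FAISS similarity and BM25 scores are assumed to be precomputed and passed in as plain dictionaries keyed by document ID.

```python
from typing import Dict, List, Set, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant or empty input maps to zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    semantic: Dict[str, float],   # assumed: similarity from a vector index
    lexical: Dict[str, float],    # assumed: BM25 score per document
    likes: Dict[str, float],      # assumed: raw like/vote counts
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical
    k: int = 5,
) -> List[str]:
    """Return the top-k document IDs by weighted sum of normalised signals."""
    sem, lex, qual = min_max(semantic), min_max(lexical), min_max(likes)
    w_sem, w_lex, w_qual = weights
    # Only documents surfaced by at least one retriever are candidates;
    # the quality signal re-weights them but does not add new ones.
    candidates = set(sem) | set(lex)
    fused = {
        doc: w_sem * sem.get(doc, 0.0)
        + w_lex * lex.get(doc, 0.0)
        + w_qual * qual.get(doc, 0.0)
        for doc in candidates
    }
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(candidates, key=lambda d: (-fused[d], d))[:k]


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k slots filled by relevant documents."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)
```

In practice these per-query values would be averaged over the labeled query set. How to weight the quality signal is a design choice: raw like counts are heavy-tailed, so a log transform before normalisation may be worth considering.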