01
Context
W&W, a branding agency managing assets for 50+ clients, struggled with knowledge fragmented across Google Drive. Teams wasted hours hunting for brand guidelines, past work, and client docs. There was also a real risk of cross-client data leakage, and no audit trail of who accessed what.
02
What I Built
A production-grade RAG platform that ingests 2,000+ documents from Google Drive, chunks them with structure awareness, and enables semantic search, grounded Q&A with citations, and on-brand draft generation. Built strict multi-tenant isolation ensuring users only access their assigned clients' data, with comprehensive audit logging of every query and retrieved chunk.
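As an illustration of structure-aware chunking, here is a minimal sketch rather than the production code. It assumes the Drive documents have already been converted to Markdown-style text with `#` headings; the `Chunk` class and `chunk_by_headings` function are hypothetical names. Each chunk keeps its full heading path so a citation can name the section it came from.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading_path: list[str]  # e.g. ["Brand Guidelines", "Colors"]
    text: str


# Assumption: headings look like Markdown ("# Title", "## Section").
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown: str) -> list[Chunk]:
    """Split text into one chunk per heading section, keeping the heading trail."""
    chunks: list[Chunk] = []
    path: list[str] = []    # current heading trail
    buffer: list[str] = []  # body lines collected under the current heading

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:
            chunks.append(Chunk(heading_path=list(path), text=body))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Keep only the ancestors above this level, then add this heading.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            buffer.append(line)
    flush()
    return chunks


doc = "# Brand Guidelines\nIntro text.\n## Colors\nUse navy.\n## Fonts\nUse Inter.\n"
for chunk in chunk_by_headings(doc):
    print(" > ".join(chunk.heading_path), "|", chunk.text)
```

A real pipeline would likely also cap chunk size and split long sections, which this sketch leaves out.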
03
Key Decisions
1. Chose pgvector over Pinecone for simpler ops and no vendor lock-in
2. Implemented RRF (Reciprocal Rank Fusion) for hybrid search without score normalization
3. Used two-stage retrieval: fast recall with hybrid search, then Cohere reranking for precision
4. Built structure-aware chunking preserving heading context for better citations
5. Denormalized client_id on chunks table for query-level isolation performance
6. Designed explicit refusal behavior with pattern matching for out-of-scope questions
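To show why RRF avoids score normalization, here is a minimal sketch of the fusion step. It uses only each chunk's rank in the vector and keyword result lists, never the raw scores. The function name, the chunk IDs, and the constant `k = 60` (a commonly used default) are assumptions; the value used in this project is not stated.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["c1", "c2", "c3"]   # hypothetical vector-search order
keyword_hits = ["c2", "c4", "c1"]  # hypothetical keyword-search order
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print([chunk_id for chunk_id, _ in fused])
```

Chunks that appear near the top of both lists rise to the top of the fused list, and the top candidates can then be passed to a reranker as the second stage.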
04
Challenges
→ Ensuring zero data leakage between clients while maintaining query performance at scale
→ Combining vector and keyword search scores without normalization (solved with RRF)
→ Maintaining citation accuracy when the LLM might hallucinate source references
→ Building incremental sync to avoid re-embedding unchanged documents
→ Balancing reranking latency (~500ms) against improved relevance
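One common way to approach the incremental-sync challenge is to store a content hash per document and re-embed only when the hash changes. The sketch below illustrates that idea under stated assumptions; it is not necessarily how this system detects changes, and `content_hash` and `plan_sync` are hypothetical names.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(
    stored_hashes: dict[str, str], current_docs: dict[str, str]
) -> dict[str, list[str]]:
    """Compare stored hashes against current documents and decide what to do.

    stored_hashes: doc_id -> hash recorded at the last sync
    current_docs:  doc_id -> text as fetched now
    """
    to_embed, to_skip = [], []
    for doc_id, text in current_docs.items():
        if stored_hashes.get(doc_id) == content_hash(text):
            to_skip.append(doc_id)   # unchanged: keep existing embeddings
        else:
            to_embed.append(doc_id)  # new or modified: (re-)embed
    # Documents that disappeared from the source should have their chunks removed.
    to_delete = [doc_id for doc_id in stored_hashes if doc_id not in current_docs]
    return {
        "embed": sorted(to_embed),
        "skip": sorted(to_skip),
        "delete": sorted(to_delete),
    }


stored = {"a": content_hash("old"), "b": content_hash("same"), "c": content_hash("gone")}
current = {"a": "new", "b": "same", "d": "fresh"}
print(plan_sync(stored, current))
```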
05
Outcomes
✓ 363 tests covering auth, retrieval, RAG, client isolation, and audit
✓ Hybrid search achieving 95%+ recall with sub-second latency
✓ Strict client isolation at database query level with full audit trails
✓ Production-ready system serving 50+ clients with zero cross-client data access
✓ Reusable internal knowledge platform replacing ad-hoc prompts
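Query-level isolation with audit logging can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database purely so it runs anywhere; the production stack is PostgreSQL with SQLAlchemy. The table names, columns, and `search_chunks` function are assumptions for illustration. The key point is that the `client_id` filter lives inside the query itself, so unassigned clients' chunks are never returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_clients (user_id TEXT, client_id TEXT);
    CREATE TABLE chunks (id INTEGER PRIMARY KEY, client_id TEXT, text TEXT);
    CREATE TABLE audit_log (user_id TEXT, query TEXT, chunk_id INTEGER);
    INSERT INTO user_clients VALUES ('alice', 'acme');
    INSERT INTO chunks (client_id, text) VALUES
        ('acme', 'Acme logo usage rules'),
        ('globex', 'Globex logo usage rules');
""")


def search_chunks(conn: sqlite3.Connection, user_id: str, term: str) -> list[tuple]:
    """Return matching chunks for the user's assigned clients and log each hit."""
    rows = conn.execute(
        """
        SELECT c.id, c.client_id, c.text
        FROM chunks AS c
        WHERE c.client_id IN (
            SELECT client_id FROM user_clients WHERE user_id = ?
        )
          AND c.text LIKE ?
        ORDER BY c.id
        """,
        (user_id, f"%{term}%"),
    ).fetchall()
    # Record every retrieved chunk so access can be audited later.
    conn.executemany(
        "INSERT INTO audit_log (user_id, query, chunk_id) VALUES (?, ?, ?)",
        [(user_id, term, row[0]) for row in rows],
    )
    return rows


print(search_chunks(conn, "alice", "logo"))  # only Acme chunks are visible to alice
print(search_chunks(conn, "bob", "logo"))    # bob has no assigned clients
```

Because `client_id` sits directly on the chunks table, the filter needs no join back to a documents table, which is the performance motivation for the denormalization.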
06
Tech Stack
Python, FastAPI, PostgreSQL, pgvector, OpenAI, Cohere, SQLAlchemy, Next.js, TypeScript, Docker