QUESTION 01: What are the main differences between naive RAG and advanced RAG?
DEFINITION:
Naive RAG is the basic implementation: chunk documents, embed them, retrieve top-k chunks based on query similarity, and stuff them into a prompt for generation. Advanced RAG incorporates multiple optimizations across the pipeline - pre-retrieval (query rewriting, expansion), retrieval (hybrid search, reranking), and post-retrieval (context compression, reordering) - to improve accuracy, relevance, and reliability.
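The naive pipeline described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the sample documents are invented, and the function names (`chunk`, `retrieve`, `build_prompt`) are hypothetical. The generation step is left out; the sketch stops at the prompt that would be sent to a model.

```python
import math
import re
from collections import Counter

def chunk(text, size=12):
    """Split text into fixed-size word windows (no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, contexts):
    """Concatenate the retrieved chunks into a single prompt."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )

# Invented sample corpus for illustration only.
docs = [
    "Hypertension is treated with lifestyle changes and medication.",
    "The capital of France is Paris.",
    "Regular exercise lowers blood pressure over time.",
]
chunks = [c for d in docs for c in chunk(d)]
question = "How is hypertension treated?"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
```

Note that the second retrieved chunk here is the unrelated sentence about Paris, matched only on the word "is". That kind of weak match is one of the failure modes the advanced techniques target.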
HOW IT WORKS:
Naive RAG pipeline: simple chunking → single-vector retrieval → top-k concatenation → generation. Advanced RAG adds:
Pre-retrieval: query rewriting (expanding acronyms, correcting spelling), query expansion (generating multiple query variants), HyDE (hypothetical document generation).
Retrieval: hybrid search (combining vector + keyword), metadata filtering, multi-stage retrieval (retrieve more, then rerank with a cross-encoder).
Post-retrieval: context compression (extracting relevant sentences), reordering chunks (most relevant at the edges to combat lost-in-the-middle), iterative retrieval (multi-hop for complex questions).
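Two of the stages above are easy to sketch without any external library. The example below assumes reciprocal rank fusion (RRF) as the way to combine vector and keyword results, which is one common choice and not the only way to do hybrid search, and a simple alternating placement for the edge reordering. The document ids, the function names, and the constant `k=60` (a conventional default for RRF) are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

def reorder_for_edges(ranked):
    """Put the strongest items at the start and end, the weakest in the middle."""
    front, back = [], []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]

# Hypothetical results from a vector index and a keyword index.
vector_hits = ["d3", "d1", "d2"]
keyword_hits = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
ordered = reorder_for_edges(fused)

print(fused)    # ['d1', 'd3', 'd4', 'd2']
print(ordered)  # ['d1', 'd4', 'd2', 'd3']
```

After reordering, the best document (`d1`) sits first and the second best (`d3`) sits last, so the weakest candidates fall in the middle of the context. In a fuller pipeline a cross-encoder reranker would typically run between the fusion and reordering steps.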
WHY IT MATTERS:
Naive RAG works for simple queries but often fails on complex ones, in specialized domains, or over large knowledge bases. Advanced RAG techniques can improve accuracy by 10-30% by addressing specific failure modes. Query rewriting helps when user queries are poorly phrased. Reranking helps ensure that the most relevant chunks are used. Context compression fits more information into a limited context window. The right techniques depend on your data and query types: not all of them are needed for every application, but understanding them enables systematic improvement.
EXAMPLE:
Consider a medical RAG system, naive versus advanced. Naive: the user query 'HTN treatment' retrieves chunks containing 'HTN' but misses those that say 'hypertension', so the answer is incomplete. Advanced, with query expansion: 'HTN' is expanded to 'hypertension' and 'high blood pressure', so more relevant results are retrieved. Reranking with a cross-encoder prioritizes treatment guidelines over general information. Context compression extracts only the treatment sentences, fitting more guidelines into the context. Result: comprehensive treatment recommendations instead of partial or missed information. In this scenario, a 20% accuracy improvement justifies the additional complexity.
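The 'HTN' case can be reproduced with a toy lexical matcher. This sketch assumes a hand-written abbreviation table (`SYNONYMS`) and a substring-based `keyword_search`; both are hypothetical simplifications, and a real system might draw expansions from a medical ontology or a language model instead. The sample documents are invented.

```python
import re

# Hypothetical abbreviation table, for illustration only.
SYNONYMS = {"htn": ["hypertension", "high blood pressure"]}

def expand_query(query):
    """Return the original query plus one variant per known expansion."""
    variants = [query]
    for abbr, expansions in SYNONYMS.items():
        pattern = re.compile(rf"\b{re.escape(abbr)}\b", re.IGNORECASE)
        if pattern.search(query):
            variants.extend(pattern.sub(e, query) for e in expansions)
    return variants

def keyword_search(query, docs):
    """Toy lexical match: indices of docs containing every query term."""
    terms = query.lower().split()
    return {i for i, d in enumerate(docs) if all(t in d.lower() for t in terms)}

docs = [
    "HTN treatment overview for clinicians.",
    "Hypertension treatment guidelines for adults.",
    "High blood pressure treatment and follow-up.",
    "Diabetes management overview.",
]

naive_hits = keyword_search("HTN treatment", docs)
expanded_hits = set().union(
    *(keyword_search(q, docs) for q in expand_query("HTN treatment"))
)

print(sorted(naive_hits))     # [0]
print(sorted(expanded_hits))  # [0, 1, 2]
```

Without expansion only the document that literally contains 'HTN' is found; with expansion the 'hypertension' and 'high blood pressure' documents are retrieved as well, while the unrelated diabetes document is still excluded.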