Enhance your RAG with RAGA: The Key to Precision and Relevance

Vivek Singh · Aug 02, 2024 · 2 min read

In the fast-paced world of AI, integrating applications with large language models (LLMs) and generative AI is the new frontier. Retrieval-Augmented Generation (RAG) emerged to combat hallucinations, but it introduced a new challenge: the “Lost in the Middle” issue, where the model overlooks relevant information buried in the middle of the retrieved context, a failure that goes unnoticed without benchmarks. Even tech giants like Google and AWS have faced these hurdles with their custom RAG clients.

Enter RAGA benchmarks, a game-changer for evaluating and refining your AI application.

What is RAGA?

RAGA, or Retrieval-Augmented Generation Assessment, is your toolkit for ensuring your AI's output meets expectations. It assesses whether the data retrieved and the answers generated by your LLM align with the question. Key metrics include:

  • Faithfulness: Are the generated claims supported by the context?
  • Answer Relevancy: Is the generated answer relevant to the question?
  • Context Relevancy: Does the context pertain to the question?
  • Context Precision: Are the most relevant chunks ranked at the top of the retrieved results?
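
These metrics mirror the ones popularized by the open-source Ragas project. As a reference point, here is a minimal sketch of scoring one sample with that library; metric names and required dataset columns vary across ragas versions, and the library calls an LLM judge under the hood, so an API key is assumed to be configured:

```python
from datasets import Dataset  # Hugging Face datasets
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One toy sample: the question asked, the generated answer, and the
# retrieved context chunks. Real evaluations use a full test set.
sample = Dataset.from_dict({
    "question": ["What does RAGA measure?"],
    "answer": ["RAGA scores how well retrieval and generation fit the question."],
    "contexts": [["RAGA evaluates retrieval-augmented generation pipelines."]],
})

# Retrieval metrics (context precision/relevancy) typically need extra
# ground-truth columns, so only the generator metrics are scored here.
result = evaluate(sample, metrics=[faithfulness, answer_relevancy])
print(result)
```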

Unlocking the RAGA Advantage

By implementing these as custom metrics, you can score and rank feature rollouts and enforce strict thresholds before anything goes live (a gating sketch follows the table below). Here's the current state of the application:

Component | Context Precision | Context Relevancy
Retriever | 0.57 | 0.647

Component | Faithfulness | Answer Relevancy
Generator | 0.81 | 0.82
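
Enforcing those thresholds can be as simple as a gate in the deployment pipeline. Here's a minimal sketch using the scores above; the threshold values are illustrative, not an established policy:

```python
# Hypothetical go-live gate using the scores from the table above.
scores = {"context_precision": 0.57, "context_relevancy": 0.647,
          "faithfulness": 0.81, "answer_relevancy": 0.82}
thresholds = {"context_precision": 0.60, "context_relevancy": 0.60,
              "faithfulness": 0.80, "answer_relevancy": 0.80}

# Collect every metric that falls short of its threshold.
failures = {m: s for m, s in scores.items() if s < thresholds[m]}
if failures:
    raise SystemExit(f"Rollout blocked; metrics below threshold: {failures}")
print("All RAGA thresholds met; rollout can proceed.")
```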

For the generator, the formulas implemented were:

  • Faithfulness = (number of claims in the answer that can be derived from the context) / (total number of claims)
  • Answer Relevancy = mean(cosine_similarity(questions artificially generated from the answer, original question))
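
Here is a minimal sketch of these two formulas in Python. It assumes an LLM judge has already extracted the answer's claims and checked them against the context, and that an embedding model has already produced the question vectors; none of those steps are shown:

```python
import numpy as np

def faithfulness(supported_claims: int, total_claims: int) -> float:
    """Fraction of the answer's claims that can be derived from the context.
    Claim extraction and support checks are delegated to an LLM judge."""
    return supported_claims / total_claims

def answer_relevancy(question_vec: np.ndarray,
                     generated_question_vecs: list) -> float:
    """Mean cosine similarity between the original question and questions
    an LLM generated back from the answer; vectors come from any
    sentence-embedding model."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean([cosine(question_vec, v)
                          for v in generated_question_vecs]))
```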

For the retriever, the formulas used were:

  • Context Precision = (number of relevant chunks ranked within the top k) / (total number of retrieved chunks)
  • Context Relevancy = |S| / (total number of sentences in the retrieved context)

Here, k is the rank cutoff used when scoring the retrieved chunks, and S is the set of context sentences relevant to the question.
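
A matching sketch for the retriever formulas, again assuming the per-chunk and per-sentence relevance judgments come from an LLM judge:

```python
def context_precision(is_relevant: list, k: int) -> float:
    """(# relevant chunks within the top k) / (total retrieved chunks).
    is_relevant[i] is True when the chunk at rank i+1 is judged relevant."""
    return sum(is_relevant[:k]) / len(is_relevant)

def context_relevancy(relevant_sentences: int, total_sentences: int) -> float:
    """|S| / total sentences, where S is the set of context sentences
    judged relevant to the question."""
    return relevant_sentences / total_sentences
```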

Adjusting parameters like chunk size and overlap led to new experiments and the following results:

Component | Context Precision | Context Relevancy
Retriever | 0.53 | 0.6

Component | Faithfulness | Answer Relevancy
Generator | 0.85 | 0.86

The custom approach to RAGA has highlighted areas for improvement and identified what can remain unchanged:

  • Retriever Performance: With context precision at 0.53 and context relevancy at 0.6, there is significant room for improvement. Enhancing the relevance and accuracy of the retrieved chunks and further tuning chunk size and overlap (as sketched after this list) are essential steps.
  • Generator Performance: The generator shows promising results with a faithfulness score of 0.85 and answer relevancy of 0.86, indicating that the LLM generates responses consistent with the provided context and relevant to the questions asked.
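
Here is a sketch of such a chunk-size/overlap sweep, using a naive sliding-window chunker; the index rebuild and scoring steps are placeholders for your own pipeline:

```python
def make_chunks(text: str, size: int, overlap: int) -> list:
    """Naive sliding-window chunker: fixed-size chunks with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

corpus_text = "..."  # placeholder: your document corpus, concatenated

for size, overlap in [(256, 0), (512, 64), (1024, 128)]:
    chunks = make_chunks(corpus_text, size, overlap)
    # Placeholder steps: rebuild the vector index from `chunks`, rerun the
    # evaluation set, and record context precision/relevancy per setting.
    print(f"chunk_size={size} overlap={overlap} -> {len(chunks)} chunks")
```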

RAGA in Action: Elevating AI Excellence

RAGA (Retrieval-Augmented Generation Assessment) metrics are game-changers for improving the quality and reliability of RAG-based AI applications. These metrics provide crucial insights into the performance of both retriever and generator components, offering a quantitative basis for continuous improvement.

Adopting RAGA addresses the "Lost in the Middle" problem and establishes a robust framework for evaluating and enhancing AI applications. This approach ensures more accurate, relevant, and reliable AI-generated responses, leading to a better user experience and increased trust in the system.

Charting the Future with RAGA

Looking ahead, iterating on these metrics and incorporating additional RAGA parameters will continuously refine the RAG pipeline. This commitment to quantifiable quality assurance is essential for navigating the rapidly evolving landscape of AI and LLM applications.

Ready to elevate your RAG game? Dive into the world of RAGA and transform your AI application into a powerhouse of precision and relevancy!
