Abstract
Retrieval Augmented Generation (RAG) systems connect language models to external
knowledge bases. The rapid advancements in natural language processing (NLP) and
machine learning, particularly transformer-based architectures, have led to the widespread
development of RAG systems. While the field is trending in a research-oriented direction,
the design of such systems often remains more of an art than a science. This work aims
to contribute to a more scientific approach for the design and evaluation of such systems.
Recent research has introduced evaluation frameworks that heavily depend on labeled
datasets to provide guidance for RAG system design. However, these frameworks often
fall short in scenarios where labeled datasets are unavailable. This study addresses these
limitations by proposing a lightweight RAG evaluation framework capable of handling
unlabeled datasets. A RAG pipeline is developed and configured using the proposed
evaluation system. The evaluation framework uses two complementary scoring
metrics: the lightweight ROUGE metric and the more elaborate LLM-judge metric. This
study introduces a novel, ready-to-use RAG evaluation framework and offers general
guidelines for improved RAG system design. Additionally, the RAG Triad approach is
proposed as a method for effectively handling datasets without ground-truth labels. The
findings of this work contribute to the rapidly evolving ecosystem of natural language AI
by offering a robust framework for the evaluation, recommendation and innovation of
RAG systems.
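The lightweight ROUGE metric mentioned in the abstract can be illustrated with a minimal ROUGE-1 F1 computation. This is an illustrative sketch only, assuming simple whitespace tokenization; the thesis's actual implementation and tokenization choices are not specified here.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: unigram-overlap F-measure between a reference
    answer and a generated answer (whitespace tokenization)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Each shared unigram counts at most min(ref, cand) times.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))  # 5 of 6 unigrams overlap in both directions -> 0.833
```

In a labeled-data setting such a score compares generated answers against ground-truth answers; the RAG Triad approach described above is what the thesis proposes for the case where no such references exist.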
Original language | English |
---|---|
Qualification | Master of Science |
Degree-granting institution | |
Supervisor / Advisor | |
Date of award | 3 March 2025 |
Publication status | Published - 2025 |
Research Field
- Multimodal Analytics