Evaluation and Comparison of Open-Source LLMs Using Natural Language Generation Quality Metrics

Dzenan Hamzic (Author and Presenter), Markus Wurzenberger, Florian Skopik, Max Landauer, Andreas Rauber

Publication: Contribution to a book or conference proceedings; conference paper with poster presentation; peer-reviewed

Abstract

The rapid advancement of Large Language Models (LLMs) has transformed natural language processing, yet comprehensive evaluation methods are needed to ensure their reliability, particularly in Retrieval-Augmented Generation (RAG) tasks. This study evaluates and compares the performance of open-source LLMs by introducing a rigorous evaluation framework. We benchmark 20 LLMs using established metrics such as BLEU, ROUGE, and BERTScore, along with a novel metric, RAGAS. The models were tested on two distinct datasets to assess the quality of their generated text. Our findings show that models such as nous-hermes-2-solar-10.7b and mistral-7b-instruct-v0.1 consistently excel in tasks that require strict adherence to instructions and effective use of large contexts, while other models show room for improvement. This research contributes a comprehensive evaluation framework that helps practitioners select the most suitable LLMs for complex RAG applications, with implications for future developments in natural language processing and big data analysis.
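The abstract lists lexical-overlap metrics such as ROUGE among the scores used to compare models. As an illustration only, the following is a minimal Python sketch of ROUGE-1 F1 (clipped unigram overlap between a generated answer and a reference) and of averaging it per model to produce a ranking. It is not the authors' evaluation code; the function names `rouge1_f1` and `rank_models` and the model names in the example are hypothetical, and the whitespace tokenization with lowercasing is a simplifying assumption. Published ROUGE implementations add stemming and stricter tokenization, so their scores will differ.

```python
from __future__ import annotations

from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Hypothetical helper: ROUGE-1 F1 with naive lowercase whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def rank_models(outputs: dict[str, list[str]], references: list[str]) -> list[tuple[str, float]]:
    """Hypothetical helper: average ROUGE-1 F1 per model, best model first."""
    scores: dict[str, float] = {}
    for model, preds in outputs.items():
        if len(preds) != len(references):
            raise ValueError(f"{model}: expected {len(references)} outputs, got {len(preds)}")
        per_item = [rouge1_f1(p, r) for p, r in zip(preds, references)]
        scores[model] = sum(per_item) / len(per_item) if per_item else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    outs = {"model-a": ["the cat sat on the mat"], "model-b": ["a dog ran"]}
    print(rank_models(outs, refs))  # [('model-a', 1.0), ('model-b', 0.0)]
```

Averaging per-item scores gives each test question equal weight; a real benchmark would typically report several metrics side by side, since lexical overlap alone does not capture semantic correctness or faithfulness to retrieved context.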
Original language: English
Title: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
Pages: 5342-5351
ISBN (electronic): 979-8-3503-6248-0
Publication status: Published - 16 Jan 2025
Event: 2024 IEEE International Conference on Big Data (BigData) - Washington, United States
Duration: 15 Dec 2024 to 18 Dec 2024

Research Field

  • Cyber Security
