An Empirical Study on How Large Language Models Impact Software Testing Learning

Simone Mezzaro (Speaker), Alessio Gambi, Gordon Fraser

Research output: Chapter in Book or Conference Proceedings › Conference Proceedings with Oral Presentation › peer-review

Abstract

Software testing is a challenging topic in software engineering education and requires creative approaches to engage learners. For example, the Code Defenders game has students compete over a Java class under test by writing effective tests and mutants. While such gamified approaches deal with problems of motivation and engagement, students may nevertheless require help to put testing concepts into practice. The recent widespread diffusion of Generative AI and Large Language Models raises the question of whether and how these disruptive technologies could address this problem, for example, by providing explanations of unclear topics and guidance for writing tests. However, such technologies might also be misused or produce inaccurate answers, which would negatively impact learning. To shed more light on this situation, we conducted the first empirical study investigating how students learn and practice new software testing concepts in the context of the Code Defenders testing game, supported by a smart assistant based on a widely known, commercial Large Language Model. Our study shows that students had unrealistic expectations about the smart assistant, “blindly” trusting any output it generated, and often trying to use it to obtain solutions for testing exercises directly. Consequently, students who resorted to the smart assistant more often were less effective and efficient than those who did not. For instance, they wrote 8.6% fewer tests, and their tests were not useful in 78.0% of the cases. We conclude that giving unrestricted and unguided access to Large Language Models might generally impair learning. Thus, we believe our study helps to raise awareness about the implications of using Generative AI and Large Language Models in Computer Science Education and provides guidance towards developing better and smarter learning tools.
Original language: English
Title of host publication: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, EASE 2024, Salerno, Italy, June 18-21, 2024
Publisher: Association for Computing Machinery (ACM)
Pages: 555-564
Number of pages: 10
ISBN (Print): 979-8-4007-1701-7
DOIs
Publication status: Published - 2024
Event: EASE 2024: 28th International Conference on Evaluation and Assessment in Software Engineering - Salerno, Italy
Duration: 18 Jun 2024 → 21 Jun 2024

Conference

Conference: EASE 2024: 28th International Conference on Evaluation and Assessment in Software Engineering
Country/Territory: Italy
City: Salerno
Period: 18/06/24 → 21/06/24

Research Field

  • Enabling Digital Technologies

Keywords

  • Generative AI
  • ChatGPT
  • Computer Science Education
  • Smart Learning Assistant
