Abstract
This thesis describes the development of a code completion model for Kotlin Jetpack Compose. As developers increasingly rely on AI tools to assist with coding, domain-specific code completion has become a relevant research topic.
Since no dedicated datasets for this domain were available, the work followed an iterative data collection process: starting from a small prototype and leading to a deduplicated and cleaned dataset of over 50,000 Compose blocks. This dataset was then used to fine-tune the StarCoder base model.
The methodology covers related work, base model selection, data retrieval and cleaning, training, and finally code generation. The focus is on the challenges encountered, such as API limitations, dataset quality, and restricted computational resources, and on the solutions applied to address them.
Although the final results remain limited, the process provides valuable insights into the difficulties of domain-specific fine-tuning.
| Original language | English |
|---|---|
| Qualification | Master of Science |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 11 Sept. 2025 |
| Publication status | Published - 2025 |
Research Field
- Multimodal Analytics