Abstract
Semantic segmentation of machinery or vehicles composed of many distinct parts is a non-trivial task that requires large amounts of annotated training data. Creating a sufficiently large dataset for a given machine is extremely time-consuming. This master's thesis proposes a novel approach to semantic image part segmentation that combines three state-of-the-art foundation models, CLIPSeg, SuperPoint, and the Segment Anything Model, with a custom classifier based on graph neural networks. The graph neural network captures the relationships and hierarchies among the machinery parts in an image, while the pre-trained foundation models contribute their respective capabilities to graph generation or to the final segmentation. To evaluate the proposed method, a synthetic dataset showing a truck-mounted loading crane was created. To determine the maximum number of classes the model can segment reliably, the segmentation masks were divided into five granularity levels with class counts ranging from 2 to 22. The model was trained with 1, 3, 5, 10, or 25 annotated images and evaluated on 250 test images. The results suggest that the model can learn part segmentation from only a few annotated images: with 25 annotated images it achieves a median Dice score of 0.58 for 8 classes, and with only five annotated images it reaches scores above 0.8 at lower granularity levels. The model also generalizes to unseen data: applied to a real dataset, it achieved positive results. Training takes at most 10 minutes, and inference takes 0.5 seconds per image. The method is thus an efficient and effective way to learn the segmentation of machinery parts from only a few annotated images.
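For readers unfamiliar with the reported metric, the following is a minimal sketch of how a median Dice score over a test set could be computed. The abstract does not state how the thesis aggregates the score, so the aggregation used here (mean over the classes present in an image, then median over images) is an assumption, and the function names `dice_per_class`, `mean_dice`, and `median_dice` are hypothetical.

```python
from statistics import median


def dice_per_class(pred, target, num_classes):
    """Return {class_id: Dice} for two equally sized label maps (lists of rows).

    Dice for class c is 2 * |P_c ∩ T_c| / (|P_c| + |T_c|), where P_c and T_c
    are the pixel sets labeled c in the prediction and the ground truth.
    """
    scores = {}
    for c in range(num_classes):
        inter = pred_count = target_count = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                pred_count += p == c
                target_count += t == c
                inter += (p == c) and (t == c)
        if pred_count + target_count == 0:
            continue  # assumption: classes absent from both maps are skipped
        scores[c] = 2 * inter / (pred_count + target_count)
    return scores


def mean_dice(pred, target, num_classes):
    """Average the per-class Dice scores of one image (assumed aggregation)."""
    scores = dice_per_class(pred, target, num_classes)
    if not scores:
        raise ValueError("no class in range(num_classes) occurs in either map")
    return sum(scores.values()) / len(scores)


def median_dice(pairs, num_classes):
    """Median of the per-image mean Dice over (prediction, target) pairs."""
    return median(mean_dice(p, t, num_classes) for p, t in pairs)
```

A score of 1.0 means the predicted and annotated masks coincide exactly for every class considered, and 0.0 means they do not overlap at all.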
Original language | English |
---|---|
Qualification | Master of Science |
Awarding institution |
|
Supervisor / advisor |
|
Date of award | 25 July 2024 |
Publication status | Published - 2024 |
Research Field
- Assistive and Autonomous Systems