Abstract
The need for high-speed image processing has increased, especially in the field of computer vision. Many optimized algorithms have been developed to accelerate imaging tasks. However, these algorithms are limited by the hardware on which they are implemented.
Since image processing at the highest speeds makes cloud computing infeasible, local computation is needed. Fortunately, advancements in machine learning have led to the development of new devices that are specialized in performing machine learning tasks as efficiently as possible. These computing devices mainly perform tensor computations for deep learning models.
While manufacturers designed this hardware for machine learning applications, it can also be utilized for many tasks involving matrix and tensor calculations in standard image processing algorithms. Although these devices have been extensively tested for machine learning [3] [4] [5], very little research has been conducted on their use for other purposes. In this work, we explore the potential for integrating such devices into an image processing pipeline.
Two devices were selected for closer examination: the Google Coral Tensor Processing Unit (TPU) [1] and the NVIDIA Jetson Orin Nano included in the Imago Vision Cam XM2 [2]. We investigate the capabilities of both devices for local image processing tasks.
Our focus lies on the Jetson Orin’s tensor cores, which are compared to standard CUDA cores. Both devices are tested using implementations of edge detection and bilinear interpolation algorithms, and the results are compared with a CPU implementation. An approach to implementing custom functions on the Coral TPU is demonstrated. Among other things, the devices should deliver substantial energy savings while maintaining good performance, such as computational speed, without significant degradation in bit-level accuracy.
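To illustrate why a standard image processing algorithm can map onto hardware built for tensor computations, the following minimal sketch writes bilinear interpolation as two matrix products. It is not the implementation evaluated in this work. The function names `interp_matrix` and `resize_bilinear`, the use of NumPy, and the align-corners sampling convention are all assumptions made for illustration.

```python
import numpy as np


def interp_matrix(n_in: int, n_out: int) -> np.ndarray:
    """Build an (n_out, n_in) matrix of 1-D linear interpolation weights.

    Hypothetical helper: assumes the align-corners convention, where the
    first and last output samples coincide with the first and last inputs.
    """
    pos = np.linspace(0.0, n_in - 1, n_out)   # sample positions in input coordinates
    lo = np.floor(pos).astype(int)            # left neighbour index
    hi = np.minimum(lo + 1, n_in - 1)         # right neighbour index, clamped at the border
    frac = pos - lo                           # distance from the left neighbour
    weights = np.zeros((n_out, n_in))
    rows = np.arange(n_out)
    weights[rows, lo] += 1.0 - frac
    weights[rows, hi] += frac
    return weights


def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2-D image using two matrix products (rows, then columns)."""
    wy = interp_matrix(img.shape[0], out_h)
    wx = interp_matrix(img.shape[1], out_w)
    return wy @ img @ wx.T


img = np.array([[0.0, 2.0],
                [4.0, 6.0]])
up = resize_bilinear(img, 3, 3)
```

Because bilinear interpolation is separable, the whole resize reduces to dense matrix multiplications, which is the kind of operation that tensor cores and TPUs are designed to accelerate. Whether this formulation is efficient on a given device depends on its supported data types and matrix sizes.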
| Original language | English |
|---|---|
| Publication status | Published - 16 Oct 2025 |
| Event | EMVA Forum 2025 - Fraunhofer Institute for Integrated Circuits, Fürth, Germany. Duration: 16 Oct 2025 → 17 Oct 2025 https://emvf-2025.emva.org/page-2941 |
Conference
| Conference | EMVA Forum 2025 |
|---|---|
| Country/Territory | Germany |
| City | Fürth |
| Period | 16/10/25 → 17/10/25 |
| Internet address |
Research Field
- High-Performance Vision Systems