A Lightweight CNN-Transformer Network with Laplacian Loss for Low-Altitude UAV Imagery Semantic Segmentation
Abstract
Semantic segmentation is crucial for enabling autonomous flight and landing of low-altitude unmanned aerial vehicles (UAVs) and is indispensable for various intelligent applications. However, real-time semantic segmentation is computationally intensive because it involves pixel-wise classification, which renders conventional semantic segmentation networks impractical for deployment on embedded systems with limited hardware resources. Moreover, variations in flight height and object appearance increase the likelihood of misjudgment in segmentation results. To address these challenges, we propose an efficient approach consisting of a convolutional neural network (CNN)–Transformer network and an auxiliary loss. The encoder of the network integrates a newly designed module that handles objects at varying scales uniformly. The decoder is composed of the innovative query–value squeeze axial transformer attention (QVSATA), which reduces computational complexity from quadratic in image size to O(2C(H² + W²)), i.e., linear in image size. By incorporating Laplacian operator convolution, the novel network-agnostic loss effectively captures intricate patterns, boundaries, and small objects. This enables extra penalization of misjudgments in these areas and compels the network to focus on objects that are challenging to distinguish. Our approach attains strong accuracy while processing 4K-resolution images in real time (15 FPS) on a mobile GPU, running at more than twice the speed of representative lightweight networks, which underscores its suitability for onboard deployment.
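To make the boundary-penalization idea concrete, the following is a minimal NumPy sketch of a Laplacian-weighted auxiliary loss. It is an illustration only: the abstract does not specify the exact kernel, weighting scheme, or loss formulation used in the paper, so the 3×3 Laplacian kernel, the `edge_weight` factor, and both function names are assumptions. The core idea it demonstrates is shared with the description above: convolving the label map with a Laplacian operator highlights class boundaries, and misclassifications at those pixels are then penalized more heavily.

```python
import numpy as np

# 3x3 discrete Laplacian kernel (a common choice; the paper's exact
# kernel is not given in the abstract).
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float64)

def laplacian_edge_map(label):
    """Convolve an integer label map (H, W) with the Laplacian kernel
    and return a binary mask that is 1 on class boundaries."""
    h, w = label.shape
    padded = np.pad(label.astype(np.float64), 1, mode="edge")
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * LAPLACIAN)
    # Any nonzero response means the 3x3 neighborhood crosses a class edge.
    return (np.abs(out) > 0).astype(np.float64)

def boundary_weighted_nll(probs, label, edge_weight=2.0):
    """Per-pixel negative log-likelihood over probs (C, H, W), with
    boundary pixels penalized `edge_weight` times harder.
    `edge_weight` is a hypothetical hyperparameter for illustration."""
    h, w = label.shape
    mask = laplacian_edge_map(label)
    weights = 1.0 + (edge_weight - 1.0) * mask
    # Probability assigned to the true class at each pixel.
    p_true = probs[label, np.arange(h)[:, None], np.arange(w)[None, :]]
    return float(np.mean(weights * -np.log(p_true + 1e-12)))
```

Because the edge map is derived from the labels alone, this auxiliary term is network-agnostic in the sense stated above: it can be added to the training objective of any segmentation network without architectural changes.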