Optimized deep learning vision system for human action recognition from drone images

7 citations · 29 Mendeley readers

This article is free to access.

Abstract

There are several benefits to building a lightweight vision system that runs directly on resource-limited hardware. Most deep learning-based computer vision systems, such as YOLO (You Only Look Once), rely on computationally expensive backbone feature extractors such as ResNet and the Inception network. To address this network complexity, researchers created SqueezeNet, a compressed and compact alternative. However, SqueezeNet was trained as a broad classifier to recognize 1000 distinct objects. This work integrates a two-layer particle swarm optimizer (TLPSO) into YOLO to reduce the contribution of SqueezeNet convolutional filters that contribute little to human action recognition. In short, this work introduces a lightweight vision system with an optimized SqueezeNet backbone feature extraction network, and it does so without sacrificing accuracy, because the high-dimensional SqueezeNet convolutional filter selection is driven by the efficient TLPSO algorithm. The proposed vision system was applied to recognizing human actions in images from a drone-mounted camera. The study focused on two distinct motions, walking and running: a total of 300 images were captured at various locations, angles, and weather conditions, with 100 showing running and 200 showing walking. The TLPSO technique reduced SqueezeNet's convolutional filters by 52%, yielding a sevenfold increase in detection speed. With an F1 score of 94.65% and an inference time of 0.061 milliseconds, the proposed system outperformed earlier vision systems at recognizing humans in drone-based images. In addition, a performance assessment of TLPSO against related optimizers found that TLPSO had a better convergence curve and achieved a higher fitness value, surpassing PSO and RLMPSO by a wide margin in statistical comparisons.
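The filter-selection idea described above can be sketched as a standard single-layer binary PSO over a keep/drop mask. This is an illustrative simplification, not the paper's TLPSO (which adds a second swarm layer); the per-filter `importance` scores and the fitness function below are assumed stand-ins for the accuracy-vs-speed objective:

```python
import math
import random

def binary_pso_prune(importance, n_particles=20, n_iters=60, penalty=0.3, seed=0):
    """Sketch of PSO-based filter selection (hypothetical fitness, not the
    paper's TLPSO). Each particle is a binary mask over convolutional
    filters; fitness rewards keeping high-importance filters and penalises
    the number of filters kept, mimicking the accuracy/speed trade-off."""
    rng = random.Random(seed)
    n = len(importance)

    def fitness(mask):
        kept = sum(mask)
        if kept == 0:
            return float("-inf")  # an empty backbone is invalid
        gain = sum(s for m, s in zip(mask, importance) if m)
        return gain - penalty * kept  # accuracy proxy minus cost proxy

    # velocities drive per-filter keep-probabilities through a sigmoid
    vel = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n_particles)]
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy example: 16 filters, half clearly useful, half nearly redundant.
scores = [0.9] * 8 + [0.1] * 8
mask, fit = binary_pso_prune(scores)
```

Under this toy fitness, keeping a 0.9-importance filter nets +0.6 while keeping a 0.1-importance filter nets -0.2, so the swarm is pushed toward a pruned mask that retains mainly the useful filters, analogous to the 52% filter reduction reported in the abstract.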

References (Powered by Scopus)

- Deep residual learning for image recognition (173,976 citations)
- You only look once: Unified, real-time object detection (37,522 citations)
- Technical Note: Q-Learning (11,543 citations)

Cited by (Powered by Scopus)

- Weighted voting ensemble of hybrid CNN-LSTM models for vision-based human activity recognition (2 citations)
- Remote Sensing Surveillance using Multilevel Feature Fusion and Deep Neural Network (0 citations)
- Diving deep into human action recognition in aerial videos: A survey (0 citations)


Citation (APA)

Samma, H., & Sama, A. S. B. (2024). Optimized deep learning vision system for human action recognition from drone images. Multimedia Tools and Applications, 83(1), 1143–1164. https://doi.org/10.1007/s11042-023-15930-9

Readers' Seniority

- PhD / Postgrad / Masters / Doc: 4 (40%)
- Professor / Associate Prof.: 3 (30%)
- Researcher: 2 (20%)
- Lecturer / Post doc: 1 (10%)

Readers' Discipline

- Computer Science: 5 (56%)
- Medicine and Dentistry: 2 (22%)
- Engineering: 2 (22%)
