Abstract
Highlights

What are the main findings?
- A lightweight YOLOv8n model integrated with C2f-FasterBlock and SE attention achieves high apple detection accuracy (mAP = 0.885) and real-time performance (83 FPS) with 37% fewer parameters and a compact 4.3 MB size.
- An end-to-end active perception framework based on ResNet50 and multi-modal fusion enables the robotic arm to autonomously navigate to optimal viewpoints, significantly reducing occlusion and improving recognition success.

What are the implications of the main findings?
- The proposed co-design of efficient perception and active sensing offers a practical solution for reliable fruit detection in cluttered orchard environments, addressing a key bottleneck in agricultural automation.
- The system’s direct mapping from visual input to motion planning demonstrates a scalable paradigm for closed-loop robotic harvesting, paving the way for deployment in real-world field conditions.

To address fruit recognition and localization failures in harvesting robots caused by severe occlusion from branches and leaves in complex orchard environments, this paper proposes an occlusion avoidance method that combines a lightweight YOLOv8n model (YOLOv8n developed by Ultralytics, United States) with active perception. First, to meet the stringent real-time requirements of the active perception system, a lightweight YOLOv8n model was developed. This model reduces computational redundancy by incorporating the C2f-FasterBlock module and enhances key feature representation by integrating the SE attention mechanism, significantly improving inference speed while maintaining high detection accuracy. Second, an end-to-end active perception model based on ResNet50 and multi-modal fusion was designed. This model predicts the optimal movement direction for the robotic arm from the current observation image, so that the arm actively avoids occlusions and obtains a more complete field of view.
The model was trained using a matrix dataset constructed through the robot’s dynamic exploration in real-world scenarios, achieving a direct mapping from visual perception to motion planning. Experimental results demonstrate that the proposed lightweight YOLOv8n model achieves an mAP of 0.885 in apple detection tasks and a frame rate of 83 FPS, with the parameter count reduced to 1,983,068 and the model weight file reduced to 4.3 MB, significantly outperforming the baseline model. In active perception experiments, the proposed method effectively guided the robotic arm to quickly find observation positions with minimal occlusion, substantially improving the success rate of target recognition and the overall operational efficiency of the system. These results provide preliminary technical validation and a feasible exploratory pathway for developing agricultural harvesting robot systems suited to complex real-world environments. It should be noted that the validation in this study was conducted primarily in controlled environments. Subsequent work still requires large-scale testing in diverse real-world orchard scenarios, as well as further system optimization and performance evaluation in more realistic application settings, including natural lighting variations, complex weather conditions, and actual occlusion patterns.
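The abstract names the SE (squeeze-and-excitation) attention mechanism but does not describe how it is implemented in the modified YOLOv8n. As a rough illustration only, the sketch below follows the commonly used SE formulation: global average pooling per channel, two fully connected layers with ReLU and sigmoid, then channel-wise rescaling. The function name `se_block`, the reduction ratio, the tensor sizes, the random weights, and the omission of bias terms are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def se_block(x, w1, w2):
    """Channel reweighting in the usual squeeze-and-excitation style.

    x  : list of C channels, each an H x W nested list of floats
    w1 : (C // r) x C weight matrix (first fully connected layer, no bias)
    w2 : C x (C // r) weight matrix (second fully connected layer, no bias)
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling turns each channel into one number.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, step 2: expand back to C values, squash to (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    out = [[[g * v for v in row] for row in ch] for ch, g in zip(x, gates)]
    return out, gates


# Hypothetical sizes chosen only for the demo: 4 channels, ratio 2, 3x3 maps.
random.seed(0)
C, R, H, W = 4, 2, 3, 3
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]

out, gates = se_block(x, w1, w2)
print([round(g, 3) for g in gates])  # one gate per channel, each in (0, 1)
```

In a trained network the two weight matrices are learned, so informative channels receive gates closer to 1 and less useful channels are suppressed. The added cost is small because the block operates on one pooled value per channel.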
Cite
Zhang, T., Huang, J., Niu, J., Liu, Z., Zhang, L., & Song, H. (2026). Occlusion Avoidance for Harvesting Robots: A Lightweight Active Perception Model. Sensors, 26(1). https://doi.org/10.3390/s26010291