In this study, depth images were added to the RGB input to enrich the information available at each image pixel. Segmentation, by its nature, processes an image at the pixel level, so it can detect even the smallest visible part of an object, including when that object overlaps another. The main goal of using segmentation is to sustain tracking for longer as an object becomes occluded, from the onset of occlusion through severe occlusion, until just before the object disappears completely. Detection-based object tracking was developed by modifying the Mask R-CNN architecture to process RGBD images. Features were extracted from each detection result using HOG, and each detection was compared to the target objects using cosine similarity; the detection with the maximum similarity updated the target object for the next frame. The model was evaluated using mAP. Mask R-CNN RGBD with late fusion scored roughly 5 percentage points higher than Mask R-CNN RGB: 68.234% and 63.668%, respectively. Tracking was evaluated with the traditional method of counting ID switches during the tracking process. Over 295 frames, the original Mask R-CNN method produced ten ID switches, whereas the proposed Mask R-CNN RGBD method tracked much better, with close to zero ID switches.
Keywords: Occlusion, RGBD, Mask R-CNN, Late fusion, Cosine similarity
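The matching step described in the abstract (compare each detection's feature vector to the target with cosine similarity, then let the best match become the target for the next frame) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity` and `match_target` are hypothetical, the three-element vectors are made-up stand-ins for real HOG descriptors, and treating an all-zero descriptor as similarity 0 is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero descriptor matches nothing
    return dot / (norm_a * norm_b)


def match_target(target_feature, detection_features):
    """Return (index, score) of the detection most similar to the target."""
    if not detection_features:
        return None, 0.0
    scores = [cosine_similarity(target_feature, f) for f in detection_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy descriptors standing in for HOG vectors (values are made up).
target = [1.0, 0.0, 1.0]
detections = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]

best_index, best_score = match_target(target, detections)

# The best-matching detection replaces the target for the next frame.
target = detections[best_index]
```

Because cosine similarity ignores vector magnitude, the second detection matches the target perfectly even though its values are scaled; a real tracker would likely also apply a minimum-similarity threshold before updating the target, which the abstract does not specify.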
CITATION STYLE
Pratiwi, S. H., Shaniya, P., Jati, G., & Jatmiko, W. (2023). Improved mask RCNN and cosine similarity using RGBD segmentation for Occlusion handling in Multi Object Tracking. Jurnal Ilmu Komputer Dan Informasi, 16(1), 1–13. https://doi.org/10.21609/jiki.v16i1.1073