Advances in sensors, wireless communication, and computer vision have led to the introduction of several unmanned retail stores. A vision-based kiosk that is equipped only with a vision sensor has significant advantages such as compactness and low implementation cost. Using convolutional neural network (CNN)-based object detectors, the kiosk recognizes an object when a customer picks up a product. In retail object recognition, the key challenges are the limited number of detections and high interclass similarity. In this study, these challenges are addressed by utilizing the 'view-specific' feature of an object; specifically, an object class is divided into multiple 'view-based' subclasses, and the object detectors are trained using the resulting subclass data. Further, a 'view-aware feature' is defined by aggregating subclass detection results from multiple cameras. A superclass classifier then predicts the superclass by utilizing the informative subclass detection results that distinguish the target object from other similar-looking objects. To verify the effectiveness of the proposed approach, a prototype of the vision-based unmanned kiosk system is implemented. Experimental results indicate that the proposed method outperforms the conventional method, even on a state-of-the-art detection network. The dataset used in this study has been made available on IEEE DataPort for reproducibility.
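The pipeline described in the abstract (view-based subclasses, aggregation across cameras, superclass prediction) can be illustrated with a minimal Python sketch. The abstract does not specify how the view-aware feature is built or how the superclass classifier is trained, so everything here is an assumption made for illustration: the subclass labels (`cola/front` and so on) are hypothetical, the feature is taken to be a concatenation of per-camera maximum confidences, and a simple summed-evidence rule stands in for the trained superclass classifier.

```python
from collections import defaultdict

# Hypothetical view-based subclasses, written as "<superclass>/<view>".
SUBCLASSES = ["cola/front", "cola/back", "cider/front", "cider/back"]


def view_aware_feature(detections_per_camera):
    """Build one feature vector from the subclass detections of all cameras.

    detections_per_camera holds one list per camera, each containing
    (subclass_label, confidence) tuples. For every camera we keep the
    highest confidence seen for each subclass (0.0 if undetected) and
    concatenate the per-camera vectors. This aggregation is an assumption.
    """
    feature = []
    for detections in detections_per_camera:
        best = {label: 0.0 for label in SUBCLASSES}
        for label, confidence in detections:
            best[label] = max(best[label], confidence)
        feature.extend(best[label] for label in SUBCLASSES)
    return feature


def predict_superclass(feature, num_cameras):
    """Stand-in for a trained superclass classifier.

    Sums the subclass confidences belonging to each superclass over all
    cameras and returns the superclass with the largest total.
    """
    scores = defaultdict(float)
    for camera in range(num_cameras):
        for index, label in enumerate(SUBCLASSES):
            superclass = label.split("/")[0]
            scores[superclass] += feature[camera * len(SUBCLASSES) + index]
    return max(scores, key=scores.get)


# Camera 0 sees an ambiguous front view; camera 1 sees a distinctive back view.
detections = [
    [("cola/front", 0.55), ("cider/front", 0.60)],
    [("cola/back", 0.90)],
]
feature = view_aware_feature(detections)
prediction = predict_superclass(feature, num_cameras=len(detections))
print(prediction)
```

In this toy input, the front view alone slightly favors the wrong product, while the back view seen by the second camera supplies the informative subclass detection that resolves the ambiguity.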
CITATION STYLE
Jeon, J. Y., Kang, S. W., Lee, H. J., & Kim, J. S. (2022). A Retail Object Classification Method Using Multiple Cameras for Vision-Based Unmanned Kiosks. IEEE Sensors Journal, 22(22), 22200–22209. https://doi.org/10.1109/JSEN.2022.3210699