Well begun is half done: Generating high-quality seeds for automatic image dataset construction from web

Yan Xia; Xudong Cao; Fang Wen; Jian Sun

Conference Proceedings (Open Access)


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8692 LNCS(PART 4) 387-400

DOI: 10.1007/978-3-319-10593-2_26

15 citations; 21 readers

Abstract

We present a fully automatic approach to constructing a large-scale, high-precision dataset from noisy web images. Within the pipeline, we focus on generating high-quality seed images for the subsequent dataset-growing stage. As we show, high-quality seeds are essential, yet how to generate them automatically has received relatively little attention in previous work. In this work, we propose a density score based on rank-order distance to identify positive seed images. The basic idea is that images relevant to a concept are typically tightly clustered, while outliers are widely scattered. Through adaptive thresholding, we ensure that the selected seeds are as numerous and as accurate as possible. Starting from these high-quality seeds, we grow a high-quality dataset by dividing the seeds and conducting iterative negative and positive mining. Our system can automatically collect thousands of images per concept/class with a precision of 95% or more. Comparisons with recent state-of-the-art methods also demonstrate the superior performance of our method. © 2014 Springer International Publishing.
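The abstract does not give the exact density score or thresholding rule, so the following is only a minimal sketch of the stated intuition (relevant images cluster tightly, outliers scatter), under explicit assumptions. Each image is assumed to already have a feature vector (hypothetical `features`). The rank-order distance follows the face-clustering formulation of Zhu, Wen and Sun (CVPR 2011), here using full, untruncated Euclidean neighbor lists. The density score `1 / (1 + mean rank-order distance to the k nearest neighbors)` and the "keep images above the mean density" rule in `select_seeds` are hypothetical stand-ins, not the paper's density score or its adaptive threshold.

```python
import math
from typing import List, Sequence

# Hypothetical per-image feature vector; feature extraction is out of scope here.
Feature = Sequence[float]


def neighbor_lists(features: List[Feature]) -> List[List[int]]:
    """order[a][r] is the r-th nearest item to a (Euclidean); order[a][0] == a."""
    n = len(features)
    # Sort key (distance, is-not-self, index): the item itself comes first,
    # and remaining ties are broken by index so the result is deterministic.
    return [
        sorted(range(n), key=lambda b: (math.dist(features[a], features[b]), b != a, b))
        for a in range(n)
    ]


def rank_positions(order: List[List[int]]) -> List[List[int]]:
    """pos[a][b] is the rank of b in a's neighbor list (0 when b == a)."""
    pos = []
    for row in order:
        p = [0] * len(row)
        for rank, b in enumerate(row):
            p[b] = rank
        pos.append(p)
    return pos


def rank_order_distance(a: int, b: int, order: List[List[int]], pos: List[List[int]]) -> float:
    """Symmetric rank-order distance in the style of Zhu et al. (CVPR 2011)."""
    if a == b:
        return 0.0
    # Asymmetric term: sum of b's ranks for every item a ranks at or above b.
    d_ab = sum(pos[b][order[a][i]] for i in range(pos[a][b] + 1))
    d_ba = sum(pos[a][order[b][i]] for i in range(pos[b][a] + 1))
    # Normalize by the smaller direct rank (at least 1 because a != b).
    return (d_ab + d_ba) / min(pos[a][b], pos[b][a])


def density_scores(features: List[Feature], k: int = 2) -> List[float]:
    """HYPOTHETICAL density: 1 / (1 + mean rank-order distance to the k nearest items)."""
    if len(features) < 2:
        raise ValueError("need at least two images")
    order = neighbor_lists(features)
    pos = rank_positions(order)
    scores = []
    for a in range(len(features)):
        nbrs = order[a][1:k + 1]  # skip a itself, which sits at rank 0
        mean_d = sum(rank_order_distance(a, b, order, pos) for b in nbrs) / len(nbrs)
        scores.append(1.0 / (1.0 + mean_d))  # tighter neighborhood -> higher score
    return scores


def select_seeds(features: List[Feature], k: int = 2) -> List[int]:
    """PLACEHOLDER threshold (not the paper's rule): keep items denser than the mean."""
    scores = density_scores(features, k)
    threshold = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s > threshold]
```

On toy 1-D features `[(0.0,), (1.0,), (2.5,), (4.5,), (50.0,), (120.0,)]`, `select_seeds(feats, k=2)` returns `[0, 1, 2, 3]`: the four closely spaced points are kept and the two distant ones are rejected. Rank-order distance is used here because it compares how two items rank their shared neighbors rather than raw feature distances, which makes the score less sensitive to the absolute scale of the features.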

Citation (APA)

Xia, Y., Cao, X., Wen, F., & Sun, J. (2014). Well begun is half done: Generating high-quality seeds for automatic image dataset construction from web. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8692 LNCS, pp. 387–400). Springer Verlag. https://doi.org/10.1007/978-3-319-10593-2_26
