An Intelligent Data-Centric Web Crawler Service for API Corpus Construction at Scale

Mehdi Assefi; Mehdi Bahrami; Sarthak Arora; Thiab R. Taha; Hamid R. Arabnia; Khaled M. Rasheed; Wei Peng Chen

Conference Proceedings

Proceedings - IEEE International Conference on Web Services, ICWS 2022 (2022) 385-390

DOI: 10.1109/ICWS55610.2022.00064

Abstract

The number of web APIs is growing rapidly. API adoption is increasing across all industries, with executives prioritizing investments in the API economy. Each API provider publishes API documentation that includes complex descriptions. To collect and understand the applications and operations of diverse APIs, software engineers must read lengthy and complicated API documentation. Understanding such varied documentation is a labor-intensive and error-prone process. In this paper, we introduce a data-centric web crawler service to collect, analyze, and construct a large corpus of API documentation. The generated API corpus can be used in machine programming (e.g., code generation, code search). The proposed API web crawler harvests more than 2.8M API documentation pages, using a machine-learning-based approach with an accuracy of 91.32% to select only web API (REST) pages. We also conducted an extensive end-to-end real-world evaluation, in which the proposed API web crawler not only collects a large number of API pages but also successfully validates 1,222 of 1,521 target APIs, a success rate of 80.34%.
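
As a rough illustration of the page-selection and validation figures in the abstract, the following Python sketch filters candidate pages and recomputes the reported success rate. The regular-expression filter is only a stand-in assumption for the paper's machine-learning classifier, whose features and model are not described here, and the names `looks_like_rest_api_page` and `success_rate` are hypothetical.

```python
import re

# Assumed heuristic (not the paper's classifier): an HTTP method followed
# by a URL path, e.g. "GET /v1/users/{id}", suggests REST documentation.
ENDPOINT_PATTERN = re.compile(r"\b(?:GET|POST|PUT|PATCH|DELETE)\s+/[\w/{}.\-]*")


def looks_like_rest_api_page(text: str, min_endpoints: int = 1) -> bool:
    """Return True if the page text mentions at least `min_endpoints` endpoints."""
    return len(ENDPOINT_PATTERN.findall(text)) >= min_endpoints


def success_rate(validated: int, targets: int) -> float:
    """Return the percentage of target APIs that were validated."""
    if targets <= 0:
        raise ValueError("targets must be positive")
    return 100.0 * validated / targets


if __name__ == "__main__":
    # Toy page texts, invented for illustration only.
    pages = [
        "Users API. GET /v1/users/{id} returns a single user.",
        "About our company and its history.",
    ]
    api_pages = [page for page in pages if looks_like_rest_api_page(page)]
    print(len(api_pages))  # 1

    # Figures taken from the abstract: 1,222 validated out of 1,521 targets.
    print(f"{success_rate(1222, 1521):.2f}%")  # 80.34%
```

The second call reproduces the 80.34% success rate stated in the abstract from the two reported counts.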

Cite

Assefi, M., Bahrami, M., Arora, S., Taha, T. R., Arabnia, H. R., Rasheed, K. M., & Chen, W. P. (2022). An Intelligent Data-Centric Web Crawler Service for API Corpus Construction at Scale. In Proceedings - IEEE International Conference on Web Services, ICWS 2022 (pp. 385–390). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICWS55610.2022.00064
