An Intelligent Data-Centric Web Crawler Service for API Corpus Construction at Scale

Abstract

The number of web APIs is growing rapidly, and API adoption is increasing across all industries as executives prioritize investments in the API economy. Each API provider publishes documentation that includes complex descriptions. To collect and understand the applications and operations of diverse APIs, software engineers must read lengthy and complicated API documentation, a labor-intensive and error-prone process. In this paper, we introduce a data-centric web crawler service that collects, analyzes, and constructs a large corpus of API documentation. The generated API corpus can be used in machine programming tasks such as code generation and code search. The proposed API web crawler intelligently harvests more than 2.8M API documentation pages, using a machine-learning-based approach with an accuracy of 91.32% to select only web API (REST) pages. We also conducted an extensive end-to-end real-world evaluation, in which the proposed API web crawler not only collected a large number of API pages but also successfully validated 1,222 of 1,521 target APIs, a success rate of 80.34%.

Cite

Assefi, M., Bahrami, M., Arora, S., Taha, T. R., Arabnia, H. R., Rasheed, K. M., & Chen, W. P. (2022). An Intelligent Data-Centric Web Crawler Service for API Corpus Construction at Scale. In Proceedings - IEEE International Conference on Web Services, ICWS 2022 (pp. 385–390). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICWS55610.2022.00064
