Design Considerations for a Sustainable Scholarly Big Data Service

Jian Wu; Shaurya Rohatgi; Manoj K. Angadi; Kavya S. Puranik; C. Lee Giles

Conference Proceedings

ACM International Conference Proceeding Series (2022) 83-87

DOI: 10.1145/3574318.3574340

Abstract

Advances in web programming techniques, such as Ajax and jQuery, and in datastores, such as Apache Solr and Elasticsearch, have made it much easier to deploy small- to medium-scale web-based search engines. However, developing a sustainable search engine that supports scholarly big data services remains challenging, often because of limited human resources and financial support. Such constraints are typical in academic settings and small businesses. Here, we showcase how four key design decisions were made by trading off competing factors such as performance, cost, and efficiency when developing the Next Generation CiteSeerX (NGX), the successor of CiteSeerX, a pioneering digital library search engine that has served academic communities for more than two decades. This work extends our previous work in Wu et al. (2021) and discusses design considerations for infrastructure, web applications, indexing, and document filtering. These design considerations can be generalized to other web-based search engines of similar scale that are deployed in small business or academic settings with limited resources.

Citation (APA)

Wu, J., Rohatgi, S., Angadi, M. K., Puranik, K. S., & Giles, C. L. (2022). Design Considerations for a Sustainable Scholarly Big Data Service. In ACM International Conference Proceeding Series (pp. 83–87). Association for Computing Machinery. https://doi.org/10.1145/3574318.3574340
