Advances in web programming techniques, such as Ajax and jQuery, and in datastores, such as Apache Solr and Elasticsearch, have made it much easier to deploy small- to medium-scale web-based search engines. However, developing a sustainable search engine that supports scholarly big data services remains challenging, often because of limited human resources and financial support. Such constraints are typical in academic settings and small businesses. Here, we showcase how four key design decisions were made by trading off competing factors such as performance, cost, and efficiency when developing the Next Generation CiteSeerX (NGX), the successor of CiteSeerX, a pioneering digital library search engine that has served academic communities for more than two decades. This work extends our previous work in Wu et al. (2021) and discusses design considerations for infrastructure, web applications, indexing, and document filtering. These design considerations can be generalized to other web-based search engines of a similar scale that are deployed in small business or academic settings with limited resources.
Wu, J., Rohatgi, S., Angadi, M. K., Puranik, K. S., & Giles, C. L. (2022). Design Considerations for a Sustainable Scholarly Big Data Service. In ACM International Conference Proceeding Series (pp. 83–87). Association for Computing Machinery. https://doi.org/10.1145/3574318.3574340