Google Scholar's Ranking Algorithm: The Impact of Citation Counts (An Empirical Study)
- ISBN: 9781424428649
- DOI: 10.1109/RCIS.2009.5089308
Abstract
Google Scholar is one of the major academic search engines, but its ranking algorithm for academic articles is unknown. In a recent study we partly reverse-engineered the algorithm. This paper presents the results of our second study. While the previous study provided a broad overview, the current study focused on analyzing the correlation between an article's citation count and its ranking in Google Scholar. For this study, citation counts and rankings of 1,364,757 articles were analyzed. Some results of our first study were confirmed: citation count is the most heavily weighted factor in Google Scholar's ranking algorithm. Highly cited articles are found significantly more often in higher positions than articles that are cited less often. Therefore, Google Scholar seems to be more suitable for searching standard literature than for gems or articles by authors advancing a view different from the mainstream. However, interesting exceptions occurred for some search queries. In some cases no correlation existed; in others bizarre patterns were recognizable, suggesting that citation counts sometimes have no impact at all on articles' rankings. © 2009 IEEE.
Jöran Beel
Otto-von-Guericke University
Department of Computer Science
ITI / VLBA-Lab / Scienstein
Magdeburg, Germany
j.beel@scienstein.org
Bela Gipp
Otto-von-Guericke University
Department of Computer Science
ITI / VLBA-Lab / Scienstein
Magdeburg, Germany
b.gipp@scienstein.org
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search
and Retrieval – Information filtering, Search process, Selection
process.
General Terms
Algorithms
Keywords
Academic Search Engines, Google Scholar, Ranking Algorithm,
Citation Counts, Empirical Study
1. INTRODUCTION
As academic search engines become more widely used, it becomes
increasingly important for scientific authors to have their
research articles ranked well in those search engines in order to
reach their audience. In other words, scientists need to know
how ranking algorithms work in order to optimize their
research papers for academic search engines such as Google
Scholar or Scienstein.org. For instance, if search engines
consider how often a search term occurs in an article's full text,
authors should use the most relevant keywords in their articles
whenever possible to achieve a top ranking.
For users of academic search engines, knowledge about the applied
ranking algorithms is also essential, for two basic reasons.
Firstly, users should know about the algorithms in order to
estimate the search engine's robustness against manipulation
attempts by authors and spammers and, therefore, the
trustworthiness of the results. Secondly, knowledge of ranking
algorithms enables researchers to estimate the usefulness of
results with respect to their search intention. For instance,
researchers interested in the latest trends should use a search
engine that puts a high weight on the publication date. Users
searching for standard literature should choose a search engine
that puts a high weight on citation counts. In contrast, if a user
searches for articles by authors advancing a perspective that
differs from the majority, search engines that put a high weight on
citation counts might not be appropriate.
This paper deals with the question of how Google Scholar ranks
its results, and is structured as follows: First, related work is
presented. Then, the research objective is outlined, followed by
the applied methodology. Finally, results and their
interpretations are presented.
2. RELATED WORK
Due to different user needs, many academic databases and search
engines let the user choose a ranking algorithm. For
instance, ScienceDirect lets users select between date and
relevance¹, IEEE Xplore additionally offers a ranking by title, and
the ACM Digital Library allows users to sort
results by relevance, publication date, title or journal
(alphabetically), citation counts or downloads. However, these
'algorithms' can be considered trivial, since users can select only
one ranking criterion and cannot use a (weighted)
combination of criteria.
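The difference between sorting by a single criterion and ranking by a
weighted combination of criteria can be illustrated with a minimal Python
sketch. Everything in it is an assumption made for illustration: the
Article fields, the sample values, the min-max normalisation and the
weights are hypothetical, and the sketch does not describe how Google
Scholar or any of the databases named above actually rank results.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record; field names are illustrative only.
    title: str
    year: int        # publication year
    citations: int   # citation count
    term_hits: int   # occurrences of the search term in the text


def sort_by_single_criterion(articles, criterion):
    """Rank by exactly one criterion, highest value first."""
    return sorted(articles, key=lambda a: getattr(a, criterion), reverse=True)


def normalise(values):
    """Scale values linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sort_by_weighted_score(articles, weights):
    """Rank by a weighted sum of min-max normalised criteria."""
    columns = {
        name: normalise([getattr(a, name) for a in articles])
        for name in weights
    }
    scores = [
        sum(weights[name] * columns[name][i] for name in weights)
        for i in range(len(articles))
    ]
    order = sorted(range(len(articles)), key=lambda i: scores[i], reverse=True)
    return [articles[i] for i in order]


# Made-up sample data, not taken from the study.
ARTICLES = [
    Article("A", 1998, 900, 2),
    Article("B", 2008, 15, 9),
    Article("C", 2006, 600, 8),
]

by_citations = sort_by_single_criterion(ARTICLES, "citations")
by_year = sort_by_single_criterion(ARTICLES, "year")
combined = sort_by_weighted_score(
    ARTICLES, {"citations": 1.0, "year": 1.0, "term_hits": 1.0}
)

print([a.title for a in by_citations])
print([a.title for a in by_year])
print([a.title for a in combined])
```

In this made-up data set, article C is never first under any single
criterion, but it comes first under the equally weighted combination
because it scores reasonably well on all three criteria.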
Ranking academic articles by citation counts is a common
procedure, but remains controversial. With regard to academic
search engines, two points of criticism are particularly relevant.
Firstly, ranking articles by citation counts strengthens the
Matthew Effect: articles with many
citations are displayed first, so they attract many readers and
receive further citations, which in turn causes them to be displayed
first. This is a common problem in the scientific
community [1]. However, academic search engines could
aggravate this problem, as users of search engines usually pay
¹ 'Relevance' in most cases means that the more often a search
term occurs in a document, the more relevant it is considered.
Jöran Beel and Bela Gipp. Google Scholar's Ranking Algorithm: The Impact of Citation Counts (An Empirical Study). In
André Flory and Martine Collard, editors, Proceedings of the 3rd IEEE International Conference on Research Challenges in
Information Science (RCIS'09), pages 439–446, Fez (Morocco), April 2009. IEEE. doi: 10.1109/RCIS.2009.5089308.


