
Crowdsourcing Real-Time Research Trend Data

by Victor Henning, Jason J Hoyt, Jan Reichelt
Significance (2010)

Abstract

This paper proposes a demo of Mendeley at WWW10. Mendeley is a research workflow and collaboration tool that crowdsources real-time research trend information and semantic annotations of research papers in a central data store. We describe how Mendeley's data can overcome some of the weaknesses of traditional citation-based impact factor metrics and tag-based semantic databases. In the 12 months since its public launch, Mendeley has captured information on 13 million research papers, with its database doubling every 12 weeks.


Available from Jan Reichelt's profile on Mendeley.

Copyright is held by the author/owner(s).
FWCS2010, April 26, 2010, Raleigh, USA.
Victor Henning
Mendeley
144a Clerkenwell Road
London EC1R 5DF
+44 207 713 8486
victor.henning@mendeley.com
Jason J. Hoyt
Mendeley
144a Clerkenwell Road
London EC1R 5DF
+44 207 713 8486
jason.hoyt@mendeley.com
Jan Reichelt
Mendeley
144a Clerkenwell Road
London EC1R 5DF
+44 207 713 8486
jan.reichelt@mendeley.com
ABSTRACT
This paper proposes a demo of Mendeley at FWCS2010.
Mendeley is a research workflow and collaboration tool, which
crowdsources real-time research trend information and semantic
annotations of research papers in a central data store. We
describe how Mendeley’s data can overcome some of the
weaknesses of traditional citation-based impact factor metrics
and tag-based semantic databases. In the 12 months since its
public launch, Mendeley has captured information on 13 million
research papers, with its database doubling every 12 weeks.
Categories and Subject Descriptors
H.3.7 [Information Storage and Retrieval]: Digital Libraries –
collection, dissemination, standards, user issues.
General Terms
Management, Measurement, Design, Standardization.
Keywords
Usage-Based Impact Measurement, Crowdsourcing, Article-level
Metrics, Journal Impact Factor, Real-Time, Research Trends,
Scientific Databases
1. THE SHORTCOMINGS OF CITATION-
BASED IMPACT METRICS
Citation-based reputation metrics such as the Journal Impact
Factor (JIF), the h-index or the g-index play an ever-increasing
role in modern science [1, 2, 3]. As seemingly objective
measures of academic impact and performance, they are used to
determine career progression, post-doc positions, tenure, and
grant funding.
Pressure on scholars to perform well according to these metrics
has mounted. So has the criticism leveled against such metrics. It
has been argued that these metrics can lead to academics
engaging in citation bartering, gratuitous authorship, and a
general increase in aggressive, exploitative, and self-promotional
behavior [4]. Citation-based metrics are also thought to tempt
journal editors into gaming the system using techniques that
inflate their JIF. Such techniques include accepting only papers
expected to attract many citations, encouraging self-citations,
and publishing review articles in place of research articles.
From a methodological perspective, critics point out that citation
counts are context-free, i.e. a citation is counted as positive even
if a paper was cited in a negative context. Moreover, the Gini
coefficient of the citation distribution is extremely high. A small
fraction of all papers garner the majority of all citations, while
the majority of all papers are never cited at all [5]. This also
implies that a single highly cited article could inflate a JIF.
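The skew described above can be made concrete with a short sketch. The citation counts below are invented for illustration and are not taken from any journal; the "mean citations" figure is only a simplified stand-in for a JIF-style average, not the official calculation.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with i = 1..n over ascending x_i
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def mean_citations(citation_counts):
    """Average citations per article, a simplified JIF-style ratio."""
    return sum(citation_counts) / len(citation_counts)


# Hypothetical journal: most articles uncited, one article cited 94 times.
citations = [0, 0, 0, 0, 0, 0, 1, 2, 3, 94]
without_top_article = sorted(citations)[:-1]

print(round(gini(citations), 2))
print(mean_citations(citations))
print(round(mean_citations(without_top_article), 2))
```

In this made-up sample, removing the single most-cited article lowers the average from 10 citations per article to well under 1, which is the kind of inflation the text refers to.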
Another major problem for JIFs is the arbitrary two-year window
within which citations are measured, which favors fast-evolving
disciplines. The h-index, similarly, is arbitrarily bounded by the
number of papers a researcher has published, so a young
researcher with a few high-impact publications will still have a
low h-index. Finally, there is evidence that only 20% of all
papers cited have actually been read by the authors citing them
[6].
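The bound on the h-index mentioned above follows directly from its definition [2]: the largest h such that h of a researcher's papers each have at least h citations. A minimal sketch, using two invented publication records, illustrates the effect:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical records: three highly cited papers vs. twelve modestly cited ones.
young_researcher = [250, 180, 120]
senior_researcher = [12] * 12

print(h_index(young_researcher))   # cannot exceed 3, the number of papers
print(h_index(senior_researcher))
```

However many citations the three papers attract, the first record's h-index stays at 3, while the second reaches 12 with far fewer total citations.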
2. USAGE-BASED IMPACT METRICS ON
MENDELEY
Our demo at FWCS2010 will exhibit alternative, usage-based
impact metrics which could potentially alleviate many of the
problems associated with traditional citation-based metrics. Our
usage-based metrics rely on a distributed measurement of article
readership on Mendeley [7], a desktop- and web-based research
management and collaboration tool. Mendeley Desktop, a free
and cross-platform desktop application, automatically extracts
metadata, full-text and cited references from research papers to
minimize manual data input when setting up a local research
paper database. It then enables researchers to manage, tag, full-
text search, read and annotate PDF documents, share research
papers with colleagues, and create bibliographies in word
processors and text editors.
Users can sync their libraries and annotations to the companion
website, Mendeley Web, and manage them online. In this way,
Mendeley Web has accumulated data on more than 13 million
research papers and 150 million citations, by more than 100,000
users, in the first 12 months after its public launch. With
Mendeley’s research paper database doubling in size roughly
every 12 weeks, it is on track to surpass Thomson Reuters’ Web
of Science catalogue of 40 million full-text documents and 700 million
citations at some point this year.
Our starting point for usage-based metrics is to track the
pervasiveness of research papers in Mendeley user libraries, i.e.
whether they are present on the computers of a wide-ranging,
distributed sample of academics. Preliminary investigation
suggests that, for example, the correlation between Thomson
Reuters’ ISI citation count of the “Top 5 Biology Papers of 2009”
and their corresponding readership number on Mendeley is r=.76
[8]. More encompassing correlation statistics will be presented
during our demo at FWCS2010.
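For readers unfamiliar with the statistic, the correlation coefficient r quoted above is the Pearson product-moment correlation. The sketch below shows how it could be computed for paired citation and readership counts; the numbers are placeholders and do not reproduce the data behind the r=.76 figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for five papers: citation counts and Mendeley reader counts.
citation_counts = [120, 95, 80, 60, 40]
reader_counts = [300, 180, 210, 90, 110]

print(round(pearson_r(citation_counts, reader_counts), 2))
```

With only a handful of papers, as in the preliminary comparison, such a coefficient is sensitive to individual data points, which is one reason broader statistics are needed.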
Readership metrics can be seen as a measure of the popularity or
awareness that a paper (and by association, its author,
publication journal, and topic) is enjoying. A second, more
fine-grained usage metric, which Mendeley will begin tracking
by FWCS2010, is the actual time users spend reading each
research paper in Mendeley’s integrated PDF viewer, and the
number of repeat readings per paper. This is a measure of the
intensity with which the paper (and, by extension, its author,
publication journal, and topic) is being examined.
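As an illustration of how reading time and repeat readings might be aggregated, the sketch below assumes each viewer session is logged as a (paper, reader, seconds) record and counts every session after a reader's first as a repeat reading. The record layout and this definition of "repeat" are assumptions made for the example, not a description of Mendeley's implementation.

```python
from collections import defaultdict


def reading_intensity(sessions):
    """Aggregate hypothetical (paper_id, reader_id, seconds) viewer sessions per paper."""
    total_seconds = defaultdict(int)
    sessions_per_reader = defaultdict(int)
    for paper_id, reader_id, seconds in sessions:
        total_seconds[paper_id] += seconds
        sessions_per_reader[(paper_id, reader_id)] += 1

    # Every session beyond a reader's first one counts as a repeat reading.
    repeat_readings = defaultdict(int)
    for (paper_id, _reader_id), count in sessions_per_reader.items():
        repeat_readings[paper_id] += count - 1

    return {
        paper_id: {
            "total_seconds": total_seconds[paper_id],
            "repeat_readings": repeat_readings[paper_id],
        }
        for paper_id in total_seconds
    }


# Invented session log for two papers.
sessions = [
    ("paper-1", "reader-a", 600),
    ("paper-1", "reader-a", 300),
    ("paper-1", "reader-b", 120),
    ("paper-2", "reader-a", 60),
]
print(reading_intensity(sessions))
```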
A major advantage of such usage-based metrics is that they are
available immediately on a “per article” basis. Usage-based
metrics let authors track how readership of their individual
papers is evolving in real-time, and the article’s impact can
evolve independently of the journal and its impact factor. To
better understand the readership, Mendeley also collects
anonymous demographic information alongside the usage
metrics. This information is presented to users in different
segments such as geographic region, academic discipline, or
junior versus senior faculty.
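A segment breakdown of this kind amounts to counting readers per category and converting the counts to shares. The sketch below assumes each anonymous reader record is a small dictionary with fields such as "status" and "country"; the field names and sample values are hypothetical.

```python
from collections import Counter


def segment_shares(readers, key):
    """Percentage of readers in each segment of the given (hypothetical) field."""
    counts = Counter(reader[key] for reader in readers)
    total = sum(counts.values())
    return {segment: round(100 * n / total) for segment, n in counts.items()}


# Invented anonymous reader records for a single paper.
readers = [
    {"status": "Ph.D. Student", "country": "United Kingdom"},
    {"status": "Ph.D. Student", "country": "Germany"},
    {"status": "Librarian", "country": "United Kingdom"},
    {"status": "Professor", "country": "Canada"},
]

print(segment_shares(readers, "status"))
print(segment_shares(readers, "country"))
```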
Mendeley’s usage-based metrics let researchers retrieve the
“hottest” papers for each topic (as marked by user-generated tags
assigned to papers), the “hottest” tags in each academic
discipline (to spot emerging research trends), or up-and-coming
authors with the highest percentage growth in readership in the
past month. By looking at longitudinal trend data, scholars might
be able to assess whether a paper, topic or theory is steadily
gaining followers, is subject to a sudden “hype,” or is already on
the decline again.
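The two queries described above, the "hottest" papers for a tag and the authors with the highest percentage growth in readership, can be sketched as simple aggregations. The data structures, function names, and sample values below are assumptions made for illustration only.

```python
from collections import Counter


def percent_growth(previous, current):
    """Month-over-month growth in percent; None when there is no baseline."""
    if previous == 0:
        return None
    return 100.0 * (current - previous) / previous


def rank_by_growth(readership):
    """readership maps author -> (readers_last_month, readers_this_month)."""
    growth = {author: percent_growth(prev, curr)
              for author, (prev, curr) in readership.items()}
    ranked = [(author, g) for author, g in growth.items() if g is not None]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def hottest_papers(library_entries, tag, top_n=3):
    """library_entries holds (paper_id, tags) pairs, one per copy in a user library."""
    counts = Counter(paper_id for paper_id, tags in library_entries if tag in tags)
    return counts.most_common(top_n)


# Invented readership counts and library entries.
readership = {"Author A": (100, 150), "Author B": (10, 30), "Author C": (0, 5)}
entries = [
    ("p1", {"genomics"}),
    ("p1", {"genomics", "rna"}),
    ("p2", {"genomics"}),
    ("p3", {"rna"}),
]

print(rank_by_growth(readership))
print(hottest_papers(entries, "genomics"))
```

Ranking by percentage growth favors authors starting from a small base, so a production version would likely combine it with a minimum readership threshold; authors with no prior readers are skipped here because their growth is undefined.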
3. REFERENCES
[1] Garfield, E. The history and meaning of the journal impact
factor. JAMA 295, 1 (2006), 90-93.
[2] Hirsch, J. An index to quantify an individual's scientific
research output. PNAS 102, 46 (2005), 16569-72.
[3] Egghe, L. Theory and practise of the g-index. Scientometrics
69, 1 (2006), 131-51.
[4] Lawrence, P. Lost in publication: how measurement harms
science. ESEP 8 (2008), 9-11.
[5] Weale, A., Bailey, M., and Lear, P. The level of non-citation
of articles within a journal as a measure of quality: a
comparison to the impact factor. BMC Medical Research
Methodology 4, 1 (2004), 14.
[6] Simkin, M. and Roychowdhury, V. Do you sincerely want to
be cited? Or: read before you cite. Significance 3, 4 (2006),
179-181.
[7] http://www.mendeley.com.
[8] Henning, V. The Top 10 Journal Articles Published in 2009
by Readership on Mendeley. Mendeley Blog (2010),
http://www.mendeley.com/blog/academic-features/the-top-10-journal-articles-published-in-2009-by-readership-on-mendeley/.

