
Mendeley's Reply to the DataTEL Challenge

by Kris Jack, James Hammerton, Dan Harvey, Jason J Hoyt, Jan Reichelt, Victor Henning
Procedia Computer Science (2010)

Abstract

Mendeley has built, and continues to build, a strong user community of researchers who benefit from both its desktop and web-based software. In building its community, Mendeley has recorded a considerable amount of data that can be analyzed to help researchers do better research. One key way of helping researchers is to recommend research articles that they have not yet encountered but would be interested in. Recommendation system research, while well studied in some domains, such as cinematography, lacks the kind of scientific data sets that Mendeley has been building. Mendeley has taken up the DataTEL challenge in order to provide recommendation system researchers with valuable data on users and their relationship with scientific literature. The data set has been made anonymous to protect user privacy and can only be used for non-commercial scientific purposes.

Kris Jack, James Hammerton, Dan Harvey, Jason J. Hoyt, Jan Reichelt, Paul
Foeckler, and Victor Henning
Mendeley Ltd., 144a Clerkenwell Road
London, EC1R 5DF, United Kingdom
{kris.jack, james.hammerton, dan.harvey, jason.hoyt, jan.reichelt, paul.foeckler,
victor.henning}@mendeley.com
Keywords: Mendeley, Recommendations, Personalization, Data Set, Scientific
Articles, Research Articles.
1 Introduction
Mendeley is a research platform that helps users to organize their research,
collaborate with colleagues and discover new knowledge [1]. Mendeley records and
analyzes a vast amount of data on a daily basis. As of October 2010, Mendeley's
user base had grown to over 550,000 researchers, who have contributed 44 million
articles since its launch in the previous year. This paper describes a data set that
researchers can use to test recommendation systems. The data has been collected
primarily by analyzing the research articles that users have added to
Mendeley Desktop's reference management tool.
To protect user privacy, the data set has been made anonymous. None of the ids that
appear in the data, such as article and user ids, correspond to the ids that are
used in Mendeley's databases and exposed through Mendeley's API. The data
set contains just under 10% of the user profiles that have been registered with
Mendeley.
2 Data Set
Mendeley's data set provides information on user libraries in three files. One file
includes the set of articles that appear in user libraries, while the other two provide
usage-based information: one of them showing which articles users have read using
Mendeley Desktop; and the other showing which articles users have marked with
stars using Mendeley Desktop. Mendeley's data set is intended to help researchers to
test and optimize recommendation systems in the domain of scientific literature.
Researchers use Mendeley Desktop and Mendeley Web to add scientific articles to
their libraries. A random sample of these libraries was entered into
the data set (Table 1). The file has 50,000 user libraries that contain a total of
4,848,724 library entries, covering 3,652,285 unique articles. All user libraries
contain at least 20 articles.
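As an illustration of how the User Library file might be summarized, the following is a minimal sketch based on the two-column layout in Table 1. The paper does not state the file's delimiter or encoding, so the tab separator, the function names and the toy rows are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_libraries(lines: Iterable[str], sep: str = "\t") -> Dict[str, Set[str]]:
    """Group article ids by user id from two-column rows (user id, article id).

    The separator is an assumption; the paper does not specify the delimiter.
    """
    libraries: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, article_id = line.split(sep)
        libraries[user_id].add(article_id)
    return dict(libraries)


def summarize(libraries: Dict[str, Set[str]]) -> Dict[str, int]:
    """Compute the kind of counts reported for the data set."""
    sizes = [len(articles) for articles in libraries.values()]
    unique_articles = set()
    for articles in libraries.values():
        unique_articles |= articles
    return {
        "users": len(libraries),
        "library_entries": sum(sizes),
        "unique_articles": len(unique_articles),
        "smallest_library": min(sizes, default=0),
    }


# Toy rows standing in for the real file (hypothetical ids).
toy_rows = ["u1\ta1", "u1\ta2", "u2\ta2", "", "u2\ta3", "u2\ta4"]
toy_summary = summarize(load_libraries(toy_rows))
print(toy_summary)
```

On the real file, the same summary would be expected to reproduce the figures quoted above (50,000 users, 4,848,724 library entries, 3,652,285 unique articles and a smallest library of at least 20 articles), provided the delimiter assumption holds.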
The second data file provides readership information for researchers and their
articles (Table 2). Using Mendeley Desktop, users can open their articles and read
them. Once an article has been read, the application marks it as read for the user.
This file covers the same library entries as the first file
and indicates whether or not the user has read each one using Mendeley Desktop.
1,466,489 of the 4,848,724 library entries, or 30%, have been read using
Mendeley Desktop.
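The readership share can be recomputed from the three-column layout in Table 2. The sketch below assumes tab-separated rows and uses hypothetical ids; the function name is illustrative and not part of any Mendeley tool.

```python
from typing import Iterable


def read_fraction(lines: Iterable[str], sep: str = "\t") -> float:
    """Return the share of library entries whose read status is 1.

    Rows are assumed to be (user id, article id, read status); the
    separator is an assumption, as the paper does not specify it.
    """
    total = 0
    read = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _user_id, _article_id, status = line.split(sep)
        total += 1
        read += int(status)
    return read / total if total else 0.0


# Toy rows standing in for the real file (hypothetical ids).
toy_rows = ["u1\ta1\t1", "u1\ta2\t0", "u2\ta2\t1", "u2\ta3\t0"]
toy_share = read_fraction(toy_rows)

# Share implied by the counts reported in the paper.
reported_share = 1_466_489 / 4_848_724
print(f"{reported_share:.1%}")  # prints 30.2%
```

The reported counts give a share of about 30.2%, which matches the rounded 30% quoted in the text.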
Researchers can also use Mendeley Desktop to star articles that are in
their libraries. This starring information is included in the third and final file, the
Library Starring table (see Table 3). In the file, 615,308 of the 4,848,724
library entries (13%) have been starred by users. Mendeley does not put any
requirements on why users should star articles. As a result, users may star articles for
different reasons, making the action semantically ambiguous.
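Because starring is semantically ambiguous, one way to explore it is to cross-tabulate it against the read status of the same library entries. The sketch below is one possible approach and not a method described in the paper; it assumes tab-separated rows following Tables 2 and 3, and all names and ids are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Entry = Tuple[str, str]  # (user id, article id)


def load_flags(lines: Iterable[str], sep: str = "\t") -> Dict[Entry, int]:
    """Map each (user id, article id) pair to its 0/1 status column."""
    flags: Dict[Entry, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, article_id, status = line.split(sep)
        flags[(user_id, article_id)] = int(status)
    return flags


def cross_tabulate(read: Dict[Entry, int], starred: Dict[Entry, int]) -> Counter:
    """Count library entries for each (read status, star status) combination.

    Entries missing from the starring data are treated as not starred,
    which is an assumption made for this sketch.
    """
    return Counter((flag, starred.get(entry, 0)) for entry, flag in read.items())


# Toy rows standing in for the two real files (hypothetical ids).
toy_read = load_flags(["u1\ta1\t1", "u1\ta2\t0", "u2\ta2\t1"])
toy_starred = load_flags(["u1\ta1\t1", "u1\ta2\t1", "u2\ta2\t0"])
toy_table = cross_tabulate(toy_read, toy_starred)
print(toy_table)
```

A large count for the (0, 1) combination, for example, would suggest that many users star articles as a reminder to read them later rather than as a mark of approval.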
3 Obtaining the Data
Mendeley's data set is available for download from the Mendeley Developer Portal
(http://dev.mendeley.com/). To obtain a copy of the data, please write to
datachallenge@mendeley.com with the following information:
• Your name;
• Institutional affiliation;
• Contact details (physical address and phone number).
The portal also provides an API that allows developers to access much of
the data that is available on Mendeley Web. To ensure user anonymity, the user
and article ids employed in the API do not correspond to the ids used in the data set.
Mendeley may contact developers if changes to the data set are required.
Mendeley's data set is being provided for non-commercial scientific use only.
4 Conclusion
Research conducted using Mendeley's data set is expected to contribute to new
recommendation system algorithms in the domain of scientific research.
References
1. Henning, V., Reichelt, J.: Mendeley - A Last.fm For Research? In: 2008 IEEE Fourth
International Conference on eScience, pp. 327-328 (2008).
Appendix
Table 1. User Library Data Scheme
User Library
Element      Description
schemeID     Mendeley.com user libraries
No. Columns  2
Column 1     User id (string id that uniquely identifies a user)
Column 2     Article id (string id that uniquely identifies an article)
Table 2. Library Readership Data Scheme
Library Readership
Element      Description
schemeID     Mendeley.com library readership
No. Columns  3
Column 1     User id (string id that uniquely identifies a user)
Column 2     Article id (string id that uniquely identifies an article)
Column 3     Read status (int that is 1 if the article has been read and 0 if it has not been read using Mendeley Desktop)
Table 3. Library Starring Data Scheme
Library Starring
Element      Description
schemeID     Mendeley.com library starring
No. Columns  3
Column 1     User id (string id that uniquely identifies a user)
Column 2     Article id (string id that uniquely identifies an article)
Column 3     Star status (int that is 1 if the article has been starred and 0 if it has not been starred using Mendeley Desktop)
