
Usage Impact Factor: the effects of sample characteristics on usage-based impact metrics

by Johan Bollen, Herbert Van De Sompel
Journal of the American Society for Information Science ()

Abstract

There exist ample demonstrations that indicators of scholarly impact analogous to the citation-based ISI Impact Factor can be derived from usage data. However, in contrast to the ISI IF, which is based on citation data generated by the global community of scholarly authors, usage has so far been practical to record only at a local level, leading to community-specific assessments of scholarly impact that are difficult to generalize to the global scholarly community. We define a journal Usage Impact Factor that mimics the definition of Thomson Scientific's ISI Impact Factor. Usage Impact Factor rankings are calculated on the basis of a large-scale usage data set recorded for the California State University system from 2003 to 2005. The resulting journal rankings are then compared to Thomson Scientific's ISI Impact Factor, which is used as a baseline indicator of general impact. Our results indicate that impact as derived from California State University usage reflects the particular scientific and demographic characteristics of its communities.


Available from arxiv.org

arXiv:cs/0610154v2 [cs.DL] 26 Oct 2006

Johan Bollen and Herbert Van de Sompel
Digital Library Research & Prototyping Team
Los Alamos National Laboratory
{jbollen, herbertv}@lanl.gov
LA-UR-06-7626

1 Introduction

Usage of scholarly resources as recorded by digital information systems has been gaining acceptance as a tool to study the scholarly community. Usage data has been used to study trends in science (Bollen, Luce, Vemulapalli, & Xu, 2003) as well as to visually map the interests of certain subsets of the scholarly community (Bollen & Van de Sompel, 2006a). In addition, usage data has been shown to be a promising alternative to citation data in the assessment of scholarly impact.
As early as 2001, Darmoni, Roussel, Benichou, Thirion, and Pinhas (2002) propose a reading factor to rank journals according to their impact derived from a library's access statistics. Bollen and Luce (2002) and Bollen, Van de Sompel, Smith, and Luce (2005) propose the use of social network metrics calculated for journal networks derived from usage sequences in a library's access log. Kurtz et al. (2004b, 2004a) discuss the potential of usage data for impact ranking. Brody, Harnad, and Carr (2006) later explore how early article usage statistics can predict citation rates. In addition to these research developments, practical standards for publisher-reported usage statistics (COUNTER project[1]) and their aggregation (SUSHI project[2]) have been developed. Thomson Scientific has recently included usage statistics in its ISI Web of Knowledge product[3].

Since usage data is recorded by particular information systems, the acquired data naturally pertains to the user community of those systems. For example, when Bollen and Luce (2002) rank journals according to their usage, this is done on the basis of usage data recorded by the Los Alamos National Laboratory Research Library servers and therefore reflects the preferences of the LANL community. In a similar manner, the results reported by Brody et al. (2006) apply to the user community of the UK arXiv mirror[4]. A similar argument can be made for the "citation-download correlation tool" of the University of Southampton's CiteBase system[5], which uses download information from the UK arXiv mirror. In all cases the community for which usage was recorded is delimited by the boundaries of a particular service or information system. The resulting sample of the scholarly community that generated the usage data through its interaction with these systems is unknown both in terms of its diversity and span. The CiteBase user community could in fact be a diverse mix of undergraduate students, professors, university staff, laypersons, and scholars. Its span may or may not be limited to the United Kingdom. The resulting usage data and its subsequent analysis could therefore be shaped by a set of sample characteristics that are not well understood. In fact, when considering usage statistics as a population statistic, the question then emerges for which sample of the scholarly community usage has been recorded, and how the characteristics of that particular sample will influence the outcomes of a subsequent assessment of scholarly impact based on these statistics.

[1] http://www.projectcounter.org/
[2] http://www.niso.org/committees/SUSHI/SUSHI_comm.html
[3] ISI Web of Knowledge Usage Reporting System (WURS)
[4] http://uk.arxiv.org/
[5] http://www.citebase.org/

The issue of sampling permeates the field of scholarly impact assessments, even where citation data is used. Thomson Scientific's ISI Impact Factor (ISI IF) is calculated from citation rates recorded for a set of ISI-selected journals. The corresponding sample of the scholarly community consequently has the following characteristics:

1. Span: extends to the global set of scholarly authors.
2. Diversity: limited to scholarly authors, and articles published in the set of ISI-selected journals.

In spite of the latter limitation, the ISI IF is perceived to be based on a representative and respected sample, which supports its general acceptance as an indicator of scholarly impact.

In comparison to the ISI IF, usage-based assessments of scholarly impact are generally based on samples of the scholarly community with the following characteristics:

1. Span: delimited by the local boundaries of a particular information service.
2. Diversity: extends to all user types who can request services for any type of scholarly communication unit.
In order to realize impact measures derived from usage data that could achieve the same level of acceptance as the ISI IF, explorations along both of the above dimensions need to take place. The first dimension, i.e. span, entails the aggregation of usage data across a wide range of services to create a more global, representative sample of the scholarly community, i.e. increase its span. In fact, Bollen and Van de Sompel (2006b) propose an architecture for the large-scale aggregation of usage data which could be employed to achieve such global samples. This architecture, however, only addresses the technical issues involved in aggregating such samples; it does not address the issue of what constitutes a representative global sample, nor for which services usage should be aggregated.

The second dimension, i.e. diversity, entails efforts to better understand and control how community characteristics, i.e. sample diversity, affect usage-based impact assessments, regardless of whether the sampled community is representative of the global scholarly community.

Whereas Bollen and Van de Sompel (2006b) focus on aspects of the first dimension, i.e. sample span, this article addresses the second dimension, i.e. sample diversity: studying the effects of sample characteristics on usage-based assessments of impact. Usage of scholarly resources for all 23 California State University (CSU) campuses, comprising about 405,000 students and 44,000 faculty and staff, was recorded throughout the entire October 2003 to August 2005 period by the CSU linking servers (Van de Sompel & Beit-Arie, 2001), thereby generating an extensive, high-granularity usage data set covering one of the world's largest and most diverse scholarly communities. A simple Usage Impact Factor (UIF) was defined to mimic the definition of the ISI IF and was then used to determine journal rankings on the basis of the recorded CSU usage data.
Correlations between the resulting CSU UIF and ISI IF rankings are determined for a set of scholarly disciplines, demarcated by ISI journal classification codes. These correlations are then matched to the demographic features of the CSU community to yield insights into how they affect usage-based assessment of impact.

2 Background

2.1 Citation Impact Factor

The IF of a particular journal in a particular year as defined by Garfield (1979) is determined by counting the number of citations that occur in a given year to articles published in the journal during the two previous years and dividing that number by the total number of published items in that two-year period. As such, the IF corresponds to the probability that the articles published
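
As a minimal sketch of the two-year ratio that Garfield (1979) defines above, the following Python snippet computes the ratio for a few journals and compares two resulting rankings. Several things here are assumptions rather than the authors' method: the function names and the journal data are hypothetical, the usage-based variant simply substitutes recorded usage events for citations (the text only states that the UIF mimics the IF definition), and Spearman's rank correlation is used as a stand-in because this excerpt does not name the correlation measure.

```python
from typing import Dict, List, Tuple


def two_year_ratio(events_in_year: int, items_prev_two_years: int) -> float:
    """Events in year Y that target items published in Y-1 and Y-2,
    divided by the number of items published in Y-1 and Y-2."""
    if items_prev_two_years <= 0:
        raise ValueError("no items published in the two-year window")
    return events_in_year / items_prev_two_years


def rank_journals(scores: Dict[str, float]) -> List[str]:
    """Journal names ordered from highest to lowest score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def spearman_rho(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman rank correlation over journals present in both score sets.
    Assumes no tied scores, so the simple sum-of-squared-differences
    formula applies."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared journals")
    order_a = sorted(shared, key=lambda j: a[j], reverse=True)
    order_b = sorted(shared, key=lambda j: b[j], reverse=True)
    rank_a = {j: r for r, j in enumerate(order_a, start=1)}
    rank_b = {j: r for r, j in enumerate(order_b, start=1)}
    d2 = sum((rank_a[j] - rank_b[j]) ** 2 for j in shared)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical data: (events in year Y, items published in Y-1 and Y-2).
citations: Dict[str, Tuple[int, int]] = {
    "Journal A": (300, 150),
    "Journal B": (120, 100),
    "Journal C": (45, 90),
}
# Assumption: usage events (e.g. full-text requests) replace citations.
usage: Dict[str, Tuple[int, int]] = {
    "Journal A": (900, 150),
    "Journal B": (1500, 100),
    "Journal C": (180, 90),
}

isi_if = {j: two_year_ratio(c, n) for j, (c, n) in citations.items()}
uif = {j: two_year_ratio(u, n) for j, (u, n) in usage.items()}
rho = spearman_rho(isi_if, uif)

print(rank_journals(isi_if))  # citation-based ranking
print(rank_journals(uif))     # usage-based ranking
print(rho)                    # agreement between the two rankings
```

In this toy data the two rankings disagree on the top journal, which is the kind of divergence a locally recorded usage sample could produce relative to a global citation sample.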

