The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index

Peder Olesen Larsen · Markus von Ins

Received: 3 September 2009 / Published online: 10 March 2010
© The Author(s) 2010. This article is published with open access at Springerlink.com

Abstract The growth rate of scientific publication has been studied from 1907 to 2007 using available data from a number of literature databases, including Science Citation Index (SCI) and Social Sciences Citation Index (SSCI). Traditional scientific publishing, that is publication in peer-reviewed journals, is still increasing although there are big differences between fields. There are no indications that the growth rate has decreased in the last 50 years. At the same time publication using new channels, for example conference proceedings, open archives and home pages, is growing fast. The growth rate for SCI up to 2007 is smaller than for comparable databases. This means that SCI was covering a decreasing part of the traditional scientific literature. There are also clear indications that the coverage by SCI is especially low in some of the scientific areas with the highest growth rate, including computer science and engineering sciences. The role of conference proceedings, open access archives and publications published on the net is increasing, especially in scientific fields with high growth rates, but this has only partially been reflected in the databases. The new publication channels challenge the use of the big databases in measurements of scientific productivity or output and of the growth rate of science. Because of the declining coverage and this challenge it is problematic that SCI has been used and is used as the dominant source for science indicators based on publication and citation numbers. The limited data available for social sciences show that the growth rate in SSCI was remarkably low and indicate that the coverage by SSCI was declining over time.
National Science Indicators from Thomson Reuters is based solely on SCI, SSCI and Arts and Humanities Citation Index (AHCI). Therefore the declining coverage of the citation databases problematizes the use of this source.

Keywords Growth rate for science · Growth rate for scientific publication · Databases for scientific publications · Coverage of databases · Coverage of science citation index · Coverage of conference proceedings · Number of scientific journals · Little Science, Big Science · Exponential growth · Doubling time · Cumulative values

A preliminary version was presented at the 12th International Conference on Scientometrics and Informetrics 2009 (Larsen and von Ins 2009). The author sequence is alphabetic and does not reflect relative contributions to the work.

P. O. Larsen (✉), Marievej 10A, 2, 2900 Hellerup, Denmark; e-mail: email@example.com
M. von Ins, Institute for Research Information and Quality Assurance iFQ, Godesberger Allee 90, 53175 Bonn, Germany

Scientometrics (2010) 84:575–603, DOI 10.1007/s11192-010-0202-z

Introduction

In 1961 Derek J. de Solla Price published the first quantitative data about the growth of science, covering the period from about 1650 to 1950. The first data used were the numbers of scientific journals. The data indicated a growth rate of about 5.6% per year and a doubling time of 13 years. The number of journals recorded for 1950 was about 60,000 and the forecast for the year 2000 was about 1,000,000 (Price 1961). Price used the numbers of all scientific journals which had been in existence in the period covered, not only the journals still being published. However, this is not a major source of error. In 1963 Price continued the work using the number of records in abstract compendia for the period from 1907 to 1960. Figure 1 is a copy of the classical figure from Little Science, Big Science, with the data for Chemical Abstracts, Biological Abstracts, Physics Abstracts and the Mathematical Review. From the data Price deduced a doubling time of 15 years (corresponding to an annual growth rate of 4.7%). Price underlined the obvious fact that this growth rate would sooner or later decline, although until then there were no indications of this.
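The correspondence between annual growth rates and doubling times quoted above (5.6% and 13 years, 4.7% and 15 years) follows from compound growth. The following is a minimal sketch of that conversion, assuming a constant annual rate; the function names are illustrative and do not come from the paper.

```python
import math


def doubling_time(annual_growth_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def annual_growth_rate(doubling_years: float) -> float:
    """Constant compound annual growth rate implied by a doubling time."""
    return 2 ** (1 / doubling_years) - 1


# Journal counts: 5.6% per year gives a doubling time close to 13 years.
print(round(doubling_time(0.056), 1))

# Abstract compendia: a 15-year doubling time gives about 4.7% per year.
print(round(annual_growth_rate(15) * 100, 1))
```

Both conversions assume that growth is strictly exponential over the period, which is the simplification used throughout this kind of analysis.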
Price conjectured "that at some time, undetermined as yet but probably during the 1940s or 1950s, we passed through the midperiod in general growth of science's body politic" and that although "It is far too approximate to indicate when and in what circumstances saturation will begin … We now maintain that it may already have arrived" (Price 1963, p. 31). Price also discussed the increasing role of the newcomers in science, first of all the Soviet Union and China. He suggested that the doubling time for science in the Soviet Union might be as low as 7 years and that "one may expect it [China] to reach parity within the next decade or two" and that "the Chinese scientific population is doubling about every three years" (Price 1963, p. 101). Subsequently Price stated: "all crude measures, however arrived at, show to a first approximation that science increases exponentially, at a compound interest of about 7% per annum, thus doubling in size every 10–15 years, growing by a factor of 10 every half century, and by something like a factor of a million in the 300 years which separate us from the seventeenth-century invention of the scientific paper when the process began" (Price 1965). However, a growth rate of 7% per year corresponds to a doubling time of 10 years, growth by a factor of 32 in 50 years and of one billion in 300 years, which is obviously too high. Price's quantitative measurements were not completely correct but his investigations were pioneering. As a result of his work, Research and Development (R&D) statistics and science indicators have become necessary and important tools in the science of science, research policy and research administration. Publication numbers have been used as measures of the output of research, especially academic research and university research. The big databases for scientific publications are the basis for the measurement of publication numbers.
Some of the databases also give the basis for measurements of citations, used as indicators of the quality of publications.
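The arithmetic behind the objection to Price's 1965 statement can be checked directly. The sketch below assumes the rounded 10-year doubling time used in the text for a 7% annual rate; the helper names are illustrative only.

```python
def growth_factor(annual_rate: float, years: float) -> float:
    """Total growth factor after `years` at a constant compound annual rate."""
    return (1 + annual_rate) ** years


def growth_factor_from_doubling(doubling_years: float, years: float) -> float:
    """Total growth factor after `years` given a fixed doubling time."""
    return 2 ** (years / doubling_years)


def implied_annual_rate(factor: float, years: float) -> float:
    """Constant annual rate that produces `factor` over `years`."""
    return factor ** (1 / years) - 1


# A 10-year doubling time: five doublings in 50 years, thirty in 300 years.
print(growth_factor_from_doubling(10, 50))    # 32.0
print(growth_factor_from_doubling(10, 300))   # 1073741824.0, about one billion

# Compounding exactly 7% per year gives a slightly smaller 50-year factor.
print(growth_factor(0.07, 50))

# Annual rate implied by Price's "factor of 10 every half century".
print(implied_annual_rate(10, 50))
```

Under these assumptions, a factor of 10 per half century implies roughly 4.7% per year. This matches the 15-year doubling time Price derived from the abstract compendia, rather than the 7% he quoted.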
In the present study we investigate the growth rate of science from 1907 to 2007. The study is based on information from databases for scientific publications and on growth data recorded in the literature. Using these data we have obtained time series from the beginning of the 20th century to 2007, with the best coverage from 1970 to 2005. The data give information about changes in the growth rate of science and permit a discussion about the internal and external causes of the observed changes. The data have also been used to establish the coverage provided over time by the different databases.

Fig. 1 Cumulative number of abstracts in various scientific fields, from the beginning of the abstract service to given date. From Little Science, Big Science, by Derek J. de Solla Price. Columbia Paperback Edition 1965. Copyright © 1963 Columbia University Press. Reprinted with permission of the publisher

The dominant databases used in R&D statistics are Science Citation Index/Science Citation Index Expanded (SCI/SCIE) (SCIE is the online version of SCI), Social Sciences Citation Index (SSCI) and Arts and Humanities Citation Index (AHCI). Together with other databases these databases are included in the Web of Science (WoS) provided by Thomson Reuters, USA (Thomson Reuters 2008a). Of special interest is Conference Proceedings Citation Index (CPCI) (Thomson Reuters 2008b), which partially overlaps with SCI/SCIE (Bar-Ilan 2009). It is necessary to specify the databases included in a search on WoS. In our work special attention has been paid to the coverage of SCI and SSCI. One of the products from Thomson Reuters is National Science Indicators. This product is based solely on SCI/SCIE, SSCI and AHCI (Regina Fitzpatrick, Thomson Reuters, personal communication). Therefore, the coverage of this source is determined by the coverage of the citation databases.

The main focus of our work is on Natural and Technical sciences, not only because of the importance of these fields but also because publication patterns here are very different from those found in Social Science and Arts and Humanities. We have not included Arts and Humanities, mainly but certainly not only because of the importance of languages other than English in these fields (Archambault et al. 2005). An additional reason is the lack of suitable databases to compare with AHCI. Comparable problems are present for Social Sciences (Archambault et al. 2005). However, results obtained for Social Science using SSCI have validity and are therefore reported.

Based on the data from the databases included in our studies we address the following problems:
1. Is the growth rate of scientific publication declining?
2. Is the coverage by SCI and SSCI declining?
3. Is the role of conference proceedings increasing and is this reflected in the databases?

We are aware that many important changes in publication methods are currently taking place.
These include open access archives, publications on the net, the increasing role of conference proceedings in many fields, the recent expansion of SCIE and SSCI (Testa 2008b), Conference Proceedings Citation Index from Thomson Reuters (Thomson Reuters 2009b) and the rapid expansion of Scopus and Google Scholar. Therefore, our results up to 2007 cannot be extrapolated. However, a vast number of bibliometric and scientometric studies depend on publication numbers up to 2007 and will continue to do so for a long time.

We are also aware that counting publications treats all publications alike without regard to their widely different values. This is the major problem in scientometrics: Can all publications be treated alike and can they be added to provide meaningful numbers? Mathematically all units with common denominators can be added but this does not answer the problem. Statistically it can be hoped (or assumed) that the differences will be neutralized when large data sets are used for addition. However, this cannot be proven (garbage in, garbage out) and does not provide a solution. Citation studies may say something about the value of individual publications but there are large differences between fields and the number of references per publication is steadily increasing in all fields. The "value" of a publication also changes with time (Ziman 1968). In any case, publications are counted and added all over the world for scientometric purposes. The lack of an answer to the major problem posed above is not a deficiency of our publication. Publication numbers are of interest and are used generally in scientometrics and research statistics. It is impossible to combine a system based on giving values to individual publications with a study of the growth rate of science.
Methodology

Chemical Abstracts

Annual data for the total number of records in Chemical Abstracts (Chemical Abstracts Service, American Chemical Society) are available in CAS Statistical Summary 1907–2007. The data include separate values for papers, patents and books. Conference proceedings are also covered in Chemical Abstracts but are included under the heading papers, and there are no separate figures for the number of proceedings. The share of papers slowly increased until about 1950. Since then the share has been relatively constant at around 80%.

Compendex

Annual data for the Total Number of Records in Compendex (Engineering Village, Elsevier Engineering Information) from 1870 to 2007 were obtained on the net using the year in question as the search term and restricting the search to the same year. Compendex covers not only scientific publications in engineering but also other engineering publications. Therefore, comparisons with the other databases must be made with reservations. The values for 2004–2007 differed significantly from values received directly from Compendex. However, the growth rates for the two series were nearly identical. The values from 1988 to 2001 also differed from those reported for Compendex by National Science Foundation (Hill et al. 2007, Appendix, Table 1) but again the growth rates for the two series were similar.

CSA, Cambridge Scientific Abstracts

Annual data from CSA, Cambridge Scientific Abstracts, have been collected for Natural Science from 1977 to 2007 and for Technology from 1960 to 2007. The data include values for All Types, Journals, Peer-Reviewed Journals and Conference Proceedings. However, there are data breaks in most series, partly due to changes in the databases used as basis for the compilations.
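The Compendex comparison above notes that two series can differ in level and still show nearly identical growth rates. The sketch below illustrates why, assuming the growth rate is estimated as the least-squares slope of the logarithm of annual record counts. The paper does not state its fitting method, and the two series here are synthetic, not Compendex data.

```python
import math


def fitted_annual_growth(years, counts):
    """Annual growth rate from a least-squares fit of ln(count) against year."""
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx            # growth per year on the log scale
    return math.exp(slope) - 1   # converted back to a compound annual rate


# Hypothetical series: the same 4% annual growth, with the second series
# 25% higher in level (for example, because of a broader record definition).
years = list(range(2000, 2008))
series_a = [100_000 * 1.04 ** (y - 2000) for y in years]
series_b = [1.25 * value for value in series_a]

print(fitted_annual_growth(years, series_a))
print(fitted_annual_growth(years, series_b))
```

A constant multiplicative offset between two sources shifts the intercept of the log-linear fit but not its slope. Series that disagree on record counts can therefore still agree on the growth rate.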
Inspec

Values for Inspec and the sections of Inspec (Computers/Control Engineering, Electrical/Electronical Engineering, Manufacturing and Production Engineering, and Physics), published by The Institution of Engineering and Technology, Stevenage, Herts., U.K., have been found on the net. The database was searched using the year in question as the search term and restricting the search to the same year. Values were found for the Total Number of Records as well as for Journal Articles, Conference Articles and Conference Proceedings. Inspec Physics is a direct continuation of Physics Abstracts but the change from the value from Physics Abstracts for 1969 to the value from Inspec Physics in 1970 indicates a break in the series. Data were also obtained directly from Inspec but only giving the Total Number of Records from all sources. The data were not identical with those found on the net. However, the numbers of total records found on the net for the period 1969–2005 were only 1.6% higher than those given by Inspec. For the sections Computers/Control Engineering, Electrical/Electronical Engineering, Manufacturing and Production Engineering