Nineteen Ways of Looking at Statistical Software
Journal of Statistical Software (2011)
- ISSN: 1548-7660
We identify principles and practices for writing and publishing statistical software with maximum benefit to the scholarly community.
JSS Journal of Statistical Software
June 2011, Volume 42, Issue 2. http://www.jstatsoft.org/

Nineteen Ways of Looking at Statistical Software

Micah Altman, Harvard University
Simon Jackman, Stanford University

Abstract

We identify principles and practices for writing and publishing statistical software with maximum benefit to the scholarly community.

Keywords: statistical computation, programming methods.

1. Introduction

People who read journals like this are continually creating snippets of code. We can hardly avoid it. Anyone who performs statistical analysis on a regular basis naturally encounters repetitive tasks that beg for automation, data that needs to be prepared in new ways for analysis, and models that cannot be estimated well with canned statistical packages. So we write code: to automate tasks, manipulate data, extend existing methods of analysis, and create new ones.

Most of this code is never seen by anyone else. Much of it evaporates soon after its task is completed. Without doubt, a good portion of code deserves this fate. However, the rest is useful and generally persists for a time, gradually ossifying or mutating until eventually, either lifeless or monstrous, it is buried in an unmarked grave. This is a lost opportunity: with a small additional effort this code could be shared and lead a long, healthy life in service to the community.

Exemplifying the potential value of statistical software, many of the packages included in this special volume, and the work they enabled, have already received substantial scholarly recognition: wnominate (Poole et al. 2011) is a modern port and update of Poole and Rosenthal's NOMINATE package, which won the Statistical Software Award from the Society for Political Methodology in 2009 and has been used in hundreds of published articles. The authors of Synth (Abadie et al. 2011) received the Gosnell Prize from the Society for Political Methodology for the development and application of the methods that are implemented in that package. The authors of MatchIt (Ho et al. 2011) won the Warren Miller Prize for their article describing the method implemented in that package. BARD (Altman and McDonald 2011) received the Best Research Software award from the Information Technology and Politics section of the American Political Science Association in 2009, and is now being used to support public redistricting contests to promote transparency in government. And the other packages in this volume have supported many other research articles and other software development efforts.

We write this article to identify some principles for writing statistical code that benefits the community. These principles are based on our own experience writing statistical code professionally, our study of practices in the field of software engineering, and our direct observation of many others creating software. They represent good practice, not necessarily common practice.

2. Six motivations for writing statistical software

1. Understand your problem. When you solve a problem by writing software, you test both your knowledge of the problem and the adequacy of your proposed method of solving it. As Knuth (1974) put it: "It has been often said that a person does not really understand something until he teaches it to someone else. Actually a person does not really understand something until he can teach it to a computer, i.e., express it as an algorithm."

2. Solve a problem. Start from a real problem that you need to solve. It doesn't have to be a big problem, but it should be one for which a good solution does not already exist. Before you write code, do some research: check books, documentation, and software archives that could contain solutions to your problem. It is tempting to start writing immediately, or to dismiss existing solutions as inadequate, and many developers follow this impulse. (This is one of the reasons why the majority of software projects started in SourceForge and other software archives are abandoned before reaching a stable release.) But take a long look. Software development tends to involve hidden complexities that are revealed only after a substantial amount of design work has been done and code has been written. And it is common to find, once you have studied the code and documentation closely and communicated with the code's maintainers, that existing open-source solutions can be adapted or extended. So avoid building a solution from scratch if an adequate one exists that can be used or improved. When existing software fails to do what you want, or is too inaccurate, slow, tedious to use, or awkward to integrate into your larger research workflow, it is then time to build.

3. Do good. Code that solves your problem could often be useful to others. This is especially likely whenever you implement a statistical analysis that is not available in canned statistical packages. Think about the problem that your code solves: is it unique, or are there other problems like it that real people are actively trying to solve? Can your code solve these problems too? Can it be extended easily? Change your code if you can easily solve more real problems for real people by doing so.

4. Get credit. For many of us, credit is our most valued currency. Making your code useful and available to others is an excellent way of making it possible for people to try a method that you have developed, and to aid in the replication and extension of work