A lot of scientific software, and this is especially true in bioinformatics, is “open source” in some way or another. That it seems the community doesn’t quite understand the value of open source is another matter and another post, so for the sake of this post, let’s assume it is. Perhaps more importantly, a good chunk of the software used is developed by academia. In my mind, this raises the bar on code quality and software stewardship. Most importantly, developers of academic software need to think about their applications differently, and funding agencies need to think differently about how they fund software development.
Under the assumption that the majority of code used to do scientific discovery originates in academia, the question to ask is: what responsibility does a scientific software developer have? Should they think of their potential users as customers from the beginning, or is that something that becomes important later in the process? While some open source academic projects, especially ones that have been developed from the ground up, seem to take a customer-centric approach, in general it appears that much code is developed to get published or to get something out there to solve a particular problem. Given the realities of scientific problems, I don’t believe you can assume on day 1 that your applications are going to find use in the broader community, but it is safe to assume that for many applications that is the end goal. The reality is that at first you might be the only one who ever uses the code, especially if it is being developed to solve a specific problem; then it might be your team, then other labs and collaborators, and ultimately a wider community. This means not only that scientific software developers should take a step back and think about the potential scope of their project as it evolves, but also that funding agencies need to rethink how they fund software.
First, publishing software as papers needs to go away. Algorithms should get published; novel architectures should get published. Software should only be published as a note to aid discovery. Funding agencies also need to recognize that funding new software projects for 3-5 years and expecting the developer to know the outcome at the beginning is short-sighted. Software evolves; features and scope evolve along the way. Three years is an eternity for a software project; five... I don’t have a word for how long that is. Funders also need to recognize that there is a greater need for funding as a piece of software grows and is recognized by the community. In a way, that could be looked at as a return on investment: the broader the reach and impact on science, the more successful the initial funding. But you also need the concept of angel funding to get a project off the ground and see how it will evolve. We also need to raise the bar. Should new proposals be funded, or should developers be encouraged to contribute to existing projects? Since there doesn’t seem to be much emphasis on the latter, you see new applications being developed instead of developers getting funding to contribute to existing applications.
The problem with scientific software is more cultural than anything else. As Susan Baxter tweeted
bioinformatician still = PI mentality, not team-based or community
Software development is different: it works on different time scales and it requires a different approach. Note that I am not talking about research code, but about code that’s meant to be used over a period of time, at the least by multiple generations in your research group. The change has to start within the community, but the community isn’t going anywhere without funding agencies changing the incentives.