There seems to be a bit of a debate brewing in the bioinformatics community around code. There have been a number of posts recently, including my own. A recent entrant is a wonderful post by Titus Brown. The concern that Titus raises, and that I see in many comments and discussions, is that a lot of computational science, at least in the life sciences, is very anecdotal and suffers from a lack of computational rigor, and that there is an opaqueness that makes science difficult to reproduce (or replicate, as Titus prefers). I’ll let you read Titus' post for his reasoning and thought process. My concern is where I think computational science is right now. Maybe I am being too negative, but here’s what I think:
- We are accepting mediocrity and a non-open culture. I crave a world of science full of gists and code thrown up on GitHub. Who knows how it might end up being useful, or end up fostering interesting collaboration? But for whatever reason we aren’t ready to do that.
- Actually, I think I do know why. The bioinformatics community is all too aware that the quality of our code is very substandard. Even today, we don’t consider programming skills and computational literacy an essential requirement for biological research. So we have way too many people writing poor code, even if it is code never meant to see the light of day. My biggest concern is that this is driving shoddy science that we can’t trust. There is a difference between the skill set required of an algorithm developer and that required of someone using computational techniques to analyze data. The code bar for the latter should be a lot higher.
- We have a cultural problem because good hacking skills are not exactly the route to scientific success.
A recent example was a case where I was encouraging someone to cite an application by pointing to its source, but others insisted on a paper (which was not even about that particular piece of code). That’s just wrong. We have to do better. I am getting a little tired of excuses about time and a lack of funding. Yes, funding is important, and funding agencies need to realize that we need to encourage the right skill sets. But we have to be responsible for the quality of science and the quality of our work. Perhaps all that work hidden in our machines is good, but right now I don’t believe it.
Note that none of this is about software engineering. There are software products, e.g. repositories, deployment infrastructure, visualization systems, that are different and have an even higher bar. I am exclusively talking about the code we use to actually do exploratory research (good frameworks will make exploration a lot more effective, but that’s another post).
Update: Greg Wilson adds to this discussion as well.