*(Disclaimer: my background is in materials physics, and it may be different in ...

JohnnyBrown · on July 7, 2011

I'm working in biology now, and a good example of researchers who produce quality, documented code that other people find useful is the Knight group at UC Boulder. They write Python with good docs and support, publish the algorithms they come up with in bioinformatics journals, and people cite them all the time.

Might be worth thinking about why there are incentives there and not elsewhere.

roadnottaken · on July 7, 2011

That is an excellent example and a good point. But, for what it's worth, the Knight lab doesn't really do any biology. Most of their biology is done by collaboration with other labs, and the people in the lab are almost entirely programmers or database people. There's nothing wrong with that, but it's more an example of programmers getting into biology than the other way around.

bluekeybox · on July 7, 2011

Another place where good software work is done is the Broad Institute. The reasons are as follows: (a) Broad can hire the best people in bioinformatics, (b) they are a relatively large organization where focusing on the process pays off. Software ultimately is process (i.e. how you do stuff) and small labs often cannot afford to focus on the process and instead try to reach the goal (i.e. publishable results) more or less directly, regardless of the inefficiencies they may encounter.

In the past, the model of having many small labs in universities was a great idea. Today things are looking a bit different in some fields because larger labs can afford to do more automation (by hiring programmers instead of graduate students).

rweba · on July 7, 2011

My PhD was in computer science and my experience was quite similar.

I wrote probably around 3000 lines of code on 4 separate projects (mostly MATLAB, C and Java). This code was never shared with anyone; my advisors were not interested in the code, all they cared about were the results. To be honest, it wasn't very good code, and I would have a hard time understanding it now (although I could probably figure it out eventually).

And after I graduated I took the code with me and I am the only person who ever verified the working of the code.

This bothers me on some level, since no one can really verify and inspect the results of my publications (unless they tracked me down to ask me for the code, some of which has been lost) - but it is pretty much the norm in my field.

There was an interesting discussion about this on the Theoretical Computer Science Stack Exchange a while back:

http://cstheory.stackexchange.com/questions/5361/code-in-aca...

Bottom line: Yes, we should probably do it (especially in areas where the research is simulation and the code encapsulates all the results) but we probably won't unless we're pushed.

xtracto · on July 7, 2011

I have a PhD in Comp. Sci. too, and continue working in academia.

Regarding your code, you could have just uploaded it to SourceForge or any other open-source repository. I know a guy (Steve Phelps) who did exactly that ( http://sourceforge.net/projects/jasa/ ) with his PhD code.

On a related note, the institute where I am working now has this "great" simulation program (written in-house in C++) on which a lot of publications have been based. However, the code is closed source and thus cannot be verified by third parties.

This is wrong, and actually, a colleague of mine who just started doing her PhD found an error in the simulation program, bad enough that it makes me question the previous research.

In my opinion it should be a requirement that all software related to a publication be made open source before (or at the same time as) the paper is published.

In the traditional research method, computer programs are part of the methods of the research. It is amazing that nowadays researchers can publish results without clearly showing the process they used to arrive at them.

enjalot · on July 7, 2011

Don't forget that a particular code might have 2 or 3 papers' worth in it, so releasing the code after 1 paper could mean getting "scooped" on another paper.

I'm left a little cynical after a Master's in computational science, and I still can't believe that open code is not part of the repeatability doctrine. I suppose my goals are not aligned with most grad students since I have no interest in an academic career (at least not after many years in industry) but I got much more satisfaction from feedback on my blog posts than publication.

Hell, each blog post is its own little publication, and it may not be peer reviewed before it's published, but the number of links to them and Google searches prove that I have more than a few peers who appreciate my contributions.

ahi · on July 7, 2011

As far as I can tell, the two main criteria that make a publication an academic publication are:

1) Peer review is done anonymously and errors are discussed privately.

2) More people "wrote" it than read it.

jedbrown · on July 7, 2011

It might also spawn a new collaboration. There are dishonest people in science, but anyone scooping your work has to weigh the risks of getting called out for it, which is more likely if your software is good and widely used.

I don't believe in the private model, so I release code when it's ready, regardless of where it fits in the publication cycle. It's pretty neat from a reproducibility perspective to submit a paper based on code that is runnable as a tutorial example shipped with a library that the reviewers stand a good chance of already having installed.

FrojoS · on July 7, 2011

From my experience, even in a field as close to computer science as robotics, your analysis is correct.

In my opinion, publishing all code that was used for the paper should be mandatory. Everything else is an obvious violation of the confirmation requirement in the scientific method.

But just like with Open Access, I have little hope that this will be adopted soon on a wide scale. If you are a student, I believe all you can do is get permission to publish your code and do so. Maybe this will hurt tenure but it increases your karma!

shriphani · on July 7, 2011

Matt Might has something on this topic. The CRAPL is what he calls it: http://matt.might.net/articles/crapl/

wisty · on July 7, 2011

Another reason - a bug (and there's always bugs) would probably invalidate the paper, possibly causing a recall. Recalled papers are not seen in a positive light by the science establishment.

Careers could be destroyed if people were held to account.

People in software see this as ludicrous. Of course there's bugs, just update the conclusions, and move on! But that's not how a lot of scientists think.

tincholio · on July 7, 2011

Just publishing the code is not enough, in any case. In order for the research to be verifiable, everything, from raw data to the final paper (and notes on how you went about the process) should be properly documented and available. Something along the lines of this: http://rr.epfl.ch/

Of course, the problem with this is that it's a large amount of work and in most cases probably doesn't have a good ROI.

I have recently started to try this approach of better documenting everything, mostly because I have found it hard to go back to work I did 6 or 7 years ago and understand it (e.g. a bunch of one-off, poorly documented data processing scripts that could, if properly done, save me some time today). I haven't published anything like this yet, but it looks promising.

DennisP · on July 7, 2011

I'm still kind of amazed that when scientific results are based on original software, the source code isn't required by the journals for peer review. How are people supposed to check the results?

CS papers without code are even worse.