(Disclaimer: my background is in materials physics, and it may be different in other fields. But I doubt it.)
Unfortunately there is very little direct incentive for research scientists to write or publish clean, readable code:
- There are no direct rewards, in the tenure process or otherwise, for publishing code and having it used by other scientists. Occasionally code which is widely used will add a little to the prestige of an already-eminent scientist, but even then it rarely matters much.
- Time spent on anything other than direct research or publication is seen as wasted time, and actively selected against. Especially for young scientists trying to make tenure, also the group most likely to write good code. Many departments actually discourage time spent on teaching, and they're paid to do that. Why would they maintain a codebase?
- Most scientific code is written in response to specific problems, usually a body of data or a particular system to be simulated. Because of this, code is often written to the specific problem with little regard for generality, and only rarely re-used. (This leads to lots of wheel re-invention, but it's still done this way.) If you aren't going to re-use your code, why would others?
- If by some miracle a researcher produces code which is high-quality and general enough to be used by others, the competitive atmosphere may cause them to want to keep it to themselves. Not as bad a problem in some fields, but I hear biology can be especially bad here.
- Most importantly, the software is not the goal. The goal is a better understanding of some natural phenomenon, and a publication. (Or in reverse order...) Why spend more time than absolutely necessary on a single part of the process, especially one that's not in your expertise? And why spend 3x-5x the cost of a research student or postdoc to hire a software developer at competitive rates?
I went to grad school in materials science at an R1 institution which was always ranked at 2 or 3 in my field. I wrote a lot of code, mostly image-processing routines for analyzing microscope images. Despite it being essential to understanding my data, the software component of my work was always regarded by my advisor and peers as the least important, most annoying part of the process. Time spent on writing code was seen as wasted, or at best a necessary evil. And it would never be published, so why spend even more time to "make it pretty"?
I'm honestly not sure what could be done to improve this. Journals could require that code be submitted with the paper, but I really doubt they'd be motivated to directly enforce any standards, and I have no faith in scientists being embarrassed by bad code. Anything not in the paper itself is usually of secondary importance. (Seriously, if you can, check out how bad the "Supplementary Information" on some papers is.) But even making bad code available could help... I guess. And institutions could try to more directly reward time put into publishing good code, but without the journals on board it may be seen as just another form of "outreach"--i.e., time you should have been in lab.
I did publish some code, and exactly two people have contacted me about it. That does make me happy. But many, many more people have contacted me to ask about how I solved some problem in lab, or what I'm working on now that they could connect with. (And are always disappointed when I tell them I left the field, and now work in high-performance computing.) Based on the feedback of my peers... well, on what do you think I should've spent my time?
I'm working in biology now, and a good example of researchers who produce quality, documented code that other people find useful is the Knight group at CU Boulder. They write Python with good docs and support, publish the algorithms they come up with in bioinformatics journals, and people cite them all the time.
Might be worth thinking about why there are incentives there and not elsewhere.
That is an excellent example and a good point. But, for what it's worth, the Knight lab doesn't really do any biology. Most of their biology is done by collaboration with other labs, and the people in the lab are almost entirely programmers or database people. There's nothing wrong with that, but it's more an example of programmers getting into biology than the other way around.
Another place where good software work is done is the Broad Institute. The reasons are as follows: (a) Broad can hire the best people in bioinformatics, (b) they are a relatively large organization where focusing on the process pays off. Software ultimately is process (i.e. how you do stuff) and small labs often cannot afford to focus on the process and instead try to reach the goal (i.e. publishable results) more or less directly, regardless of the inefficiencies they may encounter.
In the past, the model of having many small labs in universities was a great idea. Today things are looking a bit different in some fields because larger labs can afford to do more automation (by hiring programmers instead of graduate students).
My PhD was in computer science and my experience was quite similar.
I wrote probably around 3000 lines of code on 4 separate projects (mostly MATLAB, C and Java). This code was never shared with anyone; my advisors were not interested in the code, and all they cared about were the results. To be honest it wasn't very good code, and I would have a hard time understanding it now (although I could probably figure it out eventually).
And after I graduated I took the code with me and I am the only person who ever verified the working of the code.
This bothers me on some level, since no one can really verify and inspect the results of my publications (unless they tracked me down to ask me for the code, some of which has been lost), but it is pretty much the norm in my field.
There was an interesting discussion about this on the Theoretical Computer Science Stack Exchange a while back:
Bottom line: Yes, we should probably do it (especially in areas where the research is simulation and the code encapsulates all the results) but we probably won't unless we're pushed.
I have a PhD in Comp. Sci. too, and continue working in academia.
Regarding your code, you could have just uploaded it to SourceForge or any other OpenSource repository. I know a guy (Steve Phelps) who did exactly that ( http://sourceforge.net/projects/jasa/ ) with his PhD code.
On a related note, the institute where I am working now has this "great" simulation program (written in-house in C++) on which a lot of publications have been based. However, the code is closed source and thus cannot be verified by third parties.
This is wrong, and actually, a colleague of mine who just started doing her PhD found an error in the simulation program, bad enough that it makes me question the previous research.
In my opinion it should be a requirement that all software related to a publication be made open source before (or at the same time as) the paper is published.
In the traditional research method, computer programs are part of the methods of the research. It is amazing that nowadays researchers can publish results without clearly showing the process they used to arrive at them.
Don't forget that a particular piece of code might have 2 or 3 papers' worth in it, so releasing the code after 1 paper could mean getting "scooped" on another paper.
I'm left a little cynical after a Master's in computational science, and I still can't believe that open code is not part of the repeatability doctrine. I suppose my goals are not aligned with most grad students since I have no interest in an academic career (at least not after many years in industry) but I got much more satisfaction from feedback on my blog posts than publication.
Hell, each blog post is its own little publication, and it may not be peer reviewed before it's published, but the number of links to them and Google searches prove that I have more than a few peers who appreciate my contributions.
It might also spawn a new collaboration. There are dishonest people in science, but anyone scooping your work has to weigh the risks of getting called out for it, which is more likely if your software is good and widely used.
I don't believe in the private model, so I release code when it's ready, regardless of where it fits in the publication cycle. It's pretty neat from a reproducibility perspective to submit a paper based on code that is runnable as a tutorial example shipped with a library that the reviewers stand a good chance of already having installed.
From my experience, even in a field as close to computer science as robotics, your analysis is correct.
In my opinion, publishing all code that was used for the paper should be mandatory. Everything else is an obvious violation of the confirmation requirement in the scientific method.
But just like with Open Access, I have little hope that this will be adopted soon on a wide scale. If you are a student, I believe all you can do is get permission to publish your code and do so. Maybe this will hurt tenure but it increases your karma!
Another reason: a bug (and there are always bugs) would probably invalidate the paper, possibly causing a recall. Recalled papers are not seen in a positive light by the science establishment.
Careers could be destroyed if people were held to account.
People in software see this as ludicrous. Of course there are bugs; just update the conclusions and move on! But that's not how a lot of scientists think.
Just publishing the code is not enough, in any case. In order for the research to be verifiable, everything, from raw data to the final paper (and notes on how you went about the process) should be properly documented and available. Something along the lines of this: http://rr.epfl.ch/
Of course, the problem with this is that it's a large amount of work and in most cases probably doesn't have a good ROI.
I have recently started to try this approach of better documenting everything, mostly because I have found it hard to go back to work I did 6 or 7 years ago and understand it (e.g. a bunch of one-off, poorly documented data processing scripts that could, if properly done, save me some time today). I haven't published anything like this yet, but it looks promising.
I'm still kind of amazed that when scientific results are based on original software, the source code isn't required by the journals for peer review. How are people supposed to check the results?