Empirical Evidence for the Value of Version Control?

jasonkester · on Dec 17, 2012

Here's a real world example of Version Control saving thousands of dollars of developer time. From three days ago:

Text is rendering blurry in the thing I'm building. It used to look nice and crisp, but now its as though it's all nudged half a pixel off the grid. It looks terrible. And I don't remember having touched the text rendering stuff or the vector code recently. wtf?

So I pull up the change log for the project and it gives me a giant list of everything that has ever been done on it back to version one where it was nothing but "test.html" moving a yellow box around the screen in time to an audio track. I grab a version from a few weeks back, 50 checkins ago, and click "update to this version". Reload the project in a new tab and sure enough: beautiful crisp text. Tab over to the current version: blurry blurriness.

So it's on. Jump forward to yesterday: still bad. 2 weeks ago: good. 1 week ago: bad, and so on. Until I'm looking at one checkin that definitely contains change that broke this. As luck would have it, it was a big one, so I get to try individually updating each file in turn. And since the culprit is nowhere near touching either text rendering or vector stuff, I get to go line by line on it until in disbelief I find the single "if" statement that's been flipped to a configuration that allows opacity animations to occasionally get left at 0.9997 instead of locking them at 1.0.

And we're fixed. In ten minutes.

Now I can honestly say that I don't think I would have found the cause of this bug without version control. All the places I would have thought to look were innocent, and the most innocuous thing in the world turned out to be the culprit. With nothing but tarballs and hand-renamed folders I would have been completely screwed. Not just more time to fix this, but several orders of magnitude more time. And most likely we would have simply resigned ourselves to ship with blurry text and kept an issue in the tracker about it for years.

Now, in case you missed it, this happened to me three days ago. It's not some isolated instance I heard about once. Version Control saves you like this on a daily basis.

If you see a software company that's currently in business and shipping software, I'd say you can use that as empirical evidence of the value of version control.

pilif · on Dec 17, 2012

Also, if you are using git, `git bisect` will do a great job at automating this kind of history search. It'll also do a nice binary search quickly narrowing down the culprit.

Additionally, you said "As luck would have it, it was a big one," - this is what I personally use `git add -p` and `git rebase -i` for before pushing to the public repo: I try to keep the commits as small and self-contained as possible such that finding the faulty commit gets much simpler, because there will be no "big one".

scott_w · on Dec 17, 2012

Keeping commits small is crucial. I've been bitten by the "big commit" enough times to force myself to keep commit diffs to a few lines across a few files.

I even go as far as to put "formatting" changes into one commit, and actual code into a separate commit. Formatting changes tend to be things like "unindent this large block of code", or "strip extraneous whitespace".

They can be larger than usual commits, but they allow me to separate functional changes from non-functional changes. And yes, I've seen a non-functional change break working code before.

steferson · on Dec 17, 2012

>And yes, I've seen a non-functional change break working code before.

Really? how? I'm really curious about this.

scott_w · on Dec 21, 2012

Honestly, I don't know how it happened.

The changeset was unindenting an HTML file with in-line JavaScript by 4 spaces.

The JavaScript stopped working, so I reverted the change and it worked again.

bjourne · on Dec 17, 2012

Do you employ unit testing? If so, that regression would possibly have been caught even earlier.

meaty · on Dec 17, 2012

You can't really unit test blurry text.

bjourne · on Dec 17, 2012

Sure you can. Just render the glyphs into a framebuffer and check the ratio of opaque pixels to semi-transparent ones. The more transparency the blurrier. But I agree with the poster below that it is usually to much trouble and only worth it if your software is deep down in the stack. For example, if you are writing a text rendering system it would make sense for such tests but not otherwise.

tcgv · on Dec 17, 2012

I agree, he would only have caught that if he were unit testing his UI, and usually doing that is more trouble than it's worth.

Spidler · on Dec 17, 2012

The Linux kernel is a public example of a LARGE software project being maintained for a long time (Several years) without any source control, then migrating to BitKeeper, and seeing a large increase in development rate as a result of this.

BitKeeper pressrelease on the subject: http://www.bitkeeper.com/press/2004-03-17.html

Also the discussion and article at LWN as the kernel moves _away_ from BitKeeper may be interesting: https://lwn.net/Articles/130746/

If you look at fex. ChangeLogs for the Linux kernel before and after, and also, after the git change, you can see the rate of development increasing quite a lot with the assistance of proper tooling.

cheald · on Dec 17, 2012

This seems like an awfully - and very academic - strange question. "Have there been controlled studies to see if this thing that millions of developers have personally experienced value from actually has value?"

No, I rather suspect not. What's more curious is the implied suggestion that VC has no value if it has not been empirically proven through formal scientific study.

I also suspect that you will find no papers on the effectiveness of building a house using a hammer and nails and making it up as you go versus using a nailgun and blueprints, but as man has known since he first discovered fire, better tools pretty much always mean faster and higher-quality results.

antidoh · on Dec 17, 2012

There might be something in here: http://dblp.kbs.uni-hannover.de/dblp/Search.action;jsessioni...

rgbrenner · on Dec 17, 2012

Let's say you're working by yourself on your pet project. It's entirely for your personal benefit, and no one else matters. Why would you want version control?

* when you make a change that introduced some bug, you can see exactly what was changed, and roll it back

* when you delete a feature because it's no longer useful, then some time passes, and you realize that feature is useful (or you can use that code for something else), you can just pull up your old code and re-add it.

* you can delete old stuff, knowing that it's not in fact gone.

* you can do more radical experiments by branching your code. While you could do this without version control by just copying your code to a new location.. with VC you could keep all of the file/code history.

I work alone writing software projects. No one has seen my code in 10 years (since I started). I use SVN. Every bit of code gets put in SVN.

If you aren't sure about version control, take my word for it: you want to use version control. It will make your life easier. Just do it, and in a short time, you'll see exactly what I mean.

bartwe · on Dec 17, 2012

Doing a review of the diff of your own work for the day before commit is also a good way to spot issues.

arethuza · on Dec 17, 2012

Not to mention peace of mind when you use an off-site version control repository...

I use an off site repository and a separate off-site backup.

gurkendoktor · on Dec 17, 2012

I like that you didn't list backing up data as a bullet point. I really think SCM is terrible for backing up data, but it often gets listed as an advantage. Forget to add a file or ignore the wrong one, and it's gone. (Though of course the internet is quick to blame it on the user)

brudgers · on Dec 17, 2012

"but I have no experimental evidence to base that decision."

"Empirical" and "experimental" are not synonyms. Experimental evidence constitutes a small range within the set of all things which might count as empirical evidence - e.g. anecdotes are empirical evidence but not experimental evidence.

It is rare for something as vague as version control to undergo formal investigation via experiment. As this thread shows, there is a broad range of often incompatible activities which might be called "version control."

Typically, investigations would be via trial (and error), not experiment. The experience gained from such trials is empirical evidence - evidence from experience is all that "empirical" means.

Although the strength of experimental evidence is typically based upon statistical correlation, the strength of empirical evidence is based on various other types of judgements.

It would not be unreasonable to give the reference to Moore substantial empirical weight, if one attributed significant authority to Moore (perhaps based upon previously finding that Moore's judgement about similar matters corresponded with one's own experience). On the other hand, such an appeal to authority would never be acceptable within an experiment.

In a sense, the idea of experimenting with version control, in a formal sense is a bit absurd - at the point where the benefits or disadvantages appear obvious, the experiment would be abandoned, e.g. at the point where productivity rises sharply or the shipping date is missed.

In a sense, the question seems to miscategorize version control as something other than a tool. Tools are particular. So is version control. There are some programming tasks where it is hammer to nails. Other tasks however, use screws.

Version control, as others in this thread have noted, is not a monolithic thing. It's a cluster concept. Implementations may be hammers, screwdrivers, rivet guns, or glue.

exDM69 · on Dec 17, 2012

The OP considers versioning files but does not address collaborative sharing, branching, merging, etc at all. Dropbox and the other alternatives suggested in the post do not simply offer anything for merging or dealing with conflicts (or do they? what about versioning filesystems?).

When working alone on a fairly straightforward, linear project, you might actually get away using Dropbox. Try to work in a team of any substantial size and there will be trouble.

mattmanser · on Dec 17, 2012

Having used Dropbox for a couple of random personal projects a few years back I can honestly tell you it's not a good idea. It's better than nothing but Github is free, there's absolutely no good reason not to be using source control.

It's simple things like not having more than a month's changes or having commits you can roll back as a group.

icebraining · on Dec 17, 2012

Dropbox's answer to conflicts is dumping a lot of copies of the files, appending "([user]'s Conflicted Copy)" to the filename.

https://www.dropbox.com/help/36/en

gutnor · on Dec 17, 2012

I'm sure they are trying to be useful, but I don't think it is the right choice of question to ask the community of a blog that pretend to bridge CS theory with practice.

For a practicing developer, that is like finding a professional carpentry blog with the question "Can you think of any practical situation where a hammer is useful", so I fear that will taint the opinion of those whose first contact with that blog was through this link.

At least they should have answered themselves - or since that's what they are suppose to do - talk about some theory that could render source control obsolete ?

Tichy · on Dec 17, 2012

I don't usual hit my thumb with a hammer, but I have no empirical evidence that I shouldn't. Maybe I should smash my thumb with a hammer occasionally?

Seriously, does this need research? Just try it, and you'll see some pain go away, not the least the mess of multiple copies of project directories and zip files.

Not that there isn't anything research worthy about version control, but it is entirely possible to recognize it as a good thing without it.

praptak · on Dec 17, 2012

Similar point has been raised in the following paper:

http://www.bmj.com/content/327/7429/1459 "Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials"

Results: "We were unable to identify any randomised controlled trials of parachute intervention."

codeulike · on Dec 17, 2012

We think that everyone might benefit if the most radical protagonists of evidence based medicine organised and participated in a double blind, randomised, placebo controlled, crossover trial of the parachute.

Thats quite a paper, thanks for the ref

Tichy · on Dec 17, 2012

Great link, thanks :-)

tomlu · on Dec 17, 2012

The lack of study for even the simplest, universally acknowledged principles of software engineering has always troubled me. Proving any non-trivial conjecture (for instance the proposition that version control reduces software cost) should be possible, but the cost of doing so is often prohibitively expensive.

Until the evidence somehow materialises I think the best approach is to accept that software engineering is mostly a craft, not a science. That way we can heed the advice of respected software artisans with a clear conscience - all of which would recommend the use of source control.

bjourne · on Dec 17, 2012

I can belive that keeping files under vc is generally beneficial. But what I'm not so sure about and what I wonder if it is worth the trouble making elaborate commits? It's the difference between anal committing and loose committing.

In the first style you ensure that your commit messages properly describe what bugs your checkin fixes, that it follows the formatting convention you have for writing checkin messages, that there are no extraneous whitespace changes producing unnecessary hunks and that you don't accidentally insert a big block of uncommented out code.

Loose committing is when you periodically commit the stuff you are working on. Several times per day, when you feel you've reached something good, you checkin what is in your repository and, at best, you write a comment like "fixed the bug with bla".

Of course, there are several possible shades of gray between totally loose and totally anal committing. What is the right strategy may very well vary depending on how many developers are involved in the project.

For me, I've found that loose works best for my own projects. Prettifying commits and thinking up good descriptions is annoying and can disrupt my flow. I can't be bothered to force myself to do it just "because it's the right thing" and I usually don't revisit old commits anyway. I'd definitely like to hear other developers thoughts on the issue though.

kabdib · on Dec 17, 2012

It depends on the project you're on, and what phase of development you're in.

If you work on your own, do anything you like.

If you work on something being used daily by tens of millions of people, upon which your company's lifeblood depends, you're more careful. Actually checking in code is somewhat anticlimactic; leading up to that, you've run a bunch of tests, gotten a code review, and satisfied yourself that the change is appropriate and necessary. Post-checkin, you monitor the build, make sure that BVTs work, and basically make sure you haven't broken anything.

A good checkin isn't a big deal. It's the stuff around it that matters.

spacemanaki · on Dec 17, 2012

Depending on the size and culture of the team you're working on, I think that requiring detailed commit messages can't really hurt that much. My problem with playing fast and loose with commit messages is that if you have no standards at all, or very loose standards, when things slip a little bit you can end up with utter garbage -- like a string of commits on a feature/topic branch all with the message '...' which are next to useless (not made up).

If you have even slightly stringent standards in place, then when things slip it's not the end of the world. If someone forgets to include the ticket number of a bug or feature, but has relatively detailed commit messages, the messages are still useful. I'm not advocating rejecting pull requests because commit message formatting is off (lines > 80 chars or no short description for the first line, etc), but just paying attention to the stuff that really matters.

I think of commit messages like emails to my colleagues. I wouldn't write junk emails that had poor grammar and spelling, or emails that required you to have unreasonable amount of context to understand, and I don't do that with commit messages either.

anarchitect · on Dec 17, 2012

The company I work for (an online retailer) had no meaningful VCS until after I joined and introduced Git.

Aside from the obvious software collaboration benefits, it's been particularly valuable this week when we have had to manage multiple different releases for the gifting season (promos, merchandising etc) across all six of our sites. All of our post-Christmas sales are all waiting in branches to be deployed.

Nursie · on Dec 17, 2012

Without a vcs you don't have a development process.

It stores the code. It stores all versions of the code. On a rudimentary level, when you break something, you still have the good version.

On a more sophisticated level, with a proper version control system and branching strategy you can -

Have multiple developers working on the same codebase and most changes automatically merged in (though obviously you need some human oversight)

Support multiple different releases of a product from the same tree.

Roll forward patches and fixes from one version to the next, again largely automatically.

...

Maybe some of these are less important for things you run on your own servers rather than stuff that gets deployed at customer sites, but to me it's difficult to imagine working without it.

codeulike · on Dec 17, 2012

I think this would be a useful study, people are saying that its self-evidently obvious that Version Control helps, but its always good to challenge things that are supposedly obvious.

I expect if it was properly studied we'd be able to identify a rough level of complexity below which using version control takes extra time and delivers not much benefit.

e.g. Linux Kernel definitely couldn't be done without version control, whereas some of the 24 hour things I've done solo at hackathons would probably have moved faster without git.

edit: but don't get me wrong, for any serious, multi-developer project I'm definitely using version control

mathattack · on Dec 17, 2012

I think the question is bigger than just what's being asked. Of course we all know that version control is vital. But has there been any formal study that documents how much?

There are a lot of common accepted practices that actually have weak scientific foundations. The waterfall method comes to mind. Asking for the basis of a practice is still useful, even if the answer is just "look around."

_3u10 · on Dec 17, 2012

It's an issue of having more data / insight into your codebase.

Ideally your software has a fitness function that it must pass or which it is compared with other iterations of your software. With each change made to the software it ideally becomes more fit, however we know that some changes cause bugs, reduce fitness, etc.

Source control allows regular human beings to revert to an earlier more fit stage and progress from there.

If it were possible to write a correct version of a program on the first try and never would the fitness function change then there would be no use in source control.

Frequently small programs will be created with out source control. Like a script to print the numbers 1 to 100, Fizzbuzz, etc. These kinds of software generally don't benefit from source control and thus it is not used. Software of simple complexity usually can be written correctly in a few iterations.

When working with multiple programmers the primary added benefit is file syncing and visibility into who made what change so it can be inquired as to the purpose/impact of the change.

In short if someone isn't using source control tell them that the prototype they showed you last week was perfect for a new client, you need what was built last week in the next 15 minutes to demo for a client, but the menu color should be blue instead of red.

OT: I've been trying to figure out a way to auto-create commits everytime a file is saved, then quickly quash those commits into a single new commit when substantial changes have been made. Anyone know of a readymade solution?

steeleduncan · on Dec 17, 2012

OT: seconded. If anyone knows of a way to commit to a "staging" repository, then bundle my broken wip commits to the main repository, I'd love to know.

I'd like to keep the main repo clean with one commit per feature/bugfix, but still be able to commit regularly and walk back/bisect changes whilst working on the feature.

cheald · on Dec 17, 2012

You can do this with git and squashed commmits, but while it seems like a good idea, it's really not - when you're bug-hunting, smaller single-change commits make it far, far easier to track down and kill the bug. Having to crawl through a 3000-line squashed commit to find where a bug was introduced is awful.

A better workflow is to use feature branches to work on a feature, when when you're done with it, rebase and merge it back to master and create a tag for the feature. This gives you an easy timeline of what features or fixes were introduced when, lets you commit often without breaking master, and doesn't destroy your valuable commit history.

maw · on Dec 17, 2012

when you're bug-hunting, smaller single-change commits make it far, far easier to track down and kill the bug.

This is true, but with the caveat that each of those commits needs to basically work. If you have untestable commits, things work less well, and maybe it's worth having slightly larger commits so that they will be testable.

steeleduncan · on Dec 17, 2012

Thank you. This looks like it will work well. The only problem is remembering not to push.

It would be nice to be able to work with two repositories in parallel. One repository for my development and the other one the "official" repository with larger commits, better commit messages and the guarantee that all commits work.

cheald · on Dec 17, 2012

You might be interested in git-flow[1] which is a formalization of this workflow.

[1] http://jeffkreeftmeijer.com/2010/why-arent-you-using-git-flo...

raverbashing · on Dec 17, 2012

I think anyone who questions the need for an "empirical evidence" has never written a single line of code in their life

Because it's in plain sight

No, this is not a matter of "trying to find it"

If someone questions this they're already wrong.

CJefferson · on Dec 17, 2012

While I agree there is a clear need for version control, telling someone looking for evidence that they are already wrong is extremely unhelpful.

Almost everyone had a time when they didn't use version control, at least not properly. I wrote thousands of lines of code on an Amstrad CPC, and the closest I came to version control was cycling between two different tapes when it came to saving my data. When I moved to PC I did the same, with two floppy discs.

raverbashing · on Dec 17, 2012

So you see that even when you had no idea of what was version control you invented a rudimentary form of it?

That's the thing, that is so basic people invent it if they don't have it.

I did similar things, though not on an Amstrad ;)

sliverstorm · on Dec 17, 2012

I agree that the lack of empirical evidence is probably because it's a pretty fundamental idea that is "common knowledge", but that doesn't mean evidence isn't useful and it also doesn't mean the original poster deserves a bashing.

tshaddox · on Dec 17, 2012

> I think anyone who questions the need for an "empirical evidence" has never written a single line of code in their life

Just like anyone who questions the need for empirical evidence of the miasma theory has never worked around sick people a day in their life.

pekk · on Dec 17, 2012

While miasma theory has been falsified, it was based on phenomena which were real and would have been stupid to deny.

If you can 'preserve the phenomena' that cause almost every serious producer of software to use version control, by some mechanism other than version control, with any kind of advantage whatsoever, everyone will be extremely interested because you are in a position to revolutionize the way software is made.

In reality the alternative to version control, particularly with multiple people touching the same files, is an insane mess, which is the reason we stick to these tools. This isn't to say that empirical evidence is not required, rather that the phenomena are so obvious and omnipresent that any idiot can gather them easily.

raverbashing · on Dec 17, 2012

"While miasma theory has been falsified, it was based on phenomena which were real and would have been stupid to deny."

Exactly!

Miasma theory is incorrect, but it doesn't mean you can stand around sick people and won't get sick.

"ather that the phenomena are so obvious and omnipresent that any idiot can gather them easily."

Thank you.