So I have a solid mental model of git, and I understand the theoretical need for the staging area.
However, I find the occasions for using the staging area in practice are few and far between, for the simple reason that I can't test and execute the code that's in the staging area without the code from the working directory also being there. It feels like, after partially staging some of my working directory, the commit would be a blind one, with no guarantee that things are working.
Very rare is the situation that I can break out a list of files over here that are for feature A and some over there for feature B, and never the two shall interact.
I think this is probably what most people struggle with regarding the staging area, without being able to articulate it.
I second this. It wasn't until I adopted this practice that the staging area really made sense to me. I find it helpful not just for making atomic commits, but as a way of remembering what I was actually doing, so that I can write a good commit message.
This has never made sense to me. I've seen others say that they commit only parts of a file. How does this scenario start? Are you working on solving one problem, but then notice some other unrelated issue and fix that too, before committing the first change?
Partly, yes. Or, I'll be working on a task overall, and have to touch multiple files in the process. Then when I'm ready to commit, I review all the modified files on disk, and look for ways to break those down into smaller discrete logical changes. I prefer to avoid "big bang" commits as much as possible, because smaller individual commits are easier to inspect, easier to back out if necessary, and provide a better "story" when inspecting a file's history sometime down the road.
But then, you either never run/tested those smaller individual commits, or you have to do extra work (stash changes, test, restore stash) to do that.
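For what it's worth, that "extra work" loop is fairly mechanical with `git stash --keep-index`, which stashes only the unstaged changes and leaves the staged ones in the working tree, so you test exactly what you're about to commit. A minimal sketch in a throwaway repo (file names invented):

```shell
# Sketch: test exactly what's staged, using stash --keep-index.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name you
echo a > f && echo b > g && git add . && git commit -qm base

echo "part 1" >> f && git add f     # staged: the first logical change
echo "part 2" >> g                  # unstaged: the rest of the work

git stash -q --keep-index           # set aside only the unstaged part
# ...run the test suite here against exactly what will be committed...
git commit -qm "part 1"
git stash pop -q                    # bring back the rest and carry on
```

After the pop, the working tree is back to where it was, minus the change that is now safely committed.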
I do not see why a source control system should make it easier to make a commit that hasn’t ever existed on disk and thus cannot have been tested.
I think the better model would be to stash your changes and have a diff editor between the on-disk working copy and the stashed version that allows you to commit a set of changes as several smaller, more coherent commits.
That wouldn’t guarantee that each of those intermediate commits gets tested or even built, but it would guarantee that each smaller commit is in the on-disk copy at some time.
> But then, you either never run/tested those smaller individual commits
Not necessarily. One nice option that the git rebase command has is --exec (which can be specified multiple times). So you can run a rebase and have git execute a command (like running a test suite) for each commit in the branch. If any commit fails, the rebase process will stop and let you amend the commit to fix the issue.
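A runnable sketch of that, in a scratch repo (the check command here is a stand-in for a real test suite):

```shell
# Sketch: re-run a check against every commit being rebased.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name you

echo one > file  && git add file && git commit -qm "commit 1"
echo two >> file && git add file && git commit -qm "commit 2"
echo three >> file && git add file && git commit -qm "commit 3"

# Replay the last two commits, running the command after each one.
# A failing command pauses the rebase so that commit can be amended.
git rebase --exec "grep -q one file" HEAD~2
```

In a real branch you'd write something like `git rebase --exec "make test" origin/main`, and use `git commit --amend` plus `git rebase --continue` whenever the rebase stops on a failing commit.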
> or you have to do extra work (stash changes, test, restore stash) to do that.
I've found that it's easier to write and locally test a given feature and then incrementally stage parts of it and create commits before pushing the code up for review. To me, that's easier than just making a large commit and then trying to split it out into a better set of commits after the fact.
For example, I may write a new method and then call it several places in the code. So my first commit would be to add the new method along with its unit tests and my second commit would be to add calls to it in the code base and update the associated integration tests (if necessary).
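A sketch of that two-commit split in a scratch repo (file names and contents invented; in a real mixed working copy, `git add --patch` would let you pick just the call-site hunks for the second commit):

```shell
# Sketch: method + tests first, call sites second.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name you

# Commit 1: the new method along with its unit tests
mkdir -p src tests
cat > src/helpers.py <<'EOF'
def greet(name):
    return "hello, " + name
EOF
cat > tests/test_helpers.py <<'EOF'
from src.helpers import greet
assert greet("x") == "hello, x"
EOF
git add src/helpers.py tests/test_helpers.py
git commit -qm "Add greet helper with unit tests"

# Commit 2: the call sites in the rest of the code base
cat > src/app.py <<'EOF'
from src.helpers import greet
print(greet("world"))
EOF
git add src/app.py
git commit -qm "Call greet from the app"
```

The result is two small, independently reviewable commits instead of one mixed one.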
One common scenario is that I'm working on one problem, and in the process of solving that issue do some refactoring of related code. In this case, I want to commit the refactoring (which does not change the program's behaviour) before committing the changes that do change the program's behaviour.
I typically then send that first refactoring commit to GitHub (on its own branch) so that it gets full CI test coverage. And then continue working on the fix/feature while it runs.
One use case is to exclude extra lines of the file you don't want to commit. For example, I might have some debug print statements in my file that I want to keep in my local copy of the file while testing, but I don't want to include in the commit I push up for review.
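A sketch of that in a scratch repo. Interactively you'd use `git add --patch` and answer 'n' to the debug-print hunks; here the same end state is built non-interactively by staging the file before adding the debug line:

```shell
# Sketch: commit the real fix, keep the debug print locally.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name you

printf 'real work\n' > app.py
git add app.py && git commit -qm "base"

# Stage the real fix first...
printf 'real work\nthe fix\n' > app.py
git add app.py
# ...then the debug line lands only in the working copy.
printf 'print("debug")\n' >> app.py

git commit -qm "The fix"
git show HEAD:app.py    # the committed file has no debug line
cat app.py              # the working copy still does
```

The commit that goes up for review is clean, while the local copy keeps the scaffolding for continued testing.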
> Are you working on solving one problem, but then notice some other unrelated issue and fix that too, before committing the first change?
Almost. Most often it's:
- Working on solving problem A
- Notice problem B
- Start to solve problem B
- Notice I'm getting distracted from A, and return to finish it.
- Want to commit my fix for A, but don't want to lose or forget the partial work on B.
There are two different approaches I might take in this situation, depending on whether B is related to A.
1. If they are related (eg, B depends on A), use `add --patch` to commit A, then finish and commit B.
2. If unrelated, use `git stash --patch` to stash B, then commit A, then switch to a different branch to finish B.
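A sketch of option 2 in a scratch repo. To keep it runnable unattended, this uses a pathspec with `git stash push` instead of the interactive `--patch` prompts; all file and branch names are invented:

```shell
# Sketch: stash the partial work on B, commit A, finish B elsewhere.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name you
echo a > featureA.txt && echo b > featureB.txt
git add . && git commit -qm "base"

# Mixed working tree: a finished fix for A, half-done work on B
echo "fix for A" >> featureA.txt
echo "half-done B" >> featureB.txt

git stash push -qm "WIP: problem B" -- featureB.txt  # set B aside
git commit -qam "Fix problem A"                      # commit A alone

git switch -qc fix-problem-b                         # finish B on its own branch
git stash pop -q
```

The commit for A contains only A's changes, and the partial work on B survives on a branch of its own.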
Honestly, I see the point of both stash and staging, but not of having both together. Too many tools for the same job. On my long list of projects to do is a git porcelain that combines some of these concepts (eg, stash and working directory, which would be tied to a branch):
- Each branch would have a single stash.
- When you check out a new branch, all uncommitted changes are automatically stashed.
- If the branch you're switching to has anything stashed, that stash gets popped.
- Any current workflow that involves stashing can be replicated by using a branch instead of a stash.
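A toy sketch of that porcelain idea, built on top of today's stash by tagging entries with the branch name in their message. Purely illustrative, not a robust tool (branch names with regex characters would confuse it, for one):

```shell
# Hypothetical porcelain: one auto-stash per branch, popped on switch.
switch_with_stash() {
    target=$1
    current=$(git symbolic-ref --short HEAD)
    # Stash any uncommitted changes under the current branch's name
    if ! git diff --quiet || ! git diff --cached --quiet; then
        git stash push -qm "autostash:$current"
    fi
    git switch -q "$target"
    # Pop the target branch's stash, if it has one
    entry=$(git stash list | grep -m1 "autostash:$target\$" | cut -d: -f1)
    [ -n "$entry" ] && git stash pop -q "$entry"
    return 0
}
```

With this, `switch_with_stash other` parks your uncommitted changes under the current branch and restores whatever was parked under `other`, so each branch carries its own working-directory state.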
This way, branches can be thought of as "state of the working directory", which is more intuitive with the branching tree model, imo; commits are a snapshot of the repo at that point in time; and the staging area is just a way to choose what should be included in those commits.
> Git’s workflow wouldn’t even be sane without the staging area. This is what allows you to fix mistakes and make your work presentable for remotes.
I did exactly the same diff/tidy/diff workflow when I used p4 and svn, neither of which make a distinction between "working directory" and "staging area".
Right, but p4 & svn have “checkout” which is similar to staging. Staging is part of what we get because we can edit files without having to checkout / open for edit.
P4 and svn don’t have a strict commit parentage, which is why you can push commits in those systems in any order. Git’s strict concept of parentage is what makes the staging area so important for keeping your workflow similar to p4 & svn workflows. Without a staging area, you’d either have to always fix mistakes with new commits, which is bad, or rewrite already pushed history, which is worse.
The terminology is a bit different - unless configured with mandatory locking (essential for some workflows) you don't have to open for edit. You just edit stuff and it goes in the "default changelist", roughly equivalent to automatic staging.
> Without a staging area, you’d either have to always fix mistakes with new commits
Mistakes at what point? In the normal svn workflow you can review with svn diff, then when you're happy do svn commit; it's just that there's no local place you're committing to. In both cases there's a critical point, either "svn commit" or "git push".
> unless configured with mandatory locking ... you don’t have to open for edit.
I’d guess you’re leaning toward talking about svn, which I don’t remember very well, and I am leaning toward talking about p4, which always does mandatory locking.
You’re right the terminology is different between these different systems, I’m just pointing out that the git staging area has what you can think of as some equivalences in the other systems. Or, you can think of it as tradeoffs. Either way, the git staging area is something that helps you pretend like you’re using svn or p4 in the sense that it helps support editing multiple changes at the same time before pushing them to a server.
> Mistakes at what point?
With git I’m referring to mistakes between commit and push. But there’s a philosophical difference here that I glossed over. With git it’s easier to commit early and often than it is with svn or p4. With svn & p4 it’s easier to lose your work because version control doesn’t know anything about it before you push. If I make micro-commits, which I want and I like, then I put more “mistakes” along the way into my local history, and I can use the staging area to clean everything up before I push. With svn & p4, you make those mistakes and do the cleanup without ever telling the version control, and you run a greater risk of losing that work while you do the cleanup.
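A sketch of that local cleanup in a scratch repo, using a `fixup!` commit plus an autosquash rebase to fold the "mistake" into the commit it fixes, before anything is pushed (commit messages invented):

```shell
# Sketch: micro-commits cleaned up locally before push.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name you
echo base > f && git add f && git commit -qm "base"

echo work > g && git add g && git commit -qm "Add feature"
echo oops >> g && git add g && git commit -qm "fixup! Add feature"

# Fold the fixup into "Add feature"; GIT_SEQUENCE_EDITOR=true accepts
# the generated todo list so this runs unattended.
GIT_SEQUENCE_EDITOR=true git rebase -qi --autosquash HEAD~2
git log --oneline   # history: base + one clean "Add feature" commit
```

Everything happened under version control the whole time, so the reflog can recover any intermediate state if the cleanup goes wrong.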
Committing isn't a commitment. After making the first commit, you can use the `git stash` command to put the rest of your changes aside, and go through the normal test->amend loop until you're happy with that first commit. Then you just retrieve your other changes from the stash to make your second commit.
It's also possible to do this without the stash command, by making both commits right away, and testing them later. However, that would involve rebasing(?) your second commit on top of any changes you end up making to your first commit, so using the stash makes more sense to me personally.
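A sketch of that commit-then-stash loop in a scratch repo (file names invented; the amend here stands in for whatever fixes the test run turns up):

```shell
# Sketch: commit first, stash the rest, test and amend, then pop.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name you
echo a > f && echo b > g && git add . && git commit -qm base

echo "change A" >> f && git add f && git commit -qm "First commit"
echo "change B" >> g                  # the remaining work

git stash -q                          # set the second change aside
# ...test exactly what the first commit contains; a fix turns up...
echo "small fix" >> f
git add f && git commit -q --amend --no-edit

git stash pop -q                      # bring the second change back
```

Because committing isn't a commitment, the amend loop can run as many times as needed before the stash is popped for the second commit.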
Fwiw, stash can get you into trouble more easily than commit. It’s no more typing to commit or branch, so I recommend preferring those to stash when it makes sense, or when you’re playing with changes you don’t want to lose. Stash is handy for a bunch of things, so use it by all means, just remember that there’s often an equivalent way that is just as easy and much safer.
“If you mistakenly drop or clear stash entries, they cannot be recovered through the normal safety mechanisms.”
One of the best things about git is how big the safety net is, as long as you tell git about your changes. Almost any mistake can be fixed, so why use features that aren’t sitting over the safety net?
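The reflog is the main strand of that net. A sketch of "losing" a commit to a hard reset and getting it back, which is exactly the kind of recovery that dropped stash entries don't get:

```shell
# Sketch: recovering a hard-reset commit via the reflog.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name you
echo v1 > f && git add f && git commit -qm "v1"
echo v2 >> f && git add f && git commit -qm "v2"

git reset -q --hard HEAD~1      # "lose" the v2 commit
git reflog                      # ...but the reflog still points at it
git reset -q --hard "HEAD@{1}"  # one more reset brings it back
```

Anything that was ever committed stays reachable through the reflog until it expires, which is why committing early and often keeps you over the net.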
You're adding a feature to your proggie. That involves modifying the main bits to add the feature and, say, adding a couple of interfaces to internal library modules.
Split out the changes to the library modules into separate commits: it's safe because nothing uses them yet, they're logically separate from the feature changes (although they don't appear to have a justification without the feature), the log will be marginally cleaner, and git bisect will have more granularity.
Why is the staging area needed in such a case? In more traditional systems, you'd just do, say, "svn commit library/" and then commit the rest. (And you could do just the same in git, too, without ever seeing the staging area.)
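Indeed, git supports exactly that svn-style pathspec commit, bypassing explicit staging. A sketch in a scratch repo (paths invented):

```shell
# Sketch: committing by pathspec, svn-style, with no explicit staging.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name you
mkdir library
echo lib > library/util.c && echo app > main.c
git add . && git commit -qm "base"

# Mixed working tree: library interfaces plus the feature itself
echo "new interface" >> library/util.c
echo "use feature" >> main.c

# Like "svn commit library/": commit only that directory's changes
git commit -qm "library: add interface" -- library/
git commit -qam "main: add feature"
```

The first commit picks up only `library/`, leaving `main.c` modified for the second, all without touching `git add`.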