Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Ask HN: Do you rewrite pull requests?
81 points by phendrenad2 on May 1, 2022 | hide | past | favorite | 71 comments
Just wondering how other open-source developers deal with valid by badly-written pull requests.

I usually work closely with the PR author, giving them multiple rounds of suggestions until it's up to our project's coding standards. Sometimes the PR author "drops out" of the process in frustration, as they were just trying to give us a quick fix, and didn't necessarily want to do a lot of work. In those cases, I usually thank them, and rewrite the PR (or heavily modify it). But I'm hesitant to do this in all cases, because often the PR author is excited to see their name as part of the contributors list, and/or to see their own code used in the project.

How so you deal with this kind of thing?



If the PR is technically correct, just violates coding style or smells a bit like of hacky trash, I merge it anyways. As long as it's not breaking the build, functionality, or bisectability, just go with it. You can always chase it with a cleanup commit of your own.

It's up to the PR's author to decide if they care enough to do better next time. By accepting their work and landing the change you've demonstrated an appreciation for their effort with open arms, and I feel this makes it more likely they will contribute in the future and care about doing better.

Maybe make a comment to that effect before merging as-is, creating the opportunity to clean it up themselves first.

If it's coming from a regular contributor I'm far less charitable, this approach I mostly reserve for first-time/infrequent contributors.


Pieter Hintjens (author of ZeroMQ) wrote and spoke a lot about merging liberally. Great advice IMHO.

> merge pull requests liberally and get new contributors’ “vision of progress on record” so that they immediately become members of the community. Worry about fixing the progress later.

See also https://jeremyfelt.com/2016/03/12/pieter-hintjens-building-o...


I've found "importance scoring" to be a useful trick in improving my relationships with teammates and open-source contributors, while preserving my sanity. When I give feedback, I add a score from 1-10 to my comments, where 1 is "trivial nit, take it or leave it," and 10 is "there is no way I will merge this because it's fundamentally broken or would otherwise be a huge mistake." It helps me balance my perfectionist nature against the need for forward progress, and contributors appreciate this guidance because it helps prioritize where further efforts are needed.


I just go with an important or not important for something I understand, I mean 6/10 I guess is important enough you don't want to merge it unless someone fixes it? And for something I don't understand I of course ask for clarification (by not understand I mean can you clarify why you did this bit?)


In practice, the histogram looks like a camel back :)


I find this an interesting concept. How do you do this in practice add a number to every comment and explain your system somewhere?


Yes, that’s right. Just a (1/10) or (8/10) at the end of my note.


This seems like the more ethical approach with regards to giving credit to the original author.

I've seen projects (on GitHub, won't call them out) get bad reputation when the maintainer implements code previously submitted as a PR.


> won't call them out

I will. vim/vim has been doing this for years, with Bram basically ignoring the whole point of GitHub, and putting his name on every commit. I love Vim, but that's not acceptable.


Conversely I think that's fine. There's no obligation to accept every PR as-is. Ultimately Bram volunteers to be responsible for supporting the project, so he can do whatever he wants.

I don't understand how this is the whole point of GitHub. I feel like GitHub has exacerbated misguided expectations that every contribution is worthy and that being rejected should feel bad.


The entire point of github is collaboration. If you're going to take others people code, you should at least attribute them. If you look at neovim where I imagine most serious vim users myself included have migrated to, you can see the projects first order of business:

Simplify maintenance and encourage contributions

One project has 700+ contributors, the other is less than 120. So no, it's not looking "fine" from a user perspective.


At least he should give credit to the original author.

Not aware there would be a standardized header for that, but you can always invent your own ones.

   Original-idea-from: a b <a.b@c.com>


There is "Co-authored-by" which is supported on GitHub [1] and seems appropriate if the maintainer is basing the solution on someone's code.

[1] https://github.blog/2018-01-29-commit-together-with-co-autho...


I think it's worth investing some time into people, to teach them about testing and documentation standards. If they can't / won't play along, my first approach is to do a PR into their branch. If that stalls out, I'll make a duplicate* PR from my fork of theirs, maintaining their credit and keeping my source tree up to standards.

Edit: * that almost never happens, it appears that github now supports changing the base of a PR


I just wrote a comment where I outline almost the same tactics. I think this is the most sensible strategy.


It's not even about "badly written". Sometimes it's just really hard to get a patch into the shape the maintainer wants (tests, architecture, general feel, how it slots into a longer term vision that may not be clearly defined, but the maintainer knows it when they see it) and takes a lot of back and forth. I don't begrudge that, after all, it's their project.

But it would be nice if there's a way to say "look, it is what it is, and while I'm willing to make an effort, I'm not living and breathing this codebase, and I'm not planning to become a major contributor going forward, much as I like the project. Playing 20 questions is exhausting on both ends and if you'd rather just take it and completely recast it how you want, I honestly don't mind and won't take offense" without sounding like "here's some random code, IDK if it works really, good luck, lol".

I don't even care about the authorship, I'd rather have the fix than get credit for a moribund PR. Half the time I contribute pseudonymously anyway.


Yeah, this is how I feel. When I make a PR, I do not do it for the credit, it’s because I really need/want a specific fix/change in the codebase without the overhead of having to maintain a fork myself.


I think that way to say that is open a PR, and just let it sit open. Eventually the repo owner will either merge it, close it, or sometimes ask you to do something to it. In the last case, I often just politely decline. It's the flip side of the idea that publishing an open source project doesn't obligate you to support it — likewise, submitting a PR to a project doesn't obligate you to do any more work on it.


Right, but the point is I don't want the maintainer to avoid or delay handling it because they would like to change something, but either don't want to offend or don't want to go through the tedious rigamarole of editing the file as they want by using the ensemble of GitHub, the Internet, me and my text editor as a very slow and inefficient text editor (especially as I probably don't understand all their comments at first because they know the system inside out, and I don't).

It's not that I will refuse to make changes, I will, within reason, it's that I'm also fine with the maintainer just taking over at any point and finishing it however they want, without any further back-and-forth. And I don't even care whose name is on it.


Unless the PR is really broken, I merge it so they get "credit" for it and then rewrite/refactor/add tests as needed. I do this for a few reasons:

1. I've been on the other side of spending time trying to figure out the arcane organizational details, existing utils functions, etc only to have there be things I missed. This is deeply frustrating.

2. I'm deeply appreciative that somebody took the effort to do something on project of mine! I recognize they did that for free out of good will and don't want to make it harder for them.

3. I know the codebase much better. I can fix up/trim/format/etc properly 10x faster than the contributor can. I'd rather spend 15 mins of my own time than try to suck away an hour or two of their time. I value my time very highly but I also try to be generous and loving, and that requires some giving of my most valuable asset: my time.

Super low effort/spammy PRs do not count. I don't merge those because I don't want to incentivize more of them.


My typical workflow - in private business repos, but for the same reasons specified in OP - is as follows:

1. Fork the PR branch. 2. Make the changes I want to see upstream. 3. Create a PR into the author's branch. 4. Ask the author to review my changes. 5. When my PR is complete, finally review and merge The original PR.

This gives me the opportunity to explain my changes in commit messages, and gives the author an opportunity to push back or argue actual functionality changes I made if it's not all coding style. Now the author has license in accepting my changes. It avoids the cumbersome comment/resolve feedback loop, shows the author that their contribution is important and I'm willing to put time into making it right, and ultimately gives them credit when their PR is merged.


The PR into their branch sounds nice, but not always necessary. If you forked the PR branch, threw a commit on top, opened and accepted a PR from your-branch to main and declined theirs, would that work? They get their commit in the history, you make the changes you want without back-and-forth. You could link from their-declined PR to your-accepted PR.

I guess that's messy if it's linked to issues and the connotation of "declined PR" isn't as nice.


The `Co-authored-by` trailer is your friend here. Even if you completely rewrite their code, give them coauthor credit in the commits and they’ll get the satisfaction of seeing their face in the commit history without the hassle of actually addressing all your feedback. https://docs.github.com/en/pull-requests/committing-changes-...


Wouldn't making commits on top of their (untouched) commits fine too? When creating PRs, there is a checkbox to allow repository committers to push into the PR branch.

That preserves the original author credit (and any GPG signatures), but moves things along. If that's not enabled, making a new PR based on the same commits works too.


If only two people are involved in the PR you don't even need "Co-Authored-By": unless you --reset-author, it's not going to get blown up by updating the commits, only "committer" is.


For squash merges, GitHub will autofill the Co-Authored-By in the merge commit, but it is up to you to keep it there.


Not for a GNU project, where the PR author didn't file a copyright assignment via snail mail yet. Such PR's could need months.


I try to help contributors to make it so their commits can be merged without any changes. For some this can be annoying in the beginning, but the goal is that they learn to write quality patches and that I will have less work getting their work merged. I hope that this is a win-win. There are times where I do some changes (rebasing, small fixes, rewording of commit messages), but I will try to ensure their name is in the Author: field in the commit messages.

As mentioned by sph: some projects have maintainers that have much stricter standards for outside contributors that for themselves. Speaking as a maintainer of a project that accepts contributions through PR requests, I am afraid I do indeed have such a double standard, although I try to avoid it as much as possible. I had to set some rules for myself:

- Contributors must fix outright bugs in their commits.

- Style issues are avoided by having a coding style enforced by a code formatting tool. This saves everyones time.

- If the commit is bug free (passes test suite and code review) and is a net improvement, accept it, regardless of my own plans for the project.

- If there is more that can be done to make it ideal, this can be done in a future pull request, or perhaps I will add those changes later myself.

I also try to hold myself to the same standards as I hold contributors to, and now often create a development branch myself for features so they get the same CI testing as pull requests, although I still sometimes bypass this for quick fixes.


Just wanted to say thanks for sharing these rules, I think they’re super helpful.

Do you have any particular rules about performance? I worry that accepting net improvements on functionality alone could result in “death-by-a-thousand-cuts” performance issues down the line (assume it’s a project where performance is one of the deliverables).

Would you ask a contributor to perform benchmarks or some other performance validation, for example?


It would depend on the project. If performance is important, it should indeed by validated, but then it would greatly help if the project also has a performance test suite that contributors could run locally, and compare their change against the baseline, and a reviewer could then also verify it themselves.

What you do with the performance results really depends on the kind of patch they want to commit. Is it an important bug fix? Then I would get that merged first and worry about fixing performance afterwards. And then you have those patches that improve performance in one area at the cost of decreasing it in another. In that case you might have to use your experience in what is more important to decide whether to accept or reject it.


Here's a relevant quote from Google's code review guidelines on speed https://google.github.io/eng-practices/review/reviewer/speed...

>When code reviews are slow, several things happen: ...

>Developers start to protest the code review process. If a reviewer only responds every few days, but requests major changes to the CL each time, that can be frustrating and difficult for developers. Often, this is expressed as complaints about how “strict” the reviewer is being. If the reviewer requests the same substantial changes (changes which really do improve code health), but responds quickly every time the developer makes an update, the complaints tend to disappear. Most complaints about the code review process are actually resolved by making the process faster.


That guideline definitely must applies only internally. All the Google public repos I've interacted with stall in the PR department like crazy.


For contributions by new authors I usually don't make them do a respin for something minor, for instance if I think a more explanatory comment would be helpful -- I just fix those up when I commit. It saves the author doing multiple rounds dealing with a process they may be unfamiliar with, and it's usually less work on my end. For an established submitter the balance is more to ask them to respin, because they ought to be learning the project rules and process; even there I often fix up nits like typos in comment when committing rather than requesting a respin.

I generally err on the side of retaining the authorship when I do this kind of thing (our project follows the kernel's de-facto standard of retaining the original git author but adding a note of extra changes made in square brackets after the author's signed-off-by and before mine as the committer, which is how we record who's responsible for what parts of a change like this). If you've literally completely written a fresh change to solve the same problem a different way then that should be your authorship with some kind of reported-by or other acknowledgement to the PR author, though.


On Github at least the default setting when sending a PR to a repo lets maintainers submit their own commits to your branch, so if I need to do some cleanup I'll often default to that so the user can get credit for both the PR and the commit. If for some reason that isn't possible, it's still relatively easy to fetch the branch locally, add another commit, and then merge that (either manually or through a separate PR), at which point you can also add a comment with a link to the PR the user created if you want to make sure to give them credit as much as possible when closing the PR. If for some reason you absolutely need to have a single commit (e.g. the commit the user submitted would fail some CI check and not be mergeable even with a follow-up commit to fix it), then it's also usually possible to amend their commit with the changes you need locally and then merge that (unless they happend to be GPG signing their commits or something). I consider this a last resort since it feels a bit weird to attribute code to someone without them actually having written it themself, so if I need to do this I'll generally also add another paragraph the commit message explaining what I added myself (there's probably some built-in way with git to add a co-author or something that someone can fill me in on, but I don't know it right now).

The other thing I try to make sure to do is thank each contributor explicitly in the release notes when their commits get merged. A simple "Special thanks to @foo for the fix for bug X, @bar for the implementation of feature Y, and @baz for improving the documentation of Z!" is a small amount of effort that I think can go a long way in making contributors feel welcome and appreciated.


When I was writing a book for O'Reilly about the GitHub API, I interviewed a bunch of people from GitHub. One of them told me a surprising story: he said that often they ask people to write the PR BEFORE writing any code at all. The PR should be created with zero commits. That way they can discuss the ideas, implementation plan, impact, without even thinking about the code.

After hearing this, I believe that a PR should be changed dozens of times even if it is a one line change of code. PRs can reflect a million changing things, and I often update a PR months after the code is done because new information has come in that makes the code change need more context.

More stories like this could help your cause, no?


That sounds like a solution for someone without a bug tracking tool. All of the discussion should totally happen outside of GitHub in a proper story tracking tool.


I guess the point is that GitHub already has a bug tracking tool. So, in a way, they are perfectly integrated.


Yeah but PRs aren't the bug tracking tool. Issues are. Why wouldn't this be discussed in the issue, rather than the PR?


To me, an issue implies “please do this”, whereas a PR implies “I'm going to do it myself, just tell me if you're going to accept it”.


> The PR should be created with zero commits

Doesn't a PR need a commit to be able to create it?


Just a branch, no commit needed.


I will type up all the review comments first, then...

If I expect or hope the contributor to become a long-term collaborator, I will submit the comments and wait for them to fix it. I'd rather they learn

If I don't expect them to stay around, I'll make the changes, force-push to their branch keeping their authorship, submit the comments, and let them know what I did and link to the diff caused by just the force push. I'll then ask them if it's OK and let the PR sit for a few days or until they respond, then I'll merge it.

---

The more of your expectations you can automate or at least document (e.g. in CONTRIBUTING.md) the easier it is.


Every day you delay pushing a fix for whatever the pull request was accomplishing is a day you are punishing all the other users of your project, as well as robbing them of the benefits of the work done by the person who submitted the request.

If I find an issue in your code and I submit a pull request or (more likely) file a detailed issue with associated patch, and I watch you try to block back onto me to edit it to your liking when I know it would take you less effort to just do that final work yourself, you are less likely to get me to care in the future to do the work to isolate an issue and potentially even likely to lose me as a downstream user of your code.

I also definitely wouldn't want you force pushing a change with my name on it: if you are editing my code it isn't my code anymore... just commit the code as you. I am not playing some weird game where "credit" on a commit matters: I'm trying to get work done (as I'd hope we all are), and feel a need to be helpful to others instead of selfishly hoarding patches for myself. And if someone is playing such a game, it is probably better to discourage them of it rather than putting up with it.


Why do you force push to their branch, instead of committing your fixes as separate commits later on?


I strive to have every commit be meaningful/buildable/testable/etc. To that end, I use a lot of `git rebase -x` to massage the pull request into a reasonable sequence of commits that “tell a story”. Having a bunch of “fixing foo” commits is ugly to me and reduces my ability to bisect problems later on.

Sometimes I will actually create the fixup commits and push them separately to communicate with the submitter, but then squash them into the appropriate places in history before merging.

The only technical issue is when someone has signed their commits, as I obviously cannot re-sign them :-). In that case, I call it out to them and let them re-sign if they want before I merge.


If the code which is eventually merged is mostly the contributor's code, and you just had to do some small tweaks to get it up to the project's standard, then commit and merge it under their name.

If the changes are so major that the code cannot reasonably be considered theirs any more, then it would be dishonest to merge it under their name, and would not be doing them or anyone else a favor. In that case, commit and merge under your own name, but include a note in the commit message thanking them for their contribution.


Here's another approach for situations which fall in the middle:

https://docs.github.com/en/pull-requests/committing-changes-...


I do this a lot on PRs, especially for small fixes. I’ll make sure to put a comment explaining my changes and link to the relevant doc about why the change was made.

It doesn’t make sense to ask contributors (especially new ones) to spend time going back and forth on such small changes. It’s better use of both our times if I take a few minutes to make the small changes and merge in.

For more complex PRs, my guiding principle is:

- Make those changes that might require a lot of back and forth and harder to explain or might be subjective.

- Request changes that are objective.


This applies to open-source projects with volunteer contributors, most of whom are undergrads.

If I know they won't be frustrated by a review process, I work with them to improve it. When we get to a point where I think they'll be frustrated if we continue (or if it's just actually good now), and also the code is working, I merge it. I then will not touch it for at least a week or two. At that point, the code is fair game for me to refactor.

(edit - I think it's pretty important that the original merge is only their code, so that they have a sense of ownership & accomplishment, especially when they're college students.)

Assuming I have a good relationship with the author, if I do significant refactors to their code, I'll send them the commit in which I did so and explain to them why I did what they did so they can learn for next time.

This approach has never really backfired on me, except that maybe sometimes I forget/become apathetic about doing the second part, so I have some not-so-great code in part 2. But, it's never (to me) been worth the frustration of a volunteer contributor; people contributing to the project is more important than anything else, because otherwise the project dies. One of my projects does have one mostly-abandoned large component that no one wants to touch because someone came in, wrote a ton of mediocre code, we didn't really want to do a huge code review, and then that person left anyway because they got busy irl and/or thought "good enough" and/or got frustrated with what little code review we did do, but we openly say "this is not actively supported use it at your own risk" and people seem happy enough about the situation. Hopefully someday someone takes it over again.


I had this experience on the committer side recently, where I contributed some functionality to a large project. The immediate response was effectively comments on lines saying "this is bad style, fix like this; the commit message is to long, rebase to that" etc. It wasn't necessarily a negative experience (in fact, as a new experience, I would say it was very useful), but I was a bit taken aback at how I was treated as an employee who had not committed something that was up to standards, rather than as someone who was keen to provide value to the project, and appreciated/guided accordingly.

I have to say, after I pointed out this was my first non-trivial contribution to the project, the developer in question backpedalled a bit and "on-boarded" me very politely and gave me lots of feedback and nice tips. But I think another way it could have gone was with me saying nothing and just quietly abandoning the PR.

I think saying something like the following can go a long way: "Thank you for your contribution and goodwill towards the project! Please understand that we cannot generally merge code that does not comply with our style standards. In your particular case, this means one of the regular developers may have to make the following changes (a,b,c) based on your suggestions, and commit on your behalf. If you'd prefer to do this yourself to carry the process to completion, and learn about our project's style and conventions a bit more in the process, you're more than welcome to do so; I'd be happy to guide the process or help out with any questions you have along the way."

This would prepare the committer that this dance is about to take place (and why!), so that they expect it, and decide whether they want to commit to it up front, rather than realise they're dancing halfway through and decide whether they want to continue or not.

Also, I would like to clarify the word 'badly-written' a bit. I would argue that occasionally, 'not up to the project's coding standards' and 'badly written' are not necessary synonyms. Some projects in particular have really awkward / prehistoric coding standards, and non-complying but well thought out commits are written with the best intentions. If there isn't a good explanation of why you're now trying to undo all those good intentions to introduce arbitrary spacings that hurt legibility, it probably won't go down so well with your committer.


There is no way to win, for me, because it burns me out. I am/was a semi-perfectionist maintainer.

If I subject others to my standards, it's annoying for them.

If I hold my nose and merge it anyway, I feel the pressure to clean it up before the release. Which burns me out, so I avoid doing this.


Or you could just accept the help from others--crediting them in your bug tracker or, at most, in some CONTRIBUTORS file--but add the perfection yourself?


That's point #2 that I mentioned, also burns me out, nags me.


Hypothetically if you’ve documented how you expect contributions to be, and someone doesn’t follow that, then at least they can’t say that they couldn’t have known what you wanted when you reject it.

The thing that is hard for people to get is that time costs a ton of money. If you use software without contributing back at all, you’re getting a lot for free but at least you’re not actively costing the project. If however you “contribute” in a lazy way, it starts to actively cost the maintainers to deal with this. So: feel free to contribute but read up on what a project wants and try to show that you’re at least meeting their effort halfway in terms of sufficient detail, following coding styles and request templates, etc.


I think it is better if you codify the rules that you can (with quality checker tools) and ignore the rest. Most of these tools are sophisticated enough that they don't let utter crap slide.


At work I sometimes leave a comment but still approve. If someone continually refuses to improve their code I will end up refactoring it for them and bearing an unspoken grudge toward then


Really depends. If its a simple fix for something I will usually merge and clean up in subsequent PR's. The contributor is helping me out, I want to make it as easy for them as possible, which makes it more likely they will continue to help. As the difficulty and complexity of the PR increases I will proportionately increase the standards for the PR. Obviously there are other factors that would change this approach but in general, I try to make contributing easy as possible.


The first person who ever submitted a PR to one of my projects also reformatted the files he touched according to whatever formatter his IDE was using, which was different than mine. I was conflicted about whether to accept his change and inferior formatting or whether telling him to adopt my formatting would be obnoxious. I settled on merging his change and then instantly reformatting it to the way I like it - keeping the meaningful part of his change.


Reformatting changes are the worst -- they break the git history, and in a few years when you try to find the commit message for a specific line of code you'll have to navigate past the formatting changes...


I wonder if there is a case to be made for a patch branch that gets all PRs. Then the maintainer can do any further tweaks needed before merging down to mainline.


The solution is simple, make it easy for people to contribute in way you deem appropriate. In the case of PR you can simplify the process quite a bit:

https://docs.github.com/en/communities/using-templates-to-en...


Depending on the details, I'll rewrite as you describe, or merge, then fix with my own commit.

It's frustrating being on the other end of this (PR submitter), since for the repos I've contributed to, the maintainers usually want you to fix it until it's exactly how they like. Then I end up giving up and using a custom fork, or integrating their code into my program directly.


If it's a quick fix, and "Allow edits by maintainers" is checked, might as well make the needed changes yourself directly on their branch and merge the PR.

That way their commit is in the git history, you aren't "robbing" them from showing up in the contributors list.


Clean and extend it under their name.


There's a git commit key which keeps the original author. You can just use this key always. Git amend = git commit --amend, preserve author, but update time.


Have you considered adding the original author as a co-author on the rewritten commit(s)? That could still add them to the contributor list


I used to have this problem then I figured that this takes too much time and it is also frustrating. So what I ended up doing it that I added some code quality checkers that will break the build if the PR is shit. If not then I usually approve it after glossing over it.

The thing is that most of these PRs are small and very localized. The potential to do damage is very low. I'd rather have some new feature implemented with some not-so-stellar code that's hidden behind an interface then doing it myself.

I think there is a goldilocks zone between perfection (and being a control freak) and a steaming pile of crap where the stuff you're working on has a clean architecture and it is acceptable to have a few poorly written modules. I usually end up rewriting them anyway when I have time and they perform poorly. By that time the contributor is usually either long gone or already improved their skills so they don't mind me tampering with the code.

TL;DR: after it passes the tests / quality checker I just merge them and fix them later if they are too crappy. All in all the project benefits more from this compared to too strict coding standars.


The most frustrating thing in a PR is getting comments, addressing those comments, and the getting more comments (on something unrelated) by the same person. Give all your feedback at once. Otherwise, the review process will go on forever.


Give all your feedback at once.

Sometimes that just really isn't feasible or efficient. Happens often enough I glance over a PR to see it has code style issues or obvious mistakes in the code or architectural problems. Especially in those last cases, when the implementation still needs to change, I'll just point that out, then later on revisit and do a more thorough review. Because it makes no sense reviewing things which might not even end up in the final PR.


Hence why I said "and the(n) getting more comments (on something unrelated) by the same person". I understand clarifying comments after changes have been made but comments in a completely separate part of the code that have nothing to do with the original feedback keeps the PR open forever.


What about sending a patch to them?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: