It has been my experience that after a substantial period of real use, methods embody tacit knowledge about the problem domain that isn't necessarily evident from their public interface. Initially they didn't embody that knowledge; it caused bugs, and people edited in fixes.
It has been my experience that rewrites from scratch typically lose all this tacit knowledge and re-implement the original bugs.
That's sort of the point of the suggestion, I think. If you always rewrite from scratch rather than edit, this sort of tacit knowledge won't get embedded undocumented and untested into the middle of a function.
This is sort of the great contradiction of software development. Untested legacy software is hard to work with. So should our focus be on how to work with legacy software or how not to create it in the first place? There's really no answer to that.
One thing the OP doesn't address is whether the rewrite is intended to be black-box or not. If not, then every "rewrite" will just be someone copy-pasting the old code with the new fixes, which seems completely pointless. And if it is a black-box rewrite, then all that tacit knowledge will be lost, leading to having the exact same bugs over and over and over and over again, forever.
I understand that this is all just a thought experiment, but there has to be a better way of framing it that isn't so immediately farcical.
> And if it is a black-box rewrite, then all that tacit knowledge will be lost,
This is a thought experiment on how to avoid that tacit knowledge, not how to deal with it. Consider approaching every edit in the same way you'd approach new code: is the function adequately tested, does the function have a single responsibility, is the intent clear, etc.? The idea is that if you do this, then you won't reach the point where you have a 200-line method with tons of hidden business rules.
I agree there's probably a better way of framing this. Still, as a thought experiment, it's worth thinking about when you can and/or should do this.
I'm glad it's immediately farcical, because that should keep people from interpreting it as actual advice (even as farcical as it is, I see many comments here that miss that point).
Unit tests can only test certain kinds of behavior of the function under test. To give other kinds of examples, the current implementation might have used a clever trick to achieve faster execution (this cannot be evaluated with unit tests). Alternately, it might have been implemented so as to share common functionality with a different piece of code by calling a common subroutine. That sharing of functionality is probably not easy to evaluate with a unit test (nor is it wise), but is still an important property to preserve.
> the current implementation might have used a clever trick to achieve faster execution (this cannot be evaluated with unit tests).
Sure it can. I worked on a project with a public API with many time-sensitive routines, and we most definitely had unit tests measuring performance. The test failed if the execution time exceeded the acceptable threshold.
> Alternately, it might have been implemented so as to share common functionality with a different piece of code by calling a common subroutine.
You can certainly write coherency tests to ensure that two methods are doing what's expected. They might not share the same subroutine any longer but you can throw an exception when their respective results no longer align and have comments on the test explaining the rationale. Honestly if you have this sort of subtle dependency and you don't have a unit test for it then you're just asking for trouble.
> The test failed if the execution time exceeded the acceptable threshold.
In order to not be beholden to specific hardware/execution environment and absolute times, you can also compare an unoptimized version against an optimized version, so the test can be: optimized version must be 2x faster than unoptimized.
Of course if you actually have real-time requirements, it is good to test against those.
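A minimal sketch of that relative check, in Python (all names here are invented for illustration; `slow_sum` stands in for the unoptimized reference kept around purely for benchmarking):

```python
import timeit

def slow_sum(values):
    # Unoptimized reference implementation, kept only for the benchmark.
    total = 0
    for v in values:
        total = total + v
    return total

def fast_sum(values):
    # The "clever" optimized version under test.
    return sum(values)

def relative_speedup(reference, optimized, data, number=200):
    """How many times faster `optimized` is than `reference` on `data`.

    Comparing the two on the same machine sidesteps absolute-time
    thresholds, which break across hardware and CI environments.
    """
    ref_time = min(timeit.repeat(lambda: reference(data), number=number, repeat=3))
    opt_time = min(timeit.repeat(lambda: optimized(data), number=number, repeat=3))
    return ref_time / opt_time

# A performance test would then assert something like:
#     assert relative_speedup(slow_sum, fast_sum, data) >= 2.0
```

Taking the minimum of several repeats, rather than the mean, reduces noise from other processes on the test machine.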
The most "fun" subcategory of "how do I write a test for this?" that I've seen is when something in your software triggers a bug in somebody else's product, and "somebody else" is both too big and important to return your calls and too relied-upon for the bug to go unaddressed (e.g. the Oracles and Ciscos of the world). So you have little recourse beyond screwing around with your code until the user stops seeing the bug.
That happens all the time. I generally encapsulate the logic compensating for the bug, decorate it with copious comments explaining the rationale, then write a unit test explicitly testing for the buggy behavior. If the bug gets fixed, then the test fails and the product doesn't ship until it's resolved.
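A tiny sketch of that canary pattern. Everything here is a made-up stand-in (there is no real vendor API), but it shows the shape: one test pins the buggy behavior, another tests the workaround.

```python
def vendor_strip(s):
    # Stand-in for the third-party call. Imagine the vendor's bug is
    # that it also deletes tabs *inside* the string, not just the ends.
    return s.replace("\t", "").strip()

def safe_strip(s):
    # Our encapsulated workaround: trim the ends ourselves so inner
    # tabs survive. See the vendor ticket for the full rationale.
    return s.strip()

def test_vendor_bug_still_present():
    # Deliberately asserts the *buggy* behavior. When the vendor fixes
    # it, this fails and reminds us to delete the workaround.
    assert vendor_strip("a\tb") == "ab"

def test_workaround_preserves_inner_tabs():
    assert safe_strip("  a\tb  ") == "a\tb"

test_vendor_bug_still_present()
test_workaround_preserves_inner_tabs()
```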
Yes I write unit tests for 3rd party APIs and libraries I consume if they show themselves to be inconsistent or buggy.
We actually use a 3rd party API that crashes regularly and we have a suite of unit tests surrounding it so when reports come in we can isolate, reproduce, and verify the problem before bringing fury down upon the vendor.
It is also handy during pre and post deployment validation to ensure that the environment your update is working in is 100% functional. I've rolled back releases before because an external resource had an unreported outage that caused our deployment validation to fail. With pre/post test you can test the environment before you even start.
Those are all fine things to do if you can afford them. In my case, I'm talking about bugs that come sturdily packaged in a $5000+ chassis sitting on a private VLAN in somebody else's building. So it goes.
Alternately, if it uses a regular expression or other form of pattern matching, it's very likely that the original coder made unit tests for the most common behaviors, but built in assumptions for edge cases. And I'm unaware of any code coverage tools that help someone test every single branch in a regex. So you're reliant on comments, documentation, etc. (thank goodness for Python's verbose regex mode, for example) to make those assumptions explicit. It's not an easy problem to solve.
A gray-box fuzzer such as american fuzzy lop (https://en.wikipedia.org/wiki/American_fuzzy_lop_(fuzzer) ) can generate quite good test cases. If your regex engine can compile to native code, it can generate tests that test every single branch of the regex.
"Back to that two page function. Yes, I know, it’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95."
It's not the first time someone proposes to write a test when fixing a bug, but you have to question the actual ROI.
At my previous job, they checked how likely it is for a fixed bug to reappear again. Odds are really low.
So we decided it wasn't worth the investment to write a test specific to the bug's use-case, because it's unlikely to appear again. We focused our efforts on other, more effective things.
We use the test as a way to prove we know what the bug actually is.
Our debugging "path" normally is:
1. Find out bug exists
2. Write a test that can reproduce the bug
3. Distill that test down to the minimum needed
4. Fix bug until test passes
5. Verify the bug is actually fixed by trying to reproduce it the same way the user/bug report did.
When done like that, the test isn't "extra work" but is just part of narrowing down your actual problem most of the time.
And that test (assuming it's not extremely slow to run), takes very little to maintain, so leaving it in the suite of tests is basically a positive only.
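As a hypothetical illustration of steps 2 through 4 (the function and the bug are invented for the example), the distilled reproduction simply becomes the regression test:

```python
def parse_price(text):
    # Step 4: the fix. Before it, float("1,299.99") raised ValueError,
    # so we strip the thousands separator before converting.
    return float(text.replace(",", ""))

def test_parse_price_handles_thousands_separator():
    # Steps 2-3: the minimal reproduction distilled from the bug report.
    assert parse_price("1,299.99") == 1299.99

def test_parse_price_still_handles_plain_numbers():
    assert parse_price("42.50") == 42.5

test_parse_price_handles_thousands_separator()
test_parse_price_still_handles_plain_numbers()
```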
Whilst I agree with this approach, sometimes it just isn't possible if the code was written in a non-testable way to begin with, such as code that hits the database a lot or relies on network data. In this case, weighing up the ROI is a worthwhile endeavour, with a note to refactor more thoroughly at a later date.
That's a really good reason to write business logic in a way that doesn't have those dependencies hard wired in. I often end up in those sorts of nightmares in codebases that use the active record pattern (hiding database access everywhere), hooks, critical logic in the HTTP layer, and so on.
> Such as code that hits the database a lot or relies on network data.
Well, the first problem isn't really a problem, since you can cheat it.
If you rely on a native database, you can fake your driver and insert transactions (subtransactions, or savepoints in PostgreSQL), so for the whole test suite you just roll back to the latest savepoint, which saves a lot of time. (It's actually not that hard in Java + DI, or in languages where you can monkey-patch code.)
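A minimal sketch of the savepoint trick, using SQLite here purely because it's self-contained (the comment above mentions PostgreSQL, where SAVEPOINT works the same way):

```python
import sqlite3

def run_in_savepoint(conn, test_fn):
    """Run test_fn inside a savepoint, then roll back, so every test
    sees the pristine seeded database without re-seeding it."""
    conn.execute("SAVEPOINT test_sp")
    try:
        test_fn(conn)
    finally:
        conn.execute("ROLLBACK TO SAVEPOINT test_sp")
        conn.execute("RELEASE SAVEPOINT test_sp")

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit; we manage transactions ourselves
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('seed')")  # expensive seed data

def my_test(c):
    c.execute("INSERT INTO users VALUES ('temp')")
    assert c.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 2

run_in_savepoint(conn, my_test)
# After the rollback, only the seed row remains.
assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1
```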
80% of all tests should not be written, or if already written: deleted. A test is only valuable when it fails, alerting you to a change you made that introduced a bug. There is a catch, though: I don't know which of those tests are valuable.
I find the ROI for tests is high despite the large number of worthless tests, because the small number that do fail alert me to something I wouldn't have found otherwise.
The 80% number was of course made up. In my experience it is realistic, but I have never done a formal study.
Furthermore to Klathmon's point, there is an opportunity cost. Because you don't have the test, changing that method in the future becomes problematic. It's not the bug reappearing; it's the refactoring that can no longer take place.
In practice, the rewrites that I have seen are not limited to just function bodies that leave interfaces and class structures unchanged.
What I see in refactoring is that whole blocks of code get an overhaul. And in that case, all the test you wrote on those interfaces need to be adapted, rewritten or removed.
So actually, in a proper refactoring, some tests might help you out, because the interface of the classes didn't change. But other tests cost you time.
It's all a pretty complex balancing act. Some tests save you time, some cost you time. They all make the code more stable, but you have to put your effort where it really makes sense.
Were you writing tests before? It could be that writing the test was causing the original bug to be fixed properly and prevented it from showing up again.
Often people suggest rewriting a method because it's so complex and poorly understood that it can't be maintained as-is, and for a method like that, you're right, it probably can't be rewritten without inadvertently changing the behavior.
I think what is being proposed here is rewriting methods that people usually don't consider because they're basically okay. Maybe the name is a little weird now because the usage of the method has changed. Maybe two methods that used to do something different now do effectively the same thing. But everything is basically correct and readable, and people don't want to make a bigger change than necessary.
Personally, I think I lean towards more aggressive rewriting than other people do. I try to make sure names make sense. When special cases are removed I check to see if this allows me to make the rest of the code simpler. When I see the same thing being accomplished in different ways in the same file, I'll take a minute to make it consistent, so that same things look the same.
But people see this as a trade-off, especially when it comes to code review. Several times when I've made major changes that needed to be code reviewed, I've taken extra time to re-order my commits so the change can be reviewed in two steps, first reviewing the refactoring that was done and then reviewing the change that was made. And with a single exception nobody has taken me up on it; they've insisted on reviewing the entire change at once. Instead of a straightforward refactoring and a straightforward change in functionality, now they're trying to make sense of a combination of behavior-preserving and behavior-altering changes. Because code gets reviewed that way, people tend to refactor less than is optimal, because they want to give their coworkers small diffs that highlight the logical change that was made. This is a tendency that can lead to a gradual accumulation of history in the form of duplicated code and nonsensical names, which I think is what the "rewrite every time" rule is meant to counteract. A technical rule won't make people forget a social trade-off, but it might help them remember the other side of the trade-off.
I’ve seen this in two broad circumstances: complex business rules, and spaghetti code. Business rules mostly have to be endured. Spaghetti code should be untangled if the software is still actively developed. This caveat appears much less often in editors, design applications, general horizontal apps and games. Those are more likely to suffer from poor design or coding, and the spec is much more in the control of the software people.
Hey Julian, I have a startup that attempts to address this in a developer friendly way. I'd love to get your feedback if you'd like to see it. Reach out to me at yahn007 @ Gmail dot com. Hope to chat soon!
A tangential thought but related: I would find this absolutely frightening. I understand the motivation behind it but I always like to start from something existing and editing it -- even if by the end nothing remains of the original. To some, an empty editor window is infinite possibility but to me, it's a "coders block", not sure how to phrase it. Infinite analysis paralysis perhaps.
Fill out the comment-header first, then; that's what I do. That informal-prose chatty summary of what the code does and why, what it takes in and what its output is, becomes in my mind the spec for the code while I'm writing it, even in casual one-shot Perl filters. It makes it easier for me to choose variable names which inform rather than mystify, because the master reference is staring me in the face as I write. It's also an attitude-anchor for specific tail-end comments to expand on the summary explanation by detailing why a line is as it is and does what it does. YMMV, but, for me, the payoff is an easier time of drawing thoughts together into code, plus, months down the road, an easier time of deciphering why I wrote the code that way.
I always split my editor window vertically, make some space over the method I want to replace and make sure the old version is visible on the other window, and then I rewrite it while referencing the old one. I only copy and paste stuff over if there's something I really don't wanna write again.
That's remarkable. I typically find it harder to shoehorn some new idea into existing, often badly-written code than to reimplement something appropriately general. This isn't meant to rebut your point; simply another observation.
I find that happens when one - or all - of the following are true:
1. Don’t have a good understanding of the existing system architecture
2. Have not thought through the design of whatever has to be added, both in terms of function expectations or overall flow
3. Overly (?) concerned with “getting it right”; that can lead to paralysis: somehow you have to both demand the best of yourself at that moment and yet accept that despite it all you may not make the best choices - and that’s ok
The first thing I thought when I read the title is what that would do to the history.
Looking at a diff and seeing a small modification to a method tells me more than a method that has entirely changed with for loops changed to while loops, indices replaced with iterators, different whitespace and brace placement, etc...
The headline is misleading IMO. The author is proposing “Never rewrite a method” _as a thought experiment_ to guide you to writing better code. Even the original proponent walked back from that idea moments after he proposed it.
Seems like Poe's law could be generalized into a heuristic about memetic hazards - no matter how much you guide and warn people, any idea you speak or put in writing can be taken in a way you did not intend. The chances increase with the size of the audience.
I've been doing something similar. I name the new method Method, rename the old one as MethodOld and in Method, before the return value, I put an assert(MethodOld(args) == return value). It's one of the things that when I do, I appreciate the value of, but I'm not disciplined enough to do as much as I would like to lol.
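A sketch of that cross-check (the discount logic is invented; the point is the `assert` comparing the rewrite against the old version kept as an oracle):

```python
def discount_old(price, qty):
    # Legacy implementation, renamed and kept around as the oracle.
    if qty >= 10:
        return price * qty * 0.9
    return price * qty

def discount(price, qty):
    # Rewritten version, with the tier expressed as a rate.
    rate = 0.9 if qty >= 10 else 1.0
    result = price * qty * rate
    # Cross-check against the old implementation while building trust;
    # remove the assert (and discount_old) once confidence is earned.
    assert result == discount_old(price, qty), (price, qty)
    return result
```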
For some hairy stuff that requires live data, you can also reverse the practice: call both but return the old one's result, and gather data in production before switching.
Obviously this does not work for side-effectful methods (beware of caches!).
I'm interested in what fields you would use this methodology. Machine learning?
In most fields we have a sample of known input vectors, including edge cases, that map to known outputs. A unit test is enough to test these methods with different implementations.
An example of when you might use this is when you're cutting over to a new service, and you want to validate that the new service behaves the same way as the old one before transitioning all the traffic to the new service. In order to build confidence in the new service, you can assert that the expected output is the same, and raise alerts when there is an inconsistency. This incremental rewrite approach allows you to slowly replace pieces of an application with a new service and helps prevent regressions in behaviour.
Or consider when you're performing a database migration (on large databases) with 0 downtime. Typically this involves something like:
1. Dual Writing: Create 2 tables and write to both and keep them in sync (by duplicating new data, and back-filling old data)
2. Update read paths: Change all code to read from the new table, and validate that the data being read is consistent with the old table. You can use a library like Scientist [0] to validate that the reads are the same.
3. Update write paths: Change all code to write to the new table (and raise alerts if the old path is exercised)
4. Deleting old data: Remove code and data that relies on the old data model
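The read-validation in step 2 can be sketched as a tiny Scientist-style helper (all names here are hypothetical; the lambdas stand in for the two real read paths):

```python
def dual_read(read_old, read_new, report_mismatch):
    """Serve the old read path while comparing it against the new one.

    The old path stays the source of truth; any disagreement (or crash
    in the new path) is reported rather than surfaced to the caller.
    """
    old = read_old()
    try:
        new = read_new()
        if new != old:
            report_mismatch(old, new)
    except Exception as exc:
        report_mismatch(old, exc)
    return old

# Hypothetical stand-ins for reads from the old and new tables.
mismatches = []
result = dual_read(
    read_old=lambda: {"id": 1, "price": 100},
    read_new=lambda: {"id": 1, "price": 99},   # data drifted!
    report_mismatch=lambda o, n: mismatches.append((o, n)),
)
assert result == {"id": 1, "price": 100}  # callers still see the old data
assert len(mismatches) == 1               # ...but the drift got reported
```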
That's actually a good idea, thanks! I'm not sure it would be feasible, but it would be neat if methods were automatically versioned (in all but syntactic changes) and you could automagically compare results of the different versions.
It kinda falls apart with refactoring, but still :) I'm finding it more and more evident that files as the default unit of storage and viewing of code are an obsolete concept.
It's just a general thought I've been having. We're using text and files because it's the way things always have been. But it seems there should be benefits to perhaps move beyond that.
Things like more canonical representations of code, rather than arguing over coding standards (although it can still be text-based rather than completely graphical). The ability to more easily see related code and flows. Managing the meaning of code rather than its text (changing symbols, not text, when you refactor).
There have been attempts to move in this direction with things like Code Bubbles (https://www.youtube.com/watch?v=PsPX0nElJ0k), but so far no approach has been both visually attractive and offered enough benefits. It's probably coming somewhere down the line, though.
Yes, please just do this. Maybe this thread is full of Poe's Law irony, but if not, I don't look forward to refusing to work on any code developed with the practice of strewing almost-the-same methods all over the place.
In some languages you can use property-based testing to generate thousands of inputs and check that Method(x) == OldMethod(x). See Hypothesis for Python, QuickCheck for Haskell, ScalaCheck for Scala, etc.
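A dependency-free sketch of the idea (with Hypothesis this would be a `@given(st.lists(st.integers()))` decorator instead, which adds shrinking and edge-case bias; the two methods here are invented examples):

```python
import random

def old_method(xs):
    # Original: manual loop summing squares.
    total = 0
    for x in xs:
        total += x * x
    return total

def new_method(xs):
    # The rewrite: comprehension-based.
    return sum(x * x for x in xs)

def check_equivalence(trials=1000, seed=0):
    """Poor man's property test: random inputs, assert both agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        assert new_method(xs) == old_method(xs), xs
    return trials

check_equivalence()
```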
That's interesting! I'll try that sometime. One question, though: how do you deal with mutating methods? Like methods that persist a value to storage and should only do so once?
When writing functions often I end up deciding fairly quickly if the function I am writing is intended to be reusable or not. If it is reusable I'll write it as a utility function, helper, lib, ... whatever the project's convention and language paradigm might be. Usually similar reusable functions already exist within the project.
A significant portion of functions fall in another category; functions that actually get some real work done. They are specific and not intended for reuse at all.
Applying the "never edit a method, always rewrite it"-rule to those helper/utility functions would just be painful. The public API you designed is designed with a certain purpose and reuse intent. Improving it slightly to fix an issue and forcing yourself to entirely rewrite it would just break things. Most likely you'll end up writing lots of small copies of the original method (how painful this exactly might be depends on the language you're using).
Applying that rewrite-rule to specific functions/methods makes slightly more sense. Here the logic is more tightly bound by business rules/logic, which if they change will quite often warrant a rethink. Particularly because business likes to just slap a feature on top of everything else; which means for us finding ways to keep everything sane in the codebase requires continuous refactoring. Rewriting a function here does not intend to make the function more reusable, it attempts to make the code more clear. Sometimes reusable patterns emerge, but often they don't.
Obviously the above is a simplification of things. Often programmers are obsessed with abstractions and design patterns (read too much GoF, Martin Fowler & Uncle Bob); or don't bother with anything at all (script kiddies, prototype developers, or any code written during a PhD...). The truth lies somewhere in the middle.
I'd call it "bricks" vs. "mortar". Write your bricks to be reusable and modular. Write your mortar to be simple but don't stress too much about it, because it really only exists to glue your bricks together.
The metaphor that biological systems undergo continuous renewal, and that code should therefore be rewritten rather than edited, is actually off, I think: code is the DNA, and thus undergoes evolution, not replacement.
A big part of what this approach is trying to fight is function complexity. If you really rewrote the entirety of your functions every time you went to edit them, I think it would end up slowing the programmer and the program down.
Programming is often a state of flow where you have a grand idea for how a mechanism should act and then you task yourself with implementing that idea in code. Sometimes large complicated functions are the best way to translate that idea. The approach the author describes sounds like it would create lots of small functions that don't really help you solve the problem at hand.
Good idea to keep in mind, though. Every time you go to edit a function and find yourself dreading it, maybe it's time to rewrite that one.
Even if you could.... why not modify a method? What's wrong with modifying it? It's only bad if you don't fully understand what it does, and/or all the ways/contexts where it's used.
But, that's a sign, right? You can't ever hope to safely change that which you do not understand. If you're afraid of the codebase - I argue it's better to set aside time for making the codebase less intimidating, than to devise clever hacks/ "software process" for working around your lack of understanding. The latter will just make the codebase more intimidating, over time.
Aside from compile-time optimization, pattern matching is just an additional if at runtime. It's questionable whether pattern matches make code more readable, since when you see a foo(42) call and def foo(42) is defined elsewhere, you can spend some time on the wrong function before realizing it's not what you're looking for. The same goes for type-based dispatch and operator overloading. It only makes code beautifu^W hard to guess.
You definitely could enforce it: isolate the developer from the codebase, give them a function contract to implement. It doesn't sound like a good idea, and I wouldn't want to work that way, but it could be done.
So the point of a methodology like this is that it guides you into hopefully making better code by driving you away from certain decisions. Yes you can certainly sabotage attempts at doing so with patch methods and copy-paste, but the point is that if followed it'd set up an environment which discourages spaghetti.
I had a similar idea some time ago, that all the code should be write only. That is, you should always write new functions (and give them new names) instead of trying to rewrite them (if the spec changes, this assumes they were correct in the first place).
I think it's only really doable in purely functional language (like Haskell), though. It sort of means versioning of individual functions, and also types. It's very similar to rebinding values only as opposed to modification of variables.
But naming is a problem. Maybe.. There are two kinds of names of types and functions - intrinsic and extrinsic. Intrinsic name only describes what the thing is (e.g. array.search()). Extrinsic name relates to the problem being solved (e.g. product.findByPrice()). Maybe the functions (and types) that have only extrinsic names should be versioned and short name should be assigned to the latest version.
All in all, I think it's a concept worth researching.
With VC, you can only call the latest version. With versioned functions, all the old code would still call the old functions, until you would explicitly refactor it (or type system would fail).
I guess there is a philosophical debate behind this - what's in a name? Should a name of function refer to a specific body of code only, or all possible function bodies, past and future? What did the caller of the function want?
You can only guarantee correctness if the former. But the latter gives you more flexibility. I am not saying that this is the right answer.
You could see this proposal as tighter integration between version control and the language proper, which has the potential to solve a lot of problems. E.g. there's a common problem with diamond dependencies: if A depends on B and C, and B depends on D, and C depends on D, what do you do if the versions of D they depend on don't match?
Coming from EE, I like the thinking behind this. Components are similar to functions, they are tested to the nth-degree out of inventory, liability, and other concerns, and everything goes along well enough.
What would really help for that style of development would be a sort of apropos feature for your own code where you could look up methods by keywords rather than just identifiers. For larger projects, having people re-write several variations of what is basically the same method can be a problem even with mutable code. Perhaps Knuth's literate programming would help?
edit: And I bring apropos up because if you break things down into very small functions, and you don't have that, you end up writing several variants of the same function because you can't find the one you're looking for.
Literate programming, as a concept, is super interesting. But it has a fundamental problem, when applied to software-as-an-industry, that I think is difficult to solve: many, if not most, programmers may be literate but are not literate enough. If you go look at a Knuth example of literate programming, a few fundamental capabilities emerge.
One: the ability to look at code fresh, as if it were to someone wholly unfamiliar with the module or function in question. And to do so without leaping to the smarmy-tech-nerd "I understand it so this is easy" thing--we're talking about a baseline level of empathy cultivated so as to be able to return to the mindset of a novice in order to help them climb from there.
Two: the ability to explain, without the prior knowledge of that module (unless you are to reference that module, which has the same rules), what it does and why it matters. In words. Not "the code is the documentation", but words.
Three: the ability to do both of the above engagingly enough that people don't go take a nap instead of read it. You don't need to be able to write hot fire, but you need to be able to write.
These are difficult things unless you have a decent grounding in the humanities, and even on a place like HN, which self-selects for people who want to write things about stuff, you see pretty serious capability gaps on the regular.
(To be clear, I wish it was the answer, I just don't think it is. Not without a fairly radical rethinking of how much humanities matter to software development.)
True enough, though at this point I'd be grateful for any kind of consistent method-to-description binding.
Having it be compiler enforced would also be nice. Even if most of the descriptions end up along the lines of "do some stuff with the string", at least that's something.
If only business requirements, specs and use cases didn't evolve.
For our Christmas campaign, calculateCostOfGadget() needs to account for the compounding discount. Preferably yesterday, as we need to complete this month's billing to make payroll.
Usually when I want to simplify something, I do a copying garbage collection of code: manually shifting the good stuff from the old version to the new one. This means all code gets reviewed, and poor code gets left behind.
I had a different rule, which I was advocating for PHP 5.3 back in the day:
Have functions take regular parameters and the last one will always be an associative array of options.
This matches how functions evolve. You have some required parameters, then introduce more but the old callers don't know about them so they're optional.
I was trying to argue that PHP could unify the function call syntax and array definition syntax to encourage this. In other words the options would actually be virtual, as the caller would just supply the optional named parameters at the end of the function.
This is basically the same as how many Python functions often have a long list of arguments with defaults. When calling you can ignore them or set them as you like using keyword arguments. I'm a big fan.
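A small sketch of that Python pattern (the function and its parameters are invented for illustration). The original required parameters come first; everything added later is keyword-only with a default, so old call sites keep working untouched:

```python
def render_window(title, *, width=800, height=600, resizable=True,
                  on_close=None):
    """Open a window. Only `title` was in the original signature;
    the keyword-only options were bolted on as the function evolved."""
    return {"title": title, "width": width, "height": height,
            "resizable": resizable, "on_close": on_close}

# Old caller, unaware of the newer options, still works:
w1 = render_window("Hello")
assert w1["width"] == 800

# New caller opts into newer parameters by name, skipping the rest:
w2 = render_window("Hello", height=480, resizable=False)
assert w2["resizable"] is False and w2["width"] == 800
```

The `*` in the signature forces the options to be passed by name, which keeps call sites readable as the list of defaults grows.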
Following the rewrite-not-edit method rule forces one to either sink or swim. And the only maintainable way to swim is to write small focused orthogonal methods that do one thing only. Otherwise, it is impossible at any scale to follow rewrite-not-edit.
I like take on different perspectives like this. Doing so ensures one thinks about how to design and what to code before simply diving in.
Perhaps the "Second system effect" would not apply if methods are kept small, but is a method really a good unit of work for these kinds of ideas? Better rewrite units, classes, whatever can be isolated/encapsulated in a meaningful way.
I’m not sure rewrite is the right word here. Always write a new version from scratch, use a different namespace, and never delete old code is a better idea. You can’t cause regressions if you never change old code that’s in use.
What if we did this (where methods are immutable) and also architected a runtime around it? We would also use our source control to operate on functions instead of files. All methods would have a UUID and a hash of their contents.
This is actually the APL way. You make the language so high level and dense that even complex methods are one-liners. When you need to update it, you rewrite it. APL people have been screaming this for years.
I understand the sentiment behind this, but in practice this is going to be a terrible idea, in my opinion. I can imagine the codebase littered with method1, method2 kind of edits all over quickly.
How is that different from writing a test for the old method, then writing the new method from within the test, then moving it to the old location when you're finished?
Saw the headline, first thought: "I'll bet this is about dynamically typed languages", and lo and behold:
> At a recent RubyConf...
EDIT: just wanted to add that when I code in dynamically typed languages I'm inclined to do the same thing. Without the rigidity provided by a strong type system, it's too easy to mess things up. That's why I prefer strongly typed languages, so I can benefit from the structure they enforce.
Don't confuse static/dynamic typing and weak/strong typing. A language can be both dynamically and strongly typed (Python, for example). The thing you like about a "strongly typed language" is the static typing, meaning a variable's type can't change once it's been created.