I think that discussions in this area get muddied by people using different definitions of “rapidly”. There are (at least) two kinds of speed with respect to running tests for a large code base.
First, there is “rapidly” as pertains to the speed of running tests during development of a change. This is “did I screw up in an obvious way” error checking, and also often “are the tests that I wrote as part of this change passing” error checking. “Rapid” in this area should target low single digits of minutes as the maximum allowed time, preferably much less. This type of validation doesn’t need to run all tests—or even run a full determinator pass to determine what tests to run; a cache, approximation, or sampling can be used instead. In some environments, tests can be run in the development environment rather than in CI for added speed.
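Roughly what I mean by the cache/approximation approach, as a toy sketch: pick tests based on what actually changed, rather than running everything. The directory-to-tests mapping below is a made-up placeholder; a real determinator would derive it from the build graph.

```python
# Minimal sketch of "run only the tests that plausibly cover my change".
# Assumes a git checkout and pytest; DIR_TO_TESTS is a hypothetical,
# hand-maintained mapping standing in for a real build-graph query.
import subprocess
import sys

DIR_TO_TESTS = {
    "billing/": "tests/billing",
    "auth/": "tests/auth",
    "web/": "tests/web",
}

def changed_files(base: str = "origin/main") -> list[str]:
    # Files touched relative to the mainline branch.
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        check=True, capture_output=True, text=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def tests_for(paths: list[str]) -> set[str]:
    targets = set()
    for path in paths:
        for prefix, tests in DIR_TO_TESTS.items():
            if path.startswith(prefix):
                targets.add(tests)
    return targets

if __name__ == "__main__":
    targets = tests_for(changed_files())
    if not targets:
        print("No mapped tests for this change; skipping fast pass.")
        sys.exit(0)
    # Fast feedback only; the full suite still runs before deployment.
    sys.exit(subprocess.run(["pytest", "-x", *sorted(targets)]).returncode)
```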
Then there is “rapidly” as pertains to the speed of running tests before deployment. This is after the developer of a change thinks their code is pretty much done, unless they missed something—this pass checks for “something”. Full determinator runs or full builds are necessary here. Speed should usually be achieved through parallelism and, depending on the urgency of release needs, by spending money scaling out CI jobs across many cores.
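As a rough illustration of “speed through parallelism”: split the suite across N CI workers so wall-clock time scales down with the number of shards. The CI_SHARD_INDEX/CI_SHARD_COUNT environment variable names are invented for the sketch, and most CI systems have a native sharding feature that does this better.

```python
# Minimal sketch of sharding a pytest suite across parallel CI jobs.
# Assumes test files live under tests/; the shard environment variable
# names are hypothetical and would come from your CI system.
import hashlib
import os
import pathlib
import subprocess
import sys

shard_index = int(os.environ.get("CI_SHARD_INDEX", "0"))
shard_count = int(os.environ.get("CI_SHARD_COUNT", "1"))

all_tests = sorted(str(p) for p in pathlib.Path("tests").rglob("test_*.py"))

def shard_of(path: str) -> int:
    # Stable hash-based assignment so every worker computes the same partition.
    return int(hashlib.sha256(path.encode()).hexdigest(), 16) % shard_count

mine = [t for t in all_tests if shard_of(t) == shard_index]

if not mine:
    print(f"Shard {shard_index}: nothing to run.")
    sys.exit(0)

sys.exit(subprocess.run(["pytest", *mine]).returncode)
```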
Now the hot take: in nearly every professional software development context it is fine if “rapidly” for the pre-deployment category of tests is denominated in multiple hours.
Yes, really.
Obviously, make it faster than that if you can, but if you have to trade away “did I miss something” coverage, don’t. Hours are fine, I promise. You can work on something else or pick up the next story while you wait—and skip the “but context switching!” line; stop feverishly checking whether your build is green and work on the next thing for 90 minutes regardless.
“But what if the slow build fails and I have to keep coming back and fixing stuff with a 2+ hour wait time each fix cycle? My precious sprint velocity predictability!”—you never had predictability; you paid that cost in fixing broken releases that made it out because you didn’t run all the tests. Really, just go work on something else while the big build runs, and tell your PM to chill out (a common organizational failure uncovered here is that PMs are held accountable for late releases but not for severe breakage caused by them pushing devs to release too early and spend less time on testing).
“But flakes!”—fix the flakes. If your organization draws a hard “all tests run on every build and spurious failures are p0 bugs for the responsible team” line, then this problem goes away very quickly—weeks, and not many of them. Shame and PagerDuty are powerful motivators.
“But what if production is down?” Have an artifact-based revert system to turn back the clock on everything, so you don’t need to wait hours to validate a forward fix or cherry-picked partial revert. Yes, even data migrations.
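At its simplest, “artifact-based revert” can be a manifest of past releases plus a script that repins every service to the last known-good build. The releases.json layout and the deployctl command below are placeholders for whatever your deploy tooling actually is; the point is that rolling back means redeploying a stored artifact, not rebuilding or re-validating anything.

```python
# Sketch of an artifact-pinned revert: roll every service back to the last
# known-good release in one step. The manifest format and the "deployctl"
# CLI are hypothetical stand-ins for real deploy tooling.
import json
import pathlib
import subprocess

RELEASES = pathlib.Path("releases.json")  # hypothetical release history manifest

def last_known_good() -> dict:
    history = json.loads(RELEASES.read_text())
    # Each entry is assumed to look like:
    # {"version": "...", "healthy": true, "artifacts": {"svc": "artifact-id", ...}}
    good = [r for r in history if r["healthy"]]
    if not good:
        raise SystemExit("No known-good release recorded; cannot revert.")
    return good[-1]

def revert() -> None:
    release = last_known_good()
    for service, artifact in release["artifacts"].items():
        # Redeploy the stored build; no rebuild, no hours-long test pass.
        subprocess.run(["deployctl", "pin", service, artifact], check=True)
    print(f"Reverted all services to release {release['version']}")

if __name__ == "__main__":
    revert()
```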
You are of course entitled to your opinion, and I do appreciate going against the grain, but having worked in both an “hours” environment and a “minutes” environment, I couldn’t disagree more. The “minutes” job is so much more pleasant to work in, in nearly every way. Ironically, it also ended up being higher quality, because you couldn’t lean on a giant integration test suite as a crutch: instead you got automated business-metric-based canary rollbacks, sophisticated feature flagging and gating systems, contract tests, and so on. These run in production, so they are accurate where integration tests often aren’t in a complicated service topology.
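To make the metric-based canary rollback idea concrete, here is a toy sketch: compare a key business metric between the canary and stable fleets and roll back automatically when it degrades. The metric source, the threshold, and the deployctl command are all placeholders for whatever your production tooling provides.

```python
# Toy sketch of a business-metric canary gate. The metric source is stubbed
# with environment variables so the script runs end to end; the threshold
# and the "deployctl" rollback command are made up for illustration.
import os
import subprocess

# Canary must retain at least 95% of the baseline metric (made-up threshold).
ROLLBACK_THRESHOLD = 0.95

def fetch_metric(fleet: str) -> float:
    # Stand-in for a real metrics query (e.g. checkout conversion over the
    # last 15 minutes for the given fleet label).
    return float(os.environ.get(f"METRIC_{fleet.upper()}", "1.0"))

def evaluate_canary() -> None:
    baseline = fetch_metric("stable")
    canary = fetch_metric("canary")
    if baseline > 0 and canary / baseline < ROLLBACK_THRESHOLD:
        # The decision is driven by production metrics, not by a
        # pre-deployment integration suite.
        subprocess.run(["deployctl", "rollback", "canary"], check=True)
    else:
        print("Canary healthy; continuing rollout.")

if __name__ == "__main__":
    evaluate_canary()
```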
There are also categories of work that are so miserable with long deployment times that they just don’t get done at all in those environments: improving telemetry, tracing, and observability, or performance debugging, where lower environments aren’t representative.
I would personally never go back, for a system of moderate or greater distributed complexity (i.e. >10 services, 10 total data stores).
All very fair points! I think it is perhaps much more situational than I made it out to be, and that functioning in an “hours” environment is only possible as described if some organizational patterns are in place to make it work.
Yeah, I realized as I wrote that out that my personal conclusions probably don't apply in a monoservice-type architecture. If you have a mono- (or few-) service architecture with a single (or few) DB, it is actually feasible to have integration tests that are worth the runtime. The bigger and more distributed you get, the more the costs of integration tests go up (velocity, fragility, maintenance, burden of mirroring production config), and the equation doesn't pencil out anymore. There are probably other scenarios where I'm wrong too.