Fragile narrow laggy asynchronous mismatched pipes kill productivity (thume.ca)
264 points by trishume on May 17, 2020 | 86 comments


So I have been thinking about software projects lately, and I have come to the conclusion that a lot of these tools/solutions exist to "build houses" when most of us are just throwing together lean-to sheds and dog houses.

Software projects today are naturally more complex and have more complex tooling the same way building a house today requires more knowledge and skill than it did 50 years ago.

Then there are some folks/organizations building cathedrals, and the associated tooling (React, Angular, Maven, etc.), and all the rest of us look up in awe and think "well, I guess if I want to be that good I need to use those tools on this dog house."

But your dog house doesn't need to host parties, provide security, or even offer real weather protection beyond a roof to keep the rain and sun out. Yet we all try to build our dog houses in ways that might be better if they are one day converted to proper living quarters, but they will likely never need running water or windows.


French cooking's mise en place or first order retrievability by Adam Savage or Lean Engineering or whatever.

There's a common thread among them, having what you need, when you need it, where you need it, and understanding how to use it.

This is what is lacking. The original Unix philosophy, built by greybeards and university linguistics and language professors, had this at its core. The "do one thing well" principle, combined with a shell designed by people who really, really understood language and came up with one that made sense to them for interacting with computers... was, and still somewhat is, wonderful.

What's missing today is exactly what you mention: people designing tool sheds out of materials meant for cathedrals, often shoddily.

Complexity is the enemy. The second enemy is bad attempts to reduce complexity, which often end up adding more complexity than they take away, just harder to find.

My favourite example of this - perhaps apocryphal, but entirely believable - is replacing dozens of nodes running fancy big data tools with one node running sed/awk, etc.
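The single-node version of that story is usually just a streaming pass over a flat file. A minimal sketch in Python of the kind of job that sometimes gets a whole cluster (the filename and field position are made up for illustration):

```python
from collections import Counter

def count_events(path, field=1):
    """Tally values from one column of a tab-separated log file,
    streaming line by line so memory stays flat regardless of file size."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) > field:
                counts[parts[field]] += 1
    return counts

# Hypothetical usage:
# for event, n in count_events("events.log").most_common(10):
#     print(event, n)
```

On a single machine with fast local disk, this sort of loop chews through gigabytes in minutes, with no cluster, scheduler, or serialization layer to debug.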

One thing is clear, nowhere I've been has had the tools readily available, the documentation clear and forthcoming, and the scale in the right range for projects.

I found myself recently solving a problem with Hashicorp Vault, using GCP to verify the identity of machines wanting secrets. It was a stretch goal which had been on my plate for six months; every once in a while I would try to go back and figure out how to make it work, and months and months and months after trying, I put it together and it worked perfectly. The documentation that led to this understanding had to be read out of order across several different pages, with some lucky guesses, to arrive at the solution, which in the end was just a few steps, easily explained. Afterwards the documentation was fine and made perfect sense. Before I grokked the issue, the documentation just seemed like a bunch of nonsense, which led me to believe that what I wanted to do wasn't possible in a constrained security environment.

That is the kind of problem I solve all the time as someone with a decade of DevOps/SysAdmin/whatever experience behind me. Not using knowledge and tools to amplify what I do, but spending 60% of my time confused as hell about something which should be obvious and is only obvious afterwards, 20% trying to convince people of things they're often reluctant to believe, and 20% actually using built up knowledge and tools to do many many things very quickly. It's frustrating.


> Not using knowledge and tools to amplify what I do, but spending 60% of my time confused as hell about something which should be obvious and is only obvious afterwards, 20% trying to convince people of things they're often reluctant to believe, and 20% actually using built up knowledge and tools to do many many things very quickly. It's frustrating.

It also creates some nasty perverse incentives. Because if you spend the hours to find the best way to do something that should be simple, it often turns out to actually be simple. API function #487 does just exactly what you need, if only it hadn't taken seven hours to find it. So then you spend a day writing five lines of code that should have taken half an hour, but in the end you have a good solution.

But there is an alternative to that. You can reinvent the wheel. Instead of spending your time understanding the unnecessarily complicated thing with poor documentation, write some new code to do just your thing.

And who looks more productive to the boss? The one who spent all day to write five lines of code, or the one who added a complex new feature that clearly took a lot of time and effort (but will now take ten times more effort to maintain)?

This is especially nasty because it's not always an obvious answer. Sometimes the existing wheel is made of glass and is shaped like a triangle and reinventing it is totally worth it, and then you spend all day discovering that and still have to spend tomorrow doing what a lazier person did yesterday.

Whereas without the unnecessary complexity the right answer would have been obvious much sooner.


This is why good documentation is such a wonderful feature. I like Elixir for lots of reasons but the documentation for the core language and particularly for Phoenix (the primary web framework) is the best technical documentation I have seen in any context, ever.

It makes such a difference to your ability to get things done with unfamiliar libraries etc.


> Complexity. This is the enemy, the second enemy is bad attempts to reduce complexity which often end up adding more complexity than they take away, just harder to find.

This is true at every level of the systems design process - often, by trying to make a system "simpler", i.e. less complex for the end user, the complexity is shifted further down the stack into application code, or even into the underlying infrastructure.

It's easy for those of us with technical backgrounds to see the beauty and simplicity in well-designed interfaces, but as the realm of computing and computer interaction shifts away from technical to non-technical people, we start to absorb some of that complexity into our systems design to make up for the knowledge shortcomings of end users.

I feel your example of sed being better than the "fancy data tools" is a good one - whilst sed is incredibly powerful for this use case, if the consumer only knows how to use Excel, you often have to create these abstraction layers so the end user can carry out their own primary function/role.


There is a difference between inherent complexity and manufactured complexity.

Inherent complexity can only be moved around, but even then you still want to move it to a place that can handle it properly. If the best you can do is a complex decision tree then it should at least be well-audited and well-documented instead of having a dozen separate buggy ones that all give different results for unknown reasons.

But much of today's complexity isn't inherent. It's manufactured. It's just not required to be there at all and there is much to be gained by taking it out.


BTW Google Secret Manager is ridiculously easy to use. It's not quite at the level of Vault, but it does enough for me and is 100x easier to deploy.


You have one group building dog houses like the dog will always be a puppy, and another group looking at those people, laughing, and wandering off to build a kennel, just in case you might decide at some point to start raising show dogs for 4 different breeds at once.

What is most likely instead is that you'll get a second dog in a couple years and need to make a better version of what you already have. And if it's nice enough, make a couple more for friends or as a gift.


Software engineering evolves to get the most out of your hardware.

If you sort of turn that idea around in your head, software will always end up trying to barely work on the shittiest hardware.

Think about it. Chips work on 30 nm. What's our next step, to make 30 nm more solid and reliable? No, it's to go to 20 nm or 10 nm and get the same software to work.

The same goes with systems stuff. Do we work on having much more reliable cpus? No, we work on using lots more less-reliable cpus (or gpus) with more unreliable storage over less reliable interconnects and networks and get the software to (imperfectly) abstract away the problems.


I've heard it said that the reason bits of the Roman aqueducts are still standing is that they were overbuilt by at least a factor of 2.

The craftsman often wants to overbuild something. It's part of what gets them out of bed in the morning, or lets them sleep at night.

What if you - or someone else - got your margin of error wrong, like the Citibank building? A little extra to deal with unexpected situations isn't so bad.


If you compare the aqueducts to software, you might draw the completely wrong conclusions. Software that was extremely well built in the 70s is mostly gone (or if it's still there, it's not exactly admired by todays software craftsmen (looking at you, airline ticket booking systems)).

What makes software endure over time is its ability to react to changes of wildly different varieties (market shifts, changing hardware, new technology, ...). The Linux kernel might still be popular in 30 years, but it will likely look very different from today. If it looks the same in the way the aqueducts do, it won't be relevant anymore and will most likely be forgotten eventually.


> gets them out of bed in the morning

I always thought you should do it as best as you could too.

It took a bunch of time in the real world before I could actually grasp the concept of over-engineering in a meaningful way.

The first example of over-engineering I could "get" was when someone (jwz?) ranted about over-engineered high-end audio. No matter how closely you look, having $1000 00-gauge cables just won't make anything sound better. It's marketing.

The second example was Japanese motorcycles. I started realizing that there was an envelope. If you made something bigger and stronger, the weight would go up and the acceleration would get worse. If you made it all motor, acceleration would go up, but handling would get worse. Some of the fastest bikes were actually lighter, less powerful bikes that could still accelerate, but also go through turns.

As an example, before the S1000RR era, BMW made big, strong, durable motorcycles... that weighed a lot and handled in a very mediocre fashion.

(as to safety - yes, bringing safety or security into the equation ... you change the equation)


But it won't look as good on the quarterly report. What profit has the managerial class extracted from the aqueduct?


What have the Romans done for us?

Someone, and I wish I could recall who or even where, once sold me on the notion that Roman expansion was empowered by the same kind of 'replaceable parts' strategy that powered the ascension of the United States.

According to this person, everywhere the Romans went they left cookie cutter solutions to problems that made their military more fungible. If I move these guys from southern France, to northern Greece, everything is set up the same way and they can hit the ground running.

Where we have standardized manufactured equipment, they had standardized infrastructure.


"The stone mason lives in a wooden house."


Failure is such a fun thing to think about, and it gets handwaved away so often. So many devs, architects, product owners, etc., just focus on the happy path, and leave failure unspecced, unhandled, and just hope it never happens. And then they boast about 99% uptime, but once you start questioning them you find out they get weekly pages they have to go investigate (and really the system is behaving weirdly a solid 10% of the time, but they don't know what to do about it and it eventually resolves itself, and they don't count "pageable weirdness" in their failure metric).

It's actually one of the things I love about Erlang, and how it's changed my thinking: think about failures. Or rather, don't. Assume they'll happen, in ways you can't plan for. Instead, think about what acceptable degraded behavior looks like, how best to ensure it in the event of failure, and how to automatically recover.
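The "assume failure, automate recovery" stance can be approximated even outside Erlang's supervision trees. A toy sketch in Python (restart limit and backoff are arbitrary choices, not an Erlang/OTP equivalent):

```python
import time

def supervise(task, max_restarts=3, backoff=0.1):
    """Run task(); on any exception, restart it up to max_restarts
    times with growing delay, then give up and re-raise.
    A crude single-process analogue of a supervisor's restart policy."""
    restarts = 0
    while True:
        try:
            return task()
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # escalate, like a supervisor exceeding its restart intensity
            time.sleep(backoff * restarts)  # linear backoff between restarts
```

The point isn't the loop itself but the shape: the recovery behavior is specified up front, instead of being improvised at 3 a.m. when the pager goes off.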


On the subject of failures, you might like this blog post https://danluu.com/postmortem-lessons/ if you haven't seen it before.


As a developer who's trying to move on to either a management or product role, failure modes are one of the things I want to emphasize in that role. The sad fact is that so many product owners really don't understand how software works or gets built, and as such, they are unequipped to reason about such things.


I think git is a good model for what would otherwise be "laggy async and mismatched" distributed systems.

It has a fast sync algorithm, and after you sync, everything works locally on a fast file system. You explicitly know when you're hitting the network, rather than hitting it ALL THE TIME.
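That split - one explicit network step, then purely local reads - can be sketched generically. A minimal illustration in Python (the class and names are invented for this example, not a real git mechanism):

```python
import os

class LocalMirror:
    """Toy model of git's approach: the network is touched only in
    sync(); every subsequent read is served from a local copy."""

    def __init__(self, fetch, cache_dir):
        self.fetch = fetch          # the ONLY network-touching callable
        self.cache_dir = cache_dir

    def sync(self):
        # Explicit, user-visible network step (analogous to `git fetch`).
        data = self.fetch()  # dict of name -> bytes
        os.makedirs(self.cache_dir, exist_ok=True)
        for name, blob in data.items():
            with open(os.path.join(self.cache_dir, name), "wb") as f:
                f.write(blob)

    def read(self, name):
        # Always local and fast; never touches the network.
        with open(os.path.join(self.cache_dir, name), "rb") as f:
            return f.read()
```

The design choice being illustrated: latency and failure are concentrated into one operation the user can see, retry, and reason about, instead of being smeared across every read.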

-----

I would like to use something like git to store the source code to every piece of software I use, and the binaries. That is, most of a whole Linux distro.

I have been loosely following some "git for binary data" projects for a number of years. I looked at IPFS like 5 years ago but it seems to have gone off the rails. The dat project seems to have morphed into something else?

Are there any new storage projects in that vein? I think the OP is identifying a real problem -- distributed systems are unreliable, and you can get a lot done on a single machine. But we are missing some primitives that would enable that. Every application is littered with buggy and difficult network logic, rather than having a single tool like git (or rsync) which would handle the problem in a focused and fast way.

It would be like if Vim/Emacs and GCC/Clang all were "network-enabled"... that doesn't really make sense. Instead they all use the file system, and the file system can be sync'd as an orthogonal issue.

Sort of related is a fast distro here I'm looking at: https://michael.stapelberg.ch/posts/2019-08-17-introducing-d...


Actually git “suffers” from two of the problems he listed: cache coherency and, as with all filesystem based approaches, serialization.

These don’t matter for git which manages to push all the coherency issues onto the user and which can afford to operate (in computational terms, not human terms) very slowly on small amounts of data.

I’m not saying git is slow (it’s gratifyingly fast) but it has a remarkably smaller problem domain than the one described in the article.


I don't agree with that framing... When you say "cache coherency" you're implicitly assuming some authoritative state. The point of git is that there isn't a single authoritative state.

Not all apps will work with that model, but more apps than you think would.

I guess the difference is how fine-grained you want the updates to be. For example something like Figma in the browser (a collaborative photoshop) implements a lot of custom application-specific sync with CRDTs and so forth.

Maybe you need the really fine-grained updates, or maybe you just need some Github-like site which allows coarse-grained collaboration.

In other words, there can be a network model between "e-mail a .PSD file" and Figma. I think this "in between" would scale to more applications than reinventing sync inside every app. Imagine audio editors, video editors, 3D modellers, etc. Rewriting all those in the style of Figma is prohibitive.

I would rather have open file formats and application-agnostic sync, like git. And I think it's a lot cheaper to develop, although current software business models don't really support its development.


“The point of git is that there isn't a single authoritative state.“

I think you’re missing the point of the post you refer to.

For git, there _technically_ isn’t a single authoritative state, but for many, if not most, git use cases, there _sociologically_ is. Projects typically have one repo that is _the_repo_: the repo most merges are done to, that releases get built from, whose url you give when you tell people where the project lives, etc.

All other clones of that repo, sociologically, are just caching the main repo and changes you make.

Then, humans have to decide when to flush what part(s) of the write cache to the main server, and when that results in conflicts, humans have to resolve them.

That’s, I think, what “which manages to push all the coherency issues onto the user” means.


You are correct, I meant that even in a peer-peer merge (which can even be just one person debugging the same program on both a Mac and a Linux machine), any merge conflicts are marked and left for the user to decide about.


The presence of useful forks means that there isn't a single authoritative state. For example, I've had a fork of bash-completion for over a year, and I use it all the time.

How do I fork a Google doc? At best you can copy and modify, but there's no merge.

Merging is a feature, not a bug (maybe not for all applications, but 99% of them).


This is also an interesting system in that it's an example of how you can get away with a non-distributed system if your problem is small enough but eventually that falls over. Once you get to large corporation monorepos git operations start to get real slow and use too much hard disk, so you end up with them either creating/using a new VCS or doing some complicated undertaking like https://devblogs.microsoft.com/bharry/the-largest-git-repo-o...


Yeah I agree, it's sort of an open problem, but I guess a bunch of arrows are pointing toward FUSE.

Distri uses FUSE and it appears Microsoft's GVFS uses FUSE or whatever Windows technology is the equivalent. (My teammates developed Google's equivalent about 14 years ago, using FUSE, so it's something I've used / seen several times.)

FUSE requires some kernel support (a module), which git of course doesn't require. That is a barrier, but perhaps not an insurmountable one. Basically I would like to offload all the "network" work to the OS, so applications are free of that logic.


Not sure how well it fits your use-case, but I've been very happy with git-lfs in combination with a NAS I have at home. The NAS is just mounted as a normal network drive and available to use with LFS via lfs-filestore[0] and available on the go via the builtin DynDNS + VPC of the NAS.

I've been using it for a repo where I store all my university/academic stuff like lectures (recordings), PDFs of books and papers, Anki decks, etc., and it has now grown to be ~120GB big.

Biggest issue was that I'm always running low on disk space with my laptop, and git-lfs doesn't have a good built-in way to only checkout part of the files on your machine, so I built a small tool to make that easier[1]. Since I've been using that it's been a pretty smooth ride.

[0]: https://github.com/sinbad/lfs-folderstore

[1]: https://github.com/hobofan/lfs-unload


I have been interested in "git for binary data" for a while, mostly for ML/computer vision purposes.

I've tried quite a few systems. Of course, there's git-lfs (which keeps "pointer" files and blobs in a cache), which I do use sometimes - but it has quite a few things I don't like. It doesn't give you a lot of control over where the files are stored and how the storage is managed on the remote side. The way it works means there'll be two copies of your data, which is not great for huge datasets.

Git-annex (https://git-annex.branchable.com/) is pretty great, and ticks almost every checkbox I want. Unlike git-lfs, it uses symlinks instead of pointer files (by default) and gives you a lot of control in managing multiple remote repositories. On the other hand, using it outside of Linux (e.g., MacOS) has always been a bit painful, especially when trying to collaborate with less technical users. I also get the impression that the main developer doesn't have much time for it (understandably - I don't think he makes any money off it, even if there were some early attempts).

My current solution is DVC (https://dvc.org/). It's explicitly made with ML in mind, and implements a bunch of stuff beyond binary versioning. It does lack a few of the features of git-annex, but has the ones I do care about most - namely, a fair amount of flexibility on how the remote storage is implemented. And the one thing I like the most is that it can work either like git-lfs (with pointer files), like git-annex (with soft- or hard-links), or -- my favorite -- using reflinks, when running on filesystems that support it (e.g. APFS, btrfs). It also is being actively developed by a team at a company, though so far there doesn't seem to be any paid features or services around it.

Pachyderm (https://www.pachyderm.com) also seems quite interesting, and pretty ideal for some workflows. Unfortunately it's also more opinionated, in that it requires using docker for the filesystem, as far as I can tell.

Edit: a rather different alternative I've resorted to in the past -- which of course lacks a lot of the features of "git for binary data" -- is simply to do regular backups of data to either borg or restic, which are pretty good deduplicating backup systems. Both allow you to mount past snapshots with FUSE, which is a nice way of accessing earlier versions of your data (read-only, of course). These days, this kind of thing can also be done with ZFS or btrfs as well, though.


+1 for DVC. Setting up the backing store can be some extra work if you are doing that yourself, but after that it's a breeze.

What do you use for the backing store?

Git-lfs has been a pain in my seat since my first use of it. Most of the issues stem from the pointer files that have to be filtered/smudged around commit and checkout.

Haven't used git-annex myself, but I have heard from coworkers that cross-OS is a pain.


Mostly S3. I used to do SSH, but these days I can afford to keep the data in the cloud. I do appreciate the possibility of migrating to other stores if needed in the future, though - might have to soon, for $reasons.


Actually, a lot has changed in git-annex in the last few years. It now supports git pointer files like git-lfs, which makes things easier when you want to modify binary files. In fact, it can even use git-lfs servers as one of its back-ends. However, I still prefer symlink mode, because operations on symlinks are faster, as they bypass the smudge filter.

Also, git-annex uses reflink copies whenever possible, on ZFS, btrfs, or APFS. And since people were talking about p2p and git: git-annex does this amazing trick of syncing directly to other git-annex repos, even with the checked-out branch. There is no need at all for a separate server.

I have used git-annex for years on OSX, and have not found it to be deficient in any way compared to Linux.


Yeah, git-annex has a lot of cool features that I have yet to see in other systems. I still use it for some things. My main pain point on MacOS was that the symlink mode didn't work well with some apps that didn't understand symlinks. Obviously this is not git-annex's fault, but it still made it so I couldn't use it. I think I could try again at some point and see if I could get it to use reflinks -- maybe it's a version issue.

I also had weird conflicts with the line ending (the whole CR/LF annoyance) on some of the metadata files git-annex used which I couldn't fix, no matter how many .gitconfigs I tweaked. Again, this is not really git-annex's fault, I think.


Thanks for the response! I have heard of dvc and git annex, and it's probably time to give them another try :)


> It would be like if Vim/Emacs and GCC/Clang all were "network-enabled"... that doesn't really make sense. Instead they all use the file system, and the file system can be sync'd as an orthogonal issue.

Well, you may already be aware, but Emacs is actually 'network-enabled' in this way through TRAMP, and to a lesser extent, the emacs client/server protocol.

There are also many issues other than file syncing that networking has to solve. And also, even for regular files, there are many ways to interact with them, and many protocols that existing systems can already speak.


I've been bouncing an idea around for a while on how I could use git as a back-end for a filesharing/chat/collaboration suite. I think it would work to have a pre-commit git hook that replaces all binary files / blacklisted file extensions with a text file whose contents are the magnet address to download the large binary file over torrent - so the file name, permissions etc. don't change, but a small text file is committed instead of the binary.

So, as a consumer of the repo, I would clone it, and then have a post-checkout hook check for the blacklisted file extensions and grab their contents: the magnet link. All that's left is to find peers, download the file, and replace the text file with a hardlink to the binary. Maybe this is all too laggy, asynchronous, and mismatched, but I think git+magnet could be a cool combination.

I found this repo [1] that generates the magnet link for you; I would just need to find a way to use the repo to make all the contributors peers of each other, so we can download the binaries from whoever is nearest / has the highest bandwidth.

[1] https://github.com/casey/intermodal
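The pointer-swap half of that hook could be quite small. A hypothetical sketch in Python (the pointer format is invented here; producing the magnet URI itself would be delegated to an external tool like the one linked above):

```python
import os

POINTER_PREFIX = "magnet-pointer-v1\n"  # made-up marker for this sketch

def write_pointer(path, magnet_uri):
    """Replace a large binary with a small text pointer file at the
    same path, so the name and location committed to git don't change."""
    os.remove(path)
    with open(path, "w") as f:
        f.write(POINTER_PREFIX + magnet_uri + "\n")

def read_pointer(path):
    """Return the magnet URI if the file is a pointer, else None
    (e.g. for files that are still real binaries)."""
    try:
        with open(path, "r") as f:
            text = f.read()
    except (UnicodeDecodeError, OSError):
        return None
    if text.startswith(POINTER_PREFIX):
        return text[len(POINTER_PREFIX):].strip()
    return None
```

A pre-commit hook would call `write_pointer` for each blacklisted file; a post-checkout hook would call `read_pointer` and hand any URIs it finds to a torrent client. This mirrors the clean/smudge pointer approach git-lfs uses, just with magnet links as the transport.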


What you are describing sounds a lot like git-annex[1] with the IPFS remote[2]. There is also a bittorrent remote[3] but it doesn't handle uploading content; you have to create the .torrent file or magnet link yourself and add it to the repo, at which point git-annex will download the content automatically on checkout.

[1] https://git-annex.branchable.com/ [2] https://git-annex.branchable.com/special_remotes/ipfs/ [3] https://git-annex.branchable.com/special_remotes/bittorrent/


I think that a version of this idea underpins the operation of the Matrix messaging protocol. Rather than just sending the latest messages, the conversation is synced between clients to ensure that everyone sees the same history.

[1] https://matrix.org


There is git/github on top of Secure Scuttlebutt:

https://github.com/noffle/git-ssb-intro


I have searched for something like this for backups, but have not found anything good.

But I also have conflicting wishes. I want it to store history (so that a backup of directory A cannot be overridden by a backup of an unrelated directory B, and it can apply renames without copying large files again), and at the same time I do not want it to store history (so that large files removed from the main system can be permanently removed from the backup).


I agree that git is a great model; however, it's often hard to explain to new users why merge conflicts require time to resolve, and nothing saves you from the added work.

Life just isn't completely decentralizable... sorry.


git isn't completely decentralized either. You can centralize it, like Github does, and that plays an important role in the ecosystem.

Decentralization is a spectrum, not the opposite of centralization.

----

One way to partially address the merge conflict problem is to explode application files into directory hierarchies.

For example, Word .doc files and Photoshop .PSD files are basically huge hierarchical data structures inside a single file. I believe video formats also have significant hierarchical structure.

A lot of them even have immutable portions and mutable portions -- e.g. for storing an entire version history.

So if those were exploded into something that the OS (or tools like git/rsync) could understand, like a tree of files, then you would have a lot fewer merge conflicts.

That's how people tend to structure their git repos too. If you have a frequently edited file that's hard to merge, that's a smell, and you can fix it.

This won't solve every problem, but again, there needs to be something in between "rewrite Photoshop as Figma" and "email around a bunch of PSD files" (which is a very flaky distributed system on top of e-mail).
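The exploding idea itself is simple to demonstrate. A minimal sketch in Python, using a nested dict as a stand-in for a document's internal hierarchy (the layout is invented; real formats like .PSD would need a format-specific mapping):

```python
import os
import json

def explode(doc, root):
    """Write a nested dict as a directory tree: dict keys become
    directories, leaf values become small JSON files. Tools like git
    can then diff and merge per-leaf instead of per-blob."""
    os.makedirs(root, exist_ok=True)
    for key, value in doc.items():
        path = os.path.join(root, str(key))
        if isinstance(value, dict):
            explode(value, path)
        else:
            with open(path + ".json", "w") as f:
                json.dump(value, f)

def implode(root):
    """Inverse of explode: rebuild the nested dict from the tree."""
    doc = {}
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            doc[name] = implode(path)
        elif name.endswith(".json"):
            with open(path) as f:
                doc[name[:-5]] = json.load(f)
    return doc
```

With a document stored this way, two people editing different "layers" produce changes in different files, and a plain `git merge` resolves them with no conflict at all.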


> most of a whole Linux distro.

It's NixOS.


git has some robustness problems over slow, fragile connections, though. It will not corrupt data, but if your connection cannot stay up long enough to clone an entire repo, you need to resort to ugly workarounds.


See also Peter Deutsch's "Fallacies of Distributed Computing" list (https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...).

There's some overlap, but also some new stuff. In particular, "pipes" isn't covered by the Fallacies list and is consistently a pain point and/or issue you always face in some way. Also "asynchronous" isn't covered by the Fallacies list.


> Sometimes a distributed system is unavoidable, such as if you want extreme availability or computing power, but other times it’s totally avoidable.

But so much of our sales pitch involves these shiny cloud systems.

Who ever sold business by telling the customer: "Your use-case really isn't exciting, and a boring batch-driven process is completely appropriate"?


>Who ever sold business by telling the customer: "Your use-case really isn't exciting, and a boring batch-driven process is completely appropriate"?

I did many times. Many practical businesses are looking how to solve their real problems in reliable and cheapest way. They mostly do not give a flying hoot about all those buzzwords and super duper newest tech. They just ask how much, how exactly will it work, what are hidden cost, how will it be maintained and they will also look at your profile (clients, references, examples of finished projects etc).


I do it like that, and it's always met with excitement, like "oh wow, all these other companies were telling us how hard and costly and lengthy it would be, thank you".


Do you ever have the experience of doing a prototype, and the customer looks at it and says: "Great work. Put it into production"?


every. damn. time. i'm a freelancer, so there's not a dev team behind me. the client likes what i made, and wants to start using it. realizing this, i've started spending much more time on the UI/UX (even though i'm not one of those guys) to at least make the tool useable in a dogfooding way.


Maybe it's not 100% relevant to your case, and maybe you've already seen it, but just in case:

https://www.joelonsoftware.com/2002/02/13/the-iceberg-secret...


Joel Spolsky is my hero. There are so many nails he’s been hitting on the head for decades.

Engineering (to me) is a craft; not just a vocation.

I also think that the software industry’s Ponce de Leon obsession might be part of the problem.

Younger folks tend to “dream big,” which I think may be a factor behind the ageism.

“They haven’t had some grizzled old fart tell them that it can’t be done!” is the argument that I’ve heard.

Tru dat, but one of the things that older folks have, is experience getting things done, often, highly optimized, polished, and of excellent quality. That tends to require a lot of tenacity and patience.

Shipping is boring.

Ever watch a building go up? A good prefab looks complete after three months, but doesn’t open for another nine months. It looks awesome and shiny, but is still behind a rent-a-fence. What gives?

That’s because all that interior work; the finish carpentry, the drywall, the painting, etc., take forever, and these are the parts of the building that see everyday use, so they need to be absolutely spot-on. The outside is mostly a pigeon toilet. It doesn’t need to be as complete; a solid frame and watertight is sufficient. They just needed it to keep the rain out, while the really skilled craftspeople got their jobs done.

I like to make stuff polished, tested and complete. I don’t like making pigeon toilets.


This article is spectacular, thank you!

Joel explains that it's hard to explain dev work to non-devs, e.g. the Business. They think the UI/screenshot is the software, which has a number of consequences.

As a DevOps engineer, I also see a similar challenge explaining Infra/DevOps to developers :)


I love the categorization. But decent software should be distributed. I dislike single teams that beget tens of microservices, but you should buy, not build, features from specialized 3rd parties. Thus a decent modern installation should lean on a ton of 3rd-party services (e.g. identity providers, databases, caches) because they all do a better job than the hand-rolled local one. It's how you outsource expertise.

The vision of the service mesh is to make unreliability and security no longer the job of the application binary. Even without a service mesh, you can put a lot of common functionality into a reverse proxy. Personally I am loving OpenResty for the simplicity of writing adapters and OAuth at the proxy layer with good performance.


I think it should be possible to buy or use software from third parties. One thing I'm disappointed about and think we need better tools to avoid is the fact that third parties provide their tools as services rather than libraries. There's reasons they do that, deploying a library that all your customers can easily use is hard right now, but there's no reason it has to be that way.

Some things can't easily be libraries like databases, but other things like some caches, miscellaneous operations like image resizing (depending on an architecture that can handle the load on those servers), and a bunch of other things could just be libraries.


The world is distributed. Money is distributed. You need services to interact with the world. The large portion of useful business services cannot be encapsulated into a hermetic library.

If you think that shipping distributed-service clients as libraries is enough to solve the issues of distributed computing, that is incorrect: most of the same distributed-systems crap will still happen.


Totally agree. I have found that all of the pain points the OP article mentions are things that still crop up in monoliths, and that by hardening against them, the app becomes more antifragile. E.g.:

Fragile: writing fault-tolerant code (because the endpoint is unreliable) hedges against the case when unforeseen conditions throw a spanner into the works.

Async/Laggy: some operations just take longer. Being able to gracefully route control flow around blockages improves performance.

Pipes: I've found it's helpful to have ser/de interfaces throughout the program, as it forces you to use immutable data structures, aka functional programming. This helps reduce the state space.

Put together, I find having various "shear lines" in a monolith codebase, where I could split into a service if I wanted, greatly improves robustness.
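To make the ser/de "shear line" idea concrete, here is a minimal sketch (all names here, like Order, to_wire, and from_wire, are hypothetical): a frozen dataclass is serialized to plain data at a module boundary, so only immutable values cross it, and the boundary could later become a network hop without changing the interface.

```python
# A hypothetical "shear line": a frozen (immutable) record that is
# serialized to JSON at a module boundary and re-validated on the way in.
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)  # frozen => attempts to mutate raise an error
class Order:
    order_id: str
    quantity: int

def to_wire(order: Order) -> str:
    """Serialize at the boundary; only plain data crosses it."""
    return json.dumps(asdict(order))

def from_wire(payload: str) -> Order:
    """Deserialize and re-validate; same code would work over a socket."""
    data = json.loads(payload)
    return Order(order_id=data["order_id"], quantity=int(data["quantity"]))
```

If this boundary later needs to become a real service, the wire format is already defined and the callers already treat the data as immutable.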


> Some things can't easily be libraries like databases

The world's most widely deployed database engine (SQLite) is only available as a library.
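For illustration (Python's standard library happens to bundle SQLite): the entire "database" is an in-process handle, with no server or network involved.

```python
# SQLite as a library: the "database" lives inside this process
# (here, entirely in memory), no server or network round-trips.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv VALUES (?, ?)", ("greeting", "hello"))
row = conn.execute("SELECT v FROM kv WHERE k = ?", ("greeting",)).fetchone()
# row[0] == "hello"
```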


The world's most trafficked database is probably Google's search* index, and it's not SQLite, and it's certainly distributed.

* Maybe Facebook's, I dunno; the point stands.


Modern browsers all use SQLite, so more than 90% of users accessing that index are doing so through an app using SQLite. Lots of things come with SQLite without you knowing about it.


Good point, though the Google index's users are 1. the Google search users AND 2. the machine crawlers that crawl webpages no one visits. I am not sure the human users of the index are the majority.

Admittedly this is a fairly pedantic point.


This article has a point.

But, as in all things software, "it depends."

It depends on what the tools are, and what we are writing.

In my own case, I have the luxury of writing fully native Swift code for Apple devices. I don't need to work with anything off the device, except for fairly direct interfaces, like USB, TCP/IP or Bluetooth.

Usually.

I have written "full stack" systems that included self-authored SDKs in Swift, and self-authored servers in PHP/JS.

I avoid dependencies like the plague. Some of them are excellent, and well worth the effort, but I have encountered very few that really make my life as an Apple device programmer that much easier. The rare ones I do use (like SOAPEngine, or ffmpeg, for instance), are local to the development environment, and usually quite well-written and supported.

If I were writing an app that reflected server-provided utility on a local device, then there's a really good chance that I'd use an SDK/dependency with network connectivity, like GraphQL, or MapBox. These are great services, but ones that I don't use (at the moment).

I'm skeptical of a lot of "Big Social Media" SDKs. I believe that we just had an issue with the FB SDK.

That said, if I were writing an app that leveraged FB services, I don't see how I could avoid their SDK.

So I write fully native software with Swift, and avoid dependencies. That seems to make my life easier.

But Xcode is still a really crashy toolbox.


Just reading the title I assumed it was a post about business processes and communication between teams. Because this is how working for a big corp sometimes feels.


It's a bit of Conway's law: the software structure mirrors the organization structure.


Really like the curiosity and thought behind this article. Couple of thoughts:

>probably upwards of 80% of my time is spent on things I wouldn’t need to do if it weren’t distributed.

Sure. Do it on one giant machine. Then you'll be spending 80% of your time doing things you wouldn't need to do if it weren't monolithic.

At the end of the day, if your customer is on the other end of the internet, then all of those complaints apply. If you solve that by running an app on their device, then oh boy are you going to have fun testing.

I prefer scaling out. The stackoverflow peeps prefer scaling up. There are some great write-ups about how they scale. I found this [1] one after some quick googling, but I am certain there are more. So it's really a matter of choosing your poison.

>I think people should be more willing to try and write performance-sensitive code as a (potentially multi-threaded) process on one machine in a fast language if it’ll fit rather than try and distribute a slower implementation over multiple machines.

Sure. I once replaced a system that ran on ten 32-core machines with one that ran on four cores of a single machine and did the work in the same time. Another time I had 96 cores and even more threads, and I replaced that system with one that had three threads and was faster.

But both of those solutions were evolutionary dead ends. The tasks were very specific, and not subject to change. The first one was a single C file. The latter was actually Java, but with hand-rolled hash tables and optimistic locks. The first one, I doubt I could follow it now.

My point is, you can have understandable systems that good people (as opposed to geniuses) can work on, evolve and adapt, and that have well understood failure modes and scaling cliffs. Or you can have bonkers code that everyone is afraid to touch, and which fails in production when it hits a cliff you didn't know about, and now your site is dead for eight days.

If you can strike a good balance, then you'll probably have some combination of distributed, and brute force.

[1] http://highscalability.com/stack-overflow-architecture


> Sure. Do it on one giant machine. Then you'll be spending 80% of your time doing things you wouldn't need to do if it weren't monolithic.

Deeply false equivalence.


Do you have a moment to talk about our Lord and Saviour, Erlang/OTP?


My thoughts exactly! As the Elixir code at my company continues to grow at a rapid rate, writing OTP services for everything has allowed me to never have to think about entire classes of bugs and edge cases simply by virtue of the patterns inherent in OTP and Elixir/Erlang.

Right tool for the right job.


Of course they do. But there's no alternative.

No matter how fast or beefy your server is, these days if your product becomes a success, 99% of the time it will outgrow what's possible on a single server. (Not to mention needs for redundancy, geographic latency, etc.) And by the time you see the trend heading upwards so you can predict what day that will happen, you already won't have the time for the massive rewrite necessary.

So yes, it's tons slower to write distributed servers/systems. But what other choice do you usually have?

Though, as much as possible, you can try to avoid the microservices route, and integrate everything as much as possible into monolithic replicable "full-stack servers" that never talk to each other, but rather rely entirely on things like cloud storage and cloud database. Where you're paying your cloud provider $$$ to not fail, rather than handle it yourself. Sometimes this will work for you, sometimes it won't.


I've seen a handful of applications that attempt to "scale" by going down the microservice route in a completely flawed way, only to end up with a tangled mess that's impossible to reliably debug. All progress halts.

There's nothing inherently wrong with your statement, just that it's still far too easy to write a shitty distributed system, and so much easier to push that complexity off onto the OS or even the network layer itself.

Why should I, the application developer, care about the way my users' data enters my DB? This should be tightly abstracted away, and traced/logged accordingly. Leave it to me, the systems developer, to get the details right, and share the fruits of my labor with everyone.

I can imagine no system more deserving of shared resources than network technology. Try and imagine a world without TCP/IP, do you not end up with something similar?


I think there are plenty of examples of teams that hit that cliff and had to rewrite. I am trying to remember specific articles. Counter-argument: I think stackoverflow hit a cliff, and remained monolithic, but that was several years ago [1] (sure they have a ton of servers, but they appear to be classic cache+web-server+database). There was another, maybe Instagram, that was monolithic and had to figure out how to shard their DB overnight.

[1] https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...

Also, I don't think your post deserved a down-vote.


I've found people have these problems inside of their datacenter, where there is reliable low-latency bandwidth, but where things might get rebooted due to upgrades or maintenance.

A common example is data being pushed between systems with HTTP. Take the simplest case of propagating a boolean value. You toggle some setting in the UI, and it sends an update to another system with an HTTP request, retrying on a delay if it can't connect. This has two problems. The first is that if the user toggles a setting on and then off, you can have two sets of retries going, producing a random result when the far end can be connected to again. The second is that the machine doing the retries might get rebooted, and people often fail to persist the fact that a change needs to be pushed to the other system.

I've seen this issue between two processes on the same machine, so technically you don't even need a network.
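One hedged way to avoid the competing-retry-loops race described above: record only the latest desired value with a monotonically increasing version, and have a single retry loop that always pushes the newest version (stale retries become no-ops). This is an illustrative sketch, not code from any real system; SettingPusher and its methods are invented names, and durable persistence of the pending change is left out.

```python
# Sketch: last-write-wins setting propagation with one retry loop.
import threading

class SettingPusher:
    def __init__(self, send):
        self._send = send          # callable that may raise on failure
        self._lock = threading.Lock()
        self._version = 0
        self._pending = None       # (version, value) awaiting delivery

    def set(self, value):
        """Record the newest desired value; older retries become stale."""
        with self._lock:
            self._version += 1
            self._pending = (self._version, value)

    def retry_once(self):
        """Called by a single retry loop; pushes only the latest value."""
        with self._lock:
            pending = self._pending
        if pending is None:
            return
        version, value = pending
        self._send(value)          # may raise; caller retries later
        with self._lock:
            if self._pending and self._pending[0] == version:
                self._pending = None   # delivered and still the newest
```

Toggling on and then off before connectivity returns results in exactly one "off" being sent, instead of a race between two retry loops. Surviving a reboot would additionally require writing the pending (version, value) pair to disk, which this sketch omits.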


If you're dealing with multiple processes persisting data to different data stores, you're dealing with a distributed database problem, which has many more failure modes than the ones you've described here.

If there's anything I've learned across my career, it's to avoid distributed databases unless absolutely necessary, and if it is necessary, then spend a bunch of time trying to make sure you got it right. And then even after that you probably got it wrong.


> Untrusted: If you don’t want everything to be taken down by one malfunction you need to defend against invalid inputs and being overwhelmed. Sometimes you also need to defend against actual attackers.

Sorry, but unless your centralized alternative is only used internally by troglodytes you have to at least defend against invalid inputs.


The title reminded me of the turboencabulator. https://www.thechiefstoryteller.com/2014/07/16/turbo-encabul...


Networking and distributed software are well understood nowadays; the lower productivity of web companies vs. SpaceX etc. comes from all the unstable (both ever-changing and buggy) software they need to use. Most modern software is affected by this, but the web has it worse because of security issues and because of the way browsers are evolving (on purpose, one has to add, because a cartel of large players on the web is trying to stifle competition). SpaceX doesn't get some innovative new alloy they didn't order every couple of weeks, and even the games industry has fewer obstructing external dependencies (hardware vendors and their drivers being one).

At least that's my experience from about 20 years of web development (15 professionally).


Yes, I have a little Python library for managing network shares in Windows.

It has things like automatic retries that "back off" slowly, switching to cached IPs in case DNS is down, and checking to see if all of the drive-letters are full and either re-using a letter or creating a "letter-less" share. I had to develop it during a period of great instability within our network. It's ... large and over-engineered, but it just keeps on truckin'.

On the other hand, it has been quite useful going forward, so that's a plus.

I tend to program fairly defensively, in layers, right down to the much maligned Pokemon exception handling. The results don't have the, ah, velocity that is so often praised but they'll be there ticking along years later.
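The "back off slowly" retry idea mentioned above can be sketched in a few lines; this is a generic illustration, not the commenter's actual library, and real code would usually add jitter and a cap on the total delay.

```python
# Minimal exponential-backoff retry helper (illustrative names only).
import time

def retry_with_backoff(op, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run op(), retrying on exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise               # out of attempts: surface the error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, 4s, ...
```

Passing `sleep` in as a parameter keeps the helper testable: a test can record the delays instead of actually waiting.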


Hah, I saw the title "Fragile narrow laggy asynchronous mismatched pipes kill productivity" and thought it was about the pitfalls of trying to coordinate remote teams across disparate time zones.


Yeah, my guess was it was a Slack takedown :)


> I hope this leads you to think about the ways that your work could be more productive if you had better tools to deal with distributed systems, and what those might be.

We have tools. The Promela/SPIN model checker is one just off the top of my head.


Just some fun with the English language:

> probably upwards of 80% of my time is spent on things I wouldn’t need to do if it weren’t distributed.

If you don't ever plan on distributing your software you can save a _lot_ of time :)


Funnily, I thought the headline was talking about the development process, as that also describes how a lot of places (mis-)handle the flow of what gets worked on.


> I also think all these costs mean you should try really hard to avoid making your system distributed if you don’t have to.

There's a point here about microservices.


It's perfectly applicable to people too; one stickler for the rules, or one slow worker, in a critical role, say security officer or change-board chair, can kill productivity.


I feel so attacked right now.


TL;DR: he means microservices. We now have a generation that doesn't even recall a pre-microservices world.

Tristan, the generation between us has created the IT world you describe. You'll probably spend the next 20 years of your career dealing with that mess. Sorry about that.


doh! I thought this was going to be about remote work.



