I propose that developers are allowed to use it in production... so long as they...

neilv · on May 28, 2024

A few years ago, a submarine cable was knocked out, between our startup's MVP servers in Singapore AWS, and our networked factory stations in Asia.

In lieu of the submarine cable, something closer to a wet string was being routed over. It had such high latency and packet loss, that a watchdog timer I'd implemented on the stations was timing out.

Fortunately, the remote access we'd built into our stations (SSH and OpenVPN) still worked, albeit at slow speeds, like a 300 baud dialup, and crazy-high latency.

Having occasionally dealt with performance a bit like that as a kid, and knowing my way around Linux, it was like "I've been training my whole life for this moment."

So I just flexed the old command-line-and-editor-when-you-feel-every-byte-transmitted skill, and got the stations working, before the factory even knew about the submarine cable, saving our infinite-9s uptime.

It was nothing compared to what NASA does, but terminal animation effects would've ended both of our missions.

lm411 · on May 29, 2024

I'm very glad that TCP is so robust.

I'd take a solid 300 baud connection over a spotty cellular connection during most emergencies.

Having worked and played on flaky high-latency and low-throughput networks much of my life, I mostly visualize things in my head—as you likely mean by 'command-line-and-editor-when-you-feel-every-byte-transmitted'. Open a connection and queue your commands; wait for output. It works if you don't make any typos.

Preferably, when the connection is too slow or flaky (bad cellular), I make a local script and echo the script to be executed remotely to a file on the remote server, and then execute it with nohup - with the input to create the script and execute it coming from a pre-made file on my local machine redirected to SSH.

Bad cellular often works in bursts, in my experience. Also, redirect output to a file if needed.

This works nicely in situations where your connection may drop frequently for periods exceeding timeouts. On that note, keep ssh timeouts high. With really spotty cellular you might have the network drop for 5-15 minutes between reliable transmissions. SSH connections can stay alive nearly indefinitely if the timeouts allow for it and IP address don't change.

philsnow · on May 29, 2024

> local script [...] nohup [...] redirect output to a file

  mosh hostname -- screen -S philsnow

I don't think I've had to use this over a truly terrible connection, but mosh worked a treat a ~decade ago while tunneling through DNS from a cruise ship that charged exorbitant rates for wifi while underway, but which allowed unlimited DNS traffic.

> It works if you don't make any typos

mosh helps a bit with that too: you get local predictive echo of your keystrokes and you can't "recall" keystrokes but you can queue up backspaces to cover up your typos. Doesn't help where a single typo-ed keystroke is a hotkey that does something you didn't want, though.

firewolf34 · on May 29, 2024

https://mosh.org/

nneonneo · on May 29, 2024

If you ever find yourself needing to tunnel via DNS, try iodine: https://github.com/yarrick/iodine

lm411 · on May 31, 2024

Mosh looks handy. Thanks for sharing!

cdelsolar · on May 29, 2024

I did something similar once when something broke while I was on a plane. The plane wifi was blocking ssh (I HATE when they do this) so I opened up an existing digital ocean instance, sshed into my home computer from it via the web interface, then uploaded some large files from the home computer and did a kubectl apply. The latency was insane but luckily the file upload was fast since my home connection is fast.

It was fun to think about the path each packet was taking - from my laptop to the plane router, to a satellite, back to a digital ocean computer in the UK, to my home computer in NJ, and back. It wasn’t as bad when you think about the magic there.

neilv · on May 29, 2024

Smart, and not something someone would realize if they'd been too insulated from how things work.

Regarding planes/cafes/guest/etc. WiFi, I now usually put any "emergency remote plumbing access" on port 443 (though usually not HTTPS), to reduce the likelihood of some random non-SPI ruleset blocking us in an emergency.

randomtoast · on May 29, 2024

I think that by 2050, 443 will be the only port used by applications that require a non local connection, because the chance that some participant in the network has blocked any other port is simply too high.

skissane · on May 29, 2024

Future generations will ask - “why does every packet header have this 2 byte reserved field with a fixed value of 01BB?”

fragmede · on May 29, 2024

Or setup a web ssh client and actually run HTTPS over port 443.

beacon294 · on May 29, 2024

You can also proxy 443 out with ssh.

cmrdporcupine · on May 29, 2024

It would truly have been elite if you'd have to pull out Kermit and use that.

"line is ~300 baud but only transmitting 7 bit but we need to send this critical file.. which is in EBCDIC!"

ivanhoe · on May 29, 2024

Nothing like typing in an ssh prompt and each time you press a button on the keyboard you have to wait a few seconds for the char to appear on the screen, because it has to go to the server and back first... that't the true "blind typing" :)))

Dylan16807 · on May 28, 2024

I think my first action in a situation like that would be an attempt to install mosh from a local mirror.

throwanem · on May 28, 2024

That needs a binary to run on the remote host for protocol support, doesn't it? I wonder how long that'd take to get transferred and running, over as slow a link as it sounds like this was.

philsnow · on May 29, 2024

Since you can reach the host, presumably it has some kind of access to system updates / packages. In that case, ideally you paste a command like "sudo apt update && sudo apt install -y mosh\n" or something. You can even paste the newline at the same time and save a round-trip.

Dylan16807 · on May 29, 2024

"from a local mirror" does not involve transferring the binary over the slow link.

throwanem · on May 29, 2024

It isn't perfectly clear even in the detailed description whether any other link was available, but I tend to suspect that if anything faster had been, then it would have been used in preference to the one that actually was.

Dylan16807 · on May 29, 2024

It's a cable going between entire countries that got cut. Anything inside the country should still be full speed.

There is no link that could be used in preference. Any repo that the factory stations could access would be unusable by Singapore AWS. And any repo that Singapore AWS could access would be unusable by the factory stations. But the two ends of the link don't have to use the same repo.

lupusreal · on May 29, 2024

TERM=dumb seems like a good idea in a situation like that.

ultra_chad · on May 29, 2024

How exactly did you get the stations working? Be specific.

neilv · on May 29, 2024

"Be specific" sounds like a high-stakes interview, so I'd better answer. :)

* SITUATION: Factory reports MVP factory stations for pilot customer "not turning on" for the day, reason unknown.

* TASK: Get stations up in time for production line, so startup doesn't go out of business.

* ACTION: Determined cause was unexplained networking problem outside our control, and that was triggering some startup time checks. But that the stations were resilient enough that I could carefully edit the code on the stations to relax some timeouts, and enough packets were getting through that we might be able to operate anyway. After that worked for one station, carefully changed the remaining ones.

* RESULT: Factory stations worked for that production day, and our startup was therefore still in business. We were later advised of the submarine cable failure, and the factory acknowledged connectivity problems. Our internal post mortem discussions included why the newer boot code hadn't been installed, and revisiting backup connectivity options. From there, thanks to various early-startup engineering magic, and an exciting mix of good and bad luck, we eventually finished a year contract in a high-quality brand's factory production line successfully.

No technical wizardry in the immediate story, but a lot of various smart things we'd done proactively (including triaging what we did and didn't spend precious overextended time on), plus some luck (and of course the fact that some Internet infrastructure heroes' work had them routing any packets at all)... all came together... and got us through a freak failure of a submarine cable we'd never heard of, which could've ended our startup right there.

(Details on Action, IIRC: Assumed command of the incident, and activated Astronaut Mode manner. Attempted to remote access, and found network very poor. Alerted factory of network problem in their connectivity to AWS Singapore, but they initially thought there was no problem. Could tell there was some very spotty connectivity problems (probably including using `mtr-tiny`), so focused troubleshooting stations on that assumption. Realized how the station would behave in this situation, and that the boot-ish checks for various network connectivity were probably timing out. Or, less likely, there could be a bug in handling the exceptional situation. Investigating, very slowly due to poor Internet, found that the stations didn't have the current version of the boot code, which would've reported diagnostics better on the station display, so factory personnel might see it and tell us. Using `vi`, made careful, minimal changes to the timeouts directly on one station, in the old version of the the code on there (in either Python or Bash; I forget). Restarted station, and it worked. Carefully did the same to the other stations. Everything worked, and other parts of the station software had been done smart enough that they could cope with the production day's demands. Despite the poor connectivity, and the need to do network requests for each production item that passed through the station, before it could move on.)

8372049 · on May 29, 2024

I love the five point order-esque style, warms my military heart

darkwater · on May 29, 2024

Is it me reading it wrong or are you using two "startup" meanings in sentences very close to each other?

neilv · on May 29, 2024

I realized that, and so tried to say "boot" for one of the term senses, but missed an occurrence, which might've made it even more confusing.

jimmaswell · on May 28, 2024

I actually am getting a VT320 soon that I hope to be able to use for my terminal at work. I'll run these over it if I remember.

kps · on May 29, 2024

It won't work. It assumes 24-bit colour support, even if you have `TERM=vt320` or something else without it.

jimmaswell · on May 29, 2024

Damn, just a programming oversight? You don't need colors to get a lot of these effects. Maybe I'll fork it.

x187463 · on May 29, 2024

There’s a —no-color argument.

Example: ls | tte —no-color slide

I’ll add environment variable checks to the todo list.

mirekrusin · on May 29, 2024

It can work we just need brave soul to make it client side.