Jump Servers

phamilton · on Jan 10, 2023

I've spent the last 7 years using SSH to get onto a prod box a few dozen times. Less than once a month. When I do have to get on the box, we use signed keys that expire in 24 hours (a manager or SRE is required to sign the key if you need to get on a box).

The article is talking about the scaling issues of SSH access via jump boxes or bastions. I'd argue that a better solution to scaling SSH access is to invest in tooling that makes SSH access unnecessary for most of your team. Centralized logs, error reporting, distributed traces, etc. are fairly well solved for most installations. Interactive access to prod (e.g. running DB migrations) requires a little more investment, but tools like ECS Exec make that fairly accessible without requiring SSH access.

debarshri · on Jan 10, 2023

I agree with you. But you are viewing it from a web application centric organisation. There are lot of other types of organisations out there, for eg. a company that uses vendor products and now has to give access to servers, databases etc. to the vendor.

The scenarios you need access to the downstream resources is mainly for doing ops on them. Other things such as viewing logs can is predictable and can be exposed via some tool to developers.

xorcist · on Jan 10, 2023

None of takes away the need for tcpdump, strace, core dumps, etc. Surprisingly much troubleshooting needs to be done in production. I have yet to see views of the contrary survive contact with the real world.

The idea of expiring ssh keys seems completely reasonable, but it's also a good idea to have enough centralized access control to be able to actually give someone temporary access allocations. Key validity is evaluated locally, against system time.

phamilton · on Jan 10, 2023

tcpdump and co do happen occasionally, but less than I had expected. This isn't a small installation. We support around 30M active users and routinely push 20k qps in web traffic. We run on a few thousand vCPUs in EC2.

It's a pretty big architectural philosophy shift. All our production workload runs on spot instances, unhealthy hosts come and go quickly enough and our traffic rebalances fast enough that we just don't spend that much time debugging low-level issues. Core dumps get shipped to S3. We do continuous profiling with Google Cloud Profiler. There's a lot of tooling required but once that investment is made things run very well.

vasco · on Jan 10, 2023

DB migrations should definitely not be done interactively and definitely not in an SSH session. Write them beforehand, have them reviewed with the rest of the code, and have your deployment process run them.

PedroBatista · on Jan 10, 2023

Agree, but in the real world that works 100% most of the time.

ranting-moth · on Jan 10, 2023

100% most of the time. One of those rare cases when I laugh out loud at a HN comment. I'm definitely stealing this phrase!

phamilton · on Jan 10, 2023

Just to clarify our process: DB migrations are part of code review, they run as a script, but we don't have that script run automatically as part of CD. We've been bit by enough migration surprises that we require someone watching and able to interrupt and cancel the migration if needed. But that's the extent of action required. Run this command, Ctrl-C if necessary. Definitely not YOLO'ing in a psql in prod.

vasco · on Jan 10, 2023

That's a really good process that scales forever even if you have a good person for it. Once you want to have more people do it, I find it easier to have hard-coded timeouts that cancel and rollback if the migration will be locked. I've also done auto rollback in these cases.

zie · on Jan 10, 2023

It's still all access control, I don't really see a difference. Yes SSH access could perhaps give you access to more things, with less fine-grained permissions, but it just depends on the replacement(s).

The overall point should be, figure out what your needed access control permissions are and then find and use tools to meet those needs. SSH might be the tool, it might not be.

Shifting from tail -f <logs> to any web UI doesn't remove the access control for logging, it just shifts it into the fancy web UI. That may or may not be better, it's just different. One could certainly argue the fancy web UI is a better UX, but that's a totally different reason for selecting that tool or not. Access Control happens either way.

yjftsjthsd-h · on Jan 10, 2023

Er, so most of those supposed problems sound like you could fix them by just using unprivileged users jumping through the jump server via `ssh -J`/ProxyJump? Or a VPN, which is pretty close to what it sounds like the product they're trying to sell is anyways. And if you're going to try and sell a Teleport competitor, you really should do a better job convincing me that it's going to actually be secure in the first place. I don't see source code or audits anywhere on this website.

POPOSYS · on Jan 10, 2023

But it is written on the website: "without security compromise" - so it must be secure!!!

Also there is a lot of low contrast text on that website - this shows that they really know what they are doing and you can trust them. I am sure it is a very good company and you should give all the login credentials of your org to them and also install their binary on all servers.

Isn´t "trust in company" the reason why you are using Linux servers?

mbreese · on Jan 10, 2023

Even when using a jump server, I've solved the issue of revoking access in the past by using LDAP to control access to servers. Instead of adding user accounts directly to server, the account information is stored in LDAP -- including public SSH keys. If you want to revoke a user's access across the entire infrastructure, then you can do so in one fell swoop.

I also have set this up to restrict SSH access to particular hosts based on the LDAP record.

The problem they are describing isn't a problem with Jump servers... it's a problem with distributed authorization.

debarshri · on Jan 10, 2023

I haven't seen many organisations actually set SSH PAM via LDAP or another delegated authentication system.

You are right about the problem statement. It is distributed authorization problem. And it is a very hard problem to solve or visualize for a fast moving company unless it is a problem.

POPOSYS · on Jan 10, 2023

Do you have some data about how many orgs actually set SSH PAM via LDAP?

Where can I see this data? I would like to check your statements.

We are on the internet, so would you please like to add the source of knowledge to your statements - it is a very basic and good feature called "URL", please use it!

Or do you just want to say "I have not seen many orgs with SSH + LDAP in my career because I have never worked in one and from that I conclude that the whole world works like this"?

ilyt · on Jan 10, 2023

Most orgs are incompetent. We had same discussion with many of our clients, where their requirement was stupid shit like "change password every month" while we had to negotiate that no, we don't even use passwords in the first place, we use hardware tokens for SSH keys.

And I'm talking about few big local banks, where accounts on Red Hat boxes are still created by some ops dude manually according to some docs.

debarshri · on Jan 10, 2023

I don't think neither me nor anyone reading that statement would come to a conclusion that the works the way I suggested in that argument.

I can talk about me interviewing and empirical data of talking to dozens of companies from seed to series B and how they have been managing access to servers. But I won't, I would rather urge you to do basic trend search either on google or your favorite platform for SSH PAM via LDAP or SSH LDAP and see it for yourself where the world is heading [1].

[1] https://trends.google.com/trends/explore?date=today%205-y&q=...

mbreese · on Jan 10, 2023

Oh, I’m sure it’s super rare. It’s actually quite easy to setup, but I’m not sure many people bother with the setup because LDAP (I’m not counting Active Directory) in general isn’t all that common. I know this just from the rarity of articles posted about getting it configured.

But once you do it, it’s something that’s easy to keep using because it’s so useful.

My favorite was setting up LDAP in combination with a jump host where I had a special program for the SSH command shell (like prgmr.com). I had it setup where the use could authenticate with a password, but then upload an SSH key from the custom shell.

debarshri · on Jan 10, 2023

I am not debating the usefulness of LDAP integrated with SSH. I am agree with you.

pnutjam · on Jan 10, 2023

Based on the interviewing I did last year, the clear trending solution, for enterprise, is Cyberark. I saw that all over the place for root password management.

debarshri · on Jan 10, 2023

Cyberark [1] and delinea [2] are definitely leading enterprise solution right now. Okta too has an offering in this space but I haven't seen it used widely yet.

But there are quite some solutions in market at this point that are in growing trend. You would find teleport [3], strongdm [4] in high growth companies where as Adaptive.live [5], Idemium.io [6] and now hoop.dev in the early stage to series B.

[1] https://www.cyberark.com/

[2] https://delinea.com/thycotic

[3] https://delinea.com/thycotic

[4] https://www.strongdm.com/

[5] https://adaptive.live/

[6] https://idemeum.com/

mbreese · on Jan 10, 2023

This isn’t root password management. Or at least, it shouldn’t be. Users shouldn’t have root passwords for end devices. This is about controlling access to remote servers and/or sudo access to those servers. None of which requires the root password on the remote server, unless I’m missing something. Is this for more ephemeral keys?

trey-jones · on Jan 10, 2023

I set up LDAP with sshPublicKey extension in 2021 together with some scripts to configure new servers to use it and a small command line tool to add and remove GIDs per user (across any infrastructure). It's working great. The biggest effort was learning to manage LDAP, which is a bit anachronistic, but the payoff is worth it I think. Even with <20 users, managing public keys across all of our infrastructure was not sustainable.

ilyt · on Jan 10, 2023

The problem the article is describing is frankly problem with lack of good configuration management. Hell, you can use SSH keys to authorize via sudo and use hardware token to store those keys, sidestepping problem of "user left their id_rsa somewhere"

SomaticPirate · on Jan 10, 2023

Google Cloud’s IAP tunneling has basically made this a non-issue. All access is logged and its extremely easy to get around without any public IPs.

Also easy to port-forward instances to your local machine.

I’m always surprised that other clouds don’t have this.

drowsspa · on Jan 10, 2023

Isn't it basically AWS SSM? you can even configure it in the .ssh/config

debarshri · on Jan 10, 2023

I am a bit confused about this product. I had once seen a product called Runops [1], the customer list and product testimonials are exactly same as this product Hoop.dev [2]. [1] https://runops.io/

[2] https://hoop.dev/

renewiltord · on Jan 10, 2023

At this moment, 6 minutes after your post, the top of runops.io reads:

> Announcing a new phase of Runops. Launching hoop.dev

with a link to this https://hoop.dev/blog/launching-hoop/

berkle4455 · on Jan 10, 2023

AWS Session Manager (aws specific) or Tailscale (anything) has solved this problem for me

threeseed · on Jan 10, 2023

Jump hosts seem like an anti-pattern in the era of AWS SSM and Tailscale.

It is far too easy to misconfigure network policies and grant them access to infrastructure that they shouldn't.

And with Tailscale you can run the agent within SaaS products like Github Actions or Terraform Cloud to securely manage their access into your systems.

romanhn · on Jan 10, 2023

I believe you still need a bastion host to query a database, for instance, unless you want to set up SSM on existing hosts - my current project is fully serverless, so I had to set up an EC2 instance to serve as the bastion. The beauty of SSM is that the host can be fully on a private subnet, not exposed to the wider internet as commonly suggested.

threeseed · on Jan 10, 2023

You can setup a Tailscale traffic relay node which allows you to access any services within the defined subnets.

https://tailscale.com/kb/1019/subnets/

So that way you can query that database directly and not using any SSH tunnels.

berkle4455 · on Jan 10, 2023

Yeah, if you’re using managed services within AWS you need a relay host. It doesn’t need to punch a hole to the outside world (like a bastion host) but it still needs some manner to allow tailscale (an ec2 box) to route to those services.

SSM is a cleaner choice on AWS.

theteapot · on Jan 10, 2023

> anti-pattern

Is wearing blue denim jacket and jeans an anti-pattern? Is vegemite an anti-pattern? Can't we just say "obsolete", or "bad idea"?

muppetman · on Jan 10, 2023

You should "reach out" to him/her and sort this out.

[I've always found "reach out" to be hilarious, (what, with my arms?) but it's so common now I'm the oddball]

deathanatos · on Jan 10, 2023

Oh, the one that kills me now is "let's take that offline" used in a meeting to refer to "let's move this to a Slack conversation".

(Or, more cynically, "This problem is thorny and I wish to bury it rather than put in the time and work to fix it.")

theteapot · on Jan 10, 2023

> I've always found "reach out" to be hilarious, (what, with my arms?) ...

Using the expression "reach out" is such an anti-pattern.

kevsim · on Jan 10, 2023

> Is wearing blue denim jacket and jeans an anti-pattern?

The Canadian tuxedo is clearly a best practice.

dspillett · on Jan 10, 2023

"anti-pattern" suggests more than just a singleton bad idea, but one that is very common and sometimes even suggested.

nunez · on Jan 10, 2023

Agreed. Tailscale is absolutely fantastic and unbelievably easy to use while being significantly more secure than a jump host.

ilyt · on Jan 10, 2023

> Jump Servers must be able to reach to a certain private network and this requires specific configuration for each environment;

Yup. Use VPNs and firewall ACLs. Jump servers are leftovers of bad practices where good practices were too hard to implement.

> Burden of managing SSH keys of users throughout all nodes. Rotation is required when someone leaves or enter the organization;

That is extremely trivial if you have (you should) any sensible configuration management in place. We just store them in LDAP with user data and distribute where neede (gitlab, servers)

> Role management requires managing sudoers files, making sure file system permissions are properly configured and users are within their proper groups;

Ah yes, managing a text file, so fucking hard /s

> Nodes must be updated with the tooling necessary to interact with internal services.

>Keep a list of updated services (DNS) available to interact with it

see the point about CM

> Usually, infrastructure enginners are a scarce team and keeping all these components updated are hard to tackle. Over time, these nodes will onboard more users and tooling, which will increase the complexity over managing these resources.

Which is why you write it once and use automation. I don't think we touched our sudoers or ssh key management module in years, it was written once then had some small changes but that's about it

BohdanPetryshyn · on Jan 10, 2023

AWS Session Manager is great. However, to connect to an RDS/Aurora/Elasticache instance, you must still create an intermediate EC2 instance to run SSM commands against.

We use Basti (https://github.com/BohdanPetryshyn/basti/issues) to set up and manage the jump host. The tool automatically starts/stops the instance, which is excellent for irregular access.

migf · on Jan 10, 2023

This is what I was looking for. It seems weird to always have a jump box sitting up, waiting for someone to come mess with it. Part of the tooling should be to spin up / boot an ephemeral instance.

TheHappyOddish · on Jan 12, 2023

It's probably also worth mentioning that you _build_ Basti as well as using it, otherwise you may come across as shilling.

That said, thanks, Basti looks very useful!

fpanzer · on Jan 10, 2023

I'd rather patch a bunch of openssh servers than a bunch of proprietary software agents that essentially keep reverse ssh tunnels open for me. Or did I miss something?

vbernat · on Jan 10, 2023

Use of SSH agent forwarding is dangerous as it allows an attacker to gain access to more key materials to access more servers. Using it casually in an article about SSH security is a bit worrying.

pritambaral · on Jan 10, 2023

Not with the confirm option of ssh-add. I've had agent forwarding on for every host (trusted and untrusted) for a decade now, without worry, because my ssh agent confirms with me each use of any ssh key.

remram · on Jan 10, 2023

Interesting. However in practice, I don't ssh-add my keys, they get loaded on first use by the ssh client. Is there a way to make ssh load keys into the agent with that option set?

pritambaral · on Jan 11, 2023

Yes: https://man.openbsd.org/ssh_config#AddKeysToAgent

Source: I don't ssh-add my keys either.

_joel · on Jan 10, 2023

Umm what's wrong with ssh -AJ ?

larschdk · on Jan 10, 2023

You need to fully trust your target server, so you need to manage your known_hosts diligently and make sure you trust the host you connect to. If you just accept the host key without checking, you allow any host to use your SSH key for authentication. Any SSH server can accept your private key as authentication. Also, if the target host is infiltrated, it can use your private SSH key for authentication elsewhere without your knowledge.

tpmx · on Jan 10, 2023

Also, if the target host is infiltrated, it can use your private SSH key for authentication elsewhere without your knowledge.

Not normally, right? Isn't that the whole point of public-key cryptography?

See e.g. https://www.theregister.com/2016/01/14/openssh_is_wide_open_... for when a vulnerability in the openssh client made this possible back in 2016.

ilyt · on Jan 10, 2023

Explanation: IF you use SSH agent and have ssh options set up, you get a channel thru SSH where you could use your SSH agent on remote host.

Good side: You can then chain authenticate and say use same SSH agent to authorize sudo, hence getting sudo without password, just secured with your private key. Add hardware token to store said key and you're pretty secure.

Bad side: .... so can any other process with right permissions on the system, therefore compromised system can try to impersonate you.

One way to mitigate is is to make sure servers can't talk to eachother to via SSH, if user can access A and B but A can't access B and vice versa the escalation is limited.

Other way is to set agent to ask every time something wants to use the key which half-solves it (attacker would need to time the attack to occur right before "valid" use") but from what I remember it still doesn't show you what is trying to use your key (at least for gpg-agent's ssh agent functionality) so it's kinda not that useful of a feature.

tpmx · on Jan 10, 2023

Ah, thanks. Missed the agent context.

deathanatos · on Jan 10, 2023

The point of ssh-agent is to use the private key to authenticate yourself.

If you forward it to a remote host, you are granting access to the remote to do that (i.e., the remote can authenticate as you), thus, you must trust the remote.

From the docs,

> Agent forwarding should be enabled with caution. Users with the ability to bypass file permissions on the remote host (for the agent's UNIX-domain socket) can access the local agent through the forwarded connection. An attacker cannot obtain key material from the agent, however they can perform operations on the keys that enable them to authenticate using the identities loaded into the agent. A safer alternative may be to use a jump host (see -J).

_joel · on Jan 10, 2023

If I'm getting host changed violations and I know the infra's not changed, I know there's somethng wrong already? I'm not getting what this provides.

jesprenj · on Jan 10, 2023

Why even use -A? Isn't -J enough?

_joel · on Jan 10, 2023

Well if you want to pass your keys to the endpoint without keeping them on the jump hosts, yea, it's kinda useful.

deathanatos · on Jan 10, 2023

You don't need -A with -J.

You need -A if you're running ssh on the intermediate host, i.e., on the jump host in this case. But -J doesn't run ssh on the intermediate host, it more or less runs two ssh's on your local host. The first from local to the jump, the second from local to the eventual target, through a tunnel forwarding the connecting through the jump[1]. But because all the SSH processing is always local, it always has access to the local ssh-agent: you don't need -A.

And, as someone points out upthread, you need to fully trust the remote machine to pass -A. You usually shouldn't, in most cases that I think people using jump hosts in corporate settings would be interacting with jump hosts: it permits other employees to impersonate you, by abusing your forwarded ssh-agent, if they have sufficient access on the jump host.

[1]: -J in ssh(1) documents this

_joel · on Jan 10, 2023

til, cheers :)