There's a lot of speculation about why, with the answer almost certainly security / exploitable (or backdoor), and I'll just throw an extra little tidbit in:
atop seems to run persistently as root, which may be the reason for preventing it from running/uninstalling.
the netatop part of atop installs a persistent kernel module, netatop.ko, as part of its installation. The module hooks netfilter to be able to monitor all traffic.
If there's an exploitable flaw in the kernel module, this would be a max-severity CVE.
netatop _also_ runs a persistent daemon, netatopd, which I believe from inspecting the source runs as root.
The article's language about uninstalling it kinda sorta makes you think one of these three parts is in some way exploitable or backdoored -- any which way it's a privileged process, and one that's monitoring network traffic.
(I'm not sure if netatop is installed by default on systems when you install atop, per czk's comment below)
When we tried deploying it we had netatop crashing kernels with a use after free on a linked list, based on the stack traces and kernel dumps. Every box we trialed it on started going down multiple times a week.
I'm not familiar with atop but the website mentions netatop is optional and what I've found suggests you have to manually install it. Do you know if any distributions/packages install this by default alongside the atop install?
This is a good question - I'm not sure. The rpmspec doesn't seem to install it, so perhaps it's not quite that bad. The atop program _itself_ runs persistently, though, so, uh, still bad. :)
I vaguely remember an old bug in atop, leading to a very unusual consequence.
Atop will do an invalid memory write and crash with a segfault. But this writing is performed on a memory page mapped to a hardware timer. Despite not being able to write into that page, just touching it somehow changes how this hardware timer works. Then, the OS detects that this timer is inaccurate and switches to a different clock source (which you can see in /sys/devices/system/clocksource/clocksource0/current_clocksource). As a result, every call to clock_gettime becomes slower, and the system becomes slower as a whole until it restarts.
In short, a segfault in atop leads to the whole system's performance degradation. But this was found around maybe 7 years ago.
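If anyone wants to check whether a box has fallen back to a slower clock source, here is a minimal Python sketch. It reads the sysfs path mentioned above plus the sibling `available_clocksource` file. The helper name `read_clocksource` is made up for illustration, and this only reports the current state; it says nothing about why a switch happened.

```python
from pathlib import Path

# Standard sysfs location on Linux; the same directory the comment above mentions.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base_dir=CLOCKSOURCE_DIR):
    """Return (current, available) clock sources, or (None, []) if unreadable."""
    base = Path(base_dir)
    try:
        current = (base / "current_clocksource").read_text().strip()
    except OSError:
        # Not Linux, or sysfs is not mounted where we expect it.
        return None, []
    try:
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        available = []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksource()
    print("current:", current)
    print("available:", " ".join(available))
```

If `current` is something like `hpet` or `acpi_pm` while `tsc` shows up in the available list, that would be consistent with the kind of fallback described, though other causes are possible.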
Yeah, from a rando this would be just bad vagueposting but Rachel is absolutely someone who could know about a very good reason why we should uninstall atop but be unable to legally say why. I would heed her warning.
I would disagree and still say that this is bad vagueposting. It doesn't matter how reputable the source is: if you say "don't do X" but don't give a reason why, I'm not inclined to listen. Granted I don't use atop anyways, but I don't think a vague blog post - even one from a respected person - is sufficient justification to change what software one uses.
This seems completely backwards... if someone says to do something but doesn't give a reason, then the ONLY thing to base your decision on whether to listen is their reputation and your trust in them.
If someone I trust tells me to trust them, I will.
First, I decided I am going to avoid atop. Even if Rachel would be wrong, it doesn't hurt not to use some specific software I don't depend on.
> If someone I trust tells me to trust them, I will.
Huh? When I trust someone, then I trust already and there's no need to be told to trust. When I don't trust someone, then I run away when being told to trust. Hell, if someone tells me to trust them, it's a red flag and I drop the trust.
Your belief seems to hinge on the idea that there are zero situations where someone could need you to trust them but not have the ability to tell you why.
I think there ARE some situations like that, especially when the conversation is public like this. It is pretty easy to think of a lot of good reasons why Rachel can't explain why you need to trust them in this situation. I think saying, "I can't tell you why, please trust me" is a perfectly reasonable thing for someone you trust to say, and I would absolutely listen to them if they say that.
That seems.. whatever the opposite of pragmatic is, but not in a good way, as in “principled”. There are very good reasons one would be required to be vague in a situation like this, but still know about a very serious issue.
It’s like seeing a road sign that says “danger ahead” and ignoring it because it wasn’t very specific. It’s just.. not a sensible move.
Yeah, this is the behavior of the stuffy administrator in an 80's sci-fi comedy, minutes before the horror the heroes are trying to warn him about is unleashed.
The only question left is "who is going to deliver the quippy one-liner afterwards?"
"Don't go down 6th street now" means very different things depending on whether it comes from your buddy, or the bomb squad.
> if you say "don't do X" but don't give a reason why, I'm not inclined to listen.
I hear ya, but, there are sometimes valid reasons people can't say things; and this may well be one of those times. You have every right to do as you like, but it's not necessarily smart now that you've been warned by a respected professional.
Lol, this is going over my head a bit, but in case I was misunderstood, I had a role once that was secops adjacent but not strictly "security," just ended up doing a lot of favors for a security team. There was a recommendation that was super low hanging with extremely high impact, but the sec team determined it was "too low risk to action on without better reasoning" or something, they got hit pretty hard by it and I was involved in some triage, shaking my head the entire time. Very similar reasoning. "I need a bulletproof reason to update or change something" is like, to me, not a productive attitude.
Ha ha, "too low risk to action ..." When I was younger I would fight those valiant fights; now I do so only if actual end users would suffer irreparable harm. I give people my advice, but when they pedantically push back and MAKE YOU MAKE THEM UNDERSTAND, Nawww, I told you what I think and why, I am done.
My comment condensed an exchange that has happened enough times to be a trope. You try to discreetly get someone's attention to alert them about an opsec issue, you then whisper and they basically look right at the target and then yell back at you WHY ARE YOU WHISPERING. Nawww, you are on your own now.
I get this a lot with AI now, I tell people what is a current capability and what the curve looks like, I send them a gist of those capabilities and they want to get into some goal post moving debate. I don't engage. I don't care about being right, or being taken seriously. The funny thing is, sometimes when they come back months later with a, "hey it turns out ..." that they want me to say I told you so, or glad you turned around. I literally don't care.
I and the world have suffered so many fools, we have to stop giving them the time of day, for ourselves. They don't realize that they have truly lost when people stop giving them advice or criticism. You know the relationship is over when the other party has zero interest in even engaging in any capacity.
Being a system administrator isn't a scientific endeavour where the goal is to seek truth. It's a practical endeavour where the goal is to reduce risk of bad things happening. Sometimes, that means blindly following the advice of reputable people who hint at severe vulnerabilities in a piece of software, even though they can't disclose enough to prove that a vulnerability exists yet.
Keep having atop installed until you get absolute proof that it can be exploited, if that's what you want. But the organization whose systems you're administering might not like the fact that you were forewarned and didn't act.
Skimming through the code (particularly from past issues and PRs) highlights a number of things that look sketchy to me at first glance (in a coding practices way, not in a malicious way) - my gut feeling is that someone smarter than me going through much of this with a fine-toothed-comb would likely find something exploitable.
It could also be any number of other things too, like it's severe enough that the author feels it's responsible to wait for mitigation efforts before disclosing anything about the issue that could lead to it being exploited.
"screams NDA" is not the same as "might be covered under an NDA". And in any case, the said company has very likely already taken mitigating action like removing atop.
At a previous gig, atop was running fleet-wide (> 1k servers) as sort of a resource monitoring tool of last resort, in a similar way as is described in this article[0]. I left a few years ago, but if memory serves, this thing was baked into base-image Puppet configs, and proved itself handy in past investigations of hard-to-find problems. If this turns out to be a real threat, I wouldn't be surprised if the blast radius for this is substantial.
Why should one trust her? What's her full name and the reason for deferring to her expertise?
And yes I'm aware her posts have made it to the top of HN many times in the past. That I've seen, they've all been unhelpful vague-posts like this one.
Maybe she's actually a real expert I should be listening to! But layer upon layer of vague "if you know, you know" do not make that case.
> system ("gunzip -c %s > %s", tmpname1, tmpname2)
tmpname2 is hardcoded as "/tmp/atopwrkXXXXXX", so that's fine. tmpname1 is '$irawname.gz'. '$irawname' is set by the '-r' flag.
So, presumably if you can get the rest of the code to play nice and get you there, you can escalate from having shell access to run atop, to having shell access. Oh, I guess that's nothing.
Anyway, still a really bad use of system + user-controlled input, don't do that.
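To make the "user-controlled input in a shell string" point concrete, here's a rough Python sketch of the same pattern. This is not atop's code: the function names and the hostile value are hypothetical, and it only builds the command strings rather than running them.

```python
import shlex


def command_unquoted(raw_name, out_path):
    # Mirrors the pattern being criticized: the name is pasted straight into
    # a string that a shell will later parse.
    return "gunzip -c %s.gz > %s" % (raw_name, out_path)


def command_quoted(raw_name, out_path):
    # shlex.quote wraps each value so the shell treats it as a single word.
    return "gunzip -c %s > %s" % (
        shlex.quote(raw_name + ".gz"),
        shlex.quote(out_path),
    )


hostile = "x; echo pwned #"  # hypothetical value passed via -r
print(command_unquoted(hostile, "/tmp/out"))
print(command_quoted(hostile, "/tmp/out"))
```

In the unquoted string the shell would see a second command after the `;`, and the `#` turns the rest of the line into a comment. Quoting is the minimum fix; not involving a shell at all is cleaner.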
> Also tmpname2 could be symlinked to /etc/passwd before it is unlinked..
Yeah, sure, but only if you run atop as root, otherwise it'll just get a "permission denied", and if you can run atop as root with whatever flags you like, you might as well just run 'rm' instead.
It's not a suid binary, so while it's bad code and a smell, I don't think the TOCTOU is a security issue in how it's commonly run (i.e. as an interactive CLI running as your user).
The TOCTOU is relevant (without suid) if someone can quickly make the right prediction of the tmpname2 value that's generated by the PRNG used by mkstemp, and create a symlink with that value before gunzip is executed. After calling mkstemp, the code should use the returned file descriptor, and thereby eliminate all TOCTOU risk. However, on (perhaps?) most devices that would realistically use atop, the PRNG works well enough that that prediction would fail.
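The "use the returned file descriptor" advice can be sketched in Python, whose `tempfile.mkstemp` also hands back an open descriptor along with the path. This is an analogue under my own assumptions (the name `decompress_to_temp` and the `atopwrk` prefix are just for illustration), not a patch for atop.

```python
import gzip
import os
import shutil
import tempfile


def decompress_to_temp(gz_path, prefix="atopwrk"):
    """Decompress gz_path into a fresh temp file and return that file's path."""
    # mkstemp creates the file exclusively and returns an open descriptor.
    fd, tmp_path = tempfile.mkstemp(prefix=prefix)
    try:
        # Write through the descriptor; the path is never reopened for writing,
        # so a symlink swapped in at tmp_path afterwards is not followed.
        with os.fdopen(fd, "wb") as dst, gzip.open(gz_path, "rb") as src:
            shutil.copyfileobj(src, dst)
    except Exception:
        os.unlink(tmp_path)
        raise
    return tmp_path
```

The design point is that nothing between file creation and writing depends on the name still pointing at the same file, which is what removes the race.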
Eh? Calling system() for a binary without a path? And why system(), which uses execl(), in the first place, when you could do something using execve() without a sh in between instead?
Even w/o an exploit this can be prettier and more secure.
We're not disagreeing. Even if there's no 'sploit there, people have spaces in their directory or file names, and it's kinda nice for your tool to work with those, so obviously you should be using an execve variant to pass arguments properly.
I assume the reason for the incorrect system call is that doing a shell redirect ('>') does actually look prettier though.
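For what it's worth, losing the `>` redirect doesn't have to be ugly. A small Python sketch of the exec-style approach, assuming a hypothetical helper `run_to_file`: the arguments go in as a list and stdout is pointed at an already-open file, so no shell ever parses the filename.

```python
import subprocess
import sys


def run_to_file(argv, out_file):
    """Run argv directly (no shell) with stdout sent to an already-open file."""
    # A list argv is handed to the exec family as-is: no word splitting,
    # no metacharacter interpretation, and no need for a '>' redirect.
    return subprocess.run(argv, stdout=out_file, check=True)
```

Usage would look something like `run_to_file(["gunzip", "-c", "my logs/raw file.gz"], out)`, where spaces in the path arrive as part of one argument. Note the program name is still looked up via PATH here; pass an absolute path if that is a concern.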
There's a bunch of interesting recent commits from someone without a public signing key.
Removed excess checks before free()
Fixed possible wrong result bit shifting on 64bit after left op type overflow
Fixed possible wrong result bit shifting on 64bit after left operand type overflow
Fixed possible access out-of-bounds items array better check index before using
Could be legit or flawed. Or even fixes for the possible flaw.
1. Unsigned commits are the norm. It's weird to sign git commits. It's weird to upload your gpg key to github. gpg is a nightmare mess.
2. They aren't introducing the bug, those are all unreleased commits, so advice to "uninstall now" for something no distros are shipping would be silly.
3. The diff is trivial, you can read it and figure out if it looks like they're fixing a real exploitable thing. The answer is obviously no.
I stopped using atop when I found it installs several hooks which automatically run code as root and deposit files around the filesystem, including a "power management" hook.
Do you have any references that describe this behavior? That sounds like exactly the kind of thing that could conceal a backdoor of the sort this seems to be warning about.
Except, she kinda did disclose already. Seems a bit strange to circumvent standard embargo practices, only to publicly hint of an exploit but not give any details.
Maybe because it is a non-essential tool with many alternatives available? It could also be because there are already illicit parties using atop to hack companies? Still, publishing a CVE with the specific exploit and a recommendation to fully delete atop would be better. Even if there is no patch available.
docker images -q | xargs -I{} -t docker run --rm {} sh -c 'type atop && echo "DANGER!!!"'
May produce false negatives, because container images tend to be stripped down compared to desktop and server releases. Probably won't produce false positives, so use as a minimum.
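A slightly more structured take on the one-liner above, as a hedged Python sketch. The function names are mine; the docker flags used (`images -q`, `run --rm`, `--entrypoint`) are standard, but I haven't run this against a real fleet. Images without `sh` simply count as "not found", so the false-negative caveat still applies.

```python
import subprocess


def list_image_ids():
    # `docker images -q` prints one image ID per line.
    out = subprocess.run(
        ["docker", "images", "-q"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted(set(out.split()))


def image_has_atop(image_id, run=subprocess.run):
    # `command -v` is the POSIX way to ask a shell whether a command exists.
    # --entrypoint overrides images whose entrypoint is not a shell.
    result = run(
        ["docker", "run", "--rm", "--entrypoint", "sh", image_id,
         "-c", "command -v atop"],
        capture_output=True, text=True,
    )
    return result.returncode == 0


def images_with_atop(image_ids, check=image_has_atop):
    # Keep only the images where the check reports atop on the PATH.
    return [image_id for image_id in image_ids if check(image_id)]
```

Calling `images_with_atop(list_image_ids())` would give the list of local image IDs where atop was found on the PATH.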
I'd be surprised if any large distros shipped it in a stock configuration.
I typed 'atop' in my Linux Mint 22.1 laptop/desktop, says it's not found but can be installed. So I think Linux Mint is in the clear. I tried my Ubuntu 24.04 server and got the same thing there, as well as on my Proxmox home lab instance. I checked that Repology link and I did see Ubuntu, but I guess that is for Ubuntu desktop but not server edition?
ps. If I said anything wrong, please correct me. I'm a linux newb who jumped from Microsoft's world after getting fed up with their Win11 BS. I'm still learning quite a bit about linux daily.
"Ubuntu, Debian, Red Hat Enterprise Linux, Fedora, Linux Mint, SUSE Linux Enterprise, CentOS, Manjaro, elementary OS, Gentoo, Oracle Linux, and Pop!_OS" ~--Google's AI.
You jest but I think it can happen. Grok could be responsible for tagging the output of all the other AI's as "Potential Misinformation, Disinformation per the Ministry of Truth".
The data source seems pretty obvious here. It doesn't know much about atop, but your question has led it to believe that it's something available on Linux distros, so it spat out a likely list of Linux distros based on the weighted average of linux distros listed by other projects in its training set.
This. Not only that, I don't know of a single person (IRL or online) who used atop, like, ever. In fact, this is the first time I'm even hearing of atop.
IIRC, most folks went from top -> htop -> glances -> various btop variants (bashtop, bpytop, btop++ etc)
atop can record to a file and then be replayed in the future. Sometimes a node is so FUBARed that it won’t even emit metrics so atop can sometimes save your ass when it records metrics to disk.
I used atop sporadically at Facebook to debug performance issues. I actually learned about it there; it was, I think, on all the machines. This was a bunch of years ago, so not sure if it is still there fleetwide, but it was really helpful to get a past granular view of what happened on the machine at some exact second a few days ago where error rate metrics indicated a particular host was struggling.
I'm genuinely stunned to figure out there's a whole set of lore of *tops.
I'm not sure I'm being rational from a textbook security perspective, but, it'd take a whole lot of tangible reward to get me off the binaries supplied with the system.
btop gives you a more holistic overview of the system: individual disk stats, network stats, graphs of mem/cpu/bandwidth usage over time, etc.
I think it's handy having everything on one screen, but if you know your way around all the individual builtin tools for these, more power to you, no reason to change.
First of all, btop is included in the default repos of most Linux distros, so you don't need to worry about security. This also applies to htop and glances by the way.
In terms of tangible feature benefits, btop also offers disk I/O stats, network throughput stats, partition usage, and even GPU usage (if your distro compiled it with GPU support).
In terms of "nice" stuff that's non-essential, the overall UI is a lot more user-friendly and in many ways, better (subjectively). Eg there are visual graphs for various metrics, you can filter process names by substring, get detailed stats of a specific process, see the tree view of all the processes, easily show/hide various parts of the UI (eg you can focus solely on the process list if that's the only thing you're interested in).
The UI also offers some distinct advantages: it's easier to send specific signals to processes. Eg in btop I can just select SIGSTOP from the menu, whereas in top, I'd need to remember or look up the numeric equivalent (eg 19 for SIGSTOP).
Other top alternatives also offer similar feature sets. Glances also shows the most recent warnings/errors from the system logs, as well as container resource usage, which would be handy for some folks.
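On the signal number point: the numeric value is not the same on every platform (SIGSTOP is 17 on macOS, for example), which is one more reason to prefer names. A tiny Python sketch for looking a number up by name, with `signal_number` being a hypothetical helper:

```python
import signal


def signal_number(name):
    """Look up a signal number by name on the current platform."""
    return signal.Signals[name].value


print("SIGSTOP =", signal_number("SIGSTOP"))
```

`kill -l` in a shell gives the same mapping without any code.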
Well that ansible job was quickly run, buhbye atop. Very concerning coming from Rachel and not some rando. I know a number of fortune 5's that use atop for troubleshooting as well. So as others have commented, if you had this baked into images or loaded with puppet etc then now may be the time to clean up.
Repositories controlled by accounts based in mainland China and Russia are always a risk- it's too easy for a dictatorship to force something to happen even if the authors themselves are trying to act in good faith.
> it's too easy for a dictatorship to force something
We really need to get rid of this mentality. Australia has laws that allow undisclosed, compelled, software updates. Verbally by ministers, but written (confidential) changes can be requested by federal agencies. Many western countries have followed to various degrees. There's no stable trusted government that doesn't want its fingers in your code.
I agree it's not good but being realistic: I'd be far less worried about the Australian government stealing/selling customer data, using my servers in a botnet, using my servers to spread malware.. etc.
Mainland China, Russia, North Korea, all have proven track records of doing these things and having corporate espionage rat lines: https://www.youtube.com/watch?v=y27B-sKIUHA
And from outside, it certainly seems like those “good guys” are edging closer and closer to a malicious dictatorship recently. (If you don’t see that from inside, try asking a trans person. Or a non white person. Or a Canadian. Or a woman who wants reproductive health care.)
Where did you see signs of control by Russia or China? The project's github repo states that the project currently has one maintainer, and that maintainer has a very Dutch name and a .nl website.
What about the fact that software is hosted on US/German/Australian/whatever else platforms and infrastructure, what's different with that, technically speaking? The fact that a majority of software we rely on is hosted on GitHub, isn't that scary the same way that a repo owned by someone in another country is scary?
Does a government need to openly act in a specific way for there to be a risk, or is this perceived risk due to a media bias?
GitHub has a lot to lose if it was leaked that they were knowingly facilitating backdoors behind the scenes- many pay for the convenience and trust.
By the same standard, what are the repercussions for these random fly by night accounts? Just make a new account and try again on an existing project or fork / tweak / rebrand another project.
Steam, VSCode, PyPI, NPM... it would ruin those platforms overnight if they were putting in backdoors themselves.
Reputational loss isn't a good argument either, because what the comment I replied to said is that repositories in control of people in e.g. Russia are dangerous. That implies that a Russian or Chinese maintainer of popular open source software is not safe, whereas someone employed by an American company is.
However, maintainers have a reputational loss risk, just like someone working at a company does, no?
And, of course, GitHub could just replace the file you're served when you download a file from it, and then blame a hacker, a rogue employee, or deny it happened. That is just as well technically possible as any other entity being forced, by their government, to do something, no?
And, of course, if a govt forces you, your reputation is not the thing you're worried about.
I understand your argument, but that seems like it's a different argument from the one I was disagreeing with.
These are all good questions where the answer is usually something along the lines of solving them with reproducible builds and Nix, which sounds good until someone points out where the Nix ecosystem gets its funding.
Again, what is the issue with funding? If I get funding from the German government, am I more trustworthy than someone who gets funding from the Hungarian government, like, really? Is there a real, tangible risk here that does not exist with other governments?
Of course the US government isn't scary if you're in the US, but not everyone is, and governments change.
I'm asking not whether it feels like there's a risk, I'm asking whether, factually speaking, there is a significant enough risk that outweighs all else. Is there?
1. it consumes too many system resources, so it has a net-negative impact on the system under observation
2. it's misleading and leads to false diagnoses of situations under review
3. she's under an NDA of some kind related to a CVE or some other high class risk which will come out in due course but she felt a burden to stop people being exposed to risk.
4. I can't count and there are 4, 5, 6 other reasons but these 3 are mine.
I'll go with number 3. She didn't just say "don't run", she said "uninstall". That doesn't sound like "misleading" or "uses too much resources". It sounds very CVE-ish.
That's what it smells like but this is still a weird way to disclose something like that. I imagine some people with free afternoons are taking a stab at auditing atop's PR history right now. I'm not personally up to the task, but the fact that the top 3 contributors other than the original author are ByteDance employees might cause some to jump to conclusions.
Does atop have any legitimate need to connect to the network? I can’t think of any legitimate accidental security holes that might show up in something like atop, but then, these utilities often have funky features I don’t know about!
1) is possible because it uses some interesting options like nice/mlockall/changing its oom score so if the atop process went out of control your box would probably be fucked.
Very simple. From a state level, if they are trying to compromise a system, get persistent access, already have access, but need to escalate, then atop is a solution if it's already on the system.
Is there a mechanism where this sort of advice can flow through security teams to everyone (assuming it is about security) without dropping the details? How are zero days dealt with?
I’m actually surprised I didn’t have it installed, what with all the packages I check out just through sheer curiosity. Thanks Rachel! I’ll avoid it in the future.
Linux newbie here. Jumped into the Linux world after getting tired of Microsoft's BS with Win 11. Running Linux Mint on my laptop and desktop. Looks like 'atop' is not installed by default, but regular 'top' is. Anyone know which distros I should be worried about that have it? Also I have been dabbling with proxmox, I checked and looks like 'top' is the default there too.
You're probably not running either unless you know what they are. Top is an equivalent of windows taskmanager, most often used to identify "top" processes using memory/cpu (and other resources) and only ran briefly. Atop is a different long-running version used to create logs of the same data to understand trends.
> [...] and only ran briefly. Atop is a different long-running version used to create logs of the same data to understand trends.
atop is also normally only ran briefly. It has an optional mode (enabled by default in some, but not all distributions) in which it runs as a service and saves a snapshot of the system state every few seconds; atop can read and show these snapshots when ran briefly.