It should also be mentioned that Linux load average is a complex beast [1]. However, a general rule of thumb that works for most environments is:
You always want the load average to be less than the total number of CPU cores. If higher, you're likely experiencing a lot of waits and context switching.
On Linux this is not true: on an I/O-heavy system - with lots of synchronous I/Os done concurrently by many threads - your load average may be well over the number of CPUs without there being a CPU shortage. Say you have 16 CPUs and the load average is 20, but only 10 of those 20 threads are in runnable (R) state on average, while the other 10 are in uninterruptible sleep (D) state. You don't have a CPU shortage in this case.
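For a quick sanity check of that split, here's a minimal sketch (assuming a Linux /proc filesystem, not any particular tool) that counts tasks currently in R and D states:

```python
#!/usr/bin/env python3
"""Count tasks by scheduler state (a rough, point-in-time view).

On Linux, both runnable (R) and uninterruptible-sleep (D) tasks feed the
load average, so splitting them apart shows whether high load is CPU
demand or threads stuck waiting on I/O.
"""
import glob

counts = {"R": 0, "D": 0, "other": 0}

for stat_path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
    try:
        with open(stat_path) as f:
            fields = f.read()
    except OSError:
        continue  # the task exited between listing and reading
    # The command name sits in parentheses and may contain spaces, so take
    # the state field from just after the last ')'.
    state = fields.rsplit(")", 1)[1].split()[0]
    counts[state if state in ("R", "D") else "other"] += 1

print(f"runnable (R): {counts['R']}  "
      f"uninterruptible (D): {counts['D']}  "
      f"other: {counts['other']}")
```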
Note that synchronous I/O completion checks for previously submitted asynchronous I/Os (both with libaio and io_uring) do not contribute to system load as they sleep in the interruptible sleep (S) mode.
That's why I tend to break down the system load (demand) by the sleep type, system call and wchan/kernel stack location when possible. I've written about the techniques and one extreme scenario ("system load in thousands, little CPU usage") here:
The proper way is to have an idea of what it normally is before you need to troubleshoot issues.
What counts as a 'good load' depends on the application and how it works. On some servers, something close to 0 is a good thing. On other servers, a load of 10 or lower means something is seriously wrong.
Of course, if you don't know what a 'good' number is, or you are trying to optimize an application and looking for bottlenecks, then it is time to reach for different tools.
Anecdote: In 2022, while visiting San Francisco, I had the chance to explore the Stanford campus. Wandering through the quiet halls of the buildings, empty for the summer, I was just about to leave when I unexpectedly came across Knuth's office [1]. I had to do a double take: it was surprisingly small for someone of his stature. Yet, in a way, it felt perfectly fitting, a reflection of his unassuming nature.
It's public knowledge that he's a prof at Stanford, and publicly available directories can lead you to his office. Not to mention that he's famous enough that this is almost certainly not the first time someone has shared a photo like this.
If it was a photo of his home I'd understand but this is essentially public knowledge.
Great, I'm gonna watch this. Hopefully this video also explains what the name 'Netscape' means or implies or is based on, because I've always found it kind of striking that the name has the same letters as (and sort of sounds like) 'NCSA', where Mosaic was originally developed. That seems like more than a coincidence?
> "We've got to make progress on [renaming the company]." And I said,
> "We've got a couple of ideas, but they're not great." Then it just kind
> of popped into my head, and I said, "How about Netscape?" Everyone kind
> of looked around, saying, "Hey, that's pretty good. That's better than
> these other things." It gave a sense of trying to visualize the Net and
> of being able to view what's out there.
Does anyone know what the name 'Netscape' means or implies or is based on?
It's kind of striking that the name has the same letters (and in the same order) as 'NCSA', where Mosaic was originally developed. That seems like more than a coincidence?
And it is not so much WHAT he explained but HOW he explained it that really made it stick. It was his sheer, unbridled, genuine enthusiasm that put him on the map for me.
I first came to learn about the complexity of character sets by finding out that SQL Server's default character set uses 2 bytes per character. I eventually came across Scott's video. Support for UTF-8 encoding only arrived in SQL Server 2019.
This is a great video! I've been writing about UTF-8 on my blog, and I've noticed that many programmers still don't understand it, even after 10 or 20 years.
The main impediment seems to be that languages like Java, JavaScript, and Python treat UTF-8 as just another encoding, but really it’s the most fundamental encoding.
The language abstractions get in the way and distort the way people think about text.
Newer languages like Go and Rust are more sensible: they don't have global mutable encoding variables.
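As a rough, Python-flavored sketch of what that difference looks like in practice - the locale-dependent default below is the kind of ambient global state being criticized, and the file name is just a placeholder:

```python
import locale
import sys

# In Python 3, str is a sequence of Unicode code points; bytes only become
# text once you pick an encoding, so be explicit about which one.
text = "naïve café"
data = text.encode("utf-8")
print(data)                              # b'na\xc3\xafve caf\xc3\xa9'
print(data.decode("utf-8") == text)      # True

# The ambient, machine-dependent state that can bite you:
print(sys.getdefaultencoding())            # 'utf-8' on Python 3
print(locale.getpreferredencoding(False))  # locale-dependent; what open()
                                           # falls back to without encoding=

# Safer: never rely on that fallback when reading or writing files.
with open("notes.txt", "w", encoding="utf-8") as f:   # illustrative file name
    f.write(text)
```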
It's misleading to describe UTF-8 as "the most fundamental encoding", because of the existence of UTF-32 (essentially just a trivial encoding of "raw" Unicode code points) and UTF-16 (which has certain limitations that later became part of the specification of what code points are valid in any encoding).
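For comparison, here is the same text in all three encodings (a quick illustrative snippet; the explicit little-endian variants are used to avoid byte order marks):

```python
ch = "é"      # U+00E9, inside the Basic Multilingual Plane
print(ch.encode("utf-8"))      # b'\xc3\xa9'           (2 bytes)
print(ch.encode("utf-16-le"))  # b'\xe9\x00'           (2 bytes)
print(ch.encode("utf-32-le"))  # b'\xe9\x00\x00\x00'   (4 bytes, the raw code point)

emoji = "🙂"  # U+1F642, outside the BMP
print(emoji.encode("utf-8"))      # b'\xf0\x9f\x99\x82' (4 bytes)
print(emoji.encode("utf-16-le"))  # b'=\xd8B\xde'       (a surrogate pair)
print(emoji.encode("utf-32-le"))  # b'B\xf6\x01\x00'    (still one 4-byte unit)
```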
The word "fundamental" here probably has the same meaning as "elementary" as in elementary mathematics, i.e. something you should absolutely understand first, not something that makes up everything else.
What's not to understand about UTF8? It's a way of coding up points in the Unicode space. You decode the bits and it yields a number that corresponds to a glyph, possibly with some decorations. The only thing special about UTF8 is that it happens to write ASCII as ASCII, which is nice as long as you realize that there is much outside that.
Either that, or I'm completely off base and part of the vast horde who don't get it.
Variable width encodings are inherently harder to understand.
And while UTF-8 tries very hard to be simple whenever possible, it does have some nonobvious constraints that make it significantly more complex than the actual simplest variable-width encoding (a continuation bit and then 7 data bits).
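To make the leading-bits rule and the continuation-byte constraint concrete, here is a minimal decoder sketch in Python; it deliberately skips the extra validity rules mentioned above:

```python
def decode_utf8_char(data: bytes) -> tuple[int, int]:
    """Decode one code point from the start of `data`.

    Returns (code_point, bytes_consumed). Minimal sketch: it checks the
    leading-bit patterns and continuation bytes but skips the remaining
    validity rules (overlong forms, surrogates, values above U+10FFFF).
    """
    b0 = data[0]
    if b0 < 0x80:                    # 0xxxxxxx -> 1 byte, plain ASCII
        return b0, 1
    elif b0 >> 5 == 0b110:           # 110xxxxx -> 2-byte sequence
        length, cp = 2, b0 & 0x1F
    elif b0 >> 4 == 0b1110:          # 1110xxxx -> 3-byte sequence
        length, cp = 3, b0 & 0x0F
    elif b0 >> 3 == 0b11110:         # 11110xxx -> 4-byte sequence
        length, cp = 4, b0 & 0x07
    else:
        raise ValueError("invalid leading byte")
    if len(data) < length:
        raise ValueError("truncated sequence")
    for b in data[1:length]:
        if b >> 6 != 0b10:           # continuation bytes must be 10xxxxxx
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return cp, length

# "é" is 0xC3 0xA9 in UTF-8 -> U+00E9
print(hex(decode_utf8_char(b"\xc3\xa9")[0]))          # 0xe9
# "🙂" is 0xF0 0x9F 0x99 0x82 -> U+1F642
print(hex(decode_utf8_char("🙂".encode("utf-8"))[0]))  # 0x1f642
```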
Unicode is variable length even if you use 32 bits, because glyphs sometimes require multiple codepoints. People sometimes write as if using more bits will remove complexity from Unicode, but it doesn't really; you still sometimes need to handle multiple units at once.
I disagree: if you are correctly handling Unicode, you are already going through some "decode" function which parses a 32-bit quantity from UTF-8 or UTF-16. From there you sometimes need to handle multiple codepoints together, like for non-composed diacritics, Han unification, and some emojis. This is complex regardless of whether you use UTF-8 or UTF-16; in fact, I'd say it's more difficult to handle than those.
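A small Python illustration of the multiple-codepoints-per-visible-character point (the example strings are arbitrary, chosen just to show the effect):

```python
import unicodedata

# One visible "é", two code points: 'e' followed by COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
print(len(decomposed))                                        # 2
print(decomposed == "\u00e9")                                 # False
print(unicodedata.normalize("NFC", decomposed) == "\u00e9")   # True

# One visible emoji, two code points: thumbs up + skin tone modifier.
thumbs = "\U0001F44D\U0001F3FD"
print(len(thumbs))   # 2, even though it renders as a single glyph
```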
If you want to teach someone how an encoding works, why would you not tell them that a single symbol can take multiple codepoints?
It seems like you're advocating that people learn incorrect information and form their impressions of it on a falsehood. Which is probably why people think UTF-32 frees you from variable-length encoding.
You should tell them, yes. And talk about it more at some point. But you don't have to go into much detail when today's lesson is specifically teaching UTF-8 or UTF-32. I don't know about you, but I think I could teach the latter about ten times faster.
As part of a comprehensive dive into Unicode it's a minor part, but for teaching an encoding it's a significant difference.
> I think I could teach the latter about ten times faster.
I've lectured computer science at the university level, and I think you could introduce all this information to a CS undergrad pretty coherently and design a lab or small assignment on it no problem. Maybe you could ask them to parse some emojis that require multiple 32-bit codepoints.
Even ignoring all the other advantages (mostly synchronization-related, which do objectively make implementing algorithms on UTF-8-encoded data simpler), "the number of set prefix bits is the number of bytes" doesn't seem meaningfully more complex than a single continuation bit.
> Plus UTF-8 has more invalid encodings to deal with than a super-simple format.
If your format supports non-canonical encodings you're in for a bad time no matter what, so a whole lot of that simplicity is fake.
> And it also means you're dealing with three classes of byte now.
If you're working a byte at a time you're doing it wrong, unless you're re-syncing an invalid stream in which case it's as simple as a continuation bit (specifically, it's two continuation bits).
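For what it's worth, the non-canonical (overlong) forms are easy to see being rejected with any conforming decoder; a quick sketch using Python's standard library codec:

```python
# '/' is U+002F; its only valid UTF-8 encoding is the single byte 0x2F.
print(b"\x2f".decode("utf-8"))      # '/'

# 0xC0 0xAF would unpack to the same code point under naive bit-shuffling,
# but it is an overlong form, so a conforming decoder must reject it.
try:
    b"\xc0\xaf".decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc)
```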
The simple encoding already allows smaller characters to have the same bytes as subsets of larger characters. Non-canonical is not a big deal on top of that. Also there are other banned bytes you don't need to deal with.
> If you're working a byte at a time you're doing it wrong, unless you're re-syncing an invalid stream
It's very relevant to explaining the encoding and it matters if you're worried that invalid bytes might exist. You can't just ignore the extra complexity.
Also if you're not working a byte at a time, that kind of implies you parsed the characters? In which case non-canonical encodings are a non-problem.
If you're going beyond decoding, then you're beyond the stage where canonical and non-canonical versions exist any more.
Non-canonical encodings make it difficult to do things without decoding, but you have bigger problems to deal with in that situation, and the non-canonical encodings don't make it much worse. Don't get into that situation!
Specifically, even with only canonical encodings, one and two byte characters can appear inside the encoding of two and three byte characters. You can't do anything byte-wise at all, unlike UTF-8. But you already said "If you're working a byte at a time you're doing it wrong" so I hope that's not too big of an issue?
More properties of UTF-8: It is self-synchronizing. It has the same lexicographical sort order as UTF-32. It allows substring matches without false positives. It is compatible with null-terminated strings.
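Two of those properties are easy to check empirically; a small Python sketch (the sample strings are arbitrary):

```python
words = ["zebra", "Ärger", "école", "日本", "🙂"]

# Same lexicographic order whether you sort by code points (UTF-32 order)
# or by the raw UTF-8 bytes.
print(sorted(words) == sorted(words, key=lambda w: w.encode("utf-8")))  # True

# Self-synchronization: damage one byte mid-stream and decoding loses only
# the affected character, then re-syncs at the next lead/ASCII byte.
data = "héllo wörld".encode("utf-8")
damaged = data[:2] + data[3:]                      # drop é's continuation byte
print(damaged.decode("utf-8", errors="replace"))   # 'h�llo wörld'
```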
There is no shorter command to show uptime, load averages (1/5/15 minutes), and logged-in users. Essential for quick system health checks!