My first command is always 'w'. And I always urge young engineers to do the same.

There is no shorter command that shows uptime, load averages (1/5/15 minutes), and logged-in users. Essential for quick system health checks!
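
For reference, an illustrative 'w' run (host, users, and numbers are made up):

   $ w
    16:02:11 up 42 days,  3:57,  2 users,  load average: 0.08, 0.12, 0.10
   USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
   alice    pts/0    192.0.2.10       09:14    1.00s  0.09s  0.01s w
   bob      pts/1    192.0.2.11       11:02   31:14   0.05s  0.05s vim notes.txt

One line for the system, one line per user - everything you need for a first impression.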


It should also be mentioned that the Linux load average is a complex beast [1]. However, a general rule of thumb that works for most environments is:

You always want the load average to be less than the total number of CPU cores. If it's higher, you're likely experiencing a lot of waits and context switching.

[1] https://www.brendangregg.com/blog/2017-08-08/linux-load-aver...
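
To make the rule of thumb concrete, here's a minimal Python sketch (Unix-only; os.getloadavg() reports the same three numbers that 'w' prints):

  import os

  load1, load5, load15 = os.getloadavg()
  cores = os.cpu_count()

  if load5 > cores:
      print(f"5-min load {load5:.2f} exceeds {cores} cores - investigate")
  else:
      print(f"5-min load {load5:.2f} fits within {cores} cores - looks fine")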


On Linux this is not true: on an I/O-heavy system - with lots of synchronous I/Os done concurrently by many threads - your load average may be well over the number of CPUs without there being a CPU shortage. Say you have 16 CPUs and the load average is 20, but on average only 10 of the 20 threads are in runnable (R) state and the other 10 are in uninterruptible sleep (D). You don't have a CPU shortage in this case.

Note that synchronous completion checks for previously submitted asynchronous I/Os (both with libaio and io_uring) do not contribute to system load, as they sleep in interruptible sleep (S) state.
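
As a rough illustration of what feeds the load number, here's a quick Python sketch that tallies thread states from /proc on Linux (runnable R plus uninterruptible D threads are what the load average counts):

  import glob
  from collections import Counter

  counts = Counter()
  for stat in glob.glob('/proc/[0-9]*/task/[0-9]*/stat'):
      try:
          data = open(stat).read()
      except OSError:
          continue  # thread exited while we were scanning
      # the state field comes right after the parenthesized command name
      counts[data.rsplit(')', 1)[1].split()[0]] += 1

  print(dict(counts))  # e.g. {'S': 812, 'I': 64, 'R': 10, 'D': 10}
  print('instantaneous load contribution:', counts['R'] + counts['D'])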

That's why I tend to break down the system load (demand) by the sleep type, system call and wchan/kernel stack location when possible. I've written about the techniques and one extreme scenario ("system load in thousands, little CPU usage") here:

https://tanelpoder.com/posts/high-system-load-low-cpu-utiliz...


Hey Tanel - I wanted to thank you for that blog post and the psn tool - they recently helped me in a tricky performance investigation.


Glad to be helpful! :-)


The proper way is to have an idea of what the load normally is before you need to troubleshoot issues.

What a 'good' load is depends on the application and how it works. On some servers, something close to 0 is a good thing. On other servers, a load of 10 or lower means something is seriously wrong.

Of course, if you don't know what a 'good' number is, or you are trying to optimize an application and looking for bottlenecks, then it is time to reach for different tools.


Glances is nice. I think it is a clone of HP-UX Glance.

https://nicolargo.github.io/glances/

I have also hacked basic top to add database login details to server processes.


Me too! So much so that I add it to my .bashrc everywhere.


This reminds me of a project I saw more than 24 (!) years ago. Someone made a webserver for the GBA.

It seemed magical to me at the time and I still remember going to this site often to see updates (that's why I remember the URL).

Thankfully the Wayback Machine still has it:

https://web.archive.org/web/20030204043536/http://fivemouse....


Anecdote: In 2022, while visiting San Francisco, I had the chance to explore the Stanford campus. Wandering through the quiet halls, empty for the summer, I was just about to leave when I unexpectedly came across Knuth's office [1]. I had to do a double take - it was surprisingly small for someone of his stature. Yet, in a way, it felt perfectly fitting, a reflection of his unassuming nature.

[1] https://janvandenberg.blog/wp-content/img_1813-scaled.jpg

About the checks: I have not one but two checks. Small typos, nothing big, but wonderful to have these two documents.


That's a pretty cool office - not sure what other types of offices are available on the campus, but still. (Typing this from the open-office from hell.)


Does he still use that office? I thought he mostly spent his time in his home office.


Looks like a recent move or temporary office. Box of random items, basic paper name “plate,” stacks on the desk …


Nice that he has a slide rule on the desk (under the front basket) - it looks like a K&E Deci-Lon case.


That's like every professor's office.


I think it's a neat photo, and I appreciate the spirit of your post, but I recommend you modify or delete it unless you got his consent to post it.

Not that anyone would, but I would be creeped out if I learned that people were posting pictures of my office without my knowledge.


It's public knowledge that he's a prof at Stanford, and publicly available directories can lead you to his office. Not to mention that he's famous enough that this is almost certainly not the first time someone has shared a photo like this.

If it was a photo of his home I'd understand but this is essentially public knowledge.


Exceptional read! I love it.

It's the most complete history of git that I know of. Exceptional!

I'd love to read more historical articles like this one, of pieces of software that have helped shape our world.


> It's the most complete history of git that I know now.

I wasn't going to read the story until I read your comment. I knew the summary of BitKeeper and the fallout, but wow this was so detailed. Thanks!


+1 to that. Great read. The field is young and accelerating; its history is quite compressed. It's valuable to have articles like this.


If you like computer/software history, I recommend the Abort Retry Fail[1] mailing list.

[1] https://www.abortretry.fail/


(I meant 'newsletter' , not 'mailing list')


The Dream Machine was a good one, though a bit more historical. http://folklore.org has a bunch of good Apple stories.


Ditto. This was a really nice read!


Great, I am gonna watch this. Hopefully this video also explains what the name 'Netscape' means or implies or is based on, because I've always found it kind of striking that the name has the same letters as (and sort of sounds like) 'NCSA', where Mosaic was originally developed. That seems like more than a coincidence?


  > "We've got to make progress on [renaming the company]." And I said, 
  > "We've got a couple of ideas, but they're not great." Then it just kind 
  > of popped into my head, and I said, "How about Netscape?" Everyone kind 
  > of looked around, saying, "Hey, that's pretty good. That's better than 
  > these other things." It gave a sense of trying to visualize the Net and 
  > of being able to view what's out there. 
Greg Sands in https://money.cnn.com/magazines/fortune/fortune_archive/2005...


Landscape -> Netscape


Starscape, cityscape…


Escape


This! It is so strange when posts don't have a date. It feels like those posts are actively trying to hide something. It's almost suspicious.

I also have a couple of other things I look for in a good blog: https://j11g.com/2024/06/24/a-good-blog-has/


You mean my Python 2 tutorial isn't evergreen content marketing? :(


Does anyone know what the name 'Netscape' means or implies or is based on?

It's kind of striking that the name has the same letters (and in the same order) as NCSA, where Mosaic was originally developed - that seems like more than a coincidence?


Almost ten years after I asked! :)

https://news.ycombinator.com/item?id=7847368

Awesome!


Tom is why I understand UTF-8 [1].

And it is not so much WHAT he explained but HOW he explained it that really made it stick. It was his sheer, unbridled, genuine enthusiasm that put him on the map for me.

One of the best to do it.

[1] https://www.youtube.com/watch?v=MijmeoH9LT4


And he's why I know I shouldn't roll my own time zone library.

https://youtu.be/-5wpm-gesOY?si=eL38cruwJZDiuHVS


I first came to learn about the complexity of character sets by finding out that SQL Server's default character set uses 2 bytes per character. I eventually came across Scott's video. Native UTF-8 support is only as new as SQL Server 2019.

https://sqlquantumleap.com/2018/09/28/native-utf-8-support-i...


This is a great video! I've been writing about UTF-8 on my blog, and I've noticed that many programmers don't understand it, even after 10 or 20 years.

The main impediment seems to be that languages like Java, JavaScript, and Python treat UTF-8 as just another encoding, but really it’s the most fundamental encoding.
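
For instance, in Python UTF-8 is just one codec name among many:

  s = "héllo"
  print(s.encode('utf-8'))    # b'h\xc3\xa9llo' - one encoding target among many...
  print(s.encode('latin-1'))  # b'h\xe9llo'     - ...on equal footing with legacy codecs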

The language abstractions get in the way and distort the way people think about text.

Newer languages like Go and Rust are more sensible: they don't have global mutable encoding variables.


It's misleading to describe UTF-8 as "the most fundamental encoding", because of the existence of UTF-32 (essentially just a trivial encoding of "raw" Unicode code points) and UTF-16 (which has certain limitations that later became part of the specification of which code points are valid in any encoding).

https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF_(surro...

UTF-8 is the most ubiquitous encoding on the web, but that doesn't make it more fundamental than any other.


The word "fundamental" here probably has the same meaning as "elementary" as in elementary mathematics, i.e. something you should absolutely understand first, not something that makes up everything else.


What's not to understand about UTF-8? It's a way of encoding code points in the Unicode space. You decode the bits and it yields a number that corresponds to a glyph, possibly with some decorations. The only thing special about UTF-8 is that it happens to write ASCII as ASCII, which is nice as long as you realize that there is much outside that.

Either that, or I'm completely off base and part of the vast horde who don't get it.
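
That mental model fits in a few lines of Python; a teaching sketch (it skips validation of overlong forms and surrogates):

  def decode_utf8_char(buf):
      """Decode one code point; returns (code point, bytes consumed)."""
      b0 = buf[0]
      if b0 < 0x80:                # 0xxxxxxx: plain ASCII, one byte
          return b0, 1
      elif b0 >> 5 == 0b110:       # 110xxxxx: two-byte sequence
          n, cp = 2, b0 & 0x1F
      elif b0 >> 4 == 0b1110:      # 1110xxxx: three-byte sequence
          n, cp = 3, b0 & 0x0F
      elif b0 >> 3 == 0b11110:     # 11110xxx: four-byte sequence
          n, cp = 4, b0 & 0x07
      else:
          raise ValueError("invalid leading byte")
      for b in buf[1:n]:           # continuation bytes: 10xxxxxx, 6 payload bits each
          if b >> 6 != 0b10:
              raise ValueError("invalid continuation byte")
          cp = (cp << 6) | (b & 0x3F)
      return cp, n

  print(decode_utf8_char('€'.encode('utf-8')))  # (8364, 3), i.e. U+20AC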


Variable width encodings are inherently harder to understand.

And while UTF-8 tries very hard to be simple whenever possible, it does have some nonobvious constraints that make it significantly more complex than the actual simplest variable-width encoding (a continuation bit and then 7 data bits).
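
For comparison, a sketch of that simplest scheme (essentially LEB128; hypothetical, not a real text encoding):

  def encode_simple(cp):
      """7 data bits per byte; the high bit means 'more bytes follow'."""
      out = []
      while True:
          out.append(cp & 0x7F)   # low 7 bits
          cp >>= 7
          if cp == 0:
              break
          out[-1] |= 0x80         # set the continuation bit
      return bytes(out)

  # U+20AC comes out as b'\xacA' - the final byte equals ASCII 'A', which is
  # exactly the kind of false substring match that UTF-8's design rules out.
  print(encode_simple(0x20AC))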


Unicode is variable length even if you use 32 bits, because glyphs sometimes require multiple codepoints. People sometimes write as if using more bits would remove complexity from Unicode, but it doesn't really; you still need to handle multiple units at once sometimes.
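
A quick Python illustration (combining characters; ZWJ emoji sequences behave the same way):

  s = "e\u0301"                           # 'e' + COMBINING ACUTE ACCENT: one glyph, é
  print(len(s))                           # 2 code points
  print(len(s.encode('utf-32-le')) // 4)  # still 2 units in UTF-32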


Sure, but that's at another level entirely. Dealing with two types of variable width is doubly difficult.


I disagree. If you are correctly handling Unicode, you are already going through some "decode" function which parses a 32-bit quantity from UTF-8 or UTF-16. From there you sometimes need to handle multiple codepoints together, like for non-composed diacritics, Han unification, and some emojis. This is complex regardless of whether you use UTF-8 or UTF-16; in fact, I'd say it's more difficult to handle than either encoding.


We're not talking about fully correctly handling Unicode; we're talking about teaching someone the basics of an encoding. Context is critical here.


If you want to teach someone how an encoding works, why would you not tell them that a single symbol can take multiple codepoints?

It seems like you're advocating that people learn incorrect information and form their impressions of it based on a falsehood. Which is probably why people think UTF-32 frees you from variable-length encoding.


You should tell them, yes. And talk about it more at some point. But you don't have to go into much detail when today's lesson is specifically teaching UTF-8 or UTF-32. I don't know about you, but I think I could teach the latter about ten times faster.

As part of a comprehensive dive into Unicode it's a minor part, but for teaching an encoding it's a significant difference.


> I think I could teach the latter about ten times faster.

I've lectured computer science at the university level, and I think you could introduce all this information to a CS undergrad pretty coherently and design a lab or small assignment on it no problem. Maybe you could ask them to parse some emojis that require multiple 32-bit codepoints.


That sounds entirely reasonable, but you have to deal with a lot less engagement when you're teaching via blog post or similar.


Even ignoring all the other advantages (mostly synchronization-related, which do objectively make implementing algorithms on UTF-8-encoded data simpler), "the number of set prefix bits is the number of bytes" doesn't seem meaningfully more complex than a single continuation bit.
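
That rule is a few lines in Python (a sketch; the edge cases are called out in the comments):

  def utf8_seq_len(b0):
      """Sequence length implied by a leading byte: count the set prefix bits."""
      if b0 < 0x80:
          return 1            # 0xxxxxxx: a one-byte (ASCII) character
      n = 0
      while b0 & 0x80:        # count the leading 1 bits
          n += 1
          b0 = (b0 << 1) & 0xFF
      return n                # 2, 3 or 4; n == 1 is a continuation byte, not a valid start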


> "the number of set prefix bits is the number of bytes"

Except when it's 1, because that's an invalid start, and except when it's 0, because that means the character is a single byte.

And it also means you're dealing with three classes of byte now.

Plus UTF-8 has more invalid encodings to deal with than a super-simple format.


> Plus UTF-8 has more invalid encodings to deal with than a super-simple format.

If your format supports non-canonical encodings you're in for a bad time no matter what, so a whole lot of that simplicity is fake.

> And it also means you're dealing with three classes of byte now.

If you're working a byte at a time you're doing it wrong, unless you're re-syncing an invalid stream in which case it's as simple as a continuation bit (specifically, it's two continuation bits).


The simple encoding already allows smaller characters to have the same bytes as subsets of larger characters. Non-canonical is not a big deal on top of that. Also there are other banned bytes you don't need to deal with.

> If you're working a byte at a time you're doing it wrong, unless you're re-syncing an invalid stream

It's very relevant to explaining the encoding and it matters if you're worried that invalid bytes might exist. You can't just ignore the extra complexity.

Also if you're not working a byte at a time, that kind of implies you parsed the characters? In which case non-canonical encodings are a non-problem.


> Non-canonical is not a big deal on top of that.

Unless you want to actually do anything with the string beyond decode a codepoint.


If you're going beyond decoding, then you're beyond the stage where canonical and non-canonical versions exist any more.

Non-canonical encodings make it difficult to do things without decoding, but you have bigger problems to deal with in that situation, and the non-canonical encodings don't make it much worse. Don't get into that situation!

Specifically, even with only canonical encodings, one- and two-byte characters can appear inside the encodings of two- and three-byte characters. You can't do anything byte-wise at all, unlike with UTF-8. But you already said "If you're working a byte at a time you're doing it wrong", so I hope that's not too big of an issue?


I read your comment in the “Simpsons comic book nerd” tone of voice.


More properties of UTF-8: It is self-synchronizing. It has the same lexicographical sort order as UTF-32. It allows substring matches without false positives. It is compatible with null-terminated strings.
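
The sort-order property is easy to check empirically; a quick Python sketch (surrogates excluded, since they aren't encodable):

  import random

  # Python compares str by code point, i.e. in the same order as UTF-32;
  # comparing the UTF-8 bytes must yield the identical ordering.
  cps = [cp for cp in random.sample(range(0x110000), 5000)
         if not 0xD800 <= cp <= 0xDFFF]
  chars = list(map(chr, cps))
  assert sorted(chars) == sorted(chars, key=lambda c: c.encode('utf-8'))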


Halfway in, I find this a tiring style of narration. A less enthusiastic delivery would demand less focus, and thus less energy, from the viewer.


Interesting… never heard of the “null problem” as a requirement before.


Does it have j/k navigation yet?

