
I loved the idea of QNX. Got way excited about it. We were moving our optical food processor from dedicated DSPs to general purpose hardware, using 1394 (FireWire). The process isolation was awesome. The overhead of moving data through messages, not so much. In the end, we paid someone $2K to contribute isochronous mode/dma to the Linux 1394 driver and went our way with RT extensions.

It was a powerful lesson (amongst others) in what I came to call “the Law of Conservation of Ugly”. In many software problems, there’s a part that just is never going to feel elegant. You can make one part of the system elegant, but that often just causes the inelegance to surface elsewhere in the system.



> what I came to call “the Law of Conservation of Ugly”. In many software problems, there’s a part that just is never going to feel elegant

This may be an instance of the Waterbed Principle: in any sufficiently complex system, suppressing or refactoring some undesirable characteristic in one area inevitably causes another one to pop up somewhere else. It's as if there is some minimum amount of complexity/ugliness/etc. that the system as a whole must contain while still carrying out its essential functions, and it has to leak out somewhere.

https://en.wikipedia.org/wiki/Waterbed_theory


The terms I've seen used and prefer to use are "essential complexity" and "accidental complexity".


I have a really neat idea to improve the message passing speed in QNX: you simply use the paging mechanism to send the message. That means there is no copying of the data at all, just a couple of page table updates. You still have the double TSS load overhead (vs 1 TSS load in a macro kernel), but that is pretty quick.

But you are right that there is a price for elegance. It becomes an easier choice to make when you factor in things like latency and long term reliability / stability / correctness. Those can weigh much heavier than mere throughput.
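
Not QNX itself, but as a rough userspace analog of "send the pages, not the bytes" on Linux: the sender puts the payload in a memfd and passes only the file descriptor over a Unix socket, and the receiver maps the same physical pages. Only page table entries are created on the receive side; the payload is never copied through the socket. A minimal sketch (error handling omitted):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define MSG_SIZE (2 * 1024 * 1024)   /* one 2 MB "large message" */

    static void send_fd(int sock, int fd) {
        char dummy = 'x';
        struct iovec iov = { &dummy, 1 };
        char ctrl[CMSG_SPACE(sizeof(int))];
        struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1,
                             .msg_control = ctrl, .msg_controllen = sizeof(ctrl) };
        struct cmsghdr *cm = CMSG_FIRSTHDR(&mh);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &fd, sizeof(int));
        sendmsg(sock, &mh, 0);
    }

    static int recv_fd(int sock) {
        char dummy;
        struct iovec iov = { &dummy, 1 };
        char ctrl[CMSG_SPACE(sizeof(int))];
        struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1,
                             .msg_control = ctrl, .msg_controllen = sizeof(ctrl) };
        int fd;
        recvmsg(sock, &mh, 0);
        memcpy(&fd, CMSG_DATA(CMSG_FIRSTHDR(&mh)), sizeof(int));
        return fd;
    }

    int main(void) {
        int sv[2];
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

        if (fork() == 0) {                     /* receiver */
            int fd = recv_fd(sv[1]);
            char *msg = mmap(NULL, MSG_SIZE, PROT_READ, MAP_SHARED, fd, 0);
            printf("receiver sees: %s\n", msg); /* same pages, zero copy */
            return 0;
        }

        int fd = memfd_create("msg", 0);       /* sender builds the message */
        ftruncate(fd, MSG_SIZE);
        char *msg = mmap(NULL, MSG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        strcpy(msg, "hello");
        send_fd(sv[0], fd);
        wait(NULL);
        return 0;
    }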


This is sort of what Mach does with "out-of-line" messages: https://web.mit.edu/darwin/src/modules/xnu/osfmk/man/mach_ms... https://dmcyk.xyz/post/xnu_ipc_iii_ool_data/

(this is used under-the-hood on macOS: NSXPCConnection -> libxpc -> MIG -> mach messages)
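
For reference, a rough sketch of what the send side of an out-of-line message looks like (dest_port is assumed to be a valid send right obtained elsewhere; error handling omitted):

    #include <mach/mach.h>
    #include <string.h>

    /* Send a buffer out-of-line: the kernel maps (or copy-on-write shares)
       the pages into the receiver instead of copying the bytes inline. */
    kern_return_t send_ool(mach_port_t dest_port, void *buf, mach_msg_size_t len) {
        struct {
            mach_msg_header_t         header;
            mach_msg_body_t           body;   /* descriptor count */
            mach_msg_ool_descriptor_t ool;    /* the out-of-line region */
        } msg;

        memset(&msg, 0, sizeof(msg));
        msg.header.msgh_bits        = MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND, 0)
                                      | MACH_MSGH_BITS_COMPLEX;
        msg.header.msgh_size        = sizeof(msg);
        msg.header.msgh_remote_port = dest_port;
        msg.header.msgh_local_port  = MACH_PORT_NULL;

        msg.body.msgh_descriptor_count = 1;

        msg.ool.address    = buf;
        msg.ool.size       = len;
        msg.ool.deallocate = FALSE;                 /* keep our own mapping */
        msg.ool.copy       = MACH_MSG_VIRTUAL_COPY; /* share/CoW, don't memcpy */
        msg.ool.type       = MACH_MSG_OOL_DESCRIPTOR;

        return mach_msg(&msg.header, MACH_SEND_MSG, sizeof(msg), 0,
                        MACH_PORT_NULL, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
    }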


Mach has always been a very interesting project. It doesn't surprise me at all to see that they already have this, but at the same time I was not aware of it, so thank you. It also more or less confirms that this may well be an avenue worth pursuing.


I learned of the idea from some paper or other on Barrelfish, a multikernel research OS whose capability system is modelled on seL4's. Barrelfish is underrated! Aside from its take on kernel architecture, it also has interesting nuggets on other aspects of OS design, such as using declarative techniques for device management.


I haven't seen it implemented anywhere, but that sounds like the "pagetable displacement" approach described here: https://wiki.osdev.org/IPC_Data_Copying_methods#Pagetable_di...

The same idea occurred to me a while ago too, which is how I originally found that link :)


How performant is that in practice? I thought updating page mappings was a fairly expensive operation. Using a statically mapped circular buffer makes more sense to me, at least.

Disclaimer: I don't actually know what I'm talking about, lol


To be clear, since the other replies to you don't seem to be mentioning it, the major costs of MMU page-based virtual memory are never about setting the page metadata. In any instance of remapping, TLB shootdowns and subsequent misses hurt. Page remapping is still very useful for large buffers, and other costs can be controlled based on intended usage, but smaller buffers should use other methods.

(Of course I'm being vague about the cutoff for "large" and "smaller" buffers. Always benchmark!)


You can pretty reliably do it on the order of 1 µs on a modern desktop processor. If you use a level-2-sized mapping table entry of, say, 2 MB, that is a transfer speed on the order of 2 TB/s, or ~32x faster than RAM for a single core, even if you only move a single level-2-sized entry. If you transfer multiple in one go, or use a level-3-sized mapping table entry of 1 GB, that would be 1 PB/s, or ~16,000x faster than RAM, or ~20x the full memory bandwidth of an entire H200 GPU.
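
For reference, the arithmetic behind the headline figures (the RAM and GPU comparisons depend on which bandwidth numbers you assume):

    2 MB remapped in ~1 µs:  2*10^6 B / 10^-6 s = 2*10^12 B/s = 2 TB/s
    1 GB remapped in ~1 µs:  10^9 B  / 10^-6 s  = 10^15 B/s   = 1 PB/s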


Pretty quick, far faster than an inter-process memory copy. The only way to be sure would be to set it up and measure it, but on a 486/33 I could do this ~200K times per second; on modern systems it should be a lot faster than that, more so if the process(es) do not use FP. But I never actually tried setting up, say, a /dev/null implementation that used this; it would be an interesting experiment.
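
A minimal Linux sketch of that kind of measurement, comparing a 2 MB memcpy against a 2 MB "move" done purely through page table updates (mremap with MREMAP_FIXED rewrites the mappings without touching the payload); the actual numbers will of course depend on the machine and on TLB effects:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <time.h>

    #define SZ (2UL * 1024 * 1024)

    static double now_us(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
    }

    int main(void) {
        char *src = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        char *dst = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memset(src, 1, SZ);                       /* fault the pages in */
        memset(dst, 2, SZ);

        double t0 = now_us();
        memcpy(dst, src, SZ);                     /* copy-based transfer */
        double t1 = now_us();

        char *spare = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memset(spare, 3, SZ);
        double t2 = now_us();
        /* remap-based "transfer": only page tables change */
        void *moved = mremap(spare, SZ, SZ, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
        double t3 = now_us();

        printf("memcpy: %.1f us   mremap: %.1f us   (moved to %p)\n",
               t1 - t0, t3 - t2, moved);
        return 0;
    }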


Passing the PTE sounds great for big messages (send/recv).

For small messages (open), the userspace malloc will have packed several small buffers into a single page, so there's a chance you'd need to copy the message to a fresh userspace page first; at that point the two copies of a conventional send might work out better.


The throughput limitation is really only an issue for big messages, for smaller ones the processing overhead will dominate.


The QNX call to do that is mmap().


Yes, I know. But I rolled my own QNX clone and I figured it would be neat to do this transparently rather than requiring the application to code it up explicitly. This puts some constraints on where messages can be located, though, and that's an interesting problem to solve if you want to do it entirely without overhead.
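
For instance, the client-side library could enforce the location constraint at allocation time. A hypothetical helper, just to illustrate the idea that remappable messages need to start on a page boundary and span whole pages:

    #include <stdlib.h>
    #include <unistd.h>

    /* Hypothetical allocator for "remappable" messages: the buffer must be
       page-aligned and occupy whole pages so its mappings can be handed to
       the receiver without dragging along unrelated data in the same page. */
    void *alloc_message(size_t len) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t rounded = (len + page - 1) & ~(page - 1);  /* round up to pages */
        return aligned_alloc(page, rounded);
    }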


I have a general distaste for transparent policies, which I always find fall short for some use case. In this case, the sender knows best what to do with their message. Moreover, for small buffers, page remapping won't be an optimization. I'd recommend exposing this as an alternative send interface instead.

The lower a transparent policy lies in the OS, the worse it contorts the system. Even mechanisms necessarily constrain policy, if only slightly. I strongly believe that microkernels will only be improved by adhering ever closer to true minimality. If backwards compatibility is important, put the policy in a library. But I think transparent policies are generally advisable only when user feedback indicates benefit.


If you want your send/receive/reply mechanism to work transparently across a network then you have already made many such decisions and likely this one will just appear as an optimization in case both client and server are local.


I agree that the decisions will likely be made in the end, but I'd argue they should be made as high up / as late as possible. For any dynamic system it matters not just that a decision has been made, but where it has been made; that determines things like overhead and maintenance burden.

Contrary to QNX, I'm not entirely convinced that network transparency by default is ultimately best, though that is a separate concern.


Is "optical food processor" a metaphor, or is this actually a device that would cut up food items based on image feedback?


Usually it's about sorting. Take a lot of whatever (french fries, green beans, etc.), accelerate them to something like 3 m/s, launch them off the end of a belt, scan them looking for defects, and then use air jets to divert the defective items. Look on YouTube for it. It's sort of mind-boggling to see the scale at which french fries alone are produced: you see one line running at load, and then realize there are multiple lines in most plants, and there are hundreds of plants worldwide.

The cooler machines were specialized for fries: they use a rotating knife drum above a belt to cut the defect spots out.

I've not done that for 17 years now; the newer machines are that much cooler.


That's awesome. Thanks for the explanation.

I did find several machines like this on YouTube, and it's amazing to watch. (One of them had little motor-actuated slats that could kick the defective items away, almost like a foot kicking a soccer ball!)


There's an older talk Simon Peyton Jones (IIRC?) gave about some development or other in Haskell, in which he suggested that many software systems have some aspect of the swamp or the marsh into which you must eventually wade - that there's a mucky, sticky, irreducible aspect to the problem that must be dealt with somewhere, regardless of how elegant the rest of the system is.

"that marsh thing" has stuck with me, and been a frequent contributor to my work and thinking. I'll happily take Law of Conservation of Ugly as a _much_ better name for the thought :)


Today though, I'd argue that with full DSMP support and much more capable systems, any overhead from message passing is much less of a concern, or at least outweighed by other benefits.



