*1) Extend font server and services to vend outlines and antialiased masks, supp...

orib · on July 17, 2008

Oh. Oops, noticed the date. How things change in just a few years. =)

Still, it shows how fast X is progressing, now that we've ditched XFree86.

Since the split, X has really moved into the 90s, gaining <sarcasm>esoteric</sarcasm> features like monitor hotplugging, being able to configure more than one input device at once (like a touchpad and external mouse) and other silly things that nobody would ever want.

It's also jumped ahead of other window systems by leaps and bounds in some areas -- for example, multitouch support has been merged in, and X will be the first desktop window system to support multiple pointers, input redirection is on it's way, a new OpenGL framework (Gallium3D) is coming, new 2d acceleration (Exa/GEM/TTM/...) are in various states of merged-inness, etc.

And they've done all of this while deleting between 10,000 and 50,000 lines of code per release. And preserving backwards compatibility.

I'd say the future looks pretty bright for X.

[edit: give a better overview of what I mean by X progressing]

eggnet · on July 17, 2008

More than a few years, the post was in late 2003 but the OS X preview came out in late 2000 where they clearly decided to not use X already. The decision was likely made substantially before that date as well, not to mention NextStep.

anewaccountname · on July 17, 2008

Is this a gimmick post to prove that block quotes lower the quality of discussion?

mechanical_fish · on July 17, 2008

I cannot possibly upmod this enough.

I just don't read posts like the one above. My brain literally refuses to bother anymore -- it scans the first couple of lines, sees the stuff it's already read, sees the repeated lines, and then starts looking around for something else to do.

The author's actual intelligence couldn't be better disguised if it came in the shape of a banner ad.

orib · on July 17, 2008

To be quite honest, I find that format easier to read (well, why else would I have posted it that way?)

A short snippet that is being responded to puts the text in context, allowing the reader to see exactly what I'm going on about, without reading the article over and over, matching up what part, exactly, I'm responding to.

Sure, massive copy-and-paste is a waste of time, and obscures the post. However a lack of context, in my opinion, will make things even more confusing to the reader, especially in the case of the text that's being responded to being on a different page.

Ah well, each to their own.

mechanical_fish · on July 17, 2008

Your repeated lines... aren't repeating anymore, presumably thanks to the magic of editing. That is a remarkable improvement. The sight of the same sentence being echoed over and over, a sight which practically screams "enraged fanboy", is now absent.

My other problem with the point-by-point rebuttal is that all the chaff (in the form of those detailed but disconnected footnotes) threatens to obscure the bottom line, which in this case is "it isn't 2003 anymore and X has actually made a lot of progress since then". Basically, what I wanted to read was your second post. Which is much better than the first because it's written in prose -- connected sentences that speak in your voice and tell a story -- and not in chopped-up bullet points.

Now, if this were a detailed design review of a windowing-system project, things would be different, and I'd agree with you that your format is just fine.

Perhaps the issue here is that you obviously know and care a great deal about the design details of windowing systems, whereas I would rather read a short, colloquial summary of the state of play and then go to bed. ;)

scott_s · on July 17, 2008

I try to avoid rebutting sentence-by-sentence. I think it encourages people to quibble about minor points instead of focusing on the point as a whole.

Context is important, but the best approach is to integrate the context into the reply.

tx · on July 17, 2008

Wow, nice explanation - thank you. Since you seem to know this stuff, where can you send me for a nice introduction to Linux drawing/windowing stack? I've done some searches and tried to read up on this, but it looks like a bunch of loosely-related libraries that I can't compose a solid "picture of everything" from.

I'm coming from years of Windows lower-level UI coding (Win32/GDI/GDI+) and I wanted to look into Gnome/GTK+/Firefox drawing slowness (particularly with scrolling on weaker/older machines). I don't do much "pixel programming" at work, but I figured it would be fun side optimizing project to hack on.

orib · on July 17, 2008

Hm, I can't think of a good overview off the top of my head (maybe I should sit down and write a good one sometime), but the way it works is:

An application will typically use a GUI toolkit like GTK or QT to do it's UI. This library will use Xlib[1] (or XCB[2] in some cases) to do it's event handling. Event handling in X consists of handling button clicks, sending messages to and from the window manager, cut and paste, etc. (The WM to app protocols are detailed in the ICCCM and the EWMH documents)

The painting will be done through a drawing library. for GTK apps (and Firefox, which isn't a GTK app, but does a good job of pretending it is through nasty hacks[3]) will use Cairo[4] to paint, and QT apps will use QT's paint engine (Arthur, I believe) to draw their UI.

Both of these engines will use XRender (which has largely replaced core X drawing) to actually put the pixels onto the screen, but works at a very low level (antialiased trapezoids and porter-duff compositing of images, more or less). They also fall back in software if the X server doesn't have XRender, or they've detected a buggy version of it.

That's the application side of it.

The fallback library that's used in Cairo is shared with the X server's software XRender implementation, and is called pixman[5]. Making this faster will likely make both the X server and the Cairo image surface faster.

Finally, the X server accelerates drawing to the screen through either the Exa accelleration architecture, or through XAA.

Exa is the more modern of the two, and it accelerates commonly used stuff that XRender wants for things to be fast. This mostly means alpha-transparent blends and such.

XAA is rather old, and is somewhat complicated to implement. Furthermore, it doesn't accelerate very much interesting stuff for modern apps -- when was the last time you saw an unantialiased wide line?

    [1] Xlib: the X protocol library and more. http://tronche.com/gui/x/xlib/
    [2] XCB: a new X protocol binding. Designed for transparent thread safety and asynchronousness: http://xcb.freedesktop.org/
    [3] Firefox renders GTK widgets offscreen, mashes them up, and paints them on screen in the XUL layer to pretend to be GTK
    [4] Cairo is a vector graphics library based of the postscript model: http://cairographics.org
    [5] Pixman is essentially a software implementation of XRender's drawing operations: http://cgit.freedesktop.org/pixman/

nailer · on July 18, 2008

Would it be possible or useful to create an X extension for higher lever (ie, toolkit) primitives for remoting? Ie, an app can send 'list box widget in GTK' to the display server?

orib · on July 18, 2008

Possible, although it would be rather difficult to do well in a flexible manner, and would be unusable by every app that currently exists. More or less, it would be a replacement of the entire X event handling model, entire X drawing model, etc.

And it would make writing apps more difficult -- very few of them use entirely stock widgets, and it would be nearly impossible to securely and performantly implement new widgets on this system.

Plus, it's diametrically opposed to the X "Mechanism, not policy" mantra.

I wouldn't expect anything like this to take off, but it's been tried in the Y window system, and more interestingly -- and more dead-in-the-water -- in the NEWS window system.

nailer · on July 18, 2008

Why would it have to replace the existing drawing model? I'm thinking of a way for an app to ask the server if it can render a particular set of high level primitives, if the server responds that it can, the apps sends those instead.

That would allow existing apps and existing display servers, new apps and older display server, and old apps and newer display servers to work fine.

If both the app (at the toolkit level) and the display server are recent, however, remoting is massively sped up.

tx · on July 18, 2008

Like this? http://www.gtk-server.org/

nailer · on July 18, 2008

No, that's a tool for making GTK applications in various scripting languages.

tx · on July 18, 2008

Well... It does exactly what you've described in your question.

tx · on July 18, 2008

Saved to a text file. Thank you for taking the time!

riahi · on July 17, 2008

|by mpaque (655244) on Tuesday August 19 2003

Notice that this is an old comment. Did all those features you describe exist back in 2003 on X?

neilc · on July 17, 2008

Unix domain sockets are fast. really fast. as in, beating shared memory sometimes fast

Really? How would that be possible?

orib · on July 17, 2008

Setting up the TLB and handling the page faults, then doing the interprocess synchronization can be expensive compared a write() for small one-shot transfers (which is what the bulk of X protocol traffic is). A write() is more or less a memcpy() between buffers in the kernel.

Don't get me wrong -- shared memory isn't always slower, but the constant and synchronisation overhead for a write() is smaller than an mmap(), so for small writes it can often be faster than shmem transport. For large writes, shared memory will usually be faster, of course.

mcescher · on July 17, 2008

> Setting up the TLB and handling the page faults,

Should be done once, at the beginning, and the segment re-used.

> then doing the interprocess synchronization

There is IPC anyway when you are sending information between processes. If you delve into the kernel you may be surprised at the number of layers of operations it takes to present data from one process to another using a socket.

> but the constant and synchronisation overhead for a write() is smaller than an mmap()

You're assuming the pages needed are already mapped in the former case. If you're writing anything of size, the pages will probably need to be faulted in anyway. If you're writing something small, then setting up your shared memory at the beginning and then re-using that memory is obviously the way to go, rather than getting and destroying shared memory segments every time you need them.

michaelneale · on July 17, 2008

>So why did they do their own window system from scratch? Well, because they didn't. They merely extended NeXT's disply postscript, which they already owned and had code for.

Perhaps they also wanted as much control as possible. Whilst they have done benevolent forks before (webKit) perhaps they thought that with X it would be unlikely to be taken well if they "had their way with it".

gaius · on July 17, 2008

What they wanted was to have the same display code driving both screen and printer. Literal WYSIWYG. X doesn't (and can't ever) work like that. Which is not a knock on X, it's just the reality that Apple have their own priorities which are not necessarily the same as the traditional Unix world.

ajross · on July 17, 2008

Cairo can, and does work that way. There are current cairo backends for X11, Postscript, PDF, flat image buffers, OpenGL, SVG, win32, quartz, and no doubt a bunch of others I'm forgetting. The whole point of the thing was to have a robust, path-based imaging model that works everywhere.

This is now 4 year old technology. Please flame with more current criticism. :)

gaius · on July 17, 2008

You could always take an X screenshot and dump it to the printer :-P Cairo is more analogous to say Qt or wx than it is DPS/DPDF.

ajross · on July 17, 2008

Your implication seems to be that Cairo doesn't work on an accelerated backend? That's not correct. It is correct that the primitive assembly happens client side in Cairo, which I'm not sure is a misfeature. The IPC overhead of DPS was always one of its biggest problems.

michaelneale · on July 17, 2008

Actually that makes more sense.

It also makes sense with their view on antialiasing fonts (eg look at iphone DPI, it is close to print DPI).

amethyst · on July 17, 2008

Actually, there are other devices, like the Openmoko Neo 1973 and Neo Freerunner, which have much higher DPI screens. The iPhone has a resolution of 480x320, at 3.5", for 160dpi; the Neos have a 2.8" 640x480 resolution screen, at 280dpi. It's really nice to look at and read text on. :)

michaelneale · on July 17, 2008

Wow thats incredible. I haven't seen a Neo in the flesh, I would love to get one of they work here in Australia.

I have an older iphone, I find its quite easy to read at that DPI, but the Neo DPI could be much better (I hear the kindle looks like reading paper, need to check that out, once again, we probably can't use it here yet !).

It sucks being in the "ass end" of the world sometimes.

nailer · on July 18, 2008

The Neo has a 280 DPI screen?

That's awesome - I don't think I've ever seen a display that close to print quality. Its a pity the styling and software don't seem very consumer friendly.

mcescher · on July 17, 2008

> Unix domain sockets are fast. really fast. as in, beating shared memory sometimes fast.

Not sure where you got this but it's wrong.

ajross · on July 17, 2008

It's a little apples-to-oranges, but it's not "wrong". Lots of IPC problems really are solved fastest using a socket or pipe. Writing to a shared memory buffer doesn't clue the kernel in to the fact that there is new data available, which means you need a syscall anyway to transfer control. And it makes the application responsible for all the buffering that occurs, and applications don't have a "big picture" available to properly optimize things.

Don't diss the pipe. It doesn't do everything, but if you're going to pick just one IPC mechanism, it's the one.

mcescher · on July 17, 2008

Your arguments are orthogonal to speed.

> Don't diss the pipe

I guess you've been smoking.

You don't make any sense when it comes to "buffers" (writing to shared memory IS writing to a buffer); a pipe doesn't give you a "big picture" to "properly optimize things" (a WTF "argument"), and no syscall is needed to "transfer control", unless you're talking about futexes or some other method to clue the kernel in, in which case, you do indeed have a way to clue the kernel in.

Writing once is faster than writing multiple times. That's why shared memory is obviously faster than not sharing and copying. Time needed to set up shared memory segments is negligible if done right -- once, at the beginning, and then re-used as needed.

dfox · on July 17, 2008

I think that the argument there is that shared memory is not useful to transfer data from one process to another, but to share data between processes. Which is quite different use case.

Shared memory is fast, but it is just a shared memory. You have to synchronize accesses to it in some way.

When you want to transfer relatively small records from one process to another, it is probably faster to send them through pipe, because the copying overhead is negligible compared to synchronization overhead, which would be there in shared memory case too (and will probably be larger).

So to wrap up:

1) In most cases it is really needed to tell kernel that you are done and want to wait on some event in other process (relying on preemption or explicit timed sleeps to switch to other process is considered bad practice)

2) Writing once is indeed faster than multiple times, which is in this case of many small writes argument for pipes. Because pipe lets you hand over some data to kernel and let it do any low-level synchronization details, which it can generally do better.

3) In UNIX, pipes/sockets are the most flexible means of IPC. And in case where you have server that reads large chunks of data from multiple clients using shared memory, synchronizing accesses to this shared memory by sockets/pipe is good idea, because you can use select(2)/poll(2) on such socket. Using mutexes or semaphores or something like that would in most cases lead to busy-waiting or unnecessary convoluted code.

mcescher · on July 17, 2008

> You have to synchronize accesses to it in some way.

Pretty simple.

> it is probably faster to send them through pipe, because the copying overhead is negligible compared to synchronization overhead, which would be there in shared memory case too (and will probably be larger).

There is still synchronization overhead -- just because it's hidden from you in the kernel doesn't mean it isn't there. You should check out the source of your favorite kernel and see how much work is done behind the scenes to transfer data over a socket.

So you have an extra copy, PLUS extra synchronization overhead. The only reason people ever think that method is faster is because of operator error, e.g. creating and destroying shared memory segments thinking it's like malloc.

> 1) In most cases it is really needed to tell kernel that you are done

    man futex

Or just busy-wait, or do a nonblocking "read" like you would anyway when reading from a socket.

> kernel and let it do any low-level synchronization details,

Look at the code. Really. It's a lot of overhead compared to userspace solutions.

> In UNIX, pipes/sockets are the most flexible means of IPC

s/flexible/common

Obviously shared memory is the most flexible because you can implement whatever scheme you want with it. There's always shared memory somewhere anyway, even with sockets, it's just hidden from you in the case of kernel code.

> Using mutexes or semaphores or something like that would in most cases lead to busy-waiting or unnecessary convoluted code.

Busy-waiting is done in multiple places in the kernel. If you're blocking on read anyway, what's the difference -- and if you're not, it's no more convoluted to check a shared condition variable on each pass than nonblocking reads and accumulations.