x2rj's comments | Hacker News

I've always thought a problem with Nagle's algorithm is that the socket API doesn't (really) have a function to flush the buffers and send everything out instantly, which you could use after messages that require a timely answer.

For traffic where no answer is required, Nagle's algorithm works very well for me, but many TCP channels are mixed-use these days: they carry messages that expect a fast answer alongside others that are more asynchronous (from a user's point of view, not a programmer's).

Wouldn't it be nice if all operating systems, (home) routers, firewalls and programming languages had high-quality implementations of something like SCTP...
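For the missing flush mentioned above there is at least a Linux-specific workaround: per tcp(7), setting TCP_NODELAY forces an explicit flush of pending output, so toggling it approximates a flush call. A minimal sketch, assuming a connected TCP socket; the helper name is my own:

```python
import socket

def flush_hint(sock: socket.socket) -> None:
    """Ask the stack to push out data delayed by Nagle's algorithm.

    On Linux, setting TCP_NODELAY forces an explicit flush of pending
    output (see tcp(7)); clearing it again re-enables Nagle afterwards.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
```

Typical use: `sock.sendall(request); flush_hint(sock)` after a latency-sensitive message, leaving Nagle on for the bulk traffic.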


> the socket API doesn't (really) have a function to flush the buffers and send everything out instantly, which you could use after messages that require a timely answer.

I never thought about that but I think you're absolutely right! In hindsight it's a glaring oversight to offer a stream API without the ability to flush the buffer.


Yeah, I’ve always felt that the stream API is a leaky abstraction for providing access to networking. I understand the attraction of making network I/O look like local file access given the philosophy of UNIX.

The API should have been message-oriented from the start. This would avoid having the network stack try to compensate for the behavior of the application layer. Then Nagle's algorithm, or something like it, would just be a library available for applications that might need it.

The stream API is just as annoying on the receiving end, especially when wrapping (like TLS) is involved. Basically you have to code your layers as if the underlying network hands you one byte at a time, and the application has to figure out where the message boundaries are, which adds a great deal of complexity.


The whole point of TCP is that it is a stream of bytes, not of messages.

The problem is that this is not in practice quite what most applications need, but the Internet evolved towards UDP and TCP only.

So you can have message-based delivery if you want, but then you have to do sequencing, gap filling and flow control yourself; or you can have the overkill of a reliable byte stream with limited control or visibility at the application level.


For me, the “whole point” of TCP is to add various delivery guarantees on top of IP. It does not mandate or require a particular API. Of course you can provide a stream API over TCP, which suits many applications, but it does not suit all, and by forcing this abstraction over TCP you end up making message-oriented applications (e.g. request/response protocols) more complex to implement than if you had simply exposed the message-oriented reality of TCP via an API.


TCP is not message-oriented. Retransmitted bytes can be at arbitrary offsets and do not need to align with the way the original transmission was fragmented or even an earlier retransmission.


I don’t understand your point here or maybe our understanding of the admittedly vague term “message oriented” differs.

I’m not suggesting exposing retransmission, fragmentation, etc to the API user.

The sender provides n bytes of data (a message) to the network stack. The receiver API provides the user with the block of n bytes (the message) as part of an atomic operation. Optionally the sender can be provided with notification when the n-bytes have been delivered to the receiver.


Is this a TCP API proposal or a protocol proposal?

Because TCP, by design, is a stream-oriented protocol. The only out-of-band signal I'm aware of that's intended to be exposed to applications is the urgent flag/pointer, but a quick Google search suggests that many firewalls clear it by default, so compatibility would almost certainly be an issue if your API tried to use the urgent pointer as a message separator.

I suppose you could implement a sort of "raw TCP" API to allow application control of segment boundaries, and force retransmission to respect them, but this would implicitly expose applications to fragmentation issues that would require additional API complexity to address.


I think you're misunderstanding their point.

Your API is constrained by the actual TCP protocol. Even if the sender uses this message-oriented TCP API, the receiver can't make any guarantees that a packet they receive lines up with a message boundary, contains N messages, etc etc, due to how TCP actually works in the event of dropped packets and retransmissions. The receiver literally doesn't have the information needed to do that, and it's impossible for the receiver to reconstruct the original message sequence from the sender. You could probably re-implement TCP with retransmission behaviour that gives you what you're looking for, but that's not really TCP anymore.

This is part of the motivation for protocols like QUIC. Most people agree that some hybrid of TCP and UDP, with stateful connections, guaranteed delivery and discrete messages, is very useful. But no matter how much you fiddle with your code, neither TCP nor UDP is going to give you this, which is why we end up with new protocols that add TCP-ish behaviour on top of UDP.


Fair enough - I didn’t really fully consider the effect of retransmission and segmentation on the receiver view.


> message oriented

Very well said. I think there is enormous complexity in many layers because we don't have that building block easily available.


It's the main reason why I use websockets for a whole lot of things. I don't wanna build my own message chunking layer on top of TCP every time.


WebSocket is full of web-tech silliness; you'd be better off doing your own framing.


Well, it also has the advantage of providing pretty decent encryption for free through WSS.

But yeah, where that's unnecessary, it's probably just as easy to have a 4-byte length prefix, since TCP handles the checksum and retransmit and everything for you.
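The length-prefix framing described above can be sketched in a few lines. This is an illustration, not any particular library's API; the function names are mine:

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(payload)) + payload

def deframe(buf: bytes) -> tuple[list[bytes], bytes]:
    """Split a receive buffer into complete messages plus leftover bytes.

    Because TCP is a byte stream, recv() may return partial messages;
    the caller keeps the leftover and appends the next recv() to it.
    """
    messages = []
    while len(buf) >= 4:
        (n,) = struct.unpack("!I", buf[:4])
        if len(buf) < 4 + n:
            break  # message not fully received yet
        messages.append(buf[4:4 + n])
        buf = buf[4 + n:]
    return messages, buf
```

The receiver-side loop is exactly the "figure out where the message boundaries are" work the thread complains about; the prefix just makes it mechanical.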


It's just a standard TLS layer, works with any TCP protocol, nothing WebSocket-specific in it.

You should ideally design your messages to fit within a single Ethernet frame, so 2 bytes is more than enough for the size. Though I have sadly seen an increasing number of developers send arbitrarily large network messages without caring about proper design.


Meh, I've worked enough with OpenSSL's API to know that I never, ever want to implement SSL over TCP myself. Better to let the WebSocket library take care of it.


I think you could try adding the flag MSG_MORE to every send call and then doing a last send without it to indirectly force a flush.
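That pattern looks roughly like this. MSG_MORE is Linux-specific (Python only exposes `socket.MSG_MORE` there), so this sketch falls back to plain sends elsewhere; the helper name is my own:

```python
import socket

def send_batched(sock: socket.socket, chunks: list[bytes]) -> None:
    """Send chunks while hinting the kernel that more data follows,
    so it can coalesce them; the final send without MSG_MORE lets
    the last segment go out."""
    more = getattr(socket, "MSG_MORE", 0)  # 0 = no-op flag on non-Linux
    for chunk in chunks[:-1]:
        sock.sendall(chunk, more)
    if chunks:
        sock.sendall(chunks[-1])  # no MSG_MORE: push everything out
```

Note this only hints at segment coalescing on the sender; it does not give the receiver message boundaries.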


The socket API is all kinds of bad. The way streams should work is that, when sending data, you set a bit indicating whether it’s okay to buffer the data locally before sending. So a large send could be done as a series of okay-to-buffer writes and then a flush-immediately write.

TCP_CORK is a rather kludgey alternative.

The same issue exists with file IO. Writing via an in-process buffer (the default behavior of stdio and quite a few programming languages) is not interchangeable with unbuffered writes: with a buffer, it's okay to do many small writes, but you cannot assume that the data will ever actually be written until you flush.

I’m a bit disappointed that Zig’s fancy new IO system pretends that buffered and unbuffered IO are two implementations of the same thing.


TCP_CORK?


Sadly Linux-only (and apparently some BSDs). I would love to have more (and more generalized) TCP socket modes like that.
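For reference, the cork/uncork dance looks something like this. A sketch only: the context manager is my own invention, and it degrades to a no-op where Python's socket module doesn't expose TCP_CORK:

```python
import socket
from contextlib import contextmanager

@contextmanager
def corked(sock: socket.socket):
    """Hold small writes back with TCP_CORK, then release them on exit.

    TCP_CORK is Linux-only; where it is unavailable this is a no-op
    and the writes go out under normal Nagle behavior.
    """
    cork = getattr(socket, "TCP_CORK", None)
    if cork is not None:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 1)
    try:
        yield sock
    finally:
        if cork is not None:
            sock.setsockopt(socket.IPPROTO_TCP, cork, 0)  # uncork: flush now
```

Usage would be `with corked(sock): sock.sendall(header); sock.sendall(body)`, which is close to the "okay-to-buffer writes then flush" API sketched above, just with socket-wide state instead of a per-write bit.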


Something like sync(2)/syncfs(2) for filesystems.

Seems like there's been a disconnect between users and kernel developers here?


You are confusing the UG with something like a GbR. A UG is basically a baby GmbH, with constraints on the company's name and on how much money the shareholders can extract from it (until the company has 25k€ of capital and can become a normal GmbH).


If you write cross-platform software, you might want MSVC on Windows and GCC on Linux, and their assemblers, MASM and GAS, have very different syntax. NASM outputs object files for both directly. Its syntax is also often a little better, especially in the context of macros.


It has been legally binding since the 80s in Germany. It's just that mostly nobody bothers to protest an occasional error, and it would mostly affect the teenager who delivers the ads anyway.

https://dejure.org/dienste/vernetzung/rechtsprechung?Gericht...


It typically gets me some remorseful response and they actually manage to honor the signage for a year or two.

I also send nasty letters to parties who consider themselves exempt from that before elections (they're not).


> would mostly affect the teenager who delivers the ads anyway.

If their job is to litter, then tough luck.


The GGA-DFT (+ some corrections) used here seems quite OK to me for this system. To put more trust in this, I would like to see similar calculations with other methods, to see how similar or different the results are. LDA-DFT will most likely not be great (as in most cases), but I would be very interested in some DFT+GW calculations, even though LK99 might not be its strength.


Atom is an order of magnitude more complex and stricter, a standard by people who really love XML, in contrast to the really simple and less strict RSS 2.0. For example, almost everything is optional in RSS 2.0, so you can have a reasonable feed for things like tweets or linkblogs where there is no obvious title. In contrast, Atom enforces a title for every item, which makes this a messy experience.

I implemented an RSS 2.0 parser faster than I could understand the Atom specification. Atom can encode stuff like HTML inline in the XML instead of as a CDATA string. In theory this sounds great, but it ends up in a big mess of complexity (e.g. a blog post with handwritten, invalid HTML).

These days there is also JSON Feed, which is really easy to parse, simple and flexible, but it is not supported everywhere yet.
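To illustrate how little ceremony JSON Feed needs: the version 1.1 spec requires only `version` and `title` at the top level and `id` per item, so a title-less linkblog entry is perfectly valid. The feed below is a made-up example:

```python
import json

# A hypothetical minimal JSON Feed (version 1.1). Items need only an
# "id"; titles are optional, unlike in Atom.
raw = """{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Example linkblog",
  "items": [
    {"id": "1", "url": "https://example.com/a", "content_text": "a link"},
    {"id": "2", "content_text": "a tweet-like post with no title"}
  ]
}"""

feed = json.loads(raw)
ids = [item["id"] for item in feed["items"]]
```

Parsing is just `json.loads` plus dictionary access; there is no namespace or entity-encoding machinery at all.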


The RSS 2.0 spec has a horrible flaw:

https://www.rssboard.org/rss-2-0-1-rv-6#hrelementsOfLtitemgt

> An item may also be complete in itself, if so, the description contains the text (entity-encoded HTML is allowed; see examples)

Note "is allowed", not "is required". This caused SO MANY problems back in the day, because the spec didn't clarify if you should or should not include HTML in that element - and there was no way of telling, when parsing a feed, if the author was in the "entity-encoded HTML" or "YOLO and just stick plain text in there" camp.

IIRC, Atom came about precisely because the RSS specifications didn't provide the level of detail needed for a spec to be truly interoperable.


In practice, the description is universally considered to be html encoded. Everything is decoded. If you try to stick unencoded html in there it gets rendered as text. If you really don’t want to encode you can stick it in CDATA and it will just work per the xml spec. I’m trying to remember what the downside of this approach is - I think maybe it kept people from sticking unencoded ampersands in plain text or something.

But I think it’s worth noting that a cultural tradition emerged that papered over the flawed spec. I think that is actually pretty common with specs, even if the rss2 one is extra loose.

Maybe having a correct spec isn’t everything.
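The equivalence the thread relies on, that entity-encoded HTML and a CDATA section "just work" the same way per the XML spec, is easy to check with a standard parser; this is just an illustration with a toy `<description>` element:

```python
import xml.etree.ElementTree as ET

# Two ways to put HTML in an RSS <description>: entity-encoded markup
# and a CDATA section. An XML parser yields identical character data,
# so feed readers cannot tell which spelling the author used.
encoded = ET.fromstring("<description>&lt;b&gt;hi&lt;/b&gt;</description>")
cdata = ET.fromstring("<description><![CDATA[<b>hi</b>]]></description>")
```

Both `encoded.text` and `cdata.text` come back as the string `<b>hi</b>`, which is also why a parser cannot distinguish "entity-encoded HTML" from "plain text that happens to contain angle brackets".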


> But I think it’s worth noting that a cultural tradition emerged that papered over the flawed spec.

Back in 2013 a developer of a feed crawler wrote a selection of things people get wrong with their feeds.

https://inessential.com/2013/03/18/brians_stupid_feed_tricks


This is definitely a good example of "RSS culture."

(This particular one isn't papering over flaws in the spec; many of these are advising against doing things that violate either the RSS or XML spec, or are subjective opinions additive to the spec (e.g. always have a datetime). But ya, this is basically what I mean.)


Atom is not an order of magnitude more complex or strict. It has _two_ places where it requires something even slightly onerous, and that's in the summary and content elements, where, shockingly, it allows you to specify whether the content is XHTML, HTML, or text; and for HTML, it's just a matter of escaping the contents or putting it in CDATA. That's it.

I don't know what you're doing that makes RSS 2.0 somehow faster to parse than Atom. I've written parsers for both over the past twenty years with a negligible difference between the two, besides the fact that the RSS feeds often need hacks. I've also written a whole bunch of blog and linkblog backends that produce Atom feeds, and have never had an issue with any. Let's look at the required elements of an entry: updated, title, id. Nothing remotely onerous there. In fact, it's purposely minimal, more minimal than RSS. And in RSS 2.0, title is a required element (because if something isn't explicitly noted as optional in the RSS 2.0 spec, it's assumed to be required).

In my personal linklog, I use the title of the target page of the link as the title, because it's the sensible option. With tweets, you have half a point. Only half a point, because title is required, but Twitter also post-dates the early 2000s considerably. But here's the thing: 'title' is required in RSS and Atom, but there's nothing saying it can't be empty. I know, I've blown your mind!

And then there's JSONFeed, which, of course, can somehow gracefully cope with people dropping '"' in random parts of the file because people generate JSON files like that by hand, right?

Right?

Just like they write RSS and Atom feeds, right?

Right?


> I have implemented rss 2.0 parser faster then understanding the atom specification. Atom can do encode stuff like encode html inline the xml instead of as a CDATA string. In theory this sounds great, but is ends up in a big mess of complexity (e.g. a blogpost with handwritten invalid html).

The same thing can also happen in RSS feeds (and JSON Feeds): entity-encoded HTML strings or CDATA HTML strings do not have any guarantee of well-formed-ness. The direct embedding of XHTML into Atom as namespaced elements just surfaces potentially invalid markup higher up.


> The same thing can also happen in RSS feeds […]: Entity-encoded HTML strings or CDATA HTML strings do not have any guarantee of well-formed-ness.

I wrote a podcast validator, and I don't think that's true — every RSS feed must be "well-formed" XML.

(Note that all "valid" XML documents are "well-formed", but "well-formed" XML documents are not necessarily "valid".)


I was talking about the (X)HTML in that RSS feed and its well-formed-ness.

In a perfect world people would construct their XML documents with an API which guarantees that the generated serialisation is a well-formed XML document. E.g. the API guarantees that the element tree is nested, that namespaces are declared and that the serialiser escapes any text nodes. Then people could add their well-formed XHTML fragments as a child to <atom:content type="xhtml"> and then serialise the whole document, guaranteeing well-formed-ness across namespaces.

In practice people have a tagsoup string from their data store which they concatenate inside their RSS template in <description>. If you’re lucky, they replace "<" and "&" beforehand or do the CDATA thing. But in XML terms that is just a string, not well-formed markup.


Interesting, thank you. Every podcast RSS feed (a tiny subset of RSS feeds) I've seen in the wild is well-formed in the strict XML sense, so the tagsoup problem must be more endemic on the text syndication side.


I can imagine that this is partly a result of Apple's dominant podcast directory. Podcasters submit their feeds to Apple's Podcast Connect, which I think flags warnings and errors. Other kinds of feeds don't have that big a motivation to validate.

