
Does adding timestamps not handle this case?


Now you need to queue up events for some time, reorder them by timestamp, and then process them. It's possible, but it has overhead in both performance and custom code you'll have to maintain. And if there is no guarantee of order, two separate systems consuming the same events might get different results depending on the implementation, which can be problematic.
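To make that overhead concrete, here's a minimal sketch (names and the window policy are illustrative, not from any particular system) of the kind of reorder buffer this implies: hold events for a window, then release anything older than the window in timestamp order.

```python
import heapq

class ReorderBuffer:
    """Buffer events for a fixed window, then release them in timestamp order.

    Illustrative sketch only: events older than (newest timestamp seen - window)
    are assumed final and released; a real system also needs a policy for
    stragglers that arrive even later than that.
    """

    def __init__(self, window):
        self.window = window           # how long to wait for late events
        self.heap = []                 # min-heap keyed on timestamp
        self.max_seen = float("-inf")  # newest timestamp observed so far

    def push(self, timestamp, event):
        self.max_seen = max(self.max_seen, timestamp)
        heapq.heappush(self.heap, (timestamp, event))

    def pop_ready(self):
        """Return events whose timestamp is at most max_seen - window, in order."""
        ready = []
        while self.heap and self.heap[0][0] <= self.max_seen - self.window:
            ready.append(heapq.heappop(self.heap))
        return ready
```

Pushing out-of-order events and draining shows the trade-off: ordering is recovered, at the cost of holding everything back by at least `window`.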


For a single process on one box with one thread you can use something like that.

If you involve more than one box, that goes out the window. Sometimes you can still get 'one timestamp' by making something else the owner of the timestamp. It also depends on your time resolution and on the process doing the ingesting. For example, if that ingest process has more than one thread handling things, you can still get out-of-order or same-timestamp events if it's not coded correctly.


If events are generated by different processes, you cannot really guarantee that time is exactly the same for them, unless you do something fancy to ensure it.


Interesting. Is the ordering here from when the event was generated, or from when it entered the queue? I think the latter, so I think the examples here don't apply without something on top, and a trade-off.


The queue entrypoint is not always the same process either, especially in a system like Pub/Sub.


With a single producer and consumer, yes - but of course that's seldom the case.

With multiple producers and consumers, clock skew would be an issue, with the time on different machines being off from each other slightly.

One option is to use a single source for generating IDs, but that introduces another failure point, and comes at a hefty performance cost.
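As a sketch of what that single ID source costs: every producer has to serialise on one counter. A hypothetical in-process version is below; a real deployment would put this behind a network round-trip, which is exactly the failure point and performance cost mentioned above.

```python
import threading

class Sequencer:
    """Hand out strictly increasing IDs from one place.

    Illustrative only. Centralising ordering like this is simple and gives a
    total order, but every producer contends on this one lock (or, in a real
    system, one service), making it both a bottleneck and a single point of
    failure.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0

    def next_id(self):
        with self._lock:
            n = self._next
            self._next += 1
            return n
```

Every ID is unique and ordered, but only because all callers funnel through one critical section.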


> Does adding timestamps not handle this case?

If you have one message source (a single thread or some kind of coordination), and the messages have lower frequency than the timestamp resolution, yes.

The farther you get from that, the more the answer is no.


Just make the source part of the timestamp?

(1,A) < (1,B) < (2,A) etc.

Also use serial numbers instead of true timestamps.
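In Python terms (an illustrative sketch, not anyone's production scheme), that composite key is just tuple comparison: ties on the serial number break deterministically by source ID.

```python
# Events tagged (serial, source, payload). Serials are per-source counters,
# not wall-clock timestamps. Tuples compare lexicographically, so the
# (1, "A") < (1, "B") < (2, "A") ordering above falls out for free.
events = [(2, "A", "third"), (1, "B", "second"), (1, "A", "first")]
events.sort(key=lambda e: (e[0], e[1]))
payloads = [e[2] for e in events]  # → ["first", "second", "third"]
```

The ordering is total and deterministic, but note it says nothing about which event actually happened first across sources; it only guarantees everyone agrees on the same order.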


That way lies madness, and/or eventually accidentally writing your own Dynamo-paper database.


That won’t work in all cases. For instance, if you get messages from devices which can be reimaged they may have clock skew in a period of time before they’re synchronized again.


But in any case, you can't rely on the order of message ingress to your system to represent anything meaningful either? You would have to ensure that the key defining the order has some hard logical ordering purpose for which time is not relevant or useful.


The order of message ingress can still be meaningful even if device clocks are skewed or jump due to rebooting, reimaging, network time sync, frequency drift, etc.

A hard logical order arises from interactions. E.g. if the device receives a message, does something locally, goes through a clock change, and then sends a message dependent on one it fetched earlier, that's a logical order despite an out-of-order clock.

Or if a device gets a message, processes it, and sends something to another device, which processes it too and then sends another message back to the original source, there's a logical order but with three different clocks involved. Even if the clocks are synchronised, there will be some drift, and the messages may be processed fast enough that the drift puts their timestamps out of order.


That guarantee/assumption can never really be made.

Message A, on event a' from system a, might be sent to system b, causing event b' and thus message B to be sent by <insert medium here>, then consumed and correlated by consumer software MC on hardware mc.

However, system b might take longer to flush its hardware/software buffer, and the message arrives at mc before message A, for example.

I've encountered this many times. That data has no meaning in itself, except in the metadata.

> Even if the clocks are synchronised, there will be some drift and the messages may be processed fast enough that the drift puts their timestamps out of order.

If you are consuming from sources whose clock accuracy you cannot control, then you must either reduce your reliance on that accuracy (many Windows systems have horrendous clock discipline in their software implementation) or find a way of ensuring accuracy, e.g. close-proximity NTP or close-proximity PTP, etc.

Hope that makes sense.


I think you and I are agreeing, but it's not obvious ;-)

> However, system b might take longer to flush its hardware/software buffer, and the message arrives at mc before message A, for example.

There are two message As in your system, the one sent to system b, and the one consumed by hardware mc. Let's call them Ab and Amc.

In that situation, message B is a consequence of message Ab which can be tracked, and at system mc (depending on semantics) it might be necessary to use that tracking to process message Amc before message B, at least logically.

For example, message Ab/Amc might be "add new user Foo to our database with standard policy", and system B might react with "ok, my job is to set the policy for new users, I'll tell mc to add permission Bar to user Foo then activate Foo".

That works out fine as long as the logical order relation is maintained, and system mc, the database, processes message Amc first, regardless of arrival order.

Dependency tracking can ensure that (without clocks or timestamps), even if messages are transmitted independently. For example by message B containing a header that says it comes logically after message A.

The pubsub queue can also guarantee that order (without clocks or timestamps or dependency tracking), provided all messages go through the same pubsub system, and Ab+Amc are fanned-out by the pubsub system rather than sent independently by A to each destination. All bets are off if Ab and Amc take other routes.
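A minimal sketch of that dependency-header idea (the header scheme here is hypothetical, not any particular broker's): the consumer holds back a message until everything it declares itself "after" has been processed, regardless of arrival order.

```python
class CausalConsumer:
    """Process messages only after their declared dependencies are done.

    Illustrative sketch: each message carries an 'after' list of message IDs
    it logically follows. Arrival order doesn't matter; processing order
    respects the declared dependencies.
    """

    def __init__(self):
        self.done = set()    # IDs of messages already processed
        self.waiting = []    # (msg_id, dependency set, payload) not yet ready
        self.log = []        # payloads in the order they were processed

    def receive(self, msg_id, after, payload):
        self.waiting.append((msg_id, set(after), payload))
        self._drain()

    def _drain(self):
        progressed = True
        while progressed:
            progressed = False
            for item in list(self.waiting):
                msg_id, after, payload = item
                if after <= self.done:          # all dependencies processed
                    self.log.append(payload)
                    self.done.add(msg_id)
                    self.waiting.remove(item)
                    progressed = True
```

With the example above: even if system B's message arrives first, it waits until message A has been applied.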

> If you are consuming from sources whose clock accuracy you cannot control, then you must either reduce your reliance on that accuracy (many Windows systems have horrendous clock discipline in their software implementation) or find a way of ensuring accuracy, e.g. close-proximity NTP or close-proximity PTP, etc.

If you think Windows is bad, try just about any cloud VM, which has a virtual clock and is stalled all the time in the guest, including just after the guest reads its virtual clock and before using the value :-)

I prefer systems which rely on logical ordering guarantees as much as possible, so clock drift doesn't matter.

When you rely on a global clock order to ensure correct behaviour, you have to slow down some operations to accommodate clock variance across the system, as well as network latency variance (because it affects clock synchronisation).

If you rely on logical order, then there's no need for time delays and no upper speed limit. Instead you have to keep track of dependencies, or have implicit causal dependencies. And it's more robust on real systems, because clock drift and jumps don't matter.
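The classic way to keep track of such dependencies without wall time is a Lamport logical clock: a counter that only has to respect causality, so drift and jumps are irrelevant. A minimal sketch:

```python
class LamportClock:
    """Lamport logical clock: counters ordered by causality, not wall time.

    Illustrative sketch. Each process keeps its own counter; send() stamps an
    outgoing message, and recv() merges the sender's stamp so that any event
    that causally follows another always gets a larger counter.
    """

    def __init__(self):
        self.t = 0

    def tick(self):
        """Local event: just advance the counter."""
        self.t += 1
        return self.t

    def send(self):
        """Stamp an outgoing message."""
        self.t += 1
        return self.t

    def recv(self, msg_t):
        """Merge the stamp from an incoming message."""
        self.t = max(self.t, msg_t) + 1
        return self.t
```

If process b receives a's message and then sends its own, b's stamp is guaranteed larger than a's, no matter what either machine's wall clock says.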

In practice you need some clock dependence for timeouts anyway, and there are protocol advantages when you can depend on a well synchronised clock. So I prefer to mix the two for different layers of the protocol, to get advantages of both. Unfortunately for correct behaviour, timeouts are often the "edge case" that isn't well tested or correctly implemented at endpoints.



