
Can I describe a data-queueing problem that I suspect has a specific data (or queue) structure for it, but whose name I don't know?

Let's say you are trying to "synchronize" a secondary data store with a primary data store. Changes in the primary data store are very "bursty": one row will not change for days, then it'll change 300 times in a minute. You are willing to trade a bit of latency (say 10 seconds) to reduce total message throughput. You don't care about capturing every change to the primary; you just want to keep the secondary within 10 seconds of it.

It feels like there should be a clever way to "debounce" an update when another update supersedes it 500ms later. I know debounce from the world of front-end UI, where you wait a moment before firing an autocomplete search on keyboard input so as not to overwhelm the search backend.
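One minimal sketch of that idea: keep only the latest value per row key and flush at most once per interval, so a 300-changes-in-a-minute burst collapses into a single message. The class name, the `flush` callback, and the injectable clock are all hypothetical, just for illustration.

```python
import time

class CoalescingBuffer:
    """Keeps only the latest update per key; flushes at most once per interval."""

    def __init__(self, flush, interval=10.0, clock=time.monotonic):
        self.flush = flush          # callable receiving {key: latest_value}
        self.interval = interval
        self.clock = clock          # injectable for testing
        self.pending = {}
        self.last_flush = clock()

    def update(self, key, value):
        # Later writes to the same key overwrite earlier ones,
        # so a burst of 300 changes becomes one pending entry.
        self.pending[key] = value
        self.maybe_flush()

    def maybe_flush(self):
        if self.pending and self.clock() - self.last_flush >= self.interval:
            batch, self.pending = self.pending, {}
            self.flush(batch)
            self.last_flush = self.clock()
```

In a real pipeline you would also call `maybe_flush` from a timer so a final update isn't stuck waiting for the next incoming event.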



There are a variety of solutions to this, but CRDTs are a very good one (https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...). If the operations you're doing commute (that is, a ○ b = b ○ a, i.e. the order in which you apply the operations doesn't matter), then you can apply all the operations locally and send only the final result. Cassandra uses LWW-Element-Set CRDTs to solve this exact problem (https://cassandra.apache.org/doc/latest/cassandra/architectu...).
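The last-write-wins register mentioned above can be sketched in a few lines: model each update as a (timestamp, value) pair and merge with `max`, which is commutative and associative, so any number of bursty updates can be folded down to one result in any order. This is a toy illustration, not Cassandra's implementation.

```python
from functools import reduce

def lww_merge(a, b):
    """Last-writer-wins merge of (timestamp, value) pairs.

    max() compares the timestamp first, falling back to the value as a
    deterministic tiebreaker, so the merge is commutative and associative.
    """
    return max(a, b)

# A burst of updates folds down to one winner regardless of order.
updates = [(1, "x=1"), (3, "x=3"), (2, "x=2")]
winner = reduce(lww_merge, updates)
```

Because the merge commutes, the replica only needs to send `winner` downstream rather than every intermediate update.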


Ideally you have a reliable Change Data Capture (CDC) mechanism like a binlog reader. Debezium, for example, can write directly to a queue like Kafka, and a Kafka consumer picks up the events and writes to your secondary datastore. Something like that can probably handle all of your events without bucketing them, but if you want to cut down the number of messages written to the queue, you can add that logic to the binlog reader so it emits a burst every 5 seconds or so. During those 5 seconds it buffers the messages in the process's memory, or externally in something like Redis, keyed so that only the latest message is stored for a given record.
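The keyed buffering described above is essentially log compaction over a window. A minimal sketch, assuming events arrive as (primary_key, payload) pairs (a simplification of a real CDC event shape):

```python
def compact(events):
    """Collapse a burst of CDC events to the latest payload per primary key.

    Keys keep their first-seen order (Python dicts preserve insertion
    order), and later payloads for the same key overwrite earlier ones.
    """
    latest = {}
    for key, payload in events:
        latest[key] = payload
    return list(latest.items())
```

The same effect can be had for free downstream by keying Kafka messages on the record's primary key and using a compacted topic, at the cost of consumers still seeing the uncompacted burst until compaction runs.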


If you want everything to be within 10 seconds, then you build a state-change tracker (which tracks only the state changes since the last update) and then you send the updates every 10 seconds.

Don't worry about debouncing: the state tracker should handle representing the 300 updates as a single state change, and if there are more, they just get sent in the next update 10 seconds later.
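One way to read this suggestion is a "dirty set": track only *which* rows changed, and read the current row state from the primary at send time, so there is never more than one pending entry per row. A minimal sketch; `read_row` and `push` are hypothetical callables standing in for the primary read and the secondary write.

```python
class DirtySetTracker:
    """Tracks which rows changed since the last sync; on sync, reads the
    current state of each dirty row and pushes it to the secondary."""

    def __init__(self, read_row, push):
        self.read_row = read_row    # key -> current row state in the primary
        self.push = push            # (key, state) -> write to the secondary
        self.dirty = set()

    def mark(self, key):
        # 300 changes to one row = one entry in the set.
        self.dirty.add(key)

    def sync(self):
        # Call this from a 10-second timer.
        dirty, self.dirty = self.dirty, set()
        for key in sorted(dirty):
            self.push(key, self.read_row(key))
```

Compared with buffering the update payloads themselves, this keeps the tracker tiny (a set of keys) at the cost of an extra read against the primary at sync time.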


Sentry used Redis to buffer writes for a similar use case:

https://blog.sentry.io/2016/02/23/buffering-sql-writes-with-...


I've seen this done with Kinesis streams.

Basically, you just update a cache and forward the results every so often.

If you push updates onto a stack, the most recent one is always on top, so that's the one you forward. Compare timestamps before applying and you won't write a stale value.
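The timestamp-comparison guard can be sketched as a conditional write: only apply an incoming update if it is newer than what the cache already holds, so a late-arriving stale message never clobbers fresher data. Function and cache shape are hypothetical.

```python
def apply_if_newer(cache, key, ts, value):
    """Write (ts, value) into cache only if ts is newer than the stored
    timestamp for key. Returns True if the write happened."""
    current = cache.get(key)
    if current is None or ts > current[0]:
        cache[key] = (ts, value)
        return True
    return False  # stale update, dropped
```

This is the per-record half of the last-write-wins idea: even if the queue delivers updates out of order, the cache converges on the newest value.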





