Hacker News

I'm not the commenter, but yes, trading firms often record all order gateway traffic to and from brokers or exchanges at the TCP/IP packet level, in what are referred to as "pcap files". Awkwardly low-level to work with, but it means you know for sure what you sent, not what your software thought it was sending!
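The classic pcap file format is simple enough that you can parse it by hand when you need to audit what was actually on the wire. A minimal sketch (handling only little-endian, microsecond-timestamp files; the sample FIX-ish payload is made up):

```python
import struct

# pcap global header: magic, version 2.4, tz offset, sigfigs, snaplen, linktype
PCAP_GLOBAL = struct.Struct("<IHHiIII")
# per-packet record header: ts_sec, ts_usec, captured length, original length
PCAP_RECORD = struct.Struct("<IIII")

def read_pcap(data: bytes):
    """Yield (ts_sec, ts_usec, packet_bytes) from a classic little-endian pcap."""
    magic, *_ = PCAP_GLOBAL.unpack_from(data, 0)
    assert magic == 0xA1B2C3D4, "only little-endian microsecond pcap handled here"
    offset = PCAP_GLOBAL.size
    while offset < len(data):
        ts_sec, ts_usec, incl_len, _orig_len = PCAP_RECORD.unpack_from(data, offset)
        offset += PCAP_RECORD.size
        yield ts_sec, ts_usec, data[offset:offset + incl_len]
        offset += incl_len

# Build a one-packet capture in memory to exercise the reader.
payload = b"8=FIX.4.2\x019=12\x0135=D\x01"   # stand-in for a captured frame
capture = PCAP_GLOBAL.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
capture += PCAP_RECORD.pack(1700000000, 0, len(payload), len(payload))
capture += payload

packets = list(read_pcap(capture))
```

In practice you would read files written by tcpdump/libpcap rather than hand-built bytes, but the record layout is the same.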



The ultimate source of truth about what orders you sent to the exchange is the exact set of bits that went out on the wire. This matters because your software can have bugs (and so can theirs), so packet captures taken directly off that wire are the only real way to know what actually happened.


But then the software capturing, storing and displaying the packets can also have bugs.


Among all the software installed on a reputable Linux system, tcpdump and libpcap are some of the most battle-tested pieces you can find.

Wireshark has bugs, yes. Mostly in the dissectors and in the UI. But the packet capture itself is through libpcap. Also, to point out the obvious: pcap viewers in turn are auditable if and when necessary.


Cisco switches can mirror ports with a feature called Switch Port Analyzer (SPAN). For a monitored port, one can specify the direction (frames in, out, or both), and the destination port or VLAN.
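On IOS, a basic SPAN session looks roughly like this (a sketch; interface choices are hypothetical):

```text
! Mirror both directions of Gi0/1 to a monitoring port on Gi0/24
monitor session 1 source interface GigabitEthernet0/1 both
monitor session 1 destination interface GigabitEthernet0/24
```

The capture box hangs off the destination port and sees copies of every frame, without being addressable on the network itself.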

SPAN ports are great for network troubleshooting. They're also nice for security monitoring, such as an intrusion detection system: the IDS logically sees traffic inline, while remaining completely transparent to users. If the IDS fails, traffic fails open (which wouldn't be acceptable in some circumstances, but it all depends on your priorities).


When I think "Cisco" I think error-free. /s

No, really, I get where you and your parent are coming from. It is a low probability. But occasionally the application code is thoroughly verified too, and that's when you find yourself asking where the error really is. It could be in any layer.


They can, but it’s far less likely to be incorrect on the capture side. They are just moving bytes, not really doing anything with structured data.

Parsing the pcaps is much more prone to bugs than capturing and storing, but that’s easier to verify with deserialize/serialize equality checks.
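The roundtrip check mentioned here is easy to illustrate on a simplified FIX-style message: parse it into ordered tag/value pairs, re-serialize, and demand byte-for-byte equality (a sketch; real FIX parsers also handle repeating groups, checksums, and embedded data fields):

```python
SOH = "\x01"  # FIX field delimiter

def parse_fix(msg: str):
    """Split a FIX message into an ordered list of (tag, value) pairs."""
    return [tuple(field.split("=", 1)) for field in msg.rstrip(SOH).split(SOH)]

def serialize_fix(fields):
    """Re-emit the pairs in the same order, reproducing the input exactly."""
    return SOH.join(f"{tag}={value}" for tag, value in fields) + SOH

raw = "8=FIX.4.2" + SOH + "35=D" + SOH + "55=IBM" + SOH + "54=1" + SOH
roundtrip = serialize_fix(parse_fix(raw))
assert roundtrip == raw  # deserialize/serialize equality check
```

Keeping the parsed form as an ordered list rather than a dict is what makes the equality check meaningful: field order and repeated tags survive the roundtrip.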


The result of bitter lessons learnt I'm sure. Lessons the fintechs have not learned.


That makes sense - but it's still somewhat surprising that there's nothing better. I guess that's the equivalent of the modern paper trail.


It’s the closest to truth you can find (the network capture, not the drop copy). If it wasn’t on the network outbound, you didn’t send it, and it’s pretty damn close to an immutable record.


It makes sense. I'm a little surprised that they'd do the day-to-day reconciliation from it, but I suppose if you had to write the code to decode them anyway for some exceptional purpose, you might as well use it day to day as well.


The storage requirements of this must be impressive.


Storage is cheap, and the overall figures are not that outlandish. If we look at a suitable first-page search result[0] and round figures up, we get to about 7.5 TB per day.

And how did I get that figure?

I'm going to fold pcap overhead into the per-message size estimate. Let's assume a trading day at an exchange, including after-hours activity, is 14 hours (~50k seconds). If we estimate that during the highest peaks of trading activity the exchange receives about 2M messages per second, then during more serene hours the average could be about 500k messages per second. Let's guess that the average rate applies 95% of the time and the peak rate the remaining 5%. That gives a blended rate of about 575k messages per second. Round that up to 600k.

If we assume that an average FIX message is about 200 bytes of data, and add 50 bytes of IP + pcap framing overhead, we get to ~250 bytes of captured data per message. At 600k messages per second, 14 hours a day, the total amount of trading data received by an exchange would then be roughly 7.5 TB per day.
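The estimate above is easy to check mechanically (same assumed inputs as in the text):

```python
# Back-of-the-envelope check of the figures above.
seconds = 14 * 3600                                # 14-hour trading day
blended = 0.95 * 500_000 + 0.05 * 2_000_000        # blended messages/sec -> 575k
rate = 600_000                                     # rounded up
bytes_per_msg = 200 + 50                           # FIX payload + IP/pcap framing
total = rate * bytes_per_msg * seconds             # 7.56e12 bytes, ~7.5 TB/day
```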

Before compression for longer-term storage. Whether you consider the aggregate storage requirements impressive or merely slightly inconvenient is a more personal matter.

0: https://robertwray.co.uk/blog/the-anatomy-of-a-fix-message


And compression and deduplication should be very happy with this. A lot of the message contents and the IP/pcap framing overhead should be pretty low-entropy, with enough repeated patterns to deduplicate.
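A quick illustration of how well this kind of traffic compresses, using zlib as a stand-in for whatever archiver is actually used (the repeated record is a made-up FIX-style line):

```python
import zlib

# Repetitive FIX-style records compress well: mostly identical tags and framing,
# with only a few fields (prices, quantities, sequence numbers) varying.
record = b"8=FIX.4.2\x0135=D\x0155=IBM\x0154=1\x0138=100\x01"
stream = record * 10_000
compressed = zlib.compress(stream, 6)
ratio = len(stream) / len(compressed)  # expect a very high ratio on data this regular
```

Real captures vary more per message than this toy stream, so the achieved ratio will be lower, but the framing overhead in particular is close to free after compression.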

It could be funny, though: you'd be able to bump up your archive storage requirements just by changing an IP address, or have someone else do that for you. But that's life.


Why? They're not streaming 4K video; it's either a text protocol or an efficient binary one.



