Hacker News

I'm not the commenter, but yes, trading firms often record all order gateway traffic to and from brokers or exchanges at the TCP/IP packet level, in what are referred to as "pcap files". Awkwardly low-level to work with, but it means you know for sure what you sent, not what your software thought it was sending!
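The classic pcap file format is simple enough that you can parse it by hand when you need to audit what was actually on the wire. A minimal sketch (handling only little-endian, microsecond-timestamp files; the sample FIX-ish payload is made up):

```python
import struct

# pcap global header: magic, version 2.4, tz offset, sigfigs, snaplen, linktype
PCAP_GLOBAL = struct.Struct("<IHHiIII")
# per-packet record header: ts_sec, ts_usec, captured length, original length
PCAP_RECORD = struct.Struct("<IIII")

def read_pcap(data: bytes):
    """Yield (ts_sec, ts_usec, packet_bytes) from a classic little-endian pcap."""
    magic, *_ = PCAP_GLOBAL.unpack_from(data, 0)
    assert magic == 0xA1B2C3D4, "only little-endian microsecond pcap handled here"
    offset = PCAP_GLOBAL.size
    while offset < len(data):
        ts_sec, ts_usec, incl_len, _orig_len = PCAP_RECORD.unpack_from(data, offset)
        offset += PCAP_RECORD.size
        yield ts_sec, ts_usec, data[offset:offset + incl_len]
        offset += incl_len

# Build a one-packet capture in memory to exercise the reader.
payload = b"8=FIX.4.2\x019=12\x0135=D\x01"   # stand-in for a captured frame
capture = PCAP_GLOBAL.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
capture += PCAP_RECORD.pack(1700000000, 0, len(payload), len(payload))
capture += payload

packets = list(read_pcap(capture))
```

In practice you would read files written by tcpdump/libpcap rather than hand-built bytes, but the record layout is the same.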



The ultimate source of truth about what orders you sent to the exchange is the exact set of bits that went out on the wire. This matters because your software can have bugs (and so can theirs), so packet captures taken directly off that wire are the only real way to know what actually happened.


But then the software capturing, storing and displaying the packets can also have bugs.


Among all the software installed on a reputable Linux system, tcpdump and libpcap are some of the most battle-tested pieces you can find.

Wireshark has bugs, yes. Mostly in the dissectors and in the UI. But the packet capture itself is through libpcap. Also, to point out the obvious: pcap viewers in turn are auditable if and when necessary.


Cisco switches can mirror ports with a feature called Switch Port Analyzer (SPAN). For a monitored port, one can specify the direction (frames in, out, or both), and the destination port or VLAN.
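On IOS, a basic SPAN session looks roughly like this (a sketch; interface choices are hypothetical):

```text
! Mirror both directions of Gi0/1 to a monitoring port on Gi0/24
monitor session 1 source interface GigabitEthernet0/1 both
monitor session 1 destination interface GigabitEthernet0/24
```

The capture box hangs off the destination port and sees copies of every frame, without being addressable on the network itself.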

SPAN ports are great for network troubleshooting. They're also nice for security monitoring, such as an intrusion detection system: the IDS logically sees traffic inline, while remaining completely transparent to users. If the IDS fails, traffic fails open (which wouldn't be acceptable in some circumstances, but it all depends on your priorities).


When I think "Cisco" I think error-free. /s

No, really, I get where you and your parent are coming from. It is a low probability. But occasionally the application code is thoroughly verified too, and that's when you find yourself asking where the error really is. It could be in any layer.


They can, but it’s far less likely to be incorrect on the capture side. They are just moving bytes, not really doing anything with structured data.

Parsing the pcaps is much more prone to bugs than capturing and storing, but that’s easier to verify with deserialize/serialize equality checks.
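The roundtrip check mentioned here is easy to illustrate on a simplified FIX-style message: parse it into ordered tag/value pairs, re-serialize, and demand byte-for-byte equality (a sketch; real FIX parsers also handle repeating groups, checksums, and embedded data fields):

```python
SOH = "\x01"  # FIX field delimiter

def parse_fix(msg: str):
    """Split a FIX message into an ordered list of (tag, value) pairs."""
    return [tuple(field.split("=", 1)) for field in msg.rstrip(SOH).split(SOH)]

def serialize_fix(fields):
    """Re-emit the pairs in the same order, reproducing the input exactly."""
    return SOH.join(f"{tag}={value}" for tag, value in fields) + SOH

raw = "8=FIX.4.2" + SOH + "35=D" + SOH + "55=IBM" + SOH + "54=1" + SOH
roundtrip = serialize_fix(parse_fix(raw))
assert roundtrip == raw  # deserialize/serialize equality check
```

Keeping the parsed form as an ordered list rather than a dict is what makes the equality check meaningful: field order and repeated tags survive the roundtrip.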


The result of bitter lessons learnt I'm sure. Lessons the fintechs have not learned.


That makes sense - but it's still somewhat surprising that there's nothing better. I guess that's the equivalent of the modern paper trail.


It’s the closest to truth you can find (the network capture, not the drop copy). If it wasn’t on the network outbound, you didn’t send it, and it’s pretty damn close to an immutable record.


It makes sense. I'm a little surprised that they'd do the day-to-day reconciliation from it, but I suppose if you had to write the code to decode them anyway for some exceptional purpose, you might as well use it day to day as well.


The storage requirements of this must be impressive.


Storage is cheap, and the overall figures are not that outlandish. If we look at a suitable first-page search result[0] and round figures up, we get to about 7.5 TB per day.

And how did I get that figure?

I'm going to fold pcap overhead into the per-message size estimate. Let's assume a trading day at an exchange, including after-hours activity, is 14 hours (~50k seconds). If we estimate that during the highest peaks of trading activity the exchange receives about 2M messages per second, then during more serene hours the average could be about 500k messages per second. Let's guess that the average rate applies 95% of the time and the peak rate the remaining 5%. That gives a blended rate of about 575k messages per second. Round that up to 600k.

If we assume that an average FIX message is about 200 bytes of data, and add 50 bytes of IP + pcap framing overhead, we get to ~250 bytes of captured data per message. At 600k messages per second, 14 hours a day, the total amount of trading data received by an exchange would then be roughly 7.5 TB per day.
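The estimate above is easy to check mechanically (same assumed inputs as in the text):

```python
# Back-of-the-envelope check of the figures above.
seconds = 14 * 3600                                # 14-hour trading day
blended = 0.95 * 500_000 + 0.05 * 2_000_000        # blended messages/sec -> 575k
rate = 600_000                                     # rounded up
bytes_per_msg = 200 + 50                           # FIX payload + IP/pcap framing
total = rate * bytes_per_msg * seconds             # 7.56e12 bytes, ~7.5 TB/day
```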

Before compression for longer-term storage. Whether you consider the aggregate storage requirements impressive or merely slightly inconvenient is a more personal matter.

0: https://robertwray.co.uk/blog/the-anatomy-of-a-fix-message


And compression and deduplication should be very happy with this. A lot of the message contents and the IP/pcap framing overhead should be pretty low-entropy, with enough repeated patterns to deduplicate.
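A quick illustration of how well this kind of traffic compresses, using zlib as a stand-in for whatever archiver is actually used (the repeated record is a made-up FIX-style line):

```python
import zlib

# Repetitive FIX-style records compress well: mostly identical tags and framing,
# with only a few fields (prices, quantities, sequence numbers) varying.
record = b"8=FIX.4.2\x0135=D\x0155=IBM\x0154=1\x0138=100\x01"
stream = record * 10_000
compressed = zlib.compress(stream, 6)
ratio = len(stream) / len(compressed)  # expect a very high ratio on data this regular
```

Real captures vary more per message than this toy stream, so the achieved ratio will be lower, but the framing overhead in particular is close to free after compression.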

It could be funny, though: you'd be able to bump up your archive storage requirements just by changing an IP address, or have someone else do that for you. But that's life.


Why? They're not streaming 4K video; it's either a text protocol or an efficient binary one.



