Tweets are fanned out to more than a single feed and, in the "most important" ca...

apaprocki · on July 9, 2013

Securities work in a similar way. When you're looking at your portfolio, you're only directly monitoring the individual fields for the securities in your portfolio, not the entire firehose. The OPRA feed is equivalent to the raw Twitter firehose, only much fatter. Once you integrate it into the last-mile display to the users, you're doing all the same things. This is typically done with multicast topic-style subscriptions by ticker / type of data. It makes sense that if Lady Gaga writes a tweet you only want it to be "1 timeline delivery" (multicast) as opposed to "31 million timeline deliveries", which makes your infrastructure do a lot more work. Granted, you're limited by what the browser can do in this regard, so you're kind of stuck with the socket model.
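To make the "1 delivery vs. 31 million deliveries" contrast concrete, here is a rough in-process sketch. The class names (`PerFollowerFanout`, `TopicBus`) are made up for illustration, and a shared in-memory log only stands in for real network multicast; it is not how Twitter or any market data vendor actually implements this.

```python
from collections import defaultdict


class PerFollowerFanout:
    """Fan-out on write: copy each message into every follower's timeline."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> follower ids
        self.timelines = defaultdict(list)  # follower -> delivered messages
        self.writes = 0                     # number of timeline deliveries

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def publish(self, author, message):
        # One write per follower: the cost grows with the follower count.
        for follower in self.followers[author]:
            self.timelines[follower].append((author, message))
            self.writes += 1


class TopicBus:
    """Topic-style delivery: one write per topic, read by all subscribers."""

    def __init__(self):
        self.topics = defaultdict(list)         # topic -> shared message log
        self.subscriptions = defaultdict(set)   # subscriber -> topics
        self.writes = 0                         # number of topic publishes

    def subscribe(self, subscriber, topic):
        self.subscriptions[subscriber].add(topic)

    def publish(self, topic, message):
        # A single write, no matter how many subscribers are listening.
        self.topics[topic].append(message)
        self.writes += 1

    def read(self, subscriber):
        # Each subscriber reads from the shared logs it subscribed to.
        return [(topic, msg)
                for topic in sorted(self.subscriptions[subscriber])
                for msg in self.topics[topic]]


if __name__ == "__main__":
    fanout, bus = PerFollowerFanout(), TopicBus()
    for i in range(1000):
        fanout.follow(f"user{i}", "ladygaga")
        bus.subscribe(f"user{i}", "ladygaga")
    fanout.publish("ladygaga", "hello")
    bus.publish("ladygaga", "hello")
    print(fanout.writes, bus.writes)
```

The trade-off the sketch hides is on the read side: the topic model pushes the work of assembling a view onto each subscriber (or the network), which is exactly where the browser/socket limitation bites.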

thezilch · on July 9, 2013

I think we're conflating topics. While the Firehose can be done with some combination of Pub-Sub/Topic/Sub-Topic Fanout, it should have little to do with the QPS for Timeline fanouts. I'd imagine the Firehose footprint is a really small part of their architecture or throughput pains. Timeline is a per-user join on multiple graphs and with paging.
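As a rough illustration of "a per-user join with paging", the sketch below merges the newest-first post lists of everyone a user follows and pages through the result with a timestamp cursor. The function name `timeline_page`, the `(timestamp, author, text)` tuple layout, and the use of a single follow graph are all assumptions for the example; the real timeline joins more than one graph.

```python
import heapq
from itertools import islice


def timeline_page(following, posts_by_author, user, before=None, page_size=3):
    """Return one page of `user`'s timeline plus a cursor for the next page.

    following:        dict of user -> list of authors they follow
    posts_by_author:  dict of author -> list of (timestamp, author, text),
                      each list sorted newest first, timestamps unique
    before:           only return posts strictly older than this timestamp
    """
    streams = [posts_by_author.get(author, [])
               for author in following.get(user, ())]
    # Lazily merge the already-sorted per-author lists, newest first.
    merged = heapq.merge(*streams, key=lambda post: post[0], reverse=True)
    if before is not None:
        merged = (post for post in merged if post[0] < before)
    page = list(islice(merged, page_size))
    # A short page means there is nothing older left to fetch.
    next_cursor = page[-1][0] if len(page) == page_size else None
    return page, next_cursor


if __name__ == "__main__":
    following = {"u": ["a", "b"]}
    posts = {
        "a": [(5, "a", "x"), (2, "a", "y")],
        "b": [(4, "b", "z"), (1, "b", "w")],
    }
    first, cursor = timeline_page(following, posts, "u")
    second, _ = timeline_page(following, posts, "u", before=cursor)
    print(first, second)
```

Every page request repeats the join across all followed authors, which is the per-read cost that precomputing timelines on write is meant to avoid.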

jacques_chester · on July 9, 2013

Securities feeds are, I suspect, more write-dominant and have a much lower fanout rate. They also don't have the outlier-retweeting-outlier or outlier-tweeting-@-outlier problems.

apaprocki · on July 9, 2013

Yes, much more write-dominant. Apps which are built on top of the feeds can create issues, though. E.g., an obscure ticker pasted into a chat room with 500 people instantly starts live monitoring in all their windows -- not just the static price (equivalent to a RT), but the live feed. That would be as if A RT'd B and suddenly all of A's followers added B to their timeline. The degree to which that happens depends on how many features like that are integrated into the app.
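A toy model of the chat-room case, assuming a hypothetical `SubscriptionHub` that shares one upstream subscription per ticker across all windows watching it. The sharing scheme is an assumption made for the sketch, not a description of any real terminal.

```python
from collections import defaultdict


class SubscriptionHub:
    """Tracks which windows watch which ticker (hypothetical model)."""

    def __init__(self):
        self.viewers = defaultdict(set)  # ticker -> window ids watching it
        self.upstream_subscribes = 0     # subscriptions opened to the feed

    def watch(self, window, ticker):
        # Only the first watcher of a ticker opens an upstream subscription.
        if not self.viewers[ticker]:
            self.upstream_subscribes += 1
        self.viewers[ticker].add(window)

    def paste_into_room(self, room_members, ticker):
        # Pasting a ticker makes every member's window start watching it.
        for window in room_members:
            self.watch(window, ticker)

    def deliveries_per_tick(self, ticker):
        # Each live update must still reach every watching window.
        return len(self.viewers.get(ticker, ()))


if __name__ == "__main__":
    hub = SubscriptionHub()
    hub.paste_into_room([f"window{i}" for i in range(500)], "OBSCURE")
    print(hub.upstream_subscribes, hub.deliveries_per_tick("OBSCURE"))
```

Even with the upstream side shared, one paste turns every subsequent tick on that ticker into 500 last-mile deliveries.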

granitepail · on July 9, 2013

I have no idea why you're getting so much pushback. Products like those offered by Bloomberg ingest and distribute massive amounts of data in real time, as well. The comparison is completely suitable.

While their products for investment managers are mostly run off of persistent databases, the trader terminals rely on a high-volume, nebulous fan-out. For many traders, a five-second latency is unacceptable.

Incredibly interesting talk and a good write-up. Twitter continues to impress! I was surprised to see Redis playing such a critical role, too.

jacques_chester · on July 9, 2013

Sometimes the pushback is right, sometimes it's wrong. He's been very good at explaining the different and heavy requirements of financial data feeds.