Measuring "Twitter scale" by tweets per second isn't how I would measure it.
Updates per second to the end users who follow those 7K tweets per second seems more realistic: it's the timelines and notifications that hurt, not the raw ingest rate before the fan-out. And then, of course, there's the question of whether you can sustain that continuously so you don't back up.
That's why we say "at 403 fanout". The bottleneck for Mastodon/Twitter is timeline writes: posts per second multiplied by the average number of followers per post. So our instance is doing 1.4M timeline writes per second.
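As a back-of-envelope check (a sketch, not from the original post: the posts/sec figure here is just derived from the two numbers above):

```python
# Timeline write rate = posts/sec * average fanout (followers per post).
AVG_FANOUT = 403                     # average followers per post, from above
TIMELINE_WRITES_PER_SEC = 1_400_000  # the 1.4M writes/sec figure, from above

# Implied ingest rate needed to produce that write load:
posts_per_sec = TIMELINE_WRITES_PER_SEC / AVG_FANOUT
print(round(posts_per_sec))  # ~3474 posts/sec
```

Which is why the fanout number matters as much as the raw posts-per-second number.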
Another important metric is "time to deliver to follower timelines", which is tricky because of how much variance there can be from second to second due to the extremely unbalanced social graph. When someone with 20M followers posts, that multiplies the number of needed timeline writes for that second by 15x. We went into depth in our post on how we handled that to provide fairness, preventing these big users from hogging all the resources at once.
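To make the 15x concrete, here's the arithmetic behind that spike (a sketch using only the numbers already mentioned in this thread):

```python
# Steady-state load vs. the second when a 20M-follower account posts.
baseline_writes_per_sec = 1_400_000   # 1.4M timeline writes/sec at 403 fanout
outlier_followers = 20_000_000        # followers of the single big account

# That one post adds 20M timeline writes on top of the normal second's load:
spike = baseline_writes_per_sec + outlier_followers
multiplier = spike / baseline_writes_per_sec
print(f"{multiplier:.1f}x")  # 15.3x -- roughly the 15x jump described above
```

Without fairness controls, clearing that one backlog can starve delivery for every smaller account posting in the same window.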
I heard somewhere that one of the particular challenges of Twitter's scale is not the average fanout but the outliers, where millions or tens of millions of users follow a single account. Does your simulation take that into account?