What is the state-of-the-art POSIX benchmarking tool, in your opinion? I'm aware of things like Gatling, but I'm not sure where it sits on the spectrum in terms of features or popularity.
I don't think anything has the flexibility of Tsung, to be honest. It can already test many different protocols. A better route would probably be to optimize it a bit for lower memory usage.
For web server benchmarking, only wrk2 by Gil Tene does things correctly. Nearly everything else suffers from coordinated omission:
Imagine you have 10,000 connections, each doing 3 req/s. Say one connection blocks for 1 second, which means 2 requests should have fired on that connection in the meantime. wrk2 counts those two as late, whereas most other load generators don't count them at all. This means a framework can "stall" some connections and come out with better-looking numbers and fewer bad results in the upper latency percentiles.
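To make the accounting difference concrete, here is a minimal simulation of a single connection at 3 req/s with one 1-second stall. It is only a sketch of the idea, not wrk2's actual code: the function name `simulate`, its parameters, and the 1 ms service time are all made up for illustration. "Naive" latency is timed from when the request was actually sent, "corrected" latency from when the schedule said it should have been sent.

```python
def simulate(rate_hz=3.0, n_requests=30, service_s=0.001,
             stall_at=10, stall_s=1.0):
    """Return (naive, corrected) latency lists in seconds for one connection.

    Hypothetical model: requests are scheduled at a constant rate, the
    connection handles one request at a time, and request number
    `stall_at` takes an extra `stall_s` seconds to complete.
    """
    interval = 1.0 / rate_hz
    naive, corrected = [], []
    free_at = 0.0  # time at which the connection can send again
    for i in range(n_requests):
        intended = i * interval          # when the schedule says to send
        sent = max(intended, free_at)    # a blocked connection sends late
        cost = service_s + (stall_s if i == stall_at else 0.0)
        done = sent + cost
        naive.append(done - sent)          # timed from the actual send
        corrected.append(done - intended)  # timed from the intended send
        free_at = done
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate()
    slow = lambda xs: sum(1 for x in xs if x > 0.1)
    print("slow requests, naive accounting:    ", slow(naive))
    print("slow requests, corrected accounting:", slow(corrected))
```

With these defaults the naive view sees a single slow request (the one that stalled), while the corrected view also charges the two requests that were queued up behind it, so it reports three.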
As an example, here are the Erlang/Cowboy numbers for such a test in wrk2:
Note how the median latency and the 75th percentile are better for Haskell, but it occasionally stalls requests for quite some time, probably due to a GC pause or some other cleanup work that all happens to land at the same unfortunate moment.
If you go look at typical benchmarks, their latency reporting is way off compared to this, which is a surefire sign that they did not account for coordinated omission.
Mind you, when benchmarks disagree, the trick is to explain why. That often leads to insight into a difference in design.