if httperf is using select() with 65k odd sockets then that could be a bottleneck...
the "dumb" C impl can be faster too!
it should probably fork() a few times so multiple accept()s can fight over the socket, which should probably be put into non-blocking mode, and it should also turn nagle off (TCP_NODELAY).
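roughly what I mean by fork() + non-blocking accept() + nagle off, as a minimal sketch rather than anything tuned. the helper names (make_listener, disable_nagle, spawn_workers, serve) are made up for illustration, and I'm assuming Linux/POSIX sockets:

```c
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking TCP listener on `port` (0 = any free port).
   Returns the listening fd, or -1 on error. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Turn Nagle's algorithm off on a TCP socket. Returns 0 on success. */
int disable_nagle(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}

/* Fork nworkers-1 children. Every process (parent included) returns its own
   worker index and is expected to call serve() on the shared listener.
   Returns -1 if fork() fails. */
int spawn_workers(int nworkers)
{
    assert(nworkers >= 1);
    for (int i = 1; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            return i;   /* child */
    }
    return 0;           /* parent */
}

/* Accept loop: wait for readiness, then race the other workers to accept().
   Losing the race shows up as accept() failing with EAGAIN, which we ignore. */
void serve(int listen_fd, const char *resp, size_t len)
{
    struct pollfd p = { .fd = listen_fd, .events = POLLIN };
    for (;;) {
        if (poll(&p, 1, -1) <= 0)
            continue;
        int c = accept(listen_fd, NULL, NULL);
        if (c < 0)
            continue;
        disable_nagle(c);
        ssize_t n = write(c, resp, len);   /* trivial response, no request parsing */
        (void)n;
        close(c);
    }
}
```

typical use would be make_listener() once in the parent, then spawn_workers(4) (or however many cores you have), then serve() in every process. the non-blocking listener matters because all workers wake on the same connection and only one accept() wins.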
for even more points you can reduce the copying of the trivial response from userspace into the kernel using sendfile()/splice(), if you mlock() it into RAM first!
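a sketch of the sendfile() + mlock() idea, assuming Linux and assuming the canned response lives in a file. struct response, response_open and response_send are hypothetical names. the mlock() on a shared mapping is my way of trying to keep the file's pages resident, and it's treated as best effort since it can fail under RLIMIT_MEMLOCK:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical holder for a prebuilt response stored in a file. */
struct response {
    int    fd;      /* file holding the raw response bytes */
    size_t len;     /* size of the response in bytes */
    void  *map;     /* shared read-only mapping, or MAP_FAILED */
    int    locked;  /* 1 if mlock() succeeded, 0 otherwise */
};

/* Open the response file, map it and try to pin it in RAM.
   Returns 0 on success, -1 if the file cannot be opened or is empty. */
int response_open(struct response *r, const char *path)
{
    struct stat st;

    memset(r, 0, sizeof *r);
    r->map = MAP_FAILED;
    r->fd = open(path, O_RDONLY);
    if (r->fd < 0)
        return -1;
    if (fstat(r->fd, &st) < 0 || st.st_size <= 0) {
        close(r->fd);
        return -1;
    }
    r->len = (size_t)st.st_size;

    /* Best effort: a failed mmap/mlock only means the pages may get evicted. */
    r->map = mmap(NULL, r->len, PROT_READ, MAP_SHARED, r->fd, 0);
    if (r->map != MAP_FAILED)
        r->locked = (mlock(r->map, r->len) == 0);
    return 0;
}

/* Send the whole response with sendfile(), so the bytes never pass through
   a userspace buffer. Returns bytes sent, or -1 on error (including EAGAIN
   on a non-blocking socket, which the caller would have to retry). */
ssize_t response_send(const struct response *r, int sock)
{
    assert(r->fd >= 0);
    off_t off = 0;   /* explicit offset, so the shared fd's position is untouched */
    while ((size_t)off < r->len) {
        ssize_t n = sendfile(sock, r->fd, &off, r->len - (size_t)off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;
    }
    return (ssize_t)off;
}
```

passing an explicit offset means forked workers can share one fd without stepping on each other's file position. whether this actually beats a plain write() for a response this small is something you'd have to measure.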
the printf likely reduces the throughput by a large amount too!
(I've spent far too much time fiddling with various syscalls for synthetic benchmarks!)