I am the author of this blog post and I didn't expect to see it on the front page! For disclosure, I work at Turso and one of the authors, Pekka, is from Turso.
This paper came out in April 2024 when Limbo was in its nascent stages. It has seen many improvements since then, one being support for Deterministic Simulation Testing.
It sounds like most of the answer suggested by the paper is asynchronous IO, so maybe I am misunderstanding something.
There is a lot, and I mean a tremendous amount, of overhead in managing data via any form of SQL versus just writing to files. That overhead pays for itself only when the data is large enough and the cost of read and write operations is high enough.
Given those factors, couldn't similar performance improvements be achieved at far lower cost by piping data via streams to open files using an asynchronous interface, such as an event loop or child processes? That would eliminate the blocking of synchronous operations and much of the CPU overhead of query interpretation during writes. There would still be a cost to precise data extraction at read time, though.
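For illustration, here is a minimal sketch of that idea in Rust, assuming the tokio runtime (my choice of runtime and file name, not anything stated in the thread): writes are scheduled on an event loop, so the caller never blocks on a synchronous write, and nothing on the write path parses or plans a query.

    use tokio::fs::OpenOptions;
    use tokio::io::AsyncWriteExt;

    #[tokio::main]
    async fn main() -> std::io::Result<()> {
        // Open (or create) an append-only log; "events.log" is a
        // hypothetical name chosen for this sketch.
        let mut log = OpenOptions::new()
            .create(true)
            .append(true)
            .open("events.log")
            .await?;

        // Each write yields to the event loop instead of blocking the
        // thread, and there is no SQL parsing or planning on this path.
        for i in 0..3 {
            log.write_all(format!("event {i}\n").as_bytes()).await?;
        }
        log.flush().await?;
        Ok(())
    }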
If you use only file system operations, all operational overhead is incurred at execution time. For example, managing and reading data still costs CPU, but there is virtually no management cost to replicating a database when replication is just a matter of copying files, as opposed to the more complex machinery involved in replicating a SQL database.
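As a toy illustration of that point (file names hypothetical, and assuming writers are quiesced while the copy runs), "replicating" a plain file store reduces to a byte-for-byte copy, with none of the snapshot or catch-up protocol a SQL replica needs:

    use std::fs;
    use std::io;

    fn main() -> io::Result<()> {
        // "Replication" here is just copying the file. This is only
        // safe if nothing is writing to events.log during the copy.
        let bytes = fs::copy("events.log", "events.log.replica")?;
        println!("replicated {bytes} bytes");
        Ok(())
    }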
repo: https://github.com/tursodatabase/limbo