
I found the script they used for copying data really interesting: https://gist.github.com/peterwj/0614bf6b6fe339a3cbd42eb93dc5...

It's written in Python: it spins up a queue.Queue, populates it with ranges of rows that need to be copied (min < ID < max ranges), starts a bunch of Python threads, and then each of those threads uses os.system() to run this:

    psql "{source_url}" -c "COPY (SELECT * FROM ...) TO STDOUT" \
      | psql "{dest_url}" -c "COPY {table_name} FROM STDIN"
This feels really smart to me. The Python GIL won't be a factor here, since all the heavy lifting happens inside the two psql subprocesses; the threads mostly just sit waiting for os.system() to return.
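
Roughly this shape, as far as I can tell (a minimal sketch of the pattern, not the gist itself; the table name, connection URLs, batch size, and worker count below are made-up placeholders):

    import os
    import queue
    import threading

    # Placeholders -- the real gist parameterizes these differently.
    SOURCE_URL = "postgres://source"
    DEST_URL = "postgres://dest"
    TABLE_NAME = "events"
    NUM_WORKERS = 8
    BATCH_SIZE = 100_000
    MAX_ID = 10_000_000

    # Fill a queue with (min_id, max_id) ranges to copy.
    work = queue.Queue()
    for start in range(0, MAX_ID, BATCH_SIZE):
        work.put((start, start + BATCH_SIZE))

    def worker():
        while True:
            try:
                lo, hi = work.get_nowait()
            except queue.Empty:
                return
            # The copying itself runs in the two psql subprocesses,
            # so the GIL only coordinates handing out ranges.
            os.system(
                f'psql "{SOURCE_URL}" -c "COPY (SELECT * FROM {TABLE_NAME} '
                f'WHERE id >= {lo} AND id < {hi}) TO STDOUT" '
                f'| psql "{DEST_URL}" -c "COPY {TABLE_NAME} FROM STDIN"'
            )

    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()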



For ETL out of Postgres, it is very hard to beat psql. Something as simple as this will happily saturate all your available network, CPU, and disk write bandwidth. Wrapping it in Python helps you batch it out cleanly.

    psql -c "..." | pigz -c > file.tsv.gz


Thanks Simon! I can indeed confirm that this script managed to saturate the database's hardware capacity (I recall CPU being the bottleneck, and I had to dial down the parallelism to leave some CPU for actual application queries).


Sounds to me like this is exactly what the regular parallel command (GNU parallel) was made for; not sure Python is needed here if the end result is shelling out via os.system() anyway. Something like the sketch below.
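
Untested sketch, assuming the ID ranges can be generated up front; the table name, range step, and the exported SOURCE_URL/DEST_URL variables are placeholders:

    # each job copies one [lo, hi) id slice using the same psql-to-psql pipe
    seq 0 100000 9900000 \
      | parallel -j 8 '
          lo={}; hi=$((lo + 100000))
          psql "$SOURCE_URL" -c "COPY (SELECT * FROM events WHERE id >= $lo AND id < $hi) TO STDOUT" \
            | psql "$DEST_URL" -c "COPY events FROM STDIN"'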



