Using short lived Postgres servers for testing (kaveland.no)
56 points by thunderbong on June 27, 2024 | 29 comments


Echoing the other comment here, https://testcontainers.com/ is a great tool for this. (Or just vanilla docker in general)

In our core makefile we have `make pytest` to run standalone unit tests, and then `make pytest_db` to run full database tests using a docker container. The test suite fires up a completely fresh DB, runs migrations against it, and then proceeds as usual. On a per-module basis, a simple fixture determines whether the db should get reset for each test, each file, or the entire module.
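
A minimal sketch of what that can look like, assuming testcontainers-python and psycopg2 (`run_migrations` is a hypothetical helper, not our actual code):

    # conftest.py -- sketch only
    import pytest
    from testcontainers.postgres import PostgresContainer

    @pytest.fixture(scope="session")
    def pg_url():
        # Fresh Postgres in Docker for the whole run; removed on exit.
        with PostgresContainer("postgres:15") as pg:
            # Strip the SQLAlchemy driver suffix so psycopg2 accepts the URL.
            url = pg.get_connection_url().replace("+psycopg2", "")
            run_migrations(url)  # hypothetical: apply migrations once up front
            yield url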

Works great on Github actions, too.

When the tests are done, the container and all its noise are automatically cleaned up. No state left behind.


Yes, I use this kind of setup in a number of places. I am also a fan of running the full test suite against SQLite (if you are using a compatible ORM) to surface where you are (or are not) running into DB-specific behavior.


With testcontainers you can use exactly the same database you do in production.


If your application only uses one database in production, you probably don't want to use an ORM, because that might prevent you from using all of your db’s features.


The testcontainer is agnostic to how you use it; it's a real docker container.


Sure, but you may want to use different databases as your application may be expected to be deployed in different environments. I like to keep some of my apps db-agnostic, so I test with postgres in a testcontainer and sqlite outside of one to ensure consistency.
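
With SQLAlchemy and pytest, one way to get that is a parametrized session fixture so the same suite runs against both backends (a sketch under those assumptions, not the parent's actual setup):

    import pytest
    from sqlalchemy import create_engine
    from testcontainers.postgres import PostgresContainer

    @pytest.fixture(scope="session", params=["sqlite", "postgres"])
    def engine(request):
        # Every test using this fixture runs twice: once per backend.
        if request.param == "sqlite":
            yield create_engine("sqlite:///:memory:")
        else:
            with PostgresContainer("postgres:15") as pg:
                yield create_engine(pg.get_connection_url())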


> On a per-module basis a simple fixture is used to determine if the db should get reset for each test, each file, or the entire module.

That sounds interesting. How do you determine if the db should get reset?


At this time we basically reset the db after each module run, but there is some flexibility.

Inside a test module, the developer has the ability to dictate whether the entire module should share the same db state, or whether each individual test should get a clean slate.

Here is an example of our `test/db/conftest.py` file that is auto-loaded when the specific db test dir is invoked. This essentially bootstraps everything: https://gist.github.com/whalesalad/6ecd284460ac3836a6c2b9ca8...

The main fixture is `database` which is set to autouse at the session level. This produces one db for the entire test session. This could easily be augmented to be scoped to the module or even function level.

You will see `truncate_db` which is currently set to autouse at the module level, so the tables are truncated after each module is finished. This can be totally customized for your own purposes.
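
The truncation side amounts to something like this (a condensed sketch, assuming a `database` fixture that yields the container; the gist has the real code):

    import psycopg2
    import pytest

    @pytest.fixture(scope="module", autouse=True)
    def truncate_db(database):
        yield  # let the module's tests run first
        # Afterwards, wipe every table so the next module starts clean.
        dsn = database.get_connection_url().replace("+psycopg2", "")
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute("SELECT tablename FROM pg_tables WHERE schemaname = 'public'")
            for (table,) in cur.fetchall():
                cur.execute(f'TRUNCATE TABLE "{table}" CASCADE')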

I combined a few files to produce this example, but I still think it is pretty concise. We are using PostgreSQL 15.x at this time. Our migrations are perhaps unconventional - we just have a directory in the root called `migrations/` with files like 001_foo.sql, 002_bar.sql ... and we manually run them on deploys. In the test suite, those same migrations get run sequentially, all at once, at startup.
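
Running a directory of numbered .sql files in order is only a few lines (sketch; the lexicographic sort is what makes the 001_/002_ prefixes work):

    import pathlib
    import psycopg2

    def run_migrations(dsn: str) -> None:
        # Sorted glob yields 001_foo.sql, 002_bar.sql, ... in sequence.
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            for path in sorted(pathlib.Path("migrations").glob("*.sql")):
                cur.execute(path.read_text())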

The part here that would need to be modified for others is the `service.db` monkeypatching. That is our core DB module that everything else uses to grab a psycopg2 conn from the pool. The tests monkeypatch these methods so that the rest of the codebase 'just works' and is handed a conn from the test container instead of the traditional one.
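
The monkeypatching boils down to something like this (illustrative; `service.db.get_conn` and the `pg_url` fixture are stand-ins for whatever your codebase actually uses):

    import psycopg2
    import pytest

    @pytest.fixture(autouse=True)
    def patch_db(monkeypatch, pg_url):
        conn = psycopg2.connect(pg_url)
        # Everything that asks service.db for a conn now gets the test one.
        monkeypatch.setattr("service.db.get_conn", lambda: conn)
        yield
        conn.close()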


Do you populate with test data, and if so, how?


Yes. When the container is started you get the db connection URL; you connect to it, load the schema, and load any test data/seeds.

For me, I just run the DDL (sitting in a .sql file) and then run my tests. They run against an empty db, but I could load sample data the same way.
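
Concretely, something like this (sketch; the file names are just examples):

    import pathlib
    import psycopg2

    def load_sql(dsn: str, *files: str) -> None:
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            for f in files:
                cur.execute(pathlib.Path(f).read_text())

    # load_sql(dsn, "ddl.sql")              # schema only: tests see an empty db
    # load_sql(dsn, "ddl.sql", "seed.sql")  # schema plus sample data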


Seems like a great setup, thanks for sharing.


Related self-promotion: I built pgtemp (https://github.com/boustrophedon/pgtemp) to automate doing exactly these mkdir/initdb/load/destroy steps.
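
For reference, the manual version of those steps is roughly this (not pgtemp's API, just the ceremony it automates; the port and flags are assumptions):

    import shutil
    import subprocess
    import tempfile

    datadir = tempfile.mkdtemp(prefix="pgtest-")                  # mkdir
    subprocess.run(["initdb", "-D", datadir, "--no-sync"], check=True)
    subprocess.run(["pg_ctl", "-D", datadir, "-w",
                    "-o", f"-p 5433 -k {datadir}", "start"], check=True)
    # ... load schema/data, run tests against localhost:5433 ...
    subprocess.run(["pg_ctl", "-D", datadir, "-m", "immediate", "stop"],
                   check=True)
    shutil.rmtree(datadir)                                        # destroy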


Author of the blog post here -- that looks like exactly what I needed, so I'm probably going to add a dependency on it to https://github.com/kaaveland/eugene/ so I can delete a ton of code. :-)

That looks fantastic, so I'm actually just going to put a link to it in the post so that more people see it.


Thanks! Feel free to file an issue if there's something missing from the library or the daemon.


There's a similar JVM version here: https://github.com/zonkyio/embedded-postgres

It's well maintained and seems to do what it says on the tin. I start it up once across all the suites (managing DB state as required in the suite) and the overhead is very minimal.


I started to use testcontainers for this, but my biggest problem is that each test needs to start its own container, which is slow (right now I have about 20 test cases in my PoC).

testcontainers allows you to stop and start containers and to reset their state to a snapshot, but the Go implementation seems to have a bug, which I reported recently [1]. It's going to be huge when they fix it, as it will make tests much faster.

[1]: https://github.com/testcontainers/testcontainers-go/issues/2...


Disclaimer: I work at Neon

Neon makes this very ergonomic in a cloud setting (actually it works somewhat similarly to how the article does it). You can create a copy-on-write version of your prod database (a branch) and use it for testing. There's also some automation we've built up for doing it from GitHub CI: https://neon.tech/docs/guides/neon-github-app


I’d not heard of Neon, it looks interesting.

When you say “a copy-on-write version of your prod database”, from reading the docs, this includes both schema and data?

I couldn’t find anything on data anonymisation/obfuscation - how do you handle that?


Yep - it's an exact copy at the page level, so _everything_ is copied.

Regarding data anonymization, you can use the pg_anon extension on your branch. There's a blog post about it: https://neon.tech/blog/easily-anonymize-production-data-in-p...


Link is currently 404.


Until Neon can fix that, I found it on the Wayback Machine - https://web.archive.org/web/20240524181543/https://neon.tech...


You can use Neon + Neosync on your branches: https://neon.tech/docs/introduction


Disclaimer: I work at Tonic AI

We have a product called Ephemeral (in some ways similar to Neon) that orchestrates a kubernetes cluster to spin up and down ephemeral databases from "snapshots" for this exact use case.

We have a more well established product called Structural that does de-identification and anonymization from production data, and we have a pretty clean integration between the two products to make creating your fleet of ephemeral testing databases nice and easy.

In case you're interested -> https://www.tonic.ai/ephemeral


You can use Bunnyshell Ephemeral environments for this (bunnyshell.com):

- The Postgres instances will be deployed in Kubernetes (bring your own infra if you want),
- you can configure seeding,
- and you can run any other containers next to Postgres (for example E2E tests).
- You can trigger the process manually or automatically (PR, GH actions, cli tool, API, etc).

For example, the ArgoCD project uses Bunnyshell to create ephemeral envs on each PR. https://github.com/argoproj/argo-cd/pull/18851

Disclaimer: I work for Bunnyshell


This needs a reminder warning: you need scripts to massage your data in any test database if you have compliance requirements (HIPAA, PCI, customer confidentiality agreements, etc.). It's always the worst to have all kinds of restrictions that limit productivity in order to meet compliance, only to get a self-own from a compliance scan picking up a random test database hanging out somewhere.


If you work in the JVM ecosystem, I can recommend (Java/Kotlin) + Quarkus + JDBI with Postgres. Quarkus supports testcontainers out of the box for tests, and it just plain works.

I don't have experience with it, but some people also recommend jooq.


I use jooq with testcontainers in Kotlin just fine.


I'd eventually like to use dagger.io so I can use the same technology in testing, CI, and CD, but it is not mature enough yet, so I'm using testcontainers for the testing part, with success.


For those interested in an easier way to set up databases (and maybe other containerized services) for testing, I recommend TestContainers[1]. There are integrations for well-known languages like Go, Java, and Node.js, and the setup is easier than what is described in the article (at least if you're already developing on a container-ready workspace).

[1]: https://testcontainers.com/



