Echoing the other comment here, https://testcontainers.com/ is a great tool for this. (Or just vanilla docker in general)
In our core makefile we have `make pytest` to run standalone unit tests, and then `make pytest_db` to run full database tests using a docker container. The test suite fires up a completely fresh DB, runs migrations against it, and then the test suite proceeds as usual. On a per-module basis a simple fixture is used to determine if the db should get reset for each test, each file, or the entire module.
Works great on GitHub Actions, too.
When the test is done, the container and all noise is automatically cleaned up. No state left behind.
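If anyone wants a concrete starting point, the unit/db split can be as little as a marker plus a flag in `conftest.py`. A minimal sketch (the `db` marker and `--run-db` flag are illustrative names, not our exact setup):

```python
# conftest.py -- gate database tests behind an opt-in flag
import pytest

def pytest_addoption(parser):
    parser.addoption("--run-db", action="store_true", default=False,
                     help="run tests that need the dockerized database")

def pytest_configure(config):
    config.addinivalue_line("markers", "db: test needs the dockerized database")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-db"):
        return  # full run: keep everything
    skip_db = pytest.mark.skip(reason="needs --run-db (use `make pytest_db`)")
    for item in items:
        if "db" in item.keywords:
            item.add_marker(skip_db)
```

Under that scheme, `make pytest` just invokes `pytest`, and `make pytest_db` invokes `pytest --run-db`.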
Yes, I use this kind of setup in a number of places. I am also a fan of running the full test suite against SQLite (if you are using a compatible ORM) to check where you are (or aren't) running into DB-specific behavior.
If your application only uses one database in production, you probably don't want to use an ORM, because that might prevent you from using all of your db’s features.
Sure, but you may want to use different databases as your application may be expected to be deployed in different environments. I like to keep some of my apps db-agnostic, so I test with postgres in a testcontainer and sqlite outside of one to ensure consistency.
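Concretely, a single parametrized engine fixture is enough to run every test against both backends. A rough sketch assuming SQLAlchemy and testcontainers-python (the test body is just a placeholder):

```python
import pytest
import sqlalchemy as sa
from testcontainers.postgres import PostgresContainer

# Every test that uses `engine` runs twice: once on SQLite, once on Postgres.
@pytest.fixture(scope="session", params=["sqlite", "postgres"])
def engine(request):
    if request.param == "sqlite":
        yield sa.create_engine("sqlite:///:memory:")
    else:
        with PostgresContainer("postgres:15") as pg:
            yield sa.create_engine(pg.get_connection_url())

def test_backends_agree(engine):
    with engine.connect() as conn:
        assert conn.execute(sa.text("SELECT 1")).scalar() == 1
```

Any test that passes on one backend and fails on the other is exactly the DB-specific behavior you want flagged.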
At this time we basically reset the db after each module run, but there is some flexibility.
Inside a test module, the developer has the ability to dictate whether the entire module should share the same db state, or if each individual test should get a clean slate.
The main fixture is `database` which is set to autouse at the session level. This produces one db for the entire test session. This could easily be augmented to be scoped to the module or even function level.
You will see `truncate_db` which is currently set to autouse at the module level, so the tables are truncated after each module is finished. This can be totally customized for your own purposes.
I combined a few files to produce this example, but I still think it is pretty concise. We are using PostgreSQL 15.x at this time. Our migrations are perhaps unconventional - we just have a directory in the root called `migrations/` with files like 001_foo.sql, 002_bar.sql ... and we manually run them on deploys. So those same migrations get run sequentially all at once on boot.
The part here that would need to be modified for others is the `service.db` monkeypatching. That is our core DB module that everything else uses to grab a psycopg2 conn from the pool. The test monkeypatches these methods so that the rest of the codebase 'just works' and is handed the conn from the test container instead of the usual one.
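Here is a condensed sketch of the shape of it (testcontainers-python flavored; `service.db` and its `get_conn`/`put_conn` helpers stand in for however your codebase hands out pooled connections):

```python
# conftest.py -- one fresh Postgres per test session, migrations on boot,
# tables truncated after each module. Adapt the monkeypatched names to taste.
import pathlib
import psycopg2
import pytest
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session", autouse=True)
def database():
    with PostgresContainer("postgres:15") as pg:
        conn = psycopg2.connect(
            host=pg.get_container_host_ip(),
            port=pg.get_exposed_port(5432),
            user=pg.username, password=pg.password, dbname=pg.dbname,
        )
        conn.autocommit = True
        # Run migrations/001_foo.sql, 002_bar.sql, ... sequentially on boot.
        for path in sorted(pathlib.Path("migrations").glob("*.sql")):
            with conn.cursor() as cur:
                cur.execute(path.read_text())
        yield conn
        conn.close()

@pytest.fixture(autouse=True)
def patch_db(database, monkeypatch):
    # Hand the container-backed conn to the rest of the codebase.
    monkeypatch.setattr("service.db.get_conn", lambda: database)   # assumed API
    monkeypatch.setattr("service.db.put_conn", lambda conn: None)  # assumed API

@pytest.fixture(scope="module", autouse=True)
def truncate_db(database):
    yield  # let the module's tests run first, then wipe the tables
    with database.cursor() as cur:
        cur.execute("""
            SELECT string_agg(quote_ident(tablename), ', ')
            FROM pg_tables WHERE schemaname = 'public'
        """)
        tables = cur.fetchone()[0]
        if tables:
            cur.execute(f"TRUNCATE {tables} RESTART IDENTITY CASCADE")
```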
Author of the blog post here -- that looks like exactly what I needed, so I'm probably going to add a dependency to it to https://github.com/kaaveland/eugene/ so I can delete a ton of code. :-)
That looks fantastic, so I'm actually just going to put a link to it in the post so that more people see it.
It's well maintained and seems to do what it says on the tin. I start it up once across all the suites (managing DB state as required in the suite) and the overhead is very minimal.
I started to use testcontainers for this, but my biggest problem is that each test needs to start its own container, which is slow (right now I have about 20 test cases in my PoC).
testcontainers allows you to stop and start containers and to reset their state to a snapshot, but the Go implementation seems to have a bug, which I reported recently [1]. It's going to be huge when they fix it, as it will make tests much faster.
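In the meantime, one workaround is to approximate snapshot/restore with a single long-lived container and plain SQL: migrate a "template" database once, then rebuild the test database from it between tests with `CREATE DATABASE ... TEMPLATE`, which is far cheaper than booting a fresh container. A rough sketch (in Python for brevity, but it's really just two SQL statements; the database names are placeholders):

```python
import psycopg2

def restore_from_template(admin_dsn, test_db="app_test", template_db="app_template"):
    # Connect to a maintenance db (e.g. "postgres"), not to test_db itself.
    conn = psycopg2.connect(admin_dsn)
    conn.autocommit = True  # CREATE/DROP DATABASE cannot run inside a transaction
    with conn.cursor() as cur:
        # WITH (FORCE) kicks out lingering sessions; needs Postgres 13+.
        cur.execute(f"DROP DATABASE IF EXISTS {test_db} WITH (FORCE)")
        # The template must have no active connections while it is copied.
        cur.execute(f"CREATE DATABASE {test_db} TEMPLATE {template_db}")
    conn.close()
```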
Neon makes this very ergonomic in a cloud setting (actually it works somewhat similarly to how the article does it). You can create a copy-on-write version of your prod database (a branch) and use it for testing. There's also some automation we've built up for doing it from GitHub CI: https://neon.tech/docs/guides/neon-github-app
We have a product called Ephemeral (in some ways similar to Neon) that orchestrates a Kubernetes cluster to spin up and down ephemeral databases from "snapshots" for this exact use case.
We have a more established product called Structural that does de-identification and anonymization of production data, and we have a pretty clean integration between the two products to make creating your fleet of ephemeral testing databases nice and easy.
You can use Bunnyshell Ephemeral environments for this (bunnyshell.com).
- The Postgres instances will be deployed in Kubernetes (bring your own infra if you want),
- you can configure seeding,
- and you can also run any other containers next to Postgres (for example E2E tests).
- You can trigger the process manually or automatically (PRs, GitHub Actions, CLI tool, API, etc.).
Needs a reminder warning that you need scripts to massage your data in any test database if you have compliance requirements (HIPAA, PCI, customer confidentiality agreements, etc). It's always the worst having all kinds of restrictions that limit productivity in order to meet compliance, only to get a self-own when a compliance scan picks up a random test database hanging out somewhere.
If you work in the JVM ecosystem, I can recommend (Java/Kotlin) + Quarkus + JDBI with Postgres. Quarkus supports testcontainers out of the box for tests, and it just plain works.
I don't have experience with it, but some people also recommend jOOQ.
I'd eventually like to use dagger.io so I can use the same technology in testing, CI, and CD, but it is not mature enough yet, so I'm using testcontainers for the testing part, with success.
For those interested in an easier way to set up databases (and maybe other containerized services) for testing, I recommend using TestContainers[1]. There are integrations for well-known languages like Go, Java, and Node.js, and the setup is easier than what is described in the article (at least if you're already developing on a container-ready workspace).
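For example, with the Python flavor a throwaway Postgres is a single context manager, and everything is cleaned up when the block exits (a minimal sketch):

```python
import psycopg2
from testcontainers.postgres import PostgresContainer

with PostgresContainer("postgres:15") as pg:
    conn = psycopg2.connect(
        host=pg.get_container_host_ip(),
        port=pg.get_exposed_port(5432),
        user=pg.username, password=pg.password, dbname=pg.dbname,
    )
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone()[0])
```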