The proxy vs packet capture debate is a bit of a non-debate in practice — the moment TLS is on (and it should always be on), packet capture sees nothing useful. eBPF is interesting for observability but it works at the network/syscall level — doing actual SQL-level inspection or blocking through eBPF would mean reassembling TCP streams and parsing the Postgres wire protocol in kernel space, which is not really practical.
I've been building a Postgres wire protocol proxy in Go and the latency concern is the thing people always bring up first, but it's the wrong thing to worry about. A proxy adds microseconds, your queries take milliseconds. Nobody will ever notice. The actual hard part — the thing that will eat weeks of your life — is implementing the wire protocol correctly. Everyone starts with simple query messages and thinks they're 80% done. Then you hit the extended query protocol (Parse/Bind/Execute), prepared statements, COPY, notifications, and you realize the simple path was maybe 20% of what Postgres actually does. Once you get through that though, monitoring becomes almost a side effect. You're already parsing every query, so you can filter them, enforce policies, do tenant-level isolation, rotate credentials — things that are fundamentally impossible with any passive approach.
You can decode TLS traffic with a little bit of effort, tho you have to control the endpoints which makes it a bit moot as if you control them you can just... enable query logging
True, but logging tells you what happened, a proxy lets you decide what's allowed to happen before it hits the database. Policy enforcement, tenant isolation, that kind of thing. They're complementary really.
Yeah, more and more. Zero-trust is pushing TLS everywhere, even inside VPNs — lateral movement is a real thing. And several compliance frameworks now expect encryption in transit regardless of network topology. With connection pooling the overhead is basically zero anyway.
Indeed, if you're running the db in production and aren't using TLS, you're doing it wrong nowadays. Nearly every compliance framework will require it, and it's a very good idea anyway even if you don't care about compliance.
Encrypted in transit yes, but only between the VPN endpoints. Anything already inside the network (compromised host, rogue container, bad route) sees your queries in cleartext.
TLS on the connection itself gives you end-to-end encryption between your app and Postgres, no matter what's going on in the network in between. Same reason people moved to HTTPS everywhere instead of just trusting the corporate firewall. And with connection pooling you pay the TLS handshake once and reuse it, so the overhead is basically nothing.
Fair point, if it's a true point-to-point VPN between just the two boxes, there's not much "in between" to worry about. TLS on top is mostly defense in depth at that point. What I had in mind was the more common setup where your app and DB sit on a shared network (VPC, corporate LAN). The traffic between them is unencrypted, and you're trusting every piece of infrastructure in that path (switches, hypervisors, sidecar containers) to not be compromised.
Even then, though, it needs to run on the server so it's hard to guarantee to not impact performance and availability. There are many Postgres/Mysql proxies used for connection pooling and such, so at least we understand their impact pretty well (and it tends to be minimal).
This perfectly illustrates this broken mental model that leads to endless frustration.
Unless you put the car in reverse, you are still making forward progress. If someone merges in front of you at 30mph then you traveled hundreds of feet towards your destination in the time it took them to do that. Chill out.
A rule that has suited me well is to take an estimate, double it, and increase by an order of magnitude for inexperienced developers.
So a task the say would take two weeks ends up being 4 months.
For experienced developers, halve the estimate and increase by an order of magnitude. So your 10 days estimate would be 5 weeks.
The biggest estimation effort I ever took part in was a whole-system rewrite[1] where we had a very detailed functional test plan describing everything the system should do. The other lead and I went down the list, estimated how long everything would take to re-implement, and came up with an estimate of 2 people, 9 months.
We knew that couldn't possibly be right, so we doubled the estimate and tripled the team, and ended up with 6 people for 18 months - which ended up being almost exactly right.
[1]: We were moving from a dead/dying framework and language onto a modern language and well-supported platform; I think we started out with about 1MLOC on the old system and ended up with about 700K on the rewrite.
10 days was already after I used this algorithm. Previous several tasks on that codebase were estimated pretty good. Problem with this is that some tasks can indeed take SEVERAL orders of magnitude more time that you thought.
One of the hardest problems with estimating for me is that I mostly do really new tasks that either no one wants to do because they are arduous, or no one knows how to do yet. Then I go and do them anyway. Sometimes on time, mostly not. But everybody working with me already knows, that it may be long, but I will achieve the result. And in rare instances other developers ask me how did I managed to find the bug so fast. This time I was doing something I have never before done in my life and I missed some code dependencies that needed changing when I was revieving how to do that task.
https://github.com/circonus-labs/wirelatency