Forcing analytics to go through the API doesn’t actually reduce load on the prod...

yowlingcat · on Sept 22, 2019

Ding ding ding. Dedicated read replica and an ETL gets you to a point where queries don't bring down prod. If you have an analyst org running wild making bad decisions about data that they think says things it doesn't -- that's probably a good sign that it's time for a dedicated data engineering team, and potentially a BI flavored data science team as well.

zbentley · on Sept 23, 2019

Analytics queries bringing down prod seems . . . pretty amateur hour. I'm more interested in whether or not analytics queries actually get the data they're interested in when they want it. The reporting team is likely not better versed in what means what than the developers who work on the application databases. What about multiple internal DBs that reporting wants to analyze as if they were one? What about schemas that change over time, obsoleting the analytics team's assumptions? Reliable, versioned data access APIs address both of those families of problems. Yes, it's harder than "YOLO query prod". It also works for longer without breaking, and jives with the scale out plan (usually discrete APIs, sharding, and then maybe microservices and more families of APIs if you're mature enough).

yowlingcat · on Sept 24, 2019

> Reliable, versioned data access APIs address both of those families of problems.

They only address it in so far as they push it downstream to the analyst, who as you mentioned, "is likely not better versed in what means what than the developers who work on the application databases."

There's a reason why datalakes exist, and having used them at past N companies, I think this is why data engineering of the BI flavor becomes necessary at the point that reporting becomes critical. An API is strictly worse than a datalake, and it's not hard to set up and maintain the latter. API versioning and communication are for frontend integrations, paid partner integrations and potentially (although I'd probably lean more on gRPC and the ilk) microservice to microservice interactions. But, I like to avoid building unnecessary API surface when I can.