> Computers are just too complex and there are days when the complexity gremlins win.
I'm sorry for your data loss, but this is a false and dangerous conclusion to make. You can avoid this problem.
There are good suggestions in this thread, but I suggest you use Postgres's permission system to revoke the ability to DROP anything in production from everyone except one very special user that only a human can log in as, never a script.
And NEVER run your scripts or application servers as a superuser. This is a dangerous antipattern embraced by many an ORM and library. Grant CREATE and DROP to non-superusers instead.
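For concreteness, a minimal sketch of that kind of locked-down application role, assuming Postgres and psycopg2 (role names, schema, and connection string are made up). Postgres ties DROP to object ownership rather than a grantable privilege, so keeping the tables owned by a separate human-only role is one way to get the effect described above:

```python
# Sketch only: hypothetical role names and DSN; run once as an administrative user.
import psycopg2

LOCKDOWN_SQL = """
-- the application connects as a plain role with DML rights only
CREATE ROLE app_rw LOGIN PASSWORD 'change-me' NOSUPERUSER NOCREATEDB NOCREATEROLE;
GRANT USAGE ON SCHEMA public TO app_rw;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_rw;

-- no ad-hoc object creation from scripts; DDL stays with the owning admin role
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
"""

with psycopg2.connect("dbname=prod user=admin_human") as conn:
    with conn.cursor() as cur:
        cur.execute(LOCKDOWN_SQL)
```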
As a mid-level developer contributing to various large corporate stacks, I would say the systems are too complex and it's too easy to break things in non-obvious ways.
Gone are the days of me just being able to run a simple script that accesses data read-only and exports the result elsewhere as an output.
This is why I am against the current trend of over-complicating stacks for political or marketing reasons. Every startup nowadays wants microservices and/or serverless and a mashup of dozens of different SaaS products (some of which can't easily be simulated locally) from day 1, while a "boring" monolithic app would get them running just fine.
My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.
I have yet to encounter a point in my career where KISS has failed me. OTOH, this is nothing new; I don't have my hopes up that the current trend of overcomplicating things is going to change in the near future.
For the most part, we are not complicating stuff. Today's requirements are complicated. We used to operate from the command line on a single processor. Now things are complicated: people expect a web UI, high availability, integration with their phone, email notifications, and 2FA, and then you have things like SSL/HTTPS and compliance, and you need to log the whole thing for errors or compliance or whatever.
Sometimes it's simpler to go back to a commandline utility, sometimes it's not.
Yup, 100% agree. It may be that you will eventually need an auto-scalable message queue and api gateway, but for most people a web server and csv will serve the first thousand customers
There is sense in not building more services than you need. But many folks end up finding it hard to break away from their monolith and then it becomes an albatross. Not sure how to account for that.
If a team doesn’t have the engineering know-how to split a monolith into distinct services in a straightforward fashion, then I’m not sure that team will have the chops to start with microservices.
Dealing with existing code and moving it forward in significant ways without taking down production is always much more challenging than writing new code, whatever form those new ways take.
You can get by with one strong lead defining services and interfaces for a bunch of code monkeys that write what goes behind them.
Given an existing monolithic codebase, you can’t specify which high-level services should exist and expect juniors not only to make the correct decisions about where the code should end up but also to develop a migration plan that lets you move forward with existing functionality and customer data rather than starting from zero.
But potentially within the budget of a business with a large base of proven customers.
While annoying technically, for early stage startups, performance problems caused by an overly large number of users are almost always a good problem to have and are a far rarer sight than startups that have over-architected their technical solution without the concomitant proven actual users.
Do use CSV (and other similar formats) for read-only data which fits entirely in memory.
It is great for data safety -- chown/chmod the file, and you can be sure your scripts won't touch it. And if you are accessing a live instance, you can be pretty sure that you won't accidentally break it by obtaining a database lock.
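For what it's worth, the chmod idea is a one-liner at the end of a Python export script (the file name here is made up); once the write bits are gone, any later script that opens the file for writing fails immediately:

```python
import os
import stat

# make the exported file read-only (mode 0o444); accidental writes now raise PermissionError
os.chmod("export.csv", stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
```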
Now, CSV in particular is kinda bad because it never got standardized, so if you have complex data (punctuation, newlines, etc.), you might not be able to get the same data back using different software.
So consider some other storage formats -- there are tons. Like TSV (tab-separated values) if you want it simple; JSON if you want great tooling support; jsonlines if you want to use JSON tools with old-school Unix tools as well; protobufs if you like schemas and speed; numpy's npy if you have millions of fixed-width records; and so on...
There is no need to bother with SQL if the app will immediately load every row into memory and work with native objects.
If you can make sure the data has no newlines or tabs, then TSV needs no quoting. It is just the "split" function, which is present in every language and very easy to use. When I use it, I usually add a check to the writer that there are no newlines or tabs in the data, and assert if that is not the case.
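Roughly what such a writer (and the matching reader) can look like in Python -- a sketch, with the file name and field layout left up to you:

```python
# Write TSV with no quoting at all: refuse any field containing a tab or newline,
# so the file can later be read back with a plain split("\t").
def write_tsv(path, rows):
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            fields = [str(v) for v in row]
            for field in fields:
                assert "\t" not in field and "\n" not in field and "\r" not in field, \
                    f"illegal character in field: {field!r}"
            f.write("\t".join(fields) + "\n")

def read_tsv(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n").split("\t") for line in f]
```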
You can use this TSV with Unix tools like "cut", "paste", and ad-hoc scripts.
There is also "TSV" as defined by Excel, which has quoting and stuff. It is basically a dialect of CSV (Python even uses the same module to read it) and has all the disadvantages of CSV. Avoid it.
If you don't know how to use a database for whatever reason and are doing something as a proof of concept, then CSV is fine. But for anything serious, use a database: databases, particularly older, featureful ones like Postgres, have a lot of efficiency tricks and clever things like indexes, and nobody has ever come up with anything that is decisively better organised than the relational model of data.
If you use a relational database, the worst-case outcome is that you hit tremendous scale and have to do something special later on. The likely scenario is some wins from the many performance lessons databases have learned. The best-case outcome is avoiding a very costly excursion relearning lessons the database community has known about since 1970 (like reinventing transactions).
Managing data with a csv (without a great reason) is like programming a complex GUI in assembly without a reason - it isn't going to look like a good decision in hindsight. Most data is not special, and databases are ready for it.
High-quality software which interacts with real transactional SQL databases is readily available, free, and easy to use. Building an inferior solution yourself, like a CSV-based database, doesn't make your product any better. (If anything, it will probably make it worse.)
It was never suggested to use a CSV database; it was suggested that, for small amounts of read-only data, a CSV or other format file is a better option, and I agree.
The way I understand sushsh's suggestion is that CSV is an alternative to an API gateway and message queue. I don’t think the suggestion was to replace a database with CSV.
"for most people a web server and csv will serve the first thousand customers"
I hear this kind of thought-terminating cliche a lot on here and it makes absolutely no sense.
If # of users is a rough approximation of a company's success, and more successful companies tend to hire more engineers ... then actually the majority of engineers would not have the luxury of not needing to think about scalability.
With engineering salaries being what they are, why would you think that "most people" are employed working on systems that only have 1000 users?
Microservices are easy to build and throw away these days. In startups, time to market is more important than future investment in devops. In terms of speed of delivery they are no worse than a monolithic architecture (and no better, either). For similar reasons, SaaS is and must be the default way to build IT infrastructure, because it has much lower cost and time to deploy compared to custom software development.
If you're talking about a web application or API back end with a smallish startup team, time to market is definitely going to be much longer for a microservices architecture compared to developing a "monolith" in a batteries-included framework like e.g. Rails, using a single database.
Just to be fair: You are combining microservice characteristics and "using a single database" in your argument.
Please also consider that especially for smallish teams, microservices are not required to be the same as big corp microservices.
I have encountered a trend towards calling surprisingly many non-monolithic things a microservice. So what kind of microservice are you all referring to in your minds?
If you think it’s much longer, then you haven’t done it with modern frameworks. I’m CTO of a German startup which went from a two-founder team to over 250 employees in 70 locations in 3 years. For us, microservice architecture was crucial to delivering plenty of things in time. Low coupling, micro-teams of 2 ppl max working on several projects at once... we did not have the luxury of coordinating monolith releases. Adding one more microservice to our delivery pipeline now takes no more than one hour end-to-end. Building the infrastructure MVP with CI/CD on test and production environments in AWS took around a week. We use a Java/Spring Cloud stack, one Postgres schema per service.
It is probably not what you intended, but this is how it sounds: we have a hundred micro-teams of 2 working in silos on low-coupling microservices, and we don't have the luxury of coordinating an end-to-end design.
Edit: 2 questions were asked, too deep to reply.
1. You said 250 people, nothing about IT. Based on the info provided, this was the image reflected.
2. "the luxury of coordinating a monolith": done well, it is not much more complicated than coordinating the design of microservices; some can argue it is the same effort.
That’s an interesting interpretation, but...
1. our whole IT team is 15 people, covering every aspect of automating a business with significant domain complexity and a big physical component (medical practices).
2. can you elaborate more on end to end design? I struggle to find the way of thinking which could lead to that conclusion.
What? I thought it was just the opposite. The advantage of serverless is that I pay AWS to make backups so I don't have to. I mean, under time pressure, if I do it myself I'll skip making backups, setting permissions perfectly, and making sure I can actually restore those backups. If I go with a microservice, the whole point is that they already do those things for me. No?
What does serverless have to do with making backups? Any managed database can do this for you. Microservices attempt to facilitate multiple teams working on a system independently. They’re mostly solving human problems, not technical ones.
The grandparent comment's point is that a single person or team can deploy a monolith on Heroku and avoid a huge amount of complexity. Especially in the beginning.
I'm pretty sure the advantage of serverless is that you can use microservices for your MVPs.
The gains for a one-man team might not be obvious, but I like to believe that once a project needs scaling it is not as painful as tearing down the monolith.
Why are those days gone? I do it all the time in an organisation with 10,000 employees. I obviously agree with the parent poster in that you should only do such things with users that have only the right amount of access, but that’s what having many users and schemas is for. I don’t, however, see why you’d ever need to complicate your stack beyond a simple Python/PowerShell script, a correct SQL setup, an official SQL driver, and maybe a few stored procedures.
I build and maintain our entire employee database with a Python script, from a weird non-standard XML-"like" daily dump from our payment system and a few web services that hold employee data in other required systems. Our IT then builds/maintains our AD from a few PowerShell scripts, and finally we have a range of “micro services” that are really just independent scripts that send user data changes to the 500 systems that depend on our central record.
Sure, sure, we’re moving it to Azure services for better monitoring, but basically it’s a few hundred lines of scripting that, combined with AD and ADDS, does more than an IDM licensed at 1 million USD a year.
Just a few weeks ago, I set up a read-only user for myself and moved all modify permissions to a role one must explicitly assume. It really helped with my peace of mind while developing the simple scripts that access data read-only. This was on our managed AWS RDS database.
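For anyone wanting to replicate this, a sketch of one way to set it up in Postgres (role names are hypothetical; shown via psycopg2, but the same statements work in psql). The NOINHERIT on the login role is what forces the explicit SET ROLE before any writes:

```python
# Sketch only: hypothetical roles; run once as an administrative user.
import psycopg2

SETUP_SQL = """
-- day-to-day login role: SELECT only, and it does NOT inherit other roles' privileges
CREATE ROLE me_readonly LOGIN PASSWORD 'change-me' NOINHERIT;
GRANT USAGE ON SCHEMA public TO me_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO me_readonly;

-- writer role holds the modify privileges but cannot log in directly
CREATE ROLE me_writer NOLOGIN;
GRANT USAGE ON SCHEMA public TO me_writer;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO me_writer;
GRANT me_writer TO me_readonly;  -- membership allows SET ROLE, but only on purpose
"""

with psycopg2.connect("dbname=prod user=admin_human") as conn:
    with conn.cursor() as cur:
        cur.execute(SETUP_SQL)

# Later, in a session logged in as me_readonly, writes fail until you explicitly run:
#   SET ROLE me_writer;
```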
If you use Terraform to deploy the managed production database, do you use the PostgreSQL Terraform provider to create roles, or are you creating them manually?
Yes, backups are vitally important, but no it is not possible to accidentally rm -rf with proper design.
It's possible to have the most dangerous credentials possible and still make it difficult to do catastrophic global changes. Hell it's my job to make sure this is the case.
You can make the most dangerous credentials involve getting a keycard from a safe and multi-party sign-off, make it impossible to deploy to more than X machines at a time within a sliding window, use independent systems with proper redundancy and failback design, canary analysis, etc. etc. etc.
I didn't even mean that you can only make it difficult; I meant you can make it almost impossible to harm a real production environment in such a nuclear way without herculean effort and, quite frankly, likely collusion from multiple parties.
> but this is a false and dangerous conclusion to make
Until we get our shit together and start formally verifying the semantics of everything, their conclusion is 100% correct, both literally and practically.