
> Computers are just too complex and there are days when the complexity gremlins win.

I'm sorry for your data loss, but this is a false and dangerous conclusion to draw. You can avoid this problem. There are good suggestions in this thread, but I suggest you use Postgres's permission system to REVOKE the ability to DROP objects in production, except for one very special role that only a human can log in as, never a script.

And NEVER run your scripts or application servers as a superuser. This is a dangerous antipattern embraced by many an ORM and library. Grant CREATE and DROP to non-superusers instead.
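A minimal sketch of that split in Postgres (role names, passwords, and the `public` schema here are illustrative; adapt to your setup):

```sql
-- Day-to-day application role: can read and write rows, owns nothing
CREATE ROLE app_rw LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE prod TO app_rw;
GRANT USAGE ON SCHEMA public TO app_rw;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_rw;

-- Migration role: owns the tables, may CREATE and DROP them;
-- humans only, never scripts or app servers
CREATE ROLE migrator LOGIN PASSWORD 'change-me';
GRANT CREATE ON SCHEMA public TO migrator;
```

Note that in Postgres only a table's owner (or a superuser) can DROP it, so the key is making sure the application role doesn't own the tables it uses.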



As a mid-level developer contributing to various large corporate stacks, I would say the systems are too complex and it's too easy to break things in non-obvious ways.

Gone are the days of me just being able to run a simple script that accesses data read-only and exports the result elsewhere as an output.


This is why I am against the current trend of over-complicating stacks for political or marketing reasons. Every startup nowadays wants microservices and/or serverless and a mashup of dozens of different SaaS (some that can't easily be simulated locally) from day 1 while a "boring" monolithic app will get them running just fine.


I think we're hitting peak tech. All this "technical" knowledge just dates itself in a year's time anyway.

Eventually, you come to realise that the more tech you've got, the more problems you have.

Now developers spend more time googling errors and plugging libraries and web services together than writing any actual code.

Sometimes I wish for a techless, cloudless revolution where we just go back to the foundations of computing and use plain text wherever possible.


My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

I've yet to encounter a point in my career where KISS fails me. OTOH, this is nothing new; I don't have my hopes up that the current trend of overcomplicating things is going to change in the near future.


> Sometimes I wish for a techless, cloudless revolution where we just go back to the foundations of computing and use plain text wherever possible.

... because software in the 60s/70s/80s was so reliable and bug-free?!


It most likely had fewer moving parts & failure modes than a modern microservice mess.


It actually was. Shocking, isn't it?


For the most part, we are not complicating stuff. Today's requirements are complicated. We used to operate from the command line on a single processor. Now things are complicated: people expect a web UI, high availability, integration with their phone, email notifications, and 2FA, and then you have things like SSL/HTTPS and compliance, and you need to log the whole thing for errors or compliance or whatever.

Sometimes it's simpler to go back to a commandline utility, sometimes it's not.


All of these can be done just fine in a monolithic Django/Ruby/PHP/Java/etc app.


Yup, 100% agree. It may be that you will eventually need an auto-scalable message queue and API gateway, but for most people a web server and CSV will serve the first thousand customers.


There is sense in not building more services than you need. But many folks end up finding it hard to break away from their monolith and then it becomes an albatross. Not sure how to account for that.


If a team doesn’t have the engineering know how to split a monolith into distinct services in a straightforward fashion, then I’m not sure that team will have the chops to start with microservices.


Dealing with existing code and moving it forward in significant ways without taking down production is always much more challenging than writing new code, whatever form those new ways take.

You can get by with one strong lead defining services and interfaces for a bunch of code monkeys that write what goes behind them.

Given an existing monolithic codebase, you can’t specify what high-level services should exist and expect juniors to not only make the correct decisions on where the code should end up but also develop a migration plan such that you can move forward with existing functionality and customer data rather than starting from zero.


What you end up with is a set of tiny monoliths.


parallel rewrites before the current prod system ever hits performance problems :)


Sounds expensive


But potentially within the budget of a business with a large base of proven customers.

While annoying technically, for early stage startups, performance problems caused by an overly large number of users are almost always a good problem to have and are a far rarer sight than startups that have over-architected their technical solution without the concomitant proven actual users.


Please don't use CSV. At the very least use SQLite. But hosted SQL databases are probably the smart thing to do.


Do use CSV (and other similar formats) for read-only data which fits entirely in memory.

It is great for data safety -- chown/chmod the file, and you can be sure your scripts won't touch this. And if you are accessing live instance, you can be pretty sure that you won't accidentally break it by obtaining a database lock.

Now "csv" in particular is kinda bad because it never got standardized, so if you have complex data (punctuation, newlines, etc.), you might not be able to get the same data back using different software.

So consider some other storage formats -- there are tons. Like TSV (tab-separated values) if you want it simple; JSON if you want great tooling support; jsonlines if you want to use JSON tools alongside old-school Unix tools; protobufs if you like schemas and speed; numpy's npy if you have millions of fixed-width records; and so on...

There is no need to bother with SQL if the app will immediately load every row into memory and work with native objects.
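For instance, the jsonlines case is just a loop over `json.loads` (the data is inlined here for illustration; in practice it would come from a file):

```python
import io
import json

# Stand-in for open("employees.jsonl") -- one JSON object per line
raw = io.StringIO(
    '{"name": "Ada", "role": "eng"}\n'
    '{"name": "Grace", "role": "eng"}\n'
)

# Load every record into memory as native dicts; no database involved
rows = [json.loads(line) for line in raw]
print(rows[0]["name"])  # -> Ada
```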


> complex data (punctuation, newlines, etc..)

Oh, the irony! Text with punctuation and newlines is complex data.

CSV is doomed but the world runs on it and pays its cost with engineer tears.


I agree with the JSON suggestion, but what advantage is there to TSV versus CSV?

I have experienced pain with both characters (tab and comma), particularly when I am not the one creating the output file.


Tabs do not appear in common literature, so it's easier to justify requiring that inputs not contain them, which avoids a quoting or escaping mess.

Commas are _way_ too common.

CSV is an awful format anyway.


If you can make sure the data has no newlines or tabs, then TSV needs no quoting. It is just the "split" function, which is present in every language and very easy to use. When I use it, I usually add a check to the writer that there are no newlines or tabs in the data, and assert if this is not the case.

You use this tsv with Unix tools like "cut", "paste", and ad-hoc scripts.

There is also "tsv" as defined by Excel, which has quoting and stuff. It is basically a dialect of CSV (Python even uses the same module to read it) and has all the disadvantages of CSV. Avoid it.
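That write-time check and split-based read can be sketched in a few lines of Python (the function names are my own invention):

```python
import io

def write_tsv(rows, f):
    """Write rows of string fields as TSV, refusing any field that would need quoting."""
    for row in rows:
        for field in row:
            # Fail loudly at write time rather than silently corrupt the file
            assert "\t" not in field and "\n" not in field, f"unsafe field: {field!r}"
        f.write("\t".join(row) + "\n")

def read_tsv(f):
    """Read it back with a plain split -- no quoting rules to worry about."""
    return [line.rstrip("\n").split("\t") for line in f]

buf = io.StringIO()
write_tsv([["id", "name"], ["1", "Ada Lovelace"]], buf)
buf.seek(0)
print(read_tsv(buf))  # -> [['id', 'name'], ['1', 'Ada Lovelace']]
```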


> Please don't use csv

Could you elaborate? I'm interested in the specific reasons.


If you don't know how to use a database for whatever reason and are doing something as a proof of concept, then CSV is fine. But for anything serious: databases, particularly older, featureful ones like Postgres, have a lot of efficiency tricks, clever things like indexes, and nobody has ever come up with anything decisively better organised than the relational model of data.

If you use a relational database, the worst-case outcome is that you hit tremendous scale and have to do something special later on. The likely scenario is some wins from the many performance lessons databases have learned. The best-case outcome is avoiding a very costly excursion relearning lessons the database community has known since 1970 (like reinventing transactions).

Managing data with a CSV without a great reason is like programming a complex GUI in assembly without a reason: it isn't going to look like a good decision in hindsight. Most data is not special, and databases are ready for it.
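As a tiny illustration of how cheap those wins are, even the zero-setup end of the spectrum gives you indexes and declarative queries. A sketch with Python's built-in sqlite3 (the table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.0), ("alice", 5.0)],
)
# One line buys you an index, instead of hand-rolling lookup tables over a CSV
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
total = conn.execute(
    "SELECT SUM(total) FROM orders WHERE customer = ?", ("alice",)
).fetchone()[0]
print(total)  # -> 15.0
```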


Reliability, atomic updates, rollbacks, proper backups, history management, proper schemas, etc.


High-quality software which interacts with real transactional SQL databases is readily available, free, and easy to use. Building an inferior solution yourself, like a CSV-based database, doesn't make your product any better. (If anything, it will probably make it worse.)


It was never suggested to use a CSV database; it was suggested that for small amounts of read-only data, a CSV or other flat file is a better option, and I agree.



The way I understand sushsh's suggestion is that CSV is an alternative to an API gateway and message queue. I don’t think the suggestion was to replace a database with CSV.


"for most people a web server and csv will serve the first thousand customers"

I hear this kind of thought-terminating cliche a lot on here and it makes absolutely no sense.

If # of users is a rough proxy for a company's success, and more successful companies tend to hire more engineers... then actually the majority of engineers would not have the luxury of not needing to think about scalability.

With engineering salaries being what they are, why would you think that "most people" are employed working on systems that only have 1000 users?


"what is an MVP even", the post!


Microservices are easy to build and throw away these days. In startups, time to market is more important than future investment in devops. In terms of speed of delivery they are not worse than a monolithic architecture (also not better). For similar reasons, SaaS is and must be the default way to build IT infrastructure, because it has much lower costs and time to deploy compared to custom software development.


If you're talking about a web application or API back end with a smallish startup team, time to market is definitely going to be much longer for a microservices architecture compared to developing a "monolith" in a batteries-included framework like e.g. Rails, using a single database.


Just to be fair: You are combining microservice characteristics and "using a single database" in your argument.

Please also consider that especially for smallish teams, microservices are not required to be the same as big corp microservices.

I have encountered a trend towards calling surprisingly many non-monolithic things a microservice. So what kind of microservice are you all referring to in your minds?

edit: orthography


If you think it’s much longer, then you haven’t done it with modern frameworks. I’m CTO of a German startup, which went from a two-founder team to over 250 employees in 70 locations in 3 years. For us, microservice architecture was crucial to deliver plenty of things in time. Low coupling, micro-teams of 2 ppl max working on several projects at once... we did not have the luxury of coordinating monolith releases. Adding one more microservice to our delivery pipeline now takes no more than one hour end-to-end. Building an infrastructure MVP with CI/CD on test and production environments in AWS took around a week. We use the Java/Spring Cloud stack, one Postgres schema per service.


It is probably not what you intended, but this is what it sounds like: we have a hundred micro-teams of 2 working in silos on low-coupling microservices, and we don't have the luxury of coordinating an end-to-end design.

Edit: 2 questions were asked, too deep to reply. 1. You said 250 people, nothing about IT. Based on the info provided, this was the image reflected. 2. "the luxury of coordinating a monolith". Done well, it is not much more complicated than coordinating the design of microservices; some would argue it is the same effort.


That’s an interesting interpretation, but... 1. our whole IT team is 15 people, covering every aspect of automation of a business with significant complexity of the domain and big physical component (medical practices). 2. can you elaborate more on end to end design? I struggle to find the way of thinking which could lead to that conclusion.


What? I thought it was just the opposite. The advantage of serverless is that I pay AWS to make backups so I don't have to. I mean, under time pressure, if I do it myself I'll skip making backups, setting permissions perfectly, and making sure I can actually restore those backups. If I go with a microservice, the whole point is they already do those things for me. No?


What does serverless have to do with making backups? Any managed database can do this for you. Microservices attempt to facilitate multiple teams working on a system independently. They’re mostly solving human problems not technical.

The grandparent comment's point is that a single person or team can deploy a monolith on Heroku and avoid a huge amount of complexity, especially in the beginning.


I'm pretty sure the advantage of serverless is that you can use microservices for your MVPs. The gains for a one-man team might not be obvious, but I like to believe that once a project needs scaling it is not as painful as tearing down the monolith.


Why are those days gone? I do it all the time in an organisation with 10,000 employees. I obviously agree with the parent poster in that you should only do such things with users that have only the right amount of access, but that’s what having many users and schemas is for. I don’t, however, see why you’d ever need to complicate your stack beyond a simple Python/PowerShell script, a correct SQL setup, an official SQL driver, and maybe a few stored procedures.

I build and maintain our entire employee database with a Python script, from a weird, non-standard XML-"like" daily dump from our payment system, and a few web services that hold employee data in other required systems. Our IT then builds/maintains our AD from a few PowerShell scripts, and finally we have a range of “microservices” that are really just independent scripts that send user data changes to the 500 systems that depend on our central record.

Sure, sure, we’re moving it to Azure services for better monitoring, but basically it’s a few hundred lines of scripting that, combined with AD and ADDS, does more than an IDM with a 1-million-USD-a-year license.


Why are these days gone?

Just a few weeks ago, I set up a read-only user for myself and moved all modify permissions to a role one must explicitly assume. It really helped me with peace of mind while developing the simple scripts that access data read-only. This was on our managed AWS RDS database.


I'm in a similar position to you, but I'd say systems are as complex as their designers made them, and it's on you to change that.


Tom Scott made a mistake with a similar outcome to this article's, but with an SQL query that is much more subtle than DROP.

https://www.youtube.com/watch?v=X6NJkWbM1xk

By all means, find ways to fool-proof the architecture. But be prepared for scenarios where some destructive action happens to a production database.


He would not have done that if he were simply using a database transaction for this operation.
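The kind of guard a transaction buys you, sketched with Python's built-in sqlite3 (any transactional SQL database works the same way; the schema and sanity check here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, banned INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

# The classic mistake: an UPDATE that was meant to hit one row ("WHERE id = 2")
cur = conn.execute("UPDATE users SET banned = 1")
if cur.rowcount != 1:
    conn.rollback()  # sanity check failed: undo before the mistake becomes permanent
else:
    conn.commit()

print(conn.execute("SELECT COUNT(*) FROM users WHERE banned = 1").fetchone()[0])  # -> 0
```

Checking the affected row count before committing turns a silent disaster into a no-op.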


That’s exactly the point he’s trying to get across with that video.


> You can avoid this problem.

The article isn’t claiming that the problem is impossible to solve.

On the contrary: “However, we will figure out what went wrong and ensure that this particular error doesn’t happen again.”


If you use terraform to deploy the managed production database, do you use the postgresql terraform provider to create roles or are you creating them manually?


> You can avoid this problem.

No, you can't. No matter how good you are, you can always "rm -rf" your world.

Yes, we can make it harder, but, at the end of the day, some human, somewhere, has to pull the switch on the stuff that pushes to prod.

You can clobber prod manually, or you accidentally write an erroneous script that clobbers prod. Either way--prod is toast.

The word of the day is "backups".


excuse me, but no. this is harmful bullshit.

Yes, backups are vitally important, but no, it is not possible to accidentally rm -rf with proper design.

It's possible to have the most dangerous credentials possible and still make it difficult to do catastrophic global changes. Hell it's my job to make sure this is the case.


> not possible to accidentally rm -rf with proper design.

Can you say more about this?

I understand rm -rf, but not sure how I could design that to be impossible for the most dangerous credentials.


You can make the most dangerous credentials involve getting a keycard from a safe and multi-party sign-off, make it impossible to deploy to more than X machines at a time within a sliding window of application, build independent systems with proper redundancy and failback design, do canary analysis, etc. etc.

I didn't even mean you can only make it difficult, I meant you can make it almost impossible to harm a real production environment in such a nuclear way without herculean effort and quite frankly likely collusion from multiple parties.


He said "difficult", not impossible.


Just don’t use the most dangerous credentials.

The most dangerous credentials are cosmic rays and we use the Earth’s atmosphere and ECC to fight that.


Difficult, but not impossible. Which was the point, I think.


> but this is a false and dangerous conclusion to make

Until we get our shit together and start formally verifying the semantics of everything, their conclusion is 100% correct, both literally and practically.



