
It should be stated that the flip side of not giving up is incurring the massive opportunity cost of years of lost compensation. That can easily be over a million dollars for a good software engineer.

If you're a genius, then sure, not giving up makes a lot of sense. But if you're just pretty smart, you better think two, three, or four times before committing to do a startup. It's really easy to end up over thirty and broke.


> add each number in a list x to each number in the corresponding row of a two-dimensional array y

I mean, I could write a function `add(x, y)` that does that in any language. You could even inspect the data or type to make it polymorphic. I believe NumPy does this, for example.
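
For instance, NumPy broadcasting handles the quoted operation in one expression. A minimal sketch (the shapes here are my own assumption):

    import numpy as np

    x = np.array([10, 20, 30])        # one number per row of y
    y = np.arange(12).reshape(3, 4)   # a 3x4 two-dimensional array

    # Broadcasting: reshape x to a column so x[i] is added across row i of y.
    result = y + x[:, np.newaxis]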

Skimming the rest of the article, I don't get how this is different from any other language. The primary difference seems to be the function names are all one or two characters long for some reason. I must be missing something...


The fact that they're built into the language is indeed significant. Yes, you obviously could implement J's set of primitives in nearly any language and then use those. In practice though, you won't, and if you did then many of the people reading the code wouldn't understand it very well.

Imagine if lodash was built into JS, and optimized by the implementations to the point where it was faster than not using it. Idiomatic JS would look very different even though it hasn't made anything possible that wasn't previously possible.


> "These planets are much, much too far away. Even in the very optimistic case of a livable planet that is not too far, say a few dozen light years...We are talking about hundreds of millions of days using the means we have available today. We must take care of our planet, it is very beautiful and still absolutely liveable."

That's actually a good point and one I hadn't really thought about.

The only way I can think of for humanity to get to these other planets would be to build a network of self-sustaining space station colonies. And if we can do that, we don't really need planets any more.


Process 100,000 rows?


Embodied cognition is definitely a thing: https://en.wikipedia.org/wiki/Embodied_cognition

And in software development, the primary bottleneck is thinking. Anything you can do to enable greater clarity of thought is going to have huge leverage. So I think this is a super promising angle.

Bret's actively developing his ideas at DynamicLand (http://dynamicland.org), but clearly it will be quite some time before that tech becomes viable in the real world.

Now I'm wondering if there are simpler ways to use the rest of my body to facilitate thinking. Like, I can't mount lasers and network-connected cameras on the ceiling... but maybe there's some software that can use a webcam to do simple gesture recognition mapped to bash scripts or something like that.


This was my position for a while. ORMs introduce a layer of magic which obscures what's actually going on under the hood. I decided I would just make raw SQL queries and handle mapping data explicitly.

I quickly ended up with a lot of duplicated code. So then I thought, "Well ok, I should add a bit of abstraction on top of this..." I started coding some simple functions to help map the tabular data to objects. One thing led to another and suddenly I looked at what I had done and said, "Wait a minute..."
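
The kind of helper I mean, as a hypothetical Python sketch (not the actual code I wrote):

    def rows_to_dicts(cursor):
        """Zip column names from cursor.description onto each row."""
        cols = [col[0] for col in cursor.description]
        return [dict(zip(cols, row)) for row in cursor.fetchall()]

    def rows_to_objects(cursor, cls):
        """One step further: hydrate a class per row. And so the slope begins."""
        return [cls(**row) for row in rows_to_dicts(cursor)]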


There is a big difference between just writing helper functions to construct SQL and convert data types, and OO-style magical auto-persisted objects. The latter is what I don't like about ORMs but the former is fine. I feel that this is an important distinction to make.

As an example, the sqlalchemy docs[0] make this very clear: there's an ORM, but there's also just a core expression library that simply helps you connect to the database and construct queries.

[0]: https://docs.sqlalchemy.org/en/13/
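
For instance, staying entirely in the core layer looks something like this. A minimal sketch (the table, columns, and DSN are made up), using the 1.3-era `select([...])` call style:

    import sqlalchemy as sa

    engine = sa.create_engine("postgresql:///example")  # hypothetical DSN
    metadata = sa.MetaData()

    # Just a table definition: no mapped classes, no session, no identity map.
    books = sa.Table(
        "books", metadata,
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("title", sa.Text),
        sa.Column("author_id", sa.Integer),
    )

    query = sa.select([books.c.id, books.c.title]).where(books.c.author_id == 42)
    with engine.connect() as conn:
        rows = conn.execute(query).fetchall()  # plain rows, not magic objects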


Agreed. Helpers (and indeed types) can make working with SQL an actual pleasure. You do need to learn the SQL, though. (My TypeScript/Postgres solution, in this vein: https://github.com/jawj/mostly-ormless/blob/master/README.md).


I've never been a fan of codegen, but I think I could get past it for this library - it looks great!

I love how it lets you use SQL while taking full advantage of TypeScript's wonderful typing system to give you intellisense and compile-time checking. Reminds me a bit of the SQL type provider for F# (which I was amazed by when I first saw it in action).

I really like the way the readme has been written too - it gives a real insight into the thought processes that led to the final result.


> I've never been a fan of codegen, but I think I could get past it for this library - it looks great!

Good news. :)

Maybe we can agree that typegen smells less than codegen?

> Reminds me a bit of the SQL type provider for F# (which I was amazed by when I first saw it in action).

I should look that up — sounds interesting.


I worked for a bit on a codegen-based TypeScript Postgres builder, but haven't had time lately to build it out - https://github.com/Sammons/morbid

I really think typescript would benefit from a good solution to this.


I do like the code-gen solution a lot. You can create code that is far less bloated than a generic framework.

I am old-fashioned. I like to start with the database schema and generate code from that. When I make a change in the schema, I regen the code. Thanks to partial classes in C#, I can persist customizations between code-gens if necessary.


Wow this is great! Very well written README.

What just blew me away is the thing with the `JOIN` and the `to_jsonb(authors)`, all with complete typing support for the nested author object. I was actually looking to use a classical, attribute-driven query generator (with the sort of chaining API everyone is used to: `tableName.select(...columns)` etc.) for my next project, which may involve replacing/wrapping/rewriting a Rails app and its ORM with TypeScript and Node. Maybe I'll try this instead; I'm already half sold. Just worried about forcing colleagues to learn SQL instead of using a fancy wrapper.


I'm also awed by this!

> Just worried about forcing colleagues to learn SQL instead of using a fancy wrapper.

My current team is pretty junior, and I don't see any problem with this. Simple SQL queries are really easy to learn, and complex queries are harder to understand with ORMs than in raw SQL.

Moreover, knowing SQL is a useful, marketable skill that will stay relevant for many years to come. If there's some resistance, I can easily convince the team that going this route will benefit them personally.

Back to the README, there are two questions I'd like to see addressed:

1. Whether `Selectable[]` can be used to query for a subset of fields and how.

2. In the `to_jsonb(authors)` example, what would you get back in the `author` field if there were multiple authors with the same `author.id` value? An array of `author.Selectable` objects? This part is awesome but brittle, isn't it?

I would love to see this move forward! I will definitely play with it and consider it for my next project.


> I'm also awed by this!

:)

> 1. Whether `Selectable[]` can be used to query for a subset of fields and how.

Right — this is not (currently) supported. I guess if you had wide tables of large values, this could be an important optimisation, but it hasn't been a need for me as yet.

> 2. In the `to_jsonb(authors)` example, what would you get back in the `author` field if there were multiple authors with the same `author.id` value? An array of `author.Selectable` objects? This part is awesome but brittle, isn't it?

Multiple authors with the same id isn't going to happen, since id is intended as a primary key, so I'd argue that the example as given isn't brittle. On the other hand, there's a fair question about what happens for many-to-many joins, and since my use case hasn't yet required this I haven't given it much thought.


OK, I gave the one-to-many queries a bit more thought, and the converse join query (getting each author with all their books, rather than all books each with their author) works nicely with a GROUP BY:

    type authorBookSQL = s.authors.SQL | s.books.SQL;
    type authorBookSelectable = s.authors.Selectable & { books: s.books.Selectable };

    const
      query = db.sql<authorBookSQL>`
        SELECT ${"authors"}.*, jsonb_agg(${"books"}.*) AS ${"books"}
        FROM ${"books"} JOIN ${"authors"} 
          ON ${"authors"}.${"id"} = ${"books"}.${"authorId"}
        GROUP BY ${"authors"}.${"id"}`,

      authorBooks: authorBookSelectable[] = await query.run(db.pool);

This exploits the fact that selecting all fields is, logically enough, permitted when grouping by primary key (https://www.postgresql.org/docs/current/sql-select.html#SQL-... and https://dba.stackexchange.com/questions/158015/why-can-i-sel...)

I'll update demo.ts and README shortly.


Right, only querying a few fields seems not to be a built-in feature. Looks like you have to create the partial selectable type yourself, and there is no support for typechecking that the correct columns are included in the select.

As for your second case: if I recall correctly (ActiveRecord made my SQL skills fade away), this plain JOIN would just return a row with the same book but a different author; `to_jsonb(authors.*)` is just operating on a single row. But what you want (aggregating rows into a JSON object) is possible by using `jsonb_agg`. Whether the lib supports inferring the correct typings for that is another question though.


> Just worried about forcing colleagues to learn SQL instead of using a fancy wrapper

I'd argue that learning SQL is essential for any developer.

It's also a "reusable" skill that will stand them in good stead for decades - whereas learning how to use the fancy wrapper is only useful until the next new shiny comes along.


I’d add that it’s essential so you can understand how to optimise and debug a query. You lose a lot of power if you can’t open up a console to describe or explain things.

The long-standing ORMs do a pretty decent job of writing efficient queries these days though. You can go pretty far without knowing much and that’s not a bad thing either.


This looks great. It looks similar to jOOQ.


I'm not even someone that has used multiple ORM styles extensively, but it is disturbing/darkly humorous how many ORM libs there are.

In that last post, you could map a half dozen Java frameworks to each of the acts.

Personally I never found an ORM that tracked which attributes of objects were actually mutated, so that only mutated columns would be updated/inserted, but again I never did a lot of ORM work.
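
The core of the idea seems small, though. A hypothetical Python sketch of per-attribute dirty tracking:

    class DirtyTracked:
        """Remember which attributes changed after load, so an UPDATE
        can touch only those columns."""

        def __init__(self, **fields):
            object.__setattr__(self, "_dirty", set())
            for name, value in fields.items():
                object.__setattr__(self, name, value)  # initial load isn't dirty

        def __setattr__(self, name, value):
            self._dirty.add(name)
            object.__setattr__(self, name, value)

        def update_sql(self, table_name, key="id"):
            sets = ", ".join(f"{c} = %({c})s" for c in sorted(self._dirty))
            return f"UPDATE {table_name} SET {sets} WHERE {key} = %({key})s"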


I really like this. There is only one thing that bothered me. I'd rather pass the job to the pool than pass the pool to the job... something like...

    const existingBooks = await pool.exe(select("books", { authorId }));

or

    const existingBooks = await pool(select("books", { authorId }));


QueryDSL (http://www.querydsl.com/) does something like this for Java. It can generate classes from tables, but even with those, all queries / statements that hit the database are manually built using a query builder to avoid syntax and type errors. I.e. no caching or automatic database updates.


knex.js is another example of such a query builder library in the Node.js ecosystem


Fully agree! Some mapping code is _required_ in your application; otherwise it wouldn't be able to talk to your database at all.

However, I've never understood why people write this mapping code manually. I believe in code generation tooling as a potential solution for this (where types and maybe a full data access API are auto-generated based on the database schema).


Yeah, ORMs have grown to mean more than mapping relational data to objects. An example of "just" this mapping can be seen in PureORM[0].

[0]: https://github.com/craigmichaelmartin/pure-orm


> There is a big difference between just writing helper functions to construct SQL and convert data types, and OO-style magical auto-persisted objects. The latter is what I don't like about ORMs but the former is fine. I feel that this is an important distinction to make.

What's the big difference? Why do you like the former but not the latter? What are the characteristics of the former that makes it distinct from the latter?


Out of curiosity, what platform and tech were you using? I am making the assumption of a predominantly OO one, based on the virtues of ORM. I have always found that when I try to build back-end or middleware platforms with OO-dominant languages (read: Java, C#, et al.), there quickly develops an impedance mismatch, and any communication with the database becomes a monster of mapping OO philosophy to relational theory, whether that be via home-rolled code or an ORM.

That being said, I personally have found that I do not like OO languages for back-end dev. I find that functional languages such as any variety of LISP marry extremely well to the transactional and process-oriented nature of back-end systems, as well as lend themselves to not having to jump through hoops to contort the data into relational sets (Clojure's destructuring is an absolute lifesaver here). I find that there is little to no duplication of code in regards to transferring data to the db. You may want to give Clojure or F# (depending on your stack) a try for your back end and see if it does not alleviate a host of issues with trying to develop a process- and transaction-oriented system, a definition most back ends fit.

I find the converse to be true for the front end. I find most attempts to deal with the UI in anything other than objects and components (read: jQuery, React Hooks) turn to spaghetti rather quickly.

If you are using OO languages to communicate and transfer data to the DB, you may very well be trying to solve an impedance mismatch that is easily avoided by using a functional language.


I can recommend hugsql to anyone who wants to work with SQL in Clojure. You basically write pure SQL with some minor templating helpers and then you get the data in a map with Clojure data types. Very nice and minimal overhead: https://www.hugsql.org/


> I find the converse to be true for the front end. I find most attempts to deal with the UI in anything other than objects and components (read: jQuery, React Hooks) turn to spaghetti rather quickly.

What alternatives have you tried?


ClojureScript, Reflex, Grapefruit, Seesaw, and a host of others. It's my opinion (so take it with a grain of salt), and it very well may be the way my brain works, but I just find functional doesn't marry well to UI development. For the service- and process-oriented stuff associated with the front end I think it is great, but when it comes to modeling components, I find objects and inheritance work far better.

This is one of the reasons I have long been a huge proponent of Javascript despite its warts, as it can be OO when I need it to be OO and functional when I need it to be functional.


I've used ClojureScript with re-frame professionally for over 2 years, and it's been the first time in over 20 years of development that I've enjoyed working on the front-end. Unidirectional data flow with a single app atom has been a dream to work with.


It was a Flask app using SQLAlchemy (so Python). I'm not sure functional programming would have changed the situation much. I imagine there would still be repeated patterns involving reading and writing to the database in slightly different ways, and it would still make sense to use some sort of library. But I haven't used functional languages much, so I can't say for sure either way.


Well, the difference is that in a data-oriented language, you do not need to map objects to relations. You can get the data back in the relational format and use it as is in your app. So you don't need an ORM. You might still have a library to help you build dynamic SQL queries, but no object-relational mapping is needed.


> Well, the difference is that in a data oriented language, you do not need to map Objects to relations. You can get the data back in the relational format and use it as is in your app.

You can do that in an OO language, too; relations (whether constant or variable, and whether their data is held locally by the program or remotely, as in an RDBMS) are perfectly valid objects.


Query builders != ORMs.

A query builder carefully preserves the underlying relational and RPC semantics and exposes all of that to the user in an easier-to-use form. That’s just good cautious modest abstraction.

An ORM believes it knows way better than those dumb RDBs how a database ought to behave by ingeniously pretending that everything you’re dealing with is just nice simple familiar arrays of local native class instances. Which, like all lies, ends up spawning more lies as each one starts to crack under inspection, until the whole rotten pile catastrophically collapses under the weight of its own total bullshit.

And of course it goes without saying which of these approaches our arrogant grandstanding consequence-shirking industry most likes to adopt.


But now you know it's not magic :)

Maybe the problem is not that you don't need a mapping layer, but that ORMs are obscure. And maybe they are obscure not because SQL is such a cursed spot, but because object-oriented programming ITSELF drifts toward obscurity and magic. Don't you get the same feeling of obscurity about other libraries, e.g. web servers or clients? I often find the bare specs much clearer than the (supposedly simplified) OO libraries that implement them.


Yes.

If you stick to using the ORM for what amounts to (mostly) just PODs, it's syntactic sugar that can really help readability.


Sometimes you just should live with duplicated code. It's OK.


No, it is not okay. If you need insert/update/select for every object/table, that is way too much duplication. It becomes very irritating when the schema changes. There should be table metadata in such a case, but one should also know what one is doing. Having no idea that 14 joins are being done under the hood is not a good situation either.


Well sort of. In my view, duplicate SQL chunks that have defined business logic should either be a new table/view or extremely well-documented with really rigid communication policies for changes.

For example, consider a company with many data analysts/scientists who may each be writing their own queries. If the definition of some “very important” company metric changes, then a large number of dispersed queries would need to change.

But an ORM isn’t the answer for the above situation either.


It's relative: if the duplication is that large, maybe you do need to abstract.

It also sounds like you would be well served using a service abstraction at that point to remove the data layer from client scope entirely.

The "model changes, now we have to change it every where" isn't going to be solved by abstraction, it's only limited by the amount you're willing to limit access to the underlying model, if you need that information, you need to share the model.

The best solution to this I've seen in practice is domain modelling, colocating shared code near other users. When things get too distant you start using anti corruption layers which allows more flexible model changing.

But at the end of the day this is essential complexity; an ORM, or any other solution, is never going to be able to hide the fact that you need information elsewhere in the system to be useful.


Thank you! You just said something I very much agree with but never dared to say out loud.


I think that the way to say it without starting a war is to preface it with something like:

> Well, redundancy and dependency both have downsides, but in this case...


But you shouldn't live with a hand-rolled pseudo-ORM that stumbled into existence when there are developed alternatives.


Why not?

I've used hand rolled pseudo ORMs before.

I prefer just plain SQL but for the application I had there was a common access pattern that was worth abstracting out in a DRY sense.

That doesn't mean I want or need a complete ORM. Just consistent access to certain table types.


I’ve never seen a homegrown ORM that was better than a third party one. Whenever there is an issue - and there are always issues - you have to dig into the code, because they are never documented well.

There is usually a feature that no one thought about and then you have to make modifications to the custom ORM and you get an even bigger mess.


Better is a subjective term.

I wouldn't say what I built was a better ORM. But I would 100% say it's a better solution to the problem I faced.

It didn't get in the way of writing SQL, but it did reduce the boilerplate and repetitive gruntwork.

"Issues" - no, it was a function call deep and incredibly clear and close to what was happening.


sort of agree but also: the good third party ORMs started life as homegrown ORMs.


Usually third-party stuff has documentation; home-grown stuff often doesn't. And it's usually better tested.


Entity Framework didn’t.....


You just said "you know, if you wanted to have a shitty burger, you can get it right there for half the price of a national chain and it will be at least as good"


> I've used hand rolled pseudo ORMs before.

Is your pseudo ORM as well documented as a commonly used ORM? Can I google (or search your wiki) for common issues?


There's a middle ground.

Micro ORMs.


A micro ORM is just an ORM, written well and modularly. It isn't a middle ground - it's choosing to use a well-written library.

A lot of people conflate ORMs leaking because of a poorly designed library with ORMs being a bad abstraction in general.


One of the most popular micro ORMs for C# is Dapper, which is used by Stack Overflow.

There is no real abstraction. You write standard SQL and it maps your recordset to an object. You know exactly what code is running.

There are extensions that will take a POCO object and create an insert statement, and I believe updates, but where ORMs usually get obtuse and do magic is Selects. It's hard to generate a suboptimal Insert or Update.


So... the pattern I see emerging: use the ORM for the common stuff and execute SQL for complicated queries (like reports).


Ruby on Rails’ ActiveRecord, for all its heft, is excellent at this. You can use raw SQL any time you like. It was an explicit design goal from day 1.

There are times I dislike things about it, and it can be quite heavy, but it’s very easy to mix and match ActiveRecord ORM code and raw SQL even within a single model class.


That’s how I’ve done it on my last two projects. We used TypeORM for the standard repeated simple queries, and then wrote custom SQL for our complicated queries that the ORM failed at and then just executed them with the ORM. It was really nice and made for easier table refactors because we didn’t have to go through and audit every query that was calling that table.


TypeORM is a step in the right direction for JS ORMs, but it's like 1/8th of the way there IMHO. A nearly fully typed ORM is possible now with TypeScript, and of course Proxies are out now. TypeORM was doing too much ad hoc string building under the covers as well. I believe a SQL AST is the way to go: it can be transformed and compiled to database-specific SQL, allowing for things like predicate push-down, optimization, and a sane way to implement db-specific optimizations and extensions.


I really like your point about type-safety! I think one major issue with the current ORMs in the Node.js/TypeScript ecosystem (Sequelize, TypeORM, ...) is that they're not fully type-safe. As you mention, TypeORM is definitely a step in the right direction here, but the way it leverages the TypeScript type system isn't where it could be. I work at Prisma, where we're building [Photon.js][1], an auto-generated database client with a fully type-safe data access API.

[1]: https://github.com/prisma/photonjs


Actually, DDD recommends that domain objects in general should be loaded by id only, and that complicated queries for grids should load projections.


And for ETL and data loads. An ORM usually isn't going to do a multi-insert statement or an update that involves more than one table.


The informal definition I have of a micro ORM is an ORM without an identity map and without lazy fetching through proxy properties. Are there any more concrete definitions?


> There's a middle ground.

> Micro ORMs.

And there's the mystical fourth option of simply not bothering with objects in the first place. No objects, no need to do object-relational mapping.


So then you’re mapping to whatever data structure your app uses instead of objects. In OOP languages like Python, everything is some type of object anyway.


That’s just a semantic game. If your language returns the result of a query as a generic array of generic dictionaries (or whatever), that isn’t mapping, nor is it object oriented in principle.


But then your generic array of generic dictionaries needs to be mapped to whatever data structures make sense for your application. ORMs save you that step.


I've written many apps and I've never experienced what you've described. I write my queries to return exactly the right data in exactly the right format in exactly the right order—so I can go straight from the generic data structures to the screen interface or document layout.

Nothing says the structure which makes sense for an application can't be a generic array of generic dictionaries.

If your favourite programming language forces you to go through the silly hoops of data mapping in order to do useful things with query output, I can understand why an ORM might make sense for you.
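
To make that concrete, a minimal Python sketch (psycopg2 here; the table and connection string are made up):

    import psycopg2
    from psycopg2.extras import RealDictCursor

    conn = psycopg2.connect("dbname=example")  # hypothetical connection string
    with conn.cursor(cursor_factory=RealDictCursor) as cur:
        cur.execute(
            "SELECT title, price FROM books WHERE author_id = %s ORDER BY title",
            (42,),
        )
        books = cur.fetchall()  # a plain list of dicts, ready for the template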


Also: implicit row mappers, so one can still do the queries manually without writing all the OO glue.


An in house solution is almost always better than an external dependency


This is correct. An in-house solution is a solution developed in-house for your specific problem, which no one else has ever had exactly. The more specific the need, the more the benefit of the made-to-measure solution. The alternatives are something your organization didn't develop, which may be better, but you don't know how to use it, or may be worse, but you don't know that when you pick it, or may be slower, but you don't know that when you start using it, or may have vendor lock in, but you don't know that when they sell it to you as "open", or may have hidden pitfalls, but they aren't in the glossy brochure, or may be unmaintained by anyone except your org in ten years, but you can't know that until ten years from now, or may be full of security holes because it was developed by idiots, but you can't know that because you didn't see who wrote it, or might be full of solid security features and a great design cleverly compromised by a hidden flaw placed in a specification you haven't read by a nation state, but you don't see that because why would you, or... etc etc etc. <sarcasm>But don't worry, at least you didn't have to understand the problem space well enough to be able to sit down and solve it yourself, so you sure saved some effort there!</>


Isn't this an argument against using any library at all?


Yes!

Which bring us to the topic of tradeoffs and the synthesis of balance, by way of weighing competing advantages and costs fairly.

On the one hand, code you must write and understand. On the other, code someone else wrote, that you can just use. There is no clear winner here. It's always a tradeoff.


Haha, so enterprise-chat.

I rewrote our entire database layer in Hibernate for our (incredibly complex monolith) webapp. Then I was tasked with rewriting a major core piece of search functionality that builds a query from user selections / saved queries.

I was told that contrary to previous work, Hibernate Criteria would not be allowed, since it was deprecated. Hibernate's official replacement for programmatic queries is JPA Criteria, but Hibernate's support for this was not feature equivalent to Hibernate Criteria, so this was out too.

So what I got the green light to go on was rewriting my own pseudo-ORM wrapper that generates HQL query strings and parameters. HQL is not deprecated, you see.

It's ended up working out moderately well: it's a thin layer, and as long as you avoid the rough edges it works fairly well, as well as providing a convenient point to translate query language from the old Kodo format into Hibernate (cringe, code smell).

There have been times I've had to do some very awkward query shit that I've only managed to lever in via HQL. You have no idea: views on top of views.

No idea what'll happen after I leave, that's their problem!

Thanks for the job security, Hibernate team. Your incredibly-poorly-executed transition from a well-supported standard to the "new new" has been exquisitely great for my job security.

Bet there's Python3 devs who feel the same way!


> JPA Criteria

Feature parity aside, I also found this to perhaps be the most verbose API for query building I've ever used.


Hah... I'm probably falling for Poe's law here, but anyway... there are certainly cases where in-house is better than an external dependency - specifically when your team knows the tech domain better than anyone external can - but in general, well-maintained external dependencies (preferably open source with a community, or backed by a well-funded company) are almost always better. They usually have years of fixing edge cases and features that you would inevitably run into if you were to roll your own.


They’re also tailoring their solution to be as generic as possible.

Having written OSS and also having written enterprise applications, it seems plainly obvious to me why a homegrown solution is preferred. Code developed internally is understood by the team (you may not understand the underlying implementation of a dependency), and can be tailored exactly to suit your needs (ignoring edge cases that aren’t relevant, removing unneeded features). And you never have to worry about maintainers disappearing, breaking changes being introduced, or bugged releases that you can’t do anything about.

I don't mean to sound crass, but how on earth could you think this is an example of Poe's law? What's so extreme about being a responsible developer? I didn't say "every solution should be developed in house" (though I think most large projects would be better for it!). Obviously there is a cost associated with in-house solutions, and you should gauge that cost to see if it's worth it for your application. But if you're going to be working with that application for years and years to come, then I highly recommend trying to write your own code instead of relying on libraries.


To counter your arguments, I'm going to use a couple of typical examples of when an in-house versus open-source / external debate comes up. I'm not counting the infamous "leftpad" cases, those are usually trivial, and really don't matter in the grand scheme of things. If it's a one-liner, just implement it yourself.

1. A high-level database or queue lib, a custom/powerful serialization lib, or (relevant to this topic) an ORM or another foundational/low-level part of your tech stack.

What you can expect to happen is a bunch of very good programmers early on build powerful abstractions using macros, metaprogramming, advanced type system concepts and build up a codebase adding up to a few thousands of lines. It just works, it's a good system - a few bugs are patched by the team every month, but that's fine. Fast forward a few years, the programmers have moved on, "onboarded" the rest of the team to the codebase during their respective last week, but given how complicated the codebase is no one is really capable of debugging it and fixing issues. And given that it's not open-source, it never got an opportunity to build a community of contributors. Your team is now SOL, and it's going to take _months_ to replace it with a more well-maintained open-source solution.

2. Building an A/B testing implementation in-house - again, a couple of good programmers build a working, scalable, basic system in a weekend. It actually works and the code is good, simple, readable and well-tested. But then your PM team or your Marketing team wants you to add graphs. Then export the data to Redshift. Then tweak the algorithms powering the backend. Then multi-arm bandit. And so on. Now, what was once a weekend project turns into months of work - whereas there exist paid services that do this for you.

Sometimes it's unavoidable: external alternatives are genuinely not good. But I strongly think you have to be very, very careful about building systems in-house when they are not your business.

> I don’t mean to sound crass but how on earth could you think this is an example of Poe’s law.

Sorry for this. I do feel quite strongly against your original comment (at least the way it was written without context), and I think it's the _opposite_ of being a "responsible developer" in all but edge cases, and think you are wrong. But calling it an example of Poe's law was not right on my part, and was harsh.

> But if you’re going to be working with that application for years and years to come then I highly recommend trying to write your own code instead of relying on libraries.

I've done this, and have done both in-house and OSS code - but in-house very reluctantly, for example when there are just one or two maintainers committing code and there's no alternative. But even then, I have usually forked the code and used that or parts of it as the base, rather than starting from scratch.


There is hardly ever a time when an in-house solution is better than a third-party one for cross-cutting concerns. Most of the packages are open source.


Idunno man, my day job working Rails code uses a custom mailer and job queueing system, and every time I have to work with it I really wish they'd used ActionMailer and ActiveJob.


Like I said, “almost” always. Really the larger the application and the longer time you as a dev will work with it, the more meaningful it becomes to write your own solutions.

It's really a balance, but I don't think it's a balance most devs consider, and they really should.


While I agree with you, I'd like to point out there are interesting exceptions: software components that can never be "complete". Such components require a permanent maintenance workforce, and you might not want to dedicate resources to this.

Such as:

- API abstraction layers (like SDL, Allegro, SFML, etc.): you want to support new operating systems / new APIs by default. And most of the time, you don't want to spend time learning about the specifics of X11 window creation or Win32 events, as this would be throw-away knowledge anyway.

- hardware abstraction layers: you want to support new hardware by default; this is why we use operating systems and drivers.

- Format/protocol abstraction layers: if your game engine only uses JPEG files coming directly from your in-house asset pipeline, it's perfectly fine to develop in-house loaders (from scratch or from stb_image). But if your picture processing command-line tool aims to support every file format (especially the ones that don't exist yet), then you should rather go with an updatable third-party library, which will get you all new formats by default.

- all kinds of optimizers, including compilers, code JIT-ters, audio/video encoders, etc. More generally, all code that uses some heuristic to try to solve a problem that's not completely solved/solvable. You might be ready to accept the performance of a specific version of, for example, libjit. But you might instead consider that in your case, not having state-of-the-art JIT performance would be detrimental to your business, in which case you want to get the performance enhancements by default.


Lack of testing, lack of documentation and lack of use would be reasons that your claim is usually untrue. You can't Stack Overflow a problem and see if anyone else has encountered it before.


This is a terrible reply, what are you even trying to say? You can’t stack overflow a problem so don’t write your own in-house solutions? Lack of testing? We write our own tests. We write our own documentation.

It's crazy to me how many people on HN are ignorant of the costs of third-party dependencies and the benefits of in-house solutions when building large applications.


I am trying to say that most of the home-grown solutions I have seen have been pretty poor quality, and lack documentation especially. Do enough maintenance programming and you will understand.

If you do test and document your own stuff properly you are in a small minority. Why not release it for others to use?


In house means more customized to the specific problem but with far less expertise in the general technology. I find the latter almost always outweighs the former when working at any cost center tech shop.


Otherwise known as NIH on steroids.


Opposite; an in-house solution is almost always worse than an external dependency when that dependency is something as important to get right as an ORM.


The best of both worlds is to write your own universal preprocessor...


The problem in many cases is actually in the OO part, in my experience: in the vast majority of cases where databases and persistence are concerned, staying in the procedural/structured + relational world keeps things simple, whereas objects often obscure what is actually happening and invoke opaque magic such as ORMs.

I wonder what your experience would have been if, after dropping the ORM, you had gone one step further and dropped the objects.


This.

After contemplating my distaste for ORMs more carefully, I've come to the realisation that my objections aren't so much to do with the concept of an ORM but rather object orientation itself—and the fetish of treating it as the perfect hammer for every nail.

For the projects I've worked on, I've almost never wanted to turn data into objects. And on the occasions when I've thought otherwise, it usually turns out to be a mistake; de-objectifying can often result in simpler, shorter code with fewer data bugs.

Ultimately, the right answer depends on the nature of your particular business logic, how data flows in your wider ecosystem, and pragmatically, the existing skills of your workforce.


Hahaha! ^_^

Seriously Jesse, isn't this the very same reason __some__ people end up implementing yet another programming language without realizing it?

First they start out of exasperation with the language they use, because they hit some obstacles or limitations, and before they know it they end up creating a whole new language.

You know what's the fun part? In their attempt to fix the aforementioned language's issues, they end up introducing __the very same problems__ in their own language, only under a different "cloak", so to speak.

It's a vicious cycle I'm afraid...


So are you suggesting that we can’t do better than assembly machine code?


SQL is great if you will have multiple applications looking at the same dataset, e.g. an employee management program and a payroll program. In this case you should design a sane schema and mold the app around it.

ORMs are terrible in this sort of world since they tightly couple the application to the data. But if you will only ever have one application anyway the abstraction of a separate schema is pointless.


A lot of people who believe only one app (or one language) accesses their org's datastore are mistaken. You have to take extreme measures to prevent ad hoc uses from popping up.


Yes, yes, yes.

Why is this the case?

1. If you are doing anything interesting, people are going to ask questions about what you are doing, and the best way to answer those questions is going to be by querying your database.

2. One day you might want to rewrite some of your service/s, split them into microservice/s, etc. At that point, there will be a minimum of two services talking to your datastore: the legacy service and whatever you're replacing it with. I suspect any alternative to this arrangement will be an even worse idea, e.g. taking a deliberate outage to perform a likely-irreversible migration.


> One day you might want to rewrite some of your service/s, split them into microservice/s, etc. At that point, there will be a minimum of two services talking to your datastore.

You should not do this. It removes almost all of the benefits of extracting things into a separate service (services should own their data and the only means of accessing it should be via their APIs). That's not utopian; that's one of the main reasons you do a service extraction in the first place.


Right, so let’s suppose you already segmented the data to two different backing datastores, and your monolith is now connecting to both of them instead of just the one. Now you can do the service migration, at which point you still run into the situation I’m discussing.


Cutovers are hard, to be sure. Ideally they should also be short (the time a service undergoing mitosis spends talking to the old and new locations should be measured in days or hours or less).

Don't choose general data access patterns for the infrequent occurrence of cutover. Cutover is when you break a few rules and then immediately stop doing so. Build for everyday access patterns instead (which should be through the API of whatever owns the data--SQL is a powerful language and a really shitty API).


Stored procedures are a better API than arbitrary SQL. You may even be able to enforce it by granting EXECUTE permissions but not SELECT permissions.


The simple solution to 1 is to never allow direct database access. API only.


Of course. But surely you don't let anyone access your API, and you put it behind another API, right? Just in case you need to change that first API without breaking all the users.


Never even let on that you have one, or the founder will pat one of your most junior devs on the back and ask if he can give access to the db to that other team that needs to make money :D


So you do all your analytics by running a series of service calls and then writing a script to collate them into the needed results? Seriously?


I'm not the GP, but yes, absolutely. There are plenty of things that make this less than awful:

- The existence of tools that allow structured access to multiple APIs (GraphQL is a nice middle ground between "YOLO any queries you want" and "you only get row-by-row access exposed by the web APIs").

- The existence of data on multiple internal data stores. Analytics folks usually are not prepared to engage with the complexity of data being stored across handfuls or more of different stores with different schemas. The owner of the application knows how to join that stuff better than they do.

- Building intermediate/denormalized stores isn't frowned upon just because analytics shouldn't run ad hoc queries on the main production DBs. Expose change streams or bulk ("too much" data) endpoints and make it easy to load their results into a reporting system, which can be raw SQL. It's not redundant; if you don't do this, the following conversation starts to happen often: Q: "I'm running raw analytics queries on production and it's not quite working, can we just make $substantial_schema_change so my report works/is fast?" A: "No, we explicitly chose not to structure the DB/index/whatever like that because it seriously fucks up a real user access pattern."


Forcing analytics to go through the API doesn’t actually reduce load on the production DB, it just increases load on the API itself. Step 1 should probably be a dedicated read replica and step 2 should probably be an ETL process.


Ding ding ding. Dedicated read replica and an ETL gets you to a point where queries don't bring down prod. If you have an analyst org running wild making bad decisions about data that they think says things it doesn't -- that's probably a good sign that it's time for a dedicated data engineering team, and potentially a BI flavored data science team as well.


Analytics queries bringing down prod seems . . . pretty amateur hour. I'm more interested in whether or not analytics queries actually get the data they're interested in when they want it. The reporting team is likely not better versed in what means what than the developers who work on the application databases. What about multiple internal DBs that reporting wants to analyze as if they were one? What about schemas that change over time, obsoleting the analytics team's assumptions? Reliable, versioned data access APIs address both of those families of problems. Yes, it's harder than "YOLO query prod". It also works for longer without breaking, and jives with the scale out plan (usually discrete APIs, sharding, and then maybe microservices and more families of APIs if you're mature enough).


> Reliable, versioned data access APIs address both of those families of problems.

They only address it in so far as they push it downstream to the analyst, who as you mentioned, "is likely not better versed in what means what than the developers who work on the application databases."

There's a reason why datalakes exist, and having used them at past N companies, I think this is why data engineering of the BI flavor becomes necessary at the point that reporting becomes critical. An API is strictly worse than a datalake, and it's not hard to set up and maintain the latter. API versioning and communication are for frontend integrations, paid partner integrations and potentially (although I'd probably lean more on gRPC and the ilk) microservice to microservice interactions. But, I like to avoid building unnecessary API surface when I can.


What do you mean by this? What other way, other than digging right into the data, is there to access the database? Isn't it all through APIs?


I like Clojure’s HugSQL[1] for this reason: you can simply write raw SQL, but when you start duplicating code, you can start factoring those bits out into composable “snippets”. The best of both worlds: composability and reuse, while still writing raw SQL.

[1] https://www.hugsql.org/


I have the opposite view. I find ORMs annoying and obscure, and I think they introduce duplicated code.

If you need to run a certain query in multiple places, you need to repeat the same ORM expression or refactor it into a function. I find it much better to have a module with all my SQL queries as strings. That way, whenever I need to run a query, I reference it from there. Of course it helps to use meaningful names.

This approach has a lot of advantages over ORMs:

* you know exactly what gets executed
* automatic DRY code
* the names of the SQL queries in the code are self-explanatory and the reader doesn't have to parse the ORM expression every time

Schema definitions are in standalone SQL files, as well as my migrations.

The only disadvantage is that it may be difficult to switch to a different database system, but that is not a problem for us.
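
To illustrate the module idea, a hypothetical sketch (names made up):

    # queries.py -- every SQL statement lives here under a meaningful name.
    BOOKS_BY_AUTHOR = """
        SELECT id, title
        FROM books
        WHERE author_id = %(author_id)s
        ORDER BY title
    """

    ACTIVE_USER_COUNT = """
        SELECT count(*) FROM users WHERE active
    """

Call sites then read like `cur.execute(queries.BOOKS_BY_AUTHOR, {"author_id": 42})`.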


Your argument does not stand up. I can have a file full of ORM SQL fragments the same as your file of strings. And I can compose mine together safely and more flexibly than strings.


Have you tried Python's SQLAlchemy, the ORM parent posts are praising? The `sqlalchemy.sql` module is awesome and pretty much maps 1:1 to raw SQL.

Composing SQL expressions using this library instead of using string interpolation/concatenation has several advantages:

* DRY code and composition
* safety
* portability (if you have to switch the underlying DBMS)

Often the result is as good or better than my raw SQL. The fact that Python has an amazing REPL makes the process pretty much like testing queries in the database prompt but with less cognitive switch between languages.

In the end it is a matter of taste, but I have to agree with parent posts, SQLAlchemy raises the bar for other ORMs.
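
For instance, composing a reusable fragment without any string concatenation; a minimal sketch with made-up column names:

    import sqlalchemy as sa
    from sqlalchemy.sql import table, column

    users = table("users", column("id"), column("name"), column("active"))

    # A reusable base query...
    active_users = sa.select([users]).where(users.c.active == sa.true())

    # ...composed further at the call site, still fully parameterized.
    js_names = active_users.where(users.c.name.like("J%")).order_by(users.c.id)
    print(js_names)  # renders the SQL for the configured dialect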


> I find much better to have a module with all my SQL queries as strings.

But you can't compose them, so there is a lot of duplication. Also, how would you handle dynamic filters and columns? Concatenating strings? That seems error-prone. At least a nice query builder would be useful, but then the whole "just write SQL" thing falls apart.


Hahaha, wow! That’s just about the most awful thing I’ve ever heard. If you ever find yourself sitting across a table from an interviewer, I would definitely recommend not including this little tidbit in the conversation.


ORMs seem to be a typical example of over-engineering. Often you don't need all the complexity they come with, and when you do, you are probably better off understanding exactly what you are doing.

So maybe building a minimal API wrapping your SQL queries isn't such a bad idea after all.


The things I've found positive about ORMs are exactly that mapping of results to business objects. The things I've found "not worth it" are the query-building APIs baked into the objects. These principles can be seen in a lightweight ORM I made, PureORM [1].

[1] https://github.com/craigmichaelmartin/pure-orm


> and handle mapping data explicitly.

Here's your problem


There is a lot of ancillary complexity in database connection libraries that we could attack before replacing the standard structured query language with some poorly considered mapping of objects to and from relation(s), inspired by poorly understood bad old OOP, which is generally what all ORMs boil down to.


You could consider something like Slick instead of an ORM:

http://slick.lightbend.com/doc/3.2.0/orm-to-slick.html


This is exactly what MyBatis is for.

You throw in SQL, provide a simple mapper, done. IMHO it's far superior to ORMs when your database is, or may become, complicated.


It's about striking the balance between your own queries and an ORM.

My rule of thumb is that I always go with ORMs for MVPs and small apps. Optimizing for speed usually means going deeper and building a system of queries yourself. Until that point I usually stick to less verbose code and focus more on business rules.


Views are another way to gain reuse.


As are stored procs, user-defined functions, triggers, scheduled jobs... SQL databases are programming environments, not pure datastores.


Most are exceptionally bad at being programming environments, though.


> I started coding some simple functions to help map the tabular data to objects.

Maybe you should not do that? Can you give us an idea of the domain problem you were trying to solve that made you feel the need for that?


I use jdbi.org in Java all the time because it does just that for me.


MyBatis


According to the Amazon summary from the linked book:

"a major shift in prosecutor behavior that occurred in the mid-1990s, when prosecutors began bringing felony charges against arrestees about twice as often as they had before"


If you don't want to do time, don't do the crime.


How does that work, exactly, when the crime is against a plant you use for personal medical reasons? Love it or hate it, medical weed is cheaper and easier to produce than most opioid alternatives [i.e. morphine, codeine, oxycodone, etc.], and it's safer to produce, process, and distribute to end users [consumers]. So grandma gets a felony conviction because she grew her medical weed in her backyard garden, and now 10-15 years of her life are wasted in a prison cell. How exactly does the math of justice add that up?

Grandma grew 20 plants because she wanted to make her medicine and have enough left over to make some edible cookies. A Grandmas of America union doesn't exist; grandma is left to the mercy of a corrupt justice system that mandates minimum federal sentencing guidelines regardless of a defendant's circumstances. The math doesn't add up; it's cruel punishment and unfair treatment under the law. I mean, if grandma had a meth lab then yeah, she deserves to do the time, but no one should go to jail for weed. It's a complete waste of tax money, IMHO.


> My little brother is 16 and I know he won't use this because no one he knows uses Facebook.

Is this really true? I heard from someone at Facebook that while teenagers say they don't use Facebook, if you look at the stats, they definitely do.


Very true in my experience. While many (maybe 60%) still have an account, they basically do nothing with it. If you're a Facebook employee you may see that signup number and go "yeah, they are using it," but the amount of use is VERY different from what I can tell.

Generally I'm very much in agreement with the parent: the older crowd won't want to use Facebook for dating, and the younger would never look to Facebook.

That said, I'd go even farther and say the original claims are off base here. First, teens absolutely have multiple personas they show, but they tend to be privacy-circle based, in that there are maybe 3 or 4 levels and they share increasingly more or less with each level.

Not only that, Facebook knows this. Instagram launched "close friends" explicitly because everyone was using "sinstas" and "finstas" to post to a smaller subset of people.

To me, all of this says that Facebook is probably targeting an older market of, say, Match/OkCupid users who aren't used to a Hinge-style interface. Anyone I know on Hinge would laugh you out of the room if you asked them to switch to Facebook dating.


> signup number

Definitely not. DAU and MAU are what they base their user numbers and big milestones on


Wrong verbiage, point still stands. Probably a decent number of MAU on Facebook in that range, but logging in and checking one thing twice a month is a big usage difference from daily or semi-daily sessions that last for minutes, not seconds.


"Active" there means opened, not actively consumed. I'm "active" almost every day, but the majority of my sessions are below 20 seconds. They mean nothing to Facebook, and they mean nothing to me. Facebook lost the value it had for me.


Another anecdata, but our kid's babysitter (also 16) doesn't have a Facebook account, and neither does her twin sister. They both have IG and Snap though. She said the same thing: "Facebook is for old people".

Also my 20 year old cousin didn't have a FB account until he went to college and was forced to get one to join some local FB groups. But he only uses it for that. Otherwise he's on IG and Snap.


They use FB Messenger for sure. From what I hear it's still popular among younger cohorts.


I use the desktop version of Draw.io + some custom code to make the diagrams searchable: https://github.com/JesseAldridge/draw.io-convert

I do think a truly useful knowledge repository will fundamentally need to be a graph structure. This roughly reflects the neural connections in the human mind -- knowledge itself is a graph structure.

Draw.io is the best diagramming tool I've found, but it still kind of sucks in a lot of ways. I've started working on an alternative: https://github.com/JesseAldridge/consideration

The key features I'd like to add are multi-user collaboration and search across many diagrams.


What is Tree Notation? This is the first Google result: https://en.wikipedia.org/wiki/Newick_format

Also, what do you do if you want to put the same node in two different places in the tree? (i.e. you want a graph)


http://treenotation.org/

For graph nodes I use an id.

  organization Microsoft
   hq Redmond

  language C#
   corporateSponsor Microsoft

