
GraphQL is a great experience when you consume it and the service fulfills your query needs. Because you just ask for stuff and you get it. It's really cool.

On the other hand, when you're the one implementing the GraphQL server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.

Also, if you really want to provide a graph experience, with inverse connections, filtering on relationships and other advanced stuff... get ready to burn your mind and your soul.




> GraphQL is a great experience when you consume it and the service fulfills your query needs. Because you just ask for stuff and you get it. It's really cool.

I guess it's better when the tooling you use has direct gql integration and builds the queries for you?

Because in my experience accessing the GitHub APIs with "basic" HTTP libraries is way more annoying using v4 (GraphQL) than v3 ("REST") — it could also be that GitHub's v4 API is dreadful, mind you, I wouldn't be surprised.

GQL should be more efficient because it's not returning 95% of garbage I don't need, but having to write 5-deep queries (because of the edge indirections) by hand is way more of a pain in the ass than performing two GET requests with a few parameters munged in the URLs. And then I still have to go and retrieve the information 5-deep in the resulting document.

Pagination is also awkward, because now you probably want multiple different queries (and thus multiple different resulting documents) so that your 2+ fetches don't retrieve unpaginated information you got the first time around. And it gets worse when nesting comes in.

I don't think graphql is generally a great experience when you consume it either.


100% agree that pagination is extremely awkward, especially with nesting. Between the pagination problem and the "oops I asked for too much data and blew up the server" problem, I think it's more work than one might think to run a GraphQL API.

For my own work, I took things in a different direction: I made a query language that parses as legal GraphQL using a vanilla parser, but has directives like `@filter, @recurse, @optional` etc. Instead of returning a giant fully-nested result, it flattens results and emits them row-by-row like a SQL database. This means the query evaluation can be lazy and incremental — if you write a query that has a billion results but only load 20 of them, then only 20 rows' worth of work happens.
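
Purely as an illustration (an invented schema, not the real one), a query in that style might look like this, written as a plain string since it parses with any standard GraphQL parser:

  // Illustrative only: made-up types and fields. Standard GraphQL syntax,
  // but directives drive filtering and optional edges, and results come back
  // as flat rows rather than one big nested document.
  const query = `
  {
    Book {
      title @output
      publishedYear @filter(op: ">=", value: ["$minYear"]) @output
      author @optional {
        name @output
      }
    }
  }`;
  // rows stream back lazily, roughly { title, publishedYear, name } per result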

My company has been using this in production for 6 years now across everything from TB-scale SQL clusters with X00,000 tables/views, to querying our own codebase, configuration, and deployment information to find and prevent bugs. I gave a 10min talk at a conference about this recently, if you'd like to learn more: https://www.hytradboi.com/2022/how-to-query-almost-everythin...

Project GitHub: https://github.com/obi1kenobi/trustfall


Starred and noted for a project I'm working on. Thanks!


This is super cool. Thanks for sharing


I work at GitHub and would love to hear more. Can you describe some of the data interactions that you find more convenient with the v3 API?


Not parent, but my biggest challenge is some v3 APIs are not there in v4 yet. For example, activity and notifications (https://docs.github.com/en/rest/activity/notifications) is something I'm still looking forward to, but forced to keep using REST until it becomes available via GraphQL.

The pagination point is described well in nearby comments. It only applies when attempting to paginate across more than 1 dimension at once, like "get all pages of comments in an issue, and all pages of reactions for each comment".


Thank you! I agree that there is more we could do to reach parity here.

Is paginating across multiple dimensions possible via the REST API?


Everything is easier to use with rest because it's so simple it works with curl trivially.


Not just cURL; most of the time I want something from the GitHub API, it's something fairly simple; using REST from Python, Go, Ruby, $preferred_language is easier than using GraphQL, too. I'm sure there are libraries out there, but it's hard to beat a simple "fetch me data from that URL yo".


GraphQL uses HTTP like the REST API and speaks JSON. There's no need for a library if you're comfortable sending a POST request.

It seems to me like the main friction that you and others are getting at is that GraphQL is more work to use than REST because you have to write a query. That's a fair point! Perhaps we could publish "canned" queries that are the equivalent of the most commonly used REST endpoints, or make them available for use in the API with a special param.


Yes, you need to write a query, and it's also not at all obvious how to write one. Let's say you want to list all repos belonging to a user or organisation, a fairly simple and common operation. I found [1] in about 30 seconds. I've been trying to do the same with GraphQL for five minutes now using the docs and GraphQL Explorer, and thus far haven't managed to get the same result.

I worked a bit with GraphQL in the past, but never all that much. Now, I'm sure I could figure it out if I sit down, read the basics of GraphQL, familiarize myself with GitHub's GraphQL schema, etc. But ... it's all a lot more effort and complexity vs. the REST API; even with a solid foundation in GraphQL there are still a lot more parts.

GraphQL is kind of like giving people a schema to an SQL database and telling them that's an "API"; it kind of is, but also isn't really. There's a reason almost all applications have some sort of database layer (API!) to interact with the database, rather than just writing queries on the fly whenever needed.

[1]: https://docs.github.com/en/rest/repos/repos


That's completely fair. I think the analogy to SQL as an API is very apt. No one would argue that full SQL access isn't a powerful API but it takes some legwork to understand the schema and write queries to get the data you need.

There's a divide between at least two types of persona here. On one side is integrators building products and features on top of the GitHub API. For these people GraphQL is arguably superior since the learning curve is manageable and in exchange for scaling it you can make an extremely specific query that returns your product's exact data requirements. The cost of writing this query is amortized over the lifetime of your integration.

On the other side are e.g. users automating some part of their GitHub workflow to save themselves time. I can see how the REST API feels like a better choice here, it's certainly simpler to get started with.

For what it's worth, here[0] is an example of using the `gh` CLI's graphql feature to print a list of all repository URLs for a given organization by login, sorted in a relatively complicated way. It's more verbose than doing this with the REST API but significantly more flexible. This could just as easily be done with curl but as others have pointed out, pagination requires a minimal level of logic to implement, so it's more convenient to use an existing helper. This output gets flushed 10 lines at a time as pages come in, making it suitable to compose with other commands using pipes.

  $ gh api graphql --paginate -f query='
    query($endCursor: String) {
      organization(login: "rails") {
        repositories(first: 10, after: $endCursor,
                     orderBy: { field: STARGAZERS, direction: DESC }
        ) {
          nodes {
            url
            stargazers { 
              totalCount
            }
          }
          pageInfo { endCursor hasNextPage } # needed for auto-pagination
        }
      }
    }' -q '.data.organization.repositories.nodes.[]'
  # output follows
  {"stargazers":{"totalCount":50643},"url":"https://github.com/rails/rails"}
  {"stargazers":{"totalCount":5298},"url":"https://github.com/rails/webpacker"}
  {"stargazers":{"totalCount":4849},"url":"https://github.com/rails/thor"}
  {"stargazers":{"totalCount":4075},"url":"https://github.com/rails/jbuilder"}
  # snip
  {"stargazers":{"totalCount":3},"url":"https://github.com/rails/hide_action"}
  {"stargazers":{"totalCount":2},"url":"https://github.com/rails/gem-buildkite-config"}
  {"stargazers":{"totalCount":2},"url":"https://github.com/rails/sqlite2_adapter"}
  {"stargazers":{"totalCount":1},"url":"https://github.com/rails/fcgi_handler"}

[0]: https://gist.github.com/brasic/14222db7b3f5873b84820477cca27...


The GraphQL API works just as well with curl. There's no getting around the fact that you need to pass the query text, but assuming you put the query in a file (wrapped as JSON, e.g. {"query": "..."}) the curl syntax is identical to the REST API:

   curl https://api.github.com/graphql -H "Authorization: bearer $TOKEN" --data @query.json


> Everything is easier to use with rest because it's so simple it works with curl trivially.

Dead comment.


> Pagination is also awkward, because now you probably want multiple different queries (and thus multiple different resulting documents) so that your 2+ fetches don't retrieve unpaginated information you got the first time around. And it gets worse when nesting comes in.

You don't need to write a different GraphQL query; use variables. Good GraphQL APIs will expose start and limit fields for pagination.


I think you misunderstood the issue. Of course you will use variables for the pagination itself; the issue is that your head-of-line query will be grabbing fields other than the paginated one.

You don't want to repeat these fetches in the followup queries, they're redundant, and assuming the API is rate-limited they will decrease your query budget for no value.

That counts double if you're fetching multiple paginated fields (which also adds to the awkwardness).


Agreed — consider what happens when you have a node(start: Int, limit: Int) inside or alongside another such node with start and limit.

Your pagination is now two-dimensional, with each node's start/limit as points on its own axis.

Now add a third node to the query. Now you have three-dimensional pagination. This quickly goes off the rails.

Try writing a generic N-dimensional paginator for such a query to see why it's difficult. Even designing a sensible and reasonably flexible API for one is a headache.
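
A sketch of the two-dimensional case against GitHub's schema (illustrative; the point is that each issue node carries its own comments cursor, so a single variable can't address them all):

  // Two nested connections: the outer cursor pages issues, but every issue
  // node has its own comments cursor, so "next page of comments for issue #7"
  // needs a follow-up query that re-selects issue #7 and its siblings' data.
  const query = `
    query ($issuesAfter: String) {
      repository(owner: "rails", name: "rails") {
        issues(first: 10, after: $issuesAfter) {
          nodes {
            title
            comments(first: 10) {
              nodes { body }
              pageInfo { endCursor hasNextPage }  # one cursor per issue
            }
          }
          pageInfo { endCursor hasNextPage }      # one cursor for the issues axis
        }
      }
    }`;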


I think I get what you are saying now. If you want to get paginated child nodes, then the root will be fetched again, which is a problem.


Indeed. It may not be a huge problem depending on how much data you need, but there are lots of cases where you'd really rather avoid refetching the rest of the root.

TBF you could also deal with it using fragments I think, but still, not great.


Pagination is a bear.

Most of the simple ways of doing it with SQL are problematic. In these docs

https://docs.spring.io/spring-batch/docs/current/reference/h...

there is some discussion of the problems and some answers that go back to the mainframe era.


The server side of pagination is really complex if you want to make sure all the results are returned exactly once. If that's your case, consider not paginating at all.

But very often, a result missing or duplicated in a few queries isn't a showstopper. In that case, pagination is very simple.


Spring Batch covers the cases where you have to get the right answer!

That is, you are "full scanning" and making a report, or doing something like a reconciliation process at a financial institution; it is not like some image board where there is a link to the 1781st page of images but it takes forever to load if you click on it.


> but having to write 5-deep queries (because of the edge indirections) by hand is way more of a pain in the ass than performing two GET requests with a few parameters munged in the URLs. And then I still have to go and retrieve the information 5-deep in the resulting document.

I usually write my queries in GraphiQL (check some boxes) and then paste them into the app after I have them working right.


Having consumed both, and having recently rewritten a lib to GraphQL, I know enough not to want to roll my own...


I think GraphQL is best understood as an incomplete ORM that you have to complete yourself on the backend. If GraphQL generated SQL (given some tooling or what have you), pretty much all the problems would be solved. Indeed, backend-as-a-service products like Hasura or PostGraphile are this missing piece. I guess we're uncomfortable with SQL over the wire or open-to-the-world databases, but we shouldn't be.

Or maybe TLDR "dear next generation of engineers: SQL is actually pretty good".


SQL is excellent but exposing your database is not. The inventors of graphql specifically said that they don't think that it is a good idea to do so. They never intended it for that.


> SQL is excellent but exposing your database is not.

That was definitely true before Postgres got row security (admittedly in 2016, after GraphQL was released in 2015), but these days there's really no need to run an entire app server in front of your database just to implement permissions.
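
A minimal sketch of the idea (hypothetical table and setting names), run once as a migration:

  // Hypothetical schema: after this, a connection subject to RLS only sees
  // rows whose author_id matches the configured current user.
  import { Pool } from "pg";

  const pool = new Pool();

  await pool.query(`
    ALTER TABLE posts ENABLE ROW LEVEL SECURITY;
    CREATE POLICY posts_owner ON posts
      USING (author_id = current_setting('app.current_user_id')::int);
  `);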


I am with you. Every time I looked at GraphQL or asked to implement one, I had to say no.

How is this a good thing for the backend or infra engineers? It's a mega facade without a lot of tooling to help the backend.

GraphQL reminded me of common ORM criticisms. Wide API surface area with a lot of room for accidents. And GraphQL made it worse by being exposed as a service.


Infrastructure is one thing that seems to catch a lot of people off guard. So many infrastructure tools are based on monitoring HTTP codes, but even when there are errors, GraphQL servers send 200s unless modified. It turned into quite the headache for us.
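
For anyone who hasn't been bitten by this yet, the typical shape (per the GraphQL spec; the details here are illustrative) is an HTTP 200 whose body carries the failure:

  // The transport says "success"; the failure lives in the errors array,
  // which is invisible to anything that only watches status codes.
  const responseBody = {
    data: { user: null },
    errors: [{ message: "User not found", path: ["user"] }],
  };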


100% - I can see how it might be great for FB where they have the capacity to optimize but without that engineering capacity it seems like it would turn into a net negative.


No one sees the backend. So who cares? /s

With ORMs at least, the developer is likely either thinking about limitations, or really needs the guide-rails an ORM provides.

I doubt a lot of front-end engineers, many of whom have probably never optimized a DB, are thinking about the consequences of their queries.


ORMs are also pretty easy to use on a case-by-case basis if needed, either by using the escape hatches the ORM provides or by bypassing it altogether. Deciding "oh GraphQL isn't good for this particular use case so I'll spin up a parallel REST API" is a much bigger decision to make.


what you say is unpopular, but it's a lot more true than most people (especially front end people in this case) want to admit. Of course there are plenty of exceptions (people on FE who think about, care about, and know about what happens on the backend), particularly the Venn diagram of FE people reading HN, but the majority in the industry definitely do not. The bigger and/or more specialized the company, the worse that problem gets.

To be clear, this is not just a problem for FE people. It's extremely normal for humans to become myopic in the areas they spend the most time in. FE does it, BE does it, management does it, everyone does it. Find a standard mobile engineer doing native iOS or Android, and they're going to be even more disconnected from the effects on the backend, and they come by it honestly. If you tend to specialize more in one area, building an awareness of your own biases/perspective, and exercising intentional empathy, can make a huge difference in how easy you are to work with.

When looking at dysfunctional engineering orgs, one of the first things I do is figure out where the "power" is and figure out their backgrounds. The most extreme example might be a company founded by a FE eng for whom backend is just a necessary evil to support their app. Or a company founded by a BE guy for whom the real value is the API, and the clients are just there to abstract it for normal people.

Taking this in and finding a healthy balance of the way things are structured can help improve a dysfunctional org a lot. FE, BE, DevOps/Infra, etc are important pieces in an overall puzzle. Without a well-functioning team behind each, the company and product suffer.


I'm still pretty new to dev, but what's wrong with ORMs? And, importantly, what would you recommend instead?


SQL is a transferable skill. ORMs are not. If you already know SQL and have to use an ORM on top of that, then it's a net loss.

It's trivial to use SQL to build objects from individual rows in a database. Which is all an ORM is really good for. Once you start doing reporting or aggregates, then ORMs fall apart. I've seen developers who, because they have a library built up around their flavor of ORM, go and do reporting with that ORM. What happens is this ORM report consumes all the RAM in the system and takes minutes to run. Or crashes.

ORM code hits performance issues because so many objects have to be built (RAM usage) and the SQL queries underneath the framework are not at all efficient. You can use OOP on top of SQL and get decent performance. But you need shallow objects built on top of complex SQL. ORMs go the opposite way: large hierarchies of nested objects built from many dumb SQL queries.
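
For example, a sketch of the "shallow objects on top of complex SQL" shape (hypothetical schema, node-postgres): one aggregate query hydrated into plain objects, instead of loading every row as an entity and summing in RAM.

  import { Pool } from "pg";

  const pool = new Pool();

  type CustomerRevenue = { customerId: number; orderCount: number; total: string };

  // One round trip; the database does the grouping, the app just maps rows.
  async function revenueByCustomer(): Promise<CustomerRevenue[]> {
    const { rows } = await pool.query(
      `SELECT customer_id AS "customerId",
              COUNT(*)::int AS "orderCount",
              SUM(amount) AS total
         FROM orders
        GROUP BY customer_id`
    );
    return rows;
  }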

This also ties into GraphQL. Think carefully about the hierarchies you design. A flat API call that does one thing well is often better than some deeply-nested monster query that has to fetch the entire universe before it can reach that one field you need.


GraphQL is not an ORM. An individual GraphQL server implementation can act as an ORM for a specific use case, but GraphQL can do much more than that.

You should not ever need to implement your own GraphQL server. There are plenty to choose from.


I am personally fond of:

1. Query builder APIs, which can only generate valid queries, but you can control exactly what that query will be.

2. APIs that return basic data structures from the database, like maps or tuples.

Query languages like SQL are very powerful and easy to learn. And in my opinion, preferable to the ORM approach of "what method calls do I need to make to trick the engine into executing the SQL I know would make this work?" ORMs add complexity and limitations that, in my opinion, are not worth the benefits.


Personally I don't think there's anything wrong with them. They're just tools.

but like any abstraction, if you don't know what's going on behind the curtain, they can turn into foot guns quick.


People will have different opinions, but for me, there's nothing wrong with ORMs themselves, they are a significant productivity boost for 80% of the database interactions in your app. The tricky part is recognizing the 20% where ORMs are a bad idea, which ends up meaning that an ORM is best used not as a replacement for knowing SQL, but as a tool to make you more productive when you already know SQL.


ORMs are fine for the majority of simple use cases. When things get complicated you end up either fighting with the ORM or just overriding it and writing the SQL yourself anyway.


Yes, there definitely need to be escape hatches for situations where you need to write SQL.

But that should be rare. If you're commonly bailing to raw SQL, I'd say there's something wrong, probably a poor fit of the ORM to the problem.


I'll use raw SQL (maybe not as an entire query, but something like a computed column) pretty often, for situations where I want to query things like "give me all foos with a count of bars related to them", or "give me a list of foos with the name of the latest related baz". Most ORMs would want to hydrate the graph of related objects to do that, or at least have multiple round trips to the DB server.


That doesn't sound like a good ORM. They should be lazy until you access the relevant data.

That said.

This is where knowing what's going on behind the scenes matters.


Oh, they would be lazy; it's just that expressing something like that efficiently (i.e. something like "SELECT foo.*, (SELECT count(1) FROM bar WHERE bar.foo_id = foo.id) FROM foo") is usually really hard to do. Most ORMs I've seen would N+1 on that with a naive approach, and even the "optimized" approach will want to fetch all bars vs. just the counts.


I can say for sure this is the case


Also REST APIs work nicely with caching proxies and such..


It’s not, really, but it IS a good thing for feature development speed if that’s what you’re into, and might help a team figure out quickly which data is critical to optimize for once you start putting more serious data loads through your APIs?


That's what you get for GraphQL not having an algebra.

If it had an algebra you could build a database engine that answers GraphQL queries like a conventional database engine or you could write a general purpose schema mapping and some tool would write the code that converts GraphQL queries to SQL queries or some other language.

As it is, GraphQL provides a grammar that looks like something people want to believe in but behind it all is a whole lot of nothing.


If you want to see what a GraphQL with an algebra could look like, I built one! The query language is parsed with a vanilla GraphQL parser, but has directives like `@filter, @recurse, @optional` etc.

10min talk video: https://www.hytradboi.com/2022/how-to-query-almost-everythin...

GitHub: https://github.com/obi1kenobi/trustfall


This seems like a "be careful what you wish for" situation.

Sure, you could set up an algebra that allows you to handle arbitrary queries for zero extra programmer effort, just like a SQL database engine does. And then you could even expose it to users, and let them execute arbitrary queries.

And then, later, after you're done cleaning the molten slag off the server room floor, you could stop and reflect on whether that was really such a necessary thing to do.


If you had a rigorously defined system you could put rigorous limits on it.

If it's not rigorously defined there are no limits, just what people can get away with.

With GraphQL you get the worst of both worlds: people can't write arbitrary queries, but they can still trash the system. At least with undefined semantics people don't need to argue about whether or not they got the right answers.


"Rigorous limits" for a sufficiently large database means "uses our hand-picked indexes effectively", which reduces down to "provides the same functionality as a REST API" since you need basically a whitelisted list of acceptable operations. At best you can reduce transfer time by limiting columns returned, which is something but not really worth the added complexity.


I guess I've always assumed the graphQL would be a nice way of implementing a rest api, not something you'd expose to the customer directly.


My experience trying to maintain databases that are directly exposed to multiple development teams tells me that even exposing a fully generic querying API internally is risky.

Which, just for context - that's not me saying "graphQL is bad", it's me saying, "graphQL making it hard to do that is a feature, not a bug."


Ok, use https://github.com/join-monster/join-monster. If you need autogeneration from the DB instead of hand-curated joins defined on the schema, consider https://www.graphile.org/postgraphile/ or https://hasura.io/.


Hand-rolling a custom query engine - the exact opposite of what every business wanted when the engineers sold it GraphQL


Why hand-roll one when you can use one that's already available and thoroughly tested :)

https://github.com/obi1kenobi/trustfall

(Hi Dustin!)


I've yet to encounter a GraphQL off-the-shelf server (from Python and JS spaces) where hitting a slow query didn't immediately turn into half a day's work

The whole concept is what happens when you let a smart person work on a small problem for far, far too long


I'd recommend checking out the project link in the comment to which you replied. It is designed _specifically_ to avoid the problem you mention: instead of a fully materialized, fully-nested result, it returns flattened row-oriented results (like a SQL database).

This allows for lazy evaluation i.e. rows are produced only as they are consumed. So if you accidentally write a query that would produce a billion rows but only load 20, the execution of the query only happens for 20 rows + any batching or prefetch optimizations in the adapter used to bind the dataset to the query engine.


It is a fundamental problem of a "graph".

(1) There are usually some nodes of very high degree and traversing those nodes will explode your query, (2) if you are following N links and the average degree is d, you are going to come across dᴺ nodes and that is a lot of nodes as N gets big!

Tim Berners-Lee told me that if you can't send the whole graph you should send a subset of the graph that contains the most important facts.

It's a right answer but also a frustrating one to a programmer who sees correct implementation of algorithms to mean that you get the ticket done and they don't come at you with a ticket about it again. That is, that query I'm writing is part of an algorithm that depends on getting a certain answer and getting an uncertain answer for one query is like some spoiled milk that ruins the whole batch.


> It is a fundamental problem of a "graph".

So why are we using it for so many naturally non-graph problems? 90%+ of developers' exposure to graphs is through tightly abstract interfaces, I could name maybe 3 graph-related algorithms off the top of my head, but could implement none of them without reading.

We could represent the text of this comment in a graph using one node for each unique character, but the result would be stupid, the operations would be slow, the representation needlessly complex, and implementations guaranteeably hard to work with

> Tim Berners-Lee told me that if you can't send the whole graph you should send a subset of the graph that contains the most important facts.

Indeed, I also caught the ReST buzz around the 2000-2003 timeframe, and turns out 20 years later nobody does that either, because in its purest form it's a pain in the ass for comparable reasons to the topic at hand


It's funny to see a blog post on HN almost every day where somebody rediscovers the power of columnar query answering engines which are almost the opposite of graph databases.

I've lost count of how many columnar SQL databases have been donated to the apache project and there are so many systems like Actian and Alteryx where data analysts hook together relational operators with boxes and lines.

I had a prototype of a stream processing engine that passed RDF graphs along the lines between the boxes that enable an "object-relational" model, you could eliminate the need for hard-to-maintain joins but I found that firms that had bought multiple columnar processing database companies believed in performance at all cost and couldn't care less for any system that couldn't be implemented with SIMD instructions.


How are they opposite? There are plenty of graph databases out there using columnar storage, even ones directly compatible with GraphQL Federation. Best of both worlds, so to speak.


> So why are we using it for so many naturally non-graph problems? 90%+ of developers' exposure to graphs is through tightly abstract interfaces, I could name maybe 3 graph-related algorithms off the top of my head, but could implement none of them without reading.

It's a reasonable abstraction for structuring related bits of data (like would go in a typical relational database), and that abstraction can align with the developer's mental model easier.

E.g. ORMs basically convert SQL data into an in-memory graph. Likewise, graph database APIs are natively more object-y; you follow the edge from child to parent, instead of making a bit of data the same in both tables and then querying matching rows.

They're not perfect, and shouldn't be used everywhere (nor even many places they currently get used), but I can see the appeal of abusing them.


Because graphs are a good abstraction for relations and with the right tech choices, are much more manageable and malleable than traditional relational databases.


> It's a right answer but also a frustrating one to a programmer who sees correct implementation of algorithms to mean that you get the ticket done and they don't come at you with a ticket about it again.

This rather sounds like a problem about the project manager and the project management methods that he uses.


No. I had a time in my career where I was the guy who finished projects that other people started and couldn't finish.

Some coders really don't have discipline and projects never get done because they don't think things through and keep sending half-baked patches that get sent back by test or the customer.

The role of management is to get those people working for their competitor and then have the "fixer" move in.


Nah it's what you get for GraphQL only being an API which people inevitably conflate with the database itself (a harmful trend that probably started with SQL databases).

If you want to use GraphQL you should look for a database supporting it as an interface, or failing that look for an ORM system that supports GraphQL and whatever backend you want.

Trying to convert SQL to GraphQL or GraphQL to SQL is both equally difficult and has little to do with it not having an algebra (also I think most of it is just algebraic types, possibly lacking a proper sum type).

God forbid you should try to modify anything with GraphQL though, that part makes no sense whatsoever.




> GraphQL is a great experience when you consume it and the service fulfills your query needs.

If you already know SQL, and you realise how small and simple the queries could be, then it's really not a great experience to be forced to use GraphQL.


Exactly.

I've been in web dev for 20 years but mostly in the front end space.

A couple of years ago I started doing full stack and trying different databases. For the past year or so I've been using Postgres and learning SQL. This is by far the best solution I've used so far. SQL is extremely expressive, powerful, and elegant.

The problem is that SQL has a strong learning curve which many devs want to avoid. I'm convinced this is the main reason stuff like Mongo or Prisma are so popular. I actually tried Prisma before raw SQL and I vastly prefer SQL for writing queries.

I deeply regret not having spent some time learning SQL years ago.


This might be just over-familiarity on my part, but does SQL really have a strong learning curve, or is it just not used often enough directly these days that people can get by without knowing it?

Standard SQL has a really simple grammar and a very small keyword set - there's basically selecting, updating, deleting, filtering with where, aggregate queries, grouping and joins, and that's like 95% of it. Sub-queries maybe too.


> This might be just over-familiarity on my part, but does SQL really have a strong learning curve, or is it just not used often enough directly these days that people can get by without knowing it?

I think the problem is that it's declarative instead of imperative, which is really kind of a shock if you are not used to it (you can't go step by step, there's no debugger, there are no branches etc), and also that you have to think in sets in terms of your solution, which is also awkward when you're not used to it.

I think it's definitely worth it though, as nothing we have beats the relational model for CRUD, and there are so many great learning tools online, for example: https://sqlbolt.com/


SQL has a large learning curve; you can keep learning new things on it for years. But it's not a particularly steep one: you can start using it with very little knowledge, and anything extra you learn immediately improves your situation.


> does SQL really have a strong learning curve

Depends. It's easy if all you want to do is select * from whatever;

When you get into subqueries and a whole ton of joins to get the information you need, it can get pretty complicated.

I mean, we have a full university course which was 80% SQL, spanning 6 months.

TL;DR the language is not complicated. Actually using it can be.


In my experience mentoring entry-level/junior devs, mongo's API anecdotally seems to have a much steeper learning curve than SQL. Once you get past the fundamental CRUD idioms, there are a multitude of implementation details that, if treated as opaque by devs, can introduce significant footguns in even moderate-throughput services.

Some of these details go all the way down to the WiredTiger storage engine, but others are more vanilla (e.g. indexing strategies, atomicity guarantees, causal consistency, etc).

I personally abandoned SQL about a decade ago, but I can appreciate how clean the interface semantics are for even non-technical folks. There are certainly platform-specific implementation details that can matter, especially when you get into the world of partitioning. But largely for most service loads, you're writing queries that satisfy the known index constraints that you imposed on yourself rather than constraints resultant from implementation details.

(I totally realize that even with SQL, that last statement completely changes at a certain scale threshold.)


> I personally abandoned SQL about a decade ago

So you're in the camp that NoSQL data stores like Dynamo/Mongo are a good replacement for most SQL workflows? Can you expand on this a bit if you have the time?


I don't actually view decisions like this as binary or mutually exclusive at all. I'm a big proponent of polyglot persistence [1], use the data stores you know and double down on what you know well. I use mongo primarily in my current work, but also have redis, elasticsearch, graphite, and etcd as sidecars in the same ecosystem.

I didn't jettison SQL because there was some fundamental limitation from a storage or scalability perspective. It was clear SQL wasn't going anywhere and would be a good infrastructure bet moving forward.

But initially what drew me to mongo was the clean interop between it and JS (Node.js is my runtime of choice). The shell is written in JS, you query (filter) using objects, you insert and update with objects. This seems like a small thing but this sort of developer experience over time is impactful. Everything feels very native to JS and it does so without any heavy abstractions like ORM/ODM.
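
A tiny sketch of those ergonomics (hypothetical collection; with the official Node driver the filter, projection, and update are all plain objects):

  import { MongoClient, ObjectId } from "mongodb";

  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  const users = client.db("app").collection("users");

  // The filter is just an object literal, as is the projection.
  const recent = await users
    .find({ lastLogin: { $gte: new Date("2022-01-01") } })
    .project({ name: 1, email: 1 })
    .toArray();

  // Updates are objects all the way down too.
  await users.updateOne(
    { _id: new ObjectId("507f1f77bcf86cd799439011") },
    { $set: { verified: true } }
  );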

After having used it now for 10+ years though, there's much more that I admire about it. Both from a pure architecture lens, but also from API perspective as well as it continues to get better.

To cherry-pick an example: the new time series collection feature is a good example of that. For years, folks were using the bucket pattern [2] as a means to optimize query strategies for time series data by lowering timestamp cardinality. Now in v5.0, they give you a native way to specify the same kind of collection but they handle all the storage implementation details related to bucketing for you. Your API to interface with that collection remains mostly the same as any other collection. This sort of community-driven roadmap inertia is attractive to me as an engineer.

(Somewhat of a stream of consciousness here, but hopefully gives you some context as to why I made the switch so long ago)

[1] https://martinfowler.com/bliki/PolyglotPersistence.html [2] https://www.mongodb.com/blog/post/building-with-patterns-the...


Excellent post. I'm very much in the SQL camp myself, but the clean mapping between MongoDB and Javascript data models is outstanding. If you have a front end that just needs persistence as opposed to complex queries MongoDB is the obvious answer.


It's worth noting, for clarity/posterity, that my prior post is actually discussing mongo in a (mostly) backend distributed systems environment. I find it's just as impactful there, not just in more conventional CRUD/browser/full-stack apps.


I'll bite, I have use cases along these lines. I don't use SQL in new projects.

When iterating on new systems, especially ones with live users, I can keep users on different document schemas. If I'm careful I can make it so that document schema changes don't break old ones, yet still allow for new functionality, without requiring mass migrations of documents.

CouchDB allows the db to just be exposed to the world directly. For projects where it's reasonable (user-owned/controlled data particularly), I can stand up the system with 97% front-end code and 3% backend. Having the near entirety of your application stack in one place means you can use smaller, more specialized teams, and your overall areas of concern are smaller, without needing to draw up a formal spec for your data transport.

The whole GraphQL vs REST debate is meaningless when you don't even have to think about your transport stack between the server and the browser. There are other perks to this model such as providing a fully functional website/webapp even while offline. It's trivial to switch between a couchdb backend and local pouchdb copy of your db. Potentially lower bandwidth use while just transferring updated docs instead of consistent queries or asset fetching where the same data moves across the wire over multiple uses (not a win for single visits to a single page style sites). Keeping multiple clients on the same document set in sync without socket.io work.


"SQL has a strong learning curve which many devs want to avoid"

Really? Of the dozens of languages that I've learned, SQL has been the easiest.

It really feels like it was designed for non-programmers.


I started a toy project with the intention of using raw SQL, but I ended up starting to build my own ORM around all the models.

If I have a User who is trying to create a new Post, with Prisma you eventually set it up to do something like User.createPost(content).

What does createPost method look like with raw SQL? Does it read in from a .sql file that you pass values to?


The big benefit of ORMs is in the query builders they provide: basically, syntax checking for SQL inside your language of choice, and nicer composition of SQL query parts (to make your code more DRY). Actual mapping to objects is always too heavy in my experience.

However, a slightly unrelated comment on your choice of API design: this approach always introduces an asymmetry in the model that restricts what you can do. If you start allowing post imports that auto-detect authors, you now need a Post.create(content) and Post.setUser(user) too. And then your API users start wondering what's the idiomatic way to create a new post.

The problem is that you are making an early assumption that all posts will belong to a user, yet you are representing that in an SQL database with two relations: one independent of the other (User: id, name, email...), and another referencing the first (Post: id, date, user -> User, content...). Your database model allows an easy transition to allowing nulls for `user`, yet your API doesn't.

Moving to a more functional API makes this much more natural and less restrictive. Shallow DAOs for User and Post and a function create_post(content, user) may look just like a namespacing difference, but they match your database design more closely. If you want to allow nulls for user in the database, you just do the same in the create_post function.
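
A rough sketch of that shape (hypothetical schema, node-postgres): a plain function over a parameterized statement, with the user optional in the same way the column is nullable.

  import { Pool } from "pg";

  const pool = new Pool();

  type Post = { id: number; content: string; userId: number | null };

  // Mirrors the table directly: posts.user_id is nullable, so the argument is too.
  async function createPost(content: string, userId: number | null = null): Promise<Post> {
    const { rows } = await pool.query(
      `INSERT INTO posts (content, user_id)
       VALUES ($1, $2)
       RETURNING id, content, user_id AS "userId"`,
      [content, userId]
    );
    return rows[0];
  }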

You can wrap related functions into modules (or classes) — in the domain driven design, most of these would be port/adapter functions, but if your DAO classes are sufficiently shallow, they could be service or domain functions too, etc — they are still ultimately functions (no shared state or side effects).


Thanks for clarifying, that makes a lot of sense.


Just a string in my case, JDBC prepared statement or the equivalent. But if I could really choose freely, I would put all queries as functions/procedures inside the DB to achieve real decoupling from the schema, get consistency with transactions etc, but if I mention that idea, the pitchforks come out and I get chased off the property by the backend developers who become pretty much obsolete in that architecture.


That, or just a string in your application’s code.

The problem with using the ORM as you describe is that when you hit any sort of scale, you need to be doing bulk operations, otherwise your latency goes through the roof, to the point that the number of inefficient queries you are doing can tank the database. I speak from the experience of having seen a database collapse under the load of a backend written in this fashion having request load grow past a certain point — not pretty! The interim solution is to bulkify existing queries and functions in place to the greatest extent possible, while preparing for:

Converting a codebase from having endpoints doing individual ORM operations as described to having proper separation of concerns with a business logic layer between the endpoints and the database is a _massive_ cost. The earlier you implement that, the happier you will be in the longer term. It doesn’t have to be with raw SQL, but many bulk operations are much easier to express with SQL than with the ORM.
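
As a sketch of the difference (hypothetical schema, node-postgres): instead of N per-row ORM calls, one statement handles however many ids there are.

  import { Pool } from "pg";

  const pool = new Pool();

  // Instead of: for (const id of orderIds) await orm.orders.update(id, { ... })
  async function markShipped(orderIds: number[]): Promise<number> {
    const result = await pool.query(
      `UPDATE orders
          SET status = 'shipped', shipped_at = now()
        WHERE id = ANY($1::int[])`,
      [orderIds]
    );
    return result.rowCount ?? 0; // rows actually updated, in a single round trip
  }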


Doesn't an escape hatch on the ORM provide that though? I seem to remember in both sqlalchemy and (libraries that use) knex being able to dip down into SQL when needed.


Coincidentally, modern graphQL backend libraries will do this for you. See e.g. graphql-java, apollo-server, many others.


If you put PostgREST in front of your Postgres instance, it looks like

  POST https://my.website.com
  { "title": "Cool New Technology", "article": "I learned a thing today", "user_id": 1234 }


I'm fond of query builder APIs, that only allow you to generate valid queries.

So createPost would just generate the appropriate query with the necessary parameters, and execute it.


How many different entity types and relationships between entities does your typical application have?


Hell yes. Good knowledge of SQL is a superpower and is becoming a rare art form.

The new generation of devs thinks that frameworks and ORMs will do the magic for them at no cost, but they don't. There is no substitute for leveraging your storage engine to the max.

The sad part is that databases have evolved and become much better in the last 20 years (I started with MySQL 3.x), but we just don't use them. Everyone acts like "microservices" solved all of our technical challenges. Right.


It's "previous" generation of devs that build Hibernates and Entity Frameworks and other ORMs.

I work with "data" systems, where everything has been migrating in the other direction - to SQL - from custom code for last 5 years or more.


> The sad part is that databases have evolved and become much better in the last 20 years (I started with MySQL 3.x), but we just don't use them.

Yes, it's a bit like buying a set of silverware and insisting on using the handle as the business end in case you decide to switch your tool.


In my opinion, the value of a well architected micro service is to figure out how to optimize and leverage the capabilities of the underlying storage engine, while presenting a simple performant and correct API to consumers, while not requiring those consumers to understand the underlying details of the datastore.


I am not talking about the customers. Of course they are not supposed to understand it. I am talking about the system design. And microservices do not solve problems in most companies, just create new ones. Distributed systems did not magically become simpler to reason about just because there is Docker.


GraphQL is just an API language. It doesn't free you from writing database queries.

The point of GraphQL is mostly separation of backend/frontend and avoiding over-fetching/under-fetching. If those don't sound like a benefit, you should use REST.


I would agree, but would add one note at the end: "Yes you can do this with REST by including query params to reduce/tune what gets returned, but that can quickly balloon into a monster when you get beyond pagination, ?expanded=1 for full objects (vs partials/abbreviated), etc."


The same is true of GraphQL, if you want to control how much of nested objects you get (eg. introduce limits on the number of nested objects).

Basically, with GraphQL you hide all the complexity behind generic-seeming API requiring one API call, whereas with REST you'd usually hit multiple endpoints for different data types.

GraphQL has the benefit of allowing the backend to smartly decide how to restrict the data (eg. do the joins between what would have been two bigger REST queries), but that incurs a development cost. The complexity is in marrying all the familiar optimization tricks for SQL databases with exposing that in a generic but still restricted way.


But wait, doesn’t that directly contradict the first commenter’s next paragraph?

> On the other hand, when you're the one implementing the GraphQL server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.

If it’s so easy to craft any GraphQL query as an SQL query and let the RDBMS plan and execute the query, then shouldn’t it be easy to implement the GraphQL server on the backend?


I think your point is a fair one. The distinction is that it's easy to write a contextual SQL query for any one GraphQL query when your database model closely matches your API objects. "Contextual" means that sometimes this requires a "side effect" to happen (eg. creating an index on a column in the SQL DB).

Making it generic and performant at the same time is where the complexity is.

It would be akin to saying that, since knowing that you might need an index in the SQL database is simple, an RDBMS could decide to create those indexes for you.


You're conflating a hand-coded and optimized query with building a system to take a tree and generate said optimized query automatically, quickly, and correctly.


I don't think I'm conflating it. jseban's comment indicates that anyone who knows how to write simple SQL queries would get no benefit from using GraphQL to consume data, which must mean that there is a simple SQL query that can be written to fulfill any GraphQL query.


> there is a simple SQL query that can be written to fulfill any GraphQL query

This is true (as long as you expect both queries to be simple, or allow both to be complex).

But the conclusion you reach up there is wrong and (obviously) does not follow from that. Creating software that translates any one query into the other is a very difficult task.


Not all of the data sources for graphql may be in a single database. They may not necessarily even be stored in something that can be accessed with SQL.


[flagged]


Let's not stereotype a whole category of people.


> it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.

The culprit is "micro" services. The whole thing was invented by a "software consulting" firm to milk as much billable hours as possible to make a system over-engineered and costy to support but easy to split into multi-layer/multi-stage outsourcing teams/phases, the industry fail into this stupid trap, and the burden was shifted into web & mobile clients, the next thing they realize is sometimes they have to make queries inside while loops.

If your data can be "planed" or "optimized" via a single centralized "GraphQL gateway", then it probably can be centralized inside a single database transaction call with so-out-of-date-you-should-never-use JOINs.

I recently had to render a user feed page: query the uid for fids, then each fid for cmt-ids, then each cmt-id for uids for avatar/nick and such, all from a stupid user profile lookup "micro" service, provided by another department, which only accepts one param per query (spoiler alert: it's an "anti-pattern", but a sweet "optimization goal" for your next "sprint milestone"). I had to carefully and cleverly combine all the data needed and make the lookups parallel and async with a very good re-usable batch loader class. Which makes me wonder: if all that data sits right inside the same db, why bother scattering it into so many service pipes, then gathering it in a PITA fashion?

As a developer I am not against GraphQL or microservices, because it pays, and it's a good pile of tech jargon to confuse the non-tech people, and it really sticks; but from a pure technical point of view it's a waste of CPU power and emits needless CO2.


Although the microservices terminology might have been invented by a software consulting firm, distributed architecture already existed and solved problems for many large companies that needed to scale their products (and development processes) beyond what a small team hitting a single database could achieve.

However, I think that's the key point to keep in mind when considering whether GraphQL is a good fit - if you don't already have multiple domain-specific services in your infrastructure, then adding a GraphQL gateway service doesn't make a huge amount of sense to me, because you could've just had your small team of front end developers talk to your small team of back end developers to create exactly the optimized endpoints they needed to solve the problem.

To me GraphQL really seems like a solution for an organizational problem, where there are dozens of teams who all maintain their own services and apps, and now a variety of front end teams want to combine different sets of data from services maintained by different sets of back end teams in a way that doesn't have alignment across the company as far as deployment/release schedules go... Well now it makes sense to construct a flexible API schema maintained by and for front end specialists - it's just moving their already-existing data processing/join logic out of their various clients into a common server-side component.


> because you could've just had your small team of front end developers talk to your small team of back end developers to create exactly the optimized endpoints they needed to solve the problem.

I think the OP (and many of the comments in this discussion) is about making a graphql endpoint for public consumption.

if you are doing it solely for internal use, it does make sense that the "break even" point would be different.


> because you could've just had your small team of front end developers talk to your small team of back end developers to create exactly the optimized endpoints they needed to solve the problem

What you are describing is called BFF I guess.

And apparently it's already out of date so let's again split BFFs into smaller parts.

https://martinfowler.com/articles/micro-frontends.html#Backe...

I am not against services; it's organizationally motivated "micro" services I am very afraid of.


> The whole thing was invented by a "software consulting" firm

I don't know the whole origin story. It definitely does feel like something Martin Fowler would come up with. But I blame Google for really making it a trend:

https://www.youtube.com/watch?v=3Ea3pkTCYx4

And you can see they understand the whole problem with microservices. It's the same thing The Mythical Man-Month was trying to tell everyone decades ago[1]

> When n people have to communicate among themselves, as n increases, their output decreases

Microservices exacerbate this. It is the Multics model. Each microservice implements its own, often wildly different, API, which every bit of code that needs to use that microservice has to go and implement.

[1] https://en.wikipedia.org/wiki/The_Mythical_Man-Month


The point of microservices is only partly to support individual teams owning a service. AFAIK the main points are isolating failures, independent deployments and horizontal scaling of individual components.

I do agree that without a good API design it can become a mess quickly and most companies go for microservices without a clear understanding of what goals they are trying to achieve with microservices. For those companies, sticking with a monolith would've probably worked better.

I've even heard cases where companies went back to a monolith and I think that is actually a smart decision in some cases.

But I definitely don't think it is a waste of CPU power.


I spend about 3/4 of a full time job building, maintaining and improving a corporate GraphQL API, and have for the last few years. What you are describing is not my experience. In fact it is far easier than it was in the old days when each new requirement meant code changes to a REST API.

Certainly there have been problems with queries that had unacceptable performance, even those that took down the whole API server and database. Of course that wasn't a novelty with our REST APIs either. It certainly is an issue with GraphQL, but it has been a manageable one for us.

Largely this is because the API is not public, and it doesn't have to simply handle anything that is thrown at it. When we see a frequent, slow GraphQL query in a report, we have many options to deal with it, including going to the front-end team and asking them to query in another way, and I have had to resort to that. Often I can optimize the code instead.

But that hasn't been a huge problem, especially compared to the great benefits of pushing most of the data work to the front-end. And the size and complexity limitations we've built into the API handle such problems seamlessly most of the time. The caller gets a clear error message that specifies the problem, and they can usually compensate very quickly with an altered query.

When they can't then I get involved and sometimes have to say, uh, we can't do that ... without scads of extra work. And the work on my plate today is for one of those scads.

I was doing lots of REST API work before GraphQL APIs, and my own and our corporate experience is that GraphQL solves a lot more problems than it causes.


The problem with GraphQL is on the front end. Suddenly, the FE team becomes responsible for understanding the entire data model, which resources can be joined together, by what keys, and what is actually performant vs what isn't.

Instead of doing a simple GET /a/b/1/c and presenting that data structure, they now need to define a query, think about what resources to pull into that query etc. If the query ends up being slow, they have to understand why and work on more complex changes than simply asking the BE team for a smaller response with a few query params.

I hit this problem when contemplating exposing the API of the application I work on to customers, to be used in their automation scripts.

We quickly realized that expecting them to learn our data model and how to use it efficiently would be much more complicated than exposing every plausible use-case explicitly. We could do this on the "API front-end" by building a set of high-level utilities that would embed the GraphQL queries, but that would essentially double much of the work being done in the front-end (and more than double if some customers want to use Python scripting while others want JS and others want TCL or Perl).

So, we decided that the best place to expose those high-level abstractions is exactly the REST API, where it is maximally re-usable.


I think what you are basically saying is the people working on the front-end are a bunch of children that cannot be trusted to do the right thing.

I’ve seen this a lot from backend teams, and it’s beyond frustrating.

Because now my nice clean frontend code suddenly has to deal with a bunch of franken query logic simply because the backend team cannot be bothered to alter their “pristine” API.

Never mind that this means a thousand requests where one graphql query would have sufficed.


Can confirm. Getting developers (never mind QAs) to build data-model savvy appears to be one of those things some have taken for granted, right up until you realize other people really did mean it when they said you were nuts.

I've never seen it as nuts, just a bare pre-req of modern computing. Apparently this view is the subject of widespread controversy amongst peers.


> Largely this is because the API is not public, and it doesn't have to simply handle anything that is thrown at it.

I think this is the key point.


Can’t you just have a gql query abort the moment it takes too much time to retrieve the requested data?


You can. There's the concept of query complexity as well that lets you simply reject queries that are too complex and likely to cause trouble.
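
A crude sketch of the idea using graphql-js (a plain depth cap rather than real cost analysis; off-the-shelf packages exist for both, but the core is small):

  import { parse, visit } from "graphql";

  // Walk the parsed query and track how deeply selection sets nest.
  function queryDepth(source: string): number {
    let depth = 0;
    let current = 0;
    visit(parse(source), {
      SelectionSet: {
        enter() { current++; depth = Math.max(depth, current); },
        leave() { current--; },
      },
    });
    return depth;
  }

  // Reject before any resolver ever runs.
  export function assertNotTooDeep(source: string, limit = 10): void {
    if (queryDepth(source) > limit) {
      throw new Error(`Query depth exceeds limit of ${limit}`);
    }
  }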


100% this. Folks see the cleanliness and simplicity of the front end without realizing the mountainous costs on the back.


Hmm.. that hasn't been my experience at all. I wrote the public GraphQL API for my company, and it was a pretty straight-forward experience. Yes, I had to spend some time on the basic plumbing, but now if something needs to be added, it's just a matter of defining some interface for it and fetching when required. Grabbing an object from the network or DB doesn't need optimizations, a query plan, or have corner cases. Even grabbing i objects starting at offset j only adds a bit more busy work.

Maybe the trick is to keep it simple? There's no need for a bi-directional graph or advanced filtering. But if there really is, it's not like sticking to REST would make that any easier. Some things are just hard, no matter the interface.


GraphQL itself is not a trap, but it's easy to fall into the "object graph modelling" trap with it. You probably shouldn't do that unless you have a lot of resources to spend on it. I think the "Graph" in the name is what leads people astray; as long as you stick to TreeQL, you should be fine.


You are right. Some things are just hard. I went deep into GraphQL because I wanted to explore the possibility of it being a more comprehensible interface for the end user in comparison to a REST interface. In such cases, it is not.

GraphQL gives a better way to request nested schemas and handle relationships and recursion. But when you cross that line, the client gets ideas and starts asking "Why not be able to do that operation on the 5-level-deep object?". Now you either have to not allow the client to do that, or you have to "rewrite the database" to make recursion optimal.

This is not a problem of GraphQL. This is an HTTP problem. When you need to expose the database querying layer over HTTP, you have a problem regardless.


Try nested pagination (i.e. open the 352nd page of the 7th book on the 3rd shelf in the 5th room of the 3rd city library). Make it performant. Have fun with GraphQL! /s


That sounds trivial I think because you are looking for exactly one item and there's no pagination involved.

The problem might be to get those 352nd pages of every book with a title starting with "A" sitting on the 3rd shelf of every city library: when there are unbounded results nested deeper than the top level, possibly multiple times, that's when it gets hairy.
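
Roughly, such a query ends up looking like this (a sketch against a hypothetical schema); note that a single $bookCursor can't represent a different position per library, which is exactly where the per-branch follow-up queries start:

  const query = `
    query ($libCursor: String, $bookCursor: String) {
      libraries(first: 10, after: $libCursor) {
        pageInfo { endCursor hasNextPage }
        edges { node {
          books(first: 20, after: $bookCursor,
                filter: { titleStartsWith: "A", shelf: 3 }) {
            pageInfo { endCursor hasNextPage }
            edges { node {
              pages(first: 1, offset: 351) { edges { node { text } } }
            } }
          }
        } }
      }
    }`;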


Actually, it's about opening 10 different books at different pages, and similarly upwards... There is no GraphQL mechanism for it outside of hacks in individual libraries that cost extra server roundtrips, and it seems like the GraphQL authors simply avoid discussing it.


How would you implement this with rest / something more traditional?


For example, you can make a separate REST method returning all levels you need. However, it still doesn't account for items lost while paginating (e.g. some visible items are deleted while one scrolls on a device etc.).


The flip side of this is a lot of folks are adopting GraphQL who are not prepared to do it well, so they make something half baked, missing things you need, and their documentation is absolutely useless.

This isn't new, there's plenty of sloppy REST APIs, but it was so much easier and less painful to explore and stitch together pieces of an imperfect REST API than it is to interact with a bad GraphQL API.


I’ve found it easiest to implement in Node due to the explicit event loop structure. You use data loaders, which is a super generic term that means “batch all requests for this resource into the next event loop tick.”

So when a query requests a list of users, and then every users friends, that becomes two queries: one to load all the users, and one to load all the friends for all those users. The net effect is that your number of queries is O(query depth) rather than O(objects requested).
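
A rough sketch with the dataloader package (the users/friends tables and the db helper are just for illustration):

  import DataLoader from "dataloader";
  import { db } from "./db"; // hypothetical query helper (e.g. a pg pool)

  // Every .load(id) call made while resolving one level of the query is
  // collected and handed to this function as a single batch on the next tick.
  const friendsLoader = new DataLoader(async (userIds: readonly number[]) => {
    const { rows } = await db.query(
      "SELECT * FROM friendships WHERE user_id = ANY($1)",
      [userIds]
    );
    // dataloader requires results in the same order as the input keys
    return userIds.map((id) => rows.filter((r: any) => r.user_id === id));
  });

  // In the resolver, each user's friends field just calls load();
  // batching happens automatically. (In a real server you'd create a
  // fresh loader per request to avoid cross-request caching.)
  const resolvers = {
    User: { friends: (user: { id: number }) => friendsLoader.load(user.id) },
  };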

Admittedly this does tend to work best with more K-V oriented data than truly relational data, and might be hard to retrofit onto a brownfield project, but I've never found it all that hard to do.


I've found that this is a simple and effective way to handle relational data in either REST or GraphQL. But imagine having to traverse trees, filter on data of different types and levels, and sort and filter on edges. It can get pretty complex, and I am not saying that REST would be easier in those complex cases.

In my opinion GraphQL and REST can both be super cute for simple everyday queries. But I am thinking that the people creating databases like postgres, mongodb, neo4j, etc. are doing exactly that: trying to give us the power to query our data efficiently. Why not expose the database directly and just add a layer for security, control, decoration and other stuff that adds value? Why rewrite databases?


There are products that do that! Hasura comes to mind.

But APIs can have different use cases. It’s usually considered bad to directly expose your table schema over GraphQL because it locks you in and makes it hard to change your data model over time. And not all API access is “get this data” and “set this data” — it can be difficult to express complex logic in just a database. And of course, some GraphQL APIs aren’t backed by a database - they’re backed by other services (a la the “backend for frontend” pattern).

I’m very pro choosing the simplest solution that works — but sometimes, the simplest solution does bring some complexity in exchange for other trade offs (like flexibility).


I agree with you. I am currently working on a project that needs to provide very sophisticated querying capabilities, so I'm kind of seeing everything through that prism.


css meets backend ?



that was unplanned


I worked in the data federation space for a number of years (it's actually quite an old term, I worked in it back in 2013, around the time there was an early wave of activity around this and the concept of a "data fabric").

When I saw GraphQL come out, I knew that what you are saying would happen.

In the data federation tool I worked on, SQL was the interface abstraction used to join across heterogeneous platforms (think of things like Presto/Trino or Dremio). GraphQL as an interface requires the same underlying infrastructure as that federation tool in terms of query analysis, parsing, planning, optimization, execution, etc.

Those are "hard problems" due to lack of standardized interfaces, access patterns, direct data access, I/O, network bandwidth and infra related latency, costs, compatibility, data types, etc. These problems are distributed system problems coupled with often incompatible interface layers (e.g., even if you are using multiple SQL databases with GraphQL, you run into the same).

If your scale is such that you can build GraphQL on a handful of systems and for a handful of use cases, great! If you have to go to a certain larger scale, you're back into federation territory (which in the app layer might also be called API composition).

One potential option: when you reach the point where you need more complex GraphQL query coordination than seems sensible to implement yourself, pair it with a data federation tool such as Presto/Trino, Dremio, or Denodo, or look at caching/materialized-view approaches (engines like that are becoming decoupled from databases, e.g. Materialize.io), and let those engines do the hard work.

In that case your work becomes more like GraphQL -> SQL or API -> a data federation, caching or materialization platform. CQRS and event sourcing plays a role here too.

Consider also the possibility that, if you are willing to accept a bit of delay in aggregated results from multiple systems, you can do those compositions or aggregations in the data platform layer and simply feed them to the GraphQL interface. That could even be done in a single database/data platform if you really wanted, without too much fancy federation tech.

Federation is powerful but complex. It seems like a fun hard problem, but for many tech teams, it can be a complexity and time suck. My recommendation would be try to avoid building that if you can.


A good summary, and similar to my own experience.


> GraphQL is a great experience when you consume it and the service fulfills your query needs. Because you just ask stuff and you get them. It's really cool.

How about caching? It feels like GraphQL tries to win some (arguable) flexibility in putting together clients, and in the process throws out most of the operational advantages of resource-based APIs, with significant disadvantages in how you put together a backend as well.


One solution is "Persistent Queries". Another solution is to throw a Varnish cache in front and cache the hell out of POST requests :P

Is it elegant? No.

Do you have another service to manage? Yes

Will you pay someone else to do it for you? Possibly

I think there are already products trying to do that. But how many layers of abstractions and dependencies are you willing to have in your everyday processes?


Check out Relay.js: https://relay.dev/

It does a lot of client-side caching for you. The documentation is atrocious though IMO. I'm not sure if there exists a similar framework for backend caching.


> Check out Relay.js: https://relay.dev/

Relay is a GraphQL client. That's the irrelevant side of caching, because that can be trivially implemented by an intern, especially given GraphQL's official cop-out of caching based on primary keys [1], and doesn't have any meaningful impact on the client's resources.

The relevant side of caching is server-side caching: the bits of your system that allow it to fulfill results while skipping the expensive bits, like having to hit the database. This is what really matters both in terms of operational costs and performance, and this is what GraphQL fails to deliver.

[1] https://graphql.org/learn/caching/


https://relay.dev/docs/principles-and-architecture/thinking-...

Take a look at this. Either you didn't know what's challenging about caching nested graph data, or we have different definitions of triviality/interns.


> Take a look at this.

I repeat: client-side caching is not a problem, even with GraphQL.

The technical problems regarding GraphQL's blockers to caching lie in server-side caching.

For server-side caching, the only answer that GraphQL offers is to use primary keys, hand-wave a lot, and hope that your GraphQL implementation did some sort of optimization to handle that corner case by caching results.

Don't take my word for it. It's really that bad.

https://graphql.org/learn/caching/


You can execute GraphQL queries via GET and set a cache up for it like REST. Technically it's also allowed to cache POST requests but I guess anyone who comes across that is going to raise their eyebrows.


> You can execute GraphQL queries via GET and set a cache up for it like REST.

Does it, though? It seems it really doesn't, nor was GraphQL designed with HTTP caching in mind.

The only references to caching in GraphQL are vague hand-waving arguments about how GraphQL implementations might theoretically be implemented with some sort of support for caching by primary key.

But any type of HTTP caching is automatically excluded from GraphQL.

To put it differently, is there any third-party caching solution for GraphQL? As far as I could gather, the answer is no.


It looks like you are fixated on caching in GraphQL, but that's unnecessary: you can just cache GraphQL like REST, because in the end they are just GET requests. Just cache the GET request.
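
For what it's worth, the GET form is just the query and variables in the URL (a sketch; the schema is hypothetical), so any HTTP cache that keys on the URL can treat it like any other GET, provided the server sends the usual Cache-Control headers:

  const query = "query ($id: ID!) { user(id: $id) { id name } }";
  const variables = { id: "42" };

  const url =
    "/graphql?" +
    new URLSearchParams({ query, variables: JSON.stringify(variables) });

  // A plain GET: a reverse proxy or CDN in front of /graphql can cache it
  // exactly as it would a REST endpoint.
  const res = await fetch(url);
  const { data } = await res.json();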


> It looks like you are fixated on caching in GraphQL, but that's unnecessary

Oh, so one of the most basic features of any API, one which has a direct impact on scalability and motivates entire product lines and businesses like CDNs, is now "unnecessary"?

> you can just cache GraphQL like REST,

Go ahead and show one example, please.


Here's how we do it: https://wundergraph.com/docs/overview/features/caching

We "persist" all operations using their (unique) name as part of the path. Variables are passed via query parameters. We also add a unique hash as a query param which is generated from the configuration; this way, each "iteration" of the application invalidates the cache automatically.

Additionally, we generate ETags for each request to reduce response payloads to zero when nothing changed (https://wundergraph.com/docs/overview/features/automatic_con...). Combined with the generated type-safe client, this is a very solid setup.


It's just a GET request. Same as REST. I don't know what you want me to show as an example. You can use Apache, nginx, squid, any proxying webserver worth its salt...


> It's just a GET request. Same as REST.

I'm not sure you understand the problem at all.

Are you actually able to show an example or not? Because changing the HTTP verb doesn't magically change the problem, and passing query parameters as a request document renders these queries uncacheable.

> You can use Apache, nginx, squid, any proxying webserver worth its salt...

Great, pick the one you're familiar with, and just show the code. Well, unless you're not "worth its salt" or are completely oblivious to the problem domain.


Here's the code: https://github.com/RedShift1/graphql-cached-get

Note that I made a very primitive implementation. Depending on which GraphQL node is queried, the request will be cached by the proxy or not. Apollo GraphQL server has much more fine grained methods of allowing caching (see https://www.apollographql.com/docs/apollo-server/performance...) however I left the example code crude so you see exactly what's going on under the hood.


A GET request is a GET request and anything that can cache one doesn’t know or care whether it’s for REST, GraphQL, a binary file or anything else.


Check back in a couple of hours, I'm on my mobile right now



> see https://www.apollographql.com/docs/apollo-server/performance...

Sorry, that doesn't cut it at all. Far from it. Being able to cache a response is not the same thing as optimizing queries that hit your database. Being able to cache a response means not having to hit your database to begin with, saving on things like traffic into your database and round-trip time.

With REST APIs I can get a web service to return responses that are HTTP cacheable, put a nginx instance between the ingress controller and the web service, and serve responses to REST clients without even touching the web service that provides the REST API. I can even deploy nginx instances in separate regions.

What's GraphQL's response to this basic usecase?


Did you take the time to read the article? I'll try a more specific link to a section https://www.apollographql.com/docs/apollo-server/performance...


i share your sentiment; in neo4j, a popular graph store, you can’t even get primary keys without a third-party plug-in (apoc)


Unless you happen to be using PostgreSQL, in which case some tools like Hasura and Graphile can automate all of that.


I’ve never used those tools, but I don’t see how you can automate away authorization issues. The GraphQL spec[1] says authorization in the GraphQL layer is fine for prototyping or toy programs, but for a production system it needs to be in the business logic layer.

[1]: https://graphql.org/learn/authorization/


In Hasura, you authenticate externally -- can be custom API endpoint that signs a JWT/auth webhook, or an auth provider like Auth0, Okta, Firebase, Keycloak, etc. Doesn't matter, just have to return some claims values.

You can then use these claims values in your authorization (permissions) layer.

IE, when a user logs in, you can sign a claim with "X-Hasura-User-ID" = 1, and "X-Hasura-Org-ID" = 5, and then put rules on tables like:

  > "Role USER can SELECT rows in table 'user' WHEN X-Hasura-User-ID = user.id"

  > "Role USER can SELECT rows in table 'organization' WHEN X-Hasura-Org-Id = organization.id"
There's more depth to it than this, but this is the gist of it.


this is really powerful stuff when working with a CISO: “the data itself defines who may access it”


PostgreSQL and other databases have fine grained authorization controls down to the column level, what more does one need?


I say it over and over again, row/column-level permissions are not even close to enough for any larger app. How do you translate access restriction like "user X cannot view more than 10 articles per month" or "customer Y cannot insert any more orders if total outstanding/unpaid amount of invoices in last 30 days exceeds Z" into row/column-level permissions? You don't, that's why you have a business layer.


> How do you translate access restriction like "user X cannot view more than 10 articles per month"

That's not permissions in the general sense of authorisation, that should just be modelled as any other "business logic". Put that article_limit in the users table and join it in the selects.

Edit: or tracking the users article views with timestamps, and making a select aggregate over the month..

> "customer Y cannot insert any more orders if total outstanding/unpaid amount of invoices in last 30 days exceeds Z"

insert into orders select from user a inner join invoices b etc. You can put any amount of complexity in there on how you want to limit the orders.


But if you design a public GraphQL API, you cannot trust the query issuer (Browser or App on the Client). You have to enforce those rules outside of the query. And yes, there is tons of ways to do this outside of GraphQL, which is exactly my point, that row/column permissions alone do not suffice.


Ok I get you now, yes you are correct, if you have a public API and you have business constraints that need to be enforced, you have to do this server side.

But you can still implement this easily in queries, so for example granting the client only read permissions, plus execution permissions on certain database functions, and inside these functions you can implement the constraints.

What I'm trying to say is that you don't necessarily need any backend "services" (Java etc.) to implement these types of constraints; they can be modelled as any other business logic in the database.


agreed. i've worked extensively with graphile, and from working with people who were really strong in postgres i learned all the things we needed could be modeled in postgres. many of the less obvious solutions would involve functions, triggers, or stored procedures. but i liked that there was less ambiguity about where that kind of logic was implemented.


Observability, flexibility (I don't need to push a migration to change auth), SSO integration, and the ability to keep a clean separation between user and "machine" (service, replication, etc) accounts.


What does observability mean? Can you translate this into a concrete question?

As for flexibility, do you mean authentication or authorization?

SSO can be done at the SQL server level (MSSQL has it and so does Postgres, don't know about others), but handling the SSO part in your app and using "set role" and passing a user ID for your row level security policies to use is easier to set up and more flexible.

Clean separation between user and machine accounts can absolutely be done with MSSQL and Postgres.


Yeah this must be one of the most underused features ever. People don't realise that you can solve little bobby tables by just setting the permissions correctly in the database.


You cannot model every business constraint in DB permissions; Stuff like "If customer X has less than 3 active contracts, new contract activations require sign-off of Manager of at least level Y" etc.


That can absolutely be done via triggers, or by limiting access using functions for certain operations instead of direct table access.


And how do you unit test triggers? Yes you can do it in a ton of ways, but you just end up scattering your business layer all over the database, the GraphQL adapter, API gateways etc. The alternative is just to create a dedicated BE service endpoint (in whatever you prefer, REST,HTTP/JSON,SOAP, gRPC etc.), which does the required checks for you.

Triggers, functions or whatever you use are just code; yes, that is my pitch: have your business logic in code, ideally in a dedicated BE API endpoint instead of in the DB.


Are you suggesting that you don't use any constraints (unique, not null...) or defaults in the database either?

That's your business logic in the database right there.


I for one think that the code path that leads to an insert conflict should be integration/unit tested. Something needs to codify how the error is reported to the client.

So no, you don't get to wiggle out of that one.

Unit testing PL/pgSQL is nasty.


I agree for the most part.

But unit testing is not nasty at all: just look up pgTAP.

What's nasty is that the incumbent development model intersperses data modelling constraints in the database and in the non-db code, and then we test one layer implicitly from another layer.


Unit testing is nasty? You can use whatever tool you like. I've used phpunit, junit and jasmine for unit testing databases. Choose whichever tool you like most.


It can get hairy due to how most apps are developed.

You'd usually have backend code guarding against constraint-violating entries, and then you'd have a database constraint too.

So what you need to do now is test almost exactly the same thing from your code test and from your database test to get proper coverage.

Enter declarative ORM schemas, and suddenly there's not even a guarantee that the schema you are running really matches what your ORM thinks the schema is.

For that reason, I prefer all those SQL-based database migration/evolution approaches over ORM-based schema generation, coupled with pure SQL tests (eg. with pgTAP, but yeah, any tool can do).

Basically, even for declarative code, there should be a unit test, a la double-entry bookkeeping in accounting.

And even if this is what I prefer and believe is only right, I never worked on a large company project that did all of these things.

So I don't think the entire topic should be easily dismissed: while unit testing is simple, have you ever worked on a project that tested db-schema embedded logic exactly at the right level (and not the level up)?


That’s the path I ended up taking. The GraphQL resolvers had no idea there was a database. They talked to a layer that understood all the business objects, which sat on top of a layer that understood authorization, and only that layer had any connection to the data store.


In my mind that's just an insert that joins the contracts table and does a CASE WHEN active_contracts < 3 THEN true ELSE false END for the require_sign_off column.


You cannot trust the query issuer (Browser or App on the client). If you have a public GraphQL API, you need to enforce these rules. If you can just alter the query to bypass the business rule, this is called a security hole.


You just handle this part in your code and leave the rest to Hasura.


And now you're adding an extra layer you don't control with its own set of problems.


Using postgraphile for my current big project is the best technical choice I've ever made. There's been the occasional obscure sql incantation to learn but otherwise has been so much more productive than hand-crafting REST endpoints.


Every time I start with GraphQL I'm surprised that I'm writing all the routes and middleware I'd need with a RESTful API in Express. I feel like I'm missing the point.


To what extent does this headache go away if autogenerating graphql from a relational db, using tools like Postgraphile or Hasura? I never considered making my "own" graphql service but those tools sure make it look easy to create a nice API controller through db migrations.


We have had a wonderful experience with https://prisma.io


Do you worry about over-coupling when using Prisma? I'm hesitant to let front-end control the schema in any scenario where they're not the only users of that DB. Works great until it doesn't and can be a pain to migrate control to a backend/API team.


Our Prisma schema resides in our "backend" (Prisma essentially governs our master API). So I'm not sure why you're concerned that the front-end might control the schema.

The nicest thing about Prisma is that it is a declarative single-source-of-truth for our data models. Everything resides in one schema, all changes to database models and all migrations run through Prisma, and, best of all, strong types are inherently built in.

The team is also building useful middleware like field-level encryption; all of this together makes Prisma a very complete package.

Of course, there is a price for this convenience — we sacrifice some higher-level DB-side features. But Prisma is such a competent tool that we don't miss them much.


But that's true for any solution. This goes back to "avoid db server specific SQL", you gain the portability advantage but you're willingly giving up advanced features the db server has. How far do you want to take this to be "independent"?


I'm not concerned with independence from the database _implementation_ but independence of the database schema from any one consumer. This is one of the more interesting things about tools like Hasura/Postgres/Postgraphile in my eyes, they encourage you to separate frontends from the backend early on. That might be one team to start, but you can divide labor and add more services without rearchitecting like you would if the database was controlled by ORM from a single front end.


Prisma only queries your own database. A GraphQL API could talk to many services and give the consumer one endpoint which all of this can be queried.


About 90% of our GraphQL API passes through Prisma. We have a master API that talks to many different microservices to process data and so-on, but all the data ultimately ends up residing in our Postgres DB. One of the nice things about Prisma is that it gives you a very declarative way to manage your data, and encourages using your DB as your "single source of truth".

Querying everything through one API (which relays requests to other microservices, if necessary) and having one Postgres DB which acts as the "endpoint" for all of our data is a very clean model.

For edge cases, it's also possible to write custom resolvers. Prisma doesn't prohibit that.


Philosophically, is it really a "microservice" if it doesn't have its own database? In my opinion, if multiple services are ultimately all connecting to and storing their data in the same database, then you haven't really gained very much, since one misbehaving client can still take down everyone else's service. The point of microservices was always sold to me as "every team owns their own stack", and specifically if one team's stack goes down, everyone else can cheerfully continue. (Or less cheerfully, if the team whose stack went down was identity or user service.)


GraphQL is great for consumers, it's a nightmare for producers.


I think this is solved by creating "full stack" teams where the front end developers who want the GraphQL API are also the same team who define the schema and build the service that serves that API. In large companies where GraphQL makes sense, that GraphQL API service would just call into pre-existing services that serve JSON, Protobuf etc maintained by 100% back end teams.


Forming full stack teams doesn't remove the pain of having to build the producer api in the first place. It merely shifts the burden from a backend only team to a full stack team.


> “On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.”

Is this still true if the structure of the data is relatively simple, but you have tens of millions of users? Say the data that is returned has 20 or 30 properties in total per user, and you are only ever asking for specific data about an individual user.


You have to do your own optimiser to avoid, for instance, the N+1 query problem. (Just Google that, plenty of explanations around.) Many GraphQL frameworks have a “naive” subquery implementation that performs N individual subqueries. You either have to override this for each parent/child pairing, or bolt something on the back to delay all the “SELECT * FROM tbl_subquery WHERE id = ?” operations and convert them into one “… WHERE id IN (…)”. Sounds like a great use of your time.

In the end you might think to yourself “why am I doing this, when my SQL database already has query optimisation?”. And it’s a fair question, you are onto it. Try one of those auto-GraphQL things instead. EdgeDB (https://edgedb.com) does it as we speak, runs atop Postgres. Save yourself the enormous effort if you’re only building a GraphQL API for a single RDBMS, and not as a façade for a cluster of microservices and databases and external requests.

Or just nod to your boss and go back to what being a backend developer has always meant: laboriously building by hand completely ad hoc JSON versions of SQL RDBMS schemas, each terribly unhappy in its own way. In no way does doing it manually but presenting GraphQL deviate from this Sisyphean tradition.

I read in the article that NOT having GraphQL exactly match your DB schema is a best practice. My response is “did a backend developer write this?” Sounds awfully convenient for job security!


If you're using an embedded database like SQLite or LMDB, the N+1 query pattern may be less impactful to performance.


Thank you for the response, I really appreciate it


My experience using GraphQL is the same as using React. It looks great at first glance and it makes sense. After using it for a while, I realized it was designed to be used by the fb team. For example, both are designed for large teams to work on small components separately. Most developers are NOT fb, thank goodness. There are better, faster and lighter-weight alternatives for smaller or other kinds of teams.


There's nothing that stops you from exposing only what you would expose in a "restful" API. You can even specify the exact queries that can be used by the client. And even then GraphQL gives some nice advantages, such as introspection and endpoint discovery, as well as smoother error handling and increased type-safety.
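
For example, a minimal operation allowlist in front of execution might look like this (a sketch against graphql-js, not any particular framework's API; the operations are hypothetical): clients send an operation id instead of a free-form query, and anything not on the list is rejected.

  import { graphql, GraphQLSchema } from "graphql";

  // The only operations clients are allowed to run
  const allowedOperations: Record<string, string> = {
    UserById: "query ($id: ID!) { user(id: $id) { id name } }",
    MyOrders: "query { viewer { orders { id total } } }",
  };

  async function runOperation(
    schema: GraphQLSchema,
    opId: string,
    variables: Record<string, unknown>
  ) {
    const source = allowedOperations[opId];
    if (!source) throw new Error(`unknown operation: ${opId}`);
    return graphql({ schema, source, variableValues: variables });
  }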


This seems to echo SQL. It is amazing. You have to be mindful of how much effort a query will take. And very careful.


Filtering on relationships is a big issue for us. Each nested node in the query graph (tree?) generates a new SQL query. We seem to be committed to that approach at this point; trying to migrate to a world where we inspect the whole thing and then make one query isn't going to happen.


Not trying to specifically shill my own library, but I developed this a while ago before there were any established patterns with filtering on relationships in graphql. https://github.com/tandg-digital/objection-filter

Out of curiosity, would functionality like this implemented in graphql solve your issues?


i know relationships don’t typically have props in a store like neo4j, and moreover you can reproduce that in something like postgres with a foreign key

we had a challenge like what you describe though, and were able to avoid new queries by representing the relationships as objects. in so doing, we leverage row level security and jwt claims, which is an approach to authorization with high epistemic legibility.


I think similarly. If you have control over the back end environment, it's not worth the extra effort, additional complexity (e.g. caching challenges) and performance overheads to run a GraphQL server.


This is why I wouldn't use GraphQL without something like Hasura, where a relational Db schema is used to automatically generate GraphQL and REST apis


Thank you, I’ll check it out


So much this - as a consumer it is wonderful, but implementing your own is ... fun.


Have you tried Hasura?



