The problem that GraphQL was trying to solve is real: reasonably-sized REST projects usually end up inventing their own awkward, ad-hoc mini query languages on top of their endpoints.
But GraphQL as a solution is not great. It looks nice at first, but there is too much hassle to deal with.
P.S. I've seen quite a large fraction of GraphQL projects actually using Node.js as the backend. If that's the case, I would recommend tRPC[0] over GraphQL - it's more seamless and straightforward.
I'd argue GraphQL really doesn't even solve that "mini-query-language" problem all that well. But I'm with you; that's how it's sold. It's one of the big things its proponents say. And it fails at it.
Let me pick on an example of one of these rest-api-mini-query-language specs: the Microsoft Graph API, which uses OData. It supports:
* Count (don't return items; return a count of them)
* Expand (graph traversal on related resources)
* Filter
* Format
* Order (sorting)
* Search
* Select
* Skip
* SkipToken
* Top
* A bevy of others
Of these, GraphQL solves Select and Expand. That's it. Everything else INEVITABLY becomes a pseudo-OData mini query language on top of GraphQL; the exact same problem REST APIs had! Pagination. Skip/limits. Response reducing/counting/analytics. Filtering. Etc.
Of course, a framework has to stop somewhere, lest you become OData, which isn't all that great to use. So I'm not proposing that GraphQL should do more, but rather that its proponents need to stop listing this as an advantage of the framework, because it's actually a disadvantage. It's only "good" at this relative to literally `npm i express`, the most basic-possible REST API. The REST & RPC ecosystems have a wide array of higher-level tooling to select from, at every level from "nothing" to "everything and the kitchen sink"; GraphQL is startlingly boring in comparison, and proponents who list this as an advantage of GraphQL really aren't doing much more than admitting how little exposure they have to competing frameworks (or, similarly, how poorly APIs were built at whatever company they worked at last).
I made a query language that parses as legal GraphQL using a vanilla parser, but has directives like `@filter, @recurse, @optional` etc. that make it a real query language. Also, instead of returning a giant fully-nested result, it flattens results and emits them row-by-row like a SQL database. This means the query evaluation can be lazy and incremental — if you write a query that has a billion results but only load 20 of them, then only 20 rows' worth of work happens.
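A sketch of what a query might look like (the schema and field names here are hypothetical, not from the real system):

```typescript
// Hypothetical query against a code-analysis schema. It parses as ordinary
// GraphQL; @filter/@recurse/@output are the directives that turn it into a
// real query language. Results come back as flat rows, not a nested tree.
const query = `
{
  Repository {
    name @output

    dependency @recurse(depth: 5) {
      name @output(name: "dep_name")
      version @filter(op: "<", value: ["$minVersion"])
              @output(name: "dep_version")
    }
  }
}`;

const args = { minVersion: "2.0.0" };
// Each result is one flat row, e.g.:
// { name: "my-repo", dep_name: "leftpad", dep_version: "1.3.0" }
```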
My company has been using this in production for 6 years now across everything from TB-scale SQL clusters with X00,000 tables/views, to querying our own codebase, configuration, and deployment information to find and prevent bugs. I gave a 10min talk at a conference about this recently, if you'd like to learn more: https://www.hytradboi.com/2022/how-to-query-almost-everythin...
Software architecture really needs to acknowledge the human aspect of building platforms. GraphQL is a great solution to the organizational problem of slow communication and/or misaligned incentives between frontend and backend teams. GraphQL is essentially a self-service approach where API developers create a flexible, open-ended data access plane that end users can consume as they wish. That incurs a lot of extra technical complexity, but obviates a lot of organizational complexity. That can be a very worthwhile trade if your backend is public or otherwise serves a really diverse group of clients.
The most efficient and effective API integration projects I've done are those where the API and frontend teams are tightly knit, working off a shared backlog and able to pass a chain of requirements from design, to contract writing, to development on both sides, and get really tight alignment. That lets us create tightly optimized REST endpoints that are very cache-friendly and deliver precise payloads, optimizing both round trips and bandwidth. It's actually easier to build because requirements are really clear, but it comes at the cost of doing all that communication to align on requirements.
Then again, it only takes me 5 minutes to add a new endpoint that just dumps back a specific SQL query result.
I think a lot of companies make adding endpoints into a big deal, often a whole new file or class with documentation and tests.
An SQL query doesn't need tests.
I think another part of it is that SQL still scares people. That is, I'd put GraphQL in the same camp as Active Record, ORMs, and other silly attempts at creating a query language to avoid using a query language.
Have you ever done financial stuff on SQL? Transaction processing or even just CRM/billing?
Spend 15 minutes imagining a single query that fetches invoices, combines them with bank transaction data and produces a categorized list of unpaid-but-not-late, unpaid-and-late, not-fully-paid-but-not-late, not-fully-paid-and-late, fully-paid-in-time and paid-too-much invoices for this year’s billing cycle, selected from tables that contain info for all historical years too.
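Even a rough sketch of the shape such a query takes makes the point (tables, columns, and the payment-timing subtleties here are all invented/simplified):

```typescript
// Rough sketch only: assumes invoices(id, total, due_date, issued_at) and
// payments(invoice_id, amount), and ignores whether the final payment
// actually arrived before the due date. Real versions get much hairier.
const invoiceStatusSql = `
SELECT i.id,
       COALESCE(SUM(p.amount), 0) AS paid,
       CASE
         WHEN COALESCE(SUM(p.amount), 0) = 0 AND i.due_date >= CURRENT_DATE THEN 'unpaid-not-late'
         WHEN COALESCE(SUM(p.amount), 0) = 0 AND i.due_date <  CURRENT_DATE THEN 'unpaid-late'
         WHEN SUM(p.amount) < i.total        AND i.due_date >= CURRENT_DATE THEN 'partial-not-late'
         WHEN SUM(p.amount) < i.total        AND i.due_date <  CURRENT_DATE THEN 'partial-late'
         WHEN SUM(p.amount) = i.total                                       THEN 'paid-in-time'
         ELSE 'overpaid'
       END AS status
  FROM invoices i
  LEFT JOIN payments p ON p.invoice_id = i.id
 WHERE i.issued_at >= DATE_TRUNC('year', CURRENT_DATE)
 GROUP BY i.id, i.total, i.due_date;
`;
```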
I was referring to how hard some companies make it to ship basic code.
The GraphQL setups I've had to work with are all far more complicated than a function that returns an array; if your framework needs more than that to dump the result out on an HTTP request, you've got problems, but SQL isn't one of them.
GraphQL also has advantages that are otherwise difficult to realize, at least without an API schema. Request and response validation and object-level caching come to mind. How would you otherwise share cached objects between API endpoints? You'd need to set up a custom Redis integration. With GraphQL, such things often come in nicely wrapped packages.
I use https://www.jsonrpc.org/specification. I hate REST. With JSON-RPC, I can have true 1:1 mapping on both ends to how I write code, and it's transport agnostic. Doesn't rely on any language. I use PHP in the backend, TS in the frontend.
There are ways to make it somewhat type safe with tools like https://open-rpc.org/ but I tend to just go vanilla with it and write TypeScript types on the frontend for the results.
The problem with REST for me is that it locks you to HTTP verbs and URIs. And it's not a great mapping for much more than CRUD. I find it really restrictive and annoying. It's specific to HTTP as the transport, so I can't use REST over websockets or some raw TCP server-to-server pipe, etc, without making a kludge of it.
This is actually really cool! Any chance you can point me to an implementation for JSON-RPC? I'm interested to see the benefits with JSON-RPC over a traditional REST API.
The spec explains pretty much everything you need to know. Look for a client and server implementation in your language of choice; essentially all it needs to do is decode the JSON message and validate that it uses the right structure, and handle message IDs for batching, then call your custom message handling logic.
Over HTTP, basically you just have a single endpoint which always takes POST requests, and always responds with 200 status, even on errors. If you're using this in a frontend JS application, you'd want a wrapper over fetch() which rejects the promise if the RPC response has an error in the message.
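A minimal sketch of such a wrapper (the `/rpc` path and the method name in the usage note are placeholders for whatever your backend exposes):

```typescript
// Minimal JSON-RPC 2.0 client over fetch(). Errors come back in the body
// with HTTP 200, so the wrapper inspects the message and rejects itself.
let nextId = 0;

async function rpc<T>(method: string, params?: unknown): Promise<T> {
  const res = await fetch("/rpc", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: ++nextId, method, params }),
  });
  const msg = await res.json(); // always HTTP 200; errors live in the body
  if (msg.error) {
    throw new Error(`RPC ${method} failed: ${msg.error.code} ${msg.error.message}`);
  }
  return msg.result as T;
}

// Usage:
// const user = await rpc<{ id: number; name: string }>("user.get", { id: 42 });
```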
IMO, the main problem with GraphQL is that a query language isn’t much without a query planner.
Without a planner, you end up with the backend picking a set of queries that the caller can make (and with it, you probably want to restrict callers, anyways). I don’t see how that’s much different from writing N stored procedures and exposing those.
If you give those consistent names, the caller has a clear list of things that can be done. With GraphQL, the API looks more flexible than it is.
I think GraphQL's main innovation is that it allows the caller to specify which fields to return. That makes it easy to provide the 2^f - 1 possible variants of a query over f fields, cutting down on traffic significantly in many cases.
The major problem with GraphQL is that it tries to replace REST, instead of augmenting it. REST is a damned fine model, and arguably what every web API should use.
I guess it wouldn't have been as much fun to augment REST with a query language as to invent a new one.
> You can very easily have a graphql endpoint working in a REST API.
The fact that it is an additional endpoint with non-REST semantics is what makes it a replacement rather than an augmentation of REST. In a REST API, endpoints correlate with resources; the resources are queried and updated with HTTP verbs acting on those endpoints. GraphQL introduces a parallel world.
What should have been done instead, IMHO, was to design a standard (not necessarily limited to JSON!) way to query and update resources through their endpoints.
I don't understand how that makes it a replacement. Breaking conventions of REST for one endpoint doesn't necessarily mean you're replacing it, unless you're using REST as a dogma rather than a tool.
Having recently had to take over a GraphQL + Relay project in the last ~6 months, I agree with this. When things work (e.g., the frontend/backend magically staying in sync) it feels really great. But with the amount of tech you're throwing at the problem, I found it difficult to learn (the ecosystem is immature, the documentation often sucks, and your problem space grows with GraphQL + GraphQL client of choice + web framework of choice). I found the bugs and opaque performance issues harder to track down. I'm sure that with time and mastery these problems would resolve themselves, but I'd much rather spend my "cool stuff budget" on tech that (IMO) provides a significantly greater benefit for the inconveniences caused.
In my relatively naive opinion, graphql exists to solve a few things (which are mostly benefits for large scale companies):
1. Reduce the total number of HTTP connections required to get the data for a specific component or page. IMO this is presumably less important with HTTP/3, but the driving force is probably similar for both.
2. Give backend-for-frontend style services more autonomy to provide an interface that works for a specific feature on the front end without coupling it to specifics of how the backend organizes resources
3. Allow performance issues with frontend query logic to be addressed without changing anything on the front end
I think the situations where it would be useful are large teams that want to operate more independently where the overhead of dealing with graphql is worth it.
An example being someone trying to render a component for a list of comments. It would be great if the front end could just write some "magic" query that gives them everything they want and they don't have to worry about batching requests to specific endpoints and the order in which they should make those requests. That doesn't mean the problem magically goes away, but it's now someone else's problem and that's good: the app development can free itself from things it shouldn't care about.
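Something like this hypothetical query (all names invented):

```typescript
// One round trip for a comment-list component, instead of a request chain
// of comments -> authors -> reaction counts. Schema is hypothetical.
const commentsQuery = `
query CommentList($postId: ID!) {
  post(id: $postId) {
    comments(first: 20) {
      body
      createdAt
      author { name avatarUrl }
      reactions { totalCount }
    }
  }
}`;
```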
I think you've got number 3 backwards. GraphQL is a generic query solution that is harder to optimize than an endpoint that was built as a one-off for a front-end use case. Endpoints designed to support a specific UI view or operation are easy to optimize because their use case is narrow and the queries can be modified simply and safely. There might be more ceremony, but it's much more maintainable in my opinion.
GraphQL servers are typically not run in a manner that allows arbitrary query execution; usually only a select set of queries that were written by the devs, tested, and potentially profiled before deployment are allowed. From that perspective, if a mobile client is written to work against the interface they provide, then if you need to fix things on the backend, you can potentially do so without the frontend being aware of it. In that sense it's the same thing as a specifically designed REST endpoint, so I'll give you that, but it's hardly a point against GraphQL.
Number 1 is valid, but becoming less relevant, as you mentioned. Number 3 is also valid.
Number 2 I really doubt. Having more flexible API contracts (GraphQL) usually only makes dependencies worse, not better. Sure, you can write any query you want, but how do you agree on expectations if there is no contract at the API level? Strict API contracts force you to agree on expectations up front.
I probably wasn't entirely clear in my post that I'm not necessarily a believer in GraphQL. I'm mostly repeating/surmising the reasons other people are interested in it, not necessarily reasons I hold myself. You could argue with any of the points, really.
It is a good summary. I would also add, as a prerequisite, the need to combine lots of badly documented internal microservices (basically what Facebook originally built it for).
I've used GraphQL on several projects now, with small teams (2-5 devs) where we control the front and back end, and for that case, it's actually been really good. It provides a solid contract that is easy to get wrong with a REST API. (Will the server send IDs as ints or strings? Will it accept either? Which params are optional? What shape does the data come back in, an array of results, or an object with results nested somewhere? etc.) I've even set up CI to fail if it finds queries on the front end that don't match the schema. And of course it's easy to request exactly the data you want on the front end.
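For example, the schema settles those questions once, in one place (a sketch with invented types):

```typescript
// The SDL answers the "ints or strings? which params are optional?"
// questions up front; the types here are invented for illustration.
const typeDefs = `
  type Query {
    search(term: String!, limit: Int): [Result!]!  # limit optional, term required
  }
  type Result {
    id: ID!       # always serialized as a string
    score: Float!
  }
`;
```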
A big downside is that the tooling is immature, and all subject to big changes. And I haven't written a server meant for public consumption, but that does add new layers that you can kind of ignore when you trust the client.
If you own the backend and the frontend, I don't see how graphql helps you. You can just implement the APIs that you need to solve your business problems. Now all you're left with are the downsides. I'll pass.
One of the problems backend systems sometimes have is presenting similar data in slightly different ways for use in many different places. One of the solutions to this is GraphQL, which lets you break your data loading into reusable pieces (resolvers) and define endpoints based on those reusable pieces (queries).
You could "just" do this yourself, of course. GraphQL is nice because it gives you a sensible starting point and a lot of free tooling. You get Graphiql to help you quickly write new queries, you get well-defined schemas between server and client, you get linters for queries, you get GraphQL itself to glue your resolvers together and interpret your queries.
If your system has a grand total of 20 API endpoints and they're nearly all unique, you don't need GraphQL.
I listed some ways that it helps. You get a strict, well-defined, easily-documented API. Yeah, you can do all that with REST, but I like having a defined structure for it.
On a large team I’ll often be debugging someone else’s code anyways, and the framework code is usually better than hacked together product code (and comes with a public bug tracker).
On the team I lead, we keep our domain logic pure, encapsulated, and layered. This means that if I have to debug a fellow teammate's code, I can be confident that it is only business-logic code and not more technical, library-dependent code.
When our code has to interact with other systems, the other systems' responses have to pass through an anti-corruption layer and be translated into something we care about. All of this is surrounded with try/catch, and we quite obsessively check and validate everything coming in and out.
We do all of this because if/when something breaks in some library or framework that we didn't write, it doesn't bleed through to our POJOs. And if we ever need to switch to the new flavor-of-the-week library, which happens quite often, we don't have to rewrite everything, just the adapter classes.
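A condensed sketch of what one of those adapters looks like (types, endpoint, and field names invented):

```typescript
// Domain type we own: a plain object, no library types leaking in.
interface Invoice {
  id: string;
  amountCents: number;
}

// Anti-corruption layer: the only place that knows the other system's shape.
async function fetchInvoice(id: string): Promise<Invoice> {
  try {
    const res = await fetch(`https://billing.example.com/v2/invoices/${id}`);
    const raw = await res.json();
    // Validate and translate before anything crosses into our domain.
    if (typeof raw?.invoice_id !== "string" || typeof raw?.amount !== "number") {
      throw new Error(`unexpected invoice payload for ${id}`);
    }
    return { id: raw.invoice_id, amountCents: Math.round(raw.amount * 100) };
  } catch (err) {
    throw new Error(`billing adapter failed: ${(err as Error).message}`);
  }
}
```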
> Will the server send IDs as ints or strings? Will it accept either? Which params are optional? What shape does the data come back in, an array of results, or an object with results nested somewhere? etc.
Hum... It seems you are missing a data definition language. GraphQL isn't a very good solution to that.
I don't understand how, if you're on a team that controls both the front end and back end, you wouldn't understand these factors: "Will the server send IDs as ints or strings? Will it accept either? Which params are optional? What shape does the data come back in, an array of results, or an object with results nested somewhere? etc."
Surely you would know all the answers to these questions because your team control the entire stack.
If the FE and BE teams were separate, these would be reasonable considerations, but I just don't see it for a team with full control of the stack.
Designing a solid contract can be useful even when you’re one person, for the same reason that decoupling classes or concerns is useful. I don’t need to consult another person or read the code to understand what it’s expecting (especially true in contexts where documentation may not have been written yet).
GraphQL is a great experience when you consume it and the service fulfills your query needs. Because you just ask stuff and you get them. It's really cool.
On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.
Also if you really want to provide a graph experience, with inverse connections, filter on relationships and other advanced stuff... get ready to burn your mind and your soul.
> GraphQL is a great experience when you consume it and the service fulfills your query needs. Because you just ask stuff and you get them. It's really cool.
I guess it's better when the tooling you use has direct gql integration and builds the queries for you?
Because in my experience accessing the github APIs with "basic" HTTP libraries is way more annoying using v4 (graphql) than v3 ("rest") — it could also be that github's v4 API is dreadful mind, I wouldn't be surprised.
GQL should be more efficient because it's not returning 95% of garbage I don't need, but having to write 5-deep queries (because of the edge indirections) by hand is way more of a pain in the ass than performing two GET requests with a few parameters munged in the URLs. And then I still have to go and retrieve the information 5-deep in the resulting document.
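For comparison, the kind of nesting involved; field names recalled from GitHub's v4 schema, so treat them as approximate:

```typescript
// Five levels of edge/node indirection just to read issue comments; compare
// with GET /repos/{owner}/{repo}/issues plus .../comments in v3.
const query = `
{
  repository(owner: "octocat", name: "hello-world") {
    issues(first: 20) {
      edges {
        node {
          title
          comments(first: 10) {
            edges {
              node { body }
            }
          }
        }
      }
    }
  }
}`;
```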
Pagination is also awkward, because now you probably want multiple different queries (and thus multiple different resulting documents) so that your 2+ fetches don't retrieve unpaginated information you got the first time around. And it gets worse when nesting comes in.
I don't think graphql is generally a great experience when you consume it either.
100% agree that pagination is extremely awkward, especially with nesting. Between the pagination problem and the "oops I asked for too much data and blew up the server" problem, I think it's more work than one might think to run a GraphQL API.
For my own work, I took things in a different direction: the query language I described upthread, which parses as legal GraphQL with a vanilla parser but has directives like `@filter, @recurse, @optional` etc., and which flattens results and emits them row-by-row so evaluation can be lazy and incremental. My company has been using it in production for 6 years; the talk linked above has the details: https://www.hytradboi.com/2022/how-to-query-almost-everythin...
Not parent, but my biggest challenge is some v3 APIs are not there in v4 yet. For example, activity and notifications (https://docs.github.com/en/rest/activity/notifications) is something I'm still looking forward to, but forced to keep using REST until it becomes available via GraphQL.
The pagination point is described well in nearby comments. It only applies when attempting to paginate across more than 1 dimension at once, like "get all pages of comments in an issue, and all pages of reactions for each comment".
Not just cURL; most of the time I want something from the GitHub API it's something fairly simple; using REST from Python, Go, Ruby, $preferred_language is easier than using GraphQL, too. I'm sure there are libraries out there, but hard to beat a simple "fetch me data from that URL yo".
GraphQL uses HTTP like the REST API and speaks JSON. There's no need for a library if you're comfortable sending a POST request.
It seems to me like the main friction that you and others are getting at is that GraphQL is more work to use than REST because you have to write a query. That's a fair point! Perhaps we could publish "canned" queries that are the equivalent of the most commonly used REST endpoints, or make them available for use in the API with a special param.
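For illustration, the no-library version with fetch(): one POST to the documented api.github.com/graphql endpoint, here with the hello-world `viewer { login }` query:

```typescript
// No client library: one POST with a JSON body. TOKEN is a personal
// access token.
const res = await fetch("https://api.github.com/graphql", {
  method: "POST",
  headers: {
    Authorization: `bearer ${process.env.TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ query: "{ viewer { login } }" }),
});
console.log(await res.json()); // => { data: { viewer: { login: "..." } } }
```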
Yes, you need to write a query; and it's also not at all that obvious how to write a query. Let's say you want to list all repos belonging to a user or organisation, a fairly simple and common operation. I found [1] in about 30 seconds. I've been trying to do the same with GraphQL for five minutes now using the docs and GraphQL Explorer, and thus far haven't managed to get the same result.
I worked a bit with GraphQL in the past, but never all that much. Now, I'm sure I could figure it out if I sit down, read the basics of GraphQL, familiarize myself with GitHub's GraphQL schema, etc. But ... it's all a lot of effort and complexity vs. the REST API; even with a solid foundation in GraphQL there are still lots more parts.
GraphQL is kind of like giving people a schema to an SQL database and telling them that's an "API"; it kind of is, but also isn't really. There's a reason almost all applications have some sort of database layer (API!) to interact with the database, rather than just writing queries on the fly whenever needed.
That's completely fair. I think the analogy to SQL as an API is very apt. No one would argue that full SQL access isn't a powerful API but it takes some legwork to understand the schema and write queries to get the data you need.
There's a divide between at least two types of persona here. On one side are integrators building products and features on top of the GitHub API. For these people GraphQL is arguably superior, since the learning curve is manageable and, in exchange for climbing it, you can make an extremely specific query that returns your product's exact data requirements. The cost of writing this query is amortized over the lifetime of your integration.
On the other side are e.g. users automating some part of their GitHub workflow to save themselves time. I can see how the REST API feels like a better choice here, it's certainly simpler to get started with.
For what it's worth, here[0] is an example of using the `gh` CLI's graphql feature to print a list of all repository URLs for a given organization by login, sorted in a relatively complicated way. It's more verbose than doing this with the REST API but significantly more flexible. This could just as easily be done with curl but as others have pointed out, pagination requires a minimal level of logic to implement, so it's more convenient to use an existing helper. This output gets flushed 10 lines at a time as pages come in, making it suitable to compose with other commands using pipes.
The GraphQL API works just as well with curl. There's no getting around the fact that you need to pass a query text, but assuming you put the query in a file, the curl syntax is identical to the REST API's: a single POST to the graphql endpoint with the query file as the request body.
> Pagination is also awkward, because now you probably want multiple different queries (and thus multiple different resulting documents) so that your 2+ fetches don't retrieve unpaginated information you got the first time around. And it gets worse when nesting comes in.
You don't need to write a different GraphQL query; use variables. Good GraphQL APIs will expose start and limit fields for pagination.
I think you misunderstood the issue. Of course you will use variables for the pagination itself; the issue is that your head-of-line query will be grabbing other fields than the paginated one.
You don't want to repeat these fetches in the followup queries, they're redundant, and assuming the API is rate-limited they will decrease your query budget for no value.
That counts double if you're fetching multiple paginated fields (which also adds to the awkwardness).
Agreed — consider what happens when you have a node(start: Int, limit: Int) inside or alongside another such node with start and limit.
Your pagination is now two-dimensional, with each node's start/limit as points on its own axis.
Now add a third node to the query. Now you have three-dimensional pagination. This quickly goes off the rails.
Try writing a generic N-dimensional paginator for such a query to see why it's difficult. Even designing a sensible and reasonably flexible API for one is a headache.
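To make the two-dimensional case concrete, a sketch against a hypothetical schema:

```typescript
// Two independent cursors in one query: paging issues is one axis, paging
// each issue's comments is another. Schema is hypothetical.
const query = `
query Feed($issuesAfter: String, $commentsAfter: String) {
  issues(first: 10, after: $issuesAfter) {
    title
    comments(first: 10, after: $commentsAfter) {
      body
    }
  }
}`;
// Advancing $commentsAfter refetches the same 10 issues; advancing
// $issuesAfter resets every issue's comment cursor. Add a third nested
// connection and you have three interacting cursors.
```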
Indeed. It may not be a huge problem depending on how much data you need, but there are lots of cases where you'd really rather avoid refetching the rest of the root.
TBF you could also deal with it using fragments I think, but still, not great.
The server side of pagination is really complex if you want to make sure all the results are returned exactly once. If that's your case, consider not paginating at all.
But very often, a result missing or duplicated in a few queries isn't a showstopper. In that case, pagination is very simple.
Spring Batch covers the cases where you have to get the right answer!
That is, when you are "full scanning" and making a report, or doing something like a reconciliation process in a financial institution. It is not like some image boards where there is a link to the 1781st page of images but it spends forever loading if you click on it.
> but having to write 5-deep queries (because of the edge indirections) by hand is way more of a pain in the ass than performing two GET requests with a few parameters munged in the URLs. And then I still have to go and retrieve the information 5-deep in the resulting document.
I usually write my queries in GraphiQL (check some boxes) and then paste them into the app after I have them working right.
I think GraphQL is best understood as an incomplete ORM that you have to complete yourself on the backend. If GraphQL generated SQL (given some tooling or what have you), pretty much all the problems would be solved. Indeed, backend-as-a-service products like Hasura or Postgraphile are this missing piece. I guess we're uncomfortable with SQL over the wire or open-to-the-world databases, but we shouldn't be.
Or maybe TLDR "dear next generation of engineers: SQL is actually pretty good".
SQL is excellent but exposing your database is not. The inventors of graphql specifically said that they don't think that it is a good idea to do so. They never intended it for that.
> SQL is excellent but exposing your database is not.
That was definitely true before Postgres got row security (admittedly in 2016, after GraphQL was released in 2015), but these days there's really no need to run an entire app server in front of your database just to implement permissions.
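For reference, a minimal sketch of Postgres row-level security (table, column, and setting names invented):

```typescript
// Permissions live next to the data instead of in an app server.
// Table/column/setting names are made up for the sketch.
const rlsSetupSql = `
  ALTER TABLE posts ENABLE ROW LEVEL SECURITY;

  -- Each connection sets app.user_id; rows not owned by that user are
  -- invisible to reads and untouchable by writes under this policy.
  CREATE POLICY posts_owner ON posts
    USING (user_id = current_setting('app.user_id')::int);
`;
```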
I am with you. Every time I looked at GraphQL or was asked to implement it, I had to say no.
How is this a good thing for the backend or infra engineers? It's a mega facade without a lot of tooling to help the backend.
GraphQL reminds me of common ORM criticisms: wide API surface area with a lot of room for accidents. And GraphQL makes it worse by being exposed as a service.
Infrastructure is one thing that seems to catch a lot of people off guard. So many infrastructure tools are based on monitoring HTTP codes, but even when there are errors graphql servers send 200s unless modified. It turned into quite the headache for us.
100% - I can see how it might be great for FB where they have the capacity to optimize but without that engineering capacity it seems like it would turn into a net negative.
ORMs are also pretty easy to use on a case-by-case basis if needed, either by using the escape hatches the ORM provides or by bypassing it altogether. Deciding "oh GraphQL isn't good for this particular use case so I'll spin up a parallel REST API" is a much bigger decision to make.
What you say is unpopular, but it's a lot more true than most people (especially front-end people in this case) want to admit. Of course there are plenty of exceptions (people on FE who think about, care about, and know about what happens on the backend), particularly in the Venn diagram of FE people reading HN, but the majority in the industry definitely do not. The bigger and/or more specialized the company, the worse that problem gets.
To be clear, this is not just a problem for FE people. It's extremely normal for humans to become myopic in the areas they spend the most time in. FE does it, BE does it, management does it, everyone does it. Find a standard mobile engineer doing native iOS or Android, and they're going to be even more disconnected from the effects on the backend, and they come by it honestly. If you tend to specialize in one area, building an awareness of your own biases/perspective, and exercising intentional empathy, can make a huge difference in how easy you are to work with.
When looking at dysfunctional engineering orgs, one of the first things I do is figure out where the "power" is and figure out their backgrounds. The most extreme example might be a company founded by an FE eng for whom backend is just a necessary evil to support their app. Or a company founded by a BE guy for whom the real value is the API, and the clients are just there to abstract it for normal people.
Taking this in and finding a healthy balance of the way things are structured can help improve a dysfunctional org a lot. FE, BE, DevOps/Infra, etc are important pieces in an overall puzzle. Without a well-functioning team behind each, the company and product suffer.
SQL is a transferable skill. ORMs are not. If you already know SQL and have to use an ORM on top of that, then it's a net loss.
It's trivial to use SQL to build objects from individual rows in a database. Which is all an ORM is really good for. Once you start doing reporting or aggregates, then ORMs fall apart. I've seen developers who, because they have a library built up around their flavor of ORM, go and do reporting with that ORM. What happens is this ORM report consumes all the RAM in the system and takes minutes to run. Or crashes.
ORM code hits performance issues because so many objects have to be built (RAM usage) and the SQL queries underneath the framework are not at all efficient. You can use OOP on top of SQL and get decent performance. But you need shallow objects built on top of complex SQL. ORM goes the opposite: large hierarchies of nested objects built from many dumb SQL queries.
This also ties into GraphQL. Think carefully about the hierarchies you design. A flat API call that does one thing well is often better than some deeply-nested monster query that has to fetch the entire universe before it can reach the one field you need.
What works better, in my opinion, is a combination of:
1. query builder APIs, which can only generate valid queries, but where you control exactly what that query will be, and
2. APIs that return basic data structures from the database, like maps or tuples.
Query languages like SQL are very powerful and easy to learn, and in my opinion preferable to the ORM approach of "what method calls do I need to make to trick the engine into executing the SQL I know would make this work?" ORMs add complexity and limitations that are not worth the benefits.
People will have different opinions, but for me, there's nothing wrong with ORMs themselves; they are a significant productivity boost for 80% of the database interactions in your app. The tricky part is recognizing the 20% where ORMs are a bad idea, which ends up meaning that an ORM is best used not as a replacement for knowing SQL, but as a tool to make you more productive when you already know SQL.
ORMs are fine for the majority of simple use cases. When things get complicated, you end up either fighting with the ORM or just overriding it and writing the SQL yourself anyway.
I'll use raw SQL (maybe not as an entire query, but something like a computed column) pretty often, for situations where I want to query things like "give me all foos with a count of bars related to them", or "give me a list of foos with the name of the latest related baz". Most ORMs would want to hydrate the graph of related objects to do that, or at least have multiple round trips to the DB server.
Oh, they would be lazy; it's just that expressing something like that efficiently (i.e. something like `SELECT foo.*, (SELECT count(1) FROM bar WHERE bar.foo_id = foo.id) AS bar_count FROM foo`) is usually really hard to do. Most ORMs I've seen would N+1 on that with a naive approach, and even the "optimized" approach will want to fetch all bars vs. just the counts.
It’s not, really, but it IS a good thing for feature development speed if that’s what you’re into, and might help a team figure out quickly which data is critical to optimize for once you start putting more serious data loads through your APIs?
That's what you get for GraphQL not having an algebra.
If it had an algebra, you could build a database engine that answers GraphQL queries like a conventional database engine, or you could write a general-purpose schema mapping and some tool would write the code that converts GraphQL queries to SQL queries or some other language.
As it is, GraphQL provides a grammar that looks like something people want to believe in but behind it all is a whole lot of nothing.
If you want to see what a GraphQL with an algebra could look like, I built one! It's the project I described upthread: the query language is parsed with a vanilla GraphQL parser, but has directives like `@filter, @recurse, @optional` etc. that give it real query semantics.
This seems like a "be careful what you wish for" situation.
Sure, you could set up an algebra that allows you to handle arbitrary queries for zero extra programmer effort, just like a SQL database engine does. And then you could even expose it to users, and let them execute arbitrary queries.
And then, later, after you're done cleaning the molten slag off the server room floor, you could stop and reflect on whether that was really such a necessary thing to do.
If you had a rigorously defined system you could put rigorous limits on it.
If it's not rigorously defined there are no limits, just what people can get away with.
With GraphQL you get the worst of both worlds: people can't write arbitrary queries, but they can still trash the system. At least with undefined semantics people don't need to argue about whether or not they got the right answers.
"Rigorous limits" for a sufficiently large database means "uses our hand-picked indexes effectively", which reduces down to "provides the same functionality as a REST API" since you need basically a whitelisted list of acceptable operations. At best you can reduce transfer time by limiting columns returned, which is something but not really worth the added complexity.
My experience trying to maintain databases that are directly exposed to multiple development teams tells me that even exposing a fully generic querying API internally is risky.
Which, just for context - that's not me saying "graphQL is bad", it's me saying, "graphQL making it hard to do that is a feature, not a bug."
I've yet to encounter an off-the-shelf GraphQL server (from the Python and JS spaces) where hitting a slow query didn't immediately turn into half a day's work.
The whole concept is what happens when you let a smart person work on a small problem for far, far too long
I'd recommend checking out the project link in the comment to which you replied. It is designed _specifically_ to avoid the problem you mention: instead of a fully materialized, fully-nested result, it returns flattened row-oriented results (like a SQL database).
This allows for lazy evaluation i.e. rows are produced only as they are consumed. So if you accidentally write a query that would produce a billion rows but only load 20, the execution of the query only happens for 20 rows + any batching or prefetch optimizations in the adapter used to bind the dataset to the query engine.
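A generic illustration of that pull-based idea (not the actual implementation):

```typescript
// Lazy, pull-based evaluation in miniature: the generator does no work for
// rows nobody asks for.
const manyUsers = Array.from({ length: 100_000 }, (_, i) => `u${i}`);
const manyPosts = Array.from({ length: 100_000 }, (_, i) => `p${i}`);

function* crossJoin(users: string[], posts: string[]) {
  for (const user of users) {
    for (const post of posts) {
      yield { user, post }; // one flat row at a time
    }
  }
}

function take<T>(it: Iterable<T>, n: number): T[] {
  const out: T[] = [];
  for (const row of it) {
    if (out.length === n) break;
    out.push(row);
  }
  return out;
}

// users x posts would be 10 billion pairs, but only 20 rows are ever produced.
const first20 = take(crossJoin(manyUsers, manyPosts), 20);
```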
(1) There are usually some nodes of very high degree, and traversing those nodes will explode your query; (2) if you are following N links and the average degree is d, you are going to come across dᴺ nodes, and that is a lot of nodes as N gets big!
Tim Berners-Lee told me that if you can't send the whole graph you should send a subset of the graph that contains the most important facts.
It's a right answer but also a frustrating one to a programmer who takes correct implementation of algorithms to mean that the ticket gets done and nobody comes back at you with a ticket about it again. That is, the query I'm writing is part of an algorithm that depends on getting a certain answer, and getting an uncertain answer for one query is like spoiled milk that ruins the whole batch.
So why are we using it for so many naturally non-graph problems? 90%+ of developers' exposure to graphs is through tightly abstracted interfaces; I could name maybe 3 graph-related algorithms off the top of my head, but could implement none of them without reading up.
We could represent the text of this comment as a graph using one node for each unique character, but the result would be stupid: the operations would be slow, the representation needlessly complex, and the implementation guaranteed to be hard to work with.
> Tim Berners-Lee told me that if you can't send the whole graph you should send a subset of the graph that contains the most important facts.
Indeed. I also caught the ReST buzz around the 2000-2003 timeframe, and it turns out 20 years later nobody does that either, because in its purest form it's a pain in the ass for reasons comparable to the topic at hand.
It's funny to see a blog post on HN almost every day where somebody rediscovers the power of columnar query answering engines which are almost the opposite of graph databases.
I've lost count of how many columnar SQL databases have been donated to the Apache project, and there are so many systems like Actian and Alteryx where data analysts hook together relational operators with boxes and lines.
I had a prototype of a stream-processing engine that passed RDF graphs along the lines between the boxes, enabling an "object-relational" model; you could eliminate the need for hard-to-maintain joins. But I found that firms that had bought multiple columnar-database companies believed in performance at all costs and couldn't care less about any system that couldn't be implemented with SIMD instructions.
How are they opposite? There are plenty of graph databases out there using columnar storage, even ones directly compatible with GraphQL Federation. Best of both worlds, so to speak.
> So why are we using it for so many naturally non-graph problems? 90%+ of developers' exposure to graphs is through tightly abstract interfaces, I could name maybe 3 graph-related algorithms off the top of my head, but could implement none of them without reading.
It's a reasonable abstraction for structuring related bits of data (like what would go in a typical relational database), and that abstraction can align with the developer's mental model more easily.
E.g. ORMs basically convert SQL data into an in-memory graph. Likewise, graph database APIs are natively more object-y; you follow the edge from child to parent, instead of duplicating a key in both tables and then querying for matching rows.
They're not perfect, and shouldn't be used everywhere (nor even many places they currently get used), but I can see the appeal of abusing them.
Because graphs are a good abstraction for relations and with the right tech choices, are much more manageable and malleable than traditional relational databases.
> It's a right answer but also a frustrating one to a programmer who takes correct implementation of algorithms to mean that the ticket gets done and nobody comes back at you with a ticket about it again.
This rather sounds like a problem with the project manager and the project-management methods that he uses.
No. I had a time in my career where I was the guy who finished projects that other people started and couldn't finish.
Some coders really don't have discipline, and projects never get done because they don't think things through and keep sending half-baked patches that get sent back by test or the customer.
The role of management is to get those people working for their competitor and then have the "fixer" move in.
Nah, it's what you get for GraphQL only being an API, which people inevitably conflate with the database itself (a harmful trend that probably started with SQL databases).
If you want to use GraphQL you should look for a database supporting it as an interface, or failing that look for an ORM system that supports GraphQL and whatever backend you want.
Trying to convert SQL to GraphQL or GraphQL to SQL is equally difficult in either direction, and that has little to do with it not having an algebra (also, I think most of it is just algebraic types, possibly lacking a proper sum type).
God forbid you should try to modify anything with GraphQL though, that part makes no sense whatsoever.
> GraphQL is a great experience when you consume it and the service fulfills your query needs.
Unless you already know SQL and realise how small and simple the queries could be; then it's really not a great experience to be forced to use GraphQL.
I've been in web dev for 20 years but mostly in the front end space.
A couple of years ago I started doing full stack and trying different databases. For the past year or so I've been using Postgres and learning SQL. This is by far the best solution I've used so far. SQL is extremely expressive, powerful, and elegant.
The problem is that SQL has a strong learning curve which many devs want to avoid. I'm convinced this is the main reason stuff like Mongo or Prisma are so popular. I actually tried Prisma before raw SQL and I vastly prefer SQL for writing queries.
I deeply regret not having spent some time learning SQL years ago.
This might be just over-familiarity on my part, but does SQL really have a strong learning curve, or is it just not used often enough directly these days that people can get by without knowing it?
Standard SQL is a really simple grammar and a very small keyword set - there's basically selecting, updating, deleting, filtering with where, aggregate queries, grouping and joins, and that's like 95% of it. Sub-queries maybe too.
> This might be just over-familiarity on my part, but does SQL really have a strong learning curve, or is it just not used often enough directly these days that people can get by without knowing it?
I think the problem is that it's declarative instead of imperative, which is really kind of a shock if you are not used to it (you can't go step by step, there's no debugger, there are no branches etc), and also that you have to think in sets in terms of your solution, which is also awkward when you're not used to it.
I think it's definitely worth it though, as nothing we have beats the relational model for CRUD, and there are so many great learning tools online, for example: https://sqlbolt.com/
SQL has a large learning curve; you can keep learning new things on it for years. But not a particularly steep one: you can start using it with very little knowledge, and anything extra you learn immediately improves your situation.
In my experience mentoring entry-level/junior devs, mongo's API anecdotally seems to have a much steeper learning curve than SQL. Once you get past the fundamental CRUD idioms, there are a multitude of implementation details that, if treated as opaque by devs, can introduce significant footguns in even moderately high-throughput services.
Some of these details go all the way down to the WiredTiger storage engine, but others are more vanilla (e.g. indexing strategies, atomicity guarantees, causal consistency, etc).
I personally abandoned SQL about a decade ago, but I can appreciate how clean the interface semantics are for even non-technical folks. There are certainly platform-specific implementation details that can matter, especially when you get into the world of partitioning. But largely for most service loads, you're writing queries that satisfy the known index constraints that you imposed on yourself rather than constraints resultant from implementation details.
(I totally realize that even with SQL, that last statement completely changes at a certain scale threshold.)
So you're in the camp that NoSQL data stores like Dynamo/Mongo are a good replacement for most SQL workflows? Can you expand on this a bit if you have the time?
I don't actually view decisions like this as binary or mutually exclusive at all. I'm a big proponent of polyglot persistence [1], use the data stores you know and double down on what you know well. I use mongo primarily in my current work, but also have redis, elasticsearch, graphite, and etcd as sidecars in the same ecosystem.
I didn't jettison SQL because there was some fundamental limitation from a storage or scalability perspective. It was clear SQL wasn't going anywhere and would be a good infrastructure bet moving forward.
But initially what drew me to mongo was the clean interop between it and JS (Node.js is my runtime of choice). The shell is written in JS, you query (filter) using objects, you insert and update with objects. This seems like a small thing but this sort of developer experience over time is impactful. Everything feels very native to JS and it does so without any heavy abstractions like ORM/ODM.
After having used it now for 10+ years though, there's much more that I admire about it. Both from a pure architecture lens, but also from API perspective as well as it continues to get better.
To cherry pick an example— The new time series collection feature is a good example of that. For years, folks were using the bucket pattern [2] as a means to optimize query strategies for time series data by lowering timestamp cardinality. Now in v5.0, they give you a native way to specify the same kind of collection but they handle all the storage implementation details related to bucketing for you. Your API to interface with that collection remains mostly the same as any other collection. This sort of community-driven roadmap inertia is attractive to me as an engineer.
(Somewhat of a stream of consciousness here, but hopefully gives you some context as to why I made the switch so long ago)
Excellent post. I'm very much in the SQL camp myself, but the clean mapping between MongoDB and Javascript data models is outstanding. If you have a front end that just needs persistence as opposed to complex queries MongoDB is the obvious answer.
It's worth noting, for clarity/posterity, that my prior post is actually discussing mongo in a (mostly) backend distributed-systems environment. I find it's just as impactful there, not just in more conventional CRUD/browser/full-stack apps.
I'll bite, I have use cases along these lines. I don't use SQL in new projects.
When iterating on new systems, especially ones with live users, I can keep users at different document schemas. If I'm careful, I can make it so that document schema changes don't break old documents yet still allow for new functionality, without requiring mass migrations of documents.
CouchDB allows the db to just be exposed to the world directly. In projects where that's reasonable (user-owned/controlled data particularly), I can stand up the system with 97% frontend code, 3% backend. Having nearly the entire application stack in one place means you can use smaller, more specialized teams, and your overall areas of concern are smaller, without needing to draw up a formal spec for your data transport.
The whole GraphQL vs REST debate is meaningless when you don't even have to think about your transport stack between the server and the browser. There are other perks to this model, such as providing a fully functional website/webapp even while offline: it's trivial to switch between a couchdb backend and a local pouchdb copy of your db. Potentially lower bandwidth use too, since only updated docs cross the wire instead of the same data moving repeatedly for queries or asset fetching (not a win for single visits to single-page-style sites). And you can keep multiple clients on the same document set in sync without socket.io work.
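The switch is roughly this (PouchDB's documented sync API; database names and URLs are examples):

```typescript
import PouchDB from "pouchdb";

// Same API locally and remotely: the app talks to `local`, and replication
// keeps it in sync with the CouchDB server whenever a connection exists.
const local = new PouchDB("notes");
const remote = new PouchDB("https://couch.example.com/notes");

local
  .sync(remote, { live: true, retry: true }) // continuous two-way replication
  .on("change", (info) => console.log("synced", info.direction))
  .on("error", (err) => console.error("sync error", err));
```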
The big benefit of ORMs is in the query builders they provide: basically, syntax checking for SQL inside your language of choice, and nicer composition of SQL query parts (to make your code more DRY). Actual mapping to objects is always too heavy in my experience.
However, a slightly unrelated comment on your choice of API design: this approach always introduces an asymmetry in the model that restricts what you can do. If you start allowing post imports that auto-detect authors, you now need a Post.create(content) and Post.setUser(user) too. And then your API users start wondering what's the idiomatic way to create a new post.
The problem is that you are making an early assumption that all posts will belong to a user, while representing that in an SQL database as two relations: one independent of the other (User: id, name, email...), and another referencing the first (Post: id, date, user -> User, content...). Your database model allows an easy transition to allowing nulls for `user`, yet your API doesn't.
Moving to a more functional API makes this much more natural and less restrictive. Shallow DAOs for User and Post and a function create_post(content, user) may look just like a namespacing difference, but they match your database design more closely. If you want to allow nulls for user in the database, you just do the same in the create_post function.
You can wrap related functions into modules (or classes); in domain-driven design, most of these would be port/adapter functions, but if your DAO classes are sufficiently shallow, they could be service or domain functions too. Either way, they are still ultimately functions (no shared state or side effects).
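A sketch of that shape (all names invented):

```typescript
// Shallow DAOs plus a function mirrors the schema: Post.user is nullable in
// the DB, so it's nullable here too, and relaxing "posts need users" is a
// one-line change.
interface User { id: number; name: string; email: string }
interface Post { id: number; date: Date; userId: number | null; content: string }

let idCounter = 0;
const nextPostId = () => ++idCounter; // stand-in for a DB-assigned id

function createPost(content: string, user?: User): Post {
  return {
    id: nextPostId(),
    date: new Date(),
    userId: user?.id ?? null,
    content,
  };
}
```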
Just a string in my case, JDBC prepared statement or the equivalent. But if I could really choose freely, I would put all queries as functions/procedures inside the DB to achieve real decoupling from the schema, get consistency with transactions etc, but if I mention that idea, the pitchforks come out and I get chased off the property by the backend developers who become pretty much obsolete in that architecture.
That, or just a string in your application’s code.
The problem with using the ORM as you describe is that when you hit any sort of scale, you need to be doing bulk operations, otherwise your latency goes through the roof, to the point that the sheer number of inefficient queries can tank the database. I speak from experience, having seen a database collapse under the load of a backend written in this fashion once request load grew past a certain point; not pretty! The interim solution is to bulkify existing queries and functions in place to the greatest extent possible, while preparing for:
Converting a codebase from having endpoints doing individual ORM operations as described to having proper separation of concerns with a business logic layer between the endpoints and the database is a _massive_ cost. The earlier you implement that, the happier you will be in the longer term. It doesn’t have to be with raw SQL, but many bulk operations are much easier to express with SQL than with the ORM.
Doesn't an escape hatch on the ORM provide that though? I seem to remember in both sqlalchemy and (libraries that use) knex being able to dip down into SQL when needed.
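E.g. something like this with knex (table/column names invented):

```typescript
import knex from "knex";

const db = knex({ client: "pg", connection: process.env.DATABASE_URL });

// One set-based statement instead of a load-modify-save round trip per row
// through the ORM.
await db.raw(
  `UPDATE accounts
      SET balance = balance - ?
    WHERE plan = ?`,
  [5.0, "legacy"]
);
```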
Hell yes. Good knowledge of SQL is a superpower and is becoming a rare art form.
The new generation of devs thinks that frameworks and ORMs will do the magic for them at no cost, but they don't. There is no substitute for leveraging your storage engine to the max.
The sad part is that databases have evolved and become much better in the last 20 years (I started with MySQL 3.x), but we just don't use them. Everyone acts like "microservices" solved all of our technical challenges. Right.
In my opinion, the value of a well-architected microservice is in figuring out how to optimize and leverage the capabilities of the underlying storage engine while presenting a simple, performant, and correct API to consumers, without requiring those consumers to understand the underlying details of the datastore.
I am not talking about the customers. Of course they are not supposed to understand it. I am talking about the system design. And microservices do not solve problems in most companies; they just create new ones. Distributed systems did not magically become simpler to reason about just because there is Docker.
GraphQL is just an API language. It doesn't free you from writing database queries.
The point of GraphQL is mostly separation of backend/frontend and avoiding over-fetching/under-fetching. If those don't sound like a benefit, you should use REST.
I would agree, but would add one note at the end: "Yes you can do this with REST by including query params to reduce/tune what gets returned, but that can quickly balloon into a monster when you get beyond pagination, ?expanded=1 for full objects (vs partials/abbreviated), etc."
The same is true of GraphQL, if you want to control how much of nested objects you get (eg. introduce limits on the number of nested objects).
Basically, with GraphQL you hide all the complexity behind generic-seeming API requiring one API call, whereas with REST you'd usually hit multiple endpoints for different data types.
GraphQL has the benefit of allowing the backend to smartly decide how to restrict the data (eg. do the joins between what would have been two bigger REST queries), but that incurs a development cost. The complexity is in marrying all the familiar optimization tricks for SQL databases with exposing that in a generic but still restricted way.
But wait, doesn’t that directly contradict the first commenter’s next paragraph?
> On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.
If it's so easy to craft any GraphQL query as an SQL query and let the RDBMS plan and execute the query, then shouldn't it be easy to implement the GraphQL server on the backend?
I think your point is a fair one. The distinction is that it's easy to write a contextual SQL query for any one GraphQL query when your database model closely matches your API objects. "Contextual" means that sometimes this requires a "side effect" to happen (eg. creating an index on a column in the SQL DB).
Making it generic and performant at the same time is where the complexity is.
It would be akin to saying that, since knowing that you might need an index in the SQL database is simple, an RDBMS could decide to create those indexes for you.
You're conflating a hand-coded and optimized query with building a system that takes a tree and generates said optimized query automatically, quickly, and correctly.
I don't think I'm conflating it. jseban's comment indicates that anyone who knows how to write simple SQL queries would get no benefit from using GraphQL to consume data, which must mean that there is a simple SQL query that can be written to fulfill any GraphQL query.
> there is a simple SQL query that can be written to fulfill any GraphQL query
This is true (as long as you expect both queries to be simple, or allow both to be complex).
But the conclusion you reach up there is wrong and (obviously) does not follow from that. Creating software that translates any one query into the other is a very difficult task.
Not all of the data sources for graphql may be in a single database. They may not necessarily even be stored in something that can be accessed with SQL.
> it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.
The culprit is "micro" services. The whole thing was invented by a "software consulting" firm to milk as many billable hours as possible: make a system over-engineered and costly to support, but easy to split into multi-layer/multi-stage outsourcing teams/phases. The industry fell into this stupid trap, the burden was shifted onto web & mobile clients, and the next thing you know, people have to make queries inside while loops.
If your data can be "planned" or "optimized" via a single centralized "GraphQL gateway", then it probably can be centralized inside a single database transaction with so-out-of-date-you-should-never-use JOINs.
I recently had to render a user feed page: query a uid for fids, then fids for cmt-ids, then each cmt-id for uids for avatars/nicks and such, all from a stupid user-profile lookup "micro" service provided by another department, which only accepts one param per query (spoiler alert: it's an "anti-pattern", but a sweet "optimization goal" for your next "sprint milestone"). I had to carefully and cleverly combine all the lookups and run them in parallel async with a very good reusable batch-loader class. Which makes me wonder: if all this data sits in the same db, why scatter it across so many service pipes, then gather it back in such a PITA fashion?
As a developer I am not against GraphQL or microservices, because they pay, and they're a good pile of tech jargon to confuse the non-tech people, and it really sticks; but from a purely technical point of view it's a waste of CPU power and emits needless CO2.
Although the microservices terminology might have been invented by a software consulting firm, distributed architecture already existed and solved problems for many large companies that needed to scale their products (and development processes) beyond what a small team hitting a single database could achieve.
However, I think that's the key point to keep in mind when considering whether GraphQL is a good fit - if you don't already have multiple domain-specific services in your infrastructure, then adding a GraphQL gateway service doesn't make a huge amount of sense to me, because you could've just had your small team of front end developers talk to your small team of back end developers to create exactly the optimized endpoints they needed to solve the problem.
To me GraphQL really seems like a solution for an organizational problem, where there are dozens of teams who all maintain their own services and apps, and now a variety of front end teams want to combine different sets of data from services maintained by different sets of back end teams in a way that doesn't have alignment across the company as far as deployment/release schedules go... Well now it makes sense to construct a flexible API schema maintained by and for front end specialists - it's just moving their already-existing data processing/join logic out of their various clients into a common server-side component.
> because you could've just had your small team of front end developers talk to your small team of back end developers to create exactly the optimized endpoints they needed to solve the problem.
I think the OP (and many of the comments in this discussion) is about making a graphql endpoint for public consumption.
If you are doing it solely for internal use, it does make sense that the "break even" point would be different.
> because you could've just had your small team of front end developers talk to your small team of back end developers to create exactly the optimized endpoints they needed to solve the problem
What you are describing is called BFF I guess.
And apparently it's already out of date so let's again split BFFs into smaller parts.
> The whole thing was invented by a "software consulting" firm
I don't know the whole origin story. It definitely does feel like something Martin Fowler would come up with. But I blame Google for really making it a trend:
And you can see they understand the whole problem with microservices. It's the same thing The Mythical Man-Month was trying to tell everyone decades ago[1].
> When n people have to communicate among themselves, as n increases, their output decreases
Microservices exacerbate this. It is the Multics model: each microservice implements its own, often wildly different, API, which every bit of code that needs to use that microservice has to go and implement.
The point of microservices is only partly to support individual teams owning a service. AFAIK the main points are isolating failures, independent deployments and horizontal scaling of individual components.
I do agree that without a good API design it can become a mess quickly and most companies go for microservices without a clear understanding of what goals they are trying to achieve with microservices. For those companies, sticking with a monolith would've probably worked better.
I've even heard cases where companies went back to a monolith and I think that is actually a smart decision in some cases.
But I definitely don't think it is a waste of CPU power.
I spend about 3/4 of a full time job building, maintaining and improving a corporate GraphQL API, and have for the last few years. What you are describing is not my experience. In fact it is far easier than it was in the old days when each new requirement meant code changes to a REST API.
Certainly there have been problems with queries that had unacceptable performance, even some that took down the whole API server and database. Of course that wasn't a novelty with our REST APIs either. It certainly is an issue with GraphQL, but it has been a manageable one for us.
Largely this is because the API is not public, and it doesn't have to simply handle anything that is thrown at it. When we see a frequent, slow GraphQL query in a report, we have many options to deal with it, including going to the front-end team and asking them to query in another way, and I have had to resort to that. Often I can optimize the code instead.
But that hasn't been a huge problem, especially compared to the great benefits of pushing most of the data work to the front-end. And the size and complexity limitations we've built into the API handle such problems seamlessly most of the time. The caller gets a clear error message that specifies the problem, and they can usually compensate very quickly with an altered query.
When they can't then I get involved and sometimes have to say, uh, we can't do that ... without scads of extra work. And the work on my plate today is for one of those scads.
I was doing lots of REST API work before GraphQL APIs, and my own and our corporate experience is that GraphQL solves a lot more problems than it causes.
The problem with GraphQL is on the front end. Suddenly, the FE team becomes responsible for understanding the entire data model, which resources can be joined together, by what keys, and what is actually performant vs what isn't.
Instead of doing a simple GET /a/b/1/c and presenting that data structure, they now need to define a query, think about what resources to pull into that query etc. If the query ends up being slow, they have to understand why and work on more complex changes than simply asking the BE team for a smaller response with a few query params.
I hit this problem when contemplating exposing the API of the application I work on to customers, to be used in their automation scripts.
We quickly realized that expecting them to learn our data model and how to use it efficiently would be much more complicated than exposing every plausible use-case explicitly. We could do this on the "API front-end" by building a set of high-level utilities that would embed the GraphQL queries, but that would essentially double much of the work being done in the front-end (and more than double if some customers want to use Python scripting while others want JS and others want TCL or Perl).
So, we decided that the best place to expose those high-level abstractions is exactly the REST API, where it is maximally re-usable.
I think what you are basically saying is the people working on the front-end are a bunch of children that cannot be trusted to do the right thing.
I’ve seen this a lot from backend teams, and it’s beyond frustrating.
Because now my nice clean frontend code suddenly has to deal with a bunch of franken query logic simply because the backend team cannot be bothered to alter their “pristine” API.
Never mind that this means a thousand requests where one graphql query would have sufficed.
Can confirm. Getting developers (never mind QAs) to build data-model savvy appears to be one of those things some have taken for granted, right up until you realize other people really did mean it when they said you were nuts.
I've never seen it as nuts; it's a bare pre-req of modern computing. Apparently this view is the subject of widespread controversy amongst peers.
Hmm.. that hasn't been my experience at all. I wrote the public GraphQL API for my company, and it was a pretty straightforward experience. Yes, I had to spend some time on the basic plumbing, but now if something needs to be added, it's just a matter of defining some interface for it and fetching when required. Grabbing an object from the network or DB doesn't need optimizations or a query plan, and has no corner cases. Even grabbing i objects starting at offset j only adds a bit more busy work.
Maybe the trick is to keep it simple? There's no need for a bi-directional graph or advanced filtering. But if there really is, it's not like sticking to REST would make that any easier. Some things are just hard, no matter the interface.
GraphQL itself is not a trap, but it's easy to fall into the "object graph modelling" trap with it. You probably shouldn't do that unless you have a lot of resources to spend on it. I think "Graph" in the name is what leads people astray; as long as you stick to TreeQL, you should be fine.
You are right. Some things are just hard. I went deep into GraphQL because I wanted to explore the possibility of it being a more comprehensible interface for the end user in comparison to a REST interface. In such cases, it is not.
GraphQL gives a better way to request nested schemas and handle relationships and recursion. But when you cross that line, the client gets ideas and starts asking "Why can't I do that operation on the 5-level-deep object?". Now you either have to disallow it, or you have to "rewrite the database" to make recursion optimal.
This is not a problem of GraphQL. This is an HTTP problem. When you need to expose the database querying layer over HTTP, you have a problem regardless.
Try nested pagination (i.e. open the 352nd page of the 7th book on the 3rd shelf in the 5th room of the 3rd city library). Make it performant. Have fun with GraphQL! /s
That sounds trivial I think because you are looking for exactly one item and there's no pagination involved.
The problem might be to get those 352nd pages of every book with the title starting with "A" sitting on 3rd shelf of every city library: when there are unbounded results nested deeper than top-level, and possibly those multiple times, that's when it gets hairy.
Actually, it's about opening 10 different books at different pages and similarly upwards... There is no GraphQL mechanism for it outside some hack in individual libraries that does more server roundtrips, and it seems like GraphQL authors simply avoid discussing it.
For example, you can make a separate REST method returning all levels you need. However, it still doesn't account for items lost while paginating (e.g. some visible items are deleted while one scrolls on a device etc.).
The flip side of this is a lot of folks are adopting GraphQL who are not prepared to do it well, so they make something half baked, missing things you need, and their documentation is absolutely useless.
This isn't new, there's plenty of sloppy REST APIs, but it was so much easier and less painful to explore and stitch together pieces of an imperfect REST API than it is to interact with a bad GraphQL API.
I’ve found it easiest to implement in Node due to the explicit event loop structure. You use data loaders, which is a super generic term that means “batch all requests for this resource into the next event loop tick.”
So when a query requests a list of users, and then every users friends, that becomes two queries: one to load all the users, and one to load all the friends for all those users. The net effect is that your number of queries is O(query depth) rather than O(objects requested).
Admittedly this does tend to work best with more K-V-oriented data than truly relational data, and might be hard to retrofit onto a brownfield project, but I've never found it all that hard to do.
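A minimal sketch of that batching pattern, assuming the dataloader and pg packages (table and column names invented):

    import DataLoader from "dataloader";
    import { Pool } from "pg";

    const pool = new Pool();

    // Every .load(id) issued during the same event-loop tick is collected
    // and resolved with one SELECT ... WHERE user_id = ANY($1).
    const friendsLoader = new DataLoader(async (userIds: readonly number[]) => {
      const ids = [...userIds];
      const { rows } = await pool.query(
        "SELECT * FROM friends WHERE user_id = ANY($1)",
        [ids],
      );
      // DataLoader expects results in the same order as the input keys.
      return userIds.map((id) => rows.filter((r) => r.user_id === id));
    });

    // In a resolver: const friends = await friendsLoader.load(user.id);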
I've found that this is a simple and effective way to handle relational data either in REST or Graphql. But imagine having to traverse trees, filter on data of different types and levels. Sort and filter on edges. I mean it can get pretty complex and I am not saying that REST would be easier on those complex cases.
In my opinion GraphQL and REST can both be super cute in simple everyday queries. But I am thinking that the people creating databases like Postgres, MongoDB, Neo4j, etc. are doing exactly that: trying to give us the power to query our data efficiently. Why not expose the database directly and just add a layer for security, control, decoration and other stuff that would add value? Why rewrite databases?
There are products that do that! Hasura comes to mind.
But APIs can have different use cases. It’s usually considered bad to directly expose your table schema over GraphQL because it locks you in and makes it hard to change your data model over time. And not all API access is “get this data” and “set this data” — it can be difficult to express complex logic in just a database. And of course, some GraphQL APIs aren’t backed by a database - they’re backed by other services (a la the “backend for frontend” pattern).
I’m very pro choosing the simplest solution that works — but sometimes, the simplest solution does bring some complexity in exchange for other trade offs (like flexibility).
I agree with you. I am currently working on a project that needs to provide very sophisticated querying capabilities, so I'm kind of seeing everything through that prism.
I worked in the data federation space for a number of years (it's actually quite an old term, I worked in it back in 2013, around the time there was an early wave of activity around this and the concept of a "data fabric").
When I saw GraphQL come out, I knew that what you are saying would happen.
In the data federation tool I worked on, SQL was the interface abstraction to join across heterogeneous platforms (think of things like Presto/Trino or Dremio). GraphQL as an interface requires the same underlying infrastructure as that data federation tool in terms of query analysis, parsing, planning, optimization, execution, etc.
Those are "hard problems" due to lack of standardized interfaces, access patterns, direct data access, I/O, network bandwidth and infra related latency, costs, compatibility, data types, etc. These problems are distributed system problems coupled with often incompatible interface layers (e.g., even if you are using multiple SQL databases with GraphQL, you run into the same).
If your scale is such that you can build GraphQL on a handful of systems and for a handful of use cases, great! If you have to go to a certain larger scale, you're back into federation territory (which in the app layer might also be called API composition).
One potential option - when you reach the point where you need complex GraphQL query coordination, more than seems to make sense to implement, pair it with a data federation tool such as Presto/Trino, Dremio, Denodo, or research approaches such as caching/materialized views (engines like that are becoming decoupled from databases, such as Materialize.io) - and let those engines do the hard work.
In that case your work becomes more like GraphQL -> SQL or API -> a data federation, caching or materialization platform. CQRS and event sourcing plays a role here too.
Consider also the possibility that, if you are willing to accept a bit of delay in aggregated results from multiple systems, you can do those compositions or aggregations in the data platform layer and simply feed them to the GraphQL interface. That could even be done in a single database/data platform if you really wanted, without too much fancy federation tech.
Federation is powerful but complex. It seems like a fun hard problem, but for many tech teams, it can be a complexity and time suck. My recommendation would be try to avoid building that if you can.
> GraphQL is a great experience when you consume it and the service fulfills your query needs. Because you just ask stuff and you get them. It's really cool.
How about caching? It feels like GraphQL tries to win some (arguable) flexibility in putting together clients, and in the process throws out most of the operational advantages of resource-based APIs, with significant disadvantages in how you put together a backend.
One solution is "Persisted Queries". Another solution is to throw Varnish in front and cache the hell out of POST requests :P
Is it elegant? No.
Do you have another service to manage? Yes
Will you pay someone else to do it for you? Possibly
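The "Persisted Queries" route can be as small as an allowlist keyed by hash; a sketch, assuming Express and hashes computed at build time (nothing here is a real product's API):

    import express from "express";

    // Generated at deploy time: hash -> the only documents we will execute.
    const persisted = new Map<string, string>([
      // ["<sha256-of-query>", "query UserWithOrders($id: ID!) { ... }"],
    ]);

    const app = express();
    app.get("/graphql", (req, res) => {
      const doc = persisted.get(String(req.query.id));
      if (!doc) return res.status(404).json({ error: "unknown persisted query" });
      res.set("Cache-Control", "public, max-age=60");
      // Execute with your GraphQL engine of choice; the response is now a
      // plain cacheable GET like any REST endpoint.
      res.json({ data: /* execute(doc, req.query.variables) */ null });
    });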
I think there are already products trying to do that. But how many layers of abstractions and dependencies are you willing to have in your everyday processes?
It does a lot of client-side caching for you. The documentation is atrocious though IMO. I'm not sure if there exists a similar framework for backend caching.
Relay is a GraphQL client. That's the irrelevant side of caching, because that can be trivially implemented by an intern, especially given GraphQL's official cop-out of caching based on primary keys [1], and doesn't have any meaningful impact on the client's resources.
The relevant side of caching is server-side caching: the bits of your system that allow it to fulfill results while skipping the expensive bits, like having to hit the database. This is what really matters both in terms of operational costs and performance, and this is what GraphQL fails to deliver.
Take a look at this. Either you didn't know what's challenging about caching nested graph data, or we have different definitions of triviality/interns.
I repeat: client-side caching is not a problem, even with GraphQL.
The technical problems regarding GraphQL's blockers to caching lies in server-side caching.
For server-side caching, the only answer that GraphQL offers is to use primary keys, hand-wave a lot, and hope that your GraphQL implementation did some sort of optimization to handle that corner case by caching results.
You can execute GraphQL queries via GET and set a cache up for it like REST. Technically it's also allowed to cache POST requests but I guess anyone who comes across that is going to raise their eyebrows.
> You can execute GraphQL queries via GET and set a cache up for it like REST.
Can you, though? It seems you really can't, nor was GraphQL designed with HTTP caching in mind.
The only references to caching in GraphQL are vague hand-waving arguments about how GraphQL implementations might theoretically be implemented with some sort of support for caching primary keys.
But any type of HTTP caching is effectively excluded from GraphQL.
To put it differently, is there any third-party caching solution for GraphQL? As far as I could gather, the answer is no.
It looks like you are fixated on caching in GraphQL, but that's unnecessary, you can just cache GraphQL like REST, because in the end they are just GET requests. Just cache the GET request.
> It looks like you are fixated on caching in GraphQL, but that's unnecessary
Oh so one of the most basic features of any API, one which has a direct impact on scalability and motivates entire product lines and businesses like CDNs, is now "unnecessary"?
Here's how we do it: https://wundergraph.com/docs/overview/features/caching
We "persist" all operations using their name (unique) as part of the path. Variables are passed via query parameters. We also add a unique hash as query param which is generated by the configuration. This way, each "iteration" of the application invalidates the cache automatically. Additionally, we generate ETags for each request to reduce response payloads to zero when nothing changed. (https://wundergraph.com/docs/overview/features/automatic_con...) Combined with the generated type-safe client, this is a very solid setup.
It's just a GET request. Same as REST. I don't know what you want me to show as an example. You can use Apache, nginx, squid, any proxying webserver worth its salt...
Are you actually able to show an example or not? Because changing the HTTP verb doesn't magically change the problem, and passing query parameters as a request document renders these queries uncacheable.
> You can use Apache, nginx, squid, any proxying webserver worth its salt...
Great, pick the one you're familiar with, and just show the code. Well, unless you're not "worth its salt" or are completely oblivious to the problem domain.
Note that I made a very primitive implementation. Depending on which GraphQL node is queried, the request will be cached by the proxy or not. Apollo GraphQL server has much more fine grained methods of allowing caching (see https://www.apollographql.com/docs/apollo-server/performance...) however I left the example code crude so you see exactly what's going on under the hood.
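For the curious, the crude version is roughly this (a sketch, assuming an Express proxy in front of the GraphQL server with a naive in-memory cache; the upstream URL and TTL are invented):

    import express from "express";

    const upstream = "http://localhost:4000"; // the real GraphQL server (assumed)
    const cache = new Map<string, { body: string; expires: number }>();
    const app = express();

    // Cache GET /graphql responses keyed by the full URL, since the query
    // and variables live in the query string.
    app.get("/graphql", async (req, res) => {
      const hit = cache.get(req.originalUrl);
      if (hit && hit.expires > Date.now()) return res.type("json").send(hit.body);
      const r = await fetch(upstream + req.originalUrl);
      const body = await r.text();
      cache.set(req.originalUrl, { body, expires: Date.now() + 60_000 }); // 60s TTL
      res.type("json").send(body);
    });

    app.listen(8080);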
Sorry, that doesn't cut it at all. Far from it. Being able to cache a response is not the same thing as optimizing queries that hit your database. Being able to cache a response means not having to hit your database to begin with, saving on things like traffic into your database and round-trip time.
With REST APIs I can get a web service to return responses that are HTTP cacheable, put an nginx instance between the ingress controller and the web service, and serve responses to REST clients without even touching the web service that provides the REST API. I can even deploy nginx instances in separate regions.
I’ve never used those tools, but I don’t see how you can automate away authorization issues. The GraphQL spec[1] says authorization in the GraphQL layer is fine for prototyping or toy programs, but for a production system it needs to be in the business logic layer.
In Hasura, you authenticate externally -- can be custom API endpoint that signs a JWT/auth webhook, or an auth provider like Auth0, Okta, Firebase, Keycloak, etc. Doesn't matter, just have to return some claims values.
You can then use these claims values in your authorization (permissions) layer.
IE, when a user logs in, you can sign a claim with "X-Hasura-User-ID" = 1, and "X-Hasura-Org-ID" = 5, and then put rules on tables like:
> "Role USER can SELECT rows in table 'user' WHEN X-Hasura-User-ID = user.id"
> "Role USER can SELECT rows in table 'organization' WHEN X-Hasura-Org-Id = organization.id"
There's more depth to it than this, but this is the gist of it.
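For a concrete picture, the login side might sign a token along these lines (a sketch using the jsonwebtoken package; the claim namespace follows Hasura's docs, the values are made up):

    import jwt from "jsonwebtoken";

    const token = jwt.sign(
      {
        sub: "1",
        "https://hasura.io/jwt/claims": {
          "x-hasura-default-role": "USER",
          "x-hasura-allowed-roles": ["USER"],
          "x-hasura-user-id": "1",
          "x-hasura-org-id": "5",
        },
      },
      process.env.HASURA_JWT_SECRET!, // shared secret configured on the Hasura side
      { algorithm: "HS256", expiresIn: "1h" },
    );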
I say it over and over again: row/column-level permissions are not even close to enough for any larger app. How do you translate access restrictions like "user X cannot view more than 10 articles per month" or "customer Y cannot insert any more orders if the total outstanding/unpaid amount of invoices in the last 30 days exceeds Z" into row/column-level permissions? You don't; that's why you have a business layer.
> How do you translate access restriction like "user X cannot view more than 10 articles per month"
That's not permissions in the general sense of authorisation, that should just be modelled as any other "business logic". Put that article_limit in the users table and join it in the selects.
Edit: or tracking the users article views with timestamps, and making a select aggregate over the month..
> "customer Y cannot insert any more orders if total outstanding/unpaid amount of invoices in last 30 days succeeds Z"
insert into orders (user_id, ...)
select a.id, ... from "user" a
left join invoices b on b.user_id = a.id and not b.paid
  and b.created_at > now() - interval '30 days'
where a.id = $1
group by a.id
having coalesce(sum(b.amount), 0) < 1000  -- the outstanding limit Z
etc.. any kind of limitless complexity here on how you want to limit the orders.
But if you design a public GraphQL API, you cannot trust the query issuer (Browser or App on the Client). You have to enforce those rules outside of the query. And yes, there is tons of ways to do this outside of GraphQL, which is exactly my point, that row/column permissions alone do not suffice.
Ok I get you now, yes you are correct, if you have a public API and you have business constraints that need to be enforced, you have to do this server side.
But you can still implement this easily in queries, so for example granting the client only read permissions, plus execution permissions on certain database functions, and inside these functions you can implement the constraints.
What I'm trying to say is that you don't necessarily need any backend "services" (Java etc.) to implement these types of constraints; they can be modelled as any other business logic in the database.
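A sketch of that idea through node-postgres, with the constraint living in a Postgres function (table, column, and limit values invented for illustration):

    import { Client } from "pg";

    const ddl = `
      CREATE OR REPLACE FUNCTION place_order(p_customer int, p_amount numeric)
      RETURNS void LANGUAGE plpgsql SECURITY DEFINER AS $$
      BEGIN
        IF (SELECT COALESCE(SUM(amount), 0) FROM invoices
            WHERE customer_id = p_customer AND NOT paid
              AND created_at > now() - interval '30 days') >= 1000 THEN
          RAISE EXCEPTION 'outstanding invoice limit exceeded';
        END IF;
        INSERT INTO orders (customer_id, amount) VALUES (p_customer, p_amount);
      END $$;
    `;

    const db = new Client();
    await db.connect();
    await db.query(ddl);
    // The API role gets EXECUTE on place_order only, never INSERT on orders.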
Agreed. I've worked extensively with Graphile, and from working with people who were really strong in Postgres I learned that all the things we needed could be modeled in Postgres. Many of the less obvious solutions would involve functions, triggers, or stored procedures, but I liked that there was less ambiguity about where that kind of logic was implemented.
Observability, flexibility (I don't need to push a migration to change auth), SSO integration, and the ability to keep a clean separation between user and "machine" (service, replication, etc) accounts.
What does observability mean? Can you translate this into a concrete question?
As for flexibility, do you mean authentication or authorization?
SSO can be done at the SQL server level (MSSQL has it and so does Postgres, don't know about others), but handling the SSO part in your app and using "set role" and passing a user ID for your row level security policies to use is easier to set up and more flexible.
Clean separation between user and machine accounts can absolutely be done with MSSQL and Postgres.
Yeah this must be one of the most underused features ever. People don't realise that you can solve little bobby tables by just setting the permissions correctly in the database.
You cannot model every business constraint in DB permissions; Stuff like "If customer X has less than 3 active contracts, new contract activations require sign-off of Manager of at least level Y" etc.
And how do you unit test triggers? Yes you can do it in a ton of ways, but you just end up scattering your business layer all over the database, the GraphQL adapter, API gateways etc. The alternative is just to create a dedicated BE service endpoint (in whatever you prefer, REST,HTTP/JSON,SOAP, gRPC etc.), which does the required checks for you.
Triggers, functions or whatever you use are just code; yes, that is my pitch: have your business logic in code, ideally in a dedicated BE API endpoint instead of in the DB.
I for one think that the code path that leads to an insert conflict should be integration/unit tested. Something needs to codify how the error is reported to the client.
But unit testing is not nasty at all: just look up pgTAP.
What's nasty is that the incumbent development model intersperses data modelling constraints in the database and in the non-db code, and then we test one layer implicitly from another layer.
Unit testing is nasty? You can use whatever tool you like. I've used phpunit, junit and jasmine for unit testing databases. Choose whichever tool you like most.
It can get hairy due to how most apps are developed.
You'd usually have backend code guarding against constraint-violating entries, and then you'd have a database constraint too.
So what you need to do now is test almost exactly the same thing from your code test and from your database test to get proper coverage.
Enter declarative ORM schemas, and suddenly there's not even a guarantee that the schema you are running really matches what your ORM thinks the schema is.
For that reason, I prefer all those SQL-based database migration/evolution approaches over ORM-based schema generation, coupled with pure SQL tests (eg. with pgTAP, but yeah, any tool can do).
Basically, even for declarative code, there should be a unit test, a la double-entry bookkeeping in accounting.
And even if this is what I prefer and believe is only right, I never worked on a large company project that did all of these things.
So I don't think the entire topic should be easily dismissed: while unit testing is simple, have you ever worked on a project that tested db-schema embedded logic exactly at the right level (and not the level up)?
That’s the path I ended up taking. The GraphQL resolvers had no idea there was a database. They talked to a layer that understood all the business objects, and that sat on top of a layer that understood authorization, and only that layer had any connection to the data store.
In my mind that's just an insert that joins the contracts table and computes CASE WHEN active_contracts < 3 THEN true ELSE false END for the require_sign_off column.
You cannot trust the query issuer (Browser or App on the client). If you have a public GraphQL API, you need to enforce these rules. If you can just alter the query to bypass the business rule, this is called a security hole.
Using postgraphile for my current big project is the best technical choice I've ever made. There's been the occasional obscure sql incantation to learn but otherwise has been so much more productive than hand-crafting REST endpoints.
Every time I start with GraphQL I'm surprised that I'm writing all the routes and middleware I'd need with a RESTful API in Express. I feel like I'm missing the point.
To what extent does this headache go away if autogenerating graphql from a relational db, using tools like Postgraphile or Hasura? I never considered making my "own" graphql service but those tools sure make it look easy to create a nice API controller through db migrations.
Do you worry about over-coupling when using Prisma? I'm hesitant to let front-end control the schema in any scenario where they're not the only users of that DB. Works great until it doesn't and can be a pain to migrate control to a backend/API team.
Our Prisma schema resides in our "backend" (Prisma essentially governs our master API). So I'm not sure why you're concerned that the front-end might control the schema.
The nicest thing about Prisma is that it is a declarative single-source-of-truth for our data models. Everything resides in one schema, all changes to database models and all migrations run through Prisma, and, best of all, strong types are inherently built in.
The team is also building useful middleware like field-level encryption; all of this together makes Prisma a very complete package.
Of course, there is a price for this convenience — we sacrifice some higher-level DB-side features. But Prisma is such a competent tool that we don't miss them much.
But that's true for any solution. This goes back to "avoid db server specific SQL", you gain the portability advantage but you're willingly giving up advanced features the db server has. How far do you want to take this to be "independent"?
I'm not concerned with independence from the database _implementation_ but independence of the database schema from any one consumer. This is one of the more interesting things about tools like Hasura/Postgres/Postgraphile in my eyes, they encourage you to separate frontends from the backend early on. That might be one team to start, but you can divide labor and add more services without rearchitecting like you would if the database was controlled by ORM from a single front end.
About 90% of our GraphQL API passes through Prisma. We have a master API that talks to many different microservices to process data and so-on, but all the data ultimately ends up residing in our Postgres DB. One of the nice things about Prisma is that it gives you a very declarative way to manage your data, and encourages using your DB as your "single source of truth".
Querying everything through one API (which relays requests to other microservices, if necessary) and having one Postgres DB which acts as the "endpoint" for all of our data is a very clean model.
For edge cases, it's also possible to write custom resolvers. Prisma doesn't prohibit that.
Philosophically, is it really a "microservice" if it doesn't have its own database? In my opinion, if multiple services are ultimately all connecting to and storing their data in the same database, then you haven't really gained very much, since one misbehaving client can still take down everyone else's service. The point of microservices was always sold to me as "every team owns their own stack", and specifically if one team's stack goes down, everyone else can cheerfully continue. (Or less cheerfully, if the team whose stack went down was identity or user service.)
I think this is solved by creating "full stack" teams where the front end developers who want the GraphQL API are also the same team who define the schema and build the service that serves that API. In large companies where GraphQL makes sense, that GraphQL API service would just call into pre-existing services that serve JSON, Protobuf etc maintained by 100% back end teams.
Forming full stack teams doesn't remove the pain of having to build the producer api in the first place. It merely shifts the burden from a backend only team to a full stack team.
> “On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.”
Is this still true if the structure of the data is relatively simple, but you have tens of millions of users? Say the data that is returned per user has 20 or 30 properties in total, and you are only ever asking for specific data about an individual user.
You have to do your own optimiser to avoid, for instance, the N+1 query problem. (Just Google that, plenty of explanations around.) Many GraphQL frameworks have a “naive” subquery implementation that performs N individual subqueries. You either have to override this for each parent/child pairing, or bolt something on the back to delay all the “SELECT * FROM tbl_subquery WHERE id = ?” operations and convert them into one “… WHERE id IN (…)”. Sounds like a great use of your time.
In the end you might think to yourself “why am I doing this, when my SQL database already has query optimisation?”. And it’s a fair question, you are onto it. Try one of those auto-GraphQL things instead. EdgeDB (https://edgedb.com) does it as we speak, runs atop Postgres. Save yourself the enormous effort if you’re only building a GraphQL API for a single RDBMS, and not as a façade for a cluster of microservices and databases and external requests.
Or just nod to your boss and go back to what being a backend developer has always meant: laboriously building by hand completely ad hoc JSON versions of SQL RDBMS schemas, each terribly unhappy in its own way. In no way does doing it manually but presenting GraphQL deviate from this Sisyphean tradition.
I read in the article that NOT having GraphQL exactly match your DB schema is a best practice. My response is “did a backend developer write this?” Sounds awfully convenient for job security!
My experience using GraphQL is the same as using React: looks great at first glance and it makes sense. Use it for a while and you realize it's designed to be used by the fb team. For example, both are designed for a large team to work on small components separately. Most developers are NOT fb, thank goodness. There are better, faster and lighter-weight alternatives for smaller or other kinds of teams.
There's nothing that stops you from exposing only what you would expose in a "restful" API. You can even specify the exact queries that can be used by the client. And even then GraphQL gives some nice advantages, such as introspection and endpoint discovery, as well as smoother error handling and increased type-safety.
Filtering on relationships is a big issue for us. Each nested node in the query graph (tree?) generates a new SQL query. We seem to be committed to that approach at this point; trying to migrate to a world where we inspect the whole query, then make one SQL query, isn't going to happen.
Not trying to specifically shill my own library, but I developed this a while ago before there were any established patterns with filtering on relationships in graphql.
https://github.com/tandg-digital/objection-filter
Out of curiosity, would functionality like this implemented in graphql solve your issues?
I know relationships don't typically have props in a store like Neo4j, and moreover you can reproduce that in something like Postgres with a foreign key.
We had a challenge like what you describe though, and were able to avoid new queries by representing the relationships as objects. In so doing, we leverage row-level security and JWT claims, which is an approach to authorization with high epistemic legibility.
I think similarly. If you have control over the back end environment, it's not worth the extra effort, additional complexity (e.g. caching challenges) and performance overheads to run a GraphQL server.
I've been using graphql for a project recently and... yeah, I'm not a fan of it. The data is stored relationally and exposed through views, fed through a graph layer, then has to be flattened on the front-end into something that's not far off from the original exposed view. That's a ton of work and really, really messes with front-end experimentation because of all of the work to unpack each graph representation every time.
Something is wrong here. The whole point of GQL is to serve things in exactly the format the front end wants. Even the other negative comments here mention how it is easy to use on the front end.
Perhaps. But just consider that your sibling comments have suggested about 5 different middleware tools that all supposedly do some similar thing. So I may be wrong, but at least four other people are wrong too ;)
This is never the case, every time I use GQL, I always have to reshape the response. GQL only lets you declare the data you want, it does not let you declare the shape in which you want it.
Then that's a problem with the schema implementation and not necessarily a fault of GraphQL. The people implementing a GraphQL schema should be working very closely with the people working on the frontend and put a lot of importance on how they want to consume the data.
GraphQL schemas that basically just expose the data models 1:1 without considering the exact workflows the frontend needs is a terrible implementation and misuse of GraphQL. Might as well just expose the data using REST
Unfortunately this is my experience as well. Generally it's a misalignment of priorities. Since the people writing the endpoint don't have to consume it, they just do whatever is easiest as quickly as possible. And often many are dismissive of frontend concerns when challenged.
Would this solve the problem described? Sounds like the annoying part is solely on the front-end, the unpacking/flattening of what the Postgraphile service returns. From their description, I wouldn't be surprised if they were already using Postgraphile or Hasura as "the graph layer".
Heh, I actually use Hasura and I find it extremely painful to use. It's unbearably fragile to state changes (eg, psql scripts + pg_dump/psql restores) and its errors are inconsistent enough to give you just enough constant false hope that your problem's fixed, but a second step is almost always needed.. and without a helpful error or button that just explains and fixes all the things from a single screen. I realize I'm probably using it wrong, but I really don't think I'm doing anything exceptionally "out there".
> "its errors are inconsistent enough to give you just enough constant false hope that your problem's fixed, but a second step is almost always needed.. and without a helpful error or button that just explains and fixes all the things from a single screen."
There are buttons on the "Settings" screen (/console/settings/metadata-status) you can click that should put your instance back in a working state (and it'll redirect you here by default if your metadata is invalid):
> [DASHBOARD TEXT]: "You have been redirected because your GraphQL Engine metadata is in an inconsistent state. To delete all the inconsistent objects from the metadata, click the "Delete all" button. If you want to manage these objects on your own, please do so and click on the "Reload Metadata" button to check if the inconsistencies have been resolved."
As someone who still builds their personal projects with it -- yeah, the error messages can be kind of opaque if they're related to Hasura's internal metadata/state. For errors that come from external services, those are passed through at least when "HASURA_GRAPHQL_DEV_MODE" is enabled.
> "It's unbearably fragile to state changes (eg, psql scripts + pg_dump/psql restores) ... I realize I'm probably using it wrong, but I really don't think I'm doing anything exceptionally "out there".
Are you dropping tables/columns which have metadata on them? IE, a relationship or permission on a table?
If you have metadata on a resource and then you remove it, without also removing references to it, the effect is the same as if you had tried to drop a table that has foreign keys that reference it in an RDBMS.
Thanks for the response! You're right that that's what I'm doing wrong, though the problem comes after I recreate those relationships on the RDBMS side. Hasura really struggles to piece together that even though things were torn down, they were brought back up in the same way. Having one button to "repair" it would be nice. This mostly happens because I more or less start from scratch on the RDBMS side every time I make a change. I'd do the same on the Hasura side, but tracking relationships (I think that's what it's called) takes about ten minutes to initialize on a relatively small database, so I'm forced into making as few changes as possible.
> "but tracking relationships (I think that's what it's called) takes about ten minutes to initialize on a relatively small database"
Oof, this is insane. Should not be the case.
Are you using the Hasura CLI to automatically track any changes made with the web UI to local YAML files? You can use this, along with the ".cli-migrations" variant of the Docker image to automatically apply your metadata/migration as soon as the image starts.
So you'd run "hasura console" in terminal, which would serve the special web UI that mirrors changes to local files, and that'll serve it on http://localhost:9695
Then when you want to start fresh, just docker-compose down/up and it'll handle auto-applying everything for you:
If you use Relay, the graph representation is reasonably unpacked into a state store for you, and you're given the ability to change both the state store and the backend data in one fell swoop.
The article glosses over my favorite feature: it is typed.
Being able to automatically type the network responses on the frontend is huge. Never before had I such confidence in the FE code I am writing. My entire FE project is typed end to end without any manual assumption and things work most of the time on the first successful compilation.
In my experience, the open source tooling around GraphQL is much better than the tools that generate types from other SDLs like protobufs.
This makes it really easy for servers and clients to behave in a typesafe manner.
I've seen a backend service / front-end SPA use REST, grpc and then GraphQL to communicate. Type safety has been the easiest to understand and scale across the eng team using the GraphQL ecosystem.
OpenAPI is supposed to help with this as well, though I haven't seen a lot of good examples in the wild (or where I have worked) where OpenAPI was implemented well or from the start.
100% this. I’ve used GraphQL on two projects now, where we manage the front and back end, and it’s been an amazing experience.
A strong type system, combined with a GraphQL code generator and TypeScript, has been an amazing experience as both a front and back end developer. Not only do I have types for the backend (for things like input args as well as models), I have types for the front end, as well as easy-to-use hooks to utilise them in React. During development, we constantly fetch the latest schema and check if anything has changed; if so, the build breaks then and there, highlighting which queries are no longer valid.
The development experience has been amazing, and much more productive than any REST-based workflow I've used, purely because of the type system.
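Roughly, that workflow looks like this (a sketch assuming GraphQL Code Generator with the React/Apollo plugins; the UserCard names are invented):

    import { gql } from "@apollo/client";
    // Hypothetical codegen output for the query below:
    import { useUserCardQuery } from "./generated/graphql";

    gql`
      query UserCard($id: ID!) {
        user(id: $id) { name email }
      }
    `;

    function UserCard({ id }: { id: string }) {
      const { data, loading } = useUserCardQuery({ variables: { id } });
      if (loading) return null;
      // data.user is fully typed; if the schema changes and this query
      // no longer validates, the build breaks instead of the page.
      return <p>{data?.user?.name}</p>;
    }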
An alternative approach is to use something like Thrift to type your responses; properly implemented, an RPC framework like that is amazing. So much validation boilerplate removed on the backend, and receiving it on the frontend as a nice type is fabulous as well.
GraphQL is just nested RPC with a bad name. It's not a query language in the sense that SQL is: it doesn't natively provide filtering, pagination, joining.
It can be implemented in literally the same way as REST or gRPC, except where you don't exchange fields the client is uninterested in, and can incorporate "and then" requests as a nested field rather than a round trip.
You just have to bear in mind, removing that round trip makes it feasible for a client to send very complex requests that can trigger a large amount of processing on the server. Either you optimise that (e.g. with data loader, eager loading, caching or some other mechanism) or you rate limit like you would with a REST/GRPC request.
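One common guard before either of those: reject overly deep documents up front. A crude depth limit can be enforced on the parsed document before execution (a sketch using the reference graphql-js package):

    import { parse, visit } from "graphql";

    // Reject documents nested deeper than maxDepth before executing them.
    function checkDepth(query: string, maxDepth = 5): void {
      let depth = 0, max = 0;
      visit(parse(query), {
        SelectionSet: {
          enter() { max = Math.max(max, ++depth); },
          leave() { depth--; },
        },
      });
      if (max > maxDepth) {
        throw new Error(`query depth ${max} exceeds limit ${maxDepth}`);
      }
    }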
The biggest weak spot with GraphQL, in my opinion, is pagination. Pagination is awkward in any kind of nested API unless you use some sort of continuation handle, but then the server needs to keep state and there's a chance your pagination set can expire.
I'm aware (and that's what we use) but it still suffers the same issues. To use it with deep nested queries, you essentially need to use something like Apollo which can rewrite queries for enumeration where it doesn't re-query the things from the original query that it already has cached.
However, what I'm referring to - and I'm not sure if it has a better name - is the type of cursor where you can create it, then enumerate it separately, across multiple requests.
In pseudocode it would be something like:
let c = db.users(name like 'c%').groups(memberCount > 50).members # request 1
while c.any():
c.fetch(max=50) # request 2+
Of course, for that to work, the entire state of the query needs to be encapsulated within that cursor or saved in some temporary session variable on the server. It also doesn't really fit into GraphQL's model right now.
I seem to recall Hacker News actually used something similar for pagination back in the day, and if you left a page open for too long then tried to press next, you got an invalid continuation error or something to that effect. Not sure if that was something similar...
And where this really gets tricky is when it's not a loop like that, but the next fetch happens only after user interaction, perhaps 5 hours later.
To do that right, you need to encode everything the query is ordered by into a stateless continuation token, for example here that might be [user_name, group_id, member_id].
You can make that opaque to the client, but that's what you need to continue the search efficiently.
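A sketch of such a token, using the sort keys from the example above (base64url-encoded JSON, treated as opaque by clients):

    // The position of the last row returned, under the query's sort order.
    type Cursor = { userName: string; groupId: number; memberId: number };

    const encode = (c: Cursor): string =>
      Buffer.from(JSON.stringify(c)).toString("base64url");
    const decode = (s: string): Cursor =>
      JSON.parse(Buffer.from(s, "base64url").toString("utf8"));

    // Continuing the scan is then a stateless keyset query, roughly:
    //   WHERE (user_name, group_id, member_id) > ($1, $2, $3)
    //   ORDER BY user_name, group_id, member_id LIMIT 50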
I haven't jumped on the GraphQL train yet, largely for a lot of the reasons the original author calls out. I see the benefits, but they don't outweigh the costs of converting our existing API surface area.
Like most of the tools we choose to use (or not use) there are trade-offs. The original tweet and post fail to recognize why GraphQL might make sense, even with its caveats. GraphQL makes the API more flexible for the front-end to consume. This reduces the number of requests a UI might need to make in order to render something, which makes clients (particularly mobile ones) faster. It also means a team of specialists working on the UI can probably add or adjust features faster, as the backend is more dynamic.
So if you're serving a certain audience (lots of clients where network requests are expensive) or have a large, specialized front-end team that's distinctly separated from the team that's responsible for the API, then GraphQL might be worth the trade offs. Sure, it'll come with some downsides, but all things do -- it's our job to be careful and deliberate about the tools we choose to use.
I recently had to consume a GraphQL API (Shopify's) after being interested in it for some time.
Wow, not a fan.
Everything was poorly documented (both in the actual API and GraphQL itself) and fussy, and required a huge amount of code to get anything done. Nesting and union types were really hard to figure out.
Frankly, the more I used it, the less I liked it.
Is it possible that the API design was bad? Sure. But my googling to figure things out didn't make me think they were doing anything out of the ordinary at all for GraphQL.
Every single time I've heard "GraphQL is poorly documented", it's from someone who doesn't know about Insomnia [0] (or something like it) and autocompletion.
You can traverse the docs as you write the code with the right client app, and this solves 90% of the "docs are bad" complaints about any given GraphQL API.
GraphQL APIs are many things. Poorly documented is not one of them.
GraphQL APIs that are public will usually have a less than optimal dev experience because it has to consider general and unknown use cases. GraphQL shines when you control the API and the frontends that consume it. This is mostly because the schema design will reflect the exact use cases the frontend needs. So the data and its shape and its mutations are exactly what the frontend needs. You can't really do that for public APIs because you don't really know or predict the use cases for 3rd party frontends.
> GraphQL shines when you control the API and the frontends that consume it.
How does this differ from controlling the API and frontend with rest/json? You can shape the data in any which way you want including nested relations, i.e get me a user with all their posts.
Unless you are $MEGA_CORP scale it really doesn't feel like it's worth the investment.
You can definitely shape the data in any way you want with REST. You can find parallels between GraphQL features and REST and argue either way, because that's a 1:1 comparison. Comparing technologies requires looking both at the 1:1 level and as a whole.
But with your nested relations example, wouldn't getting a user always return its nested relations?
- what if a post has its own nested relations? some pages might need just a user's posts without the posts nested relations and some pages might need them all.
- what if a page only needs user info and doesn't need the user's posts?
With both similar scenarios, you'll need a way to communicate to the user endpoint when and when not to return nested relations, and how deep. You could do that with query params, sure, but IMO that's a workaround for what the frontend really needs: a declarative way to let the API know that for this specific request, this is the exact data I need and its shape. No more, no less. Along with the flexibility to get everything in another request. You could also do nested resources with REST (/user/posts/categories, /user/posts/etc) but then that's multiple calls to get what you need. With GraphQL, these scenarios can be solved with a single API endpoint and single requests to it.
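Concretely, both pages would hit the same endpoint with different selections (hypothetical schema):

    // Page that only needs user info:
    const userOnly = `{ user(id: 1) { name email } }`;

    // Page that needs the user, their posts, and each post's categories,
    // still one request:
    const userWithPosts = `
      { user(id: 1) {
          name
          posts { title categories { name } }
      } }`;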
Also the productivity gain from GraphQL is actually more apparent in smaller eng org compared to a massive one. That's because for GraphQL to be at its best it requires the people implementing the schema to work closely with frontend folks. Even better if they are full stack and are the same people.
> But with your nested relations example, wouldn't getting a user always return its nested relations?
A central point of REST is that any resource (e.g., user) can support an unlimited number of different representations. This isn't just true on the level of, say, XML vs. JSON but also formats which embed and do not embed subordinate/related entities.
> You could do that with query params sure but IMO that's a workaround with what the frontend really needs, a declarative way to let the API know that for this specific request, this is the exact data I need and its shape.
How are query strings a workaround and not a declarative mechanism?
(I mean, it would be better if HTTP had a verb that was like GET with a body so that you could define one or more media types that could be used to specify details of the resource representation sought, and that's what the QUERY draft [0] is about. And I can see the view that query parameters are a workaround for the lack of a QUERY method. But that's a slightly different story.)
> A central point of REST is that any resource (e.g., user) can support an unlimited number of different representations. This isn't just true on the level of, say, XML vs. JSON but also formats which embed and do not embed subordinate/related entities.
I'm aware, but how does that work in practice, actual implementation wise? What do the formats that include or exclude related/nested entities look like? How do clients use these formats? Also, how does supporting multiple levels of embedded nested entities work in this pattern? I ask all of this because I don't think I've seen this implemented well without a convoluted in-house implementation of a query language like GraphQL's shoehorned into query strings. A specification is only as good as the people implementing it.
> How are query strings a workaround and not a declarative mechanism?
In simple cases, they can be. But they are very limited. How do you express in query strings the exact data fields, nested entities, and shape the client wants to retrieve? (this question might be related to the questions above)
> I mean, it would be better if HTTP had a verb that was like GET with a body so that you could define one or more media types that could be used to specify details of the resource representation sought, and that's what the QUERY draft [0] is about.
I didn't know about the QUERY draft. This is very interesting and would be great to have in HTTP. This is a good example of the gaps in HTTP that technologies like GraphQL are trying to fill.
The company I'm with is decidedly smaller than mega corp and graphql has been a huge boon for productivity. The API team will hit product with some word vomit that the page they want to build is impossible due to service A not being able to talk to service B, meanwhile the frontend team just implements a resolver and is done with it.
I despise the programmer's hubris around the concept of self-documenting things, especially as I'm running into so many examples of programmers who don't actually sit down and read library code these days. And I especially despise apologist management that makes excuses for them.
We didn't get where we are by skipping the manuals and not knowing what we're doing under the hood. Abstraction is fine once you're actually aware of what your boundary conditions are.
That's probably true after you are comfortable in it, but as someone trying to wrangle it for the first time it was rather mystifying for anything outside of trivial usecases.
I just got done doing a talk on GraphQL for PyCon2022 and I do agree with some of the points here. Performance work can be tedious, and it is not bi-directional in the graph, so the number of dataloaders can blow up, making debugging hard. Identifying where n+1 queries are in the API can also be difficult, but I used this open source package to help: https://github.com/tatari-tv/query-counter. I think the article failed to mention two of the things that GraphQL does really well: dense queries and built-in pagination. You're able to do the work of many serialized REST queries in one query using the node context of the GraphQL graph structure, which is a huge win if you're hitting performance issues related to requests per second to your API. Also pagination using the cursor, before/after, etc. is very helpful, and Flask_Graphene enables some slick caching there to make subsequent queries at that cursor extremely performant. I have code with my sample implementation, which is simple but shows the power of DataLoaders: https://github.com/lame/pycon-graphql.
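The cursor pagination bit looks something like this on the wire (Relay-style connection fields; the posts field is invented):

    const query = `
      query Posts($after: String) {
        posts(first: 10, after: $after) {
          edges { cursor node { id title } }
          pageInfo { endCursor hasNextPage }
        }
      }`;
    // Feed pageInfo.endCursor back in as $after to fetch the next page.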
One problem with any such databases is that, because they are so useful, you end up making a great deal of use of them, and in particular code using them metastasizes everywhere, and worse, schema gets hard-bound into all that code everywhere.
Then one day you want to:
- make significant and incompatible schema changes
- move subsets of schema/data out to separate partitions / DBs
- implement a merger
- change DB / vendor
and... you can't. All that code capturing the specific API, QL, and the schema (and metaschema!) is spread all over the place, and it's too much to change, and you can't do it in any sort of atomic manner. The change will take forever and will be very costly -- you might not even bother.
You're stuck. Vendor lock-in, but more awful than usual.
But what's so special about GraphQL? Active Directory has the same problem, really. It's just that AD (and all LDAP-based DBs like it) is kind of icky, because the metaschema (particularly X.500 naming) is icky, so it hasn't metastasized as much.
What's the solution? I would suggest that one has to build a proxy API to capture all direct uses of the DB and schema, and which presents a task-oriented interface to it (e.g., "add user to group", etc..). This way you can later rewrite just that very isolated and tested component. But the problem with that is that you have to bring forward a lot of the switching cost into the present, and that might be pointless cost if you end up never switching.
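For concreteness, a hedged sketch of what such a task-oriented facade could look like, in TypeScript (all names illustrative):

    // Hypothetical task-oriented facade; callers never touch the schema or QL.
    interface User {
      id: string;
      name: string;
    }

    interface DirectoryService {
      addUserToGroup(userId: string, groupId: string): Promise<void>;
      removeUserFromGroup(userId: string, groupId: string): Promise<void>;
      listGroupMembers(groupId: string): Promise<User[]>;
    }

    // Only the one implementation of this interface queries the DB directly,
    // so a schema change or vendor switch later means rewriting one isolated,
    // well-tested component instead of chasing queries across the codebase.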
> I would suggest that one has to build a proxy API to capture all direct uses of the DB and schema, and which presents a task-oriented interface to it (e.g., "add user to group", etc..).
Sounds just like stored procedures to me! But those are "icky".
I tend to be highly critical of the costs of interfaces that bring nested data structures into the equation, especially on the input layer. The moment you bring in nesting, things become exponentially more complex. There's a reason relational databases are relational databases and not graph databases. Flat data structures bring a lot of simplicity in expectations. Two dimensions are easy to grasp: you can see at a glance if things are going right or wrong, and you'll notice immediately.
The same goes for API interfaces. A one-dimensional list of input parameters is easy to understand: the length of your input vector is equal to your degrees of freedom. However, with a graph query, you can query literally anything in any dimension you want. _Sometimes_ that can be powerful, but more often you want API contracts to be strict and simple. Simple REST endpoints bring fewer surprises and are easier to document, explain, and implement.
Sure, there is the cost of doing multiple queries. But for how many applications is the added complexity worth that saving? HTTP is getting faster every day, especially with HTTP/2 and HTTP/3. Multiple queries aren't as much of a deal breaker as they were 10 years ago. In complex applications, simplicity and predictability are more important than minor performance improvements.
The browser is not a trusted computing environment. Any expressive power you put, intentionally or not, in the hands of your front-end developers, you also put in the hands of potentially hostile users. The console is just a click away.
This is not the case in hypermedia based systems where content is generated on the server side. The server side is a trusted computing environment and, there, you can give developers a fully developed query language such as SQL without risking it falling into the wrong hands (mod coding errors by your developers, of course.)
This is a powerful argument for SSR and for hypermedia in general.
Typically, in security discussions I've had in the past, the fact you can tell which account was used to compromise your system was not considered a strong argument for a particular approach.
Oh, I thought the concern was from a denial-of-service angle.
Compromise would imply the account has access to things it shouldn't. Which, while possible, should be taken care of; and if it isn't, a REST API isn't going to prevent it.
(I'm personally thinking of a paid account of a multi-tenant system, so query limits and punitive billing would probably eliminate the worst DOS issues.)
My point is that giving expressive power to your front-end developers puts that expressive power in the hands of end users as well. So you have to be very careful with that power, and I assert that most people are not. Facebook apparently uses an exhaustive query whitelist to lock down their GraphQL endpoints, but the vast majority of people jumping on the GraphQL bandwagon aren't going to do that, and likely have little understanding of the security implications of not doing it.
A truly REST-ful hypermedia API eliminates this issue by moving the construction of the hypermedia server side, which is a trusted computing environment and which allows you to give the developers a fully developed and arbitrarily powerful query language like SQL. Doing so does not put that query language in the hands of end users, in contrast with things like GraphQL.
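To make the whitelist point above concrete, here's a rough sketch of the persisted-queries pattern in TypeScript (the operation name and query text are illustrative):

    // Clients send an identifier, never raw query text; the server maps it
    // to a pre-approved operation (both entries here are illustrative).
    const allowedOperations = new Map<string, string>([
      ["getUserName", "query ($id: ID!) { user(id: $id) { name } }"],
    ]);

    function lookupOperation(operationId: string): string {
      const query = allowedOperations.get(operationId);
      if (!query) throw new Error("Operation not on the whitelist");
      return query; // only this vetted text is ever executed
    }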
Sometimes I get this weird, probably biased and prejudiced, but real feeling that every single piece of technology that has been born inside Facebook in the last ten years is a trap.
I feel like the original Twitter thread misses some common points. Bad API design is possible with any API style. A complex aggregation with nested joins is possible with any kind of API as transport. They also don't mention tools like Relay, which indicates that they have never used GraphQL to its full extent.

We've been working a lot in this space to improve the developer experience of GraphQL, giving devs the benefits of dynamic GraphQL operations but combining it with a REST API/JSON-RPC as a facade, and therefore dealing with a lot of the downsides. Please check out https://wundergraph.com/ if you're interested.

In terms of security, there are frameworks like entgo (https://entgo.io) that handle auth extremely well. If you look closely into the docs, you'll realize that entgo supports REST, gRPC, and GraphQL as external interfaces. So it's clear that you have to deal with authz no matter what API style you choose.

Regarding "unpredictable" performance, I'm not sure I agree with the points being made. With GraphQL, it obviously gets very visible when you have "slow queries". If an API consumer did the same "queries" through a REST API, they might create even more server load because it takes more requests. The difference is that it's not really visible, because you don't count 100 REST API calls as "one query". Instead, you falsely believe that you've served 100 API calls very quickly, in less than 100ms each. It might be the case that the REST API calls take 10s total while the GraphQL query took 3s. So now you're thinking that REST is 30x faster than GraphQL, but really we're comparing apples to oranges.

My summary is that you should choose the right frameworks and tools for your project. REST is totally fine, but please create an OpenAPI specification to document it, otherwise it gets messy.
Yes, you can change anything you like and add in any behaviours you wish. But each time you do that, you move away from the promise that "you basically don't need backenders" or "if you're finding it tough, just use this tool!", the latter of which has already appeared in this comment thread more than once.
You can technically accomplish many things with GraphQL. But the effort to do so erodes the benefits promised.
Even though this is true, there can still be a lot of value in certain scenarios.
For instance, this is why I use GraphQL strictly in my admin backends, exposing the database almost verbatim through Postgraphile as an Express.js middleware. All data is now trivially accessible. Everything that requires custom resolvers or other complex customisations is handled over a REST endpoint. I now have all the benefits of GraphQL without the complexities you face when moving into more edge-case scenarios.
In pretty much the same way, I use Prisma ORM but raw queries for anything non-trivial.
Sure - anything that's basically generated dynamically off the database structure will work well, although you can also get auto-rest generators that would have a similar productivity effect.
What is the benefit of every request being POST? This makes caching a harder problem to solve.
Also, why is every status code 200, even in the event of an error? They want you to pass an error key in your payload and have your client process the payload to understand the status of the response. Why are we reinventing the wheel here, and for what benefit? GraphQL had some really appealing concepts going for it, like querying a single source of truth for multiple data sources and only getting the data you need. But in practice, the benefits do not seem to outweigh the costs.
You can combine multiple queries into one request in such a way that transport layer caching just isn’t effective.
If you’re making a content and read heavy website, you can run GraphQL over GET, and cache traditionally.
The spec says nothing about the transport layer itself.
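For concreteness, a hedged sketch of GraphQL over GET (the endpoint and schema are illustrative):

    // The query rides in the URL, so proxies and browsers can cache it like
    // any other GET. Long queries can hit URL-length limits, which is one
    // reason persisted queries (send a hash instead of the text) exist.
    const query = "{ post(id: 1) { title body } }";
    const res = await fetch("/graphql?query=" + encodeURIComponent(query));
    const { data } = await res.json();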
Why return 200 for a request, when part of it failed?
Separation of concerns. Network layer errors are one thing, application layer errors are another.
They require different resolutions.
With a network layer issue, you could retry, or perhaps you need to fetch a new access token then retry.
For application layer issues, say you request data on 4 different entities, but the service for one of those types is down. Should you chuck out the whole request, or return everything and something like a Problem or Error type for the failed one?
Perhaps you tried to access a field you don’t have permissions for, or require elevated permissions. Should you fail the whole request, or return an error on the field itself, allowing you to inform the user what you need them to do to continue?
The point is precisely that the client processes the error closest to where it’s relevant. This works especially well with component based rendering.
That being said, caching needs to be done at the resolution layer rather than the request layer. Under REST, APIs are generally modeled as returning individual objects or lists of one kind of object, which makes requests a reasonable thing to cache. Under GraphQL, each resolver returns a different type of content, and the mixed bag of content means that there should probably be different cache policies and invalidation for each kind of data provided by each resolver.
The status code being 200 even though your query is wrong or failed to correctly return data makes sense for the same reason. If you got a 500 because one field failed to correctly resolve, but the rest of the query was fine, the 500 is only telling you that something went wrong without letting you know exactly what it is. In GraphQL, we should save the status codes for request-level network issues rather than semantic issues with the request or the response.
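For concreteness, this is roughly the spec's response envelope for a partial failure: the data that resolved travels alongside the errors, with a path pointing at the failed field (field names illustrative):

    // HTTP status: 200. The transport succeeded; one resolver did not.
    const exampleResponse = {
      data: {
        user: { name: "Ada" }, // this part resolved fine
        billing: null,         // this subtree failed
      },
      errors: [
        { message: "billing service unavailable", path: ["billing"] },
      ],
    };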
Sure, you can query over GET, but to a single endpoint - so no ability to utilize built-in browser caching for any requests you want to optimize, unless you offload them from GraphQL.
As for the 500, you can pass error details in the payload just as GraphQL would, so sure, you can get the specifics of what went wrong.
There’s clearly value in GQL for specific use cases but IMO it’s often an early over-optimization that shouldn’t encapsulate your entire data layer until you actually need it for reasons and not just to stay bleeding edge.
To the former point, you’re saying it doesn’t make sense for GQL, I’m saying why fix what isn’t broken? Request level caching works great for certain needs. But I get your point that it doesn’t apply as directly for GQL and yes, people do have these needs also.
To the latter - at least with fetch, reading the response payload is a second blocking call after running the initial request. You can read the status with one less method. Amazing argument for pre-optimization? No, I get it, but it just feels like more overhead in practice to me at least.
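Concretely, the extra hop looks something like this (a sketch; retry and showErrors are hypothetical stand-ins):

    // Hypothetical handlers, to keep the sketch self-contained.
    declare function retry(): void;
    declare function showErrors(errors: unknown): void;

    const res = await fetch("/api/orders");
    if (!res.ok) retry();           // REST-style: the status alone is enough
    const body = await res.json();  // GraphQL-style: must also read the body
    if (body.errors) showErrors(body.errors);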
I think the point has been made clear to me that there is definitely one or more use cases that GraphQL solves for better than REST, and it is still right to pick and choose the right tech for each specific job as it fits best.
GraphQL treats HTTP as a dumb pipe. All semantically relevant information is encoded in the GraphQL messages, and none of the HTTP metadata is relevant to the query posed by the client or the response returned by the server.
In theory, this allows the same processor to be used over alternative transport mechanisms and paradigms (e.g., over MQTT or Kafka topics) without adapting the messages themselves to the transport.
> "In theory, this allows the same processor to be used over alternative transport mechanisms and paradigms (e.g., over MQTT or Kafka topics) without adapting the messages themselves to the transport."
This is something that doesn't get mentioned often, and that a lot of people don't grasp. It's transport-agnostic, which means you can do GraphQL over high-performance transports if you have the requirements.
A recent HN top post by Dan Luu, "In defense of simple architectures", discusses how they do this:
> "Some areas where we’re happy with our choices even though they may not sound like the simplest feasible solution are with our API, where we use GraphQL, with our transport protocols, where we had a custom protocol for a while, and our host management, where we use Kubernetes. For our transport protocols, we used to use a custom protocol that runs on top of UDP, with an SMS and USSD fallback, for the performance reasons described in this talk. With the rollout of HTTP/3, we’ve been able to replace our custom protocol with HTTP/3 and we generally only need USSD for events like the recent internet shutdowns in Mali)."
> "As for using GraphQL, we believe the pros outweigh the cons for us:"
This confused me too, working on the client side. But part of the benefit of gql is you can do multiple queries in one call. Any or none of those could fail.
Is the status code query-specific or request-specific? If I have a resolver that can handle multiple queries, does it matter if one or more queries internally fail? Doesn't that mean the request was either degraded or failed entirely? It seems like extra overhead for the frontend when the queries are just a means to data, which is generally parsed and validated anyway; so aren't query-level statuses within a single request sort of redundant to a frontend?
IMO status code should be request level, but I can see the reasoning you’re presenting about query-level. Interesting thought.
A common scenario where a partial failure may occur is throttling. If you have a client that is permitted to perform 10 queries per second and they submit a GraphQL request with 11 queries, the server is within its rights to return the results of the first 10 queries and an error for the 11th. This would allow the client to retry only the 11th query rather than all 11 (which, if throttling were enforced at the request level, would always result in a 429 response).
Interesting, I see how that would not be as easily replicated with traditional REST. You would either need a bunch of requests or your client would need to be intimately familiar with the API. This leads me to think GQL has some benefits when used for semi-public/cross-team consumption (as opposed to a tight relationship between server and client, or more monolithic apps), as there is more "opinion" in place by design to allow clients less familiar with the intricacies of the API to still use it efficiently. This is also furthered by another commenter pointing out the cross-protocol capability of GQL, as different teams or companies integrating your API may have different needs in that way. Thanks for the info!
GraphQL itself doesn't care whether it arrives over GET or POST. That's an implementation decision. The main problem is the limit on GET query length in some browser and server combinations. There are ways around that though.
The "Graph" part of GraphQL is entirely optional, and IMO not worth the trouble. The best way to use GraphQL is to use it more like an RPC framework, as a replacement for calling generic HTTP endpoints that return JSON blobs. The front end libraries that support it are great and well maintained and you get the additional structure of the GraphQL type system.
It's stretching it to say that GraphQL has a 'specification'. It has a grammar, but the algebra behind it is sketchy. (Just like that JSON spec that doesn't say what the semantics of numbers are.) That is some of why it is popular: a lot of people seem to 'fade out' when they are forced to think rigorously.
(E.g. SPARQL really has an algebra which is well-defined and it gets talked about 1% as much on HN. Compare a good language spec like Common Lisp or Java to an undefined behavior festival like the early C ‘spec’.)
There's no "algebra" behind GraphQL. It's just a type system specification, with no behaviors.
The equivalent would be the TypeScript type system.
I don't bother debating GraphQL with folks anymore because I've learned the hard way there's a lot of misunderstanding.
When people say "GraphQL" they usually mean some particular implementation of a GraphQL API they had a positive or negative experience with, not the idea as a whole.
The specification as a concept is a bit hard to critique because it defines no implementation behavior. It's sort of like saying "REST APIs are slow" or "REST APIs that fetch nested relations produce bad SQL".
GraphQL helped me to understand a very important fact.
Even smart people don't understand stuff that is outside of their niche. And I mean kernel-dev levels of smart. It's not that the tech they're critiquing is really bad; it's just no longer possible for them to think outside their box.
> It's not that the tech they're critiquing is really bad
I think marketing is somewhat to blame here. I work in the GraphQL space and there's not a lot of incentive to do neutral education.
The goal is to get users to associate "GraphQL" with particular products/implementations.
Half of my dayjob consists of explaining to others why $OTHERCOMPANY is not a competitor, because despite us both being GraphQL tools we do orthogonal things. Marketing doesn't help this.
On top of this, GraphQL is (in my opinion) pretty complicated. It's not explainable in a sentence the way RESTful URLs are; it takes a bit longer to grasp.
Despite all of this, GraphQL is the best thing since sliced bread as far as I am concerned and I will continue building services in it until something better comes along.
The same problems happened in the Object Management Group, which standardizes a number of technologies such as UML and CORBA, and applications of those technologies. (The map-vs-territory problem with UML has been addressed by various forms of "Executable UML" such as the Object Constraint Language, which is itself an algebra over UML-modeled objects.)
Look at the specs though and you find they are deliberately designed to be hard to implement.
For instance there is the Meta-Object Facility (MOF), which is great for modelling a set of objects in a language like Java. The point of MOF seems to be that you could bootstrap the whole UML edifice from a very simple foundation, and at the very least have a machine that can build a set of objects to represent both a collection of UML objects that function as a "schema" and a collection of instance objects that are modeled by the schema.
(This is a lot like the vision of the semantic web but going about it a very different way; in fact I am about to open source something that converts MOF models into RDF inside Python and also builds Python stub functions that let you 'call methods' on an RDF node while having access to the objects via SPARQL queries and other RDF tools.)
If you actually try it, however, you find there are some inconsistencies, unresolved circularities, conflation of UML 1 and UML 2 concepts, and other problems you run into.
I'm certain that if I was "on fire" I could bend MOF enough to bootstrap UML-in-RDF in a month or two of working overtime. I've done that kind of thing before and that's just the start of your problem because then you have to market it...
The result is that some incumbents have a "moat" but also that UML is out of the mainstream and addresses a much smaller market than it could.
I'd say GraphQL and Schema.org were both "asymmetric technologies" in the sense of asymmetric warfare but maybe the other way around.
Both of them tackled some of the problem space the "semantic web" tackled (e.g. the linked data idea of do an http request and receive a graph) but in a way that privileged the large organizations that pushed them.
With no semantics, Facebook can return whatever they want from a GraphQL query, whatever is in their commercial interest. (I'd add that they have ethical constraints on top of that involving privacy, spam control, etc.) They have no real concern that anybody else can publish GraphQL, and they are big enough that they can go it alone and be a de facto standard for interacting with Facebook no matter how GraphQL fares in the real world.
When schema.org came out, my wheelhouse was information extraction from Wikipedia, Freebase, things like that, and I was like... there is no 'reification of subjects' in schema.org; it's not really that much better, from my perspective, than extracting facts from text. In fact, if anything, it is more of a way for Google to get a training set for a real text extractor than a way to publish facts Google can use directly.
So I was bearish on it initially but the standard really improved and technology got better both in terms of natural language processing and my understanding of matching engines that can 'reify subjects' by matching graph patterns. I don't see schema.org as difficult to consume now.
To be fair, a kernel dev is so far down their rabbit hole that web services and web tech aren't something they're particularly interested in. They also know intricate details of what my kernel does that I have no hope of ever understanding, or desire to. They excel at what they're passionate about.
It’s a grammar plus some hand waving around types. Not a behavioral spec. It’s much less complete than SQL or SPARQL (e.g. SPARQL is like SQL in that it is based on relational operators but it has a good spec that describes what the operator algebra is exactly)
When post-structuralism burned out I think people would have gotten it that language doesn’t give any insight into behavior, at best people leave words behind like the evidence at the scene of a crime. Grammars are profoundly empty and meaningless.
Grammar + undefined behavior. It’s like saying ANSI C is a specification. It looks like a specification and quacks like a specification but compare it to a quality specification and you are looking into that void Badiou warned you about.
This is what a specification for a query language should look like
It is a little terse and not the easiest read but everything you need to know to write SPARQL or implement a SPARQL engine is in there. It's short.
(Now I would say that SPARQL needs to extend the algebra to deal with ordered collections, but that's what is nice about SPARQL being so well specified. If somebody wants to add a feature to SPARQL it is completely straightforward to amend the algebra AND the grammar, often you don't have to mess with the grammar and the use of namespaces means anybody can add anything.)
The GraphQL 'spec' on the other hand is like the singularity of a black hole... It's a place where computer science breaks down.
It's more than that though. Have you read it? It defines execution behavior (execution and validation algorithms, response shape, etc) and not only the grammar. We can argue about the quality of it all day but it is not just a grammar.
Think of what the Common LISP, Java, or Python specs would be if you deleted all of the specification of the semantics. The horror of it is that people would make languages that look like Common LISP, Java, or Python but they wouldn't interoperate and when you zoomed in on the details they'd all behave in nonsensical ways because, with no guidance to correct semantics, people will make up wrong things.
We live in an age when we are informed by good specifications. ALGOL and PL/I had hopelessly flawed specifications that weren't really implementable... There were lawsuits over COBOL specifications... But Common Lisp and Java were two early languages developed by adults.
In 2022 we should be at least up to a 1984 standard for writing standards.
You voice an opinion in 140 (or so) characters, then there comes someone who's literally written a book on the subject, blames you for the lack of nuance in said 140 characters, invents a few strawmen themselves, then blasts you with a 1000+ -word article.
> That’s not typically the kind of queries a GraphQL execution results in. If anything, naively implemented, GraphQL results in a ton of small queries for every resolver.
This seems like mincing words to me. Any SQL programmer worth their salt knows that if you compose "a ton of small queries" into a single query and let the DB engine do its thing, the execution time will be lower.
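The contrast in question, sketched in TypeScript (the db.query helper, the posts data, and the table names are all hypothetical):

    // Hypothetical driver and data, to keep the sketch self-contained.
    declare const db: { query(sql: string, params?: unknown[]): Promise<unknown[]> };
    declare const posts: { authorId: string; author?: unknown }[];

    // A ton of small queries: one round trip per post.
    for (const post of posts) {
      post.author = await db.query(
        "SELECT * FROM users WHERE id = $1", [post.authorId]);
    }

    // Composed into one query: the engine does a single pass.
    const rows = await db.query(
      "SELECT p.*, u.name AS author_name FROM posts p JOIN users u ON u.id = p.author_id");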
We introduced a GraphQL API 4 years ago and deprecated it 1 year ago.
At the end of the day, GraphQL didn't add much for our operations, and we found ourselves having to maintain both the REST JSON API and the GraphQL API.
We are a small team; we think types/classes are overengineering in our case, and GraphQL felt like types/classes/overengineering.
I wish GraphQL had used the web, or helped standardize a new standard for the web, instead of inventing a new query language, and that types were optional.
I think we need something standard, like a JQL (JavaScript Query Language), or just better conventions for using URLs and JSON for queries.
People seem to be complaining about the server component of GraphQL being too hard to maintain, but I'm curious what language and tools they are using. I find that a code-first approach with a language such as Java can be pretty much as maintainable as something like a REST API, and for me it naturally fits into a GraphQL API as well. Plus, optional server-side calculations (e.g. asking for Fahrenheit instead of Celsius) just don't get executed if they don't need to be. I find the tooling around all of this quite natural (at least for a code-first approach).
Some people want GraphQL to provide more advanced features, such as filtering, or some part of the query involving business logic, when you can just create a specialized query for that. You would likely create a specialized query if you were using a REST API, unless of course you decide to needlessly complicate your REST API.
Consuming GraphQL APIs is also very pleasant. In a React web project I was working on I was able to set up some code generation from a GraphQL schema so I could get type checking in my TypeScript code. I'm sure this is all possible using some sort of schema for your REST API, but it's going to be more difficult. Of course, if you are trying to handle everything in the frontend and just want a blob of JSON data, then GraphQL may not be for you.
GraphQL is trying to put a Menu Query Language on top of Structured Query Language and get the same full-featured API. There are some use cases, but I think the cases where a GraphQL API is the ONLY API are few and far between. The best solutions, to me, are when some endpoints of the API are done through GraphQL while others are still RESTful in design. The GraphQL spec will always be too restrictive to accomplish what developers want to do with it, by its nature.

Graph DBs like Dgraph/Neo4j and other DB layers/services like EdgeDB/Hasura are making GraphQL popular; these usually work beautifully for the ToDo apps but fall apart when you actually start to use them for production systems with more real-world cases. And then we have newer layers coming along, like Outcast, which is trying to be a database with ONLY a GraphQL "query language" to do "everything", and you start to quickly see where the limitations are. The "QL" part of GraphQL is very deceiving to the novice developers who seem to be the main players in the GraphQL world.
If anybody wants to discuss these points and join a discord geared towards full stack development using a varied number of solutions, then I invite you to: GraphDev Discord https://discord.gg/KRPXpfnbUC
Watch the Honeypot documentary about GraphQL. Listen to what the developers say about the problems they had to solve when they invented GraphQL.
You likely do not have any of these problems.
> On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.
I know, right? It's best to just push all that to the front end, where other people have to do it. Where a 2-year junior dev has to create a saga framework to speak to the 52 web services on the backend because they need a first name alongside an account number.
Then we can complain that the front end is a crazy mess and they don't know what they are doing. If I don't understand the problem, it's not my problem! Plus, Uncle Bob and Martin Fowler haven't written a book about it, so it must not work.
Built a GraphQL backend for our Ruby on Rails application and our front-end developers love it. It works shockingly well when using it with ActiveRecord and your data model matches up easily with what the front-end wants. Added in the goldiloader gem, which handles 90% of our N+1 problems, and it's been pretty peachy. We're in the process of introducing mutations into our app, and that has been a pretty smooth experience as well.
It has some warts, but I'll take those over the absolute monstrosities that the REST-based APIs are.
GraphQL does solve its problems. It's often very fun to use, and can save a lot of effort if adopted correctly.
However, no matter how good or practical it is, you should mind that it's not "technology" after all. It's just an industrial solution that adopts a (relatively) new development pattern. It doesn't solve any complexities inherent to web technologies, but merely re-adjusts developers' responsibilities (= shifting burdens to consumers). This is a human problem (though still an engineering problem), and NOT a technical problem.
Thus one must approach GraphQL accordingly: does our "organization" need to re-adjust developers' responsibilities? Can we benefit from changing our development pattern/workflow?
This is not an easy question to answer, but there can be some noticeably easy cases. Say, if your FE is relatively huge/fast so that BE people simply can't catch up with it, GraphQL will be a fantastic solution. If there are too many brain-dead CRUD APIs that keep sucking up BE engineering resources (= low ROI) and causing stupid down times and development delays, GraphQL can save you.
So, when you review GraphQL, it should be reviewed at the organizational level. You should find the issues in the workflow, not in the current implementation of your service. You should struggle to improve the speed of development and to save engineering cost. In the end, you should NOT think of this as a "better technology".
In my first interaction with GraphQL, the developer who introduced it tried to make it TOO smart for its own good. It was building complex SQL dynamically, which meant the SQL it was running was borderline non-deterministic.
(I.e., it identified the foreign keys and primary keys, linked them together, and made all other fields optional.)
Things I don't like about GraphQL (in Java, which was my experience):
1. Debugging it is annoying. There's no clear, concise way to follow the code; there seems to be some magic where it isn't clear when certain parts of the code get invoked.
Things I like about it.
1. Lets you make multiple queries and reduce/extend the size of your payload as needed.
In my view GraphQL is much better when you're not constrained by a SQL-like backend. It's great at filtering the payloads, which is great for mobile and such. It also allows you to make multiple calls in one go, which also means you can shoot yourself in the foot if you overdo it.
I will call out that some of this is trauma from my last experience. Having a more dynamic language that isn't Java may make the experience better, but in general, every time we had to update the GraphQL code it was cringy.
Eventually we started to gut the dynamic SQL, replacing it with a simple query, and then used GraphQL to trim the response, which worked out much better.
Generally the question to ask is how many iterations of an endpoint do you need and is it worth introducing a new technology vs just having a few query parameters to do some filtering.
That being said, I'm now looking at some Query language to work with Neo4J so I'm back at looking at dynamic APIs. (:
> Lets you make multiple queries and reduce/extend the size of your payload as needed.
While this is cool, in theory, I haven't found it to be in practice. If it's an internal API, you can just provide a way for the client to get exactly what they need in a single query. If it's an external API, you have to deal with putting limits in place to keep users from burdening the system with complicated requests. Limits can become complex very quickly.
I am also considering working with Neo4j for a project.
I am wondering, why not build on top of their HTTP API? You can send multiple Cypher statements over HTTP, and correct me if I am totally wrong, but by "stealing" the concept of GraphQL's persisted queries, you could make use of HTTP caching too.
Just to make the conversation easier, the use case I have is a network topology, so you have things like routers, switches, ports, etc. If you take it all the way up to Layer 7 (Application) you can have, say, web services and so on. So you could in theory ask: link X was cut; what is affected?
Anyways... my basic POC was exposing endpoints so I can do things like get a list of all devices, but I'm basically just writing custom Cypher code to do that query, and the benefits of Neo4j basically go out the door. It still has some interesting graph features, but if all I'm doing is writing a custom endpoint for every use case it's mainly pointless.
You can do a simple POST endpoint which takes a Neo4j query and executes it, with some caching on top of it, for sure.
Either way, in order to make Neo4j worth it I need a way to make the queries more dynamic. So right now I'm thinking over a few options.
One is just having a dumb POST /custom/query that maybe only supports read operations. You can add a layer of auth, but I'm not a big fan of having some endpoint that's basically a pipe to Neo4j. It feels just as bad as saying: type any SQL here and we'll execute it on the server. If people know what they're doing that's fine... but at that point just set up phpMyAdmin/pgAdmin. You're trusting folks to know what they're doing, and if folks accidentally drop Bobby Tables (https://xkcd.com/327/) then it's an accepted risk.
If you just have a proxy to run any Cypher query, you might as well just give users a Neo4j web instance and let them play there.
Anyways, still in the early stages of trying to figure out how to best leverage Neo4j.
You could give a POST /custom/query with only read capabilities.
Also, you can provide some custom "helper" operations alongside it.
And certainly you could give them some option to prettify the response.
Because querying the correct things is one thing; structuring the response according to your needs is another thing entirely.
Yeah, this was years ago, but we had a concept of 'hydrated' objects, so you'd pass a flag to get back either a shallow object or the hydrated version that had all the relationships loaded as well.
Some flags for helpers that fetch additional data would be good.
What I really like about GraphQL is that it completely gets rid of the silly verb questions such as:
- POST or PUT or PATCH?
- Why can't my GET or DELETE requests have a JSON body?
It is just a better interface, even if it were to be used exactly as REST without any nesting. I used the Apollo server in Node.js and it was so natural to make parallel queries and let the query planner take care of the parallelism for me...
If you don't like the verbs you can always just use POST for everything...
My experience with GraphQL was a pain. We could have reduced queries if we changed our front end dev patterns, but we didn't and had the same number of queries as a REST pattern would have provided. Our GraphQL backend was a cobbled together mess of microservices that struggled with performance and stability. Queries could fail and we often wouldn't know why and bugs persisted for months despite work on them. We couldn't do simple things like cache queries to be reused between application launches without backflips and juggling expertise, Apollo client and all... And on and on and on.
Granted, not the best engineering shop. But in poor shops I find REST is simpler and thus less broken in practice.
Read the wall of comments... I think GraphQL is great for one primary reason: schema.graphql files. Having strongly typed payload definitions that integrate with languages like TypeScript is beneficial to frontend developers in the same way that protobuf schemas are beneficial to backend developers.

I know REST APIs can accomplish the same degree of type safety using OpenAPI/Swagger schemas. Honestly, REST APIs that have these schemas start to look a lot like GraphQL APIs. To that end, using GraphQL really becomes a question of preference in using GraphQL's schema definition language versus JSON as a schema definition language. In actuality, GraphQL schemas can be defined using JSON!

And so this brings me to my conclusion about GraphQL. GraphQL is just a set of tools and conventions for implementing strongly typed REST APIs with JSON payloads. Who doesn't want that?
well, it is a language for describing interfaces. And a decent one.
What someone uses it for, and how, is entirely different matter.
For a well-thought-out interface, you will need some language-design thinking. (Of course one can express a well-thought-out interface in anything, be it REST, SOAP, CORBA..., but in GraphQL it is easier and more consistent. And yes, GraphQL is self-documenting... no swaggering around.)
For example, I have made a more-or-less generic django-orm wrapper, with all the bells and whistles I needed: queries, paging, and whatnot (the idea came from graph.cool, reshaped and taken further as "languageness"). And yes, a hand-made, fairly simple client side as well, none of the usual bloat.
Yeah, it is a big investment upfront. But after that, the work per object type is near zero. Before this, the same interface-as-language was made as REST, and it was cumbersome and fragile (there are no types or syntax there, only assumptions).
I don't like GraphQL because most of the time I don't know which fields I need until I've been using it for a long time. I don't know if it's just the implementations I've worked with, but none have had a "just give me everything" option.
I really just want rest plus a standardized way to query.
They have UI helper tools with auto-completion; it's not a complete answer, but it should make life easier. The lack of select(*) support seems to be an intentional choice, unfortunately.
I have only used GraphQL when trying to find out whether a particular GitHub user was my sponsor, and it fails at that: you need to list all sponsors and iterate through them manually.
My solution to databases is JSON over HTTP, on disk, one file per "value" or "row"; this means you need EXT4 with type=small to have enough inodes.
Today I tried to solve the same problem as the GitHub one on my own DB: you would have a meta index for sponsorship, user-user, and then just query:
To see if marc sponsors marcus: this took 2 minutes (refreshing my memory; otherwise 2 seconds), it scales, and I did not have to alter anything. It just works.
Why? It describes what the question is... meta (the type of storage); user/user is self-explanatory; and marc/marcus is also simple... If you consider what this replaces, it will feel less nauseating, maybe?
Like many things, it might be simple and intuitive _once you already know how it works_. Looking at it for the first time, my reaction was similar to that of the GP. My $0.02: if the question is "using meta storage, does user marcus sponsor user marc?" I would propose a URL like meta/users/{username}/sponsors/{username}, or meta/users/marc/sponsors/marcus. To me, at least, that URL states a few things:
* meta/users/marc <- the user we're querying, and where we can get his info
* meta/users/marc/sponsors <- every user who is a sponsor of marc
* meta/users/marc/sponsors/marcus <- is marcus one of marc's sponsors?
Of course it wouldn't be a forum oriented towards developers if 10 users didn't hold a dozen conflicting opinions.
The user/user is a "node type" to "node type" thing: it can be user/item f.ex.
The marc/marcus is related to those so for user/item it could be marc/sword f.ex.
The reason it's that way is that that's how it needs to be stored on disk.
As for the detail data (the metadata the relation has, subordinate to the relationship of the nodes user/user): you can put anything that marc/marcus have in common in that JSON file. (I just added that we used to be colleagues.)
Even if it feels awkward when you are used to wasteful structures, once you know this is the least complex solution you will be thankful, especially in the long run, since you won't have to learn anything else ever again.
In this case, "sponsors" is an extra tricky word, because in English you can say "X sponsors Y" or "one of Y's sponsors is X", and you can't tell from just the path which one was meant.
Good callout! Using verbiage like "sponsoredby" or "sponsorof" in the url could help clarify the relationship and make it read more literally, i.e. to answer "does marcus sponsor marc" you could use meta/users/marc/sponsoredby/marcus or meta/users/marcus/sponsorof/marc.
Yeah, I used "fund": 1 in the user/user/marc/marcus file... then you just have to decide the direction, and everywhere except Arabic it's left to right... so marc funds marcus.
And user/user/marcus/marc has it if marcus funds marc.
GraphQL isn't a trap if you just want basic crud functionality, and you use off the shelf tools like Postgraphile/Hasura/Apollo to implement it quickly. Trying to go all in on it is probably a mistake in most cases though.
I've never used GraphQL extensively in my projects, but my main concern would be rate limiting and malicious queries. It must be a lot harder than with REST, but I'm guessing the industry found a way to work around that.
I feel like, as with a lot of tools, graphql is really useful if you’re at the scale of having 1000s of different object types. If you’re only dealing with a handful of resources, a simple rest api is way less headache.
I've worked on four GraphQL projects. Three were backed by a relational database, and were not really that enjoyable. One was backed by a graph database (Dgraph), and was a delight: No data loaders, no N+1 problems, and in some cases no resolvers (Dgraph's query language was a superset of GraphQL, so sometimes we could just feed it the client request directly)!
I've had my fill of GraphQL, and won't be sad if I never use it again.
Shopify's API shows what the minimum ante is for a public-facing GraphQL API:
* Exotic rate limiting based on the quantity of data returned, not on the number of calls made over time. As a client, it is nearly impossible to throttle your code in any way other than "try and maybe it fails sometimes".
* Tortured graph structure with edges/node/cursor levels in the tree. Navigation is a pain in the ass. See code sample below.
Furthermore, you must provide root query navigation for every object; Shopify doesn't do this. So you end up with nearly impossible places to query, because you need to cursor into a parent to find a particular node, then cursor into its children to get to the right node. There's no limit to the crazy. Building complex shipping profiles with Shopify's API is a nightmare.
Here's a real-world query I pulled out of my code:
query ($id: ID!, $zoneCursor: String) {
deliveryProfile(id: $id) {
id
profileLocationGroups {
locationGroupZones(first: 1, after: $zoneCursor) {
pageInfo {
hasNextPage
}
edges {
cursor
node {
zone {
id
name
countries {
code {
countryCode
}
}
}
methodDefinitions(first: 150) {
pageInfo {
hasNextPage
}
edges {
node {
id
name
active
methodConditions {
operator
field
conditionCriteria {
... on Weight {
unit
value
}
... on MoneyV2 {
amount
currencyCode
}
}
}
rateProvider {
... on DeliveryRateDefinition {
price {
amount
currencyCode
}
}
}
}
}
}
}
}
}
}
}
}
I would much, much rather do this with a series of REST calls. Stripe's API is much more pleasant.
I never used GraphQL to access a database or do anything with a database; I honestly don't see the value there. I only used it to replace REST APIs. It's a great way to define a typed API and let tools generate code, so you basically use it like RPC and don't worry about validation, route handling, etc. Calling the API becomes as simple as calling a function.
GraphQL is really useful paired with a tool like Genql [0] that creates a JS library with auto-completion and type safety; this makes discovering and using the API much easier and faster.
1) Front-end devs can often write novel queries without involving back-end devs, if querying existing data types.
2) Clients can request only the fields they need, rather than requesting giant payloads only to use a couple of fields.
Since GraphQL is a bit more standardized than REST, client libraries can offer other features like optimistic updates, intelligent caching, and offline reads.
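Point (2) in miniature, with an illustrative schema; the client names exactly the fields it wants:

    const QUERY = `
      query {
        user(id: "42") {
          name
          avatarUrl
        }
      }
    `;
    // Response: { "data": { "user": { "name": "...", "avatarUrl": "..." } } }
    // and nothing else, however wide the User type is on the server.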
GraphQL reminds me of XML or SOAP: too much overhead, too complicated, and user-unfriendly for most projects, especially small ones. JSON/YAML beat XML; what will beat GraphQL? We need a modern, simple, easy-to-use, extendable REST/RPC interface.
It just needs better middleware to handle all the BS. The problem it solves makes it extremely useful (if implemented correctly... that's a big if, I know).
My gut says it's likely the leakiest of all possible abstractions. I'm not sure in reality, but it makes me feel that the "logic" is in the wrong place. Queries should ideally be designed; there's a purpose to that API...
It's an important point that GraphQL makes things easier for the user, not the server. You still have to do the work, which is sometimes harder because you need to take care of edge cases and performance potholes (in case you miss something that lets your code run away with a heavy query).
The only difference between REST and GraphQL from the server side is the interface. The underlying work is the same.
"But we can connect multiple datasources into one query!" - REST does not stop you, it's done all the time. What is the challenge, exactly?
"But we can specify only the fields that we want" - yeah, it's called URL arguments. Many services already do that with their REST API's, it's just less enjoyable to use.
Stop treating GraphQL as some revolutionary tech. It does not make something possible that was not possible before. You could still create web pages that looked great before CSS was a thing.
It's not that GraphQL enables anything that was not possible before; it's that GraphQL provides a bit more structure and standardization around these things. If you plan to do this stuff, why not follow something with a spec and various bits of tooling rather than doing it ad-hoc?
In every company I've worked at, there has been a group pulling for GraphQL adoption. I've gotten involved and made it to having some well fleshed out graphs. Still, I find the ergonomics lacking. Making POST requests with large bodies of JSON-ish queries that you have to meticulously craft from some hard-to-understand docs is way harder than an API call should be.
And this is JUST for a query language. It's not like the server code comes for free. There are still all the same complexities of joining this with that, filtering on foo, limiting to X results, projecting the output fields. It's super complex on the backend, yet people think GraphQL can give them exactly what they need for free.
If there is some amazing benefit to FE developers it's lost on me. When I go to use an API and find that I have to use graphql I just groan and move on.
Yes, it is a trap. I completely skipped it in my backends. Customers want particular data presented in a particular way from our backends, not some generic database to play with while wasting time and money.
Here's a list of every advantage I've heard about GraphQL. All of these advantages were used as evidence to push a transition internally at my organization. Many regret it at this point, not to spoil the conclusion, but:
(1) It's schema-driven. Schemas for your APIs are very good.
Sure are. And this argument is always listed with no hint of irony, or even recognition that there are dozens of schema-driven API systems out there. GraphQL is one. gRPC, protobuf, JSON-RPC, OpenAPI; the list goes on.
And so, realistically, where you hear this is from people who worked at an organization that wasn't codifying its APIs with a schema, and now they want one. GraphQL has schemas, so it's an advantage of GraphQL; but this advantage needs to be rephrased to really mean: schemas are good, and there are many ways we can get a schema.
(2) (Extension of the last) We can use the schemas to auto-gen client libraries and server adapters; look at this cool graphql-codegen tool.
Same response. I can't even stress how many times I've had this conversation. It's like we're relearning the same thing every three years as people enter and leave the industry, without any recognition of existing, far more mature, stable, and excellent tooling.
(3) There's so much query boilerplate around every REST API we have. Pagination, filtering, selection, querying, reduction, analytics, tokenization; GraphQL solves that for us.
It doesn't. It solves Field Selection. That's it. Your new GraphQL API will have 90% of all the same boilerplate.
(4) It makes frontend life easier, because we don't have to rely on the backend to add new fields or relationships.
Yes and no. Our experience has been this: data is more immediately available to frontend teams, yes. If they need a relationship between Users and Comments, that probably already exists, and they can get it in one call; it's not a bad setup from that perspective.
But really drive into the advantage here: it's not that the relationship exists in GraphQL and wouldn't in REST. Of course it would exist. It's that it exists in one call. That's literally the only advantage GraphQL gives; not that it exposes more data, but that it grants the API the ability to construct dynamic views on the data.
So, drive further. Views. Why would we use a view in, say, Postgres? Usually performance. The issue, and the reason this is only a partially strong argument for GraphQL, is that the dynamic views GraphQL lets the frontend generate are non-optimized the majority of the time. After all, it's combinatorial; the API team can't predict every way the data will be accessed.
Well, the funny thing is: they can't predict it, but they can still optimize these views, by being a dependency of frontend team work. Congratulations; we're back to square one; and this is LITERALLY how I've seen 80% of frontend projects play out on GQL. Either some data wasn't on the graph and we need it; or a necessary mutation wasn't there; or the data is there, and it worked locally, but in production for customer X it's too slow, so we need backend optimizations.
(5) (Extension) These dynamic views save on internet round-trips.
This is a complex one. If you want the spoilers: distributed systems are insanely complex, and I'm hesitant to say GraphQL is "better"; it's just different, maybe better in some situations, maybe worse in others.
First: there are non-zero advantages to having the front-end make two-plus API calls (especially if they can be done in parallel, which isn't always possible, I fully recognize). The main one is horizontal scaling; the difference between a GraphQL request 3 layers deep and 3 API calls is that those 3 API calls can be handled by 3 different service replicas.
We have, on the API team, on too many occasions to count, done an audit of the GraphQL requests hitting our servers and found absolute monstrosities which destroy P95 response times. We may, then, implement some kind of limiter; you can only go X layers deep, or select Y number of fields, or whatever. These are, to my eyes, weird! It's like saying "yeah, GraphQL is open-ended, request whatever you want, combine two calls into one; uh oh, you requested too much, you must have missed that sidenote in our documentation, now you have to make two serial calls, and we're back to REST". You can blame this on a poorly designed API; but that's kind of my point; GraphQL encourages poorly designed APIs.
To be clear, so does REST. Again, my point: GraphQL isn't better.
I say "too many occasions to count" because determining what a "good" GraphQL request "limiter" is, is an intractable problem. Should we allow 3 layers deep? 4? 20 fields per object? Doesn't it differ depending on the object; some are more expensive than others? How do we encode that "expensiveness"? There are tools which help with this; some of these ask you to assign a "expensiveness" score to every query, mutation, object, etc, and then it allows each incoming query an aggregate "maximum expensiveness" before rejecting the request. Literally. Try communicating that to your users! How?!
Moreover, if it wasn't already complicated; its so, so difficult to know where the limiter should be placed, for a given API, before the API is already in use and "broken" by someone. Think about that for a second; we have to circle back and say "yeah, we know that request was ok, if not slow, yesterday; but today we have to break our API for you." This is GraphQL. Its a constant battle due to the combinatorial expansion of ways users can access your API.
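For the curious, the aggregate-expensiveness scheme described above boils down to something like this sketch:

    // Per-field costs and the budget are made-up numbers; real tools walk
    // the parsed query AST rather than a flat list of field names.
    const fieldCost: Record<string, number> = { user: 1, comments: 5, search: 25 };
    const MAX_COST = 100;

    function estimateCost(requestedFields: string[]): number {
      return requestedFields.reduce((sum, f) => sum + (fieldCost[f] ?? 1), 0);
    }

    if (estimateCost(["user", "comments", "search"]) > MAX_COST) {
      // Now explain to an API consumer why yesterday's query is rejected today.
      throw new Error("Query exceeds cost budget");
    }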
The second point I want to drive into here is caching. It's very clear that one internet round trip is better than two, all else being equal. But GraphQL doesn't make all else equal. One of the fantastic things about more standard HTTP APIs is their high cacheability. GraphQL has a strong client caching story (though every six months our frontend teams go heads-down upgrading to the latest major version of whatever caching library we use; it's a nightmare for them). But what about edge caching?
GraphQL is DESIGNED to not be cacheable. Edge caches can't parse the internal structure of the request bodies (and even if they could, it's unclear they'd want to; it's such a complex language). Identical requests can be structured entirely differently: did they order fields differently, name the operation differently, put parameters in different places, etc.? HTTP APIs can experience some of this (example: query parameter positioning), but most edge caches can handle that; edge caches can't handle GraphQL.
So the question really isn't: is one network hop better than two? It's: is two network hops, to an edge cache ten miles away with an 80% hit rate, better than one to my data center in Ashburn, Virginia? NOW it's less clear that one is better than the other. There are too many variables to say for sure.
Here's some other random negatives of GraphQL I'll throw on:
* N+1 queries by default. Someone will link some other library you have to add on which "fixes" this (where fixing it really means adding more work for the backend teams on every new API, every library update, every library CVE; at some point we'll figure out that more code rarely solves problems on any time horizon except "right now").
* There are practically speaking no mature GraphQL servers outside of javascript. I legitimately don't know any major organizations who write the majority of their backend code in javascript. I know tons of minor organizations who do, because their hiring problems outweigh their technical ones. The major orgs I know who deploy GraphQL APIs do a gateway pattern; a thin shim of JS in front of their other networked traditional APIs. Ok, fine; not exactly transformative. Schema stitching is a joke; a solution in search of a problem extremely few have.
* That being the case: in Apollo Server + TypeScript, resolvers are extremely difficult to get strong typing on, and it's extraordinarily difficult to get contextual information about other layers of the request within a resolver (see the sketch below). "What object did my parent's parent's resolver resolve to?" sometimes needs to be answered. In REST-land, it would just be two or three requests; the answer is in the grandchild request's parameters, orchestrated by the frontend. This is easy. But the frontend wants to turn an easy problem into a problem that requires no thought, and in doing so forces the backend to turn an easy problem into a difficult one.
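Roughly the resolver shape being complained about:

    // AppContext and the loader are hypothetical; the 4-argument resolver
    // shape (parent, args, context, info) is the graphql-js convention.
    type AppContext = {
      loaders: { user: { load(id: string): Promise<unknown> } };
    };

    const resolvers = {
      Comment: {
        // `parent` is only the immediate parent's result; anything from
        // higher up the tree has to be smuggled in through `context`.
        author: (parent: { authorId: string }, _args: unknown, context: AppContext) =>
          context.loaders.user.load(parent.authorId),
      },
    };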
Look, this is a pattern you see all over GraphQL once you start using it: it's invented by frontend people to make their lives easier, at any expense, including raising the net complexity of the overall system. It's a layer on top of HTTP. So, to convince me to adopt it over standard HTTP, it needs to simply be better. And I'm not convinced of that. I was when we adopted it. But we've learned better since then.
The author never had to use nested queries/pagination that need to perform well, so he just states that the problem does not exist. Well, that's one way to address it...
It really sounds like NoSQL reincarnated. People jump on the bandwagon, then oops, this wasn't the ride we were looking for; just as it was with Git, except by then it was too late to get off a bandwagon already running at full speed.
I agree; CRUD on the backend becomes trivial and it scales pretty well.
I had a very pleasant experience bolting Hasura on top of an existing Postgres DB with a couple hundred tables and about 1 TB of data. The query performance was fine, and actually better in a few places.
Across the board, there was a huge boost in render times from avoiding the verbose round trips of HATEOAS-style discovery.
The only annoyance I had is Hasura not having a proper caching layer or replica support in the OSS version. Even on their cloud, caching is limited to 100 KB and does not support session variables; almost all queries in the app are user-specific, so without session variables it is not useful at all.
However, that can be handled by putting a proxy with caching capabilities in front of Postgres.
---
Having said that, I am not yet confident that it will scale well for public APIs with arbitrary query patterns unless more limits on nesting etc. can be enforced, but for most internal APIs it should be great.
Hasura also has support for REST endpoints in 2.x, so that is an option.
If selling/delivering an API is the primary value of the team/business, perhaps no generated backend will solve all the challenges out of the box.
GraphQL allows backend teams to retain control on how data is aggregated, and doesn't couple the database schema to the API's representation of the data. The frontend can get exactly what it needs when it needs it, without letting frontend engineers go totally wild and take down the DB (unless the backend engineers let them).
Why not expose a replicated database with read-only permissions? On the write side, are people really making use of the flexibility of GraphQL? Because I can't think of a great use case for that, versus just opening up your database.
Avoiding coupling to the database schema is reasonable, but IME, APIs get stuck in backwards-compatibility land and stop making sense anyway. Maybe this is better solved with good views on the read side, with the write side open for innovation at the database level?
Like people advocating SQLite in busy prod settings, this is squarely "a few rare people swear by this" territory. I do think it's underused, but it's a fair bit of a footgun (similar to running SQLite at its very limits, IMO).
My problem with GraphQL is that it makes part of the stack opaque for backend developers. You can't just inspect network requests in the browser; parsing GraphQL and frontend code to understand what's being requested, and how, is a huge pain for debugging.
I have some experience using Hasura.
I've had a fantastic experience using GraphQL on the front end, but have found it confusing and challenging to configure everything properly on the back end.
Hasura takes care of all the back-end setup; you simply use it to configure your own API. You can either self-host, or use them as a service (I believe they're built on top of AWS).
For more context, I used it to build a React Native app, and used Apollo codegen to generate TypeScript types for all my queries, mutations, and subscriptions.
The free version doesn't support response caching, only the enterprise one. If you reach any kind of scale, it's a one-way ticket to enterprise or a complete rewrite, at least last I checked.
I too wish Hasura supported query caching in OSS. Not sure you need a rewrite exactly, though. You could put something like GraphCDN in front of it, or write your own cache layer that works similarly to GraphCDN.
I think the poster might have meant to reply to the top comment:
> "On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc."
[0] https://trpc.io/