The problem that GraphQL was trying to solve is real: reasonably-sized REST projects usually end up inventing their own awkward, ad-hoc mini query languages on top of their endpoints.
But GraphQL as a solution is not great. It looks nice at first, but there is too much hassle to deal with.
P.S. I've seen quite a large fraction of GraphQL projects actually using Node.js as the backend. If that's the case, I would recommend tRPC[0] over GraphQL - it's more seamless and straightforward.
I'd argue GraphQL really doesn't even solve that "mini-query-language" problem all that well. But I'm with you; that's how it's sold. It's one of the big things its proponents say. And it fails at it.
Let me pick on an example of one of these rest-api-mini-query-language specs: the Microsoft Graph API, which uses OData. It supports:
* Count (don't return items; return a count of them)
* Expand (graph traversal on related resources)
* Filter
* Format
* Order (sorting)
* Search
* Select
* Skip
* SkipToken
* Top
* A bevy of others
Of these, GraphQL solves Select and Expand. That's it. Everything else INEVITABLY becomes a pseudo-OData mini query language on top of GraphQL; the exact same problem REST APIs had! Pagination. Skip/limits. Response reducing/counting/analytics. Filtering. Etc.
Of course, a framework has to stop somewhere, lest you become OData, which isn't all that great to use. So I'm not proposing that GraphQL should do more, but rather that its proponents need to stop listing this as an advantage of the framework, because it's actually a disadvantage. It's only "good" at this relative to literally `npm i express`, the most basic-possible REST API. The REST & RPC ecosystems have a wide array of higher-level tooling to select from, at every level from "nothing" to "everything and the kitchen sink"; GraphQL is startlingly boring in comparison, and proponents who list this as an advantage of GraphQL really aren't doing much more than admitting how little exposure they have to competing frameworks (or, similarly, how poorly APIs were built at whatever company they worked at last).
I made a query language that parses as legal GraphQL using a vanilla parser, but has directives like `@filter, @recurse, @optional` etc. that make it a real query language. Also, instead of returning a giant fully-nested result, it flattens results and emits them row-by-row like a SQL database. This means the query evaluation can be lazy and incremental — if you write a query that has a billion results but only load 20 of them, then only 20 rows' worth of work happens.
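A sketch of what a query might look like (the schema and field names here are hypothetical, not from the real system):

```typescript
// Hypothetical query against a code-analysis schema. It parses as ordinary
// GraphQL; @filter/@recurse/@output are the directives that turn it into a
// real query language. Results come back as flat rows, not a nested tree.
const query = `
{
  Repository {
    name @output

    dependency @recurse(depth: 5) {
      name @output(name: "dep_name")
      version @filter(op: "<", value: ["$minVersion"])
              @output(name: "dep_version")
    }
  }
}`;

const args = { minVersion: "2.0.0" };
// Each result is one flat row, e.g.:
// { name: "my-repo", dep_name: "leftpad", dep_version: "1.3.0" }
```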
My company has been using this in production for 6 years now across everything from TB-scale SQL clusters with X00,000 tables/views, to querying our own codebase, configuration, and deployment information to find and prevent bugs. I gave a 10min talk at a conference about this recently, if you'd like to learn more: https://www.hytradboi.com/2022/how-to-query-almost-everythin...
Software architecture really needs to acknowledge the human aspect of building platforms. GraphQL is a great solution to the organizational problem of slow communication and/or misaligned incentives between frontend and backend teams. GraphQL is essentially a self-service approach where API developers create a flexible, open-ended data access plane that end users can consume as they wish. That incurs a lot of extra technical complexity, but obviates a lot of organizational complexity. That can be a very worthwhile trade if your backend is public or otherwise serves a really diverse group of clients.
The most efficient and effective API integration projects I've done are those where the API and frontend teams are tightly knit, working off a shared backlog and able to pass a chain of requirements from design, to contract writing, to development on both sides, and get really tight alignment. That lets us create tightly optimized REST endpoints that are very cache-friendly and deliver precise payloads, optimizing both round trips and bandwidth. It's actually easier to build because requirements are really clear, but it comes at the cost of doing all that communication to align on requirements.
Then again, it only takes me 5 minutes to add a new endpoint that just dumps back a specific SQL query result.
I think a lot of companies make adding endpoints into a big deal, often a whole new file or class with documentation and tests.
An SQL query doesn't need tests.
I think another part of it is that SQL still scares people. That is, I'd put GraphQL in the same camp as Active Record, ORMs, and other silly attempts at creating a query language to avoid using a query language.
Have you ever done financial stuff on SQL? Transaction processing or even just CRM/billing?
Spend 15 minutes imagining a single query that fetches invoices, combines them with bank transaction data and produces a categorized list of unpaid-but-not-late, unpaid-and-late, not-fully-paid-but-not-late, not-fully-paid-and-late, fully-paid-in-time and paid-too-much invoices for this year’s billing cycle, selected from tables that contain info for all historical years too.
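Even a rough sketch of the shape such a query takes makes the point (tables, columns, and the payment-timing subtleties here are all invented/simplified):

```typescript
// Rough sketch only: assumes invoices(id, total, due_date, issued_at) and
// payments(invoice_id, amount), and ignores whether the final payment
// actually arrived before the due date. Real versions get much hairier.
const invoiceStatusSql = `
SELECT i.id,
       COALESCE(SUM(p.amount), 0) AS paid,
       CASE
         WHEN COALESCE(SUM(p.amount), 0) = 0 AND i.due_date >= CURRENT_DATE THEN 'unpaid-not-late'
         WHEN COALESCE(SUM(p.amount), 0) = 0 AND i.due_date <  CURRENT_DATE THEN 'unpaid-late'
         WHEN SUM(p.amount) < i.total        AND i.due_date >= CURRENT_DATE THEN 'partial-not-late'
         WHEN SUM(p.amount) < i.total        AND i.due_date <  CURRENT_DATE THEN 'partial-late'
         WHEN SUM(p.amount) = i.total                                       THEN 'paid-in-time'
         ELSE 'overpaid'
       END AS status
  FROM invoices i
  LEFT JOIN payments p ON p.invoice_id = i.id
 WHERE i.issued_at >= DATE_TRUNC('year', CURRENT_DATE)
 GROUP BY i.id, i.total, i.due_date;
`;
```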
I was referring to how hard some companies make it to ship basic code.
The GraphQL setups I've had to work with are all far more complicated than a function that returns an array; if your framework needs more than that to dump the result out on an HTTP request, you've got problems, but SQL isn't one of them.
GraphQL also has advantages that are otherwise difficult to realize, at least without an API schema. Request and response validation and object-level caching come to mind. How would you otherwise share cached objects between API endpoints? You'd need to set up a custom Redis integration. With GraphQL, such things often come in nicely wrapped packages.
I use https://www.jsonrpc.org/specification. I hate REST. With JSON-RPC, I can have true 1:1 mapping on both ends to how I write code, and it's transport agnostic. Doesn't rely on any language. I use PHP in the backend, TS in the frontend.
There are ways to make it somewhat type safe with tools like https://open-rpc.org/ but I tend to just go vanilla with it and write TypeScript types on the frontend for the results.
The problem with REST for me is that it locks you to HTTP verbs and URIs. And it's not a great mapping for much more than CRUD. I find it really restrictive and annoying. It's specific to HTTP as the transport, so I can't use REST over websockets or some raw TCP server-to-server pipe, etc, without making a kludge of it.
This is actually really cool! Any chance you can point me to an implementation for JSON-RPC? I'm interested to see the benefits with JSON-RPC over a traditional REST API.
The spec explains pretty much everything you need to know. Look for a client and server implementation in your language of choice; essentially all it needs to do is decode the JSON message and validate that it uses the right structure, and handle message IDs for batching, then call your custom message handling logic.
Over HTTP, basically you just have a single endpoint which always takes POST requests, and always responds with 200 status, even on errors. If you're using this in a frontend JS application, you'd want a wrapper over fetch() which rejects the promise if the RPC response has an error in the message.
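A minimal sketch of such a wrapper (the `/rpc` path and the method name in the usage note are placeholders for whatever your backend exposes):

```typescript
// Minimal JSON-RPC 2.0 client over fetch(). Errors come back in the body
// with HTTP 200, so the wrapper inspects the message and rejects itself.
let nextId = 0;

async function rpc<T>(method: string, params?: unknown): Promise<T> {
  const res = await fetch("/rpc", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: ++nextId, method, params }),
  });
  const msg = await res.json(); // always HTTP 200; errors live in the body
  if (msg.error) {
    throw new Error(`RPC ${method} failed: ${msg.error.code} ${msg.error.message}`);
  }
  return msg.result as T;
}

// Usage:
// const user = await rpc<{ id: number; name: string }>("user.get", { id: 42 });
```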
IMO, the main problem with GraphQL is that a query language isn’t much without a query planner.
Without a planner, you end up with the backend picking a set of queries that the caller can make (and with it, you probably want to restrict callers, anyways). I don’t see how that’s much different from writing N stored procedures and exposing those.
If you give those consistent names, the caller has a clear list of things that can be done. With GraphQL, the API looks more flexible than it is.
I think GraphQL's main innovation is that it allows the caller to specify which fields to return. That makes it easy to provide the 2^f - 1 possible variants of a query over f fields, cutting down on traffic significantly in many cases.
The major problem with GraphQL is that it tries to replace REST, instead of augmenting it. REST is a damned fine model, and arguably what every web API should use.
I guess it wouldn't have been as much fun to augment REST with a query language as to invent a new one.
> You can very easily have a graphql endpoint working in a REST API.
The fact that it is an additional endpoint with non-REST semantics is what makes it a replacement rather than an augmentation of REST. In a REST API, endpoints correlate with resources; the resources are queried and updated with HTTP verbs acting on those endpoints. GraphQL introduces a parallel world.
What should have been done instead, IMHO, was to design a standard (not necessarily limited to JSON!) way to query and update resources through their endpoints.
I don't understand how that makes it a replacement. Breaking conventions of REST for one endpoint doesn't necessarily mean you're replacing it, unless you're using REST as a dogma rather than a tool.
Having recently had to take over a GraphQL + Relay project in the last ~6 months, I agree with this. When things work (e.g., the frontend/backend magically staying in sync) it feels really great. But with the amount of tech you're throwing at the problem, I found it difficult to learn (the ecosystem is immature, the documentation often sucks, and your problem space grows with GraphQL + GraphQL client of choice + web framework of choice). I found the bugs and opaque performance issues harder to track down. I'm sure that with time and mastery these problems would resolve themselves, but I'd much rather spend my "cool stuff budget" on tech that (IMO) provides a significantly greater benefit for the inconveniences caused.
In my relatively naive opinion, graphql exists to solve a few things (which are mostly benefits for large scale companies):
1. Reduce the total number of HTTP connections required to get the data for a specific component or page. IMO this is presumably less important with HTTP/3, but the driving force is probably similar for both.
2. Give backend-for-frontend style services more autonomy to provide an interface that works for a specific feature on the front end without coupling it to specifics of how the backend organizes resources
3. Allow performance issues with frontend query logic to be addressed without changing anything on the front end
I think the situations where it would be useful are large teams that want to operate more independently where the overhead of dealing with graphql is worth it.
An example being someone trying to render a component for a list of comments. It would be great if the front end could just write some "magic" query that gives them everything they want and they don't have to worry about batching requests to specific endpoints and the order in which they should make those requests. That doesn't mean the problem magically goes away, but it's now someone else's problem and that's good: the app development can free itself from things it shouldn't care about.
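Something like this hypothetical query (all names invented):

```typescript
// One round trip for a comment-list component, instead of a request chain
// of comments -> authors -> reaction counts. Schema is hypothetical.
const commentsQuery = `
query CommentList($postId: ID!) {
  post(id: $postId) {
    comments(first: 20) {
      body
      createdAt
      author { name avatarUrl }
      reactions { totalCount }
    }
  }
}`;
```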
I think you've got number 3 backwards. GraphQL is a generic query solution that is harder to optimize than an endpoint that was built as a one-off for a front-end use case. Endpoints designed to support a specific UI view or operation are easy to optimize because their use case is narrow and the queries can be modified simply and safely. There might be more ceremony, but it's much more maintainable in my opinion.
GraphQL servers are typically not run in a manner that allows arbitrary query execution; usually only a select set of queries that were written by the devs, tested, and potentially profiled before deployment are allowed. From that perspective, if a mobile client is written to work against the interface they provide, then if you need to fix things on the backend, you can potentially do so without the frontend being aware of it. In that sense it's the same thing as a specifically designed REST endpoint, so I'll give you that, but it's hardly a point against GraphQL.
Number 1 is valid, but becoming less relevant, as you mentioned. Number 3 is also valid.
Number 2 I really doubt. Having more flexible API contracts (GraphQL) usually only makes dependencies worse, not better. Sure, you can write any query you want, but how do you agree on expectations if there is no contract at the API level? Strict API contracts force you to agree on expectations up front.
I probably wasn't entirely clear in my post that I'm not necessarily a believer in GraphQL. I'm mostly repeating/surmising the reasons other people are interested in it, not necessarily reasons I hold myself. You could argue with any of the points, really.
It is a good summary. I would also add, as a prerequisite, the need to combine lots of badly documented internal microservices (basically what Facebook originally built it for).
I've used GraphQL on several projects now, with small teams (2-5 devs) where we control the front and back end, and for that case, it's actually been really good. It provides a solid contract that is easy to get wrong with a REST API. (Will the server send IDs as ints or strings? Will it accept either? Which params are optional? What shape does the data come back in, an array of results, or an object with results nested somewhere? etc.) I've even set up CI to fail if it finds queries on the front end that don't match the schema. And of course it's easy to request exactly the data you want on the front end.
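For example, the schema settles those questions once, in one place (a sketch with invented types):

```typescript
// The SDL answers the "ints or strings? which params are optional?"
// questions up front; the types here are invented for illustration.
const typeDefs = `
  type Query {
    search(term: String!, limit: Int): [Result!]!  # limit optional, term required
  }
  type Result {
    id: ID!       # always serialized as a string
    score: Float!
  }
`;
```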
A big downside is that the tooling is immature, and all subject to big changes. And I haven't written a server meant for public consumption, but that does add new layers that you can kind of ignore when you trust the client.
If you own the backend and the frontend, I don't see how graphql helps you. You can just implement the APIs that you need to solve your business problems. Now all you're left with are the downsides. I'll pass.
One of the problems backend systems sometimes have is presenting similar data in slightly different ways for use in many different places. One of the solutions to this is GraphQL, which lets you break your data loading into reusable pieces (resolvers) and define endpoints based on those reusable pieces (queries).
You could "just" do this yourself, of course. GraphQL is nice because it gives you a sensible starting point and a lot of free tooling. You get Graphiql to help you quickly write new queries, you get well-defined schemas between server and client, you get linters for queries, you get GraphQL itself to glue your resolvers together and interpret your queries.
If your system has a grand total of 20 API endpoints and they're nearly all unique, you don't need GraphQL.
I listed some ways that it helps. You get a strict, well-defined, easily-documented API. Yeah, you can do all that with REST, but I like having a defined structure for it.
On a large team I’ll often be debugging someone else’s code anyways, and the framework code is usually better than hacked together product code (and comes with a public bug tracker).
On the team I lead, we keep our domain logic pure, encapsulated, and layered. This means that if I have to debug a fellow teammate's code, I can be confident that it is only business-logic code and not more technical, library-dependent code.
When our code has to interact with other systems, the other systems' responses have to pass through an anti-corruption layer and be translated into something we care about. All of this is surrounded with try/catch, and we quite obsessively check and validate everything coming in and out.
We do all of this because if/when something breaks in some library or framework that we didn't write, it doesn't bleed through to our POJOs. And if we ever need to switch to the new flavor-of-the-week library, which happens quite often, we don't have to rewrite everything, just the adapter classes.
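A condensed sketch of what one of those adapters looks like (types, endpoint, and field names invented):

```typescript
// Domain type we own: a plain object, no library types leaking in.
interface Invoice {
  id: string;
  amountCents: number;
}

// Anti-corruption layer: the only place that knows the other system's shape.
async function fetchInvoice(id: string): Promise<Invoice> {
  try {
    const res = await fetch(`https://billing.example.com/v2/invoices/${id}`);
    const raw = await res.json();
    // Validate and translate before anything crosses into our domain.
    if (typeof raw?.invoice_id !== "string" || typeof raw?.amount !== "number") {
      throw new Error(`unexpected invoice payload for ${id}`);
    }
    return { id: raw.invoice_id, amountCents: Math.round(raw.amount * 100) };
  } catch (err) {
    throw new Error(`billing adapter failed: ${(err as Error).message}`);
  }
}
```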
> Will the server send IDs as ints or strings? Will it accept either? Which params are optional? What shape does the data come back in, an array of results, or an object with results nested somewhere? etc.
Hum... It seems you are missing a data definition language. GraphQL isn't a very good solution to that.
I don't understand how, if you're on a team that controls both the front end and back end, you wouldn't understand these factors: "Will the server send IDs as ints or strings? Will it accept either? Which params are optional? What shape does the data come back in, an array of results, or an object with results nested somewhere? etc."
Surely you would know all the answers to these questions because your team control the entire stack.
If the FE and BE teams were separate, these would be reasonable considerations, but I just don't see it for a team with full control of the stack.
Designing a solid contract can be useful even when you’re one person, for the same reason that decoupling classes or concerns is useful. I don’t need to consult another person or read the code to understand what it’s expecting (especially true in contexts where documentation may not have been written yet).
GraphQL is a great experience when you consume it and the service fulfills your query needs. Because you just ask stuff and you get them. It's really cool.
On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.
Also if you really want to provide a graph experience, with inverse connections, filter on relationships and other advanced stuff... get ready to burn your mind and your soul.
> GraphQL is a great experience when you consume it and the service fulfills your query needs. Because you just ask stuff and you get them. It's really cool.
I guess it's better when the tooling you use has direct gql integration and builds the queries for you?
Because in my experience accessing the github APIs with "basic" HTTP libraries is way more annoying using v4 (graphql) than v3 ("rest") — it could also be that github's v4 API is dreadful mind, I wouldn't be surprised.
GQL should be more efficient because it's not returning 95% of garbage I don't need, but having to write 5-deep queries (because of the edge indirections) by hand is way more of a pain in the ass than performing two GET requests with a few parameters munged in the URLs. And then I still have to go and retrieve the information 5-deep in the resulting document.
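For comparison, the kind of nesting involved; field names recalled from GitHub's v4 schema, so treat them as approximate:

```typescript
// Five levels of edge/node indirection just to read issue comments; compare
// with GET /repos/{owner}/{repo}/issues plus .../comments in v3.
const query = `
{
  repository(owner: "octocat", name: "hello-world") {
    issues(first: 20) {
      edges {
        node {
          title
          comments(first: 10) {
            edges {
              node { body }
            }
          }
        }
      }
    }
  }
}`;
```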
Pagination is also awkward, because now you probably want multiple different queries (and thus multiple different resulting documents) so that your 2+ fetches don't retrieve unpaginated information you got the first time around. And it gets worse when nesting comes in.
I don't think graphql is generally a great experience when you consume it either.
100% agree that pagination is extremely awkward, especially with nesting. Between the pagination problem and the "oops I asked for too much data and blew up the server" problem, I think it's more work than one might think to run a GraphQL API.
For my own work, I took things in a different direction: the query language I described upthread, which parses as legal GraphQL with a vanilla parser but has directives like `@filter, @recurse, @optional` etc., and which flattens results and emits them row-by-row so evaluation can be lazy and incremental. My company has been using it in production for 6 years; the talk linked above has the details: https://www.hytradboi.com/2022/how-to-query-almost-everythin...
Not parent, but my biggest challenge is some v3 APIs are not there in v4 yet. For example, activity and notifications (https://docs.github.com/en/rest/activity/notifications) is something I'm still looking forward to, but forced to keep using REST until it becomes available via GraphQL.
The pagination point is described well in nearby comments. It only applies when attempting to paginate across more than 1 dimension at once, like "get all pages of comments in an issue, and all pages of reactions for each comment".
Not just cURL; most of the time I want something from the GitHub API it's something fairly simple; using REST from Python, Go, Ruby, $preferred_language is easier than using GraphQL, too. I'm sure there are libraries out there, but hard to beat a simple "fetch me data from that URL yo".
GraphQL uses HTTP like the REST API and speaks JSON. There's no need for a library if you're comfortable sending a POST request.
It seems to me like the main friction that you and others are getting at is that GraphQL is more work to use than REST because you have to write a query. That's a fair point! Perhaps we could publish "canned" queries that are the equivalent of the most commonly used REST endpoints, or make them available for use in the API with a special param.
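For illustration, the no-library version with fetch(): one POST to the documented api.github.com/graphql endpoint, here with the hello-world `viewer { login }` query:

```typescript
// No client library: one POST with a JSON body. TOKEN is a personal
// access token.
const res = await fetch("https://api.github.com/graphql", {
  method: "POST",
  headers: {
    Authorization: `bearer ${process.env.TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ query: "{ viewer { login } }" }),
});
console.log(await res.json()); // => { data: { viewer: { login: "..." } } }
```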
Yes, you need to write a query; and it's also not at all that obvious how to write a query. Let's say you want to list all repos belonging to a user or organisation, a fairly simple and common operation. I found [1] in about 30 seconds. I've been trying to do the same with GraphQL for five minutes now using the docs and GraphQL Explorer, and thus far haven't managed to get the same result.
I worked a bit with GraphQL in the past, but never all that much. Now, I'm sure I could figure it out if I sit down, read the basics of GraphQL, familiarize myself with GitHub's GraphQL schema, etc. But ... it's all a lot of effort and complexity vs. the REST API; even with a solid foundation in GraphQL there are still lots more parts.
GraphQL is kind of like giving people a schema to an SQL database and telling them that's an "API"; it kind of is, but also isn't really. There's a reason almost all applications have some sort of database layer (API!) to interact with the database, rather than just writing queries on the fly whenever needed.
That's completely fair. I think the analogy to SQL as an API is very apt. No one would argue that full SQL access isn't a powerful API but it takes some legwork to understand the schema and write queries to get the data you need.
There's a divide between at least two types of persona here. On one side are integrators building products and features on top of the GitHub API. For these people GraphQL is arguably superior, since the learning curve is manageable and, in exchange for climbing it, you can make an extremely specific query that returns your product's exact data requirements. The cost of writing this query is amortized over the lifetime of your integration.
On the other side are e.g. users automating some part of their GitHub workflow to save themselves time. I can see how the REST API feels like a better choice here, it's certainly simpler to get started with.
For what it's worth, here[0] is an example of using the `gh` CLI's graphql feature to print a list of all repository URLs for a given organization by login, sorted in a relatively complicated way. It's more verbose than doing this with the REST API but significantly more flexible. This could just as easily be done with curl but as others have pointed out, pagination requires a minimal level of logic to implement, so it's more convenient to use an existing helper. This output gets flushed 10 lines at a time as pages come in, making it suitable to compose with other commands using pipes.
The GraphQL API works just as well with curl. There's no getting around the fact that you need to pass a query text, but assuming you put the query in a file, the curl syntax is identical to the REST API's: a single POST to the graphql endpoint with the query file as the request body.
> Pagination is also awkward, because now you probably want multiple different queries (and thus multiple different resulting documents) so that your 2+ fetches don't retrieve unpaginated information you got the first time around. And it gets worse when nesting comes in.
You don't need to write a different GraphQL query; use variables. Good GraphQL APIs will expose start and limit fields for pagination.
I think you misunderstood the issue. Of course you will use variables for the pagination itself; the issue is that your head-of-line query will be grabbing other fields than the paginated one.
You don't want to repeat these fetches in the followup queries, they're redundant, and assuming the API is rate-limited they will decrease your query budget for no value.
That counts double if you're fetching multiple paginated fields (which also adds to the awkwardness).
Agreed — consider what happens when you have a node(start: Int, limit: Int) inside or alongside another such node with start and limit.
Your pagination is now two-dimensional, with each node's start/limit as points on its own axis.
Now add a third node to the query. Now you have three-dimensional pagination. This quickly goes off the rails.
Try writing a generic N-dimensional paginator for such a query to see why it's difficult. Even designing a sensible and reasonably flexible API for one is a headache.
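To make the two-dimensional case concrete, a sketch against a hypothetical schema:

```typescript
// Two independent cursors in one query: paging issues is one axis, paging
// each issue's comments is another. Schema is hypothetical.
const query = `
query Feed($issuesAfter: String, $commentsAfter: String) {
  issues(first: 10, after: $issuesAfter) {
    title
    comments(first: 10, after: $commentsAfter) {
      body
    }
  }
}`;
// Advancing $commentsAfter refetches the same 10 issues; advancing
// $issuesAfter resets every issue's comment cursor. Add a third nested
// connection and you have three interacting cursors.
```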
Indeed. It may not be a huge problem depending on how much data you need, but there are lots of cases where you'd really rather avoid refetching the rest of the root.
TBF you could also deal with it using fragments I think, but still, not great.
The server side of pagination is really complex if you want to make sure all the results are returned exactly once. If that's your case, consider not paginating at all.
But very often, a result missing or duplicated in a few queries isn't a showstopper. In that case, pagination is very simple.
Spring Batch covers the cases where you have to get the right answer!
That is, when you are "full scanning" and making a report, or doing something like a reconciliation process in a financial institution. It is not like some image boards where there is a link to the 1781st page of images but it spends forever loading if you click on it.
> but having to write 5-deep queries (because of the edge indirections) by hand is way more of a pain in the ass than performing two GET requests with a few parameters munged in the URLs. And then I still have to go and retrieve the information 5-deep in the resulting document.
I usually write my queries in GraphiQL (check some boxes) and then paste them into the app after I have them working right.
I think GraphQL is best understood as an incomplete ORM that you have to complete yourself on the backend. If GraphQL generated SQL (given some tooling or what have you), pretty much all the problems would be solved. Indeed, backend-as-a-service products like Hasura or Postgraphile are this missing piece. I guess we're uncomfortable with SQL over the wire or open-to-the-world databases, but we shouldn't be.
Or maybe TLDR "dear next generation of engineers: SQL is actually pretty good".
SQL is excellent but exposing your database is not. The inventors of graphql specifically said that they don't think that it is a good idea to do so. They never intended it for that.
> SQL is excellent but exposing your database is not.
That was definitely true before Postgres got row security (admittedly in 2016, after GraphQL was released in 2015), but these days there's really no need to run an entire app server in front of your database just to implement permissions.
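For reference, a minimal sketch of Postgres row-level security (table, column, and setting names invented):

```typescript
// Permissions live next to the data instead of in an app server.
// Table/column/setting names are made up for the sketch.
const rlsSetupSql = `
  ALTER TABLE posts ENABLE ROW LEVEL SECURITY;

  -- Each connection sets app.user_id; rows not owned by that user are
  -- invisible to reads and untouchable by writes under this policy.
  CREATE POLICY posts_owner ON posts
    USING (user_id = current_setting('app.user_id')::int);
`;
```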
I am with you. Every time I looked at GraphQL or was asked to implement it, I had to say no.
How is this a good thing for the backend or infra engineers? It's a mega facade without a lot of tooling to help the backend.
GraphQL reminds me of common ORM criticisms: wide API surface area with a lot of room for accidents. And GraphQL makes it worse by being exposed as a service.
Infrastructure is one thing that seems to catch a lot of people off guard. So many infrastructure tools are based on monitoring HTTP codes, but even when there are errors graphql servers send 200s unless modified. It turned into quite the headache for us.
100% - I can see how it might be great for FB where they have the capacity to optimize but without that engineering capacity it seems like it would turn into a net negative.
ORMs are also pretty easy to use on a case-by-case basis if needed, either by using the escape hatches the ORM provides or by bypassing it altogether. Deciding "oh GraphQL isn't good for this particular use case so I'll spin up a parallel REST API" is a much bigger decision to make.
What you say is unpopular, but it's a lot more true than most people (especially front-end people in this case) want to admit. Of course there are plenty of exceptions (people on FE who think about, care about, and know about what happens on the backend), particularly in the Venn diagram of FE people reading HN, but the majority in the industry definitely do not. The bigger and/or more specialized the company, the worse that problem gets.
To be clear, this is not just a problem for FE people. It's extremely normal for humans to become myopic in the areas they spend the most time in. FE does it, BE does it, management does it, everyone does it. Find a standard mobile engineer doing native iOS or Android, and they're going to be even more disconnected from the effects on the backend, and they come by it honestly. If you tend to specialize in one area, building an awareness of your own biases/perspective, and exercising intentional empathy, can make a huge difference in how easy you are to work with.
When looking at dysfunctional engineering orgs, one of the first things I do is figure out where the "power" is and figure out their backgrounds. The most extreme example might be a company founded by an FE eng for whom backend is just a necessary evil to support their app. Or a company founded by a BE guy for whom the real value is the API, and the clients are just there to abstract it for normal people.
Taking this in and finding a healthy balance of the way things are structured can help improve a dysfunctional org a lot. FE, BE, DevOps/Infra, etc are important pieces in an overall puzzle. Without a well-functioning team behind each, the company and product suffer.
SQL is a transferable skill. ORMs are not. If you already know SQL and have to use an ORM on top of that, then it's a net loss.
It's trivial to use SQL to build objects from individual rows in a database. Which is all an ORM is really good for. Once you start doing reporting or aggregates, then ORMs fall apart. I've seen developers who, because they have a library built up around their flavor of ORM, go and do reporting with that ORM. What happens is this ORM report consumes all the RAM in the system and takes minutes to run. Or crashes.
ORM code hits performance issues because so many objects have to be built (RAM usage) and the SQL queries underneath the framework are not at all efficient. You can use OOP on top of SQL and get decent performance. But you need shallow objects built on top of complex SQL. ORM goes the opposite: large hierarchies of nested objects built from many dumb SQL queries.
This also ties into GraphQL. Think carefully about the hierarchies you design. A flat API call that does one thing well is often better than some deeply-nested monster query that has to fetch the entire universe before it can reach the one field you need.
What works better, in my opinion, is a combination of:
1. query builder APIs, which can only generate valid queries, but where you control exactly what that query will be, and
2. APIs that return basic data structures from the database, like maps or tuples.
Query languages like SQL are very powerful and easy to learn, and in my opinion preferable to the ORM approach of "what method calls do I need to make to trick the engine into executing the SQL I know would make this work?" ORMs add complexity and limitations that are not worth the benefits.
People will have different opinions, but for me, there's nothing wrong with ORMs themselves; they are a significant productivity boost for 80% of the database interactions in your app. The tricky part is recognizing the 20% where ORMs are a bad idea, which ends up meaning that an ORM is best used not as a replacement for knowing SQL, but as a tool to make you more productive when you already know SQL.
ORMs are fine for the majority of simple use cases. When things get complicated, you end up either fighting with the ORM or just overriding it and writing the SQL yourself anyway.
I'll use raw SQL (maybe not as an entire query, but something like a computed column) pretty often, for situations where I want to query things like "give me all foos with a count of bars related to them", or "give me a list of foos with the name of the latest related baz". Most ORMs would want to hydrate the graph of related objects to do that, or at least have multiple round trips to the DB server.
Oh, they would be lazy; it's just that expressing something like that efficiently (i.e. something like `SELECT foo.*, (SELECT count(1) FROM bar WHERE bar.foo_id = foo.id) AS bar_count FROM foo`) is usually really hard to do. Most ORMs I've seen would N+1 on that with a naive approach, and even the "optimized" approach will want to fetch all bars vs. just the counts.
It’s not, really, but it IS a good thing for feature development speed if that’s what you’re into, and might help a team figure out quickly which data is critical to optimize for once you start putting more serious data loads through your APIs?
That's what you get for GraphQL not having an algebra.
If it had an algebra, you could build a database engine that answers GraphQL queries like a conventional database engine, or you could write a general-purpose schema mapping and some tool would write the code that converts GraphQL queries to SQL queries or some other language.
As it is, GraphQL provides a grammar that looks like something people want to believe in but behind it all is a whole lot of nothing.
If you want to see what a GraphQL with an algebra could look like, I built one! It's the project I described upthread: the query language is parsed with a vanilla GraphQL parser, but has directives like `@filter, @recurse, @optional` etc. that give it real query semantics.
This seems like a "be careful what you wish for" situation.
Sure, you could set up an algebra that allows you to handle arbitrary queries for zero extra programmer effort, just like a SQL database engine does. And then you could even expose it to users, and let them execute arbitrary queries.
And then, later, after you're done cleaning the molten slag off the server room floor, you could stop and reflect on whether that was really such a necessary thing to do.
If you had a rigorously defined system you could put rigorous limits on it.
If it's not rigorously defined there are no limits, just what people can get away with.
With GraphQL you get the worst of both worlds: people can't write arbitrary queries, but they can still trash the system. At least with undefined semantics people don't need to argue about whether or not they got the right answers.
"Rigorous limits" for a sufficiently large database means "uses our hand-picked indexes effectively", which reduces down to "provides the same functionality as a REST API" since you need basically a whitelisted list of acceptable operations. At best you can reduce transfer time by limiting columns returned, which is something but not really worth the added complexity.
My experience trying to maintain databases that are directly exposed to multiple development teams tells me that even exposing a fully generic querying API internally is risky.
Which, just for context - that's not me saying "graphQL is bad", it's me saying, "graphQL making it hard to do that is a feature, not a bug."
I've yet to encounter an off-the-shelf GraphQL server (from the Python and JS spaces) where hitting a slow query didn't immediately turn into half a day's work.
The whole concept is what happens when you let a smart person work on a small problem for far, far too long
I'd recommend checking out the project link in the comment to which you replied. It is designed _specifically_ to avoid the problem you mention: instead of a fully materialized, fully-nested result, it returns flattened row-oriented results (like a SQL database).
This allows for lazy evaluation i.e. rows are produced only as they are consumed. So if you accidentally write a query that would produce a billion rows but only load 20, the execution of the query only happens for 20 rows + any batching or prefetch optimizations in the adapter used to bind the dataset to the query engine.
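A generic illustration of that pull-based idea (not the actual implementation):

```typescript
// Lazy, pull-based evaluation in miniature: the generator does no work for
// rows nobody asks for.
const manyUsers = Array.from({ length: 100_000 }, (_, i) => `u${i}`);
const manyPosts = Array.from({ length: 100_000 }, (_, i) => `p${i}`);

function* crossJoin(users: string[], posts: string[]) {
  for (const user of users) {
    for (const post of posts) {
      yield { user, post }; // one flat row at a time
    }
  }
}

function take<T>(it: Iterable<T>, n: number): T[] {
  const out: T[] = [];
  for (const row of it) {
    if (out.length === n) break;
    out.push(row);
  }
  return out;
}

// users x posts would be 10 billion pairs, but only 20 rows are ever produced.
const first20 = take(crossJoin(manyUsers, manyPosts), 20);
```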
(1) There are usually some nodes of very high degree, and traversing those nodes will explode your query; (2) if you are following N links and the average degree is d, you are going to come across dᴺ nodes, and that is a lot of nodes as N gets big!
Tim Berners-Lee told me that if you can't send the whole graph you should send a subset of the graph that contains the most important facts.
It's a right answer but also a frustrating one to a programmer who takes correct implementation of algorithms to mean that the ticket gets done and nobody comes back at you with a ticket about it again. That is, the query I'm writing is part of an algorithm that depends on getting a certain answer, and getting an uncertain answer for one query is like spoiled milk that ruins the whole batch.
So why are we using it for so many naturally non-graph problems? 90%+ of developers' exposure to graphs is through tightly abstracted interfaces; I could name maybe 3 graph-related algorithms off the top of my head, but could implement none of them without reading up.
We could represent the text of this comment as a graph using one node for each unique character, but the result would be stupid: the operations would be slow, the representation needlessly complex, and the implementation guaranteed to be hard to work with.
> Tim Berners-Lee told me that if you can't send the whole graph you should send a subset of the graph that contains the most important facts.
Indeed. I also caught the ReST buzz around the 2000-2003 timeframe, and it turns out 20 years later nobody does that either, because in its purest form it's a pain in the ass for reasons comparable to the topic at hand.
It's funny to see a blog post on HN almost every day where somebody rediscovers the power of columnar query answering engines which are almost the opposite of graph databases.
I've lost count of how many columnar SQL databases have been donated to the Apache project, and there are so many systems like Actian and Alteryx where data analysts hook together relational operators with boxes and lines.
I had a prototype of a stream-processing engine that passed RDF graphs along the lines between the boxes, enabling an "object-relational" model; you could eliminate the need for hard-to-maintain joins. But I found that firms that had bought multiple columnar-database companies believed in performance at all costs and couldn't care less about any system that couldn't be implemented with SIMD instructions.
How are they opposite? There are plenty of graph databases out there using columnar storage, even ones directly compatible with GraphQL Federation. Best of both worlds, so to speak.
> So why are we using it for so many naturally non-graph problems? 90%+ of developers' exposure to graphs is through tightly abstract interfaces, I could name maybe 3 graph-related algorithms off the top of my head, but could implement none of them without reading.
It's a reasonable abstraction for structuring related bits of data (like what would go in a typical relational database), and that abstraction can align with the developer's mental model more easily.
E.g. ORMs basically convert SQL data into an in-memory graph. Likewise, graph database APIs are natively more object-y; you follow the edge from child to parent, instead of duplicating a key in both tables and then querying for matching rows.
They're not perfect, and shouldn't be used everywhere (nor even many places they currently get used), but I can see the appeal of abusing them.
Because graphs are a good abstraction for relations and with the right tech choices, are much more manageable and malleable than traditional relational databases.
> It's a right answer but also a frustrating one to a programmer who takes correct implementation of algorithms to mean that the ticket gets done and nobody comes back at you with a ticket about it again.
This rather sounds like a problem with the project manager and the project-management methods that he uses.
No. I had a time in my career where I was the guy who finished projects that other people started and couldn't finish.
Some coders really don't have discipline, and projects never get done because they don't think things through and keep sending half-baked patches that get sent back by test or the customer.
The role of management is to get those people working for their competitor and then have the "fixer" move in.
Nah, it's what you get for GraphQL only being an API, which people inevitably conflate with the database itself (a harmful trend that probably started with SQL databases).
If you want to use GraphQL you should look for a database supporting it as an interface, or failing that look for an ORM system that supports GraphQL and whatever backend you want.
Trying to convert SQL to GraphQL or GraphQL to SQL is equally difficult in either direction, and that has little to do with it not having an algebra (also, I think most of it is just algebraic types, possibly lacking a proper sum type).
God forbid you should try to modify anything with GraphQL though, that part makes no sense whatsoever.
> GraphQL is a great experience when you consume it and the service fulfills your query needs.
Unless you already know SQL and realise how small and simple the queries could be; then it's really not a great experience to be forced to use GraphQL.
I've been in web dev for 20 years but mostly in the front end space.
A couple of years ago I started doing full stack and trying different databases. For the past year or so I've been using Postgres and learning SQL. This is by far the best solution I've used so far. SQL is extremely expressive, powerful, and elegant.
The problem is that SQL has a strong learning curve which many devs want to avoid. I'm convinced this is the main reason stuff like Mongo or Prisma are so popular. I actually tried Prisma before raw SQL and I vastly prefer SQL for writing queries.
I deeply regret not having spent some time learning SQL years ago.
This might be just over-familiarity on my part, but does SQL really have a strong learning curve, or is it just not used often enough directly these days that people can get by without knowing it?
Standard SQL is a really simple grammar and a very small keyword set - there's basically selecting, updating, deleting, filtering with where, aggregate queries, grouping and joins, and that's like 95% of it. Sub-queries maybe too.
> This might be just over-familiarity on my part, but does SQL really have a strong learning curve, or is it just not used often enough directly these days that people can get by without knowing it?
I think the problem is that it's declarative instead of imperative, which is really kind of a shock if you are not used to it (you can't go step by step, there's no debugger, there are no branches etc), and also that you have to think in sets in terms of your solution, which is also awkward when you're not used to it.
I think it's definitely worth it though, as nothing we have beats the relational model for CRUD, and there are so many great learning tools online, for example: https://sqlbolt.com/
SQL has a large learning curve; you can keep learning new things on it for years. But not a particularly steep one: you can start using it with very little knowledge, and anything extra you learn immediately improves your situation.
In my experience mentoring entry-level/junior devs, mongo's API anecdotally seems to have a much steeper learning curve than SQL. Once you get past the fundamental CRUD idioms, there are a multitude of implementation details that, if treated as opaque by devs, can introduce significant footguns in even moderately high-throughput services.
Some of these details go all the way down to the WiredTiger storage engine, but others are more vanilla (e.g. indexing strategies, atomicity guarantees, causal consistency, etc).
I personally abandoned SQL about a decade ago, but I can appreciate how clean the interface semantics are for even non-technical folks. There are certainly platform-specific implementation details that can matter, especially when you get into the world of partitioning. But largely for most service loads, you're writing queries that satisfy the known index constraints that you imposed on yourself rather than constraints resultant from implementation details.
(I totally realize that even with SQL, that last statement completely changes at a certain scale threshold.)
So you're in the camp that NoSQL data stores like Dynamo/Mongo are a good replacement for most SQL workflows? Can you expand on this a bit if you have the time?
I don't actually view decisions like this as binary or mutually exclusive at all. I'm a big proponent of polyglot persistence [1], use the data stores you know and double down on what you know well. I use mongo primarily in my current work, but also have redis, elasticsearch, graphite, and etcd as sidecars in the same ecosystem.
I didn't jettison SQL because there was some fundamental limitation from a storage or scalability perspective. It was clear SQL wasn't going anywhere and would be a good infrastructure bet moving forward.
But initially what drew me to mongo was the clean interop between it and JS (Node.js is my runtime of choice). The shell is written in JS, you query (filter) using objects, you insert and update with objects. This seems like a small thing but this sort of developer experience over time is impactful. Everything feels very native to JS and it does so without any heavy abstractions like ORM/ODM.
After having used it now for 10+ years though, there's much more that I admire about it. Both from a pure architecture lens, but also from API perspective as well as it continues to get better.
To cherry pick an example— The new time series collection feature is a good example of that. For years, folks were using the bucket pattern [2] as a means to optimize query strategies for time series data by lowering timestamp cardinality. Now in v5.0, they give you a native way to specify the same kind of collection but they handle all the storage implementation details related to bucketing for you. Your API to interface with that collection remains mostly the same as any other collection. This sort of community-driven roadmap inertia is attractive to me as an engineer.
(Somewhat of a stream of consciousness here, but hopefully gives you some context as to why I made the switch so long ago)
Excellent post. I'm very much in the SQL camp myself, but the clean mapping between MongoDB and Javascript data models is outstanding. If you have a front end that just needs persistence as opposed to complex queries MongoDB is the obvious answer.
It's worth noting, for clarity/posterity, that my prior post is actually discussing mongo in a (mostly) backend distributed-systems environment. I find it's just as impactful there, not just in more conventional CRUD/browser/full-stack apps.
I'll bite, I have use cases along these lines. I don't use SQL in new projects.
When iterating on new systems, especially ones with live users, I can keep users at different document schemas. If I'm careful, I can make it so that document schema changes don't break old documents yet still allow for new functionality, without requiring mass migrations of documents.
CouchDB allows the db to just be exposed to the world directly. In projects where that's reasonable (user-owned/controlled data particularly), I can stand up the system with 97% frontend code, 3% backend. Having nearly the entire application stack in one place means you can use smaller, more specialized teams, and your overall areas of concern are smaller, without needing to draw up a formal spec for your data transport.
The whole GraphQL vs REST debate is meaningless when you don't even have to think about your transport stack between the server and the browser. There are other perks to this model, such as providing a fully functional website/webapp even while offline: it's trivial to switch between a couchdb backend and a local pouchdb copy of your db. Potentially lower bandwidth use too, since only updated docs cross the wire instead of the same data moving repeatedly for queries or asset fetching (not a win for single visits to single-page-style sites). And you can keep multiple clients on the same document set in sync without socket.io work.
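The switch is roughly this (PouchDB's documented sync API; database names and URLs are examples):

```typescript
import PouchDB from "pouchdb";

// Same API locally and remotely: the app talks to `local`, and replication
// keeps it in sync with the CouchDB server whenever a connection exists.
const local = new PouchDB("notes");
const remote = new PouchDB("https://couch.example.com/notes");

local
  .sync(remote, { live: true, retry: true }) // continuous two-way replication
  .on("change", (info) => console.log("synced", info.direction))
  .on("error", (err) => console.error("sync error", err));
```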
The big benefit of ORMs is in the query builders they provide: basically, syntax checking for SQL inside your language of choice, and nicer composition of SQL query parts (to make your code more DRY). Actual mapping to objects is always too heavy in my experience.
However, a slightly unrelated comment on your choice of API design: this approach always introduces an asymmetry in the model that restricts what you can do. If you start allowing post imports that auto-detect authors, you now need a Post.create(content) and Post.setUser(user) too. And then your API users start wondering what's the idiomatic way to create a new post.
The problem is that you are making an early assumption that all posts will belong to a user, while representing that in an SQL database as two relations: one independent of the other (User: id, name, email...), and another referencing the first (Post: id, date, user -> User, content...). Your database model allows an easy transition to allowing nulls for `user`, yet your API doesn't.
Moving to a more functional API makes this much more natural and less restrictive. Shallow DAOs for User and Post and a function create_post(content, user) may look just like a namespacing difference, but they match your database design more closely. If you want to allow nulls for user in the database, you just do the same in the create_post function.
You can wrap related functions into modules (or classes); in domain-driven design, most of these would be port/adapter functions, but if your DAO classes are sufficiently shallow, they could be service or domain functions too. Either way, they are still ultimately functions (no shared state or side effects).
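A sketch of that shape (all names invented):

```typescript
// Shallow DAOs plus a function mirrors the schema: Post.user is nullable in
// the DB, so it's nullable here too, and relaxing "posts need users" is a
// one-line change.
interface User { id: number; name: string; email: string }
interface Post { id: number; date: Date; userId: number | null; content: string }

let idCounter = 0;
const nextPostId = () => ++idCounter; // stand-in for a DB-assigned id

function createPost(content: string, user?: User): Post {
  return {
    id: nextPostId(),
    date: new Date(),
    userId: user?.id ?? null,
    content,
  };
}
```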
Just a string in my case, JDBC prepared statement or the equivalent. But if I could really choose freely, I would put all queries as functions/procedures inside the DB to achieve real decoupling from the schema, get consistency with transactions etc, but if I mention that idea, the pitchforks come out and I get chased off the property by the backend developers who become pretty much obsolete in that architecture.
That, or just a string in your application’s code.
The problem with using the ORM as you describe is that when you hit any sort of scale, you need to be doing bulk operations, otherwise your latency goes through the roof, to the point that the sheer number of inefficient queries can tank the database. I speak from experience, having seen a database collapse under the load of a backend written in this fashion once request load grew past a certain point; not pretty! The interim solution is to bulkify existing queries and functions in place to the greatest extent possible, while preparing for:
Converting a codebase from having endpoints doing individual ORM operations as described to having proper separation of concerns with a business logic layer between the endpoints and the database is a _massive_ cost. The earlier you implement that, the happier you will be in the longer term. It doesn’t have to be with raw SQL, but many bulk operations are much easier to express with SQL than with the ORM.
Doesn't an escape hatch on the ORM provide that though? I seem to remember in both sqlalchemy and (libraries that use) knex being able to dip down into SQL when needed.
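E.g. something like this with knex (table/column names invented):

```typescript
import knex from "knex";

const db = knex({ client: "pg", connection: process.env.DATABASE_URL });

// One set-based statement instead of a load-modify-save round trip per row
// through the ORM.
await db.raw(
  `UPDATE accounts
      SET balance = balance - ?
    WHERE plan = ?`,
  [5.0, "legacy"]
);
```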
Hell yes. Good knowledge of SQL is a superpower and is becoming a rare art form.
The new generation of devs thinks that frameworks and ORMs will do the magic for them at no cost, but they don't. There is no substitute for leveraging your storage engine to the max.
The sad part is that databases have evolved and become much better in the last 20 years (I started with MySQL 3.x), but we just don't use them. Everyone acts like "microservices" solved all of our technical challenges. Right.
In my opinion, the value of a well-architected microservice is in figuring out how to optimize and leverage the capabilities of the underlying storage engine while presenting a simple, performant, and correct API to consumers, without requiring those consumers to understand the underlying details of the datastore.
I am not talking about the customers. Of course they are not supposed to understand it. I am talking about the system design. And microservices do not solve problems in most companies; they just create new ones. Distributed systems did not magically become simpler to reason about just because there is Docker.
GraphQL is just an API language. It doesn't free you from writing database queries.
The point of GraphQL is mostly separation of backend/frontend and avoiding over-fetching/under-fetching. If those don't sound like a benefit, you should use REST.
I would agree, but would add one note at the end: "Yes you can do this with REST by including query params to reduce/tune what gets returned, but that can quickly balloon into a monster when you get beyond pagination, ?expanded=1 for full objects (vs partials/abbreviated), etc."
The same is true of GraphQL, if you want to control how much of nested objects you get (eg. introduce limits on the number of nested objects).
Basically, with GraphQL you hide all the complexity behind generic-seeming API requiring one API call, whereas with REST you'd usually hit multiple endpoints for different data types.
GraphQL has the benefit of allowing the backend to smartly decide how to restrict the data (eg. do the joins between what would have been two bigger REST queries), but that incurs a development cost. The complexity is in marrying all the familiar optimization tricks for SQL databases with exposing that in a generic but still restricted way.
But wait, doesn’t that directly contradict the first commenter’s next paragraph?
> On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.
If it's so easy to craft any GraphQL query as an SQL query and let the RDBMS plan and execute the query, then shouldn't it be easy to implement the GraphQL server on the backend?
I think your point is a fair one. The distinction is that it's easy to write a contextual SQL query for any one GraphQL query when your database model closely matches your API objects. "Contextual" means that sometimes this requires a "side effect" to happen (eg. creating an index on a column in the SQL DB).
Making it generic and performant at the same time is where the complexity is.
It would be akin to saying that, since knowing that you might need an index in the SQL database is simple, an RDBMS could decide to create those indexes for you.
You're conflating a hand-coded and optimized query with building a system that takes a tree and generates said optimized query automatically, quickly, and correctly.
I don't think I'm conflating it. jseban's comment indicates that anyone who knows how to write simple SQL queries would get no benefit from using GraphQL to consume data, which must mean that there is a simple SQL query that can be written to fulfill any GraphQL query.
> there is a simple SQL query that can be written to fulfill any GraphQL query
This is true (as long as you expect both queries to be simple, or allow both to be complex).
But the conclusion you reach up there is wrong and (obviously) does not follow from that. Creating software that translates any one query into the other is a very difficult task.
Not all of the data sources for graphql may be in a single database. They may not necessarily even be stored in something that can be accessed with SQL.
> it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.
The culprit is "micro" services. The whole thing was invented by a "software consulting" firm to milk as many billable hours as possible: make a system over-engineered and costly to support, but easy to split into multi-layer/multi-stage outsourcing teams/phases. The industry fell into this stupid trap, the burden was shifted onto web & mobile clients, and the next thing you know, people have to make queries inside while loops.
If your data can be "planned" or "optimized" via a single centralized "GraphQL gateway", then it probably can be centralized inside a single database transaction with so-out-of-date-you-should-never-use JOINs.
I recently had to render a user feed page: query a uid for fids, then fids for cmt-ids, then each cmt-id for uids for avatars/nicks and such, all from a stupid user-profile lookup "micro" service provided by another department, which only accepts one param per query (spoiler alert: it's an "anti-pattern", but a sweet "optimization goal" for your next "sprint milestone"). I had to carefully and cleverly combine all the lookups and run them in parallel async with a very good reusable batch-loader class. Which makes me wonder: if all this data sits in the same db, why scatter it across so many service pipes, then gather it back in such a PITA fashion?
As a developer I am not against GraphQL or microservices, because they pay, and they're a good pile of tech jargon to confuse the non-tech people, and it really sticks; but from a purely technical point of view it's a waste of CPU power and emits needless CO2.
Although the microservices terminology might have been invented by a software consulting firm, distributed architecture already existed and solved problems for many large companies that needed to scale their products (and development processes) beyond what a small team hitting a single database could achieve.
However, I think that's the key point to keep in mind when considering whether GraphQL is a good fit - if you don't already have multiple domain-specific services in your infrastructure, then adding a GraphQL gateway service doesn't make a huge amount of sense to me, because you could've just had your small team of front end developers talk to your small team of back end developers to create exactly the optimized endpoints they needed to solve the problem.
To me GraphQL really seems like a solution for an organizational problem, where there are dozens of teams who all maintain their own services and apps, and now a variety of front end teams want to combine different sets of data from services maintained by different sets of back end teams in a way that doesn't have alignment across the company as far as deployment/release schedules go... Well now it makes sense to construct a flexible API schema maintained by and for front end specialists - it's just moving their already-existing data processing/join logic out of their various clients into a common server-side component.
> because you could've just had your small team of front end developers talk to your small team of back end developers to create exactly the optimized endpoints they needed to solve the problem.
I think the OP (and many of the comments in this discussion) is about making a graphql endpoint for public consumption.
If you are doing it solely for internal use, it does make sense that the "break even" point would be different.
> because you could've just had your small team of front end developers talk to your small team of back end developers to create exactly the optimized endpoints they needed to solve the problem
What you are describing is called BFF I guess.
And apparently it's already out of date so let's again split BFFs into smaller parts.
> The whole thing was invented by a "software consulting" firm
I don't know the whole origin story. It definitely does feel like something Martin Fowler would come up with. But I blame Google for really making it a trend:
And you can see they understand the whole problem with microservices. It's the same thing The Mythical Man-Month was trying to tell everyone decades ago[1].
> When n people have to communicate among themselves, as n increases, their output decreases
Microservices exacerbate this. It is the Multics model: each microservice implements its own, often wildly different, API, which every bit of code that needs to use that microservice has to go and implement.
The point of microservices is only partly to support individual teams owning a service. AFAIK the main points are isolating failures, independent deployments and horizontal scaling of individual components.
I do agree that without a good API design it can become a mess quickly and most companies go for microservices without a clear understanding of what goals they are trying to achieve with microservices. For those companies, sticking with a monolith would've probably worked better.
I've even heard cases where companies went back to a monolith and I think that is actually a smart decision in some cases.
But I definitely don't think it is a waste of CPU power.
I spend about 3/4 of a full time job building, maintaining and improving a corporate GraphQL API, and have for the last few years. What you are describing is not my experience. In fact it is far easier than it was in the old days when each new requirement meant code changes to a REST API.
Certainly there have been problems with queries that had unacceptable performance, even some that took down the whole API server and database. Of course that wasn't a novelty with our REST APIs either. It certainly is an issue with GraphQL, but it has been a manageable one for us.
Largely this is because the API is not public, and it doesn't have to simply handle anything that is thrown at it. When we see a frequent, slow GraphQL query in a report, we have many options to deal with it, including going to the front-end team and asking them to query in another way, and I have had to resort to that. Often I can optimize the code instead.
But that hasn't been a huge problem, especially compared to the great benefits of pushing most of the data work to the front-end. And the size and complexity limitations we've built into the API handle such problems seamlessly most of the time. The caller gets a clear error message that specifies the problem, and they can usually compensate very quickly with an altered query.
When they can't then I get involved and sometimes have to say, uh, we can't do that ... without scads of extra work. And the work on my plate today is for one of those scads.
I was doing lots of REST API work before GraphQL APIs, and my own and our corporate experience is that GraphQL solves a lot more problems than it causes.
The problem with GraphQL is on the front end. Suddenly, the FE team becomes responsible for understanding the entire data model, which resources can be joined together, by what keys, and what is actually performant vs what isn't.
Instead of doing a simple GET /a/b/1/c and presenting that data structure, they now need to define a query, think about what resources to pull into that query etc. If the query ends up being slow, they have to understand why and work on more complex changes than simply asking the BE team for a smaller response with a few query params.
I hit this problem when contemplating exposing the API of the application I work on to customers, to be used in their automation scripts.
We quickly realized that expecting them to learn our data model and how to use it efficiently would be much more complicated than exposing every plausible use-case explicitly. We could do this on the "API front-end" by building a set of high-level utilities that would embed the GraphQL queries, but that would essentially double much of the work being done in the front-end (and more than double if some customers want to use Python scripting while others want JS and others want TCL or Perl).
So, we decided that the best place to expose those high-level abstractions is exactly the REST API, where it is maximally re-usable.
I think what you are basically saying is the people working on the front-end are a bunch of children that cannot be trusted to do the right thing.
I’ve seen this a lot from backend teams, and it’s beyond frustrating.
Because now my nice clean frontend code suddenly has to deal with a bunch of franken query logic simply because the backend team cannot be bothered to alter their “pristine” API.
Never mind that this means a thousand requests where one graphql query would have sufficed.
Can confirm. Getting developers (never mind QAs) to build data-model savvy appears to be one of those things some have taken for granted, right up until you realize other people really did mean it when they said you were nuts.
I've never seen it as nuts; it's a bare pre-req of modern computing. Apparently this view is the subject of widespread controversy amongst peers.
Hmm.. that hasn't been my experience at all. I wrote the public GraphQL API for my company, and it was a pretty straightforward experience. Yes, I had to spend some time on the basic plumbing, but now if something needs to be added, it's just a matter of defining some interface for it and fetching when required. Grabbing an object from the network or DB doesn't need optimizations or a query plan, and has no corner cases. Even grabbing i objects starting at offset j only adds a bit more busy work.
Maybe the trick is to keep it simple? There's no need for a bi-directional graph or advanced filtering. But if there really is, it's not like sticking to REST would make that any easier. Some things are just hard, no matter the interface.
GraphQL itself is not a trap, but it's easy to fall into the "object graph modelling" trap with it. You probably shouldn't do that unless you have a lot of resources to spend on it. I think "Graph" in the name is what leads people astray; as long as you stick to TreeQL, you should be fine.
You are right. Some things are just hard. I went deep into GraphQL because I wanted to explore the possibility of it being a more comprehensible interface for the end user in comparison to a REST interface. In such cases, it is not.
GraphQL gives a better way to request nested schemas and handle relationships and recursion. But when you cross that line, the client gets ideas and starts asking "Why can't I do that operation on the 5-level-deep object?". Now you either have to disallow it, or you have to "rewrite the database" to make recursion optimal.
This is not a problem of GraphQL. This is an HTTP problem. When you need to expose the database querying layer over HTTP, you have a problem regardless.
Try nested pagination (i.e. open the 352nd page of the 7th book on the 3rd shelf in the 5th room of the 3rd city library). Make it performant. Have fun with GraphQL! /s
That sounds trivial I think because you are looking for exactly one item and there's no pagination involved.
The problem might be to get those 352nd pages of every book with the title starting with "A" sitting on 3rd shelf of every city library: when there are unbounded results nested deeper than top-level, and possibly those multiple times, that's when it gets hairy.
Actually, it's about opening 10 different books at different pages and similarly upwards... There is no GraphQL mechanism for it outside some hack in individual libraries that does more server roundtrips, and it seems like GraphQL authors simply avoid discussing it.
For example, you can make a separate REST method returning all levels you need. However, it still doesn't account for items lost while paginating (e.g. some visible items are deleted while one scrolls on a device etc.).
The flip side of this is a lot of folks are adopting GraphQL who are not prepared to do it well, so they make something half baked, missing things you need, and their documentation is absolutely useless.
This isn't new, there's plenty of sloppy REST APIs, but it was so much easier and less painful to explore and stitch together pieces of an imperfect REST API than it is to interact with a bad GraphQL API.
I’ve found it easiest to implement in Node due to the explicit event loop structure. You use data loaders, which is a super generic term that means “batch all requests for this resource into the next event loop tick.”
So when a query requests a list of users, and then every users friends, that becomes two queries: one to load all the users, and one to load all the friends for all those users. The net effect is that your number of queries is O(query depth) rather than O(objects requested).
Admittedly this does tend to work best with more K-V-oriented data than truly relational data, and might be hard to retrofit onto a brownfield project, but I've never found it all that hard to do.
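A minimal sketch of that batching pattern, assuming the dataloader and pg packages (table and column names invented):

    import DataLoader from "dataloader";
    import { Pool } from "pg";

    const pool = new Pool();

    // Every .load(id) issued during the same event-loop tick is collected
    // and resolved with one SELECT ... WHERE user_id = ANY($1).
    const friendsLoader = new DataLoader(async (userIds: readonly number[]) => {
      const ids = [...userIds];
      const { rows } = await pool.query(
        "SELECT * FROM friends WHERE user_id = ANY($1)",
        [ids],
      );
      // DataLoader expects results in the same order as the input keys.
      return userIds.map((id) => rows.filter((r) => r.user_id === id));
    });

    // In a resolver: const friends = await friendsLoader.load(user.id);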
I've found that this is a simple and effective way to handle relational data either in REST or Graphql. But imagine having to traverse trees, filter on data of different types and levels. Sort and filter on edges. I mean it can get pretty complex and I am not saying that REST would be easier on those complex cases.
In my opinion GraphQL and REST can both be super cute in simple everyday queries. But I am thinking that the people creating databases like Postgres, MongoDB, Neo4j, etc. are doing exactly that: trying to give us the power to query our data efficiently. Why not expose the database directly and just add a layer for security, control, decoration and other stuff that would add value? Why rewrite databases?
There are products that do that! Hasura comes to mind.
But APIs can have different use cases. It’s usually considered bad to directly expose your table schema over GraphQL because it locks you in and makes it hard to change your data model over time. And not all API access is “get this data” and “set this data” — it can be difficult to express complex logic in just a database. And of course, some GraphQL APIs aren’t backed by a database - they’re backed by other services (a la the “backend for frontend” pattern).
I’m very pro choosing the simplest solution that works — but sometimes, the simplest solution does bring some complexity in exchange for other trade offs (like flexibility).
I agree with you. I am currently working on a project that needs to provide very sophisticated querying capabilities, so I'm kind of seeing everything through that prism.
I worked in the data federation space for a number of years (it's actually quite an old term, I worked in it back in 2013, around the time there was an early wave of activity around this and the concept of a "data fabric").
When I saw GraphQL come out, I knew that what you are saying would happen.
In the data federation tool I worked on, SQL was the interface abstraction to join across heterogeneous platforms (think of things like Presto/Trino or Dremio). GraphQL as an interface requires the same underlying infrastructure as that data federation tool in terms of query analysis, parsing, planning, optimization, execution, etc.
Those are "hard problems" due to lack of standardized interfaces, access patterns, direct data access, I/O, network bandwidth and infra related latency, costs, compatibility, data types, etc. These problems are distributed system problems coupled with often incompatible interface layers (e.g., even if you are using multiple SQL databases with GraphQL, you run into the same).
If your scale is such that you can build GraphQL on a handful of systems and for a handful of use cases, great! If you have to go to a certain larger scale, you're back into federation territory (which in the app layer might also be called API composition).
One potential option - when you reach the point where you need complex GraphQL query coordination, more than seems to make sense to implement, pair it with a data federation tool such as Presto/Trino, Dremio, Denodo, or research approaches such as caching/materialized views (engines like that are becoming decoupled from databases, such as Materialize.io) - and let those engines do the hard work.
In that case your work becomes more like GraphQL -> SQL or API -> a data federation, caching or materialization platform. CQRS and event sourcing plays a role here too.
Consider also the possibility that, if you are willing to accept a bit of delay in aggregated results from multiple systems, you can do those compositions or aggregations in the data platform layer and simply feed them to the GraphQL interface. That could even be done in a single database/data platform if you really wanted, without too much fancy federation tech.
Federation is powerful but complex. It seems like a fun hard problem, but for many tech teams, it can be a complexity and time suck. My recommendation would be try to avoid building that if you can.
> GraphQL is a great experience when you consume it and the service fulfills your query needs. Because you just ask stuff and you get them. It's really cool.
How about caching? It feels like GraphQL tries to win some (arguable) flexibility in putting together clients, and in the process throws out most of the operational advantages of resource-based APIs, with significant disadvantages in how you put together a backend.
One solution is "Persisted Queries". Another solution is to throw Varnish in front and cache the hell out of POST requests :P
Is it elegant? No.
Do you have another service to manage? Yes
Will you pay someone else to do it for you? Possibly
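The "Persisted Queries" route can be as small as an allowlist keyed by hash; a sketch, assuming Express and hashes computed at build time (nothing here is a real product's API):

    import express from "express";

    // Generated at deploy time: hash -> the only documents we will execute.
    const persisted = new Map<string, string>([
      // ["<sha256-of-query>", "query UserWithOrders($id: ID!) { ... }"],
    ]);

    const app = express();
    app.get("/graphql", (req, res) => {
      const doc = persisted.get(String(req.query.id));
      if (!doc) return res.status(404).json({ error: "unknown persisted query" });
      res.set("Cache-Control", "public, max-age=60");
      // Execute with your GraphQL engine of choice; the response is now a
      // plain cacheable GET like any REST endpoint.
      res.json({ data: /* execute(doc, req.query.variables) */ null });
    });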
I think there are already products trying to do that. But how many layers of abstractions and dependencies are you willing to have in your everyday processes?
It does a lot of client-side caching for you. The documentation is atrocious though IMO. I'm not sure if there exists a similar framework for backend caching.
Relay is a GraphQL client. That's the irrelevant side of caching, because that can be trivially implemented by an intern, especially given GraphQL's official cop-out of caching based on primary keys [1], and doesn't have any meaningful impact on the client's resources.
The relevant side of caching is server-side caching: the bits of your system that allow it to fulfill results while skipping the expensive bits, like having to hit the database. This is what really matters both in terms of operational costs and performance, and this is what GraphQL fails to deliver.
Take a look at this. Either you didn't know what's challenging about caching nested graph data, or we have different definitions of triviality/interns.
I repeat: client-side caching is not a problem, even with GraphQL.
The technical problems regarding GraphQL's blockers to caching lies in server-side caching.
For server-side caching, the only answer that GraphQL offers is to use primary keys, hand-wave a lot, and hope that your GraphQL implementation did some sort of optimization to handle that corner case by caching results.
You can execute GraphQL queries via GET and set a cache up for it like REST. Technically it's also allowed to cache POST requests but I guess anyone who comes across that is going to raise their eyebrows.
> You can execute GraphQL queries via GET and set a cache up for it like REST.
Can you, though? It seems you really can't, nor was GraphQL designed with HTTP caching in mind.
The only references to caching in GraphQL are vague hand-waving arguments about how GraphQL implementations might theoretically be implemented with some sort of support for caching primary keys.
But any type of HTTP caching is effectively excluded from GraphQL.
To put it differently, is there any third-party caching solution for GraphQL? As far as I could gather, the answer is no.
It looks like you are fixated on caching in GraphQL, but that's unnecessary, you can just cache GraphQL like REST, because in the end they are just GET requests. Just cache the GET request.
> It looks like you are fixated on caching in GraphQL, but that's unnecessary
Oh so one of the most basic features of any API, one which has a direct impact on scalability and motivates entire product lines and businesses like CDNs, is now "unnecessary"?
Here's how we do it: https://wundergraph.com/docs/overview/features/caching
We "persist" all operations using their name (unique) as part of the path. Variables are passed via query parameters. We also add a unique hash as query param which is generated by the configuration. This way, each "iteration" of the application invalidates the cache automatically. Additionally, we generate ETags for each request to reduce response payloads to zero when nothing changed. (https://wundergraph.com/docs/overview/features/automatic_con...) Combined with the generated type-safe client, this is a very solid setup.
It's just a GET request. Same as REST. I don't know what you want me to show as an example. You can use Apache, nginx, squid, any proxying webserver worth its salt...
Are you actually able to show an example or not? Because changing the HTTP verb doesn't magically change the problem, and passing query parameters as a request document renders these queries uncacheable.
> You can use Apache, nginx, squid, any proxying webserver worth its salt...
Great, pick the one you're familiar with, and just show the code. Well, unless you're not "worth its salt" or are completely oblivious to the problem domain.
Note that I made a very primitive implementation. Depending on which GraphQL node is queried, the request will be cached by the proxy or not. Apollo GraphQL server has much more fine grained methods of allowing caching (see https://www.apollographql.com/docs/apollo-server/performance...) however I left the example code crude so you see exactly what's going on under the hood.
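For the curious, the crude version is roughly this (a sketch, assuming an Express proxy in front of the GraphQL server with a naive in-memory cache; the upstream URL and TTL are invented):

    import express from "express";

    const upstream = "http://localhost:4000"; // the real GraphQL server (assumed)
    const cache = new Map<string, { body: string; expires: number }>();
    const app = express();

    // Cache GET /graphql responses keyed by the full URL, since the query
    // and variables live in the query string.
    app.get("/graphql", async (req, res) => {
      const hit = cache.get(req.originalUrl);
      if (hit && hit.expires > Date.now()) return res.type("json").send(hit.body);
      const r = await fetch(upstream + req.originalUrl);
      const body = await r.text();
      cache.set(req.originalUrl, { body, expires: Date.now() + 60_000 }); // 60s TTL
      res.type("json").send(body);
    });

    app.listen(8080);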
Sorry, that doesn't cut it at all. Far from it. Being able to cache a response is not the same thing as optimizing queries that hit your database. Being able to cache a response means not having to hit your database to begin with, saving on things like traffic into your database and round-trip time.
With REST APIs I can get a web service to return responses that are HTTP cacheable, put an nginx instance between the ingress controller and the web service, and serve responses to REST clients without even touching the web service that provides the REST API. I can even deploy nginx instances in separate regions.
I’ve never used those tools, but I don’t see how you can automate away authorization issues. The GraphQL spec[1] says authorization in the GraphQL layer is fine for prototyping or toy programs, but for a production system it needs to be in the business logic layer.
In Hasura, you authenticate externally -- can be custom API endpoint that signs a JWT/auth webhook, or an auth provider like Auth0, Okta, Firebase, Keycloak, etc. Doesn't matter, just have to return some claims values.
You can then use these claims values in your authorization (permissions) layer.
IE, when a user logs in, you can sign a claim with "X-Hasura-User-ID" = 1, and "X-Hasura-Org-ID" = 5, and then put rules on tables like:
> "Role USER can SELECT rows in table 'user' WHEN X-Hasura-User-ID = user.id"
> "Role USER can SELECT rows in table 'organization' WHEN X-Hasura-Org-Id = organization.id"
There's more depth to it than this, but this is the gist of it.
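For a concrete picture, the login side might sign a token along these lines (a sketch using the jsonwebtoken package; the claim namespace follows Hasura's docs, the values are made up):

    import jwt from "jsonwebtoken";

    const token = jwt.sign(
      {
        sub: "1",
        "https://hasura.io/jwt/claims": {
          "x-hasura-default-role": "USER",
          "x-hasura-allowed-roles": ["USER"],
          "x-hasura-user-id": "1",
          "x-hasura-org-id": "5",
        },
      },
      process.env.HASURA_JWT_SECRET!, // shared secret configured on the Hasura side
      { algorithm: "HS256", expiresIn: "1h" },
    );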
I say it over and over again: row/column-level permissions are not even close to enough for any larger app. How do you translate access restrictions like "user X cannot view more than 10 articles per month" or "customer Y cannot insert any more orders if the total outstanding/unpaid amount of invoices in the last 30 days exceeds Z" into row/column-level permissions? You don't; that's why you have a business layer.
> How do you translate access restriction like "user X cannot view more than 10 articles per month"
That's not permissions in the general sense of authorisation, that should just be modelled as any other "business logic". Put that article_limit in the users table and join it in the selects.
Edit: or tracking the users article views with timestamps, and making a select aggregate over the month..
> "customer Y cannot insert any more orders if total outstanding/unpaid amount of invoices in last 30 days succeeds Z"
insert into orders (user_id, ...)
select a.id, ... from "user" a
left join invoices b on b.user_id = a.id and not b.paid
  and b.created_at > now() - interval '30 days'
where a.id = $1
group by a.id
having coalesce(sum(b.amount), 0) < 1000  -- the outstanding limit Z
etc.. any kind of limitless complexity here on how you want to limit the orders.
But if you design a public GraphQL API, you cannot trust the query issuer (Browser or App on the Client). You have to enforce those rules outside of the query. And yes, there is tons of ways to do this outside of GraphQL, which is exactly my point, that row/column permissions alone do not suffice.
Ok I get you now, yes you are correct, if you have a public API and you have business constraints that need to be enforced, you have to do this server side.
But you can still implement this easily in queries, so for example granting the client only read permissions, plus execution permissions on certain database functions, and inside these functions you can implement the constraints.
What I'm trying to say is that you don't necessarily need any backend "services" (Java etc.) to implement these types of constraints; they can be modelled as any other business logic in the database.
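A sketch of that idea through node-postgres, with the constraint living in a Postgres function (table, column, and limit values invented for illustration):

    import { Client } from "pg";

    const ddl = `
      CREATE OR REPLACE FUNCTION place_order(p_customer int, p_amount numeric)
      RETURNS void LANGUAGE plpgsql SECURITY DEFINER AS $$
      BEGIN
        IF (SELECT COALESCE(SUM(amount), 0) FROM invoices
            WHERE customer_id = p_customer AND NOT paid
              AND created_at > now() - interval '30 days') >= 1000 THEN
          RAISE EXCEPTION 'outstanding invoice limit exceeded';
        END IF;
        INSERT INTO orders (customer_id, amount) VALUES (p_customer, p_amount);
      END $$;
    `;

    const db = new Client();
    await db.connect();
    await db.query(ddl);
    // The API role gets EXECUTE on place_order only, never INSERT on orders.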
Agreed. I've worked extensively with Graphile, and from working with people who were really strong in Postgres I learned that all the things we needed could be modeled in Postgres. Many of the less obvious solutions would involve functions, triggers, or stored procedures, but I liked that there was less ambiguity about where that kind of logic was implemented.
Observability, flexibility (I don't need to push a migration to change auth), SSO integration, and the ability to keep a clean separation between user and "machine" (service, replication, etc) accounts.
What does observability mean? Can you translate this into a concrete question?
As for flexibility, do you mean authentication or authorization?
SSO can be done at the SQL server level (MSSQL has it and so does Postgres, don't know about others), but handling the SSO part in your app and using "set role" and passing a user ID for your row level security policies to use is easier to set up and more flexible.
Clean separation between user and machine accounts can absolutely be done with MSSQL and Postgres.
Yeah this must be one of the most underused features ever. People don't realise that you can solve little bobby tables by just setting the permissions correctly in the database.
You cannot model every business constraint in DB permissions; Stuff like "If customer X has less than 3 active contracts, new contract activations require sign-off of Manager of at least level Y" etc.
And how do you unit test triggers? Yes you can do it in a ton of ways, but you just end up scattering your business layer all over the database, the GraphQL adapter, API gateways etc. The alternative is just to create a dedicated BE service endpoint (in whatever you prefer, REST,HTTP/JSON,SOAP, gRPC etc.), which does the required checks for you.
Triggers, functions or whatever you use are just code; yes, that is my pitch: have your business logic in code, ideally in a dedicated BE API endpoint instead of in the DB.
I for one think that the code path that leads to an insert conflict should be integration/unit tested. Something needs to codify how the error is reported to the client.
But unit testing is not nasty at all: just look up pgTAP.
What's nasty is that the incumbent development model intersperses data modelling constraints in the database and in the non-db code, and then we test one layer implicitly from another layer.
Unit testing is nasty? You can use whatever tool you like. I've used phpunit, junit and jasmine for unit testing databases. Choose whichever tool you like most.
It can get hairy due to how most apps are developed.
You'd usually have backend code guarding against constraint-violating entries, and then you'd have a database constraint too.
So what you need to do now is test almost exactly the same thing from your code test and from your database test to get proper coverage.
Enter declarative ORM schemas, and suddenly there's not even a guarantee that the schema you are running really matches what your ORM thinks the schema is.
For that reason, I prefer all those SQL-based database migration/evolution approaches over ORM-based schema generation, coupled with pure SQL tests (eg. with pgTAP, but yeah, any tool can do).
Basically, even for declarative code, there should be a unit test, a la double-entry bookkeeping in accounting.
And even if this is what I prefer and believe is only right, I never worked on a large company project that did all of these things.
So I don't think the entire topic should be easily dismissed: while unit testing is simple, have you ever worked on a project that tested db-schema embedded logic exactly at the right level (and not the level up)?
That’s the path I ended up taking. The GraphQL resolvers had no idea there was a database. They talked to a layer that understood all the business objects, and that sat on top of a layer that understood authorization, and only that layer had any connection to the data store.
In my mind that's just an insert that joins the contracts table and computes CASE WHEN active_contracts < 3 THEN true ELSE false END for the require_sign_off column.
You cannot trust the query issuer (Browser or App on the client). If you have a public GraphQL API, you need to enforce these rules. If you can just alter the query to bypass the business rule, this is called a security hole.
Using postgraphile for my current big project is the best technical choice I've ever made. There's been the occasional obscure sql incantation to learn but otherwise has been so much more productive than hand-crafting REST endpoints.
Every time I start with GraphQL I'm surprised that I'm writing all the routes and middleware I'd need with a RESTful API in Express. I feel like I'm missing the point.
To what extent does this headache go away if autogenerating graphql from a relational db, using tools like Postgraphile or Hasura? I never considered making my "own" graphql service but those tools sure make it look easy to create a nice API controller through db migrations.
Do you worry about over-coupling when using Prisma? I'm hesitant to let front-end control the schema in any scenario where they're not the only users of that DB. Works great until it doesn't and can be a pain to migrate control to a backend/API team.
Our Prisma schema resides in our "backend" (Prisma essentially governs our master API). So I'm not sure why you're concerned that the front-end might control the schema.
The nicest thing about Prisma is that it is a declarative single-source-of-truth for our data models. Everything resides in one schema, all changes to database models and all migrations run through Prisma, and, best of all, strong types are inherently built in.
The team is also building useful middleware like field-level encryption; all of this together makes Prisma a very complete package.
Of course, there is a price for this convenience — we sacrifice some higher-level DB-side features. But Prisma is such a competent tool that we don't miss them much.
But that's true for any solution. This goes back to "avoid db server specific SQL", you gain the portability advantage but you're willingly giving up advanced features the db server has. How far do you want to take this to be "independent"?
I'm not concerned with independence from the database _implementation_ but independence of the database schema from any one consumer. This is one of the more interesting things about tools like Hasura/Postgres/Postgraphile in my eyes, they encourage you to separate frontends from the backend early on. That might be one team to start, but you can divide labor and add more services without rearchitecting like you would if the database was controlled by ORM from a single front end.
About 90% of our GraphQL API passes through Prisma. We have a master API that talks to many different microservices to process data and so-on, but all the data ultimately ends up residing in our Postgres DB. One of the nice things about Prisma is that it gives you a very declarative way to manage your data, and encourages using your DB as your "single source of truth".
Querying everything through one API (which relays requests to other microservices, if necessary) and having one Postgres DB which acts as the "endpoint" for all of our data is a very clean model.
For edge cases, it's also possible to write custom resolvers. Prisma doesn't prohibit that.
Philosophically, is it really a "microservice" if it doesn't have its own database? In my opinion, if multiple services are ultimately all connecting to and storing their data in the same database, then you haven't really gained very much, since one misbehaving client can still take down everyone else's service. The point of microservices was always sold to me as "every team owns their own stack", and specifically if one team's stack goes down, everyone else can cheerfully continue. (Or less cheerfully, if the team whose stack went down was identity or user service.)
I think this is solved by creating "full stack" teams where the front end developers who want the GraphQL API are also the same team who define the schema and build the service that serves that API. In large companies where GraphQL makes sense, that GraphQL API service would just call into pre-existing services that serve JSON, Protobuf etc maintained by 100% back end teams.
Forming full stack teams doesn't remove the pain of having to build the producer api in the first place. It merely shifts the burden from a backend only team to a full stack team.
> “On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.”
Is this still true if the structure of the data is relatively simple, but you have tens of millions of users? Say the data that is returned per user has 20 or 30 properties in total, and you are only ever asking for specific data about an individual user.
You have to do your own optimiser to avoid, for instance, the N+1 query problem. (Just Google that, plenty of explanations around.) Many GraphQL frameworks have a “naive” subquery implementation that performs N individual subqueries. You either have to override this for each parent/child pairing, or bolt something on the back to delay all the “SELECT * FROM tbl_subquery WHERE id = ?” operations and convert them into one “… WHERE id IN (…)”. Sounds like a great use of your time.
In the end you might think to yourself “why am I doing this, when my SQL database already has query optimisation?”. And it’s a fair question, you are onto it. Try one of those auto-GraphQL things instead. EdgeDB (https://edgedb.com) does it as we speak, runs atop Postgres. Save yourself the enormous effort if you’re only building a GraphQL API for a single RDBMS, and not as a façade for a cluster of microservices and databases and external requests.
Or just nod to your boss and go back to what being a backend developer has always meant: laboriously building by hand completely ad hoc JSON versions of SQL RDBMS schemas, each terribly unhappy in its own way. In no way does doing it manually but presenting GraphQL deviate from this Sisyphean tradition.
I read in the article that NOT having GraphQL exactly match your DB schema is a best practice. My response is “did a backend developer write this?” Sounds awfully convenient for job security!
My experience using GraphQL is the same as using React: looks great at first glance and it makes sense. Use it for a while and you realize it's designed to be used by the fb team. For example, both are designed for a large team to work on small components separately. Most developers are NOT fb, thank goodness. There are better, faster and lighter-weight alternatives for smaller or other kinds of teams.
There's nothing that stops you from exposing only what you would expose in a "restful" API. You can even specify the exact queries that can be used by the client. And even then GraphQL gives some nice advantages, such as introspection and endpoint discovery, as well as smoother error handling and increased type-safety.
Filtering on relationships is a big issue for us. Each nested node in the query graph (tree?) generates a new SQL query. We seem to be committed to that approach at this point; trying to migrate to a world where we inspect the whole query, then make one SQL query, isn't going to happen.
Not trying to specifically shill my own library, but I developed this a while ago before there were any established patterns with filtering on relationships in graphql.
https://github.com/tandg-digital/objection-filter
Out of curiosity, would functionality like this implemented in graphql solve your issues?
I know relationships don't typically have props in a store like Neo4j, and moreover you can reproduce that in something like Postgres with a foreign key.
We had a challenge like what you describe though, and were able to avoid new queries by representing the relationships as objects. In so doing, we leverage row-level security and JWT claims, which is an approach to authorization with high epistemic legibility.
I think similarly. If you have control over the back end environment, it's not worth the extra effort, additional complexity (e.g. caching challenges) and performance overheads to run a GraphQL server.
I've been using graphql for a project recently and... yeah, I'm not a fan of it. The data is stored relationally and exposed through views, fed through a graph layer, then has to be flattened on the front-end into something that's not far off from the original exposed view. That's a ton of work and really, really messes with front-end experimentation because of all of the work to unpack each graph representation every time.
Something is wrong here. The whole point of GQL is to serve things in exactly the format the front end wants. Even the other negative comments here mention how it is easy to use on the front end.
Perhaps. But just consider that your sibling comments have suggested about 5 different middleware tools that all supposedly do some similar thing. So I may be wrong, but at least four other people are wrong too ;)
This is never the case, every time I use GQL, I always have to reshape the response. GQL only lets you declare the data you want, it does not let you declare the shape in which you want it.
Then that's a problem with the schema implementation and not necessarily a fault of GraphQL. The people implementing a GraphQL schema should be working very closely with the people working on the frontend and put a lot of importance on how they want to consume the data.
GraphQL schemas that basically just expose the data models 1:1 without considering the exact workflows the frontend needs is a terrible implementation and misuse of GraphQL. Might as well just expose the data using REST
Unfortunately this is my experience as well. Generally it's a misalignment of priorities. Since the people writing the endpoint don't have to consume it, they just do whatever is easiest as quickly as possible. And often many are dismissive of frontend concerns when challenged.
Would this solve the problem described? Sounds like the annoying part is solely on the front-end, the unpacking/flattening of what the Postgraphile service returns. From their description, I wouldn't be surprised if they were already using Postgraphile or Hasura as "the graph layer".
Heh, I actually use Hasura and I find it extremely painful to use. It's unbearably fragile to state changes (eg, psql scripts + pg_dump/psql restores) and its errors are inconsistent enough to give you just enough constant false hope that your problem's fixed, but a second step is almost always needed.. and without a helpful error or button that just explains and fixes all the things from a single screen. I realize I'm probably using it wrong, but I really don't think I'm doing anything exceptionally "out there".
> "its errors are inconsistent enough to give you just enough constant false hope that your problem's fixed, but a second step is almost always needed.. and without a helpful error or button that just explains and fixes all the things from a single screen."
There are buttons on the "Settings" screen (/console/settings/metadata-status) you can click that should put your instance back in a working state (and it'll redirect you here by default if your metadata is invalid):
> [DASHBOARD TEXT]: "You have been redirected because your GraphQL Engine metadata is in an inconsistent state. To delete all the inconsistent objects from the metadata, click the "Delete all" button. If you want to manage these objects on your own, please do so and click on the "Reload Metadata" button to check if the inconsistencies have been resolved."
As someone who still builds their personal projects with it -- yeah, the error messages can be kind of opaque if they're related to Hasura's internal metadata/state. For errors that come from external services, those are passed through at least when "HASURA_GRAPHQL_DEV_MODE" is enabled.
> "It's unbearably fragile to state changes (eg, psql scripts + pg_dump/psql restores) ... I realize I'm probably using it wrong, but I really don't think I'm doing anything exceptionally "out there".
Are you dropping tables/columns which have metadata on them? IE, a relationship or permission on a table?
If you have metadata on a resource and then you remove it, without also removing references to it, the effect is the same as if you had tried to drop a table that has foreign keys that reference it in an RDBMS.
Thanks for the response! You're right that that's what I'm doing wrong, though the problem comes after I recreate those relationships on the RDBMS side. Hasura really struggles to piece together that even though things were torn down, they were brought back up in the same way. Having one button to "repair" it would be nice. This mostly happens because I more or less start from scratch on the RDBMS side every time I make a change. I'd do the same on the Hasura side, but tracking relationships (I think that's what it's called) takes about ten minutes to initialize on a relatively small database, so I'm forced into making as few changes as possible.
> "but tracking relationships (I think that's what it's called) takes about ten minutes to initialize on a relatively small database"
Oof, this is insane. Should not be the case.
Are you using the Hasura CLI to automatically track any changes made with the web UI to local YAML files? You can use this, along with the ".cli-migrations" variant of the Docker image to automatically apply your metadata/migration as soon as the image starts.
So you'd run "hasura console" in terminal, which would serve the special web UI that mirrors changes to local files, and that'll serve it on http://localhost:9695
Then when you want to start fresh, just docker-compose down/up and it'll handle auto-applying everything for you:
If you use Relay, the graph representation is reasonably unpacked into a state store for you, and you're given the ability to change both the state store and the backend data in one fell swoop.
The article glosses over my favorite feature: it is typed.
Being able to automatically type the network responses on the frontend is huge. Never before had I such confidence in the FE code I am writing. My entire FE project is typed end to end without any manual assumption and things work most of the time on the first successful compilation.
In my experience, the open source tooling around GraphQL is much better than the tools that generate types from other SDLs like protobufs.
This makes it really easy for servers and clients to behave in a typesafe manner.
I've seen a backend service / front-end SPA use REST, grpc and then GraphQL to communicate. Type safety has been the easiest to understand and scale across the eng team using the GraphQL ecosystem.
OpenAPI is supposed to help with this as well, though I haven't seen a lot of good examples in the wild (or where I have worked) where OpenAPI was implemented well or from the start.
100% this. I’ve used GraphQL on two projects now, where we manage the front and back end, and it’s been an amazing experience.
A strong type system, combined with a GraphQL code generator and TypeScript, has been an amazing experience as both a front and back end developer. Not only do I have types for the backend (for things like input args as well as models), I have types for the front end, as well as easy-to-use hooks to utilise them in React. During development, we constantly fetch the latest schema and check if anything has changed; if so, the build breaks then and there, highlighting which queries are no longer valid.
The development experience has been amazing, and much more productive than any REST-based workflow I've used, purely because of the type system.
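Roughly, that workflow looks like this (a sketch assuming GraphQL Code Generator with the React/Apollo plugins; the UserCard names are invented):

    import { gql } from "@apollo/client";
    // Hypothetical codegen output for the query below:
    import { useUserCardQuery } from "./generated/graphql";

    gql`
      query UserCard($id: ID!) {
        user(id: $id) { name email }
      }
    `;

    function UserCard({ id }: { id: string }) {
      const { data, loading } = useUserCardQuery({ variables: { id } });
      if (loading) return null;
      // data.user is fully typed; if the schema changes and this query
      // no longer validates, the build breaks instead of the page.
      return <p>{data?.user?.name}</p>;
    }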
An alternative approach is to use something like Thrift to type your responses; properly implemented, an RPC framework like that is amazing. So much validation boilerplate removed on the backend, and receiving it on the frontend as a nice type is fabulous as well.
GraphQL is just nested RPC with a bad name. It's not a query language in the sense that SQL is: it doesn't natively provide filtering, pagination, joining.
It can be implemented in literally the same way as REST or gRPC, except where you don't exchange fields the client is uninterested in, and can incorporate "and then" requests as a nested field rather than a round trip.
You just have to bear in mind, removing that round trip makes it feasible for a client to send very complex requests that can trigger a large amount of processing on the server. Either you optimise that (e.g. with data loader, eager loading, caching or some other mechanism) or you rate limit like you would with a REST/GRPC request.
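One common guard before either of those: reject overly deep documents up front. A crude depth limit can be enforced on the parsed document before execution (a sketch using the reference graphql-js package):

    import { parse, visit } from "graphql";

    // Reject documents nested deeper than maxDepth before executing them.
    function checkDepth(query: string, maxDepth = 5): void {
      let depth = 0, max = 0;
      visit(parse(query), {
        SelectionSet: {
          enter() { max = Math.max(max, ++depth); },
          leave() { depth--; },
        },
      });
      if (max > maxDepth) {
        throw new Error(`query depth ${max} exceeds limit ${maxDepth}`);
      }
    }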
The biggest weak spot with GraphQL, in my opinion, is pagination. Pagination is awkward in any kind of nested API unless you use some sort of continuation handle, but then the server needs to keep state and there's a chance your pagination set can expire.
I'm aware (and that's what we use) but it still suffers the same issues. To use it with deep nested queries, you essentially need to use something like Apollo which can rewrite queries for enumeration where it doesn't re-query the things from the original query that it already has cached.
However, what I'm referring to - and I'm not sure if it has a better name - is the type of cursor where you can create it, then enumerate it separately, across multiple requests.
In pseudocode it would be something like:
let c = db.users(name like 'c%').groups(memberCount > 50).members # request 1
while c.any():
c.fetch(max=50) # request 2+
Of course, for that to work, the entire state of the query needs to be encapsulated within that cursor or saved in some temporary session variable on the server. It also doesn't really fit into GraphQL's model right now.
I seem to recall Hacker News actually used something similar for pagination back in the day, and if you left a page open for too long then tried to press next, you got an invalid continuation error or something to that effect. Not sure if that was something similar...
And where this really gets tricky is when it's not a loop like that, but the next fetch happens only after user interaction, perhaps 5 hours later.
To do that right, you need to encode everything the query is ordered by into a stateless continuation token, for example here that might be [user_name, group_id, member_id].
You can make that opaque to the client, but that's what you need to continue the search efficiently.
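A sketch of such a token, using the sort keys from the example above (base64url-encoded JSON, treated as opaque by clients):

    // The position of the last row returned, under the query's sort order.
    type Cursor = { userName: string; groupId: number; memberId: number };

    const encode = (c: Cursor): string =>
      Buffer.from(JSON.stringify(c)).toString("base64url");
    const decode = (s: string): Cursor =>
      JSON.parse(Buffer.from(s, "base64url").toString("utf8"));

    // Continuing the scan is then a stateless keyset query, roughly:
    //   WHERE (user_name, group_id, member_id) > ($1, $2, $3)
    //   ORDER BY user_name, group_id, member_id LIMIT 50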
I haven't jumped on the GraphQL train yet, largely for a lot of the reasons the original author calls out. I see the benefits, but they don't outweigh the costs of converting our existing API surface area.
Like most of the tools we choose to use (or not use) there are trade-offs. The original tweet and post fail to recognize why GraphQL might make sense, even with its caveats. GraphQL makes the API more flexible for the front-end to consume. This reduces the number of requests a UI might need to make in order to render something, which makes clients (particularly mobile ones) faster. It also means a team of specialists working on the UI can probably add or adjust features faster, as the backend is more dynamic.
So if you're serving a certain audience (lots of clients where network requests are expensive) or have a large, specialized front-end team that's distinctly separated from the team that's responsible for the API, then GraphQL might be worth the trade offs. Sure, it'll come with some downsides, but all things do -- it's our job to be careful and deliberate about the tools we choose to use.
I recently had to consume a GraphQL API (Shopify's) after being interested in it for some time.
Wow, not a fan.
Everything was poorly documented (both in the actual API and GraphQL itself) and fussy, and required a huge amount of code to get anything done. Nesting and union types were really hard to figure out.
Frankly, the more I used it, the less I liked it.
Is it possible that the API design was bad? Sure. But my googling to figure things out didn't make me think they were doing anything out of the ordinary at all for GraphQL.
Every single time I've heard "GraphQL is poorly documented", it's from someone who doesn't know about Insomnia [0] (or something like it) and autocompletion.
You can traverse the docs as you write the code with the right client app, and this solves 90% of the "docs are bad" complaints about any given GraphQL API.
GraphQL APIs are many things. Poorly documented is not one of them.
GraphQL APIs that are public will usually have a less than optimal dev experience because it has to consider general and unknown use cases. GraphQL shines when you control the API and the frontends that consume it. This is mostly because the schema design will reflect the exact use cases the frontend needs. So the data and its shape and its mutations are exactly what the frontend needs. You can't really do that for public APIs because you don't really know or predict the use cases for 3rd party frontends.
> GraphQL shines when you control the API and the frontends that consume it.
How does this differ from controlling the API and frontend with rest/json? You can shape the data in any which way you want including nested relations, i.e get me a user with all their posts.
Unless you are $MEGA_CORP scale it really doesn't feel like it's worth the investment.
You can definitely shape the data in any way you want with REST. You can find parallels between GraphQL features and REST and argue either way, because that's a 1:1 comparison. Comparing technologies requires looking both at the 1:1 level and as a whole.
But with your nested relations example, wouldn't getting a user always return its nested relations?
- what if a post has its own nested relations? some pages might need just a user's posts without the posts nested relations and some pages might need them all.
- what if a page only needs user info and doesn't need the user's posts?
With both similar scenarios, you'll need a way to communicate to the user endpoint when and when not to return nested relations, and how deep. You could do that with query params, sure, but IMO that's a workaround for what the frontend really needs: a declarative way to let the API know that for this specific request, this is the exact data I need and its shape. No more, no less. Along with the flexibility to get everything in another request. You could also do nested resources with REST (/user/posts/categories, /user/posts/etc) but then that's multiple calls to get what you need. With GraphQL, these scenarios can be solved with a single API endpoint and single requests to it.
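Concretely, both pages would hit the same endpoint with different selections (hypothetical schema):

    // Page that only needs user info:
    const userOnly = `{ user(id: 1) { name email } }`;

    // Page that needs the user, their posts, and each post's categories,
    // still one request:
    const userWithPosts = `
      { user(id: 1) {
          name
          posts { title categories { name } }
      } }`;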
Also the productivity gain from GraphQL is actually more apparent in smaller eng org compared to a massive one. That's because for GraphQL to be at its best it requires the people implementing the schema to work closely with frontend folks. Even better if they are full stack and are the same people.
> But with your nested relations example, wouldn't getting a user always return its nested relations?
A central point of REST is that any resource (e.g., user) can support an unlimited number of different representations. This isn't just true on the level of, say, XML vs. JSON but also formats which embed and do not embed subordinate/related entities.
> You could do that with query params sure but IMO that's a workaround with what the frontend really needs, a declarative way to let the API know that for this specific request, this is the exact data I need and its shape.
How are query strings a workaround and not a declarative mechanism?
(I mean, it would be better if HTTP had a verb that was like GET with a body so that you could define one or more media types that could be used to specify details of the resource representation sought, and that's what the QUERY draft [0] is about. And I can see the view that query parameters are a workaround for the lack of a QUERY method. But that's a slightly different story.)
> A central point of REST is that any resource (e.g., user) can support an unlimited number of different representations. This isn't just true on the level of, say, XML vs. JSON but also formats which embed and do not embed subordinate/related entities.
I'm aware, but how does that work in practice, actual implementation wise? What do the formats that include or exclude related/nested entities look like? How do clients use these formats? Also, how does supporting multiple levels of embedded nested entities work in this pattern? I ask all of this because I don't think I've seen this implemented well without a convoluted in-house implementation of a query language like GraphQL's shoehorned into query strings. A specification is only as good as the people implementing it.
> How are query strings a workaround and not a declarative mechanism?
In simple cases, they can be. But they are very limited. How do you express in query strings the exact data fields, nested entities, and shape the client wants to retrieve? (this question might be related to the questions above)
> I mean, it would be better if HTTP had a verb that was like GET with a body so that you could define one or more media types that could be used to specify details of the resource representation sought, and that's what the QUERY draft [0] is about.
I didn't know about the QUERY draft. This is very interesting and would be great to have in HTTP. This is a good example of the gaps in HTTP that technologies like GraphQL are trying to fill.
The company I'm with is decidedly smaller than mega corp and graphql has been a huge boon for productivity. The API team will hit product with some word vomit that the page they want to build is impossible due to service A not being able to talk to service B, meanwhile the frontend team just implements a resolver and is done with it.
I despise the programmer's hubris around the concept of self-documenting things, especially as I'm running into so many examples of programmers who don't actually sit down and read library code these days. And I especially despise apologist management that makes excuses for them.
We didn't get where we are by skipping the manuals and not knowing what we're doing under the hood. Abstraction is fine once you're actually aware of what your boundary conditions are.
That's probably true after you are comfortable in it, but as someone trying to wrangle it for the first time it was rather mystifying for anything outside of trivial usecases.
I just got done doing a talk on GraphQL for PyCon2022 and I do agree with some of the points here. Performance work can be tedious, and it is not bi-directional in the graph, so the number of dataloaders can blow up, making debugging hard. Identifying where n+1 queries are in the API can also be difficult, but I used this open source package to help: https://github.com/tatari-tv/query-counter. I think the article failed to mention two of the things that GraphQL does really well: dense queries and built-in pagination. You're able to do the work of many serialized REST queries in one query using the node context of the GraphQL graph structure, which is a huge win if you're hitting performance issues related to requests per second to your API. Also pagination using the cursor, before/after, etc. is very helpful, and Flask_Graphene enables some slick caching there to make subsequent queries at that cursor extremely performant. I have code with my sample implementation, which is simple but shows the power of DataLoaders: https://github.com/lame/pycon-graphql.
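The cursor pagination bit looks something like this on the wire (Relay-style connection fields; the posts field is invented):

    const query = `
      query Posts($after: String) {
        posts(first: 10, after: $after) {
          edges { cursor node { id title } }
          pageInfo { endCursor hasNextPage }
        }
      }`;
    // Feed pageInfo.endCursor back in as $after to fetch the next page.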
One problem with any such databases is that, because they are so useful, you end up making a great deal of use of them, and in particular code using them metastasizes everywhere, and worse, schema gets hard-bound into all that code everywhere.
Then one day you want to:
- make significant and incompatible schema changes
- move subsets of schema/data out to separate partitions / DBs
- implement a merger
- change DB / vendor
and... you can't. All that code capturing the specific API, QL, and the schema (and metaschema!) is spread all over the place, and it's too much to change, and you can't do it in any sort of atomic manner. The change will take forever and will be very costly -- you might not even bother.
You're stuck. Vendor lock-in, but more awful than usual.
But what's so special about GraphQL? Active Directory has the same problem, really. It's just that AD (and all LDAP-based DBs like it) is kind of icky, because the metaschema (particularly X.500 naming) is icky, so it hasn't metastasized as much.
What's the solution? I would suggest that one has to build a proxy API to capture all direct uses of the DB and schema, and which presents a task-oriented interface to it (e.g., "add user to group", etc..). This way you can later rewrite just that very isolated and tested component. But the problem with that is that you have to bring forward a lot of the switching cost into the present, and that might be pointless cost if you end up never switching.
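For concreteness, a hedged sketch of what such a task-oriented facade could look like, in TypeScript (all names illustrative):

    // Hypothetical task-oriented facade; callers never touch the schema or QL.
    interface User {
      id: string;
      name: string;
    }

    interface DirectoryService {
      addUserToGroup(userId: string, groupId: string): Promise<void>;
      removeUserFromGroup(userId: string, groupId: string): Promise<void>;
      listGroupMembers(groupId: string): Promise<User[]>;
    }

    // Only the one implementation of this interface queries the DB directly,
    // so a schema change or vendor switch later means rewriting one isolated,
    // well-tested component instead of chasing queries across the codebase.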
> I would suggest that one has to build a proxy API to capture all direct uses of the DB and schema, and which presents a task-oriented interface to it (e.g., "add user to group", etc..).
Sounds just like stored procedures to me! But those are "icky".
I tend to be highly critical of the costs of interfaces that bring nested data structures into the equation, especially on the input layer. The moment you bring in nesting, things become exponentially more complex. There's a reason relational databases are relational databases and not graph databases. Flat data structures bring a lot of simplicity in expectations. Two dimensions are easy to grasp: you can see at a glance if things are going right or wrong, and you'll notice immediately.
The same goes for API interfaces. A one-dimensional list of input parameters is easy to understand: the length of your input vector is equal to your degrees of freedom. However, with a graph query, you can query literally anything in any dimension you want. _Sometimes_ that can be powerful, but more often you want API contracts to be strict and simple. Simple REST endpoints bring fewer surprises and are easier to document, explain, and implement.
Sure, there is the cost of doing multiple queries. But for how many applications is the added complexity worth that saving? HTTP is getting faster every day, especially with HTTP/2 and HTTP/3. Multiple queries aren't as much of a deal breaker as they were 10 years ago. In complex applications, simplicity and predictability are more important than minor performance improvements.
The browser is not a trusted computing environment. Any expressive power you put, intentionally or not, in the hands of your front-end developers, you also put in the hands of potentially hostile users. The console is just a click away.
This is not the case in hypermedia based systems where content is generated on the server side. The server side is a trusted computing environment and, there, you can give developers a fully developed query language such as SQL without risking it falling into the wrong hands (mod coding errors by your developers, of course.)
This is a powerful argument for SSR and for hypermedia in general.
Typically, in security discussions I've had in the past, the fact you can tell which account was used to compromise your system was not considered a strong argument for a particular approach.
Oh, I thought the concern was from a denial-of-service angle.
Compromise would imply the account has access to things it shouldn't. Which, while possible, should be taken care of; and if it isn't, a REST API isn't going to prevent it.
(I'm personally thinking of a paid account of a multi-tenant system, so query limits and punitive billing would probably eliminate the worst DOS issues.)
My point is that giving expressive power to your front-end developers puts that expressive power in the hands of end users as well. So you have to be very careful with that power, and I assert that most people are not. Facebook apparently uses an exhaustive query whitelist to lock down their GraphQL endpoints, but the vast majority of people jumping on the GraphQL bandwagon aren't going to do that, and likely have little understanding of the security implications of not doing it.
A truly REST-ful hypermedia API eliminates this issue by moving the construction of the hypermedia server side, which is a trusted computing environment and which allows you to give the developers a fully developed and arbitrarily powerful query language like SQL. Doing so does not put that query language in the hands of end users, in contrast with things like GraphQL.
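To make the whitelist point above concrete, here's a rough sketch of the persisted-queries pattern in TypeScript (the operation name and query text are illustrative):

    // Clients send an identifier, never raw query text; the server maps it
    // to a pre-approved operation (both entries here are illustrative).
    const allowedOperations = new Map<string, string>([
      ["getUserName", "query ($id: ID!) { user(id: $id) { name } }"],
    ]);

    function lookupOperation(operationId: string): string {
      const query = allowedOperations.get(operationId);
      if (!query) throw new Error("Operation not on the whitelist");
      return query; // only this vetted text is ever executed
    }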
Sometimes I get this weird, probably biased and prejudiced, but real feeling that every single piece of technology that has been born inside Facebook in the last ten years is a trap.
I feel like the original Twitter thread misses some common points. Bad API design is possible with any API style. A complex aggregation with nested joins is possible with any kind of API as transport. They also don't mention tools like Relay, which indicates that they have never used GraphQL to its full extent.

We've been working a lot in this space to improve the developer experience of GraphQL, giving devs the benefits of dynamic GraphQL operations but combining it with a REST API/JSON-RPC as a facade, and therefore dealing with a lot of the downsides. Please check out https://wundergraph.com/ if you're interested.

In terms of security, there are frameworks like entgo (https://entgo.io) that handle auth extremely well. If you look closely into the docs, you'll realize that entgo supports REST, gRPC, and GraphQL as external interfaces. So it's clear that you have to deal with authz no matter what API style you choose.

Regarding "unpredictable" performance, I'm not sure I agree with the points being made. With GraphQL, it obviously gets very visible when you have "slow queries". If an API consumer did the same "queries" through a REST API, they might create even more server load because it takes more requests. The difference is that it's not really visible, because you don't count 100 REST API calls as "one query". Instead, you falsely believe that you've served 100 API calls very quickly, in less than 100ms each. It might be the case that the REST API calls take 10s total while the GraphQL query took 3s. So now you're thinking that REST is 30x faster than GraphQL, but really we're comparing apples to oranges.

My summary is that you should choose the right frameworks and tools for your project. REST is totally fine, but please create an OpenAPI specification to document it, otherwise it gets messy.
Yes, you can change anything you like and add in any behaviours you wish. But each time you do that, you move away from the promise that "you basically don't need backenders" or "if you're finding it tough, just use this tool!", the latter of which has already appeared in this comment thread more than once.
You can technically accomplish many things with GraphQL. But the effort to do so erodes the benefits promised.
Even though this is true, there can still be a lot of value in certain scenarios.
For instance, this is why I use GraphQL strictly in my admin backends, exposing the database almost verbatim through Postgraphile as an Express.js middleware. All data is now trivially accessible. Everything that requires custom resolvers or other complex customisations is handled over a REST endpoint. I now have all the benefits of GraphQL without the complexities you face when moving into more edge-case scenarios.
In pretty much the same way, I use Prisma ORM but raw queries for anything non-trivial.
Sure - anything that's basically generated dynamically off the database structure will work well, although you can also get auto-rest generators that would have a similar productivity effect.
What is the benefit of every request being POST? This makes caching a harder problem to solve.
Also, why is every status code 200, even in the event of an error? They want you to pass an error key in your payload and have your client process the payload to understand the status of the response. Why are we reinventing the wheel here, and for what benefit? GraphQL had some really appealing concepts going for it, like querying a single source of truth for multiple data sources and only getting the data you need. But in practice, the benefits do not seem to outweigh the costs.
You can combine multiple queries into one request in such a way that transport layer caching just isn’t effective.
If you’re making a content and read heavy website, you can run GraphQL over GET, and cache traditionally.
The spec says nothing about the transport layer itself.
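For concreteness, a hedged sketch of GraphQL over GET (the endpoint and schema are illustrative):

    // The query rides in the URL, so proxies and browsers can cache it like
    // any other GET. Long queries can hit URL-length limits, which is one
    // reason persisted queries (send a hash instead of the text) exist.
    const query = "{ post(id: 1) { title body } }";
    const res = await fetch("/graphql?query=" + encodeURIComponent(query));
    const { data } = await res.json();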
Why return 200 for a request, when part of it failed?
Separation of concerns. Network layer errors are one thing, application layer errors are another.
They require different resolutions.
With a network layer issue, you could retry, or perhaps you need to fetch a new access token then retry.
For application layer issues, say you request data on 4 different entities, but the service for one of those types is down. Should you chuck out the whole request, or return everything and something like a Problem or Error type for the failed one?
Perhaps you tried to access a field you don’t have permissions for, or require elevated permissions. Should you fail the whole request, or return an error on the field itself, allowing you to inform the user what you need them to do to continue?
The point is precisely that the client processes the error closest to where it’s relevant. This works especially well with component based rendering.
That being said, caching needs to be done at the resolution layer rather than the request layer. Under REST, APIs are generally modeled as returning individual objects or lists of one kind of object, which makes requests a reasonable thing to cache. Under GraphQL, each resolver returns a different type of content, and the mixed bag of content means that there should probably be different cache policies and invalidation for each kind of data provided by each resolver.
The status code being 200 even though your query is wrong or failed to correctly return data makes sense for the same reason. If you got a 500 because one field failed to correctly resolve, but the rest of the query was fine, the 500 is only telling you that something went wrong without letting you know exactly what it is. In GraphQL, we should save the status codes for request-level network issues rather than semantic issues with the request or the response.
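For concreteness, this is roughly the spec's response envelope for a partial failure: the data that resolved travels alongside the errors, with a path pointing at the failed field (field names illustrative):

    // HTTP status: 200. The transport succeeded; one resolver did not.
    const exampleResponse = {
      data: {
        user: { name: "Ada" }, // this part resolved fine
        billing: null,         // this subtree failed
      },
      errors: [
        { message: "billing service unavailable", path: ["billing"] },
      ],
    };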
Sure, you can query over GET, but to a single endpoint - so no ability to utilize built-in browser caching for any requests you want to optimize, unless you offload them from GraphQL.
As for the 500, you can pass error details in the payload just as GraphQL would, so sure, you can get the specifics of what went wrong.
There’s clearly value in GQL for specific use cases but IMO it’s often an early over-optimization that shouldn’t encapsulate your entire data layer until you actually need it for reasons and not just to stay bleeding edge.
To the former point, you’re saying it doesn’t make sense for GQL, I’m saying why fix what isn’t broken? Request level caching works great for certain needs. But I get your point that it doesn’t apply as directly for GQL and yes, people do have these needs also.
To the latter - at least with fetch, reading the response payload is a second blocking call after running the initial request. You can read the status with one less method. Amazing argument for pre-optimization? No, I get it, but it just feels like more overhead in practice to me at least.
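Concretely, the extra hop looks something like this (a sketch; retry and showErrors are hypothetical stand-ins):

    // Hypothetical handlers, to keep the sketch self-contained.
    declare function retry(): void;
    declare function showErrors(errors: unknown): void;

    const res = await fetch("/api/orders");
    if (!res.ok) retry();           // REST-style: the status alone is enough
    const body = await res.json();  // GraphQL-style: must also read the body
    if (body.errors) showErrors(body.errors);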
I think the point has been made clear to me that there is definitely one or more use cases that GraphQL solves for better than REST, and it is still right to pick and choose the right tech for each specific job as it fits best.
GraphQL treats HTTP as a dumb pipe. All semantically relevant information is encoded in the GraphQL messages, and none of the HTTP metadata is relevant to the query posed by the client or the response returned by the server.
In theory, this allows the same processor to be used over alternative transport mechanisms and paradigms (e.g., over MQTT or Kafka topics) without adapting the messages themselves to the transport.
> "In theory, this allows the same processor to be used over alternative transport mechanisms and paradigms (e.g., over MQTT or Kafka topics) without adapting the messages themselves to the transport."
This is something that doesn't get mentioned often, and that a lot of people don't grasp. It's transport-agnostic, which means you can do GraphQL over high-performance transports if you have the requirements.
A recent HN top post by Dan Luu, "In defense of simple architectures", discusses how they do this:
> "Some areas where we’re happy with our choices even though they may not sound like the simplest feasible solution are with our API, where we use GraphQL, with our transport protocols, where we had a custom protocol for a while, and our host management, where we use Kubernetes. For our transport protocols, we used to use a custom protocol that runs on top of UDP, with an SMS and USSD fallback, for the performance reasons described in this talk. With the rollout of HTTP/3, we’ve been able to replace our custom protocol with HTTP/3 and we generally only need USSD for events like the recent internet shutdowns in Mali)."
> "As for using GraphQL, we believe the pros outweigh the cons for us:"
This confused me too, working on the client side. But part of the benefit of gql is you can do multiple queries in one call. Any or none of those could fail.
Is the status code query-specific or request-specific? If I have a resolver that can handle multiple queries, does it matter if one or more queries internally fail? Doesn't that mean the request was either degraded or failed entirely? It seems like extra overhead for the frontend when the queries are just a means to data, which is generally parsed and validated anyway; so aren't query-level statuses within a single request sort of redundant to a frontend?
IMO status code should be request level, but I can see the reasoning you’re presenting about query-level. Interesting thought.
A common scenario where a partial failure may occur is throttling. If you have a client that is permitted to perform 10 queries per second and they submit a GraphQL request with 11 queries, the server is within its rights to return the results of the first 10 queries and an error for the 11th. This would allow the client to retry only the 11th query rather than all 11 (which, if throttling were enforced at the request level, would always result in a 429 response).
Interesting, I see how that would not be as easily replicated with traditional REST. You would either need a bunch of requests or your client would need to be intimately familiar with the API. This leads me to think GQL has some benefits when used for semi-public/cross-team consumption (as opposed to a tight relationship between server and client, or more monolithic apps), as there is more "opinion" in place by design to allow clients less familiar with the intricacies of the API to still use it efficiently. This is also furthered by another commenter pointing out the cross-protocol capability of GQL, as different teams or companies integrating your API may have different needs in that way. Thanks for the info!
GraphQL itself doesn't care whether it arrives over GET or POST. That's an implementation decision. The main problem is the limit on GET query length in some browser and server combinations. There are ways around that though.
The "Graph" part of GraphQL is entirely optional, and IMO not worth the trouble. The best way to use GraphQL is to use it more like an RPC framework, as a replacement for calling generic HTTP endpoints that return JSON blobs. The front end libraries that support it are great and well maintained and you get the additional structure of the GraphQL type system.
It's stretching it to say that GraphQL has a 'specification'. It has a grammar, but the algebra behind it is sketchy. (Just like that JSON spec that doesn't say what the semantics of numbers are.) That is some of why it is popular: a lot of people seem to 'fade out' when they are forced to think rigorously.
(E.g. SPARQL really has an algebra which is well-defined and it gets talked about 1% as much on HN. Compare a good language spec like Common Lisp or Java to an undefined behavior festival like the early C ‘spec’.)
There's no "algebra" behind GraphQL. It's just a type system specification, with no behaviors.
The equivalent would be the TypeScript type system.
I don't bother debating GraphQL with folks anymore because I've learned the hard way there's a lot of misunderstanding.
When people say "GraphQL" they usually mean some particular implementation of a GraphQL API they had a positive or negative experience with, not the idea as a whole.
The specification as a concept is a bit hard to critique because it defines no implementation behavior. It's sort of like saying "REST APIs are slow" or "REST APIs that fetch nested relations produce bad SQL".
GraphQL helped me to understand a very important fact.
Even smart people don't understand stuff that is outside of their niche. And I mean kernel-dev levels of smart. It's not that the tech they're critiquing is really bad; it's just no longer possible for them to think outside their box.
> It's not that the tech they're critiquing is really bad
I think marketing is somewhat to blame here. I work in the GraphQL space and there's not a lot of incentive to do neutral education.
The goal is to get users to associate "GraphQL" with particular products/implementations.
Half of my dayjob consists of explaining to others why $OTHERCOMPANY is not a competitor, because despite us both being GraphQL tools we do orthogonal things. Marketing doesn't help this.
On top of this, GraphQL is (in my opinion) pretty complicated. It's not explainable in a sentence the way RESTful URLs are; it takes a bit longer to grasp.
Despite all of this, GraphQL is the best thing since sliced bread as far as I am concerned and I will continue building services in it until something better comes along.
The same problems happened in the Object Management Group, which standardizes a number of technologies such as UML and CORBA, and applications of those technologies. (The map-vs-territory problem with UML has been addressed by various forms of "Executable UML" such as the Object Constraint Language, which is itself an algebra over UML-modeled objects.)
Look at the specs though and you find they are deliberately designed to be hard to implement.
For instance there is the Meta-Object Facility (MOF), which is great for modelling a set of objects in a language like Java. The point of MOF seems to be that you could bootstrap the whole UML edifice from a very simple foundation, and at the very least have a machine that can build a set of objects to represent both a collection of UML objects that function as a "schema" and a collection of instance objects that are modeled by the schema.
(This is a lot like the vision of the semantic web but going about it a very different way; in fact I am about to open source something that converts MOF models into RDF inside Python and also builds Python stub functions that let you 'call methods' on an RDF node while having access to the objects via SPARQL queries and other RDF tools.)
If you actually try it, however, you find there are some inconsistencies, unresolved circularities, conflation of UML 1 and UML 2 concepts, and other problems you run into.
I'm certain that if I was "on fire" I could bend MOF enough to bootstrap UML-in-RDF in a month or two of working overtime. I've done that kind of thing before and that's just the start of your problem because then you have to market it...
The result is that some incumbents have a "moat" but also that UML is out of the mainstream and addresses a much smaller market than it could.
I'd say GraphQL and Schema.org were both "asymmetric technologies" in the sense of asymmetric warfare but maybe the other way around.
Both of them tackled some of the problem space the "semantic web" tackled (e.g. the linked data idea of do an http request and receive a graph) but in a way that privileged the large organizations that pushed them.
With no semantics, Facebook can return whatever they want from a GraphQL query, whatever is in their commercial interest. (I'd add that they have ethical constraints on top of that involving privacy, spam control, etc.) They have no real concern that anybody else can publish GraphQL, and they are big enough that they can go it alone and be a de facto standard for interacting with Facebook no matter how GraphQL fares in the real world.
When schema.org came out, my wheelhouse was information extraction from Wikipedia, Freebase, things like that, and I was like... there is no 'reification of subjects' in schema.org; it's not really that much better, from my perspective, than extracting facts from text. In fact, if anything, it is more of a way for Google to get a training set for a real text extractor than a way to publish facts Google can use directly.
So I was bearish on it initially but the standard really improved and technology got better both in terms of natural language processing and my understanding of matching engines that can 'reify subjects' by matching graph patterns. I don't see schema.org as difficult to consume now.
To be fair, a kernel dev is so far down their rabbit hole that web services and web tech aren't something they're particularly interested in. They also know intricate details of what my kernel does that I have no hope of ever understanding, or desire to. They excel at what they're passionate about.
It’s a grammar plus some hand waving around types. Not a behavioral spec. It’s much less complete than SQL or SPARQL (e.g. SPARQL is like SQL in that it is based on relational operators but it has a good spec that describes what the operator algebra is exactly)
When post-structuralism burned out I think people would have gotten it that language doesn’t give any insight into behavior, at best people leave words behind like the evidence at the scene of a crime. Grammars are profoundly empty and meaningless.
Grammar + undefined behavior. It’s like saying ANSI C is a specification. It looks like a specification and quacks like a specification but compare it to a quality specification and you are looking into that void Badiou warned you about.
This is what a specification for a query language should look like
It is a little terse and not the easiest read but everything you need to know to write SPARQL or implement a SPARQL engine is in there. It's short.
(Now I would say that SPARQL needs to extend the algebra to deal with ordered collections, but that's what is nice about SPARQL being so well specified. If somebody wants to add a feature to SPARQL it is completely straightforward to amend the algebra AND the grammar, often you don't have to mess with the grammar and the use of namespaces means anybody can add anything.)
The GraphQL 'spec' on the other hand is like the singularity of a black hole... It's a place where computer science breaks down.
It's more than that though. Have you read it? It defines execution behavior (execution and validation algorithms, response shape, etc) and not only the grammar. We can argue about the quality of it all day but it is not just a grammar.
Think of what the Common LISP, Java, or Python specs would be if you deleted all of the specification of the semantics. The horror of it is that people would make languages that look like Common LISP, Java, or Python but they wouldn't interoperate and when you zoomed in on the details they'd all behave in nonsensical ways because, with no guidance to correct semantics, people will make up wrong things.
We live in an age when we are informed by good specifications. ALGOL and PL/I had hopelessly flawed specifications that weren't really implementable... There were lawsuits over COBOL specifications... But Common Lisp and Java were two early languages developed by adults.
In 2022 we should be at least up to a 1984 standard for writing standards.
You voice an opinion in 140 (or so) characters, then there comes someone who's literally written a book on the subject, blames you for the lack of nuance in said 140 characters, invents a few strawmen themselves, then blasts you with a 1000+ -word article.
> That’s not typically the kind of queries a GraphQL execution results in. If anything, naively implemented, GraphQL results in a ton of small queries for every resolver.
This seems like mincing words to me. Any SQL programmer worth their salt knows that if you compose "a ton of small queries" into a single query and let the DB engine do its thing, the execution time will be lower.
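The contrast in question, sketched in TypeScript (the db.query helper, the posts data, and the table names are all hypothetical):

    // Hypothetical driver and data, to keep the sketch self-contained.
    declare const db: { query(sql: string, params?: unknown[]): Promise<unknown[]> };
    declare const posts: { authorId: string; author?: unknown }[];

    // A ton of small queries: one round trip per post.
    for (const post of posts) {
      post.author = await db.query(
        "SELECT * FROM users WHERE id = $1", [post.authorId]);
    }

    // Composed into one query: the engine does a single pass.
    const rows = await db.query(
      "SELECT p.*, u.name AS author_name FROM posts p JOIN users u ON u.id = p.author_id");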
We introduced a GraphQL API 4 years ago and deprecated it 1 year ago.
At the end of the day, GraphQL didn't add much for our operations, and we found ourselves having to maintain both the REST JSON API and the GraphQL API.
We are a small team; we think types/classes are overengineering in our case, and GraphQL felt like types/classes/overengineering.
I wish GraphQL had used the web, or helped standardize a new standard for the web, instead of inventing a new query language, and that types were optional.
I think we need something standard, like a JQL (JavaScript Query Language), or just better conventions for using URLs and JSON for queries.
People seem to be complaining about the server component of GraphQL being too hard to maintain, but I'm curious what language and tools they are using. I find that a code-first approach with a language such as Java can be pretty much as maintainable as something like a REST API, and for me it naturally fits into a GraphQL API as well. Plus, optional server-side calculations (e.g. asking for Fahrenheit instead of Celsius) just don't get executed if they don't need to be. I find the tooling around all of this quite natural (at least for a code-first approach).
Some people want GraphQL to provide more advanced features, such as filtering, or some part of the query involving business logic, when you can just create a specialized query for that. You would likely create a specialized query if you were using a REST API, unless of course you decide to needlessly complicate your REST API.
Consuming GraphQL APIs is also very pleasant. In a React web project I was working on I was able to set up some code generation from a GraphQL schema so I could get type checking in my TypeScript code. I'm sure this is all possible using some sort of schema for your REST API, but it's going to be more difficult. Of course, if you are trying to handle everything in the frontend and just want a blob of JSON data, then GraphQL may not be for you.
GraphQL is trying to put a Menu Query Language on top of Structured Query Language and get the same full-featured API. There are some use cases, but I think the cases where a GraphQL API is the ONLY API are few and far between. The best solutions, to me, are when some endpoints of the API are done through GraphQL while others are still RESTful in design. The GraphQL spec will always be too restrictive to accomplish what developers want to do with it, by its nature.

Graph DBs like Dgraph/Neo4j and other DB layers/services like EdgeDB/Hasura are making GraphQL popular; these usually work beautifully for the ToDo apps but fall apart when you actually start to use them for production systems with more real-world cases. And then we have newer layers coming along, like Outcast, which is trying to be a database with ONLY a GraphQL "query language" to do "everything", and you start to quickly see where the limitations are. The "QL" part of GraphQL is very deceiving to the novice developers who seem to be the main players in the GraphQL world.
If anybody wants to discuss these points and join a discord geared towards full stack development using a varied number of solutions, then I invite you to: GraphDev Discord https://discord.gg/KRPXpfnbUC
Watch the Honeypot documentary about GraphQL. Listen to what the developers say about the problems they had to solve when they invented GraphQL.
You likely do not have any of these problems.
> On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc.
I know, right? It's best to just push all that to the front end, where other people have to do it. Where a 2-year junior dev has to create a saga framework to speak to the 52 web services on the backend because they need a first name alongside an account number.
Then we can complain that the front end is a crazy mess and they don't know what they are doing. If I don't understand the problem, it's not my problem! Plus, Uncle Bob and Martin Fowler haven't written a book about it, so it must not work.
Built a GraphQL backend for our Ruby on Rails application and our front-end developers love it. It works shockingly well when using it with ActiveRecord and your data model matches up easily with what the front-end wants. Added in the goldiloader gem, which handles 90% of our N+1 problems, and it's been pretty peachy. We're in the process of introducing mutations into our app, and that has been a pretty smooth experience as well.
It has some warts, but I'll take those over the absolute monstrosities that the REST-based APIs are.
GraphQL does solve its problems. It's often very fun to use, and can save a lot of effort if adopted correctly.
However, no matter how good or practical it is, you should mind that it's not "technology" after all. It's just an industrial solution that adopts a (relatively) new development pattern. It doesn't solve any complexities inherent to web technologies, but merely re-adjusts developers' responsibilities (= shifting burdens to consumers). This is a human problem (though still an engineering problem), and NOT a technical problem.
Thus one must approach GraphQL accordingly: does our "organization" need to re-adjust developers' responsibilities? Can we benefit from changing our development pattern/workflow?
This is not an easy question to answer, but there can be some noticeably easy cases. Say, if your FE is relatively huge/fast so that BE people simply can't catch up with it, GraphQL will be a fantastic solution. If there are too many brain-dead CRUD APIs that keep sucking up BE engineering resources (= low ROI) and causing stupid down times and development delays, GraphQL can save you.
So, when you review GraphQL, it should be reviewed at the organizational level. You should find the issues in the workflow, not in the current implementation of your service. You should struggle to improve the speed of development and to save engineering cost. In the end, you should NOT think of this as a "better technology".
In my first interaction with GraphQL, the developer who introduced it tried to make it TOO smart for its own good. It was building complex SQL dynamically, which meant the SQL it was running was borderline non-deterministic.
(I.e., it identified the foreign keys and primary keys, linked them together, and made all other fields optional.)
Things I don't like about GraphQL (in Java, which was my experience):
1. Debugging it is annoying. There's no clear, concise way to follow the code; there seems to be some magic where it isn't clear when certain parts of the code get invoked.
Things I like about it.
1. Lets you make multiple queries and reduce/extend the size of your payload as needed.
In my view GraphQL is much better when you're not constrained by a SQL-like backend. It's great at filtering the payloads, which is great for mobile and such. It also allows you to make multiple calls in one go, which also means you can shoot yourself in the foot if you overdo it.
I will call out that some of this is trauma from my last experience. Having a more dynamic language that isn't Java may make the experience better, but in general, every time we had to update the GraphQL code it was cringy.
Eventually we started to gut the dynamic SQL, replacing it with a simple query, and then used GraphQL to trim the response, which worked out much better.
Generally the question to ask is how many iterations of an endpoint do you need and is it worth introducing a new technology vs just having a few query parameters to do some filtering.
That being said, I'm now looking at some Query language to work with Neo4J so I'm back at looking at dynamic APIs. (:
> Lets you make multiple queries and reduce/extend the size of your payload as needed.
While this is cool, in theory, I haven't found it to be in practice. If it's an internal API, you can just provide a way for the client to get exactly what they need in a single query. If it's an external API, you have to deal with putting limits in place to keep users from burdening the system with complicated requests. Limits can become complex very quickly.
I am also considering working with Neo4j for a project.
I am wondering, why not build on top of their HTTP API? You can send multiple Cypher statements over HTTP, and correct me if I am totally wrong, but by "stealing" the concept of GraphQL's persisted queries, you could make use of HTTP caching too.
Just to make the conversation easier, the use case I have is a network topology, so you have things like routers, switches, ports, etc. If you take it all the way up to Layer 7 (Application) you can have, say, web services and so on. So you could in theory ask: link X was cut; what is affected?
Anyways... my basic POC was exposing endpoints so I can do things like get a list of all devices, but I'm basically just writing custom Cypher code to do that query, and the benefits of Neo4j basically go out the door. It still has some interesting graph features, but if all I'm doing is writing a custom endpoint for every use case it's mainly pointless.
You can do a simple POST endpoint which takes a Neo4j query and executes it, with some caching on top of it, for sure.
Either way, in order to make Neo4j worth it I need a way to make the queries more dynamic. So right now I'm thinking over a few options.
One is just having a dumb POST /custom/query that maybe only supports read operations. You can add a layer of auth, but I'm not a big fan of having some endpoint that's basically a pipe to Neo4j. It feels just as bad as saying: type any SQL here and we'll execute it on the server. If people know what they're doing that's fine... but at that point just set up phpMyAdmin/pgAdmin. You're trusting folks to know what they're doing, and if folks accidentally drop Bobby Tables (https://xkcd.com/327/) then it's an accepted risk.
If you just have a proxy to run any Cypher query, you might as well just give users a Neo4j web instance and let them play there.
Anyways, still in the early stages of trying to figure out how to best leverage Neo4j.
You could give a POST /custom/query with only read capabilities.
Also, you can provide some custom "helper" operations alongside it.
And certainly you could give them some option to prettify the response.
Because querying the correct things is one thing; structuring the response according to your needs is another thing entirely.
Yeah, this was years ago, but we had a concept of 'hydrated' objects, so you'd pass a flag to get back either a shallow object or the hydrated version that had all the relationships loaded as well.
Some flags for helpers that fetch additional data would be good.
What I really like about GraphQL is that it completely gets rid of the silly verb questions such as:
- POST or PUT or PATCH?
- Why can't my GET or DELETE requests have a JSON body?
It is just a better interface, even if it were to be used exactly as REST without any nesting. I used the Apollo server in Node.js and it was so natural to make parallel queries and let the query planner take care of the parallelism for me...
If you don't like the verbs you can always just use POST for everything...
My experience with GraphQL was a pain. We could have reduced queries if we changed our front end dev patterns, but we didn't and had the same number of queries as a REST pattern would have provided. Our GraphQL backend was a cobbled together mess of microservices that struggled with performance and stability. Queries could fail and we often wouldn't know why and bugs persisted for months despite work on them. We couldn't do simple things like cache queries to be reused between application launches without backflips and juggling expertise, Apollo client and all... And on and on and on.
Granted, not the best engineering shop. But in poor shops I find REST is simpler and thus less broken in practice.
Read the wall of comments... I think GraphQL is great for one primary reason: schema.graphql files. Having strongly typed payload definitions that integrate with languages like TypeScript is beneficial to frontend developers in the same way that protobuf schemas are beneficial to backend developers.

I know REST APIs can accomplish the same degree of type safety using OpenAPI/Swagger schemas. Honestly, REST APIs that have these schemas start to look a lot like GraphQL APIs. To that end, using GraphQL really becomes a question of preference in using GraphQL's schema definition language versus JSON as a schema definition language. In actuality, GraphQL schemas can be defined using JSON!

And so this brings me to my conclusion about GraphQL. GraphQL is just a set of tools and conventions for implementing strongly typed REST APIs with JSON payloads. Who doesn't want that?
well, it is a language for describing interfaces. And a decent one.
What someone uses it for, and how, is entirely different matter.
For a well-thought-out interface, you will need some language-design thinking. (Of course one can express a well-thought-out interface in anything, be it REST, SOAP, CORBA..., but in GraphQL it is easier and more consistent. And yes, GraphQL is self-documenting... no swaggering around.)
For example, I have made a more-or-less generic django-orm wrapper, with all the bells and whistles I needed: queries, paging, and whatnot (the idea came from graph.cool, reshaped and taken further as "languageness"). And yes, a hand-made, fairly simple client side as well, none of the usual bloat.
Yeah, it is a big investment upfront. But after that, the work per object type is near zero. Before this, the same interface-as-language was made as REST, and it was cumbersome and fragile (there are no types or syntax there, only assumptions).
I don't like GraphQL because most of the time I don't know which fields I need until I've been using it for a long time. I don't know if it's just the implementations I've worked with, but none have had a "just give me everything" option.
I really just want rest plus a standardized way to query.
They have UI helper tools with auto-completion; it's not a complete answer, but it should make life easier. The lack of select(*) support seems to be an intentional choice, unfortunately.
I have only used GraphQL when trying to find out whether a particular GitHub user was my sponsor, and it fails at that: you need to list all sponsors and iterate through them manually.
My solution to databases is JSON over HTTP, on disk, one file per "value" or "row"; this means you need EXT4 with type=small to have enough inodes.
Today I tried to solve the same problem as the GitHub one on my own DB: you would have a meta index for sponsorship, user-user, and then just query:
To see if marc sponsors marcus: this took 2 minutes (refreshing my memory; otherwise 2 seconds), it scales, and I did not have to alter anything. It just works.
Why? It describes what the question is... meta (the type of storage); user/user is self-explanatory; and marc/marcus is also simple... If you consider what this replaces, it will feel less nauseating, maybe?
Like many things, it might be simple and intuitive _once you already know how it works_. Looking at it for the first time, my reaction was similar to that of the GP. My $0.02: if the question is "using meta storage, does user marcus sponsor user marc?" I would propose a URL like meta/users/{username}/sponsors/{username}, or meta/users/marc/sponsors/marcus. To me, at least, that URL states a few things:
* meta/users/marc <- the user we're querying, and where we can get his info
* meta/users/marc/sponsors <- every user who is a sponsor of marc
* meta/users/marc/sponsors/marcus <- is marcus one of marc's sponsors?
Of course it wouldn't be a forum oriented towards developers if 10 users didn't hold a dozen conflicting opinions.
The user/user is a "node type" to "node type" thing: it can be user/item f.ex.
The marc/marcus is related to those so for user/item it could be marc/sword f.ex.
The reason it's that way is that that's how it needs to be stored on disk.
As for the detail data (the metadata the relation has, subordinate to the relationship of the nodes user/user): you can put anything that marc/marcus have in common in that JSON file. (I just added that we used to be colleagues.)
Even if it feels awkward when you are used to wasteful structures, once you know this is the least complex solution you will be thankful, especially in the long run, since you won't have to learn anything else ever again.
In this case, "sponsors" is an extra tricky word, because in English you can say "X sponsors Y" or "one of Y's sponsors is X", and you can't tell from just the path which one was meant.
Good callout! Using verbiage like "sponsoredby" or "sponsorof" in the url could help clarify the relationship and make it read more literally, i.e. to answer "does marcus sponsor marc" you could use meta/users/marc/sponsoredby/marcus or meta/users/marcus/sponsorof/marc.
Yeah, I used "fund": 1 in the user/user/marc/marcus file... then you just have to decide the direction, and everywhere except Arabic it's left to right... so marc funds marcus.
And user/user/marcus/marc has it if marcus funds marc.
GraphQL isn't a trap if you just want basic crud functionality, and you use off the shelf tools like Postgraphile/Hasura/Apollo to implement it quickly. Trying to go all in on it is probably a mistake in most cases though.
I've never used GraphQL extensively in my projects, but my main concern would be rate limiting and malicious queries. It must be a lot harder than with REST, but I'm guessing the industry found a way to work around that.
I feel like, as with a lot of tools, graphql is really useful if you’re at the scale of having 1000s of different object types. If you’re only dealing with a handful of resources, a simple rest api is way less headache.
I've worked on four GraphQL projects. Three were backed by a relational database, and were not really that enjoyable. One was backed by a graph database (Dgraph), and was a delight: No data loaders, no N+1 problems, and in some cases no resolvers (Dgraph's query language was a superset of GraphQL, so sometimes we could just feed it the client request directly)!
I've had my fill of GraphQL, and won't be sad if I never use it again.
Shopify's API shows what the minimum ante is for a public-facing GraphQL API:
* Exotic rate limiting based on the quantity of data returned, not on the number of calls made over time. As a client, it is nearly impossible to throttle your code in any way other than "try and maybe it fails sometimes".
* Tortured graph structure with edges/node/cursor levels in the tree. Navigation is a pain in the ass. See code sample below.
Furthermore, you must provide root query navigation for every object; Shopify doesn't do this. So you end up with nearly impossible places to query, because you need to cursor into a parent to find a particular node, then cursor into its children to get to the right node. There's no limit to the crazy. Building complex shipping profiles with Shopify's API is a nightmare.
Here's a real-world query I pulled out of my code:
query ($id: ID!, $zoneCursor: String) {
deliveryProfile(id: $id) {
id
profileLocationGroups {
locationGroupZones(first: 1, after: $zoneCursor) {
pageInfo {
hasNextPage
}
edges {
cursor
node {
zone {
id
name
countries {
code {
countryCode
}
}
}
methodDefinitions(first: 150) {
pageInfo {
hasNextPage
}
edges {
node {
id
name
active
methodConditions {
operator
field
conditionCriteria {
... on Weight {
unit
value
}
... on MoneyV2 {
amount
currencyCode
}
}
}
rateProvider {
... on DeliveryRateDefinition {
price {
amount
currencyCode
}
}
}
}
}
}
}
}
}
}
}
}
I would much, much rather do this with a series of REST calls. Stripe's API is much more pleasant.
I never used GraphQL to access a database or do anything with a database; I honestly don't see the value there. I only used it to replace REST APIs. It's a great way to define a typed API and let tools generate code, so you basically use it like RPC and don't worry about validation, route handling, etc. Calling the API becomes as simple as calling a function.
GraphQL is really useful paired with a tool like Genql [0] that creates a JS library with auto-completion and type safety; this makes discovering and using the API much easier and faster.
1) Front-end devs can often write novel queries without involving back-end devs, if querying existing data types.
2) Clients can request only the fields they need, rather than requesting giant payloads only to use a couple of fields.
Since GraphQL is a bit more standardized than REST, client libraries can offer other features like optimistic updates, intelligent caching, and offline reads.
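Point (2) in miniature, with an illustrative schema; the client names exactly the fields it wants:

    const QUERY = `
      query {
        user(id: "42") {
          name
          avatarUrl
        }
      }
    `;
    // Response: { "data": { "user": { "name": "...", "avatarUrl": "..." } } }
    // and nothing else, however wide the User type is on the server.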
GraphQL reminds me of XML or SOAP: too much overhead, too complicated, and user-unfriendly for most projects, especially small ones. JSON/YAML beat XML; what will beat GraphQL? We need a modern, simple, easy-to-use, extendable REST/RPC interface.
It just needs better middleware to handle all the BS. The problem it solves makes it extremely useful (if implemented correctly... that's a big if, I know).
My gut says it's likely the leakiest of all possible abstractions. I'm not sure in reality, but it makes me feel that the "logic" is in the wrong place. Queries should ideally be designed; there's a purpose to that API...
It's an important point that GraphQL makes things easier for the user, not the server. You still have to do the work, which is sometimes harder because you need to take care of edge cases and performance potholes (in case you miss something that lets your code run away with a heavy query).
The only difference between REST and GraphQL from the server side is the interface. The underlying work is the same.
"But we can connect multiple datasources into one query!" - REST does not stop you, it's done all the time. What is the challenge, exactly?
"But we can specify only the fields that we want" - yeah, it's called URL arguments. Many services already do that with their REST API's, it's just less enjoyable to use.
Stop treating GraphQL as some revolutionary tech. It does not make something possible that was not possible before. You could still create web pages that looked great before CSS was a thing.
It's not that GraphQL enables anything that was not possible before; it's that GraphQL provides a bit more structure and standardization around these things. If you plan to do this stuff, why not follow something with a spec and various bits of tooling rather than doing it ad-hoc?
In every company I've worked at, there has been a group pulling for GraphQL adoption. I've gotten involved and made it to having some well fleshed out graphs. Still, I find the ergonomics lacking. Making POST requests with large bodies of JSON-ish queries that you have to meticulously craft from some hard-to-understand docs is way harder than an API call should be.
And this is JUST for a query language. It's not like the server code comes for free. There are still all the same complexities of joining this with that, filtering on foo, limiting to X results, projecting the output fields. It's super complex on the backend, yet people think GraphQL can give them exactly what they need for free.
If there is some amazing benefit to FE developers it's lost on me. When I go to use an API and find that I have to use graphql I just groan and move on.
Yes, it is a trap. I completely skipped it in my backends. Customers want particular data presented in a particular way from our backends, not some generic database to play with while wasting time and money.
Here's a list of every advantage I've heard about GraphQL. All of these advantages were used as evidence to push a transition internally at my organization. Many regret it at this point, not to spoil the conclusion, but:
(1) It's schema-driven. Schemas for your APIs are very good.
Sure are. And this argument is always listed with no hint of irony, or even recognition that there are dozens of schema-driven API systems out there. GraphQL is one. gRPC, protobuf, JSON-RPC, OpenAPI; the list goes on.
And so, realistically, where you hear this is from people who worked at an organization that wasn't codifying its APIs with a schema, and now they want one. GraphQL has schemas, so it's an advantage of GraphQL; but this advantage needs to be rephrased to really mean: schemas are good, and there are many ways we can get a schema.
(2) (Extension of the last) We can use the schemas to auto-gen client libraries and server adapters; look at this cool graphql-codegen tool.
Same response. I can't even stress how many times I've had this conversation. It's like we're relearning the same thing every three years as people enter and leave the industry, without any recognition of existing, far more mature, stable, and excellent tooling.
(3) There's so much query boilerplate around every REST API we have. Pagination, filtering, selection, querying, reduction, analytics, tokenization; GraphQL solves that for us.
It doesn't. It solves Field Selection. That's it. Your new GraphQL API will have 90% of all the same boilerplate.
(4) It makes frontend life easier, because we don't have to rely on the backend to add new fields or relationships.
Yes and no. Our experience has been this: data is more immediately available to frontend teams, yes. If they need a relationship between Users and Comments, that probably already exists, and they can get it in one call; it's not a bad setup from that perspective.
But really drive into the advantage here: it's not that the relationship exists in GraphQL and wouldn't in REST. Of course it would exist. It's that it exists in one call. That's literally the only advantage GraphQL gives; not that it exposes more data, but that it grants the API the ability to construct dynamic views on the data.
So, drive further. Views. Why would we use a view in, say, Postgres? Usually performance. The issue, and the reason this is only a partially strong argument for GraphQL, is that the dynamic views GraphQL lets the frontend generate are non-optimized the majority of the time. After all, it's combinatorial; the API team can't predict every way the data will be accessed.
Well, the funny thing is: they can't predict it, but they can still optimize these views, by being a dependency of frontend team work. Congratulations; we're back to square one; and this is LITERALLY how I've seen 80% of frontend projects play out on GQL. Either some data wasn't on the graph and we need it; or a necessary mutation wasn't there; or the data is there, and it worked locally, but in production for customer X it's too slow, so we need backend optimizations.
(5) (Extension) These dynamic views save on internet round-trips.
This is a complex one. If you want the spoilers: distributed systems are insanely complex, and I'm hesitant to say GraphQL is "better"; it's just different, maybe better in some situations, maybe worse in others.
First: there are non-zero advantages to having the front-end make two-plus API calls (especially if they can be done in parallel, which isn't always possible, I fully recognize). The main one is horizontal scaling; the difference between a GraphQL request 3 layers deep and 3 API calls is that those 3 API calls can be handled by 3 different service replicas.
We have, on the API team, on too many occasions to count, done an audit of the GraphQL requests hitting our servers and found absolute monstrosities which destroy P95 response times. We may, then, implement some kind of limiter; you can only go X layers deep, or select Y number of fields, or whatever. These are, to my eyes, weird! It's like saying "yeah, GraphQL is open-ended, request whatever you want, combine two calls into one; uh oh, you requested too much, you must have missed that sidenote in our documentation, now you have to make two serial calls, and we're back to REST". You can blame this on a poorly designed API; but that's kind of my point; GraphQL encourages poorly designed APIs.
To be clear, so does REST. Again, my point: GraphQL isn't better.
I say "too many occasions to count" because determining what a "good" GraphQL request "limiter" is, is an intractable problem. Should we allow 3 layers deep? 4? 20 fields per object? Doesn't it differ depending on the object; some are more expensive than others? How do we encode that "expensiveness"? There are tools which help with this; some of these ask you to assign a "expensiveness" score to every query, mutation, object, etc, and then it allows each incoming query an aggregate "maximum expensiveness" before rejecting the request. Literally. Try communicating that to your users! How?!
Moreover, if it wasn't already complicated; its so, so difficult to know where the limiter should be placed, for a given API, before the API is already in use and "broken" by someone. Think about that for a second; we have to circle back and say "yeah, we know that request was ok, if not slow, yesterday; but today we have to break our API for you." This is GraphQL. Its a constant battle due to the combinatorial expansion of ways users can access your API.
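For the curious, the aggregate-expensiveness scheme described above boils down to something like this sketch:

    // Per-field costs and the budget are made-up numbers; real tools walk
    // the parsed query AST rather than a flat list of field names.
    const fieldCost: Record<string, number> = { user: 1, comments: 5, search: 25 };
    const MAX_COST = 100;

    function estimateCost(requestedFields: string[]): number {
      return requestedFields.reduce((sum, f) => sum + (fieldCost[f] ?? 1), 0);
    }

    if (estimateCost(["user", "comments", "search"]) > MAX_COST) {
      // Now explain to an API consumer why yesterday's query is rejected today.
      throw new Error("Query exceeds cost budget");
    }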
The second point I want to drive into here is caching. It's very clear that one internet round trip is better than two, all else being equal. But GraphQL doesn't make all else equal. One of the fantastic things about more standard HTTP APIs is their high cacheability. GraphQL has a strong client caching story (though every six months our frontend teams go heads-down upgrading to the latest major version of whatever caching library we use; it's a nightmare for them). But what about edge caching?
GraphQL is DESIGNED to not be cacheable. Edge caches can't parse the internal structure of the request bodies (and even if they could, it's unclear they'd want to; it's such a complex language). Identical requests can be structured entirely differently: did they order fields differently, name the operation differently, put parameters in different places, etc.? HTTP APIs can experience some of this (example: query parameter positioning), but most edge caches can handle that; edge caches can't handle GraphQL.
So the question really isn't: is one network hop better than two? It's: is two network hops, to an edge cache ten miles away with an 80% hit rate, better than one to my data center in Ashburn, Virginia? NOW it's less clear that one is better than the other. There are too many variables to say for sure.
Here's some other random negatives of GraphQL I'll throw on:
* N+1 queries by default. Someone will link some other library you have to add on which "fixes" this (where fixing it really means adding more work for the backend teams on every new API, every library update, every library CVE; at some point we'll figure out that more code rarely solves problems on any time horizon except "right now").
* There are practically speaking no mature GraphQL servers outside of javascript. I legitimately don't know any major organizations who write the majority of their backend code in javascript. I know tons of minor organizations who do, because their hiring problems outweigh their technical ones. The major orgs I know who deploy GraphQL APIs do a gateway pattern; a thin shim of JS in front of their other networked traditional APIs. Ok, fine; not exactly transformative. Schema stitching is a joke; a solution in search of a problem extremely few have.
* That being the case: in Apollo Server + TypeScript, resolvers are extremely difficult to get strong typing on, and it's extraordinarily difficult to get contextual information about other layers of the request within a resolver (see the sketch below). "What object did my parent's parent's resolver resolve to?" sometimes needs to be answered. In REST-land, it would just be two or three requests; the answer is in the grandchild request's parameters, orchestrated by the frontend. This is easy. But the frontend wants to turn an easy problem into a problem that requires no thought, and in doing so forces the backend to turn an easy problem into a difficult one.
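Roughly the resolver shape being complained about:

    // AppContext and the loader are hypothetical; the 4-argument resolver
    // shape (parent, args, context, info) is the graphql-js convention.
    type AppContext = {
      loaders: { user: { load(id: string): Promise<unknown> } };
    };

    const resolvers = {
      Comment: {
        // `parent` is only the immediate parent's result; anything from
        // higher up the tree has to be smuggled in through `context`.
        author: (parent: { authorId: string }, _args: unknown, context: AppContext) =>
          context.loaders.user.load(parent.authorId),
      },
    };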
Look, this is a pattern you see all over GraphQL once you start using it: it's invented by frontend people to make their lives easier, at any expense, including raising the net complexity of the overall system. It's a layer on top of HTTP. So, to convince me to adopt it over standard HTTP, it needs to simply be better. And I'm not convinced of that. I was when we adopted it. But we've learned better since then.
The author never had to use nested queries/pagination that need to perform well, so he just states that the problem does not exist. Well, that's one way to address it...
It really sounds like NoSQL reincarnated. People jump on the bandwagon, then oops, this wasn't the ride we were looking for; just as it was with Git, except by then it was too late to get off a bandwagon already running at full speed.
I agree; CRUD on the backend becomes trivial and it scales pretty well.
I had a very pleasant experience bolting Hasura on top of an existing Postgres DB with a couple hundred tables and about 1 TB of data. The query performance was fine, and actually better in a few places.
Across the board, there was a huge boost in render times from avoiding the verbose round trips of HATEOAS-style discovery.
The only annoyance I had is Hasura not having a proper caching layer or replica support in the OSS version. Even on their cloud, caching is limited to 100 KB and does not support session variables; almost all queries in the app are user-specific, so without session variables it is not useful at all.
However, that can be handled by putting a proxy with caching capabilities in front of Postgres.
---
Having said that, I am not yet confident that it will scale well for public APIs with arbitrary query patterns unless more limits on nesting etc. can be enforced, but for most internal APIs it should be great.
Hasura also has support for REST endpoints in 2.x, so that is an option.
If selling/delivering an API is the primary value of the team/business, perhaps no generated backend will solve all the challenges out of the box.
GraphQL allows backend teams to retain control on how data is aggregated, and doesn't couple the database schema to the API's representation of the data. The frontend can get exactly what it needs when it needs it, without letting frontend engineers go totally wild and take down the DB (unless the backend engineers let them).
Why not expose a replicated database with read-only permissions? On the write side, are people really making use of the flexibility of GraphQL? Because I can't think of a great use case for that, versus just opening up your database.
Avoiding coupling to the database schema is reasonable, but IME, APIs get stuck in backwards-compatibility land and stop making sense anyway. Maybe this is better solved with good views on the read side, with the write side open for innovation at the database level?
Like people advocating SQLite in busy prod settings, this is squarely "a few rare people swear by this" territory. I do think it's underused, but it's a fair bit of a footgun (similar to running SQLite at its very limits, IMO).
My problem with GraphQL is that it makes part of the stack opaque for backend developers. You can't just inspect network requests in the browser; parsing GraphQL and frontend code to understand what's being requested, and how, is a huge pain for debugging.
I have some experience using Hasura.
I've had a fantastic experience using GraphQL on the front end, but have found it confusing and challenging to configure everything properly on the back end.
Hasura takes care of all the back-end setup; you simply use it to configure your own API. You can either self-host, or use them as a service (I believe they're built on top of AWS).
For more context, I used it to build a React Native app, and used Apollo codegen to generate TypeScript types for all my queries, mutations, and subscriptions.
The free version doesn't support response caching, only the enterprise one. If you reach any kind of scale, it's a one-way ticket to enterprise or a complete rewrite, at least last I checked.
I too wish Hasura supported query caching in OSS. Not sure you need a rewrite exactly, though. You could put something like GraphCDN in front of it, or write your own cache layer that works similarly to GraphCDN.
I think the poster might have meant to reply to the top comment:
> "On the other hand, when you are the one to implement the Graphql server, it feels like writing your own database. You have to create a query plan, handle optimizations, corner cases, etc."
[0] https://trpc.io/