> A couple years later, they want to upgrade the validator now that supports the 2025 specification. However in that 2025 specification, we added database-field-id as a keyword, and its value is expected to be a string. Suddenly the user's schema is no longer valid. We've broken a user by adding a new keyword.
No, you haven't. They chose to change. Same as if they changed to another schema format altogether (e.g. XML). Is this implying a user is going to commit to a schema (pragmatic laziness notwithstanding) that they haven't read? Regardless, their following solution baffles...
> Therefore, no keyword addition can be considered safe, making forward compatibility impossible to guarantee.
Correct.
> Using the vocabulary requires writing a custom meta-schema that adds the vocabulary URI and references the associated meta-schema, and referencing that custom meta-schema in the schema.
I just can't fathom this ceaseless drive toward cruft. Now a meta-vocabulary (a sub-schema). That's supposed to be better? Change will come, in 10 years, 50 years, etc. Semantic versioning normalizes it in the json-schema itself. It's as good a system as any and definitely better than adding another versioned dependency.
> But many people also said they would be less bothered if there was a defined migration path and tooling to help.
If things were left as-is (allowing unknown keywords), it would be straightforward to write a tool that checks "upgrade compatibility" for a schema. If it's using keywords that were unknown under the old JSON Schema version but are now supported, the schema must be fixed before the JSON Schema version can be upgraded.
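To make that concrete, here is a rough sketch of what such a check could look like in Python (the NEW_KEYWORDS_2025 set and the example schema are made up for illustration; a real tool would also need to know which positions in a schema are keyword positions rather than property names):

    # Hypothetical sketch of an "upgrade compatibility" check for a schema.
    # NEW_KEYWORDS_2025 is an assumed set of keywords added by the newer spec;
    # a real tool would take it from the spec's changelog or meta-schema diff.
    NEW_KEYWORDS_2025 = {"database-field-id"}

    def find_upgrade_conflicts(schema, path=""):
        """Walk a schema and report keys that a newer draft now defines."""
        conflicts = []
        if isinstance(schema, dict):
            for key, value in schema.items():
                here = f"{path}/{key}"
                if key in NEW_KEYWORDS_2025:
                    conflicts.append(here)
                conflicts.extend(find_upgrade_conflicts(value, here))
        elif isinstance(schema, list):
            for i, item in enumerate(schema):
                conflicts.extend(find_upgrade_conflicts(item, f"{path}/{i}"))
        return conflicts

    schema = {"type": "object", "properties": {"id": {"database-field-id": 42}}}
    print(find_upgrade_conflicts(schema))  # ['/properties/id/database-field-id']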
Adding comments to JSON is the one breaking change the world needs. As a parser, if you stumble over a comment it's your fault! Nothing in the world could stop you from just sanely ignoring comments.
JSON5 also allows extra things like single-quoting strings and multiline strings. VS Code uses JSONC, which is strictly trailing commas and comments on top of JSON.
Counterpoint: In practically every language that supports comments, these "comments" are now used for language extensions (IDE folding markers, eslint-ignore, FPGA synthesis hints, ...)
What exactly is wrong with that? Comment hints like that are quite common and with a few exceptions (synthesis hints) they are all optional hints that have no effect on the main interpretation of the code. IDE folding markers aren't going to hurt anyone. Chrome doesn't care about lint waivers. Etc.
This is a clear case of blaming the tool for its misuse. A hammer helps you build things, but you can also break your thumb with it. Yet many people keep hammers at home.
When you're designing a specification (I mean, any code, really, but specifications especially), anticipating how users will use it and trying to guide them down the path of -not- misusing it is part of getting the design right.
Sometimes deliberately leaving a potential feature out makes for a better end result. Sometimes it doesn't. But either way it's better to make the choice deliberately.
I do see what you're getting at, but in the SQL case an unhinted statement can potentially produce such a terrible plan that it invariably times out and triggers a 500 from the service making the query.
This sort of thing somewhat blurs the boundaries between affecting performance and affecting behaviour.
I'm not claiming that there's an obviously right answer here, only that there can be more questions about whether something is a good idea on net than people always consider.
Note: It would not surprise me to find that the actual motivation for leaving comments out of JSON went something like:
1) We can't have line comments because we don't want to be newline sensitive that way
2) If we do /* ... */ style comments people -will- write incompatible parsers for them no matter how carefully we specify them
3) Argh
but I wasn't there at the time, so the question will have to remain open.
> I think folding/lint/etc. markers are poor examples here, because they're
> basically annotations for other tools which seems entirely fine.
>
> However comments that affect the behaviour of the code itself are really
> much less fun to deal with.
I think there is no clear distinction between "hints for other tools" and "the behavior of the code itself" unless you implicitly assume a set of tools that define "the behavior of the code itself".
You could argue that HDL synthesis hints don't change the behavior of the code, nor do SQL optimizer hints.
I understand why you think that IDE folding hints are poor examples, but you would still force unrelated tools to leave those "comments" intact, possibly in cases where there is no clear definition of "intact" (massive transformations to the JSON structure). xkcd "spacebar heating" applies: https://xkcd.com/1172/
> For a format designed to be human readable, JSON not supporting comments is a major let down.
Comments were an explicit anti-feature. Douglas Crockford in 2012:
> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.
A choice can be a good choice and still lead to worse results.
As things are now, trailing commas and comments would make JSON a better format; whether leaving them out was a good compromise at the time is not really the point.
JSON supports numbers of arbitrary size and precision just fine. It's just that many readers/writers of JSON do not, and those are so ubiquitous that it's often not practical to rely on.
Unfortunately in practice you can’t rely on large numbers being passed through correctly because of this, unless you control both ends and don’t use js.
So, if a schema identifies as conforming to the 2020-12 meta-schema, why would a new "database-field-id" keyword in the 2025 upgrade be an issue? The 2025 parsers should know that it's not a keyword in the 2020-12 meta-schema, and treat it the same as they always did, shouldn't they?
But also - as an alternate to underscore prefixes, could new keywords just reuse the "$" prefix as in "$schema"?
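For what it's worth, that is how current tooling behaves: unknown keywords are simply ignored. A quick sketch with the Python jsonschema library, using the article's hypothetical database-field-id keyword (the value here is just illustrative):

    from jsonschema.validators import validator_for

    schema = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {"id": {"type": "integer", "database-field-id": "users.id"}},
    }

    # validator_for() picks the validator class from the declared "$schema";
    # under 2020-12 the unknown "database-field-id" keyword is simply ignored.
    Validator = validator_for(schema)
    Validator.check_schema(schema)
    print(Validator(schema).is_valid({"id": 1}))  # True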
XML Schema is a fiendishly complex, awkward, and verbose standard, but it is very much in use. Personally, I hate it, but I have used it fairly often.
I would like it to succeed, and will use it, if it does; even if it’s ugly. If it doesn’t work, I won’t use it.
My experience tells me that it can be next to impossible to predict what will and won’t work, and there’s probably no replacement for actually putting it out there and trying.
My experience also tells me that the most certain way to ensure failure, is to try predicting the future, and engineering for a very specific one.
Personally, I have had good luck, with an “heuristic” approach, where my designs act as a “lattice” for growth, and I evolve it, as I see the direction the project takes.
The main thing, in my experience, is to never have to go back and change stuff that has already been done. I may do new stuff, in the future, that breaks the API, but I should always honor The Old Ways.
> The main thing, in my experience, is to never have to go back and change stuff that has already been done. I may do new stuff, in the future, that breaks the API, but I should always honor The Old Ways.
I just realized that sentence doesn't make sense.
I meant that people will build on an API, and make assumptions to "fill in the gaps."
As long as there's nothing in the current API that "codifies" these assumptions, I consider it OK, to add stuff that breaks the assumptions, but I won't go and make changes to stuff I've codified.
An example is some work I was doing, with an Apple UIKit API. I was calling a set of functions in a particular order. Apple released a system update that reset one of the settings, so I needed to move the call that affected that setting, to the end of the chain. They never specified the order (nor did they do so, after that, even though they probably should have), so I had no room to complain.
This takes JSON schema from a fairly lightweight, flexible definition to one of the most rigid. As much as I have a hard time with dynamic languages, I can't get with this. Feels contrary to the spirit of the JS ecosystem in the first place.
Makes CUE a much more attractive way to directly validate JSON - you can declare whether structs are open or closed as part of the definition.
My issue with Cue is that there's no actual way to use it as a schema. At least I couldn't find a way to have one cue document link to another cue document so that e.g. IDEs can use the information.
Also the only language Cue supports is Go at the moment, which makes it a bit of a non-starter IMO.
It does have a quite elegant design, but I'm not sure it really solves enough to be worth the hassle, especially given that it doesn't attempt to solve the "make config less tedious using functions" problem.
We have union types defined in fastapi/pydantic and they do show up in the api explorer. Tagged unions don't work but I think that's more to do with us/python than openapi.
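For anyone curious, a minimal sketch of that kind of setup with FastAPI/pydantic (the model names and route are invented):

    from typing import Union
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Card(BaseModel):
        number: str

    class BankTransfer(BaseModel):
        iban: str

    # The union ends up in the generated OpenAPI as an anyOf of both models,
    # which is what the interactive API explorer renders.
    @app.get("/payment-method", response_model=Union[Card, BankTransfer])
    def payment_method() -> Union[Card, BankTransfer]:
        return Card(number="4242")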
Ah, that's nice. I think a year or so ago I was dealing with api docs that'd been run through Swagger and it just stopped emitting fields when it hit union types.
I could never bring myself to take JSON and family seriously. This is just another indication. I mean, they didn't consider that it's not possible to take things that escape validation today and make them validate in a future version? And these are the people who are making the validator software?
Instead of making a brutally narrow spec that snubs ease of use & extensibility, just to create a "last breaking change" forever spec, the spec should definitely just keep dropping new major versions when it needs to change.
This is an ugly & rough conceitedness proposed here: no one can do anything, because we reserve the right to do anything, and we don't want to have to drop a new major version to do it.
Craven attitude. Building ultra-strict narrow protocols like this should be highly, highly discouraged. This is such a mean & awful tradeoff to foist upon people; I can't see any advantage to making a brittle, narrow, but forever "safe" little path like this. Be braver.
It's a terrible & self-interested & small-minded design philosophy making awful trade-offs. It's an actively dangerous way of building, to the exclusion of allowing possibility, and for false gain ("no new major versions but it will totally change maybe & no one else has freedom").
I use strong terms because this breaks so much of the internet's ethos, logos, & pathos. Picking the narrowest path should be warded & hazarded against in strong terms. This reaction is emotional because these are terrible, terrible choices.
Postel's Law[1] has come under fire from some pretty conservative-minded viewpoints (and frankly I've rarely been persuaded by the angst against permit-if-you-can), but this is a next-level assault on loose coupling, a stark new frontier in reserving almost all possibility for yourself. My description, while strong, is accurate in many ways: this is brutally narrow. This is an extremist perspective that JSON Schema is embarking on, snubbing basic protocol design & rejecting a non-harmful extensibility that cost them nothing (besides needing to tick a major version to make a major change).
They've explicitly said they're looking at finding a way to still be able to express custom properties.
So done right the end result will be basically no different from the use of X- headers in HTTP and friends.
What they're talking about doing could easily be -implemented- badly, but your reaction to the idea of doing it at all seems rather more exothermic than indicated by the situation.
RFC 6648 deprecated the X- header prefix. It turned out to be a ridiculously hand-wringing concern & just makes a noisy prefix that still results in name clashes anyway.
I wish they'd actually elaborated on like a dozen cases in the wild that show there's a real problem here & why they'd do this. I agree, I'm having a strong reaction to this. It really seems like any workarounds that happen here are going to be ugly, going to be gross, and it feels like such a false need, such a non-issue to make new problems over.
Can anyone mention some interesting projects/use-case for JSON Schema?
I have terrible WSDL flashbacks from a past project, where the abstraction layer was always either too high (something simpler was better) or too low (and written documentation was better).
> Can anyone mention some interesting projects/use-case for JSON Schema?
Finally, we come back to WSDL and SOAP.
Sorry, you may have "terrible flashbacks", but I enjoyed those days, when I could have client and server code auto-generated, with automatic validation and so on, and I felt all this "unstructured" data sent over JSON was the "dark ages".
I'm tired of writing yet another REST client payload mapping by hand...
I actually did miss the feeling of being safe in the serialization/deserialization of my data when JSON became widespread, you're right.
These flashbacks were specifically about WSDL maintenance by hand (it was not very human readable), and how the existence of a schema was used as an excuse to not actually document the meaning of certain fields.
I guess it depends a lot on people, but it seems like most systems need to be robust to idiots to some degree the more widespread they become.
Here is one: OpenMetadata uses it extensively to help you catalog the various types of (software-related) entities you have in your organization. Each entity (tables, alerts, api endpoints..) has been given a set of fields that you can populate (https://docs.open-metadata.org/main-concepts/metadata-standa...). The schema language used for this is JSON Schema.
It's much the same as an XML schema, or K8s CRDs that have a schema embedded into the definition - schema are very useful for ensuring your data structure is valid.
You can use the schema to validate that data is structured correctly, then apply business rule validations afterwards.
And most importantly, people creating data you're ingesting have a thorough schema to validate against.
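A small sketch of that two-phase approach, assuming the Python jsonschema package (the schema and the business rule are just illustrative):

    import jsonschema

    ORDER_SCHEMA = {
        "type": "object",
        "required": ["quantity", "unit_price"],
        "properties": {
            "quantity": {"type": "integer", "minimum": 1},
            "unit_price": {"type": "number"},
        },
    }

    def accept_order(order: dict) -> None:
        # Phase 1: structural validation against the shared schema.
        jsonschema.validate(order, ORDER_SCHEMA)
        # Phase 2: business rules the schema doesn't (and shouldn't) express.
        if order["quantity"] * order["unit_price"] > 10_000:
            raise ValueError("order exceeds the approval threshold")

    accept_order({"quantity": 3, "unit_price": 19.99})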
The JSON Schema documentation was rather terrible for a long time, but has significantly improved. Now the fun part is making sure that the version of JSON Schema you're using is supported by the library you're using.
But JSON Schema is far better than OpenAPI because the latter is mostly focused on API description, which adds a lot of noise when you're trying to simply describe a data structure.
And OpenAPI is also hampered by the tight binding to Swagger. For a long time OpenAPI disallowed union types because it caused issues with Swagger's codegen.
I think that latest versions of OpenAPI allow union types, but I don't think Swagger supports those versions yet. At least they didn't the last time I looked.
That's wishful thinking. I can find a lot of differences, but want to point out this one to underscore the difference in approach and level of sophistication.
So, in XSD (one of the XML schema languages), there's a sequence element. This element describes how tags are supposed to appear in a certain order. The analogous structure in JSON would be the dictionary keys, which are also allowed to repeat... but the schema doesn't even cover that option! To make this more concrete, below is a valid JSON document:
{ "x": 1, "x": "2" }
But there's no way of describing in the schema what happens here. Not only this, XSD can restrict the number of times a certain element may repeat... but JSON Schema still cannot do anything about this.
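Worth noting that many parsers never even surface the duplicate, so a validator has nothing to act on. Python's standard library, for instance, keeps only the last value:

    import json

    # Duplicate keys are legal JSON text, but json.loads silently keeps the
    # last occurrence, so the duplicate collapses before any schema
    # validation could ever see it.
    print(json.loads('{"x": 1, "x": "2"}'))  # {'x': '2'}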
There are plenty of things that can be implemented in various XML schema languages, but have nothing analogous to them in JSON schema. Another example would be ensuring uniqueness of some particular type of values across the entire document.
Just because you can doesn't mean you should. If someone told me tags must appear in a specific sequence for their markup to work, I'd call the implementation flawed. That's been the downfall of XML - doing way too much for its own good.
This is where you are wrong. If you are making a validator, and the format can do something, the validator should deal with that possibility. If the validator is incomplete with respect to the features it validates, it cannot be used to guarantee document validity, which is... well, its most important function.
It has nothing to do with your emotional attachments to tags and sequences.
I get where you're coming from, but I don't believe overloading a data exchange format with too many dimensions of meaning is sensible. In XML, you can use tags, sequences of tags, tag attributes, and children tags; there are a multitude of ways to express the same thing. Following your approach, we could also count the number of spaces tags are indented with as an additional dimension: It's entirely possible to write XML this way and infer meaning from it, it's just not specified right now. I doubt you'd follow along with that.
Let's look at the concrete reality we're dealing with: XML usage has declined, JSON (maybe supported by JSON schema) is ubiquitous. I'd be inclined to say the pendulum has swung a little too far in terms of simplicity - but overall, it's very clear why XML failed, and too few dimensions of meaning is definitely not one of the reasons.
We use Elasticsearch to store a vast amount of JSON documents in a specific format. Elasticsearch uses a strict schema to validate new items. Those documents are generated and used by several downstream projects. We have a single JSON schema installable as a package in all of our projects which is used in unit tests, and converted to the Elasticsearch schema format for reindexing. Effectively, this ensures we only have compatible documents in the index; all applications running in prod are verified to be able to handle them.
Thus, we use JSON schema as a universally valid way to exchange the data schema.
A team of 10 devs I was a part of got lots of everyday value from JSON schemas (w/ code generation) being the single source of truth of the domain model for a Typescript frontend and Java backend.
I would absolutely choose JSON schemas again for any complex enough project that chooses typed languages for both backend and frontend. (And JSON as the protocol, naturally.)
If the JSON file had a version number for the spec, then you could support future versions without breaking it, ever. Just define that all future versions will include the previous versions. E.g., specify that if a file is meant to be compliant with the next JSON version, then it needs to start with
#JSON1
And the next one with #JSON2. And if a JSON2 parser finds a JSON1 file, it should interpret it as JSON1. If a JSON1 parser finds a JSON2 file, it should try its best to parse it anyway, because your changes are not going to arbitrarily break stuff, right?
This is a breaking change, too, but it may indeed be the last one...
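A tiny sketch of what that sniffing could look like (entirely hypothetical, since no #JSONn marker actually exists):

    import json

    def parse_versioned(text: str):
        """Hypothetical '#JSONn' prefix scheme sketched above."""
        first_line, _, rest = text.partition("\n")
        if first_line.startswith("#JSON"):
            version = int(first_line[len("#JSON"):])
            # A JSON1-only parser receiving a JSON2 body still tries to parse
            # it, trusting that later versions only add compatible constructs.
            return version, json.loads(rest)
        return None, json.loads(text)

    print(parse_versioned('#JSON1\n{"a": 1}'))  # (1, {'a': 1})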