> A couple years later, they want to upgrade the validator now that supports the 2025 specification. However in that 2025 specification, we added database-field-id as a keyword, and its value is expected to be a string. Suddenly the user's schema is no longer valid. We've broken a user by adding a new keyword.
No, you haven't. They chose to change. Same as if they changed to another schema format altogether (e.g. XML). Is this implying a user is going to commit to a schema (pragmatic laziness notwithstanding) that they haven't read? Regardless, their following solution baffles...
> Therefore, no keyword addition can be considered safe, making forward compatibility impossible to guarantee.
Correct.
> Using the vocabulary requires writing a custom meta-schema that adds the vocabulary URI and references the associated meta-schema, and referencing that custom meta-schema in the schema.
I just can't fathom this ceaseless drive toward cruft. Now a meta-vocabulary (a sub-schema). That's supposed to be better? Change will come, in 10 years, 50 years, etc. Semantic versioning normalizes it in the json-schema itself. It's as good a system as any and definitely better than adding another versioned dependency.
> But many people also said they would be less bothered if there was a defined migration path and tooling to help.
If things were left as-is (allowing unknown keywords), it would be straightforward to write a tool that checks "upgrade compatibility" for a schema. If it's using keywords that were unknown under the old JSON Schema version but are now supported, the schema must be fixed before the JSON Schema version can be upgraded.
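To make that concrete, here is a rough sketch of what such a check could look like in Python (the NEW_KEYWORDS_2025 set and the example schema are made up for illustration; a real tool would also need to know which positions in a schema are keyword positions rather than property names):

    # Hypothetical sketch of an "upgrade compatibility" check for a schema.
    # NEW_KEYWORDS_2025 is an assumed set of keywords added by the newer spec;
    # a real tool would take it from the spec's changelog or meta-schema diff.
    NEW_KEYWORDS_2025 = {"database-field-id"}

    def find_upgrade_conflicts(schema, path=""):
        """Walk a schema and report keys that a newer draft now defines."""
        conflicts = []
        if isinstance(schema, dict):
            for key, value in schema.items():
                here = f"{path}/{key}"
                if key in NEW_KEYWORDS_2025:
                    conflicts.append(here)
                conflicts.extend(find_upgrade_conflicts(value, here))
        elif isinstance(schema, list):
            for i, item in enumerate(schema):
                conflicts.extend(find_upgrade_conflicts(item, f"{path}/{i}"))
        return conflicts

    schema = {"type": "object", "properties": {"id": {"database-field-id": 42}}}
    print(find_upgrade_conflicts(schema))  # ['/properties/id/database-field-id']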
Adding comments to JSON is the one breaking change the world needs. As a parser, if you stumble over a comment it's your fault! Nothing in the world could stop you from just sanely ignoring comments.
JSON5 also allows extra things like single-quoting strings and multiline strings. VS Code uses JSONC, which is strictly trailing commas and comments on top of JSON.
Counterpoint: In practically every language that supports comments, these "comments" are now used for language extensions (IDE folding markers, eslint-ignore, FPGA synthesis hints, ...)
What exactly is wrong with that? Comment hints like that are quite common and with a few exceptions (synthesis hints) they are all optional hints that have no effect on the main interpretation of the code. IDE folding markers aren't going to hurt anyone. Chrome doesn't care about lint waivers. Etc.
This is a clear case of blaming the tool for its misuse. A hammer helps you build things, but you can also break your thumb with it. Yet many people keep hammers at home.
When you're designing a specification (I mean, any code, really, but specifications especially), anticipating how users will use it and trying to guide them down the path of -not- misusing it is part of getting the design right.
Sometimes deliberately leaving a potential feature out makes for a better end result. Sometimes it doesn't. But either way it's better to make the choice deliberately.
I do see what you're getting at, but in the SQL case an unhinted statement can potentially produce such a terrible plan that it invariably times out and triggers a 500 from the service making the query.
This sort of thing somewhat blurs the boundaries between affecting performance and affecting behaviour.
I'm not claiming that there's an obviously right answer here, only that there can be more questions about whether something is a good idea on net than people always consider.
Note: It would not surprise me to find that the actual motivation for leaving comments out of JSON went something like:
1) We can't have line comments because we don't want to be newline sensitive that way
2) If we do /* ... */ style comments people -will- write incompatible parsers for them no matter how carefully we specify them
3) Argh
but I wasn't there at the time, so the question will have to remain open.
> I think folding/lint/etc. markers are poor examples here, because they're
> basically annotations for other tools which seems entirely fine.
>
> However comments that affect the behaviour of the code itself are really
> much less fun to deal with.
I think there is no clear distinction between "hints for other tools" and "the behavior of the code itself" unless you implicitly assume a set of tools that define "the behavior of the code itself".
You could argue that HDL synthesis hints don't change the behavior of the code, nor do SQL optimizer hints.
I understand why you think that IDE folding hints are poor examples, but you would still force unrelated tools to leave those "comments" intact, possibly in cases where there is no clear definition of "intact" (massive transformations to the JSON structure). xkcd "spacebar heating" applies: https://xkcd.com/1172/
> For a format designed to be human readable, JSON not supporting comments is a major let down.
Comments were an explicit anti-feature. Douglas Crockford in 2012:
> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.
A choice can be a good choice and still lead to worse results.
As things are now, trailing commas and comments would make JSON a better format; whether leaving them out was a good compromise at the time is not really the point.
JSON supports numbers of arbitrary size and precision just fine. It's just that many readers/writers of JSON do not, and those are so ubiquitous that it's often not practical to rely on.
Unfortunately in practice you can’t rely on large numbers being passed through correctly because of this, unless you control both ends and don’t use js.
So, if a schema identifies as conforming to the 2020-12 meta-schema, why would a new "database-field-id" keyword in the 2025 upgrade be an issue? The 2025 parsers should know that it's not a keyword in the 2020-12 meta-schema, and treat it the same as they always did, shouldn't they?
But also - as an alternate to underscore prefixes, could new keywords just reuse the "$" prefix as in "$schema"?
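For what it's worth, that is how current tooling behaves: unknown keywords are simply ignored. A quick sketch with the Python jsonschema library, using the article's hypothetical database-field-id keyword (the value here is just illustrative):

    from jsonschema.validators import validator_for

    schema = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {"id": {"type": "integer", "database-field-id": "users.id"}},
    }

    # validator_for() picks the validator class from the declared "$schema";
    # under 2020-12 the unknown "database-field-id" keyword is simply ignored.
    Validator = validator_for(schema)
    Validator.check_schema(schema)
    print(Validator(schema).is_valid({"id": 1}))  # True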
XML Schema is a fiendishly complex, awkward, and verbose standard, but it is very much in use. Personally, I hate it, but I have used it fairly often.
I would like it to succeed, and will use it, if it does; even if it’s ugly. If it doesn’t work, I won’t use it.
My experience tells me that it can be next to impossible to predict what will and won’t work, and there’s probably no replacement for actually putting it out there and trying.
My experience also tells me that the most certain way to ensure failure, is to try predicting the future, and engineering for a very specific one.
Personally, I have had good luck, with an “heuristic” approach, where my designs act as a “lattice” for growth, and I evolve it, as I see the direction the project takes.
The main thing, in my experience, is to never have to go back and change stuff that has already been done. I may do new stuff, in the future, that breaks the API, but I should always honor The Old Ways.
> The main thing, in my experience, is to never have to go back and change stuff that has already been done. I may do new stuff, in the future, that breaks the API, but I should always honor The Old Ways.
I just realized that sentence doesn't make sense.
I meant that people will build on an API, and make assumptions to "fill in the gaps."
As long as there's nothing in the current API that "codifies" these assumptions, I consider it OK, to add stuff that breaks the assumptions, but I won't go and make changes to stuff I've codified.
An example is some work I was doing, with an Apple UIKit API. I was calling a set of functions in a particular order. Apple released a system update that reset one of the settings, so I needed to move the call that affected that setting, to the end of the chain. They never specified the order (nor did they do so, after that, even though they probably should have), so I had no room to complain.
This takes JSON schema from a fairly lightweight, flexible definition to one of the most rigid. As much as I have a hard time with dynamic languages, I can't get with this. Feels contrary to the spirit of the JS ecosystem in the first place.
Makes CUE a much more attractive way to directly validate JSON - you can declare whether structs are open or closed as part of the definition.
My issue with Cue is that there's no actual way to use it as a schema. At least I couldn't find a way to have one cue document link to another cue document so that e.g. IDEs can use the information.
Also the only language Cue supports is Go at the moment, which makes it a bit of a non-starter IMO.
It does have a quite elegant design, but I'm not sure it really solves enough to be worth the hassle, especially given that it doesn't attempt to solve the "make config less tedious using functions" problem.
We have union types defined in fastapi/pydantic and they do show up in the api explorer. Tagged unions don't work but I think that's more to do with us/python than openapi.
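For anyone curious, a minimal sketch of that kind of setup with FastAPI/pydantic (the model names and route are invented):

    from typing import Union
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Card(BaseModel):
        number: str

    class BankTransfer(BaseModel):
        iban: str

    # The union ends up in the generated OpenAPI as an anyOf of both models,
    # which is what the interactive API explorer renders.
    @app.get("/payment-method", response_model=Union[Card, BankTransfer])
    def payment_method() -> Union[Card, BankTransfer]:
        return Card(number="4242")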
Ah, that's nice. I think a year or so ago I was dealing with api docs that'd been run through Swagger and it just stopped emitting fields when it hit union types.
I could never bring myself to take JSON and family seriously. This is just another indication. I mean, they didn't consider that it's not possible to take things that escape validation today and make them validate in a future version? And these are the people who are making the validator software?
Instead of making a brutally narrow spec that snubs ease of use & extensibility, just to create a "last breaking change" forever spec, the spec should definitely just keep dropping new major versions when it needs to change.
This is an ugly & rough conceitedness proposed here: no one can do anything, because we reserve the right to do anything, and we don't want to have to drop a new major version to do it.
Craven attitude. Building ultra-strict narrow protocols like this should be highly, highly discouraged. This is such a mean & awful tradeoff to foist upon people; I can't see any advantage to making a brittle, narrow, but forever "safe" little path like this. Be braver.
It's a terrible & self-interested & small-minded design philosophy making awful trade-offs. It's an actively dangerous way of building, to the exclusion of allowing possibility, and for false gain ("no new major versions but it will totally change maybe & no one else has freedom").
I use strong terms because this breaks so much of the internet's ethos, logos, & pathos. Picking the narrowest path should be warded & hazarded against in strong terms. This reaction is emotional because these are terrible, terrible choices.
Postel's Law[1] has come under fire from some pretty conservative-minded viewpoints (and frankly I've rarely been persuaded by the angst against permit-if-you-can), but this is a next-level assault on loose coupling, a stark new frontier in reserving almost all possibility for yourself. My description, while strong, is accurate in many ways: this is brutally narrow. This is an extremist perspective that JSON Schema is embarking on, snubbing basic protocol design & rejecting a non-harmful extensibility that cost them nothing (besides needing to tick a major version to make a major change).
They've explicitly said they're looking at finding a way to still be able to express custom properties.
So done right the end result will be basically no different from the use of X- headers in HTTP and friends.
What they're talking about doing could easily be -implemented- badly, but your reaction to the idea of doing it at all seems rather more exothermic than indicated by the situation.
RFC 6648 deprecated the X- header prefix. It turned out to be a ridiculously hand-wringing concern & just makes a noisy prefix that still results in name clashes anyway.
I wish they'd actually elaborated on like a dozen cases in the wild that show there's a real problem here & why they'd do this. I agree, I'm having a strong reaction to this. It really seems like any workarounds that happen here are going to be ugly, going to be gross, and it feels like such a false need, such a non-issue to make new problems over.
Can anyone mention some interesting projects/use-case for JSON Schema?
I have terrible WSDL flashbacks from a past project, where the abstraction layer was always either too high (something simpler was better) or too low (and written documentation was better).
> Can anyone mention some interesting projects/use-case for JSON Schema?
Finally, we come back to WSDL and SOAP.
Sorry, you may have "terrible flashbacks", but I enjoyed those days, when I could have client and server code auto-generated, with automatic validation and so on, and I felt all this "unstructured" data sent over JSON was the "dark ages".
I'm tired of writing yet another REST client payload mapping by hand...
I actually did miss the feeling of being safe in the serialization/deserialization of my data when JSON became widespread, you're right.
These flashbacks were specifically about WSDL maintenance by hand (it was not very human readable), and how the existence of a schema was used as an excuse to not actually document the meaning of certain fields.
I guess it depends a lot on people, but it seems like most systems need to be robust to idiots to some degree the more widespread they become.
Here is one: OpenMetadata uses it extensively to help you catalog the various types of (software-related) entities you have in your organization. Each entity (tables, alerts, api endpoints..) has been given a set of fields that you can populate (https://docs.open-metadata.org/main-concepts/metadata-standa...). The schema language used for this is JSON Schema.
It's much the same as an XML schema, or K8s CRDs that have a schema embedded into the definition - schema are very useful for ensuring your data structure is valid.
You can use the schema to validate that data is structured correctly, then apply business rule validations afterwards.
And most importantly, people creating data you're ingesting have a thorough schema to validate against.
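A small sketch of that two-phase approach, assuming the Python jsonschema package (the schema and the business rule are just illustrative):

    import jsonschema

    ORDER_SCHEMA = {
        "type": "object",
        "required": ["quantity", "unit_price"],
        "properties": {
            "quantity": {"type": "integer", "minimum": 1},
            "unit_price": {"type": "number"},
        },
    }

    def accept_order(order: dict) -> None:
        # Phase 1: structural validation against the shared schema.
        jsonschema.validate(order, ORDER_SCHEMA)
        # Phase 2: business rules the schema doesn't (and shouldn't) express.
        if order["quantity"] * order["unit_price"] > 10_000:
            raise ValueError("order exceeds the approval threshold")

    accept_order({"quantity": 3, "unit_price": 19.99})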
The JSON Schema documentation was rather terrible for a long time, but has significantly improved. Now the fun part is making sure that the version of JSON Schema you're using is supported by the library you're using.
But JSON Schema is far better than OpenAPI because the latter is mostly focused on API description, which adds a lot of noise when you're trying to simply describe a data structure.
And OpenAPI is also hampered by the tight binding to Swagger. For a long time OpenAPI disallowed union types because it caused issues with Swagger's codegen.
I think that latest versions of OpenAPI allow union types, but I don't think Swagger supports those versions yet. At least they didn't the last time I looked.
That's wishful thinking. I can find a lot of differences, but want to point out this one to underscore the difference in approach and level of sophistication.
So, in XSD (one of the XML schema languages), there's a sequence element. This element describes how tags are supposed to appear in a certain order. The analogous structure in JSON would be the dictionary keys, which are also allowed to repeat... but the schema doesn't even cover that option! To make this more concrete, below is a valid JSON document:
{ "x": 1, "x": "2" }
But there's no way of describing in the schema what happens here. Not only this, XSD can restrict the number of times a certain element may repeat... but JSON Schema still cannot do anything about this.
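Worth noting that many parsers never even surface the duplicate, so a validator has nothing to act on. Python's standard library, for instance, keeps only the last value:

    import json

    # Duplicate keys are legal JSON text, but json.loads silently keeps the
    # last occurrence, so the duplicate collapses before any schema
    # validation could ever see it.
    print(json.loads('{"x": 1, "x": "2"}'))  # {'x': '2'}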
There are plenty of things that can be implemented in various XML schema languages, but have nothing analogous to them in JSON schema. Another example would be ensuring uniqueness of some particular type of values across the entire document.
Just because you can doesn't mean you should. If someone told me tags must appear in a specific sequence for their markup to work, I'd call the implementation flawed. That's been the downfall of XML - doing way too much for its own good.
This is where you are wrong. If you are making a validator, and the format can do something, the validator should deal with that possibility. If the validator is incomplete with respect to the features it validates, it cannot be used to guarantee document validity, which is... well, its most important function.
It has nothing to do with your emotional attachments to tags and sequences.
I get where you're coming from, but I don't believe overloading a data exchange format with too many dimensions of meaning is sensible. In XML, you can use tags, sequences of tags, tag attributes, and children tags; there are a multitude of ways to express the same thing. Following your approach, we could also count the number of spaces tags are indented with as an additional dimension: It's entirely possible to write XML this way and infer meaning from it, it's just not specified right now. I doubt you'd follow along with that.
Let's look at the concrete reality we're dealing with: XML usage has declined, JSON (maybe supported by JSON schema) is ubiquitous. I'd be inclined to say the pendulum has swung a little too far in terms of simplicity - but overall, it's very clear why XML failed, and too few dimensions of meaning is definitely not one of the reasons.
We use Elasticsearch to store a vast amount of JSON documents in a specific format. Elasticsearch uses a strict schema to validate new items. Those documents are generated and used by several downstream projects. We have a single JSON schema installable as a package in all of our projects which is used in unit tests, and converted to the Elasticsearch schema format for reindexing. Effectively, this ensures we only have compatible documents in the index; all applications running in prod are verified to be able to handle them.
Thus, we use JSON schema as a universally valid way to exchange the data schema.
A team of 10 devs I was a part of got lots of everyday value from JSON schemas (w/ code generation) being the single source of truth of the domain model for a Typescript frontend and Java backend.
I would absolutely choose JSON schemas again for any complex enough project that chooses typed languages for both backend and frontend. (And JSON as the protocol, naturally.)
If the JSON file had a version number for the spec, then you could support future versions without breaking it, ever. Just define that all future versions will include the previous versions. E.g., specify that if a file is meant to be compliant with the next JSON version, then it needs to start with
#JSON1
And the next one with #JSON2. And if a JSON2 parser finds a JSON1 file, it should interpret it as JSON1. If a JSON1 parser finds a JSON2 file, it should try its best to parse it anyway, because your changes are not going to arbitrarily break stuff, right?
This is a breaking change, too, but it may indeed be the last one...
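A tiny sketch of what that sniffing could look like (entirely hypothetical, since no #JSONn marker actually exists):

    import json

    def parse_versioned(text: str):
        """Hypothetical '#JSONn' prefix scheme sketched above."""
        first_line, _, rest = text.partition("\n")
        if first_line.startswith("#JSON"):
            version = int(first_line[len("#JSON"):])
            # A JSON1-only parser receiving a JSON2 body still tries to parse
            # it, trusting that later versions only add compatible constructs.
            return version, json.loads(rest)
        return None, json.loads(text)

    print(parse_versioned('#JSON1\n{"a": 1}'))  # (1, {'a': 1})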