The "reinvention" is not complete and will never be necessary. The difference is that XPath is necessary to query XML because XML is a botched, horribly overcomplicated, designed-by-committee markup language. Apart from tools like jq, no such language is actually required for JSON, because JSON maps onto data structures that already exist in the host language.
Neither JSON Schema nor XML Schema is particularly popular - and for good reason. Say you want to create a schema that limits the field "country" to ISO 3166-1 country codes - either you:
* Keep that schema file updated by hand every time something like Sudan breaking in two happens (no).
* Write a program that generates the schema (seriously... no)
* Do schema validation in code where it belongs - pulling in relevant validation data from canonical sources, rather than some markup language invented by people who didn't have the imagination to consider a really common use case.
There's a lot of benefit to being able to state what keys may be specified in a certain location, though. Look at DSLs like CloudFormation, for instance. Having schema validation could make static analysis of this kind of code much easier to handle. E.g.: Fn::Sub may be used inside of Fn::Join, but the reverse is not true, regardless of the types "returned" by each. It's certainly possible to validate via the API, but being able to do it in my editor would make finding errors much faster.
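A static check for that kind of nesting rule might look something like this sketch (the rule as stated above and the template shape are simplified assumptions for illustration, not the real CloudFormation grammar):

```python
# Illustrative sketch: walk a parsed template and flag Fn::Join appearing
# anywhere under Fn::Sub, per the (simplified) nesting rule stated above.
def find_bad_nesting(node, inside_sub=False, path="$"):
    """Return the paths of Fn::Join nodes nested inside an Fn::Sub."""
    errors = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if key == "Fn::Join" and inside_sub:
                errors.append(child_path)
            nested = inside_sub or key == "Fn::Sub"
            errors.extend(find_bad_nesting(value, nested, child_path))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            errors.extend(find_bad_nesting(item, inside_sub, f"{path}[{i}]"))
    return errors

template = {
    "Value": {"Fn::Sub": ["${x}", {"x": {"Fn::Join": [",", ["a", "b"]]}}]}
}
print(find_bad_nesting(template))  # → ['$.Value.Fn::Sub[1].x.Fn::Join']
```

The same walk could be driven from declarative nesting rules, which is exactly the kind of thing an editor plugin would want a schema for.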
To your other point, however, dynamic code generation is becoming much more common. AWS generates a huge amount of its code from JSON definitions across multiple languages to keep its SDKs up to date. I could see schema validation being valuable in this domain as well.
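As a toy illustration of that SDK-generation idea, stubs can be emitted from a JSON service definition (the definition and naming scheme here are invented; real generators are far more involved):

```python
# Toy sketch: generate function stubs from a hypothetical JSON service
# definition, in the spirit of SDKs generated from machine-readable specs.
import json

definition = json.loads("""
{
  "service": "storage",
  "operations": {
    "PutObject": ["Bucket", "Key", "Body"],
    "GetObject": ["Bucket", "Key"]
  }
}
""")

def generate_stubs(defn):
    """Emit one Python function stub per operation in the definition."""
    lines = []
    for op, params in defn["operations"].items():
        args = ", ".join(p.lower() for p in params)
        lines.append(f"def {op.lower()}({args}): ...")
    return "\n".join(lines)

print(generate_stubs(definition))
```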
> * Keep that schema file updated by hand every time something like Sudan breaking in two happens (no).
There is a lot of use for libraries dealing with time and dates. When you want to cover all cases, at some point you get to the situation when you have to allow variable number of seconds in a minute - not always 60, but sometimes 59 or 61, or may be even different numbers. And you don't know in advance - for arbitrary long future - which minutes will have which number of seconds.
So, for your timekeeping system to maintain precision, you have to allow external updates for when a minute will be considered non-60 seconds.
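To make that concrete, here is a minimal sketch of a minute-length lookup driven by an externally updated table rather than a hard-coded 60 (the table is a stub with one real entry - the positive leap second at the end of 2016 - and a real system would refresh it from an external source such as the IERS bulletins):

```python
# Minimal sketch: minute length looked up in an externally updated table
# rather than assumed to be 60. Only one real entry is shown: the positive
# leap second inserted at 2016-12-31 23:59 UTC.
LEAP_SECOND_MINUTES = {
    ("2016-12-31", 23, 59): 61,
}

def seconds_in_minute(date, hour, minute):
    """Return how many seconds the given UTC minute contains."""
    return LEAP_SECOND_MINUTES.get((date, hour, minute), 60)

print(seconds_in_minute("2016-12-31", 23, 59))  # 61
print(seconds_in_minute("2017-01-01", 0, 0))    # 60
```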
And those cases could happen more often than changing a list of valid country codes.
The point isn't to avoid it. Of course it's inevitable - that was my point! The point is to use code to validate instead of some markup so that the programmer can use their judgment about how it should be delegated.
I wrote some example code below that shows how you can validate against a list of countries in such a way that no code changes are required when the list changes.
JSON Schema, at least, can refer to a URI for the definition of something, and that URI can refer to only a specific section of the JSON document to which it points.
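For example, a schema fragment might delegate a property's definition to a specific section of a remote document via a JSON Pointer fragment (the URL here is hypothetical):

```json
{
  "type": "object",
  "properties": {
    "country": {
      "$ref": "https://example.com/common-defs.json#/definitions/country"
    }
  }
}
```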
The point I was making was that you shouldn't use a "special" language for validation at all - you should just use a library in a regular language to do it.
Anyway, code:
yaml_text:

John: Yemen
James: South Sudan

python code:

from strictyaml import load, MapPattern, Str, Enum
import pycountry

yaml_text = """\
John: Yemen
James: South Sudan
"""

# Every value must be a current country name from pycountry's ISO 3166 data.
result = load(
    yaml_text,
    MapPattern(
        Str(),
        Enum([country.name for country in pycountry.countries]),
    ),
)
full disclosure: I wrote the validation library ^^
The idea behind XML schema, DTD, etc. is to pick a simple language to express schemas in, so that implementations in different languages have a decent chance of being compatible with each other.
Python isn’t a good choice there, as it is too flexible. For example, that code could have gotten the list of allowed country names from a file, database, or URL.
⇒ If I have to send such JSON to you, I would almost have to write my program in Python, and even then, it could be hard for me to replicate your setup.
>that code could have gotten the list of allowed country names from a file, database, or URL.
That is exactly the point. You should be able to do that, because the canonical list of data could easily come from any of those sources, and it should be up to the programmer's discretion how to fetch it.
The point of validation is to stop invalid data from slipping through the net at minimum cost, and that's how you do it.
"Suden", "Sudaan" and "South Sudan" were all invalid countries in 2010, so that YAML was invalid. In 2012, "Suden" and "Sudaan" were still invalid but "South Sudan" was not, so that YAML was valid.
In the above example you have to make no code changes in order to account for that - just update pycountry every so often.
With XML schemas and DTDs, either you don't validate country at all (letting Suden and Sudaan through the net), or you rewrite and redistribute the schema by hand every time some dependency like the list of countries changes.
>If I have to send such json to you, I almost would have to write my program in python
Only if I choose to validate that data using a shared schema. Frankly, I've dealt with XML a lot and the number of times I've been handed a shared schema of any kind is very low. People just don't seem to use them. If they define an API in XML for instance they tend to just send examples and give a written explanation (e.g. insert valid country name here).
I don't see much value in making a schema more inherently "shareable" especially not if it means it has to be re-released every month.