There's a linked article discussing the shortcomings of YAML, as well: https://arp242.net/weblog/yaml_probably_not_so_great_after_a...

Reading the two screeds together, though, I think the author has ignored one key idea of both JSON and YAML as formats: their use-cases are complementary, and the "problems" with each format only arise when you're using it for the use-case best suited to the other.

YAML's use-case is for human-written configuration. That's why it has so many weird ways to write things: to allow humans to write some data down in a way that works best to communicate said data to other humans, in a way that machines can coincidentally read. (See also: AppleScript.)

And that's also why YAML is "so powerful that it's basically inherently insecure." You shouldn't be executing untrusted YAML any more than you should be executing an untrusted bash script. YAML is a language for the deploying sysadmins of a piece of software to configure said software with; it's not a language for the tenants of a system to communicate their needs to the system with.

JSON's use-case is for building machine data generators and parsers in a distributed system (like the web) where it is very important that the data be easily introspected by humans, and that humans also be able to modify or otherwise "poke at" said data, in flight, using simple text-based tools. But the intention isn't that humans will ever author JSON "at scale." JSON is like HTTP/1.0: it's something a human can write in a netcat session, to check that things are working or to probe a system's capabilities or responses, but a human writing it by hand is never going to be the "proper" way of doing things in production.

Now consider the subheadings of each screed:

"JSON as configuration files: please don’t"

- Lack of comments

- Readability

- [Too Much] Strictness

Obviously, if these are your problems, then you're a human (probably a sysadmin) hand-rolling configuration. Use YAML.

"YAML: probably not so great after all"

- Insecure by default

- Can be hard to edit, especially for large files

- It’s pretty complex

- Surprising behaviour

- It’s not portable

None of these matter if you're a human sysadmin who is interactively hand-rolling a configuration file for a piece of software. YAML "doesn't scale", but configuration files aren't a domain that requires thousands of lines or huge, deep hierarchies. And the complexity, surprises, and portability don't matter if you're not trying to "shout YAML into the void" without knowledge of what the target system is.

If those are your use-cases—if you need scalable, portable, secure, predictable data—then you're definitely not writing a configuration file, but rather are just interchanging data of some kind. Use JSON.

---

Now, the funny thing to me is, JSON is (officially so as of YAML 1.2) a subset of YAML.

(And that's why https://leebriggs.co.uk/blog/2019/02/07/why-are-we-templatin... was written: nobody should be templating YAML when machines are perfectly capable of emitting JSON that YAML parsers will happily accept.)
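To make that concrete, here's a minimal sketch, assuming Ruby and its bundled `json` and `yaml` (Psych) libraries. The config hash is made up for illustration. The point is only that the machine builds data and serializes it as JSON, and the YAML parser reads the result without any templating step:

```ruby
require 'json'
require 'yaml'

# Hypothetical config, built as ordinary data instead of a text template.
config = {
  'name'     => 'web',
  'replicas' => 3,
  'ports'    => [80, 443],
  'debug'    => false
}

# A machine emits JSON...
json_text = JSON.generate(config)

# ...and a YAML parser accepts it as-is.
parsed = YAML.safe_load(json_text)

puts json_text
puts parsed == config
```

I wouldn't read this as proof for every possible JSON document, only as a demonstration that the ordinary cases go through.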

Really, rather than talking about which format to use, what this discussion should be about—in a better world than the one we're in—is the use of "trusted mode" flags in YAML parser configuration.

Ideally, you should be able to take a YAML parser, and tell it, per input stream, that "this is the hand-rolled sysadmin kind of input, so parse it using the full feature-set of YAML"; or "this is the J. Rando user kind of input, so parse it like a JSON parser with maybe one or two more features" (à la StrictYAML).

That would solve most of the complaints the author has, on both sides: if config systems used YAML parsers but only configured them to accept a strict subset of said YAML (either StrictYAML or plain JSON) for any input they expected to come from anywhere other than a sysadmin, then there wouldn't even be a discussion here. You'd use a plain JSON parser for (efficiently, securely) receiving plain data; but for configuration, a YAML parser would be the obvious choice, because it could always be locked down.
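As a rough sketch of the idea (not an existing parser flag), assuming Ruby's bundled Psych: `load_config` below is a hypothetical helper that picks a parsing mode per input stream. The untrusted branch leans on `safe_load` with aliases turned off, which only approximates "JSON plus a little"; it is not the same thing as StrictYAML.

```ruby
require 'yaml'

# Hypothetical helper, not part of Psych: picks a parsing mode per input stream.
def load_config(text, trusted: false)
  if trusted
    # Sysadmin-written input: full YAML. Newer Psych versions call the
    # unrestricted loader unsafe_load; older ones only have load.
    YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
  else
    # Anyone else's input: plain scalars, arrays and hashes only, no aliases.
    YAML.safe_load(text, aliases: false)
  end
end

admin_text = "defaults: &d {retries: 3}\nservice: *d\n"
user_text  = '{"retries": 5}'

p load_config(admin_text, trusted: true)
p load_config(user_text)
```

Feeding `admin_text` through the untrusted branch should raise a Psych error rather than expand the alias, which is the behaviour you'd want for input that didn't come from a sysadmin.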

But in the world we live in, YAML parsers don't have security features like this.

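I'd suggest an alternative, which in Ruby looks roughly like this:

    require 'optparse'
    require 'json'
    require 'yaml'

    config = nil
    OptionParser.new do |opt|
      # Untrusted input: plain JSON only.
      opt.on('-c', '--config PATH') { |path| config = JSON.parse(File.read(path)) }

      # Trusted (sysadmin-written) input: YAML.
      opt.on('-C', '--config-trusted PATH') { |path| config = YAML.load_file(path) }
    end.parse!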


> But in the world we live in, YAML parsers don't have security features like this.

I'm the author of the "yaml" npm package, and would be interested to hear more precisely what security features you'd wish for a YAML parser to have. In other words, if there's one I've missed, I'd like to fix that.

https://eemeli.org/yaml/


Having the parser play "guess the type" like YAML can cause problems in almost any use case, though.
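A quick illustration, assuming Ruby's bundled Psych, which follows YAML 1.1 typing rules (the keys and values are made up). The same two values come out as a boolean and a float from the unquoted YAML, but stay strings in the JSON, where the quoting is mandatory:

```ruby
require 'json'
require 'yaml'

# The same two values, written unquoted in YAML and quoted in JSON.
text_yaml = "country: no\nversion: 1.10\n"
text_json = '{"country": "no", "version": "1.10"}'

guessed  = YAML.safe_load(text_yaml)  # types inferred from the bare scalars
explicit = JSON.parse(text_json)      # types stated by the syntax

p guessed
p explicit
```

Here `guessed['country']` ends up as `false` and `guessed['version']` as the float `1.1`, while both values in `explicit` are still the strings that were written.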



