
You could say the same about JSON. It is arguably closer to how an AST is represented in software, as it does not possess the two-dimensional notation that XML does.

The unique thing about XML is that you can have both children and attributes. My guess is that this is to model OOP-based systems: Attributes are for the constructor of a certain class while the children represent dependency injections.

This is IMHO the weak point of XML: It gives too many levers. When we don't know how to assign meaning to the levers, we arrive at garbage like the example from another comment: <ssn ssn="123"/><id>123</id>



"My guess is that this is to model OOP-based systems:"

No, SGML, which became XML, is a separate, independent track from OOP. Both had set in concrete before they really encountered each other, and the contact was a mess. This is part of why the DOM, especially its first couple of iterations, is so messy (reading DOM Level 1 is almost hilarious in hindsight, if you know what you're reading for [1])... It doesn't help that the DOM also smashed face first into yet another tech line, the dynamically-typed scripting language. The hasty, committee-arranged, three-way shotgun marriage of these technologies in the late 1990s produced a fairly dysfunctional family.

[1]: https://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html Notice that this API, primarily used from JavaScript, is specified in Java, complete with specifications of checked exceptions. Total clusterfuck. And a prime example of just how hard Java was inorganically jammed down the programming community's throat (I say this without regard to your current opinion of the language; it certainly did grow up, but the initial push was completely inorganic). This standard is from 1998, and Java 1.0 was released in 1996. Java 1.2 (or 2.0, depending on how you look at it) was released in 1998, with such notable features as... the Collections Framework. This was not yet a language mature enough to write specifications in, even ignoring that you shouldn't write specifications like this in a specific language anyway, especially when it was obvious and known that the API was going to be used across languages.


> It gives too many levers.

I don't agree that the horrible XML example is the result of that.

There's a simple semantic difference between attributes and elements in XML:

* an attribute name can appear only once per element

* attributes are atomic (i.e. have no children)

* attributes are order-independent

So whenever your data is atomic, the order in which it appears doesn't matter, and you only want one instance (per element), you'd use an attribute. The reasoning behind this is to enforce basic semantic rules without having to resort to complex schema languages like XSD or RELAX NG.
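These rules are enforced by any conforming parser, no schema required. A small sketch with Python's standard-library `xml.etree.ElementTree` (the `person`/`phone` names are invented for illustration) shows all three: repeated children keep their order, two elements differing only in attribute order have equal attribute dicts, and a duplicated attribute name or markup inside an attribute value is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

def parses(doc: str) -> bool:
    """Return True if the document is well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Child elements may repeat, and their document order is preserved.
person = ET.fromstring(
    '<person id="7" role="admin"><phone>111</phone><phone>222</phone></person>'
)
phones = [p.text for p in person.findall("phone")]

# Attributes come back as a plain str -> str dict: one value per name.
attrs = person.attrib

# Attribute order carries no meaning: both spellings compare equal.
same = ET.fromstring('<a x="1" y="2"/>').attrib == ET.fromstring('<a y="2" x="1"/>').attrib

# A repeated attribute name, or markup inside an attribute value, is not well-formed.
dup_ok = parses('<person id="7" id="8"/>')
nested_ok = parses('<a x="<b/>"/>')

print(phones, attrs, same, dup_ok, nested_ok)
```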

It has nothing to do with OOP: it's just a very basic tool for enforcing basic constraints. As with every tool, it can be misused or ignored. The XML example is the result of incompetence and/or lack of coherence in data modelling and processing (judging from the microservices mentioned), not a weakness of XML as such, IMO.


> My guess is that this is to model OOP-based systems: Attributes are for the constructor of a certain class while the children represent dependency injections.

Can't say I've ever thought of it that way. Attributes just seemed like a simpler syntax for the common case of basic properties that were sensibly represented as strings (i.e. single, literal values). I don't think it would have made much fundamental difference if they had never been part of the spec and you had to use sub-elements to define such properties, except perhaps for the fact that an attribute of a given name can only be declared once (which is an interesting difference between JSON and XML: XML lacks any syntax for declaring arrays, so you must be able to declare multiple sub-elements with the same name).
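To illustrate the array point with Python's standard library: JSON distinguishes a scalar from a one-element list syntactically, while in XML a list is just repeated siblings. The `naive_to_dict` converter below is a hypothetical example, not any particular library; it shows how the resulting shape depends on how many children happen to be present.

```python
import json
import xml.etree.ElementTree as ET

# JSON has array syntax, so a one-element list and a scalar are distinct.
as_list = json.loads('{"tag": ["a"]}')["tag"]
as_scalar = json.loads('{"tag": "a"}')["tag"]

def naive_to_dict(elem: ET.Element) -> dict:
    """Hypothetical converter: a repeated child name turns into a list."""
    out: dict = {}
    for child in elem:
        if child.tag in out:
            prev = out[child.tag]
            out[child.tag] = (prev if isinstance(prev, list) else [prev]) + [child.text]
        else:
            out[child.tag] = child.text
    return out

# In XML a list is spelled as repeated siblings with the same name...
two = naive_to_dict(ET.fromstring("<post><tag>a</tag><tag>b</tag></post>"))
# ...so with a single sibling nothing in the markup says "this is a list".
one = naive_to_dict(ET.fromstring("<post><tag>a</tag></post>"))

print(as_list, as_scalar)  # ['a'] a
print(two, one)            # {'tag': ['a', 'b']} {'tag': 'a'}
```

A schema, or a convention shared by both sides, has to supply the missing "this element repeats" information.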


> Can't say I've ever thought of it that way

As can be seen from the replies to my comment, there are quite a number of ideas about the ontology of the XML format. When a single screen's worth of text on my phone can hold at least three strong convictions on how to use a format, it is doomed to fail.


Just as an anecdote: about two years ago I had a discussion about marshalling data structures into XML and indeed, two people managed to come up with three different schemes [0]. Of course that spells doom for something that's supposed to be used as a data-exchange format.

[0] https://news.ycombinator.com/item?id=24614404#24626486
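A minimal sketch of how easily one record can be marshalled into XML in mutually incompatible ways, using Python's `xml.etree.ElementTree`. The three schemes here are illustrative assumptions, not the ones from the linked discussion.

```python
import xml.etree.ElementTree as ET

record = {"name": "Ada", "ssn": "123"}

def as_attributes(rec: dict) -> str:
    # Scheme 1: every field becomes an attribute of a single element.
    return ET.tostring(ET.Element("person", rec), encoding="unicode")

def as_children(rec: dict) -> str:
    # Scheme 2: every field becomes a child element holding text.
    root = ET.Element("person")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def as_entries(rec: dict) -> str:
    # Scheme 3: generic key/value entries, mirroring a dict in memory.
    root = ET.Element("person")
    for key, value in rec.items():
        ET.SubElement(root, "entry", key=key).text = value
    return ET.tostring(root, encoding="unicode")

for encode in (as_attributes, as_children, as_entries):
    print(encode(record))
```

All three are reasonable, and a consumer written for one cannot read the other two.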


Children and attributes are different. Attributes are like fields to a record. Parent/child is a relation between records.

There are clear criteria for choosing attribute vs text representation. Text is for humans; all the rest is for the computer. If we see something like this:

    <ssn>123</ssn>

This means the text is precious and we cannot alter it; we can only attach some records (‘ssn’) to some character ranges. In most cases these records need more fields, so we add attributes:

    ... <date date="2001-01-01">the first day of that year</date> ...

This is the usual case in markup: we need the computer to do something with the text and use the records as a guide.
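A sketch of the markup case in Python, assuming an invented `<p>` sample modelled on the `<date>` example: stripping the tags gives back the human text unchanged, while each element acts as a record over a character range of that text, with its machine-oriented fields in attributes. The offset calculation assumes flat markup (no elements nested inside `<p>`'s children).

```python
import xml.etree.ElementTree as ET

# Invented sample in the style of the <date> example above.
para = ET.fromstring(
    '<p>Born on <date date="2001-01-01">the first day of that year</date>, '
    'SSN <ssn>123</ssn>.</p>'
)

# The human text survives untouched: dropping the tags recovers it exactly.
text = "".join(para.itertext())

# Each child element is a record over a character range of that text,
# with machine-oriented fields in its attributes. Assumes flat markup.
records = []
offset = len(para.text or "")
for child in para:
    span = child.text or ""
    records.append((child.tag, offset, offset + len(span), dict(child.attrib)))
    offset += len(span) + len(child.tail or "")

for tag, start, end, fields in records:
    print(tag, repr(text[start:end]), fields)
```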

But the notational case is different. In the notational case we still need the computer to do something, but not with a particular piece of text. (This is actually the general case, while text handling is a specific one.) In this case we can command it directly:

    <foo>
      <bar id="a" />
      <baz ref="a" />
    </foo>

There is no text in these records, but we still use notational tools: 1) node type, 2) composition, 3) ordering, 4) naming/referencing. In this case we put everything into attributes.
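The `<foo>` example maps onto a few lines of Python. This is a sketch: `id` and `ref` are just a convention of this example, and plain `ElementTree` gives them no special meaning (a DTD could declare them as `ID`/`IDREF` so that a validating parser checks them).

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<foo><bar id="a" /><baz ref="a" /></foo>')

# Naming: index every element that declares an id attribute.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id") is not None}

# Referencing: resolve each ref attribute against that index.
links = [(el.tag, by_id[el.get("ref")].tag) for el in doc.iter() if el.get("ref") is not None]

# Node type, composition and ordering come from the tree itself.
order = [child.tag for child in doc]

print(links)  # [('baz', 'bar')]
print(order)  # ['bar', 'baz']
```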

We can have a notational piece with markup parts or a markup part with notational parts, but each has a clear purpose. There is also a third specific case: we want to switch to another notation and in this case we write it as text inside an XML element.

SVG is a good example. All data go into attributes, text inside elements appears only when 1) it is a part of the drawing, or 2) we are switching to another notation:

    <!-- notational -->
    <svg ...>
      <!-- switching to CSS, still notational  -->
      <style> 
        ...
      </style>
      <!-- markup -->
      <text>... <tspan class="...">...</tspan> ...</text>
      <!-- notational again -->
      <rect ... />
    </svg>
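A rough way to check that split on a concrete file: walk an SVG and flag which elements carry character data of their own. The tiny SVG below is invented to match the outline above. The check cannot tell drawing text (`text`, `tspan`) from an embedded notation (`style`); that takes knowing the vocabulary. It only shows that everything else keeps its data in attributes.

```python
import xml.etree.ElementTree as ET

# Small SVG in the shape of the outline above; the values are invented.
svg = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="40">'
    '<style>.hl { fill: red; }</style>'
    '<text x="5" y="15">Hello <tspan class="hl">world</tspan></text>'
    '<rect x="5" y="20" width="90" height="10" />'
    '</svg>'
)

def local(tag: str) -> str:
    # Drop the "{namespace}" prefix ElementTree puts on qualified names.
    return tag.rsplit("}", 1)[-1]

def carries_text(el: ET.Element) -> bool:
    # True if the element holds non-whitespace character data of its own.
    return bool((el.text or "").strip())

summary = {local(el.tag): carries_text(el) for el in svg.iter()}
print(summary)
```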


That's a long explanation of how to use various features of a notation. Wouldn't it be easier to just use JSON?

You have many moving pieces with a complex notation, a complex domain, and potentially multiple architects.


If you have a simple use case, yes.

If you need the things that XML has, no, JSON is not simpler. It is more complex. Every attempt to embed XML/HTML into JSON has resulted in something worse than XML/HTML... and they are all different, too, which is bad.
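For a taste of what embedding XML/HTML in JSON looks like, here is a minimal Python converter in the spirit of JsonML (each element becomes `[tag, {attributes}, ...children]`, and text nodes become bare strings). It is a sketch of one convention among several, not a reference implementation of any spec. Attributes, child elements and interleaved text all collapse into one positional array that readers must decode by convention.

```python
import json
import xml.etree.ElementTree as ET

def to_jsonml(el: ET.Element) -> list:
    # [tag, {attributes}?, child...]; the attribute object is omitted if empty.
    node: list = [el.tag]
    if el.attrib:
        node.append(dict(el.attrib))
    if el.text:
        node.append(el.text)
    for child in el:
        node.append(to_jsonml(child))
        if child.tail:  # text after a child belongs to the parent
            node.append(child.tail)
    return node

html = ET.fromstring('<p class="x">Some <b>bold</b> text.</p>')
print(json.dumps(to_jsonml(html)))
# ["p", {"class": "x"}, "Some ", ["b", "bold"], " text."]
```

Another converter could just as reasonably emit `{"tag": ..., "attrs": ..., "children": [...]}`, and the two would not interoperate.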

The main problem with XML is that most people don't need what it has. The main problem with XML in the late 1990s and early 2000s was that it was jammed into many places that did not need what it had, and it left a bad taste in developers' mouths as a result. It's actually a good solution for its niche, and that niche is large enough that it isn't going anywhere, but it is still only a niche. In that niche you're crazy to try to jam JSON in; outside that niche you're crazy to use XML. The mythos that XML is useless persists because the latter category is a much larger one.


It indeed seems like we agree :)



