> The second problem is competition. Some commercial providers say, 'I'm the lea...

zozbot234 · on Oct 27, 2021

HTML5 does have a fully-supported XML representation, there's no regression from XHTML. And Google themselves are working with schema.org to provide standards that endow web pages with strong semantics, along a semantic-web model - this is basically what's powering "rich" SERP results in Google and other search engines. That doesn't look like they "hate" the semantic web all that much.

codetrotter · on Oct 27, 2021

> HTML5 does have a fully-supported XML representation, there's no regression from XHTML.

Perhaps I am not quite understanding what you mean, but HTML5 allows things that are not legal in XML.

For example:

    <!doctype html>
    <html lang=en>
    <meta charset=utf-8>
    <title>Home – ACME, Inc.</title>
    <div id=outer-wrap>
    <header id=header-main>
      <h1><a href="/">ACME, Inc.</a></h1>
      <h2>Happy times</h2>
    </header>
    <nav id=navig-main>
    <ul>
      <li><a href="/">Home</a>
      <li><a href="/products/">Products</a>
      <li><a href="/blog/">Blog</a>
      <li><a href="/kb/">Knowledge Base</a>
      <li><a href="/support/">Support</a>
      <li><a href=/about.htm>About Us</a>
    </ul>
    </nav>
    <div id=content-main>
    <div class=quux>
      <h3>Etaoin shrdlu</h3>
      <p>Today is a nice day :)
      <p>Here are some <a href="http://www.example.com/">things we find important</a>:
      <ul>
        <li><a href="https://www.google.com/search?q=foo">foo</a>
        <li>bar
        <li>baz
      </ul>
    </div>
    <article class=snippet>
      <header>
        <figure class=article-illustration-image-hero>
          <img src=/static/images/bees.jpg alt="Buzzing bees.">
          <figcaption>Buzzing bees.</figcaption>
        </figure>
        <h3>The Baz and the Bees</h3>
      </header>
      <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
         incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
         quis nostrud exercitation ullamco laboris nisi ut aliquip ex
         ea commodo consequat.
      <p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
         eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,
         sunt in culpa qui officia deserunt mollit anim id est laborum.
      <p class=snippet-more><a href=/blog/2021/the-baz-and-the-bees.htm>Read more</a>
    </article>
    </div>
    <footer id=footer-main>
    <p>Copyright © 2021 ACME, Inc.
    </footer>
    </div>

The example above is a complete, valid HTML5 document, and it is representative of how I like to write HTML.

I leave attributes unquoted where allowed. Mainly, as long as the attribute value does not contain whitespace, an equals sign, quote marks, angle brackets, or a backtick, the value can be left unquoted, and so I do. (I also quote values that end in a slash.) In XHTML this is not allowed, because XML requires attribute values to be quoted.

I omit closing tags where allowed. For example you see in the example above that I’ve left both the p tags and the li tags unclosed. In XML, this is not allowed.

I include no XML schema and no DTD in the file itself. In HTML5, neither is specified as part of the markup; the doctype is just <!doctype html>.

In short, HTML5 as a whole is not valid XML. A subset of HTML5 may be valid XML. But a document can be valid HTML5 without being valid XML.
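To make the point concrete, here is a minimal sketch using only the Python standard library. The snippet and the helper names (`is_well_formed_xml`, `TagCollector`) are made up for illustration. Note that `html.parser` is only a lenient tokenizer, not a full HTML5 parser; it is used here just to show that the HTML syntax is accepted where an XML parser gives up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML-syntax fragment: unquoted attribute value and unclosed <li> tags.
SNIPPET = "<ul><li><a href=/about.htm>About Us</a><li>bar</ul>"

# The same content written so that it is also well-formed XML.
XHTML_STYLE = '<ul><li><a href="/about.htm">About Us</a></li><li>bar</li></ul>'


def is_well_formed_xml(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag the HTML tokenizer sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, dict(attrs)))


collector = TagCollector()
collector.feed(SNIPPET)
collector.close()

print(is_well_formed_xml(SNIPPET))      # the XML parser rejects the HTML syntax
print(is_well_formed_xml(XHTML_STYLE))  # the quoted, closed version is accepted
print(collector.start_tags)             # the HTML tokenizer reads the unquoted href
```

The XML parser fails on the very first unquoted attribute, before it even reaches the missing end tags.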

Personally, I like HTML5 a lot better than XHTML etc., exactly because its rules are so much more permissive: writing HTML5 by hand lets me type less to achieve the same, and more, than I could before HTML5 existed.

nicoburns · on Oct 27, 2021

> I omit closing tags where allowed. For example you see in the example above that I’ve left both the p tags and the li tags unclosed.

The genius of the HTML5 spec is that it allows this loose parsing while specifying an unambiguous mapping to the stricter syntax, so semantically this makes no difference at all (unlike the prior situation where different browsers parsed this kind of HTML differently). Of course you need an HTML5 parser rather than an XML parser, but these are common and don't represent a big hurdle in parsing.

There are other bits like namespaces and DTDs that differ.

zozbot234 · on Oct 27, 2021

See https://html.spec.whatwg.org/multipage/xhtml.html#the-xhtml-... . Note that this XML representation can be derived automatically by parsing the "permissive" HTML5 syntax, and reuses the same vocabulary as far as practicable. However, it is fully compatible with XML tools, and even with other XML namespaces within the same document, which are not allowed in the HTML5 syntax.
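As a rough illustration of that automatic derivation, here is a toy sketch in Python. It is not the parsing algorithm from the spec: it assumes only two optional-end-tag rules (a new li or p ends an open one of the same kind, and an explicit end tag closes everything still open inside it), it ignores the XHTML namespace declaration, and the names `ToyXmlSerializer` and `to_xml` are invented for this example. A real conversion would use a conforming HTML5 parser.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Simplifying assumptions: only these rules are modelled here.
CLOSED_BY_SAME_TAG = {"li", "p"}  # a new <li>/<p> ends an open sibling
VOID = {"br", "hr", "img", "input", "link", "meta"}  # never have end tags


class ToyXmlSerializer(HTMLParser):
    """Re-serializes lenient HTML syntax with explicit quotes and end tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open = []  # stack of currently open element names
        self.out = []   # pieces of the XML output

    def close_until(self, tag):
        # Pop and close elements until `tag` is closed (all if tag is None).
        while self.open:
            top = self.open.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_starttag(self, tag, attrs):
        if tag in CLOSED_BY_SAME_TAG and self.open and self.open[-1] == tag:
            self.close_until(tag)
        # Boolean attributes have no value in HTML; XML needs one.
        rendered = "".join(f" {k}={quoteattr(v or '')}" for k, v in attrs)
        if tag in VOID:
            self.out.append(f"<{tag}{rendered}/>")
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open:  # stray end tags are ignored
            self.close_until(tag)

    def handle_data(self, data):
        self.out.append(escape(data))


def to_xml(html_text):
    parser = ToyXmlSerializer()
    parser.feed(html_text)
    parser.close()
    parser.close_until(None)  # close anything still left open
    return "".join(parser.out)


print(to_xml("<ul><li><a href=/about.htm>About Us</a><li>bar</ul>"))
```

For the fragment above, this prints the same list with quoted attribute values and explicit closing li tags, which an XML parser will then accept.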