Ask HN: Why is HN HTML laid out in tables?
15 points by joeybaker on Dec 18, 2010 | 36 comments
Hacker News is certainly not HTML5 friendly. Heck, it doesn't even validate as HTML4: http://validator.w3.org/check?uri=http%3A%2F%2Fnews.ycombinator.com%2F&charset=%28detect+automatically%29&doctype=Inline&group=0

<body><center><table border=0 cellpadding=0 cellspacing=0 width="85%" bgcolor=#f6f6ef><tr><td bgcolor=#ff6600><table border=0 cellpadding=0 cellspacing=0 width="100%" style="padding:2px"><tr><td style="width:18px;padding-right:4px"><a href="http://ycombinator.com"><img src="http://ycombinator.com/images/y18.gif" width=18 height=18 style="border:1px #ffffff solid;"></img></a></td><td style="line-height:12pt; height:10px;"><span class="pagetop"><b><a href="http://news.ycombinator.com/news">Hacker News</a></b><img src="http://ycombinator.com/images/s.gif" height=1 width=10><a href="http://news.ycombinator.com/newest">new</a> | <a href="http://news.ycombinator.com/newcomments">comments</a> | <a href="http://news.ycombinator.com/ask">ask</a> | <a href="http://news.ycombinator.com/jobs">jobs</a> | <a href="http://news.ycombinator.com/submit">submit</a> | <font color=#ffffff>joeybaker's comments</font></span></td><td style="text-align:right;padding-right:4px;"><span class="pagetop"><a href="http://news.ycombinator.com/x?fnid=0uLs1HxN7c">login</a></span></td></tr></table></td></tr><tr style="height:10px"></tr><tr><td><tr><td><table border=0><tr><td> …………




Because it works. No mucking about with CSS overflows, stuff that jumps to the next line, things that don't align, etc.


Because despite what the CSS purists say, tables work damn well for laying out content.


Just like an iPad works as a door stop.

edit: heh, downvotes. I am not a "CSS purist"; there are times when tables are the correct solution, but they aren't here. Being able to say "it works" doesn't explain or justify their use.


Assuming you don't care about blind users, yes.


Do you really think that the only reason to use CSS over tables is purism? CSS is much easier to manage than tables, and CSS frameworks have fixed all the issues related to cross-browser problems.


I rewrote the HN markup to use XHTML + CSS + MicroFormats last year, but never got a chance to re-write the actual templating code in news.arc (http://www.arclanguage.org).

If anyone's interested in finishing that last part, let me know and I'll be happy to send you the templates and stuff I made. I also added some rudimentary support for mobile-specific stylesheets and scripts.


i don't know if i'll have time to finish anything, but if you would like to send it to me (email in profile), i'd be interested in looking it over and trying it out.


I know a lot of folks here are actual programmers (unlike me, I mean) and can nitpick anything to death, but maybe folks have missed the part about HN being a little side project that exists to serve the goals of YC? It isn't a high priority for PG. YC is the real business. This is a free service and it serves some of YC's needs but there are no ads, it isn't monetized, etc. Unlike some of the websites folks here own, it seems to me HN does not provide enough value for pg to justify jumping through hoops backwards, blindfolded and on fire to make folks here happy with some cutting edge, WOW! coding.

(Personal note: Now I feel better about my sites being laid out in tables. :-P)


Because it doesn't matter.


If someone writes C/Python/Lua/language-of-your-choice code that violates good principles, misuses constructs, and generally looks like crap, everyone calls them a lousy coder. Yet when it comes to HTML, using semantically incorrect markup is somehow ok.

I understand why pg hasn't updated the display markup, and it's his website, so he can do what he likes with it, but to say "it doesn't matter" is to give anyone writing bad markup a pass. It degrades our artform. That matters.


In a previous lifetime, I worked on the code generator for a compiler for an implementation language. We had a rule that said "A good compiler generates assembly code that an assembly programmer would be fired for writing." Early releases of that compiler would get bug reports from the OS implementation team that "this code is wrong". We would sit down and walk through the generated code, and there would be an "Oh" moment where the programmer would realize that it was simply different, and in all cases, faster.

Similarly, the code generated by good Python, to a CPU, looks like crap because it repeats stuff, allocates and releases memory, makes unnecessary calls, swaps stuff between stack and registers, and a whole host of other sins.

But we don't care, because what we care about is the end result: the Python or Lua or Ruby code generates lovely computational results. Perhaps 1% of us spend time looking in detail at the actual machine code generated. The purpose of these languages is to 1) let us write good, readable code and 2) generate a useful, sometimes beautiful result.

Similarly, the purpose of this particular Arc program is to generate a useful result, that is, a readable, useful set of html files that our browsers render readably.

So really, "it doesn't matter" if the assembly language is junk, or if the html is not something early versions of Patrick would turn out with notepad.

Think of the generated html as assembly language. It no more degrades our art form than the goofy stuff that modern languages make our CPUs eat.


I'm sorry, but you have an incorrect view of HTML. HTML's sole purpose is not display. If that were the case, it would be a simple matter of identifying what renders fastest and most accurately, but again, that's not HTML's sole purpose.

The end goal of HTML is to convey information about the structure and context of the content, not just how it is displayed. So we must care about the HTML generated. It is requisite to its function in providing additional machine-readable information.


So what about the information and structure of the HN site is missing by the way that it is currently displayed? What additional machine-readable information is missing in the way that HN (or really any other site) produces the information? Search?

There are a number of aggregator projects that various HN members have built by scraping this "broken" html and they seem to work quite nicely.

I spent a couple of years deep in the SGML world and am fully cognizant of all the arguments about how content needs to be completely separated from presentation. HTML is really a weak sister in that world. I don't think my view of HTML is incorrect.

In the real world, the ship of requiring correct HTML from a grammatical perspective left the harbor back in the 90's. If what you say were true, browsers would refuse to render broken HTML.


You clearly know the answer to your own question, but you don't think it's important. I'd ask you to take a step back and have a look at what I'm saying. I'm saying that I understand why pg has set his priorities as he has. All I'm suggesting is that we be honest about it. Let's not say it "doesn't matter".

Currently, aggregator projects work with HN because they know and understand the HN markup specifically. In an ideal world (and one in which we don't live, obviously), a "scraper" library should be able to identify things like comment streams based on contextual information. Think of the power that comes just from having indexes that are able to identify the title and content body. Now, what if we take that a step further and build an indexer that can recognize comments. One that can infer that one comment is made in reference to another based on its nested hierarchy. Are tables the right structure for that?
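To make the nesting point concrete, here is a minimal sketch of a scraper that infers reply relationships purely from list nesting. The markup is hypothetical (it is not what HN emits), and the `comment` class and `c1`-style ids are assumptions made up for illustration. It uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class CommentTreeParser(HTMLParser):
    """Infer each comment's parent from the nesting of <li> elements."""

    def __init__(self):
        super().__init__()
        # One entry per currently open <li>: the comment id, or None if the
        # <li> is not a comment (assumed convention: class="comment").
        self.open_items = []
        # Maps comment id -> parent comment id (None for top-level comments).
        self.parent_of = {}

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        attrs = dict(attrs)
        is_comment = "comment" in (attrs.get("class") or "").split()
        comment_id = attrs.get("id") if is_comment else None
        if is_comment:
            # The nearest enclosing comment <li>, if any, is the parent.
            parent = next(
                (c for c in reversed(self.open_items) if c is not None), None
            )
            self.parent_of[comment_id] = parent
        self.open_items.append(comment_id)

    def handle_endtag(self, tag):
        if tag == "li" and self.open_items:
            self.open_items.pop()


def parse_comment_tree(markup):
    parser = CommentTreeParser()
    parser.feed(markup)
    return parser.parent_of


# Hypothetical semantic markup for a small thread.
SAMPLE = """
<ol class="comments">
  <li class="comment" id="c1">Because it works.
    <ol>
      <li class="comment" id="c2">Assuming you don't care about blind users, yes.</li>
    </ol>
  </li>
  <li class="comment" id="c3">Because it doesn't matter.</li>
</ol>
"""

if __name__ == "__main__":
    print(parse_comment_tree(SAMPLE))
```

Nothing in the parser knows about a particular site; it only relies on the (assumed) convention that a reply is a list item nested inside the comment it answers.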

I'm asking you to dream. I'm asking you not to be complacent with the tools that "work" today. That's all. If you're content to use what you've got, and you don't care if we ever end up with markup that enables these powerful new ways of relating to data, fine, but don't say it doesn't matter. It matters.


Part of being a good engineer is understanding tradeoffs and figuring out when contextually-dependent knowledge doesn't apply.

IMHO, it doesn't matter - for Hacker News. That's independent of whether it matters for whatever website you're being paid to write at any given job.


I agree 100% regarding tradeoffs. However, saying that something is a "tradeoff" implies value. I.e., how can one evaluate a trade for something with no relative value? Saying something "doesn't matter" means it is not considered. Saying it's a "tradeoff" means you considered it, but found it less important than other factors. Those are two different things.

This is just a semantic argument, but it's one I find important. At our startup, we came out of the gate with viable, working software in under six weeks. Believe me, there were tradeoffs. But at no point in that process did I say, "this doesn't matter". That's lying to myself. It matters, but I will accept for today that I cannot have both. This helps us avoid complacency and keeps our product sharp.


But you're looking at generated code, not written code. What you're suggesting is like criticizing the output of your C compiler for not following good assembly programming practices.

On the other hand, there are good reasons for fixing the markup, as pointed out elsewhere in this thread (accessibility, JS DOM manipulation, etc).


That analogy doesn't hold up, because HTML is not just machine code. The next iteration of the web requires that we think of HTML as a means to provide meaning, not just format for display.

Generated code doesn't have to misuse constructs. You can just as easily generate a nested unordered-list with good semantic IDs and class names.
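As a rough illustration of that claim, here is a sketch of a generator that emits a nested unordered list from a comment tree. It is written in Python rather than Arc, and the dictionary shape (`id`, `text`, `replies`) and the class names are assumptions for the example, not anything taken from news.arc.

```python
from html import escape


def render_comments(comments, indent=0):
    """Render a list of comment dicts as a nested <ul> with class names and ids.

    Each comment is assumed to look like:
        {"id": 1, "text": "...", "replies": [ ...more comments... ]}
    """
    pad = "  " * indent
    lines = [f'{pad}<ul class="comments">']
    for comment in comments:
        lines.append(f'{pad}  <li class="comment" id="comment-{comment["id"]}">')
        # Escape user text so it cannot inject markup into the page.
        lines.append(f'{pad}    <p class="comment-body">{escape(comment["text"])}</p>')
        if comment.get("replies"):
            # Replies become a nested list inside the parent's <li>.
            lines.append(render_comments(comment["replies"], indent + 2))
        lines.append(f"{pad}  </li>")
    lines.append(f"{pad}</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    thread = [
        {"id": 1, "text": "Because it works.", "replies": [
            {"id": 2, "text": "Just like an iPad works as a door stop."},
        ]},
        {"id": 3, "text": "Because it doesn't matter."},
    ]
    print(render_comments(thread))
```

The generator is no more complicated than one that emits table rows; the reply structure simply ends up in the nesting instead of in spacer cells.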


What's 'incorrect'? It's just following an older standard.

Will all existing C++ code be 'incorrect' once C++0x is finally released?


<table> has a meaning. It expresses recurring, orderly relationships between <th> and <td> elements. It has always been incorrect to cram in content that isn't semantically linked that way, purely to exploit a typical rendering of it. It was incorrect even when there existed no supported way to request a similar rendering, though that was a reasonable excuse if you only cared about appearance.


Sure, unless you'd like to validate, parse, maintain, or change the codebase.

edit: oh, and proper markup is faster.


Here's pg's official answer (http://news.ycombinator.com/item?id=1998708):

why is the UI so completely neglected?

Because when I spend time on HN my top priority is features that will make the content better. I believe that matches the priorities of the users-- that users would rather use a site with good stories and comments and a primitive UI than one with a slick UI and worse stories and comments. And time is a zero-sum game. Spending more time on UI = spending less on quality.

The focus on content quality above all is the reason you find yourself saying later "If there were any other community like this..."

[...]


The OP isn't talking about Ajax calls and DOM transformations. He's talking about using semantically correct markup (c'mon, it is literally an ordered list) and a style sheet. Things that would make the site faster, easier to maintain, and easier to parse.

If this were anyone other than pg, you'd all be excoriating the developer for living in the 90s.


It wouldn't make the site easier to maintain. This stuff is all generated by software. All that would change is what the software generated as output.

How much faster would HN pages render if they used "semantically correct markup?"


The big benefit is to accessibility through screenreaders or other alternate clients.

Google's HTML is also all generated. They switched to CSS-based markup in 2007. It is a bit easier to maintain, but only a little, and mostly because of obscure things like bold tags needing to be red and normal-weight in Chinese because bolded Chinese characters look like shit.

Size and rendering speed wise, it's basically a wash. The HTML is significantly smaller, but the CSS is significantly larger, and CSS is a massive pain to refactor and can easily lead to bugs.

It's much easier to manipulate with JavaScript, because the DOM structure of the page is much simpler. Tables also have weird implicit element insertion rules, eg. the first direct child of the <table> on this page is a <tbody> tag, not the <tr> tag that you outputted as source. That alone is a big reason why Google would never go back, though I don't know how much it applies to a mostly static site like this.


Why are you generating styling at all? It's not dynamic information. It's a static asset that can be cached.

Semantically correct markup uses fewer tags, so the output is smaller (also, there'd be no inline styling, which would also make the page smaller). Even a small difference on a popular site could matter. But only you can answer that for HN.

Look, do what you want. You want to write bullshit markup, be my guest. But there isn't a single web dev in this community who would code output like this for their own site. They're just too chickenshit to tell you.


Generating and caching are orthogonal questions. You can generate stuff, and then cache it. And in fact HN does do a lot of caching.

I wasn't asking how much faster "semantically correct markup" would make HN pages render as a rhetorical question. I'm genuinely curious. If you want to make claims that x is faster than y, you should be prepared to back them up with numbers, not merely with more heated language.


Generating and caching can be intertwined. For example, HN seems to generate an inline script at the top of every page. The script doesn't change nearly as often as the posts do, so that script is sent redundantly at the top of every page. If that script were located at a separate URL, set to never expire, every visitor would load it once and never again. If you need to change the script, you can change the linked URL. Doing that can avoid an expensive calculation for the server, but also a network request for the client which could be very, very expensive.
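A minimal sketch of that idea, assuming the script is served from a URL derived from a hash of its contents. The `/static/vote` prefix and the script body are made up for the example, and "never expire" is approximated with a one-year `max-age`, which is a common convention rather than anything HN is known to do.

```python
import hashlib


def versioned_asset(path_prefix, content, extension):
    """Return (url, headers) for a static asset whose URL changes with its content.

    Because the URL embeds a hash of the bytes, the response can be cached
    for a very long time: changing the script yields a new URL, so clients
    never see a stale copy.
    """
    digest = hashlib.sha256(content).hexdigest()[:12]
    url = f"{path_prefix}.{digest}.{extension}"
    headers = {
        # One year, marked immutable; an approximation of "expire never".
        "Cache-Control": "public, max-age=31536000, immutable",
    }
    return url, headers


if __name__ == "__main__":
    script = b"function vote(node) { /* hypothetical script body */ }"
    url, headers = versioned_asset("/static/vote", script, "js")
    # The page then links to the script instead of inlining it.
    print(f'<script src="{url}"></script>')
    print(headers)
```

The generated page only has to carry the short `<script src=...>` tag; the script itself is fetched once per visitor until its contents change.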

To me, it sounds like the caching you're talking about is on the server side. I think you mean something similar to memoization, so you can avoid doing some expensive query or calculation. That is worth doing, but it's still possible to organize the output of those caches in an inefficient manner, and incur unnecessary network overhead. The "semantically correct" part of the markup being advocated here isn't that interesting, if you ask me. What is interesting is the claim that you can generate less markup per page, and get the same display.

The balance between the repeat visitor cache behavior and the initial number of HTTP requests and latency for a first-time visitor can get hard to judge. Without lots of time to measure the various alternatives and mitigation tactics, it's best to try to generate as little markup as possible. That's where so-called "semantic markup" comes in. Usually it's just less markup, and does better.

Another thing to take into account is the layout behavior triggered in various browsers by the markup you're generating. HN is pretty simple and should render instantly, but it doesn't in anything I've tested (Firefox, Chrome, Safari). Each of them redraws the scrollbar one or more times. That could be due to the tables.


I am trying to dig up the old benchmarks that created the "tables are slow" mantra, but like all such things, there is a grain of truth to it. Before IE 5 and the addition of the table-layout CSS property, tables were considerably slower and more resource-intensive for the browser to render. That being said, the point is mostly moot in modern browsers. While there are differences in performance between CSS and table layout, they are too small to be cause for concern, unless you are specifically targeting browsers older than IE 5.


I personally believe that it would, because you can then generate the site and allow what the site looks like to be external from that generation. Managing what the site looks like, independent from what the site says, is good clean separation of concerns and does in fact promote maintainability.


"If this were anyone other than pg, you'd all be excoriating the developer for living in the 90s." ← Yup, that was my point.


Actually that's the answer to a different question.


I assumed your answer would be along those lines since HTML markup is closely related to UI.


I said it once and got downvoted to hell. I'll say it again. By removing tables and moving all styles to a separate css file you can cut page size in half.

HN is in need of an urgent redesign.
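A claim like "cut page size in half" can be checked rather than argued. Here is a sketch of how one might measure it, comparing raw and gzipped byte counts for two markups. Both row fragments are invented for illustration and are not HN's actual output, so whatever numbers this prints say nothing about the real site; the point is only the method.

```python
import gzip


def markup_sizes(markup):
    """Return (raw_bytes, gzipped_bytes) for a markup string."""
    raw = markup.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Hypothetical table-based story row, loosely in the style of 2010-era layouts.
TABLE_ROW = (
    '<tr><td align=right valign=top class="title">1.</td>'
    '<td><center><a href="#"><img src="up.gif" border=0 vspace=3 hspace=2></a></center></td>'
    '<td class="title"><a href="http://example.com/">Example story</a></td></tr>'
)

# Hypothetical list-based equivalent, with styling assumed to live in a CSS file.
LIST_ROW = (
    '<li class="story"><a class="vote" href="#">up</a> '
    '<a href="http://example.com/">Example story</a></li>'
)


def compare(rows=30):
    """Build two synthetic pages with the same number of rows and measure both."""
    table_page = "<table>" + TABLE_ROW * rows + "</table>"
    list_page = "<ol>" + LIST_ROW * rows + "</ol>"
    return {"table": markup_sizes(table_page), "list": markup_sizes(list_page)}


if __name__ == "__main__":
    for name, (raw, gz) in compare().items():
        print(f"{name}: {raw} bytes raw, {gz} bytes gzipped")
```

A fair comparison would feed in a saved copy of the real page and a hand-converted equivalent, and would count the external stylesheet for first-time visitors. Comparing the gzipped figures matters too, since repetitive table markup tends to compress well.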


I've all but given up giving a damn about web standards. No one _REALLY_ cares. Not the users, not the developers, not the w3c with their open-to-interpretation "recommendations", and ESPECIALLY NOT browser vendors who, for various reasons, aren't able to nail down a consistent interpretation of web standards.

The choices are to use a framework which does the shit-work for you, or resort to lowest-common-denominator "whatever works" techniques like HTML tables.


It doesn't claim to be HTML5 to begin with.



