
What do you mean? I just push the Record Separator key on my keyboard.

/s in case :)




The entire argument against ASCII Delimited Text boils down to "No one bothered to support it in popular editors back in 1984. Because I grew up without it, it is impossible to imagine supporting it today."

You need 4 new keyboard shortcuts, for example Ctrl+comma, Ctrl+period, Ctrl+[ and Ctrl+]. You need 4 new character symbols. You need a few new formatting rules, pretty much page breaks decorated with the new symbols. It's really not that hard.

But, like many problems in tech, the popular advice is "Everyone recognizes the problem and the solution. But, the problematic way is already widely used and the solution is not. Therefore everyone doing anything new should invest in continuing to support the problem forever."


> The entire argument against ASCII Delimited Text boils down to "No one bothered to support it in popular editors back in 1984. Because I grew up without it, it is impossible to imagine supporting it today."

There's also the argument of "Now you have two byte values that cannot be allowed to appear in a record under any circumstances. (E.g., incoming data from uncontrolled sources MUST be sanitized to reject or replace those bytes.)" Unless you add an escaping mechanism, in which case the argument shifts to "Why switch from CSV/TSV if the alternative still needs an escaping mechanism?"
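To make the in-band problem concrete, here is a minimal Python sketch of a writer and reader. It assumes the usual proposal: unit separator (0x1F) between fields and record separator (0x1E) between records. The names `encode` and `decode` are made up for illustration, and rejecting bad input is only one possible policy.

```python
# ASCII control characters used by the "ASCII Delimited Text" idea.
US = "\x1f"  # unit separator: placed between fields
RS = "\x1e"  # record separator: placed between records


def encode(rows):
    """Join rows into one string, rejecting any field that contains a delimiter."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError(f"field contains a delimiter: {field!r}")
    return RS.join(US.join(row) for row in rows)


def decode(text):
    """Split a delimited string back into rows of fields."""
    return [record.split(US) for record in text.split(RS)]
```

Commas and newlines pass through untouched, which is the selling point. Without the check in `encode`, a single field containing 0x1F would silently come back as two fields.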


One benefit of binary formats is not needing the escaping.


Length-delimited binary formats do not need escaping. But the usual "ASCII Delimited Text" proposal just uses two unprintable bytes as field and record separators, and the signalling is all in-band.

This means that records must not contain either of those two bytes, or else the format of the table will be corrupted. And unless you're producing the data yourself, this means you have to sanitize the data before adding it, and have a policy for how to respond to invalid data. But maintaining a proper sanitization layer has historically been finicky: just look at all the XSS vulnerabilities out there.

If you're creating a binary format, you can easily design it to hold arbitrary data without escaping. But just taking a text format and swapping out the delimiters does not achieve this goal.
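For comparison, a length-delimited layout might look like the sketch below. The 4-byte big-endian length prefix and the function names are assumptions picked for illustration, not any particular existing format.

```python
import struct


def encode_fields(fields):
    """Prefix each field with its length as a 4-byte big-endian unsigned int."""
    out = bytearray()
    for field in fields:
        out += struct.pack(">I", len(field))
        out += field
    return bytes(out)


def decode_fields(data):
    """Read length-prefixed fields back out of a byte string."""
    fields = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if pos + length > len(data):
            raise ValueError("truncated field")
        fields.append(data[pos:pos + length])
        pos += length
    return fields
```

Because the reader never scans the payload for a delimiter, a field can hold any bytes at all, including 0x1E and 0x1F, and nothing needs escaping or sanitizing.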


I did mean length-delimited binary formats (rather than ASCII formats).


At least you don't need these values in your data, unlike the comma, which shows up in human-written text.

If you do need these values in your data, then don't use them as delimiters.

Something the industry has stopped doing, but maybe should do again, is restricting the characters that can appear in data. "The first name must not contain a record separator" is quite a reasonable restriction. Even Elon Musk's next kid won't be able to violate that restriction.


Hear, hear! Why is all editing done with text-based editors where humans can make syntax errors? Is it about job security?


In Windows (and DOS EDIT.COM and a few other similarly ancient tools) there have existed Alt+028, Alt+029, Alt+030, and Alt+031 for a long time. I vaguely recall some file format I was working with in QBASIC used some or all of them and I was editing those files for some reason. That was not quite as far back as 1984, but sometime in the early 1990s for sure. I believe EDIT.COM had basic glyphs for them too, but I don't recall what they were, might have been random Wingdings like the playing card suits.

Having keyboard shortcuts doesn't necessarily solve why people don't want to use that format, either.


> I believe EDIT.COM had basic glyphs for them too, but I don't recall what they were, might have been random Wingdings like the playing card suits.

That is not specific to EDIT.COM; they are the PC characters with the same codes as the corresponding control characters, so they appear as graphic characters. (They can be used in any program that can use the PC character set.)

However, in EDIT.COM and QBASIC you can also prefix a control character with CTRL+P in order to enter it directly into the file (and they appear as graphic characters, since I think the only control characters they will handle as control characters are tabs and line breaks).

Suits are PC characters 3 to 6; these are PC characters 28 to 31 which are other shapes.


The keys would be something other than those, though. They would be: CTRL+\ for file separator, CTRL+] for group separator, CTRL+^ for record separator, CTRL+_ for unit separator. Other than that, it would work like you described, I think.
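Those key combinations follow from the traditional terminal convention that Ctrl keeps only the low five bits of the key's ASCII code. A small Python sketch, where `ctrl` is a hypothetical helper and the convention is assumed to apply (a GUI editor may bind these keys to something else):

```python
def ctrl(key):
    """Code sent by Ctrl+key under the classic terminal convention (keep the low 5 bits)."""
    return ord(key) & 0x1F


# The four ASCII separators and the keys that produce them.
SEPARATORS = {
    "file separator (FS)": ctrl("\\"),   # 0x1C, decimal 28
    "group separator (GS)": ctrl("]"),   # 0x1D, decimal 29
    "record separator (RS)": ctrl("^"),  # 0x1E, decimal 30
    "unit separator (US)": ctrl("_"),    # 0x1F, decimal 31
}
```

The decimal values 28 to 31 are the same codes as the Alt+028 to Alt+031 entries mentioned elsewhere in the thread.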

> But, like many problems in tech, the popular advice is "Everyone recognizes the problem and the solution. But, the problematic way is already widely used and the solution is not

This is unfortunately common. However, something else that happens is disagreement about what the problem and the solution actually are.


It's more like, "because the industry grew up without it, other approaches gained critical mass."

Path dependence is a thing. Things that experience network effects don't get changed unless the alternative is far superior, and ASCII Delimited Text is not that superior.

Ignoring that and pushing for it anyway will at most achieve an xkcd 927.


I’m pretty sure those used to exist.

But when looking for a picture to back up my (likely flawed) memory, Google helpfully told me that you can get a record separator character by hitting Ctrl-^ (caret). Who knew?



