
It's definitely rarer than double or single quotes occurring in a string. But I was wondering about the parent comment's concern about passing a string through multiple levels of escaping.

> until you need to get your string through several levels of escape. how many backslashes to add? depends on how deep your pipe is and how each of those layers is defined



This is the beauty of balanced quotes: they completely eliminate the need for multiple escapes.

When the same character is used as both an open and close delimiter, you have to disambiguate between three possibilities: opening a new string, closing the current string (which may or may not be embedded) and a literal character as a constituent of the current string. By convention, an unescaped double-quote inside a string indicates closing that string, so you need different escapes to indicate opening embedded strings and constituents.

You could have done that by using two different escape characters, but for historical reasons there is only one escape character: the backslash. So that one character has to do double-duty to disambiguate two different cases. But in fact it's even worse than that because string parsers have a very shallow understanding of backslashes. To a string parser, a backslash followed by another character means only that the following character should be treated as a constituent. So you still need to disambiguate between actual constituents and opening an embedded string, and the only way to do that, because all you have is the backslash, is with more backslashes. The whole mess is just a stupid historical accident.

If you used balanced quotes you only have one case that needs to be escaped: constituents. So you never need multiple escapes.

Note that I made a mistake when I wrote:

> Only if you want to refer to [a close-quote character] literally as a closing quote rather than having it act as a closing quote.

You have to escape both open and close quotes to refer to them as constituents. In other words you would need to write something like this:

«Here is an example of a «nested string». The start of a nested string is denoted by a \« character. The end of a nested string is denoted by a \» character.»

Note that it doesn't matter how many levels deep you are:

«Even when you write «a nested string that refers to \« or \» characters» you only need one level of escape.»

Note that when you refer to quote characters as balanced pairs as in the examples above you don't actually need the escapes. The above strings will parse just fine even without the backslashes, and they will print out exactly as you expect. The only "problem" will be that they will contain embedded strings that you probably did not intend. The only time escapes are actually required is when referring to quote characters as constituents without balancing them. This will always be the case if you refer to a close-quote without a corresponding preceding open-quote, which is the reason I got it wrong: escaping close-quotes will be more common than escaping open-quotes, but both will be needed occasionally.
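To make the one-level-of-escape claim concrete, here is a minimal sketch of a reader for «...» literals. It is only an illustration under my own assumptions: the name parse_balanced and its return shape are hypothetical, and a backslash is taken to mean "the next character is a constituent", nothing more.

```python
OPEN, CLOSE, ESC = "«", "»", "\\"

def parse_balanced(src):
    """Read one «...» literal from the start of src.

    Returns (text, nested, consumed): the contents with escapes resolved,
    the number of embedded strings opened by unescaped open-quotes, and
    how many characters of src the literal occupied.
    Hypothetical helper, not taken from any existing language.
    """
    if not src.startswith(OPEN):
        raise ValueError("literal must start with an open quote")
    out, depth, nested, i = [], 1, 0, 1
    while i < len(src):
        ch = src[i]
        if ch == ESC:
            # One escape level: the next character is a plain constituent.
            if i + 1 >= len(src):
                raise ValueError("dangling escape")
            out.append(src[i + 1])
            i += 2
            continue
        if ch == OPEN:
            depth += 1
            nested += 1
        elif ch == CLOSE:
            depth -= 1
            if depth == 0:
                return "".join(out), nested, i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated literal")

# Same printed text, with and without escapes; only the nesting differs.
print(parse_balanced("«a «b» c»")[:2])
print(parse_balanced("«a \\«b\\» c»")[:2])
```

The escaped and unescaped forms yield the same text; the unescaped one just records an embedded string. An unbalanced close-quote such as «a \» b» is the case where the escape is actually required.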


I totally agree with the idea that balanced quotes are needed to make quoting sane. If the quotes in a string are balanced then it should be possible to quote it with no changes.

I would also advocate the principle that you don't escape the escape character by doubling it. There are two problems with replacing \ with \\: firstly the length of the string doubles with each nested quotation; secondly you can't tell at a glance whether \\\\\\\\\\\\\\\\\\\n contains a newline character or an n because it depends on whether the number of backslashes is odd or even.
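A small sketch of both problems, assuming the usual convention of escaping a backslash by doubling it. The helper names (escape_once, nest, is_newline_escape) are made up for illustration.

```python
def escape_once(s):
    """One quoting layer that escapes the escape character by doubling it."""
    return s.replace("\\", "\\\\")

def nest(s, levels):
    """Push s through the given number of quoting layers."""
    for _ in range(levels):
        s = escape_once(s)
    return s

def run_before_last(s):
    """Count the backslashes immediately before the final character."""
    body = s[:-1]
    return len(body) - len(body.rstrip("\\"))

def is_newline_escape(s):
    """True if a string ending in 'n' decodes to a newline at this layer.

    An odd run of backslashes leaves one over to pair with the 'n';
    an even run is all escaped backslashes followed by a literal 'n'.
    """
    return s.endswith("n") and run_before_last(s) % 2 == 1

start = "\\n"  # two characters: a backslash and an n
for levels in range(4):
    s = nest(start, levels)
    print(levels, s.count("\\"), is_newline_escape(s))
```

The backslash count doubles with every layer, and the only way to tell what the final n means is to count the run and check its parity.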

Another useful principle is to escape a quote character with a sequence that does not contain that character: then it is much easier to check whether the quotes are balanced because you don't need to check whether any of them are escaped.

So here's a possible algorithm for quoting a string: first identify the top-level quote characters that don't match (this is not totally trivial but it isn't difficult or computationally expensive); then, in parts of the string that are not inside nested quotes, but only there, replace « with \<, » with \>, and \ with \_ (say). Does that work?
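Here is one way to read that proposal as code. It is a rough sketch, not a proof that the scheme is sound: the function names are hypothetical, unmatched quotes are found with a simple stack, and text inside balanced nested quotes is copied through untouched, backslashes included.

```python
OPEN, CLOSE, ESC = "«", "»", "\\"

def unmatched_quotes(s):
    """Return the indices of quote characters that have no partner."""
    stack, unmatched = [], set()
    for i, ch in enumerate(s):
        if ch == OPEN:
            stack.append(i)
        elif ch == CLOSE:
            if stack:
                stack.pop()
            else:
                unmatched.add(i)
    unmatched.update(stack)  # open-quotes that were never closed
    return unmatched

def quote(s):
    """Wrap s in «...», escaping only at the top level."""
    unmatched = unmatched_quotes(s)
    out, depth = [], 0
    for i, ch in enumerate(s):
        if i in unmatched:
            out.append("\\<" if ch == OPEN else "\\>")
        elif ch == OPEN:
            depth += 1
            out.append(ch)
        elif ch == CLOSE:
            depth -= 1
            out.append(ch)
        elif ch == ESC and depth == 0:
            out.append("\\_")
        else:
            out.append(ch)
    return OPEN + "".join(out) + CLOSE

def unquote(q):
    """Inverse of quote(); assumes q was produced by quote()."""
    decode = {"<": OPEN, ">": CLOSE, "_": ESC}
    body, out, depth, i = q[1:-1], [], 0, 0
    while i < len(body):
        ch = body[i]
        if ch == ESC and depth == 0:
            out.append(decode[body[i + 1]])
            i += 2
            continue
        if ch == OPEN:
            depth += 1
        elif ch == CLOSE:
            depth -= 1
        out.append(ch)
        i += 1
    return "".join(out)

print(quote("say «hi»"))
print(quote("a » b"))
print(quote(quote("a » b")))
```

Because the escape sequences contain no quote characters, every « and » left in the quoted text belongs to a balanced pair, so quoting an already quoted string only adds the outer pair and the length grows by two per level.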


It might work but I think it misses the point. The current mess is in no small measure the result of assuming that we have to constrain ourselves to ascii. We don't. Once you accept balanced quotes you have implicitly accepted using unicode characters, at which point a whole host of new possibilities (and new problems -- see below) opens up. For example, the only reason you need \n (and \r and \t etc.) is because you want a way to use non-whitespace characters to represent whitespace. But this feature is already built in to unicode, which has dedicated non-whitespace versions of all of the ascii whitespace and control characters (␍, ␊, ␉, ␀, etc.) so there is no need to escape any of these.
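As a sketch of how that could look in practice: the Unicode Control Pictures block puts the picture for control code N at U+2400 + N (for N from 0x00 to 0x1F), so translating between the visible symbols and the real control characters is a fixed offset. The function names here are my own, and treating every control picture in a literal as its control character is an assumption about how such a language would behave.

```python
PICTURE_BASE = 0x2400  # start of the Unicode Control Pictures block

def decode_pictures(literal):
    """Replace control pictures (␀ through ␟) with the control characters they depict."""
    out = []
    for ch in literal:
        offset = ord(ch) - PICTURE_BASE
        out.append(chr(offset) if 0 <= offset < 0x20 else ch)
    return "".join(out)

def encode_pictures(text):
    """Replace ASCII control characters with their visible pictures."""
    return "".join(
        chr(PICTURE_BASE + ord(ch)) if ord(ch) < 0x20 else ch for ch in text
    )

print(encode_pictures("name\tvalue\r\n"))
print(repr(decode_pictures("line one␊line two")))
```

With this convention a literal never needs \n, \r or \t. Writing a control picture as itself would still need the escape character, which is the remaining problem.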

That leaves only the problem of escaping the escape character, and here again there is no need to constrain ourselves to ascii. There is no reason that the escape character needs to be backslash. In fact, that is a particularly poor choice because backslash, being an ascii character, is extremely precious real estate. It is doubly precious because it actually has a balanced partner in the forward slash, so if you are going to use backslash for any special purpose it should be partnered with forward slash as a balanced set (which opens up the problem of what to use for the directory delimiter in your operating system, but that's another can o' worms).

I think the Right Answer is simply to choose a different character to serve as the escape character inside balanced strings. My first pick would probably be ␛, but there are obviously a lot of other possibilities.

This points to a potential danger of this approach: there are a lot of unicode characters that render very similarly, like U and ᑌ. You would need to choose the unicode characters with special meanings very judiciously, and make sure that when you are writing code you have an editor that renders them in some distinctive way so you can be sure you're typing what you think you're typing. But that seems doable.
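A rough sketch of the kind of check an editor or linter could run, using Python's standard unicodedata module. The set of special characters and the helper names are assumptions for illustration; a real tool would more likely work from the Unicode confusables data than from a plain allowlist.

```python
import unicodedata

# Hypothetical set of non-ASCII characters the language gives special meaning to.
SPECIALS = {"«", "»", "\u241b"}

def describe(ch):
    """Code point and official name, so look-alikes can be told apart."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

def suspicious(source, specials=SPECIALS):
    """List (index, description) for non-ASCII characters outside the special set."""
    return [
        (i, describe(ch))
        for i, ch in enumerate(source)
        if ord(ch) > 0x7F and ch not in specials
    ]

print(describe("U"))
print(describe("ᑌ"))
print(suspicious("«ᑌ»"))
```

The two letters render almost identically but have different code points and names, so a tool that prints or highlights them this way makes the difference visible even when the font does not.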



