I don't see his point, really. Yes, C doesn't have a statically testable string type. And yes, the convention is that a C "string" is just an array of character data with a trailing NUL-Byte. He constructs an array without a trailing NUL-Byte - so that's not a string, but an array of characters.
The fact that copy() now happily runs through memory is the expected result of the bug in the calling code. No, it's not useful. Yes, there are problems with the whole approach of using a terminating value - but this doesn't seem to be his point (otherwise he would also have mentioned the linear runtime complexity of strlen() and the problems that arise when a string itself contains a NUL-byte, I suppose).
Now, what is his point? This is chapter 55 [!] in a book called "Learn C The Hard Way" and the author complains about well-known problems with the standard-lib string convention, optional braces, and the common C idiom of doing assignment and value-testing at the same time, _and_ calls all of this 'Deconstructing "K&R C"'?
Maybe a "What I personally don't like about C" would have been a better title. The K&R examples are flawless. The language and stdlib are not. That's well-known. What is new?
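To make the array-versus-string distinction concrete, here is a minimal sketch. The names is_terminated and bounded_copy are made up for illustration and are not from K&R or the book. It checks whether a char array actually holds a NUL inside its own storage, and copies with explicit bounds on both sides instead of trusting the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf contains a NUL within its first
   cap bytes, i.e. it is a valid C string inside its own storage. */
int is_terminated(const char *buf, size_t cap)
{
    assert(buf != NULL);
    return memchr(buf, '\0', cap) != NULL;
}

/* Hypothetical bounded copy: reads at most src_cap bytes from src, writes
   at most dst_cap - 1 bytes to dst, and always terminates dst.
   Returns the number of characters copied (not counting the NUL). */
size_t bounded_copy(char *dst, size_t dst_cap, const char *src, size_t src_cap)
{
    assert(dst != NULL && src != NULL && dst_cap > 0);
    size_t i = 0;
    while (i + 1 < dst_cap && i < src_cap && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

Given char raw[3] = {'a', 'b', 'c'}, is_terminated(raw, sizeof raw) is 0, so raw is an array of characters and not a string. bounded_copy still stops after three bytes because the caller said how big the source is.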
"Yes, there are problems with the whole approach of using a terminating value" And, of course, there are problems with the other approaches too. Using a single length byte at the beginning limits you to 255 characters and makes pointer manipulation less straightforward (since the actual data starts one byte further on). Using more than one byte for the string length wastes space (a serious issue at the time C was developed, and even today in the microcontroller world). The author's assumption that he's smarter than Kernighan, Ritchie, and Thompson probably isn't the best approach.
By the way, the old style of "leaving the cleanup to the OS," which supposedly "doesn't work in the modern world the way it did back in the day," works just fine on iOS and many other modern platforms.
It's way worse than an extra byte, or the offset of 1 byte for pointers; it also means you need a whole copy of every substring with its own length delimiter, and can't tokenize in place.
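A rough sketch of what "tokenize in place" means with NUL-terminated data. The function name split_in_place is hypothetical. Each delimiter is overwritten with '\0', so every token becomes a usable C string inside the original buffer, with no copies and no per-token length field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line on delim by overwriting each delimiter
   with '\0'. Stores a pointer to the start of each token in out and returns
   the number of tokens stored (at most max_tokens). If the input has more
   tokens than max_tokens, the remainder is dropped. */
size_t split_in_place(char *line, char delim, char **out, size_t max_tokens)
{
    assert(line != NULL && out != NULL && delim != '\0');
    size_t n = 0;
    char *p = line;
    while (n < max_tokens) {
        out[n++] = p;                 /* token starts here, inside line */
        char *d = strchr(p, delim);
        if (d == NULL)
            break;                    /* last token runs to the original NUL */
        *d = '\0';                    /* terminate this token in place */
        p = d + 1;
    }
    return n;
}
```

With a length-prefixed representation, each of those tokens would need its own length stored somewhere, which usually means copying the bytes out or keeping a separate pointer-plus-length record.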
C code gets into just as much trouble with length-delimited data structures as it does with ASCIIZ; ASCIIZ is a red herring. People have declared over and over again that it's the single worst decision in C and the cause of every buffer overflow. But if you look over the past 10 years or so, memcpy() has caused just as much chaos, and we're just as likely to find an overflow or integer mistake in binary protocols (where NUL-termination is nonsensical) as we are in ASCII protocols.
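For illustration, here is the kind of check that length-delimited code has to get right. The wire format (a 2-byte big-endian length followed by the payload) and the name read_record are assumptions made up for this sketch, not any real protocol. The bug class is trusting the claimed length and handing it straight to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record format: buf[0..1] is a big-endian payload length,
   followed by that many payload bytes. Returns 1 and fills out/out_len on
   success, 0 if the record is truncated or does not fit in out. */
int read_record(const unsigned char *buf, size_t buf_len,
                unsigned char *out, size_t out_cap, size_t *out_len)
{
    assert(buf != NULL && out != NULL && out_len != NULL);
    if (buf_len < 2)
        return 0;                       /* not even a length field */
    size_t n = ((size_t)buf[0] << 8) | buf[1];
    if (n > buf_len - 2)
        return 0;                       /* sender claims more than arrived */
    if (n > out_cap)
        return 0;                       /* would overflow the destination */
    memcpy(out, buf + 2, n);
    *out_len = n;
    return 1;
}
```

Drop either of the two comparisons and you have an overflow with no NUL terminator anywhere in sight.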
"Leaving the cleanup to the OS" works everywhere, on every modern system, and lots of programs would benefit from shedding a lot of useless bookkeeping and just treating the process address space as an arena to be torn down all at once. But I think the point the author was trying to make is, when you code that way, you make it impossible to hoist your code into another program as a module. Which is true; if it's likely you're writing a library as well as a program, you don't get to take the easy way out.
You can still write a 100 line arena allocator to pretend like you can, though. :)
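Something along these lines, as a bare-bones sketch and well under 100 lines. All names here are invented, and the fixed 16-byte alignment is an assumption that holds for ordinary objects on mainstream platforms, not a guarantee. Allocation is a pointer bump; cleanup is one free() for everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_ALIGN 16  /* assumed sufficient for any ordinary object */

typedef struct {
    unsigned char *base;  /* start of the backing block */
    size_t cap;           /* total bytes available */
    size_t used;          /* bytes handed out so far */
} arena;

/* Grabs one backing block from malloc. Returns 1 on success, 0 on failure. */
int arena_init(arena *a, size_t cap)
{
    assert(a != NULL);
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Bump allocation: rounds the current offset up to ARENA_ALIGN and hands
   out the next n bytes, or returns NULL if the arena is exhausted. */
void *arena_alloc(arena *a, size_t n)
{
    assert(a != NULL);
    size_t start = (a->used + (ARENA_ALIGN - 1)) & ~(size_t)(ARENA_ALIGN - 1);
    if (start > a->cap || n > a->cap - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Forgets every allocation at once but keeps the backing block. */
void arena_reset(arena *a)
{
    assert(a != NULL);
    a->used = 0;
}

/* Releases the backing block; this is the whole cleanup story. */
void arena_destroy(arena *a)
{
    assert(a != NULL);
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```

A library can take an arena from its caller and never call free() on individual objects, which gets most of the "let the OS clean up" convenience while staying embeddable.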
I partially agree with you, but in a different way. I feel that the real problem is the OS doesn't give code access to its own internal accounting of allocated memory. It already knows the size of any heap chunk you make, so why can't we ask it? In most C code we're carrying around either a null terminator (which can get clobbered) or a whole integer for the size.
Instead, there should be a way to ask the OS "how big is the crap this pointer is pointed at" and get a valid answer. Other useful things would be "how far inside the chunk pointed at by X is the pointer Y?" Or, "will pointing Y at J inside X cause an error?"
And it wouldn't even need to be the OS, just the allocator, probably a few macros, etc. But, for now I have to show people how to write bug resistant C code so this is the best way so far.
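A minimal sketch of how that could look as a thin wrapper over malloc. Every name here (sized_malloc, sized_size, sized_contains, sized_free) is hypothetical and none of this is a standard API. It stores the requested size in a small header in front of the block, so "how big is this?" and "is Y inside X, and how far in?" become cheap lookups. It only works for pointers that came from sized_malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed in front of each block. The extra union members exist only
   to keep the user data suitably aligned. */
typedef union {
    size_t size;
    long double align_ld;
    long long align_ll;
    void *align_p;
} sized_header;

/* Allocates n bytes and remembers n in the header. */
void *sized_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(sized_header))
        return NULL;
    sized_header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    return h + 1;
}

/* "How big is the thing this pointer points at?" Returns the size that was
   requested, not the possibly larger chunk the allocator reserved. */
size_t sized_size(const void *p)
{
    assert(p != NULL);
    return ((const sized_header *)p - 1)->size;
}

/* "Is y inside the block x, and how far in?" Returns 1 and stores the
   offset in *off (if off is non-NULL) when y lies within x's bytes. */
int sized_contains(const void *x, const void *y, size_t *off)
{
    uintptr_t lo = (uintptr_t)x;
    uintptr_t at = (uintptr_t)y;
    size_t n = sized_size(x);
    if (at < lo || at - lo >= n)
        return 0;
    if (off != NULL)
        *off = (size_t)(at - lo);
    return 1;
}

/* Frees a block obtained from sized_malloc. */
void sized_free(void *p)
{
    if (p != NULL)
        free((sized_header *)p - 1);
}
```

The cost is exactly the "whole integer for the size" per allocation, just moved out of the caller's structures and into the allocator wrapper.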
Part of the problem here is that the allocator doesn't need to know how big the crap the pointer is pointing at is; it only needs to know that the crap is no larger than the chunk it allocated.
If you're going to teach people something unorthodox about C programming, writing custom allocators would probably be a great one. In more than one job I've crushed optimization problems on code written by people way smarter than me simply by knowing to profile allocation and replace malloc.
Hmm. The question "what is the size of memory that x points to?" is cheap to figure out, because free needs to do it anyway. You couldn't use a macro to do it - it would need to access the internal data structures of the allocator - but it's easy to do. The other questions could be macros that called the first function.
What are the use cases for these functions? What bugs would they prevent?
Worth pointing out again: the size of the chunk allocated for a particular data structure does not give you the precise bounds of the data structure; odds are, the chunk is slightly larger than the structure.
I wrote a response explaining why, if you know x then you must know y, but then I realized you were talking about knowing y and learning x. Yes, I agree. I'm not sure which context (knowing the actual size of the memory chunk, which is easy, or knowing the used size of the memory chunk, which is not easy) Zed was talking about.
That's a hell of a good idea. As somebody pointed out, it might not stop you from corrupting neighboring items in an array or structure, but it would let you find the size of an array allocated by itself, AND, it would stop you from corrupting the heap itself!
If you're concerned about corrupting the heap, use an allocator hardened against heap corruption. The default WinAPI allocator, even for optimized production code, is hardened that way. Userland code doesn't need to do anything to get the feature, which is as it should be, because people who write userland code don't know enough to defend the heap against memory corruption.
I would happily trade C ASCIIZ strings for Pascal/Perl/Java out of band length indicated strings, even at the cost of those edge cases. Especially if there were a way to internalize immutable string data, and share the bytes of common fragments. (this of course doesn't work well if you plan on modifying the string data)
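A small sketch of the out-of-band length idea, with the type name strview and its helpers invented for this example. A string is a pointer plus a length, so a substring shares the bytes of the original with no copy and no terminator. As noted, this only works while nobody modifies or frees the underlying bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: points into bytes owned by someone else. */
typedef struct {
    const char *ptr;
    size_t len;
} strview;

/* Wraps an existing NUL-terminated string; pays for strlen() once. */
strview sv_from_cstr(const char *s)
{
    assert(s != NULL);
    strview v = { s, strlen(s) };
    return v;
}

/* Substring that shares bytes with s. start and len are clamped to the
   bounds of s, so the result never points outside the original. */
strview sv_sub(strview s, size_t start, size_t len)
{
    if (start > s.len)
        start = s.len;
    if (len > s.len - start)
        len = s.len - start;
    strview v = { s.ptr + start, len };
    return v;
}

/* Equality compares lengths first, then bytes; NUL bytes inside the data
   are treated like any other byte. */
int sv_eq(strview a, strview b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

For example, sv_sub(sv_from_cstr("hello world"), 6, 5) is a view of "world" that points six bytes into the original literal.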
So make the trade. I'm sorry, I can see I'm communicating some kind of disdain for alternate string representations, but every C programmer I know --- every single one of them --- has used some form of counted string at some point.
I'm just saying there's a reason the default in C is ASCIIZ. Most of what you do with strings is lightweight; compare 'em, search 'em, tokenize 'em, copy 'em. For that 80% of use cases, ASCIIZ is superior.
Should ANSI C libc provide a heavyweight counted string alternative? Sure, I think so; in fact, it's possible that the only reason it doesn't is that it would take 300 years to resolve all the disputes about exactly what such a library should look like, since every professional C programmer has their own now.
1. K&R has defects in it when the functions in it are used out of context because they didn't include defensive programming practices considered standard today.
2. People can learn a lot about writing good code by critiquing other code, even from masters, so I'm taking them through doing that.
3. There should be no sacred cows, and people hold K&R on a pedestal without questioning what's in it. This is really what causes #1, so I have them do #2.
Noble goals. And now that you said that I don't understand how I could have overlooked that in the first place.
Maybe it would be a good idea to put a bit more stress on the "no" in "no sacred cows", that's an important point beyond just K&R. Nothing should be sacred, including "Learn C The Hard Way" - and a "question everything and everyone" mindset is generally a good thing to have as a programmer [I think I read that in "The Pragmatic Programmer" ;-)]. But other than that I now think that chapter is fine. I do apologize for the somewhat passive-aggressive form of my question.
Also, he uses low-contrast grey text on a white background (might be more readable on a better screen, but on my shitty laptop it makes things harder to read than necessary) + the navigation links on top have no text at all on them (this might be intentional, due to his server having problems).
I, for one, like my advice best from people with at least basic skills in the area. Or from people who at least try to adopt the things they say/just learned/want to pass on.
I'd like to second this. Especially the "why". I do recognize a good design when I see it, and there's an overwhelming wealth of tutorials on the web. But, for me, the link is missing - _why_ does this look good?
Why is Helvetica better than Comic Sans? There must be a reason other than "obviously, it looks better". Don't hesitate to show me the math, and the formulas. I got a machine with me. It can calculate pretty good.