That doesn't make sense. If you have loop with length you have to check both the content of the byte and the index; if you have null terminated strings you only check the content of the byte.
When you have the length, you can unroll the loop, so that you e.g. do 4 iterations at a time. With NUL you can’t do that. Moreover, loop iteration can be done in parallel (instruction-level parallelism) with processing the content of the string, since there is no data dependency between the two. With NUL you introduce a data dependency.
I read it. I don't see your point. Sentinel values simply give you a very low performance ceiling that you can't optimise past. What I wrote only took me a couple minutes. It's far from optimal (but it already takes 2/3 the time the sentinel version takes, and 1/2 the time Lemire's range version takes). With thorough effort I bet you could get at least 10 times faster than that, provided you are given the length. With sentinels, you're just left with the original performance levels you can't improve.