There sure are a lot of hard-coded numbers in that codebase. In many cases it's easy to figure out where the numbers came from, but in others, it's nearly inscrutable without a named constant or a comment or something. Here's one example:
Where does "15" come from? I suppose if I'd written a codec like this before, or if I stared at the code long enough, I could figure it out, but wouldn't it be better to use an enum or a #define?
Encountered this topic recently in the office. I'm from the side of the fence that doesn't overly care about code tidiness, so long as it performs its overall function. Ten years ago this kind of thing might have bothered me a lot more, but at some point I crossed a threshold where I realized _all code generated to the present day_ is pretty ugly and long-term unmaintainable (but that's a story for a rather large and rather boring essay).
The tl;dr is simply that if you obsess over minor details on this level, a lot of brainpower is wasted that could be used for bigger problems you should be much more worried about.
Playing devil's advocate, in this case the if() is obviously a guard for the subsequent switch. Moving just the constant '15' into a #define would make it read more like some magical sentinel value, unless you also #defined all the literal values used in the switch, at which point you've introduced a wholly bullshit layer of abstraction to what was otherwise incredibly concrete and explicit code.
Let's assume you have a great reason for doing that. OK. So what do you call these constants? Well, the code appears to be branching to special cases based on the width of some integer. So we instead have what, WIDTH_1_BIT, WIDTH_2_BITS, ..., WIDTH_15_BITS? Now that we've pulled those constants out, you stare at the block of #defines and think, damn, this is so ugly since most of the value space isn't fully defined! So some kindly maintenance programmer comes along and pads out the rest, producing a perfectly beautiful block of utter line noise.
That is arguably considerably less readable than what we started with.
I'm not sure how much this applies to open source. Closed source, quite possibly, since it probably won't survive long enough / need to change enough / need enough people to understand it.
Open source, meanwhile, requires being understandable to new people. It needs to explain itself, or it needs to be idiomatic (within its specialty). If neither, you're condemning people to rewrite it or waste time trying to figure out the original intent.
The second half of what I said explained how the code becomes less readable. If someone wants to hack on that code, they'd better understand the algorithm it implements, which means at a minimum they've probably absorbed the same 2000-page spec the original developers absorbed (and know exactly what 15 means). No amount of #define can fix that, although in the right circumstances some comments might help.
Your first paragraph sounds quite defeatist. In my experience, the way you deal with "minor details" is to write them down in a style guide. It'll take away a lot of the quibbling about small things. Some will find it limiting, others will find it liberating.
I've worked at several jobs that had decade old code in production. Some cared about the little things, some didn't. I know which code was the best to work with. In my experience, the broken windows theory is true when it comes to code. The little things matter in the long run.
This is a very insightful comment. There has also been some debate about the appropriateness of non-English comments, and of comments that only made sense in the context of internal code names for Cisco projects. The high-level test we used was: let's get the basic code out on GitHub as early as possible.
It's all very well when it's a simple variable, but quite often I have come up against variables that would take a whole paragraph of text to explain their purpose and don't have a concise or obvious name.
IMO it doesn't make sense to read code like this without a copy of the H.264 spec in hand. And once you've got that spec, why paraphrase a fraction of it, poorly, in the source code?
I do think that comments at the function level indicating which part of the spec you should be reading would be nice. Their absence might not be an issue for people who are already indoctrinated into the code, though.
Huh. I don't think it's just you - that looks like some properly nasty code. Formatting is all over the place, lines are commented out, comments are useless and obviously wrong in places...
I don't really speak C++ though, so maybe this is normal.
The good point is that other hacked-together quasi-compatible implementations now have an "official" reference implementation to browse. It could be a lot worse... I've waded through uncommented OCaml to discover how a certain shall-go-unnamed commercial, non-RESTful XML API worked. Or, Cisco could've never open-sourced it.
I'm not into video circles anymore, but I was under the impression the existing open source implementations were way better. The reason this code is significant is that Cisco is providing licensed binaries.
As with many things, it seems like knowing how an H264 codec works before reading the code is essential. Does anyone have an approachable reference for the standard?
While I'm sure there are many other examples, in that particular case it's just the maximum value of an unsigned 8-bit integer, as defined above on line 1252.
It would be a bit better if it explicitly had a nice #define, but recognizing the values of common powers of two (minus one) is a useful code-reading skill.
https://github.com/cisco/openh264/blob/master/codec/encoder/...