
I'm not a C engineer, but I think I have an interesting recommendation to consider.

Since you are already familiar with other languages, you don't really need to relearn the basics of C; the basics are the same everywhere. Instead, you'll likely want to build the mindset for writing good software in C.

I suggest you read other people's code and try to deep dive into the whys.

I've heard Redis is well-written software.

    git clone https://github.com/redis/redis.git
    git log –reverse
    git checkout ed9b544e...
We can now see the very first commit made by Salvatore Sanfilippo. From here, we can start exploring. By looking at the commit diffs and git log messages, you'll be able to understand the motivation behind the changes.

When you read a book, the most interesting parts are usually after all the theory is explained at the beginning. Somewhere in the middle of the book they start showing you code samples and that's when things become really interesting.

What I like about the reverse git-log approach is that it immediately throws you into the middle section of the book. This approach is both fun and eye-opening.

Good luck!



For me, when I learned C more than 20 years ago, reading source and manpages from Linux and other Unix-like projects was a source of inspiration.

I'd recommend OpenBSD libc. https://github.com/openbsd/src/tree/master/lib/libc

String functions especially are a good way to get into the C way of thinking, since C's approach to strings is unique.

I also learned a lot by reading manpages of libc functions or Unix utilities and thinking about how they were implemented, and writing my own little versions.
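For example, here is a rough sketch of that exercise with strlen (my own toy version, not the OpenBSD implementation): walk the string until the terminating NUL and count the bytes along the way.

    #include <stdio.h>

    /* Toy re-implementation of strlen: walk the buffer until the
       terminating '\0' and count how many bytes we passed. */
    static size_t my_strlen(const char *s)
    {
        const char *p = s;
        while (*p != '\0')
            p++;
        return (size_t)(p - s);
    }

    int main(void)
    {
        printf("%zu\n", my_strlen("hello"));  /* prints 5 */
        return 0;
    }

Comparing a toy like this against the real libc source and the manpage's edge cases is where most of the learning happens.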


This. But for C++ and Python.


Fully agree with the principle but maybe let's not call Redis "well-written" :)

Redis is a great piece of software because of how much it impacted the "evolution" of the Internet, not because of its code quality; Redis's code resembles spaghetti code, a lot.

I'll go so far as to say that nowadays people would get fired for writing code like that!

And before someone starts throwing stones (just for the sake of it), here are a few examples:

- The Redis source code is FULL of GIANT files with thousands and thousands of lines of code; decoupling and separation of concerns are basic things. A few examples:

-- https://github.com/redis/redis/blob/unstable/src/module.c
-- https://github.com/redis/redis/blob/unstable/src/cluster.c
-- https://github.com/redis/redis/blob/unstable/src/redis-cli.c
-- https://github.com/redis/redis/blob/unstable/src/networking....

- Speaking of auto-generated code: the repo contains committed auto-generated code, and a valid reason for it really slips my mind (yes, of course I can think of a few, but ... really? :)))

- https://github.com/redis/redis/blob/9c7c6924a019b902996fc4b6... ... that struct feels like an "Italian minestrone", basically a kind of soup where you throw in, literally, all the vegetables you have around; it's great for a soup, a bit less so for software engineering though :)

- the naming is a bit random at times: there is a mix of camelCase, camelCase mixed with underscores from time to time, and PascalCase (see the sketch below)
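To give a feel for what I mean (hypothetical identifiers, not symbols taken from the Redis tree), it's the kind of codebase where you can find all three styles next to each other:

    /* Hypothetical declarations, only to illustrate the mixing of styles;
       these are not actual Redis identifiers. */
    void clusterSendPing(void);            /* camelCase */
    void clusterHandle_slot_update(void);  /* camelCase mixed with underscores */
    void ClusterBroadcastPong(void);       /* PascalCase */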

This kind of stuff might be alright for a 0.3 or 0.4, maybe even for a 1.0, but definitely not for a 7.x that has had 15 years to evolve :/ Even in cachegrand, which hasn't reached v0.2 yet, I tried to avoid these kinds of dramas; in C it's quick and easy to end up there if you are not careful.

There are plenty of other small things, but most of the big dramas are the ones above.


Re: auto-generated code: it saves CI time, which is a pretty big benefit. You still need to regenerate the code to make sure it's up to date, but you can do that in parallel with the tests that rely on that code.


This is a valid reason, but it makes sense for codebases where the generation takes tens of minutes or more, not seconds.


Here's one example of when to commit auto-generated code: if you use a tool like Cython and want to distribute sources to an end user, it is frequently recommended to distribute the C sources generated by Cython instead of the Cython code itself. The given reason is that the end user may not have access to Cython, but will likely have access to a C compiler. (This is frequently the case on scientific clusters where users don't have superuser privileges.)


Importing a C file into a Python session is a practice that can *easily* cause *a lot* of headaches: the packages required to build the code as a Python library and make it importable will vary between systems, and that makes it hard to debug in case of crashes. I personally would distribute pre-compiled binaries; CI + Docker makes that much easier.

The fact that it "can be done" doesn't necessarily mean that "it's the right way" to do it.

Going back to embedding auto-generated code, and focusing on this specific case: are you telling me that a line in requirements.txt, on a machine where the user is expected to have access to a compiler, is the reason to keep auto-generated code in the repository? Sorry, personally that seems like a flaky reason.

Also, the very fact that you mention that users don't have superuser privileges should be even more of a reason to use virtual environments, where you don't need superuser privileges; or, if you can't or don't want to use virtual environments, you can still put multiple paths in the PYTHONPATH env variable and install packages into a specific path with pip's --target option or the PYTHONUSERBASE env variable, which solves the issue as well.

I would rather suggest that my users improve their runtime environment than enable them to add to the chaos.


I don't agree that it's a good practice; I'm simply quoting one example since you mentioned you were not aware of any. IIRC, you can find many GitHub issue threads on the Cython repo where they go into more detail. It has been a while since I spent any time thinking about this, so the details are no longer fresh.

Unfortunately, suggesting that users improve their runtime environment is frequently a nonstarter for different reasons. (One obvious reason: users either don't have time or technical experience to do so.)


Well, I tried it, and ed9b544e... is god-awful; the documentation has so many typos. A god file 'redis.c' with 3k lines, even containing commented-out code and some comments that would not pass my code review.

But I love it, and it's a great recommendation.

Going to move through the commits to see how it unfolds :)


I think this is important just to get down the idioms that you otherwise only pick up on the job. You could pick almost any big, active C project. I used to do this, and I'd see some construct I didn't understand and consult my K&R book to learn it. I'd do this in parallel with a book though, for sure, K&R preferably. C is so basic that if you pursue it further, it will be about learning the project's/company's way of using C.
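One example of the kind of construct I mean (the names here are made up, but the idiom is standard C) is the do { ... } while (0) wrapper around multi-statement macros; it rarely shows up in intro material, but you see it constantly in real projects:

    #include <stdio.h>

    /* Wrapping a multi-statement macro in do { ... } while (0) makes it
       behave like a single statement, so it stays correct even inside an
       if/else without braces. */
    #define LOG_AND_COUNT(counter, msg) \
        do {                            \
            printf("%s\n", (msg));      \
            (counter)++;                \
        } while (0)

    int main(void)
    {
        int errors = 0;
        if (errors == 0)
            LOG_AND_COUNT(errors, "first pass");
        else
            printf("already failed\n");
        return errors;
    }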


I've got a single-digit number of contributions to projects that I like, mostly because of how daunting it is to familiarize oneself with a codebase, and this 'git log --reverse' trick seems incredible as an entry point to a new project.

That being said, how do you feel about using this technique for repos with 2k+ commits? It seems unrealistic to read all of them, or am I being short-sighted here? I get demotivated just thinking about it.


Reading 2k+ commits is a big waste of time.

Forcing yourself to read every commit to a project you "like" in order to contribute to it is putting the cart before the horse. Just because you "like" a project doesn't mean you will have the skills to contribute usefully to it. This isn't a bad thing. I "like" GCC, but I don't have the skills to contribute to it. I'm not losing any sleep over this, and I'm definitely not thinking about reading every commit to GCC in chronological order.

If you use a project, you might find a bug or some deficiency in it. Then it might happen that you have the skills to quickly fix this problem. Either that, or this bug causes you so many problems or slows you down so much that it becomes worth it for you to develop the skills to fix the problem.


I like this approach to FOSS contributions, thanks for the answer, sfpotter.


I'm not sure why GP's post is upvoted so much; IMO it's a bad way to learn. I would instead find an entry point: a self-contained task that doesn't touch a lot of the codebase and that you can use to learn something. Then, as you get more comfortable with some part of the codebase, you can expand your scope by looking at tasks that touch other things. In general, reading code is 50% of the work; the other 50% is playing with the codebase or trying to make changes to it and seeing what happens.


That sounds reasonable, and I do agree that inching forward seems like a more viable approach to contributing in FOSS.

Thanks for the answer.


It's funny, because I would never have thought that this is a good way to learn a codebase. Code changes completely over time, and early phases where nothing is settled can be extremely messy and filled with constant refactorings. Commit messages are rarely good. I would rather try to learn a codebase by looking at the latest state and the docs.


> It's funny because I would never think that this is a good way to learn a codebase.

It is NOT for learning the codebase.

It is for learning how to program in C.


Ah I see. I might try that with another language then! That makes more sense.


I did that with Git itself. As a bonus you'll learn more about a tool you're probably using every day. If I remember correctly, even the early commits were pretty high quality.


FYI, missing a second dash

  git log --reverse


Damn, what a great comment.



