I agree, or at least simplicity is often given lip service while complexity is implemented because it’s easier.
> 3 Git repos, each submoduling and iterating upon the last
Not throwing stones, just curious — is that really the simplest approach? I would have thought one repo, with folders for source, raw articles, and website. You’ve obviously given it a lot of thought. Why this model?
I was going to call this out as something I super loved.
> The first repo archives the page as-is with a cronjob. The second repo turns the first repo into cleaned-up Markdown with an English translation with another cronjob. The third repo turns the cleaned-up Markdown into a Hugo website with a third cronjob.
This is a super awesome pattern, imo.
You never have two systems trying to commit onto the same head, so there's no clashing. If one process goes bad, you don't have to untangle interleaved history; your source history stays pure. It also makes event sourcing easy: when a repo changes, the next process starts, whereas if you conflate concerns you have to start filtering down your edge triggers. In this case the author is using cron jobs, but this would be a viable medium-throughput, modest-latency event-sourced system with little change!
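To make the pattern concrete, here's a minimal sketch of what the middle repo's cronjob could look like; the repo path, submodule name, and cleanup script are hypothetical stand-ins, not the author's actual setup:

```sh
#!/bin/sh
# Sketch: advance the submodule that tracks the raw-archive repo,
# re-run the cleanup step, and commit only if something changed.
set -e
cd /srv/articles-markdown   # hypothetical path to the middle repo

# Move the 'archive' submodule pointer to the latest upstream commit
git submodule update --remote --init archive

# Regenerate Markdown from the pinned snapshot (hypothetical script)
./clean_and_translate.sh archive/ content/

# Record the new submodule pin plus the regenerated output in one commit
git add archive content
git diff --cached --quiet || git commit -m "rebuild from archive @ $(git -C archive rev-parse --short HEAD)"
```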
This to me is much simpler. There's little appealing to me about conflating streams. Having the history show up as a continuous record within a repo could be seen as an advantage, but even still I'd rather make a fourth repo that merges the streams after the fact.
Simplicity is in the eye of the beholder. Not conflating different concerns feels much simpler to me.
This mirrors how many big data / data warehousing pipelines work. Originally a lot of transformations tended to get pulled into the same flow as moving the data around, but storage has gotten progressively cheaper, and this approach is so much simpler and more durable.
That's essentially why I gravitated to it, yeah. To me, Git submodules are a pretty generalized and elegant solution to the question of "how do I model a directory of files as an immutable data structure". If you know the commit hash of the submodule, you know exactly what is going to be in there. That kind of certainty is very helpful when trying to code up tricky data cleaning and the like.
My usual approach to submodules is to keep running commands until something sticks—probably not the most scientific approach.
If you're going to have multiple repos I find it cleaner and more convenient to use your language's packaging system; each project becomes just another dependency.
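For example, with Python packaging the downstream project could pull the upstream repo in as an ordinary pinned dependency; the package name and URL below are invented for illustration:

```sh
# Direct git dependency, pinned to a tag or commit
pip install "raw-archive @ git+https://example.com/raw-archive.git@v1.4.2"
```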
I wish rolling your own packages weren’t such busywork. It feels like a homework assignment even with generators. Lazy code is more fun to write, for sure, and gets you to results faster.