Some time ago, someone from the digital service of Germany reached out and asked about my use case. Maybe there will be an official version of a "Git law" repo someday...
Very cool! I came across your project last year while building https://digebu.de .
I wanted to build an "IDE-inspired" law reader. It has selection highlighting and you can open references within the same window. It scrapes gesetze-im-internet.de daily, converts the XML to JSON and builds static HTML pages, hosted on GitHub Pages. The entire build process for the 6000+ pages takes 5-10 minutes. It uses less than 20% of the Actions minutes that come with GitHub Pro.
It was a really fun rabbit hole to go down.
What I found most fascinating is that there doesn't seem to be an official version of the German law. The state just publishes official announcements like "Law X will be changed as follows", "Law X will be removed" or "Law X will be added". So the official version of the German law really is something akin to a git tree. AFAIK, all consolidated versions are created by private entities.
I did a test by picking a law at random, finding the first time it was published and then applying all the changes from subsequent years. Turns out all available versions (gesetze-im-internet, dejure.org, buzer.de) had at least a couple of small mistakes. I found that quite fascinating (and a little scary).
It's also funny how often laws are referenced that don't even exist anymore. The collection of laws really is as tidy as you would imagine an 80-year-old system, where the maintainers change every 5 years, to be.
Has git ever made the necessary updates so that you can have proper datestamps on the 80 yr old laws? Last I had checked, nothing prior to unix epoch can be put into git.
In the example I checked, the mistakes wouldn't have changed the interpretation. They were mistakes like additional or missing commas, missing spaces or missing articles.
buzer.de actually has a list of things that differ in their consolidation compared to gesetze-im-internet.de: https://www.buzer.de/quality.htm
In that list you can actually find mistakes that would alter the interpretation. But I think this also sounds worse than it is. It's just a funny thought that whatever source you are using, you are essentially trusting one party to not have made any mistakes while consolidating thousands of pages of PDFs :)
So then what is the official way to get the latest version?
I mean… how does the state itself handle those laws or are you telling me that every German court and government agency buys those books?
I'm not sure if they still buy the books, but I know from someone who worked as a judge in Germany that they personally stopped buying the books only ~5-10 years ago, because they saw that the online availability was good enough by then.
But my point is that, as far as I know, there is no official version of the final text. The official publications are made in the Bundesgesetzblatt (which had been privatized in the past, but that's another story). The publications might look like this:
1949: We hereby make the following text a law called Grundgesetz: "Artikel I: Human dignity is inviolable"
2026: We hereby change the law called Grundgesetz by changing the first article to say "Human or Alien" instead of "Human".
Now there are a lot of entities that will consolidate these changes into a final text. But this consolidation isn't done officially. So, while in this example it's easy to see that in 2026 the law would read "Human or Alien dignity is inviolable", it becomes less clear when these changes are spread over 80 years and are only available as PDFs.
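To make the consolidation step concrete, here is a minimal Python sketch. It assumes every announcement has already been reduced to a year, an article, the old wording and the new wording. The `Amendment` type and the `consolidate` function are hypothetical names for illustration; real Bundesgesetzblatt announcements are prose in PDFs and much messier than this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Amendment:
    """One hypothetical 'change law X as follows' announcement."""
    year: int
    article: str
    old: str
    new: str


def consolidate(base: dict[str, str], amendments: list[Amendment]) -> dict[str, str]:
    """Apply amendments in chronological order and return the consolidated text."""
    text = dict(base)  # copy, so the originally published text stays untouched
    for a in sorted(amendments, key=lambda a: a.year):
        if a.old not in text[a.article]:
            # An amendment that no longer matches the current wording is
            # exactly the kind of spot where consolidations can diverge.
            raise ValueError(f"{a.year}: {a.old!r} not found in {a.article}")
        text[a.article] = text[a.article].replace(a.old, a.new, 1)
    return text


base = {"Artikel I": "Human dignity is inviolable"}
changes = [Amendment(2026, "Artikel I", "Human", "Human or Alien")]
print(consolidate(base, changes)["Artikel I"])
```

Raising an error on a non-matching amendment is a deliberate choice in this sketch: silently skipping it would produce a plausible-looking but wrong consolidated text.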
I'm asking because I don't agree with the underlying assumption that a use case is needed. I consider the value of transparency and public information for a democratic society to be evident.
The question might not have been about transparency, but more about the choice of having it as a git repository, or whether there are actual tools based on the git repository. Arguably, the git repository is unusable for the majority of people, so it cannot be an answer to transparency in itself. Some user-friendly tools based on it might be.
I just want to archive the "official" XML files since the "official" website does not provide an archive. For that reason, I also don't change the XML files: The spec is available and everyone can build their own transform (to JSON, XML, whatever) based on their particular needs.
I thought about it, but decided against pre-processing: The repo is meant to be an archive, and the XML spec can be looked up. If I were to introduce a new structure by pre-processing the files, I think that might be a plus for reading, but not for archiving. Whoever has a concrete use case (the "Digebu" website above looks great!), can write their own pre-processor for that use case.
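For anyone who wants to write such a pre-processor, a minimal sketch in Python could look like this. It deliberately assumes nothing about the tag names in the real files; the sample document below is made up, and `element_to_dict` is a hypothetical helper, not part of the repo.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element) -> dict:
    """Convert an element into a plain dict without assuming any tag names.

    Note: tail text (text that follows a child element) is ignored here,
    so mixed-content paragraphs would need extra handling.
    """
    return {
        "tag": elem.tag,
        "attrib": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


# Made-up sample; the archived files follow the published DTD instead.
SAMPLE = "<law id='demo'><title>Example Act</title><section nr='1'>First rule.</section></law>"

root = ET.fromstring(SAMPLE)
print(json.dumps(element_to_dict(root), indent=2, ensure_ascii=False))
```

Because the transform lives outside the archive, the XML files stay byte-for-byte as published and every consumer can pick their own target structure.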
It is basically a timer for your breathing exercises, as the idea is that you inhale and exhale slowly, hold your breath in between, each step lasting 4 seconds.
I tried to make it as simple as possible, and to make it usable with old devices as well.
Thanks! I tried, but I have not yet found a way to make such guides unobtrusive. I will try out your suggestion. It is very motivating to see that people care about my little side project so much that they post a comment. So, thank you, made my day!
Parsing the legal acts with the tools you mention looks very interesting! Currently, I simply collect the published XML files whose structure is optimized for laying out the text and not so much for representing a structure of sections and subsections.
I built this tool for my girlfriend. A small, nimble website to help you focus on breathing deeply, like really deeply:
Inhale for four seconds, hold your breath for four seconds, exhale for four seconds, hold your breath for four seconds, and repeat...
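The cycle can be sketched in a few lines of Python. This is only an illustration of the timing logic, not the code behind the website; `PHASES`, `breathing_steps` and `run` are names I made up for this example.

```python
import itertools
import time

PHASES = ("inhale", "hold", "exhale", "hold")
SECONDS_PER_PHASE = 4


def breathing_steps(cycles: int):
    """Yield (phase, seconds) pairs for the given number of full cycles."""
    total = cycles * len(PHASES)
    for phase in itertools.islice(itertools.cycle(PHASES), total):
        yield phase, SECONDS_PER_PHASE


def run(cycles: int = 1, sleep=time.sleep) -> None:
    """Print each phase and wait for its duration."""
    for phase, seconds in breathing_steps(cycles):
        print(f"{phase} for {seconds} seconds")
        sleep(seconds)


# Show one cycle without actually waiting; call run() for the real timer.
for phase, seconds in breathing_steps(1):
    print(phase, seconds)
```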
Very simple. And yet, taking deep breaths helps you focus on your body and calms your mind. Try it out.
The website is very minimal on purpose – no tracking, no accounts, no newsletter banners. Nevertheless, if you miss a feature, please leave a comment.
> “Git for everything“ would be a multi-billion dollar startup easily.
Worked on a “Git for Word” project [1], which is currently on hold.
The diff part was manageable, though not trivial to get diffs that make sense for prose/regular text.
The hard parts are UX/UI (making Git concepts transparent to “normal” users) and merging. Yet without automatic merging, branching is not very convenient.
Would love to collaborate on this in the future again. Reach out if you are working in this space, happy to share.
I’ve had better-than-expected success with diffing Word files by converting them to Markdown via pandoc. It’s nowhere near perfect as you lose nearly all formatting, but if only the actual text content is changing it allows you to automate the display of those changes.
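A rough sketch of that workflow in Python, assuming the `pandoc` CLI is installed and on the PATH. The function names and the file names in the comment are placeholders. `--wrap=none` keeps pandoc from re-wrapping paragraphs, which would otherwise add noise to the diff.

```python
import difflib
import subprocess


def docx_to_markdown(path: str) -> str:
    """Convert a Word file to Markdown by shelling out to the pandoc CLI."""
    result = subprocess.run(
        ["pandoc", "--from", "docx", "--to", "markdown", "--wrap=none", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def text_diff(old: str, new: str, old_name: str = "old", new_name: str = "new") -> str:
    """Return a unified diff of two text versions as a single string."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    )
    return "\n".join(lines)


# With real files (placeholder names):
# print(text_diff(docx_to_markdown("v1.docx"), docx_to_markdown("v2.docx")))

# Demonstration on plain strings, so it runs without pandoc:
print(text_diff("The tenant pays rent.\nNo pets.", "The tenant pays rent monthly.\nNo pets."))
```

This is a line-based diff, so a one-word change marks the whole paragraph as changed; a word-level diff would read better for prose.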
I don't think merging will ever be fully solved by software. It's a problem created and solved by process. How annoying merges are is entirely dictated by process.
Sourcetree is the best git GUI I've used. That could be used as a model.
I think an old-style solution to merging would be fine: output a word file that uses a unique font style to indicate which user made what conflicting changes, have the user edit the document and remove all of the "merge styles", then continue.
I wish this was available for legal texts, making it easy to jump from one law to the referenced next legal provision. Many legal provisions, especially in very regulated areas, make use of “functions” “imported” from other, totally different laws.
Sorry for being off-topic, but if anyone knows a resource for that, I am super interested!
I started doing this for a niche area: US and European regulations and guidance documents for Good Laboratory Practice, and later for Canadian Cannabis regulations. Basically I created a standard XML schema for regulations and parsed them into XML [1]. This allowed for e.g. presenting tables of contents and section folding, pulling and linking definitions into their own search engine, etc. [2]
I thought that I could easily write a parser for each jurisdiction's formats, and then get predicate rules and related regulations for free.
I was wrong: a) there are many jurisdictions and sub-groups all doing their own thing; and b) most don't have any standard document formatting or tagging, let alone a defined structure. Even in the most structured formats (like the US eCFR's XML) the focus is on display rather than content. In the worst cases, whoever wrote up the Word document simply chose how to number and format chapters, sections, etc.
There were so many special cases that it was a huge amount of work to add or update each document, and I ended up doing a lot of categorization and fixing by hand.
[1] I know people hate XML on HN, but I did my research and had specific reasons for choosing it at the time, including human readability, nested sections, being able to easily publish and validate a schema, etc.
[2] See ReadtheRegs.com. You can browse the definitions page without an account.
This looks great! I share your sentiment: I looked into the XML files for the published German legal texts[1], and they seem to be made for display purposes only.
I actually pitched to the American Society of Quality Assurance a few years ago that we should be going to the various governing jurisdictions with a schema and encourage them to publish regulations in a standard format.
The benefits of treating regulations as data are enormous - not only do you have a standard way of consuming and linking regulatory requirements like in an API, you also get discoverability, the ability to make tools (syntax highlighting in legalese!), compile requirements over multiple jurisdictions, and more!
I had difficulty selling the idea among the non-computer-savvy (but technical) regulatory professionals, but I'm sure a few of you on HN can imagine the benefits of having a tree-sitter for legal code...
I could have pushed it further, taking the lead to pitch to the various regulators I work with in my consulting business, but in the end it was just too much work for a side project without interest from my peers.
I completely agree: in a lot of domains, freeform human language provides far more expressive power than you actually need, or want, for communicating ideas. My IANAL understanding of legalese is that it's an attempt to constrain the use of language to be more precise, but from an outsider's point of view it looks needlessly complicated.
In this case I wasn't attempting to constrain the language, but rather to capture the structure already implicit in the system: hierarchy of chapters, sections, clauses and sub-clauses, attributes such as definitions and exceptions, cross references, repeals and previous versions, interpretation notes, etc.
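To give an idea of what capturing that structure means, here is a small Python sketch of such a tree. It is not my actual XML schema; the `Provision` class, its fields and the sample regulation are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Provision:
    """One node in the hierarchy: chapter, section, clause or sub-clause."""
    number: str
    kind: str = "section"
    heading: str = ""
    text: str = ""
    defines: list[str] = field(default_factory=list)     # terms defined here
    references: list[str] = field(default_factory=list)  # numbers of cited provisions
    children: list[Provision] = field(default_factory=list)

    def walk(self):
        """Yield this provision and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def definitions_index(root: Provision) -> dict[str, str]:
    """Map each defined term to the number of the provision that defines it."""
    return {term: p.number for p in root.walk() for term in p.defines}


root = Provision("1", kind="chapter", heading="General", children=[
    Provision("1.1", text="'Test item' means ...", defines=["test item"]),
    Provision("1.2", text="Test items are stored as per 1.1.", references=["1.1"]),
])
print(definitions_index(root))
```

Once the documents are in a shape like this, things like a table of contents, section folding or a definitions search fall out of simple tree walks.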
While the programmer/engineer in me likes the idea of trying to codify and constrain standard legal terms and grammar to some consistent interpretation, I do think this is an XKCD style oversimplification of a very complex system.
Though IANAL I am a "regulatory QA professional" who has to interpret intent, wording and current enforcement of various food, drug and cannabis regulations every day. It's a complete mess of spaghetti code and undefined behaviour, and worse it's the implied, imprecise and badly worded parts that turn out to be the most important.
It's a moving target of guidance documents, published inspection findings that reveal "the current thinking of the inspectorate" and "industry best practices" with no single point of reference. Not to mention the pharmacopoeia and published standards. Though there are so many ways we could improve things, I doubt you could ever actually get that ideal constrained language without turning it into a billion special cases.
It can be very frustrating to work with, especially trying to convince management why they can't do something that isn't expressly forbidden in the regulations! But this does show exactly why there's so much leaning on intent rather than precise requirements - much like tax code, organisations would and do find money-saving loopholes all the time that might put people at risk, hence the moving target of interpretation and best practices.
> I wish this was available for legal texts, making it easy to jump from one law to the referenced next legal provision. Many legal provisions, especially in very regulated areas, make use of “functions” “imported” from other, totally different laws.
I mean, it "is", in the sense that if you put in the work of hyperlinking all the things during the digitizing process, it can be.
Many available options seem to be based on manual annotation and, therefore, cover a limited range of all legal texts. Especially with regard to regulatory topics, those research sites usually fall short.
> Many available options seem to be based on manual annotation
I’m not sure there’s an alternative: if a reference to another text is complete (and thus fully disambiguated) it’s reasonably easy to infer it, but if it’s only partial and thus ambiguous (e.g. Article 54) then it becomes a lot more problematic. What happens legally if the system misinterprets the reference (e.g. as pointing to the current law’s article 54 when nearby contextual clues made it clear that it was some other text’s) and the reader follows this misinterpretation?
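A toy Python sketch of the difference between the two cases. The regex only covers English phrases of the form "Article N" or "Article N of the Some Law", and `Reference` and `find_references` are hypothetical names; real citation formats vary far more than this. The point is that an automatic linker can at least flag the references it had to guess.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed citation shape: "Article 54", optionally followed by
# "of the <Capitalized Law Name>".
REF = re.compile(r"Article (?P<num>\d+)(?: of the (?P<law>[A-Z]\w*(?: [A-Z]\w*)*))?")


@dataclass(frozen=True)
class Reference:
    article: int
    law: str
    ambiguous: bool  # True when the law was inferred rather than stated


def find_references(text: str, current_law: str) -> list[Reference]:
    """Find article references; partial ones default to the current law."""
    refs = []
    for m in REF.finditer(text):
        stated = m.group("law")
        refs.append(
            Reference(int(m.group("num")), stated or current_law, ambiguous=stated is None)
        )
    return refs


for ref in find_references("See Article 54 of the Data Act and Article 7.", "Example Act"):
    print(ref)
```

A reader-facing tool could render the flagged references differently, or leave them unlinked until a human has confirmed them.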
I would be interested to know where you're encountering these issues, specifically. I'm interested in legal tech and would like to know where the gaps are.
I have used Input for about two years and really like it. IMHO you need a high-DPI/Retina display, though.
I also like the look of Output, a new font by Input’s author, but haven’t tried it yet. It might be a great font for usable interfaces.
Highly recommended! I started with the Rails Tutorial as well and read The Well-Grounded Rubyist afterwards to get more in-depth with Ruby.
After that, I bought Bob Race’s tutorial Build a Saas App with Rails, which covers more practical aspects of building a multi-user SaaS app using well-known gems (instead of building everything yourself as in the Rails Tutorial): https://leanpub.com/basair6
https://github.com/jandinter/gesetze-im-internet
I scrape the official website (https://www.gesetze-im-internet.de) once a week. The repository contains the "official" XML files with a formatting that is more focussed on presentation than on the logical structure of the legal acts, unfortunately (https://www.gesetze-im-internet.de/dtd/1.01/gii-norm.dtd).