
Dealing with configuration files is a problem that extends well beyond distributed systems, or even systems programming in general. It's certainly a very important issue.

One of the fundamental problems with configuration files is that they are written in ad-hoc external DSLs. This means that the config files behave nothing like actual software--instead, they have their own special rules to follow and their own semantics to understand. These languages also tend to have very little abstractive power, which leads to all sorts of incidental complexity and redundancy. Good software engineering and programming language practices are simply thrown away in config files.

Some of these issues are mitigated in part by using a well-understood format like XML or even JSON. This is great, but it does not go far enough: config files still stand alone and still have significant problems with abstraction and redundancy.

I think a particularly illuminating example is CSS. When you get down to it, CSS is just a configuration file for how your website looks. It certainly isn't a programming language in the normal sense of the word. And look at all the problems CSS has: rules are easy to mess up, you end up copying and pasting the same snippets all over the place, and CSS files quickly become tangled and messy.

These problems are addressed to a first degree by preprocessors like Sass and Less. But they wouldn't have existed in the first place if CSS was an embedded DSL instead of a standalone language.

At the very least, being an embedded language would give it access to many of the features preprocessors provide for free. You would be able to pack rules into variables, return them from functions and organize your code in more natural ways. Moreover, you could also take advantage of your language's type system to ensure each rule only had acceptable values. Instead of magically not working at runtime, your mistakes would be caught at compile time.
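
A minimal sketch of the idea, in Python for brevity. Everything here (`Color`, `rounded`, `rule`) is a hypothetical mini-DSL made up for illustration, not a real library. Note that Python only validates values when they are constructed, so this is weaker than the compile-time checking a statically typed host language could give you.

```python
from dataclasses import dataclass


# Hypothetical mini-DSL: none of these names come from a real library.
@dataclass(frozen=True)
class Color:
    r: int
    g: int
    b: int

    def __post_init__(self):
        # Reject bad channels when the value is built, not when the page renders.
        for channel in (self.r, self.g, self.b):
            if not 0 <= channel <= 255:
                raise ValueError(f"color channel out of range: {channel}")

    def css(self) -> str:
        return f"rgb({self.r}, {self.g}, {self.b})"


def rounded(radius_px: int) -> dict:
    """An ordinary function standing in for a preprocessor mixin."""
    return {"border-radius": f"{radius_px}px"}


def rule(selector: str, *parts: dict) -> str:
    """Merge declaration dicts and render a single CSS rule."""
    merged = {}
    for part in parts:
        merged.update(part)
    body = " ".join(f"{prop}: {value};" for prop, value in merged.items())
    return f"{selector} {{ {body} }}"


brand = Color(30, 120, 200)  # a rule value packed into a variable
button = rule(".button", {"color": brand.css()}, rounded(4))
print(button)
```

Here `rounded` plays the role a Sass or Less mixin would, but it is just a function in the host language.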

This is particularly underscored by how woefully complex CSS3 is getting. A whole bunch of the new features look like function calls or normal code; however, since CSS is not a programming language, it has to get brand new syntax for this. This is both confusing and unnecessary: if CSS was just embedded in another language, we could just use functions in the host language.

I think this approach is promising for configuration files in general. Instead of arbitrary custom languages, we could have our config files be DSLs shallowly embedded in whatever host language we're using. This would simultaneously give us more flexibility and make the configuration files neater. If done correctly, it would also make it very easy to verify and process these configuration files programmatically.

There has already been some software which has taken this approach. Emacs has all of its configuration in elisp, and it's probably the most customized software in existence. My .emacs file is gigantic, and yet I find it much easier to manage than configuration for a whole bunch of other programs like Apache.

Another great example is XMonad. This is really more along the lines of what I really want because it lets you take advantage of Haskell's type system when writing or editing your configuration file. It catches mistakes before you can run them. Since your config file is just a normal .hs file, this also makes it very easy to add new features: for example, it would be trivial to support "user profiles" controlled by an environment variable. You would just read the environment variable inside your xmonad.hs file and load the configuration as appropriate.
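
To make the "user profiles" idea concrete, here is a rough sketch in Python rather than Haskell, so it is not actual xmonad.hs code. The `APP_PROFILE` variable, the profile names and the settings are all assumptions invented for the example.

```python
import os

# Hypothetical settings; the names and values are illustrative only.
BASE = {"terminal": "xterm", "border_width": 1, "workspaces": 9}

PROFILES = {
    "work": {"terminal": "urxvt", "workspaces": 4},
    "laptop": {"border_width": 2},
}


def load_config(environ=os.environ) -> dict:
    """Layer the overrides for the selected profile on top of the base settings."""
    # APP_PROFILE is an assumed variable name, not something any real tool reads.
    profile = environ.get("APP_PROFILE", "")
    if profile and profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile!r}")
    return {**BASE, **PROFILES.get(profile, {})}


print(load_config({"APP_PROFILE": "work"}))
```

Because the config is ordinary code, the profile switch is a dictionary lookup instead of a new feature in the config format.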

Specifying configuration information inside a general-purpose language--probably as an embedded DSL--is quite a step forward from the ad-hoc config files in use today. It certainly won't solve all the problems with configuration in all fields, but I think it would make for a nice improvement across the board.




> This means that the config files behave nothing like actual software--instead, they have their own special rules to follow and their own semantics to understand. These languages also tend to have very little abstractive power, which leads to all sorts of incidental complexity and redundancy. Good software engineering and programming language practices are simply thrown away in config files.

I have to disagree that this is the critical problem with system configuration. The typical config has orders of magnitude less complexity than the typical software project. The problem with config files is not the complexity of the configuration itself (except perhaps in Java), but the complexity of the global effects they produce. It doesn't matter how clean and DRY you make your config, and how well-tested, if the software supports too many options that can interact in unpredictable ways, or worse, produce subtle outward facing changes that ripple out to systems that interact with yours.


It's certainly not the only problem with systems configuration. However, it is a broad problem: it applies to more than just systems. And, as you may have guessed from my examples, I don't really do systems stuff all that much. But I do deal with a bunch of other configuration files and formats!

It's more a comment on config formats in general.


Well, I agree the proliferation of formats is a problem, but I think it's an intractable one. General languages solve the problem of "I can't use config language X because it's missing feature Y", but for every project you bring on board you lose one because they don't want that much power in their config files.


You really don't want to use a Turing-complete language for configuration. It makes all sorts of things impossible, such as:

- Automatically scanning for insecure configurations (eg. OpenSCAP)

- Parsing the configuration in other programs.

- Modifying the configuration programmatically (cf. Puppet et al)

Also, http://augeas.net/


Those are the things that would actually get easier: your configuration would be a data structure in the host language rather than a text file. So you wouldn't have to parse it at all. Other programs would have to either be in the same language or have some way of communicating with the host language, but that isn't too bad a restriction.

Similarly, this would make modifying configuration programmatically better. Instead of dealing with serializing and deserializing to random text formats, you would just provide a Config -> Config function. This would also make tools modifying configuration parameters more composable because they wouldn't be overwriting each other's changes unnecessarily.
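
A small sketch of what I mean by `Config -> Config`, written in Python. The `Config` fields and the transform names are hypothetical, chosen only to show the shape of the approach.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable


# Hypothetical configuration record; the fields are illustrative only.
@dataclass(frozen=True)
class Config:
    port: int = 80
    workers: int = 1
    tls: bool = False


Transform = Callable[[Config], Config]


def enable_tls(cfg: Config) -> Config:
    # Returns a modified copy; the input config is never mutated.
    return replace(cfg, tls=True, port=443)


def scale_workers(n: int) -> Transform:
    """Build a transform that sets the worker count to n."""
    def apply(cfg: Config) -> Config:
        return replace(cfg, workers=n)
    return apply


def compose(*steps: Transform) -> Transform:
    """Chain transforms left to right into a single Config -> Config function."""
    return lambda cfg: reduce(lambda acc, step: step(acc), steps, cfg)


tune = compose(enable_tls, scale_workers(8))
print(tune(Config()))
```

Each tool contributes one transform and touches only the fields it cares about, so two tools do not clobber each other's changes the way two rewrites of the same text file can.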


On the off chance you see this later, I think you want a Turing-complete language, but one without an interesting standard library.

> - Automatically scanning for insecure configurations (eg. OpenSCAP)

Since the language can't access the outside world, the worst it can do is use unbounded space or time. Just verify that it halts in a couple ms.

> - Parsing the configuration in other programs.

You don't parse, you embed an interpreter and execute.

> - Modifying the configuration programmatically (cf. Puppet et al)

Code generation isn't that hard.


While reading your comment, I thought of the problems with configuration files I was having in my own project... when I realized, "why not just use python"?

The program (written in go) could simply launch a new process on startup, e.g.

  python -c "import imp; c = imp.load_source('c', 'abc.conf'); print '\n'.join('%s = %s' % (n, repr(getattr(c, n))) for n in ['option1', 'option2', 'option3'])"
and read its output (which could also be another well-known format, like JSON) and use it for configuration. Simple, elegant, and enables very powerful abstractions.
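
For reference, a Python 3 sketch of the same idea that emits JSON instead of `name = value` lines. The `dump_config` helper and the sample file contents are assumptions made up for the example; the host program would run something like this as a subprocess and parse its stdout.

```python
import json
import os
import tempfile


def dump_config(path, names):
    """Execute a Python-syntax config file and return the named options as JSON."""
    namespace = {}
    with open(path, encoding="utf-8") as f:
        source = f.read()
    # This runs the file as code, so it is only appropriate for trusted configs.
    exec(compile(source, path, "exec"), namespace)
    return json.dumps({name: namespace[name] for name in names})


# Demo with a throwaway file standing in for 'abc.conf'.
with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as f:
    f.write("base = 8000\noption1 = base + 80\noption2 = ['a', 'b']\n")
    demo_path = f.name
try:
    result = dump_config(demo_path, ["option1", "option2"])
finally:
    os.unlink(demo_path)
print(result)
```

The config file can compute `option1` from `base`, which is the kind of abstraction a flat key-value format can't express.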


Using a standardized, real programming language (TCL) as a standard for configuration files was suggested something like 15 years ago. RMS shot it down, saying GNU would never use it, and that they would build their own format that all the GNU tools would use (which IIRC never materialized). The problem isn't creating a good language, it's persuading the Linux community to standardize on something.


Well, TCL is not a good language for configuration files imho. I'd favor a declarative approach, because you can analyse (hence debug, check, etc) it better. Inspired by Prolog, but change the syntax. A configuration language should also avoid features like "eval", though Turing-completeness is probably desirable.

Nevertheless, I agree that standardization is a problem. Not for the "Linux community", but in general. Some people do not require Turing-completeness and (reasonably) do not want it. Others already have a scripting language (Lua, TCL, Python, etc.) embedded, so reusing it is a good idea.


Turing completeness is maybe not desirable... look at what happened when YAML was allowed to create objects of any class it liked.

I like your declarative language idea very much, though. As long as it cannot execute arbitrary code.



