“Config files are usually pretty self-explanatory”
In the universe where I live, most settings in a config file are pretty self-explanatory, but almost every config file has at least one setting that isn’t.
Also, even if I clearly understand the configuration file I’m reading, it often isn’t clear at all how I can change it:
- If a JSON attribute contains a file path, can it be a URL or S3 object reference, too?
- Does a path to a .gz file in an attribute mean the input must be gzipped, or does the program look at the extension? If the latter, which compression methods are supported?
- If I don’t want to write that second output file, do I leave out the ‘extraOutput: foo.txt’ path setting, set it to null, set it to an empty string, or add an optional attribute ‘produceExtraOutput: false’ that defaults to ‘true’?
- What is the range of acceptable values for that ‘foo’ attribute that has a value of 42 in the config file I’m reading?
- Is that ‘progressInterval’ value measured in bytes read, input lines, records processed, lines written, or wall-time seconds?
- What other options that aren’t in this config file might I want to set in mine?
Yes, all of that could be in the documentation, but it’s easier for me if the config file contains comments that answer these questions. In my experience, it’s also easier for the programmers writing the code to add such comments to the sample config file than to keep a documentation page in sync with both the code and the sample config file.
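To make that concrete, here is a rough sketch of the kind of commented sample config I mean. Everything in it is made up for illustration: the tool, the option names, the INI format, and the answers given in the comments are my assumptions, not any real program’s behavior. I used Python’s standard `configparser` only because it accepts `#` comment lines, which plain JSON does not.

```python
import configparser

# Hypothetical sample config for an imaginary tool. The comments answer the
# questions a reader would otherwise have to look up or guess.
SAMPLE = """\
[job]
# Local file path only; URLs and S3 references are not accepted.
# Compression is inferred from the extension (.gz or .bz2);
# any other extension is read as plain text.
input = data/events.jsonl.gz

# Optional second output file. Leave the value empty to skip writing it.
extra_output =

# Integer from 1 to 100.
foo = 42

# Wall-time seconds between progress messages; 0 disables them.
progress_interval = 30

# Other options you might want, shown with their defaults:
# retries = 3
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
job = parser["job"]

# Enforce the range that the comment documents.
foo = job.getint("foo")
if not 1 <= foo <= 100:
    raise ValueError(f"foo must be between 1 and 100, got {foo}")

progress_interval = job.getint("progress_interval")

# An empty value means "do not write the extra output file".
extra_output = job.get("extra_output") or None

# Commented-out options fall back to their documented defaults.
retries = job.getint("retries", fallback=3)
```

The point is that each comment sits right next to the setting it describes, so whoever changes the code that reads `foo` or `progress_interval` is looking at the text that needs updating.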
Finally, some things cannot be in the tool’s documentation because they are local changes. For example, the logging config could deviate from the company standard for a reason, and only a comment in the config file itself can record that reason.