Good practices for app configuration storage? - configuration

We have a number of loosely coupled apps, some in PHP and some in Python.
It would be beneficial to have some centralized place where they could get both global and app-specific configuration information.
Something like, for Python:
conf=config_server.get_params(url='http://config_server/get/My_app/all', auth=my_auth_data)
and then ideally use parameters as potentially nested attributes, eg. conf.APP.URL, conf.GLOBAL.MAX_SALES
I was considering making my own config server app, but wasn't sure, what would be the pros and cons of such approach vs. eg. storing config in centralized database or any other multiple-site accessible mode.
Also, if I perhaps was missing some readily available tool with good support, which could do this (I had a look at Puppet and Ansible, but they seemed to be very evolved tools for doing so much more than this. I also looked at software recommnedation SE for this, but they have a number of such question unanswered already).

I think it would be a good idea for your configuration mechanism not to be hard-coded to obtain configuration data via a particular technology (such as file, web server or database), but rather be able to obtain configuration data from any of several different technologies. I illustrate this with the following pseudo-code examples:
cfg = getConfig("file.cfg"); # from a file
cfg = getConfig("file#file.cfg"); # also from a file
cfg = getConfig("url#http://config_server/file.cfg"); # from the specified URL
cfg = getConfig("exec#getConfigFromDB.py"); # from stdout of command
The parameter passed to getConfig() might be obtained from, say, a command-line option. The "exec#..." format is a flexible mechanism, but carries the potential danger of somebody specifying a malicious command to execute, for example, "exec#rm -rf /".
This approach means you can experiment with whatever you consider to be an ideal source-of-configuration-data technology and later, if you discover that technology to be inappropriate, it will be trivial to discard it and use a different source-of-configuration-data technology instead. Indeed, the decision for which source-of-configuration-data technology to use might vary from one use case/user to another.
I developed a C++ and Java configuration file parser (sorry, no Python or PHP implementations) called Config4*. If you look at chapters 2 (overview of syntax) and 3 (overview of API) of the Config4* Getting Started Guide, you will notice that it supports the kind of flexible approach I discuss in this answer (the "url#... format is not supported, but "exec#curl -sS ..." provides the same functionality). 99 percent of the time, I end up using configuration files, but I find it comforting to know that my applications can trivially switch to using a different-source-of-configuration-data technology whenever the need might arise.

Related

Are there any configuration libraries that provide a native implementations for merging modifications found in split files (as seen in .d folders)

I've been having trouble sorting through the noise while trying to find an answer to this. There are certainly a lot of libraries available for handling configuration files but what I'm looking for is an answer to whether there is a solution available for this specific kind of split configuration.
On Linux systems I've found that it's not uncommon to find a program which has split its default configuration away from user modifications by instructing the user to place a subset of the default configuration into a separate folder (commonly found with a .d suffix). These changes override what is found in the default configuration and provide a very easy way to track at a later date what has been modified.
There is a wide variety of incompatible syntaxes used across different configuration files, and I am not aware of a library that can parse them all. But if you are willing to restrict yourself to just one configuration syntax, then two answers to your question come to mind.
The first is to use a scripting language as your configuration file syntax. Let's assume an application reads both default.cfg and override.cfg (in that order). If default.cfg contains name="john", then override.cfg might contain name="mary", which will have the effect of overriding the value of name. This occurs because the shell interpreter provides a common place to store the global variables assigned within the script files. The following pseudocode shows how your application could interact with a scripting language interpreter:
interp = new Interpreter();
interp.executeFile("default.cfg");
interp.executeFile("override.cfg");
name = interp.getValueOfVariable("name");
The second is to use Config4*, which is a Configuration-file parser I wrote; it is available in C++ and Java. I recommend you read Chapter 2 (Overivew of Config4* Syntax) in the Config4* Getting Started Guide. The "adaptive configuration" abilities of Config4* seem to fit the question I think you are asking. If, after reading Chapter 2, you agree, then read Chapter 3, which provides an overview of the C++ and Java API. That will probably be enough to get you started.

Configure applications using environment variables

12-Factor Apps suggest that you configure your application using environment variables. So far, so good. I can easily imagine that this is a good way to do it if you need to set a connection string, e.g.
But what if you have more complex configuration with lots and lots of values? I for sure do not want to have 50+ environment variables, do I?
How could I solve this, and still be compliant to the idea of 12-Factor Apps?
From a quick read of the configure link you provided, I agree with the author's claim that there is a widespread problem, but I am not convinced that their proposed solution is going to always be best. Like you, I don't relish the idea of having to define dozens of environment variables to configure an application. So here are some alternative ideas.
First, read Chapter 2 of the Config4* Getting Started Guide (disclaimer: I am the main author of that software). In particular, notice that its support for what I call adaptive configuration can go a long way towards addressing the concern that you ask about. Is Config4* the ultimate solution? Possibly not, but I think it is a good step in the right direction.
Second, the chances are that whatever application you are developing/maintaining has already settled on a particular configuration technology, such as XML files or Java property files, and it won't be feasible to migrate to using Config4*. This raises the question: is there anything you can do to avoid having a proliferation of, say, XML-based configuration files when you have multiple environments (such as dev, UAT, staging and production) in which the application will be deployed? I have outlined an approach for dealing with this issue in another StackOverflow article.

Configuration Promotion Between Environments

What is a good way to coordinate configuration changes through environments?
In an effort to decouple our code from the environment we've moved all environmental config to external files. So maybe the application will look for ${application.config.dir}/app.properties and app.properties could contain:
user.auth.endpoint=http://some.url/user
user.auth.apikey=abcd12314
The problem is, user.auth.endpoint needs to point to a test resource when on test, a staging resource when on the staging environment, and a production resource when on prod.
We could maintain different copies of the config file but this would violate DRY and become very unwieldy (there are 20+ production environments).
What's a good way to manage this? What tools should I be searching for?
Externalizing config is a good idea, you could externalize them all the way to environment variables.
Env vars are easy to change between deploys without changing any code;
unlike config files, there is little chance of them being checked into
the code repo accidentally; and unlike custom config files, or other
config mechanisms such as Java System Properties, they are a language-
and OS-agnostic standard.
From http://12factor.net/config
I know of three approaches to this.
The first approach is to write, say, a Python "wrapper" script for your application. The script will find out some environmental details, such as hostname, user name and values of environment variables, and then construct the appropriate configuration file (or a set of command-line options) that is passed to the real application.
The second approach is to embed an interpreter for a scripting language (Python, Lua and Tcl come to mind) into your application. This makes it possible for you to write a configuration file in the syntax of that embedded scripting language. In this way, the configuration file can make use of the scripting language's features, such as the ability to query environment variables or execute a shell command (such as hostname) and use if-then-else statements to set variables appropriately.
The third approach (if you are using C++ or Java) is to use the open-source Config4* library (disclaimer, I am the main developer of that). I recommend you read Chapter 2 of the "Config4* Getting Started" manual to see examples of how its flexible syntax can enable a single configuration file adapt to multiple environments.
You can take a look at http://www.configapp.com. You work with 1 configuration file, and switch/tab between the environments. Internally it's just 1 configuration file, and it handles the environment variables and generates the config file for the specific environment. In Config terminology, you have 1 Prod environment with 20+ instances. You will have a Prod environment configuration and you can tweak the 20+ instances accordingly using a web interface.
You moved environment specific properties to a separate file, but with Config, you don't have to do that. With Config, you can have 1 configuration file, with environment variables support, and common configuration applied to all environments.
Note that I'm part of the Config team.

What are logging libraries for?

This may be a stupid question, as most of my programming consists of one-man scientific computing research prototypes and developing relatively low-level libraries. I've never programmed in the large in an enterprise environment before. I've always wondered, what are the main things that logging libraries make substantially easier than just using good old fashioned print statements or file output, simple programming logic and a few global variables to determine how verbosely things get logged? How do you know when a few print statements or some basic file output ain't gonna cut it and you need a real logging library?
Logging helps debug problems especially when you move to production and problems occur on people's machines you can't control. Best laid plans never survive contact with the enemy, and logging helps you track how that battle went when faced with real world data.
Off the shel logging libraries are easy to plug in and play in less than 5 minutes.
Log libraries allow for various levels of logging per statement (FATAL, ERROR, WARN, INFO, DEBUG, etc).
And you can turn up or down logging to get more of less information at runtime.
Highly threaded systems help sort out what thread was doing what. Log libraries can log information about threads, timestamps, that ordinary print statements can't.
Most allow you to turn on only portions of the logging to get more detail. So one system can log debug information, and another can log only fatal errors.
Logging libraries allow you to configure logging through an external file so it's easy to turn on or off in production without having to recompile, deploy, etc.
3rd party libraries usually log so you can control them just like the other portions of your system.
Most libraries allow you to log portions or all of your statements to one or many files based on criteria. So you can log to both the console AND a log file.
Log libraries allow you to rotate logs so it will keep several log files based on many different criteria. Say after the log gets 20MB rotate to another file, and keep 10 log files around so that log data is always 100MB.
Some log statements can be compiled in or out (language dependent).
Log libraries can be extended to add new features.
You'll want to start using a logging libraries when you start wanting some of these features. If you find yourself changing your program to get some of these features you might want to look into a good log library. They are easy to learn, setup, and use and ubiquitous.
There are used in environments where the requirements for logging may change, but the cost of changing or deploying a new executable are high. (Even when you have the source code, adding a one line logging change to a program can be infeasible because of internal bureaucracy.)
The logging libraries provide a framework that the program will use to emit a wide variety of messages. These can be described by source (e.g. the logger object it is first sent to, often corresponding to the class the event has occurred in), severity, etc.
During runtime the actual delivery of the messaages is controlled using an "easily" edited config file. For normal situations most messages may be obscured altogether. But if the situation changes, it is a simpler fix to enable more messages, without needing to deploy a new program.
The above describes the ideal logging framework as I understand the intention; in practice I have used them in Java and Python and in neither case have I found them worth the added complexity. :-(
They're for logging things.
Or more seriously, for saving you having to write it yourself, giving you flexible options on where logs are store (database, event log, text file, CSV, sent to a remote web service, delivered by pixies on a velvet cushion) and on what is logged at runtime, rather than having to redefine a global variable and then recompile.
If you're only writing for yourself then it's unlikely you need one, and it may introduce an external dependency you don't want, but once your libraries start to be used by others then having a logging framework in place may well help your users, and you, track down problems.
I know that a logging library is useful when I have more than one subsystem with "verbose logging," but where I only want to see that verbose data from one of them.
Certainly this can be achieved by having a global log level per subsystem, but for me it's easier to use a "system" of some sort for that.
I generally have a 2D logging environment too; "Info/Warning/Error" (etc) on one axis and "AI/UI/Simulation/Networking" (etc) on the other. With this I can specify the logging level that I care about seeing for each subsystem easily. It's not actually that complicated once it's in place, indeed it's a lot cleaner than having if my_logging_level == DEBUG then print("An error occurred"); Plus, the logging system can stuff file/line info into the messages, and then getting totally fancy you can redirect them to multiple targets pretty easily (file, TTY, debugger, network socket...).

Inspiration on how to build a great command line interface

I am in the process of building interactive front-ends to a
distributed application which to date has been used to run workloads
that had a batch-job like structures and needed no UI at all. The application is mostly written in Perl and C and runs on a mix of Unix and Windows machines, but I think this isn't relevant to the UI.
The first such frontend is going have a command-line user interface --
currently, I envision something similar to the CLIs of the Procurve
switches and Cisco routers that I have worked with.
Like modern network gear CLIs, commands are going to resemble
simple sentences, (i.e. show vlans ports 1-4) and the CLI will
have some implicit state, much in the way that Unix shells and
cmd.exe in Windows have environment variables and current working
directories. Moreover, I'd like to implement great tab completion that
is aware of the application's state as much as possible and I want to be able to do that with as
little application-specific code as possible.
The low-level functionality (terminal I/O) seems easy to implement on
top of GNU Readline or similar libraries, but that's only where the
real fun starts. So far I have looked at the Perl modules
Term::Shell
and
Term::ShellUI,
but I'm not convinced that I want to use either of them. I am still
considering rolling my own solution and at the moment I am primarily looking for
inspiration.
Can you recommend any application or library, regardless of
implementation language, that implements a good CLI from which I can
borrow ideas?
I suggest you take a look at the philosophy underlying Microsoft PowerShell. From the idea of piping typed objects between commands to the consistency of its commands and argument syntax, I think it can be a source of inspiration.
You could try having a look at libcli :
"Libcli provides a shared library for
including a Cisco-like command-line
interface into other software."
http://code.google.com/p/libcli/
BTW - I forgot to mention that it is GNU Lesser GPL and actually used by Cisco in some products.
As for your last sentence/question, I'm particularly fond of zsh completion and line editing (zle).