Command line parameters or configuration file? - configuration

I'm developing a tool that will perform several types of analysis, and each analysis can have different levels of thoroughness. This app will have a fair amount of options to be given before it starts. I started implementing this using a configuration file, since the number of types of analysis specified were little. As the number of options implemented grew, I created more configuration files. Then, I started mixing some command line parameters since some of the options could only be flags. Now, I've mixed a bunch of command line parameters with configuration files and feel I need refactoring.
My question is, When and why would you use command line parameters instead of configuration files and vice versa?
Is it perhaps related to the language you use, personal preference, etc.?
EDIT: I'm developing a java app that will work in Windows and Mac. I don't have a GUI for now.

Command line parameters are useful for quickly overriding some parameter setting from the configuration file. As well, command line parameters are useful if there are not so many parameters. For your case, I'd suggest that you export parameter presets to command line.

Command line arguments:
Pros:
concise - no extra config files to maintain by itself
great interaction with bash scripts - e.g. variable substitution, variable reference, bash math, etc.
Cons:
it could get very long as the options become more complex
formatting is inflexible - besides some command line utilities that help you parse the high level switches and such, anything more complex (e.g. nested structured information) requires custom syntax such as using Regex, and the structure could be quite rigid - while JSON or YAML would be hard to specify at the command line level
Configuration files:
Pros:
it can be very large, as large as you need it to be
formatting is more flexible - you can use JSON, YAML, INI, or any other structural format to represent the information in a more human consumable way
Cons:
inflexible to interact with bash variable substitutions and references (as well as bash math) - you have to probably define your own substitution rules if you want the config file to be "generic" and reusable, while this is the biggest advantage of using command line arguments - variable math would be difficult in config files (if not impossible) - you have to define your own "operator" in the config files, or you have to rely on another bash script to carry out the variable math, and perform your custom variable substitution so the "generic" config file could become "concretely usable".
for all that it takes to have a generic config file (with custom defined variable substitution rules) ready, a bash script is still needed to carry out the actual substitution, and you still have to code your command line to accept all the variable substitutions, so either you have config files with no variable substitution, which means you "hard code" and repeat the config file for different scenarios, or the substitution logic with custom variable substitution rules make your in-app config file logic much more complex.
In my use case, I value being able to do variable substitution / reference (as well as bash math) in the bash scripts more important, since I'm using the same binary to start many server nodes with different responsibilities in a server backend cluster, and I kind of use the bash scripts as sort of a container or actually a config file to start the many different nodes with differing command line arguments.

my vote = both ala mysqld.exe

What environment/platform? In Windows you'd rather use a config file, or even a configuration panel/window in the gui.

I place configuration that don't really change in a configuration file.
Configuration that change often I place on the command-line.

Related

How do I find where a function is declared in Tcl?

I think this is more of a Tcl configuration question rather than a Tcl coding question...
I inherited a whole series of Tcl scripts that are used within a simulation tool that my company built in-house. In my scripts, I'm finding numerous instances where there are function calls to functions that don't seem to be declared anywhere. How can I trace the path to these phantom functions?
For example, rather than use source, someone build a custom include function that they named INCLUDE. Tclsh obviously balks when I try to run it there, but with my simulation software, it runs fine.
I've tried grep-ing through the entire simulation software for INCLUDE, but I'm not having any luck. Are there any other obvious locations outside the simulation software where a Tcl function might be defined?
The possibilities:
Within your software. (you have checked for this).
Within some other package included by the software.
Check and see if the environment variable TCLLIBPATH is set.
Also check and see if the simulation software sets TCLLIBPATH.
This will be a list of directories to search for Tcl packages, and you
will need to search the packages that are located outside of the
main source tree.
Another possibility is that the locations are specified in the pkgIndex.tcl file.
Check any pkgIndex.tcl files and look for locations outside the main source tree.
Within an unknown command handler. This could be in
your software or within some other package. You should be able to find
some code that processes the INCLUDE statement.
Within a binary package. These are shared libraries that are loaded
by Tcl. If this is the case, there should be some C code used to
build the shared library that can be searched.
Since you say there are numerous instances of unknown functions, my first
guess is that you have
not found all the directories where packages are loaded from. But an
''unknown'' command handler is also a possibility.
Edit:
One more possibility I forgot. Check and see if your software sets the auto_path variable. Check any directories added to the auto_path for
other packages.
This isn't a great answer for you, but I suspect it is the best you're going to get...
The procedure could be defined in a great many places. Your best bet for finding it is to use a tool like findstr (on Windows) or grep -R (on POSIX platforms) to search across all the relevant source files. But that still might not help! It might not be a procedure but instead a general command, which could be implemented in C and not as a procedure, or it could be defined in a packaged application archive (which are usually awkward to look inside). There are also other types of script-implemented command too, which could make things awkward. Generally searching and investigating is your best bet, but it might not work.
Tcl doesn't really differentiate strongly between different types of command except in some introspection operations. If you're lucky, you could find that info body tells you the definition of the procedure (and info args and info default tell you about the arguments) but that won't help with other command types at all. Tcl 8.7 will include a command (info cmdtype) that would help a lot with narrowing down what to do next, but that's no use to you now and it definitely doesn't exist in older versions.

What are some best practices for handling large logstash configuration files?

I about to deploy a logstash instance that will handle a variety of inputs and do multiple filter actions. The configuration file will most likely end up having a lot of if-then statements given the complexity and number of the inputs.
My questions are:
Is there any way to make the configuration file more 'modular'? In a programming sense, I would create functions/subroutines so that I could test independently. I've thought about dynamically creating mini-configuration files that I can use for testing. These mini files could then be combined into one production configuration.
Are there any "best practices" for testing, deploying and managing more complicated Logstash configurations?
Thanks!
There's no support for functions/subroutines per se. I break up different filters into separate files to keep a logical separation and avoid having gigantic files. I also have inputs and outputs in different files. That way I can combine all filters with debug inputs/output, for example
input {
stdin {}
}
output {
stdout {
codec => rubydebug
}
}
and invoke Logstash by hand to inspect the results of given input. Since filter ordering matters I'm using the fact that Logstash reads configuration files in alphabetical order, so the files are named NN-some-descriptive-name.conf, where NN is an integer.
I've also written a script that automates this process by letting you write a spec with test inputs and the expected resulting messages, and if there's a mismatch it'll bail out with an error and display the diff. I may be able to open source it.
As for deployment, use any configuration management system like Puppet, Chef, SaltStack, Ansible, CFEngine, or similar that you're familiar with. I'm quite happy with Ansible.
As #Magnus Bäck stated, the answer to 1. is no. currently there is no support for functions.
But as for your second question, there is a way to make the logstash configuration more modular. you can split the configuration file to multiple files, and point logstash to the files directory.
check the directory option in logstash man:
-f, --config CONFIG_PATH Load the logstash config from a specific file
or directory. If a direcory is given, all
files in that directory will be concatonated
in lexicographical order and then parsed as a
single config file. You can also specify
wildcards (globs) and any matched files will
be loaded in the order described above.

Configuration Promotion Between Environments

What is a good way to coordinate configuration changes through environments?
In an effort to decouple our code from the environment we've moved all environmental config to external files. So maybe the application will look for ${application.config.dir}/app.properties and app.properties could contain:
user.auth.endpoint=http://some.url/user
user.auth.apikey=abcd12314
The problem is, user.auth.endpoint needs to point to a test resource when on test, a staging resource when on the staging environment, and a production resource when on prod.
We could maintain different copies of the config file but this would violate DRY and become very unwieldy (there are 20+ production environments).
What's a good way to manage this? What tools should I be searching for?
Externalizing config is a good idea, you could externalize them all the way to environment variables.
Env vars are easy to change between deploys without changing any code;
unlike config files, there is little chance of them being checked into
the code repo accidentally; and unlike custom config files, or other
config mechanisms such as Java System Properties, they are a language-
and OS-agnostic standard.
From http://12factor.net/config
I know of three approaches to this.
The first approach is to write, say, a Python "wrapper" script for your application. The script will find out some environmental details, such as hostname, user name and values of environment variables, and then construct the appropriate configuration file (or a set of command-line options) that is passed to the real application.
The second approach is to embed an interpreter for a scripting language (Python, Lua and Tcl come to mind) into your application. This makes it possible for you to write a configuration file in the syntax of that embedded scripting language. In this way, the configuration file can make use of the scripting language's features, such as the ability to query environment variables or execute a shell command (such as hostname) and use if-then-else statements to set variables appropriately.
The third approach (if you are using C++ or Java) is to use the open-source Config4* library (disclaimer, I am the main developer of that). I recommend you read Chapter 2 of the "Config4* Getting Started" manual to see examples of how its flexible syntax can enable a single configuration file adapt to multiple environments.
You can take a look at http://www.configapp.com. You work with 1 configuration file, and switch/tab between the environments. Internally it's just 1 configuration file, and it handles the environment variables and generates the config file for the specific environment. In Config terminology, you have 1 Prod environment with 20+ instances. You will have a Prod environment configuration and you can tweak the 20+ instances accordingly using a web interface.
You moved environment specific properties to a separate file, but with Config, you don't have to do that. With Config, you can have 1 configuration file, with environment variables support, and common configuration applied to all environments.
Note that I'm part of the Config team.

Best way to configure a run script?

I am tasked with supportting a run script that uses environment variables to determine which tools to use, which directories to grab source files from, etc. This does not seem like the best technique to me. It seems like it would be much better to have configuration files that set all these things and have the run script parse this instead of relying on environment variables. For one thing it would allow others to run your tests ver easily (just point to the config file) and less prone to errors (environment variables getting contaminated) and easier to debug. I thought I had also read somewhere that best practices was to use an explict config file for these types of things.
I just wanted to get everyones thoughts on this.
Yes, it's often helpful to keep config separate from code (although I've seen this taken to ridiculous extremes with either too many "configurables", or long chains of dependent config files where one or two be fine).
One simple step could be moving the environment variables into a separate file and have the original script "source" the new file (which effectively becomes your "config file") - minimal changes and no additional parsing required.

The use of config file is it equivalent to use of globals?

I've read many times and agree with avoiding the use of globals to keep code orthogonal. Does the use of the config file to keep read only information that your program uses similar to using Globals?
If you're using config files in place of globals, then yes, they are similar.
Config files should only be used in cases where the end-user (presumably a computer-savvy user, like a developer) needs to declare settings for an application or piece of code, while keeping their hands out of the code itself.
My first reaction would be that it is not the same. I think the problem with globals is the read+write scenario. Config-files are readonly (at least in terms of execution).
In the same way constants are not considered bad programming behaviour. Config-files, at least in the way I use them, are just easy-changable constants.
Well, since a config file and a global variable can both have the effect of propagating changes throughout a system - they are roughly similar.
But... in the case of a configuration file that change is usually going to take place in a single, highly-visible (to the developer) location, and global variables can affect change in very sneaky and hard to track down ways -- so in this way the two concepts are not similar.
Having a configuration file ususally helps with DRY concepts, and it shouldn't hurt the orthogonality of the system, either.
Bonus points for using the $25 word 'orthogonal'. I had to look that one up in Wikipedia to find out the non-Euclidean definition.
Configuration files are really meant to be easily editable by the end user as a way of telling the program how to run.
A more specialized form of configuration files, user preferences, are used to remember things between program executions.
Global is related to a unique instance for an object which will never change, whereas config file is used as container for reference values, for objects within the application that can change.
One "global" object will never change during runtime, the other object is initialized through config file, but can change later on.
Actually, those objects not only can change during the lifetime of the application, they can also monitor the config file in order to realize "hot-change" (modification of their value without stopping/restarting the application), if that config file is modified.
They are absolutely not the same or replacements for eachother. A config file, or object can be used non-globally, ie passed explicitly.
You can of course have a global variable that refers to a config object, and that would be defeating the purpose.