What are some best practices for handling large logstash configuration files? - configuration

I about to deploy a logstash instance that will handle a variety of inputs and do multiple filter actions. The configuration file will most likely end up having a lot of if-then statements given the complexity and number of the inputs.
My questions are:
Is there any way to make the configuration file more 'modular'? In a programming sense, I would create functions/subroutines so that I could test independently. I've thought about dynamically creating mini-configuration files that I can use for testing. These mini files could then be combined into one production configuration.
Are there any "best practices" for testing, deploying and managing more complicated Logstash configurations?
Thanks!

There's no support for functions/subroutines per se. I break up different filters into separate files to keep a logical separation and avoid having gigantic files. I also have inputs and outputs in different files. That way I can combine all filters with debug inputs/output, for example
input {
stdin {}
}
output {
stdout {
codec => rubydebug
}
}
and invoke Logstash by hand to inspect the results of given input. Since filter ordering matters I'm using the fact that Logstash reads configuration files in alphabetical order, so the files are named NN-some-descriptive-name.conf, where NN is an integer.
I've also written a script that automates this process by letting you write a spec with test inputs and the expected resulting messages, and if there's a mismatch it'll bail out with an error and display the diff. I may be able to open source it.
As for deployment, use any configuration management system like Puppet, Chef, SaltStack, Ansible, CFEngine, or similar that you're familiar with. I'm quite happy with Ansible.

As #Magnus Bäck stated, the answer to 1. is no. currently there is no support for functions.
But as for your second question, there is a way to make the logstash configuration more modular. you can split the configuration file to multiple files, and point logstash to the files directory.
check the directory option in logstash man:
-f, --config CONFIG_PATH Load the logstash config from a specific file
or directory. If a direcory is given, all
files in that directory will be concatonated
in lexicographical order and then parsed as a
single config file. You can also specify
wildcards (globs) and any matched files will
be loaded in the order described above.

Related

Questionnaire tool to create config files

I have an application that needs a configuration file with several inputs which depend on the project that is going to be delivered. Things that are included in this conf file are IP's of databases, activating certain functions depending on the customer's needs, changing the values of some title screens, etc... A short example of a file could be something like:
postgresdb=192.156.98.98
transactions.enabled=true
application.name="client-1-logistics"
historicaldb=196.125.125.16
....
This files can become large and it might be difficult to find which parameters must be changed, specially if the configuration process has to be done by an external department.
I was looking into some kind of tool or framework that allows you to create some sort of questionnaire by which the user answers yes or no questions and fills out boxes with specific IP's or messages and get as a result the configuration file needed. This would be much tidier as you could group the questions into sections and has the potential of customising the configuration process with more context on the different parameters.
Does anyone know of such a framework?. How do you handle this kind of complex configuration processes?
The approach I outline below is not exactly what you are looking for, but it might provide some food for thought.
Use a template engine (example, Velocity, or any of the
several dozen listed in Wikipedia) to create a templated
version of your configuration file, containing lots of boilerplate
configuration that won't change, with the occasional
${variable_name} placeholder (the syntax for a placeholder will
vary from one template engine to another).
Write a small metadata file containing variable_name=value
settings.
Write a trivial program that: (a) parses the metadata file and loads
the variable_name=value settings into a Map (the template engine
might refer to the Map as, say, a context object); (b) uses the
template engine to parse the template file; (c)
merges/evaluates/instantiates the parsed template file with the settings in
the Map; and (d) writes the result to the target
configuration file.
You might be able to use steps 1 and 3 above without change. It is only step 2 that you need to adapt to your questionnaire requirements. Instead of a questionnaire, perhaps you could give users a document that explains how to write the metadata file.

Read all CSV files in a directory into an internal table

I have a parameter and, on F4, we can choose the directory. I'm trying to figure out how to choose a folder and read the content of all the files in it (the files are in .CSV) to an internal table. I think I have to use TMP_GUI_DIRECTORY_LIST_FILES function. Hope I'm explaining myself. Thank you.
You'll have to do this manually: first read the list of files, the go through each file and process its contents. There may be some odd function modules to read CSV files, but be aware that many of them are broken - for example, they just clip the lines that exceed a certain length. Therefore I won't recommend any of them - personally, I'd implement the CSV import part myself.
If you have access to the transaction KCLJ in your system you could analyze the coding behind it. This tool has an option to interpret CSV files so you might find interesting function modules that might help you with your tasks.
EDIT: I looked at it very quickly and the piece of coding you could reuse is reconvert_format from include RKCDFILEINCFOR. An example how to call it is located starting from line 128 in the same include.

Checkstyle and Findbugs for changed files only on Jenkins (and/or Hudson)

We work with a lot of legacy code and we think about introducing some metrics for new code. Is it possible to let Findbugs and Checkstyle run on changed files only instead of a complete project?
It would be nice to assure that only file with a minimum of quality is checked in, but the code base itself is not (yet) touched and evaluated not to confuse people by thousands of issues.
In theory, it would be possible. You would use a shell script to parse the SVN (or whatever SCM) change logs after a given start date, identify the .java files from these change sets and build two patterns from these:
The Findbugs Maven Plugin expects a comma-separated list of class (or
package) names for the parameter onlyAnalyze, so you'll have
to translate file names to fully qualified class names (this will get
tricky when you're dealing with inner classes)
The Maven Checkstyle Plugin is even worse, it expects a
configuration file for its packageNamesLocation parameter.
Unfortunately, only packages are allowed, not individual files. So
you'll have to translate file names to packages.
In the above examples I assume that you are using maven. I am pretty sure that similar things can be done with ant, but I wouldn't know.
I myself would probably use a Groovy script instead of a shell script to achieve the above results.
Findbugs has ant tasks that can do diffs against different findbugs results to see just the deltas, so only reporting new bugs, see
http://findbugs.sourceforge.net/manual/datamining.html

Command line parameters or configuration file?

I'm developing a tool that will perform several types of analysis, and each analysis can have different levels of thoroughness. This app will have a fair amount of options to be given before it starts. I started implementing this using a configuration file, since the number of types of analysis specified were little. As the number of options implemented grew, I created more configuration files. Then, I started mixing some command line parameters since some of the options could only be flags. Now, I've mixed a bunch of command line parameters with configuration files and feel I need refactoring.
My question is, When and why would you use command line parameters instead of configuration files and vice versa?
Is it perhaps related to the language you use, personal preference, etc.?
EDIT: I'm developing a java app that will work in Windows and Mac. I don't have a GUI for now.
Command line parameters are useful for quickly overriding some parameter setting from the configuration file. As well, command line parameters are useful if there are not so many parameters. For your case, I'd suggest that you export parameter presets to command line.
Command line arguments:
Pros:
concise - no extra config files to maintain by itself
great interaction with bash scripts - e.g. variable substitution, variable reference, bash math, etc.
Cons:
it could get very long as the options become more complex
formatting is inflexible - besides some command line utilities that help you parse the high level switches and such, anything more complex (e.g. nested structured information) requires custom syntax such as using Regex, and the structure could be quite rigid - while JSON or YAML would be hard to specify at the command line level
Configuration files:
Pros:
it can be very large, as large as you need it to be
formatting is more flexible - you can use JSON, YAML, INI, or any other structural format to represent the information in a more human consumable way
Cons:
inflexible to interact with bash variable substitutions and references (as well as bash math) - you have to probably define your own substitution rules if you want the config file to be "generic" and reusable, while this is the biggest advantage of using command line arguments - variable math would be difficult in config files (if not impossible) - you have to define your own "operator" in the config files, or you have to rely on another bash script to carry out the variable math, and perform your custom variable substitution so the "generic" config file could become "concretely usable".
for all that it takes to have a generic config file (with custom defined variable substitution rules) ready, a bash script is still needed to carry out the actual substitution, and you still have to code your command line to accept all the variable substitutions, so either you have config files with no variable substitution, which means you "hard code" and repeat the config file for different scenarios, or the substitution logic with custom variable substitution rules make your in-app config file logic much more complex.
In my use case, I value being able to do variable substitution / reference (as well as bash math) in the bash scripts more important, since I'm using the same binary to start many server nodes with different responsibilities in a server backend cluster, and I kind of use the bash scripts as sort of a container or actually a config file to start the many different nodes with differing command line arguments.
my vote = both ala mysqld.exe
What environment/platform? In Windows you'd rather use a config file, or even a configuration panel/window in the gui.
I place configuration that don't really change in a configuration file.
Configuration that change often I place on the command-line.

Best way to configure a run script?

I am tasked with supportting a run script that uses environment variables to determine which tools to use, which directories to grab source files from, etc. This does not seem like the best technique to me. It seems like it would be much better to have configuration files that set all these things and have the run script parse this instead of relying on environment variables. For one thing it would allow others to run your tests ver easily (just point to the config file) and less prone to errors (environment variables getting contaminated) and easier to debug. I thought I had also read somewhere that best practices was to use an explict config file for these types of things.
I just wanted to get everyones thoughts on this.
Yes, it's often helpful to keep config separate from code (although I've seen this taken to ridiculous extremes with either too many "configurables", or long chains of dependent config files where one or two be fine).
One simple step could be moving the environment variables into a separate file and have the original script "source" the new file (which effectively becomes your "config file") - minimal changes and no additional parsing required.