Is there a good solution for sharing secret variables and configurations between different servers, programs and processes? - configuration

Lets say you have a complex system that consists of several programs and processes running on several servers.
Now you want to change some important configuration variable like the password to a database that is used by several of these processes.
Is there a standardized solution that enables you to store these configurations at a central spot and pushes changes on update to different servers?
I imagine something like environment variables but that can be read and changed fast during runtime.

You can use MQ s(short for message queues) for distributed application communications.
Rackspace has a short article about different MQ s. Check below:
Using Message Queues
Most of these applications use SSL and HTTPS so you can have a peace of mind about the channels being secure.

Related

designing an agnostic configuration service

Just for fun, I'm designing a few web applications using a microservices architecture. I'm trying to determine the best way to do configuration management, and I'm worried that my approach for configuration may have some enormous pitfalls and/or something better exists.
To frame the problem, let's say I have an authentication service written in c++, an identity service written in rust, an analytics services written in haskell, some middletier written in scala, and a frontend written in javascript. There would also be the corresponding identity DB, auth DB, analytics DB, (maybe a redis cache for sessions), etc... I'm deploying all of these apps using docker swarm.
Whenever one of these apps is deployed, it necessarily has to discover all the other applications. Since I use docker swarm, discovery isn't an issue as long all the nodes share the requisite overlay network.
However, each application still needs the upstream services host_addr, maybe a port, the credentials for some DB or sealed service, etc...
I know docker has secrets which enable apps to read the configuration from the container, but I would then need to write some configuration parser in each language for each service. This seems messy.
What I would rather do is have a configuration service, which maintains knowledge about how to configure all other services. So, each application would start with some RPC call designed to get the configuration for the application at runtime. Something like
int main() {
AppConfig cfg = configClient.getConfiguration("APP_NAME");
// do application things... and pass around cfg
return 0;
}
The AppConfig would be defined in an IDL, so the class would be instantly available and language agnostic.
This seems like a good solution, but maybe I'm really missing the point here. Even at scale, tens of thousands of nodes can be served easily by a few configuration services, so I don't forsee any scaling issues. Again, it's just a hobby project, but I like thinking about the "what-if" scenarios :)
How are configuration schemes handled in microservices architecture? Does this seem like a reasonable approach? What do the major players like Facebook, Google, LinkedIn, AWS, etc... do?
Instead of building a custom configuration management solution, I would use one of these existing ones:
Spring Cloud Config
Spring Cloud Config is a config server written in Java offering an HTTP API to retrieve the configuration parameters of applications. Obviously, it ships with a Java client and a nice Spring integration, but as the server is just a HTTP API, you may use it with any language you like. The config server also features symmetric / asymmetric encryption of configuration values.
Configuration Source: The externalized configuration is stored in a GIT repository which must be made accessible to the Spring Cloud Config server. The properties in that repository are then accessible through the HTTP API, so you can even consider implementing an update process for configuration properties.
Server location: Ideally, you make your config server accessible through a domain (e.g. config.myapp.io), so you can implement load-balancing and fail-over scenarios as needed. Also, all you need to provide to all your services then is just that exact location (and some authentication / decryption info).
Getting started: You may have a look at this getting started guide for centralized configuration on the Spring docs or read through this Quick Intro to Spring Cloud Config.
Netflix Archaius
Netflix Archaius is part of the Netflix OSS stack and "is a Java library that provides APIs to access and utilize properties that can change dynamically at runtime".
While limited to Java (which does not quite match the context you have asked), the library is capable of using a database as source for the configuration properties.
confd
confd keeps local configuration files up-to-date using data stored in external sources (etcd, consul, dynamodb, redis, vault, ...). After configuration changes, confd restarts the application so that it can pick up the updated configuration file.
In the context of your question, this might be worthwhile to try as confd makes no assumption about the application and requires no special client code. Most languages and frameworks support file-based configuration so confd should be fairly easy to add on top of existing microservices that currently use env variables and did not anticipate decentralized configuration management.
I don't have a good solution for you, but I can point out some issues for you to consider.
First, your applications will presumably need some bootstrap configuration that enables them to locate and connect to the configuration service. For example, you mentioned defining the configuration service API with IDL for a middleware system that supports remote procedure calls. I assume you mean something like CORBA IDL. This means your bootstrap configuration will not be just the endpoint to connect to (specified perhaps as a stringified IOR or a path/in/naming/service), but also a configuration file for the CORBA product you are using. You can't download that CORBA product's configuration file from the configuration service, because that would be a chicken-and-egg situation. So, instead, you end up with having to manually maintain a separate copy of the CORBA product's configuration file for each application instance.
Second, your pseudo-code example suggests that you will use a single RPC invocation to retrieve all the configuration for an application in a single go. This coarse level of granularity is good. If, instead, an application used a separate RPC call to retrieve each name=value pair, then you could suffer major scalability problems. To illustrate, let's assume an application has 100 name=value pairs in its configuration, so it needs to make 100 RPC calls to retrieve its configuration data. I can foresee the following scalability problems:
Each RPC might take, say, 1 millisecond round-trip time if the application and the configuration server are on the same local area network, so your application's start-up time is 1 millisecond for each of 100 RPC calls = 100 milliseconds = 0.1 second. That might seem acceptable. But if you now deploy another application instance on another continent with, say, a 50 millisecond round-trip latency, then the start-up time for that new application instance will be 100 RPC calls at 50 milliseconds latency per call = 5 seconds. Ouch!
The need to make only 100 RPC calls to retrieve configuration data assumes that the application will retrieve each name=value pair once and cache that information in, say, an instance variable of an object, and then later on access the name=value pair via that local cache. However, sooner or later somebody will call x = cfg.lookup("variable-name") from inside a for-loop, and this means the application will be making a RPC every time around the loop. Obviously, this will slow down that application instance, but if you end up with dozens or hundreds of application instances doing that, then your configuration service will be swamped with hundreds or thousands of requests per second, and it will become a centralised performance bottleneck.
You might start off writing long-lived applications that do 100 RPCs at start-up to retrieve configuration data, and then run for hours or days before terminating. Let's assume those applications are CORBA servers that other applications can communicate with via RPC. Sooner or later you might decide to write some command-line utilities to do things like: "ping" an application instance to see if it is running; "query" an application instance to get some status details; ask an application instance to gracefully terminate; and so on. Each of those command-line utilities is short-lived; when they start-up, they use RPCs to obtain their configuration data, then do the "real" work by making a single RPC to a server process to ping/query/kill it, and then they terminate. Now somebody will write a UNIX shell script that calls those ping and query commands once per second for each of your dozens or hundreds of application instances. This seemingly innocuous shell script will be responsible for creating dozens or hundreds of short-lived processes per second, and of those short-lived processes will make numerous RPC calls to the centralised configuration server to retrieve name=value pairs one at a time. That sort of shell script can pu a massive load on your centralised configuration server.
I am not trying to discourage you from designing a centralised configuration server. The above points are just warning about scalability issues you need to consider. Your plan for an application to retrieve all its configuration data via one coarse-granularity RPC call will certainly help you to avoid the kinds of scalability problems I mentioned above.
To provide some food for thought, you might want to consider a different approach. You could store each application's configuration files on a web sever. A shell start script "wrapper" for an application can do the following:
Use wget or curl to download "template" configuration files from the web server and store the files on the local file system. A "template" configuration file is a normal configuration file but with some placeholders for values. A placeholder might look like ${host_name}.
Also use wget or curl to download a file containing search-and-replace pairs, such as ${host_name}=host42.pizza.com.
Perform a global search-and-replace of those search-and-replace terms on all the downloaded template configuration files to produce the configuration files that are ready to use. You might use UNIX shell tools like sed or a scripting language to perform this global search-and-replace. Alternatively, you could use a templating engine like Apache Velocity.
Execute the actual application, using a command-line argument to specify the path/to/downloaded/config/files.

PostgreSQL monitoring With Zabbix

In Zabbix, what is the possibility to monitor many postgres databases .
because only one server database can be defined in odbc.ini
Thanks .
If you look at the zabbix share site here, you will see several templates which provide for monitoring via the zabbix agent. By using the zabbix agent instead of the server side odbc option, you push the actual connectivity out to the agent (which could, of course, be on the same server as zabbix). This allows you to do separate discovery and even separate credentials for each server (e.g. in user macros).
I did not experiment with the templates present there, merely offer them as examples. Determining what exactly one monitors on a database is the more difficult subject, of course, since it can be highly variable by site -- what is "normal" at that site, and so what is worthy of an alert. But with LLD and ability to do arbitrary queries, you can form any items and triggers you may need. Effectively you can have your DBA (might be you also of course) craft the criteria, and just put it wholesale into the template.

How do I allow any host to submit jobs in Oracle/Sun Grid Engine?

We have an internal network devoted to development and testing, and this network has an OGE cluster on it. I'd like to allow any machine on that network to submit jobs, without having to add them manually one by one as submit hosts. I've tried doing a wildcard, but it hasn't liked my syntax. Is there any way to do this?
Thanks!
A qualified "no" - if you really need this, consider automating it instead.
GridEngine does not support wildcarding in host names. GE relies heavily on forward and reverse name resolution for pretty much all host interactions. You are not going to get a GridEngine cluster to blindly accept job submissions from any unspecified machine on your subnet without some bad magic.
If you use a configuration management system like Puppet or Chef, that might be the best layer to define whether a server is a submit host or not.
The alternative, brute force way(this will almost certainly violate your IT department's acceptable use policy) is using something like nmap to produce a list of hostnames on your network(if you think you can get away with it) and write a simple shell script to add each one as a submit host. This approach would require minor ongoing maintenance as the hosts on your network change over time, etc.

Distributed Tornado-Based Chat Server

I have a requirement to build a distributed Comet-based server for a large number of clients (over 500K concurrent) with high throughput. I'm currently investigating the possibility of using Tornado for it's high efficiency in dealing with high number of long-polling requests.
My concern is whether a single Tornado server could handle such a large number of long polling clients. As an experiment, I would like to expand Tornado Chat demo (https://github.com/facebook/tornado/tree/master/demos/chat) to a distributed environment. I.e. have a bunch of Tornado chat servers running in parallel, each responsible for a changing set of clients.
I would appreciate any ideas/thoughts you have with regard to implementing such a scheme, or any references to relevant resources.
Thanks!
In general to make the basic chat distributed across several Tornado instances you need to create a distributed message passing mechanism, the most straightforward implementation will be to just use some kind of message queue like RabbitMQ (or it's competitor) and send fanout messages when user types something, while all connections are listening.
My initial thought about this is to have an Nginx server/reverse proxy in the front-end, while have multiple instances of Tornado in the back, this could be a Tornado instance per process, try to do some bench-marking to your machine to see how many running Tornado instances on different process a machine can handle, when you notice degradation in performance, start doing the same thing on another machine.
Nginx will round robin all the servers you have to distribute the load over the long-polling/Tornado servers/instances.
Not really sure how the rabbitmq will be useful in this case.

Should I move client configuration data to the server?

I have a client software program used to launch alarms through a central server. At first it stored configuration data in registry entries, now in a configuration XML file. This configuration information consists of Alarm number, alarm group, hotkey combinations, and such.
This client connects to a server using a TCP socket, which it uses to communicate this configuration to the server. In the next generation of this program, I'm considering moving all configuration information to the server, which stores all of its information in a SQL database.
I envision using some form of web interface to communicate with the server and setup the clients, rather than the current method, which is to either configure the client software on the machine through a control panel, or on install to ether push out an xml file, or pass command line parameters to the MSI. I'm thinking now the only information I would want to specify on install would be the path to the server. Each workstation would be identified by computer name, and configured through the server.
Are there any problems or potential drawbacks of this approach? The main goal is to centralize configuration and make it easier to make changes later, because our software is usually managed by one or two people at most.
Other than allowing for the client to function offline (if such a possibility makes sense for your application), there doesn't appear to be any drawback of moving the configuration to a centralized location. Indeed even with a centralized location, a feature can be added in the client to cache the last known configuration, for use when the client is offline).
In case you implement a [centralized] database design, I suggest to consider storing configuration parameters in an Entity-Attribute-Value (EAV) structure as this schema is particularly well suited for parameters. In particular it allows easy addition and removal of particular parameters and also the handling parameters as a list (paving the way for a list-oriented display as well in the UI, and therefore no changes needed in the UI either when new types of parameters are introduced).
Another reason why configuartion parameter collections and EAV schemas work well together is that even with very many users and configuration points, the configuration data remains small enough that is doesn't suffer some of the limitations of EAV with "big" tables.
Only thing that comes to mind is security of the information. In either case you probably have that issue though. Probably be easier to interface with though with a database as everything would be in one spot.