Using Chef/Puppet and managing hand-made changes - configuration

I'm running a complex server setup for a de facto high-availability service. So far it takes me about two days to set everything up, so I would like to automate the provisioning.
However, I make quite a lot of manual changes to running servers. A typical example is changing a firewall configuration to cope with various hacking attempts, packet floods, etc. Being able to work on active nodes quickly is important. Also, the server maintains a lot of active TCP connections, and losing those for a simple config change is out of the question.
I don't understand whether either Chef or Puppet is designed to deal with this. Once I change some system config, I would like to store it somewhere and use it when the next instance is provisioned. Should I stick with one of those tools or choose a different one?

Hand-made changes and provisioning don't go hand in hand. They don't even drink tea together.
At work we use Puppet to manage the whole architecture, and like you, we need to make hand-made changes in a hurry due to performance bottlenecks, attacks, etc.
What we do is first make sure Puppet is able to set up every single part of the architecture, ready to be delivered without any specific tuning.
Then, when we need to make hand-made changes in a hurry, there's no risk as long as you don't touch files managed by Puppet. If the file we need to change is managed by Puppet, we just stop the Puppet agent and do whatever we need.
After the hurry is over, we proceed as follows:
Should these changes be applied to all servers with the same symptoms?
If so, you can develop what Puppet calls 'facts': code that runs on the agent on each run and saves its results in variables available to all your Puppet modules. For example, if you changed the IP conntrack max value because a firewall was not able to deal with all connections, you could easily (ten lines of code) have Puppet collect a variable with the current conntrack count on each run, and then tell Puppet to set a max value related to current usage. All other servers will then benefit from this tuning, and you likely won't ever have to deal with conntrack issues again (as long as you keep running Puppet at a short interval, which is the default).
Should these changes always be applied by hand in specific emergencies?
If the configuration is managed by Puppet, find a way to make it include another file and tell Puppet to ignore that file. This is the easiest way, but it's not always possible (e.g. /etc/network/interfaces does not support includes). If it's not possible, you will have to stop the Puppet agent during emergencies so you can change Puppet-managed files without the changes being reverted on the next Puppet run.
Are these changes only for this host, and will no other host ever need them?
Add them to Puppet anyway! Place a sweet if $fqdn == 'my.very.specific.host' { ... } and put whatever you need inside. Even for a single case it's always beneficial (if time-consuming) to migrate every change you make on a server into Puppet, as that lets you fully restore the server setup if the server ever crashes into an unrecoverable state (e.g. hardware failure).
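The conntrack fact from the first case above could look roughly like this. It is a sketch, assuming Facter's external-facts mechanism (an executable placed in a facts.d directory such as /etc/puppetlabs/facter/facts.d/ is run, and each key=value line it prints becomes a fact) and the usual Linux procfs paths. The fact names and the "keep twice the current usage" policy in suggest_max are hypothetical; a manifest would then feed the suggested value to whatever resource manages net.netfilter.nf_conntrack_max.

```python
#!/usr/bin/env python3
"""Hypothetical Facter external fact reporting netfilter conntrack usage.

Facter executes programs found in its facts.d directory and turns each
``key=value`` line they print into a fact.
"""
from pathlib import Path

# Modern kernels expose these under net/netfilter; old ones used
# net/ipv4/netfilter/ip_conntrack_* instead, so try both.
SOURCES = {
    "conntrack_count": [
        "/proc/sys/net/netfilter/nf_conntrack_count",
        "/proc/sys/net/ipv4/netfilter/ip_conntrack_count",
    ],
    "conntrack_max": [
        "/proc/sys/net/netfilter/nf_conntrack_max",
        "/proc/sys/net/ipv4/netfilter/ip_conntrack_max",
    ],
}


def read_first_int(paths):
    """Return the integer stored in the first readable file, or None."""
    for path in paths:
        try:
            return int(Path(path).read_text().strip())
        except (OSError, ValueError):
            continue
    return None


def suggest_max(count, current_max, factor=2):
    """Hypothetical policy: never lower the max, keep `factor` x headroom."""
    return max(current_max, count * factor)


def collect(sources=SOURCES):
    facts = {}
    for name, paths in sources.items():
        value = read_first_int(paths)
        if value is not None:
            facts[name] = value
    if "conntrack_count" in facts and "conntrack_max" in facts:
        facts["conntrack_suggested_max"] = suggest_max(
            facts["conntrack_count"], facts["conntrack_max"])
    return facts


def render(facts):
    """Format facts as the key=value lines Facter parses."""
    return "\n".join(f"{key}={value}" for key, value in sorted(facts.items()))


if __name__ == "__main__":
    print(render(collect()))
```

Because the fact is recomputed on every agent run, the suggested value follows actual usage on each host instead of being a number someone typed in during an emergency.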
In summary:
For me, the trick in dealing with hand-made changes is putting a lot of effort into reasoning about why you decided to make the change and, after the emergency is over, moving that logic into Puppet. Say you felt something was wrong because all the slots of a given piece of software were in use while free memory was still available on the server, so allowing more slots to deal with the traffic peak was reasonable: then spend some time moving that logic into Puppet. Very carefully, of course, and it will take as much time as the number of different scenarios in your architecture you want to test it against, but in the end it's very, VERY rewarding.

I would like to add to Valor's excellent answer.
Puppet is a tool to enforce a configuration. So you must think of it this way:
on the machine I run Puppet on...
I ask the Puppet client...
to ensure that the config of the current machine...
is as specified in the Puppet config...
which is taken from a Puppet server, or directly from a bunch of Puppet files (easier)
So, to answer one of your questions: Puppet doesn't require a machine or service reboot. But if a change in a config file you manage with Puppet requires restarting the corresponding service/daemon/app, there is no way to avoid that. There are methods in Puppet to declare that a service needs to be restarted when its config changes. Of course, Puppet will not restart the service if it sees that nothing changed.
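The behaviour just described (enforce the desired content, and restart the service only when something actually changed) can be modelled in a few lines. This is a conceptual Python sketch, not how Puppet is implemented; in a real manifest the same relationship is expressed with the notify/subscribe metaparameters between a file resource and a service resource. The restart callback is a hypothetical stand-in for whatever actually restarts the daemon.

```python
from pathlib import Path


def ensure_file(path, content):
    """Make `path` contain exactly `content`; return True if it changed."""
    target = Path(path)
    if target.exists() and target.read_text() == content:
        return False  # already in the desired state: touch nothing
    target.write_text(content)
    return True


def converge(path, content, restart):
    """Enforce the file, then restart the service only if it changed."""
    changed = ensure_file(path, content)
    if changed:
        restart()  # hypothetical hook, e.g. a service reload command
    return changed
```

Applying the same content twice triggers a single restart, so re-running an unchanged configuration leaves the running service (and its TCP connections) alone.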
Valor is assuming you use Puppet in client/server mode, with (for example) Puppet clients polling a Puppet server for config every hour. But it is also possible to move your Puppet files from machine to machine, for example with Git, and launch Puppet manually. This way:
is far simpler than the client/server technique (authentication is a headache)
only enforces config changes when you explicitly ask for it, thus avoiding any overwrite of your hand-made changes
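A minimal sketch of that masterless workflow, assuming the Puppet code lives in a Git checkout (the path /srv/puppet is hypothetical) with modules under modules/ and an entry manifest at manifests/site.pp. Running puppet apply with --noop reports what would change without changing anything, which is a convenient way to spot hand-made changes before they are overwritten.

```python
import subprocess


def build_commands(repo, manifest="manifests/site.pp", noop=False):
    """Return the argv lists to run, in order."""
    pull = ["git", "-C", repo, "pull", "--ff-only"]
    apply = ["puppet", "apply", "--modulepath", f"{repo}/modules",
             f"{repo}/{manifest}"]
    if noop:
        apply.append("--noop")  # report drift only; change nothing
    return [pull, apply]


def run(repo, noop=False):
    """Pull the latest Puppet code, then apply it to this machine."""
    for argv in build_commands(repo, noop=noop):
        subprocess.run(argv, check=True)  # stop at the first failing step


# Example (as root on the target machine):
#   run("/srv/puppet", noop=True)   # preview what would change
#   run("/srv/puppet")              # actually enforce the configuration
```

Nothing runs until you call it, which is exactly the property that protects emergency edits.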
This is obviously not the best way to use puppet if you manage a lot of machines, but it may be a good start or a good transition.
Also, Puppet is very hard to learn to an interesting level. It took me two weeks to be able to automatically install an AWS server from scratch. I don't regret it, but you may want to keep that in mind if you have to convince a boss to give you the time.

Related

Is there a way to keep track of the calls being done in mysql server by a web app?

I'm finishing a system at work that makes calls to a MySQL server. The arguments of those calls reveal information that I need to keep private, like vote(idUser, idCandidate). Of course, nothing in the database relates those two, nor does anything in "the visible part" of the back end. Even though I think it can't be done, I wanted to make sure that it is impossible to trace these calls with a log or something (calls that were made, or calls being made at the moment) while the system is in production and in use, just as it is impossible in most languages unless you specifically "debug" in a certain way. I hope the question is clear enough. Thanks.
How do I log thee? Let me count the ways.
MySQL general query log. I can switch it on at runtime and send every statement to a log file.
I can set up a slave server and have insertions sent to me by the master. This is a significant intervention and would leave a wide trace.
On the server, unbeknownst to both the web app and MySQL, I can intercept communications between the two. I need administrative access to the machine, of course.
On the server, again with administrative access, I can both log the query calls and inject logging instrumentation into the SQL interface (the legitimate one is the MySQL Audit Plugin, but several alternatives have been developed for various purposes over the years).
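To check which of these server-side channels are currently active, you can compare the output of SHOW GLOBAL VARIABLES and SHOW PLUGINS against a short list. The sketch below assumes those results have already been fetched into a dict and a set (the query step is left out so it needs no driver). general_log, general_log_file, log_output and log_bin are standard MySQL variables; audit_log is the plugin name used by MySQL Enterprise Audit and Percona, server_audit the one used by MariaDB. Other audit tools use other names, and network or process-level interception will not show up here at all.

```python
# Names under which common audit plugins appear in SHOW PLUGINS
# (MySQL Enterprise Audit / Percona: audit_log; MariaDB: server_audit).
AUDIT_PLUGINS = {"audit_log", "server_audit"}


def is_on(value):
    return str(value).strip().upper() in {"ON", "1"}


def logging_exposure(variables, active_plugins):
    """Return warnings for server-side channels that can record queries.

    variables      -- name -> value pairs from SHOW GLOBAL VARIABLES
    active_plugins -- names of plugins listed as ACTIVE by SHOW PLUGINS
    """
    warnings = []
    if is_on(variables.get("general_log", "OFF")):
        warnings.append(
            "general query log is ON "
            f"(log_output={variables.get('log_output', '?')}, "
            f"file={variables.get('general_log_file', '?')})"
        )
    if is_on(variables.get("log_bin", "OFF")):
        warnings.append("binary log is ON: data changes are recorded "
                        "and can be streamed to a replica")
    plugins = {name.lower() for name in active_plugins}
    for plugin in sorted(plugins & AUDIT_PLUGINS):
        warnings.append(f"audit plugin '{plugin}' is active")
    return warnings
```

An empty result only means these particular channels are off right now; anyone with administrative access can switch them back on, which is why the machine itself has to be secured.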
What can you do? You can have the applications use a secure protocol, just for starters.
Then you need to secure your machine so that administrator tricks do not work, so that even if the logs are activated nobody can read them, and so that you are alerted to any new or modified file and can delete it promptly.

How do I allow any host to submit jobs in Oracle/Sun Grid Engine?

We have an internal network devoted to development and testing, and this network has an OGE cluster on it. I'd like to allow any machine on that network to submit jobs, without having to add them manually one by one as submit hosts. I've tried doing a wildcard, but it hasn't liked my syntax. Is there any way to do this?
Thanks!
A qualified "no" - if you really need this, consider automating it instead.
GridEngine does not support wildcarding in host names. GE relies heavily on forward and reverse name resolution for pretty much all host interactions. You are not going to get a GridEngine cluster to blindly accept job submissions from any unspecified machine on your subnet without some bad magic.
If you use a configuration management system like Puppet or Chef, that might be the best layer to define whether a server is a submit host or not.
The alternative, brute-force way (this will almost certainly violate your IT department's acceptable use policy) is to use something like nmap to produce a list of hostnames on your network (if you think you can get away with it) and write a simple shell script to add each one as a submit host. This approach would require minor ongoing maintenance as the hosts on your network change over time.
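If you do automate it, the idempotent part is small: compare the hostnames you want against the current submit-host list and add only the missing ones. This sketch assumes qconf -ss prints one submit host per line and uses qconf -as HOST to add one; where the wanted list comes from (Puppet, DNS, an inventory file) is up to you, and the hostnames are hypothetical. Every host still needs working forward and reverse DNS.

```python
import subprocess


def parse_submit_hosts(qconf_ss_output):
    """Turn `qconf -ss` output (one host per line) into a set."""
    return {line.strip() for line in qconf_ss_output.splitlines()
            if line.strip()}


def missing_hosts(wanted, current):
    """Hosts that should be submit hosts but are not yet, sorted."""
    return sorted(set(wanted) - set(current))


def add_commands(hosts):
    return [["qconf", "-as", host] for host in hosts]


def sync_submit_hosts(wanted):
    # check=False: with an empty list qconf may print a message instead;
    # extra lines are harmless because we only subtract them from `wanted`.
    out = subprocess.run(["qconf", "-ss"], capture_output=True, text=True,
                         check=False).stdout
    for argv in add_commands(missing_hosts(wanted, parse_submit_hosts(out))):
        subprocess.run(argv, check=True)


# Example (on a Grid Engine admin host):
#   sync_submit_hosts(["dev01.example.lan", "dev02.example.lan"])
```

Running it repeatedly is safe: hosts that are already registered are skipped.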

Git environment setup. Advice needed

Background info:
We are currently 3 web programmers (good, real-life friends, no distrust issues).
Each programmer SSHes into the single Linux server where the code resides, under their own username, with sudo powers.
We all work on different files at the same time. We sometimes ask, "Are you in file __?" We use Vim, so we know whether a file is open or not.
Our development code (no production yet) resides in /var/www/
Our remote repo is hosted on bitbucket.
I am *very* new to Git. I used subversion before but I was basically spoon-fed instructions and was told exactly what to type to sync up codes and commit.
I read about half of Scott Chacon's Pro Git, and that's the extent of most of my Git knowledge.
In case it matters, we run Ubuntu 11.04, Apache 2.2.17, and Git 1.7.4.1.
So Jan Hudec gave me some advice in the previous question. He told me that it is good practice to do the following:
Each developer has their own repo on their local computer.
Let /var/www/ be the repo on the server. Set the .git folder to permission 770.
That would mean that each developer's computer needs its own LAMP stack (or at least Apache, PHP, MySQL, and Python installed).
The code is mostly JavaScript and PHP files, so it's not a big deal to clone it. However, how do we manage the database locally?
In this case, we only have two tables, and it'll be simple to recreate the entire database locally (at least for testing). But in the future, when the database gets too big, should we just log on remotely to the MySQL database on the server, or should we have "sample" data for development and testing purposes?
What you're doing is transitioning from "everybody works together in one environment" to "everybody has their own development environment". The major benefit is that nobody will be stepping on anyone else's toes.
Another benefit is a heterogeneous development environment. If everyone develops on the same machine, the software will become dependent on that one setup, because developers are lazy. If everyone develops in different environments, even with just slightly different versions of the same stuff, they'll be forced to write more robust code to deal with that.
The main drawback, as you've noticed, is setting up the environment is harder. In particular, making sure the database works.
First, each developer should have their own database. This doesn't mean they all need their own database server (though that's good for heterogeneity), but they should have their own database instance that they control.
Second, you should have a schema and not just whatever's in the database. It should be in a version controlled file.
Third, setting up a fresh database should be automatic. This lets developers set up a clean database with no hassle.
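A minimal sketch of the second and third points together: the schema lives in a version-controlled SQL file, and one function builds a clean database from it. To stay self-contained it uses Python's built-in sqlite3 as a stand-in; against MySQL the same idea is usually a script that pipes the schema file into the mysql client. The two tables and the db/schema.sql path are hypothetical.

```python
import sqlite3

# In a real project this text would live in a versioned file such as
# db/schema.sql (hypothetical path) and be read from disk.
SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE candidates (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
"""


def fresh_database(schema_sql, path=":memory:"):
    """Drop every user table, rebuild from the schema, return the connection."""
    conn = sqlite3.connect(path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # Start from scratch on every run, whatever was there before.
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.executescript(schema_sql)
    return conn
```

Because the rebuild is one call, a developer who has made a mess of their local data can reset it in seconds instead of asking someone for a copy.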
Fourth, you'll need to get interesting test data into that database. Here's where things get interesting...
You have several routes to do that.
First is to make a dump of an existing database which contains realistic data, sanitized of course. This is easy, and provides realistic data, but it is very brittle. Developers will have to hunt around to find interesting data to do their testing. That data may change in the next dump, breaking their tests. Or it just might not exist at all.
Second is to write "test fixtures". Basically each test populates the database with the test data it needs. This has the benefit of allowing the developer to get precisely the data they want, and know precisely the state the database is in. The drawbacks are that it can be very time consuming, and often the data is too clean. The data will not contain all the gritty real data that can cause real bugs.
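A small illustration of a test fixture, using Python's unittest and an in-memory sqlite3 database as stand-ins for whatever test framework and MySQL setup you end up with. The table and the count_adults function under test are hypothetical. setUp creates exactly the rows the test needs, so the expected result is known in advance (and, as noted above, the data is suspiciously clean).

```python
import sqlite3
import unittest


def count_adults(conn):
    """Hypothetical code under test."""
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE age >= 18").fetchone()[0]


class CountAdultsTest(unittest.TestCase):
    def setUp(self):
        # Fixture: a fresh database holding precisely the data this test needs.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
        self.conn.executemany("INSERT INTO users VALUES (?, ?)",
                              [("Ann", 17), ("Bob", 18), ("Cid", 40)])

    def tearDown(self):
        self.conn.close()

    def test_boundary_age_counts_as_adult(self):
        self.assertEqual(count_adults(self.conn), 2)

# Run with: python -m unittest <module name>
```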
Third is to not access the database at all and instead "mock" all the database calls. You trick all the methods which normally query a database into instead returning testing data. This is much like writing test fixtures, and has most of the same drawbacks and benefits, but it's FAR more invasive. It will be difficult to do unless your system has been designed to do it. It also never actually tests if your database calls work.
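For comparison, the mocking route with Python's unittest.mock: the hypothetical average_age function receives a fake repository object instead of a database connection, so the test needs no database at all and, for the same reason, never checks that the real SQL works.

```python
from unittest import mock


def average_age(repo):
    """Hypothetical code under test; repo.fetch_ages() normally runs SQL."""
    ages = repo.fetch_ages()
    return sum(ages) / len(ages) if ages else 0.0


# The mock replaces the database-backed repository entirely.
fake_repo = mock.Mock()
fake_repo.fetch_ages.return_value = [20, 30, 40]

assert average_age(fake_repo) == 30.0
fake_repo.fetch_ages.assert_called_once_with()
```

Note that this only works because average_age takes its repository as a parameter; code that opens its own connection internally is much harder to mock, which is the "designed to do it" caveat above.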
Finally, you can build up a set of libraries that generate semi-random data for you. I call this "The Sims Technique" after the video game where you create fake families, torture them and then throw them away. For example, let's say you have a User object that needs a name, an age, a Payment object and a Session object. To test a User you might want users with different names, ages, ability to pay and login status. To control all that you need to generate test data for names, ages, Payments and Sessions. So you write a function to generate names and one to generate ages. These can be as simple as picking randomly from a list. Then you write one to make a Payment object and one to make a Session object. By default, all the attributes will be random, but valid... unless you specify otherwise. For example...
# Generate a random login session, but guarantee that it's logged in.
session = Session.sim(logged_in=True)
Then you can use this to put together an interesting User.
# A user who is logged in but has an invalid Visa card
# Their name and age will be random but valid
user = User.sim(
    session=Session.sim(logged_in=True),
    payment=Payment.sim(invalid=True, type="Visa"),
)
This has all the advantages of test fixtures, but since some of the data is unpredictable, it also has some of the advantages of real data. Adding "interesting" data to your default sim and rand functions will have wide-ranging repercussions. For example, adding a Unicode name to random_name will likely uncover all sorts of interesting bugs! Unfortunately, it is expensive and time-consuming to build up.
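One way to make the Sims sketch runnable in Python. All names (random_name, random_age, Session.sim, Payment.sim, User.sim) are hypothetical, the attributes are cut down to the ones mentioned in the text, and a seeded random.Random can be passed in so that a failing test can be reproduced.

```python
import random
from dataclasses import dataclass

# Deliberately include awkward but valid values, e.g. non-ASCII names.
NAMES = ["Alice", "Bob", "Zoë", "李雷", "O'Brien"]


def random_name(rng):
    return rng.choice(NAMES)


def random_age(rng):
    return rng.randint(18, 99)


@dataclass
class Session:
    logged_in: bool

    @classmethod
    def sim(cls, rng=random, logged_in=None):
        # Random but valid, unless the caller pins the value.
        if logged_in is None:
            logged_in = rng.choice([True, False])
        return cls(logged_in=logged_in)


@dataclass
class Payment:
    type: str
    invalid: bool

    @classmethod
    def sim(cls, rng=random, type=None, invalid=False):
        if type is None:
            type = rng.choice(["Visa", "MasterCard"])
        return cls(type=type, invalid=invalid)


@dataclass
class User:
    name: str
    age: int
    session: Session
    payment: Payment

    @classmethod
    def sim(cls, rng=random, name=None, age=None, session=None, payment=None):
        # Fill in anything the caller did not specify with random valid data.
        return cls(
            name=name if name is not None else random_name(rng),
            age=age if age is not None else random_age(rng),
            session=session if session is not None else Session.sim(rng),
            payment=payment if payment is not None else Payment.sim(rng),
        )


rng = random.Random(42)
# A user who is logged in but has an invalid Visa card; name and age are random.
user = User.sim(
    rng,
    session=Session.sim(rng, logged_in=True),
    payment=Payment.sim(rng, invalid=True, type="Visa"),
)
```

Adding a new tricky value to NAMES immediately exercises every test that builds a User without pinning its name.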
There you have it. Unfortunately there's no easy answer to the database problem, but I implore you to not simply copy the production database as it's a losing proposition in the long run. You'll likely do a hybrid of all the choices: copying, fixtures, mocking, semi-random data.
A few options, in order of increasing complexity:
You all connect to the live master DB, read/write permissions. This is risky, but I guess you're already doing it. Make sure you have backups!
Use test fixtures to populate a local test DB and just use it. Not sure what tools there are for this in the PHP world.
Copy (mysqldump) the master database and import it into your local machines' MySQL instances, then set up your dev environments to connect to your local MySQL. Repeat the dump/import as necessary.
Set up one-way replication from the master to your local instances.
Optionally, set up a read-only user on the main DB, and configure your app to let you switch to a read-only connection to the real master DB in case you can't wait for that next copy of the master data.
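The copy-and-import option in the list above can be scripted so that refreshing a local copy is a single command. This sketch assumes the mysqldump and mysql command-line clients are installed, that credentials come from an option file such as ~/.my.cnf rather than the command line, that the local database already exists, and that the host and database names (db.example.internal, app) are placeholders. --single-transaction gives a consistent snapshot of InnoDB tables without locking them for the whole dump.

```python
import subprocess


def dump_command(host, database):
    # --single-transaction: consistent InnoDB snapshot without long table locks.
    return ["mysqldump", "--single-transaction", "--host", host, database]


def import_command(database, host="localhost"):
    return ["mysql", "--host", host, database]


def refresh(master_host, database):
    """Stream the master's dump straight into the local server."""
    dump = subprocess.Popen(dump_command(master_host, database),
                            stdout=subprocess.PIPE)
    load = subprocess.Popen(import_command(database), stdin=dump.stdout)
    dump.stdout.close()  # lets mysqldump see a broken pipe if mysql exits early
    load_rc, dump_rc = load.wait(), dump.wait()
    if dump_rc != 0 or load_rc != 0:
        raise RuntimeError(
            f"refresh failed (mysqldump={dump_rc}, mysql={load_rc})")


# Example:
#   refresh("db.example.internal", "app")
```

If the data is sensitive, a sanitizing step belongs between the dump and the import rather than after it.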
Having your own repo does not mean having your own staging server (that setup is hard to maintain and scales very badly to 10, 20 or 100 developers).
It's always better to have a (semi-)automated build system as soon as possible, one that converts the source data stored in the repository into a live system (less manual work means fewer non-code errors), and possibly some form of Continuous Integration (test often, find bugs fast). For the database part of the build system, you only have to prepare the initial data (table structures, data dumps) as versioned text files, which are:
easy to merge between branches
handled, processed and converted into the final usable objects by code, not by hand: no human errors, no operator interference

How to Guarantee Message delivery with Celery?

I have a python application where I want to start doing more work in the background so that it will scale better as it gets busier. In the past I have used Celery for doing normal background tasks, and this has worked well.
The only difference between this application and the others I have done in the past is that I need to guarantee that these messages are processed, they can't be lost.
For this application I'm not too concerned about the speed of my message queue; I need reliability and durability first and foremost. To be safe I want to have two queue servers, in different data centers in case something goes wrong, one a backup of the other.
Looking at Celery, it seems to support a bunch of different backends, some with more features than others. The two most popular seem to be Redis and RabbitMQ, so I took some time to examine them further.
RabbitMQ:
Supports durable queues and clustering, but the problem with the way clustering works today is that if you lose a node in the cluster, all messages in that node are unavailable until you bring the node back online. It doesn't replicate the messages between the nodes in the cluster; it just replicates the metadata about each message and then goes back to the originating node to get the message. If that node isn't running, you are S.O.L. Not ideal.
The way they recommend getting around this is to set up a second server, replicate the file system using DRBD, and then run something like Pacemaker to switch the clients to the backup server when it needs to. This seems pretty complicated; is there a better way? Does anyone know of one?
Redis:
Supports a read slave, which would allow me to have a backup in case of emergencies, but it doesn't support a master-master setup, and I'm not sure whether it handles active failover between master and slave. It doesn't have the same features as RabbitMQ, but it looks much easier to set up and maintain.
Questions:
What is the best way to set up Celery so that it will guarantee message processing?
Has anyone done this before? If so, would you mind sharing what you did?
A lot has changed since the OP! There is now an option for high-availability aka "mirrored" queues. This goes pretty far toward solving the problem you described. See http://www.rabbitmq.com/ha.html.
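On the Celery side, reliability mostly comes down to a handful of settings. The module below is a sketch, assuming it is loaded with app.config_from_object("celeryconfig") and that rabbit-a and rabbit-b are placeholder hostnames for two brokers. With late acknowledgement a task is acknowledged only after it finishes, so a worker crash leads to redelivery rather than loss; the price is that a task may run more than once and should therefore be idempotent. Broker-side durability (durable queues, persistent messages, and mirrored queues, which newer RabbitMQ releases replace with quorum queues) still has to be configured on RabbitMQ itself.

```python
# celeryconfig.py: hypothetical settings module aimed at not losing tasks.

# Two brokers; Celery moves on to the next URL if the connection is lost.
broker_url = [
    "amqp://app:secret@rabbit-a:5672//",
    "amqp://app:secret@rabbit-b:5672//",
]
broker_failover_strategy = "round-robin"

# Ask RabbitMQ to confirm each publish (py-amqp transport option), so a
# message that never reached the broker raises instead of vanishing.
broker_transport_options = {"confirm_publish": True}

# Retry publishing when the broker connection drops.
task_publish_retry = True

# Acknowledge only after the task has finished...
task_acks_late = True
# ...and requeue it if the worker process dies mid-task.
task_reject_on_worker_lost = True

# Reserve one message at a time so a crash strands as little as possible.
worker_prefetch_multiplier = 1

# Ask the broker to store messages on disk (Celery's default).
task_default_delivery_mode = "persistent"
```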
You might want to check out IronMQ, it covers your requirements (durable, highly available, etc) and is a cloud native solution so zero maintenance. And there's a Celery broker for it: https://github.com/iron-io/iron_celery so you can start using it just by changing your Celery config.
I suspect that Celery bound to existing backends is the wrong solution for the reliability guarantees you need.
Given that you want a distributed queueing system with strong durability and reliability guarantees, I'd start by looking for such a system (they do exist) and then figuring out the best way to bind to it in Python. That may be via Celery & a new backend, or not.
I've used Amazon SQS for this purpose and got good results. You will keep receiving a message until you delete it from the queue, and it allows your app to grow as large as you need.
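The property that makes SQS suitable here is that receiving a message only hides it for a visibility timeout; it disappears only when the consumer deletes it after successful processing, so a crashed consumer means redelivery rather than loss. The in-memory class below is a toy model of that contract, not the SQS API (in boto3 the real client calls are receive_message and delete_message), with time passed in explicitly to keep it deterministic.

```python
import itertools


class VisibilityQueue:
    """Toy model of at-least-once delivery with a visibility timeout."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}      # message id -> body
        self._hidden_until = {}  # message id -> time it becomes visible again
        self._ids = itertools.count()

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = body
        self._hidden_until[msg_id] = 0
        return msg_id

    def receive(self, now):
        """Return (id, body) of a visible message and hide it, or None."""
        for msg_id, body in self._messages.items():
            if self._hidden_until[msg_id] <= now:
                self._hidden_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Call only after the message has been processed successfully."""
        self._messages.pop(msg_id, None)
        self._hidden_until.pop(msg_id, None)
```

A consumer that receives a message and then crashes never calls delete, so the message becomes visible again once the timeout expires and another consumer picks it up.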
Is using a distributed rendering system an option? Such systems are normally reserved for HPC, but a lot of the concepts are the same. Check out Qube or Deadline Render. There are other, open-source solutions as well. All are built with failover in mind, given the high degree of complexity and risk of failure in some renders, which can take hours per image-sequence frame.

Should I move client configuration data to the server?

I have a client software program used to launch alarms through a central server. At first it stored configuration data in registry entries, now in a configuration XML file. This configuration information consists of Alarm number, alarm group, hotkey combinations, and such.
This client connects to a server using a TCP socket, which it uses to communicate this configuration to the server. In the next generation of this program, I'm considering moving all configuration information to the server, which stores all of its information in a SQL database.
I envision using some form of web interface to communicate with the server and set up the clients, rather than the current method, which is either to configure the client software on the machine through a control panel or, on install, to either push out an XML file or pass command-line parameters to the MSI. I'm thinking now that the only information I would want to specify on install would be the path to the server. Each workstation would be identified by computer name and configured through the server.
Are there any problems or potential drawbacks of this approach? The main goal is to centralize configuration and make it easier to make changes later, because our software is usually managed by one or two people at most.
Other than allowing the client to function offline (if such a possibility makes sense for your application), there doesn't appear to be any drawback to moving the configuration to a centralized location. Indeed, even with a centralized location, a feature can be added to the client to cache the last known configuration for use when the client is offline.
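A minimal sketch of that cache, assuming a hypothetical fetch_remote callable that asks the server for this workstation's configuration (keyed by computer name, as the question proposes) and returns a dict. On success the result is saved to a local JSON file; if the server cannot be reached, the last known configuration is used instead.

```python
import json
import socket
from pathlib import Path


def load_config(fetch_remote, cache_path):
    """Return (config, source), where source is 'server' or 'cache'."""
    cache = Path(cache_path)
    workstation = socket.gethostname()  # computer name identifies the client
    try:
        config = fetch_remote(workstation)
    except OSError:
        # Network failures (ConnectionError, timeouts) are OSError subclasses.
        if cache.exists():
            return json.loads(cache.read_text(encoding="utf-8")), "cache"
        raise  # no server and no cache: nothing sensible to run with
    cache.write_text(json.dumps(config), encoding="utf-8")
    return config, "server"
```

Returning the source alongside the configuration lets the client tell the user it is running on possibly stale settings.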
If you implement a [centralized] database design, I suggest considering storing configuration parameters in an Entity-Attribute-Value (EAV) structure, as this schema is particularly well suited to parameters. In particular, it allows easy addition and removal of individual parameters and also handling parameters as a list (paving the way for a list-oriented display in the UI as well, so no UI changes are needed when new types of parameters are introduced).
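A small illustration of that EAV layout, using sqlite3 so it runs anywhere; the table, column and workstation names are hypothetical. A new kind of parameter is just another row, so neither the schema nor a list-style UI has to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE client_config (
        workstation TEXT NOT NULL,  -- entity: the client's computer name
        attribute   TEXT NOT NULL,  -- e.g. 'alarm_number', 'hotkey'
        value       TEXT,
        PRIMARY KEY (workstation, attribute)
    )
""")


def set_param(conn, workstation, attribute, value):
    # Insert or overwrite a single parameter for one workstation.
    conn.execute(
        "INSERT OR REPLACE INTO client_config VALUES (?, ?, ?)",
        (workstation, attribute, value),
    )


def get_params(conn, workstation):
    """All parameters for a workstation as a dict (ready for a list UI)."""
    rows = conn.execute(
        "SELECT attribute, value FROM client_config "
        "WHERE workstation = ? ORDER BY attribute",
        (workstation,),
    )
    return dict(rows)


set_param(conn, "FRONTDESK-01", "alarm_number", "12")
set_param(conn, "FRONTDESK-01", "hotkey", "Ctrl+Alt+F1")
# A parameter type nobody planned for: no ALTER TABLE needed.
set_param(conn, "FRONTDESK-01", "escalation_minutes", "5")
```

The trade-off is that every value is stored as text, so type checking moves from the schema into the application.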
Another reason why configuration parameter collections and EAV schemas work well together is that, even with very many users and configuration points, the configuration data remains small enough that it doesn't suffer from some of the limitations of EAV with "big" tables.
The only thing that comes to mind is the security of the information, though you probably have that issue either way. It would probably be easier to interface with a database, though, as everything would be in one spot.