It seems to me that both tools are used to easily install and automatically configure applications.
However, I've only used Docker a little and haven't used Ansible at all, so I'm a little confused.
Whenever I search for a comparison between these two technologies, I find details about how to use them in combination.
There are many reasons most articles talk about using them together.
Think of Ansible as a way of installing and configuring a machine, where you can go back and tweak any individual step of that install and configuration in the future. You can then scale that concept out to as many machines as you are able to manage.
A key difference where Ansible has an advantage is that it can manage not just the internals of the machine, but also the systems that surround it, such as networking, DNS and monitoring.
Building out many machines via Ansible takes pretty much as much time for 50 machines as it does for 1, as all 50 are created step by step. If you are running a rolling deploy across multiple environments, it's this step-by-step build that takes up the time.
Now think of Docker as having built one of those individual machines - installed, configured and ready to be deployed wherever a Docker host is available (which is pretty much everywhere these days). The drawback is that you don't get to manage all the other aspects needed to make Docker containers actually work, and tweaking them long term isn't as much fun as it sounds if you haven't automated the configuration (hence Ansible helps here).
Scaling from 1 to 50 Docker containers once you have already created the initial image is blindingly fast in comparison to the step-by-step approach Ansible takes, and this is most obvious during a rolling deploy of many machines in smaller groups.
Each has its drawbacks in either ability or speed. Combine them both, however, and it can be pretty awesome. As with most of the articles you have no doubt already read, I would recommend looking at using Ansible to create (and update) your base Docker container(s), and then using Ansible to manage the rollout of whatever number of containers you need to satisfy your application's usage.
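As a rough sketch of what that combination can look like - the module names come from the community.docker collection, while the image name, paths and port are made up for illustration:

    # Hypothetical playbook: build the base image once, then roll the containers out.
    - name: Build the application image from an Ansible-maintained Dockerfile
      hosts: build_host
      tasks:
        - name: Build and tag the image
          community.docker.docker_image:
            name: registry.example.com/myapp     # made-up registry/image name
            tag: "1.0"
            source: build
            build:
              path: /srv/myapp                   # directory containing the Dockerfile

    - name: Roll the image out to however many Docker hosts you manage
      hosts: docker_hosts
      serial: 10                                 # rolling deploy, 10 hosts at a time
      tasks:
        - name: Run (or replace) the application container
          community.docker.docker_container:
            name: myapp
            image: registry.example.com/myapp:1.0
            state: started
            recreate: true                       # swap the container when the image changes
            published_ports:
              - "8080:8080"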
They are completely different things. Ansible is used to automate the configuration and management of machines/containers, and Docker is a lightweight container system for Linux.
http://www.ansible.com/home
https://www.docker.com/
I have a Kubernetes environment running multiple applications (services). Now I'm a little bit confused about how to set up the MySQL database instance(s).
According to different sources, each microservice should have its own database. Should I create a single MySQL StatefulSet in HA mode running multiple databases, OR should I deploy a separate MySQL instance for each application (service), each running one database?
My first thought would be the first option; otherwise, what would HA even be useful for? I would like to hear some different views on this.
Slightly subjective question, but here's what we have set up. Hopefully that will help you build a case. I'm sure someone would have a different opinion, and that might be equally valid too:
We deploy about 70 microservices, each with its own database ("schema") and its own JDBC URL (defined via a service). Each microservice has its own endpoint and credentials that we do not share between microservices. So in effect, we have kept the design completely independent across the microservices as far as the schema is concerned.
Deployment-wise, however, we have opted to go with a single database instance for hosting all databases (or "schemas"). While technically we could deploy each database on its own database instance, we chose not to do it for a few main reasons:
Cost overhead: Running separate database instances for each microservice would add a lot of "fixed" costs. This may not be directly relevant to you if you are simply starting the database as a MySQL Docker container (we use a separate database service, such as RDS or Google Cloud SQL). But even in the case of MySQL as a Docker container, you might end up with a non-trivial cost if you run, for example, 70 separate containers, one per microservice.
Administration overhead: Given that databases are usually quite involved (disk space, IOPS, backup/archiving, purging, upgrades and other administration activities), having separate database instances -- or Docker container instances -- may take a significant toll on your admin or operations teams, especially if you have a large number of microservices.
Security: Databases are usually also critical when it comes to security, as the "truth" usually lives in the DB. Keeping encryption, TLS configuration and credential strength aside (they should be of utmost importance regardless of your deployment model), security considerations, reviews, audits and logging will bring significant challenges if you have too many database instances.
Ease of development: Relatively less critical in the grand scheme of things, but significant nonetheless. Unless you are thinking of coming up with a different model for development (and thus breaking "dev-prod parity"), your developers may have a hard time figuring out the database endpoints for debugging, even if they only need that information once in a while.
So, my recommendation would be to go with a single database instance (Docker or otherwise), but keep the databases/schemas completely independent and inaccessible to any microservice but the "owner" microservice.
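For illustration, that isolation can be as simple as giving each microservice its own Kubernetes Secret carrying its own schema name, user and JDBC URL; all the names and values below are invented:

    # Hypothetical credentials for one microservice; every other service gets its own
    # Secret pointing at its own schema and user on the same shared instance.
    apiVersion: v1
    kind: Secret
    metadata:
      name: orders-db-credentials
    type: Opaque
    stringData:
      JDBC_URL: "jdbc:mysql://mysql.db.svc.cluster.local:3306/orders_db"
      DB_USER: "orders_svc"
      DB_PASSWORD: "change-me"    # placeholder only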
If you are deploying MySQL as Docker container(s), go with a StatefulSet for persistence. Define an external PVC so that you can always preserve the data, no matter what happens to your pods or even your cluster. Of course, if you run it active-active, you will need to set up clustering between your nodes; we run it active-passive, so we keep the replica count at 1. We only use the MySQL-in-Docker alternative for our test environments, to save the cost of an external DBaaS service where it isn't required.
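A minimal sketch of that setup - one replica, with the data on a PersistentVolumeClaim so it outlives the pod (the sketch uses a volumeClaimTemplate rather than a separately defined PVC; the image tag, storage size and the Secret holding the root password are placeholders to adapt):

    # Sketch of a single-replica MySQL StatefulSet with persistent storage.
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mysql
    spec:
      serviceName: mysql
      replicas: 1                          # active-passive style: a single MySQL pod
      selector:
        matchLabels:
          app: mysql
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
            - name: mysql
              image: mysql:8.0
              env:
                - name: MYSQL_ROOT_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: mysql-root     # assumed to exist already
                      key: password
              ports:
                - containerPort: 3306
              volumeMounts:
                - name: data
                  mountPath: /var/lib/mysql
      volumeClaimTemplates:                # the claim survives pod restarts and rescheduling
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 20Gi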
So I am using Mantl.io for our environment. Things are going very well, and we are now past the POC phase and starting to think about how we are going to handle continuous delivery. Obviously automation is key. Maybe my approach or thinking is wrong, but I am trying to figure out a way to manage the JSON I will pass to Marathon to deploy the Docker containers from our registry via a Jenkins job call. We have various environments (testing, perf, prod, etc.), and in each of these environments I will have my 30+ microservices needing different values for CPU, memory, environment variables, etc.
So I am just not sure of the best approach for taking my Docker containers and linking them with what could be 10 or more different configurations per microservice, depending on the environment.
Are there tools for building, managing, and versioning the links between containers, configs and environments? I just can't seem to find anything in this realm, and that leads me to believe I am headed down the wrong path.
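To make it concrete, for every microservice I end up needing something like the following for each environment (the service name and all the values here are made up):

    # Made-up example of the per-environment settings a single microservice needs.
    orders-service:
      testing:
        cpus: 0.25
        mem: 512
        instances: 1
        env:
          DB_HOST: testing-db.internal
      perf:
        cpus: 1.0
        mem: 2048
        instances: 3
        env:
          DB_HOST: perf-db.internal
      prod:
        cpus: 2.0
        mem: 4096
        instances: 6
        env:
          DB_HOST: prod-db.internal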
Thanks
I am working with Docker containers on a Linux machine. I have to create a database Docker container, and I have chosen MySQL. I have three requirements:
Load balancing - the database contains a huge table with approx. 100 million records, so we need to shard the table across multiple servers. To cater for this I have chosen MySQL Cluster, and I need to distribute the data based on the shard key. The load balancing will be done by HAProxy.
Question: please correct me if I am wrong, or suggest a better solution.
Persistence - even if all the database containers die, the data should be recoverable. For this I have planned to create a data-only Docker container.
Question: if the data-only Docker container dies, will it be able to recover? Does the volume change in any way when it comes back up?
Availability - since there will be multiple MySQL servers with replication, even if one server dies another server will become the primary.
Question: please correct me if I am wrong, or suggest a better solution.
Once upon a time, I remember when a database table with one million records was considered "big data"...
Before assuming you need to split your dataset across multiple machines I would highly suggest that you first get comfortable with running a single database within a Docker container. Given enough resources MySQL is quite capable of scaling up to 100 million records.
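As a minimal sketch of that starting point - a single MySQL container with a named volume so the data survives container restarts (the image tag, credentials and database name are placeholders):

    # docker-compose.yml sketch: one MySQL instance, data kept on a named volume.
    services:
      mysql:
        image: mysql:8.0
        environment:
          MYSQL_ROOT_PASSWORD: change-me    # placeholder only
          MYSQL_DATABASE: appdb
        ports:
          - "3306:3306"
        volumes:
          - mysql-data:/var/lib/mysql       # named volume instead of a data-only container

    volumes:
      mysql-data: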
Docker is designed to isolate processes from others running on the same host. This creates challenges for monolithic applications, which frequently have a software architecture involving multiple processes communicating with each other over some form of host-based IPC (inter-process communication). That does not mean they cannot be containerized, but a large multi-process container looks and operates a lot like a virtual machine, implying that perhaps Docker is a less optimal technological fit.
Before I get too negative: it's completely possible to run clustered MySQL using Docker. A couple of examples returned by Google:
http://galeracluster.com/2015/05/getting-started-galera-with-docker-part-1/
http://severalnines.com/blog/how-deploy-galera-cluster-mysql-using-docker-containers
My warning is that you see fewer examples of running these clusters across multiple Docker hosts, implying the use cases are currently mostly demos or tests.
I'm new to Logstash, but I like how easy it makes shipping logs and aggregating them. Basically, it just works. One problem I have is that I'm not sure how to go about making my configurations maintainable. Do people usually have one monolithic configuration file with a bunch of conditionals, or do they separate them out into different configurations and launch an agent for each one?
We heavily use Logstash to monitor ftbpro.com. I have two notes which you might find useful:
You should run one agent (process) per machine, not more. Logstash agents require a fair amount of CPU and memory, especially under high load, so you don't want to run more than one on a single machine.
We manage our Logstash configurations with Chef. We have a separate template for each configuration, and Chef assembles the configuration according to the roles of the machine. So the final result is one large configuration on each machine, but in our repository the configurations are separate and thus maintainable.
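We do the assembly with Chef templates; purely to illustrate the pattern of "separate fragments in the repository, one assembled file on the machine", here is roughly what the same idea looks like with Ansible's assemble module instead (paths and variable names are made up):

    # Sketch of the fragment-assembly pattern (Ansible used only to illustrate the idea;
    # Chef templates chosen by the machine's roles achieve the same result).
    - name: Build one logstash.conf from per-concern fragments
      hosts: logstash_agents
      tasks:
        - name: Ensure the fragment directory exists
          ansible.builtin.file:
            path: /etc/logstash/fragments
            state: directory

        - name: Copy only the fragments this machine's roles need
          ansible.builtin.copy:
            src: "fragments/{{ item }}"
            dest: "/etc/logstash/fragments/{{ item }}"
          loop: "{{ logstash_fragments }}"  # e.g. ['10-input.conf', '50-filters.conf', '90-output.conf']

        - name: Assemble the fragments into the single config the agent loads
          ansible.builtin.assemble:
            src: /etc/logstash/fragments
            dest: /etc/logstash/logstash.conf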
Hope this helps you.
I'll offer the following advice:
Send your data to Redis as a "channel" rather than a "list", based on time and date, which makes managing Redis a lot easier.
http://www.nightbluefruit.com/blog/2014/03/managing-logstash-with-the-redis-client/
I'm running a complex server setup for a de facto high-availability service. So far it takes me about two days to set everything up, so I would like to automate the provisioning.
However, I make quite a lot of manual changes to (running) servers. A typical example is changing a firewall configuration to cope with various hacking attempts, packet floods, etc. Being able to work on active nodes quickly is important. Also, the server maintains a lot of active TCP connections, and losing those for a simple config change is out of the question.
I don't understand if either Chef or Puppet is designed to deal with this. Once I change some system config, I would like to store it somewhere and use it while the next instance is being provisioned. Should I stick with one of those tools or choose a different one?
Hand-made changes and provisioning don't go hand in hand. They don't even drink tea together.
At work we use Puppet to manage the whole architecture, and like you, we need to make hand-made changes in a hurry due to performance bottlenecks, attacks, etc.
What we do first is make sure Puppet is able to set up every single part of the architecture, ready to be delivered without any specific tuning.
Then, when we need to make hand-made changes in a hurry: as long as you don't mess with files managed by Puppet, there's no risk; if it's a Puppet-managed file we need to change, then we just stop the Puppet agent and do whatever we need.
After the hurry has ended, we proceed as follows:
Should these changes be applied to all servers with the same symptoms?
If so, then you can develop what Puppet calls 'facts': code that runs on the agent on each run and saves its results in variables available to all your Puppet modules. For example, if you changed the ip conntrack max value because a firewall was not able to deal with all the connections, you could easily (ten lines of code) have a variable in Puppet with the current conntrack count on each run, and tell Puppet to set a max value based on current usage. Then all the other servers will benefit from this tuning, and you will likely never have to deal with conntrack issues again (as long as you keep running Puppet with a short frequency, which is the default).
Should these changes always be applied by hand in certain emergencies?
If the configuration is managed by Puppet, find a way to make the configuration include another file and tell Puppet to ignore it. This is the easiest way; however, it's not always possible (e.g. /etc/network/interfaces does not support includes). If it's not possible, then you will have to stop the Puppet agent during emergencies to be able to change Puppet-managed files without the risk of your changes being reverted on the next Puppet run.
Are these changes only for this host, and no other host will ever need them?
Add them to Puppet anyway! Place a sweet if $fqdn == my.very.specific.host and put whatever you need inside it. Even for a single case it's always beneficial (if time-consuming) to migrate every change you make to a server into Puppet, as it will allow you to do a full restore of the server setup if for some reason your server crashes into an unrecoverable state (e.g. hardware issues).
In summary:
For me, the trick in dealing with hand-made changes is putting a lot of effort into the reasoning behind the change, and, once the emergency is over, moving that logic into Puppet. If you felt something was wrong because all the slots for a given piece of software were in use while free memory was still available on the server, so allowing more slots to run was a reasonable way to deal with the traffic peak, then spend some time moving that logic into Puppet. Very carefully, of course, and as time-consuming as the number of different scenarios in your architecture you want to test it against, but in the end it's very, VERY rewarding.
I would like to complete Valor's excellent answer.
Puppet is a tool to enforce a configuration, so you must think of it this way:
on the machine I run Puppet on...
I ask the Puppet client...
to ensure that the config of the current machine...
is as specified in the Puppet config...
which is taken from a Puppet server, or directly from a bunch of Puppet files (easier).
So to answer one of your questions: Puppet doesn't require a machine or a service reboot. But if a change to a config file you manage with Puppet requires a restart of the corresponding service/daemon/app, then there is no way to avoid it. There are mechanisms in Puppet to declare that a service needs to be relaunched when its config changes. Of course, Puppet will not relaunch the service if it sees that nothing has changed.
Valor is assuming you use Puppet in a client/server way, with (for example) Puppet clients polling a Puppet server for config every hour. But it is also possible to move your Puppet files from machine to machine, for example with git, and run Puppet manually. This way is:
far simpler than the client/server technique (authentication is a headache)
only enforces config changes when you explicitly ask for it, thus avoiding any overwriting of your hand-made changes
This is obviously not the best way to use Puppet if you manage a lot of machines, but it may be a good start or a good transition.
Also, Puppet is quite hard to learn to a useful level. It took me two weeks to be able to automatically install an AWS server from scratch. I don't regret it, but you may want to know that if you have to convince a boss to allocate you the time.