Docker dependency design for container configuration and startup - mysql

I am containerizing a web application which has container dependencies.
The containers are listed here in their dependency order, each later one being dependent on the former.
There is a mysql container which compiles, installs and configures MySQL.
There is a learnintouch container which installs files and seeds custom product data into the mysql container.
There is a learnintouch.com container which installs files and seeds custom website data into the mysql container.
The data seeding is part of the application installation and needs to be done only once in the application lifetime.
The data seeding is quite long, very long in fact.
It would be nice to have the application created AND started by a docker-compose.yml file sitting in the learnintouch.com directory.
At first, I was hoping to have only the three containers, with each dependent container waiting for its dependency to complete its data seeding before running itself, and finally starting the application.
I now see this will be difficult to achieve, if not impossible. It is already tricky to have docker-compose wait for a service to start up, but it's even more so to wait for data seeding to have completed.
I reckon one way is to have two branches in the container dependency tree: one for the application installation doing the data seeding, and another one only starting the application.
Is that a common practice?

The best practice is to remove the dependency so containers can start in any order. If one container starts before another that it uses, it should gracefully return an error if you attempt to use it, but begin to work as soon as the dependencies come online. This allows various microservices to be independently upgraded, replaced, or migrated without needing to restart your entire infrastructure.
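If the application itself can't retry, a small wait-and-retry wrapper in the dependent container's entrypoint approximates this; here is a minimal sketch, where the mysql host name and the start script are assumptions you would adapt:
#!/bin/sh
# keep polling the mysql service until it answers, then start the app
until mysqladmin ping -h mysql --silent; do
  echo "waiting for mysql..."
  sleep 2
done
exec /usr/local/bin/start-learnintouch.sh   # hypothetical start script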
When that's not possible, realize that docker-compose is great at what it does, but what it does is limited in scope, so you'll need to extend it with your own scripting. You may have several compose files, and the first one would be run to seed the data until it completes; when it returns, the script would continue on to launching your other containers.
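For example, a wrapper script along these lines, where the compose file names and the seed service name are hypothetical:
#!/bin/sh
set -e
# run the one-off seeding service and wait for it to finish
docker-compose -f docker-compose.seed.yml run --rm seed
# only then bring up the long-running application containers
docker-compose -f docker-compose.yml up -d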
Lastly, large amounts of data like this are best managed as an external volume, which you can create, update, and back up independently from your other containers.
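A sketch of that approach, with an illustrative volume name; the /var/lib/mysql path is an assumption about where your build keeps the MySQL data directory:
# create the volume once, outside of any compose file, and mount it at the
# mysql container's data directory (e.g. -v learnintouch-mysql-data:/var/lib/mysql)
docker volume create learnintouch-mysql-data
# back it up independently of the containers
docker run --rm -v learnintouch-mysql-data:/var/lib/mysql -v "$(pwd)":/backup busybox tar czf /backup/mysql-data.tar.gz /var/lib/mysql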

Related

Docker vs native installation on a server. How to recover from downtime of mysql/mariadb without data loss

Fellows,
I have the following scenario:
I have an application X (let's say WordPress) running in a Docker container and linked to another Docker container running MySQL. And if for some reason I lose the database, is there a way to recover the latest changes without any data loss?
From what I understand (not sure if true or not), any file change inside a Docker container is automatically lost. Therefore, if the database is in a Docker container, then anything that is in the db is automatically lost.
If instead MySQL is running directly on the server and the database goes down, I can just restart the server and then everything is OK, because the changes to the filesystem have been written. And before making the system live again, I can also force a backup to have the latest changes.
In both cases, of course, I somehow manage to take a proper backup on a daily basis. And in both cases I want the minimum data loss.
And of course I understand that there may be data corruption in the second option. I want to have the latest written data, even if corrupted.
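To illustrate the setup I mean (image names, tags and the password are just examples):
docker run -d --name db -e MYSQL_ROOT_PASSWORD=secret mysql:5.6
docker run -d --name blog --link db:mysql wordpress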

Ways to invoke some commands on each of the servers in a cluster through the UI node

Each node in my cluster can have more than one component, where the components are:
PSQL
Mongo Config server
Mongo shards
Redis
Celery Workers
Python processing node
So on..
The UI nodes are under AWS auto scaling, and cannot be run on any of the other nodes. We can configure one or more components on a node by some CLI commands that we have built. Just to give you an idea, we have commands like:
Turn off/on Redis
Turn off/on PSQL
Add another shard (can be done only if shard is running on this node)
etc.
So each CLI command's execution depends on the components installed on that node. Moreover, each CLI command's interaction is different, some take just one parameter, some may need a lot more. Now as the cluster grows in size, there is a requirement to centrally execute these commands somehow. I think this can probably be done as follows:
Build a tab specifically for the super user admin, where he can see all the nodes, and after selecting one of the nodes, he can select from all the possible CLI commands
Depending on the CLI command, an Expect script would be made to run on that node
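To sketch the idea, the remote execution would boil down to something like this (the node name and the component CLI syntax are made up):
ssh admin@node-07.cluster.internal "component-cli redis stop"
ssh admin@node-07.cluster.internal "component-cli shard add --port 27020"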
Now, I know this is all quite messy; I was hoping to find out if there's a simple utility/framework which helps simplify all this?

Cloud based LAMP cluster

I run a pretty customized cluster for processing large amounts of scientific data based on a basic LAMP design. In general, I run a separate MySQL server with around 128GB of RAM and about 1TB of storage. Separately, I run a head node that serves as an NFS mount point for the data input of my process, and a webserver to display results. Finally, I typically have a few compute nodes that get their jobs from a MySQL table, get the data from NFS, do some heavy lifting, then put results into MySQL.
I have come across a dataset I would like to process which is pretty large (1TB of input data), and I don't really have the hardware on hand to handle it. As a result, I began investigating google compute engine etc, and the prospect of scaling instances to process these data rapidly with the results stored in a mysql instance. Upon completion the mysql tables could be dumped from the cloud and brought up locally for analysis. I would have no problem deploying a MySQL server, along with the rest of the LAMP pieces and the compute nodes, but I can't quite figure out how I would do this in the cloud.
A major sticking point seems to be the lack of read/write NFS, which would allow me to get the data onto several instances, crunch it, then push the results to MySQL. This is a necessary step for me, as I would queue hundreds of jobs from the webserver, then have the instances (as many as 50-100) pick the jobs up by connecting to a centralized MySQL instance to find out what jobs an instance needs to do and where the data is. They would process the data (there is a file conversion that happens which makes the write part necessary), crunch it, then load the results into MySQL. I hope I'm explaining my situation clearly. This seems like a great example of a CPU-intensive process that would scale nicely in the cloud, I just can't seem to put all the pieces together... Any input is appreciated!
It sounds quite possible; I've been doing similar things in GCE for a while now.
NFS mount - you just need to configure it as you would normally. Set up the NFS server on the head node, and then configure the clients on the slave nodes to mount it. Here and here are some basic configuration instructions for CentOS 6 that I used to get NFS up and running.
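On CentOS 6 the core of it looks roughly like this (the export path and network range are just examples):
# on the head node (NFS server)
yum install -y nfs-utils
echo "/data 10.240.0.0/16(rw,no_root_squash)" >> /etc/exports
service rpcbind start && service nfs start && exportfs -ra
# on each slave node (NFS client)
mkdir -p /data && mount -t nfs head-node:/data /data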
Setting up a LAMP stack is very straightforward. These machines run pretty much vanilla Linux distros, so you can just use yum or apt-get to install components.
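For example, on a CentOS 6 image something like this covers the rest of the stack:
yum install -y httpd php php-mysql mysql-server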
For the cluster, you will probably end up having an image for the head node you use once, and then another image for the slave nodes that you replicate for each one.
For the scheduler, I've used Condor and SGE successfully, but I'm sure the other ones would work just as well.
Hope this helps.

Pre-existing MySQL data with Vagrant / VirtualBox

Background: I used to develop using MAMP and over the months/years I've accumulated a large mysql database (a few gigs) that I use for development for my different projects. I finally got around to setting up a VM using Vagrant and I've gotten everything set up and working nicely except my database. I'm running a CentOS 6.5 guest box on an OSX host.
My problem: I need my database to be completely persistent so I can vagrant up/destroy as many boxes as I need to, but the mysql persists.
My solution #1: I initially mounted a synced folder using vboxsf. This works pretty well and seems to be my best option so far, but performance is pretty bad. Query-intensive pages on my dev sites take 1-3 seconds to load whereas they might normally take under a second to load.
My solution #2: I then tried mounting a synced folder using nfs because the performance should be much better. The issue here is that mysql complains b/c, given the nature of nfs, it can't chown the data directory to the mysql:mysql user. I get the following errors when trying to start up the mysqld service:
chown: changing ownership of '/www/mysql': Operation not permitted
chmod: changing permissions of '/www/mysql': Permission denied
Sooo, my question is: are there any better ways to accomplish what I need? I feel like NFS would be the best solution, but I don't know how to get around the whole ownership/permission issues automatically with Vagrant. Any help would be appreciated.
I had the same issue or requirement for my local dev on Mac. And I found a solution for a MySQL-only Vagrant box with the external data linked as a synced folder. But it'll run on Windows too, I guess.
Here is the Vagrant box config: https://github.com/ronnyhartenstein/vagrant-mysql-shared-folder
And if you understand German, here is my blog article with some background infos and tests (and fails of course): http://blog.rh-flow.de/2014/11/11/es-hat-sich-ausgemampft-vagrant-ist/
First of all, let me start by saying this is not best practice. You may know yourself that this can lead to problems if e.g. your PC goes blank or you want to give one project to another person for development. Of course, especially as a one-person endeavour, there are more important things than having test data importers and stuff :) So let's look for solutions.
NFS Permissions
To get NFS permissions right, your users need to have the same UID and GID on host and guest. It's pretty tricky to set up and you should not change it from the guest. Maybe you can change it on the host to make the folder writeable by mysql and make the UID and GID the same. Of course, the moment the host changes, this won't work anymore.
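A quick way to check is to compare the IDs on both sides (the user name and path are examples):
# inside the guest
id mysql                      # e.g. uid=27(mysql) gid=27(mysql)
# on the OS X host, give the shared folder a matching numeric owner
sudo chown -R 27:27 ~/vagrant-projects/mysql-data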
rsync shared folder
Rsync might not be the fastest in terms of syncing, but if you create an rsync shared folder where only MySQL is writing, and which syncs back to some folder on your host, this might be a solution. The "real" projects could still live inside a VirtualBox share or NFS and you don't need to bother with correct permissions.
There might be some other solutions as well:
Create a backup/restore strategy
One way to go would be to back up MySQL inside your Vagrant box at various points, e.g. every day. You could also run the backup when the box is shut down, thus creating a backup right before you destroy the box. Placing this backup in a shared folder, you'd have up-to-date data in case you destroy a box. Performance should be pretty good as the data MySQL is using wouldn't be on a shared folder.
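A minimal sketch (the paths and credentials are placeholders):
# inside the box: dump to the shared folder so it survives a vagrant destroy
mysqldump -u root -p"$MYSQL_ROOT_PW" --all-databases > /vagrant/backups/dev-$(date +%F).sql
# on a freshly provisioned box: restore the latest dump
mysql -u root -p"$MYSQL_ROOT_PW" < /vagrant/backups/dev-2014-11-11.sql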
Run MySQL on host or other vagrant box
It's of course possible to connect from within your vagrant box to your host or another vagrant box which runs MySQL. Your host or this box could be long-lived and could serve as a central "MySQL Server" for all your projects.
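From inside a guest with VirtualBox's default NAT networking, the host is reachable at 10.0.2.2, so the connection is just (user and database names are examples):
mysql -h 10.0.2.2 -u dev -p myapp_dev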
Have a MySQL slave running on the same machine which writes to shared folder
I believe a master/slave combination is possible with MySQL: run both on one machine, with the master (which you use in your projects) living inside your VM and not writing anything to a shared folder, and a slave which writes to your shared folder and is a mirror of your master. This would mean that you have high performance and a few seconds of delay between writing something and having it written to your shared folder. Of course, keeping this setup running and making sure it works all the time can be tricky.
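Roughly, the my.cnf on each side would differ along these lines (a bare sketch with made-up paths; a real setup also needs a replication user and CHANGE MASTER TO on the slave):
# master, inside the VM, data not on a shared folder
server-id = 1
log-bin   = mysql-bin
# slave, with its data directory on the shared folder
server-id = 2
datadir   = /vagrant/mysql-mirror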
You can use bindfs for changing the user/group of a share. I'm actually using a plugin called vagrant-bindfs which lets you remount a share with different ownership. It works, but I haven't tried it with MySQL to see how it performs.
Relevant lines on my Vagrantfile:
unless Vagrant.has_plugin?("vagrant-bindfs")
  raise 'vagrant-bindfs is not installed! Please install with vagrant plugin install vagrant-bindfs'
end
config.vm.synced_folder "../", "/temp-nfs-mounts/sites-unbinded", type: :nfs
config.bindfs.bind_folder "/temp-nfs-mounts/sites-unbinded", "/sites", :force_user => "vagrant", :force_group => "vagrant", :create_as_user => true

How to prevent certain Jenkins jobs from running simultaneously?

I have a couple of jobs that use a shared resource (database), which sometimes can cause builds to fail in the (rare) event that the jobs happen to get triggered simultaneously.
Given jobs A through E, for example, is there any way to specify that A and C should never be run concurrently?
Other than the aforementioned resource, the builds are independent of each other (not e.g. in an upstream/downstream relation).
A "brute-force" way would be limiting the number of executors to one, but that obviously is less than ideal if most jobs could well be executed concurrently and there's no lack of computing resources on the build server.
There are currently 2 ways of doing this:
Use the Throttle Concurrent Builds plugin.
Set up those jobs to run on a slave having only 1 executor.
The Locks and Latches plugin here should help.
This question is probably a dupe of How do I ensure that only one of a certain category of job runs at once in Hudson?
That's an old question, but the topic can still be relevant, especially when running application tests on Jenkins.
The Lockable Resources Plugin allows you to define lockable resources that can be used by builds. If your build requires a resource, it takes the lock. If a second build requires the same resource (which is then already locked), it will be queued until the resource is free.
Although the docs use computers or printers as examples for lockable resources, the database example from above should work as well.
In contrast to the Locks and Latches plugin mentioned in the answers from 2012, this plugin seems to be actively maintained (as of ~2016).
Have a look at the External Resource Dispatcher Jenkins plugin, which was first published in November 2012. This (relatively) new plugin seems to exactly cover this use case.
N.B. you don't need physical or virtual hardware for a slave/node, you can set up "slaves" that run on the master server.
Manage Jenkins > Manage Nodes > New node
and make a few "dumb slaves", each with its own root directory.
Create a few slaves, start them when the server boots, and then you have essentially created pools of executors.
You might have, say...
db - only one executor in your case.
compile - limit according to hardware or # of CPUs.
scripts - have many executors for all those little jobs that Jenkins is good at doing.
Old question, and I can't be sure whether this will work for your application since you didn't mention its details. However, I wanted to add the way that I handled this in our Rails application test suite.
Our application's database configuration (database.yml) isn't in the source repository. Instead, it lives in /var/lib/configs/uniquing_database.yml on the VM which runs our Jenkins instance.
One of the steps of our build process involves copying this config file to the project workspace:
cp /var/lib/jenkins/configs/myapp_unique_database.yml config/database.yml
and that config takes workspace and build number information exposed to the environment by Jenkins into account in order to create a uniquely named database for that job and its specific execution:
test:
  adapter: postgresql
  encoding: unicode
  host: 127.0.0.1
  port: 5432
  database: myapp_test<%= ENV['JOB_NAME'].split('/').last %><%= ENV['BUILD_NUMBER'] %>
The rest of our build proceeds without any knowledge or care that it's running in a distinct database. Finally, at the end of our build, we make sure to drop that database so we don't have a bunch of test databases polluting the file system:
RAILS_ENV=test bundle exec rake db:drop