I have a couple of jobs that use a shared resource (database), which sometimes can cause builds to fail in the (rare) event that the jobs happen to get triggered simultaneously.
Given jobs A through E, for example, is there any way to specify that A and C should never be run concurrently?
Other than the aforementioned resource, the builds are independent of each other (not, e.g., in an upstream/downstream relationship).
A "brute-force" way would be limiting number of executors to one, but that obviously is less than ideal if most jobs could well be executed concurrently and there's no lack of computing resources on the build server.
There are currently 2 ways of doing this:
Use the Throttle Concurrent Builds plugin.
Set up those jobs to run on a slave having only 1 executor.
The Locks and Latches plugin should help here.
This question is probably a duplicate of "How do I ensure that only one of a certain category of job runs at once in Hudson?"
That's an old question, but the topic can still be relevant, especially when running application tests on Jenkins.
The Lockable Resources Plugin allows you to define lockable resources that builds can use. If your build requires a resource, it takes the lock; if a second build requires the same resource while it is already locked, that build is queued until the resource is free.
Although the docs use computers or printers as examples for lockable resources, the database example from above should work as well.
Unlike the Locks and Latches plugin mentioned in the answers from 2012, this plugin appears to be actively maintained (as of ~2016).
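If adding a plugin is not an option, a rough equivalent can be achieved with a plain shell build step that serializes access to the shared database via a file lock. This is only a sketch: it assumes all affected jobs run on the same node, and the lock file path and script name are placeholders, not from the question.
# Wrap the database-dependent part of the build in an exclusive lock;
# a second job hitting the same lock file waits here until it is free.
flock /var/lock/shared-db.lock -c "./run-db-tests.sh"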
Have a look at the External Resource Dispatcher Jenkins plugin, which was first published in November 2012. This (relatively) new plugin seems to cover exactly this use case.
N.B. You don't need separate physical or virtual hardware for a slave/node; you can set up "slaves" that run on the master server.
Manage Jenkins > Manage Nodes > New node
and create a few "dumb slaves", each with its own root directory.
Create a few such slaves, start them when the server boots, and you have essentially created pools of executors.
You might have, say...
db - only one executor in your case.
compile - limit according to hardware or # of CPUs.
scripts - have many executors for all those little jobs that Jenkins is good at doing.
Old question, and I can't be sure whether this will work for your application since you didn't mention its details. However, I wanted to add the way I handled this in our Rails application's test suite.
Our application's database configuration (database.yml) isn't in the source repository. Instead, it lives in /var/lib/configs/uniquing_database.yml on the VM which runs our Jenkins instance.
One of the steps of our build process involves copying this config file to the project workspace:
cp /var/lib/jenkins/configs/myapp_unique_database.yml config/database.yml
and that config uses workspace and build number information exposed to the environment by Jenkins in order to create a uniquely named database for that job and its specific execution:
test:
  adapter: postgresql
  encoding: unicode
  host: 127.0.0.1
  port: 5432
  database: myapp_test<%= ENV['JOB_NAME'].split('/').last %><%= ENV['BUILD_NUMBER'] %>
The rest of our build proceeds without any knowledge or care that it's running in a distinct database. Finally, at the end of our build, we make sure to drop that database so we don't have a bunch of test databases polluting the file system:
RAILS_ENV=test bundle exec rake db:drop
Related
I am containerizing a web application that has container dependencies.
The containers are listed here in dependency order, each later one depending on the one before it.
There is a mysql container which compiles mysql, installs it and configures it.
There is a learnintouch container which installs files and seeds custom product data into the mysql container.
There is a learnintouch.com container which installs files and seeds custom website data into the mysql container.
The data seeding is part of the application installation and needs to be done only once in the application lifetime.
The data seeding is quite long, very long in fact.
It would be nice to have the application created AND started by a docker-compose.yml file sitting in the learnintouch.com directory.
At first, I was hoping to have only three containers, all dependents, with each dependent container waiting for its dependency to complete its data seeding, before running itself, and finally starting the application.
I now see this will be difficult to achieve, if not impossible. It is already tricky to have docker-compose wait for a service to start up, and it's even harder to check that data seeding has completed.
I reckon one way is to have two branches in the container dependency tree, one for the application installation doing the data seeding and another one that only starts the application.
Is that a common practice?
The best practice is to remove the dependency so containers can start in any order. If one container starts before another that it uses, it should gracefully return an error if you attempt to use it, but begin to work as soon as the dependencies come online. This allows various microservices to be independently upgraded, replaced, or migrated without needing to restart your entire infrastructure.
When that's not possible, realize that docker-compose is great at what it does, but what it does is limited in scope, so you'll need to extend it with your own scripting. You may have several compose files: the first one is run to seed the data until it completes, and when it returns, your script continues on to launch the other containers.
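As a rough sketch of what that wrapper script could look like (the compose file names and the "seed" service name are hypothetical, not taken from the question):
#!/bin/sh
# Run the seeding service and wait for it to finish; abort if it fails.
docker-compose -f docker-compose.seed.yml run --rm seed || exit 1
# Seeding completed successfully, now start the application containers.
docker-compose -f docker-compose.yml up -d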
Lastly, for large data structures, that is best managed as an external volume, which you can create, update, and backup independently from your other containers.
I have a small app running on a production server. In the next update the db schema will change; this means the production database schema will need to change and there will need to be some data manipulation.
What's the best way to do this? I.e., run a one-off script to complete these tasks when I deploy to the production server?
Stack:
Nodejs
Expressjs
MySQL using node mysql
Codeship
Elasticbeanstalk
Thanks!
"The best way" depends on your circumstances. Is this a rather seldom occurrence, or is it likely to happen on a regular basis? How many production servers are there? Are there other environments, e.g. for integration tests, staging etc.? Do your developers have an own DB environment on their machines? Does your process involve continuous integration?
The more complex your landscape is, the better it is to use solutions like those Todd R suggested (Liquibase, Flyway).
If you just have one production server and it can be down for maintenance for a few hours, then it could be sufficient to:
Schedule a maintenance downtime with your stakeholders and users
Shutdown the server
Create a backup
Update the database structure and contents as necessary
Deploy software updates
Restart the server
Test the result (manually or automatically)
Inform your stakeholders and users
If anything goes wrong, rollback to a backed up version of the database and your software.
Having database update scripts is advisable. Having tested them at least once is even more advisable. Creating a backup in advance is essential.
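As a minimal sketch of the backup and update steps for a MySQL database (database name, user, and script path are placeholders):
# Create a backup of the production database before touching anything
mysqldump -u deploy -p myapp_production > backup_before_migration.sql
# Apply the schema changes and data manipulation from a tested script
mysql -u deploy -p myapp_production < migrations/update_schema.sql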
http://www.liquibase.org/ or http://flywaydb.org/ - pretty "heavy" for one-time use, but if you'll need to change the schema again in the future, it's probably worth investing the time to learn one of these.
I run a pretty customized cluster for processing large amounts of scientific data, based on a basic LAMP design. In general, I run a separate MySQL server with around 128GB of RAM and about 1TB of storage. Separately, I run a head node that serves as an NFS mount point for the data input of my process, and a webserver to display results. Finally, I typically have a few compute nodes that get their jobs from a MySQL table, get the data from NFS, do some heavy lifting, then put results into MySQL.
I have come across a dataset I would like to process which is pretty large (1TB of input data), and I don't really have the hardware on hand to handle it. As a result, I began investigating Google Compute Engine and the like, and the prospect of scaling instances to process these data rapidly, with the results stored in a MySQL instance. Upon completion, the MySQL tables could be dumped from the cloud and brought up locally for analysis. I would have no problem deploying a MySQL server, along with the rest of the LAMP pieces and the compute nodes, but I can't quite figure out how I would do this in the cloud.
A major sticking point seems to be the lack of read/write NFS, which would allow me to get the data onto several instances, crunch it, then push the results to MySQL. This is a necessary step for me, as I could queue hundreds of jobs from the webserver, then have the instances (as many as 50-100) pick the jobs up by connecting to a centralized MySQL instance to find out what each instance needs to do and where the data is, process the data (there is a file conversion that happens, which makes the write access necessary), crunch it, and load the results into MySQL. I hope I'm explaining my situation clearly. This seems like a great example of a CPU-intensive process that would scale nicely in the cloud; I just can't seem to put all the pieces together... Any input is appreciated!
It sounds quite possible; I've been doing similar things in GCE for a while now.
NFS mount - you just need to configure it as you would normally. Set up the NFS server on the head node, and then configure the clients on the slave nodes to mount it. Here and here are some basic configuration instructions for CentOS 6 that I used to get NFS up and running.
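In case it helps, here is roughly what that configuration boils down to (the export path and internal network range are examples, not taken from the original setup):
# On the head node: export the data directory to the internal network
echo "/data 10.240.0.0/16(rw,sync,no_root_squash)" >> /etc/exports
exportfs -ra
service nfs start
# On each compute node: mount the shared directory from the head node
mount -t nfs head-node:/data /data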
Setting up a LAMP stack is very straightforward. These machines run pretty much vanilla Linux distros, so you can just use yum or apt-get to install components.
For the cluster, you will probably end up having an image for the head node you use once, and then another image for the slave nodes that you replicate for each one.
For the scheduler, I've used Condor and SGE successfully, but I'm sure the others would work just as well.
Hope this helps.
I have a Jenkins (Hudson) server setup that runs tests on a variety of slave machines. What I want to do is reconfigure the slave (using remote APIs), reboot the slave so that the changes take effect, then continue with the rest of the test. There are two hurdles that I've encountered so far:
Once a Jenkins job begins to run on the slave, the slave cannot go down or break the network connection to the server; otherwise Jenkins immediately fails the test. Normally, I would say this is completely desirable behavior. But in this case, I would like Jenkins to accept the disruption until the slave comes back online and Jenkins can reconnect to it - or the slave reconnects to Jenkins.
In a job that has been attached to the slave, I need to run some build tasks on the Jenkins master - not on the slave.
Is this possible? So far, I haven't found a way to do this using Jenkins or any of its plugins.
EDIT - Further Explanation
I really, really like the Jenkins slave architecture. Combined with the plugins already available, it makes it very easy to get jobs out to a slave, run them, and pull the results back. And the ability to pick any matching slave allows for automatic job/test distribution.
In our situation, we use virtualized (VMware) slave machines. It was easy enough to write a script that would cause Jenkins to use VMware PowerCLI to start the VM up when it needed to run on a slave, then ship the job to it and pull the results back. All good.
EXCEPT that part of the setup of each test is to slightly reconfigure the virtual machine in some fashion. Disable UAC, log on as a different user, have a different driver installed, etc. - each of these changes requires that the test VM/slave be rebooted before the changes take effect. Although I can write slave on-demand scripts (Launch Method=Launch slave via execution of command on the master) that handle this reconfiguration and restart, it has to be done BEFORE the job is run. That's where the problem occurs - I cannot configure the slave that early, because the type of configuration change depends on the job being run, which is known only after the slave is started.
Possible Solutions
1) Use multiple slave instances on a single VM. This wouldn't work - several of the configurations are mutually exclusive, but Jenkins doesn't know that. So it would try to start one slave configuration for one job, another slave for a different job - and both slaves would be on the same VM. Locks on the jobs don't prevent this since slave starting isn't part of the job.
2) (Optimal) A build step that allows a job to know that its slave connection MIGHT be disrupted. The build step may have to include some options so that Jenkins knows how to reconnect the slave (will the slave reconnect automatically, will Jenkins have to run a script, will simple SSH suffice). The build step would handle the disconnect of the slave, ignore the disconnect that would normally fail the job, then perform the reconnect. Once the slave is back up and running, the next build step can occur. Perhaps with a timeout that fails the job if the slave cannot be reconnected within a certain amount of time.
Current Solution - less than optimal
Right now, I can't use the slave function of Jenkins. Instead, I use a series of build steps - run on the master - that use Windows and PowerShell scripts to power on the VM, make the configuration changes, and restart it. The VM has an SSH server running on it, which I use to upload test files to the test VM and execute them remotely, then download the results back to Jenkins for handling by the job. This solution is functional - but a lot more work than the typical Jenkins slave approach. Also, the scripts are targeted at a single VM; I can't easily use a pool of slaves.
Not sure if this will work for you, but you might try making the Jenkins agent node programmatically tell the master node that it's offline.
I had a situation where I needed to make a Jenkins job that performs these steps (all while running on the master node):
revert the Jenkins agent node VM to a powered-off snapshot
tell the master that the agent node is disconnected (since the master does not seem to automatically notice the agent is down, whenever I revert or hard power off my VMs)
power the agent node VM back on
as a "Post-build action", launch a separate job restricted to run on the agent node VM
I perform the agent disconnect step with a curl POST request, but there might be a cleaner way to do it:
curl -d "offlineMessage=&json=%7B%22offlineMessage%22%3A+%22%22%7D&Submit=Yes" http://JENKINS_HOST/computer/THE_NODE_TO_DISCONNECT/doDisconnect
Then when I boot the agent node, the agent launches and automatically connects, and the master notices the agent is back online (and will then send it jobs).
I was also able to toggle a node's availability on and off with this command (using 'toggleOffline' instead of 'doDisconnect'):
curl -d "offlineMessage=back_in_a_moment&json=%7B%22offlineMessage%22%3A+%22back_in_a_moment%22%7D&Submit=Mark+this+node+temporarily+offline" http://JENKINS_HOST/computer/NODE_TO_DISCONNECT/toggleOffline
(Running the same command again puts the node status back to normal.)
The above may not apply to you since it sounds like you want to do everything from one Jenkins job running on the agent node. And I'm not sure what happens if an agent node disconnects or marks itself offline in the middle of running a job. :)
Still, you might poke around in this Remote Access API doc a bit to see what else is possible with this kind of approach.
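For instance, the node's api/json endpoint reports whether it is offline, so a script could poll it before sending more work (JENKINS_HOST and the node name are placeholders, as above):
# Prints "offline":false once the node has reconnected to the master
curl -s http://JENKINS_HOST/computer/NODE_TO_DISCONNECT/api/json | grep -o '"offline":[a-z]*'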
Very easy. You create a master job that runs on the master; from the master job you call the client job as a build step (it's a new kind of build step, and I love it). You need to check that the master job waits for the client job to finish. Then you can run your script to reconfigure your client and run the second test on the client.
An even better strategy is to have two nodes running on your slave machines. You need to configure two nodes in Jenkins. I used that strategy successfully with a Unix slave. The reason was that I needed different environment variables to be set up, and I didn't want to push that into the jobs. I used SSH clients, so I don't know if it is possible with different client types. Then you might be able to run both tests at the same time, or you can chain the jobs or use the master strategy mentioned above.
ENVIRONMENT:
Ubuntu hardy LTS, Apache 2, Passenger
Virtual Server One: rails 2.3.8 application accessing MySQL common_database
Virtual Server Two: rails 3.0.4 application accessing the same MySQL common_database
The first application gets regular use from our clients. The second application is released but seeing light usage currently. The database seems to be working fine.
Someone advised me that this configuration could wind up corrupting the database. We anticipate that the second application will eventually get heavy usage. It would kill some momentum if the database became corrupted. A few more facts:
Neither application has the ability to change database structure.
Both apps will be accessing the same tables at the same time.
It is impossible for both apps to attempt to update the same record at the same time.
Has anyone experienced corruption of a MySQL database that is accessed in this kind of configuration? Were you able to overcome the problem? How?
What was the case made for risk of corruption?
As long as you aren't using rake db:migrate and both apps have the same models, you have roughly the same risks as if two or more web servers running the same application version had been spun up. If there is a divergence between the two apps in what they expect the database schema to look like, you may run into issues, especially if the divergence is related to foreign keys or to application logic based on magic values.
With any concurrent manipulation of a database you run into a category of issues around modifying the same records simultaneously (who wins when two clients try to modify the same piece of data; what kind of locking model do you use, etc).