Spreading a single build over multiple slaves - Hudson

I want to write an ant script that is distributed over multiple slaves. I don't understand exactly how the Hudson system works, but it seems to simply run the whole of one type of build on a single slave. I would like multiple slaves to run in parallel to do my testing. How can I accomplish this?

Split your testing job into several jobs, one job per slave. Your build will then trigger all of the testing jobs at the same time. If you need to run an additional job after they finish, you can use the Join plugin.
The release notes for Hudson 1.377 list a new feature:
"Queue/execution model is extended to allow jobs that consume multiple executors on different nodes."
I don't know exactly what that means, but I will definitely have a look.

What exactly triggers a pipeline job to run in Code Repositories?

I want to understand when the pipeline job runs so I can more effectively understand the pipeline build process. Does it check for code changes on the master branch of the Code Repository?
Building a job on the pipeline builds the artifact that was delivered on the instance, not what has been merged onto master.
It should be the same, but there is a checking process after the merge onto master and before the delivery of the artifact, like you would have on a regular Git/Jenkins/Artifactory setup.
So there is a delay.
Moreover, if these checks don't pass, your change, even though merged onto master, will never appear on the pipeline.
To add a bit more precision to what @Kevin Zhang wrote:
There's also the possibility to trigger a job using an API call, even though it's not the most common approach.
You can also combine the different events to say things like the following (a rough code restatement follows the list):
- Before work hours: build only if the schedule of the morning update has succeeded.
- During work hours: build every hour, if an input has new data and either a schedule has run successfully or another dataset has been updated.
- After hours: build whenever an input has new data.
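To make the and/or combination explicit, here is that logic restated as plain Python. The hour boundaries, flag names, and function are purely illustrative; real schedules are configured in the platform, not written in code like this.
```python
from datetime import datetime

WORK_START, WORK_END = 8, 18  # assumed working hours, purely for illustration

def should_build(now: datetime,
                 morning_update_succeeded: bool,
                 input_has_new_data: bool,
                 upstream_schedule_succeeded: bool,
                 other_dataset_updated: bool) -> bool:
    if now.hour < WORK_START:
        # Before work hours: only if the morning update schedule succeeded.
        return morning_update_succeeded
    if now.hour < WORK_END:
        # During work hours: hourly, when an input has new data AND either
        # a schedule has run successfully OR another dataset has been updated.
        return (now.minute == 0
                and input_has_new_data
                and (upstream_schedule_succeeded or other_dataset_updated))
    # After hours: whenever an input has new data.
    return input_has_new_data
```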
It can also help you create loops. For example, if a huge amount of data arrives in input B and it impacts your sync toward the ontology, a time series, etc., you could create a job that takes a limited number of rows from input B and logs their IDs in a table so they are not taken again. You process those rows, the job reruns whenever output C is updated, and when there are no more rows left it updates output D.
You can also add a schedule on the job that produces input B from input A, stating to rerun it only when output D is updated.
This would enable you to take a number of files from a source, process the data from those files chunk by chunk, then take another batch of files and iterate.
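A schematic sketch of that loop, with everything made up for illustration (the names, the chunk size, the doubling that stands in for the real processing); in practice the processed-ID log would be a table or dataset, not an in-memory set.
```python
BATCH_SIZE = 3  # in reality this would be a much larger chunk size

def process_batch(input_b, processed_ids):
    """Take one chunk of input B, log the taken IDs, and report what happens next."""
    todo = [row for row in input_b if row["id"] not in processed_ids][:BATCH_SIZE]
    if not todo:
        return "update output D"  # no rows left: the loop terminates
    output_c = [row["value"] * 2 for row in todo]    # stand-in for the real processing
    processed_ids.update(row["id"] for row in todo)  # the log table of taken IDs
    return "rerun"  # output C was updated, so the schedule fires the job again

rows = [{"id": i, "value": i} for i in range(7)]
taken = set()
while process_batch(rows, taken) == "rerun":
    pass  # each pass handles one chunk, exactly like the rerun-on-update schedule
```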
By naming your schedules functionally you can have a more controlled build of your pipeline and a finer grain of data governance, and you can also add audit or log tables based on these schedules, which will make debugging and auditing much easier.
You would have a trace of when and where a specific source update has reached.
Of course, you need such precision only if your pipeline is complex: many different sources, updated at different times and updating multiple parts of your pipeline.
For instance, if you're unifying client data that was previously separated in many silos, or if it is a multinational group of many different local or global entities, like big car manufacturers.
It depends on what type of trigger you’ve set up.
If your schedule is a single cron schedule (i.e., by scheduled time), the build will not look at the master branch repo; it will just build according to the cron schedule.
If your schedule contains an event trigger (e.g., one of the four event types: Job Spec Put, Transaction Committed, Job Succeeded, and Schedule Ran Successfully), then it triggers based on the event, and only the Job Spec Put event type triggers on a master branch code change.

SQL Server continuous job

I have a requirement to run a job continuously which includes a stored procedure. This stored procedure does a critical task where it processes a huge load of data as it comes in. As far as I know, SQL Server itself does not allow two or more instances of a job to run at the same time. So, my questions are:
Is there a way to run a SQL Server job continuously?
Do continuously running jobs hurt the server's performance?
There are continuous replication jobs; however, those are continuous because of an inline switch used in the command line and not due to the job being scheduled as continuous.
The only way to emulate a continuous job is to simply have it run often. There is an option under scheduling to run the job as often as every second, 24/7/365. With that said, you will need to be careful that the job isn't overrunning itself and that it is efficient enough not to cause issues with your server.
Whether it will affect performance depends on what it does. If the job only selects the current date/time (not a very useful thing to do, but an example), I would not expect an issue; however, if it runs complicated algorithms then it is almost certainly going to cause issues.
I would recommend running this on a test server before putting it into production.
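To illustrate the overrun point, here is a minimal external watchdog sketch in Python rather than an Agent schedule. It assumes pyodbc is installed and a hypothetical procedure named dbo.ProcessIncomingData; it is not a recommendation to abandon SQL Agent, just a picture of "each pass finishes before the next begins".
```python
import time
import pyodbc  # assumes the pyodbc package and an ODBC driver are installed

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")

while True:
    started = time.monotonic()
    try:
        with pyodbc.connect(CONN_STR, autocommit=True) as conn:
            conn.execute("EXEC dbo.ProcessIncomingData")  # hypothetical procedure
    except pyodbc.Error as exc:
        print(f"run failed: {exc}")  # log and keep looping rather than crash
    # Each pass completes before the next one starts, so unlike a fixed
    # every-second schedule this loop can never overrun itself.
    time.sleep(max(0.0, 1.0 - (time.monotonic() - started)))
```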

How can I integrate data on a regular basis between 2 different MySQL Servers?

I currently have two MySQL Servers running on different machines. One of them is a staging environment (A) and the other is a production environment (B). What I need to do is take data from (A) and update/insert into (B) based on conditions. If MySQL had a linked-server option then I could simply create a stored procedure that does the work for me, and that would solve my whole problem. Unfortunately, a great product like MySQL does not have this necessary feature.
Since I can't write a procedure to do that, what application can I use that will do the integration for me? Note that this integration needs to be automatic, so it can run daily and sometimes hourly.
My question is: is there an integration application out there that will move data from one MySQL Server to another automatically?
Thanks
I'm not a MySQL guy, and I don't know if it has a Linked/Federated option or not, but I understand it has very good replication. You could replicate tables from A into copy tables in B, then put your SP on B and run it whenever you want.
You could also write an application/batch script/ETL job that transfers the data and applies the conditions and just run it from a scheduler. If you do this frequently (meaning you do many other processes like this), I'd lean toward an ETL tool. I use Pentaho Data Integration. There are several others.
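As a sketch of the script/ETL route, assuming PyMySQL and entirely made-up hosts, credentials, and an orders table with a unique id; you would run it from cron or Task Scheduler at whatever frequency you need.
```python
import pymysql  # assumes the PyMySQL package is installed

# Hosts, credentials, and schema names below are placeholders.
src = pymysql.connect(host="staging-host", user="etl", password="secret", database="staging")
dst = pymysql.connect(host="prod-host", user="etl", password="secret", database="prod")

with src.cursor() as read, dst.cursor() as write:
    # Apply your conditions on the staging side so only the wanted rows move.
    read.execute("SELECT id, customer, total FROM orders WHERE status = 'approved'")
    for row in read.fetchall():
        # Upsert into production; requires a PRIMARY or UNIQUE key on id.
        write.execute(
            "INSERT INTO orders (id, customer, total) VALUES (%s, %s, %s) "
            "ON DUPLICATE KEY UPDATE customer = VALUES(customer), total = VALUES(total)",
            row,
        )
dst.commit()
src.close()
dst.close()
```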

Do not run one job while another job is running

I have two different jobs.
These jobs can be started independently of each other.
But they work with the same object.
And I would like to have only one of these jobs running at any given time.
Is it possible to configure a job in Hudson/Jenkins so that it does not run while some other job is running?
Have a look at the locks and latches plugin. If I understood your question correctly, it does exactly what you want.
Use Build triggers - Build after other projects are built.
Use this trigger option to achieve your task.
Point the jobs to run on a specific environment that limits how many jobs can run simultaneously. With this, the second job will wait in the queue rather than running at the same time.

How do I ensure that only one of a certain category of job runs at once in Hudson?

I use Hudson to automate the testing of a very large, important product. I want my testing hosts to run as many concurrent builds as they will theoretically support, with the exception of Excel tests, which must only run one per machine at any time. Any number of non-Excel tests can run concurrently; however, at most one Excel test at a time may run per machine.
Background:
Most of my tests are normal unit tests - the sort of thing that I can easily run in parallel. Unfortunately, a substantial and time-consuming part of my unit-testing plan consists of tests which have been implemented in Excel.
You might think it crazy to implement a test in Excel - actually there's an important reason: most of our users access our system via Excel. Excel has its own quirky ways of handling data, so the only way to guarantee that our stuff works for Excel users is to literally implement our regression tests of the application in Excel.
I've written a test-runner tool which allows me to easily fire off a group of Excel tests: each test is a single .xls file, and each group is a folder full of Excel files. I've got about 30 groups which need to be run for an end-to-end test. My tool converts the result of each of the tests into JUnit-style XML which Hudson is able to understand. The tests use the win32com library (pywin32) to automate Excel. When run on their own they are reliable.
I've got a group of computers which are dedicated to running tests. Each machine is quad-core and can theoretically run quite a lot of stuff at once. Unfortunately, I've found that COM cannot be used to safely control more than one Excel instance per machine at a time.
That is to say, if a second build starts and tries to talk to Excel via COM, it might interfere with the one that is already running and cause both tests to fail.
I can run as many other non-Excel processes as the machine will allow, but I need to find a way so that Hudson does not attempt to launch more than one process that requires Excel on any one machine concurrently.
Sounds like the Locks and Latches plugin might help you.
http://hudson.gotdns.com/wiki/display/HUDSON/Locks+and+Latches+plugin
Isn't Hudson Java?
Since you've tagged this post python, I'll point out that buildbot has slave locks to limit individual steps on individual slaves (or you can use them as more coarse locks if you'd like).
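Whatever you choose on the Hudson side, you can also make the runner itself refuse to overlap, as a belt-and-braces measure. A sketch using a machine-wide named mutex via pywin32 (which the test runner already depends on); the mutex name and run_excel_tests are made up for illustration.
```python
import win32event  # part of pywin32

def run_excel_tests():
    """Placeholder for the real .xls test-runner entry point."""
    print("running one Excel test group")

# Every test process opens the same named mutex, so at most one of them
# talks to Excel over COM at any moment on this machine.
mutex = win32event.CreateMutex(None, False, "Global\\excel-test-runner")
win32event.WaitForSingleObject(mutex, win32event.INFINITE)
try:
    run_excel_tests()
finally:
    win32event.ReleaseMutex(mutex)
```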