Steps of Job Execution with Yarn - hadoop2

I'm new to Hadoop.
What exactly are the MR ApplicationMaster and a container?
What are the steps of job execution with YARN?
Thanks for your help

The ApplicationMaster is an instance of a framework-specific library. It is responsible for negotiating resources from the ResourceManager and for working with the NodeManagers on the hosts to execute and monitor the containers and their resource consumption.
Container: application execution happens inside containers. A container is an allocation of resources (such as memory and CPU) on a single node, granted by the ResourceManager.
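Since the question also asks about the steps, here is a rough walk-through of a YARN job's lifecycle, written as a toy Python simulation. Nothing in it is a Hadoop API: the names `ResourceManager`, `Container` and `run_job` are hypothetical and only mirror the roles described above, and the round-robin node choice is a simplification of what the real scheduler does.

```python
# Toy model of the YARN job flow. These classes are NOT Hadoop APIs;
# they only mimic the roles of the real components.
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Container:
    container_id: int
    node: str
    purpose: str


@dataclass
class ResourceManager:
    nodes: list
    _ids: count = field(default_factory=lambda: count(1))
    _next_node: int = 0

    def allocate(self, purpose):
        # Round-robin over nodes; the real scheduler also weighs capacity,
        # queues and data locality.
        node = self.nodes[self._next_node % len(self.nodes)]
        self._next_node += 1
        return Container(next(self._ids), node, purpose)


def run_job(rm, num_tasks):
    """Return a log of the steps a job goes through, in order."""
    log = ["1. Client submits the application to the ResourceManager"]
    am = rm.allocate("ApplicationMaster")
    log.append(
        f"2. RM allocates container {am.container_id} on {am.node}; "
        "the NodeManager there starts the ApplicationMaster in it"
    )
    log.append("3. ApplicationMaster registers with the ResourceManager")
    tasks = [rm.allocate(f"task-{i}") for i in range(num_tasks)]
    log.append(f"4. ApplicationMaster requests and receives {len(tasks)} containers")
    for c in tasks:
        log.append(
            f"5. NodeManager on {c.node} launches {c.purpose} "
            f"in container {c.container_id}"
        )
    log.append("6. ApplicationMaster monitors the tasks until they finish")
    log.append("7. ApplicationMaster unregisters from the ResourceManager")
    return log


if __name__ == "__main__":
    for line in run_job(ResourceManager(["node-a", "node-b"]), num_tasks=3):
        print(line)
```

For a MapReduce job, the "tasks" in step 5 would be the map and reduce tasks, and the ApplicationMaster would be the MR-specific one.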

Can't run docker containers for Jenkins and MySQL at the same time on EC2

I'm trying to set up an environment on AWS EC2
with two Docker containers, one for Jenkins and one for MySQL.
But when I try to run the MySQL container, the Jenkins container gets killed.
So I tried to run the Jenkins container again, but then the EC2 instance just stopped completely.
I guess this is because I'm using a free tier instance, but could anyone explain what's causing this issue?
I'd really appreciate it!
Can you share the commands or configuration files you're using to run these two containers? I suspect it was a coincidence that the Jenkins container failed and the EC2 instance stopped working at around the same time. If the Jenkins and MySQL containers are given the same container name, Docker will throw an error. Otherwise, Docker simply creates the new container, which runs independently of the other one.
When you say you're using the free tier, do you mean the AWS Free Tier? It is unlikely that using it had any impact on the software running on your instance.
If you can provide this additional information, I'd be more than happy to help you continue troubleshooting this issue.
EDIT: Removed the claim that the AWS Free Tier could not cause container interruptions. The Linux Out of Memory (OOM) Killer does, in fact, make this a possibility, as noted in the comments by @akazuko. Could you please also provide the output of journalctl -xeu docker in your response? It should indicate whether or not the OOM Killer is responsible. Be sure to trigger the error once or twice before running that command, so that the failure shows up in the logs.
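As a rough aid for reading those logs, the sketch below scans log text for the kernel's OOM-killer messages. It assumes you have saved kernel log output to a string (for example from `dmesg` or `journalctl -k`, which is an assumption on my part since the answer only mentions the Docker unit log). The function name `find_oom_kills` is hypothetical, and the sample lines are made up for illustration rather than taken from a real instance.

```python
import re

# Matches both the older ("Kill process") and newer ("Killed process")
# kernel wording for OOM-killer messages.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return (pid, process_name) for every OOM-killer line in log_text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_PATTERN.finditer(log_text)]


if __name__ == "__main__":
    # Made-up sample lines, not output from a real instance.
    sample = (
        "kernel: Out of memory: Killed process 2301 (java) total-vm:2541236kB\n"
        "kernel: docker0: port 1(veth1a2b3c) entered forwarding state\n"
        "kernel: Out of memory: Kill process 2817 (mysqld) score 412 or sacrifice child\n"
    )
    for pid, name in find_oom_kills(sample):
        print(f"OOM killer terminated {name} (pid {pid})")
```

If Jenkins's `java` process or `mysqld` shows up in the result, the instance is most likely running out of memory when both containers are up.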

Spring batch deployed on openshift using several pods

I deploy an application on Openshift and I use at least 2 pods.
My war contains a Spring Batch application, scheduled by a Spring cron.
Of course, each pod starts the same batch at the same time, and that is my problem.
Is there a way to avoid this behaviour? I would like to start only one batch instance (or is there a way to configure Spring Batch to check whether a batch is already running?).
Thanks in advance.
Assuming you use Deployment, it's not trivial, but here are some ideas that can help you.
Use ScheduledJobs/CronJobs from Kubernetes. This means you would stop launching the batch from your app entirely and instead have a dedicated pod launched to perform the batch job and then exit.
Use a master elector sidecar to establish the right to execute the batch (https://github.com/kubernetes/contrib/tree/master/election).
Implement a locking mechanism of your own.
Use a StatefulSet and bind the batch to run only on a particular hostname (e.g. via a config variable passed to the Pods, such as BATCH_HOSTNAME). StatefulSets have deterministic pod names, so you could say that the batch should run only on my-pods-0.
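To illustrate the "locking mechanism of your own" idea, here is a minimal, language-neutral sketch in Python (in the Spring application the same idea would be expressed over JDBC). It assumes all pods can reach one shared database; the table name `batch_lock` and the functions `try_acquire` and `release` are hypothetical. SQLite in memory merely stands in for that shared database so the example runs on its own.

```python
import sqlite3


def init_lock_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batch_lock ("
        "job_name TEXT PRIMARY KEY, owner TEXT NOT NULL)"
    )
    conn.commit()


def try_acquire(conn, job_name, owner):
    """Return True if this owner got the lock, False if someone else holds it."""
    try:
        # The PRIMARY KEY allows only one row per job, so only the first
        # insert succeeds; later ones raise IntegrityError.
        conn.execute(
            "INSERT INTO batch_lock (job_name, owner) VALUES (?, ?)",
            (job_name, owner),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


def release(conn, job_name, owner):
    """Remove the lock row, but only if this owner holds it."""
    conn.execute(
        "DELETE FROM batch_lock WHERE job_name = ? AND owner = ?",
        (job_name, owner),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a database shared by all pods
    init_lock_table(conn)
    for pod in ("my-pods-0", "my-pods-1"):
        if try_acquire(conn, "nightly-batch", pod):
            print(f"{pod}: acquired lock, running batch")
        else:
            print(f"{pod}: lock held elsewhere, skipping")
```

A real implementation would also need an expiry or heartbeat on the lock row, so that a pod that crashes while holding the lock does not block every later run.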
It sounds like you need leader election in your situation. Spring Integration provides leader election functionality that you can use to determine which instance is the master. The master would be the one that actually launches the jobs; the others would just ignore the scheduled event. You can read more about Spring Integration's leader election in the documentation here: https://docs.spring.io/spring-integration/api/org/springframework/integration/support/leader/LockRegistryLeaderInitiator.html

how to run mysql container using Apache Mesos/Marathon

I'm trying to use Apache Marathon to run my container based application.
For this I've installed Mesos, ZooKeeper, Marathon and Docker. Is there anything else I need to install?
I'm trying the "Simple Docker-based application" example from this page:
https://mesosphere.github.io/marathon/docs/application-basics.html
I am not able to run it; the UI only shows "Deploying", and
Marathon logs: INFO delaying /basic-3 due to backoff.
Is the procedure I followed correct? Any help is much appreciated. I've installed my master and slave on the same machine.
thanks
Could you first check whether your cluster is set up correctly?
Check in the Mesos UI (hostname:5050 by default) whether the slaves are registered.
Can you run a simple Marathon job such as 'sleep 30' to check the Marathon configuration?
Joerg
P.S. You could also check whether Mesos is currently pulling the Docker (?) image, which might take a while. For that you might want to look into the Mesos log...
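For the 'sleep 30' check suggested above, a minimal Marathon app definition could look like the output of the sketch below. The app id `/sleep-test` and the `cpus` and `mem` values are assumptions picked to keep the job tiny, and the helper name `sleep_app` is hypothetical. The printed JSON can be pasted into the Marathon UI's JSON mode or sent as a POST request to Marathon's `/v2/apps` endpoint.

```python
import json


def sleep_app(app_id="/sleep-test", seconds=30):
    """Build a minimal Marathon app definition that only runs `sleep`."""
    return {
        "id": app_id,
        "cmd": f"sleep {seconds}",
        "cpus": 0.1,     # assumed small value, adjust to your cluster
        "mem": 16,       # in MB, assumed small value
        "instances": 1,
    }


if __name__ == "__main__":
    print(json.dumps(sleep_app(), indent=2))
```

If even this job stays in "Deploying", the problem is more likely in the Mesos/Marathon setup (for example, no registered slave offering resources) than in the Docker image.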

Deploying one instance of Docker container on Apache Mesos/Marathon

I tried using the Marathon framework to deploy a single instance of a MySQL container through the web UI, to test the functions of Apache Mesos. The problem is that it ran and deployed many containers at a time, even though I specified only one instance. After making the process "sleep for 10s" to investigate, I found that it actually ran 4 containers at a time. Any help?

Using JUnit, Maven and Hudson/Jenkins for integration tests

We are going to use a Hudson/Jenkins build server both to build our server applications (just calling Maven) and to run integration tests against them. We are going to prepare 3 Hudson/Jenkins jobs: build, deploy and run integration tests, which call each other in this order. All these jobs (build, deploy, integration tests) will run nightly.
The integration tests are written with JUnit and are invoked by mvn test (which will in turn be invoked by the "test" Hudson/Jenkins job). Since they require the server to be up and running, we have to run the "deploy" job first.
Does it make sense? Is there any special server to deploy the application and run tests, or is Hudson/Jenkins OK for that?
It definitely makes sense; basically you are describing a build pipeline. There is a Jenkins plugin that helps visualize the upstream/downstream projects (you create a new pipeline view in Jenkins).
As for the deployment of the server component, this depends on what technology/stack you are running on. For instance, you could write a script that deploys the application to a test environment and run it as a post-build step in Jenkins.
Another option is to use a Maven plugin to deploy the application. You can separate the deployment step into a profile, and run only the deploy goal in the deploy job, etc.
Basically there are a lot of options, but the idea of a build pipeline makes a lot of sense. To read up on build pipelines and related topics I would suggest taking a look at Continuous Deployment.
For more information related to Jenkins, have a look at this video.
Does it make sense? Is there any special server to deploy the
application and run tests, or is Hudson/Jenkins OK for that?
You can run the application on the same server as Jenkins, but whether that makes sense depends on the application. If it depends heavily on a specific server setup, a better choice may be to run the server in a VM and put the configuration in source control. There are plenty of tools to help automate this; off the top of my head, you have Puppet, Chef and Vagrant.
Depending on the technology of your server, you could do all of this in a single Hudson project, executing your integration tests using Maven's Failsafe plugin instead of Surefire.
This allows you to start the server and deploy the application before executing the integration tests, and to shut the server down after they have completed. It also allows you to separate your integration tests from your unit tests.
For Java EE applications, you can perform the start/deploy/stop steps using Cargo, or use an embedded Jetty container and the Jetty Maven plugin.