Configuring Hadoop manually on EC2

Can someone please point me in the direction of any resources that will help me manually set up/configure Hadoop (1.0.4) on EC2? I know there are lots of resources for accomplishing this with tools, services, etc., but what I'm looking for is help figuring out which modifications to make by hand to the conf/*.xml files, on both the slaves and the master, in order to get Hadoop working.
Right now, I have 5 EC2 instances running, and all of them are capable of running Hadoop jobs individually in pseudo-distributed mode. So I need to turn one into the master and the rest into slaves by configuring the conf files, such that the slaves know where the namenode and jobtracker are and the master knows about all the slaves.
My understanding is that I will also have to configure the EC2 security group of the instances so that they can all talk to one another on the right ports. I think I'm OK with this.
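For the record, these are the ports I believe need to be open between the instances, based on the Hadoop 1.x defaults plus whatever values end up in the conf files (corrections welcome):

    9000  - namenode RPC (whatever fs.default.name specifies)
    9001  - jobtracker RPC (whatever mapred.job.tracker specifies)
    50010, 50020, 50075 - datanode data transfer, IPC and web UI
    50030 - jobtracker web UI
    50060 - tasktracker web UI
    50070 - namenode web UI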
Can anyone help me out with the configuration part, or point me towards something that might help?

I eventually got up and running with this thorough tutorial: http://cloudblog.8kmiles.com/2011/12/05/hadoop-fully-distributed-setup/
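For anyone who finds this later, the essential changes boil down to pointing every node at the master and listing the slave hostnames on the master. A minimal sketch for Hadoop 1.x, where master-host is a placeholder for the master's private DNS name and the port numbers are just the common choices:

    <!-- conf/core-site.xml, on every node -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master-host:9000</value>
      </property>
    </configuration>

    <!-- conf/mapred-site.xml, on every node -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>master-host:9001</value>
      </property>
    </configuration>

On the master, conf/slaves lists one slave hostname per line and conf/masters names the host that runs the secondary namenode. After syncing conf/ to every node, format HDFS on the master with bin/hadoop namenode -format and bring the cluster up with bin/start-all.sh.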

Related

Can't run docker containers for Jenkins and MySQL at the same time on EC2

I'm trying to set up an environment on AWS EC2
with two Docker containers, one for Jenkins and one for MySQL.
But when I try to run the MySQL container, the Jenkins container gets killed.
So I tried to run the Jenkins container again, but then the EC2 instance stopped completely.
I guess this is because I'm using a free tier instance, but could anyone explain what's causing this issue?
I'd really appreciate it!
Can you share the commands or configuration files you're using to run these two containers? I suspect it was a coincidence that the Jenkins container failed and the EC2 instance stopped working at the same time. If both containers were given the same name, Docker will throw an error; in any other case, Docker simply creates the new container, which is entirely independent of the other one.
When you say you're using the free tier, what do you mean: the AWS Free Tier? It is unlikely that this by itself had any impact on the software running on your instance.
If you can provide this additional information, I'd be more than happy to help you continue troubleshooting this issue.
EDIT: I've walked back the claim that the free tier had nothing to do with this. The Linux Out of Memory Killer does, in fact, make this a possibility, as noted in the comments by @akazuko. Could you please also provide the output of journalctl -xeu docker in your response? That will indicate whether or not the OOM Killer is responsible. Be sure to trigger the error once or twice before running the command so the relevant log entries are present.
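One more thing worth trying while you gather those logs: if the OOM Killer does turn out to be the culprit, giving each container an explicit memory ceiling can stop the pair from starving the instance. A rough sketch, assuming a 1 GiB free tier instance (the image tags and limits here are just examples, not taken from your setup):

    # cap Jenkins at 512 MiB
    docker run -d --name jenkins -m 512m -p 8080:8080 jenkins/jenkins:lts

    # cap MySQL at 384 MiB, leaving headroom for the host
    docker run -d --name mysql -m 384m -e MYSQL_ROOT_PASSWORD=secret mysql:5.7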

Go+MySQL: how easy is it to migrate to GKE (Google Container Engine)?

My project is currently hosted by an independent cloud provider.
I am using 2 Virtual Machines, with Linux:
one hosts a Go application
one hosts a MySql database
I would now like to move to the Google Cloud Platform.
Do you think it makes sense to move to Google Container Engine (GKE), rather than to Google Compute Engine (which would give me the same virtual machine model (IaaS) I am using with the current provider)?
I have never used Kubernetes and Docker. How easy would it be to make the migration? Am I going to complicate my life uselessly?
How difficult is the configuration for my simple model?
"I have never used Kubernetes and Docker."
Moving to a platform that you have no experience with doesn't sound like a great idea. Instead, why not start by doing some tutorials about Docker and then Kubernetes?
After that, you might try Minikube (https://kubernetes.io/docs/getting-started-guides/minikube/) locally to start writing some manifests for the components (which sound like maybe a DaemonSet or single Pod with PersistentVolume for MySQL and a Deployment for the Go application).
Once you have the pieces working locally, then it would probably make more sense to think about migrating. You would have a much better understanding of what you are getting into and if it is something you would want to undertake.
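To make the manifest side a little less abstract, here is roughly what the Go application's Deployment could look like once it is containerized. This is only a sketch; the image name and port are placeholders, and MySQL would additionally need a PersistentVolumeClaim:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: go-app
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: go-app
      template:
        metadata:
          labels:
            app: go-app
        spec:
          containers:
          - name: go-app
            image: gcr.io/your-project/go-app:latest   # placeholder image
            ports:
            - containerPort: 8080                      # placeholder port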

How to run a MySQL container using Apache Mesos/Marathon

I'm trying to use Apache Marathon to run my container-based application.
For this I've installed Mesos, Zookeeper, Marathon, and Docker. Is there anything else I need to install?
I'm following the simple Docker-based application example here:
https://mesosphere.github.io/marathon/docs/application-basics.html
I'm not able to get it running; the app just stays in "deploying", and
Marathon logs INFO delaying /basic-3 due to backoff.
Is the procedure I followed correct? I've installed my master and slave on the same machine. Any help is much appreciated.
thanks
Could you first check whether your cluster is set up correctly?
Check in the Mesos UI (hostname:5050 by default) whether the slaves are registered.
Can you run a simple Marathon job such as 'sleep 30' to verify the Marathon configuration?
Joerg
P.S. You could also check whether Mesos is currently pulling the Docker image, which might take a while. For that you might want to look into the Mesos log...
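For reference, the 'sleep 30' test can be posted straight to the Marathon REST API. Something like this, assuming Marathon is listening on localhost:8080:

    # marathon-test.json
    {
      "id": "sleep-test",
      "cmd": "sleep 30",
      "cpus": 0.1,
      "mem": 16,
      "instances": 1
    }

    # submit it
    curl -X POST http://localhost:8080/v2/apps \
      -H "Content-Type: application/json" \
      -d @marathon-test.json

If that deploys and runs, the Marathon side is fine and the problem is more likely in the Docker setup on the slave.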

MySQL: How to configure mysql-proxy for an existing master-slave setup

I want to configure mysql-proxy in my test environment to observe the following:
1. The behavior of the proxy
2. How load and CPU usage on my test server vary with read/write distribution
I googled around and was able to install the proxy on my Ubuntu Linux box.
But I didn't find anything that covers configuring it step by step, or how to start and stop it.
If someone could expand on this, it would be a great help.
Thanks in advance
Regards,
UDAY
By default, if you run the proxy on the same machine as the server, it will listen on port 4040 and query a backend server on the MySQL default port of 3306. Other port numbers and server locations can be configured from the command line or with a configuration file.
To distribute queries across servers, add monitoring, profiling etc. you need to provide a Lua script to mysql-proxy. See the example / tutorial scripts in /usr/local/share/docs that came with the installation download. There is work to do for a production implementation.
The basics of how the scripting works are covered in the MySQL Proxy manual under MySQL Proxy Scripting.
Don't be worried about Lua. The syntax is quite readable given the tutorial examples to work from. As and when you need more, lua.org has the full details of Lua.
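To give you a concrete starting point: a read/write-splitting proxy is typically launched along these lines, where the hostnames are placeholders for your existing master and slave and rw-splitting.lua is one of the example scripts from the docs directory mentioned above:

    mysql-proxy \
      --proxy-address=:4040 \
      --proxy-backend-addresses=master-host:3306 \
      --proxy-read-only-backend-addresses=slave-host:3306 \
      --proxy-lua-script=/usr/local/share/docs/rw-splitting.lua

Point your test clients at port 4040 and watch how reads get routed to the slave. To run it as a service, add --daemon and --pid-file, and stop it by killing the recorded PID.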

Port LAMP application to EC2

Any good resource on how to port a LAMP stack to Ec2?
Mainly I'm concerned about storage, the MySQL part. The existing app works against a single store. Do I need to move all my storage to S3? Will the EC2 instances be able to share a single MySQL database? Alternatively, I could partition my data and have a separate database for each EC2 image, but I would still need a global user account database for authentication, and if the data is partitioned, requests have to be routed to the proper image. I'm not sure how this is achieved in EC2.
To wrap up: where should I start?
These Tips for deploying a LAMP stack on Amazon EC2 are IMO a really good starting point. I'd suggest reading them first (I'm not sure I understand your concerns about the storage part); maybe things will be clearer afterwards.
I know this is old, but for anyone who's in this situation check out: http://www.robotmedia.net/2011/04/how-to-create-an-amazon-ec2-instance-with-apache-php-and-mysql-lamp/
That is the most straightforward tutorial I've found for implementing a LAMP stack on Amazon EC2.
Using S3 isn't required, although it is an affordable way to host files. Yes, multiple instances can share a single database and you can use database replication for additional availability. Here's a great tutorial for that: http://aciddrop.com/2008/01/10/step-by-step-how-to-setup-mysql-database-replication/
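If you go the replication route, the heart of it is a couple of my.cnf settings plus a CHANGE MASTER TO statement on the slave. A bare-bones sketch (server IDs, hostnames and credentials are placeholders; take the log file and position from SHOW MASTER STATUS on the master):

    # master my.cnf
    [mysqld]
    server-id = 1
    log-bin   = mysql-bin

    # slave my.cnf
    [mysqld]
    server-id = 2

    -- on the slave, after importing a snapshot of the master:
    CHANGE MASTER TO
      MASTER_HOST='master-host',
      MASTER_USER='repl',
      MASTER_PASSWORD='secret',
      MASTER_LOG_FILE='mysql-bin.000001',
      MASTER_LOG_POS=4;
    START SLAVE;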