Executing mrjob bootstrap commands on the head node only - configuration

I have an mrjob configuration that includes loading a large file from S3 into HDFS. I would like to include these commands in the configuration file, but it seems that all bootstrap commands execute on all of the nodes in the cluster. This is overkill and might also create synchronization problems.
Is there some way to include startup commands for the master node only in the mrjob configuration, or is the only solution to SSH into the head node after the cluster is up and perform these operations?
Yoav

Well, you could have your steps start with a mapper and set mapred.map.tasks=1 in your jobconf. I've never tried it, but it seems like it should work.
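If that works for you, a hypothetical mrjob.conf sketch (the runners/jobconf layout is mrjob's own; the single-task trick itself is untested, as noted above):

# mrjob.conf (sketch): pin the setup job's mapper to a single task
runners:
  emr:
    jobconf:
      mapred.map.tasks: 1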
Another suggestion:
Use a filesystem or ZooKeeper for coordination. In Python-style pseudocode:
if get_exclusive_lock_on_resource(filesystem_path_or_zookeeper_path):
    do_the_expensive_bit()
    release_lock(filesystem_path_or_zookeeper_path)
else:
    while expensive_bit_not_complete():
        sleep(10)


Is there a way in cloud foundry cf push command to have a manifest.yml with several "profiles"?

We have different setups (deployment parameters) for prod and for non-prod environments, with regard to memory, instances, etc.
We are deploying our applications with Jenkins pipeline, on Pivotal Cloud Foundry environments, which is eventually calling a script with a "CF push" command.
We are examining using two different manifest.yml files (but dislike the duplication of identical parameters).
We are also examining using --var-file with two different vars files. We have a concern with backward compatibility, and the effort (we have many microservices) of adding so many files.
We want a manifest.yml that will look like this:
applications:
- name: myAppName
  services:
  - discovery
  - config-server
  profile:
    dev:
      memory: 1024M
      instances: 1
    prod:
      memory: 4096M
      instances: 4
Having to pass a parameter like profile=dev to the cf push command would be fine.
In the DEV environment, 1 instance with 1024M of memory would be deployed; in PROD environments, 4 instances with 4096M of memory would be deployed.
I suggest that you reconsider using variables in your manifest. You can use --var-file, but if you want to avoid having those files present you can just pass in multiple --var=<name>=<val> arguments instead.
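For example, a minimal sketch with a single manifest (the variable names memory and instances are illustrative):

applications:
- name: myAppName
  memory: ((memory))
  instances: ((instances))
  services:
  - discovery
  - config-server

and then per environment:

cf push --var memory=1024M --var instances=1
cf push --var memory=4096M --var instances=4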
That or just have dev.yml and prod.yml files, you can then cf push -f dev.yml or cf push -f prod.yml and pick between the two. There's a little duplication, but the files are tiny so it shouldn't be a big deal.
Hope that helps!
I don't think trying to achieve everything with CF CLI commands is the right way to do it.
I would achieve this much more simply by writing a bash script that executes cf push sequentially, in whatever fashion I'd like - something like the sketch below.
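A minimal sketch (the manifest file names are hypothetical):

#!/bin/bash
# pick a manifest based on an environment argument, defaulting to dev
ENV=${1:-dev}
cf push -f "manifest-${ENV}.yml"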

Container reconfiguration in real time

I have faced the following case and haven't found a clear answer.
Preconditions:
I have a Kubernetes cluster
there are some options related to my application (for example debug_level=Error)
there are pods running, and each of them consumes configuration (env vars, a mount path, or CLI args)
later I need to change the value of some option (the same debug_level, Error -> Debug)
The Q is:
how should I notify my pods that the configuration has changed?
Earlier we could just send a HUP signal to the exact process directly or call systemctl reload app.service
What are the best practices for this use-case?
Thanks.
I think this is something you could achieve using sidecar containers. This sidecar container could monitor for changes in the configuration and send the signal to the appropriate process. More info here: http://blog.kubernetes.io/2015/06/the-distributed-system-toolkit-patterns.html
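A minimal sketch of that pattern, assuming your app reloads on SIGHUP, the config lives in a ConfigMap, and your cluster supports shareProcessNamespace (all names here are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-reloader
spec:
  shareProcessNamespace: true   # lets the sidecar signal the app's process
  containers:
  - name: app
    image: my-app:latest        # hypothetical image that reloads on SIGHUP
    volumeMounts:
    - name: config
      mountPath: /etc/app
  - name: config-reloader
    image: busybox
    command:
    - sh
    - -c
    - |
      # poll the mounted config and HUP the app process when it changes
      last=$(md5sum /etc/app/app.conf)
      while true; do
        cur=$(md5sum /etc/app/app.conf)
        if [ "$cur" != "$last" ]; then
          pkill -HUP my-app
          last=$cur
        fi
        sleep 10
      done
    volumeMounts:
    - name: config
      mountPath: /etc/app
  volumes:
  - name: config
    configMap:
      name: app-config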
Tools like kubediff or kube-applier can compare your Kubernetes YAML files to what's running on the cluster.
https://github.com/weaveworks/kubediff
https://github.com/box/kube-applier

What do the command line arguments for PM2 startup mean, precisely?

I am a little confused about startup scripts and the command line options. I am building a small Raspberry Pi based server for my node applications. In order to provide maximum protection against power failures and flash write corruption, the root file system is read only, and that embraces the home directory of my main user, where the production versions of my apps (two of them) are stored. Because the .pm2 directory there is no good for logs etc., I currently set the PM2_HOME environment variable to a place in /var (which has 512 KB of unused space around it to safeguard writes). The ecosystem.json file also reads this environment variable to determine where to place its logs.
In case I need it, I also have a secondary user with a read-write home directory in another partition (also protected by buffer space around it). This contains development versions of my application code, which I also want to monitor with PM2 because of the convenience of setting up environments etc. If I need to investigate a problem I can log in to that user and run and test the application there.
Since this is a headless box, with watchdog and kernel panic restarts built in, I want PM2 to start during boot and at minimum restart the two production apps. Ideally it would also start the two development versions of the app, but I can live without that if it's impossible.
I can switch the read only root partition to read/write - indeed it does so automatically when I ssh into my production user account. It switches back to read only automatically when I log out.
So I went to this account to try and create a startup script. It then said (unsurprisingly) that I had to run a sudo command like so:
sudo su -c "env PATH=$PATH:/usr/local/bin pm2 startup ubuntu -u pi --hp /home/pi"
The key issue for me here is the --hp switch. I went searching for some clue as to what it means. It's clearly a home directory, but it doesn't match PM2_HOME - which is set to /var/pas in my case to take it out of the read only area. I don't want to spray my home directory with files that shouldn't be there, so I am asking for some guidance here.
I found out by experiment what it does with an "ubuntu" startup script: it uses the --hp value to set PM2_HOME in the script, appending "/.pm2" to it.
However, there is nothing stopping you editing the script once it has been created and setting PM2_HOME to whatever you want.
So effectively it's a helper for the script, but only that and nothing more special.
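For illustration, a sketch of the relevant line the generator emits (exact contents vary by PM2 version and platform):

# inside the generated startup script, --hp /home/pi becomes:
export PM2_HOME="/home/pi/.pm2"
# which you can edit afterwards to point at a writable area, e.g.:
export PM2_HOME="/var/pas"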

MySQL SELinux conflict Fedora 19

I've successfully installed MySQL 5.6 on my F19. Although the installation was successful, I'm unable to start the mysql service.
When I ran
service mysql start
it returned the following error:
Starting MySQL..The server quit without updating PID file (/var/lib/mysql/sandboxlabs.pid).
I disabled SELinux (permissive mode), and the service started smoothly. But I did some research about disabling SELinux and found that it's a bad idea. So, is there any way to add a custom MySQL policy? Or should I leave SELinux in permissive mode?
The full answer depends on your server configuration and how you're using MySQL. However, it's completely feasible to modify your SELinux policy to allow MySQL to run. In most cases, this sort of operation can be performed with a small number of shell commands.
Start by looking at /var/log/audit/audit.log. You can use audit2allow to generate a permission-granting policy around the log messages themselves. On Fedora 19, this utility is in the policycoreutils yum package.
The command
# grep mysql /var/log/audit/audit.log | audit2allow
...will output the policy code that would need to be compiled in order to allow the mysql operations that were prevented and logged in audit.log. You can review this output to determine whether you'd like to incorporate such permissions into your system's policy. It can be a bit esoteric but you can usually make out a few file permissions that mysql would need in order to run.
To enable these changes, you need to create the policy module as a compiled module:
# grep mysql /var/log/audit/audit.log | audit2allow -M mysql
...will output the saved plaintext code to mysql.te and the compiled policy code to mysql.pp. You can then use the semodule tool to import this into your system's policy.
# semodule -i mysql.pp
Once you've done this, try starting mysqld again. You might need to repeat this process a few times since mysqld might still falter on some new access permission that wasn't logged in previous runs. This is because the server daemon encounters these permission checks sequentially and if it gets tripped on one, it won't encounter the others until you allow access to the initial ones. Have patience -- sometimes you will need to create mysql1.pp mysql2.pp mysql3.pp ... and so on.
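That workflow can be captured in a small loop, sketched here (the service name and module names are illustrative):

# repeat until mysqld starts cleanly
i=1
until service mysql start; do
    grep mysql /var/log/audit/audit.log | audit2allow -M mysql$i
    semodule -i mysql$i.pp
    i=$((i+1))
done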
If you're really interested in combining these into a unified policy, you can take the .te files and "glue" these together to create a unified .te file. Compiling this file is only slightly more work -- you need the Makefile from /usr/share/selinux/devel/Makefile in order to convert this into a .pp file.
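For example, assuming you saved the merged source as mysql_local.te in the current directory (the module name is illustrative):

# compile the .te source into a loadable .pp module, then install it
make -f /usr/share/selinux/devel/Makefile mysql_local.pp
semodule -i mysql_local.pp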
For more information:
If you're a more graphical type, there's also a great article by Red Hat Magazine on compiling policy here. There's also a great blog article which takes you through the creation of a policy here. Note the emphasis on using /usr/share/selinux/devel/Makefile to compile your own .te, .fc, and .if files (SELinux source written in M4).

Is it possible to start ActiveMQ with a configuration file that's not in one of the default locations?

All right, all you ActiveMQ gurus out there...
Currently ActiveMQ requires a configuration file before it runs. It appears from its debug output message:
$ ./activemq start -h
INFO: Using default configuration (you can configure options in one of these file: /etc/default/activemq /home/user_name/.activemqrc)
...that you can only put it in one of those two locations. Anybody know if this is the case? Is there some command line parameter to specify its location?
Thanks!
-roger-
Yes, it is possible. Here are three possibilities.
If classpath is setup properly:
activemq start xbean:myconfig.xml
activemq start xbean:file:./conf/broker1.xml
Not using the classpath:
activemq start xbean:file:C:/ActiveMQ/conf/broker2.xml
reference:
http://activemq.apache.org/activemq-command-line-tools-reference.html
I have not been able to find the answer to this and I struggled with it myself for a while, but I've found a bit of a workaround. When you use bin/activemq create, you can create a runnable instance that will have its own bin, conf, and data directories. Then you have more control over that runnable instance, and the .activemqrc becomes less important.
See this for detail on the create option : http://activemq.apache.org/unix-shell-script.html
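For example (the paths are illustrative; the generated launcher script is named after the instance):

# create a self-contained instance with its own bin, conf and data
bin/activemq create /opt/activemq/instances/broker1
/opt/activemq/instances/broker1/bin/broker1 start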
Try this:
bin/activemq start xbean:/home/user/activemq.xml
Note that if the xml file includes other files like jetty.xml then it needs to be in that dir also.
If using a recent 5.6 SNAPSHOT, you can set the env var ACTIVEMQ_CONF to point to the location where you have the config files.
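For example (the path is illustrative):

export ACTIVEMQ_CONF=/opt/myapp/activemq-conf
bin/activemq start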
In the bin/activemq script, under # CONFIGURATION # (for using instances), you can add or remove any file destinations you'd like.
Beware though: it stops at the first occurrence of a file and ignores the others. Read more here:
Unix configuration
Happy coding!