How to set livy.server.session.timeout on EMR cluster bootstrap?

I am creating an EMR cluster, and using jupyter notebook to run some spark tasks.
My tasks die after approximately 1 hour of execution, and the error is:
An error was encountered:
Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active."
My understanding is that it is related to the Livy config livy.server.session.timeout, but I don't know how I can set it in the bootstrap of the cluster (I need to do it in the bootstrap because the cluster is created with no ssh access)
Thanks a lot in advance

On EMR, livy-conf is the configuration classification for the properties in Livy's livy.conf file. When creating the cluster, choose the advanced options, select Livy as an application to install, and pass this EMR configuration in the Enter Configuration field:
[{"Classification": "livy-conf", "Properties": {"livy.server.session.timeout": "5h"}}]
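If you prefer to generate the configuration rather than type it by hand, here is a minimal Python sketch that builds the same structure and prints it as JSON. The function name livy_conf is made up for illustration; only the classification and property names come from the answer above.

```python
import json


def livy_conf(timeout="5h"):
    """Return an EMR configuration list for the livy-conf classification."""
    return [
        {
            "Classification": "livy-conf",
            # EMR expects property values as strings.
            "Properties": {"livy.server.session.timeout": timeout},
        }
    ]


if __name__ == "__main__":
    # Paste the output into the configuration field, or save it to a file.
    print(json.dumps(livy_conf("5h"), indent=2))
```

If you create clusters from the AWS CLI instead of the console, the same JSON can be saved to a file and passed to aws emr create-cluster with the --configurations option.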
On EMR, Livy's configuration directory is /etc/livy/conf, so the config file is at /etc/livy/conf/livy.conf.
To verify this:
Create an EMR cluster with a known EC2 key pair, Livy, and the above config.
Using the EC2 key pair, log in to the EC2 master node associated with the cluster: ssh -i some-ec2-key-pair.pem hadoop@ec2-00-00-00-0.ca-region-n.compute.amazonaws.com
Navigate to /etc/livy/conf, open livy.conf (for example with vim), and check the updated value of livy.server.session.timeout.

If you don't want the Livy session to go down at all, then set the property livy.server.session.timeout-check to false in /etc/livy/conf/livy.conf.

Another way to do that, if you don’t want to recreate the cluster:
Open /etc/livy/conf/livy.conf and set the livy.server.session.timeout property to the value you would like.
After that, run sudo restart livy-server to apply the configuration.
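The edit itself can be scripted. This is only a sketch: it assumes livy.conf holds one property per line, with the name separated from the value by "=" or whitespace, and set_conf_property is a hypothetical helper, not part of Livy.

```python
import re


def set_conf_property(conf_text, key, value):
    """Return conf_text with `key` set to `value`, appending it if absent."""
    new_line = f"{key} = {value}"
    out, replaced = [], False
    for line in conf_text.splitlines():
        stripped = line.strip()
        # The property name is whatever precedes the first "=" or whitespace.
        name = re.split(r"[=\s]", stripped, maxsplit=1)[0]
        if stripped and not stripped.startswith("#") and name == key:
            if not replaced:
                out.append(new_line)
                replaced = True
            # Any later duplicate of the same key is dropped.
        else:
            # Comments, blank lines and other properties are kept as-is.
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# Livy settings\nlivy.server.session.timeout = 1h\n"
    print(set_conf_property(sample, "livy.server.session.timeout", "5h"))
```

You would read /etc/livy/conf/livy.conf, pass its text through the function, write the result back with sufficient privileges, and then restart Livy as described above.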

How to run several IPFS nodes on a single machine?

For testing, I want to be able to run several IPFS nodes on a single machine.
This is the scenario:
I am building small services on top of IPFS core library, following the Making your own IPFS service guide. When I try to put client and server on the same machine (note that each of them will create their own IPFS node), I will get the following:
panic: cannot acquire lock: Lock FcntlFlock of /Users/long/.ipfs/repo.lock failed: resource temporarily unavailable
Usually, when you start with IPFS, you run ipfs init, which creates a new node. The default data and config for that node are stored at ~/.ipfs. Here is how you can create a new node and configure it so it can run beside your default node.
1. Create a new node
For a new node you have to use ipfs init again. Use for instance the following:
IPFS_PATH=~/.ipfs2 ipfs init
This will create a new node at ~/.ipfs2 (not using the default path).
2. Change Address Configs
As both of your nodes now bind to the same ports, you need to change the port configuration so both nodes can run side by side. For this, open ~/.ipfs2/config and find Addresses:
"Addresses": {
"API": "/ip4/127.0.0.1/tcp/5001",
"Gateway": "/ip4/127.0.0.1/tcp/8080",
"Swarm": [
"/ip4/0.0.0.0/tcp/4001",
"/ip6/::/tcp/4001"
]
}
Change it to, for example, the following:
"Addresses": {
"API": "/ip4/127.0.0.1/tcp/5002",
"Gateway": "/ip4/127.0.0.1/tcp/8081",
"Swarm": [
"/ip4/0.0.0.0/tcp/4002",
"/ip6/::/tcp/4002"
]
}
With this, you should be able to run both nodes, .ipfs and .ipfs2, on a single machine.
Notes:
Whenever you use .ipfs2, you need to set the env variable IPFS_PATH=~/.ipfs2
In your example you need to change either your client or server node from ~/.ipfs to ~/.ipfs2
you can also start the daemon on the second node using IPFS_PATH=~/.ipfs2 ipfs daemon &
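If you need more than two nodes, editing the ports by hand gets tedious. Here is a rough Python sketch that shifts every port in the Addresses section by an offset. It assumes API and Gateway are single multiaddr strings and Swarm is a list, as in the config shown above; the function names are made up for illustration.

```python
import copy
import re

# Matches the port component of a multiaddr, e.g. "/tcp/4001".
_PORT = re.compile(r"/(tcp|udp)/(\d+)")


def shift_multiaddr(addr, offset):
    """Add `offset` to every tcp/udp port in one multiaddr string."""
    return _PORT.sub(lambda m: f"/{m.group(1)}/{int(m.group(2)) + offset}", addr)


def shift_addresses(config, offset=1):
    """Return a copy of an IPFS config dict with all Addresses ports shifted."""
    new_config = copy.deepcopy(config)
    addresses = new_config["Addresses"]
    for key in ("API", "Gateway"):
        addresses[key] = shift_multiaddr(addresses[key], offset)
    addresses["Swarm"] = [shift_multiaddr(a, offset) for a in addresses["Swarm"]]
    return new_config


if __name__ == "__main__":
    default = {
        "Addresses": {
            "API": "/ip4/127.0.0.1/tcp/5001",
            "Gateway": "/ip4/127.0.0.1/tcp/8080",
            "Swarm": ["/ip4/0.0.0.0/tcp/4001", "/ip6/::/tcp/4001"],
        }
    }
    print(shift_addresses(default, 1)["Addresses"])
```

Load ~/.ipfs2/config with json.load, pass it through shift_addresses, and write it back with json.dump; an offset of 2 would suit a third node, and so on.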
Hello, I use .ipfs2. After running two daemons at the same time, I can open localhost:5001/webui, but opening localhost:5002/webui for the second node gives an error, as shown in the attachment.
Here are some ways I've used to create multiple nodes/peers ids.
I use windows 10.
1st node go-ipfs (latest version)
2nd node Siderus Orion IPFS (connect to Orion node, not local) -- https://orion.siderus.io/
Use VirtualBox to run a minimal ubuntu installation. (You can set up as many as you want)
Repeat the process and you have 4 nodes or as many as you want.
https://discuss.ipfs.io/t/ipfs-manager-download-install-manage-debug-your-ipfs-node/3534 is another gui that installs and lets you manage all ipfs commands without CMD. He just released it a few days ago and it looks well worth lots of reviews.
Disclaimer: I am not a coder or computer professional. Just a huge fan of IPFS! I hope we can raise awareness and change the world.

Restarting a MySQL server managed by Ambari

I have a scenario where I need to change several parameters of a hadoop cluster managed by Ambari to document performance of a particular application. The change in the configs entails a restart of the affected components.
I am using the Ambari REST API to achieve this. I figured out how to do this for all service components of Hadoop, but I am not sure whether the API provides a way to restart the MySQL server that Hive uses.
I have the following questions:
Is it the case that a mere stop and start of mysqld on the appropriate machine is enough to ensure that the required configuration changes are recognized by Ambari and the application?
I chose the 'New MySQL database' option while installing Hive via Ambari. Does this mean that restarts are reflected in Ambari only when it is carried out from the Ambari UI?
Your inputs would be highly appreciated.
Thanks!
Found a solution to the problem. I used the following commands using the Ambari REST API for changing configurations and restarting services from the backend.
Log in to the host on which the Ambari server is running and use the provided configs.sh script as described below.
Modifying configuration files
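#!/bin/bash
# Arguments: cluster name, config file (type), property name, property value.
CLUSTER_NAME=$1
CONFIG_FILE=$2
PROPERTY_NAME=$3
PROPERTY_VALUE=$4
# Replace <ambari-server-port> with the port the Ambari server listens on.
/var/lib/ambari-server/resources/scripts/configs.sh -port <ambari-server-port> set localhost "$CLUSTER_NAME" "$CONFIG_FILE" "$PROPERTY_NAME" "$PROPERTY_VALUE"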
where CONFIG_FILE can take values like tez-site, mapred-site, hadoop-site, hive-site etc. PROPERTY_NAME and PROPERTY_VALUE should be set to values relevant to the specified CONFIG_FILE.
Restarting host components
curl -uadmin:admin -H 'X-Requested-By: ambari' -X POST -d '
{
"RequestInfo":{
"command":"RESTART",
"context":"Restart MySQL server used by Hive Metastore on node3.cluster.com and HDFS client on node1.cluster.com",
"operation_level":{
"level":"HOST",
"cluster_name":"c1"
}
},
"Requests/resource_filters":[
{
"service_name":"HIVE",
"component_name":"MYSQL_SERVER",
"hosts":"node3.cluster.com"
},
{
"service_name":"HDFS",
"component_name":"HDFS_CLIENT",
"hosts":"node1.cluster.com"
}
]
}' http://localhost:<ambari-server-port>/api/v1/clusters/c1/requests
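For scripted benchmark runs it can be easier to build this request in code than to hand-edit the JSON. The following is a sketch using only the Python standard library; the function names are made up, and the base URL (including the port), user and password are parameters you must supply for your own Ambari server.

```python
import base64
import json
import urllib.request


def build_restart_payload(cluster_name, context, components):
    """components: iterable of (service_name, component_name, hosts) tuples."""
    return {
        "RequestInfo": {
            "command": "RESTART",
            "context": context,
            "operation_level": {"level": "HOST", "cluster_name": cluster_name},
        },
        "Requests/resource_filters": [
            {"service_name": service, "component_name": component, "hosts": hosts}
            for service, component, hosts in components
        ],
    }


def build_request(base_url, cluster_name, payload, user, password):
    """Prepare (but do not send) the POST request for the Ambari REST API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/api/v1/clusters/{cluster_name}/requests",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Basic {token}", "X-Requested-By": "ambari"},
        method="POST",
    )


if __name__ == "__main__":
    payload = build_restart_payload(
        "c1",
        "Restart MySQL server used by Hive Metastore",
        [("HIVE", "MYSQL_SERVER", "node3.cluster.com")],
    )
    print(json.dumps(payload, indent=2))
```

Passing the prepared request to urllib.request.urlopen would submit it; that step is left out of the sketch so nothing is sent by accident.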
Reference Links:
Restarting components
modifying configurations
Hope this helps!

Apache Drill with Kerberos

Does anyone know how to enable Kerberos with Apache Drill? Is it possible? I can't seem to find any documentation on it, or any questions/answers floating around with information on it. I am currently running a CDH cluster.
I am getting this error when trying to use HDFS with Drill:
Error: PERMISSION ERROR: SIMPLE authentication is not enabled. 
Available:[TOKEN, KERBEROS]
HDFS + Kerberos integration isn't currently supported / tested / documented. Vote on this ticket to track when it becomes available:
https://issues.apache.org/jira/browse/DRILL-3584
There isn't any documentation that the Drill team provides about how to enable kerberos and they haven't tested kerberos with Drill. Drill Eng. does believe that it should work.
To access the cluster once it is Kerberized, you must configure certain files.
Make an HDFS Superuser account as indicated in this Cloudera doc. On the Main Node, run
•sudo kadmin.local
In addition, add an 'hdfs' principal with this command
•addprinc hdfs@LOCALDOMAIN -- where LOCALDOMAIN is the Kerberos realm of the principal
In order to enable authentication with Kerberos, we also need to copy the file hadoop-yarn-api.jar into Drill's class path. Example given below
•cp /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/client/hadoop-yarn-api.jar ~/apache-drill/jars/
The above step and the three following must be performed on each node of the cluster where Apache Drill is installed.
Next, Drill's conf/core-site.xml file should be edited to contain the following snippet of XML. You might have to copy this file from /etc/hadoop/conf.cloudera.yarn/core-site.xml or a similar path.
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
After this step, you will also need to add the following XML snippet to Drill's core-site.xml file. In this instance, hdfs/_HOST@LOCALDOMAIN is my principal. The value can be found in the hdfs-site.xml file.
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@LOCALDOMAIN</value>
</property>
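Since these edits have to be repeated on every Drill node, a small script may help. This is only a sketch using Python's standard XML library; set_property is a hypothetical helper, and the principal value is the example one from this answer, written in the usual primary/instance@REALM form.

```python
import xml.etree.ElementTree as ET


def set_property(configuration, name, value):
    """Set a Hadoop-style <property> under <configuration>, adding it if absent."""
    for prop in configuration.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


if __name__ == "__main__":
    # In practice, parse Drill's conf/core-site.xml with ET.parse instead.
    root = ET.fromstring("<configuration></configuration>")
    set_property(root, "hadoop.security.authentication", "kerberos")
    set_property(root, "dfs.namenode.kerberos.principal", "hdfs/_HOST@LOCALDOMAIN")
    print(ET.tostring(root, encoding="unicode"))
```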
All that is left to do is create an 'hdfs' Kerberos ticket for the user that we're logged in as
•kinit hdfs -- hdfs is the super user
Then start up each of the drillbits
•/opt/apachedrillfolder/bin/drillbit.sh start
So now, Drill has both the configuration and the authority to use our Kerberized HDFS store. Give it a shot by opening up a Drill prompt (drill-conf) and trying a query.

In moqui, configuration to use mysql and loading with seed data

In Moqui, I am trying to configure MySQL. I commented out Derby and uncommented MySQL in MoquiDefaultConf.xml, copied the connector to the framework lib directory, and included the dependency in the framework build.gradle. On running load, I get this error: java.lang.reflect.InvocationTargetException javax.management.InstanceAlreadyExistsException: bitronix.tm:type=JDBC,UniqueName=DEFAULT_transactional_DS,Id=0 -- thanks for any help
Can you post a snippet of the code you have modified in MoquiDefaultConf.xml and the build.gradle file?
A viable alternative is to configure MySQL in the instance-specific configuration files (i.e. MoquiDevConf.xml for a development instance, MoquiStagingConf.xml for a staging instance and MoquiProductionConf.xml for a production instance). Follow the steps below to configure MySQL with Moqui.
Since you are probably doing development, you only need to make changes in the MoquiDevConf.xml file.
Replace the <entity-facade> code in MoquiDevConf.xml with the following code.
<entity-facade crypt-pass="MoquiDefaultPassword:CHANGEME">
<datasource group-name="transactional" database-conf-name="mysql" schema-name="">
<inline-jdbc jdbc-uri="jdbc:mysql://127.0.0.1:3306/MoquiTransactional?autoReconnect=true&amp;useUnicode=true&amp;characterEncoding=UTF-8"
jdbc-username="MYSQL_USER_NAME" jdbc-password="MYSQL_PASSWORD" pool-minsize="2" pool-maxsize="50"/>
</datasource>
</entity-facade>
In the code above, 'MoquiTransactional' is the name of the database. Replace MYSQL_USER_NAME and MYSQL_PASSWORD with your MySQL username and password.
Create a database in MySQL (as per the code above, create the database with name MoquiTransactional).
Add the jdbc driver for MySQL in the runtime/lib directory.
In the MoquiInit.properties file, set the "moqui.conf" property to the path of the MoquiDevConf.xml file, i.e. moqui.conf=conf/MoquiDevConf.xml
Now just simply build, load and run.
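One detail to watch in the <entity-facade> snippet: inside an XML attribute, each & in the JDBC URI has to be written as &amp;, otherwise the configuration file will not parse. If you generate the element instead of typing it, the escaping is done for you. Here is a minimal Python sketch; the function name is made up, and the attribute names are the ones from the snippet above.

```python
import xml.etree.ElementTree as ET


def mysql_entity_facade(database, username, password, host="127.0.0.1", port=3306):
    """Return the <entity-facade> element for a MySQL datasource as a string."""
    uri = (
        f"jdbc:mysql://{host}:{port}/{database}"
        "?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8"
    )
    facade = ET.Element("entity-facade", {"crypt-pass": "MoquiDefaultPassword:CHANGEME"})
    datasource = ET.SubElement(
        facade,
        "datasource",
        {"group-name": "transactional", "database-conf-name": "mysql", "schema-name": ""},
    )
    ET.SubElement(
        datasource,
        "inline-jdbc",
        {
            "jdbc-uri": uri,
            "jdbc-username": username,
            "jdbc-password": password,
            "pool-minsize": "2",
            "pool-maxsize": "50",
        },
    )
    # Serialising escapes "&" in attribute values as "&amp;".
    return ET.tostring(facade, encoding="unicode")


if __name__ == "__main__":
    print(mysql_entity_facade("MoquiTransactional", "MYSQL_USER_NAME", "MYSQL_PASSWORD"))
```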
To answer your question about loading seed data:
you can simply run the gradle command gradle load -Ptypes=seed, which loads only the seed type data.
Without more details, my best guess is that you have another instance of Bitronix running on the machine; judging by the UniqueName, it is almost certainly another instance of Moqui. Make sure no other instance is running, killing background processes if there are any, before starting your new instance.

where to specify the root directory of hadoop on slave node?

I need to set up a hadoop/hdfs cluster with one namenode and two datanodes. I am aware of the conf/slaves file, which lists the machines the datanodes run on. But how can I specify where hadoop/hdfs is locally installed on a slave node? And which user account is used to start hdfs there?
Edit: in the log files, I find the following error when I try to run start-dfs.sh
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///
The user is expected to be the same as on the master node. The location of the actual data can be modified by changing the dfs.data.dir property in hadoop-site.xml.
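As for the error in the edit: it usually means the NameNode address resolved to the default file:/// instead of an hdfs:// URI with a host. In Hadoop versions of that era the address is read from the fs.default.name property, but that is an assumption here, since the question does not show its configuration. A small Python sketch of the check, with a hypothetical function name and a placeholder host and port:

```python
from urllib.parse import urlparse


def is_valid_namenode_uri(uri):
    """True if `uri` is an hdfs:// URI that names a host."""
    parsed = urlparse(uri)
    return parsed.scheme == "hdfs" and bool(parsed.hostname)


if __name__ == "__main__":
    # "namenode.example.com:9000" is a placeholder, not a value from the question.
    for candidate in ("file:///", "hdfs://namenode.example.com:9000"):
        print(candidate, is_valid_namenode_uri(candidate))
```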