where to specify the root directory of hadoop on slave node? - configuration

I need to set up a Hadoop/HDFS cluster with one namenode and two datanodes. I am aware of the conf/slaves file, which lists the machines the datanodes run on. But how can I specify where Hadoop/HDFS is locally installed on a slave node? And which user account should be used to start HDFS there?
Edit: in the log files I found the following error when I tried to run start-dfs.sh:
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///

The user is expected to be the same as on the master node. The location of the actual data can be modified by changing the dfs.data.dir node in hadoop-site.xml.
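A minimal sketch of that setting in hadoop-site.xml (the path below is a hypothetical example; point it at whatever local directory exists on each datanode):

<property>
  <name>dfs.data.dir</name>
  <value>/srv/hadoop/data</value>
  <description>Local directory on the datanode where HDFS blocks are stored (example path).</description>
</property>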

Related

how to set livy.server.session.timeout on EMR cluster bootstrap?

I am creating an EMR cluster, and using jupyter notebook to run some spark tasks.
My tasks die after approximately 1 hour of execution, and the error is:
An error was encountered:
Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active."
My understanding is that it is related to the Livy config livy.server.session.timeout, but I don't know how I can set it in the bootstrap of the cluster (I need to do it in the bootstrap because the cluster is created with no ssh access)
Thanks a lot in advance
On EMR, livy-conf is the classification for the properties of Livy's livy.conf file. When creating an EMR cluster, choose advanced options with Livy selected as an application to install, and pass this EMR configuration in the Enter Configuration field.
[{"Classification": "livy-conf", "Properties": {"livy.server.session.timeout": "5h"}}]
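If the cluster is created from the AWS CLI rather than the console, the same classification can be passed at creation time with the --configurations flag. A sketch with placeholder values (cluster name, release label, instance type, and count are assumptions to adapt):

aws emr create-cluster \
  --name "my-livy-cluster" \
  --release-label emr-5.30.0 \
  --applications Name=Spark Name=Livy \
  --instance-type m5.xlarge --instance-count 3 \
  --use-default-roles \
  --configurations '[{"Classification":"livy-conf","Properties":{"livy.server.session.timeout":"5h"}}]'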
On EMR, Livy binary is located at /etc/livy/, and so the config file is at /etc/livy/conf/livy.conf
To verify this,
Create an EMR cluster with a known EC2 key-pair, Livy, and the above config.
Using the EC2 key-pair, log in to the EC2 master node associated with the cluster: ssh -i some-ec2-key-pair.pem hadoop@ec2-00-00-00-0.ca-region-n.compute.amazonaws.com
Navigate to /etc/livy/conf, vim livy.conf, and see the updated value of livy.server.session.timeout.
If you don't want the Livy session to go down at all, then set the property livy.server.session.timeout-check to false in /etc/livy/conf/livy.conf.
Another way to do this, if you don't want to recreate the cluster, is:
Go to /etc/livy/conf/livy.conf and set the livy.server.session.timeout property to the value you would like.
After that, run sudo restart livy-server to apply the configuration.
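For reference, the two properties discussed above would look like this in /etc/livy/conf/livy.conf (the 5h value is only an example):

livy.server.session.timeout = 5h
livy.server.session.timeout-check = false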

Elasticsearch 5.0.0 cluster node not joining

OK, this shouldn't be this hard. I'm trying to run 2 nodes in an Elasticsearch cluster and getting an exception when trying to start node-1 (node-2, which is master, is already started). Using Elasticsearch v5.0.0 for both instances.
Exception: failed to send join request to master, reason RemoteTransportException can't add node found existing node with the same id but is a different node instance]
Node-1 config:
node.name: SANNNNN-1
network.host: 10.3.185.250
discovery.zen.ping.unicast.hosts: ["10.3.185.251:9300"]
Node-2 config:
node.name: SAN-2
network.host: 10.3.185.251
discovery.zen.ping.unicast.hosts: ["10.3.185.251:9300"]
Full Exception on node 2:
[INFO ][o.e.d.z.ZenDiscovery ] [SANNNNN-1] failed to send join request to master [{SAN-2}{DxExoYHHTu2-rFvuvQSuEg}{OfYBe97HQCmcha63CFiYlQ}{10.3.185.251}{10.3.185.251:9300}], reason [RemoteTransportException[[SAN-2][10.3.185.251:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {SANNNNN-1}{DxExoYHHTu2-rFvuvQSuEg}{hP4gLRugRgWzSuNnEhGHSw}{10.3.185.250}{10.3.185.250:9300}, found existing node {SAN-2}{DxExoYHHTu2-rFvuvQSuEg}{OfYBe97HQCmcha63CFiYlQ}{10.3.185.251}{10.3.185.251:9300} with the same id but is a different node instance]; ]
OK, so the issue was copying the Elasticsearch folder from one node to another over scp. Elasticsearch saves the node id in the elasticsearch/data/ folder. I deleted the data folder on one node and restarted it. The cluster is up and running. Hope this saves someone the hassle.
Remove the directory <Elasticsearch home>/data and restart the ES node. This issue occurs because Elasticsearch stores the node id in that directory, and it is a common mistake when copying a working Elasticsearch directory from one node to another.
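A rough sketch of the fix on the node that was cloned (the paths and service name assume a systemd package install; a tar.gz install keeps the data under <ES home>/data instead):

sudo systemctl stop elasticsearch     # stop the node that cannot join
sudo rm -rf /var/lib/elasticsearch/*  # remove the copied data, including the stored node id
sudo systemctl start elasticsearch    # the node starts with a fresh id and can join the cluster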
After fixing the issue, check the cluster status like this:
curl -X GET "localhost:9200/_cluster/health"
This works fine with Elasticsearch 6 as well.
I had the same issue after cloning a data node in Azure. I ended up finding the data folder by starting in the root folder:
/datadisks/disk1/elasticsearch/data
I kept reading that others found the folder elsewhere and wanted to share here.

Apache Geode Configuration

I had a problem trying to get Apache Geode (v1.0.0-incubating.M2) running on Linux.
The problem was that when I tried to run the example command gfsh start server --name=server1 from the documentation, it gave me the following error:
Exception in thread "main" com.gemstone.gemfire.InternalGemFireError: Cannot resolve local host name to an IP address.
It turns out that you need to have your hostname (given by the output of the hostname command) present in the /etc/hosts file.
In my case, hostname gives an alias as its output (let's say my_alias), so I solved the problem by adding the line my_ip my_full_domain my_alias to /etc/hosts.
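For example, if hostname prints my_alias and the machine's address is 192.168.1.20 (both placeholder values), the added line would look like:

192.168.1.20   myhost.example.com   my_alias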

In moqui, configuration to use mysql and loading with seed data

In Moqui, I am trying to configure it to use MySQL: I commented out Derby and uncommented MySQL in the default conf, copied the connector to framework lib, and included the dependency in the framework build.gradle. On running load, I get this error: java.lang.reflect.InvocationTargetException javax.management.InstanceAlreadyExistsException: bitronix.tm:type=JDBC,UniqueName=DEFAULT_transactional_DS,Id=0 -- thanks for any help.
Can you post a snippet of the code you have modified in MoquiDefaultConf.xml and the build.gradle file?
A viable alternative for configuring MySQL with Moqui is to make the related settings in the configuration files (i.e. MoquiDevConf.xml for a development instance, MoquiStagingConf.xml for staging, and MoquiProductionConf.xml for production). Follow the steps below to configure MySQL with Moqui.
Since you are presumably doing development, you only need to make changes in the MoquiDevConf.xml file.
Replace the <entity-facade> code in MoquiDevConf.xml with the following code.
<entity-facade crypt-pass="MoquiDefaultPassword:CHANGEME">
<datasource group-name="transactional" database-conf-name="mysql" schema-name="">
<inline-jdbc jdbc-uri="jdbc:mysql://127.0.0.1:3306/MoquiTransactional?autoReconnect=true&amp;useUnicode=true&amp;characterEncoding=UTF-8"
jdbc-username="MYSQL_USER_NAME" jdbc-password="MYSQL_PASSWORD" pool-minsize="2" pool-maxsize="50"/>
</datasource>
</entity-facade>
In the code above, 'MoquiTransactional' is the name of the database. Replace MYSQL_USER_NAME and MYSQL_PASSWORD with your MySQL username and password.
Create a database in MySQL (as per the code above, create the database with name MoquiTransactional).
Add the jdbc driver for MySQL in the runtime/lib directory.
In the MoquiInit.properties file, point the moqui.conf property at the MoquiDevConf.xml file, i.e. moqui.conf=conf/MoquiDevConf.xml
Now simply build, load, and run.
To answer your question about loading seed data: you can simply run the gradle command gradle load -Ptypes=seed; this loads only the seed type data.
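Putting the database-creation step and the seed load together, a rough sketch (the root credentials and character set are assumptions; the database name matches the jdbc-uri above):

mysql -u root -p -e "CREATE DATABASE MoquiTransactional DEFAULT CHARACTER SET utf8;"
gradle load -Ptypes=seed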
Without more details my best guess is that you have another instance of Bitronix running on the machine; judging by the UniqueName, almost certainly another instance of Moqui. Make sure no other instance is running, killing background processes if there are any, before starting your new instance.
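A quick way to check for a stray instance before restarting (the grep pattern is only an example; match it to however you launch Moqui):

ps aux | grep -i '[m]oqui'   # lists any running Moqui processes without matching the grep itself
kill <pid>                   # stop the old instance (replace <pid> with the id from the line above)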

UnknownHostException while formatting HDFS

I have installed CDH4 on CentOS 6.3 64-bit in pseudo-distributed mode using the following instructions. Everything is set to localhost in the Hadoop configuration files. But still, when I format the namenode, the exception below appears. When I add a 192.168.1.101 CentOSHost entry to the /etc/hosts file, the exception goes away and I am able to format/start HDFS and run MR jobs.
I want to run MR jobs even when I am not connected to the network, without adding an entry to the /etc/hosts file. How can this be done?
12/08/27 22:17:15 WARN net.DNS: Unable to determine address of the host-falling back to "localhost" address
java.net.UnknownHostException: CentOSHost: CentOSHost
at java.net.InetAddress.getLocalHost(InetAddress.java:1360)
at org.apache.hadoop.net.DNS.resolveLocalHostIPAddress(DNS.java:283)
at org.apache.hadoop.net.DNS.(DNS.java:59)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:1017)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:565)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:145)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:724)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1095)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1193)
It looks like somewhere the configuration is returning/using the hostname as CentOSHost.
What does hostname --fqdn return for you?
For Hadoop, it is important that name lookup and reverse lookup work successfully: you should be able to resolve the hostname to an IP address and resolve the hostname back from the IP address (reverse resolution). This can be tested using the above command.
The entry in /etc/hosts is required for reverse resolution to work, unless the entry and the configuration point to localhost. Even in that case, hostname --fqdn should return localhost.
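If an /etc/hosts entry turns out to be acceptable after all, one common way to keep resolution working while offline is to map the hostname to the loopback address (a sketch; CentOSHost is the hostname from the question):

127.0.0.1   localhost   CentOSHost

hostname --fqdn   # should now resolve even with no network connection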