Apache Drill with Kerberos - apache-drill

Does anyone know how to enable kerberos with Apache Drill? Is it possible. I can't seem to find any documentation on it, or any questions/answers floating around with the information on it. I am currently running a CDH cluster.
I am getting this error when trying to use HDFS with Drill:
Error: PERMISSION ERROR: SIMPLE authentication is not enabled. 
Available:[TOKEN, KERBEROS]

HDFS + Kerberos integration isn't currently supported / tested / documented. Vote on this ticket to track when it becomes available:
https://issues.apache.org/jira/browse/DRILL-3584

There isn't any documentation that the Drill team provides about how to enable kerberos and they haven't tested kerberos with Drill. Drill Eng. does believe that it should work.

In order to gain access onto the cluster once Kerberized, you must configure certain files in order to gain access.
Make an HDFS Superuser account as indicated in this Cloudera doc. On the Main Node, run
•sudo kadmin.local
In addition, add an 'hdfs' principal with this command
•addprinc hdfs#LOCALDOMAIN -- Where localdomain is the principal name
In order to enable authentication with Kerberos, we also need to copy the file hadoop-yarn-api.jar into Drill's class path. Example given below
•cp /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/client/hadoop-yarn-api.jar ~/apache-drill/jars/
The above step and the three following must be performed on each node of the cluster that an Apache Drill is installed.
Next, Drill's conf/core-site.xml file should be edited to contain the following snippet of xml. You might have to copy this file from /etc/hadoop/conf.cloudera.yarn/core-site.xml, etc or a similar path.
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
After this step, you will also need to add the following xml snippet below to the drill core-site.xml file. In this instance, hdfs/_HOST#LOCALDOMAIN is my principal property. The property can be found on the hdfs-site.xml file
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST#LOCALDOMAIN</value>
</property>
All that is left to do is create an 'hdfs' Kerberos ticket for the user that we're logged into
•kinit hdfs -- hdfs is the super user
Then start up each of the drillbits
•/opt/apachedrillfolder/bin/Drillbit.sh start
So now, Drill has both the configuration and the authority to use our kerberized HDFS store. Give it a shot by opening up a Drill prompt (drill-conf) and trying a query

Related

how to set livy.server.session.timeout on EMR cluster boostrap?

I am creating an EMR cluster, and using jupyter notebook to run some spark tasks.
My tasks die after approximately 1 hour of execution, and the error is:
An error was encountered:
Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active."
My understanding is that it is related to the Livy config livy.server.session.timeout, but I don't know how I can set it in the bootstrap of the cluster (I need to do it in the bootstrap because the cluster is created with no ssh access)
Thanks a lot in advance
On EMR, livy-conf is the classification for the properties for livy's livy.conf file, so when creating an EMR cluster, choose advanced options with Livy as an application chosen to install, please pass this EMR configuration in the Enter Configuration field.
[{'classification': 'livy-conf','Properties': {'livy.server.session.timeout':'5h'}}]
On EMR, Livy binary is located at /etc/livy/, and so the config file is at /etc/livy/conf/livy.conf
To verify this,
Create an EMR cluster with a known ec2 key-pair, Livy and above config
Using the ec2 key-pair, login to the EC2 Master node associated with the cluster ssh -i some-ec2-key-pair.pem hadoop#ec2-00-00-00-0.ca-region-n.compute.amazonaws.com
Navigate to /etc/livy/conf, vim livy.conf & see the updated value of livy.server.session.timeout
If you don't want the Livy session to go down at all, then set the property livy.server.session.timeout-check to false in /etc/livy/conf/livy.conf.
Another way to do that if you don’t want to recreate the cluster is:
go to /etc/livy/conf/livy.conf and set the livy.server.session.timeout property to the value you would like.
After that, run sudo restart livy-server to make the configuration applied.

Specify JFROG_ACCESS home instead of ~/.jfrog_access (Artifactory 5.5.2)

I managed to set up artifactory using our existing tomcat. I have set to ARTIFACTORY_HOME=/opt/artifactory, that part works well. There is, however, also the jfrog access.war file, which needs to be running as well. I didn't figure out which variable to use to specify its home, therefore it defaults to ~/.jfrog_access, which is not at all what I like.
I moved the content over to my $ARTIFACTORY_HOME/access and symlinked it, but that's not the way to go for sure. Any help appreciated.
In case someone is stumbling over this thread and struggles with the same problem:
Solution for me was to also extract the Context files (access.xml and artifactory.xml which are available in the zip file under <zip extract>/misc/tomcat) to the Tomcat configuration folder, e.g. $CATALINA_HOME/conf/Catalina/localhost/. After that the $ARTIFACTORY_HOME env will be recognized on Access startup.
A previous answer finally put me on the right track for solving this problem on Amazon Linux.
In addition to copying access.xml and artifactory.xml to ${catalina.home}/host/MY_HOSTNAME, I found that some other changes were needed.
I modified the docBase attributes in the XML context files because my server has multiple hostnames:
/usr/share/tomcat8/conf/Catalina/repo.mydomain.org/access.xml
<Context path="/access" docBase="${catalina.home}/host/repo.mydomain.org/access.war">
<Parameter name="jfrog.access.bundled" value="true" override="true"/>
<!-- enable annotations scanning of access jar files -->
<JarScanner scanClassPath="false">
<JarScanFilter defaultPluggabilityScan="false" pluggabilityScan="access*" defaultTldScan="false"/>
</JarScanner>
</Context>
/usr/share/tomcat8/conf/Catalina/repo.mydomain.org/artifactory.xml
<Context crossContext="true" path="/artifactory" docBase="${catalina.home}/host/repo.mydomain.org/artifactory.war">
</Context>
Important Note: In order to prevent the above two XML files from being deleted by Tomcat Manager during upgrades via Undeploy/Deploy WAR, make sure they are owned by root and not writable by the tomcat user:
chown root.root access.xml artifactory.xml
chmod 644 access.xml artifactory.xml
If you forget to do the above, you will likely end up missing these files, which will break the communication between the access and artifactory web applications, resulting in login failures ("Username or Password Are Incorrect"). In this case, these errors result from the lack of communication between the web applications, not a problem with the credentials themselves.
/usr/share/tomcat8/conf/Catalina/repo.mydomain.org/manager.xml
This gives me the ability to upload new versions of access.war and artifactory.war via https://repo.mydomain.org:8443/manager/html:
<Context docBase="${catalina.home}/webapps/manager" privileged="true" antiResourceLocking="false">
</Context>
Additionally, I created the following folder to serve as the artifactory.home:
sudo mkdir /usr/share/artifactory
sudo chown tomcat.tomcat /usr/share/artifactory
tomcat8.conf
Add (or modify) the following line:
JAVA_OPTS="-Dartifactory.home=/usr/share/artifactory -Djfrog.access.home=/usr/share/artifactory/access -Dartifactory.access.client.serverUrl.override=http://localhost:8080/access"
Note: The Access Client URL specified above must use localhost in order to avoid the Server HTTP parameter from being overwritten by Apache and its modules. For instance, if I use:
https://repo.mydomain.org/access/api/v1/system/ping
The Server HTTP header value in the response is:
Server: Apache/2.4.33 (Amazon) OpenSSL/1.0.2k-fips mod_jk/1.2.43
And the Access Client produces the following exception:
[ERROR] (o.j.a.c.AccessClientImpl:154) - Access client/server version mismatch. Client version: 4.1.5, Server version: 2.4.33 (Amazon) OpenSSL
Which means the Access Client is depending on the first string matching #.#.# in the server header. This seems like a really fragile part of the Access Client. They should have used X-JFrog-Access-Server or something instead of trying to control a value that is set by the web server. So, to reiterate, use http://localhost:8080/access to connect directly to the tomcat server.
Artifactory 6.2.0 depends on Apache Derby (the specific version can be found in jfrog-artifactory-oss-6.2.0.zip\artifactory-oss-6.2.0\tomcat\lib). This should be added as a shared library to Tomcat:
mkdir /usr/share/tomcat8/shared
cd /usr/share/tomcat8/shared
wget http://central.maven.org/maven2/org/apache/derby/derby/10.11.1.1/derby-10.11.1.1.jar
Add or modify the following line in catalina.properties:
shared.loader=${catalina.home}/shared/*.jar
Since we want https://repo.mydomain.org to go to the Artifactory webapp:
mkdir /usr/share/tomcat8/host/repo.mydomain.org/ROOT
echo '<html><head><meta http-equiv="refresh" content="0;URL=/artifactory"></meta></head><body></body></html>' > /usr/share/tomcat8/host/repo.mydomain.org/ROOT/index.html
And make sure the services automatically start on reboot:
sudo chkconfig httpd on
sudo chkconfig tomcat8 on
Artifactory will then be available at the url:
https://repo.mydomain.org/artifactory/webapp/

Questions on starting Locator using snappydata/bin> ./spark-shell.sh script

Spark v. 0.5
Here's the command I used to start a Locator:
ubuntu#ip-172-31-8-115:/snappydata-0.5-bin/bin$ ./snappy-shell locator start
Starting SnappyData Locator using peer discovery on:
0.0.0.0[10334] Starting DRDA server for SnappyData at address localhost/127.0.0.1[1527]
Logs generated in /snappydata-0.5-bin/bin/snappylocator.log
SnappyData Locator pid: 9352 status: running
It looks like it starts the DRDA server locally, with no outside interface for a client to connect to. So, I cannot reach my SnappyData Locator using this JDBC URL from an outside client host (e.g. my SquirrelSQL editor).
This does not connect:
jdbc:snappydata://MY-AWS-PUBLIC-IP-HERE:1527/
What property do I pass my ./snappy-shell.sh location start command to get the DRDA Server to start on a public IP address instead of "localhost/127.0.0.1"?
Use -client-bind-address and -client-port options. For locator also use the -peer-discovery-address and -peer-discovery-port options to specify bind address for other locators/servers/leads (that are passed to their -locators=<address>:<port>):
snappy-shell locator start -peer-discovery-address=<internal IP for peers> -client-bind-address=<public IP for clients>
See the output of snappy-shell locator --help for commonly used options.
For SnappyData releases, you may find it much easier to use the global configuration for all of the locators, servers, leads. Check configuring the cluster.
This will allow specifying all options for all JVMs of the cluster in conf/locators, conf/leads, conf/servers then starting with snappy-start-all.sh, status with snappy-status-all.sh and stop all with snappy-stop-all.sh
On a related note, we at SnappyData Inc., are developing scripts to enable users quickly launch SnappyData cluster on AWS.
If you want to try it out, below steps would guide you. We would love to hear your feedback on this.
Download its development branch git clone https://github.com/SnappyDataInc/snappydata.git -b SNAP-864 (You don't need to clone the repo for this, but I could not find a way to attach the scripts here.)
Go to ec2 directory cd snappydata/cluster/ec2
Run snappy-ec2. ./snappy-ec2 -k ec2-keypair-name -i /path/to/keypair/private/key/file launch your-cluster-name
See this README for more details.

In moqui, configuration to use mysql and loading with seed data

In moqui, I am trying to configure to use mysql, commented out derby and uncommented mysql in defaultconf, I copied the connector to framework lib, included the dependency in framework build.gradle, on running load, I get this error - java.lang.reflect.InvocationTargetExceptionjavax.management.InstanceAlreadyExistsException: bitronix.tm:type=JDBC,UniqueName=DEFAULT_transactional_DS,Id=0 -- thanks for any help
Can you post a snippet of code you have modified in MoquiDefaultConf.xml and build.graddle file.
A viable alternative to configure MySQL with Moqui is by doing related setting in configuration files (i.e. MoquiDevConf.xml for development instance, MoquiStagingConf.xml for staging instance and MoquiProductionConf.xml for production instance.). Follow the steps below to configure MySQL with Moqui.
Since, May be you are trying to do some development, you need to make changes in MoquiDevConf.xml file only.
Replace the <entity-facade> code in MoquiDevConf.xml with the following code.
<entity-facade crypt-pass="MoquiDefaultPassword:CHANGEME">
<datasource group-name="transactional" database-conf-name="mysql" schema-name="">
<inline-jdbc jdbc-uri="jdbc:mysql://127.0.0.1:3306/MoquiTransactional?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8"
jdbc-username="MYSQL_USER_NAME" jdbc-password="MYSQL_PASSWORD" pool-minsize="2" pool-maxsize="50"/>
</datasource>
</entity-facade>
In the code above 'MoquiDEFAULT' is the name of database. Replace the MYSQL_USER_NAME and MYSQL_PASSWORD with your MySQL username and password.
Create a database in MySQL (as per the code above, create the database with name MoquiTransactional).
Add the jdbc driver for MySQL in the runtime/lib directory.
In MoquiInit.properties file, set MoquiDevConf.xml file path to "moqui.conf" property i.e. moqui.conf=conf/MoquiDevConf.xml
Now just simply build, load and run.
To answer your question for loading seed data,
you can simply the run the gradle command gradle load -Ptypes=seed, this only loads the seed type data.
Without more details my best guess is that you have another instance of Bitronix running on the machine, by the UniqueName almost certainly another instance of Moqui running. Make sure no other instance is running, killing background processes if there are any, before starting your new instance.

where to specify the root directory of hadoop on slave node?

I need to setup a hadoop/hdfs cluster with one namenode and two datanodes. I am aware of conf/slaves file which lists the machines datanodes are running. But how can I specify where hadoop/hdfs is locally installed on slave node? Also the user account to start hdfs there?
Edit: in log files, I find following error, when I tried to start-dfs.sh
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///
The user is expected to be the same as on the master node. The location of the actual data can be modified by changing the dfs.data.dir node inhadoop-site.xml.