Adding Spark CSV dependency to Zeppelin

I'm running an EMR cluster with Spark on AWS.
The Spark version is 1.6.
When running the following command:
proxy = sqlContext.read.load("/user/zeppelin/ProxyRaw.csv",
                             format="com.databricks.spark.csv",
                             header="true",
                             inferSchema="true")
I get the following error:
Py4JJavaError: An error occurred while calling o162.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
How can I solve this? I assume I should add a package but how do I install it and where?

There are several ways to add packages in Zeppelin:
One of them is to edit the conf/zeppelin-env.sh configuration file and add the packages you need (com.databricks:spark-csv_2.10:1.4.0 in your case) to the submit options, since Zeppelin uses the spark-submit command under the hood:
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.4.0"
But let's say that you don't have access to that configuration file. You can then use Dynamic Dependency Loading via the %dep interpreter (deprecated):
%dep
z.load("com.databricks:spark-csv_2.10:1.4.0")
This will require that you load the dependencies before launching or restarting the interpreter.
Another way is to add the dependency you need via the interpreter dependency manager, as described in the Zeppelin documentation: Dependency Management for Interpreter.

First you need to download the CSV library from the Maven repository:
https://mvnrepository.com/artifact/com.databricks/spark-csv_2.10/1.5.0
Check which Scala version you are using (2.10 or 2.11) and pick the matching artifact.
When you launch spark-shell, spark-submit, or pyspark (or even Zeppelin), add the --jars option with the path to the library, like this:
pyspark --jars /path/to/jar/spark-csv_2.10-1.5.0.jar
Then you can call it as you did above.
A closely related question covers the same issue: How to add third party java jars for use in pyspark
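Both answers depend on picking the artifact that matches your Scala version. As a rough sketch of the naming convention only (the helper names below are made up for illustration, and the versions are just the ones mentioned above), the coordinate for --packages and the jar file name for --jars can be derived like this:

```python
def spark_csv_coordinate(scala_version, package_version):
    """Return the Maven coordinate of spark-csv for a given Scala version."""
    # Scala libraries carry the Scala binary version (major.minor) as a
    # suffix of the artifact ID, e.g. spark-csv_2.10.
    scala_binary = ".".join(scala_version.split(".")[:2])
    return "com.databricks:spark-csv_{0}:{1}".format(scala_binary, package_version)


def spark_csv_jar_name(scala_version, package_version):
    """Return the file name of the matching jar as published in Maven."""
    scala_binary = ".".join(scala_version.split(".")[:2])
    return "spark-csv_{0}-{1}.jar".format(scala_binary, package_version)


# Hypothetical helpers: they only format strings, they do not download anything.
coordinate = spark_csv_coordinate("2.10.5", "1.5.0")
jar_name = spark_csv_jar_name("2.10.5", "1.5.0")
print("--packages " + coordinate)
print("--jars /path/to/jar/" + jar_name)
```

If the suffix does not match the Scala version your Spark build uses, the data source will typically fail to load even though the jar is on the classpath.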

Related

java.lang.NoSuchFieldError: defaultReader (JsonSmartJsonProvider.java:39)

I am using the json-path-2.4.0 library in Spark jobs. It depends on json-smart 2.x, but the default Spark jars classpath folder (/usr/hdp/2.6.5.0-292/spark2/jars/) contains json-smart 1.x, which always takes precedence, so I am unable to use the json-path 2.x library.
I get the following error every time I run the job:
java.lang.NoSuchFieldError: defaultReader
at com.jayway.jsonpath.spi.json.JsonSmartJsonProvider.<init>(JsonSmartJsonProvider.java:39)
at com.jayway.jsonpath.internal.DefaultsImpl.jsonProvider(DefaultsImpl.java:21)
at com.jayway.jsonpath.Configuration.defaultConfiguration(Configuration.java:174)
A similar issue has been reported earlier:
JSON Path 2.3.0 conflicts with hadoop 2.7 Environment JSON-smart1.2.0.jar
But I haven't found any working solution. Please help.

How to setup Robot Framework standalone jar with SwingLibrary?

I'm using Robot Framework with SwingLibrary to test a Java Swing based application. Since I'm not used to Python and don't want to set up a Python environment, I decided to go with the Robot standalone JAR version (current version 2.8.4).
My problem is the setup in combination with SwingLibrary (version 1.8.0). I don't know where to put the library such that it gets recognized by Robot.
So far, I have the following test case (mytest.txt):
*** Settings ***
Library    SwingLibrary
*** Test Cases ***
MyTestCase
    Start Application    MyApp
I put the standalone jar and the test case together in one folder and created a subfolder (called Lib) containing the SwingLibrary JAR (which I later also extracted).
I added SwingLibrary as well as my own application to the classpath and tried executing Robot the following way:
java -Xbootclasspath/a:Lib/swinglibrary-1.8.0.jar:Lib/MyApp.jar -jar robotframework-2.8.4.jar mytest.txt
and also with
java -jar robotframework-2.8.4.jar mytest.txt
I always get one of the following errors:
[ WARN ] Imported library 'SwingLibrary' contains no keywords
==============================================================================
Mytest
==============================================================================
MyTestCase | FAIL |
No keyword with name 'Start Application' found.
or
[ ERROR ] Error in file 'mytest.txt': Importing test library 'SwingLibrary' failed: ImportError: No module named SwingLibrary
You can use the standalone jar without the -jar option, which lets you specify the classpath in the standard manner. The main class of the standalone jar is org.robotframework.RobotFramework, so the syntax would be
java -cp robotframework-2.8.4.jar:Lib/swinglibrary-1.8.0.jar:Lib/MyApp.jar org.robotframework.RobotFramework mytest.txt
This is slightly more verbose, but it is standard and so avoids any oddities caused by the non-standard -Xbootclasspath option.
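If you launch the tests from a script, the same command line can be assembled programmatically. This is only a sketch: java_command is a hypothetical helper, and the jar names are the ones from the question. It mainly illustrates that the classpath separator is ':' on Unix-like systems and ';' on Windows.

```python
import os


def java_command(jars, main_class, args=(), separator=os.pathsep):
    """Build an argv list that runs main_class with all jars on the classpath."""
    # os.pathsep is ':' on Unix-like systems and ';' on Windows, which is
    # also the separator the java launcher expects on each platform.
    classpath = separator.join(jars)
    return ["java", "-cp", classpath, main_class] + list(args)


# The separator is passed explicitly here only to make the output predictable.
command = java_command(
    ["robotframework-2.8.4.jar", "Lib/swinglibrary-1.8.0.jar", "Lib/MyApp.jar"],
    "org.robotframework.RobotFramework",
    ["mytest.txt"],
    separator=":",
)
print(" ".join(command))
```

The resulting list can be handed to subprocess.call as-is, which avoids shell quoting problems with paths that contain spaces.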

JDBC driver not found error in monkeyrunner/jython

I need to insert something into the DB. I'm using JDBC as the connector, Jython for the script, and MySQL as the DB; the script runs on CentOS.
My code looks something like this:
from com.android.monkeyrunner import MonkeyRunner, MonkeyDevice, MonkeyImage
from com.ziclix.python.sql import zxJDBC
db=zxJDBC.connect("jdbc:mysql://XXX.XXX.XXX.XXX:3306/dbname","USER","PASSWORD","org.gjt.mm.mysql.Driver")
c=db.cursor()
c.execute("INSERT INTO tablename values ('X','X','X')")
Before that, I downloaded the MySQL connector archive and extracted it to the desktop.
I added the path to the classpath by doing this:
export PATH=/home/XX/Desktop/mysql-connector-java-5.1.22
When I ran the script, it gave me this error:
zxJDBC.DatabaseError.driver [org.gjt.mm.mysql.Driver] not found
What have I done wrong? Is the driver name correct? I just copied it from one of the tutorials I've seen. Or did I perhaps not install the driver correctly?
Thanks.
This is how I managed to solve the error:
1. Download the JDBC driver.
2. Extract the tar.gz file anywhere you want.
3. You will find mysql-connector-java-5.1.22-bin.jar inside that folder. Copy it to (in my case) /%android-sdk%/tools/lib
4. Add the new location of mysql-connector-java-5.1.22-bin.jar to the classpath.
5. Write the script like this:
from com.android.monkeyrunner import MonkeyRunner, MonkeyDevice, MonkeyImage
from com.ziclix.python.sql import zxJDBC
db=zxJDBC.connect("jdbc:mysql://XXX.XXX.XXX.XXX:3306/dbname","USER","PASSWORD","com.mysql.jdbc.Driver")
c=db.cursor()
c.execute("INSERT INTO tablename values ('X','X','X')")
db.commit()
Hope this helps to those who will need it in the future. :)
How are you running jython? If you're using the standalone install, i.e. java -jar jython.jar, then from the Java Documentation ...
-jar
When you use this option, the JAR file is the source of all user classes, and other user class path settings are ignored.
... you can't add anything to the classpath. Repackaging the required classes into the Jython jar is one approach. An alternative is to add jython.jar to the classpath too (using either -cp or CLASSPATH) and run the org.python.util.jython class directly.
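To check a launcher command for this pitfall, a small script can scan the arguments. The sketch below is a simplification under stated assumptions: classpath_ignored is a hypothetical helper, and it only knows about the classpath flags and treats every other option as taking no separate value. It reports whether a java command line combines -jar with a classpath option that will be ignored.

```python
def classpath_ignored(argv):
    """Return True if a java command combines -jar with a classpath option."""
    classpath_flags = {"-cp", "-classpath", "--class-path"}
    has_classpath = False
    tokens = iter(argv[1:])  # skip the "java" executable itself
    for token in tokens:
        if token in classpath_flags:
            has_classpath = True
            next(tokens, None)  # skip the classpath value that follows
        elif token == "-jar":
            # With -jar, the jar is the only source of user classes.
            return has_classpath
        elif not token.startswith("-"):
            # Main class reached; everything after it is a program argument.
            return False
    return False


with_jar = ["java", "-cp", "mysql-connector-java-5.1.22-bin.jar",
            "-jar", "jython.jar", "script.py"]
with_main_class = ["java", "-cp", "jython.jar:mysql-connector-java-5.1.22-bin.jar",
                   "org.python.util.jython", "script.py"]
print(classpath_ignored(with_jar))
print(classpath_ignored(with_main_class))
```

Note that the check covers only command-line flags; a CLASSPATH environment variable is ignored by -jar in the same way.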
I had the same problem on Windows 7 and solved it like this:
1. Download the JDBC driver.
2. Add mysql-connector-java-ver-bin.jar to the environment variables, for example:
CLASSPATH : C:\xxx-path\mysql-connector-java-5.1.41-bin.jar
That solved the problem.

Jruby log4j integration

I am currently working on integrating the Java application General Architecture for Text Engineering (GATE) with a Rails application using JRuby. While integrating JRuby with log4j, I get the following error:
0 [main] DEBUG Main.class - Hello world
gate/Gate.java:80:in `<clinit>': java.lang.NoClassDefFoundError: org/apache/log4j/Logger (NativeException)
from gateapp/Main.java:86:in `main'
from test.rb:12
test.rb is the name of the Ruby program.
I tried importing all the Apache log4j libraries and included the class file in test.rb.
When I run the Java program alone, it runs fine. But when I generate the jar file and include it in the Ruby file (test.rb), the java.lang.NoClassDefFoundError: org/apache/log4j/Logger (NativeException) error occurs. How can I deal with this problem?
You need to make sure the log4j JAR file is in your classpath. One way to do this is to set the CLASSPATH variable in your environment. Another way would be to call require in your ruby code like
require "/some/path/MyStuff.jar"
Here is my configuration for setting it up with the Couchbase Java SDK:
include Java
def setup_log4j
  java::lang.System.setProperty("net.spy.log.LoggerImpl", "net.spy.memcached.compat.log.Log4JLogger")
  fa = Java::OrgApacheLog4j::FileAppender.new
  fa.setName("FileLogger")
  fa.setFile("./log/#{Rails.env}.log")
  fa.setLayout(Java::OrgApacheLog4j::PatternLayout.new("%d %-5p [%c{1}] %m%n"))
  fa.setThreshold(Java::OrgApacheLog4j::Level::INFO)
  fa.setAppend(true)
  fa.activateOptions
  Java::OrgApacheLog4j::Logger.getRootLogger.addAppender(fa)
end
Note that I required the log4j.jar file earlier.
It is also worth mentioning that there is a project named log4jruby.

How to register a JDBC driver using jruby-complete.jar?

I'm trying to write a script that is executed with the jruby-complete.jar like so:
java -cp derby.jar; -Djdbc.drivers=org.apache.derby.jdbc.EmbeddedDriver -jar jruby-complete.jar -S my_script.rb
I'm using JVM 1.6.0_11 and JRuby 1.4.
In my jruby script I attempt to connect to the database like this.
connection = Java::java.sql.DriverManager.getConnection("jdbc:derby:path_to_my_DB")
This throws a java.sql.SQLException: "No suitable driver found" exception.
I've tried manually loading the driver into the class loader using Class.forName which gives me the same error.
It looks to me like the class loader used by the DriverManager is not the same as the current thread's. I've tried setting the current thread's class loader using:
JThread = java.lang.Thread
...
class_loader = JavaLang::URLClassLoader.new(
  [JavaLang::URL.new("jar:file:/derby.jar!/")].to_java(JavaLang::URL),
  JRuby.runtime.jruby_class_loader)
JThread.currentThread().setContextClassLoader(class_loader)
But this doesn't help.
Any ideas?
OK I downloaded jruby-complete.jar and had a go....
This seems to work for me:
java -classpath c:\ruby\db-derby-10.5.3.0-bin\lib\derby.jar;jruby-complete-1.4.0.jar org.jruby.Main -S derby.rb
When using the -jar switch, the -classpath option is ignored (maybe the CLASSPATH shell var is too). But on the above line, we put both required jars on the class path and pass the class name to execute (i.e. org.jruby.Main). The script being passed in is as per my other answer.
Another option (which I have not tried) would be to alter the jruby-complete.jar manifest file to specify a Class-Path entry, as described here:
Adding Classes to the JAR File's Classpath
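Before editing a manifest, it can help to see what it currently declares. The following sketch is not part of the answer above: read_manifest is a hypothetical helper, and the in-memory jar with its Class-Path value is made up so that the example runs without JRuby or Derby installed. It reads the main attributes of META-INF/MANIFEST.MF.

```python
import io
import zipfile


def read_manifest(jar_file):
    """Parse the main attributes of META-INF/MANIFEST.MF into a dict."""
    with zipfile.ZipFile(jar_file) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    attributes = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key is not None:
            # Long values are wrapped onto continuation lines with one leading space.
            attributes[last_key] += line[1:]
        elif ": " in line:
            last_key, value = line.split(": ", 1)
            attributes[last_key] = value
    return attributes


# Build a tiny stand-in jar in memory; a real jar path works the same way.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr(
        "META-INF/MANIFEST.MF",
        "Manifest-Version: 1.0\r\n"
        "Main-Class: org.jruby.Main\r\n"
        "Class-Path: derby.jar\r\n"
        "\r\n",
    )
buffer.seek(0)
manifest = read_manifest(buffer)
print(manifest["Main-Class"], manifest.get("Class-Path"))
```

Class-Path entries are resolved relative to the location of the jar that declares them, so derby.jar would need to sit next to jruby-complete.jar in this made-up layout.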
First, make sure your driver jar is not corrupted (this once cost me a couple of days).
Second, read about JRuby/Java class loading on the JRuby Wiki.
Third (because I haven't played with "jruby-complete"), try this simple script and then see if you can adapt it as needed.
require 'java'
require 'C:\ruby\db-derby-10.5.3.0-bin\lib\derby.jar' # adjust for your machine
include_class "java.sql.DriverManager"
derby = org.apache.derby.jdbc.EmbeddedDriver.new
connection = DriverManager.getConnection("jdbc:derby:derbyDB;create=true")