No suitable driver found for jdbc in Spark - mysql

I am using
df.write.mode("append").jdbc("jdbc:mysql://ip:port/database", "table_name", properties)
to insert into a table in MySQL.
Also, I have added Class.forName("com.mysql.jdbc.Driver") in my code.
When I submit my Spark application:
spark-submit --class MY_MAIN_CLASS \
  --master yarn-client \
  --jars /path/to/mysql-connector-java-5.0.8-bin.jar \
  --driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar \
  MY_APPLICATION.jar
This yarn-client mode works for me.
But when I use yarn-cluster mode:
spark-submit --class MY_MAIN_CLASS \
  --master yarn-cluster \
  --jars /path/to/mysql-connector-java-5.0.8-bin.jar \
  --driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar \
  MY_APPLICATION.jar
It doesn't work. I also tried setting "--conf":
spark-submit --class MY_MAIN_CLASS \
  --master yarn-cluster \
  --jars /path/to/mysql-connector-java-5.0.8-bin.jar \
  --driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar \
  --conf spark.executor.extraClassPath=/path/to/mysql-connector-java-5.0.8-bin.jar \
  MY_APPLICATION.jar
but still get the "No suitable driver found for jdbc" error.

I had to add the driver option when using the sparkSession's read function.
.option("driver", "org.postgresql.Driver")
val jdbcDF = sparkSession.read
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://<host>:<port>/<DBName>")
  .option("dbtable", "<tableName>")
  .option("user", "<user>")
  .option("password", "<password>")
  .load()
Depending on how your dependencies are set up, you'll notice that when you include something like compile group: 'org.postgresql', name: 'postgresql', version: '42.2.8' in Gradle, for example, this will include the Driver class at org/postgresql/Driver.class, and that's the one you want to instruct Spark to load.
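For the write path in the original question, the same idea applies; here is a minimal, hedged sketch that names the MySQL driver class in the connection properties (host, port, database, and credentials are placeholders, and df is assumed to be the DataFrame being appended):
import java.util.Properties
// Placeholder credentials; df is assumed to be the DataFrame being written.
val props = new Properties()
props.setProperty("user", "myUser")
props.setProperty("password", "myPassword")
// Naming the driver class explicitly avoids the DriverManager lookup failing on the cluster.
props.setProperty("driver", "com.mysql.jdbc.Driver")
df.write.mode("append").jdbc("jdbc:mysql://ip:port/database", "table_name", props)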

There are 3 possible solutions:
You might want to assemble your application with your build tool (Maven, SBT), so you won't need to add the dependencies in your spark-submit CLI.
You can use the following option in your spark-submit CLI:
--jars $(echo ./lib/*.jar | tr ' ' ',')
Explanation: assuming that you have all your jars in a lib directory in your project root, this will read all the libraries and add them to the application submission.
You can also try to configure these 2 variables, spark.driver.extraClassPath and spark.executor.extraClassPath, in the SPARK_HOME/conf/spark-defaults.conf file and set their values to the path of the jar file, as sketched below. Ensure that the same path exists on worker nodes.
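For illustration, a minimal sketch of the two entries in SPARK_HOME/conf/spark-defaults.conf, assuming the connector jar sits at the same path on the driver and on every worker node:
spark.driver.extraClassPath /path/to/mysql-connector-java-5.0.8-bin.jar
spark.executor.extraClassPath /path/to/mysql-connector-java-5.0.8-bin.jar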

I tried the suggestions shown here which didn't work for me (with mysql). While debugging through the DriverManager code, I realized that I needed to register my driver since this was not happening automatically with "spark-submit". I therefore added
Driver driver = new Driver();
The constructor registers the driver with the DriverManager, which solved the SQLException problem for me.
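For completeness, a minimal Scala sketch of explicit registration, assuming the Connector/J 5.x driver class com.mysql.jdbc.Driver is already on the classpath:
import java.sql.DriverManager
// Registering the driver explicitly so DriverManager can resolve jdbc:mysql URLs.
DriverManager.registerDriver(new com.mysql.jdbc.Driver())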

Related

How to select spark library with our --jars?

There are multiple versions of the MySQL connector library:
/usr/share/java/mysql-connector-java-5.1.46.jar
/usr/share/java/mysql-connector-java.jar
/usr/share/java/mariadb-connector-java.jar
/usr/share/java/mysql-connector-java-8.0.24.jar
I added the external jar library path below (spark-defaults.conf):
- spark.driver.extraClassPath : ~~~:/usr/share/java/*
- spark.executor.extraClassPath : ~~~:/usr/share/java/*
If I run the spark-submit command without --jars {specific mysql connector}, which MySQL connector version is used? Where can I find it? (Spark history server?)
ex)
jdbc = spark.read.format("jdbc")\
.option("driver", "com.mysql.jdbc.Driver")\
.option("url", "jdbc:mysql://url:3306/db")\
.option("user", "XXX")\
.option("password", "XXX")\
.option("dbtable", "table")\
.load()
jdbc.show()
The environment tab of the Spark Web UI contains a section "Classpath entries". Here you can find all jars that have been added to the classpath of the currently running Spark application and so identify the jar (and the version) of the mysql connector.
If a history server is running, its Web UI also contains the same information after the Spark job has finished.
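If you want to confirm at runtime which jar the driver class was actually loaded from, here is a small spark-shell sketch (it assumes the 5.x class name com.mysql.jdbc.Driver; getCodeSource can return null for classes loaded from the bootstrap classpath):
// Prints the location of the jar that provided the loaded driver class.
val driverClass = Class.forName("com.mysql.jdbc.Driver")
println(driverClass.getProtectionDomain.getCodeSource.getLocation)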

Adding Spark CSV dependency to Zeppelin

I'm running an EMR with a spark cluster on AWS.
Spark version is 1.6
When running the following command:
proxy = sqlContext.read.load("/user/zeppelin/ProxyRaw.csv",
format="com.databricks.spark.csv",
header="true",
inferSchema="true")
I get the following error:
Py4JJavaError: An error occurred while calling o162.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at
http://spark-packages.org
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
How can I solve this? I assume I should add a package but how do I install it and where?
There are many ways to add packages in Zeppelin:
One of them is to change the conf/zeppelin-env.sh configuration file, adding the package you need (e.g. com.databricks:spark-csv_2.10:1.4.0 in your case) to the submit options, since Zeppelin uses the spark-submit command under the hood:
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.4.0"
But let's say that you don't actually have access to that configuration. You can then use Dynamic Dependency Loading via the %dep interpreter (deprecated):
%dep
z.load("com.databricks:spark-csv_2.10:1.4.0")
This will require that you load the dependencies before launching or restarting the interpreter.
Another way to do it is to add the dependency you need via the interpreter dependency manager, as described in the following link: Dependency Management for Interpreter.
Well,
First you need to download the CSV lib from the Maven repository:
https://mvnrepository.com/artifact/com.databricks/spark-csv_2.10/1.5.0
Check the Scala version that you are using: either 2.10 or 2.11.
When you call spark-shell, spark-submit, or pyspark (or even Zeppelin), you need to add the --jars option with the path to your lib.
Like this:
pyspark --jars /path/to/jar/spark-csv_2.10-1.5.0.jar
Then you can call it as you did above.
You can see other close issue here: How to add third party java jars for use in pyspark

How to add a JDBC driver to a Jenkins pipeline?

I want to create a database within a pipeline script to be used by the deployed app. But first I started testing the connection. I got this problem:
java.sql.SQLException: No suitable driver found for jdbc:mysql://mysql:3306/test_db
I have the database plugin and the MySQL database plugin installed.
How do I get the JDBC driver?
import groovy.sql.Sql
node {
    def sql = Sql.newInstance("jdbc:mysql://mysql:3306/test_db", "user", "passwd", "com.mysql.jdbc.Driver")
    def rows = sql.execute "select count(*) from test_table;"
    echo rows.dump()
}
Update after albciff's answer:
My versions of:
Jenkins = 2.19.1
Database plugin = 1.5
Mysql database plugin = 1.1
The latest test script:
import groovy.sql.Sql
Class.forName("com.mysql.jdbc.Driver")
Which throws:
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
From the MySQL DataBase Plugin documentation you can see that jdbc drivers for MySQL are included:
Note that MySQL JDBC driver is under GPLv2 with FOSS exception. This
plugin by itself qualifies under the FOSS exception, but if you are
redistributing this plugin, please do check the license terms.
Drizzle(+MySQL) Database Plugin is available as an alternative to this
plugin, and that one is under the BSD license.
More concretely, the current latest version (1.1) of this plugin contains connector version 5.1.38:
Version 1.1 (May 21, 2016) mysql-connector version 5.1.38
So, in order to have the driver available, you probably have to force the driver to be registered.
To do so, use Class.forName("com.mysql.jdbc.Driver") before instantiating the connection in your code:
import groovy.sql.Sql
node {
    Class.forName("com.mysql.jdbc.Driver")
    def sql = Sql.newInstance("jdbc:mysql://mysql:3306/test_db", "user", "passwd", "com.mysql.jdbc.Driver")
    def rows = sql.execute "select count(*) from test_table;"
    echo rows.dump()
}
UPDATE:
In order to have the JDBC connector classes available in Jenkins pipeline Groovy scripts, you need to update the Database plugin to the latest version:
Version 1.5 (May 30, 2016) Pipeline Support
You can simply add the Java connector to the Java classpath.
If Jenkins is running on Java < 9, you will probably find the right place somewhere like:
<java_home>/jre/lib/ext
If Jenkins is running on Java >= 9, you will probably find the right place inside:
/usr/share/jenkins/jenkins.war
To find your paths you can check:
http://your.jenkins.host/systemInfo (or navigate to the system info page via the GUI) and search for java.ext.dirs or java.class.path
http://your.jenkins.host/script (running a console script such as System.getProperty("java.ext.dirs") or System.getProperty("java.class.path"))
This snippet can help you with the jenkins.war thing when running inside docker:
#adding extra jars to default jenkins java classpath (/usr/share/jenkins/jenkins.war)
RUN sudo mkdir -p /usr/share/jenkins/WEB-INF/lib/
RUN whereis jar #just to find full jar command classpath to use with sudo
COPY ./jar-ext/groovy/mysql-connector-java-8.0.21.jar /usr/share/jenkins/WEB-INF/lib/
RUN cd /usr/share/jenkins && sudo /opt/java/openjdk/bin/jar -uvf jenkins.war ./WEB-INF/lib/mysql-connector-java-8.0.21.jar
For Jenkins running on Java >= 9 add the jdbc drivers under ${JENKINS_HOME}/war/WEB-INF/lib and under the --webroot directory.

How to use Spark DataFrame with MySQL

OK, I have known that I could use jdbc connector to create DataFrame with this command:
val jdbcDF = sqlContext.load("jdbc",
Map("url" -> "jdbc:mysql://localhost:3306/video_rcmd?user=root&password=123456",
"dbtable" -> "video"))
But I got this error: java.sql.SQLException: No suitable driver found for ...
And I have tried to add jdbc jar to spark_path with both commands but failed:
spark-shell --jars mysql-connector-java-5.0.8-bin.jar
SPARK_CLASSPATH=mysql-connector-java-5.0.8-bin.jar spark-shell
My Spark version is 1.3.0, and Class.forName("com.mysql.jdbc.Driver").newInstance works.
This is caused because the DataFrame does not find the MySQL Connector jar in the classpath. It can be resolved by adding the jar to the Spark classpath as below:
Edit /spark/bin/compute-classpath.sh as:
CLASSPATH="$CLASSPATH:$ASSEMBLY_JAR:yourPathToJar/mysql-connector-java-5.0.8-bin.jar"
Save the file and restart Spark.
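If editing the classpath alone is not enough, naming the driver class explicitly in the load options may also help; a hedged sketch of the original call (it assumes the "driver" option is honored by this Spark version):
val jdbcDF = sqlContext.load("jdbc",
  Map("url" -> "jdbc:mysql://localhost:3306/video_rcmd?user=root&password=123456",
    "driver" -> "com.mysql.jdbc.Driver",
    "dbtable" -> "video"))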
You might want to try mysql-connector-java-5.1.29-bin.jar

How to register a JDBC driver using jruby-complete.jar?

I'm trying to write a script that is executed with the jruby-complete.jar like so:
java -cp derby.jar; -Djdbc.drivers=org.apache.derby.jdbc.EmbeddedDriver -jar jruby-complete.jar -S my_script.rb
I'm using JVM 1.6.0_11 and JRuby 1.4.
In my jruby script I attempt to connect to the database like this.
connection = java.sql.DriverManager.getConnection("jdbc:derby:path_to_my_DB")
This throws a java.sql.SQLException: "No suitable driver found" exception.
I've tried manually loading the driver into the class loader using Class.forName which gives me the same error.
It looks to me like the class loader being used by the DriverManager is not the same as the current thread's. I've tried setting the current thread's class loader using:
JThread = java.lang.Thread
...
class_loader = JavaLang::URLClassLoader.new(
[JavaLang::URL.new("jar:file:/derby.jar!/")].to_java(
JavaLang::URL),JRuby.runtime.jruby_class_loader)
JThread.currentThread().setContextClassLoader(class_loader )
But this doesn't help.
Any ideas?
OK I downloaded jruby-complete.jar and had a go....
This seems to work for me:
java -classpath c:\ruby\db-derby-10.5.3.0-bin\lib\derby.jar;jruby-complete-1.4.0.jar org.jruby.Main -S derby.rb
When using the -jar switch, the -classpath option is ignored (maybe the CLASSPATH shell var is too). But on the above line, we put both required jars on the class path and pass the class name to execute (i.e. org.jruby.Main). The script being passed in is as per my other answer.
Another option (which I have not tried) would be to alter the jruby-complete.jar manifest file to specify a classpath, as described here:
Adding Classes to the JAR File's Classpath
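For illustration, the added manifest entry might look roughly like this (a sketch only; it assumes derby.jar sits in the same directory as jruby-complete.jar):
Class-Path: derby.jar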
First, make sure your driver jar is not corrupted (this made me waste a couple of days one time).
Second, read this about JRuby/Java classloaders: JRuby Wiki
Third (because I haven't played with "jruby-complete"), try this simple script and then see if you can adapt it as you need.
require 'java'
require 'C:\ruby\db-derby-10.5.3.0-bin\lib\derby.jar' # adjust for your machine
include_class "java.sql.DriverManager"
derby = org.apache.derby.jdbc.EmbeddedDriver.new
connection = DriverManager.getConnection("jdbc:derby:derbyDB;create=true")