How to select spark library with our --jars? - mysql

There are multiple versions of the MySQL connector library.
/usr/share/java/mysql-connector-java-5.1.46.jar
/usr/share/java/mysql-connector-java.jar
/usr/share/java/mariadb-connector-java.jar
/usr/share/java/mysql-connector-java-8.0.24.jar
I added the external jar library path below (spark-defaults.conf):
- spark.driver.extraClassPath : ~~~:/usr/share/java/*
- spark.executor.extraClassPath : ~~~:/usr/share/java/*
If I run the spark-submit command without --jars {specific mysql connector}, which MySQL connector version is used? Where can I find this (Spark history server?)
ex)
jdbc = spark.read.format("jdbc")\
    .option("driver", "com.mysql.jdbc.Driver")\
    .option("url", "jdbc:mysql://url:3306/db")\
    .option("user", "XXX")\
    .option("password", "XXX")\
    .option("dbtable", "table")\
    .load()
jdbc.show()

The Environment tab of the Spark Web UI contains a section "Classpath entries". Here you can find all jars that have been added to the classpath of the currently running Spark application and so identify the jar (and the version) of the MySQL connector.
If a history server is running, its Web UI also contains the same information after the Spark job has finished.
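If you want to confirm it from inside the application, a small PySpark sketch (assuming an existing SparkSession named spark and the com.mysql.jdbc.Driver class from your example) can ask the driver JVM which jar the class was actually loaded from:
# Sketch: print the jar that com.mysql.jdbc.Driver was loaded from on the driver JVM.
# Assumes a running SparkSession named `spark`; the output is a file: URL such as
# file:/usr/share/java/mysql-connector-java-5.1.46.jar
cls = spark.sparkContext._jvm.java.lang.Class.forName("com.mysql.jdbc.Driver")
print(cls.getProtectionDomain().getCodeSource().getLocation())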

Related

Adding Spark CSV dependency to Zeppelin

I'm running an EMR cluster with Spark on AWS.
The Spark version is 1.6.
When running the following command:
proxy = sqlContext.read.load("/user/zeppelin/ProxyRaw.csv",
                             format="com.databricks.spark.csv",
                             header="true",
                             inferSchema="true")
I get the following error:
Py4JJavaError: An error occurred while calling o162.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at
http://spark-packages.org
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
How can I solve this? I assume I should add a package but how do I install it and where?
There are many ways to add packages in Zeppelin:
One of them is to change the conf/zeppelin-env.sh configuration file, adding the package you need (e.g. com.databricks:spark-csv_2.10:1.4.0 in your case) to the submit options, since Zeppelin uses the spark-submit command under the hood:
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.4.0"
But let's say that you don't actually have access to that configuration. You can then use Dynamic Dependency Loading via the %dep interpreter (deprecated):
%dep
z.load("com.databricks:spark-csv_2.10:1.4.0")
This will require that you load the dependencies before launching or restarting the interpreter.
Another way to do it is to add the dependency you need via the interpreter dependency manager, as described in the following link: Dependency Management for Interpreter.
Well,
First you need to download the CSV lib from the Maven repository:
https://mvnrepository.com/artifact/com.databricks/spark-csv_2.10/1.5.0
Check the Scala version that you are using, whether it is 2.10 or 2.11.
When you call spark-shell, spark-submit, or pyspark, or even your Zeppelin notebook, you need to add the option --jars with the path to your lib.
Like this:
pyspark --jars /path/to/jar/spark-csv_2.10-1.5.0.jar
Then you can call it as you did above.
You can see another closely related issue here: How to add third party java jars for use in pyspark

How to add a JDBC driver to a Jenkins pipeline?

I want to create a database within a pipeline script to be used by the deployed app. But first I started testing the connection. I got this problem:
java.sql.SQLException: No suitable driver found for jdbc:mysql://mysql:3306/test_db
I have the database plugin and the MySQL database plugin installed.
How do I get the JDBC driver?
import groovy.sql.Sql
node {
    def sql = Sql.newInstance("jdbc:mysql://mysql:3306/test_db", "user", "passwd", "com.mysql.jdbc.Driver")
    def rows = sql.execute "select count(*) from test_table;"
    echo rows.dump()
}
Update after albciff's answer:
My versions of:
Jenkins = 2.19.1
Database plugin = 1.5
Mysql database plugin = 1.1
The latest test script:
import groovy.sql.Sql
Class.forName("com.mysql.jdbc.Driver")
Which throws:
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
From the MySQL DataBase Plugin documentation you can see that jdbc drivers for MySQL are included:
Note that MySQL JDBC driver is under GPLv2 with FOSS exception. This
plugin by itself qualifies under the FOSS exception, but if you are
redistributing this plugin, please do check the license terms.
Drizzle(+MySQL) Database Plugin is available as an alternative to this
plugin, and that one is under the BSD license.
More concretely, the current latest version (1.1) of this plugin contains connector version 5.1.38:
Version 1.1 (May 21, 2016) mysql-connector version 5.1.38
So probably, in order to have the driver available, you have to force the driver to be registered.
To do so, use Class.forName("com.mysql.jdbc.Driver") before instantiating the connection in your code:
import groovy.sql.Sql
node {
    Class.forName("com.mysql.jdbc.Driver")
    def sql = Sql.newInstance("jdbc:mysql://mysql:3306/test_db", "user", "passwd", "com.mysql.jdbc.Driver")
    def rows = sql.execute "select count(*) from test_table;"
    echo rows.dump()
}
UPDATE:
In order to have the JDBC connector classes available in Jenkins pipeline Groovy scripts, you need to update the DataBase plugin to the current latest version:
Version 1.5 (May 30, 2016) Pipeline Support
You can simply add the Java connector to the Java class path.
If Jenkins is running on Java < 9, you will probably find the right place in something like this:
<java_home>/jre/lib/ext
If Jenkins is running on Java >= 9, you will probably find the right place in something like this:
/usr/share/jenkins/jenkins.war
To find your paths you can check:
http://your.jenkins.host/systemInfo (or navigate to the system info page via the GUI) and search for java.ext.dirs or java.class.path
http://your.jenkins.host/script (running a console script such as System.getProperty("java.ext.dirs") or System.getProperty("java.class.path"))
This snippet can help you with the jenkins.war approach when running inside Docker:
#adding extra jars to default jenkins java classpath (/usr/share/jenkins/jenkins.war)
RUN sudo mkdir -p /usr/share/jenkins/WEB-INF/lib/
RUN whereis jar #just to find full jar command classpath to use with sudo
COPY ./jar-ext/groovy/mysql-connector-java-8.0.21.jar /usr/share/jenkins/WEB-INF/lib/
RUN cd /usr/share/jenkins && sudo /opt/java/openjdk/bin/jar -uvf jenkins.war ./WEB-INF/lib/mysql-connector-java-8.0.21.jar
For Jenkins running on Java >= 9, add the JDBC drivers under ${JENKINS_HOME}/war/WEB-INF/lib and under the --webroot directory.
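A minimal sketch of that copy step, assuming the exploded-war layout and the connector jar from the Docker snippet above (Jenkins usually needs a restart afterwards):
cp mysql-connector-java-8.0.21.jar ${JENKINS_HOME}/war/WEB-INF/lib/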

No suitable driver found for jdbc in Spark

I am using
df.write.mode("append").jdbc("jdbc:mysql://ip:port/database", "table_name", properties)
to insert into a table in MySQL.
Also, I have added Class.forName("com.mysql.jdbc.Driver") in my code.
When I submit my Spark application:
spark-submit --class MY_MAIN_CLASS \
  --master yarn-client \
  --jars /path/to/mysql-connector-java-5.0.8-bin.jar \
  --driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar \
  MY_APPLICATION.jar
This yarn-client mode works for me.
But when I use yarn-cluster mode:
spark-submit --class MY_MAIN_CLASS \
  --master yarn-cluster \
  --jars /path/to/mysql-connector-java-5.0.8-bin.jar \
  --driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar \
  MY_APPLICATION.jar
It doesn't work. I also tried setting "--conf":
spark-submit --class MY_MAIN_CLASS \
  --master yarn-cluster \
  --jars /path/to/mysql-connector-java-5.0.8-bin.jar \
  --driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar \
  --conf spark.executor.extraClassPath=/path/to/mysql-connector-java-5.0.8-bin.jar \
  MY_APPLICATION.jar
but still get the "No suitable driver found for jdbc" error.
I had to add the driver option when using the sparkSession's read function.
.option("driver", "org.postgresql.Driver")
var jdbcDF = sparkSession.read
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://<host>:<port>/<DBName>")
  .option("dbtable", "<tableName>")
  .option("user", "<user>")
  .option("password", "<password>")
  .load()
Depending on how your dependencies are set up, you'll notice that when you include something like compile group: 'org.postgresql', name: 'postgresql', version: '42.2.8' in Gradle, for example, this will include the Driver class at org/postgresql/Driver.class, and that's the one you want to instruct Spark to load.
There are 3 possible solutions:
You might want to assemble your application with your build manager (Maven, SBT), so you won't need to add the dependencies in your spark-submit CLI.
You can use the following option in your spark-submit CLI:
--jars $(echo ./lib/*.jar | tr ' ' ',')
Explanation: supposing that you have all your jars in a lib directory in your project root, this will read all the libraries and add them to the application submit.
You can also try to configure these 2 variables: spark.driver.extraClassPath and spark.executor.extraClassPath in the SPARK_HOME/conf/spark-defaults.conf file and specify the value of these variables as the path of the jar file. Ensure that the same path exists on worker nodes.
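For example, assuming the same connector path used in the question, the two entries in SPARK_HOME/conf/spark-defaults.conf would look roughly like this:
spark.driver.extraClassPath   /path/to/mysql-connector-java-5.0.8-bin.jar
spark.executor.extraClassPath /path/to/mysql-connector-java-5.0.8-bin.jar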
I tried the suggestions shown here, which didn't work for me (with MySQL). While debugging through the DriverManager code, I realized that I needed to register my driver, since this was not happening automatically with spark-submit. I therefore added:
Driver driver = new Driver();
The constructor registers the driver with the DriverManager, which solved the SQLException problem for me.

How to use Spark DataFrame with MySQL

OK, I know that I could use the JDBC connector to create a DataFrame with this command:
val jdbcDF = sqlContext.load("jdbc",
  Map("url" -> "jdbc:mysql://localhost:3306/video_rcmd?user=root&password=123456",
      "dbtable" -> "video"))
But I got this error: java.sql.SQLException: No suitable driver found for ...
And I have tried to add jdbc jar to spark_path with both commands but failed:
spark-shell --jars mysql-connector-java-5.0.8-bin.jar
SPARK_CLASSPATH=mysql-connector-java-5.0.8-bin.jar spark-shell
My Spark version is 1.3.0, while Class.forName("com.mysql.jdbc.Driver").newInstance works.
It is caused because the DataFrame does not find the MySQL Connector jar in the class path. This can be resolved by adding the jar to the Spark class path as below:
Edit /spark/bin/compute-classpath.sh as:
CLASSPATH="$CLASSPATH:$ASSEMBLY_JAR:yourPathToJar/mysql-connector-java-5.0.8-bin.jar"
Save the file and restart Spark.
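If editing compute-classpath.sh is not an option, a sketch of an alternative (not part of this answer, and assuming the same jar name as in the question) is to put the jar on the driver class path explicitly when starting the shell, since DriverManager can be picky about which classloader loaded the driver:
spark-shell --driver-class-path mysql-connector-java-5.0.8-bin.jar --jars mysql-connector-java-5.0.8-bin.jar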
You might want to try mysql-connector-java-5.1.29-bin.jar

Qt - Trying mysql driver

I followed this tutorial to compile the MySQL driver with VS2010:
Qt - How to get|compile Mysql driver.
The compilation fails with the error: LNK1123: failure during conversion to COFF: file invalid or corrupt
I tried with multiple versions of MySQL and Qt; I always get the same error.
Note: I am using Qt-4.8.4 and mysql-5.5.32-win32.
I would rebuild Qt from source, because you also need the SQL driver (not only the plugin).
The driver source is located under /src/sql/drivers/mysql
The plugin source is located under /src/plugins/sqldrivers
/src/sql/drivers/mysqldrivers.pri contains this:
contains(sql-drivers, all):sql-driver += psql mysql odbc oci tds db2 sqlite ibase
contains(sql-drivers, mysql):include($$PWD/mysql/qsql_mysql.pri)
So I think that you need to run configure with the options: -qt-sql-mysql and -plugin-sql-mysql before compiling Qt.
Recompile Qt
Open a Qt 4.8.4 Command Prompt
cd \qtdir
nmake distclean
configure -debug-and-release -platform win32-msvc2010 -mp -nomake examples -nomake demos -qt-sql-mysql -plugin-sql-mysql
nmake
You might need to point configure to the correct include/library directory for MySQL, by adding these options: -I "c:\path\to\mysql\include" and -L "c:\path\to\mysql\lib"
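For example, combining the configure line above with those options (the MySQL paths are placeholders, as above):
configure -debug-and-release -platform win32-msvc2010 -mp -nomake examples -nomake demos -qt-sql-mysql -plugin-sql-mysql -I "c:\path\to\mysql\include" -L "c:\path\to\mysql\lib"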