Sqoop functionality has been removed from DSE - mysql

I am new to Cassandra. I am trying to transfer my whole MySQL database to Cassandra using Sqoop. After completing the setup, I executed the following command:
bin/dse sqoop import-all-tables -m 1 --connect jdbc:mysql://127.0.0.1:3306/ABCDatabase --username root --password root --cassandra-thrift-host localhost --cassandra-create-schema --direct
I received the following error:
Sqoop functionality has been removed from DSE.
It says that Sqoop functionality has been removed from DataStax Enterprise. If it has been removed, is there another way to do this?
Thanks

You can use Spark to transfer the data - it should be easy, something like:
val table = spark.read.jdbc(jdbcUrl, "table", connectionProperties)
table.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "TBL", "keyspace" -> "KS"))
  .save()
Examples of JDBC URLs, options, etc. are described in the Databricks documentation, as they can differ from database to database.
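For the question's MySQL source, here is a minimal sketch of the setup those two lines assume (the URL and credentials are taken from the question; the driver class is an assumption that depends on your connector version):
import java.util.Properties

// JDBC source settings, reusing the values from the question
val jdbcUrl = "jdbc:mysql://127.0.0.1:3306/ABCDatabase"
val connectionProperties = new Properties()
connectionProperties.put("user", "root")
connectionProperties.put("password", "root")
// assumes the MySQL Connector/J jar is on the Spark classpath
connectionProperties.put("driver", "com.mysql.jdbc.Driver")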

Related

Can I use mysql database as destination storage for apache hudi

I am new to Apache Hudi. Please let me know whether Apache Hudi provides any configuration for writing data to a MySQL database.
If you want to write data from a Hudi table to MySQL, you can read the Hudi table as a dataframe and then use the Spark JDBC driver to write the data to MySQL:
# read hudi table
df = spark.read.format("hudi").load("/path/to/table")
# write dataframe to MySQL
df.write.format('jdbc').options(
    url='jdbc:mysql://<MySQL server url>/<database name>',
    driver='com.mysql.jdbc.Driver',
    dbtable='<table name>',
    user='<user name>',
    password='<password>'
).mode('append').save()
But if you want to use MySQL as the underlying storage for your Hudi tables, the answer is no. Hudi manages the storage layer of its datasets on an HCFS (Hadoop Compatible File System), and since MySQL is not an HCFS, Hudi cannot use it.
You can try out options like Sqoop, which is an interface for ETL of data from Hadoop (HDFS, Hive) to SQL tables and vice versa.
If we have a CSV/text-file table in Hive, Sqoop can easily export the table from HDFS to a MySQL (RDS) table with a command like the one below:
export \
  --table <mysql_table_name> \
  --export-dir hdfs:///user/hive/warehouse/csvdata \
  --connect jdbc:mysql://<host>:3306/<db_name> \
  --username <username> \
  --password-file hdfs:///user/test/mysql.password \
  --batch -m 1 \
  --input-null-string "\\N" \
  --input-null-non-string "\\N" \
  --columns <comma-separated column names, no whitespace between them>
Feel free to check out my question post on Sqoop here

Force sqoop to re-create hive schemas when importing

We are importing databases from MySQL to Hive using Sqoop (1.4.6). Everything works fine, except when table schemas get updated (mainly columns being added) in the source databases. The modifications do not end up in Hive. It seems that the Hive schema is created only once and not verified on each import. The rows are loaded fine, but of course they are missing the new columns. We can work around this by first dropping the databases to force a schema re-creation in Hive, but my question is: is there a way to do this from Sqoop directly?
Our import script resembles:
sqoop import-all-tables \
  --compress \
  --compression-codec=snappy \
  --connect "jdbc:mysql://HOST:PORT/DB" \
  --username USER \
  --password PASS \
  --hive-import \
  --hive-overwrite \
  --hive-database DB \
  --as-textfile
You can use an HCatalog table instead of Hive; it will work.
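A minimal sketch of that variant against the script above (assuming your Sqoop version supports the HCatalog options with import-all-tables; --hcatalog-database replaces the --hive-* flags):
sqoop import-all-tables \
  --compress \
  --compression-codec=snappy \
  --connect "jdbc:mysql://HOST:PORT/DB" \
  --username USER \
  --password PASS \
  --hcatalog-database DB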

Sqoop import all tables not syncing with Hive database

I imported MySQL database tables into Hive using the Sqoop tool, with the script below.
sqoop import-all-tables \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --hive-import \
  --hive-overwrite \
  --create-hive-table \
  --warehouse-dir=/user/hive/warehouse/
But when I check the databases in Hive, there is no retail.db.
If you want to import all tables into a specific Hive database (already created), use:
--hive-database retail
in your Sqoop command.
As dev said, if you want to sqoop everything into a particular Hive database, use --hive-database retail_db; otherwise every table will be sqooped under the default warehouse dir as /tablename.
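A sketch of the question's command with that flag added (assuming the retail database has already been created in Hive):
sqoop import-all-tables \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --hive-import \
  --hive-overwrite \
  --create-hive-table \
  --hive-database retail \
  --warehouse-dir=/user/hive/warehouse/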
Your command sqoops everything into this directory: /user/hive/warehouse/retail.db/
To import into Hive, use the --hive-import argument - and why are you using --as-textfile?
If you want to store the data as text files, use --as-textfile and then use Hive's external table command to create external tables over the imported files.
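A sketch of that external-table step (the table name, columns, and path are placeholders, not from this thread):
CREATE EXTERNAL TABLE orders (
  order_id INT,
  order_date STRING,
  order_status STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/orders';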

sqoop - trouble importing tables

I am new to Big Data and I don't know what is going on! Please note, I am learning this on my own.
I imported a table called sqooptest from MySQL, from a db called sqoopex8, using this command:
sqoop import \
--connect jdbc:mysql://localhost/sqoopex8 \
--table sqooptest \
-m 1
I don't know where it went (or imported to).
A bunch of errors were thrown, and honestly, I don't even know what to look for in the output. The last line says "16/04/23 01:46:52 ERROR tool.ImportTool: Error during import: Import job failed!" Again, I am in the learning phase and learning this all by myself, so please bear with me!
Now, I looked under /user/hduser/ and there is a folder named after the table (sqooptest), but there is nothing inside it.
Next, looking around on the Internet, I found out that MySQL saves all its databases in /var/lib/mysql. Apparently I didn't have access to it, so I had to access it from the terminal (CLI). When I did, I found all my databases there. Then I did this:
sqoop import \
--connect jdbc:mysql://localhost/sqoopex8 \
--table sqooptest \
--target-dir /var/lib/mysql \
-m 1
(I added --target-dir /var/lib/mysql.)
This worked for some reason. When I run hadoop fs -ls /var/lib/mysql, I see two files - _SUCCESS and part-m-00000. Why is that? Why did it not work the first time?
Also, on the first attempt, even when I specified an HDFS target with --target-dir /user/hduser, it didn't take it for some reason. When I gave the target as a local filesystem path, it took it. Why?
Sqoop needs an empty (non-existent) target path to save the files. The path you gave, /var/lib/mysql, was simply used as an HDFS path to save the imported files - it has nothing to do with the local directory of the same name.
/user/hduser might not have worked because that directory may already exist, or you may not have privileges to create one there. Try hadoop fs -mkdir to check.
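A quick way to run that check (the probe directory name is just an example):
hadoop fs -ls /user                   # does /user/hduser exist, and who owns it?
hadoop fs -mkdir /user/hduser/probe   # fails if you lack write permission there
hadoop fs -rm -r /user/hduser/probe   # remove the probe directory again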
For the _SUCCESS and part files, here is a nice post:
What are SUCCESS and part-r-00000 files in hadoop
By default, Sqoop imports the data from the RDBMS into HDFS under the HDFS user directory, e.g. /user/hduser/[tablename]/part-m-00000. To avoid this and store the data in a directory of our choosing, we pass it in the --target-dir clause. This path must already exist, except for the last directory in the path. For example, if --target-dir is /user/hduser/mydirectory/mytabledata, then /user/hduser/mydirectory/ must exist in HDFS; Sqoop will then create the directory (mytabledata here) under /user/hduser/mydirectory/ and import the RDBMS table data into it.
sqoop import \
--connect jdbc:mysql://localhost/sqoopex8 \
--table sqooptest \
--target-dir /path/to/your/desireddir \
-m 1
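A quick way to satisfy that precondition before running the import (the path is the placeholder from the command above):
hadoop fs -mkdir -p /path/to/your   # the parent must exist; Sqoop creates 'desireddir' itself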
Please check the documentation here.

Sqoop: Could not load mysql driver exception

I installed Sqoop on my local machine. The configuration information is as follows.
.bashrc:
export HADOOP_HOME=/home/hduser/hadoop
export HBASE_HOME=/home/hduser/hbase
export HIVE_HOME=/home/hduser/hive
export HCAT_HOME=/home/hduser/hive/hcatalog
export SQOOP_HOME=/home/hduser/sqoop
export PATH=$PATH:$HIVE_HOME/bin
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HBASE_HOME/bin
export PATH=$PATH:$SQOOP_HOME/bin
export PATH=$PATH:$HCAT_HOME/bin
Hadoop:
Version: Hadoop 1.0.3
Hive:
Version: Hive 0.11.0
MySQL Connector driver
version: mysql-connector-java-5.1.29
(The driver has been copied to the lib folder of Sqoop.)
Sqoop:
version: Sqoop 1.4.4
After completing the installation, I created a table in MySQL named practice_1. But when I run the load command to load data from MySQL into HDFS, the command throws an exception:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
Could anyone please guide me as to what the problem might be?
You need the database driver on the Sqoop classpath - check this; it has a wonderful explanation of Sqoop.
Sqoop also has other options, e.g.:
--driver com.microsoft.jdbc.sqlserver.SQLServerDriver -libjars=".*jar"
From here:
You can use Sqoop with any other JDBC-compliant database. First, download the appropriate JDBC driver for the type of database you want to import, and install the .jar file in the $SQOOP_HOME/lib directory on your client machine. (This will be /usr/lib/sqoop/lib if you installed from an RPM or Debian package.) Each driver .jar file also has a specific driver class which defines the entry-point to the driver. For example, MySQL's Connector/J library has a driver class of com.mysql.jdbc.Driver. Refer to your database vendor-specific documentation to determine the main driver class. This class must be provided as an argument to Sqoop with --driver.
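Applied to the asker's MySQL case, a sketch of what that looks like (the connect string and table follow the example later in this thread; -P, which prompts for the password, is just one of several ways to pass credentials):
sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://localhost:3306/test \
  --table practice_1 \
  --username root \
  -P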
You may be interested in understanding the difference between a connector and a driver - here is the article.
Another solution, which avoids using a shared library, is to add the driver jar to Sqoop's classpath via HADOOP_CLASSPATH. I haven't gotten the -libjars option to work. This solution also works on a secure cluster using Kerberos.
HADOOP_CLASSPATH=/use.case/lib/postgresql-9.2-1003-jdbc4.jar \
sqoop export --connect jdbc:postgresql://db:5432/user \
--driver org.postgresql.Driver \
--connection-manager org.apache.sqoop.manager.GenericJdbcManager \
--username user \
-P \
--export-dir /user/hive/warehouse/db1/table1 \
--table table2
This one works at least with sqoop 1.4.3-cdh4.4.0
You need to add the MySQL connector to /usr/lib/sqoop/lib.
The MySQL JDBC driver is not present in the Sqoop distribution by default, in order to ensure that the default distribution is fully Apache-license compliant.
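A sketch of that step (the jar version and source location are examples, not from this answer):
# copy the Connector/J jar into Sqoop's lib directory
cp mysql-connector-java-5.1.29-bin.jar /usr/lib/sqoop/lib/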
Hope this helps...!!!
Copy mysql-connector-java-5.1.41-bin.jar into the sqoop/lib folder and execute your Sqoop import statements.
If you have copied the MySQL driver to the Sqoop lib folder, it will work for sure. Make sure your Sqoop command is correct:
/home/hduser/sqoop/bin/sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table practice_1 -m 1
It's an Oozie ShareLib problem. The commands below work for me:
In the shell:
sudo -u hdfs hadoop fs -chown cloudera:cloudera /user/oozie/share/lib/lib_20170719053712/sqoop
hdfs dfs -put /var/lib/sqoop/mysql-connector-java.jar /user/oozie/share/lib/lib_20170719053712/sqoop
sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie/share/lib/lib_20170719053712/sqoop
oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
oozie admin -oozie http://localhost:11000/oozie -shareliblist sqoop
In the Hue Sqoop client:
sqoop list-tables --connect jdbc:mysql://localhost/retail_db --username root --password cloudera
More detail at:
https://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/
You need to grant privileges on the tables, as below:
grant all privileges on marksheet.* to 'root'@'192.168.168.1'
identified by 'root123';
flush privileges;
Here is a sample command that I have successfully executed:
sqoop import --verbose \
  --fields-terminated-by ',' \
  --connect jdbc:mysql://192.168.168.1/test \
  --username root \
  --password root123 \
  --table student \
  --hive-import \
  --create-hive-table \
  --hive-home /home/training/hive \
  --warehouse-dir /user/hive/warehouse \
  --hive-table studentmysql