Sqoop export CSV to MySQL fails

I have a CSV file in HDFS with lines like:
"2015-12-01","Augusta","46728.0","1"
I am trying to export this file to a MySQL table:
CREATE TABLE test.events_top10(
dt VARCHAR(255),
name VARCHAR(255),
summary VARCHAR(255),
row_number VARCHAR(255)
);
With the command:
sqoop export --table events_top10 --export-dir /user/hive/warehouse/result --escaped-by \" --connect ...
This command fails with error:
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: Can't parse input data: '2015-12-02,Ashburn,43040.0,9'
at events_top10.__loadFromFields(events_top10.java:335)
at events_top10.parse(events_top10.java:268)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:834)
at events_top10.__loadFromFields(events_top10.java:320)
... 12 more
If I do not use the --escaped-by \" parameter, then the MySQL table contains rows like this:
"2015-12-01" | "Augusta" | "46728.0" | "1"
Could you please explain how to export a CSV file to a MySQL table without the double quotes?

I have to use both --escaped-by '\\' and --enclosed-by '\"'.
So the correct command is:
sqoop export --table events_top10 --export-dir /user/hive/warehouse/result --escaped-by '\\' --enclosed-by '\"' --connect ...
For more information, please see the official Sqoop documentation.
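As a quick sanity check after the export (this is not part of the original answer; it assumes a local MySQL client and the test database created above), the rows should now come back without the surrounding quotes:
mysql -u root -p -e 'SELECT * FROM test.events_top10 LIMIT 5;'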

Related

Kafka Connect with MySQL for CDC: error Aborting snapshot due to error when last running 'UNLOCK TABLES'

I was trying to capture MySQL changes with a Kafka console consumer, following this tutorial.
In the MySQL config file my.cnf, I have added server-id as 0 [found by this command: mysqld --verbose --help].
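(For reference, this is the kind of my.cnf fragment Debezium's MySQL connector setup generally relies on; the server-id value and the other settings below are illustrative, not taken from the original post:)
[mysqld]
server-id        = 184054
log_bin          = mysql-bin
binlog_format    = ROW
binlog_row_image = FULL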
So, when creating the connector with this:
curl -i -X POST -H "Accept:application/json" \
-H "Content-Type:application/json" http://localhost:8083/connectors/ \
-d '{
"name": "mysql-connector8",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "localhost",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": 1,
"database.server.name": "tigerhrm",
"database.history.kafka.bootstrap.servers": "localhost:9092",
"database.history.kafka.topic": "dbhistory.demo3" ,
"include.schema.changes": "true",
"tasks.max":1
}
}'
When I set server.id=0, matching my my.cnf file, it gives an error; but when I change it to any random number, the connector is created but then fails with the following error:
[2020-02-06 11:58:25,190] INFO USE flyDB (io.debezium.connector.mysql.SnapshotReader:803)
[2020-02-06 11:58:25,191] INFO CREATE TABLE oauth_access_token ( token_id varchar(255) DEFAULT NULL, token blob, authentication_id varchar(255) NOT NULL, user_name varchar(255) DEFAULT NULL, client_id varchar(255) DEFAULT NULL, authentication blob, refresh_token varchar(255) DEFAULT NULL, PRIMARY KEY (authentication_id) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 (io.debezium.connector.mysql.SnapshotReader:803)
[2020-02-06 11:58:25,192] INFO Step 7: committing transaction (io.debezium.connector.mysql.SnapshotReader:611)
[2020-02-06 11:58:25,193] INFO Step 8: releasing global read lock to enable MySQL writes (io.debezium.connector.mysql.SnapshotReader:625)
[2020-02-06 11:58:25,193] INFO Writes to MySQL tables prevented for a total of 00:00:00.684 (io.debezium.connector.mysql.SnapshotReader:635)
[2020-02-06 11:58:25,193] ERROR Failed due to error: Aborting snapshot due to error when last running 'UNLOCK TABLES': com/mysql/jdbc/CharsetMapping (io.debezium.connector.mysql.SnapshotReader:162)
org.apache.kafka.connect.errors.ConnectException: com/mysql/jdbc/CharsetMapping
at io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:183)
at io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:161)
at io.debezium.connector.mysql.SnapshotReader.execute(SnapshotReader.java:665)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: com/mysql/jdbc/CharsetMapping
at io.debezium.connector.mysql.MySqlValueConverters.charsetFor(MySqlValueConverters.java:300)
at io.debezium.connector.mysql.MySqlValueConverters.converter(MySqlValueConverters.java:270)
at io.debezium.relational.TableSchemaBuilder.createValueConverterFor(TableSchemaBuilder.java:330)
at io.debezium.relational.TableSchemaBuilder.convertersForColumns(TableSchemaBuilder.java:259)
at io.debezium.relational.TableSchemaBuilder.createKeyGenerator(TableSchemaBuilder.java:148)
at io.debezium.relational.TableSchemaBuilder.create(TableSchemaBuilder.java:127)
at io.debezium.connector.mysql.MySqlSchema.lambda$applyDdl$3(MySqlSchema.java:365)
at java.lang.Iterable.forEach(Iterable.java:75)
at io.debezium.connector.mysql.MySqlSchema.applyDdl(MySqlSchema.java:360)
at io.debezium.connector.mysql.SnapshotReader.lambda$execute$9(SnapshotReader.java:422)
at io.debezium.jdbc.JdbcConnection.query(JdbcConnection.java:389)
at io.debezium.jdbc.JdbcConnection.query(JdbcConnection.java:344)
at io.debezium.connector.mysql.SnapshotReader.execute(SnapshotReader.java:420)
... 1 more
where oauth_access_token is a table in my DB [there exist multiple DBs].
Checking the connectors' status, I got this:
curl -s "http://localhost:8083/connectors" | jq '.[]' | xargs -I {mysql-connector} curl -s "http://localhost:8083/connectors/mysql-connector/status" | jq -c -M '[.name,.connector.state,.tasks[].state] |
join(":|:")' | column -s : -t | tr -d \" | sort
mysql-connector | RUNNING | FAILED
mysql-connector | RUNNING | FAILED
mysql-connector | RUNNING | FAILED
mysql-connector | RUNNING | FAILED
mysql-connector | RUNNING | FAILED
mysql-connector | RUNNING | FAILED
mysql-connector | RUNNING | FAILED
mysql-connector | RUNNING | FAILED
The last column is the task status. Hence no database change is detected in the topics, where topics should have been created, named after the tables that exist in the DB. How do I solve this?
Update: when I delete that table from the DB, the error occurs for another table, and so on! All services are running on my local Ubuntu machine. Any kind of help is much appreciated!

I am getting an exception when importing MySQL data to HDFS

I am trying to import MySQL data into HDFS, but I am getting an exception.
I have a table (products) in MySQL and I am using the following command to import the data into HDFS:
bin/sqoop-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table products --target-dir /user/nitin/products
I am getting the following exception:
Error: java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.sql.SQLException: Unknown type '246 in column 2 of 3 in binary-encoded result set.
at com.mysql.jdbc.MysqlIO.extractNativeEncodedColumn(MysqlIO.java:3710)
at com.mysql.jdbc.MysqlIO.unpackBinaryResultSetRow(MysqlIO.java:3620)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1282)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:335)
at com.mysql.jdbc.RowDataDynamic.<init>(RowDataDynamic.java:68)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:416)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:1899)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1347)
at com.mysql.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:1393)
at com.mysql.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:958)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1705)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
... 12 more
I have also used this command to import data into HDFS:
bin/sqoop-import --connect jdbc:mysql://localhost:3306/test?zeroDateTimeBehavior=convertToNull --username root --password root --table products --target-dir /user/nitin/product
MapReduce job failed.
It's because of a data type conversion issue.
Try using the --map-column-java option to define the column data type mapping explicitly.
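A minimal sketch of that suggestion (the column name price is hypothetical; type 246 in the MySQL binary protocol is DECIMAL, so the failing column is most likely a DECIMAL):
bin/sqoop-import --connect jdbc:mysql://localhost:3306/test \
  --username root --password root \
  --table products \
  --map-column-java price=String \
  --target-dir /user/nitin/products_mapped
The mapping syntax is column=JavaType; multiple columns can be listed, separated by commas.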

Unable to export JSON data from HDFS to Oracle using Sqoop export

While doing a Sqoop export, it can't parse the input data; I am seeing the exception below.
java.lang.RuntimeException: Can't parse input data: '"DeptId":888'
Caused by: java.lang.NumberFormatException
On the Oracle side, DeptId is a NUMBER data type.
sqoop export \
--connect "jdbc:oracle:thin:#(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=wcx2-scan..com)(PORT=))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=***)))" \
--table API.CUSTOMER \
--columns 'Id','DeptId','Strategy','RCM_PROD_MSG_TXT' \
--export-dir /tmp/test \
--map-column-java RCM_PROD_MSG_TXT=String \
--username ********* \
--password ********** \
--input-fields-terminated-by ',' \
--input-null-string '\N' \
--input-null-non-string '\N'
Sample JSON data:
{"Id":"27952436","DeptId":888,"Strategy":"syn-cat-recs","recs":
[629848,1029280]}
I need to make sure the data gets loaded into the Oracle table.
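For what it's worth, Sqoop export with --input-fields-terminated-by ',' parses plain delimited text, not JSON, so the records would first have to be flattened into delimited rows. A rough sketch with jq, covering only the scalar fields present in the sample (the input path is a placeholder):
jq -r '[.Id, .DeptId, .Strategy] | @csv' /tmp/test/sample.json
Note that @csv double-quotes string values, so the export command would also need --input-optionally-enclosed-by '\"' to strip them.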

ERROR exec.DDLTask: java.lang.NoSuchMethodError

I was importing data from MySQL to Hive using Sqoop:
sqoop import --connect jdbc:mysql://localhost:3306/DATASET -username root -P --table MATCHES --hive-import
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
18/11/25 11:42:58 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
Do you have the jackson-databind jar in your Hive lib directory? Check it once.
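A quick way to check (assuming HIVE_HOME points at your Hive installation):
ls $HIVE_HOME/lib | grep -i jackson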

Using Sqoop to update a Hive table

I am trying to sqoop data out of a MySQL database where I have a table with both a primary key and a last_updated field. I am essentially trying to get all records that were recently updated and overwrite the current records in the Hive warehouse.
I have tried the following command:
sqoop job --create trainingDataUpdate -- import \
--connect jdbc:mysql://localhost:3306/analytics \
--username user \
--password-file /sqooproot.pwd \
--incremental lastmodified \
--check-column last_updated \
--last-value '2015-02-13 11:08:18' \
--table trainingDataFinal \
--merge-key id \
--direct --hive-import \
--hive-table analytics.trainingDataFinal \
--null-string '\\N' \
--null-non-string '\\N' \
--map-column-hive last_updated=TIMESTAMP
and I get the following error:
15/02/13 14:07:41 INFO hive.HiveImport: FAILED: SemanticException Line 2:17 Invalid path ''hdfs://dev.cluster.com:8020/user/hdfs/_sqoop/13140640000000520_32226_hwhjobdev_cluster.com_trainingDataFinal'': No files matching path hdfs://dev.cluster.com:8020/user/hdfs/_sqoop/13140640000000520_32226_dev.cluster.com_trainingDataFinal
15/02/13 14:07:42 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 64
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:385)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:335)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I thought that by including --merge-key it would be able to overwrite the old records with the new records. Does anyone know if this is possible in Sqoop?
I don't think Sqoop can do it.
--merge-key is only used by sqoop-merge, not by import.
Also see http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1764421
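For reference, the standalone merge tool mentioned above looks roughly like this; the HDFS paths are placeholders, and the jar and class are the ones Sqoop code-generates during the import:
sqoop merge \
  --new-data /user/hdfs/trainingDataFinal_new \
  --onto /user/hdfs/trainingDataFinal_current \
  --target-dir /user/hdfs/trainingDataFinal_merged \
  --jar-file trainingDataFinal.jar \
  --class-name trainingDataFinal \
  --merge-key id
Records in --new-data take precedence over records with the same merge key in --onto, and the combined dataset is written to --target-dir.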