Sqoop import of a big MySQL table to Hive: fixed mapper count, fewer rows per fetch

I want to Sqoop a big table from MySQL.
The table has about 300,000,000 rows and is in active use.
I want the import to run as quickly as possible without putting too much load on the system.
So my Sqoop command is:
sqoop import \
--connect jdbc:mysql://.... \
--username ... \
--password ... \
--query "SELECT * FROM table_name where \$CONDITIONS" \
--hive-import \
--hive-overwrite \
--hive-table ........ \
--create-hive-table \
--mapreduce-job-name .... \
--num-mappers $mapper_count \
--fetch-size $patch_count \
--split-by "account_id"
I want to move the whole table, 500,000 rows at a time, using 100 mappers.
I tried setting num-mappers=100 and fetch-size=500000, but only num-mappers took effect, so the production database is stressed while each mapper moves about 3,000,000 rows.
Please advise.
Thanks.
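For reference, a minimal sketch of the same command with the concrete values described above; the JDBC URL, credentials, and table names are placeholders. Note that --fetch-size only sets how many rows the MySQL JDBC driver pulls per round trip, while --num-mappers decides how the table is split, so 100 mappers over ~300,000,000 rows still means roughly 3,000,000 rows per mapper.
# Sketch only: connection details, credentials, and table names are placeholders.
# --num-mappers controls the number of parallel splits; --fetch-size controls
# the JDBC fetch size per round trip, not the number of rows per mapper.
sqoop import \
--connect jdbc:mysql://db-host/db_name \
--username db_user \
--password db_password \
--query "SELECT * FROM table_name WHERE \$CONDITIONS" \
--hive-import \
--hive-overwrite \
--hive-table target_db.big_table \
--create-hive-table \
--mapreduce-job-name big_table_import \
--num-mappers 100 \
--fetch-size 500000 \
--split-by "account_id"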

Related

Sqoop export leaving a long text column blank/null in MySQL

I am trying to export records from S3 to MySQL Aurora using sqoop export.
One of the columns in S3 is CLOB-like: it is long text, and an XML document is stored in it as a string.
When I run my Sqoop job it runs fine, but in MySQL this column value comes out as blank space rather than null.
Is there any way I can make the long text appear in the MySQL table as well?
This is my sqoop export:
sqoop export \
--direct \
--connect jdbc:mysql://abcd.amazonaws.com/FSP \
--username admin \
--password Welcome123 \
--table AUDIT_EVENT \
--export-dir s3://abcd/DMS/FSP/AUDIT_EVENT \
-num-mappers 25 \
--fields-terminated-by ',' \
--batch \
--input-lines-terminated-by '\n' \
-- --default-character-set=latin1
I did try to use this option as well:
--map-column-hive DETAILS=String
But when I select, I still see blank space in the table.
Use this and you will see the value:
--map-column-java DETAILS=String
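Putting that together, a minimal sketch of the full export with the Java column mapping added; the host, credentials, paths, and column name are taken from the question above, so treat them as placeholders.
# Sketch: the export from the question with --map-column-java added for the
# long-text column. Connection details and paths come from the question.
sqoop export \
--direct \
--connect jdbc:mysql://abcd.amazonaws.com/FSP \
--username admin \
--password Welcome123 \
--table AUDIT_EVENT \
--export-dir s3://abcd/DMS/FSP/AUDIT_EVENT \
--num-mappers 25 \
--fields-terminated-by ',' \
--batch \
--input-lines-terminated-by '\n' \
--map-column-java DETAILS=String \
-- --default-character-set=latin1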

How to use 'create-hive-table' with Sqoop correctly?

I'm trying to import data from a MySQL table into Hive using Sqoop. From what I understand, there are two ways of doing that:
1. Import the data into HDFS, then create an external table in Hive and load the data into it (a sketch of this approach follows the answer below).
2. Use create-hive-table while running the Sqoop query to create a new table in Hive and load the data into it directly. This is what I'm trying to do, but for some reason I can't get it to work.
This is my code:
sqoop import \
--connect jdbc:mysql://localhost/EMPLOYEE \
--username root \
--password root \
--table emp \
--m 1 \
--hive-database sqoopimport \
--hive-table sqoopimport.employee \
--create-hive-table \
--fields-terminated-by ',';
I tried using --hive-import as well but got an error.
When I ran the above query, the job was successful, but no table was created in Hive, and the data was stored in the /user/HDFS/emp/ location, where HDFS/emp was created during the job.
PS: I also could not find any reason for using --m 1 with Sqoop. It's just there in all the queries.
I got the import working with the following query. There is no need to use create-hive-table; you can just give a new table name with hive-table and that table will be created. Also, if there is any issue, go to the hive-metastore location, run rm *.lck, and then try the import again.
sqoop import \
--connect jdbc:mysql://localhost/EMPLOYEE \
--username root \
--password root \
--table emp4 \
--hive-import \
--hive-table sqoopimport.emp4 \
--fields-terminated-by "," ;
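For completeness, here is a minimal sketch of the first approach mentioned in the question (import into HDFS only, then define an external Hive table over that directory). The target directory and the column names and types are assumptions made up for illustration.
# Sketch of approach 1: plain HDFS import, then an external Hive table over it.
# The target directory and the emp column definitions are assumed.
sqoop import \
--connect jdbc:mysql://localhost/EMPLOYEE \
--username root \
--password root \
--table emp \
--target-dir /user/hive/external/emp \
--fields-terminated-by ',' \
-m 1

hive -e "CREATE EXTERNAL TABLE sqoopimport.emp_ext (id INT, name STRING, salary DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/external/emp';"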

Import a MySQL table directly into Hive with Sqoop using --create-hive-table

I'm training for the HDPCD exam, so I'm testing all possible imports and exports between MySQL and Hive. In this example, I would like to import a table from MySQL and create the same table from scratch in Hive using the --create-hive-table parameter. Although it is covered in the documentation, I haven't found a proper example of how to use it. I have tried this, but it doesn't work:
sqoop import --connect jdbc:mysql://master/poc --username root --table dept --where 'id_dept > 2' --hive-import --hive-database poc --hive-table deptv2 --create-hive-table true -m 1 --split-by id_dept
Please, if any of you knows how to use it, let me know. I'd appreciate it, thanks so much.
I'm back because I tried again, this time passing the parameter without any value, and it worked. Anyway, I'll leave an example; it will probably help someone.
sqoop import --connect jdbc:mysql://master/poc --username root \
--table dept --where 'id_dept > 2' --hive-import \
--hive-database poc --hive-table deptv2 --create-hive-table -m 1 --split-by id_dept
Thanks.

How does Sqoop map CSV file columns to the columns of a MySQL table?

How does Sqoop map an imported CSV file to a MySQL table's columns? I just ran the import and export Sqoop commands below and they work properly, but I'm not sure how Sqoop mapped the imported result onto the MySQL table's columns. I also have a manually created CSV file that I want to export to MySQL, so I need a way to specify the CSV file and the column mapping.
sqoop import \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--username myuser \
--password mypassword \
--query 'SELECT MARS_ID , MARKET_ID , USERROLE_ID , LEADER_MARS_ID , CREATED_TIME , CREATED_USER , LST_UPDTD_TIME , LST_UPDTD_USER FROM USERS_TEST u WHERE $CONDITIONS' \
-m 1 \
--target-dir /idn/home/data/user
I deleted records from the MySQL database and ran the export command below, which inserted the data back into the table.
sqoop export \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--table USERS_TEST \
--export-dir /idn/home/data/user \
--username myuser \
--password mypassword
You can use the --input-fields-terminated-by and --columns parameters to control the structure of the data exported back to the RDBMS through Sqoop.
I would recommend referring to the Sqoop user guide for more information.
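As a sketch, here is the export from the question with those two parameters added. The column list mirrors the SELECT used in the earlier import, and the comma delimiter is assumed (it is Sqoop's default import delimiter), so adjust both to match the actual files.
# Sketch: explicit input delimiter and column order for the export.
# The column list mirrors the import query above; the comma delimiter is assumed.
sqoop export \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--username myuser \
--password mypassword \
--table USERS_TEST \
--export-dir /idn/home/data/user \
--input-fields-terminated-by ',' \
--columns "MARS_ID,MARKET_ID,USERROLE_ID,LEADER_MARS_ID,CREATED_TIME,CREATED_USER,LST_UPDTD_TIME,LST_UPDTD_USER"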

sqoop import with --query only imports the first column

I use Sqoop to import data from MySQL into Hadoop in CSV form, and it works well when I use the --table argument. However, when I use the --query argument, only the first column is imported; the other columns are missing.
Here is my command:
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--query ' select age, job from person where $CONDITIONS ' \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1
In the csv file, it shows only the age.
Does anyone know how to solve it?
Thanks
Read this documentation from the Sqoop User Guide: when you use $CONDITIONS, you must specify the splitting column.
Sqoop can also import the result set of an arbitrary SQL query. Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument.
When importing a free-form query, you must specify a destination directory with --target-dir.
If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop.
Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.
For example:
$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
--split-by a.id --target-dir /user/foo/joinresults
Alternately, the query can be executed once and imported serially, by specifying a single map task with -m 1:
$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
-m 1 --target-dir /user/foo/joinresults
Try this:
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--columns "First_Column" \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1
Whenever you use the --query parameter, you need to specify the --split-by parameter with the column that should be used to slice your data into multiple parallel tasks. The other required parameter is --target-dir, which specifies the directory on HDFS where your data should be stored.
Solution: try adding the --split-by argument to your Sqoop command and see if the problem is resolved.
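For example, a minimal sketch of the command from the question with a split column added; age is assumed here to be a numeric column that splits reasonably evenly, and the mapper count of 4 is arbitrary.
# Sketch: the original command plus --split-by so several mappers can
# partition the query. 'age' as split column and -m 4 are assumptions.
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--query 'select age, job from person where $CONDITIONS' \
--split-by age \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata \
-m 4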