Export Impala Table from HDFS to MySQL

I'm trying to use Sqoop to export an impala table from HDFS to MySQL. The table has already been made in MySQL and the schema of the two tables should match.
Impala table information:
1  start_date          string
2  start_station_code  string
3  end_date            string
4  end_station_code    string
5  duration_sec        int
6  is_member           int
7  cnt                 bigint
Impala table sample:
2019-05-05 14:07:42100022019-05-05 14:31:087143140611322
2019-05-08 17:51:57100022019-05-08 17:55:29705221101322
2019-05-05 14:07:40100022019-05-05 14:31:087143140711322
2019-05-07 09:55:48100022019-05-07 10:02:28672439911322
2019-05-03 06:54:38100022019-05-03 06:59:51705231201322
2019-05-07 09:56:33100022019-05-07 10:02:17705234311322
2019-05-05 14:06:40100022019-05-05 14:18:04642768411322
2019-05-01 08:54:36100022019-05-01 08:58:20705222301322
2019-05-02 09:17:22100022019-05-02 09:22:16692129401322
2019-05-02 09:16:37100022019-05-02 09:19:30705217201322
2019-05-06 07:09:54100022019-05-06 07:18:45608453111322
MySQL Table information:
+--------------------+-------------+------+-----+---------+-------+
| Field              | Type        | Null | Key | Default | Extra |
+--------------------+-------------+------+-----+---------+-------+
| start_date         | varchar(10) | YES  |     | NULL    |       |
| start_station_code | varchar(20) | YES  |     | NULL    |       |
| end_date           | varchar(20) | YES  |     | NULL    |       |
| end_station_code   | varchar(20) | YES  |     | NULL    |       |
| duration_sec       | int(11)     | YES  |     | NULL    |       |
| is_member          | int(11)     | YES  |     | NULL    |       |
| cnt                | bigint(20)  | YES  |     | NULL    |       |
+--------------------+-------------+------+-----+---------+-------+
Export code:
sqoop export --connect jdbc:mysql://localhost/oozie --username root --password root --table bixirides_export --export-dir /user/hive/warehouse/impala_out/6* -m 1 --input-fields-terminated-by "|";
The sqoop export fails as soon as the map task reaches 100%. The schemas should match, yet the export still fails.
Error Message:
ERROR tool.ExportTool: Error during export:
Export job failed!

I see a couple of issues based on your question.
start_date is varchar(10), but the data is longer than that:
2019-05-05 14:07:42
is 19 characters, so the column needs to hold at least that much.
Your export specifies | as the field delimiter, but I don't see that delimiter in your table sample. Did you create the table with:
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS textfile
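If the table was created without an explicit delimiter, Hive and Impala default to the non-printing Ctrl-A (\001) character, which would explain why no | is visible in your sample. A minimal sketch of the two fixes, assuming the default delimiter and the names from your question (verify the real delimiter with SHOW CREATE TABLE first):
-- widen the MySQL column so a 19-character timestamp fits
ALTER TABLE bixirides_export MODIFY start_date VARCHAR(20);
Then point sqoop at the delimiter the files actually use:
sqoop export --connect jdbc:mysql://localhost/oozie --username root --password root --table bixirides_export --export-dir /user/hive/warehouse/impala_out/6* -m 1 --input-fields-terminated-by '\001';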

Related

Import CSV using LOAD DATA getting wrong values

I have a big CSV (nearly 100 MB) that I would like to import into a table with the following structure:
+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| id          | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| cep         | varchar(255)     | YES  | MUL | NULL    |                |
| site        | text             | YES  |     | NULL    |                |
| cidade      | text             | YES  |     | NULL    |                |
| uf          | text             | YES  |     | NULL    |                |
| cepbase     | text             | YES  |     | NULL    |                |
| segmentacao | text             | YES  |     | NULL    |                |
| area        | text             | YES  |     | NULL    |                |
| cepstatus   | int(1)           | YES  |     | NULL    |                |
| score       | int(11)          | NO   |     | NULL    |                |
| fila        | int(11)          | NO   |     | NULL    |                |
+-------------+------------------+------+-----+---------+----------------+
I was about to write some code to do the import, but then I found a MySQL command that does the job for me. So I wrote the following:
LOAD DATA LOCAL INFILE '/Users/user/Downloads/base.csv'
INTO TABLE cep_status_new
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(@id,@cep,@site,@cidade,@uf,@cepbase,@segmentacao,@area,@cepstatus,@score,@fila)
SET id=NULL, cep=@col1, site='GOD', cidade=@col6, uf=@col7, cepbase='-', segmentacao=@col9, cepstatus=@col2, area='BING', score=99999, fila=5;
To try this code, I removed thousands of lines from my CSV and left only 2 lines: the header and one example row:
cep,status,gang,bang,random,mock,awesome,qwert,hero
01019000,0,00387,00388,3550308,SAO PAULO,SP,011,B2
The code runs without problems, but the inserted row is pretty strange:
mysql> select * from cep_status_new;
+----+------+------+--------+---------+---------+-------------+------+-----------+-------+------+
| id | cep  | site | cidade | uf      | cepbase | segmentacao | area | cepstatus | score | fila |
+----+------+------+--------+---------+---------+-------------+------+-----------+-------+------+
|  1 | 1    | GOD  | 24655  | 3554805 | -       | SP          | BING |         0 | 99999 |    5 |
+----+------+------+--------+---------+---------+-------------+------+-----------+-------+------+
1 row in set (0.01 sec)
Why aren't the values from the CSV being filled in correctly?
According to this specification the column list after IGNORE 1 ROWS decides how the columns of the CSV file are mapped to columns of the table. It can either list the table columns in the order of the file or it can load the file columns into variables. With the column list
(@id,@cep,@site,@cidade,@uf,@cepbase,@segmentacao,@area,@cepstatus,@score,@fila)
you are loading the columns of the CSV file into variables named "id", "cep", etc. In the SET statement you then need to declare how the columns of the table are constructed from those variables. With the given statement you are referring to variables @col1 etc. that are not defined anywhere and consequently have undefined values.
The corrected statement (that I sadly can't test myself right now) should be:
LOAD DATA LOCAL INFILE '/Users/user/Downloads/base.csv'
INTO TABLE cep_status_new
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(@col1,@col2,@col3,@col4,@col5,@col6,@col7,@col8,@col9)
SET id=NULL, cep=@col1, site='GOD', cidade=@col6, uf=@col7, cepbase='-', segmentacao=@col9, cepstatus=@col2, area='BING', score=99999, fila=5;
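Applied to the sample row, that mapping gives cep = 01019000 (@col1), cepstatus = 0 (@col2), cidade = SAO PAULO (@col6), uf = SP (@col7) and segmentacao = B2 (@col9), so the inserted row should come out as (expected result, not tested):
| 1 | 01019000 | GOD | SAO PAULO | SP | - | B2 | BING | 0 | 99999 | 5 |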

Using ALTER and IGNORE in MySQL to remove duplicates

I am using MySQL and have a table 'bid' which has duplicate entries in it.
My table schema is
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| ITEM_CODE   | int(11)     | YES  |     | NULL    |       |
| Max_BidP    | int(11)     | YES  |     | NULL    |       |
| Seller_Name | varchar(45) | YES  |     | NULL    |       |
| Buyer_Name  | varchar(45) | YES  |     | NULL    |       |
| ITEM_NAME   | varchar(45) | YES  |     | NULL    |       |
| Qty         | int(11)     | YES  |     | 1       |       |
+-------------+-------------+------+-----+---------+-------+
One of the duplicate entries in the table:
| 16 | 30 | sahraw | sahraw | J.K Rowling | 1 |
| 16 | 30 | sahraw | sahraw | J.K Rowling | 1 |
I am trying to remove the duplicates, and the query I am specifying is:
ALTER IGNORE TABLE bid ADD UNIQUE INDEX (ITEM_CODE , Max_BidP ,Seller_Name , Buyer_Name , ITEM_NAME , Qty);
But it's giving me an error:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IGNORE TABLE bid ADD UNIQUE INDEX (ITEM_CODE , Max_BidP ,Seller_Nam' at line 1
Any suggestions on where I am going wrong?
Thanks
Please check the MySQL version you are using.
As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error.
http://dev.mysql.com/doc/refman/5.7/en/alter-table.html
If you're using MySQL 5.7.4 or later, IGNORE is no longer available. See MySQL “ALTER IGNORE TABLE” Error In Syntax.
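Since the IGNORE clause is gone, one common replacement is to copy the distinct rows through a uniquely indexed copy of the table. A sketch (untested; the names bid_dedup and uniq_bid are made up for illustration):
CREATE TABLE bid_dedup LIKE bid;
ALTER TABLE bid_dedup ADD UNIQUE INDEX uniq_bid (ITEM_CODE, Max_BidP, Seller_Name, Buyer_Name, ITEM_NAME, Qty);
INSERT IGNORE INTO bid_dedup SELECT * FROM bid;
RENAME TABLE bid TO bid_old, bid_dedup TO bid;
-- DROP TABLE bid_old; -- once you have verified the result
INSERT IGNORE silently skips rows that collide with the unique index, which is what ALTER IGNORE used to do during the rebuild.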

Add rows into a MySQL table from a .sql file

I have a question on how I can insert a .sql file into a MySQL table which already contains a lot of data.
My .sql file looks like this (1200 rows):
--
-- Descriptif plan comptable SYSCOHADA (utf-8)
--
INSERT INTO llx_accounting_system (rowid, pcg_version, fk_pays, label, active) VALUES (10,'SYSCOHADA', 49, 'Plan comptable Ouest-Africain', 1);
INSERT INTO llx_accounting_account (rowid, fk_pcg_version, pcg_type, pcg_subtype, account_number, account_parent, label, active) VALUES (15000,'SYSCOHADA','CAPITAUX','XXXXXX','1',0,"Capital",'1');
INSERT INTO llx_accounting_account (rowid, fk_pcg_version, pcg_type, pcg_subtype, account_number, account_parent, label, active) VALUES (15001,'SYSCOHADA','CAPITAUX','XXXXXX','101',15000,"Capital social",'1');
INSERT INTO llx_accounting_account (rowid, fk_pcg_version, pcg_type, pcg_subtype, account_number, account_parent, label, active) VALUES (15002,'SYSCOHADA','CAPITAUX','XXXXXX','1011',15001,"Capital souscrit, non appele);
My MySQL table looks like :
mysql> describe llx_accounting_account ;
+----------------+--------------+------+-----+-------------------+-----------------------------+
| Field          | Type         | Null | Key | Default           | Extra                       |
+----------------+--------------+------+-----+-------------------+-----------------------------+
| rowid          | int(11)      | NO   | PRI | NULL              | auto_increment              |
| entity         | int(11)      | NO   |     | 1                 |                             |
| datec          | datetime     | YES  |     | NULL              |                             |
| tms            | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| fk_pcg_version | varchar(32)  | NO   | MUL | NULL              |                             |
| pcg_type       | varchar(20)  | NO   |     | NULL              |                             |
| pcg_subtype    | varchar(20)  | NO   |     | NULL              |                             |
| account_number | varchar(32)  | NO   | MUL | NULL              |                             |
| account_parent | varchar(32)  | YES  |     | NULL              |                             |
| label          | varchar(255) | NO   |     | NULL              |                             |
| fk_user_author | int(11)      | YES  |     | NULL              |                             |
| fk_user_modif  | int(11)      | YES  |     | NULL              |                             |
| active         | tinyint(4)   | NO   |     | 1                 |                             |
+----------------+--------------+------+-----+-------------------+-----------------------------+
13 rows in set (0.00 sec)
My MySQL table is not empty. There is already data in it, and I want to append the rows from my .sql file to the existing data.
I didn't execute this command because I think it's wrong:
LOAD DATA LOCAL INFILE 'data_3.9.sql'
INTO TABLE llx_accounting_account
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
Do you have a solution?
Thank you :)
----------------------------------------------------------------------------------
Solution:
With comments by @RakeshKumar and @PaulF, I found a way to solve my problem:
1) I deleted all rows where fk_pcg_version = 'SYSCOHADA':
delete from llx_accounting_account where fk_pcg_version = 'SYSCOHADA';
2) I imported the .sql file:
mysql -u root -p****** dolibarr < data_3.9.sql
3) I fixed one value, because account_number was 1 instead of 10 where rowid = 15000:
UPDATE llx_accounting_account SET account_number = 10 WHERE rowid = 15000;
Seems good :)
Thank you ;)
Use the following way to import the file:
mysql -u username -p'password' dbname < filename.sql
Your import didn't work because your table already contained the same old SYSCOHADA rows.
You can delete all rows where fk_pcg_version = 'SYSCOHADA' and import your corrected file again.
I had the same error importing data from an SQL file that had been created by dumping a single table to a file with mysqldump. The reason was that the SQL file had DROP TABLE IF EXISTS and CREATE TABLE statements at the start of the file.
Once those are removed, it effectively appends the rows to the existing records.
Maybe this will help others.
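If you generate the dump yourself, mysqldump can leave that header out in the first place. A sketch, assuming you only want the INSERT statements for one table (database and table names taken from this question):
mysqldump -u root -p --no-create-info dolibarr llx_accounting_account > data_3.9.sql
The --no-create-info option omits the DROP TABLE IF EXISTS / CREATE TABLE preamble, so importing the resulting file only appends rows.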

MySQL CSV Import Fails - "Data too long for column 'air_date' at row 1"

I'm trying to import a CSV file into a MySQL table and I'm having all kinds of trouble getting it to work. Here's what I'm trying to do:
I am working on a video database and have an existing table with data already in it called episodes. Here's how it's set up:
+--------------+-----------------------+------+-----+-------------------+-----------------------------+
| Field        | Type                  | Null | Key | Default           | Extra                       |
+--------------+-----------------------+------+-----+-------------------+-----------------------------+
| title        | varchar(40)           | NO   | MUL | NULL              |                             |
| media_id     | varchar(11)           | NO   |     | NULL              |                             |
| ep_info      | varchar(75)           | YES  |     | NULL              |                             |
| air_date     | varchar(20)           | NO   |     | NULL              |                             |
| trt          | varchar(8)            | NO   |     | NULL              |                             |
| times_played | mediumint(9) unsigned | NO   |     | 0                 |                             |
| last_played  | timestamp             | YES  |     | NULL              |                             |
| entered      | timestamp             | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| id           | int(10) unsigned      | NO   | PRI | NULL              | auto_increment              |
| ep_desc      | varchar(300)          | NO   |     | NULL              |                             |
+--------------+-----------------------+------+-----+-------------------+-----------------------------+
The primary key is the id field, with the title field set as a foreign key to the shows table. The shows table looks like this:
+-------------+-------------+------+-----+------------+-------+
| Field       | Type        | Null | Key | Default    | Extra |
+-------------+-------------+------+-----+------------+-------+
| title       | varchar(50) | NO   | PRI | NULL       |       |
| title_image | varchar(50) | NO   |     | NULL       |       |
| gif_image   | varchar(50) | NO   |     | NULL       |       |
| info_url    | varchar(30) | NO   |     | shows.html |       |
+-------------+-------------+------+-----+------------+-------+
My CSV file is in the following format:
"Big Wolf On Campus","BWOC0102","Season 1 Episode 2: The Bookmobile","April 9, 1999";"21:57",NULL,NULL,NULL,NULL,"Once every 70 years, a window of transference opens that offers Tommy a chance to pass his curse to another person. Merton volunteers but that same day a bookmobile shows up in Pleasantville and people start disappearing."
"Big Wolf On Campus","BWOC0103","Season 1 Episode 3: Butch Comes To Shove","April 16, 1999","21:06",NULL,NULL,NULL,NULL,"When a character from a 1950s educational film gets sick of the rules he decides to leave the movie for Pleasantville. While there Butch decides to find someone to bring back to his black-and-white world - and Stacey is at the top of his list."
During the import, I want the data in the CSV added to the existing data in the table. I also want the last_played field set to NULL (only updated when the show plays), the entered field set with a current timestamp, and the id field auto_incremented with the next value for the table.
Here is my import statement:
LOAD DATA INFILE 'ytv.csv' INTO TABLE episodes
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
The resulting error message:
ERROR 1406 (22001): Data too long for column 'air_date' at row 1
What am I doing wrong here? It seems like the data is getting shifted over one column when it's importing (such that ep_info from the CSV is going into the air_date column) but I can't figure out why. Any insight would be much appreciated for this MySQL novice.
It seems you have some new episodes with no matching entry in the shows table. You can create a new table like episodes, remove any constraints, load the data into that new table, insert all missing show titles into your shows table, and then insert the episodes from the new table into the episodes table, as sketched below.
Or you can drop the foreign key, load the data, amend your shows table, and then add the foreign key back.
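A sketch of the first approach (untested; episodes_staging is a made-up name, and CREATE TABLE ... LIKE copies columns and indexes but not foreign keys, so the load is not blocked by unknown titles):
CREATE TABLE episodes_staging LIKE episodes;
LOAD DATA INFILE 'ytv.csv' INTO TABLE episodes_staging
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
-- add any titles missing from shows ('' is a placeholder for the
-- NOT NULL image columns)
INSERT INTO shows (title, title_image, gif_image)
SELECT DISTINCT e.title, '', ''
FROM episodes_staging e
LEFT JOIN shows s ON s.title = e.title
WHERE s.title IS NULL;
-- copy the rows across, leaving id out so auto_increment assigns
-- fresh values
INSERT INTO episodes (title, media_id, ep_info, air_date, trt, times_played, last_played, entered, ep_desc)
SELECT title, media_id, ep_info, air_date, trt, times_played, last_played, entered, ep_desc
FROM episodes_staging;
DROP TABLE episodes_staging;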

“Lost connection to MySQL server during query” error

I have MySQL 5.5.20 and this table:
mysql> desc country;
+----------------+-------------+------+-----+---------+-------+
| Field          | Type        | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| id             | int(255)    | NO   | PRI | NULL    |       |
| iso            | varchar(2)  | NO   |     | NULL    |       |
| name           | varchar(80) | NO   |     | NULL    |       |
| printable_name | varchar(80) | NO   |     | NULL    |       |
| iso3           | varchar(3)  | YES  |     | NULL    |       |
| numcode        | smallint(6) | YES  |     | NULL    |       |
+----------------+-------------+------+-----+---------+-------+
If I run a query like this
SELECT country.ID, country.ISO, country.NAME,
country.PRINTABLE_NAME, country.ISO3, country.NUMCODE
FROM country;
It returns:
ERROR 2013 (HY000): Lost connection to MySQL server during query
If I change the order of the columns (for example, NUMCODE before ISO3) like this:
SELECT country.ID, country.ISO, country.NAME,
country.PRINTABLE_NAME, country.NUMCODE, country.ISO3
FROM country;
It works fine!
Or if I rewrite the query using lower-case letters for columns, it works as well.
This issue appears from time to time (about once a month), and the only way to fix it is to restart MySQL! I have checked the error log, and this is what I have in it:
130310 11:01:23 [Warning] /usr/sbin/mysqld: Forcing close of thread 401108 user: 'root'
I am really confused and don't know why this happens! Any ideas on how this could be fixed?