LOAD DATA LOCAL INFILE LINES STARTING BY issue - MySQL

I have a CSV file with over a million lines and I need to write only the lines starting with '01' into the database.
The .csv file looks like this:
01;104;5586;20;1000;
01;105;5586;80;1000;
01;106;5586;80;1000;
04;104;5586;20;1000;
06;105;5586;80;1000;
05;106;5586;80;1000;
The SQL looks like this:
LOAD DATA LOCAL INFILE '$uploadfile'
REPLACE INTO TABLE mytable
FIELDS TERMINATED BY ';' ENCLOSED BY ''
IGNORE 1 LINES
(`a`, `b`, `c`, `d`, `e`);
So this works to import all lines. But how can I get only the lines which start with 01;...?

You can try this one -
LOAD DATA LOCAL INFILE 'filename.csv'
REPLACE INTO TABLE mytable
FIELDS TERMINATED BY ';' ENCLOSED BY ''
LINES STARTING BY '01' TERMINATED BY '\r\n'
(`a`, `b`, `c`, `d`, `e`)
SET `a` = '01';
and you will get something like this -
01 104 5586 20 1000
01 105 5586 80 1000
01 106 5586 80 1000
Check which line separator your file uses - '\r\n' or '\n' - in the LINES TERMINATED BY clause.

Devart's solution works only if "01" never occurs anywhere else in the line.
I found out that LINES STARTING BY does not work the way you might expect, according to this bug report:
https://bugs.mysql.com/bug.php?id=3632
"Another problem was that LINES STARTING BY xxx means that the MySQL will assume the lines start is at the next occurence of xxx".
So if my data also include 01, not just the start of the line, MySQL read some data from these lines. So this will insert bad data:
01;104;5586;01;1000
01;105;8586;80;1000
01;106;5586;80:0123
I wonder why neither of the suggestions in that report from 2012 was ever implemented:
(1) let LINES STARTING BY 'X' continue to mean "line contains 'X' anywhere and data prior to 'X' in the record is skipped" and document this as such.
(2) add LINES STARTING BY 'X' POSITION N, which means "line contains 'X' beginning at character position N relative to the beginning of the record".
I ended up splitting the files or adding a prefix (X) at the start of each line.
The .csv then looks like this:
X01;104;5586;20;1000;
X01;105;5586;01;1000;
X01;106;5586;80;1000;
X04;104;5586;01;0123;
X06;105;5586;80;1000;
X05;106;5586;80;1000;
And the code looks like this:
LOAD DATA LOCAL INFILE 'filename.csv'
REPLACE INTO TABLE mytable
FIELDS TERMINATED BY ';' ENCLOSED BY ''
LINES STARTING BY 'X01' TERMINATED BY '\r\n'
(`a`, `b`, `c`, `d`, `e`) SET `a` = '01';
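If you cannot modify the file at all, another way around the bug is to load everything into a staging table and filter while copying. A minimal sketch, assuming a scratch table mytable_stage (a made-up name) with the same definition as mytable:
-- Scratch copy with the same columns as the target table.
CREATE TABLE mytable_stage LIKE mytable;

LOAD DATA LOCAL INFILE 'filename.csv'
INTO TABLE mytable_stage
FIELDS TERMINATED BY ';' ENCLOSED BY ''
LINES TERMINATED BY '\r\n'
(`a`, `b`, `c`, `d`, `e`);

-- Keep only the lines whose first field is exactly '01'.
REPLACE INTO mytable
SELECT * FROM mytable_stage WHERE `a` = '01';

DROP TABLE mytable_stage;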
Maybe someone else out there had the same misunderstanding with LINES STARTING BY.

Related

MySQL: how to import csv with comma in decimals

I've been searching for a solution to my problem but found none.
I have a text file (>150K rows), tab delimited.
Decimal numbers use a comma as the separator.
My fields are set as FLOAT.
The problem is that MySQL expects a dot as the decimal separator, not a comma, so the decimal part is simply dropped.
I tried using different character sets, and even tried replacing the comma with a dot during import, but with no luck.
My import step:
USE stat;
LOAD DATA LOCAL INFILE 'tbl_t1.txt' INTO TABLE `tbl_t1`
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
My raw data
Partno grossw netw ......up to 132 col
Item1 1,4753 1,1325
Item2 0,1673 0,0184
Item3 2,1357 2,0361
Once imported I get
Item1 1 1 .........
Item2 0 0
Item3 2 2
When trying the code below I get zero rows and my table is emptied:
USE stat;
LOAD DATA LOCAL INFILE 'tbl_t1.txt' INTO TABLE `tbl_t1`
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n' IGNORE 1 LINES
(@col1) SET grossw = REPLACE(@col1, ',', '.');
UPDATE
I found out why the table is emptied: I have 132 columns in the table, and @col1 handles just one column. The above was just an example.
I need to investigate more.
Any help is very much appreciated.
Thanks.
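For what it's worth, the user-variable trick does scale to many columns: every field in the file still has to appear in the column list, with a variable standing in for each comma-decimal column. A minimal sketch using the three columns shown above (the real statement would list all 132):
USE stat;
LOAD DATA LOCAL INFILE 'tbl_t1.txt' INTO TABLE `tbl_t1`
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n' IGNORE 1 LINES
-- One entry per file column: plain names pass through, variables get fixed up in SET.
(partno, @grossw, @netw)
SET grossw = REPLACE(@grossw, ',', '.'),
    netw   = REPLACE(@netw, ',', '.');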

How to use UTC_TIMESTAMP() in csv file

I need a way to use the UTC_TIMESTAMP() function in a CSV file.
Currently I'm using the following syntax:
LOAD DATA INFILE '/path/to/file.csv'
INTO TABLE my_table FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
I'm not quite sure what to do to keep the UTC_TIMESTAMP() from ending up as a quoted string. When entered in the database, I get the following result:
"0000-00-00 00:00:00"
This was the only example I could find on Stack Overflow or Google for converting a string to a MySQL value:
MySQL CSV import: datetime value
I solved the problem by following the MySQL documentation on this page:
https://dev.mysql.com/doc/refman/5.7/en/load-data.html
About halfway down the page there is a section that shows how to read input fields into variables and then set table columns to native MySQL functions or to values derived from those variables (whichever you choose).
The example in the documentation that I'm referring to looks like this.
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, column2)
SET column3 = CURRENT_TIMESTAMP;
I fixed my problem by restructuring my code like this:
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE veh_icodes FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, vcode, year, make, model, body_style, tran_type, engine_cap, drive_train, doors, trim, created_by, updated_by, @created_at, @updated_at, active)
SET created_at = CURRENT_TIMESTAMP, updated_at = CURRENT_TIMESTAMP;
I hope this helps someone. :)
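Since the goal was UTC specifically, the same structure should work with UTC_TIMESTAMP() in place of CURRENT_TIMESTAMP (which uses the session time zone). A variant of the statement above:
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE veh_icodes FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, vcode, year, make, model, body_style, tran_type, engine_cap, drive_train, doors, trim, created_by, updated_by, @created_at, @updated_at, active)
-- UTC_TIMESTAMP() is independent of the session time zone.
SET created_at = UTC_TIMESTAMP(), updated_at = UTC_TIMESTAMP();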

ignore first two characters on a column while importing csv to mysql

I am trying to import a csv file into a MySQL table, but I need to remove the first two characters of a particular column before importing.
This is my statement:
string strLoadData = "LOAD DATA LOCAL INFILE 'E:/park/Export.csv' INTO TABLE tickets FIELDS terminated by ',' ENCLOSED BY '\"' lines terminated by '\n' IGNORE 1 LINES (SiteId,DateTime,Serial,DeviceId,AgentAID,VehicleRegistration,CarPark,SpaceNumber,GpsAddress,VehicleType,VehicleMake,VehicleModel,VehicleColour,IssueReasonCode,IssueReason,NoticeLocation,Points,Notes)";
The column IssueReasonCode has data like 'LU12', but I need to remove the first two characters so it holds only the integer part, not an alphanumeric code.
I need to remove the 'LU' from that column.
Is it possible to write something like LEFT(IssueReasonCode, ...)? This column is varchar(45) and can't be changed now because of the large amount of data in it.
Thanks
LOAD DATA INFILE has the ability to perform a function on the data for each column as you read it in (see the LOAD DATA documentation). In your case, if you wanted to remove the first two characters from the IssueReasonCode column, you could use:
RIGHT(IssueReasonCode, CHAR_LENGTH(IssueReasonCode) - 2)
to remove the first two characters. You specify such column mappings at the end of the LOAD DATA statement using SET. Your statement should look something like the following:
LOAD DATA LOCAL INFILE 'E:/park/Export.csv' INTO TABLE tickets
FIELDS terminated by ','
ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(SiteId, DateTime, Serial, DeviceId, AgentAID, VehicleRegistration, CarPark, SpaceNumber,
GpsAddress, VehicleType, VehicleMake, VehicleModel, VehicleColour, IssueReasonCode,
IssueReason, NoticeLocation, Points, Notes)
SET IssueReasonCode = RIGHT(IssueReasonCode, CHAR_LENGTH(IssueReasonCode) - 2);
Referencing the LOAD DATA documentation and quoting its example, you can try the below to see if it works.
User variables in the SET clause can be used in several ways. The
following example uses the first input column directly for the value
of t1.column1, and assigns the second input column to a user variable
that is subjected to a division operation before being used for the
value of t1.column2:
LOAD DATA INFILE 'file.txt' INTO TABLE t1 (column1, @var1)
SET column2 = @var1/100;
string strLoadData = "LOAD DATA LOCAL INFILE 'E:/park/Export.csv' INTO TABLE tickets FIELDS terminated by ',' ENCLOSED BY '\"' lines terminated by '\n' IGNORE 1 LINES (SiteId,DateTime,Serial,DeviceId,AgentAID,VehicleRegistration,CarPark,SpaceNumber,GpsAddress,VehicleType,VehicleMake,VehicleModel,VehicleColour,@IRC,IssueReason,NoticeLocation,Points,Notes) SET IssueReasonCode = SUBSTR(@IRC, 3);";

Is it possible to perform functions during LOAD DATA INFILE without creating a stored procedure?

I have two columns, q1 and q2, that I'd like to sum together and put in the destination column, q.
The way I do it now, I put the data in an intermediate table and sum while copying it over, but I'm wondering if it's possible to do it during the load instead?
Here's my script:
LOAD DATA INFILE 'C:\temp\foo.csv'
INTO TABLE new_foo
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(q1, q2)
INSERT INTO foo (q) SELECT q1+q2 AS q
FROM foo_temp;
Try reading the two fields into user variables and summing them in the SET clause:
LOAD DATA INFILE 'C:/temp/foo.csv'
INTO TABLE `foo`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@`q1`, @`q2`)
SET `q` = @`q1` + @`q2`;
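One caveat: fields read into user variables arrive as strings, so MySQL converts them implicitly when summing. If the file might contain blanks or stray text, an explicit cast makes the conversion visible. A sketch, assuming q is DECIMAL(10,2) - adjust the type to your schema:
LOAD DATA INFILE 'C:/temp/foo.csv'
INTO TABLE `foo`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@`q1`, @`q2`)
-- CAST makes the string-to-number conversion explicit instead of implicit.
SET `q` = CAST(@`q1` AS DECIMAL(10,2)) + CAST(@`q2` AS DECIMAL(10,2));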

if exists update else insert csv data MySQL

I am populating a MySQL table with a csv file pulled from a third-party source. Every day the csv is updated and I want to update rows in the MySQL table if an occurrence of columns a, b and c already exists, else insert the row. I used LOAD DATA INFILE for the initial load, but I want to update against a daily csv pull. I am familiar with INSERT ... ON DUPLICATE KEY UPDATE, but not in the context of a csv import. Any advice on how to combine LOAD DATA LOCAL INFILE with INSERT ... ON DUPLICATE KEY UPDATE on a, b, c - or whether that is even the best approach - would be greatly appreciated.
LOAD DATA LOCAL INFILE 'C:\\Users\\nick\\Desktop\\folder\\file.csv'
INTO TABLE db.tbl
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 lines;
Since you use LOAD DATA LOCAL INFILE, the default behavior is the same as specifying IGNORE: duplicate rows are skipped.
But
If you specify REPLACE, input rows replace existing rows. In other words, rows that have the same value for a primary key or unique index as an existing row.
So your update-import could be:
LOAD DATA LOCAL INFILE 'C:\\Users\\nick\\Desktop\\folder\\file.csv'
REPLACE
INTO TABLE db.tbl
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 lines;
https://dev.mysql.com/doc/refman/5.6/en/load-data.html
If you need more complicated merge logic, you can import the CSV into a temporary table and then issue INSERT ... SELECT ... ON DUPLICATE KEY UPDATE, as sketched below.
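A minimal sketch of that approach, assuming a UNIQUE index over (a, b, c) and a value column d - the column names here are hypothetical:
-- Stage the CSV in a temporary copy of the target table.
CREATE TEMPORARY TABLE tbl_stage LIKE db.tbl;

LOAD DATA LOCAL INFILE 'C:\\Users\\nick\\Desktop\\folder\\file.csv'
INTO TABLE tbl_stage
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 lines;

-- The upsert: requires a UNIQUE index on (a, b, c) to detect existing rows.
INSERT INTO db.tbl (a, b, c, d)
SELECT a, b, c, d FROM tbl_stage
ON DUPLICATE KEY UPDATE d = VALUES(d);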
I found that the best way to do this is to insert the file with the standard LOAD DATA LOCAL INFILE:
LOAD DATA LOCAL INFILE 'C:\\Users\\nick\\Desktop\\folder\\file.csv'
INTO TABLE db.table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 lines;
And then use the following to delete duplicates. Note that the command below compares db.table to itself by aliasing it as both a and b.
delete a.* from db.table a, db.table b
where a.id > b.id
and a.field1 = b.field1
and a.field2 = b.field2
and a.field3 = b.field3;
To use this method it is essential that the id field is an auto-increment primary key. The command then deletes rows that duplicate other rows on field1 AND field2 AND field3, removing the one with the higher of the two auto-increment ids; it works just as well with < instead of >.
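If the table does not already have such a key, one can be added first. A sketch (note that this rebuilds the table, which can be slow on large data; `table` is backticked because TABLE is a reserved word):
ALTER TABLE db.`table`
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;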