How can I import a TSV file when the numbers use comma as a decimal separator?
LOAD DATA INFILE '$filename' INTO TABLE dados_meteo IGNORE 3 LINES
($fields[0], $fields[1], $fields[2], $fields[3], $fields[4], $fields[5])
SET POM='$pom'
;
Try replacing ',' with '.' while loading.
For example:
LOAD DATA INFILE 'file.csv' INTO TABLE dados_meteo
(@var1, @var2)
SET column1 = REPLACE(@var1, ',', '.'), column2 = REPLACE(@var2, ',', '.')
If $fields[0] is your numeric column:
LOAD DATA INFILE '$filename' INTO TABLE dados_meteo IGNORE 3 LINES
(@var1, $fields[1], $fields[2], $fields[3], $fields[4], $fields[5])
SET POM='$pom', $fields[0] = CONVERT(REPLACE(@var1, ',', '.'), DECIMAL(10,2));
If more columns are numeric, repeat the pattern with @var2, @var3, and so on. Replace DECIMAL(10,2) with whatever type your column really is; a bare DECIMAL(10) keeps no fractional digits.
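To sanity-check the conversion outside MySQL, here is a minimal Python sketch of the same idea: swap the decimal comma for a point, then parse. The function name `parse_decimal_comma` is hypothetical, and the sketch assumes the values contain no thousands separators.

```python
from decimal import Decimal


def parse_decimal_comma(text: str) -> Decimal:
    """Parse a number written with a decimal comma, e.g. '3990,06'.

    Assumption: the input has no thousands separator.
    """
    return Decimal(text.strip().replace(",", "."))


print(parse_decimal_comma("-9,94"))    # -9.94
print(parse_decimal_comma("3990,06"))  # 3990.06
```

If the file does use thousands separators, they would have to be stripped before the comma is replaced.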
The following code fails to import dates, and I can't figure out why. Dates in the CSV are formatted DD/MM/YYYY. The load imports all the data but leaves every date NULL. The error message is:
ER_UNKNOWN_SYSTEM_VARIABLE: Unknown system variable 'FECHA_POSICION'
Lines in the csv file look like:
EDC00001,66600/7089855,21/01/2021,21/01/2021,"DEPOSIT Deposit",4000,4000
EDC00002,66600/7089855,29/01/2021,29/01/2021,CFDs,"-9,94","3990,06"
USE DATA_BASE;
CREATE TABLE ESTADO_DE_CUENTA (
ID_OPERACION VARCHAR(20) NOT NULL PRIMARY KEY,
ID_CUENTA VARCHAR(20),
FECHA_POSICION DATE,
FECHA_VALOR DATE,
CONCEPTO VARCHAR(100),
IMPORTE FLOAT(12, 2),
SALDO_EN_EFECTIVO FLOAT(12, 2)
);
LOAD DATA LOCAL INFILE 'PATH.csv' INTO TABLE ESTADO_DE_CUENTA2
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(ID_OPERACION, ID_CUENTA, @FECHA_POSICION, @FECHA_VALOR, CONCEPTO, IMPORTE,
SALDO_EN_EFECTIVO)
SET FECHA_POSICION = STR_TO_DATE(@FECHA_POSICION, '%d/%m/%Y')
SET FECHA_VALOR = STR_TO_DATE(@FECHA_VALOR, '%d/%m/%Y')
You can use SET only once; separate all the column assignments with commas.
Like this:
USE DATA_BASE;
LOAD DATA LOCAL INFILE 'PATH.csv' INTO TABLE ESTADO_DE_CUENTA2
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(ID_OPERACION, ID_CUENTA, @FECHA_POSICION, @FECHA_VALOR, CONCEPTO, IMPORTE,
SALDO_EN_EFECTIVO)
SET `FECHA_POSICION` = STR_TO_DATE(@FECHA_POSICION, '%d/%m/%Y'), `FECHA_VALOR` = STR_TO_DATE(@FECHA_VALOR, '%d/%m/%Y')
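As a way to check what the parsed rows should look like, here is a minimal Python sketch that reads the two sample lines from the question and applies the same DD/MM/YYYY conversion. The function name `parse_rows` is hypothetical. Note that the sample contains quoted fields with decimal commas ("-9,94"), so the parser has to honour the quotes; in LOAD DATA that would presumably mean adding OPTIONALLY ENCLOSED BY '"', which the statement above does not have.

```python
import csv
import io
from datetime import datetime

# Two rows copied from the question, used here as sample input.
SAMPLE = (
    'EDC00001,66600/7089855,21/01/2021,21/01/2021,"DEPOSIT Deposit",4000,4000\n'
    'EDC00002,66600/7089855,29/01/2021,29/01/2021,CFDs,"-9,94","3990,06"\n'
)


def parse_rows(text):
    """Parse CSV text into dicts with real dates and numeric amounts."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        rows.append({
            "ID_OPERACION": rec[0],
            # Same format string as STR_TO_DATE(..., '%d/%m/%Y').
            "FECHA_POSICION": datetime.strptime(rec[2], "%d/%m/%Y").date(),
            "FECHA_VALOR": datetime.strptime(rec[3], "%d/%m/%Y").date(),
            # Decimal comma to decimal point before converting.
            "IMPORTE": float(rec[5].replace(",", ".")),
        })
    return rows


for row in parse_rows(SAMPLE):
    print(row["ID_OPERACION"], row["FECHA_POSICION"], row["IMPORTE"])
# EDC00001 2021-01-21 4000.0
# EDC00002 2021-01-29 -9.94
```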
I have a CSV with 3 date columns, each formatted dd/mm/yyyy H:i:s, e.g. 27/05/2019 20:25:00.
I am trying to convert these on insert using LOAD DATA INFILE, without any success.
My statement looks like this:
LOAD DATA LOCAL INFILE '/file.csv'
INTO TABLE db_table FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
SET promotion_starts = str_to_date(@column7, '%d/%m/%Y %H:%i:%s'),
promotion_ends = str_to_date(@column8, '%d/%m/%Y %H:%i:%s'),
date_added = str_to_date(@column17, '%d/%m/%Y %H:%:%i:%s')
All other data inserts fine, but the date columns are all NULL.
You're missing the part of the statement that specifies how the fields of the CSV file correspond to table columns, and that defines @column7, @column8 and @column17.
LOAD DATA LOCAL INFILE '/file.csv'
INTO TABLE db_table FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(foo, bar, baz, xxx, yyy, zzz, @column7, @column8, aaa, bbb, ccc, ddd, eee, fff, ggg, hhh, @column17, iii, jjj)
SET promotion_starts = str_to_date(@column7, '%d/%m/%Y %H:%i:%s'),
promotion_ends = str_to_date(@column8, '%d/%m/%Y %H:%i:%s'),
date_added = str_to_date(@column17, '%d/%m/%Y %H:%i:%s')
Replace all the column names I made up with the actual column names in your table that correspond to the CSV fields. Also note that the date_added format in your statement has a stray %: after %H; it should be '%d/%m/%Y %H:%i:%s', as used here.
I have a table like this:
CREATE TABLE `tblinquiries` (
`UID` varchar(50) DEFAULT NULL,
`ReviewDate` date NOT NULL,
`InquiryId` varchar(50) DEFAULT NULL,
`AuditStatus` varchar(50) DEFAULT NULL,
PRIMARY KEY (`InquiryId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I have a csv file with data:
UID,ReviewDate,InquiryId,AuditStatus
UID1,2018-07-06,109814969,Check
UID2,2018-07-06,109866072,Check
UID3,2018-07-06,109911408,Check
UID4,2018-07-06,109798278,Check
I use the command below to upload the data:
$location = '../uploads/';
$name = $_FILES["file"]["name"];
$filePath = $location.$name;
$table = 'tblinquiries';
LOAD DATA LOCAL INFILE "'.$filePath.'"
INTO TABLE '.$table.'
FIELDS TERMINATED by \',\' OPTIONALLY ENCLOSED BY \'"\'
LINES TERMINATED BY \'\n\'
IGNORE 1 LINES
It uploads the data but appends an extra "\r" character to the last field of most rows. I exported the data and got this:
('UID4', '2018-07-06', '109798278', 'Check'),
('UID1', '2018-07-06', '109814969', 'Check\r'),
('UID2', '2018-07-06', '109866072', 'Check\r'),
('UID3', '2018-07-06', '109911408', 'Check\r');
After running:
SELECT AuditStatus, LENGTH(AuditStatus) FROM `tblinquiries`
got:
AuditStatus LENGTH(AuditStatus)
Check 5
Check 6
Check 6
Check 6
How can I solve this?
I would assume that your source file contains those \r characters, because LOAD DATA doesn't add data that isn't in the source file. Each line most likely ends in \r\n, so splitting on \n leaves the \r at the end of the last field. RTRIM removes only trailing spaces, so strip the \r explicitly with TRIM(TRAILING ...) on the AuditStatus column:
LOAD DATA LOCAL INFILE "'.$filePath.'"
INTO TABLE '.$table.'
FIELDS TERMINATED by \',\' OPTIONALLY ENCLOSED BY \'"\'
LINES TERMINATED BY \'\n\'
IGNORE 1 LINES
(UID, ReviewDate, InquiryId, @AuditStatus)
SET AuditStatus = TRIM(TRAILING \'\r\' FROM @AuditStatus);
As @Sloan suggested, I changed the line terminators, and that solved the problem.
Here is the final code.
LOAD DATA LOCAL INFILE "'.$filePath.'"
INTO TABLE '.$table.'
FIELDS TERMINATED by \',\' OPTIONALLY ENCLOSED BY \'"\'
LINES TERMINATED BY \'\r\n\'
IGNORE 1 LINES
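Before choosing between '\n' and '\r\n', it can help to check which line ending the uploaded file actually uses. The following is a minimal Python sketch of such a check; the function name `detect_line_ending` and the sample bytes are made up for illustration.

```python
def detect_line_ending(data: bytes) -> str:
    """Report whether the data uses CRLF, LF, or a mix of line endings."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # newlines not preceded by \r
    if crlf and not lf_only:
        return "\\r\\n"
    if lf_only and not crlf:
        return "\\n"
    return "mixed or none"


sample = (
    b"UID,ReviewDate,InquiryId,AuditStatus\r\n"
    b"UID1,2018-07-06,109814969,Check\r\n"
)
print(detect_line_ending(sample))  # \r\n
```

For a real file, the bytes would come from something like `open(path, "rb").read()`; reading only the first few kilobytes is usually enough for this check.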
I have been importing some data from MySQL to Postgres. The plan should have been simple: manually re-create the tables with their equivalent data types, devise a way to output as CSV, transfer the data over, and copy it into Postgres. Done.
mysql -u whatever -p whatever -d the_database
SELECT * INTO OUTFILE '/tmp/the_table.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\' FROM the_table;
Then send the file over and import it into Postgres:
psql -etcetc -d other_database
COPY the_table FROM '/csv/file/location/the_table.csv' WITH( FORMAT CSV, DELIMITER ',', QUOTE '"', ESCAPE '\', NULL '\N' );
It had been too long; I had forgotten that '0000-00-00' was a thing...
So first of all I had to come up with some way of handling the odd values, preferably at the MySQL end, and so I wrote this script for the 20 or so tables I planned to import, to address any incompatibilities and list out the columns accordingly:
with a as (
select
'the_table'::text as tblname,
'public'::text as schname
), b as (
select array_to_string( array_agg( x.column_name ), ',' ) as the_cols from (
select
case
when udt_name = 'timestamp'
then 'NULLIF('|| column_name::text || ',''0000-00-00 00:00:00'')'
when udt_name = 'date'
then 'NULLIF('|| column_name::text || ',''0000-00-00'')'
else column_name::text
end as column_name
from information_schema.columns, a
where table_schema = a.schname
and table_name = a.tblname
order by ordinal_position
) x
)
select 'SELECT '|| b.the_cols ||' INTO OUTFILE ''/tmp/'|| a.tblname ||'.csv'' FIELDS TERMINATED BY '','' OPTIONALLY ENCLOSED BY ''"'' ESCAPED BY ''\\'' FROM '|| a.tblname ||';' from a,b;
Generate the CSV: OK. Transfer it across: OK. Once it was over:
BEGIN;
ALTER TABLE the_table SET( autovacuum_enabled = false, toast.autovacuum_enabled = false );
COPY the_table FROM '/csv/file/location/the_table.csv' WITH( FORMAT CSV, DELIMITER ',', QUOTE '"', ESCAPE '\', NULL '\N' ); -- '
ALTER TABLE the_table SET( autovacuum_enabled = true, toast.autovacuum_enabled = true );
COMMIT;
And it was all going well, until I came across this message:
ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xa0 0xbd
CONTEXT: COPY new_table, line 12345678
A second table hit the same error; every other table imported successfully.
All columns and tables in the MySQL database were set to utf8. The first offending table, which contains messages, was along the lines of:
CREATE TABLE whatever(
col1 int(11) NOT NULL AUTO_INCREMENT,
col2 date,
col3 int(11),
col4 int(11),
col5 int(11),
col6 int(11),
col7 varchar(64),
PRIMARY KEY(col1)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So presumably the data should be UTF-8... right? To rule out any major errors, I edited my.cnf to set the encoding everywhere I could think of:
[character sets]
default-character-set=utf8
default-character-set=utf8
character-set-server = utf8
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
I also altered the case statement in my initial "query-generating query" to convert text columns, just for the sake of converting:
case
when udt_name = 'timestamp'
then 'NULLIF('|| column_name::text || ',''0000-00-00 00:00:00'')'
when udt_name = 'date'
then 'NULLIF('|| column_name::text || ',''0000-00-00'')'
when udt_name = 'text'
then 'CONVERT('|| column_name::text || ' USING utf8)'
else column_name::text
end as column_name
And still no luck. After googling "0xed 0xa0 0xbd" I am still none the wiser; character sets are not really my thing.
I even opened the 3 GB CSV file at the line it mentioned, and nothing appeared to be out of place. Looking with a hex editor I could not see those byte values (edit: maybe I didn't look hard enough), so I am starting to run out of ideas. Am I missing something really simple? And, worryingly, is it possible that some of the other tables were "silently" corrupted too?
The MySQL version is 5.5.44 on Ubuntu 14.04, and Postgres is 9.4.
With nothing further to try, I went for the simplest solution and just altered the files:
iconv -f utf-8 -t utf-8 -c the_file.csv > the_file_iconv.csv
The new files differed from the originals by about 100 bytes, so there must have been invalid bytes in there somewhere that I could not see. They imported "properly", so I suppose that is good. However, it would be nice to know whether there is some way to enforce proper encoding when creating the files, rather than discovering the problem on import.
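The bytes in the error, 0xed 0xa0 0xbd, are what results when the code point U+D83D is encoded as if it were an ordinary character. U+D83D is a UTF-16 high surrogate, and strict UTF-8 (which Postgres enforces) does not allow surrogates. One plausible cause, which is an assumption and not something confirmed above, is that characters outside the Basic Multilingual Plane, such as emoji, were stored as surrogate pairs. A minimal Python sketch for locating such bytes before the import follows; the function name `find_invalid_utf8` and the sample bytes are made up for illustration.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, offending_bytes) for the first invalid UTF-8
    sequence in data, or None if everything decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end]
    return None


# Hypothetical sample: valid ASCII around an encoded surrogate pair.
sample = b"ok \xed\xa0\xbd\xed\xb8\x80 tail"
print(find_invalid_utf8(sample))

# Decoding with "surrogatepass" shows which code point the bytes stand for.
high = b"\xed\xa0\xbd".decode("utf-8", "surrogatepass")
print(hex(ord(high)))  # 0xd83d
```

For a 3 GB file it would be more practical to run the check line by line on a file opened in binary mode than to load everything into memory at once.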
Stuck with the following:
$loaddata = "LOAD DATA INFILE 'filename.csv'
INTO TABLE tb1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\\r\\n'
IGNORE 1 LINES
(
Entity,
HK,
@Period,
)
SET Period = STR_TO_DATE(@Period,'%C%YY%MM')
";
which gives me an SQL syntax error near
) SET Period = STR_TO_DATE(@Period,'%C%YY%MM')
Period is a DATE column. For the period Oct-13, the CSV will show 11310.
Thanks in advance!
You have a superfluous comma after @Period:
$loaddata = "LOAD DATA INFILE 'filename.csv'
INTO TABLE tb1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\\r\\n'
IGNORE 1 LINES
(
Entity,
HK,
@Period -- trailing comma removed here
)
SET Period = STR_TO_DATE(@Period,'%C%YY%MM')
";
However, your date format string is almost certainly incorrect. %C is not a valid specifier, and %YY and %MM are read as %Y (four-digit year) and %M (month name), each followed by a literal letter. See DATE_FORMAT().
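Since no single format specifier covers a leading century digit, one option is to decode the value by hand. Here is a minimal Python sketch of that decoding, under the assumption (not stated in the question) that 11310 follows a CYYMM layout where C = 0 means the 1900s and C = 1 means the 2000s, and that the day can be fixed to the first of the month. The function name `parse_cyymm` is hypothetical.

```python
from datetime import date


def parse_cyymm(code: str) -> date:
    """Decode a 5-digit CYYMM period into the first day of that month.

    Assumption: C is a century flag (0 -> 19xx, 1 -> 20xx).
    """
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected 5 digits, got {code!r}")
    century = 1900 + 100 * int(code[0])
    year = century + int(code[1:3])
    month = int(code[3:5])
    return date(year, month, 1)


print(parse_cyymm("11310"))  # 2013-10-01
```

The same arithmetic could be expressed in the SET clause with string functions, as long as the century rule above really matches the data.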