I'm trying to read a CSV file and load the values into a MySQL table.
My CSV file looks like this:
"1026235","2172","Werdmühlestrasse","4","400","Werdmühlestrasse 4","3","real","BB","261AA01857","3169179","2683137.449","1247708.724","8001","0","Zürich","AA1750","","K","1","Lindenhof","13","1301","Zürichberg","Altstadt","St.Peter u Paul","St.Peter","1026238","562","Fortunagasse","15","1500","Fortunagasse 15","3","real","BB","261AA01852","140709","2683163.645","1247502.811","8001","0","Zürich","AA5297","","K","1","Lindenhof","13","1301","Zürichberg","Altstadt","St.Peter u Paul","St.Peter","1","3","3","29.0","8.539764579706915","47.373115180353350","POINT (2683163.8 1247502.8)"
This is the command I'm trying to run:
LOAD DATA INFILE '/home/coder/project/geoz.adrstzh_adressen_stzh_p.csv'
INTO TABLE mainZuerichAddresses
FIELDS
TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@col1,@dummy,@col3,@col4,@dummy,@col6,@col7,@dummy,@dummy,@col10,@dummy,@dummy,@dummy,@col14,@dummy,@col16,@dummy,@dummy,@dummy,@col20,@col21,@dummy,@col23,@col24,@col25,@col26,@col27,@col28,@col29,@dummy,@dummy,@dummy,@dummy,@col34)
SET objid=@col29,gebaeudeeingangnummer=@col1,adresse=@col6,lokalisationsname=@col3,
hausnummer=@col4,plz=@col14,plz_ortschaft=@col16,stadtkreis=@col20,
gebaeudenummer=@col10,statistisches_quartier=@col21,status=@col7,statistische_zone=@col23,schulkreis=@col24,verwaltungsquartier=@col25,
roem_kath_kirchgemeinde=@col26,ev_ref_kirchgemeinde=@col27,ev_ref_kirchenkreis=@col28,geometry=@col34;
Here I listed all 34 columns from the CSV file:
(@col1,@dummy,@col3,@col4,@dummy,@col6,@col7,@dummy,@dummy,@col10,@dummy,@dummy,@dummy,@col14,@dummy,@col16,@dummy,@dummy,@dummy,@col20,@col21,@dummy,@col23,@col24,@col25,@col26,@col27,@col28,@col29,@dummy,@dummy,@dummy,@dummy,@col34)
and here I'm trying to assign the data to my table columns, which are in a different order than in the CSV. I don't need all of them, only 18. (Can I even do that, cherry-pick columns from the CSV file and mix their order?)
SET objid=@col29,gebaeudeeingangnummer=@col1,adresse=@col6,lokalisationsname=@col3,hausnummer=@col4,plz=@col14,plz_ortschaft=@col16,stadtkreis=@col20,gebaeudenummer=@col10,statistisches_quartier=@col21,status=@col7,statistische_zone=@col23,schulkreis=@col24,verwaltungsquartier=@col25,roem_kath_kirchgemeinde=@col26,ev_ref_kirchgemeinde=@col27,ev_ref_kirchenkreis=@col28,geometry=@col34;
But I keep getting this error:
ERROR 1366 (HY000): Incorrect integer value: 'Werdmühlestrasse 4' for column 'plz' at row 1
I read the documentation, but it's not very clear how the MySQL statement should be formatted:
You must also specify a column list if the order of the fields in the
input file differs from the order of the columns in the table.
Otherwise, MySQL cannot tell how to match input fields with table
columns.
I based my MySQL command on this question, but it's quite old.
I also found this question, which gave some advice about FIELDS and LINES termination, so I played around a bit with that.
I'm not sure whether the problem is the CSV formatting or the order in which I'm trying to load the data from the CSV into the table columns.
Does anyone have an idea?
Look carefully at the error message.
It says that the value 'Werdmühlestrasse 4' is not an integer.
Your question contains several questions. On the part about how the statement should be formatted:
The list in parentheses describes the fields of the CSV file, in file order, and should have an entry for every field. For example,
given a CSV file
name,junk,val
mike,1234,aaa
bob,4567,bbb
steve,8910,ccc
and a table
create table t(id int auto_increment primary key,
name varchar(20),
junk varchar(20),
val varchar(20));
The following will fail because I have not provided a column list, so LOAD DATA INFILE attempts to load the first field of the input file into id, and that value does not match the data type of id in the table.
LOAD DATA INFILE 'C:\\Program Files\\MariaDB 10.1\\data\\sandbox\\data.txt'
INTO TABLE t
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"'
LINES TERMINATED BY '\r\n' IGNORE 1 ROWS;
In fact I want id to auto-increment, so I specify a target column for every field in the input file.
LOAD DATA INFILE 'C:\\Program Files\\MariaDB 10.1\\data\\sandbox\\data.txt'
INTO TABLE t
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"'
LINES TERMINATED BY '\r\n' IGNORE 1 ROWS
(name,junk,val);
+----+-------+------+------+
| id | name | junk | val |
+----+-------+------+------+
| 1 | mike | 1234 | aaa |
| 2 | bob | 4567 | bbb |
| 3 | steve | 8910 | ccc |
+----+-------+------+------+
3 rows in set (0.001 sec)
And if I want the third field in the file to go to name in the table and the first field in the file to go to val in the table:
LOAD DATA INFILE 'C:\\Program Files\\MariaDB 10.1\\data\\sandbox\\data.txt'
INTO TABLE t
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"'
LINES TERMINATED BY '\r\n' IGNORE 1 ROWS
(val,junk,name);
+----+-------+------+-------+
| id | name | junk | val |
+----+-------+------+-------+
| 1 | mike | 1234 | aaa |
| 2 | bob | 4567 | bbb |
| 3 | steve | 8910 | ccc |
| 4 | aaa | 1234 | mike |
| 5 | bbb | 4567 | bob |
| 6 | ccc | 8910 | steve |
+----+-------+------+-------+
6 rows in set (0.001 sec)
And if I want to read a field from the input file, park it in a user-defined variable, and do nothing with it (as you have done):
LOAD DATA INFILE 'C:\\Program Files\\MariaDB 10.1\\data\\sandbox\\data.txt'
INTO TABLE t
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"'
LINES TERMINATED BY '\r\n' IGNORE 1 ROWS
(val,@dummy,name);
+----+-------+------+-------+
| id | name | junk | val |
+----+-------+------+-------+
| 1 | mike | 1234 | aaa |
| 2 | bob | 4567 | bbb |
| 3 | steve | 8910 | ccc |
| 4 | aaa | 1234 | mike |
| 5 | bbb | 4567 | bob |
| 6 | ccc | 8910 | steve |
| 7 | aaa | NULL | mike |
| 8 | bbb | NULL | bob |
| 9 | ccc | NULL | steve |
+----+-------+------+-------+
9 rows in set (0.001 sec)
Another use for user-defined variables is input preprocessing; see the manual for examples: https://dev.mysql.com/doc/refman/8.0/en/load-data.html
In your case you aren't doing any input transformations, so all those SET assignments appear to be unnecessary, but they will work.
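For a file with many fields, the column list can be generated rather than typed by hand. The following is only a sketch in Python: `build_column_list` is a hypothetical helper (not part of MySQL or any library) that puts a table column name at each wanted field position and the throwaway variable @dummy everywhere else, so no SET clause is needed when the values are loaded unchanged.

```python
def build_column_list(n_fields, wanted):
    """Return a LOAD DATA column list for a file with n_fields fields.

    wanted maps a 1-based field position in the file to the table column
    that should receive it; every other field is parked in @dummy.
    """
    entries = []
    for position in range(1, n_fields + 1):
        entries.append(wanted.get(position, "@dummy"))
    return "(" + ",".join(entries) + ")"


if __name__ == "__main__":
    # Third field of the file goes to name, first field goes to val,
    # the middle field is read and discarded.
    print(build_column_list(3, {1: "val", 3: "name"}))
```

Note that the list must still account for every field in the file, in file order; only the names inside it decide where each value ends up.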
I have a MySQL table and a CSV file. The table has a JSON column and the CSV file has a corresponding JSON field. I use the "load data local infile ..." method to import the CSV file into MySQL, but the import fails.
Here are my table details:
mysql> desc test;
+---------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| content | json | YES | | NULL | |
| address | varchar(255) | NO | | NULL | |
| type | int | YES | | 0 | |
+---------+--------------+------+-----+---------+----------------+
and my sql statement:
mysql> load data local infile '/Users/kk/Documents/test.csv'
-> into table test
-> fields terminated by ','
-> lines terminated by '\n'
-> ignore 1 rows
-> (id,address,content,type);
ERROR 3140 (22032): Invalid JSON text: "The document root must not be followed by other values." at position 3 in value for column 'test.content'.
My csv file data is as follows
"id","address","content","type"
1,"test01","{\"type\": 3, \"chain\": 1, \"address\": \"test01\"}",1
2,"test02","{\"type\": 3, \"chain\": 2, \"address\": \"test02\"}",1
If you are able to hand-craft a single insert statement that works (example here) you could go via a preprocessor written in a simple scripting language. Python, AutoIT, PowerShell, ... Using a preprocessor you have more control of fields, quoting, ordering etc compared to direct import in MySQL.
So, for example (assuming you have used Python):
python split.py /Users/kk/Documents/test.csv > /tmp/temp.sql
mysql -h myhostname -u myUser mydatabase < /tmp/temp.sql
where temp.sql would be something like
insert into test (content, address, type) values ('{"type":3,"chain":1,"address":"test01"}', 'test01', 1);
...
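A minimal sketch of what such a preprocessor might look like, assuming the file uses backslash-escaped quotes exactly as in the sample above. The names `sql_quote` and `rows_to_inserts` are made up for illustration, and the quoting assumes the server's default SQL mode (backslash escapes enabled); for anything beyond a one-off import, a parameterized query through a database driver would be safer.

```python
import csv
import io


def sql_quote(value):
    """Quote a string as a MySQL string literal (default SQL mode assumed)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def rows_to_inserts(csv_text, table="test"):
    """Turn CSV text with backslash-escaped quotes into INSERT statements."""
    reader = csv.reader(
        io.StringIO(csv_text),
        quotechar='"',
        escapechar="\\",    # the file writes an embedded quote as \"
        doublequote=False,  # ... and not as ""
    )
    next(reader)  # skip the header row
    statements = []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        _id, address, content, type_ = row  # id is left to auto_increment
        statements.append(
            "insert into {} (content, address, type) values ({}, {}, {});".format(
                table, sql_quote(content), sql_quote(address), int(type_)
            )
        )
    return statements


if __name__ == "__main__":
    import sys

    # newline="" lets the csv module handle line endings itself.
    with open(sys.argv[1], newline="") as handle:
        for statement in rows_to_inserts(handle.read()):
            print(statement)
```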
I have a csv file which I have uploaded here
https://drive.google.com/file/d/1JfYc-7840utoa3k5iamEC-sScPlzsVVK/view
I have created a table, Titanic, with the following structure.
mysql> desc Titanic;
+----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| last | varchar(255) | NO | | NULL | |
| first | varchar(255) | NO | | NULL | |
| gender | char(2) | NO | | NULL | |
| age | decimal(3,0) | YES | | NULL | |
| class | int(3) | NO | | NULL | |
| fare | decimal(5,0) | NO | | NULL | |
| embarked | varchar(255) | NO | | NULL | |
| survived | char(3) | NO | | NULL | |
+----------+--------------+------+-----+---------+-------+
8 rows in set (1.89 sec)
I have been asked to use LOAD DATA INFILE Statement to populate this table,
and as per my assignment
A blank entry for age means that the age is unknown
Fare can have more than two digits because money was not base-10 at that time
I tried to execute the statement as follows:
mysql> load data infile '/var/lib/mysql-files/Titanic.csv' into table Titanic,fields terminated by ',' optinally enclosed by '"' lines terminated by '\n' ignore1 lines;
I get this error:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ',fields terminated by ',' optinally enclosed by '"' lines terminated by '\n' ign' at line 1
Then I tried the following:
mysql> load data infile '/var/lib/mysql-files/Titanic.csv' into table Titanic,fields terminated by ',' optinally enclosed by '"' lines terminated by '\n' ignore
1 lines;
The error I get is:
ERROR 1064 (40000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ',fields terminated by ',' optinally enclosed by '"' lines terminated by '\n' ign' at line 1
If you look at the CSV file in the link I gave, row 7 has this entry:
Moran Mr. James M 3 8.4583 Queenstown no
There is no age in this row, so I assumed the age to be NULL; that is why I allowed NULL for age when creating the table.
Row 24 has the following entry:
McGowan Miss Anna "Annie" F 15 3 8.0292 Queenstown yes
It has a decimal value in fare.
Row 87 has the following entry:
Backstrom Mrs. Karl Alfred (Maria Mathilda Gustafsson) F 33 3 15.85 Southampton yes
It has parentheses in the column titled first.
Row 150 has the following:
Navratil Mr. Michel ("Louis M Hoffman") M 36.5 2 26 Southampton no
It has double quotes " ", which appear in some fields but not all.
I do not understand how to use the LOAD DATA INFILE statement with this CSV file.
I am doing this to learn, so I do not want to use a GUI tool.
What is the mistake in the LOAD DATA statement above?
How can I use LOAD DATA with this kind of CSV, where some fields contain double quotes " " or parentheses () and some fields are blank or NULL?
I am using MySQL on Ubuntu 19.10.
Server version: 8.0.18-0ubuntu0.19.10.1 (Ubuntu)
Update 1
As per the discussion in the comments, here is my CSV file pasted as text:
last,first,gender,age,class,fare,embarked,survived
Braund,Mr. Owen Harris,M,22,3,7.25,Southampton,no
Cumings,Mrs. John Bradley (Florence Briggs Thayer),F,38,1,71.2833,Cherbourg,yes
Heikkinen,Miss Laina,F,26,3,7.925,Southampton,yes
Futrelle,Mrs. Jacques Heath (Lily May Peel),F,35,1,53.1,Southampton,yes
Allen,Mr. William Henry,M,35,3,8.05,Southampton,no
Moran,Mr. James,M,,3,8.4583,Queenstown,no
McGowan," Miss Anna ""Annie""",F,15,3,8.0292,Queenstown,yes
Backstrom,Mrs. Karl Alfred (Maria Mathilda Gustafsson),F,33,3,15.85,Southampton,yes
Ford," Miss Robina Maggie ""Ruby""",F,9,3,34.375,Southampton,no
Navratil," Mr. Michel (""Louis M Hoffman"")",M,36.5,2,26,Southampton,no
Byles,Rev. Thomas Roussel Davids,M,42,2,13,Southampton,no
the full csv in text form can be seen here https://pastebin.com/1B1mVYhJ
apart from this here is a screenshot of how it looks at my system when I issue a load data query
Update 2
I completed this assignment by changing the table definition: instead of giving the columns different data types, I declared all of them as varchar.
The questions I was trying to answer are here:
http://arshahuja.blogspot.com/2018/01/deit-14610-big-data-analytics-laboratory.html
The solution is also there; the only problem was using this kind of CSV file.
However, I am not convinced by creating a table like this. Loading the data with the following table definition solved the problems described above. But what if I need to use values like age, class, and fare in calculations? How would I write such a query when everything is varchar?
mysql> desc Titanic;
+----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| last | varchar(255) | NO | | NULL | |
| first | varchar(255) | NO | | NULL | |
| gender | varchar(255) | NO | | NULL | |
| age | varchar(255) | YES | | NULL | |
| class | varchar(255) | NO | | NULL | |
| fare | varchar(255) | NO | | NULL | |
| embarked | varchar(255) | NO | | NULL | |
| survived | varchar(3) | NO | | NULL | |
+----------+--------------+------+-----+---------+-------+
8 rows in set (0.06 sec)
You have a comma after your table name.
load data infile '/var/lib/mysql-files/Titanic.csv' into table Titanic,fields terminated by ...
If you look at the syntax documentation at https://dev.mysql.com/doc/refman/8.0/en/load-data.html and the example of a complete statement, there is no comma after the table name.
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
FIELDS TERMINATED BY ',' LINES STARTING BY 'xxx';
In general, when MySQL reports a syntax error, it tells you exactly at which point in the statement it found something it didn't think matched the syntax rules.
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ',fields terminated by ',' optinally enclosed by '"' lines terminated by '\n' ign' at line 1
The error above tells you it got confused at the comma right before "fields terminated by..."
That's where you should double-check your statement against the syntax reference documentation, or other examples of working statements.
You've misspelled optionally:
mysql> load data infile '/var/lib/mysql-files/Titanic.csv' into table Titanic,fields terminated by ',' optinally enclosed by '"' lines terminated by '\n' ignore 1 lines;
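As for the quoting in the sample rows: fields are enclosed in double quotes only when needed, and a quote inside a quoted field is doubled (""Annie""). As far as I know, that is the convention OPTIONALLY ENCLOSED BY '"' is meant for, and Python's csv module reads it the same way by default. The sketch below only illustrates how those rows split into fields; `parse_rows` is a made-up helper, and turning a blank age into None is one possible way to model "age unknown", not something MySQL does for you.

```python
import csv
import io

# Three rows copied from the sample in the question.
SAMPLE = (
    "last,first,gender,age,class,fare,embarked,survived\n"
    "Moran,Mr. James,M,,3,8.4583,Queenstown,no\n"
    'McGowan," Miss Anna ""Annie""",F,15,3,8.0292,Queenstown,yes\n'
    "Backstrom,Mrs. Karl Alfred (Maria Mathilda Gustafsson),F,33,3,15.85,Southampton,yes\n"
)


def parse_rows(text):
    """Parse CSV text into dicts, mapping a blank age to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["age"] = row["age"] or None  # blank entry means age unknown
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row["last"], "|", row["first"], "|", row["age"])
```

Parentheses need no special handling; they are ordinary characters inside a field.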
FYI:
I'm working with a CSV file from Census - FactFinder
Using MySQL 5.7
OS is Windows 10 PRO
So, I created this table:
+----------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+-------+
| SERIALNO | bigint(13) | NO | PRI | NULL | |
| DIVISION | int(9) | YES | | NULL | |
| PUMA | int(4) | YES | | NULL | |
| REGION | int(1) | YES | | NULL | |
| ST | int(1) | YES | | NULL | |
| ADJHSG | int(7) | YES | | NULL | |
| ADJINC | int(7) | YES | | NULL | |
| FINCP | int(6) | YES | | NULL | |
| HINCP | int(6) | YES | | NULL | |
| R60 | int(1) | YES | | NULL | |
| R65 | int(1) | YES | | NULL | |
+----------+------------+------+-----+---------+-------+
And tried to load data using:
LOAD DATA INFILE "C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv"
INTO TABLE housing
CHARACTER SET latin1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
It didn't work; this message appeared:
ERROR 1366 (HY000): Incorrect integer value: '' for column 'FINCP' at row 2
The row the error message is referring to is:
2012000000051,3,104,2,17,1045360,1056030,,8200,1,1
I believe the problem is FINCP, which is the blank value (,,) right before 8200. So I followed the instructions in this thread: MySQL load NULL values from CSV data
And updated my code to:
LOAD DATA INFILE "C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv"
INTO TABLE housing
CHARACTER SET latin1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
(@SERIALNO, @DIVISION, @PUMA, @REGION, @ST, @ADJHSG, @ADJINC, @FINCP, @HINCP, @R60, @R65)
SET
SERIALNO = nullif(@SERIALNO,''),
DIVISION = nullif(@DIVISION,''),
PUMA = nullif(@PUMA,''),
REGION = nullif(@REGION,''),
ST = nullif(@ST,''),
ADJHSG = nullif(@ADJHSG,''),
ADJINC = nullif(@ADJINC,''),
FINCP = nullif(@FINCP,''),
HINCP = nullif(@HINCP,''),
R60 = nullif(@R60,''),
R65 = nullif(@R65,'');
The first error is now gone but this message appears:
' for column 'R65' at row 12t integer value: '
The row this message refers to is:
2012000000318,3,1602,2,17,1045360,1056030,,,,
There's no readable error message, so I don't know exactly what the problem is. I can only assume that the problem is the four consecutive blank values.
Another clue: if I edit the CSV and change all blanks to 0, the load runs smoothly, but I'm not a fan of editing raw data, so I would like to know other options.
Bottom line, I have two questions:
Shouldn't the data load with the first statement, since MySQL should take ,, as NULL and 0 as a plain 0?
What problem am I hitting now that I'm using SERIALNO = nullif(@SERIALNO,'')?
I want to be able to differentiate between 0 and null/blank values.
Thank you.
MySQL's LOAD DATA tool interprets \N as being a NULL value. So, if your troubled row looked like this:
2012000000318,3,1602,2,17,1045360,1056030,\N,\N,\N,\N
then you might not have this problem. If you have access to a regex replacement tool, you may try searching for the following pattern:
(?<=^)(?=,)|(?<=,)(?=,)|(?<=,)(?=$)
Then, replace with \N. This should fill in all the empty slots with \N, which semantically will be interpreted by MySQL as meaning NULL. Note that if you were to write a table out from MySQL, then nulls would be replaced with \N. The issue is that your data source and MySQL don't know about each other.
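If no regex-capable editor is at hand, the same replacement can be sketched in a few lines of Python. This is an illustration under assumptions: `fill_empty_fields` is a hypothetical helper, the pattern is a slightly simplified equivalent of the one above, and it treats every comma as a field separator, so it is not safe for files with quoted fields that contain commas. The helper also strips the line terminator (including a possible \r) so the caller can write back whichever ending it needs.

```python
import re

# An empty field sits at the start of the line, between two commas,
# or at the end of the line right after a comma.
EMPTY_FIELD = re.compile(r"^(?=,)|(?<=,)(?=,)|(?<=,)$")


def fill_empty_fields(line):
    """Replace every empty comma-separated field in one line with \\N."""
    body = line.rstrip("\r\n")
    # A function is used as the replacement so the backslash is taken literally.
    return EMPTY_FIELD.sub(lambda match: r"\N", body)


if __name__ == "__main__":
    print(fill_empty_fields("2012000000318,3,1602,2,17,1045360,1056030,,,,"))
```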
Query:
LOAD DATA LOCAL INFILE 'actors.csv'
INTO TABLE Actors
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(ACTOR_ID, FNAME, LNAME);
CSV File:
ACTOR_ID, FNAME, LNAME
"66666","Billy","Lou"
"77777","Sally","Lou"
"88888","Hilly","Lou"
mysql> describe Actors;
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| ACTOR_ID | char(5) | NO | PRI | | |
| FNAME | varchar(20) | NO | | NULL | |
| LNAME | varchar(20) | NO | | NULL | |
+----------+-------------+------+-----+---------+-------+
> The output after running query:
| 10047 | Shirley | Jones |
| 10048 | Andre | Vippolis |
| 66666 | Billy | Lou"
"77777 |
| 88888 | Hilly | "Lou"
|
+----------+-------------+---------------+
I am trying to put a CSV file into my database. I've gotten the query
from a MySQL tutorial (except put the values I have in there). When I
run the query, My data is not properly inserted. I already have 2 rows
inserted (10047, 10048) and then I try to put the data from the CSV
file in, but it does not go in properly. It seems that the quotations
are not being read properly. But the statement ENCLOSED BY '"'
should handle the quotations. What am I doing wrong here?
It seems there is a \r between
"Lou"
"77777"
and not a \n.
Use a text editor to correct this.
Found a related SO post
CSV files frequently have a carriage return/line feed pair as the line terminator. If the file was generated using Excel, for example, you will almost certainly have that.
A way to correct that is to modify your code as follows:
LOAD DATA LOCAL INFILE 'actors.csv'
INTO TABLE Actors
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(ACTOR_ID, FNAME, LNAME);
I do most of my CSV importing that way.
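To find out which terminator a file actually uses before writing the LINES TERMINATED BY clause, the raw bytes can be inspected. A small sketch in Python; `detect_line_terminator` is a made-up helper, and it simply reports the first convention it finds, so a file with mixed endings would need a closer look.

```python
def detect_line_terminator(data):
    """Return the escape sequence to put in LINES TERMINATED BY.

    data is a bytes sample taken from the start of the file.
    """
    if b"\r\n" in data:
        return "\\r\\n"  # Windows-style CRLF
    if b"\n" in data:
        return "\\n"     # Unix-style LF
    if b"\r" in data:
        return "\\r"     # bare CR (old Mac style)
    return "\\n"         # no terminator seen; fall back to LF


if __name__ == "__main__":
    import sys

    # Read in binary mode so Python does not translate the line endings.
    with open(sys.argv[1], "rb") as handle:
        print(detect_line_terminator(handle.read(65536)))
```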