I have a CSV file with three columns: "movieId", "imdbId", and "tmdbId". The "tmdbId" column is empty in many rows. (movieId is a foreign key referring to a primary key in another table.)
When I read this file into R as a data frame, the empty values are treated as NA. When I import the CSV file into a MySQL database using the following command, the rows with NA values are not inserted into the table, even though the column allows NULL. I do not get any errors.
Besides the following command, I also tried importing the dataset using MySQL Workbench, but that did not work either.
Any suggestions?
LOAD DATA LOCAL INFILE 'links.csv' INTO TABLE links
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(movieId, imdbId, tmdbId);
I know NULL and NA values are not the same, but I do not understand why R treats empty rows as NA. I tried to replace NA with NULL, but R does not support this operation.
The table:
CREATE TABLE links (
movieId int NOT NULL,
imdbId int DEFAULT NULL,
tmdbId int DEFAULT NULL,
KEY movieId (movieId),
CONSTRAINT links_ibfk_1 FOREIGN KEY (movieId) REFERENCES movieId_title (movieId)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
The CSV file looks like this:
[image not included]
Here is an example of an empty row for the third column:
[image not included]
As @Alec suggested, you can run SET foreign_key_checks = 0. Then you can replace the zeros with NULL using the following command:
UPDATE table_name
SET col_name= NULL
WHERE col_name = 0;
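An alternative that avoids the follow-up UPDATE is to convert empty fields to NULL during the load itself. The following is a minimal sketch, assuming the empty tmdbId fields arrive as empty strings (possibly with a trailing carriage return if the file was saved with Windows line endings); the variable name @tmdbId is only illustrative:

```sql
-- Read the third field into a user variable instead of the column,
-- then store NULL when the field is empty.
LOAD DATA LOCAL INFILE 'links.csv' INTO TABLE links
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(movieId, imdbId, @tmdbId)
SET tmdbId = NULLIF(TRIM(TRAILING '\r' FROM @tmdbId), '');

-- Inspect any rows that were skipped or adjusted during the load.
SHOW WARNINGS;
```

If the assumption holds, a check such as SELECT COUNT(*) FROM links WHERE tmdbId IS NULL; should match the number of empty values in the file.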
Related
I have a .csv file which looks like this:
and that can be downloaded from here.
I create my DB and my table with the following code:
CREATE DATABASE test_schema;
CREATE TABLE test_schema.teams (
teamkey SMALLINT NOT NULL AUTO_INCREMENT,
teamid CHAR(3) NOT NULL,
yearid YEAR(4) NOT NULL,
leagueid CHAR(2) NOT NULL,
teamrank TINYINT(2) NOT NULL,
PRIMARY KEY (teamkey),
UNIQUE KEY teamkey_UNIQUE (teamkey),
KEY teamid_yearid_leagueid_UNIQUE (teamid, yearid, leagueid),
CONSTRAINT check_teamrank CHECK (((teamrank >= 0) and (teamrank <= 12))),
CONSTRAINT check_year CHECK (((yearid >= 1871) and (yearid <=2155))))
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Now, when I try to import using:
LOAD DATA LOCAL INFILE "path_to_file_in_my_computer/Teams.csv"
INTO TABLE test_schema.teams
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(@teamID, @yearID, @lgID, @Rank);
I get 2895 warnings which are all the same:
Warning (Code 3819): Check constraint 'check_year' is violated.
This warning makes no sense since yearid goes from 1871 to 2018 as can be corroborated if you look at the structure of the Teams.csv file. So any advice or suggestion on how to handle this error will be much appreciated. I'm working on MySQL Workbench 8.0.
PS: I posted a similar question (since deleted) this morning, but it needed more details, which are provided here.
You don't have the column names in the correct order in the LOAD DATA query.
LOAD DATA LOCAL INFILE "path_to_file_in_my_computer/Teams.csv"
INTO TABLE test_schema.teams
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(yearid, leagueid, teamid, @franchid, @divid, teamrank);
You can assign directly to the table column names; you don't need variables such as @yearID unless you have to do extra processing before storing the value in the table.
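For cases where extra processing is needed, the fields go into user variables and a SET clause maps them to columns. Fields read into variables are not stored anywhere unless SET assigns them, which may explain the original warnings: with every field in a variable and no SET clause, yearid is never populated from the file. A minimal sketch, assuming the field order given in the answer above; the UPPER() and TRIM() calls are placeholder processing, not something the data is known to need:

```sql
LOAD DATA LOCAL INFILE "path_to_file_in_my_computer/Teams.csv"
INTO TABLE test_schema.teams
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
-- Fields are listed in file order; @franchid and @divid are read but discarded.
(@yearid, @leagueid, @teamid, @franchid, @divid, @teamrank)
SET yearid   = @yearid,
    leagueid = UPPER(TRIM(@leagueid)),  -- placeholder processing
    teamid   = UPPER(TRIM(@teamid)),    -- placeholder processing
    teamrank = @teamrank;
```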
I'm working on a new web project, but the data is stored in Excel. I don't want to add the records manually. Is there a way to import them?
There are a few ways of doing it.
You can use LOAD DATA.
Let's say you have the table below:
CREATE TABLE `set_of_data` (
`id` int NOT NULL AUTO_INCREMENT,
`x` varchar(10) DEFAULT NULL,
`y` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB ;
Your Excel file should be saved in .csv format.
Then you can use LOAD DATA:
-- Path of the file on the server; it could instead be '/var/lib/mysql-files/your_data.csv'
LOAD DATA INFILE '/var/lib/mysql/your_data.csv'
IGNORE INTO TABLE set_of_data
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(id,x,y);
Another way is to build the INSERT statements with an Excel formula and then run them.
This only suits small tables without much data.
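If the formula approach is used to generate INSERT statements, the result would look something like the sketch below. The values are placeholders, not real data, and id is omitted so that AUTO_INCREMENT assigns it:

```sql
-- One row per spreadsheet line; 'a1', 'b1', ... stand in for cell values.
INSERT INTO set_of_data (x, y) VALUES
  ('a1', 'b1'),
  ('a2', 'b2'),
  ('a3', 'b3');
```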
Is it possible to LOAD DATA a csv into mysql without having to add empty values for non existing columns at the end?
All my optional columns are sorted at the end of the schema:
CREATE TABLE `person` (
id int(20) NOT NULL AUTO_INCREMENT,
firstname varchar(30) NOT NULL,
lastname varchar(30) NOT NULL,
optional1 varchar DEFAULT NULL,
optional... varchar DEFAULT NULL,
optional50 varchar DEFAULT NULL,
PRIMARY KEY (`id`)
) engine=innodb AUTO_INCREMENT=0;
sample.csv:
1;john;doe
2;jabe;doe;;;opt val3;;;;;;opt val9;;;;;;...
Important: I don't want to explicitly list all the columns in my LOAD DATA INFILE statement (I know that this would work by using a combination of IFNULL and @var).
But can't I just load into the table, telling mysql to ignore any missing fields at the end of each line?
The documentation of MySQL LOAD DATA syntax provides the following information:
By default, when no column list is provided at the end of the LOAD DATA statement, input lines are expected to contain a field for each table column. If you want to load only some of a table's columns, specify a column list.
[...]
If an input line has too few fields, the table columns for which input fields are missing are set to their default values. For numeric types, the column is set to 0.
[...]
An empty field value is interpreted different from a missing field: for string types, the column is set to the empty string.
So given your sample data:
1;john;doe
2;jabe;doe;;;opt val3;;;;;;opt val9;;;;;;...
The record with id 1 will have all optional columns set to NULL (i.e. their default). For id 2, the optional string columns whose fields are empty will be set to the empty string.
I cannot tell whether this would be OK for your use case or not. If you do want consistent values in the optional columns, available options would be:
Input pre-processing: use SET to convert empty strings to NULL during the load
LOAD DATA INFILE 'file.txt' INTO TABLE t1
SET
optional1 = NULLIF(optional1, ''),
optional2 = NULLIF(optional2, ''),
...
Set up a BEFORE INSERT trigger on the table that converts empty values to NULL
Run an update on the table after it has been populated
UPDATE t1 SET optional1 = NULLIF(optional1, ''), optional2 = NULLIF(optional2, '')
WHERE '' IN (optional1, optional2, ...)
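The trigger option could look like the following minimal sketch. It assumes the person table from the question with columns named optional1 and optional2; the trigger name is made up for illustration, and one SET line per optional column would be needed:

```sql
-- DELIMITER is a mysql client command, needed so the trigger body can contain semicolons.
DELIMITER //

CREATE TRIGGER person_empty_to_null
BEFORE INSERT ON person
FOR EACH ROW
BEGIN
  -- NULLIF returns NULL when both arguments are equal, so '' becomes NULL.
  SET NEW.optional1 = NULLIF(NEW.optional1, '');
  SET NEW.optional2 = NULLIF(NEW.optional2, '');
END//

DELIMITER ;
```

LOAD DATA activates INSERT triggers, so this applies to bulk loads as well as ordinary INSERT statements, at the cost of running for every row.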
I found out that it works as expected if I add the IGNORE keyword to the LOAD DATA statement:
LOAD DATA INFILE 'sample.csv' IGNORE INTO TABLE persons
Thereby, I can define all my optional columns as DEFAULT NULL, and if values are missing, they are set to NULL during import.
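A fuller version of that statement might look like the sketch below, assuming the semicolon-separated sample.csv and the person table from the question. Consistent with what is reported above, IGNORE appears to turn the error for lines with too few fields into a warning, so checking the warnings afterwards shows which rows were affected:

```sql
LOAD DATA INFILE 'sample.csv' IGNORE INTO TABLE person
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n';

-- Lines with missing trailing fields should be reported here rather than aborting the load.
SHOW WARNINGS LIMIT 10;
```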
I have an Excel sheet with the following columns:
Sam/Simon Date Store Customer name Original Order Number
Simon 09/11/2014 Bristol Cr Car 20089691/ 26089697
I need to store this information as 2 rows in a table:
Simon 09/11/2014 Bristol Cr Car 20089691/
Simon 09/11/2014 Bristol Cr Car 26089697
I need to know what table structure to use, and how to split the delimited values in one column into 2 rows when exporting them from Excel to MySQL.
The actual table structure is as follows:
CREATE TABLE "tblOrderR" (
"intOrderRemedialId" int(10) unsigned NOT NULL AUTO_INCREMENT,
"intOrderId" int(10) unsigned NOT NULL,
"intOrderRemedialGivenPence" int(10) unsigned NOT NULL,
"intRequestedById" smallint(5) unsigned DEFAULT NULL,
"intAuthorizedById" smallint(5) unsigned DEFAULT NULL,
PRIMARY KEY ("intOrderRemedialId"),
KEY "tblOrderRemedial" ("intOrderId"),
KEY "tblOrderRemedial_ibfk_2" ("intRequestedById"),
KEY "tblOrderRemedial_ibfk_3" ("intAuthorizedById"),
CONSTRAINT "tblOrderRemedial_ibfk_1" FOREIGN KEY ("intOrderId") REFERENCES "tblOrder" ("intOrderId")
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Order Remedial Information';
One way to approach this would be:
1) Split out "Original Order Number" Excel column into two separate columns using TEXT TO COLUMNS
2) Save data in csv format
3) Load data into a staging table that looks something like this
samSimon,date,store,customerName,originalOrder1,originalOrder2
4) Run two inserts into your final table: one with originalOrder1 and a second one with originalOrder2. For example:
insert into tblOrderR (columns)
select samSimon,date,store,customerName,originalOrder1 from stagingTable;
insert into tblOrderR (columns)
select samSimon,date,store,customerName,originalOrder2 from stagingTable;
Very pseudo-code answer but hopefully you get the gist of it!
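A variation on the staging approach is to skip the Excel split and separate the two order numbers in SQL. The sketch below assumes a hypothetical staging table staging_orders whose order_numbers column holds the raw 'first/ second' text; all table and column names here are illustrative:

```sql
-- Hypothetical staging table holding the CSV fields as plain text.
CREATE TABLE staging_orders (
  sam_simon     VARCHAR(20),
  order_date    VARCHAR(10),
  store         VARCHAR(50),
  customer_name VARCHAR(100),
  order_numbers VARCHAR(50)
);

-- First order number: everything before the first '/'.
SELECT sam_simon, order_date, store, customer_name,
       TRIM(SUBSTRING_INDEX(order_numbers, '/', 1)) AS order_number
FROM staging_orders
UNION ALL
-- Second order number: everything after the last '/', only when one exists.
SELECT sam_simon, order_date, store, customer_name,
       TRIM(SUBSTRING_INDEX(order_numbers, '/', -1))
FROM staging_orders
WHERE order_numbers LIKE '%/%'
  AND TRIM(SUBSTRING_INDEX(order_numbers, '/', -1)) <> '';
```

For a value such as '20089691/ 26089697' this yields '20089691' and '26089697'. Wrapping the query in an INSERT ... SELECT would write both rows to the final table.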
Just import two times.
First time (example code, you might have to adjust the one or the other thing):
LOAD DATA LOCAL INFILE '/path/to/file/excel.csv'
INTO TABLE your_table -- placeholder: use your own table name
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
(column1, column2, @my_variable)
SET column3 = SUBSTRING(@my_variable FROM 1 FOR LOCATE('/', @my_variable));
Second time:
LOAD DATA LOCAL INFILE '/path/to/file/excel.csv'
INTO TABLE your_table -- placeholder: use your own table name
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
(column1, column2, @my_variable)
SET column3 = SUBSTRING(@my_variable FROM LOCATE('/', @my_variable) + 1);
Relevant manual pages:
string functions like substring() and locate()
load data infile
I've searched all over and can't seem to figure this out, so here is my first Stack Exchange question.
I'm using a Java program to run the bulk load process, but I've also tried it straight from my SQL client, MySQL Workbench, and I get the same error:
LOAD DATA INFILE '/path/to/file/infile.csv'
INTO TABLE t1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(category, item, date_time, v1, v2, v3);
Error:
Error Code: 1062. Duplicate entry ''Book'-'Fiction'-2014-04-16 09:33:00' for key 'PRIMARY'
Using my SQL client, I've confirmed that there is no such record currently in the table; in fact, I don't have any records for the same category-item pair in the same month. I have many (~16,000) CSV files to load into my MySQL database each month. Each file corresponds to a separate category-item pair, with different values over the course of the month. This method has worked so far, having loaded over 50 million records, but I can't seem to load any more without getting this same error.
My table uses 3 fields for the PRIMARY key: 2 varchar() columns and a datetime.
CREATE TABLE `t1` (
`category` varchar(10) NOT NULL,
`item` varchar(15) NOT NULL DEFAULT '',
`date_time` datetime NOT NULL,
`v1` double DEFAULT NULL,
`v2` double DEFAULT NULL,
`v3` double DEFAULT NULL,
PRIMARY KEY (`category`,`item`,`date_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I have worked with databases in the past, but nowhere near this many records, I don't know if that is the problem.
I could switch to using an auto-incremented id for my primary key, but it may take up more room considering the large number of records and I may get duplicates for my (category, item, date_time) which would be problematic.
I know that MySQL permits a “relaxed” format for values specified as strings, and I may need to do some additional formatting to figure this out.
I deleted the first line of my csv file with the value ''Book'-'Fiction'-2014-04-16 09:33:00', but then I get the same 1062 error for the next date time value ''Book'-'Fiction'-2014-04-16 09:35:00'
I thought it might be the way I am formatting my Datetime string but I am using the "YYYY-MM-DD HH:MM:SS" format which has worked on thousands of other LOAD DATA INFILE. Just to be safe I tried using the STR_TO_DATE() function see below
LOAD DATA INFILE '/path/to/file/infile.csv'
INTO TABLE t1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(category, item, @date_var, v1, v2, v3)
SET date_time = STR_TO_DATE(@date_var, '%Y-%m-%d %H:%i:%s');
Any help would be appreciated.
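One way to narrow this down is to load the file into a copy of the table that has no primary key and then look for clashing keys, both within the file and against the rows already in t1. This is a diagnostic sketch rather than a fix; the table name t1_staging is made up for illustration:

```sql
-- Copy the structure of t1, then drop the key so every line loads.
CREATE TABLE t1_staging LIKE t1;
ALTER TABLE t1_staging DROP PRIMARY KEY;

LOAD DATA INFILE '/path/to/file/infile.csv'
INTO TABLE t1_staging
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(category, item, date_time, v1, v2, v3);

-- Keys that occur more than once within the file itself.
SELECT category, item, date_time, COUNT(*) AS copies
FROM t1_staging
GROUP BY category, item, date_time
HAVING COUNT(*) > 1;

-- Keys from the file that already exist in the target table.
SELECT s.category, s.item, s.date_time
FROM t1_staging AS s
JOIN t1 AS t
  ON t.category = s.category
 AND t.item = s.item
 AND t.date_time = s.date_time
LIMIT 10;
```

Comparisons here use the column collation, so with a case-insensitive collation, values that differ only in case or trailing spaces may count as duplicates even though they look different in the file.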