#1062 - Duplicate entry for key 'PRIMARY' on LOAD DATA INFILE - mysql

I've searched all over and can't seem to figure this out, so here is my first Stack Exchange question.
I'm using a Java program to run the bulk load, but I've also tried it directly from my SQL client, MySQL Workbench, and I get the same error:
LOAD DATA INFILE '/path/to/file/infile.csv'
INTO TABLE t1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(category, item, date_time, v1, v2, v3);
Error:
Error Code: 1062. Duplicate entry ''Book'-'Fiction'-2014-04-16 09:33:00' for key 'PRIMARY'
Using my SQL client, I've confirmed that there is no such record in the table. In fact, I don't have any records for the same category-type pair in the same month. I have many (~16,000) CSV files to load into my MySQL database each month, and each file corresponds to a separate category-type pair with different values over the course of the month. This method has worked so far and I have loaded over 50 million records, but now I can't load any more without getting this same error.
My table uses three columns for the PRIMARY KEY, two VARCHARs and a DATETIME:
CREATE TABLE `t1` (
`category` varchar(10) NOT NULL,
`item` varchar(15) NOT NULL DEFAULT '',
`date_time` datetime NOT NULL,
`v1` double DEFAULT NULL,
`v2` double DEFAULT NULL,
`v3` double DEFAULT NULL,
PRIMARY KEY (`category`,`item`,`date_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I have worked with databases in the past, but never with this many records, so I don't know whether that is the problem.
I could switch to an auto-incremented id for my primary key, but it may take up more room given the large number of records, and I might end up with duplicate (category, item, date_time) rows, which would be a problem.
I know that MySQL permits a “relaxed” format for values specified as strings, so I may need to do some additional formatting to figure this out.
I deleted the first line of my CSV file, the one with the value ''Book'-'Fiction'-2014-04-16 09:33:00', but then I get the same 1062 error for the next datetime value, ''Book'-'Fiction'-2014-04-16 09:35:00'.
I thought it might be the way I format my datetime string, but I am using the "YYYY-MM-DD HH:MM:SS" format, which has worked in thousands of other LOAD DATA INFILE runs. Just to be safe, I also tried the STR_TO_DATE() function:
LOAD DATA INFILE '/path/to/file/infile.csv'
INTO TABLE t1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(category, item, @date_var, v1, v2, v3)
SET date_time = STR_TO_DATE(@date_var, '%Y-%m-%d %H:%i:%s');
Any help would be appreciated.
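One way to test whether the duplicates come from the file itself, rather than from the table, is to scan the file before loading it. The sketch below is illustrative only: check_file and key_for are hypothetical helper names, and the column order is assumed to match the LOAD DATA column list above. It approximates the table's default latin1 collation (latin1_swedish_ci), which compares VARCHAR values case-insensitively and ignores trailing spaces, so keys that differ only in case or trailing spaces are counted as the same key. It also flags values longer than the VARCHAR widths and datetimes that don't parse as YYYY-MM-DD HH:MM:SS.

```python
import csv
from collections import Counter
from datetime import datetime

# Assumed column order, taken from the LOAD DATA column list in the question.
COLUMNS = ["category", "item", "date_time", "v1", "v2", "v3"]
# VARCHAR widths from the CREATE TABLE in the question.
MAX_LEN = {"category": 10, "item": 15}


def key_for(rec):
    """Approximate how MySQL compares the PRIMARY KEY values.

    Assumes latin1_swedish_ci: case-insensitive, trailing spaces ignored.
    Raises ValueError if date_time is not YYYY-MM-DD HH:MM:SS.
    """
    ts = datetime.strptime(rec["date_time"], "%Y-%m-%d %H:%M:%S")
    return (rec["category"].rstrip(" ").lower(),
            rec["item"].rstrip(" ").lower(),
            ts)


def check_file(lines):
    """Return (duplicate_keys, problems) for an iterable of CSV text lines."""
    counts = Counter()
    problems = []
    for lineno, row in enumerate(csv.reader(lines), start=1):
        if len(row) != len(COLUMNS):
            problems.append((lineno, f"expected {len(COLUMNS)} fields, got {len(row)}"))
            continue
        rec = dict(zip(COLUMNS, row))
        for col, width in MAX_LEN.items():
            if len(rec[col]) > width:
                problems.append((lineno, f"{col} longer than {width} chars"))
        try:
            counts[key_for(rec)] += 1
        except ValueError:
            problems.append((lineno, f"unparseable date_time {rec['date_time']!r}"))
    # Keep only keys that occur more than once in the file.
    duplicates = {k: n for k, n in counts.items() if n > 1}
    return duplicates, problems
```

To run it on a real file, open the file with newline="" so the csv module handles the line endings, then call check_file(f) on the open file object. An empty duplicates dict would point away from the file and back toward the table or the server settings.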

Related

SQL: "Warning (Code 3819): Check constraint is violated" makes no sense when using LOAD DATA LOCAL

I have a .csv file which looks like this
and that can be downloaded from here.
I create my DB and my table with the following code:
CREATE DATABASE test_schema;
CREATE TABLE test_schema.teams (
teamkey SMALLINT NOT NULL AUTO_INCREMENT,
teamid CHAR(3) NOT NULL,
yearid YEAR(4) NOT NULL,
leagueid CHAR(2) NOT NULL,
teamrank TINYINT(2) NOT NULL,
PRIMARY KEY (teamkey),
UNIQUE KEY teamkey_UNIQUE (teamkey),
KEY teamid_yearid_leagueid_UNIQUE (teamid, yearid, leagueid),
CONSTRAINT check_teamrank CHECK (((teamrank >= 0) and (teamrank <= 12))),
CONSTRAINT check_year CHECK (((yearid >= 1871) and (yearid <=2155))))
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Now, when I try to import using:
LOAD DATA LOCAL INFILE "path_to_file_in_my_computer/Teams.csv"
INTO TABLE test_schema.teams
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(@teamID, @yearID, @lgID, @Rank);
I get 2895 warnings which are all the same:
Warning (Code 3819): Check constraint 'check_year' is violated.
This warning makes no sense, since yearid ranges from 1871 to 2018, as you can confirm by looking at the structure of Teams.csv. Any advice or suggestions on how to handle this error would be much appreciated. I'm working in MySQL Workbench 8.0.
PS: I posted a similar question (now deleted) this morning, but it needed the additional details provided here.
You don't have the column names in the correct order in the LOAD DATA query. The column list has to follow the order of the fields in the CSV file:
LOAD DATA LOCAL INFILE "path_to_file_in_my_computer/Teams.csv"
INTO TABLE test_schema.teams
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(yearid, leagueid, teamid, @franchid, @divid, teamrank);
You can assign fields directly to the table's columns. You only need a user variable such as @yearID if you have to do extra processing (in a SET clause) before storing the value in the table.
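To keep the column list in step with the file, you can generate it from the CSV header. The following is a minimal sketch. load_data_column_list is a hypothetical helper, and the header in the example is an assumption inferred from the answer's column list: yearID, lgID, teamID, franchID, divID and Rank are taken to be the first six fields of Teams.csv. Any field you don't map is read into a throwaway user variable, the same way @franchid and @divid are handled in the answer.

```python
import re


def load_data_column_list(csv_header, mapping):
    """Build the (col, @var, ...) list for a LOAD DATA statement.

    csv_header: field names in the order they appear in the CSV file.
    mapping: CSV field name -> table column name, for the fields to keep.
    Unmapped fields go into a user variable named after the field and
    are then ignored, like @franchid and @divid in the answer.
    """
    targets = []
    for field in csv_header:
        if field in mapping:
            targets.append(mapping[field])
        else:
            # Replace characters that are awkward in a variable name.
            targets.append("@" + re.sub(r"\W+", "_", field.lower()))
    return "(" + ", ".join(targets) + ")"


# Assumed header for the first six fields of Teams.csv (hypothetical example).
header = ["yearID", "lgID", "teamID", "franchID", "divID", "Rank"]
mapping = {"yearID": "yearid", "lgID": "leagueid",
           "teamID": "teamid", "Rank": "teamrank"}
print(load_data_column_list(header, mapping))
# prints: (yearid, leagueid, teamid, @franchid, @divid, teamrank)
```

In practice you would read the header from the file itself, for example with next(csv.reader(f)). The generated list then always matches the file's actual field order, so a reordered export can't silently shift values into the wrong columns.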

load data local infile imports only 200k out of 400k records

Hello! I am new to MySQL, so please explain in language as simple as possible!
I have a CSV file with 400k rows and want to import it into MySQL. I am using the LOAD DATA LOCAL INFILE command for this:
LOAD DATA LOCAL INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/Comorbidity Covid-19.csv'
INTO TABLE `comorbidity covid-19`
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
The issue is that only about 200k records are imported, while the CSV contains 400k. Why is this happening? I ran the command both in the command prompt and in MySQL Workbench, and both give the same result. Also, the date columns are not imported correctly: instead of the dates, every row shows 0000-00-00.
PS: I have set OPT_LOCAL_INFILE=1 in Manage Database Connections.
PS: Here is some sample data:
First, I created an empty table in the database containing only the column headers. To do this, I right-clicked Tables, selected the create new table option, and chose the proper type for each column (Date as of and Start Date were given the DATE type, and so on). Then I ran the query above, both in the command prompt and in Workbench, to import the rows.
SHOW CREATE TABLE comorbidity gives this result:
CREATE TABLE `comorbidity` (
`Date as of` date NOT NULL,
`Start Date` date NOT NULL,
`State` varchar(20) NOT NULL,
`Condition group` varchar(50) NOT NULL,
`Condition` varchar(45) NOT NULL,
`Age group` varchar(15) NOT NULL,
`Covid19 deaths` int NOT NULL,
`Number of mentions` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
The dates are probably not in the YYYY-MM-DD format that MySQL expects, which is why they are stored as 0000-00-00. Either change the column to that format in Excel before exporting, or read the field into a user variable and convert it with STR_TO_DATE() in a SET clause.
As for importing all the records, check whether some character in the file is interrupting the import.
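If you regenerate the file rather than fixing it in Excel, you can also rewrite the dates before loading. This is a sketch under a stated assumption: the question does not show the sample data, so the input format M/D/YYYY (in_format="%m/%d/%Y") is a guess that you should adjust to the real file. convert_dates is a hypothetical helper name.

```python
import csv
from datetime import datetime


def convert_dates(src, dst, date_columns, in_format="%m/%d/%Y", has_header=True):
    """Copy CSV from src to dst, rewriting date_columns as YYYY-MM-DD.

    src/dst: text file objects opened with newline="".
    date_columns: 0-based indexes of the date fields.
    in_format: ASSUMED input format (M/D/YYYY); change it to match the file.
    """
    reader = csv.reader(src)
    # csv.writer ends rows with \r\n by default, matching LINES TERMINATED BY '\r\n'.
    writer = csv.writer(dst)
    if has_header:
        writer.writerow(next(reader))
    for row in reader:
        for i in date_columns:
            parsed = datetime.strptime(row[i].strip(), in_format)
            row[i] = parsed.strftime("%Y-%m-%d")
        writer.writerow(row)
```

For the table above, the two date fields are the first two columns, so the call would be convert_dates(src, dst, [0, 1]). A row whose date doesn't match in_format raises ValueError instead of silently becoming 0000-00-00, which makes bad rows easy to find.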
Only 200k records were being imported because I was using:
LINES TERMINATED BY '\n'
When I changed it to:
LINES TERMINATED BY '\r\n'
All 400k records were imported.
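Before choosing between '\n' and '\r\n', you can check the file's actual line endings. This is a minimal sketch, and detect_line_terminator and detect_file_line_terminator are hypothetical helper names. It reads raw bytes in binary mode so that Python does not translate line endings, then counts CRLF pairs against bare LFs. It returns the text to write in LINES TERMINATED BY, or None when the sample mixes both styles or has no line breaks.

```python
def detect_line_terminator(data: bytes):
    r"""Return the LINES TERMINATED BY text for a sample of raw file bytes.

    Returns "\r\n" (as typed in SQL) for Windows endings, "\n" for Unix
    endings, or None when the sample has no line breaks or mixes both.
    """
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract them to get bare LFs.
    bare_lf = data.count(b"\n") - crlf
    if crlf and not bare_lf:
        return r"\r\n"
    if bare_lf and not crlf:
        return r"\n"
    return None


def detect_file_line_terminator(path, sample_size=1 << 20):
    # Binary mode, so Python does not translate \r\n into \n while reading.
    with open(path, "rb") as f:
        return detect_line_terminator(f.read(sample_size))
```

Only the first sample_size bytes (1 MiB by default) are inspected, which is usually enough for a consistently exported file. A None result is worth investigating by hand before loading.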

NULL values are not inserted into sql table

I have a CSV file with three columns: "movieId", "imdbId" and "tmdbId". The "tmdbId" column contains many empty values. (movieId is a foreign key referring to a primary key in another table.)
When I read this file into a data frame in R, the empty values are treated as NA. When I import the CSV file into MySQL with the following command, the rows with NA values don't get inserted into the table, even though the column allows NULL values. I should also mention that I do not get any errors.
Besides the following command, I also tried importing the dataset with MySQL Workbench, but that did not work either.
Any suggestions?
LOAD DATA LOCAL INFILE 'links.csv' INTO TABLE links
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(movieId, imdbId, tmdbId);
I know NULL and NA are not the same, but I do not understand why R treats empty values as NA. I tried to replace NA with NULL, but R does not support this operation.
The table:
CREATE TABLE links (
movieId int NOT NULL,
imdbId int DEFAULT NULL,
tmdbId int DEFAULT NULL,
KEY movieId (movieId),
CONSTRAINT links_ibfk_1 FOREIGN KEY (movieId) REFERENCES movieId_title (movieId)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
The CSV file looks like this:
Here is an example of an empty row for the third column:
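As @Alec suggested, you can run SET foreign_key_checks = 0 before loading. Empty fields in an INT column are loaded as 0, so you can then replace those zeros with NULL (note that this also turns any genuine 0 values into NULL):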
UPDATE table_name
SET col_name= NULL
WHERE col_name = 0;
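An alternative that avoids the UPDATE is to mark the empty values as NULL in the file itself. With the default FIELDS ESCAPED BY '\\', LOAD DATA reads a field consisting of \N as NULL, so a small preprocessing step can write \N into the empty fields. This is a sketch that assumes every empty field should become NULL unless you pass a set of column indexes; empty_to_null is a hypothetical helper name.

```python
import csv


def empty_to_null(src, dst, columns=None):
    r"""Copy CSV text from src to dst, writing \N into empty fields.

    src and dst are file objects opened in text mode with newline="".
    columns: optional set of 0-based column indexes to convert;
    by default every empty field becomes \N (read by LOAD DATA as NULL).
    """
    reader = csv.reader(src)
    # lineterminator="\n" matches LINES TERMINATED BY '\n' in the question.
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([
            r"\N" if field == "" and (columns is None or i in columns) else field
            for i, field in enumerate(row)
        ])
```

After the conversion, the same LOAD DATA statement can be run against the new file, and tmdbId should then be NULL wherever the field was empty. Because csv.writer only quotes fields containing the delimiter, quote character or line breaks, \N is written unquoted.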

LOAD DATA LOCAL INFILE COMMAND several errors

I am trying to load the Q1 2012 data from the link below:
https://s3.amazonaws.com/capitalbikeshare-data/index.html
My code is as follows:
DROP DATABASE IF EXISTS bike;
CREATE DATABASE bike;
USE bike;
DROP TABLE IF EXISTS bike_2012;
CREATE TABLE bike_2012(
bike_duration INT NULL,
bike_start_date TIMESTAMP NULL,
bike_end_date TIMESTAMP NULL,
bike_s_station_no INT(5) NULL,
bike_s_station_name VARCHAR(255) NULL,
bike_e_station_no INT(5) NULL,
bike_e_station_name VARCHAR(255) NULL,
bike_number CHAR(6) NULL,
bike_member_type VARCHAR(25) NULL,
bike_ride_number INT auto_increment PRIMARY KEY);
LOAD DATA LOCAL INFILE 'C:/LAGASA_2018/MSBA/Data_Sources/2012-capitalbikeshare-tripdata/2012Q1-capitalbikeshare-tripdata.csv'
INTO TABLE bike_2012
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
IGNORE 1 LINES
(bike_duration, @bike_start_date, @bike_end_date, bike_s_station_no, bike_s_station_name,
bike_e_station_no, bike_e_station_name, bike_number, bike_member_type)
SET bike_start_date = STR_TO_DATE(@bike_start_date, '%c/%e/%Y'),
bike_end_date = STR_TO_DATE(@bike_end_date, '%c/%e/%Y');
SELECT * FROM bike_2012 LIMIT 10;
I am facing the following issues:
Some columns that hold integer data also contain strings, so those values are not loaded correctly. I tried adding OPTIONALLY ENCLOSED BY '"', but it's not working.
I am unable to convert the dates to MySQL's date format.
Other errors appear as well, such as "Row doesn't contain data for all columns" and "Data truncated" for the date columns.
I have been struggling to correct this. Please help.
Thanks and Regards
You won't be able to simply load a malformed CSV into the database and fix it there.
If you have access to PHP, Python or another language with a driver for your database engine, load the file into an array (or read it line by line with something like fgets() in PHP). Then process each row separately to fix and convert the data, and push it to the database engine. I would even suggest grouping the inserts for speed.
You are dealing not only with conversion: there might also be string-encoding issues, since you didn't specify a character set in your CREATE TABLE, which might cause a problem in itself.
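The row-by-row approach described in this answer might look like this in Python. It is a sketch, not a tested loader. The column order follows the question's LOAD DATA list, and the timestamp format "%m/%d/%Y %H:%M" is an assumption that you should check against the actual file. Numbers or dates that can't be parsed become None (SQL NULL) instead of being silently truncated. The helper names (to_int, to_datetime, clean_rows, batches) are hypothetical.

```python
import csv
from datetime import datetime
from itertools import islice


def to_int(value):
    """Return value as int, or None (SQL NULL) if it is not a clean integer."""
    try:
        return int(value.strip())
    except ValueError:
        return None


def to_datetime(value, fmt="%m/%d/%Y %H:%M"):
    """Parse a timestamp; fmt is an ASSUMED input format, adjust it to the file."""
    try:
        return datetime.strptime(value.strip(), fmt)
    except ValueError:
        return None


def clean_rows(lines):
    """Yield tuples ready for an INSERT, in the question's column order."""
    reader = csv.reader(lines)
    next(reader)  # skip the header line (like IGNORE 1 LINES)
    for row in reader:
        if len(row) < 9:
            continue  # too few fields: skip it here (or log it for review)
        duration, start, end, s_no, s_name, e_no, e_name, bike, member = row[:9]
        yield (to_int(duration), to_datetime(start), to_datetime(end),
               to_int(s_no), s_name, to_int(e_no), e_name, bike, member)


def batches(iterable, size=1000):
    """Group rows so each group can be sent in one executemany() call."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

You could send each batch with a DB-API driver such as PyMySQL or mysql-connector-python, using cursor.executemany("INSERT INTO bike_2012 (bike_duration, bike_start_date, bike_end_date, bike_s_station_no, bike_s_station_name, bike_e_station_no, bike_e_station_name, bike_number, bike_member_type) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)", chunk) and committing afterwards. Opening the file with an explicit encoding also addresses the encoding concern raised above.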

MySQL: LOAD DATA inserting 0 instead of NULL

Problem: the LOAD DATA query below inserts 0 instead of NULL into the price column.
MySQL query:
LOAD DATA LOCAL INFILE '/tmp/data.csv'
REPLACE INTO TABLE bug_repeat
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
The content of data.csv is:
name,\N
name,3
bug_repeat table structure:
CREATE TABLE `bug_repeat` (
`name` varchar(10) DEFAULT NULL,
`price` decimal(12,6) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Note: the query fails this way on one MySQL server, but exactly the same query works fine on two other MySQL servers. I don't know what is going wrong; can someone tell me what the exact issue is? (All the MySQL servers are version 5.7.22 on Ubuntu 16.xxx.) I get the same problem with the BIGINT data type as well.
SHOW WARNINGS result:
1265 (01000): Data truncated for column 'price' at row 1
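One possible cause worth ruling out, though the question doesn't confirm it, is that the copy of data.csv on the failing server has Windows \r\n line endings or stray whitespace. With LINES TERMINATED BY '\n', the last field of the first row would then be \N followed by a carriage return rather than exactly \N. It would not be read as NULL, and a "Data truncated" warning like the one shown would be plausible. The following is a rough byte-level check. find_suspect_nulls is a hypothetical helper, and it splits naively on commas without handling quoting.

```python
def find_suspect_nulls(raw: bytes, delimiter: bytes = b","):
    r"""Return (line_no, field) pairs that look like \N but are not exactly \N.

    Splits on b"\n" the way LINES TERMINATED BY '\n' would, so a trailing
    \r stays attached to the last field. Quoting is ignored (rough check).
    """
    suspects = []
    for line_no, line in enumerate(raw.split(b"\n"), start=1):
        for field in line.split(delimiter):
            # strip() removes spaces, tabs and \r; an exact \N is not suspect.
            if field != b"\\N" and field.strip() == b"\\N":
                suspects.append((line_no, field))
    return suspects


# Usage: read the file in binary mode so line endings are left untouched, e.g.
# with open("/tmp/data.csv", "rb") as f:
#     print(find_suspect_nulls(f.read()))
```

If this reports the first row's price field, comparing the file's bytes on the working and failing servers would be a reasonable next step.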