MySQL - Load Infile allowing duplicate primary key entries

The best way to illustrate my problem is with a quick example.
Imagine the following file, loaded into MySQL using a LOAD DATA INFILE ... INTO TABLE command:
Color,Shape
red,square
blue,triangle
green,circle
(Note: My primary key = Color. Unique Key = Shape)
No matter how many times I use the command, I still (correctly) have just 3 records, as it doesn't allow duplicate records.
However, if I amend record 3 within MySQL, changing it from circle to diamond, and re-run the LOAD DATA command, I end up with 4 records:
Color,Shape
red,square
blue,triangle
green,diamond
green,circle
I now have two 'green' values in my primary key field. If I try to edit one of them I get a "Duplicate Entry for Primary Key Field" error.
I would have expected the LOAD DATA INFILE command to skip the record, since it creates a duplicate value in the primary key field. Instead it seems to ignore a row only if the entire record is a duplicate; it doesn't validate individual fields to ensure that the primary key is always unique.
Why is it failing to do this?
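For reference, a minimal sketch of the setup described above (the table and file names are assumptions); note that with the explicit IGNORE keyword, LOAD DATA skips rows that would duplicate the primary key instead of inserting them:

CREATE TABLE shapes (
    Color VARCHAR(20) NOT NULL PRIMARY KEY,
    Shape VARCHAR(20) NOT NULL UNIQUE
);

-- IGNORE turns duplicate-key errors into warnings and skips those rows
LOAD DATA LOCAL INFILE 'shapes.csv'
IGNORE INTO TABLE shapes
FIELDS TERMINATED BY ','
IGNORE 1 LINES;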

Related

Mysql load data infile leaving unchanged fields

Suppose I have a MySQL table with three fields: key, value1, value2
I want to load data for two fields (key,value1) from file inserts.txt.
Content of inserts.txt:
1;2
3;4
with:
LOAD DATA LOCAL INFILE 'inserts.txt'
REPLACE INTO TABLE `test_insert_timestamp`
FIELDS TERMINATED BY ';'
(`key`, value1);
But in case of REPLACE, I want to leave the value2 unchanged.
How could I achieve this?
MySQL uses the following algorithm for REPLACE (and LOAD DATA ... REPLACE):
1. Try to insert the new row into the table.
2. While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
   a. Delete from the table the conflicting row that has the duplicate key value.
   b. Try again to insert the new row into the table.
(https://dev.mysql.com/doc/refman/5.7/en/replace.html)
So you can't keep a value from a row that is going to be deleted.
What you want is to emulate "ON DUPLICATE KEY UPDATE" logic.
You can't do that within a single LOAD DATA statement. What you have to do is load your data into a temporary table first, then run an INSERT from the temporary table into your destination table, where you can use the "ON DUPLICATE KEY UPDATE" feature, as sketched below.
The whole process is fully detailed in the most upvoted answer to this question: MySQL LOAD DATA INFILE with ON DUPLICATE KEY UPDATE
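A minimal sketch of that staging-table approach, using the table, column, and file names from the question (the temporary table name tmp_insert is an assumption, and `key` is assumed to be the primary key of test_insert_timestamp):

-- Stage the file in a temporary table with just the two loaded columns
CREATE TEMPORARY TABLE tmp_insert (
    `key` INT NOT NULL,
    value1 INT
);

LOAD DATA LOCAL INFILE 'inserts.txt'
INTO TABLE tmp_insert
FIELDS TERMINATED BY ';'
(`key`, value1);

-- Upsert into the destination; value2 is never mentioned,
-- so existing rows keep their value2 untouched
INSERT INTO test_insert_timestamp (`key`, value1)
SELECT `key`, value1 FROM tmp_insert
ON DUPLICATE KEY UPDATE value1 = VALUES(value1);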

SQL Query Not Adding New Entries With INSERT IGNORE INTO

So I have a script that gets data about 100 items at a time and inserts them into a MySQL database with a command like this:
INSERT IGNORE INTO beer(name, type, alcohol_by_volume, description, image_url) VALUES('Bourbon Barrel Porter', 2, '9.1', '', '')
I ran the script once, and it populated the DB with 100 entries. However, I ran the script again with the same SQL syntax, gathering all new data (i.e., no duplicates), but the database is not reflecting any new entries -- it is the same 100 entries I inserted on the first iteration of the script.
I logged the queries, and I can confirm that the queries were making requests with the new data, so it's not a problem in the script not gathering new data.
The name field is a unique field, but no other fields are. Am I missing something?
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
If there is no primary key, there can't be a duplicate key to ignore. You should always set a primary key, so please do that - and if you want additional columns that shouldn't contain duplicates, add a UNIQUE index on them, as sketched below.
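A minimal sketch of that advice (the column types here are assumptions, since the question doesn't give the schema):

-- id is the primary key; name carries a UNIQUE index
CREATE TABLE beer (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL UNIQUE,
    type INT,
    alcohol_by_volume DECIMAL(4,1),
    description TEXT,
    image_url VARCHAR(255)
);

-- Running this twice inserts one row: the second attempt is
-- silently skipped because 'name' would be a duplicate
INSERT IGNORE INTO beer (name, type, alcohol_by_volume, description, image_url)
VALUES ('Bourbon Barrel Porter', 2, '9.1', '', '');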

LOAD DATA LOCAL INFILE help required

Here's my query for loading a MySQL table from a CSV file.
LOAD DATA LOCAL INFILE 'table.csv' REPLACE INTO TABLE table1 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\'' LINES TERMINATED BY 'XXX' IGNORE 1 LINES
SET date_modified = CURRENT_TIMESTAMP;
Suppose my CSV contains 500 records with 15 columns. I changed three rows and terminated them with 'XXX'. I now want to update the MySQL table with this file. My primary key is an auto-incremented value. When I run this query, all 500 rows get updated with the old data, and the rows I changed get added as new ones. I don't want the new ones; I want my table to be replaced with the CSV as-is. I tried changing my primary key to non-AI, and it still didn't work. Any pointers? Thanks.
I am making some assumptions here.
1) You don't have the auto-increment value in your file.
Since your primary key is not in your file, MySQL will not be able to match rows. An auto-increment primary key is an artificial key, so it is not part of the data; MySQL assigns this artificial primary key when the row is inserted.
Let's assume your file contained some unique identifier; call it Identification_Number. If this number is both in the file and used by your table as the primary key, then MySQL will be able to identify the rows from the file and match them to the rows in the table (see the sketch after this answer).
While a lot of people will only use auto-increment keys in a database, I always check whether there is a natural key in the data. If I identify one, I do some performance testing with this natural key in a table, and based on the performance metrics of both I then decide on a key.
Hopefully I did not get your question wrong but I suspect this might be the case.
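A minimal sketch of that idea (Identification_Number comes from the answer above; the other column names and types are placeholders):

-- The natural key is the primary key, so REPLACE can match file rows
CREATE TABLE table1 (
    Identification_Number INT NOT NULL PRIMARY KEY,
    col1 VARCHAR(50),
    date_modified TIMESTAMP NULL
);

-- Rows whose Identification_Number already exists get replaced
-- instead of being appended as new rows
LOAD DATA LOCAL INFILE 'table.csv'
REPLACE INTO TABLE table1
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(Identification_Number, col1)
SET date_modified = CURRENT_TIMESTAMP;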

issue with mysql LOAD DATA when table has a column that is BIGINT and unique

So the table schema looks like:
field - type - index
'id' - INT - PRI
'event_id' - BIGINT(20) - UNI
col1 ... colN (no indexes)
LOAD DATA command:
LOAD DATA LOCAL INFILE 'my-file' REPLACE INTO TABLE mydb.mytable (event_id, col1 .. colN)
and I get this error:
ERROR 1062 (23000) at line 1: Duplicate entry '2147483647' for key 1
'key 1' refers to the unique key on 'event_id'.
More context:
(1) The table is empty at the time of LOAD DATA.
(2) When I grep for 2147483647 in my source file I get nothing
(3) 2147483647 is the maximum value of a signed 32-bit INT
(4) I am not actually inserting any value into 'id' -- it's just auto-incrementing
(5) I am using the 'REPLACE' keyword in the LOAD DATA, so even if there were dupes, it should know how to deal with them?
This suggests some int overflow issue (i.e., I don't think there are any genuine dupes in the source data or in the table), and indeed the values for 'event_id' in 'my-file' are all over the integer max limit. However, the error is odd because the 'event_id' column is BIGINT.
As a temporary fix, I dropped the unique index on 'event_id' and the LOAD DATA command worked! The values in 'event_id' were all fine, not truncated or anything. So there is nothing wrong with the table handling the values, but somehow LOAD DATA is checking uniqueness as if the values were 32-bit integers?
Has anyone encountered something similar? Thanks
So what this means is that '2147483647' already exists in that database in a field marked as a key. Just remove the key from that field and you should be fine!
Edit: As stated in your question, event_id has a unique index - you cannot have the same value twice in a unique column.
Best of luck!
The problem is not the data type.
The thing is that you have two keyed fields, one PK and the other UNIQUE, so there is no way you could have a repeated value in either. When you do an INSERT or LOAD DATA, it is trying to add an entry with the value '2147483647' twice. You have several ways to fix it.
First, try opening the file in a text editor, finding the repeated value, and fixing it. If that doesn't work, try exporting the data with mysqldump or phpMyAdmin and editing that file in a text editor.
Then import it again; if the problem persists, try exporting the data by other methods (mysqldump or phpMyAdmin) and re-importing it.
Alternatively, try recreating the table without primary keys. You can use
CREATE TABLE big_int_test AS SELECT * FROM table1;
which makes a copy of the table's data without the PK, indexes, and foreign keys.
You can also drop the index and UNIQUE KEY constraint, import the data, fix the table (delete the repeated values), and re-create the PK and UNIQUE KEY, as sketched below (this is a bit of a hack, but it could work).
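A minimal sketch of that last workaround, using the table and column names from the question (the index name event_id is an assumption; check SHOW INDEX FROM mydb.mytable for the real name, and col1 stands in for the remaining columns):

-- Drop the unique index so the load cannot hit a duplicate-key error
ALTER TABLE mydb.mytable DROP INDEX event_id;

LOAD DATA LOCAL INFILE 'my-file'
REPLACE INTO TABLE mydb.mytable
(event_id, col1);

-- Delete any rows that genuinely repeat an event_id, keeping the
-- lowest id of each group, then restore the unique index
DELETE t1 FROM mydb.mytable t1
JOIN mydb.mytable t2
  ON t1.event_id = t2.event_id AND t1.id > t2.id;

ALTER TABLE mydb.mytable ADD UNIQUE KEY (event_id);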

Insertion without duplication in MySQL

I'm fetching data from a text file or log periodically, and it gets inserted into the database every time it is fetched. Is there a way in MySQL to perform the insert only when the log files are updated, or do I have to handle that in the programming language? I mean, is there a type of insert that, when it sees a duplicate primary key, doesn't give a "Duplicate Entry" error but just ignores the row?
Put the fetch in a logrotate postrotate script, and fetch from the just rotated log.
Ignoring duplicates can be done with either INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE syntax (which will either skip the rows causing a duplicate unique key, or let you alter some values in the existing row).
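A minimal sketch of both forms, with a hypothetical log_entries table:

CREATE TABLE log_entries (
    entry_id INT NOT NULL PRIMARY KEY,
    message VARCHAR(255),
    seen_count INT NOT NULL DEFAULT 1
);

-- Form 1: silently skip a row whose entry_id already exists
INSERT IGNORE INTO log_entries (entry_id, message)
VALUES (42, 'disk full');

-- Form 2: on a duplicate key, update the existing row instead
INSERT INTO log_entries (entry_id, message)
VALUES (42, 'disk full')
ON DUPLICATE KEY UPDATE seen_count = seen_count + 1;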