MySQL: Insert if row doesn't exist safely with key and unique attribute

Background:
I built a scraper in Python (not sure if that matters). I scrape the website and update my HTML table. The main table stores the autogenerated_id, url, raw_html, date_it_was_scrapped, and last_date_the_page_was_updated (provided by the website). My table has many duplicate URLs, which it shouldn't, so I am planning on making URLs unique in the database.
Desired outcome:
I only want to insert a row if the url doesn't exist, and update the html if last_date_the_page_was_updated > date_it_was_scrapped.
Solution:
The following Stack Overflow post shows how.
I haven't tested it because of the selected answer's warning: an INSERT ... ON DUPLICATE KEY UPDATE statement against a table having more than one unique or primary key is also marked as unsafe.
What I plan to do, based on the Stack Overflow question:
INSERT INTO html_table (url, raw_html, date_it_was_scrapped, last_date_the_page_was_updated)
VALUES (the data)
ON DUPLICATE KEY UPDATE
url = VALUES(url),
raw_html = VALUES(raw_html),
date_it_was_scrapped = VALUES(date_it_was_scrapped),
last_date_the_page_was_updated=VALUES(last_date_the_page_was_updated)
WHERE last_date_page_was_update > date_it_was_scrapped
Question:
What is unsafe about it and is there a safe way to do it?

From the description of bug 58637, which is linked in the MySQL documentation page that flags INSERT ... ON DUPLICATE KEY UPDATE as unsafe:
When the table has more than one unique or primary key, this statement is sensitive to the order in which the storage engines checks the keys. Depending on this order, the storage engine may determine different rows to mysql, and hence mysql can update different rows [...] The order that the storage engine checks keys is not deterministic.
I understand that your table has an autoincremented primary key, and that you are planning to add a unique key on the url column. Because the primary key is autoincremented, you will not pass it as a parameter for INSERT commands, as shown in your SQL command. Hence MySQL will not need to check for duplicates on this column; it will only check for duplicates on url. As a consequence, this INSERT should be safe.
Other remarks regarding your question:
You don't need to update the url column on duplicate keys (we know it is the same).
The purpose of the WHERE clause in your query is unclear. Are you sure that it is needed?
You will need to remove the duplicates before you enable the unique constraint on url.
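As a rough illustration of the "insert, or update only when the page is newer" behaviour, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. It is not the asker's code: SQLite spells the upsert as ON CONFLICT ... DO UPDATE and allows a WHERE on it, whereas MySQL's ON DUPLICATE KEY UPDATE has no WHERE clause, so in MySQL the condition would typically be written per column with IF(...). The save_page helper, the sample URLs and dates, and the reading of the condition (incoming last-updated date compared against the stored scrape date) are assumptions for the example. It needs SQLite 3.24 or newer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE html_table (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        url TEXT NOT NULL UNIQUE,          -- the planned unique key
        raw_html TEXT,
        date_it_was_scrapped TEXT,         -- ISO dates compare correctly as text
        last_date_the_page_was_updated TEXT
    )
    """
)

# SQLite syntax. "excluded" refers to the row that failed to insert,
# much like VALUES(col) in MySQL's ON DUPLICATE KEY UPDATE.
UPSERT = """
INSERT INTO html_table
    (url, raw_html, date_it_was_scrapped, last_date_the_page_was_updated)
VALUES (?, ?, ?, ?)
ON CONFLICT(url) DO UPDATE SET
    raw_html = excluded.raw_html,
    date_it_was_scrapped = excluded.date_it_was_scrapped,
    last_date_the_page_was_updated = excluded.last_date_the_page_was_updated
WHERE excluded.last_date_the_page_was_updated > html_table.date_it_was_scrapped
"""


def save_page(url, raw_html, scraped_on, page_updated_on):
    """Hypothetical helper: insert a new URL, or refresh a stale one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (url, raw_html, scraped_on, page_updated_on))


save_page("http://example.com/a", "<p>v1</p>", "2020-01-10", "2020-01-05")
# Page has not changed since the stored scrape: row is left alone.
save_page("http://example.com/a", "<p>stale</p>", "2020-01-11", "2020-01-05")
# Page changed after the stored scrape date: row is updated.
save_page("http://example.com/a", "<p>v2</p>", "2020-01-20", "2020-01-15")

rows = conn.execute(
    "SELECT url, raw_html, date_it_was_scrapped FROM html_table"
).fetchall()
```

Because url is the only unique key that the statement supplies a value for, there is a single candidate row for the conflict, which matches the reasoning above about why the statement should be safe.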

Related

Which is better practice: 'check whether a primary ID exists then skip' or 'insert ignore'?

If I'm writing a system which checks an API for new messages every n minutes, which is the better practice? (Each message has a unique ID which is used as the primary ID in my system.)
Would you prefer to:
Look up the primary ID of the message in the database and skip inserting the message if it already exists
Do 'Insert Ignore'?
Neither of your solutions. Keep reading.
Let the database do the work. If you don't want duplicates, then create a unique index on the columns. My guess is:
create unique index unq_messages_messageid on messages (message_id);
(This will also work on the string or multiple columns, if that is what you really want.)
Once you have the unique index or constraint the following are the two methods I would suggest.
(1) Just do an insert. If there is a duplicate, it will fail. Handle the error in your application code. Good application code handles errors.
(2) Use on duplicate key update (this might one day be replaced by on conflict ignore):
insert into messages ( . . . )
values ( . . .)
on duplicate key update message_id = values(message_id);
The assignment is a no-op -- it does nothing.
Why is this preferred over insert ignore? Simple reason: it only handles the specific error of a duplicate key. Other errors that might occur are still returned to the application.
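The two suggested methods can be sketched as follows, again with Python's sqlite3 module standing in for MySQL. This is an illustration under assumptions, not the answerer's code: the messages schema, the body column, and both helper names are made up for the example, and SQLite's ON CONFLICT(message_id) DO NOTHING plays the role of the no-op ON DUPLICATE KEY UPDATE. It needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (message_id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
)


def insert_or_report(message_id, body):
    """Method 1: plain INSERT, handle the failure in application code.

    Returns True if the row was stored, False on a constraint violation.
    sqlite3.IntegrityError covers every constraint violation, so a real
    handler would inspect the error to confirm it is a duplicate key.
    """
    try:
        with conn:
            conn.execute(
                "INSERT INTO messages (message_id, body) VALUES (?, ?)",
                (message_id, body),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def insert_skip_duplicate(message_id, body):
    """Method 2: do nothing on a duplicate key; other errors still raise."""
    with conn:
        conn.execute(
            "INSERT INTO messages (message_id, body) VALUES (?, ?) "
            "ON CONFLICT(message_id) DO NOTHING",
            (message_id, body),
        )


first = insert_or_report(1, "hello")   # stored
second = insert_or_report(1, "again")  # duplicate, reported to the caller
insert_skip_duplicate(1, "dup")        # duplicate, silently skipped

try:
    insert_skip_duplicate(2, None)     # NOT NULL violation is not swallowed
    other_error_surfaced = False
except sqlite3.IntegrityError:
    other_error_surfaced = True
```

The last call is the point of method 2: only the duplicate-key case is absorbed, while an unrelated problem still reaches the application.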

MySQL "Cannot add or update a child row: a foreign key constraint fails"

I'm new to MySQL and databases in general. I've been tasked with manually moving an old database to a new one of a slightly different format. The challenges include transferring certain columns from a table in one database to another database of a similar format. This is made further difficult in that the source database is MyISAM and the destination is InnoDB.
So I have two databases, A is the source and B is the destination, and am attempting to copy 'most' of a table to a similar table in the destination database.
Here is the command I run:
INSERT INTO B.article (id, ticket_id, article_type_id,
article_sender_type_id, a_from, a_reply_to, a_to, a_cc, a_subject,
a_message_id, a_in_reply_to, a_references, a_content_type, a_body,
incoming_time, content_path, valid_id, create_time, create_by,change_time,
change_by)
SELECT id, ticket_id, article_type_id, article_sender_type_id,
a_from, a_reply_to, a_to, a_cc, a_subject, a_message_id, a_in_reply_to,
a_references, a_content_type, a_body, incoming_time, content_path,
valid_id, create_time, create_by, change_time, change_by
FROM A.article
WHERE id NOT IN ( 1 );
Error:
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`helpdesk`.`article`, CONSTRAINT `FK_article_ticket_id_id` FOREIGN KEY (`ticket_id`) REFERENCES `ticket` (`id`))
The reason for making the command so wordy is that the source has several columns that were unnecessary and so were pruned out of the destination table. The WHERE id NOT IN ( 1 ) is there so that the first row is not copied (it was initialized in both databases and MySQL throws an error if they both have the same 'id' field). I can't tell by the error if it expects 'ticket_id' to be unique between rows, which it is not, or if it is claiming that a row does not have a ticket_id and so can not be copied which is what the error seems to most often be generated by.
I can post the tables in question if that will help answer, but I am unsure of the best way to do that, so some pointing in the right direction there would be helpful as well.
Posts I looked at before:
For forming the command
For looking at this error
Thanks!
You'll want to run a SHOW CREATE TABLE on your destination table:
SHOW CREATE TABLE `B`.`article`;
This will likely show you that there is a foreign key on the table, which requires that a value exist in another table before it can be added to this one. Specifically, from your error, it appears the field ticket_id references the id field in the ticket table. This introduces some complexity in terms of what needs to be migrated first -- the referenced table (ticket) must be populated before the referencing table (article).
Without knowing more about your tables, my guess is that you haven't migrated in the ticket table yet, and it is empty. You'll need to do that before you can fill in the B.article table. It is also possible that your data is corrupt and you need to find which ticket ID is present in the article data you're trying to send over, but not present in the ticket table.
Another alternative is to turn off foreign key checks, but if possible I would avoid that, since the purpose of foreign keys is to ensure data integrity.
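To find which ticket IDs appear in the article data but not in the ticket table, an anti-join is one option. Below is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL; the table name source_article and the sample rows are invented for the example, and the same LEFT JOIN ... IS NULL query shape should carry over to MySQL with the real A.article and B.ticket tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Destination parent table (already migrated rows).
    CREATE TABLE ticket (id INTEGER PRIMARY KEY);
    -- Source child rows that we want to copy.
    CREATE TABLE source_article (id INTEGER PRIMARY KEY, ticket_id INTEGER);

    INSERT INTO ticket (id) VALUES (1), (2);
    INSERT INTO source_article (id, ticket_id)
    VALUES (10, 1), (11, 2), (12, 7), (13, NULL);
    """
)

# Articles whose ticket_id has no matching ticket row. A NULL ticket_id
# does not violate a foreign key, so those rows are excluded.
ORPHANS = """
SELECT a.id, a.ticket_id
FROM source_article AS a
LEFT JOIN ticket AS t ON t.id = a.ticket_id
WHERE t.id IS NULL
  AND a.ticket_id IS NOT NULL
ORDER BY a.id
"""

orphans = conn.execute(ORPHANS).fetchall()
for article_id, ticket_id in orphans:
    print(f"article {article_id} references missing ticket {ticket_id}")
```

If the query returns every article, the ticket table has probably not been migrated yet; if it returns only a handful, those rows point at tickets that are genuinely missing.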

Which technique is more efficient for replacing records

I have an app that has to import TONS of data from a remote source. From 500 to 1500 entries per call.
Sometimes some of the data coming in will need to replace data already stored in the dB. If I had to guess, I would say once in 300 or 400 entries would one need to be replaced.
Each incoming entry has a unique ID. So I am trying to figure out if it is more efficient to always issue a delete command based on this ID or to check if there is already an entry THEN delete.
I found this SO post where it talks about the heavy work a dB has to do to delete something. But it is discussing a different issue so I'm not sure if it applies here.
Neither. Use INSERT ... ON DUPLICATE KEY UPDATE ....
Since you are using MySQL and you have a unique key then let MySQL do the work.
You can use
INSERT INTO..... ON DUPLICATE KEY UPDATE......
MySQL will try to insert a new record into the table. If the unique value already exists in the table, MySQL will instead update the fields that you set after UPDATE.
You can read more about the INSERT INTO..... ON DUPLICATE KEY UPDATE...... syntax at
http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
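For a batch of incoming entries, the same idea can be applied to the whole batch in one transaction instead of issuing a delete per entry. The following is a sketch only, using Python's sqlite3 module as a stand-in for MySQL: the entries table, its columns, and import_batch are hypothetical names, and SQLite's ON CONFLICT ... DO UPDATE substitutes for ON DUPLICATE KEY UPDATE (SQLite 3.24 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (entry_id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
)
# One entry that is already stored and will be replaced by the import.
conn.execute("INSERT INTO entries (entry_id, payload) VALUES (2, 'old')")
conn.commit()


def import_batch(rows):
    """Insert new entries and overwrite existing ones, all in one transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO entries (entry_id, payload) VALUES (?, ?) "
            "ON CONFLICT(entry_id) DO UPDATE SET payload = excluded.payload",
            rows,
        )


import_batch([(1, "a"), (2, "new"), (3, "c")])

stored = conn.execute(
    "SELECT entry_id, payload FROM entries ORDER BY entry_id"
).fetchall()
```

No existence check or delete is issued by the application; the unique key decides per row whether the statement inserts or updates.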

Check for the respective data in a specific column and if not detected, then insert. Otherwise update

I need to insert a new row into table foo. But before inserting the data, I need to check whether a row has already been inserted for the respective user name. If one has, I need to update the current data with the new data.
I know how to do this with a PHP if condition, but I would love to do it with MySQL functions/statements in a single line. Can anyone help me?
For example, kindly use the following statement. It should be updated.
$in = "insert into foo(username, text) values('user-x', 'user-x-text')";
Mysql_query($in);
When searching for similar questions, I found this post: Similar question with an answer. But I struggled to use that solution because I don't know whether the process run by that code snippet will drain server resources (speed, etc.), since this script will run about 20 times per user.
Thank you.
I think INSERT ... ON DUPLICATE KEY UPDATE should work.
Make username a UNIQUE index; it doesn't have to be a primary key.
If I'm not mistaken, DUPLICATE KEY will run only when you have a collision in any of the columns you supply that is either a primary key or unique index. In your case, the text column is neither, so it will be ignored for collisions.
INSERT ... ON DUPLICATE KEY UPDATE works on unique indexes, as confirmed by the MySQL docs.

mysql circular dependency in foreign key constraints

Given the schema:
What I need is for every user_identities.belongs_to to reference a users.id.
At the same time, every user has a primary_identity, as shown in the picture.
However when I try to add this reference with ON DELETE NO ACTION ON UPDATE NO ACTION, MySQL says
#1452 - Cannot add or update a child row: a foreign key constraint fails (yap.#sql-a3b_1bf, CONSTRAINT #sql-a3b_1bf_ibfk_1 FOREIGN KEY (belongs_to) REFERENCES users (id) ON DELETE NO ACTION ON UPDATE NO ACTION)
I suspect this is due to the circular dependency, but how could I solve it (and maintain referential integrity)?
The only way to solve this (at least with the limited capabilities of MySQL) is to allow NULL values in both FK columns. Creating a new user with a primary identity would then look something like this:
insert into users (id, primary_identity)
values (1, null);
insert into identities (id, name, belongs_to)
values (1, 'foobar', 1);
update users
set primary_identity = 1
where id = 1;
commit;
The only drawback of this solution is that you cannot force that a user has a primary identity (because the column needs to be nullable).
Another option would be to change to a DBMS that supports deferred constraints; then you can just insert the two rows and the constraint will only be checked at commit time. Or use a DBMS where you can have a partial index; then you could use the solution with an is_primary column.
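The nullable-column approach described above can be sketched end to end as follows. This uses Python's sqlite3 module as a stand-in for MySQL, so it only illustrates the order of operations; the table layout is a guess at the schema from the question, and create_user_with_identity is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        -- Nullable, so the user row can exist before its identity does.
        primary_identity INTEGER REFERENCES user_identities(id)
    );
    CREATE TABLE user_identities (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        belongs_to INTEGER REFERENCES users(id)
    );
    """
)


def create_user_with_identity(user_id, identity_id, name):
    """Insert user, insert identity, then link them, in one transaction."""
    with conn:  # commits if all three statements succeed, else rolls back
        conn.execute(
            "INSERT INTO users (id, primary_identity) VALUES (?, NULL)",
            (user_id,),
        )
        conn.execute(
            "INSERT INTO user_identities (id, name, belongs_to) VALUES (?, ?, ?)",
            (identity_id, name, user_id),
        )
        conn.execute(
            "UPDATE users SET primary_identity = ? WHERE id = ?",
            (identity_id, user_id),
        )


create_user_with_identity(1, 1, "foobar")
linked = conn.execute("SELECT id, primary_identity FROM users").fetchall()
```

Both foreign keys stay enforced the whole time: an identity pointing at a user that does not exist is still rejected, and the only gap is that a user may temporarily (or permanently) have no primary identity.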
I would not implement it this way.
Remove the field primary_identity from table users, then add an additional field called is_primary to table user_profiles, and use that as the indicator of a primary profile.
This will prevent having NULLs for FKs, but still does not enforce that a primary profile exists -- that has to be managed by the application.
Note the alternate key (unique index) {UserID, ProfileID} on Profile table and matching FK on PrimaryProfile.
The problem seems to be that you are trying to keep the primary identity information in the user_identities table.
Instead, I suggest you put the primary user info (name/email) into the users table. Do not foreign key to the user_identities table.
Only foreign key from the user_identities table
All constraints will now work ok as they are only one way.
user_identities cannot be entered unless the primary user (in table users) is present. Similarly the primary user should not be deletable where there are existing child identities (in user_identities).
You might want to change the name of the tables to "primary_users" and "secondary_users" to make it obvious what is going on.
Does that sound okay?
This question was raised at How to drop tables with cyclic foreign keys in MySQL from the delete side of things, but I think that one of the answers is applicable here as well:
SET foreign_key_checks = 0;
INSERT <user>
INSERT <user identity>
SET foreign_key_checks = 1;
Make that a transaction and commit it all at once. I haven't tried it, but it works for deletes, so I don't know why it wouldn't work for inserts.
I've not used it, but you could try INSERT IGNORE. I'd do two of those, one for each table, such that once they are both done, referential integrity is maintained. If you do them in a transaction, you can roll back if there is a problem inserting the second one.
Since you're ignoring constraints with this feature, you should do that check in program code instead, otherwise you may end up with data in your database that ignores your constraints.
Thanks to @Mihai for pointing out the problem with the above. Another approach would be to disable constraints while you do the inserts, and re-enable them afterwards. However, on a large table that might produce more overhead than is acceptable, so it is worth testing.