Insertion without duplication in MySQL - mysql

I'm periodically fetching data from a text file or log, and it gets inserted into the database on every fetch. Is there a way in MySQL to make the insert happen only when the log files have been updated, or do I have to handle that in the programming language? In other words, is there a type of insert that, when it sees a duplicate primary key, doesn't raise a "Duplicate entry" error but simply ignores the row?

Put the fetch in a logrotate postrotate script, and fetch from the just rotated log.
Ignoring duplicates can be done with either the INSERT IGNORE or the INSERT ... ON DUPLICATE KEY UPDATE syntax: the first silently skips rows that would cause a duplicate unique key, while the second lets you alter some values in the existing row instead, as sketched below.
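A minimal sketch of the two options, assuming a hypothetical log_entries table keyed on a unique line_id column (all names here are illustrative, not taken from the question):
-- Skip rows whose key already exists; the duplicate becomes a warning, not an error:
INSERT IGNORE INTO log_entries (line_id, logged_at, message)
VALUES (1001, '2013-05-01 12:00:00', 'disk almost full');
-- Or keep the existing row but refresh chosen columns when the key already exists:
INSERT INTO log_entries (line_id, logged_at, message)
VALUES (1001, '2013-05-01 12:00:00', 'disk almost full')
ON DUPLICATE KEY UPDATE message = VALUES(message);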

Related

Multiple insert in mysql table, some of which may be repeated

I'm developing a PHP script in which I insert multiple rows of data into a MySQL table, but some of this data may already be inserted. I could try to insert each row individually and detect the #1062 (duplicate entry) error, but that would be very inefficient since there can be more than 100 entries. So, is there any way to do this in one query, or must I use a query for each row to be inserted?
Thanks a lot.
You'll want to use the INSERT ... ON DUPLICATE KEY UPDATE command to accomplish this. In the ON DUPLICATE KEY UPDATE clause you can do an update that changes nothing, for example fieldName = fieldName, so that duplicates are effectively skipped.
http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
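A rough sketch of that no-op pattern, using an invented entries table whose id column is the primary key; duplicates simply trigger the harmless id = id assignment, and the whole batch goes in one query:
INSERT INTO entries (id, name)
VALUES (1, 'first'), (2, 'second'), (3, 'third')
ON DUPLICATE KEY UPDATE id = id;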

Which technique is more efficient for replacing records

I have an app that has to import TONS of data from a remote source. From 500 to 1500 entries per call.
Sometimes some of the data coming in will need to replace data already stored in the DB. If I had to guess, I would say roughly one in 300 or 400 entries needs to be replaced.
Each incoming entry has a unique ID. So I am trying to figure out if it is more efficient to always issue a delete command based on this ID or to check if there is already an entry THEN delete.
I found this SO post that talks about the heavy work a DB has to do to delete something. But it is discussing a different issue, so I'm not sure if it applies here.
Each incoming entry has a unique ID. So I am trying to figure out if it is more efficient to always issue a delete command based on this ID or to check if there is already an entry THEN delete.
Neither. Use INSERT ... ON DUPLICATE KEY UPDATE ....
Since you are using MySQL and you have a unique key then let MySQL do the work.
You can use
INSERT INTO ... ON DUPLICATE KEY UPDATE ...
MySQL will try to insert a new record into the table; if the unique value already exists, MySQL will instead update all the fields you have listed after ON DUPLICATE KEY UPDATE.
You can read more about the INSERT ... ON DUPLICATE KEY UPDATE syntax at
http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
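For example (table and column names below are assumptions, not from the question), a statement that inserts a new row or refreshes the stored copy when the unique entry_id already exists could look like:
INSERT INTO remote_data (entry_id, payload, updated_at)
VALUES (42, 'fresh payload', NOW())
ON DUPLICATE KEY UPDATE
    payload = VALUES(payload),
    updated_at = VALUES(updated_at);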

SQL Query Not Adding New Entries With INSERT IGNORE INTO

So I have a script that gets data about 100 items at a time and inserts them into a MySQL database with a command like this:
INSERT IGNORE INTO beer(name, type, alcohol_by_volume, description, image_url) VALUES('Bourbon Barrel Porter', 2, '9.1', '', '')
I ran the script once, and it populated the DB with 100 entries. However, I ran the script again with the same SQL syntax, gathering all new data (i.e., no duplicates), but the database is not reflecting any new entries -- it is the same 100 entries I inserted on the first iteration of the script.
I logged the queries, and I can confirm that the queries were making requests with the new data, so it's not a problem in the script not gathering new data.
The name field is a unique field, but no other fields are. Am I missing something?
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
If there is no primary key, there can't be a duplicate key to ignore. You should always set a primary key, so please do that - and if you want additional columns that shouldn't contain duplicates, declare them as UNIQUE.
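As a sketch, a schema along these lines would make the INSERT IGNORE in the question behave as intended (column types are guesses; the important part is the UNIQUE constraint on name):
CREATE TABLE beer (
    id                INT AUTO_INCREMENT PRIMARY KEY,
    name              VARCHAR(255) NOT NULL UNIQUE,
    type              INT,
    alcohol_by_volume DECIMAL(4,1),
    description       TEXT,
    image_url         VARCHAR(255)
);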

Is inserting a new database entry faster than checking if the entry exists first?

I was once told that it is faster to just run an insert and let the insert fail than to check whether a database entry exists and then insert it if it is missing.
I was also told that most databases are heavily optimized for reading rather than writing, so wouldn't a quick check be faster than a slow insert?
Is this a question of the expected number of collisions? (i.e., is it faster to insert only when there is a low chance of the entry already existing?) Does it depend on the database type I am running? And for that matter, is it bad practice to have a method that constantly adds insert errors to my error log?
Thanks.
If the insert is going to fail because of an index violation, it will be at most marginally slower than a check that the record exists. (Both require checking whether the index contains the value.) If the insert is going to succeed, then issuing two queries is significantly slower than issuing one.
You can use INSERT IGNORE so that if the key already exists the insert is simply ignored; otherwise the new row is inserted. That way you issue a single query that checks for duplicates and inserts new values in one step.
Still, be careful with INSERT IGNORE, as it turns EVERY error into a warning. Read this post about INSERT IGNORE:
On duplicate key ignore?
I think INSERT IGNORE INTO ... can be used here: it will either insert the row or ignore it.
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
If you want to delete the old value and insert a new one, you can use REPLACE instead of INSERT to overwrite old rows.
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
Otherwise use INSERT IGNORE, as it will either insert or ignore.
Without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
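A rough illustration of the difference, assuming a made-up items table with id as its primary key:
-- INSERT IGNORE: the existing row wins and the new row is silently discarded.
INSERT IGNORE INTO items (id, label) VALUES (7, 'new label');
-- REPLACE: the existing row is deleted first, then the new row is inserted,
-- so any columns you do not supply are reset to their defaults.
REPLACE INTO items (id, label) VALUES (7, 'new label');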
If your intention is to insert a record when it is new, or update it when it already exists, how about doing an UPSERT?
Check out - http://vadivel.blogspot.com/2011/09/upsert-insert-and-update-in-sql-server.html
Instead of checking whether the record exists, we can try to UPDATE it directly. If there is no matching record, @@ROWCOUNT will be 0, and based on that we can INSERT it as a new record. [In SQL Server 2008 you can use MERGE for this.]
EDIT: Please note, I know this works for MS SQL Server; I don't know about MySQL or Oracle.
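A hedged sketch of that SQL Server pattern (the inventory table and its columns are invented for illustration):
UPDATE inventory SET qty = 10 WHERE item_id = 42;
IF @@ROWCOUNT = 0
    INSERT INTO inventory (item_id, qty) VALUES (42, 10);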

mysql strange "duplicate entry" error

I have a problem I don't quite understand. I parse some feeds with Ruby and save their contents in a database. I created a "hash" column, which is the MD5 hash of each post's URL. That column is UNIQUE because I don't want to post anything twice.
It works fine actually:
Mysql::Error: Duplicate entry '28edb7c2b3cd074d226fc4ae37baedd7' for key 'hash'
But the script stops at this point. I don't get it; I know for a fact that using INSERT with PHP always worked like a charm, so if there was a duplicate entry it ignored it and went on.
Can anybody help me? Would "INSERT IGNORE" create a double entry or would it just ignore the error message and go on?
Sounds like your Ruby script needs some exception handling.
You can rewrite your query so that instead of INSERT INTO it uses
REPLACE INTO ...
or
INSERT INTO ... ON DUPLICATE KEY UPDATE
This way attempting to insert a duplicate key will update the existing record instead of erroring out.
See here and here for more information.
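For the feed table described in the question, with its unique hash column, the rewritten query might look something like this (the posts table and its other columns are assumptions):
INSERT INTO posts (hash, url, title)
VALUES ('28edb7c2b3cd074d226fc4ae37baedd7', 'http://example.com/some-post', 'Some post')
ON DUPLICATE KEY UPDATE url = VALUES(url);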
Update:
INSERT IGNORE does not touch your existing data if it encounters a duplicate key. The documentation says:
You can use REPLACE instead of INSERT to overwrite old rows. REPLACE is the counterpart to INSERT IGNORE in the treatment of new rows that contain unique key values that duplicate old rows: The new rows are used to replace the old rows rather than being discarded.
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
In PHP, if MySQL returns an error, it doesn't normally kill the PHP script. It sounds to me as though that's not the case in Ruby. Either catch the exception and process it or use INSERT IGNORE, in which case MySQL returns a warning instead of an error (unless it was told not to).
"INSERT IGNORE" Should Prevent Ruby from exiting and shouldn't effect your data. However if you want to know when this is happening you have to put in some error handling.
begin
  DATABASE.query(insertHash)
rescue => e
  # Log the exception message and backtrace instead of letting it abort the script.
  puts "Error: " + e.message + " Backtrace >>: " + e.backtrace.join("\n")
end
This should show the error without exiting the Ruby script.
Or you could use this to indicate to the user that there is already an entry
Hope this helps