MySQL performance: nested insert/duplicate key vs multiple updates

Does anyone know which would be more efficient and use fewer resources?
Method 1: Use a single SELECT statement to get data from one table, then iterate through the result and execute one UPDATE per row on another table. E.g. (pseudo-code; execute() runs a query):
Query1_resultset = execute("SELECT item_id, SUM(views) AS view_count FROM tableA WHERE condition=1 GROUP BY item_id");
foreach (Query1_resultset as row) {
    execute("UPDATE tableB SET view_count=row.view_count WHERE id=row.item_id");
}
Method 2: Use a single INSERT ... ON DUPLICATE KEY UPDATE statement with a nested SELECT statement. E.g.:
INSERT INTO tableB (id, view_count) SELECT item_id, SUM(views) AS view_count FROM tableA WHERE condition=1 GROUP BY item_id ON DUPLICATE KEY UPDATE view_count=VALUES(view_count);
Note: ID on tableB is a primary key. There actually won't be any INSERTS because I know the key will exist. So it's all UPDATEs. Just using this statement to pass in a single query rather than multiple.
I'm really curious as to why either would be more efficient. Is it the number of queries that determines how quickly it will run? Where is the bottleneck?
I'm looking for something that will scale (the number of rows being updated grows daily).
Any ideas?
Thanks

It depends on your update/insert ratio. If you have lots of inserts and only a couple of updates, then the INSERT ... ON DUPLICATE KEY UPDATE statement will be faster.
If you mainly have updates, then you would be better off with an UPDATE statement and an INSERT as a fallback (if no row was updated). By the way, you could use the multi-table UPDATE syntax to do it with a single UPDATE instead of a SELECT followed by an UPDATE. If you're doing both a SELECT and an UPDATE, then the INSERT will definitely be faster.
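To make the "single UPDATE instead of a SELECT followed by an UPDATE" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and a portable correlated subquery rather than MySQL's multi-table UPDATE (the MySQL form is shown in a comment). The sample data and the column name cond (standing in for the question's condition) are assumptions made for the illustration.

```python
import sqlite3

# In MySQL the same idea is usually written as a multi-table UPDATE:
#   UPDATE tableB b
#   JOIN (SELECT item_id, SUM(views) AS view_count
#         FROM tableA WHERE cond = 1 GROUP BY item_id) a ON b.id = a.item_id
#   SET b.view_count = a.view_count;
# The correlated subquery below is the portable equivalent used for this demo.

def refresh_view_counts(conn):
    """Copy per-item view totals from tableA into tableB in one statement."""
    conn.execute(
        """
        UPDATE tableB
        SET view_count = (
            SELECT SUM(a.views) FROM tableA AS a
            WHERE a.item_id = tableB.id AND a.cond = 1
        )
        WHERE id IN (SELECT item_id FROM tableA WHERE cond = 1)
        """
    )
    conn.commit()

# Hypothetical sample data, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableA (item_id INTEGER, views INTEGER, cond INTEGER);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, view_count INTEGER);
    INSERT INTO tableA VALUES (1, 10, 1), (1, 5, 1), (2, 7, 1), (3, 99, 0);
    INSERT INTO tableB VALUES (1, 0), (2, 0), (3, 0);
    """
)
refresh_view_counts(conn)
rows = conn.execute("SELECT id, view_count FROM tableB ORDER BY id").fetchall()
```

Either way the database does the whole job in one round trip, instead of the application issuing one UPDATE per row.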

I think INSERT ... ON DUPLICATE KEY UPDATE is more efficient (otherwise, it wouldn't make much sense to add such an extension). By the way, your first example is not exactly the same as the second one: you neither use transactions nor lock the table, so it's possible that a record returned by the SELECT will no longer exist by the time you execute the UPDATE.

Related

MySQL Bulk Insert Dependent on Another Table

I have a case where I'm doing two queries: query1 is a bulk INSERT ... ON DUPLICATE KEY UPDATE on table1. For query2, I want to do another bulk INSERT on table2 with some application data along with using the ids inserted/updated from query1. I know I can do this with an intermediate query, selecting the ids I need from table1 and then inserting them into table2 along with application data, but I really want to avoid the extra network back-and-forth of that query along with the db overhead. Is there any way I can either get the ids inserted/updated from query1 when running that, or do some kind of complex, but relatively less expensive INSERT ... SELECT FROM in query2 to avoid this?
As far as I know, getting ids added/modified returned from query1 is impossible without a separate query, and I can't think of a way to batch INSERT ... SELECT FROM where the insertion values for each row are dependent on the selected value, but I'd love to be proven wrong, or shown a way around either of those.
There is no way to get a set of IDs as a result of a bulk INSERT.
One option you have is indeed to run a SELECT query to get the IDs and use them in the second bulk INSERT. But that's a hassle.
Another option is to run the 2nd bulk INSERT into a temporary table, let's call it table3, then use INSERT INTO table2 ... SELECT FROM ... table1 JOIN table3 ...
With a similar use case we eventually found that this is the fastest option, given that you index table3 correctly.
Note that in this case you don't have a SELECT that you need to loop over in your code, which is nice.
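A rough sketch of the temporary-table approach follows. It runs against Python's built-in sqlite3 module as a stand-in for MySQL; the column names (name, payload, table1_id) and the sample rows are hypothetical, since the question does not show a schema. The join column of table3 is declared as a primary key so that it is indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE table2 (table1_id INTEGER, payload TEXT);
    INSERT INTO table1 (name) VALUES ('alpha'), ('beta'), ('gamma');
    -- Staging table for the application data; the key on the join
    -- column is what keeps the join below cheap.
    CREATE TEMP TABLE table3 (name TEXT PRIMARY KEY, payload TEXT);
    """
)

# Step 1: bulk-load the application data into the staging table.
app_rows = [("alpha", "p1"), ("gamma", "p3")]
conn.executemany("INSERT INTO table3 (name, payload) VALUES (?, ?)", app_rows)

# Step 2: one INSERT ... SELECT resolves the ids via a join, so no
# ids ever travel back to the application.
conn.execute(
    """
    INSERT INTO table2 (table1_id, payload)
    SELECT t1.id, t3.payload
    FROM table1 AS t1
    JOIN table3 AS t3 ON t3.name = t1.name
    """
)
conn.commit()
result = conn.execute(
    "SELECT table1_id, payload FROM table2 ORDER BY table1_id"
).fetchall()
```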

MySQL delete multiple rows in one query conditions unique to each row

So I know in MySQL it's possible to insert multiple rows in one query like so:
INSERT INTO table (col1,col2) VALUES (1,2),(3,4),(5,6)
I would like to delete multiple rows in a similar way. I know it's possible to delete multiple rows based on the exact same conditions for each row, i.e.
DELETE FROM table WHERE col1='4' and col2='5'
or
DELETE FROM table WHERE col1 IN (1,2,3,4,5)
However, what if I wanted to delete multiple rows in one query, with each row having a set of conditions unique to itself? Something like this would be what I am looking for:
DELETE FROM table WHERE (col1,col2) IN (1,2),(3,4),(5,6)
Does anyone know of a way to do this? Or is it not possible?
You were very close, you can use this:
DELETE FROM table WHERE (col1,col2) IN ((1,2),(3,4),(5,6))
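If the list of pairs comes from application code, the statement can be built with placeholders instead of pasting values into the string. The helper below is only a sketch: build_tuple_delete is a made-up name, and it assumes a DB-API driver that uses %s placeholders (as the common MySQL drivers for Python do).

```python
def build_tuple_delete(table, columns, rows):
    """Return (sql, params) for DELETE ... WHERE (c1, c2) IN ((..), (..)).

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input. Values go through
    placeholders.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row must have one value per column")
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    placeholders = ", ".join([one_row] * len(rows))
    sql = "DELETE FROM {} WHERE ({}) IN ({})".format(
        table, ", ".join(columns), placeholders
    )
    # Flatten [(1, 2), (3, 4)] into [1, 2, 3, 4] to match the placeholders.
    params = [value for row in rows for value in row]
    return sql, params

sql, params = build_tuple_delete(
    "my_table", ["col1", "col2"], [(1, 2), (3, 4), (5, 6)]
)
```

The result would then be passed to the driver as cursor.execute(sql, params).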
A slight extension to the answer given, hopefully useful to the asker and anyone else looking.
You can also SELECT the values you want to delete. But watch out for Error 1093 (You can't specify target table for update in FROM clause): if the subquery reads from the table you are deleting from, wrap it in a derived table, as the alias a does here.
DELETE FROM
    orders_products_history
WHERE
    (branchID, action) IN (
        SELECT
            branchID,
            action
        FROM
            (
                SELECT
                    branchID,
                    action
                FROM
                    orders_products_history
                GROUP BY
                    branchID,
                    action
                HAVING
                    COUNT(*) > 10000
            ) a
    );
I wanted to delete all history records where the number of history records for a single action/branch exceeds 10,000. And thanks to this question and chosen answer, I can.
Hope this is of use.
Richard.
Took a lot of googling, but here is what I do in Python for MySQL when I want to delete multiple items from a single table using a list of values.
# Start with an empty list
values = []
# Append each value you want to delete to it.
# Each entry must be a single-element tuple, not a bare value,
# because executemany() expects one parameter sequence per row.
values.append((your_variable,))
# Once the list is loaded, run the statement once per entry
cursor.executemany("DELETE FROM YourTable WHERE ID = %s", values)

Get primary keys affected after select, update or insert using only SQL?

How do I get the primary key (assuming I know its name from SHOW KEYS) resulting from an INSERT INTO?
How do I get the primary keys of the rows affected by an UPDATE (as in the previous case, independent of the key name)?
How do I get the primary keys of the rows returned by a SELECT query (even if the key is not one of the selected fields)?
I need SQL commands that I can run after the inserts, updates and selects in my application to obtain this information. Is that possible?
My database is MySQL.
I need pure SQL because I am building query-caching logic to apply in many applications (Java and PHP), and I want the logic to be independent of the language.
Example:
select name from people
I need a query, executed after this one, that returns the primary keys of these people.
SELECT LAST_INSERT_ID();
And seriously, putting "primary key from insert mysql" into Google gets you a Stack Overflow answer as the first result.
EDIT: more discussion based on comments.
If you want to see what rows are affected by an update, just do a SELECT with the same WHERE clause and JOIN criteria as the UPDATE statement, e.g.:
UPDATE foo SET a = 5 WHERE b > 10;
SELECT id FROM foo WHERE b > 10;
If you are INSERTing into a table that does not have an auto-increment primary key, you don't need to do anything special. You already know what the new primary key is, because you set it yourself in the INSERT statement. If you want code that can handle INSERT statements coming from outside of the code that will be tracking PK changes, then you'll either need to parse the INSERT statement, or have the calling code provide information about the primary key.
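As a rough illustration of both ideas, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows are made up. It collects the ids with the UPDATE's WHERE clause before running the UPDATE, which also stays correct when the UPDATE changes a column used in the filter, and it reads the generated key of an INSERT from cursor.lastrowid, the DB-API counterpart of LAST_INSERT_ID(). On MySQL with InnoDB you would probably want the SELECT and the UPDATE in one transaction, with SELECT ... FOR UPDATE, so no other session can change the set of rows in between.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER);
    INSERT INTO foo (a, b) VALUES (1, 5), (2, 20), (3, 30);
    """
)

def update_and_report(conn):
    """Run the UPDATE and return the ids of the rows it touches."""
    with conn:  # commits on success, rolls back if an exception is raised
        # Same WHERE clause as the UPDATE, evaluated first.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM foo WHERE b > 10 ORDER BY id"
        )]
        conn.execute("UPDATE foo SET a = 5 WHERE b > 10")
    return ids

affected_ids = update_and_report(conn)

# For an auto-generated key, the cursor reports the id of the inserted row.
cur = conn.execute("INSERT INTO foo (a, b) VALUES (9, 1)")
new_id = cur.lastrowid
conn.commit()
```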

How best to recalculate group by values

I have a table that stores the summed values of a large table. I'm not calculating them on the fly as I need them frequently.
What is the best way to update these values?
I could delete the relevant rows from the table, do a full group by sum on all the relevant lines and then insert the new data.
Or I could index a timestamp column on the main table, and then only sum the latest values and add them to the existing data. This is complicated because some sums won't exist so both an insert and an update query would need to run.
I realize that the answer depends on the particulars of the data, but what I want to know is if it is ever worth doing the second method; if there are millions of rows being summed in the first example and only tens in the second, would the second be significantly faster to execute?
You can try triggers on update/delete. In the trigger you check the inserted or deleted value and adjust the sum in the second table accordingly.
http://dev.mysql.com/doc/refman/5.0/en/triggers.html
As I see it, there are several ways:
Make a view, which will always be up to date (I don't know whether MySQL supports materialized views).
Make a table that is kept up to date by a trigger (on update/delete/insert, for example) or by a nightly batch job (so the data will be up to one day old).
Make a stored procedure that retrieves and computes only the data needed.
I would do something like this (INSERT UPDATE):
mysql_query("
INSERT INTO sum_table (col1, col2)
SELECT id, SUM(value)
FROM table
GROUP BY id
ON DUPLICATE KEY UPDATE col2 = VALUES(col2)
");
Please let me know if you need more examples.
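For the second method in the question (sum only the rows added since the last refresh and fold them into the existing totals), a minimal sketch might look like this. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and column names (detail, grp, value, ts) are invented for the example. It runs one UPDATE for groups that already have a total and one INSERT for groups seen for the first time. In MySQL the two steps could presumably be merged into a single INSERT ... SELECT ... ON DUPLICATE KEY UPDATE total = total + VALUES(total).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE detail (grp INTEGER, value INTEGER, ts INTEGER);
    CREATE TABLE sum_table (grp INTEGER PRIMARY KEY, total INTEGER);
    -- State as of the last refresh (ts = 100).
    INSERT INTO detail VALUES (1, 10, 100), (1, 5, 100), (2, 7, 100);
    INSERT INTO sum_table VALUES (1, 15), (2, 7);
    -- Rows that arrived after the last refresh.
    INSERT INTO detail VALUES (1, 1, 200), (3, 4, 200), (3, 6, 200);
    """
)

def apply_delta(conn, since_ts):
    """Fold rows newer than since_ts into sum_table."""
    with conn:  # commits on success, rolls back if an exception is raised
        # 1. Add the new values to groups that already have a summary row.
        conn.execute(
            """
            UPDATE sum_table
            SET total = total + (
                SELECT SUM(d.value) FROM detail AS d
                WHERE d.grp = sum_table.grp AND d.ts > ?
            )
            WHERE grp IN (SELECT grp FROM detail WHERE ts > ?)
            """,
            (since_ts, since_ts),
        )
        # 2. Create summary rows for groups that have none yet.
        conn.execute(
            """
            INSERT INTO sum_table (grp, total)
            SELECT grp, SUM(value) FROM detail
            WHERE ts > ? AND grp NOT IN (SELECT grp FROM sum_table)
            GROUP BY grp
            """,
            (since_ts,),
        )

apply_delta(conn, 100)
totals = conn.execute("SELECT grp, total FROM sum_table ORDER BY grp").fetchall()
# Full recomputation, for comparison with the incremental result.
full = conn.execute(
    "SELECT grp, SUM(value) FROM detail GROUP BY grp ORDER BY grp"
).fetchall()
```

The incremental version only reads the rows newer than the cutoff, so with an index on the timestamp column its cost should track the size of the delta rather than the size of the whole table.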

mysql: select, insert, delete and update in one query

I need to use SELECT, INSERT, DELETE and UPDATE in one query.
(I need to copy data from an old table into a new one, then delete the old data, and update another table.)
I was able to write the INSERT and SELECT (the copy step), but now I have a problem.
I have this query:
INSERT INTO news_n (id, data)
SELECT (id, data)
FROM news
WHERE id > 21
Thanks
You can't do it all in one query, but you can do it all in one transaction if you are using a transactional storage engine (like InnoDB). This might be what you want, but it's hard to tell from the information you provided in your question.
START TRANSACTION;
INSERT...;
DELETE...;
UPDATE...;
COMMIT;
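Filled in with concrete statements, the transaction could look like the sketch below. It runs on Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB; the stats table and the sample rows are hypothetical, since the question does not say which other table needs updating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE news (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE news_n (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE stats (name TEXT PRIMARY KEY, value INTEGER);
    INSERT INTO news VALUES (20, 'old'), (22, 'a'), (23, 'b');
    INSERT INTO stats VALUES ('moved', 0);
    """
)

def move_rows(conn, min_id):
    """Copy, delete and update as one unit: all three apply, or none."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO news_n (id, data) "
            "SELECT id, data FROM news WHERE id > ?",
            (min_id,),
        )
        conn.execute("DELETE FROM news WHERE id > ?", (min_id,))
        conn.execute(
            "UPDATE stats SET value = (SELECT COUNT(*) FROM news_n) "
            "WHERE name = 'moved'"
        )

move_rows(conn, 21)
remaining = conn.execute("SELECT id, data FROM news ORDER BY id").fetchall()
copied = conn.execute("SELECT id, data FROM news_n ORDER BY id").fetchall()
moved = conn.execute("SELECT value FROM stats WHERE name = 'moved'").fetchone()[0]
```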
I don't think it's possible in one query.
You can try writing a stored procedure; using triggers, you may be able to achieve that.
MySQL does not support MERGE, so you'll have to do it in two queries:
INSERT
INTO news_n (id, data)
SELECT id, data
FROM news
WHERE id > 21
ON DUPLICATE KEY UPDATE
    data = news.data;
DELETE
FROM news_n
WHERE id NOT IN
    (
    SELECT id
    FROM news
    WHERE id > 21
    );
This assumes you have PRIMARY KEY (id) in both tables.
You can't combine Select/Update/etc into one query. You will have to write separate queries for each operation you intend to complete.