Between the two query cases below, which one is faster:
update t set v = case when id = 1000000 then 100 when id = 10000000 then 500 else v end
Or
update t set v = 100 where id = 1000000;
update t set v = 500 where id = 10000000;
The table t has a unique index on id, and the table can be pretty big (millions of entries).
My guess is that although the second case makes multiple queries, it is still faster, because it can use the index to find the entries, while the first case does a full scan of the table (but it is just a guess; I actually have no clue how MySQL deals with CASE control flow).
Thank you in advance for any answers!
The second version you have would be better and cleaner; however, a cleaner single update is also possible that takes advantage of the index on the "id" column...
update t
set v = if( id = 1000000, 100, 500 )
where id in ( 1000000, 10000000 )
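If more than two rows are involved, the same pattern generalizes with a CASE expression while the WHERE clause still lets MySQL use the index; a sketch using the same table and values (the else v branch is just a safety net):
update t
set v = case id when 1000000 then 100 when 10000000 then 500 else v end
where id in ( 1000000, 10000000 )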
Related
I have a query to update the index when something is added in between:
update My_Table
set NS_LEFT = NS_LEFT + 10
where NS_THREAD = parentThread and NS_LEFT > oldRight
order by NS_LEFT desc
It's working fine.
Now if I have to delete something in between, I am using the query below:
update My_Table
set NS_LEFT = NS_LEFT - 10
where NS_THREAD = parentThread and NS_LEFT > oldRight
order by NS_LEFT desc
It is not working and throws a duplicate-entry error:
[Code: 1062, SQL State: 23000] (conn=1517) Duplicate entry '1-1110'
for key 'INDEX'
Index: (NS_THREAD, NS_LEFT)
How do I solve this for the delete case?
Note
This is my workaround for MariaDB only; on other databases it works without the ORDER BY (why is still an open question for me).
What I have done in the past, when controlling the order is not an option, is to perform two updates. The first shifts the group way up past any currently used values, ensuring no collisions. The second then shifts the rows to where they should be. In general form, the idea can be illustrated with this:
UPDATE aTable SET somevalue = somevalue + 10000 WHERE somevalue > x;
UPDATE aTable SET somevalue = somevalue - 10000 - y WHERE somevalue > x + 10000;
"10000" is just a value that will push the range past collision, y is the amount you actually want to shift them. Obviously if there are already values around 10000, the number will need to be different. To avoid having to query for a safe value, another option if the design permits....
If negative values are not normally used and the table design allows them, this version of the process is a little simpler to apply:
UPDATE aTable SET somevalue = somevalue * -1 WHERE somevalue > x;
UPDATE aTable SET somevalue = (somevalue * -1) - y WHERE somevalue < 0;
This presumes there are not normally negative values, and to be safe the updates should be performed within a transaction (along with the original delete) so that potential concurrent applications of this solution do not collide. (Edit: note that the transaction/concurrency requirement goes for both forms I have presented.)
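To make that concrete, here is a minimal sketch of the first form applied to the question's My_Table, wrapped in a transaction (parentThread and oldRight are the question's placeholders, 10 is the intended shift, and 10000 is an assumed collision-safe gap):
START TRANSACTION;
-- ... the original DELETE goes here ...
-- Step 1: shift the affected rows far past any existing NS_LEFT values
UPDATE My_Table
SET NS_LEFT = NS_LEFT + 10000
WHERE NS_THREAD = parentThread AND NS_LEFT > oldRight;
-- Step 2: bring them back down, minus the intended shift of 10
UPDATE My_Table
SET NS_LEFT = NS_LEFT - 10000 - 10
WHERE NS_THREAD = parentThread AND NS_LEFT > oldRight + 10000;
COMMIT;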
Edit: Oh, I just noticed Gordon's answer was quite similar... the bare minus signs looked like flecks on my screen. If Gordon's didn't work, this won't either.
That happens. One solution is to do two updates:
update My_Table
set NS_LEFT = - (NS_LEFT - 10)
where NS_THREAD = parentThread and NS_LEFT > oldRight
order by NS_LEFT desc;
update My_Table
set NS_LEFT = - NS_LEFT
where NS_THREAD = parentThread and NS_LEFT < 0;
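The first statement stores the shifted values negated, so none of them can collide with an existing (NS_THREAD, NS_LEFT) entry; the second flips them back to positive once every row has been moved.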
I wrote a query that was taking way too much time (32 minutes), so I tried other methods to find a faster one.
I finally wrote another one that takes under 5 seconds.
The problem is that I don't understand my optimization.
Can someone explain why it is that much faster?
hugeTable has 494,500 rows
smallTable1 has 983 rows
smallTable2 has 983 rows
cursor.execute('''UPDATE hugeTable dst,
(
    SELECT smallTable1.hugeTableId, smallTable2.valueForHugeTable
    FROM smallTable2
    INNER JOIN smallTable1 ON smallTable1.id = smallTable2.id
    -- This select returns 983 rows
) src
SET dst.columnToUpdate = src.valueForHugeTable
WHERE dst.id2 = %s AND dst.id = src.hugeTableId;''', (inputId2,))
-- The condition dst.id2 = %s alone targets 983 rows.
-- The combination dst.id2 = %s AND dst.id = src.hugeTableId targets a single unique row.
-- This query takes 32 minutes.
And here is a way to do the exact same request with more steps, but way faster:
-- First create a temporary table to hold the (983) rows from hugeTable that have to be updated
cursor.execute('''CREATE TEMPORARY TABLE tmpTable AS
    SELECT * FROM hugeTable
    WHERE id2 = %s;''', (inputId2,))
-- Update the rows in tmpTable instead of in hugeTable
cursor.execute('''UPDATE tmpTable dst,
(
    SELECT smallTable1.hugeTableId, smallTable2.valueForHugeTable
    FROM smallTable2
    INNER JOIN smallTable1 ON smallTable1.id = smallTable2.id
    -- This select returns 983 rows
) src
SET dst.columnToUpdate = src.valueForHugeTable
WHERE dst.id = src.hugeTableId;''')
-- Then delete the (983) rows we want to update
cursor.execute('DELETE FROM hugeTable WHERE id2 = %s;', (inputId2,))
-- And create new rows replacing the above deleted ones with rows from tmpTable
cursor.execute('INSERT INTO hugeTable SELECT * FROM tmpTable;')
-- This takes a little under 5 seconds.
I would like to know why the first method takes so much time.
Understanding this will help me level up my MySQL skills.
Add a composite index to dst: INDEX(id2, id) (in either order).
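For example (a sketch; this assumes dst is the question's hugeTable, and the index name is arbitrary):
ALTER TABLE hugeTable ADD INDEX idx_id2_id (id2, id);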
More
Case 1:
UPDATE hugeTable dst,
( SELECT smallTable1.hugeTableId, smallTable2.valueForHugeTable
FROM smallTable2
INNER JOIN smallTable1 ON smallTable1.id = smallTable2.id
)src SET dst.columnToUpdate = src.valueForHugeTable
WHERE dst.id2 = 1234
AND dst.id = src.hugeTableId;
Case 2:
CREATE TEMPORARY TABLE tmpTable AS
SELECT *
from hugeTable
WHERE id2 = 1234;
UPDATE tmpTable dst,
( SELECT smallTable1.hugeTableId, smallTable2.valueForHugeTable
FROM smallTable2
INNER JOIN smallTable1 ON smallTable1.id = smallTable2.id
)src SET dst.columnToUpdate = src.valueForHugeTable
WHERE dst.id = src.hugeTableId;
Without knowing the MySQL version and seeing the EXPLAINs, I can only guess at why they are so different...
The subquery ( SELECT ... JOIN ... ) may or may not be 'materialized' into an implicit temp table. (Newer versions are better at doing this.)
Such a materialized subquery may or may not have an index created for it. (Again, new versions are better.)
If there are no adequate indexes on either dst or src, then the amount of 'effort' is the product of the sizes of the two tables. Note that in Case 2, dst is much smaller. (This may be the answer you are looking for.)
If the tables are not fully cached in RAM, one query could involve more I/O than the other. An I/O-bound query is often 10 times as slow as the same query when everything is fully cached in RAM. (This is less likely to be the answer, but may be part of it.)
Having a 3-table UPDATE would probably eliminate some of the issues above. And it may (or may not) eliminate the timing difference.
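A sketch of such a 3-table UPDATE, joining directly instead of through a subquery (column names come from the question; 1234 stands in for inputId2):
UPDATE hugeTable dst
JOIN smallTable1 ON smallTable1.hugeTableId = dst.id
JOIN smallTable2 ON smallTable2.id = smallTable1.id
SET dst.columnToUpdate = smallTable2.valueForHugeTable
WHERE dst.id2 = 1234;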
For further discussion, please provide
MySQL version
SHOW CREATE TABLE -- for each table
How big innodb_buffer_pool_size is
SHOW TABLE STATUS -- for each table
EXPLAIN UPDATE ... -- for each UPDATE -- requires at least 5.6
What percentage of the table has ( id2 = inputId2 )?
I have a table with the columns:
pos1 pos2 pos3 pos4 pos5 id
pos1-5 -> varchar, id -> int
I'm using the query:
Query = "select distinct * from (SELECT * FROM database.turkey2 t1 WHERE pos2='"+comboBox2.Text+"' and NOT EXISTS (SELECT 1 FROM database.turkey2used t2 WHERE t1.id = t2.id)) as t3 order by rand() limit "+amount+" ;";
Meaning the user will choose pos2 (making it static) while the rest is random; the amount will be set by the user as well.
What I'm trying to do is add a condition to this query that forces at least 2 different pos (positions) to be chosen at random.
Meaning, I don't want to get 2 or more rows with the same values at pos1, 2, 3, 4 where only pos5 is different (ignore the id; and the rule doesn't apply only to pos5, of course).
I solved the problem by running another query on the already selected amount, but that's not good, because if the user asked for 200 combinations then after the "fix" he may lose some rows.
Some other info: it's a medium-size DB (about 10M rows).
Please keep the solution simple because I'm rather new to MySQL and C#.
Thanks.
Is there a better / more efficient / shorter way to write this SQL Query:
UPDATE mTable SET score = 0.2537 WHERE user = 'Xthane' AND groupId = 37;
UPDATE mTable SET score = 0.2349 WHERE user = 'Mike' AND groupId = 37;
UPDATE mTable SET score = 0.2761 WHERE user = 'Jack' AND groupId = 37;
UPDATE mTable SET score = 0.2655 WHERE user = 'Isotope' AND groupId = 37;
UPDATE mTable SET score = 0.3235 WHERE user = 'Caesar' AND groupId = 37;
UPDATE mTable
SET score =
case user
when 'Xthane' then 0.2537
when 'Mike' then 0.2349
when 'Jack' then 0.2761
when 'Isotope' then 0.2655
when 'Caesar' then 0.3235
else score
end
where groupId = 37
You can use a CASE expression to perform this type of UPDATE.
UPDATE mTable
SET score
= CASE user
WHEN 'Xthane' THEN 0.2537
WHEN 'Mike' THEN 0.2349
WHEN 'Jack' THEN 0.2761
WHEN 'Isotope' THEN 0.2655
WHEN 'Caesar' THEN 0.3235
ELSE score
END
WHERE groupId = 37
You could create a temporary table, insert score, user, and groupId for all the records you want to update, then do something like this:
UPDATE mTable m
INNER JOIN tmpTable t
    ON m.groupId = t.groupId
    AND m.user = t.user
SET m.score = t.score;
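A sketch of the temporary-table setup this assumes (column names and types are guesses based on the sample data):
CREATE TEMPORARY TABLE tmpTable (
    user VARCHAR(64),
    groupId INT,
    score DECIMAL(5,4),
    PRIMARY KEY (user, groupId)
);
INSERT INTO tmpTable (user, groupId, score) VALUES
    ('Xthane',  37, 0.2537),
    ('Mike',    37, 0.2349),
    ('Jack',    37, 0.2761),
    ('Isotope', 37, 0.2655),
    ('Caesar',  37, 0.3235);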
Your original statements look short enough and are easy enough to understand, and you can determine whether any rows were affected by each of those separate UPDATE statements.
For a large number of statements, however, there's a considerable amount of overhead making "roundtrips" to the database to execute each individual statement. You can get much faster execution (shorter elapsed time) for a large set of updates by "batching" the updates together in a single statement execution.
So, it depends on what you are trying to achieve.
Better? Depends on how you define that. (Should the statements be more understandable, easier to debug, less resource intensive?)
More efficient? In terms of reduced elapsed time, yes, there are other ways to accomplish these same updates, but the statements are not as easy to understand as yours.
Shorter? In terms of SQL statements with fewer characters, yes, there are ways to achieve that. (Some examples are shown in other answers, but note that the effects of the statements in some of those answers are significantly DIFFERENT from your statements.)
The actual performance of those alternatives is really going to depend on the number of rows, and available indexes. (e.g. if you have hundreds of thousands of rows with groupId = 37, but are only updating 5 of those rows).
I have a select statement that builds a list of scripts as long as the user's role is not in the scripts.sans_role_priority field. This works great if there is only one entry in the field, but once I add more than one, the whole function quits working. I am sure I am overlooking something simple and just need another set of eyes on it. Any help would be appreciated.
script:
SELECT *
FROM scripts
WHERE active = 1
AND homePage='Y'
AND (role_priority > 40 OR role_priority = 40)
AND (40 not in (sans_role_priority) )
ORDER BY seq ASC
data in scripts.sans_role_priority(varchar) = "30,40".
Additional testing adds this:
When I switch the values in the field to "40, 30" the select works. Continuing to debug...
Maybe you are looking for FIND_IN_SET(). Your 40 not in (sans_role_priority) compares 40 against the whole column value as a single string, which MySQL coerces to the number at its start ('30,40' becomes 30), rather than against the individual list elements.
SELECT *
FROM scripts
WHERE active = 1
AND homePage='Y'
AND (role_priority > 40 OR role_priority = 40)
AND NOT FIND_IN_SET('40', sans_role_priority)
ORDER BY seq ASC
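One caveat: FIND_IN_SET does not trim spaces around list elements, so a stored value like "40, 30" (with a space, as in your later test) will not match '30'. For example:
SELECT FIND_IN_SET('40', '30,40');  -- 2: found as the second element
SELECT FIND_IN_SET('40', '40, 30'); -- 1: found as the first element
SELECT FIND_IN_SET('30', '40, 30'); -- 0: the stored element is ' 30', with a leading space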
Note that having "X,Y,Z" as a VARCHAR value in a field is a sign that your DB schema could be improved so that X, Y, and Z are stored as separate values in a related table.
SELECT *
FROM scripts
WHERE active = 1
AND homePage='Y'
AND role_priority >= 40
AND NOT FIND_IN_SET(40,sans_role_priority)
ORDER BY seq ASC
See: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set
Note that CSV in databases is just about the worst antipattern you can find.
It should be avoided at all costs because:
You cannot use an index on a CSV field (at least not a mentally sane one);
Joins on CSV fields are a major PITA;
Selects on them are uber-slow;
They violate 1NF.
They waste storage.
Instead of using a CSV field, consider putting sans_role_priority in another table with a link back to scripts.
CREATE TABLE script_sans_role_priority (
    script_id INTEGER,
    srp INTEGER,
    PRIMARY KEY (script_id, srp),
    FOREIGN KEY (script_id) REFERENCES scripts(id)
);
Then the renormalized select will be:
SELECT s.*
FROM scripts s
LEFT JOIN script_sans_role_priority srp
ON (s.id = srp.script_id AND srp.srp = 40)
WHERE s.active = 1
AND s.homePage='Y'
AND s.role_priority >= 40
AND srp.script_id IS NULL
ORDER BY s.seq ASC
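The LEFT JOIN combined with srp.script_id IS NULL acts as an anti-join: scripts that do have a row with srp = 40 find a match and are filtered out, while all the others are kept.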
SELECT *
FROM scripts
WHERE active = '1'
AND homePage='Y'
AND role_priority >= '40'
AND sans_role_priority <> '40'
ORDER BY seq ASC