I'm converting stored procedures from SQL Server to MySQL, and I've found a query that has me stumped:
IF EXISTS (SELECT * FROM dbo.T_ExampleTable WHERE Col1 = @Col1 AND Col2 = @Col2 AND Col3 = @Col3)
SET @return_value = -10
ELSE IF EXISTS (SELECT * FROM dbo.T_ExampleTable WHERE Col1 = @Col1 OR Col2 = @Col2 OR Col3 = @Col3)
SET @return_value = -20;
IF @return_value = 0
BEGIN
MERGE ExampleDB.dbo.T_ExampleTable AS T
USING (VALUES (@Col1, @Col2, @Col3)) AS S (Col1, Col2, Col3)
ON T.Col1 = S.Col1
WHEN NOT MATCHED THEN
INSERT VALUES (S.Col1, S.Col2, S.Col3);
END
For security purposes, I have altered the names of variables but the syntax is the same.
I think I have a grasp on what's happening with the MERGE here (but please correct me if I'm wrong). I assume that we're taking our 3 variables (@Col1, @Col2, and @Col3) and placing them in a result set we call "S". Then, we're checking each row in T_ExampleTable for a matching Col1 value. Finally, we're saying that if it DOESN'T match any of the rows, INSERT these values.
Now, I'm not too familiar with the MERGE syntax, so I may be off with the above assumption. Assuming I'm correct, however, isn't this the same as just performing a SELECT ... FROM T_ExampleTable WHERE Col1 = @Col1, then checking @@ROWCOUNT (or FOUND_ROWS() in MySQL) and, if it equals 0, performing an INSERT?
Furthermore, I am even more baffled by the logic above the MERGE. If my understanding of how MERGE works is correct (and again, it may be off), then the entire statement is pointless. Basically, if @Col1 exists within T_ExampleTable, then @return_value is going to equal -10 or -20. Thus, it can't equal 0 and we won't hit the MERGE. If @Col1 does not exist within T_ExampleTable, then @return_value is going to equal 0, but the MERGE isn't going to return any result set since T.Col1 = S.Col1 will never be true.
This is why I believe I must be misunderstanding something about how MERGE works. Either that, or the SQL Server code I'm porting is just poorly written (which is also possible, I guess).
So ultimately, I'm looking for two answers here: one that clarifies the logic of a MERGE statement and whether my understanding of the above code is correct, and another that shows a possible solution for converting this to MySQL.
Thanks!
For practical purposes, I think this is equivalent to this (in both databases):
INSERT INTO T_ExampleTable (Col1, Col2, Col3)
SELECT s.col1, s.col2, s.col3
FROM (SELECT @Col1 AS col1, @Col2 AS col2, @Col3 AS col3) s
WHERE NOT EXISTS (SELECT 1
FROM T_ExampleTable t
WHERE t.col1 = s.col1
);
There may be some edge cases where they are not exactly the same: for example, concurrent calls to the stored procedure, or a NULL col1. But you already have differences in locking between the two databases anyway.
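To check the NOT EXISTS rewrite quickly, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in (an assumption: it is neither MySQL nor SQL Server, but it accepts this INSERT ... SELECT ... WHERE NOT EXISTS form unchanged), and the helper name insert_if_col1_absent is hypothetical. It shows that the insert happens only when no row with the same Col1 exists, which is what MERGE ... WHEN NOT MATCHED THEN INSERT does.

```python
import sqlite3


def insert_if_col1_absent(conn, col1, col2, col3):
    """Hypothetical helper: insert (col1, col2, col3) unless a row with that Col1 exists.

    Mirrors MERGE ... ON T.Col1 = S.Col1 WHEN NOT MATCHED THEN INSERT.
    Returns the number of rows inserted (0 or 1).
    """
    cur = conn.execute(
        """
        INSERT INTO T_ExampleTable (Col1, Col2, Col3)
        SELECT s.col1, s.col2, s.col3
        FROM (SELECT ? AS col1, ? AS col2, ? AS col3) AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM T_ExampleTable AS t WHERE t.Col1 = s.col1
        )
        """,
        (col1, col2, col3),
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_ExampleTable (Col1 TEXT, Col2 TEXT, Col3 TEXT)")

print(insert_if_col1_absent(conn, "a", "b", "c"))  # 1: no row with Col1 = 'a' yet
print(insert_if_col1_absent(conn, "a", "x", "y"))  # 0: Col1 = 'a' already present
```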
(And, although you didn't ask, I would expect Postgres to be a more natural database to switch to from SQL Server because Postgres and SQL Server have a larger overlap in functionality.)
In the LOAD DATA documentation, there's a SET syntax that allows values to be assigned to columns:
...
SET col=@variable;
...
I was wondering whether SET can support assigning two columns from a single table; I can't find anything in the documentation that supports tuple assignment:
...
SET (col1, col2) = (SELECT col1, col2 from table where id=1);
...
Anyone have any knowledge on this? Thank you!
SET doesn't work like this; however, I know that in T-SQL, at least, you can do the same thing using SELECT:
Select @var1 = col1, @var2 = col2
from table
In some cases I have to copy values from one column to another and set the first to NULL. This SQL statement works as expected:
UPDATE lessons SET order_id_old = order_id, order_id = NULL WHERE id = 1
But I'm not sure whether this is the right way to do it. Or would it be better to use two queries for this purpose?
UPDATE lessons SET order_id_old = order_id WHERE id = 1;
UPDATE lessons SET order_id = NULL WHERE id = 1;
I would definitely go with the second approach. Here's what the documentation says:
Single-table UPDATE assignments are generally evaluated from left to right. For multiple-table updates, there is no guarantee that assignments are carried out in any particular order.
In your case, it's fine at the moment, as there is only one table. However, if someone later modifies this statement and adds a new table/join (assuming it'll work as it did with one table), it will stop working or give inconsistent results.
So, for readability and maintainability, go ahead with the second approach. (Also, I would recommend wrapping both UPDATE statements in a transaction to preserve atomicity.)
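A minimal sketch of the two-statement version wrapped in a transaction, using Python's sqlite3 module as a stand-in for MySQL (assumption: the lessons schema and the sample value 42 are invented for illustration). The `with conn:` block commits both updates together, or rolls both back if either one raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lessons (id INTEGER PRIMARY KEY, order_id INTEGER, order_id_old INTEGER)"
)
conn.execute("INSERT INTO lessons (id, order_id) VALUES (1, 42)")
conn.commit()

# Both UPDATEs run in one transaction: committed together, or rolled back together.
with conn:
    conn.execute("UPDATE lessons SET order_id_old = order_id WHERE id = ?", (1,))
    conn.execute("UPDATE lessons SET order_id = NULL WHERE id = ?", (1,))

print(conn.execute("SELECT order_id, order_id_old FROM lessons WHERE id = 1").fetchone())
# (None, 42)
```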
From the documentation:
If you access a column from the table to be updated in an expression, UPDATE uses the current value of the column. For example, the following statement sets col1 to one more than its current value:
UPDATE t1 SET col1 = col1 + 1;
The second assignment in the following statement sets col2 to the current (updated) col1 value, not the original col1 value. The result is that col1 and col2 have the same value. This behavior differs from standard SQL.
UPDATE t1 SET col1 = col1 + 1, col2 = col1;
Single-table UPDATE assignments are generally evaluated from left to right. For multiple-table updates, there is no guarantee that assignments are carried out in any particular order.
In your case, it should be fine to use the single statement.
What is more efficient:
START TRANSACTION;
UPDATE mytable SET foo = 'bar' WHERE (col1 = 813242) AND (col2 = 25343);
UPDATE mytable SET foo = 'bar' WHERE (col1 = 312643) AND (col2 = 8353);
UPDATE mytable SET foo = 'bar' WHERE (col1 = 843564) AND (col2 = 41233);
UPDATE mytable SET foo = 'bar' WHERE (col1 = 321312) AND (col2 = 5325);
UPDATE mytable SET foo = 'bar' WHERE (col1 = 554235) AND (col2 = 6321);
... x 10,000 times or more
COMMIT;
or
UPDATE mytable SET foo = 'bar' WHERE
((col1 = 16344) AND (col2 = 5456)) OR
((col1 = 42134) AND (col2 = 5436)) OR
((col1 = 84563) AND (col2 = 2321)) OR
((col1 = 43216) AND (col2 = 4267)) OR
((col1 = 53248) AND (col2 = 6234)) OR
... x 10,000 times or more
Assume I have a UNIQUE index on (col1, col2).
So my guess is that the first option is nice because it uses the index but is split into multiple queries, while the second option is nice because it's only one query but, on the other hand, it does a full table scan.
This is the EXPLAIN output when not using OR:
type: ref, possible_keys: myindex_UNIQUE, key: myindex_UNIQUE, ref: const
This is the EXPLAIN output when using OR:
type: ALL, possible_keys: myindex_UNIQUE, key: null, ref: null
Also, is there any limit on the size of a WHERE clause?
I'm aiming for maximum speed.
Based on the conversation in the comments, a possible solution:
If you have a large table and a list of values to identify the rows to update, you can create a helper table for the list of values.
Based on the example in the question, the table could be something like this:
CREATE TABLE mytable_operation (
col1 INT
, col2 INT
) ENGINE = MEMORY;
Note that the CREATE statement specifies ENGINE = MEMORY, so the table is stored in memory instead of on disk.
Load the values into this table prior to the update.
After all values are loaded into the helper table, you can use the following query to update the values in the production table.
UPDATE
mytable
SET
foo = 'bar'
WHERE
EXISTS (SELECT 1 FROM mytable_operation MO WHERE mytable.col1 = MO.col1 AND mytable.col2 = MO.col2)
Of course, you can use any DML statements to manipulate the production data. (UPDATE with joins, DELETE, INSERT..ON DUPLICATE KEY, etc)
When you have finished the data manipulation, you can truncate or drop the helper table.
If the production table is reasonably large, this solution could be faster than the solutions in the question.
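The helper-table approach can be sketched end to end with Python's sqlite3 module. SQLite has no MEMORY engine, so a TEMP table stands in for ENGINE = MEMORY here (an assumption, not the MySQL behavior), and the 1,000-row table and the pairs in targets are made-up sample data. Loading the pairs with executemany replaces the 10,000 separate UPDATE statements, and a single UPDATE ... WHERE EXISTS then touches only the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (col1 INTEGER, col2 INTEGER, foo TEXT, UNIQUE (col1, col2))"
)
conn.executemany(
    "INSERT INTO mytable (col1, col2, foo) VALUES (?, ?, 'old')",
    [(i, i * 10) for i in range(1000)],
)

# Helper table holding the (col1, col2) pairs to update; TEMP stands in for ENGINE = MEMORY.
conn.execute("CREATE TEMP TABLE mytable_operation (col1 INTEGER, col2 INTEGER)")
targets = [(5, 50), (42, 420), (999, 9990), (7, 1)]  # the last pair matches no row
conn.executemany("INSERT INTO mytable_operation (col1, col2) VALUES (?, ?)", targets)

cur = conn.execute(
    """
    UPDATE mytable SET foo = 'bar'
    WHERE EXISTS (SELECT 1 FROM mytable_operation AS mo
                  WHERE mytable.col1 = mo.col1 AND mytable.col2 = mo.col2)
    """
)
updated = cur.rowcount
print(updated)  # 3

# Clean up the helper table once the data manipulation is done.
conn.execute("DROP TABLE mytable_operation")
```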
I think theoretically the second one will be faster, but it's merely an educated guess.
It's one statement, so it might need less parsing, less initialization and fewer round trips.
The first one might be faster because it uses the index to find each record. But 10,000 index lookups will probably be slower than a full table scan, so I think that advantage disappears too.
One reason multiple statements can be slower is when you run the separate updates in a loop from a program. In that case, you pay the request/response overhead 10,000 times.
As long as you don't do that and instead send the update(s) as a single request, I think performance will be similar, and any differences will depend on the hardware and configuration of the server, the current load, the amount of data in the table, and the number of rows that you update (or will it always be exactly 10,000?). All in all, I can't give you an exact answer, but I hope I've given you insight into some of the factors that affect this performance.
Use a NoSQL engine for the fastest results when inserting/updating.
If you want to see how your index is being used, replace the UPDATE with a SELECT that has the same WHERE clause and run it with EXPLAIN. This should give you an idea of whether the index is used the way you want.
And don't forget that an index will normally slow down any UPDATE/INSERT; it's useful in the WHERE clause only.
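The "turn the UPDATE into a SELECT and EXPLAIN it" workflow, sketched with Python's sqlite3 module. SQLite's equivalent is EXPLAIN QUERY PLAN, whose output format differs from MySQL's EXPLAIN (an assumption worth keeping in mind: only the workflow carries over, not the exact plan text). The index name myindex_UNIQUE mirrors the one in the question; the table is empty and hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER, foo TEXT)")
conn.execute("CREATE UNIQUE INDEX myindex_UNIQUE ON mytable (col1, col2)")

# Same WHERE clause as the UPDATE, but as a SELECT so its plan can be inspected.
query = "SELECT * FROM mytable WHERE col1 = ? AND col2 = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (813242, 25343)).fetchall()
for row in plan:
    print(row[-1])  # the last column is the human-readable plan detail
```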
I know this works, but is it good practice?
update test set field1=field1+field2, field2=0;
I don't want field2 to be set to 0 until the value has been added to the field1 total. Will this always work as planned?
It would work if MySQL followed the standard, but in this case it doesn't. Note: in standard ISO/ANSI SQL, these statements are equivalent and produce the same changes in the table. (Anyone can try them in SQL Server or Oracle. The third variation works only in Postgres, I think.)
UPDATE test
SET field1 = field1 + field2, field2 = 0 ;
UPDATE test
SET field2 = 0, field1 = field1 + field2 ;
UPDATE test
SET (field1, field2) = (field1 + field2, 0) ;
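The order-independence of the first two variations can be seen in SQLite, which, like the standard, evaluates every right-hand side against the row's original values before assigning anything. A minimal sketch with Python's sqlite3 module (assumption: SQLite stands in for a standard-conforming database; the starting values 10 and 5 are arbitrary). In MySQL, the second statement would instead leave field1 at 10, because field2 is already 0 when field1 + field2 is evaluated.

```python
import sqlite3


def run(update_sql):
    """Apply one UPDATE to a fresh single-row table and return the resulting row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (field1 INTEGER, field2 INTEGER)")
    conn.execute("INSERT INTO test VALUES (10, 5)")
    conn.execute(update_sql)
    return conn.execute("SELECT field1, field2 FROM test").fetchone()


a = run("UPDATE test SET field1 = field1 + field2, field2 = 0")
b = run("UPDATE test SET field2 = 0, field1 = field1 + field2")
print(a, b)  # (15, 0) (15, 0): both assignments read the original field2
```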
So, in MySQL, your statement will work as you expect most of the time, as the MySQL documentation states:
Single-table UPDATE assignments are generally evaluated from left to right. For multiple-table updates, there is no guarantee that assignments are carried out in any particular order.
Note the "generally" though. To be 100% sure, you can do the update in 2 statements (inside a transaction):
UPDATE test
SET field1 = field1 + field2 ;
UPDATE test
SET field2 = 0 ;
or using a temporary table.
Found it in the manual (http://dev.mysql.com/doc/refman/5.0/en/update.html):
UPDATE t1 SET col1 = col1 + 1, col2 = col1;
Single-table UPDATE assignments are generally evaluated from left to right. For multiple-table updates, there is no guarantee that assignments are carried out in any particular order.
So it will always work.
MySQL has this incredibly useful yet proprietary REPLACE INTO SQL command.
Can this easily be emulated in SQL Server 2005?
Starting a new transaction, doing a SELECT and then either an UPDATE or an INSERT followed by a COMMIT is always a little bit of a pain, especially when doing it in the application and therefore always keeping two versions of the statement.
I wonder if there is an easy and universal way to implement such a function into SQL Server 2005?
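Before emulating REPLACE INTO, it helps to be precise about what it does: when a new row collides with an existing one on a PRIMARY KEY or UNIQUE index, the old row is deleted and the new row inserted, rather than updated in place, so columns you don't list fall back to their defaults. SQLite implements the same REPLACE semantics, so Python's sqlite3 module can serve as a stand-in here (assumption: table t and its columns are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)")

# No row with id 7 yet: behaves like a plain INSERT.
conn.execute("REPLACE INTO t (id, field1, field2) VALUES (7, 'a', 'b')")
# Conflict on id 7: the old row is deleted and a new one inserted,
# so field2 (not listed) reverts to its default, NULL.
conn.execute("REPLACE INTO t (id, field1) VALUES (7, 'new')")

print(conn.execute("SELECT id, field1, field2 FROM t").fetchall())  # [(7, 'new', None)]
```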
This is something that annoys me about MSSQL (rant on my blog). I wish MSSQL supported upsert.
@Dillie-O's code is a good way in older SQL versions (+1 vote), but it is still basically two IO operations (the EXISTS and then the UPDATE or INSERT).
There's a slightly better way in this post; basically:
--try an update
update tablename
set field1 = 'new value',
field2 = 'different value',
...
where idfield = 7
--insert if failed
if @@rowcount = 0 and @@error = 0
insert into tablename
( idfield, field1, field2, ... )
values ( 7, 'value one', 'another value', ... )
This reduces it to one IO operation if it's an update, or two if it's an insert.
SQL Server 2008 introduces MERGE from the SQL:2003 standard:
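The same try-UPDATE-then-INSERT idea, sketched with Python's sqlite3 module, with cursor.rowcount playing the role of @@rowcount. The tablename, idfield, field1 and field2 names come from the T-SQL above; the upsert helper itself is hypothetical. Like the T-SQL, on its own this is not safe against two sessions inserting the same idfield at the same moment.

```python
import sqlite3


def upsert(conn, idfield, field1, field2):
    """Hypothetical helper: UPDATE first, INSERT only if no row matched."""
    with conn:  # commit on success, roll back if anything raises
        cur = conn.execute(
            "UPDATE tablename SET field1 = ?, field2 = ? WHERE idfield = ?",
            (field1, field2, idfield),
        )
        if cur.rowcount == 0:  # same role as @@rowcount = 0
            conn.execute(
                "INSERT INTO tablename (idfield, field1, field2) VALUES (?, ?, ?)",
                (idfield, field1, field2),
            )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (idfield INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)"
)
upsert(conn, 7, "value one", "another value")  # no row yet: INSERT
upsert(conn, 7, "new value", "different value")  # row exists: UPDATE
print(conn.execute("SELECT * FROM tablename").fetchall())
# [(7, 'new value', 'different value')]
```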
merge tablename as target
using (values ('new value', 'different value'))
as source (field1, field2)
on target.idfield = 7
when matched then
update
set field1 = source.field1,
field2 = source.field2,
...
when not matched then
insert ( idfield, field1, field2, ... )
values ( 7, source.field1, source.field2, ... );
Now it's really just one IO operation, but awful code :-(
The functionality you're looking for is traditionally called an UPSERT. At least knowing what it's called might help you find what you're looking for.
I don't think SQL Server 2005 has any great ways of doing this. 2008 introduces the MERGE statement that can be used to accomplish this as shown in: http://www.databasejournal.com/features/mssql/article.php/3739131 or http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx
MERGE was available in the beta of 2005, but they removed it from the final release.
What the upsert/merge is doing is something to the effect of...
IF EXISTS (SELECT * FROM [Table] WHERE Id = X)
UPDATE [Table] SET...
ELSE
INSERT INTO [Table]
So hopefully the combination of those articles and this pseudo code can get things moving.
I wrote a blog post about this issue.
The bottom line is that if you want cheap updates and want to be safe for concurrent usage, try:
update t
set hitCount = hitCount + 1
where pk = @id
if @@rowcount < 1
begin
begin tran
update t with (serializable)
set hitCount = hitCount + 1
where pk = @id
if @@rowcount = 0
begin
insert t (pk, hitCount)
values (@id,1)
end
commit tran
end
This way you have 1 operation for updates and a max of 3 operations for inserts. So, if you are generally updating, this is a safe cheap option.
I would also be very careful not to use anything that is unsafe for concurrent usage. It's really easy to get primary key violations or duplicate rows in production.