Where to use ROWLOCK, READPAST with CTE, Subquery and Update? - sql-server-2008

In trying to avoid deadlocks and synchronize requests from multiple services, I'm using ROWLOCK, READPAST. My question is where I should put it in a query that includes a CTE, a subquery, and an update statement on the CTE (below). Is there one key spot, or should all three places have it? Or maybe there's a better way to write such a query so that I can select ONLY the rows that will be updated.
alter proc dbo.Notification_DequeueJob
    @jobs int = null
as
set nocount on;
set xact_abort on;

declare @now datetime
set @now = getdate();

if (@jobs is null or @jobs <= 0) set @jobs = 1

;with q as (
    select
        *,
        dense_rank() over (order by MinDate, Destination) as dr
    from
    (
        select *,
            min(CreatedDt) over (partition by Destination) as MinDate
        from dbo.NotificationJob with (rowlock, readpast)
    ) nj
    where (nj.QueuedDt is null or (DATEDIFF(MINUTE, nj.QueuedDt, @now) > 5 and nj.CompletedDt is null))
      and (nj.RetryDt is null or nj.RetryDt < @now)
      and not exists (
          select * from dbo.NotificationJob
          where Destination = nj.Destination
            and nj.QueuedDt is not null and DATEDIFF(MINUTE, nj.QueuedDt, @now) < 6 and nj.CompletedDt is null)
)
update t
set t.QueuedDt = @now,
    t.RetryDt = null
output
    inserted.NotificationJobId,
    inserted.Categories,
    inserted.Source,
    inserted.Destination,
    inserted.Subject,
    inserted.Message
from q as t
where t.dr <= @jobs
go

I don't have an answer off-hand, but there are ways you can learn more.
The code you wrote seems reasonable. Examining the actual execution plan for the proc might help you verify that SQL Server can generate a sensible plan for it, too.
If you don't have an index on NotificationJob.Destination that includes QueuedDt and CompletedDt, the not exists sub-query might acquire shared locks on the entire table. That would be scary for concurrency.
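For example, a covering index along these lines might help (a sketch only; the index name is made up, and the key/include choice assumes the access pattern of the not-exists probe above):

-- Sketch: supports the not-exists probe on Destination and covers the filtered columns
create nonclustered index IX_NotificationJob_Destination
on dbo.NotificationJob (Destination)
include (QueuedDt, CompletedDt);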
You can observe how the proc behaves when it acquires locks. One way is to turn on trace flag 1200 temporarily, call your proc, and then turn off the flag. This will generate a lot of information about what locks the proc is acquiring. The amount of info will severely affect performance, so don't use this flag in a production system.
dbcc traceon (1200, -1) -- print detailed information for every lock request. DO NOT DO THIS ON A PRODUCTION SYSTEM!
exec dbo.Notification_DequeueJob
dbcc traceoff (1200, -1) -- turn off the trace flag ASAP
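Another way to observe the locks, without the trace flag, is to hold them open inside a transaction and query sys.dm_tran_locks from a second session (a sketch; test environment only, and substitute the first session's actual spid):

-- Session 1: hold the proc's locks until rollback
begin transaction;
exec dbo.Notification_DequeueJob;

-- Session 2: summarize the locks held by session 1 (replace 53 with its spid)
select resource_type, request_mode, request_status, count(*) as locks
from sys.dm_tran_locks
where request_session_id = 53
group by resource_type, request_mode, request_status;

-- Session 1: release everything and undo the test dequeue
rollback;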

Related

Understanding how to design MySQL indexes for good performance

Through trial and error, I've arrived at a good index for this query, but I'd really like to understand why this and only this index helps, and how to avoid having to repeat the trial and error next time.
The table is an InnoDB log table (its structure is omitted here).
This is my query: it's looking for all users who have one kind of action in the log, but not another kind of action. It's also restricted to certain values of org and a certain date range.
SELECT DISTINCT USER AS 'Dormant Users'
FROM db.log
WHERE `action` = @a1
  AND `org` = @orgid
  AND `logdate` >= @startdate
  AND USER NOT IN (SELECT DISTINCT USER
                   FROM db.log
                   WHERE `action` = @a2
                     AND `org` = @orgid
                     AND `logdate` >= @startdate)
;
With no indexes, this takes about 21 seconds, with EXPLAIN showing a full scan of the log table.
So, I thought having an index on org, logdate, and action might help. And it does: if I create an index on those columns in that precise order, the query time is reduced to about 0.3s, and EXPLAIN shows the new index being used.
But, if I change the order of the columns within the index, or even just add another, unrelated index (say on the user column), the query takes about 2 seconds.
So, how can I understand and even design the index to perform well based on that query, and avoid the rather degenerate case of adding another index and harming performance? Or is it just a case of test and see what works?
My answer is not a direct answer, because it is not about how to design the index but about how to write the query to make it more efficient.
Avoid using NOT IN when the subquery is not against a small table:
SELECT DISTINCT l1.USER AS 'Dormant Users'
FROM db.log l1
WHERE `action` = @a1
  AND `org` = @orgid
  AND `logdate` >= @startdate
  AND NOT EXISTS (SELECT 1
                  FROM db.log l2
                  WHERE l1.`user` = l2.`user`
                    AND l1.`org` = l2.`org`
                    AND l2.`action` = @a2
                    AND l2.`logdate` >= @startdate)
;
EDIT: I removed the explanation link, as it was not what I thought. I am a developer rather than a DBA, but I have optimized a lot of queries, and I have always had better results with NOT EXISTS than with NOT IN when volumes get high. I cannot argue the internal reason, though, and I suspect it depends on the RDBMS.
...or with an outer join...
SELECT DISTINCT x.user
FROM log x
LEFT
JOIN log y
  ON y.user = x.user
 AND y.org = x.org
 AND y.action = @a2
 AND y.logdate >= @startdate
WHERE x.action = @a1
  AND x.org = @orgid
  AND x.logdate >= @startdate
  AND y.user IS NULL;
I'm not too hot on indexing, but I'd start with (org, action, logdate): equality predicates first, then the range predicate on logdate.
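For reference, both candidate indexes would be created like this (a sketch, using the table and column names from the question; the index names are made up):

CREATE INDEX idx_log_org_logdate_action ON db.log (org, logdate, action); -- the asker's effective order
CREATE INDEX idx_log_org_action_logdate ON db.log (org, action, logdate); -- the order suggested above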

How to Convert/Migrate MS-SQL Server SELECT Query To Oracle & MySQL?

In our product we are extending support to Oracle and MySQL. Can anyone please help migrate the following sample SQL query, which works fine with MS-SQL Server? I have already tried, but somehow it's not working for Oracle/MySQL. Any help is much appreciated, and I will convert the rest of the queries myself. Thank you.
SELECT A.SERVERID, A.DATAID,
       A.CREATETIMESTAMP AS 'Date Time',
       A.OBJECTINSTNAME,
       A.PROJECTNAME,
       TEMP_IND_1.TEMP_ROW_NUM
FROM DATALOG AS A WITH (NOLOCK)
INNER JOIN
(
    SELECT DATAID, ROW_NUMBER() OVER (ORDER BY CREATETIMESTAMP DESC) AS TEMP_ROW_NUM
    FROM DATALOG WITH (NOLOCK)
    WHERE PROJECTNAME = 'ProjectA'
) AS TEMP_IND_1 ON A.DATAID = TEMP_IND_1.DATAID
WHERE TEMP_IND_1.TEMP_ROW_NUM BETWEEN 1 AND 50;
You can use the same query after removing the WITH (NOLOCK) parts; they have no effect in Oracle and you don't need them there. Also, Oracle does not allow the AS keyword before a table alias, and a quoted column alias must use double quotes rather than single quotes. So your query becomes:
SELECT A.SERVERID, A.DATAID,
       A.CREATETIMESTAMP "Date Time",
       A.OBJECTINSTNAME,
       A.PROJECTNAME,
       TEMP_IND_1.TEMP_ROW_NUM
FROM DATALOG A
INNER JOIN
(
    SELECT DATAID,
           ROW_NUMBER() OVER (ORDER BY CREATETIMESTAMP DESC) TEMP_ROW_NUM
    FROM DATALOG
    WHERE PROJECTNAME = 'ProjectA'
) TEMP_IND_1 ON A.DATAID = TEMP_IND_1.DATAID
WHERE TEMP_IND_1.TEMP_ROW_NUM BETWEEN 1 AND 50;
EDIT:
For MySQL, the only thing you need to do is set the session isolation level to READ UNCOMMITTED before executing your original query (without the WITH (NOLOCK) expressions):
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- your query without the NOLOCK expressions
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- set back to the original isolation level
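One more caveat for MySQL: ROW_NUMBER() is only available from MySQL 8.0 onward. On older versions you would have to emulate it, for example with a user variable along these lines (a sketch only; it relies on user-variable evaluation order and derived-table ordering, neither of which MySQL formally guarantees):

SELECT A.SERVERID, A.DATAID,
       A.CREATETIMESTAMP AS `Date Time`,
       A.OBJECTINSTNAME,
       A.PROJECTNAME,
       TEMP_IND_1.TEMP_ROW_NUM
FROM DATALOG A
INNER JOIN
(
    SELECT DATAID, (@rn := @rn + 1) AS TEMP_ROW_NUM
    FROM
    (
        SELECT DATAID
        FROM DATALOG
        WHERE PROJECTNAME = 'ProjectA'
        ORDER BY CREATETIMESTAMP DESC
    ) ordered, (SELECT @rn := 0) init
) TEMP_IND_1 ON A.DATAID = TEMP_IND_1.DATAID
WHERE TEMP_IND_1.TEMP_ROW_NUM BETWEEN 1 AND 50;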

MySQL: bulk updating in table

I'm using MySQL 5.6 and I have this issue.
I'm trying to improve my bulk update strategy for this case.
I have a table called reserved_ids, provided by an external company, that assigns unique IDs to its invoices. There is no other way to do this; I can't use auto_increment fields or simulated sequences.
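For context, the table looks something like this (a hypothetical sketch, inferred only from the columns the procedure below uses):

-- Hypothetical DDL, inferred from the procedure below
CREATE TABLE RESERVED_IDS (
    SECUENCIAL   BIGINT  NOT NULL,
    COUNTRY_CODE CHAR(2) NOT NULL,
    INVOICE_TYPE INT     NOT NULL,
    PRIMARY KEY (SECUENCIAL)
);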
I have this stored-procedure pseudocode that makes the assignment:
START TRANSACTION;

OPEN invoice_cursor;
read_loop: LOOP
    FETCH invoice_cursor INTO internalID;
    IF done THEN
        LEAVE read_loop;
    END IF;

    SELECT MIN(SECUENCIAL)
      INTO v_secuencial
      FROM RESERVED_IDS
     WHERE COUNTRY_CODE = p_country_id AND INVOICE_TYPE = p_invoice_type;

    DELETE FROM RESERVED_IDS WHERE SECUENCIAL = v_secuencial;

    UPDATE MY_INVOICE SET RESERVED_ID = v_secuencial WHERE INVOICE_ID = internalID;
END LOOP read_loop;
CLOSE invoice_cursor;

COMMIT;
So it's take one, remove, assign; then take the next, remove, assign; and so on.
This works, but it's very, very slow.
I don't know if there is an approach that makes this assignment faster.
I'm looking for something like INSERT INTO ... SELECT, but with an UPDATE statement, so I can assign 1000 or 2000 IDs at once rather than one by one.
Any suggestion is very helpful to me.
Thanks a lot.
EDIT 1: I have added WHERE clause details, because user @vmachan requested them. In the UPDATE ... INVOICE statement I don't filter by any other criteria, because I have the direct and indexed invoice ID that I want to update. Thanks.
Finally, I have this solution. It's much faster than my initial approach.
The UPDATE query is
set @a=0;
set @b=0;

UPDATE MY_INVOICE
INNER JOIN
(
    select
        F.internal_id,
        I.secuencial as RESERVED_ID,
        CONCAT_WS(/* format your final invoice ID */) AS FINAL_MY_INVOICE_NUMBER
    FROM
    (
        select if(@a, @a:=@a+1, @a:=1) as current_row, internal_id
        from MY_INVOICE
        where reserved_id is null
        order by internal_id asc
        limit 2000
    ) F
    INNER JOIN
    (
        SELECT if(@b, @b:=@b+1, @b:=1) as current_row, secuencial
        from reserved_ids
        order by secuencial asc
        limit 2000
    ) I USING (current_row)
) TEMP ON MY_INVOICE.internal_id = TEMP.internal_id
SET MY_INVOICE.RESERVED_ID = TEMP.RESERVED_ID,
    MY_INVOICE.FINAL_MY_INVOICE_NUMBER = TEMP.FINAL_MY_INVOICE_NUMBER
So, with the autogenerated and correlated sequence numbers @a and @b, we can join two otherwise unrelated tables like MY_INVOICE and RESERVED_IDS.
If you want to check this solution, please execute this tricky update following these steps:
Execute set @a=0 and then the first inner select on its own: select if(@a, @a:=@a+1, ...
Execute set @b=0 and then the second inner select on its own: select if(@b, @b:=@b+1, ...
Execute @a, @b and the big select that builds the TEMP auxiliary table: select F.internal_id, ...
Execute the UPDATE.
Finally, remove the assigned IDs from the RESERVED_IDS table, as sketched below.
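That last cleanup step could look something like this (a sketch, assuming SECUENCIAL values are unique across the table; otherwise also match on COUNTRY_CODE and INVOICE_TYPE):

-- Sketch: delete the reserved IDs that have just been consumed
DELETE r
FROM RESERVED_IDS r
INNER JOIN MY_INVOICE i ON i.RESERVED_ID = r.SECUENCIAL;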
Assignment time is reduced drastically. My initial solution went one by one; with this one, you assign 2000 (or more) IDs in one single (OK, slightly tricky) update.
Hope this helps.

Magento: Custom MySQL query with locks not working

I'm trying to write a function to SELECT the least-recently fetched value from a table in my database. I do this by SELECTing a row and then immediately changing the last_used field.
Because this involves a SELECT and UPDATE, I'm trying to do this with locks. The locks are to ensure that concurrent executions of this query won't operate on the same row.
The query runs perfectly fine in phpMyAdmin, but fails in Magento. I get the following error:
SQLSTATE[HY000]: General error
Error occurs here:
#0 /var/www/virtual/magentodev.com/htdocs/lib/Varien/Db/Adapter/Pdo/Mysql.php(249): PDOStatement->fetch(2)
Here is my model's code, including the SQL query:
$write = Mage::getSingleton('core/resource')->getConnection('core_write');
$sql = "LOCK TABLES mytable AS mytable_write WRITE, mytable AS mytable_read READ;
SELECT #val := unique_field_to_grab FROM mytable AS mytable_read ORDER BY last_used ASC LIMIT 1;
UPDATE mytable AS mytable_write SET last_used = unix_timestamp() WHERE unique_field_to_grab = #val LIMIT 1;
UNLOCK TABLES;
SELECT #val AS val;";
$result = $write->raw_fetchrow($sql, 'val');
I've also tried using raw_query and query instead of raw_fetchrow with no luck.
Any thoughts on why this doesn't work? Or is there a better way to accomplish this?
EDIT: I'm starting to think this may be related to the PDO driver, which Magento is definitely using. I think phpMyAdmin is using mysqli, but I can't confirm that.
Probably a function that Magento uses doesn't support multiple SQL statements in one call (PDO generally doesn't by default).
Call each statement separately on the same connection, so the table locks and the @val user variable carry over. For example, using the adapter's query() and fetchOne() methods:
$write->query("LOCK TABLES mytable AS mytable_write WRITE, mytable AS mytable_read READ");
$write->query("SELECT @val := unique_field_to_grab FROM mytable AS mytable_read ORDER BY last_used ASC LIMIT 1");
$write->query("UPDATE mytable AS mytable_write SET last_used = unix_timestamp() WHERE unique_field_to_grab = @val LIMIT 1");
$write->query("UNLOCK TABLES");
$val = $write->fetchOne("SELECT @val AS val");

Correct way to generate order numbers in SQL Server

This question certainly applies to a much broader scope, but here it is.
I have a basic ecommerce app, where users can, naturally enough, place orders. Said orders need to have a unique number, which I'm trying to generate right now.
Each order is Vendor-specific. Basically, I have an OrderNumberInfo (VendorID, OrderNumber) table. Now, whenever a customer places an order, I need to increment OrderNumber for a particular Vendor and return that value. Naturally, I don't want other processes to interfere, so I need to exclusively lock this row somehow:
begin transaction

declare @n int

select @n = OrderNumber
from OrderNumberInfo
where VendorID = @vendorID

update OrderNumberInfo
set OrderNumber = @n + 1
where OrderNumber = @n and VendorID = @vendorID

commit transaction
Now, I've read about select ... with (updlock rowlock), pessimistic locking, etc., but just cannot fit all this in a coherent picture:
How do these hints play with SQL Server 2008's snapshot isolation?
Do they perform row-level, page-level or even table-level locks?
How does this tolerate multiple users trying to generate numbers for a single Vendor?
What isolation levels are appropriate here?
And generally - what is the way to do such things?
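For concreteness, the hinted variant I've been reading about would look something like this (a sketch only; whether it actually behaves well is part of what I'm asking):

begin transaction
declare @n int
select @n = OrderNumber
from OrderNumberInfo with (updlock, rowlock)
where VendorID = @vendorID
update OrderNumberInfo
set OrderNumber = @n + 1
where VendorID = @vendorID
commit transaction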
EDIT
Just to make few things clearer:
Performance in this particular corner of the app is absolutely not an issue: orders will be placed relatively infrequently and will involve an expensive call to the vendor's web service, so a 1-second delay is pretty tolerable
We really need to have each vendors' order numbers to be independent and sequential
Your solution will create a potential performance bottleneck on the OrderNumberInfo table.
Is there any specific reason why the order numbers can't simply be an identity column, possibly prefixed with a vendor ID on the application side (e.g. MSFT-232323)?
The only drawback of this approach is that per-vendor order numbers will not follow an "add 1 to get the next order number" pattern, but I'm not aware of any technical or business consideration why that would present a problem, though it might make in-sequence order processing slightly more complicated.
They'd still be incrementing and unique per vendor, which is the only real requirement for an order ID.
It will, of course, have the added side benefit of making vendor-independent logic (assuming you ever have any) very easy, such as application-wide QC/reporting.
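A minimal sketch of that alternative (the table and column names here are made up for illustration):

-- Sketch: one identity across all vendors; the display number is composed in the application
create table Orders
(
    OrderID  int identity(1,1) primary key,
    VendorID varchar(10) not null
    -- ...other order columns...
);
-- Application side: display number = VendorID + '-' + CAST(OrderID AS varchar(10))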
You could use an OUTPUT clause. This should do it all atomically without requiring a transaction.
-- either return the order number directly as a single-column resultset
UPDATE OrderNumberInfo
SET OrderNumber = OrderNumber + 1
OUTPUT DELETED.OrderNumber
WHERE VendorID = @vendorID

-- or use an intermediate table variable to get the order number into @n
DECLARE @n INT
DECLARE @temp TABLE ( OrderNumber INT )

UPDATE OrderNumberInfo
SET OrderNumber = OrderNumber + 1
OUTPUT DELETED.OrderNumber
INTO @temp ( OrderNumber )
WHERE VendorID = @vendorID

SET @n = (SELECT TOP 1 OrderNumber FROM @temp)
The examples above assume that the VendorID column has a unique constraint, or at the very least that there'll only be one row per vendor ID. If that's not the case then you'll potentially be updating and/or returning multiple rows, which doesn't seem like a good idea!
I normally use something like this:
update OrderNumberInfo with (rowlock)
set @OrderNumber = OrderNumber, OrderNumber = OrderNumber + 1
where VendorID = @VendorID
It does not need to be wrapped in a transaction. In fact if you do wrap it in a transaction, then SQL Server will start holding locks on the table and slow everything down. When I need to do things like this in a web service, I always execute it on a separate database connection outside any transaction that might be open at the time, just to make sure.
I believe (but have not proved) that SQL Server uses a latch rather than a transaction to make it atomic, which should be more efficient.
If your table design is such that the vendor row needs to be created on demand if it doesn't exist, then use this logic instead:
declare @error int, @rowcount int

-- Attempt to read and update the number.
update OrderNumberInfo with (rowlock)
set @OrderNumber = OrderNumber, OrderNumber = OrderNumber + 1
where VendorID = @VendorID

select @error = @@error, @rowcount = @@rowcount
if @error <> 0 begin
    return @error
end

-- If the update succeeded then exit now.
if @rowcount > 0 begin
    return 0
end

-- Insert the row if it doesn't exist yet.
insert into OrderNumberInfo (VendorID, OrderNumber)
select @VendorID, 1
where not exists (select null from OrderNumberInfo where VendorID = @VendorID)

select @error = @@error
if @error <> 0 begin
    return @error
end

-- Attempt to read and update the number.
update OrderNumberInfo with (rowlock)
set @OrderNumber = OrderNumber, OrderNumber = OrderNumber + 1
where VendorID = @VendorID

select @error = @@error
if @error <> 0 begin
    return @error
end
This code still doesn't require a transaction, because each atomic statement will succeed regardless of how many other connections are executing the code simultaneously.
Disclaimer: I have used this without problems on SQL Server 7-2005. I cannot yet comment on its behaviour in 2008.
The way to do this while maintaining consistency:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION

declare @n int

select @n = OrderNumber
from OrderNumberInfo
where VendorID = @vendorID

update OrderNumberInfo
set OrderNumber = @n + 1
where OrderNumber = @n and VendorID = @vendorID

COMMIT TRANSACTION
This will use the strictest form of isolation and prevents lost updates. Note, though, that under SERIALIZABLE two concurrent callers can both acquire shared locks on the select and then deadlock on the update, so be prepared to retry (or add an updlock hint to the select).
Here it is; this numbers every row of the table sequentially in a single statement:
declare @C int = 0;
-- assigns the current value of @C to Code, then increments @C, row by row
update Table set Code = @C, @C = @C + 1;