Is there any way to debug a SQL Server 2008 query?
Yes. You can use the T-SQL debugger:
http://technet.microsoft.com/en-us/library/cc646008.aspx
Here is how you step through a query: http://technet.microsoft.com/en-us/library/cc646018.aspx
What exactly do you mean by debug?
Are you seeing incorrect data?
Are you getting unexpected duplicate rows?
What I usually do is start with a known set of data, usually one or two rows if possible, and comment out all joins and where conditions.
Then introduce each additional element of your query one at a time, starting with the joins.
At each step, you should know how many records you are expecting.
As soon as you introduce something like a join or where condition that does not match your prediction, you know you have found the problem statement.
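For example, a minimal sketch of the incremental approach (the Orders/OrderLines tables are purely illustrative): keep the joins and filters commented out, then uncomment them one at a time, re-running and comparing the count with your prediction after each step.
SELECT COUNT(*)
FROM dbo.Orders o
-- JOIN dbo.OrderLines ol          -- step 1: uncomment, re-run, compare count
--     ON ol.OrderId = o.OrderId
-- WHERE o.Processed = 0           -- step 2: uncomment next, compare again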
If it is a stored procedure with variables and such, you can always PRINT the values of your variables at different points.
If you want to only execute to a particular point in a stored procedure, then you may RETURN at any time and halt processing.
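A minimal sketch of both techniques (the table and variable names are illustrative only):
DECLARE @OrderCount int;
SELECT @OrderCount = COUNT(*) FROM dbo.Orders WHERE Processed = 0;
PRINT 'Unprocessed orders: ' + CAST(@OrderCount AS varchar(10));
IF @OrderCount = 0
    RETURN;  -- halt here while debugging the rest of the procedure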
If you have temp tables that must get destroyed between executions while debugging your procedure, a common trick I use is to create a label like
cleanup:
then at whatever point I want to bail, I can GOTO cleanup (I know GOTO is horrible, but it works great when debugging sprocs).
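Here is a minimal sketch of that trick (the @debug flag and #work table are illustrative):
IF @debug = 1 GOTO cleanup;  -- bail out at whatever point you like

-- ... rest of the procedure ...

cleanup:
IF OBJECT_ID('tempdb..#work') IS NOT NULL
    DROP TABLE #work;  -- destroy temp tables so the next debug run starts clean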
Yes, you need to set a breakpoint.
Frankly, I find the debugger to be virtually useless. I often don't want to see the variables, but rather the records I would be inserting into a table, updating, or deleting.
What I do when I have a complex SP to debug is this.
First I create a test variable and set it to 1 when I want to test. This ensures that all actions in the transaction are rolled back at the end (you don't want to actually change the database until you are sure the proc is doing what you want) by making the COMMIT statement require the test variable to be set to 0.
At the end of the proc I generally have an IF test = 1 BEGIN ... END block, and between the BEGIN and END I put the SELECT statements for all the things I want to see the values of. This might include any table variables or temp tables, the records in a particular table after the insert, the records I deleted, or whatever else I feel I need to see. If I am testing multiple times, I might comment out references to tables that I know are right and concentrate only on the ones I've changed this go-round.
Now I can see what the effect of my proc is, and the changes are rolled back in case they weren't right. To actually commit the changes (and not see the intermediate steps), I simply change the value of the test variable.
If I use dynamic SQL, I also have a debug variable that, instead of executing the dynamic SQL, simply prints it to the screen. I find all this far more useful in debugging a complex script than breakpoints that show me the value of variables.
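A minimal sketch of the whole pattern, assuming an illustrative Accounts table and a Staging table for the dynamic SQL part:
DECLARE @test bit = 1;    -- 1 = debug run (roll back), 0 = real run (commit)
DECLARE @debug bit = 1;   -- 1 = print dynamic SQL instead of executing it

BEGIN TRANSACTION;

UPDATE dbo.Accounts SET Balance = Balance - 10 WHERE AccountId = 1;

IF @test = 1
BEGIN
    -- inspect intermediate results, then undo everything
    SELECT * FROM dbo.Accounts WHERE AccountId = 1;
    ROLLBACK TRANSACTION;
END
ELSE
    COMMIT TRANSACTION;  -- only reached when @test = 0

DECLARE @sql nvarchar(max) = N'DELETE FROM dbo.Staging WHERE LoadId = 7';
IF @debug = 1
    PRINT @sql;          -- see what would run
ELSE
    EXEC sp_executesql @sql;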
When I write a procedure in SQL Server 2008, it always writes SET NOCOUNT ON.
I googled it and saw that it's used to suppress the "xx rows were affected" message, but why should I do it?
Is it for security reasons?
EDIT: OK, so I understand from the current answers that it's used mostly for performance and for consistency with the count on the client side...
So is there a reason not to use it? For example, if I want my client to be able to compare his count with mine?
I believe SET NOCOUNT ON is mostly used to avoid passing potentially misleading information back to the client. In a stored procedure, for example, your batch may contain several different statements, each with its own count of affected records, but you may want to pass back to the client just a single, perhaps completely different, number.
It's not for security, since a rowcount doesn't really divulge much info, especially compared to the data that is in the same payload.
If you call SQL from an application, the "xxx rows" messages will be returned to the application as part of the results, with network round trips in between before you get the data, which, as Mihai says, can have a performance impact.
Bottom line: it won't hurt to add it to your stored procedure; it is common practice, but you are not obligated to use it.
As per MSDN, SET NOCOUNT ON:
Stops the message that shows the count of the number of rows affected by a Transact-SQL statement or stored procedure from being returned as part of the result set.
When SET NOCOUNT is ON, the count is not returned. When SET NOCOUNT is OFF, the count is returned. The @@ROWCOUNT function is updated even when SET NOCOUNT is ON.
Another related good post on SO
SET NOCOUNT ON usage
Taken from SET NOCOUNT ON Improves SQL Server Stored Procedure Performance:
SET NOCOUNT ON turns off the messages that SQL Server sends back to the client after each T-SQL statement is executed. This is performed for all SELECT, INSERT, UPDATE, and DELETE statements. Having this information is handy when you run a T-SQL statement in a query window, but when stored procedures are run there is no need for this information to be passed back to the client.
By removing this extra overhead from the network it can greatly improve overall performance for your database and application.
If you still need to get the number of rows affected by the T-SQL statement that is executing, you can still use the @@ROWCOUNT option. By issuing a SET NOCOUNT ON, this function (@@ROWCOUNT) still works and can still be used in your stored procedures to identify how many rows were affected by the statement.
So is there a reason not to use it?
If you want to compare the count of rows affected, use @@ROWCOUNT instead.
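For instance, a minimal sketch (the procedure and table names are hypothetical) that suppresses the automatic count but still reports it explicitly:
CREATE PROCEDURE dbo.usp_DeactivateCustomer (@CustomerId int)
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.Customers
    SET IsActive = 0
    WHERE CustomerId = @CustomerId;

    -- @@ROWCOUNT is still maintained even with NOCOUNT ON
    SELECT @@ROWCOUNT AS RowsAffected;
END;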
It's not that I'm having trouble executing my cursors, which are enclosed in a stored procedure; I want to find a more efficient way to achieve the same result.
Here it goes.
Stored procedure : RawFeed.sql (runs every 5 minutes)
Set @GetATM = Cursor For
Select DeviceCode,ReceivedOn
From RawStatusFeed
Where CRWR=2 AND Processed=0
Order By ReceivedOn Desc
Open @GetATM
Fetch Next
From @GetATM Into @ATM,@ReceivedOn
While @@FETCH_STATUS = 0
Begin
Set @RawFeed=@ATM+' '+Convert(VarChar,@ReceivedOn,121)+' '+'002307'+' '+@ATM+' : Card Reader/Writer - FAULTY '
Exec usp_pushRawDataAndProcess 1,@RawFeed
Fetch Next
From @GetATM Into @ATM,@ReceivedOn
End
Set @GetATM = Cursor For
Select DeviceCode,ReceivedOn
From RawStatusFeed
Where CRWR=0 AND Processed=0
Order By ReceivedOn Desc
Open @GetATM
Fetch Next
From @GetATM Into @ATM,@ReceivedOn
While @@FETCH_STATUS = 0
Begin
Set @RawFeed=@ATM+' '+Convert(Varchar,@ReceivedOn,121)+' '+'002222'+' '+@ATM+' : Card Reader/Writer - OK '
Exec usp_pushRawDataAndProcess 1,@RawFeed
Fetch Next
From @GetATM Into @ATM,@ReceivedOn
End
Likewise, I have 10 more SET statements which differ only in the WHERE condition parameter and the string assigned to the @RawFeed variable.
For each row I get, I execute another stored procedure on that particular row.
My question is
Is there any better way to achieve the same without using cursors?
The variable @RawFeed contains a string like the following, which is the input to the usp_pushRawDataAndProcess stored procedure. That procedure splits the whole string and performs operations such as INSERT, UPDATE, and DELETE on some tables.
WE JUST CANNOT PROCESS MORE THAN ONE STRING PER CALL TO usp_pushRawDataAndProcess.
NMAAO226 2012-09-22 16:10:06.123 002073 NMAAO226 : Journal Printer - OK
WMUAO485 2012-09-22 16:10:06.123 002222 WMUAO485 : Card Reader/Writer - OK
SQL Server, like other relational databases, is designed to, and is pretty good at, working on sets of data.
Databases are not good at procedural code, where all the opportunities for optimization are obscured from the query processing engine.
Using RawStatusFeed to store a proprietary request string and then processing those strings one by one is going to be inefficient as database code. This might make the inserts very fast for the client, and that might be very important, but it comes at a cost.
If you break the request string down on insert, or better still, before insert via a specialised SP call, then you can store the required changes in some intermediate relational model rather than a list of strings. Then, every so often, you can process all the changes at once with one call to a stored procedure. Admittedly, it would probably make sense for that stored procedure to contain several query statements. However, with the right indexes and statistics the query processing engine will be able to make an efficient execution plan for this new stored procedure.
The exact details of how this should be achieved depend on the exact details of the RawStatusFeed table and the implementation of usp_pushRawDataAndProcess. Although this seems like a rewrite, I don't imagine the DeviceCode column is that complicated.
So, the short answer is certainly yes, but I'd need to know what usp_pushRawDataAndProcess does in detail.
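As an illustration only (assuming usp_pushRawDataAndProcess could be replaced by set-based logic writing to a hypothetical intermediate ProcessedFeed table), the two cursor loops above might collapse into a single statement along these lines:
INSERT INTO ProcessedFeed (RawFeed)  -- ProcessedFeed is hypothetical
SELECT DeviceCode + ' ' + CONVERT(varchar(23), ReceivedOn, 121) + ' '
     + CASE CRWR
           WHEN 2 THEN '002307 ' + DeviceCode + ' : Card Reader/Writer - FAULTY '
           WHEN 0 THEN '002222 ' + DeviceCode + ' : Card Reader/Writer - OK '
       END
FROM RawStatusFeed
WHERE CRWR IN (0, 2)
  AND Processed = 0;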
The signature of the usp_pushRawDataAndProcess SP is acting as a bottleneck.
If you can't change usp_pushRawDataAndProcess and won't create a set-based alternative, then you are stuck with the bottleneck.
So, rather than removing the bottleneck, you could take another tack: why not run more concurrent instances of the bottleneck to feed the data through?
If you are using SQL Server 2005 or above, you could use some CLR to run numerous instances of usp_pushRawDataAndProcess in parallel.
Here is a link to a project I used before to do something similar.
I had always disliked cursors because of their slow performance. However, I found I didn't fully understand the different types of cursors and that in certain instances, cursors are a viable solution.
When you have a business problem that can only be solved by processing one row at a time, then a cursor is appropriate.
So, to improve performance with the cursor, change the type of cursor you are using. Something I didn't know was that if you don't specify which type of cursor you are declaring, you get the dynamic optimistic type by default, which is the slowest for performance because it's doing lots of work under the hood. However, by declaring your cursor as a different type, say a static cursor, you can get very good performance.
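For example, a sketch of declaring the cursor type explicitly, reusing the query from the question above (STATIC READ_ONLY chosen purely for illustration):
DECLARE GetATM CURSOR STATIC READ_ONLY FOR
    SELECT DeviceCode, ReceivedOn
    FROM RawStatusFeed
    WHERE CRWR = 2 AND Processed = 0
    ORDER BY ReceivedOn DESC;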
See these articles for a fuller explanation:
The Truth About Cursors: Part I
The Truth About Cursors: Part II
The Truth About Cursors: Part III
I think the biggest con against cursors is performance; however, not laying out a task in a set-based approach would probably rank second. Third would be readability and layout of the tasks, as they usually don't have a lot of helpful comments.
The best alternative to a cursor I've found is to rework the logic to take a set based approach.
SQL Server is optimized to run the set-based approach. You write the query to return a result set of data, like a join on tables for example, but the SQL Server execution engine determines which join to use: Merge Join, Nested Loop Join, or Hash Join. SQL Server determines the best possible join algorithm based upon the participating columns, data volume, indexing structure, and the set of values in the participating columns. So it is generally the best approach, performance-wise, over the procedural cursor approach.
Here is an article on Cursors and how to avoid them. It also discusses the alternatives to cursors.
Alternatives to CURSOR in SQL Server:
1. WHILE loop
2. Recursive CTE
Alternatives to CURSOR in SQL Server:
1. Use a temp table; create an ID column as an identity column.
2. Use a WHILE loop to perform the operation (see the sketch below).
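A minimal sketch of that temp-table approach, reusing names from the question above (the #Feed table and loop variables are illustrative):
CREATE TABLE #Feed (
    Id int IDENTITY(1,1) PRIMARY KEY,
    DeviceCode varchar(20),
    ReceivedOn datetime
);

INSERT INTO #Feed (DeviceCode, ReceivedOn)
SELECT DeviceCode, ReceivedOn
FROM RawStatusFeed
WHERE CRWR = 2 AND Processed = 0;

DECLARE @Id int = 1;
DECLARE @MaxId int = (SELECT MAX(Id) FROM #Feed);
DECLARE @RawFeed varchar(200);

WHILE @Id <= @MaxId
BEGIN
    SELECT @RawFeed = DeviceCode + ' ' + CONVERT(varchar(23), ReceivedOn, 121)
                    + ' 002307 ' + DeviceCode + ' : Card Reader/Writer - FAULTY '
    FROM #Feed
    WHERE Id = @Id;

    EXEC usp_pushRawDataAndProcess 1, @RawFeed;
    SET @Id += 1;
END;

DROP TABLE #Feed;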
I've been trying to figure out what's wrong with a set of queries I've got and I'm just confused at this point.
It's supposed to be in a stored procedure which gets called by a GUI application.
There's only one "tiny" problem: it's first a simple UPDATE, then an INSERT using a SELECT with a subselect, and finally another UPDATE. Running these queries together by hand, I get a total execution time of 0.057s, not too shabby.
Now, I tried creating a stored procedure with these queries in it and five input variables. I ran this procedure, and on the first attempt it took 47.096s, with subsequent calls showing similar execution times (35 to 50s). Running the individual queries from MySQL Workbench still shows execution times of less than 0.1s.
There really isn't anything fancy about these queries, so why is the stored procedure taking an eternity to execute while the queries by themselves only take a fraction of a second? Is there some kind of MySQL peculiarity that I'm missing here?
Additional testing results:
It seems that if I run the queries in MySQL Workbench but use variables instead of just putting the values of the variables in the queries, they run just as slowly as the stored procedure. So I tried changing the stored procedure to use static values instead of variables, and suddenly it ran blazingly fast. Apparently, for some reason, using a variable makes it run extremely slowly (for example, the first UPDATE query goes from taking approximately 0.98s with three variables to 0.04-0.05s when I use the values of the variables directly in the query, regardless of whether it's in the stored procedure or run directly).
So, the problem isn't the stored procedure, it's something related to my use of variables (which is unavoidable).
I had the same problem. After researching for a while, I found out the problem was a collation issue when MySQL was comparing text.
TL;DR: the table was created with one collation while MySQL "thought" the variable was in another collation. Therefore, MySQL could not use the index intended for the query.
In my case, the table was created with (latin1, latin1_swedish_ci) collation. To make MySQL use the index, I had to change the WHERE clause in the stored procedure from
UPDATE ... WHERE mycolumn = myvariable
to
UPDATE ... WHERE mycolumn =
convert(myvariable using latin1) collate latin1_swedish_ci
After the change, the stored procedure looked something like this:
CREATE PROCEDURE foo.`bar`()
BEGIN
UPDATE mytable SET mycolumn1 = variable1
WHERE mycolumn2 =
convert(variable2 using latin1) collate latin1_swedish_ci;
END;
where (latin1, latin1_swedish_ci) is the same collation that my table was created with.
To check whether MySQL uses the index, you can change the stored procedure to run an EXPLAIN statement, as follows:
CREATE PROCEDURE foo.`bar`()
BEGIN
EXPLAIN SELECT * FROM mytable WHERE mycolumn2 = variable2;
END;
In my case, the explain result showed that no index was used during the execution of the query.
Note that MySQL may use the index when you run the query alone, but still won't use it for the same query inside a stored procedure, perhaps because it somehow sees the variable as having another collation.
More information on the collation issue can be found here:
http://lowleveldesign.wordpress.com/2013/07/19/diagnosing-collation-issue-mysql-stored-procedure/
Backup link:
http://www.codeproject.com/Articles/623272/Diagnosing-a-collation-issue-in-a-MySQL-stored-pro
I had a similar problem: running a MySQL routine was horribly slow. A colleague helped me figure it out.
The problem was that AUTOCOMMIT was true, so every INSERT INTO and SELECT was wrapped in its own complete transaction. I then ran my routine with
SET autocommit=0;
at the beginning and
SET autocommit=1;
at the end, and the performance went from nearly 500s to 4s.
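A minimal sketch of that pattern (my_slow_routine is a hypothetical stand-in for the actual routine):
SET autocommit = 0;      -- batch everything into one transaction
CALL my_slow_routine();
COMMIT;                  -- make the work permanent
SET autocommit = 1;      -- restore the default behavior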
Since I didn't want to waste too much time trying to figure out why using variables in my stored procedures made them extremely slow I decided to employ a fix some people would consider quite ugly. I simply executed each query directly from the data access layer of my application. Not the prettiest way to do it (since a lot of other stuff for this app uses stored procedures) but it works and now the user won't have to wait 40+ seconds for certain actions as they happen almost instantly.
So, not really a solution or explanation of what was happening, but at least it works.
Upvote for a very interesting and important question. I found this discussion of some of the reasons that a stored procedure might be slow. I'd be interested to see readers' reactions to it.
The main recommendation that I took from the interchange: it helps to add more indexes.
Something that we ran across today that makes procedures slow, even when they run very fast as direct queries, is having parameter (or, presumably, variable) names that are the same as column names. The short version is, don't use a parameter name that is the same as one of the columns in the query in which it will be used. For example, if you had a field called account_id and a parameter named the same, change it to something like in_account_id and your run time can go from multiple seconds to hundredths of a second.
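A sketch of the pitfall in MySQL (the accounts table and procedure are purely illustrative):
DELIMITER //
CREATE PROCEDURE get_account(IN in_account_id INT)
BEGIN
    -- Had the parameter been named account_id, the predicate below would
    -- effectively read "account_id = account_id", match every row, and
    -- bypass the index; the in_ prefix avoids that.
    SELECT *
    FROM accounts
    WHERE account_id = in_account_id;
END //
DELIMITER ;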
So I'm building a C program that connects to a MySQL database. Everything worked perfectly. Then, to save on the number of queries, I decided that I would like to execute 10 statements at a time. I set the CLIENT_MULTI_STATEMENTS flag on the connection and separated my statements with semicolons.
When I execute the first batch of 10 statements, it succeeds and mysql_real_query() returns a 0.
When I try the second batch though, it returns a "1" and doesn't work. Nowhere can I find what this "1" error code means, so I was hoping someone may have run into this problem before.
Please note that these are all UPDATE statements, and so I have no need for result sets, it's just several straight-up calls to mysql_real_query().
It's not clear from the documentation whether the errors this function can cause are returned or not, but it should be possible to obtain the actual error using mysql_error().
My guess is that you still have to loop through the result sets whether you're interested in them or not.
Are these prepared statements? If that's the case then you can't use CLIENT_MULTI_STATEMENTS.
Also, note (from http://dev.mysql.com/doc/refman/5.5/en/c-api-multiple-queries.html) that:
After handling the result from the first statement, it is necessary to check whether more results exist and process them in turn if so. To support multiple-result processing, the C API includes the mysql_more_results() and mysql_next_result() functions. These functions are used at the end of a loop that iterates as long as more results are available. Failure to process the result this way may result in a dropped connection to the server.
You have to walk over all the results, regardless of whether or not you care about the values.
Unfortunately I don't have a repro for my issue, but I thought I would try to describe it in case it sounds familiar to someone... I am using SSIS 2005, SP2.
My package has a package-scope user variable - let's call it user_var
first step in the control flow is an Execute SQL task which runs a stored procedure. All that SP does is insert a record in a SQL table (with an identity column) and then go back and get the max ID value. The Execute SQL task saves this output into user_var
the control flow then has a Data Flow Task - it goes and gets some source data, has a derived column which sets a column called run_id to user_var - and saves the data to a SQL destination
In most cases (this template is used for many packages, running every day) this all works great. All of the destination records created get set with a correct run_id.
However, in some cases, there is a set of the destination data that does not get run_id equal to user_var, but instead gets a value of 0 (0 is the default value for user_var).
I have 2 instances where this has happened, but I can't reproduce it. In both cases, it was just under 10,000 records that had run_id = 0. Since SSIS writes data out in 10,000-record blocks, this really makes me think that, for the first block of data written out, user_var was not yet set. Then, after that first block, run_id was set to a correct value for the rest of the data.
But control passed on to my data flow from the Execute SQL task - it would have seemed reasonable to me that it wouldn't go on until the SP has completed and user_var is set. Maybe it just runs the SP, but doesn't wait for it to complete?
In both cases where this has happened there seemed to be a few packages hitting the table to get a new user_var at about the same time. And in both cases lots of data was written (40 million rows, 60 million rows) - my thinking is that that means the writes were happening for a while.
Sorry to be both long-winded AND vague. A winning combination! Does this sound familiar to anyone? Thanks.
Updating to show the SP I use to get the user_var:
CREATE PROCEDURE [dbo].[sp_GetRunIDForPackage] (@pkg varchar(50)) AS
-- add a new entry for this run of this package - the RUN_ID is an IDENTITY column and so
-- will get created for us
INSERT INTO shared.STAGE_LOAD_JOB( EFFECTIVE_TS, EXECUTED_BY )
VALUES( getdate(), @pkg )
-- now go back into the table and get the new RUN_ID for this package
SELECT MAX( RUN_ID )
FROM shared.STAGE_LOAD_JOB
WHERE EXECUTED_BY = @pkg
Is this variable being accessed lots of times, from lots of places? Do you have a bunch of parallel data flows using the same variable?
We've encountered a bug in both SQL 2005 and 2008 whereby a "race condition" causes the variable to be inaccessible from some threads, and the default value is used. In our case, the variable was our "base folder" location for packages, causing our overall execution control package to not find its sub-packages.
More detail here: SSIS Intermittent variable error: The system cannot find the file specified
Unfortunately, the work-around is to hard-code a default value into the variable that will work when the race condition happens. Easy for us (set the base folder to be correct for our prod environment), but it looks a lot harder for your issue.
Perhaps you could use multiple variables (one for each data flow), and a bunch of Execute SQL tasks to populate those variables? REALLY ugly, but it should help.
Did you check the value of user_var before getting to the Derived Column component? It sounds like user_var may be 0, so you are doing run_id = user_var; run_id = 0. I may be naive to think it is that simple, but that's the first thing I would check.
Given the procedure code, you might want to replace this:
SELECT MAX( RUN_ID )
FROM shared.STAGE_LOAD_JOB
WHERE EXECUTED_BY = @pkg
with this:
SELECT SCOPE_IDENTITY()
The SCOPE_IDENTITY() function returns the identity value that was generated in the current scope, which here is the procedure. Not sure if this will solve the problem, but I find it best to work through all of these issues, as they might have unrelated consequences.
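A sketch of the procedure from the question with that change applied (everything else kept as posted):
CREATE PROCEDURE [dbo].[sp_GetRunIDForPackage] (@pkg varchar(50)) AS
-- add a new entry for this run of this package - the RUN_ID is an
-- IDENTITY column and so will get created for us
INSERT INTO shared.STAGE_LOAD_JOB( EFFECTIVE_TS, EXECUTED_BY )
VALUES( getdate(), @pkg )

-- SCOPE_IDENTITY() returns the identity generated by the INSERT above,
-- in this scope, so concurrent packages cannot see each other's RUN_ID
SELECT SCOPE_IDENTITY() AS RUN_ID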