Dev Code - Understanding What I Am Seeing - sql-server-2008

This is where I start by saying I am not a developer and this is not my code. As the DBA, though, it has shown up on my plate from a performance perspective. The execution plan shows clustered index (CI) scans for Table2 aliased as D and Table2 aliased as E. Focusing on Table2 aliased as E: the scan is coming from the subquery in the WHERE clause for E.SEQ_NBR = (...).
I am also seeing far more executions than there need be. I know it depends on the exact index structure on the table, but at a high level, is it likely that what I am seeing is a CI scan resulting from the aggregate (MIN) for every match it finds? Basically, is it walking the table for the MIN SEQ_NBR for each match on EMPLID and the other fields?
If that is likely, is it more a result of the manner in which the query is written (I would think incorporating a CTE with some ROW_NUMBER logic would help) or of a lack of indexing? I am trying to avoid throwing an index at it "just because". I am getting hung up on that subquery in the WHERE clause.
SELECT
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
,SUM(D.TL_QUANTITY) 'YTD_TL_QUANTITY'
FROM
Table1 B
,Table2 D
,Table2 E
WHERE
D.TRC = B.TRC
AND B.TL_ERNCD IN ( #0, #1, #2, #3, #4, #5, #6 )
AND D.EMPLID = E.EMPLID
AND D.EMPL_RCD = E.EMPL_RCD
AND D.DUR <= E.DUR
AND D.DUR >= '1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR)
AND E.SEQ_NBR =
( SELECT
MIN(EX.SEQ_NBR)
FROM
Table2 EX
WHERE
E.EMPLID = EX.EMPLID
AND E.EMPL_RCD = EX.EMPL_RCD
AND E.DUR = EX.DUR
)
AND B.EFFDT = ( SELECT
MAX(B_ED.EFFDT)
FROM
Table1 B_ED
WHERE
B.TRC = B_ED.TRC
AND B_ED.EFFDT <= GETDATE()
)
GROUP BY
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR

The MIN operation is not what causes the CI scan. A MIN or MAX is computed by an aggregate operator over whatever rows reach it, and with a suitable index it can be satisfied by reading just the first row in index order. The problem is most likely the number of times the subquery is being executed: it may be run once for every row returned by the parent query. A CTE may be helpful here depending on the size of Table2, but I don't think you need to worry about finding a replacement for the MIN() ... at least not yet.
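The CTE with ROW_NUMBER that the question mentions could look roughly like the sketch below. It only isolates the "first SEQ_NBR per EMPLID, EMPL_RCD, DUR" step, it is untested against the real schema, and the names FirstSeq and rn are made up for illustration. Whether it beats the MIN() subquery depends on the indexes on Table2, so compare the execution plans before adopting it.

```sql
-- Sketch only (SQL Server 2005+). FirstSeq and rn are illustrative names.
WITH FirstSeq AS (
    SELECT EMPLID,
           EMPL_RCD,
           DUR,
           SEQ_NBR,
           -- Number the rows of each (EMPLID, EMPL_RCD, DUR) group by SEQ_NBR,
           -- so the row with the lowest SEQ_NBR gets rn = 1.
           ROW_NUMBER() OVER (PARTITION BY EMPLID, EMPL_RCD, DUR
                              ORDER BY SEQ_NBR) AS rn
    FROM Table2
)
SELECT EMPLID, EMPL_RCD, DUR, SEQ_NBR
FROM FirstSeq
WHERE rn = 1;   -- one row per group: the one with the minimum SEQ_NBR
```

In the full query, FirstSeq filtered to rn = 1 would take the place of Table2 E together with its MIN(SEQ_NBR) subquery.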

Correlated subqueries are often performance killers. Where you can, replace them with CTEs, JOINs or derived tables.
Try something like this (not tested):
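SELECT
    D.EMPLID
   ,D.JOBCODE
   ,D.DEPTID
   ,E.DUR
   ,SUM(D.TL_QUANTITY) AS YTD_TL_QUANTITY
FROM Table1 B
-- Latest effective-dated row per TRC (replaces the MAX(EFFDT) subquery)
JOIN (SELECT TRC, MAX(EFFDT) AS EFFDT
      FROM Table1
      WHERE EFFDT <= GETDATE()
      GROUP BY TRC) B_ED
    ON B.TRC = B_ED.TRC
   AND B.EFFDT = B_ED.EFFDT
JOIN Table2 D
    ON D.TRC = B.TRC
JOIN Table2 E
    ON D.EMPLID = E.EMPLID
   AND D.EMPL_RCD = E.EMPL_RCD
   AND D.DUR <= E.DUR
-- Lowest SEQ_NBR per EMPLID, EMPL_RCD, DUR (replaces the MIN(SEQ_NBR) subquery)
JOIN (SELECT EMPLID, EMPL_RCD, DUR, MIN(SEQ_NBR) AS SEQ_NBR
      FROM Table2
      GROUP BY EMPLID, EMPL_RCD, DUR) EX
    ON E.EMPLID = EX.EMPLID
   AND E.EMPL_RCD = EX.EMPL_RCD
   AND E.DUR = EX.DUR
   AND E.SEQ_NBR = EX.SEQ_NBR
WHERE B.TL_ERNCD IN ( #0, #1, #2, #3, #4, #5, #6 )
  AND D.DUR >= '1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR)
GROUP BY D.EMPLID, D.JOBCODE, D.DEPTID, E.DUR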
As for the implicit join syntax, do not allow anyone to ever do this again. It is a poor programming technique. As a DBA you can say what you will and will not allow in the database. Code review what is coming in and do not pass it until they remove the implicit syntax.
Why is it bad? In the first place, you get accidental cross joins. Further, from a maintenance perspective, you can't tell whether a cross join was accidental (and thus the query incorrect) or on purpose. This means a query with a cross join in it is unmaintainable.
Next, if you have to change some of the joins to outer joins later and do not fix all the implicit ones at the same time, you can get incorrect results (which may not be noticed by an inexperienced developer). In SQL Server 2008 you cannot use the implicit syntax for an outer join at the default compatibility level, but it shouldn't have been used even as far back as SQL Server 2000, because Books Online (for SQL Server 2000) states that there are cases where it is misinterpreted. In other words, the syntax is unreliable for outer joins. There is never an excuse for using an implicit join: you gain nothing over an explicit join, and it can create more problems.
You need to educate your developers and tell them that this code (whose syntax has been superseded since 1992!) is no longer acceptable.

This is a quick one, but this expression, CAST('1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR) AS DATETIME), is likely contributing to a table scan on Table2 E, because the function has to be evaluated against each row.
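One common way to get the first day of a date's year without building a string is plain date arithmetic. This is a small sketch, assuming DUR is a DATETIME column. Whether it changes the plan for this particular query would need to be checked against the actual execution plan, since the expression still has to be evaluated per row of E.

```sql
-- Count the whole years between the base date 0 (1900-01-01) and the input,
-- then add that many years back to the base date: January 1st of that year.
SELECT DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0) AS first_day_of_year;

-- Applied to the predicate from the question it would read:
--   AND D.DUR >= DATEADD(YEAR, DATEDIFF(YEAR, 0, E.DUR), 0)
```

This avoids the implicit string-to-date conversion and its dependence on the session's date format settings.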

Related

SQL: [performance] what is faster, Inner joins or equal statement in "where"

What is the best performing query of the two? I'm looking for a general performance answer :)
select count(A.M_XXX_ROUTE)
from
YY.TRN_HDR_BB C
inner join YY.DLV_CASH_BB A on C.M_NB = A.M_TRD_REF
inner join YY.TABLE#DATA#SITRN_BB B on A.M_XXX_ROUTE = B.M_REF
or
select count(A.M_XXX_ROUTE)
from
YY.DLV_CASH_BB A,
YY.TRN_HDR_BB C,
YYY.TABLE#DATA#SITRN_BB B
Where
C.M_NB=A.M_TRD_REF
and A.M_XXX_ROUTE=B.M_REF;
There are only small differences in performance between the two, but what if I add 50 more joins or WHERE conditions?
They will be the same, because internally, Oracle will recast the ANSI syntax into its own anyway. If I run this:
SQL> explain plan for
2 select *
3 from scott.emp e
4 inner join scott.dept d
5 on d.deptno = e.deptno;
Explained.
then a trace of what is happening under the covers reveals the query to be
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "E"."EMPNO" "EMPNO","E"."ENAME" "ENAME","E"."JOB" "JOB","E"."MGR" "MGR","E"."HIREDATE" "HIREDATE",
"E"."SAL" "SAL","E"."COMM" "COMM","E"."DEPTNO" "DEPTNO","D"."DEPTNO" "DEPTNO","D"."DNAME" "DNAME","D"."LOC" "LOC"
FROM "SCOTT"."EMP" "E","SCOTT"."DEPT" "D" WHERE "D"."DEPTNO"="E"."DEPTNO"
That trace is called a "10053" trace, but that's a level of detail you probably don't need to worry about.
I don't think there will be any performance difference; you can confirm this with EXPLAIN PLAN.
The first query uses the ANSI SQL-92 join syntax, while the second uses the older comma-style joins that Oracle has traditionally used.
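To check this on your own system, you can generate the plan for each form and compare them. The sketch below reuses the scott.emp and scott.dept demo tables from the answer above and displays each plan with DBMS_XPLAN.DISPLAY. It assumes you have access to those tables and to a plan table.

```sql
-- Plan for the ANSI join form
EXPLAIN PLAN FOR
SELECT *
FROM scott.emp e
INNER JOIN scott.dept d ON d.deptno = e.deptno;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the comma-style form
EXPLAIN PLAN FOR
SELECT *
FROM scott.emp e, scott.dept d
WHERE d.deptno = e.deptno;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the two plans show the same operations and the same plan hash value, the optimizer is treating the two statements identically.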

Optimize derived table in select

I have an SQL query:
SELECT tsc.Id
FROM TEST.Services tsc,
(
select * from DICT.Change sp
) spc
where tsc.serviceId = spc.service_id
and tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
and tsc.startDate > GREATEST(spc.StartTime, spc.startDate)
group by tsc.Id;
This query is very, very slow.
Explain:
Can this be optimized? How can I rewrite this subquery in another way?
What is the point of this query? Why the CROSS JOIN operation? Why do we need to return multiple copies of id column from Services table? And what are we doing with the millions of rows being returned?
Absent a specification, an actual set of requirements for the resultset, we're just guessing at it.
To answer your questions:
Yes, the query could be "optimized" by rewriting it to the resultset that is actually required, and do it much more efficiently than the monstrously hideous SQL in the question.
Some suggestions: ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead.
(With no join predicates, it's a "cross" join: every row from one side is matched to every row from the other side.) I recommend including the CROSS keyword as an indication to future readers that the absence of an ON clause (or of join predicates in the WHERE clause) is intentional, and not an oversight.
I'd also avoid an inline view, unless there is a specific reason for one.
UPDATE
The query in the question is updated to include some predicates. Based on the updated query, I would write it like this:
SELECT tsc.id
FROM TEST.Services tsc
JOIN DICT.Change spc
ON tsc.serviceid = spc.service_id
AND tsc.startdate > spc.starttime
AND tsc.startdate > spc.startdate
AND ( tsc.planid = spc.plan_id
OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
)
Ensure that the query is making use of suitable index by looking at the output of EXPLAIN to see the execution plan, in particular, which indexes are being used.
Some notes:
If there are multiple rows from spc that "match" a row from tsc, the query will return duplicate values of tsc.id. (It's not clear why, or whether, we need to return duplicate values. If we need to count the number of copies of each tsc.id, we could do that in the query, returning distinct values of tsc.id along with a count. If we don't need duplicates, we could return just a distinct list.)
The GREATEST function returns NULL if any of its arguments is NULL. If the condition we need is "a > GREATEST(b,c)", we can specify "a > b AND a > c".
Also, this condition:
tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
can be re-written to return an equivalent result (I'm suspicious about the actual specification, and whether this original condition actually satisfies that adequately. Without example data and sample of expected output, we have to rely on the SQL as the specification, so we honor that in the rewrite.)
If we don't need to return duplicate values of tsc.id, assuming id is unique in TEST.Services, we could also write
SELECT tsc.id
FROM TEST.Services tsc
WHERE EXISTS
( SELECT 1
FROM DICT.Change spc
WHERE spc.service_id = tsc.serviceid
AND spc.starttime < tsc.startdate
AND spc.startdate < tsc.startdate
AND ( ( spc.plan_id = tsc.planid )
OR ( spc.plan_id = -1 AND tsc.planid IS NOT NULL )
)
)
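If what is actually needed is each id together with the number of matching rows, as mentioned in the notes, a variant along these lines might do it. This is only a sketch: it assumes the table and column names from the question and that id is unique in TEST.Services, and match_count is an illustrative alias.

```sql
-- Sketch only (MySQL): one row per service id, with the number of
-- DICT.Change rows that satisfy the join conditions for it.
SELECT tsc.id,
       COUNT(*) AS match_count
FROM TEST.Services tsc
JOIN DICT.Change spc
  ON tsc.serviceid = spc.service_id
 AND tsc.startdate > spc.starttime
 AND tsc.startdate > spc.startdate
 AND ( tsc.planid = spc.plan_id
       OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 ) )
GROUP BY tsc.id;
```

Services with no matching row are absent from this result, just as in the original query.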

MySQL Select within another select

I have a query as follows
select
Sum(If(departments.vat, If(weeklytransactions.weekendingdate Between
'2011-01-04' And '2099-12-31', weeklytransactions.takings / 1.2,
If(weeklytransactions.weekendingdate Between '2008-11-30' And '2010-01-01',
weeklytransactions.takings / 1.15, weeklytransactions.takings / 1.175)),
weeklytransactions.takings)) As Total,
weeklytransactions.weekendingdate,......
and another that returns a vat rate as follows
select format(Max(Distinct vat_rates.Vat_Rate),3) From vat_rates Where
vat_rates.Vat_From <= '2011-01-03'
I want to replace the hard coded if statement with the lower query, replacing the date in the lower query with weeklytransactions.weekendingdate.
After Kevin's comments, here is the full query I'm trying to get to work;
Select Max(vat_rates.vat_rate) As r,
If(departments.vat, weeklytransactions.takings / r, weeklytransactions.takings) As Total,
weeklytransactions.weekendingdate,
Week(weeklytransactions.weekendingdate),
round(datediff(weekendingdate, (if(month(weekendingdate)>5,concat(year(weekendingdate),'-06-01'),concat(year(weekendingdate)-1,'-06-01'))))/7,0)+1 as fyweek,
cast((Case When Month(weeklytransactions.weekendingdate) >5 Then Concat(Year(weeklytransactions.weekendingdate), '-',Year(weeklytransactions.weekendingdate) + 1) Else Concat(Year(weeklytransactions.weekendingdate) - 1, '-',Year(weeklytransactions.weekendingdate)) End) as char) As fy,
business_units.business_unit
From departments Inner Join (business_units Inner Join weeklytransactions On business_units.buID = weeklytransactions.businessUnit) On departments.deptid = weeklytransactions.departmentId
Where (vat_rates.vat_from <= weeklytransactions.weekendingdate and business_units.Active = true and business_units.sales=1)
Group By weeklytransactions.weekendingdate, business_units.business_unit Order By fy desc, business_unit, fyweek
Regards
Pete
Assuming I read your question correctly, your problem is about using the result of another SELECT within the result of your main query (and, depending on how acquainted you are with SQL, maybe you haven't had the occasion to learn about JOINs?).
You can extract data from a subquery within a SELECT, provided you define it within the FROM clause. The following query will work, for example:
SELECT A.a, B.b
FROM A
JOIN (SELECT aggregate(c) AS b FROM C) AS B
Notice that there is no reference to table A within the subquery. You cannot simply add one, because a derived table in the FROM clause is evaluated on its own and cannot see columns of the other tables in that FROM clause. So the following won't work:
SELECT A.a, B.b
FROM A
JOIN (SELECT aggregate(c) AS b FROM C WHERE C.someValue = A.someValue) AS B
Back to basics. What you apparently want to do here is aggregate some data associated with each of the records of another table. For that, you will need to merge your SELECT queries and use GROUP BY:
SELECT A.a, aggregate(C.c)
FROM A, C
WHERE C.someValue = A.someValue
GROUP BY A.a
Back to your tables, the following should work:
SELECT w.weekendingdate, FORMAT(MAX(v.Vat_Rate), 3) AS vat_rate
FROM weeklytransactions AS w, vat_rates AS v
WHERE v.Vat_From <= w.weekendingdate
GROUP BY w.weekendingdate
Feel free to add and remove fields and conditions as you see fit (I wouldn't be surprised if you also wanted a lower bound when filtering the records from vat_rates, since the way I have written it above, for a given weekendingdate you get the records from that week plus all the weeks before!).
So it looks like my first try did not address the actual problem. With the additional information provided in the comments, as well as the new complete query, let's see how this goes.
We are still missing error messages, but normally the query as written should result in MySQL having the following complaint:
ERROR 1109 (42S02): Unknown table 'vat_rates' in field list
Why? Because the vat_rates table does not appear in the FROM clause, whereas it should. Let's make that more obvious by simplifying the query, removing all references to the business_units table as well as the fields, calculations and order that do not add or remove anything to the problem, leaving us with the following:
SELECT MAX(vat_rates.vat_rate) AS r,
IF(d.vat, w.takings / r, w.takings) AS Total
FROM departments AS d
INNER JOIN weeklytransactions AS w ON w.departmentId = d.deptid
WHERE vat_rates.vat_from <= w.weekendingdate
GROUP BY w.weekendingdate
That cannot work, and will produce the error mentioned above. It looks like there is no foreign key between the weeklytransactions and vat_rates tables, so we have no choice but to do a CROSS JOIN for the moment, hoping that the condition in the WHERE clause and the aggregate function used to get r are enough to fit the business logic at hand here. The following query should return the expected data instead of an error message (I also removed r, since that seems to be an intermediate value judging by the comments that were written):
SELECT IF(d.vat, w.takings / MAX(v.vat_rate), w.takings) AS Total
FROM vat_rates AS v, departments AS d
INNER JOIN weeklytransactions AS w ON w.departmentId = d.deptid
WHERE v.vat_from <= w.weekendingdate
GROUP BY w.weekendingdate
From there, assuming it works, you will only need to put back all the parts I removed to get your final query. I am a tad doubtful about the way the VAT rate is gotten here, but I have no idea what your requirements are in that regard so I leave it up to you to make sure that works as expected.
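On the doubt about how the VAT rate is obtained: MAX(vat_rate) picks the highest rate that has ever applied up to that date, which is not necessarily the rate in force on that date. A possible alternative is a correlated scalar subquery that takes the most recent vat_rates row on or before the week-ending date. This is only a sketch under two assumptions that the question does not confirm: that vat_rate stores the divisor (for example 1.2), and that the rate in force is the row with the latest vat_from.

```sql
-- Sketch only (MySQL). Assumes vat_rates.vat_rate holds the divisor (e.g. 1.2)
-- and that the applicable rate is the row with the latest vat_from on or
-- before the week-ending date. Neither is confirmed by the question.
SELECT w.weekendingdate,
       SUM(IF(d.vat,
              w.takings / (SELECT v.vat_rate
                           FROM vat_rates AS v
                           WHERE v.vat_from <= w.weekendingdate
                           ORDER BY v.vat_from DESC
                           LIMIT 1),
              w.takings)) AS Total
FROM departments AS d
INNER JOIN weeklytransactions AS w ON w.departmentId = d.deptid
GROUP BY w.weekendingdate;
```

This form needs no cross join with vat_rates, so the number of weeklytransactions rows being summed is not multiplied by the number of rate rows.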

MySQL: Subquery returns more than 1 row

I know this has been asked plenty of times before, but I can't find an answer that is close to my case.
I have the following query:
SELECT c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName, c.cases_timestamp, o.organisation_name
FROM db_cases c, db_custinfo ci, db_organisation o
WHERE c.userInfo_ID = ci.userinfo_ID AND c.cases_status = '2'
AND organisation_name = (
SELECT organisation_name
FROM db_sites s, db_cases c
WHERE organisation_ID = '111'
)
AND s.sites_site_ID = c.sites_site_ID)
What I am trying to do is get the cases where the sites_site_ID defined in the case also appears in the db_sites table alongside the organisation_ID I want to filter by (as defined by "organisation_ID = '111'"), but I am getting the response from MySQL stated in the title.
I hope this makes sense, and I would appreciate any help on this one.
Thanks.
As the error states, your subquery returns more than one row, which it cannot do in this situation. If this is not the expected result, you really should investigate why it occurs. But if you know this will happen and want only the first result, use LIMIT 1 to limit the results to one row.
SELECT organisation_name
FROM db_sites s, db_cases c
WHERE organisation_ID = '111'
LIMIT 1
Well the problem is, obviously, that your subquery returns more than one row which is invalid when using it as a scalar subquery such as with the = operator in the WHERE clause.
Instead you could do an inner join on the subquery which would filter your results to only rows that matched the ON clause. This will get you all rows that match, even if there is more than one returned in the subquery.
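A join on the subquery, as just described, could be sketched like this. The table and column names come from the question. The sketch assumes each site appears once in db_sites; otherwise cases would be repeated.

```sql
-- Sketch only (MySQL): join the cases to the set of sites that belong
-- to organisation 111 instead of comparing against a scalar subquery.
SELECT c.cases_ID, c.cases_status, c.cases_title,
       ci.custinfo_FName, ci.custinfo_LName, c.cases_timestamp
FROM db_cases c
JOIN db_custinfo ci
  ON c.userInfo_ID = ci.userinfo_ID
JOIN ( SELECT sites_site_ID
       FROM db_sites
       WHERE organisation_ID = '111' ) s
  ON s.sites_site_ID = c.sites_site_ID
WHERE c.cases_status = '2';
```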
UPDATE:
You're likely getting more than one row from your subquery because you're doing a cross join on the db_sites and db_cases table. You're using the old-style join syntax and then not qualifying any predicate to join the tables on in the WHERE clause. Using this old style of joining tables is not recommended for this very reason. It would be better if you explicitly stated what kind of join it was and how the tables should be joined.
Good pages on joins:
http://dev.mysql.com/doc/refman/5.0/en/join.html (for the right syntax)
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html (for the differences between the types of joins)
I was battling this for an hour, and overcomplicated it completely. Sometimes a quick break and writing it out on an online forum can solve it for you ;)
Here is the query as it should be.
SELECT c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName, c.cases_timestamp, c.sites_site_ID
FROM db_cases c, db_custinfo ci, db_sites s
WHERE c.userInfo_ID = ci.userinfo_ID AND c.cases_status = '2' AND (s.organisation_ID = '111' AND s.sites_site_ID = c.sites_site_ID)
Let me rewrite what you have posted:
SELECT
c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName,
c.cases_timestamp, c.sites_site_ID
FROM
db_cases c
JOIN
db_custinfo ci ON c.userInfo_ID = ci.userinfo_ID and c.cases_status = '2'
JOIN
db_sites s ON s.sites_site_ID = c.sites_site_ID and s.organisation_ID = '111'

mysql query not working

I am working on a query which joins several tables. Here's the code.
The query works fine until I add the third line, SUM(SaleItems_T.qtymajor) AS sales. Then I get an error message which says
Unknown column 'SaleItems_T.qtymajor' in 'field list'
I am trying to build a reorder worksheet. Help is much appreciated.
SELECT ProductMaster_T.ProductName_VC AS PGroup,
StockMain_T.ItemDescription AS Item,
SUM(SaleItems_T.qtymajor) AS sales,
stockbuffers_T.buffer_qty AS BufferQty,
(stkbalance_T.AJ1+stkbalance_T.AR2+stkbalance_T.AD3+stkbalance_T.DX4) AS Stock,
(stkbalance_T.AJ1+stkbalance_T.AR2+stkbalance_T.AD3+stkbalance_T.DX4)-stockbuffers_T.buffer_qty AS Result
FROM ProductMaster_T, StockMain_T, stockbuffers_T, stkbalance_T
WHERE StockMain_T.ItemCode = stockbuffers_T.itemcode
AND
StockMain_T.ItemCode = stkbalance_T.itemid
AND
ProductMaster_T.ProductID = StockMain_T.ProdID
AND
SaleItems_T.ItemID = StockMain_T.ItemCode
ORDER BY
ProductName_VC,ItemDescription ASC
You haven't referenced the SaleItems_T table in your query, either in the FROM clause, or through a JOIN.
This is where your query is wrong:
FROM ProductMaster_T, StockMain_T, stockbuffers_T, stkbalance_T
Change that to:
FROM ProductMaster_T, StockMain_T, stockbuffers_T, stkbalance_T, SaleItems_T
(Please don't vote for this. I only put it here because the comment space is not suitable for such a long comment.)
You should really use the explicit JOIN ... ON join_condition syntax instead of the implicit join via WHERE conditions (which is a really old way to do it). It's better because it's harder to forget a condition (or a table, as you did!) and is thus less error-prone. It also separates the join conditions (which you'll use in almost every query) from the other conditions you may have in various queries.
So, instead of
FROM ProductMaster_T, StockMain_T
WHERE ProductMaster_T.ProductID = StockMain_T.ProdID
write:
FROM ProductMaster_T
JOIN StockMain_T
ON ProductMaster_T.ProductID = StockMain_T.ProdID
It's also nice to use aliases (with the optional AS keyword). It makes the code more readable:
FROM ProductMaster_T AS p
JOIN StockMain_T AS m
ON p.ProductID = m.ProdID
The whole query could be written as:
SELECT
master.ProductName_VC AS PGroup,
main.ItemDescription AS Item,
SUM(items.qtymajor) AS sales,
buf.buffer_qty AS BufferQty,
(bal.AJ1 + bal.AR2 + bal.AD3 + bal.DX4)
AS Stock,
(bal.AJ1 + bal.AR2 + bal.AD3 + bal.DX4) - buf.buffer_qty
AS Result
FROM ProductMaster_T AS master
JOIN StockMain_T AS main
ON master.ProductID = main.ProdID
JOIN stockbuffers_T AS buf
ON main.ItemCode = buf.itemcode
JOIN stkbalance_T AS bal
ON main.ItemCode = bal.itemid
JOIN SaleItems_T AS items
ON items.ItemID = main.ItemCode
GROUP BY main.ItemCode      -- ??? depends on your tables'
                            -- relationships
ORDER BY
ProductName_VC ASC,
ItemDescription ASC