Optimize derived table in select - mysql

I have sql query:
SELECT tsc.Id
FROM TEST.Services tsc,
(
select * from DICT.Change sp
) spc
where tsc.serviceId = spc.service_id
and tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
and tsc.startDate > GREATEST(spc.StartTime, spc.startDate)
group by tsc.Id;
This query is very, very slow.
Explain:
Can this be optimized? How to rewrite this subquery for another?

What is the point of this query? Why the CROSS JOIN operation? Why do we need to return multiple copies of id column from Services table? And what are we doing with the millions of rows being returned?
Absent a specification, an actual set of requirements for the resultset, we're just guessing at it.
To answer your questions:
Yes, the query could be "optimized" by rewriting it to the resultset that is actually required, and do it much more efficiently than the monstrously hideous SQL in the question.
Some suggestions: ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead.
With no join predicates, it's a "cross" join. Every row matched from one side matched to every row from the right side.) I recommend including the CROSS keyword as an indication to future readers that the absence of an ON clause (or, join predicates in the WHERE clause) is intentional, and not an oversight.
I'd also avoid an inline view, unless there is a specific reason for one.
UPDATE
The query in the question is updated to include some predicates. Based on the updated query, I would write it like this:
SELECT tsc.id
FROM TEST.Services tsc
JOIN DICT.Change spc
ON tsc.serviceid = spc.service_id
AND tsc.startdate > spc.starttime
AND tsc.startdate > spc.starttdate
AND ( tsc.planid = spc.plan_id
OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
)
Ensure that the query is making use of suitable index by looking at the output of EXPLAIN to see the execution plan, in particular, which indexes are being used.
Some notes:
If there are multiple rows from spc that "match" a row from tsc, the query will return duplicate values of tsc.id. (It's not clear why or if we need to return duplicate values. IF we need to count the number of copies of each tsc,id, we could do that in the query, returning distinct values of tsc.id along with a count. If we don't need duplicates, we could return just a distinct list.
GREATEST function will return NULL if any of the arguments are null. If the condition we need is "a > GREATEST(b,c)", we can specify "a > b AND a > c".
Also, this condition:
tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
can be re-written to return an equivalent result (I'm suspicious about the actual specification, and whether this original condition actually satisfies that adequately. Without example data and sample of expected output, we have to rely on the SQL as the specification, so we honor that in the rewrite.)
If we don't need to return duplicate values of tsc.id, assuming id is unique in TEST.Services, we could also write
SELECT tsc.id
FROM TEST.Services tsc
WHERE EXISTS
( SELECT 1
FROM DICT.Change spc
ON spc.service_id = tsc.serviceid
AND spc.starttime < tsc.startdate
AND spc.starttdate < tsc.startdate
AND ( ( spc.plan_id = tsc.planid )
OR ( spc.plan_id = -1 AND tsc.planid IS NOT NULL )
)
)

Related

SQL statement hanging up in MySQL database

I am needing some SQL help. I have a SELECT statement that references several tables and is hanging up in the MySQL database. I would like to know if there is a better way to write this statement so that it runs efficiently and does not hang up the DB? Any help/direction would be appreciated. Thanks.
Here is the code:
Select Max(b.BurID) As BurID
From My.AppTable a,
My.AddressTable c,
My.BurTable b
Where a.AppID = c.AppID
And c.AppID = b.AppID
And (a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And DateDiff(CurDate(), a.ApplicationDate) <= 30)
There is NO mysql error in the log. Sorry.
Pro tip: use explicit JOINs rather than a comma-separated list of tables. It's easier to see the logic you're using to JOIN that way. Rewriting your query to do that gives us this.
select Max(b.BurID) As BurID
From My.AppTable AS a
JOIN My.AddressTable AS c ON a.AppID = c.AppID
JOIN My.BurTable AS b ON c.AppID = b.AppID
WHERE (a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And DateDiff(CurDate(), a.ApplicationDate) <= 30)
Next pro tip: Don't use functions (like DateDiff()) in WHERE clauses, because they defeat using indexes to search. That means you should change the last line of your query to
AND a.ApplicationDate >= CurDate() - INTERVAL 30 DAY
This has the same logic as in your query, but it leaves a naked (and therefore index-searchable) column name in the search expression.
Next, we need to look at your columns to see how you are searching, and cook up appropriate indexes.
Let's start with AppTable. You're screening by specific values of Forename, Surname, and DOB. You're screening by a range of ApplicationDate values. Finally you need AppID to manage your join. So, this compound index should help. Its columns are in the correct order to use a range scan to satisfy your query, and contains the needed results.
CREATE INDEX search1 USING BTREE
ON AppTable
(Forename, Surname, DOB, ApplicationDate, AppID)
Next, we can look at your AddressTable. Similar logic applies. You'll enter this table via the JOINed AppID, and then screen by specific values of three columns. So, try this index
CREATE INDEX search2 USING BTREE
ON AddressTable
(AppID, PostcodeAnywherePostcode, PostcodeAnywhereBuildingNumber, isPrimary)
Finally, we're on to your BurTable. Use similar logic as the other two, and try this index.
CREATE INDEX search3 USING BTREE
ON BurTable
(AppID, ErrorInd, BurID)
This kind of index is called a compound covering index, and can vastly speed up the sort of summary query you have asked about.

MySQL: Subquery returns more than 1 row

I know this has been asked plenty times before, but I cant find an answer that is close to mine.
I have the following query:
SELECT c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName, c.cases_timestamp, o.organisation_name
FROM db_cases c, db_custinfo ci, db_organisation o
WHERE c.userInfo_ID = ci.userinfo_ID AND c.cases_status = '2'
AND organisation_name = (
SELECT organisation_name
FROM db_sites s, db_cases c
WHERE organisation_ID = '111'
)
AND s.sites_site_ID = c.sites_site_ID)
What I am trying to do is is get the cases, where the sites_site_ID which is defined in the cases, also appears in the db_sites sites table alongside its organisation_ID which I want to filter by as defined by "organisation_ID = '111'" but I am getting the response from MySQL as stated in the question.
I hope this makes sense, and I would appreciate any help on this one.
Thanks.
As the error states your subquery returns more then one row which it cannot do in this situation. If this is not expect results you really should investigate why this occurs. But if you know this will happen and want only the first result use LIMIT 1 to limit the results to one row.
SELECT organisation_name
FROM db_sites s, db_cases c
WHERE organisation_ID = '111'
LIMIT 1
Well the problem is, obviously, that your subquery returns more than one row which is invalid when using it as a scalar subquery such as with the = operator in the WHERE clause.
Instead you could do an inner join on the subquery which would filter your results to only rows that matched the ON clause. This will get you all rows that match, even if there is more than one returned in the subquery.
UPDATE:
You're likely getting more than one row from your subquery because you're doing a cross join on the db_sites and db_cases table. You're using the old-style join syntax and then not qualifying any predicate to join the tables on in the WHERE clause. Using this old style of joining tables is not recommended for this very reason. It would be better if you explicitly stated what kind of join it was and how the tables should be joined.
Good pages on joins:
http://dev.mysql.com/doc/refman/5.0/en/join.html (for the right syntax)
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html (for the differences between the types of joins)
I was battling this for an hour, and overcomplicated it completely. Sometimes a quick break and writing it out on an online forum can solve it for you ;)
Here is the query as it should be.
SELECT c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName, c.cases_timestamp, c.sites_site_ID
FROM db_cases c, db_custinfo ci, db_sites s
WHERE c.userInfo_ID = ci.userinfo_ID AND c.cases_status = '2' AND (s.organisation_ID = '111' AND s.sites_site_ID = c.sites_site_ID)
Let me re-write what you have post:
SELECT
c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName,
c.cases_timestamp, c.sites_site_ID
FROM
db_cases c
JOIN
db_custinfo ci ON c.userInfo_ID = ci.userinfo_ID and c.cases_status = '2'
JOIN
db_sites s ON s.sites_site_ID = c.sites_site_ID and s.organization_ID = 111

Using the right MYSQL JOIN

I'm trying to get all the data from the match table, along with the currently signed up gamers of each type, experienced or not.
Gamers
(PK)Gamer_Id
Gamer_firstName,
Gamer_lastName,
Gamer experience(Y/N)
Gamer_matches
(PK)FK GamerId,
(PK)FK MatchId,
Gamer_score
Match
(PK)Match_Id,
ExperiencedGamers_needed,
InExperiencedGamers_needed
I've tried this query along with many others but it doesn't work, is it a bad join?
SELECT M.MatchId,M.ExperiencedGamers_needed,M.InExperiencedGamers_needed,
(SELECT COUNT(GM.GamerId)
FROM Gamers G, Gamers_matches GM
WHERE G.GamerId = GM.GamerId
AND G.experience = "Y"
AND GM.MatchId = M.MatchId
GROUP BY GM.MatchId)AS ExpertsSignedUp,
(SELECT COUNT(GM.GamerId)
FROM Gamers G, Gamers_matches GM
WHERE G.GamerId = GM.GamerId
AND G.experience = "N"
AND GM.MatchId = M.MatchId
GROUP BY GM.MatchId) AS NovicesSignedUp
FROM MATCHES M
What you've written is called a correlated subquery which forces SQL to re-execute the subquery for each row fetched from Matches. It can be made to work, but it's pretty inefficient. In some complex queries it may be necessary, but not in this case.
I would solve this query this way:
SELECT M.MatchId, M.ExperiencedGamers_needed,M.InExperiencedGamers_needed,
SUM(G.experience = 'Y') AS ExpertsSignedUp,
SUM(G.experience = 'N') AS NovicesSignedUp
FROM MATCHES M
LEFT OUTER JOIN (Gamer_matches GM
INNER JOIN Gamers G ON G.GamerId = GM.GamerId)
ON M.MatchId = GM.MatchId
GROUP BY M.MatchId;
Here it outputs only one row per Match because of the GROUP BY at the end.
There's no subquery to re-execute many times, it's just joining Matches to the respective rows in the other tables once. But I use an outer join in case a Match has zero players of eithe type signed up.
Then instead of using COUNT() I use a trick of MySQL and use SUM() with a boolean expression inside the SUM() function. Boolean expressions in MySQL always return 0 or 1. The SUM() of these is the same as the COUNT() where the expression returns true. This way I can get the "count" of both experts and novices only scanning the Gamers table once.
P.S. MySQL is working in a non-standard way to return 0 or 1 from a boolean expression. Standard ANSI SQL does not support this, nor do many other brands of RDBMS. Standardly, a boolean expression returns a boolean, not an integer.
But you can use a more verbose expression if you need to write standard SQL for portability:
SUM(CASE G.experience WHEN 'Y' THEN 1 WHEN 'N' THEN 0 END) AS ExpertsSignedUp

How to avoid filesort for that mysql query?

I'm using this kind of queries with different parameters :
EXPLAIN SELECT SQL_NO_CACHE `ilan_genel`.`id` , `ilan_genel`.`durum` , `ilan_genel`.`kategori` , `ilan_genel`.`tip` , `ilan_genel`.`ozellik` , `ilan_genel`.`m2` , `ilan_genel`.`fiyat` , `ilan_genel`.`baslik` , `ilan_genel`.`ilce` , `ilan_genel`.`parabirimi` , `ilan_genel`.`tarih` , `kgsim_mahalleler`.`isim` AS mahalle, `kgsim_ilceler`.`isim` AS ilce, (
SELECT `ilanresimler`.`resimlink`
FROM `ilanresimler`
WHERE `ilanresimler`.`ilanid` = `ilan_genel`.`id`
LIMIT 1
) AS resim
FROM (
`ilan_genel`
)
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `kgsim_mahalleler` ON `kgsim_mahalleler`.`id` = `ilan_genel`.`mahalle`
WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
ORDER BY `ilan_genel`.`id` DESC
LIMIT 225 , 15
and this is what i get in explain section:
these are the indexes that i already tried to use:
any help will be deeply appreciated what kind of index will be the best option or should i use another table structure ?
You should first simplify your query to understand your problem better. As it appears your problem is constrained to the ilan_gen1 table, the following query would also show you the same symptoms.:
SELECT * from ilan_gene1 WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
So the first thing to do is check that this is the case. If so, the simpler question is simply why does this query require a file sort on 3661 rows. Now the 'hepsi' index sort order is:
ilce->mahelle->durum->kategori->tip->ozelik
I've written it that way to emphasise that it is first sorted on 'ilce', then 'mahelle', then 'durum', etc. Note that your query does not specify the 'mahelle' value. So the best the index can do is lookup on 'ilce'. Now I don't know the heuristics of your data, but the next logical step in debugging this would be:
SELECT * from ilan_gene1 WHERE `ilan_genel`.`ilce` = '703'`
Does this return 3661 rows?
If so, you should be able to see what is happening. The database is using the hepsi index, to the best of it's ability, getting 3661 rows back then sorting those rows in order to eliminate values according to the other criteria (i.e. 'durum', 'kategori', 'tip').
The key point here is that if data is sorted by A, B, C in that order and B is not specified, then the best logical thing that can be done is: first a look up on A then a filter on the remaining values against C. In this case, that filter is performed via a file sort.
Possible solutions
Supply 'mahelle' (B) in your query.
Add a new index on 'ilan_gene1' that doesn't require 'mahelle', i.e. A->C->D...
Another tip
In case I have misdiagnosed your problem (easy to do when I don't have your system to test against), the important thing here is the approach to solving the problem. In particular, how to break a complicated query into a simpler query that produces the same behaviour, until you get to a very simple SELECT statement that demonstrates the problem. At this point, the answer is usually much clearer.

Dev Code - Understanding What I Am Seeing

This is where I start by saying I am not a developer and this is not my code. As the DBA though it has shown up on plate from a performance perspective. The execution plan shows me that there are CI scans for Table2 aliased as D and Table2 aliased as E. Focusing on Table 2 aliased as E. The scan is coming from the subquery in the where clause for E.SEQ_NBR =
I am also seeing far more executions than need be. I know it depends on the exact index structure on the table, but at a high level is it likely that what I am seeing is a CI scan resulting from the aggregate (min) for every match it finds. Basically it is walking the table for the min SEQ_NBR for each match on EMPLID and other fields?
If likely, is it more a result of the manner in which it is written (I would think incorporating a CTE with some ROW_NUMBER logic would help) or lack of indexing? I am trying to avoid throwing an index at it "just because". I am getting hung up on that sub query in the where clause.
SELECT
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
,SUM(D.TL_QUANTITY) 'YTD_TL_QUANTITY'
FROM
Table1 B
,Table2 D
,Table2 E
WHERE
D.TRC = B.TRC
AND B.TL_ERNCD IN ( #0, #1, #2, #3, #4, #5, #6 )
AND D.EMPLID = E.EMPLID
AND D.EMPL_RCD = E.EMPL_RCD
AND D.DUR < = E.DUR
AND D.DUR > = '1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR)
AND E.SEQ_NBR =
( SELECT
MIN(EX.SEQ_NBR)
FROM
Table2 EX
WHERE
E.EMPLID = EX.EMPLID
AND E.EMPL_RCD = EX.EMPL_RCD
AND E.DUR = EX.DUR
)
AND B.EFFDT = ( SELECT
MAX(B_ED.EFFDT)
FROM
Table1 B_ED
WHERE
B.TRC = B_ED.TRC
AND B_ED.EFFDT < = GETDATE()
)
GROUP BY
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
The MIN operation has nothing to do with the CL scan. A MIN or Max is calculated using a sort. The problem is most likely the number of times the subquery is being executed. It has to loop through the subquery for every record returned in the parent query. A CTE may be helpful here depending on the size of Table2, but I don't think you need to worry about finding a replacement for the MIN() ... at least not yet.
Correlated subqueries are performance killers. Remove them and replace them with CTEs and JOINs or derived tables.
Try something like this (not tested)
SELECT
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
,SUM(D.TL_QUANTITY) 'YTD_TL_QUANTITY'
FROM Table1 B
JOIN Table2 D
ON D.TRC = B.TRC AND D.EMPLID = E.EMPLID
JOIN Table2 E
ON D.EMPL_RCD = E.EMPL_RCD AND D.DUR < = E.DUR
JOIN (SELECT MIN(EX.SEQ_NBR)FROM Table2) EX
ON E.EMPLID = EX.EMPLID
AND E.EMPL_RCD = EX.EMPL_RCD
AND E.DUR = EX.DUR
JOIN (SELECT MAX(B_ED.EFFDT)
FROM Table1
WHERE B_ED.EFFDT < = GETDATE()) B_ED
ON B.TRC = B_ED.TRC
WHERE B.TL_ERNCD IN ( #0, #1, #2, #3, #4, #5, #6 )
AND D.DUR > = '1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR)
As far as the implicit join syntax, do not allow anyone to ever do this again. It is a poor programming technique. As a DBA you can say what you will and will not allow in the database. Code review what is coming in and do not pass it until they remove the implicit syntax.
Why is is bad? In the first place you get accidental cross joins. Further, from a maintenance perspective, you can't tell if the cross join was accidental (and thus the query incorrect) or on purpose. This means the query with a cross join in it is unmaintainable.
Next, if you have to change some of the joins later to outer joins and do not fix all the implict ones at the same time, you can get incorrect results (which may not be noticed by an inexperienced developer. In SQL Server 2008 you cannot use the implicit syntax for an outer join, but it shouldn't have been used even as far back as SQl Server 2000 because Books Online (for SQL Server 2000) states that there are cases where it is misinterpreted. In other words, the syntax in unreliable for outer joins. There is no excuse ever for using an implicit join, you gain nothing from them over using an explicit join and they can create more problems.
You need to educate your developers and tell them that this code (which has been obsolete since 1992!) is not longer acceptable.
This a quick one, but this, CAST('1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR) AS DATETIME), it likely causing a table scan on Table2 E because the function likely has to be evaluated against each row.