Using the right MYSQL JOIN - mysql

I'm trying to get all the data from the match table, along with the currently signed up gamers of each type, experienced or not.
Gamers
(PK)Gamer_Id
Gamer_firstName,
Gamer_lastName,
Gamer experience(Y/N)
Gamer_matches
(PK)FK GamerId,
(PK)FK MatchId,
Gamer_score
Match
(PK)Match_Id,
ExperiencedGamers_needed,
InExperiencedGamers_needed
I've tried this query along with many others but it doesn't work, is it a bad join?
SELECT M.MatchId,M.ExperiencedGamers_needed,M.InExperiencedGamers_needed,
(SELECT COUNT(GM.GamerId)
FROM Gamers G, Gamers_matches GM
WHERE G.GamerId = GM.GamerId
AND G.experience = "Y"
AND GM.MatchId = M.MatchId
GROUP BY GM.MatchId)AS ExpertsSignedUp,
(SELECT COUNT(GM.GamerId)
FROM Gamers G, Gamers_matches GM
WHERE G.GamerId = GM.GamerId
AND G.experience = "N"
AND GM.MatchId = M.MatchId
GROUP BY GM.MatchId) AS NovicesSignedUp
FROM MATCHES M

What you've written is called a correlated subquery which forces SQL to re-execute the subquery for each row fetched from Matches. It can be made to work, but it's pretty inefficient. In some complex queries it may be necessary, but not in this case.
I would solve this query this way:
SELECT M.MatchId, M.ExperiencedGamers_needed,M.InExperiencedGamers_needed,
SUM(G.experience = 'Y') AS ExpertsSignedUp,
SUM(G.experience = 'N') AS NovicesSignedUp
FROM MATCHES M
LEFT OUTER JOIN (Gamer_matches GM
INNER JOIN Gamers G ON G.GamerId = GM.GamerId)
ON M.MatchId = GM.MatchId
GROUP BY M.MatchId;
Here it outputs only one row per Match because of the GROUP BY at the end.
There's no subquery to re-execute many times, it's just joining Matches to the respective rows in the other tables once. But I use an outer join in case a Match has zero players of either type signed up.
Then instead of using COUNT() I use a trick of MySQL and use SUM() with a boolean expression inside the SUM() function. Boolean expressions in MySQL return 0 or 1 (or NULL if an operand is NULL). The SUM() of these is the same as the COUNT() of the rows where the expression is true. This way I can get the "count" of both experts and novices while scanning the Gamers table only once. One caveat: for a Match with nobody signed up, the outer join supplies only NULLs, so SUM() returns NULL rather than 0; wrap it in COALESCE(..., 0) if you need a zero.
P.S. MySQL is working in a non-standard way to return 0 or 1 from a boolean expression. Standard ANSI SQL does not support this, nor do many other brands of RDBMS. Standardly, a boolean expression returns a boolean, not an integer.
But you can use a more verbose expression if you need to write standard SQL for portability:
SUM(CASE G.experience WHEN 'Y' THEN 1 WHEN 'N' THEN 0 END) AS ExpertsSignedUp
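To see the conditional-sum idea run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite also evaluates comparisons to 0/1, but it is a different engine). The table layout, the lowercase column names and the sample rows are made up for illustration, and the join is written as two plain LEFT JOINs instead of the nested form above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gamers (gamer_id INTEGER PRIMARY KEY, experience TEXT);
CREATE TABLE matches (match_id INTEGER PRIMARY KEY);
CREATE TABLE gamer_matches (gamer_id INTEGER, match_id INTEGER,
                            PRIMARY KEY (gamer_id, match_id));
INSERT INTO gamers VALUES (1, 'Y'), (2, 'Y'), (3, 'N');
INSERT INTO matches VALUES (10), (20);          -- match 20 has no sign-ups
INSERT INTO gamer_matches VALUES (1, 10), (2, 10), (3, 10);
""")

rows = conn.execute("""
SELECT m.match_id,
       -- boolean expression summed directly (non-standard SQL)
       COALESCE(SUM(g.experience = 'Y'), 0) AS experts,
       -- portable CASE form of the same idea
       COALESCE(SUM(CASE g.experience WHEN 'N' THEN 1 ELSE 0 END), 0) AS novices
FROM matches m
LEFT JOIN gamer_matches gm ON gm.match_id = m.match_id
LEFT JOIN gamers g ON g.gamer_id = gm.gamer_id
GROUP BY m.match_id
ORDER BY m.match_id
""").fetchall()

print(rows)
```

The COALESCE calls are there for the empty match: without them the boolean SUM() would come back as NULL for match 20.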

Related

Optimize derived table in select

I have sql query:
SELECT tsc.Id
FROM TEST.Services tsc,
(
select * from DICT.Change sp
) spc
where tsc.serviceId = spc.service_id
and tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
and tsc.startDate > GREATEST(spc.StartTime, spc.startDate)
group by tsc.Id;
This query is very, very slow.
Explain:
Can this be optimized? How can I rewrite it without the subquery?
What is the point of this query? Why the CROSS JOIN operation? Why do we need to return multiple copies of id column from Services table? And what are we doing with the millions of rows being returned?
Absent a specification, an actual set of requirements for the resultset, we're just guessing at it.
To answer your questions:
Yes, the query could be "optimized" by rewriting it to the resultset that is actually required, and do it much more efficiently than the monstrously hideous SQL in the question.
Some suggestions: ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead.
(With no join predicates, it's a "cross" join: every row from the left side is matched to every row from the right side.) I recommend including the CROSS keyword as an indication to future readers that the absence of an ON clause (or, join predicates in the WHERE clause) is intentional, and not an oversight.
I'd also avoid an inline view, unless there is a specific reason for one.
UPDATE
The query in the question is updated to include some predicates. Based on the updated query, I would write it like this:
SELECT tsc.id
FROM TEST.Services tsc
JOIN DICT.Change spc
ON tsc.serviceid = spc.service_id
AND tsc.startdate > spc.starttime
AND tsc.startdate > spc.startdate
AND ( tsc.planid = spc.plan_id
OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
)
Ensure that the query is making use of suitable index by looking at the output of EXPLAIN to see the execution plan, in particular, which indexes are being used.
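If you want to convince yourself that the OR form of the plan predicate filters the same rows as the original if() form, including the NULL cases, a quick check like the following can help. It is only a sketch: it uses Python's sqlite3 module instead of MySQL, spells if() as a CASE expression, and the pairs table with its plan_a / plan_b columns is a made-up stand-in for tsc.PlanId and spc.plan_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- plan_a plays the role of tsc.PlanId, plan_b the role of spc.plan_id
CREATE TABLE pairs (plan_a INTEGER, plan_b INTEGER);
INSERT INTO pairs VALUES
  (1, 1), (1, 2), (1, -1), (-1, -1),
  (1, NULL), (NULL, 1), (NULL, -1), (NULL, NULL);
""")

# Original condition, with if(cond, x, y) written as CASE WHEN cond THEN x ELSE y END.
original = conn.execute("""
SELECT rowid FROM pairs
WHERE plan_a = CASE WHEN plan_b = -1 THEN plan_a ELSE plan_b END
ORDER BY rowid
""").fetchall()

# Rewritten condition without the conditional expression.
rewritten = conn.execute("""
SELECT rowid FROM pairs
WHERE plan_a = plan_b OR (plan_a IS NOT NULL AND plan_b = -1)
ORDER BY rowid
""").fetchall()

print(original, rewritten)
```

Both queries keep only the pairs (1, 1), (1, -1) and (-1, -1); every pair involving a NULL plan_a is filtered out by both forms.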
Some notes:
If there are multiple rows from spc that "match" a row from tsc, the query will return duplicate values of tsc.id. (It's not clear why or if we need to return duplicate values. If we need to count the number of copies of each tsc.id, we could do that in the query, returning distinct values of tsc.id along with a count. If we don't need duplicates, we could return just a distinct list.)
GREATEST function will return NULL if any of the arguments are null. If the condition we need is "a > GREATEST(b,c)", we can specify "a > b AND a > c".
Also, this condition:
tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
can be re-written to return an equivalent result (I'm suspicious about the actual specification, and whether this original condition actually satisfies that adequately. Without example data and sample of expected output, we have to rely on the SQL as the specification, so we honor that in the rewrite.)
If we don't need to return duplicate values of tsc.id, assuming id is unique in TEST.Services, we could also write
SELECT tsc.id
FROM TEST.Services tsc
WHERE EXISTS
( SELECT 1
FROM DICT.Change spc
WHERE spc.service_id = tsc.serviceid
AND spc.starttime < tsc.startdate
AND spc.startdate < tsc.startdate
AND ( ( spc.plan_id = tsc.planid )
OR ( spc.plan_id = -1 AND tsc.planid IS NOT NULL )
)
)
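The difference between the JOIN form and the EXISTS form shows up as soon as one service has more than one matching change row. A small sketch, again with Python's sqlite3 module standing in for MySQL; the services and changes tables and their rows are simplified, hypothetical versions of TEST.Services and DICT.Change with only the service id predicate kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE services (id INTEGER PRIMARY KEY, service_id INTEGER);
CREATE TABLE changes (service_id INTEGER);
INSERT INTO services VALUES (1, 100), (2, 200);
INSERT INTO changes VALUES (100), (100), (300);   -- two changes for service 100
""")

# JOIN: one output row per matching (service, change) pair.
joined = conn.execute("""
SELECT s.id
FROM services s
JOIN changes c ON c.service_id = s.service_id
ORDER BY s.id
""").fetchall()

# EXISTS: each service appears at most once, however many changes match.
deduplicated = conn.execute("""
SELECT s.id
FROM services s
WHERE EXISTS (SELECT 1 FROM changes c WHERE c.service_id = s.service_id)
ORDER BY s.id
""").fetchall()

print(joined, deduplicated)
```

Here the join returns id 1 twice while the EXISTS query returns it once, which is why the original query needed its group by.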

INNER JOIN ON IF Statement?

In Access I have a query that uses an Inner Join with an IF statement. Can I use this in MySQL? It doesn't seem to work as intended, and I'm not sure whether something is missing or whether MySQL doesn't allow this method.
SELECT Matrix.Model, Payment.Cost
FROM Payment
INNER JOIN Matrix ON if(Left(Matrix.Model,1)='S', Left(Matrix.Model,2), Left(Matrix.Model,1) = Matrix.PreFix)
GROUP BY Matrix.Model
I seem to get everything returned that doesn't start with an S as though the method isn't valid but not causing a major error.
This is your query (using table aliases to make it more readable):
SELECT m.Model, p.Cost
FROM Payment p INNER JOIN
Matrix m
ON if(Left(m.Model,1) = 'S', Left(m.Model, 2), Left(m.Model, 1) = m.PreFix)
GROUP BY m.Model;
Note that the if() can return two different things: Left(m.Model, 2), which is a string, or Left(m.Model, 1) = m.PreFix, which is a boolean. The ON clause evaluates the result as a number, so the string from the "then" branch is converted to a number. Strings that start with non-numeric characters (such as S) are converted to 0, which is false.
I would write this without the if():
SELECT m.Model, p.Cost
FROM Payment p INNER JOIN
Matrix m
ON (m.model like 'S%' and m.Prefix = Left(m.Model, 2)) or
(m.model not like 'S%' and m.Prefix = Left(m.Model, 1))
GROUP BY m.Model;
Also, the column p.cost will be a value from an indeterminate row: it is neither in the group by nor inside an aggregation function. Most other databases reject this construct. It relies on a MySQL extension to GROUP BY (which MySQL itself rejects when the ONLY_FULL_GROUP_BY SQL mode is enabled), and its use is generally discouraged unless you really understand what you are doing.
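For a quick sanity check of the rewritten prefix condition, here is a sketch that runs it against a few sample rows with Python's sqlite3 module. Assumptions: SQLite has no Left() function, so substr(x, 1, n) is used in its place, the Matrix rows are invented, and the Payment table is left out because the condition only touches Matrix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Matrix (Model TEXT, PreFix TEXT);
INSERT INTO Matrix VALUES
  ('SA123', 'SA'),   -- starts with S, two-character prefix matches
  ('SB900', 'S'),    -- starts with S, but prefix is only one character
  ('A100',  'A'),    -- other model, one-character prefix matches
  ('B200',  'X');    -- other model, prefix does not match
""")

matched = [row[0] for row in conn.execute("""
SELECT m.Model
FROM Matrix m
WHERE (m.Model LIKE 'S%'     AND m.PreFix = substr(m.Model, 1, 2))
   OR (m.Model NOT LIKE 'S%' AND m.PreFix = substr(m.Model, 1, 1))
ORDER BY m.Model
""")]

print(matched)
```

Only 'A100' and 'SA123' survive, so models starting with S are compared on two characters and all others on one.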

MySQL: Subquery returns more than 1 row

I know this has been asked plenty of times before, but I can't find an answer that is close to mine.
I have the following query:
SELECT c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName, c.cases_timestamp, o.organisation_name
FROM db_cases c, db_custinfo ci, db_organisation o
WHERE c.userInfo_ID = ci.userinfo_ID AND c.cases_status = '2'
AND organisation_name = (
SELECT organisation_name
FROM db_sites s, db_cases c
WHERE organisation_ID = '111'
)
AND s.sites_site_ID = c.sites_site_ID)
What I am trying to do is get the cases where the sites_site_ID defined in the case also appears in the db_sites table alongside the organisation_ID I want to filter by (organisation_ID = '111'), but I am getting the response from MySQL stated in the question title.
I hope this makes sense, and I would appreciate any help on this one.
Thanks.
As the error states, your subquery returns more than one row, which it cannot do in this situation. If this is not the expected result, you really should investigate why it occurs. But if you know this will happen and want only the first result, use LIMIT 1 to limit the results to one row.
SELECT organisation_name
FROM db_sites s, db_cases c
WHERE organisation_ID = '111'
LIMIT 1
Well the problem is, obviously, that your subquery returns more than one row which is invalid when using it as a scalar subquery such as with the = operator in the WHERE clause.
Instead you could do an inner join on the subquery which would filter your results to only rows that matched the ON clause. This will get you all rows that match, even if there is more than one returned in the subquery.
UPDATE:
You're likely getting more than one row from your subquery because you're doing a cross join on the db_sites and db_cases table. You're using the old-style join syntax and then not qualifying any predicate to join the tables on in the WHERE clause. Using this old style of joining tables is not recommended for this very reason. It would be better if you explicitly stated what kind of join it was and how the tables should be joined.
Good pages on joins:
http://dev.mysql.com/doc/refman/5.0/en/join.html (for the right syntax)
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html (for the differences between the types of joins)
I was battling this for an hour, and overcomplicated it completely. Sometimes a quick break and writing it out on an online forum can solve it for you ;)
Here is the query as it should be.
SELECT c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName, c.cases_timestamp, c.sites_site_ID
FROM db_cases c, db_custinfo ci, db_sites s
WHERE c.userInfo_ID = ci.userinfo_ID AND c.cases_status = '2' AND (s.organisation_ID = '111' AND s.sites_site_ID = c.sites_site_ID)
Let me rewrite what you have posted:
SELECT
c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName,
c.cases_timestamp, c.sites_site_ID
FROM
db_cases c
JOIN
db_custinfo ci ON c.userInfo_ID = ci.userinfo_ID and c.cases_status = '2'
JOIN
db_sites s ON s.sites_site_ID = c.sites_site_ID and s.organisation_ID = '111'
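A trimmed-down, runnable sketch of the explicit JOIN version, using Python's sqlite3 module in place of MySQL. The sample rows are invented, and db_custinfo is left out to keep it short; only the columns needed for the site and organisation filter are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db_sites (sites_site_ID INTEGER PRIMARY KEY, organisation_ID TEXT);
CREATE TABLE db_cases (cases_ID INTEGER PRIMARY KEY,
                       cases_status TEXT,
                       sites_site_ID INTEGER);
INSERT INTO db_sites VALUES (1, '111'), (2, '111'), (3, '222');
INSERT INTO db_cases VALUES
  (10, '2', 1),   -- status 2, site of organisation 111
  (11, '2', 2),   -- status 2, another site of organisation 111
  (12, '2', 3),   -- status 2, but organisation 222
  (13, '1', 1);   -- organisation 111, but wrong status
""")

# Each case joins to exactly one site, so several sites per organisation
# cause no "more than 1 row" problem.
case_ids = [row[0] for row in conn.execute("""
SELECT c.cases_ID
FROM db_cases c
JOIN db_sites s ON s.sites_site_ID = c.sites_site_ID
WHERE c.cases_status = '2'
  AND s.organisation_ID = '111'
ORDER BY c.cases_ID
""")]

print(case_ids)
```

Organisation 111 has two sites here, which is exactly the situation that broke the scalar subquery, yet the join simply returns both matching cases.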

Smart counting of id in join mysql

Upon running this query, the COUNT function doesn't filter by the 'check_if_new_customer' flag. I read in this article: http://dev.mysql.com/tech-resources/articles/wizard/page4.html that SUM can be used instead of COUNT in some cases to get more accurate results, however when I try that, I get something very different, it seems to show massive doubling of numbers. I think this may be because I'm summing the UUID that is in the id field instead of counting at that point. Any suggestions on what I could put there to get a count of all of the existing customers vs the new customers?
SELECT
YEAR(so.date_entered),
so.technical_address_country,
so.technical_address_state,
COUNT(so.id) as all_sales,
COUNT(mf.id) as all_jobs,
SUM(so.total_value) as all_value,
COUNT(IF(so.check_if_new_customer=1,so.id,0)) as sales_order_new,
SUM(IF(so.check_if_new_customer = 1,so.total_value,0)) as total_value_new,
COUNT(IF(so.check_if_new_customer=1,mf.id,0)) as jobs_new,
COUNT(IF(so.check_if_new_customer=0,so.id,0)) as sales_order_existing,
SUM(IF(so.check_if_new_customer = 0,so.total_value,0)) as total_value_existing,
COUNT(IF(so.check_if_new_customer=0,mf.id,0)) as jobs_existing,
SUM(IF(so.check_if_new_customer=0,mf.id,0)) as jobs_existing_t
FROM
sugarcrm2.so_order so
LEFT JOIN
sugarcrm2.mf_job mf on so.id = mf.sales_order_id
WHERE
so.date_entered > "2011-10-30" AND
so.technical_address_country IS NOT NULL AND
so.technical_address_state IS NOT NULL AND
so.deleted = 0 AND
so.has_been_promoted = 1
GROUP BY
YEAR(so.date_entered),
so.technical_address_country,
so.technical_address_state
ORDER BY
so.technical_address_country, so.technical_address_state
If you want to use SUM() like COUNT() you will need to pass it either a 1 or 0 so that all of the 1's will sum up to your desired count. So in your example, if you want a sum of all the new jobs you would do this:
SUM(IF(so.check_if_new_customer=1,1,0)) as jobs_new
or if so.check_if_new_customer always returns a 1 or 0 you could alternatively do this:
SUM(so.check_if_new_customer) as jobs_new
COUNT() returns the number of records for which its argument, if specified, is non-NULL. Since its argument in this case is the result of an IF() expression (which evaluates to some column's value if true and 0 if false), virtually every record will be counted irrespective of the test condition.
SUM(), as its name suggests, sums the values of its argument. In this case, it would sum the values of the referenced column whenever the test condition is true.
Apparently neither is what you're after, although your question is rather ambiguous as to what exactly you do want. At a guess, you might want something like:
SUM(so.check_if_new_customer)
I want to sum +1 for every time the joined table's id field is not null.
SUM(IF(so.check_if_new_customer=0 AND mf.id IS NOT NULL,1,0)) as jobs_existing
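The COUNT() versus SUM() behaviour described above is easy to reproduce on a three-row table. This is a sketch with Python's sqlite3 module rather than MySQL, so IF(cond, a, b) is written as CASE WHEN cond THEN a ELSE b END; the so_order rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE so_order (id TEXT PRIMARY KEY, check_if_new_customer INTEGER);
INSERT INTO so_order VALUES ('a', 1), ('b', 0), ('c', 1);   -- two new customers
""")

row = conn.execute("""
SELECT
  -- ELSE 0 is still non-NULL, so every row is counted
  COUNT(CASE WHEN check_if_new_customer = 1 THEN id ELSE 0 END),
  -- no ELSE means NULL for non-matching rows, which COUNT skips
  COUNT(CASE WHEN check_if_new_customer = 1 THEN id END),
  -- summing 1/0 counts the matching rows
  SUM(CASE WHEN check_if_new_customer = 1 THEN 1 ELSE 0 END),
  -- works only because the flag itself is always 0 or 1
  SUM(check_if_new_customer)
FROM so_order
""").fetchone()

print(row)
```

The first column comes out as 3 (all rows) and the other three as 2, which matches the explanation: COUNT() only cares whether its argument is NULL.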

SQL Join only returning 1 row

Not quite sure what I'm missing, but my SQL statement is only returning one row.
SELECT
tl.*,
(tl.topic_total_rating/tl.topic_rates) as topic_rating,
COUNT(pl.post_id) - 1 as reply_count,
MIN(pl.post_time) AS topic_time,
MAX(pl.post_time) AS topic_bump
FROM topic_list tl
JOIN post_list pl
ON tl.topic_id=pl.post_parent
WHERE
tl.topic_board_link = %i
AND topic_hidden != 1
ORDER BY %s
I have two tables (post_list and topic_list), and post_list's post_parent links to a topic_list's topic_id.
Instead of returning all the topics (where their board's topic_board_link is n), it only returns one topic.
You would normally need a GROUP BY clause in there. MySQL has different rules from Standard SQL on the subject of when GROUP BY is needed. This is therefore closer to Standard SQL:
SELECT tl.*,
(tl.topic_total_rating/tl.topic_rates) AS topic_rating,
COUNT(pl.post_id) - 1 AS reply_count,
MIN(pl.post_time) AS topic_time,
MAX(pl.post_time) AS topic_bump
FROM topic_list AS tl
JOIN post_list AS pl ON tl.topic_id = pl.post_parent
WHERE tl.topic_board_link = ? -- %i
AND tl.topic_hidden != 1
GROUP BY tl.col1, ..., topic_rating
ORDER BY ? -- %s
In Standard SQL, you would have to list every column in topic_list, plus the non-aggregate value topic_rating (and you might have to list the expression rather than the display label or column alias in the select list).
You also have a restriction condition on 'topic_board_link' which might be limiting your result set to one group. You cannot normally use a placeholder in the ORDER BY clause, either.
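The effect of the missing GROUP BY can be reproduced in a few lines. This sketch uses Python's sqlite3 module, which (like MySQL in its permissive mode) accepts an aggregate next to a bare column without a GROUP BY; that similarity is an assumption about the two engines behaving alike here, and the tables are cut down to hypothetical minimal versions of topic_list and post_list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topic_list (topic_id INTEGER PRIMARY KEY);
CREATE TABLE post_list (post_id INTEGER PRIMARY KEY, post_parent INTEGER);
INSERT INTO topic_list VALUES (1), (2);
INSERT INTO post_list VALUES (1, 1), (2, 1), (3, 2);   -- topic 1 has one reply
""")

base = """
SELECT tl.topic_id, COUNT(pl.post_id) - 1 AS reply_count
FROM topic_list tl
JOIN post_list pl ON tl.topic_id = pl.post_parent
"""

# Without GROUP BY the aggregate collapses all joined rows into one.
ungrouped = conn.execute(base).fetchall()

# With GROUP BY there is one row per topic.
grouped = conn.execute(
    base + "GROUP BY tl.topic_id ORDER BY tl.topic_id"
).fetchall()

print(len(ungrouped), grouped)
```

The ungrouped query yields a single row for the whole board, while the grouped one yields a row per topic with its own reply count.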