I have a script of the following structure:
SELECT SUM(CASE WHEN pf.info IS NOT NULL THEN 1 ELSE 0 END)
FROM summary s
LEFT JOIN (SELECT id, info FROM items GROUP BY id) pf ON s.id=pf.id
GROUP BY s.date
What I want is to count the ids that are in 'summary' and also present in 'items'. The same id is repeated several times in 'items', which is why I use GROUP BY.
This script works as I want, but it is extremely slow, much slower than a straightforward LEFT JOIN (which counts each id several times). That doesn't seem to make sense, since I only need a subset of those rows and it should be less work.
So the question is: how to restructure the query to make it quicker?
Use count(distinct ...):
SELECT count(distinct s.id)
FROM summary s
JOIN items i ON s.id = i.id
I don't understand why you are grouping by s.date - there's no clue in your question as to why, so if it's not a mistake and you need to group by date, use this:
SELECT s.date, count(distinct s.id)
FROM summary s
JOIN items i ON s.id = i.id
GROUP BY s.date
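One difference from the original LEFT JOIN query is worth checking on toy data: with an inner JOIN, a date whose ids have no match in 'items' disappears from the result instead of showing 0. The sketch below uses Python's built-in sqlite3 module (an assumption, since the question does not name the database) and invented rows; the variant with LEFT JOIN and COUNT(DISTINCT i.id) keeps the zero-count dates.

```python
import sqlite3

# Toy data, invented for illustration: id 1 appears twice in items, id 3 never.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE summary (id INTEGER, date TEXT);
    CREATE TABLE items (id INTEGER, info TEXT);
    INSERT INTO summary VALUES (1, '2020-01-01'), (2, '2020-01-01'), (3, '2020-01-02');
    INSERT INTO items VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

# Inner join: dates without any matching id are dropped.
inner_rows = conn.execute("""
    SELECT s.date, COUNT(DISTINCT s.id)
    FROM summary s
    JOIN items i ON s.id = i.id
    GROUP BY s.date
    ORDER BY s.date
""").fetchall()

# Left join, counting the items side: unmatched dates are kept with a count of 0.
left_rows = conn.execute("""
    SELECT s.date, COUNT(DISTINCT i.id)
    FROM summary s
    LEFT JOIN items i ON s.id = i.id
    GROUP BY s.date
    ORDER BY s.date
""").fetchall()

print(inner_rows)
print(left_rows)
```

Which of the two is wanted depends on whether dates with no matches should appear in the output.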
Related
The query below grabs some information about a category of toys and shows the most recent sale price for three levels of condition (e.g., Brand New, Used, Refurbished). The price for each sale is almost always different. One other thing: the sales table row ids are not necessarily in chronological order (e.g., a sale with an id of 5 could have happened later than a sale with an id of 10).
This query works but is not performant. It runs in a manageable amount of time, usually about 1s. However, I need to add yet another left join to include some more data, which causes the query time to balloon up to about 9s, no bueno.
Here is the working but nonperformant query:
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN (
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
) AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
But like I said it's slow. The sales table has about 200k rows.
What I tried to do was create the subquery as a view, e.g.,
CREATE VIEW sales_view AS
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
Then replace the subquery with the view, like
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN sales_view AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
Unfortunately, this change causes the query to no longer grab the most recent sale, and the sales price it returns is no longer the most recent.
Why is it that the table view doesn't return the same result as the same select as a subquery?
After reading just about every top-n-per-group stackoverflow question and blog article I could find, getting a query that actually worked was fantastic. But now that I need to extend the query one more step I'm running into performance issues. If anybody wants to sidestep the above question and offer some ways to optimize the original query, I'm all ears!
Thanks for any and all help.
The solution to the subquery performance issue was to use the answer provided here: Groupwise maximum
I thought that this approach could only be used when querying a single table, but indeed it works even when you've joined many other tables. You just have to left join the same table twice using the s.date_sold < s2.date_sold join condition and make sure the where clause looks for the null value in the second table's id column.
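A minimal sketch of that self-join, run through Python's sqlite3 module on invented rows. The sales column names come from the query above, except sale_id, which is a hypothetical name for the table's id column. Sale 5 is deliberately later than sale 10, matching the note that ids are not chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,   -- hypothetical column name
        catalog_product_id INTEGER,
        condition_id INTEGER,
        date_sold TEXT,
        sold_price REAL
    );
    INSERT INTO sales VALUES
        (5,  1, 1, '2014-03-01', 12.0),
        (10, 1, 1, '2014-01-15',  9.5),
        (11, 1, 2, '2014-02-01',  7.0);
""")

# For each row s, look for a later sale s2 of the same product and condition.
# Rows with no later sale (s2 is all NULL) are the most recent ones.
latest = conn.execute("""
    SELECT s.catalog_product_id, s.condition_id, s.date_sold, s.sold_price
    FROM sales AS s
    LEFT JOIN sales AS s2
           ON s2.catalog_product_id = s.catalog_product_id
          AND s2.condition_id = s.condition_id
          AND s.date_sold < s2.date_sold
    WHERE s2.sale_id IS NULL
    ORDER BY s.catalog_product_id, s.condition_id
""").fetchall()

print(latest)
```

If the sales need filtering (for example invalid = 0), the filter would have to apply to both s and s2, with the s2 part placed in the ON clause so it does not turn the LEFT JOIN into an inner join.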
I'm trying to get the number of rows from two different tables with two LEFT JOINs in a MySQL query. It works well when I have a COUNT on one table, like this:
SELECT a.title, a.image, COUNT(o.id) AS occasions
FROM activity a
LEFT JOIN occasion AS o ON a.id = o.activity_id
WHERE a.user_id = 1
GROUP BY a.id
ORDER BY a.created_at DESC
LIMIT 50
Here, everything works and I get the correct number of "occasions".
But when I try to add a second COUNT with an additional LEFT JOIN, the result of the second COUNT is wrong:
SELECT a.title, a.image, COUNT(o.id) AS occasions, COUNT(au.id) AS users
FROM activity a
LEFT JOIN occasion AS o ON a.id = o.activity_id
LEFT JOIN activity_user AS au ON a.id = au.activity_id
WHERE a.user_id = 4
GROUP BY a.id
ORDER BY a.created_at DESC
LIMIT 50
Here, I get the correct number of "occasions", but "users" seems to be a copy of the "occasions" count, which is wrong.
So my question is: how can I fix this query so that the two COUNTs work together?
COUNT() counts non-NULL values. The simple way to fix your query is to use COUNT(DISTINCT):
SELECT a.title, a.image,
COUNT(DISTINCT o.id) AS occasions, COUNT(DISTINCT au.id) AS users
. . .
And this will probably work. However, it creates an intermediate table that is the Cartesian product of the two tables (for each title). That could grow very big. The more scalable solution is to use subqueries and aggregate before joining.
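The "aggregate before joining" idea can be sketched as follows. This is an illustration on invented rows using Python's sqlite3 module rather than MySQL, with the table and column names taken from the question; the alias n is made up. With 3 occasions and 2 users for one activity, the plain double join produces 3 × 2 = 6 rows, so both counts come out as 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE occasion (id INTEGER PRIMARY KEY, activity_id INTEGER);
    CREATE TABLE activity_user (id INTEGER PRIMARY KEY, activity_id INTEGER);
    INSERT INTO activity VALUES (1, 'hike'), (2, 'swim');
    INSERT INTO occasion (activity_id) VALUES (1), (1), (1);
    INSERT INTO activity_user (activity_id) VALUES (1), (1);
""")

# Joining both child tables first multiplies the rows before counting.
naive = conn.execute("""
    SELECT a.title, COUNT(o.id), COUNT(au.id)
    FROM activity a
    LEFT JOIN occasion o ON a.id = o.activity_id
    LEFT JOIN activity_user au ON a.id = au.activity_id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Counting inside each subquery first leaves one row per activity to join.
pre_aggregated = conn.execute("""
    SELECT a.title, COALESCE(o.n, 0), COALESCE(au.n, 0)
    FROM activity a
    LEFT JOIN (SELECT activity_id, COUNT(*) AS n
               FROM occasion GROUP BY activity_id) o
           ON o.activity_id = a.id
    LEFT JOIN (SELECT activity_id, COUNT(*) AS n
               FROM activity_user GROUP BY activity_id) au
           ON au.activity_id = a.id
    ORDER BY a.id
""").fetchall()

print(naive)
print(pre_aggregated)
```

The COALESCE calls turn the NULL that a LEFT JOIN yields for an activity without children into 0.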
The LEFT JOIN on activity_user may be limiting your result if the database cannot find related rows. You could try writing it as LEFT OUTER JOIN, although in MySQL the OUTER keyword is optional and the two forms are equivalent, so this alone is unlikely to change the counts.
I'm having trouble finding an example similar to what I'm trying to achieve. I have 3 tables. From the first table I want to get the linking ID number. From the second table I want to find the rows with the same ID and add up a column of numbers there. From the 3rd table, which holds text, I want to group all the text together where the ID matches the main ID number... and return all this in one go. My diagram should show what I mean:
So I have 2 queries that each return part of the results on their own, but I'm struggling to combine them into 1 single query.
SELECT ticket_charges.ticket_id
, sum(ticket_charges.charge_time) AS Seconds
FROM
ticket_charges
LEFT OUTER JOIN tickets
ON ticket_charges.ticket_id = tickets.id
GROUP BY
ticket_charges.ticket_id
, tickets.id
The 77 and 937 for ticket ID 3 have been added up correctly!!
SELECT tickets.id AS `Ticket Number`
, left(tickets_messages.message, 500) AS `Ticket Message`
FROM
tickets
INNER JOIN tickets_messages
ON tickets.id = tickets_messages.id
GROUP BY
tickets_messages.ticket_id
, tickets.id
The messages are joined together correctly.
I've tried some concatenation on messages, selects within selects, different ways to group, a couple of sums, etc., but I just can't get the correct results from both queries in 1 single query. Either the summed numbers from "charge_time" are very wrong and bear no resemblance to anything, or I end up with hundreds of "message" rows and strange numbers for "charge_time".
FYI.. If I try this, I get "Sub query returned more than 1 row" but it's what I thought I should be doing.
SELECT ticket_charges.ticket_id
, sum(ticket_charges.charge_time) AS Seconds
FROM
ticket_charges
LEFT OUTER JOIN tickets
ON ticket_charges.ticket_id = tickets.id
Where (SELECT left(tickets_messages.message, 500)
FROM
tickets
INNER JOIN tickets_messages
ON tickets.id = tickets_messages.id
GROUP BY
tickets.id)
GROUP BY
ticket_charges.ticket_id
, tickets.id
If you really need to do that with a single query, the solution is to use a subquery in one of the joins.
SELECT t.id, t.person_id, SUM(tc.charge_time), mc.concat
FROM tickets t
INNER JOIN ticket_charges tc ON tc.ticket_id = t.id
INNER JOIN (
SELECT ticket_id, GROUP_CONCAT(message SEPARATOR ' ') as concat
FROM tickets_messages
GROUP BY ticket_id) AS mc
ON mc.ticket_id = t.id
GROUP BY t.id
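A runnable sketch of why the subquery matters, using Python's sqlite3 module and invented messages; the charges 77 and 937 for ticket 3 are the ones from the question. Note that SQLite spells the separator as a second argument, GROUP_CONCAT(message, ' '), where MySQL uses the SEPARATOR keyword. Here both child tables are aggregated in subqueries, a slight variation on the query above, so neither can multiply the other's rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (id INTEGER PRIMARY KEY);
    CREATE TABLE ticket_charges (ticket_id INTEGER, charge_time INTEGER);
    CREATE TABLE tickets_messages (ticket_id INTEGER, message TEXT);
    INSERT INTO tickets VALUES (3);
    INSERT INTO ticket_charges VALUES (3, 77), (3, 937);
    INSERT INTO tickets_messages VALUES (3, 'first'), (3, 'second');
""")

# Joining charges and messages directly pairs every charge with every message,
# so each charge is summed once per message.
inflated = conn.execute("""
    SELECT SUM(tc.charge_time)
    FROM tickets t
    LEFT JOIN ticket_charges tc ON tc.ticket_id = t.id
    LEFT JOIN tickets_messages tm ON tm.ticket_id = t.id
    GROUP BY t.id
""").fetchone()[0]

# Aggregating each child table on its own gives one row per ticket to join.
ticket_id, seconds, messages = conn.execute("""
    SELECT t.id, c.seconds, m.messages
    FROM tickets t
    LEFT JOIN (SELECT ticket_id, SUM(charge_time) AS seconds
               FROM ticket_charges GROUP BY ticket_id) c
           ON c.ticket_id = t.id
    LEFT JOIN (SELECT ticket_id, GROUP_CONCAT(message, ' ') AS messages
               FROM tickets_messages GROUP BY ticket_id) m
           ON m.ticket_id = t.id
""").fetchone()

print(inflated, seconds, messages)
```

The order of the concatenated messages is not guaranteed here; MySQL allows an ORDER BY inside GROUP_CONCAT if a fixed order is needed.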
Try this query -
SELECT
t.id,
t.person_id,
SUM(tc.charge_time) Seconds,
GROUP_CONCAT(LEFT(tm.message, 20)) Message
FROM
tickets t
LEFT JOIN ticket_charges tc
ON tc.ticket_id = t.id
LEFT JOIN tickets_messages tm
ON tm.ticket_id = t.id
GROUP BY
t.id;
Note that I used 'LEFT(tm.message, 20)' because the GROUP_CONCAT function has a length limit, set by group_concat_max_len.
I have these tables and queries as defined in sqlfiddle.
First my problem was to group people showing LEFT JOINed visits rows with the newest year. That I solved using subquery.
Now my problem is that that subquery is not using INDEX defined on visits table. That is causing my query to run nearly indefinitely on tables with approx 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in visits table.
Unfortunately on large tables it gets real sloooow because it's not using INDEX in subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it uses non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions, and do not depend on the grouping items). This can give indeterminate (semi-random) results.
Second, (to avoid the indeterminate results) you have added an ORDER BY inside a subquery. Non-standard or not, nothing in the MySQL documentation says this should work as expected. So it may be working now, but it may not work in the not so distant future, when you upgrade to MySQL version X (where the optimizer will be clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
The: SQL-fiddle
A compound index on (id_people, year) would help efficiency.
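A sketch of the index plus the MAX(year) lookup, checked on invented rows with Python's sqlite3 module. The index name is made up, the note column is assumed from the other answers, and the join is written as two chained LEFT JOINs, which should be equivalent to the nested form above in a dialect-neutral spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (
        id INTEGER PRIMARY KEY,
        id_people INTEGER,
        year INTEGER,
        note TEXT
    );
    -- Compound index: covers both the GROUP BY id_people / MAX(year) step
    -- and the lookup of the matching visit row.
    CREATE INDEX visits_people_year ON visits (id_people, year);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO visits (id_people, year, note) VALUES
        (1, 2010, 'old'), (1, 2012, 'new');
""")

rows = conn.execute("""
    SELECT p.name, v.year, v.note
    FROM people AS p
    LEFT JOIN (SELECT id_people, MAX(year) AS year
               FROM visits GROUP BY id_people) AS vm
           ON vm.id_people = p.id
    LEFT JOIN visits AS v
           ON v.id_people = vm.id_people AND v.year = vm.year
    ORDER BY p.id
""").fetchall()

print(rows)
```

Bob has no visits and still appears once, with NULLs (None in Python) for the visit columns.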
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0
But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so that you get just one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
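The tie problem can be reproduced on a few invented rows. In this sketch (Python's sqlite3 module, not the fiddle's MySQL), Ann has two visits in her latest year, and an older 2010 visit that was entered last and so carries the highest id. Joining on MAX(year) alone returns two rows; the nested MAX(id) query returns one, and it is not simply the row with the highest id overall. The alias latest is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                         year INTEGER, note TEXT);
    INSERT INTO people VALUES (1, 'Ann');
    INSERT INTO visits VALUES
        (1, 1, 2012, 'a'),
        (2, 1, 2012, 'b'),
        (3, 1, 2010, 'c');   -- older visit, entered last
""")

# Latest year only: both 2012 visits match, so Ann appears twice.
by_year = conn.execute("""
    SELECT p.name, v.id
    FROM people p
    JOIN (SELECT id_people, MAX(year) AS year
          FROM visits GROUP BY id_people) a ON a.id_people = p.id
    JOIN visits v ON v.id_people = a.id_people AND v.year = a.year
    ORDER BY v.id
""").fetchall()

# Latest year first, then the largest id within that year: one row per person.
one_row = conn.execute("""
    SELECT p.name, v.year, v.note
    FROM (
        SELECT v.id_people, MAX(v.id) AS id
        FROM (SELECT id_people, MAX(year) AS year
              FROM visits GROUP BY id_people) latest
        JOIN visits v ON latest.id_people = v.id_people
                     AND latest.year = v.year
        GROUP BY v.id_people
    ) m
    JOIN people p ON m.id_people = p.id
    JOIN visits v ON m.id = v.id
""").fetchall()

print(by_year)
print(one_row)
```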
I have three tables that are joined. I almost have the solution, but there seems to be one small problem going on here. Here is the statement:
SELECT items.item,
COUNT(ratings.item_id) AS total,
COUNT(comments.item_id) AS comments,
AVG(ratings.rating) AS rate
FROM `items`
LEFT JOIN ratings ON (ratings.item_id = items.items_id)
LEFT JOIN comments ON (comments.item_id = items.items_id)
WHERE items.cat_id = '{$cat_id}' AND items.spam < 5
GROUP BY items_id ORDER BY TRIM(LEADING 'The ' FROM items.item) ASC;");
I have a table called items, each item has an id called items_id (notice it's plural). I have a table of individual user comments for each item, and one for ratings for each item. (The last two have a corresponding column called 'item_id').
I simply want to count comments and ratings (per item) separately. With my SQL statement as written above, the two counts come out as a combined total.
note, total is the total of ratings. It's a bad naming scheme I need to fix!
UPDATE: 'total' seems to count ok, but when I add a comment to 'comments' table, the COUNT function affects both 'comments' and 'total' and seems to equal the combined output.
Problem is you're counting results of all 3 tables joined. Try:
SELECT i.item,
r.ratetotal AS total,
c.commtotal AS comments,
r.rateav AS rate
FROM items AS i
LEFT JOIN
(SELECT item_id,
COUNT(item_id) AS ratetotal,
AVG(rating) AS rateav
FROM ratings GROUP BY item_id) AS r
ON r.item_id = i.items_id
LEFT JOIN
(SELECT item_id,
COUNT(item_id) AS commtotal
FROM comments GROUP BY item_id) AS c
ON c.item_id = i.items_id
WHERE i.cat_id = '{$cat_id}' AND i.spam < 5
ORDER BY TRIM(LEADING 'The ' FROM i.item) ASC;");
In this query, we make the subqueries do the counting properly, then send that value to the main query and filter the results.
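I'm guessing this is a cardinality issue. Try COUNT(DISTINCT ...) on each table's own primary key. Counting distinct comments.item_id would return at most 1 per item, since every comment for an item shares the same item_id.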
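Which column goes inside COUNT(DISTINCT ...) makes a difference, as this sketch on invented rows suggests. It uses Python's sqlite3 module, and the id columns on ratings and comments are hypothetical, since the question does not show the primary keys. With 2 ratings and 3 comments on one item, the double join yields 6 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (items_id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE ratings (id INTEGER PRIMARY KEY, item_id INTEGER, rating INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, item_id INTEGER);
    INSERT INTO items VALUES (1, 'The Widget');
    INSERT INTO ratings (item_id, rating) VALUES (1, 4), (1, 2);
    INSERT INTO comments (item_id) VALUES (1), (1), (1);
""")

row = conn.execute("""
    SELECT COUNT(comments.item_id),           -- counts every joined row
           COUNT(DISTINCT comments.item_id),  -- item_id is the same on each row
           COUNT(DISTINCT comments.id),       -- distinct comments
           COUNT(DISTINCT ratings.id),        -- distinct ratings
           AVG(ratings.rating)
    FROM items
    LEFT JOIN ratings ON ratings.item_id = items.items_id
    LEFT JOIN comments ON comments.item_id = items.items_id
    GROUP BY items.items_id
""").fetchone()

print(row)
```

The average is unaffected in this case because every rating is repeated the same number of times.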