MySQL GROUP BY and SORT BY with JOINS - mysql

I have 3 tables:
items (item_id, timestamp)
items_terms (item_id, term_id)
terms (term_id, term_name)
I need to find the 5 most recent terms (term_id, term_name) based on the item timestamp. I was trying to solve it like this:
SELECT t.term_id, t.term_name
FROM items i
INNER JOIN items_terms it USING(item_id)
INNER JOIN terms t USING (term_id)
GROUP BY t.term_id
ORDER BY i.timestamp DESC
LIMIT 5
But the problem is that MySQL groups the rows first (keeping an arbitrary row for each term_id) and only then applies ORDER BY, so the ordering does not reflect each term's most recent item.
I was also thinking about filtering on PHP side by removing GROUP BY and selecting more than 5 items, but this query needs to support pagination without duplicates on consecutive pages.
Will be glad to see any suggestions.

How about including the timestamp in the select statement:
SELECT t.term_id, t.term_name, MAX(i.timestamp)
FROM items i
INNER JOIN items_terms it USING(item_id)
INNER JOIN terms t USING (term_id)
GROUP BY t.term_id, t.term_name
ORDER BY MAX(i.timestamp) DESC
LIMIT 5
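A minimal sketch of the MAX(timestamp) approach with pagination, since the question asks for pages without duplicates. It assumes an in-memory SQLite database as a stand-in for MySQL, made-up sample rows, and a hypothetical helper named recent_terms; the extra term_id sort key is an addition so that ties on the timestamp cannot shuffle rows between pages.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, timestamp INTEGER);
CREATE TABLE terms (term_id INTEGER PRIMARY KEY, term_name TEXT);
CREATE TABLE items_terms (item_id INTEGER, term_id INTEGER);
INSERT INTO items VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO terms VALUES (1, 'red'), (2, 'green'), (3, 'blue');
-- 'red' is on items 1 and 3, 'green' on item 2, 'blue' on item 1
INSERT INTO items_terms VALUES (1, 1), (3, 1), (2, 2), (1, 3);
""")


def recent_terms(page, page_size=2):
    """Return one page of (term_id, term_name), most recently used term first."""
    sql = """
        SELECT t.term_id, t.term_name
        FROM items i
        INNER JOIN items_terms it ON it.item_id = i.item_id
        INNER JOIN terms t ON t.term_id = it.term_id
        GROUP BY t.term_id, t.term_name
        ORDER BY MAX(i.timestamp) DESC, t.term_id
        LIMIT ? OFFSET ?
    """
    return conn.execute(sql, (page_size, page * page_size)).fetchall()


page0 = recent_terms(0)
page1 = recent_terms(1)
print(page0)
print(page1)
```

Because every term collapses to a single row before LIMIT is applied, a term cannot appear on two pages as long as the data does not change between requests.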

I would suggest reading this article: MySQL has several techniques for limiting rows per group in a GROUP BY select, and a few might suit your needs. Generally, using the HAVING clause with query "global" variables should be preferred, as it operates on the already grouped result set, which positively affects performance.
EDIT: Solution would be:
SELECT DISTINCT
t.term_id,
t.term_name
FROM
items i
INNER JOIN items_terms it USING(item_id)
INNER JOIN terms t USING (term_id)
ORDER BY
i.timestamp DESC
LIMIT 5

SELECT x.term_id, x.term_name
FROM (SELECT t.term_id, t.term_name, MAX(i.timestamp) AS last_timestamp
FROM items i
INNER JOIN items_terms it ON i.item_id = it.item_id
INNER JOIN terms t ON it.term_id = t.term_id
GROUP BY t.term_id, t.term_name
) x
ORDER BY x.last_timestamp DESC
LIMIT 5

Related

How do I keep the order of my inner join in SQL?

I have the following command:
SELECT * FROM Posts P
INNER JOIN (SELECT DISTINCT ThreadId FROM Posts ORDER BY Time DESC) R
ON P.Id = R.ThreadId;
This command selects the threads that contain the newest replies. Unfortunately, the order of the threads seems to be random. I want the threads to be ordered by their newest replies. In other words: I want my selection to keep the order that I used inside my inner join.
How can I achieve this?
For MySQL 8.0+ you can use the MAX() window function in the ORDER BY clause:
SELECT *
FROM Posts
ORDER BY MAX(Time) OVER (PARTITION BY ThreadId) DESC
For prior versions use a correlated subquery:
SELECT p1.*
FROM Posts p1
ORDER BY (SELECT MAX(p2.Time) FROM Posts p2 WHERE p2.ThreadId = p1.ThreadId) DESC
You may also want to add Time DESC as a second sort key in the ORDER BY clause.
Your join needs to group and select the last post per thread. The order needs to go on the outside query (not the subquery).
SELECT *
FROM Threads AS t
LEFT JOIN (
SELECT ThreadId, MAX(Time) AS LastPost
FROM Posts
GROUP BY ThreadId
) AS r ON r.ThreadId = t.ThreadId
ORDER BY LastPost DESC
You can use INNER JOIN instead of LEFT JOIN if you want to exclude threads that have no posts (if that is even possible).
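The derived-table approach above can be tried out with a small sketch. It assumes an in-memory SQLite database in place of MySQL, a hypothetical Title column on Threads, and made-up rows; the third thread has no posts, to show what the LEFT JOIN keeps.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; Title and the rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Threads (ThreadId INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Posts (Id INTEGER PRIMARY KEY, ThreadId INTEGER, Time INTEGER);
INSERT INTO Threads VALUES (1, 'alpha'), (2, 'beta'), (3, 'empty');
INSERT INTO Posts VALUES (1, 1, 10), (2, 2, 20), (3, 1, 30);
""")

# Aggregate the last post per thread first, then sort in the OUTER query.
threads = conn.execute("""
    SELECT t.ThreadId, t.Title, r.LastPost
    FROM Threads AS t
    LEFT JOIN (
        SELECT ThreadId, MAX(Time) AS LastPost
        FROM Posts
        GROUP BY ThreadId
    ) AS r ON r.ThreadId = t.ThreadId
    ORDER BY r.LastPost DESC
""").fetchall()

print(threads)
```

The thread without posts gets a NULL LastPost; with a descending sort, both SQLite and MySQL place NULLs last.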
You could change the order of your query like this:
SELECT ...
FROM BrandsProducts
INNER JOIN Brands ON Brands.brandid = BrandsProducts.brandid
WHERE ...
ORDER BY ...

Creating a join where I pull a count from another table

I have a table with real estate agents' info and want to pull firstname, fullname, and email from rets_agents.
I want to then get a count of all of their sales from a different table called rets_property_res_mstr.
I created a query that doesn't work yet so I need some help.
SELECT r.firstname, r.fullname, r.email
from rets_agents r
LEFT JOIN rets_property_res_mstr
ON r.email = rets_property_res_mstr.ListAgentEmail
LIMIT 10;

I'm not sure how to get the count in this.
You seem to be looking for aggregation:
SELECT a.firstname, a.fullname, a.email, COUNT(p.ListAgentEmail) cnt
FROM rets_agents a
LEFT JOIN rets_property_res_mstr p ON a.email = p.ListAgentEmail
GROUP BY a.firstname, a.fullname, a.email
ORDER BY ?
LIMIT 10;
Note that, for a LIMIT clause to really make sense, you need an ORDER BY clause so that you get deterministic results (otherwise, it is undefined which records will be shown). I added that to your query with a question mark that you should replace with the relevant column(s).
I would consider using a CTE for this:
WITH sales as (
SELECT ListAgentEmail, count(*) count_of_sales
FROM rets_property_res_mstr
GROUP BY ListAgentEmail
)
SELECT r.firstname, r.fullname, r.email, count_of_sales
from rets_agents r
LEFT JOIN sales
ON r.email = sales.ListAgentEmail
LIMIT 10;
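A small sketch of the CTE variant, assuming an in-memory SQLite database in place of MySQL and made-up agents. Two things here are additions rather than part of the answer above: COALESCE turns the NULL that the LEFT JOIN produces for agents without sales into 0, and the ORDER BY makes the LIMIT deterministic.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the agents and sales are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rets_agents (firstname TEXT, fullname TEXT, email TEXT);
CREATE TABLE rets_property_res_mstr (ListAgentEmail TEXT);
INSERT INTO rets_agents VALUES
    ('Ann', 'Ann Lee', 'ann@example.com'),
    ('Bob', 'Bob Ray', 'bob@example.com');
INSERT INTO rets_property_res_mstr VALUES
    ('ann@example.com'), ('ann@example.com');
""")

agent_sales = conn.execute("""
    WITH sales AS (
        SELECT ListAgentEmail, COUNT(*) AS count_of_sales
        FROM rets_property_res_mstr
        GROUP BY ListAgentEmail
    )
    SELECT r.firstname, r.fullname, r.email,
           COALESCE(s.count_of_sales, 0) AS count_of_sales
    FROM rets_agents r
    LEFT JOIN sales s ON r.email = s.ListAgentEmail
    ORDER BY count_of_sales DESC, r.email
    LIMIT 10
""").fetchall()

print(agent_sales)
```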

Multiple COUNT() in JOIN

I'm trying to get the number of rows from two different tables with two LEFT JOINs in a MySQL query. It works well when I have a COUNT on one table, like this:
SELECT a.title, a.image, COUNT(o.id) AS occasions
FROM activity a
LEFT JOIN occasion AS o ON a.id = o.activity_id
WHERE a.user_id = 1
GROUP BY a.id
ORDER BY a.created_at DESC
LIMIT 50
Here, everything works and I get the correct number of "occasions".
But when I try to add an additional COUNT with an additional LEFT JOIN, the result of the second COUNT is wrong:
SELECT a.title, a.image, COUNT(o.id) AS occasions, COUNT(au.id) AS users
FROM activity a
LEFT JOIN occasion AS o ON a.id = o.activity_id
LEFT JOIN activity_user AS au ON a.id = au.activity_id
WHERE a.user_id = 4
GROUP BY a.id
ORDER BY a.created_at DESC
LIMIT 50
Here, I get the correct number of "occasions", but "users" seems to be a copy of the "occasions" count, which is wrong.
So my question is: how do I fix this query so that the two COUNTs work together?
COUNT() counts non-NULL values. The simple way to fix your query is to use COUNT(DISTINCT):
SELECT a.title, a.image,
COUNT(DISTINCT o.id) AS occasions, COUNT(DISTINCT au.id) AS users
. . .
And this will probably work. However, it creates an intermediate table that is the Cartesian product of the two tables (for each title). That could grow very big. The more scalable solution is to use subqueries and aggregate before joining.
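The "aggregate before joining" idea can be sketched as follows. This assumes an in-memory SQLite database in place of MySQL and made-up rows (one activity with 3 occasions and 2 users); it contrasts the double LEFT JOIN, which multiplies the rows, with per-table counts computed in derived tables.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; one activity, 3 occasions, 2 users.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE occasion (id INTEGER PRIMARY KEY, activity_id INTEGER);
CREATE TABLE activity_user (id INTEGER PRIMARY KEY, activity_id INTEGER);
INSERT INTO activity VALUES (1, 'hike');
INSERT INTO occasion VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO activity_user VALUES (1, 1), (2, 1);
""")

# Joining both child tables first yields 3 x 2 = 6 rows per activity,
# so both plain COUNTs report 6.
naive = conn.execute("""
    SELECT a.title, COUNT(o.id), COUNT(au.id)
    FROM activity a
    LEFT JOIN occasion o ON a.id = o.activity_id
    LEFT JOIN activity_user au ON a.id = au.activity_id
    GROUP BY a.id
""").fetchall()

# Counting inside each derived table keeps one row per activity on each side.
pre_aggregated = conn.execute("""
    SELECT a.title, COALESCE(o.cnt, 0), COALESCE(au.cnt, 0)
    FROM activity a
    LEFT JOIN (SELECT activity_id, COUNT(*) AS cnt
               FROM occasion GROUP BY activity_id) o
           ON o.activity_id = a.id
    LEFT JOIN (SELECT activity_id, COUNT(*) AS cnt
               FROM activity_user GROUP BY activity_id) au
           ON au.activity_id = a.id
""").fetchall()

print(naive)
print(pre_aggregated)
```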
Switching from LEFT JOIN to LEFT OUTER JOIN will not help here: in MySQL the two are synonyms (the OUTER keyword is optional), so both return the same rows and the same counts.

MySQL nested SELECT too slow

I have a script of the following structure:
SELECT SUM(CASE WHEN pf.info IS NOT NULL THEN 1 ELSE 0 END)
FROM summary s
LEFT JOIN (SELECT id, info FROM items GROUP BY id) pf ON s.id=pf.id
GROUP BY s.date
What I want is to count those id's which are in 'summary' and present in 'items'. 'items' have same id's repeated several times, that's why I do GROUP BY.
This script works as I want, but it is extremely slow, much slower than just doing straightforward LEFT JOIN (and counting each id several times). This doesn't seem to make sense since I need a smaller subspace of that and it should be easier.
So the question is: how to restructure the query to make it quicker?
Use count(distinct ...):
SELECT count(distinct s.id)
FROM summary s
JOIN items i ON s.id = i.id
I don't understand why you are grouping by s.date - there's no clue in your question as to why, so if it's not a mistake and you need to group by date, use this:
SELECT s.date, count(distinct s.id)
FROM summary s
JOIN items i ON s.id = i.id
GROUP BY s.date
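A quick sketch of the COUNT(DISTINCT ...) rewrite, assuming an in-memory SQLite database in place of MySQL and made-up rows. It also shows one difference worth knowing: with an inner JOIN, a date without any matching item drops out of the result, whereas a LEFT JOIN that counts the item side keeps it with a count of 0.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE summary (id INTEGER, date TEXT);
CREATE TABLE items (id INTEGER, info TEXT);
INSERT INTO summary VALUES (1, 'd1'), (2, 'd1'), (3, 'd2'), (4, 'd3');
-- id 1 is repeated in items; ids 2 and 4 are absent
INSERT INTO items VALUES (1, 'a'), (1, 'b'), (3, 'c');
""")

inner_counts = conn.execute("""
    SELECT s.date, COUNT(DISTINCT s.id)
    FROM summary s
    JOIN items i ON s.id = i.id
    GROUP BY s.date
    ORDER BY s.date
""").fetchall()

# Counting i.id (NULL when unmatched) keeps dates that have no items at all.
left_counts = conn.execute("""
    SELECT s.date, COUNT(DISTINCT i.id)
    FROM summary s
    LEFT JOIN items i ON s.id = i.id
    GROUP BY s.date
    ORDER BY s.date
""").fetchall()

print(inner_counts)
print(left_counts)
```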

MySQL is not using INDEX in subquery

I have these tables and queries as defined in sqlfiddle.
At first, my problem was to group people while showing the LEFT JOINed visits row with the newest year. I solved that with a subquery.
Now my problem is that the subquery is not using the INDEX defined on the visits table. That causes my query to run nearly indefinitely on tables with approx. 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in the visits table.
Unfortunately, on large tables it gets really slow because it's not using the INDEX in the subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it uses non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions, and do not depend on the grouping items). This can give indeterminate (semi-random) results.
Second, to avoid the indeterminate results, you have added an ORDER BY inside a subquery. Non-standard or not, nothing in the MySQL documentation says this should work as you expect. So it may be working now, but it may stop working in the not so distant future, when you upgrade to MySQL version X (where the optimizer will be clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
A compound index on (id_people, year) would help efficiency.
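A minimal sketch of the MAX(year)-per-person join together with the compound index. It assumes an in-memory SQLite database in place of MySQL, a hypothetical index name, made-up rows, and a note column on visits. The join is written as a derived table rather than with the nested ON clauses used above, because that form is portable to SQLite.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; index name and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                     year INTEGER, note TEXT);
CREATE INDEX idx_visits_people_year ON visits (id_people, year);
INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO visits VALUES
    (1, 1, 2010, 'a'), (2, 1, 2012, 'b'), (3, 2, 2011, 'c');
""")

# Find each person's newest year, join back to visits for the full row,
# then LEFT JOIN so people without visits are still listed.
latest = conn.execute("""
    SELECT p.id, p.name, lv.year, lv.note
    FROM people AS p
    LEFT JOIN (
        SELECT v.id_people, v.year, v.note
        FROM visits AS v
        JOIN (
            SELECT id_people, MAX(year) AS year
            FROM visits
            GROUP BY id_people
        ) AS vm ON v.id_people = vm.id_people AND v.year = vm.year
    ) AS lv ON lv.id_people = p.id
    ORDER BY p.id
""").fetchall()

print(latest)
```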
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
Fiddle: http://www.sqlfiddle.com/#!2/4f644/1/0
But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so you just get one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
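The disambiguation step can be checked with a small sketch. It assumes an in-memory SQLite database in place of MySQL and a made-up person with two visits in her latest year; the inner alias is renamed to y here only to keep it distinct from the people alias p.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; Ann has two visits in 2012.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                     year INTEGER, note TEXT);
INSERT INTO people VALUES (1, 'Ann');
INSERT INTO visits VALUES
    (1, 1, 2011, 'old'), (2, 1, 2012, 'first'), (3, 1, 2012, 'second');
""")

# Joining on MAX(year) alone returns every visit from the latest year.
by_year = conn.execute("""
    SELECT p.id, p.name, v.year, v.note
    FROM (SELECT id_people, MAX(year) AS year
          FROM visits GROUP BY id_people) m
    JOIN people p ON p.id = m.id_people
    JOIN visits v ON v.id_people = m.id_people AND v.year = m.year
    ORDER BY v.id
""").fetchall()

# Taking MAX(id) within the latest year leaves exactly one row per person.
one_per_person = conn.execute("""
    SELECT p.id, p.name, v.year, v.note
    FROM (
        SELECT v.id_people, MAX(v.id) AS id
        FROM (SELECT id_people, MAX(year) AS year
              FROM visits GROUP BY id_people) y
        JOIN visits v ON y.id_people = v.id_people AND y.year = v.year
        GROUP BY v.id_people
    ) m
    JOIN people p ON m.id_people = p.id
    JOIN visits v ON m.id = v.id
""").fetchall()

print(by_year)
print(one_per_person)
```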