My Tables look like:
# Table user
user_id PK
...
# Table buy
buy_id PK
user_id FK
...
# Table offert
offert_id
user_id
...
Well, I need to know the last 'buy' of one 'user' and get the count of 'offert' rows this 'user' has. I tried something like:
select b.buy_id,count(distinct c.offert_id) as cv from user a
inner join buy b using(user_id) left join offert c using(user_id) where a.user_id=4
group by a.user_id order by b.buy_id desc
but it always returns the first 'buy', not the last; it looks like the ORDER BY has no effect.
I know that I can do it with subqueries, but I would like to know if there is a way to do it without subqueries, maybe using MAX, but I don't know how to do it.
Thanks.
Your approach is simply not guaranteed to work. One big reason is that the group by is processed before the order by.
Assuming that you mean the biggest buy_id for each user, you can do this as:
select u.user_id, u.last_buy_id, count(distinct o.offert_id)
from (select u.*,
(select b.buy_id from buy b where b.user_id = u.user_id order by b.buy_id desc limit 1
) last_buy_id
from user u
) u left outer join
offert o
on o.user_id = u.user_id
group by u.user_id, u.last_buy_id;
The first subquery uses a correlated subquery to get the last buy id for each user. It then joins in offert and does the aggregation. Note that this version includes the user_id in the aggregation.
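Since the question asked whether MAX could do this without a subquery, here is a minimal sketch of that variant. It assumes "last buy" means the largest buy_id, runs against SQLite through Python's sqlite3 module rather than MySQL, and uses made-up rows; make_demo_db and last_buy_and_offer_count are names invented for the example.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with made-up rows for the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user   (user_id   INTEGER PRIMARY KEY);
        CREATE TABLE buy    (buy_id    INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE offert (offert_id INTEGER PRIMARY KEY, user_id INTEGER);
        INSERT INTO user   VALUES (4), (5);
        INSERT INTO buy    VALUES (10, 4), (11, 5), (12, 4);
        INSERT INTO offert VALUES (1, 4), (2, 4), (3, 4), (4, 5);
        """
    )
    return conn


def last_buy_and_offer_count(conn, user_id):
    """Return (last_buy_id, offer_count) for one user, using MAX and no subquery."""
    sql = """
        SELECT MAX(b.buy_id)               AS last_buy_id,
               COUNT(DISTINCT o.offert_id) AS offer_count
        FROM user u
        INNER JOIN buy b   ON b.user_id = u.user_id
        LEFT JOIN offert o ON o.user_id = u.user_id
        WHERE u.user_id = ?
        GROUP BY u.user_id
    """
    # The join multiplies buys by offers, so DISTINCT is needed for the count;
    # MAX is unaffected by the duplicated rows.
    return conn.execute(sql, (user_id,)).fetchone()


if __name__ == "__main__":
    print(last_buy_and_offer_count(make_demo_db(), 4))
```

Because of the INNER JOIN, a user without any buy produces no row at all (fetchone returns None); switch it to a LEFT JOIN if such users should still appear.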
I want to get customer data from all the businesses with more than 1 user.
For this I think I need a subquery to count more than 1 user and then the outer query to give me their emails.
I have tried subqueries in the WHERE and HAVING clause
SELECT u.mail
FROM users u
WHERE count IN (
SELECT count (u.id_business)
FROM businesses b
INNER JOIN users u ON b.id = u.id_business
GROUP BY b.id, u.id_business
HAVING COUNT (u.id_business) >= 2
)
I believe that you do not need a subquery; everything can be achieved in a joined aggregate query with a HAVING clause. Joining users to businesses gives exactly one row per user, so to count the users of each business the users table has to be joined to itself on id_business, like:
SELECT u.mail
FROM users u
INNER JOIN users peer ON peer.id_business = u.id_business
GROUP BY u.id, u.mail
HAVING COUNT(*) >= 2
NB: in case several users may have the same mail, I have added the primary key of users to the GROUP BY clause (I assumed that the pk is called id); you may remove this if mail is a unique field in users.
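A small check of the HAVING idea on toy data, as a sketch only. It joins users to itself so that COUNT(*) is the number of users in each user's business. The rows, the peer alias, and the function names are invented for the example, and it runs on SQLite via Python's sqlite3 rather than MySQL.

```python
import sqlite3


def make_demo_db():
    """In-memory users table: business 1 has two users, business 2 has one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, mail TEXT, id_business INTEGER);
        INSERT INTO users VALUES
            (1, 'ann@example.com', 1),
            (2, 'bob@example.com', 1),
            (3, 'cy@example.com',  2);
        """
    )
    return conn


def mails_in_shared_businesses(conn):
    """Return mails of users whose business has at least two users."""
    sql = """
        SELECT u.mail
        FROM users u
        INNER JOIN users peer ON peer.id_business = u.id_business
        GROUP BY u.id, u.mail
        HAVING COUNT(*) >= 2   -- each user is matched with every colleague and itself
        ORDER BY u.mail
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print(mails_in_shared_businesses(make_demo_db()))
```

The self-join grows with the square of the users per business, so for very large businesses a subquery with GROUP BY id_business may be the cheaper option.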
I'm currently outputting all of my members by adding the MySQL clause ORDER BY id DESC, but I feel that doesn't reward people that are active on my service.
I thought about ordering members by the number of entries they have under their ID in another table.
Essentially, I'm asking whether it's possible to order the rows of a main table by the number of rows in another table that carry the same user ID.
Something like this pseudo-query:
SELECT user_id,name,etc FROM users ORDER BY (
COUNT(SELECT FROM users_interactions WHERE user_id = user_id) *******
) ASC
In the end of the COUNT statement, the user_id = user_id was just a guess.
You are almost there - what you need to do is to put COUNT inside SELECT:
SELECT user_id,name,etc FROM users u ORDER BY (
SELECT COUNT(*)
FROM users_interactions i
WHERE i.user_id = u.user_id
) ASC
You could also do it using a JOIN, like this:
SELECT u.user_id, u.name, u.etc
FROM users u
LEFT OUTER JOIN users_interactions i ON i.user_id = u.user_id
GROUP BY u.user_id, u.name, u.etc
ORDER BY COUNT(i.user_id) ASC
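One detail of the JOIN version is worth checking on toy data: with a LEFT OUTER JOIN, a user without interactions still yields one row (with NULLs on the right side), so COUNT(*) reports 1 for them, while counting a column of the joined table reports 0. A minimal sketch on SQLite through Python's sqlite3, with invented rows and function names:

```python
import sqlite3


def make_demo_db():
    """Three users; ann has 2 interactions, cy has 1, bob has none."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users_interactions (id INTEGER PRIMARY KEY, user_id INTEGER);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO users_interactions (user_id) VALUES (1), (1), (3);
        """
    )
    return conn


def users_by_activity(conn, counted="i.user_id"):
    """Return (user_id, name, n) ordered by n; `counted` is '*' or 'i.user_id'."""
    if counted not in ("*", "i.user_id"):
        raise ValueError("unsupported COUNT argument")
    sql = f"""
        SELECT u.user_id, u.name, COUNT({counted}) AS n
        FROM users u
        LEFT OUTER JOIN users_interactions i ON i.user_id = u.user_id
        GROUP BY u.user_id, u.name
        ORDER BY n ASC, u.user_id ASC
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(users_by_activity(conn))        # bob counted as 0
    print(users_by_activity(conn, "*"))   # bob counted as 1, tied with cy
```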
I'm trying to get a subset of data based on the latest id and dates. It seems that when selecting other fields in the table they are not in sync with the max id and dates returned.
Any idea how I can fix this?
MySQL:
SELECT MAX(m.id) as id, m.sender_id, m.receiver_id, MAX(m.date) as date, m.content, l.username, p.gender
FROM messages m
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.receiver_id=3
GROUP BY m.sender_id ORDER BY date DESC LIMIT 0, 7
The data for content isn't the correct one. It seems to be returning random content and not the content that is tied to the row for max id and max date.
Do I need to do some sort of sub select to fix this?
To answer the question in the title, "Why doesn't my content field match my MAX(id) field", that's because there is no guarantee that the values returned for the non-aggregate fields will be from the row where the MAX value is found. This is the documented behavior, and this is what we expect.
Other DBMSs would throw an error on the statement. MySQL is just more lax, and you are getting values from one row, but it's not guaranteed to be the row on which either of the MAX values (id or date) is found.
You have two separate aggregate expressions, MAX(m.id) and MAX(m.date). Note that there is no guarantee that those values will come from the same row.
The rule in other databases is that every non-aggregate expression in the SELECT list needs to appear in the GROUP BY. (MySQL is more lax about that, and doesn't make that a requirement.)
One way to "fix" the query so that it does return values from the row with the MAX value is to use an inline view (query) that gets the MAX(id) grouped by what you want to GROUP BY, and then a JOIN back to the original table to get other values on the row.
From your statement it's not clear what result set you want returned. If you want the row that has the maximum id and you also want the row with the maximum date, then you could do something like this:
SELECT m.id
, m.sender_id
, m.receiver_id
, m.date
, m.content
, l.username
, p.gender
FROM ( SELECT t.sender_id
, t.receiver_id
, MAX(t.id) AS max_id
, MAX(t.date) AS max_date
FROM messages t
WHERE t.receiver_id=3
GROUP
BY t.sender_id
, t.receiver_id
) s
JOIN messages m
ON m.sender_id = s.sender_id
AND m.receiver_id = s.receiver_id
AND ( m.id = s.max_id OR m.date = s.max_date)
LEFT
JOIN login_users l on l.user_id = m.sender_id
LEFT
JOIN profiles p ON p.user_id = l.user_id
ORDER BY m.date DESC LIMIT 0, 7
The inline view aliased as "s" returns the max values, and then that gets joined back to the messages table, aliased as "m".
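To see the "inline view, then join back" pattern at work, here is a reduced sketch that keeps only the MAX(id) half of the join condition. It runs on SQLite through Python's sqlite3 (not MySQL), leaves out login_users and profiles, and uses invented rows and function names.

```python
import sqlite3


def make_demo_db():
    """Messages from senders 1 and 2 to receiver 3, plus one to another receiver."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY, sender_id INTEGER, receiver_id INTEGER,
            date TEXT, content TEXT);
        INSERT INTO messages VALUES
            (1, 1, 3, '2013-01-01', 'old from 1'),
            (2, 2, 3, '2013-01-02', 'only from 2'),
            (3, 1, 3, '2013-01-03', 'new from 1'),
            (4, 1, 9, '2013-01-04', 'other receiver');
        """
    )
    return conn


def latest_message_per_sender(conn, receiver_id):
    """Return (id, sender_id, content) of the highest-id message per sender."""
    sql = """
        SELECT m.id, m.sender_id, m.content
        FROM (SELECT sender_id, MAX(id) AS max_id   -- inline view: one row per sender
              FROM messages
              WHERE receiver_id = ?
              GROUP BY sender_id) s
        JOIN messages m ON m.id = s.max_id          -- join back for the other columns
        ORDER BY m.id DESC
    """
    return conn.execute(sql, (receiver_id,)).fetchall()


if __name__ == "__main__":
    print(latest_message_per_sender(make_demo_db(), 3))
```

Because content is read from the row that the join selects, it always belongs to the same row as the reported id.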
NOTE
In most cases, we find that a JOIN (query) will perform better than an IN (query), because of the different access plans. You can see the difference in plans with an EXPLAIN.
For performance, you'll want an index
... ON messages (`receiver_id`, `sender_id`, `id`, `date`)
There's an equality predicate on receiver_id, so that should be the leading column, to get a range scan (instead of a full scan). You want the sender_id column next, because that should allow MySQL to avoid a "Using filesort" operation to get the rows grouped. The id and date columns are included, so that the inline view query can be satisfied entirely from the index pages without a need to access the pages in the table. (The EXPLAIN should show "Using where; Using index".)
That same index should also be suitable for the outer query, though it does need to access the "content" column from the table pages, so the EXPLAIN will not show "Using index" for that step. (It's likely that the "content" column is much longer than we would want in the index.)
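The covering-index idea can be tried out locally with SQLite as a rough analogue. This is only a sketch: SQLite's EXPLAIN QUERY PLAN is not MySQL's EXPLAIN, and it reports "COVERING INDEX" where MySQL would report "Using index". The index name ix_messages_lookup and the function names are invented for the example.

```python
import sqlite3


def make_demo_db():
    """Empty messages table with the four-column index discussed above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY, sender_id INTEGER, receiver_id INTEGER,
            date TEXT, content TEXT);
        CREATE INDEX ix_messages_lookup
            ON messages (receiver_id, sender_id, id, date);
        """
    )
    return conn


def plan_for_grouped_max(conn):
    """Return SQLite's plan text for the inline-view query."""
    rows = conn.execute(
        """
        EXPLAIN QUERY PLAN
        SELECT sender_id, receiver_id, MAX(id), MAX(date)
        FROM messages
        WHERE receiver_id = 3
        GROUP BY sender_id, receiver_id
        """
    ).fetchall()
    # The human-readable detail is the last column of every plan row.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print(plan_for_grouped_max(make_demo_db()))
```

Selecting content as well forces a lookup into the table itself, so the plan should then stop mentioning a covering index, which mirrors the remark about the outer query.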
Using a join
SELECT LatestM.id, m.sender_id, m.receiver_id, m.date, m.content, l.username, p.gender
FROM (
SELECT sender_id, MAX(id) AS id
FROM messages
WHERE receiver_id=3
GROUP BY sender_id
) LatestM
INNER JOIN messages m
ON LatestM.sender_id = m.sender_id AND LatestM.id = m.id
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.receiver_id = 3
ORDER BY date DESC
LIMIT 0, 7
The problem with this is that if the row with the latest id is not also the row with the latest date, then the date returned will not be the latest one.
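A toy case where id order and date order disagree makes that caveat concrete. The sketch below uses SQLite through Python's sqlite3 with invented rows (a message written in May that got a lower id than a backdated one); latest_by is a hypothetical helper, not part of the queries above.

```python
import sqlite3


def make_demo_db():
    """Sender 1 wrote two messages to receiver 3; the higher id has the older date."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY, sender_id INTEGER, receiver_id INTEGER,
            date TEXT, content TEXT);
        INSERT INTO messages VALUES
            (1, 1, 3, '2013-05-01', 'written in May'),
            (2, 1, 3, '2013-01-01', 'imported late');
        """
    )
    return conn


def latest_by(conn, column):
    """Return the per-sender row holding the maximum of `column` ('id' or 'date')."""
    if column not in ("id", "date"):
        raise ValueError("column must be 'id' or 'date'")
    sql = f"""
        SELECT m.id, m.date, m.content
        FROM (SELECT sender_id, MAX({column}) AS top
              FROM messages
              WHERE receiver_id = 3
              GROUP BY sender_id) s
        JOIN messages m ON m.sender_id = s.sender_id AND m.{column} = s.top
        WHERE m.receiver_id = 3
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(latest_by(conn, "id"))
    print(latest_by(conn, "date"))
```

Which of the two is "the latest message" depends on how ids are assigned in the real table, so the choice of column belongs to whoever knows the data.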
Well, you could probably solve it without a subselect, but doing one is fairly straightforward. Something like this should work: make the subselect return the ids of the interesting rows in messages, and fetch the data for only those rows.
SELECT m.id as id, m.sender_id, m.receiver_id, m.date as date,
m.content, l.username, p.gender
FROM messages m
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.id IN (
SELECT max(id) FROM messages
WHERE receiver_id=3
GROUP BY sender_id
)
ORDER BY date DESC
LIMIT 0, 7
The reason that your original query does not match up fields is that standard GROUP BY requires every field you select to be either grouped by or wrapped in an aggregate function (like MAX/MIN/SUM/...). The reason the query even runs is that MySQL does not enforce that (unless ONLY_FULL_GROUP_BY is enabled), but instead returns indeterminate values from any matching row. Afaik, most other SQL RDBMSs refuse to run the query.
EDIT: As for performance, a few indexes that are likely to help are;
CREATE INDEX ix_inner ON messages(receiver_id, sender_id, id);
CREATE INDEX ix_login_users ON login_users(user_id);
CREATE INDEX ix_profiles ON profiles(user_id);
I have these tables and queries as defined in sqlfiddle.
First my problem was to group people showing LEFT JOINed visits rows with the newest year. That I solved using subquery.
Now my problem is that that subquery is not using INDEX defined on visits table. That is causing my query to run nearly indefinitely on tables with approx 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in visits table.
Unfortunately on large tables it gets real sloooow because it's not using INDEX in subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it is using non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions and do not depend on the grouping items). This can give indeterminate (semi-random) results.
Second, to avoid the indeterminate results, you have added an ORDER BY inside a subquery. Non-standard or not, nothing in the MySQL documentation says that this should work as expected. So it may be working now, but it may stop working in the not so distant future, when you upgrade to MySQL version X (where the optimizer will be clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
See the SQL-fiddle.
A compound index on (id_people, year) would help efficiency.
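A quick check of the "MAX(year) per person, then join back" idea on toy data. This is a sketch for SQLite via Python's sqlite3, so the nested JOIN ... ON ... ON form is rewritten as an ordinary derived table, which should return the same rows. The data, the index name ix_visits_person_year, and the function names are invented.

```python
import sqlite3


def make_demo_db():
    """Ann has two visits, Bob one, Cy none."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE visits (
            id INTEGER PRIMARY KEY, id_people INTEGER, year INTEGER, note TEXT);
        CREATE INDEX ix_visits_person_year ON visits (id_people, year);
        INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO visits VALUES
            (1, 1, 2010, 'a-old'),
            (2, 1, 2012, 'a-new'),
            (3, 2, 2011, 'b-only');
        """
    )
    return conn


def people_with_latest_visit(conn):
    """Return every person once, with their newest visit or NULLs if none."""
    sql = """
        SELECT p.id, p.name, v.year, v.note
        FROM people p
        LEFT JOIN (
            SELECT v.id_people, v.year, v.note
            FROM visits v
            JOIN (SELECT id_people, MAX(year) AS year
                  FROM visits
                  GROUP BY id_people) vm
              ON v.id_people = vm.id_people AND v.year = vm.year
        ) v ON v.id_people = p.id
        ORDER BY p.id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in people_with_latest_visit(make_demo_db()):
        print(row)
```

If a person has two visits in their newest year, this still returns two rows for that person; a further tie-breaker is needed in that case.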
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0
But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so that you get just one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
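The two-level disambiguation (newest year first, then highest id within that year) can be exercised on toy data. A minimal sketch on SQLite through Python's sqlite3; the rows are invented so that Ann has two visits in her newest year and Bob has a late-entered old visit with the highest id, and the inner alias is renamed to y for readability.

```python
import sqlite3


def make_demo_db():
    """Visits chosen so that neither MAX(year) nor MAX(id) alone is enough."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE visits (
            id INTEGER PRIMARY KEY, id_people INTEGER, year INTEGER, note TEXT);
        INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO visits VALUES
            (1, 1, 2010, 'old'),
            (2, 1, 2012, 'first in 2012'),
            (3, 1, 2012, 'second in 2012'),
            (4, 2, 2011, 'only'),
            (5, 2, 2009, 'entered late');
        """
    )
    return conn


def one_latest_visit_per_person(conn):
    """Return exactly one row per person: newest year, ties broken by highest id."""
    sql = """
        SELECT p.id, p.name, v.year, v.note
        FROM (
            SELECT v.id_people, MAX(v.id) AS id          -- tie-break inside the year
            FROM (SELECT id_people, MAX(year) AS year    -- newest year per person
                  FROM visits
                  GROUP BY id_people) y
            JOIN visits v ON y.id_people = v.id_people AND y.year = v.year
            GROUP BY v.id_people
        ) m
        JOIN people p ON m.id_people = p.id
        JOIN visits v ON m.id = v.id
        ORDER BY p.id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in one_latest_visit_per_person(make_demo_db()):
        print(row)
```

With these rows, MAX(id) alone would have picked Bob's 2009 visit, and MAX(year) alone would have returned two rows for Ann.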
I have two tables user (one) and transaction (many) and I need to get the average time in days from when a user was created to when they made their first transaction. I'm using AVG(TIMESTAMPDIFF) which is working well, except that the GROUP BY returns an average against every user instead of one single average for all unique users in the transaction table. If I remove the GROUP BY, I get a single average figure but it takes into account multiple transactions from users, whereas I just want to have one per user (the first they made).
Here's my SQL:
SELECT AVG(TIMESTAMPDIFF(DAY, u.date_created, t.transaction_date)) AS average
FROM transaction t
LEFT JOIN user u ON u.id = t.user_id
WHERE t.user_id IS NOT NULL AND t.status = 1
GROUP BY t.user_id;
I'd appreciate it if someone can help me return the average for unique users only. It's fine to break the query down into two, but the tables are large so returning lots of data and putting it back in is a no-go. Thanks in advance.
SELECT AVG(TIMESTAMPDIFF(DAY, s.date_created, s.first_transaction_date)) AS average
FROM (
SELECT u.date_created, MIN(t.transaction_date) AS first_transaction_date
FROM transaction t
INNER JOIN user u ON u.id = t.user_id
WHERE t.status = 1
GROUP BY t.user_id, u.date_created
) s
I replaced the LEFT JOIN with an INNER JOIN because I think that's what you want, but it's not 100% equivalent to your WHERE t.user_id IS NOT NULL.
Feel free to put the LEFT JOIN back if need be.
select avg( TIMESTAMPDIFF(DAY, u.date_created, min_tdate) ) as average
from user u
inner join
(select t.user_id, min(t.transaction_date) as min_tdate
from transaction t
where t.status=1
group by t.user_id
) as min_t
on u.id=min_t.user_id;
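To sanity-check the "first transaction per user, then average" shape, here is a sketch on SQLite through Python's sqlite3. SQLite has no TIMESTAMPDIFF, so the day difference is computed with julianday, which gives fractional days where TIMESTAMPDIFF(DAY, ...) gives whole ones. transaction is a keyword in SQLite, hence the quoting. Rows and function names are invented.

```python
import sqlite3


def make_demo_db():
    """User 1 first pays after 10 days, user 2 after 20, user 3 never."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user (id INTEGER PRIMARY KEY, date_created TEXT);
        CREATE TABLE "transaction" (
            id INTEGER PRIMARY KEY, user_id INTEGER,
            transaction_date TEXT, status INTEGER);
        INSERT INTO user VALUES
            (1, '2020-01-01'), (2, '2020-01-01'), (3, '2020-01-01');
        INSERT INTO "transaction" (user_id, transaction_date, status) VALUES
            (1, '2020-01-11', 1),
            (1, '2020-03-01', 1),   -- later transaction, must not count
            (2, '2020-01-02', 0),   -- wrong status, must not count
            (2, '2020-01-21', 1);
        """
    )
    return conn


def avg_days_to_first_transaction(conn):
    """Average days from user creation to that user's first status-1 transaction."""
    sql = """
        SELECT AVG(julianday(min_t.min_tdate) - julianday(u.date_created))
        FROM user u
        INNER JOIN (SELECT t.user_id, MIN(t.transaction_date) AS min_tdate
                    FROM "transaction" t
                    WHERE t.status = 1
                    GROUP BY t.user_id) AS min_t
          ON u.id = min_t.user_id
    """
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    print(avg_days_to_first_transaction(make_demo_db()))
```

Users without a qualifying transaction drop out through the INNER JOIN, so they do not drag the average down.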