maximum rows per group subset - mysql

I have a query like this:
SELECT * FROM user AS u
JOIN article AS a
ON u.id = a.userid
GROUP BY u.id
How can I extract a maximum of 10 articles for each user?

MySQL versions before 8.0 do not have window functions for this kind of result. A workaround is to use user-defined variables to get the top n rows per group:
SELECT * FROM (
SELECT a.*,
@r := CASE WHEN @g = userid THEN @r + 1 ELSE 1 END AS row_num,
@g := userid AS grp
FROM (SELECT a.*
FROM `user` AS u
JOIN article AS a
ON u.id = a.userid
ORDER BY u.id, a.id DESC
) a
CROSS JOIN (SELECT @g := NULL, @r := 0) b
) t
WHERE row_num <= 10
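On MySQL 8.0 or later, where window functions are available, the same top-n-per-group filter can be sketched with ROW_NUMBER(). This is only a sketch: it assumes the question's tables, `user`(id, ...) and article(id, userid, ...), and it treats "latest" as "highest article id", as the variables query does.

```sql
-- Assumes MySQL 8.0+ and the question's tables: `user`(id, ...) and article(id, userid, ...).
SELECT *
FROM (
    SELECT a.*,
           -- number each user's articles, newest (highest id) first
           ROW_NUMBER() OVER (PARTITION BY a.userid ORDER BY a.id DESC) AS row_num
    FROM `user` AS u
    JOIN article AS a ON u.id = a.userid
) t
WHERE row_num <= 10;
```

The filter sits in the outer query because a window function cannot be referenced in the WHERE clause of the query that computes it.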

Related

LIMITing a SQL JOIN, with JOIN conditions

I have a problem similar to LIMITing a SQL JOIN, but with a slightly more complex requirement.
I want to search for Users and associated Transactions, which lie within a time range:
SELECT u.*, t.*
FROM User u
JOIN Transaction t ON t.user_id = u.id
WHERE t.timestamp >= ? and t.timestamp <= ?;
So far, so good. Now I want to repeat the query, but with a LIMIT on the number of users returned. There should be no limit on the number of transactions returned for a given user, though.
If I follow the approach suggested in the other question, this would translate into:
SELECT u.*, t.*
FROM (SELECT * FROM User LIMIT 10) u
JOIN Transaction t ON t.user_id = u.id
WHERE t.timestamp >= ? and t.timestamp <= ?;
This will not produce what I want: it will return the first 10 users, who might not have any transactions associated.
I want to return 10 users who have at least one associated transaction in the given time range.
How can I achieve this using MySQL?
You can use variables for this:
SELECT *
FROM (
SELECT *,
@rn := IF(@uid = user_id, @rn,
IF(@uid := user_id, @rn +1, @rn + 1)) AS rn
FROM (
SELECT u.*, t.*
FROM User u
JOIN Transaction t ON t.user_id = u.id
WHERE t.timestamp >= x and t.timestamp <= y) AS t
CROSS JOIN (SELECT @rn := 0, @uid := 0) AS vars
ORDER BY user_id) AS x
WHERE x.rn <= 10
The variable @rn is incremented by 1 every time the query reaches a new user, so every row of the same user gets the same rn value. We can therefore control the number of users returned with rn <= 10.
You can do this without variables, but it requires repeating the join logic:
SELECT u.*, t.*
FROM (SELECT *
FROM User u
WHERE EXISTS (SELECT 1
FROM Transaction t
WHERE t.user_id = u.id AND
t.timestamp >= ? and t.timestamp <= ?
)
LIMIT 10
) u JOIN
Transaction t
ON t.user_id = u.id
WHERE t.timestamp >= ? and t.timestamp <= ?;
EDIT:
Probably the fastest answer is something like this:
select u.*, t.*
from (select distinct user_id
from (select user_id
from transaction t
where t.timestamp >= ? and t.timestamp <= ?
limit 1000
) t
limit 30
) tt join
user u
on tt.user_id = u.id join
transaction t
on tt.user_id = t.user_id and t.timestamp >= ? and t.timestamp <= ?;
The first subquery chooses 1,000 matching records in the transaction table. My guess is that this is more than enough to get 30 users. This list is then joined to the user and transaction tables to get the final results. By limiting the list without having to do a full table scan, the first query should be pretty fast, especially with an additional index on (timestamp, user_id).
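On MySQL 8.0 or later, the idea of numbering users and filtering on that number can also be sketched with DENSE_RANK(), which gives every row of the same user the same rank. This is an illustration, not one of the answers above: it assumes the question's tables, User(id, ...) and Transaction(user_id, timestamp, ...), and the `?` markers stand for the same bind parameters as in the question.

```sql
-- Assumes MySQL 8.0+; run as a prepared statement with the two timestamps bound to ?.
SELECT *
FROM (
    SELECT u.id AS user_id,
           t.timestamp,
           -- every transaction row of one user shares the same rank
           DENSE_RANK() OVER (ORDER BY u.id) AS user_rank
    FROM User u
    JOIN Transaction t ON t.user_id = u.id
    WHERE t.timestamp >= ? AND t.timestamp <= ?
) ranked
WHERE user_rank <= 10;
```

Because the WHERE clause runs before the window function is evaluated, only users with at least one transaction in the range are ranked, and all of their matching transactions are kept.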

MySQL: where count is higher than average

I want to select posts from users whose follower count is higher than the overall average (compared to other users).
The problem is that when I use AVG() it limits the number of posts/users coming through, yet I can't use GROUP BY j.id because it breaks the average, and WHERE j2.fCount >= j2.oAvg stops working properly.
Here's my code
SELECT * FROM (
SELECT j.*, ROUND(AVG(j.fCount)) as oAvg
FROM (
SELECT p.id , COUNT(fCount.id) as fCount
FROM `post` p
LEFT JOIN `table` table ON ...
LEFT JOIN `user` user ON ....
LEFT JOIN `follow` fCount ON fCount.user_id=user.id AND fCount.follow_id=table.ids
WHERE p.user_id=fCount.user_id
group by p.id
) j
---- > `GROUP BY j.id` - BREAKS THE AVERAGE BELOW
) j2
WHERE j2.fCount >= j2.oAvg
Thank you :)
Because you're trying to compare against the average, you might have to run your inner query twice, like this:
SELECT *,
(SELECT AVG(fCount) as average FROM
(SELECT COUNT(fCount.id) as fCount
FROM post p
LEFT JOIN follow fCount ON fCount.user_id = p.user_id
GROUP BY p.id
)j1
)as average
FROM
(SELECT p2.id, COUNT(fCount2.id) as fCount
FROM post p2
LEFT JOIN follow fCount2 ON fCount2.user_id = p2.user_id
GROUP BY p2.id
)j2
HAVING fCount >= average
sqlfiddle
Just replace the inner queries of j1 and j2 with your j.
If you want to run the inner query only once, you can use user-defined variables to total up the counts and divide that sum by the number of posts to calculate the average yourself, like this:
SELECT id,fCount,@sum/@count as average
FROM
(SELECT id,
fCount,
@sum := @sum + fCount as total,
@count := @count + 1 as posts
FROM
(SELECT p.id,COUNT(fCount.id) as fCount
FROM post p
LEFT JOIN follow fCount ON fCount.user_id = p.user_id
GROUP BY p.id
)j,
(SELECT @sum:=0.0,@count:=0.0)initialize
)T
HAVING fCount >= average
sqlfiddle
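On MySQL 8.0 or later, a third option is to compute the overall average as a window aggregate, so the inner query runs once and no variables are needed. The sketch below reuses the simplified post/follow join from the answer above; the alias f and the derived-table names j and t are chosen for illustration.

```sql
-- Assumes MySQL 8.0+ and the simplified schema used above: post(id, user_id), follow(id, user_id).
SELECT id, fCount, average
FROM (
    SELECT j.id,
           j.fCount,
           AVG(j.fCount) OVER () AS average   -- one average over all rows of j
    FROM (
        SELECT p.id, COUNT(f.id) AS fCount
        FROM post p
        LEFT JOIN follow f ON f.user_id = p.user_id
        GROUP BY p.id
    ) j
) t
WHERE fCount >= average;
```

The empty OVER () makes the whole result of j a single window, so each row carries the same average without collapsing the rows the way a plain AVG() does.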

SQL - Select distinct AND count from JOIN query

I have below query:
SELECT u.*,
(SELECT sum(trs.amount)
FROM transactions trs
WHERE u.id = trs.user AND trs.type = 'Recycle' AND
trs.TIME >= UNIX_TIMESTAMP(CURDATE())
) as amt
FROM (SELECT DISTINCT user_by
FROM xeon_users_rented
) AS xur JOIN
users u
ON xur.user_by = u.username
LIMIT 50
This selects some data from my database and works fine. However, I would also like to select count(*) from xeon_users_rented where user_by = u.username. This is what I have attempted:
SELECT u.*,
(SELECT sum(trs.amount)
FROM transactions trs
WHERE u.id = trs.user AND trs.type = 'Recycle' AND
trs.TIME >= UNIX_TIMESTAMP(CURDATE())
) as amt,
(SELECT DISTINCT count(*)
FROM xeon_users_rented
WHERE xur.user_by = u.username
) AS ttl
FROM (SELECT DISTINCT user_by
FROM xeon_users_rented
) AS xur JOIN
users u
ON xur.user_by = u.username
LIMIT 50
However, that gives me the total number of rows in xeon_users_rented as ttl - not the total distinct rows where username = user_by
I think you can do what you want just by tinkering with your subquery a little. That is, change the select distinct to a group by:
SELECT u.*, xur.cnt,
(SELECT sum(trs.amount)
FROM transactions trs
WHERE u.id = trs.user AND trs.type = 'Recycle' AND
trs.TIME >= UNIX_TIMESTAMP(CURDATE())
) as amt
FROM (SELECT user_by, COUNT(*) as cnt
FROM xeon_users_rented
GROUP BY user_by
) xur JOIN
users u
ON xur.user_by = u.username
LIMIT 50;
Some notes:
SELECT DISTINCT is not really necessary, because you can do the same logic using GROUP BY. So, it is more important to understand GROUP BY.
You are using LIMIT with no ORDER BY. That means that you can get a different set of rows each time you run the query. Bad practice.
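For reference, the attempted subquery returned the total row count because `xur.user_by` inside it refers to the outer derived table xur, not to the table being counted, so the condition is true for every row of xeon_users_rented. A minimal sketch of the correlated-subquery form follows; the alias r and the ORDER BY column are assumptions added for illustration, and the ordering is only deterministic if username is unique.

```sql
SELECT u.*,
       (SELECT COUNT(*)
        FROM xeon_users_rented r          -- alias r is introduced for illustration
        WHERE r.user_by = u.username      -- compares the counted rows, not the outer xur row
       ) AS ttl
FROM (SELECT DISTINCT user_by
      FROM xeon_users_rented
     ) AS xur JOIN
     users u
     ON xur.user_by = u.username
ORDER BY u.username                       -- assumed ordering so LIMIT returns a stable set
LIMIT 50;
```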

Sub Query counting character strings in MySQL

LEFT JOIN
(
SELECT user_id, review, COUNT(user_id) totalCount
FROM reviews
GROUP BY user_id
) b ON b.user_id = u.user_id
I am trying to fit WHERE LENGTH(review) > 100 in this somewhere, but everywhere I put it, it gives me problems.
The sub-query above counts all reviews by user_id. I simply want to add one more qualification: only count reviews longer than 100 characters.
On a side note, I've seen the function CHAR_LENGTH, but I'm not sure if that is what I need either.
EDIT:
Here is complete query working perfectly as expected for my needs:
static public $top_users = "
SELECT u.username, u.score,
(COALESCE(a.totalCount, 0) * 4) +
(COALESCE(b.totalCount, 0) * 5) +
(COALESCE(c.totalCount, 0) * 1) +
(COALESCE(d.totalCount, 0) * 2) +
(COALESCE(u.friend_points, 0)) AS totalScore
FROM users u
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM items
GROUP BY user_id
) a ON a.user_id= u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM reviews
GROUP BY user_id
) b ON b.user_id= u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM ratings
GROUP BY user_id
) c ON c.user_id = u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM comments
GROUP BY user_id
) d ON d.user_id = u.user_id
ORDER BY totalScore DESC LIMIT 25;";
LENGTH() returns the length of the string measured in bytes. You probably want CHAR_LENGTH() as it will give you the actual characters.
SELECT user_id, review, COUNT(user_id) totalCount
FROM reviews
WHERE CHAR_LENGTH(review) > 100
GROUP BY user_id, review
You're also not using GROUP BY correctly.
See the documentation
The query that you want is:
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount,
sum(case when length(review) > 100 then 1 else 0 end
) as NumLongReviews
FROM reviews
GROUP BY user_id
) b ON b.user_id = u.user_id
This counts both the reviews and the "long" reviews. The second count is done with a CASE expression nested in a SUM() function.
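If only the long reviews need to be counted, the filter can also go in the subquery's WHERE clause, before the GROUP BY. The sketch below assumes the users and reviews tables from the full query in the question; the output alias longReviewCount is made up for illustration.

```sql
SELECT u.username,
       COALESCE(b.totalCount, 0) AS longReviewCount   -- users with no long reviews get 0
FROM users u
LEFT JOIN
(
    SELECT user_id, COUNT(*) AS totalCount
    FROM reviews
    WHERE CHAR_LENGTH(review) > 100   -- rows are filtered before they are grouped
    GROUP BY user_id
) b ON b.user_id = u.user_id;
```

Selecting only user_id and the aggregate keeps the subquery consistent with its GROUP BY clause.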

MySQL INNER JOIN select only one row from second table

I have a users table and a payments table. Each user who has payments may have multiple associated rows in the payments table. I would like to select all users who have payments, but only select their latest payment. I'm trying this SQL, but I've never tried nested SQL statements before, so I want to know what I'm doing wrong. I appreciate the help.
SELECT u.*
FROM users AS u
INNER JOIN (
SELECT p.*
FROM payments AS p
ORDER BY date DESC
LIMIT 1
)
ON p.user_id = u.id
WHERE u.package = 1
You need to have a subquery to get their latest date per user ID.
SELECT u.*, p.*
FROM users u
INNER JOIN payments p
ON u.id = p.user_ID
INNER JOIN
(
SELECT user_ID, MAX(date) maxDate
FROM payments
GROUP BY user_ID
) b ON p.user_ID = b.user_ID AND
p.date = b.maxDate
WHERE u.package = 1
SELECT u.*, p.*
FROM users AS u
INNER JOIN payments AS p ON p.id = (
SELECT id
FROM payments AS p2
WHERE p2.user_id = u.id
ORDER BY date DESC
LIMIT 1
)
Or
SELECT u.*, p.*
FROM users AS u
INNER JOIN payments AS p ON p.user_id = u.id
WHERE NOT EXISTS (
SELECT 1
FROM payments AS p2
WHERE
p2.user_id = p.user_id AND
(p2.date > p.date OR (p2.date = p.date AND p2.id > p.id))
)
These solutions are better than the accepted answer because they work correctly when there are multiple payments with the same user and date. You can try them on SQL Fiddle.
SELECT u.*, p.*, max(p.date)
FROM payments p
JOIN users u ON u.id=p.user_id AND u.package = 1
GROUP BY u.id
ORDER BY p.date DESC
Check out this sqlfiddle
SELECT u.*
FROM users AS u
INNER JOIN (
SELECT p.*,
@num := if(@id = user_id, @num + 1, 1) as row_num,
@id := user_id as tmp
FROM payments AS p,
(SELECT @num := 0) x,
(SELECT @id := 0) y
ORDER BY p.user_id ASC, date DESC) AS p
ON (p.user_id = u.id) and (p.row_num = 1)
WHERE u.package = 1
On MySQL 8.0 or later, which supports window functions, you can try this:
SELECT u.*, p.*
FROM users AS u LEFT JOIN (
SELECT *, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY date DESC) AS RowNo
FROM payments
) AS p ON u.id = p.user_id AND p.RowNo = 1
There are two problems with your query:
Every table and subquery needs a name, so you have to name the subquery: INNER JOIN (SELECT ...) AS p ON ...
The subquery as you have it returns only one row in total, but you actually want one row for each user. For that you need one query to get the max date per user and then a self-join back to payments to get the whole row.
Assuming there are no ties for payments.date, try:
SELECT u.*, p.*
FROM (
SELECT MAX(p.date) AS date, p.user_id
FROM payments AS p
GROUP BY p.user_id
) AS latestP
INNER JOIN users AS u ON latestP.user_id = u.id
INNER JOIN payments AS p ON p.user_id = u.id AND p.date = latestP.date
WHERE u.package = 1
@John Woo's answer helped me solve a similar problem. I've improved upon his answer by setting the correct ordering as well. This has worked for me:
SELECT a.*, c.*
FROM users a
INNER JOIN payments c
ON a.id = c.user_ID
INNER JOIN (
SELECT user_ID, MAX(date) as maxDate FROM
(
SELECT user_ID, date
FROM payments
ORDER BY date DESC
) d
GROUP BY user_ID
) b ON c.user_ID = b.user_ID AND
c.date = b.maxDate
WHERE a.package = 1
I'm not sure how efficient this is, though.
SELECT U.*, V.* FROM users AS U
INNER JOIN (SELECT *
FROM payments
WHERE id IN (
SELECT MAX(id)
FROM payments
GROUP BY user_id
)) AS V ON U.id = V.user_id
This will get it working
Matei Mihai gave a simple and efficient solution, but it will not work unless you put MAX(date) in the SELECT part, so the query becomes:
SELECT u.*, p.*, max(date)
FROM payments p
JOIN users u ON u.id=p.user_id AND u.package = 1
GROUP BY u.id
ORDER BY makes no difference to the grouping itself, but it can order the final result produced by GROUP BY. I tried it and it worked for me.
My answer is directly inspired by @valex's, and is very useful if you need several columns in the ORDER BY clause.
SELECT u.*
FROM users AS u
INNER JOIN (
SELECT p.*,
@num := if(@id = user_id, @num + 1, 1) as row_num,
@id := user_id as tmp
FROM (SELECT * FROM payments ORDER BY user_id ASC, date DESC) AS p,
(SELECT @num := 0) x,
(SELECT @id := 0) y
) AS p
ON (p.user_id = u.id) and (p.row_num = 1)
WHERE u.package = 1
This is quite simple: do the inner join, then group by the user id and apply the MAX aggregate function to the payment id. Assuming your tables are user and payment, the query can be:
SELECT user.id, max(payment.id)
FROM user INNER JOIN payment ON (user.id = payment.user_id)
GROUP BY user.id
If you do not have to return the payment from the query you can do this with distinct, like:
SELECT DISTINCT u.*
FROM users AS u
INNER JOIN payments AS p ON p.user_id = u.id
This returns only users who have at least one associated record in the payments table (because of the inner join). A user with multiple payments is returned only once (because of DISTINCT), but the payment itself is not returned. If you need the payment in the result, you can use a subquery, as others have proposed.
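The same "users with at least one payment" result can also be sketched with EXISTS, which avoids producing one joined row per payment and then removing the duplicates. It assumes the users(id, ...) and payments(user_id, ...) tables from the question.

```sql
SELECT u.*
FROM users AS u
WHERE EXISTS (
    SELECT 1
    FROM payments AS p
    WHERE p.user_id = u.id   -- true as soon as one payment for this user is found
);
```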