I have a problem similar to LIMITing a SQL JOIN, but with a slightly more complex requirement.
I want to search for Users and associated Transactions, which lie within a time range:
SELECT u.*, t.*
FROM User u
JOIN Transaction t ON t.user_id = u.id
WHERE t.timestamp >= ? and t.timestamp <= ?;
So far, so good. Now I want to repeat the query, but with a LIMIT on the number of users returned. There should be no limit on the number of transactions returned for a given user, though.
If I follow the approach suggested in the other question, this would translate into:
SELECT u.*, t.*
FROM (SELECT * FROM User LIMIT 10) u
JOIN Transaction t ON t.user_id = u.id
WHERE t.timestamp >= ? and t.timestamp <= ?;
This will not produce what I want: it will return the first 10 users, who might not have any transactions associated.
I want to return 10 users who have at least one associated transaction in the given time range.
How can I achieve this using MySQL?
You can use variables for this:
SELECT *
FROM (
   SELECT *,
          @rn := IF(@uid = user_id, @rn,
                    IF(@uid := user_id, @rn + 1, @rn + 1)) AS rn
   FROM (
      SELECT u.*, t.*
      FROM User u
      JOIN Transaction t ON t.user_id = u.id
      WHERE t.timestamp >= x and t.timestamp <= y) AS t
   CROSS JOIN (SELECT @rn := 0, @uid := 0) AS vars
   ORDER BY user_id) AS x
WHERE x.rn <= 10
Variable @rn is incremented by 1 every time a new user is returned by the query. So we can control the number of users returned using rn <= 10.
You can do this without variables, but it requires repeating the join logic:
SELECT u.*, t.*
FROM (SELECT *
FROM User u
WHERE EXISTS (SELECT 1
FROM Transaction t
WHERE t.user_id = u.id AND
t.timestamp >= ? and t.timestamp <= ?
)
LIMIT 10
) u JOIN
Transaction t
ON t.user_id = u.id
WHERE t.timestamp >= ? and t.timestamp <= ?;
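To make the EXISTS approach concrete, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL, and the table names (users, transactions), the ts column and the sample rows are all hypothetical. An ORDER BY is added inside the limited subquery so the chosen users are deterministic.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, ts INTEGER);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat'), (4, 'dan');
INSERT INTO transactions (user_id, ts) VALUES
    (2, 100), (2, 150), (3, 120), (4, 500), (4, 130);
""")


def users_with_transactions(conn, lo, hi, n):
    """Return all in-range transactions for the first n users that have any."""
    sql = """
    SELECT u.id, u.name, t.ts
    FROM (SELECT *
          FROM users u
          WHERE EXISTS (SELECT 1
                        FROM transactions t
                        WHERE t.user_id = u.id
                          AND t.ts >= :lo AND t.ts <= :hi)
          ORDER BY u.id          -- make the LIMIT deterministic
          LIMIT :n) u
    JOIN transactions t ON t.user_id = u.id
    WHERE t.ts >= :lo AND t.ts <= :hi
    ORDER BY u.id, t.ts
    """
    return conn.execute(sql, {"lo": lo, "hi": hi, "n": n}).fetchall()


rows = users_with_transactions(conn, 100, 200, 2)
print(rows)
```

User 1 has no transactions at all, so the limit of 2 is spent on users 2 and 3, and every in-range transaction for those two users comes back.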
EDIT:
Probably the fastest answer is something like this:
select u.*, t.*
from (select distinct user_id
      from (select user_id
            from transaction t
            where t.timestamp >= ? and t.timestamp <= ?
            limit 1000
           ) t
      limit 30
     ) tt join
     user u
     on tt.user_id = u.id join
     transaction t
     on tt.user_id = t.user_id and t.timestamp >= ? and t.timestamp <= ?;
The first subquery chooses 1,000 matching records in the transaction table. My guess is that this is more than enough to get 30 users, but that is not guaranteed if a few users account for most of those rows. This list is then joined to the user and transaction tables to get the final results. Because the list is limited without a full table scan, the first subquery should be pretty fast, especially with an additional index on (timestamp, user_id).
I have below query:
SELECT u.*,
(SELECT sum(trs.amount)
FROM transactions trs
WHERE u.id = trs.user AND trs.type = 'Recycle' AND
trs.TIME >= UNIX_TIMESTAMP(CURDATE())
) as amt
FROM (SELECT DISTINCT user_by
FROM xeon_users_rented
) AS xur JOIN
users u
ON xur.user_by = u.username
LIMIT 50
This selects some data from my database, and it works fine. However, I would also like to select count(*) from xeon_users_rented where user_by = u.username. This is what I have attempted:
SELECT u.*,
(SELECT sum(trs.amount)
FROM transactions trs
WHERE u.id = trs.user AND trs.type = 'Recycle' AND
trs.TIME >= UNIX_TIMESTAMP(CURDATE())
) as amt,
(SELECT DISTINCT count(*)
FROM xeon_users_rented
WHERE xur.user_by = u.username
) AS ttl
FROM (SELECT DISTINCT user_by
FROM xeon_users_rented
) AS xur JOIN
users u
ON xur.user_by = u.username
LIMIT 50
However, that gives me the total number of rows in xeon_users_rented as ttl - not the total distinct rows where username = user_by
I think you can do what you want just by tinkering with your subquery a little. That is, change the select distinct to a group by:
SELECT u.*, xur.cnt,
(SELECT sum(trs.amount)
FROM transactions trs
WHERE u.id = trs.user AND trs.type = 'Recycle' AND
trs.TIME >= UNIX_TIMESTAMP(CURDATE())
) as amt
FROM (SELECT user_by, COUNT(*) as cnt
FROM xeon_users_rented
GROUP BY user_by
) xur JOIN
users u
ON xur.user_by = u.username
LIMIT 50;
Some notes:
SELECT DISTINCT is not really necessary, because you can do the same logic using GROUP BY. So, it is more important to understand GROUP BY.
You are using LIMIT with no ORDER BY. That means that you can get a different set of rows each time you run the query. Bad practice.
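As a rough illustration of both notes, the sketch below counts rows per user with GROUP BY in a derived table and adds an ORDER BY before the LIMIT so the result is repeatable. It runs on an in-memory SQLite database as a stand-in for MySQL; the sample rows and the trimmed-down column lists are assumptions made for the example.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE xeon_users_rented (id INTEGER PRIMARY KEY, user_by TEXT);
INSERT INTO users (username) VALUES ('ann'), ('bob'), ('cat');
INSERT INTO xeon_users_rented (user_by) VALUES ('ann'), ('ann'), ('ann'), ('bob');
""")


def rented_counts(conn, limit=50):
    """Per-user row counts from xeon_users_rented, in a stable order."""
    sql = """
    SELECT u.username, xur.cnt
    FROM (SELECT user_by, COUNT(*) AS cnt
          FROM xeon_users_rented
          GROUP BY user_by) xur
    JOIN users u ON xur.user_by = u.username
    ORDER BY u.username      -- without this, LIMIT picks arbitrary rows
    LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


print(rented_counts(conn))
```

The user with no rows in xeon_users_rented ('cat' in this sample) drops out because the join is an inner join, which matches the behaviour of the query above.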
I have a query like this:
SELECT * FROM user AS u
JOIN article AS a
ON u.id = a.userid
GROUP BY u.id
How can I extract at most 10 articles for each particular user?
MySQL versions before 8.0 don't have window functions for this kind of result. A workaround is to use user-defined variables to get the top n rows per group:
SELECT * FROM (
    SELECT a.*,
           @r := CASE WHEN @g = userid THEN @r + 1 ELSE 1 END row_num,
           @g := userid
    FROM (SELECT *
          FROM `user` AS u
          JOIN article AS a
            ON u.id = a.userid
          ORDER BY u.id, a.id DESC
         ) a
    CROSS JOIN (SELECT @g := NULL, @r := 0) b
) t
WHERE row_num <= 10
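For comparison, on engines that do have window functions (MySQL 8.0 and later, or SQLite 3.25 and later as used here), the same top-n-per-group idea can be sketched with ROW_NUMBER(). This is a swapped-in technique, not the variable-based method above, and the table contents are hypothetical.

```python
import sqlite3

# Hypothetical sample data; requires SQLite 3.25+ for window functions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, userid INTEGER);
INSERT INTO article (id, userid) VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2);
""")


def latest_articles(conn, per_user):
    """Return up to per_user newest article ids for each user."""
    sql = """
    SELECT userid, id
    FROM (SELECT a.userid, a.id,
                 ROW_NUMBER() OVER (PARTITION BY a.userid
                                    ORDER BY a.id DESC) AS rn
          FROM article a) ranked
    WHERE rn <= ?
    ORDER BY userid, id DESC
    """
    return conn.execute(sql, (per_user,)).fetchall()


print(latest_articles(conn, 2))
```

The counter restarts at 1 for every userid partition, which is exactly what the @r / @g pair emulates by hand.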
LEFT JOIN
(
SELECT user_id, review, COUNT(user_id) totalCount
FROM reviews
GROUP BY user_id
) b ON b.user_id = u.user_id
I am trying to fit WHERE LENGTH(review) > 100 in here somewhere, but everywhere I put it, it gives me problems.
The sub-query above counts all reviews by user_id. I simply want to add one more qualification: only count reviews longer than 100 characters.
On a side note, I've seen the function CHAR_LENGTH; not sure if that is what I need either.
EDIT:
Here is complete query working perfectly as expected for my needs:
static public $top_users = "
SELECT u.username, u.score,
(COALESCE(a.totalCount, 0) * 4) +
(COALESCE(b.totalCount, 0) * 5) +
(COALESCE(c.totalCount, 0) * 1) +
(COALESCE(d.totalCount, 0) * 2) +
(COALESCE(u.friend_points, 0)) AS totalScore
FROM users u
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM items
GROUP BY user_id
) a ON a.user_id= u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM reviews
GROUP BY user_id
) b ON b.user_id= u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM ratings
GROUP BY user_id
) c ON c.user_id = u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM comments
GROUP BY user_id
) d ON d.user_id = u.user_id
ORDER BY totalScore DESC LIMIT 25;";
LENGTH() returns the length of the string measured in bytes. You probably want CHAR_LENGTH() as it will give you the actual characters.
SELECT user_id, review, COUNT(user_id) totalCount
FROM reviews
WHERE CHAR_LENGTH(review) > 100
GROUP BY user_id, review
You're also not using GROUP BY correctly.
See the documentation
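The byte versus character distinction is easy to see outside the database. The sketch below mirrors it in Python, assuming the column is stored as UTF-8; the helper names byte_length and char_length are made up for the illustration and only mimic what LENGTH() and CHAR_LENGTH() measure.

```python
def byte_length(s: str) -> int:
    """Number of bytes in the UTF-8 encoding (what LENGTH() measures on a utf8 column)."""
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Number of characters (what CHAR_LENGTH() measures)."""
    return len(s)


sample = "h\u00e9llo"  # 'e' with an acute accent takes two bytes in UTF-8
print(byte_length(sample), char_length(sample))
```

For plain ASCII text the two counts agree, so the difference only shows up once reviews contain accented or other multi-byte characters.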
The query that you want is:
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount,
sum(case when length(review) > 100 then 1 else 0 end
) as NumLongReviews
FROM reviews
GROUP BY user_id
) b ON b.user_id = u.user_id
This counts both the reviews and the "long" reviews. The second count is done using a CASE expression nested in a sum() function.
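A small sketch of that conditional count, run on an in-memory SQLite database as a stand-in for MySQL. The sample reviews are hypothetical, and the length threshold is passed as a parameter so that short test strings can exercise it. Note that SQLite's length() counts characters for text values, so it behaves like CHAR_LENGTH() here.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, user_id INTEGER, review TEXT)")
conn.executemany(
    "INSERT INTO reviews (user_id, review) VALUES (?, ?)",
    [(1, "short"), (1, "x" * 20), (2, "tiny")],
)


def review_counts(conn, min_len):
    """Per user: total reviews and reviews longer than min_len characters."""
    sql = """
    SELECT user_id,
           COUNT(*) AS totalCount,
           SUM(CASE WHEN length(review) > ? THEN 1 ELSE 0 END) AS NumLongReviews
    FROM reviews
    GROUP BY user_id
    ORDER BY user_id
    """
    return conn.execute(sql, (min_len,)).fetchall()


print(review_counts(conn, 10))
```

Because the filter lives inside the SUM rather than in a WHERE clause, users with only short reviews still appear, with a long-review count of 0.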
I'm trying to get a subset of records in a GROUP BY. I've seen a lot of crazy solutions out there, but they seem too complicated. Is there a more efficient way to do this?
SELECT user_id, GROUP_CONCAT(item_id ORDER BY `timestamp`) AS items
FROM wb_user_book_current_item GROUP BY user_id
So this will return me all the current items for all users which is okay so far. But I only want the ten most recent items. Adding ORDER BY to the GROUP_CONCAT helps, but it still doesn't give me the last ten records.
EDIT
If I hard-code the user_id, as below, I get the results I want for that one user. The problem is generalizing it so that I don't need to hard-code the user_id and can, for instance, get the last ten items for ALL users:
SELECT GROUP_CONCAT(cp2.item_id) AS items
FROM (SELECT cp.user_id, cp.item_id
FROM wb_user_book_current_item cp
WHERE cp.user_id=1 ORDER BY cp.`timestamp`
LIMIT 10) AS cp2
GROUP BY cp2.user_id
This is a difficult problem, but how about this:
SELECT user_id, GROUP_CONCAT(item_id ORDER BY `timestamp`) AS items
FROM wb_user_book_current_item T
WHERE NOT EXISTS
(
SELECT 1
FROM wb_user_book_current_item T2
WHERE T2.user_id = T.user_id
ORDER BY T2.`timestamp` DESC
LIMIT 10,1
)
OR T.`timestamp` > (
SELECT T2.`timestamp`
FROM wb_user_book_current_item T2
WHERE T2.user_id = T.user_id
ORDER BY T2.`timestamp` DESC
LIMIT 10,1
)
GROUP BY user_id
This of course assumes you won't have two rows with the same timestamp for the same user.
If your timestamp field is always a positive integer, you can also replace the NOT EXISTS...OR with a COALESCE:
SELECT user_id, GROUP_CONCAT(item_id ORDER BY `timestamp`) AS items
FROM wb_user_book_current_item T
WHERE T.`timestamp` > COALESCE((
SELECT T2.`timestamp`
FROM wb_user_book_current_item T2
WHERE T2.user_id = T.user_id
ORDER BY T2.`timestamp` DESC
LIMIT 10,1
), 0)
GROUP BY user_id
Original answer, but apparently MySQL doesn't understand how to do this properly and complains the subselect returns multiple rows. Of course we want multiple rows; it's a GROUP_CONCAT. Grr.
Unfortunately, I think there's no real way around using a subquery:
SELECT T.user_id,
GROUP_CONCAT((SELECT T2.item_id
FROM wb_user_book_current_item T2
WHERE T2.user_id = T.user_id
ORDER BY T2.`timestamp`
LIMIT 10)) AS items
FROM wb_user_book_current_item T
GROUP BY user_id
Otherwise, adding LIMIT anywhere else will either limit the number of groups, or limit the total recordset over the table (and not the group), neither of which is what you are trying to achieve.
So came across a nice solution here that works pretty well.
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
It's something like this put all together:
SET @num := 0, @user_id := '';
SELECT cp2.user_id, GROUP_CONCAT(cp2.item_id) AS items
FROM (
    SELECT cp.user_id, cp.item_id,
           @num := IF(@user_id = cp.user_id, @num + 1, 1) AS row_num,
           @user_id := cp.user_id AS dummy
    FROM wb_user_book_current_item AS cp
    ORDER BY cp.user_id ASC, cp.`timestamp` DESC
) AS cp2 WHERE cp2.row_num <= 10
GROUP BY cp2.user_id
So basically it just uses the @num counter to limit the records per user rather than using LIMIT.
SELECT
i.user_id,
GROUP_CONCAT(i.item_id ORDER BY i.timestamp) AS items
FROM
( SELECT DISTINCT user_id
FROM wb_user_book_current_item
) AS du
JOIN
wb_user_book_current_item AS i
ON i.user_id = du.user_id
AND i.timestamp <= COALESCE(
( SELECT i2.timestamp
FROM wb_user_book_current_item AS i2
WHERE i2.user_id = du.user_id
ORDER BY i2.timestamp ASC
LIMIT 1 OFFSET 9
)
, '2038-01-19 03:14:07')
GROUP BY
i.user_id ;
An index on (user_id, timestamp, item_id) will help efficiency.
Try this:
SELECT
user_id,
GROUP_CONCAT(item_id ORDER BY `timestamp`) AS items
FROM wb_user_book_current_item
GROUP BY user_id
LIMIT 0, 10
UPDATE: I didn't notice the GROUP_CONCAT, so you will have to use a subquery in conjunction with LIMIT.
use LIMIT
SELECT column_name(s)
FROM table_name
LIMIT number
I'm working with a MySQL query that is supposed to select all messages addressed to or sent by the user. I need to group all messages with the same UID so that I show a single thread for each different user (this means it should eliminate all messages with the same UID except the last one). My problem is that I started using GROUP BY to do it, but sometimes the row that remains is an older message instead of the latest.
This is what I was trying:
SELECT `UID`, `Name`, `Text`, `A`.`Date`
FROM `Users`
INNER JOIN (
(
SELECT *, To_UID AS UID FROM `Messages` WHERE `From_UID` = '$userID' AND `To_UID` != '$userID'
)
UNION ALL
(
SELECT *, From_UID AS UID FROM `Messages` WHERE `To_UID` = '$userID' AND `From_UID` != '$userID'
)
) AS A
ON A.UID = Users.ID
GROUP BY UID -- This doesn't work
How can I show only the row with the most recent date per UID?
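Use DISTINCT and only use ORDER BY date.
When the select list contains non-aggregated columns that are not in the GROUP BY, MySQL (with ONLY_FULL_GROUP_BY disabled) is free to return values from any row in each group, which is why you sometimes get the older message. This isn't commonly discussed.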
You can try something like this:
select e.UID, Name, Text, e.date
from Users
inner join (
    -- all messages in the user's threads, tagged with the other party's UID
    select d.UID, b.Text, b.date
    from Messages as b
    inner join (
        -- latest message date per thread
        select if(c.From_UID = '$userID', c.To_UID, c.From_UID) as UID,
               max(c.date) as date
        from Messages as c
        where c.From_UID = '$userID' or c.To_UID = '$userID'
        group by UID
    ) as d on d.date = b.date
          and d.UID = if(b.From_UID = '$userID', b.To_UID, b.From_UID)
    where b.From_UID = '$userID' or b.To_UID = '$userID'
) as e on e.UID = Users.ID
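Or create a temp table / stored procedure to make life easier.
Temp table:
-- Caveat: MySQL raises "Can't reopen table" when one query references the
-- same TEMPORARY table twice, as the self-join below does; a regular table avoids that.
create temporary table t as
select if(From_UID = '$userID', To_UID, From_UID) as UID, Messages.*
from Messages
where From_UID = '$userID' or To_UID = '$userID';

select e.UID, Name, Text, e.date
from Users
inner join (
    select t1.*
    from t as t1
    inner join (
        select t2.UID,
               max(t2.date) as date
        from t as t2
        group by t2.UID
    ) as t3 on t3.date = t1.date and t3.UID = t1.UID
) as e on e.UID = Users.ID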
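The "latest row per thread" pattern from these answers can be sketched end to end as follows. It runs on an in-memory SQLite database as a stand-in for MySQL, so IF() is written as CASE and the per-thread view is a common table expression instead of a temp table. The sample users and messages are hypothetical, Date is stored as a plain integer for simplicity, and messages a user sends to themselves are excluded, as in the question.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Messages (ID INTEGER PRIMARY KEY, From_UID INTEGER,
                       To_UID INTEGER, Text TEXT, Date INTEGER);
INSERT INTO Users VALUES (1, 'me'), (2, 'bob'), (3, 'cat');
INSERT INTO Messages (From_UID, To_UID, Text, Date) VALUES
    (1, 2, 'hi', 1), (2, 1, 'hello', 2), (3, 1, 'yo', 3),
    (1, 3, 'sup', 4), (1, 2, 'bye', 5);
""")


def latest_per_thread(conn, user_id):
    """Newest message in each conversation that user_id takes part in."""
    sql = """
    WITH thread AS (
        -- tag every message with the UID of the other party
        SELECT CASE WHEN From_UID = :me THEN To_UID ELSE From_UID END AS UID,
               Text, Date
        FROM Messages
        WHERE (From_UID = :me OR To_UID = :me) AND From_UID <> To_UID
    )
    SELECT t.UID, u.Name, t.Text, t.Date
    FROM thread t
    JOIN (SELECT UID, MAX(Date) AS Date
          FROM thread
          GROUP BY UID) latest
      ON latest.UID = t.UID AND latest.Date = t.Date
    JOIN Users u ON u.ID = t.UID
    ORDER BY t.Date DESC
    """
    return conn.execute(sql, {"me": user_id}).fetchall()


print(latest_per_thread(conn, 1))
```

Joining back on (UID, MAX(Date)) keeps whole rows instead of relying on whichever row GROUP BY happens to surface. If two messages in one thread share the same Date, both are returned.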