How to select a maximum of 3 items per user in MySQL?

I run a website where users can post items (e.g. pictures). The items are stored in a MySQL database.
I want to query for the last ten posted items, but with the constraint that at most 3 items can come from any single user.
What is the best way of doing it? My preferred solution is a constraint on the SQL query that requests the last ten items, but ideas on how to set up the database design are very welcome.
Thanks in advance!
BR

It's pretty easy with a correlated sub-query:
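SELECT `img`.`id`, `img`.`userid`
FROM `img`
WHERE 3 > (
-- number of newer items posted by the same user
SELECT COUNT(*)
FROM `img` AS `img1`
WHERE `img1`.`userid` = `img`.`userid`
AND `img1`.`id` > `img`.`id` )
ORDER BY `img`.`id` DESC
LIMIT 10
The query assumes that a larger id means the item was added later. It keeps each user's three newest items by counting how many newer items that user has.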
Correlated sub-queries are a powerful tool! :-)
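A quick way to check which rows a query like this keeps is to run it against a tiny in-memory database. This sketch uses Python's standard-library sqlite3 module. SQLite is only a stand-in for MySQL here, and the `latest_items` helper and the sample rows are made up for illustration. The data gives user 1 a burst of eight posts, so the per-user cap visibly kicks in. The subquery counts *newer* rows by the same user (`img1.id > img.id`), which is what keeps each user's latest three.

```python
import sqlite3


def latest_items(conn, total=10, per_user=3):
    """Newest `total` items, at most `per_user` per user (larger id = newer)."""
    sql = """
        SELECT img.id, img.userid
        FROM img
        WHERE ? > (
            -- how many items by the same user are newer than this one
            SELECT COUNT(*)
            FROM img AS img1
            WHERE img1.userid = img.userid
              AND img1.id > img.id
        )
        ORDER BY img.id DESC
        LIMIT ?
    """
    return conn.execute(sql, (per_user, total)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE img (id INTEGER PRIMARY KEY, userid INTEGER NOT NULL)")
# Hypothetical data: user 1 posts ids 1-8 in a burst, then users 2 and 3 join in.
rows = [(i, 1) for i in range(1, 9)] + [(9, 2), (10, 3), (11, 2), (12, 1),
                                        (13, 3), (14, 2), (15, 2)]
conn.executemany("INSERT INTO img (id, userid) VALUES (?, ?)", rows)
print(latest_items(conn))
```

With this data, user 1's older posts (ids 1 to 6) and user 2's oldest post (id 9) are filtered out, leaving eight rows.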

This is difficult because MySQL does not support LIMIT inside IN (...) subqueries. If it did, this would be rather trivial... But alas, here is my naïve approach:
SELECT
i.UserId,
i.ImageId
FROM
UserSuppliedImages i
WHERE
/* first valid ImageId: the user's latest image */
ImageId = (
SELECT MAX(ImageId)
FROM UserSuppliedImages
WHERE UserId = i.UserId
)
OR
/* second valid ImageId */
ImageId = (
SELECT MAX(ImageId)
FROM UserSuppliedImages
WHERE UserId = i.UserId
AND ImageId < (
SELECT MAX(ImageId)
FROM UserSuppliedImages
WHERE UserId = i.UserId
)
)
/* you get the picture...
the more "per user" images you want, the more complex this will get */
ORDER BY i.ImageId DESC
LIMIT 10;
You did not mention a preferred result order, so this selects each user's latest images (assuming ImageId is an ascending auto-incrementing value).
For comparison, on SQL Server the same would look like this:
SELECT TOP 10
img.ImageId,
img.ImagePath,
img.UserId
FROM
UserSuppliedImages img
WHERE
ImageId IN (
SELECT TOP 3 ImageId
FROM UserSuppliedImages
WHERE UserId = img.UserId
ORDER BY ImageId DESC
)
ORDER BY img.ImageId DESC

I would first select 10 distinct users, then select images from each of those users with a LIMIT 3, possibly via a UNION of all those, and limit that to 10.
That would at least narrow down the data you need to process by a fair amount.
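The two-step idea can be sketched in Python with the standard sqlite3 module. SQLite stands in for MySQL, and the `latest_items_two_step` helper and the sample data are hypothetical. The per-user queries are issued from application code rather than glued together with UNION. This keeps each query simple and lets it use LIMIT freely.

```python
import sqlite3


def latest_items_two_step(conn, total=10, per_user=3):
    """Newest `total` items with at most `per_user` per user, in two steps."""
    # Step 1: the `total` distinct users whose newest item is most recent.
    users = [u for (u,) in conn.execute(
        "SELECT userid FROM img GROUP BY userid ORDER BY MAX(id) DESC LIMIT ?",
        (total,))]
    # Step 2: the newest `per_user` items of each selected user.
    candidates = []
    for user in users:
        candidates.extend(conn.execute(
            "SELECT id, userid FROM img WHERE userid = ? ORDER BY id DESC LIMIT ?",
            (user, per_user)).fetchall())
    # Step 3: merge (the "union") and keep the newest `total` rows overall.
    candidates.sort(key=lambda row: row[0], reverse=True)
    return candidates[:total]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE img (id INTEGER PRIMARY KEY, userid INTEGER NOT NULL)")
rows = [(i, 1) for i in range(1, 9)] + [(9, 2), (10, 3), (11, 2), (12, 1),
                                        (13, 3), (14, 2), (15, 2)]
conn.executemany("INSERT INTO img (id, userid) VALUES (?, ?)", rows)
print(latest_items_two_step(conn))
```

Restricting to the 10 users whose newest item is most recent loses nothing. Each of those users contributes at least one eligible item that is newer than anything posted by any other user, so no other user's item could make the final list.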

Related

order by makes query slow

I have two tables:
video (ID, TITLE, ..., UPLOADED_DATE)
join_video_category (ID (not used), ID_VIDEO, ID_CATEGORY)
Rows in video: 4,500,000
Rows in join_video_category: 5,800,000
One video can have many categories.
I have a query that works perfectly; it takes 20 ms at most to get the result:
SELECT * FROM video WHERE ID IN
(SELECT ID_VIDEO FROM join_video_category WHERE ID_CATEGORY=11)
LIMIT 1000;
This query returns 1000 videos; the order is not important.
But when I want to get the 10 latest videos from a category, my query takes around 30-40 seconds:
SELECT * FROM video WHERE ID IN
(SELECT ID_VIDEO FROM join_video_category WHERE ID_CATEGORY=11)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
I have indexes on ID_CATEGORY, ID_VIDEO and UPLOADED_DATE, and a PRIMARY KEY on ID in both video and join_video_category.
I have also tried the query with a JOIN; the result is the same.
First, you are comparing two very different queries. The first returns a bunch of videos as soon as it encounters them. The second has to read all the matching videos and then sort them.
Try rewriting this as a JOIN:
SELECT v.*
FROM video v JOIN
join_video_category vc
ON v.id = vc.id_video
WHERE vc.ID_CATEGORY = 11
ORDER BY v.UPLOADED_DATE DESC
LIMIT 10;
That may or may not help. You have a lot of data, so you might have a lot of videos in a given category. If so, a WHERE clause that restricts the search to more recent data might really help:
SELECT v.*
FROM video v JOIN
join_video_category vc
ON v.id = vc.id_video
WHERE vc.ID_CATEGORY = 11 AND v.UPLOADED_DATE >= '2015-01-01'
ORDER BY v.UPLOADED_DATE DESC
LIMIT 10;
Finally, if that doesn't work, consider adding something like UPLOADED_DATE into join_video_category. Then this query should blaze:
select vc.id_video
from join_video_category vc
where vc.ID_CATEGORY = 11
order by vc.UPLOADED_DATE desc
limit 10;
with an index on join_video_category(id_category, uploaded_date, id_video).
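The following sketch shows why the copied date column helps. It uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN. SQLite stands in for MySQL, the video table is omitted, and the index name idx_cat_date and all the data are hypothetical. With the composite index, the database can jump straight to category 11 and read its entries already in upload-date order, so no separate sort step is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE join_video_category (
        id_video      INTEGER NOT NULL,
        id_category   INTEGER NOT NULL,
        uploaded_date TEXT    NOT NULL,  -- copied from video.UPLOADED_DATE
        PRIMARY KEY (id_video, id_category)
    );
    CREATE INDEX idx_cat_date
        ON join_video_category (id_category, uploaded_date, id_video);
""")
# Hypothetical data: 20 videos uploaded on consecutive days; every video is
# in category 3 and the even-numbered ones are also in category 11.
for vid in range(1, 21):
    day = f"2015-01-{vid:02d}"
    conn.execute("INSERT INTO join_video_category VALUES (?, 3, ?)", (vid, day))
    if vid % 2 == 0:
        conn.execute("INSERT INTO join_video_category VALUES (?, 11, ?)", (vid, day))

LATEST_SQL = """
    SELECT vc.id_video
    FROM join_video_category vc
    WHERE vc.id_category = ?
    ORDER BY vc.uploaded_date DESC
    LIMIT 10
"""
latest = [v for (v,) in conn.execute(LATEST_SQL, (11,))]
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + LATEST_SQL, (11,)))
print(latest)
print(plan)
```

On MySQL the corresponding check is EXPLAIN. There you would hope to see the composite index used without "Using filesort" in the Extra column.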
Solution #1:
Replacing IN with EXISTS may improve performance; please try the query below.
SELECT * FROM video WHERE exists
(SELECT * FROM join_video_category WHERE ID_CATEGORY=11 AND join_video_category.ID_VIDEO = video.ID)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
Solution #2:
1) Create a temporary table:
CREATE TEMPORARY TABLE temp_table AS SELECT * FROM join_video_category WHERE ID_CATEGORY=11;
2) Use the temporary table in solution #1:
SELECT * FROM video WHERE exists
(SELECT * FROM temp_table WHERE temp_table.ID_VIDEO = video.ID)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
Good Luck!!
If it is 1:Many, don't use an extra table between Video and Category. However, your row counts imply that it is Many:Many.
If it is 1:Many, simply have the category_id in the Video table, then simplify all the queries.
If it is Many:Many, then be sure to use this pattern for the junction table:
CREATE TABLE map_video_category (
video_id ...,
category_id ...,
PRIMARY KEY(video_id, category_id), -- both ids, one direction
INDEX (category_id, video_id) -- both ids, the other direction
) ENGINE=InnoDB; -- significantly better than MyISAM on INDEX handling here
The ID column you mentioned (the one marked "not used") is a waste. The composite keys are optimal for all situations and will improve performance in most of them.
Avoid IN ( SELECT ... ); the optimizer does a poor job with it, especially in MySQL versions before 5.6. Change to a JOIN, LEFT JOIN, EXISTS, or some other construct.

Highscores on multiple columns, efficient query, right approach

Let's say we've got a high scores table with the columns app_id, best_score, best_time, most_drops, longest_something and a couple more.
How can I collect the top three results IN EACH CATEGORY, grouped by app_id?
For now I'm using separate rank queries on each category in a loop:
SELECT app_id, best_something1,
FIND_IN_SET( best_something1,
(SELECT GROUP_CONCAT( best_something1
ORDER BY best_something1 DESC)
FROM highscores )) AS rank
FROM highscores
ORDER BY best_something1 DESC LIMIT 3;
Two things worth adding:
All columns for a specific app are updated at the same time (I could consider creating a helper table).
The result of the prospective "turbo query" might be requested quite often, as often as the values are updated.
My SQL is quite basic, and I suspect there are many more commands that, combined, could do the magic.
What I'd expect from this post is that some wise owl would at least point me in the right direction.
The sample table:
http://sqlfiddle.com/#!2/eef053/1
Here is a sample result too (already in JSON format, sorry):
{"total_blocks":[["13","174","1"],["9","153","2"],["10","26","3"]],"total_games":[["13","15","1"],["9","12","2"],["10","2","3"]],"total_score":[["13","410","1"],["9","332","2"],["11","88","3"]],"aver_pps":[["11","4.34011","1"],["13","2.64521","2"],["12","2.60623","3"]],"aver_drop_per_game":[["11","20","1"],["10","13","2"],["9","12.75","3"]],"aver_drop_val":[["11","4.4","1"],["13","2.35632","2"],["9","2.16993","3"]],"aver_score":[["11","88","1"],["9","27.6667","2"],["13","27.3333","3"]],"best_pps":[["13","4.9527","1"],["11","4.34011","2"],["9","4.13076","3"]],"most_drops":[["11","20","1"],["9","16","2"],["13","16","2"]],"longest_drop":[["9","3","1"],["13","2","2"],["11","2","2"]],"best_drop":[["11","42","1"],["13","36","2"],["9","30","3"]],"best_score":[["11","88","1"],["13","78","2"],["9","58","3"]]}
When I encounter this scenario, I prefer to use the UNION operator and combine queries tailored to each ORDER BY and LIMIT.
http://dev.mysql.com/doc/refman/5.1/en/union.html
UNION combines the result rows vertically (top 3 rows for 5 sort categories yields 15 rows).
For your specific purpose, you might then pivot them as sub-SELECTs, rolling them up with GROUP_CONCAT grouped by user so that each user has a delimited list.
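The UNION approach can be sketched with Python's sqlite3 module by building one top-3 query per score column and gluing them together with UNION ALL. This assumes a hypothetical highscores table holding only three of the question's columns, with made-up values. SQLite stands in for MySQL here: MySQL accepts parenthesized (SELECT ... ORDER BY ... LIMIT 3) members directly, but SQLite needs each one wrapped in a derived table. The per-category roll-up is done in Python rather than with GROUP_CONCAT.

```python
import sqlite3

# Hypothetical subset of the score columns. Column names are spliced into
# the SQL text, so they must come from a fixed whitelist like this one.
CATEGORIES = ("total_blocks", "best_score", "most_drops")


def top_n_per_category(conn, categories=CATEGORIES, n=3):
    members = []
    for col in categories:
        # SQLite rejects ORDER BY/LIMIT on a bare UNION member, so each
        # per-column query is wrapped in a derived table.
        members.append(
            f"SELECT * FROM (SELECT '{col}' AS category, app_id, {col} AS val "
            f"FROM highscores ORDER BY {col} DESC, app_id LIMIT {int(n)})"
        )
    sql = " UNION ALL ".join(members) + " ORDER BY category, val DESC, app_id"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE highscores (app_id INTEGER PRIMARY KEY, "
             "total_blocks INTEGER, best_score INTEGER, most_drops INTEGER)")
conn.executemany("INSERT INTO highscores VALUES (?, ?, ?, ?)",
                 [(9, 153, 58, 16), (10, 26, 20, 5), (11, 20, 88, 20),
                  (12, 10, 40, 3), (13, 174, 78, 16)])

rows = top_n_per_category(conn)
# Roll the 9 rows up into one list per category (similar to the JSON above).
grouped = {}
for category, app_id, val in rows:
    grouped.setdefault(category, []).append((app_id, val))
print(grouped)
```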
I'd test something like this query, to see if the performance is any better or not. I think this comes pretty close to satisfying the specification:
( SELECT 99 AS seq_
, a.category
, CONVERT(a.val,DOUBLE) AS val
, FIND_IN_SET(a.val,r.highest_vals) AS rank
, a.user_id
FROM ( SELECT 'total_blocks' AS category
, b.`total_blocks` AS val
, b.user_id
FROM app b
ORDER BY b.`total_blocks` DESC
LIMIT 3
) a
CROSS
JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
FROM ( SELECT t.`total_blocks` AS val
FROM app t
ORDER BY t.`total_blocks` DESC
LIMIT 3
) s
) r
ORDER BY a.val DESC
)
UNION ALL
( SELECT 97 AS seq_
, a.category
, CONVERT(a.val,DOUBLE) AS val
, FIND_IN_SET(a.val,r.highest_vals) AS rank
, a.user_id
FROM ( SELECT 'XXX' AS category
, b.`XXX` AS val
, b.user_id
FROM app b
ORDER BY b.`XXX` DESC
LIMIT 3
) a
CROSS
JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
FROM ( SELECT t.`XXX` AS val
FROM app t
ORDER BY t.`XXX` DESC
LIMIT 3
) s
) r
ORDER BY a.val DESC
)
ORDER BY seq_ DESC, val DESC
To unpack this a little: it is essentially a set of separate queries combined with the UNION ALL set operator.
Each of the queries returns a literal value to allow for ordering. (In this case, I've given the column the rather anonymous name seq_, for "sequence"; if the specific order isn't important, it could be removed.)
Each query is also returning a literal value that tells which "category" the row is for.
Because some of the values returned are INTEGER, and others are FLOAT, I'd cast all of those values to floating point, so the datatypes of each query line up.
For the FLOAT (floating point) type values, there can be a problem with comparison. So I'd go with casting those to decimal and stringing them together into a list using GROUP_CONCAT (as the original query does).
Since we are returning only three rows from each query, we only need to concatenate together the three largest values. (If there's a two way "tie" for first place, we'll return rank values of 1, 1, 3.)
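The tie behaviour follows from how FIND_IN_SET works: it returns the position of the first match in the comma-separated list. The small Python sketch below imitates MySQL's behaviour rather than calling it, and its function names are made up. It shows the 1, 1, 3 pattern, and it also reproduces the 1, 2, 2 ranks in the question's most_drops sample.

```python
def find_in_set(needle, csv_list):
    """Mimic MySQL FIND_IN_SET(): 1-based position of the first match, 0 if absent."""
    if not csv_list:
        return 0
    items = csv_list.split(",")
    return items.index(needle) + 1 if needle in items else 0


def tie_aware_ranks(values):
    """Rank values as the answer's query does: concatenate them in descending
    order (like GROUP_CONCAT ... ORDER BY val DESC), then look each one up."""
    highest_vals = ",".join(str(v) for v in sorted(values, reverse=True))
    return [find_in_set(str(v), highest_vals) for v in values]


print(tie_aware_ranks([88, 88, 58]))  # two-way tie for first place
print(tie_aware_ranks([20, 16, 16]))  # tie for second, as in the most_drops sample
```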
Suitable indexes for each query will improve performance for large sets.
... ON app (total_blocks, user_id)
... ON app (best_pps,user_id)
... ON app (XXX,user_id)

How to sort result by highest value, using 2 columns?

I have a lot of users with websites, and I want to select all websites and sort them by number of visitors. Users can specify their visitor count in two ways. First, they can enter it manually as a string, which is stored in fb.visitors in the query below.
Second, the user can install a JavaScript tracking code on their site, which then adds entries to the table tracking_visits; the total number of visits is COUNT(tv.id) below.
I want to be able to sort this result in 2 ways.
1) I want the highest value on top and the lowest at the bottom, using both columns. For example, the result should be:
99'947 ( COUNT(tv.id) )
75'412 ( COUNT(tv.id) )
40'000 ( fb.visitors )
37'482 ( COUNT(tv.id) )
30'000 ( fb.visitors )
2) For the second sort, I would like all COUNT(tv.id) values on top, highest first, followed by the fb.visitors values, highest first. Example:
99'947 ( COUNT(tv.id) )
75'412 ( COUNT(tv.id) )
37'482 ( COUNT(tv.id) )
40'000 ( fb.visitors )
30'000 ( fb.visitors )
My current Query looks like this:
SELECT cs.userid, fb.visitors, COUNT( tv.id )
FROM campaigns_signups cs
INNER JOIN fe_blogs fb ON cs.userid = fb.userid
INNER JOIN tracking_visits tv ON tv.blogid = cs.userid
WHERE tv.visitdate
BETWEEN "2013-09-04"
AND "2013-10-04"
AND cs.campaignid = "97"
AND cs.status < "4"
GROUP BY tv.blogid
ORDER BY COUNT( tv.id ) , fb.visitors DESC
Note that the dates and integers in the query are just examples.
The problem with this query is that it only returns blogs that have entries in tracking_visits. I want a result that includes BOTH blogs whose visitor count comes from tracking_visits AND blogs whose visitor count comes from fb.visitors.
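For your first task, you can use ORDER BY GREATEST(COUNT(tv.id), IFNULL(fb.visitors, 0)) DESC (see the MySQL documentation on GREATEST); the IFNULL matters because GREATEST returns NULL if any argument is NULL. For your second, you will want to use UNION (see the MySQL documentation on UNION).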
If for your first task you want each site to yield two rows (one for the greatest of the two values and the other for the least), you can again achieve this using UNION.
You are looking for GREATEST():
select greatest(ifnull(fb.visitors,0),count(tv.id)) from.... order by 1 desc
select greatest(ifnull(fb.visitors,0),count(tv.id)) from....
order by
case when greatest(ifnull(fb.visitors,0),count(tv.id))=fb.visitors then 2 else 1 end, greatest(ifnull(fb.visitors,0),count(tv.id)) desc
In the second query, the CASE expression orders rows by the source of the value first (tracked counts before manually entered visitor figures) and then by the value itself.
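Both orderings can be tried out with Python's sqlite3 module. This is a simplified sketch under several assumptions:

- campaigns_signups and the date filter are left out.
- fb.visitors is assumed to be an INTEGER column (the question stores it as a string, which would need a CAST in MySQL).
- The data is made up.
- SQLite has no GREATEST(), so its two-argument scalar MAX() stands in for it.

The LEFT JOIN against pre-aggregated counts is what keeps blogs that have no tracking rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fe_blogs (userid INTEGER PRIMARY KEY, visitors INTEGER);
    CREATE TABLE tracking_visits (id INTEGER PRIMARY KEY, blogid INTEGER);
""")
# Blogs 2 and 4 report visitors manually; blogs 1, 3 and 5 are tracked.
conn.executemany("INSERT INTO fe_blogs VALUES (?, ?)",
                 [(1, None), (2, 4), (3, None), (4, 2), (5, None)])
conn.executemany("INSERT INTO tracking_visits (blogid) VALUES (?)",
                 [(1,)] * 5 + [(3,)] * 3 + [(5,)])

BASE = """
    SELECT fb.userid
    FROM fe_blogs fb
    LEFT JOIN (SELECT blogid, COUNT(id) AS tracked_visits
               FROM tracking_visits GROUP BY blogid) tv
           ON tv.blogid = fb.userid
"""
# 1) Highest value first, whichever column it comes from.
BY_GREATEST = BASE + """
    ORDER BY MAX(COALESCE(tv.tracked_visits, 0), COALESCE(fb.visitors, 0)) DESC
"""
# 2) Tracked blogs first (highest first), then manually reported ones.
TRACKED_FIRST = BASE + """
    ORDER BY CASE WHEN tv.tracked_visits IS NULL THEN 1 ELSE 0 END,
             tv.tracked_visits DESC, fb.visitors DESC
"""


def userids(sql):
    return [u for (u,) in conn.execute(sql)]


print(userids(BY_GREATEST))
print(userids(TRACKED_FIRST))
```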
For the second option (COUNT(tv.id) first), I was able to accomplish this with the following query:
SELECT *, tv.tracked_visits
FROM campaigns_signups cs
INNER JOIN fe_blogs fb ON cs.userid = fb.userid
LEFT JOIN (
SELECT blogid, COUNT( id ) AS "tracked_visits"
FROM tracking_visits
WHERE visitdate
BETWEEN "2013-09-04"
AND "2013-10-04"
GROUP BY blogid
) AS tv ON tv.blogid = cs.userid
WHERE cs.campaignid = :campaignid
AND cs.status < :status
ORDER BY tv.tracked_visits DESC , fb.visitors DESC

Limit results instead of group results

I have been struggling to find the right answer on the internet. I have a JOIN and a GROUP BY, and when I add a LIMIT at the end, it limits the groups rather than the underlying rows.
SELECT COUNT(*) AS `numrows`, `people`.`age`
FROM (`events`)
JOIN `people`
ON `events`.`id` = `people`.`id`
WHERE `people`.`priority` = '1'
GROUP BY `people`.`age`
ORDER BY `numrows`
LIMIT 150
The limit always changes, so this needs to be dynamic. The idea is to skip the first 150 (or x) rows from both tables, not to limit the groups.
EDIT: I think I explained this badly. I actually want to start from row 150 (or x), and LIMIT is the only way I know to do this dynamically. The idea is that if the last search retrieved 150 rows and next time there are 250 results, I want to ignore the first 150, which were already found last time. Hope that makes more sense.
The limit (or starting point) needs to go after the WHERE in the join; I think that's the only place it would work.
EDIT, SQL:
SELECT COUNT( * ) AS `numrows`, `people`.`age`
FROM (
SELECT `id`, `events`.`pid`
FROM `events`
ORDER BY `id`
LIMIT 1050
)limited
JOIN `people` ON `people`.`age` = limited.id
WHERE `people`.`priority` = '1'
GROUP BY `people`.`age`
ORDER BY `numrows` DESC
Thanks for your help
I suspect you mean something like this?
SELECT COUNT(*) AS numrows, people.age
FROM (
SELECT id FROM events ORDER BY id LIMIT 150
) limited
JOIN people ON people.id = limited.id
GROUP BY people.age
ORDER BY numrows;
I did it using WHERE ... BETWEEN on a timestamp; it seems like the only sensible way to do it.
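The timestamp approach is a form of keyset pagination. Instead of skipping rows with LIMIT, you remember where the last run stopped and only aggregate rows beyond that point. Here is a sketch with Python's sqlite3 module that uses the auto-increment events.id as the bookmark instead of a timestamp. It rests on several assumptions: SQLite stands in for MySQL, and the join on a hypothetical events.pid column, the `age_counts_since` helper and the data are all made up for illustration.

```python
import sqlite3


def age_counts_since(conn, last_event_id):
    """Count matching events per age, skipping events already processed.

    Returns (rows, upper), where `upper` is the id to pass in next time.
    """
    # Freeze the upper bound first so rows inserted meanwhile are left for
    # the next call instead of being silently skipped.
    (upper,) = conn.execute(
        "SELECT COALESCE(MAX(id), ?) FROM events", (last_event_id,)).fetchone()
    rows = conn.execute("""
        SELECT people.age, COUNT(*) AS numrows
        FROM events
        JOIN people ON people.id = events.pid
        WHERE events.id > ? AND events.id <= ?
          AND people.priority = 1
        GROUP BY people.age
        ORDER BY numrows DESC, people.age
    """, (last_event_id, upper)).fetchall()
    return rows, upper


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, age INTEGER, priority INTEGER);
    CREATE TABLE events (id INTEGER PRIMARY KEY, pid INTEGER REFERENCES people(id));
    INSERT INTO people VALUES (1, 30, 1), (2, 40, 1), (3, 50, 0);
    INSERT INTO events (pid) VALUES (1), (1), (2), (3);
""")
first_rows, first_last = age_counts_since(conn, 0)
# New events arrive; the next run only looks at them.
conn.executemany("INSERT INTO events (pid) VALUES (?)", [(2,), (2,), (1,)])
second_rows, second_last = age_counts_since(conn, first_last)
print(first_rows, second_rows)
```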

Get previous and next row from mysql based on rank

I use this MySQL query to get an item's rank by points. I need to get the previous and next items by rank.
For example, if an item's rank is 99, I want to show the 97th, 98th, 100th and 101st items on its page.
http://erincfirtina.com/apps/urdemo/track.php?tid=10
I need this to build a related-tracks list.
Here is my mysql query which get rank:
SELECT
uo.*,
( SELECT COUNT(*) FROM tracks ui WHERE (ui.point, ui.id) >= (uo.point, uo.id) ) AS rank
FROM tracks uo WHERE id = 10
You never asked a question.
One thing I am observing is that you are using a table (uo) inside the subselect that isn't part of the subselect.
Maybe you are looking for:
SELECT uo.*, COUNT(*) AS rank
FROM tracks ui, tracks uo
WHERE (ui.point, ui.id) >= (uo.point, uo.id)
AND uo.id = 10;
Hard for me to test its accuracy with no idea of what your table looks like, or what your question actually is.
This query should work, although it will scale very badly.
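SELECT t.*
FROM (
SELECT
uo.*,
( SELECT COUNT(*) FROM tracks ui WHERE (ui.point, ui.id) >= (uo.point, uo.id) ) AS rank
FROM tracks uo
) t
JOIN (
SELECT COUNT(*) AS target_rank
FROM tracks ui JOIN tracks cur ON cur.id = 10
WHERE (ui.point, ui.id) >= (cur.point, cur.id)
) r ON t.rank BETWEEN r.target_rank - 2 AND r.target_rank + 2
ORDER BY t.rank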
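The neighbour lookup can be checked with Python's sqlite3 module. SQLite stands in for MySQL and needs version 3.15 or later for the row-value comparisons; the column alias rnk, the `neighbours` helper and the sample points are made up. The query ranks every track with the same correlated subquery as the question, computes the current track's rank, and keeps the tracks within `window` places of it.

```python
import sqlite3

# Rank = number of tracks whose (point, id) is >= this track's, so the
# highest-scoring track has rank 1 and ties on point are broken by id.
NEIGHBOURS_SQL = """
    SELECT t.id, t.point, t.rnk
    FROM (
        SELECT uo.id, uo.point,
               (SELECT COUNT(*) FROM tracks ui
                WHERE (ui.point, ui.id) >= (uo.point, uo.id)) AS rnk
        FROM tracks uo
    ) t
    JOIN (
        SELECT COUNT(*) AS target
        FROM tracks ui JOIN tracks cur ON cur.id = :track_id
        WHERE (ui.point, ui.id) >= (cur.point, cur.id)
    ) r ON t.rnk BETWEEN r.target - :window AND r.target + :window
    ORDER BY t.rnk
"""


def neighbours(conn, track_id, window=2):
    params = {"track_id": track_id, "window": window}
    return conn.execute(NEIGHBOURS_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, point INTEGER NOT NULL)")
conn.executemany("INSERT INTO tracks VALUES (?, ?)",
                 [(1, 10), (2, 50), (3, 30), (4, 70),
                  (5, 20), (6, 60), (7, 40), (8, 80)])
print(neighbours(conn, 2))  # the track itself plus two places either side
```

Every track's rank is recomputed with a correlated COUNT(*), so the cost grows roughly quadratically with the number of tracks. This is the scaling problem mentioned above.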