SQL Group by using the First N elements in each group [duplicate] - mysql

This question already has an answer here:
Top N per Group Sql problem in mysql
(1 answer)
Closed 9 years ago.
Suppose I have the following table:
+------------+---------+
| MovieId | rating |
+------------+---------+
| 1 | 4 |
| 1 | 3 |
| 1 | 2 |
| 1 | 4 |
| 1 | 5 |
| 2 | 3 |
| 2 | 4 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 5 |
| 4 | 4 |
| 4 | 2 |
+------------+---------+
I would like to get the average per group, BUT using only the first 2 elements of each group.
Example:
+------------+---------+
| MovieId | rating |
+------------+---------+
| 1 | 4 |
| 1 | 3 |
| 2 | 3 |
| 2 | 4 |
| 3 | 1 |
| 3 | 2 |
| 4 | 4 |
| 4 | 2 |
+------------+---------+
answer expected:
+------------+---------+
| MovieId | AVG |
+------------+---------+
| 1 | 3.5 |
| 2 | 3.5 |
| 3 | 1.5 |
| 4 | 3 |
+------------+---------+
This is the SQL query I have to get the AVG for all of the movies. But as I said, I would like to use just the first 2 elements for each group.
SELECT movieid, AVG(cast(rating as DECIMAL(10,2))) AS AVG
FROM ratings
group by movieid
If you can help me write the SQL, I would appreciate it. I will also use LINQ, in case any of you know it.

In a SQL DBMS -- as in the relational model -- there is no "first". Do you mean any arbitrary 2 rows for each movie, or the two highest ratings, or something else?
If you can't define an order, then the query is meaningless.
If you can define an order, join the table to itself as I show in my canonical example to create a ranking, and select where RANK < 3.
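The self-join ranking described above can be sketched as follows. This is a minimal illustration run through Python's built-in sqlite3 module so that it works without a MySQL server; the inner SQL should also be valid in MySQL. The `id` column is an assumption: the question's table has no ordering column, so a hypothetical insertion-order key is added to give "first" a meaning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `id` is a hypothetical column recording insertion order; the question's
# table has no such column, so an ordering column like it must be assumed.
conn.execute(
    "CREATE TABLE ratings (id INTEGER PRIMARY KEY, movieid INT, rating INT)"
)
rows = [(1, 4), (1, 3), (1, 2), (1, 4), (1, 5),
        (2, 3), (2, 4), (2, 2),
        (3, 1), (3, 2), (3, 3), (3, 5),
        (4, 4), (4, 2)]
conn.executemany("INSERT INTO ratings (movieid, rating) VALUES (?, ?)", rows)

query = """
SELECT movieid, AVG(rating) AS avg_rating
FROM (
    -- rnk = number of rows of the same movie at or before this row
    SELECT a.id, a.movieid, a.rating, COUNT(*) AS rnk
    FROM ratings a
    JOIN ratings b ON b.movieid = a.movieid AND b.id <= a.id
    GROUP BY a.id, a.movieid, a.rating
) ranked
WHERE rnk < 3
GROUP BY movieid
ORDER BY movieid
"""
result = conn.execute(query).fetchall()
for movieid, avg_rating in result:
    print(movieid, avg_rating)
```

With the sample data this prints 3.5, 3.5, 1.5 and 3.0 for movies 1 to 4, matching the expected answer in the question.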

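For MySQL:
-- rownum is a running row number; this relies on the rows of movies being read in id order.
-- grp_count-(last_count-rownum) is then the position of the row within its own group.
select id, avg(rating)
from (SELECT a.*, @num := @num + 1 rownum,
(select count(*)
from movies m
where m.id<=a.id) last_count,
(select count(*)
from movies m1
where a.id=m1.id) grp_count
from movies a, (SELECT @num := 0) d) f
where grp_count-(last_count-rownum)<=2
group by id;
You can use the ROWNUM pseudocolumn in Oracle and the ROW_NUMBER() function in SQL Server.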

This is a solution in SQL (MySQL syntax):
CREATE TEMPORARY TABLE tempMovie (movieId int, rating int);
-- Each SELECT is parenthesized so that its LIMIT applies to that SELECT only.
-- Without an ORDER BY, which 2 rows each LIMIT returns is not guaranteed.
INSERT INTO tempMovie
SELECT * FROM (
(SELECT * FROM ratings WHERE movieid = 1 LIMIT 2)
UNION ALL
(SELECT * FROM ratings WHERE movieid = 2 LIMIT 2)
UNION ALL
(SELECT * FROM ratings WHERE movieid = 3 LIMIT 2)
UNION ALL
(SELECT * FROM ratings WHERE movieid = 4 LIMIT 2)
) AS firstTwo;
The temporary table tempMovie will contain data like this:
+------------+---------+
| MovieId | rating |
+------------+---------+
| 1 | 4 |
| 1 | 3 |
| 2 | 3 |
| 2 | 4 |
| 3 | 1 |
| 3 | 2 |
| 4 | 4 |
| 4 | 2 |
+------------+---------+
Then apply GROUP BY:
SELECT movieId, AVG(rating)
FROM tempMovie
GROUP BY movieId;
DROP TEMPORARY TABLE tempMovie;

Related

SQL query that picks a random id from all possible ids in a table and outputs the rows containing that id

I want a query that picks a random UploadedbyUserID from the values that actually occur in the table (in this case 4, 3 and 22, and only those three, not 2 or 5) and selects all rows with that UploadedbyUserID.
If the random pick is 4, it outputs this:
+------+------+------------+--------------------+
| id | name | date | UploadedbyUserID |
+------+------+------------+--------------------+
| 1 | 2222 | Testing | 4 |
| 2 | Jack | description| 4 |
| 6 | Zara | 2007-02-06 | 4 |
+------+------+------------+--------------------+
This is the whole table
+------+------+------------+--------------------+
| id | name | date | UploadedbyUserID |
+------+------+------------+--------------------+
| 1 | 2222 | Testing | 4 |
| 2 | Jack | description| 4 |
| 3 | ffdsd| 2007-05-06 | 4 |
| 4 | dsm | 2007-05-27 | 3 |
| 5 | dddd | 2007-04-06 | 3 |
| 6 | Zara | 2007-02-06 | 4 |
| 7 | John | 2007-01-24 | 22 |
+------+------+------------+--------------------+
If the random pick is 3, it outputs this:
+------+------+------------+--------------------+
| id | name | date | UploadedbyUserID |
+------+------+------------+--------------------+
| 4 | dsm | 2007-05-27 | 3 |
| 5 | dddd | 2007-04-06 | 3 |
+------+------+------------+--------------------+
Ask if you need more information
Hmmm. This is one way:
select t.*
from (select uploadedbyuserid
from t
order by rand()
limit 1
) u join
t
using (uploadedbyuserid);
First, let me say that this is weighted by the number of times that a user has uploaded something. So, user "4" would appear a bit more often than "3", in your example. If this is an issue:
select t.*
from (select uploadedbyuserid
from (select distinct uploadedbyuserid from t) t
order by rand()
limit 1
) u join
t
using (uploadedbyuserid);
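The next observation is that this can be compute-intensive. If you have lots of rows, there are various ways to speed this up. For instance, one simple method is to sample about 1 out of 1,000 of the distinct user ids before sorting. Note that with few distinct ids the filter can leave no candidates at all, in which case the query returns nothing: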
select t.*
from (select uploadedbyuserid
from (select distinct uploadedbyuserid
from t
) t
where rand() < 0.001
order by rand()
limit 1
) u join
t
using (uploadedbyuserid);
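As a rough check of the unweighted variant (the one that picks from the distinct user ids), here is a small sketch using Python's built-in sqlite3 module. It is only an illustration under assumptions: SQLite spells the random function `RANDOM()` where MySQL uses `RAND()`, and the inner alias `d` is a name chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INT, name TEXT, date TEXT, uploadedbyuserid INT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "2222", "Testing", 4),
    (2, "Jack", "description", 4),
    (3, "ffdsd", "2007-05-06", 4),
    (4, "dsm", "2007-05-27", 3),
    (5, "dddd", "2007-04-06", 3),
    (6, "Zara", "2007-02-06", 4),
    (7, "John", "2007-01-24", 22),
])

query = """
SELECT t.*
FROM (SELECT uploadedbyuserid
      FROM (SELECT DISTINCT uploadedbyuserid FROM t) d
      ORDER BY RANDOM()   -- RAND() in MySQL
      LIMIT 1) u
JOIN t USING (uploadedbyuserid)
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every run returns all rows of exactly one uploader, and each of the three uploaders is equally likely to be picked regardless of how many rows they have.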

Alternate order by logic in MySQL

I'm looking for custom ordering logic in MySQL that takes the following data set:
+----+-----------------+------------+-------+--+
| ID | item | Popularity | Views | |
+----+-----------------+------------+-------+--+
| 1 | A special place | 3 | 10 | |
| 2 | Another title | 5 | 12 | |
| 3 | Words go here | 1 | 15 | |
| 4 | A wonder | 2 | 8 | |
+----+-----------------+------------+-------+--+
To return an order that alternates, row by row, by popularity and then by views, so the return results look like:
+----+-----------------+------------+-------+--+
| ID | item | Popularity | Views | |
+----+-----------------+------------+-------+--+
| 3 | Words go here | 1 | 15 | |
| 2 | Another title | 5 | 12 | |
| 4 | A wonder | 2 | 8 | |
| 1 | A special place | 3 | 10 | |
+----+-----------------+------------+-------+--+
Where you will see the first row returns the 'most popular', the second row returns the most views, the third row returns the second most popular, and the 4th row returns the 2nd most views.
Currently I'm fetching the entire table from MySQL twice and then merging the results in PHP. This isn't going to cut it when the database is large. Is this possible in MySQL at all?
I guess something along these lines could work. Consider the following:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,x INT NOT NULL
,y INT NOT NULL
);
INSERT INTO my_table VALUES
(1,3,10),
(2,5,12),
(3,1,15),
(4,2, 8),
(5,4, 1);
We can rank x and y in turn, and then arrange those ranks in a single list, so we will have x1,y1,x2,y2,etc. However, all rows will appear twice: once for the x rank and once for the y rank...
-- rank is a reserved word from MySQL 8.0.2 on, so it is quoted with backticks
SELECT * FROM
(
( SELECT a.*, COUNT(*) `rank` FROM my_table a JOIN my_table b ON b.x <= a.x GROUP BY a.id )
UNION ALL
( SELECT a.*, COUNT(*) `rank` FROM my_table a JOIN my_table b ON b.y <= a.y GROUP BY a.id )
) n
ORDER BY `rank`;
+----+---+----+------+
| id | x | y | rank |
+----+---+----+------+
| 5 | 4 | 1 | 1 |
| 3 | 1 | 15 | 1 |
| 4 | 2 | 8 | 2 |
| 4 | 2 | 8 | 2 |
| 1 | 3 | 10 | 3 |
| 1 | 3 | 10 | 3 |
| 5 | 4 | 1 | 4 |
| 2 | 5 | 12 | 4 |
| 2 | 5 | 12 | 5 |
| 3 | 1 | 15 | 5 |
+----+---+----+------+
Now we can just grab the lowest rank for each id...
SELECT id
, x
, y
FROM
(
( SELECT a.*, COUNT(*) `rank` FROM my_table a JOIN my_table b ON b.x <= a.x GROUP BY a.id )
UNION ALL
( SELECT a.*, COUNT(*) `rank` FROM my_table a JOIN my_table b ON b.y <= a.y GROUP BY a.id )
) m
GROUP
BY id,x,y
ORDER
BY MIN(`rank`);
+----+---+----+
| id | x | y |
+----+---+----+
| 3 | 1 | 15 |
| 5 | 4 | 1 |
| 4 | 2 | 8 |
| 1 | 3 | 10 |
| 2 | 5 | 12 |
+----+---+----+
Incidentally, this should be faster with variables - but I cannot make that solution work at present - senior moment, perhaps.
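The answer mentions variables; as a different technique, the same ordering can be sketched with window functions, which are available in MySQL 8.0+ and SQLite 3.25+. The example below runs through Python's built-in sqlite3 module; the aliases `rx` and `ry` are names chosen here, and the second sort key (x rank before y rank on equal positions) is an assumption made to reproduce the alternation shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, x INT, y INT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 3, 10), (2, 5, 12), (3, 1, 15), (4, 2, 8), (5, 4, 1)],
)

query = """
SELECT id, x, y
FROM (
    SELECT id, x, y,
           ROW_NUMBER() OVER (ORDER BY x) AS rx,  -- position when sorted by x
           ROW_NUMBER() OVER (ORDER BY y) AS ry   -- position when sorted by y
    FROM my_table
) r
ORDER BY CASE WHEN rx <= ry THEN rx ELSE ry END,  -- best position in either list
         CASE WHEN rx <= ry THEN 0 ELSE 1 END     -- x list goes first on equal positions
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this returns the ids in the order 3, 5, 4, 1, 2, the same as the result of the UNION ALL query, while reading the table only once.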

MySQL Query to get Similar likes

I am designing a simple architecture where I have a table that stores users and the elements they like, so my table structure is something like this:
+---------+---------+
| user_id | like_id |
+---------+---------+
| 1 | 4 |
| 2 | 2 |
| 4 | 4 |
| 4 | 3 |
| 5 | 4 |
| 6 | 7 |
| 7 | 5 |
| 34 | 6 |
| 3 | 8 |
| 2 | 3 |
| 2 | 5 |
| 1 | 3 |
| 1 | 10 |
| 1 | 12 |
| 2 | 10 |
+---------+---------+
Now, given the id of any user (let's say user_id = 1), I want a query that gets all the other users who have likes in common with user 1.
So the output for user_id = 1 will be:
+---------------------------+------------------------+----------------+
| users_with_common_likes | no_of_common_likes | common_likes |
+---------------------------+------------------------+----------------+
| 4 | 2 | 3,4 |
| 2 | 2 | 3,10 |
| 5 | 1 | 4 |
+---------------------------+------------------------+----------------+
What I have achieved :
I can do this using a sub-query as below :
SELECT user_id
FROM `user_likes`
WHERE `like_id`
IN (
SELECT GROUP_CONCAT( `like_id` )
FROM user_likes
WHERE user_id =1
)
AND user_id !=1
LIMIT 0 , 30
However, this query does not return all the users: it misses user_id = 2, which has like_id 3 in common with user_id = 1.
I also can't figure out how to get the remaining 2 columns.
Also, I feel that this is not the best way to do this, as the table will contain thousands of rows and it may affect system performance.
I would like to do this with a single MySQL query.
This assumes a PK formed on user_id,like_id...
SELECT y.user_id
, GROUP_CONCAT(y.like_id) likes
, COUNT(*) total
FROM my_table x
JOIN my_table y
ON y.like_id = x.like_id
AND y.user_id <> x.user_id
WHERE x.user_id = 1
GROUP
BY y.user_id;
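To see the self-join against the question's sample data, here is a small sketch run through Python's built-in sqlite3 module. The table name `user_likes` is taken from the question; the output column aliases and the final ORDER BY are additions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_likes (user_id INT, like_id INT, "
    "PRIMARY KEY (user_id, like_id))"
)
conn.executemany("INSERT INTO user_likes VALUES (?, ?)", [
    (1, 4), (2, 2), (4, 4), (4, 3), (5, 4), (6, 7), (7, 5), (34, 6),
    (3, 8), (2, 3), (2, 5), (1, 3), (1, 10), (1, 12), (2, 10),
])

query = """
SELECT y.user_id,
       COUNT(*)                AS no_of_common_likes,
       GROUP_CONCAT(y.like_id) AS common_likes
FROM user_likes x
JOIN user_likes y
  ON y.like_id = x.like_id      -- same liked element
 AND y.user_id <> x.user_id     -- but a different user
WHERE x.user_id = ?
GROUP BY y.user_id
ORDER BY no_of_common_likes DESC, y.user_id
"""
result = conn.execute(query, (1,)).fetchall()
for user_id, n, likes in result:
    print(user_id, n, likes)
```

For user 1 this lists users 2 and 4 with two common likes each and user 5 with one. The order of the ids inside `common_likes` is not guaranteed unless an ordering is requested, which MySQL allows with `GROUP_CONCAT(... ORDER BY ...)`.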

mysql: order -> limit -> sum... possible?

i am losing it over the following problem:
i have a table with participants and points. each participant can have up to 11 point entries, of which i only want the sum of the top 6.
in this example let's say we want the top 2 of 3
+----+---------------+--------+
| id | participantid | points |
+----+---------------+--------+
| 1 | 1 | 11 |
+----+---------------+--------+
| 2 | 3 | 1 |
+----+---------------+--------+
| 3 | 3 | 4 |
+----+---------------+--------+
| 4 | 2 | 3 |
+----+---------------+--------+
| 5 | 1 | 5 |
+----+---------------+--------+
| 6 | 2 | 10 |
+----+---------------+--------+
| 7 | 2 | 9 |
+----+---------------+--------+
| 8 | 1 | 3 |
+----+---------------+--------+
| 9 | 3 | 4 |
+----+---------------+--------+
as a result i want something like
+---------------+--------+
| participantid | points |
+---------------+--------+
| 2 | 19 |
+---------------+--------+
| 1 | 16 |
+---------------+--------+
| 3 | 8 |
+---------------+--------+
(it should be ordered DESC by the resulting points)
is this at all possible with mysql? in one query?
oh and the resulting participant ids should be resolved into the real names from another 'partcipant' table where
+----+------+
| id | name |
+----+------+
| 1 | what |
+----+------+
| 2 | ev |
+----+------+
| 3 | er |
+----+------+
but that should be doable with a join at some point... i know...
This uses one of the answers from "ROW_NUMBER() in MySQL" to number the rows, then filters to keep only the top ones.
SELECT ParticipantId, SUM(Points)
FROM
(
-- rn = number of rows of the same participant that rank at or above row a;
-- rows with equal points are ordered by id, so every row gets a distinct rn
SELECT a.participantid, a.points, a.id, count(*) as rn
FROM scores a
JOIN scores b ON a.participantid = b.participantid AND (b.points > a.points OR (b.points = a.points AND b.id >= a.id))
GROUP BY a.participantid, a.points, a.id
) C
WHERE rn IN (1,2)
GROUP BY ParticipantId
ORDER BY SUM(Points) DESC;
I had an issue with ties until I arbitrarily broke them with the id.
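On MySQL 8.0+ the same result can also be sketched with the built-in `ROW_NUMBER()` window function, including the join that resolves the names. The example runs through Python's built-in sqlite3 module (SQLite 3.25+ also supports window functions). The table names `scores` and `participant` are assumptions, since the question does not name the first table clearly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INT, participantid INT, points INT)")
conn.execute("CREATE TABLE participant (id INT, name TEXT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", [
    (1, 1, 11), (2, 3, 1), (3, 3, 4), (4, 2, 3), (5, 1, 5),
    (6, 2, 10), (7, 2, 9), (8, 1, 3), (9, 3, 4),
])
conn.executemany("INSERT INTO participant VALUES (?, ?)",
                 [(1, "what"), (2, "ev"), (3, "er")])

TOP_N = 2  # the question's real case would use 6

query = """
SELECT p.name, SUM(s.points) AS total
FROM (
    SELECT participantid, points,
           ROW_NUMBER() OVER (PARTITION BY participantid
                              ORDER BY points DESC, id) AS rn  -- id breaks ties
    FROM scores
) s
JOIN participant p ON p.id = s.participantid
WHERE s.rn <= ?
GROUP BY p.id, p.name
ORDER BY total DESC
"""
result = conn.execute(query, (TOP_N,)).fetchall()
for name, total in result:
    print(name, total)
```

With the sample data this gives ev 19, what 16 and er 8, in that order.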

SQL, difficult fetching data query

Suppose I have such a table:
+-----+---------+-------+
| ID | TIME | DAY |
+-----+---------+-------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 1 | 1 | 2 |
| 2 | 2 | 2 |
| 3 | 3 | 2 |
| 1 | 1 | 3 |
| 2 | 2 | 3 |
| 3 | 3 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 4 |
| 3 | 3 | 4 |
| 1 | 1 | 5 |
| 2 | 2 | 5 |
| 3 | 3 | 5 |
+-----+---------+-------+
I want to fetch a table showing the 2 IDs with the largest sum of TIME within the last 3 days (that is, DAY values 3 to 5).
So the correct result would be:
+-----+---------+
| ID | SUM |
+-----+---------+
| 3 | 9 |
| 2 | 6 |
+-----+---------+
The original table is much larger and more complex, so I need a generic approach.
Thanks in advance.
And so I just learned that MySQL uses LIMIT instead of TOP...
fiddle
CREATE TABLE tbl (ID INT,tm INT,dy INT);
INSERT INTO tbl (id, tm, dy) VALUES
(1,1,1)
,(2,2,1)
,(3,3,1)
,(1,1,2)
,(1,1,1);
-- Keep only the last 3 days: dy must be greater than the overall MAX(dy) minus 3
SELECT ID
,SUM(SumTimeForDay) SumTimeFromLastThreeDays
FROM (SELECT ID
,SUM(tm) SumTimeForDay
FROM tbl
WHERE dy > (SELECT MAX(dy) FROM tbl) - 3
GROUP BY ID, dy) a
GROUP BY id
ORDER BY SUM(SumTimeForDay) DESC
LIMIT 2;
select t1.`id`, sum(t1.`time`) as `sum`
from `table` t1
inner join ( select distinct `day` from `table` order by `day` desc limit 3 ) t2
on t2.`day` = t1.`day`
group by t1.`id`
order by sum(t1.`time`) desc
limit 2
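To check the join-based approach against the full sample table from the question, here is a small sketch run through Python's built-in sqlite3 module. The column names `tm` and `dy` are borrowed from the first answer to avoid quoting `time` and `day`; the parameters for the number of days and the number of ids are additions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INT, tm INT, dy INT)")
# Sample data from the question: ids 1..3 with tm equal to id, on days 1..5
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(i, i, d) for d in range(1, 6) for i in range(1, 4)],
)

query = """
SELECT t1.id, SUM(t1.tm) AS total
FROM tbl t1
JOIN (SELECT DISTINCT dy      -- the most recent distinct days
      FROM tbl
      ORDER BY dy DESC
      LIMIT ?) t2
  ON t2.dy = t1.dy
GROUP BY t1.id
ORDER BY total DESC
LIMIT ?
"""
result = conn.execute(query, (3, 2)).fetchall()
for row in result:
    print(row)
```

This returns id 3 with a sum of 9 and id 2 with a sum of 6, matching the expected result. Because the days are taken from the data, it also works when the day values have gaps.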