MySQL Ordering a query - further question - mysql

Further to a recently answered question, I have the following code:
SELECT q21coding, COUNT(q21coding) AS Count
FROM tresults_acme
WHERE q21 IS NOT NULL AND q21 <> ''
GROUP BY q21coding
ORDER BY IF(q21coding = 'Other', 1, 0) ASC, Count DESC
It brings back the following:
q21coding Count
Difficulty in navigating/finding content 53
Positive comments 28
Suggestions for improvement 14
Inappropriate content/use 13
Improve search facility 6
Include information about staff and teams 5
Content needs updating 4
Other 30
You'll notice that Other is now at the bottom. However, is there a way of ensuring that Positive comments and Other are ALWAYS the bottom two (with Other at the very bottom), regardless of the Count size?
Thanks,
Homer

Actually there was no need to use IF(q21coding = 'Other', 1, 0) in your original query. In MySQL you can use any expression in the ORDER BY clause, and q21coding = 'Other' would have been enough:
... ORDER BY q21coding = 'Other', Count DESC
The q21coding = 'Other' expression will return 1 if true, or 0 if false. That will put rows with a q21coding = 'Other' at the bottom.
What you need to do to have 'Positive Comments' and 'Other' both at the bottom is something like this:
... ORDER BY q21coding = 'Other', q21coding = 'Positive comments', Count DESC
Basic test case:
CREATE TABLE my_table (id int, q21coding varchar(100), count int);
INSERT INTO my_table VALUES (1, 'Inappropriate content/use', 13);
INSERT INTO my_table VALUES (2, 'Other', 30);
INSERT INTO my_table VALUES (3, 'Difficulty in navigating/finding content', 53);
INSERT INTO my_table VALUES (4, 'Positive comments', 28);
INSERT INTO my_table VALUES (5, 'Improve search facility', 6);
INSERT INTO my_table VALUES (6, 'Content needs updating', 4);
INSERT INTO my_table VALUES (7, 'Suggestions for improvement', 14);
INSERT INTO my_table VALUES (8, 'Include information about staff and teams', 5);
Result:
SELECT q21coding, count
FROM my_table
ORDER BY q21coding = 'Other', q21coding = 'Positive comments', Count DESC;
+-------------------------------------------+-------+
| q21coding | count |
+-------------------------------------------+-------+
| Difficulty in navigating/finding content | 53 |
| Suggestions for improvement | 14 |
| Inappropriate content/use | 13 |
| Improve search facility | 6 |
| Include information about staff and teams | 5 |
| Content needs updating | 4 |
| Positive comments | 28 |
| Other | 30 |
+-------------------------------------------+-------+
8 rows in set (0.00 sec)
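The ordering above can also be checked outside MySQL. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also evaluates a comparison to 1 or 0). The function name ordered_codes and the column name cnt are made up for illustration; the rows come from the test case.

```python
import sqlite3

# Rows from the test case above: (q21coding, count)
ROWS = [
    ("Inappropriate content/use", 13),
    ("Other", 30),
    ("Difficulty in navigating/finding content", 53),
    ("Positive comments", 28),
    ("Improve search facility", 6),
    ("Content needs updating", 4),
    ("Suggestions for improvement", 14),
    ("Include information about staff and teams", 5),
]


def ordered_codes():
    """Return rows with 'Positive comments' and 'Other' pinned to the bottom."""
    conn = sqlite3.connect(":memory:")
    # The count column is called cnt here (an assumption, to avoid clashing with COUNT()).
    conn.execute("CREATE TABLE my_table (q21coding TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)
    # Each comparison yields 0 or 1, so matching rows sort after all the others.
    rows = conn.execute(
        "SELECT q21coding, cnt FROM my_table "
        "ORDER BY q21coding = 'Other', q21coding = 'Positive comments', cnt DESC"
    ).fetchall()
    conn.close()
    return rows
```

The first sort key separates Other from everything else, the second separates Positive comments from the remaining rows, and only then does the count decide the order.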

Related

Using the count function on third table in two table select statement in MariaDB

I just spent a few hours reading through the MariaDB docs and various questions here trying to figure out a SQL statement that did what I want. I'm definitely not an expert... eventually I did get the result I expected, but I have no idea why it works. I want to be sure I am actually getting the result I want, and it isn't just working for the few test cases I have thrown at it.
I have three tables guestbook, users, and user_likes. I am trying to write a SQL statement that will return the user name and first name from users, post content, post date, post id from guestbook, and a third column likes which is the total number of times that post id from guestbook appears in the user_likes table. It should only return posts which are of type standard and should order the rows by ascending post date.
Sample data:
CREATE TABLE users
(`user_id` int, `user_first` varchar(6), `user_last` varchar(7),
`user_email` varchar(26), `user_uname` varchar(6))
;
INSERT INTO users
(`user_id`, `user_first`, `user_last`, `user_email`, `user_uname`)
VALUES
(0, 'Bob', 'Abc', 'email@example.com', 'user1'),
(13, 'Larry', 'Abc', 'email@example.com', 'user2'),
(15, 'Noel', 'Abc', 'email@example.com', 'user3'),
(16, 'Kate', 'Abc', 'email@example.com', 'user4'),
(17, 'Walter', 'Sobchak', 'walter.sobchak@shabbus.com', 'Walter'),
(18, 'Jae', 'Abc', 'email@example.com', 'user5')
;
CREATE TABLE user_likes
(`user_id` int, `post_id` int, `like_id` int)
;
INSERT INTO user_likes
(`user_id`, `post_id`, `like_id`)
VALUES
(0, 23, 1),
(0, 41, 2),
(13, 23, 7)
;
CREATE TABLE guestbook
(`post_id` int, `user_id` int, `post_date` datetime,
`post_content` varchar(27), `post_type` varchar(8),
`post_level` int, `post_parent` varchar(4))
;
INSERT INTO guestbook
(`post_id`, `user_id`, `post_date`, `post_content`,
`post_type`, `post_level`, `post_parent`)
VALUES
(2, 0, '2018-12-15 20:32:40', 'test1', 'testing', 0, NULL),
(8, 0, '2018-12-16 14:06:40', 'test2', 'testing', 0, NULL),
(9, 13, '2018-12-16 15:47:55', 'test4', 'testing', 0, NULL),
(23, 0, '2018-12-25 17:59:46', 'Merry Christmas!', 'standard', 0, NULL),
(39, 16, '2018-12-26 00:28:04', 'Hello!', 'standard', 0, NULL),
(40, 15, '2019-01-27 00:46:12', 'Hello 2', 'standard', 0, NULL),
(41, 18, '2019-02-25 00:44:35', 'What are you doing?', 'standard', 0, NULL)
;
I tried a whole bunch of convoluted statements involving count and couldn't get what I wanted. Through what seems like dumb luck I stumbled into creating this statement which appears to be giving me what I want.
SELECT
u.user_uname, u.user_first, g.post_id, g.post_date,
g.post_content, count(user_likes.post_id) AS likes
FROM
users AS u, guestbook AS g
LEFT JOIN
user_likes on g.post_id=user_likes.post_id
WHERE
u.user_id=g.user_id AND g.post_type='standard'
GROUP BY
g.post_id
ORDER BY
g.post_date ASC;
Question:
Why does this count function appear to work?
The count function that I was able to get working is this, but it only works for hard coded post_id values.
SELECT COUNT(CASE post_id WHEN 23 THEN 1 ELSE null END) FROM user_likes;
When I try to match the post_id from guestbook table by changing to this I get an incorrect value which appears to be the whole table of user_likes.
SELECT COUNT(case when guestbook.post_id=user_likes.post_id then 1 else null end) FROM guestbook, user_likes;
Adding a GROUP BY guestbook.post_id to the end gets me closer, but now I need to figure out how to combine that with my original select statement.
+----------------------------------------------------------------------------+
| COUNT(case when guestbook.post_id=user_likes.post_id then 1 else null end) |
+----------------------------------------------------------------------------+
| 0 |
| 0 |
| 0 |
| 2 |
| 0 |
| 0 |
| 1 |
+----------------------------------------------------------------------------+
This is the output I want, which I am getting. I just don't trust that my statement is reliable or correct.
+------------+------------+---------+---------------------+---------------------+-------+
| user_uname | user_first | post_id | post_date | post_content | likes |
+------------+------------+---------+---------------------+---------------------+-------+
| user1 | Bob | 23 | 2018-12-25 17:59:46 | Merry Christmas! | 2 |
| user4 | Kate | 39 | 2018-12-26 00:28:04 | Hello! | 0 |
| user3 | Noel | 40 | 2019-01-27 00:46:12 | Hello 2 | 0 |
| user5 | Jae | 41 | 2019-02-25 00:44:35 | What are you doing? | 1 |
+------------+------------+---------+---------------------+---------------------+-------+
Fiddle of statement working: http://sqlfiddle.com/#!9/968656/1/0
JOIN + COUNT -- A query first combines the tables as directed by the JOIN and ON clauses. The result is put (at least logically) into a temporary table. Often this temp table has many more rows than any of the tables being JOINed.
Then the COUNT(..) is performed. It is counting the number of rows in that temp table. Maybe that count is exactly what you want, maybe it is a hugely inflated number.
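count(user_likes.post_id) has the additional hiccup of not counting any rows where user_likes.post_id IS NULL. That is usually irrelevant, in which case you should simply say COUNT(*). With your LEFT JOIN, however, it matters: a post with no likes still produces one joined row filled with NULLs, so COUNT(user_likes.post_id) gives 0 where COUNT(*) would give 1.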
Please don't use the commalist form for joining. Always use FROM a JOIN b ON ... where the ON clause says how tables a and b are related. If there is also some filtering, put that into the WHERE clause.
If the COUNT is too big, put aside the query you have developed and start over to develop a query that does exactly one thing: compute the count. This query will probably use fewer tables.
Then build on that to get any other data you need. It may look something like
SELECT ...
FROM ( SELECT foo, COUNT(*) AS ct FROM t1 GROUP BY foo ) AS sub1
JOIN t2 ON t2.foo = sub1.foo
JOIN t3 ON ...
WHERE ...
Get that initial query that gets the right COUNT. Then, if needed, come back for more help.
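The difference between counting a column and counting rows after a LEFT JOIN can be seen on a cut-down version of the question's data. This is only a sketch, assuming Python's built-in sqlite3 module as a stand-in for MariaDB; the function name like_counts and the reduced column lists are assumptions made for illustration.

```python
import sqlite3


def like_counts():
    """Return (post_id, COUNT(ul.post_id), COUNT(*)) per post after a LEFT JOIN."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE guestbook (post_id INTEGER, post_type TEXT);
        CREATE TABLE user_likes (user_id INTEGER, post_id INTEGER);
        INSERT INTO guestbook VALUES (23, 'standard'), (39, 'standard'), (41, 'standard');
        INSERT INTO user_likes VALUES (0, 23), (0, 41), (13, 23);
    """)
    rows = conn.execute("""
        SELECT g.post_id,
               COUNT(ul.post_id) AS likes,       -- ignores the NULL-filled row of an unliked post
               COUNT(*)          AS joined_rows  -- counts every row the join produced
        FROM guestbook AS g
        LEFT JOIN user_likes AS ul ON ul.post_id = g.post_id
        GROUP BY g.post_id
        ORDER BY g.post_id
    """).fetchall()
    conn.close()
    return rows
```

Post 39 has no likes, so the join still yields one row for it; only the column count reports it as 0.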
As tried by Bryan
OK, I made a few changes.
SELECT u.user_uname, u.user_first,
g2.post_id, g2.post_content, g2.post_date,
sub.likes
FROM
(
SELECT g.post_id,
SUM(g.post_id = ul.post_id) AS likes
FROM guestbook AS g
JOIN user_likes AS ul
WHERE g.post_type = 'standard'
GROUP BY g.post_id -- one row (and one count) per post
) AS sub
JOIN guestbook AS g2 ON sub.post_id = g2.post_id
JOIN users AS u ON u.user_id = g2.user_id;
Indexes:
guestbook: (post_type, post_id) -- for derived table
guestbook: (post_id) -- for outer SELECT
users: (user_id)
user_likes: (post_id)
Notes:
ORDER BY removed since it was useless in context.
COUNT..CASE changed to shorter SUM.
JOIN ON used
Since there is only one value coming from the derived table, this might work equally well:
SELECT u.user_uname, u.user_first,
g.post_id, g.post_content, g.post_date,
( SELECT COUNT(*)
FROM user_likes AS ul
WHERE g.post_id = ul.post_id
) AS likes
FROM guestbook AS g
JOIN users AS u USING(user_id)
WHERE g.post_type = 'standard';
This involved lots of changes; see if it looks 'right'. It is now a lot simpler.
Indexes are same as above.
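To check the correlated-subquery form against the output the question expects, here is a minimal sketch that assumes Python's built-in sqlite3 module in place of MariaDB. The function name posts_with_likes is hypothetical, the tables are trimmed to the columns the query needs, and an ORDER BY on post_date is added back because the question asks for that ordering.

```python
import sqlite3


def posts_with_likes():
    """Return (user_uname, post_id, likes) for standard posts, oldest first."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER, user_first TEXT, user_uname TEXT);
        CREATE TABLE guestbook (post_id INTEGER, user_id INTEGER, post_date TEXT,
                                post_content TEXT, post_type TEXT);
        CREATE TABLE user_likes (user_id INTEGER, post_id INTEGER, like_id INTEGER);
        INSERT INTO users VALUES
            (0, 'Bob', 'user1'), (13, 'Larry', 'user2'), (15, 'Noel', 'user3'),
            (16, 'Kate', 'user4'), (18, 'Jae', 'user5');
        INSERT INTO guestbook VALUES
            (2, 0, '2018-12-15 20:32:40', 'test1', 'testing'),
            (23, 0, '2018-12-25 17:59:46', 'Merry Christmas!', 'standard'),
            (39, 16, '2018-12-26 00:28:04', 'Hello!', 'standard'),
            (40, 15, '2019-01-27 00:46:12', 'Hello 2', 'standard'),
            (41, 18, '2019-02-25 00:44:35', 'What are you doing?', 'standard');
        INSERT INTO user_likes VALUES (0, 23, 1), (0, 41, 2), (13, 23, 7);
    """)
    rows = conn.execute("""
        SELECT u.user_uname, g.post_id,
               (SELECT COUNT(*)              -- counted per outer row, so no inflation
                  FROM user_likes AS ul
                 WHERE ul.post_id = g.post_id) AS likes
        FROM guestbook AS g
        JOIN users AS u USING (user_id)
        WHERE g.post_type = 'standard'
        ORDER BY g.post_date
    """).fetchall()
    conn.close()
    return rows
```

Because the count is computed separately for each post, joining further tables to the outer query cannot multiply it.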

How do I select the max(timestamp) from a relational mysql table fast

We are developing a ticket system, and for the dashboard we want to show the tickets with their latest status. We have two tables: the first one for the ticket itself and a second one for the individual edits.
The system is already running, but the performance of the dashboard is very bad (6 seconds for ~1300 tickets). At first we used a statement which selected 'where timestamp = (select max(Timestamp))' for every ticket. In a second step we created a view which only includes the latest timestamp for every ticket, but we are not able to also include the correct status in this view.
So the main problem might be that we can't build a table in which, for every ticket, the latest ins_date and also the latest status are selected.
The simplified database looks like:
CREATE TABLE `ticket` (
`id` int(10) NOT NULL,
`betreff` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `ticket_relation` (
`id` int(11) NOT NULL,
`ticket` int(10) NOT NULL,
`info` varchar(10000) DEFAULT NULL,
`status` int(1) NOT NULL DEFAULT '0',
`ins_date` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`ins_user` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `ticket` (`id`, `betreff`) VALUES
(1, 'Technische Frage'),
(2, 'Ticket 2'),
(3, 'Weitere Fragen');
INSERT INTO `ticket_relation` (`id`, `ticket`, `info`, `status`, `ins_date`, `ins_user`) VALUES
(1, 1, 'Betreff 1', 0, '2019-05-28 11:02:18', 123),
(2, 1, 'Betreff 2', 3, '2019-05-28 12:07:36', 123),
(3, 2, 'Betreff 3', 0, '2019-05-29 06:49:32', 123),
(4, 3, 'Betreff 4', 1, '2019-05-29 07:44:07', 123),
(5, 2, 'Betreff 5', 1, '2019-05-29 07:49:32', 123),
(6, 2, 'Betreff 6', 3, '2019-05-29 08:49:32', 123),
(7, 3, 'Betreff 7', 2, '2019-05-29 09:49:32', 123),
(8, 2, 'Betreff 8', 1, '2019-05-29 10:49:32', 123),
(9, 3, 'Betreff 9', 2, '2019-05-29 11:49:32', 123),
(10, 3, 'Betreff 10', 3, '2019-05-29 12:49:32', 123);
I have created a SQL Fiddle: http://sqlfiddle.com/#!9/a873b6/3
The first three statements are attempts that don't work correctly or are way too slow. The last one is the key, I think, but I don't understand why it gets the status wrong.
The attempt to create the table with latest ins_date AND status for each ticket:
SELECT
ticket, status, MAX(ins_date) as max_date
FROM
ticket_relation
GROUP BY
ticket
ORDER BY
ins_date DESC;
This query gets the correct (latest) ins_date for every ticket, but not the latest status:
+--------+--------+----------------------+
| ticket | status | max_date |
+--------+--------+----------------------+
| 3 | 1 | 2019-05-29T12:49:32Z |
+--------+--------+----------------------+
| 2 | 0 | 2019-05-29T10:49:32Z |
+--------+--------+----------------------+
| 1 | 0 | 2019-05-28T12:07:36Z |
+--------+--------+----------------------+
Expected output would be this:
+--------+--------+----------------------+
| ticket | status | max_date |
+--------+--------+----------------------+
| 3 | 3 | 2019-05-29T12:49:32Z |
+--------+--------+----------------------+
| 2 | 1 | 2019-05-29T10:49:32Z |
+--------+--------+----------------------+
| 1 | 3 | 2019-05-28T12:07:36Z |
+--------+--------+----------------------+
Is there an efficient way to select the latest timestamp and status for every ticket in the ticket table?
Another approach is to think of this as filtering rather than GROUPing.
Query
SELECT
ticket_relation_1.ticket
, ticket_relation_1.status
, ticket_relation_1.ins_date
FROM
ticket_relation AS ticket_relation_1
LEFT JOIN
ticket_relation AS ticket_relation_2
ON
ticket_relation_1.ticket = ticket_relation_2.ticket
AND
ticket_relation_1.ins_date < ticket_relation_2.ins_date
WHERE
ticket_relation_2.id IS NULL
ORDER BY
ticket_relation_1.id DESC
Result
| ticket | status | ins_date |
| ------ | ------ | ------------------- |
| 3 | 3 | 2019-05-29 12:49:32 |
| 2 | 1 | 2019-05-29 10:49:32 |
| 1 | 3 | 2019-05-28 12:07:36 |
see demo
This query would require an index KEY(ticket, ins_date, id) to get maximum performance.
One solution would be to use a subquery to compute the latest insert date for each ticket, and then to join the results with the original table, like:
SELECT t.ticket, t.status, t.ins_date
FROM ticket_relation t
INNER JOIN (
SELECT ticket, max(ins_date) max_ins_date
FROM ticket_relation
GROUP BY ticket
) x ON t.ticket = x.ticket AND t.ins_date = x.max_ins_date
For better performance with this query, you want an index on (ticket, ins_date).
Another option would be to use a NOT EXISTS condition to ensure that only the latest record is selected, like:
SELECT t.ticket, t.status, t.ins_date
FROM ticket_relation t
WHERE NOT EXISTS (
SELECT 1
FROM ticket_relation t1
WHERE t1.ticket = t.ticket AND t1.ins_date > t.ins_date
)
NB: when dealing with GROUP BY, all non-aggregated columns must appear in the GROUP BY clause. Otherwise, you will get either an error or unpredictable results (depending on whether the server option ONLY_FULL_GROUP_BY is enabled or disabled, respectively).
If you are able to upgrade to a recent version of MySQL (8.0), then window functions can be used to simplify the query and possibly improve its performance, like:
SELECT ticket, status, ins_date
FROM (
SELECT
ticket,
status,
ins_date,
row_number() over(partition by ticket order by ins_date desc) rn
FROM ticket_relation
) x WHERE rn = 1
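The three variants above (join to the per-ticket maximum, NOT EXISTS, and ROW_NUMBER) should agree on the sample data. The sketch below checks that, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a SQLite build recent enough (3.25 or later) to support window functions. The names QUERIES, latest_per_ticket and idx_ticket_date are invented for illustration, and an ORDER BY ticket is added only to make the results comparable.

```python
import sqlite3

# (id, ticket, status, ins_date) rows from the question's sample data
ROWS = [
    (1, 1, 0, "2019-05-28 11:02:18"), (2, 1, 3, "2019-05-28 12:07:36"),
    (3, 2, 0, "2019-05-29 06:49:32"), (4, 3, 1, "2019-05-29 07:44:07"),
    (5, 2, 1, "2019-05-29 07:49:32"), (6, 2, 3, "2019-05-29 08:49:32"),
    (7, 3, 2, "2019-05-29 09:49:32"), (8, 2, 1, "2019-05-29 10:49:32"),
    (9, 3, 2, "2019-05-29 11:49:32"), (10, 3, 3, "2019-05-29 12:49:32"),
]

QUERIES = {
    "join_to_max": """
        SELECT t.ticket, t.status, t.ins_date
        FROM ticket_relation AS t
        JOIN (SELECT ticket, MAX(ins_date) AS max_ins_date
              FROM ticket_relation GROUP BY ticket) AS x
          ON t.ticket = x.ticket AND t.ins_date = x.max_ins_date
        ORDER BY t.ticket""",
    "not_exists": """
        SELECT t.ticket, t.status, t.ins_date
        FROM ticket_relation AS t
        WHERE NOT EXISTS (SELECT 1 FROM ticket_relation AS t1
                          WHERE t1.ticket = t.ticket AND t1.ins_date > t.ins_date)
        ORDER BY t.ticket""",
    "row_number": """
        SELECT ticket, status, ins_date
        FROM (SELECT ticket, status, ins_date,
                     ROW_NUMBER() OVER (PARTITION BY ticket ORDER BY ins_date DESC) AS rn
              FROM ticket_relation) AS x
        WHERE rn = 1
        ORDER BY ticket""",
}


def latest_per_ticket():
    """Run every variant and return {variant name: result rows}."""
    conn = sqlite3.connect(":memory:")
    # Timestamps are stored as ISO text, which sorts in chronological order.
    conn.execute(
        "CREATE TABLE ticket_relation "
        "(id INTEGER, ticket INTEGER, status INTEGER, ins_date TEXT)"
    )
    conn.executemany("INSERT INTO ticket_relation VALUES (?, ?, ?, ?)", ROWS)
    conn.execute("CREATE INDEX idx_ticket_date ON ticket_relation (ticket, ins_date)")
    results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    conn.close()
    return results
```

One difference to keep in mind: if two edits of a ticket share the same latest ins_date, the first two variants return both rows, while ROW_NUMBER keeps exactly one of them.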
You can try below query -
SELECT
ticket, status, ins_date as max_date
FROM ticket_relation a
where ins_date in (select max(ins_date) from ticket_relation b where a.ticket=b.ticket)

MySQL select result row numbers

I have a table which contains users and some scores associated with them. something like this:
uid | username | score | time_spent
1 | test | 25 | 12
Then I sort this table based on score and time_spent. As a result I get a kind of high-score table.
What I want to do is assign row numbers to this sorted table, so that I know a specific user's place in the high-score table, and then select a specific user from this sorted table along with the row number.
I tried to do it like this:
SET @row_number = 0;
SELECT * FROM
(SELECT uid, username, score, time_spent, @row_number:=@row_number+1 AS row_number,
SUM(score) AS points_awarded,
MIN(time_spent) AS time
FROM results
GROUP BY uid
ORDER BY points_awarded DESC, time ASC) as t
WHERE t.uid=1
But this does not work correctly. The row I get always has the total number of records as its row number.
You must have the @row_number in the outer query:
SET @row_number = 0;
SELECT
t.*, @row_number:=@row_number+1 AS row_number
FROM (
SELECT
uid, username,
SUM(score) AS points_awarded,
MIN(time_spent) AS time
FROM results
GROUP BY uid, username
) t
ORDER BY t.points_awarded DESC, t.time ASC
See the demo.
INSERT INTO results
(`uid`, `username`, `score`, `time_spent`)
VALUES
('1', 'test1', '25', '12'),
('1', 'test1', '20', '13'),
('1', 'test1', '20', '11'),
('2', 'test2', '12', '17'),
('2', 'test2', '29', '16'),
('2', 'test2', '25', '15'),
('3', 'test3', '45', '18'),
('3', 'test3', '15', '69');
Results:
| uid | username | points_awarded | time | row_number |
| --- | -------- | -------------- | ---- | ---------- |
| 2 | test2 | 66 | 15 | 1 |
| 1 | test1 | 65 | 11 | 2 |
| 3 | test3 | 60 | 18 | 3 |
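The point of the answer above is that numbering has to happen after aggregation and sorting. The sketch below illustrates the same idea under different assumptions: it uses Python's built-in sqlite3 module, which has no user variables, so the counter is applied with enumerate in Python rather than with @row_number. The function names leaderboard and position_of and the column alias best_time are made up for illustration.

```python
import sqlite3

# (uid, username, score, time_spent) rows from the demo data above
ROWS = [
    (1, "test1", 25, 12), (1, "test1", 20, 13), (1, "test1", 20, 11),
    (2, "test2", 12, 17), (2, "test2", 29, 16), (2, "test2", 25, 15),
    (3, "test3", 45, 18), (3, "test3", 15, 69),
]


def leaderboard():
    """Return (position, uid, username, points_awarded, best_time) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results "
        "(uid INTEGER, username TEXT, score INTEGER, time_spent INTEGER)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", ROWS)
    ranked = conn.execute("""
        SELECT uid, username, SUM(score) AS points_awarded, MIN(time_spent) AS best_time
        FROM results
        GROUP BY uid, username
        ORDER BY points_awarded DESC, best_time ASC
    """).fetchall()
    conn.close()
    # Number the rows only once aggregation and sorting are finished.
    return [(pos, *row) for pos, row in enumerate(ranked, start=1)]


def position_of(uid):
    """Return the leaderboard position of one user, or None if absent."""
    for pos, row_uid, *_ in leaderboard():
        if row_uid == uid:
            return pos
    return None
```

Filtering for a single user happens last, on the already numbered rows, which is why the position survives the filter.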
If you only want the position of a single user at a time, the following should work:
-- get best score and time for the user
SELECT score, time_spent
INTO @u_score, @u_time
FROM results
WHERE uid = 2
ORDER BY score DESC, time_spent ASC
LIMIT 1;
SELECT *, -- below: count "better" distinct users
(SELECT COUNT(DISTINCT uid)+1 FROM results WHERE score > @u_score
OR (score = @u_score AND time_spent < @u_time)) AS pos
FROM results
WHERE uid = 2
AND score = @u_score
AND time_spent = @u_time;
EDIT: The query below should give you the complete "leaderboard", which you can then use as a subquery to get a specific user, like you did in your example:
SET @row_number = 0;
SELECT t.*, @row_number:=@row_number+1 AS row_number
FROM (
SELECT r1.*
FROM results r1
LEFT JOIN results r2
ON r1.uid = r2.uid
AND (r1.score < r2.score
OR (r1.score = r2.score
AND r1.time_spent > r2.time_spent))
WHERE r2.uid IS NULL
ORDER BY r1.score DESC, r1.time_spent ASC
) AS t
EDIT2: I assumed each row in your table was a separate score "attempt" and that you wanted to take into consideration the best attempt of each user, but it looks like you want the sum of these scores, so forpas's answer is the one you want :)

Custom query with group by and then count

I am using events. I would like to know how to calculate the sum in an event or using a single query.
http://sqlfiddle.com/#!9/ad6d1c/1
DDL for question:
CREATE TABLE `table1` (
`id` int(11) NOT NULL,
`group_id` int(11) NOT NULL DEFAULT '0',
`in_use` tinyint(1) NOT NULL DEFAULT '1' COMMENT '0->in_use,1->not_in_use',
`auto_assign` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0->Yes,1->No'
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `table1`
ADD PRIMARY KEY (`id`);
ALTER TABLE `table1`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
INSERT INTO `table1` (`id`, `group_id`, `in_use`, `auto_assign`) VALUES
(1, 3, 1, 0),(2, 2, 0,1),(3, 1, 1, 1),(4, 3, 1, 0),(5, 3, 0, 0),(6, 3, 0, 1),
(7, 3, 1, 0),(8, 3, 0, 1),(9, 3, 0, 1),(10, 3, 0, 1),(11, 3, 0, 1),(12, 3, 1, 1),
(13, 3, 1, 0),(14, 3, 0, 0),(15, 3, 0, 0),(16, 3, 0, 0),(17, 3, 0, 0),(18, 3, 1, 1),
(19, 3, 0, 0),(20, 3, 0, 0)
Expected Output :
| count | in_use | auto_assign | sum | check_count |
|-------|--------|-------------|------|------------ |
| 7 | 0 | 0 | 11 | 5 |
| 5 | 0 | 1 | 07 | 3 |
| 4 | 1 | 0 | 11 | 5 |
| 2 | 1 | 1 | 07 | 3 |
Here we can see that auto_assign=0 has a total count of 11 (7+4) and
auto_assign=1 has a total count of 7 (5+2). This total should be stored in the new column sum.
The check_count column is a percentage of the sum column. The percentage will be predefined.
Let's take 50%. For a count of 11 (the sum column value), 50% = 5.5, truncated to the integer 5. In the same way, for a count of 7 (the sum column value), 50% = 3.5, truncated to the integer 3.
Here 5 > 4 (auto_assign=0 and in_use=1), so a record has to be inserted into another table (table2); otherwise nothing is inserted.
In the same way, if 3 > 2, a record also needs to be inserted into table2; otherwise nothing is inserted.
Note : This logic I would like to implement in event
This is bit complicated, but please suggest me how to do this in event.
Detail clarification :
Here the percentage value is 5 for auto_assign=0, but the count for auto_assign=0 and in_use=1 is 4, which is less than 5, so a record has to be inserted into table2.
Suppose instead the count for auto_assign=0 and in_use=1 were 6; then no record needs to be inserted into table2.
In the same way,
here the percentage value is 3 for auto_assign=1, but the count for auto_assign=1 and in_use=1 is 2, which is less than 3, so a record has to be inserted into table2.
Suppose instead the count for auto_assign=1 and in_use=1 were 4; then no record needs to be inserted into table2.
Insert query into table2:
Insert into table2(cli_group_id,auto_assign,percentage_value,result_value) values(3,0,5,4)
DEMO Fiddle
Break the problem down: we need a count of the records by auto_assign, so we generate a derived table (B) with that value and join it back to your base table on auto_assign. This gives us the column we need for sum, and we use the TRUNCATE function and a division to get the check_count.
SELECT count(*), in_use, A.Auto_Assign, B.SumC, truncate(B.SumC/2,0) as check_Count
FROM table1 A
INNER JOIN (Select Auto_Assign, count(*) sumC
from table1
where Group_ID = 3
Group by Auto_Assign) B
on A.Auto_Assign = B.Auto_Assign
WHERE GROUP_ID = 3
Group by in_use, A.Auto_Assign
we can eliminate the double where clause by joining on it:
SELECT count(*), in_use, A.Auto_Assign, B.SumC, truncate(B.SumC/2,0) as check_Count
FROM table1 A
INNER JOIN (Select Auto_Assign, count(*) sumC, Group_ID
from table1
where Group_ID = 3
Group by Auto_Assign, Group_ID) B
on A.Auto_Assign = B.Auto_Assign
and A.Group_ID = B.Group_ID
Group by in_use, A.Auto_Assign
I'd need clarification on the rest of the question: I'm not sure which 5 > 4 you're looking at, and I see no 3 other than the check count, but that's not "the same way", so I'm not sure what you're after.
Here 5 > 4(auto_assign=0 and in_use=1 ).So have to insert record into another table(table2). if not then not.
Same way, If 3 >2 then also need to insert record into another table(table2).if not then not.
Note : This logic I would like to implement in event
This is bit complicated, but please suggest me how to do this in event.
So, to create the event, see the CREATE EVENT documentation.
Which results in:
CREATE EVENT myevent
ON SCHEDULE AT CURRENT_TIMESTAMP + INTERVAL 6 MINUTE
DO
INSERT INTO table2
SELECT count(*) as mCount
, in_use
, A.Auto_Assign
, B.SumC, truncate(B.SumC/2,0) as check_Count
FROM table1 A
INNER JOIN (SELECT Auto_Assign, count(*) sumC, Group_ID
FROM table1
WHERE Group_ID = 3
GROUP BY Auto_Assign, Group_ID) B
ON A.Auto_Assign = B.Auto_Assign
AND A.Group_ID = B.Group_ID
GROUP BY in_use, A.Auto_Assign
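The event above inserts the whole summary and leaves out the comparison step the question describes. The following sketch shows one possible reading of that rule (insert only when check_count exceeds the count of in_use=1 rows for the same auto_assign), assuming Python's built-in sqlite3 module as a stand-in for MySQL. The names summary and rows_to_insert are hypothetical, and the tuple layout (group_id, auto_assign, percentage_value, result_value) is taken from the question's example insert.

```python
import sqlite3

# (id, group_id, in_use, auto_assign) rows from the question
ROWS = [
    (1, 3, 1, 0), (2, 2, 0, 1), (3, 1, 1, 1), (4, 3, 1, 0), (5, 3, 0, 0),
    (6, 3, 0, 1), (7, 3, 1, 0), (8, 3, 0, 1), (9, 3, 0, 1), (10, 3, 0, 1),
    (11, 3, 0, 1), (12, 3, 1, 1), (13, 3, 1, 0), (14, 3, 0, 0), (15, 3, 0, 0),
    (16, 3, 0, 0), (17, 3, 0, 0), (18, 3, 1, 1), (19, 3, 0, 0), (20, 3, 0, 0),
]

SUMMARY_SQL = """
    SELECT COUNT(*) AS cnt, a.in_use, a.auto_assign, b.sum_c,
           b.sum_c / 2 AS check_count  -- integer division in SQLite; MySQL needs TRUNCATE(b.sum_c / 2, 0)
    FROM table1 AS a
    JOIN (SELECT group_id, auto_assign, COUNT(*) AS sum_c
          FROM table1
          WHERE group_id = 3
          GROUP BY group_id, auto_assign) AS b
      ON a.group_id = b.group_id AND a.auto_assign = b.auto_assign
    GROUP BY a.in_use, a.auto_assign, b.sum_c
    ORDER BY a.in_use, a.auto_assign
"""


def summary():
    """Return (count, in_use, auto_assign, sum, check_count) rows for group 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 "
        "(id INTEGER, group_id INTEGER, in_use INTEGER, auto_assign INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(SUMMARY_SQL).fetchall()
    conn.close()
    return rows


def rows_to_insert(group_id=3):
    """Assumed rule: insert when check_count is greater than the in_use=1 count."""
    return [
        (group_id, auto_assign, check_count, cnt)
        for cnt, in_use, auto_assign, _sum_c, check_count in summary()
        if in_use == 1 and check_count > cnt
    ]
```

In MySQL the same filter could be expressed as a HAVING clause on the grouped query, so that the event's INSERT ... SELECT only produces the rows that pass the check.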

(mysql) Select 50 highest rated items, with at most one item coming from each user

I'm not sure how to go about doing this efficiently in MySQL and would appreciate any help.
The goal is to select 50 of the top-selling items, with at most one item from each user. I'm used to doing this with either CTE's or DISTINCT ON, but of course that's not an option in MySQL. I'm hoping for a single-query solution, and I'd like to avoid using stored procedures.
The basic schema is a table of items posted by users, and a table of sales with a field determining the score of that particular sale.
CREATE TABLE items (
item_id INT PRIMARY KEY,
user_id INT NOT NULL
);
CREATE TABLE sales (
item_id INT NOT NULL,
score INT NOT NULL
);
-- Create some sample data
INSERT INTO items VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3);
INSERT INTO sales VALUES (1, 1), (1, 1), (2, 1), (3, 2), (3, 1), (4, 3), (4, 2), (5, 2), (6, 1), (6, 1), (6, 1), (7, 2);
The result of the query against this sample data should be
+---------+---------+-------------+
| user_id | item_id | total_score |
+---------+---------+-------------+
| 2 | 4 | 5 |
| 1 | 3 | 3 |
| 3 | 6 | 3 |
+---------+---------+-------------+
Here's the PostgreSQL solution:
SELECT DISTINCT ON (items.user_id)
items.user_id,
items.item_id,
SUM(sales.score) AS total_score
FROM items
JOIN sales ON (sales.item_id = items.item_id)
GROUP BY items.item_id
ORDER BY total_score DESC
LIMIT 50
Here's the MySQL solution I've come up with, but it's quite ugly. I tried doing essentially the same thing using a temporary table, but in the process learned that MySQL doesn't allow joining to a temporary table multiple times in the same query.
SELECT items_scores.user_id, items_scores.item_id, items_scores.total_score
FROM (
SELECT items.user_id, items.item_id, SUM(sales.score) as total_score
FROM items
JOIN sales ON
sales.item_id = items.item_id
GROUP BY items.item_id
) AS items_scores
WHERE items_scores.total_score =
(
SELECT MAX(t.total_score)
FROM (
SELECT items.user_id, items.item_id, SUM(sales.score) as total_score
FROM items
JOIN sales ON
sales.item_id = items.item_id
GROUP BY items.item_id
) AS t
WHERE t.user_id = items_scores.user_id
)
ORDER BY items_scores.total_score DESC
MySQL query for it:
select user, item, total_score
from (
select sum(sales.score) as total_score, items.user_id as user, items.item_id as item
from sales
inner join items on sales.item_id = items.item_id
group by item,user
order by total_score desc) as t
group by user limit 50;
Output:
+------+------+-------------+
| user | item | total_score |
+------+------+-------------+
| 1 | 3 | 3 |
| 2 | 4 | 5 |
| 3 | 6 | 3 |
+------+------+-------------+
3 rows in set (0.00 sec)
Some explanation
MySQL documentation says:
However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
In our subquery, the nonaggregated columns are user_id and item_id, and we expect them to be the same for every group that we are doing the sum on. Also, we are not doing any ORDER BY that can influence the aggregation: we want all the values of the group to be summed up. Finally, we sort the output and save it as a derived table.
Then we run a select query on this derived table, where we GROUP BY user and LIMIT the output to 50. Note that, per the documentation quoted above, the item returned for each user by this outer GROUP BY is not guaranteed to be the top-scoring one, and the query is rejected when ONLY_FULL_GROUP_BY is enabled.
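For a result that does not depend on which row the server happens to pick, the best item per user can be selected explicitly. This is a sketch of a different technique from the one above: it ranks each user's items with ROW_NUMBER, which MySQL supports from 8.0 on, and it is run here with Python's built-in sqlite3 module (SQLite 3.25 or later) as an assumption. The function name top_items and the tie-break on item_id are choices made for illustration.

```python
import sqlite3


def top_items(limit=50):
    """Return (user_id, item_id, total_score), at most one item per user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (item_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
        CREATE TABLE sales (item_id INTEGER NOT NULL, score INTEGER NOT NULL);
        INSERT INTO items VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3);
        INSERT INTO sales VALUES (1, 1), (1, 1), (2, 1), (3, 2), (3, 1), (4, 3),
                                 (4, 2), (5, 2), (6, 1), (6, 1), (6, 1), (7, 2);
    """)
    rows = conn.execute("""
        SELECT user_id, item_id, total_score
        FROM (
            SELECT s.user_id, s.item_id, s.total_score,
                   -- rank each user's items; ties go to the lower item_id
                   ROW_NUMBER() OVER (PARTITION BY s.user_id
                                      ORDER BY s.total_score DESC, s.item_id) AS rn
            FROM (SELECT i.user_id, i.item_id, SUM(sa.score) AS total_score
                  FROM items AS i
                  JOIN sales AS sa ON sa.item_id = i.item_id
                  GROUP BY i.user_id, i.item_id) AS s
        ) AS ranked
        WHERE rn = 1
        ORDER BY total_score DESC, user_id
        LIMIT ?
    """, (limit,)).fetchall()
    conn.close()
    return rows
```

On the sample data this matches the result table given in the question, with the rows in descending score order.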