Top 'n' results for each keyword - mysql

I have a query to get the top 'n' users who commented on a specific keyword,
SELECT `user` , COUNT( * ) AS magnitude
FROM `results`
WHERE `keyword` = "economy"
GROUP BY `user`
ORDER BY magnitude DESC
LIMIT 5
I have approx 6000 keywords, and would like to run this query to get me the top 'n' users for each and every keyword we have data for. Assistance appreciated.

Since you haven't given the schema for results, I'll assume it's this or very similar (maybe extra columns):
create table results (
id int primary key,
user int,
foreign key (user) references <some_other_table>(id),
keyword varchar(<30>)
);
Step 1: aggregate by keyword/user as in your example query, but for all keywords:
create view user_keyword as (
select
keyword,
user,
count(*) as magnitude
from results
group by keyword, user
);
Step 2: rank each user within each keyword group (note the use of the subquery to rank the rows):
create view keyword_user_ranked as (
select
keyword,
user,
magnitude,
(select count(*)
from user_keyword
where l.keyword = keyword and magnitude >= l.magnitude
) as rank
from
user_keyword l
);
Step 3: select only the rows where the rank is less than some number:
select *
from keyword_user_ranked
where rank <= 3;
Example:
Base data used:
mysql> select * from results;
+----+------+---------+
| id | user | keyword |
+----+------+---------+
| 1 | 1 | mysql |
| 2 | 1 | mysql |
| 3 | 2 | mysql |
| 4 | 1 | query |
| 5 | 2 | query |
| 6 | 2 | query |
| 7 | 2 | query |
| 8 | 1 | table |
| 9 | 2 | table |
| 10 | 1 | table |
| 11 | 3 | table |
| 12 | 3 | mysql |
| 13 | 3 | query |
| 14 | 2 | mysql |
| 15 | 1 | mysql |
| 16 | 1 | mysql |
| 17 | 3 | query |
| 18 | 4 | mysql |
| 19 | 4 | mysql |
| 20 | 5 | mysql |
+----+------+---------+
Grouped by keyword and user:
mysql> select * from user_keyword order by keyword, magnitude desc;
+---------+------+-----------+
| keyword | user | magnitude |
+---------+------+-----------+
| mysql | 1 | 4 |
| mysql | 2 | 2 |
| mysql | 4 | 2 |
| mysql | 3 | 1 |
| mysql | 5 | 1 |
| query | 2 | 3 |
| query | 3 | 2 |
| query | 1 | 1 |
| table | 1 | 2 |
| table | 2 | 1 |
| table | 3 | 1 |
+---------+------+-----------+
Users ranked within keywords:
mysql> select * from keyword_user_ranked order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql | 1 | 4 | 1 |
| mysql | 2 | 2 | 3 |
| mysql | 4 | 2 | 3 |
| mysql | 3 | 1 | 5 |
| mysql | 5 | 1 | 5 |
| query | 2 | 3 | 1 |
| query | 3 | 2 | 2 |
| query | 1 | 1 | 3 |
| table | 1 | 2 | 1 |
| table | 3 | 1 | 3 |
| table | 2 | 1 | 3 |
+---------+------+-----------+------+
Only top 2 from each keyword:
mysql> select * from keyword_user_ranked where rank <= 2 order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql | 1 | 4 | 1 |
| query | 2 | 3 | 1 |
| query | 3 | 2 | 2 |
| table | 1 | 2 | 1 |
+---------+------+-----------+------+
Note that when there are ties -- see users 2 and 4 for keyword "mysql" in the examples -- all parties in the tie get the "last" rank, i.e. if the 2nd and 3rd are tied, both are assigned rank 3.
Performance: adding an index to the keyword and user columns will help. I have a table being queried in a similar way with 4000 and 1300 distinct values for the two columns (in a 600000-row table). You can add the index like this:
alter table results add index keyword_user (keyword, user);
In my case, query time dropped from about 6 seconds to about 2 seconds.

You can use a pattern like this (from Within-group quotas (Top N per group)):
SELECT tmp.ID, tmp.entrydate
FROM (
SELECT
ID, entrydate,
IF( #prev <> ID, #rownum := 1, #rownum := #rownum+1 ) AS rank,
#prev := ID
FROM test t
JOIN (SELECT #rownum := NULL, #prev := 0) AS r
ORDER BY t.ID
) AS tmp
WHERE tmp.rank <= 2
ORDER BY ID, entrydate;
+------+------------+
| ID | entrydate |
+------+------------+
| 1 | 2007-05-01 |
| 1 | 2007-05-02 |
| 2 | 2007-06-03 |
| 2 | 2007-06-04 |
| 3 | 2007-07-01 |
| 3 | 2007-07-02 |
+------+------------+

Related

How to query MIN value of MAX subquery with two distinct columns?

I have a table like this:
+---------------+--------------+------+-----+----------+
| Field | Type | Null | Key | Default |
+---------------+--------------+------+-----+----------+
| id | smallint(6) | NO | PRI | NULL |
| Book | tinyint(4) | NO | | NULL |
| Chapter | smallint(6) | NO | | NULL |
| Paragraph | smallint(6) | NO | | NULL |
| Text | text | YES | | NULL |
| RevisionNum | mediumint(9) | NO | PRI | NULL |
+---------------+--------------+------+-----+----------+
mysql> select id,Book,Chapter,Paragraph,RevisionNum FROM MyTable ORDER BY id LIMIT 11;
+-----+------+---------+-----------+-------------+
| id | Book | Chapter | Paragraph | RevisionNum |
+-----+------+---------+-----------+-------------+
| 1 | 1 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 2 |
| 2 | 1 | 2 | 2 | 0 |
| 2 | 1 | 2 | 2 | 1 |
| 2 | 1 | 2 | 2 | 2 |
| 2 | 1 | 2 | 2 | 3 |
| 3 | 1 | 2 | 3 | 0 |
| 4 | 1 | 2 | 4 | 0 |
| 4 | 1 | 2 | 4 | 1 |
| 5 | 1 | 3 | 5 | 0 |
+-----+------+---------+-----------+-------------+
To find a book or chapter which has no unrevised paragraph,
I wish to query either the minimum value of the maximums of
all the distinct id's for that chapter or book, or else in
some fashion determine that no id remains unedited (with a
MAX(RevisionNum) of zero).
Most of my attempts to date have ended in errors like this one:
SELECT DISTINCT Book,RecordNum FROM MyTable
-> WHERE 0 < ALL (SELECT DISTINCT RecordNum,MAX(RevisionNum)
FROM MyTable
WHERE MAX(RevisionNum) > 0);
ERROR 1111 (HY000): Invalid use of group function
...And I wasn't using the "GROUP BY" function at all!
The following query produces results, but simply
gives ALL id's, and does not actually show a unique
set of Book records, as requested. How could this happen?
SELECT DISTINCT Book,id,MAX(RevisionNum) FROM MyTable GROUP BY id LIMIT 5;
+------+----+------------------+
| Book | id | MAX(RevisionNum) |
+------+----+------------------+
| 1 | 1 | 30 |
| 1 | 2 | 16 |
| 1 | 3 | 15 |
| 1 | 4 | 10 |
| 1 | 5 | 9 |
+------+----+------------------+
What would the correct query be to give results more like this:
+------+-----+-----------------------+
| Book | id | MIN(MAX(RevisionNum)) |
+------+-----+-----------------------+
| 1 | 5 | 3 |
| 2 | 17 | 1 |
| 3 | 33 | 2 |
| 4 | 147 | 0 |
| 5 | 225 | 2 |
+------+-----+-----------------------+
Are you looking for two levels of aggregation?
select id, book, min(max_revisionnum)
from (select id, book, chapter, paragraph, max(revisionnum) as max_revisionnum
from mytable
group by id, book, chapter, paragraph
) t
group by id, book;
EDIT:
Based on your comment, you can use:
select *
from (select id, book, chapter, paragraph, max(revisionnum) as max_revisionnum,
row_number() over (partition by book order by max(revisionnum) desc) as seqnum
from mytable
group by id, book, chapter, paragraph
) t
where seqnum = 1;
Here is a db<>fiddle.
In older versions of MariaDB, you can use a correlated subquery:
select t.*
from mytable t
where (id, book, chapter, paragraph, revisionnum) = (select t2.id, t2.book, t2.chapter, t2.paragraph, t2.revisionnum
from mytable t2
where t2.book = t.book
order by t2.revisionnum desc
limit 1
);
For this query, try adding an index on (book, revisionnum desc).

MySQL - Recursive Reorder Algorithm / UPDATE with Variable Incrementation

Sorry for the kind of meaningless title, but I couldn't come up with a more fitting one.
I have a MySQL table, which looks like this:
SELECT * FROM `table`
+----+-----------+----------+-------+
| id | dimension | order_by | value |
+----+-----------+----------+-------+
| 1 | 1 | 1 | 1st |
| 2 | 1 | 100 | 3rd |
| 3 | 2 | 300 | 5th |
| 4 | 3 | 999 | 6th |
| 5 | 1 | 2 | 2nd |
| 6 | 2 | 1 | 4th |
+----+-----------+----------+-------+
I am listing all entries ordered by dimension (first) and order_by (second), which looks like this:
SELECT * FROM `table` ORDER BY `dimension`, `order_by`
+----+-----------+----------+-------+
| id | dimension | order_by | value |
+----+-----------+----------+-------+
| 1 | 1 | 1 | 1st |
| 5 | 1 | 2 | 2nd |
| 2 | 1 | 100 | 3rd |
| 6 | 2 | 1 | 4th |
| 3 | 2 | 300 | 5th |
| 4 | 3 | 999 | 6th |
+----+-----------+----------+-------+
Now I'd like to write a function, that rearranges the order_by, if possible with just one update query, to make it look that way:
SELECT * FROM `table` ORDER BY `dimension`, `order_by`
+----+-----------+----------+-------+
| id | dimension | order_by | value |
+----+-----------+----------+-------+
| 1 | 1 | 1 | 1st |
| 5 | 1 | 2 | 2nd |
| 2 | 1 | 3 | 3rd |
| 6 | 2 | 1 | 4th |
| 3 | 2 | 2 | 5th |
| 4 | 3 | 1 | 6th |
+----+-----------+----------+-------+
What I got so far (which, unfortunately, doesn't start recounting for each dimension):
UPDATE `table` AS `l`
JOIN (SELECT #i=1 FROM `table`) AS `i`
SET `order_by` = #i:=i
Now, my question would be: Is it possible to do it with just one UPDATE query?
You have to introduce another variable holding the value of the previous row.
UPDATE Table1 t
INNER JOIN (
SELECT
id, /*your primary key I assume*/
#new_ob:=if(#prev != dimension, 1, #new_ob + 1) as new_ob,
#prev := dimension /*In this line, the value of the current row is assigned. In the previous line, the variable still holds the value of the previous row*/
FROM
Table1
, (SELECT #prev := null, #new_ob := 0) var_init_subquery
ORDER BY dimension, order_by
) st ON t.id = st.id
SET t.order_by = st.new_ob;
see it working live in an sqlfiddle

MYSQL - how do i select no more than x rows max with the same field value y?

this question is a bit tricky to formulate, so probably has been asked before.
i am selecting rows from a table of interrelating data. i only want a maximum of n rows which have the same value x of some field/column in the table to show up in my set. there is a global limit, in essence i always want the query to return the same amount of rows, with no more than n rows sharing value x. how do i do this?
here's an example of the data (dots are supposed to indicate that this table is large, let's say 20000 rows of data):
some_table
+----+----------+-------------+------------+
| id | some_id | some_column | another_id |
+----+----------+-------------+------------+
| 1 | 10 | value | 8 |
| 2 | 10 | value | 5 |
| 3 | 10 | value | 2 |
| 4 | 20 | value | 3 |
| 5 | 30 | value | 9 |
| 6 | 30 | value | 1 |
| 7 | 30 | value | 4 |
| 8 | 30 | value | 6 |
| 9 | 30 | value | 7 |
| 10 | 40 | value | 10 |
| .. | ... | ... | ... |
| .. | ... | ... | ... |
| .. | ... | ... | ... |
| .. | ... | ... | ... |
+----+----------+-------------+------------+
now here's my select:
select * from some_table where some_column="value" order by another_id limit 6
but instead of returning rows with another_id = 1 thru 6 i want to get no more than 2 rows with the same value of some_id. in other words, i'd like to get:
result set
+----+----------+-------------+------------+
| id | some_id | some_column | another_id |
+----+----------+-------------+------------+
| 6 | 30 | value | 1 |
| 3 | 10 | value | 2 |
| 1 | 10 | value | 3 |
| 7 | 30 | value | 4 |
| 4 | 20 | value | 8 |
| 10 | 40 | value | 10 |
+----+----------+-------------+------------+
note that the results are ordered by another_id, but there are no more than 2 results with the same value of some_id.
how can i best (meaning preferably in one query and reasonably fast) get there? thanks!
select id, some_id, some_column, another_id from (
select
t.*,
#rn := if(#prev = some_id, #rn + 1, 1) as rownumber,
#prev := some_id
from some_table t
, (select #prev := null, #rn := 0) var_init
where some_column="value"
order by some_id, id
) sq where rownumber <= 2
order by another_id;
see it working live in an sqlfiddle
First we order by some_id, id in the subquery to do the right calculations. Then we order by another_id in the outer query to have correct ordering.

SQL, difficult fetching data query

Suppose I have such a table:
+-----+---------+-------+
| ID | TIME | DAY |
+-----+---------+-------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 1 | 1 | 2 |
| 2 | 2 | 2 |
| 3 | 3 | 2 |
| 1 | 1 | 3 |
| 2 | 2 | 3 |
| 3 | 3 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 4 |
| 3 | 3 | 4 |
| 1 | 1 | 5 |
| 2 | 2 | 5 |
| 3 | 3 | 5 |
+-----+---------+-------+
I want to fetch a table which represents 2 IDs which got the largest sum of TIME within the last 3 days (means from 3 to 5 in a DAY column)
So the correct result would be:
+-----+---------+
| ID | SUM |
+-----+---------+
| 3 | 9 |
| 2 | 6 |
+-----+---------+
The original table is much larger and more complex. So i need a generic approach.
Thanks in advance.
And so I just learned that MySQL used LIMIT instead of TOP...
fiddle
CREATE TABLE tbl (ID INT,tm INT,dy INT);
INSERT INTO tbl (id, tm, dy) VALUES
(1,1,1)
,(2,2,1)
,(3,3,1)
,(1,1,2)
,(1,1,1)
SELECT ID
,SUM(SumTimeForDay) SumTimeFromLastThreeDays
FROM (SELECT ID
,SUM(tm) SumTimeForDay
FROM tbl
GROUP BY ID, dy
HAVING dy > MAX(dy) -3) a
GROUP BY id
ORDER BY SUM(SumTimeForDay) DESC
LIMIT 2
select t1.`id`, sum(t1.`time`) as `sum`
from `table` t1
inner join ( select distinct `day` from `table` order by `day` desc limit 3 ) t2
on t2.`da`y = t1.`day`
group by t1.`id`
order by sum(t1.`time`) desc
limit 2

MYSQL - how to string comparisons and query?

+--------------------+---------------+------+-----+---------+-------+
| ID | GKEY |GOODS | PRI | COUNTRY | Extra |
+--------------------+---------------+------+-----+---------+-------+
| 1 | BOOK-1 | 1 | 10 | | |
| 2 | PHONE-1 | 2 | 12 | | |
| 3 | BOOK-2 | 1 | 13 | | |
| 4 | BOOK-3 | 1 | 10 | | |
| 5 | PHONE-2 | 2 | 10 | | |
| 6 | PHONE-3 | 2 | 20 | | |
| 7 | BOOK-10 | 2 | 20 | | |
| 8 | BOOK-11 | 2 | 20 | | |
| 9 | BOOK-20 | 2 | 20 | | |
| 10 | BOOK-21 | 2 | 20 | | |
| 11 | PHONE-30 | 2 | 20 | | |
+--------------------+---------------+------+-----+---------+-------+
Above is my table. I want to get all records which GKEY > BOOK-2, Who can tell me the expression with mysql?
Using " WHERE GKEY>'BOOK-2' " Cannot get the correct results.
How about (something like):
(this is MSSQL - I guess it will be similar in MySQL)
select
*
from
(
select
*,
index = convert(int,replace(GKEY,'BOOK-',''))
from table
where
GKEY like 'BOOK%'
) sub
where
sub.index > 2
By way of explanation: The inner query basically recreates your table, but only for BOOK rows, and with an extra column containing the index in the right data type to make a greater than comparison work numerically.
Alternatively something like this:
select
*
from table
where
(
case
when GKEY like 'BOOK%' then
case when convert(int,replace(GKEY,'BOOK-','')) > 2 then 1
else 0
end
else 0
end
) = 1
Essentially the problem is that you need to check for BOOK before you turn the index into a numberic, as the other values of GKEY would create an error (without doing some clunky string handling).
SELECT * FROM `table` AS `t1` WHERE `t1`.`id` > (SELECT `id` FROM `table` AS `t2` WHERE `t2`.`GKEY`='BOOK-2' LIMIT 1)