How to query MIN value of MAX subquery with two distinct columns? - mysql

I have a table like this:
+---------------+--------------+------+-----+----------+
| Field | Type | Null | Key | Default |
+---------------+--------------+------+-----+----------+
| id | smallint(6) | NO | PRI | NULL |
| Book | tinyint(4) | NO | | NULL |
| Chapter | smallint(6) | NO | | NULL |
| Paragraph | smallint(6) | NO | | NULL |
| Text | text | YES | | NULL |
| RevisionNum | mediumint(9) | NO | PRI | NULL |
+---------------+--------------+------+-----+----------+
mysql> select id,Book,Chapter,Paragraph,RevisionNum FROM MyTable ORDER BY id LIMIT 11;
+-----+------+---------+-----------+-------------+
| id | Book | Chapter | Paragraph | RevisionNum |
+-----+------+---------+-----------+-------------+
| 1 | 1 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 2 |
| 2 | 1 | 2 | 2 | 0 |
| 2 | 1 | 2 | 2 | 1 |
| 2 | 1 | 2 | 2 | 2 |
| 2 | 1 | 2 | 2 | 3 |
| 3 | 1 | 2 | 3 | 0 |
| 4 | 1 | 2 | 4 | 0 |
| 4 | 1 | 2 | 4 | 1 |
| 5 | 1 | 3 | 5 | 0 |
+-----+------+---------+-----------+-------------+
To find a book or chapter which has no unrevised paragraph,
I wish to query either the minimum value of the maximums of
all the distinct id's for that chapter or book, or else in
some fashion determine that no id remains unedited (with a
MAX(RevisionNum) of zero).
Most of my attempts to date have ended in errors like this one:
SELECT DISTINCT Book,RecordNum FROM MyTable
-> WHERE 0 < ALL (SELECT DISTINCT RecordNum,MAX(RevisionNum)
FROM MyTable
WHERE MAX(RevisionNum) > 0);
ERROR 1111 (HY000): Invalid use of group function
...And I wasn't using the "GROUP BY" function at all!
The following query produces results, but simply
gives ALL id's, and does not actually show a unique
set of Book records, as requested. How could this happen?
SELECT DISTINCT Book,id,MAX(RevisionNum) FROM MyTable GROUP BY id LIMIT 5;
+------+----+------------------+
| Book | id | MAX(RevisionNum) |
+------+----+------------------+
| 1 | 1 | 30 |
| 1 | 2 | 16 |
| 1 | 3 | 15 |
| 1 | 4 | 10 |
| 1 | 5 | 9 |
+------+----+------------------+
What would the correct query be to give results more like this:
+------+-----+-----------------------+
| Book | id | MIN(MAX(RevisionNum)) |
+------+-----+-----------------------+
| 1 | 5 | 3 |
| 2 | 17 | 1 |
| 3 | 33 | 2 |
| 4 | 147 | 0 |
| 5 | 225 | 2 |
+------+-----+-----------------------+

Are you looking for two levels of aggregation?
select id, book, min(max_revisionnum)
from (select id, book, chapter, paragraph, max(revisionnum) as max_revisionnum
from mytable
group by id, book, chapter, paragraph
) t
group by id, book;
EDIT:
Based on your comment, you can use:
select *
from (select id, book, chapter, paragraph, max(revisionnum) as max_revisionnum,
row_number() over (partition by book order by max(revisionnum) desc) as seqnum
from mytable
group by id, book, chapter, paragraph
) t
where seqnum = 1;
Here is a db<>fiddle.
In older versions of MariaDB, you can use a correlated subquery:
select t.*
from mytable t
where (id, book, chapter, paragraph, revisionnum) = (select t2.id, t2.book, t2.chapter, t2.paragraph, t2.revisionnum
from mytable t2
where t2.book = t.book
order by t2.revisionnum desc
limit 1
);
For this query, try adding an index on (book, revisionnum desc).

Related

mysql pivot using column and row numbers

I am stuck in this situation where I need to use Row Number and Column Number values from table's columns to derive the output mentioned below. I have tried everything - if/else, case when/then but not helping.
Any help/suggestions are really appreciated!
Here is a mocked up sample data present in db table -
+--------+--------+--------+----------+-------------+
| Record | ColNbr | RowNbr | ColTitle | CellContent |
+--------+--------+--------+----------+-------------+
| 1 | 1 | 1 | Unit | sqf |
| 1 | 1 | 2 | Unit | cm |
| 1 | 2 | 1 | Desc | roof |
| 1 | 2 | 2 | Desc | rod |
| 1 | 3 | 1 | Material | concrete |
| 1 | 3 | 2 | Material | steel |
| 1 | 4 | 1 | Quantity | 100 |
| 1 | 4 | 2 | Quantity | 12 |
| 1 | 1 | 1 | Unit | liter |
| 1 | 1 | 2 | Unit | ml |
| 1 | 2 | 1 | Desc | bowl |
| 1 | 2 | 2 | Desc | plate |
| 1 | 3 | 1 | Material | plastic |
| 1 | 3 | 2 | Material | glass |
| 1 | 4 | 1 | Quantity | 2 |
| 1 | 4 | 2 | Quantity | 250 |
+--------+--------+--------+----------+-------------+
Expected Output -
+--------+--------+--------+----------+-------------+
| Record | Unit | Desc | Material | Quantity |
+--------+--------+--------+----------+-------------+
| 1 | sqf | roof | concrete | 100 |
| 1 | cm | rod | steel | 12 |
| 2 | liter | bowl | plastic | 2 |
| 2 | ml | plate | glass | 250 |
+--------+--------+--------+----------+-------------+
If your actual data is like that, I suggest that you consider to separate the data to; for example, 4 different tables (unit,description,material & a table to store all that ids+quantity). The former 3 tables will store the prerequisite info that get minor updates throughout time and the last table will store all the quantity records. Let's say your tables will look something like this:
CREATE TABLE `Unit` (
unit_id INT,
unit_name VARCHAR(50));
+---------+-----------+
| unit_id | unit_name |
+---------+-----------+
| 1 | sqf |
| 2 | cm |
| 3 | liter |
| 4 | ml |
+---------+-----------+
CREATE TABLE `Description` (
desc_id INT,
desc_name VARCHAR(50));
+---------+-----------+
| desc_id | desc_name |
+---------+-----------+
| 1 | roof |
| 2 | rod |
| 3 | bowl |
| 4 | plate |
+---------+-----------+
CREATE TABLE `Material` (
mat_id INT,
mat_name VARCHAR(50));
+--------+----------+
| mat_id | mat_name |
+--------+----------+
| 1 | concrete |
| 2 | steel |
| 3 | plastic |
| 4 | glass |
+--------+----------+
CREATE TABLE `Records` (
unit_id INT,
desc_id INT,
mat_id INT,
quantity DECIMAL(14,4));
+---------+---------+--------+----------+
| unit_id | desc_id | mat_id | Quantity |
+---------+---------+--------+----------+
| 1 | 1 | 1 | 100 |
| 2 | 2 | 2 | 12 |
| 3 | 3 | 3 | 2 |
| 4 | 4 | 4 | 250 |
+---------+---------+--------+----------+
Then you insert the data accordingly.
Anyhow, for your existing data example, it could be done but there are some concern over whether the unit+desc+material+quantity matching are correct. The only way I can maybe at least think that it's correctly matched is by giving all of the query a similar ORDER BY clause. Hence, the following:
SELECT A.record,A.unit,B.Desc,C.Material,D.Quantity FROM
(SELECT #rn:=#rn+1 AS record,CASE WHEN coltitle='unit' THEN cellcontent END AS Unit
FROM yourtable, (SELECT #rn :=0 ) v
HAVING unit IS NOT NULL
ORDER BY colnbr) A LEFT JOIN
(SELECT #rn1:=#rn1+1 AS record,CASE WHEN coltitle='Desc' THEN cellcontent END AS `Desc`
FROM yourtable, (SELECT #rn1 :=0 ) v
HAVING `Desc` IS NOT NULL
ORDER BY colnbr) B ON a.record=b.record LEFT JOIN
(SELECT #rn2:=#rn2+1 AS record,CASE WHEN coltitle='material' THEN cellcontent END AS Material
FROM yourtable, (SELECT #rn2:=0 ) v
HAVING Material IS NOT NULL
ORDER BY colnbr) C ON a.record=c.record LEFT JOIN
(SELECT #rn3:=#rn3+1 AS record,CASE WHEN coltitle='Quantity' THEN cellcontent END AS Quantity
FROM yourtable, (SELECT #rn3:=0 ) v
HAVING Quantity IS NOT NULL
ORDER BY colnbr) D ON a.record=d.record;
The idea here is to make a sub-query based on COLTITLE then assign a numbering/ranking (#rn,#rn1,#rn2,#rn3) variable to each of the sub-query and join them up using LEFT JOIN. Now, this experiment works to exactly return the output that you need but its not a definite answer because there are some part that is questionable especially on matching the combination correctly. Hopefully, this will give you some idea.

How can I determine which user is the top one in the specific tag?

I have a question and answer website like stackoverflow. Here is the structure of some tables:
-- {superfluous} means some other columns which are not related to this question
// q&a
+----+-----------------+--------------------------+------+-----------+-----------+
| id | title | body | type | related | author_id |
+----+-----------------+--------------------------+------+-----------+-----------+
| 1 | How can I ... | I'm trying to make ... | q | NULL | 3 |
| 2 | | You can do that by ... | a | 1 | 1 |
| 3 | Why should I .. | I'm wonder, why ... | q | NULL | 1 |
| 4 | | First of all you ... | a | 1 | 2 |
| 5 | | Because that thing ... | a | 3 | 2 |
+----+-----------------+--------------------------+------+-----------+-----------+
// users
+----+--------+-----------------+
| id | name | {superfluous} |
+----+--------+-----------------+
| 1 | Jack | |
| 2 | Peter | |
| 3 | John | |
+----+--------+-----------------+
// votes
+----+----------+-----------+-------+-----------------+
| id | user_id | post_id | value | {superfluous} |
+----+----------+-----------+-------+-----------------+
| 1 | 3 | 4 | 1 | |
| 2 | 1 | 1 | -1 | |
| 3 | 2 | 1 | 1 | |
| 4 | 3 | 2 | -1 | |
| 5 | 1 | 4 | 1 | |
| 6 | 3 | 5 | -1 | |
+----+--------+-------------+-------+-----------------+
// tags
+----+------------+-----------------+
| id | name | {superfluous} |
+----+------------+-----------------+
| 1 | PHP | |
| 2 | SQL | |
| 3 | MySQL | |
| 4 | HTML | |
| 5 | CSS | |
| 6 | C# | |
+----+------------+-----------------+
// q&aTag
+-------+--------+
| q&aid | tag_id |
+-------+--------+
| 1 | 1 |
| 1 | 4 |
| 3 | 5 |
| 3 | 4 |
| 4 | 6 |
+-------+--------+
Now I need to find top users in a specific tag. For example, I need to find Peter as top user in PHP tag. Because his answer for question1 (which has PHP tag) has earned 2 upvotes. Is doing that possible?
Try this:
select q1.title, u.id, u.name, sum(v.value) total from `q&a` q1
left join `q&atag` qt ON q1.id = qt.`q&aid`
inner join tags t ON qt.tag_id = t.id
left join `q&a` q2 ON q2.related = q1.id
left join users u ON q2.author_id = u.id
left join votes v ON v.post_id = q2.id
where t.name = 'PHP'
group by q1.id, u.id
and here is a simple divided solution:
Let us divide it into sub queries:
get the id of the tag you will search for: select id from tags where name = 'PHP'
get the questions with this tag: select 'q&aid' from 'q&aTag' where tag_id = 1.
get the ids of answers for that question: select id, author_id fromq&awhere related in (2.)
get the final query: select user_id, sum(value) from votes where post_id in (3.) group by user_id
Now combining them all give the result:
select user_id, sum(`value`) total from votes
where post_id in (
select id from `q&a` where related in (
select `q&aid` from `q&aTag` where tag_id IN (
select id from tags where name = 'PHP'
)
)
)
group by user_id
you can add this at the end if you want only one record:
order by total desc limit 1

SQL Advanced SELECT Statement

translations
+---------+----------------+----------+---------+
| id_user | id_translation | referrer | id_word |
+---------+----------------+----------+---------+
| 1 | 3 | NULL | 4 |
| 1 | 17 | NULL | 3 |
| 2 | 17 | NULL | 5 |
| 2 | 17 | NULL | 1 |
| 2 | 17 | NULL | 7 |
words
+----+------+
| id | word |
+----+------+
| 4 | out |
+----+------+
users_translations
+---------+----------------+----------+---------+
| id_user | id_translation | referrer | id_word |
+---------+----------------+----------+---------+
| 1 | 17 | 1 | 4 |
| 2 | 17 | 2 | 4 |
| 3 | 18 | NULL | 4 |
I need to select all translations for current word and id_translation, but if in the row referrer = 1 (current user), then I don't need another results (translations from another users for current word), if there is no referrer = 1, show all.
SELECT DISTINCT `t`.*, `ut`.`id_user` AS tuser
FROM translations AS t
LEFT JOIN users_translations AS ut ON `t`.`id` = `ut`.`id_translation`
INNER JOIN words ON `words`.`id` = `ut`.`id_word` OR `words`.`id` = `t`.`id_word`
WHERE (`word` = 'help')
ORDER BY `t`.`translation` ASC
+----+-------------+---------+---------+-------+
| id | translation | id_word | id_user | tuser |
+----+-------------+---------+---------+-------+
| 17 | допомагати | 4 | 1 | 2 |
| 17 | допомагати | 4 | 1 | 1 |
First row doesn't need, because we have tuser = 1. When there is no tuser = 1, all results should be returned.
I don't understand how to build select statement and I will be very appreciative that somebody shows me how to make it work.
First thing that comes to mind
--add this to your where clause
id_user <=
CASE WHEN EXISTS(SELECT * FROM translations WHERE id_user = 1 AND id_word = words.id_word)
THEN 1
ELSE (SELECT MAX(Id) FROM translations)
END

Top 'n' results for each keyword

I have a query to get the top 'n' users who commented on a specific keyword,
SELECT `user` , COUNT( * ) AS magnitude
FROM `results`
WHERE `keyword` = "economy"
GROUP BY `user`
ORDER BY magnitude DESC
LIMIT 5
I have approx 6000 keywords, and would like to run this query to get me the top 'n' users for each and every keyword we have data for. Assistance appreciated.
Since you haven't given the schema for results, I'll assume it's this or very similar (maybe extra columns):
create table results (
id int primary key,
user int,
foreign key (user) references <some_other_table>(id),
keyword varchar(<30>)
);
Step 1: aggregate by keyword/user as in your example query, but for all keywords:
create view user_keyword as (
select
keyword,
user,
count(*) as magnitude
from results
group by keyword, user
);
Step 2: rank each user within each keyword group (note the use of the subquery to rank the rows):
create view keyword_user_ranked as (
select
keyword,
user,
magnitude,
(select count(*)
from user_keyword
where l.keyword = keyword and magnitude >= l.magnitude
) as rank
from
user_keyword l
);
Step 3: select only the rows where the rank is less than some number:
select *
from keyword_user_ranked
where rank <= 3;
Example:
Base data used:
mysql> select * from results;
+----+------+---------+
| id | user | keyword |
+----+------+---------+
| 1 | 1 | mysql |
| 2 | 1 | mysql |
| 3 | 2 | mysql |
| 4 | 1 | query |
| 5 | 2 | query |
| 6 | 2 | query |
| 7 | 2 | query |
| 8 | 1 | table |
| 9 | 2 | table |
| 10 | 1 | table |
| 11 | 3 | table |
| 12 | 3 | mysql |
| 13 | 3 | query |
| 14 | 2 | mysql |
| 15 | 1 | mysql |
| 16 | 1 | mysql |
| 17 | 3 | query |
| 18 | 4 | mysql |
| 19 | 4 | mysql |
| 20 | 5 | mysql |
+----+------+---------+
Grouped by keyword and user:
mysql> select * from user_keyword order by keyword, magnitude desc;
+---------+------+-----------+
| keyword | user | magnitude |
+---------+------+-----------+
| mysql | 1 | 4 |
| mysql | 2 | 2 |
| mysql | 4 | 2 |
| mysql | 3 | 1 |
| mysql | 5 | 1 |
| query | 2 | 3 |
| query | 3 | 2 |
| query | 1 | 1 |
| table | 1 | 2 |
| table | 2 | 1 |
| table | 3 | 1 |
+---------+------+-----------+
Users ranked within keywords:
mysql> select * from keyword_user_ranked order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql | 1 | 4 | 1 |
| mysql | 2 | 2 | 3 |
| mysql | 4 | 2 | 3 |
| mysql | 3 | 1 | 5 |
| mysql | 5 | 1 | 5 |
| query | 2 | 3 | 1 |
| query | 3 | 2 | 2 |
| query | 1 | 1 | 3 |
| table | 1 | 2 | 1 |
| table | 3 | 1 | 3 |
| table | 2 | 1 | 3 |
+---------+------+-----------+------+
Only top 2 from each keyword:
mysql> select * from keyword_user_ranked where rank <= 2 order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql | 1 | 4 | 1 |
| query | 2 | 3 | 1 |
| query | 3 | 2 | 2 |
| table | 1 | 2 | 1 |
+---------+------+-----------+------+
Note that when there are ties -- see users 2 and 4 for keyword "mysql" in the examples -- all parties in the tie get the "last" rank, i.e. if the 2nd and 3rd are tied, both are assigned rank 3.
Performance: adding an index to the keyword and user columns will help. I have a table being queried in a similar way with 4000 and 1300 distinct values for the two columns (in a 600000-row table). You can add the index like this:
alter table results add index keyword_user (keyword, user);
In my case, query time dropped from about 6 seconds to about 2 seconds.
You can use a pattern like this (from Within-group quotas (Top N per group)):
SELECT tmp.ID, tmp.entrydate
FROM (
SELECT
ID, entrydate,
IF( #prev <> ID, #rownum := 1, #rownum := #rownum+1 ) AS rank,
#prev := ID
FROM test t
JOIN (SELECT #rownum := NULL, #prev := 0) AS r
ORDER BY t.ID
) AS tmp
WHERE tmp.rank <= 2
ORDER BY ID, entrydate;
+------+------------+
| ID | entrydate |
+------+------------+
| 1 | 2007-05-01 |
| 1 | 2007-05-02 |
| 2 | 2007-06-03 |
| 2 | 2007-06-04 |
| 3 | 2007-07-01 |
| 3 | 2007-07-02 |
+------+------------+

MYSQL - how to string comparisons and query?

+--------------------+---------------+------+-----+---------+-------+
| ID | GKEY |GOODS | PRI | COUNTRY | Extra |
+--------------------+---------------+------+-----+---------+-------+
| 1 | BOOK-1 | 1 | 10 | | |
| 2 | PHONE-1 | 2 | 12 | | |
| 3 | BOOK-2 | 1 | 13 | | |
| 4 | BOOK-3 | 1 | 10 | | |
| 5 | PHONE-2 | 2 | 10 | | |
| 6 | PHONE-3 | 2 | 20 | | |
| 7 | BOOK-10 | 2 | 20 | | |
| 8 | BOOK-11 | 2 | 20 | | |
| 9 | BOOK-20 | 2 | 20 | | |
| 10 | BOOK-21 | 2 | 20 | | |
| 11 | PHONE-30 | 2 | 20 | | |
+--------------------+---------------+------+-----+---------+-------+
Above is my table. I want to get all records which GKEY > BOOK-2, Who can tell me the expression with mysql?
Using " WHERE GKEY>'BOOK-2' " Cannot get the correct results.
How about (something like):
(this is MSSQL - I guess it will be similar in MySQL)
select
*
from
(
select
*,
index = convert(int,replace(GKEY,'BOOK-',''))
from table
where
GKEY like 'BOOK%'
) sub
where
sub.index > 2
By way of explanation: The inner query basically recreates your table, but only for BOOK rows, and with an extra column containing the index in the right data type to make a greater than comparison work numerically.
Alternatively something like this:
select
*
from table
where
(
case
when GKEY like 'BOOK%' then
case when convert(int,replace(GKEY,'BOOK-','')) > 2 then 1
else 0
end
else 0
end
) = 1
Essentially the problem is that you need to check for BOOK before you turn the index into a numberic, as the other values of GKEY would create an error (without doing some clunky string handling).
SELECT * FROM `table` AS `t1` WHERE `t1`.`id` > (SELECT `id` FROM `table` AS `t2` WHERE `t2`.`GKEY`='BOOK-2' LIMIT 1)