MySQL GROUP BY "group" and order by largest ID

MySQL GROUP BY "group" and order by largest ID - mysql

I have the following table: Tree. I am trying to select the highest Primary Key ID per scenario_id
id user_id scenario_id
----------------------------------
100 1 10
200 1 10
300 1 5
400 1 5
500 1 5
SELECT * FROM tree
WHERE user_id = 1
GROUP BY scenario_id
ORDER BY id DESC
With my above query I don't get the largest ID. I get 300 and 100 -- But I want to get 200 and 500.
Here is the table dump to test:
CREATE TABLE IF NOT EXISTS `tree` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`scenario_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;
INSERT INTO `tree` (`id`, `user_id`, `scenario_id`) VALUES
(5, 1, 5),
(100, 1, 10),
(200, 1, 10),
(300, 1, 5),
(400, 1, 5),
(500, 1, 5);

Use an aggregate function to get a specific value for a group
SELECT scenario_id, max(id) as max_id
FROM tree
WHERE user_id = 1
GROUP BY scenario_id

If you would like to keep your select * and avoid grouping to get these results from the same record you could also use a self join:
SELECT t1.*
FROM
tree t1 LEFT JOIN tree t2 ON t1.scenario_id = t2.scenario_id AND t2.id > t1.id
WHERE
t2.id IS NULL;
Sometimes this can be useful to pull additional fields that you can't get as efficiently using a group by/aggregate solution.

Related

How to get subsequent total of column in next row using mysql select query

I am creating one SQL query for stock entry. In the last column, I need the total of the previous stock purchase.
S No Product Code Qty Qty Total
1 PO1 5 5
2 PO1 12 17
3 PO1 10 27
4 PO1 8 35
5 PO1 9 44
6 PO1 16 60
In every row of the last column, I am adding all the previous quantity for example S no. 1 quantity is 5. In S No. 2 I am adding quantity 5 and 12=17. S No 3 I am adding 5 + 12+10=27 and Soo on.
I am sorry if it is a duplicate question. I search google and StackOverflow but didn't get the answer. I am new to MySQL
I have added query below. I am new to SQL, any help appreciated,
Thanks in Advance.
CREATE TABLE `stock_table` (
`ID` int(11) NOT NULL,
`product_code` varchar(20) NOT NULL,
`qty` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `stock_table` (`ID`, `product_code`, `qty`) VALUES
(1, 'PO1', 5),
(2, 'PO1', 12),
(3, 'PO1', 10),
(4, 'PO1', 8),
(5, 'PO1', 9),
(6, 'PO1', 16);

You need a running sum. If you are using MySQL v 8.0, You can use SUM window function -
SELECT `ID`,
`product_code`,
`qty`,
SUM(`qty`) OVER (PARTITION BY `product_code` ORDER BY `ID`) Qty_Total
FROM `stock_table`;
fiddle

Your version of MariaDB does not support window functions. That leaves you with two options. One is a correlated subquery; the second is variables.
The first is easier to implement:
select st.*,
(select sum(st2.qty)
from stock_table st2
where st2.product_code = st.product_code and
st2.id <= st.id
) as running_qty
from stock_table st;
For performance, you want an index on stock_table(product_code, id, qty).

You can use correlated subquery to sum up the desired column as the row proceeds by increasing ID value :
select t.*,
( select sum(`qty`) from `stock_table` where `ID` <= t.`ID` ) as "Qty Total"
from `stock_table` t
order by t.`ID`;
Demo
P.S. : as the sample data has unique product_code value, I didn't need to include the part related to product_code column's matching within the subquery, if it's the case, then consider converting your derived column to :
( select sum(`qty`)
from `stock_table`
where `ID` <= t.`ID`
and `product_code` = t.`product_code` ) as "Qty Total"

AVG with LIMIT and GROUP BY

I'm looking to make a SQL query, but I can't do it... and I can't find an example like mine.
I have a simple table People with 3 columns, 7 records :
I'd like to get for each team, the average points of 2 bests people.
My Query:
SELECT team
, (SELECT AVG(point)
FROM People t2
WHERE t1.team = t2.team
ORDER
BY point DESC
LIMIT 2) as avg
FROM People t1
GROUP
BY team
Current result: (average on all people of each team)
Apparently, it's not possible to use a limit into subquery. "ORDER BY point DESC LIMIT 2" is ignored.
Result expected:
I want the average points of 2 bests people (with highest points) for each team, not the average points of all people of each team.
How can I do that? If anyone has any idea..
I'm on MySQL Database
Link of Fiddle : http://sqlfiddle.com/#!9/8c80ef/1
Thanks !

You can try this.
try to make a order number by a subquery, which order by point desc.
then only get top 2 row by each team, if you want to get other top number just modify the number in where clause.
CREATE TABLE `People` (
`id` int(11) NOT NULL,
`name` varchar(20) NOT NULL,
`team` varchar(20) NOT NULL,
`point` int(4) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `People` (`id`, `name`, `team`, `point`) VALUES
(1, 'Luc', 'Jupiter', 10),
(2, 'Marie', 'Saturn', 0),
(3, 'Hubert', 'Saturn', 0),
(4, 'Albert', 'Jupiter', 50),
(5, 'Lucy', 'Jupiter', 50),
(6, 'William', 'Saturn', 20),
(7, 'Zeus', 'Saturn', 40);
ALTER TABLE `People`
ADD PRIMARY KEY (`id`);
ALTER TABLE `People`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=8;
Query 1:
SELECT team,avg(point) totle
FROM People t1
where (
select count(*)
from People t2
where t2.id >= t1.id and t1.team = t2.team
order by t2.point desc
) <=2 ## if you want to get other `top` number just modify this number
group by team
Results:
| team | totle |
|---------|-------|
| Jupiter | 50 |
| Saturn | 30 |

This is a pain in MySQL. If you want the two highest point values, you can do:
SELECT p.team, AVG(p2.point)
FROM people p
WHERE p.point >= (SELECT DISTINCT p2.point
FROM people p2
WHERE p2.team = p.team
ORDER BY p2.point DESC
LIMIT 1, 1 -- get the second one
);
Ties make this tricky, and your question isn't clear on what to do about them.

Fast group rank() function

There are various ways people try to emulate MSSQL RANK() or ROW_NUMBER() functions in MySQL, but all of them I've tried so far are slow.
I have a table that looks like this:
CREATE TABLE ratings
(`id` int, `category` varchar(1), `rating` int)
;
INSERT INTO ratings
(`id`, `category`, `rating`)
VALUES
(3, '*', 54),
(4, '*', 45),
(1, '*', 43),
(2, '*', 24),
(2, 'A', 68),
(3, 'A', 43),
(1, 'A', 12),
(3, 'B', 22),
(4, 'B', 22),
(4, 'C', 44)
;
Except it has 220,000 records. There are about 90,000 unique id's.
I wanted to rank the id's first by looking at the categories which were not * where a higher rating is a lower rank.
SELECT g1.id,
g1.category,
g1.rating,
Count(*) AS rank
FROM ratings AS g1
JOIN ratings AS g2 ON (g2.rating, g2.id) >= (g1.rating, g1.id)
AND g1.category = g2.category
WHERE g1.category != '*'
GROUP BY g1.id,
g1.category,
g1.rating
ORDER BY g1.category,
rank
Output:
id category rating rank
2 A 68 1
3 A 43 2
1 A 12 3
4 B 22 1
3 B 22 2
4 C 44 1
Then I wanted to take the smallest rank an id had, and average that with the rank they have within the * category. Giving a total query of:
SELECT X1.id,
(X1.rank + X2.minrank) / 2 AS OverallRank
FROM
(SELECT g1.id,
g1.category,
g1.rating,
Count(*) AS rank
FROM ratings AS g1
JOIN ratings AS g2 ON (g2.rating, g2.id) >= (g1.rating, g1.id)
AND g1.category = g2.category
WHERE g1.category = '*'
GROUP BY g1.id,
g1.category,
g1.rating
ORDER BY g1.category,
rank) X1
JOIN
(SELECT id,
Min(rank) AS MinRank
FROM
(SELECT g1.id,
g1.category,
g1.rating,
Count(*) AS rank
FROM ratings AS g1
JOIN ratings AS g2 ON (g2.rating, g2.id) >= (g1.rating, g1.id)
AND g1.category = g2.category
WHERE g1.category != '*'
GROUP BY g1.id,
g1.category,
g1.rating
ORDER BY g1.category,
rank) X
GROUP BY id) X2 ON X1.id = X2.id
ORDER BY overallrank
Giving me
id OverallRank
3 1.5000
4 1.5000
2 2.5000
1 3.0000
This query is correct and the output I want, but it just hangs on my real table of 220,000 records. How can I optimize it? My real table has an index on id,rating and category and id,category
Edit:
Result of SHOW CREATE TABLE ratings:
CREATE TABLE `rating` (
`id` int(11) NOT NULL,
`category` varchar(255) NOT NULL,
`rating` int(11) NOT NULL DEFAULT '1500',
`rd` int(11) NOT NULL DEFAULT '350',
`vol` float NOT NULL DEFAULT '0.06',
`wins` int(11) NOT NULL,
`losses` int(11) NOT NULL,
`streak` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`streak`,`rd`,`id`,`category`),
UNIQUE KEY `id_category` (`id`,`category`),
KEY `rating` (`rating`,`rd`),
KEY `streak_idx` (`streak`),
KEY `category_idx` (`category`),
KEY `id_rating_idx` (`id`,`rating`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The PRIMARY KEY is the most common use case of queries to this table, that is why it's the clustered key. It's worth noting that the server is a raid 10 of SSDs with a 9GB/s FIO random read. So I don't suspect the indices not being clustered will affect much.
Output of (select count(distinct category) from ratings) is 50
In the interest that this could be how the data is or an oversight on me, I am included the export of the entire table. It is only 200KB zipped: https://www.dropbox.com/s/p3iv23zi0uzbekv/ratings.zip?dl=0
The first query takes 27 seconds to run

You can use temporary tables with an AUTO_INCREMENT column to generate ranks (row number).
For example - to generate ranks for the '*' category:
drop temporary table if exists tmp_main_cat_rank;
create temporary table tmp_main_cat_rank (
rank int unsigned auto_increment primary key,
id int NOT NULL
) engine=memory
select null as rank, id
from ratings r
where r.category = '*'
order by r.category, r.rating desc, r.id desc;
This runs in something like 30 msec. While your approach with the selfjoin takes 45 seconds on my machine. Even with a new index on (category, rating, id) it still takes 14 seconds to run.
To generate ranks per group (per category) is a bit more complicated. We can still use an AUTO_INCREMENT column, but will need to calculate and substract an offset per category:
drop temporary table if exists tmp_pos;
create temporary table tmp_pos (
pos int unsigned auto_increment primary key,
category varchar(50) not null,
id int NOT NULL
) engine=memory
select null as pos, category, id
from ratings r
where r.category <> '*'
order by r.category, r.rating desc, r.id desc;
drop temporary table if exists tmp_cat_offset;
create temporary table tmp_cat_offset engine=memory
select category, min(pos) - 1 as `offset`
from tmp_pos
group by category;
select t.id, min(t.pos - o.offset) as min_rank
from tmp_pos t
join tmp_cat_offset o using(category)
group by t.id
This runs in about 220 msec. The selfjoin solution takes 42 sec or 13 sec with the new index.
Now you just need to combine the last query with the first temp table, to get your final result:
select t1.id, (t1.min_rank + t2.rank) / 2 as OverallRank
from (
select t.id, min(t.pos - o.offset) as min_rank
from tmp_pos t
join tmp_cat_offset o using(category)
group by t.id
) t1
join tmp_main_cat_rank t2 using(id);
Overall runtime is ~280 msec without an additional index and ~240 msec with an index on (category, rating, id).
A note to the selfjoin approach: It's an elegant solution and performs fine with a small group size. It's fast with an average group size <= 2. It can be acceptable for a group size of 10. But you have an average group size 447 (count(*) / count(distinct category)). That means every row is joined with 447 other rows (on average). You can see the impact by removing the group by clause:
SELECT Count(*)
FROM ratings AS g1
JOIN ratings AS g2 ON (g2.rating, g2.id) >= (g1.rating, g1.id)
AND g1.category = g2.category
WHERE g1.category != '*'
The result is more than 10M rows.
However - with an index on (category, rating, id) your query runs in 33 seconds on my machine.

get the id of the row with the least value, group by an other column

I ran into a problem trying to pull one action per user with the least priority, the priority is based on other columns content and is an integer,
This is the initial query :
SELECT
CASE
...
END AS dummy_priority,
id,
user_id
FROM
actions
Result :
id user_id priority
1 2345 1
2 2345 3
3 2999 5
4 2999 2
5 3000 10
Desired result :
id user_id priority
1 2345 1
4 2999 2
5 3000 10
Following what i want i tried
SELECT x.id, x.user_id, MIN(x.priority)
FROM (
SELECT
CASE
...
END AS priority,
id,
user_id
FROM
actions
) x
GROUP BY x.user_id
Which didn't work
Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY
clause and contains nonaggregated column 'x.id' which is not
functionally dependent on columns in GROUP BY clause;
Most examples of this I found were extracting just the user_id and priority and then doing an inner join with both of them to get the row, but I can't do that since (priority, user_id) isn't unique
A simple verifiable example would be
CREATE TABLE `actions` (
`id` int(11) NOT NULL,
`user_id` int(11) DEFAULT NULL,
`priority` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `actions` (`id`, `user_id`, `priority`) VALUES
(1, 2345, 1),
(2, 2345, 3),
(3, 2999, 5),
(4, 2999, 2),
(5, 3000, 10);
how to extract the desired result (please hold in mind that this table is a subquery)?

The proper way to do this would involve a subquery of some sort . . . and that would require repeating the case definition.
Here is another method, using the substring_index()/group_concat() trick:
SELECT SUBSTRING_INDEX(GROUP_CONCAT(x.id ORDER BY x.priority), ',', 1) as id,
x.user_id, MIN(x.priority)
FROM (SELECT (CASE ...
END) AS priority,
id, user_id
FROM actions a
) x
GROUP BY x.user_id;

And that proper way in full...
SELECT x...
, CASE...x... priority
FROM my_table x
JOIN
( SELECT user_id
, MIN(CASE...) priority
FROM my_table
GROUP
BY user_id
) y
ON y.user_id = x.user_id
AND y.priority = CASE...x...;

This should work ...
SELECT id , user_id, priority FROM actions act
INNER JOIN
(SELECT
user_id, MIN(priority) AS priority
FROM
actions
GROUP BY user_id) pri
ON act.user_id = pri.user_id AND act.priority = pri.prority

Retrieve one id referred by two ids

I'm stuck with the following problem:
SQL query for the table:
CREATE TABLE IF NOT EXISTS `thread_users` (
`thread_id` bigint(20) unsigned NOT NULL,
`user_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`thread_id`,`user_id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Let's say that I have those data:
INSERT INTO `thread_users` (`thread_id`, `user_id`) VALUES
(1, 1),
(1, 2),
(2, 1),
(2, 2),
(2, 3),
(3, 1),
(3, 4);
I need to retrieve the thread_id referred by only 2 ids: X & Y (both known).
With the above data, I want to be able to retrieve the thread_id where only user_id = 1 & user_id = 2 are present.
What i Know for sure about this table:
If a thread is composed by only 2 users, there is no other threads containing only those two ids. (It's check outside mysql before the insertion)
A user can't be present in a thread more than once. (primary key)
What i have thinking of to resolve this problem:
Sum up (user_id 1 + user_id 2) search for SUMs equal to that result + (user_id = X OR user_id = Y). But i haven't been able to write correctly this query AND I also need to check the number of user_id in that thread...
Obviously: searching id where the number of user_id on threads are equal to 2 and where user_id are equals to X & Y.
Thanks for the help guys!

SELECT tu1.thread_id
FROM thread_users AS tu1
INNER JOIN thread_users AS tu2
ON tu1.thread_id = tu2.thread_id
AND tu1.user_id <> tu2.user_id
LEFT OUTER JOIN thread_users AS tu3
ON tu1.thread_id = tu3.thread_id
AND tu1.user_id <> tu3.user_id
AND tu2.user_id <> tu3.user_id
WHERE tu1.user_id = 1
AND tu2.user_id = 2
AND tu3.user_id IS NULL

Something like this
SELECT thread_id FROM thread_users WHERE user_id IN(1,2) GROUP BY thread_id
HAVING COUNT(user_id)=2
SQL fiddle

I think this is a bit simpler than the JOIN example, and also a bit faster:
SELECT `thread_id`, GROUP_CONCAT(`user_id`) AS this_match FROM `thread_users`
GROUP BY `thread_id` HAVING this_match = '1,2'

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

MySQL GROUP BY "group" and order by largest ID - mysql

Use an aggregate function to get a specific value for a group SELECT scenario_id, max(id) as max_id FROM tree WHERE user_id = 1 GROUP BY scenario_id

Related

How to get subsequent total of column in next row using mysql select query

AVG with LIMIT and GROUP BY

Fast group rank() function

get the id of the row with the least value, group by an other column

Retrieve one id referred by two ids

Categories

Resources