get max, min, count and mode (occurrence) - mysql

I have an items table in my database, and I want a query that returns, for each item category, the min price, the max price, the most frequent max price in that category (the mode) and the number of items. NULL prices should be ignored. Here is my items table:
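| id | category | min_price | max_price |
|----|----------|-----------|-----------|
| 1  | kids     | 10        | 100       |
| 2  | adult    | 20        | 200       |
| 3  | both     | null      | null      |
| 4  | adult    | 20        | 100       |
| 5  | adult    | 50        | 100       |
| 6  | adult    | 50        | 200       |
| 7  | kids     | 20        | 100       |
| 8  | both     | 20        | 100       |
| 9  | kids     | null      | null      |
| 10 | adult    | 10        | 500       |
| 11 | misc     | null      | null      |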
I want the query to return this result:
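| category | min_price | max_price | price_mode | no_items |
|----------|-----------|-----------|------------|----------|
| kids     | 10        | 100       | 100        | 3        |
| adult    | 20        | 500       | 200        | 5        |
| both     | 20        | 100       | 100        | 2        |
| misc     | null      | null      | null       | 1        |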
To explain further: for adult, the lowest price is 20 and the highest is 500. The max_price values 100 and 200 both occur twice, and on a tie I want to take the highest as the price_mode, which is 200 in this case. no_items is just the count of how many times adult appears in the table.
I am honestly struggling to get the mode and to group it correctly to get the output I want.
Below are the commands to create the table and populate it with data. I tried to put it in SqlFiddle, but that's not working for me and I don't know why.
CREATE TABLE IF NOT EXISTS `items` (
`id` int(6) unsigned NOT NULL,
`category` TEXT NOT NULL,
`min_price` FLOAT DEFAULT NULL,
`max_price` FLOAT DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `items` (`id`, `category`, `min_price`, `max_price`) VALUES
('kids', 10, 100),
('adult', 20, 200),
('both', null, null),
('adult', 20, 100),
('adult', 50, 100),
('adult', 50, 200),
('kids', 20, 100),
('both', 20, 100),
('kids', null, null),
('adult', 10, 500),
('misc', null, null);

Your CREATE TABLE + INSERT syntax doesn't work in the fiddle because each row in VALUES supplies only 3 values, whereas the INSERT lists 4 columns:
INSERT INTO `items` (`id`, `category`, `min_price`, `max_price`) VALUES
('kids' , 10 , 100),
/*where's the value for `id`?*/
...
If you only remove id from the INSERT column list, it won't work either, because you've set it as the PRIMARY KEY, so it can't be empty. What you can do, in addition to removing id from the INSERT, is to define AUTO_INCREMENT on the id column:
CREATE TABLE IF NOT EXISTS `items` (
`id` int(6) unsigned NOT NULL AUTO_INCREMENT,
....
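Putting both changes together, the setup script might look like the sketch below. It assumes a fresh, empty table, so that AUTO_INCREMENT assigns ids 1 to 11 in the order the rows are listed; everything else is taken from the question's script.

```sql
-- id is now generated by MySQL, so it is left out of the INSERT column list.
CREATE TABLE IF NOT EXISTS `items` (
  `id` int(6) unsigned NOT NULL AUTO_INCREMENT,
  `category` TEXT NOT NULL,
  `min_price` FLOAT DEFAULT NULL,
  `max_price` FLOAT DEFAULT NULL,
  PRIMARY KEY (`id`)
);

-- Three columns listed, three values per row.
INSERT INTO `items` (`category`, `min_price`, `max_price`) VALUES
  ('kids', 10, 100),
  ('adult', 20, 200),
  ('both', null, null),
  ('adult', 20, 100),
  ('adult', 50, 100),
  ('adult', 50, 200),
  ('kids', 20, 100),
  ('both', 20, 100),
  ('kids', null, null),
  ('adult', 10, 500),
  ('misc', null, null);
```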
Now, to get the expected price_mode, you may want to try GROUP_CONCAT() with ORDER BY and then decide which of the values in the concatenated list you want to return. Let's say you use GROUP_CONCAT(max_price ORDER BY max_price DESC) to return the values of max_price in descending order, like this:
SELECT category,
MIN(min_price),
MAX(max_price),
GROUP_CONCAT(max_price ORDER BY max_price DESC),
COUNT(*)
FROM items
GROUP BY category;
Then you'll get a result like this:
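| category | MIN(min_price) | MAX(max_price) | GROUP_CONCAT(max_price ORDER BY max_price DESC) | COUNT(*) |
|----------|----------------|----------------|--------------------------------------------------|----------|
| adult    | 10             | 500            | 500,200,200,100,100                              | 5        |
| both     | 20             | 100            | 100                                              | 2        |
| kids     | 10             | 100            | 100,100                                          | 3        |
| misc     | NULL           | NULL           | NULL                                             | 1        |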
So there's a consistent pattern in the GROUP_CONCAT() result that you can probably work with. Assuming that you want the second largest value in the set, you can apply SUBSTRING_INDEX() twice to get it, like this:
SELECT category,
MIN(min_price) AS min_price,
MAX(max_price) AS max_price,
SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(max_price ORDER BY max_price DESC),',',2),',',-1)
AS price_mode,
COUNT(*) AS no_items
FROM items
GROUP BY category;
This returns the following result:
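| category | min_price | max_price | price_mode | no_items |
|----------|-----------|-----------|------------|----------|
| adult    | 10        | 500       | 200        | 5        |
| both     | 20        | 100       | 100        | 2        |
| kids     | 10        | 100       | 100        | 3        |
| misc     | NULL      | NULL      | NULL       | 1        |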
Demo fiddle
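To see what the two nested SUBSTRING_INDEX() calls do, here they are applied step by step to the adult list from the earlier result. The last statement uses a made-up list (not from the sample data) to show that this picks the second value by position, so it is not a true mode in general.

```sql
-- Step 1: keep everything before the 2nd comma.
SELECT SUBSTRING_INDEX('500,200,200,100,100', ',', 2);
-- '500,200'

-- Step 2: from that, keep everything after the last comma.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('500,200,200,100,100', ',', 2), ',', -1);
-- '200'

-- A list with a single value has no comma, so it is returned unchanged.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('100', ',', 2), ',', -1);
-- '100'

-- Hypothetical list: the second value is 400, although 100 is the most frequent.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('500,400,100,100', ',', 2), ',', -1);
-- '400'
```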
The following is an updated suggestion after getting further clarification:
SELECT i.category,
MIN(i.min_price),
MAX(i.max_price),
v2.mp AS price_mode,
COUNT(DISTINCT i.id)
FROM items i
LEFT JOIN
(SELECT cat,
mp,
cnt,
CASE WHEN cat = @cat
THEN @rownum := @rownum + 1
ELSE @rownum:=1 END AS rownum,
@cat := cat
FROM
(SELECT category cat,
max_price mp,
COUNT(*) cnt
FROM items
GROUP BY category,
max_price) v1
CROSS JOIN (SELECT @rownum := 1,
@cat := NULL) seq
WHERE mp IS NOT NULL
ORDER BY cat, cnt DESC, mp DESC) v2
ON i.category=v2.cat
AND v2.rownum=1
GROUP BY i.category, v2.mp;
The query starts by getting the COUNT(*) for each combination of category and max_price (v1). It then generates a custom row numbering over that result, with a WHERE condition that filters out the rows where max_price is NULL. The crucial part here is probably the ORDER BY cat, cnt DESC, mp DESC, since the row numbers are assigned in that order; otherwise the row numbering gets messed up. Finally, the items table is LEFT JOINed to it with the condition ON i.category=v2.cat AND v2.rownum=1. It's important to place v2.rownum=1 in the ON condition instead of in WHERE, so that the last row, misc, is still returned, since the subquery has no row for it with the present sample data.
Here's an updated fiddle for reference, including the sample of 3 adult=NULL.
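If you are on MySQL 8.0 or later, the same idea can probably be written without user variables by using CTEs and ROW_NUMBER(). This is only a sketch of that alternative, not the query from the fiddle; the names counts, ranked and rn are made up for illustration.

```sql
WITH counts AS (
  -- How often each max_price occurs per category, ignoring NULL prices.
  SELECT category, max_price, COUNT(*) AS cnt
  FROM items
  WHERE max_price IS NOT NULL
  GROUP BY category, max_price
),
ranked AS (
  -- rn = 1 marks the most frequent max_price; ties go to the higher price.
  SELECT category, max_price,
         ROW_NUMBER() OVER (PARTITION BY category
                            ORDER BY cnt DESC, max_price DESC) AS rn
  FROM counts
)
SELECT i.category,
       MIN(i.min_price) AS min_price,
       MAX(i.max_price) AS max_price,
       r.max_price      AS price_mode,
       COUNT(*)         AS no_items
FROM items AS i
-- rn = 1 stays in the ON clause so that all-NULL categories such as misc survive.
LEFT JOIN ranked AS r
       ON r.category = i.category
      AND r.rn = 1
GROUP BY i.category, r.max_price;
```

Because at most one ranked row matches each category, the join does not duplicate item rows, so COUNT(*) still counts every item in the category.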

Maybe this query will help
with maximumvaluecounts
as ( select
count(max_price) as c, category, max_price
from yourtable
group by category, max_price
),
maximumcountpercategory
as ( select category,max(c) as c
from maximumvaluecounts
group by category
),
modes as ( select category, max_price as modevalue
from maximumcountpercategory m1
join maximumvaluecounts m2
on m1.category=m2.category
and m1.c=m2.c
)
, others as (
select
category,
min(min_price) as min_price,
max(max_price) as max_price,
count(max_price) as no_items
from yourtable
group by category
)
select o.*, m.modevalue as price_mode
from others o join
modes m on o.category=m.category

Related

MySQL Query to select from table specific rows

I have a table which has the following values:
| product_id | custom_id | custom_value |
|------------|-----------|--------------|
| 1          | 10        | A            |
| 1          | 9         | V            |
| 2          | 10        | B            |
| 3          | 3         | Q            |
I am looking for a MySQL query that returns each product_id exactly once, selecting the row that has custom_id = 10 when one is available. If custom_id = 10 is not available for a product_id, I would still like that product_id to be returned, also only once.
So the result I am looking for is
| product_id | custom_id | custom_value |
|------------|-----------|--------------|
| 1          | 10        | A            |
| 2          | 10        | B            |
| 3          | NULL      | NULL         |
Could someone please point me in the right direction?
select product_id, custom_id, custom_value from table where custom_id = 10
does, of course, only return the rows for product_id 1 and 2.
You can select the first set of rows, then UNION it with the distinct product ids that have no row with custom_id = 10:
select product_id, custom_id, custom_value from table where custom_id = 10
union
select distinct product_id, NULL as custom_id, NULL as custom_value from table
where product_id not in (select product_id from table where custom_id = 10)
You can first generate a ROW_NUMBER to get the first row for each product_id, ordering rows with custom_id = 10 first, and then use the IF function to turn custom_id and custom_value into NULL when the selected row's custom_id is not 10.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY product_id ORDER BY custom_id = 10 DESC) AS rn
FROM tab
)
SELECT product_id,
IF(custom_id=10, custom_id, NULL) AS custom_id,
IF(custom_id=10, custom_value, NULL) AS custom_value
FROM cte
WHERE rn = 1
Check the demo here.
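As a further alternative that does not need window functions, a LEFT JOIN from the distinct product ids may be enough. This sketch reuses the table name tab from the query above and assumes that each product_id has at most one row with custom_id = 10; with several such rows, the product would appear once per matching row.

```sql
SELECT p.product_id,
       t.custom_id,
       t.custom_value
FROM (SELECT DISTINCT product_id FROM tab) AS p   -- every product exactly once
LEFT JOIN tab AS t
       ON t.product_id = p.product_id
      AND t.custom_id = 10;                       -- NULLs when no such row exists
```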

Fast group rank() function

There are various ways people try to emulate MSSQL RANK() or ROW_NUMBER() functions in MySQL, but all of them I've tried so far are slow.
I have a table that looks like this:
CREATE TABLE ratings
(`id` int, `category` varchar(1), `rating` int)
;
INSERT INTO ratings
(`id`, `category`, `rating`)
VALUES
(3, '*', 54),
(4, '*', 45),
(1, '*', 43),
(2, '*', 24),
(2, 'A', 68),
(3, 'A', 43),
(1, 'A', 12),
(3, 'B', 22),
(4, 'B', 22),
(4, 'C', 44)
;
Except it has 220,000 records. There are about 90,000 unique id's.
I wanted to rank the id's first by looking at the categories which were not * where a higher rating is a lower rank.
SELECT g1.id,
g1.category,
g1.rating,
Count(*) AS rank
FROM ratings AS g1
JOIN ratings AS g2 ON (g2.rating, g2.id) >= (g1.rating, g1.id)
AND g1.category = g2.category
WHERE g1.category != '*'
GROUP BY g1.id,
g1.category,
g1.rating
ORDER BY g1.category,
rank
Output:
id category rating rank
2 A 68 1
3 A 43 2
1 A 12 3
4 B 22 1
3 B 22 2
4 C 44 1
Then I wanted to take the smallest rank an id had, and average that with the rank they have within the * category. Giving a total query of:
SELECT X1.id,
(X1.rank + X2.minrank) / 2 AS OverallRank
FROM
(SELECT g1.id,
g1.category,
g1.rating,
Count(*) AS rank
FROM ratings AS g1
JOIN ratings AS g2 ON (g2.rating, g2.id) >= (g1.rating, g1.id)
AND g1.category = g2.category
WHERE g1.category = '*'
GROUP BY g1.id,
g1.category,
g1.rating
ORDER BY g1.category,
rank) X1
JOIN
(SELECT id,
Min(rank) AS MinRank
FROM
(SELECT g1.id,
g1.category,
g1.rating,
Count(*) AS rank
FROM ratings AS g1
JOIN ratings AS g2 ON (g2.rating, g2.id) >= (g1.rating, g1.id)
AND g1.category = g2.category
WHERE g1.category != '*'
GROUP BY g1.id,
g1.category,
g1.rating
ORDER BY g1.category,
rank) X
GROUP BY id) X2 ON X1.id = X2.id
ORDER BY overallrank
Giving me
id OverallRank
3 1.5000
4 1.5000
2 2.5000
1 3.0000
This query is correct and the output I want, but it just hangs on my real table of 220,000 records. How can I optimize it? My real table has an index on id,rating and category and id,category
Edit:
Result of SHOW CREATE TABLE ratings:
CREATE TABLE `rating` (
`id` int(11) NOT NULL,
`category` varchar(255) NOT NULL,
`rating` int(11) NOT NULL DEFAULT '1500',
`rd` int(11) NOT NULL DEFAULT '350',
`vol` float NOT NULL DEFAULT '0.06',
`wins` int(11) NOT NULL,
`losses` int(11) NOT NULL,
`streak` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`streak`,`rd`,`id`,`category`),
UNIQUE KEY `id_category` (`id`,`category`),
KEY `rating` (`rating`,`rd`),
KEY `streak_idx` (`streak`),
KEY `category_idx` (`category`),
KEY `id_rating_idx` (`id`,`rating`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The PRIMARY KEY is the most common use case of queries to this table, that is why it's the clustered key. It's worth noting that the server is a raid 10 of SSDs with a 9GB/s FIO random read. So I don't suspect the indices not being clustered will affect much.
Output of (select count(distinct category) from ratings) is 50
In case this is down to how the data is, or to an oversight on my part, I have included an export of the entire table. It is only 200KB zipped: https://www.dropbox.com/s/p3iv23zi0uzbekv/ratings.zip?dl=0
The first query takes 27 seconds to run
You can use temporary tables with an AUTO_INCREMENT column to generate ranks (row number).
For example - to generate ranks for the '*' category:
drop temporary table if exists tmp_main_cat_rank;
create temporary table tmp_main_cat_rank (
rank int unsigned auto_increment primary key,
id int NOT NULL
) engine=memory
select null as rank, id
from ratings r
where r.category = '*'
order by r.category, r.rating desc, r.id desc;
This runs in something like 30 msec, while your approach with the self-join takes 45 seconds on my machine. Even with a new index on (category, rating, id), it still takes 14 seconds to run.
Generating ranks per group (per category) is a bit more complicated. We can still use an AUTO_INCREMENT column, but we need to calculate and subtract an offset per category:
drop temporary table if exists tmp_pos;
create temporary table tmp_pos (
pos int unsigned auto_increment primary key,
category varchar(50) not null,
id int NOT NULL
) engine=memory
select null as pos, category, id
from ratings r
where r.category <> '*'
order by r.category, r.rating desc, r.id desc;
drop temporary table if exists tmp_cat_offset;
create temporary table tmp_cat_offset engine=memory
select category, min(pos) - 1 as `offset`
from tmp_pos
group by category;
select t.id, min(t.pos - o.offset) as min_rank
from tmp_pos t
join tmp_cat_offset o using(category)
group by t.id
This runs in about 220 msec. The selfjoin solution takes 42 sec or 13 sec with the new index.
Now you just need to combine the last query with the first temp table, to get your final result:
select t1.id, (t1.min_rank + t2.rank) / 2 as OverallRank
from (
select t.id, min(t.pos - o.offset) as min_rank
from tmp_pos t
join tmp_cat_offset o using(category)
group by t.id
) t1
join tmp_main_cat_rank t2 using(id);
Overall runtime is ~280 msec without an additional index and ~240 msec with an index on (category, rating, id).
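To make the offset step more concrete, the query below lists each position next to its category offset. The commented rows were worked out by hand for the ten sample rows from the question, assuming pos was assigned in the ORDER BY order of the insert, which is what the approach above relies on. The column name cat_rank is made up for this illustration.

```sql
SELECT t.pos,
       t.category,
       t.id,
       o.`offset`,
       t.pos - o.`offset` AS cat_rank   -- rank within the category
FROM tmp_pos t
JOIN tmp_cat_offset o USING (category)
ORDER BY t.pos;
-- pos | category | id | offset | cat_rank
--   1 | A        |  2 |      0 |        1
--   2 | A        |  3 |      0 |        2
--   3 | A        |  1 |      0 |        3
--   4 | B        |  4 |      3 |        1
--   5 | B        |  3 |      3 |        2
--   6 | C        |  4 |      5 |        1
```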
A note on the self-join approach: it's an elegant solution and performs fine with a small group size. It's fast with an average group size <= 2, and it can be acceptable for a group size of 10. But you have an average group size of 447 (count(*) / count(distinct category)). That means every row is joined with 447 other rows on average. You can see the impact by removing the GROUP BY clause:
SELECT Count(*)
FROM ratings AS g1
JOIN ratings AS g2 ON (g2.rating, g2.id) >= (g1.rating, g1.id)
AND g1.category = g2.category
WHERE g1.category != '*'
The result is more than 10M rows.
However - with an index on (category, rating, id) your query runs in 33 seconds on my machine.
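If upgrading is an option, MySQL 8.0 and later have native window functions, which make both the self-join and the temporary tables unnecessary. The sketch below is an assumption about how the query could be rewritten, not something that was timed on the real table. Note that RANK is a reserved word from MySQL 8.0.2 on, so the aliases main_rank, cat_rank and min_rank (all made up here) avoid the name rank.

```sql
SELECT m.id,
       (m.main_rank + o.min_rank) / 2 AS OverallRank
FROM (
    -- Rank inside the '*' category: highest rating first, ties by higher id.
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY rating DESC, id DESC) AS main_rank
    FROM ratings
    WHERE category = '*'
) AS m
JOIN (
    -- Best (smallest) rank each id reaches in any other category.
    SELECT id, MIN(cat_rank) AS min_rank
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY category
                                  ORDER BY rating DESC, id DESC) AS cat_rank
        FROM ratings
        WHERE category <> '*'
    ) AS r
    GROUP BY id
) AS o ON o.id = m.id
ORDER BY OverallRank;
```

The ORDER BY rating DESC, id DESC mirrors the (g2.rating, g2.id) >= (g1.rating, g1.id) comparison of the self-join, so on the ten sample rows this yields the same OverallRank values as listed in the question.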

get the id of the row with the least value, group by an other column

I ran into a problem trying to pull one action per user, namely the one with the lowest priority value. The priority is an integer derived from the content of other columns.
This is the initial query :
SELECT
CASE
...
END AS dummy_priority,
id,
user_id
FROM
actions
Result :
id user_id priority
1 2345 1
2 2345 3
3 2999 5
4 2999 2
5 3000 10
Desired result :
id user_id priority
1 2345 1
4 2999 2
5 3000 10
Following what I want, I tried:
SELECT x.id, x.user_id, MIN(x.priority)
FROM (
SELECT
CASE
...
END AS priority,
id,
user_id
FROM
actions
) x
GROUP BY x.user_id
Which didn't work
Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY
clause and contains nonaggregated column 'x.id' which is not
functionally dependent on columns in GROUP BY clause;
Most examples of this I found were extracting just the user_id and priority and then doing an inner join with both of them to get the row, but I can't do that since (priority, user_id) isn't unique
A simple verifiable example would be
CREATE TABLE `actions` (
`id` int(11) NOT NULL,
`user_id` int(11) DEFAULT NULL,
`priority` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `actions` (`id`, `user_id`, `priority`) VALUES
(1, 2345, 1),
(2, 2345, 3),
(3, 2999, 5),
(4, 2999, 2),
(5, 3000, 10);
How can I extract the desired result? (Please keep in mind that this table is a subquery.)
The proper way to do this would involve a subquery of some sort . . . and that would require repeating the case definition.
Here is another method, using the substring_index()/group_concat() trick:
SELECT SUBSTRING_INDEX(GROUP_CONCAT(x.id ORDER BY x.priority), ',', 1) as id,
x.user_id, MIN(x.priority)
FROM (SELECT (CASE ...
END) AS priority,
id, user_id
FROM actions a
) x
GROUP BY x.user_id;
And that proper way in full...
SELECT x...
, CASE...x... priority
FROM my_table x
JOIN
( SELECT user_id
, MIN(CASE...) priority
FROM my_table
GROUP
BY user_id
) y
ON y.user_id = x.user_id
AND y.priority = CASE...x...;
This should work ...
SELECT act.id, act.user_id, act.priority
FROM actions act
INNER JOIN
(SELECT
user_id, MIN(priority) AS priority
FROM
actions
GROUP BY user_id) pri
ON act.user_id = pri.user_id AND act.priority = pri.priority
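On MySQL 8.0 or later, ROW_NUMBER() offers another way that avoids both repeating the CASE expression and relying on (user_id, priority) being unique. The sketch below runs against the simplified actions table from the question; in the real query, the inner SELECT would be the subquery that computes priority. Breaking ties by the lowest id is an assumption, and ranked and rn are made-up names.

```sql
SELECT id, user_id, priority
FROM (
    SELECT a.id,
           a.user_id,
           a.priority,
           -- 1 = lowest priority value for this user; the lower id wins a tie
           ROW_NUMBER() OVER (PARTITION BY a.user_id
                              ORDER BY a.priority, a.id) AS rn
    FROM actions AS a
) AS ranked
WHERE rn = 1;
-- With the sample rows: (1, 2345, 1), (4, 2999, 2), (5, 3000, 10)
```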

Selecting a subset of rows with MySql: conditionally limiting number of entries selected

this is a followup question to my previous query. I hope that posting a new question is appropriate in the circumstances: Selecting a subset of rows from a PHP table
I have an sql table that looks like this (for example):
id seller price amount
1 tom 350 500
2 tom 350 750
3 tom 350 750
4 tom 370 850
5 jerry 500 1000
I want to select one row per seller: in particular, for each seller I want the row with the cheapest price, and the largest amount at that price. In the example above, I want rows 2 and 5 (or 3 and 5, I don't care which of 2 and 3 I get as long as I only get one of them).
I am using this:
dbquery("SELECT a.* FROM $marketdb a
INNER JOIN
(
SELECT seller, MAX(amount) amount
FROM $marketdb
WHERE price=$minprice
GROUP BY seller
) b ON a.seller = b.seller AND
a.amount = b.amount;");
But this is giving me rows 2,3 and 5, and I only want one of rows 2 and 3.
I also have a nagging suspicion that this might not always return the minimum price rows either. My tests so far have been confused by the fact that I am getting more than one row with the same amount entered for a given seller.
If someone could point out my error I would be most appreciative.
Thanks!
EDIT: my apologies, I did not ask what I meant to ask. I would like the rows at the global min price, at most 1 per seller, not the min price for each seller. This would be only row 2 or 3 above. Sorry!
Since you want a single row per seller, just try adding another GROUP BY on seller to the final query, like this:
SELECT a.* FROM $marketdb a
INNER JOIN
(
SELECT seller, MAX(amount) amount
FROM $marketdb
WHERE price=$minprice
GROUP BY seller
)
b ON a.seller = b.seller AND
a.amount = b.amount group by a.seller;
Test this SQL fiddle:
http://sqlfiddle.com/#!2/7de03/2/0
CREATE TABLE `sellers` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`seller` VARCHAR(16) NOT NULL,
`price` FLOAT NOT NULL,
`amount` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `sellers` VALUES (1, 'tom', 350, 500);
INSERT INTO `sellers` VALUES (2, 'tom', 350, 750);
INSERT INTO `sellers` VALUES (3, 'tom', 350, 750);
INSERT INTO `sellers` VALUES (4, 'tom', 350, 850);
INSERT INTO `sellers` VALUES (5, 'jerry', 500, 600);
INSERT INTO `sellers` VALUES (6, 'jerry', 500, 1000);
INSERT INTO `sellers` VALUES (7, 'jerry', 500, 800);
SELECT * FROM
(SELECT DISTINCT * FROM sellers ORDER BY price ASC, amount DESC) t0
GROUP BY seller;
Kind of... works :)
There's an ugly hack at the end of this answer, but if you don't care which row is returned then I guess it saves some typing. Although, if you really don't care which row is returned, that tends to point to a more fundamental flaw in your schema design!
SELECT x.*
FROM market x
JOIN
( SELECT seller,MIN(price) min_price FROM market GROUP BY seller) y
ON y.seller = x.seller
AND y.min_price = x.price
JOIN
( SELECT seller,price,MAX(amount) max_amount FROM market GROUP BY seller,price) z
ON z.seller = y.seller
AND y.min_price = z.price
AND z.max_amount = x.amount
GROUP
BY seller;
Another method, which i dislike but which is popular with others here, goes something like this...
SELECT x.*
FROM
( SELECT *
FROM market
ORDER
BY seller
, price
, amount DESC
, id
) x
GROUP
BY seller;
You may need to GROUP BY the seller column outside of the join. Also, your WHERE clause compares price to one fixed number with = instead of using <=.
Query:
SQLFIDDLEExample
SELECT s.*
FROM sellers s
WHERE s.id = (SELECT s2.id
FROM sellers s2
WHERE s2.seller = s.seller
ORDER BY s2.price ASC, s2.amount DESC
LIMIT 1)
Result:
| ID | SELLER | PRICE | AMOUNT |
--------------------------------
| 2 | tom | 350 | 750 |
| 5 | jerry | 500 | 1000 |
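For the clarified requirement in the question's EDIT (rows at the global minimum price, at most one per seller), the correlated subquery above could be narrowed as sketched below. It assumes the sellers table and column names used in this answer. Breaking ties by the lowest id is an arbitrary choice, since the question says either row is acceptable.

```sql
SELECT s.*
FROM sellers AS s
WHERE s.id = (
    SELECT s2.id
    FROM sellers AS s2
    WHERE s2.seller = s.seller
      AND s2.price = (SELECT MIN(price) FROM sellers)  -- global minimum price
    ORDER BY s2.amount DESC, s2.id                      -- largest amount, then lowest id
    LIMIT 1
);
```

With the five rows from the question this returns only row 2: tom's rows at price 350 are ids 1, 2 and 3, the largest amount among them is 750, and jerry has no row at the global minimum, so the subquery yields NULL for him and he is left out.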

MySQL sum() on different group bys

OK, I have a query over two tables. I need to get two sums. I do a GROUP BY so that sum() works correctly.
SELECT sum(a.x), sum(b.y) FROM a,b WHERE a.n=b.m GROUP BY a.n
So far this works well, but the problem is that I need to group differently for the second sum (sum(b.y)) than for the first sum (sum(a.x)).
The real query is somewhat more complex, but this is my main problem.
This is what I am actually trying to select: sum(stock.amount) - if( sold.amount IS NULL , 0, sum( sold.amount ) )
How can I solve that in one query?
Since you haven't written down the tables, I am going to make a wild guess and assume they look like this:
stock : id, item_id, amount
sold : id, item_id, amount
I also assume that you need the stock_in_total, sold_total and left_total values:
SELECT
stock_sums.item_id,
stock_sums.st_sum as stock_in_total,
COALESCE(sold_sums.so_sum,0) as sold_total,
(stock_sums.st_sum - COALESCE(sold_sums.so_sum,0)) as left_total
FROM (
SELECT stock.item_id as item_id, SUM(stock.amount) as st_sum
FROM stock
GROUP BY item_id
) as stock_sums
LEFT JOIN (
SELECT sold.item_id as item_id, SUM(sold.amount) as so_sum
FROM sold
GROUP by item_id
) as sold_sums ON stock_sums.item_id = sold_sums.item_id
I hope this would help.
Here is how I would do it. I assume that Stock is the main table, with an ID and an amount, and that Sold maps to Stock via an ID value, and has zero to many records for each Stock item.
SELECT Q1.id, Q1.Total1, Q2.Total2
, Q1.Total1 - COALESCE(Q2.Total2,0) as Outstanding
FROM (
SELECT id, SUM(amount) as Total1
FROM Stock GROUP BY id
) as Q1
LEFT OUTER JOIN (
SELECT id, SUM(Amount) as Total2
FROM Sold GROUP BY id
) as Q2
ON Q2.id = Q1.id
Note that simply formatting your SQL into a clean way forces you to break it into logical parts and will often reveal exactly what is wrong with the query.
The example above also correctly handles the cases where there is no match in the Sold table.
Cheers,
Daniel
(Code Assumptions)
DROP TABLE IF EXISTS Stock;
CREATE TABLE Stock (
id integer
, amount decimal(10,2)
);
INSERT INTO Stock (id, amount ) VALUES ( 1, 10.1);
INSERT INTO Stock (id, amount ) VALUES ( 2, 20.2);
INSERT INTO Stock (id, amount ) VALUES ( 3, 30.3);
SELECT * FROM Stock;
DROP TABLE IF EXISTS Sold;
CREATE TABLE Sold (
id integer
, amount decimal(10,2)
);
INSERT INTO Sold (id, amount ) VALUES ( 1, 1.1);
INSERT INTO Sold (id, amount ) VALUES ( 1, 2.2);
INSERT INTO Sold (id, amount ) VALUES ( 1, 3.3);
INSERT INTO Sold (id, amount ) VALUES ( 2, 2.22);
SELECT * FROM Sold;
SELECT Q1.id, Q1.Total1, Q2.Total2
, Q1.Total1 - COALESCE(Q2.Total2,0) as Outstanding
FROM (
SELECT id, SUM(amount) as Total1
FROM Stock GROUP BY id
) as Q1
LEFT OUTER JOIN (
SELECT id, SUM(Amount) as Total2
FROM Sold GROUP BY id
) as Q2
ON Q2.id = Q1.id
Results:
| id | Total1 | Total2 | Outstanding |
|----|--------|--------|-------------|
| 1  | 10.10  | 6.60   | 3.50        |
| 2  | 20.20  | 2.22   | 17.98       |
| 3  | 30.30  | NULL   | 30.30       |
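For contrast, here is a sketch of what tends to go wrong when the two tables are joined first and aggregated afterwards, using the same Stock and Sold sample rows. The aliases s, d, stock_total and sold_total are made up for this illustration.

```sql
-- Naive version: join first, then aggregate over the joined rows.
SELECT s.id,
       SUM(s.amount) AS stock_total,   -- each Stock row repeats once per Sold row
       SUM(d.amount) AS sold_total
FROM Stock AS s
LEFT JOIN Sold AS d ON d.id = s.id
GROUP BY s.id;
-- id 1: stock_total = 30.30 (10.10 counted three times), sold_total = 6.60
-- id 2: stock_total = 20.20, sold_total = 2.22
-- id 3: stock_total = 30.30, sold_total = NULL
```

Stock id 1 has three matching Sold rows, so its amount is summed three times. Aggregating each table in its own subquery before joining, as in the query above, avoids that duplication.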
REVISION
It sounds like you want the total amount of stock you have as one count for each different stock. Then you want how much stock you have left for each stock based on what has been sold. Correct?
If so check this out:
select stock, sum(a.x) as sharesBeforeSale, (sum(a.x) - sum(b.y)) as sharesAfterSale
FROM db.table1 a, db.table2 b
WHERE a.UNIQUEID = b.UNIQUEID AND b.y IS NOT NULL
GROUP BY a.UNIQUEID;
Does that accomplish what you are looking to do?
stock sharesBeforeSale sharesAfterSale
duk 100 25
orc 101 101
yrc 54 41
Enjoy!
Sample tables
db.table1 (stock owned):
UNIQUEID x stock
1 100 duk
2 101 orc
3 54 yrc
db.table2 (stock sold):
UNIQUEID y
1 75
2 0
3 13