MySQL - How do I get the mode from a GROUP BY select?

I have created a set of MySQL scripts that import products into Magento, and I use GROUP BY to group simple products into configurable products.
In the past I used MIN() to get the lowest price in each group and used that value as the price of the configurable product, but lately there have been cases like this:
Simple Product ID | Price ($)
------------------------------
1001 | 10
1002 | 10
1003 | 5
1004 | 10
1005 | 20
1006 | 10
In this situation, when I create the configurable product, MIN(Price) returns 5. I only just learned that configurable products can have a negative price difference, so I now need to change my code to get the mode (10) instead of the minimum (5).
I figured I would just replace MIN() in my query with MODE(), but MySQL doesn't seem to have a MODE() function. Here's an example of the query I'm using:
INSERT INTO import_table
(
product_id, stock_id, price
)
SELECT ODT.product_id, ODT.stock_id, MIN(ODT.Price)
FROM org_data_table AS ODT
GROUP BY ODT.stock_id
Is there an existing function that returns the mode? If not, what do I need to do to get it? Ideally I would only change the MIN() part of my query.

You can retrieve the mode of the price with the query below:
SELECT ODT.Price as price, count(ODT.Price) AS cnt
FROM org_data_table as ODT
GROUP BY ODT.price
ORDER BY cnt DESC
LIMIT 1;
This returns the most frequent price in the whole table. To insert the mode of each group, use it as a subquery in the SELECT list and correlate it with the outer stock_id (without the WHERE clause, every group would receive the table-wide mode):
INSERT INTO import_table
(
product_id, stock_id, price
)
SELECT ODT.product_id, ODT.stock_id, (SELECT M.Price
FROM org_data_table AS M
WHERE M.stock_id = ODT.stock_id -- only prices from the current group
GROUP BY M.Price
ORDER BY COUNT(*) DESC
LIMIT 1) AS price
FROM org_data_table AS ODT
GROUP BY ODT.stock_id

This can be a little tricky because a data set can have more than one mode: if two prices have the same count, they are both considered the mode.
This query should return every record from org_data_table whose price is one of the modes:
SELECT DISTINCT T.Product_Id, T.Stock_Id, T.Price
FROM org_data_table T
JOIN (
SELECT COUNT(*) cnt, Price
FROM org_data_table
GROUP BY Price
HAVING COUNT(*) = (
SELECT Max(cnt)
FROM (
SELECT COUNT(*) cnt, Price
FROM org_data_table
GROUP BY Price
) t
)
) T2 ON T.Price = T2.Price
And here is a sample Fiddle: http://sqlfiddle.com/#!2/8f1a2/1
Obviously you can add:
INSERT INTO import_table (product_id, stock_id, price)
before the query to insert the records.
Hope this helps.
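As a rough, runnable sketch of the per-group idea (not taken from the answers above): the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL, since the correlated subquery is plain SQL. The stock_id values 77 and 88 and the rows of the second group are made up for illustration, and breaking ties in favour of the lowest price is an assumption, not something the question specifies.

```python
import sqlite3

# Hypothetical sample data: the six prices from the question, all placed in one
# made-up stock_id (77), plus a second invented group (88).
ROWS = [
    (1001, 77, 10), (1002, 77, 10), (1003, 77, 5),
    (1004, 77, 10), (1005, 77, 20), (1006, 77, 10),
    (2001, 88, 7), (2002, 88, 7), (2003, 88, 3),
]

MODE_PER_GROUP = """
SELECT g.stock_id,
       (SELECT m.price
          FROM org_data_table AS m
         WHERE m.stock_id = g.stock_id         -- correlate with the outer group
         GROUP BY m.price
         ORDER BY COUNT(*) DESC, m.price ASC   -- assumed tie-break: lowest price
         LIMIT 1) AS mode_price
  FROM org_data_table AS g
 GROUP BY g.stock_id
 ORDER BY g.stock_id
"""


def mode_per_stock(rows):
    """Return (stock_id, mode_price) pairs for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE org_data_table "
        "(product_id INTEGER, stock_id INTEGER, price INTEGER)"
    )
    conn.executemany("INSERT INTO org_data_table VALUES (?, ?, ?)", rows)
    result = conn.execute(MODE_PER_GROUP).fetchall()
    conn.close()
    return result


print(mode_per_stock(ROWS))
```

If a group has several equally frequent prices, this sketch returns only one of them; the JOIN-based query above is the one to use when all modes are wanted.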

Related

Way to do MAX(evaluation_expression, return_expression) in SQL

I find myself often wanting to get an adjacent row value when I do a MIN or MAX statement. For example in the following statement:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
) SELECT MAX(age) FROM people;
# MAX(age)
20
The MAX function does the equivalent of: MAX(eval_expression=age, return_expression=age), where it always has the same evaluation and return value (implicitly). However, I would like to find the name of the person with the max age. So, the conceptual syntax would be: MAX(eval_expression=age, return_expression=name). This is a pattern I find myself using quite frequently and usually end up hacking something together like:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
) SELECT name FROM people NATURAL JOIN (SELECT MAX(age) age FROM people) _;
# name
'Greg'
Is there a generic way to do the MAX(expr, return) that I'm trying to accomplish?
Update: to provide an example where an aggregation is required:
with sales as (
select DATE '2014-01-01' as date, 100 as sales, 'Fish' as product union
select DATE '2014-01-01' as date, 105 as sales, 'Potatoes' as product union
select DATE '2014-01-02' as date, 84 as sales, 'Salsa' as product
) select date, max(sales) from sales group by date
# date, max(sales)
2014-01-01, 105
2014-01-02, 84
And how to get the equivalent of: MAX(expr=sales, return=product)? Something like:
WITH sales AS (
select DATE '2014-01-01' as d, 100 as revenue, 'Fish' as product union
select DATE '2014-01-01' as d, 105 as revenue, 'Potatoes' as product union
select DATE '2014-01-02' as d, 84 as revenue, 'Salsa' as product
) SELECT d AS date, product FROM sales NATURAL JOIN (SELECT d, MAX(revenue) AS revenue FROM sales GROUP BY d) _;
# date, product
2014-01-01, Potatoes
2014-01-02, Salsa
Unless I'm missing something here -
use limit with order by:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
)
SELECT name
FROM people
ORDER BY age DESC
LIMIT 1;
# name
'Greg'
If you want to use first_value(), I would recommend:
select distinct date,
first_value(product) over(partition by date order by sales desc) top_product
from sales
No need for aggregation here, nor for a frame specification in the window function. The window function walks the dataset starting from the row with the greatest sales, so all rows in the partition get the same top_product assigned. Then distinct retains only one row per partition.
But basically, this ends up as a greatest-n-per-group problem, where you want the row with the greatest sales for each date. The first_value() solution does not scale well if you want more than one column from that row. A typical solution is to rank records in a subquery, then filter. Again, no aggregation is needed; that's filtering logic only:
select *
from (
select s.*,
row_number() over(partition by date order by sales desc) rn
from sales s
) t
where rn = 1
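A small sketch of the rank-then-filter pattern, run through Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (it needs an SQLite build with window functions, 3.25 or later). The data is the sales example from the question; the helper name top_row_per_date is hypothetical.

```python
import sqlite3

# Sample rows from the question: (date, sales, product).
SALES = [
    ("2014-01-01", 100, "Fish"),
    ("2014-01-01", 105, "Potatoes"),
    ("2014-01-02", 84, "Salsa"),
]

TOP_ROW_PER_DATE = """
SELECT date, sales, product
  FROM (SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY date ORDER BY sales DESC) AS rn
          FROM sales AS s) AS t
 WHERE rn = 1            -- keep only the best-selling row of each date
 ORDER BY date
"""


def top_row_per_date(rows):
    """Return the full row with the highest sales for each date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (date TEXT, sales INTEGER, product TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(TOP_ROW_PER_DATE).fetchall()
    conn.close()
    return result


print(top_row_per_date(SALES))
```

Because the filter keeps whole rows, any number of columns can be returned without repeating the window expression, which is the scaling point made above.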
One solution would be to use a window function with an unbounded frame, such as LAST_VALUE (or FIRST_VALUE with the sort order reversed), where you sort the date partition by sales. Here is an example:
;WITH sales AS (
select DATE '2014-01-01' as date, 100 as sales, 'Fish' as product union
select DATE '2014-01-01' as date, 105 as sales, 'Potatoes' as product union
select DATE '2014-01-01' as date, 103 as sales, 'Lettuce' as product union
select DATE '2014-01-02' as date, 84 as sales, 'Salsa' as product
)
SELECT DISTINCT date, LAST_VALUE(product) OVER (
partition by date
order by sales
-- Default: https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html
-- rows between unbounded preceding and current row
rows between unbounded preceding and unbounded following
) top_product
FROM sales;
# date, top_product
'2014-01-01', 'Potatoes'
'2014-01-02', 'Salsa'
I think the subselect might be easier to read (at least for me), but this is another option. You'd have to check the performance of the two, but I'd expect the analytic function (without the non-indexable join) to be much faster.
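To illustrate why the explicit frame in the LAST_VALUE query matters, here is a sketch that runs the same window expression with and without it. It uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (SQLite 3.25 or later is assumed for window functions); the helper name top_products is hypothetical.

```python
import sqlite3

# Sample rows from the answer above: (date, sales, product).
SALES = [
    ("2014-01-01", 100, "Fish"),
    ("2014-01-01", 105, "Potatoes"),
    ("2014-01-01", 103, "Lettuce"),
    ("2014-01-02", 84, "Salsa"),
]

FULL_FRAME = "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"


def top_products(frame_clause):
    """Run the LAST_VALUE query with the given frame clause ('' = default)."""
    sql = f"""
    SELECT DISTINCT date,
           LAST_VALUE(product) OVER (
               PARTITION BY date ORDER BY sales {frame_clause}
           ) AS top_product
      FROM sales
     ORDER BY date, top_product
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (date TEXT, sales INTEGER, product TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Default frame ends at the current row, so every row reports itself.
print(top_products(""))
# Full frame sees the whole partition, so every row reports the top seller.
print(top_products(FULL_FRAME))
```

With the default frame, DISTINCT cannot collapse the partition because each row carries a different value; with the full frame there is one row per date.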

MySQL most price change over time

price | date | product_id
100 | 2020-09-21 | 1
400 | 2020-09-20 | 2
300 | 2020-09-20 | 3
200 | 2020-09-19 | 1
400 | 2020-09-18 | 2
I add an entry into this table every day with a product's price that day.
Now I want to get the products with the most price drops over the last week (all dates back to 2020-09-14). In this example it would return only product_id = 1, because that is the only product whose price changed.
I think I have to join the table to itself, but I'm not getting it to work.
Here is a query that I hoped would return the biggest price changes over the last day, but it is not working.
select pt.price, pt.date, pt.product_id, (pt.price - py.price) as change
from prices as pt
inner join (
select *
from prices
where date > '2020-09-20 19:33:43'
) as py
on pt.product_id = py.product_id
where pt.price - py.price > 0
order by change
I understand that you want to count how many times the price of each product changed over the last 7 days.
A naive approach would use aggregation and count(distinct price) - but it fails when a product's price changes back and forth.
A safer approach is window functions: you can use lag() to retrieve the previous price, and compare it against the current price; it is then easy to aggregate and count the price changes:
select product_id, sum(price <> lag_price) cnt_price_changes
from (
select t.*, lag(price) over(partition by product_id order by date) lag_price
from mytable t
where date >= current_date - interval 7 day
) t
group by product_id
order by cnt_price_changes desc
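A sketch comparing the naive count(distinct price) idea with the lag() approach, using Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (SQLite 3.25 or later is assumed). The price history below is invented so that product 1 goes 100, 200, 100; the seven-day date filter is left out because its syntax is MySQL-specific.

```python
import sqlite3

# Hypothetical history: product 1 changes price and then changes back,
# product 2 never changes.
PRICES = [
    (100, "2020-09-18", 1), (200, "2020-09-19", 1), (100, "2020-09-20", 1),
    (400, "2020-09-18", 2), (400, "2020-09-19", 2), (400, "2020-09-20", 2),
]

COUNT_CHANGES = """
SELECT product_id,
       COUNT(DISTINCT price) - 1 AS naive_changes,
       SUM(price <> lag_price)   AS lag_changes  -- NULL (first row) is skipped
  FROM (SELECT p.*,
               LAG(price) OVER (PARTITION BY product_id ORDER BY date) AS lag_price
          FROM prices AS p) AS t
 GROUP BY product_id
 ORDER BY product_id
"""


def count_price_changes(rows):
    """Return (product_id, naive_changes, lag_changes) per product."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (price INTEGER, date TEXT, product_id INTEGER)")
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    result = conn.execute(COUNT_CHANGES).fetchall()
    conn.close()
    return result


print(count_price_changes(PRICES))
```

For product 1 the naive column reports one change while the lag() column reports two, which is the back-and-forth case described above.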
Try using MAX() and MIN() instead....
-- CHANGE is a reserved word in MySQL, so the alias is named price_change
select MAX(pt.price), MIN(pt.price), MAX(pt.price) - MIN(pt.price) as price_change
from prices as pt
inner join (
select *
from prices
where date > '2020-09-20 19:33:43'
) as py
on pt.product_id = py.product_id
order by price_change
Instead of subtracting every row from every other row to get the result, you can find the maximum and minimum with MAX() and MIN() and, ultimately, compute MAX() - MIN(). Relevant lines from the MySQL documentation:
MAX(): Returns the maximum value of expr.
MIN(): Returns the minimum value of expr.
You won't be able to pull the other fields (IDs, dates), since MAX() and MIN() imply a GROUP BY, but you can then get that information with a follow-up query such as SELECT * FROM ... WHERE price = MAX_VALUE_JUST_ACQUIRED.
These examples return results per week of year and per week of month for the price drops of each product.
SELECT
COUNT(m1.product_id) as total,
m1.product_id,
WEEK(m1.ddate) AS weekofyear
FROM mytest m1
WHERE m1.price < (SELECT m2.price FROM mytest m2 WHERE m2.ddate<m1.ddate AND m1.product_id=m2.product_id ORDER BY m2.ddate DESC LIMIT 0,1)
GROUP BY weekofyear,m1.product_id
ORDER BY weekofyear;
SELECT
COUNT(m1.product_id) as total,
m1.product_id,
FLOOR((DAYOFMONTH(ddate) - 1) / 7) + 1 AS weekofmonth
FROM mytest m1
WHERE m1.price < (SELECT m2.price FROM mytest m2 WHERE m2.ddate<m1.ddate AND m1.product_id=m2.product_id ORDER BY m2.ddate DESC LIMIT 0,1)
GROUP BY weekofmonth,m1.product_id
ORDER BY weekofmonth;
Try this out in SQLFiddle.

can more than one aggregation functions be used in `select` in case of `group by`?

here is the table:
here is my query
select sum(if(customer_pref_delivery_date = min(order_date), 1, 0)) immidiate_percentage
from Delivery
group by customer_id;
then I get error
Invalid use of group function
Ignoring what I want to do logically, why do I get this error? When I remove sum from the select it works, so I'm wondering whether SQL allows only one aggregate function (like min in this case) in the select when doing a group by. Is that true?
You can't nest aggregations in the select list.
If you want to get, per customer, the percentage of orders that need to be delivered on the same day as the order, then do this:
select customer_id,
100.0 * avg(customer_pref_delivery_date = order_date) immediate_percentage
from Delivery
group by customer_id;
See the demo.
Results:
| customer_id | immediate_percentage |
| ----------- | -------------------- |
| 1 | 0 |
| 2 | 50 |
| 3 | 50 |
| 4 | 100 |
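A sketch of both points, using Python's built-in sqlite3 module as a stand-in for MySQL: nesting one aggregate inside another is rejected (SQLite words the error differently from MySQL), while averaging a 0/1 comparison gives the percentage. The Delivery rows are invented, since the question's table did not survive, and were chosen only to reproduce the result table above.

```python
import sqlite3

# Hypothetical rows: (delivery_id, customer_id, order_date, customer_pref_delivery_date).
DELIVERIES = [
    (1, 1, "2019-08-01", "2019-08-02"),
    (2, 2, "2019-08-02", "2019-08-02"),
    (3, 2, "2019-08-11", "2019-08-12"),
    (4, 3, "2019-08-21", "2019-08-22"),
    (5, 3, "2019-08-24", "2019-08-24"),
    (6, 4, "2019-08-09", "2019-08-09"),
]

NESTED = """
SELECT sum(customer_pref_delivery_date = min(order_date))
  FROM Delivery GROUP BY customer_id
"""

PERCENTAGE = """
SELECT customer_id,
       100.0 * avg(customer_pref_delivery_date = order_date) AS immediate_percentage
  FROM Delivery
 GROUP BY customer_id
 ORDER BY customer_id
"""


def open_db():
    """Create an in-memory Delivery table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Delivery (delivery_id INTEGER, customer_id INTEGER, "
        "order_date TEXT, customer_pref_delivery_date TEXT)"
    )
    conn.executemany("INSERT INTO Delivery VALUES (?, ?, ?, ?)", DELIVERIES)
    return conn


def nested_aggregate_is_rejected():
    """Return True if the database refuses min() nested inside sum()."""
    conn = open_db()
    try:
        conn.execute(NESTED)
        return False
    except sqlite3.OperationalError:
        return True
    finally:
        conn.close()


def immediate_percentages():
    """Return (customer_id, immediate_percentage) per customer."""
    conn = open_db()
    result = conn.execute(PERCENTAGE).fetchall()
    conn.close()
    return result


print(nested_aggregate_is_rejected())
print(immediate_percentages())
```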
You cannot nest aggregation functions. It simply doesn't make sense. Each aggregation function produces one value per group -- the value cannot then be compared to the rows that comprise the group in the same select.
I don't think you need any second aggregation for what you are doing:
select customer_id, sum( customer_pref_delivery_date = order_date ) as immediate_percentage
from Delivery
group by customer_id;
It seems strange that you are calling a sum a "percentage" but that is the logic you have in your query.
If you really want to compare to the minimum order date for each customer, you can use a window function:
select customer_id, sum( customer_pref_delivery_date = min_order_date ) as immediate_percentage
from (select d.*,
min(order_date) over (partition by customer_id) as min_order_date
from Delivery d
) d
group by customer_id;
You cannot nest aggregate functions.
One way to work around this would be to compute the minimum order date per customer in a subquery and then join it with the original table, as follows:
select
d.customer_id,
sum(if(d.customer_pref_delivery_date = dmin.min_order_date, 1, 0)) immidiate_percentage
from
delivery d
inner join (
select customer_id, min(order_date) min_order_date
from delivery
group by customer_id
) dmin on dmin.customer_id = d.customer_id
group by d.customer_id;
In MySQL 8.0, you can use window functions:
select
t.customer_id,
sum(if(customer_pref_delivery_date = min_order_date, 1, 0)) immidiate_percentage
from (
select
customer_id,
customer_pref_delivery_date,
min(order_date) over(partition by customer_id) min_order_date
from delivery
) t
group by customer_id

How to get rows with max date when grouping in MySQL?

I have a table with prices and dates on product:
id
product
price
date
I create a new record whenever the price changes, so the table looks like this:
id product price date
1 1 10 2014-01-01
2 1 20 2014-02-17
3 1 5 2014-03-28
4 2 25 2014-01-05
5 2 12 2014-02-08
6 2 30 2014-03-12
I want to get the latest price for each product. But when I group by "product", I can't get the price from the row with the maximum date.
I can use MAX(), MIN() or COUNT() in the query, but I need a result based on another column's value.
I want something like this as the final result:
product price date
1 5 2014-03-28
2 30 2014-03-12
But I don't know how. Maybe something like this:
SELECT product, {price with max date}, {max date}
FROM table
GROUP BY product
Alternatively, you can use a subquery to get the latest date for every product and join the result to the table itself to get the other columns.
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT product, MAX(date) mxdate
FROM tableName
GROUP BY product
) b ON a.product = b.product
AND a.date = b.mxdate
I think the easiest way is a substring_index()/group_concat() trick:
SELECT product,
substring_index(group_concat(price order by date desc), ',', 1) as PriceOnMaxDate,
max(date)
FROM table
GROUP BY product;
Another way, which might be more efficient than a group by, is:
select t.*
from table t
where not exists (select 1
from table t2
where t2.product = t.product and
t2.date > t.date
);
This says: "Get me all rows from the table where the same product does not have a larger date." That is a fancy way of saying "get me the row with the maximum date for each product."
Note that there is a subtle difference: the second form returns all rows that fall on the maximum date, if there are duplicates.
Also, for performance an index on table(product, date) is recommended.
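A runnable sketch of the NOT EXISTS form on the data from the question, using Python's built-in sqlite3 module as a stand-in for MySQL. The table name prices and the index name idx_product_date are assumptions (the question only calls it "table", which is a reserved word).

```python
import sqlite3

# Rows from the question: (id, product, price, date).
PRICES = [
    (1, 1, 10, "2014-01-01"),
    (2, 1, 20, "2014-02-17"),
    (3, 1, 5, "2014-03-28"),
    (4, 2, 25, "2014-01-05"),
    (5, 2, 12, "2014-02-08"),
    (6, 2, 30, "2014-03-12"),
]

LATEST_PRICE = """
SELECT t.product, t.price, t.date
  FROM prices AS t
 WHERE NOT EXISTS (SELECT 1
                     FROM prices AS t2
                    WHERE t2.product = t.product
                      AND t2.date > t.date)   -- no newer row for this product
 ORDER BY t.product
"""


def latest_prices(rows):
    """Return (product, price, date) for the newest row of each product."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prices (id INTEGER, product INTEGER, price INTEGER, date TEXT)"
    )
    # Composite index recommended above; the name is made up.
    conn.execute("CREATE INDEX idx_product_date ON prices (product, date)")
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(LATEST_PRICE).fetchall()
    conn.close()
    return result


print(latest_prices(PRICES))
```

The dates are stored as ISO-8601 text here, so plain string comparison orders them correctly; in MySQL a DATE column would do the same job.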
You can use a subquery that groups by product and returns the maximum date for every product, and join this subquery back to the products table:
SELECT
p.product,
p.price,
p.date
FROM
products p INNER JOIN (
SELECT
product,
MAX(date) AS max_date
FROM
products
GROUP BY
product) m
ON p.product = m.product AND p.date = m.max_date
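-- Caution: this relies on the row order of the derived table surviving the GROUP BY, which MySQL does not guarantee.
SELECT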
product,
price,
date
FROM
(SELECT
product,
price,
date
FROM table_name ORDER BY date DESC) AS t1
GROUP BY product;

Finding the 2nd most expensive total products in MySQL

I'm working on simple queries to learn MySQL. In my example database I keep track of stores that sell electronic devices, in a table Sells(Store, Item, Price).
Example data:
'Best Buy', 'Galaxy S', 1000
'Buy More', 'Macbook Air', 2000
'Best Buy', 'Microsoft Mouse', 20
'Best Buy', 'Macbook Pro Cover', 40
'Buy More', 'Asus Zenbook', 2000
And so on..
I tried the following sql statement, but it says:
Error Code: 1111. Invalid use of group function 0.000 sec
SELECT store
FROM sells
WHERE SUM(price) <
(SELECT SUM(price) AS total
FROM sells
GROUP BY store
ORDER BY total DESC
LIMIT 1)
GROUP BY store
ORDER BY SUM(price) DESC
I would appreciate it if you could help me.
Thanks
This will just plain show the second most expensive store;
SELECT STORE
FROM TABLE_A
GROUP BY STORE
ORDER BY SUM(PRICE) DESC
LIMIT 1,1
Demo here.
If you want the price displayed too, you can just select that too;
SELECT STORE, SUM(PRICE) TOTAL_PRICE
FROM TABLE_A
GROUP BY STORE
ORDER BY TOTAL_PRICE DESC
LIMIT 1,1
Demo here.
Edit: If you have several most expensive stores and several second most expensive stores, the query to get all the second most expensive ones becomes quite a bit more convoluted; I'm sure someone can beat the efficiency of this one:
SELECT STORE, SUM(PRICE) TOTAL_PRICE
FROM TABLE_A
GROUP BY STORE
HAVING TOTAL_PRICE =
(SELECT SUM(PRICE) TMP
FROM TABLE_A
GROUP BY STORE
HAVING TMP <
(SELECT SUM(PRICE) TMP2
FROM TABLE_A
GROUP BY STORE
ORDER BY TMP2 DESC
LIMIT 1)
ORDER BY TMP DESC LIMIT 1)
Demo here.
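For the tie case, a possible alternative (not from the answer above) is DENSE_RANK(), which is available in MySQL 8.0 and later. The sketch below runs it through Python's built-in sqlite3 module as a stand-in (SQLite 3.25 or later is assumed). The first five rows come from the question; the 'Tech Hut' row is invented to create a tie for second place.

```python
import sqlite3

# (store, item, price); the last row is hypothetical and ties with Best Buy.
SELLS = [
    ("Best Buy", "Galaxy S", 1000),
    ("Buy More", "Macbook Air", 2000),
    ("Best Buy", "Microsoft Mouse", 20),
    ("Best Buy", "Macbook Pro Cover", 40),
    ("Buy More", "Asus Zenbook", 2000),
    ("Tech Hut", "Router", 1060),
]

SECOND_HIGHEST_TOTAL = """
SELECT store, total
  FROM (SELECT store,
               SUM(price) AS total,
               DENSE_RANK() OVER (ORDER BY SUM(price) DESC) AS rnk
          FROM sells
         GROUP BY store) AS ranked
 WHERE rnk = 2           -- every store sharing the second-highest total
 ORDER BY store
"""


def second_most_expensive(rows):
    """Return (store, total) for all stores with the second-highest total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sells (store TEXT, item TEXT, price INTEGER)")
    conn.executemany("INSERT INTO sells VALUES (?, ?, ?)", rows)
    result = conn.execute(SECOND_HIGHEST_TOTAL).fetchall()
    conn.close()
    return result


print(second_most_expensive(SELLS))
```

DENSE_RANK() gives tied totals the same rank without leaving gaps, so rank 2 is always the second distinct total regardless of how many stores share first place.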
You can do like this;
SELECT store,
SUM(price) AS totalprice
FROM sells
GROUP BY store
ORDER BY totalprice DESC
LIMIT 2
You first select the sum of the prices and give it an alias, e.g. totalprice, then group by store as you already did. To get the most expensive stores, order by the sum in descending order and limit the output to two results.
You can then read totalprice like an ordinary column when you loop over the results.
Almost correct:
SELECT SUM(price) as price_total FROM sells GROUP BY store
If you want to order the totals, you can wrap that in a subquery, like:
SELECT price_total FROM (SELECT SUM(price) as price_total FROM sells GROUP BY store) as res ORDER BY price_total DESC LIMIT 2
If you want only the second one, you could write another query, but I think it is easier to pick it in your back-end language.
SELECT distinct price from sells ORDER BY price DESC, and in your code, just take the second one.
If you need the rest of the info, do this:
SELECT * from sells
WHERE price = (SELECT distinct price from sells ORDER BY price DESC LIMIT 1,1)
I didn't test it, but this should work:
SELECT S.store
FROM (
SELECT T.store, SUM(T.price) AS sum_price
FROM formList_Total AS T
GROUP BY T.store
) AS S
ORDER BY sum_price DESC
LIMIT 1 , 1
Sorry, I went and tested it; here is what I ended up with.