Trying to utilize a window function instead of this script - mysql

I'm trying to improve my query for this problem. I want to find the top 5 and bottom 5 growth rates per state from 2020 to 2021 in my org. The table has these columns: orderid, orderdate, totaldue, state, etc. (these are probably the most important ones). This is the query I have so far; while it works, I think it would be more efficient with a window function instead.
SELECT state, SUM(TotalDue) as sum
into #temp2020
from table
where OrderDate like "2020%"
group by state
order by sum desc;
SELECT state, SUM(TotalDue) as sum
into #temp2021
from table
where OrderDate like "2021%"
group by state
order by sum desc;
-- top 5 growth rates --
select #temp2020.state, ((#temp2021.sum-#temp2020.sum)/#temp2020.sum) as 'growthrate'
from #temp2020
join #temp2021 on #temp2021.state = #temp2020.state
order by growthrate desc limit 5;
-- bottom 5 growth rates --
select #temp2020.state, ((#temp2021.sum-#temp2020.sum)/#temp2020.sum) as 'growthrate'
from #temp2020
join #temp2021 on #temp2021.state = #temp2020.state
order by growthrate asc limit 5;
drop table if exists #temp2020;
drop table if exists #temp2021;

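You could compute each state's 2020 and 2021 totals in one pass with conditional aggregation, then use DENSE_RANK on the growth rate:
WITH totals AS (
SELECT state,
SUM(CASE WHEN YEAR(OrderDate) = 2020 THEN TotalDue END) AS sum2020,
SUM(CASE WHEN YEAR(OrderDate) = 2021 THEN TotalDue END) AS sum2021
FROM yourTable
WHERE YEAR(OrderDate) IN (2020, 2021)
GROUP BY state
),
cte AS (
SELECT state,
(sum2021 - sum2020) / sum2020 AS growthrate,
DENSE_RANK() OVER (ORDER BY (sum2021 - sum2020) / sum2020) rnk_asc,
DENSE_RANK() OVER (ORDER BY (sum2021 - sum2020) / sum2020 DESC) rnk_desc
FROM totals
WHERE sum2020 > 0 AND sum2021 IS NOT NULL -- only states with sales in both years
)
SELECT state, growthrate
FROM cte
WHERE rnk_asc <= 5 OR rnk_desc <= 5
ORDER BY growthrate DESC;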
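A minimal runnable sketch of ranking states by their 2020-to-2021 growth rate with DENSE_RANK, using Python's built-in sqlite3 module as a stand-in for MySQL 8.0. It assumes SQLite 3.25 or newer (for window functions); SQLite has no YEAR(), so strftime('%Y', ...) is used instead. The table name `orders_demo` and the sample rows are hypothetical. States without sales in both years are dropped, which mirrors the inner JOIN in the question, and ties can make the result longer than 2n rows.

```python
import sqlite3

# Hypothetical sample rows: (orderdate, totaldue, state)
ROWS = [
    ("2020-03-01", 100.0, "CA"), ("2021-04-01", 150.0, "CA"),  # +50%
    ("2020-05-01", 200.0, "TX"), ("2021-05-01", 100.0, "TX"),  # -50%
    ("2020-06-01", 100.0, "NY"), ("2021-06-01", 110.0, "NY"),  # +10%
    ("2021-07-01", 999.0, "WA"),                               # no 2020 sales
]

QUERY = """
WITH totals AS (
    -- one row per state, with each year's total side by side
    SELECT state,
           SUM(CASE WHEN strftime('%Y', orderdate) = '2020' THEN totaldue END) AS sum2020,
           SUM(CASE WHEN strftime('%Y', orderdate) = '2021' THEN totaldue END) AS sum2021
    FROM orders_demo
    GROUP BY state
),
growth AS (
    SELECT state, (sum2021 - sum2020) / sum2020 AS growthrate
    FROM totals
    WHERE sum2020 > 0 AND sum2021 IS NOT NULL  -- states with sales in both years
),
ranked AS (
    SELECT state, growthrate,
           DENSE_RANK() OVER (ORDER BY growthrate) AS rnk_asc,
           DENSE_RANK() OVER (ORDER BY growthrate DESC) AS rnk_desc
    FROM growth
)
SELECT state, growthrate
FROM ranked
WHERE rnk_asc <= ? OR rnk_desc <= ?
ORDER BY growthrate DESC
"""


def top_and_bottom_growth(n):
    """Return the n highest and n lowest growth rates (ties may add rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_demo (orderdate TEXT, totaldue REAL, state TEXT)")
    conn.executemany("INSERT INTO orders_demo VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, (n, n)).fetchall()
    conn.close()
    return rows


print(top_and_bottom_growth(1))  # [('CA', 0.5), ('TX', -0.5)]
```

Computing the yearly totals once and ranking afterwards replaces the two temporary tables and the two separate top/bottom queries with a single pass.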

Related

Not getting this SQL query

Print all details of the 16th order placed by each customer, if any.
How do I print exactly the 16th order?
SELECT COUNT(orderId)
FROM orders
GROUP BY CustomerID
ORDER BY CustomerID;
We can use a CTE and RANK to list all orderIds and customerIDs together with each order's position (its "order", as you called it).
Then we fetch the entries from that result whose position is 16.
WITH result AS
(
SELECT orderId, customerID,
RANK() OVER
(PARTITION BY customerID
ORDER BY orderId) AS rnk
FROM orders
)
SELECT orderId, customerID
FROM result
WHERE rnk=16
ORDER BY customerID;
Customers with fewer than 16 orders produce no row.
We can also use ROW_NUMBER instead of RANK in the query above; as long as orderId is unique, both assign the same numbers here.
Select * from
(
SELECT *,
DENSE_RANK()
OVER(
PARTITION BY customerID
ORDER BY orderID
) my_rank
FROM orders
) as myTable
where my_rank = 16
order by CustomerID;
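You can use OFFSET, but only for one customer at a time, because LIMIT and OFFSET apply to the whole result set rather than to each group:
SELECT *
FROM orders
WHERE CustomerID = 42 -- hypothetical customer id
ORDER BY orderId
LIMIT 1 OFFSET 15;
Setting OFFSET to 15 skips the first 15 rows and LIMIT 1 returns only the next one, i.e. that customer's 16th order. To get the 16th order of every customer at once, use one of the window-function queries above.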
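A small runnable comparison of the two approaches in this thread, using Python's sqlite3 module as a stand-in for MySQL 8.0 (assumes SQLite 3.25 or newer for ROW_NUMBER). The sample orders and the helper names are hypothetical, and small values of n replace 16 so the data stays short. ROW_NUMBER() numbers orders separately for each customer, while LIMIT/OFFSET works on the whole result, so it only gives the n-th order once the query is filtered to one customer.

```python
import sqlite3

# Hypothetical orders: (orderId, customerID).
# Customer 1 has orders 10, 11, 15; customer 2 has 12, 13; customer 3 has 14.
SAMPLE = [(10, 1), (11, 1), (12, 2), (13, 2), (14, 3), (15, 1)]


def connect(orders):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (orderId INTEGER PRIMARY KEY, customerID INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    return conn


def nth_order_per_customer(orders, n):
    """Each customer's n-th order by orderId; customers with fewer orders are skipped."""
    conn = connect(orders)
    rows = conn.execute(
        """
        WITH result AS (
            SELECT orderId, customerID,
                   ROW_NUMBER() OVER (PARTITION BY customerID ORDER BY orderId) AS rn
            FROM orders
        )
        SELECT orderId, customerID FROM result WHERE rn = ? ORDER BY customerID
        """,
        (n,),
    ).fetchall()
    conn.close()
    return rows


def nth_order_for_customer(orders, customer_id, n):
    """One customer's n-th order via LIMIT/OFFSET, or None if there is none."""
    conn = connect(orders)
    row = conn.execute(
        """
        SELECT orderId, customerID
        FROM orders
        WHERE customerID = ?
        ORDER BY orderId
        LIMIT 1 OFFSET ?  -- skip the first n - 1 rows, return the next one
        """,
        (customer_id, n - 1),
    ).fetchone()
    conn.close()
    return row


print(nth_order_per_customer(SAMPLE, 2))     # [(11, 1), (13, 2)]
print(nth_order_for_customer(SAMPLE, 1, 3))  # (15, 1)
```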

How to get 1 result per year

This is my code currently.
SELECT manufacturer, model, year, units_sold_m
FROM `quick-catcher-350001.Phone_Sales.Phone`
WHERE smartphone=TRUE
ORDER BY units_sold_m DESC
I would like to add a line to this so that I get only 1 result per year, to learn which phone was the top seller each year.
Thanks
Since MySQL 8.0, this can be done using ROW_NUMBER. Use a subquery to number the rows within each year, sorted by units sold in descending order, and then keep only the rows with row_num = 1:
SELECT sub.manufacturer, sub.model, sub.year, sub.units_sold_m
FROM
(SELECT manufacturer, model, year, units_sold_m,
ROW_NUMBER() OVER(PARTITION BY year
ORDER BY units_sold_m DESC) AS row_num
FROM yourtable
WHERE smartphone=TRUE) AS sub
WHERE row_num = 1;
Note that this shows exactly one row per year, as you asked; if two phones tie for the top spot, which of them is returned is arbitrary.
If you actually want all entries per year that share the maximum units_sold_m, use the MAX function instead:
SELECT manufacturer, model, year, units_sold_m
FROM yourtable
WHERE smartphone=TRUE
AND (year, units_sold_m) IN
(SELECT year, MAX(units_sold_m)
FROM yourtable
WHERE smartphone=TRUE
GROUP BY year)
ORDER BY year;
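A runnable sketch of the difference between the two queries, with Python's sqlite3 module standing in for MySQL 8.0 (assumes SQLite 3.25 or newer; `smartphone` is stored as 0/1 instead of a boolean). The table name `phone` and the rows are hypothetical: two smartphones tie for 2019, and a non-smartphone with the same sales figure must never appear, which is why the MAX query filters on smartphone in the outer query as well as in the subquery.

```python
import sqlite3

# Hypothetical rows: (manufacturer, model, year, units_sold_m, smartphone as 0/1)
ROWS = [
    ("A", "a1", 2019, 70, 1),
    ("B", "b1", 2019, 70, 1),  # ties with a1 for 2019
    ("C", "c1", 2019, 70, 0),  # not a smartphone: must never be returned
    ("A", "a2", 2020, 80, 1),
]

ROW_NUMBER_QUERY = """
SELECT manufacturer, model, year, units_sold_m
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY year ORDER BY units_sold_m DESC) AS row_num
      FROM phone WHERE smartphone = 1) AS sub
WHERE row_num = 1
ORDER BY year
"""

MAX_QUERY = """
SELECT manufacturer, model, year, units_sold_m
FROM phone
WHERE smartphone = 1
  AND (year, units_sold_m) IN (SELECT year, MAX(units_sold_m)
                               FROM phone
                               WHERE smartphone = 1
                               GROUP BY year)
ORDER BY year, model
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phone (manufacturer TEXT, model TEXT, year INTEGER, "
                 "units_sold_m INTEGER, smartphone INTEGER)")
    conn.executemany("INSERT INTO phone VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(len(run(ROW_NUMBER_QUERY)))  # 2: one row per year, the 2019 tie is cut arbitrarily
print(len(run(MAX_QUERY)))         # 3: both tied 2019 smartphones are kept
```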

SELECT COUNT() INSIDE A SELECT MAX() SQL

I've got this table called player_mast in a db (the data are just an example), and I want to find the club that supplied the most players to the 2016 EURO cup.
player_id | country_id | jersey_no | player_name | posi_to_play | dt_of_bir | age | playing_club
1231 | 1231 | 10 | Hazard | striker | 2/3/1991 | 33 | Chelsea
Why doesn't this query work? It seems right to me:
SELECT playing_club, MAX(NumberOfPlayerForTeam)
FROM (
SELECT playing_club, COUNT(player_id) AS NumberOfPlayerForTeam
FROM player_mast
GROUP BY(playing_club))
GROUP BY(playing_club);
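Try this. Your query fails because the derived table has no alias (MySQL requires one), and because grouping the outer query by playing_club again makes MAX() return each club's own count instead of the overall maximum. Sorting by the count and keeping the first row gives the top club:
SELECT playing_club, NumberOfPlayerForTeam
FROM (
SELECT playing_club, COUNT(player_id) AS NumberOfPlayerForTeam
FROM player_mast
GROUP BY(playing_club)) AS t
ORDER BY NumberOfPlayerForTeam DESC LIMIT 1;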
If you want all the playing clubs that have the most rows in your table (ties included), you can use RANK():
SELECT pm.*
FROM (SELECT playing_club, COUNT(*) AS NumberOfPlayerForTeam,
RANK() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM player_mast
GROUP BY playing_club
) pm
WHERE seqnum = 1;
Note:
COUNT(<column name>) counts the number of non-NULL values in the column. There is no need to do this additional check; COUNT(*) does what you want.
Parentheses are not needed around the GROUP BY keys.
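A runnable sketch contrasting ORDER BY ... LIMIT 1 with RANK() when two clubs tie, using Python's sqlite3 module as a stand-in for MySQL 8.0 (assumes SQLite 3.25 or newer). The player rows are hypothetical. As a small design choice, the RANK() version computes the counts in an inner subquery and ranks them one level up, which is equivalent to ranking by COUNT(*) directly.

```python
import sqlite3

# Hypothetical (player_id, playing_club) rows: Chelsea and Real Madrid tie with 3 players.
PLAYERS = [
    (1, "Chelsea"), (2, "Chelsea"), (3, "Chelsea"),
    (4, "Real Madrid"), (5, "Real Madrid"), (6, "Real Madrid"),
    (7, "Juventus"),
]

# ORDER BY ... LIMIT 1: exactly one club, even when several tie.
LIMIT_QUERY = """
SELECT playing_club, NumberOfPlayerForTeam
FROM (SELECT playing_club, COUNT(*) AS NumberOfPlayerForTeam
      FROM player_mast
      GROUP BY playing_club) AS t
ORDER BY NumberOfPlayerForTeam DESC
LIMIT 1
"""

# RANK(): every club whose count equals the maximum.
RANK_QUERY = """
SELECT playing_club, NumberOfPlayerForTeam
FROM (SELECT playing_club, NumberOfPlayerForTeam,
             RANK() OVER (ORDER BY NumberOfPlayerForTeam DESC) AS seqnum
      FROM (SELECT playing_club, COUNT(*) AS NumberOfPlayerForTeam
            FROM player_mast
            GROUP BY playing_club) AS counts) AS pm
WHERE seqnum = 1
ORDER BY playing_club
"""


def clubs(query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE player_mast (player_id INTEGER, playing_club TEXT)")
    conn.executemany("INSERT INTO player_mast VALUES (?, ?)", PLAYERS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(len(clubs(LIMIT_QUERY)))  # 1
print(clubs(RANK_QUERY))        # [('Chelsea', 3), ('Real Madrid', 3)]
```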

Way to do MAX(evaluation_expression, return_expression) in SQL

I often find myself wanting a value from another column of the same row when I use MIN or MAX. For example, in the following statement:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
) SELECT MAX(age) FROM people;
# MAX(age)
20
The MAX function does the equivalent of MAX(eval_expression=age, return_expression=age): the evaluation and return values are always the same (implicitly). However, I would like to find the name of the person with the max age, so the conceptual syntax would be MAX(eval_expression=age, return_expression=name). I use this pattern quite frequently and usually end up hacking something together like:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
) SELECT name FROM people NATURAL JOIN (SELECT MAX(age) AS age FROM people) _;
# name
'Greg'
Is there a generic way to do the MAX(expr, return) that I'm trying to accomplish?
Update: to provide an example where an aggregation is required:
with sales as (
select DATE '2014-01-01' as date, 100 as sales, 'Fish' as product union
select DATE '2014-01-01' as date, 105 as sales, 'Potatoes' as product union
select DATE '2014-01-02' as date, 84 as sales, 'Salsa' as product
) select date, max(sales) from sales group by date
# date, max(sales)
2014-01-01, 105
2014-01-02, 84
And how to get the equivalent of: MAX(expr=sales, return=product)? Something like:
WITH sales AS (
select DATE '2014-01-01' as d, 100 as revenue, 'Fish' as product union
select DATE '2014-01-01' as d, 105 as revenue, 'Potatoes' as product union
select DATE '2014-01-02' as d, 84 as revenue, 'Salsa' as product
) SELECT d AS date, product FROM sales NATURAL JOIN (SELECT d, MAX(revenue) AS revenue FROM sales GROUP BY d) _;
# date, product
2014-01-01, Potatoes
2014-01-02, Salsa
Unless I'm missing something here, use LIMIT with ORDER BY:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
)
SELECT name
FROM people
ORDER BY age DESC
LIMIT 1;
# name
'Greg'
If you want to use first_value(), I would recommend:
select distinct date,
first_value(product) over(partition by date order by sales desc) top_product
from sales
No need for aggregation here, nor for a frame specification in the window function. The window function walks each partition starting from the row with the greatest sales, so all rows in the partition are assigned the same top_product. Then DISTINCT retains only one row per partition.
But basically, this ends up as a greatest-n-per-group problem, where you want the row with the greatest sales for each date. The first_value() solution does not scale well if you want more than one column from that row. A typical solution is to rank records in a subquery, then filter. Again, no aggregation is needed; this is filtering logic only:
select *
from (
select s.*,
row_number() over(partition by date order by sales desc) rn
from sales s
) t
where rn = 1
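A runnable check that both approaches from this answer pick the top product per date, using Python's sqlite3 module as a stand-in for MySQL 8.0 (assumes SQLite 3.25 or newer) and the sample rows from the question's update. The FIRST_VALUE version returns only the product, while the ROW_NUMBER version keeps the whole winning row.

```python
import sqlite3

# Sample data from the question's update: (date, sales, product)
SALES = [
    ("2014-01-01", 100, "Fish"),
    ("2014-01-01", 105, "Potatoes"),
    ("2014-01-02", 84, "Salsa"),
]

FIRST_VALUE_QUERY = """
SELECT DISTINCT date,
       FIRST_VALUE(product) OVER (PARTITION BY date ORDER BY sales DESC) AS top_product
FROM sales
ORDER BY date
"""

ROW_NUMBER_QUERY = """
SELECT date, sales, product
FROM (SELECT s.*,
             ROW_NUMBER() OVER (PARTITION BY date ORDER BY sales DESC) AS rn
      FROM sales s) AS t
WHERE rn = 1
ORDER BY date
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (date TEXT, sales INTEGER, product TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(FIRST_VALUE_QUERY))  # [('2014-01-01', 'Potatoes'), ('2014-01-02', 'Salsa')]
print(run(ROW_NUMBER_QUERY))   # [('2014-01-01', 105, 'Potatoes'), ('2014-01-02', 84, 'Salsa')]
```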
One solution would be to use a window function over an unbounded frame, such as LAST_VALUE, sorting each date partition by sales. Here is an example:
WITH sales AS (
select DATE '2014-01-01' as date, 100 as sales, 'Fish' as product union
select DATE '2014-01-01' as date, 105 as sales, 'Potatoes' as product union
select DATE '2014-01-01' as date, 103 as sales, 'Lettuce' as product union
select DATE '2014-01-02' as date, 84 as sales, 'Salsa' as product
)
SELECT DISTINCT date, LAST_VALUE(product) OVER (
partition by date
order by sales
-- Default: https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html
-- range between unbounded preceding and current row
rows between unbounded preceding and unbounded following
) top_product
FROM sales;
# date, top_product
'2014-01-01', 'Potatoes'
'2014-01-02', 'Salsa'
I think the subselect might be easier to read (at least for me), but this is another option. You'd have to check the performance of the two, but I'd expect the analytic function (which avoids the non-indexable join) to be faster.
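Why the explicit frame matters for LAST_VALUE can be seen in a short runnable sketch, using Python's sqlite3 module as a stand-in for MySQL 8.0 (assumes SQLite 3.25 or newer; both engines default to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when the window has an ORDER BY). The rows are the ones from the answer above. With the default frame, the frame ends at the current row, so LAST_VALUE simply returns each row's own product.

```python
import sqlite3

# Sample data from the answer above: (date, sales, product)
SALES = [
    ("2014-01-01", 100, "Fish"),
    ("2014-01-01", 105, "Potatoes"),
    ("2014-01-01", 103, "Lettuce"),
    ("2014-01-02", 84, "Salsa"),
]

DEFAULT_FRAME = ""  # i.e. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
FULL_FRAME = "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"


def last_values(frame_clause):
    """LAST_VALUE(product) for each row, listed in (date, sales) order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (date TEXT, sales INTEGER, product TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)
    rows = conn.execute(
        f"""
        SELECT LAST_VALUE(product) OVER (
                   PARTITION BY date ORDER BY sales {frame_clause}) AS top_product
        FROM sales
        ORDER BY date, sales
        """
    ).fetchall()
    conn.close()
    return [top for (top,) in rows]


print(last_values(DEFAULT_FRAME))  # ['Fish', 'Lettuce', 'Potatoes', 'Salsa']
print(last_values(FULL_FRAME))     # ['Potatoes', 'Potatoes', 'Potatoes', 'Salsa']
```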

Get top item for each year

I have a table with some records. Using MySQL I am able to get a result grouped by period (year) and user, ordered by number of species in descending order.
SELECT YEAR(entry_date) AS period, uid AS user, COUNT(DISTINCT pid) AS species
FROM records
WHERE YEAR(entry_date)<YEAR(CURDATE())
GROUP BY period, uid
ORDER by period, species DESC
Please see the attached picture of the result. But what if I only want to get the TOP USER (and number of species) for EACH year (the rows marked in red)? How can I achieve that?
I can handle this later in my PHP code, but it would be nice to have it sorted out in the MySQL query already.
Thanks for your help!
If you are running MySQL 8.0, you can use RANK() to rank users within each year by their count of species, and then filter on the top record per group:
SELECT *
FROM (
SELECT
YEAR(entry_date) AS period,
uid AS user,
COUNT(DISTINCT pid) AS species,
RANK() OVER(PARTITION BY YEAR(entry_date) ORDER BY COUNT(DISTINCT pid) DESC) rn
FROM records
WHERE entry_date < DATE_FORMAT(CURRENT_DATE, '%Y-01-01')
GROUP BY period, uid
) t
WHERE rn = 1
ORDER by period
This preserves top ties, if any. Note that it uses an index-friendly filter on the dates in the WHERE clause.
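A runnable sketch showing that RANK() keeps tied top users, with Python's sqlite3 module standing in for MySQL 8.0 (assumes SQLite 3.25 or newer). SQLite has no YEAR() or DATE_FORMAT(), so strftime('%Y', ...) builds the period and a fixed cutoff date stands in for "January 1st of the current year"; the records are hypothetical. Unlike the answer, the grouping and the ranking are split into two CTEs here, which should be equivalent and keeps each step simple.

```python
import sqlite3

# Hypothetical records: (entry_date, uid, pid)
RECORDS = [
    ("2019-02-01", "u1", 1), ("2019-03-01", "u1", 2), ("2019-04-01", "u1", 2),
    ("2019-05-01", "u2", 1),
    ("2020-01-10", "u1", 1),
    ("2020-02-10", "u2", 1), ("2020-03-10", "u2", 2),
    ("2020-04-10", "u3", 3), ("2020-05-10", "u3", 4),
    ("2021-01-05", "u9", 7),  # on/after a 2021 cutoff, so excluded then
]

QUERY = """
WITH counts AS (
    SELECT strftime('%Y', entry_date) AS period,
           uid AS user,
           COUNT(DISTINCT pid) AS species
    FROM records
    WHERE entry_date < ?          -- compare the raw column: index-friendly
    GROUP BY period, uid
),
ranked AS (
    SELECT period, user, species,
           RANK() OVER (PARTITION BY period ORDER BY species DESC) AS rn
    FROM counts
)
SELECT period, user, species
FROM ranked
WHERE rn = 1
ORDER BY period, user
"""


def top_users_per_year(cutoff):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (entry_date TEXT, uid TEXT, pid INTEGER)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", RECORDS)
    rows = conn.execute(QUERY, (cutoff,)).fetchall()
    conn.close()
    return rows


print(top_users_per_year("2021-01-01"))
# [('2019', 'u1', 2), ('2020', 'u2', 2), ('2020', 'u3', 2)]
```

In 2020, u2 and u3 both have 2 distinct species, so both get rank 1; ROW_NUMBER() would keep only one of them.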
In earlier versions, an equivalent option is to filter with a HAVING clause and a correlated subquery:
SELECT
YEAR(entry_date) AS period,
uid AS user,
COUNT(DISTINCT pid) AS species
FROM records r
WHERE entry_date < DATE_FORMAT(CURRENT_DATE, '%Y-01-01')
GROUP BY period, uid
HAVING COUNT(DISTINCT pid) = (
SELECT COUNT(DISTINCT r1.pid) species1
FROM records r1
WHERE YEAR(r1.entry_date) = period
GROUP BY r1.uid
ORDER BY species1 DESC
LIMIT 1
)
ORDER by period