How to get 1 result per year - mysql

This is my code currently.
SELECT manufacturer, model, year, units_sold_m
FROM `quick-catcher-350001.Phone_Sales.Phone`
WHERE smartphone=TRUE
ORDER BY units_sold_m DESC
I would like to add a line to get only one result per year, so I can see which phone was the top seller of each year.
Thanks

Since MySQL 8.0, this can be done with ROW_NUMBER. Use a subquery to number the rows within each year, ordered by units sold descending, and then keep only the rows with row_num = 1:
SELECT sub.manufacturer, sub.model, sub.year, sub.units_sold_m
FROM
(SELECT manufacturer, model, year, units_sold_m,
ROW_NUMBER() OVER(PARTITION BY year
ORDER BY units_sold_m DESC) AS row_num
FROM yourtable
WHERE smartphone=TRUE) AS sub
WHERE row_num = 1;
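Please note this will show one row per year only, according to your requirement. If several phones tie for the highest units_sold_m in a year, which of them is returned is arbitrary.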
If your stated requirement is incorrect and you want to show all entries per year when multiple entries have the same maximum units_sold_m, you should use the MAX function instead:
SELECT manufacturer, model, year, units_sold_m
FROM yourtable
WHERE smartphone=TRUE AND (year, units_sold_m) IN
(SELECT year, MAX(units_sold_m)
FROM yourtable
WHERE smartphone=TRUE
GROUP BY year)
ORDER BY year;
Please see here the difference: db<>fiddle
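To make the difference concrete, here is a minimal sketch that runs both queries against a tiny in-memory database. It uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (an assumption; SQLite 3.25+ accepts the same window-function and row-value syntax). The table name phone and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL 8.0 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phone (manufacturer TEXT, model TEXT, year INTEGER,
                    units_sold_m REAL, smartphone INTEGER);
INSERT INTO phone VALUES
  ('A', 'a1', 2019, 50, 1),
  ('B', 'b1', 2019, 50, 1),   -- ties with a1 for 2019
  ('C', 'c1', 2019, 30, 1),
  ('A', 'a2', 2020, 40, 1),
  ('D', 'd1', 2020, 90, 0);   -- not a smartphone, so it is filtered out
""")

# Variant 1: ROW_NUMBER keeps exactly one row per year, even when there is a tie.
one_per_year = conn.execute("""
SELECT sub.year, sub.model
FROM (SELECT model, year, units_sold_m,
             ROW_NUMBER() OVER (PARTITION BY year
                                ORDER BY units_sold_m DESC) AS row_num
      FROM phone
      WHERE smartphone = 1) AS sub
WHERE row_num = 1
ORDER BY sub.year
""").fetchall()

# Variant 2: matching on MAX keeps every row that reaches the yearly maximum.
all_top_per_year = conn.execute("""
SELECT year, model
FROM phone
WHERE smartphone = 1
  AND (year, units_sold_m) IN (SELECT year, MAX(units_sold_m)
                               FROM phone
                               WHERE smartphone = 1
                               GROUP BY year)
ORDER BY year, model
""").fetchall()

print(one_per_year)      # one row for 2019 (a1 or b1) and one for 2020
print(all_top_per_year)  # both tied 2019 rows, plus the 2020 row
```

With this sample data the first query returns two rows and the second returns three, because 2019 has two phones tied at the top.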

Related

Trying to utilize a window function instead of this script

I'm trying to improve my query. I want to find the top 5 and bottom 5 growth rates per state from 2020 to 2021 in my org. The table has these columns: orderid, orderdate, totaldue, state, etc. (these are probably the most important ones). This is the query I have so far. It works, but I think it would be more efficient if I could use a window function instead.
SELECT state, SUM(TotalDue) as sum
into #temp2020
from table
where OrderDate like "2020%"
group by state
order by sum desc;
SELECT state, SUM(TotalDue) as sum
into #temp2021
from table
where OrderDate like "2021%"
group by state
order by sum desc;
--top 5 growth rates--
select #temp2020.state, ((#temp2021.sum-#temp2020.sum)/#temp2020.sum) as 'growthrate'
from #temp2020
join #temp2021 on #temp2021.state = #temp2020.state
order by growthrate desc limit 5
--bottom 5 growth rates--
select #temp2020.state, ((#temp2021.sum-#temp2020.sum)/#temp2020.sum) as 'growthrate'
from #temp2020
join #temp2021 on #temp2021.state = #temp2020.state
order by growthrate asc limit 5
drop table if exists #temp2020
drop table if exists #temp2021
You could use DENSE_RANK here:
WITH cte AS (
SELECT state, SUM(TotalDue) AS sum,
DENSE_RANK() OVER (ORDER BY SUM(TotalDue)) rnk_asc,
DENSE_RANK() OVER (ORDER BY SUM(TotalDue) DESC) rnk_desc
FROM yourTable
WHERE YEAR(OrderDate) IN (2020, 2021)
GROUP BY state
)
SELECT state, sum
FROM cte
WHERE rnk_asc <= 5 OR rnk_desc <= 5
ORDER BY state, sum;
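The query above ranks states by their combined 2020 and 2021 total. If the goal is the growth rate itself, as in the question's temp-table script, one possible approach is conditional aggregation followed by ranking. The sketch below is not from the original answer. It uses Python's built-in sqlite3 module as a stand-in database (so strftime replaces YEAR), and the table name orders, the sample rows, and n = 1 (the question uses 5) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderid INTEGER, orderdate TEXT, totaldue REAL, state TEXT);
INSERT INTO orders VALUES
  (1, '2020-03-01', 100, 'CA'), (2, '2021-03-01', 150, 'CA'),
  (3, '2020-04-01', 200, 'NY'), (4, '2021-04-01', 100, 'NY'),
  (5, '2020-05-01', 100, 'TX'), (6, '2021-05-01', 110, 'TX'),
  (7, '2020-06-01',  50, 'WA'), (8, '2020-07-01',  50, 'WA'),
  (9, '2021-06-01', 300, 'WA');
""")

n = 1  # hypothetical: how many states to keep at each end

growth_extremes = conn.execute("""
WITH totals AS (
  -- One row per state with the 2020 and 2021 sums side by side.
  SELECT state,
         SUM(CASE WHEN strftime('%Y', orderdate) = '2020' THEN totaldue END) AS sum_2020,
         SUM(CASE WHEN strftime('%Y', orderdate) = '2021' THEN totaldue END) AS sum_2021
  FROM orders
  WHERE orderdate >= '2020-01-01' AND orderdate < '2022-01-01'
  GROUP BY state
),
growth AS (
  -- Skip states without 2020 sales to avoid dividing by zero or NULL.
  SELECT state, (sum_2021 - sum_2020) * 1.0 / sum_2020 AS growthrate
  FROM totals
  WHERE sum_2020 > 0 AND sum_2021 IS NOT NULL
),
ranked AS (
  SELECT state, growthrate,
         ROW_NUMBER() OVER (ORDER BY growthrate DESC) AS rn_top,
         ROW_NUMBER() OVER (ORDER BY growthrate ASC)  AS rn_bottom
  FROM growth
)
SELECT state, growthrate
FROM ranked
WHERE rn_top <= ? OR rn_bottom <= ?
ORDER BY growthrate DESC
""", (n, n)).fetchall()

print(growth_extremes)
```

This replaces both temp tables and both final queries with a single statement. ROW_NUMBER returns exactly n rows at each end; DENSE_RANK could be swapped in if tied growth rates should all be kept.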

How to use a column in select statement which is not in aggregate function nor in group by clause? [duplicate]

This question already has answers here:
Retrieving the last record in each group - MySQL
Above is the table, and based on it I had to answer the question below in a past interview.
Q. The most recent order value for each customer?
The answer I gave in the interview:
select customerID, ordervalue, max(orderdate)
from office
group by customerID;
I know that since ordervalue is neither aggregated nor in the GROUP BY, this query will throw an error in SQL, but I want to know how to answer this question.
Interviewers have often asked me questions where I need a column in the SELECT statement that is neither in an aggregate function nor in the GROUP BY. So I want to know, in general, what the workaround is, with an example, so that I can answer these types of questions.
The workaround depends on what is being asked. For the requirements you have above, I think it makes sense to create (customerid, MAX(orderdate)) pairs.
SELECT customerid, MAX(orderdate)
FROM office
GROUP BY customerid;
Then you can use them to match the row you need from the table.
SELECT customerid, ordervalue, orderdate
FROM office
WHERE (customerid, orderdate) IN
(SELECT customerid, MAX(orderdate)
FROM office
GROUP BY customerid);
Note that this assumes there is only one order per customer per day. If there were more than one, you would see all of the most recent orders for that customer. You could also add a GROUP BY on the outer query if needed.
SELECT customerid, MAX(ordervalue), orderdate
FROM office AS tt
WHERE (customerid, orderdate) IN
(SELECT customerid, MAX(orderdate)
FROM office
GROUP BY customerid)
GROUP BY customerid, orderdate;
If the non-aggregate column you need in the SELECT is functionally dependent on the column in the GROUP BY, you can add a subquery in the SELECT.
We can extend your example by adding a name column, where the name of different customers could be the same. If you wanted name instead of ordervalue, just match the customerid of the outer query to get name.
SELECT customerid,
(SELECT name FROM office WHERE customerid=o.customerid LIMIT 1) AS name,
MAX(orderdate)
FROM office AS o
GROUP BY customerid;
You are approaching the task as follows: Aggregate all rows to get one result line per customer, showing the maximum order date and its order value. The problem with this: you'd need an aggregate function to get the value for the maximum order date. The only DBMS I know of featuring such a function is Oracle with KEEP FIRST/LAST.
So look at the task from a different angle. Don't think aggregation-wise where you could count and add up values for a group and get the minimum or maximum value over all the group's rows, because after all you just want to pick single rows. (That is, pick the top 1 row per customer.) In order to pick rows, you'll use a WHERE clause.
One option has been shown by Steve in his answer:
select *
from office
where (customerid, orderdate) in
(
select customerid, max(orderdate)
from office
group by customerid
);
This is a good, straightforward approach. (Some DBMS, though, don't support tuples in IN clauses.)
Another way to get the "best" row for a customer would be to pick those rows for which no better row exists:
select *
from office
where not exists
(
select null
from office better
where better.customerid = office.customerid
and better.orderdate > office.orderdate
);
And then there is the option to use a window function (aka analytic function) in order to get those rows. One example is to get the maximum dates along with the rows' data:
select customerid, ordervalue, orderdate
from
(
select
customerid, ordervalue, orderdate,
max(orderdate) over (partition by customerid) as max_orderdate
from office
) t
where orderdate = max_orderdate;
And with ROW_NUMBER, RANK, and DENSE_RANK there are window functions to assign numbers to your rows in the order you want. You number them such that the best rows get number 1 and pick them. The big advantage here: you can apply any order, deal with ties and not only get the top 1, but the top n rows.
select customerid, ordervalue, orderdate
from
(
select
customerid, ordervalue, orderdate,
row_number() over (partition by customerid order by orderdate desc) as rn
from office
) t
where rn = 1;
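As a quick check that these approaches agree, here is a minimal sketch that runs three of them (the tuple IN, the NOT EXISTS, and the ROW_NUMBER variants) on the same data. It uses Python's built-in sqlite3 module as a stand-in database, and the sample rows are made up, with no two orders by the same customer on the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE office (customerid INTEGER, ordervalue INTEGER, orderdate TEXT);
INSERT INTO office VALUES
  (1, 10, '2023-01-01'), (1, 20, '2023-02-01'),
  (2,  5, '2023-01-15'), (2,  7, '2023-03-01');
""")

queries = {
    # Match each row against the (customerid, latest orderdate) pairs.
    "in_max": """
        SELECT customerid, ordervalue, orderdate
        FROM office
        WHERE (customerid, orderdate) IN (SELECT customerid, MAX(orderdate)
                                          FROM office
                                          GROUP BY customerid)
        ORDER BY customerid""",
    # Keep a row only if the same customer has no later order.
    "not_exists": """
        SELECT customerid, ordervalue, orderdate
        FROM office AS o
        WHERE NOT EXISTS (SELECT NULL
                          FROM office AS better
                          WHERE better.customerid = o.customerid
                            AND better.orderdate > o.orderdate)
        ORDER BY customerid""",
    # Number each customer's rows from newest to oldest and keep number 1.
    "row_number": """
        SELECT customerid, ordervalue, orderdate
        FROM (SELECT customerid, ordervalue, orderdate,
                     ROW_NUMBER() OVER (PARTITION BY customerid
                                        ORDER BY orderdate DESC) AS rn
              FROM office) AS t
        WHERE rn = 1
        ORDER BY customerid""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return the same two rows here. They would diverge only if a customer had several orders on the latest date: the first two keep all of them, while ROW_NUMBER keeps one.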

SQL query to get count of records on previous updated date (not latest update) and group by certain column

I have table T
SELECT country, count(*), max(updated_date) from T
GROUP BY country
This will give me count of records and latest update date by country.
How do I get the count of records on the previous (second-latest) updated date for each country?
Note: the updated date is different for each country.
Basically, I want a result like this:
If you want the second-to-last update date, use window functions. DENSE_RANK numbers the distinct dates, so several rows sharing the latest date do not skip rank 2:
SELECT country, count(*),
max(updated_date),
max(case when seqnum = 2 then updated_date end) as penultimate_updated_date
FROM (SELECT t.*,
DENSE_RANK() OVER (PARTITION BY country ORDER BY updated_date DESC) as seqnum
FROM T
) t
GROUP BY country
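The question also asks for the number of records on that previous date. One way to get it, sketched here with Python's built-in sqlite3 module as a stand-in database, is to count the rows whose dense rank is 2. DENSE_RANK is used so that several rows sharing the latest date still leave rank 2 for the previous date. The column name previous_count and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (country TEXT, updated_date TEXT);
INSERT INTO t VALUES
  ('US', '2023-03-01'), ('US', '2023-03-01'),
  ('US', '2023-02-01'), ('US', '2023-02-01'), ('US', '2023-02-01'),
  ('US', '2023-01-01'),
  ('DE', '2023-05-01'),
  ('DE', '2023-04-01'), ('DE', '2023-04-01'),
  ('FR', '2023-06-01');   -- only one date, so there is no previous date
""")

previous_counts = conn.execute("""
SELECT country,
       MAX(CASE WHEN seqnum = 2 THEN updated_date END) AS previous_updated_date,
       SUM(CASE WHEN seqnum = 2 THEN 1 ELSE 0 END)     AS previous_count
FROM (SELECT country, updated_date,
             -- Distinct dates get consecutive numbers: 1 = latest, 2 = previous.
             DENSE_RANK() OVER (PARTITION BY country
                                ORDER BY updated_date DESC) AS seqnum
      FROM t) AS ranked
GROUP BY country
ORDER BY country
""").fetchall()

print(previous_counts)
```

A country with a single update date, like FR in this sample, comes back with NULL (None in Python) for the date and 0 for the count.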

Get top item for each year

I have a datatable with some records. Using mysql I am able to get a result grouped by a specific period (year) and users and ordered (in descending order) by number of species.
SELECT YEAR(entry_date) AS period, uid AS user, COUNT(DISTINCT pid) AS species
FROM records
WHERE YEAR(entry_date)<YEAR(CURDATE())
GROUP BY period, uid
ORDER by period, species DESC
Please see the attached picture of the result. But what if I only want to get the TOP USER (and number of species) for EACH year (the red marked rows)? How can I achieve that?
I am able to handle this later in my PHP code, but it would be nice to have this sorted out already in the MySQL query.
Thanks for your help!
If you are running MySQL 8.0, you can use RANK() to rank records within each year's partition by their species count, and then keep the top record per group:
SELECT *
FROM (
SELECT
YEAR(entry_date) AS period,
uid AS user,
COUNT(DISTINCT pid) AS species,
RANK() OVER(PARTITION BY YEAR(entry_date) ORDER BY COUNT(DISTINCT pid) DESC) rn
FROM records
WHERE entry_date < DATE_FORMAT(CURRENT_DATE, '%Y-01-01')
GROUP BY period, uid
) t
WHERE rn = 1
ORDER by period
This preserves top ties, if any. Note that it uses an index-friendly filter on the dates in the WHERE clause.
In earlier versions, an equivalent option is to filter with a HAVING clause and a correlated subquery:
SELECT
YEAR(entry_date) AS period,
uid AS user,
COUNT(DISTINCT pid) AS species
FROM records r
WHERE entry_date < DATE_FORMAT(CURRENT_DATE, '%Y-01-01')
GROUP BY period, uid
HAVING COUNT(DISTINCT pid) = (
SELECT COUNT(DISTINCT r1.pid) species1
FROM records r1
WHERE YEAR(r1.entry_date) = period
GROUP BY r1.uid
ORDER BY species1 DESC
LIMIT 1
)
ORDER by period
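To show how RANK keeps ties at the top, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL 8.0. It aggregates in a CTE first and ranks in a second step, which is a slightly different shape from the single-SELECT query above; strftime replaces YEAR and DATE_FORMAT, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (uid INTEGER, pid INTEGER, entry_date TEXT);
INSERT INTO records VALUES
  (1, 10, '2020-03-01'), (1, 11, '2020-04-01'), (1, 11, '2020-05-01'),
  (2, 10, '2020-06-01'),
  (1, 10, '2021-01-10'),
  (2, 10, '2021-02-10'), (2, 12, '2021-03-10'),
  (3, 11, '2021-04-10'), (3, 13, '2021-05-10');
-- A row dated today: the current year is excluded by the WHERE clause below.
INSERT INTO records VALUES (9, 1, date('now'));
""")

top_users = conn.execute("""
WITH counts AS (
  -- Species count per year and user, for completed years only.
  SELECT CAST(strftime('%Y', entry_date) AS INTEGER) AS period,
         uid AS user,
         COUNT(DISTINCT pid) AS species
  FROM records
  WHERE entry_date < strftime('%Y-01-01', 'now')
  GROUP BY period, uid
),
ranked AS (
  -- RANK gives every tied leader the number 1.
  SELECT period, user, species,
         RANK() OVER (PARTITION BY period ORDER BY species DESC) AS rn
  FROM counts
)
SELECT period, user, species
FROM ranked
WHERE rn = 1
ORDER BY period, user
""").fetchall()

print(top_users)
```

In the sample data, users 2 and 3 both have two species in 2021, so both appear in the result for that year.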

MySQL limit 5 per month

I am trying to show the 'top 5' per month by hours worked.
I have the following query:
SELECT
concat(m.firstname, " ",m.lastname) AS name,
SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(pl.end_activity,pl.start_activity)))) AS activity,
month(start_activity) AS month,
year(start_activity) AS year
FROM
log AS pl
INNER JOIN
employee AS m
ON
m.employee = pl.employee
GROUP BY
name,
year,
month
ORDER BY
year,
month,
activity
I tried limit 0,5, but it gives me only the first 5 records overall. How can I show the top 5 records for each month?
In MySQL 8.0.2 and above, we can use window functions. The ROW_NUMBER() window function assigns row numbers within a partition defined by the concatenation of year and month. Rows within each partition are ordered by activity, descending.
We can then use this result set as a derived table and keep only row numbers up to 5. This gives us 5 rows per month, those with the top activity values.
SELECT dt.*
FROM
(
SELECT
concat(m.firstname, " ",m.lastname) AS name,
SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(pl.end_activity,pl.start_activity)))) AS activity,
month(start_activity) AS month,
year(start_activity) AS year,
ROW_NUMBER() OVER (PARTITION BY CONCAT(year(start_activity), month(start_activity))
ORDER BY SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(pl.end_activity,pl.start_activity)))) DESC) AS row_no
FROM
log AS pl
INNER JOIN
employee AS m
ON
m.employee = pl.employee
GROUP BY
name,
year,
month
) AS dt
WHERE dt.row_no <= 5
ORDER BY
dt.year,
dt.month,
dt.activity
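The same top-N-per-group pattern can be tried on a small data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0, so the time arithmetic is done with strftime('%s', ...) instead of TIMEDIFF and TIME_TO_SEC. The simplified log table (employee name stored directly, no join), the sample rows, and n = 2 instead of 5 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log (employee TEXT, start_activity TEXT, end_activity TEXT);
INSERT INTO log VALUES
  ('alice', '2023-01-10 09:00:00', '2023-01-10 17:00:00'),  -- 8 h
  ('bob',   '2023-01-11 09:00:00', '2023-01-11 15:00:00'),  -- 6 h
  ('bob',   '2023-01-12 09:00:00', '2023-01-12 10:00:00'),  -- 1 h
  ('carol', '2023-01-13 09:00:00', '2023-01-13 11:00:00'),  -- 2 h
  ('bob',   '2023-02-01 09:00:00', '2023-02-01 12:00:00'),  -- 3 h
  ('carol', '2023-02-02 09:00:00', '2023-02-02 14:00:00');  -- 5 h
""")

n = 2  # hypothetical: rows to keep per month (the question uses 5)

top_per_month = conn.execute("""
WITH monthly AS (
  -- Total worked seconds per employee and month.
  SELECT employee,
         strftime('%Y', start_activity) AS yr,
         strftime('%m', start_activity) AS mon,
         SUM(strftime('%s', end_activity) - strftime('%s', start_activity)) AS seconds
  FROM log
  GROUP BY employee, yr, mon
),
ranked AS (
  -- Number the employees within each month, most hours first.
  SELECT employee, yr, mon, seconds,
         ROW_NUMBER() OVER (PARTITION BY yr, mon
                            ORDER BY seconds DESC) AS row_no
  FROM monthly
)
SELECT yr, mon, employee, seconds / 3600.0 AS hours
FROM ranked
WHERE row_no <= ?
ORDER BY yr, mon, seconds DESC
""", (n,)).fetchall()

for row in top_per_month:
    print(row)
```

Partitioning by the two columns yr and mon has the same effect as partitioning by their concatenation, and ordering by the raw number of seconds avoids sorting on a formatted time value.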