How to group by year from a concatenated column - mysql

I have a MySQL table like this, where the id is the insertion date in Ymd format concatenated with an incremental id.
| id | weight |
| 20200128001 | 100 |
| 20200601002 | 250 |
| 20201208003 | 300 |
| 20210128001 | 150 |
| 20210601002 | 200 |
| 20211208003 | 350 |
To sum 'weight' for a single year I run:
SELECT sum(weight) as weight FROM `table` WHERE id LIKE '2020%';
which in this case returns:
650
How can I get a table of weights by year, instead of running one query for every possible year? In this case the result would be:
| date | weight |
| 2020 | 650 |
| 2021 | 700 |

Use one of MySQL's string functions, such as LEFT():
SELECT LEFT(id,4) as Year, SUM(weight) as Weight
FROM `table`
GROUP BY LEFT(id,4)
ORDER BY LEFT(id,4)
And if you want to limit the results to just those two years:
SELECT LEFT(id,4) as Year, SUM(weight) as Weight
FROM `table`
WHERE LEFT(id,4) IN ('2020', '2021')
GROUP BY LEFT(id,4)
ORDER BY LEFT(id,4)
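The grouping above can be checked end to end against the sample rows from the question. This is a minimal sketch rather than the answerer's code: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, so LEFT(id,4) becomes substr(CAST(id AS TEXT), 1, 4), and the table name weights is made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weights (id INTEGER PRIMARY KEY, weight INTEGER)")
conn.executemany(
    "INSERT INTO weights (id, weight) VALUES (?, ?)",
    [
        (20200128001, 100),
        (20200601002, 250),
        (20201208003, 300),
        (20210128001, 150),
        (20210601002, 200),
        (20211208003, 350),
    ],
)

# SQLite has no LEFT(); substr() on the text form of id plays the same role.
rows = conn.execute(
    """
    SELECT substr(CAST(id AS TEXT), 1, 4) AS year,
           SUM(weight) AS total_weight
    FROM weights
    GROUP BY year
    ORDER BY year
    """
).fetchall()

for year, total_weight in rows:
    print(year, total_weight)
```

The year comes back as text here; cast it if a numeric column is needed.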

Related

SQL query to find maximum salary

I have a table like this
----------------------
| ID | Name | Salary |
| -- | --- | --- |
| 1 | A | 1000 |
| 2 | B | 4000 |
| 3 | C | 5000 |
| 4 | B | 600 |
| 5 | C | 2000 |
| 6 | A | 5000 |
| 7 | B | 4000 |
----------------------
And I want to query the maximum salary in the whole table and the maximum salary for each name. I can write two queries like:
SELECT MAX(Salary) FROM TABLE
SELECT NAME, MAX(SALARY) FROM TABLE GROUP BY NAME
Now, I want to do the same in a single query instead of two. How do I approach this?
ROLLUP can be used to give an extra row as a 'summary', and so give the overall MAX value...
SELECT name, MAX(salary)
FROM TABLE
GROUP BY name
WITH ROLLUP
ORDER BY GROUPING(name) DESC,
name
Which would yield...
| Name | Salary |
| --- | --- |
| NULL | 5000 |
| A | 5000 |
| B | 4000 |
| C | 5000 |
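Where WITH ROLLUP is not available, the same summary row can be produced with UNION ALL. The sketch below is an illustration under assumptions, not the answerer's query: it uses Python's built-in sqlite3 module (SQLite has no ROLLUP) and a made-up table name, salaries, filled with the rows from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salaries (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO salaries (id, name, salary) VALUES (?, ?, ?)",
    [
        (1, "A", 1000),
        (2, "B", 4000),
        (3, "C", 5000),
        (4, "B", 600),
        (5, "C", 2000),
        (6, "A", 5000),
        (7, "B", 4000),
    ],
)

# First branch: one summary row with a NULL name and the overall maximum.
# Second branch: the per-name maximums.
# NULL sorts first in ascending order, so the summary row leads the result.
rows = conn.execute(
    """
    SELECT NULL AS name, MAX(salary) AS max_salary FROM salaries
    UNION ALL
    SELECT name, MAX(salary) FROM salaries GROUP BY name
    ORDER BY name
    """
).fetchall()

for name, max_salary in rows:
    print(name, max_salary)
```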
Here is one way, using a window function over the grouped result:
SELECT NAME, MAX(SALARY) , max(max(salary)) over()
FROM TABLE GROUP BY NAME
You can use GROUP BY on the Name field with MAX() on the Salary field to get the maximum per name, and ORDER BY ... DESC to list the highest first.
SELECT PrimaryField, MAX(MaxField) AS MaxValue
FROM MyTable
GROUP BY PrimaryField
ORDER BY MaxValue DESC;
Explanation:
SELECT PrimaryField, MAX(MaxField) AS MaxValue: The grouping field and the largest value within each group.
FROM MyTable: The table we want.
GROUP BY PrimaryField: What we want to be treated as the grouping field.
ORDER BY MaxValue DESC;: Since we group by PrimaryField, we get only one row for each unique PrimaryField value. Selecting the bare MaxField would return the value of an arbitrary row (or raise an error under ONLY_FULL_GROUP_BY), so the aggregate is required. The ordering then puts the overall maximum in the first row.
For you specifically:
SELECT Name, MAX(Salary) AS MaxSalary
FROM TABLE
GROUP BY Name
ORDER BY MaxSalary DESC;

MySQL rank based on Avg

I have a table with these columns
-----------------------------------------------------------------------
|id | deviceId | totalMethaneInGrams | totalFeedInMinutes | date |
-----------------------------------------------------------------------
|1 |141 | 402 |305 |2020-10-13 |
|2 |141 | 410 |368 |2020-10-13 |
|3 |145 | 361 |300 |2020-10-13 |
-----------------------------------------------------------------------
Now I want to calculate the average of totalMethaneInGrams and totalFeedInMinutes for a subset of devices where date is earlier than some day, group them by deviceId, order them by avg(totalMethaneInGrams), and get a global rank of those devices based on avg(totalMethaneInGrams).
This is what I have so far:
SELECT
deviceId,
ROUND(avg(totalMethaneInGrams),2) as methane,
ROUND(avg(totalFeedInMinutes)) as feed
FROM sensor_data
WHERE
deviceId IN (141,123,145) AND date < '2020-10-14'
GROUP BY deviceId
ORDER BY methane
Now what I don't understand is how to calculate the global rank. My understanding is that we need to calculate the rank of all devices in the table, and then I can just look up my devices in the returned global dataset. Can it be done in a single query?
MySQL does not handle ranking over multiple columns well (e.g. feed within methane). A workaround is to rank each measure separately and join the results:
SELECT t.deviceId,
ROUND(AVG(totalMethaneInGrams),2) AS methane,
RANK() OVER (ORDER BY ROUND(AVG(totalMethaneInGrams),2) DESC) AS rankmethane,
MAX(feed) AS feed,
MAX(rankfeed) AS rankfeed
FROM sensor_data t
JOIN
(SELECT
deviceId,
ROUND(AVG(totalFeedInMinutes)) AS feed,
RANK() OVER (ORDER BY ROUND(AVG(totalFeedInMinutes),2) DESC) AS rankfeed
FROM sensor_data
WHERE deviceId IN (141,123,145) AND date < '2020-10-14'
GROUP BY deviceId) s
ON s.deviceId = t.deviceId
WHERE t.deviceId IN (141,123,145) AND date < '2020-10-14'
GROUP BY t.deviceId;
+----------+---------+-------------+------+----------+
| deviceId | methane | rankmethane | feed | rankfeed |
+----------+---------+-------------+------+----------+
| 141 | 406.00 | 1 | 337 | 2 |
| 145 | 361.00 | 2 | 400 | 1 |
+----------+---------+-------------+------+----------+
2 rows in set (0.002 sec)
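The question also asks for a rank computed over every device in the table, with the subset filter applied only afterwards. One way to sketch that, assuming Python's built-in sqlite3 module as a stand-in for MySQL, is to average per device first and then count how many devices have a strictly higher average. The row for device 150 is hypothetical and is added only so that the global rank differs from a rank within the subset.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sensor_data (
        id INTEGER PRIMARY KEY,
        deviceId INTEGER,
        totalMethaneInGrams INTEGER,
        totalFeedInMinutes INTEGER,
        date TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO sensor_data VALUES (?, ?, ?, ?, ?)",
    [
        (1, 141, 402, 305, "2020-10-13"),
        (2, 141, 410, 368, "2020-10-13"),
        (3, 145, 361, 300, "2020-10-13"),
        (4, 150, 500, 280, "2020-10-13"),  # hypothetical device outside the subset
    ],
)

# avgs holds one row per device across the whole table (date filter only).
# The rank is 1 + the number of devices with a strictly higher methane average,
# so tied devices share a rank. The subset filter is applied last.
rows = conn.execute(
    """
    WITH avgs AS (
        SELECT deviceId,
               AVG(totalMethaneInGrams) AS methane,
               AVG(totalFeedInMinutes) AS feed
        FROM sensor_data
        WHERE date < '2020-10-14'
        GROUP BY deviceId
    )
    SELECT a.deviceId,
           ROUND(a.methane, 2) AS methane,
           ROUND(a.feed, 2) AS feed,
           1 + (SELECT COUNT(*) FROM avgs b WHERE b.methane > a.methane)
               AS global_rank
    FROM avgs a
    WHERE a.deviceId IN (141, 123, 145)
    ORDER BY global_rank
    """
).fetchall()

for row in rows:
    print(row)
```

Device 150 takes rank 1 in the full table, so the two requested devices come back with ranks 2 and 3 rather than 1 and 2.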

How can I retrieve all the columns on a timerange aggregation?

I am currently struggling with how to aggregate my daily data into other time aggregations (weeks, months, quarters, etc.).
Here is what my raw data looks like:
| date | traffic_type | visits |
|----------|--------------|---------|
| 20180101 | 1 | 1221650 |
| 20180101 | 2 | 411424 |
| 20180101 | 4 | 108407 |
| 20180101 | 5 | 298117 |
| 20180101 | 6 | 26806 |
| 20180101 | 7 | 12033 |
| 20180101 | 8 | 80368 |
| 20180101 | 9 | 69544 |
| 20180101 | 10 | 39919 |
| 20180101 | 11 | 26291 |
| 20180102 | 1 | 1218490 |
| 20180102 | 2 | 410965 |
| 20180102 | 4 | 108037 |
| 20180102 | 5 | 297727 |
| 20180102 | 6 | 26719 |
| 20180102 | 7 | 12019 |
| 20180102 | 8 | 80074 |
First, I would like to check the sum of visits regardless of traffic_type:
SELECT date, SUM(visits) as visits_per_day
FROM visits_tbl
GROUP BY date
Here is the outcome:
| ymd | visits_per_day |
|:--------:|:--------------:|
| 20180101 | 2294563 |
| 20180102 | 2289145 |
| 20180103 | 2300367 |
| 20180104 | 2310256 |
| 20180105 | 2368098 |
| 20180106 | 2372257 |
| 20180107 | 2373863 |
| 20180108 | 2364236 |
However, if I want to find the specific day on which visits_per_day was the highest within each time aggregation (e.g. month), I struggle to retrieve the right output.
Here is what I did:
SELECT
(date div 100) as y_month, MAX(visits_per_day) as max_visit_per_day
FROM
(SELECT date, SUM(visits) as visits_per_day
FROM visits_tbl
GROUP BY date) as t1
GROUP BY
y_month
And here is the output of my query:
| y_month | max_visit_per_day |
|:-------:|:-----------------:|
| 201801 | 2435845 |
| 201802 | 2519000 |
| 201803 | 2528097 |
| 201804 | 2550645 |
However, this does not tell me the exact day on which visits_per_day was the highest.
Desired output:
| y_month | max_visit_per_day | ymd |
|:-------:|:-----------------:|:--------:|
| 201801 | 2435845 | 20180130 |
| 201802 | 2519000 | 20180220 |
| 201803 | 2528097 | 20180325 |
| 201804 | 2550645 | 20180406 |
ymd would represent the day in which the visits_per_day was the highest.
This logic would be used in a dashboard with the help of programming in order to automatically select the time aggregation.
Can someone please help me?
This is a job for the structured part of structured query language. That is, you will write some subqueries and treat them as tables.
You already know how to find the number of visits per day. Let's add the month for each day to that query (http://sqlfiddle.com/#!9/a8455e/13/0).
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
Next you need to find the largest number of daily visits in each month. (http://sqlfiddle.com/#!9/a8455e/12/0)
SELECT month, MAX(visits) max_daily_visits
FROM (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
) dayvisits
GROUP BY month
Then, the trick is retrieving the date on which that maximum occurred in each month. That requires a join. Without common table expressions (which MySQL versions before 8.0 lack) you need to repeat the first subquery. (http://sqlfiddle.com/#!9/a8455e/11/0)
SELECT detail.*
FROM (
SELECT month, MAX(visits) max_daily_visits
FROM (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
) dayvisits
GROUP BY month
) maxvisits
JOIN (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
) detail ON detail.visits = maxvisits.max_daily_visits
AND detail.month = maxvisits.month
The outline of this rather complex query helps explain it. Instead of that subquery, we'll use an imaginary table called dayvisits.
SELECT detail.*
FROM (
SELECT month, MAX(visits) max_daily_visits
FROM dayvisits
GROUP BY month
) maxvisits
JOIN dayvisits detail ON detail.visits = maxvisits.max_daily_visits
AND detail.month = maxvisits.month
You're seeking an extreme value for each month in the subquery. (This is a fairly standard sort of SQL operation.) To do that you find that value with a MAX() ... GROUP BY query. Then you join that to the subquery itself to find the other values corresponding to the extreme value.
If you have common table expressions, the query looks like this. MySQL 8.0 and later support CTEs, as does the MySQL fork MariaDB from version 10.2.
WITH dayvisits AS (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
)
SELECT dayvisits.*
FROM (
SELECT month, MAX(visits) max_daily_visits
FROM dayvisits
GROUP BY month
) maxvisits
JOIN dayvisits ON dayvisits.visits = maxvisits.max_daily_visits
AND dayvisits.month = maxvisits.month
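Because the question mentions a dashboard that switches between time aggregations, the same "find the maximum, then join back" pattern can be wrapped in a function that takes the divisor as a parameter. This is a minimal sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL (so DIV becomes integer division with /), the function name busiest_day_per_period is made up, and the visit counts are small hypothetical numbers rather than the asker's data.

```python
import sqlite3


def busiest_day_per_period(conn, divisor):
    """Return (period, max_daily_visits, date) rows.

    With Ymd integers, divisor=100 groups by month and divisor=10000 by year.
    """
    sql = """
        WITH dayvisits AS (
            SELECT date / ? AS period, date, SUM(visits) AS visits
            FROM visits_tbl
            GROUP BY date
        )
        SELECT m.period, m.max_daily_visits, d.date
        FROM (
            SELECT period, MAX(visits) AS max_daily_visits
            FROM dayvisits
            GROUP BY period
        ) AS m
        JOIN dayvisits AS d
          ON d.period = m.period AND d.visits = m.max_daily_visits
        ORDER BY m.period
    """
    # int() keeps the division an integer division in SQLite.
    return conn.execute(sql, (int(divisor),)).fetchall()


# In-memory SQLite database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits_tbl (date INTEGER, traffic_type INTEGER, visits INTEGER)"
)
conn.executemany(
    "INSERT INTO visits_tbl VALUES (?, ?, ?)",
    [
        (20180101, 1, 10), (20180101, 2, 5),
        (20180102, 1, 20), (20180102, 2, 1),
        (20180201, 1, 7), (20180201, 2, 7),
        (20180202, 1, 3), (20180202, 2, 4),
    ],
)

monthly = busiest_day_per_period(conn, 100)
yearly = busiest_day_per_period(conn, 10000)
print(monthly)
print(yearly)
```

If two days in the same period tie for the maximum, the join returns both of them.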
[Query checked on MSSQL] It is quick and efficient.
select visit_sum_day_wise.date
, visit_sum_day_wise.Max_Visits
, visit_sum_day_wise.traffic_type
, LAST_VALUE(visit_sum_day_wise.visits) OVER(PARTITION BY
visit_sum_day_wise.date ORDER BY visit_sum_day_wise.date ROWS BETWEEN
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS max_visit_per_day
from (
select visits_tbl.date , visits_tbl.visits , visits_tbl.traffic_type
,max(visits_tbl.visits ) OVER ( PARTITION BY visits_tbl.date ORDER
BY visits_tbl.date ROWS BETWEEN UNBOUNDED PRECEDING AND 0
PRECEDING) Max_visits
from visits_tbl
) as visit_sum_day_wise
where visit_sum_day_wise.visits = (select max(visits_B.visits ) from
visits_tbl visits_B where visits_B.Date = visit_sum_day_wise.date )

select sum of top 10 rows grouped by location column

I have a table of sales paired to the employee that sold it and at which location.
+---------------+----------------------+-----------+-------------+
| Units | location | name | mnt |
+---------------+----------------------+-----------+-------------+
| 5 | abc | bob | 2014-03-01 |
| 3 | abc | tim | 2014-03-01 |
| 4 | xyz | paul | 2014-03-01 |
| 1 | nyc | joe | 2014-03-01 |
+---------------+----------------------+-----------+-------------+
I want to get the stores with the highest sales (sum of units). The query should return the top 10 stores, with the units they sold ordered descending.
I tried this but only got 1 row returned, and even that looks wrong.
SELECT * FROM myTable WHERE region='NE' ORDER BY SUM(units) LIMIT 10
FYI: there are additional columns in the table that I have omitted as they don't add much value to the question. One such column is the region column that is used in the WHERE clause.
Try this:
SELECT location, SUM(units) AS total_units FROM myTable WHERE region='NE' GROUP BY location ORDER BY total_units DESC LIMIT 10
Something like this:
SELECT location, SUM(Units) FROM myTable WHERE region='NE' GROUP BY location ORDER BY SUM(Units) DESC LIMIT 10
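The asker's attempt returns a single row because SUM() without GROUP BY collapses the whole table into one group. Summing per location and sorting in descending order can be checked against the sample rows. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The region values are assumptions (the question does not show that column), and the 'SW' row is hypothetical, added only to show the filter taking effect.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (Units INTEGER, location TEXT, name TEXT, mnt TEXT, region TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?, ?)",
    [
        (5, "abc", "bob", "2014-03-01", "NE"),
        (3, "abc", "tim", "2014-03-01", "NE"),
        (4, "xyz", "paul", "2014-03-01", "NE"),
        (1, "nyc", "joe", "2014-03-01", "NE"),
        (9, "lax", "ann", "2014-03-01", "SW"),  # hypothetical row in another region
    ],
)

# One row per location, largest total first, at most ten rows.
rows = conn.execute(
    """
    SELECT location, SUM(Units) AS total_units
    FROM myTable
    WHERE region = 'NE'
    GROUP BY location
    ORDER BY total_units DESC
    LIMIT 10
    """
).fetchall()

for location, total_units in rows:
    print(location, total_units)
```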

How to calculate multiple moving averages in MySQL

Using the table below, how would I get columns for a 5-period moving average, a 10-period moving average, and a 5-period exponential moving average?
+--------+------------+
| price | data_date |
+--------+------------+
| 122.29 | 2009-10-08 |
| 122.78 | 2009-10-07 |
| 121.35 | 2009-10-06 |
| 119.75 | 2009-10-05 |
| 119.02 | 2009-10-02 |
| 117.90 | 2009-10-01 |
| 119.61 | 2009-09-30 |
| 118.81 | 2009-09-29 |
| 119.33 | 2009-09-28 |
| 121.08 | 2009-09-25 |
+--------+------------+
The 5-row moving average in your example won't work. The LIMIT operator applies to the return set, not the rows being considered for the aggregates, so changing it makes no difference to the aggregate values.
SELECT AVG(a.price) FROM (SELECT price FROM t1 WHERE data_date <= ? ORDER BY data_date DESC LIMIT 5) AS a;
Replace ? with the date whose MA you need.
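The point about LIMIT can be seen directly on the sample prices. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL. It compares an aggregate with LIMIT applied to its single result row against the derived-table form, where LIMIT picks the five latest rows before they are averaged.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (price REAL, data_date TEXT)")
conn.executemany(
    "INSERT INTO t1 (price, data_date) VALUES (?, ?)",
    [
        (122.29, "2009-10-08"),
        (122.78, "2009-10-07"),
        (121.35, "2009-10-06"),
        (119.75, "2009-10-05"),
        (119.02, "2009-10-02"),
        (117.90, "2009-10-01"),
        (119.61, "2009-09-30"),
        (118.81, "2009-09-29"),
        (119.33, "2009-09-28"),
        (121.08, "2009-09-25"),
    ],
)

# LIMIT only trims the one-row result, so all ten prices are averaged.
avg_all = conn.execute("SELECT AVG(price) FROM t1 LIMIT 5").fetchone()[0]

# The derived table keeps the five latest rows first, then AVG sees only those.
ma5 = conn.execute(
    """
    SELECT AVG(a.price)
    FROM (
        SELECT price FROM t1
        WHERE data_date <= ?
        ORDER BY data_date DESC
        LIMIT 5
    ) AS a
    """,
    ("2009-10-08",),
).fetchone()[0]

print(avg_all, ma5)
```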
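Alternatively, compute the average for every date with a correlated subquery:
SELECT t1.data_date,
( SELECT SUM(t2.price) / COUNT(t2.price) FROM mytable AS t2 WHERE DATEDIFF(t1.data_date, t2.data_date) BETWEEN 0 AND 6 ) AS MA5
FROM mytable AS t1 ORDER BY t1.data_date;
Change 6 to 13 for a 10-day MA. The window is measured in calendar days, so 7 and 14 calendar days stand in for roughly 5 and 10 trading days.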
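The question also asks for a 5-period exponential moving average, which the queries above do not cover. Each EMA value depends on the previous one, so one option is to compute it in application code after fetching the prices in date order. The sketch below is plain Python under stated assumptions: the smoothing factor alpha = 2 / (n + 1) is a common convention, seeding the series with the first price is one of several possible choices, and the function names sma and ema are made up for the example.

```python
def sma(prices, n):
    """Simple moving average over the last n rows; None until n rows exist."""
    out = []
    for i in range(len(prices)):
        if i + 1 < n:
            out.append(None)
        else:
            window = prices[i + 1 - n : i + 1]
            out.append(sum(window) / n)
    return out


def ema(prices, n):
    """Exponential moving average with alpha = 2 / (n + 1).

    The first value is seeded with the first price (an assumption).
    """
    alpha = 2 / (n + 1)
    out = []
    for price in prices:
        if not out:
            out.append(price)
        else:
            out.append(alpha * price + (1 - alpha) * out[-1])
    return out


# Sample prices from the question, oldest first (2009-09-25 to 2009-10-08).
prices = [121.08, 119.33, 118.81, 119.61, 117.90,
          119.02, 119.75, 121.35, 122.78, 122.29]

ma5 = sma(prices, 5)
ma10 = sma(prices, 10)
ema5 = ema(prices, 5)

for row in zip(prices, ma5, ma10, ema5):
    print(row)
```

Unlike the DATEDIFF version, these windows count rows rather than calendar days, so weekends and holidays do not shrink them.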