MySQL / BigQuery - Weighted Average & Group By


I am trying to calculate a weighted average over a dataset and return the maximum value for each month over a 12-month period, along with its corresponding ticket description.
I'm aware that there are tons of questions out there addressing similar problems, but I have yet to find a solution that combines the syntaxes I believe are required.
Here's some sample table data:
Month_Begin_Date  Priority  ticket_about_tag  Phone_Time  Occurances
--------------------------------------------------------------------
2019-02-01        Urgent    Power Bill        22.42       36
2019-02-01        Normal    Power Bill        3.41        89
2019-05-01        Normal    Wifi Issue        45.32       12
Here's my current query for determining the weighted average:
SELECT Month_Begin_Date,
SUM(phone_time * occurances) / SUM(occurances) AS Weighted_Average_Phone_Time
FROM database
GROUP BY month_begin_date
This returns the weighted average total for all ticket_about_tags, monthly.
But I still need to get this so that it displays the maximum weighted average grouped by ticket description. I.e. something that looks like this:
Month_Begin_Date  ticket_about_tag  Weighted_average_phone_time
---------------------------------------------------------------
2019-01-01        Power Bill        22.42
2019-02-01        Power Bill        3.41
2019-03-01        Wifi Issue        45.32
I've tried adding this as a subquery into another query in order to return the data I'm after, like so:
SELECT Month_Begin_date, Ticket_About_Tag, Phone_Average_Handle_Time
FROM database WHERE CONCAT(month_begin_date,phone_time) IN
(SELECT CONCAT (Month_Begin_Date,
(sum(phone_time * occurances))/sum(occurances)) AS Weighted_Average_Phone_Time
FROM database
GROUP BY month_begin_date
)
ORDER BY month_begin_date ASC
Thanks very much for any assistance

Not sure I got your question right, but using the following data:
Month_Begin_Date  Priority  Ticket_About_Tag  Phone_Time  Occurences
--------------------------------------------------------------------
2019-02-01        Urgent    Power Bill        22.42       36
2019-02-01        Normal    Power Bill        3.41        89
2019-05-01        Normal    Wifi Issue        45.32       12
2019-02-01        Urgent    Wifi Issue        14.2        7
2019-02-01        Normal    Wifi Issue        30.7        5
Is this the query you're after?
SELECT
Month_Begin_Date, Ticket_About_Tag,
SUM(Phone_Time * Occurences) / SUM(Occurences) AS Weighted_Average_Phone_Time
FROM `database`
GROUP BY Month_Begin_Date, Ticket_About_Tag
ORDER BY Month_Begin_Date ASC, Ticket_About_Tag ASC;
That gives you a result like the one you posted:
Month_Begin_Date  Ticket_About_Tag  Weighted_Average_Phone_Time
---------------------------------------------------------------
2019-02-01        Power Bill        8.884880083084106
2019-02-01        Wifi Issue        21.075000206629436
2019-05-01        Wifi Issue        45.31999969482422
Response to your comment
To return only the tag with the highest weighted average in each month, you could use:
SELECT
a.Month_Begin_Date,
a.Ticket_About_Tag,
b.Max_Weighted_Average_Phone_Time
FROM (
SELECT
Month_Begin_Date,
Ticket_About_Tag,
SUM(Phone_Time * Occurences) / SUM(Occurences) AS Weighted_Average_Phone_Time
FROM `database`
GROUP BY Month_Begin_Date, Ticket_About_Tag
) a
LEFT JOIN (
SELECT
b1.Month_Begin_Date,
MAX(b1.Weighted_Average_Phone_Time) AS Max_Weighted_Average_Phone_Time
FROM (
SELECT
Month_Begin_Date,
Ticket_About_Tag,
SUM(Phone_Time * Occurences) / SUM(Occurences) AS Weighted_Average_Phone_Time
FROM `database`
GROUP BY Month_Begin_Date, Ticket_About_Tag
) b1
GROUP BY b1.Month_Begin_Date
) b ON a.Month_Begin_Date = b.Month_Begin_Date
WHERE a.Weighted_Average_Phone_Time = b.Max_Weighted_Average_Phone_Time
That gives you the following output:
Month_Begin_Date  Ticket_About_Tag  Max_Weighted_Average_Phone_Time
-------------------------------------------------------------------
2019-02-01        Wifi Issue        21.075000206629436
2019-05-01        Wifi Issue        45.31999969482422
There are other ways of doing this, but I think this is by far the easiest way to understand without using other SQL constructs. It reflects your need of going through the same data twice, first to aggregate by month and ticket tag, then to find the maximum of the aggregate data by month.
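One of those other ways is to compute the per-tag aggregate once in a common table expression and reuse it, instead of repeating the subquery. The sketch below runs that variant against the sample rows using Python's built-in sqlite3 module rather than MySQL or BigQuery, purely so it can be executed anywhere. The table name tickets and the names per_tag, per_month and wavg are placeholders chosen for this illustration. In MySQL itself, WITH needs version 8.0 or later.

```python
import sqlite3

# In-memory database holding the five sample rows from the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (Month_Begin_Date TEXT, Priority TEXT, "
    "Ticket_About_Tag TEXT, Phone_Time REAL, Occurences INTEGER)"
)
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?, ?, ?)",
    [
        ("2019-02-01", "Urgent", "Power Bill", 22.42, 36),
        ("2019-02-01", "Normal", "Power Bill", 3.41, 89),
        ("2019-05-01", "Normal", "Wifi Issue", 45.32, 12),
        ("2019-02-01", "Urgent", "Wifi Issue", 14.2, 7),
        ("2019-02-01", "Normal", "Wifi Issue", 30.7, 5),
    ],
)

query = """
WITH per_tag AS (
    -- pass 1: weighted average per month and tag
    SELECT Month_Begin_Date, Ticket_About_Tag,
           SUM(Phone_Time * Occurences) / SUM(Occurences) AS wavg
    FROM tickets
    GROUP BY Month_Begin_Date, Ticket_About_Tag
),
per_month AS (
    -- pass 2: highest weighted average per month
    SELECT Month_Begin_Date, MAX(wavg) AS max_wavg
    FROM per_tag
    GROUP BY Month_Begin_Date
)
SELECT t.Month_Begin_Date, t.Ticket_About_Tag, t.wavg
FROM per_tag AS t
JOIN per_month AS m
  ON m.Month_Begin_Date = t.Month_Begin_Date
 AND m.max_wavg = t.wavg
ORDER BY t.Month_Begin_Date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two tags tie for the maximum in a month, both rows are returned, which is also how the join-based query above behaves.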

Related

MySQL query to identify groups of data based on timestamp

I have smart meter records in a MySQL database.
In timestamp order, the records generally look as follows:
key     timestamp            watt now
-------------------------------------
000001  2022-10-04-01-01-01  10
000002  2022-10-04-01-02-01  10
000003  2022-10-04-01-03-01  101
000004  2022-10-04-01-04-01  101
000005  2022-10-04-01-05-01  102
000006  2022-10-04-01-06-01  101
000007  2022-10-04-01-07-01  102
000008  2022-10-04-01-08-01  10
000009  2022-10-04-01-09-01  10
000010  2022-10-04-01-09-01  10
000011  2022-10-04-01-09-01  107
000012  2022-10-04-01-09-01  101
000013  2022-10-04-01-09-01  109
000014  2022-10-04-01-09-01  10
000015  2022-10-04-01-09-01  10
I want to identify the groups of consecutive rows with larger values (let's say > 100) and give them an increasing id. For each group I also want the first and last key id.
The result of the query should look like this:
month  day  numbers of group  first id  last id  average watt
-------------------------------------------------------------
10     04   0                 000003    000007   102
10     04   1                 000011    0000013  105
Any help appreciated.

You'll need something to identify them as a group. My first thought was to use RANK() or DENSE_RANK(), but after multiple tries I couldn't find a way. Then I thought about using LAG(), but I was still stuck on how to mark the rows as a new group. After a lot of testing, I came up with this suggestion:
WITH cte AS (
SELECT s1.*,
@n := COALESCE(IF(s1.skey=1,1,s2.skey), @n) As newGroup
FROM smartmeter s1
LEFT JOIN (
SELECT skey,
stimestamp,
watt,
LENGTH(watt) AS lenwatt,
LAG(LENGTH(watt)) OVER (ORDER BY skey) llwatt
FROM smartmeter) s2 ON s1.skey=s2.skey
AND lenwatt != llwatt)
SELECT MONTH(stimestamp) AS Month,
DAY(stimestamp) AS Day,
ROW_NUMBER() OVER (ORDER BY MIN(skey)) AS 'numbers of group',
MIN(skey) AS 'first id',
MAX(skey) AS 'last id',
AVG(watt) AS 'Average watt',
CEIL(AVG(watt)) AS 'Average watt rounded',
newGroup
FROM cte
WHERE watt >= 100
GROUP BY newGroup, MONTH(stimestamp), DAY(stimestamp)
By the way, I've changed some of your column names because key is a reserved word. You can still use it as a column name as long as you wrap it in backticks, but I personally find it a hassle to do that every time.
OK, so my idea was to use LENGTH(watt) with ORDER BY skey in the LAG() function. Rows where the length differs from the previous row's length mark the starting point of each new group. I then left join that result with the smartmeter table. The next challenge was to give every remaining row the skey of its most recent starting row; I found this answer and applied it in the cte.
Once that is done, I just write another query to produce your expected result, although some parts of it are not exactly what you expected.
Here's a demo fiddle
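To check the grouping logic independently of the database, the same "runs of consecutive high readings" idea can be sketched in plain Python. This is not the SQL technique used above; it swaps in itertools.groupby on the threshold test, and the function name high_groups and the threshold of 100 are assumptions made for the illustration. The readings are the key and watt columns from the question.

```python
from itertools import groupby

# (key, watt) pairs from the question, already in key order.
readings = [
    (1, 10), (2, 10), (3, 101), (4, 101), (5, 102), (6, 101), (7, 102),
    (8, 10), (9, 10), (10, 10), (11, 107), (12, 101), (13, 109),
    (14, 10), (15, 10),
]


def high_groups(rows, threshold=100):
    """Return one summary dict per run of consecutive rows above threshold."""
    groups = []
    # groupby splits the sequence every time the "is high" flag flips.
    for is_high, run in groupby(rows, key=lambda r: r[1] > threshold):
        if not is_high:
            continue
        run = list(run)
        watts = [watt for _, watt in run]
        groups.append({
            "group": len(groups),          # increasing id, starting at 0
            "first_id": run[0][0],
            "last_id": run[-1][0],
            "avg_watt": sum(watts) / len(watts),
        })
    return groups


groups = high_groups(readings)
for g in groups:
    print(g)
```

On the sample data this finds two groups, keys 3 to 7 and keys 11 to 13, which can be compared with the first id and last id columns returned by the query.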

MYSQL has 2 tables. Same dates total. Compare and subtract dates

tablo1                   tablo2
---------------------    ---------------------
fiyat1     tarih1        fiyat2     tarih2
-------    ----------    -------    ----------
1200       03-2017       2100       03-2017
1050       03-2017       5200       03-2017
3250       04-2017       3200       04-2017
2501       04-2017
6100       05-2017
1100       05-2017
I want to sum fiyat1 for each date in tablo1, sum fiyat2 for each date in tablo2,
subtract the second total from the first, and group the result by date.
I want to print something like this:
-----------------------
05-2017 7200
04-2017 2511
03-2017 -5050
The query runs, but the result is wrong. This is what I tried:
SELECT tablo1.tarih1,
tablo1.fiyat1,
SUM(tablo1.fiyat1),
tablo2.tarih2,
tablo2.fiyat2,
SUM(tablo1.fiyat1),
(SUM(tablo1.fiyat1) - SUM(tablo2.fiyat2)) AS sonuc
FROM tablo1 INNER JOIN
tablo2 ON tablo1.tarih1 = tablo2.tarih2
GROUP BY tablo1.tarih1

With the table structure being as it is, the query that can be written to get the desired result is:
SELECT t1.tarih1, (COALESCE(t1.fiyat1, 0) - COALESCE(t2.fiyat2, 0)) AS sonuc
FROM
(SELECT tarih1, SUM(fiyat1) AS fiyat1
FROM tablo1
GROUP BY tarih1
) AS t1
LEFT JOIN
(SELECT tarih2, SUM(fiyat2) AS fiyat2
FROM tablo2
GROUP BY tarih2
) AS t2
ON t1.tarih1 = t2.tarih2
ORDER BY t1.tarih1 DESC;
However, I'd like to offer a couple of suggestions:
It's generally a good idea to store the date in MySQL date format: YYYY-MM-DD. It'll be much easier for you to run yearly reports, if there ever was a need.
As far as book-keeping is concerned, maybe you'll find the following Q&A to be of your interest: What is a common way to save 'debit' and 'credit' information?
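To see why the totals have to be computed per table before joining, here is a small sketch that loads the sample rows into an in-memory SQLite database via Python's sqlite3 module (an assumption made so the example is runnable; the SQL is the same shape as the MySQL above). It runs both the aggregate-then-join query and a cut-down version of the join-then-sum attempt. The variable names correct and naive are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablo1 (fiyat1 INTEGER, tarih1 TEXT);
CREATE TABLE tablo2 (fiyat2 INTEGER, tarih2 TEXT);
INSERT INTO tablo1 VALUES
  (1200, '03-2017'), (1050, '03-2017'),
  (3250, '04-2017'), (2501, '04-2017'),
  (6100, '05-2017'), (1100, '05-2017');
INSERT INTO tablo2 VALUES
  (2100, '03-2017'), (5200, '03-2017'), (3200, '04-2017');
""")

# Aggregate each table first, then join one row per date to one row per date.
correct = conn.execute("""
SELECT t1.tarih1, COALESCE(t1.fiyat1, 0) - COALESCE(t2.fiyat2, 0) AS sonuc
FROM (SELECT tarih1, SUM(fiyat1) AS fiyat1 FROM tablo1 GROUP BY tarih1) AS t1
LEFT JOIN
     (SELECT tarih2, SUM(fiyat2) AS fiyat2 FROM tablo2 GROUP BY tarih2) AS t2
  ON t1.tarih1 = t2.tarih2
ORDER BY t1.tarih1 DESC
""").fetchall()

# Join first, then sum: every tablo1 row is repeated once per matching
# tablo2 row (and vice versa), so both sums are inflated, and dates that
# exist only in tablo1 are dropped by the INNER JOIN.
naive = conn.execute("""
SELECT tablo1.tarih1, SUM(tablo1.fiyat1) - SUM(tablo2.fiyat2) AS sonuc
FROM tablo1
INNER JOIN tablo2 ON tablo1.tarih1 = tablo2.tarih2
GROUP BY tablo1.tarih1
ORDER BY tablo1.tarih1 DESC
""").fetchall()

print("aggregate then join:", correct)
print("join then aggregate:", naive)
```

With these rows the first query yields 7200, 2551 and -5050 for 05-2017, 04-2017 and 03-2017, while the second loses 05-2017 entirely and reports inflated differences for the other two months.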

MYSQL query - cross tab? Union? Join? Select? What should I be looking for?

Not sure what exactly it is I should be looking for, so I'm reaching out for help.
I have two tables that through queries I need to spit out one. the two tables are as follows:
Transactions:
TransactionID  SiteID  EmployeeName
520            2       Michael
521            3       Gene
TransactionResponse:
TransactionID  PromptMessage     Response  PromptID
520            Enter Odometer    4500      14
520            Enter Vehicle ID  345       13
521            Enter Odometer    5427      14
521            Enter Vehicle ID  346       13
But what I need is the following, let's call it TransactionSummary:
TransactionID  SiteID  EmployeeName  'Odometer'  'VehicleID'
520            2       Michael       4500        345
521            3       Gene          5427        346
The "PromptID" column is the number version of "PromptMessage" so I could query off that if it's easier.
A good direction for what this query would be called is the least I'm hoping for. True extra credit for working examples or even using this provided example would be awesome!

For a predefined number of possible PromptID values you can use something like the following query:
SELECT t.TransactionID, t.SiteID, t.EmployeeName,
MAX(CASE WHEN PromptID = 13 THEN Response END) AS 'VehicleID',
MAX(CASE WHEN PromptID = 14 THEN Response END) AS 'Odometer'
FROM Transactions AS t
LEFT JOIN TransactionResponse AS tr
ON t.TransactionID = tr.TransactionID
GROUP BY t.TransactionID, t.SiteID, t.EmployeeName
The above query uses what is called conditional aggregation: a CASE expression is used within an aggregate function, so as to conditionally account for a subset of records within a group.
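As a runnable illustration of conditional aggregation, the sketch below loads the two sample tables into an in-memory SQLite database through Python's sqlite3 module (chosen here only so the example runs without a MySQL server) and pivots the responses. It joins on TransactionID alone, since the TransactionResponse table shown in the question has no SiteID column; storing Response as TEXT is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (
    TransactionID INTEGER, SiteID INTEGER, EmployeeName TEXT);
CREATE TABLE TransactionResponse (
    TransactionID INTEGER, PromptMessage TEXT, Response TEXT, PromptID INTEGER);
INSERT INTO Transactions VALUES (520, 2, 'Michael'), (521, 3, 'Gene');
INSERT INTO TransactionResponse VALUES
  (520, 'Enter Odometer',   '4500', 14),
  (520, 'Enter Vehicle ID', '345',  13),
  (521, 'Enter Odometer',   '5427', 14),
  (521, 'Enter Vehicle ID', '346',  13);
""")

# Each CASE yields the Response only for the wanted PromptID and NULL
# otherwise; MAX() ignores NULLs, leaving one value per transaction.
summary = conn.execute("""
SELECT t.TransactionID, t.SiteID, t.EmployeeName,
       MAX(CASE WHEN tr.PromptID = 13 THEN tr.Response END) AS VehicleID,
       MAX(CASE WHEN tr.PromptID = 14 THEN tr.Response END) AS Odometer
FROM Transactions AS t
LEFT JOIN TransactionResponse AS tr
       ON t.TransactionID = tr.TransactionID
GROUP BY t.TransactionID, t.SiteID, t.EmployeeName
ORDER BY t.TransactionID
""").fetchall()

for row in summary:
    print(row)
```

Each additional prompt needs its own MAX(CASE ...) column, which is why this approach suits a fixed, known set of PromptID values.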

Break up monthly data into daily

I have budget data for a company in the following monthly format. SqlFiddle link here.
Dept#  YearMonth  Budget($)
---------------------------
001    201301     100
001    201302     110
001    201303     105
..     .....      ...
002    201301     200
...    .....      ...
I am required to break this up into daily records, which would look like this:
Dept#  Date      Budget($)
--------------------------
001    20130101  xxx
001    20130102  xxx
001    20130103  xxx
..     .....     ...
I need to generate daily records from each record in the source table. I don't want to assume that each month has 30 days. How do I determine the actual number of days for each month and break it up in the format shown above?
I appreciate any kind of help. Thanks!

Try:
with cte as
(select [dept#], [YearMonth], convert(datetime,[YearMonth]+'01',112) [Date], [Budget($)]
from budget
union all
select [dept#], [YearMonth], dateadd(d, 1, [Date]) [Date], [Budget($)]
from cte
where datediff(m,[Date],dateadd(d, 1, [Date]))=0
)
select [dept#], [Date],
1.0*[Budget($)] / count(*) over (partition by [dept#], [YearMonth]) [DailyBudget($)]
from cte
order by 1,2
(Multiplying by 1.0 converts the integer budget to a decimal value; otherwise integer division would truncate the daily rate to whole dollars. This is not necessary if the budget datatype is already held as something like numeric(10,2).)
(SQLFiddle here)
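The same split can be sketched outside the database to sanity-check the day counts. The example below is plain Python, not T-SQL; it uses the standard calendar.monthrange to get the real number of days in each month, and the function name daily_budget is made up for this illustration. It assumes YearMonth is a six-character string such as '201302'.

```python
import calendar
from datetime import date


def daily_budget(dept, year_month, budget):
    """Spread one monthly budget evenly over the actual days of that month."""
    year, month = int(year_month[:4]), int(year_month[4:6])
    # monthrange returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    per_day = budget / days_in_month
    return [
        (dept, date(year, month, day).strftime("%Y%m%d"), per_day)
        for day in range(1, days_in_month + 1)
    ]


february = daily_budget("001", "201302", 110)
print(len(february), february[0], february[-1])
```

February 2013 produces 28 rows and January 31, so no month length is hard-coded. Any rounding of the daily amount is left to the caller.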

Need help with MySQL query getting results to average for year y and y+1

I have a MySQL query:
SELECT px.player, px.pos, px.year, px.age, px.gp, px.goals, px.assists
, 1000 - ABS(p1.gp - px.gp) - ABS(p1.goals - px.goals) - ABS(p1.assists - px.assists) sim
FROM hockey p1
JOIN hockey px
ON px.player <> p1.player
WHERE p1.player = 'John Smith'
AND p1.year = 2010
HAVING sim >= 900
ORDER BY sim DESC
This gets me a table of results, something like this:
player   pos  year  age  gp  goals  assists  sim
Player1  LW   2002  25   75  29     32       961
Player2  LW   2000  27   82  29     27       956
Player3  RW   2000  27   78  29     33       955
Player4  LW   2009  26   82  30     30       940
Player5  RW   2001  25   79  33     24       938
Player6  LW   2008  25   82  23     24       936
Player7  LW   2006  27   79  26     33       932
Instead, I would like it to do two things. Average the data and add a player count, so I get something like:
players  age  gp  goals  assists  sim
7        26   79  28     29       945
I tried avg(px.age), avg(px.gp), avg(px.goals)...etc but I am running into errors with my "sim" formula.
Second issue is that underneath that, I would like to have the average of the data for the FOLLOWING year. In other words data from Player1 in 2003, data from Player2 in 2001, etc.
I am stuck as to HOW to get the data to average AND to get it for the following year.
Can anyone help me with either or both of these issues?

To get a single summary row of counts and averages, wrap your original query as the inner select, something like this (pq = "PreQuery" select result):
Select
max( "Tot Players" ) Players,
max( "->" ) position,
count(*) Year,
avg( pq.age ) AvgAge,
avg( pq.gp ) AvgGP,
avg( pq.goals ) AvgGoals,
avg( pq.assists ) AvgAssists,
avg( pq.sim ) AvgSim
from
( SELECT
px.player,
px.pos,
px.year,
px.age,
px.gp,
px.goals,
px.assists,
1000 - ABS(p1.gp - px.gp)
- ABS(p1.goals - px.goals)
- ABS(p1.assists - px.assists) sim
FROM
hockey p1
JOIN hockey px ON px.player <> p1.player
WHERE
p1.player = 'John Smith'
AND p1.year = 2010
HAVING
sim >= 900
ORDER BY
sim DESC ) pq
If your original query worked, this should give you your overall averages. However, the HAVING and ORDER BY in the INNER query might cause a problem. You can drop the ORDER BY, since it makes no difference to the outermost query. The HAVING clause in the INNER query might need to be moved to a WHERE pq.sim >= 900 in the OUTER select.
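A minimal sketch of that wrapped form, with the similarity filter moved to the outer query, is shown below. It runs on SQLite through Python's sqlite3 module rather than MySQL, and the four hockey rows are invented purely for the illustration (the question does not show John Smith's own stats), so the numbers are not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hockey (player TEXT, pos TEXT, year INTEGER, age INTEGER, "
    "gp INTEGER, goals INTEGER, assists INTEGER)"
)
# Hypothetical rows: one reference season plus three candidate seasons.
conn.executemany(
    "INSERT INTO hockey VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("John Smith", "LW", 2010, 26, 80, 30, 30),
        ("Player A", "LW", 2002, 25, 75, 29, 32),   # sim = 992
        ("Player B", "RW", 2000, 27, 82, 29, 27),   # sim = 994
        ("Player C", "C", 2005, 30, 20, 2, 3),      # sim = 885, filtered out
    ],
)

averages = conn.execute("""
SELECT COUNT(*)        AS players,
       AVG(pq.age)     AS AvgAge,
       AVG(pq.gp)      AS AvgGP,
       AVG(pq.goals)   AS AvgGoals,
       AVG(pq.assists) AS AvgAssists,
       AVG(pq.sim)     AS AvgSim
FROM (
    SELECT px.player, px.age, px.gp, px.goals, px.assists,
           1000 - ABS(p1.gp - px.gp)
                - ABS(p1.goals - px.goals)
                - ABS(p1.assists - px.assists) AS sim
    FROM hockey p1
    JOIN hockey px ON px.player <> p1.player
    WHERE p1.player = 'John Smith'
      AND p1.year = 2010
) pq
WHERE pq.sim >= 900   -- filter applied in the outer query, not via HAVING
""").fetchone()

print(averages)
```

Because sim is an ordinary column of the derived table pq, the outer query can both filter on it and average it without repeating the formula.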
Additionally, if you want the rows for all players first and THEN the total, combine your original query with this one using a UNION. To keep the columns of both queries in sync, I've put placeholder values in the player and position columns so the UNION won't fail on mismatched columns. Notice that my COUNT column lines up with the YEAR column of the ORIGINAL query.
For the prior year... As Rob mentioned, you would just do a UNION of the two queries, each restricted to the respective year you are qualifying for.
EDIT --- CLARIFICATION for 2nd YEAR....
Per your subsequent comment, you would have to use year + 1 as the basis. If you then want the overall averages again, wrap the result in an outer query with max / avg, etc. But I think THIS is what you want for the subsequent year per player:
SELECT
PrimaryQry.PrimaryPlayer,
PrimaryQry.PrimaryPos,
PrimaryQry.PrimaryYear,
PrimaryQry.PrimaryAge,
PrimaryQry.PrimaryGP,
PrimaryQry.PrimaryGoals,
PrimaryQry.PrimaryAssists,
PrimaryQry.player,
PrimaryQry.pos,
PrimaryQry.year,
PrimaryQry.age,
PrimaryQry.gp,
PrimaryQry.goals,
PrimaryQry.assists,
PrimaryQry.sim,
p2.pos PrimaryPos2,
p2.year PrimaryYear2,
p2.age PrimaryAge2,
p2.gp PrimaryGP2,
p2.goals PrimaryGoals2,
p2.assists PrimaryAssists2,
px2.player player2,
px2.pos pos2,
px2.year year2,
px2.age age2,
px2.gp gp2,
px2.goals goals2,
px2.assists assists2,
1000 - ABS(p2.gp - px2.gp)
- ABS(p2.goals - px2.goals)
- ABS(p2.assists - px2.assists) sim2
FROM
( SELECT
p1.player PrimaryPlayer,
p1.pos PrimaryPos,
p1.year PrimaryYear,
p1.age PrimaryAge,
p1.gp PrimaryGP,
p1.goals PrimaryGoals,
p1.assists PrimaryAssists,
px.player,
px.pos,
px.year,
px.age,
px.gp,
px.goals,
px.assists,
1000 - ABS(p1.gp - px.gp)
- ABS(p1.goals - px.goals)
- ABS(p1.assists - px.assists) sim
FROM
hockey p1
JOIN hockey px
ON p1.player <> px.player
WHERE
p1.player = 'John Smith'
AND p1.year = 2010
HAVING
sim >= 900 ) PrimaryQry
JOIN hockey p2
ON PrimaryQry.PrimaryPlayer = p2.player
AND PrimaryQry.PrimaryYear +1 = p2.year
JOIN hockey px2
ON PrimaryQry.Player = px2.Player
AND PrimaryQry.Year +1 = px2.year
If you follow the logic here, you already know the inner query is returning about 10 other players. So, I am keeping the stats of the primary player IN that query too. THEN, I join that result set back to the hockey table TWICE. The first join matches the primary player to his/her own row for year + 1; the SECOND join matches each qualifying player to that same player's row for year + 1. Each result row therefore holds the first-year comparison followed by the second-year comparison, all on one row, such as:
John Smith 2010 Compare Person 1 YearA John Smith 2011 Compare Person 1 YearA+1
John Smith 2010 Compare Person 2 YearB John Smith 2011 Compare Person 2 YearB+1
John Smith 2010 Compare Person 3 YearC John Smith 2011 Compare Person 3 YearC+1

What query are you using to get the averages?
Just applying "AVG" to your expression for 'sim' should work in mysql. e.g.
AVG(1000 - ABS(p1.gp - px.gp) - ABS(p1.goals - px.goals) - ABS(p1.assists - px.assists)) sim
To aggregate over different years, I think there is no alternative to using a subselect or union.
Reference:
http://dev.mysql.com/doc/refman/5.0/en/subqueries.html
http://dev.mysql.com/doc/refman/5.0/en/union.html
Something like:
(ORIGINAL AVG QUERY)
UNION ALL
(ORIGINAL AVG QUERY WITH NEW YEAR)
should do the trick.
(Note that your original query selects data from every year to compare it to the data for John Smith in 2010, which may not be what you want.)
