MySQL - Calculate SUM(field) per year from UNION

I have an SQL UNION query that reads from a large table (deaths_all) and returns the year (the etos column) and the deaths for that year (sunoloThanatwn) for 3 different scenarios, one per branch of the union.
For every year, 3 rows are correctly returned. I want the sum of sunoloThanatwn instead, so that I get one SUM(sunoloThanatwn) per year rather than 3 rows per year.
SQL UNION Query:
(
SELECT etos, sunoloThanatwn
FROM deaths_all
WHERE field = "A"
GROUP BY etos
)
UNION
(
SELECT etos, sunoloThanatwn
FROM deaths_all
WHERE field = "B"
GROUP BY etos
)
UNION
(
SELECT etos, sunoloThanatwn
FROM deaths_all
WHERE field = "C"
GROUP BY etos
)
ORDER BY etos
The query result is the following (I need a sum per year):

Just do a single aggregation query:
SELECT etos, SUM(sunoloThanatwn) AS total
FROM deaths_all
WHERE field IN ('A', 'B', 'C')
GROUP BY etos;
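A minimal sketch of why the single aggregation is enough, run against an in-memory SQLite database standing in for MySQL. The sample rows and the helper name deaths_per_year are made up for illustration; only the table and column names come from the question.

```python
import sqlite3


def deaths_per_year(rows):
    """Return [(etos, total)] using one GROUP BY instead of a three-way UNION."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deaths_all (etos INTEGER, field TEXT, sunoloThanatwn INTEGER)"
    )
    conn.executemany("INSERT INTO deaths_all VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT etos, SUM(sunoloThanatwn) AS total
        FROM deaths_all
        WHERE field IN ('A', 'B', 'C')
        GROUP BY etos
        ORDER BY etos
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample rows: (etos, field, sunoloThanatwn)
sample = [
    (2000, "A", 10), (2000, "B", 20), (2000, "C", 30),
    (2001, "A", 1), (2001, "B", 2), (2001, "C", 3),
    (2001, "D", 100),  # filtered out by the WHERE clause
]
print(deaths_per_year(sample))
```

Each year collapses to a single row, and rows whose field is not A, B or C never reach the sum.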

Related

Add row with zero value in MySQL when record is not found for a particular month [duplicate]

It was hard to find a good title for my question.
I have 3 tables: materials, orders and order_contents.
There are 5 different types of materials in the materials table
The orders table contains the dates for the orders. Orders currently span over 4 months.
The orders are filled with materials in the table called order_contents.
I am trying to get the overall cost per month for materials and display them in a highchart.
Here's the query I run:
SELECT m.name, CONCAT(MONTH(o.order_date), '/', YEAR(o.order_date)) as `month`, SUM(oc.weight * m.price) AS cost
FROM order_contents oc
INNER JOIN orders o ON oc.order_id = o.id
INNER JOIN materials m ON oc.material_id = m.id
GROUP BY MONTH(o.order_date), m.id
ORDER BY m.name, order_date ASC
Here are the results:
The problem is that if a material isn't used in a particular month, no record is generated for it (obviously). So when I loop through the results and build the Highcharts data series, the missing months aren't filled with zero. For example, the material Big Bag is only consumed in January 2022, but since it's the only entry in its data series, it gets mapped to the first month, which is August. I can add logic to fix this problem, but I thought I'd ask here first whether there is a way to rewrite this query to yield the results I'm looking for.
Here's what I'd like to get:
I'm way out of my league here on SQL capabilities for this sort of problem.

Here is a (probably weird) idea:
For a SQL table:
create table temp
(
month int,
year int,
name varchar(16), -- something like material-type
count int
)
We could run:
select const_year.year, const_month.month, const_name.name, ifnull(count, 0)
from (select 1 month union
select 2 union
select 3 union
select 4 union
select 5 union
select 6 union
select 7 union
select 8 union
select 9 union
select 10 union
select 11 union
select 12) const_month -- now we have a list contains 12 months
left join (select 2020 year union
select 2021 union
select 2022 ) const_year
on true -- now we have a table contains all months between those years
left join (select distinct temp.name as name
from temp) const_name
on true -- join with all distinct names/types
left join (select temp.name as name, temp.year as year, temp.month as month, sum(count) as count
from temp
group by temp.year, temp.month, temp.name -- here is the real query for statistic
) statistic
on statistic.year = const_year.year
and statistic.month = const_month.month
and statistic.name = const_name.name
order by name, year, month -- order results if we need
There are almost certainly better solutions than this, though it works in some cases.
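The same idea can be tried end to end with a small sketch: build every year/month/name combination first, then left join the real statistics so that missing combinations come back as 0. This runs on an in-memory SQLite database rather than MySQL, and the table name usage_log, the column qty and the sample rows are assumptions made for the example.

```python
import sqlite3


def monthly_totals_with_zeros(rows, years):
    """rows: (year, month, name, qty) tuples. Returns one row per year/month/name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usage_log (year INTEGER, month INTEGER, name TEXT, qty INTEGER)"
    )
    conn.executemany("INSERT INTO usage_log VALUES (?, ?, ?, ?)", rows)
    # Constant tables play the role of the "select 1 union select 2 ..." lists.
    conn.execute("CREATE TABLE const_month (month INTEGER)")
    conn.executemany("INSERT INTO const_month VALUES (?)", [(m,) for m in range(1, 13)])
    conn.execute("CREATE TABLE const_year (year INTEGER)")
    conn.executemany("INSERT INTO const_year VALUES (?)", [(y,) for y in years])
    result = conn.execute(
        """
        SELECT y.year, m.month, n.name, IFNULL(s.total, 0)
        FROM const_year y
        CROSS JOIN const_month m
        CROSS JOIN (SELECT DISTINCT name FROM usage_log) n
        LEFT JOIN (
            SELECT year, month, name, SUM(qty) AS total
            FROM usage_log
            GROUP BY year, month, name
        ) s ON s.year = y.year AND s.month = m.month AND s.name = n.name
        ORDER BY n.name, y.year, m.month
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample rows: (year, month, name, qty)
usage = [(2022, 1, "Big Bag", 5), (2022, 1, "Big Bag", 2), (2021, 8, "Sand", 3)]
filled = monthly_totals_with_zeros(usage, [2021, 2022])
print(len(filled), filled[0])
```

With two names, two years and twelve months the result always has 48 rows, whether or not a material was used in a given month.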

Adding blank rows to display of result set returned by MySQL query

I am storing hourly results in a MySQL database table which take the form:
ResultId,CreatedDateTime,Keyword,Frequency,PositiveResult,NegativeResult
349,2015-07-17 00:00:00,Homer Simpson,0.0,0.0,0.0
349,2015-07-17 01:00:00,Homer Simpson,3.0,4.0,-2.0
349,2015-07-17 01:00:00,Homer Simpson,1.0,1.0,-1.0
349,2015-07-17 04:00:00,Homer Simpson,1.0,1.0,0.0
349,2015-07-17 05:00:00,Homer Simpson,8.0,3.0,-2.0
349,2015-07-17 05:00:00,Homer Simpson,1.0,0.0,0.0
There might be several results for a given hour, but none for certain hours.
If I want to produce averages of the hourly results, I can do something like this:
SELECT ItemCreatedDateTime AS 'Created on',
KeywordText AS 'Keyword', ROUND(AVG(KeywordFrequency), 2) AS 'Average frequency',
ROUND(AVG(PositiveResult), 2) AS 'Average positive result',
ROUND(AVG(NegativeResult), 2) AS 'Average negative result'
FROM Results
WHERE ResultsNo = 349 AND CreatedDateTime BETWEEN '2015-07-13 00:00:00' AND '2015-07-19 23:59:00'
GROUP BY KeywordText, CreatedDateTime
ORDER BY KeywordText, CreatedDateTime
However, the results only include the hours where data exists, e.g.:
349,2015-07-17 01:00:00,Homer Simpson,2.0,2.5,-1.5
349,2015-07-17 04:00:00,Homer Simpson,1.0,1.0,0.0
349,2015-07-17 05:00:00,Homer Simpson,4.5,1.5,-1.0
But I need to show blank rows for the missing hours, e.g.:
349,2015-07-17 01:00:00,Homer Simpson,2.0,2.5,-1.5
349,2015-07-17 02:00:00,Homer Simpson,0.0,0.0,0.0
349,2015-07-17 03:00:00,Homer Simpson,0.0,0.0,0.0
349,2015-07-17 04:00:00,Homer Simpson,1.0,1.0,0.0
349,2015-07-17 05:00:00,Homer Simpson,4.5,1.5,-1.0
Short of inserting blanks into the results before they are presented, I am uncertain of how to proceed: can I use MySQL to include the blank rows at all?

SQL in general has no knowledge about the data, so you have to add that yourself. In this case you will have to account for the unused hours somehow. This can be done by inserting empty rows or, alternatively, by counting the hours and adjusting your average accordingly.
Counting the hours and adjusting the average:
Count all hours with data (A)
Calculate the number of hours in the period (B)
Calculate the average as you already did, then multiply it by A and divide by B
Example code to get the hours:
SELECT KeywordText,
COUNT(DISTINCT CreatedDateTime) AS number_of_hours_with_data,
(TO_SECONDS('2015-07-19 23:59:00')-TO_SECONDS('2015-07-13 00:00:00'))/3600
AS number_of_hours_in_interval
FROM Results
WHERE ResultsNo = 349 AND CreatedDateTime
BETWEEN '2015-07-13 00:00:00' AND '2015-07-19 23:59:00'
GROUP BY KeywordText;
And just integrate it with the rest of your query.
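The arithmetic behind the A and B adjustment can be checked with a short sketch. It assumes the input is one averaged value per hour that has data, and the function name zero_filled_average is made up for the example. The sample values are the average frequencies for hours 01:00, 04:00 and 05:00 from the question.

```python
def zero_filled_average(hourly_values, hours_in_period):
    """Average over the whole period, counting hours without data as 0."""
    hours_with_data = len(hourly_values)  # A
    if hours_with_data == 0 or hours_in_period <= 0:
        return 0.0
    avg_over_data = sum(hourly_values) / hours_with_data
    # avg * A / B: rescale from "hours with data" to "all hours in the period"
    return avg_over_data * hours_with_data / hours_in_period


# Hours 01:00 to 05:00 are 5 hours (B); only 3 of them have data (A).
adjusted = zero_filled_average([2.0, 1.0, 4.5], 5)
print(adjusted)
```

The result matches what you would get by inserting zero rows for 02:00 and 03:00 and averaging all five hours directly.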

You can't use MySQL for that. You'll have to do this with whatever you're using later to process the results. Iterate over the range of hours/dates you're interested in and, for those where MySQL returned some data, use that data. For the rest, just add null/zero values.
Small update after some discussions with my Stack Overflow colleagues:
Instead of "you can't" I should have written "you shouldn't": as other users have shown, there are ways to do this. But I still believe that for different tasks we should use tools that were created with such tasks in mind. And by that I mean that while it's probably possible to tow a car with an F-16, it's still better to just call a tow truck ;) That's what tow trucks are made for.
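A minimal sketch of that client-side approach, assuming the query results have already been loaded into a dict keyed by the hour. The function name fill_missing_hours and the dict layout are assumptions; the sample values are the averaged rows from the question.

```python
from datetime import datetime, timedelta


def fill_missing_hours(rows, start, end, blank=(0.0, 0.0, 0.0)):
    """rows maps an on-the-hour datetime to a tuple of averages.

    Returns a list of (hour, values) covering every hour from start to end
    inclusive, using `blank` where the database returned nothing.
    """
    filled = []
    current = start
    while current <= end:
        filled.append((current, rows.get(current, blank)))
        current += timedelta(hours=1)
    return filled


# Averages returned by the query: (frequency, positive, negative)
db_rows = {
    datetime(2015, 7, 17, 1): (2.0, 2.5, -1.5),
    datetime(2015, 7, 17, 4): (1.0, 1.0, 0.0),
    datetime(2015, 7, 17, 5): (4.5, 1.5, -1.0),
}
series = fill_missing_hours(db_rows, datetime(2015, 7, 17, 1), datetime(2015, 7, 17, 5))
for hour, values in series:
    print(hour, values)
```

The hours 02:00 and 03:00 appear with zero values, while hours that had data keep what the query returned.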

Although you already have accepted an answer I want to demonstrate how you can generate a datetime series in the query and use that to solve your problem.
This query uses a combination of cross joins together with basic arithmetic and date functions to generate a series of all hours between 2015-07-16 00:00:00 AND 2015-07-18 23:59:00.
Generating this type of data on the fly isn't the best option though; if you already had a table with the numbers 0-31 then all the union queries would be unnecessary.
See this SQL Fiddle to see how it could look using a small number table.
Sample SQL Fiddle with a demo of the query below
select
c.createddate as "Created on",
c.Keyword,
coalesce(ROUND(AVG(KeywordFrequency), 2),0.0) AS 'Average frequency',
coalesce(ROUND(AVG(PositiveResult), 2),0.0) AS 'Average positive result',
coalesce(ROUND(AVG(NegativeResult), 2),0.0) AS 'Average negative result'
from (
select
q.createddate + interval d day + interval t hour as createddate,
d.KeywordText AS 'Keyword'
from (
select distinct h10*10+h1 d from (
select 0 as h10
union all select 1 union all select 2 union all select 3
) d10 cross join (
select 0 as h1
union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9
) d1
) days cross join (
select distinct t10*10 + t1 t from (
select 0 as t10 union all select 1 union all select 2
) h10 cross join (
select 0 as t1
union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9
) h1
) hours
cross join
-- use the following line to set the start date for the series
(select '2015-07-16 00:00:00' createddate) q
-- or use the line below to use the dates in the table
-- (select distinct cast(CreatedDateTime as date) CreatedDate from results) q
cross join (select distinct KeywordText from results) d
) c
left join results r on r.CreatedDateTime = c.createddate AND ResultsNo = 349 and r.KeywordText = c.Keyword
where c.createddate BETWEEN '2015-07-16 00:00:00' AND '2015-07-18 23:59:00'
GROUP BY c.createddate, Keyword
ORDER BY c.createddate, Keyword;

I came up with an idea: append rows of null values to the end of your MySQL query result.
Just run this query (set the LIMIT to the number of empty rows you want), and ignore the last column:
SELECT ItemCreatedDateTime AS 'Created on',
KeywordText AS 'Keyword',
ROUND(AVG(KeywordFrequency), 2) AS 'Average frequency',
ROUND(AVG(PositiveResult), 2) AS 'Average positive result',
ROUND(AVG(NegativeResult), 2) AS 'Average negative result',
null
FROM Results
WHERE ResultsNo = 349 AND CreatedDateTime BETWEEN '2015-07-13 00:00:00' AND
'2015-07-19 23:59:00'
GROUP BY KeywordText, CreatedDateTime
UNION
SELECT * FROM (
SELECT null a, null b, null c, null d, null e,
(@cnt := @cnt + 1) f
FROM (SELECT null FROM Results LIMIT 23) empty1
LEFT JOIN (SELECT * FROM Results LIMIT 23) empty2 ON FALSE
JOIN (SELECT @cnt := 0) empty3
) empty
ORDER BY KeywordText, CreatedDateTime

One MySQL query to get AVG by different Groupings?

Wondering if there is a way to write the following in ONE MySQL query.
I have a table:
cust_ID | rpt_name | req_secs
In the query I'd like to get:
the AVG req_secs when grouped by cust_ID
the AVG req_secs when grouped by rpt_name
the total req_secs AVG
I know I can do separate grouping queries on the same table then UNION the results into one. But I was hoping there was some way to do it in one query.
Thanks.

Well, the following does two out of three:
select n,
(case when n = 1 then cast(cust_id as char(255)) else rpt_name end) as `grouping`,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as char(255)) else rpt_name end);
This essentially "doubles" the data and then does the aggregation for each group. This assumes that cust_id and rpt_name are of compatible types. (The query could be tweaked if this is not the case.)
Actually, you can get the overall average by using rollup:
select n,
(case when n = 1 then cast(cust_id as char(255)) else rpt_name end) as `grouping`,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as char(255)) else rpt_name end) with rollup
This works for average because the average is the same on the "doubled" data as for the original data. It wouldn't work for sum() or count().
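A small sketch of the "doubling" trick and of why it is safe for averages but not for sums. It uses an in-memory SQLite database in place of MySQL (SQLite has no WITH ROLLUP, so the overall figures are computed by a separate query over the doubled rows), and the three sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cust_id INTEGER, rpt_name TEXT, req_secs REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "daily", 2.0), (1, "weekly", 4.0), (2, "daily", 6.0)],  # hypothetical data
)

# Every row appears twice: once tagged n = 1 (grouped by customer)
# and once tagged n = 2 (grouped by report name).
per_group = conn.execute(
    """
    SELECT k.n,
           CASE WHEN k.n = 1 THEN CAST(t.cust_id AS TEXT) ELSE t.rpt_name END,
           AVG(t.req_secs)
    FROM t CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) AS k
    GROUP BY k.n, CASE WHEN k.n = 1 THEN CAST(t.cust_id AS TEXT) ELSE t.rpt_name END
    ORDER BY 1, 2
    """
).fetchall()

# Aggregates over the doubled rows versus the original rows.
avg_doubled, sum_doubled = conn.execute(
    "SELECT AVG(req_secs), SUM(req_secs) "
    "FROM t CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) AS k"
).fetchone()
avg_plain, sum_plain = conn.execute(
    "SELECT AVG(req_secs), SUM(req_secs) FROM t"
).fetchone()
conn.close()

print(per_group)
print(avg_doubled, avg_plain, sum_doubled, sum_plain)
```

The average is unchanged by the duplication, while the sum over the doubled rows is twice the true total.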

No there is not. You can group by a combination of cust_ID and rpt_name at the same time (i.e. two levels of grouping) but you are not going to be able to do separate top-level groupings and then a non-grouped aggregation at the same time.

Because of the way GROUP BY works, the SQL to do this is a little tricky. One way to get the result is to get three copies of the rows, and group each set of rows separately.
SELECT g.gkey
, IF(g.gkey='cust_id',t.cust_ID,IF(g.gkey='rpt_name',t.rpt_name,'')) AS gval
, AVG(t.req_secs) AS avg_req_secs
FROM (SELECT 'cust_id' AS gkey UNION ALL SELECT 'rpt_name' UNION ALL SELECT 'total') g
CROSS
JOIN mytable t
GROUP
BY g.gkey
, IF(g.gkey='cust_id',t.cust_ID,IF(g.gkey='rpt_name',t.rpt_name,''))
The inline view aliased as "g" doesn't have to use UNION ALL operators, you just need a rowset that returns exactly 3 rows with distinct values. I just used the UNION ALL as a convenient way to return three literal values as a rowset, so I could join that to the original table.

sql multiple columns plus sum of each column

Using MySQL, I am counting the occurrences of several events (fields) over a time span of years. I then display this in columns by year. My query works perfectly when grouped by year. I now want to add a final column which displays the aggregate over the years. How do I include the column totals in the query?
Event  2008  2009  2010  2011  total
A         0     2     0     1      3
B         1     2     3     0      6
etc.
Here is the real query:
select
count(*) as total_docs,
YEAR(field_document_date_value) as doc_year,
field_document_facility_id_value as facility,
IF(count(IF(field_document_type_value ='LIC809',1, NULL)) >0,count(IF(field_document_type_value ='LIC809',1, NULL)),'-') as doc_type_LIC809,
IF(count(IF(field_document_type_value ='LIC9099',1, NULL)) >0,count(IF(field_document_type_value ='LIC9099',1, NULL)),'-') as doc_type_LIC9099,
IF(count(field_document_f1_value) >0,count(field_document_f1_value),'-') as substantial_compliance,
IF(count(field_document_f2_value) >0,count(field_document_f2_value),'-') as deficiencies_sited,
IF(count(field_document_f3_value) >0,count(field_document_f3_value),'-') as admin_outcome_809,
IF(count(field_document_f4_value) >0,count(field_document_f4_value),'-') as unfounded,
IF(count(field_document_f5_value) >0,count(field_document_f5_value),'-') as substantiated,
IF(count(field_document_f6_value) >0,count(field_document_f6_value),'-') as inconclusive,
IF(count(field_document_f7_value) >0,count(field_document_f7_value),'-') as further_investigation,
IF(count(field_document_f8_value) >0,count(field_document_f8_value),'-') as admin_outcome_9099,
IF(count(field_document_type_a_value) >0,count(field_document_type_a_value),'-') as penalty_type_a,
IF(count(field_document_type_b_value) >0,count(field_document_type_b_value),'-') as penalty_type_b,
IF(sum(field_document_civil_penalties_value) >0,CONCAT('$',sum(field_document_civil_penalties_value)),'-') as total_penalties,
IF(count(field_document_noncompliance_value) >0,count(field_document_noncompliance_value),'-') as total_noncompliance
from rcfe_content_type_facility_document
where YEAR(field_document_date_value) BETWEEN year(NOW()) -9 AND year(NOW())
and field_document_facility_id_value = :facility
group by doc_year

You cannot GROUP rows twice in a single SELECT, so you can count rows either per year or in total, but not both. You can UNION two SELECTs (one grouped by year, the second not grouped, giving the total) to overcome this limitation, but I think it is better to compute the total from the yearly results in your script, if you have one.
Simplified example:
SELECT by_year.amount, years.date_year FROM
-- generating years pseudo table
(
SELECT 2008 AS date_year
UNION ALL SELECT 2009
UNION ALL SELECT 2010
UNION ALL SELECT 2011
) AS years
-- joining with yearly stat data
LEFT JOIN
(
SELECT SUM(value_field) AS amount, YEAR(date_field) AS date_year FROM data
GROUP BY YEAR(date_field)
) AS by_year USING(date_year)
-- appending total
UNION ALL SELECT SUM(value_field) AS amount, 'total' AS date_year FROM data
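Computing the total in the script, as suggested above, can be as small as the sketch below. The nested-dict layout and the function name add_total_column are assumptions; the sample counts are the ones from the table in the question.

```python
def add_total_column(yearly_counts):
    """yearly_counts: {event: {year: count}}. Adds a 'total' entry per event."""
    return {
        event: {**by_year, "total": sum(by_year.values())}
        for event, by_year in yearly_counts.items()
    }


counts = {
    "A": {2008: 0, 2009: 2, 2010: 0, 2011: 1},
    "B": {2008: 1, 2009: 2, 2010: 3, 2011: 0},
}
with_totals = add_total_column(counts)
print(with_totals["A"]["total"], with_totals["B"]["total"])
```

This keeps the SQL query grouped by year only and leaves the cross-year aggregate to the presentation layer.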

WITH ROLLUP is your friend:
http://dev.mysql.com/doc/refman/5.7/en/group-by-modifiers.html
Use your original query and simply change its last line to:
GROUP BY doc_year WITH ROLLUP
That will add a final cumulative row to your query's result set.

Mysql- Aggregate function query grouping problem

Consider the following table.
I'm trying to write a query to display the max value for each part per category, along with the date when the value was at its max.
So I tried this:
select Part_id, Category, max(Value), Time_Captured
from data_table
where category = 'Temperature'
group by Part_id, Category
First of all, MySQL didn't throw an error for not having Time_Captured in the GROUP BY.
Not sure if it's a problem with MySQL in general or with my MySQL setup.
I assumed it would return:
1 Temperature 50 11-08-2011 08:00
2 Temperature 70 11-08-2011 09:00
But it's returning the time captured from the first record of the data set, i.e. 11-08-2011 07:00
Not sure where I'm going wrong. Any thoughts?
(Note: I'm running this inside a VM. Just in case if it changes anything)

You need to join to the results of a query that finds the max(value), like this:
select dt.Part_id, dt.Category, dt.Value, dt.Time_Captured
from data_table dt
join (select Part_id, Category, max(Value) as Value
from data_table group by 1, 2) x
on x.Part_id = dt.Part_id and x.Category = dt.Category and x.Value = dt.Value
where dt.category = 'Temperature';
Note that this will return multiple rows if there are multiple rows with the same max value.
If you want to limit this to one row even though there are multiple matches for max(value), select the max(Time_Captured) (or min(Time_Captured) if you prefer), like this:
select dt.Part_id, dt.Category, dt.Value, max(dt.Time_Captured) as Time_Captured
from data_table dt
join (select Part_id, Category, max(Value) as Value
from data_table group by 1, 2) x
on x.Part_id = dt.Part_id and x.Category = dt.Category and x.Value = dt.Value
where dt.category = 'Temperature'
group by 1, 2, 3;
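A runnable sketch of the join-on-max pattern, in which the join matches on the max value itself as well as on the grouping columns, so that Time_Captured comes from the row that actually holds the maximum. It uses an in-memory SQLite database instead of MySQL, and the sample rows are invented to match the values mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_table (Part_id INTEGER, Category TEXT, Value INTEGER, Time_Captured TEXT)"
)
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?, ?, ?)",
    [
        # Hypothetical rows, chosen so that the maxima match the question.
        (1, "Temperature", 40, "11-08-2011 07:00"),
        (1, "Temperature", 50, "11-08-2011 08:00"),
        (2, "Temperature", 70, "11-08-2011 09:00"),
        (2, "Temperature", 60, "11-08-2011 10:00"),
        (1, "Pressure", 99, "11-08-2011 07:00"),
    ],
)

max_rows = conn.execute(
    """
    SELECT dt.Part_id, dt.Category, dt.Value, dt.Time_Captured
    FROM data_table dt
    JOIN (SELECT Part_id, Category, MAX(Value) AS Value
          FROM data_table
          GROUP BY Part_id, Category) x
      ON x.Part_id = dt.Part_id
     AND x.Category = dt.Category
     AND x.Value = dt.Value
    WHERE dt.Category = 'Temperature'
    ORDER BY dt.Part_id
    """
).fetchall()
conn.close()

print(max_rows)
```

Each part now reports the time at which its maximum temperature was recorded rather than the time of an arbitrary row in the group.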
