Calculating Average over different intervals - MySQL

In MySQL, I am calculating averages of the same metric over different intervals (3 day, 7 day, 30 day, 60 day, etc.), and I need the results to be on a single line per id.
Currently, I am using a Join per each interval. Given that I have to compute this for many different stores, and over several different intervals, is there a cleaner and/or more efficient way of accomplishing this?
Below is the code I am currently using.
Thanks in advance for the help.
SELECT T1.id, T1.DailySales_3DayAvg, T2.DailySales_7DayAvg
FROM(
SELECT id, avg(DailySales) as 'DailySales_3DayAvg'
FROM `SalesTable`
WHERE `Store`=2
AND `Date` >= DATE_SUB('2012-07-28', INTERVAL 3 DAY)
AND `Date` < '2012-07-28'
GROUP BY id
) AS T1
JOIN(
SELECT id, avg(DailySales) as 'DailySales_7DayAvg'
FROM `SalesTable`
WHERE `Store`=2
AND `Date` >= DATE_SUB('2012-07-28', INTERVAL 7 DAY)
AND `Date` < '2012-07-28'
GROUP BY id
) AS T2
ON T1.id = T2.id
Where the results are:
id     DailySales_3DayAvg   DailySales_7DayAvg
3752   1234.56              1114.78
...

You can use a query like this -
SELECT
id,
SUM(IF(date >= '2012-07-28' - INTERVAL 3 DAY, DailySales, 0)) /
COUNT(IF(date >= '2012-07-28' - INTERVAL 3 DAY, 1, NULL)) 'DailySales_3DayAvg',
SUM(IF(date >= '2012-07-28' - INTERVAL 7 DAY, DailySales, 0)) /
COUNT(IF(date >= '2012-07-28' - INTERVAL 7 DAY, 1, NULL)) 'DailySales_7DayAvg'
FROM
SalesTable
WHERE
Store = 2 AND Date < '2012-07-28'
GROUP BY
id
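Since AVG() ignores NULLs, each SUM/COUNT pair can also be collapsed into a single AVG(IF(...)); an equivalent sketch:
SELECT
    id,
    AVG(IF(Date >= '2012-07-28' - INTERVAL 3 DAY, DailySales, NULL)) AS 'DailySales_3DayAvg',
    AVG(IF(Date >= '2012-07-28' - INTERVAL 7 DAY, DailySales, NULL)) AS 'DailySales_7DayAvg'
FROM SalesTable
WHERE Store = 2 AND Date < '2012-07-28'
GROUP BY id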

I don't think you can do this in any other way if you want to pull real-time data. However, if you can afford displaying slightly outdated data, you could pre-calculate these averages (like once or twice a day) for each item.
You may want to look into the Event Scheduler, which allows you to keep everything inside MySQL.
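For example, a minimal sketch (daily_sales_avg is a hypothetical cache table with PRIMARY KEY (id); the scheduler must be enabled with SET GLOBAL event_scheduler = ON):
CREATE EVENT refresh_daily_sales_avg
ON SCHEDULE EVERY 12 HOUR
DO
  -- REPLACE relies on the primary key of the hypothetical cache table
  REPLACE INTO daily_sales_avg (id, DailySales_3DayAvg, DailySales_7DayAvg)
  SELECT
      id,
      AVG(IF(Date >= CURDATE() - INTERVAL 3 DAY, DailySales, NULL)),
      AVG(IF(Date >= CURDATE() - INTERVAL 7 DAY, DailySales, NULL))
  FROM SalesTable
  WHERE Store = 2 AND Date < CURDATE()
  GROUP BY id;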

Related

Working with a MariaDB view that generates a lot of calculated statistics - How to move to a calculated table?

I currently have a MariaDB database that gets populated every day with different products (around 800) and also gets the price updates for these products.
I've created a view on top of the prices/products tables that generates statistics such as the min, max, mean and mode for the last 7, 15 and 30 days, and calculates the difference between today's price and the 7, 15 and 30 day averages.
The problem is that whenever I run this view it takes almost 50 seconds to generate the data. I saw some comments about switching over to a calculated table, in which the calculations would be updated when new data is entered into the table. However, I'm quite skeptical about doing that, as I'm inserting around 1000 price points at one specific time of day, which will impact all the calculations on the table. Is a calculated table something that updates only the rows that were updated, or would it recalculate everything? I'm worried about the overhead this might cause (memory is not an issue with the server).
I've pasted the products and prices tables and the view to DBFiddle, here: https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=4cf594a85f950bed34f64d800601baa9
Calculations can be seen for product code 22141
Just to give an idea these are some of the calculations done by the view (available on the fiddle as well):
ROUND((((SELECT preconormal
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 9 HOUR) / (SELECT AVG(preconormal)
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 7 DAY) - 1) * 100), 2) as dif_7_dias,
ROUND((((SELECT preconormal
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 9 HOUR) / (SELECT AVG(preconormal)
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 15 DAY) - 1) * 100), 2) as dif_15_dias,
ROUND((((SELECT preconormal
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 9 HOUR) / (SELECT AVG(preconormal)
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 30 DAY) - 1) * 100), 2) as dif_30_dias
If switching to a calculated table, is there an optimal way to do this?
A "calculated table" isn't a MySQL / MariaDB feature. So I guess you mean another table derived from your raw data, that you use when you need those statistics.
You say the table is "populated every day...". Do you mean it's reloaded from scratch, or do you mean 800 more rows are added? By "every day", do you mean at a particular time of day, or ongoing throughout the day?
Do you always have to select all rows from your view, or can you sometimes do SELECT columns FROM view WHERE something = 'constant';? This matters because optimization techniques differ between the all-rows case and the few-rows case.
How can you handle this problem efficiently?
You could work to optimize the query used to define your view, making it faster. That is very likely a good approach.
MariaDB has a type of column known as a Persistent Computed Column. These are computed when rows are INSERTED or UPDATED. Then they are available for quick reference. But they have limitations; they cannot be defined with subqueries.
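For the syntax only, a sketch (the column name and expression are invented for illustration; as noted, the subquery-based statistics above cannot be expressed this way):
ALTER TABLE precos
  ADD COLUMN preco_com_iva DOUBLE AS (preconormal * 1.23) PERSISTENT; -- hypothetical derived column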
You could define an EVENT (a scheduled SQL job) to do the following:
1. Create a new, empty "calculated" table with a name like tbl_new.
2. Use your (slow) view to insert the rows it needs.
3. Roll over your tables, so the new one replaces the current one and you keep a couple of older ones. This will give you a brief window where tbl doesn't exist:
DROP TABLE IF EXISTS tbl_old_2;
RENAME TABLE tbl_old TO tbl_old_2, tbl TO tbl_old, tbl_new TO tbl;
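Put together, the event might look something like this sketch (your_slow_view stands in for the view name, which isn't shown in the question):
DELIMITER //
CREATE EVENT rebuild_calculated_table
ON SCHEDULE EVERY 1 DAY
DO
BEGIN
  -- rebuild from scratch using the existing (slow) view
  DROP TABLE IF EXISTS tbl_new;
  CREATE TABLE tbl_new AS SELECT * FROM your_slow_view;
  -- roll the tables over
  DROP TABLE IF EXISTS tbl_old_2;
  RENAME TABLE tbl_old TO tbl_old_2, tbl TO tbl_old, tbl_new TO tbl;
END //
DELIMITER ;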
That's a whole boatload of correlated subqueries, crying out for appropriate indexing.
For a reasonable number of rows being returned by the query, the correlated subqueries can give reasonable performance. But if the outer query is returning thousands of rows, that will be thousands of executions of the subqueries.
I would tend to avoid running multiple SELECT against the same table, to get the last 7 days, the last 15 days, the last 30 days, and then repeating that to get AVG, repeating that to get MAX, and again to get MIN.
Instead, I would tend towards using conditional aggregation, to get all of the stats AVG, MAX, MIN, for all of the time periods 30 days, 15 days, and 7 days, in a single pass through the table.
... pause to note that views can be problematic for performance; predicates from the outer query may not get pushed into the view query. We're not seeing what the whole view definition is doing, but I suspect we may be materializing a large set.
Consider a query like this:
SELECT ...
, ROUND( ( n.mal / a.avg_07_day - 1)*100 ,2) AS dif_7_dias
, ROUND( ( n.mal / a.avg_15_day - 1)*100 ,2) AS dif_15_dias
, ROUND( ( n.mal / a.avg_30_day - 1)*100 ,2) AS dif_30_dias
, ...
FROM vinhos
LEFT
JOIN ( SELECT h.codigowine
, AVG(IF( h.timestamp >= CURRENT_DATE + INTERVAL -30 DAY, h.preconormal, NULL)) AS avg_30_day
, MAX(IF( h.timestamp >= CURRENT_DATE + INTERVAL -30 DAY, h.preconormal, NULL)) AS max_30_day
, MIN(IF( h.timestamp >= CURRENT_DATE + INTERVAL -30 DAY, h.preconormal, NULL)) AS min_30_day
, AVG(IF( h.timestamp >= CURRENT_DATE + INTERVAL -15 DAY, h.preconormal, NULL)) AS avg_15_day
, MAX(IF( h.timestamp >= CURRENT_DATE + INTERVAL -15 DAY, h.preconormal, NULL)) AS max_15_day
, MIN(IF( h.timestamp >= CURRENT_DATE + INTERVAL -15 DAY, h.preconormal, NULL)) AS min_15_day
, AVG(IF( h.timestamp >= CURRENT_DATE + INTERVAL -7 DAY, h.preconormal, NULL)) AS avg_07_day
, MAX(IF( h.timestamp >= CURRENT_DATE + INTERVAL -7 DAY, h.preconormal, NULL)) AS max_07_day
, MIN(IF( h.timestamp >= CURRENT_DATE + INTERVAL -7 DAY, h.preconormal, NULL)) AS min_07_day
FROM precos h
GROUP
BY h.codigowine
HAVING h.codigowine IS NOT NULL
) a
ON a.codigowine = vinhos.codigowine
LEFT
JOIN ( SELECT s.codigowine
, MAX(s.preconormal) AS mal
, MIN(s.preconormal) AS mil
FROM precos s
WHERE s.timestamp >= CURRENT_DATE - INTERVAL 9 HOUR
GROUP
BY s.codigowine
HAVING s.codigowine IS NOT NULL
) n
ON n.codigowine = vinhos.codigowine
Consider the inline view query a.
Note that we can run that SELECT separately, and get a resultset returned, like we would return a result from a table. We expect this to do a single pass through the referenced table. There may be some predicates (conditions in the WHERE clause) that will filter our rows, or enable us to make better use of an index. As currently written, the query could make use of an index with a leading column of codigowine to avoid a (potentially expensive) "Using filesort" operation to satisfy the GROUP BY.
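For example, something along these lines (the index name is arbitrary; including preconormal as the trailing column also makes the index covering for these aggregates):
ALTER TABLE precos
  ADD INDEX precos_ix (codigowine, timestamp, preconormal);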
I'm a bit confused by the subqueries using - INTERVAL 9 HOUR. It looks to me like those subqueries could potentially return more than one row. There's no LIMIT clause (and no ORDER BY)... but it looks like we are expecting a single value (scalar), given the division operation.
Without an understanding of what we're trying to achieve there, not knowing the specification, I've wrapped my confusion into another inline view n... not that this is necessarily what we want to do, but just to illustrate (again) an inline view returning a resultset. Whatever value(s) we're trying to get from the - INTERVAL 9 HOUR subquery, I think we can return those as a set as well.
With all that said, we can now get around to answering the question that was asked: adding a "calculated table".
If we don't require up to the second results, but can work with cached statistics, I would be looking at materializing the resultset from inline view a into a table, and then re-writing the query above to replace the inline view a with a reference to the cache table.
CREATE TABLE calc_stats_n_days
( codigowine <datatype> PRIMARY KEY
, avg_30_day DOUBLE
, max_30_day DOUBLE
, min_30_day DOUBLE
, avg_15_day DOUBLE
, ...
For the initial population...
INSERT INTO calc_stats_n_days
( codigowine, avg_30_day, max_30_day, min_30_day, avg_15_day, ... )
SELECT h.codigowine
, AVG(IF( h.timestamp >= CURRENT_DATE + INTERVAL -30 DAY, h.preconormal, NULL)) AS avg_30_day
, MAX(IF( h.timestamp >= CURRENT_DATE + INTERVAL -30 DAY, h.preconormal, NULL)) AS max_30_day
, MIN(IF( h.timestamp >= CURRENT_DATE + INTERVAL -30 DAY, h.preconormal, NULL)) AS min_30_day
, AVG(IF( h.timestamp >= CURRENT_DATE + INTERVAL -15 DAY, h.preconormal, NULL)) AS avg_15_day
, ...
For ongoing sync, I'd probably create a temporary table, populate it with the same query, and then do a sync between the temporary table and the target table. Maybe an INSERT ... ON DUPLICATE KEY and DELETE anti-join (to remove old rows).
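Roughly, a sketch of that sync (tmp_stats is a name I've made up; column lists are abbreviated):
-- stage fresh statistics with the same aggregate query
CREATE TEMPORARY TABLE tmp_stats AS
SELECT h.codigowine
     , AVG(IF( h.timestamp >= CURRENT_DATE + INTERVAL -30 DAY, h.preconormal, NULL)) AS avg_30_day
     -- , ... remaining aggregates ...
FROM precos h
GROUP BY h.codigowine
HAVING h.codigowine IS NOT NULL;

-- upsert new and changed rows
INSERT INTO calc_stats_n_days (codigowine, avg_30_day /* , ... */)
SELECT codigowine, avg_30_day /* , ... */
FROM tmp_stats
ON DUPLICATE KEY UPDATE avg_30_day = VALUES(avg_30_day) /* , ... */;

-- anti-join delete of rows that no longer have source data
DELETE c
FROM calc_stats_n_days c
LEFT JOIN tmp_stats t ON t.codigowine = c.codigowine
WHERE t.codigowine IS NULL;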
Before considering other options, try to make the query more efficient. This is beneficial in the long term: even if you eventually move to a calculated table, you will still take advantage of a more efficient refresh query.
Your query has 15-20 inline subqueries that all address the same dependent table (as far as I read) and do aggregate computations for the same column precos(preconormal) (min, max, avg, most occurring value). Each metric is computed several times, in a date range that varies from 9 hours back to 1 month back. So it goes:
SELECT
codigowine,
nomevinho,
DATE(timestamp) AS data_adc,
-- ...
/* Medidas estatísticas para 7 dias - min, max, media e moda */
ROUND(
(
SELECT MIN(preconormal)
FROM precos
WHERE
codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 7 DAY
),
2
) AS min_7_dias,
ROUND(
(
SELECT MAX(preconormal)
FROM precos
WHERE
codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 7 DAY
),
2
) AS max_7_dias,
-- ... and so on ...
FROM vinhos
It seems like it could be more efficient to do all computation at once, using conditional aggregation:
select
codigowine,
min(preconormal) min_30d,
max(preconormal) max_30d,
avg(preconormal) avg_30d,
min(case when timestamp >= current_date - interval 15 day then preconormal end) min_15d,
max(case when timestamp >= current_date - interval 15 day then preconormal end) max_15d,
avg(case when timestamp >= current_date - interval 15 day then preconormal end) avg_15d,
min(case when timestamp >= current_date - interval 7 day then preconormal end) min_07d,
max(case when timestamp >= current_date - interval 7 day then preconormal end) max_07d,
avg(case when timestamp >= current_date - interval 7 day then preconormal end) avg_07d
from precos
where timestamp >= current_date - interval 30 day
group by codigowine
For performance, you want an index on (codigowine, timestamp, preconormal).
Then you can join it with the original table:
select
v.nomevinho,
date(v.timestamp) data_adc,
p.*
from vinhos v
inner join (
select
codigowine,
min(preconormal) min_30d,
max(preconormal) max_30d,
avg(preconormal) avg_30d,
min(case when timestamp >= current_date - interval 15 day then preconormal end) min_15d,
max(case when timestamp >= current_date - interval 15 day then preconormal end) max_15d,
avg(case when timestamp >= current_date - interval 15 day then preconormal end) avg_15d,
min(case when timestamp >= current_date - interval 7 day then preconormal end) min_07d,
max(case when timestamp >= current_date - interval 7 day then preconormal end) max_07d,
avg(case when timestamp >= current_date - interval 7 day then preconormal end) avg_07d
from precos
where timestamp >= current_date - interval 30 day
group by codigowine
) p on p.codigowine = v.codigowine
This should be a sensible base query to build upon. To get the other computed values (most occurring value per period, latest value), you may add additional joins, or use inline queries.
To finish: here is another version of the base query, which aggregates after the join. Depending on how your data is spread across the two tables, this may or may not be more efficient (and it will not be equivalent if there are duplicate codigowine values in table vinhos):
select
v.nomevinho,
date(v.timestamp) data_adc,
p.codigowine,
min(p.preconormal) min_30d,
max(p.preconormal) max_30d,
avg(p.preconormal) avg_30d,
min(case when p.timestamp >= current_date - interval 15 day then p.preconormal end) min_15d,
max(case when p.timestamp >= current_date - interval 15 day then p.preconormal end) max_15d,
avg(case when p.timestamp >= current_date - interval 15 day then p.preconormal end) avg_15d,
min(case when p.timestamp >= current_date - interval 7 day then p.preconormal end) min_07d,
max(case when p.timestamp >= current_date - interval 7 day then p.preconormal end) max_07d,
avg(case when p.timestamp >= current_date - interval 7 day then p.preconormal end) avg_07d
from vinhos v
inner join precos p
on p.codigowine = v.codigowine
and p.timestamp >= current_date - interval 30 day
group by v.codigowine, v.nomevinho
Looking at your query: try refactoring it to eliminate as many dependent subqueries as possible, JOINing to derived-table subqueries instead. Eliminating those dependent subqueries will make a vast performance difference.
Figuring the mode is an application of finding the detail record for an extreme value in a dataset. If you use this as a subquery
WITH freq AS (
SELECT COUNT(*) freq,
ROUND(preconormal, 2) preconormal,
codigowine
FROM precos
WHERE timestamp >= CURRENT_DATE - INTERVAL 7 DAY
GROUP BY ROUND(preconormal, 2), codigowine
),
most AS (
SELECT MAX(freq) freq,
codigowine
FROM freq
GROUP BY codigowine
),
mode AS (
SELECT GROUP_CONCAT(preconormal ORDER BY preconormal DESC) modeps,
freq.codigowine
FROM freq
JOIN most ON freq.freq = most.freq
GROUP BY freq.codigowine
)
SELECT * FROM mode
You can find the most frequent price for each item. The first CTE, freq, gets the prices and their frequencies.
The second CTE, most, finds the frequency of the most frequent price (or prices).
The third CTE, mode, extracts the most frequent prices from freq using a JOIN. It also uses GROUP_CONCAT() because it's possible to have more than one mode (most frequent price).
For your stats you can do this:
WITH s7 AS (
SELECT ROUND(MIN(preconormal), 2) minp,
ROUND(AVG(preconormal), 2) meanp,
ROUND(MAX(preconormal), 2) maxp,
codigowine
FROM precos
WHERE timestamp >= CURRENT_DATE - INTERVAL 7 DAY
GROUP BY codigowine
),
s15 AS (
SELECT ROUND(MIN(preconormal), 2) minp,
ROUND(AVG(preconormal), 2) meanp,
ROUND(MAX(preconormal), 2) maxp,
codigowine
FROM precos
WHERE timestamp >= CURRENT_DATE - INTERVAL 15 DAY
GROUP BY codigowine
),
s30 AS (
SELECT ROUND(MIN(preconormal), 2) minp,
ROUND(AVG(preconormal), 2) meanp,
ROUND(MAX(preconormal), 2) maxp,
codigowine
FROM precos
WHERE timestamp >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY codigowine
),
m7 AS (
WITH freq AS (
SELECT COUNT(*) freq,
ROUND(preconormal, 2) preconormal,
codigowine
FROM precos
WHERE timestamp >= CURRENT_DATE - INTERVAL 7 DAY
GROUP BY ROUND(preconormal, 2), codigowine
),
most AS (
SELECT MAX(freq) freq,
codigowine
FROM freq
GROUP BY codigowine
),
mode AS (
SELECT GROUP_CONCAT(preconormal ORDER BY preconormal DESC) modeps,
freq.codigowine
FROM freq
JOIN most ON freq.freq = most.freq
GROUP BY freq.codigowine
)
SELECT * FROM mode
)
SELECT v.codigowine, v.nomevinho, DATE(v.timestamp) AS data_adc,
s7.minp min_7_dias, s7.maxp max_7_dias, s7.meanp media_7_dias, m7.modeps moda_7_dias,
s15.minp min_15_dias, s15.maxp max_15_dias, s15.meanp media_15_dias,
s30.minp min_30_dias, s30.maxp max_30_dias, s30.meanp media_30_dias
FROM vinhos v
LEFT JOIN s7 ON v.codigowine = s7.codigowine
LEFT JOIN m7 ON v.codigowine = m7.codigowine
LEFT JOIN s15 ON v.codigowine = s15.codigowine
LEFT JOIN s30 ON v.codigowine = s30.codigowine
I'll leave it to you to do the modes for 15 and 30 days.
This is quite the query. You better hope the next guy to work on it doesn't curse your name. :-)

Calculating percentage between two subqueries mysql

I'm trying to calculate the percentage between two sub queries, I have a solution which to me doesn't seem very elegant at all:
SET @tot := (select count(*) FROM shipment WHERE created < date_format(date_add(CURRENT_TIMESTAMP(), interval 1 day), '%Y%m%d000000') AND created >= date_format(CURRENT_TIMESTAMP(), '%Y%m%d000000'));
SELECT @tot AS Total, ((select count(*) from shipment where created < date_format(date_add(CURRENT_TIMESTAMP(), interval 1 day), '%Y%m%d000000') AND created >= date_format(CURRENT_TIMESTAMP(), '%Y%m%d000000') AND state = 'despatched') / @tot) * 100 AS Percentage
While this works at the command line, it fails miserably in a bespoke platform I'm trying to create a report for. I'm wondering if there is a way to simplify this without the use of a SET variable?
Thanks in advance.
I don't see any reason you couldn't do it simply as a SELECT from a nested subquery. (Untested in MySQL, but works in SQL Server with the proper date functions.)
select total, dispatched / total * 100 as percentage from
(
select count(*) as Total,
sum(case when state = 'despatched' then 1.0 else 0.0 end) as dispatched
from
shipment
where
created >= date_format(CURRENT_TIMESTAMP(), '%Y%m%d000000')
and
created < date_format(date_add(CURRENT_TIMESTAMP(), interval 1 day), '%Y%m%d000000')
) calcs
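Side note: in MySQL a boolean expression evaluates to 1 or 0, so (assuming state is never NULL) the same result can be had with a single AVG; a minimal sketch:
select
    count(*) as total,
    avg(state = 'despatched') * 100 as percentage -- mean of 1/0 flags = fraction despatched
from shipment
where
    created >= date_format(CURRENT_TIMESTAMP(), '%Y%m%d000000')
    and created < date_format(date_add(CURRENT_TIMESTAMP(), interval 1 day), '%Y%m%d000000')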

Incorrect values with multiple left joins (MySQL)

I am trying to make a report. It is supposed to give me a list of the machines at a specific customer and the sum of hours and material that was put into each machine.
In the following examples, I select the sum of materials and hours in different fields to make the problem clearer. But I really want to sum the material and hours, then group them by the machine field.
I can query the list of machine and cost of hours without problems.
SELECT CONCAT(`customer`.`PREFIX`, `wo`.`machine_id`) AS `machine`,
ROUND(COALESCE(SUM(`wohours`.`length` * `wohours`.`price`), 0), 2) AS `hours`
FROM `wo`
JOIN `customer` ON `customer`.`id`=`wo`.`customer_id`
LEFT JOIN `wohours` ON `wohours`.`wo_id`=`wo`.`id` AND `wohours`.`wo_customer_id`=`wo`.`customer_id`
AND `wohours`.`wo_machine_id`=`wo`.`machine_id` AND `wohours`.`date`>=(CURDATE() - INTERVAL DAY(CURDATE() - INTERVAL 1 DAY) DAY) - INTERVAL 11 MONTH
WHERE `wo`.`customer_id`=1
GROUP BY `wo`.`machine_id`;
This gives me the correct values for hours. But when I add the material like this:
SELECT CONCAT(`customer`.`PREFIX`, `wo`.`machine_id`) AS `machine`,
ROUND(COALESCE(SUM(`wohours`.`length` * `wohours`.`price`), 0), 2) AS `hours`,
ROUND(COALESCE(SUM(`womaterial`.`multiplier` * `womaterial`.`price`), 0), 2) AS `material`
FROM `wo`
JOIN `customer` ON `customer`.`id`=`wo`.`customer_id`
LEFT JOIN `wohours` ON `wohours`.`wo_id`=`wo`.`id` AND `wohours`.`wo_customer_id`=`wo`.`customer_id`
AND `wohours`.`wo_machine_id`=`wo`.`machine_id` AND `wohours`.`date`>=(CURDATE() - INTERVAL DAY(CURDATE() - INTERVAL 1 DAY) DAY) - INTERVAL 11 MONTH
LEFT JOIN `womaterial` ON `womaterial`.`wo_id`=`wo`.`id` AND `womaterial`.`wo_customer_id`=`wo`.`customer_id`
AND `womaterial`.`wo_machine_id`=`wo`.`machine_id` AND `wohours`.`date`>=(CURDATE() - INTERVAL DAY(CURDATE() - INTERVAL 1 DAY) DAY) - INTERVAL 11 MONTH
WHERE `wo`.`customer_id`=1
GROUP BY `wo`.`machine_id`;
then both hour and material values are incorrect.
I have read other threads where people with similar problems could solve this by splitting it in multiple queries or subqueries. But I don't think that is possible in this case.
Any help is appreciated.
//John
Your other reading is correct. You will need to put them into their own "subquery" for the join. The reason you are probably getting invalid values is that the materials table has multiple records per machine, causing a Cartesian result against your original hours-based join. And since you don't know which table has many rows versus just one, the output looks incorrect.
In what I've written below, each inner-most query for pre-aggregating the woHours and woMaterial tables produces a single record per "wo_id and machine_id" to join back to the wo table when finished. Each of these queries has the criteria on the single customer ID you are trying to run it for.
Then, as re-joined to the work order (wo) table, it grabs all records and applies ROUND() and COALESCE() in case no such hours or materials are present. So it returns something like:
WO   Machine ID   Machine   Hours   Material
1    1            CustX 1   2       0
2    4            CustY 4   2.5     6.5
3    4            CustY 4   1.2     .5
4    1            CustX 1   1.5     1.2
Finally, you can now roll up the SUM() of all these entries into a single row per machine ID
Machine   Hours   Material
CustX 1   3.5     1.2
CustY 4   3.7     7.0
SELECT
AllWO.Machine,
SUM( AllWO.Hours ) Hours,
SUM( AllWO.Material ) Material
from
( SELECT
wo.id AS wo_id,
wo.Machine_ID,
CONCAT(customer.PREFIX, wo.machine_id) AS machine,
ROUND( COALESCE( PreSumHours.MachineHours, 0), 2) AS hours,
ROUND( COALESCE( PreSumMaterial.materialHours, 0), 2) AS material
FROM
wo
JOIN customer
ON wo.customer_id = customer.id
LEFT JOIN ( select wohours.wo_id,
wohours.wo_machine_id,
SUM( wohours.length * wohours.price ) as machinehours
from
wohours
where
wohours.wo_customer_id = 1
AND wohours.date >= ( CURDATE() - INTERVAL DAY( CURDATE() - INTERVAL 1 DAY) DAY) - INTERVAL 11 MONTH
group by
wohours.wo_id,
wohours.wo_machine_id ) as PreSumHours
ON wo.id = PreSumHours.wo_id
AND wo.machine_id = PreSumHours.wo_machine_id
LEFT JOIN ( select womaterial.wo_id,
womaterial.wo_machine_id,
SUM( womaterial.multiplier * womaterial.price ) as materialHours
from
womaterial
where
womaterial.wo_customer_id = 1
AND womaterial.date >= ( CURDATE() - INTERVAL DAY( CURDATE() - INTERVAL 1 DAY) DAY) - INTERVAL 11 MONTH
group by
womaterial.wo_id,
womaterial.wo_machine_id ) as PreSumMaterial
ON wo.id = PreSumMaterial.wo_id
AND wo.machine_id = PreSumMaterial.wo_machine_id
WHERE
wo.customer_id = 1 ) AllWO
group by
AllWO.Machine_ID

Count timestamps with offset between two databases in a view

I am trying to count time stamps between two databases, but one has overlapping time stamps due to a design flaw that is not mine.
SELECT date(time + INTERVAL 8 HOUR) as day, COUNT(DISTINCT comment)
FROM news.data
GROUP BY day
UNION ALL
SELECT date(time + INTERVAL 8 HOUR) as day, COUNT(DISTINCT comment)
FROM `news-backup`.`data`
GROUP BY day
ORDER BY year(day) desc, day(day) DESC
LIMIT 20
What seems to happen is that some timestamps are in range of both databases, so they produce separate counts for certain dates. So it would give a count for TODAY from both news and news-backup.
EX:
date        count
2013-1-15   10
2013-1-15   13
2013-1-14   8
2013-1-13   15
What I want is
EX:
date        count
2013-1-15   23
2013-1-14   8
2013-1-13   15
Here is the kicker: I need it in a view, so there are some limitations with that (no subqueries allowed). Thoughts? And no, I cannot change the data dump sequence that happens between the two DBs.
You can't put a subquery in a view, but you can put a view in a view.
So:
create view view1 as
SELECT date(time + INTERVAL 8 HOUR) as day, 'current' as which, COUNT(DISTINCT comment) as cnt
FROM news.data
GROUP BY day
UNION ALL
SELECT date(time + INTERVAL 8 HOUR) as day, 'backup' as which, COUNT(DISTINCT comment) as cnt
FROM `news-backup`.`data`
GROUP BY day, which
I'm not sure what your logic for combining them is:
create view view2 as
select day, max(cnt) -- sum(cnt)? prefer current or backup?
from view1
group by day
ORDER BY day desc
The documentation that bans subqueries is here. Be sure to search for "The SELECT statement cannot contain".
If you have a table of all the dates, the following "absurd" SQL might work:
select c.date,
coalesce( (select count(distinct comment) from news.data where date(time + INTERVAL 8 HOUR) = c.date),
(select count(distinct comment) from `news-backup`.`data` where date(time + INTERVAL 8 HOUR) = c.date)
) as NumComments
from calendar c
This version is assuming you want the "new" first, then the backup. If you want the sum, then you would add them.

MySql Query: include days that have COUNT(id) == 0 but only in the last 30 days

I am doing a query to get the number of builds per day from our database for the last 30 days. But it has now become necessary to mark days where there were no builds as well.
In my WHERE clause I use submittime to determine whether there were builds. How could I modify this to include days that have COUNT(id) == 0, but only in the last 30 days?
Original Query:
SELECT COUNT(id) AS 'Past-Month-Builds',
CONCAT(MONTH(submittime), '-', DAY(submittime)) as 'Month-Day'
FROM builds
WHERE DATE(submittime) >= DATE_SUB(CURDATE(), INTERVAL 30 day)
GROUP BY MONTH(submittime), DAY(submittime);
What I've Tried:
SELECT COUNT(id) AS 'Past-Month-Builds',
CONCAT(MONTH(submittime), '-', DAY(submittime)) as 'Month-Day'
FROM builds
WHERE DATE(submittime) >= DATE_SUB(CURDATE(), INTERVAL 30 day)
OR COUNT(id) = 0
GROUP BY MONTH(submittime), DAY(submittime);
You need a table of dates, then left join to the builds table.
Something like this:
SELECT
COUNT(builds.id) AS 'Past-Month-Builds',
CONCAT(MONTH(DateTable.Date), '-', DAY(DateTable.Date)) as 'Month-Day'
FROM DateTable
LEFT JOIN builds ON DATE(builds.submittime) = DateTable.Date
WHERE DateTable.Date >= DATE_SUB(CURDATE(), INTERVAL 30 day)
GROUP BY MONTH(DateTable.Date), DAY(DateTable.Date);
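DateTable has to exist first; a minimal sketch to create and populate it (the 1000-day span is arbitrary, and the recursive CTE needs MySQL 8.0+; on older servers you would fill it from a script or a numbers table):
CREATE TABLE DateTable (Date DATE PRIMARY KEY);

-- generate the last 1000 days, newest first
INSERT INTO DateTable (Date)
WITH RECURSIVE d (Date) AS (
  SELECT CURDATE()
  UNION ALL
  SELECT Date - INTERVAL 1 DAY FROM d WHERE Date > CURDATE() - INTERVAL 999 DAY
)
SELECT Date FROM d;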