Incorrect values with multiple left joins (MySQL)

I am trying to make a report. It is supposed to give me a list of the machines at a specific customer and the sum of hours and material that was put into each machine.
In the following examples, I select the sum of materials and hours in separate fields to make the problem clearer. But I really want to sum the material and hours, then group them by the machine field.
I can query the list of machine and cost of hours without problems.
SELECT CONCAT(`customer`.`PREFIX`, `wo`.`machine_id`) AS `machine`,
ROUND(COALESCE(SUM(`wohours`.`length` * `wohours`.`price`), 0), 2) AS `hours`
FROM `wo`
JOIN `customer` ON `customer`.`id`=`wo`.`customer_id`
LEFT JOIN `wohours` ON `wohours`.`wo_id`=`wo`.`id` AND `wohours`.`wo_customer_id`=`wo`.`customer_id`
AND `wohours`.`wo_machine_id`=`wo`.`machine_id` AND `wohours`.`date`>=(CURDATE() - INTERVAL DAY(CURDATE() - INTERVAL 1 DAY) DAY) - INTERVAL 11 MONTH
WHERE `wo`.`customer_id`=1
GROUP BY `wo`.`machine_id`;
This gives me the correct values for hours. But when I add the material like this:
SELECT CONCAT(`customer`.`PREFIX`, `wo`.`machine_id`) AS `machine`,
ROUND(COALESCE(SUM(`wohours`.`length` * `wohours`.`price`), 0), 2) AS `hours`,
ROUND(COALESCE(SUM(`womaterial`.`multiplier` * `womaterial`.`price`), 0), 2) AS `material`
FROM `wo`
JOIN `customer` ON `customer`.`id`=`wo`.`customer_id`
LEFT JOIN `wohours` ON `wohours`.`wo_id`=`wo`.`id` AND `wohours`.`wo_customer_id`=`wo`.`customer_id`
AND `wohours`.`wo_machine_id`=`wo`.`machine_id` AND `wohours`.`date`>=(CURDATE() - INTERVAL DAY(CURDATE() - INTERVAL 1 DAY) DAY) - INTERVAL 11 MONTH
LEFT JOIN `womaterial` ON `womaterial`.`wo_id`=`wo`.`id` AND `womaterial`.`wo_customer_id`=`wo`.`customer_id`
AND `womaterial`.`wo_machine_id`=`wo`.`machine_id` AND `womaterial`.`date`>=(CURDATE() - INTERVAL DAY(CURDATE() - INTERVAL 1 DAY) DAY) - INTERVAL 11 MONTH
WHERE `wo`.`customer_id`=1
GROUP BY `wo`.`machine_id`;
then both the hours and material values are incorrect.
I have read other threads where people with similar problems solved this by splitting it into multiple queries or subqueries. But I don't think that is possible in this case.
Any help is appreciated.
//John

Your other reading is correct: you need to put each of them into its own subquery for the join. You are most likely getting invalid values because both the hours and the materials tables can have multiple records per work order. Joining both to wo produces every combination of an hours row with a material row (a partial Cartesian product), so each hours row is repeated once per matching material row, and vice versa, before SUM() runs. Since some work orders have many rows and others just one, the totals come out wrong in an inconsistent way.
So, in the query below, each inner-most query pre-aggregates woHours and woMaterial and produces a single record per wo_id and machine_id, which is then joined back to the wo table. Each of these queries filters on the single customer ID you are running the report for.
Then, once re-joined to the work order (wo) table, the query grabs all records and applies ROUND() and COALESCE() in case no hours or materials are present for a work order. The result is something like:
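To see the multiplication concretely, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL, with hypothetical sample data: one work order with two hours rows and three material rows, keeping only the columns the sums need. The naive double LEFT JOIN produces 2 x 3 = 6 joined rows, so each total is inflated by the row count of the other table.

```python
import sqlite3

# Hypothetical data: work order 1 on machine 7 has 2 hours rows and 3 material rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wo (id INTEGER, machine_id INTEGER);
CREATE TABLE wohours (wo_id INTEGER, length REAL, price REAL);
CREATE TABLE womaterial (wo_id INTEGER, multiplier REAL, price REAL);
INSERT INTO wo VALUES (1, 7);
INSERT INTO wohours VALUES (1, 1, 10), (1, 2, 10);              -- true hours total: 30
INSERT INTO womaterial VALUES (1, 1, 5), (1, 1, 5), (1, 2, 5);  -- true material total: 20
""")

JOINS = """
FROM wo
LEFT JOIN wohours h    ON h.wo_id = wo.id
LEFT JOIN womaterial m ON m.wo_id = wo.id
"""

# Every hours row is paired with every material row: 2 x 3 = 6 joined rows.
joined_rows = conn.execute("SELECT COUNT(*) " + JOINS).fetchone()[0]

# SUM() then runs over those 6 rows, so each hours row is counted 3 times
# and each material row 2 times.
naive = conn.execute(
    "SELECT wo.machine_id, SUM(h.length * h.price), SUM(m.multiplier * m.price) "
    + JOINS + "GROUP BY wo.machine_id"
).fetchone()

print(joined_rows)  # 6
print(naive)        # (7, 90.0, 40.0)
```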
WO | Machine ID | Machine | Hours | Material
---+------------+---------+-------+---------
1  | 1          | CustX 1 | 2     | 0
2  | 4          | CustY 4 | 2.5   | 6.5
3  | 4          | CustY 4 | 1.2   | 0.5
4  | 1          | CustX 1 | 1.5   | 1.2
Finally, you can now roll up the SUM() of all these entries into a single row per machine ID:
Machine | Hours | Material
--------+-------+---------
CustX 1 | 3.5   | 1.2
CustY 4 | 3.7   | 7.0
SELECT
AllWO.Machine,
SUM( AllWO.Hours ) Hours,
SUM( AllWO.Material ) Material
from
( SELECT
wo.id AS wo_id,
wo.Machine_ID,
CONCAT(customer.PREFIX, wo.machine_id) AS machine,
ROUND( COALESCE( PreSumHours.MachineHours, 0), 2) AS hours,
ROUND( COALESCE( PreSumMaterial.materialHours, 0), 2) AS material
FROM
wo
JOIN customer
ON wo.customer_id = customer.id
LEFT JOIN ( select wohours.wo_id,
wohours.wo_machine_id,
SUM( wohours.length * wohours.price ) as machinehours
from
wohours
where
wohours.wo_customer_id = 1
AND wohours.date >= ( CURDATE() - INTERVAL DAY( CURDATE() - INTERVAL 1 DAY) DAY) - INTERVAL 11 MONTH
group by
wohours.wo_id,
wohours.wo_machine_id ) as PreSumHours
ON wo.id = PreSumHours.wo_id
AND wo.machine_id = PreSumHours.wo_machine_id
LEFT JOIN ( select womaterial.wo_id,
womaterial.wo_machine_id,
SUM( womaterial.multiplier * womaterial.price ) as materialHours
from
womaterial
where
womaterial.wo_customer_id = 1
AND womaterial.date >= ( CURDATE() - INTERVAL DAY( CURDATE() - INTERVAL 1 DAY) DAY) - INTERVAL 11 MONTH
group by
womaterial.wo_id,
womaterial.wo_machine_id ) as PreSumMaterial
ON wo.id = PreSumMaterial.wo_id
AND wo.machine_id = PreSumMaterial.wo_machine_id
WHERE
wo.customer_id = 1 ) AllWO
group by
AllWO.Machine;
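A minimal sketch of the same pre-aggregation idea, again using Python's sqlite3 as a stand-in for MySQL with hypothetical data. It simplifies the query above (no customer table, no date filter, no outer derived table): each child table is summed per wo_id in its own derived table before joining, so every wo row matches at most one row from each side and the per-machine SUM() is not inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wo (id INTEGER, machine_id INTEGER);
CREATE TABLE wohours (wo_id INTEGER, length REAL, price REAL);
CREATE TABLE womaterial (wo_id INTEGER, multiplier REAL, price REAL);
INSERT INTO wo VALUES (1, 7), (2, 7), (3, 8);
INSERT INTO wohours VALUES (1, 1, 10), (1, 2, 10), (2, 1, 10);
INSERT INTO womaterial VALUES (1, 1, 5), (1, 1, 5), (1, 2, 5);
""")

# Each derived table collapses its child rows to one row per wo_id BEFORE the join,
# so the joins are 1:1 and nothing is multiplied. COALESCE covers work orders
# with no hours or no material rows.
rows = conn.execute("""
SELECT wo.machine_id,
       ROUND(SUM(COALESCE(ph.hours, 0)), 2)    AS hours,
       ROUND(SUM(COALESCE(pm.material, 0)), 2) AS material
FROM wo
LEFT JOIN (SELECT wo_id, SUM(length * price) AS hours
           FROM wohours GROUP BY wo_id) ph ON ph.wo_id = wo.id
LEFT JOIN (SELECT wo_id, SUM(multiplier * price) AS material
           FROM womaterial GROUP BY wo_id) pm ON pm.wo_id = wo.id
GROUP BY wo.machine_id
ORDER BY wo.machine_id
""").fetchall()
print(rows)  # [(7, 40.0, 20.0), (8, 0.0, 0.0)]
```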

Related

How to select records but exclude if one type is outside a subquery?

We have multiple invStatus values (1-10) and want to exclude only one status type (1), but only the records of that type that are older than X days. So all records should show except those whose invStatus = 1 and that are older than X days; records with invStatus = 1 that are newer than X days should be included in the recordset.
Do I select all records generically, then filter those with status = 1 that are older than X days in a subquery?
The query below uses NOT IN in an attempt to select the records to exclude, but it is not working, and it also seems inefficient since it takes a couple of seconds to execute.
SELECT
tblinventory.invId,
tblinventory.invTitle,
tblinventory.invStatus,
tblhouseinfo.Address,
tblhouseinfo.City,
tblhouseinfo.`State`,
tblhouseinfo.Zip,
tblhouseinfo.Update_date,
CURRENT_DATE() - INTERVAL 10 DAY AS dateEx
FROM
tblinventory
LEFT OUTER JOIN tblhouseinfo ON tblinventory.invId = tblhouseinfo.addInfoID
WHERE
invReleased = 0
AND invStatus NOT IN (SELECT invId from tblhouseinfo WHERE invStatus = 1
AND tblhouseinfo.Update_date < CURRENT_DATE() - INTERVAL 10 DAY )
ORDER BY
`tblhouseinfo`.`Update_date` DESC
I could filter the results with PHP at the page level, but that also seems inefficient, and I would prefer to do this using best practices.
UPDATE:
There are a total of 155 rows.
All tblhouseinfo.Update_date (timestamp) values are "2017-09-06 10:53:17" (Sept 6th), except three that I changed for testing to "2017-07-06 10:53:17" (July 6th).
Using the suggested condition:
AND NOT (invStatus = 1 AND tblhouseinfo.Update_date > CURRENT_DATE() - INTERVAL 10 DAY )
60 records are excluded, not the expected 3.
CURRENT_DATE() - INTERVAL 10 DAY currently returns "2017-08-28", so "2017-09-06 10:53:17" is within the 10-day range and should be selected; only the three records dated "2017-07-06 10:53:17" should be excluded.
FINAL WORKING SOLUTION/Query:
SELECT
tblinventory.invId,
tblinventory.invTitle,
tblinventory.invStatus,
tblhouseinfo.Address,
tblhouseinfo.City,
tblhouseinfo.`State`,
tblhouseinfo.Zip,
tblhouseinfo.Update_date,
CURRENT_DATE() - INTERVAL 10 DAY AS dateEx
FROM
tblinventory
LEFT OUTER JOIN tblhouseinfo ON tblinventory.invId = tblhouseinfo.addInfoID
WHERE
invReleased = 0
AND NOT (invStatus = 1 AND tblhouseinfo.Update_date < CURRENT_DATE() - INTERVAL 10 DAY )
ORDER BY
`tblhouseinfo`.`Update_date` DESC
You don't need a subquery that selects invId from the other table: you already know you never want invStatus 1 past the cutoff, so you can test invStatus directly and add an AND condition for the number of days.
I always use UNIX timestamps for recording data entry/modification, e.g.:
AND (timestamp >= beginTimestamp AND timeStamp <= endTimestamp)
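A small sketch of how the NOT (...) condition behaves, using Python's sqlite3 with a hypothetical single table in place of the two joined tables and a fixed string cutoff standing in for CURRENT_DATE() - INTERVAL 10 DAY. It also shows why flipping the comparison to > excludes the recent status-1 rows instead of the old ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single table standing in for tblinventory joined to tblhouseinfo.
conn.execute("CREATE TABLE inv (invId INTEGER, invStatus INTEGER, Update_date TEXT)")
conn.executemany("INSERT INTO inv VALUES (?, ?, ?)", [
    (1, 1, "2017-09-06 10:53:17"),  # status 1, recent: keep
    (2, 1, "2017-07-06 10:53:17"),  # status 1, old: exclude
    (3, 2, "2017-07-06 10:53:17"),  # other status, old: keep
    (4, 3, "2017-09-06 10:53:17"),  # other status, recent: keep
])

cutoff = "2017-08-28"  # stands in for CURRENT_DATE() - INTERVAL 10 DAY

def ids(condition):
    """Return the invIds passing the WHERE condition; ISO date strings compare correctly as text."""
    sql = f"SELECT invId FROM inv WHERE {condition} ORDER BY invId"
    return [row[0] for row in conn.execute(sql, (cutoff,))]

kept = ids("NOT (invStatus = 1 AND Update_date < ?)")     # the working filter
flipped = ids("NOT (invStatus = 1 AND Update_date > ?)")  # the earlier suggestion
print(kept)     # [1, 3, 4]
print(flipped)  # [2, 3, 4]
```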

Aggregating table data in MySQL, is there an easier way to do this?

I'm trying to write a query that aggregates data from a table.
Essentially I have a long list of devices that have been inventoried and eventually installed over the last couple of years.
I want to find the average amount of time between when the device was received and when it was installed, and then have that data sorted by the month the device was installed. BUT in each month's row, I also want to include the data from the previous months.
So essentially what I want to see is: (sorry for terrible formatting)
MonthInstalled | TimeToInstall | Total#Devices
-----------------+---------------+----------------------------
Jan | 10 Days | 5
Feb(=Jan+Feb) | 15 Days | 18 (5 in Jan + 13 in Feb)
Mar(=Jan+Feb+Mar)| 13 Days | 25 (5 + 13 + 7)
...
The query I currently have written looks like this:
INSERT INTO DevicesInstall
SELECT ROUND(AVG(DATEDIFF(dvc.dt_install , dvc.dt_receive)), 1) AS 'Install',
COUNT(dvc.dvc_model) AS 'Total Devices',
MAX(dvc.dt_install) AS 'Date',
loc.loc_campus AS 'Campus'
FROM dvc_info dvc, location loc
WHERE dvc.dvc_loc_bin = loc.loc_bin
AND dvc.dt_install < '20160201'
;
Although this works, I have to repeat it manually for each month, so it doesn't scale. Is there a way to condense this at all?
We can return the dates using an inline view (derived table), and then join to the dvc_info table, so we can get the "cumulative" results.
To get the results for:
Jan
Jan+Feb
Jan+Feb+Mar
We need to return three copies of the rows for Jan and two copies of the rows for Feb, and then collapse those rows into the appropriate groups.
The loc_campus is being included in the SELECT list... not clear why that is needed. If we want results "by campus", then we need to include that expression in the GROUP BY clause. Otherwise, the value returned for that non-aggregate is indeterminate... we will get a value for some row "in the group", but it could be any row.
Something like this:
SELECT d.dt AS `before_date`
, loc.loc_campus AS `Campus`
, ROUND(AVG(DATEDIFF(dvc.dt_install,dvc.dt_receive)),1) AS `Install`
, COUNT(dvc.dvc_model) AS `Total Devices`
, MAX(dvc.dt_install) AS `latest_dt_install`
FROM ( SELECT '2016-01-01' + INTERVAL 1 MONTH AS dt
UNION ALL SELECT '2016-01-01' + INTERVAL 2 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 3 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 4 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 5 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 6 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 7 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 8 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 9 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 10 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 11 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 12 MONTH
) d
CROSS
JOIN location loc
LEFT
JOIN dvc_info dvc
ON dvc.dvc_loc_bin = loc.loc_bin
AND dvc.dt_install < d.dt
GROUP
BY d.dt
, loc.loc_campus
ORDER
BY d.dt
, loc.loc_campus
Note that the value returned for d.dt will be the "up until" date: we're going to get '2016-02-01' returned for the January results. If we want to return the January month instead, we can use an expression in the SELECT list:
SELECT DATE_FORMAT(d.dt + INTERVAL -1 MONTH,'%Y-%m') AS `month`
Lots of options on query alternatives.
But it looks like the "big hump" is that to get cumulative results, we need to return multiple copies of the dvc_info rows, so the rows can be collapsed into each "grouping".
I recommend working on just the SELECT first. And get that tested working, before monkeying around to turn it into an INSERT ... SELECT.
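A minimal sketch of the "multiple copies" idea, using Python's sqlite3 as a stand-in for MySQL. Assumptions: the campus/location join is left out, the data is hypothetical, and SQLite's julianday() difference replaces MySQL's DATEDIFF(). Each device joins to every cutoff date later than its install date, so a January install is counted in every row from the January row onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dvc_info (dt_receive TEXT, dt_install TEXT);
INSERT INTO dvc_info VALUES
  ('2016-01-01', '2016-01-11'),   -- 10 days, installed in Jan
  ('2016-01-20', '2016-02-09'),   -- 20 days, installed in Feb
  ('2016-02-01', '2016-02-07');   -- 6 days, installed in Feb
""")

# One row per "up until" date; the LEFT JOIN repeats each device once per
# later cutoff, and GROUP BY collapses the copies into cumulative groups.
rows = conn.execute("""
SELECT d.dt AS before_date,
       COUNT(dvc.dt_install) AS total_devices,
       ROUND(AVG(julianday(dvc.dt_install) - julianday(dvc.dt_receive)), 1) AS avg_days
FROM (SELECT '2016-02-01' AS dt
      UNION ALL SELECT '2016-03-01'
      UNION ALL SELECT '2016-04-01') d
LEFT JOIN dvc_info dvc ON dvc.dt_install < d.dt
GROUP BY d.dt
ORDER BY d.dt
""").fetchall()
for row in rows:
    print(row)
```

With no March installs, the '2016-04-01' row simply repeats the running totals from the '2016-03-01' row.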
FOLLOWUP
We can use any query as an inline view (derived table d) that returns a set of dates we want.
e.g.
FROM ( SELECT DATE_FORMAT(m.dt_install,'%Y-%m-01') + INTERVAL 1 MONTH AS dt
FROM dvc_info m
WHERE m.dt_install >= '2016-01-01'
GROUP BY DATE_FORMAT(m.dt_install,'%Y-%m-01') + INTERVAL 1 MONTH
) d
Note that with this approach, if there are no installs in February, we won't get back a row for February. Using the static UNION ALL SELECT approach allows us to get back "zero" counts, i.e. to return rows for months that have no installs. (But that's the answer to a different question: how do I get back a "zero" count for February when there aren't any rows for February?)
Alternatively, if we have a calendar table e.g. cal that contains a list of the dates we want, we could just reference the table in place of the inline view, or the inline view query could get rows from that.
FROM ( SELECT cal.dt
FROM cal cal
WHERE cal.dt >= '2016-01-01'
AND cal.dt <= NOW()
AND DATE_FORMAT(cal.dt,'%d') = '01'
) d

Return a zero for a day with no results

I have a query which returns the total number of users who registered each day. The problem is that if no one registered on a given day, the query doesn't return any value for it; it just skips that day. I would rather it returned zero.
This is my query so far:
SELECT count(*) total FROM users WHERE created_at < NOW() AND created_at >
DATE_SUB(NOW(), INTERVAL 7 DAY) AND owner_id = ? GROUP BY DAY(created_at)
ORDER BY created_at DESC
Edit
I grouped the data so I would get a count for each day. As for the date range, I want the total users registered over the previous seven days.
A variation on the theme "build your own 7-day calendar inline":
SELECT D, count(created_at) AS total FROM
(SELECT DATE_SUB(NOW(), INTERVAL D DAY) AS D
FROM
(SELECT 0 as D
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
) AS D
) AS D
LEFT JOIN users ON date(created_at) = date(D)
AND owner_id = ? -- filter in ON so days with no matching users survive the LEFT JOIN
GROUP BY D
ORDER BY D DESC
I don't have your table structure at hand, so this will probably need some adjustment. Along the same lines, note that I use NOW() as the reference date, but that's easily adjustable. Anyway, that's the spirit...
See http://sqlfiddle.com/#!2/ab5cf/11 for a live demo.
If you had a table that held all of your days you could do a left join from there to your users table.
SELECT SUM(CASE WHEN U.Id IS NOT NULL THEN 1 ELSE 0 END)
FROM DimDate D
LEFT JOIN Users U ON CONVERT(DATE,U.Created_at) = D.DateValue
WHERE YourCriteria
GROUP BY YourGroupBy
The tricky bit is that you group by the date field in your data, which might have 'holes' in it, so dates without any records don't show up at all.
A way to solve it is to fill a table with all dates for the past 10 and next 100 years or so, and outer join that to your data. Then you will have one record for each day (or week or whatever) for sure.
I have only had to do this in MS SQL Server, so how to fill a date table in MySQL (or perhaps generate one dynamically) is for someone else to answer.
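One way to fill such a date table is a recursive common table expression. The sketch below uses Python's sqlite3 with a hypothetical calendar table holding one row per day of 2013. MySQL 8.0 and later also support WITH RECURSIVE, but the date arithmetic there would use MySQL functions such as DATE_ADD() instead of SQLite's date().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (dt TEXT PRIMARY KEY)")

# Recursive CTE: start at a fixed date and keep adding one day
# until the end date has been produced.
conn.execute("""
WITH RECURSIVE days(dt) AS (
    SELECT '2013-01-01'
    UNION ALL
    SELECT date(dt, '+1 day') FROM days WHERE dt < '2013-12-31'
)
INSERT INTO calendar (dt) SELECT dt FROM days
""")

count, first, last = conn.execute(
    "SELECT COUNT(*), MIN(dt), MAX(dt) FROM calendar"
).fetchone()
print(count, first, last)  # 365 2013-01-01 2013-12-31
```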
A bit long-winded, but I think this will work:
SELECT count(users.created_at) total FROM
(SELECT DATE_SUB(CURDATE(),INTERVAL 6 DAY) as cdate UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 5 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 4 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 3 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 2 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 1 DAY) UNION ALL
SELECT CURDATE()) t1 left join users
ON date(created_at)=t1.cdate
AND owner_id = ? -- filter in ON so days with no matching users survive the LEFT JOIN
GROUP BY t1.cdate
ORDER BY t1.cdate DESC
It differs slightly from your query in that it works on dates rather than datetimes, which is what your query uses. From your description I have assumed you mean whole days, so I have used dates.
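A sketch of the calendar LEFT JOIN using Python's sqlite3, with a hypothetical three-day calendar and users table. It compares putting the owner filter in the ON clause with putting it in WHERE: when a day's only registrations belong to another owner, the WHERE version drops that day entirely, while the ON version keeps it with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (owner_id INTEGER, created_at TEXT);
INSERT INTO users VALUES
  (1, '2013-01-10 09:00:00'),
  (1, '2013-01-10 17:30:00'),
  (2, '2013-01-11 08:00:00'),  -- only registration that day, for another owner
  (1, '2013-01-12 12:00:00');
""")

# Hypothetical inline three-day calendar (a constant SQL fragment, not user input).
DAYS = ("(SELECT '2013-01-10' AS cdate UNION ALL "
        "SELECT '2013-01-11' UNION ALL SELECT '2013-01-12') t")

# Owner filter inside ON: other owners' rows never join, so every calendar
# day survives and empty days count 0.
in_on = conn.execute(f"""
SELECT t.cdate, COUNT(u.created_at)
FROM {DAYS}
LEFT JOIN users u ON date(u.created_at) = t.cdate AND u.owner_id = ?
GROUP BY t.cdate ORDER BY t.cdate
""", (1,)).fetchall()

# Owner filter in WHERE: 2013-01-11 joined to owner 2, which is neither
# owner 1 nor NULL, so the whole day is filtered out.
in_where = conn.execute(f"""
SELECT t.cdate, COUNT(u.created_at)
FROM {DAYS}
LEFT JOIN users u ON date(u.created_at) = t.cdate
WHERE u.owner_id = ? OR u.owner_id IS NULL
GROUP BY t.cdate ORDER BY t.cdate
""", (1,)).fetchall()

print(in_on)     # [('2013-01-10', 2), ('2013-01-11', 0), ('2013-01-12', 1)]
print(in_where)  # [('2013-01-10', 2), ('2013-01-12', 1)]
```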

Calculating Average over different intervals

In mysql, I am calculating averages of the same metric over different intervals (3 Day, 7 Day, 30 Day, 60 Day, etc...), and I need the results to be in a single line per id.
Currently, I am using a Join per each interval. Given that I have to compute this for many different stores, and over several different intervals, is there a cleaner and/or more efficient way of accomplishing this?
Below is the code I am currently using.
Thanks in advance for the help
SELECT T1.id, T1.DailySales_3DayAvg, T2.DailySales_7DayAvg
FROM(
SELECT id, avg(DailySales) as 'DailySales_3DayAvg'
FROM `SalesTable`
WHERE `Store`=2
AND `Date` >= DATE_SUB('2012-07-28', INTERVAL 3 DAY)
AND `Date` < '2012-07-28'
GROUP BY id
) AS T1
JOIN(
SELECT id, avg(DailySales) as 'DailySales_7DayAvg'
FROM `SalesTable`
WHERE `Store`=2
AND `Date` >= DATE_SUB('2012-07-28', INTERVAL 7 DAY)
AND `Date` < '2012-07-28'
GROUP BY id
) AS T2
ON T1.id = T2.id
Where the results are:
id DailySales_3DayAvg DailySales_7DayAvg
3752 1234.56 1114.78
...
You can use a query like this -
SELECT
id,
SUM(IF(date >= '2012-07-28' - INTERVAL 3 DAY, DailySales, 0)) /
COUNT(IF(date >= '2012-07-28' - INTERVAL 3 DAY, 1, NULL)) 'DailySales_3DayAvg',
SUM(IF(date >= '2012-07-28' - INTERVAL 7 DAY, DailySales, 0)) /
COUNT(IF(date >= '2012-07-28' - INTERVAL 7 DAY, 1, NULL)) 'DailySales_7DayAvg'
FROM
SalesTable
WHERE
Store = 2 AND Date < '2012-07-28'
GROUP BY
id
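A minimal sketch of the conditional-aggregation approach in Python's sqlite3, with hypothetical data: one item whose daily sales run from 210 to 270 over the seven days before 2012-07-28. It uses AVG(CASE WHEN ... THEN DailySales END) instead of the SUM(IF)/COUNT(IF) pair; since AVG() ignores NULLs, the two forms compute the same average. The extra lower bound in WHERE limits the scan to the widest window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesTable (id INTEGER, Store INTEGER, Date TEXT, DailySales REAL)")
# Hypothetical data: item 3752 at store 2, one row per day from 2012-07-21 to 2012-07-27.
conn.executemany(
    "INSERT INTO SalesTable VALUES (?, ?, ?, ?)",
    [(3752, 2, f"2012-07-{day:02d}", float(day * 10)) for day in range(21, 28)],
)

# One pass over the widest window; each CASE yields NULL outside its own window,
# and AVG() skips NULLs, so each column averages only its own days.
result = conn.execute("""
SELECT id,
       AVG(CASE WHEN Date >= date(:ref, '-3 days') THEN DailySales END) AS DailySales_3DayAvg,
       AVG(CASE WHEN Date >= date(:ref, '-7 days') THEN DailySales END) AS DailySales_7DayAvg
FROM SalesTable
WHERE Store = 2
  AND Date >= date(:ref, '-7 days')
  AND Date < :ref
GROUP BY id
""", {"ref": "2012-07-28"}).fetchall()
print(result)  # [(3752, 260.0, 240.0)]
```

Adding a 30-day or 60-day column is one more AVG(CASE ...) line plus widening the WHERE lower bound.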
I don't think you can do this any other way if you want to pull real-time data. However, if you can afford to display slightly outdated data, you could pre-calculate these averages (e.g. once or twice a day) for each item.
You may want to look into the Event Scheduler, which allows you to keep everything inside MySQL.

Count timestamps with offset between two databases in a view

I am trying to count timestamps across two databases, but one has timestamps that overlap the other's, due to a design flaw that isn't mine.
SELECT date(time + INTERVAL 8 HOUR) as day, COUNT(DISTINCT comment)
FROM news.data
GROUP BY day
UNION ALL
SELECT date(time + INTERVAL 8 HOUR) as day, COUNT(DISTINCT comment)
FROM`news-backup`.`data`
GROUP BY day
ORDER BY year(day) desc, day(day) DESC
LIMIT 20
What seems to happen is that some timestamps fall in the range of both databases, so they produce separate counts for the same date. For example, I get one count for today from news and another from news-backup:
EX:
date count
2013-1-15 10
2013-1-15 13
2013-1-14 8
2013-1-13 15
What I want is
EX:
date count
2013-1-15 23
2013-1-14 8
2013-1-13 15
Here's the kicker: I need this in a view, which comes with some limitations (no subqueries allowed). Thoughts? And no, I cannot change the data dump sequence that happens between the two DBs.
You can't put a subquery in a view, but you can put a view in a view.
So:
CREATE VIEW view1 AS
SELECT date(time + INTERVAL 8 HOUR) as day, 'current' as which, COUNT(DISTINCT comment) as cnt
FROM news.data
GROUP BY day
UNION ALL
SELECT date(time + INTERVAL 8 HOUR) as day, 'backup' as which, COUNT(DISTINCT comment) as cnt
FROM`news-backup`.`data`
GROUP BY day, which
I'm not sure what your logic for combining them is:
CREATE VIEW view2 AS
select day, max(cnt) AS cnt -- sum(cnt)? prefer current or backup?
from view1
group by day
ORDER BY day desc
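A sketch of the view-on-a-view approach using Python's sqlite3 as a stand-in. Two hypothetical tables, news_data and backup_data, replace the two databases, and SQLite's date(time, '+8 hours') replaces the MySQL INTERVAL arithmetic. view2 here uses SUM(); note that if the same comment exists in both sources on the same day, SUM() counts it twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news_data   (time TEXT, comment TEXT);
CREATE TABLE backup_data (time TEXT, comment TEXT);
INSERT INTO news_data   VALUES ('2013-01-15 02:00:00', 'a'), ('2013-01-15 03:00:00', 'b');
INSERT INTO backup_data VALUES ('2013-01-14 20:00:00', 'c'), ('2013-01-13 10:00:00', 'd');

-- view1: per-day counts from each source, tagged with the source name
CREATE VIEW view1 AS
SELECT date(time, '+8 hours') AS day, 'current' AS which, COUNT(DISTINCT comment) AS cnt
FROM news_data GROUP BY day
UNION ALL
SELECT date(time, '+8 hours') AS day, 'backup' AS which, COUNT(DISTINCT comment) AS cnt
FROM backup_data GROUP BY day;

-- view2: one row per day, combining both sources
CREATE VIEW view2 AS
SELECT day, SUM(cnt) AS cnt FROM view1 GROUP BY day;
""")

combined = conn.execute("SELECT day, cnt FROM view2 ORDER BY day DESC").fetchall()
print(combined)  # [('2013-01-15', 3), ('2013-01-13', 1)]
```

The backup row at 20:00 on 2013-01-14 shifts to 2013-01-15 after the +8 hours, which is exactly the overlap that the outer GROUP BY merges into a single row.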
The restriction is documented on the MySQL CREATE VIEW page (it covers subqueries in the FROM clause and applies to MySQL versions before 5.7.7). Be sure to search for "The SELECT statement cannot contain".
If you have a table of all the dates, the following "absurd" SQL might work:
select c.date,
coalesce( nullif( (select count(distinct comment) from news.data where date(time + INTERVAL 8 HOUR) = c.date), 0),
(select count(distinct comment) from `news-backup`.`data` where date(time + INTERVAL 8 HOUR) = c.date)
) as NumComments
from calendar c
This version assumes you want the count from news first, falling back to the backup when news has nothing for that day (NULLIF() turns a zero count into NULL so COALESCE() can move on). If you want the sum, add the two counts instead.
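One thing to watch with the COALESCE pattern: COUNT() returns 0, not NULL, when no rows match, so COALESCE(count_a, count_b) never falls through to the second count. The sketch below uses Python's sqlite3 with hypothetical tables that store a precomputed day column instead of doing the +8 hour conversion, and compares a bare COALESCE with COALESCE(NULLIF(count_a, 0), count_b).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news_data   (day TEXT, comment TEXT);
CREATE TABLE backup_data (day TEXT, comment TEXT);
CREATE TABLE calendar    (date TEXT);
INSERT INTO calendar    VALUES ('2013-01-14'), ('2013-01-15');
INSERT INTO news_data   VALUES ('2013-01-15', 'a');
INSERT INTO backup_data VALUES ('2013-01-14', 'b'), ('2013-01-14', 'c');
""")

# Correlated scalar subqueries (constant SQL fragments, not user input).
current = "(SELECT COUNT(DISTINCT comment) FROM news_data WHERE day = c.date)"
backup = "(SELECT COUNT(DISTINCT comment) FROM backup_data WHERE day = c.date)"

# Bare COALESCE: the news count is 0 (not NULL) on 2013-01-14, so backup is never used.
plain = conn.execute(
    f"SELECT c.date, COALESCE({current}, {backup}) FROM calendar c ORDER BY c.date"
).fetchall()

# NULLIF(..., 0) turns the zero into NULL, letting COALESCE fall back to the backup count.
fixed = conn.execute(
    f"SELECT c.date, COALESCE(NULLIF({current}, 0), {backup}) FROM calendar c ORDER BY c.date"
).fetchall()

print(plain)  # [('2013-01-14', 0), ('2013-01-15', 1)]
print(fixed)  # [('2013-01-14', 2), ('2013-01-15', 1)]
```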