MySQL Counting Distinct values when grouping by dates

I have two tables: one is a calendar table with a DATE column, and the other contains IDs and three DATEs for each ID.
Calendar Table:
dt
2016-01-01
2016-01-02
2016-01-03
2016-01-04
...
Data Table:
ID d_created d_forwarded d_solved
1 2016-01-01 2016-01-02 2016-01-03
2 2016-01-01 2016-01-02 2016-01-03
3 2016-01-02 2016-01-02 2016-01-04
4 2016-01-03 2016-01-04 2016-01-05
...
The Data Table does in reality contain a multitude of other fields, but I think that is irrelevant for my question. I have a query which selects the DATE field from the calendar table for a given range, let's say a month, and then I do a LEFT JOIN with the Data Table using all three DATE fields combined with OR, because I need to count multiple things from the Data Table, some depending on the d_created, some depending on the d_forwarded and others on d_solved:
SELECT
tc.dt AS dt,
COUNT(DISTINCT(CASE
WHEN td.id != 0
AND DATE(td.d_solved) >= '2016-01-01'
AND DATE(td.d_solved) <= '2016-01-31'
THEN td.id
ELSE NULL
END)) AS result1
.... more stmts ...
FROM calendar_table tc
LEFT JOIN data_table td ON tc.dt = DATE(td.d_created) OR tc.dt = DATE(td.d_solved) OR tc.dt = DATE(td.d_forwarded)
Now here's my problem: The query delivers the correct output, when I do not group my results by tc.dt, but as soon as I group it by tc.dt, the results are incorrect. I am by no means an SQL expert, but as far as I understand it, td.id will occur more than once due to the JOIN, and as long as I have a single result row, the DISTINCT prevents an ID from being counted twice.
I need to be able to count all ID's which have been created, solved or forwarded within my date range, and I also need the calendar table join because I would like to display each day in the range, even though there might be no matching dates in my data table for a particular day, if that makes sense.
Is there any way I can make sure that no ID is counted more than one time when grouping by days ?
I hope I could make clear what the exact problem is, if not, please let me know and I try to elaborate in more detail.
UPDATE
I tried SarathChandra's example, which looks quite promising and does indeed deliver results; however, as soon as I add more criteria to my CASE WHEN statement, it does not work the way it should. I forked and modified SarathChandra's Ideone fiddle HERE
So it should return 1 for the 2016-01-02 date but it shows a 0 ?
UPDATE 2
Unfortunately, none of the provided answers was able to solve the underlying problem. While both suggestions were appreciated a lot, I ended up splitting the query into three queries, each time joining the calendar table with the same range of dates, and then combining the arrays in PHP to a single result set.

I have made some assumptions regarding the data in arriving at the following solution:
d_created shall precede d_forwarded, which in turn shall precede d_solved.
In order to remove duplicate counts, that is, count each record only once, I am joining on the basis of the least of the three dates.
The below query seems to be working fine for me.
SELECT
tc.dt AS dt,
COUNT(
CASE WHEN DATE(td.d_created) BETWEEN '2016-01-01' AND '2016-01-31' THEN td.id
ELSE NULL END) AS `Count`
FROM calendar_table tc
LEFT JOIN data_table td ON
(tc.dt = LEAST(DATE(td.d_created), DATE(td.d_solved), DATE(td.d_forwarded)))
GROUP BY tc.dt;
UPDATE: Working example code here.

The problem here is that you join on multiple columns, so when you group on date you will, for example, get the ID 1 for '2016-01-01' (created), '2016-01-02' (forwarded) and '2016-01-03' (solved).
You could try to join to the same table 3 times and count the results in 3 columns. The sum of each column should then match the number of records.
SQL Fiddle Example
Query:
SELECT tc.dt AS dt,
COUNT(DISTINCT(CASE WHEN td_solved.id != 0
AND DATE(td_solved.d_solved) >= '2016-01-01'
AND DATE(td_solved.d_solved) <= '2016-01-31' THEN td_solved.id ELSE NULL END)) AS solved,
COUNT(DISTINCT(CASE WHEN td_created.id != 0
AND DATE(td_created.d_created) >= '2016-01-01'
AND DATE(td_created.d_created) <= '2016-01-31' THEN td_created.id ELSE NULL END)) AS created,
COUNT(DISTINCT(CASE WHEN td_forwarded.id != 0
AND DATE(td_forwarded.d_forwarded) >= '2016-01-01'
AND DATE(td_forwarded.d_forwarded) <= '2016-01-31' THEN td_forwarded.id ELSE NULL END)) AS forwarded
FROM calendar_table tc
LEFT JOIN data_table td_created ON tc.dt = DATE(td_created.d_created)
LEFT JOIN data_table td_solved ON tc.dt = DATE(td_solved.d_solved)
LEFT JOIN data_table td_forwarded ON tc.dt = DATE(td_forwarded.d_forwarded)
GROUP BY 1 WITH ROLLUP
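To make the three-join idea concrete, here is a minimal sketch that runs against the sample rows from the question. It uses Python's built-in sqlite3 module instead of MySQL so that it is self-contained; the date columns are stored as plain 'YYYY-MM-DD' text, the DATE() wrappers and WITH ROLLUP are left out, and the calendar is limited to five days. All of these are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar_table (dt TEXT PRIMARY KEY);
CREATE TABLE data_table (
    id INTEGER PRIMARY KEY,
    d_created TEXT, d_forwarded TEXT, d_solved TEXT
);
INSERT INTO calendar_table VALUES
    ('2016-01-01'), ('2016-01-02'), ('2016-01-03'), ('2016-01-04'), ('2016-01-05');
INSERT INTO data_table VALUES
    (1, '2016-01-01', '2016-01-02', '2016-01-03'),
    (2, '2016-01-01', '2016-01-02', '2016-01-03'),
    (3, '2016-01-02', '2016-01-02', '2016-01-04'),
    (4, '2016-01-03', '2016-01-04', '2016-01-05');
""")

# One LEFT JOIN per date column: each alias only ever matches on "its" date,
# so an ID cannot leak from one event type into another on the same day.
# COUNT(DISTINCT ...) removes the row multiplication caused by the other joins.
per_day = conn.execute("""
SELECT tc.dt,
       COUNT(DISTINCT c.id) AS created,
       COUNT(DISTINCT f.id) AS forwarded,
       COUNT(DISTINCT s.id) AS solved
FROM calendar_table tc
LEFT JOIN data_table c ON tc.dt = c.d_created
LEFT JOIN data_table f ON tc.dt = f.d_forwarded
LEFT JOIN data_table s ON tc.dt = s.d_solved
GROUP BY tc.dt
ORDER BY tc.dt
""").fetchall()

for row in per_day:
    print(row)
```

Each column sums to 4 over the five days, which matches the number of records, and days without any event still appear with zeros because of the LEFT JOINs.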

Related

Select ... CASE WHEN ... ORDER BY alias

Here are my tables:
Table PROGRAMME prg
prg_id
ln1_id
pmt_id
prg_commission
dep_id
Table COMMISSION com
com_id
pmt_id
ln1_id
dep_id
com_commission
Table PROMOTEUR pmt
pmt_id
pmt_txcommission
I need to get the "commission" from the table PROGRAMME.
But when it is null or empty (prg.commission), I need to get the "commission" value from the table COMMISSION (com.commission, matched on ln1_id, pmt_id and dep_id in both tables).
If there is no matching result (no row in table COMMISSION where com.ln1_id = prg.ln1_id AND com.pmt_id = prg.pmt_id AND com.dep_id = prg.dep_id), I need to get the "commission" of PROMOTEUR (pmt.commission).
I don't really have an idea how to do it in SQL. It would be easier with a PHP condition, but I have to do it in MySQL because after getting the right "commission" value for each of my programmes, I need to ORDER them ASC.
I'm not sure if I'm easy to understand (English is not my native language). Here is an example of what I tried (sadly without success), because there are a few too many conditions for me!
SELECT prg.commision AS commission, pmt.commission AS commission, com.commission AS commission
FROM (((PROGRAMME prg
LEFT JOIN LOINIVEAU1 ln1 ON ln1.ln1_id = prg.ln1_id)
LEFT JOIN PROMOTEUR pmt ON pmt.pmt_id = prg.pmt_id)
LEFT JOIN COMMISSION com ON com.pmt_id = pmt.pmt_id)
WHERE
CASE prg.comission != null
THEN prg.comission
ELSE CASE com.commission != null
THEN com.commission
ELSE pmt.commission
THEN pmt.commission
ORDER BY commission ASC
Is this what you want:
SELECT COALESCE(prg.commission, com.commission, pmt.commission) resolved_commission
FROM PROGRAMME prg
LEFT JOIN COMMISSION com
ON com.pmt_id = prg.pmt_id
AND com.ln1_id = prg.ln1_id
AND com.dep_id = prg.dep_id
LEFT JOIN PROMOTEUR pmt
ON pmt.pmt_id = prg.pmt_id
ORDER BY resolved_commission
I have taken the following steps:
Removed the LEFT JOIN to LOINIVEAU1 as this does not appear to be necessary
Updated the other LEFT JOIN conditions to reflect those in your description
Used COALESCE to return the first non-NULL value from the comma-separated columns. This replaces the CASE statement that you had incorrectly placed in the WHERE clause.
Removed the unnecessary parentheses.
As another pointer: never use != NULL; use IS NOT NULL instead, because a comparison with NULL evaluates to NULL rather than true. As it turns out, I did not need this in my solution.
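The fallback chain can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL (COALESCE and NULL comparisons behave the same way for this purpose), and the sample rows, the ids and the simplified column name commission on all three tables are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROGRAMME (prg_id INTEGER, pmt_id INTEGER, ln1_id INTEGER,
                        dep_id INTEGER, commission REAL);
CREATE TABLE COMMISSION (com_id INTEGER, pmt_id INTEGER, ln1_id INTEGER,
                         dep_id INTEGER, commission REAL);
CREATE TABLE PROMOTEUR (pmt_id INTEGER, commission REAL);

INSERT INTO PROMOTEUR VALUES (1, 5.0);
-- Programme 1 has its own commission, 2 and 3 do not.
INSERT INTO PROGRAMME VALUES (1, 1, 10, 75, 6.0), (2, 1, 11, 75, NULL), (3, 1, 12, 75, NULL);
-- Only programme 2 has a matching COMMISSION row (same pmt_id, ln1_id, dep_id).
INSERT INTO COMMISSION VALUES (1, 1, 11, 75, 4.0);
""")

# COALESCE picks the first non-NULL value: programme, then commission table, then promoteur.
resolved = conn.execute("""
SELECT prg.prg_id,
       COALESCE(prg.commission, com.commission, pmt.commission) AS resolved_commission
FROM PROGRAMME prg
LEFT JOIN COMMISSION com
       ON com.pmt_id = prg.pmt_id
      AND com.ln1_id = prg.ln1_id
      AND com.dep_id = prg.dep_id
LEFT JOIN PROMOTEUR pmt
       ON pmt.pmt_id = prg.pmt_id
ORDER BY resolved_commission
""").fetchall()

# Why "!= NULL" is a trap: the comparison yields NULL (None), not true or false.
null_cmp = conn.execute("SELECT NULL != NULL, NULL IS NOT NULL").fetchone()

print(resolved)
print(null_cmp)
```

Note that COALESCE only skips NULL. If "empty" in the question also means 0 or an empty string, those values would be returned as they are, which is what the CASE version in the update addresses.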
UPDATE
Following further information:
SELECT CASE
WHEN prg.commission > 0
AND prg.commissionstart >= CURDATE()
AND prg.commissionend < CURDATE() + INTERVAL 1 DAY
THEN prg.commission
WHEN com.commission > 0
THEN com.commission
WHEN pmt.commission > 0
THEN pmt.commission
ELSE 0 /* Whatever you want, probably 0 */
END resolved_commission
FROM PROGRAMME prg
JOIN PROGRAMME_DEPARTMENT prg_dep
ON prg_dep.prg_id = prg.id /* Guessing the JOIN here */
LEFT JOIN COMMISSION com
ON com.pmt_id = prg.pmt_id
AND com.ln1_id = prg.ln1_id
AND com.dep_id = prg_dep.dep_id
LEFT JOIN PROMOTEUR pmt
ON pmt.pmt_id = prg.pmt_id
AND pmt.commissionstart >= CURDATE()
AND pmt.commissionend < CURDATE() + INTERVAL 1 DAY
ORDER BY resolved_commission
This should get you a little closer.
In the CASE, you could replace each "x > 0" with "x IS NOT NULL AND x > 0" for added clarity.
I would also seriously consider placing all the commissions in the COMMISSION table with a start and end date, and replacing prg.commission* and pmt.commission* with a link through an intermediate table to this table. This way you can rid yourself of all the 0 values and use a LEFT JOIN with COALESCE to get the resolved_commission.
PERFORMANCE TWEAKS
Use EXPLAIN [EXTENDED] ... to see how your query is being executed and play around with composite indexing combinations of the following columns on each table:
PROGRAMME: ([id,], pmt_id, ln1_id, commission, commissionstart, commissionend)
PROGRAMME_DEPARTMENT: (pmt_id, ln1_id, dep_id)
COMMISSION: (pmt_id, ln1_id, dep_id, commission)
PROMOTEUR: (pmt_id, commission, commissionstart, commissionend)

MySQL SUM not working correctly after JOIN

I have 2 tables that look like the following:
TABLE 1 TABLE 2
user_id | date accountID | date | hours
And I'm trying to add up the hours by the week. If I use the following statement I get the correct results:
SELECT
SUM(hours) as totalHours
FROM
hours
WHERE
accountID = 244
AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY
accountID
But when I join the two tables I get a number like 336640 when it should be 12
SELECT
SUM(hours) as totalHours
FROM
hours
JOIN table1 ON
user_id = accountID
WHERE
accountID = 244
AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY
accountID
Does anyone know why this is?
EDIT: Turns out I just needed to add DISTINCT, thanks!
JOIN operations usually generate more rows in the result table: the result of a join is one row for every possible pair of rows in the two joined tables that meets the criterion in the ON clause. If there are multiple rows in table1 that match each row in hours, the result of your join will repeat hours.accountID and hours.hours many times. So, adding up the hours yields a high result.
The reason is that the table you are joining to matches multiple rows in the first table. These all get added together.
The solution is to do the aggregation in a subquery before doing the join:
select totalhours
from (SELECT SUM(hours) as totalHours
FROM hours
WHERE accountID = 244 AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY accountID
) h join
table1 t1
on t1.user_id = h.accountID;
I suspect your actual query is more complicated. For instance, table1 is not referenced in this query so the join is only doing filtering/duplication of rows. And the aggregation on hours is irrelevant when you are choosing only one account.
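The row multiplication is easy to reproduce. The sketch below uses Python's built-in sqlite3 module instead of MySQL, with invented sample rows (three entries of 4 hours each and four matching rows in table1); the data and variable names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hours (accountID INTEGER, date TEXT, hours INTEGER);
CREATE TABLE table1 (user_id INTEGER, date TEXT);
-- Three entries of 4 hours each: the true weekly total is 12.
INSERT INTO hours VALUES (244, '2014-02-03', 4), (244, '2014-02-04', 4), (244, '2014-02-05', 4);
-- Four rows for the same user, so every hours row is repeated four times by the join.
INSERT INTO table1 VALUES (244, '2014-01-01'), (244, '2014-01-02'),
                          (244, '2014-01-03'), (244, '2014-01-04');
""")

week = "h.accountID = 244 AND h.date >= '2014-02-02' AND h.date < '2014-02-09'"

# Join first, then aggregate: each hours row is counted once per matching table1 row.
naive_total = conn.execute(f"""
SELECT SUM(h.hours) FROM hours h
JOIN table1 t1 ON t1.user_id = h.accountID
WHERE {week} GROUP BY h.accountID
""").fetchone()[0]

# SUM(DISTINCT ...) collapses equal values, so it only looks right by coincidence.
distinct_total = conn.execute(f"""
SELECT SUM(DISTINCT h.hours) FROM hours h
JOIN table1 t1 ON t1.user_id = h.accountID
WHERE {week} GROUP BY h.accountID
""").fetchone()[0]

# Aggregate in a subquery first, then join: every joined row carries the correct total.
pre_aggregated = [row[0] for row in conn.execute(f"""
SELECT agg.totalHours
FROM (SELECT h.accountID, SUM(h.hours) AS totalHours
      FROM hours h WHERE {week} GROUP BY h.accountID) agg
JOIN table1 t1 ON t1.user_id = agg.accountID
""")]

print(naive_total, distinct_total, pre_aggregated)
```

With this data the join-then-sum query returns 48, SUM(DISTINCT hours) returns 4, and only the pre-aggregated version returns the expected 12. This suggests that adding DISTINCT is fragile whenever two entries can have the same number of hours.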
You should probably be specifying LEFT JOIN to be sure that it won't eliminate rows that don't match.
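Also, date BETWEEN ? AND ? is a more compact way to write date >= ? AND date <= ?. Note that BETWEEN includes both endpoints, so if date is a DATE column, the half-open range in the question (date < '2014-02-09') would need '2014-02-08' as its upper bound.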

Count by month with only two date fields - IN and OUT

Haven't been able to find an answer to this specific issue. Need a total count of inventory grouped by month on different products. Source data has date fields, one for IN and one for OUT. Total count for a specific month would include an aggregate sum of all rows with an IN date prior to specific month as long as the out date is null or a date after the specific month.
Obviously I can get a count for any given month by writing a query for count(distinct productID) with a WHERE clause stating that the IN Date be before the month I'm interested in (IE September 2012) AND the Out Date is null or after 9/2012:
Where ((in_date <= '2012-09-30') AND (out_date >= '2012-09-01' or out_date is null))
If the product was part of inventory for even one day in September, I want it to count, which is why the out date is compared against 9/1/12. Sample data below. Instead of querying for a specific month, how can I turn this:
Raw Data - Each Row Is Individual Item
InDate OutDate ProductAttr ProductID
2008-04-05 NULL Blue 101
2008-06-04 NULL Red 125
2008-01-01 2012-06-01 Blue 134
2008-12-10 2012-10-09 Red 129
2009-10-15 2012-11-01 Blue 153
2012-10-01 2013-06-01 Red 149
Into this?:
Date ProductAttr Count
2008-04 Blue 503
2008-04 Red 1002
2008-05 Blue 94
2008-05 Red 3004
2008-06 Blue 2000
2008-06 Red 322
Through grouping I can get the raw data into this format grouped by months:
InDate OutDate Value Count
2008-05 2012-05 Blue 119
2008-05 2008-06 Red 333
2008-05 2012-10 Blue 4
2008-05 NULL Red 17488
2008-06 2012-11 Blue 711
2008-06 2013-02 Red 34
If you wanted to know how many products were 'IN' as of Oct. 2012- you would sum the counts of all rows except for 2. Group on Value to keep blue and red separate. Row 2 is ruled out because OutDate is before Oct. 2012.
Thanks in advance.
EDIT:
Gordon Linoff's solution works just like I need it to. The only issue I am having now is the size and efficiency of the query, because the part I left out above is that the product attribute is actually located in a different table than the IN/OUT dates, and I also need to join a third table to limit the result to a certain type of product (ForSale, for example). I have tried two different approaches; they both work and return the same data, but both take far too long to automate this report:
select months.mon, count(distinct d.productID), d.ProductAttr
from (select '2008-10' as mon union all
select '2008-11' union all
select '2008-12' union all
select '2009-01'
) months left outer join
t
on months.mon >= date_format(t.Indate, '%Y-%m') and
(months.mon <= date_format(t.OutDate, '%Y-%m') or t.OutDate is NULL)
join x on x.product_id = t.product_id and x.type = 'ForSale'
join d on d.product_id = x.product_id and d.type = 'Attribute'
group by months.mon, d.ProductAttr;
Also tried the above without the last two joins by adding subqueries for the product attribute and where/exclusion - this seems to run about the same or a bit slower:
select months.mon, count(distinct t.productID), (select ProductAttr from d where productid = t.productID and type = 'attribute' limit 1)
from (select '2008-10' as mon union all
select '2008-11' union all
select '2008-12' union all
select '2009-01'
) months left outer join
t
on months.mon >= date_format(t.Indate, '%Y-%m') and
(months.mon <= date_format(t.OutDate, '%Y-%m') or t.OutDate is NULL)
WHERE exists (select 1 from x where x.productid = t.productID and x.type = 'ForSale')
group by months.mon, d.ProductAttr;
Any ideas to make this more efficient, given that I need to rely on 3 source tables in total (1 just for exclusion)? Thanks in advance.
You can do this by generating a list of the months that you need. The easiest way is to do this manually in MySQL (although generating the code in Excel can make this easier).
Then use a left join and aggregation to get the information you want:
select months.mon, t.ProductAttr, count(distinct t.productID)
from (select '2008-10' as mon union all
select '2008-11' union all
select '2008-12' union all
select '2009-01'
) months left outer join
t
on months.mon >= date_format(t.Indate, '%Y-%m') and
(months.mon <= date_format(t.OutDate, '%Y-%m') or t.OutDate is NULL)
group by months.mon, t.ProductAttr;
This version does all the comparisons as strings. You are working at the granularity of "month" and the format YYYY-MM does a good job for comparisons.
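As a quick check of the string-comparison approach, here is a sketch that runs the same join against the six sample rows from the question. It uses Python's built-in sqlite3 module, so strftime('%Y-%m', ...) stands in for MySQL's date_format(..., '%Y-%m'); the choice of the three months around October 2012 is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (InDate TEXT, OutDate TEXT, ProductAttr TEXT, productID INTEGER);
INSERT INTO t VALUES
    ('2008-04-05', NULL,         'Blue', 101),
    ('2008-06-04', NULL,         'Red',  125),
    ('2008-01-01', '2012-06-01', 'Blue', 134),
    ('2008-12-10', '2012-10-09', 'Red',  129),
    ('2009-10-15', '2012-11-01', 'Blue', 153),
    ('2012-10-01', '2013-06-01', 'Red',  149);
""")

# A product counts for a month when it came in during or before that month
# and either never left or left during or after that month.
# 'YYYY-MM' strings sort chronologically, so plain string comparison is enough.
counts = conn.execute("""
SELECT months.mon, t.ProductAttr, COUNT(DISTINCT t.productID)
FROM (SELECT '2012-09' AS mon UNION ALL
      SELECT '2012-10' UNION ALL
      SELECT '2012-11') months
LEFT OUTER JOIN t
       ON months.mon >= strftime('%Y-%m', t.InDate)
      AND (months.mon <= strftime('%Y-%m', t.OutDate) OR t.OutDate IS NULL)
GROUP BY months.mon, t.ProductAttr
ORDER BY months.mon, t.ProductAttr
""").fetchall()

for row in counts:
    print(row)
```

For October 2012 this counts three Red products (125, 129 and 149): 129 left during the month and 149 arrived during it, so both are included.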
EDIT:
You do need every month that you want in the output. If you have products coming in every month, then you could do:
select months.mon, t.ProductAttr, count(distinct t.productID)
from (select distinct date_format(t.InDate, '%Y-%m') as mon
from t
) months left outer join
t
on months.mon >= date_format(t.InDate, '%Y-%m') and
(months.mon <= date_format(t.OutDate, '%Y-%m') or t.OutDate is NULL)
group by months.mon, t.ProductAttr;
This pulls the months from the data.

Mysql query to calculate total cost

Hi,
I have a table listing_prices (id, listing_id, day_from, day_to, price).
I need to calculate the total cost of a holiday in MySQL because I need to sort the results by total cost.
EX:
VALUES IN TABLE
1 6 2011-04-27 2011-04-30 55,00
2 6 2011-05-01 2011-05-02 60,00
3 6 2011-05-03 2011-05-15 65,00
holiday from 2011-04-28 to 2011-05-05 total cost = 480
Without creating an actual table to represent every day from start date to end date, you could use MySQL user variables. The first query can join to any table as long as it has at least as many records as the days you are concerned with for the holiday period, in this case 8 days from April 28 to May 5. Doing a Cartesian join and limiting it to 8 rows will, in essence, create a temporary result set with one record per day, starting with 2011/04/28 (your starting date).
Then, this is joined back to your pricing table that matches the date period and sums the matching price for total costs...
select
sum( pt.price ) as TotalCosts
from
( SELECT
@r := date_add(@r, interval 1 day ) CalendarDate
FROM
-- start one day before the holiday: the first row already adds one day
(select @r := STR_TO_DATE('2011/04/27', '%Y/%m/%d')) vars,
AnyTableWithAtLeast8Days limit 8 ) JustDates,
listing_prices pt
where
JustDates.CalendarDate between pt.day_from and pt.day_to
select count(price) from listing_prices where day_from >= '2011-04-28' and day_to <= '2011-05-05'
-- This will provide a list of ids along with how many days of each price range fall within the holiday
SELECT a.id,
DATEDIFF(LEAST(a.day_to, '2011-05-05'), GREATEST(a.day_from, '2011-04-28')) + 1 AS DayCount
FROM listing_prices a
WHERE a.day_from <= '2011-05-05'
AND a.day_to >= '2011-04-28'
-- Based on the previous query, sum the price of each range times its number of days
SELECT SUM( a.price * b.DayCount ) AS Total
FROM listing_prices a
JOIN ( SELECT a.id,
DATEDIFF(LEAST(a.day_to, '2011-05-05'), GREATEST(a.day_from, '2011-04-28')) + 1 AS DayCount
FROM listing_prices a
WHERE a.day_from <= '2011-05-05'
AND a.day_to >= '2011-04-28'
) b ON a.id = b.id
Please note that this is untested. I believe the query at the top should work, but if it doesn't, it can be modified so that it does work (get the number of days within each range) and then literally copied and pasted into the subquery of the second query. The second query is the one that you will actually use.
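The "days in each range times price" idea can be checked against the example from the question. The sketch below uses Python's built-in sqlite3 module instead of MySQL, so julianday() differences stand in for DATEDIFF and the two-argument MIN/MAX stand in for LEAST/GREATEST; prices are stored as plain numbers, and the parameter names stay_start, stay_end and listing are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listing_prices (id INTEGER, listing_id INTEGER,
                             day_from TEXT, day_to TEXT, price REAL);
INSERT INTO listing_prices VALUES
    (1, 6, '2011-04-27', '2011-04-30', 55.0),
    (2, 6, '2011-05-01', '2011-05-02', 60.0),
    (3, 6, '2011-05-03', '2011-05-15', 65.0);
""")

# For every price range that overlaps the stay, clip the range to the stay,
# count the days inclusively (+ 1) and multiply by the daily price.
total = conn.execute("""
SELECT SUM(price * (julianday(MIN(day_to, :stay_end))
                    - julianday(MAX(day_from, :stay_start)) + 1))
FROM listing_prices
WHERE listing_id = :listing
  AND day_from <= :stay_end
  AND day_to >= :stay_start
""", {"stay_start": "2011-04-28", "stay_end": "2011-05-05", "listing": 6}).fetchone()[0]

print(total)
```

By hand this is 3 days at 55 (April 28 to 30), 2 days at 60 (May 1 and 2) and 3 days at 65 (May 3 to 5), that is 165 + 120 + 195 = 480, which matches the total given in the question. Because it yields one number per listing, the same expression can be used in an ORDER BY when grouping by listing_id.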

sql query to get duplicate records with different dates

I need to get records that differ only in their date field.
table Sites:
field id
reference
created
Every day we add a lot of records, so I need a function that extracts all existing records that are duplicates of the rows that were just added, in order to send some notifications.
The condition I can't get right is that the difference between the records of the current day and the old data in the table should be one to four days.
Is there a simple query to do that without using a transaction?
I'm not sure I totally understand what you mean by duplicate records, but here's a basic date query:
SELECT fieldId, reference, created, DATE(created) AS the_date
FROM Sites
-- a column alias such as the_date cannot be referenced in WHERE, so the expression is repeated
WHERE DATE(created)
BETWEEN DATE_SUB( CURDATE() , INTERVAL 3 DAY )
AND CURDATE()
I'm making several assumptions such as:
You don't want the "first" row returned
Duplicates don't carry the date forward (The next after initial 4 days is not a duplicate)
The 4 days means +4 days so Day 5 is included
So, my code is :
with originals as (
select s1.*
from sites as s1
where 0 = (
select count(*)
from sites as s2
where s1.field_id = s2.field_id
and s1.reference = s2.reference
and s1.created <> s2.created
and DATEDIFF(DAY,s2.created, s1.created) between 1 and 4
)
)
select s1.*
from sites as s1
inner join originals as o
on s1.field_id = o.field_id
and s1.reference = o.reference
and s1.created <> o.created
where DATEDIFF(DAY,o.created, s1.created) between 1 and 4
order by 1,2,3;
Here it is in a fiddle: http://sqlfiddle.com/#!3/9b407/20
This could be simpler if some conditions are relaxed.
Thanks a lot to everyone who tried to help me.
I found this solution after a lot of testing:
SELECT `id`,`reference`,count(`config_id`) as c,`created` FROM `sites`
where datediff(date(current_date()),date(`created`)) < 4
group by `reference`
having c > 1
Thanks a lot for your help.
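For comparison, here is a sketch of one possible reading of the requirement: for every row created "today", find older rows with the same reference that were created one to four days earlier. It uses Python's built-in sqlite3 module rather than MySQL (julianday() differences instead of DATEDIFF), a fixed date instead of the current date so the result is reproducible, and invented sample rows; the interpretation itself is an assumption, since the question leaves the exact rule open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (id INTEGER PRIMARY KEY, reference TEXT, created TEXT);
INSERT INTO sites VALUES
    (1, 'A', '2016-03-10'),  -- added today
    (2, 'A', '2016-03-08'),  -- 2 days older: duplicate in range
    (3, 'A', '2016-03-01'),  -- 9 days older: out of range
    (4, 'B', '2016-03-10'),  -- added today
    (5, 'B', '2016-03-05'),  -- 5 days older: out of range
    (6, 'C', '2016-03-09'),  -- no row for C today
    (7, 'D', '2016-03-10'),  -- added today
    (8, 'D', '2016-03-06');  -- 4 days older: duplicate in range
""")

# Self-join: "cur" is a row added today, "prev" is an older row with the same reference.
duplicates = conn.execute("""
SELECT cur.id, prev.id, cur.reference,
       CAST(julianday(date(cur.created)) - julianday(date(prev.created)) AS INTEGER) AS days_apart
FROM sites cur
JOIN sites prev
  ON prev.reference = cur.reference
 AND prev.id <> cur.id
WHERE date(cur.created) = :today
  AND julianday(date(cur.created)) - julianday(date(prev.created)) BETWEEN 1 AND 4
ORDER BY cur.id, prev.id
""", {"today": "2016-03-10"}).fetchall()

for row in duplicates:
    print(row)
```

Unlike the GROUP BY version, this returns each pair of new and old rows together with the distance in days, which may be more convenient for building notifications.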