MySQL - Count Yearly Totals when some Years have nulls - mysql

I have 1 table with similar data:
CustomerID | ProjectID | DateListed | DateCompleted
123456 | 045 | 07-29-2010 | 04-03-2011
123456 | 123 | 10-12-2011 | 11-30-2011
123456 | 157 | 12-12-2011 | 02-10-2012
123456 | 258 | 06-07-2011 | NULL
Basically, a customer contacts us, we get a project on our list, and we mark it completed when we're done with it.
What I'm after is a simple (you'd think, at least) count of all projects, with expected output like below:
YEAR | TotalListed | TotalCompleted
2010 | 1 | 0
2011 | 3 | 2
2012 | 0 | 1
However, my query below - because of the join - isn't showing 2012's count, because there's been no listed project for 2012. However, I can't really reverse the query, as then 2010's count wouldn't show up (since nothing was completed in 2010).
I'm open to any suggestions or tips on how to do this. I've pondered a temp table; is that the best way to go? I'm open to anything that gets me what I need!
(If the code looks familiar, y'all helped me get the subquery made! MySQL Subquery with main query data variable)
SELECT YEAR(p1.DateListed) AS YearListed, COUNT(p1.ProjectID) As Listed, PreQuery.Completed
FROM(
SELECT YEAR(DateCompleted) AS YearCompleted, COUNT(ProjectID) AS Completed
FROM projects
WHERE CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY YEAR(DateCompleted)
) PreQuery
RIGHT OUTER JOIN projects p1 ON PreQuery.YearCompleted = YEAR(p1.DateListed)
WHERE CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY YearListed
ORDER BY p1.DateListed

After reviewing your table, query, and expected results, I believe I have found a revised query that suits your needs. It is a fairly full rewrite of your existing query, but I've tested it with your given data and got the results you expect:
SELECT
years.`year`,
SUM(IF(YEAR(DateListed) = years.`year`, 1, 0)) AS TotalListed,
SUM(IF(YEAR(DateCompleted) = years.`year`, 1, 0)) AS TotalCompleted
FROM
projects
LEFT JOIN (
SELECT DISTINCT `year` FROM (
SELECT YEAR(DateListed) AS `year` FROM projects
UNION SELECT YEAR(DateCompleted) AS `year` FROM projects WHERE DateCompleted IS NOT NULL
) as year_inner
) AS years
ON YEAR(DateListed) = `year`
OR YEAR(DateCompleted) = `year`
WHERE
CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY
years.`year`
ORDER BY
years.`year`
To explain, we should start with the inner query (aliased as year_inner). It selects every year that appears in the DateListed and DateCompleted columns, and the DISTINCT list of those becomes the years sub-query. This sub-query gives us the full list of "years" that we want data for. Doing it this way, as opposed to a sub-query with counts and groupings, means you only have to define the WHERE clause on the outermost query (though if efficiency becomes an issue with many thousands of records, you could always add a WHERE clause to the inner query too, or an index on the date columns).
After we've built our inner queries, we join the projects table on the results with a LEFT JOIN for the DateListed or DateCompleted's YEAR() value - which will allow us to bring back null columns too!
For the field selections, we use the year column from our inner query to assure that we get a full list of years to display. Then, we compare the current row's DateListed & DateCompleted YEAR() value to the current year; if they're equal, add 1 - else add 0. When we GROUP BY year, our SUM() will count all of the 1's for that year for each column and give you the output you want (hopefully, of course =P).
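To see the conditional-sum idea run end to end, here is a minimal sketch using the sample rows from the question. It makes a few assumptions that are not part of the answer: an in-memory SQLite database stands in for MySQL, strftime('%Y', ...) replaces YEAR(), the dates are stored in ISO format, and the five-year window is left out because the sample dates are too old to pass it.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; strftime('%Y', ...) replaces YEAR().
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects (CustomerID INTEGER, ProjectID TEXT, "
    "DateListed TEXT, DateCompleted TEXT)"
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?, ?)",
    [
        (123456, "045", "2010-07-29", "2011-04-03"),
        (123456, "123", "2011-10-12", "2011-11-30"),
        (123456, "157", "2011-12-12", "2012-02-10"),
        (123456, "258", "2011-06-07", None),
    ],
)

# One row per year seen in either date column, then count matches per column.
yearly_totals = conn.execute(
    """
    SELECT years.year,
           SUM(CASE WHEN strftime('%Y', p.DateListed) = years.year THEN 1 ELSE 0 END),
           SUM(CASE WHEN strftime('%Y', p.DateCompleted) = years.year THEN 1 ELSE 0 END)
    FROM (
        SELECT strftime('%Y', DateListed) AS year FROM projects
        UNION
        SELECT strftime('%Y', DateCompleted) FROM projects
        WHERE DateCompleted IS NOT NULL
    ) AS years
    JOIN projects p
      ON strftime('%Y', p.DateListed) = years.year
      OR strftime('%Y', p.DateCompleted) = years.year
    WHERE p.CustomerID = ?
    GROUP BY years.year
    ORDER BY years.year
    """,
    (123456,),
).fetchall()

for year, listed, completed in yearly_totals:
    print(year, listed, completed)
```

The project with a NULL DateCompleted still joins through its DateListed year, and the NULL comparison simply falls into the ELSE 0 branch.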

Related

MySQL Query - data not showing as expected, problem in code

I am trying to query a database for the number of individuals who did not arrive for their booking on a given date. However, the results given are not as expected.
From manual checking, the results for 3rd May 2021 should be displayed as 3. I have a feeling that the customer id's are being added together with the result being displayed rather than just the count of individual customer id's.
select
count(c.CUSTOMER_ID) AS 'No Shows',
date(checkins.POSTDATE) as date
from
customers c, checkins
where
checkins.postdate >= date_sub(curdate(), interval 7 day)
and
(
c.archived = 0
and (
(
(
(
(
(
c.GUID in (
select
sb1.customer_guid
from
schedule_bookings sb1
join schedule_events se1 on sb1.course_guid = se1.course_guid
and sb1.OFFERING_ID in (
'2915911', '3022748', '3020740', '2915949',
'2914398', '2916147', '3022701',
'3020699', '2916185', '2915168',
'2916711', '3022403', '3020455',
'2916785', '2916478', '2915508',
'3022538', '3020582', '2915994',
'2914547', '2916069', '3022648',
'3020658', '2916107', '2915290',
'2928786', '2914729', '3022854',
'3020812', '2914694', '2914659',
'3041801', '2920756', '2920834',
'2920795', '2916223', '3022788',
'3020783', '2916239', '2915013'
)
and sb1.CANCELLED in ('0')
)
)
or (
c.GUID in (
select
sp.customer_guid
from
schedule_participants sp
join schedule_bookings sb2 on sp.BOOKING_ID = sb2.BOOKING_ID
join schedule_events se2 on sb2.course_guid = se2.course_guid
and sb2.OFFERING_ID in (
'2915911', '3022748', '3020740', '2915949',
'2914398', '2916147', '3022701',
'3020699', '2916185', '2915168',
'2916711', '3022403', '3020455',
'2916785', '2916478', '2915508',
'3022538', '3020582', '2915994',
'2914547', '2916069', '3022648',
'3020658', '2916107', '2915290',
'2928786', '2914729', '3022854',
'3020812', '2914694', '2914659',
'3041801', '2920756', '2920834',
'2920795', '2916223', '3022788',
'3020783', '2916239', '2915013'
)
and sb2.CANCELLED in ('0')
)
)
)
)
)
and (
(
(
not (
(
(
select
count(CHECKIN_ID)
from
checkins
where
checkins.CUSTOMER_ID = c.CUSTOMER_ID
) between 1
and 9999
)
)
)
)
)
)
)
and not c.customer_id in (1008, 283429, 2507795)
)
group by date(checkins.POSTDATE)
Here are the results:
+----------+------------+
| No Shows | date |
+----------+------------+
| 30627 | 2021-04-27 |
| 37638 | 2021-04-28 |
| 34071 | 2021-04-29 |
| 33579 | 2021-04-30 |
| 29274 | 2021-05-01 |
| 30135 | 2021-05-02 |
| 48339 | 2021-05-03 |
| 8979 | 2021-05-04 |
+----------+------------+
8 rows in set (8.71 sec)
As you can see, the count is nowhere near as intended.
The query parameters are:
Customer is a participant/bookee on the listed specific offerings (offering_id)
Customer's 'Check-in' count was not between 1 and 9999.
Display these results by count per date.
Can anyone see why this query would be not displaying the results as intended?
Kind Regards
Tom
Let's try to reverse this out some. You are dealing with a very finite set of Offering IDs. How about starting with the finite list of offerings you are concerned with and joining on from that? Additionally, there does not appear to be any need for the join to the schedule events table. If something is booked, it's booked. You are never getting any additional context from the event itself.
So, let's start with a very simplified union. You are looking at the bookings table for the possible customer IDs, then at the actual participants for those same bookings. My GUESS is that not every person doing the actual booking is a participant and, likewise, not every participant is the booking party.
None of this has to do with the actual final customer, archive status or even the events for the booking. We are just getting people - period. Once you have the people and dates, then get the counts.
select
date(CI.POSTDATE) as date,
count( JustCustomers.customer_guid ) AS 'No Shows'
from
(
select
sb1.customer_guid
from
schedule_bookings sb1
where
sb1.CANCELLED = 0
-- if the IDs are numeric, don't use quotes, which imply character data
and sb1.OFFERING_ID in
( 2915911, 3022748, 3020740, 2915949,
2914398, 2916147, 3022701, 3020699,
2916185, 2915168, 2916711, 3022403,
3020455, 2916785, 2916478, 2915508,
3022538, 3020582, 2915994, 2914547,
2916069, 3022648, 3020658, 2916107,
2915290, 2928786, 2914729, 3022854,
3020812, 2914694, 2914659, 3041801,
2920756, 2920834, 2920795, 2916223,
3022788, 3020783, 2916239, 2915013
)
UNION
select
sp.customer_guid
from
schedule_bookings sb2
JOIN schedule_participants sp
on sb2.BOOKING_ID = sp.BOOKING_ID
where
sb2.CANCELLED = 0
and sb2.OFFERING_ID in
( 2915911, 3022748, 3020740, 2915949,
2914398, 2916147, 3022701, 3020699,
2916185, 2915168, 2916711, 3022403,
3020455, 2916785, 2916478, 2915508,
3022538, 3020582, 2915994, 2914547,
2916069, 3022648, 3020658, 2916107,
2915290, 2928786, 2914729, 3022854,
3020812, 2914694, 2914659, 3041801,
2920756, 2920834, 2920795, 2916223,
3022788, 3020783, 2916239, 2915013
)
) JustCustomers
JOIN customers c
on JustCustomers.customer_guid = c.GUID
AND c.archived = 0
AND NOT c.customer_id IN (1008, 283429, 2507795)
JOIN checkins CI
on c.CUSTOMER_ID = CI.CUSTOMER_ID
AND CI.postdate >= date_sub(curdate(), interval 7 day)
group by
date(CI.POSTDATE)
The strange thing I notice though is that you are looking for "No shows", but explicitly looking for those people who DID check in. Now, if you are looking for all people who WERE SUPPOSED to be at a given event, then you are probably looking for where the customer DID NOT check in. If that is the intended case, there would be no check-in date to be associated. If that is the case, I would expect a date in some table such as the EVENT Date... such as going on a cruise, the event is when the cruise is, regardless of who makes it to the ship.
If I am way off, I would suggest you edit your existing post, provide additional detail / clarification.
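One likely contributor to the inflated numbers in the question is that `customers c, checkins` is a comma join with no join condition, so every matching customer is paired with every check-in row for each date. The sketch below illustrates that effect and the "did not check in" reading as an anti-join. Everything in it is an assumption for illustration: the two tiny tables, their rows, and the use of in-memory SQLite in place of MySQL.

```python
import sqlite3

# Assumption: hypothetical miniature tables in SQLite, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE checkins (customer_id INTEGER, postdate TEXT);
    INSERT INTO customers VALUES (1), (2), (3), (4);
    -- customers 1 and 2 checked in; 3 and 4 never did
    INSERT INTO checkins VALUES
        (1, '2021-05-03'), (2, '2021-05-03'), (1, '2021-05-04');
    """
)

# Comma join without a join condition: each never-checked-in customer is
# paired with every check-in row, so the count is a product, not a head count.
cross_join_counts = conn.execute(
    """
    SELECT date(ck.postdate), COUNT(c.customer_id)
    FROM customers c, checkins ck
    WHERE NOT EXISTS (
        SELECT 1 FROM checkins x WHERE x.customer_id = c.customer_id
    )
    GROUP BY date(ck.postdate)
    ORDER BY 1
    """
).fetchall()

# Anti-join: customers with no check-in row at all. There is no check-in
# date to group by, which is why a date from another table would be needed.
never_checked_in = conn.execute(
    """
    SELECT COUNT(*)
    FROM customers c
    LEFT JOIN checkins ck ON ck.customer_id = c.customer_id
    WHERE ck.customer_id IS NULL
    """
).fetchone()[0]

print(cross_join_counts, never_checked_in)
```

Here two customers never checked in, yet the cross-joined query reports 4 for the day with two check-in rows and 2 for the day with one.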

group by year on multiple date columns mysql

I have table as following:
hours | ... | task_assigned | task_deadline | task_completion
----------------------------------------------------------------
123 | ... | 2019-08-01 | - | -
234 | ... | - | 2018-08-01 | 2019-08-01
145 | ... | 2017-08-01 | 2017-08-01 | 2018-01-01
I want to calculate total hours for each year, i.e. grouping by year.
Currently I'm only taking into account task_completion field.
If there's no value in task_completion field, the record is not included in SUM calculation.
To elaborate further, say for year 2019, rows 1 and 2 should both be considered. Hence the total hours should be 123 + 234 = 357.
And for year 2018, row 2 and 3.
Similarly, for year 2017, row 3.
SELECT YEAR(task_completion) as year, ROUND(SUM(total_hours), 2) as hours
FROM task
GROUP BY year
HAVING year BETWEEN '$year_from' AND '$year_to'
The resultset:
year | hours
--------------------
2017 | <somevalue>
2018 | <somevalue>
2019 | <somevalue>
How can I include other two date fields too?
You want to consider each row once for each of its years. Use UNION to get these years:
select year, round(sum(total_hours), 2) as hours
from
(
select year(task_assigned) as year, total_hours from task
union
select year(task_deadline) as year, total_hours from task
union
select year(task_completion) as year, total_hours from task
) years_and_hours
group by year
having year between $year_from and $year_to
order by year;
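If a row that hits the same year two or three times should also count that many times in the sum, change UNION to UNION ALL. Note that UNION removes duplicates across the whole combined result, so two different tasks with the same year and the same total_hours would also be merged into one row.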
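The difference between the two operators can be checked against the three sample rows from the question. This is only a sketch under assumptions: SQLite in memory replaces MySQL, CAST(strftime('%Y', ...) AS INTEGER) replaces YEAR(), the "-" cells are stored as NULL, and NULL years are filtered out explicitly.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (total_hours REAL, task_assigned TEXT, "
    "task_deadline TEXT, task_completion TEXT)"
)
conn.executemany(
    "INSERT INTO task VALUES (?, ?, ?, ?)",
    [
        (123, "2019-08-01", None, None),
        (234, None, "2018-08-01", "2019-08-01"),
        (145, "2017-08-01", "2017-08-01", "2018-01-01"),
    ],
)

# {op} is filled with a fixed keyword chosen in this script, never user input.
QUERY = """
    SELECT year, SUM(total_hours)
    FROM (
        SELECT CAST(strftime('%Y', task_assigned) AS INTEGER) AS year, total_hours FROM task
        {op}
        SELECT CAST(strftime('%Y', task_deadline) AS INTEGER), total_hours FROM task
        {op}
        SELECT CAST(strftime('%Y', task_completion) AS INTEGER), total_hours FROM task
    ) AS years_and_hours
    WHERE year IS NOT NULL
    GROUP BY year
    ORDER BY year
"""

hours_union = conn.execute(QUERY.format(op="UNION")).fetchall()
hours_union_all = conn.execute(QUERY.format(op="UNION ALL")).fetchall()

print(hours_union)
print(hours_union_all)
```

The third row has 2017 in both task_assigned and task_deadline, so UNION counts its 145 hours once for 2017 while UNION ALL counts them twice.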
Basically, you want to unpivot the data. I will assume that the - represents a NULL value and your dates are real dates.
select year(dte) as year, sum(total_hours) as hours
from ((select task_assigned as dte, total_hours
from task
) union all
(select task_deadline, total_hours
from task
) union all
(select task_completion, total_hours
from task
)
) d
where dte is not null
group by year(dte)
order by year(dte);
Based on your sample data, the round() is not necessary so I removed it.
If you want to filter for particular years, the filtering should be in a where clause -- so it filters the data before aggregation.
Change the where to:
where year(dte) >= ? and year(dte) <= ?
or:
where dte >= ? and dte <= ?
to pass in the dates.
The ? are for parameter placeholders. Learn how to use parameters rather than munging query strings.
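As a rough illustration of the placeholder advice, the sketch below passes the date range as bound parameters instead of pasting values into the SQL string. The assumptions: Python's sqlite3 module (which also uses ? placeholders) stands in for a MySQL client library, the "-" cells are NULL, the variable names year_from and year_to are borrowed from the question, and the upper bound is written as a half-open range so the whole final year is included.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (total_hours REAL, task_assigned TEXT, "
    "task_deadline TEXT, task_completion TEXT)"
)
conn.executemany(
    "INSERT INTO task VALUES (?, ?, ?, ?)",
    [
        (123, "2019-08-01", None, None),
        (234, None, "2018-08-01", "2019-08-01"),
        (145, "2017-08-01", "2017-08-01", "2018-01-01"),
    ],
)

year_from, year_to = 2018, 2019

# The values travel separately from the SQL text; the driver binds them to the ?s.
hours_in_range = conn.execute(
    """
    SELECT CAST(strftime('%Y', dte) AS INTEGER) AS year, SUM(total_hours)
    FROM (
        SELECT task_assigned AS dte, total_hours FROM task
        UNION ALL
        SELECT task_deadline, total_hours FROM task
        UNION ALL
        SELECT task_completion, total_hours FROM task
    ) AS d
    WHERE dte >= ? AND dte < ?
    GROUP BY year
    ORDER BY year
    """,
    (f"{year_from}-01-01", f"{year_to + 1}-01-01"),
).fetchall()

print(hours_in_range)
```

Filtering on the date itself also drops the NULL dates, because a comparison with NULL is never true.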
This answer is no longer valid with the updated request.
If I understand correctly, you want to use task_assigned if the task_completion is still null. Use COALESCE for this.
SELECT
YEAR(COALESCE(task_completion, task_assigned)) as year,
ROUND(SUM(total_hours), 2) as hours
FROM task
GROUP BY year
HAVING year BETWEEN $year_from AND $year_to
ORDER BY year;
(I don't think you actually want to use task_deadline, too, for how could a task get completed before getting assigned first? If such can occur, then include it in the COALESCE expression. Probably: COALESCE(task_completion, task_assigned, task_deadline) then.)

merging two queries

I have a table of recipes, and I want to show a weekly value for each of them. The values are votes cast for them. My problem is that I want to make an Excel-like table with all available Fridays in my db, add a column for each recipe, and put its value for that Friday in that column, if any value exists.
Now apparently the easiest join doesn't work, so I wrote two queries: one to get all ids for my recipes and one for the values to show. The first (MySQL) query is just a select id from recipes; the second is like this:
select d.date,perc from
(SELECT date FROM weekly where YEAR(date)=2014 group by date) as d
left join weekly on d.date = weekly.date and weekly.id_rec= :idrec
Any idea how to merge those two queries? Running two queries makes everything slow down, but when I tried to merge them I didn't get the correct results.
Data:
sql fiddle
The result should be something like:
Dates | Recipe A | Recipe B | ...
Date 1 | 0.005 | 0.11 |
Date 2 | 0 | 0 |
Date 3 | 0 | 0.1 |
Note that Date 2 doesn't exist for Recipe A and B, but for some other do.
You should be able to merge the two queries like this:
SELECT recipes.id, votes.date, votes.perc FROM recipes
RIGHT JOIN
(select weekly.id_rec, d.date, perc from
(SELECT weekly.id_rec, date FROM weekly where YEAR(date) = 2014 group by date) as d left join weekly on d.date = weekly.date) as votes
ON votes.id_rec = recipes.id
SQL Fiddle

mysql nested query with 4 queries

I have an issue with MySQL queries.
I have to build a web page with charts, and I need to fetch the data from the database as follows:
1- Get the total of data received per month, per centre (meaning country) for the current year,
2- Get the total of data which has NOT been done per month, per centre for the current year,
3- Get the total of data which has not been done AND whose date exceeds 20 days, per month, per centre for the current year.
So, all in all, I'm able to fetch the data for all those queries, no problem about that.
The issue I am facing is, I need those queries embedded into 1 single query returning me a table like that:
| monthname | total | totalNotDone | totalExceed20Days |
| January | 52 | 3 | 1 |
| February | 48 | 4 | 0 |
| March | 54 | 1 | 3 |
etc.
Here is a sqlfiddle showing the issue :
edited : http://sqlfiddle.com/#!2/8cc9c/1
Any help would be greatly appreciated guys, I'm really stuck.
Your basic queries are fine. What you need to do is treat each of them as a virtual table, and LEFT JOIN them together. Then your toplevel SELECT can choose the appropriate values for your overall table.
SELECT afftotal.date,
afftotal.centre_id,
afftotal.total AS total,
af20.total AS total_20,
afempty.total AS total_empty
FROM (
/* select total of affaires per month and per centre for this year */
select month(aff_date) AS `date`,
centre_id,
count(*) AS `total`
from affaires
where year(aff_date) = 2014
group by month(aff_date), centre_id
) AS afftotal
LEFT JOIN (
/* select total of affaires per month and per centre for this year where the affaire has been done
before 20 days. */
select month(`affaires`.`aff_date`) AS `date`,
centre_id,
count(*) AS `total`
from `affaires`
where year(`affaires`.`aff_date`) = 2014
and DATEDIFF(`affaires`.`aff_date`, `affaires`.`date_creation_pdl`) > 20
group by month(`affaires`.`aff_date`), centre_id
) AS af20 ON afftotal.date = af20.date
AND afftotal.centre_id = af20.centre_id
LEFT JOIN (
/* select total of affaires where the date_creation_pdl is empty */
select month(affaires.aff_date) as `date`,
centre_id,
count(*) as total
from affaires
where date_creation_pdl is null
and year(affaires.aff_date) = 2014
group by month(affaires.aff_date), centre_id
) AS afempty ON afftotal.date = afempty.date
AND afftotal.centre_id = afempty.centre_id
ORDER BY afftotal.centre_id, afftotal.date
http://sqlfiddle.com/#!2/d563e/24/0
Notice that this is summarizing both by centre_id and date, so you can get all the centre_id values in a single query.
Notice also that the ORDER BY clause is placed at the end of the whole query.
What you have are three subqueries, three virtual tables if you will, each with three columns: a date, a centre_id, and a total. You LEFT JOIN them together ON two of those columns.
I had to muck around with your queries a bit to make them have similar column names and column data formats, so the LEFT JOIN operations have a regular structure.
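The "three virtual tables joined on two columns" shape can be tried on a few made-up rows. This is a sketch only: the rows in affaires are invented, SQLite in memory replaces MySQL, strftime replaces MONTH() and YEAR(), a julianday() difference replaces DATEDIFF(), and the COALESCE(..., 0) wrappers are an addition so that months missing from a sub-query show 0 rather than NULL.

```python
import sqlite3

# Assumption: hypothetical rows; in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE affaires (aff_date TEXT, centre_id INTEGER, date_creation_pdl TEXT)"
)
conn.executemany(
    "INSERT INTO affaires VALUES (?, ?, ?)",
    [
        ("2014-01-05", 1, "2013-12-01"),  # 35 days apart
        ("2014-01-10", 1, None),          # date_creation_pdl missing
        ("2014-01-20", 2, "2014-01-15"),  # 5 days apart
        ("2014-02-03", 1, "2014-02-01"),  # 2 days apart
    ],
)

monthly = conn.execute(
    """
    SELECT t.month, t.centre_id, t.total,
           COALESCE(a20.total, 0) AS total_20,
           COALESCE(ae.total, 0)  AS total_empty
    FROM (
        SELECT CAST(strftime('%m', aff_date) AS INTEGER) AS month,
               centre_id, COUNT(*) AS total
        FROM affaires
        WHERE strftime('%Y', aff_date) = '2014'
        GROUP BY month, centre_id
    ) AS t
    LEFT JOIN (
        SELECT CAST(strftime('%m', aff_date) AS INTEGER) AS month,
               centre_id, COUNT(*) AS total
        FROM affaires
        WHERE strftime('%Y', aff_date) = '2014'
          AND julianday(aff_date) - julianday(date_creation_pdl) > 20
        GROUP BY month, centre_id
    ) AS a20 ON a20.month = t.month AND a20.centre_id = t.centre_id
    LEFT JOIN (
        SELECT CAST(strftime('%m', aff_date) AS INTEGER) AS month,
               centre_id, COUNT(*) AS total
        FROM affaires
        WHERE strftime('%Y', aff_date) = '2014'
          AND date_creation_pdl IS NULL
        GROUP BY month, centre_id
    ) AS ae ON ae.month = t.month AND ae.centre_id = t.centre_id
    ORDER BY t.centre_id, t.month
    """
).fetchall()

for row in monthly:
    print(row)
```

Each sub-query groups by the same two columns it is joined on, which is what keeps the per-centre counts from being mixed together.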

MySQL LEFT JOIN SUM does not include 0

I guess there are many variants of this question; however, this one has a twist.
My primary table contains logged kilometers for certain dates for certain users:
Table km_run:
|entry|mnumber|dato |km | where 'dato' is the specific date. Formats are like:
|1 |3 |2013-01-01|5.7|
For a specific user ('mnumber') I want to calculate the sum in each week of a year. For this purpose I have made a 'dummy-table' just containing the week numbers from 1 to 53:
Table `week_list`:
|week|
|1 |
|2 |
etc..
This query gives the sum, however I cannot find a way to return a zero if there are no entries in 'km_run' for the specific week.
SELECT `week_list`.`week`, WEEKOFYEAR(`km_run`.`dato`), SUM(`km_run`.`km`)
FROM `week_list` LEFT JOIN `km_run` ON WEEKOFYEAR(`dato`) = `week_list`.`week`
WHERE `km_run`.`mnumber` = 3 AND `km_run`.`dato` >= '2013-01-01'
AND `km_run`.`dato` < '2014-01-01'
GROUP BY WEEKOFYEAR(`dato`)
I have tried to do COALESCE( SUM(km),0) and I have also tried to use the IFNULL function around the sum. Despite the left join, not all records from week_list are returned in the sql statement.
Here's the result:
week | WEEKOFYEAR(`km_run`.`dato`) | SUM(`km_run`.`km`)
1 | 1 | 58.4
3 | 3 | 50.7
4 | 4 | 39.2
As you can see, week two is skipped instead of returning a 0
First the JOIN does work, creating rows such as:
week=2 weekofyear=null mnumber=null sum=0 ...
Then the WHERE clause (for example, where mnumber=3) excludes the rows with nulls.
You could try something like this:
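SELECT week, SUM(km) FROM (
(SELECT km_run.km AS km, WEEKOFYEAR(km_run.dato) AS week
FROM km_run
WHERE mnumber = 3 AND km_run.dato >= '2013-01-01' AND km_run.dato < '2014-01-01')
-- UNION ALL keeps runs that share the same km and week; plain UNION would merge them
UNION ALL
(SELECT 0 AS km, week_list.week as week FROM week_list)
-- MySQL requires an alias on every derived table
) AS weekly_km GROUP BY week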
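A common alternative to the UNION approach, not taken from the answer above, is to keep the LEFT JOIN and move the filters on km_run from WHERE into the ON clause, so the unmatched weeks survive. The sketch below compares both placements. Its assumptions: invented rows, in-memory SQLite instead of MySQL, and strftime('%W', ...) as a stand-in for WEEKOFYEAR(), which numbers weeks differently.

```python
import sqlite3

# Assumption: hypothetical rows; in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE week_list (week INTEGER);
    CREATE TABLE km_run (entry INTEGER, mnumber INTEGER, dato TEXT, km REAL);
    INSERT INTO week_list VALUES (1), (2), (3), (4);
    INSERT INTO km_run VALUES
        (1, 3, '2013-01-08', 5.5),
        (2, 3, '2013-01-09', 4.5),
        (3, 3, '2013-01-22', 6.0),
        (4, 7, '2013-01-15', 9.9);  -- a different user, in week 2
    """
)

# Filter in WHERE: rows where km_run is NULL are discarded after the join.
filter_in_where = conn.execute(
    """
    SELECT w.week, COALESCE(SUM(k.km), 0)
    FROM week_list w
    LEFT JOIN km_run k
      ON CAST(strftime('%W', k.dato) AS INTEGER) = w.week
    WHERE k.mnumber = 3
    GROUP BY w.week
    ORDER BY w.week
    """
).fetchall()

# Filter in ON: unmatched weeks keep their row and sum to 0.
filter_in_on = conn.execute(
    """
    SELECT w.week, COALESCE(SUM(k.km), 0)
    FROM week_list w
    LEFT JOIN km_run k
      ON CAST(strftime('%W', k.dato) AS INTEGER) = w.week
     AND k.mnumber = 3
     AND k.dato >= '2013-01-01' AND k.dato < '2014-01-01'
    GROUP BY w.week
    ORDER BY w.week
    """
).fetchall()

print(filter_in_where)
print(filter_in_on)
```

Grouping by the column from week_list rather than by the week computed from dato is what lets the empty weeks appear in the output.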