Selecting the first date after the MIN date - mysql

This site has answered many SQL questions for me. I finally signed up to ask one of my own and get active here.
Anyway, I'm working with a table that has Date_Effective and Date_Lapse columns. A client can have multiple rows, so what I'm trying to get is the number of days between a Date_Lapse and the next Date_Effective for the same client. The date values in this table are ints that I'll convert to dates later.
The code below doesn't work. It doesn't like the second condition I'm joining on. Why can't I get it to give me the minimum Date_Effective that's greater than each row's Date_Effective? If I run it, I get no matches, because it behaves as though there are no effective dates greater than the minimum effective date.
SELECT ClientID, c1.Date_Lapse, c2.Date_Effective
FROM Fact_Episodes c1
LEFT JOIN (
SELECT ClientID, min(Date_Effective) as Date_Effective
FROM Fact_Episodes
GROUP BY ClientID
) c2
ON c1.ClientID = c2.ClientID
AND c2.Date_Effective > c1.Date_Effective

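Your subquery computes the minimum Date_Effective per client before the comparison is applied, so the condition c2.Date_Effective > c1.Date_Effective can never be true. If you want to stick with the left join, join first and aggregate afterwards: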
select c1.ClientID, c1.Date_Lapse, min(c2.Date_Effective) as Date_Effective
from Fact_Episodes c1
left join Fact_Episodes c2
on c1.ClientID = c2.ClientID
and c2.Date_Effective > c1.Date_Effective
group by c1.ClientId, c1.Date_Lapse
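To sanity-check the self-join above, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is used only so the example is self-contained; the SELECT is plain SQL and should behave the same way in MySQL. The sample rows and the YYYYMMDD integer encoding are assumptions, since the question does not say how the ints are encoded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Fact_Episodes (ClientID INT, Date_Effective INT, Date_Lapse INT)"
)
# Hypothetical sample rows; dates are assumed to be YYYYMMDD integers.
conn.executemany(
    "INSERT INTO Fact_Episodes VALUES (?, ?, ?)",
    [
        (1, 20200101, 20200131),
        (1, 20200301, 20200331),
        (1, 20200601, 20200630),
        (2, 20200115, 20200215),
    ],
)

# For each episode, find the earliest later Date_Effective of the same client.
next_effective = conn.execute(
    """
    SELECT c1.ClientID, c1.Date_Lapse, MIN(c2.Date_Effective) AS Next_Effective
    FROM Fact_Episodes c1
    LEFT JOIN Fact_Episodes c2
           ON c1.ClientID = c2.ClientID
          AND c2.Date_Effective > c1.Date_Effective
    GROUP BY c1.ClientID, c1.Date_Lapse
    ORDER BY c1.ClientID, c1.Date_Lapse
    """
).fetchall()

for row in next_effective:
    print(row)
```

The last episode of each client has no later row, so the LEFT JOIN leaves Next_Effective as NULL (None in Python) rather than dropping the row.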

Grouping COUNT by date and id from foreign table

I need to get the count of reports made by id_type and by day in the same result set.
My current query displays the total reports for each type, but doesn't separate the reports by day as well.
SELECT DATE(report.date_insert) AS date_insert, type.name, count(report.id_type) as number_of_orders
from type
left join report
on (type.id_type = report.id_type)
group by type.id_type
As you can see, the only difference between them is that I've changed the value for type.id_type = XX, but this is not an effective way to achieve my requirement.
Another important requirement is that, if there are no reports for an id_type on a day where at least one other id_type does have reports, there should still be a row with a count of zero.
I've created a fiddle with the structure and some sample data, where id_type=1 should have 0 reports, id_type=2 should have 8 reports, and id_type=3 should have 5 reports.
http://sqlfiddle.com/#!9/6ceb48/2
Thanks!
You need to join with a subquery that gets all the different dates, and then add the date to the grouping.
SELECT alldates.date_insert, type.name, IFNULL(COUNT(report.id_type), 0) AS number_of_orders
FROM (
SELECT DISTINCT DATE(date_insert) AS date_insert
FROM report) AS alldates
CROSS JOIN type
LEFT JOIN report ON type.id_type = report.id_type AND alldates.date_insert = DATE(report.date_insert)
GROUP BY alldates.date_insert, type.id_type
ORDER BY alldates.date_insert, type.name
DEMO
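The zero-fill behaviour of the CROSS JOIN approach can be checked with a small sketch like the one below. It runs against in-memory SQLite from Python purely for convenience; the table layout and sample rows are assumptions modelled on the question, not the contents of the fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE type (id_type INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE report (id_type INT, date_insert TEXT)")
# Hypothetical sample data: type A has no reports on the first day,
# type B has none on the second.
conn.executemany("INSERT INTO type VALUES (?, ?)", [(1, "A"), (2, "B")])
conn.executemany(
    "INSERT INTO report VALUES (?, ?)",
    [
        (2, "2020-01-01 10:00:00"),
        (2, "2020-01-01 11:00:00"),
        (1, "2020-01-02 09:00:00"),
    ],
)

# Every (date, type) pair is generated first, then matching reports are counted.
counts = conn.execute(
    """
    SELECT alldates.date_insert, type.name, COUNT(report.id_type) AS number_of_orders
    FROM (SELECT DISTINCT DATE(date_insert) AS date_insert FROM report) AS alldates
    CROSS JOIN type
    LEFT JOIN report
           ON type.id_type = report.id_type
          AND alldates.date_insert = DATE(report.date_insert)
    GROUP BY alldates.date_insert, type.id_type, type.name
    ORDER BY alldates.date_insert, type.name
    """
).fetchall()

for row in counts:
    print(row)
```

COUNT(report.id_type) counts only non-NULL values, so unmatched pairs from the LEFT JOIN come out as 0 on their own.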

Generating complex sql tables

I currently have an employee logging sql table that has 3 columns
fromState: String,
toState: String,
timestamp: DateTime
fromState is either In or Out. In means employee came in and Out means employee went out. Each row can only transition from In to Out or Out to In.
I'd like to generate a temporary table in SQL that tracks, hour by hour, how many employees are in the company. That is, the resulting table has the columns HourBucket and NumEmployees.
In non-SQL code I can do this by initializing numEmployees to 0, going through the table row by row (sorted by timestamp), and adding (employee came in) or subtracting (employee went out) from numEmployees, bucketed by the hour of the timestamp.
I'm clueless as to how to do this in SQL. Any clues?
Use a COUNT ... GROUP BY query. I can't see what you're using toState for from your description, though! I'm also assuming you have an employeeID field.
E.g.
SELECT fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable
INNER JOIN (SELECT employeeID AS 'empID', MAX(timestamp) AS 'latest' FROM StaffinBuildingTable GROUP BY employeeID) AS LastEntry
ON StaffinBuildingTable.employeeID = LastEntry.empID AND StaffinBuildingTable.timestamp = LastEntry.latest
GROUP BY fromState
The LastEntry subquery will produce a list of employeeIDs, each with the last timestamp for that employee.
The INNER JOIN will limit the main table to just the row matching each employee's latest timestamp.
The outer GROUP BY produces the count.
SELECT HOUR(SBT.timestamp) AS 'Hour', SBT.fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable AS SBT
INNER JOIN (
SELECT SBIJ.employeeID AS 'empID', MAX(timestamp) AS 'latest'
FROM StaffinBuildingTable AS SBIJ
WHERE DATE(SBIJ.timestamp) = CURDATE()
GROUP BY SBIJ.employeeID) AS LastEntry ON SBT.employeeID = LastEntry.empID AND SBT.timestamp = LastEntry.latest
GROUP BY SBT.fromState, HOUR(SBT.timestamp)
Replace CURDATE() with whatever date you are interested in.
Note this is non-optimal as it calculates the HOUR twice - once for the data and once for the group.
Again you are using the INNER JOIN to limit the number of returned rows, this time to the last timestamp on a given day.
To me, your description of FromState and ToState seems the wrong way round; I'd expect to be doing this based on ToState. But assuming I'm wrong on that, the following should point you in the right direction:
First, I create a "Numbers" table containing 24 rows one for each hour of the day:
create table tblHours
(Number int);
insert into tblHours values
(0),(1),(2),(3),(4),(5),(6),(7),
(8),(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),(23);
Then for each date in your employee logging table, I create a row in another new table to contain your counts:
create table tblDailyHours
(
HourBucket datetime,
NumEmployees int
);
insert into tblDailyHours (HourBucket, NumEmployees)
select distinct
date_add(date(t.timeStamp), interval h.Number HOUR) as HourBucket,
0 as NumEmployees
from
tblEmployeeLogging t
CROSS JOIN tblHours h;
Then I update this table to contain all the relevant counts:
update tblDailyHours h
join
(select
h2.HourBucket,
sum(case when el.fromState = 'In' then 1 else -1 end) as cnt
from
tblDailyHours h2
join tblEmployeeLogging el on
h2.HourBucket >= el.timeStamp
group by h2.HourBucket
) cnt ON
h.HourBucket = cnt.HourBucket
set NumEmployees = cnt.cnt;
You can now retrieve the counts with
select *
from tblDailyHours
order by HourBucket;
The counts give the number on site at each of the times displayed. If you want the number during the hour in question, we'd need to tweak this a little.
There is a working version of this code (using not very realistic data in the logging table) here: rextester.com/DYOR23344
Original Answer (based on a single overall count)
If you're happy to search over all rows, and want the current "head count" you can use this:
select
sum(case when t.FromState = 'In' then 1 else -1 end) as Heads
from
MyTable t
But if you know that there will always be no-one there at midnight, you can add a where clause to prevent it looking at more rows than it needs to:
where
date(t.timestamp) = curdate()
Again, on the assumption that the head count reaches zero at midnight, you can generalise that method to get a headcount at any time as follows:
where
date(t.timestamp) = "CENSUS DATE" AND
t.timestamp <= "CENSUS DATETIME"
Obviously you'd need to replace my quoted strings with code which returned the date and datetime of interest. If the headcount doesn't return to zero at midnight, you can achieve the same by removing the first line of the where clause.
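As an illustration of the census-time variant, the sketch below wraps the head-count query in a small Python helper and runs it against in-memory SQLite. The helper name headcount_at, the sample rows, and the reduced two-column table are assumptions made for the example; as in the answer, it takes FromState = 'In' to mean an arrival and assumes nobody is on site at midnight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (FromState TEXT, timestamp TEXT)")
# Hypothetical log for one day; 'In' is treated as an arrival, 'Out' as a departure.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [
        ("In", "2020-01-01 08:05:00"),
        ("In", "2020-01-01 08:40:00"),
        ("Out", "2020-01-01 09:10:00"),
        ("In", "2020-01-01 09:30:00"),
        ("Out", "2020-01-01 17:00:00"),
    ],
)


def headcount_at(census_datetime):
    """Return the head count at the given 'YYYY-MM-DD HH:MM:SS' moment."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(CASE WHEN t.FromState = 'In' THEN 1 ELSE -1 END), 0)
        FROM MyTable t
        WHERE date(t.timestamp) = date(?)
          AND t.timestamp <= ?
        """,
        (census_datetime, census_datetime),
    ).fetchone()
    return row[0]


print(headcount_at("2020-01-01 09:00:00"))
print(headcount_at("2020-01-01 09:15:00"))
```

The COALESCE covers the case where no rows match, since SUM over an empty set is NULL rather than 0.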

Finding First Appearing Value in a List of Duplicate Values

I have a table that stores the statuses an application goes through. Some applications go through the same status multiple times. Each time an application goes through a status, the time of the status change is recorded.
How can I pull a list of applications based on the first time an application goes through a particular status within a specified date range? Below is what I have tried thus far:
SELECT d1.STATUS,
d1.APPL_ID
FROM APP_STATUS d1
LEFT JOIN APP_STATUS d2 ON d1.APPL_ID = d2.APPL_ID AND d1.STATUS = 'AT_CUSTOMER' AND d2.STATUS = 'AT_CUSTOMER'
WHERE DATE(d1.STATUS_CREATE_DT) >= '2014-10-26'
AND DATE(d1.STATUS_CREATE_DT) <= '2014-11-25'
AND d2.STATUS IS NULL
GROUP BY d1.APPL_ID;
To get the first time an application goes through a status, try this query:
select a.appl_id, min(a.status_create_dt) as first_dt
from APP_STATUS a
where a.STATUS = 'AT_CUSTOMER' and
a.STATUS_CREATE_DT >= '2014-10-26' and
a.STATUS_CREATE_DT < date('2014-11-25') + interval 1 day
group by a.appl_id;
I think this does what you need. If you want more columns, then you can join this back to APP_STATUS.
Note that I changed the date logic a bit. The date functions are only on the constant side of the dates. This allows the query to take advantage of an index on STATUS_CREATE_DT, if appropriate.
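The half-open date range described above can be sketched as follows, again using in-memory SQLite from Python so that it runs as-is. The sample rows and the 'AT_CUSTOMER' filter are assumptions taken from the question; the exclusive upper bound is computed in Python here because SQLite has no INTERVAL syntax.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE APP_STATUS (APPL_ID INT, STATUS TEXT, STATUS_CREATE_DT TEXT)"
)
# Hypothetical status history.
conn.executemany(
    "INSERT INTO APP_STATUS VALUES (?, ?, ?)",
    [
        (1, "AT_CUSTOMER", "2014-10-27 10:00:00"),
        (1, "AT_CUSTOMER", "2014-11-03 09:00:00"),
        (2, "IN_REVIEW", "2014-11-01 12:00:00"),
        (2, "AT_CUSTOMER", "2014-11-25 23:30:00"),
        (3, "AT_CUSTOMER", "2014-11-26 00:00:00"),
    ],
)

range_start = date(2014, 10, 26)
range_end = date(2014, 11, 25)
# Exclusive upper bound: the day after the last day of interest.
upper_bound = range_end + timedelta(days=1)

# The column is compared as stored, with no function wrapped around it,
# which is what lets an index on STATUS_CREATE_DT be used.
first_seen = conn.execute(
    """
    SELECT a.APPL_ID, MIN(a.STATUS_CREATE_DT) AS first_dt
    FROM APP_STATUS a
    WHERE a.STATUS = 'AT_CUSTOMER'
      AND a.STATUS_CREATE_DT >= ?
      AND a.STATUS_CREATE_DT < ?
    GROUP BY a.APPL_ID
    ORDER BY a.APPL_ID
    """,
    (range_start.isoformat(), upper_bound.isoformat()),
).fetchall()

for row in first_seen:
    print(row)
```

Application 2's late-evening entry on the last day is included, while application 3's entry at midnight of the following day is not.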

MySQL cumulative sum grouped by date

I know there have been a few posts related to this, but my case is a little bit different and I wanted to get some help on this.
I need to pull some data out of the database that is a cumulative count of interactions by day. Currently this is what I have:
SELECT
e.Date AS e_date,
count(e.ID) AS num_interactions
FROM example AS e
JOIN example e1 ON e1.Date <= e.Date
GROUP BY e.Date;
The output of this is close to what I want but not exactly what I need.
The problem I'm having is the dates are stored with the hour minute and second that the interaction happened, so the group by is not grouping days together.
This is what the output looks like.
On 12-23 there are 5 interactions, but they're not grouped because the timestamps are different. So I need to find a way to ignore the time and just look at the day.
If I try GROUP BY DAY(e.Date) it groups the data by the day only (i.e everything that happened on the 1st of any month is grouped into one row) and the output is not what I want at all.
GROUP BY DAY(e.Date), MONTH(e.Date) is splitting it up by month and the day of the month, but again the count is off.
I'm not a MySQL expert at all, so I'm puzzled about what I'm missing.
New Answer
At first, I didn't understand you were trying to do a running total. Here is how that would look:
SET @runningTotal = 0;
SELECT
e_date,
num_interactions,
@runningTotal := @runningTotal + totals.num_interactions AS runningTotal
FROM
(SELECT
DATE(e.Date) AS e_date,
COUNT(*) AS num_interactions
FROM example AS e
GROUP BY DATE(e.Date)) totals
ORDER BY e_date;
Original Answer
You could be getting duplicates because of your join. Maybe e1 has more than one match for some rows which is inflating your count. Either that or the comparison in your join is also comparing the seconds, which is not what you expect.
Anyhow, instead of chopping the datetime field into days and months, just strip the time from it. Here is how you do that.
SELECT
DATE(e.Date) AS e_date,
count(e.ID) AS num_interactions
FROM example AS e
JOIN example e1 ON DATE(e1.Date) <= DATE(e.Date)
GROUP BY DATE(e.Date);
I figured out what I needed to do last night... but since I'm new to this I couldn't post it then... what I did that worked was this:
SELECT
DATE(e.Date) AS e_date,
count(e.ID) AS num_daily_interactions,
(
SELECT
COUNT(id)
FROM example
WHERE DATE(Date) <= e_date
) as total_interactions_per_day
FROM example AS e
GROUP BY e_date;
Would that be less efficient than your query? I may just do the calculation in Python after pulling out the count per day if that's more efficient, because this will be on the scale of thousands to hundreds of thousands of rows returned.
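For the "do it in Python" option mentioned above, a minimal sketch might look like this: let SQL produce one count per day, then accumulate client-side with itertools.accumulate. It uses in-memory SQLite and made-up timestamps only so that it runs on its own; with MySQL the same per-day SELECT would be sent through whichever driver is already in use.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (ID INTEGER PRIMARY KEY, Date TEXT)")
# Hypothetical interaction timestamps.
timestamps = [
    "2014-12-22 09:15:00",
    "2014-12-22 17:40:00",
    "2014-12-23 08:00:00",
    "2014-12-23 12:30:00",
    "2014-12-23 19:05:00",
    "2014-12-24 10:00:00",
]
conn.executemany("INSERT INTO example (Date) VALUES (?)", [(ts,) for ts in timestamps])

# One row per calendar day, ordered so the running total makes sense.
daily = conn.execute(
    """
    SELECT DATE(e.Date) AS e_date, COUNT(*) AS num_interactions
    FROM example AS e
    GROUP BY DATE(e.Date)
    ORDER BY e_date
    """
).fetchall()

# Running total over the per-day counts, computed in a single pass.
running = list(accumulate(count for _, count in daily))
cumulative = [(day, count, total) for (day, count), total in zip(daily, running)]

for row in cumulative:
    print(row)
```

This scans the table once and does a linear pass in Python, whereas a correlated subquery is re-evaluated for every output row.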

Get count on two different date columns and group by date

I have a table containing two DATE columns, TS_customer and TS_verified.
I am searching for a way to get a result where in the first column I have dates where either someone created a user (TS_customer) or someone got verified (TS_verified).
In the second column I want count(TS_customer) grouped by the first column.
The third column I want count(TS_verified) grouped by the first column.
It might be 0 customers verified on a sign up date, and in another case 0 signups on a date someone got verified.
I guess it should be an easy one, but I've spent so many hours on it now. I would really appreciate some help. I need this for a graph in Excel, so I basically want to know how many customers signed up and how many got verified on each day without the hassle of running two selects and combining them manually.
EDIT: link to SQLfiddle http://sqlfiddle.com/#!2/b14fc/1/0
Thanks
First, we need the list of days.
That looks like this http://sqlfiddle.com/#!2/b14fc/14/0:
SELECT DISTINCT days
FROM (
SELECT DISTINCT DATE(TS_customer) days
FROM customer
UNION
SELECT DISTINCT DATE(TS_verified) days
FROM customer
) AS alldays
WHERE days IS NOT NULL
ORDER BY days
Next we need a summary of customer counts by day. That's pretty easy http://sqlfiddle.com/#!2/b14fc/16/0:
SELECT DATE(TS_customer) days, COUNT(TS_customer)
FROM customer
GROUP BY days
The summary of verifications by day is similarly easy.
Next we need to join these three subqueries together http://sqlfiddle.com/#!2/b14fc/29/0.
SELECT alldays.days, custcount, verifycount
FROM (
SELECT DISTINCT DATE(TS_customer) days
FROM customer
UNION
SELECT DISTINCT DATE(TS_verified) days
FROM customer
) AS alldays
LEFT JOIN (
SELECT DATE(TS_customer) days, COUNT(TS_customer) custcount
FROM customer
GROUP BY days
) AS cust ON alldays.days = cust.days
LEFT JOIN (
SELECT DATE(TS_verified) days, COUNT(TS_verified) verifycount
FROM customer
GROUP BY days
) AS verif ON alldays.days = verif.days
WHERE alldays.days IS NOT NULL
ORDER BY alldays.days
Finally, if you want 0 displayed rather than (null) for days when there weren't any customers and/or verifications, change the SELECT line to this http://sqlfiddle.com/#!2/b14fc/30/0.
SELECT alldays.days,
IFNULL(custcount,0) AS custcount,
IFNULL(verifycount,0) AS verifycount
See how that goes? We build up your result set step by step.
I'm a bit confused about why you created a fiddle that cannot hold null values in TS_customer and then mention that the field can hold null values.
Having said that, I've modified the solution to work with null values and still be pretty efficient and simple:
SELECT days, sum(custCount) custCount, sum(verifCount) verifCount FROM (
SELECT DATE(TS_customer) days, count(*) custCount, 0 verifCount
FROM customer
WHERE TS_customer IS NOT NULL
GROUP BY days
UNION ALL
SELECT DATE(TS_verified) days, 0, count(*)
FROM customer
WHERE TS_verified IS NOT NULL
GROUP BY days
) s
GROUP BY days
I've also created a different fiddle containing some null values here.
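The UNION ALL version can be tried out with a short sketch like the following, run against in-memory SQLite from Python. The three sample customers, including one who was never verified, are made up for illustration and are not the fiddle's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, TS_customer TEXT, TS_verified TEXT)"
)
# Hypothetical rows; customer 2 signed up but was never verified.
conn.executemany(
    "INSERT INTO customer (TS_customer, TS_verified) VALUES (?, ?)",
    [
        ("2014-01-01 10:00:00", "2014-01-02 09:00:00"),
        ("2014-01-01 12:00:00", None),
        ("2014-01-03 08:00:00", "2014-01-03 18:00:00"),
    ],
)

# Each branch contributes one kind of count and a literal 0 for the other;
# the outer query adds the two branches together per day.
per_day = conn.execute(
    """
    SELECT days, SUM(custCount) AS custCount, SUM(verifCount) AS verifCount
    FROM (
        SELECT DATE(TS_customer) AS days, COUNT(*) AS custCount, 0 AS verifCount
        FROM customer
        WHERE TS_customer IS NOT NULL
        GROUP BY DATE(TS_customer)
        UNION ALL
        SELECT DATE(TS_verified), 0, COUNT(*)
        FROM customer
        WHERE TS_verified IS NOT NULL
        GROUP BY DATE(TS_verified)
    ) s
    GROUP BY days
    ORDER BY days
    """
).fetchall()

for row in per_day:
    print(row)
```

Because each branch supplies an explicit 0 for the count it does not compute, no IFNULL is needed in the outer SELECT.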