I'm having trouble with a MySQL statement that counts appointments per day within a given time period. I have a calendar table with starting and finishing columns (type = DateTime). The following statement should count all appointments for November, including appointments that span the whole month:
SELECT
COUNT('APPOINTMENTS') AS Count,
DATE(c.StartingDate) AS Datum
FROM t_calendar c
WHERE
c.GUID = 'blalblabla' AND
((DATE(c.StartingDate) <= DATE('2012-11-01 00:00:00')) AND (DATE(c.EndingDate) >= DATE('2012-11-30 23:59:59'))) OR
((DATE(c.StartingDate) >= DATE('2012-11-01 00:00:00')) AND (DATE(c.EndingDate) <= DATE('2012-11-30 23:59:59')))
GROUP BY DATE(c.StartingDate)
HAVING Count > 1
But how do I include appointments that start before a StartingDate and end on that StartingDate?
e.g.
StartingDate = 2012-11-14 17:00:00, EndingDate = 2012-11-15 08:00:00
StartingDate = 2012-11-15 09:00:00, EndingDate = 2012-11-15 10:00:00
StartingDate = 2012-11-15 11:00:00, EndingDate = 2012-11-15 12:00:00
My statement returns a count of 2 for the 15th of November. But that's wrong, because the first appointment is missing. How do I include these appointments? What am I missing: UNION SELECT, JOIN, a subselect?
A possible solution?
SELECT
c1.GUID, COUNT('APPOINTMENTS') + COUNT(DISTINCT c2.ANYFIELD) AS Count,
DATE(c1.StartingDate) AS Datum,
COUNT(DISTINCT c2.ANYFIELD)
FROM
t_calendar c1
LEFT JOIN
t_calendar c2
ON
c2.ResourceGUID = c1.ResourceGUID AND
(DATE(c2.EndingDate) = DATE(c1.StartingDate)) AND
(DATE(c2.StartingDate) < DATE(c1.StartingDate))
WHERE
((DATE(c1.StartingDate) <= DATE('2012-11-01 00:00:00')) AND (DATE(c1.EndingDate) >= DATE('2012-11-30 23:59:59'))) OR
((DATE(c1.StartingDate) >= DATE('2012-11-01 00:00:00')) AND (DATE(c1.EndingDate) <= DATE('2012-11-30 23:59:59')))
GROUP BY
c1.ResourceGUID,
DATE(c1.StartingDate)
First: Consolidate range checking
First of all, your two range conditions in the where clause can be replaced by a single one. It also seems that you're only counting appointments that either completely overlap the target date range or are completely contained within it. Partially overlapping ones aren't included, hence your question about appointments that end right on the range's starting date.
To make the where clause easier to understand, I'll simplify it as follows:
I'll use two variables to define the target range:
rangeStart (in your case 1st Nov 2012)
rangeEnd (I'll assume 1st Dec 2012 00:00:00.00000 instead)
I won't be converting datetimes to dates (using the date function) the way you did, but you can easily do that.
With these in mind, your where clause can be greatly simplified so that it covers all appointments in the given range:
...
where (c.StartingDate < rangeEnd) and (c.EndingDate >= rangeStart)
...
This will search for all appointments that fall in target range and will cover all these appointment cases:
                     start           end
target range           |==============|
partial front     |---------|
partial back                     |---------|
total overlap       |---------------------|
total containment          |-----|
Partial front/back may also barely touch your target range (what you've been after).
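The single overlap condition above can be checked outside the database with a minimal sketch. Python is used here only so the logic can be run without a MySQL server; the function name `overlaps` and the sample intervals are illustrative assumptions, not part of the original statement.

```python
from datetime import datetime

def overlaps(start, end, range_start, range_end):
    """Mirror of: (StartingDate < rangeEnd) and (EndingDate >= rangeStart)."""
    return start < range_end and end >= range_start

range_start = datetime(2012, 11, 1)
range_end = datetime(2012, 12, 1)  # exclusive end of the target range

# Hypothetical appointments, one per case in the diagram plus two misses.
cases = {
    "partial front": (datetime(2012, 10, 25, 8), datetime(2012, 11, 1, 3)),
    "partial back": (datetime(2012, 11, 30, 22), datetime(2012, 12, 5)),
    "total overlap": (datetime(2012, 10, 1), datetime(2012, 12, 31)),
    "total containment": (datetime(2012, 11, 15, 10), datetime(2012, 11, 16, 10)),
    "entirely before": (datetime(2012, 10, 15, 8), datetime(2012, 10, 20, 16)),
    "entirely after": (datetime(2012, 12, 1, 5), datetime(2012, 12, 10, 12)),
}

results = {name: overlaps(s, e, range_start, range_end)
           for name, (s, e) in cases.items()}

for name, hit in results.items():
    print(f"{name:18} {hit}")
```

The four cases from the diagram match, while appointments that lie entirely before or entirely after the range do not.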
Second: Resolving the problem
Why are you missing the first record? Simply because of your having clause, which only keeps groups that have more than 1 appointment starting on a given day: 15th Nov has two, but 14th has only one and is therefore excluded because Count = 1, which is not > 1.
To answer your second question (what am I missing?): you're not missing anything. You actually have too much in your statement, and it needs to be simplified.
Try this statement instead, which should return exactly what you're after:
select count(c.GUID) as Count,
date(c.StartingDate) as Datum
from t_calendar c
where (c.GUID = 'blabla') and
(c.StartingDate < str_to_date('2012-12-01', '%Y-%m-%d')) and
(c.EndingDate >= str_to_date('2012-11-01', '%Y-%m-%d'))
group by date(c.StartingDate)
I used the str_to_date function to make the string-to-date conversion safer.
I'm not really sure why you included having in your statement, because it's not really needed, unless your actual statement is more complex and you only included the most relevant part. In that case you'll likely have to change it to:
having Count > 0
Getting appointment count per day in any given date range
There are likely other ways as well, but the most common way is to use a numbers or calendar table, which lets you break a range into individual points (days). Then you join your appointments to this numbers table and produce the results.
I've created a SQLFiddle that does the trick. Here's what it does...
Suppose you have a numbers table Nums with numbers from 0 to x, and an appointments table Cal with your records. The following script creates these two tables and populates them with some data. The numbers only go up to 99, which is enough for 3 months' worth of data.
-- appointments
create table Cal (
Id int not null auto_increment primary key,
StartDate datetime not null,
EndDate datetime not null
);
-- create appointments
insert Cal (StartDate, EndDate)
values
('2012-10-15 08:00:00', '2012-10-20 16:00:00'),
('2012-10-25 08:00:00', '2012-11-01 03:00:00'),
('2012-11-01 12:00:00', '2012-11-01 15:00:00'),
('2012-11-15 10:00:00', '2012-11-16 10:00:00'),
('2012-11-20 08:00:00', '2012-11-30 08:00:00'),
('2012-11-30 22:00:00', '2012-12-05 00:00:00'),
('2012-12-01 05:00:00', '2012-12-10 12:00:00');
-- numbers table
create table Nums (
Id int not null primary key
);
-- add 100 numbers
insert into Nums
select a.a + (10 * b.a)
from (select 0 as a union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9) as a,
(select 0 as a union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9) as b
What you have to do now is:
Select a range of days, which you do by selecting numbers from the Nums table and converting them to dates.
Then join your appointments to those dates, so that appointments that fall on a particular day are joined to that day.
Then just group all these appointments by day and get the results.
Here's the code that does this:
-- just in case so comparisons don't trip over
set names 'latin1' collate latin1_general_ci;
-- start and end target date range
set @s := str_to_date('2012-11-01', '%Y-%m-%d');
set @e := str_to_date('2012-12-01', '%Y-%m-%d');
-- get appointment count per day within target range of days
select adddate(@s, n.Id) as Day, count(c.Id) as Appointments
from Nums n
left join Cal c
on ((date(c.StartDate) <= adddate(@s, n.Id)) and (date(c.EndDate) >= adddate(@s, n.Id)))
where adddate(@s, n.Id) < @e
group by Day;
And this is the result of this rather simple select statement:
| DAY | APPOINTMENTS |
-----------------------------
| 2012-11-01 | 2 |
| 2012-11-02 | 0 |
| 2012-11-03 | 0 |
| 2012-11-04 | 0 |
| 2012-11-05 | 0 |
| 2012-11-06 | 0 |
| 2012-11-07 | 0 |
| 2012-11-08 | 0 |
| 2012-11-09 | 0 |
| 2012-11-10 | 0 |
| 2012-11-11 | 0 |
| 2012-11-12 | 0 |
| 2012-11-13 | 0 |
| 2012-11-14 | 0 |
| 2012-11-15 | 1 |
| 2012-11-16 | 1 |
| 2012-11-17 | 0 |
| 2012-11-18 | 0 |
| 2012-11-19 | 0 |
| 2012-11-20 | 1 |
| 2012-11-21 | 1 |
| 2012-11-22 | 1 |
| 2012-11-23 | 1 |
| 2012-11-24 | 1 |
| 2012-11-25 | 1 |
| 2012-11-26 | 1 |
| 2012-11-27 | 1 |
| 2012-11-28 | 1 |
| 2012-11-29 | 1 |
| 2012-11-30 | 2 |
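As a cross-check of the numbers-table approach, the same per-day counting can be sketched in plain Python, using the Cal rows from the script above. The function name `appointments_per_day` is an assumption made for illustration; the day loop plays the role of the Nums table.

```python
from datetime import date, datetime, timedelta

# Same rows as the Cal table (StartDate, EndDate).
appointments = [
    (datetime(2012, 10, 15, 8), datetime(2012, 10, 20, 16)),
    (datetime(2012, 10, 25, 8), datetime(2012, 11, 1, 3)),
    (datetime(2012, 11, 1, 12), datetime(2012, 11, 1, 15)),
    (datetime(2012, 11, 15, 10), datetime(2012, 11, 16, 10)),
    (datetime(2012, 11, 20, 8), datetime(2012, 11, 30, 8)),
    (datetime(2012, 11, 30, 22), datetime(2012, 12, 5, 0)),
    (datetime(2012, 12, 1, 5), datetime(2012, 12, 10, 12)),
]

def appointments_per_day(appts, range_start, range_end):
    """Count appointments touching each day in [range_start, range_end)."""
    counts = {}
    day = range_start
    while day < range_end:
        # Same test as the join condition: date(start) <= day <= date(end)
        counts[day] = sum(1 for s, e in appts if s.date() <= day <= e.date())
        day += timedelta(days=1)
    return counts

per_day = appointments_per_day(appointments, date(2012, 11, 1), date(2012, 12, 1))
for day, count in per_day.items():
    print(day, count)
```

Days without appointments are kept with a count of 0, which is what the left join achieves in the SQL version.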
I have a table that includes a field with dates (call it date) and a field with a cumulative running total (call it X). Call the table SAMPLE.
Note: my data in the date field does not include weekends and holidays.
I can find the day-to-day delta by taking any chosen value in X and subtracting the value from the row above.
Here's my current query:
select
a.date,
a.X - b.X as 'Daily Total'
from SAMPLE as a
left join SAMPLE as b
on b.date = if(weekday(a.date) = 0 , a.date - interval 3 day, a.date- interval 1 day);
The problem is that the above works until I hit dates around holidays. If Monday is a holiday, the value for Tuesday is null, because a.date - interval 1 day does not exist in the table. What's the best way to solve the holiday issue?
Below are the current results:
+------------+---------------+
| date | X |
+------------+---------------+
| 2018-03-26 | -40105.00 |
| 2018-03-27 | 28470.00 |
| 2018-03-28 | 5265.00 |
| 2018-03-29 | -23010.00 |
| 2018-04-02 | NULL |
| 2018-04-03 | -24830.00 |
| 2018-04-04 | -21970.00 |
| 2018-04-05 | -9620.00 |
| 2018-04-06 | 36465.00 |
Thanks in advance!!
I would sort the table by date and assign each row a sequence number from 1 to n. I would then subtract the previous row's value from the current row's value for every row except the first. For the first row, I would just copy the value of X.
select rnk2.`date`,
case when rnk1.r1=1 and rnk2.r2=1 then rnk1.X else rnk2.X-rnk1.X end as 'Daily Total'
from (
select `date`,X,@r1:=@r1+1 as r1
from samples, (select @r1:=0) a
order by `date` ) rnk1
inner join
(select `date`,X,@r2:=@r2+1 as r2
from samples, (select @r2:=0) b
order by `date`) rnk2
on (rnk1.r1=1 and rnk2.r2=1) or (rnk1.r1+1=rnk2.r2)
order by rnk2.`date`
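The idea behind the row-number join is that each row is compared with the previous existing row, not with the previous calendar day, so gaps for weekends and holidays no longer matter. A minimal Python sketch of that logic follows; the cumulative values in `rows` are made up for illustration, since the question does not show the raw X column.

```python
from datetime import date

# Hypothetical (date, cumulative X) rows; note the gap before 2018-04-02.
rows = [
    (date(2018, 3, 29), 100.0),
    (date(2018, 4, 2), 130.0),
    (date(2018, 4, 3), 120.0),
]

def daily_totals(rows):
    """Return (date, delta) pairs, where delta is X minus the previous row's X.

    The first row has no predecessor, so its X is copied unchanged.
    """
    result = []
    previous_x = None
    for day, x in sorted(rows):
        result.append((day, x if previous_x is None else x - previous_x))
        previous_x = x
    return result

totals = daily_totals(rows)
for day, delta in totals:
    print(day, delta)
```

The row for 2018-04-02 gets a delta relative to 2018-03-29 instead of NULL.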
I have a table that registers user logs every minute, as well as other activities, with a DateTime for each user_id.
This is sample data from my table:
id | user_id | log_datetime
------------------------------------------
1 | 1 | 2016-09-25 13:01:08
2 | 1 | 2016-09-25 13:04:08
3 | 1 | 2016-09-25 13:07:08
4 | 1 | 2016-09-25 13:10:08
5 | 2 | 2016-09-25 13:11:08
6 | 1 | 2016-09-25 13:13:08
7 | 2 | 2016-09-25 13:13:09
8 | 2 | 2016-09-25 13:14:10
I would like to calculate each user's total active time on the system.
UPDATE: Expected output
For example, the total active time for user_id 1 should be 00:12:00.
I want to walk through the data from the last record to the first record within my range, subtract each log from the one after it, and sum all the differences. This simple formula should make my question clear:
SUM((T< n > - T< n-1 >) + (T< n-1 > - T< n-2 >) ... + (T< n-x > - T< n-first >))
Since the hours and seconds are the same for user_id 1, I'll calculate with the minutes only:
(13-10)+(10-7)+(7-4)+(4-1) = 12
user_id | total_hours
---------------------------------
1 | 00:12:00
2 | 00:03:02
I wrote this code:
SET @start_date = '2016-09-25';
SET @start_time = '13:00:00';
SET @end_date = '2016-09-25';
SET @end_time = '13:15:00';
SELECT
`ul1`.`user_id`, SEC_TO_TIME(SUM(TIME_TO_SEC(`ul1`.`log_datetime`))) AS total_hours
FROM
`users_logs` AS `ul1`
JOIN `users_logs` AS `ul2`
ON `ul1`.`id` = `ul2`.`id`
WHERE
`ul1`.`log_datetime` >= CONCAT(@start_date, ' ', @start_time)
AND
`ul2`.`log_datetime` <= CONCAT(@end_date, ' ', @end_time)
GROUP BY `ul1`.`user_id`
But this code sums all the time values instead of computing the differences. This is its output:
user_id | total_hours
---------------------------------
1 | 65:35:40
2 | 39:38:25
How can I calculate the sum of all the datetime differences? I then want to display each user's active hours per 12-hour period, (00:00:00 - 11:59:59) and (12:00:00 - 23:59:59), within the DateTime period selected at the beginning of the code.
So the output would look like this (just a dummy example, not from the given data):
user_id | total_hours | 00_12_am | 12_00_pm |
-------------------------------------------------------
1 | 10:10:40 | 02:05:20 | 08:05:20 |
2 | 04:10:20 | 01:05:10 | 03:05:30 |
Thank you
So you log every minute and if a user is available there is a log entry.
Then count the logs per user, so you have the number of total minutes.
select user_id, count(*) as total_minutes
from user_logs
group by user_id;
If you want them displayed as time use sec_to_time:
select user_id, sec_to_time(count(*) * 60) as total_hours
from user_logs
group by user_id;
As to conditional aggregation:
select
user_id,
count(*) as total_minutes,
count(case when hour(log_datetime) < 12 then 1 end) as total_minutes_am,
count(case when hour(log_datetime) >= 12 then 1 end) as total_minutes_pm
from user_logs
group by user_id;
UPDATE: In order to count each minute just once count distinct minutes, i.e. DATE_FORMAT(log_datetime, '%Y-%m-%d %H:%i'). This can be done with COUNT(DISTINCT ...) or with a subquery getting distinct values.
The complete query:
select
user_id,
count(*) as total_minutes,
count(case when log_hour < 12 then 1 end) as total_minutes_am,
count(case when log_hour >= 12 then 1 end) as total_minutes_pm
from
(
select distinct
user_id,
date_format(log_datetime, '%Y-%m-%d %H:%i') as log_moment,
hour(log_datetime) as log_hour
from user_logs
) log
group by user_id;
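For comparison, the formula from the question (the sum of differences between consecutive logs) can be sketched in Python on the sample rows. This is a different technique from counting distinct minutes, and the function name `active_time` is an assumption. Because the sum telescopes, each user's total equals the last log minus the first log.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (user_id, log_datetime) rows from the question's sample data.
logs = [
    (1, "2016-09-25 13:01:08"),
    (1, "2016-09-25 13:04:08"),
    (1, "2016-09-25 13:07:08"),
    (1, "2016-09-25 13:10:08"),
    (2, "2016-09-25 13:11:08"),
    (1, "2016-09-25 13:13:08"),
    (2, "2016-09-25 13:13:09"),
    (2, "2016-09-25 13:14:10"),
]

def active_time(logs):
    """Sum the gaps between consecutive logs, per user."""
    by_user = defaultdict(list)
    for user_id, stamp in logs:
        by_user[user_id].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    totals = {}
    for user_id, stamps in by_user.items():
        stamps.sort()
        gaps = (later - earlier for earlier, later in zip(stamps, stamps[1:]))
        totals[user_id] = sum(gaps, timedelta())
    return totals

totals = active_time(logs)
for user_id, total in sorted(totals.items()):
    print(user_id, total)
```

On this data the sketch reproduces the expected output from the question, 00:12:00 for user 1 and 00:03:02 for user 2. Note that it treats every gap as active time, however long the gap is.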
MySQL
Let's say there is a credit card processing company. Every time a credit card is used, a row gets inserted into a table.
create table tran(
id int,
tran_dt datetime,
card_id int,
merchant_id int,
amount int
);
One wants to know what cards have been used 3+ times in any 15 minute window at the same merchant.
My attempt:
select card_id, date(tran_dt), hour(tran_dt), merchant_id, count(*)
from tran
group by card_id, date(tran_dt), hour(tran_dt), merchant_id
having count(*)>=3
The first problem is that this gives excessive transactions per hour, not per 15-minute window. The second problem is that it does not catch transactions that cross the hour mark, i.e. at 1:59pm and 2:01pm.
To make this simpler, it would be OK to split the hour into 5-minute increments. So we would not have to check 1:00-1:15pm, 1:01-1:16pm, etc. It would be OK to check 1:00-1:15pm, 1:05-1:20pm, etc., if that is easier.
Any ideas how to fix the SQL? I have a feeling that I need SQL window functions, which are not yet available in MySQL, or a stored procedure that can look at each 15-minute block.
http://sqlfiddle.com/#!9/f2d74/1
You can convert the date/time to seconds and do arithmetic on the seconds to get the value within a 15 minute clock interval:
select card_id, min(tran_dt) as first_charge_time, merchant_id, count(*)
from tran
group by card_id, floor(to_seconds(tran_dt) / (60 * 15)), merchant_id
having count(*) >= 3;
The above uses to_seconds(). In earlier versions of MySQL, you can use unix_timestamp().
Getting any 15 minute interval is more challenging. You can express the query as:
select t1.*, count(*) as numTransactions
from tran t1 join
tran t2
on t1.merchant_id = t2.merchant_id and
t1.card_id = t2.card_id and
t2.tran_dt >= t1.tran_dt and
t2.tran_dt < t1.tran_dt + interval 15 minute
group by t1.id
having numTransactions >= 3;
Performance of this query might be problematic. An index on tran(card_id, merchant_id, tran_dt) should help a lot.
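The "any 15-minute interval" check can also be sketched as a sliding window over each card/merchant pair's sorted timestamps. The Python below is an illustration under assumptions: the function name `cards_over_limit` and its parameters are hypothetical, and the window is half-open, like `t2.tran_dt < t1.tran_dt + interval 15 minute` in the self-join.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def cards_over_limit(trans, window=timedelta(minutes=15), limit=3):
    """Return (card_id, merchant_id) pairs with `limit`+ charges inside any window.

    trans is an iterable of (tran_dt, card_id, merchant_id) tuples.
    """
    grouped = defaultdict(list)
    for tran_dt, card_id, merchant_id in trans:
        grouped[(card_id, merchant_id)].append(tran_dt)

    flagged = set()
    for key, times in grouped.items():
        times.sort()
        left = 0
        for right, current in enumerate(times):
            # Drop charges that are 15 minutes or more older than the current one.
            while current - times[left] >= window:
                left += 1
            if right - left + 1 >= limit:
                flagged.add(key)
                break
    return flagged

# Hypothetical sample: card 1 crosses the hour mark, card 2 is spread out.
sample = [
    (datetime(2015, 7, 23, 13, 59), 1, 1),
    (datetime(2015, 7, 23, 14, 1), 1, 1),
    (datetime(2015, 7, 23, 14, 5), 1, 1),
    (datetime(2015, 7, 23, 13, 0), 2, 1),
    (datetime(2015, 7, 23, 13, 10), 2, 1),
    (datetime(2015, 7, 23, 13, 15), 2, 1),
]
flagged = cards_over_limit(sample)
print(flagged)
```

Card 1 is flagged even though its charges straddle 2:00pm, while card 2 is not, because its third charge falls exactly 15 minutes after the first and so lies outside the half-open window.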
Another option might be adding an insert trigger to the tran table that checks the inserted card_id against the previous 15 minutes. If the count is 3 or more, insert the card into an "audit" table that you can query at your leisure.
-- create table to store audited cards
create table audit_cards(
card_id int,
tran_dt datetime
);
-- create trigger on tran table to catch the cards used 3 times in 15 min
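DELIMITER //
CREATE TRIGGER audit_card AFTER INSERT ON tran
FOR EACH ROW
BEGIN
  -- count this card's charges at this merchant in the last 15 minutes,
  -- including the row that was just inserted
  IF (SELECT COUNT(*)
      FROM tran
      WHERE card_id = NEW.card_id
        AND merchant_id = NEW.merchant_id
        AND tran_dt >= (NEW.tran_dt - INTERVAL 15 MINUTE)) >= 3
  THEN
    INSERT INTO audit_cards (card_id, tran_dt) VALUES (NEW.card_id, NEW.tran_dt);
  END IF;
END//
DELIMITER ;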
Then you can run a report on these cards...
select * from audit_cards;
http://dev.mysql.com/doc/refman/5.6/en/trigger-syntax.html
SELECT t1.card_id,t1.merchant_id,count(distinct t1.id)+1 as ChargeCount
FROM tran t1
INNER JOIN tran t2
on t2.card_id=t1.card_id
and t2.merchant_id=t1.merchant_id
and t2.tran_dt <= DATE_ADD(t1.tran_dt, INTERVAL 15 MINUTE)
and t2.id>t1.id
GROUP BY t1.card_id,t1.merchant_id
HAVING ChargeCount>2;
I was able to group all rows belonging to the same 15-minute window, without duplicate records in the result, using a single query.
Say your table has:
| id | tran_dt | card_id | merchant_id | amount |
|----|---------------------|---------|-------------|--------|
| 13 | 2015-07-23 16:40:00 | 1 | 1 | 10 |
| 14 | 2015-07-23 16:59:00 | 1 | 1 | 10 | <-- these should
| 15 | 2015-07-23 17:00:00 | 1 | 1 | 10 | <-- be identified
| 16 | 2015-07-23 17:01:00 | 1 | 1 | 10 | <-- in the
| 17 | 2015-07-23 17:02:00 | 1 | 1 | 10 | <-- first group
| 18 | 2015-07-23 17:03:00 | 2 | 2 | 10 |
...
| 50 | 2015-07-24 17:58:00 | 1 | 1 | 10 | <-- and these
| 51 | 2015-07-24 17:59:00 | 1 | 1 | 10 | <-- in the
| 52 | 2015-07-24 18:00:00 | 1 | 1 | 10 | <-- second
The result will be:
| id | card_id | merchant_id | numTrans | amount | dateTimeFirstTrans | dateTimeLastTrans
|----|---------|-------------|----------|--------|---------------------|---------------------
| 14 | 1 | 1 | 4 | 40 | 2015-07-23 16:59:00 | 2015-07-23 17:02:00
| 50 | 1 | 1 | 3 | 30 | 2015-07-24 17:58:00 | 2015-07-24 18:00:00
The query (SQL Fiddle):
select output.* from
(
select
min(subquery.main_id) as id,
subquery.main_card_id as card_id,
subquery.main_merchant_id as merchant_id,
count(subquery.main_id) as numTrans,
sum(subquery.main_amount) as amount,
min(subquery.x_timeFrameStart) as dateTimeFirstTrans,
max(subquery.x_timeFrameStart) as dateTimeLastTrans
from
(
select
main.id as main_id,
main.card_id as main_card_id,
main.merchant_id as main_merchant_id,
main.tran_dt as main_timeFrameStart,
main.amount as main_amount,
main.tran_dt + INTERVAL 15 MINUTE as main_timeFrameEnd,
xList.tran_dt as x_timeFrameStart,
xList.tran_dt + INTERVAL 15 MINUTE as x_timeFrameEnd
from tran as main
inner join tran as xList on /* cross list */
main.card_id = xList.card_id and
main.merchant_id = xList.merchant_id
where
xList.tran_dt between main.tran_dt and main.tran_dt + INTERVAL 15 MINUTE
) as subquery
group by subquery.main_id, subquery.main_card_id, subquery.main_merchant_id, subquery.main_timeFrameStart, subquery.main_timeFrameEnd
having count(subquery.main_id) >= 3
) as output
left join (
select
xList.id as x_id
from tran as main
inner join tran as xList on /* cross list */
main.card_id = xList.card_id and
main.merchant_id = xList.merchant_id and
main.id <> xList.id /* keep only first of the list */
where
xList.tran_dt between main.tran_dt and main.tran_dt + INTERVAL 15 MINUTE
) as exclude on output.id = exclude.x_id
where exclude.x_id is null;
The query is a bit long, and it repeats one subquery just to filter duplicates, so do your own testing and tuning to make sure you don't run into performance problems.
Here is some table for storing advertising campaign budgets history:
campaign_budgets_history
id_campaign budget date
1 10 2013-01-01
1 15 2013-01-03
1 10 2013-01-05
If there is no data for some date, its budget is equal to the last budget that was set.
How can I compute the sum of budgets over a date range, for example from '2013-01-02' to '2013-01-06'? The result must be $60, because the budget for '2013-01-02' is the same as for '2013-01-01', and the budget for '2013-01-04' is the same as for '2013-01-03'.
Is there any way to do this in SQL?
Here is a query for you. It uses user variables to denote the ends of the query range, but in the final version you'll probably want to use parameter placeholders instead. Note that @end is the first day after the range you query, i.e. it's the exclusive end of the range.
SET @begin = '2013-01-02';
SET @end = '2013-01-07';
SELECT
SUM(DATEDIFF(IF(CAST(c.end AS date) > CAST(@end AS date),
CAST(@end AS date),
CAST(c.end AS date)
),
IF(c.begin < CAST(@begin AS date),
CAST(@begin AS date),
c.begin
)
) * c.budget
) AS overall_budget
FROM
(SELECT a.id_campaign,
a.date begin,
MIN(IFNULL(b.date, CAST(@end AS date))) end,
a.budget
FROM campaign_budgets_history a
LEFT JOIN campaign_budgets_history b
ON a.id_campaign = b.id_campaign AND a.date < b.date
WHERE a.date < CAST(@end AS date)
GROUP BY a.id_campaign, a.date
HAVING end > CAST(@begin AS date)
) c;
Tested on SQL Fiddle. I'm not sure why all the casts seem necessary; perhaps there is a way to avoid some of them. But the above appears to work, and some versions with fewer casts did not.
The idea is that the subquery creates a table of ranges, each denoting the dates where a given budget was in effect. You might have to adjust the beginning of the first range, to match the beginning of your query range. Then you simply subtract the dates to obtain the number of days for each, and multiply that number by the daily budget.
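The expected result can be checked with a brute-force Python sketch that walks the range day by day and takes the most recent budget on or before each day. This is not the range-based method of the query, only a way to verify it; the function name `budget_total` is an assumption, and the sketch handles a single campaign.

```python
from datetime import date, timedelta

# (date, budget) rows for campaign 1 from the question.
history = [
    (date(2013, 1, 1), 10),
    (date(2013, 1, 3), 15),
    (date(2013, 1, 5), 10),
]

def budget_total(history, begin, end):
    """Sum the budget in effect on each day in [begin, end); end is exclusive."""
    ordered = sorted(history)
    total = 0
    day = begin
    while day < end:
        in_effect = [budget for set_on, budget in ordered if set_on <= day]
        if in_effect:  # days before the first entry contribute nothing
            total += in_effect[-1]
        day += timedelta(days=1)
    return total

total = budget_total(history, date(2013, 1, 2), date(2013, 1, 7))
print(total)
```

For 2013-01-02 through 2013-01-06 the daily budgets are 10, 15, 15, 10 and 10, which sum to the $60 stated in the question.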
This example uses a calendar (utility) table...
SELECT * FROM calendar WHERE dt BETWEEN '2012-12-27' AND '2013-01-12';
+------------+
| dt |
+------------+
| 2012-12-27 |
| 2012-12-28 |
| 2012-12-29 |
| 2012-12-30 |
| 2012-12-31 |
| 2013-01-01 |
| 2013-01-02 |
| 2013-01-03 |
| 2013-01-04 |
| 2013-01-05 |
| 2013-01-06 |
| 2013-01-07 |
| 2013-01-08 |
| 2013-01-09 |
| 2013-01-10 |
| 2013-01-11 |
| 2013-01-12 |
+------------+
SELECT SUM(budget) total
FROM campaign_budgets_history a
JOIN
( SELECT MAX(y.date) max_date
FROM calendar x
JOIN campaign_budgets_history y
ON y.date <= x.dt
WHERE x.dt BETWEEN '2013-01-02' AND '2013-01-06'
GROUP
BY x.dt
) b
ON b.max_date = a.date;
I have a database table which holds each user's checkins in cities. I need to know how many days a user has been in a city, and then, how many visits a user has made to a city (a visit consists of consecutive days spent in a city).
So, consider I have the following table (simplified, containing only the DATETIMEs - same user and city):
datetime
-------------------
2011-06-30 12:11:46
2011-07-01 13:16:34
2011-07-01 15:22:45
2011-07-01 22:35:00
2011-07-02 13:45:12
2011-08-01 00:11:45
2011-08-05 17:14:34
2011-08-05 18:11:46
2011-08-06 20:22:12
The number of days this user has been to this city would be 6 (30.06, 01.07, 02.07, 01.08, 05.08, 06.08).
I thought of doing this using SELECT COUNT(id) FROM table GROUP BY DATE(datetime)
Then, for the number of visits this user has made to this city, the query should return 3 (30.06-02.07, 01.08, 05.08-06.08).
The problem is that I have no idea how to build this query.
Any help would be highly appreciated!
You can find the first day of each visit by finding checkins where there was no checkin the day before.
select count(distinct date(start_of_visit.datetime))
from checkin start_of_visit
left join checkin previous_day
on start_of_visit.user = previous_day.user
and start_of_visit.city = previous_day.city
and date(start_of_visit.datetime) - interval 1 day = date(previous_day.datetime)
where previous_day.id is null
There are several important parts to this query.
First, each checkin is joined to any checkin from the previous day. But since it's an outer join, if there was no checkin the previous day the right side of the join will have NULL results. The WHERE filtering happens after the join, so it keeps only those checkins from the left side where there are none from the right side. LEFT OUTER JOIN/WHERE IS NULL is really handy for finding where things aren't.
Then it counts distinct checkin dates to make sure it doesn't double-count if the user checked in multiple times on the first day of the visit. (I actually added that part on edit, when I spotted the possible error.)
Edit: I just re-read your proposed query for the first question. Your query would get you the number of checkins on a given date, instead of a count of dates. I think you want something like this instead:
select count(distinct date(datetime))
from checkin
where user='some user' and city='some city'
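Both counts can be sketched in Python on the timestamps from the question, using the same rule as the anti-join: a day starts a visit when there is no checkin on the day before. The function name `days_and_visits` is an assumption, and the data covers a single user and city.

```python
from datetime import datetime, timedelta

# Checkins from the question (same user, same city).
checkins = [
    "2011-06-30 12:11:46",
    "2011-07-01 13:16:34",
    "2011-07-01 15:22:45",
    "2011-07-01 22:35:00",
    "2011-07-02 13:45:12",
    "2011-08-01 00:11:45",
    "2011-08-05 17:14:34",
    "2011-08-05 18:11:46",
    "2011-08-06 20:22:12",
]

def days_and_visits(stamps):
    """Return (distinct days, visits) for one user in one city."""
    days = {datetime.strptime(s, "%Y-%m-%d %H:%M:%S").date() for s in stamps}
    # A visit starts on any day whose previous day has no checkin.
    visits = sum(1 for day in days if day - timedelta(days=1) not in days)
    return len(days), visits

number_of_days, number_of_visits = days_and_visits(checkins)
print(number_of_days, number_of_visits)
```

This gives 6 days and 3 visits, matching the numbers worked out in the question.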
Try to apply this code to your task -
CREATE TABLE visits(
user_id INT(11) NOT NULL,
dt DATETIME DEFAULT NULL
);
INSERT INTO visits VALUES
(1, '2011-06-30 12:11:46'),
(1, '2011-07-01 13:16:34'),
(1, '2011-07-01 15:22:45'),
(1, '2011-07-01 22:35:00'),
(1, '2011-07-02 13:45:12'),
(1, '2011-08-01 00:11:45'),
(1, '2011-08-05 17:14:34'),
(1, '2011-08-05 18:11:46'),
(1, '2011-08-06 20:22:12'),
(2, '2011-08-30 16:13:34'),
(2, '2011-08-31 16:13:41');
SET @i = 0;
SET @last_dt = NULL;
SET @last_user = NULL;
SELECT v.user_id,
COUNT(DISTINCT(DATE(dt))) number_of_days,
MAX(days) number_of_visits
FROM
(SELECT user_id, dt,
@i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
@last_dt := DATE(dt),
@last_user := user_id
FROM
visits
ORDER BY
user_id, dt
) v
GROUP BY
v.user_id;
----------------
Output:
+---------+----------------+------------------+
| user_id | number_of_days | number_of_visits |
+---------+----------------+------------------+
| 1 | 6 | 3 |
| 2 | 2 | 1 |
+---------+----------------+------------------+
Explanation:
To understand how it works let's check the subquery, here it is.
SET @i = 0;
SET @last_dt = NULL;
SET @last_user = NULL;
SELECT user_id, dt,
@i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
@last_dt := DATE(dt) lt,
@last_user := user_id lu
FROM
visits
ORDER BY
user_id, dt;
As you can see, the query returns all rows and ranks them by visit number. This is a well-known ranking technique based on variables; note that the rows are ordered by the user and date fields. The query calculates the user visits and outputs the following data set, where the days column holds the rank, i.e. the visit number -
+---------+---------------------+------+------------+----+
| user_id | dt | days | lt | lu |
+---------+---------------------+------+------------+----+
| 1 | 2011-06-30 12:11:46 | 1 | 2011-06-30 | 1 |
| 1 | 2011-07-01 13:16:34 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-01 15:22:45 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-01 22:35:00 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-02 13:45:12 | 1 | 2011-07-02 | 1 |
| 1 | 2011-08-01 00:11:45 | 2 | 2011-08-01 | 1 |
| 1 | 2011-08-05 17:14:34 | 3 | 2011-08-05 | 1 |
| 1 | 2011-08-05 18:11:46 | 3 | 2011-08-05 | 1 |
| 1 | 2011-08-06 20:22:12 | 3 | 2011-08-06 | 1 |
| 2 | 2011-08-30 16:13:34 | 1 | 2011-08-30 | 2 |
| 2 | 2011-08-31 16:13:41 | 1 | 2011-08-31 | 2 |
+---------+---------------------+------+------------+----+
Then we group this data set by user and use aggregate functions:
'COUNT(DISTINCT(DATE(dt)))' - counts the number of days
'MAX(days)' - the number of visits, it is a maximum value for the days field from our subquery.
That is all;)
Using the sample data provided by Devart, the inner "PreQuery" works with SQL variables. By defaulting @LUser to -1 (a probably non-existent user ID), the IF() test checks for any difference between the last user and the current one. As soon as there is a new user, NextVisit gets a value of 1. Additionally, if the last date is more than 1 day before the new check-in date, it gets a value of 1. The subsequent columns then reset @LUser and @LDate to the values of the record just tested, ready for the next cycle. Finally, the outer query sums and counts them for the final results, which for the Devart data set are:
User ID    Distinct Visits    Total Days
1          3                  9
2          1                  2
select PreQuery.User_ID,
sum( PreQuery.NextVisit ) as DistinctVisits,
count(*) as TotalDays
from
( select v.user_id,
if( @LUser <> v.User_ID OR @LDate < ( date( v.dt ) - Interval 1 day ), 1, 0 ) as NextVisit,
@LUser := v.user_id,
@LDate := date( v.dt )
from
Visits v,
( select @LUser := -1, @LDate := date(now()) ) AtVars
order by
v.user_id,
v.dt ) PreQuery
group by
PreQuery.User_ID
For the first sub-task:
select count(*)
from (
select TO_DAYS(p.d)
from p
group by TO_DAYS(p.d)
) t
I think you should consider changing the database structure. You could add a visits table and a visit_id column in your checkins table. Each time you want to register a new checkin, you check whether there is a checkin from the day before. If there is, you add the new checkin with the visit_id from yesterday's checkin. If not, you add a new visit to visits and a new checkin with the new visit_id.
Then you could get your data in one query, with something like this:
SELECT COUNT(id) AS number_of_days, COUNT(DISTINCT visit_id) number_of_visits FROM checkin GROUP BY user, city
It's not optimal, but it's still better than working with the current structure, and it will work. Also, if the results can come from separate queries, it will be very fast.
Of course, the drawbacks are that you will need to change the database structure, do some more scripting, and convert the current data to the new structure (i.e. you will need to add visit_id to the existing data).