Find longest consecutive time interval per object - mysql

There are Merchants and they can submit Claims.
I need to find the longest time period during which a Merchant had at least 1 claim. So a time period (in fractions of a day, whatever) per merchant_id.
So, for example:
+-------------+-----------+----------------------+----------------------+
| merchant_id | claim_id | from | to |
+-------------+-----------+----------------------+----------------------+
| 1 | 11 | 2016-08-15 12:00:00 | 2016-08-17 12:00:00 |
| 1 | 22 | 2016-08-16 12:00:00 | 2016-08-18 12:00:00 |
| 1 | 33 | 2016-08-19 12:00:00 | 2016-08-20 12:00:00 |
| 2 | 66 | 2016-08-15 12:00:00 | 2016-08-17 12:00:00 |
| 2 | 67 | 2016-08-18 12:00:00 | 2016-08-19 12:00:00 |
+-------------+-----------+----------------------+----------------------+
For merchant_id = 1 it would be 3 days.
For merchant_id = 2 it would be 2 days.
How do I do that?

Doing this in MySQL alone is quite complex. I've tried it for one particular merchant_id, and I am still not sure it is 100% right without checking it against different sets of inputs.
But you can give this a try, and I can explain the logic behind it afterwards.
SELECT
    firstTable.merchant_id,
    MAX(TIMESTAMPDIFF(DAY,firstTable.from,secondTable.to)) AS maxConsecutiveDays
FROM
(
    SELECT
        A.merchant_id,
        A.from,
        @rn1 := @rn1 + 1 AS row_number
    FROM merchants A
    CROSS JOIN (SELECT @rn1 := 0) var
    WHERE A.merchant_id = 2
    AND NOT EXISTS (
        SELECT 1 FROM merchants B WHERE B.merchant_id = A.merchant_id AND A.idt <> B.idt AND A.`from` BETWEEN B.from AND B.to
    )
    ORDER BY A.from
) AS firstTable
INNER JOIN (
    SELECT
        A.merchant_id,
        A.to,
        @rn2 := @rn2 + 1 AS row_number
    FROM merchants A
    CROSS JOIN (SELECT @rn2 := 0) var
    WHERE A.merchant_id = 2
    AND NOT EXISTS (
        SELECT 1 FROM merchants B WHERE B.merchant_id = A.merchant_id AND A.idt <> B.idt AND A.to BETWEEN B.from AND B.to
    )
    ORDER BY A.to
) AS secondTable
ON firstTable.row_number = secondTable.row_number;
WORKING DEMO
Algorithm:
Consider the following steps for a particular merchant_id:
1. Find all the start points that are not inside any other range. I call these independent start points. Say they are stored in a set S.
2. Find all the end points that are not inside any other range. These are independent end points and are stored in a set E.
3. Sort both sets in ascending order of time.
4. Give a rank to every element of each set, starting from 1.
5. Join the two sets on matching rank.
6. Enumerate the two sets simultaneously and get the difference in days for each pair, then find the maximum of these differences.
The last step can be illustrated by the following code snippet:
int maxDiff = 0;
for (int i = 0; i < E.size(); i++) {
    if (E.get(i) - S.get(i) > maxDiff) {
        maxDiff = E.get(i) - S.get(i);
    }
}
And maxDiff is your output.
EDIT:
In order to get longest consecutive days for each merchant check this DEMO
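For comparison, here is a minimal Python sketch of the same idea outside the database. It is not the SQL approach from this answer: it uses a plain sort-and-merge over the intervals, which should give the same longest covered period. The function name `longest_covered_period` and the helper `ts` are hypothetical, and the data is the sample from the question.

```python
from datetime import datetime, timedelta


def longest_covered_period(intervals):
    """Return the longest continuous span covered by (start, end) pairs."""
    longest = timedelta(0)
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap: the previous merged range is finished, start a new one.
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged range.
            cur_end = max(cur_end, end)
        longest = max(longest, cur_end - cur_start)
    return longest


def ts(text):
    """Parse a timestamp in the format used in the question (helper, assumed)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Sample claims from the question, keyed by merchant_id.
claims = {
    1: [
        (ts("2016-08-15 12:00:00"), ts("2016-08-17 12:00:00")),
        (ts("2016-08-16 12:00:00"), ts("2016-08-18 12:00:00")),
        (ts("2016-08-19 12:00:00"), ts("2016-08-20 12:00:00")),
    ],
    2: [
        (ts("2016-08-15 12:00:00"), ts("2016-08-17 12:00:00")),
        (ts("2016-08-18 12:00:00"), ts("2016-08-19 12:00:00")),
    ],
}

longest_by_merchant = {m: longest_covered_period(iv) for m, iv in claims.items()}
```

Because the result is a `timedelta`, fractions of a day are preserved rather than truncated as `TIMESTAMPDIFF(DAY, ...)` would do.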

Related

MySQL: how to assign same ID for records with close timestamp

I have a MySQL table with a timestamp column t. I need to create another integer column (groupId) which will have the same value for records whose timestamps are less than 3 seconds apart. My version of MySQL has no window function support. This is the expected output in the 2nd column:
+---------------------+---------+
| t                   | groupId |
+---------------------+---------+
| 2017-06-17 18:15:13 |       1 |
| 2017-06-17 18:15:14 |       1 |
| 2017-06-17 20:30:06 |       2 |
| 2017-06-17 20:30:07 |       2 |
| 2017-06-17 22:44:58 |       3 |
| 2017-06-17 22:44:59 |       3 |
| 2017-06-17 23:59:50 |       4 |
| 2017-06-17 23:59:51 |       4 |
+---------------------+---------+
I tried to use a self-join with TIMESTAMPDIFF(SECOND,t1,t2) < 3,
but I do not know how to generate the unique groupId.
P.S.
It is guaranteed by the nature of the data that there is no continuous range which spans more than 3 seconds.
You can do this using variables.
select tm
      ,@diff:=timestampdiff(second,@prev,tm)
      ,@prev:=tm
      ,@grp:=case when @diff<3 or @diff is null then @grp else @grp+1 end as groupID
from t
cross join (select @prev:='',@diff:=0,@grp:=1) r
order by tm
For this, I believe you need to create a stored procedure that first sorts your table by the column t (timestamp) and then walks through it, grouping rows and assigning the groupId as it goes. In that case you can use your own counter as the groupId.
What is important here is how you split the time into frames of 2 seconds: you could end up with different results depending on your point of reference.
This query puts a record in the same group as the previous record when the previous record is at most 3 seconds earlier:
UPDATE t
JOIN (
    SELECT
        t.*
        , @gid := IF(TIMESTAMPDIFF(SECOND, @prev, t) > 3, @gid + 1, @gid) AS gid
        , @prev := t
    FROM t
    , (SELECT @prev := NULL, @gid := 1) v
    ORDER BY t
) sq ON t.t = sq.t
SET t.groupId = sq.gid;
see it working live in an sqlfiddle
learn more about user-defined variables here
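The variable-based queries above amount to a single pass over the sorted timestamps, bumping a counter whenever the gap to the previous row is too large. A minimal Python sketch of that pass follows; the function name `assign_group_ids` and the `max_gap` parameter are hypothetical, and the threshold follows the question ("less than 3 seconds" stays in the same group).

```python
from datetime import datetime, timedelta


def assign_group_ids(timestamps, max_gap=timedelta(seconds=3)):
    """Return (timestamp, group_id) pairs.

    A new group starts whenever the gap to the previous timestamp
    is max_gap or more; smaller gaps stay in the current group.
    """
    result = []
    group_id = 0
    prev = None
    for t in sorted(timestamps):
        if prev is None or t - prev >= max_gap:
            group_id += 1
        result.append((t, group_id))
        prev = t
    return result


# Sample timestamps from the question.
stamps = [
    datetime(2017, 6, 17, 18, 15, 13),
    datetime(2017, 6, 17, 18, 15, 14),
    datetime(2017, 6, 17, 20, 30, 6),
    datetime(2017, 6, 17, 20, 30, 7),
    datetime(2017, 6, 17, 22, 44, 58),
    datetime(2017, 6, 17, 22, 44, 59),
    datetime(2017, 6, 17, 23, 59, 50),
    datetime(2017, 6, 17, 23, 59, 51),
]

group_ids = [gid for _, gid in assign_group_ids(stamps)]
```

Note that this compares each row with its immediate predecessor, so a long chain of closely spaced rows would end up in one group; the question states that such chains do not occur in the data.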
This query will work in Oracle sql:
select *
from (
select e.*,
rank() over (partition by trunc(hiredate,'mi') order by trunc(hiredate,'mi') desc) MINu
from emp e
)

MySQL SUM previous row by date column using Union

I am hoping I am just stumped because it's the end of the work day on a Monday, and someone here can give me a hand.
Basically I have 2 tables that have invoice information and a table that has payment information. Using the following I get the first part of my display.
SELECT d.id, i.id as invid, i.company_id, d.total, created, adjustment FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
WHERE company_id = '69350'
UNION
SELECT id, 0, comp_id, amount_paid, uploaded_date, 'paid' FROM tbl_finance_invoice_paid_items
WHERE comp_id = '69350'
ORDER BY created
What I want to do is:
Create a new column called "Balance" that adds total to the previous total by the created column regardless of how the rest of the table is sorted.
To give a quick example, my current output is something like:
id | invid | company_id | total | created | adjustment
12 | 16 | 1 | 40 | 01/01/16| 0
100| 0 | 1 | 10 | 01/05/16| 0
50 | 20 | 1 | 50 | 05/01/16| 0
What my goal is would be:
id | invid | company_id | total | created | adjustment | balance |Notes
12 | 16 | 1 | 40 | 01/01/16| 0 | 40 | 0 + 40
100| 0 | 1 | 10 | 01/05/16| 1 | 50 | 40 + 10
50 | 20 | 1 | 50 | 05/01/16| 0 | 100 | 50 + 50
And regardless of sorting by id, invid, total, created, etc, the balance would always be tied to the created date.
So if I added a "Where adjustment = '1'" to my sql, I would get:
100| 0 | 1 | 10 | 01/05/16| 1 | 50 | 40 + 10
Since the OP confirmed my understanding in comments, I'm basing my answer on the following assumption:
The running total would be tied to the order of created_date. The
running total would only be affected by company id as a filtering
criterion, all other filters should be disregarded for that
calculation.
Since the running total may have a different order by and different filtering criteria than the rest of the query, the running total calculation has to be placed in a subquery.
The other assumption I have to make is that there cannot be more than one invoice with the same created date for a single company id, since the original query in the OP does not have any group by or summing either.
I prefer to use the approach suggested by @OMG Ponies in this post on SO, where he initializes the MySQL variable holding the running total in a subquery, so there is no need to initialize the variable in a separate set statement.
SELECT d.id, i.id as invid, i.company_id, rt.total, rt.cumulative_sum, rt.created, adjustment
FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
LEFT JOIN
    (SELECT d.total, created, @running_total := @running_total + d.total AS cumulative_sum
     FROM tbl_finance_invoices as i
     LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
     JOIN (SELECT @running_total := 0) r -- no join condition, so this produces a Cartesian join
     WHERE company_id = '69350'
     ORDER BY created) rt
    ON i.created = rt.created -- this is also an assumption, I do not know which original table holds the created field
WHERE company_id = '69350' and adjustment = 1
ORDER BY d.id
If you need to take the amounts from the tbl_finance_invoice_paid_items into account as well, then you need to add that to the subquery.
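The key point of this answer, that the balance is computed over all rows in created order first and any other filter is applied afterwards, can be sketched in Python as follows. The row layout and the function name `with_balance` are hypothetical, and the dates from the question's example are assumed to be in mm/dd/yy format.

```python
from datetime import date
from itertools import accumulate

# Example rows from the question, deliberately not in created order.
rows = [
    {"id": 50, "total": 50, "created": date(2016, 5, 1), "adjustment": 0},
    {"id": 12, "total": 40, "created": date(2016, 1, 1), "adjustment": 0},
    {"id": 100, "total": 10, "created": date(2016, 1, 5), "adjustment": 1},
]


def with_balance(rows):
    """Return copies of the rows in created order with a running balance."""
    ordered = sorted(rows, key=lambda r: r["created"])
    balances = accumulate(r["total"] for r in ordered)
    return [dict(r, balance=b) for r, b in zip(ordered, balances)]


# Step 1: running total over every row for the company.
balanced = with_balance(rows)

# Step 2: only now apply the extra filter, so the balance is unaffected by it.
adjusted_only = [r for r in balanced if r["adjustment"] == 1]
```

Filtering before step 1 would give the adjusted row a balance of 10 instead of 50, which is the behaviour the question wants to avoid.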

Fetch Unit consumption date-wise

I am struggling to get a result from MySQL in the following way. I have 10 records in a MySQL table with date and unit fields. I need to get the units used on each date.
The table structure is as follows; each record holds the cumulative units, i.e. today's units added to the previous total:
Date Units
---------- ---------
10/10/2012 101
11/10/2012 111
12/10/2012 121
13/10/2012 140
14/10/2012 150
15/10/2012 155
16/10/2012 170
17/10/2012 180
18/10/2012 185
19/10/2012 200
Desired output will be :
Date Units
---------- ---------
10/10/2012 101
11/10/2012 10
12/10/2012 10
13/10/2012 19
14/10/2012 10
15/10/2012 5
16/10/2012 15
17/10/2012 10
18/10/2012 5
19/10/2012 15
Any help will be appreciated. Thanks
There's a couple of ways to get the resultset. If you can live with an extra column in the resultset, and the order of the columns, then something like this is a workable approach.
using user variables
SELECT d.Date
     , IF(@prev_units IS NULL
         ,@diff := 0
         ,@diff := d.units - @prev_units
       ) AS `Units_used`
     , @prev_units := d.units AS `Units`
  FROM ( SELECT @prev_units := NULL ) i
  JOIN (
         SELECT t.Date, t.Units
           FROM mytable t
          ORDER BY t.Date, t.Units
       ) d
This returns the specified resultset, but it includes the Units column as well. It's possible to have that column filtered out, but it's more expensive, because of the way MySQL processes an inline view (MySQL calls it a "derived table")
To remove that extra column, you can wrap that in another query...
SELECT f.Date
, f.Units_used
FROM (
query from above goes here
) f
ORDER BY f.Date
but again, removing that column comes with the extra cost of materializing that result set a second time.
using a semi-join
If you are guaranteed to have a single row for each Date value (stored either as a DATE, or as a DATETIME with the time component set to a constant such as midnight) and no gaps in the Date values, then another query that will return the specified result set is:
SELECT t.Date
, t.Units - s.Units AS Units_Used
FROM mytable t
LEFT
JOIN mytable s
ON s.Date = t.Date + INTERVAL -1 DAY
ORDER BY t.Date
If there's a missing Date value (a gap) such that there is no matching previous row, then Units_used will have a NULL value.
using a correlated subquery
If you don't have a guarantee of no "missing dates", but you have a guarantee that there is no more than one row for a particular Date, then another approach (usually more expensive in terms of performance) is to use a correlated subquery:
SELECT t.Date
, ( t.Units - (SELECT s.Units
FROM mytable s
WHERE s.Date < t.Date
ORDER BY s.Date DESC
LIMIT 1)
) AS Units_used
FROM mytable t
ORDER BY t.Date, t.Units
spencer7593's solution will be faster, but you can also do something like this...
SELECT * FROM rolling;
+----+-------+
| id | units |
+----+-------+
| 1 | 101 |
| 2 | 111 |
| 3 | 121 |
| 4 | 140 |
| 5 | 150 |
| 6 | 155 |
| 7 | 170 |
| 8 | 180 |
| 9 | 185 |
| 10 | 200 |
+----+-------+
SELECT a.id,COALESCE(a.units - b.units,a.units) units
FROM
( SELECT x.*
, COUNT(*) rank
FROM rolling x
JOIN rolling y
ON y.id <= x.id
GROUP
BY x.id
) a
LEFT
JOIN
( SELECT x.*
, COUNT(*) rank
FROM rolling x
JOIN rolling y
ON y.id <= x.id
GROUP
BY x.id
) b
ON b.rank= a.rank -1;
+----+-------+
| id | units |
+----+-------+
| 1 | 101 |
| 2 | 10 |
| 3 | 10 |
| 4 | 19 |
| 5 | 10 |
| 6 | 5 |
| 7 | 15 |
| 8 | 10 |
| 9 | 5 |
| 10 | 15 |
+----+-------+
This should give the desired result. I don't know what your table is called, so I named it "tbltest".
Naming a column date is generally a bad idea, as the word also refers to other things (functions, data types, ...), so I renamed it "fdate". Using uppercase characters in field names or table names is also a bad idea, as it makes your statements less database independent (some databases are case sensitive and some are not).
SELECT
A.fdate,
A.units - coalesce(B.units, 0) AS units
FROM
tbltest A left join tbltest B ON A.fdate = B.fdate + INTERVAL 1 DAY
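All of the queries in this thread compute the same thing: the difference between each cumulative reading and the one before it. A minimal Python sketch of that calculation is shown here for reference. The function name `units_used` is hypothetical, the rows are assumed to be already in date order, and the first row keeps its own value as in the desired output.

```python
# Cumulative readings from the question, assumed to be in date order.
readings = [
    ("10/10/2012", 101),
    ("11/10/2012", 111),
    ("12/10/2012", 121),
    ("13/10/2012", 140),
    ("14/10/2012", 150),
    ("15/10/2012", 155),
    ("16/10/2012", 170),
    ("17/10/2012", 180),
    ("18/10/2012", 185),
    ("19/10/2012", 200),
]


def units_used(readings):
    """Turn cumulative readings into per-row consumption.

    The first row has no predecessor, so it keeps its own value.
    """
    used = []
    prev = None
    for day, units in readings:
        used.append((day, units if prev is None else units - prev))
        prev = units
    return used


consumption = units_used(readings)
```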

Counting appointments for each day using MYSQL

I'm having trouble with a MySQL statement that counts appointments per day within a given time period. I have a calendar table with starting and finishing columns (type = DateTime). The following statement should count all appointments for November, including appointments that span the whole period:
SELECT
COUNT('APPOINTMENTS') AS Count,
DATE(c.StartingDate) AS Datum
FROM t_calendar c
WHERE
c.GUID = 'blalblabla' AND
((DATE(c.StartingDate) <= DATE('2012-11-01 00:00:00')) AND (DATE(c.EndingDate) >= DATE('2012-11-30 23:59:59'))) OR
((DATE(c.StartingDate) >= DATE('2012-11-01 00:00:00')) AND (DATE(c.EndingDate) <= DATE('2012-11-30 23:59:59')))
GROUP BY DATE(c.StartingDate)
HAVING Count > 1
But how to include appointments that starts before a StartingDate and ends on the StartingDate?
e.g.
StartingDate = 2012-11-14 17:00:00, EndingDate = 2012-11-15 08:00:00
StartingDate = 2012-11-15 09:00:00, EndingDate = 2012-11-15 10:00:00
StartingDate = 2012-11-15 11:00:00, EndingDate = 2012-11-15 12:00:00
My statement returns a count of 2 for the 15th of November. But that's wrong, because the first appointment is missing. How do I include these appointments? What am I missing: UNION SELECT, JOIN, a subselect?
A possible solution?
SELECT
c1.GUID, COUNT('APPOINTMENTS') + COUNT(DISTINCT c2.ANYFIELD) AS Count,
DATE(c1.StartingDate) AS Datum,
COUNT(DISTINCT c2.ANYFIELD)
FROM
t_calendar c1
LEFT JOIN
t_calendar c2
ON
c2.ResourceGUID = c1.ResourceGUID AND
(DATE(c2.EndingDate) = DATE(c1.StartingDate)) AND
(DATE(c2.StartingDate) < DATE(c1.StartingDate))
WHERE
((DATE(c1.StartingDate) <= DATE('2012-11-01 00:00:00')) AND (DATE(c1.EndingDate) >= DATE('2012-11-30 23:59:59'))) OR
((DATE(c1.StartingDate) >= DATE('2012-11-01 00:00:00')) AND (DATE(c1.EndingDate) <= DATE('2012-11-30 23:59:59')))
GROUP BY
c1.ResourceGUID,
DATE(c1.StartingDate)
First: Consolidate range checking
First of all, your two range conditions in the where clause can be replaced by a single one. It also seems that you're only counting appointments that either completely overlap the target date range or are completely contained within it. Partially overlapping ones aren't included, hence your question about appointments that end right on the range's starting date.
To make the where clause easier to understand, I'll simplify it as follows.
I'll use two variables to define the target range:
rangeStart (in your case 1st Nov 2012)
rangeEnd (I'll assume 1st Dec 2012 00:00:00.00000 instead)
I won't convert datetimes to dates only (using the date function) the way you did, but you can easily do that.
With these in mind, your where clause can be greatly simplified and still covers all appointments for the given range:
...
where (c.StartingDate < rangeEnd) and (c.EndingDate >= rangeStart)
...
This will search for all appointments that fall in target range and will cover all these appointment cases:
start end
target range |==============|
partial front |---------|
partial back |---------|
total overlap |---------------------|
total containment |-----|
Partial front/back may also barely touch your target range (what you've been after).
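To make the four cases concrete, here is a small Python sketch of the same predicate. The function name `overlaps` and the sample appointment dates are hypothetical; the comparison operators mirror the where clause above (`StartingDate < rangeEnd` and `EndingDate >= rangeStart`).

```python
from datetime import datetime


def overlaps(start, end, range_start, range_end):
    """Mirror of: StartingDate < rangeEnd AND EndingDate >= rangeStart."""
    return start < range_end and end >= range_start


range_start = datetime(2012, 11, 1)
range_end = datetime(2012, 12, 1)

# One made-up appointment per case, plus two that should be rejected.
cases = {
    "partial front": (datetime(2012, 10, 25), datetime(2012, 11, 5)),
    "partial back": (datetime(2012, 11, 25), datetime(2012, 12, 5)),
    "total overlap": (datetime(2012, 10, 1), datetime(2012, 12, 31)),
    "total containment": (datetime(2012, 11, 10), datetime(2012, 11, 12)),
    "ends before range": (datetime(2012, 10, 1), datetime(2012, 10, 31)),
    "starts after range": (datetime(2012, 12, 1), datetime(2012, 12, 2)),
}

results = {
    name: overlaps(start, end, range_start, range_end)
    for name, (start, end) in cases.items()
}
```

The single condition works because an appointment misses the range only if it ends before the range starts or starts at or after the range ends; negating that gives the two comparisons joined by `and`.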
Second: Resolving the problem
Why are you missing the first record? Simply because of your having clause, which only keeps those groups that have more than 1 appointment starting on a given day: 15th Nov has two, but 14th has only one and is therefore excluded, because Count = 1 is not > 1.
To answer your second question (what am I missing): you're not missing anything. You actually have too much in your statement, and it needs to be simplified.
Try this statement instead, which should return exactly what you're after:
select count(c.GUID) as Count,
       date(c.StartingDate) as Datum
from t_calendar c
where (c.GUID = 'blabla') and
      (c.StartingDate < str_to_date('2012-12-01', '%Y-%m-%d')) and
      (c.EndingDate >= str_to_date('2012-11-01', '%Y-%m-%d'))
group by date(c.StartingDate)
I used the str_to_date function to make the string to date conversion safer.
I'm not really sure why you included having in your statement, because it's not really needed, unless your actual statement is more complex and you only included the part that's most relevant. In that case you'll likely have to change it to:
having Count > 0
Getting appointment count per day in any given date range
There are likely other ways as well, but the most common way would be to use a numbers or calendar table that gives you the ability to break a range into individual points: days. Then you have to join your appointments to this numbers table and produce the results.
I've created a SQLFiddle that does the trick. Here's what it does...
Suppose you have a numbers table Nums with numbers from 0 to x, and an appointments table Cal with your records. The following script creates these two tables and populates them with some data. The numbers only go up to 99 (100 values), which is enough for 3 months' worth of data.
-- appointments
create table Cal (
Id int not null auto_increment primary key,
StartDate datetime not null,
EndDate datetime not null
);
-- create appointments
insert Cal (StartDate, EndDate)
values
('2012-10-15 08:00:00', '2012-10-20 16:00:00'),
('2012-10-25 08:00:00', '2012-11-01 03:00:00'),
('2012-11-01 12:00:00', '2012-11-01 15:00:00'),
('2012-11-15 10:00:00', '2012-11-16 10:00:00'),
('2012-11-20 08:00:00', '2012-11-30 08:00:00'),
('2012-11-30 22:00:00', '2012-12-05 00:00:00'),
('2012-12-01 05:00:00', '2012-12-10 12:00:00');
-- numbers table
create table Nums (
Id int not null primary key
);
-- add 100 numbers
insert into Nums
select a.a + (10 * b.a)
from (select 0 as a union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9) as a,
(select 0 as a union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9) as b
What you have to do now is:
1. Select a range of days by selecting numbers from the Nums table and converting them to dates.
2. Join your appointments to those dates, so that appointments that fall on a particular day are joined to that day.
3. Group these appointments by day and get the results.
Here's the code that does this:
-- just in case so comparisons don't trip over
set names 'latin1' collate latin1_general_ci;
-- start and end target date range
set @s := str_to_date('2012-11-01', '%Y-%m-%d');
set @e := str_to_date('2012-12-01', '%Y-%m-%d');
-- get appointment count per day within target range of days
select adddate(@s, n.Id) as Day, count(c.Id) as Appointments
from Nums n
    left join Cal c
    on ((date(c.StartDate) <= adddate(@s, n.Id)) and (date(c.EndDate) >= adddate(@s, n.Id)))
where adddate(@s, n.Id) < @e
group by Day;
And this is the result of this rather simple select statement:
| DAY | APPOINTMENTS |
-----------------------------
| 2012-11-01 | 2 |
| 2012-11-02 | 0 |
| 2012-11-03 | 0 |
| 2012-11-04 | 0 |
| 2012-11-05 | 0 |
| 2012-11-06 | 0 |
| 2012-11-07 | 0 |
| 2012-11-08 | 0 |
| 2012-11-09 | 0 |
| 2012-11-10 | 0 |
| 2012-11-11 | 0 |
| 2012-11-12 | 0 |
| 2012-11-13 | 0 |
| 2012-11-14 | 0 |
| 2012-11-15 | 1 |
| 2012-11-16 | 1 |
| 2012-11-17 | 0 |
| 2012-11-18 | 0 |
| 2012-11-19 | 0 |
| 2012-11-20 | 1 |
| 2012-11-21 | 1 |
| 2012-11-22 | 1 |
| 2012-11-23 | 1 |
| 2012-11-24 | 1 |
| 2012-11-25 | 1 |
| 2012-11-26 | 1 |
| 2012-11-27 | 1 |
| 2012-11-28 | 1 |
| 2012-11-29 | 1 |
| 2012-11-30 | 2 |
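
The numbers-table join above can be cross-checked with a short Python sketch that walks the days of the range and counts the appointments touching each one. The function name `appointments_per_day` is hypothetical; the data is the Cal sample from this answer, and the comparison works on calendar dates exactly as the `date(...)` calls in the join do.

```python
from datetime import date, datetime, timedelta

# Sample appointments from the Cal table above: (StartDate, EndDate).
appointments = [
    (datetime(2012, 10, 15, 8), datetime(2012, 10, 20, 16)),
    (datetime(2012, 10, 25, 8), datetime(2012, 11, 1, 3)),
    (datetime(2012, 11, 1, 12), datetime(2012, 11, 1, 15)),
    (datetime(2012, 11, 15, 10), datetime(2012, 11, 16, 10)),
    (datetime(2012, 11, 20, 8), datetime(2012, 11, 30, 8)),
    (datetime(2012, 11, 30, 22), datetime(2012, 12, 5, 0)),
    (datetime(2012, 12, 1, 5), datetime(2012, 12, 10, 12)),
]


def appointments_per_day(appointments, first_day, end_day):
    """Count appointments touching each calendar day in [first_day, end_day)."""
    counts = {}
    day = first_day
    while day < end_day:
        counts[day] = sum(
            1 for start, end in appointments
            if start.date() <= day <= end.date()
        )
        day += timedelta(days=1)
    return counts


per_day = appointments_per_day(appointments, date(2012, 11, 1), date(2012, 12, 1))
```

Days without any appointment appear with a count of 0, which corresponds to the left join in the SQL version.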

MySQL: group by consecutive days and count groups

I have a database table which holds each user's checkins in cities. I need to know how many days a user has been in a city, and then, how many visits a user has made to a city (a visit consists of consecutive days spent in a city).
So, consider I have the following table (simplified, containing only the DATETIMEs - same user and city):
datetime
-------------------
2011-06-30 12:11:46
2011-07-01 13:16:34
2011-07-01 15:22:45
2011-07-01 22:35:00
2011-07-02 13:45:12
2011-08-01 00:11:45
2011-08-05 17:14:34
2011-08-05 18:11:46
2011-08-06 20:22:12
The number of days this user has been to this city would be 6 (30.06, 01.07, 02.07, 01.08, 05.08, 06.08).
I thought of doing this using SELECT COUNT(id) FROM table GROUP BY DATE(datetime)
Then, for the number of visits this user has made to this city, the query should return 3 (30.06-02.07, 01.08, 05.08-06.08).
The problem is that I have no idea how shall I build this query.
Any help would be highly appreciated!
You can find the first day of each visit by finding checkins where there was no checkin the day before.
select count(distinct date(start_of_visit.datetime))
from checkin start_of_visit
left join checkin previous_day
on start_of_visit.user = previous_day.user
and start_of_visit.city = previous_day.city
and date(start_of_visit.datetime) - interval 1 day = date(previous_day.datetime)
where previous_day.id is null
There are several important parts to this query.
First, each checkin is joined to any checkin from the previous day. But since it's an outer join, if there was no checkin the previous day the right side of the join will have NULL results. The WHERE filtering happens after the join, so it keeps only those checkins from the left side where there are none from the right side. LEFT OUTER JOIN/WHERE IS NULL is really handy for finding where things aren't.
Then it counts distinct checkin dates to make sure it doesn't double-count if the user checked in multiple times on the first day of the visit. (I actually added that part on edit, when I spotted the possible error.)
Edit: I just re-read your proposed query for the first question. Your query would get you the number of checkins on a given date, instead of a count of dates. I think you want something like this instead:
select count(distinct date(datetime))
from checkin
where user='some user' and city='some city'
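The same two counts can be sketched in Python to check the logic against the sample data. The function name `days_and_visits` is hypothetical, and the input is assumed to be the check-ins of a single user in a single city; a visit start is any day with no check-in on the day before, which is what the LEFT JOIN / IS NULL query above looks for.

```python
from datetime import datetime, timedelta


def days_and_visits(checkins):
    """Return (distinct days, visits) for one user in one city."""
    days = {c.date() for c in checkins}
    one_day = timedelta(days=1)
    # A visit starts on every day that has no check-in on the day before.
    visit_starts = [d for d in days if d - one_day not in days]
    return len(days), len(visit_starts)


# Sample check-ins from the question.
checkins = [
    datetime(2011, 6, 30, 12, 11, 46),
    datetime(2011, 7, 1, 13, 16, 34),
    datetime(2011, 7, 1, 15, 22, 45),
    datetime(2011, 7, 1, 22, 35, 0),
    datetime(2011, 7, 2, 13, 45, 12),
    datetime(2011, 8, 1, 0, 11, 45),
    datetime(2011, 8, 5, 17, 14, 34),
    datetime(2011, 8, 5, 18, 11, 46),
    datetime(2011, 8, 6, 20, 22, 12),
]

number_of_days, number_of_visits = days_and_visits(checkins)
```

Reducing the check-ins to a set of dates first plays the role of `count(distinct date(...))`: several check-ins on the same day count once.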
Try to apply this code to your task -
CREATE TABLE visits(
user_id INT(11) NOT NULL,
dt DATETIME DEFAULT NULL
);
INSERT INTO visits VALUES
(1, '2011-06-30 12:11:46'),
(1, '2011-07-01 13:16:34'),
(1, '2011-07-01 15:22:45'),
(1, '2011-07-01 22:35:00'),
(1, '2011-07-02 13:45:12'),
(1, '2011-08-01 00:11:45'),
(1, '2011-08-05 17:14:34'),
(1, '2011-08-05 18:11:46'),
(1, '2011-08-06 20:22:12'),
(2, '2011-08-30 16:13:34'),
(2, '2011-08-31 16:13:41');
SET @i = 0;
SET @last_dt = NULL;
SET @last_user = NULL;
SELECT v.user_id,
  COUNT(DISTINCT(DATE(dt))) number_of_days,
  MAX(days) number_of_visits
FROM
  (SELECT user_id, dt,
        @i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
        @last_dt := DATE(dt),
        @last_user := user_id
  FROM
    visits
  ORDER BY
    user_id, dt
  ) v
GROUP BY
  v.user_id;
----------------
Output:
+---------+----------------+------------------+
| user_id | number_of_days | number_of_visits |
+---------+----------------+------------------+
| 1 | 6 | 3 |
| 2 | 2 | 1 |
+---------+----------------+------------------+
Explanation:
To understand how it works let's check the subquery, here it is.
SET @i = 0;
SET @last_dt = NULL;
SET @last_user = NULL;
SELECT user_id, dt,
       @i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
       @last_dt := DATE(dt) lt,
       @last_user := user_id lu
FROM
  visits
ORDER BY
  user_id, dt;
As you can see, the query returns all rows and numbers them by visit. This is a known ranking method based on variables; note that the rows are ordered by the user and date fields. The query produces the following data set, where the days column holds the sequence number of the visit:
+---------+---------------------+------+------------+----+
| user_id | dt | days | lt | lu |
+---------+---------------------+------+------------+----+
| 1 | 2011-06-30 12:11:46 | 1 | 2011-06-30 | 1 |
| 1 | 2011-07-01 13:16:34 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-01 15:22:45 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-01 22:35:00 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-02 13:45:12 | 1 | 2011-07-02 | 1 |
| 1 | 2011-08-01 00:11:45 | 2 | 2011-08-01 | 1 |
| 1 | 2011-08-05 17:14:34 | 3 | 2011-08-05 | 1 |
| 1 | 2011-08-05 18:11:46 | 3 | 2011-08-05 | 1 |
| 1 | 2011-08-06 20:22:12 | 3 | 2011-08-06 | 1 |
| 2 | 2011-08-30 16:13:34 | 1 | 2011-08-30 | 2 |
| 2 | 2011-08-31 16:13:41 | 1 | 2011-08-31 | 2 |
+---------+---------------------+------+------------+----+
Then we group this data set by user and use aggregate functions:
'COUNT(DISTINCT(DATE(dt)))' - counts the number of days
'MAX(days)' - the number of visits, it is a maximum value for the days field from our subquery.
That is all;)
Using the data sample provided by Devart, the inner "PreQuery" works with SQL variables. By defaulting @LUser to -1 (a probably non-existent user ID), the IF() test checks for any difference between the last user and the current one. As soon as a new user appears, the row gets a value of 1. Additionally, if the last date is more than 1 day before the new check-in date, the row gets a value of 1. The subsequent columns then reset @LUser and @LDate to the values of the record just tested, ready for the next cycle. The outer query just sums and counts them, giving these final results for the Devart data set:
User ID   Distinct Visits   Total Days
1         3                 9
2         1                 2
select PreQuery.User_ID,
       sum( PreQuery.NextVisit ) as DistinctVisits,
       count(*) as TotalDays
   from
      ( select v.user_id,
               if( @LUser <> v.User_ID OR @LDate < ( date( v.dt ) - Interval 1 day ), 1, 0 ) as NextVisit,
               @LUser := v.user_id,
               @LDate := date( v.dt )
           from
              Visits v,
              ( select @LUser := -1, @LDate := date(now()) ) AtVars
           order by
              v.user_id,
              v.dt ) PreQuery
   group by
      PreQuery.User_ID
for a first sub-task:
select count(*)
from (
select TO_DAYS(p.d)
from p
group by TO_DAYS(p.d)
) t
I think you should consider changing the database structure. You could add a visits table and a visit_id column to your checkins table. Each time you register a new checkin, check whether there is a checkin from the day before. If there is, add the new checkin with the visit_id of yesterday's checkin. If not, add a new visit to visits and add the new checkin with the new visit_id.
Then you could get your data in one query with something like this:
SELECT COUNT(id) AS number_of_days, COUNT(DISTINCT visit_id) number_of_visits FROM checkin GROUP BY user, city
It's not optimal, but it is still better than working with the current structure, and it will work. Also, if the results can come from separate queries, it will be very fast.
Of course, the drawbacks are that you will need to change the database structure, do some more scripting, and convert the current data to the new structure (i.e. you will need to add visit_id to the existing data).
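The insert-time bookkeeping described in this answer could look roughly like the following Python sketch. The class `CheckinLog` is a hypothetical in-memory stand-in for the proposed tables, it stores one entry per day rather than one per checkin, and it assumes checkins are registered in chronological order.

```python
from datetime import date, timedelta


class CheckinLog:
    """In-memory stand-in for the proposed checkin and visits tables."""

    def __init__(self):
        self.visit_by_day = {}  # (user, city, day) -> visit_id
        self.next_visit_id = 1

    def register(self, user, city, day):
        """Record a checkin and return the visit_id it belongs to."""
        key = (user, city, day)
        if key in self.visit_by_day:
            # Another checkin on the same day: same visit.
            return self.visit_by_day[key]
        yesterday = (user, city, day - timedelta(days=1))
        visit_id = self.visit_by_day.get(yesterday)
        if visit_id is None:
            # No checkin the day before: this starts a new visit.
            visit_id = self.next_visit_id
            self.next_visit_id += 1
        self.visit_by_day[key] = visit_id
        return visit_id


log = CheckinLog()
days = [
    date(2011, 6, 30), date(2011, 7, 1), date(2011, 7, 1), date(2011, 7, 2),
    date(2011, 8, 1), date(2011, 8, 5), date(2011, 8, 6),
]
visit_ids = [log.register("some user", "some city", d) for d in days]
```

If checkins can arrive out of order, two existing visits may need to be merged when a checkin fills the gap between them; this sketch does not handle that case.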