I would like to display a player's current score as well as how many points they have gained within a selected time frame.
I have 2 tables
skills table
+----+---------+---------------------+
| id | name | created_at |
+----+---------+---------------------+
| 1 | skill 1 | 2020-06-05 00:00:00 |
| 2 | skill 2 | 2020-06-05 00:00:00 |
| 3 | skill 3 | 2020-06-05 00:00:00 |
+----+---------+---------------------+
scores table
+----+-----------+----------+-------+---------------------+
| id | player_id | skill_id | score | created_at |
+----+-----------+----------+-------+---------------------+
| 1 | 1 | 1 | 5 | 2020-06-06 00:00:00 |
| 2 | 1 | 1 | 10 | 2020-07-06 00:00:00 |
| 3 | 1 | 2 | 1 | 2020-07-06 00:00:00 |
| 4 | 2 | 1 | 11 | 2020-07-06 00:00:00 |
| 5 | 1 | 1 | 13 | 2020-07-07 00:00:00 |
| 6 | 1 | 2 | 10 | 2020-07-07 00:00:00 |
| 7 | 2 | 1 | 12 | 2020-07-07 00:00:00 |
| 8 | 1 | 1 | 20 | 2020-07-08 00:00:00 |
| 9 | 1 | 2 | 15 | 2020-07-08 00:00:00 |
| 10 | 2 | 1 | 17 | 2020-07-08 00:00:00 |
+----+-----------+----------+-------+---------------------+
My expected results are:
24 hour query
+-----------+---------+-------+------+
| player_id | name | score | gain |
+-----------+---------+-------+------+
| 1 | skill 1 | 20 | 7 |
| 1 | skill 2 | 15 | 5 |
+-----------+---------+-------+------+
7 day query
+-----------+---------+-------+------+
| player_id | name | score | gain |
+-----------+---------+-------+------+
| 1 | skill 1 | 20 | 10 |
| 1 | skill 2 | 15 | 14 |
+-----------+---------+-------+------+
31 day query
+-----------+---------+-------+------+
| player_id | name | score | gain |
+-----------+---------+-------+------+
| 1 | skill 1 | 20 | 15 |
| 1 | skill 2 | 15 | 14 |
+-----------+---------+-------+------+
So far I have the following, but all it does is return the last 2 records for each skill. I am struggling to calculate the gains for the different time frames:
SELECT player_id, skill_id, name, score
FROM (SELECT player_id, skill_id, name, score,
@skill_count := IF(@current_skill = skill_id, @skill_count + 1, 1) AS skill_count,
@current_skill := skill_id
FROM skill_scores
INNER JOIN skills
ON skill_id = skills.id
WHERE player_id = 1
ORDER BY skill_id, score DESC
) counted
WHERE skill_count <= 2
I would like some help figuring out the query I need to build to get the desired results. Or would it be better to do this in PHP instead of in the database?
EDIT:
MySQL 8.0.20. Dummy data below; the ids are auto-increment primary keys, but I didn't add that for simplicity:
CREATE TABLE skills
(
id bigint,
name VARCHAR(80)
);
CREATE TABLE skill_scores
(
id bigint,
player_id bigint,
skill_id bigint,
score bigint,
created_at timestamp
);
INSERT INTO skills VALUES (1, 'skill 1');
INSERT INTO skills VALUES (2, 'skill 2');
INSERT INTO skills VALUES (3, 'skill 3');
INSERT INTO skill_scores VALUES (1, 1, 1 , 5, '2020-06-06 00:00:00');
INSERT INTO skill_scores VALUES (2, 1, 1 , 10, '2020-07-06 00:00:00');
INSERT INTO skill_scores VALUES (3, 1, 2 , 1, '2020-07-06 00:00:00');
INSERT INTO skill_scores VALUES (4, 2, 1 , 11, '2020-07-06 00:00:00');
INSERT INTO skill_scores VALUES (5, 1, 1 , 13, '2020-07-07 00:00:00');
INSERT INTO skill_scores VALUES (6, 1, 2 , 10, '2020-07-07 00:00:00');
INSERT INTO skill_scores VALUES (7, 2, 1 , 12, '2020-07-07 00:00:00');
INSERT INTO skill_scores VALUES (8, 1, 1 , 20, '2020-07-08 00:00:00');
INSERT INTO skill_scores VALUES (9, 1, 2 , 15, '2020-07-08 00:00:00');
INSERT INTO skill_scores VALUES (10, 2, 1 , 17, '2020-07-08 00:00:00');
WITH cte AS (
SELECT id, player_id, skill_id,
FIRST_VALUE(score) OVER (PARTITION BY player_id, skill_id ORDER BY created_at DESC) score,
FIRST_VALUE(score) OVER (PARTITION BY player_id, skill_id ORDER BY created_at DESC) - FIRST_VALUE(score) OVER (PARTITION BY player_id, skill_id ORDER BY created_at ASC) gain,
ROW_NUMBER() OVER (PARTITION BY player_id, skill_id ORDER BY created_at DESC) rn
FROM skill_scores
WHERE created_at BETWEEN @current_date - INTERVAL @interval DAY AND @current_date
)
SELECT cte.player_id, skills.name, cte.score, cte.gain
FROM cte
JOIN skills ON skills.id = cte.skill_id
WHERE rn = 1
ORDER BY player_id, name;
fiddle
Ps. I don't understand where gain=15 is taken for 31-day period - the difference between '2020-07-08 00:00:00' and '2020-06-06 00:00:00' is 32 days.
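To check the window-function query against the sample data, here is a minimal sketch that runs the same CTE on an in-memory SQLite database from Python. This is an assumption-laden port, not the MySQL query itself: SQLite has no `INTERVAL` syntax, so the window start is computed with `datetime(:now, '-N days')`, and the function name `gains` and the fixed "current date" of 2020-07-08 are hypothetical choices made for the illustration.

```python
import sqlite3

# Sample rows from the question: (id, player_id, skill_id, score, created_at)
SCORES = [
    (1, 1, 1, 5, "2020-06-06 00:00:00"),
    (2, 1, 1, 10, "2020-07-06 00:00:00"),
    (3, 1, 2, 1, "2020-07-06 00:00:00"),
    (4, 2, 1, 11, "2020-07-06 00:00:00"),
    (5, 1, 1, 13, "2020-07-07 00:00:00"),
    (6, 1, 2, 10, "2020-07-07 00:00:00"),
    (7, 2, 1, 12, "2020-07-07 00:00:00"),
    (8, 1, 1, 20, "2020-07-08 00:00:00"),
    (9, 1, 2, 15, "2020-07-08 00:00:00"),
    (10, 2, 1, 17, "2020-07-08 00:00:00"),
]
SKILLS = [(1, "skill 1"), (2, "skill 2"), (3, "skill 3")]

# Same shape as the CTE above; only the date arithmetic is SQLite-specific.
GAIN_SQL = """
WITH cte AS (
    SELECT player_id, skill_id,
           FIRST_VALUE(score) OVER (PARTITION BY player_id, skill_id ORDER BY created_at DESC) AS score,
           FIRST_VALUE(score) OVER (PARTITION BY player_id, skill_id ORDER BY created_at DESC)
         - FIRST_VALUE(score) OVER (PARTITION BY player_id, skill_id ORDER BY created_at ASC) AS gain,
           ROW_NUMBER() OVER (PARTITION BY player_id, skill_id ORDER BY created_at DESC) AS rn
    FROM skill_scores
    WHERE created_at BETWEEN datetime(:now, '-' || :days || ' days') AND :now
)
SELECT cte.player_id, skills.name, cte.score, cte.gain
FROM cte
JOIN skills ON skills.id = cte.skill_id
WHERE rn = 1
ORDER BY cte.player_id, skills.name
"""


def gains(days, now="2020-07-08 00:00:00"):
    """Return (player_id, skill name, latest score, gain) for the last `days` days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE skills (id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE skill_scores "
        "(id INTEGER, player_id INTEGER, skill_id INTEGER, score INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO skills VALUES (?, ?)", SKILLS)
    conn.executemany("INSERT INTO skill_scores VALUES (?, ?, ?, ?, ?)", SCORES)
    rows = conn.execute(GAIN_SQL, {"now": now, "days": days}).fetchall()
    conn.close()
    return rows


for days in (1, 7, 31, 32):
    print(days, gains(days))
```

For player 1 the 1-day and 7-day windows reproduce the expected gains from the question. With a 31-day window the 2020-06-06 row falls outside the range, so the gain of 15 for skill 1 only appears with 32 days, which is the point of the P.S. above.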
Well, I think you need a (temporary) table for this. I will call it "player_skill_gains". It's basically the player's scores ordered by created_at, with an auto-incremented id:
CREATE TABLE player_skill_gains
(`id` int PRIMARY KEY AUTO_INCREMENT NOT NULL
, `player_id` int
, skill_id int
, score int
, created_at date)
;
INSERT INTO player_skill_gains(player_id, skill_id, score, created_at)
SELECT player_skills.player_id AS player_id
, player_skills.skill_id
, SUM(player_skills.score) AS score
, player_skills.created_at
FROM player_skills
GROUP BY player_skills.player_id, player_skills.skill_id, player_skills.created_at
ORDER BY player_skills.player_id, player_skills.skill_id, player_skills.created_at ASC;
Using this table we can relatively easily select the previous score for each row (id - 1) and use it to calculate the gains:
SELECT player_skill_gains.player_id, skills.name, player_skill_gains.score
, player_skill_gains.score - IFNULL(bef.score,0) AS gain
, player_skill_gains.created_at
FROM player_skill_gains
INNER JOIN skills ON player_skill_gains.skill_id = skills.id
LEFT JOIN player_skill_gains AS bef ON (player_skill_gains.id - 1) = bef.id
AND player_skill_gains.player_id = bef.player_id
AND player_skill_gains.skill_id = bef.skill_id
For the different queries you want (24 hours, 7 days, etc.) you just have to add the appropriate WHERE clause to the query.
You can see all this in action here: http://sqlfiddle.com/#!9/1571a8/11/0
Related
I have a table like this, on MySQL version 5.7:
CREATE TABLE order_match (
ID INT,
user_id INT,
createdAt DATE,
status_id INT,
quantity INT
);
INSERT INTO order_match VALUES
(1, 12, '2020-01-01', 4, 1),
(2, 12, '2020-01-03', 7, 1),
(3, 12, '2020-01-06', 7, 2),
(4, 13, '2020-01-02', 5, 2),
(5, 13, '2020-01-03', 6, 1),
(6, 14, '2020-03-03', 8, 0.5),
(7, 13, '2020-03-04', 4, 1),
(8, 15, '2020-04-04', 7, 3),
(9, 14, '2020-03-02', 7, 2),
(10, 14, '2020-03-10', 5, 4),
(11, 13, '2020-04-10', 8, 3),
(12, 13, '2020-04-11', 8, 2),
(13, 16, '2020-04-15', 8, 3);
select * from order_match
order by createdAt;
The output looks like this:
+---------+---------+------------+-----------+----------+
| ID | user_id | createdAt | status_id | quantity |
+---------+---------+------------+-----------+----------+
| 1 | 12 | 2020-01-01 | 4 | 1 |
| 4 | 13 | 2020-01-02 | 5 | 2 |
| 2 | 12 | 2020-01-03 | 7 | 1 |
| 5 | 13 | 2020-01-03 | 6 | 1 |
| 3 | 12 | 2020-01-06 | 7 | 2 |
| 9 | 14 | 2020-03-02 | 7 | 2 |
| 6 | 14 | 2020-03-03 | 8 | 1 |
| 7 | 13 | 2020-03-04 | 4 | 1 |
| 10 | 14 | 2020-03-10 | 5 | 4 |
| 8 | 15 | 2020-04-04 | 7 | 3 |
| 11 | 13 | 2020-04-10 | 8 | 3 |
| 12 | 13 | 2020-04-11 | 8 | 2 |
| 13 | 16 | 2020-04-15 | 8 | 3 |
| 13 rows | | | | |
+---------+---------+------------+-----------+----------+
Here ID is the transaction id, user_id is the buyer who made the transaction, createdAt is the date of the transaction, status_id is the transaction status (4, 5, 6 and 8 mean the transaction was approved), and quantity is the quantity of each transaction.
this is the fiddle
I want to find the number of transactions, the total quantity, and the number of unique users between 2020-03-01 and 2020-04-01. A unique user is a user whose first approved transaction was before 2020-03-01 and who made at least one approved transaction between 2020-03-01 and 2020-04-01. Based on the table, the expected result is:
+------------+------------------+-----------------+
| count user | total_order (kg) | total_order (x) |
+------------+------------------+-----------------+
| 1 | 1 | 1 |
+------------+------------------+-----------------+
Explanation: the only unique user between 2020-03-01 and 2020-04-01 is user_id 13, because his first approved transaction was on 2020-01-02 (before 2020-03-01) and he made at least one approved transaction between 2020-03-01 and 2020-04-01. In that range, user_id 13 (count user) made 1 transaction (total_order (x)) with a quantity of 1 kg (total_order (kg)).
I tried this query:
select
count(distinct om.user_id) as count,
sum(om.quantity) as total_order_kg,
count(om.id) as order_x
from (select count(xx.count_) as count_
from (select count(user_id) as count_ from order_match
where status_Id in (4, 5, 6, 8)
group by user_id
) xx
) x1,
(select user_id
from order_match
group by user_id
) yy,
order_match om
where yy.user_id = om.user_id and
status_id in (4, 5, 6, 8)
and om.createdAt < '2020-03-01'
and EXISTS (select 1 from order_match om2
where om.user_id = om2.user_id
and status_id in (4, 5, 6, 8)
and om2.createdAt >= '2020-03-01'
and om2.createdAt <= '2020-04-01');
but I don't know why the result looks like this:
+------------+------------------+-----------------+
| count user | total_order (kg) | total_order (x) |
+------------+------------------+-----------------+
| 1 | 3 | 2 |
+------------+------------------+-----------------+
THE FIDDLE
-- separate users statistic
SELECT user_id,
SUM(quantity * (createdAt >= @start)) total_order_kg,
SUM(createdAt >= @start) order_x
FROM order_match
WHERE createdAt <= @finish
GROUP BY user_id
HAVING SUM(createdAt >= @start)
AND SUM(createdAt >= @start) < COUNT(createdAt);
-- overall statistic
SELECT COUNT(*) users_count,
SUM(order_kg) total_order_kg,
SUM(order_count) total_order_count
FROM ( SELECT user_id,
SUM(quantity * (createdAt >= @start)) order_kg,
SUM(createdAt >= @start) order_count
FROM order_match
WHERE createdAt <= @finish
GROUP BY user_id
HAVING SUM(createdAt >= @start)
AND SUM(createdAt >= @start) < COUNT(createdAt) ) totals;
fiddle
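As a quick sanity check of the conditional-aggregation idea, the sketch below runs the "overall statistic" query on an in-memory SQLite database from Python. Treat it as an illustration under assumptions: the function name `overall_stats` is made up, the `@start`/`@finish` variables become bound parameters, and quantity 0.5 is stored as 1 because that is what MySQL's INT column shows in the question's output. Like the query above, it does not filter on status_id.

```python
import sqlite3

# (id, user_id, createdAt, status_id, quantity) as listed in the question.
# Row 6 was inserted as 0.5 but appears as 1 in the question's INT column output.
ORDERS = [
    (1, 12, "2020-01-01", 4, 1),
    (2, 12, "2020-01-03", 7, 1),
    (3, 12, "2020-01-06", 7, 2),
    (4, 13, "2020-01-02", 5, 2),
    (5, 13, "2020-01-03", 6, 1),
    (6, 14, "2020-03-03", 8, 1),
    (7, 13, "2020-03-04", 4, 1),
    (8, 15, "2020-04-04", 7, 3),
    (9, 14, "2020-03-02", 7, 2),
    (10, 14, "2020-03-10", 5, 4),
    (11, 13, "2020-04-10", 8, 3),
    (12, 13, "2020-04-11", 8, 2),
    (13, 16, "2020-04-15", 8, 3),
]

# A comparison evaluates to 0 or 1, so SUM(createdAt >= :start) counts the
# in-range rows, and "in-range count < total count" means the user also has
# at least one earlier row.
STATS_SQL = """
SELECT COUNT(*), SUM(order_kg), SUM(order_count)
FROM (SELECT user_id,
             SUM(quantity * (createdAt >= :start)) AS order_kg,
             SUM(createdAt >= :start) AS order_count
      FROM order_match
      WHERE createdAt <= :finish
      GROUP BY user_id
      HAVING SUM(createdAt >= :start) > 0
         AND SUM(createdAt >= :start) < COUNT(createdAt)) totals
"""


def overall_stats(start, finish):
    """Return (users_count, total_order_kg, total_order_count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_match "
        "(id INTEGER, user_id INTEGER, createdAt TEXT, status_id INTEGER, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO order_match VALUES (?, ?, ?, ?, ?)", ORDERS)
    row = conn.execute(STATS_SQL, {"start": start, "finish": finish}).fetchone()
    conn.close()
    return row


print(overall_stats("2020-03-01", "2020-04-01"))
```

Only user 13 has rows both before and inside the range, so the sketch returns one user, 1 kg and 1 order, matching the expected result in the question.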
'why the result like this': you are using comma joins, so you are starting from a Cartesian product. You can see what is happening if you substitute the actual values for the aggregations, for example:
select
om.user_id,
om.quantity,
om.id,
x1.count_,
yy.user_id
from (select count(xx.count_) as count_
from (select count(user_id) as count_ from t
where status_Id in (4, 5, 6, 8)
group by user_id
) xx
) x1,
(select user_id
from t
group by user_id
) yy,
t om
where yy.user_id = om.user_id and
status_id in (4, 5, 6, 8)
and om.createdAt < '2020-03-01'
and EXISTS (select 1 from t om2
where om.user_id = om2.user_id
and status_id in (4, 5, 6, 8)
and om2.createdAt >= '2020-03-01'
and om2.createdAt <= '2020-04-01');
Where t is my table name and a copy of order_match.
If you run this query without the WHERE clause you get 65 rows returned; if you run it with the WHERE clause but without the EXISTS check you get 3 rows; if you run it in its entirety you get
+---------+----------+------+--------+---------+
| user_id | quantity | id | count_ | user_id |
+---------+----------+------+--------+---------+
| 13 | 2 | 4 | 4 | 13 |
| 13 | 1 | 5 | 4 | 13 |
+---------+----------+------+--------+---------+
2 rows in set (0.002 sec)
Which when aggregated produces the result you get from your query.
NB group by without any aggregation functions is just wrong.
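The three row counts quoted above (65, 3 and 2) can be reproduced with a small Python/SQLite sketch. This is only an illustration of the Cartesian-product effect: the helper `count_rows` and the constant names are hypothetical, the x1 derived table is simplified to a single-row count, and the column references are qualified for clarity.

```python
import sqlite3

# (id, user_id, createdAt, status_id, quantity) from the question.
ORDERS = [
    (1, 12, "2020-01-01", 4, 1), (2, 12, "2020-01-03", 7, 1),
    (3, 12, "2020-01-06", 7, 2), (4, 13, "2020-01-02", 5, 2),
    (5, 13, "2020-01-03", 6, 1), (6, 14, "2020-03-03", 8, 1),
    (7, 13, "2020-03-04", 4, 1), (8, 15, "2020-04-04", 7, 3),
    (9, 14, "2020-03-02", 7, 2), (10, 14, "2020-03-10", 5, 4),
    (11, 13, "2020-04-10", 8, 3), (12, 13, "2020-04-11", 8, 2),
    (13, 16, "2020-04-15", 8, 3),
]

# Comma joins: 1 row (x1) x 5 users (yy) x 13 orders (om) before any filtering.
FROM_CLAUSE = """
FROM (SELECT COUNT(*) AS count_
      FROM (SELECT user_id FROM order_match
            WHERE status_id IN (4, 5, 6, 8)
            GROUP BY user_id) xx) x1,
     (SELECT user_id FROM order_match GROUP BY user_id) yy,
     order_match om
"""
JOIN_FILTER = """
WHERE yy.user_id = om.user_id
  AND om.status_id IN (4, 5, 6, 8)
  AND om.createdAt < '2020-03-01'
"""
EXISTS_FILTER = """
  AND EXISTS (SELECT 1 FROM order_match om2
              WHERE om2.user_id = om.user_id
                AND om2.status_id IN (4, 5, 6, 8)
                AND om2.createdAt >= '2020-03-01'
                AND om2.createdAt <= '2020-04-01')
"""


def count_rows(where_sql=""):
    """Count the rows the comma-joined FROM clause yields under a given filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_match "
        "(id INTEGER, user_id INTEGER, createdAt TEXT, status_id INTEGER, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO order_match VALUES (?, ?, ?, ?, ?)", ORDERS)
    n = conn.execute("SELECT COUNT(*) " + FROM_CLAUSE + where_sql).fetchone()[0]
    conn.close()
    return n


print(count_rows())                             # no WHERE clause
print(count_rows(JOIN_FILTER))                  # WHERE without EXISTS
print(count_rows(JOIN_FILTER + EXISTS_FILTER))  # full WHERE
```

The two surviving rows are user 13's January orders (quantities 2 and 1), which is why the original query sums 3 kg over 2 orders instead of counting the single March order.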
I have a table with below-mentioned columns:
ID (int) AUTO_INCREMENT PRIMARY KEY
DOCTOR_ID (int)
PATIENT_IN_TIME (datetime)
AVG_CHECKUP_TIME
I want to subtract row 1 PATIENT_IN_TIME with row 2 PATIENT_IN_TIME and save the result in minutes to AVG_CHECKUP_TIME.
Suppose there are 5 entries in the table.
|1|2|2018-03-22 02:49:51|NULL|
|2|2|2018-03-22 02:56:37|NULL|
So I want to find the difference of both the rows and save the minutes in the last column. So, the output would look like,
|1|2|2018-03-22 02:49:51|7|
|2|2|2018-03-22 02:56:37|NULL|
Please let me know if you need more information.
create table tbl
(
id int auto_increment primary key,
doctor_id int,
patient_in_time datetime,
avg_checkup_time datetime
);
insert into tbl values
(1, 2, '2018-03-22 02:49:51', null),
(2, 2, '2018-03-22 02:56:37', null),
(3, 2, '2018-03-22 03:00:15', null),
(4, 2, '2018-03-22 03:03:37', null);
select t1.id, t1.doctor_id, t1.patient_in_time,
timestampdiff(minute, t1.patient_in_time,
(select patient_in_time
from tbl where id = t1.id +1)) diff
from tbl t1
id | doctor_id | patient_in_time | diff
-: | --------: | :------------------ | ---:
1 | 2 | 2018-03-22 02:49:51 | 6
2 | 2 | 2018-03-22 02:56:37 | 3
3 | 2 | 2018-03-22 03:00:15 | 3
4 | 2 | 2018-03-22 03:03:37 | null
dbfiddle here
As per the comments, if the order is determined by patient_in_time, you can use a scalar subquery that returns the next row, like this:
select t1.id,
t1.doctor_id,
t1.patient_in_time,
timestampdiff(minute,
t1.patient_in_time,
(select patient_in_time
from tbl
where patient_in_time > t1.patient_in_time
order by patient_in_time asc
limit 1)) diff
from tbl t1
order by patient_in_time
id | doctor_id | patient_in_time | diff
-: | --------: | :------------------ | ---:
1 | 2 | 2018-03-22 02:49:51 | 6
2 | 2 | 2018-03-22 02:56:37 | 3
3 | 2 | 2018-03-22 03:00:15 | 3
4 | 2 | 2018-03-22 03:03:37 | null
dbfiddle here
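If the same calculation is done in application code instead, a minimal sketch could look like the following. The function name `minutes_to_next` is hypothetical, and the sketch assumes the rows are ordered by patient_in_time. It truncates to whole minutes, which is what TIMESTAMPDIFF(MINUTE, ...) does and why the first gap (6 min 46 s) shows as 6 rather than the 7 in the question; rounding would need an explicit step.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def minutes_to_next(times):
    """For each timestamp, whole minutes until the next one (None for the last)."""
    ordered = sorted(datetime.strptime(t, FMT) for t in times)
    if not ordered:
        return []
    # Pair each entry with its successor and keep only complete minutes.
    diffs = [int((nxt - cur).total_seconds() // 60)
             for cur, nxt in zip(ordered, ordered[1:])]
    return diffs + [None]


print(minutes_to_next([
    "2018-03-22 02:49:51",
    "2018-03-22 02:56:37",
    "2018-03-22 03:00:15",
    "2018-03-22 03:03:37",
]))
```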
I'm trying to GROUP BY same records but different timestamp or datetime.
The difference of time is only about 3 minutes from the first entry.
example:
This is what the database table looks like.
*-------------------------------------------*
| id | name | time |
| 1 | Lei | 2018-02-21 12:00:10 |
| 2 | Lei | 2018-02-21 12:01:11 |
| 3 | Lei | 2018-02-21 12:01:15 |
| 4 | Lei | 2018-02-21 12:01:16 |
| 5 | Anna | 2018-02-21 12:03:11 |
| 6 | Anna | 2018-02-21 12:03:13 |
| 7 | Bell | 2018-02-21 12:05:01 |
| 8 | Lei | 2018-02-21 12:10:00 |
*-------------------------------------------*
I want to get Lei's entry from 12:00:10 up to 3 minutes from her first timestamp or datetime record.
so the output would be like this.
*-------------------------------------------*
| id | name | time |
| 1 | Lei | 2018-02-21 12:00:10 |
| 5 | Anna | 2018-02-21 12:03:11 |
| 7 | Bell | 2018-02-21 12:05:01 |
| 8 | Lei | 2018-02-21 12:10:00 |
*-------------------------------------------*
I would gladly appreciate your help, whether in MySQL or PHP.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE Table1
(`id` int, `name` varchar(4), `time` datetime)
;
INSERT INTO Table1
(`id`, `name`, `time`)
VALUES
(1, 'Lei', '2018-02-21 12:00:10'),
(2, 'Lei', '2018-02-21 12:01:11'),
(3, 'Lei', '2018-02-21 12:01:15'),
(4, 'Lei', '2018-02-21 12:01:16'),
(5, 'Anna', '2018-02-21 12:03:11'),
(6, 'Anna', '2018-02-21 12:03:13'),
(7, 'Bell', '2018-02-21 12:05:01')
;
Query 1:
select id, name, min(time) as time
from Table1
group by name
order by time
Results:
| id | name | time |
|----|------|----------------------|
| 1 | Lei | 2018-02-21T12:00:10Z |
| 5 | Anna | 2018-02-21T12:03:11Z |
| 7 | Bell | 2018-02-21T12:05:01Z |
Or, if you want to group by 3-minute intervals, you can do it like this:
select id, name, min(time) as time
from Table1
group by name, UNIX_TIMESTAMP(time) DIV 180
order by time
;
With your sample data, you don't need to consider the timestamp at all:
select (@rn := @rn + 1) as id, name, min(time) as time
from t cross join
(select @rn := 0) params
group by id, name;
Grouping things by three minute intervals, from the first record in the interval is much harder. This requires either variables or recursive CTEs.
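Since the question allows PHP as well, the "window anchored at the first record" logic is easy to express procedurally. Below is a rough Python sketch of that idea, not a translation of any query above. The function name `first_of_each_window` is hypothetical, and it assumes a row exactly 3 minutes after the window start still belongs to that window.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (id, name, time) rows from the question.
ROWS = [
    (1, "Lei", "2018-02-21 12:00:10"),
    (2, "Lei", "2018-02-21 12:01:11"),
    (3, "Lei", "2018-02-21 12:01:15"),
    (4, "Lei", "2018-02-21 12:01:16"),
    (5, "Anna", "2018-02-21 12:03:11"),
    (6, "Anna", "2018-02-21 12:03:13"),
    (7, "Bell", "2018-02-21 12:05:01"),
    (8, "Lei", "2018-02-21 12:10:00"),
]


def first_of_each_window(rows, window=timedelta(minutes=3)):
    """Keep the first row of each per-name window; a window starts at a kept row."""
    kept = []
    window_start = {}  # name -> start time of that name's current window
    for row_id, name, ts in sorted(rows, key=lambda r: r[2]):
        t = datetime.strptime(ts, FMT)
        start = window_start.get(name)
        # Open a new window if this name has none yet or the current one expired.
        if start is None or t - start > window:
            window_start[name] = t
            kept.append((row_id, name, ts))
    return kept


for row in first_of_each_window(ROWS):
    print(row)
```

On the sample data this keeps ids 1, 5, 7 and 8, which is the output asked for in the question.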
Looks like you need something like this:
select *
from mytable t
where not exists (
select *
from mytable t1
where t1.name = t.name
and t1.id <> t.id
and t1.time >= t.time - interval 3 minute
and t1.time < t.time
);
Demo: http://sqlfiddle.com/#!9/03cf16/1
It selects a row only if no other row with the same name exists in the three minutes before it.
I am having trouble finding the correct way to select a column from another table and show one result set that combines both tables.
First table:
id | times | project_id |
12 | 12.24 | 40 |
13 | 13.22 | 40 |
14 | 13.22 | 20 |
15 | 12.22 | 20 |
16 | 13.30 | 40 |
Second table:
id | times | project_id |
32 | 22.24 | 40 |
33 | 23.22 | 40 |
34 | 23.22 | 70 |
35 | 22.22 | 70 |
36 | 23.30 | 40 |
I expect to select all the times from the first table for project_id = 40, and join them to the times from the second table for the same project_id = 40.
The results should be like this below:
id | time | time | project_id |
12 | 12.24 | 22.24 | 40 |
13 | 13.22 | 23.22 | 40 |
16 | 13.30 | 23.30 | 40 |
You need to use UNION ALL between those 2 tables otherwise you will get incorrect results. Once you have all the rows together then you can use variables to carry over "previous values" such as shown below and demonstrated at this SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE Table1
(`id` int, `times` decimal(6,2), `project_id` int)
;
INSERT INTO Table1
(`id`, `times`, `project_id`)
VALUES
(12, 12.24, 40),
(13, 13.22, 40),
(14, 13.22, 20),
(15, 12.22, 20),
(16, 13.30, 40)
;
CREATE TABLE Table2
(`id` int, `times` decimal(6,2), `project_id` int)
;
INSERT INTO Table2
(`id`, `times`, `project_id`)
VALUES
(32, 22.24, 40),
(33, 23.22, 40),
(34, 23.22, 70),
(35, 22.22, 70),
(36, 23.30, 40)
;
Query 1:
select
project_id, id, prev_time, times
from (
select
@row_num :=IF(@prev_value=d.project_id,@row_num+1,1) AS RowNumber
, d.*
, IF(@row_num %2 = 0, @prev_time, '') prev_time
, @prev_value := d.project_id
, @prev_time := times
from (
select `id`, `times`, `project_id` from Table1
union all
select `id`, `times`, `project_id` from Table2
) d
cross join (select @prev_value := 0, @row_num := 0) vars
order by d.project_id, d.times
) d2
where prev_time <> ''
Results:
| project_id | id | prev_time | times |
|------------|----|-----------|-------|
| 20 | 14 | 12.22 | 13.22 |
| 40 | 13 | 12.24 | 13.22 |
| 40 | 32 | 13.30 | 22.24 |
| 40 | 36 | 23.22 | 23.3 |
| 70 | 34 | 22.22 | 23.22 |
Note: MySQL did not support the LEAD() and LAG() functions when this answer was prepared (window functions were added in MySQL 8.0). Where they are available, that approach is simpler and probably more efficient:
select
d.*
from (
select
d1.*
, LEAD(times,1) OVER(partition by project_id order by times ASC) next_time
from (
select id, times, project_id from Table1
union all
select id, times, project_id from Table2
) d1
) d
where next_time is not null
My goal is to return a start and end date having same value in a column. Here is my table. The (*) have been marked to give you the idea of how I want to get "EndDate" for every similar sequence value of A & B columns
ID | DayDate | A | B
-----------------------------------------------
1 | 2010/07/1 | 200 | 300
2 | 2010/07/2 | 200 | 300 *
3 | 2010/07/3 | 150 | 250
4 | 2010/07/4 | 150 | 250 *
8 | 2010/07/5 | 150 | 350 *
9 | 2010/07/6 | 200 | 300
10 | 2010/07/7 | 200 | 300 *
11 | 2010/07/8 | 100 | 200
12 | 2010/07/9 | 100 | 200 *
and I want to get the following result table from the above table
| DayDate |EndDate | A | B
-----------------------------------------------
| 2010/07/1 |2010/07/2 | 200 | 300
| 2010/07/3 |2010/07/4 | 150 | 250
| 2010/07/5 |2010/07/5 | 150 | 350
| 2010/07/6 |2010/07/7 | 200 | 300
| 2010/07/8 |2010/07/9 | 100 | 200
UPDATE:
Thanks Mike. Your approach seems to work if the following row is treated as a mistake:
8 | 2010/07/5 | 150 | 350 *
However, it is not a mistake. The data I am dealing with is like a log of market price changes by date. The real problem in my case is to select each run of consecutive rows in which both A and B match, with its beginning and ending date, then the run that follows it, and so on, so that no data in the table is left out.
I can explain with a real-world scenario. A hotel with rooms A and B has room rates for each day entered into the table as shown in my question. Now the hotel needs a report that shows the price calendar in a shorter form using start and end dates, instead of listing every date entered. For example, from 2010/07/01 to 2010/07/02 the price of A is 200 and B is 300. The price changes from the 3rd to the 4th, and on the 5th there is a different price for that day only, where the price of Room B changes to 350. This is treated as a single-day run, which is why the start and end dates are the same.
I hope this explains the scenario. Also note that the hotel may be closed for a specific period; consider this an additional problem on top of my first question. What if the rate is not entered on specific dates? For example, on Sundays the hotel does not sell these two rooms, so no price is entered, meaning the row will not exist in the table.
Creating related tables allows you much greater freedom to query and extract relevant information. Here's a few links that you might find useful:
You could start with these tutorials:
http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
http://net.tutsplus.com/tutorials/databases/sql-for-beginners/
There are also a couple of questions here on stackoverflow that might be useful:
Normalization in plain English
What exactly does database normalization do?
Anyway, on to a possible solution. The following examples use your hotel rooms analogy.
First, create a table to hold information about the hotel rooms. This table just contains the room ID and its name, but you could store other information in here, such as the room type (single, double, twin), its view (ocean front, ocean view, city view, pool view), and so on:
CREATE TABLE `room` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `name_UNIQUE` (`name` ASC) )
ENGINE = InnoDB;
Now create a table to hold the changing room rates. This table links to the room table through the room_id column. The foreign key constraint prevents records being inserted into the rate table which refer to rooms that do not exist:
CREATE TABLE `rate` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`room_id` INT UNSIGNED NOT NULL,
`date` DATE NOT NULL,
`rate` DECIMAL(6,2) UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
INDEX `fk_room_rate` (`room_id` ASC),
CONSTRAINT `fk_room_rate`
FOREIGN KEY (`room_id` )
REFERENCES `room` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
Create two rooms, and add some daily rate information about each room:
INSERT INTO `room` (`id`, `name`) VALUES (1, 'A'), (2, 'B');
INSERT INTO `rate` (`id`, `room_id`, `date`, `rate`) VALUES
( 1, 1, '2010-07-01', 200),
( 2, 1, '2010-07-02', 200),
( 3, 1, '2010-07-03', 150),
( 4, 1, '2010-07-04', 150),
( 5, 1, '2010-07-05', 150),
( 6, 1, '2010-07-06', 200),
( 7, 1, '2010-07-07', 200),
( 8, 1, '2010-07-08', 100),
( 9, 1, '2010-07-09', 100),
(10, 2, '2010-07-01', 300),
(11, 2, '2010-07-02', 300),
(12, 2, '2010-07-03', 250),
(13, 2, '2010-07-04', 250),
(14, 2, '2010-07-05', 350),
(15, 2, '2010-07-06', 300),
(16, 2, '2010-07-07', 300),
(17, 2, '2010-07-08', 200),
(18, 2, '2010-07-09', 200);
With that information stored, a simple SELECT query with a JOIN will show you all the daily room rates:
SELECT
room.name,
rate.date,
rate.rate
FROM room
JOIN rate
ON rate.room_id = room.id;
+------+------------+--------+
| A | 2010-07-01 | 200.00 |
| A | 2010-07-02 | 200.00 |
| A | 2010-07-03 | 150.00 |
| A | 2010-07-04 | 150.00 |
| A | 2010-07-05 | 150.00 |
| A | 2010-07-06 | 200.00 |
| A | 2010-07-07 | 200.00 |
| A | 2010-07-08 | 100.00 |
| A | 2010-07-09 | 100.00 |
| B | 2010-07-01 | 300.00 |
| B | 2010-07-02 | 300.00 |
| B | 2010-07-03 | 250.00 |
| B | 2010-07-04 | 250.00 |
| B | 2010-07-05 | 350.00 |
| B | 2010-07-06 | 300.00 |
| B | 2010-07-07 | 300.00 |
| B | 2010-07-08 | 200.00 |
| B | 2010-07-09 | 200.00 |
+------+------------+--------+
To find the start and end dates for each room rate, you need a more complex query:
SELECT
id,
room_id,
MIN(date) AS start_date,
MAX(date) AS end_date,
COUNT(*) AS days,
rate
FROM (
SELECT
id,
room_id,
date,
rate,
(
SELECT COUNT(*)
FROM rate AS b
WHERE b.rate <> a.rate
AND b.date <= a.date
AND b.room_id = a.room_id
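) AS grp -- named "grouping" originally; GROUPING is a reserved word in MySQL 8.0
FROM rate AS a
ORDER BY a.room_id, a.date
) c
GROUP BY room_id, rate, grp -- room_id added so equal rates in different rooms never merge
ORDER BY room_id, MIN(date);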
+----+---------+------------+------------+------+--------+
| id | room_id | start_date | end_date | days | rate |
+----+---------+------------+------------+------+--------+
| 1 | 1 | 2010-07-01 | 2010-07-02 | 2 | 200.00 |
| 3 | 1 | 2010-07-03 | 2010-07-05 | 3 | 150.00 |
| 6 | 1 | 2010-07-06 | 2010-07-07 | 2 | 200.00 |
| 8 | 1 | 2010-07-08 | 2010-07-09 | 2 | 100.00 |
| 10 | 2 | 2010-07-01 | 2010-07-02 | 2 | 300.00 |
| 12 | 2 | 2010-07-03 | 2010-07-04 | 2 | 250.00 |
| 14 | 2 | 2010-07-05 | 2010-07-05 | 1 | 350.00 |
| 15 | 2 | 2010-07-06 | 2010-07-07 | 2 | 300.00 |
| 17 | 2 | 2010-07-08 | 2010-07-09 | 2 | 200.00 |
+----+---------+------------+------------+------+--------+
You can find a good explanation of the technique used in the above query here:
http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data
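For comparison, the same "runs" idea is short to write outside the database. The following Python sketch applies it to the original single-table data from the question, treating the pair (A, B) as the value that defines a run. It is only an illustration: the function name `price_runs` is hypothetical, dates are written in ISO form so they sort as text, and missing dates (such as closed Sundays) are assumed not to break a run.

```python
from itertools import groupby

# (DayDate, A, B) rows from the question, with dates in ISO form.
ROWS = [
    ("2010-07-01", 200, 300),
    ("2010-07-02", 200, 300),
    ("2010-07-03", 150, 250),
    ("2010-07-04", 150, 250),
    ("2010-07-05", 150, 350),
    ("2010-07-06", 200, 300),
    ("2010-07-07", 200, 300),
    ("2010-07-08", 100, 200),
    ("2010-07-09", 100, 200),
]


def price_runs(rows):
    """Collapse consecutive days with identical (A, B) into (start, end, A, B)."""
    runs = []
    by_date = sorted(rows, key=lambda r: r[0])
    # groupby only merges adjacent items, so a price that comes back later
    # (200/300 here) starts a new run instead of joining the earlier one.
    for (a, b), grp in groupby(by_date, key=lambda r: (r[1], r[2])):
        days = [r[0] for r in grp]
        runs.append((days[0], days[-1], a, b))
    return runs


for run in price_runs(ROWS):
    print(run)
```

A single-day price such as 2010-07-05 comes out with the same start and end date, so no NULL end date has to be patched afterwards.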
My general approach is to join the table onto itself where DayDate + 1 in one copy equals DayDate in the other and the A or B values are not equal.
This finds the end date of each period (the day after which the value is going to be different).
The only problem is that it won't find an end date for the final period. To get around this, I select the max date from the table and union that into my list of end dates.
Once you have the list of end dates defined, you can join it to the original table on the end date being greater than or equal to the original date.
From this final list, select the minimum DayDate grouped by the other fields:
select
min(DayDate) as DayDate,EndDate,A,B from
(SELECT DayDate, A, B, min(ends.EndDate) as EndDate
FROM yourtable
LEFT JOIN
(SELECT max(DayDate) as EndDate FROM yourtable UNION
SELECT t1.DayDate as EndDate
FROM yourtable t1
JOIN yourtable t2
ON date_add(t1.DayDate, INTERVAL 1 DAY) = t2.DayDate
AND (t1.A<>t2.A OR t1.B<>t2.B)) ends
ON ends.EndDate>=DayDate
GROUP BY DayDate, A, B) x
GROUP BY EndDate,A,B
I think I have found a solution which does produce the table desired.
SELECT
a.DayDate AS StartDate,
( SELECT b.DayDate
FROM Dates AS b
WHERE b.DayDate > a.DayDate AND (b.B = a.B OR b.B IS NULL)
ORDER BY b.DayDate ASC LIMIT 1
) AS StopDate,
a.A as A,
a.B AS B
FROM Dates AS a
WHERE Coalesce(
(SELECT c.B
FROM Dates AS c
WHERE c.DayDate <= a.DayDate
ORDER BY c.DayDate DESC LIMIT 1,1
), -99999
) <> a.B
AND a.B IS NOT NULL
ORDER BY a.DayDate ASC;
This generates the following result:
StartDate StopDate A B
2010-07-01 2010-07-02 200 300
2010-07-03 2010-07-04 150 250
2010-07-05 NULL 150 350
2010-07-06 2010-07-07 200 300
2010-07-08 2010-07-09 100 200
But I need a way to replace the NULL with the same date of the start date.