I'm trying to GROUP BY records that have the same name but different timestamps (datetime values).
Records should be grouped together when they fall within about 3 minutes of the first entry.
Example:
This is what the database table looks like.
*-------------------------------------------*
| id | name | time |
| 1 | Lei | 2018-02-21 12:00:10 |
| 2 | Lei | 2018-02-21 12:01:11 |
| 3 | Lei | 2018-02-21 12:01:15 |
| 4 | Lei | 2018-02-21 12:01:16 |
| 5 | Anna | 2018-02-21 12:03:11 |
| 6 | Anna | 2018-02-21 12:03:13 |
| 7 | Bell | 2018-02-21 12:05:01 |
| 8 | Lei | 2018-02-21 12:10:00 |
*-------------------------------------------*
I want to get a single row for Lei's entries from 12:00:10 up to 3 minutes after her first timestamp,
so the output would look like this:
*-------------------------------------------*
| id | name | time |
| 1 | Lei | 2018-02-21 12:00:10 |
| 5 | Anna | 2018-02-21 12:03:11 |
| 7 | Bell | 2018-02-21 12:05:01 |
| 8 | Lei | 2018-02-21 12:10:00 |
*-------------------------------------------*
I'd greatly appreciate your help; a MySQL or PHP solution is fine.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE Table1
(`id` int, `name` varchar(4), `time` datetime)
;
INSERT INTO Table1
(`id`, `name`, `time`)
VALUES
(1, 'Lei', '2018-02-21 12:00:10'),
(2, 'Lei', '2018-02-21 12:01:11'),
(3, 'Lei', '2018-02-21 12:01:15'),
(4, 'Lei', '2018-02-21 12:01:16'),
(5, 'Anna', '2018-02-21 12:03:11'),
(6, 'Anna', '2018-02-21 12:03:13'),
(7, 'Bell', '2018-02-21 12:05:01')
;
Query 1:
-- MIN(id) keeps the query valid when ONLY_FULL_GROUP_BY is enabled
select min(id) as id, name, min(time) as time
from Table1
group by name
order by time
Results:
| id | name | time |
|----|------|----------------------|
| 1 | Lei | 2018-02-21T12:00:10Z |
| 5 | Anna | 2018-02-21T12:03:11Z |
| 7 | Bell | 2018-02-21T12:05:01Z |
Or, if you want to group by fixed 3-minute intervals, you can do it like this:
select min(id) as id, name, min(time) as time
from Table1
group by name, UNIX_TIMESTAMP(time) DIV 180
order by time
;
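One thing worth checking with `DIV 180` is that the buckets are aligned to the clock, not to each name's first record. The following Python sketch mimics the bucket arithmetic under the assumption that the timestamps are interpreted as UTC; the `bucket` helper is a hypothetical name used only for illustration.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 180  # same width as `DIV 180` in the query above


def bucket(ts: str) -> int:
    """Mimic UNIX_TIMESTAMP(time) DIV 180, treating the value as UTC (assumption)."""
    moment = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(moment.timestamp()) // BUCKET_SECONDS


# Ten seconds apart, but on opposite sides of the 12:03:00 boundary.
print(bucket("2018-02-21 12:02:55") == bucket("2018-02-21 12:03:05"))  # False

# Almost three minutes apart, yet inside the same 12:00:00-12:02:59 bucket.
print(bucket("2018-02-21 12:00:00") == bucket("2018-02-21 12:02:59"))  # True
```

So two rows that are only seconds apart can still land in different groups when they straddle a bucket boundary.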
With your sample data, you don't need to consider the timestamp at all:
select (@rn := @rn + 1) as id, name, min(time) as time
from t cross join
(select @rn := 0) params
group by id, name;
Grouping rows into three-minute intervals that start at the first record of each interval is much harder. This requires either user variables or recursive CTEs.
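Since the question also accepts an application-side solution, the "window starts at the first record" logic can be sketched in Python instead of SQL. This is only a sketch under assumptions: the rows are already fetched into memory, a row exactly 3 minutes after the window start still belongs to that window, and `first_of_each_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=3)

# Sample rows from the question: (id, name, time)
ROWS = [
    (1, "Lei", "2018-02-21 12:00:10"),
    (2, "Lei", "2018-02-21 12:01:11"),
    (3, "Lei", "2018-02-21 12:01:15"),
    (4, "Lei", "2018-02-21 12:01:16"),
    (5, "Anna", "2018-02-21 12:03:11"),
    (6, "Anna", "2018-02-21 12:03:13"),
    (7, "Bell", "2018-02-21 12:05:01"),
    (8, "Lei", "2018-02-21 12:10:00"),
]


def first_of_each_window(rows, window=WINDOW):
    """Keep the row that opens each window; drop same-name rows inside it."""
    window_start = {}  # name -> datetime of the row that opened the current window
    kept = []
    # The timestamp strings are fixed-width, so sorting them sorts by time.
    for row_id, name, ts in sorted(rows, key=lambda r: (r[2], r[0])):
        moment = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        start = window_start.get(name)
        if start is None or moment - start > window:
            window_start[name] = moment  # this row opens a new window
            kept.append((row_id, name, ts))
    return kept


for row in first_of_each_window(ROWS):
    print(row)
# (1, 'Lei', '2018-02-21 12:00:10')
# (5, 'Anna', '2018-02-21 12:03:11')
# (7, 'Bell', '2018-02-21 12:05:01')
# (8, 'Lei', '2018-02-21 12:10:00')
```

The dictionary plays the role that a user variable per name would play in a MySQL solution.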
Looks like you need something like this:
select *
from mytable t
where not exists (
select *
from mytable t1
where t1.name = t.name
and t1.id <> t.id
and t1.time >= t.time - interval 3 minute
and t1.time < t.time
);
Demo: http://sqlfiddle.com/#!9/03cf16/1
It selects a row only if no other row with the same name exists within the three minutes before it.
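To see what this rule returns, here is a small Python model of the same NOT EXISTS condition (a sketch, not the query itself; `rows_without_recent_predecessor` is a hypothetical name). The `CHAIN` rows are made up for illustration: a run of rows spaced two minutes apart collapses to its first row, because each later row has a predecessor within three minutes.

```python
from datetime import datetime, timedelta


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def rows_without_recent_predecessor(rows, window=timedelta(minutes=3)):
    """Return ids of rows that have no same-name row in [time - window, time)."""
    kept = []
    for row_id, name, ts in rows:
        t = parse(ts)
        has_predecessor = any(
            other_name == name
            and other_id != row_id
            and t - window <= parse(other_ts) < t
            for other_id, other_name, other_ts in rows
        )
        if not has_predecessor:
            kept.append(row_id)
    return kept


# Sample rows from the question: (id, name, time)
SAMPLE = [
    (1, "Lei", "2018-02-21 12:00:10"),
    (2, "Lei", "2018-02-21 12:01:11"),
    (3, "Lei", "2018-02-21 12:01:15"),
    (4, "Lei", "2018-02-21 12:01:16"),
    (5, "Anna", "2018-02-21 12:03:11"),
    (6, "Anna", "2018-02-21 12:03:13"),
    (7, "Bell", "2018-02-21 12:05:01"),
    (8, "Lei", "2018-02-21 12:10:00"),
]

# Illustrative rows (not from the question), spaced two minutes apart.
CHAIN = [
    (1, "Lei", "2018-02-21 12:00:00"),
    (2, "Lei", "2018-02-21 12:02:00"),
    (3, "Lei", "2018-02-21 12:04:00"),
]

print(rows_without_recent_predecessor(SAMPLE))  # [1, 5, 7, 8]
print(rows_without_recent_predecessor(CHAIN))   # [1]
```

If the 12:04:00 row should start a new group because it is more than three minutes after the first row, a window anchored at the first record is needed instead.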
I would like to display a player's current score as well as how many points they have gained within a selected time frame.
I have 2 tables
skills table
+----+---------+---------------------+
| id | name | created_at |
+----+---------+---------------------+
| 1 | skill 1 | 2020-06-05 00:00:00 |
| 2 | skill 2 | 2020-06-05 00:00:00 |
| 3 | skill 3 | 2020-06-05 00:00:00 |
+----+---------+---------------------+
scores table
+----+-----------+----------+-------+---------------------+
| id | player_id | skill_id | score | created_at |
+----+-----------+----------+-------+---------------------+
| 1 | 1 | 1 | 5 | 2020-06-06 00:00:00 |
| 2 | 1 | 1 | 10 | 2020-07-06 00:00:00 |
| 3 | 1 | 2 | 1 | 2020-07-06 00:00:00 |
| 4 | 2 | 1 | 11 | 2020-07-06 00:00:00 |
| 5 | 1 | 1 | 13 | 2020-07-07 00:00:00 |
| 6 | 1 | 2 | 10 | 2020-07-07 00:00:00 |
| 7 | 2 | 1 | 12 | 2020-07-07 00:00:00 |
| 8 | 1 | 1 | 20 | 2020-07-08 00:00:00 |
| 9 | 1 | 2 | 15 | 2020-07-08 00:00:00 |
| 10 | 2 | 1 | 17 | 2020-07-08 00:00:00 |
+----+-----------+----------+-------+---------------------+
my expected results are:-
24 hour query
+-----------+---------+-------+------+
| player_id | name | score | gain |
+-----------+---------+-------+------+
| 1 | skill 1 | 20 | 7 |
| 1 | skill 2 | 15 | 5 |
+-----------+---------+-------+------+
7 day query
+-----------+---------+-------+------+
| player_id | name | score | gain |
+-----------+---------+-------+------+
| 1 | skill 1 | 20 | 10 |
| 1 | skill 2 | 15 | 14 |
+-----------+---------+-------+------+
31 day query
+-----------+---------+-------+------+
| player_id | name | score | gain |
+-----------+---------+-------+------+
| 1 | skill 1 | 20 | 15 |
| 1 | skill 2 | 15 | 14 |
+-----------+---------+-------+------+
So far I have the following, but all it does is return the last 2 records for each skill. I am struggling to calculate the gains for the different time frames.
SELECT player_id, skill_id, name, score
FROM (SELECT player_id, skill_id, name, score,
@skill_count := IF(@current_skill = skill_id, @skill_count + 1, 1) AS skill_count,
@current_skill := skill_id
FROM skill_scores
INNER JOIN skills
ON skill_id = skills.id
WHERE player_id = 1
ORDER BY skill_id, score DESC
) counted
WHERE skill_count <= 2
I would like some help figuring out the query I need to build to get the desired results, or is it best to do this with php instead of in the db?
EDIT:-
MySQL 8.0.20 dummy data. The ids are auto-increment primary keys, but I didn't add that for simplicity:
CREATE TABLE skills
(
id bigint,
name VARCHAR(80)
);
CREATE TABLE skill_scores
(
id bigint,
player_id bigint,
skill_id bigint,
score bigint,
created_at timestamp
);
INSERT INTO skills VALUES (1, 'skill 1');
INSERT INTO skills VALUES (2, 'skill 2');
INSERT INTO skills VALUES (3, 'skill 3');
INSERT INTO skill_scores VALUES (1, 1, 1 , 5, '2020-06-06 00:00:00');
INSERT INTO skill_scores VALUES (2, 1, 1 , 10, '2020-07-06 00:00:00');
INSERT INTO skill_scores VALUES (3, 1, 2 , 1, '2020-07-06 00:00:00');
INSERT INTO skill_scores VALUES (4, 2, 1 , 11, '2020-07-06 00:00:00');
INSERT INTO skill_scores VALUES (5, 1, 1 , 13, '2020-07-07 00:00:00');
INSERT INTO skill_scores VALUES (6, 1, 2 , 10, '2020-07-07 00:00:00');
INSERT INTO skill_scores VALUES (7, 2, 1 , 12, '2020-07-07 00:00:00');
INSERT INTO skill_scores VALUES (8, 1, 1 , 20, '2020-07-08 00:00:00');
INSERT INTO skill_scores VALUES (9, 1, 2 , 15, '2020-07-08 00:00:00');
INSERT INTO skill_scores VALUES (10, 2, 1 , 17, '2020-07-08 00:00:00');
WITH cte AS (
SELECT id, player_id, skill_id,
FIRST_VALUE(score) OVER (PARTITION BY player_id, skill_id ORDER BY created_at DESC) score,
FIRST_VALUE(score) OVER (PARTITION BY player_id, skill_id ORDER BY created_at DESC) - FIRST_VALUE(score) OVER (PARTITION BY player_id, skill_id ORDER BY created_at ASC) gain,
ROW_NUMBER() OVER (PARTITION BY player_id, skill_id ORDER BY created_at DESC) rn
FROM skill_scores
WHERE created_at BETWEEN @current_date - INTERVAL @interval DAY AND @current_date
)
SELECT cte.player_id, skills.name, cte.score, cte.gain
FROM cte
JOIN skills ON skills.id = cte.skill_id
WHERE rn = 1
ORDER BY player_id, name;
fiddle
P.S. I don't understand where gain = 15 comes from for the 31-day period: the difference between '2020-07-08 00:00:00' and '2020-06-06 00:00:00' is 32 days.
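Since the question also asks whether this is better done outside the database, the same "latest score minus earliest score inside the window" idea can be sketched in Python. This is only an illustration under assumptions: the rows are already loaded, the reference date is 2020-07-08 00:00:00, and `gains` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

SKILLS = {1: "skill 1", 2: "skill 2", 3: "skill 3"}

# Rows from the question: (id, player_id, skill_id, score, created_at)
SCORES = [
    (1, 1, 1, 5, "2020-06-06 00:00:00"),
    (2, 1, 1, 10, "2020-07-06 00:00:00"),
    (3, 1, 2, 1, "2020-07-06 00:00:00"),
    (4, 2, 1, 11, "2020-07-06 00:00:00"),
    (5, 1, 1, 13, "2020-07-07 00:00:00"),
    (6, 1, 2, 10, "2020-07-07 00:00:00"),
    (7, 2, 1, 12, "2020-07-07 00:00:00"),
    (8, 1, 1, 20, "2020-07-08 00:00:00"),
    (9, 1, 2, 15, "2020-07-08 00:00:00"),
    (10, 2, 1, 17, "2020-07-08 00:00:00"),
]


def gains(scores, player_id, now, days):
    """Latest score and gain per skill for one player within the last `days` days."""
    start = now - timedelta(days=days)
    in_window = {}  # skill_id -> list of (created_at, score)
    for _id, pid, skill_id, score, created_at in scores:
        moment = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
        if pid == player_id and start <= moment <= now:
            in_window.setdefault(skill_id, []).append((moment, score))
    result = []
    for skill_id in sorted(in_window):
        history = sorted(in_window[skill_id])  # oldest first
        first_score, last_score = history[0][1], history[-1][1]
        result.append((player_id, SKILLS[skill_id], last_score, last_score - first_score))
    return result


NOW = datetime(2020, 7, 8)  # assumed reference date
print(gains(SCORES, 1, NOW, 1))   # [(1, 'skill 1', 20, 7), (1, 'skill 2', 15, 5)]
print(gains(SCORES, 1, NOW, 7))   # [(1, 'skill 1', 20, 10), (1, 'skill 2', 15, 14)]
print(gains(SCORES, 1, NOW, 31))  # [(1, 'skill 1', 20, 10), (1, 'skill 2', 15, 14)]
print(gains(SCORES, 1, NOW, 32))  # [(1, 'skill 1', 20, 15), (1, 'skill 2', 15, 14)]
```

With this data the gain of 15 for skill 1 only appears once the window is widened to 32 days, which is consistent with the remark above.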
Well, I think you need a (temporary) table for this. I will call it "player_skill_gains". It's basically the player's skills ordered by created_at, with an auto-incremented id:
CREATE TABLE player_skill_gains
(`id` int PRIMARY KEY AUTO_INCREMENT NOT NULL
, `player_id` int
, skill_id int
, score int
, created_at date)
;
INSERT INTO player_skill_gains(player_id, skill_id, score, created_at)
SELECT player_skills.player_id AS player_id
, player_skills.skill_id
, SUM(player_skills.score) AS score
, player_skills.created_at
FROM player_skills
GROUP BY player_skills.id, player_skills.skill_id, player_skills.created_at
ORDER BY player_skills.player_id, player_skills.skill_id, player_skills.created_at ASC;
Using this table, we can relatively easily select the previous score for each row (id - 1) and use it to calculate the gains:
SELECT player_skill_gains.player_id, skills.name, player_skill_gains.score
, player_skill_gains.score - IFNULL(bef.score,0) AS gain
, player_skill_gains.created_at
FROM player_skill_gains
INNER JOIN skills ON player_skill_gains.skill_id = skills.id
LEFT JOIN player_skill_gains AS bef ON (player_skill_gains.id - 1) = bef.id
AND player_skill_gains.player_id = bef.player_id
AND player_skill_gains.skill_id = bef.skill_id
For the different queries you want to have (24 hours, 7 days, etc.) you just have to specify the needed where-part for the query.
You can see all this in action here: http://sqlfiddle.com/#!9/1571a8/11/0
This is part of my table in a MySQL database:
+----------+---------------------+--------+
| sID | sDatetime | sETX |
+----------+---------------------+--------+
| 16213404 | 2020-04-24 16:00:00 | 497681 |
| 16213398 | 2020-04-20 14:58:56 | 281011 |
+----------+---------------------+--------+
This table has 14,121,398 records.
I realized that in this case more than one hour passed between the previous row and the next one:
mysql> SELECT
TIMEDIFF(
'2020-04-20 16:00:00',
'2020-04-20 14:58:56'
);
+------------------------------------------------------------------+
| TIMEDIFF(
'2020-04-20 16:00:00',
'2020-04-20 14:58:56'
) |
+------------------------------------------------------------------+
| 01:01:04 |
+------------------------------------------------------------------+
1 row in set
This should not be possible, because the data is downloaded from the source every five minutes at most.
In this case the time slot between 3pm and 4pm is missing.
I tried the query below without success, because every value it returns is zero.
I think this is because the sID values are not consecutive.
The code I tried:
SELECT A.`sID`, A.`sDatetime`, (B.`sDatetime` - A.`sDatetime`) AS timedifference
FROM tbl_2020 A INNER JOIN tbl_2020 B ON B.sID = (A.sID + 1)
ORDER BY A.sID ASC;
How can I find this anomaly in the MySQL table?
My version of MySQL is 5.5.62-log.
The column is named sDatetime and its type is DATETIME.
Any suggestions, please?
Thanks in advance for any help.
edit #01
+----------+-----------+---------------------+
| sID | time_diff | sDatetime |
+----------+-----------+---------------------+
| 18389322 | 301 | 2020-05-16 23:53:29 |
| 18390472 | 308 | 2020-05-16 23:48:21 |
| 18389544 | 301 | 2020-05-16 23:43:20 |
| 18388687 | 303 | 2020-05-16 23:38:17 |
| 18388398 | 301 | 2020-05-16 23:33:16 |
| 18390451 | 308 | 2020-05-16 23:28:08 |
| 18388915 | 302 | 2020-05-16 23:23:06 |
| 18388208 | 301 | 2020-05-16 23:18:05 |
| 18390516 | 301 | 2020-05-16 23:13:04 |
| 18389904 | 301 | 2020-05-16 23:08:03 |
+----------+-----------+---------------------+
mysql> SELECT
TIMEDIFF(
'2020-05-16 23:53:29',
'2020-05-16 23:48:21'
) AS td;
+----------+
| td |
+----------+
| 00:05:08 |
+----------+
1 row in set
You should try something like this
SELECT
sID
,TIME_TO_SEC(TIMEDIFF(@date,sDatetime)) time_diff
,@date := sDatetime
,sETX
FROM(
SELECT * FROM table1
ORDER BY sDatetime DESC) s1,(SELECT @date :=(SELECT MAX(sDatetime) FROM table1)) s2
HAVING time_diff > 300
First you order the table by time, then you take the time difference between each pair of consecutive rows and check whether it is greater than 5 minutes.
see example here https://www.db-fiddle.com/f/2yKt6d5RWngXVYJKPGZL6m/8
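The "sort by time, then compare neighbours" idea can also be sketched outside the database. The Python below is only an illustration, assuming the sDatetime values have been fetched as strings; `find_gaps` is a hypothetical helper, and the timestamps are taken from the question.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def find_gaps(timestamps, threshold_seconds=300):
    """Return (previous, current, seconds) for consecutive rows further apart than the threshold."""
    moments = sorted(datetime.strptime(ts, FMT) for ts in timestamps)
    gaps = []
    for previous, current in zip(moments, moments[1:]):
        seconds = int((current - previous).total_seconds())
        if seconds > threshold_seconds:
            gaps.append((previous.strftime(FMT), current.strftime(FMT), seconds))
    return gaps


# The two values compared with TIMEDIFF in the question.
print(find_gaps(["2020-04-20 16:00:00", "2020-04-20 14:58:56"]))
# [('2020-04-20 14:58:56', '2020-04-20 16:00:00', 3664)]

# Three consecutive downloads from the question's edit.
normal = ["2020-05-16 23:53:29", "2020-05-16 23:48:21", "2020-05-16 23:43:20"]
print(find_gaps(normal))                         # both pairs flagged (301 s and 308 s)
print(find_gaps(normal, threshold_seconds=330))  # []
```

Because the regular downloads in the question are 301 to 308 seconds apart, a threshold of exactly 300 seconds flags nearly every row; a slightly larger threshold (330 here is an arbitrary choice) may be more useful.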
Comparing current row to previous works
drop table if exists t;
create table t
(sID int, sDatetime datetime, sETX int);
insert into t values
( 16213404 , '2020-04-24 16:00:00' , 497681),
( 16213398 , '2020-04-20 14:58:56' , 281011);
select sid,sdatetime,(select sdatetime from t t1 where t1.sid < t.sid order by t1.sid desc limit 1) prevdt,
time_to_sec(sdatetime) - time_to_sec((select sdatetime from t t1 where t1.sid < t.sid order by t1.sid desc limit 1)) diff
from t
where time_to_sec(sdatetime) - time_to_sec((select sdatetime from t t1 where t1.sid < t.sid order by t1.sid desc limit 1)) > 300;
+----------+---------------------+---------------------+------+
| sid | sdatetime | prevdt | diff |
+----------+---------------------+---------------------+------+
| 16213404 | 2020-04-24 16:00:00 | 2020-04-20 14:58:56 | 3664 |
+----------+---------------------+---------------------+------+
1 row in set (0.002 sec)
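Note that TIME_TO_SEC() looks only at the time-of-day part of a DATETIME, which is why the diff above is 3664 seconds although the two rows are four days apart; TIMESTAMPDIFF(SECOND, earlier, later) compares the full values.
If this is too slow, add your table definition so that we can see the indexes you have.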
For my problem, the general structure of the tables is as follows:
Workers are located in different branches (Branch table).
Prospective customers register (Registration table) as customers (Customer table)
and can order products to buy (Order table).
Branch Table:
+------------+--------------+-----------------+
| 'branchId' | 'street' | 'city' |
+------------+--------------+-----------------+
| 'B002' | 'Clover Dr' | 'London' |
| 'B003' | 'Main St' | 'Glagsow' |
| 'B004' | 'Manse Rd' | 'Bristol' |
| 'B005' | 'Deer Rd' | 'London' |
| 'B007' | 'Argyll St' | 'Los Angeles' |
| 'B008' | 'Mission St' | 'San Francisco' |
| 'B009' | 'SOMA' | 'San Francisco' |
+------------+--------------+-----------------+
Customer Table:
+--------------+----------+-----------+-----------------+
| 'customerId' | 'fName' | 'lName' | 'telNo' |
+--------------+----------+-----------+-----------------+
| 'CR56' | 'Aline' | 'Stewart' | '0141-848-1825' |
| 'CR58' | 'Jacky' | 'Ho' | '0123-1325434' |
| 'CR62' | 'Mary' | 'Tregar' | '01224-196720' |
| 'CR74' | 'Mike' | 'Ritchie' | '01475-392178' |
| 'CR76' | 'John' | 'Kay' | '0207-774-5632' |
+--------------+----------+-----------+-----------------+
Registration Table:
+--------------+------------+------------+-----------------------+
| 'customerId' | 'branchId' | 'workerId' | 'joiningDate' |
+--------------+------------+------------+-----------------------+
| 'CR56' | 'B003' | 'SG37' | '2004-05-02 12:00:00' |
| 'CR58' | 'B003' | 'SA9' | '2004-05-03 12:00:00' |
| 'CR62' | 'B007' | 'SA9' | '2004-05-01 12:00:00' |
| 'CR74' | 'B004' | 'SG37' | '2004-04-04 12:00:00' |
| 'CR76' | 'B005' | 'SL41' | '2004-03-03 12:00:00' |
+--------------+------------+------------+-----------------------+
Order Table:
+--------------+---------------+-----------------------+
| 'customerId' | 'productId' | 'orderDate' |
+--------------+---------------+-----------------------+
| 'CR56' | 'PA14' | '2004-05-04 11:30:00' |
| 'CR62' | 'PA14' | '2004-05-04 14:00:00' |
| 'CR56' | 'PG36' | '2004-06-07 11:00:00' |
| 'CR56' | 'PG4' | '2004-04-14 12:05:00' |
| 'CR76' | 'PG4' | '2004-04-04 10:15:00' |
+--------------+---------------+-----------------------+
I am trying to form a query to find the number of orders per Branch within 1, 2, and 3 months of client Registration.
Let's say for example
+----------+------------+-----------------+
| 'months' | 'branchId' | 'numberOfOrder' |
+----------+------------+-----------------+
| 1 | 'B003' | 2 |
| 2 | 'B004' | 1 |
+----------+------------+-----------------+
I tried to group the table by month and date, but I am stuck and unable to proceed.
Does anyone have any ideas that could help unblock me?
I started with something like this, but I am completely lost at the moment.
SELECT
COUNT(DISTINCT o.orderDate) AS 'count'
FROM
Order o, Registration r
WHERE
o.orderDate BETWEEN DATE('2001-01-01') AND DATE('2005-01-31')
GROUP BY YEAR(o.orderDate), MONTH(o.orderDate);
But it seems I am pretty far from what I am trying to achieve.
I'm not entirely sure of what your desired result is, but with this query you can get the count of orders, per branch, within 3 months after registration.
SELECT
reg.branchId,
COUNT(reg.branchId) AS 'orderCount'
FROM `order` AS ord INNER JOIN `registration` AS reg
ON ord.customerId = reg.customerId
-- compare the order date (not the joining date) against the 3-month window
WHERE ord.orderDate BETWEEN reg.joiningDate AND DATE_ADD(reg.joiningDate, INTERVAL 3 MONTH)
GROUP BY reg.branchId
Is this what you wanted to do?
Your desired result has nothing in common with your data, so I assume you want the order count for every branch.
I also added the year, because it is usually needed and does no harm if your data doesn't span more than one year.
Update:
It now selects only orders that were placed within 3 months after the customer joined.
This is limited by the DATE_ADD in the WHERE clause.
CREATE TABLE registration
(`customerId` varchar(4), `branchId` varchar(4), `workerId` varchar(4), `joiningDate` datetime)
;
INSERT INTO registration
(`customerId`, `branchId`, `workerId`, `joiningDate`)
VALUES
('CR56', 'B003', 'SG37', '2004-05-02 12:00:00'),
('CR58', 'B003', 'SA9', '2004-05-03 12:00:00'),
('CR62', 'B007', 'SA9', '2004-05-01 12:00:00'),
('CR74', 'B004', 'SG37', '2004-04-04 12:00:00'),
('CR76', 'B005', 'SL41', '2004-03-03 12:00:00')
;
CREATE TABLE `order`
(`customerId` varchar(4), `productId` varchar(4), `orderDate` datetime)
;
INSERT INTO `order`
(`customerId`, `productId`, `orderDate`)
VALUES
('CR56', 'PA14', '2004-05-04 11:30:00'),
('CR62', 'PA14', '2004-05-04 14:00:00'),
('CR56', 'PG36', '2004-06-07 11:00:00'),
('CR56', 'PG4', '2004-04-14 12:05:00'),
('CR76', 'PG4', '2004-04-04 10:15:00')
;
SELECT MONTH(o.`orderDate`),r.branchId, COUNT(*) numberOfOrder
FROM registration r inner join `order` o ON r.`customerId` = o.`customerId`
WHERE o.`orderDate` BETWEEN r.`joiningDate` AND DATE_ADD(r.`joiningDate`, INTERVAL 3 MONTH)
GROUP BY YEAR(o.`orderDate`),MONTH(o.`orderDate`),r.branchId
MONTH(o.`orderDate`) | branchId | numberOfOrder
-------------------: | :------- | ------------:
4 | B005 | 1
5 | B003 | 1
5 | B007 | 1
6 | B003 | 1
db<>fiddle here
You may try the query below, which calculates the months from the difference between orderDate and joiningDate:
SELECT abs(ceil(datediff(o.`orderDate`, r.`joiningDate`)/30)) months_join,r.branchId, COUNT(*) numberOfOrder
FROM registration r inner join `order` o ON r.`customerId` = o.`customerId`
GROUP BY YEAR(o.`orderDate`), abs(ceil(datediff(o.`orderDate`, r.`joiningDate`)/30)),r.branchId
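To make the bucketing explicit, here is a rough Python model of the "months since registration" idea, using the same 30-day approximation of a month. It is a sketch under assumptions: orders placed before the joining date are skipped (the query above does not do this), only months 1 to 3 are counted, and `orders_per_branch_by_month` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime
from math import ceil

FMT = "%Y-%m-%d %H:%M:%S"

# customerId -> (branchId, joiningDate), from the Registration table
REGISTRATIONS = {
    "CR56": ("B003", "2004-05-02 12:00:00"),
    "CR58": ("B003", "2004-05-03 12:00:00"),
    "CR62": ("B007", "2004-05-01 12:00:00"),
    "CR74": ("B004", "2004-04-04 12:00:00"),
    "CR76": ("B005", "2004-03-03 12:00:00"),
}

# (customerId, productId, orderDate), from the Order table
ORDERS = [
    ("CR56", "PA14", "2004-05-04 11:30:00"),
    ("CR62", "PA14", "2004-05-04 14:00:00"),
    ("CR56", "PG36", "2004-06-07 11:00:00"),
    ("CR56", "PG4", "2004-04-14 12:05:00"),
    ("CR76", "PG4", "2004-04-04 10:15:00"),
]


def orders_per_branch_by_month(registrations, orders, max_months=3):
    """Count orders per (month since registration, branch), with 30-day months."""
    counts = Counter()
    for customer_id, _product_id, order_date in orders:
        branch_id, joining_date = registrations[customer_id]
        # Like DATEDIFF, compare the date parts only.
        days = (datetime.strptime(order_date, FMT).date()
                - datetime.strptime(joining_date, FMT).date()).days
        if days < 0:
            continue  # assumption: ignore orders placed before registration
        month = max(1, ceil(days / 30))  # an order on the joining day counts as month 1
        if month <= max_months:
            counts[(month, branch_id)] += 1
    return sorted(counts.items())


for (month, branch_id), number_of_orders in orders_per_branch_by_month(REGISTRATIONS, ORDERS):
    print(month, branch_id, number_of_orders)
# 1 B003 1
# 1 B007 1
# 2 B003 1
# 2 B005 1
```

Whether "within 2 months" should also include the orders of month 1 (a cumulative count) is not clear from the question; this sketch keeps the buckets separate.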
Abstract question
I have a sql-table that contains records in the following form:
(list_id, value), where list_id is an integer identifying a specific list and value is something that has an order.
I am struggling to write an SQL query that returns all records of that table ordered first by the rank the list has compared to the other lists and then by the value.
The abstract problem is that I want to sort a list of lists using SQL.
Algorithm to compare two lists
The algorithm to compare two lists is the following:
data CompareRes = FirstSmaller | FirstGreater | Equal deriving Show
compareLists :: Ord a => [a] -> [a] -> CompareRes
compareLists [] [] = Equal
-- Longer lists are considered to be smaller
compareLists _ [] = FirstSmaller
compareLists [] _ = FirstGreater
compareLists (x:xs) (y:ys)
  | x < y = FirstSmaller
  | x > y = FirstGreater
  | otherwise = compareLists xs ys
Details
In my specific case the values are all Dates.
So my table looks like this:
CREATE TABLE `list_date` (
`list_id` INT NOT NULL,
`date` DATE NOT NULL,
PRIMARY KEY (`list_id`, `date`)
);
I'm using a mysql:8.0 database so a solution using WINDOW-functions is acceptable.
Example
Data
INSERT INTO `list_date` VALUES
(1, '2019-11-02'), (1, '2019-11-03'), (1, '2019-11-04'), (1, '2019-11-05'), (1, '2019-11-07'), (1, '2019-11-08'), (1, '2019-11-09'),
(2, '2019-11-01'), (2, '2019-11-03'), (2, '2019-11-04'),
(3, '2019-11-01'), (3, '2019-11-02'), (3, '2019-11-03'),
(4, '2019-11-02'), (4, '2019-11-04'), (4, '2019-11-13'), (4, '2019-11-14'),
(5, '2019-11-03'), (5, '2019-11-04'), (5, '2019-11-05'), (5, '2019-11-10'),
(6, '2019-11-01'), (6, '2019-11-02'), (6, '2019-11-03'), (6, '2019-11-05');
Query
Where I really struggle is to create an expression that calculates the list_rank:
SELECT
`list_id`,
`date`,
<PLEASE HELP> as `list_rank`
FROM
`list_date`
ORDER BY
`list_rank`, `date`;
Expected result
| list_id | date | list_rank |
|---------|------------|-----------|
| 6 | 2019-11-01 | 1 |
| 6 | 2019-11-02 | 1 |
| 6 | 2019-11-03 | 1 |
| 6 | 2019-11-05 | 1 |
| 3 | 2019-11-01 | 2 |
| 3 | 2019-11-02 | 2 |
| 3 | 2019-11-03 | 2 |
| 2 | 2019-11-01 | 3 |
| 2 | 2019-11-03 | 3 |
| 2 | 2019-11-04 | 3 |
| 1 | 2019-11-02 | 4 |
| 1 | 2019-11-03 | 4 |
| 1 | 2019-11-04 | 4 |
| 1 | 2019-11-05 | 4 |
| 1 | 2019-11-07 | 4 |
| 1 | 2019-11-08 | 4 |
| 1 | 2019-11-09 | 4 |
| 4 | 2019-11-02 | 5 |
| 4 | 2019-11-04 | 5 |
| 4 | 2019-11-13 | 5 |
| 4 | 2019-11-14 | 5 |
| 5 | 2019-11-03 | 6 |
| 5 | 2019-11-04 | 6 |
| 5 | 2019-11-05 | 6 |
| 5 | 2019-11-10 | 6 |
[Image: the current live result my application produces.]
Currently the sorting is implemented in Java.
Edit
After not receiving a better answer, I implemented a solution as suggested by @gordon-linoff:
SELECT
`list_id`,
`date`
FROM
`list_date`
INNER JOIN (
SELECT `sub`.`list_id`,
GROUP_CONCAT(`sub`.`date` ORDER BY `sub`.`date` SEPARATOR '') as `concat_dates`
FROM `list_date` as `sub`
GROUP BY `sub`.`list_id`
) `all_dates` ON (`all_dates`.`list_id` = `list_date`.`list_id`)
ORDER BY
`all_dates`.`concat_dates`, `date`;
I've also created an SQL Fiddle - So you can play around with your solution.
But this solution does not sort the lists as expected, because longer lists are considered bigger than shorter lists.
So I am still hoping to receive a solution that solves 100% of my requirements :)
If I understand correctly, you can sort the lists by their dates concatenated together:
select ld.*
from list_date ld join
(select list_id, group_concat(date order by date) as dates
from list_date
group by list_id
) ldc
on ld.list_id = ldc.list_id
order by ldc.dates, ld.date;
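Plain concatenation ranks a list before any longer list that extends it, which is the opposite of the rule in the question. The Python sketch below models the question's comparison directly and also shows a concatenation key with a trailing sentinel character that sorts after the digits, so a shorter list ends up after a longer list sharing its prefix. The names `compare_lists` and `sentinel_key` are hypothetical, and whether the sentinel trick carries over to GROUP_CONCAT depends on the collation ordering the sentinel after digits, which is an untested assumption here.

```python
from functools import cmp_to_key

# Data from the question: list_id -> sorted dates
LISTS = {
    1: ["2019-11-02", "2019-11-03", "2019-11-04", "2019-11-05",
        "2019-11-07", "2019-11-08", "2019-11-09"],
    2: ["2019-11-01", "2019-11-03", "2019-11-04"],
    3: ["2019-11-01", "2019-11-02", "2019-11-03"],
    4: ["2019-11-02", "2019-11-04", "2019-11-13", "2019-11-14"],
    5: ["2019-11-03", "2019-11-04", "2019-11-05", "2019-11-10"],
    6: ["2019-11-01", "2019-11-02", "2019-11-03", "2019-11-05"],
}


def compare_lists(a, b):
    """Element-wise comparison; on a shared prefix the longer list is smaller."""
    for x, y in zip(a, b):
        if x < y:
            return -1
        if x > y:
            return 1
    return len(b) - len(a)  # negative when `a` is longer, so `a` sorts first


def sentinel_key(dates):
    """Concatenate fixed-width dates and append 'z', which sorts after any digit."""
    return "".join(dates) + "z"


by_comparator = sorted(LISTS, key=cmp_to_key(lambda i, j: compare_lists(LISTS[i], LISTS[j])))
by_sentinel = sorted(LISTS, key=lambda i: sentinel_key(LISTS[i]))

print(by_comparator)  # [6, 3, 2, 1, 4, 5]
print(by_sentinel)    # [6, 3, 2, 1, 4, 5]
```

Both orderings match the list_rank column of the expected result in the question.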
Since it's for MySql 8 the window functions can be used for this (yay).
Here's a query that first calculates some metrics, to use in the calculation of the ranking:
SELECT
list_id,
`date`,
DENSE_RANK() OVER (ORDER BY ListMinDate ASC, ListCount DESC, ListMaxDate, list_id) AS list_rank
FROM
(
SELECT
list_id,
`date`,
COUNT(*) OVER (PARTITION BY list_id) AS ListCount,
MIN(`date`) OVER (PARTITION BY list_id) AS ListMinDate,
MAX(`date`) OVER (PARTITION BY list_id) AS ListMaxDate
FROM list_date
) q
ORDER BY list_rank, `date`
A test on db<>fiddle here
My goal is to return the start and end date of each run of rows that have the same values in a column. Here is my table. The rows marked with (*) show where I want the "EndDate" for each run of identical A and B values.
ID | DayDate | A | B
-----------------------------------------------
1 | 2010/07/1 | 200 | 300
2 | 2010/07/2 | 200 | 300 *
3 | 2010/07/3 | 150 | 250
4 | 2010/07/4 | 150 | 250 *
8 | 2010/07/5 | 150 | 350 *
9 | 2010/07/6 | 200 | 300
10 | 2010/07/7 | 200 | 300 *
11 | 2010/07/8 | 100 | 200
12 | 2010/07/9 | 100 | 200 *
and I want to get the following result table from the above table
| DayDate |EndDate | A | B
-----------------------------------------------
| 2010/07/1 |2010/07/2 | 200 | 300
| 2010/07/3 |2010/07/4 | 150 | 250
| 2010/07/5 |2010/07/5 | 150 | 350
| 2010/07/6 |2010/07/7 | 200 | 300
| 2010/07/8 |2010/07/9 | 100 | 200
UPDATE:
Thanks Mike. Your approach seems to work if the following row is considered a mistake:
8 | 2010/07/5 | 150 | 350 *
However, it is not a mistake. The data is like a log of market price changes by date. The real problem in my case is to select each run of rows in which both A and B stay the same, together with its beginning and ending date, then the run that follows it, and so on, so that no data in the table is left out.
I can explain with a real-world scenario. A hotel with rooms A and B has the room rates for each day entered into the table as shown in my question. Now the hotel needs a report that shows the price calendar in a shorter form using start and end dates, instead of listing every date entered. For example, from 2010/07/01 to 2010/07/02 the price of A is 200 and B is 300. The price changes from the 3rd to the 4th, and on the 5th there is a different price for that day only, where the price of room B changes to 350. This is treated as a single-day period, which is why its start and end dates are the same.
I hope this explains the scenario. Also note that the hotel may be closed for a specific period; consider this an additional problem on top of my first question. What if no rate is entered on specific dates? For example, on Sundays the hotel does not sell these two rooms, so no price is entered, meaning the row will not exist in the table.
Creating related tables gives you much greater freedom to query and extract relevant information.
You could start with these tutorials:
http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
http://net.tutsplus.com/tutorials/databases/sql-for-beginners/
There are also a couple of questions here on stackoverflow that might be useful:
Normalization in plain English
What exactly does database normalization do?
Anyway, on to a possible solution. The following examples use your hotel rooms analogy.
First, create a table to hold information about the hotel rooms. This table just contains the room ID and its name, but you could store other information in here, such as the room type (single, double, twin), its view (ocean front, ocean view, city view, pool view), and so on:
CREATE TABLE `room` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `name_UNIQUE` (`name` ASC) )
ENGINE = InnoDB;
Now create a table to hold the changing room rates. This table links to the room table through the room_id column. The foreign key constraint prevents records being inserted into the rate table which refer to rooms that do not exist:
CREATE TABLE `rate` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`room_id` INT UNSIGNED NOT NULL,
`date` DATE NOT NULL,
`rate` DECIMAL(6,2) UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
INDEX `fk_room_rate` (`room_id` ASC),
CONSTRAINT `fk_room_rate`
FOREIGN KEY (`room_id` )
REFERENCES `room` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
Create two rooms, and add some daily rate information about each room:
INSERT INTO `room` (`id`, `name`) VALUES (1, 'A'), (2, 'B');
INSERT INTO `rate` (`id`, `room_id`, `date`, `rate`) VALUES
( 1, 1, '2010-07-01', 200),
( 2, 1, '2010-07-02', 200),
( 3, 1, '2010-07-03', 150),
( 4, 1, '2010-07-04', 150),
( 5, 1, '2010-07-05', 150),
( 6, 1, '2010-07-06', 200),
( 7, 1, '2010-07-07', 200),
( 8, 1, '2010-07-08', 100),
( 9, 1, '2010-07-09', 100),
(10, 2, '2010-07-01', 300),
(11, 2, '2010-07-02', 300),
(12, 2, '2010-07-03', 250),
(13, 2, '2010-07-04', 250),
(14, 2, '2010-07-05', 350),
(15, 2, '2010-07-06', 300),
(16, 2, '2010-07-07', 300),
(17, 2, '2010-07-08', 200),
(18, 2, '2010-07-09', 200);
With that information stored, a simple SELECT query with a JOIN will show you all the daily room rates:
SELECT
room.name,
rate.date,
rate.rate
FROM room
JOIN rate
ON rate.room_id = room.id;
+------+------------+--------+
| name | date       | rate   |
+------+------------+--------+
| A | 2010-07-01 | 200.00 |
| A | 2010-07-02 | 200.00 |
| A | 2010-07-03 | 150.00 |
| A | 2010-07-04 | 150.00 |
| A | 2010-07-05 | 150.00 |
| A | 2010-07-06 | 200.00 |
| A | 2010-07-07 | 200.00 |
| A | 2010-07-08 | 100.00 |
| A | 2010-07-09 | 100.00 |
| B | 2010-07-01 | 300.00 |
| B | 2010-07-02 | 300.00 |
| B | 2010-07-03 | 250.00 |
| B | 2010-07-04 | 250.00 |
| B | 2010-07-05 | 350.00 |
| B | 2010-07-06 | 300.00 |
| B | 2010-07-07 | 300.00 |
| B | 2010-07-08 | 200.00 |
| B | 2010-07-09 | 200.00 |
+------+------------+--------+
To find the start and end dates for each room rate, you need a more complex query:
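SELECT
    MIN(id) AS id,               -- first row of each run
    room_id,
    MIN(date) AS start_date,
    MAX(date) AS end_date,
    COUNT(*) AS days,
    rate
FROM (
    SELECT
        id,
        room_id,
        date,
        rate,
        (
            -- rows of the same room with a different rate up to this date;
            -- the count stays constant within a run of equal rates
            SELECT COUNT(*)
            FROM rate AS b
            WHERE b.rate <> a.rate
              AND b.date <= a.date
              AND b.room_id = a.room_id
        ) AS grp                 -- GROUPING is a reserved word in MySQL 8.0
    FROM rate AS a
    ORDER BY a.room_id, a.date
) c
GROUP BY room_id, rate, grp
ORDER BY room_id, MIN(date);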
+----+---------+------------+------------+------+--------+
| id | room_id | start_date | end_date | days | rate |
+----+---------+------------+------------+------+--------+
| 1 | 1 | 2010-07-01 | 2010-07-02 | 2 | 200.00 |
| 3 | 1 | 2010-07-03 | 2010-07-05 | 3 | 150.00 |
| 6 | 1 | 2010-07-06 | 2010-07-07 | 2 | 200.00 |
| 8 | 1 | 2010-07-08 | 2010-07-09 | 2 | 100.00 |
| 10 | 2 | 2010-07-01 | 2010-07-02 | 2 | 300.00 |
| 12 | 2 | 2010-07-03 | 2010-07-04 | 2 | 250.00 |
| 14 | 2 | 2010-07-05 | 2010-07-05 | 1 | 350.00 |
| 15 | 2 | 2010-07-06 | 2010-07-07 | 2 | 300.00 |
| 17 | 2 | 2010-07-08 | 2010-07-09 | 2 | 200.00 |
+----+---------+------------+------------+------+--------+
You can find a good explanation of the technique used in the above query here:
http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data
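For comparison, the run detection is short when done in application code. The Python sketch below works on the combined (A, B) pairs from the question, so it reproduces the table the asker wants rather than the per-room result above. It assumes the rows are already loaded, writes the dates in ISO form so that they sort correctly as strings, and ignores missing days; `price_periods` is a hypothetical name.

```python
from itertools import groupby

# Rows from the question as (DayDate, A, B), with dates rewritten in ISO form
ROWS = [
    ("2010-07-01", 200, 300),
    ("2010-07-02", 200, 300),
    ("2010-07-03", 150, 250),
    ("2010-07-04", 150, 250),
    ("2010-07-05", 150, 350),
    ("2010-07-06", 200, 300),
    ("2010-07-07", 200, 300),
    ("2010-07-08", 100, 200),
    ("2010-07-09", 100, 200),
]


def price_periods(rows):
    """Collapse consecutive days with identical (A, B) into (start, end, A, B)."""
    periods = []
    ordered = sorted(rows)  # the date is the first field, so this sorts by date
    # groupby only merges neighbouring rows, so equal prices in
    # separate runs stay separate periods
    for (a, b), run in groupby(ordered, key=lambda r: (r[1], r[2])):
        days = [day for day, _a, _b in run]
        periods.append((days[0], days[-1], a, b))
    return periods


for period in price_periods(ROWS):
    print(period)
# ('2010-07-01', '2010-07-02', 200, 300)
# ('2010-07-03', '2010-07-04', 150, 250)
# ('2010-07-05', '2010-07-05', 150, 350)
# ('2010-07-06', '2010-07-07', 200, 300)
# ('2010-07-08', '2010-07-09', 100, 200)
```

A single-day period simply gets the same start and end date, so no NULL handling is needed.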
My general approach is to join the table onto itself where the second row's DayDate is one day after the first row's and the A or B values differ.
This finds the end date of each period (where the value is going to be different on the following day).
The only problem is that this won't find an end date for the final period. To get around this, I select the max date from the table and union it into my list of end dates.
Once you have the list of end dates, you can join it to the original table on the end date being greater than or equal to the original date.
From this final list, select the minimum DayDate grouped by the other fields:
select
min(DayDate) as DayDate,EndDate,A,B from
(SELECT DayDate, A, B, min(ends.EndDate) as EndDate
FROM yourtable
LEFT JOIN
(SELECT max(DayDate) as EndDate FROM yourtable UNION
SELECT t1.DayDate as EndDate
FROM yourtable t1
JOIN yourtable t2
ON date_add(t1.DayDate, INTERVAL 1 DAY) = t2.DayDate
AND (t1.A<>t2.A OR t1.B<>t2.B)) ends
ON ends.EndDate>=DayDate
GROUP BY DayDate, A, B) x
GROUP BY EndDate,A,B
I think I have found a solution which does produce the table desired.
SELECT
a.DayDate AS StartDate,
( SELECT b.DayDate
FROM Dates AS b
WHERE b.DayDate > a.DayDate AND (b.B = a.B OR b.B IS NULL)
ORDER BY b.DayDate ASC LIMIT 1
) AS StopDate,
a.A as A,
a.B AS B
FROM Dates AS a
WHERE Coalesce(
(SELECT c.B
FROM Dates AS c
WHERE c.DayDate <= a.DayDate
ORDER BY c.DayDate DESC LIMIT 1,1
), -99999
) <> a.B
AND a.B IS NOT NULL
ORDER BY a.DayDate ASC;
is able to generate the following table result
StartDate StopDate A B
2010-07-01 2010-07-02 200 300
2010-07-03 2010-07-04 150 250
2010-07-05 NULL 150 350
2010-07-06 2010-07-07 200 300
2010-07-08 2010-07-09 100 200
But I need a way to replace the NULL with the same date of the start date.