Recursive update of start-end date to avoid overlaps - mysql

I am working with a large MySQL database of Italian employment contracts (about 20 million rows). Each row of my core table represents one contract signed between a worker and a specific employer. To reconstruct each worker's history, I ordered the rows by worker identification code and contract start date when I indexed the table during import. Each row has its own progressive ID, and I added two fields to each row: one referring to the previous ID and one to the following ID. These two fields are non-null only when the previous or subsequent ID belongs to the same worker.
I have made a small example of what my data looks like here (alternatively, the script below creates a small reproducible example).
Image: How the history of a worker may look
Image: How it should change at the end
My current task is to calculate the effective number of days worked by each individual in my table. However, the data contain a lot of overlap, because an individual may hold several contracts at once. For example, a contract that started on 01/01/2010 and ended on 01/01/2012 may be followed by several shorter contracts that start later but end before 01/01/2012. If I simply summed the days of every contract, I would double count. For this reason, I want to rearrange contracts by changing their end dates so that I obtain consecutive, non-overlapping contracts. The only overlap allowed is a single day.
The two images show how the working history of an individual may look and how I want to rearrange it.
Since I cannot modify the starting date of each contract/row, I wanted to work on the ending date of each contract by modifying it according to the previous contract.
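To make the double-counting problem concrete, here is a minimal Python sketch (not part of my MySQL procedure) that counts the distinct calendar days covered by one worker's contracts. It assumes start and end dates are both inclusive and that every contract has an end date. The long contract comes from the example above; the two shorter contracts are hypothetical.

```python
from datetime import date

def days_worked(contracts):
    """Count distinct calendar days covered by (start, end) contracts.

    Assumes both dates are inclusive and no end date is missing.
    """
    total = 0
    covered_until = None  # last day that has already been counted
    for start, end in sorted(contracts):
        if covered_until is not None and start <= covered_until:
            # Overlap: only the days after covered_until are new.
            if end > covered_until:
                total += (end - covered_until).days
                covered_until = end
        else:
            # No overlap with anything counted so far.
            total += (end - start).days + 1
            covered_until = end
    return total

history = [
    (date(2010, 1, 1), date(2012, 1, 1)),    # long contract from the text
    (date(2010, 6, 1), date(2010, 6, 30)),   # hypothetical shorter contract
    (date(2011, 3, 1), date(2011, 3, 15)),   # hypothetical shorter contract
]

# Summing each contract separately counts the shorter contracts twice.
naive = sum((end - start).days + 1 for start, end in history)
print(naive, days_worked(history))
```

The naive sum is larger than the merged count because the two shorter contracts lie entirely inside the long one.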
I worked through these steps:
1. If the end date of the previous contract is later than the end date of the current contract (for each row), I set the current end date equal to the end date of the previous one.
2. Since I do not know how many contracts actually overlap (each contract is linked to the previous and the following one, but an overlapping contract may lie further in the past), I iterate this process as many times as the maximum number of contracts any individual has in my table. This extends the overlapping end date forward until the overlap ceases. For example, the end date of contract n.3 in the example extends to contracts n.4, n.5, and n.6. At the end of the iteration they all have the same end date, day 12.
3. Once this is finished, I set the end date of each contract equal to the start date of the following one wherever they overlap.
Here below you can find the code I used for this procedure.
-- My example table (data_example.csv on GitHub)
drop table if exists mytable;
create table mytable
(
id INT,
WORKER_ID INT not null,
EMPLOYER_ID INT not null,
dt_start date not null, -- Contract start date
dt_end date, -- Contract end date
id_prev INT, -- ID of previous contract
dt_start_prev date, -- Start date of previous contract
dt_end_prev date, -- End date of previous contract
id_next INT, -- ID of next contract
dt_start_next date, -- Start date of next contract
dt_end_next date, -- End date of next contract
primary key(id)
);
insert into mytable
(id, WORKER_ID, EMPLOYER_ID, dt_start, dt_end,
id_prev, dt_start_prev, dt_end_prev,
id_next, dt_start_next, dt_end_next)
values
(1, 5157, 3384722, '2012-01-01', '2012-01-03', NULL, NULL, NULL, 2, '2012-01-02', '2012-01-04'),
(2, 5157, 3384722, '2012-01-02', '2012-01-04', 1, '2012-01-01', '2012-01-03', 3, '2012-01-04', '2012-01-12'),
(3, 5157, 96120, '2012-01-04', '2012-01-12', 2, '2012-01-02', '2012-01-04', 4, '2012-01-07', '2012-01-08'),
(4, 5157, 3384722, '2012-01-07', '2012-01-08', 3, '2012-01-04', '2012-01-12', 5, '2012-01-08', '2012-01-10'),
(5, 5157, 3384722, '2012-01-08', '2012-01-10', 4, '2012-01-07', '2012-01-08', 6, '2012-01-10', '2012-01-11'),
(6, 5157, 3954093, '2012-01-10', '2012-01-11', 5, '2012-01-08', '2012-01-10', 7, '2012-01-12', '2012-01-15'),
(7, 5157, 3384722, '2012-01-12', '2012-01-15', 6, '2012-01-10', '2012-01-11', 8, '2012-01-14', '2012-01-16'),
(8, 5157, 3954093, '2012-01-14', '2012-01-16', 7, '2012-01-12', '2012-01-15', 9, '2012-01-14', '2012-01-14'),
(9, 5157, 3384722, '2012-01-14', '2012-01-14', 8, '2012-01-14', '2012-01-16', 10, '2012-01-14', '2012-01-20'),
(10, 5157, 96120, '2012-01-14', '2012-01-20', 9, '2012-01-14', '2012-01-14', NULL, NULL, NULL),
(11, 5990, 1940957, '2012-01-01', '2012-01-30', NULL, NULL, NULL, 12, '2012-02-01', '2012-02-15'),
(12, 5990, 4822105, '2012-02-01', '2012-02-15', 11, '2012-01-01', '2012-01-30', 13, '2012-02-10', '2012-02-10'),
(13, 5990, 1940957, '2012-02-10', '2012-02-10', 12, '2012-02-01', '2012-02-15', 14, '2012-02-16', '2012-02-20'),
(14, 5990, 1940957, '2012-02-16', '2012-02-20', 13, '2012-02-10', '2012-02-10', 15, '2012-02-17', '2012-02-28'),
(15, 5990, 4822105, '2012-02-17', '2012-02-28', 14, '2012-02-16', '2012-02-20', NULL, NULL, NULL);
-- The following table counts the number of contracts for each individual
-- I will use it to determine the maximum number of contracts per worker
drop table if exists max_act;
create table max_act
as select WORKER_ID, count(*) n
from mytable
group by WORKER_ID;
set SQL_SAFE_UPDATES = 0;
-- Here I create the procedure
drop procedure if exists doiterate;
delimiter //
create procedure doiterate()
begin
declare total INT unsigned DEFAULT 0;
-- The number of iterations is equal to the maximum value in the table 'max_act'
while total <= (select MAX(n) from max_act) do
-- If the end date of the previous contract is greater than the end of the current contract
-- the procedure sets the end date equal to the end date of the previous contract
update mytable a
set a.dt_end =
case
when a.dt_end is NOT null and a.dt_end_prev > a.dt_end then a.dt_end_prev
else a.dt_end end
;
-- Here I update in each row the end date of the previous contract
update mytable a
left outer join mytable p on a.id_prev = p.id
set a.dt_end_prev =
case
when a.dt_end_prev is NOT null and a.dt_end_prev != p.dt_end then p.dt_end
else a.dt_end_prev end
;
set total = total + 1;
end while;
end//
delimiter ;
CALL doiterate();
-- Here I set the end date of each contract equal to the beginning of the next one if there is overlapping
update mytable a
set a.dt_end =
case
when a.dt_end is NOT null and a.dt_start_next < a.dt_end then a.dt_start_next
else a.dt_end end
;
set SQL_SAFE_UPDATES = 1;
However, I think this procedure is far from optimal: I estimate it would take days to finish. I would really appreciate any hints on how to handle this. Thank you in advance.
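For reference, the logic of the procedure can be sketched in Python for a single worker. This is only an illustration of what the steps compute, not a replacement for the SQL: it assumes the contracts are already sorted by start date and that no end date is NULL. Under those assumptions the repeated update amounts to a running maximum of the end dates, which one pass can produce. The rows are those of worker 5990 in the example table.

```python
from datetime import date

def clip_overlaps(contracts):
    """contracts: (start, end) tuples of ONE worker, sorted by start date."""
    # Steps 1 and 2: carry the latest end date seen so far forward.
    extended = []
    running_end = None
    for start, end in contracts:
        if running_end is not None and running_end > end:
            end = running_end
        running_end = end
        extended.append((start, end))

    # Step 3: cut each end date back to the next start date if they overlap.
    result = []
    for i, (start, end) in enumerate(extended):
        if i + 1 < len(extended) and extended[i + 1][0] < end:
            end = extended[i + 1][0]
        result.append((start, end))
    return result

worker_5990 = [
    (date(2012, 1, 1), date(2012, 1, 30)),
    (date(2012, 2, 1), date(2012, 2, 15)),
    (date(2012, 2, 10), date(2012, 2, 10)),
    (date(2012, 2, 16), date(2012, 2, 20)),
    (date(2012, 2, 17), date(2012, 2, 28)),
]

for start, end in clip_overlaps(worker_5990):
    print(start, end)
```

Each contract is visited twice, so the work grows linearly with the number of rows rather than with the number of rows times the longest history.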

As already stated in one comment, I tried both the LAG() and LEAD() functions to chain all contracts per individual in chronological order. However, that procedure turned out to be even slower (maybe my fault).
Therefore, I decided to run the procedure only on those workers who actually had at least two overlapping contracts. It may not be the best solution (certainly not in terms of coding), but at least I was able to complete the procedure (it took more or less a day and a half).
-- Here I am identifying contracts with an overlapping previous contract
alter table mytable add column flag_overlap INT default 0;
update mytable set flag_overlap = 1 where dt_end is NOT null and dt_end_prev > dt_end;
-- Creating a table with only those workers with at least two overlapping contracts
drop table if exists mytable_id;
create table mytable_id as select WORKER_ID
from mytable where flag_overlap = 1
group by WORKER_ID;
-- This is my table of interests with all the contracts for those workers identified in the previous step
drop table if exists mytable_mod;
create table mytable_mod
as select a.* -- a.* avoids a duplicate WORKER_ID column coming from the join
from mytable a
inner join mytable_id b on a.WORKER_ID = b.WORKER_ID
order by a.WORKER_ID , a.dt_start;
alter table mytable_mod add unique index idx_ord_id(id);
-- The rest of the code is the same as the one posted in this question,
-- simply I referred to the table 'mytable_mod' and no longer to 'mytable'.
-- [...]
-- At the end I updated the 'revised' end date of my original table 'mytable'
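UPDATE mytable a
-- inner join, so rows absent from 'mytable_mod' keep their dates instead of being set to NULL
inner join mytable_mod b on a.id = b.id -- joined on the row ID (id in the example table)
set
a.dt_end = b.dt_end ,
a.dt_end_next = b.dt_end_next ,
a.dt_end_prev = b.dt_end_prev
;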

Related

How do you aggregate a column on the day before, day of, and day after an event indicated by a flag column?

I have a table which has a date column, some self-reports of happiness in another column, and a flag column which indicates a gym day.
I want to get the average happiness scores on the day before, the day of, and the day after a gym session.
If you imagine this table, the averages should return day_before = 1, day_of = 2, and day_after = 3.
So the set up is like in this fiddle, although in my actual database the gym flag column is joined in from a separate table.
CREATE TABLE test
(`date` datetime, `gym` int, `happiness` int)
;
INSERT INTO test
(`date`, `gym`, `happiness`)
VALUES
('2019-01-06 00:00:00', NULL, 1),
('2019-02-06 00:00:00', 1, 2),
('2019-03-06 00:00:00', NULL, 3),
('2019-04-06 01:00:00', NULL, 1),
('2019-05-06 01:00:00', 1, 2),
('2019-06-06 01:00:00', NULL, 3),
('2019-07-06 01:00:00', NULL, 1),
('2019-08-06 01:00:00', 1, 2),
('2019-09-06 01:00:00', NULL, 3)
;
I tried using a subquery to return when the "gym" column in date - 1 = 1, and also use the results in a case which would have "day of", "day before", and "day after" strings. Then I could simply group by that column. I couldn't get this to work and I'm not even sure if that's something you can do.
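Use two self-joins: one to the row dated exactly one day before each gym day and one to the row dated exactly one day after.
-- Aliases chosen to steer clear of MySQL keywords such as BEFORE
SELECT AVG(d_before.happiness) AS day_before, AVG(d_of.happiness) AS day_of, AVG(d_after.happiness) AS day_after
FROM test AS d_of
JOIN test AS d_before ON d_before.date = DATE_SUB(d_of.date, INTERVAL 1 DAY)
JOIN test AS d_after ON d_after.date = DATE_ADD(d_of.date, INTERVAL 1 DAY)
WHERE d_of.gym = 1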
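The same matching logic can be sketched in Python, which may help when checking the query against sample data. This is an illustration only: the dates below are hypothetical consecutive days, and like the inner joins it ignores any gym day that lacks a row for the day before or the day after.

```python
from datetime import date, timedelta
from statistics import mean

def gym_averages(rows):
    """rows: (day, gym_flag, happiness) tuples, one per day.

    Returns (day_before, day_of, day_after) averages, or None if no gym
    day has both neighbouring days present.
    """
    by_day = {day: happiness for day, _, happiness in rows}
    one_day = timedelta(days=1)
    triples = [
        (by_day[day - one_day], happiness, by_day[day + one_day])
        for day, gym, happiness in rows
        if gym == 1 and day - one_day in by_day and day + one_day in by_day
    ]
    if not triples:
        return None
    before, of, after = zip(*triples)
    return mean(before), mean(of), mean(after)

# Hypothetical data: three gym days, each surrounded by a day on either side.
rows = [
    (date(2019, 6, 1), None, 1), (date(2019, 6, 2), 1, 2), (date(2019, 6, 3), None, 3),
    (date(2019, 6, 4), None, 1), (date(2019, 6, 5), 1, 2), (date(2019, 6, 6), None, 3),
    (date(2019, 6, 7), None, 1), (date(2019, 6, 8), 1, 2), (date(2019, 6, 9), None, 3),
]
print(gym_averages(rows))
```

Note that the match is on exact equality, so in SQL a difference in the time-of-day part of a datetime would also prevent two rows from joining.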

MySQL date comparison issue on timestamp column

I have a database table with an auto-update column which is required to be in the TIMESTAMP format, this saves dates in the form YYYY-MM-DD HH:mm:ss for each time a row is updated.
On reading statements that date comparisons are (possibly very) processor heavy, the preferred method seems to be to use MySQL BETWEEN statement to check and return updates that have occurred in the last 24 hours.
A reference: https://stackoverflow.com/a/14104364/3536236
My SQL
I have removed some details that take up space that are outside the scope of this question, such as some columns
-- Generation Time: Oct 14, 2015 at 04:54 PM
-- Server version: 5.5.45-cll
-- PHP Version: 5.4.31
--
-- Table structure for table `check_log`
--
CREATE TABLE IF NOT EXISTS `check_log` (
`table_id` int(8) NOT NULL AUTO_INCREMENT,
`last_action` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`ip_addr` varchar(60) NOT NULL,
`submit_fail` varchar(1) NOT NULL,
PRIMARY KEY (`table_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 AUTO_INCREMENT=14 ;
--
-- Dumping data for table `check_log`
--
INSERT INTO `check_log` (`table_id`, `last_action`, `ip_addr`, `submit_fail`) VALUES
(2, '2015-10-14 14:08:30', '92.99.252.185', 'N'),
(3, '2015-10-14 14:09:23', '92.99.252.185', 'N'),
(4, '2015-10-14 14:09:25', '92.99.252.185', 'N'),
(5, '2015-10-14 14:09:38', '92.99.252.185', 'N'),
(6, '2015-10-14 14:14:22', '92.99.252.185', 'N'),
(7, '2015-10-14 14:17:13', '92.99.252.185', 'N'),
(8, '2015-10-14 14:20:51', '92.99.252.185', 'N'),
(9, '2015-10-14 14:20:52', '92.99.252.185', 'N'),
(10, '2015-10-14 14:50:34', '92.99.252.185', 'N'),
(11, '2015-10-14 15:29:07', '92.99.252.185', 'N'),
(12, '2015-10-14 15:31:04', '92.99.252.185', 'N'),
(13, '2015-10-14 15:32:00', '92.99.252.185', 'N');
My Query
Now, my query should return all the rows that fit the criteria and have been updated in the last 24 hours. So:
SELECT * FROM `check_log` WHERE `ip_addr` = '92.99.252.185' AND
(`last_action` BETWEEN date_sub(CURDATE() , INTERVAL -1 DAY ) AND CURDATE())
AND `submit_fail` = 'N'
I wrote the query in this shape because I wanted to explore how BETWEEN ... AND ... handled other ANDS in the same query, and hence for my own clarity I encased the BETWEEN statement in brackets ().
I have tried a range of slightly different syntaxes for this query including:
SELECT * FROM `check_log` WHERE `ip_addr` = '92.99.252.185' AND
(DATE_FORMAT(`last_action`, '%Y-%m-%d') BETWEEN date_sub(CURDATE() , INTERVAL -1 DAY ) AND CURDATE())
and pure date check:
SELECT * FROM `check_log` WHERE
`last_action` BETWEEN date_sub(CURDATE() , INTERVAL -1 DAY ) AND CURDATE()
Each time MySQL reports no error but returns zero rows.
I have viewed and compared at least a dozen similar answers on SO about the comparison of dates and am at a bit of a loss how I'm not getting the rows returned that I'm expecting with my query.
(I am ideally wanting to use the BETWEEN form as this table will, when in use be reaching several thousands of rows. )
What can I do to make the comparison work?
How does the BETWEEN clause handle other ANDs, and is it suitable to encase it in brackets (for clarity)?
Is there a more efficient / suitable method to compare timestamp column dates?
It turns out that DATE_SUB() performs subtraction, so I did not need the -1 in the INTERVAL <value> DAY part of the SQL. INTERVAL does accept negative values, but subtracting a negative interval amounts to adding one, so my query was asking for a +1 day interval.
For some reason I had originally thought DATE_SUB stood for substitution, because the fact that negative values are allowed in the value part meant (to me) that there was no need for a separate date addition function.
I wasted half a day reading up and trying to work out how this logic worked.
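The arithmetic can be checked with a small Python sketch. The helper functions are stand-ins written for this illustration, not MySQL APIs, and the "current time" is an assumption taken from the dump header (Oct 14, 2015 at 04:54 PM). It also suggests, under that assumption, that correcting the sign alone would still miss rows written today, because CURDATE() as an upper bound means midnight at the start of today.

```python
from datetime import datetime, timedelta

def date_sub(moment, days):
    """Stand-in for DATE_SUB(moment, INTERVAL days DAY): subtracts the interval."""
    return moment - timedelta(days=days)

def between(value, low, high):
    """Stand-in for SQL BETWEEN: inclusive at both ends, never true if low > high."""
    return low <= value <= high

curdate = datetime(2015, 10, 14)           # assumed "today" at midnight, like CURDATE()
now = datetime(2015, 10, 14, 16, 54, 0)    # assumed current time, like NOW()
row = datetime(2015, 10, 14, 14, 8, 30)    # a last_action value from the dump

# INTERVAL -1 DAY: the lower bound lands on tomorrow, above the upper bound.
wrong_low = date_sub(curdate, -1)
print(wrong_low, between(row, wrong_low, curdate))

# INTERVAL 1 DAY with CURDATE(): yesterday midnight up to today midnight.
print(between(row, date_sub(curdate, 1), curdate))

# A rolling 24-hour window measured back from the current time.
print(between(row, date_sub(now, 1), now))
```

Only the last comparison matches the row, which points towards bounding the range with the current timestamp rather than the current date.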

Mysql Select Only Staff with Specified Number of Consecutive Free Time Slots

Each staff already has a table of avail time slots in AvailSlots like this:
Staff_ID Avail_Slots_Datetime
1 2015-1-1 09:00:00
1 2015-1-1 10:00:00
1 2015-1-1 11:00:00
2 2015-1-1 09:00:00
2 2015-1-1 10:00:00
2 2015-1-1 11:00:00
3 2015-1-1 09:00:00
3 2015-1-1 12:00:00
3 2015-1-1 15:00:00
I need to find out which staff has, for example, 2 (or 3, 4, etc) CONSECUTIVE avail time slots at each time slot. As a novice, the INNER JOIN codes below is all I know to write if the query is for 2 consecutive time slots.
SELECT a.start_time, a.person
FROM a_free a, a_free b
WHERE (b.start_time = addtime( a.start_time, '01:00:00' )) and (a.person = b.person)
But, obviously, doing it that way, I would have to add more INNER JOIN codes - for each case - depending on whether the query is for 3, or 4, or 5 , etc consecutive available time slots at a given date/hour. Therefore, I want to learn a more efficient and flexible way to do the same. Specifically, the query code I need (in natural language) would be this:
For each time slot in AvailSlots, list one staff that has X (where X can
be any number I specify per query, from 1 to 24) consecutive datetime
slot starting from that datetime. In case more than one staff can meet
that criteria, the tie break is their "rank" which is kept in a
separate table below:
Ranking Table (lower number = higher rank)
Staff_ID Rank
1 3
2 1
3 2
If the answer is to use things like "mysql variables", "views", etc, please kindly explain how those things work. Again, as a total mysql novice, "select", "join", "where", "group by" are all I know so far. I am eager to learn more but have trouble understanding more advanced mysql concepts so far. Many thanks in advance.
Using a bit more data than you posted, I found a query that might do what you need. It does use the variables as you predicted :) but I hope it's pretty self-explanatory. Let's start with the table:
CREATE TABLE a_free
(`Staff_ID` int, `Avail_Slots_Datetime` datetime)
;
INSERT INTO a_free
(`Staff_ID`, `Avail_Slots_Datetime`)
VALUES
(1, '2015-01-01 09:00:00'),
(1, '2015-01-01 10:00:00'),
(1, '2015-01-01 11:00:00'),
(1, '2015-01-01 13:00:00'),
(2, '2015-01-01 09:00:00'),
(2, '2015-01-01 10:00:00'),
(2, '2015-01-01 11:00:00'),
(3, '2015-01-01 09:00:00'),
(3, '2015-01-01 12:00:00'),
(3, '2015-01-01 15:00:00'),
(3, '2015-01-01 16:00:00'),
(3, '2015-01-01 17:00:00'),
(3, '2015-01-01 18:00:00')
;
Then there's a query to find the consecutive slots. It lists start times of each pair, and marks each group of consecutive slots with a unique number. The case expression is where the magic happens, see the comments:
select
Staff_ID,
Avail_Slots_Datetime as slot_start,
case
when @slot_group is null then @slot_group:=0 -- initialize the variable
when @prev_end <> Avail_Slots_Datetime then @slot_group:=@slot_group+1 -- increment if previous slot end does not match current one's start
else @slot_group -- otherwise just keep the value
end as slot_group,
@prev_end:= Avail_Slots_Datetime + interval 1 hour as slot_end -- store the current slot end to compare with next row
from a_free
order by Staff_ID, Avail_Slots_Datetime asc;
Having the list with slot groups identified, we can wrap the query above in another one to get the lengths of each slot group. The results of the first query are treated as any other table:
select
Staff_ID,
slot_group,
min(slot_start) as group_start,
max(slot_end) as group_end,
count(*) as group_length
from (
select
Staff_ID,
Avail_Slots_Datetime as slot_start,
case
when @slot_group is null then @slot_group:=0
when @prev_end <> Avail_Slots_Datetime then @slot_group:=@slot_group+1
else @slot_group
end as slot_group,
@prev_end:= Avail_Slots_Datetime + interval 1 hour as slot_end
from a_free
order by Staff_ID, Avail_Slots_Datetime asc
) groups
group by Staff_ID, slot_group;
Note: if you use the same DB connection to execute the query again, the variables are not reset, so the slot_group numbering will continue to grow. This normally should not be a problem, but to be on the safe side, execute something like this before or after:
select @slot_group:=null, @prev_end:=null;
Play with the fiddle if you like: http://sqlfiddle.com/#!2/0446c8/15
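For readers who find the variable trick hard to follow, the grouping idea can be sketched in Python. This is an illustration of the same logic rather than a translation of the query: it walks the slots in staff and time order and extends the current run whenever a slot starts exactly where the previous one ended. The function names and the threshold of 3 are made up for the example; the data are the rows inserted above.

```python
from datetime import datetime, timedelta

def slot_groups(rows, step=timedelta(hours=1)):
    """Collapse (staff_id, slot_start) rows into runs of back-to-back slots.

    Returns (staff_id, group_start, group_end, group_length) tuples.
    """
    groups = []
    for staff_id, start in sorted(rows):
        if groups and groups[-1][0] == staff_id and groups[-1][2] == start:
            # This slot starts exactly where the previous run ended: extend it.
            _, group_start, _, length = groups[-1]
            groups[-1] = (staff_id, group_start, start + step, length + 1)
        else:
            # Different staff member or a gap: open a new run.
            groups.append((staff_id, start, start + step, 1))
    return groups

def slot(hour):
    """Helper for the sample data: a slot on 2015-01-01 at the given hour."""
    return datetime(2015, 1, 1, hour)

rows = [(1, slot(h)) for h in (9, 10, 11, 13)]
rows += [(2, slot(h)) for h in (9, 10, 11)]
rows += [(3, slot(h)) for h in (9, 12, 15, 16, 17, 18)]

for group in slot_groups(rows):
    print(group)

# Staff with at least 3 consecutive free slots (3 is an arbitrary example value).
long_enough = sorted({g[0] for g in slot_groups(rows) if g[3] >= 3})
print(long_enough)
```

Unlike the session variables, nothing here persists between runs, so there is no state to reset.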

Order by day_of_week in MySQL

How can I order the mysql result by varchar column that contains day of week name?
Note that MONDAY should goes first, not SUNDAY.
Either redesign the column as suggested by Williham Totland, or do some string parsing to get a date representation.
If the column only contains the day of week, then you could do this:
ORDER BY FIELD(<fieldname>, 'MONDAY', 'TUESDAY', 'WEDNESDAY', 'THURSDAY', 'FRIDAY', 'SATURDAY', 'SUNDAY');
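As a rough illustration of why this works, here is a Python sketch of the ordering. The field function below is a simplified stand-in written for this example: it returns the 1-based position of the value in the list and 0 when the value is absent, so unknown values sort first. It compares strings exactly, whereas the real comparison in MySQL depends on the column's collation.

```python
DAYS = ['MONDAY', 'TUESDAY', 'WEDNESDAY', 'THURSDAY', 'FRIDAY', 'SATURDAY', 'SUNDAY']

def field(value, *options):
    """Simplified stand-in for FIELD(): 1-based position, or 0 if not found."""
    for position, option in enumerate(options, start=1):
        if value == option:
            return position
    return 0

# Hypothetical column values, including one that is not a day name.
values = ['SUNDAY', 'FRIDAY', 'HOLIDAY', 'MONDAY', 'WEDNESDAY']
ordered = sorted(values, key=lambda day: field(day, *DAYS))
print(ordered)
```

The non-matching value ends up at the front, which is worth keeping in mind if the column can contain anything other than the seven listed names.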
Why not this?
ORDER BY (
CASE DAYOFWEEK(dateField)
WHEN 1 THEN 7 ELSE DAYOFWEEK(dateField)
END
)
I believe this orders Monday to Sunday...
I'm thinking that short of redesigning the column to use an enum instead, there's not a lot to be done for it, apart from sorting the results after you've gotten them out.
Edit: A dirty hack is of course to add another table with id:weekday pairs and using joins or select in selects to fake an enum.
... ORDER BY date_format(order_date, '%w') = 0, date_format(order_date, '%w') ;
This looks messy but still works and seems more generic:
select day,
case day
when 'monday' then 1
when 'tuesday' then 2
when 'wednesday' then 3
when 'thursday' then 4
when 'friday' then 5
when 'saturday' then 6
when 'sunday' then 7
end as day_nr from test order by day_nr;
Using if is even more generic and messier:
select id, day,
if(day = 'monday',1,
if(day = 'tuesday',2,
if(day = 'wednesday',3,
if(day = 'thursday',4,
if(day = 'friday',5,
if(day = 'saturday',6,7)
)
)
)
)
) as day_nr from test order by day_nr;
You can also hide the details of conversion from name to int in stored procedure.
I realise that this is an old thread, but as it comes to the top of Google for certain search terms I will use it to share my approach.
I wanted the same result as the original question, but in addition I wanted the ordering of the results starting from the current day of the week and then progressing through the rest of the days.
I created a separate table, in which the days were listed over a fortnight, so that no matter which day you started from you could run through a sequence of 7 days.
CREATE TABLE IF NOT EXISTS `Weekdays` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=15 ;
INSERT INTO `Weekdays` (`id`, `name`) VALUES
(1, 'Monday'),
(2, 'Tuesday'),
(3, 'Wednesday'),
(4, 'Thursday'),
(5, 'Friday'),
(6, 'Saturday'),
(7, 'Sunday'),
(8, 'Monday'),
(9, 'Tuesday'),
(10, 'Wednesday'),
(11, 'Thursday'),
(12, 'Friday'),
(13, 'Saturday'),
(14, 'Sunday');
I then ran the query with a variable that determined the start point in sequence and used a join to get the order number for the days. For example to start the listing at Wednesday, I do the following:
SELECT @startnum := MIN(id) FROM Weekdays WHERE name='Wednesday';
SELECT * FROM Events INNER JOIN ( SELECT id as weekdaynum, name as dayname FROM Weekdays WHERE id>(@startnum-1) AND id<(@startnum+7) ) AS s2 ON s2.dayname=Events.day ORDER BY weekdaynum;
I hope this helps someone who stumbles onto this post.
Found another way: you can reverse the order by week
ORDER BY date_format(date_name, '%w') DESC;
Another way would be to create another table with those days and an int to order them by, join that table when searching, and order by it. Of course, joining on a varchar is not recommended.
Table DaysOfWeek
id | day
--------------------
1 | Monday
2 | Tuesday
3 | Wednesday
4 | Thursday
5 | Friday
6 | Saturday
7 | Sunday
SELECT * FROM WhateverTable
LEFT JOIN DaysOfWeek on DaysOfWeek.day = WhateverTable.dayColumn
ORDER BY DaysOfWeek.id
(Apologies if that's not correct; I've been stuck with SQL server recently)
Again, this is NOT recommended, but if you cannot alter the data you've already got... This will also work if there are non-standard values in the dayColumn field.
Found another way that works for me:
SELECT LAST_NAME, HIRE_DATE, TO_CHAR(HIRE_DATE, 'fmDAY') as 'Day' FROM EMPLOYEES
ORDER BY TO_CHAR(HIRE_DATE, 'd');
Hope it helps
In my case, since the days can be registered in several languages, to get the correct order I do like this according to Glen Solsberry:
....
....
ORDER BY
FIELD(<fieldname>, 'MONDAY', 'TUESDAY', 'WEDNESDAY', 'THURSDAY', 'FRIDAY', 'SATURDAY', 'SUNDAY'),
FIELD(<fieldname>, 'LUNDI', 'MARDI', 'MERCREDI', 'JEUDI', 'VENDREDI', 'SAMEDI', 'DIMANCHE'),
FIELD(<fieldname>, 'LUNES', 'MARTES', 'MIERCOLES', 'JUEVES', 'VIERNES', 'SABADO', 'DOMINGO'),
FIELD(<fieldname>, 'MONTAG', 'DIENSTAG', 'MITTWOCH', 'DONNERSTAG', 'FREITAG', 'SAMSTAG', 'SONNTAG')
;
Do not forget that, <fieldname> is the name of the date column in question in your case.
I saw that ...WHEN 1 THEN 7... was posted but it should be WHEN 1 THEN 8.
So...
ORDER BY (
CASE DATEPART(DW, yourdatefield)
WHEN 1 THEN 8 ELSE DATEPART(DW, yourdatefield)
END
)
Otherwise Sunday may come before Saturday because both Sunday and Saturday would equal 7. By setting Sunday to 8, it ensures it comes after Saturday.
If you try this, it should work:
SELECT ename, TO_CHAR(hiredate, 'fmDay') as "Day"
FROM my_table
ORDER BY MOD(TO_CHAR(hiredate, 'D') + 5, 7)

How to avoid duplicate registrations in MySQL

I wonder if it is possible to prevent users from inserting duplicate registration records.
For example some team is registered from 5.1.2009 - 31.12.2009. Then someone registers the same team for 5.2.2009 - 31.12.2009.
Usually the end_date is not an issue, but start_date should not fall between an existing record's start and end dates.
CREATE TABLE IF NOT EXISTS `ejl_team_registration` (
`id` int(11) NOT NULL auto_increment,
`team_id` int(11) NOT NULL,
`league_id` smallint(6) NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
PRIMARY KEY (`team_id`,`league_id`,`start_date`),
UNIQUE KEY `id` (`id`)
);
I would check it in the code of the program, not the database.
If you want to do this in database, you can probably use pre-insert trigger that will fail if there are any conflicting records.
This is a classic problem of time overlapping. Say you want to register a certain team for the period of A (start_date) until B (end_date).
This should NOT be allowed in the following cases:
the same team is already registered, so that the registered period is completely inside the A-B period (start_date >= A and end_date <= B)
the same team is already registered at point A (start_date <= A and end_date >= A)
the same team is already registered at point B (start_date <= B and end_date >= B)
In those cases, registering would cause time overlap. In any other it would not, so you're free to register.
In sql, the check would be:
select count(*) from ejl_team_registration
where (team_id=123 and league_id=45)
and ((start_date>=A and end_date<=B)
or (start_date<=A and end_date>=A)
or (start_date<=B and end_date>=B)
);
... with of course real values for the team_id, league_id, A and B.
If the query returns anything other than 0, the team is already registered and registering again would cause time overlap.
To demonstrate this, let's populate the table:
insert into ejl_team_registration (id, team_id, league_id, start_date, end_date)
values (1, 123, 45, '2007-01-01', '2007-12-31')
, (2, 123, 45, '2008-01-01', '2008-12-31')
, (3, 123, 45, '2010-01-01', '2010-12-31');
Let's check if we could register team 123 in league 45 between '2009-02-03' and '2009-12-31':
select count(*) from ejl_team_registration
where (team_id=123 and league_id=45)
and ((start_date>='2009-02-03' and end_date<='2009-12-31')
or (start_date<='2009-02-03' and end_date>='2009-02-03')
or (start_date<='2009-12-31' and end_date>='2009-12-31')
);
The result is 0, so we can register freely.
Registering between e.g. '2009-02-03' and '2011-12-31' would not be possible.
I'll leave checking other values for you as a practice.
PS: You mentioned the end date is usually not an issue. As a matter of fact it is, since inserting an entry with invalid end date would cause overlapping as well.
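The three cases can be tried out in a short Python sketch. It is an illustration under the assumption that every period has start <= end. Under that assumption the three clauses reduce to a single two-sided comparison, which the sketch includes for comparison. The function names are made up for this example, and the registrations are the three rows inserted above for team 123 in league 45.

```python
from datetime import date

def overlaps_three_clause(start, end, a, b):
    """The three cases listed above, for an existing period (start, end)."""
    return ((start >= a and end <= b)      # existing period inside A-B
            or (start <= a and end >= a)   # existing period covers point A
            or (start <= b and end >= b))  # existing period covers point B

def overlaps_compact(start, end, a, b):
    """Equivalent test when start <= end and a <= b."""
    return start <= b and end >= a

registrations = [
    (date(2007, 1, 1), date(2007, 12, 31)),
    (date(2008, 1, 1), date(2008, 12, 31)),
    (date(2010, 1, 1), date(2010, 12, 31)),
]

def conflicts(a, b, check=overlaps_three_clause):
    """Number of existing registrations that overlap the period A-B."""
    return sum(1 for start, end in registrations if check(start, end, a, b))

print(conflicts(date(2009, 2, 3), date(2009, 12, 31)))
print(conflicts(date(2009, 2, 3), date(2011, 12, 31)))
```

Both forms give the same counts on this data: no conflict for the 2009 period and one conflict for the period that runs into 2011.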
Before doing your INSERT, do a SELECT to check.
SELECT COUNT(*) FROM `ejl_team_registration`
WHERE `team_id` = [[myTeamId]] AND `league_id` = [[myLeagueId]]
AND `start_date` <= NOW()
AND `end_date` >= NOW()
If that returns more than 0, then don't insert.