How to avoid duplicate registrations in MySQL

I wonder if it is possible to prevent users from inserting duplicate registration records.
For example, some team is registered from 5.1.2009 to 31.12.2009. Then someone registers the same team for 5.2.2009 to 31.12.2009.
Usually the end_date is not an issue, but the start_date should not fall between an existing record's start and end dates.
CREATE TABLE IF NOT EXISTS `ejl_team_registration` (
`id` int(11) NOT NULL auto_increment,
`team_id` int(11) NOT NULL,
`league_id` smallint(6) NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
PRIMARY KEY (`team_id`,`league_id`,`start_date`),
UNIQUE KEY `id` (`id`)
);

I would check it in the code of the program, not the database.

If you want to do this in the database, you can probably use a BEFORE INSERT trigger that fails if there are any conflicting records.
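As a rough illustration of the trigger idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The trigger name `reject_overlap` and the helper `register` are hypothetical, and the trigger syntax is SQLite's: in MySQL the body would instead raise the error with SIGNAL SQLSTATE. The sketch treats two periods as overlapping when each one starts no later than the other ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ejl_team_registration (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    team_id    INTEGER NOT NULL,
    league_id  INTEGER NOT NULL,
    start_date TEXT NOT NULL,  -- ISO dates, so string order equals date order
    end_date   TEXT NOT NULL
);

-- Reject a new row whose period overlaps an existing one
-- for the same team and league (SQLite syntax, not MySQL).
CREATE TRIGGER reject_overlap
BEFORE INSERT ON ejl_team_registration
BEGIN
    SELECT RAISE(ABORT, 'overlapping registration')
    WHERE EXISTS (
        SELECT 1 FROM ejl_team_registration
        WHERE team_id = NEW.team_id
          AND league_id = NEW.league_id
          AND start_date <= NEW.end_date
          AND end_date >= NEW.start_date
    );
END;
""")


def register(team_id, league_id, start_date, end_date):
    """Try to insert a registration; return False if the database rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ejl_team_registration "
                "(team_id, league_id, start_date, end_date) VALUES (?, ?, ?, ?)",
                (team_id, league_id, start_date, end_date),
            )
        return True
    except sqlite3.DatabaseError:
        # Raised when the trigger aborts the insert.
        return False
```

With this in place, the application can simply attempt the insert and report a conflict when `register` returns False, rather than running a separate check first.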

This is a classic problem of time overlapping. Say you want to register a certain team for the period of A (start_date) until B (end_date).
This should NOT be allowed in the following cases:
the same team is already registered, so that the registered period is completely inside the A-B period (start_date >= A and end_date <= B)
the same team is already registered at point A (start_date <= A and end_date >= A)
the same team is already registered at point B (start_date <= B and end_date >= B)
In those cases, registering would cause a time overlap. In any other case it would not, so you're free to register.
In SQL, the check would be:
select count(*) from ejl_team_registration
where (team_id=123 and league_id=45)
and ((start_date>=A and end_date<=B)
or (start_date<=A and end_date>=A)
or (start_date<=B and end_date>=B)
);
... with real values, of course, for team_id, league_id, A and B.
If the query returns anything other than 0, the team is already registered and registering again would cause a time overlap.
To demonstrate this, let's populate the table:
insert into ejl_team_registration (id, team_id, league_id, start_date, end_date)
values (1, 123, 45, '2007-01-01', '2007-12-31')
, (2, 123, 45, '2008-01-01', '2008-12-31')
, (3, 123, 45, '2010-01-01', '2010-12-31');
Let's check if we could register team 123 in league 45 between '2009-02-03' and '2009-12-31':
select count(*) from ejl_team_registration
where (team_id=123 and league_id=45)
and ((start_date>='2009-02-03' and end_date<='2009-12-31')
or (start_date<='2009-02-03' and end_date>='2009-02-03')
or (start_date<='2009-12-31' and end_date>='2009-12-31')
);
The result is 0, so we can register freely.
Registering between e.g. '2009-02-03' and '2011-12-31' would not be possible.
I'll leave checking other values for you as a practice.
PS: You mentioned the end date is usually not an issue. As a matter of fact it is, since inserting an entry with an invalid end date would cause an overlap as well.
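The three cases can be collapsed into a single test: an existing period overlaps the requested period A-B exactly when it starts on or before B and ends on or after A. The sketch below, in plain Python, spells out both forms so they can be compared. The function names are made up for illustration, and the equivalence assumes start_date <= end_date and A <= B.

```python
from datetime import date


def overlaps_three_cases(start, end, a, b):
    """The three conditions from the answer, for an existing period [start, end]."""
    return ((start >= a and end <= b)
            or (start <= a and end >= a)
            or (start <= b and end >= b))


def overlaps_compact(start, end, a, b):
    """Equivalent single test, assuming start <= end and a <= b."""
    return start <= b and end >= a


# The sample registrations for team 123 in league 45.
existing = [
    (date(2007, 1, 1), date(2007, 12, 31)),
    (date(2008, 1, 1), date(2008, 12, 31)),
    (date(2010, 1, 1), date(2010, 12, 31)),
]


def can_register(a, b):
    """True when the period [a, b] overlaps none of the existing periods."""
    return not any(overlaps_compact(s, e, a, b) for s, e in existing)
```

In SQL the compact form would read `start_date <= B AND end_date >= A`, which is shorter and harder to get wrong than the three-way OR.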

Before doing your INSERT, do a SELECT to check.
SELECT COUNT(*) FROM `ejl_team_registration`
WHERE `team_id` = [[myTeamId]] AND `league_id` = [[myLeagueId]]
AND `start_date` <= NOW()
AND `end_date` >= NOW()
If that returns more than 0, then don't insert.

Related

MySQL add balance from previous rows

I’ve tried a few things I’ve seen on here but it doesn’t work in my case, the balance on each row seems to duplicate.
Anyway, I have a table that holds mortgage transactions. That table has a column that stores an interest-added value or a payment value.
So I might have:
Balance: 100,000
Interest added 100 - balance 100,100
Payment made -500 - balance 99,600
Interest added 100 - balance 99,700
Payment made -500 - balance 99,200
What I’m looking for is a query that pulls all of these in date order, newest first, and sums the balance in a column depending on whether the row has interest or a payment (the one that doesn’t apply will be null), so that the last row shows the current liability.
I can’t remember the query I tried, but it ended up duplicating rows and the balance was weird.
Sample structure and data:
CREATE TABLE account(
id int not null primary key auto_increment,
account_name varchar(50),
starting_balance float(10,6)
);
CREATE TABLE account_transaction(
id int not null primary key auto_increment,
account_id int NOT NULL,
date datetime,
interest_amount int DEFAULT null,
payment_amount float(10,6) DEFAULT NULL
);
INSERT INTO account (account_name,starting_balance) VALUES('Test Account','100000');
INSERT INTO account_transaction (account_id,date,interest_amount,payment_amount) VALUES(1,'2020-10-01 00:00:00',300,null);
INSERT INTO account_transaction (account_id,date,interest_amount,payment_amount) VALUES(1,'2020-10-01 00:00:00',null,-500);
INSERT INTO account_transaction (account_id,date,interest_amount,payment_amount) VALUES(1,'2020-11-01 00:00:00',300,null);
INSERT INTO account_transaction (account_id,date,interest_amount,payment_amount) VALUES(1,'2020-11-05 00:00:00',-500,null);
So interest will be added on to the rolling balance, and the starting balance is stored against the account - if we have to have a transaction added for this then ok. Then when a payment is added it can be either negative or positive to decrease the balance moving to each row.
For the example above, I'd expect to see something along the lines of:
I hope this makes it clearer
WITH
starting_dates AS ( SELECT account_id, MIN(`date`) startdate
FROM account_transaction
GROUP BY account_id ),
combine AS ( SELECT 0 id,
starting_dates.account_id,
starting_dates.startdate `date`,
0 interest_amount,
account.starting_balance payment_amount
FROM account
JOIN starting_dates ON account.id = starting_dates.account_id
UNION ALL
SELECT id,
account_id,
`date`,
interest_amount,
payment_amount
FROM account_transaction )
SELECT DATE(`date`) `Date`,
CASE WHEN interest_amount = 0 THEN 'Balance Brought Forward'
WHEN payment_amount IS NULL THEN 'Interest Added'
WHEN interest_amount IS NULL THEN 'Payment Added'
ELSE 'Unknown transaction type'
END `Desc`,
CASE WHEN interest_amount = 0 THEN ''
ELSE COALESCE(interest_amount, 0)
END Interest,
COALESCE(payment_amount, 0) Payment,
SUM(COALESCE(payment_amount, 0) + COALESCE(interest_amount, 0))
OVER (PARTITION BY account_id ORDER BY id) Balance
FROM combine
ORDER BY id;
PS. The source data provided (row with id=4) was altered to match the desired output. The source structure was also altered: FLOAT(10,6), which is not compatible with the provided values, was replaced with DECIMAL.
PPS. The presence of more than one account is allowed.
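To make the running-total arithmetic concrete, here is a small Python sketch of what the window SUM computes for a single account: the starting balance is the first term, and each transaction adds its interest or payment amount. The data mirrors the sample rows, with row 4 treated as a payment as noted in the PS. The function name `running_balances` is hypothetical.

```python
from decimal import Decimal
from itertools import accumulate

starting_balance = Decimal("100000")

# (date, interest_amount, payment_amount); exactly one amount is set per row.
transactions = [
    ("2020-10-01", Decimal("300"), None),
    ("2020-10-01", None, Decimal("-500")),
    ("2020-11-01", Decimal("300"), None),
    ("2020-11-05", None, Decimal("-500")),
]


def running_balances(start, rows):
    """Balance after each transaction, in the order the rows are given."""
    zero = Decimal("0")
    # Same idea as COALESCE(payment_amount, 0) + COALESCE(interest_amount, 0).
    deltas = [(interest or zero) + (payment or zero) for _, interest, payment in rows]
    # accumulate() yields the starting value first; drop it to keep one value per row.
    return list(accumulate(deltas, initial=start))[1:]


balances = running_balances(starting_balance, transactions)
# balances: 100300, 99800, 100100, 99600
```

Decimal is used here for the same reason the answer switched the columns to DECIMAL: binary floats are a poor fit for money.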

Select one piece of data from every day at a specific hour MySQL

My database has data inserted every minute, stored in the format 2020-04-05 16:20:04 in a column called timestamp.
I need a MySQL query to select data from every day at a specific hour (the seconds do not matter). For example, I want to get the data from 16:00 of every day for the past 30 days.
Currently, it just grabs the data from the past 30 days and then the PHP application sorts it. However, this causes very slow loading times, hence wanting to select only the wanted data from the database.
Example of data
Please try the following SQL:
select
d.timestamp, hour(d.timestamp)
from
demo1 d
where
DATEDIFF(NOW(), d.timestamp) < 30 and hour(d.timestamp) = 16;
The CREATE statement and sample data are as follows:
CREATE TABLE `demo1` (
`id` int(11) not null auto_increment primary key,
`serverid` int(11) not null,
`timestamp` datetime not null,
KEY `idx_timestamp` (`timestamp`)
) engine = InnoDB;
insert into `demo1` (serverid, timestamp)
VALUES (1, "2020-07-05 16:20:04"),
(2, "2020-07-06 17:20:04"),
(3, "2020-07-07 16:40:04"),
(4, "2020-07-08 08:20:04"),
(5, "2020-07-05 15:20:04"),
(5, "2020-07-05 16:59:04"),
(5, "2020-06-04 16:59:04");
Zhiyong's response will work, but won't perform well. You need to figure out a way to get the query to use indexes.
You can add a simple index on timestamp and run the query this way:
SELECT
d.timestamp, d.*
FROM demo1 d
WHERE 1
AND d.timestamp > CURDATE() - INTERVAL 30 DAY
AND hour(d.timestamp) = 16;
In MySQL 5.7 and up, you can create a generated column (also called a calculated column) to store the hour of the timestamp in a separate column. You can then index this column, perhaps as a composite index of hour + timestamp, so that the query above performs much faster.
ALTER TABLE demo1
ADD COLUMN hour1 tinyint GENERATED ALWAYS AS (HOUR(timestamp)) STORED,
ADD KEY (hour1, timestamp);
The result query would be:
SELECT
d.timestamp, d.*
FROM demo1 d
WHERE 1
AND d.timestamp > CURDATE() - INTERVAL 30 DAY
AND hour1 = 16;
More info on that here:
https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html
https://dev.mysql.com/doc/refman/5.7/en/generated-column-index-optimizations.html

Why Index (on primary key column) is not used?

I have a date table, which has a column date (PK). The CREATE script is here:
CREATE TABLE date_table (
date DATE
,year INT(4)
,month INT(2)
,day INT(2)
,month_pad VARCHAR(2)
,day_pad VARCHAR(2)
,month_name VARCHAR(10)
,year_month_index INT(6)
,year_month_hypname VARCHAR(7)
,year_month_name VARCHAR(15)
,week_day_index INT(1)
,day_name VARCHAR(9)
,week INT(2)
,week_interval VARCHAR(13)
,weekend_fl INT(1)
,quarter_num INT(1)
,quarter_num_pad VARCHAR(2)
,quarter_name VARCHAR(2)
,year_quarter_index INT(6)
,year_quarter_name VARCHAR(7)
,PRIMARY KEY (date)
);
Now I would like select rows from this table with dynamic values, using such as LAST_DAY() or DATE_SUB(DATE_FORMAT(SYSDATE(),'%Y-01-01'), INTERVAL X YEAR), etc.
When one of my queries failed and didn't execute in 30 secs, I knew something was fishy, and it looks like the reason is that the index on the primary key column is not used. Here are my results (sorry for using an image instead of copying the queries, but I thought it's concise enough for this purpose, and the queries are short/simple enough):
First of all, it's strange that BETWEEN works differently from using >= and <=. Secondly, it looks like the index is only used for constant values. If you look closely, you can see that on the right side (where >= and <= are used), it shows ~9K rows, which is half of the rows in the table (the table has about ~18k rows, dates from 2000-01-01 to 2050-12-31).
SYSDATE() returns the time at which it executes. This differs from the behavior for NOW(), which returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.)
-- https://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html#function_sysdate
That is, the Optimizer does not see this as a "constant". Otherwise, the Optimizer eagerly evaluates any "constant expressions", then tries to take advantage of knowing the value.
See also the sysdate_is_now option.
Bottom line: Don't use SYSDATE() for normal datetime usage; use NOW() or CURDATE().
Looks like if I use CURRENT_DATE() (or NOW()) instead of SYSDATE(), it's working. Both of these queries:
SELECT *
FROM date_table t
WHERE 1 = 1
AND t.date >= LAST_DAY(CURRENT_DATE()) AND t.date <= LAST_DAY(CURRENT_DATE());
SELECT *
FROM date_table t
WHERE 1 = 1
AND t.date >= LAST_DAY(NOW()) AND t.date <= LAST_DAY(NOW());
Give the same result, which is this:
I will accept my answer as a solution, but I'm still looking for an explanation. I thought it might have something to do with SYSDATE() not being a DATE, but NOW() is also not a DATE...
EDIT: Forgot to add, BETWEEN is also working as I see.

Counting rows in event table, grouped by time range, a lot

Imagine I have a table like this:
CREATE TABLE `Alarms` (
`AlarmId` INT UNSIGNED NOT NULL AUTO_INCREMENT
COMMENT "32-bit ID",
`Ended` BOOLEAN NOT NULL DEFAULT FALSE
COMMENT "Whether the alarm has ended",
`StartedAt` TIMESTAMP NOT NULL DEFAULT 0
COMMENT "Time at which the alarm was raised",
`EndedAt` TIMESTAMP NULL
COMMENT "Time at which the alarm ended (NULL iff Ended=false)",
PRIMARY KEY (`AlarmId`),
KEY `Key4` (`StartedAt`),
KEY `Key5` (`Ended`, `EndedAt`)
) ENGINE=InnoDB;
Now, for a GUI, I want to produce:
a list of days during which at least one alarm was "active"
for each day, how many alarms started
for each day, how many alarms ended
The intent is to present users with a dropdown box from which they can choose a date to see any alarms active (started before or during, and ended during or after) on that day. So something like this:
+-----------------------------------+
| Choose day ▼ |
+-----------------------------------+
| 2017-12-03 (3 started) |
| 2017-12-04 (1 started, 2 ended) |
| 2017-12-05 (2 ended) |
| 2017-12-16 (1 started, 1 ended) |
| 2017-12-17 (1 started) |
| 2017-12-18 |
| 2017-12-19 |
| 2017-12-20 |
| 2017-12-21 (1 ended) |
+-----------------------------------+
I will probably force an age limit on alarms so that they are archived/removed after, say, a year. So that's the scale we're working with.
I expect anywhere from zero to tens of thousands of alarms per day.
My first thought was a reasonably simple:
(
SELECT
COUNT(`AlarmId`) AS `NumStarted`,
NULL AS `NumEnded`,
DATE(`StartedAt`) AS `Date`
FROM `Alarms`
GROUP BY `Date`
)
UNION
(
SELECT
NULL AS `NumStarted`,
COUNT(`AlarmId`) AS `NumEnded`,
DATE(`EndedAt`) AS `Date`
FROM `Alarms`
WHERE `Ended` = TRUE
GROUP BY `Date`
);
This uses both of my indexes, with join type ref and ref type const, which I'm happy with. I can iterate over the resultset, dumping the non-NULL values found into a C++ std::map<boost::gregorian::date, std::pair<size_t, size_t>> (then "filling the gaps" for days on which no alarms started or ended, but were active from previous days).
The spanner I'm throwing in the works is that the list should take into account location-based timezones, but only my application knows about timezones. For logistical reasons, the MySQL session is deliberately SET time_zone = '+00:00' so that timestamps are all kicked out in UTC. (Various other tools are then used to perform any necessary location-specific corrections for historical timezones, taking into account DST and whatnot.) For the rest of the application this is great, but for this particular query it breaks the date GROUPing.
Maybe I could pre-calculate (in my application) a list of time ranges, and generate a huge query of 2n UNIONed queries (where n = number of "days" to check) and get the NumStarted and NumEnded counts that way:
-- Example assuming desired timezone is +05:00
--
-- 3rd December
(
SELECT
COUNT(`AlarmId`) AS `NumStarted`,
NULL AS `NumEnded`,
'2017-12-03' AS `Date`
FROM `Alarms`
-- Alarm started during 3rd December UTC+5
WHERE `StartedAt` >= '2017-12-02 19:00:00'
AND `StartedAt` < '2017-12-03 19:00:00'
GROUP BY `Date`
)
UNION
(
SELECT
NULL AS `NumStarted`,
COUNT(`AlarmId`) AS `NumEnded`,
'2017-12-03' AS `Date`
FROM `Alarms`
-- Alarm ended during 3rd December UTC+5
WHERE `EndedAt` >= '2017-12-02 19:00:00'
AND `EndedAt` < '2017-12-03 19:00:00'
GROUP BY `Date`
)
UNION
-- 4th December
(
SELECT
COUNT(`AlarmId`) AS `NumStarted`,
NULL AS `NumEnded`,
'2017-12-04' AS `Date`
FROM `Alarms`
-- Alarm started during 4th December UTC+5
WHERE `StartedAt` >= '2017-12-03 19:00:00'
AND `StartedAt` < '2017-12-04 19:00:00'
GROUP BY `Date`
)
UNION
(
SELECT
NULL AS `NumStarted`,
COUNT(`AlarmId`) AS `NumEnded`,
'2017-12-04' AS `Date`
FROM `Alarms`
-- Alarm ended during 4th December UTC+5
WHERE `EndedAt` >= '2017-12-03 19:00:00'
AND `EndedAt` < '2017-12-04 19:00:00'
GROUP BY `Date`
)
UNION
-- 5th December
-- [..]
But, of course, even if I'm restricting the database to a year's worth of historical alarms, that's up to like 730 UNIONd SELECTs. My spidey senses tell me that this is a very bad idea.
How else can I generate this sort of time-grouped statistics? Or is this really silly and I should look at resolving the problems preventing me from using tzinfo with MySQL?
Must work on MySQL 5.1.73 (CentOS 6) and MariaDB 5.5.50 (CentOS 7).
The UNION approach is actually not far off a viable solution; you can achieve the same thing, without a catastrophically large query, by recruiting a temporary table:
CREATE TEMPORARY TABLE `_ranges` (
`Start` TIMESTAMP NOT NULL DEFAULT 0,
`End` TIMESTAMP NOT NULL DEFAULT 0,
PRIMARY KEY (`Start`, `End`)
);
INSERT INTO `_ranges` VALUES
-- 3rd December UTC+5
('2017-12-02 19:00:00', '2017-12-03 19:00:00'),
-- 4th December UTC+5
('2017-12-03 19:00:00', '2017-12-04 19:00:00'),
-- 5th December UTC+5
('2017-12-04 19:00:00', '2017-12-05 19:00:00'),
-- etc.
;
-- Now the queries needed are simple and also quick:
SELECT
`_ranges`.`Start`,
COUNT(`AlarmId`) AS `NumStarted`
FROM `_ranges` LEFT JOIN `Alarms`
ON `Alarms`.`StartedAt` >= `_ranges`.`Start`
AND `Alarms`.`StartedAt` < `_ranges`.`End`
GROUP BY `_ranges`.`Start`;
SELECT
`_ranges`.`Start`,
COUNT(`AlarmId`) AS `NumEnded`
FROM `_ranges` LEFT JOIN `Alarms`
ON `Alarms`.`EndedAt` >= `_ranges`.`Start`
AND `Alarms`.`EndedAt` < `_ranges`.`End`
GROUP BY `_ranges`.`Start`;
DROP TABLE `_ranges`;
(This approach was inspired by a DBA.SE post.)
Notice that there are two SELECTs — the original UNION is no longer possible, because temporary tables cannot be accessed twice in the same query. However, since we've already introduced additional statements anyway (the CREATE, INSERT and DROP), this seems to be a moot problem in the circumstances.
In both cases, each row represents one of our requested periods, and the first column equals the "start" part of the period (so that we can identify it in the resultset).
Be sure to use exception handling in your code as needed to ensure that _ranges is DROPped before your routine returns; although the temporary table is local to the MySQL session, if you're continuing to use that session afterwards then you probably want a clean state, particularly if this function is going to be used again.
If this is still too heavy, for example because you have so many time periods that populating the temporary table would itself become too large, or because multiple statements don't fit in your calling code, or because your user doesn't have permission to create and drop temporary tables, you'll have to fall back on a simple GROUP BY over DATE(...) of the timestamp converted to the desired timezone, and ensure that your users run mysql_tzinfo_to_sql whenever the system's tzdata is updated.
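The rows for `_ranges` have to be computed by the application, since only it knows the timezone. The original application is C++; the following Python sketch only illustrates the calculation, and the function name `utc_day_ranges` is hypothetical. It takes each local calendar day, finds local midnight at both ends, and converts the two instants to UTC, giving half-open ranges suitable for `>= Start AND < End`.

```python
from datetime import date, datetime, time, timedelta, timezone


def utc_day_ranges(first_day, n_days, tz):
    """Return (start, end) UTC strings, one pair per local calendar day."""
    fmt = "%Y-%m-%d %H:%M:%S"
    ranges = []
    for i in range(n_days):
        day = first_day + timedelta(days=i)
        # Local midnight at the start of this day and of the next day.
        local_start = datetime.combine(day, time.min, tzinfo=tz)
        local_end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
        ranges.append((
            local_start.astimezone(timezone.utc).strftime(fmt),
            local_end.astimezone(timezone.utc).strftime(fmt),
        ))
    return ranges


# A fixed offset five hours ahead of UTC, used here only as an example.
example_tz = timezone(timedelta(hours=5))
rows = utc_day_ranges(date(2017, 12, 3), 3, example_tz)
```

With that example offset, the first pair is ('2017-12-02 19:00:00', '2017-12-03 19:00:00'). Because both ends are computed from local midnight rather than by adding 24 hours, passing a DST-aware tzinfo (for instance from the standard zoneinfo module in Python 3.9+) should also handle 23-hour and 25-hour days.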

How to sort a list of people by birth and death dates when data are incomplete

I have a list of people who may or may not have a birth date and/or a death date. I want to be able to sort them meaningfully - a subjective term - by birth date.
BUT - if they don't have a birth date but they do have a death date, I want to have them collated into the list proximal to other people who died then.
I recognize that this is not a discrete operation - there is ambiguity about where someone should go when their birth date is missing. But I'm looking for something that is a good approximation, most of the time.
Here's an example list of what I'd like:
Alice 1800 1830
Bob 1805 1845
Carol 1847
Don 1820 1846
Esther 1825 1860
In this example, I'd be happy with Carol appearing either before or after Don - that's the ambiguity I'm prepared to accept. The important outcome is that Carol is sorted in the list relative to her death date as a death date, not sorting the death dates in with the birth dates.
What doesn't work is if I coalesce or otherwise map birth and death dates together. For example, ORDER BY birth_date, death_date would put Carol after Esther, which is way out of place by my thinking.
I think you're going to have to calculate an average age people end up living (for those having both birth and death dates). And either subtract them from death date or add them to birth date for people who don't have the other one.
Doing this in one query may not be efficient, and is perhaps ugly because MySQL (before 8.0) doesn't have window functions. You may be better off precalculating the average lifespan beforehand. But let's try to do it in one query anyway:
SELECT name, birth_date, death_date
FROM people
ORDER BY COALESCE(
birth_date,
DATE_SUB(death_date, INTERVAL (
SELECT AVG(DATEDIFF(death_date, birth_date))
FROM people
WHERE birth_date IS NOT NULL AND death_date IS NOT NULL
) DAY)
)
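The same idea can be sketched outside SQL. The Python below is only an illustration with a hypothetical function name: it computes the average lifespan from the people who have both dates, estimates a birth year for anyone who has only a death year, and sorts on that. Plain years are used instead of full dates to match the example list in the question.

```python
def sort_by_estimated_birth(people):
    """Sort (name, birth, death) tuples; None marks an unknown year."""
    # Average lifespan over the rows that have both years.
    spans = [d - b for _, b, d in people if b is not None and d is not None]
    avg_span = sum(spans) / len(spans) if spans else 0

    def key(person):
        _, birth, death = person
        if birth is not None:
            return (0, birth)
        if death is not None:
            # Estimated birth year: death year minus the average lifespan.
            return (0, death - avg_span)
        return (1, 0)  # no dates at all: place at the end

    return sorted(people, key=key)


people = [
    ("Esther", 1825, 1860),
    ("Carol", None, 1847),
    ("Alice", 1800, 1830),
    ("Don", 1820, 1846),
    ("Bob", 1805, 1845),
]
ordered = [name for name, _, _ in sort_by_estimated_birth(people)]
```

For the question's sample, the average lifespan is 32.75 years, so Carol's estimated birth year is 1814.25 and she lands between Bob and Don.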
N.B.: I've tried with a larger dataset, and it is not working completely as I'd expect.
Try with this query (it needs an id primary key column):
SELECT * FROM people p
ORDER BY (
CASE WHEN birth IS NOT NULL THEN (
SELECT ord FROM (
SELECT id, @rnum := @rnum + 1 AS ord
FROM people, (SELECT @rnum := 0) r1
ORDER BY (CASE WHEN birth IS NOT NULL THEN 0 ELSE 1 END), birth, death
) o1
WHERE id = p.id
) ELSE (
SELECT ord FROM (
SELECT id, @rnum := @rnum + 1 AS ord
FROM people, (SELECT @rnum := 0) r2
ORDER BY (CASE WHEN death IS NOT NULL THEN 0 ELSE 1 END), death, birth
) o2
WHERE id = p.id
)
END)
;
What I've done is, basically, to sort the dataset two times, once by birth date and then by death date. Then I've used these two sorted lists to assign the final order to the original dataset, picking the place from the birth-sorted list at first, and using the place from the death-sorted list when a row has no birth date.
Here are a few problems with that query:
I didn't run it against lots of datasets, so I can't really guarantee it will work with any dataset;
I didn't check its performance, so it could be quite slow on large datasets.
This is the table I've used to write it, tested with MySQL 5.6.21 (I can't understand why, but SQL Fiddle is rejecting my scripts with a Create script error, so I can't provide you with a live example).
Table creation:
CREATE TABLE `people` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL,
`birth` INT(11) NULL DEFAULT NULL,
`death` INT(11) NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);
Data (I actually slightly changed yours):
INSERT INTO `people` (`name`, `birth`, `death`) VALUES ('Alice', 1800, NULL);
INSERT INTO `people` (`name`, `birth`, `death`) VALUES ('Bob', 1805, 1845);
INSERT INTO `people` (`name`, `birth`, `death`) VALUES ('Carol', NULL, 1847);
INSERT INTO `people` (`name`, `birth`, `death`) VALUES ('Don', 1820, 1846);
INSERT INTO `people` (`name`, `birth`, `death`) VALUES ('Esther', 1815, 1860);
You can use a subquery to pick a suitable birth date for sorting purposes, and then a union to combine the result with the records that do have a birth date.
For example:
select d1.name, null as birthdate, d1.deathdate, max(d2.birthdate) sort from
d as d1, d as d2
where d1.birthdate is null and d2.deathdate <=d1.deathdate
group by d1.name, d1.deathdate
union all
select name, birthdate, deathdate, birthdate from d
where birthdate is not null
order by 4
http://sqlfiddle.com/#!9/2d91c/1
Not sure if this will work, but it's worth a try (I can't test this on MySQL, so I'm guessing):
order by case when birth_date is null then death_date else birth_date end