MySQL: How to implement consistency constraints?

Example:
Here is the employee table:
CREATE TABLE `employees` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL,
`code` varchar(4) NOT NULL,
`deleted_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
The code is a simple 4-character login code. Soft delete is implemented using the deleted_at field: current employees are those where deleted_at IS NULL.
We need to keep code unique among the current employees.
Using a plain UNIQUE constraint on the code field would also prevent current employees from using codes that have been used by a soft-deleted employee.
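For example (a hypothetical sequence; the names and code values are illustrative), with UNIQUE KEY (`code`):
INSERT INTO employees (name, code) VALUES ('Alice', 'AB12'); -- succeeds
UPDATE employees SET deleted_at = NOW() WHERE code = 'AB12'; -- Alice is now soft-deleted
INSERT INTO employees (name, code) VALUES ('Bob', 'AB12'); -- rejected: duplicate entry 'AB12'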
How to enforce this constraint?
This is an example of the general problem of how to enforce consistency constraints in MySQL.
Edit:
The schema could be changed to make use of unique constraints, as Bill Karwin suggests in his answer.
What about applying complex consistency constraints that may span multiple tables?
One way (if possible) is to change the schema so the constraints can be applied using foreign key or unique constraints.
Is there another way to apply complex consistency constraints?

One relatively simple solution to your problem would be to change the deleted_at column to default to something other than NULL (e.g. '1900-01-01', or even the "zero" date '0000-00-00' if you have those enabled). You can then create a UNIQUE index on (code, deleted_at). This prevents any employee from taking a code that a current employee holds (both rows would match on (code, default)), but does not prevent reuse of a code that a previous employee had, since the default value will not match that row's deleted_at timestamp.
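A minimal sketch of that approach (the sentinel date '1900-01-01' and the index name are illustrative; existing NULLs must be backfilled before the column becomes NOT NULL):
UPDATE employees SET deleted_at = '1900-01-01' WHERE deleted_at IS NULL;
ALTER TABLE employees MODIFY deleted_at datetime NOT NULL DEFAULT '1900-01-01';
CREATE UNIQUE INDEX uniq_code_deleted_at ON employees (code, deleted_at);
Queries for current employees then compare against the sentinel (deleted_at = '1900-01-01') instead of testing deleted_at IS NULL.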

One solution is to create a nullable column is_active that is restricted to either NULL or a single non-NULL value. The columns code and is_active together must be unique.
CREATE TABLE `employees` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL,
`code` varchar(4) NOT NULL,
`is_active` enum('yes'),
`deleted_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY (`code`, `is_active`)
);
If is_active is NULL, then it allows any number of duplicates in the code column. If is_active is not NULL, then it allows only one value 'yes' and therefore each value in the code column must be unique.
deleted_at no longer indicates whether the employee is currently active, only when they were deactivated.
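A hypothetical soft delete under this scheme (the id is illustrative):
UPDATE employees SET is_active = NULL, deleted_at = NOW() WHERE id = 42;
Reactivating the employee (setting is_active back to 'yes') fails with a duplicate-key error if another current employee has taken the same code in the meantime.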
Re your comment:
Constraints that span multiple tables are called ASSERTIONS in the SQL standard, but there is literally no RDBMS product that implements that feature from the standard.
Some implement constraints with triggers, but it's not always obvious how to design triggers to do what you want efficiently.
Honestly, most people resort to application logic for these sorts of constraints. This comes with some risk of race conditions. As soon as you do a SELECT statement to verify the data satisfies the constraints, some other concurrent session may commit data that spoils the constraint before your session can commit its changes.
The only solution is to use pessimistic locking to ensure no other session can jump ahead of you.
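A sketch of that pattern using SELECT ... FOR UPDATE, reusing the employees example above (the code value is illustrative):
START TRANSACTION;
-- under InnoDB's default REPEATABLE READ, this locks the matching index records
-- and the gap around them, so no concurrent session can insert a conflicting row
SELECT id FROM employees WHERE code = 'AB12' AND is_active = 'yes' FOR UPDATE;
-- the application verifies the result set is empty, then:
INSERT INTO employees (name, code, is_active) VALUES ('Bob', 'AB12', 'yes');
COMMIT;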

Related

How to implement conditional unique constraint

I have a table that needs a unique constraint on 3 columns, but if the "date" column for the insert transaction is newer than the current record's date, then I want to update that record instead (so the unique constraint still holds for the table).
Postgres has the concept of deferrable constraints, MySQL does not.
I do want to implement it with the SQL object tools available, though.
Here is my table DDL with column names obfuscated:
CREATE TABLE `apixio_results_test_sefath` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`number` varchar(20) DEFAULT NULL,
`insert_date` datetime DEFAULT NULL,
`item_id` int(5) DEFAULT NULL,
`rule` tinyint(4) DEFAULT NULL,
`another_column` varchar(20) DEFAULT NULL,
`another_column1` varchar(20) DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `insert_date_index` (`insert_date`),
KEY `number` (`number`)
) ENGINE=InnoDB AUTO_INCREMENT=627393 DEFAULT CHARSET=latin1
and here is the unique constraint statement
ALTER TABLE dbname.table ADD CONSTRAINT my_unique_constraint UNIQUE (number, item_id, rule);
but I cannot add a condition to this constraint (unless there is a way I'm not aware of?).
The logic I need to run before inserts are blocked by the constraint is to check if the three values: number, item_id, and rule are unique in the table, and if they aren't, then I want to compare the existing record's insert_date with the insert_date from the transaction, and only keep the record with the newest insert_date.
This could be achieved with a trigger I suppose, although I've heard triggers are only to be used if really needed. And on every insert, this trigger would be quite computationally taxing on the DB. Any advice? Any other sql tricks I can use? Or anything to help point me to how to make this trigger?
I tried the unique constraint statement
ALTER TABLE dbname.table ADD CONSTRAINT my_unique_constraint UNIQUE (number, item_id, rule);
But it will never update with the newer insert_date.
You can do this with an insert statement like:
INSERT INTO apixio_results_test_sefath (number, item_id, rule, insert_date, another_column, another_column1)
VALUES (?,?,?,?,?,?)
ON DUPLICATE KEY UPDATE
another_column = IF(insert_date > VALUES(insert_date), another_column, VALUES(another_column)),
another_column1 = IF(insert_date > VALUES(insert_date), another_column1, VALUES(another_column1)),
insert_date = GREATEST(insert_date, VALUES(insert_date));
For each column besides the unique ones and insert_date, the IF() tests whether the existing row's insert_date is greater than the value supplied with the insert, and keeps the existing value or takes the new one accordingly; insert_date itself is updated last, and only if the new value is greater.
MySQL 8 has an alternate row-alias syntax that it prefers over the VALUES() function, but VALUES() still works.
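For reference, a sketch of the same statement in the newer syntax (available from MySQL 8.0.19; new is an arbitrary row alias):
INSERT INTO apixio_results_test_sefath (number, item_id, rule, insert_date, another_column, another_column1)
VALUES (?,?,?,?,?,?) AS new
ON DUPLICATE KEY UPDATE
another_column = IF(insert_date > new.insert_date, another_column, new.another_column),
another_column1 = IF(insert_date > new.insert_date, another_column1, new.another_column1),
insert_date = GREATEST(insert_date, new.insert_date);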
If you want this to happen automatically for all inserts, you would need to use a trigger.

How do I partition a MySQL table that contains several unique keys?

I have an extremely large MySQL table that I would like to partition. A simplified create of this table is as given below -
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`columnA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`columnB` varchar(50) NOT NULL ,
`columnC` int(11) DEFAULT NULL,
`columnD` varchar(255) DEFAULT NULL,
`columnE` int(11) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_B` (`columnB`),
UNIQUE KEY `UNIQ_B_C` (`columnB`,`columnC`),
UNIQUE KEY `UNIQ_C_D` (`columnC`,`columnD`),
UNIQUE KEY `UNIQ_E_F_G` (`columnE`,`columnF`,`columnG`)
)
I want to partition my table either by columnA or id, but the problem is that the MySQL Manual states -
In other words, every unique key on the table must use every column in the table's partitioning expression.
Which means that I cannot partition the table on either of those columns without changing my schema. For example, I have considered adding id to all my unique keys like so -
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`columnA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`columnB` varchar(50) NOT NULL ,
`columnC` int(11) DEFAULT NULL,
`columnD` varchar(255) DEFAULT NULL,
`columnE` int(11) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_B` (`columnB`,`id`),
UNIQUE KEY `UNIQ_B_C` (`columnB`,`columnC`,`id`),
UNIQUE KEY `UNIQ_C_D` (`columnC`,`columnD`,`id`),
UNIQUE KEY `UNIQ_E_F_G` (`columnE`,`columnF`,`columnG`,`id`)
)
Which I would not mind doing, except that it allows the creation of rows that should not exist. For example, under my original schema the following insertion would not have worked twice -
INSERT into myTable (columnC, columnD) VALUES (1.0,2.0)
But it works with the second schema, as columnC and columnD by themselves no longer form a unique key. I have considered getting around this by using triggers to prevent the creation of such rows (a sketch follows), but then the trigger cost would reduce (or outweigh) the partitioning performance gain.
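A hypothetical trigger of that kind (the trigger name and message are illustrative; a matching BEFORE UPDATE trigger would also be needed, and concurrent transactions can still race past the EXISTS check):
DELIMITER //
CREATE TRIGGER mytable_uniq_c_d BEFORE INSERT ON myTable
FOR EACH ROW
BEGIN
  IF EXISTS (SELECT 1 FROM myTable
             WHERE columnC = NEW.columnC AND columnD = NEW.columnD) THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Duplicate (columnC, columnD)';
  END IF;
END//
DELIMITER ;
Every insert now pays for an extra lookup on (columnC, columnD), which is the cost referred to above.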
Edited:
Some additional information about this table:
The table has more than 1.2 billion records.
We are using MySQL 5.6.34 with the InnoDB engine, running on AWS RDS.
A few other indexes also exist on this table.
Because of the huge data volume and multiple indexes, inserting and retrieving data is expensive.
There are no unique indexes on timestamp or float data types; that was just a sample schema for illustration. Our actual table has a similar schema to the one above.
Other than partitioning, what options do we have to improve the performance of the table without losing any data and while maintaining the integrity constraints?
How do I partition a MySQL table that contains several unique keys?
Sorry to say, you don't.
Also, you shouldn't. Remember that UPDATE and INSERT operations on a table with unique keys must query the table to ensure the keys stay unique. If it were possible to partition a table so that its unique keys weren't built into the partitioning expression, then every insert or update would have to query every partition, which would likely make the partitioning worse than useless.

MySQL index order

I'm just getting into indexes on MySQL using InnoDB.
Firstly, and hopefully I am right: because I am using InnoDB and creating foreign keys, will those automatically be used as indexes when querying the table? Is that correct?
Also, I'm reading that the order of the index will affect the speed of a query, and even whether it is used.
SO... how exactly do I specify the order of the index, if that will indeed impact queries?
If you take my table below as an example: it would be very beneficial for a query on this table to first use the FK index on org_id, since that is going to greatly reduce the number of rows read, and org_id is what most data is separated by in the application.
CREATE TABLE IF NOT EXISTS `completed_checks` (
`complete_check_id` int(15) NOT NULL AUTO_INCREMENT,
`check_type` varchar(40) NOT NULL,
`check_desc` varchar(200) DEFAULT NULL,
`assigned_user` int(12) DEFAULT NULL,
`assigned_area` int(12) DEFAULT NULL,
`org_id` varchar(8) NOT NULL,
`check_notes` varchar(300) DEFAULT NULL,
`due` date NOT NULL,
`completed_by` int(12) DEFAULT NULL,
`completed_on` datetime DEFAULT NULL,
`status` int(1) DEFAULT NULL,
`passed` int(1) DEFAULT '0',
PRIMARY KEY (`complete_check_id`),
KEY `fk_org_id_CCheck` (`org_id`),
KEY `fk_user_id_CCheck` (`assigned_user`),
KEY `fk_AreaID_CCheck` (`assigned_area`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So would MySQL use the FK index on org_id first when querying this table with org_id in the where clause?
And on a separate note, how would I specify the order in which the indexes are used in MySQL? If this is something that I need to be concerned about?
Thanks
Yes, this is correct, see MySQL documentation on creating foreign keys:
index_name represents a foreign key ID. The index_name value is ignored if there is already an explicitly defined index on the child table that can support the foreign key. Otherwise, MySQL implicitly creates a foreign key index
The order of the indexes in a table does not affect which index a query will use. You cannot even say that in general all queries should use a particular index first, since different queries may need different indexes. Moreover, with rare exceptions (such as index merge), MySQL uses at most one index per table in a query.
In general MySQL decides which index to use (if any). If you believe that MySQL erred in its decision, you can use an index hint to influence it:
Index hints give the optimizer information about how to choose indexes during query processing.
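For example, a hypothetical hint on the table above, steering the optimizer toward the org_id index (the WHERE values are illustrative):
SELECT * FROM completed_checks FORCE INDEX (fk_org_id_CCheck)
WHERE org_id = 'ORG00001' AND status = 1;
USE INDEX merely names candidate indexes; FORCE INDEX additionally makes a full table scan look very expensive, so the named index is used whenever possible.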
In the newer versions of MySQL you can also use optimizer hints to influence the query plan.
The last way to influence index use is to force an update of the index statistics collected on a table, using the ANALYZE TABLE command:
ANALYZE TABLE analyzes and stores the key distribution for a table.
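For the table above that is a single statement:
ANALYZE TABLE completed_checks;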

Duplicate row in database with Unique key constraint

I have the following table:
CREATE TABLE `some_table` (
`ReferenceId` int(11) DEFAULT NULL,
`ten` int(10) DEFAULT NULL,
`so` bigint(18) DEFAULT NULL,
`mc` bigint(18) DEFAULT NULL,
`ev` bigint(18) DEFAULT NULL,
`sclso` bigint(18) DEFAULT NULL,
`sowbse` bigint(18) DEFAULT NULL,
`AsOfDate` date DEFAULT NULL,
`dud` date NOT NULL,
UNIQUE KEY `ReferenceId` (`ReferenceId`,`AsOfDate`),
KEY `fk_main_table` (`ReferenceId`),
CONSTRAINT `fk_main_table` FOREIGN KEY (`ReferenceId`) REFERENCES `some_other_table` (`Id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
In this table I have added a multiple-column UNIQUE index on ReferenceId and AsOfDate. But I just noticed that there is a duplicate entry in the table, even with this constraint in place.
The two records in question (highlighted in a screenshot in the original post) are identical in the first and second-to-last columns (ReferenceId and AsOfDate), which the constraint should have prevented.
What could be the possible issues? The data in this table is not inserted or updated from any web/desktop application, only from one script that runs in the background.
Edit: I have identified only this one duplicate, and the script has been running for the past 3 months.
Either one of two things is true:
You're mistaken
Your database is corrupt
To verify your assertion, write a query to show only the invalid condition:
select count(*) as N, ReferenceId, AsOfDate
from some_table
group by ReferenceId, AsOfDate
having count(*) > 1
(You can dispense with the unnecessary, nonstandard backtick-quotes, by the way. You'll find it makes SQL more pleasant to deal with.)
If that query produces any rows, your database is corrupt, by definition: the table cannot be declared unique on two columns and admit two rows of the same values. Find out what's wrong, and fix it.
If it doesn't produce any rows, it might still be corrupt, but that's evidence you're mistaken. You'll want to re-check your facts, and see if there's another explanation for what you're seeing. Get your hands on the verbatim SQL that produced that output (or is supposed to have done). Execute it, redirecting the output to a temporary table or file, and verify the duplication. If you don't find it, it's not there. If you do, see "corruption" in your friendly manual.
One last thing, just as an aside. This line:
KEY `fk_main_table` (`ReferenceId`),
is likely not doing you much good. You already have
UNIQUE KEY `ReferenceId` (`ReferenceId`,`AsOfDate`),
and your DBMS probably creates an index to enforce that, and probably uses that index to locate rows by ReferenceId.
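If you confirm the extra index is redundant, removing it is one statement. A sketch (verify on a copy first; InnoDB should permit the drop because the unique key's left prefix can support the foreign key, but it will refuse if no suitable index remains):
ALTER TABLE some_table DROP INDEX fk_main_table;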

Which way to define foreign keys in MySQL

I see two ways it is done:
Method 1:
CREATE TABLE IF NOT EXISTS `sample` (
`sample_id` tinyint(2) NOT NULL AUTO_INCREMENT,
`description` varchar(32) NOT NULL,
`parent_id` int(10) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`sample_id`)
) ENGINE=InnoDB;
ALTER TABLE sample ADD CONSTRAINT parent_id FOREIGN KEY (parent_id) REFERENCES parent_tbl(parent_id);
Method 2:
CREATE TABLE IF NOT EXISTS `sample` (
`sample_id` tinyint(2) NOT NULL AUTO_INCREMENT,
`description` varchar(32) NOT NULL,
`parent_id` int(10) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`sample_id`),
FOREIGN KEY (`parent_id`) REFERENCES `parent_tbl` (`parent_id`)
) ENGINE=InnoDB;
Which way is better or when to use one over the other?
If you need to add a foreign key to an existing table, use method 1; if you are creating the schema from scratch, use method 2.
There isn't a best way; they do the same thing.
The first gives you more flexibility.
1) You are required to use the first method if you create the tables in an order such that a referenced table is created after its referencing table. If you have loops in your references, there may be no way to avoid this. If there are no loops, then there exists an order in which every referenced table is created before the tables that reference it, but you may not want to spend time figuring out that order and rearranging your scripts.
2) It's not always the case that you know exactly which indexes you will need when you create the table. When you create indexes, it is usually a good idea to measure the performance gain on some real data, and perhaps try several different indexes to see which works best. For this strategy to work you need to create the table first, insert some data, and then be able to modify the indexes for testing (see the sketch below). Dropping and recreating the table is not as practical as ALTER TABLE in this situation.
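A hypothetical index-testing session of that kind (the index name and column choice are illustrative):
ALTER TABLE sample ADD INDEX idx_created (created);
-- time the candidate queries, then either keep the index or drop it:
ALTER TABLE sample DROP INDEX idx_created;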
Other than that, there isn't really any difference, and if you are starting from nothing there is no particular reason to favour one over the other. The resulting index is the same either way.
The end products are indistinguishable.
For clarity (it's nice to see the constraint explicitly stand on its own), I might advocate for the first.
For succinctness (saying the same thing in one statement instead of two), I might advocate for the second.