MySQL using partitioning and keeping primary keys unchanged - mysql

I'm using MySQL 5.5, and I have an existing table in production that stores customer transactions. A simplified version of the table is:
CREATE TABLE transactions (
id INT NOT NULL AUTO_INCREMENT,
description CHAR(100),
posted DATE,
PRIMARY KEY (id)
) ENGINE=MyISAM
We are exploring the idea of using partitioning on the transaction date to make reports that use date filtering execute faster. The following attempt fails because of restrictions on primary keys and partitions explained in MySQL Partitioning Keys documentation.
mysql> CREATE TABLE transactions (
-> id INT NOT NULL AUTO_INCREMENT,
-> description CHAR(100),
-> posted DATE,
-> PRIMARY KEY (id)
-> ) ENGINE=MyISAM
-> PARTITION BY HASH(MONTH(posted)) PARTITIONS 12;
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function
A possible workaround is as follows:
CREATE TABLE transactions (
id INT NOT NULL AUTO_INCREMENT,
description CHAR(100),
posted DATE,
PRIMARY KEY (id, posted)
) ENGINE=MyISAM
PARTITION BY HASH(MONTH(posted)) PARTITIONS 12;
Another workaround would be:
CREATE TABLE transactions (
id INT NOT NULL AUTO_INCREMENT,
description CHAR(100),
posted DATE,
KEY (id)
) ENGINE=MyISAM
PARTITION BY HASH(MONTH(posted)) PARTITIONS 12;
Neither workaround prevents multiple records from sharing the same id with different posted dates. Is there any way to partition on the posted field and keep the original unique constraint on id?

I've been facing this same "problem", and one workaround that I found for that was splitting my table in two, resulting in something like this in your case:
CREATE TABLE transactions (
id INT NOT NULL AUTO_INCREMENT,
description CHAR(100),
PRIMARY KEY (id)
) ENGINE=MyISAM;
And the other:
CREATE TABLE transactions_date (
id INT NOT NULL,
posted DATE
) ENGINE=MyISAM
PARTITION BY HASH(MONTH(posted)) PARTITIONS 12;
The obvious drawback is extra logic in the application: SELECT statements have to join both tables to retrieve all the data. You could probably use triggers to keep the two tables in sync on INSERT, UPDATE and DELETE.
Just a note: the only functions that can benefit from the use of partition pruning in DATE or DATETIME columns are: YEAR(), TO_DAYS() and TO_SECONDS() (this last one is only available since MySQL 5.5).
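The restriction behind ERROR 1503 is easier to see with a toy model. MySQL keeps a separate index per partition, so a uniqueness check only sees the rows in the partition a row is routed to. The following is a minimal Python sketch of that idea, not MySQL's actual implementation; the names partition_for and insert are hypothetical, and the routing rule assumes HASH partitioning picks MOD(MONTH(posted), 12).

```python
from datetime import date

NUM_PARTITIONS = 12

def partition_for(posted: date) -> int:
    """Mimic PARTITION BY HASH(MONTH(posted)) PARTITIONS 12 (assumed: MOD(expr, 12))."""
    return posted.month % NUM_PARTITIONS

# One local "index" per partition, each holding (id, posted) keys.
partitions = [set() for _ in range(NUM_PARTITIONS)]

def insert(row_id: int, posted: date) -> bool:
    """Insert under PRIMARY KEY (id, posted); only the target partition is checked."""
    key = (row_id, posted)
    local_index = partitions[partition_for(posted)]
    if key in local_index:
        return False  # duplicate key within this partition
    local_index.add(key)
    return True

first = insert(1, date(2012, 1, 15))
second = insert(1, date(2012, 2, 15))        # same id, different month: accepted
third = insert(1, date(2012, 1, 15))         # exact duplicate: rejected
december = partition_for(date(2012, 12, 1))  # month 12 wraps around to partition 0
```

Because the second insert lands in a different partition, nothing in that partition's index can reveal that id 1 already exists elsewhere, which is why a key on id alone cannot be enforced across partitions.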

Related

Why are there two rows in MariaDB database violating unique constraint?

I have written an application in Javascript which inserts data into two tables via a connection to a MariaDB server.
There should be a 1:1 correspondence between the rows in these tables when first running the application.
One table stores (simulated) data about properties, the other table stores data about prices. There should be 1 price for each property. At a later date, the price might change, so there could be more than one entry for the price, but this cannot happen when the application is first run. These entries also cannot be in violation of a unique index - but they are.
Perhaps I have misconfigured something in MariaDB? Here is the code which generates the tables.
drop table if exists property_price;
drop table if exists property;
create table property
(
unique_id bigint unsigned not null auto_increment primary key,
web_id bigint unsigned not null,
url varchar(256),
street_address varchar(256),
address_country varchar(64),
property_type varchar(64),
num_bedrooms int,
num_bathrooms int,
created_datetime datetime not null,
modified_datetime datetime not null
);
create table property_price
(
property_unique_id bigint unsigned not null,
price_value decimal(19,2) not null,
price_currency varchar(64) not null,
price_qualifier varchar(64),
added_reduced_ind varchar(64),
added_reduced_date date,
created_datetime datetime not null
);
alter table property_price
add constraint fk_property_unique_id foreign key(property_unique_id)
references property(unique_id);
alter table property
add constraint ui_property_web_id
unique (web_id);
alter table property
add constraint ui_url
unique (url);
alter table property_price
add constraint ui_property_price
unique (property_unique_id, price_value, price_currency, price_qualifier, added_reduced_ind, added_reduced_date);
Below is a screenshot from DBeaver showing that a select statement returns two identical rows.
I don't understand why the unique constraint appears to be violated. The constraint does sometimes work, because if I run my application again, it fails because it attempts to insert a duplicate row which already exists in the DB. (Not the same as the one shown below.)
Can anyone point me in the right direction as to how I might debug this?
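MariaDB permits multiple NULL values in columns which form part of a unique constraint, so two rows that are identical but contain NULL in one of those columns (here price_qualifier, added_reduced_ind and added_reduced_date are nullable) do not count as duplicates.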
My solution would be to put the logic for checking for duplicate rows into the application, rather than this being on the database side. Essentially this means the unique constraint is not being used.
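The NULL behaviour can be reproduced without a MariaDB server. The sketch below uses Python's built-in sqlite3 module purely as a stand-in (an assumption: SQLite also treats NULLs as distinct in unique indexes, but it is not MariaDB), with a cut-down version of the property_price table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE property_price (
        property_unique_id INTEGER NOT NULL,
        price_value NUMERIC NOT NULL,
        price_qualifier TEXT,  -- nullable, as in the original schema
        UNIQUE (property_unique_id, price_value, price_qualifier)
    )
    """
)

# Two identical rows whose price_qualifier is NULL: both are accepted.
for _ in range(2):
    conn.execute("INSERT INTO property_price VALUES (1, 100000, NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM property_price WHERE property_unique_id = 1"
).fetchone()[0]

# Two identical rows with no NULLs: the second one is rejected.
conn.execute("INSERT INTO property_price VALUES (2, 250000, 'guide price')")
try:
    conn.execute("INSERT INTO property_price VALUES (2, 250000, 'guide price')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

This would also explain why the constraint "sometimes works": rows with every constrained column filled in are checked, rows with a NULL are not. If the check should stay in the database, one option may be to declare the constrained columns NOT NULL with a sentinel default.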

Differences between defining primary key - along with column name, at the end of create table stmt, adding primary key index after create table stmt

There are three ways I have seen to define primary keys.
Define along with its column name definition:
CREATE TABLE test (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
-- other fields
);
Define the key at the end of the table definition:
CREATE TABLE test (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
-- other fields
PRIMARY KEY (id)
);
Adding the primary key index after table creation. I have mostly seen this in phpMyAdmin's exported .sql files. (Does it depend on the storage engine used?)
CREATE TABLE test (
id INT UNSIGNED NOT NULL,
-- other fields
);
ALTER TABLE test
ADD PRIMARY KEY (id),
MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT;
What are the internal differences between all these methods?
In my experience, importing an SQL file written in the third form usually takes longer than importing one written in either of the other two.
Edit (After Bill Karwin told that "(the) example(s) shows no import of data"):
The examples above don't contain INSERT queries, but what difference would it make if INSERT statements followed each of these CREATE TABLE queries?
There is no difference between the first two forms. It's only a syntax convenience if your primary key is a single column. But if you have a multi-column primary key, you must define the PK as a table constraint:
CREATE TABLE test (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
other INT NOT NULL,
-- other fields
PRIMARY KEY (id, other)
);
The third form is almost the same, because you define the primary key before inserting any data into the table. The only effect is that metadata is altered by the second DDL statement.
Some people claim that adding the primary key after importing data is faster, but this is not true for MySQL's default storage engine InnoDB. The table data is stored as a clustered index. If you don't declare your own primary key, another row id is created implicitly, and this becomes the key for the clustered index. So you're inserting into an index one way or the other.
It's possible that in the old MyISAM storage engine, inserting data to a table with no primary key is a little faster. But you have to count the extra time it takes to add the primary key after you're done inserting data.
In any case, your example shows no import of data, so it's moot.

MySQL auto assign foreign key ID

I have a main table called results. E.g.
CREATE TABLE results (
r_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
r_date DATE NOT NULL,
system_id INT NOT NULL,
FOREIGN KEY (system_id) REFERENCES systems(s_id) ON UPDATE CASCADE ON DELETE CASCADE
);
The systems table as:
CREATE TABLE systems (
s_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
system_name VARCHAR(50) NOT NULL UNIQUE
);
I'm writing a program in Python with MySQL connector. Is there a way to add data to the systems table and then auto assign the generated s_id to the results table?
I know I could INSERT into systems, then make another call to that table to look up the ID for the system_name and add it to the results table, but I thought there might be a quirk in SQL that I'm not aware of that would make life easier with fewer calls to the DB.
You could do what you describe in a trigger like this:
CREATE TRIGGER t AFTER INSERT ON systems
FOR EACH ROW
INSERT INTO results SET r_date = NOW(), system_id = NEW.s_id;
This is possible only because the columns of your results table are easy to fill in from the data the trigger has access to. The auto-increment fills itself in, and no additional columns need to be filled in. If you had more columns in the results table, this would be harder.
You should read more about triggers:
https://dev.mysql.com/doc/refman/8.0/en/create-trigger.html
https://dev.mysql.com/doc/refman/8.0/en/triggers.html
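If a trigger is more than you need, the extra lookup can also be avoided on the client side: after an INSERT, Python database cursors (including those of MySQL Connector/Python) expose the generated key as cursor.lastrowid. The sketch below uses the built-in sqlite3 module as a stand-in so that it runs without a server; the table definitions are adapted to SQLite syntax and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE systems ("
    " s_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " system_name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE results ("
    " r_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " r_date TEXT NOT NULL,"
    " system_id INTEGER NOT NULL REFERENCES systems(s_id)"
    " ON UPDATE CASCADE ON DELETE CASCADE)"
)

cur = conn.cursor()
# Note: sqlite3 uses "?" placeholders; MySQL Connector/Python uses "%s".
cur.execute("INSERT INTO systems (system_name) VALUES (?)", ("alpha",))
new_system_id = cur.lastrowid  # id generated by the INSERT above

cur.execute(
    "INSERT INTO results (r_date, system_id) VALUES (?, ?)",
    ("2024-01-01", new_system_id),
)
conn.commit()

row = conn.execute(
    "SELECT s.system_name, r.system_id"
    " FROM results r JOIN systems s ON s.s_id = r.system_id"
).fetchone()
```

This still takes two INSERT statements, but no SELECT in between, and it works however many columns the results table has.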

What's the difference between using INDEX vs KEY in MySQL?

I know how to use INDEX as in the following code. And I know how to use foreign key and primary key.
CREATE TABLE tasks (
task_id int unsigned NOT NULL AUTO_INCREMENT,
parent_id int unsigned NOT NULL DEFAULT 0,
task varchar(100) NOT NULL,
date_added timestamp NOT NULL,
date_completed timestamp NULL,
PRIMARY KEY ( task_id ),
INDEX parent ( parent_id )
)
However, I found some code that uses KEY instead of INDEX, as follows.
CREATE TABLE orders (
order_id int unsigned NOT NULL AUTO_INCREMENT,
-- etc
KEY order_date ( order_date )
)
I could not find any explanation on the official MySQL page. Could anyone tell me what the difference is between KEY and INDEX?
The only difference I see is that when I use KEY ..., I need to repeat the word, e.g. KEY order_date ( order_date ).
There's no difference. They are synonyms, though INDEX should be preferred (as INDEX is ISO SQL compliant, while KEY is a MySQL-specific, non-portable, extension).
From the CREATE TABLE manual entry:
KEY is normally a synonym for INDEX. The key attribute PRIMARY KEY can also be specified as just KEY when given in a column definition. This was implemented for compatibility with other database systems.
By "The key attribute PRIMARY KEY can also be specified as just KEY when given in a column definition.", it means that these three CREATE TABLE statements below are equivalent and generate identical TABLE objects in the database:
CREATE TABLE orders1 (
order_id int PRIMARY KEY
);
CREATE TABLE orders2 (
order_id int KEY
);
CREATE TABLE orders3 (
order_id int NOT NULL,
PRIMARY KEY ( order_id )
);
...while these 2 statements below (for orders4, orders5) are equivalent with each other, but not with the 3 statements above, as here KEY and INDEX are synonyms for INDEX, not a PRIMARY KEY:
CREATE TABLE orders4 (
order_id int NOT NULL,
KEY ( order_id )
);
CREATE TABLE orders5 (
order_id int NOT NULL,
INDEX ( order_id )
);
...as the KEY ( order_id ) and INDEX ( order_id ) members do not define a PRIMARY KEY, they only define a generic INDEX object, which is nothing like a KEY at all (as it does not uniquely identify a row).
As can be seen by running SHOW CREATE TABLE orders1...5:
| Table | SHOW CREATE TABLE... |
| --- | --- |
| orders1 | CREATE TABLE orders1 ( order_id int NOT NULL, PRIMARY KEY ( order_id )) |
| orders2 | CREATE TABLE orders2 ( order_id int NOT NULL, PRIMARY KEY ( order_id )) |
| orders3 | CREATE TABLE orders3 ( order_id int NOT NULL, PRIMARY KEY ( order_id )) |
| orders4 | CREATE TABLE orders4 ( order_id int NOT NULL, KEY ( order_id )) |
| orders5 | CREATE TABLE orders5 ( order_id int NOT NULL, KEY ( order_id )) |
Here is a nice description about the "difference":
"MySQL requires every Key also be indexed, that's an implementation detail specific to MySQL to improve performance."
Keys are special fields that play very specific roles within a table, and the type of key determines its purpose within the table.
An index is a structure that an RDBMS (relational database management system) provides to improve data processing. An index has nothing to do with a logical database structure.
SO...
Keys are logical structures you use to identify records within a table and indexes are physical structures you use to optimize data processing.
Source: Database Design for Mere Mortals
Author: Michael Hernandez
It is mentioned as a synonym for INDEX in the 'create table' docs:
MySQL 5.5 Reference Manual :: 13 SQL Statement Syntax :: 13.1 Data Definition Statements :: 13.1.17 CREATE TABLE Syntax
@Nos already cited the section and linked the help for 5.1.
Just as PRIMARY KEY creates a primary key and an index for you, KEY creates an index only.
A key is a set of columns or expressions on which we build an index.
While an index is a structure that is stored in database, keys are strictly a logical concept.
An index helps us access a record quickly, whereas keys just identify the records uniquely.
Every table will necessarily have a key, but having an index is not mandatory.
Check on https://docs.oracle.com/cd/E11882_01/server.112/e40540/indexiot.htm#CNCPT721

mySQL KEY Partitioning using three table fields (columns)

I am writing a data warehouse, using MySQL as the back-end. I need to partition a table based on two integer IDs and a name string. I have read (parts of) the mySQL documentation regarding partitioning, and it seems the most appropriate partitioning scheme in this scenario would be either a HASH or KEY partitioning.
I have opted for KEY partitioning because I (chickened out and) don't want to be responsible for providing a 'collision free' hashing algorithm for my fields - instead, I am relying on MySQL's hashing to generate the keys required for partitioning.
I have included below, a snippet of the schema of the table that I would like to partition based on the COMPOSITE of the following fields:
school_id, course_id, ssname (student surname).
BTW, before anyone points out that this is not the best way to store school related information, I'll have to point out that I am only using the case below as an analogy to what I am trying to model.
My Current CREATE TABLE statement looks like this:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
FOREIGN KEY (school_id) REFERENCES school(id) ON DELETE RESTRICT ON UPDATE CASCADE,
FOREIGN KEY (course_id) REFERENCES course(id) ON DELETE RESTRICT ON UPDATE CASCADE,
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname(16))
) ENGINE=innodb;
I would like to know how to modify the statement above so that the table is partitioned using the three fields I mentioned at the beginning of this question (namely - school_id, course_id and the starting letter of the student's surname).
Another question I would like to ask is this:
What happens in 'edge' situations, for example if I attempt to insert a record that contains a valid* school_id, course_id or surname for which no underlying partitioned table file exists - will MySQL automatically create the underlying file?
Case in point. I have the following schools: New York Kindergarten, Belfast Elementary and the following courses: Lie Algebra in Infinitesimal Dimensions, Entangled Entities
Also assume I have the following students (surnames): Bush, Blair, Hussein
When I add a new school (or course, or student), can I insert them into the foobar table (actually, I can't think why not)? The reason I ask is that I foresee adding more schools and courses etc, which means that MySQL will have to create additional tables behind the scenes (as the hash will generate new keys).
I will be grateful if someone with experience in this area can confirm (preferably with links backing their assertion), that my understanding (i.e. no manual administration is required if I add new schools, courses or students to the database), is correct.
I don't know if my second question was well formed (clear) or not. If not, I will be glad to clarify further.
*VALID - by valid, I mean that it is valid in terms of not breaking referential integrity.
I doubt partitioning is as useful as you think. That said, there are a couple of other problems with what you're asking for (note: the entirety of this answer applies to MySQL 5; version 6 might be different):
columns used in KEY partitioning must be a part of the primary key. school_id, course_id and ssname are not part of the primary key.
more generally, every UNIQUE key (including the primary key) must include all columns used in the partitioning expression. This means you can only partition on the intersection of the columns in the UNIQUE keys. In your example, the intersection is empty.
most partitioning schemes (other than KEY) require integer or null values. If not NULL, ssname will not be an integer value.
foreign keys and partitioning aren't supported simultaneously. This is a strong argument not to use partitioning.
Fortunately, collision free hashing is one thing you don't need to worry about, because partitioning is going to result in collisions (otherwise, you'd only have a single row in each partition). If you could ignore the above problems as well as the limitations on functions used in partitioning expressions, you could create a HASH partition with:
CREATE TABLE foobar (
...
) ENGINE=innodb
PARTITION BY HASH (school_id + course_id + ORD(ssname))
PARTITIONS 2
;
What should work is:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
PRIMARY KEY (id, school_id, course_id),
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
PARTITION BY HASH (school_id + course_id)
PARTITIONS 2
;
or:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
PRIMARY KEY (id, school_id, course_id, ssname),
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
PARTITION BY KEY (school_id, course_id, ssname)
PARTITIONS 2
;
As for the files that store tables, MySQL will create them, though it may do it when you define the table rather than when rows are inserted into it. You don't need to worry about how MySQL manages files. Remember, there are a limited number of partitions, defined when you create the table by the PARTITIONS *n* clause.
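To illustrate why new schools or courses never need new partitions: HASH partitioning assigns a row to partition MOD(expr, n), so every possible value maps onto one of the n existing partitions. A minimal Python sketch of that rule for the HASH (school_id + course_id) PARTITIONS 2 example follows; the function name hash_partition and the sample ids are made up, and KEY partitioning uses MySQL's own internal hash rather than this expression, though the partition count is fixed in the same way.

```python
NUM_PARTITIONS = 2  # fixed by the PARTITIONS 2 clause

def hash_partition(school_id: int, course_id: int) -> int:
    """Partition chosen by PARTITION BY HASH (school_id + course_id) PARTITIONS 2."""
    return (school_id + course_id) % NUM_PARTITIONS

# Hypothetical (school_id, course_id) pairs, including ids added "later".
rows = [(1, 10), (2, 10), (3, 11), (40, 7)]
placement = {row: hash_partition(*row) for row in rows}
used_partitions = set(placement.values())
```

However many distinct ids are inserted, used_partitions can only ever contain 0 and 1, so no manual administration is needed when new values appear.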