Does SELECT keep consistency during concurrent UPDATE or INSERT? - mysql

I'm using MySQL 5.6 with innodb. Let's say we have the following two tables:
create table order_head (
id int not null,
version int not null,
order_detail_count int not null,
primary key (id)
);
create table order_detail (
id int not null,
product_id int not null,
qty int not null,
order_head_id int not null,
primary key (id),
foreign key (order_head_id)
references order_head (id)
);
Consider a situation in which many concurrent INSERTs and UPDATEs against both tables are executing all the time. The application that runs those transactions is well designed with optimistic locking, so concurrent executions don't produce any inconsistent data.
Under that situation, I have a concern about issuing the following query:
SELECT
*
FROM
order_head h
JOIN
order_detail d ON (h.id = d.order_head_id);
Does this query always ensure that it will return consistent results? In other words, is it guaranteed never to mix data from multiple distinct transactions? For example, I don't expect inconsistent results such as the JOIN returning 4 detail rows while order_head.order_detail_count is 3.
I think I don't have a good understanding of transactions, so any pointers to good references (e.g. books about transactions) would also be greatly appreciated.

That is a basic principle of any RDBMS.
The ACID properties (https://en.wikipedia.org/wiki/ACID) that any RDBMS must fulfil include ISOLATION: each query against the database should not be interfered with by another query taking place at the same time. In InnoDB at the default REPEATABLE READ isolation level, a single SELECT reads from one consistent snapshot, so it will not mix data committed at different points in time.
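As a quick check in MySQL 5.6 you can inspect and set the isolation level yourself (tx_isolation is the 5.6 variable name):
-- Inspect the current isolation level
SELECT @@tx_isolation;
-- At REPEATABLE READ (the InnoDB default), each plain SELECT reads
-- from a single consistent snapshot, so the JOIN from the question
-- cannot see a half-applied transaction.
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT * FROM order_head h JOIN order_detail d ON (h.id = d.order_head_id);
COMMIT;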

Related

Adding constraints in the MySQL database to guarantee uniqueness

My SQL table has the following DDL
CREATE TABLE `new_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`family_id` int(11) NOT NULL,
`name` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
)
I want to hold the family names in this simple table. In order to do this I have a microservice where the caller sends the family details via JSON:
{
"family_id" : 1,
"names": ["name1", "name2"]
}
The id is generated via auto increment from MySQL.
So the above JSON will finally trigger two insert statements:
INSERT INTO new_table (family_id, name) VALUES (1, 'name1');
INSERT INTO new_table (family_id, name) VALUES (1, 'name2');
The problem arises when a new request comes in with a family_id that already exists in the table. This should not be allowed, so I currently run a query to check whether the family_id exists; if it does, an exception is raised. How can I avoid this query? The table schema can be altered if needed. Would it be OK to add something like a "request id", or a GUID, to establish uniqueness per request?
All data should be in the same table.
(from comment) I cannot create a second table. Everything should be kept in one table.
You should normalize your schema and use two tables: Family and (I assume) Person. Then you can use a UNIQUE constraint for the family_id and add family_id as a foreign key in the Person table.
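A minimal sketch of that layout, with hypothetical table and column names:
-- Family holds each family_id exactly once (the primary key is unique);
-- Person references it, so names can only attach to an existing family.
CREATE TABLE family (
    family_id INT NOT NULL,
    PRIMARY KEY (family_id)
) ENGINE=InnoDB;

CREATE TABLE person (
    id INT NOT NULL AUTO_INCREMENT,
    family_id INT NOT NULL,
    name VARCHAR(45) NOT NULL,
    PRIMARY KEY (id),
    FOREIGN KEY (family_id) REFERENCES family (family_id)
) ENGINE=InnoDB;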
You need two tables.
CREATE TABLE Families (
family_id MEDIUMINT UNSIGNED AUTO_INCREMENT,
...
PRIMARY KEY(family_id)
);
CREATE TABLE FamilyNames (
family_id MEDIUMINT UNSIGNED, -- not auto-inc here
name VARCHAR(66) NOT NULL,
...
PRIMARY KEY(family_id, name) -- note "composite"
);
A PRIMARY KEY is a UNIQUE KEY is a KEY.
You say you cannot add a second table, but why? You mention needing to generate a particular JSON; can't that simply be built via a JOIN of the two tables if necessary?
If you can't create a second table to model the constraint properly, then you will have to resort to serializing inserts:
LOCK TABLES new_table WRITE;
-- use a SELECT to check whether the family id already exists
SELECT COUNT(*) FROM new_table WHERE family_id = 1;
-- if the count is 0, the family id is not present: INSERT your new data
INSERT INTO new_table (family_id, name) VALUES (1, 'name1'), (1, 'name2');
UNLOCK TABLES;
It's necessary to lock the table because otherwise you will have a race condition. Two sessions could check if the family id exists, both find that it does not exist, and then both proceed with their INSERT. If you lock the table, then one session will acquire the lock and do its work, while the other session must wait for the lock, and by the time it acquires the lock, its check will find that the family id has been inserted by the first session.
This method is usually considered bad for concurrency, which can limit your throughput if you have many requests. But if you have infrequent requests, the impact to throughput will be minimal.
This is a workaround for a design problem, but I figured I might as well post it. You can generate a query such as the following:
INSERT INTO `new_table` (`family_id`, `name`)
SELECT x.family_id, x.name FROM (
SELECT 1 AS family_id, 'name1' AS name
UNION ALL
SELECT 1, 'name2'
) x
LEFT JOIN `new_table` n ON n.family_id = x.family_id
WHERE n.family_id IS NULL;
Then check the number of rows affected to determine if it was successful or not.
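The affected-row count is what the client driver returns for the statement; from SQL you can also read it with ROW_COUNT() immediately after the INSERT:
-- 2 means both rows were inserted; 0 means family_id 1 already existed
SELECT ROW_COUNT();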

sample sequence table in mysql

I have decided to use a MySQL sequence table, since I am using Spring JDBC batch insert (you can't get the generated primary keys with this feature), so I will pass a generated key while inserting each row. I have googled for a long time now and didn't find a proper way of creating a sequence table.
I have created a sequence table
create table table_sequence (value int not null) ENGINE = MYISAM;
but it feels very basic, since I need to have a max value and a cache limit for each instance.
I have many tables; do I need one sequence table for each table?
I have very little idea about DB sequences, so suggestions are helpful to me. Thanks.
this may help you:
http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html
CREATE TABLE animals (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name CHAR(30) NOT NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM;
INSERT INTO animals (name) VALUES
('dog'),('cat'),('penguin'),
('lax'),('whale'),('ostrich');
SELECT * FROM animals;
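If you want an explicit counter table rather than AUTO_INCREMENT (e.g. to reserve a block of ids for a JDBC batch), the usual pattern is the LAST_INSERT_ID(expr) trick documented in the MySQL manual; a minimal sketch using the table_sequence table from the question:
INSERT INTO table_sequence VALUES (0);  -- seed once

-- Reserve one id atomically; for a batch of N rows use value + N and
-- hand out the ids from the reserved range in the application.
UPDATE table_sequence SET value = LAST_INSERT_ID(value + 1);
SELECT LAST_INSERT_ID();  -- connection-local, so safe under concurrency
One such counter table (or one row per target table, with an extra name column) is enough; you don't need a separate sequence table per table.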

MySQL using partitioning and keeping primary keys unchanged

I'm using MySQL 5.5, and I have an existing table in production that stores customer transactions. A simplified version of the table is:
CREATE TABLE transactions (
id INT NOT NULL AUTO_INCREMENT,
description CHAR(100),
posted DATE,
PRIMARY KEY (id)
) ENGINE=MyISAM
We are exploring the idea of partitioning on the transaction date to make reports that filter by date execute faster. The following attempt fails because of the restrictions on primary keys and partitioning explained in the MySQL Partitioning Keys documentation.
mysql> CREATE TABLE transactions (
-> id INT NOT NULL AUTO_INCREMENT,
-> description CHAR(100),
-> posted DATE,
-> PRIMARY KEY (id)
-> ) ENGINE=MyISAM
-> PARTITION BY HASH(MONTH(posted)) PARTITIONS 12;
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function
A possible workaround is as follows:
CREATE TABLE transactions (
id INT NOT NULL AUTO_INCREMENT,
description CHAR(100),
posted DATE,
PRIMARY KEY (id, posted)
) ENGINE=MyISAM
PARTITION BY HASH(MONTH(posted)) PARTITIONS 12;
Another workaround would be:
CREATE TABLE transactions (
id INT NOT NULL AUTO_INCREMENT,
description CHAR(100),
posted DATE,
KEY (id)
) ENGINE=MyISAM
PARTITION BY HASH(MONTH(posted)) PARTITIONS 12;
In both workarounds the database would not prevent multiple records with the same id but different posted dates. Is there any way to use partitioning on the posted field and maintain the original unique constraints?
I've been facing this same "problem", and one workaround I found was splitting my table in two, resulting in something like this in your case:
CREATE TABLE transactions (
id INT NOT NULL AUTO_INCREMENT,
description CHAR(100),
PRIMARY KEY (id)
) ENGINE=MyISAM;
And the other:
CREATE TABLE transactions_date (
id INT NOT NULL,
posted DATE
) ENGINE=MyISAM
PARTITION BY HASH(MONTH(posted)) PARTITIONS 12;
The obvious problem is that you have to add some extra logic to the application, like needing to query both tables to retrieve all the data in SELECT statements. You could probably use triggers to help with the tasks related to INSERT, UPDATE and DELETE; a sketch of the SELECT and DELETE sides is below.
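A sketch of both sides, assuming the split tables above:
-- SELECT: read both tables back together
SELECT t.id, t.description, d.posted
FROM transactions t
JOIN transactions_date d ON d.id = t.id;

-- DELETE: a trigger can keep transactions_date in sync
-- (the INSERT side needs the posted value, which only the application knows)
DELIMITER //
CREATE TRIGGER transactions_after_delete
AFTER DELETE ON transactions
FOR EACH ROW
BEGIN
    DELETE FROM transactions_date WHERE id = OLD.id;
END//
DELIMITER ;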
Just a note: the only functions whose use on DATE or DATETIME columns allows partition pruning are YEAR(), TO_DAYS() and TO_SECONDS() (the last one is only available since MySQL 5.5).
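So if the reports filter on date ranges, RANGE partitioning on TO_DAYS(posted) is a pruning-friendly variant of the first workaround (a sketch; the partition boundaries are illustrative):
CREATE TABLE transactions (
    id INT NOT NULL AUTO_INCREMENT,
    description CHAR(100),
    posted DATE,
    PRIMARY KEY (id, posted)
) ENGINE=MyISAM
PARTITION BY RANGE (TO_DAYS(posted)) (
    PARTITION p2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
    PARTITION p2013 VALUES LESS THAN (TO_DAYS('2014-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);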

Table schema design while using innodb

I have encountered a problem when designing the table schema for our system.
Here is the situation:
our system has a lot of items (more than 20 million); each item has a unique id, but each item can have lots of records. For example, the item with id 1 has about 5000 records, and each record has more than 20 attributes. A record needs to be identified by its item id and the status of one or more of its attributes for use in SELECT, UPDATE or DELETE.
I want to use InnoDB.
But the problem is that when using InnoDB there must be a clustered index.
Given the situation described above, it seems hard to find a natural clustered index, so I can only use an auto_increment int as the key.
The current design is as follows:
create table record (
item_key int(10) unsigned NOT NULL AUTO_INCREMENT,
item_id int(10) unsigned NOT NULL,
attribute_1 char(32) NOT NULL,
attribute_2 int(10) unsigned NOT NULL,
.
.
.
.
.
attribute_20 int(10) unsigned NOT NULL,
PRIMARY KEY (`item_key`),
KEY `iattribute_1` (`item_id`,`attribute_1`),
KEY `iattribute_2` (`item_id`,`attribute_2`)
) ENGINE=InnoDB AUTO_INCREMENT=22 DEFAULT CHARSET=latin1
The SQL statement:
select * from record
where item_id=1 and attribute_1='a1' and attribute_2 between 10 and 1000;
The UPDATE and DELETE statements are similar.
I don't think this is a good design, but I can't think of anything else; all suggestions welcome.
Sorry if I didn't make the question clear.
What I want to access ( select, update, delete, insert) is the records, not the items.
The items have their own attributes, but the attributes mentioned in the descriptions above belong to the records.
Every item can have many records; for example, item 1 has about 5000 records.
Every record has 42 attributes, some of which can be NULL. Every record has a unique id; this id is unique across different items, but it is a string, not a number.
I want to access the records in this way:
A. I will only get (or update or delete) the records that belong to one specific item at one time, or in one query.
B. I will get or update the values of all attributes, or of some specific attributes, in the query.
C. The attributes in the query condition may not be the same as the attributes that I want to retrieve.
So there could be some SQL statements like:
SELECT attribute_1, attribute_N FROM record_table_1 WHERE item_id=1 AND attribute_K='some value' AND attribute_M BETWEEN 10 AND 100
And the reasons why I think the original design is not good are:
I can't choose an attribute or the record id as the primary key, because it would be of no use: in every query I have to specify the item id and some attributes as the condition (like "where item_id=1 and attribute_1='value1' and attribute_2 between 2 and 3"), so I can only use an auto_increment int as the primary key. The result is that every query has to traverse two B-trees, and the scan of the secondary index does not look effective.
Compound keys also seem useless, because the query condition can vary among many attributes.
With the original design it seems I have to add a lot of indexes to satisfy the different queries, otherwise I am facing full table scans; but obviously too many indexes is bad for UPDATE, DELETE and INSERT operations.
If you want a clustered index and don't want to use the MyISAM engine, it sounds like you should use two tables: one for the unique properties of the items, and one for each instance of the item (with the specified attributes).
You're right, the schema is wrong. Having attributes 1..20 as fields within the table is not the way to do this; you need a separate table to store this information. That table would hold the item_key of the record together with an attribute key and a value, and it would therefore have indexes that allow much better searching.
Something like the sketch below (the original answer showed this layout as an ER diagram).
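A minimal DDL sketch of that two-table layout (names and types are hypothetical):
CREATE TABLE record (
    record_key INT UNSIGNED NOT NULL AUTO_INCREMENT,
    item_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (record_key),
    KEY idx_record_item (item_id)
) ENGINE=InnoDB;

CREATE TABLE record_attribute (
    record_key INT UNSIGNED NOT NULL,
    attribute_id TINYINT UNSIGNED NOT NULL,
    value VARCHAR(32) NOT NULL,
    PRIMARY KEY (record_key, attribute_id),
    KEY idx_attr_value (attribute_id, value),
    FOREIGN KEY (record_key) REFERENCES record (record_key)
) ENGINE=InnoDB;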
Compound Keys
I think maybe you are looking for a compound key rather than a clustered index, which is a different thing. You can achieve this by:
create table record (
item_id int(10) unsigned NOT NULL,
attribute_1 char(32) NOT NULL,
attribute_2 int(10) unsigned NOT NULL,
.
.
.
.
.
attribute_20 int(10) unsigned NOT NULL,
PRIMARY KEY (`item_id`,`attribute_1`,`attribute_2`),
KEY `iattribute_1` (`item_id`,`attribute_1`),
KEY `iattribute_2` (`item_id`,`attribute_2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
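With that compound primary key, the sample query from the question can be answered straight from the clustered index; EXPLAIN should show the PRIMARY key being used:
EXPLAIN SELECT * FROM record
WHERE item_id=1 AND attribute_1='a1' AND attribute_2 BETWEEN 10 AND 1000;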

mySQL KEY Partitioning using three table fields (columns)

I am writing a data warehouse, using MySQL as the back-end. I need to partition a table based on two integer IDs and a name string. I have read (parts of) the mySQL documentation regarding partitioning, and it seems the most appropriate partitioning scheme in this scenario would be either a HASH or KEY partitioning.
I have elected for KEY partitioning because I (chickened out and) don't want to be responsible for providing a 'collision-free' hashing algorithm for my fields; instead, I am relying on MySQL's hashing to generate the keys required.
I have included below a snippet of the schema of the table that I would like to partition based on the composite of the following fields: school_id, course_id, ssname (student surname).
BTW, before anyone points out that this is not the best way to store school related information, I'll have to point out that I am only using the case below as an analogy to what I am trying to model.
My Current CREATE TABLE statement looks like this:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
FOREIGN KEY (school_id) REFERENCES school(id) ON DELETE RESTRICT ON UPDATE CASCADE,
FOREIGN KEY (course_id) REFERENCES course(id) ON DELETE RESTRICT ON UPDATE CASCADE,
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname(16))
) ENGINE=innodb;
I would like to know how to modify the statement above so that the table is partitioned using the three fields I mentioned at the beginning of this question (namely school_id, course_id and the starting letter of the student's surname).
Another question I would like to ask is this:
What happens in 'edge' situations, for example if I attempt to insert a record that contains a valid* school_id, course_id or surname for which no underlying partition file yet exists? Will MySQL automatically create the underlying file?
Case in point: I have the following schools: New York Kindergarten, Belfast Elementary; and the following courses: Lie Algebra in Infinitesimal Dimensions, Entangled Entities.
Also assume I have the following students (surnames): Bush, Blair, Hussein
When I add a new school (or course, or student), can I insert them into the foobar table? (Actually, I can't think why not.) The reason I ask is that I foresee adding more schools, courses, etc., which means that MySQL will have to create additional tables behind the scenes (as the hash will generate new keys).
I will be grateful if someone with experience in this area can confirm (preferably with links backing their assertion) that my understanding (i.e. that no manual administration is required if I add new schools, courses or students to the database) is correct.
I don't know if my second question was well formed (clear) or not. If not, I will be glad to clarify further.
*VALID - by valid, I mean that it is valid in terms of not breaking referential integrity.
I doubt partitioning is as useful as you think. That said, there are a couple of other problems with what you're asking for (note: the entirety of this answer applies to MySQL 5; version 6 might be different):
columns used in KEY partitioning must be a part of the primary key. school_id, course_id and ssname are not part of the primary key.
more generally, every UNIQUE key (including the primary key) must include all columns in the partitioning expression[1]. This means you can only partition on the intersection of the columns in the UNIQUE keys. In your example, the intersection is empty.
most partitioning schemes (other than KEY) require integer or null values. If not NULL, ssname will not be an integer value.
foreign keys and partitioning aren't supported simultaneously[2]. This is a strong argument not to use partitioning.
Fortunately, collision free hashing is one thing you don't need to worry about, because partitioning is going to result in collisions (otherwise, you'd only have a single row in each partition). If you could ignore the above problems as well as the limitations on functions used in partitioning expressions, you could create a HASH partition with:
CREATE TABLE foobar (
...
) ENGINE=innodb
PARTITION BY HASH (school_id + course_id + ORD(ssname))
PARTITIONS 2
;
What should work is:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
PRIMARY KEY (id, school_id, course_id),
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
PARTITION BY HASH (school_id + course_id)
PARTITIONS 2
;
or:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
PRIMARY KEY (id, school_id, course_id, ssname),
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
PARTITION BY KEY (school_id, course_id, ssname)
PARTITIONS 2
;
As for the files that store tables, MySQL will create them, though it may do so when you define the table rather than when rows are inserted into it. You don't need to worry about how MySQL manages files. Remember, there is a limited number of partitions, defined when you create the table by the PARTITIONS n clause.
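You can also see which partitions exist (and roughly how many rows each holds) without touching the filesystem:
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'foobar';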