MySQL: Partitioning a table while keeping id unique

We are using a table with the following schema:
CREATE TABLE `user_subscription` (
  `ID` varchar(40) NOT NULL,
  `COL1` varchar(40) NOT NULL,
  `COL2` varchar(30) NOT NULL,
  `COL3` datetime NOT NULL,
  `COL4` datetime NOT NULL,
  `ARCHIVE` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`)
)
Now we want to partition on the ARCHIVE column. ARCHIVE can hold only two values, 0 or 1, so there would be two partitions.
We are effectively using partitioning as an archival process. To partition, we have to make the ARCHIVE column part of the primary key. The problem is that two rows can then share the same ID with different ARCHIVE values. That by itself is acceptable to us, since the two rows live in different partitions. The real problem appears when we update the ARCHIVE value of one of those rows to match the other, to move it into the archive partition: the update is rejected with a "Duplicate Error".
Can somebody help in this regard?

Unfortunately,
A UNIQUE INDEX (or a PRIMARY KEY) must include all columns in the table's partitioning function
and since MySQL does not support CHECK constraints either, the only ugly workaround I can think of is enforcing the uniqueness manually through triggers:
CREATE TABLE t (
  id INT NOT NULL,
  archived TINYINT(1) NOT NULL DEFAULT 0,
  PRIMARY KEY (id, archived) -- required by MySQL's limitation on partitioning
)
PARTITION BY LIST(archived) (
  PARTITION pActive VALUES IN (0),
  PARTITION pArchived VALUES IN (1)
);
CREATE TRIGGER tInsert
BEFORE INSERT ON t FOR EACH ROW
CALL checkUnique(NEW.id);

DELIMITER //
CREATE TRIGGER tUpdate
BEFORE UPDATE ON t FOR EACH ROW
BEGIN
  -- only re-check when the id itself changes; otherwise the row being
  -- updated would collide with itself, and the archival UPDATE from the
  -- question would be rejected
  IF NEW.id <> OLD.id THEN
    CALL checkUnique(NEW.id);
  END IF;
END //

CREATE PROCEDURE checkUnique(pId INT)
BEGIN
  DECLARE flag INT;
  DECLARE message VARCHAR(50);
  SELECT id INTO flag FROM t WHERE id = pId LIMIT 1;
  IF flag IS NOT NULL THEN
    -- the below tries to mimic the error raised
    -- by a regular UNIQUE constraint violation
    SET message = CONCAT("Duplicate entry '", pId, "'");
    SIGNAL SQLSTATE "23000" SET
      MYSQL_ERRNO = 1062,
      MESSAGE_TEXT = message,
      COLUMN_NAME = "id";
  END IF;
END //
DELIMITER ;
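As a quick sanity check of the workaround (sample values assumed, not from the original answer):
INSERT INTO t (id) VALUES (1);           -- lands in pActive
UPDATE t SET archived = 1 WHERE id = 1;  -- moves the row to pArchived; id is unchanged, so no check fires
INSERT INTO t (id) VALUES (1);           -- rejected by tInsert with "Duplicate entry '1'"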
MySQL's limitations on partitioning being such a downer (in particular its lack of support for foreign keys), I would advise against using it altogether until the table grows so large that it becomes an actual concern.


Improve performance of query with large table?

I have a large table named 'roomlogs' which has nearly 1 million entries.
The structure of the table:
id --> PK
roomId --> varchar FK to rooms table
userId --> varchar FK to users table
enterTime --> Date and Time
exitTime --> Date and Time
status --> bool
The table already had an index on roomId; I recently added an index on the userId column.
When I run a stored procedure with the following code, it takes about 50 seconds on average, which it should not.
DELIMITER ;;
CREATE DEFINER=`root`@`%` PROCEDURE `enter_room`(IN pRoomId varchar(200), IN puserId varchar(50), IN ptime datetime, IN phidden int, IN pcheckid int, IN pexit datetime)
begin
  update roomlogs set
    roomlogs.exitTime = ptime,
    roomlogs.`status` = 1
  where
    roomlogs.userId = puserId
    and roomlogs.`status` = 0
    and DATEDIFF(ptime, roomlogs.enterTime) = 0;

  insert into roomlogs
    ( roomlogs.roomId,
      roomlogs.userId,
      roomlogs.enterTime,
      roomlogs.exitTime,
      roomlogs.hidden,
      roomlogs.checkinId )
  values
    ( pRoomId,
      puserId,
      ptime,
      pexit,
      phidden,
      pcheckid );

  select *
  from roomlogs
  where roomlogs.id = LAST_INSERT_ID();
end ;;
DELIMITER ;
What can be the reason for it to take this much time?
I added an index recently so previous rows are not indexed.
There is no storage type selected for any index right now. Should I change it to B-tree?
On my website, I get 20-30 simultaneous calls on other procedures while this procedure gets 10-20 simultaneous calls. Does the update query in the procedure take a lock? In the mysql.slow_log table, lock_time shows 0 for each query.
Is there any other reason for this behaviour?
Edit: Here is the SHOW CREATE TABLE output:
CREATE TABLE `roomlogs` (
  `roomId` varchar(200) CHARACTER SET latin1 DEFAULT NULL,
  `userID` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
  `enterTime` datetime DEFAULT NULL,
  `exitTime` datetime DEFAULT NULL,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `status` int(11) DEFAULT '0',
  `hidden` int(11) DEFAULT '0',
  `checkinId` int(11) DEFAULT '-1',
  PRIMARY KEY (`id`),
  KEY `RoomLogIndex` (`roomId`),
  KEY `RoomLogIDIndex` (`id`),
  KEY `USERID` (`userID`)
) ENGINE=InnoDB AUTO_INCREMENT=1064216 DEFAULT CHARSET=utf8
I can also see that this query runs very frequently, around 100,000 times per day (nearly continuously):
SELECT count(*) from roomlogs where roomId=proomId and status='0';
Because this query reads from the same table, does InnoDB block it or take a lock while the update query runs? I can see that when the above stored procedure is running more often, this query takes more time.
Here is the link for MySQL variables: https://docs.google.com/document/d/17_MVaU4yvpQfVDT83yhSjkLHsgYd-z2mg6X7GwvYZGE/edit?usp=sharing
roomlogs needs this 'composite' index:
INDEX(userId, `status`, enterTime)
I added an index recently so previous rows are not indexed.
Not true. Adding an INDEX indexes the entire table.
The default index type is BTree; no need to explicitly specify it.
Does the update query in the procedure take a lock?
It does some form of locking. What is the value of autocommit? Do you explicitly use BEGIN and COMMIT? Is the table ENGINE=InnoDB? Please provide SHOW CREATE TABLE.
In the mysql.slow_log table, lock_time shows 0 for each query.
The INSERT you show seems to be inserting the same row as the UPDATE. Maybe you need INSERT ... ON DUPLICATE KEY UPDATE ...?
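If the UPDATE and INSERT really target the same logical row (one per user per day), a single INSERT ... ON DUPLICATE KEY UPDATE could replace the pair. This is only a sketch: it assumes a per-day column and unique key that the current schema does not have.
-- hypothetical schema change providing a per-day uniqueness rule:
-- ALTER TABLE roomlogs ADD COLUMN enterDay DATE NOT NULL,
--   ADD UNIQUE KEY user_day (userID, enterDay);
INSERT INTO roomlogs (roomId, userId, enterDay, enterTime, exitTime, hidden, checkinId)
VALUES (pRoomId, puserId, DATE(ptime), ptime, pexit, phidden, pcheckid)
ON DUPLICATE KEY UPDATE exitTime = VALUES(exitTime), `status` = 1;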
Don't "hide an index column in a function"; instead of DATEDIFF(roomlogs.enterTime,NOW()) = 0, do
AND enterTime >= CURDATE()
AND enterTime < CURDATE() + INTERVAL 1 DAY
This allows the index to be used more fully.
KEY `RoomLogIndex` (`roomId`), Change to (roomId, status)
KEY `RoomLogIDIndex` (`id`), Remove, redundant with the PK
The buffer pool is only 97,517,568 bytes; make it more like 9G.
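Taken together, the index suggestions could be applied in one ALTER (a sketch; the index names are mine, and you should first confirm nothing else relies on the dropped indexes):
ALTER TABLE roomlogs
  ADD INDEX user_status_time (userID, `status`, enterTime),
  ADD INDEX room_status (roomId, `status`),
  DROP INDEX RoomLogIndex,
  DROP INDEX RoomLogIDIndex;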

MySQL: Enforce a unique column without using a unique key

I have a column whose data exceeds MySQL's index length limit, so I can't use a unique key.
There's a solution here that avoids a unique key: MySQL: Insert record if not exists in table
However, in the comments, people report problems when the same value appears in multiple rows. In my case, a lot of my values are 0, so I'll hit duplicate values very often.
I'm using Node and node-mysql to access the database. I'm thinking of keeping a variable that tracks all values currently being inserted: before inserting, I'd check whether the value is already being inserted and, if so, wait for that insert to finish, then continue as if the value had been inserted originally. However, I feel like this will be very error prone.
Here's part of my table schema:
CREATE TABLE `links` (
  `id` int(10) UNSIGNED NOT NULL,
  `url` varchar(2083) CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL,
  `likes` int(10) UNSIGNED NOT NULL,
  `tweets` int(10) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE `links`
  ADD PRIMARY KEY (`id`),
  ADD KEY `url` (`url`(50));
I cannot put a unique key on url because it can be 2083 bytes, which is over MySQL's key size limit. likes and tweets will often be 0, so the linked solution will not work.
Is there another possible solution?
If you phrase your INSERT in a certain way, you can make use of WHERE NOT EXISTS to check first if the URL does not exist before completing the insert:
INSERT INTO links (`url`, `likes`, `tweets`)
SELECT 'http://www.google.com', 10, 15 FROM DUAL
WHERE NOT EXISTS
(SELECT 1 FROM links WHERE url='http://www.google.com');
This assumes that the id column is a primary key/auto increment, and MySQL will automatically assign a value to it.
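A quick way to verify the behavior (example values assumed; ROW_COUNT() reports whether the insert actually happened):
INSERT INTO links (`url`, `likes`, `tweets`)
SELECT 'http://www.example.com', 0, 0 FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM links WHERE url = 'http://www.example.com');
SELECT ROW_COUNT(); -- 1 on the first run, 0 once the URL already exists
One caveat: without a unique index, two connections racing past the NOT EXISTS check at the same moment can still both insert, so the application-level serialization the question describes (or a periodic de-dupe) is still needed for full safety.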

Duplicate values in reference tables

Our application calls a stored procedure to normalize its data into reference tables, after which it inserts a record into the main table, partially containing values and partially containing ids that map to the reference tables. This is one of the stored procedures:
CREATE PROCEDURE `sp_name`(IN valueIn varchar(100), OUT valueOut int)
BEGIN
  declare countid int;

  select max(id) into valueOut from tableName where fieldName = valueIn;
  IF valueOut is NULL THEN
    start transaction with consistent snapshot;
    select count(*) into countid from tableName where fieldName = valueIn;
    IF countid = 0 THEN
      insert into tableName (fieldName) values (valueIn);
      select last_insert_id() into valueOut;
    ELSE
      select max(id) into valueOut from tableName where fieldName = valueIn;
    end IF;
    commit;
  end IF;
END
When called manually this works fine, but when called in production we end up with multiple duplicate values in the reference tables.
Transaction isolation level is REPEATABLE_READ.
Ref table:
CREATE TABLE `tableName` (
  `id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `fieldName` varchar(45) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=100 DEFAULT CHARSET=utf8
Using a unique key constraint on fieldName isn't a good option. We have tried it, but then instead of getting duplicate values, the auto increment skips ids. We are trying to preserve ids so that we do not need to over-allocate when choosing data types: our main table is huge (multi-billion rows), so we have to make efficient use of data types.
Anybody out there that understands this phenomenon?
There are a lot of hurdles you have to clear if you want to build your own replacement for auto_increment: serializability, concurrency, performance (usually related to locking), etc. The duplicates you see are one of them: under REPEATABLE READ, two concurrent calls can each see no existing row in their snapshot, and both insert.
I think the simplest solution might be to use auto_increment on a column of type bigint unsigned. The maximum value of an unsigned integer is 4,294,967,295: roughly 4x10^9. The maximum value of an unsigned bigint is 18,446,744,073,709,551,615: roughly 1.8x10^19.
The auto_increment will still skip id numbers, but that's by design, and it shouldn't cause trouble with a range of 1.8x10^19.
Before you commit to this path, test big numbers with your client software. Some still don't deal gracefully with bigint, signed or not.
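For reference, the conventional pattern this would replace the handwritten check with (a sketch using the table from the question; the id gaps it produces are the price of correctness):
ALTER TABLE tableName
  MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  ADD UNIQUE KEY uq_fieldName (fieldName);

-- racing sessions may both attempt the insert; the unique key guarantees
-- a single row, and LAST_INSERT_ID(expr) exposes the surviving id either way
INSERT INTO tableName (fieldName) VALUES ('some value')
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
SELECT LAST_INSERT_ID();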

Defining Composite Key with Auto Increment in MySQL

Scenario:
I have a table that references two foreign keys and, for each unique combination of those foreign keys, needs its own auto_increment sequence. I want a composite key that identifies each row uniquely using the two foreign keys plus the auto_increment column (there is also one other column, with non-unique values).
Table:
CREATE TABLE `issue_log` (
  `sr_no` INT NOT NULL AUTO_INCREMENT,
  `app_id` INT NOT NULL,
  `test_id` INT NOT NULL,
  `issue_name` VARCHAR(255) NOT NULL,
  PRIMARY KEY (app_id, test_id, sr_no)
);
Of course, there must be something wrong with my definition, because the error thrown is:
ERROR 1075: Incorrect table definition; there can be only one auto
column and it must be defined as a key
What I am trying to achieve:
I have an Application table (with app_id as its primary key); each application has a set of issues to be resolved, and each application has multiple tests (hence the test_id column).
The sr_no column should increment independently for each unique (app_id, test_id) combination, i.e. restart from 1 whenever a new combination appears.
The database engine is InnoDB.
I want to achieve this with as much simplicity as possible (i.e. avoid triggers/procedures if possible - which was suggested for similar cases on other Questions).
You can't have MySQL do this for you automatically for InnoDB tables - you would need to use a trigger or procedure, or use another DB engine such as MyISAM. With InnoDB, an AUTO_INCREMENT column produces a single table-wide sequence, never one per group.
Something like the following should work
DELIMITER $$
CREATE TRIGGER xxx BEFORE INSERT ON issue_log
FOR EACH ROW BEGIN
  -- compute the next sequence number for this (app_id, test_id) combination;
  -- note that concurrent inserts for the same combination can race here
  SET NEW.sr_no = (
    SELECT IFNULL(MAX(sr_no), 0) + 1
    FROM issue_log
    WHERE app_id = NEW.app_id
      AND test_id = NEW.test_id
  );
END $$
DELIMITER ;
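A quick check of the trigger (sample values assumed; this presumes issue_log is created without the AUTO_INCREMENT attribute on sr_no, since the trigger now assigns it):
INSERT INTO issue_log (app_id, test_id, issue_name) VALUES (1, 1, 'first');  -- sr_no = 1
INSERT INTO issue_log (app_id, test_id, issue_name) VALUES (1, 1, 'second'); -- sr_no = 2
INSERT INTO issue_log (app_id, test_id, issue_name) VALUES (1, 2, 'other');  -- sr_no = 1 for the new combination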
You can do this with the MyISAM and BDB engines; InnoDB does not support it. Quoting the MySQL 5.0 Reference Manual:
For MyISAM and BDB tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix.
http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html
I don't fully understand your increment requirement on the test_id column, but if you want an ~autoincrement sequence that restarts on every unique combination of (app_id, test_id), you can do an INSERT ... SELECT FROM the same table, like so:
INSERT INTO `issue_log` (`sr_no`, `app_id`, `test_id`, `issue_name`) SELECT
    IFNULL(MAX(`sr_no`), 0) + 1 /* next sequence number */,
    3 /* desired app_id */,
    1 /* desired test_id */,
    'Name of new row'
FROM `issue_log` /* specify the table name as well */
WHERE `app_id` = 3 AND `test_id` = 1; /* same values as in inserted columns */
This assumes a table definition with no declared AUTO_INCREMENT column. You're essentially emulating autoincrement behavior with the IFNULL(MAX()) + 1 clause, but the manual emulation works on arbitrary columns, unlike the built-in autoincrement.
Note that the INSERT ... SELECT being a single query ensures atomicity of the operation. InnoDB will gap-lock the appropriate index, and many concurrent processes can execute this kind of query while still producing non-conflicting sequences.
You can use a unique composite key on sr_no, app_id, and test_id. You cannot make sr_no auto-incrementing, as it is not unique on its own.
CREATE TABLE IF NOT EXISTS `issue_log` (
  `sr_no` int(11) NOT NULL,
  `app_id` int(11) NOT NULL,
  `test_id` int(11) NOT NULL,
  `issue_name` varchar(255) NOT NULL,
  UNIQUE KEY `app_id` (`app_id`,`test_id`,`sr_no`)
) ENGINE=InnoDB;
This is what I wanted
id tenant
1 1
2 1
3 1
1 2
2 2
3 2
1 3
2 3
3 3
My current table definition is
CREATE TABLE `test_trigger` (
  `id` BIGINT NOT NULL,
  `tenant` varchar(255) NOT NULL,
  PRIMARY KEY (`id`,`tenant`)
);
I created one table for storing the current id for each tenant.
CREATE TABLE `get_val` (
  `tenant` varchar(255) NOT NULL,
  `next_val` int NOT NULL,
  PRIMARY KEY (`tenant`,`next_val`)
) ENGINE=InnoDB;
Then I created this trigger which solve my problem
DELIMITER $$
CREATE TRIGGER trigger_name
BEFORE INSERT ON test_trigger
FOR EACH ROW
BEGIN
  -- the UPDATE row-locks this tenant's counter, serializing concurrent inserts
  UPDATE get_val SET next_val = next_val + 1 WHERE tenant = new.tenant;
  SET new.id = (SELECT next_val FROM get_val WHERE tenant = new.tenant);
END$$
DELIMITER ;
This approach is also thread safe, because the update query in the trigger makes inserts for the same tenant happen sequentially, while inserts for different tenants proceed in parallel.
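Note that get_val must be seeded with one row per tenant before the first insert; otherwise the trigger sets new.id to NULL and the insert fails. A quick check (sample tenant assumed):
INSERT INTO get_val (tenant, next_val) VALUES ('1', 0);
INSERT INTO test_trigger (tenant) VALUES ('1'); -- trigger assigns id = 1
INSERT INTO test_trigger (tenant) VALUES ('1'); -- id = 2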
Just add key(sr_no) on the auto-increment column. This resolves error 1075, but note that with InnoDB sr_no then forms a single table-wide sequence, not one per (app_id, test_id):
CREATE TABLE `issue_log` (
  `sr_no` INT NOT NULL AUTO_INCREMENT,
  `app_id` INT NOT NULL,
  `test_id` INT NOT NULL,
  `issue_name` VARCHAR(255) NOT NULL,
  PRIMARY KEY (app_id, test_id, sr_no),
  KEY (`sr_no`)
);
Why don't you try changing the order of the columns in the primary key? When you use AUTO_INCREMENT, that column has to come first in the key. For example:
CREATE TABLE `issue_log` (
  `sr_no` INT NOT NULL AUTO_INCREMENT,
  `app_id` INT NOT NULL,
  `test_id` INT NOT NULL,
  `issue_name` VARCHAR(255) NOT NULL,
  PRIMARY KEY (sr_no, app_id, test_id)
);

Need help with MySQL schema design - current schema requires dynamic SQL within a trigger

I imagine that I have designed my database badly, but I'm currently stumped by the fact that I need to use dynamic SQL in a trigger, and that's making MySQL unhappy.
The context is that I have created a membership database with several dozen tables, the main one of which is the 'member' table with a unique primary key 'id'. There are a number of other tables which have foreign keys referring to the member.id field.
Because the data has been gathered over many years with little dupe control, there is another field in the member table called superseded_by, which contains the id of the member record that supersedes this one. By default, superseded_by is set to the member's own id; any row whose superseded_by <> id is deemed to be a dupe.
Now the tricky part... when we identify a dupe, we want to set the superseded_by field to point to the new primary member and update all the tables whose foreign keys point to the now-redundant member id. I have tried to do this with an AFTER UPDATE trigger, and then tried to be clever by querying the foreign keys from information_schema and updating them with dynamic SQL.
This clearly doesn't work (Error Code: 1336 Dynamic SQL is not allowed in stored function or trigger).
I'm assuming there is a better way to design the schema / handle dupes which I haven't thought of.
Help please...
CODE SNIPPET:
-- ---
-- Table 'member'
-- ---
DROP TABLE IF EXISTS member;
CREATE TABLE member (
  id INTEGER AUTO_INCREMENT,
  superseded_by INTEGER DEFAULT NULL,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  date_of_birth DATE DEFAULT NULL,
  gender ENUM('M', 'F') DEFAULT NULL,
  mailing_address_id INTEGER DEFAULT NULL,
  last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  FOREIGN KEY (mailing_address_id) REFERENCES mailing_address (id),
  FOREIGN KEY (superseded_by) REFERENCES member (id)
);
DELIMITER $$
CREATE TRIGGER set_superseded_by_on_insert BEFORE INSERT ON member FOR EACH ROW
BEGIN
SET NEW.superseded_by = NEW.id;
END$$
-- Trigger to update other tables (volunteers, donations, presenters, etc.) when member's superseded_by record is updated
-- Assumes the new superseding person exists (they should also not be superseded by anyone themselves)
CREATE TRIGGER adjust_foreign_member_keys_on_superseded_by_update AFTER UPDATE ON member FOR EACH ROW
BEGIN
  DECLARE db, tbl, col VARCHAR(64);
  DECLARE no_more_rows BOOLEAN;
  DECLARE fks CURSOR FOR
    SELECT kcu.TABLE_SCHEMA, kcu.TABLE_NAME, kcu.COLUMN_NAME
    FROM information_schema.TABLE_CONSTRAINTS tc
    JOIN information_schema.KEY_COLUMN_USAGE kcu
      ON tc.table_schema = kcu.table_schema AND tc.constraint_name = kcu.constraint_name
    WHERE tc.constraint_type = 'FOREIGN KEY'
      AND kcu.REFERENCED_TABLE_NAME = 'member'
      AND kcu.REFERENCED_COLUMN_NAME = 'id';
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_rows = TRUE;

  IF NEW.superseded_by <> OLD.superseded_by THEN
    OPEN fks;
    SET no_more_rows = FALSE;
    update_loop: LOOP
      FETCH fks INTO db, tbl, col;
      IF no_more_rows THEN
        LEAVE update_loop;
      END IF;
      SET @fk_update_statement = CONCAT("UPDATE ", db, ".", tbl, " SET ", col, " = ", NEW.superseded_by, " WHERE ", col, " = ", NEW.id, ";");
      PREPARE stmt FROM @fk_update_statement;
      EXECUTE stmt;
      DEALLOCATE PREPARE stmt;
    END LOOP;
    CLOSE fks;
  END IF;
END$$
DELIMITER ;
Why are you trying to maintain duplicates in your main table? Seems like you'd be better off with a member table and a member_history table to track previous changes. You could do it by having a table that stored the field changed, date changed and the old and new values. Or you could just store the previous snapshot of the member table before updating it. For instance:
INSERT INTO member_history SELECT NULL, member.* FROM member WHERE id = ?;
UPDATE member SET [...] WHERE id = ?;
The schema for member_history would be nearly identical, except that member.id is stored as member_id and each history entry gets its own primary key (the leading NULL lets auto_increment fill in the history id).
CREATE TABLE member (
  id INTEGER AUTO_INCREMENT,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  date_of_birth DATE DEFAULT NULL,
  gender ENUM('M', 'F') DEFAULT NULL,
  mailing_address_id INTEGER DEFAULT NULL,
  last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  FOREIGN KEY (mailing_address_id) REFERENCES mailing_address (id)
);
CREATE TABLE member_history (
  id INTEGER AUTO_INCREMENT,
  member_id INTEGER NOT NULL,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  date_of_birth DATE DEFAULT NULL,
  gender ENUM('M', 'F') DEFAULT NULL,
  mailing_address_id INTEGER DEFAULT NULL,
  last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  FOREIGN KEY (member_id) REFERENCES member (id)
);
Notice that I removed the superseded_by field from the member table and the foreign key to mailing_address from the member_history table. You shouldn't need superseded_by any more, and keeping the foreign key in member_history isn't really necessary unless you're worried about dangling references in your history.
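If the history row should be captured automatically rather than by the application, a BEFORE UPDATE trigger could do it. This is a sketch only; note it needs no dynamic SQL, so it avoids error 1336:
DELIMITER $$
CREATE TRIGGER member_history_on_update
BEFORE UPDATE ON member FOR EACH ROW
BEGIN
  -- snapshot the old row; member_history.id and last_updated fill themselves in
  INSERT INTO member_history
    (member_id, first_name, last_name, date_of_birth, gender, mailing_address_id)
  VALUES
    (OLD.id, OLD.first_name, OLD.last_name, OLD.date_of_birth, OLD.gender, OLD.mailing_address_id);
END$$
DELIMITER ;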
Ok, just a couple of thoughts on this:
superseded_by references id on the same table and is in general equal to it - except where you were able to identify a dupe, in which case it points to another, already existing member's id.
Given that, we can safely assume that no superseded_by value will ever violate the foreign key constraint.
I further assume that the id and superseded_by fields of dupes that have not been identified yet are equal.
So, if all of the above is true, you may bend the foreign keys of the other related tables to reference superseded_by instead of id. This way you could cascade changes made to the dupe down to the other tables and still have the exact same constraint as before.
What do you think? Am I missing something?
Please note that this is an option only if you are using InnoDB rather than MyISAM.
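A sketch of that idea, with a hypothetical donations table and constraint name (InnoDB accepts a foreign key referencing the non-unique superseded_by column because member's own FOREIGN KEY already indexes it, though the manual discourages references to non-unique columns):
ALTER TABLE donations
  DROP FOREIGN KEY donations_member_fk,
  ADD CONSTRAINT donations_member_fk
    FOREIGN KEY (member_id) REFERENCES member (superseded_by)
    ON UPDATE CASCADE;

-- marking member 17 as a dupe of member 42 now cascades into donations.member_id
UPDATE member SET superseded_by = 42 WHERE id = 17;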
MySQL triggers and stored functions do not allow dynamic SQL (prepared statements); that limitation is exactly what produces error 1336. I hope this helps.