Duplicate values in reference tables - mysql

Our application calls a stored procedure to normalize its data into reference tables, after which it inserts a record into the main table that contains partly raw values and partly ids that map to the reference tables. This is one of the stored procedures:
CREATE PROCEDURE `sp_name`(IN valueIn varchar(100), OUT valueOut int)
BEGIN
declare maxid int;
declare countid int;
select max(id) into valueOut from tableName where fieldName=valueIn;
IF valueOut is NULL
THEN
start transaction with consistent snapshot;
select count(*) into countid from tableName where fieldName=valueIn;
IF countid=0
THEN
insert into tableName (fieldName) values (valueIn);
select last_insert_id() into valueOut;
ELSE
select max(id) into valueOut from tableName where fieldName=ValueIn;
end IF;
commit;
end IF;
END
When called manually this works fine, but when it is called in production we end up with multiple duplicate values in the reference tables.
Transaction isolation level is REPEATABLE_READ.
Ref table:
CREATE TABLE `tableName` (
`id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`fieldName` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=100 DEFAULT CHARSET=utf8
Using a unique key constraint on fieldName isn't a good option for us. We have tried this, but then instead of getting duplicate values we see the auto-increment skipping IDs. We are trying to preserve IDs so that we do not need to over-allocate when it comes to data types. Our main table is huge (multiple billions of rows), so we have to make efficient use of data types.
Anybody out there that understands this phenomenon?

There are a lot of hurdles you have to clear if you want to build your own replacement for auto_increment. You'll find problems of serializability, concurrency, performance (usually related to locking), etc.
I think the simplest solution might be to use auto_increment on a column of type bigint unsigned. The maximum value of an unsigned integer is 4,294,967,295: roughly 4x10^9. The maximum value of an unsigned bigint is 18,446,744,073,709,551,615: roughly 1.8x10^19.
The auto_increment will still skip id numbers, but that's by design, and it shouldn't cause trouble with a range of 1.8x10^19.
Before you commit to this path, test big numbers with your client software. Some still don't deal gracefully with bigint, signed or not.
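If the unique key is kept, one way to limit skipped ids is to look the value up first and attempt the insert only when it is missing, so an id is lost only when two sessions race on the same new value. This is a minimal sketch rather than a tested fix. It assumes the procedure is called with autocommit enabled, and the index name `uq_fieldName` and procedure name `sp_name_unique` are made up for illustration:

```sql
-- Existing duplicates must be cleaned up before this succeeds.
ALTER TABLE tableName ADD UNIQUE KEY uq_fieldName (fieldName);

DELIMITER $$
CREATE PROCEDURE sp_name_unique(IN valueIn VARCHAR(100), OUT valueOut INT)
BEGIN
  -- Ignore a duplicate-key error (1062) raised when another session
  -- inserts the same value between our lookup and our insert.
  DECLARE CONTINUE HANDLER FOR 1062 BEGIN END;

  -- valueOut stays NULL when no row matches.
  SELECT id INTO valueOut FROM tableName WHERE fieldName = valueIn;

  IF valueOut IS NULL THEN
    INSERT INTO tableName (fieldName) VALUES (valueIn);
    -- Re-read instead of using LAST_INSERT_ID(), so the id is correct
    -- whether this session or the competing one created the row.
    SELECT id INTO valueOut FROM tableName WHERE fieldName = valueIn;
  END IF;
END$$
DELIMITER ;
```

A failed insert still consumes an auto-increment value in InnoDB, so this reduces gaps to the racing case rather than eliminating them.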

Related

How can auto-Incrementing be maintained when concurrent transactions occur on a compound key In MYSQL?

I recently encountered an error in my application with concurrent transactions. Previously, auto-incrementing for the compound key was implemented in the application itself, in PHP. However, as I mentioned, ids got duplicated, and all sorts of issues followed which I painstakingly fixed manually afterward.
Now I have read about related issues and found suggestions to use trigger.
So I am planning on implementing a trigger somewhat like this.
DELIMITER $$
CREATE TRIGGER auto_increment_my_table
BEFORE INSERT ON my_table FOR EACH ROW
BEGIN
SET NEW.id = (SELECT COALESCE(MAX(id), 0) + 1 FROM my_table WHERE type = NEW.type);
END $$
DELIMITER ;
But my doubt regarding concurrency still remains. Like what if this trigger was executed concurrently and both got the same MAX(id) when querying?
Is this the correct way to handle my issue or is there any better way?
An example - how to solve autoincrementing in compound index.
CREATE TABLE test ( id INT,
type VARCHAR(192),
value INT,
PRIMARY KEY (id, type) );
-- create additional service table which will help
CREATE TABLE test_sevice ( type VARCHAR(192),
id INT AUTO_INCREMENT,
PRIMARY KEY (type, id) ) ENGINE = MyISAM;
-- create trigger which will generate id value for new row
CREATE TRIGGER tr_bi_test_autoincrement
BEFORE INSERT
ON test
FOR EACH ROW
BEGIN
INSERT INTO test_sevice (type) VALUES (NEW.type);
SET NEW.id = LAST_INSERT_ID();
END
db<>fiddle here
creating a service table just to auto increment a value seems less than ideal for me. – Mohamed Mufeed
This table is extremely tiny - you may delete all records except one per group with largest autoincremented value in this group anytime. – Akina
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=61f0dc36db25dd5f0cf4647d8970cdee
You can schedule the removal of excess rows (for example, daily) in a service event procedure.
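The scheduled cleanup mentioned in the comment could look roughly like the following. It is only a sketch: the event name `ev_prune_test_sevice` is invented here, and it assumes the event scheduler is enabled (`event_scheduler = ON`).

```sql
CREATE EVENT ev_prune_test_sevice
ON SCHEDULE EVERY 1 DAY
DO
  -- Keep only the row with the largest id per type. MyISAM derives the
  -- next id for a type from that row, so numbering continues unchanged.
  DELETE s
  FROM test_sevice AS s
  JOIN (SELECT type, MAX(id) AS max_id
        FROM test_sevice
        GROUP BY type) AS m
    ON m.type = s.type
  WHERE s.id < m.max_id;
```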
I have managed to solve this issue.
The answer was somewhat in the direction of Akina's answer, but not exactly.
The way I solved it did indeed involve an additional table, but not in the way he suggested.
I created an additional table to store metadata about transactions.
E.g., I had a table like this
CREATE TABLE `journals` (
`id` bigint NOT NULL AUTO_INCREMENT,
`type` smallint NOT NULL DEFAULT '0',
`trans_no` bigint NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `transaction` (`type`,`trans_no`)
)
So I created a meta_journals table like this
CREATE TABLE `meta_journals` (
`type` smallint NOT NULL,
`next_trans_no` bigint NOT NULL,
PRIMARY KEY (`type`)
)
and seeded it with all the different types of journals and the next sequence number.
Whenever I insert a new transaction into journals, I make sure to increment the next_trans_no of the corresponding type in the meta_journals table. This increment is issued inside the same database transaction, i.e. between the BEGIN and the COMMIT.
This lets me rely on the exclusive lock that the UPDATE statement acquires on the row of the meta_journals table. When two inserts for the same journal type are issued concurrently, one has to wait until the lock held by the other transaction is released by its COMMIT.
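The flow described above might be written as follows. This is a sketch under assumptions, not the exact code used: the type value `1` and the user variable `@trans_no` are illustrative, and both tables are assumed to be InnoDB.

```sql
START TRANSACTION;

-- Takes an exclusive row lock. A concurrent transaction for the same
-- type blocks here until this one commits or rolls back.
UPDATE meta_journals
SET next_trans_no = next_trans_no + 1
WHERE type = 1;

-- Read back the number that was just reserved
-- (the value the counter held before the increment).
SELECT next_trans_no - 1 INTO @trans_no
FROM meta_journals
WHERE type = 1;

INSERT INTO journals (type, trans_no) VALUES (1, @trans_no);

COMMIT;
```

Because the counter update and the insert commit together, a rollback returns the reserved number to the counter, so no gap is left behind.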

MYSQL: Partitioning Table keeping id unique

We are using a table with a schema like the following:
CREATE TABLE `user_subscription` (
`ID` varchar(40) NOT NULL,
`COL1` varchar(40) NOT NULL,
`COL2` varchar(30) NOT NULL,
`COL3` datetime NOT NULL,
`COL4` datetime NOT NULL,
`ARCHIVE` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`)
)
Now we want to partition on the ARCHIVE column. ARCHIVE can only have the values 0 or 1, so there would be 2 partitions.
In our case, we are using partitioning as an archival process. To partition, we need to make the ARCHIVE column part of the primary key. The problem is that 2 rows can then have the same ID with different ARCHIVE values. That is not the main problem for us, since the 2 rows will be in different partitions. The problem comes when we update the ARCHIVE value of one of them to move that row to the archive partition: the update is rejected with a "Duplicate entry" error.
Can somebody help in this regard?
Unfortunately,
A UNIQUE INDEX (or a PRIMARY KEY) must include all columns in the table's partitioning function
and since MySQL does not support check constraints either, the only (ugly) workaround I can think of is enforcing the uniqueness manually through triggers:
CREATE TABLE t (
id INT NOT NULL,
archived TINYINT(1) NOT NULL DEFAULT 0,
PRIMARY KEY (id, archived) -- required by MySQL limitation on partitioning
)
PARTITION BY LIST(archived) (
PARTITION pActive VALUES IN (0),
PARTITION pArchived VALUES IN (1)
);
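DELIMITER //
CREATE TRIGGER tInsert
BEFORE INSERT ON t FOR EACH ROW
CALL checkUnique(NEW.id)//
CREATE TRIGGER tUpdate
BEFORE UPDATE ON t FOR EACH ROW
BEGIN
-- only re-check when the id itself changes; otherwise the row being
-- updated (e.g. when flipping archived) would collide with itself
IF NEW.id <> OLD.id THEN
CALL checkUnique(NEW.id);
END IF;
END //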
CREATE PROCEDURE checkUnique(pId INT)
BEGIN
DECLARE flag INT;
DECLARE message VARCHAR(50);
SELECT id INTO flag FROM t WHERE id = pId;
IF flag IS NOT NULL THEN
-- the below tries to mimic the error raised
-- by a regular UNIQUE constraint violation
SET message = CONCAT("Duplicate entry '", pId, "'");
SIGNAL SQLSTATE "23000" SET
MYSQL_ERRNO = 1062,
MESSAGE_TEXT = message,
COLUMN_NAME = "id";
END IF;
END //
(fiddle)
MySQL's limitations on partitioning being such a downer (in particular its lack of support for foreign keys), I would advise against using it altogether until the table grows so large that it becomes an actual concern.

how to create unique sequence in mysql

How do I create a unique sequence number in MySQL?
The scenario is that in table1 a value, say "A" in row1, can appear more than once.
The first time it occurs a sequence number is assigned to it, and the same number is assigned to it each time it appears again.
But the value "B" (say, the next value entered) gets the next sequence number.
So I can't use auto_increment in this scenario. Say I have to check the conditions c1 and c2 for this unique sequence number.
I'm looking for a stored procedure to implement this. I hope my problem is clear.
CREATE TABLE `seq` (
`n` BIGINT NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`n`)
);
DELIMITER $$
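DROP FUNCTION IF EXISTS getseq$$
CREATE FUNCTION getseq() RETURNS BIGINT
MODIFIES SQL DATA
BEGIN
-- inserting NULL makes AUTO_INCREMENT allocate the next value
INSERT INTO `seq` (`n`) VALUES (NULL);
-- LAST_INSERT_ID() is tracked per connection, so concurrent callers
-- never see each other's value (SELECT MAX(n) could); COMMIT is not
-- allowed inside a stored function, so it is left to the caller
RETURN LAST_INSERT_ID();
END$$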
DELIMITER ;
Behavior under concurrent transactions should be checked, but I think it would work: the auto-increment counter is shared across transactions, while the id produced by your own insert (what LAST_INSERT_ID() returns) is private to your connection.

MySQL Auto-Inc Bug?

In my MySQL table I've created an ID column which I'm hoping to auto-increment in order for it to be the primary key.
I've created my table:
CREATE TABLE `test` (
`id` INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`name` VARCHAR( 50 ) NOT NULL ,
`date_modified` DATETIME NOT NULL ,
UNIQUE (
`name`
)
) ENGINE = INNODB;
then Inserted my records:
INSERT INTO `test` ( `id` , `name` , `date_modified` )
VALUES (
NULL , 'TIM', '2011-11-16 12:36:30'
), (
NULL , 'FRED', '2011-11-16 12:36:30'
);
I'm expecting the IDs for the above to be 1 and 2 (respectively). And so far this is true.
However when I do something like this:
insert into test (name) values ('FRED')
on duplicate key update date_modified=now();
and then insert a new record, I'm expecting its ID to be 3. Instead I get an ID of 4; the slot for 3 is skipped.
Normally this wouldn't be an issue, but I'm working with millions of records that get thousands of updates every day, and I don't want to have to think about running out of IDs simply because I'm skipping a ton of numbers.
Any clue as to why this is happening?
MySQL version: 5.1.44
Thank you
My guess is that the INSERT itself kicks off the code that generates the next ID number. When the duplicate key is detected, and ON DUPLICATE KEY UPDATE is executed, the ID number is abandoned. (No SQL dbms guarantees that automatic sequences will be without gaps, AFAIK.)
MySQL docs say
In general, you should try to avoid using an ON DUPLICATE KEY UPDATE
clause on tables with multiple unique indexes.
That page also says
If a table contains an AUTO_INCREMENT column and INSERT ... ON
DUPLICATE KEY UPDATE inserts or updates a row, the LAST_INSERT_ID()
function returns the AUTO_INCREMENT value.
which stops far short of describing the internal behavior I guessed at above.
Can't test here; will try later.
Is it possible to change your key to unsigned bigint? Its maximum of 18,446,744,073,709,551,615 is a lot of records, which would delay running out of IDs.
Found this in mysql manual http://dev.mysql.com/doc/refman/5.1/en/example-auto-increment.html
Use a large enough integer data type for the AUTO_INCREMENT column to hold the
maximum sequence value you will need. When the column reaches the upper limit of
the data type, the next attempt to generate a sequence number fails. For example,
if you use TINYINT, the maximum permissible sequence number is 127.
For TINYINT UNSIGNED, the maximum is 255.
More reading here: http://dev.mysql.com/doc/refman/5.6/en/information-functions.html#function_last-insert-id. One could infer that the abandoned insert into a transactional table behaves like a rollback, for which the manual says "LAST_INSERT_ID() is not restored to that before the transaction".
As a possible solution, what about using a separate table to generate the IDs and then inserting them into your main table as the PK using LAST_INSERT_ID()?
From the manual:
Create a table to hold the sequence counter and initialize it:
mysql> CREATE TABLE sequence (id INT NOT NULL);
mysql> INSERT INTO sequence VALUES (0);
Use the table to generate sequence numbers like this:
mysql> UPDATE sequence SET id=LAST_INSERT_ID(id+1);
mysql> SELECT LAST_INSERT_ID();
The UPDATE statement increments the sequence counter and causes the next call to
LAST_INSERT_ID() to return the updated value. The SELECT statement retrieves that
value. The mysql_insert_id() C API function can also be used to get the value.
See Section 20.9.3.37, “mysql_insert_id()”.
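Applied to the `test` table from the question, the counter would only be advanced when a new name is actually inserted. The following is a rough sketch: the procedure name `upsert_test` is hypothetical, and it assumes the `sequence` counter has first been set to the current `MAX(id)` of `test`.

```sql
DELIMITER $$
CREATE PROCEDURE upsert_test(IN p_name VARCHAR(50))
BEGIN
  DECLARE existing_id INT DEFAULT NULL;

  -- existing_id stays NULL when the name is not present yet.
  SELECT id INTO existing_id FROM test WHERE name = p_name;

  IF existing_id IS NULL THEN
    -- Advance the counter and remember the new value for this connection.
    UPDATE sequence SET id = LAST_INSERT_ID(id + 1);
    INSERT INTO test (id, name, date_modified)
    VALUES (LAST_INSERT_ID(), p_name, NOW());
  ELSE
    -- Existing row: no sequence number is consumed.
    UPDATE test SET date_modified = NOW() WHERE id = existing_id;
  END IF;
END$$
DELIMITER ;
```

Two sessions inserting the same new name at the same moment can still both draw a number, and the loser then fails on the unique key, so this narrows the gaps rather than ruling them out.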
It really is a bug, as you can see here: http://bugs.mysql.com/bug.php?id=26316
But apparently it was fixed in 5.1.47, and it was classified as an InnoDB plugin problem.
A duplicate report of the same problem is here: http://bugs.mysql.com/bug.php?id=53791. It refers back to the first bug page mentioned in this answer.

MySql autoincrement column increases by 10 problem

I am a customer of a hosting company that serves my MySQL database. Because of their replication setup, the auto-increment values increase by 10, which seems to be a common problem.
My question is: how can I (safely) simulate the auto-increment feature so that the column has consecutive IDs?
My idea was to implement some sequence mechanism to solve my problem, but I do not know if it is the best option. I found this code snippet on the web:
DELIMITER ;;
DROP TABLE IF EXISTS `sequence`;;
CREATE TABLE `sequence` (
`name` CHAR(16) NOT NULL,
`value` BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;;
DROP FUNCTION IF EXISTS `nextval`;;
CREATE FUNCTION `nextval`(thename CHAR(16) CHARSET latin1)
RETURNS BIGINT UNSIGNED
MODIFIES SQL DATA
SQL SECURITY DEFINER
BEGIN
INSERT INTO `sequence`
SET `name`=thename,
`value`=(@val:=@@auto_increment_offset)+@@auto_increment_increment
ON DUPLICATE KEY
UPDATE `value`=(@val:=`value`)+@@auto_increment_increment;
RETURN @val;
END ;;
DELIMITER ;
which seems correct to me. My second question is whether this solution is concurrency-safe. Of course the INSERT statement is, but what about the ON DUPLICATE KEY UPDATE part?
Thanks!
Why do you need to have it in the first place?
Even with auto_increment_increment == 1 you are not guaranteed that the auto-increment field in the table will have consecutive values (what if rows are deleted, hmm?).
With auto-increment the db engine simply guarantees that the field will be unique, nothing else, really.
EDIT: I want to reiterate: in my opinion, it is not a good idea to assume things like consecutive values in an auto-increment column, because it is going to bite you later.
EDIT2: Anyway, this can be "solved" by an "on insert" trigger
create trigger `sequence_b_ins` before insert on `sequence`
for each row
begin
-- coalesce covers the empty-table case, where max(id) is NULL
set NEW.id = (select coalesce(max(id), 0) + 1 from `sequence`);
end
Or something along these lines (sorry, not tested)
Another option would be to use a stored procedure to do the insert and have it either select the max id from your table, or keep another table holding the current id and update it as ids are used.