Alternative to Identity column - sql-server-2008

We have an Orders table with an identity column (OrderID), but our order number is composed of OrderType (2 chars), OrderYear (2 chars) and OrderID (6 chars), 10 chars in total (e.g. XX12123456).
This counter has a limitation: OrderID can only go up to 999999, after which the next order would have a 7-char ID. Obviously we cannot have duplicate order IDs.
So we have created a table prefilled with sequential OrderID and OrderYear values (from 100000 to 999999, order year from 12 to 16, for instance), plus a stored procedure that begins a transaction with SERIALIZABLE isolation level, takes the first unused order ID, marks it as used and commits the transaction.
Since this is our Orders table, I'm worried about deadlocks when executing the stored procedure that calculates the order ID, and about duplicate order IDs.
I'll test this with a console application that creates multiple concurrent threads and tries to extract order IDs, simulating a production load.
My doubts are:
Is there another method to simulate an identity column safely?
Should I consider using triggers?
Should I consider a different isolation level?
Other ideas? :D
Thanks!
[EDIT]
After googling and reading a bunch of MSDN documentation, I've found many examples showing how to handle errors and deadlocks and how to implement a kind of automatic retry directly in the SP, as follows:
CREATE PROCEDURE [dbo].[sp_Ordine_GetOrderID]
    @AnnoOrdine AS NVARCHAR(2) = NULL OUTPUT,
    @IdOrdine AS INT = NULL OUTPUT
AS
SET NOCOUNT ON
DECLARE @retry AS INT
SET @retry = 2
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
WHILE (@retry > 0)
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION OrderID
        SELECT TOP 1 @AnnoOrdine = AnnoOrdine, @IdOrdine = IdOrdine
        FROM ORDINI_PROGRESSIVI --WITH (ROWLOCK)
        WHERE Attivo = 1
        --ORDER BY AnnoOrdine ASC, IDOrdine ASC
        UPDATE ORDINI_PROGRESSIVI WITH (UPDLOCK)
        SET Attivo = 0
        WHERE AnnoOrdine = @AnnoOrdine AND IdOrdine = @IdOrdine
        IF ISNULL(@IdOrdine, '') = '' OR ISNULL(@AnnoOrdine,'') = ''
        BEGIN
            RAISERROR('Deadlock', 1, 1205)
        END
        SET @retry = 0
        COMMIT TRANSACTION OrderID
        SELECT @AnnoOrdine AS AnnoOrdine, @IdOrdine AS IdOrdine
    END TRY
    BEGIN CATCH
        IF (ERROR_NUMBER() = 1205)
            SET @retry = @retry - 1;
        ELSE
            SET @retry = -1;
        IF XACT_STATE() <> 0
            ROLLBACK TRANSACTION;
    END CATCH
END
This approach reduces deadlocks (none occurred at all), but sometimes I get an EMPTY output parameter.
Tested with 30 concurrent threads (so, 30 customer processes inserting orders at the same moment).
Here is a debug log with query durations, in milliseconds: http://nopaste.info/285f558758.html
Is this robust enough for production?

If you do discover that the current solution is creating problems (and it's possible that it won't), an alternative would be to have one table per ID type you want to generate, each with an identity column and a dummy field,
i.e.:
CREATE TABLE ABTypeID (ABID int identity(1,1), dummy varchar(1))
You can then insert a record into this table and use the built-in functions to retrieve the identity value,
i.e.:
insert ABTypeID (dummy) values (null)
select Scope_Identity()
You can delete from these tables whenever you like, and truncate them at year end to reset the ID counters.
You can even wrap the insert in a transaction that gets rolled back: the identity counter is not reset by the rollback.
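Whichever counter supplies the ID, the 10-character order number described in the question still has to be assembled and guarded against running past 6 digits. A minimal sketch of that step in Python, only as an illustration; the function name format_order_number and its range checks are assumptions, not something from the original posts:

```python
def format_order_number(order_type, order_year, order_id):
    """Build OrderType (2 chars) + OrderYear (2 digits) + OrderID (6 digits)."""
    if len(order_type) != 2:
        raise ValueError("order_type must be exactly 2 characters")
    if not 0 <= order_year <= 99:
        raise ValueError("order_year must fit in 2 digits")
    if not 0 < order_id <= 999999:
        # The counter has run out of 6-digit values for this type/year.
        raise ValueError("order_id must be between 1 and 999999")
    return f"{order_type}{order_year:02d}{order_id:06d}"


order_number = format_order_number("XX", 12, 123456)
print(order_number)
```

Failing loudly at 1000000 is a design choice here: it surfaces an exhausted counter instead of silently producing an 11-character number.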

Related

Creating unique sequential bank account numbers without dead lock and race condition

I need to create a unique account number for users signing up on my web application. The account number will be created by incrementing the numbers
user 1 - 9898000000001
user 2 - 9898000000002
...
I have the following stored procedure in a MySQL database
(consider bank_id as '9898' below).
BEGIN
    SET @initialComId = '0000001';
    SET @table_value = null;
    SET @t = null;
    SELECT max(company_va) INTO @table_value FROM virtual_account_numbers;
    IF ((@table_value) is null and bank_id =1) then
    SET @newComId = CONCAT(bank_id,'000001');
    INSERT INTO virtual_account_numbers (company_va,partner_banks_id) VALUES (@newComId,bank_id);
    SELECT company_va from virtual_account_numbers ORDER BY virtual_account_numbers_id DESC LIMIT 1;
END
With the above stored procedure, I run into a deadlock if 10 users register at the same time.
Is there a better solution to this? The account number cannot be a randomly generated number; it must be an incremented one.
All together:
Let the database manage the sequence via AUTO_INCREMENT
Use LPAD to pad with zeros. Example with 1: LPAD('1', 10, '0') = 0000000001
Write the new account number into a VARCHAR field
To aid a search, put an INDEX on the account number field.
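The steps above can be sketched end to end. This is only an illustration in Python, using an in-memory SQLite database as a stand-in for MySQL (AUTOINCREMENT instead of AUTO_INCREMENT, str.zfill instead of LPAD). The table name accounts, the helper create_account and the 9-digit padding width are assumptions chosen so that user 1 becomes 9898000000001 as in the question:

```python
import sqlite3

BANK_PREFIX = "9898"  # bank_id from the question
SEQ_WIDTH = 9         # assumed width, so user 1 becomes 9898000000001


def create_account(conn):
    # Let the database hand out the sequence value.
    cur = conn.execute("INSERT INTO accounts (account_number) VALUES (NULL)")
    seq = cur.lastrowid
    # Pad with zeros (the LPAD step) and store the result as text.
    number = BANK_PREFIX + str(seq).zfill(SEQ_WIDTH)
    conn.execute(
        "UPDATE accounts SET account_number = ? WHERE id = ?", (number, seq)
    )
    conn.commit()
    return number


conn = sqlite3.connect(":memory:")
# UNIQUE also creates the index that makes lookups by account number fast.
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "account_number TEXT UNIQUE)"
)
first = create_account(conn)
second = create_account(conn)
print(first, second)
```

Because the database generates the sequence value, there is no SELECT MAX(...) step for concurrent sign-ups to race on.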

MySQL fast check if hash exists

I'm trying to create a MySQL function which takes n and m as input and generates n random unique combinations of m ids from the result of a query.
The function will return one combination per call, and that combination must be distinct from all previous combinations.
During generation it must check another table: if the combination already exists, it keeps looping until it finds a unique one. It returns the combination as dash-separated ids, or false if there is no room left for a unique combination.
So I'm getting 100 random items like this:
SELECT
`Item`.`id`
FROM
`Item`
LEFT JOIN `ItemKeyword` ON `Item`.`id` = `ItemKeyword`.`ItemID`
WHERE
(`Item`.`user_id` = '2')
AND(`ItemKeyword`.`keywordID` = 7130)
AND(`Item`.`type` = 1)
ORDER BY RAND()
LIMIT 100
Past combinations are stored as the md5 of the itemIDs concatenated with -.
So I need to concatenate the result of this query with - and take its md5. Then I send another query to a second table named Combination and check against its hash column whether it already exists. This loop continues until I have n results.
I can't figure out how to achieve this correctly and fast. Any suggestions?
Update:
Whole SQL Dump is here: https://gist.github.com/anonymous/e5eb3bf1a10f9d762cc20a8146acf866
If you are testing for uniqueness via the md5, you need to sort the list before taking the md5. This can be demonstrated with SELECT MD5('1-2'), MD5('2-1');
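The same point can be shown in application code. A small Python sketch, only as an illustration; the helper name combination_hash is hypothetical:

```python
import hashlib


def combination_hash(item_ids):
    """MD5 of the ids joined by '-', sorted first so order does not matter."""
    joined = "-".join(str(i) for i in sorted(item_ids))
    return hashlib.md5(joined.encode("ascii")).hexdigest()


# Without sorting, the same two ids give two different hashes.
unsorted_differs = (
    hashlib.md5(b"1-2").hexdigest() != hashlib.md5(b"2-1").hexdigest()
)
# With sorting, both orders map to one hash.
sorted_matches = combination_hash([2, 1]) == combination_hash([1, 2])
print(unsorted_differs, sorted_matches)
```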
Get rid of LEFT; it seems useless. After that, the Optimizer can choose to start with ItemKeyword instead of Item. (Without knowing the distribution of the data, I cannot say whether this might help.)
(It would be helpful if you provided SHOW CREATE TABLE for each table. In their absence, I will assume you are using InnoDB and have PRIMARY KEY(id) and PRIMARY KEY(keywordID).)
'Composite' indexes needed:
Item: INDEX(user_id, type, id)
ItemKeyword: INDEX(ItemID, keywordID)
ItemKeyword smells like a many:many mapping table. Most such tables can be improved, starting with tossing the id. See 7 tips on many:many .
I am somewhat lost in your secondary processing.
My tips on RAND may or may not be helpful.
Schema Critique
A PRIMARY KEY is a UNIQUE KEY is an INDEX; eliminate redundant indexes.
INT(4) -- the (4) means nothing; INT is always 32-bits (4 bytes) with a large range. See SMALLINT UNSIGNED (2 bytes, 0..64K range).
An MD5 should be declared CHAR(32) CHARACTER SET ascii, not 255, not utf8. (latin1 is OK.)
The table Combination (id + hash) seems to be useless. Instead, simply change KEY md5 (md5) USING BTREE, to UNIQUE(md5) in the table Item.
You have started toward utf8mb4 with SET NAMES utf8mb4;, yet the tables (and their columns) are still utf8. Emoji and Chinese need utf8mb4; most other text does not.
After addressing these issues, the original Question may be solved (as well as doing some cleanup). If not, please add some further clarification.
Minified
1. Get a sorted list of m unique ids. (I need "sorted" for the next step, and since you are looking for "combinations", it seems that "permutations" are not needed.)
SELECT GROUP_CONCAT(id) AS list
FROM (
SELECT id FROM tbl
ORDER BY RAND()
LIMIT $m
) AS x;
2. Check for uniqueness. Do this by taking MD5(list) (from above) and checking in a table of 'used' md5's. Note: Unless you are asking for a lot of combinations among a small list of ids, dups are unlikely (though not impossible).
3. Deliver the list. However, it is a string of ids separated by commas. Splitting this is best done in application code, not MySQL functions.
4. What will you do with the list? This could be important because it may be convenient to fold step 4 in with step 3.
Bottom line: I would do only step 1 and part of step 2 in SQL; I would build a 'function' in the application code to do the rest.
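As a rough idea of what the application-side 'function' could look like, here is a sketch in Python. It is only an illustration: it keeps the used hashes in an in-memory set instead of a table, picks the ids with random.sample instead of ORDER BY RAND(), and the name unique_combination and the max_tries parameter are assumptions.

```python
import hashlib
import random


def unique_combination(ids, m, used_hashes, max_tries=20):
    """Return a sorted list of m ids not returned before, or None."""
    if m > len(ids):
        return None
    for _ in range(max_tries):
        # Step 1: pick m distinct ids and sort them.
        combo = sorted(random.sample(ids, m))
        # Step 2: check the md5 of the comma-separated list for uniqueness.
        key = ",".join(str(i) for i in combo)
        digest = hashlib.md5(key.encode("ascii")).hexdigest()
        if digest not in used_hashes:
            used_hashes.add(digest)
            # Step 3: deliver the list, already split into ids.
            return combo
    return None  # probably ran out of combinations


used = set()
ids = [1, 2, 3, 4]
# There are only 6 combinations of 2 ids out of 4.
results = [unique_combination(ids, 2, used, max_tries=1000) for _ in range(6)]
exhausted = unique_combination(ids, 2, used)
print(results, exhausted)
```

Returning a list rather than a delimited string avoids the splitting step mentioned in step 3.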
Permutations
DROP FUNCTION IF EXISTS unique_perm;
DELIMITER //
CREATE FUNCTION unique_perm()
    RETURNS VARCHAR(255) CHARACTER SET ascii
    NOT DETERMINISTIC
    SQL SECURITY INVOKER
BEGIN
    SET @n := 0;
    iterat: LOOP
        SELECT SUBSTRING_INDEX(
                   GROUP_CONCAT(province ORDER BY RAND() SEPARATOR '-'),
                   '-', 3) INTO @list -- Assuming you want M=3 items
            FROM world.provinces;
        SET @md5 := MD5(@list);
        INSERT IGNORE INTO md5s (md5) VALUES (@md5); -- To prevent dups
        IF ROW_COUNT() > 0 THEN -- Check for dup
            RETURN @list; -- Got a unique permutation
        END IF;
        SET @n := @n + 1;
        IF @n > 20 THEN
            RETURN NULL; -- Probably ran out of combinations
        END IF;
    END LOOP iterat;
END;
//
DELIMITER ;
Output:
mysql> SELECT unique_perm(), unique_perm(), unique_perm()\G
*************************** 1. row ***************************
unique_perm(): New Brunswick-Nova Scotia-Quebec
unique_perm(): Alberta-Northwest Territories-New Brunswick
unique_perm(): Manitoba-Quebec-Prince Edward Island
1 row in set (0.01 sec)
Notes:
I hard-coded M=3; adjust as needed. (It could be passed in as an arg.)
Change column and table names for your needs.
Without the test on @n, you could get stuck in a loop if you run out of combinations. (However, if N is even modestly large, that is 'impossible', so you could remove the test.)
If M is large enough, you will need to increase @@group_concat_max_len. Also, the size in RETURNS.
CREATE TABLE md5s ( md5 CHAR(32) CHARACTER SET ascii PRIMARY KEY ) ENGINE=InnoDB is needed. And, you will need to TRUNCATE md5s between batches of calls to this function.
That is a working example.
Flaw: It gives unique permutations, not unique combinations. If that is not adequate, read on...
Combinations
DROP FUNCTION IF EXISTS unique_comb;
DELIMITER //
CREATE FUNCTION unique_comb()
    RETURNS VARCHAR(255) CHARACTER SET ascii
    NOT DETERMINISTIC
    SQL SECURITY INVOKER
BEGIN
    SET @n := 0;
    iterat: LOOP
        SELECT GROUP_CONCAT(province ORDER BY province SEPARATOR '-') INTO @list
            FROM ( SELECT province FROM world.provinces
                       ORDER BY RAND() LIMIT 2 ) AS x; -- Assuming you want M=2 items
        SET @md5 := MD5(@list);
        INSERT IGNORE INTO md5s (md5) VALUES (@md5); -- To prevent dups
        IF ROW_COUNT() > 0 THEN -- Check for dup
            RETURN @list; -- Got a unique combination
        END IF;
        SET @n := @n + 1;
        IF @n > 20 THEN
            RETURN NULL; -- Probably ran out of combinations
        END IF;
    END LOOP iterat;
END;
//
DELIMITER ;
Output:
mysql> SELECT unique_comb(), unique_comb(), unique_comb()\G
*************************** 1. row ***************************
unique_comb(): Quebec-Yukon
unique_comb(): Ontario-Yukon
unique_comb(): New Brunswick-Nova Scotia
1 row in set (0.01 sec)
Notes:
The subquery adds some to the cost.
Note that the items in each output string are now (necessarily) ordered.

How to append an auto-incrementing value to a duplicate value?

I have access to a reporting dataset (that I don't control) that we retrieve daily from a cloud service and store in a mysql db to run advanced reporting and report combining locally with 3rd party data visualization software.
The data often has duplicate values on an id field that create problems when joining with other tables for data analysis.
For example:
+-------------+----------+------------+----------+
| workfile_id | zip_code | date | total |
+-------------+----------+------------+----------+
| 78002 | 90210 | 2016-11-11 | 2010.023 |
| 78002 | 90210 | 2016-12-22 | 427.132 |
+-------------+----------+------------+----------+
Workfile_id is duplicated because this is the same job, but additional work on the job was performed in a different month than the original work. Instead of the software creating another workfile id for the job, the same is used.
Doing joins with other tables on workfile_id is problematic when more than one of the same id is present, so I was wondering if it is possible to do one of two things:
Make duplicate workfile_id's unique. Have sql append a number to the workfile id when a duplicate is found. The first duplicate (or second occurrence of the same workfile id) would need to get a .01 appended to the end of the workfile id. Then later, if another duplicate is inserted, it would need to auto increment the appended number, say .02, and so on with any subsequent duplicate workfile_id. This method would work best with our data but I'm curious how difficult this would be for the server from a performance perspective. If I could schedule the alteration to take place after the data is inserted to speed up the initial data insert, that would be ideal.
Sum total columns and remove duplicate workfile_id row. Have a task that identifies duplicate workfile_ids and sums the financial columns of the duplicates, replacing the original total with new sum and deleting the 'new row' after the columns have been added together.
This is more messy from a data preservation perspective, but is acceptable if the first solution isn't possible.
My assumption is that there will be significant overhead in having the server compare new workfile_id values to all existing workfile_id values each time data is inserted, but our dataset is small and new data is only inserted once daily, at 1:30am, and it should also be feasible to limit the duplicate workfile_id search to rows inserted within the last 6 months.
Is finding duplicates in a column (workfile_id) and appending an auto-incrementing value onto the workfile_id possible?
EDIT:
I'm having trouble getting my trigger to work based on sdsc81's answer below.
Any ideas?
DELIMITER //
CREATE TRIGGER append_subID_to_workfile_ID_salesjournal
AFTER INSERT
ON salesjournal FOR EACH ROW
BEGIN
    SET @COUNTER = ( SELECT (COUNT(*)-1) FROM salesjournal WHERE workfile_id = NEW.workfile_id );
    IF @COUNTER > 1 THEN
        UPDATE salesjournal SET workfile_id = CONCAT(workfile_id, @COUNTER) WHERE id = NEW.id;
    END IF;
END;//
DELIMITER ;
It's hard to know if the trigger isn't working at all, or if just the code in the trigger isn't working. I get no errors on insert. Is there any way to debug trigger errors?
Well, everything is possible ;)
You don't control the dataset, but you can modify the database, right?
Then you could use a trigger after every insert of a new value, and update it if it's a duplicate. Something like:
SET @COUNTER = ( SELECT (COUNT(*)-1) FROM *your_table* WHERE workfile_id = NEW.workfile_id );
IF @COUNTER > 1 THEN
    UPDATE *your_table* SET workfile_id = CONCAT(workfile_id, @COUNTER) WHERE some_unique_id = NEW.some_unique_id;
END IF;
If there is only one insert a day, and there is an index defined on workfile_id, then it shouldn't be any problem for your server at all.
Also, you could implement the second solution, doing:
DELIMITER //
CREATE TRIGGER append_subID_to_workfile_ID_salesjournal
AFTER INSERT ON salesjournal FOR EACH ROW
BEGIN
    SET @COUNTER = ( SELECT (COUNT(*)-1) FROM salesjournal WHERE workfile_id = NEW.workfile_id );
    IF @COUNTER > 1 THEN
        UPDATE salesjournal SET total = total + NEW.total WHERE workfile_id = NEW.workfile_id AND id <> NEW.id;
        DELETE FROM salesjournal WHERE id = NEW.id;
    END IF;
END;//
DELIMITER ;
Hope this helps.
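For the first option in the question (appending .01, .02, ... to repeated ids), the renaming rule itself can also be sketched outside the database, for example in a scheduled job that runs after the nightly import. The following is only an illustration in Python; the function name suffix_duplicates and the assumption that the ids arrive in insertion order are mine, not part of the posts above.

```python
from collections import defaultdict


def suffix_duplicates(workfile_ids):
    """Keep the first occurrence as-is; later ones get .01, .02, ..."""
    seen = defaultdict(int)  # how many times each id has appeared so far
    result = []
    for wid in workfile_ids:
        count = seen[wid]
        result.append(wid if count == 0 else f"{wid}.{count:02d}")
        seen[wid] += 1
    return result


renamed = suffix_duplicates(["78002", "78002", "78003", "78002"])
print(renamed)
```

With two-digit suffixes this scheme covers up to 99 repeats of one id; more than that would need a wider suffix.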

Using an UPDATE Trigger to Modify the Row Being Updated

I have an employee table (em) that contains, among other things, a floor Id (fl_id) and room Id (rm_id). In certain circumstances (when em_loc_cnt = 0) I want to set the em record's fl_id and rm_id to null. Below is my code so far.
I am not sure how to refer to the fl_id & rm_id in the commented line. Will I run into issues because this trigger is being called as a result of the em record being updated and I am updating that same record in the trigger?
Suggestions?
IF EXISTS (SELECT * FROM [sysobjects] WHERE [name] = 'em_upd_self_serv_t' AND [type] = 'TR')
BEGIN
    DROP TRIGGER [dbo].[em_upd_self_serv_t]
END
GO
CREATE TRIGGER [dbo].[em_upd_self_serv_t]
ON [dbo].[em]
AFTER UPDATE
AS
BEGIN
    DECLARE @em_id VARCHAR(35),
            @em_loc_cnt INT;
    SET @em_id = (SELECT em_id FROM inserted);
    SET @em_loc_cnt = (SELECT COUNT(*) FROM emlocs WHERE em_id = @em_id);
    IF (@em_loc_cnt = 0)
    BEGIN
        -- I want to set the fl_id and the rm_id to NULL
    END
END;
Your fundamental flaw is that you seem to expect the trigger to be fired once per row - this is NOT the case in SQL Server. Instead, the trigger fires once per statement, and the pseudo table Inserted might contain multiple rows.
Given that that table might contain multiple rows - which one do you expect will be selected here??
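SET @em_id = (SELECT em_id FROM inserted);
It's not well defined: as written, with SET and a scalar subquery, the statement fails outright when Inserted holds more than one row, and the SELECT @em_id = em_id FROM inserted form would silently pick the value from an arbitrary row.
You need to rewrite your entire trigger with the knowledge that Inserted WILL contain multiple rows! You need to work with set-based operations - don't expect just a single row in Inserted!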
You need to change your code to something like this:
IF EXISTS (SELECT * FROM sys.triggers WHERE [name] = 'em_upd_self_serv_t')
    DROP TRIGGER [dbo].[em_upd_self_serv_t]
GO
CREATE TRIGGER [dbo].[em_upd_self_serv_t]
ON [dbo].[em]
AFTER UPDATE
AS
    -- Set-based: handles every updated row in one statement.
    -- Columns are qualified with aliases so em_id is not ambiguous.
    UPDATE e
    SET fl_id = NULL, rm_id = NULL
    FROM dbo.em e
    INNER JOIN Inserted i ON e.em_id = i.em_id
    WHERE NOT EXISTS (SELECT * FROM dbo.emlocs l WHERE l.em_id = i.em_id)

MySQL - If exists, get primary key. Else, add entry

My table has two columns: "id" (Auto Increment, Primary) and "number" (Unique). Now I want to do the following:
if the number already exists, return the id;
else, add entry to the table and return its id.
What's the most efficient method to do this job?
Note:
There is a greater probability that the number is new;
The table will contain hundreds of thousands of records.
Thank you!
INSERT IGNORE INTO table (number) VALUES (42);
SELECT id FROM table WHERE number = 42;
That's probably the most efficient in MySQL. You could use a Stored Procedure to lump them up, which may or may not be slightly more efficient.
EDIT:
If you think it's going to be rare that new numbers come up, this will be even faster:
SELECT id FROM table WHERE number = 42;
if (!id) {
INSERT INTO table (number) VALUES (42);
id = SELECT LAST_INSERT_ID();
}
There is a possible race condition here if concurrent threads simultaneously select then insert the same number at the same time. In this case, the later insert will fail. You could recover from this by re-selecting on this error condition.
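The insert-then-select variant from the start of this answer avoids that race, because the unique constraint decides which insert wins and every caller then reads the same row. A minimal sketch in Python, only as an illustration: it uses an in-memory SQLite database as a stand-in for MySQL (INSERT OR IGNORE is SQLite's spelling of INSERT IGNORE), and the table name numbers and helper get_or_create_id are assumptions.

```python
import sqlite3


def get_or_create_id(conn, number):
    # Insert if missing; a duplicate is silently skipped by the UNIQUE key.
    conn.execute("INSERT OR IGNORE INTO numbers (number) VALUES (?)", (number,))
    # Either way, the row now exists, so read back its id.
    row = conn.execute(
        "SELECT id FROM numbers WHERE number = ?", (number,)
    ).fetchone()
    conn.commit()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE numbers ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "number INTEGER UNIQUE)"
)
first = get_or_create_id(conn, 42)
again = get_or_create_id(conn, 42)
other = get_or_create_id(conn, 7)
print(first, again, other)
```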
Here is one such stored function that does what you describe:
CREATE FUNCTION `spNextNumber`(pNumber int) RETURNS int(11)
BEGIN
DECLARE returnValue int;
SET returnValue := (SELECT Number FROM Tbl WHERE Number = pNumber LIMIT 1);
IF returnValue IS NULL THEN
INSERT IGNORE INTO Tbl (Number) VALUES (pNumber);
SET returnValue := pNumber; -- LAST_INSERT_ID() can give you the real, surrogate key
END IF;
RETURN returnValue;
END
I know this is old, but it is a common problem. So, for the sake of anyone searching for a solution here are 4 different ways to accomplish this task with performance benchmarks. http://mikefenwick.com/blog/insert-into-database-or-return-id-of-duplicate-row-in-mysql/.