MySQL fast check if hash exists

I'm trying to create a MySQL function which takes n and m as input and generates n random, unique combinations of m ids from the result of a query.
The function will return one combination per call, and that combination must be distinct from all previous combinations.
During generation it must check another table: if the combination already exists, it should keep looping until the combination is unique. It should return the combination as dash-separated ids, or false if there is no room left for a unique combination.
So I'm getting 100 random items like this:
SELECT
`Item`.`id`
FROM
`Item`
LEFT JOIN `ItemKeyword` ON `Item`.`id` = `ItemKeyword`.`ItemID`
WHERE
(`Item`.`user_id` = '2')
AND(`ItemKeyword`.`keywordID` = 7130)
AND(`Item`.`type` = 1)
ORDER BY RAND()
LIMIT 100
Past combinations are stored as the MD5 of the item IDs concatenated with -.
So I need to concatenate the result of this query with - and take its MD5, then query a second table named Combination to check whether that hash exists in its hash column, and continue this loop until I get n results.
I can't figure out how to achieve this correctly and fast. Any suggestions?
Update:
Whole SQL Dump is here: https://gist.github.com/anonymous/e5eb3bf1a10f9d762cc20a8146acf866

If you are testing for uniqueness via the md5, you need to sort the list before taking the md5. This can be demonstrated with SELECT MD5('1-2'), MD5('2-1');
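For illustration, here is the same effect reproduced with Python's hashlib, assuming the application rather than MySQL builds the key. `md5_hex` and `combo_key` are hypothetical helper names. Sorting the ids before joining them with `-` is what makes every ordering of one combination map to the same hash:

```python
import hashlib


def md5_hex(s: str) -> str:
    # Lowercase 32-hex-digit digest, the same form MySQL's MD5() returns
    return hashlib.md5(s.encode("ascii")).hexdigest()


def combo_key(ids) -> str:
    # Sort numerically first so 2-1 and 1-2 produce one key
    return md5_hex("-".join(str(i) for i in sorted(ids)))


print(md5_hex("1-2") == md5_hex("2-1"))        # False: order changes the hash
print(combo_key([2, 1]) == combo_key([1, 2]))  # True: sorted before hashing
```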
Get rid of LEFT; it seems useless, because the WHERE test on ItemKeyword.keywordID already discards the NULL rows that a LEFT JOIN would add. After that, the Optimizer can choose to start with ItemKeyword instead of Item. (Without knowing the distribution of the data, I cannot say whether this might help.)
(It would be helpful if you provided SHOW CREATE TABLE for each table. In their absence, I will assume you are using InnoDB and have PRIMARY KEY(id) and PRIMARY KEY(keywordID).)
'Composite' indexes needed:
Item: INDEX(user_id, type, id)
ItemKeyword: INDEX(ItemID, keywordID)
ItemKeyword smells like a many:many mapping table. Most such tables can be improved, starting with tossing the id. See 7 tips on many:many.
I am somewhat lost in your secondary processing.
My tips on RAND may or may not be helpful.
Schema Critique
A PRIMARY KEY is a UNIQUE KEY is an INDEX; eliminate redundant indexes.
INT(4) -- the (4) means nothing; INT is always 32 bits (4 bytes) with a large range. Consider SMALLINT UNSIGNED (2 bytes, 0..64K range).
An MD5 should be declared CHAR(32) CHARACTER SET ascii, not 255, not utf8. (latin1 is OK.)
The table Combination (id + hash) seems to be useless. Instead, simply change KEY md5 (md5) USING BTREE, to UNIQUE(md5) in the table Item.
You have started toward utf8mb4 with SET NAMES utf8mb4;, yet the tables (and their columns) are still utf8. Emoji and Chinese need utf8mb4; most other text does not.
After addressing these issues, the original Question may be solved (as well as doing some cleanup). If not, please add some further clarification.
Minified
1. Get a sorted list of m unique ids. (I need "sorted" for the next step, and since you are looking for "combinations", it seems that "permutations" are not needed.)
SELECT GROUP_CONCAT(id ORDER BY id) AS list
FROM (
SELECT id FROM tbl
ORDER BY RAND()
LIMIT $m
) AS x;
2. Check for uniqueness. Do this by taking MD5(list) (from above) and checking in a table of 'used' md5's. Note: Unless you are asking for a lot of combinations among a small list of ids, dups are unlikely (though not impossible).
3. Deliver the list. However, it is a string of ids separated by commas. Splitting this is best done in application code, not MySQL functions.
4. What will you do with the list? This could be important because it may be convenient to fold step 4 in with step 3.
Bottom line: I would do only step 1 and part of step 2 in SQL; I would build a 'function' in the application code to do the rest.
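A minimal sketch of that application-side 'function', in Python. Everything here is hypothetical: `unique_combination` is an illustrative name, the candidate ids are assumed to already be in a list (for example, the result of the query in the Question), and an in-memory set of MD5 strings stands in for the table of used md5's. Like the stored functions further down, it gives up after a fixed number of tries:

```python
import hashlib
import random


def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("ascii")).hexdigest()


def unique_combination(ids, m, used, max_tries=20, rng=random):
    """Return one unused m-item combination of ids as a dash-separated string.

    `used` is a set of MD5 hex digests (a stand-in for the table of used md5's).
    Returns None if every attempt hit a combination that was already issued.
    """
    for _ in range(max_tries):
        picked = sorted(rng.sample(ids, m))        # step 1: m unique ids, sorted
        combo = "-".join(str(i) for i in picked)
        key = md5_hex(combo)                       # step 2: uniqueness check
        if key not in used:
            used.add(key)
            return combo                           # step 3: deliver the list
    return None                                    # probably ran out of combinations


used = set()
print(unique_combination([3, 8, 15, 42, 57], 3, used))
```

In a real deployment the `used` check would be the INSERT IGNORE / unique-index test against MySQL, so that concurrent callers share one record of issued combinations.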

Permutations
DROP FUNCTION IF EXISTS unique_perm;
DELIMITER //
CREATE FUNCTION unique_perm()
RETURNS VARCHAR(255) CHARACTER SET ascii
NOT DETERMINISTIC
SQL SECURITY INVOKER
BEGIN
SET @n := 0;
iterat: LOOP
SELECT SUBSTRING_INDEX(
GROUP_CONCAT(province ORDER BY RAND() SEPARATOR '-'),
'-', 3) INTO @list -- Assuming you want M=3 items
FROM world.provinces;
SET @md5 := MD5(@list);
INSERT IGNORE INTO md5s (md5) VALUES (@md5); -- To prevent dups
IF ROW_COUNT() > 0 THEN -- Check for dup
RETURN @list; -- Got a unique permutation
END IF;
SET @n := @n + 1;
IF @n > 20 THEN
RETURN NULL; -- Probably ran out of combinations
END IF;
END LOOP iterat;
END;
//
DELIMITER ;
Output:
mysql> SELECT unique_perm(), unique_perm(), unique_perm()\G
*************************** 1. row ***************************
unique_perm(): New Brunswick-Nova Scotia-Quebec
unique_perm(): Alberta-Northwest Territories-New Brunswick
unique_perm(): Manitoba-Quebec-Prince Edward Island
1 row in set (0.01 sec)
Notes:
I hard-coded M=3; adjust as needed. (It could be passed in as an arg.)
Change column and table names for your needs.
Without the test on @n, you could get into an endless loop if you run out of combinations. (However, if N is even modestly large, that is 'impossible', so you could remove the test.)
If M is large enough, you will need to increase @@group_concat_max_len, and also widen the RETURNS type.
CREATE TABLE md5s ( md5 CHAR(32) CHARACTER SET ascii PRIMARY KEY ) ENGINE=InnoDB is needed. And, you will need to TRUNCATE md5s between batches of calls to this function.
That is a working example.
Flaw: It gives unique permutations, not unique combinations. If that is not adequate, read on...
Combinations
DROP FUNCTION IF EXISTS unique_comb;
DELIMITER //
CREATE FUNCTION unique_comb()
RETURNS VARCHAR(255) CHARACTER SET ascii
NOT DETERMINISTIC
SQL SECURITY INVOKER
BEGIN
SET @n := 0;
iterat: LOOP
SELECT GROUP_CONCAT(province ORDER BY province SEPARATOR '-') INTO @list
FROM ( SELECT province FROM world.provinces
ORDER BY RAND() LIMIT 2 ) AS x; -- Assuming you want M=2 items
SET @md5 := MD5(@list);
INSERT IGNORE INTO md5s (md5) VALUES (@md5); -- To prevent dups
IF ROW_COUNT() > 0 THEN -- Check for dup
RETURN @list; -- Got a unique combination
END IF;
SET @n := @n + 1;
IF @n > 20 THEN
RETURN NULL; -- Probably ran out of combinations
END IF;
END LOOP iterat;
END;
//
DELIMITER ;
Output:
mysql> SELECT unique_comb(), unique_comb(), unique_comb()\G
*************************** 1. row ***************************
unique_comb(): Quebec-Yukon
unique_comb(): Ontario-Yukon
unique_comb(): New Brunswick-Nova Scotia
1 row in set (0.01 sec)
Notes:
The subquery adds some to the cost.
Note that the items in each output string are now (necessarily) ordered.
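Both functions give up after 20 consecutive duplicates, which only suggests that the combinations have run out. If the number of candidate rows is known, the exact totals are C(n, M) combinations and P(n, M) permutations, so a caller could compare them against the number of rows in md5s. A small Python illustration; the function names are hypothetical, and `math.comb` / `math.perm` need Python 3.8 or later:

```python
from math import comb, perm


def remaining_combinations(n_rows: int, m: int, issued: int) -> int:
    # C(n, m) unordered m-item subsets exist; subtract those already handed out
    return comb(n_rows, m) - issued


def remaining_permutations(n_rows: int, m: int, issued: int) -> int:
    # P(n, m) ordered selections exist (what unique_perm() effectively tracks)
    return perm(n_rows, m) - issued
```

For example, 13 candidate rows with M=2 allow only 78 combinations, so a 79th call can never succeed, however many retries it makes.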

Related

Update all rows in a table with incrementing text value

How to update a column for every row in a table with an incrementing text value using SQL.
I have a table with a column called ej_number which is a unique identifier. The field format is EJnnnn, i.e. EJ followed by four digits. I have imported data that doesn't include a value for ej_number, but some new rows do have it set. I want to update every row without ej_number set, starting from EJ0001. I'll resolve duplication later.
I first did it in a loop in PHP, but realised that the server would time out because of the number of rows, so I decided to do it in SQL.
My first idea was to use a loop, but my research found that row by row updates are not recommended, especially as the only way I could see to do it would use a cursor, which is also not recommended.
I was able to do it in a single statement - the code below works, but it generates a warning (using MySQL Workbench).
SET @next_number = 0;
UPDATE ej_details
SET ej_number = CASE
WHEN ej_number IS NULL THEN (
CONCAT('EJ', LPAD((@next_number:=@next_number+1), 4, '0')))
ELSE ej_number
END;
The statement does what I want, but generates this warning:
692 row(s) affected, 1 warning(s): 1287 Setting user variables within expressions is deprecated and will be removed in a future release. Please set variables in separate statements instead. Rows matched: 692 Changed: 692 Warnings: 1
I would like to know how best to do this without using a deprecated feature. I looked and found plenty of row by row solutions, but couldn't see an alternative that wasn't row by row, probably because I don't know enough to ask the right question.
I think you just want:
SET @next_number = 0;
UPDATE ej_details
SET ej_number = CONCAT('EJ',
LPAD( (@next_number:=@next_number+1), 4, '0'
)
)
WHERE ej_number IS NULL;
This is simpler, but won't get rid of the warning.
If you want to do this in a single call, then:
UPDATE ej_details CROSS JOIN
(SELECT @next_number := 0) params
SET ej_number = CONCAT('EJ',
LPAD( (@next_number:=@next_number+1), 4, '0'
)
)
WHERE ej_number IS NULL;
Unfortunately, if the column is declared as unique, then you cannot resolve duplicate values "later".
If you want to solve this without the warning, you'll need a primary key/unique column:
UPDATE ej_details ed JOIN
(SELECT ed2.*,
ROW_NUMBER() OVER (ORDER BY ej_number) as seqnum
FROM ej_details ed2
WHERE ej_number IS NULL
) ed2
ON ed2.? = ed.? -- the primary key goes here
SET ed.ej_number = CONCAT('EJ', LPAD(ed2.seqnum, 4, '0')
);
However, this version requires MySQL 8.0 or later (ROW_NUMBER() is a window function), so it is not backwards compatible.
I would be surprised if they really removed variables from MySQL 9. It would break lots and lots and lots of code.

A column that auto increments in MYSQL

I need to create a column that auto increments from 1 to however many rows there are. However, I need the column to reorder itself according to the order of my probability column. Is it possible?
I'd generally recommend against implementing that kind of ordering calculation as an explicit table field. Keeping such information up to date would create more and more overhead as the table grows. Instead, you could just ORDER BY your probability column; or if you really need the "rank" in the query result, there are a number of ways to do that, something like this should work:
SELECT @seq := @seq + 1 AS seq, d.*
FROM theRealData AS d, (SELECT @seq := 0) AS init
ORDER BY d.probability
;
Here is the stored procedure I mention in the comments below (it assumes seq increases with probability, so it may need adjustments if I have the ordering reversed):
DELIMITER //
CREATE PROCEDURE theProc (newID INT)
BEGIN
DECLARE newProb INT; -- Not sure if it is INT, but for the sake of example
DECLARE seqAt INT;
SELECT probability INTO newProb FROM theTable WHERE ID = newID;
-- Take the smallest seq held by a row with a higher probability,
-- or go after every existing row if no row has a higher probability
SELECT COALESCE(MIN(seq), (SELECT COALESCE(MAX(seq), 0) + 1 FROM theTable)) INTO seqAt
FROM theTable WHERE probability > newProb;
UPDATE theTable SET seq = seq + 1 WHERE seq >= seqAt;
UPDATE theTable SET seq = seqAt WHERE ID = newID;
END //
DELIMITER ;
If you pass all the fields inserted, instead of just the new row's id after it is inserted, then the procedure can do the insert itself and use last_insert_id() to do the rest of the work.
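To make the shifting logic concrete, here is an in-memory Python sketch of what the procedure does, under the same assumption that seq increases with probability. The dict-of-rows layout and the name `place_new_row` are hypothetical:

```python
def place_new_row(rows, new_id):
    """rows maps id -> {"probability": p, "seq": s or None}; mutates rows in place."""
    new_prob = rows[new_id]["probability"]
    # Slots already taken by rows with a higher probability
    higher = [r["seq"] for r in rows.values()
              if r["probability"] > new_prob and r["seq"] is not None]
    if higher:
        seq_at = min(higher)
    else:
        # Nothing ranks above the new row: append it after the current maximum
        existing = [r["seq"] for r in rows.values() if r["seq"] is not None]
        seq_at = max(existing, default=0) + 1
    # Shift everything at or after the slot down by one place
    for r in rows.values():
        if r["seq"] is not None and r["seq"] >= seq_at:
            r["seq"] += 1
    rows[new_id]["seq"] = seq_at
```

Every insert may touch many rows, which is the growing overhead mentioned above.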
Modifying the primary key values can become very expensive, especially if you have related tables that point to it.
If you need to keep an order by probability, I would suggest adding an extra column with the probability_order. You can update this column after every insert or every minute, hour or day.
Alternatively, as @Uueerdo says, you can just use ORDER BY when querying the table rows.

Make unique string of characters/numbers in SQL

I have a table someTable with a column bin of type VARCHAR(4). Whenever I insert to this table, bin should be a unique combination of characters and numbers. Unique in this sense meaning has not appeared before in the table in another row.
bin is in the form of AA00, where A is a character A-F and 0 is a number 0-9.
Say I insert to this table once: it should come up with a bin value which doesn't appear before. Assuming the table was empty, the first bin could be AA11. On second insertion, it should be AA12, and then AA13, etc.
AA00, AA01, ... AA09, AA10, AA11, ... AA99, AB00, AB01, ... AF99, BA00, BA01, ... FF99
It doesn't matter that this table can contain only 3,600 possible rows. How do I write this code, specifically finding a bin that doesn't already exist in someTable? It can be in order as I've described or a random bin, as long as it doesn't appear twice.
CREATE TABLE someTable (
bin VARCHAR(4),
someText VARCHAR(32),
PRIMARY KEY(bin)
);
INSERT INTO someTable
VALUES('?', 'a');
INSERT INTO someTable
VALUES('?', 'b');
INSERT INTO someTable
VALUES('?', 'c');
INSERT INTO someTable
VALUES('?', 'd');
Alternatively, I can use the below procedure to insert instead:
CREATE PROCEDURE insert_someTable(tsomeText VARCHAR(32))
BEGIN
DECLARE var VARCHAR(4) DEFAULT (
-- some code to find unique bin
);
INSERT INTO someTable
VALUES(var, tsomeText);
END
A possible outcome is:
+------+----------+
| bin | someText |
+------+----------+
| AB31 | a |
| FC10 | b |
| BB22 | c |
| AF92 | d |
+------+----------+
As Gordon said, you will have to use a trigger because it is too complex to do as a simple formula in a default. Should be fairly simple, you just get the last value (order by descending, limit 1) and increment it. Writing the incrementor will be somewhat complicated because of the alpha characters. It would be much easier in an application language, but then you run into issues of table locking and the possibility of two users creating the same value.
A better method would be to use a normal auto-increment primary key and translate it to your bin value. Consider your bin value as two base-6 characters followed by two base-10 digits. You then take the id generated by MySQL, which is guaranteed to be unique, convert it to your special number system, and store the result in the bin column.
To calculate the bin:
Step one would be to get the lower 100 value of the decimal number (mod 100) - that gives you the last two digits. Convert to varchar with a leading zero.
Subtract that from the id, and divide by 100 to get the value for the first two digits.
Get the mod 6 value to determine the 3rd (from the right) digit. Convert to A-F by index.
Subtract this from what's left of the ID, and divide by 6 to get the 4th (from the right) digit. Convert to A-F by index.
Concat the three results together to form the value for the bin.
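The steps above amount to a mixed-radix conversion. A minimal Python sketch (the name `id_to_bin` is hypothetical) that accepts 0 to 3599 and raises an error past the 3,600 limit instead of wrapping around:

```python
LETTERS = "ABCDEF"


def id_to_bin(n: int) -> str:
    """Encode 0 <= n < 3600 as two A-F letters (base 6) followed by two digits."""
    if not 0 <= n < 6 * 6 * 100:
        raise ValueError("only 3,600 distinct bins exist")
    digits12 = n % 100             # last two digits
    rest = n // 100                # same as (n - digits12) / 100
    digit3 = LETTERS[rest % 6]     # 3rd character from the right
    digit4 = LETTERS[rest // 6]    # 4th character from the right
    return f"{digit4}{digit3}{digits12:02d}"


print(id_to_bin(599))  # AF99
```

The trigger feeds the raw auto-increment id in, so id 1 becomes AA01; passing `id - 1` instead would make the first row AA00.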
You may need to edit the following to match your table and column names, but it should do what you are asking. One possible improvement would be to have it cancel any inserts past the 3,600 limit: ids above 3600 wrap around and duplicate previous bin values (id 3601 gives AA01 again). Also, AA00 is not produced until id 3600, because ids start at 1 (id=1 gives 'AA01'), so it's not perfect. Lastly, you could put a unique index on bin, and that would prevent duplicates.
DELIMITER $$
CREATE TRIGGER `fix_bin`
BEFORE INSERT ON `so_temp`
FOR EACH ROW
BEGIN
DECLARE next_id INT;
SET next_id = (SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA=DATABASE() AND TABLE_NAME='so_temp');
-- Caution: MySQL 8.0 caches this value (see information_schema_stats_expiry), so it can be stale there
SET @id = next_id;
SET @Part1 = MOD(@id,100);
SET @Temp1 = FLOOR((@id - @Part1) / 100);
SET @Part2 = MOD(@Temp1,6);
SET @Temp2 = FLOOR((@Temp1 - @Part2) / 6);
SET @Part3 = MOD(@Temp2,6);
SET @DIGIT12 = RIGHT(CONCAT("00",@Part1),2);
SET @DIGIT3 = SUBSTR("ABCDEF",@Part2 + 1,1);
SET @DIGIT4 = SUBSTR("ABCDEF",@Part3 + 1,1);
SET NEW.`bin` = CONCAT(@DIGIT4,@DIGIT3,@DIGIT12);
END;
$$
DELIMITER ;

Avoid row was cut by GROUP_CONCAT error on insert without changing group_concat_max_len

I have an insert that uses a GROUP_CONCAT. In certain scenarios, the insert fails with Row XX was cut by GROUP_CONCAT. I understand why it fails but I'm looking for a way to have it not error out since the insert column is already smaller than the group_concat_max_len. I don't want to increase group_concat_max_len.
drop table if exists a;
create table a (x varchar(10), c int);
drop table if exists b;
create table b (x varchar(10));
insert into b values ('abcdefgh');
insert into b values ('ijklmnop');
-- contrived example to show that insert column size varchar(10) < 15
set session group_concat_max_len = 15;
insert into a select group_concat(x separator ', '), count(*) from b;
This insert produces the error Row 2 was cut by GROUP_CONCAT().
I'll try to provide a few clarifications -
The data in table b is unknown, so there is no way to say "set group_concat_max_len to a value greater than 18".
I do know the insert column size.
Why group_concat 4 GB of data when you want the first x characters?
When the concatenated string is longer than 10 chars, it should insert the first 10 characters.
Thanks.
Your example GROUP_CONCAT is probably cooking up this value:
abcdefgh, ijklmnop
That is 18 characters long, including the separator.
Can you try something like this?
set session group_concat_max_len = 4096;
insert into a
select left(group_concat(x separator ', '),10),
count(*)
from b;
This will trim the GROUP_CONCAT result for you.
You can temporarily set group_concat_max_len if you need to, then set it back.
I don't know MySQL very well, nor whether there is a good reason to do this in the first place, but you could compute a running total length and limit the GROUP_CONCAT() to rows where that length is under a certain maximum. You'll still need to set group_concat_max_len high enough to handle the longest single value (or use CASE logic to cut each value down to the maximum length you want).
Something like this:
SELECT SUBSTRING(GROUP_CONCAT(col1 separator ', '),1,10)
FROM (SELECT *
FROM (SELECT col1
,@lentot := @lentot + CHAR_LENGTH(col1) AS lentot
FROM Table1
CROSS JOIN (SELECT @lentot := 0) AS init
)sub
WHERE lentot < 25
)sub2
Demo: SQL Fiddle
A likely cause of getting no output on some runs: @lentot is a session variable, so with only COALESCE(@lentot,0) a second run in the same session starts from the previous total and filters out every row; resetting it in the CROSS JOIN avoids that. It doesn't seem like it should require 2 subqueries, but filtering on lentot didn't work as expected unless it was nested like that.
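The running-total idea can be sketched in Python to show why stopping early is safe: once the accumulated length reaches the limit, later values cannot change the first `limit` characters, so the result matches LEFT(GROUP_CONCAT(...), limit) over the full list. `capped_concat` is a hypothetical name, and unlike the query above it also counts the separator:

```python
def capped_concat(values, sep=", ", limit=10):
    """Join values in order, but stop once `limit` characters are guaranteed."""
    parts = []
    total = 0  # length of sep.join(parts) so far
    for v in values:
        if total >= limit:
            break  # the first `limit` characters are already fixed
        total += len(v) + (len(sep) if parts else 0)
        parts.append(v)
    return sep.join(parts)[:limit]


print(capped_concat(["abcdefgh", "ijklmnop"]))  # 'abcdefgh, '
```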
Actually, a better way is to use DISTINCT.
I had to add two new fields to an existing stored procedure. Their values came from a LEFT JOIN, and because that join could produce NULLs, a single concatenated value was repeated more than 100 times in some cases.
Because a group with that new field value contained many NULL values, GROUP_CONCAT exceeded its maximum length (in my case 16384); GROUP_CONCAT(DISTINCT ...) removes those repeats.

Best way for Unique Random String for MySQL Long table

I know how to create random chars both at PHP and MySQL but the question is that I have to create a 4 char random string for a table of 10 thousand or so rows. What way is the best to make sure it will remain unique?
I can use a longer string if I need to, but not longer than 12.
To keep it simple: the table exists, and I need to add an extra column, fill it with a 4-char random string, and keep the keys unique.
An option:
Put all your possible characters in a table with only one column.
val
------
0
1
...
9
a
b
...
z
Use this query
SELECT CONCAT(a.val,b.val,c.val,d.val)
FROM chars AS a
JOIN chars AS b
JOIN chars AS c
JOIN chars AS d
ORDER BY RAND()
LIMIT 10000
On the other hand if you need to get one ID at a time I see two approaches.
A. If you have a lot of unassigned IDs available.
In this case you just generate an ID and see if it's free. If not try another one.
B. If you want to keep your assigned IDs and the available IDs at the same order of magnitude.
In this case it would be best to pre-generate all your IDs, shuffle them, and when you need one just pick the next available one. Say put them all in a table, and when you assign one from that table, you remove it so it can't be picked again.
If your allowed characters are 0-9a-z, the table will hold 36^4 = 1,679,616 rows. That's just a couple of MB.
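A minimal Python sketch of approach B, assuming the pool lives in memory rather than in a table (`shuffled_pool` and `CodeAllocator` are hypothetical names). For length 4 over 0-9a-z it builds all 1,679,616 codes, so a shorter length is handy for trying it out:

```python
import itertools
import random
import string

ALPHABET = string.digits + string.ascii_lowercase  # 0-9a-z, 36 characters


def shuffled_pool(length=4, rng=random):
    # Pre-generate every possible code once, then shuffle the whole list
    pool = ["".join(p) for p in itertools.product(ALPHABET, repeat=length)]
    rng.shuffle(pool)
    return pool


class CodeAllocator:
    """Hands out codes from a shuffled pool; each code is removed once taken."""

    def __init__(self, pool):
        self._pool = pool

    def next_code(self) -> str:
        if not self._pool:
            raise LookupError("all codes have been assigned")
        return self._pool.pop()  # O(1) removal from the end of the list
```

Because codes are removed as they are assigned, uniqueness never needs a lookup, and the random order comes entirely from the one-time shuffle.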
As those strings need to be unique, why not use a numeric auto-increment value and then convert that to a character based value similar to the conversion of decimal to hex.
If you choose, e.g., all letters (upper and lower case) and digits, you simply need to create a routine that converts an integer to a "base 62" number.
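A minimal sketch of that conversion in Python. The digit order in `BASE62` is an arbitrary assumption; any fixed 62-character alphabet works as long as encoding and decoding use the same one. Codes derived from an auto-increment id are unique but sequential, not random:

```python
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_base62(n: int) -> str:
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return BASE62[0]
    out = []
    while n:
        n, r = divmod(n, 62)   # peel off the least significant base-62 digit
        out.append(BASE62[r])
    return "".join(reversed(out))


def from_base62(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + BASE62.index(ch)
    return n
```

Four base-62 characters cover 62^4 = 14,776,336 values, far more than the roughly 10 thousand rows in the question.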
You can make use of the DISTINCT keyword.
For example, the following query will only return unique rows by which you can validate that your 4 char random string remains unique:
mysql> SELECT DISTINCT random_strings FROM chars;
This may be lengthy, but would allow you to create what you need:
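CREATE FUNCTION gen_alphanum () RETURNS CHAR(4)
NOT DETERMINISTIC NO SQL
-- Each SUBSTRING picks one letter at a random 1-based position (1..52)
RETURN CONCAT(
SUBSTRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', FLOOR(1 + RAND() * 52), 1),
SUBSTRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', FLOOR(1 + RAND() * 52), 1),
SUBSTRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', FLOOR(1 + RAND() * 52), 1),
SUBSTRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', FLOOR(1 + RAND() * 52), 1)
);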
It sounds like you've got the code in MySQL for creating these random valued strings.
Consider this option:
Create a User Defined Function in MySQL. Have this function generate and return a new random string, and use NOT EXISTS (SELECT 1 FROM MyTable WHERE MyRandomString = candidate), where candidate is the freshly generated string, to check that it doesn't already exist in the table before returning it.
When inserting new rows, use this function's return value to assign to the MyRandomString column.
To update existing data, simply:
UPDATE MyTable
SET MyRandomString = fn_CreateSomeRandomString()
when inserting:
INSERT INTO MyTable (foo, bar, MyRandomString)
VALUES ('','', fn_CreateSomeRandomString());
Here's a sample of that UDF on PasteBin.
If you have MySQL 5.6 or later, you can use TO_BASE64 as follows:
select LEFT( TO_BASE64( SHA(rand()) ), 6 ) ;
Alternatively if you don't have 5.6,
DELIMITER //
drop function if exists randChr //
create function randChr()
returns char
BEGIN
IF RAND() <= 0.5 THEN -- Lowercase
return CHAR( 97 + 25*rand() ) ;
ELSE -- uc
return CHAR( 65 + 25*rand() ) ;
END IF;
END //
drop function if exists randString //
create function randString( len int )
returns varchar(255)
BEGIN
SET @n = 0;
SET @res = '' ;
REPEAT
SET @res = concat( @res, randChr() ) ;
set @n = @n + 1 ;
UNTIL @n >= len END REPEAT;
return @res ;
END //
DELIMITER ;
-- USE:
select randString( 5 );
select randString( 60 );