How to generate a unique random number? - mysql

Can someone tell me a good method for automatically placing a unique random number in a MySQL table when a new record is created?

I would create a table with a pool of numbers:
CREATE TABLE pool (number INT PRIMARY KEY AUTO_INCREMENT);
INSERT INTO pool VALUES (),(),(),(),(),(),(),(),…;
And then define a trigger which picks one random number from that pool:
DELIMITER //
CREATE TRIGGER pickrand BEFORE INSERT ON mytable
FOR EACH ROW BEGIN
  DECLARE nr INT;
  SET nr = (SELECT number FROM pool ORDER BY RAND() LIMIT 1);
  DELETE FROM pool WHERE number = nr;
  SET NEW.nr = nr;
END//
DELIMITER ;
In order to avoid concurrency issues you have to run these queries inside transactions. If performance becomes an issue (because of the slow ORDER BY RAND()), you can change how the random record is selected.
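For example, one way is to replace the SET nr = ... line in the trigger with a jump to a random position by value instead of sorting the whole pool (a sketch; note the pick is only approximately uniform once the pool develops gaps):
SET nr = (SELECT number FROM pool
          WHERE number >= FLOOR(1 + RAND() * (SELECT MAX(number) FROM pool))
          ORDER BY number
          LIMIT 1);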

Your criteria of unique and random are generally conflicting. You can easily accomplish one or the other, but both is difficult, and would require evaluating every row when testing a new potential number to insert.
The best method that meets your criteria is to generate a UUID with the UUID function.
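For example (a minimal sketch; the column names are hypothetical, and since UUID() returns a 36-character string a CHAR(36) column works):
SELECT UUID();
INSERT INTO mytable (public_id, body) VALUES (UUID(), 'example');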
The better choice would be to re-evaluate your design and remove one of the (unique, random) criteria.

Your best option is an AUTO_INCREMENT column (see the MySQL manual for the syntax).
For a random number, try this:
SELECT FLOOR(RAND() * 401) + 100
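That picks a random integer between 100 and 500. The general form, per the RAND() documentation, is FLOOR(i + RAND() * (j - i)) for a random integer R with i <= R < j, for example:
SELECT FLOOR(7 + RAND() * (12 - 7)) AS r;  -- a random integer from 7 to 11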
Edit:
SELECT sup_rand
FROM (
  SELECT FLOOR(RAND() * 99999) AS sup_rand
  FROM `table`
) AS candidates
WHERE sup_rand NOT IN (SELECT sup_rand FROM `table`)
LIMIT 1
Steps (sketched as a stored function below):
1. Generate a random number.
2. Check if it is already present in the table.
3. If not, use the number.
4. Otherwise, repeat.
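As a rough sketch, those steps could be wrapped in a stored function, using the same placeholder table and column names as above (and note there is still a race window between the check and the actual insert unless the column also has a UNIQUE index):
DELIMITER //
CREATE FUNCTION next_sup_rand() RETURNS INT
NOT DETERMINISTIC READS SQL DATA
BEGIN
  DECLARE candidate INT;
  REPEAT
    SET candidate = FLOOR(RAND() * 99999);                 -- step 1: generate
  UNTIL candidate NOT IN (SELECT sup_rand FROM `table`)    -- step 2: check
  END REPEAT;                                              -- step 4: repeat until unused
  RETURN candidate;                                        -- step 3: use it
END//
DELIMITER ;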

Related

MySQL auto increment column on update to unique value

I have a table that looks something like:
name: posts
columns:
- id
- sequence_id
- text
- like_count
The ID is a standard auto-incremented unique integer index.
The sequence ID should be similar — it is also a unique integer index.
The difference is that I want to increment it to the new maximum value in the table on update or insert, not just on insert.
Currently I accomplish this with a Redis counter that I increment before inserting into the database.
I’d like to drop the Redis dependency, though, and do this with purely MySQL if possible.
One option I thought of was creating a post_updates table which just has an auto-incrementing ID, which I would use the same way, but this feels worse.
Another option is doing a full column scan to compute MAX(sequence_id) + 1, but that isn't really scalable and it would have race conditions.
Are there some better options I’m not aware of?
There's a solution in the manual to simulate a sequence object in MySQL:
CREATE TABLE sequence (id INT NOT NULL);
INSERT INTO sequence VALUES (0);
The sequence table doesn't need an auto-increment itself, and it stores only one row.
When you are ready to increment your sequence_id in the table you describe, you first update the sequence value in the following manner:
UPDATE sequence SET id = LAST_INSERT_ID(id+1);
Now you can use that value when inserting/updating your table:
INSERT INTO posts SET sequence_id = LAST_INSERT_ID(), text = '...';
or
UPDATE posts SET sequence_id = LAST_INSERT_ID(), like_count = like_count + 1 WHERE id = ?;
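Putting it together, a minimal sketch of a full round trip in one transaction, so that a failed statement rolls the sequence bump back as well (the id value is hypothetical):
START TRANSACTION;
UPDATE sequence SET id = LAST_INSERT_ID(id + 1);
UPDATE posts SET sequence_id = LAST_INSERT_ID(), like_count = like_count + 1 WHERE id = 42;
COMMIT;
Note that the row lock on the sequence table is then held until COMMIT, which is exactly the serialization point described below.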
Note that this is a little heavier than an auto-increment, because updating the sequence table takes a row lock, not just an auto-increment lock. This could put an upper limit on your write throughput, because many concurrent clients trying to access this table will queue up behind each other.
If you want a very high-throughput solution, I'd recommend continuing to use Redis.

Fastest way to remove a HUGE set of row keys from a table via primary key? [duplicate]

I have two tables. Let's call them KEY and VALUE.
KEY is small, somewhere around 1.000.000 records.
VALUE is huge, say 1.000.000.000 records.
Between them there is a connection such that each KEY might have many VALUES. It's not a foreign key but basically the same meaning.
The DDL looks like this
create table KEY (
key_id int,
primary key (key_id)
);
create table VALUE (
key_id int,
value_id int,
primary key (key_id, value_id)
);
Now, my problem. About half of all key_ids in VALUE have been deleted from KEY, and I need to delete them in an orderly fashion while both tables are still under high load.
It would be easy to do
delete v
from VALUE v
left join KEY k using (key_id)
where k.key_id is null;
However, since it's not allowed to have a LIMIT on a multi-table delete, I don't like this approach. Such a delete would take hours to run, and that makes it impossible to throttle the deletes.
Another approach is to create a cursor to find all missing key_ids and delete them one by one with a limit. That seems very slow and kind of backwards.
Are there any other options? Some nice tricks that could help?
Any solution that tries to delete so much data in one transaction is going to overwhelm the rollback segment and cause a lot of performance problems.
A good tool to help is pt-archiver. It performs incremental operations on moderate-sized batches of rows, as efficiently as possible. pt-archiver can copy, move, or delete rows depending on options.
The documentation includes an example of deleting orphaned rows, which is exactly your scenario:
pt-archiver --source h=host,D=db,t=VALUE --purge \
--where 'NOT EXISTS(SELECT * FROM `KEY` WHERE key_id=`VALUE`.key_id)' \
--limit 1000 --commit-each
Executing this will take significantly longer to delete the data, but it won't use too many resources and it won't interrupt service on your existing database. I have used it successfully to purge hundreds of millions of rows of outdated data.
pt-archiver is part of the Percona Toolkit for MySQL, a free (GPL) set of scripts that help common tasks with MySQL and compatible databases.
Directly from MySQL documentation
If you are deleting many rows from a large table, you may exceed the
lock table size for an InnoDB table. To avoid this problem, or simply
to minimize the time that the table remains locked, the following
strategy (which does not use DELETE at all) might be helpful:
Select the rows not to be deleted into an empty table that has the same structure as the original table:
INSERT INTO t_copy SELECT * FROM t WHERE ... ;
Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:
RENAME TABLE t TO t_old, t_copy TO t;
Drop the original table:
DROP TABLE t_old;
No other sessions can access the tables involved while RENAME TABLE
executes, so the rename operation is not subject to concurrency
problems. See Section 12.1.9, “RENAME TABLE Syntax”.
So in your case you may do:
INSERT INTO value_copy SELECT * FROM VALUE WHERE key_id IN
(SELECT key_id FROM `KEY`);
RENAME TABLE value TO value_old, value_copy TO value;
DROP TABLE value_old;
And according to the documentation quoted above, the RENAME operation is quick and the number of records doesn't affect it.
What about this for having a limit?
delete x
from `VALUE` x
join (select key_id, value_id
from `VALUE` v
left join `KEY` k using (key_id)
where k.key_id is null
limit 1000) y
on x.key_id = y.key_id AND x.value_id = y.value_id;
First, examine your data. Find the keys which have too many values to be deleted "fast". Then find out which times during the day you have the smallest load on the system. Perform the deletion of the "bad" keys during that time. For the rest, start deleting them one by one with some downtime between deletes so that you don't put too much pressure on the database while you do it.
Maybe, instead of a limit, divide the whole set of rows into small parts by key_id:
delete v
from VALUE v
left join KEY k using (key_id)
where k.key_id is null and v.key_id > 0 and v.key_id < 100000;
then delete rows with key_id in 100000..200000 and so on.
You can try to delete in separate transaction batches.
This is for MSSQL, but should be similar.
declare @i INT
declare @step INT
set @i = 0
set @step = 100000
while (@i < (select max(VALUE.key_id) from VALUE))
BEGIN
  BEGIN TRANSACTION
    delete from VALUE
    where VALUE.key_id between @i and @i + @step
      and not exists (select 1 from KEY where KEY.key_id = VALUE.key_id and KEY.key_id between @i and @i + @step)
    set @i = (@i + @step)
  COMMIT TRANSACTION
END
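A rough MySQL translation of the same idea as a stored procedure (a sketch; the procedure name is made up, and batching by key_id range assumes the ids are reasonably dense):
DELIMITER //
CREATE PROCEDURE purge_orphan_values()
BEGIN
  DECLARE i INT DEFAULT 0;
  DECLARE step INT DEFAULT 100000;
  DECLARE max_id INT;
  SELECT MAX(key_id) INTO max_id FROM `VALUE`;
  WHILE i < max_id DO
    START TRANSACTION;
    -- one key_id range per transaction, mirroring the batch loop above
    DELETE FROM `VALUE`
     WHERE key_id BETWEEN i AND i + step
       AND NOT EXISTS (SELECT 1 FROM `KEY` k WHERE k.key_id = `VALUE`.key_id);
    COMMIT;
    SET i = i + step;
  END WHILE;
END//
DELIMITER ;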
Create a temporary table!
drop table if exists batch_to_delete;
create temporary table batch_to_delete as
select v.* from `VALUE` v
left join `KEY` k on k.key_id = v.key_id
where k.key_id is null
limit 10000; -- tailor batch size to your taste
-- optional but may help for large batch size
create index batch_to_delete_ix_key on batch_to_delete(key_id);
create index batch_to_delete_ix_value on batch_to_delete(value_id);
-- do the actual delete
delete v from `VALUE` v
join batch_to_delete d on d.key_id = v.key_id and d.value_id = v.value_id;
To me this is the kind of task whose progress I would want to see in a log file. And I would avoid solving this in pure SQL; I would use some scripting in Python or another similar language. Another thing that would bother me is that lots of LEFT JOINs with a WHERE ... IS NULL filter between the tables might cause unwanted locks, so I would avoid the JOINs as well.
Here is some pseudo code:
max_key = select_db('SELECT MAX(key) FROM VALUE')
while max_key > 0:
    cur_range = range(max_key, max_key - 100, -1)
    good_keys = select_db('SELECT key FROM KEY WHERE key IN (%s)' % cur_range)
    keys_to_del = set(cur_range) - set(good_keys)
    while 1:
        deleted_count = update_db('DELETE FROM VALUE WHERE key IN (%s) LIMIT 1000' % keys_to_del)
        db_commit
        log_something
        if not deleted_count:
            break
    max_key -= 100
This should not bother the rest of the system very much, but may take long. Another issue is to optimize the table after you deleted all those rows, but this is another story.
If the target columns are properly indexed, this should go fast:
DELETE FROM `VALUE`
WHERE NOT EXISTS(SELECT 1 FROM `key` k WHERE k.key_id = `VALUE`.key_id)
-- ORDER BY key_id, value_id -- order by PK is good idea, but check the performance first.
LIMIT 1000
Adjust the limit anywhere from 10 to 10000 to get acceptable performance, and rerun it several times.
Also bear in mind that such mass deletes take locks and write rollback information for each row, which multiplies the execution time per row several times over. There are advanced methods to prevent this, but the easiest workaround is just to put a transaction around this query.
Do you have a slave or a Dev/Test environment with the same data?
The first step is to find out your data distribution, if you are worried about a particular key having 1 million value_ids:
SELECT v.key_id, COUNT(IFNULL(k.key_id,1)) AS cnt
FROM `value` v LEFT JOIN `key` k USING (key_id)
WHERE k.key_id IS NULL
GROUP BY v.key_id ;
The EXPLAIN plan for the above query is much better than it would be if you added
ORDER BY COUNT(IFNULL(k.key_id,1)) DESC ;
Since you don't have partitioning on key_id (there would be too many partitions in your case) and want to keep the database running during the delete process, the option is to delete in chunks, with SLEEP() between deletes for different key_ids to avoid overwhelming the server. Don't forget to keep an eye on your binary logs to avoid filling the disk.
The quickest way is:
1. Stop the application so data is not changed.
2. Dump key_id and value_id from the VALUE table, keeping only rows whose key_id exists in the KEY table:
mysqldump YOUR_DATABASE_NAME value --where="key_id in (select key_id from YOUR_DATABASE_NAME.key)" --lock-all-tables --opt --quick --quote-names --skip-extended-insert > VALUE_DATA.txt
3. Truncate the VALUE table.
4. Load the data exported in step 2.
5. Start the application.
As always, try this in a Dev/Test environment with Prod data and the same infrastructure so you can calculate the downtime required.
Hope this helps.
I am just curious what the effect would be of adding a non-unique index on key_id in table VALUE. Selectivity is not high at all (~0.001) but I am curious how that would affect the join performance.
Why don't you split your VALUE table into several tables according to some rule, like key_id modulo some power of 2 (256, for example)?

mysql 2.5M rows slow count

Having a simple mysql table with id (primary key) and hash (index). Some other columns (varchar / int) but no queries on them needed.
My total table size is around 350MB with 2.5M rows.
SELECT COUNT(*) FROM table LIMIT 1;
It is taking about 0.5 - 1s. My InnoDB buffer pool is set to 1GB. I've also tried variations (without improvement) like:
SELECT COUNT(id) FROM table LIMIT 1;
SELECT COUNT(*) FROM table WHERE id > 0 LIMIT 1;
A single
SELECT * FROM table WHERE id = 'x' LIMIT 1;
would return within 1 ms (localhost mysql). Any tips on improving the slow count (0.5 - 1s) would be greatly appreciated.
The brief explanation: InnoDB does not keep an exact internal row count, so it has to make a full table scan in order to count all rows (without a WHERE clause that could utilize an index).
BTW, I can't see any point in using LIMIT 1 in your query. Since there is no group by clause, it will always return one record.
Some time ago I found that MyISAM tables make these operations faster (MyISAM keeps an exact row count, so COUNT(*) without a WHERE clause is nearly instant). But not all tables and architectures can use MyISAM. Check your schema; maybe you can switch this table to MyISAM.
Also use COUNT(1) instead of COUNT(*)
And another technique for you: create a trigger and keep the count in a separate place. Create a counter_table and the following trigger:
DELIMITER //
CREATE TRIGGER update_counter AFTER INSERT ON table_name
FOR EACH ROW
BEGIN
  UPDATE counter_table
  SET counter = counter + 1;
END//
DELIMITER ;
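A sketch of the missing pieces: seed the counter once, and add a matching AFTER DELETE trigger so the count stays accurate when rows are removed (same hypothetical table names as above):
CREATE TABLE counter_table (counter BIGINT NOT NULL);
INSERT INTO counter_table SELECT COUNT(*) FROM table_name;  -- seed with the current count
DELIMITER //
CREATE TRIGGER update_counter_delete AFTER DELETE ON table_name
FOR EACH ROW
BEGIN
  UPDATE counter_table
  SET counter = counter - 1;
END//
DELIMITER ;
SELECT counter FROM counter_table;  -- the fast count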

How do I reset sequence numbers to become consecutive?

I've got a mysql table where each row has its own sequence number in a "sequence" column. However, when a row gets deleted, it leaves a gap. So...
1
2
3
4
...becomes...
1
2
4
Is there a neat way to "reset" the sequencing, so it becomes consecutive again in one SQL query?
Incidentally, I'm sure there is a technical term for this process. Anyone?
UPDATED: The "sequence" column is not a primary key. It is only used for determining the order that records are displayed within the app.
If the field is your primary key...
...then, as stated elsewhere on this question, you shouldn't be changing IDs. The IDs are already unique and you neither need nor want to re-use them.
Now, that said...
Otherwise...
It's quite possible that you have a different field (that is, as well as the PK) for some application-defined ordering. As long as this ordering isn't inherent in some other field (e.g. if it's user-defined), then there is nothing wrong with this.
You could recreate the table using a (temporary) auto_increment field and then remove the auto_increment afterwards.
I'd be tempted to UPDATE in ascending order and apply an incrementing variable.
SET @i = 0;
UPDATE `table`
SET `myOrderCol` = @i := @i + 1
ORDER BY `myOrderCol` ASC;
(Query not tested.)
It does seem quite wasteful to do this every time you delete items, but unfortunately with this manual ordering approach there's not a whole lot you can do about that if you want to maintain the integrity of the column.
You could possibly reduce the load, such that after deleting the entry with myOrderCol equal to, say, 5:
SET @i = 4;
UPDATE `table`
SET `myOrderCol` = @i := @i + 1
WHERE `myOrderCol` > 5
ORDER BY `myOrderCol` ASC;
(Query not tested.)
This will "shuffle" all the following values down by one.
I'd say don't bother. Reassigning sequential values is a relatively expensive operation, and if the column value is for ordering purposes only there is no good reason to do it. The only concern you might have is if, for example, your column is UNSIGNED INT and you suspect that over the lifetime of your application you might have more than 4,294,967,296 rows (including deleted rows) and go out of range; even if that is your concern, you can do the reassignment as a one-time task 10 years later when that happens.
This is a question that I often read here and in other forums. As zerkms already wrote, this is a false problem. Moreover, if your table is related to other tables, you'll lose the relations.
Just for learning purposes, a simple way is to store your data in a temporary table, truncate the original one (this resets the auto_increment), and then repopulate it.
Silly example:
create table seq (
id int not null auto_increment primary key,
col char(1)
) engine = myisam;
insert into seq (col) values ('a'),('b'),('c'),('d');
delete from seq where id = 3;
create temporary table tmp select col from seq order by id;
truncate seq;
insert into seq (col) select * from tmp;
but it's totally useless. ;)
If this is your PK then you shouldn't change it. PKs should be (mostly) unchanging columns. If you were to change them then not only would you need to change it in that table but also in any foreign keys where it exists.
If you do need a sequential sequence then ask yourself why. In a table there is no inherent or guaranteed order (even in the PK, although it may turn out that way because of how most RDBMSs store and retrieve the data). That's why we have the ORDER BY clause in SQL. If you want to be able to generate sequential numbers based on something else (time added into the database, etc.) then consider generating that either in your query or with your front end.
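For example, a display sequence can be generated at query time with a user variable instead of being stored (a sketch; the created_at ordering column is hypothetical):
SELECT (@seq := @seq + 1) AS seq, t.*
FROM (SELECT * FROM `table` ORDER BY created_at) AS t
CROSS JOIN (SELECT @seq := 0) AS init;
-- note: ordering inside a derived table is not guaranteed to be preserved by every MySQL version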
Assuming that this is an ID field, you can do this when you insert:
INSERT INTO yourTable (ID)
SELECT MIN(t1.ID) + 1
FROM yourTable t1
LEFT JOIN yourTable t2 ON t2.ID = t1.ID + 1
WHERE t2.ID IS NULL
As others have mentioned I don't recommend doing this. It will hold a table lock while the next ID is evaluated.

Optimisation of volatile data querying

I'm trying to solve a problem with query latency on a mysql-5.0 db.
The query itself is extremely simple: SELECT SUM(items) FROM tbl WHERE col = 'val'
There's an index on col and there are not more than 10000 values to sum in the worst case (mean of count(items) for all values of col would be around 10).
The table has up to 2M rows.
The query is run frequently enough that sometimes the execution time goes up to 10s, although 99% of them take << 1s
The query is not really cachable - in almost every case, each query like this one will be followed by an insert to that table in the next minute and showing old values is out of question (billing information).
keys are good enough - ~100% hits
The result I'm looking for is every single query < 1s. Are there any ways to improve the select time without changes to the table? Alternatively, are there any interesting changes that would help to resolve the problem? I thought about simply having a table where the current sum is updated for every col right after every insert - but maybe there are better ways to do it?
Another approach is to add a summary table:
create table summary ( col varchar(10) primary key, items int not null );
and add some triggers to tbl so that:
on insert:
insert into summary values( new.col, new.items )
on duplicate key update set summary.items = summary.items + new.items;
on delete:
update summary set summary.items = summary.items - old.items where summary.col = old.col
on update:
update summary set summary.items = summary.items - old.items where summary.col = old.col;
update summary set summary.items = summary.items + new.items where summary.col = new.col;
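Written out as actual trigger DDL, the insert case might look like this (a sketch; the delete and update cases follow the same pattern):
DELIMITER //
CREATE TRIGGER tbl_after_insert AFTER INSERT ON tbl
FOR EACH ROW
BEGIN
  INSERT INTO summary (col, items) VALUES (NEW.col, NEW.items)
  ON DUPLICATE KEY UPDATE items = items + NEW.items;
END//
DELIMITER ;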
This will slow down your inserts, but allow you to hit a single row in the summary table for
select items from summary where col = 'val';
The biggest problem with this is bootstrapping the values of the summary table. If you can take the application offline, you can easily initialise summary with values from tbl.
insert into summary select col, sum(items) from tbl group by col;
However, if you need to keep the service running, it is a lot more difficult. If you have a replica, you can stop replication, build the summary table, install the triggers, restart replication, then failover the service to using the replica, and then repeat the process on the retired primary.
If you cannot do that, then you could update the summary table one value of col at a time to reduce the impact:
LOCK TABLES tbl WRITE, summary WRITE;
DELETE FROM summary WHERE col = 'val';
INSERT INTO summary SELECT col, SUM(items) FROM tbl WHERE col = 'val' GROUP BY col;
UNLOCK TABLES;
Or if you can tolerate a prolonged outage:
LOCK TABLES tbl WRITE, summary WRITE;
DELETE FROM summary;
INSERT INTO summary SELECT col, SUM(items) FROM tbl GROUP BY col;
UNLOCK TABLES;
A covering index should help:
create index cix on tbl (col, items);
This will enable the sum to be performed without reading from the data file - which should be faster.
You should also track how effective your key-buffer is, and whether you need to allocate more memory for it. This can be done by polling the server status and watching the 'key%' values:
SHOW STATUS LIKE 'Key%';
MySQL Manual - show status
The ratio between Key_read_requests (i.e. the number of index lookups) and Key_reads (i.e. the number of requests that required index blocks to be read from disk) is important. The higher the number of disk reads, the slower the query will run. You can improve this by increasing the key buffer size (key_buffer_size) in the config file.
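The current setting can be inspected and raised like this (the 256M figure is only an example; size it to your indexes and available RAM):
SHOW VARIABLES LIKE 'key_buffer_size';
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;  -- takes effect without a restart
-- or persistently, in my.cnf under [mysqld]:
-- key_buffer_size = 256M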