Selecting in order after many insertions - mysql

I have two scripts; one of them inserts rows into the database, and the other processes newly entered, so-far-unprocessed rows.
CREATE TABLE table (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, col1 VARCHAR(32), col2 VARCHAR(32));
So the first script does several separate insert queries:
INSERT INTO table (id, col1 ,col2) VALUES (0, 'val1_1', 'val1_2');
INSERT INTO table (id, col1 ,col2) VALUES (0, 'val2_1', 'val2_2');
INSERT INTO table (id, col1 ,col2) VALUES (0, 'val3_1', 'val3_2');
...
Then the second script uses something like this to select the unprocessed rows:
SELECT * FROM table WHERE id > (SELECT MAX(id) FROM table_processed) ORDER BY id LIMIT 1000;
(do some processing)
(for each id processed from table: INSERT INTO table_processed (id) VALUES ({table.id});)
Sometimes, the first script will need to insert something like 5000 rows. I noticed that there was at least one instance when the processing script seemed to skip over many of the rows (basically skipped 3000 of them), and was wondering what could cause this and how to prevent it (if it skips over them once, then the next time it'll continue to skip over them since it uses > MAX(id)).
Or is this not supposed to happen? (in which case I guess it'd have to be error with the second script query)

If two INSERT transactions are running and the later one (which gets the higher auto-incremented ids) commits first, those higher ids become visible to other transactions (such as your processing script) before the lower ones, which sit in a transaction that has not yet committed or may even be rolled back. Every INSERT takes its id from the global sequence, so the two transactions may not each get a single contiguous range of ids; their ids can be interleaved in a striped pattern. A good rule is never to rely on either the order or the value of auto-incremented ids: use them only as identifiers.
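To make the race concrete, here is a minimal Python simulation. It is a toy model rather than MySQL itself: ids are allocated at INSERT time, rows only become visible at "commit", and the names (`tx_a`, `tx_b`, `visible`, `process_batch`) are made up for illustration.

```python
from itertools import count

ids = count(1)                      # global auto-increment sequence
tx_a, tx_b = [], []                 # uncommitted rows per transaction

# The two writers interleave their INSERTs, so the ids are striped.
for _ in range(3):
    tx_a.append(next(ids))          # A gets 1, 3, 5
    tx_b.append(next(ids))          # B gets 2, 4, 6

visible = set()                     # rows other sessions can see
processed = set()                   # stand-in for table_processed

def process_batch():
    """Mimic: SELECT ... WHERE id > MAX(processed id) ORDER BY id."""
    watermark = max(processed, default=0)
    batch = sorted(i for i in visible if i > watermark)
    processed.update(batch)
    return batch

visible.update(tx_b)                # B commits first
first = process_batch()             # sees 2, 4, 6; watermark becomes 6
visible.update(tx_a)                # A commits later
second = process_batch()            # nothing is > 6, so 1, 3, 5 are skipped
skipped = sorted(visible - processed)
print(first, second, skipped)       # [2, 4, 6] [] [1, 3, 5]
```

Once the watermark has moved past the late rows, no later run of the MAX(id) query will ever pick them up.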
The most obvious solutions are:
Do not use that MAX(id), but do a LEFT JOIN of table to table_processed, and use those not yet existing in table_processed, but this may be heavy on the selecting side.
Let the INSERTs do an exclusive LOCK on the table (undesirable in busy scenarios, you already seem to have multiple concurrent INSERTs).
Let the INSERTs be done with a processed=0 indexed column (possibly this is just the default value, and you can omit it in the insert), and just SELECT .. FROM table WHERE processed=0, set to 1 when done.
A tempting mistake is to say: OK, I'll just COMMIT after every single INSERT so that each transaction is done as soon as possible. That narrows the window but is still vulnerable to the race condition, so don't rely on it.
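The processed=0 option could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name `items` and the index name are assumptions, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " col1 TEXT,"
    " processed INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("CREATE INDEX idx_items_processed ON items (processed)")

# Rows may become visible in any id order; the flag does not care.
conn.executemany("INSERT INTO items (id, col1) VALUES (?, ?)",
                 [(2, "b"), (4, "d"), (6, "f")])
conn.commit()

def process_batch(limit=1000):
    """Pick up unprocessed rows, then mark exactly those rows as done."""
    rows = conn.execute(
        "SELECT id FROM items WHERE processed = 0 ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    batch = [r[0] for r in rows]
    conn.executemany("UPDATE items SET processed = 1 WHERE id = ?",
                     [(i,) for i in batch])
    conn.commit()
    return batch

first = process_batch()
# A slower writer commits lower ids afterwards.
conn.executemany("INSERT INTO items (id, col1) VALUES (?, ?)",
                 [(1, "a"), (3, "c"), (5, "e")])
conn.commit()
second = process_batch()
print(first, second)                # [2, 4, 6] [1, 3, 5]
```

Because selection depends on the flag and not on the highest id seen so far, rows that show up late with low ids are still picked up on the next run.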

Related

Count(*) Vs. Max(Id)

I have a table where I do bulk imports from CSV files.
First column is the Id field with autoincrement.
What bothers me is:
When I do a
Select count(*)
And a
Select max(Id)
I get different values. I would have expected those to be identical ?
What am I missing ?
If you insert 10 rows, delete 5, then insert 10 more then your COUNT(*) will not match MAX(id).
You can also insert an id way ahead of where it should be, like in an empty table INSERT ... (id) VALUES (9000000) will kick up your MAX(id) significantly despite having only 1 row.
Rolled-back transactions can also interfere with this.
If you want to know the next increment, check the AUTO_INCREMENT value, but be aware that this is only a guess, the actual value used may differ by the time you actually get around to inserting.
If you want them to match then you need to:
Start with a table where AUTO_INCREMENT=1, as in it's either brand new or has been cleared with TRUNCATE.
Insert using auto-generated id values as one transaction, or as a series of transactions where all of them have been fully committed.
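The insert/delete/insert case is easy to reproduce. The sketch below uses Python's sqlite3 module instead of MySQL (an assumption made only so the example is self-contained); the table name `imports` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite never reuse ids, similar to MySQL's counter.
conn.execute(
    "CREATE TABLE imports (id INTEGER PRIMARY KEY AUTOINCREMENT, val TEXT)"
)

def insert_rows(n):
    conn.executemany("INSERT INTO imports (val) VALUES (?)", [("x",)] * n)

insert_rows(10)                                    # ids 1..10
conn.execute("DELETE FROM imports WHERE id <= 5")  # delete 5 rows
insert_rows(10)                                    # ids 11..20
conn.commit()

row_count, max_id = conn.execute(
    "SELECT COUNT(*), MAX(id) FROM imports"
).fetchone()
print(row_count, max_id)                           # 15 20
```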

Prevent duplicate increment values during parallel transactions

I use transactions in MySQL to store orders. Each order has OrderID (BIGINT), which looks like this: XXXXXX0001, with last four digits incrementing (1620200001, 1620200002, 1620200003, ...).
The transaction works as follows:
start transaction
get new OrderID (increment by 1)
do some stuff
commit/rollback
Saving the transaction can take up to several seconds, and if multiple orders are created within a very short timespan, duplicate OrderIDs can be inserted into the database. Before the first order is committed, the second is assigned the same OrderID, which at that moment is next in line.
What is best way to prevent this? Having UNIQUE OrderID does not solve it (there would be rollback in second order). I could get rid of transaction and save OrderID quicker, but this leads to other potential problems and does not entirely solve this (just reduces chances of problem happening).
Any help would be appreciated.
Read about AUTO_INCREMENT. Search for it in the manual on CREATE TABLE. It's a long page, but AUTO_INCREMENT is documented about 1/4 of the way down the page.
Briefly, you just declare the primary key with a column option:
CREATE TABLE mytable (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
...other columns...
);
The initial value is 1, or you can make it start at a higher value:
ALTER TABLE mytable AUTO_INCREMENT=1620200001;
A table with an auto-increment column ensures that each concurrent transaction gets a unique, increasing value. There is no race condition, because the INSERT acquires a brief table-lock during which it increments the value. Unlike transaction-based locks, the auto-increment table lock is released immediately. So concurrent sessions don't have to wait for your transaction to finish.
Auto-increment is guaranteed to be unique. That is, the same value will not be allocated to multiple sessions. However, it's not guaranteed to allocate consecutive values. Also, it may allocate a value to one session, and that session may then decide to roll back its transaction. The value it had allocated is NOT returned to any kind of queue of values, because other sessions have probably allocated the next few values in the meantime. So it's possible to "lose" values, and then your table has "gaps" or non-consecutive values.
Do not worry about gaps. These could also happen even if the values were consecutive, because you might delete a row later.
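As a rough mental model (a Python toy, not how InnoDB is implemented), think of the counter as a value guarded by a very short lock: every caller gets a distinct number, and a caller that later "rolls back" simply never uses its number. The class name `AutoIncrement` and the rollback rule for workers 0 and 4 are invented for the example.

```python
import threading

class AutoIncrement:
    """Toy counter: a short lock guards allocation, nothing is given back."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def allocate(self):
        # The lock is held only while the value is bumped, not for the
        # duration of the caller's transaction.
        with self._lock:
            value = self._next
            self._next += 1
            return value

counter = AutoIncrement(start=1620200001)
committed = []
committed_lock = threading.Lock()

def place_order(worker):
    order_id = counter.allocate()
    if worker % 4 == 0:
        return                      # "rollback": the id is simply lost
    with committed_lock:
        committed.append(order_id)

threads = [threading.Thread(target=place_order, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

next_value = counter.allocate()
print(len(committed), len(set(committed)), next_value)  # 6 6 1620200009
```

Eight ids were handed out, six were kept, none were duplicated, and the counter kept moving forward regardless of the two rollbacks.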

SQL Table - How to add a row at the start of an old autoincrement column

I have an existing sql table with 3 columns and 100+ entries/rows. There is an id column with autoincrement.
Now, I want to add 10 new rows at the beginning of the table with id from 1 to 10. But I cannot lose any existing row. So, how do I do it?
One idea that just came to my mind is perhaps I can increase the existing id by adding 10, like 1+10 becomes 11, 25+10 becomes 35, and then I can add rows at the beginning. What will be the script for this IF this is possible?
All you need to do for this is to set the auto_increment for that table to whatever number you need to create space for the new records you want to insert.
For example, if you inserted rows with id's 1-100, you might:
Check the next auto_increment value by running:
select auto_increment as val from information_schema.tables where table_schema='myschema' and table_name='mytable';
Let's assume that value would be 101 (the value that would be used if you inserted a new row). You can "advance" the auto_increment value by running:
alter table myschema.mytable auto_increment = 111;
If you insert a new row like this:
insert into mytable (not_the_id_column) values ('test');
It will get the "next" id of 111. But if you specify id values manually, you are ok in this case as long as you use any value less than 111, so you could insert your desired records like this:
insert into mytable (id, not_the_id_column) values (101, 'test101');
insert into mytable (id, not_the_id_column) values (102, 'test102');
... -- more inserts as needed
Now, you still must take proper precautions when updating PK values, or any value that has dependencies on it (Foreign Key or otherwise), but it is completely legitimate to forcibly advance and/or backfill the id values, as long as the resulting auto_increment value doesn't duplicate one that's already in the table.
I agree with juergen d's comment that you should not do this, but I realize there are situations where this kind of thing must be done.
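-- First move every id above the current maximum to avoid collisions
SELECT MAX(id)-MIN(id)+1 INTO @x FROM theTable;
UPDATE theTable SET id = id + @x;
-- Then slide the rows back down so they start at 11, leaving 1-10 free
SELECT MIN(id) INTO @x FROM theTable;
UPDATE theTable SET id = 11 + id - @x;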
If the id is the primary key, value collisions within an update can cause MySQL to reject the update. (Hence the pair of updates to avoid such a possibility.)
Edit: Factoring N.B.'s strong objection into this, it would also probably be good to verify the table's next auto-increment value is not going to collide with the updated records after the update is completed. I don't have an appropriate database on hand to verify whether UPDATE statements affect it; and even if they do affect it, you may end up wanting to reduce it so as to not create an unnecessary gap (gaps should ideally not be a problem, but if they are or you are just mildly OCD, it is worth looking into).
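For anyone who wants to try the two-step shift without touching a real database, here is a small sketch using Python's sqlite3 module (a stand-in for MySQL; the table `the_table` and the `reserve` variable are assumptions). It frees ids 1 to 10 for a five-row table, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO the_table (id, name) VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 6)])   # ids 1..5

reserve = 10  # number of ids to free up at the start

# Step 1: move every id past the current maximum so that no new value
# can collide with a row that has not been moved yet.
(span,) = conn.execute(
    "SELECT MAX(id) - MIN(id) + 1 FROM the_table"
).fetchone()
conn.execute("UPDATE the_table SET id = id + ?", (span,))

# Step 2: slide the block back down so it starts right after the gap.
(lowest,) = conn.execute("SELECT MIN(id) FROM the_table").fetchone()
conn.execute("UPDATE the_table SET id = ? + 1 + id - ?", (reserve, lowest))
conn.commit()

ids = [r[0] for r in conn.execute("SELECT id FROM the_table ORDER BY id")]
print(ids)                          # [11, 12, 13, 14, 15]
```

Any foreign keys pointing at these ids would need the same shift, and the table's next auto-increment value should be checked afterwards.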

SELECT ... FOR UPDATE from one table in multiple threads

I need a little help with SELECT FOR UPDATE (resp. LOCK IN SHARE MODE).
I have a table with around 400 000 records and I need to run two different processing functions on each row.
The table structure is approximately this:
data (
`id`,
`mtime`, -- When was data1 set last
`data1`,
`data2` DEFAULT NULL,
`priority1`,
`priority2`,
PRIMARY KEY `id`,
INDEX (`mtime`),
FOREIGN KEY ON `data2`
)
Functions are a little different:
first function - has to run in loop on all records (is pretty fast), should select records based on priority1; sets data1 and mtime
second function - has to run only once on each records (is pretty slow), should select records based on priority2; sets data1 and mtime
They shouldn't modify the same row at the same time, but the select may return one row in both of them (priority1 and priority2 have different values) and it's okay for transaction to wait if that's the case (and I'd expect that this would be the only case when it'll block).
I'm selecting data based on following queries:
-- For the first function - not processed first, then the oldest,
-- the same age goes based on priority
SELECT id FROM data ORDER BY mtime IS NULL DESC, mtime, priority1 LIMIT 250 FOR UPDATE;
-- For the second function - only rows not yet processed, ordered by priority
SELECT id FROM data WHERE data2 IS NULL ORDER BY priority2 LIMIT 50 FOR UPDATE;
But what I am experiencing is that only one of the queries returns at a time.
So my questions are:
Is it possible to acquire two separate locks in two separate transactions on separate bunch of rows (in the same table)?
Do I have that many collisions between first and second query (I have troubles debugging that, any hint on how to debug SELECT ... FROM (SELECT ...) WHERE ... IN (SELECT) would be appreciated )?
Can ORDER BY ... LIMIT ... cause any issues?
Can indexes and keys cause any issues?
Key things to check for before getting much further:
Ensure the table engine is InnoDB, otherwise "for update" isn't going to lock the row, as there will be no transactions.
Make sure you're using the "for update" feature correctly. If you select something for update, it's locked to that transaction. While other transactions may be able to read the row, it can't be selected for update, updated or deleted by any other transaction until the lock is released by the original locking transaction.
To keep things clean, try explicitly starting a transaction using "START TRANSACTION", run your select "for update", do whatever you're going to do to the records that are returned, and finish up by explicitly executing a "COMMIT" to close out the transaction.
Order and limit will have no impact on the issue you're experiencing as far as I can tell, whatever was going to be returned by the Select will be the rows that get locked.
To answer your questions:
Is it possible to acquire two separate locks in two separate transactions on separate bunch of rows (in the same table)?
Yes, but not on the same rows. A given row can be locked for update by only one transaction at a time.
Do I have that many collisions between first and second query (I have troubles debugging that, any hint on how to debug SELECT ... FROM (SELECT ...) WHERE ... IN (SELECT) would be appreciated )?
There could be a short period where the row lock is being calculated, which will delay the second query. However, unless you're running many hundreds of these SELECT ... FOR UPDATE statements at once, it shouldn't cause you any significant or noticeable delays.
Can ORDER BY ... LIMIT ... cause any issues?
Not in my experience. They should work just as they always would on a normal select statement.
Can indexes and keys cause any issues?
Indexes should exist as always to ensure sufficient performance, but they shouldn't cause any issues with obtaining a lock.
All points in accepted answer seem fine except below 2 points:
"whatever was going to be returned by the Select will be the rows that get locked." &
"Can indexes and keys cause any issues?
but they shouldn't cause any issues with obtaining a lock."
Instead all the rows which are internally read by DB during deciding which rows to select and return will be locked. For example below query will lock all rows of the table but might select and return only few rows:
select * from table where non_primary_non_indexed_column = ? for update
Since there is no index, DB will have to read the entire table to search for your desired row and hence lock entire table.
If you want to lock only one row either you need to specify its primary key or an indexed column in the where clause. Thus indexing becomes very important in case of locking only the appropriate rows.
This is a good reference - https://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html

MySQL AUTO_INCREMENT does not ROLLBACK

I'm using MySQL's AUTO_INCREMENT field and InnoDB to support transactions. I noticed that when I roll back the transaction, the AUTO_INCREMENT field is not rolled back. I found out that it was designed this way, but are there any workarounds to this?
It can't work that way. Consider:
program one, you open a transaction and insert into a table FOO which has an autoinc primary key (arbitrarily, we say it gets 557 for its key value).
Program two starts, it opens a transaction and inserts into table FOO getting 558.
Program two inserts into table BAR which has a column which is a foreign key to FOO. So now the 558 is located in both FOO and BAR.
Program two now commits.
Program three starts and generates a report from table FOO. The 558 record is printed.
After that, program one rolls back.
How does the database reclaim the 557 value? Does it go into FOO and decrement all the other primary keys greater than 557? How does it fix BAR? How does it erase the 558 printed on the report program three output?
Oracle's sequence numbers are also independent of transactions for the same reason.
If you can solve this problem in constant time, I'm sure you can make a lot of money in the database field.
Now, if you have a requirement that your auto-increment field never have gaps (for auditing purposes, say), then you cannot roll back your transactions. Instead you need a status flag on your records. On first insert, the record's status is "Incomplete"; then you start the transaction, do your work, and update the status to "Complete" (or whatever you need). When you commit, the record is live. If the transaction rolls back, the incomplete record is still there for auditing. This will cause you many other headaches, but it is one way to deal with audit trails.
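A rough sketch of that status-flag pattern, using Python's sqlite3 module as a stand-in for MySQL. The `invoices` table, its columns and the `fail` switch are all hypothetical and only there to trigger a rollback.

```python
import sqlite3

# isolation_level=None lets the script issue BEGIN/COMMIT/ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE invoices ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " status TEXT NOT NULL DEFAULT 'Incomplete',"
    " amount REAL)"
)

def create_invoice(amount, fail=False):
    # 1. Reserve the id outside the transaction; in autocommit mode this
    #    row is committed immediately and survives any later rollback.
    invoice_id = conn.execute("INSERT INTO invoices DEFAULT VALUES").lastrowid
    # 2. Do the real work transactionally.
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE invoices SET amount = ?, status = 'Complete' WHERE id = ?",
            (amount, invoice_id),
        )
        if fail:
            raise RuntimeError("simulated failure while saving")
        conn.execute("COMMIT")
    except RuntimeError:
        conn.execute("ROLLBACK")
    return invoice_id

create_invoice(10.0)
create_invoice(20.0, fail=True)
create_invoice(30.0)
rows = conn.execute("SELECT id, status FROM invoices ORDER BY id").fetchall()
print(rows)  # [(1, 'Complete'), (2, 'Incomplete'), (3, 'Complete')]
```

The ids stay consecutive because every allocated id has a row, and the failed attempt remains visible as 'Incomplete'.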
Let me point out something very important:
You should never depend on the numeric features of autogenerated keys.
That is, other than comparing them for equality (=) or inequality (<>), you should not do anything else. No relational operators (<, >), no sorting by indexes, etc. If you need to sort by "date added", have a "date added" column.
Treat them as apples and oranges: Does it make sense to ask if an apple is the same as an orange? Yes. Does it make sense to ask if an apple is larger than an orange? No. (Actually, it does, but you get my point.)
If you stick to this rule, gaps in the continuity of autogenerated indexes will not cause problems.
I had a client who needed the ID to roll back on a table of invoices, where the numbering must be consecutive.
My solution in MySQL was to remove the AUTO-INCREMENT and pull the latest Id from the table, add one (+1) and then insert it manually.
If the table is named "TableA" and the Auto-increment column is "Id"
INSERT INTO TableA (Id, Col2, Col3, Col4, ...)
VALUES (
(SELECT Id FROM TableA t ORDER BY t.Id DESC LIMIT 1)+1,
Col2_Val, Col3_Val, Col4_Val, ...)
Why do you care if it is rolled back? AUTO_INCREMENT key fields are not supposed to have any meaning so you really shouldn't care what value is used.
If you have information you're trying to preserve, perhaps another non-key column is needed.
I do not know of any way to do that. According to the MySQL Documentation, this is expected behavior and will happen with all innodb_autoinc_lock_mode lock modes. The specific text is:
In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are “lost.” Once a value is generated for an auto-increment column, it cannot be rolled back, whether or not the “INSERT-like” statement is completed, and whether or not the containing transaction is rolled back. Such lost values are not reused. Thus, there may be gaps in the values stored in an AUTO_INCREMENT column of a table.
If you set auto_increment to 1 after a rollback or deletion, on the next insert, MySQL will see that 1 is already used and will instead get the MAX() value and add 1 to it.
This will ensure that if the row with the last value is deleted (or the insert is rolled back), it will be reused.
To set the auto_increment to 1, do something like this:
ALTER TABLE tbl auto_increment = 1
This is not as efficient as simply continuing on with the next number because MAX() can be expensive, but if you delete/rollback infrequently and are obsessed with reusing the highest value, then this is a realistic approach.
Be aware that this does not prevent gaps from records deleted in the middle or if another insert should occur prior to you setting auto_increment back to 1.
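The rule described in this answer (a requested value at or below the current maximum is bumped to maximum + 1) can be written as a one-line model. This Python function is only an illustration of that rule, not a MySQL API; the name `effective_auto_increment` is made up.

```python
def effective_auto_increment(requested, existing_ids):
    """Counter in effect after ALTER TABLE ... AUTO_INCREMENT = requested."""
    highest = max(existing_ids, default=0)
    # A request at or below the current maximum is bumped to maximum + 1.
    return max(requested, highest + 1)

# Row 4 was rolled back or deleted at the tail: its id is reused.
after_tail_rollback = effective_auto_increment(1, [1, 2, 3])
# Row 3 was deleted from the middle: the gap at 3 stays.
after_middle_delete = effective_auto_increment(1, [1, 2, 4, 5])
print(after_tail_rollback, after_middle_delete)   # 4 6
```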
INSERT INTO prueba(id)
VALUES (
(SELECT IFNULL( MAX( id ) , 0 )+1 FROM prueba target))
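The IFNULL covers the case where the table has no rows yet, so the first id becomes 1.
The alias target on the subquery is added to avoid the MySQL error about selecting from the same table that is being modified.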
If you need to have the ids assigned in numerical order with no gaps, then you can't use an autoincrement column. You'll need to define a standard integer column and use a stored procedure that calculates the next number in the insert sequence and inserts the record within a transaction. If the insert fails, then the next time the procedure is called it will recalculate the next id.
Having said that, it is a bad idea to rely on ids being in some particular order with no gaps. If you need to preserve ordering, you should probably timestamp the row on insert (and potentially on update).
Concrete answer to this specific dilemma (which I also had) is the following:
1) Create a table that holds different counters for different documents (invoices, receipts, RMAs, etc.). Insert a record for each of your document types and set the initial counter to 0.
2) Before creating a new document, do the following (for invoices, for example):
UPDATE document_counters SET counter = LAST_INSERT_ID(counter + 1) where type = 'invoice'
3) Get the last value that you just updated to, like so:
SELECT LAST_INSERT_ID()
or just use your PHP (or whatever) mysql_insert_id() function to get the same thing
4) Insert your new record along with the primary ID that you just got back from the DB. This will override the current auto increment index, and make sure you have no ID gaps between your records.
This whole thing needs to be wrapped inside a transaction, of course. The beauty of this method is that, when you rollback a transaction, your UPDATE statement from Step 2 will be rolled back, and the counter will not change anymore. Other concurrent transactions will block until the first transaction is either committed or rolled back so they will not have access to either the old counter OR a new one, until all other transactions are finished first.
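Here is a sketch of the counter-table idea that can be run locally. It uses Python's sqlite3 module in place of MySQL; SQLite has no LAST_INSERT_ID(expr), so the new counter value is read back with a SELECT inside the same transaction instead. The `invoices` table and the `fail` switch are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute(
    "CREATE TABLE document_counters (type TEXT PRIMARY KEY, counter INTEGER NOT NULL)"
)
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("INSERT INTO document_counters (type, counter) VALUES ('invoice', 0)")

def create_invoice(customer, fail=False):
    # BEGIN IMMEDIATE takes the write lock up front, so concurrent
    # writers wait here instead of reading a stale counter.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "UPDATE document_counters SET counter = counter + 1 WHERE type = 'invoice'"
        )
        (new_id,) = conn.execute(
            "SELECT counter FROM document_counters WHERE type = 'invoice'"
        ).fetchone()
        conn.execute("INSERT INTO invoices (id, customer) VALUES (?, ?)",
                     (new_id, customer))
        if fail:
            raise RuntimeError("simulated failure")
        conn.execute("COMMIT")
        return new_id
    except RuntimeError:
        conn.execute("ROLLBACK")    # undoes the counter bump as well
        return None

results = [create_invoice("alice"),
           create_invoice("bob", fail=True),
           create_invoice("carol")]
ids = [r[0] for r in conn.execute("SELECT id FROM invoices ORDER BY id")]
print(results, ids)                 # [1, None, 2] [1, 2]
```

The failed attempt leaves no gap because the counter update is rolled back together with the insert. The price is that writers are serialized for the full length of each transaction.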
SOLUTION:
Let's use 'tbl_test' as an example table, and suppose the field 'Id' has AUTO_INCREMENT attribute
CREATE TABLE tbl_test (
Id int NOT NULL AUTO_INCREMENT ,
Name varchar(255) NULL ,
PRIMARY KEY (`Id`)
)
;
Let's suppose that table already has hundreds or thousands of rows inserted and you don't want to use AUTO_INCREMENT anymore, because the AUTO_INCREMENT value for the field 'Id' still increases by 1 even when you roll back a transaction.
So to avoid that you might make this:
Let's remove AUTO_INCREMENT value from column 'Id' (this won't delete your inserted rows):
ALTER TABLE tbl_test MODIFY COLUMN Id int(11) NOT NULL FIRST;
Finally, we create a BEFORE INSERT Trigger to generate an 'Id' value automatically. But using this way won't affect your Id value even if you rollback any transaction.
CREATE TRIGGER trg_tbl_test_1
BEFORE INSERT ON tbl_test
FOR EACH ROW
BEGIN
SET NEW.Id= COALESCE((SELECT MAX(Id) FROM tbl_test),0) + 1;
END;
That's it! You're done!
You're welcome.
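// Insert 10 rows, committing the odd ones and rolling back the even ones.
// Resetting AUTO_INCREMENT after each attempt lets the next insert reuse
// the id that a rollback would otherwise leave as a gap.
// Uses mysqli, since the old mysql_* functions were removed in PHP 7.
$masterConn = mysqli_connect("localhost", "root", "", "sample");
for ($i = 1; $i <= 10; $i++) {
    mysqli_query($masterConn, "START TRANSACTION");
    $qry_insert = "INSERT INTO `customer` (id, `a`, `b`) VALUES (NULL, '$i', 'a')";
    mysqli_query($masterConn, $qry_insert);
    if ($i % 2 == 1) {
        mysqli_query($masterConn, "COMMIT");
    } else {
        mysqli_query($masterConn, "ROLLBACK");
    }
    mysqli_query($masterConn, "ALTER TABLE customer AUTO_INCREMENT = 1");
}
echo "Done";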