sqlalchemy, atomicity, and getting inserted id - sqlalchemy

I needed the answer in this article about how to get the id of a newly inserted database entry:
sqlalchemy flush() and get inserted id?
I am wondering about the atomicity of commits. For instance, suppose that I have committed a new item to the db and then gotten the id back. I now want to do some further processing and maybe add the item id as a foreign key to another table. This breaks atomicity, as I would like to commit to the db only after I have done this extra processing. Doesn't this sound like a problem? I am facing this problem in my project.

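.flush() does not commit anything. It sends the pending INSERT to the database inside the current transaction, so the new id is available to you, but nothing becomes permanent until you call .commit(). The only lasting effect of a .flush() followed by a .rollback() is that the table's AUTO_INCREMENT counter has advanced (e.g., if the last committed record had ID 13 and you .add(), .flush() and .rollback() your new record, the next inserted record will receive ID 15 instead of ID 14).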
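A minimal sketch of the same idea in plain DB-API terms, assuming Python's built-in sqlite3 module as a stand-in for a SQLAlchemy session (cursor.lastrowid plays the role of the id you get after .flush()). The tables item and note and the function add_item_with_note are hypothetical names used only for illustration.

```python
import sqlite3

# Assumption: sqlite3 stands in for the real database; the table and column
# names below are made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
    CREATE TABLE note (id INTEGER PRIMARY KEY AUTOINCREMENT,
                       item_id INTEGER REFERENCES item(id),
                       body TEXT);
""")


def add_item_with_note(conn, name, body, fail=False):
    """Insert an item and a dependent note as one atomic unit."""
    try:
        cur = conn.execute("INSERT INTO item (name) VALUES (?)", (name,))
        item_id = cur.lastrowid  # the id is known before anything is committed
        if fail:
            raise RuntimeError("simulated failure during extra processing")
        conn.execute(
            "INSERT INTO note (item_id, body) VALUES (?, ?)", (item_id, body)
        )
        conn.commit()  # both rows become permanent together
        return item_id
    except Exception:
        conn.rollback()  # neither row survives
        raise
```

Calling add_item_with_note(conn, "b", "y", fail=True) raises, and afterwards neither the item nor the note is in the database, so the foreign key can be filled in without giving up atomicity.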

Related

Reserve a block of auto-increment ids in MySQL

I receive batches of, say, 100 items that I need to insert into three related MySQL tables: say current, recent, and historical. I want to insert each batch in each table as a group in a single insert statement for speed. The current table has an auto-increment primary key id that I need to obtain for each inserted row and use as the primary key to insert the same row in the recent and historical tables. My idea is to get the current auto_increment value for current, increment it by 100 using alter table current AUTO_INCREMENT=, then insert the 100 rows into current with programmatically set ids from the block that I just "reserved". Then I can use the same 100 reserved id values for the inserts into the recent and historical tables without having to query them again from the current table.
My question: Is there some reason that this is a bad idea? I have seen nothing about it on the web. The closest I have seen on stack overflow is Insert into an auto increment field but that is not quite the same thing. I can see possible threading issues if I tried to do this from more than one thread at a time.
I'm also open to other suggestions on how to accomplish this.
There might be concurrency issues: If another connection inserts values between the time you get the current value and you set the new value, you would get duplicate keys.
I am not aware if that can happen in your situation, however, or if the inserts happen only from your batch described above, and there is never another instance of it running in parallel.
Methinks you should decouple the IDs of the 3 tables, and using ALTER TABLE sounds very fishy too.
The most proper way I can think of:
In recent and historical, add a column that references the current ID; don't try to force the primary IDs to be the same.
Acquire a WRITE table lock on current.
Get the auto_increment value X for current.
Insert your 100 records; their IDs should now run from X+1 to X+100.
Release the table lock.
Insert records in recent and historical with the known IDs in the extra column.
Note: I'm not sure if the auto_increment value points to the next ID, or the current highest value. If you use MAX(id) then you should use the steps above.
This is [a bit] late, but in case someone else has this same question (as I did):
As Ethan pointed out in his comment, auto_increment is an internal MySQL utility to produce unique keys. Since you have the ability to generate your own id values external to MySQL, I suggest removing the auto_increment overhead from the table (but keep id as PK, for transport to the other tables). You can then insert your own id values along with the data.
Obviously, once you do this you'll have to program your own incrementing id values. To retrieve a "starting point" for each batch and maintain the speed of a single batch INSERT call, create another table (I'll call it management) with just a single record of last_id, which is equivalent to, but independent of, max(id) of your three primary tables. Then, each time a new batch is ready to be processed, start a transaction on management with a write lock, read management.last_id, UPDATE management.last_id to (last_id+1)+number in batch, then close the transaction. You now have sequential id values to insert that are reserved for that batch, because any future read of management.last_id will return the next-larger set of id values.
The write-locked transaction removes any concurrency issues (as stated in FrankPI's answer) because any other processes attempting to read management must wait for the lock to be removed and will return the value after the UPDATE. This also removes the id ambiguity in JvO's answer: "...IDs should now run from X+1 to X+100", which can be a dangerous assumption.
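The management-table idea can be sketched as follows. This is only an illustration under stated assumptions: Python's sqlite3 stands in for MySQL, BEGIN IMMEDIATE plays the role of the write lock (with InnoDB one would more likely use SELECT ... FOR UPDATE inside a transaction), and reserve_ids is a hypothetical helper name. In this sketch last_id stores the last id already handed out, so the new value is simply last_id plus the batch size.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that the
# BEGIN / COMMIT statements below are issued exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE management (last_id INTEGER NOT NULL)")
conn.execute("INSERT INTO management (last_id) VALUES (0)")


def reserve_ids(conn, batch_size):
    """Reserve batch_size consecutive ids and return them as a range."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        (last_id,) = conn.execute("SELECT last_id FROM management").fetchone()
        conn.execute(
            "UPDATE management SET last_id = ?", (last_id + batch_size,)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    # These ids now belong to the caller; later callers start after them.
    return range(last_id + 1, last_id + batch_size + 1)
```

The returned range can be zipped with the 100 rows of a batch and used for the inserts into current, recent, and historical without querying any of those tables for ids.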

mysql delete,autoincrement

I have a table in MySQL using InnoDB with an auto-increment column named "id".
My problem is that whenever I delete the last row from the table and then insert a new row, the new row gets the id after the deleted one.
For example, suppose the last id is 32. If I delete that row and then insert a new one, the id auto-increments to 33. So the sequence is broken, i.e., id = 30, 31, 33 and no 32.
Please help me assign the id 32 instead of 33 whenever I insert after deleting the last row.
Short answer: No.
Why?
It's unnecessary work. It doesn't matter, if there are gaps in the serial number.
If you don't want that, don't use auto_increment.
Don't worry, you won't run out of numbers if your column is of type int or even bigint, I promise.
There are reasons why MySQL doesn't automatically decrease the autoincrement value when you delete a row. Those reasons are
danger of broken data integrity (imagine multiple users perform deletes or inserts...doubled entries may occur or worse)
errors may occur when you use master slave replication or transactions
and so on ...
I highly recommend you don't waste time on this! It's really, really error prone.
You have two major misunderstandings about how a relational database works:
there is no such thing as the "last row" in a relational database.
The ID (assuming that is your primary key) has no meaning whatsoever. It doesn't matter if the new row is assigned the 33, 35354 or 236532652632. It's just a value to uniquely identify that row.
Do not rely on consecutive values in your primary key column.
And do not try the max(id)+1 approach. It will simply not work in a system with more than one transaction.
You should stop fighting this; even using SELECT max(id) will not fix it properly when using a transactional database engine like InnoDB.
Why, you might ask? Imagine that you have two transactions, A and B, that started at almost the same time, both doing an INSERT. Transaction A needs a new row id first, and takes it from the invisible sequence associated with this table (known as the AUTO_INCREMENT value), say 21. Transaction B then takes the next value (say 22). So far so good.
But, what if transaction A rolls back? Value 21 cannot be reused, and 22 is already committed. And what if there were 10 such transactions?
And max(id)+1 can hand the same value to both A and B, so that is not valid either.
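The collision described above can be reproduced with a small simulation. This is a sketch under assumptions: sqlite3 stands in for MySQL, a single connection imitates the interleaving of two writers (both read before either writes), and the table entry and helper next_id_from_max are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, label TEXT)")
conn.execute("INSERT INTO entry (id, label) VALUES (20, 'existing')")
conn.commit()


def next_id_from_max(conn):
    """The max(id)+1 approach: read the current maximum and add one."""
    (max_id,) = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM entry"
    ).fetchone()
    return max_id + 1


# Writers A and B both compute their id before either has inserted.
id_for_a = next_id_from_max(conn)
id_for_b = next_id_from_max(conn)

conn.execute("INSERT INTO entry (id, label) VALUES (?, ?)", (id_for_a, "A"))
try:
    conn.execute("INSERT INTO entry (id, label) VALUES (?, ?)", (id_for_b, "B"))
    collided = False
except sqlite3.IntegrityError:
    # B was handed the same id as A, so the primary key rejects it.
    collided = True
```

Both writers compute the same id, and the second insert fails on the primary key; without a unique constraint it would silently create a duplicate instead.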
I suppose you mean "Whenever I delete the last row from the table", don't you?
Anyway, this is how auto-increment works. It is designed to keep data relations correct. If another table uses the id of a record that has been deleted, it is more correct to get an error than to get a different record when querying that id.
Anyway here you can see how to get the first free id in a field.

Merging two table entries with unique columns (MySQL)

I know full well this should never happen. Ever. However, I started working at a company recently that hasn't had the greatest database design or input validation and this situation has come up.
There is a table which we'll call 'jobs'*. Jobs has a primary key, 'ID'. The job with the ID of 1 has loads of data associated with it; however, someone has stupidly duplicated that job as id 2 (this has happened around 500 times so far). All of the information for both needs to be merged as id 1 (or 2, it doesn't matter).
The columns ARE linked by Foreign Key with UPDATE: CASCADE and DELETE: RESTRICT. They are not all called jobs_id.
Is my only (seemingly sensible) option here to:
Change id 1 to something I can guarantee is not used (2,147,483,647)
Temporarily remove the Foreign Key DELETE: RESTRICT
Delete the entry with id 1
Update id 2 to 2,147,483,647 (to link it with all the other entries)
Change id 2,147,483,647 to id 2
Reinstate DELETE: RESTRICT
As none of the code actually performs a delete (the restriction is there just as a fail-safe (someone editing direct in DB)), and the update: cascade is left in, data shouldn't get out of sync. This does seem messy though.
This will be wrapped in a transaction.
I could write something to iterate through each table (~180) and each column to find certain names / conditions, then update from 1 to 2, but that would need maintenance when a new table / column came along.
As this has happened a lot, and I don't see a re-write to prevent it happening any time soon, the 'solution' (sticking plaster) needs to be semi-automatic.
* Not the table's real name. His (or her) identity has been disguised so he (or she) doesn't get bullied.
Appreciate any input.
Assuming that you know how to identify the duplicated records, why not create a new table with the same structure (maybe without the FKs), then loop through the original while copying values to the new table? When you hit a duplicate, fix the value when writing to the new table. Then drop the original and rename the temp table to the original name.
This will clean up the table, but if processes are still creating duplicated entries you could use a unique key to limit the damage going forward.
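For the alternative raised in the question (updating every referencing column from one id to the other), the maintenance worry can be reduced by reading the foreign keys from the database catalogue instead of hard-coding table names. The following is only a sketch of that different technique, not the copy-to-a-new-table approach described above. Assumptions: sqlite3 and its PRAGMA foreign_key_list stand in for MySQL (where information_schema.KEY_COLUMN_USAGE would be the place to look), and the tables invoices and visits and the helpers referencing_columns and merge_job are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY,
                           jobs_id INTEGER REFERENCES jobs(id));
    CREATE TABLE visits (id INTEGER PRIMARY KEY,
                         job_ref INTEGER REFERENCES jobs(id));
    INSERT INTO jobs VALUES (1, 'Fit boiler'), (2, 'Fit boiler');
    INSERT INTO invoices VALUES (10, 1), (11, 2);
    INSERT INTO visits VALUES (20, 2);
""")


def referencing_columns(conn, parent):
    """Return (table, column) for every foreign key pointing at parent."""
    found = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # Each row is (id, seq, parent_table, from_column, to_column, ...).
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2] == parent:
                found.append((table, fk[3]))
    return found


def merge_job(conn, keep_id, dup_id):
    """Repoint every reference from dup_id to keep_id, then drop dup_id."""
    with conn:  # one transaction: commit on success, roll back on error
        for table, column in referencing_columns(conn, "jobs"):
            conn.execute(
                f'UPDATE "{table}" SET "{column}" = ? WHERE "{column}" = ?',
                (keep_id, dup_id),
            )
        conn.execute("DELETE FROM jobs WHERE id = ?", (dup_id,))
```

Because the list of columns is discovered at run time, a new table with a foreign key to jobs is picked up without changing the script, regardless of what the column is called.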

A Never Delete Relational DB Schema Design

I am considering designing a relational DB schema for a DB that never actually deletes anything (sets a deleted flag or something).
1) What metadata columns are typically used to accommodate such an architecture? Obviously a boolean flag for IsDeleted can be set. Or maybe just a timestamp in a Deleted column works better, or possibly both. I'm not sure which method will cause me more problems in the long run.
2) How are updates typically handled in such architectures? If you mark the old value as deleted and insert a new one, you will run into PK unique constraint issues (e.g. if you have PK column id, then the new row must have the same id as the one you just marked as invalid, or else all of your foreign keys in other tables for that id will be rendered useless).
If your goal is auditing, I'd create a shadow table for each table you have. Add some triggers that get fired on update and delete and insert a copy of the row into the shadow table.
Here are some additional questions that you'll also want to consider
How often do deletes occur? What's your performance budget like? This can affect your choices. The answer to your design will differ depending on whether a user is deleting a single row (say, an answer on a Q&A site) or records are being deleted from a feed on an hourly basis.
How are you going to expose the deleted records in your system? Is it only for administrative purposes, or can any user see deleted records? This makes a difference because you'll probably need to come up with a filtering mechanism depending on the user.
How will foreign key constraints work? Can one table reference another table where there's a deleted record?
When you add or alter existing tables what happens to the deleted records?
Typically the systems that care a lot about audit use shadow tables, as Steve Prentice mentioned. Such a table often has every field from the original table with all the constraints turned off. It will often have an action field to track updates vs deletes, and include a date/timestamp of the change along with the user.
For an example see the PostHistory Table at https://data.stackexchange.com/stackoverflow/query/new
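A shadow table with triggers might look roughly like this. It is a sketch under assumptions: sqlite3 stands in for the real database, the tables answer and answer_history and the trigger names are hypothetical, and the "user" column mentioned above is left out because SQLite has no notion of a current database user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE answer (id INTEGER PRIMARY KEY, body TEXT NOT NULL);

    -- Shadow table: same fields without constraints, plus audit metadata.
    CREATE TABLE answer_history (
        id INTEGER,
        body TEXT,
        action TEXT,
        changed_at TEXT
    );

    -- Copy the old version of the row whenever it is updated...
    CREATE TRIGGER answer_after_update AFTER UPDATE ON answer
    BEGIN
        INSERT INTO answer_history
        VALUES (OLD.id, OLD.body, 'update', datetime('now'));
    END;

    -- ...or deleted.
    CREATE TRIGGER answer_after_delete AFTER DELETE ON answer
    BEGIN
        INSERT INTO answer_history
        VALUES (OLD.id, OLD.body, 'delete', datetime('now'));
    END;
""")

conn.execute("INSERT INTO answer (id, body) VALUES (1, 'first draft')")
conn.execute("UPDATE answer SET body = 'second draft' WHERE id = 1")
conn.execute("DELETE FROM answer WHERE id = 1")
conn.commit()

history = conn.execute(
    "SELECT id, body, action FROM answer_history ORDER BY rowid"
).fetchall()
```

After these three statements the live table is empty, while history holds the 'first draft' row tagged 'update' followed by the 'second draft' row tagged 'delete'.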
I think what you're looking for here is typically referred to as "knowledge dating".
In this case, your primary key would be your regular key plus the knowledge start date.
Your end date might either be null for a current record or an "end of time" sentinel.
On an update, you'd typically set the end date of the current record to "now" and insert a new record that starts at the same "now" with the new values.
On a "delete", you'd just set the end date to "now".
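The update and delete rules above can be sketched as follows. Assumptions: sqlite3 as the database, ISO-8601 strings as timestamps (so they compare correctly as text), an "end of time" sentinel rather than NULL, and a hypothetical price table with hypothetical helper functions.

```python
import sqlite3

END_OF_TIME = "9999-12-31T00:00:00"  # hypothetical sentinel for "current"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price (
        item_id    INTEGER NOT NULL,
        valid_from TEXT    NOT NULL,
        valid_to   TEXT    NOT NULL,
        amount     INTEGER NOT NULL,
        PRIMARY KEY (item_id, valid_from)  -- regular key plus start date
    )
""")


def close_current(conn, item_id, now):
    """Set the end date of the current record to now."""
    conn.execute(
        "UPDATE price SET valid_to = ? WHERE item_id = ? AND valid_to = ?",
        (now, item_id, END_OF_TIME),
    )


def set_amount(conn, item_id, amount, now):
    """Insert or update: close the current record, start a new one at now."""
    with conn:  # both statements commit together or not at all
        close_current(conn, item_id, now)
        conn.execute(
            "INSERT INTO price VALUES (?, ?, ?, ?)",
            (item_id, now, END_OF_TIME, amount),
        )


def delete_item(conn, item_id, now):
    """A 'delete' only ends the current record; no row is removed."""
    with conn:
        close_current(conn, item_id, now)


def amount_as_of(conn, item_id, when):
    """Return the amount that was current at the given time, or None."""
    row = conn.execute(
        "SELECT amount FROM price "
        "WHERE item_id = ? AND valid_from <= ? AND ? < valid_to",
        (item_id, when, when),
    ).fetchone()
    return row[0] if row else None
```

Foreign keys in other tables can keep pointing at item_id, since that value never changes; only the validity interval distinguishes the versions.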
I've done that.
2.a) version number solves the unique constraint issue somewhat although that's really just relaxing the uniqueness isn't it.
2.b) you can also archive the old versions into another table.

Do I need to lock a MySQL table when doing a SELECT followed by an INSERT?

I'm no database guru, so I'm curious if a table lock is necessary in the following circumstance:
We have a web app that lets users add entries to the database via an HTML form
Each entry a user adds must have a unique URL
The URL should be generated on the fly, by pulling the most recent ID from the database, adding one, and appending it to the newly created entry
The app is running on ExpressionEngine (I only mention this in case it makes my situation easier to understand for those familiar with the EE platform)
Relevant DB Columns
(exp_channel_titles)
entry_id (primary key, auto_increment)
url_title (must be unique)
My Hypothetical Solution -- is table locking required here?
Let's say there are 100 entries in the table, and each entry in the table has a url_title like entry_1, entry_2, entry_3, etc., all the way to entry_100. Each time a user adds an entry, my script would do something like this:
Query (SELECT) the table to determine the last entry_id and assign it to the variable $last_id
Add 1 to the returned value, and assign the sum to the variable $new_id
INSERT the new entry, setting the url_title field of the latest entry to entry_$new_id (the 101st entry in the table would thus have a url_title of entry_101)
Since my database knowledge is limited, I don't know if I need to worry about locking here. What if a thousand people try to add entries to the database within a 10 second period? Does MySQL automatically handle this, or do I need to lock the table while each new entry is added, to ensure each entry has the correct id?
Running on the MyISAM engine, if that makes a difference.
I think you should look at one of two approaches:
Use an AUTO_INCREMENT column to assign the id
Switch from MyISAM to the InnoDB storage engine, which is fully transactional, and wrap your queries in a transaction
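Combining the two suggestions, one way to avoid the SELECT-then-INSERT race entirely is to let the database assign the id first and derive url_title from it inside the same transaction. A minimal sketch, assuming sqlite3 as a stand-in for MySQL (where LAST_INSERT_ID() is the per-connection equivalent of cursor.lastrowid), a heavily simplified exp_channel_titles table, and a hypothetical add_entry helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE exp_channel_titles (
        entry_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        url_title TEXT UNIQUE
    )
""")


def add_entry(conn):
    """Insert a row, then name it after the id the database assigned."""
    with conn:  # commit on success, roll back on any error
        cur = conn.execute(
            "INSERT INTO exp_channel_titles (url_title) VALUES (NULL)"
        )
        entry_id = cur.lastrowid  # assigned by the database, never guessed
        url_title = f"entry_{entry_id}"
        conn.execute(
            "UPDATE exp_channel_titles SET url_title = ? WHERE entry_id = ?",
            (url_title, entry_id),
        )
    return url_title
```

No table lock is needed with this shape, because no writer ever has to predict what the next id will be.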