SQL history for each row - mysql

I know there are ways to make an audit table in order to get the change history for an entire table in SQL, for example:
Is there a MySQL option/feature to track history of changes to records?
However, I want to know if there is a way to get the change history for a specific row, i.e. a record of edits for row 1 in a table. If there is a way to do this, I would greatly appreciate it. Thanks!

What we have done in the past is have change history tables. The first one would be:
Change History
ch_ID - primary key
Table_Name - name of the table that was changed
Table_PK - the PK value from that table
Type - insert, update, or delete
Change_Date - date of the change
Change_By - who made the change
The second would be:
Change History Details
chd_id - PK
ch_ID - FK to Change History
Column_Name - the column that changed
Old_Value
New_Value
We then used triggers on the table to populate these. You can't rely on stored procedures to capture the info, because DBAs usually don't go through stored procedures when making data changes. You can then query by table name and the primary key of the record you are interested in. You can also add a screen name to the first table so you can get all of the changes made to a record from that screen.
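A minimal sketch of this design in MySQL, assuming a hypothetical orders table with an id PK and a status column being audited (the table and column names here are illustrative, not part of the original answer):

CREATE TABLE change_history (
    ch_id       INT AUTO_INCREMENT PRIMARY KEY,
    table_name  VARCHAR(64) NOT NULL,              -- table the change happened in
    table_pk    INT NOT NULL,                      -- PK of the changed row
    type        ENUM('insert','update','delete') NOT NULL,
    change_date DATETIME NOT NULL,
    change_by   VARCHAR(64) NOT NULL
);

CREATE TABLE change_history_details (
    chd_id      INT AUTO_INCREMENT PRIMARY KEY,
    ch_id       INT NOT NULL,
    column_name VARCHAR(64) NOT NULL,
    old_value   TEXT,
    new_value   TEXT,
    FOREIGN KEY (ch_id) REFERENCES change_history (ch_id)
);

DELIMITER //
CREATE TRIGGER orders_audit_update AFTER UPDATE ON orders
FOR EACH ROW
BEGIN
    INSERT INTO change_history (table_name, table_pk, type, change_date, change_by)
    VALUES ('orders', OLD.id, 'update', NOW(), CURRENT_USER());
    SET @ch_id = LAST_INSERT_ID();

    -- one block like this per audited column
    IF NOT (OLD.status <=> NEW.status) THEN
        INSERT INTO change_history_details (ch_id, column_name, old_value, new_value)
        VALUES (@ch_id, 'status', OLD.status, NEW.status);
    END IF;
END//
DELIMITER ;

You can then pull the full edit record of a single row by joining the two tables on ch_id, filtered by table_name and table_pk.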

I usually suggest having a second history_x table for table x. history_x in this scenario is nearly identical to x; it differs in that its copy of x's primary key is not primary (and not auto-incrementing even if x's is), it has its own primary key, and it sometimes has an additional changed_when datetime field.
Then two triggers are made:
AFTER INSERT ON x basically just clones the newly inserted row of x into history_x
AFTER UPDATE ON x just clones the new state of the row into history_x
How to handle DELETE varies. Often, if you're going as far as to actually delete the x record, the corresponding history records can be deleted with it. If you're just flagging the x record as "retired", that is covered by the UPDATE handling. If you need to preserve the history after a delete, you can just add an x_deleted "flag" field and a DELETE trigger that clones the last state of the row, but sets the x_deleted flag in history to "true".
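As a rough illustration, here is what that pair of triggers might look like for a hypothetical table x with columns id, name and price (a sketch only; widen the column list to match your table):

CREATE TABLE history_x (
    history_id   INT AUTO_INCREMENT PRIMARY KEY,  -- history_x's own PK
    id           INT NOT NULL,                    -- copy of x's PK, not primary here
    name         VARCHAR(100),
    price        DECIMAL(10,2),
    changed_when DATETIME NOT NULL
);

DELIMITER //
CREATE TRIGGER x_after_insert AFTER INSERT ON x
FOR EACH ROW
BEGIN
    INSERT INTO history_x (id, name, price, changed_when)
    VALUES (NEW.id, NEW.name, NEW.price, NOW());
END//

CREATE TRIGGER x_after_update AFTER UPDATE ON x
FOR EACH ROW
BEGIN
    INSERT INTO history_x (id, name, price, changed_when)
    VALUES (NEW.id, NEW.name, NEW.price, NOW());
END//
DELIMITER ;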
Also, this obviously doesn't track PK changes to x, but could if history_x has two copies of x's PK; one would be the historical PK value captured by the triggers with the rest of the fields, and the second would be bound to a foreign key that would cascade all the old history to reference the new key.
Edit: If you can take advantage of the semi-global nature of session/# variables, you can even add information such as who made the change; but connection pooling can often interfere with that (each connection is its own session).
Edit #2/Warning: If you're storing large data such as BLOBs or large TEXT fields, they should probably NOT be cloned on every update.
Oh yeah, the "changed_when" data can also be more useful if expressed as a valid_from and valid_until pair of fields. valid_until should be null for the newest history record, and when a new history record is added the previous newest should have its valid_until field set. changed_when is enough for a log, but if you need to actually use the old values, WHERE ? >= valid_from AND ? < valid_until is a lot easier than WHERE changed_when <= ? ORDER BY changed_when DESC LIMIT 1.
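Concretely, the two query shapes compare like this (a sketch assuming the history_x layout above with valid_from/valid_until columns added; ? is the point in time you care about):

-- Point-in-time lookup with a valid_from/valid_until pair:
SELECT *
FROM history_x
WHERE id = 42
  AND ? >= valid_from
  AND (? < valid_until OR valid_until IS NULL);

-- The same lookup with only changed_when:
SELECT *
FROM history_x
WHERE id = 42
  AND changed_when <= ?
ORDER BY changed_when DESC
LIMIT 1;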

Based on how it sounds to me, what you want to do is use ROW_NUMBER() in a query.
See here for more details.
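For instance, a hedged sketch (MySQL 8.0 or later, and assuming an audit table like the change_history sketch in the first answer) that numbers the changes to one record, newest first:

SELECT ch.*,
       ROW_NUMBER() OVER (ORDER BY ch.change_date DESC) AS rn
FROM change_history AS ch
WHERE ch.table_name = 'orders'
  AND ch.table_pk = 1;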

Related

How to implement temporal data in MySQL

I currently have a non-temporal MySQL DB and need to change it to a temporal MySQL DB. In other words, I need to be able to retain a history of changes that have been made to a record over time for reporting purposes.
My first thought for implementing this was to simply do inserts into the tables instead of updates, and when I need to select the data, simply doing a GROUP BY on some column and ordering by the timestamp DESC.
However, after thinking about it a bit, I realized that this will really mess things up, because the primary key for each insert (which would really just be simulating a number of updates on a single record) will be different and thus break any linkage that uses the primary key to link to other records in the DB.
As such, my next thought was to continue updating the main tables in the DB, but also create a new insert into an "audit table" that is simply a copy of the full record after the update, and then when I needed to report on temporal data, I could use the audit table for querying purposes.
Can someone please give me some guidance or links on how to properly do this?
Thank you.
Make the given table R temporal (i.e., maintain the history).
One design is to leave the table R as it is and create a new table R_Hist with valid_start_time and valid_end_time.
Valid time is the time when the fact is true.
The CRUD operations can be given as:
INSERT
Insert into R
Insert into R_Hist with valid_end_time as infinity
UPDATE
Update in R
Insert into R_Hist with valid_end_time as infinity
Update valid_end_time with the current time for the previous “latest” tuple
DELETE
Delete from R
Update valid_end_time with the current time for the “latest” tuple
SELECT
Select from R for ‘snapshot’ queries (implicitly ‘latest’ timestamp)
Select from R_Hist for temporal operations
Instead, you can choose to design a new table for every attribute of table R. With this particular design you can capture attribute-level temporal data, as opposed to the entity-level data of the previous design. The CRUD operations are almost the same.
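A minimal sketch of the first (entity-level) design, assuming a hypothetical table R(id, name) and using '9999-12-31' as the "infinity" end time (all names are illustrative):

CREATE TABLE R (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE R_Hist (
    id               INT NOT NULL,
    name             VARCHAR(100),
    valid_start_time DATETIME NOT NULL,
    valid_end_time   DATETIME NOT NULL DEFAULT '9999-12-31 00:00:00',
    PRIMARY KEY (id, valid_start_time)
);

-- INSERT: write to both R and R_Hist.
INSERT INTO R (id, name) VALUES (1, 'first name');
INSERT INTO R_Hist (id, name, valid_start_time) VALUES (1, 'first name', NOW());

-- UPDATE: change R, close the current history row, open a new one.
UPDATE R SET name = 'new name' WHERE id = 1;
UPDATE R_Hist SET valid_end_time = NOW()
WHERE id = 1 AND valid_end_time = '9999-12-31 00:00:00';
INSERT INTO R_Hist (id, name, valid_start_time) VALUES (1, 'new name', NOW());

-- DELETE: remove from R and close the current history row.
DELETE FROM R WHERE id = 1;
UPDATE R_Hist SET valid_end_time = NOW()
WHERE id = 1 AND valid_end_time = '9999-12-31 00:00:00';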
I added a Deleted column and a DeletedDate column. Deleted defaults to false and DeletedDate to null.
Composite primary key on IDColumn, Deleted, and DeletedDate.
You can index by Deleted so you get really fast queries.
There is no duplicate primary key on your IDColumn, because the primary key also includes Deleted and DeletedDate.
Assumption: you won't write to the same record more than once per millisecond. A non-unique deleted date could otherwise cause a duplicate primary key.
So then I do a transaction-type deal for updates: select the row, take the results, update the specific values, then insert. Really it's an update setting Deleted to true and DeletedDate to now(); then you have it spit out the row after the update and use that to get the primary key and/or any values not available to whatever API you built.
It's not as good as a temporal table and it takes some discipline, but it builds the history into one table that is easy to report on.
I may start updating the DeletedDate column and changing it to an Added/Deleted date, in addition to the added date, so I can sort records by one column (the Added/Deleted column), while always updating the AddedBy column and just setting it to the same value as the Added/Deleted column for logging's sake.
Either way, you can order by a CASE expression that uses the Added/Deleted date when it is not null and falls back to addedDate otherwise. So, yeah, whatever, this works.
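A rough sketch of that single-table pattern, assuming a hypothetical widgets table. One adaptation here: MySQL does not allow NULL in primary key columns, so a sentinel date stands in for "not deleted" instead of null:

CREATE TABLE widgets (
    IDColumn    INT NOT NULL,
    Name        VARCHAR(100),
    Deleted     BOOLEAN NOT NULL DEFAULT FALSE,
    DeletedDate DATETIME(3) NOT NULL DEFAULT '1000-01-01 00:00:00.000',  -- sentinel for "not deleted"
    PRIMARY KEY (IDColumn, Deleted, DeletedDate),
    KEY idx_deleted (Deleted)
);

-- An "update" retires the current row and inserts the new state:
START TRANSACTION;
UPDATE widgets
   SET Deleted = TRUE, DeletedDate = NOW(3)
 WHERE IDColumn = 1 AND Deleted = FALSE;
INSERT INTO widgets (IDColumn, Name) VALUES (1, 'new name');
COMMIT;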

mysql delete, autoincrement

I have a table in MySQL using InnoDB and a column is there with the name "id".
So my problem is that whenever I delete the last row from the table and then insert a new row, the new row gets an id after the deleted id.
I mean, suppose my id is 32 and I delete it; if I then insert a new row, the id column auto-increments to 33. So the serial sequence is broken, i.e. id = 30, 31, 33 and no 32.
So please help me out: how can I get the id 32 assigned instead of 33 whenever I insert after deleting the last row?
Short answer: No.
Why?
It's unnecessary work. It doesn't matter if there are gaps in the serial numbers.
If you don't want that, don't use auto_increment.
Don't worry, you won't run out of numbers if your column is of type int or even bigint, I promise.
There are reasons why MySQL doesn't automatically decrease the autoincrement value when you delete a row. Those reasons are
danger of broken data integrity (imagine multiple users performing deletes or inserts... doubled entries may occur, or worse)
errors may occur when you use master-slave replication or transactions
and so on ...
I highly recommend you don't waste time on this! It's really, really error prone.
You have two major misunderstandings about how a relational database works:
there is no such thing as the "last row" in a relational database.
The ID (assuming that is your primary key) has no meaning whatsoever. It doesn't matter if the new row is assigned 33, 35354 or 236532652632. It's just a value to uniquely identify that row.
Do not rely on consecutive values in your primary key column.
And do not try the max(id)+1 approach. It will simply not work in a system with more than one transaction.
You should stop fighting this; even using SELECT max(id) will not fix it properly when using a transactional database engine like InnoDB.
Why, you might ask? Imagine that you have two transactions, A and B, that started at almost the same time, both doing an INSERT. Transaction A needs a new row id, and it takes it from the invisible sequence associated with the table (the AUTO_INCREMENT value), say 21. Transaction B will use the next value (say 22) - so far so good.
But, what if transaction A rolls back? Value 21 cannot be reused, and 22 is already committed. And what if there were 10 such transactions?
And max(id) can give the same value to both A and B, so that is not valid either.
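A quick illustration of why the gap appears (a throwaway table named t, used only for this demonstration):

CREATE TABLE t (
    id  INT AUTO_INCREMENT PRIMARY KEY,
    val VARCHAR(10)
) ENGINE=InnoDB;

START TRANSACTION;
INSERT INTO t (val) VALUES ('a');  -- reserves id 1 from the sequence
ROLLBACK;                          -- the reserved value is not handed back

INSERT INTO t (val) VALUES ('b');  -- gets id 2; id 1 is now a permanent gap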
I suppose you mean "whenever I delete the last row from the table", don't you?
Anyway, this is how auto-increment works. It's made to keep data relations correct: if another table references the id of a record that has been deleted, it's more correct to get an error than to get some other record when querying that id.
Anyway here you can see how to get the first free id in a field.

Merging two table entries with unique columns (MySQL)

I know full well this should never happen. Ever. However, I started working at a company recently that hasn't had the greatest database design or input validation and this situation has come up.
There is a table which we'll call 'jobs'*. Jobs has a primary key, 'ID'. The job with ID 1 has loads of data associated with it; however, someone has stupidly duplicated that job as ID 2 (this has happened around 500 times so far). All of the information for both needs to be merged under ID 1 (or 2, it doesn't matter).
The columns ARE linked by Foreign Key with UPDATE: CASCADE and DELETE: RESTRICT. They are not all called jobs_id.
Is my only (seemingly sensible) option here to:
Change id 1 to something I can guarantee is not used (2,147,483,647)
Temporarily remove the Foreign Key DELETE: RESTRICT
Delete the entry with id 1
Update id 2 to 2,147,483,647 (to link it with all the other entries)
Change id 2,147,483,647 to id 2
Reinstate DELETE: RESTRICT
As none of the code actually performs a delete (the restriction is there just as a fail-safe against someone editing directly in the DB), and the UPDATE: CASCADE is left in place, data shouldn't get out of sync. This does seem messy, though.
This will be wrapped in a transaction.
I could write something to iterate through each table (~180) and each column to find certain names / conditions, then update from 1 to 2, but that would need maintenance when a new table / column came along.
As this has happened a lot, and I don't see a re-write to prevent it happening any time soon, the 'solution' (sticking plaster) needs to be semi-automatic.
* Not the table's real name. His (or her) identity has been disguised so he (or she) doesn't get bullied.
Appreciate any input.
Assuming that you know how to identify the duplicated records, why not create a new table with the same structure (maybe without the FKs), then loop through the original while copying values to the new table? When you hit a duplicate, fix the value as you write it to the new table. Then drop the original and rename the temp to the original.
This will clean up the table, but if processes are still creating the duplicated entries, you could add a unique key to limit the damage going forward.
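A hedged sketch of that approach, assuming the duplicates can be identified by a hypothetical job_reference column (names invented for illustration):

CREATE TABLE jobs_clean LIKE jobs;  -- same structure; LIKE does not copy foreign keys

-- Keep one row per duplicate group.
INSERT INTO jobs_clean
SELECT j.*
FROM jobs AS j
JOIN (SELECT MIN(ID) AS keep_id FROM jobs GROUP BY job_reference) AS k
  ON j.ID = k.keep_id;

-- Limit the damage going forward.
ALTER TABLE jobs_clean ADD UNIQUE KEY uq_job_reference (job_reference);

-- The final drop-and-rename, and re-pointing the child tables that still
-- reference the old IDs, is left out here; that part needs the same care
-- the question describes.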

How to properly clean up a table

In order to determine how often some object has been used, I use a table with the following fields:
id - objectID - timestamp
Every time an object is used, its ID and time() are added in. This allows me to determine how often an object has been used in the last hour/minute/second etc.
After one hour, the row is useless (I'm not checking beyond one hour). However, it is my understanding that it is unwise to simply delete the row, because it may mess up the primary key (auto_increment ID).
So I added a field called "active". Prior to checking how often an object has been used, I loop over all rows WHERE active=1 and set active to 0 if more than 1 hour has passed. I don't think this would give any concurrency problems between multiple users, but it leaves me with a lot of unused data.
Now I'm thinking that maybe it's best to, prior to inserting new usage data, check if there is a field with active=0 and then rather than inserting a new row, update that one with the new data, and set active to 1 again. However, this would require table locking to prevent multiple clients from updating the same row.
Can anyone shed some more light on this, please?
I've never heard anywhere that deleting rows messes up primary keys.
Are you perhaps attempting to ensure that the id values automatically assigned by auto_increment match those of another table? This is not necessary - you can simply use an INTEGER PRIMARY KEY as the id column and assign the values explicitly.
You could execute an update query that matches all rows older than 1 hour.
UPDATE table SET active=0 WHERE timestamp < now() - interval 1 hour
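Alternatively, if the rows really are useless after an hour, simply deleting them does not harm the auto_increment key; a sketch using a scheduled event (assuming the table is called usage_log and the event scheduler is enabled):

-- Requires: SET GLOBAL event_scheduler = ON;
CREATE EVENT purge_old_usage
ON SCHEDULE EVERY 10 MINUTE
DO
  DELETE FROM usage_log
  WHERE `timestamp` < NOW() - INTERVAL 1 HOUR;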

A Never Delete Relational DB Schema Design

I am considering designing a relational DB schema for a DB that never actually deletes anything (sets a deleted flag or something).
1) What metadata columns are typically used to accommodate such an architecture? Obviously a boolean flag for IsDeleted can be set. Or maybe just a timestamp in a Deleted column works better, or possibly both. I'm not sure which method will cause me more problems in the long run.
2) How are updates typically handled in such architectures? If you mark the old value as deleted and insert a new one, you will run into PK unique constraint issues (e.g. if you have PK column id, then the new row must have the same id as the one you just marked as invalid, or else all of your foreign keys in other tables for that id will be rendered useless).
If your goal is auditing, I'd create a shadow table for each table you have. Add some triggers that get fired on update and delete and insert a copy of the row into the shadow table.
Here are some additional questions that you'll also want to consider
How often do deletes occur? What's your performance budget like? This can affect your choices: the answer to your design will be different depending on whether a user is deleting a single row (say, an answer on a Q&A site) or you're deleting records from a feed on an hourly basis.
How are you going to expose the deleted records in your system? Is it only for administrative purposes, or can any user see deleted records? This makes a difference because you'll probably need to come up with a filtering mechanism depending on the user.
How will foreign key constraints work? Can one table reference another table where there's a deleted record?
When you add or alter existing tables, what happens to the deleted records?
Typically the systems that care a lot about auditing use shadow tables, as Steve Prentice mentioned. Such a table often has every field from the original table with all the constraints turned off. It often also has an action field to track updates vs. deletes, and includes a date/timestamp of the change along with the user.
For an example see the PostHistory Table at https://data.stackexchange.com/stackoverflow/query/new
I think what you're looking for here is typically referred to as "knowledge dating".
In this case, your primary key would be your regular key plus the knowledge start date.
Your end date might either be null for a current record or an "end of time" sentinel.
On an update, you'd typically set the end date of the current record to "now" and insert a new record that starts at the same "now" with the new values.
On a "delete", you'd just set the end date to "now".
I've done that.
2.a) A version number solves the unique constraint issue somewhat, although that's really just relaxing the uniqueness, isn't it?
2.b) You can also archive the old versions into another table.