A Never Delete Relational DB Schema Design - mysql

I am considering designing a relational DB schema for a DB that never actually deletes anything (sets a deleted flag or something).
1) What metadata columns are typically used to accommodate such an architecture? Obviously a boolean IsDeleted flag can be set. Or maybe a timestamp in a Deleted column works better, or possibly both. I'm not sure which method will cause me more problems in the long run.
2) How are updates typically handled in such architectures? If you mark the old value as deleted and insert a new one, you will run into PK unique constraint issues (e.g. if you have PK column id, then the new row must have the same id as the one you just marked as invalid, or else all of your foreign keys in other tables for that id will be rendered useless).

If your goal is auditing, I'd create a shadow table for each table you have. Add some triggers that get fired on update and delete and insert a copy of the row into the shadow table.
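A minimal sketch of the shadow-table idea, written against SQLite through Python's sqlite3 module so that it runs as-is (MySQL trigger syntax differs slightly). The table answer, the shadow table answer_shadow, and the action and changed_at columns are hypothetical names chosen for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE answer (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);

-- Shadow table: same columns, no constraints, plus audit metadata.
CREATE TABLE answer_shadow (
    id         INTEGER,
    body       TEXT,
    action     TEXT,                           -- 'UPDATE' or 'DELETE'
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Copy the previous version of the row into the shadow table.
CREATE TRIGGER answer_audit_update AFTER UPDATE ON answer
BEGIN
    INSERT INTO answer_shadow (id, body, action)
    VALUES (OLD.id, OLD.body, 'UPDATE');
END;

CREATE TRIGGER answer_audit_delete AFTER DELETE ON answer
BEGIN
    INSERT INTO answer_shadow (id, body, action)
    VALUES (OLD.id, OLD.body, 'DELETE');
END;
"""


def make_db():
    """Create an in-memory database with the table, shadow table and triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = make_db()
conn.execute("INSERT INTO answer (id, body) VALUES (1, 'first draft')")
conn.execute("UPDATE answer SET body = 'second draft' WHERE id = 1")
conn.execute("DELETE FROM answer WHERE id = 1")
history = conn.execute(
    "SELECT id, body, action FROM answer_shadow ORDER BY rowid"
).fetchall()
print(history)
```

The live table stays small and keeps its constraints, while every overwritten or deleted version ends up in the shadow table.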

Here are some additional questions that you'll also want to consider:
How often do deletes occur? What's your performance budget like? This can affect your choices. The right design depends on whether a user is deleting a single row (say, an answer on a Q&A site) or records are being deleted on an hourly basis from a feed.
How are you going to expose the deleted records in your system? Is it only for administrative purposes, or can any user see deleted records? This makes a difference because you'll probably need to come up with a filtering mechanism depending on the user.
How will foreign key constraints work? Can one table reference another table where there's a deleted record?
When you add or alter existing tables, what happens to the deleted records?
Typically, systems that care a lot about auditing use shadow tables, as Steve Prentice mentioned. The shadow table often has every field from the original table with all the constraints turned off. It will often have an action field to track updates vs. deletes, and include a date/timestamp of the change along with the user.
For an example see the PostHistory Table at https://data.stackexchange.com/stackoverflow/query/new

I think what you're looking for here is typically referred to as "knowledge dating".
In this case, your primary key would be your regular key plus the knowledge start date.
Your end date might either be null for a current record or an "end of time" sentinel.
On an update, you'd typically set the end date of the current record to "now" and insert a new record that starts at the same "now" with the new values.
On a "delete", you'd just set the end date to "now".
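The update and delete rules above can be sketched as follows. This is an illustration under assumptions, not a reference implementation: it uses SQLite via Python's sqlite3 module, a hypothetical price table, an "end of time" sentinel rather than NULL, and timestamps passed in as ISO strings so they compare correctly as text.

```python
import sqlite3

END_OF_TIME = "9999-12-31 23:59:59"  # sentinel marking the current version


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE price (
            item_id    INTEGER NOT NULL,
            valid_from TEXT    NOT NULL,   -- knowledge start date
            valid_to   TEXT    NOT NULL,   -- knowledge end date
            amount     INTEGER NOT NULL,
            PRIMARY KEY (item_id, valid_from)
        )""")
    return conn


def _close_current(conn, item_id, now):
    # End-date the current version, if there is one.
    conn.execute(
        "UPDATE price SET valid_to = ? WHERE item_id = ? AND valid_to = ?",
        (now, item_id, END_OF_TIME))


def _open_new(conn, item_id, amount, now):
    conn.execute(
        "INSERT INTO price (item_id, valid_from, valid_to, amount) "
        "VALUES (?, ?, ?, ?)",
        (item_id, now, END_OF_TIME, amount))


def insert(conn, item_id, amount, now):
    with conn:
        _open_new(conn, item_id, amount, now)


def update(conn, item_id, amount, now):
    # Close the old version and open the new one in a single transaction.
    with conn:
        _close_current(conn, item_id, now)
        _open_new(conn, item_id, amount, now)


def delete(conn, item_id, now):
    # A "delete" only sets the end date; no row is removed.
    with conn:
        _close_current(conn, item_id, now)


def current(conn, item_id):
    row = conn.execute(
        "SELECT amount FROM price WHERE item_id = ? AND valid_to = ?",
        (item_id, END_OF_TIME)).fetchone()
    return row[0] if row else None


def as_of(conn, item_id, when):
    # The version that was known at a given moment (start inclusive, end exclusive).
    row = conn.execute(
        "SELECT amount FROM price "
        "WHERE item_id = ? AND valid_from <= ? AND ? < valid_to",
        (item_id, when, when)).fetchone()
    return row[0] if row else None
```

Because the primary key now includes the start date, other tables can only reference item_id as a logical key, not through an ordinary foreign key to a single version.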

I've done that.
2.a) A version number solves the unique constraint issue somewhat, although that's really just relaxing the uniqueness, isn't it?
2.b) you can also archive the old versions into another table.

Related

Foreign key to dynamically changing record

Let's say I have a table with a lot of records, "employee_hour", and another with far fewer records, "turnover_export". employee_hour has a foreign key to turnover_export. However, sometimes the linked turnover_export gets outdated and needs to be updated for a lot of employee_hour records.
How it works currently (this way has become too slow):
For each employee_hour record where the linked turnover_export is outdated, a new and identical employee_hour record is inserted with a link to the updated turnover_export.
How could I design a db/process to handle this as efficiently as possible?
Notable points:
1) Don't want to update the employee_hour records with the new turnover_export (because this means inserting new employee_hour records for each employee_hour record where the linked turnover_export is outdated, -> point 4 of this list)
2) Don't want to use triggers
3) The design should be tied as little as possible to a certain platform/db-engine
4) History should be kept, nothing gets deleted, updates mean new inserts
What I have thought of:
Link via a foreign key that is not an insert id but a UUID, then keep track with another column in turnover_export of whether it is the one that should be used. This doesn't work because a foreign key from a single column cannot reference multiple columns.
I know I haven't thought of or tried much, but honestly I can't think of anything else.
Some context on what these tables are used for:
We want to report on the turnover generated for certain employee hours. We do this by getting an employee_hour record and then multiplying a certain column in this record by a certain column in the linked turnover_export.

MySQL: possible to add a constraint that prevents a one-to-many relation from having fewer than a certain number of relations?

I have a user table that has many rows in, say, a user_property table, where the foreign key user_id is stored in the user_property table.
Now, is it possible to add a constraint so that a user must have at least one user property? So when a user has five properties, he can delete them one by one, but when there is only one property left, he cannot delete it? I tried Googling, but I am not even sure what the search keyword for this is.
The reason is that I want to avoid checking from the application layer whether a user has only one property remaining. The application reads from a replica, so reads and writes might not be synchronized, and under certain conditions the user might accidentally delete all properties.
Any suggestions or different approaches are appreciated.
I don't think you can do this with a constraint. The problem is handling new users. You cannot insert a new user, because it has no properties. You cannot insert a new property, because the user reference is not valid. Ouch!
One solution involves triggers. The idea is the following:
Add to the users table a column for the number of current properties.
Add to the users table a column for the maximum number of properties ever.
Default the two values to 0 for new users.
Add a check constraint (or trigger) that when the maximum is > 0 then the current number has to be > 0.
In any database, you need to implement the first two counts using triggers (on user_property). MySQL did not enforce check constraints before version 8.0.16, so on older versions the last condition also requires a trigger.
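Here is a rough sketch of the counter idea, run against SQLite through Python's sqlite3 module because SQLite enforces CHECK constraints; on a MySQL version that does not, the check would have to move into a trigger as well. The column names cur_props and max_props and the trigger names are assumptions made for this example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,
    name      TEXT    NOT NULL,
    cur_props INTEGER NOT NULL DEFAULT 0,   -- current number of properties
    max_props INTEGER NOT NULL DEFAULT 0,   -- highest number ever reached
    CHECK (max_props = 0 OR cur_props > 0)
);

CREATE TABLE user_property (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    value   TEXT    NOT NULL
);

-- Keep both counters in step with user_property.
CREATE TRIGGER user_property_ai AFTER INSERT ON user_property
BEGIN
    UPDATE users
    SET cur_props = cur_props + 1,
        max_props = MAX(max_props, cur_props + 1)
    WHERE id = NEW.user_id;
END;

CREATE TRIGGER user_property_ad AFTER DELETE ON user_property
BEGIN
    UPDATE users SET cur_props = cur_props - 1 WHERE id = OLD.user_id;
END;
"""


def make_db():
    # isolation_level=None puts the connection in autocommit mode, so a
    # failed statement is rolled back on its own.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


conn = make_db()
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")  # no properties yet: allowed
conn.execute("INSERT INTO user_property (user_id, value) VALUES (1, 'a')")
try:
    # Removing the only property drives cur_props to 0 and violates the CHECK,
    # which aborts the DELETE that fired the trigger.
    conn.execute("DELETE FROM user_property WHERE user_id = 1")
except sqlite3.IntegrityError as exc:
    print("delete rejected:", exc)
```

A brand-new user passes the check because max_props is still 0, which sidesteps the chicken-and-egg problem with inserting users.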
There is no constraint in SQL that does what you describe.
A foreign key constraint would ensure that every row in user_property must reference a row that exists in the user table.
But there is no constraint in SQL that does the reverse: ensure every user is referenced by at least one row in user_property.
A CHECK constraint has been mentioned by some other comments and answers. But a CHECK constraint can reference only columns of the same row. It can't reference other rows of the same table or different tables.
The most straightforward solution is to handle this in application code. That is:
Implement a function that INSERTs to user, while making sure there's also an INSERT of the first row to user_property.
Implement a function that DELETEs from user_property, but first checks whether that would leave zero properties for the given user_id. If so, return an error instead of deleting the user property.
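The two functions might look roughly like this. It is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for MySQL, the function names and LastPropertyError are invented for the example, and the count is taken inside a write transaction on the same connection. Against MySQL the equivalent check would need to run on the primary, not a replica, for example by locking the user's row with SELECT ... FOR UPDATE first.

```python
import sqlite3


class LastPropertyError(Exception):
    """Raised when a delete would leave a user with no properties."""


def make_db():
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE user_property (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            value   TEXT    NOT NULL
        );
    """)
    return conn


def create_user(conn, name, first_property):
    # Insert the user and its first property together, or not at all.
    conn.execute("BEGIN IMMEDIATE")
    try:
        user_id = conn.execute(
            "INSERT INTO users (name) VALUES (?)", (name,)).lastrowid
        conn.execute(
            "INSERT INTO user_property (user_id, value) VALUES (?, ?)",
            (user_id, first_property))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return user_id


def add_property(conn, user_id, value):
    return conn.execute(
        "INSERT INTO user_property (user_id, value) VALUES (?, ?)",
        (user_id, value)).lastrowid


def delete_property(conn, user_id, property_id):
    # Take the write lock before counting so the check and the delete
    # see the same data.
    conn.execute("BEGIN IMMEDIATE")
    try:
        remaining = conn.execute(
            "SELECT COUNT(*) FROM user_property WHERE user_id = ?",
            (user_id,)).fetchone()[0]
        if remaining <= 1:
            raise LastPropertyError(f"user {user_id} must keep one property")
        conn.execute(
            "DELETE FROM user_property WHERE id = ? AND user_id = ?",
            (property_id, user_id))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```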
Implementing such data integrity rules in application code comes with a risk, of course. What if you have multiple apps that access the same database? You need to implement the same rules in different apps. Perhaps even in different programming languages. Sounds like a PITA.
Nevertheless, not all business rules can be implemented with simple SQL declarative constraints.

Merging two table entries with unique columns (MySQL)

I know full well this should never happen. Ever. However, I started working at a company recently that hasn't had the greatest database design or input validation and this situation has come up.
There is a table which we'll call 'jobs'*. Jobs has a primary key, 'ID'. The job with the ID of 1 has loads of data associated with it. However, someone has stupidly duplicated that job as id 2 (this has happened around 500 times so far). All of the information for both needs to be merged as id 1 (or 2, it doesn't matter).
The referencing columns ARE linked by Foreign Key with UPDATE: CASCADE and DELETE: RESTRICT. They are not all called jobs_id.
Is my only (seemingly sensible) option here to:
1) Change id 1 to something I can guarantee is not used (2,147,483,647)
2) Temporarily remove the Foreign Key DELETE: RESTRICT
3) Delete the entry with id 1
4) Update id 2 to 2,147,483,647 (to link it with all the other entries)
5) Change id 2,147,483,647 to id 2
6) Reinstate DELETE: RESTRICT
As none of the code actually performs a delete (the restriction is there just as a fail-safe in case someone edits directly in the DB), and the UPDATE: CASCADE is left in, data shouldn't get out of sync. This does seem messy though.
This will be wrapped in a transaction.
I could write something to iterate through each table (~180) and each column to find certain names / conditions, then update from 1 to 2, but that would need maintenance when a new table / column came along.
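One way to avoid the maintenance burden is to discover the referencing columns from the database's own metadata instead of a hand-written list of names. The sketch below does this for SQLite with PRAGMA foreign_key_list via Python's sqlite3 module; on MySQL the same information is available from information_schema.KEY_COLUMN_USAGE. The function names are hypothetical, the parent key is assumed to be a column called id, and the sketch assumes that repointing rows cannot violate a unique constraint in a child table.

```python
import sqlite3


def referencing_columns(conn, parent_table):
    """Return (table, column) pairs whose foreign key points at parent_table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    refs = []
    for table in tables:
        # Row layout: id, seq, table, from, to, on_update, on_delete, match
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2].lower() == parent_table.lower():
                refs.append((table, fk[3]))
    return refs


def merge_duplicate(conn, parent_table, keep_id, duplicate_id):
    """Repoint every reference from duplicate_id to keep_id, then drop the duplicate."""
    refs = referencing_columns(conn, parent_table)
    with conn:  # one transaction: all tables are repointed or none are
        for table, column in refs:
            conn.execute(
                f'UPDATE "{table}" SET "{column}" = ? WHERE "{column}" = ?',
                (keep_id, duplicate_id))
        # Nothing references the duplicate any more, so DELETE: RESTRICT
        # does not get in the way.
        conn.execute(
            f'DELETE FROM "{parent_table}" WHERE id = ?', (duplicate_id,))
```

Calling merge_duplicate(conn, "jobs", 1, 2) would fold job 2 into job 1. Since the column list is read from the schema at run time, a new table with a foreign key to jobs is picked up without code changes.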
As this has happened a lot, and I don't see a re-write to prevent it happening any time soon, the 'solution' (sticking plaster) needs to be semi-automatic.
* Not the table's real name. His (or her) identity has been disguised so he (or she) doesn't get bullied.
Appreciate any input.
Assuming that you know how to identify the duplicated records why not create a new table with the same structure (maybe without the FKs), then loop through the original while copying values to the new table. When you hit a duplication, fix the value when writing to the new table. Then drop the original and rename the temp to the original.
This will clean up the table but if processes are still making the duplicated entries you could use a unique key to limit the damage going forward.

MySql Soft delete

I have an existing application (with MySQL DB).
I just got a new requirement where I need to delete some records from one of the main entities. I don't want to apply a hard delete here as it's risky for the whole application. If I use a soft delete I have to add another field, is_deleted, and because of that I have to update all my queries (like where is_deleted = '0').
Please let me know if there is any smarter way to handle this situation. I have to make changes in half of the queries if I introduce a new flag to handle deletes.
Your application's existing queries can run without changes. MySQL follows the ANSI-SPARC architecture. With an external schema you achieve Codd's rule 9, "Logical data independence":
Changes to the logical level (tables, columns, rows, and so on) must
not require a change to an application based on the structure. Logical
data independence is more difficult to achieve than physical data
independence.
You can rename your tables and create views with the original table names. A sample:
Let's suppose a table named my_data:
RENAME TABLE my_data TO my_data_flagged;
ALTER TABLE my_data_flagged
  ADD COLUMN is_deleted boolean NOT NULL DEFAULT 0;
CREATE VIEW my_data AS
  SELECT *
  FROM my_data_flagged
  WHERE is_deleted = 0;
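To see the rename-plus-view technique end to end, here is a small runnable sketch using SQLite through Python's sqlite3 module. SQLite spells the rename as ALTER TABLE ... RENAME TO, and the helper names soft_delete and visible_names are made up for the example. One difference worth knowing: SQLite views are read-only, whereas a simple single-table view in MySQL is generally updatable.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE my_data (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO my_data (id, name) VALUES (1, 'keep'), (2, 'remove');

        -- Move the real table aside and put a filtering view in its place.
        ALTER TABLE my_data RENAME TO my_data_flagged;
        ALTER TABLE my_data_flagged
            ADD COLUMN is_deleted INTEGER NOT NULL DEFAULT 0;
        CREATE VIEW my_data AS
            SELECT id, name FROM my_data_flagged WHERE is_deleted = 0;
    """)
    return conn


def soft_delete(conn, row_id):
    # The one place that has to know about the flag.
    with conn:
        conn.execute(
            "UPDATE my_data_flagged SET is_deleted = 1 WHERE id = ?",
            (row_id,))


def visible_names(conn):
    # An unchanged application query: it still reads from "my_data".
    return [row[0] for row in conn.execute(
        "SELECT name FROM my_data ORDER BY id")]


conn = make_db()
soft_delete(conn, 2)
print(visible_names(conn))
```

Read queries keep using the old name; only the code that used to issue DELETE has to switch to setting the flag.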
Another way is to create a trigger and make a copy of the erased rows in an independent table.
Four suggestions:
Instead of using a bit called is_deleted, use a dateTime called something like deleted_Date... have this value be NULL if it is still active, and be a timestamp for the deletion date otherwise. This way you also know when a particular record was deleted.
Instead of updating half of your queries to exclude deleted records, create a view that does this filtering, and then update your queries to use this view instead of applying the filtering everywhere.
If the soft deleted records are involved in any type of relationships, you may have to create triggers to ensure that active records can't have a parent that is flagged as deleted.
Think ahead to how you want to eventually hard-delete these soft-deleted records, and make sure that you have the appropriate integrity checks in place before performing the hard-delete.
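The first three suggestions can be combined in one small sketch. It assumes SQLite via Python's sqlite3 module in place of MySQL, and the tables customer and customer_order, the view active_customer, and the trigger name are all hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    deleted_at TEXT                 -- NULL while the row is active
);

CREATE TABLE customer_order (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)
);

-- Queries read from the view instead of repeating the filter.
CREATE VIEW active_customer AS
    SELECT id, name FROM customer WHERE deleted_at IS NULL;

-- Refuse new child rows whose parent is flagged as deleted.
CREATE TRIGGER customer_order_parent_active
BEFORE INSERT ON customer_order
WHEN (SELECT deleted_at FROM customer WHERE id = NEW.customer_id) IS NOT NULL
BEGIN
    SELECT RAISE(ABORT, 'parent customer is soft-deleted');
END;
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def soft_delete_customer(conn, customer_id):
    # Record when the row was deleted instead of removing it.
    with conn:
        conn.execute(
            "UPDATE customer SET deleted_at = CURRENT_TIMESTAMP "
            "WHERE id = ? AND deleted_at IS NULL",
            (customer_id,))


def add_order(conn, customer_id):
    with conn:
        return conn.execute(
            "INSERT INTO customer_order (customer_id) VALUES (?)",
            (customer_id,)).lastrowid
```

After soft_delete_customer(conn, 1), the customer disappears from active_customer and add_order(conn, 1) fails with a database error, while existing orders are left untouched.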

MySQL PhpMyAdmin: Alter AUTO_INCREMENT and/or INSERT_ID

I have an invoices table which stores a single record for each invoice, with the id column (int AUTO_INCREMENT) being the primary key, but also the invoice reference number.
Now, unfortunately I've had to manually migrate some invoices generated on an old system which have a five-digit id, instead of the four-digit one which the current system uses.
However, even when I reset the AUTO_INCREMENT through PhpMyAdmin (Table Operations) back to the next four-digit id, it still inserts a five-digit one, being the highest id currently in the table plus one.
From searching around, it would seem that I actually need to change the insert_id as well as the AUTO_INCREMENT? I've tried to execute ALTER TABLE invoices SET insert_id=8125 as well as ALTER TABLE invoices insert_id=8125, but neither of these commands seems to be valid.
Can anyone explain the correct way to reset the AUTO_INCREMENT so that it will insert records with ids from 8125 onwards, and then, when it gets to 10962, skip over the four records I've manually added and continue with sequential ids from 10966 onwards? If it won't skip over 10962 - 10966 then this doesn't really matter, as the company doesn't generate that many invoices each year, so this will occur in a subsequent year and hopefully not cause a problem.
I would really appreciate any help with this sticky situation I've found myself in! Many Thanks
First thing I'll suggest is to ditch PHPMyAdmin because it's one of the worst "applications" ever made to be used to work with MySQL. Get a proper GUI. My favourite is SQLYog.
Now on to the problem. Never, ever tamper with the primary key, don't try to "reset" it as you said or to update columns that have an integer generated by the database. As for why, the topic is broad and can be discussed in another question, just never, ever touch the primary key once you've set it up.
Second thing is that someone was deleting records of invoices, hence the autoincrement is now at 10k+ rather than at 8k+. It's not a bad thing, but if you need sequential values for your invoices (such that there can't be a gap between invoices 1 and 5) then use an extra field called sequence_id or invoice_ref and use triggers to calculate that number. Don't rely on the auto_increment feature to reuse numbers that have been lost through DELETE operations.
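A separate reference column filled in by a trigger could look something like the following. This is only a sketch: it runs on SQLite through Python's sqlite3 module, the invoice_counter table and the trigger name are assumptions, and the starting value 8124 is chosen so that the next reference is the 8125 mentioned in the question. In MySQL a trigger cannot update the table it fires on, so the equivalent would be a BEFORE INSERT trigger that sets NEW.invoice_ref.

```python
import sqlite3

SCHEMA = """
CREATE TABLE invoices (
    id          INTEGER PRIMARY KEY,   -- surrogate key, never touched by hand
    invoice_ref INTEGER UNIQUE,        -- the number shown on the invoice
    customer    TEXT NOT NULL
);

-- Single-row table holding the last reference handed out.
CREATE TABLE invoice_counter (
    id       INTEGER PRIMARY KEY CHECK (id = 1),
    last_ref INTEGER NOT NULL
);
INSERT INTO invoice_counter (id, last_ref) VALUES (1, 8124);

-- Assign the next reference unless the row already carries one
-- (as migrated invoices do).
CREATE TRIGGER invoices_assign_ref AFTER INSERT ON invoices
WHEN NEW.invoice_ref IS NULL
BEGIN
    UPDATE invoice_counter SET last_ref = last_ref + 1 WHERE id = 1;
    UPDATE invoices
    SET invoice_ref = (SELECT last_ref FROM invoice_counter WHERE id = 1)
    WHERE id = NEW.id;
END;
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_invoice(conn, customer):
    """Insert an invoice and return the reference the trigger assigned."""
    with conn:
        row_id = conn.execute(
            "INSERT INTO invoices (customer) VALUES (?)",
            (customer,)).lastrowid
        return conn.execute(
            "SELECT invoice_ref FROM invoices WHERE id = ?",
            (row_id,)).fetchone()[0]
```

The UNIQUE constraint means that when the counter eventually reaches a migrated five-digit reference, the insert fails loudly instead of producing a duplicate invoice number; skipping over those values would need extra logic that this sketch leaves out.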
Alternatively, what you can do is export the database you've been using, find the CREATE TABLE definition for the invoices table, find the line where it says "AUTO_INCREMENT = [some number]" and delete that statement. Import into your new database and the auto_increment will continue from the latest invoice. You could do the same using ALTER TABLE; however, it's safer to re-import. Note that InnoDB will not accept an AUTO_INCREMENT value at or below the current maximum id, which is why resetting it to 8125 has no effect while the five-digit rows exist.