MySQL Soft Delete

I have an existing application with a MySQL database.
I just got a new requirement to delete some records from one of the main entities. I don't want to use a hard delete here, as it's risky for the whole application. If I use a soft delete, I have to add another field, is_deleted, and because of that I have to update all my queries (adding something like WHERE is_deleted = '0').
Please let me know if there is a smarter way to handle this situation. Introducing a new flag for deletes means changing about half of my queries.

Your application's queries can keep running without changes. MySQL supports the external level of the ANSI-SPARC architecture through views, and with an external schema you achieve Codd's rule 9, "Logical data independence":
Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.
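You can rename your table and create a view with the original table name. For example, suppose a table named my_data:
RENAME TABLE my_data TO my_data_flagged;
ALTER TABLE my_data_flagged
ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT 0;
CREATE VIEW my_data AS
SELECT *
FROM my_data_flagged
WHERE is_deleted = 0;
Reads through the view now skip flagged rows. A DELETE issued against the view, however, still removes the row from my_data_flagged, so the statements that delete records must become UPDATE my_data_flagged SET is_deleted = 1.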
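A minimal sketch of the rename-plus-view idea, runnable with Python's built-in sqlite3 module so no MySQL server is needed. SQLite spells the rename ALTER TABLE ... RENAME TO instead of MySQL's RENAME TABLE, and SQLite views are read-only, so the sketch only checks that an unchanged read query stops seeing soft-deleted rows. The name column and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO my_data (name) VALUES (?)",
                 [("alpha",), ("beta",), ("gamma",)])

# Step 1: move the real table aside and add the flag column.
conn.execute("ALTER TABLE my_data RENAME TO my_data_flagged")
conn.execute("ALTER TABLE my_data_flagged "
             "ADD COLUMN is_deleted INTEGER NOT NULL DEFAULT 0")

# Step 2: give the old name to a view that hides flagged rows.
conn.execute("CREATE VIEW my_data AS "
             "SELECT * FROM my_data_flagged WHERE is_deleted = 0")

# A soft delete is an UPDATE on the base table, not a DELETE.
conn.execute("UPDATE my_data_flagged SET is_deleted = 1 WHERE name = ?", ("beta",))

# The application's existing query still targets my_data and is unchanged.
visible = [name for (name,) in conn.execute("SELECT name FROM my_data ORDER BY id")]
total = conn.execute("SELECT COUNT(*) FROM my_data_flagged").fetchone()[0]
print(visible, total)  # ['alpha', 'gamma'] 3
```

The flagged row is still physically present (three rows in my_data_flagged); only the view hides it.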
Another way is to create a trigger that copies deleted rows into a separate table.
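A sketch of the trigger-based alternative, again with sqlite3 standing in for MySQL. The archive table my_data_deleted and its deleted_at column are hypothetical names. In MySQL the same trigger would be written as CREATE TRIGGER my_data_archive BEFORE DELETE ON my_data FOR EACH ROW INSERT INTO my_data_deleted (id, name) VALUES (OLD.id, OLD.name);

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_data (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Hypothetical archive table: same columns plus the deletion time.
    CREATE TABLE my_data_deleted (
        id INTEGER,
        name TEXT,
        deleted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Copy each row into the archive just before it is removed.
    CREATE TRIGGER my_data_archive BEFORE DELETE ON my_data
    BEGIN
        INSERT INTO my_data_deleted (id, name) VALUES (OLD.id, OLD.name);
    END;
""")
conn.executemany("INSERT INTO my_data (id, name) VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta")])

# An ordinary hard delete; the trigger keeps a copy.
conn.execute("DELETE FROM my_data WHERE id = 2")

remaining = conn.execute("SELECT id, name FROM my_data").fetchall()
archived = conn.execute("SELECT id, name FROM my_data_deleted").fetchall()
print(remaining, archived)  # [(1, 'alpha')] [(2, 'beta')]
```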

Four suggestions:
Instead of a bit column called is_deleted, use a DATETIME column called something like deleted_Date. Leave it NULL while the record is active, and set it to the deletion timestamp otherwise. This way you also know when a particular record was deleted.
Instead of updating half of your queries to exclude deleted records, create a view that does this filtering, and then update your queries to use this view instead of applying the filtering everywhere.
If the soft deleted records are involved in any type of relationships, you may have to create triggers to ensure that active records can't have a parent that is flagged as deleted.
Think ahead to how you want to eventually hard-delete these soft-deleted records, and make sure that you have the appropriate integrity checks in place before performing the hard-delete.
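A sketch combining the first three suggestions, with sqlite3 in place of MySQL. The parent and child tables, the active_parent view, and the trigger name are hypothetical. SQLite aborts with RAISE(ABORT, ...); a MySQL trigger would use SIGNAL SQLSTATE '45000' instead. Only inserts into child are checked here. A complete version would also need an UPDATE trigger on child and a rule for existing children when a parent is soft-deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Suggestion 1: NULL means active, a timestamp means deleted at that time.
    CREATE TABLE parent (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        deleted_date TEXT NULL
    );
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent (id),
        deleted_date TEXT NULL
    );

    -- Suggestion 2: one view holds the filter instead of every query.
    CREATE VIEW active_parent AS
        SELECT id, name FROM parent WHERE deleted_date IS NULL;

    -- Suggestion 3: an active child may not point at a soft-deleted parent.
    CREATE TRIGGER child_needs_active_parent BEFORE INSERT ON child
    WHEN NEW.deleted_date IS NULL
     AND (SELECT deleted_date FROM parent WHERE id = NEW.parent_id) IS NOT NULL
    BEGIN
        SELECT RAISE(ABORT, 'parent is soft-deleted');
    END;
""")
conn.execute("INSERT INTO parent (id, name) VALUES (1, 'kept'), (2, 'removed')")

# Soft delete: record when it happened instead of removing the row.
conn.execute("UPDATE parent SET deleted_date = CURRENT_TIMESTAMP WHERE id = 2")

active = [name for (name,) in conn.execute("SELECT name FROM active_parent")]

conn.execute("INSERT INTO child (parent_id) VALUES (1)")  # accepted
try:
    conn.execute("INSERT INTO child (parent_id) VALUES (2)")
    blocked = False
except sqlite3.DatabaseError:
    blocked = True

print(active, blocked)  # ['kept'] True
```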

Related

MySQL Trigger to update corresponding column in another table

I'm trying to write a MySQL trigger for a table update (and a similar one for insert) that will take the updated columns and update corresponding columns in another table.
My set-up is this: I have one table (A) with several columns of numerical values and a record number Primary Key. I have another table (B) with identical column names but with short text descriptors that relate to each numerical value and also a record number as a Foreign Key referring to table A. Both of these tables may grow over time to include more columns - always matching each other - each with a simple predictable name (sticking with integers for now). All records are 1:1.
My hope was that I could write triggers for both update and insert on table A that would look at the numbers and, based on some simple logic, assign a descriptor to the corresponding record in table B (inserting that record in the case of the insert trigger). It got rather complicated quickly because I had to query INFORMATION_SCHEMA.COLUMNS to identify all current column names in table A, check each OLD vs NEW to verify that column was updated (for the update trigger anyway), do some logic to determine the appropriate descriptor, then INSERT/UPDATE the corresponding column in table B. I can't figure out how to set up a procedure/trigger that doesn't require storing column names in a variable to dynamically build an SQL statement. This is, of course, not allowed in a trigger and I have made some attempts at getting around this by moving the dynamic SQL statement into a separate stored procedure. None of this has worked and I've run into so many roadblocks, I'm coming to the conclusion that I'm going about this in entirely the wrong way.
Since I'm very new to database design, I just don't know what question to ask at this point other than, is there a better way or alternatively, is there a fix to my approach outlined above?
As always, I've searched thoroughly and not found any questions that answer mine but, if you see one that does, please point me that way!
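For reference, here is a sketch of the static form of the two triggers described in the question, with every column named explicitly. It uses sqlite3 as a stand-in for MySQL. The column names col1 and col2, the threshold of 10, and the 'high'/'low' descriptors are invented placeholders for the "simple logic" in the question. MySQL does not allow dynamic SQL (PREPARE/EXECUTE) inside a trigger, so with this form every new column in A means editing both triggers. Because each descriptor depends only on its own value, the update trigger can recompute every column instead of comparing OLD with NEW.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (rec_id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER);
    CREATE TABLE b (
        rec_id INTEGER PRIMARY KEY REFERENCES a (rec_id),
        col1 TEXT,
        col2 TEXT
    );

    -- Insert trigger: create the 1:1 descriptor row in b.
    CREATE TRIGGER a_after_insert AFTER INSERT ON a
    BEGIN
        INSERT INTO b (rec_id, col1, col2) VALUES (
            NEW.rec_id,
            CASE WHEN NEW.col1 >= 10 THEN 'high' ELSE 'low' END,
            CASE WHEN NEW.col2 >= 10 THEN 'high' ELSE 'low' END
        );
    END;

    -- Update trigger: recompute every descriptor; unchanged values map
    -- to the same descriptor, so no OLD/NEW comparison is needed.
    CREATE TRIGGER a_after_update AFTER UPDATE ON a
    BEGIN
        UPDATE b SET
            col1 = CASE WHEN NEW.col1 >= 10 THEN 'high' ELSE 'low' END,
            col2 = CASE WHEN NEW.col2 >= 10 THEN 'high' ELSE 'low' END
        WHERE rec_id = NEW.rec_id;
    END;
""")

conn.execute("INSERT INTO a (rec_id, col1, col2) VALUES (1, 5, 20)")
after_insert = conn.execute("SELECT col1, col2 FROM b WHERE rec_id = 1").fetchone()

conn.execute("UPDATE a SET col1 = 15 WHERE rec_id = 1")
after_update = conn.execute("SELECT col1, col2 FROM b WHERE rec_id = 1").fetchone()

print(after_insert, after_update)  # ('low', 'high') ('high', 'high')
```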

MySQL - Storing Default Values for System

I have a few tables storing their corresponding records for my system. For example, there could be tables called templates and logos. But for each table, one of the rows will be the default in the system. I would normally have added an is_default column to each table, but every row except one would have been 0.
A colleague of mine sees another route, in which there is a system_defaults table. That table has a column for each table. For example, it would have a template_id column and a logo_id column, and each column stores the corresponding default.
Is one way generally more correct than the other? With the first way, many rows hold the same value except one. With the second, I suppose I just have to do a join to get the details, and the table grows sideways whenever I add a new table that has a default.
The solutions mainly differ in the ways to make sure that no more than one default value is assigned for each table.
is_default solution: Here it may happen that more than one record of a table has the value 1. It depends on the SQL dialect of your database whether this can be excluded by a constraint. As far as I understand MySQL, this kind of constraint can't be expressed there.
Separate table solution: Here you can easily make sure by your table design that at most one default is present per table. By assigning not null constraints, you can also force defaults for specific tables, or not. When you introduce a new table, you are extending your database (and the software working on it) anyway, so the additional attribute on the default table won't hurt.
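A sketch of both designs, with sqlite3 standing in for MySQL and the table contents invented for illustration. For the is_default variant it uses one common workaround for the constraint problem: store 1 for the default row and NULL for every other row, and put a UNIQUE index on the column. A unique index accepts any number of NULLs (in SQLite and in MySQL/InnoDB) but only one 1. It relies on the application never writing 0, since a second 0 would also be rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- is_default variant: 1 marks the default row, NULL marks all others.
    CREATE TABLE templates (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        is_default INTEGER NULL
    );
    CREATE UNIQUE INDEX one_default_template ON templates (is_default);

    CREATE TABLE logos (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Separate-table variant: one row, one column per table with a default.
    -- NOT NULL forces a default template; a logo default stays optional.
    CREATE TABLE system_defaults (
        template_id INTEGER NOT NULL REFERENCES templates (id),
        logo_id INTEGER NULL REFERENCES logos (id)
    );

    INSERT INTO templates VALUES (1, 'plain', 1), (2, 'fancy', NULL), (3, 'dark', NULL);
    INSERT INTO logos VALUES (1, 'main'), (2, 'alt');
    INSERT INTO system_defaults VALUES (1, 2);
""")

# A second is_default = 1 row violates the unique index.
try:
    conn.execute("INSERT INTO templates VALUES (4, 'other', 1)")
    second_default_allowed = True
except sqlite3.IntegrityError:
    second_default_allowed = False

# With the separate table, reading a default is a join.
default_logo = conn.execute("""
    SELECT l.name FROM system_defaults AS d JOIN logos AS l ON l.id = d.logo_id
""").fetchone()[0]

print(second_default_allowed, default_logo)  # False alt
```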
A middle course might be the following: have a table Defaults(id, table_name, row_id) with one record per table, identified by the table name. Technically, the problem of more than one default per table may also occur here. But if you only insert records into this table when a new table gets introduced, then your operative software will only need to perform updates on this table, never inserts. You can easily check this via code inspection.
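A sketch of this middle course, with sqlite3 in place of MySQL. The UNIQUE constraint on table_name is an addition not in the answer; with it, a second default for the same table is rejected by the database rather than only by code inspection. One trade-off: row_id points into a different table on each row, so it cannot carry a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE defaults (
        id INTEGER PRIMARY KEY,
        table_name TEXT NOT NULL UNIQUE,  -- one record per table
        row_id INTEGER NOT NULL           -- no FK: target table varies by row
    );
    CREATE TABLE templates (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO templates VALUES (1, 'plain'), (2, 'fancy');

    -- Inserted once, when the templates table is introduced.
    INSERT INTO defaults (table_name, row_id) VALUES ('templates', 1);
""")

# Operational code only ever updates the existing record.
conn.execute("UPDATE defaults SET row_id = ? WHERE table_name = ?", (2, "templates"))

default_template = conn.execute("""
    SELECT t.name
    FROM defaults AS d JOIN templates AS t ON t.id = d.row_id
    WHERE d.table_name = 'templates'
""").fetchone()[0]

# A stray second insert for the same table is rejected.
try:
    conn.execute("INSERT INTO defaults (table_name, row_id) VALUES ('templates', 1)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(default_template, duplicate_allowed)  # fancy False
```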

How to implement temporal data in MySQL

I currently have a non-temporal MySQL DB and need to change it to a temporal MySQL DB. In other words, I need to be able to retain a history of changes that have been made to a record over time for reporting purposes.
My first thought for implementing this was to simply do inserts into the tables instead of updates, and when I need to select the data, simply doing a GROUP BY on some column and ordering by the timestamp DESC.
However, after thinking about things a bit, I realized that that will really mess things up because the primary key for each insert (which would really just be simulating a number of updates on a single record) will be different and thus mess up any linkage that uses the primary key to link to other records in the DB.
As such, my next thought was to continue updating the main tables in the DB, but also create a new insert into an "audit table" that is simply a copy of the full record after the update, and then when I needed to report on temporal data, I could use the audit table for querying purposes.
Can someone please give me some guidance or links on how to properly do this?
Thank you.
To make a given table R temporal (i.e., to maintain its history), one design is to leave R as it is and create a new table R_Hist with valid_start_time and valid_end_time columns.
Valid time is the time during which a fact is true.
The CRUD operations can then be given as:
INSERT
Insert into R
Insert into R_Hist with valid_start_time as the current time and valid_end_time as infinity
UPDATE
Update in R
Set valid_end_time to the current time on the “latest” R_Hist tuple (the one whose valid_end_time is infinity)
Insert the new values into R_Hist with valid_start_time as the current time and valid_end_time as infinity
DELETE
Delete from R
Set valid_end_time to the current time on the “latest” R_Hist tuple
SELECT
Select from R for ‘snapshot’ queries (implicitly the ‘latest’ timestamp)
Select from R_Hist for temporal operations
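A minimal sketch of the R / R_Hist design, with sqlite3 in place of MySQL. The price column, the explicit now arguments, and the far-future date standing in for "infinity" are assumptions for the demo ('9999-12-31' is also the upper end of MySQL's DATE range). Each operation runs in one transaction so that R and R_Hist cannot drift apart.

```python
import sqlite3

INFINITY = "9999-12-31"  # sentinel for "still valid"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (id INTEGER PRIMARY KEY, price INTEGER NOT NULL);
    CREATE TABLE r_hist (
        id INTEGER NOT NULL,
        price INTEGER NOT NULL,
        valid_start_time TEXT NOT NULL,
        valid_end_time TEXT NOT NULL,
        PRIMARY KEY (id, valid_start_time)
    );
""")

def close_latest(id_, now):
    """End the current history tuple (the one still valid at infinity)."""
    conn.execute("UPDATE r_hist SET valid_end_time = ? "
                 "WHERE id = ? AND valid_end_time = ?", (now, id_, INFINITY))

def insert(id_, price, now):
    with conn:  # commit both statements together, or neither
        conn.execute("INSERT INTO r (id, price) VALUES (?, ?)", (id_, price))
        conn.execute("INSERT INTO r_hist VALUES (?, ?, ?, ?)", (id_, price, now, INFINITY))

def update(id_, price, now):
    with conn:
        conn.execute("UPDATE r SET price = ? WHERE id = ?", (price, id_))
        close_latest(id_, now)
        conn.execute("INSERT INTO r_hist VALUES (?, ?, ?, ?)", (id_, price, now, INFINITY))

def delete(id_, now):
    with conn:
        conn.execute("DELETE FROM r WHERE id = ?", (id_,))
        close_latest(id_, now)

insert(1, 100, "2024-01-01")
update(1, 120, "2024-02-01")
snapshot = conn.execute("SELECT id, price FROM r").fetchall()  # 'snapshot' query
delete(1, "2024-03-01")
history = conn.execute("SELECT price, valid_start_time, valid_end_time "
                       "FROM r_hist ORDER BY valid_start_time").fetchall()
print(snapshot)  # [(1, 120)]
print(history)   # [(100, '2024-01-01', '2024-02-01'), (120, '2024-02-01', '2024-03-01')]
```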
Alternatively, you can design a separate table for every attribute of table R. This design captures temporal data at the attribute level, as opposed to the entity level in the previous design. The CRUD operations are much the same.
I used a Deleted column and a DeletedDate column. Deleted defaults to false and DeletedDate to NULL.
The composite primary key is on IDColumn, Deleted, and DeletedDate. (MySQL requires every primary-key column to be NOT NULL, so in practice active rows need a placeholder DeletedDate rather than NULL.)
You can index Deleted so that queries filtering on it are fast.
There is no duplicate primary key on IDColumn, because the primary key also includes Deleted and DeletedDate.
Assumption: you won't write to the same record more than once per millisecond. If DeletedDate is not unique, you can get a duplicate primary key.
Updates then work like a transaction: select the row, mark it deleted (Deleted = true, DeletedDate = NOW()), and insert a new row with the changed values. Read the row back after the update to get the primary key and any values your API does not otherwise have.
It's not as good as a temporal table and it takes some discipline, but it builds history into one table that is easy to report on.
I may start always populating the DeletedDate column and rename it to something like AddedDeletedDate, in addition to AddedDate, so I can sort records by a single column, while always updating the AddedBy column alongside it for logging's sake.
Either way, you could select CASE WHEN DeletedDate IS NOT NULL THEN DeletedDate ELSE AddedDate END AS AddedDate and ORDER BY AddedDate DESC. In any case, this works.
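A sketch of this single-table scheme, using sqlite3 in place of MySQL. Because MySQL rejects NULL in primary-key columns, the sketch stores a far-future placeholder in DeletedDate for the live row; that placeholder is an assumption, not part of the answer. The item table and value column are hypothetical. The last update shows the stated assumption failing: two changes with the same DeletedDate collide on the primary key and the transaction rolls back. In MySQL the column would need DATETIME(3) to hold milliseconds.

```python
import sqlite3

ACTIVE = "9999-12-31 23:59:59.999"  # placeholder DeletedDate for the live row

conn = sqlite3.connect(":memory:")
conn.execute(f"""
    CREATE TABLE item (
        IDColumn INTEGER NOT NULL,
        Deleted INTEGER NOT NULL DEFAULT 0,
        DeletedDate TEXT NOT NULL DEFAULT '{ACTIVE}',
        value TEXT NOT NULL,
        PRIMARY KEY (IDColumn, Deleted, DeletedDate)
    )
""")
conn.execute("CREATE INDEX item_deleted ON item (Deleted)")

def update_item(item_id, new_value, now):
    """Retire the live row and insert its replacement in one transaction."""
    with conn:
        conn.execute("UPDATE item SET Deleted = 1, DeletedDate = ? "
                     "WHERE IDColumn = ? AND Deleted = 0", (now, item_id))
        conn.execute("INSERT INTO item (IDColumn, value) VALUES (?, ?)",
                     (item_id, new_value))

conn.execute("INSERT INTO item (IDColumn, value) VALUES (1, 'v1')")
update_item(1, "v2", "2024-05-01 10:00:00.000")
update_item(1, "v3", "2024-05-01 10:00:00.001")

# Same millisecond again: the retired row would duplicate an existing key.
try:
    update_item(1, "v4", "2024-05-01 10:00:00.001")
    collided = False
except sqlite3.IntegrityError:
    collided = True

current = conn.execute("SELECT value FROM item "
                       "WHERE IDColumn = 1 AND Deleted = 0").fetchall()
history = conn.execute("SELECT value FROM item WHERE IDColumn = 1 "
                       "ORDER BY DeletedDate").fetchall()
print(current, collided)  # [('v3',)] True
```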

A Never Delete Relational DB Schema Design

I am considering designing a relational DB schema for a DB that never actually deletes anything (sets a deleted flag or something).
1) What metadata columns are typically used to accommodate such an architecture? Obviously a boolean IsDeleted flag can be set. Or maybe just a timestamp in a Deleted column works better, or possibly both. I'm not sure which method will cause me more problems in the long run.
2) How are updates typically handled in such architectures? If you mark the old value as deleted and insert a new one, you will run into PK unique constraint issues (e.g. if you have PK column id, then the new row must have the same id as the one you just marked as invalid, or else all of your foreign keys in other tables for that id will be rendered useless).
If your goal is auditing, I'd create a shadow table for each table you have. Add some triggers that get fired on update and delete and insert a copy of the row into the shadow table.
Here are some additional questions that you'll also want to consider:
How often do deletes occur? What's your performance budget like? This can affect your choices. The answer for your design will differ depending on whether a user deletes a single row (say, an answer on a Q&A site) or records are deleted on an hourly basis from a feed.
How are you going to expose the deleted records in your system? Is it only for administrative purposes, or can any user see deleted records? This makes a difference because you'll probably need to come up with a filtering mechanism that depends on the user.
How will foreign key constraints work. Can one table reference another table where there's a deleted record?
When you add or alter existing tables what happens to the deleted records?
Typically, systems that care a lot about auditing use tables like the ones Steve Prentice mentioned. Such a table often has every field from the original table with all the constraints turned off. It often has an action field to track updates vs. deletes, and includes a date/timestamp of the change along with the user.
For an example, see the PostHistory table at https://data.stackexchange.com/stackoverflow/query/new
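A sketch of such a shadow (audit) table, with sqlite3 standing in for MySQL and the answer/answer_history names invented for the example. Each trigger copies the previous version of the row plus an action label, and changed_at is filled in automatically. The user column is left out because SQLite has no session user; in MySQL it could be filled from USER() or supplied by the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE answer (id INTEGER PRIMARY KEY, body TEXT NOT NULL);

    -- Shadow table: same columns, no constraints, plus audit metadata.
    CREATE TABLE answer_history (
        id INTEGER,
        body TEXT,
        action TEXT NOT NULL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Keep the old version of the row on every update and delete.
    CREATE TRIGGER answer_audit_update AFTER UPDATE ON answer
    BEGIN
        INSERT INTO answer_history (id, body, action)
        VALUES (OLD.id, OLD.body, 'update');
    END;
    CREATE TRIGGER answer_audit_delete AFTER DELETE ON answer
    BEGIN
        INSERT INTO answer_history (id, body, action)
        VALUES (OLD.id, OLD.body, 'delete');
    END;
""")

conn.execute("INSERT INTO answer (id, body) VALUES (1, 'first draft')")
conn.execute("UPDATE answer SET body = 'edited' WHERE id = 1")
conn.execute("DELETE FROM answer WHERE id = 1")

rows = conn.execute("SELECT id, body, action FROM answer_history "
                    "ORDER BY rowid").fetchall()
print(rows)  # [(1, 'first draft', 'update'), (1, 'edited', 'delete')]
```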
I think what you're looking for here is typically referred to as "knowledge dating".
In this case, your primary key would be your regular key plus the knowledge start date.
Your end date might either be null for a current record or an "end of time" sentinel.
On an update, you'd typically set the end date of the current record to "now" and insert a new record that starts at the same "now" with the new values.
On a "delete", you'd just set the end date to "now".
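A sketch of knowledge dating in a single table, with sqlite3 as a stand-in for MySQL. The account table, the balance column, and the '9999-12-31' end-of-time sentinel are illustrative choices. The as-of query is what this layout buys you: it returns what the database knew at any past moment.

```python
import sqlite3

END_OF_TIME = "9999-12-31"  # sentinel end date for the current record

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        account_id INTEGER NOT NULL,
        knowledge_start TEXT NOT NULL,
        knowledge_end TEXT NOT NULL DEFAULT '9999-12-31',
        balance INTEGER NOT NULL,
        PRIMARY KEY (account_id, knowledge_start)  -- regular key + start date
    )
""")

def end_current(account_id, now):
    conn.execute("UPDATE account SET knowledge_end = ? "
                 "WHERE account_id = ? AND knowledge_end = ?",
                 (now, account_id, END_OF_TIME))

def upsert(account_id, balance, now):
    """Update: end the current record and start a new one at the same 'now'."""
    with conn:
        end_current(account_id, now)
        conn.execute("INSERT INTO account (account_id, knowledge_start, balance) "
                     "VALUES (?, ?, ?)", (account_id, now, balance))

def remove(account_id, now):
    """'Delete': just end the current record."""
    with conn:
        end_current(account_id, now)

def known_as_of(account_id, when):
    row = conn.execute("SELECT balance FROM account WHERE account_id = ? "
                       "AND knowledge_start <= ? AND ? < knowledge_end",
                       (account_id, when, when)).fetchone()
    return None if row is None else row[0]

upsert(7, 50, "2024-01-01")
upsert(7, 80, "2024-03-01")
remove(7, "2024-06-01")
print(known_as_of(7, "2024-02-01"), known_as_of(7, "2024-04-01"))  # 50 80
```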
I've done that.
2.a) A version number solves the unique-constraint issue somewhat, although that's really just relaxing the uniqueness, isn't it?
2.b) you can also archive the old versions into another table.

Do I need to lock a MySQL table when doing a SELECT followed by an INSERT?

I'm no database guru, so I'm curious if a table lock is necessary in the following circumstance:
We have a web app that lets users add entries to the database via an HTML form
Each entry a user adds must have a unique URL
The URL should be generated on the fly, by pulling the most recent ID from the database, adding one, and appending it to the newly created entry
The app is running on ExpressionEngine (I only mention this in case it makes my situation easier to understand for those familiar with the EE platform)
Relevant DB columns (exp_channel_titles):
entry_id (primary key, auto_increment)
url_title (must be unique)
My Hypothetical Solution -- is table locking required here?
Let's say there are 100 entries in the table, and each entry in the table has a url_title like entry_1, entry_2, entry_3, etc., all the way to entry_100. Each time a user adds an entry, my script would do something like this:
Query (SELECT) the table to determine the last entry_id and assign it to the variable $last_id
Add 1 to the returned value, and assign the sum to the variable $new_id
INSERT the new entry, setting the url_title field of the latest entry to entry_$new_id (the 101st entry in the table would thus have a url_title of entry_101)
Since my database knowledge is limited, I don't know if I need to worry about locking here. What if a thousand people try to add entries to the database within a 10 second period? Does MySQL automatically handle this, or do I need to lock the table while each new entry is added, to ensure each entry has the correct id?
Running on the MyISAM engine, if that makes a difference.
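The risk in these three steps can be reproduced deterministically. This sqlite3 sketch (standing in for MySQL) plays two requests by hand in one connection: both run the SELECT before either runs its INSERT, so both compute the same url_title, and the unique constraint rejects the second entry. With real concurrent requests, the same interleaving can happen at random.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE exp_channel_titles (
        entry_id INTEGER PRIMARY KEY,  -- SQLite's auto-assigned key
        url_title TEXT UNIQUE
    )
""")
conn.executemany("INSERT INTO exp_channel_titles (url_title) VALUES (?)",
                 [(f"entry_{n}",) for n in range(1, 101)])

def next_url_title():
    # Steps 1 and 2 of the plan: read the last id and add one.
    (last_id,) = conn.execute("SELECT MAX(entry_id) FROM exp_channel_titles").fetchone()
    return f"entry_{last_id + 1}"

# Two requests interleave: both read before either one inserts.
title_a = next_url_title()
title_b = next_url_title()

conn.execute("INSERT INTO exp_channel_titles (url_title) VALUES (?)", (title_a,))
try:
    conn.execute("INSERT INTO exp_channel_titles (url_title) VALUES (?)", (title_b,))
    second_insert_ok = True
except sqlite3.IntegrityError:
    second_insert_ok = False

print(title_a, title_b, second_insert_ok)  # entry_101 entry_101 False
```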
I think you should look at one of two approaches:
Use an AUTO_INCREMENT column to assign the id
Switch from MyISAM to the InnoDB storage engine, which is fully transactional, and wrap your queries in a transaction
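A sketch of the first approach, with sqlite3 standing in for MySQL: insert first, let the database assign entry_id, then derive url_title from the id it returned. In MySQL that value comes from LAST_INSERT_ID() (or the driver's equivalent), which is tracked per connection, so concurrent inserts from other connections do not change it. The follow-up UPDATE is an assumption of this sketch. It briefly leaves url_title NULL, which a UNIQUE column permits. On InnoDB the two statements can share a transaction; MyISAM cannot roll them back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE exp_channel_titles (
        entry_id INTEGER PRIMARY KEY,  -- auto-assigned, like AUTO_INCREMENT
        url_title TEXT UNIQUE          -- NULL allowed until the id is known
    )
""")

def add_entry():
    """Let the database pick the id, then derive url_title from it."""
    with conn:
        cur = conn.execute("INSERT INTO exp_channel_titles (url_title) VALUES (NULL)")
        new_id = cur.lastrowid  # MySQL: SELECT LAST_INSERT_ID()
        conn.execute("UPDATE exp_channel_titles SET url_title = ? WHERE entry_id = ?",
                     (f"entry_{new_id}", new_id))
    return new_id

ids = [add_entry() for _ in range(3)]
titles = [t for (t,) in conn.execute(
    "SELECT url_title FROM exp_channel_titles ORDER BY entry_id")]
print(ids, titles)  # [1, 2, 3] ['entry_1', 'entry_2', 'entry_3']
```

No SELECT MAX(...) is involved, so there is no read-then-write window for two requests to share.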