I know of two ways to delete data from a database table
DELETE it forever
Use a flag like isActive/isDeleted
Now the problem with isActive is that I have to track everywhere in my SQL queries that whether the record is active or not. Using DELETE however gets rid of the data forever.
What would be the best way to backup this data?
Assuming I have multiple tables in a database, should I have a common function which just backs everything up and stores it in another table (in XML probably?) or is there any other way.
I am using MySQL but am curious about techniques used in other DBs as well.
Replace the table with a view that hides the inactive items.
Or write a trigger on DELETE that backs up the row to an archive table.
You could use a trigger that fires on deleting records to back them up into some kind of graveyard table.
You could use an isDeleted column and defien a view which selects all columns except isDeleted with the condition isDeleted=false. Then have all your stps work only with the view.
You could maintain a history table, where you back the record up and time stamp
One of the biggest reasons for not deleting data is that it may be required for a relation - for example the the user may decide to delete an old customer from the database, but you still need the customer record because it is referenced by old invoices (which may have a much longer lifespan).
Based on this the best solution is often the "IsDeleted" type of column, combined with a view (Quassnoi has mentioned partitioning, which can help with performance issues that might pop up due to a lot of invisible data).
You can partition your tables on the DELETED column and define the views which would include the condition:
… AND deleted = 0
This will make the queries over the active data just as simple and efficient.
Well, if you were using SqlServer you can use triggers, which will allow you to move the record to a deleted table.
Related
So, I came up with an idea to store my user information and the updates they make to their own profiles in a way that it is always possible to rollback (as an option to give to the user, for auditing and support purposes, etc.) while at the same time improving (?) the security and prevent malicious activity.
My idea is to store the user's info in rows but never allow the API backend to delete or update those rows, only to insert new ones that should be marked as the "current" data row. I created a graphical explanation:
Schema image
The potential issues that I come up with this model is the fact that users may update the information too frequently, bloating up the database (1 million users and an average of 5 updates per user are 5 million entries). However, for this I came up with the idea of putting apart the rows with "false" in the "current" column through partitioning, where they should not harm the performance and will await to be cleaned up every certain time.
Am I right to choose this model? Is there any other way to do such a thing?
I'd also use a second table user_settings_history.
When a setting is created, INSERT it in the user_settings_history table, along with a timestamp of when it was created. Then also UPDATE the same settings in the user_settings table. There will be one row per user in user_settings, and it will always be the current settings.
So the user_settings would always have the current settings, and the history table would have all prior sets of settings, associated with the date they were created.
This simplifies your queries against the user_settings table. You don't have to modify your queries to filter for the current flag column you described. You just know that the way your app works, the values in user_settings are defined as current.
If you're concerned about the user_settings_history table getting too large, the timestamp column makes it fairly easy to periodically DELETE rows over 180 days old, or whatever number of days seems appropriate to you.
By the way, 5 million rows isn't so large for a MySQL database. You'd want your queries to use an index where appropriate, but the size alone isn't disadvantage.
I have a PHP application which use MySQL database. It has table called profile which store user's details. Now there is a need of keeping snapshot of that profile when he perform a task. Which means whole table row related to a user must be cloned.
I found two ways of doing that.
1) Add another column to table to mention whether it was cloned. Then his original profile can be separated. (original/cloned). Profile data will be maintained in one table.
Other method is ..
2) Add another table similar to profile (with same fields) and store cloned profiles in that. Profile data will be maintained in two tables.
What is most efficient in terms of performance and usability ?
If you have it in one table with just a column/field to distinguish whether it's a clone or original then you will have always double the number of records to handle. Whereas if it is on another table you have only one table to worry about each time unless you need both at one time. Another thing is if you have a separate table you have a virtual back-up for your main table. So, in any case that one is in trouble you have something to fall back on and vice versa. Additionally you don't put yourself in danger of mixing up the records if it is a separate table. In other words I would prefer your second approach rather than the first one.
This is for a CRM application using PHP/MySQL. Various entities like customer, contact, note, etc, can be "deleted" by the user. Rather than actually deleting the entity from the database, I just want it to appear deleted to the application, but be kept in the DB and able to be "restored" if needed at a later time. Maybe even add some kind of "recycle bin" to the app.
I've thought of several ways to do this:
Move the deleted entity to another table. (customer to customer_deleted)
Change an attribute on the entity. (enabled to false)
I'm sure there are other ways and that each have their own implications on DB size, performance, etc, I'm just wondering what's the generally recommended way to do something like this?
I would go a combination of both:
Set a flag deleted to true
Use a cronjob to move the entries after a while to a tabelle of type ARCHIVE
If you need to restore the entry, select into the article table and delete from Archive
Why i would go this way?
If a customer deleted the wrong one, the restore could be done instand
After a few weeks/month the article table may grow up to much, so i would archive all entries that are deleted for 1 week p.a.
A common practice is to set a deleted_at column to the date at which the entity was deleted by the user (defaults to null). You may also include a deleted_by column for marking who deleted it. Using some kind of deleted column makes FK relationships easier to work with since these wont break. By moving the row to a new table you would have to update FK (and then update them again if you ever undelete). The downside is that you have to ensure all your queries exclude deleted rows (where this wouldnt be a problem if you moved the row to a new table). Many ORM's make this filtering easier so it depends on what you are using.
I am a bit rusty with mysql and trying to jump in again..So sorry if this is too easy of a question.
I basically created a data model that has a table called "Master" with required fields of a name and an IDcode and a then a "Details" table with a foreign key of IDcode.
Now here's where its getting tricky..I am entering:
INSERT INTO Details (Name, UpdateDate) Values (name, updateDate)
I get an error: saying IDcode on details doesn't have a default value..so I add one then it complains that Field 'Master_IDcode' doesn't have a default value
It all makes sense but I'm wondering if there's any easy way to do what I am trying to do. I want to add data into details and if no IDcode exists, I want to add an entry into the master table. The problem is I have to first add the name to the fund Master..wait for a unique ID to be generated(for IDcode) then figure that out and add it to my query when I enter the master data. As you can imagine the queries are going to probably get quite long since I have many tables.
Is there an easier way? where everytime I add something it searches by name if a foreign key exists and if not it adds it on all the tables that its linked to? Is there a standard way people do this? I can't imagine with all the complex databases out there people have not figured out a more easier way.
Sorry if this question doesn't make sense. I can add more information if needed.
p.s. this maybe a different question but I have heard of Django for python and that it helps creates queries..would it help my situation?
Thanks so much in advance :-)
(decided to expand on the comments above and put it into an answer)
I suggest creating a set of staging tables in your database (one for each data set/file).
Then use LOAD DATA INFILE (or insert the rows in batches) into those staging tables.
Make sure you drop indexes before the load, and re-create what you need after the data is loaded.
You can then make a single pass over the staging table to create the missing master records. For example, let's say that one of your staging table contains a country code that should be used as a masterID. You could add the master record by doing something along the lines of:
insert
into master_table(country_code)
select distinct s.country_code
from staging_table s
left join master_table m on(s.country_code = m.country_code)
where m.country_code is null;
Then you can proceed and insert the rows into the "real" tables, knowing that all detail rows references a valid master record.
If you need to get reference information along with the data (such as translating some code) you can do this with a simple join. Also, if you want to filter rows by some other table this is now also very easy.
insert
into real_table_x(
key
,colA
,colB
,colC
,computed_column_not_present_in_staging_table
,understandableCode
)
select x.key
,x.colA
,x.colB
,x.colC
,(x.colA + x.colB) / x.colC
,c.understandableCode
from staging_table_x x
join code_translation c on(x.strange_code = c.strange_code);
This approach is a very efficient one and it scales very nicely. Variations of the above are commonly used in the ETL part of data warehouses to load massive amounts of data.
One caveat with MySQL is that it doesn't support hash joins, which is a join mechanism very suitable to fully join two tables. MySQL uses nested loops instead, which mean that you need to index the join columns very carefully.
InnoDB tables with their clustering feature on the primary key can help to make this a bit more efficient.
One last point. When you have the staging data inside the database, it is easy to add some analysis of the data and put aside "bad" rows in a separate table. You can then inspect the data using SQL instead of wading through csv files in yuor editor.
I don't think there's one-step way to do this.
What I do is issue a
INSERT IGNORE (..) values (..)
to the master table, wich will either create the row if it doesn't exist, or do nothing, and then issue a
SELECT id FROM master where someUniqueAttribute = ..
The other option would be stored procedures/triggers, but they are still pretty new in MySQL and I doubt wether this would help performance.
I have a database in which tables have 4 common columns - Createdby,createdon,modifiedby,modifiedon.
Now, the purpose of these columns is track who modified the record.Is there an alternative to this design?
Also, how to update these columns - Should I use triggers? But, for "by" columns I need the userid.(we are using linq-to-sql)
thanks
Assuming that you still want to keep track of who changed and created each row and when, you can either have what you have now or have a separate table just to keep track of the changes, using triggers to write to that table every time that a change has happened on one of the other 4 tables.
Yes, usually triggers are used to update those kinds of columns. The userid shouldn't be a problem when you save the trigger. You may be using Linq but that should not matter because the trigger will be in SQL Server anyway.