I have an app where you can select a person from a list to view more details. Each time a person is selected, I want to record that in a recent history table. I'm not sure of the best way to keep, say, the latest 10 person selections.
I know I need to create the history table but am wondering if I should just do an insert for each person click and select only 10 results with the most recent dates or if I should worry about just updating only 10 records to keep the row count low?
Any input would be appreciated.
I would update, otherwise you keep adding more and more data which you are not going to use. Maybe you won't run into problems with this specific case (because people won't select tens of thousands of persons a day), but in general you should be careful with just adding data without cleaning it up.
That said, the first step is the same anyway:
Adding or updating the person
So, if someone selects 'Bob', I would try to update 'Bob' in the history and set his lastselecteddate to now. If that fails, then insert 'Bob', again with the current timestamp.
Cleaning up the older history
After that you can decide whether or not to clean up old history. You could delete all rows but the newest ten, or you can keep a hundred, or not clean them at all. Also, instead of cleaning them up right away, you could make a job that does this once every day.
If you clean up right away, you can skip the cleanup whenever you only updated an existing person. After all, if you didn't insert a row, there is nothing new to clean up.
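The update-then-insert and cleanup steps described above could be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name person_history, its columns, and the KEEP limit are assumptions, not anything from the question.

```python
import sqlite3

KEEP = 10  # assumed limit: how many recent selections to retain


def make_db():
    # Hypothetical schema: one row per person, with the time of the last selection.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person_history ("
        " person_id INTEGER PRIMARY KEY,"
        " last_selected INTEGER NOT NULL)"
    )
    return conn


def record_selection(conn, person_id, now):
    """Try an UPDATE first; if no row matched, INSERT and trim old rows."""
    with conn:  # run the statements in one transaction
        cur = conn.execute(
            "UPDATE person_history SET last_selected = ? WHERE person_id = ?",
            (now, person_id),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO person_history (person_id, last_selected)"
                " VALUES (?, ?)",
                (person_id, now),
            )
            # Only an insert can grow the table, so only clean up here.
            conn.execute(
                "DELETE FROM person_history WHERE person_id NOT IN ("
                " SELECT person_id FROM person_history"
                " ORDER BY last_selected DESC LIMIT ?)",
                (KEEP,),
            )


def recent(conn):
    """Return person ids, most recently selected first."""
    rows = conn.execute(
        "SELECT person_id FROM person_history ORDER BY last_selected DESC"
    )
    return [r[0] for r in rows]
```

In MySQL itself the first step is often written as INSERT ... ON DUPLICATE KEY UPDATE, and the trimming DELETE would probably need its subquery wrapped in a derived table, because MySQL restricts LIMIT inside IN subqueries.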
So, I came up with an idea to store my user information and the updates they make to their own profiles in a way that it is always possible to rollback (as an option to give to the user, for auditing and support purposes, etc.) while at the same time improving (?) the security and prevent malicious activity.
My idea is to store the user's info in rows but never allow the API backend to delete or update those rows, only to insert new ones that should be marked as the "current" data row. I created a graphical explanation:
Schema image
The potential issue I see with this model is that users may update their information too frequently, bloating the database (1 million users with an average of 5 updates per user means 5 million entries). To address this, I thought of using partitioning to separate the rows with "false" in the "current" column, so that they do not harm performance while they wait to be cleaned up at regular intervals.
Am I right to choose this model? Is there any other way to do such a thing?
I'd also use a second table user_settings_history.
When a setting is created, INSERT it in the user_settings_history table, along with a timestamp of when it was created. Then also UPDATE the same settings in the user_settings table. There will be one row per user in user_settings, and it will always be the current settings.
So the user_settings would always have the current settings, and the history table would have all prior sets of settings, associated with the date they were created.
This simplifies your queries against the user_settings table. You don't have to modify your queries to filter for the current flag column you described. You just know that the way your app works, the values in user_settings are defined as current.
If you're concerned about the user_settings_history table getting too large, the timestamp column makes it fairly easy to periodically DELETE rows over 180 days old, or whatever number of days seems appropriate to you.
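A minimal sketch of that two-table layout, assuming Python's sqlite3 module as a stand-in for MySQL. The email column, the Unix-seconds timestamps, and the function names are made up for illustration.

```python
import sqlite3

SECONDS_PER_DAY = 86400


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user_settings (
            user_id INTEGER PRIMARY KEY,   -- exactly one current row per user
            email   TEXT NOT NULL
        );
        CREATE TABLE user_settings_history (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id    INTEGER NOT NULL,
            email      TEXT NOT NULL,
            created_at INTEGER NOT NULL    -- Unix seconds (assumed format)
        );
        CREATE INDEX idx_history_created
            ON user_settings_history (created_at);
        """
    )
    return conn


def save_settings(conn, user_id, email, now):
    """Append to the history table, then overwrite the current row."""
    with conn:
        conn.execute(
            "INSERT INTO user_settings_history (user_id, email, created_at)"
            " VALUES (?, ?, ?)",
            (user_id, email, now),
        )
        # SQLite spelling; MySQL has INSERT ... ON DUPLICATE KEY UPDATE.
        conn.execute(
            "INSERT OR REPLACE INTO user_settings (user_id, email)"
            " VALUES (?, ?)",
            (user_id, email),
        )


def purge_history(conn, now, max_age_days=180):
    """Delete history rows older than max_age_days; return how many went."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    with conn:
        cur = conn.execute(
            "DELETE FROM user_settings_history WHERE created_at < ?",
            (cutoff,),
        )
    return cur.rowcount
```

Queries for the current settings then read user_settings directly, with no "current" flag to filter on.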
By the way, 5 million rows isn't so large for a MySQL database. You'd want your queries to use an index where appropriate, but the size alone isn't a disadvantage.
I am looking for a (not too convoluted) solution for a MySQL problem. Say I have the following table (with a joint index on group and item):
group     item
nogroup   item_a
group_a   item_a
Then, eventually, item_a no longer belongs to group_a. So I want to do something like:
update table set group = "nogroup" where item = "item_a" on duplicate key delete.
(Obviously this is not valid syntax, but I am looking for a way around this.)
I still want to keep a copy of the record with nogroup because if item_a comes back later, I can change its group back to group_a or any other group, depending on the case. Whenever item_a is added, there is an insert that copies all the data from the nogroup record and sets a proper group label. At that point there are two records for item_a: one with group_a and one with nogroup. It is done this way to reuse previous data as much as possible, since a new entry (with no previous record) is much more involved and takes significantly more time and processing.
Say an item belongs to group_a and group_b but suddenly it does not belong to any group: the first update to set group to "nogroup" will work but the second update will create a duplicate key entry error.
The option of "not updating the group column at all" and using "insert on duplicate key update" does not work because there won't be duplicates when the groups are different and this will lead to cases where an item does not belong to a group anymore and yet the record will still be present in the database. The option of verifying if "nogroup" exists first and then updating it to a specific group does not work either because if item_a belongs to more than one group this would update all other records to the same group.
Basically, an item can belong to 1) any number of groups including "nogroup" or 2) solely belonging to "nogroup" and there should always be a copy of at least nogroup somewhere in the database.
It looks like I won't be able to do this in just one query but if someone has a clean way of dealing with this, that would be much appreciated. Maybe some of my assumptions above are wrong and there is an easy way to do it.
Your whole process of maintaining this items-to-groups mapping sounds too complicated. Why not just have a table that has a mapping? Then, when an item is removed from a group, delete it from the table. When it is added, add it to the table. Don't bother with "nogroup".
If you want an archive table, then create one. Have an insert/update/delete trigger (whichever is or are appropriate) that will populate an archive with information that you want to keep over time.
I do not understand why re-using an existing row would be beneficial in terms of performance. There is no obvious database reason why this would be the case.
I am also confused as to why you need a "nogroup" tag at all. If you need a list of items, maintain that list in its own table. And call the table Items -- a much clearer name than "nogroup".
I agree with Gordan's approach. However, if you have to do it with a single table, it cannot be done in one SQL query. You will have to use two queries: one for the update and one for the delete.
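One possible shape for that update-plus-delete pair, sketched with Python's sqlite3 module standing in for MySQL. The table name item_groups and the column name grp (used because group is a reserved word) are assumptions. The update skips any row that would collide with an existing nogroup record, and the delete then removes whatever the update skipped.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item_groups ("
        " grp  TEXT NOT NULL,"
        " item TEXT NOT NULL,"
        " UNIQUE (grp, item))"
    )
    return conn


def remove_from_group(conn, item, grp):
    """Detach item from grp while keeping exactly one 'nogroup' row."""
    with conn:  # both statements in one transaction
        # Query 1: relabel the row, unless that would duplicate an existing
        # ('nogroup', item) row, in which case the row is left untouched.
        # SQLite spells this UPDATE OR IGNORE; MySQL has UPDATE IGNORE.
        conn.execute(
            "UPDATE OR IGNORE item_groups SET grp = 'nogroup'"
            " WHERE item = ? AND grp = ?",
            (item, grp),
        )
        # Query 2: if the update was skipped, the old row is still there.
        conn.execute(
            "DELETE FROM item_groups WHERE item = ? AND grp = ?",
            (item, grp),
        )


def rows(conn):
    return sorted(conn.execute("SELECT grp, item FROM item_groups"))
```

If the update succeeds, the delete matches nothing; if it is skipped because of the duplicate key, the delete removes the leftover row. Either way one nogroup copy survives.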
OK, consider this scenario: I have a table in which users may modify many rows at the same time. While someone is modifying a record, I want other people to be unable to modify the same record. For example, there's table1 with 3 columns (ID, text, flag):
ID-text-flag
11-txt1-0
12-txt2-0
13-txt3-0
14-txt4-0
I did some research about concurrency control in MySQL. The first solution people suggest is to use SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE. However, this solution has a limitation: you have to update right after you select a record, within the same transaction. For example:
SELECT text FROM table1 where ID<=20 FOR UPDATE;
UPDATE table1 SET text = 'new text' where ID<=20;
However, my application requires users to download the data to the GUI, and they may then spend many hours working on that data before they commit the update.
A second solution is MySQL's row lock mechanism, but this requires InnoDB and its overhead is quite high.
Another option is to use table lock in mysql, the overhead is quite low, but we can't lock the whole table for many hours right? For example, user A may modify the record 1 to 10, user B at the same time may need to modify record 11 to 20, so the user B shouldn't wait until the user A finishes modifying.
My idea is to have a "flag" column. When user A is modifying some records, the flag of those records turns to 1; if user B wants to modify the same records, the system pops up a message saying someone is modifying them. When user A finishes, the flag goes back to 0, and user B can then modify those same records.
But there is a complication with this solution: what if user A forgets to save the data? Does the flag then stay at 1 forever? Should the flag expire after a few hours, and how would I do that?
Maybe we need a timestamp or some mechanism to turn the flag back to 0 after a certain time? But this is more complicated than I thought. Surely we can't have the DB check the timestamp of the flag every hour?
I am not sure whether this is the most elegant solution. I have no commercial experience in DB design, and I want to know how DB people manage this issue in a commercial environment.
Can you find a better and more elegant solution than all of the above?
As Zerkms suggested, we do not need a flag at all, but we need to build a "record comparing system" in our application.
Say any user can download data for updating. After he modifies a record and before he actually updates the DB, the record comparing system checks the ID and the old text (the text as originally downloaded, not the text the user has just modified) against the current text for the same ID in the DB. If they are the same, he can commit the update; if they differ, someone else updated the record before him, so the system will not allow the update. He then has to download the data again and start over.
Since it is very rare for 2 users to update the same records at the same time, I think this solution is quite feasible, easy to implement, and adds no overhead. This is called optimistic concurrency control; because we build the record comparing system at the application level, no database locking overhead is involved. I think this is the most elegant solution for my application's needs.
Do you think so?
Note: for this system, we can use an auto-increment ID (for the case of inserting new records).
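The comparison can be folded into the UPDATE itself by putting the old text in the WHERE clause, so that the check and the write happen in one statement. Below is a rough sketch under that assumption, using Python's sqlite3 module in place of MySQL; the function name try_update is made up.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO table1 (id, text) VALUES (?, ?)",
        [(11, "txt1"), (12, "txt2"), (13, "txt3"), (14, "txt4")],
    )
    conn.commit()
    return conn


def try_update(conn, row_id, old_text, new_text):
    """Write new_text only if the row still holds the text the user downloaded.

    Returns True on success, False if someone else changed the row first.
    """
    with conn:
        cur = conn.execute(
            "UPDATE table1 SET text = ? WHERE id = ? AND text = ?",
            (new_text, row_id, old_text),
        )
    # One affected row means the stored text still matched old_text.
    return cur.rowcount == 1
```

Usage: if users A and B both download row 11 with text 'txt1', whoever calls try_update first succeeds and the second call returns False, at which point the application asks that user to reload. A dedicated version or last-modified column is a common alternative to comparing the full text. Note that MySQL by default reports rows actually changed rather than rows matched, so saving an unchanged value would look like a conflict there.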
I was wondering what would be the best solution to dynamically archive rows. For instance when a user marks a task as completed, that task needs to be archived yet still accessible.
What would be the best practice for achieving this? Should I just leave it all in the same table and leave completed tasks out of the queries? I'm afraid that over time the table will become huge (1,000,000 rows in a year or less). Or should I create another table, e.g. task_archive, and query that table whenever data is needed from it?
I know similar questions have been asked before, but most of them were about archiving thousands of rows simultaneously. I just need to know what would be the best method (and why) to archive 1 row at a time once it's been marked completed.
For speed and ease of use, I would generally leave the row in the same table (and flag it as completed) and then later move it to an archive table. This way the user doesn't incur the delay of making that move on the spot; the move can happen as a batch process during non-busy periods.
When that move should happen depends on your application. For example, if they have a dashboard widget that shows "Recently Completed Tasks" that shows all of the tasks completed in the past week (and lets them drill in to see details), it might make sense to move the rows to the archive a week after they've been completed. Or if they frequently need to look at tasks from the current semester (for an academic app) but rarely for previous semesters, make the batch move happen at the end of the semester.
If the table is indexed, 1,000,000 rows shouldn't be that big a deal, honestly.
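The flag-now, move-later approach might look something like the sketch below. It assumes Python's sqlite3 module as a stand-in for MySQL, and the table names task and task_archive, the completed_at column (NULL while the task is open), and the 7-day default are all illustrative choices.

```python
import sqlite3

SECONDS_PER_DAY = 86400


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE task (
            id           INTEGER PRIMARY KEY,
            title        TEXT NOT NULL,
            completed_at INTEGER            -- NULL until marked completed
        );
        CREATE TABLE task_archive (
            id           INTEGER PRIMARY KEY,
            title        TEXT NOT NULL,
            completed_at INTEGER NOT NULL
        );
        """
    )
    return conn


def mark_completed(conn, task_id, now):
    """Cheap on-the-spot step: just flag the row."""
    with conn:
        conn.execute("UPDATE task SET completed_at = ? WHERE id = ?", (now, task_id))


def archive_completed(conn, now, max_age_days=7):
    """Batch step for quiet periods: move tasks completed long enough ago."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    with conn:  # copy and delete in one transaction
        conn.execute(
            "INSERT INTO task_archive (id, title, completed_at)"
            " SELECT id, title, completed_at FROM task"
            " WHERE completed_at IS NOT NULL AND completed_at < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM task WHERE completed_at IS NOT NULL AND completed_at < ?",
            (cutoff,),
        )
    return cur.rowcount  # number of tasks moved
```

The batch function would be called from whatever scheduler the application already has, such as a nightly job.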
You could use a trigger to detect that the task was marked completed, remove it from the current table, and insert it into the archive table.
Or, you could create a stored procedure that performs the archiving. For example:
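DELIMITER //
CREATE PROCEDURE sp_markcompleted(IN taskid INT)
BEGIN
  -- Assumes an integer id; copy the task to the archive, then delete it from the live table.
  START TRANSACTION;
  INSERT INTO newtable SELECT * FROM oldtable WHERE id = taskid;
  DELETE FROM oldtable WHERE id = taskid;
  COMMIT;
END //
DELIMITER ;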
I have a requirement to store all versions of an entity in an easily indexed way and was wondering if anyone has input on what system to use.
Without versioning the system is simply a relational database with a row per, for example, person. If the person's state changes that row is changed to reflect this. With versioning the entry should be updated in such a way so that we can always go back to a previous version. If I could use a temporal database this would be free and I would be able to ask 'what is the state of all people as of yesterday at 2pm living in Dublin and aged 30'. Unfortunately there doesn't seem to be any mature open source projects that can do temporal.
A really nasty way to do this is just to insert a new row per state change. This leads to duplication, as a person can have many fields but only one changing per update. It is also then quite slow to select the correct version for every person given a timestamp.
In theory it should be possible to use a relational database and a version control system to mimic a temporal database but this sounds pretty horrendous.
So I was wondering if anyone has come across something similar before and how they approached it?
Update
As suggested by Aaron here's the query we currently use (in mysql). It's definitely slow on our table with >200k rows. (id = table key, person_id = id per person, duplicated if the person has many revisions)
select name from person p where p.id = (select max(id) from person where person_id = p.person_id and timestamp <= :timestamp)
Update
It looks like the best way to do this is with a temporal db but given that there aren't any open source ones out there the next best method is to store a new row per update. The only problem is duplication of unchanged columns and a slow query.
There are two ways to tackle this. Both assume that you always insert new rows. In every case, you must insert a timestamp (created) which tells you when a row was "modified".
The first approach uses a version number that counts how many instances you already have. The primary key is the object key plus the version number. The problem with this approach seems to be that you need a select max(version) to make a modification. In practice, this is rarely an issue, since for all updates from the app you must first load the current version of the person, modify it (and increment the version), and then insert the new row. So the real problem is that this design makes it hard to run bulk updates directly in the database (for example, assigning a property to many users).
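A small sketch of the version-number approach, assuming Python's sqlite3 module in place of MySQL. The table name person_version, the single name column, and the integer created timestamp are placeholders for illustration.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person_version ("
        " person_id INTEGER NOT NULL,"
        " version   INTEGER NOT NULL,"
        " name      TEXT NOT NULL,"
        " created   INTEGER NOT NULL,"   # when this version was written
        " PRIMARY KEY (person_id, version))"
    )
    return conn


def add_version(conn, person_id, name, now):
    """Insert a new row with the next version number; never update in place."""
    with conn:
        (latest,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM person_version"
            " WHERE person_id = ?",
            (person_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO person_version (person_id, version, name, created)"
            " VALUES (?, ?, ?, ?)",
            (person_id, latest + 1, name, now),
        )
    return latest + 1


def name_as_of(conn, person_id, when):
    """Return the name that was current at time `when`, or None."""
    row = conn.execute(
        "SELECT name FROM person_version"
        " WHERE person_id = ? AND created <= ?"
        " ORDER BY version DESC LIMIT 1",
        (person_id, when),
    ).fetchone()
    return row[0] if row else None
```

The composite primary key also rejects two writers who both compute the same next version, so a concurrent insert fails instead of silently overwriting.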
The next approach uses links in the database. Instead of a composite key, you give each object a new key and you have a replacedBy field which contains the key of the next version. This approach makes it simple to find the current version (... where replacedBy is NULL). Updates are a problem, though, since you must insert a new row and update an existing one.
To solve this, you can add a back pointer (previousVersion). This way, you can insert the new rows and then use the back pointer to update the previous version.
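The linked-version approach with a back pointer could be sketched like this, again with Python's sqlite3 module standing in for MySQL. The column names replaced_by and previous_version are snake_case renderings of the fields named above; the rest of the schema is assumed.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person_version ("
        " id               INTEGER PRIMARY KEY AUTOINCREMENT,"
        " person_id        INTEGER NOT NULL,"
        " name             TEXT NOT NULL,"
        " created          INTEGER NOT NULL,"
        " previous_version INTEGER,"   # back pointer to the older row
        " replaced_by      INTEGER)"   # NULL marks the current version
    )
    return conn


def add_version(conn, person_id, name, now):
    """Insert the new row, then use its back pointer to retire the old one."""
    with conn:
        row = conn.execute(
            "SELECT id FROM person_version"
            " WHERE person_id = ? AND replaced_by IS NULL",
            (person_id,),
        ).fetchone()
        prev_id = row[0] if row else None
        cur = conn.execute(
            "INSERT INTO person_version"
            " (person_id, name, created, previous_version)"
            " VALUES (?, ?, ?, ?)",
            (person_id, name, now, prev_id),
        )
        new_id = cur.lastrowid
        # Follow the back pointer; matches nothing for a first version.
        conn.execute(
            "UPDATE person_version SET replaced_by = ?"
            " WHERE id = (SELECT previous_version FROM person_version"
            "             WHERE id = ?)",
            (new_id, new_id),
        )
    return new_id


def current_name(conn, person_id):
    row = conn.execute(
        "SELECT name FROM person_version"
        " WHERE person_id = ? AND replaced_by IS NULL",
        (person_id,),
    ).fetchone()
    return row[0] if row else None
```

Finding the current version is a plain filter on replaced_by IS NULL, which avoids the max() subquery.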
Here is a (somewhat dated) survey of the literature on temporal databases: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.6988&rep=rep1&type=pdf
I would recommend spending a good while sitting down with those references and/or Google Scholar to try to find some good techniques that fit your data model. Good luck!