I have to design a database schema for an application I'm building, and I will be using MySQL. In this application, users enter data and it gets saved in the database. However, this data is not accessible to the public until the user publishes it. Currently, I have one table for storing all the data. I was wondering whether a boolean field in this table that indicates whether the data has been published is a good idea. Or is it much better design to create one table for saved data and one table for published data, and move the saved data to the published data table when the user presses Publish?
What are the advantages and disadvantages of using each one and is one of them considered better design than the other?
Case: Binary
They are about equal. Use this as a learning exercise -- Implement it one way; watch it for a while, then switch to the other way.
(same) Space: Since a row exists exactly once, neither option is 'better'.
(favor 1 table) With two tables, "publishing" takes a transaction to atomically delete from one table and insert into the other; with one table it is a single UPDATE of the flag.
(favor 2 tables) Certain SELECTs will spend time filtering out records with the other value for published. (This applies to deleted, embargoed, approved, and a host of other possible boolean flags.)
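A minimal sketch of the two options, to make the trade-off concrete. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table and column names (entries, drafts, published_entries, body) are made up for illustration and are not from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: one table with a boolean flag (hypothetical schema)
    CREATE TABLE entries (
        id INTEGER PRIMARY KEY,
        body TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    -- Option 2: one table per state (hypothetical schema)
    CREATE TABLE drafts (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE published_entries (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
""")

def publish_flag(conn, entry_id):
    """Option 1: publishing is a single-row UPDATE."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE entries SET published = 1 WHERE id = ?", (entry_id,))

def publish_move(conn, entry_id):
    """Option 2: the INSERT and the DELETE must succeed or fail together."""
    with conn:  # both statements run in one transaction
        conn.execute(
            "INSERT INTO published_entries (id, body) "
            "SELECT id, body FROM drafts WHERE id = ?",
            (entry_id,),
        )
        conn.execute("DELETE FROM drafts WHERE id = ?", (entry_id,))

# Seed one unpublished row in each layout, then publish both.
conn.execute("INSERT INTO entries (id, body) VALUES (1, 'hello')")
conn.execute("INSERT INTO drafts (id, body) VALUES (1, 'hello')")
conn.commit()
publish_flag(conn, 1)
publish_move(conn, 1)
```

With option 1 every public query needs WHERE published = 1; with option 2 public queries read published_entries and need no filter.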
Case: Revision history
If there are many revisions of a record, then two tables, Current data and History, are better. That is because the 'important' queries involve fetching only the Current data.
(PARTITIONs are unlikely to help in either case.)
So, I came up with an idea to store my user information and the updates they make to their own profiles in a way that it is always possible to rollback (as an option to give to the user, for auditing and support purposes, etc.) while at the same time improving (?) the security and prevent malicious activity.
My idea is to store the user's info in rows but never allow the API backend to delete or update those rows, only to insert new ones that should be marked as the "current" data row. I created a graphical explanation:
Schema image
The potential issue I see with this model is that users may update their information too frequently, bloating the database (1 million users with an average of 5 updates per user makes 5 million entries). For this, I came up with the idea of setting apart the rows with "false" in the "current" column through partitioning, where they should not harm performance and can wait to be cleaned up periodically.
Am I right to choose this model? Is there any other way to do such a thing?
I'd also use a second table user_settings_history.
When a setting is created, INSERT it in the user_settings_history table, along with a timestamp of when it was created. Then also UPDATE the same settings in the user_settings table. There will be one row per user in user_settings, and it will always be the current settings.
So the user_settings would always have the current settings, and the history table would have all prior sets of settings, associated with the date they were created.
This simplifies your queries against the user_settings table. You don't have to modify your queries to filter for the current flag column you described. You just know that the way your app works, the values in user_settings are defined as current.
If you're concerned about the user_settings_history table getting too large, the timestamp column makes it fairly easy to periodically DELETE rows over 180 days old, or whatever number of days seems appropriate to you.
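A rough sketch of that pattern, assuming Python's sqlite3 in place of MySQL so it can run standalone. The email column, the created_at epoch format, and the function names save_settings and purge_history are my own placeholders, not anything from the question.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE user_settings (
        user_id INTEGER PRIMARY KEY,   -- exactly one current row per user
        email TEXT NOT NULL            -- hypothetical setting column
    );
    CREATE TABLE user_settings_history (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        email TEXT NOT NULL,
        created_at INTEGER NOT NULL    -- Unix epoch seconds
    );
""")

def save_settings(conn, user_id, email, now=None):
    """Append to the history and overwrite the current row in one transaction."""
    now = now or datetime.now(timezone.utc)
    with conn:
        conn.execute(
            "INSERT INTO user_settings_history (user_id, email, created_at) "
            "VALUES (?, ?, ?)",
            (user_id, email, int(now.timestamp())),
        )
        # SQLite upsert; in MySQL this could be INSERT ... ON DUPLICATE KEY UPDATE.
        conn.execute(
            "INSERT OR REPLACE INTO user_settings (user_id, email) VALUES (?, ?)",
            (user_id, email),
        )

def purge_history(conn, max_age_days=180, now=None):
    """Delete history rows older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = int((now - timedelta(days=max_age_days)).timestamp())
    with conn:
        conn.execute(
            "DELETE FROM user_settings_history WHERE created_at < ?", (cutoff,)
        )

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
later = t0 + timedelta(days=200)
save_settings(conn, 1, "old@example.com", now=t0)
save_settings(conn, 1, "new@example.com", now=later)
purge_history(conn, max_age_days=180, now=later)  # drops only the row from t0

current_email = conn.execute(
    "SELECT email FROM user_settings WHERE user_id = 1"
).fetchone()[0]
history_rows = conn.execute(
    "SELECT COUNT(*) FROM user_settings_history"
).fetchone()[0]
```

Queries for the current settings never touch the history table, so there is no "current" flag to filter on.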
By the way, 5 million rows isn't so large for a MySQL database. You'd want your queries to use an index where appropriate, but the size alone isn't a disadvantage.
Currently, in my app there are just 3 roles visitors can have:
admin that has all privileges
user that can perform several actions concerning him/her-self within the system
guest that can just watch and send bug reports
Everything is primitively implemented, as follows: in a DB each user has a field where his being admin (stands for 2 in the field) or user (1) is indicated, and in the application_controller.rb it is just checked if logged_in? && current_user.DB_FIELD == 2 (or > 0), and in the necessary controller there occurs a before_filter check, etc.
This simple implementation worked great until recently, when we decided to extend the functionality of the system, in particular to allow the admin to join users into groups, and there are some subtleties. For a better understanding of what I am going to ask, let me describe the situation the way I see it (maybe you can suggest something much better and more logical):
I am an admin. I open /groups, and see a list of groups.
What is a group? A group, on the one hand, is a set of permissions, and on the other hand, is a combination of users that should have the same permissions within my app.
What is a permission? A permission is one action that each user of the group it is assigned to can perform.
I want to unite new users in one group, but this group doesn't exist. So I click the button (which stands for /groups/new), and the Create Group window pops up. There, I have a textfield for a group name, a bulk of checkboxes, each stands for a permission, a field for adding users, and a Save button. I write the group name, check all the permissions I want to assign to this group, add users to this group (I am going to implement this through ajax search: starting typing a user's name, he/she appears, click Enter, and one user is added, then repeat these actions if needed - is it an OK approach?), and click Save.
Ok, I got a new group with several users. But stop, I realized I forgot to add one more person! I return to the Edit Group window (/groups/edit), and refill the misfilled fields. Click Save - and again some magic (I mean, update operations over the DB).
And so, what I have at the final stage? I can freely c/r/u/d the groups, managing users and permissions in them, and perform it in a very GUI-driven way (I mean, checkboxes, ajax search field, etc.)
For two weeks I have been googling/stackoverflowing/scrutinizing info about rails role- and group-based authorizations; have found a lot of solutions like cancan, easy_roles, troles, etc. gems, but cannot find in any of them how to implement a group-based approach, which is dynamic (or customizable? or dynamically customizable?). The only thing that really 100% suits my needs is a redmine permission and permission group approach, but it is overcomplicated due to its over9000-functionality, so I couldn't even fully understand how it is implemented, let alone implement it on my own.
And the questions are (assuming that the set of permissions is permanent so can be hardcoded, and the set of groups is absolutely free; also, if the user doesn't belong to any group he/she has default user permissions; moreover, permissions are not just for c/r/u/d operations, but also for the manually created methods):
What is the best way to implement the above mentioned system? Any existing yet not found by me gem or approach?
How to painlessly-for-scalability store the permissions and the permission groups? A bitmask, or separate permission, permission-to-group assignment, and group tables?
How to painlessly put users into groups? A group field in the user's DB row, or a separate user-to-group assignment table?
Preferably, the permissions assigned to a group should apply to a user as soon as he/she is added to it, without any relogin.
Thank you in advance!
Through several nights I finally came to a solution, which is, to my mind, rather easy yet powerful, but obviously not the best (but still an) implementation.
So, we now have one more table, groups, whose columns are id, name, and permission. The last column is an ordinary integer that represents all of the group's permissions as a single decimal number.
The permissions are "aliased" in the controller: e.g. 1 stands for can_manage_smth, 2 stands for can_view_smth, etc.
The permission choice panel is in the /groups section and is a simple set of checkboxes. Each checkbox has an onchange action that, via ajax, ORs its value into the permission stored in the table (e.g. if we select 3 checkboxes standing for the values 1, 8, and 16, the table gets 25, which is the result of 1 | 8 | 16).
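The bit arithmetic above can be sketched in a few lines. This is plain Python rather than the Rails controller code, and apart from can_manage_smth (1) and can_view_smth (2) the permission names are invented for illustration.

```python
# Each permission is a distinct power of two, so any combination fits in one integer.
# Only the first two names come from the description above; the rest are hypothetical.
PERMISSIONS = {
    "can_manage_smth": 1,
    "can_view_smth": 2,
    "can_export_smth": 4,
    "can_comment_smth": 8,
    "can_invite_smth": 16,
}

def combine(names):
    """OR the selected permission bits into one integer for the groups.permission column."""
    mask = 0
    for name in names:
        mask |= PERMISSIONS[name]
    return mask

def has_permission(mask, name):
    """True if the bit for `name` is set in `mask`."""
    return (mask & PERMISSIONS[name]) != 0

def revoke(mask, name):
    """Clear one permission bit, e.g. when a checkbox is unticked."""
    return mask & ~PERMISSIONS[name]

# Three checkboxes with the values 1, 8, and 16 are ticked.
group_mask = combine(["can_manage_smth", "can_comment_smth", "can_invite_smth"])
print(group_mask)                                   # 25
print(has_permission(group_mask, "can_view_smth"))  # False
```

Because the check reads the group's stored integer on every request, a change to the group takes effect for its members without a relogin, provided the value is not cached in the session.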
So answering my questions:
Not the best but still a solution.
It almost does not affect scalability, because adding a new permission (which is a very rare action) just demands a new alias for the permission and its before_filter checks at the beginning of the controller. And I used a bitmask, stored not as a binary string but as an ordinary decimal integer that simple bitwise operators can work on.
No separate user-to-group assignment tables, just a single group_id column in a user table (which already existed).
I hope everything will work perfectly as implemented. If any issues occur, or any new implementation ideas come up, I will post them here.
Anyway, thanks to everybody!
This is for a CRM application using PHP/MySQL. Various entities like customer, contact, note, etc, can be "deleted" by the user. Rather than actually deleting the entity from the database, I just want it to appear deleted to the application, but be kept in the DB and able to be "restored" if needed at a later time. Maybe even add some kind of "recycle bin" to the app.
I've thought of several ways to do this:
Move the deleted entity to another table. (customer to customer_deleted)
Change an attribute on the entity. (enabled to false)
I'm sure there are other ways, and that each has its own implications for DB size, performance, etc. I'm just wondering what's the generally recommended way to do something like this?
I would go a combination of both:
Set a flag deleted to true
Use a cronjob to move the entries after a while to a table of type ARCHIVE
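If you need to restore the entry, select it back into the article table and delete it from the archive (note that MySQL's ARCHIVE storage engine does not support DELETE, so this step needs an archive table that uses another engine)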
Why would I go this way?
If a customer deleted the wrong entry, the restore could be done instantly
After a few weeks/months the article table may grow too much, so I would archive all entries that have been deleted for at least 1 week
A common practice is to set a deleted_at column to the date at which the entity was deleted by the user (defaults to null). You may also include a deleted_by column for marking who deleted it. Using some kind of deleted column makes FK relationships easier to work with, since these won't break. By moving the row to a new table you would have to update the FKs (and then update them again if you ever undelete). The downside is that you have to ensure all your queries exclude deleted rows (which wouldn't be a problem if you moved the row to a new table). Many ORMs make this filtering easier, so it depends on what you are using.
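A minimal sketch of the deleted_at approach, written in Python with sqlite3 standing in for PHP/MySQL so it is runnable as-is. The customer columns other than deleted_at and deleted_by, and the helper names, are assumptions for the example.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        deleted_at TEXT,      -- NULL means the row is not deleted
        deleted_by INTEGER    -- id of the user who deleted it
    );
""")

def soft_delete(conn, customer_id, user_id):
    """Mark the row as deleted instead of removing it."""
    with conn:
        conn.execute(
            "UPDATE customer SET deleted_at = ?, deleted_by = ? "
            "WHERE id = ? AND deleted_at IS NULL",
            (datetime.now(timezone.utc).isoformat(), user_id, customer_id),
        )

def restore(conn, customer_id):
    """Undo a soft delete; foreign keys pointing at the row were never touched."""
    with conn:
        conn.execute(
            "UPDATE customer SET deleted_at = NULL, deleted_by = NULL WHERE id = ?",
            (customer_id,),
        )

def active_customers(conn):
    """Every read the application does must carry this filter."""
    rows = conn.execute(
        "SELECT name FROM customer WHERE deleted_at IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]

with conn:
    conn.executemany(
        "INSERT INTO customer (id, name) VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )

soft_delete(conn, 1, user_id=42)
after_delete = active_customers(conn)   # ['Globex']
restore(conn, 1)
after_restore = active_customers(conn)  # ['Acme', 'Globex']
```

A "recycle bin" view is then just the opposite filter, WHERE deleted_at IS NOT NULL.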
I have a case: what will happen when, at one end, Admin is editing the details of user "A" in the table "users" and, at the same time, user "A" edits his/her own details in the same table? Whose change will take effect? And what can be done to make the outcome predictable, or to give one of them priority?
Thanks and Regards...
As Michael J.V. says, the last one wins - unless you have a locking mechanism, or build application logic to deal with this case.
Locking mechanisms tend to dramatically reduce the performance of your database.
http://dev.mysql.com/doc/refman/5.5/en/internal-locking.html gives an overview of the options in MySQL. However, the scenario you describe - Admin accesses record, has a lock on that record until they modify the record - will cause all kinds of performance issues.
The alternative is to check for a "dirty" record prior to writing the record back. Pseudocode:
User finds record
Application stores (hash of) record in memory
User modifies copy of record
User instructs application to write record to database
Application retrieves current database state, compares to original
If identical
    write change to database
If not identical
    notify user
In this model, the admin's change would trigger the "notify user" flow; your application may decide to stop the write, or force the user to refresh the record from the database prior to modifying it and trying again.
More code, but far less likely to cause performance/scalability issues.
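The pseudocode above could look roughly like this. It is a sketch in Python with sqlite3 standing in for MySQL, and the names fetch_user, fingerprint, save_user, and StaleRecordError are made up for the example. In a real multi-connection MySQL setup the re-read and the write would also need to be atomic, for instance by re-reading with SELECT ... FOR UPDATE inside the transaction.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'A', 'a@example.com')")
conn.commit()

class StaleRecordError(Exception):
    """The record changed in the database after this editor loaded it."""

def fetch_user(conn, user_id):
    row = conn.execute(
        "SELECT name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"name": row[0], "email": row[1]}

def fingerprint(record):
    """Hash of the record as loaded; kept in memory while the editor works."""
    canonical = repr(sorted(record.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def save_user(conn, user_id, changes, original_fingerprint):
    """Write only if the database still holds what the editor originally saw."""
    with conn:
        current = fetch_user(conn, user_id)
        if fingerprint(current) != original_fingerprint:
            raise StaleRecordError("record was modified by someone else")
        updated = {**current, **changes}
        conn.execute(
            "UPDATE users SET name = ?, email = ? WHERE id = ?",
            (updated["name"], updated["email"], user_id),
        )

# Admin and user "A" both load the same record before either saves.
admin_fp = fingerprint(fetch_user(conn, 1))
user_fp = fingerprint(fetch_user(conn, 1))

save_user(conn, 1, {"name": "Renamed by admin"}, admin_fp)  # first write succeeds

conflict = False
try:
    save_user(conn, 1, {"email": "new@example.com"}, user_fp)  # stale copy
except StaleRecordError:
    conflict = True  # here the application would notify the user
```

Whoever saves second gets the conflict; the application can then reload the record and let that person retry, or apply whatever priority rule it prefers.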