I came up with this SQL structure to allow rolling back and auditing user information, will this be adequate? - mysql

So, I came up with an idea to store my user information and the updates they make to their own profiles in a way that it is always possible to rollback (as an option to give to the user, for auditing and support purposes, etc.) while at the same time improving (?) the security and prevent malicious activity.
My idea is to store the user's info in rows but never allow the API backend to delete or update those rows, only to insert new ones that should be marked as the "current" data row. I created a graphical explanation:
Schema image
The potential issues that I come up with this model is the fact that users may update the information too frequently, bloating up the database (1 million users and an average of 5 updates per user are 5 million entries). However, for this I came up with the idea of putting apart the rows with "false" in the "current" column through partitioning, where they should not harm the performance and will await to be cleaned up every certain time.
Am I right to choose this model? Is there any other way to do such a thing?

I'd also use a second table user_settings_history.
When a setting is created, INSERT it in the user_settings_history table, along with a timestamp of when it was created. Then also UPDATE the same settings in the user_settings table. There will be one row per user in user_settings, and it will always be the current settings.
So the user_settings would always have the current settings, and the history table would have all prior sets of settings, associated with the date they were created.
This simplifies your queries against the user_settings table. You don't have to modify your queries to filter for the current flag column you described. You just know that the way your app works, the values in user_settings are defined as current.
If you're concerned about the user_settings_history table getting too large, the timestamp column makes it fairly easy to periodically DELETE rows over 180 days old, or whatever number of days seems appropriate to you.
By the way, 5 million rows isn't so large for a MySQL database. You'd want your queries to use an index where appropriate, but the size alone isn't disadvantage.

Related

Should I use more columns or more rows?

I need to create a table where each user (approx 60 atm) would have a defined task for each day. Right now the database have one column for each user with the task name in it (which is bad in my opinion as each new user would need to change the scheme of the table) and a "date" column.
A solution would be to have a "user" column and add a "task" column but that would mean there would be 60 (number of current users) rows per day.
I don't really know what's the best situation in this case.
Should I use more columns or more rows?
They're two completely different things, so this comparison doesn't make much sense...
Right now the database have one column for each user
Bad idea. Full stop. A user is a record of data, not a structural element of the database itself. For example, a table of users might contain columns like Username, Email, RegistrationDate, etc. It would not be a single row of data in which you add a column for each new user.
This would be a nightmare to maintain, would render things like Foreign Keys useless (and, honestly, render the entire concept of a relational database useless), would reach resource limits very quickly, etc.
Each record of information is a row, not a column (or table). In this case, each row in your table is a "User Task". It defines (or has a Feorign Key to) a User and defines (or has a Foreign Key to) a Task.
but that would mean there would be 60 (number of current users) rows per day
If the number of records in the table starts to become a problem, you can start looking into things like sharding and partitioning, archiving old data, etc. You've got time though, because "dozens of records per day" is sustainable for thousands of years. (And by then I imagine the hardware will be at least twice as good as it is today.)
Right now the database have one column for each user with the task
name in it (which is bad in my opinion as each new user would need > to change the scheme of the table)
You're right, this is very bad. Using one column for user, one for the task and one for the date, will be much better.
60 rows per day is not much. This means 21.900 rows per years and 219.000 rows in ten years. Mysql is able to handle millions of rows in a table
If you have two indexes, one for user and one for the date, searching for data will be fast enough.
Knowing nothing else about your database or schema, why not create a dimension table to store your users and fact table to track your task details?
That way you can more easily add new users and the tasks table would continue to grow as new facts are added. It would also be very easy to denormalize this model for query and/or reporting purposes.
Adding columns is a nuisance and can be slow. Instead have a table with columns (user, task, etc)
Even "60 rows per second" is not a problem. 600/second might be.
See the tag [pivot-table] for how to turn rows into columns for output display.

Database Design: Difference between using boolean fields and duplicate tables

I have to design a database schema for an application I'm building. I will be using MySQL. In this application, users enter data and it gets saved in the database obviously. However, this data is not accessible to the public until the user publishes the data. Currently, I have one column for storing all the data. I was wondering if a boolean field in this table that indicates whether the data has been published is a good idea. Or, is it much better design to create one table for saved data and one table for published data and move the saved data to the published data table when the user presses Publish.
What are the advantages and disadvantages of using each one and is one of them considered better design than the other?
Case: Binary
They are about equal. Use this as a learning exercise -- Implement it one way; watch it for a while, then switch to the other way.
(same) Space: Since a row exists exactly once, neither option is 'better'.
(favor 1 table) When "publishing" it takes a transaction to atomically delete from one table and insert into the other.
(favor 2 tables) Certain SELECTs will spend time filtering out records with the other value for published. (This applies to deleted, embargoed, approved, and a host of other possible boolean flags.)
Case: Revision history
If there are many revisions of a record, then two tables, Current data and History, is better. That is because the 'important' queries involve fetching the only Current data.
(PARTITIONs are unlikely to help in either case.)

Should a session table be cleared off from the records after a user logs out?

I am using a MySql table to store a session record for the current logged in user. Once the user logs off, I update few fields in the same record and flags(revoked) it that it should not be used again. So for every LogIn a new record is created. This serves my purpose, but it turns out that the table is going to grow huge.
What should be the standard approach for storing Sessions? Should the ones, which are revoked be stored in a separate table, or should they be deleted or left in the same table?
I consider leaving the data in the same session table. While querying for a particular record, I query with two fields : (idPeople (not unique) and revoked (0 or 1)), for example SELECT * FROM session WHERE idPeople = "someValue" AND revoked = 0. and then update the record if needed while the user is, logged in or kogging out. Will the huge size of table affect this? or MySql will handle this? And what are other ramifications for this which I am unable to see?
First, it may be a good idea to add a unique field to your table (e.g. SESSION_ID, which could be a running auto-increment number), define this field as a unique ID, and use it to quickly find the record to be updated (i.e. revoke=1).
Second, this type of table always triggers the question you are asking, and the best answer can only be given after you assess and answer some preliminary questions, for instance:
When you wish to check the activities of a user, how far into the past does it make sense to go? One month? One year?
What is the longest period that you may wish to keep this information available (even using non routine queries to retrieve?
What type of questions (queries) I expect to be asked on this table?
One you answer those questions, you can consider the following options:
Have a routine process that would run once a day (at midnight or any other time your system can afford it) which would delete rows whose timestamp is older than, say, one month (or any other period suiting your needs), OR
Same as above but would first copy those records to an "history" table,
Change the structure of your table to a more efficient one, by adding some fields (as suggested above) and indices that would provide good answers for your "SELECT" needs.

Working out users points - update vs select

I have users who earn points by taking parts in various activities on the website and then the user can spend these points on whatever they like, the way I have it set up the at the minute is I have a table -
tbl_users_achievements and tbl_users_purchased_items
I have these two tables to track what the users have done and what they have bought (Obviously!)
But instead of having a column in my user tables called 'user_points', I have decided to display their points by doing a SELECT on all achievements and getting a sum of the points they have earnt, I am then doing another select on how many points they have spent.
I thought it might of been better to have a column to store their points and when they buy something and win stuff I do an UPDATE on the column for that user, but that seemed like multiple areas I have to manage, I have to insert a new row for the transaction and then update their column where if I use a query to work out their total won - spent I only have to insert the row and do no update. But the problem is then comes to performance of running and doing a calculation with the query.
So which solution would you go with and why?
Have a column to store their points and do an update
Use a query to work out the users points they can spend and have no column
Your current model is logically the right one - a key aspect for RDBMS normalization is not to repeat any information, and keeping an explicit "this customer has x points" column repeats data.
The benefits of this are obvious - you have less data manipulation code to write, and don't have to worry about what happens when you insert the transaction but can't update the users table.
The downsides are that you're running additional queries every time you show the customer profile; this can create a performance problem. The traditional response to that performance problem is to de-normalize, for instance by keeping a calculated total against the user table.
Only do that if that's absolutely, provably necessary.
myself, I would put the user points into a separate table PK'd by user ID or whatever and store them there and do updates to increment or decrement as achievements are attained or points spent.

MySQL - Saving and loading

I'm currently working on a game, and just a while ago i started getting start on loading and saving.
I've been thinking, but i really can't decide, since I'm not sure which would be more efficient.
My first option:
When a user registers, only the one record is inserted (into 'characters' table). When the user tries to login, and after he/she has done so successfully, the server will try loading all information from the user (which is separate across multiple tables, and combines via mysql 'LEFT JOIN'), it'll run though all the information it has and apply them to the entity instance, if it runs into a NULL (which means the information isn't in the database yet) it'll automatically use a default value.
At saving, it'll insert or update, so that any defaults that have been generated at loading will be saved now.
My second option:
Simply insert all the required rows at registration (rows are inserted when from website when the registration is finished).
Downsides to first option: useless checks if the user has logged in once already, since all the tables will be generated after first login.
Upsides to first option: if any records from tables are deleted, it would insert default data instead of kicking player off saying it's character information is damaged/lost.
Downsides to second option: it could waste a bit of memory, since all tables are inserted at registration, and there could be spamming bots, and people who don't even manage to get online.
Upsides to first option: We don't have to check for anything in the server.
I also noted that the first option may screw up any search systems (via admincp, if we try looking a specific users).
I would go with the second option, add default rows to your user account, and flag the main user table as incomplete. This will maintain data integrity across your database, whereas every user record is complete in it's entirety. If you need to remove the record, you can simply add a cascading delete script to clean house.
Also, I wouldn't develop your data schema based off of malacious bots creating accounts. If you are concerned about the integrity of your user accounts, add some sort of data validation into your solution or an automated clean-house script to clear out incomplete accounts once the meet a certain criteria, i.e. the date created meeting a certain threshold.
You mention that there's multiple tables of data for each user, with some that can have a default value if none exist in the table. I'm guessing this is set up something like a main "characters" table, with username, password, and email, and a separate table for something like "favorite shortcuts around the site", and if they haven't specified personal preferences, it defaults to a basic list of "profile, games list, games by category" etc.
Then the question becomes when registering, should an explicit copy of the favorite shortcuts default be added for that user, or have the null value default to a default list?
I'd suggest that it depends on the nature of the auxiliary data tables; specifically the default value for those tables. How often would the defaults change? If the default changes often, a setup like your first option would result in users with only a 'basic' entry would frequently get new auxiliary data, while those that did specify their own entries would keep their preferences. Using your second option, if the default changed, in order to keep users updated, a search/replace would have to be done to change entries that were the old default to the new default.
The other suggestion is to take another look at your database structure. You don't mention that your current table layout is set in stone; is there a way to not have all the LEFT JOIN tables, and have just one 'characters' table?