Combine data across dozens of DBs in an inexpensive query? - mysql

I run a site where companies create accounts and have their users sign up for their accounts. A couple of the companies I work with have sensitive data and for that reason the decision was made a while back that we would maintain separate databases for each company that registers with our site.
To be clear, the DBs look similar to the below for companies A, B, and C.
db_main /* Stores site info for each company */
db_a
db_b
db_c
Now, I'm finding that sometimes a user creates an account with both company A and company B, so it would be nice if I could combine their progress from the two sites (A and B). For example, if the user earns 5 points on site A, and 5 points on site B, I would like for their total site points to read "10" (their combined total from 5 + 5).
There are hundreds of these databases, though, and I'm worried that it will be rough on the server to be constantly running queries across all databases. The user's score, for instance, is calculated on each page load.
Is there an efficient way to run queries in this manner?

Joining across 100 databases should never be an option, and to answer your question: no, it won't be efficient.
What I would suggest instead is to create a global table that caches the points you are after. Points should not be 'sensitive' in any way from the sounds of it, and I assume a userID is not either. Given that a customer should never have direct query access to this table, it should be a non-issue.
Scenario:
User joins siteA
earns 5 points
dbA gets updated
dbGlobalPoints gets upserted (if exists (it won't), update points+5, else insert userID, 5) - see the sketch after this scenario
User then joins siteB with the same username (this may be your biggest issue if you don't have unique IDs across systems)
profile query pulls/joins dbGlobalPoints for display
earns 10 points
dbB gets updated
dbGlobalPoints gets upserted (if exists (it will), update points+10, else insert userID, 10)
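A minimal sketch of that upsert in MySQL, assuming a db_main.global_points table keyed by user ID (both names are placeholders, not from the original post):

CREATE TABLE db_main.global_points (
    user_id INT NOT NULL PRIMARY KEY,
    points  INT NOT NULL DEFAULT 0
);

-- "if exists, update points+5, else insert userID, 5" as one statement:
INSERT INTO db_main.global_points (user_id, points)
VALUES (42, 5)
ON DUPLICATE KEY UPDATE points = points + VALUES(points);

The site earning the points writes to its own database and fires this one extra statement, so no cross-database join is ever needed at read time.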
On the initial run, a 'rebuild' process of sorts will need to be run, stepping through each company database and populating the global table. This will also be useful later for a 'recount' process (say, you drop dbA and don't want those points to count anymore).
You could also make this a subroutine that fires once per user (in the background) if they don't have a record in the global points table.
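The rebuild/recount could be one INSERT ... SELECT per company database, run in a loop from application code; the per-company user_points table name here is an assumption:

TRUNCATE TABLE db_main.global_points;

-- repeat for db_a, db_b, db_c, ...
INSERT INTO db_main.global_points (user_id, points)
SELECT user_id, SUM(points)
FROM db_a.user_points
GROUP BY user_id
ON DUPLICATE KEY UPDATE points = points + VALUES(points);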

I came up with this SQL structure to allow rolling back and auditing user information, will this be adequate?

So, I came up with an idea to store my user information, and the updates users make to their own profiles, in a way that always makes it possible to roll back (as an option to give to the user, for auditing and support purposes, etc.) while at the same time improving (?) security and preventing malicious activity.
My idea is to store the user's info in rows but never allow the API backend to delete or update those rows, only to insert new ones that should be marked as the "current" data row. I created a graphical explanation:
[schema image]
The potential issue I see with this model is that users may update their information too frequently, bloating the database (1 million users at an average of 5 updates per user is 5 million rows). For this, my idea is to move the rows with "false" in the "current" column into their own partition, where they should not harm performance and can be cleaned up periodically.
Am I right to choose this model? Is there any other way to do such a thing?
I'd also use a second table, user_settings_history.
When a setting is created, INSERT it in the user_settings_history table, along with a timestamp of when it was created. Then also UPDATE the same settings in the user_settings table. There will be one row per user in user_settings, and it will always be the current settings.
So the user_settings would always have the current settings, and the history table would have all prior sets of settings, associated with the date they were created.
This simplifies your queries against the user_settings table. You don't have to modify your queries to filter for the current flag column you described. You just know that the way your app works, the values in user_settings are defined as current.
If you're concerned about the user_settings_history table getting too large, the timestamp column makes it fairly easy to periodically DELETE rows over 180 days old, or whatever number of days seems appropriate to you.
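A rough sketch of the whole pattern, assuming settings are stored as a single JSON blob (MySQL 5.7+) and user 42 saves a change; all column names here are placeholders:

-- one row per user, always the current settings
CREATE TABLE user_settings (
    user_id  INT NOT NULL PRIMARY KEY,
    settings JSON NOT NULL
);

-- append-only history of every version
CREATE TABLE user_settings_history (
    id         BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id    INT NOT NULL,
    settings   JSON NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_history_created (created_at)
);

-- on every save: record the new version, then upsert the current row
INSERT INTO user_settings_history (user_id, settings)
VALUES (42, '{"theme": "dark"}');

INSERT INTO user_settings (user_id, settings)
VALUES (42, '{"theme": "dark"}')
ON DUPLICATE KEY UPDATE settings = VALUES(settings);

-- periodic cleanup, per the retention window above
DELETE FROM user_settings_history
WHERE created_at < NOW() - INTERVAL 180 DAY;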
By the way, 5 million rows isn't so large for a MySQL database. You'd want your queries to use an index where appropriate, but the size alone isn't a disadvantage.

Working out users' points - update vs select

I have users who earn points by taking part in various activities on the website, and the user can then spend these points on whatever they like. The way I have it set up at the minute is that I have two tables -
tbl_users_achievements and tbl_users_purchased_items
I have these two tables to track what the users have done and what they have bought (Obviously!)
But instead of having a column in my user table called 'user_points', I have decided to display their points by doing a SELECT on all achievements and getting a sum of the points they have earnt, then doing another select for how many points they have spent.
I thought it might have been better to have a column to store their points, doing an UPDATE on the column whenever a user buys or wins something, but that seemed like multiple areas to manage: I'd have to insert a new row for the transaction and then update their column, whereas with a query that works out their total (won minus spent) I only have to insert the row and do no update. The problem then becomes the performance of running that calculation in a query.
So which solution would you go with and why?
Have a column to store their points and do an update
Use a query to work out the points users can spend and have no column
Your current model is logically the right one - a key aspect of RDBMS normalization is not to repeat any information, and keeping an explicit "this customer has x points" column repeats data.
The benefits of this are obvious - you have less data manipulation code to write, and don't have to worry about what happens when you insert the transaction but can't update the users table.
The downsides are that you're running additional queries every time you show the customer profile; this can create a performance problem. The traditional response to that performance problem is to de-normalize, for instance by keeping a calculated total against the user table.
Only do that if that's absolutely, provably necessary.
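For comparison, the computed-on-read version is a single statement against the two existing tables; the points columns here are assumptions about the schema:

SELECT
    COALESCE((SELECT SUM(points) FROM tbl_users_achievements    WHERE user_id = 42), 0)
  - COALESCE((SELECT SUM(points) FROM tbl_users_purchased_items WHERE user_id = 42), 0)
  AS spendable_points;

With an index on user_id in both tables, this stays cheap until the row counts get very large, which is exactly the point at which denormalizing becomes worth discussing.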
Myself, I would put the user points into a separate table, keyed by user ID or whatever, store them there, and do updates to increment or decrement as achievements are attained or points are spent.

User Point System DB Design Approach

I am looking at a point-system DB design. My question is quite similar to this one: Database design - Approach for storing points for users.
In this system, a user earns points when any of these actions happens:
User registers on the website (i.e. an active entry in the User table)
User writes an answer to another user's question (i.e. an entry in the Answer table)
User's answers are rated by other users (i.e. an entry in the Answer_Rating table for the user)
User invites other users to join the platform
From the DB design side, I have created these two tables:
Action_Master table, and
User_Action_Point table.
The Action_Master table contains: (id, action_name, action_point)
The User_Action_Point table stores the history of each action, so it looks like this:
(id, action_master_id, action_done, created_by, created_at, updated_by, updated_at, deleted_at)
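In DDL form those two tables might look like the following; the column types are guesses from the lists above:

CREATE TABLE Action_Master (
    id           INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    action_name  VARCHAR(100) NOT NULL,
    action_point INT NOT NULL
);

CREATE TABLE User_Action_Point (
    id               INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    action_master_id INT NOT NULL,
    action_done      VARCHAR(255) NOT NULL, -- reference to the row that triggered the points
    created_by       INT NOT NULL,          -- the user who earned the points
    created_at       DATETIME NOT NULL,
    updated_by       INT NULL,
    updated_at       DATETIME NULL,
    deleted_at       DATETIME NULL,
    FOREIGN KEY (action_master_id) REFERENCES Action_Master (id)
);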
Now the problem here is the User_Action_Point table: it contains data repeated from the User table, Answer table, and Answer_Rating table.
This problem is very well addressed by Jeffrey in the first answer to the linked question. According to his answer, we should execute views or stored procedures to sum up the points from the different tables every time. This approach is appealing because we don't need to handle the overhead of data deletion or any other change that may affect the user's points.
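As a sketch of that idea against the tables above, a view can derive each user's total on demand, so a soft delete automatically removes its points (names as assumed earlier; Jeffrey's version would aggregate straight from the User/Answer/Answer_Rating tables instead):

CREATE VIEW user_points AS
SELECT uap.created_by       AS user_id,
       SUM(am.action_point) AS total_points
FROM User_Action_Point uap
JOIN Action_Master am ON am.id = uap.action_master_id
WHERE uap.deleted_at IS NULL
GROUP BY uap.created_by;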
But is that a good way when we need user points very frequently? Don't you think this approach can increase the DB response time or the load on the MySQL server?
Or do I need to store the aggregated user points in some table, with the overhead of handling the repeated data (i.e. if anything gets deleted, we also have to subtract those points in the points table)?
Please suggest.

Indefinite number of tables vs indefinite number of rows with multiple columns

Which one would be better (performance-wise and for maintenance): a database which creates tables dynamically, or one that just adds rows dynamically?
Suppose I am building a project in which I let users register. Say I have a table which stores only basic personal info, like name, DOB, date of joining, address, phone, etc. Say 10 columns.
Now for the tricky part.
Scene 1: Creating multiple tables
When a user completes registration, a message table is created. So a separate table is created for each user, and the number of rows in each message table varies per user.
In the same way, there is a cart table for each user, like the message table.
So for Scene 1, two tables are created with every registration.
Scene 2: Adding Rows
The scenario is the same here, but in this case I have just 2 tables, for messages and carts, and rows are added only when there is activity.
Note:
Assume that the number of users is more than 2,000 and that 50+ users are active at any time, which means the message and cart tables are always busy in both cases: there are always update, insert, delete, and select queries running simultaneously.
Also, which scene will consume more disk space?
While writing this, it makes me wonder what technique Facebook and the others use. Do they use the Scene 2 style (all users (billions) sharing the same big long message table)? Just wondering.
Databases have some basic rules defined for database design, called "database normalization". These basic rules allow us to eliminate redundant data.
1st Normal Form
Store one piece of information in only one column; a column should store only one piece of information.
2nd Normal Form
A table should have only columns that are related to each other. All the related columns should be in one table.
Now if you look at your proposed design, a separate table for each user will split the SAME information/columns about all the users across thousands of tables, which violates the 2nd Normal Form.
You need to create one table and put all the related columns in that one table for all the users, and you can use normal SQL to query your data. But if you have a table for each user, my guess is that every query you execute from your application will be built dynamically, and for every query you will be using dynamic SQL, which is one of the SQL devils, and you want to avoid it whenever possible.
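A minimal sketch of the Scene 2 layout, assuming a users table with an id primary key (all names here are placeholders):

CREATE TABLE messages (
    id         BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id    INT NOT NULL,
    body       TEXT NOT NULL,
    created_at DATETIME NOT NULL,
    INDEX idx_messages_user (user_id),
    FOREIGN KEY (user_id) REFERENCES users (id)
);

-- one static query serves every user; no dynamic SQL required
SELECT id, body, created_at
FROM messages
WHERE user_id = 42
ORDER BY created_at DESC;

A cart table would follow the same shape: one shared table, one owning user_id column, one index.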
My suggestion would be to read more about database design. Once you have a basic understanding of it, draw the design on a piece of paper and see if it provides everything that your business requires/expects from this application. Spending some time on it now will save you a lot of pain later.

Database structure - most common queries span 3-4 tables. Should I reduce tables?

I am creating a new DB in MySQL for an application and wondered if anyone could provide some advice on the following set up. I'll try and simplify things as best as I can.
This DB is designed to store alerts which are related to specific items created by a user. In turn there is the need to store notes related to the items and/or alerts. At first I considered the following structure...
USERS table - to store basic app user info (e.g. user_id, name, email) - this is the only bit I'm fairly certain does not need to be changed
ITEMS table: contains info on a particular item (4 fields or so), plus user_id to indicate which user created/owns the item
ALERTS table: contains info on the alert, item_id to indicate which item the alert is related to, and user_id to indicate which user created the alert
NOTES table: contains note info, the user_id of the note owner, item_id if associated with an item, and alert_id if associated with an alert
Relationships:
An item does not always have an alert associated with it
An item or alert does not always have a note associated with it
An alert is always associated with an item. More than one alert can be associated with the same item.
A note is always associated with an item or alert. More than one note can be associated with the same item or alert.
Once first created, item info is unlikely to be updated by a user.
For arguments sake let's say that each user will create an average of 10 items, each item will have an average of 2 alerts associated with it. There will be an average of 2 notes per item/alert.
Very common queries that will be run:
1) Return all items created by a particular user, with any associated alerts and notes. Given a user_id, this query would span 3 tables.
2) Checking each day for alerts that need to be sent to a user's email address: WHERE alert date == today, return the user's email address, item name, and any associated notes. This would require a query spanning 4 tables (sketched below), which is why I'm wondering if I need to take a different approach...
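For what it's worth, here is roughly what that 4-table query could look like; the column names (alert_date, note_text, etc.) are guesses based on the descriptions above:

SELECT u.email, i.name AS item_name, n.note_text
FROM alerts a
JOIN users u ON u.user_id = a.user_id
JOIN items i ON i.item_id = a.item_id
LEFT JOIN notes n ON n.alert_id = a.alert_id  -- item-level notes would join the same way
WHERE a.alert_date = CURDATE();

Three joins like this are routine for a normalized schema, which is the gist of the answers below.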
Option 1) One table to cover items, alerts, and notes, with a user_id owner for each row. Every time you add a note to an item or alert you are repeating the alert and/or item info. Seems a bit wasteful, but item and alert info won't be large.
Option 2) I don't foresee the need to query notes (famous last words?), so how about serializing note data so that multiple notes are stored in one row in either the item or alert table (or just a combined alert/item table)?
Option 3) Anything else you can think of? I'm asking this question as each option I've considered doesn't feel quite right.
I appreciate this is currently a small project, so performance shouldn't be of great concern and I should probably just go with the 4 tables. It's more that my common queries will end up being relatively complex, which makes me think I need to re-evaluate the structure.
I would say that the common wisdom is to normalize to start and denormalize only when performance data suggest that it's necessary.
Make sure that your tables are indexed properly, with foreign key relationships for JOINs.
If you think you'll end up with a lot of data, this might be a good time to think about a partitioning strategy. Partitioning your fast-growing tables by time would be a good first step.
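Concretely, "indexed properly" means an index on every column used to join or filter, for example (reusing the column names from the question; alert_date is an assumed name):

ALTER TABLE items
    ADD INDEX idx_items_user (user_id),
    ADD FOREIGN KEY (user_id) REFERENCES users (user_id);

ALTER TABLE alerts
    ADD INDEX idx_alerts_item (item_id),
    ADD INDEX idx_alerts_date (alert_date),
    ADD FOREIGN KEY (item_id) REFERENCES items (item_id);

ALTER TABLE notes
    ADD INDEX idx_notes_item (item_id),
    ADD INDEX idx_notes_alert (alert_id);

-- note: if you later partition a table by date, MySQL will not allow
-- foreign keys on it, so the partitioning decision affects the FKs above.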
Four tables is not complex. I commonly write report queries that hit 15 or more tables in a database structure that has hundreds of tables (most with millions of records), and I wouldn't even say our DBs are anything more than medium sized (a typical DB in our system might have around 200 gigs of data, so not large at all as databases go). Because they are properly indexed, they still run fast unless I am doing very complex calculations. Normalize; don't even consider denormalizing until you are an experienced database designer who knows better than to worry about the number of tables.