MySQL design regarding a web

I am tackling a problem in class to design a MySQL representation of a web app that stores a list of events associated with a person. For this table (or tables), there would be two columns: one with the person's name, the other with the event. However, a person will generally have anywhere from 30-1000 events, so this table, which we plan to have for our entire undergraduate class of 6000 students, will have millions of entries. Is there a better way to store this in MySQL that will take less space, but will still be able to retrieve individual events and the list of people that attended them just as easily as if it were a table of two columns?

Yes, there is a technique called many-to-many modeling, and it essentially breaks your one table into three. As a good sanity check, note that there are indeed exactly three entities being modeled:
Person
Event
A Person's association with an Event
You model this as three tables, the first two having essentially two columns each: one with a unique index (called the "primary key"), and the second being a semantic name (person name, event name). Note that you can also add any number of columns to these, with the storage cost incurred only once per entity; most likely your first move will be to add a date column to the event table.
The third table is the interesting one: it contains only two columns, each numeric, both of which are references to the other tables (each row is simply (person_id, event_id)). We term these "foreign keys".
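A minimal sketch of those three tables in MySQL DDL (names are illustrative, and the date column suggested above is included):

CREATE TABLE person (
    person_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(100) NOT NULL
);

CREATE TABLE event (
    event_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    event_date DATE NULL   -- the likely "first move" column mentioned above
);

-- The junction table: one row per (person, event) attendance.
CREATE TABLE attendance (
    person_id INT UNSIGNED NOT NULL,
    event_id  INT UNSIGNED NOT NULL,
    PRIMARY KEY (person_id, event_id),
    FOREIGN KEY (person_id) REFERENCES person (person_id),
    FOREIGN KEY (event_id)  REFERENCES event (event_id)
);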
This structure means a few things:
No matter how many events someone goes to, that person is represented only once.
The same is true of events, no matter how many attendees.
The attendance is a "first-class" entity, and can grow to include its own attributes (e.g. "role").
This structure is called many-to-many because each person may attend many events, and each event may have many attendees.
The quintessential feature of the design is that no single piece of domain knowledge is repeated; only "keys" are repeated as necessary to model the real-world domain. (E.g. in your original design, accounting for a name change would require an unknown quantity of updates and might lead to data anomalies, avoidance of which is a primary concern of database normalization.)

Don't worry about "space". This isn't the 1970s and we're not going to run out of columns on punch cards to store data. You should be concerned with expressing your requirements in the proper, most normalized data structure. With proper indexing there shouldn't be a problem, not with this volume of data.
Remember that indexes need to be defined on any column you will include in a WHERE clause, and sometimes you may need to add additional indexes for large lists fetched with ORDER BY and LIMIT.
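For example, reusing the junction-table sketch from the previous answer, an index like this supports "who attended event X?" lookups (the composite primary key already covers the person-to-events direction):

CREATE INDEX idx_attendance_event ON attendance (event_id, person_id);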
Whenever possible or practical use an integer identifier instead of a string. These are stored as a small number of bytes, typically 4, compared with a variable length string which is typically at least the length of the string in bytes plus 1.
A properly normalized database will use numeric identifiers for things anyway, so this kind of thing isn't a huge concern. The only time you go against this, or deliberately de-normalize your data, is when you have a legitimate performance problem that cannot be easily solved some other way.
As always, test your schema by generating large amounts of dummy data and see how it performs. Since you have a good idea of the requirements in advance, do some testing at those levels, and then, to be on the safe side, try 2x, 5x and 10x the data to see how much flexibility your design has. It's okay to have performance limitations so long as you know at what kind of scale you'll experience them.
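A quick way to generate dummy data at roughly those volumes, assuming a helper table seq(n) pre-filled with consecutive integers and the matching person/event rows already in place, is a cross join:

-- ~500 events per student for 6000 students = ~3,000,000 attendance rows.
INSERT INTO attendance (person_id, event_id)
SELECT p.n, e.n
FROM seq AS p
JOIN seq AS e ON e.n <= 500
WHERE p.n <= 6000;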

MySQL and relational databases in general were designed specifically to handle this sort of problem. Handling millions of entries is not a problem; complex queries may take a couple of seconds but will perform remarkably well.
Storing one event per row is the best design. The way you are going about it sounds right. Good luck.

Related

Should I split tables based on activity?

I'm working on a hobby project that is an online game. The game stores player data in one big flat file. The data itself contains all the information about the player, from name to even the items on the player. It's a rather large number of columns by itself, and having dozens of items only increases the flat file size to boot.
To give you a visual: my current player file is 192 columns (not accounting for items).
Player Data
There are 51 columns in my flat file for player data after I reduced the fluff. This does not include the items or the abilities of the players. I've already decided those can be separated into their own tables and linked with a FK.
The 51 columns of data are unique to the player and should not be duplicated. From what I've been told, they are not good candidates for normalization.
Table
id
name
password
race
sex
class
level
gold
silver
experience
quest
armor
strength
wisdom
dexterity
etc
Activity
However, how often some of these columns are selected and updated varies enormously from one to another. Some are updated whenever the player moves; others are rarely touched outside of when the player logs into the game and is loaded into memory. Records are never dropped or rebuilt, every column has a value, and the frequency of activity ranges from every second to once a month.
Question
That leads me to a question. Instead of the traditional way of normalizing data, can I split these columns up based on activity and get better performance than if they were all in the same table? Or should I leave them all together in the same table and just rely on proper indexing? Most of the columns look good to go, but like I said, some are used far more than others, and the vast difference in how often they're used is what scares me.
What you're mentioning is called denormalization, and it is actually quite a well-known and frequent practice.
There are no general rules and indications as to when to denormalize.
This depends on so many things specific to each project (like the hardware, the type of DB, and the "activity" you mention to name a few) that it comes down to profiling each application to get to a conclusion.
Also, sometimes denormalization means splitting a table into two tables with a one-to-one relationship (as in your case). Sometimes it means getting rid of FKs and putting everything in one BIG table with many columns to avoid joins when selecting.
Most importantly, keep in mind that your question is as much about scalability as it is about performance. Separating data into different tables/databases means you could eventually store it on different machines, each with a hardware architecture and database that fits the use case.
Example of denormalization in the gaming industry
One example of denormalization I can think of when it comes to MMORPGs is to store all the infrequently changed user data in a BLOB. Not only is this denormalizing, but the whole row is stored as a series of bytes. Dr. E. F. Codd wouldn't be happy at all.
One company that does this is Playfish.
This means that you have faster selects at the cost of slower updates and, most importantly, changing the schema for the user becomes a real hassle (but the reasoning here is it will always be Username, Password, E-mail until the end of time). This also means that your user data can now be stored in a simpler key/value store instead of an RDBMS with more overhead. Of course, the login server fetching user information won't need to be as performant as the one handling the gameplay.
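A minimal sketch of that pattern (purely illustrative, not Playfish's actual schema):

-- Rarely-changing user data, serialized by the application into one value.
CREATE TABLE user_profile (
    user_id      INT UNSIGNED NOT NULL PRIMARY KEY,
    profile_blob BLOB NOT NULL   -- serialized name, avatar, preferences, ...
);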
So try reading about use cases for denormalization (this is a very active topic) and see where you can apply your findings to your case. Also, keep in mind that premature optimization can sometimes be counter-productive; maybe you should focus on developing your game for now. When you have scaling/performance problems, you will most probably have the funding that comes with a high number of users to address them. Good luck!

What is the most efficient way to store a list in a relational database?

I have read many strong statements here and elsewhere on the subject of storing arrays in MySQL. The rules of normalization seem to suggest it's a bad idea, and searching within a stored array fosters inelegant code. HOWEVER, for the application I am working on it seems like a reasonable solution to store an array in a field. I'm sure that is what everyone in this position wrongly thinks, but I can't figure out a better way. Here is the setup:
I have a series of tables that store registered students, courses they can take and their performance on each course. All are "normalized" to avoid duplication and errors. I want to be able to generate a "myCourses" section so after login the student sees courses they are eligible for and courses they have taken but are free to review. The approach that comes to mind is two arrays; my_eligible_courses and my_completed_courses. On registration, the student is given a set of courses for which they are eligible. This could be stored as rows where there are multiple occurrences of studentid, one for each course they can take:
student1 course 1
student1 course 2
student1 course n
The table could then be queried for all of student 1's eligible courses and displayed as a list when the student logs in.
Alternately, studentid could be a primary key and in a column "eligible_courses" there would be an array (course 1,course 2, course n).
There is a table for student performance, to record every course taken and the metrics associated with student performance. It will be queried to report on student performance, quality of the course, etc., but this table will grow quite large. I'm having a hard time believing that the most efficient way to generate a list of my_completed_courses is to query this table by studentid every time they log in just to give them a list of completed courses.
One other complication is that the set of courses a student is eligible for is variable and expands as new courses are developed, which to me seems to suggest that generating a set of new columns for each new course is a bad idea: for example, new course_name, pretest_score, posttest_score, time_to_complete, ... Also, a table for each new course seems like a complicated solution for the relatively mundane endpoint of generating a simple set of lists.
So to restate the question, is it better to store "inelegant" arrayed list of eligible and completed courses in a registered student table or dynamically generate these lists?
I'm guessing this is still too vague but any discussion of db design that gives an example of an inelegant array vs a restructured schema would be appreciated.
You should feel confident that if you have indexes on your tables for the appropriate columns, querying for my_completed_courses will be pretty snappy.
When your table grows to the point that you notice slowdown, you can configure your MySQL server with appropriate memory allocation settings so that it can keep more data cached in memory. Or you could look into that now.
In response to the edit you made about adding new courses: Don't add a new column for each course. Don't add a new table for each course. Create a table for courses, and add rows for each course.
You should then be able to join your tables together on indexed columns to generate the list of data you need.
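For example, with a hypothetical eligibility junction table alongside student and course tables, the "myCourses" list is a single indexed join:

-- Eligible courses for the logged-in student (hypothetical table/column names).
SELECT c.course_id, c.course_name
FROM eligibility AS el
JOIN course AS c ON c.course_id = el.course_id
WHERE el.student_id = ?;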
This is a bad idea for two obvious reasons:
The DBMS can't enforce proper referentialX (and possibly domain) integrity, and relying on application-level integrity is almost always a bad idea.
While the database will be able to answer the query: "based on given student, give me courses", you won't be able to (efficiently) go in the opposite direction, should you ever need to.
X What's to stop a buggy application from storing a non-existent ID in the array? Or from deleting a course that is still referenced by students? Even if your application is careful about course deletion, there is no way to do it efficiently: you'll need a full table scan to examine all the arrays.
Why are you even trying this? A link (aka. junction) table would solve these problems, for a moderate cost of some additional storage space.
If you are really concerned about storage space, you could even switch the DBMS and use one that supports leading-edge index compression (such as Oracle).
I'm having a hard time believing that the most efficient way to generate a list of my_completed_courses is to query this table by studentid every time they login just to give them a list of completed courses.
Databases are very good at querying humongous amounts of data. In this case, if you use the clustering properly, the DBMS will be able to get this data in very few I/O operations, meaning very fast. Did you perform any actual benchmarks? Have you measured any actual performance problem?
Also, a table for each new course seems like a complicated solution for the relatively mundane endpoint of generating a simple set of lists.
Generating a new table may be justified if it will have different columns. But that doesn't sound like what you are trying to do.
It seems to me that you simply need a STUDENT_COURSE table with a COMPLETED flag and a constraint along these lines:
CHECK (
(COMPLETED = 0 AND (performance fields) IS NULL)
OR (COMPLETED = 1 AND (performance fields) IS NOT NULL)
)
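A minimal sketch of such a table, with hypothetical performance fields SCORE and COMPLETED_AT (note that MySQL only enforces CHECK constraints from 8.0.16 onward):

CREATE TABLE STUDENT_COURSE (
    STUDENT_ID   INT UNSIGNED NOT NULL,
    COURSE_ID    INT UNSIGNED NOT NULL,
    COMPLETED    TINYINT(1) NOT NULL DEFAULT 0,
    SCORE        DECIMAL(5,2) NULL,   -- hypothetical performance field
    COMPLETED_AT DATETIME NULL,       -- hypothetical performance field
    PRIMARY KEY (STUDENT_ID, COURSE_ID),
    FOREIGN KEY (STUDENT_ID) REFERENCES STUDENT (STUDENT_ID),
    FOREIGN KEY (COURSE_ID)  REFERENCES COURSE (COURSE_ID),
    CHECK ((COMPLETED = 0 AND SCORE IS NULL     AND COMPLETED_AT IS NULL)
        OR (COMPLETED = 1 AND SCORE IS NOT NULL AND COMPLETED_AT IS NOT NULL))
) ENGINE = InnoDB;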
When a student enrolls in a course, insert a row into STUDENT_COURSE, set COMPLETED to 0, and leave the performance fields NULL.
When the student completes the course, set COMPLETED to 1 and fill in the performance fields.
(BTW, you could even omit COMPLETED altogether and just rely on testing the performance fields for NULL.)
InnoDB tables are clustered, which means that rows in STUDENT_COURSE belonging to the same student are stored physically close together, so getting the courses of a given student is extremely fast.
If you need to go in the opposite direction (get the students of a given course), add an index on the same fields but in the opposite order: {COURSE_ID, STUDENT_ID}. You might even consider a covering index in this case.
Since we are talking about a small number of rows, leaving COMPLETED unindexed is just fine. If you are really concerned about that, you can even do something like the following.
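MySQL has no partial indexes, so a sketch of that idea is a slim companion table, kept in sync by the application or by triggers:

CREATE TABLE COMPLETED_STUDENT_COURSE (
    STUDENT_ID INT UNSIGNED NOT NULL,
    COURSE_ID  INT UNSIGNED NOT NULL,
    PRIMARY KEY (STUDENT_ID, COURSE_ID)
) ENGINE = InnoDB;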
The COMPLETED_STUDENT_COURSE is a B-Tree only for completed courses (and essentially a subset of STUDENT_COURSE which is a B-Tree for all enrolled courses).
Here are a few thoughts that I believe may assist you in making a good decision.
Generally, it is a rule to use correctly normalised tables. But there can be exceptions, and perhaps your project is one of them.
Most of the time, new developers tend to focus on getting the data into a DB; they get stuck when it comes to retrieving it for a specific purpose. So, given the two cases of arrays vs. relational tables, ask yourself whether either method serves your purpose. For example, if you wanted to list the courses of student X, your array method is just fine, because you can retrieve it by a primary key like the student ID. But if you wanted to know how many students are on course A, the array method would be a horrible way to go.
Then again, the above point would depend on your data volume as well. For example, if you only have about a hundred students, you'll probably not notice a difference in performance. But if you're looking at several thousand records and you have a big list of courses for students, the array approach is not the way to go.
Benchmark. This is the best way to find your answer. You can use MySQL's EXPLAIN, or just time the queries from the program that executes them. Try each method with your standard volume of data and see which one works best. For example, in the recent past, MySQL was boasting about the strength of its ISAM engine; then I had to work on a large application that involved millions of records, and I noticed that each time a new record came in, indexes had to be rebuilt. So we had to bend the rules. Likewise, you'd better do your tests with the correct volumes of data to make a better decision.
But do not take this example as a rule. Rather, go by the standards of normalisation and only bend the rules for exceptions.

Database Design For Tournament Management Software

I'm currently designing a web application using PHP, JavaScript, and MySQL. I'm considering two options for the database.
Having a master table for all the tournaments, with basic information stored there along with a tournament id. Then I would create divisions, brackets, matches, etc. tables with the tournament id appended to each table name. Then when accessing that tournament, I would simply do something like "SELECT * FROM BRACKETS_[insert tournamentID here]".
My other option is to just have generic brackets, divisions, matches, etc. tables with each record being linked to the appropriate tournament, (or matches to brackets, brackets to divisions etc.) by a foreign key in the appropriate column.
My concern with the first approach is that it's a bit too on the fly for me, and seems like the database could get messy very quickly. My concern with the second approach is performance. This program will hopefully have a national if not international reach, and I'm concerned with so many records in a single table, and with so many people possibly hitting it at the same time, it could cause problems.
I'm not a complete newb when it comes to database management; however, this is the first one I've done completely solo, so any and all help is appreciated. Thanks!
Do not create tables for each tournament. A table is a type of an entity, not an instance of an entity. Maintainability and scalability would be horrible if you mix up those concepts. You even say so yourself:
This program will hopefully have a national if not international reach, and I'm concerned with so many records in a single table, and with so many people possibly hitting it at the same time, it could cause problems.
How on Earth would you scale to that level if you need to create a whole table for each record?
Regarding the performance of your second approach, why are you concerned? Do you have specific metrics to back up those concerns? Relational databases tend to be very good at querying relational data. So keep your data relational. Don't try to be creative and undermine the design of the database technology you're using.
You've named a few types of entities:
Tournament
Division
Bracket
Match
Competitor
etc.
These sound like tables to me. Manage your indexes based on how you query the data (that is, don't over-index or you'll pay for it with inserts/updates/deletes). Normalize the data appropriately, de-normalize where audits and reporting are more prevalent, etc. If you're worried about performance then keep an eye on the query execution paths for the ways in which you access the data. Slight tweaks can make a big difference.
Don't prematurely optimize. It adds complexity without any actual reason.
First, find the entities that you will need to store; things like tournament, event, team, competitor, prize etc. Each of these entities will probably be tables.
It is standard practice to have a primary key for each of them. Sometimes there are columns (or groups of columns) that uniquely identify a row, so you can use those as the primary key. However, it's usually best just to have a numeric column named ID or something similar. It will be faster and easier for the RDBMS to create and use indexes for such columns.
Store the data where it belongs: I expect to see the date and time of an event in the events table, not in the prizes table.
Another crucial point is conforming to first normal form, since that assures data atomicity. This is important because it will save you a lot of headaches later on. By doing this correctly, you will also end up with the correct number of tables.
Last but not least: add relevant indexes to the columns that appear most often in queries. This will help a lot with performance. Don't worry about tables having too many rows; RDBMSes these days handle tables with hundreds of millions of rows, and they're designed to do that efficiently.
Besides compromising the quality and maintainability of your code (as others have pointed out), it's questionable whether you'd actually gain any performance either.
When you execute...
SELECT * FROM BRACKETS_XXX
...the DBMS needs to find the table whose name matches "BRACKETS_XXX", and that search is done in the DBMS's data dictionary, which is itself a bunch of tables. So you are replacing a search within your tables with a search within the data dictionary tables. You pay the price of the search either way.
(The dictionary tables may or may not be "real" tables, and may or may not have similar performance characteristics as real tables, but I bet these performance characteristics are unlikely to be better than "normal" tables for large numbers of rows. Also, performance of data dictionary is unlikely to be documented and you really shouldn't rely on undocumented features.)
Also, the DBMS would suddenly need to prepare many more SQL statements (since they are now different statements referring to separate tables), which would put additional pressure on performance.
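By contrast, the generic-table approach needs only one prepared statement, which the server can parse once and reuse for every tournament:

SELECT *
FROM brackets
WHERE tournament_id = ?;   -- one statement, whatever the tournament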
The idea of creating new tables whenever a new instance of an item appears is really bad, sorry.
A (surely incomplete) list of why this is a bad idea:
Your code will need to automatically add tables whenever a new Division or whatever is created. This is definitely a bad practice and should be limited to extremely niche cases - which yours definitely isn't.
In case you decide to add or revise the table structure later (e.g. adding a new field), you will have to apply the change to hundreds of tables, which will be cumbersome, error-prone, and a big maintenance headache.
A RDBMS is built to scale in terms of rows, not tables and associated (indexes, triggers, constraints) elements - so you are working against your tool and not with it.
THIS ONE SHOULD BE THE REAL CLINCHER - how do you plan to handle requests like "list all matches which were played on a Sunday" or "find the most recent three brackets where Frank Perry was active"?
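With a single matches table, requests like those are ordinary queries; a sketch, with column names assumed for illustration:

-- All matches played on a Sunday, across every tournament.
SELECT m.*
FROM matches AS m
WHERE DAYOFWEEK(m.played_on) = 1;   -- DAYOFWEEK: 1 = Sunday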
You say:
I'm not a complete newb when it comes to database management; however, this is the first one I've done completely solo...
Can you remember another project where tables were cloned whenever a new set was required? If yes, didn't you notice some problems with that approach? If not, have you considered that this is precisely what a DBA would never ever do for any reason whatsoever?

Single table or separate tables for each user to hold similar records? (performance??)

I have 2 scenarios for a MySQL DB and I'm not sure which to choose, and I've run into the same dilemma for a few tables.
I'm making a web application only accessed by members. Each member has their own deals, expenses, and say "listings". The criteria for the records is the same across users, but each user can have completely different amounts of records.
My two scenarios are whether I should have one table for deals, one table for listings, and one table for expenses, each with a field that links to the primary key for a particular user; or whether it is better to have a separate deal table, expense table, and listing table for each user (using a combined name like "user"+deals or "user"+exp). Deals can be shared across 1 or 2 users, but expenses and listings are completely independent. I am going to have a master deal table to hold all the info for each deal, but there is a user deal table(s) that links their primary key to a deal primary key.
So, separate tables or one table? If there are thousands of users with hundreds of deals/expenses/listings each, I just don't want the queries to become extremely slow after a lot of deals or expenses have built up. No user will ever need to view anything from other users; strictly just their own data.
Also, I'm familiar with how a database works and stores data, but I'm not 100% clear. I just want it to work quickly, so my other question is (although it may be stupid): when a user submits a new deal or expense, is it inserted at the beginning or the end of the table? Or is it irrelevant, because a query will search everything in the table either way before returning information?
Always use one table to store one kind of entity.
Or more specifically: what you're talking about is a nasty, complicated optimisation that works in an incredibly small subset of cases, which almost certainly isn't yours.
You want to use just one table for each kind of entity. Index it appropriately, and try to get rid of old records when you don't need them any more.
Also, a lot of people's idea of "big data" isn't actually particularly big. Databases normally need little optimisation while their data still fits in RAM, which on a modern system means, say, 32 GB.
Regarding your second question:
In MySQL, the order of the records on disk is defined by your PRIMARY KEY, meaning a record does not get inserted at the end or the beginning, but rather wherever it belongs based on the primary key.
In other DBs you have the option to use CLUSTERED KEYS to order the records on disk by a key other than the PRIMARY key, but this is not supported in MySQL to my knowledge.
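A sketch of the practical consequence, using a hypothetical expenses table:

-- With an AUTO_INCREMENT primary key, new rows sort to the "end" of the
-- clustered index; a secondary index still makes per-user retrieval fast.
CREATE TABLE expenses (
    expense_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id    INT UNSIGNED NOT NULL,
    amount     DECIMAL(10,2) NOT NULL,
    INDEX idx_expenses_user (user_id)
);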
Regarding your first question:
I have found myself in this position a couple of times, and recently I keep going back to one blog post (the last of a series; the conclusion is at the bottom):
http://weblogs.asp.net/manavi/archive/2011/01/03/inheritance-mapping-strategies-with-entity-framework-code-first-ctp5-part-3-table-per-concrete-type-tpc-and-choosing-strategy-guidelines.aspx
I quote:
Before we get into this discussion, I want to emphasize that there is no single "best strategy fits all scenarios". As you saw, each of the approaches has its own advantages and drawbacks. Here are some rules of thumb to identify the best strategy in a particular scenario:
If you don't require polymorphic associations or queries, lean toward TPC (Table per Concrete Type). In other words, if you never or rarely query for BillingDetails and you have no class that has an association to the BillingDetail base class, I recommend TPC (only) for the top level of your class hierarchy, where polymorphism isn't usually required, and when modification of the base class in the future is unlikely.
If you do require polymorphic associations or queries, and subclasses declare relatively few properties (particularly if the main difference between subclasses is in their behavior), lean toward TPH (Table per Hierarchy). Your goal is to minimize the number of nullable columns and to convince yourself (and your DBA) that a denormalized schema won't create problems in the long run.
If you do require polymorphic associations or queries, and subclasses declare many properties (subclasses differ mainly by the data they hold), lean toward TPT (Table per Type). Or, depending on the width and depth of your inheritance hierarchy and the possible cost of joins versus unions, use TPC.
By default, choose TPH only for simple problems. For more complex cases (or when you're overruled by a data modeler insisting on the importance of nullability constraints and normalization), you should consider the TPT strategy. But at that point, ask yourself whether it may not be better to remodel inheritance as delegation in the object model (delegation is a way of making composition as powerful for reuse as inheritance). Complex inheritance is often best avoided for all sorts of reasons unrelated to persistence or ORM. EF acts as a buffer between the domain and relational models, but that doesn't mean you can ignore persistence concerns when designing your classes.

Which is more efficient: Multiple MySQL tables or one large table?

I store various user details in my MySQL database. Originally it was set up in various tables, meaning the data is linked by UserIds and output via sometimes complicated calls to display and manipulate it as required. Setting up a new system, it almost makes sense to combine all of these tables into one big table of related content.
Is this going to be a help or hindrance?
Speed considerations in calling, updating or searching/manipulating?
Here's an example of some of my table structure(s):
users - UserId, username, email, encrypted password, registration date, ip
user_details - cookie data, name, address, contact details, affiliation, demographic data
user_activity - contributions, last online, last viewing
user_settings - profile display settings
user_interests - advertising targetable variables
user_levels - access rights
user_stats - hits, tallies
Edit: I've upvoted all answers so far; they all have elements that essentially answer my question.
Most of the tables have a 1:1 relationship which was the main reason for denormalising them.
Are there going to be issues if the table spans across 100+ columns when a large portion of these cells are likely to remain empty?
Multiple tables help in the following ways / cases:
(a) If different people are going to be developing applications involving different tables, it makes sense to split them.
(b) If you want to give different kinds of authority to different people for different parts of the data collection, it may be more convenient to split them. (Of course, you can look at defining views and granting authorization on them appropriately.)
(c) For moving data to different places, especially during development, it may make sense to use multiple tables, resulting in smaller file sizes.
(d) A smaller footprint may give comfort while you develop applications against the data of a single entity.
(e) It is a possibility: what you thought of as single-value data may turn out to be multiple values in the future. For example, credit limit is a single-value field as of now, but tomorrow you may decide to store it as (date from, date to, credit value). Split tables might come in handy then.
My vote would be for multiple tables - with data appropriately split.
Good luck.
Combining the tables is called denormalizing.
It may (or may not) help to make some queries (which make lots of JOINs) to run faster at the expense of creating a maintenance hell.
MySQL is capable of using only one JOIN method, namely NESTED LOOPS.
This means that for each record in the driving table, MySQL locates a matching record in the driven table in a loop.
Locating a record is quite a costly operation, which may take dozens of times as long as a pure record scan.
Moving all your records into one table will help you to get rid of this operation, but the table itself grows larger, and the table scan takes longer.
If you have lots of records in the other tables, then the increase in the table scan can outweigh the benefit of the records being scanned sequentially.
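You can inspect the join plan yourself; a sketch against the question's tables, with column names assumed from their descriptions:

EXPLAIN
SELECT u.username, d.name, a.last_online
FROM users AS u
JOIN user_details  AS d ON d.UserId = u.UserId
JOIN user_activity AS a ON a.UserId = u.UserId
WHERE u.UserId = 42;
-- Each plan row is one nested-loop level: for every row read from users,
-- MySQL locates matching rows in user_details, then user_activity,
-- ideally via an index on UserId.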
Maintenance hell, on the other hand, is guaranteed.
Are all of them 1:1 relationships? I mean, if a user could belong to, say, different user levels, or if the users interests are represented as several records in the user interests table, then merging those tables would be out of the question immediately.
Regarding previous answers about normalization, it must be said that the database normalization rules completely disregard performance and only look at what is a neat database design. That is often what you want to achieve, but there are times when it makes sense to actively denormalize in pursuit of performance.
All in all, I'd say the question comes down to how many fields there are in the tables, and how often they are accessed. If user activity is often not very interesting, then it might just be a nuisance to always have it on the same record, for performance and maintenance reasons. If some data, like settings, say, are accessed very often, but simply contains too many fields, it might also not be convenient to merge the tables. If you're only interested in the performance gain, you might consider other approaches, such as keeping the settings separate, but saving them in a session variable of their own so that you don't have to query the database for them very often.
Do all of those tables have a 1-to-1 relationship? For example, will each user row only have one corresponding row in user_stats or user_levels? If so, it might make sense to combine them into one table. If the relationship is not 1 to 1 though, it probably wouldn't make sense to combine (denormalize) them.
Having them in separate tables vs. one table is probably going to have little effect on performance though unless you have hundreds of thousands or millions of user records. The only real gain you'll get is from simplifying your queries by combining them.
ETA:
If your concern is about having too many columns, then think about what stuff you typically use together and combine those, leaving the rest in a separate table (or several separate tables if needed).
If you look at the way you use the data, my guess is that you'll find that something like 80% of your queries use 20% of that data with the remaining 80% of the data being used only occasionally. Combine that frequently used 20% into one table, and leave the 80% that you don't often use in separate tables and you'll probably have a good compromise.
Creating one massive table goes against relational database principles. I wouldn't combine all of them into one table. You're going to get multiple instances of repeated data. If your user has three interests, for example, you will have 3 rows with the same user data in them just to store the three different interests. Definitely go for the multiple 'normalized' table approach. See this Wiki page for database normalization.
Edit:
I have updated my answer, as you have updated your question... I agree with my initial answer even more now since...
a large portion of these cells are likely to remain empty
If, for example, a user didn't have any interests, then if you normalize, you simply won't have a row in the interest table for that user. If you have everything in one massive table, you will have columns (and apparently a lot of them) that contain just NULLs.
I have worked for a telephony company where there were tons of tables, and getting data could require many joins. When read performance from these tables was critical, procedures were created that could generate a flat (i.e. denormalized) table, requiring no joins or calculations, that reports could point to. These were then used in conjunction with a SQL Server agent to run the job at certain intervals (e.g. a weekly view of some stats would run once a week, and so on).
Why not use the same approach WordPress does: a users table with basic user information that everyone has, plus a "user_meta" table that can hold any key/value pair associated with the user id. If you need all the meta information for a user, you can just add it to your query, and you avoid the extra query when it isn't needed (e.g. for logging in). This approach also leaves your table open to new user features, such as storing a Twitter handle or each individual interest. You also won't have to deal with a maze of associated IDs, because you have one table that rules all metadata, limited to one association instead of 50.
WordPress does this specifically to allow features to be added via plugins, making your project more scalable and avoiding a complete database overhaul when you need to add a new feature.
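A minimal sketch of that pattern (modeled loosely on WordPress's wp_usermeta, not a copy of it):

CREATE TABLE user_meta (
    meta_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id    INT UNSIGNED NOT NULL,
    meta_key   VARCHAR(255) NOT NULL,
    meta_value TEXT NULL,
    INDEX idx_user_meta (user_id, meta_key)
);

-- Adding a new feature, e.g. a Twitter handle, needs no schema change:
INSERT INTO user_meta (user_id, meta_key, meta_value)
VALUES (42, 'twitter_handle', '@example');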
I think this is one of those "it depends" situation. Having multiple tables is cleaner and probably theoretically better. But when you have to join 6-7 tables to get information about a single user, you might start to rethink that approach.
I would say it depends on what the other tables really mean.
Does user_details contain more than one row per user, and so on?
What level of normalization is best suited to your needs depends on your demands.
If you have one table with a good index, that will probably be faster, but on the other hand probably more difficult to maintain.
To me it looks like you could skip user_details, as it is probably a 1-to-1 relation with users.
But the rest probably have a lot of rows per user?
Performance considerations on big tables
"Likes" and "views" (etc) are one of the very few valid cases for 1:1 relationship _for performance. This keeps the very frequent UPDATE ... +1 from interfering with other activity and vice versa.
Bottom line: separate frequent counters in very big and busy tables.
Another possible case is where you have a group of columns that are rarely present. Rather than having a bunch of nulls, have a separate table that is related 1:1, or more aptly phrased "1:rarely". Then use LEFT JOIN only when you need those columns. And use COALESCE() when you need to turn NULL into 0.
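A sketch of that access pattern, assuming a hypothetical user_counters side table:

SELECT u.username,
       COALESCE(c.views, 0) AS views,   -- NULL (no counter row yet) becomes 0
       COALESCE(c.likes, 0) AS likes
FROM users AS u
LEFT JOIN user_counters AS c ON c.user_id = u.UserId
WHERE u.UserId = 42;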
Bottom Line: It depends.
Limit search conditions to one table. An INDEX cannot reference columns in different tables, so a WHERE clause that filters on multiple columns might use an index on one table but then has to work harder to continue filtering on columns in other tables. This issue is especially bad if "ranges" are involved.
Bottom line: Don't move such columns into a separate table.
TEXT and BLOB columns can be bulky, and this can cause performance issues, especially if you unnecessarily say SELECT *. Such columns are stored "off-record" (in InnoDB), which means that the extra cost of fetching them may involve extra disk hits.
Bottom line: InnoDB is already taking care of this performance 'problem'.