Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 11 months ago.
The community reviewed whether to reopen this question 11 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
So this is more of a design question.
I have one primary key (say the user's ID), and I have tons of information associated with that user.
Should I have multiple tables broken down into categories according to the information, or should I have just one table with many columns?
The way I used to do it was to have multiple tables, so say, one table for application usage data, one table for profile info, one table for back end tokens etc. to keep things looking organized.
Recently some one told me that it's better not to do it that way and having a table with lots of columns is fine. The thing is, all those columns have the same primary key.
I'm pretty new to database design so which approach is better and what are the pros and cons?
What's the conventional way of doing it?
Any time information is one-to-one (each user has one name and password), then it's probably better to have it one table, since it reduces the number of joins the database will need to do to retrieve results. I think some databases have a limit on the number of columns per table, but I wouldn't worry about it in normal cases, and you can always split it later if you need to.
If the data is one-to-many (each user has thousands of rows of usage info), then it should be split into separate tables to reduce duplicate data (duplicate data wastes storage space, cache space, and makes the database harder to maintain).
You might find the Wikipedia article on database normalization interesting, since it discusses the reasons for this in depth:
Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Denormalization is also something to be aware of, because there are cases where repeating data is better (since it reduces the amount of work the database needs to do when reading data). I'd highly recommend making your data as normalized as possible to start out, and only denormalize if you're aware of performance problems in specific queries.
One big table is often a poor choice. Related tables are what relational database were designed to work with. If you index properly and know how to write performant queries, they are going to perform fine.
When tables get too many columns, then you can run into issues with the actual size of the page that the database is storing the information on. Either the record can end up being too large for the page, in which can you may end up not being able to create or update a specific record which makes users unhappy or you may (in SQL Server at least) be allowed some overflow for particular datatypes (with a set of rules you need to look up if you are doing this) but if many records will overflow the page size you can create tremedous performance problems. Now how MYSQL handles the pages and whether you have a problem when the potential page size gets too large is something you would have to look up in the documentation for that database.
Came across this, and as someone who used to use MySQL a lot, and then switched over to Postgres recently, one of the big advantages is that you can add JSON objects to a field in Postgres.
So if you are in this situation, you don't have to necessarily decide between one large table with many columns and splitting it up, but you can merge columns into JSON objects to reduce it e.g. instead of address being 5 columns, it can just be one. You can also query on that object too.
I have a good example. Overly Normalized database with the following set of relationships:
people -> rel_p2staff -> staff
and
people -> rel_p2prosp -> prospects
Where people has names and persons details, staff has just the staff record details, prospects has just prospects details, and the rel tables are relationship tables with foreign keys from people linking to staff and prospects.
This sort of design carries on for entire database.
Now to query this set of relations it's a multi-table join every time, sometimes 8 and more table join. It has been working fine up to mid this year, when it started getting very slow now that we past 40000 records of people.
Indexing and all low hanging fruits had been used up last year, all queries are optimized to perfection. This is the end of the road for the particular normalized design and management now approved a rebuilt of entire application that depends on it as well as restructure of the database, over a term of 6 months. $$$$ Ouch.
The solution will be to have a direct relation for people -> staff and people -> prospect
ask yourself these questions if you put everything in one table, will you have multiple rows for that user? If you have to update a user do you want to keep an audit trail? Can the user have more than one instance of a data element? (like phone number for instance) will you have a case where you might want to add an element or set of elements later?
if you answer yes then most likely you want to have child tables with foreign key relationships.
Pros of parent/child tables is data integrity, performance via indexes (yes you can do it on a flat table also) and IMO easier to maintain if you need to add a field later, especially if it will be a required field.
Cons design is harder, queries become slightly more complex
But, there are many cases where one big flat table will be appropriate so you have to look at your situation to decide.
I'm already done doing some sort of database design. for me, it depends on the difficulty of the system with database management; yeah it is true to have unique data in one place only but it is really hard to make queries with overly normalized database with lots of record. Just combine the two schema; use one huge table if you feel that you'll be having a massive records that are hard to maintain just like facebook,gmail,etc. and use different table for one set of record for simple system... well this is just my opinion .. i hope it could help.. just do it..you can do it... :)
The conventional way of doing this would be to use different tables as in a star schema or snowflake schema. Howeevr, I would base this strategy to be two fold. I believe in the theory that data should only exist in one place, there for the schema I mentioned would work well. However, I also believe that for reporting engines and BI suites, a columnar approach would be hugely beneficial becuase it is more supportive of the the reporting needs. Columnar approaches like those with infobright.org have huge performance gains and compression that makes using both approaches incredibly useful. Alot of companies are starting to realize that have just one database architecture in the organization is not supportive of the full range of their needs. Alot of companies are implementing both the concept of having more than one database achitecture.
i think having a single table is more effective but you should make sure that the table is organised in a manner that it shows the relationship,trend as well as the difference in variables of the same row.
for example if the table shows age and grades of the students you should arange the table in a manner that thank highest scorer is well differentiated with the lowest scorer and the difference in the age of students is even.
Related
This question already has answers here:
MySQL: multiple tables or one table with many columns? [closed]
(8 answers)
How to design a product table for many kinds of product where each product has many parameters
(4 answers)
Closed 1 year ago.
I am working on a project built with Laravel and migrations are already built. The problem, though, is that a lot of tables are built in the regular way (too many columns). I am planning to divide them to multiple tables and connect with relationships.
For instance, this is a users table (there are much more columns there):
id
username
language
type
name
photo
about
phone
email
password
time_zone
What I am planning to do, is to create something similar to WordPress (speaking about table schema, not the code, of course), so there would be tables:
users,
user_metas // or user_options,
posts,
post_metas // or post_options
Each *_metas table will be related to the main model by hasMany relationship (user hasMany options), or even make a single options table with OneToMany relation to User + Post.
This approach, hopefully, will simplify working with DB in some ways, as well as make it simplier to add new options, but, on the other hand, will make it more heavy for the server (have to perform more SQL queries) and will open possibility to more bugs (more code === more bugs). Also, I am not the fan of putting all of the info to one table, since if there are relations, why not to use them?
Have to note, that user and post will have a lot of options, with ability to add new ones (and different posts won't have the same set of options).
So, what approach is better here?
life is a tradeoff! your challenge is too! but how?
for example, there is a topic named normalization in designing systems that tries to break one big table (that contains all system features together) into smaller ones (1NF Level), then will do it again (all base on determining rules) till 2NF - 3NF - BCNF and ...
although there are more levels after BCNF, it is said that normalization to this level is enough, cause more and more break downs will bring new challenges and disadvantages with itself, for example, more relationships and more queries as you know, and obviously have some advantages: less redundancy and less complexity, you must choose the solutions based on your vision and your desired functionalities.
just consider that there is no pre-defined best practice approach about normalization and whatever you choose has its adv/disadvantages.
in your situation, if users have multiple metadata, if individual access to that data is probable, if users identical information fetches frequently, if you want to have user-profiling microservices later if users meta data are not required and ..., so it is rational to the breaks user database.
I think that having tons of columns in a table is alright if it doesn't violate any normalization forms (you can google it, its like best practices when designing a database).
I would only add relationships if that is required, for example in your DB you want to map multiple phone numbers to one user, so then you add a one-to-many relationship (one user, many phones). Since I'm pretty sure that join operation which you would need to utilize on your read operations doesn't come for free.
For the following part I'm assuming your metas would be defined something among these lines:
user_id; key; value
If you would create something like metas, you would have problem in how to define value datatype. In order to support all possible types i guess you would pick VARCHAR datatype. But then for example if you would store a 32bit number there, you would be wasting server disk space. Since int32 is 4 bytes and -2147483648 is 11 characters so 11 bytes.
Note: most relational databases offer dynamic columns or JSON datatype, but that should only be used when you want to store properties (size, color, weight etc etc) that are not defined at designing stage, and their keys can change from row to row. For example you could see: https://mariadb.com/kb/en/dynamic-columns/
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I am designing a database and am wondering which approach should I use. I am going to describe the database I intend to design and the possible approaches that I can use to store the data in the tables.
Please recommend which approach I should use and why?
About the data:
A) I have seven attributes that need to be taken care of. These are just examples and not the actual ones I intend to store. Let me call them:
1)Name
2)DOB (Modified..I had earlier put in age here..)
3)Gender
4)Marital Status
5)Salary
6)Mother Tongue
7)Father's Name
B) There will be a minimum of 10000 rows in the table and they can go up from there in the long term
C) The number of attributes can change over the period of time. That is, new attributes can be added to the existing dataset. No attributes will ever be removed.
Approach 1
Create a table with 7 attributes and store the data as it is. Added new columns if and when new attributed need to be added.
Pro: Easier to read the data and information is well organized
Con: There can be a lot of null values in certain rows for certain attributes for which values are unknown.
Approach 2
Create a table with 3 attributes. Let them be called :
1) Attr_Name : Stores the attribute name . eg name,age,gender ..etc
2) Attr_Value :Stores value for the above attribute, eg : Tom, 25, Male
3) Unique ID : Uniquely identifies the Name, Value pair in the database. eg. SSN
So, in approach 2, in case new attributes need to be added for certain rows, we can just add them to the hashmap we have created without worrying about null values.
Pro: Hashmap structure. Eliminates nulls.
Con: Data is not easy to read. Information cannot be easily grasped.
C) The Question
Which is the better approach.?
I feel that approach 1 is the better approach. Because its not too tough to handle null values and data is well organized and its easy to grasp this king of data. Please suggest which approach I should use and why?
Thanks!
This is a typical narrow table (attribute based) vs. wide table discussion. The problem with approach #2 is that you are probably going to have to pivot the data, to get it into a form the user can work with (back into a wide view format). This can be very resource intensive as the number of rows grows, and as the number of attributes grows. It's also hard to look at the table, in raw table view, and see what's going on.
We have had this discussion many times at our company. We have some tables that lend themselves very well to an attribute type schema. We've always decided against it because of the necessity to pivot the data and the inability to view the data and have it make sense (but this is the lessor of the two problems for us - we just don't want to pivot millions of rows of data).
BTW, I wouldn't store age as a number. I would store the birth date, if you have it. Also, I don't know what 'Mother Tongue' refers to, but, if it's the language the mother speaks, I would store this as a FK to a master language table. It's more efficient and lessens the problem of bad data because of a misspelled language.
Your second option is one of teh worst design mistakes you can make. This should only be done when you have hundreds of attributes that change constantly and are in no way the same from object to object (such as medical lab tests). If you need to do that, then do not under any circumstances use a relational database to do it. NOSQL database handle EAV designs better by far than relational ones.
Another problem with design 2 is that it becomes almost impossible to have good data integrity as you cannot correctly enforce FKs and data types and add contraints to the data. Since this stuff shoudl never be designed to happen only in the application since things other than the application often affect the data, this factor alone is enough to make your second idea foolish and foolhardy.
The first design will perform better in general. It will be easier to write queries and it will force you to think about what needs to change when you add an attribute (this is a plus not a minus) instead of having to design to always show all attributes whether you need them or not. If you would have a lot of nulls, then add a related table rather than more columns(you can have one-to-one related tables). Usually in this case you might have something that you know only a subset of the records will have and they often fall into groupings by subject fairly naturally. For instance you might have general people related attributes (name, phone, email, address) that belong in one table. Then you might have student-related attributes that belong in a separate table and teacher-related attributes that belong in a third table. Or you might have things you need for all insurance policies and separate tables for vehicle insurance, health insurance, House insurance and life insurance.
There is a third design possibility. If you have a set of attributes you know up front then put them in one table and have an EAV table only for attributes that cannot be determined at design time. This is the common pattern when the application wants to have the flexibility for the user to add customer specific data fields.
I don't think anyone can really determine which one is better immediately, but here are a couple of things to think about:
Do you have sample data? If yes then see if there will be a lot of nulls, if there are not then just go with option 1
Do you have a good sense on how the attributes will grow? For instance looking at the attributes you listed above, you may not know all of them, but they all do exist - so in theory you could fill the table. If you will have a lot of sparse data then #2 may work
When you do get new types of data can you group it into another table and use a foreign key? For instance if you want to capture the address you could always have an address table that references your initial table
What type of queries do you plan on using? It's much harder to query a key-value table than a "normal one" (not super hard, just harder - if you're comfortable using implied joins and the like to normalize the data then it's probably not a big deal).
Overall I'd be really careful before you implemented #2 - I've done it for certain specialized cases (metrics gathering where I have dozens of different metrics and don't really want to maintain dozens of different tables) but in general it's more trouble than it's worth.
For something like this I'd just create one table, and either add columns as you go along, or just create new tables for new data structures if necessary.
I have 40+ columns in my table and i have to add few more fields like, current city, hometown, school, work, uni, collage..
These user data wil be pulled for many matching users who are mutual friends (joining friend table with other user friend to see mutual friends) and who are not blocked and also who is not already friend with the user.
The above request is little complex, so i thought it would be good idea to put extra data in same user table to fast access, rather then adding more joins to the table, it will slow the query more down. but i wanted to get your suggestion on this
my friend told me to add the extra fields, which wont be searched on one field as serialized data.
ERD Diagram:
My current table: http://i.stack.imgur.com/KMwxb.png
If i join into more tables: http://i.stack.imgur.com/xhAxE.png
Some Suggestions
nothing wrong with this table and columns
follow this approach MySQL: Optimize table with lots of columns - which serialize extra fields into one field, which are not searchable's
create another table and put most of the data there. (this gets harder on joins, if i already have 3 or more tables to join to pull the records for users (ex. friends, user, check mutual friends)
As usual - it depends.
Firstly, there is a maximum number of columns MySQL can support, and you don't really want to get there.
Secondly, there is a performance impact when inserting or updating if you have lots of columns with an index (though I'm not sure if this matters on modern hardware).
Thirdly, large tables are often a dumping ground for all data that seems related to the core entity; this rapidly makes the design unclear. For instance, the design you present shows 3 different "status" type fields (status, is_admin, and fb_account_verified) - I suspect there's some business logic that should link those together (an admin must be a verified user, for instance), but your design doesn't support that.
This may or may not be a problem - it's more a conceptual, architecture/design question than a performance/will it work thing. However, in such cases, you may consider creating tables to reflect the related information about the account, even if it doesn't have a x-to-many relationship. So, you might create "user_profile", "user_credentials", "user_fb", "user_activity", all linked by user_id.
This makes it neater, and if you have to add more facebook-related fields, they won't dangle at the end of the table. It won't make your database faster or more scalable, though. The cost of the joins is likely to be negligible.
Whatever you do, option 2 - serializing "rarely used fields" into a single text field - is a terrible idea. You can't validate the data (so dates could be invalid, numbers might be text, not-nulls might be missing), and any use in a "where" clause becomes very slow.
A popular alternative is "Entity/Attribute/Value" or "Key/Value" stores. This solution has some benefits - you can store your data in a relational database even if your schema changes or is unknown at design time. However, they also have drawbacks: it's hard to validate the data at the database level (data type and nullability), it's hard to make meaningful links to other tables using foreign key relationships, and querying the data can become very complicated - imagine finding all records where the status is 1 and the facebook_id is null and the registration date is greater than yesterday.
Given that you appear to know the schema of your data, I'd say "key/value" is not a good choice.
I would advice to run some tests. Try it both ways and benchmark it. Nobody will be able to give you a definitive answer because you have not shared your hardware configuration, sample data, sample queries, how you plan on using the data etc. Here is some information that you may want to consider.
Use The Database as it was intended
A relational database is designed specifically to handle data. Use it as such. When written correctly, joining data in a well written schema will perform well. You can use EXPLAIN to optimize queries. You can log SLOW queries and improve their performance. Databases have been around for years, if putting everything into a single table improved performance, don't you think that would be all the buzz on the internet and everyone would be doing it?
Engine Types
How will inserts be affected as the row count grows? Are you using MyISAM or InnoDB? You will most likely want to use InnoDB so you get row level locking and not table. Make sure you are using the correct Engine type for your tables. Get the information you need to understand the pros and cons of both. The wrong engine type can kill performance.
Enhancing Performance using Partitions
Find ways to enhance performance. For example, as your datasets grow you could partition the data. Data partitioning will improve the performance of a large dataset by keeping slices of the data in separate partions allowing you to run queries on parts of large datasets instead of all of the information.
Use correct column types
Consider using UUID Primary Keys for portability and future growth. If you use proper column types, it will improve performance of your data.
Do not serialize data
Using serialized data is the worse way to go. When you use serialized fields, you are basically using the database as a file management system. It will save and retrieve the "file", but then your code will be responsible for unserializing, searching, sorting, etc. I just spent a year trying to unravel a mess like that. It's not what a database was intended to be used for. Anyone advising you to do that is not only giving you bad advice, they do not know what they are doing. There are very few circumstances where you would use serialized data in a database.
Conclusion
In the end, you have to make the final decision. Just make sure you are well informed and educated on the pros and cons of how you store data. The last piece of advice I would give is to find out what heavy users of mysql are doing. Do you think they store data in a single table? Or do they build a relational model and use it the way it was designed to be used?
When you say "I am going to put everything into a single table", you are saying that you know more about performance and can make better choices for optimization in your code than the team of developers that constantly work on MySQL to make it what it is today. Consider weighing your knowledge against the cumulative knowledge of the MySQL team and the DBAs, companies, and members of the database community who use it every day.
At a certain point you should look at the "short row model", also know as entity-key-value stores,as well as the traditional "long row model".
If you look at the schema used by WordPress you will see that there is a table wp_posts with 23 columns and a related table wp_post_meta with 4 columns (meta_id, post_id, meta_key, meta_value). The meta table is a "short row model" table that allows WordPress to have an infinite collection of attributes for a post.
Neither the "long row model" or the "short row model" is the best model, often the best choice is a combination of the two. As #nevillek pointed out searching and validating "short row" is not easy, fetching data can involve pivoting which is annoyingly difficult in MySql and Oracle.
The "long row model" is easier to validate, relate and fetch, but it can be very inflexible and inefficient when the data is sparse. Some rows may have only a few of the values non-null. Also you can't add new columns without modifying the schema, which could force a system outage, depending on your architecture.
I recently worked on a financial services system that had over 700 possible facts for each instrument, most had less than 20 facts. This could have been built by setting up dozens of tables, each for a particular asset class, or as a table with 700 columns, but we chose to use a combination of a table with about 20 columns containing the most popular facts and a 4 column table which contained the other facts. This design was efficient but was difficult ot access, so we built a few table functions in PL/SQL to assist with this.
I have a general comment for you,
Think about it: If you put anything more than 10-12 columns in a table even if it makes sense to put them in a table, I guess you are going to pay the price in the short term, long term and medium term.
Your 3 tables approach seems to be better than the 1 table approach, but consider making those into 5-6 tables rather than 3 tables because you still can.
Move currently, currently_position, currently_link from user-table and work from user-profile into a new table with your primary key called USERWORKPROFILE.
Move locale Information from user-profile to a newer USERPROFILELOCALE information because it is generic in nature.
And yes, all your generic attributes in all the tables should be int and not varchar.
For instance, City needs to move out to a new table called LIST_OF_CITIES with cityid.
And your attribute city should change from varchar to int and point to cityid in LIST_OF_CITIES.
Do not worry about performance issues; the more tables you have, better the performance, because you are actually handing out the performance to the database provider instead of taking it all in your own hands.
Another question.
Is it better to vertically partition wide table (in my instance I am thinking about splitting login details from address, personal etc. details of the user) on a design stage or better leave it be and partition it after having some data and doing profiling?
Answer seems to be obvious but I am concerned that row splitting a table sometime down the line will mean additional work to rewrite the user model + it seems reasonable to split often accessed login details from more static personal details.
Anyone has some experience backed advice on how to proceed :)? Thanks in advance.
Premature optimization is...
Splitting columns off to a different table has drawbacks:
Some operations that required a single query now require two queries or a join
It's not trivial to enforce that every row in each table needs to have a corresponding row in the other. Thus, you might face integrity problems
On the other hand, it's dubious at best that doing it will improve performance. Unless you can prove it beforehand (and creating a 10 million records table with random data and running some queries is trivial), I wouldn't do it. Doug Kress' suggestions of encapsulation and avoiding SELECT * are the right way.
The only reason to do it is if your single table design is not normalized and normalization implies breaking up the table.
I believe it would be better to keep it as a single table, but encapsulate your access to the data as much as possible, so that it would be easy to refactor later.
When you do access the data, be sure to only gather the information you need in the query (avoid 'SELECT *').
Having said that, be sure that the data saved with the table is normalized appropriately. You may find that you want to store multiple addresses for a user, for instance - in which case you should put it in a separate table.
I store various user details in my MySQL database. Originally it was set up in various tables meaning data is linked with UserIds and outputting via sometimes complicated calls to display and manipulate the data as required. Setting up a new system, it almost makes sense to combine all of these tables into one big table of related content.
Is this going to be a help or hindrance?
Speed considerations in calling, updating or searching/manipulating?
Here's an example of some of my table structure(s):
users - UserId, username, email, encrypted password, registration date, ip
user_details - cookie data, name, address, contact details, affiliation, demographic data
user_activity - contributions, last online, last viewing
user_settings - profile display settings
user_interests - advertising targetable variables
user_levels - access rights
user_stats - hits, tallies
Edit: I've upvoted all answers so far, they all have elements that essentially answer my question.
Most of the tables have a 1:1 relationship which was the main reason for denormalising them.
Are there going to be issues if the table spans across 100+ columns when a large portion of these cells are likely to remain empty?
Multiple tables help in the following ways / cases:
(a) if different people are going to be developing applications involving different tables, it makes sense to split them.
(b) If you want to give different kind of authorities to different people for different part of the data collection, it may be more convenient to split them. (Of course, you can look at defining views and giving authorization on them appropriately).
(c) For moving data to different places, especially during development, it may make sense to use tables resulting in smaller file sizes.
(d) Smaller foot print may give comfort while you develop applications on specific data collection of a single entity.
(e) It is a possibility: what you thought as a single value data may turn out to be really multiple values in future. e.g. credit limit is a single value field as of now. But tomorrow, you may decide to change the values as (date from, date to, credit value). Split tables might come handy now.
My vote would be for multiple tables - with data appropriately split.
Good luck.
Combining the tables is called denormalizing.
It may (or may not) help to make some queries (which make lots of JOINs) to run faster at the expense of creating a maintenance hell.
MySQL is capable of using only JOIN method, namely NESTED LOOPS.
This means that for each record in the driving table, MySQL locates a matching record in the driven table in a loop.
Locating a record is quite a costly operation which may take dozens times as long as the pure record scanning.
Moving all your records into one table will help you to get rid of this operation, but the table itself grows larger, and the table scan takes longer.
If you have lots of records in other tables, then increase in the table scan can overweight benefits of the records being scanned sequentially.
Maintenance hell, on the other hand, is guaranteed.
Are all of them 1:1 relationships? I mean, if a user could belong to, say, different user levels, or if the users interests are represented as several records in the user interests table, then merging those tables would be out of the question immediately.
Regarding previous answers about normalization, it must be said that the database normalization rules have completely disregarded performance, and is only looking at what is a neat database design. That is often what you want to achieve, but there are times when it makes sense to actively denormalize in pursuit of performance.
All in all, I'd say the question comes down to how many fields there are in the tables, and how often they are accessed. If user activity is often not very interesting, then it might just be a nuisance to always have it on the same record, for performance and maintenance reasons. If some data, like settings, say, are accessed very often, but simply contains too many fields, it might also not be convenient to merge the tables. If you're only interested in the performance gain, you might consider other approaches, such as keeping the settings separate, but saving them in a session variable of their own so that you don't have to query the database for them very often.
Do all of those tables have a 1-to-1 relationship? For example, will each user row only have one corresponding row in user_stats or user_levels? If so, it might make sense to combine them into one table. If the relationship is not 1 to 1 though, it probably wouldn't make sense to combine (denormalize) them.
Having them in separate tables vs. one table is probably going to have little effect on performance though unless you have hundreds of thousands or millions of user records. The only real gain you'll get is from simplifying your queries by combining them.
ETA:
If your concern is about having too many columns, then think about what stuff you typically use together and combine those, leaving the rest in a separate table (or several separate tables if needed).
If you look at the way you use the data, my guess is that you'll find that something like 80% of your queries use 20% of that data with the remaining 80% of the data being used only occasionally. Combine that frequently used 20% into one table, and leave the 80% that you don't often use in separate tables and you'll probably have a good compromise.
Creating one massive table goes against relational database principals. I wouldn't combine all them into one table. Your going to get multiple instances of repeated data. If your user has three interests for example, you will have 3 rows, with the same user data in just to store the three different interests. Definatly go for the multiple 'normalized' table approach. See this Wiki page for database normalization.
Edit:
I have updated my answer, as you have updated your question... I agree with my initial answer even more now since...
a large portion of these cells are
likely to remain empty
If for example, a user didn't have any interests, if you normalize then you simple wont have a row in the interest table for that user. If you have everything in one massive table, then you will have columns (and apparently a lot of them) that contain just NULL's.
I have worked for a telephony company where there has been tons of tables, getting data could require many joins. When the performance of reading from these tables was critical then procedures where created that could generate a flat table (i.e. a denormalized table) that would require no joins, calculations etc that reports could point to. These where then used in conjunction with a SQL server agent to run the job at certain intervals (i.e. a weekly view of some stats would run once a week and so on).
Why not use the same approach Wordpress does by having a users table with basic user information that everyone has and then adding a "user_meta" table that can basically be any key, value pair associated with the user id. So if you need to find all the meta information for the user you could just add that to your query. You would also not always have to add the extra query if not needed for things like logging in. The benefit to this approach also leaves your table open to adding new features to your users such as storing their twitter handle or each individual interest. You also won't have to deal with a maze of associated ID's because you have one table that rules all metadata and you will limit it to only one association instead of 50.
Wordpress specifically does this to allow for features to be added via plugins, therefore allowing for your project to be more scalable and will not require a complete database overhaul if you need to add a new feature.
I think this is one of those "it depends" situation. Having multiple tables is cleaner and probably theoretically better. But when you have to join 6-7 tables to get information about a single user, you might start to rethink that approach.
I would say it depends on what the other tables really mean.
Does a user_details contain more then 1 more / users and so on.
What level on normalization is best suited for your needs depends on your demands.
If you have one table with good index that would probably be faster. But on the other hand probably more difficult to maintain.
To me it look like you could skip User_Details as it probably is 1 to 1 relation with Users.
But the rest are probably alot of rows per user?
Performance considerations on big tables
"Likes" and "views" (etc) are one of the very few valid cases for 1:1 relationship _for performance. This keeps the very frequent UPDATE ... +1 from interfering with other activity and vice versa.
Bottom line: separate frequent counters in very big and busy tables.
Another possible case is where you have a group of columns that are rarely present. Rather than having a bunch of nulls, have a separate table that is related 1:1, or more aptly phrased "1:rarely". Then use LEFT JOIN only when you need those columns. And use COALESCE() when you need to turn NULL into 0.
Bottom Line: It depends.
Limit search conditions to one table. An INDEX cannot reference columns in different tables, so a WHERE clause that filters on multiple columns might use an index on one table, but then have to work harder to continue the filtering columns in other tables. This issue is especially bad if "ranges" are involved.
Bottom line: Don't move such columns into a separate table.
TEXT and BLOB columns can be bulky, and this can cause performance issues, especially if you unnecessarily say SELECT *. Such columns are stored "off-record" (in InnoDB). This means that the extra cost of fetching them may involve an extra disk hit(s).
Bottom line: InnoDB is already taking care of this performance 'problem'.