Given a series of complex websites that all use the same user tacking mysql database. (this is not our exact situation: but a simplification of the situation to make this post a brief/efficient as possible)
We don't always know where a user is when he starts using a site. In fact there are about 50 points in the code where the country field might get updated. We might collect it from the IP address on use. We might get it when he uses his credit card. We might get it when he fills out a form. Heck we might get it when we talk to him on the phone.
Assume a simple structure like:
CREATE TABLE `Users` (
`ID` INT NOT NULL AUTO_INCREMENT ,
`County` VARCHAR(45) NULL ,
PRIMARY KEY (`ID`) );
What Im wondering is what is the best way to keep track of one more scrap of information on this person:
`Number_of_Users_in_My_Country`.
I know I could run a simple query to get it with each record. But I constantly need two other bits of information: (Keep in mind that Im not really dealing with countries but other groups that number in the 100,000X : again: counties is just to make this post simple)
User count by Country and
Selection of countries with less than x users.
Im wondering if I should create a trigger when the country value changes to update the Number_of_Users_in_My_Country field?
As Im new to mySQL I would love to know thoughts on this or any other approach.
Lots of people will tell you not to do that, because it's not normalized. However, if it's trivial to keep an aggregate value (to save complex joins in certain queries), I'd say go for it. Keep in mind with your triggers that you can't update the same table as the trigger's definition, so be careful in defining how certain events propagate updates to other tables, lest you get in a loop.
An additional recommendation: I would keep a table for countries, and use a foreign key reference from Users to Countries. Then in countries, have a column for total users in that country. Users_in_my_country seems to have a very specific use, and it would be easier to maintain from the countries' perspective.
Given that you've simplified the question somewhat, it's hard to be totally precise.
In general, if at all possible, I prefer to calculate these derived values on the fly. And to find out if it's valuable, I prefer to try it out; 100.000x records is not a particularly scary number, and I'd much prefer to spend time tuning a query/indexing scheme once than dealing with maintenance crazy for the life of the application.
If you've tried that, and still can't get it to work, my next consideration would be to work with stale/cached data. It all depends on your business, but if it's okay for the "number of users in my country" value to be slightly out of date, then calculating these values and caching them in the application layer would be much better. Caching has lots of pre-existing libraries you can use, it's well understood by most developers, and with high traffic web sites, caching for even a few seconds can have a dramatic effect on your performance and scalability. Alternatively, have a script that populates a table "country_usercount" and run it every minute or so.
If the data must, absolutely, be fresh, I'd include the logic to update the counts in the application layer code - it's a bit ugly, but it's easy to debug, and behaves predictably. So, every time the event fires that tells you which country the user is from, you update the country_usercount table from the application code.
The reason I dislike triggers is that they can lead to horrible, hard to replicate bugs and performance issues - if you have several of those aggregated pre-calculated fields, and you write a trigger for each, you could easily end up with lots of unexpected database activity.
Related
suppose I have some table that is tracking a users weight over time.
CREATE TABLE `userWeights` (
`weight_id` int PRIMARY KEY AUTO_INCREMENT,
`user_id` int,
`weight` float,
`date_created` timestamp
);
The obvious use-case here is some REST endpoint getWeights(user_id) and then display it as a graph on the UI.
The standard query I would write would be something like:
SELECT * FROM userWeights WHERE user_id=user_id ORDER BY date_created ASC
but seeing as as we're sorting by date, business logic that would never change for this use-case, could the sorting computational load be outsourced to the client's device and thus improve the performance of the SQL query? would this be a mostly insignificant improvement for, say, a table of 1000 users where we store 6 months of daily weight measurements?
SELECT * FROM userWeights WHERE user_id=user_id
results.sort(date_created, ASCENDING); //e.g. code on android device
EDIT: I ask because many cloud function/cloud database hosts charge based on computation time per invocation.
In general, sorting is going to be faster on the database server than in the client application.
In your two scenarios, the amount of data being passed between the database and the client is the same. The only difference is the overhead for sorting.
Normally, my recommendation is to do the sorting on the more powerful system, closer to the data. However, if you have a good reasons -- and cost considerations are a good reason -- and sorting on the client side fits your application performance requirements, then you can definitely consider doing that processing client-side.
The performance of the actual sorting of the data client-side vs. server-side is negligible - but usually faster closer to the data (doing it on the database server).
However, there are some other things to consider when thinking of where to put the sorting:
Maintainability - If you put the logic client-side, then any new client you add will have to implement the sorting logic as well. So if you have a web app, iOS app and Android app, you will have to make sure to implement the sort in all three applications to get a consistent UX. If you want to change the sort across all apps, you now have to remember to do it in three places, whereas if the logic was in a shared API or DB then you can make the change in one place.
Affordability- Like you said, cloud providers can charge for computation time. If that is a serious concern, you could move the sorting client-side but like I said before, the future maintainability will take a hit which will cost you developer time and therefore dollars in the future.
I would recommend figuring out how import maintainability is to you vs. affordability, and if moving the sorting client-side really would help affordability in the long run.
KISS.
Adding an ORDER BY to the query is simple and hard to screw up.
Adding sorting to the client takes more keystrokes and is more error-prone.
Furthermore, it is possible to eat-your-cake without even paying for it! The sort can be "free":
CREATE TABLE `userWeights` (
`user_id` int,
`weight` float,
`date_created` timestamp,
PRIMARY KEY(user_id, date_created)
);
You still need the ORDER BY clause on the SELECT, but notice that it will take zero effort on either the server or the client.
The way you had the table definition, it had to scan the entire table to find the rows for the requested user_id; this change avoids that, too. Also, this saves space of the useless weight_id.
My suggestion has one problem: You can't record two different weights taken in the same second. (That seems like a bug to even try!)
Cost savings...
The Optimizer will not charge anything for discovering that the data is already sorted. (Above)
Don't say SELECT *, that returns all the columns; you need only two: SELECT date_created, weight.
Syntax problem: WHERE user_id=user_id -- One of those needs to be the parameter coming in. Else, it is equivalent to "TRUE".
We currently have a table that contains 90 columns and as the table is growing and the business needs change, we're having to alter the table alot (add/remove cols & indexes).
|------ (Table name: quotes)
|Column|Type|Null|Default
|------
|//**id**//|int(11)|No|
....
|completed_at|datetime|Yes|NULL
|reviewed_at|datetime|Yes|NULL
|marked_dud_at|datetime|Yes|NULL
|closed_at|datetime|Yes|NULL
|subscribed_at|datetime|Yes|NULL
|admin_checked_at|datetime|Yes|NULL
|priced_at|datetime|Yes|NULL
|number_verified_at|datetime|Yes|NULL
|created_at|datetime|Yes|NULL
|deleted_at|datetime|Yes|NULL
For the application, our staff are constantly querying all sorts of variations on the above data, example being where it has been completed (completed_at), checked (admin_checked_at) and not deleted, reviewed (deleted_at, reviewed_at)
We're thinking it may be easier to offload some of these columns into their own row, we'll call it quotes_actions, then when querying do some joining.
|------ (Table name: quotes_actions)
|Column|Type|Null|Default
|------
|//**id**//|int(11)|No|
|quote_id|int(11)|No|
|action|varchar(100)|No|
|user_id|int(11)|No|
|time|datetime|Yes|NULL
|created_at|datetime|Yes|NULL
An example would be action = 'completed' using the field, with an index covering quote_id and action.
We've split the data into this format on 150,000 rows and it's not any faster nor slower than querying the original database with correct indexes.
Has anyone got any experience with this and has any recommendations or pitfalls for each approach? It's taking a lot of time to add covering indexes and add columns to the original table as we needed them, whereas the second approach has the indexes set up ready to go but is introducing a lot more joins and more complicated queries.
0.09s
select * from `quotes`
where `completed_at` is not null
and `approved_at` is not null
and deleted_at is null
=>
0.0005s
select * from `quotes_new`
inner join quotes_actions as q1 on q1.action = 'completed' and q1.quote_id = quotes_new.id
inner join quotes_actions as q2 on q2.action = 'approved' and q2.quote_id = quotes_new.id
where quotes_new.deleted_at is null
In addition, if the 2nd approach is better, how do you query for negative results, where a quote hasn't been approved?
Database design will vary from application to application, and things that are great for one implementation will be terrible for another. You've identified a few things that are important to you:
speed of data access (at least no reduction in current performance)
ability to respond to application needs/changes
limiting complexity of queries
Without being able to see the entirity of your database and how you are using it, these are the principles I would follow:
Use Stored Procedures and Views for as much as possible
This is just good design. You create an adapter layer between your application and the data tables, which allows you to make whatever changes you need to in the database (and the views/stored procs) without having to change the application itself. Decoupling your systems makes maintenance significantly easier. Also this is good for security, as if the only way outsiders can access the data is through your stored procs, you've eliminated a few avenues of attack. (There's also debate about whether or not the DBMS will cache execution plans for stored procedures, making them execute faster than similar queries, but I'm not a DBA or DBDev, so I'm not touching that).
Attempt to limit width of tables
One thing I've seen time and time again is every time a need arises in a production systems, a column gets added to a table and they call it a day. Far easier than rewriting a bunch of queries or reviewing table structures. This is terrible design. If you've already limited the changes needed to the application layer by following my first piece of advice, you've limited the work needed to actually resolve table changes in the right way. You should always evaluate whether data belongs to the row in question, or if it should be offloaded into its own table. You shouldn't be afraid to radically alter your database, as sometimes it is necessary.
Looking at the data you've provided, I think your second option is okay. You've identified many columns that actually represent the same thing (the "status changes" or as you put it "quote actions" that occur) and offloaded that from the main table to a secondary table. This is perfectly fine, and likely will be effective. You can further "cheat" to make this table faster by offloading status onto its own table, and using an integer to represent it instead of a string (since the string doesn't matter to the database, and integers are far faster to index and search).
This is not to say a wide table is a bad thing, sometimes tables just need to be wide. You just need to evaluate whether the data really belongs to the entity the data row represents.
Approach queries in new ways
You will want to play with the execution plan tools of your DBMS and understand how each query really works. Changing the order of joins can drastically alter the query return speed, and you shouldn't be afraid to use table variables and temp tables in your queries. They are all tools at your disposal.
Querying for Negative Results
Since you asked this question specifically, I'll address it. This requires thinking about your query in a little different way (consequently, if you haven't, you should look into taking a course or working through a textbook of Relational Algebra, it makes understanding databases so much easier).
Your original query made finding something where the quote was not approved easy. It was all in the table: approved_at is null. Simple, easy peasy, no problems. Now, however, instead of being in a column on the main table, it is in its own table, that also represents all the other actions that could be taken. You need to break the problem down a little.
You want to find the set wherein of all orders, there is no action to signify it is approved. In SQL that looks like:
select quote_id from quotes_action where quote_id not in
(select quote_id from quotes_action where action = 'approved');
Final Thoughts
You need to sit down with your team and talk about how you want to move forward with this product. Spend a few days or a couple weeks really thinking deeply about it. Brainstorm....hackathon....do something to find a solution you like and makes your product better and more maintainable. We've all been in the situation where we have an unmaintainable product that could have been fixed at some point, but is beyond that point. Try not to get to that point, and fix it while you have the opportunity.
I have 40+ columns in my table and i have to add few more fields like, current city, hometown, school, work, uni, collage..
These user data wil be pulled for many matching users who are mutual friends (joining friend table with other user friend to see mutual friends) and who are not blocked and also who is not already friend with the user.
The above request is little complex, so i thought it would be good idea to put extra data in same user table to fast access, rather then adding more joins to the table, it will slow the query more down. but i wanted to get your suggestion on this
my friend told me to add the extra fields, which wont be searched on one field as serialized data.
ERD Diagram:
My current table: http://i.stack.imgur.com/KMwxb.png
If i join into more tables: http://i.stack.imgur.com/xhAxE.png
Some Suggestions
nothing wrong with this table and columns
follow this approach MySQL: Optimize table with lots of columns - which serialize extra fields into one field, which are not searchable's
create another table and put most of the data there. (this gets harder on joins, if i already have 3 or more tables to join to pull the records for users (ex. friends, user, check mutual friends)
As usual - it depends.
Firstly, there is a maximum number of columns MySQL can support, and you don't really want to get there.
Secondly, there is a performance impact when inserting or updating if you have lots of columns with an index (though I'm not sure if this matters on modern hardware).
Thirdly, large tables are often a dumping ground for all data that seems related to the core entity; this rapidly makes the design unclear. For instance, the design you present shows 3 different "status" type fields (status, is_admin, and fb_account_verified) - I suspect there's some business logic that should link those together (an admin must be a verified user, for instance), but your design doesn't support that.
This may or may not be a problem - it's more a conceptual, architecture/design question than a performance/will it work thing. However, in such cases, you may consider creating tables to reflect the related information about the account, even if it doesn't have a x-to-many relationship. So, you might create "user_profile", "user_credentials", "user_fb", "user_activity", all linked by user_id.
This makes it neater, and if you have to add more facebook-related fields, they won't dangle at the end of the table. It won't make your database faster or more scalable, though. The cost of the joins is likely to be negligible.
Whatever you do, option 2 - serializing "rarely used fields" into a single text field - is a terrible idea. You can't validate the data (so dates could be invalid, numbers might be text, not-nulls might be missing), and any use in a "where" clause becomes very slow.
A popular alternative is "Entity/Attribute/Value" or "Key/Value" stores. This solution has some benefits - you can store your data in a relational database even if your schema changes or is unknown at design time. However, they also have drawbacks: it's hard to validate the data at the database level (data type and nullability), it's hard to make meaningful links to other tables using foreign key relationships, and querying the data can become very complicated - imagine finding all records where the status is 1 and the facebook_id is null and the registration date is greater than yesterday.
Given that you appear to know the schema of your data, I'd say "key/value" is not a good choice.
I would advice to run some tests. Try it both ways and benchmark it. Nobody will be able to give you a definitive answer because you have not shared your hardware configuration, sample data, sample queries, how you plan on using the data etc. Here is some information that you may want to consider.
Use The Database as it was intended
A relational database is designed specifically to handle data. Use it as such. When written correctly, joining data in a well written schema will perform well. You can use EXPLAIN to optimize queries. You can log SLOW queries and improve their performance. Databases have been around for years, if putting everything into a single table improved performance, don't you think that would be all the buzz on the internet and everyone would be doing it?
Engine Types
How will inserts be affected as the row count grows? Are you using MyISAM or InnoDB? You will most likely want to use InnoDB so you get row level locking and not table. Make sure you are using the correct Engine type for your tables. Get the information you need to understand the pros and cons of both. The wrong engine type can kill performance.
Enhancing Performance using Partitions
Find ways to enhance performance. For example, as your datasets grow you could partition the data. Data partitioning will improve the performance of a large dataset by keeping slices of the data in separate partions allowing you to run queries on parts of large datasets instead of all of the information.
Use correct column types
Consider using UUID Primary Keys for portability and future growth. If you use proper column types, it will improve performance of your data.
Do not serialize data
Using serialized data is the worse way to go. When you use serialized fields, you are basically using the database as a file management system. It will save and retrieve the "file", but then your code will be responsible for unserializing, searching, sorting, etc. I just spent a year trying to unravel a mess like that. It's not what a database was intended to be used for. Anyone advising you to do that is not only giving you bad advice, they do not know what they are doing. There are very few circumstances where you would use serialized data in a database.
Conclusion
In the end, you have to make the final decision. Just make sure you are well informed and educated on the pros and cons of how you store data. The last piece of advice I would give is to find out what heavy users of mysql are doing. Do you think they store data in a single table? Or do they build a relational model and use it the way it was designed to be used?
When you say "I am going to put everything into a single table", you are saying that you know more about performance and can make better choices for optimization in your code than the team of developers that constantly work on MySQL to make it what it is today. Consider weighing your knowledge against the cumulative knowledge of the MySQL team and the DBAs, companies, and members of the database community who use it every day.
At a certain point you should look at the "short row model", also know as entity-key-value stores,as well as the traditional "long row model".
If you look at the schema used by WordPress you will see that there is a table wp_posts with 23 columns and a related table wp_post_meta with 4 columns (meta_id, post_id, meta_key, meta_value). The meta table is a "short row model" table that allows WordPress to have an infinite collection of attributes for a post.
Neither the "long row model" or the "short row model" is the best model, often the best choice is a combination of the two. As #nevillek pointed out searching and validating "short row" is not easy, fetching data can involve pivoting which is annoyingly difficult in MySql and Oracle.
The "long row model" is easier to validate, relate and fetch, but it can be very inflexible and inefficient when the data is sparse. Some rows may have only a few of the values non-null. Also you can't add new columns without modifying the schema, which could force a system outage, depending on your architecture.
I recently worked on a financial services system that had over 700 possible facts for each instrument, most had less than 20 facts. This could have been built by setting up dozens of tables, each for a particular asset class, or as a table with 700 columns, but we chose to use a combination of a table with about 20 columns containing the most popular facts and a 4 column table which contained the other facts. This design was efficient but was difficult ot access, so we built a few table functions in PL/SQL to assist with this.
I have a general comment for you,
Think about it: If you put anything more than 10-12 columns in a table even if it makes sense to put them in a table, I guess you are going to pay the price in the short term, long term and medium term.
Your 3 tables approach seems to be better than the 1 table approach, but consider making those into 5-6 tables rather than 3 tables because you still can.
Move currently, currently_position, currently_link from user-table and work from user-profile into a new table with your primary key called USERWORKPROFILE.
Move locale Information from user-profile to a newer USERPROFILELOCALE information because it is generic in nature.
And yes, all your generic attributes in all the tables should be int and not varchar.
For instance, City needs to move out to a new table called LIST_OF_CITIES with cityid.
And your attribute city should change from varchar to int and point to cityid in LIST_OF_CITIES.
Do not worry about performance issues; the more tables you have, better the performance, because you are actually handing out the performance to the database provider instead of taking it all in your own hands.
I have read many strong statements here and elsewhere on the subject of storing arrays in mysql. The rules of normalization seem to suggest its a bad idea and searching within the stored array fosters inelegant code. HOWEVER, for the application I am working on it seems like a reasonable solution to store an array in a field. I'm sure that is what everyone wrongly thinks in this position but I can't figure out a better way. Here is the setup:
I have a series of tables that store registered students, courses they can take and their performance on each course. All are "normalized" to avoid duplication and errors. I want to be able to generate a "myCourses" section so after login the student sees courses they are eligible for and courses they have taken but are free to review. The approach that comes to mind is two arrays; my_eligible_courses and my_completed_courses. On registration, the student is given a set of courses for which they are eligible. This could be stored as rows where there are multiple occurrences of studentid, one for each course they can take:
student1 course 1
student1 course 2
student1 course n
The table could then be queried for all of student 1's eligible courses and displayed as a list when the student logs in.
Alternately, studentid could be a primary key and in a column "eligible_courses" there would be an array (course 1,course 2, course n).
There is a table for student performance, to record every course taken and metrics associated with student performance. It will be queried to report on student performance, quality of course etc but this table will grow quite large. I'm having a hard time believing that the most efficient way to generate a list of my_completed_courses is to query this table by studentid every time they login just to give them a list of completed courses.
One other complication is that the set of courses a student is eligible is variable and expanding as new courses are developed, which to me seems to suggest that generating a set of new columns for each new course is a bad idea-for example, new course_name, pretest_score, posttest_score, time_to_complete, ... Also, a table for each new course seems like a complicated solution for the relatively mundane endpoint of generating a simple set of lists.
So to restate the question, is it better to store "inelegant" arrayed list of eligible and completed courses in a registered student table or dynamically generate these lists?
I'm guessing this is still too vague but any discussion of db design that gives an example of an inelegant array vs a restructured schema would be appreciated.
You should feel confident that if you have indexes on your tables for the appropriate columns, querying for my_completed_courses will be pretty snappy.
When your table grows to the point that you notice slowdown, you can configure your MySQL server with appropriate memory allocation settings so that it can keep more data cached in memory. Or you could look into that now.
In response to the edit you made about adding new courses: Don't add a new column for each course. Don't add a new table for each course. Create a table for courses, and add rows for each course.
You should then be able to join your tables together on indexed columns to generate the list of data you need.
This is a bad idea for two obvious reasons:
DBMS can't enforce proper referentialX (and possibly domain) integrity and relying on application-level integrity is almost always a bad idea.
While the database will be able to answer the query: "based on given student, give me courses", you won't be able to (efficiently) go in the opposite direction, should you ever need to.
X What's to stop a buggy application from storing a non-existent ID in array? Or deleting a course that is still referenced by students? Even if your application is careful about course deletion, there is no way to do it efficiently - you'll need a full table scan to examine all arrays.
Why are you even trying this? A link (aka. junction) table would solve these problems, for a moderate cost of some additional storage space.
If you are really concerned about storage space, you could even switch the DBMS and use one that supports leading-edge index compression (such as Oracle).
I'm having a hard time believing that the most efficient way to generate a list of my_completed_courses is to query this table by studentid every time they login just to give them a list of completed courses.
Databases are very good at querying humongous amounts of data. In this case, if you use the clustering properly, the DBMS will be able to get this data in very few I/O operations, meaning very fast. Did you perform any actual benchmarks? Have you measured any actual performance problem?
Also, a table for each new course seems like a complicated solution for the relatively mundane endpoint of generating a simple set of lists.
Generating a new table may be justified in case it will have different columns. But, that doesn't sound like what you are trying to do.
It seems to me that you simply need:
CHECK (
(COMPLETED = 0 AND (performance fields) IS NULL)
OR (COMPLETED = 1 AND (performance fields) IS NOT NULL)
)
When a student enrolls into course, insert a row in STUDENT_COURSE, set COMPLETED to 0 and leave performance fields NULL.
When the student completed the course, set COMPLETED to 1 and fill the performance fields.
(BTW, you could even omit COMPLETED altogether and just rely on testing the performance fields for NULL.)
InnoDB tables are clustered, which means that rows in STUDENT_COURSE belonging to the same student are stored physically close together, which means that getting courses of the given student is extremely fast.
If you need to go in the opposite direction (get students of a given course), add an index on same fields but in opposite order: {COURSE_ID, STUDENT_ID}. You might even consider covering in this case.
Since we are talking about small number of rows, leaving COMPLETED unindexed is just fine. If you are really concerned about that, you can even do something like:
The COMPLETED_STUDENT_COURSE is a B-Tree only for completed courses (and essentially a subset of STUDENT_COURSE which is a B-Tree for all enrolled courses).
Here are a few thoughts that I believe may assist you in making a good decision.
Generally, it is a rule to use correctly normalised tables. But there can be exceptions to this. Perhaps your project may be such.
Most of the time, new developers tend to focus on getting the data into a DB. They get stuck when it comes to retrieving it for a specific purpose. So given both cases of arrays vs. relational tables, ask your self if either method serves your purpose. For example, if you wanted to list the courses of student X, your array method is just fine. This is because you can retrieve it by the primary key like a student ID. But if you wanted to know how many students are on course A, the array method will be a horrible way to go.
Then again, the above point would depend on your data volume as well. For example, if you only have about a hundred students, you'll probably not notice a difference in performance. But if you're looking at several thousand records and you have a big list of courses for students, the array approach is not the way to go.
Benchmark. This is the best way for you to find out your answer. You can use MySQL's explain or just time it using your program that executes the queries. Try each method with your standard volume of data and see which one works best. For example, in the recent past, MySQL was boasting about their strength of the ISAM engine. Then I had to work on a large application that involved millions of records. And here, I noticed that each time a new record came in, Indexes had to be rebuilt. So now we had to bend the rules. Likewise, you'd better do your tests with the correct volumes of data and make a better decision.
But do not take this example as a rule. Rather, go by the standards of normalisation and only bend the rules for exceptions.
I'm planning to develop a PHP Web App, it will mainly be used by registered users(sessions)
While thinking about the DB design, I was contemplating that in order to give the best user experience possible there would be lots of options for the user to activate, deactivate, specify, etc.
For example:
- Options for each layout elements, dialog boxes, dashboard, grid, etc.
- color, size, stay visible, invisible, don't ask again, show everytime, advanced mode, simple mode, etc.
This would get like 100s of fields ranging from simple Yes/No or 1 to N values..., for each user.
So, is it having a field for each of these options the way to go?
or how do those CRMs or CMS or other Web Apps do it to store lots of 1-2 char long values?
Do they group them on Text fields separated by a special char and then "explode" them as an array for runtime usage?
thank you
How about something like this:
CREATE TABLE settings (
user_id INT,
setting_name VARCHAR(255),
setting_value CHAR(2)
)
That way, to store a configuration setting for a user, you can do:
INSERT INTO settings (user_id, setting_name, setting_value),
VALUES (1, "timezone", "+8")
And when you need to query a setting for a particular user, you can do:
SELECT setting_value FROM settings
WHERE user_id = 1 AND setting_name = "timezone"
I would absolutely be inclined to have individual fields for each option. My rule of thumb is that each column holds exactly one piece of data whenever possible. No more, no less. As was mentioned earlier, the ease of maintenance and the ability to add / drop options down the road far outweighs the pain in the arse of setting it up. I would, however, put some thought into how you create the table(s). The idea mentioned earlier was to have a Settings table with 100 columns ( one for each option ) and one row for each user. That would work, to be sure. If it were me I would be inclined to break it down a bit further. You start with a basic User table, of course. That would hold the basics of username, password, userid etc. That way you can use the numeric userid as the key index for your Settings table(s). But after that I would try to break down the settings into smaller tables based on logical usage. For example, if you have 100 options, and 19 of those pertain to how a user views / is viewed / behaves in one specific part of the site, say something like a forum, then break those out into a separate table ie ForumSettings. Maybe there are 12 more that pertain to email preferences, but would not be used in other areas of the site / app. Now you have an EmailSettings table. Doing this would not only reduce the number of columns in your generic Settings table, but it would also make writing queries for specific tasks or areas of the app much easier, speed up the performance a tick, and make maintenance moving forward far less painful. Some may disagree as from a strictly data modeling perspective I'm pretty sure that the one Settings table would be indicated. But from a real world perspective, I have never gone wrong using logical chunks such as this.
From a pure data-model perspective, that would be the clearest design (though awful wide). Some might try to bitmask them into a single field for assumed space reasons, but the logic to encode/decode makes that not worthwhile, in my opinion. Also you lose the ability to index on them.
Another option (I just saw posted) is to hold a separate table with an FK back to the user table. But then you have to iterate over the results to get the value you want to check for.