Database design: Using hundreds of fields for small values - mysql

I'm planning to develop a PHP web app that will mainly be used by registered users (with sessions).
While thinking about the DB design, I realized that to give the best possible user experience, there would be lots of options for the user to activate, deactivate, specify, etc.
For example:
- Options for each layout element: dialog boxes, dashboard, grid, etc.
- color, size, stay visible, invisible, don't ask again, show every time, advanced mode, simple mode, etc.
This could add up to hundreds of fields per user, ranging from simple Yes/No values to 1-to-N values.
So, is having a field for each of these options the way to go?
Or how do CRMs, CMSs, and other web apps store lots of 1-2 character values?
Do they group them in text fields separated by a special character and then "explode" them into an array for use at runtime?
Thank you.

How about something like this:
CREATE TABLE settings (
user_id INT,
setting_name VARCHAR(255),
setting_value CHAR(2)
)
That way, to store a configuration setting for a user, you can do:
INSERT INTO settings (user_id, setting_name, setting_value)
VALUES (1, 'timezone', '+8');
And when you need to query a setting for a particular user, you can do:
SELECT setting_value FROM settings
WHERE user_id = 1 AND setting_name = 'timezone';
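A minimal sketch of the key-value settings table above, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The DEFAULTS dictionary, the composite primary key, and the helper names are hypothetical additions. INSERT OR REPLACE is SQLite syntax; in MySQL the equivalent would be REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE. Only settings a user has actually changed get a row, and everything else falls back to an application-side default, which keeps the table small even with hundreds of possible options.

```python
import sqlite3

# Hypothetical defaults kept in application code; only overrides are stored.
DEFAULTS = {"timezone": "+0", "dashboard_mode": "S", "show_tips": "Y"}


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE settings (
               user_id       INTEGER NOT NULL,
               setting_name  VARCHAR(255) NOT NULL,
               setting_value CHAR(2),  -- SQLite does not enforce the length
               PRIMARY KEY (user_id, setting_name)
           )"""
    )
    return conn


def set_setting(conn, user_id, name, value):
    # Insert a new row, or overwrite the existing one for (user_id, name).
    conn.execute(
        "INSERT OR REPLACE INTO settings (user_id, setting_name, setting_value)"
        " VALUES (?, ?, ?)",
        (user_id, name, value),
    )


def get_setting(conn, user_id, name):
    row = conn.execute(
        "SELECT setting_value FROM settings WHERE user_id = ? AND setting_name = ?",
        (user_id, name),
    ).fetchone()
    # No row means the user never changed this option: use the default.
    return row[0] if row else DEFAULTS.get(name)


conn = open_db()
set_setting(conn, 1, "timezone", "+8")
print(get_setting(conn, 1, "timezone"))   # +8
print(get_setting(conn, 1, "show_tips"))  # Y
```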

I would absolutely be inclined to have individual fields for each option. My rule of thumb is that each column holds exactly one piece of data whenever possible. No more, no less. As was mentioned earlier, the ease of maintenance and the ability to add/drop options down the road far outweigh the pain in the arse of setting it up.
I would, however, put some thought into how you create the table(s). The idea mentioned earlier was to have a Settings table with 100 columns (one for each option) and one row for each user. That would work, to be sure. If it were me, I would be inclined to break it down a bit further. You start with a basic User table, of course, which would hold the basics: username, password, userid, etc. That way you can use the numeric userid as the key for your Settings table(s).
But after that, I would try to break the settings down into smaller tables based on logical usage. For example, if you have 100 options, and 19 of those pertain to how a user views / is viewed / behaves in one specific part of the site, say a forum, then break those out into a separate table such as ForumSettings. Maybe there are 12 more that pertain to email preferences but would not be used in other areas of the site/app. Now you have an EmailSettings table. Doing this would not only reduce the number of columns in your generic Settings table, but would also make writing queries for specific tasks or areas of the app much easier, speed up performance a tick, and make maintenance far less painful going forward.
Some may disagree, since from a strictly data-modeling perspective I'm pretty sure the single Settings table would be indicated. But from a real-world perspective, I have never gone wrong using logical chunks such as this.
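A rough sketch of that split, again with sqlite3 standing in for MySQL. The table and column names (users, forum_settings, email_settings, posts_per_page, and so on) are hypothetical; the point is that each area's code reads one narrow table keyed by the numeric user_id, and column defaults cover options the user never touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    -- Hypothetical per-area settings tables, one row per user.
    CREATE TABLE forum_settings (
        user_id         INTEGER PRIMARY KEY REFERENCES users(user_id),
        show_signatures INTEGER NOT NULL DEFAULT 1,
        posts_per_page  INTEGER NOT NULL DEFAULT 20
    );
    CREATE TABLE email_settings (
        user_id         INTEGER PRIMARY KEY REFERENCES users(user_id),
        weekly_digest   INTEGER NOT NULL DEFAULT 0,
        notify_on_reply INTEGER NOT NULL DEFAULT 1
    );
    """
)
conn.execute("INSERT INTO users (user_id, username) VALUES (1, 'alice')")
conn.execute("INSERT INTO forum_settings (user_id) VALUES (1)")  # all defaults
conn.execute("INSERT INTO email_settings (user_id, weekly_digest) VALUES (1, 1)")

# The forum code only ever touches forum_settings, so its query stays narrow.
posts_per_page = conn.execute(
    "SELECT posts_per_page FROM forum_settings WHERE user_id = ?", (1,)
).fetchone()[0]
print(posts_per_page)  # 20
```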

From a pure data-model perspective, that would be the clearest design (though awfully wide). Some might try to bitmask them into a single field to save space, but in my opinion the logic to encode/decode them makes that not worthwhile. You also lose the ability to index on them.
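For comparison, this is roughly what the bitmask alternative looks like; the flag names are hypothetical. Encoding and decoding are short, but every flag's bit position becomes a permanent contract (new flags can only be appended to the end), and a query such as "all users in advanced mode" turns into a bitwise test like WHERE (flags & 4) <> 0, which generally cannot use an ordinary index on the column.

```python
# Hypothetical flag names; each one owns a fixed bit position in one integer column.
FLAGS = ["stay_visible", "dont_ask_again", "advanced_mode", "show_tips"]


def encode(enabled):
    """Pack a set of enabled flag names into one integer."""
    mask = 0
    for bit, name in enumerate(FLAGS):
        if name in enabled:
            mask |= 1 << bit
    return mask


def decode(mask):
    """Unpack the integer back into the set of enabled flag names."""
    return {name for bit, name in enumerate(FLAGS) if mask & (1 << bit)}


mask = encode({"stay_visible", "advanced_mode"})
print(mask)                  # 5 (bits 0 and 2)
print(sorted(decode(mask)))  # ['advanced_mode', 'stay_visible']
```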
Another option (which I just saw posted) is to keep a separate table with an FK back to the user table. But then you have to iterate over the results to find the value you want to check.
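In practice that iteration usually happens once per request, or once per login if the result is kept in the session: fetch all of a user's settings rows in one query and build a dictionary (an associative array in PHP) from them. A minimal sketch with sqlite3; the table layout, data, and load_settings name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE settings (user_id INTEGER, setting_name TEXT, setting_value TEXT,"
    " PRIMARY KEY (user_id, setting_name))"
)
# Illustrative data for two users.
conn.executemany(
    "INSERT INTO settings VALUES (?, ?, ?)",
    [(1, "timezone", "+8"), (1, "advanced_mode", "Y"), (2, "timezone", "-5")],
)


def load_settings(conn, user_id):
    # One query per request; afterwards every lookup is a dict access, no looping.
    rows = conn.execute(
        "SELECT setting_name, setting_value FROM settings WHERE user_id = ?",
        (user_id,),
    )
    return dict(rows)


prefs = load_settings(conn, 1)
print(prefs["timezone"])  # +8
```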

Related

Best approach for maintaining a 'users in country' count in MySQL

We have a series of complex websites that all use the same user-tracking MySQL database. (This is not our exact situation, but a simplification of it, to keep this post as brief and efficient as possible.)
We don't always know where a user is when he starts using a site. In fact, there are about 50 points in the code where the country field might get updated. We might collect it from the IP address on use. We might get it when he uses his credit card. We might get it when he fills out a form. Heck, we might get it when we talk to him on the phone.
Assume a simple structure like:
CREATE TABLE `Users` (
`ID` INT NOT NULL AUTO_INCREMENT ,
`Country` VARCHAR(45) NULL ,
PRIMARY KEY (`ID`) );
What I'm wondering is the best way to keep track of one more piece of information about this person:
`Number_of_Users_in_My_Country`.
I know I could run a simple query to get it with each record. But I constantly need two other bits of information (keep in mind that I'm not really dealing with countries but with other groups that number in the 100,000s; again, countries just keep this post simple):
User count by country, and
Selection of countries with fewer than x users.
I'm wondering whether I should create a trigger that updates the Number_of_Users_in_My_Country field when the country value changes.
As I'm new to MySQL, I would love to hear thoughts on this or any other approach.
Lots of people will tell you not to do that because it's not normalized. However, if it's trivial to keep an aggregate value (to save complex joins in certain queries), I'd say go for it. Keep in mind that in MySQL a trigger can't update the table it is defined on (apart from setting NEW column values in a BEFORE trigger), so be careful in defining how certain events propagate updates to other tables, so that you don't end up in a loop.
An additional recommendation: I would keep a table for countries, and use a foreign key reference from Users to Countries. Then in countries, have a column for total users in that country. Users_in_my_country seems to have a very specific use, and it would be easier to maintain from the countries' perspective.
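A sketch of that design, assuming hypothetical countries and users tables and using SQLite triggers through Python's sqlite3 module. The triggers fire on users but only write to countries, which sidesteps the same-table restriction mentioned above. MySQL trigger syntax differs (FOR EACH ROW, no WHEN clause and no UPDATE OF column list, so the NULL checks would move into IF statements inside the body), so treat this as an outline of the logic rather than drop-in MySQL code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: users reference countries, countries hold the aggregate.
    CREATE TABLE countries (
        country_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        user_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE users (
        id         INTEGER PRIMARY KEY,
        country_id INTEGER REFERENCES countries(country_id)  -- NULL until known
    );
    CREATE TRIGGER users_ai AFTER INSERT ON users
    WHEN NEW.country_id IS NOT NULL
    BEGIN
        UPDATE countries SET user_count = user_count + 1
        WHERE country_id = NEW.country_id;
    END;
    -- A NULL old or new value matches no row, so only real countries change.
    CREATE TRIGGER users_au AFTER UPDATE OF country_id ON users
    BEGIN
        UPDATE countries SET user_count = user_count - 1
        WHERE country_id = OLD.country_id;
        UPDATE countries SET user_count = user_count + 1
        WHERE country_id = NEW.country_id;
    END;
    CREATE TRIGGER users_ad AFTER DELETE ON users
    WHEN OLD.country_id IS NOT NULL
    BEGIN
        UPDATE countries SET user_count = user_count - 1
        WHERE country_id = OLD.country_id;
    END;
    """
)

conn.executemany("INSERT INTO countries (country_id, name) VALUES (?, ?)",
                 [(1, "NZ"), (2, "CA")])
conn.execute("INSERT INTO users (id, country_id) VALUES (1, NULL)")  # country unknown
conn.execute("INSERT INTO users (id, country_id) VALUES (2, 1)")
conn.execute("UPDATE users SET country_id = 2 WHERE id = 1")  # learned later, e.g. from a form
conn.execute("UPDATE users SET country_id = 2 WHERE id = 2")  # corrected, e.g. from a card

counts = dict(conn.execute("SELECT name, user_count FROM countries ORDER BY country_id"))
print(counts)  # {'NZ': 0, 'CA': 2}

# "Countries with fewer than x users" is now a simple filter on countries.
x = 1
print([r[0] for r in conn.execute(
    "SELECT name FROM countries WHERE user_count < ?", (x,))])  # ['NZ']
```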
Given that you've simplified the question somewhat, it's hard to be totally precise.
In general, if at all possible, I prefer to calculate these derived values on the fly, and to find out whether that's feasible, I prefer to try it out; 100,000x records is not a particularly scary number, and I'd much rather spend time tuning a query/indexing scheme once than deal with maintenance craziness for the life of the application.
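A sketch of the on-the-fly approach for the queries the question needs, using sqlite3 and a hypothetical users(id, country) table. The index on the grouping column is the main tuning lever: it can let the database count from the index instead of reading full rows. Note that countries with zero users cannot appear in results derived from users alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT)")
# Index on the grouping column so counts can be served from the index.
conn.execute("CREATE INDEX idx_users_country ON users (country)")
conn.executemany(
    "INSERT INTO users (country) VALUES (?)",
    [("NZ",), ("NZ",), ("CA",), ("FR",), ("FR",), ("FR",), (None,)],
)

# 1) User count by country (users whose country is still unknown are skipped).
counts = dict(conn.execute(
    "SELECT country, COUNT(*) FROM users WHERE country IS NOT NULL GROUP BY country"
))


# 2) Countries with fewer than x users.
def countries_with_fewer_than(conn, x):
    return [row[0] for row in conn.execute(
        "SELECT country FROM users WHERE country IS NOT NULL"
        " GROUP BY country HAVING COUNT(*) < ? ORDER BY country",
        (x,),
    )]


# 3) Number_of_Users_in_My_Country for a single user.
def users_in_my_country(conn, user_id):
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE country ="
        " (SELECT country FROM users WHERE id = ?)",
        (user_id,),
    ).fetchone()[0]


print(countries_with_fewer_than(conn, 2))  # ['CA']
print(users_in_my_country(conn, 4))        # 3
```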
If you've tried that and still can't get it to work, my next consideration would be to work with stale/cached data. It all depends on your business, but if it's okay for the "number of users in my country" value to be slightly out of date, then calculating these values and caching them in the application layer would be much better. There are lots of existing caching libraries you can use, caching is well understood by most developers, and on high-traffic websites, caching for even a few seconds can have a dramatic effect on performance and scalability. Alternatively, have a script that populates a "country_usercount" table and run it every minute or so.
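A minimal in-process sketch of the time-based caching described above. TTLCache is a hypothetical helper, not a specific library; in a PHP deployment the same idea would more likely use a shared cache such as Memcached or APCu. The clock parameter exists only so expiry can be demonstrated without sleeping, and this version is not thread-safe.

```python
import time


class TTLCache:
    """Hypothetical helper: hold one value and recompute it once older than ttl seconds."""

    def __init__(self, compute, ttl=60.0, clock=time.monotonic):
        self._compute = compute           # e.g. a function running the count query
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires_at = float("-inf")  # forces a computation on first use

    def get(self):
        now = self._clock()
        if now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value


calls = []


def count_users_by_country():
    calls.append(1)  # stands in for the GROUP BY query
    return {"NZ": 2, "CA": 1}


fake_now = [0.0]
cache = TTLCache(count_users_by_country, ttl=60.0, clock=lambda: fake_now[0])
cache.get()
cache.get()          # served from the cache, no query
fake_now[0] = 61.0
cache.get()          # older than 60 s, so the query runs again
print(len(calls))    # 2
```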
If the data must, absolutely, be fresh, I'd include the logic to update the counts in the application layer code - it's a bit ugly, but it's easy to debug, and behaves predictably. So, every time the event fires that tells you which country the user is from, you update the country_usercount table from the application code.
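A sketch of that application-layer update, with sqlite3 and hypothetical users and country_usercount tables. The helper changes the user's country and adjusts both counts inside one transaction, so the aggregate can't drift if one of the writes fails. On MySQL/InnoDB you would also read the old value with SELECT ... FOR UPDATE inside the transaction, so two concurrent updates for the same user can't both see the same old country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables: users plus a per-country aggregate.
    CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE country_usercount (
        country    TEXT PRIMARY KEY,
        user_count INTEGER NOT NULL DEFAULT 0
    );
    """
)


def set_user_country(conn, user_id, new_country):
    """Call from every code path that learns a user's country."""
    with conn:  # commits on success, rolls back if any statement raises
        row = conn.execute("SELECT country FROM users WHERE id = ?", (user_id,)).fetchone()
        if row is None:
            raise ValueError(f"unknown user {user_id}")
        old_country = row[0]
        if old_country == new_country:
            return  # nothing changed, so the counts stay as they are
        conn.execute("UPDATE users SET country = ? WHERE id = ?", (new_country, user_id))
        if old_country is not None:
            conn.execute(
                "UPDATE country_usercount SET user_count = user_count - 1 WHERE country = ?",
                (old_country,),
            )
        if new_country is not None:
            conn.execute(
                "INSERT OR IGNORE INTO country_usercount (country) VALUES (?)",
                (new_country,),
            )
            conn.execute(
                "UPDATE country_usercount SET user_count = user_count + 1 WHERE country = ?",
                (new_country,),
            )


conn.executemany("INSERT INTO users (id) VALUES (?)", [(1,), (2,)])
set_user_country(conn, 1, "NZ")  # e.g. from the IP address
set_user_country(conn, 2, "NZ")
set_user_country(conn, 2, "CA")  # e.g. corrected from a credit card
print(dict(conn.execute(
    "SELECT country, user_count FROM country_usercount ORDER BY country"
)))  # {'CA': 1, 'NZ': 1}
```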
The reason I dislike triggers is that they can lead to horrible, hard-to-reproduce bugs and performance issues: if you have several of those pre-calculated aggregate fields and you write a trigger for each, you could easily end up with lots of unexpected database activity.

Rails and mysql - adding columns to a table dynamically based upon form values

I'm working on a legacy app - right now, we allow admins to generate forms with custom fields (they create the field, choose an input type, label, etc).
When the user fills out this custom form, all fields from that form are checked - if that field is not a column on the users table, we add it to the users table as a column.
For example, if an admin added a field called "flight arrival time", we would add a column called "flight_arrival_time" to the users table, and the User model would gain a flight_arrival_time attribute (e.g. @user.flight_arrival_time).
What alternatives might there be to this current course of action? Is there a more efficient way of storing these values?
Here are some of the limitations:
We have tens of thousands of users. (I was told that storing these attributes in a different table and joining them would slow the system down A LOT. We often have around 20 admins querying, importing, updating, and generally using the hell out of our system, which is already pretty slow under load. I wouldn't have the power to say "buy more {X} so we can be faster.")
I assume a join table (called something like user_attributes) would store the user_id, the attribute name, and the attribute value. If each user has an additional 15 attributes, and we have 100,000 users, how much slower will it be?
The storage must be easily queryable. (We use a dynamic search that allows users to choose any column from the User model as a search field and find an entered value.)
Would your option allow easy queries (for instance, finding all users whose attribute named "Flight Arrival Time" is tomorrow)? Would this also become very slow?
I will experiment a bit, generate some of the proposed schema, generate 100,000 users and 20 attributes for each, and run some test queries to check execution times, but I'd like some input on where to start.
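A starting point for that experiment, sketched with an in-memory sqlite3 database rather than MySQL, so absolute timings will not transfer; only the shape of the schema and query is meant to. The user_attributes layout follows the question (user_id, attribute name, attribute value); the composite index on (attribute_name, attribute_value), the ISO-8601 text encoding of dates, and all function names are assumptions. ISO-8601 strings sort chronologically, so "arrives tomorrow" can be a range condition on the text column that the index can serve.

```python
import sqlite3
import time
from datetime import date, timedelta


def build(n_users, attrs_per_user):
    """Create users plus n_users * attrs_per_user attribute rows (synthetic data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE user_attributes (
            user_id         INTEGER NOT NULL REFERENCES users(id),
            attribute_name  TEXT    NOT NULL,
            attribute_value TEXT,
            PRIMARY KEY (user_id, attribute_name)
        );
        -- Lets "attribute X has a value in range Y" searches avoid a full scan.
        CREATE INDEX idx_attr_name_value
            ON user_attributes (attribute_name, attribute_value);
        """
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     ((i, f"user{i}@example.com") for i in range(n_users)))
    start = date(2024, 1, 1)
    rows = []
    for i in range(n_users):
        arrival = start + timedelta(days=i % 30)
        rows.append((i, "flight_arrival_time", f"{arrival.isoformat()}T14:30"))
        rows.extend((i, f"attr_{k}", str(k)) for k in range(attrs_per_user - 1))
    conn.executemany("INSERT INTO user_attributes VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def users_arriving_on(conn, day):
    """IDs of users whose flight_arrival_time falls on the given date."""
    return [r[0] for r in conn.execute(
        """SELECT u.id
           FROM users u
           JOIN user_attributes a ON a.user_id = u.id
           WHERE a.attribute_name = 'flight_arrival_time'
             AND a.attribute_value >= ? AND a.attribute_value < ?""",
        (day.isoformat(), (day + timedelta(days=1)).isoformat()),
    )]


def benchmark(n_users=100_000, attrs_per_user=20):
    """With the defaults this builds 2,000,000 attribute rows; call it explicitly."""
    conn = build(n_users, attrs_per_user)
    # A real app would use date.today() + timedelta(days=1); this date matches the synthetic data.
    tomorrow = date(2024, 1, 2)
    t0 = time.perf_counter()
    ids = users_arriving_on(conn, tomorrow)
    return len(ids), time.perf_counter() - t0
```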
Thanks for your input.
Not exactly an answer, but I think this kind of app would benefit a lot from a document-oriented database / NoSQL system like MongoDB. Such systems are schema-less by design.
To add my two cents, letting users make dynamic changes to the schema seems like a very dangerous option in an RDBMS environment to begin with. You could end up with tables with thousands of mostly empty columns, and Rails would instantiate objects with thousands of methods on them... and what happens when you delete a column?
In the long run, the approach you are following can make your database very slow.
Your database size will grow, because adding columns based on user behavior leaves null values in all the other tuples.
It's better to use a schema-less store, such as the document-oriented MongoDB or CouchDB, or a wide-column store like Cassandra.
There's a gem for that. https://github.com/Liooo/dynabute
Also, this question is a duplicate of "Rails: dynamic columns/attributes on models?"

Database revisions for data and relations for moderating content changes

Short version: I'm looking for suggestions on how to implement a database versioning system that keeps track of relations as well as record data for moderating a social web app with user-editable content.
Long version: I'm developing a social app where all the users can edit content. To give you an example, let's say one type of an item to edit is a Book. A Book can have a title and a few authors (many-to-many relation to authors table). Of course this example is rather simple; real items will have many more fields as well as relations, but the rule should be the same.
Now, I need to implement a versioning system to moderate changes made by users. Let's say there are two groups of users: normal users and trusted users. Changes made by normal users are logged, but aren't committed until a moderator accepts that particular change. Changes made by trusted users are committed immediately, but still logged so that they can be reverted at any time.
If I were to keep revisions of a Book from the example, I would need to keep track of changing relations with authors. I need to be able to track adding and deleting relations, and be able to revert them.
So if the title gets modified, I need to be able to revert the change. If a relation to an author gets deleted, I need to be able to revert that, and if a relation gets added, I need to be able to revert that too.
I only need to keep track of an item and its relations, not anything that it relates to. If we had tables foos, bars, and foos_bars, I would be interested only in logging foos and foos_bars; bars would be pretty independent.
I'm familiar with this great question, as well as its kind-of-an-adversary solution, and a pretty comprehensive article on the second approach and its follow-up. However, none of them gives the kind of special consideration to tracking relations, as well as normal table data, that would make the answer to my problem obvious.
I like the one-table-for-all-history approach, as it makes it easy to keep some changes and undo others. For example, if one user submitted fields A and B, and then a second user submitted A and B, it would be easy to undo just the second user's B change and keep the A. It's also nice to have one table for the whole functionality, as opposed to many tables with the other approach. It also makes it easy to see who did exactly what (e.g. modified only the foobar field), which doesn't seem easy with the other approach. And it seems like it would be easier to automate the moderation process: we don't even need to know table names, as everything needed is stored in a revision record.
Given my limited experience writing triggers, if I were to use the one-revisions-table-per-versioned-table approach, I don't know whether it would be possible, or reasonably easy, to implement a system that automatically records an edit but doesn't commit it immediately unless some parameter is set (e.g. edit_from_trusted_user == true). It also makes me worry about triggers firing when I wouldn't want them to (as the moderation wouldn't apply to, e.g., changes made by an admin, or by some other "objects" that could try to modify the data).
No matter which solution I choose, it seems as if I'll have to add a rather artificial id to all many-to-many relation tables (instead of [book_id, author_id] I would have [id, book_id, author_id]).
I thought about implementing relations in the one-table approach like so.
Given the standard revision table structure:
[ID] [int]
[TableName] [varchar]
[RecordID] [int]
[FieldName] [varchar]
[OldValue] [varchar]
[NewValue] [varchar]
[EventType] [enum]
[EventDate] [datetime]
[UserID] [int]
We could store relations by simply setting RecordID and FieldName to NULL, EventType to either ADD or DELETE, and OldValue and NewValue to the relation's foreign keys. The only problem is that some of my relations have additional data (like a graph's edge weight), so I would have to store that somewhere too. Then again, the operation of adding a new relation could be split into a 2-event sequence, ADD and SET(weight), but then artificial relation IDs would be needed, and I'm not sure whether such a solution would have bad implications in the future.
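A minimal sketch of that single-table layout covering both field edits and relation events, using sqlite3 and the Book/author example. It reads "OldValue and NewValue to relation's foreign keys" as old_value = book_id and new_value = author_id; that interpretation, the 'SET' event name for field edits, and all function names are assumptions. Extra relation data such as an edge weight is not handled, which is exactly the gap the question points out.

```python
import sqlite3

# Hypothetical schema following the question's revision table.
SCHEMA = """
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE books_authors (
    book_id   INTEGER NOT NULL,
    author_id INTEGER NOT NULL,
    PRIMARY KEY (book_id, author_id)
);
CREATE TABLE revisions (
    id         INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    record_id  INTEGER,  -- NULL for relation events
    field_name TEXT,     -- NULL for relation events
    old_value  TEXT,     -- relation events: the book_id
    new_value  TEXT,     -- relation events: the author_id
    event_type TEXT NOT NULL CHECK (event_type IN ('SET', 'ADD', 'DELETE')),
    event_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    user_id    INTEGER
);
"""


def log(conn, table, record_id, field, old, new, event, user_id):
    conn.execute(
        "INSERT INTO revisions (table_name, record_id, field_name, old_value,"
        " new_value, event_type, user_id) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (table, record_id, field, old, new, event, user_id),
    )


def set_title(conn, user_id, book_id, title):
    with conn:
        (old,) = conn.execute("SELECT title FROM books WHERE id = ?", (book_id,)).fetchone()
        conn.execute("UPDATE books SET title = ? WHERE id = ?", (title, book_id))
        log(conn, "books", book_id, "title", old, title, "SET", user_id)


def add_author(conn, user_id, book_id, author_id):
    with conn:
        conn.execute("INSERT INTO books_authors VALUES (?, ?)", (book_id, author_id))
        log(conn, "books_authors", None, None, book_id, author_id, "ADD", user_id)


def remove_author(conn, user_id, book_id, author_id):
    with conn:
        conn.execute("DELETE FROM books_authors WHERE book_id = ? AND author_id = ?",
                     (book_id, author_id))
        log(conn, "books_authors", None, None, book_id, author_id, "DELETE", user_id)


def revert(conn, revision_id):
    """Apply the inverse of one logged event (the undo itself is not logged here)."""
    table, record_id, field, old, new, event = conn.execute(
        "SELECT table_name, record_id, field_name, old_value, new_value, event_type"
        " FROM revisions WHERE id = ?", (revision_id,)
    ).fetchone()
    with conn:
        # Only known cases are handled; SQL identifiers are never built from log data.
        if (table, field, event) == ("books", "title", "SET"):
            conn.execute("UPDATE books SET title = ? WHERE id = ?", (old, record_id))
        elif (table, event) == ("books_authors", "ADD"):
            conn.execute("DELETE FROM books_authors WHERE book_id = ? AND author_id = ?",
                         (int(old), int(new)))
        elif (table, event) == ("books_authors", "DELETE"):
            conn.execute("INSERT INTO books_authors VALUES (?, ?)", (int(old), int(new)))
        else:
            raise ValueError(f"cannot revert {event} on {table}")
```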
There will be around 5 to 10 versioned tables, each with, on average, 3 many-to-many relations to keep track of. I'm using MySQL on InnoDB, app is written in PHP 5.3 and connected to the db using PDO.
Putting versioning in the app logic instead of db triggers is fine with me. I just need the whole thing to work, and be reasonably efficient. I expect reverts to occur rather seldom compared to edits, and edits will be few compared to number of views of content. Only moderators will access revision data, to either accept or reject recent changes.
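Since application-level versioning is acceptable, the trusted/normal split can be a single code path rather than a trigger: every edit is logged, trusted edits are applied immediately, and other edits wait as 'pending' for a moderator. A minimal sketch for one field (the book title), using sqlite3; the table layout, status values, and function names are hypothetical. Because no trigger is involved, admin scripts that update books directly are simply not moderated, which matches the requirement in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables for a single moderated field.
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE revisions (
        id        INTEGER PRIMARY KEY,
        record_id INTEGER NOT NULL,  -- books.id
        old_value TEXT,
        new_value TEXT,
        user_id   INTEGER,
        status    TEXT NOT NULL CHECK (status IN ('pending', 'applied', 'rejected'))
    );
    """
)


def submit_title(conn, user_id, book_id, title, trusted):
    """Log every edit; apply it immediately only when the editor is trusted."""
    with conn:
        (current,) = conn.execute(
            "SELECT title FROM books WHERE id = ?", (book_id,)).fetchone()
        cur = conn.execute(
            "INSERT INTO revisions (record_id, old_value, new_value, user_id, status)"
            " VALUES (?, ?, ?, ?, ?)",
            (book_id, current, title, user_id, "applied" if trusted else "pending"),
        )
        if trusted:
            conn.execute("UPDATE books SET title = ? WHERE id = ?", (title, book_id))
        return cur.lastrowid


def moderate(conn, revision_id, accept):
    """Accept or reject a pending revision."""
    with conn:
        row = conn.execute(
            "SELECT record_id, new_value FROM revisions"
            " WHERE id = ? AND status = 'pending'", (revision_id,)).fetchone()
        if row is None:
            raise ValueError(f"no pending revision {revision_id}")
        record_id, new_value = row
        if accept:
            # Record what is overwritten now, which may differ from submission time.
            (current,) = conn.execute(
                "SELECT title FROM books WHERE id = ?", (record_id,)).fetchone()
            conn.execute("UPDATE books SET title = ? WHERE id = ?", (new_value, record_id))
            conn.execute("UPDATE revisions SET old_value = ? WHERE id = ?",
                         (current, revision_id))
        conn.execute("UPDATE revisions SET status = ? WHERE id = ?",
                     ("applied" if accept else "rejected", revision_id))


conn.execute("INSERT INTO books VALUES (1, 'Draft title')")
rev = submit_title(conn, user_id=5, book_id=1, title="Final title", trusted=False)
print(conn.execute("SELECT title FROM books WHERE id = 1").fetchone()[0])  # Draft title
moderate(conn, rev, accept=True)
print(conn.execute("SELECT title FROM books WHERE id = 1").fetchone()[0])  # Final title
```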
Do you have any experience implementing such system? What are suggested solutions to this problem? Any considerations that come to mind?
I searched SO and the net for quite some time, but didn't find anything to help me with the matter. However, if I missed something, I'll be grateful for any links / directions.
Thanks.

What is the right way to do flexible columns in database?

I'm storing columns in a database, with users able to add and remove columns (as fake columns). How do I implement this efficiently?
The best way would be to implement the data structure vertically instead of the normal horizontal layout.
This can be done using something like:
TableAttribute
AttributeID
AttributeType
AttributeValue
This vertical layout is mostly used in applications where users can create their own custom forms and fields (if I recall correctly, the DevExpress form layout allows you to create custom layouts). It is common in CRM applications and is easily modified in production and easily maintainable, but it can greatly decrease SQL performance once the data set becomes very large.
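A sketch of the vertical layout and of turning it back into ordinary records, with sqlite3. The layout above does not say which record a value belongs to, so a row_id column is assumed here; the type names and sample data are made up. Every value shares one text column, so typing and pivoting happen in code (or in a SQL pivot with one CASE expression per attribute), and that per-read reassembly is one source of the performance cost mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE table_attribute (
           row_id          INTEGER NOT NULL,  -- assumed: which record the value belongs to
           attribute_id    TEXT    NOT NULL,
           attribute_type  TEXT    NOT NULL,  -- e.g. 'text' or 'int'
           attribute_value TEXT,
           PRIMARY KEY (row_id, attribute_id)
       )"""
)
conn.executemany(
    "INSERT INTO table_attribute VALUES (?, ?, ?, ?)",
    [
        (1, "company", "text", "Acme"),
        (1, "employees", "int", "40"),
        (2, "company", "text", "Globex"),  # row 2 simply has no "employees" value
    ],
)

CASTS = {"int": int, "text": str}


def pivot(conn):
    """Rebuild horizontal records {row_id: {attribute: typed value}} in application code."""
    records = {}
    for row_id, attr, typ, value in conn.execute(
        "SELECT row_id, attribute_id, attribute_type, attribute_value"
        " FROM table_attribute ORDER BY row_id, attribute_id"
    ):
        typed = None if value is None else CASTS.get(typ, str)(value)
        records.setdefault(row_id, {})[attr] = typed
    return records


print(pivot(conn))  # {1: {'company': 'Acme', 'employees': 40}, 2: {'company': 'Globex'}}
```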
EDIT:
This will depend on how far you wish to take it. You can set it up per form/table, and add attributes that describe the actual control (lookup, combo, datetime, etc.), the position of the controls, and the allowed values (min/max/allow null).
This can become a very cumbersome task, but will greatly depend on your actual needs.
I'd think you could allow that at the user-permission level (grant the ALTER privilege on the appropriate tables) and then restrict what types of data can be added/deleted using your presentation layer.
But why add columns? Why not a linked table?
Allowing users to define columns is generally a poor choice as they don't know what they are doing or how to relate it properly to the other data. Sometimes people use the EAV approach to this and let them add as many columns as they want, but this quickly gets out of control and causes performance issues and difficulty in querying the data.
Others take the approach of having a table with user-defined columns and giving them a set number of columns they can define. This works better performance-wise but is more limiting in terms of how many new columns they can define.
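A sketch of the fixed-number-of-columns alternative with sqlite3: the main table carries a few spare columns, and a small definitions table records what each one currently means. The names (customers, custom1 to custom3, custom_columns, and the helpers) are hypothetical. Because the physical column names come from a CHECK-constrained whitelist, they can be safely interpolated into queries, and each spare column can be indexed like any other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical main table with a fixed number of spare columns...
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        custom1 TEXT,
        custom2 TEXT,
        custom3 TEXT
    );
    -- ...and a table recording what each spare column currently means.
    CREATE TABLE custom_columns (
        column_name TEXT PRIMARY KEY
            CHECK (column_name IN ('custom1', 'custom2', 'custom3')),
        label       TEXT NOT NULL UNIQUE
    );
    """
)
SPARE = ("custom1", "custom2", "custom3")


def define_column(conn, label):
    """Claim the first unused spare column for a new user-defined field."""
    used = {r[0] for r in conn.execute("SELECT column_name FROM custom_columns")}
    free = [c for c in SPARE if c not in used]
    if not free:
        raise RuntimeError("all custom columns are in use")
    with conn:
        conn.execute("INSERT INTO custom_columns VALUES (?, ?)", (free[0], label))
    return free[0]


def find_by_custom(conn, label, value):
    """Map the label to its physical column, then query it like any other column."""
    row = conn.execute(
        "SELECT column_name FROM custom_columns WHERE label = ?", (label,)
    ).fetchone()
    if row is None:
        raise KeyError(label)
    col = row[0]  # guaranteed by the CHECK constraint to be one of SPARE
    return [r[0] for r in conn.execute(
        f"SELECT name FROM customers WHERE {col} = ? ORDER BY id", (value,))]


region_col = define_column(conn, "Region")
conn.execute(f"INSERT INTO customers (name, {region_col}) VALUES (?, ?)", ("Acme", "North"))
conn.execute(f"INSERT INTO customers (name, {region_col}) VALUES (?, ?)", ("Globex", "South"))
print(find_by_custom(conn, "Region", "North"))  # ['Acme']
```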
In any event, you should severely restrict the ability to define additional columns to system admins (who can be at the client level). It is a better idea to actually talk to users in the design phase and see what they need. You will find that you can properly design a system that has 90+% of what the customer needs if you actually talk to them (and not just to managers, but to users at all levels of the organization).
I know it is common in today's world to slough off our responsibility to design by saying we are making things flexible, but I've had to use and provide dba support for many of these systems and the more flexible they try to make the design, the harder it is for the users to use and the more the users hate the system.

MySQL Column Unification, any performance improvements?

I'm designing a MySQL table for an authentication system for a high-traffic personal website. Every time a user comment, article, etc. is displayed, the following fields will be needed:
login
User Display
User Bio ( A little signature )
Website Account
YouTube Account
Twitter Account
Facebook Account
Lastfm Account
So everything is in one table, to avoid having to query sub-tables. So my question is:
Would there be any improvement if I combined the Website, YouTube, Twitter, Facebook, and Lastfm columns into one?
For example:
[website::something.com][youtube::youtube.com/something]
No, combining these columns would not result in any improvement. Indeed, it seems you would increase the overall length (by adding prefixes and separators), hence potentially worsening performance.
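A quick illustration of the point above, using the question's own combined format. The parse_combined helper is hypothetical; it shows that the combined string is longer than the separate values and that every read now needs an extra parsing step, which would also break if a value ever contained the ']' delimiter.

```python
import re

accounts = {"website": "something.com", "youtube": "youtube.com/something"}

# Combined format from the question: [key::value][key::value]...
combined = "".join(f"[{key}::{value}]" for key, value in accounts.items())


def parse_combined(text):
    """Hypothetical helper: split the combined column back into a dict on every read."""
    return dict(re.findall(r"\[(\w+)::([^\]]*)\]", text))


separate_length = sum(len(v) for v in accounts.values())
print(combined)                        # [website::something.com][youtube::youtube.com/something]
print(len(combined), separate_length)  # 56 34
```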
A few other tricks however, may help:
reduce the size of the values stored in the "xxxAccount" columns by removing altogether, or replacing with shorthand codes, the most common parts of these values (the examples shown indicate some kind of URL, whose beginning will likely be repeated).
depending on the average length of the bio and the typical text found in it, it may also be useful to find ways of shrinking its storage size, with simple replacement of common words or possibly with actual compression (ZIP and such), although doing so may mean storing it in a BLOB column, which may then be stored separately from the rest of the row, depending on the server implementation/configuration.
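A sketch of both tricks in Python. The prefix table and helper names are hypothetical and would need to match the URL formats you actually accept. zlib stands in for "ZIP and such"; MySQL also has built-in COMPRESS() and UNCOMPRESS() functions. Compression has fixed overhead, so very short bios may come out larger than they went in; it is worth measuring on real data first.

```python
import zlib

# Hypothetical prefixes; the column stores only the part after the prefix.
PREFIXES = {
    "twitter": "https://twitter.com/",
    "lastfm": "https://www.last.fm/user/",
}


def shorten(kind, url):
    """Strip a known prefix before saving; keep anything else verbatim."""
    prefix = PREFIXES[kind]
    return url[len(prefix):] if url.startswith(prefix) else url


def expand(kind, stored):
    """Re-attach the prefix when displaying."""
    if stored.startswith(("http://", "https://")):
        return stored  # saved verbatim because it did not match the prefix
    return PREFIXES[kind] + stored


def pack_bio(text):
    """Compress a bio for a binary (BLOB/VARBINARY) column."""
    return zlib.compress(text.encode("utf-8"), 9)


def unpack_bio(blob):
    return zlib.decompress(blob).decode("utf-8")


print(shorten("twitter", "https://twitter.com/someone"))  # someone
long_bio = "I write about databases and web development. " * 10
print(len(long_bio.encode("utf-8")), len(pack_bio(long_bio)))
```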
And, of course, independently of any improvements at the database level, the usage pattern described seems to call for caching this kind of data aggressively, to avoid the trip to SQL altogether.
Well, I don't think so. Think of it this way: you will need some way to split them, which requires additional processing, and then why not just have one field in the whole table and put everything in that? :) Don't worry about performance; it would be better with separate columns.