I've been trying to solve a complex MySQL data structure problem for custom fields in an online app. I'm fairly new to MySQL, so any input is appreciated.
The current database is a relational database and each user of the service will share the same database and tables.
Here is an example of what I'm trying to do.
Let's say I'm trying to create a list. This list can contain up to 30 custom fields. The user can choose between 12 unique elements and each element can have up to 15 user defined attributes.
Each list can be unique within an account as well as between accounts. Accounts can have numerous lists and each list could have different quantities of elements as well as different attributes per element.
An element can be many things, for example: multiple choice, radio button, phone field, address, single line text, multi-line text, etc.
An example of attributes for a multiple choice (checkbox) element could be: red, green, blue, orange, white, black
An example of a single line text element could be: First Name input field.
Each element must also have a user defined title field and tag field which can be referenced and used in other features of the app.
Segmentation is very important as well. A user needs to be able to segment a list based on any element. For example, a user may want to segment list "ABC" based on all records where "red" is present in multiple choice element #1 (they may have more than 1 multiple choice element for a list).
In this example I would assume that arrays, EAV, or Serialized LOB would all work fine. However, I'm not sure which would be the best structure for my needs at my scale.
In reality, there will most likely be up to 50,000 records per list and there is a real possibility of 20,000+ accounts - each with numerous lists. Therefore, I'm looking for the most efficient and flexible structure.
To make matters even more complex, I also need an efficient way to add/delete elements for any particular list at any given time. For example, if a user creates a list with the maximum allowed number of custom fields (30) and then three months later decides they want to delete a field, I need a way to find that list and all associated values for that custom field and then delete all the values, the element type and its attributes. The user would then be allowed to add a new element to this list.
I've reviewed many of the EAV posts on this site, as well as this: http://www.martinfowler.com/eaaCatalog/serializedLOB.html. It doesn't seem that EAV would be very efficient for my needs due to the data retrieval downsides.
I was also wondering how well a multi-dimensional array would work at this scale. I believe WordPress uses this for its custom fields.
Any input would be greatly appreciated as to how best to structure the database for this situation. Thank you!
You can read about how FriendFeed implements custom fields:
http://bret.appspot.com/entry/how-friendfeed-uses-mysql
They use a combination of Serialized LOB, with extra tables containing inverted indexes. You don't need an extra table for every possible attribute in your LOB, only the ones you want to search for with assistance from an index.
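To make the idea concrete, here is a minimal sketch of a serialized LOB plus one inverted-index table. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names (records, index_color), the JSON encoding and the helper functions are all assumptions for illustration, not FriendFeed's actual schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (
        id      INTEGER PRIMARY KEY,
        list_id INTEGER NOT NULL,
        body    TEXT NOT NULL          -- the serialized LOB (JSON here)
    );
    -- One index table per attribute you actually need to search on.
    CREATE TABLE index_color (
        color     TEXT NOT NULL,
        record_id INTEGER NOT NULL REFERENCES records(id),
        PRIMARY KEY (color, record_id)
    );
""")


def save_record(list_id, fields):
    """Store the whole record as JSON and index only its 'color' choices."""
    cur = conn.execute(
        "INSERT INTO records (list_id, body) VALUES (?, ?)",
        (list_id, json.dumps(fields)),
    )
    record_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO index_color (color, record_id) VALUES (?, ?)",
        [(color, record_id) for color in fields.get("color", [])],
    )
    return record_id


def find_by_color(list_id, color):
    """Segment a list: find ids through the index, then decode the blobs."""
    rows = conn.execute(
        "SELECT r.body FROM index_color AS i "
        "JOIN records AS r ON r.id = i.record_id "
        "WHERE i.color = ? AND r.list_id = ? ORDER BY r.id",
        (color, list_id),
    )
    return [json.loads(body) for (body,) in rows]


# Hypothetical sample data for list 1.
save_record(1, {"first_name": "Ann", "color": ["red", "blue"]})
save_record(1, {"first_name": "Bob", "color": ["green"]})
save_record(1, {"first_name": "Cy", "color": ["red"]})
red_names = [r["first_name"] for r in find_by_color(1, "red")]
```

Fields that are never used for segmentation (first_name here) live only in the blob, so adding or removing them needs no schema change. Dropping a searchable field means deleting its index rows as well.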
You can use JSON encoding and decoding (I'm assuming you're using PHP) to store the input specification in a table, with one column for the user and another that stores this data as text. The answers have to be stored in another table (with an FK declared ON DELETE CASCADE).
If you can specify the maximum size of the input specification, use a varchar field.
This may not be the best approach (it needs some profiling tests to make sure it's robust enough), but it can certainly be used.
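A rough sketch of that layout, again using Python's sqlite3 module in place of PHP and MySQL. The table names (custom_fields, answers), the shape of the spec and the sample values are assumptions made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys once this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE custom_fields (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        spec    TEXT NOT NULL           -- JSON-encoded field definition
    );
    CREATE TABLE answers (
        id       INTEGER PRIMARY KEY,
        field_id INTEGER NOT NULL
                 REFERENCES custom_fields(id) ON DELETE CASCADE,
        value    TEXT
    );
""")

# Hypothetical definition of one multiple choice element.
spec = {
    "type": "multiple_choice",
    "title": "Favorite color",
    "tag": "color",
    "options": ["red", "green", "blue"],
}
field_id = conn.execute(
    "INSERT INTO custom_fields (user_id, spec) VALUES (?, ?)",
    (42, json.dumps(spec)),
).lastrowid
conn.executemany(
    "INSERT INTO answers (field_id, value) VALUES (?, ?)",
    [(field_id, "red"), (field_id, "blue")],
)
stored_before = conn.execute("SELECT COUNT(*) FROM answers").fetchone()[0]

# Deleting the field definition also removes every answer that points at it.
conn.execute("DELETE FROM custom_fields WHERE id = ?", (field_id,))
stored_after = conn.execute("SELECT COUNT(*) FROM answers").fetchone()[0]
```

This covers the "user deletes a custom field three months later" case from the question: one DELETE on the definition row cleans up the values too.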
Related
What is the most performance-efficient way to allow end-users to add custom properties to a core table used by an application.
For example, core table FRIENDS has columns ID, FIRST_NAME, LAST_NAME, and BIRTHDAY.
User 1 wants to also track additional properties FAVORITE_COLOR and LUCKY_NUMBER, but User 2 wants to also track different additional properties ZODIAC_SIGN, MARRIAGE_ANNIVERSRY_DATE, and GOLF_HANDICAP.
I have implemented two approaches for testing:
First approach: Add a new table FRIENDS_CUSTOM_PROPERTIES having an FK pointer back to FRIENDS and two columns for value pairs (KEY and VALUE such as FAVORITE_COLOR, YELLOW). This approach potentially requires many queries on FRIENDS_CUSTOM_PROPERTIES to retrieve all the properties for a given friend.
Second approach: Add extension columns right on the FRIENDS table itself of varying data types for CUSTOM_1, CUSTOM_2, ... CUSTOM_64, etc. If a user needed more custom properties than there were columns, my design would "spill over" to approach 1. This approach is more brute force but easily results in many NULL column values on many rows.
I can make both work but am unsure the best approach to determine which is better (or if there is already a clear best practice one way or another).
Thanks.
Approach number one is called entity-attribute-value as Rick James noted in the comments. It can do the job, but you sacrifice lots of useful features of SQL, like data types and constraints. See EAV FAIL for some of my writing on this.
You wrote something about running "many queries" but there's no advantage to doing that. You should plan on fetching the set of custom properties for a user in one query, and saving it to a map object in your client application.
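For example, a single query can pull the properties for several friends at once and group them into a map. This is only a sketch using Python's sqlite3 module; the column names prop_key and prop_value are my own substitution (KEY is a reserved word in MySQL), and the sample rows are made up.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends (
        id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, birthday TEXT
    );
    CREATE TABLE friends_custom_properties (
        friend_id  INTEGER NOT NULL REFERENCES friends(id),
        prop_key   TEXT NOT NULL,
        prop_value TEXT,
        PRIMARY KEY (friend_id, prop_key)
    );
    INSERT INTO friends VALUES (1, 'Ann', 'Smith', '1990-01-15');
    INSERT INTO friends VALUES (2, 'Lee', 'Jones', '1985-07-02');
    INSERT INTO friends_custom_properties VALUES
        (1, 'FAVORITE_COLOR', 'YELLOW'),
        (1, 'LUCKY_NUMBER', '7'),
        (2, 'ZODIAC_SIGN', 'CANCER');
""")


def load_properties(friend_ids):
    """Fetch all custom properties for the given friends in one query."""
    friend_ids = list(friend_ids)
    placeholders = ",".join("?" for _ in friend_ids)
    rows = conn.execute(
        "SELECT friend_id, prop_key, prop_value "
        "FROM friends_custom_properties "
        f"WHERE friend_id IN ({placeholders})",
        friend_ids,
    )
    props = defaultdict(dict)
    for friend_id, key, value in rows:
        props[friend_id][key] = value
    return dict(props)


properties = load_properties([1, 2])
```

Note that every value comes back as text ('7', not 7), which is the loss of data types mentioned above.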
Approach number two is incomplete. You would also need to store some kind of metadata so you know that for user 1, CUSTOM_1 means "Favorite Color" and CUSTOM_2 means "Lucky Number" and so on. Where do you plan to store the meaning of each column per user?
At least with the EAV design, each attribute comes with a key, so you know exactly what it means. And EAV allows for an unlimited number of properties, because each property gets a new row.
Ultimately, any design that allows for "user-defined properties" conflicts with the principles of relational databases. Your columns no longer have any concept of a type. Read a book like SQL and Relational Theory to understand more about this.
I have a web page which feeds users data in HTML tabular format. The table can be quite complex and long (say 40-50 columns in total). Different users will be interested in different columns, so to accommodate, the user has the ability to modify the columns of the table to his own preferences, so that he can:
change their width
change their order
set them hidden
(Each table has a unique table-id, and each column in a table has a unique Id within that table.)
The problem, of course, is that the next time the user uses the website, he won't want to spend time again re-configuring his preferences. So my website needs to store his preferences (which I can do in his database profile).
But what is the best way of storing the required data, so that (using JavaScript) I can re-draw the table for the user, as he last left it?
I can use either XML or JSON as a technical platform for storing the required information.
But the question is, what is the best format for storing each column's width, position and visibility, in such a way that my JavaScript can redraw his screen with minimum processing?
The problem, really, is that sometimes I need to access the column's data by referencing the column ID, and other times by its numeric position. So sometimes an object would be a better option, and sometimes an array.
Interested to know how other people store this kind of data.
--EDIT--
To re-word the question (and abstract away from the real world situation of HTML tables), I suppose what I really want to know is:
How do I best store data in such a way that I can access it by both its ID (primary-key) and Numeric-Index, given that the Index of the ID (primary-key) can change over time.
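One common way to get both kinds of access is to keep the settings in a map keyed by ID and keep a separate list of IDs that records the current order. The sketch below shows the idea in Python (the same shape works as a JavaScript object plus an array); the class name ColumnPrefs and its methods are hypothetical.

```python
import json


class ColumnPrefs:
    """Column settings keyed by id, plus a list holding the display order."""

    def __init__(self, columns=()):
        self._by_id = {}   # column id -> {"width": ..., "hidden": ...}
        self._order = []   # column ids, in display order
        for col in columns:
            self.add(col["id"], col["width"], col.get("hidden", False))

    def add(self, col_id, width, hidden=False):
        self._by_id[col_id] = {"width": width, "hidden": hidden}
        self._order.append(col_id)

    def by_id(self, col_id):
        return self._by_id[col_id]

    def by_index(self, index):
        return self._by_id[self._order[index]]

    def move(self, col_id, new_index):
        """Reordering only touches the order list, never the settings."""
        self._order.remove(col_id)
        self._order.insert(new_index, col_id)

    def to_json(self):
        """Serialize as an ordered array; position is implied by array index."""
        return json.dumps([{"id": c, **self._by_id[c]} for c in self._order])

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text))


prefs = ColumnPrefs()
prefs.add("name", 200)
prefs.add("price", 80)
prefs.add("stock", 60, hidden=True)
prefs.move("stock", 0)                      # user drags "stock" to the front
restored = ColumnPrefs.from_json(prefs.to_json())
```

Storing the ordered array and rebuilding the ID map on load keeps the saved data free of redundant position numbers that could drift out of sync.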
I need to have several forms with drop-down lists using select tags. There are two options I have for representing the selected choice in each list:
Store the choice as a string or integer.
Store all possible choices for a particular list in a separate table, and then use a foreign key from the main table to this table.
For instance, one list will ask the user for the college that he attends. The user can either select one of the choices in the list, or select "Other" and enter a different value in an input box.
Another list will ask how many miles he has driven in the last year. The options would be of the form "0-100 miles", "100-500 miles", "500-1000 miles", and so on. If I use option 1, I could either store the entire string, or a short version of the string, or an integer. In the latter two options, I will manually convert the value to the display value.
I'm leaning towards option 2, but want to avoid having to change everything later. The only issue I've run into with this option is that I have to populate the database with the initial values for each table (I'm using Django and can use fixtures).
Since this is so common, which option do people tend to use? What are the pros and cons?
Definitely option 2.
Normalize everything.
Measure query performance.
Use caching or denormalize only when you notice poor performance results.
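As an illustration of option 2, here is a small sketch with a lookup table for the mileage ranges and a foreign key from the main table. It uses plain sqlite3 rather than Django models, and the table and column names (mileage_ranges, profiles, sort_order) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
    CREATE TABLE mileage_ranges (
        id         INTEGER PRIMARY KEY,
        label      TEXT NOT NULL UNIQUE,
        sort_order INTEGER NOT NULL
    );
    CREATE TABLE profiles (
        id               INTEGER PRIMARY KEY,
        username         TEXT NOT NULL,
        mileage_range_id INTEGER REFERENCES mileage_ranges(id)
    );
    INSERT INTO mileage_ranges (id, label, sort_order) VALUES
        (1, '0-100 miles', 1),
        (2, '100-500 miles', 2),
        (3, '500-1000 miles', 3);
    INSERT INTO profiles (username, mileage_range_id) VALUES ('sam', 2);
""")

# The <select> options come straight from the lookup table.
options = conn.execute(
    "SELECT id, label FROM mileage_ranges ORDER BY sort_order"
).fetchall()

# The display value is resolved with a join instead of manual conversion.
label = conn.execute(
    "SELECT m.label FROM profiles AS p "
    "JOIN mileage_ranges AS m ON m.id = p.mileage_range_id "
    "WHERE p.username = ?",
    ("sam",),
).fetchone()[0]

# The foreign key rejects a choice that is not in the list.
try:
    conn.execute(
        "INSERT INTO profiles (username, mileage_range_id) VALUES ('pat', 99)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The "Other" case for the college list is not covered here; it would need an extra free-text column or a way for users to add rows to the lookup table.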
I'm trying to do it like this:
Every user can define their own fields (like column definitions in MySQL), and each field holds its own value. It's like building a DB inside a DB.
But how can I do it using a single table?
(I'm not talking about user accounts etc., where I should be able to use a pointer to the user's own "structure".)
Should I use something like a varchar key column that stores something like "Name:asd", and have PHP explode on ":" to get the field name ('name' in this case) and its value ('asd')?
Or use a BLOB? Can someone shed some light on this? I can't think of anything that works better than what I've described.
I know my text is confusing, and sorry for any bad English.
EDIT:
Also, they could add multiple keys/"structures", each of which accepts a new value
And they are not able to see the database or tables; they are still normal users
My server does not support PostgreSQL
In my opinion you should create two tables:
One with the user info
One with 3 fields (userid, key and value)
Each user has 1 record in the first table. Each user can have 0 or more records in the second table. This will ensure you can still search the data and that users can easily add more key/value pairs when needed.
Don't start building a database in a database. In this case, since the users define the fields themselves, there is no relation between the fields, as I understand it? In that case it would make sense to take a look at NoSQL databases, since they seem to fit this kind of situation very well.
Another thing to check is something like:
http://www.postgresql.org/docs/8.4/static/hstore.html
Do not try to build tables like: records, fields, field types etc. That's a bad practice and should not be needed.
For a more specific answer on your wishes we need a bit more info about the data the user is storing.
While I think the rational answer to this question is the one given by PeeHaa, if you really want the data to fit into one table you could try saving a serialized PHP array in one of the fields. Check out serialize and unserialize:
Generates a storable representation of a value
This is useful for storing or passing PHP values around without losing
their type and structure.
This method is discouraged as it is not at all scalable.
Use a table with key-value pairs. So three columns:
user id
key ("name")
value ("asd")
Add an index on user id, so that you can query a user's attributes easily. If you wanted to query all users with the same properties, then you could add a second index on key and/or value.
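A minimal sketch of that table and its indexes, using Python's sqlite3 module as a stand-in for MySQL. The names user_attributes, attr_key and attr_value are assumptions, and so is the primary key, which allows only one value per key per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_attributes (
        user_id    INTEGER NOT NULL,
        attr_key   TEXT NOT NULL,
        attr_value TEXT,
        -- The primary key starts with user_id, so it doubles as the
        -- index for "all attributes of one user".
        PRIMARY KEY (user_id, attr_key)
    );
    -- Second index for "all users with this key/value".
    CREATE INDEX idx_attr_key_value
        ON user_attributes (attr_key, attr_value);
    INSERT INTO user_attributes VALUES
        (1, 'name', 'asd'), (1, 'city', 'Lisbon'),
        (2, 'name', 'qwe'), (3, 'name', 'asd');
""")


def attributes_for(user_id):
    """All key/value pairs of one user, as a dict."""
    return dict(conn.execute(
        "SELECT attr_key, attr_value FROM user_attributes WHERE user_id = ?",
        (user_id,),
    ))


def users_with(key, value):
    """Ids of every user that has the given key/value pair."""
    rows = conn.execute(
        "SELECT user_id FROM user_attributes "
        "WHERE attr_key = ? AND attr_value = ? ORDER BY user_id",
        (key, value),
    )
    return [user_id for (user_id,) in rows]


user_1 = attributes_for(1)
same_name = users_with("name", "asd")
```

If a user should be able to store several values under the same key, the primary key would have to change (for example to a surrogate id), with a plain index on user_id instead.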
I assume you are also using a programming language to fetch and present the data.
You can have a single table with a varchar field. You then store the serialized field structure and values in that field. When you want to get the structure back, query the row and deserialize the varchar field's data.
As far as I know, every programming language supports serialization and deserialization.
Edit: This is not a scalable option.
I am working on a large web application project and the previous designer favored the use of ids as handles to form fields over name attributes.
I suppose one advantage of this is that the lookup of that field via JavaScript is faster through ids.
A big problem I'm now running into, however, is that ids have global scope. I want to refactor a large set of database column names to a more standard naming scheme, which doesn't include any column name prefix to identify which table the column belongs to. This is going to cause problems in those forms that use ids, since the field ids correspond directly to the column names. Column names which were things like "zon_name" and "pro_name" are now going to both be just "name". This will cause non-unique ids in the html.
So, after that long preamble, here's my question...
Before I try to address this scoping issue by changing all the forms to use name attributes instead of ids, are there any other reasons I'm not considering that the original developer may have had for using ids besides the speediness of their lookup?
I know this is a long one so I appreciate anyone who is brave enough to read through and give a good answer. Thanks!
Name and id do different things and, while there is some overlap, they are not interchangeable for the most important things they do.
Use name
To determine what key will be given to the data when the form is submitted to the server
To create radio groups
From JS/CSS when you need to reference multiple form controls at once (and when adding a class or using the element type is not more appropriate)
Use id
In the for attribute of the control's <label>
From JS/CSS when you need to reference a specific input
I suppose one advantage of this is that the lookup of that field via JavaScript is faster through ids.
Not significantly (especially when the name is a unique one).
It sounds like the original designer hasn't been following standard conventions and has come up with something highly JavaScript dependent.
If you're using forms, you should be using <label for="aFormElement"> along with your form elements.
The for attribute on label matches up with an id attribute, not a name attribute.
So, you really need both id (for the label, amongst other things) and name for server-side code.
To find your elements quickly, you can set an id on the form only.
Then read the fields by name, like this:
var form = document.getElementById('theForm');
// Form controls are exposed by their name attribute on form.elements
var productName = form.elements.productName.value;