I need to have several forms with drop-down lists using select tags. There are two options I have for representing the selected choice in each list:
1. Store the choice as a string or integer.
2. Store all possible choices for a particular list in a separate table, and then use a foreign key from the main table to this table.
For instance, one list will ask the user for the college that he attends. The user can either select one of the choices in the list, or select "Other" and enter a different value in an input box.
Another list will ask how many miles he has driven in the last year. The options would be of the form "0-100 miles", "100-500 miles", "500-1000 miles", and so on. If I use option 1, I could store the entire string, a short version of the string, or an integer. In the latter two cases, I would manually convert the stored value to the display value.
I'm leaning towards option 2, but want to avoid having to change everything later. The only issue I've run into with this option is that I have to populate the database with the initial values for each table (I'm using Django and can use fixtures).
Since this is so common, which option do people tend to use? What are the pros and cons?
Definitely option 2.
Normalize everything.
Measure query performance.
Use caching or denormalize only when you notice poor performance results.
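As a rough illustration of option 2, here is a minimal sketch using Python's built-in sqlite3 module rather than Django models. The table names (`mileage_range`, `driver`), the sample driver, and the helper functions are assumptions made up for this example; in a Django project the seed rows would come from a fixture instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE mileage_range (
        id    INTEGER PRIMARY KEY,
        label TEXT NOT NULL UNIQUE          -- display value shown in the <select>
    );
    CREATE TABLE driver (
        id               INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        mileage_range_id INTEGER NOT NULL REFERENCES mileage_range(id)
    );
""")

# Initial choices: the role a fixture would play in Django.
conn.executemany(
    "INSERT INTO mileage_range (id, label) VALUES (?, ?)",
    [(1, "0-100 miles"), (2, "100-500 miles"), (3, "500-1000 miles")],
)
# Hypothetical user who picked the second choice.
conn.execute("INSERT INTO driver (name, mileage_range_id) VALUES (?, ?)", ("Ann", 2))


def choices(conn):
    """Rows used to build the <option> elements of the drop-down."""
    return conn.execute("SELECT id, label FROM mileage_range ORDER BY id").fetchall()


def display_value(conn, driver_name):
    """Resolve the stored foreign key back to its display label."""
    row = conn.execute(
        """SELECT r.label
           FROM driver d JOIN mileage_range r ON r.id = d.mileage_range_id
           WHERE d.name = ?""",
        (driver_name,),
    ).fetchone()
    return row[0] if row else None
```

With this layout the display strings live in one place, and the foreign key rejects a choice id that does not exist in the list.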
I want to use a relational database (MySQL) to store my data as key-value pairs.
The number of key-value pairs is determined dynamically.
I can create a simple table to store them in separate columns.
Values can be of type int, varchar, text, or date.
The problem which I am facing is:
I need to run queries on keys whose values are integers, using greater-than or less-than comparisons. The same applies when I need to use BETWEEN queries on date fields.
How can I achieve it?
------------------------------------------------Edit---------------------------------------------------
For greater clarity, I am providing the background for this question which I have divided into three parts:
1. Data 2. Use Case 3. Possible Designs
1. Data
Suppose I'm creating a data store for the census of a country (just an example). The fields to store would differ for a man, woman, boy, or girl, and would also vary according to the person's profession. The number of fields depends on the requirements and can grow to 500 or more.
2. Use Case
Show a paginated list of persons whose monthly income is between $7000 and $10000. The user can click on any page number and the database should directly fetch the data for that page. For example, if we are showing 10 results per page and the user clicks on the 5th page, then we should show persons 41 to 50.
Some of the values belonging to a particular group store description which can have large data. So they should be stored as TEXT.
3. Possible Designs
I can create a separate table for each different type and store their data in respective fields. The problem I see with this approach is that a MySQL table has a maximum row size limit of 65,535 bytes. Storing all the data horizontally might exceed that limit, since the number of fields is not fixed and can change as requirements change.
Instead of storing data horizontally I can store them vertically using Entity Attribute Value design(key-value pair). For now, the increase in the number of rows due to this design is not a problem. Using this I can store data of all male, female or child in the same table. But the problem with this approach is:
I will lose the data type of certain important fields. I cannot query for the list of persons whose income is more than 1000.
To store the data of all fields in a single value column, I need to make it VARCHAR. But some fields store large data, which requires TEXT as the type.
Considering the above problem, I thought that instead of creating only one value field, I will create multiple value fields like value_int, value_varchar, value_date or value_text.
DB structure
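As a rough sketch of the typed-value-column idea described above, the following uses Python's sqlite3 module as a stand-in for MySQL. The table and column names (`person_attribute`, `attr_key`, and so on), the index, and the generated sample incomes are assumptions for illustration, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE person_attribute (
        person_id     INTEGER NOT NULL REFERENCES person(id),
        attr_key      TEXT NOT NULL,
        value_int     INTEGER,   -- exactly one value_* column is filled per row
        value_varchar TEXT,
        value_date    TEXT,      -- ISO-8601 string here; a DATE column in MySQL
        value_text    TEXT,
        PRIMARY KEY (person_id, attr_key)
    );
    -- Lets range queries on integer attributes use an index.
    CREATE INDEX idx_attr_int ON person_attribute (attr_key, value_int);
""")

# Made-up data: 60 persons with incomes 6100, 6200, ..., 12000.
people = [(i, f"person{i}") for i in range(1, 61)]
conn.executemany("INSERT INTO person VALUES (?, ?)", people)
conn.executemany(
    "INSERT INTO person_attribute (person_id, attr_key, value_int) "
    "VALUES (?, 'monthly_income', ?)",
    [(i, 6000 + 100 * i) for i, _ in people],
)


def income_page(conn, low, high, page, per_page=10):
    """Return one page of (name, income) rows with income in [low, high]."""
    offset = (page - 1) * per_page
    return conn.execute(
        """SELECT p.name, a.value_int
           FROM person_attribute a JOIN person p ON p.id = a.person_id
           WHERE a.attr_key = 'monthly_income'
             AND a.value_int BETWEEN ? AND ?
           ORDER BY a.value_int, p.id
           LIMIT ? OFFSET ?""",
        (low, high, per_page, offset),
    ).fetchall()
```

Because the income sits in an integer column, the BETWEEN comparison is numeric rather than string-based, and LIMIT/OFFSET covers the "jump to page N" use case. Each additional filtered attribute would still need another join against `person_attribute`.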
For this problem, I will be using MySQL and cannot change the DB due to certain restrictions. So I am looking for a design with MySQL only.
Is the key-value approach a good idea, or is there another design I could use?
In very general terms, if you know the entities and attributes of your problem domain, and the data is relational, I'd use a relational schema (your "possible design 1"). If you actually encounter problems with maximum row width, your problem domain might contain logical subgroupings of attributes, so you can split them into separate tables.
For instance:
Person (id, name, ...)
Person_demographics (person_id, age, location, ...)
Person_finance (person_id, income, wealth...)
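A minimal sketch of that split, again using Python's sqlite3 module in place of MySQL. The three tables follow the outline above; the sample rows and the `find_people` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE person_demographics (
        person_id INTEGER PRIMARY KEY REFERENCES person(id),
        age       INTEGER,
        location  TEXT
    );
    CREATE TABLE person_finance (
        person_id INTEGER PRIMARY KEY REFERENCES person(id),
        income    INTEGER,
        wealth    INTEGER
    );
""")

# Made-up sample rows.
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cid")])
conn.executemany("INSERT INTO person_demographics VALUES (?, ?, ?)",
                 [(1, 30, "Pune"), (2, 30, "Delhi"), (3, 45, "Pune")])
conn.executemany("INSERT INTO person_finance VALUES (?, ?, ?)",
                 [(1, 8000, 50000), (2, 12000, 90000), (3, 9000, 70000)])


def find_people(conn, low, high, age, locations):
    """Names of persons with income in [low, high], the given age, and a listed location."""
    placeholders = ", ".join("?" for _ in locations)
    sql = f"""SELECT p.name
              FROM person p
              JOIN person_demographics d ON d.person_id = p.id
              JOIN person_finance f      ON f.person_id = p.id
              WHERE f.income BETWEEN ? AND ?
                AND d.age = ?
                AND d.location IN ({placeholders})
              ORDER BY p.id"""
    return [name for (name,) in conn.execute(sql, (low, high, age, *locations))]
```

Each subgroup table shares the person's primary key, so a multi-condition query needs one join per subgroup instead of one join per attribute.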
If you don't know the entities and attributes in advance, I recommend using MySQL's JSON support or XML support. This gives you access to much better query options than EAV.
The problem with EAV-like solutions in your scenario is that any non-trivial queries end up being incredibly complicated - "find all responses where salary is between x and y, and the age is z, in locations (a, b, c)" turns into a horrible mess of SQL, but with XPath this is pretty straightforward.
I am trying to set up a few checklists, which users can save & go back to. I haven't set up user profiles yet.
How should I set up the MySQL database? So far I have one database (e.g. lists_db), and am creating a new table for each separate list. Is this the right way to do it?
Also, what fields shall I have? ID presumably, and then what? How does MySQL read checkboxes?
Thanks in advance :-)
Are you storing the questions and answers or just the answers? If you just had one list, you could create a table for the questions (columns question_id, question_text) and a table for the answers (columns question_id, user_id, checked).
You could create a new table for every list, but this might be cumbersome. With two separate tables, it's possible to add a column to the questions table (question_group_id) and store every list in the same pair of tables.
Regarding how MySQL reads checkboxes, databases generally don't store information specific to a UI component. In this case, the underlying data element is a boolean indicating whether it is checked or not - or in MySQL, a bit datatype.
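The two-table layout described above can be sketched as follows, using Python's sqlite3 module as a stand-in for MySQL. The sample questions and the helpers `save_checkbox` and `load_checklist` are hypothetical, and `INSERT OR REPLACE` is SQLite syntax (MySQL would use `REPLACE` or `INSERT ... ON DUPLICATE KEY UPDATE`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (
        question_id       INTEGER PRIMARY KEY,
        question_group_id INTEGER NOT NULL,   -- identifies the checklist
        question_text     TEXT NOT NULL
    );
    CREATE TABLE answer (
        question_id INTEGER NOT NULL REFERENCES question(question_id),
        user_id     INTEGER NOT NULL,
        checked     INTEGER NOT NULL DEFAULT 0 CHECK (checked IN (0, 1)),
        PRIMARY KEY (question_id, user_id)
    );
""")

# Made-up items: checklist 1 has two entries, checklist 2 has one.
conn.executemany("INSERT INTO question VALUES (?, ?, ?)",
                 [(1, 1, "Pack passport"), (2, 1, "Book taxi"), (3, 2, "Buy milk")])


def save_checkbox(conn, user_id, question_id, checked):
    """Store the state of one checkbox as 0 or 1, overwriting any earlier state."""
    conn.execute(
        "INSERT OR REPLACE INTO answer (question_id, user_id, checked) VALUES (?, ?, ?)",
        (question_id, user_id, int(bool(checked))),
    )


def load_checklist(conn, user_id, group_id):
    """Return (question_text, checked) for one list; unanswered items count as unchecked."""
    return conn.execute(
        """SELECT q.question_text, COALESCE(a.checked, 0)
           FROM question q
           LEFT JOIN answer a ON a.question_id = q.question_id AND a.user_id = ?
           WHERE q.question_group_id = ?
           ORDER BY q.question_id""",
        (user_id, group_id),
    ).fetchall()


save_checkbox(conn, 7, 1, True)   # hypothetical user 7 ticks the first item
```

When the form is rendered again, the application reads the stored 0/1 values back and sets the `checked` attribute of each checkbox itself.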
I suspect you may be conflating persistence with user interface. MySQL has no notion of a 'check box'. Rather, a database is a representer of facts: It will remember the information you ask it to remember, but it makes no attempt at organizing that information in a way that's useful to a particular application.
As for remembering whether a box has been checked, you most likely want BOOLEAN (a synonym for TINYINT(1)). You would represent an unchecked box with a value of zero, and a checked box with a non-zero value.
I've been trying to solve a complex MySQL data structure problem for custom fields in an online app. I'm fairly new to MySQL, so any input is appreciated.
The current database is a relational database and each user of the service will share the same database and tables.
Here is an example of what I'm trying to do.
Let's say I'm trying to create a list. This list can contain up to 30 custom fields. The user can choose between 12 unique elements and each element can have up to 15 user defined attributes.
Each list can be unique within an account as well as between accounts. Accounts can have numerous lists and each list could have different quantities of elements as well as different attributes per element.
An element can be many things, for example: multiple choice, radio button, phone field, address, single line text, multi-line text, etc.
An example of attributes for a multiple choice (checkbox) element could be: red, green, blue, orange, white, black
An example of a single line text element could be: First Name input field.
Each element must also have a user defined title field and tag field which can be referenced and used in other features of the app.
Segmentation is very important as well. A user needs to be able to segment a list based on any element. For example, a user may want to segment list "ABC" based on all records where "red" is present in multiple choice element #1 (they may have more than 1 multiple choice element for a list).
In this example I would assume that arrays, EAV, Serialized LOB would work fine. However, I'm not sure what would be the best structure for my needs at my scale.
In reality, there will most likely be up to 50,000 records per list and there is a real possibility of 20,000+ accounts - each with numerous lists. Therefore, I'm looking for the most efficient and flexible structure.
To make matters even more complex, I also need an efficient way to add or delete elements on any particular list at any given time. For example, if a user creates a list with the maximum allowed number of custom fields (30) and then three months later decides they want to delete a field, I need a way to find that list and all associated values for that custom field and then delete all the values, the element type, and its attributes. The user would then be allowed to add a new element to this list.
I've reviewed many of the EAV posts on this site, as well as http://www.martinfowler.com/eaaCatalog/serializedLOB.html (Serialized LOB). It doesn't seem that EAV would be very efficient for my needs due to the data retrieval downsides.
I was also wondering how well a multi-dimensional array would work at this scale? I believe WordPress uses this for their custom fields.
Any input would be greatly appreciated as to how best to structure the database for this situation. Thank you!
You can read about how FriendFeed implements custom fields:
http://bret.appspot.com/entry/how-friendfeed-uses-mysql
They use a combination of Serialized LOB, with extra tables containing inverted indexes. You don't need an extra table for every possible attribute in your LOB, only the ones you want to search for with assistance from an index.
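The general pattern (a serialized body plus a separate index table for each searchable attribute) might look roughly like the sketch below. It uses Python's sqlite3 and json modules and is not FriendFeed's actual schema; the `record` and `index_color` tables, the `colors` field, and both helper functions are assumptions for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (
        id      INTEGER PRIMARY KEY,
        list_id INTEGER NOT NULL,
        body    TEXT NOT NULL            -- all custom fields, serialized as JSON
    );
    -- Inverted index for the one attribute we want to segment on.
    CREATE TABLE index_color (
        color     TEXT NOT NULL,
        record_id INTEGER NOT NULL REFERENCES record(id),
        PRIMARY KEY (color, record_id)
    );
""")


def save_record(conn, list_id, fields):
    """Store the serialized fields and index every selected color."""
    cur = conn.execute(
        "INSERT INTO record (list_id, body) VALUES (?, ?)",
        (list_id, json.dumps(fields)),
    )
    record_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO index_color (color, record_id) VALUES (?, ?)",
        [(color, record_id) for color in set(fields.get("colors", []))],
    )
    return record_id


def records_with_color(conn, list_id, color):
    """Segment a list: find records through the index, then decode their bodies."""
    rows = conn.execute(
        """SELECT r.body
           FROM index_color i JOIN record r ON r.id = i.record_id
           WHERE i.color = ? AND r.list_id = ?
           ORDER BY r.id""",
        (color, list_id),
    )
    return [json.loads(body) for (body,) in rows]


save_record(conn, 1, {"first_name": "Ann", "colors": ["red", "blue"]})
save_record(conn, 1, {"first_name": "Bob", "colors": ["green"]})
```

Deleting a custom field then means dropping its index rows and removing the key from each serialized body; fields that are never searched need no index table at all.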
You can use JSON encoding and decoding (I'm assuming you're using PHP) to store the input specification in a table, with one column for the user and another that holds this data as text. The answers have to be stored in another table (with a foreign key that uses ON DELETE CASCADE).
If you can specify the max size of the input specification, use a varchar field.
This may not be the best approach (it needs some profiling to make sure it's robust enough), but it can certainly be used.
I'm writing a CMS for various forms and such, and I find I'm creating a lot of drop-downs. I don't really feel like mucking up my database with tons of random key/string value tables for simple drop-downs with 2-4 options that change very infrequently. What do you do to manage this in a responsible way?
This is language-agnostic, but I'm working in Rails, if anyone has specific advice.
We put everything into a single LookUp table in the database, with a column that mapped to an enum that described which lookup it was for (title, country, etc.).
This enabled us to add the flexibility of an "Other, please specify" option in lookup dropdowns. We made a control that encapsulated this, with a property to turn this behaviour on or off on a case-by-case basis.
If the end user picked "Other, please specify", a textbox would appear for them to enter their own value. This would be added to the lookup table, but flagged as an ad hoc item.
The table contained a flag denoting the status of each lookup value: Active, Inactive, AdHoc. Only Active ones would appear in the dropdown; AdHoc ones were those created via the "Other, please specify" option.
An admin page showed the frequency of usage of the AdHoc values, allowing the administrators of the site to promote common popular values into general usage (i.e. changing their Status flag to Active).
This may well be overkill for your app, but it worked really well for ours: the app was basically almost entirely CRUD operations on very business-specific data. We had dozens of lookups throughout the site that the customer wanted to be able to maintain themselves. This gave them total flexibility with no intervention from us.
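A minimal sketch of such a lookup table, using Python's sqlite3 module. The column names, the `profile` table used to count usage, and the helper functions are assumptions for illustration; the original used an enum-backed column where this sketch uses a plain text `lookup_type`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lookup (
        id          INTEGER PRIMARY KEY,
        lookup_type TEXT NOT NULL,            -- which drop-down: 'title', 'country', ...
        value       TEXT NOT NULL,
        status      TEXT NOT NULL DEFAULT 'Active'
                    CHECK (status IN ('Active', 'Inactive', 'AdHoc')),
        UNIQUE (lookup_type, value)
    );
    CREATE TABLE profile (                    -- hypothetical table that uses a lookup
        id       INTEGER PRIMARY KEY,
        title_id INTEGER REFERENCES lookup(id)
    );
""")


def dropdown_options(conn, lookup_type):
    """Only Active values are offered in the drop-down."""
    rows = conn.execute(
        "SELECT value FROM lookup WHERE lookup_type = ? AND status = 'Active' "
        "ORDER BY value",
        (lookup_type,),
    )
    return [value for (value,) in rows]


def add_other(conn, lookup_type, value):
    """Handle 'Other, please specify': reuse an existing row or add an AdHoc one."""
    row = conn.execute(
        "SELECT id FROM lookup WHERE lookup_type = ? AND value = ?",
        (lookup_type, value),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO lookup (lookup_type, value, status) VALUES (?, ?, 'AdHoc')",
        (lookup_type, value),
    )
    return cur.lastrowid


def adhoc_title_usage(conn):
    """Admin view: how often each ad hoc title is used, most frequent first."""
    return conn.execute(
        """SELECT l.value, COUNT(p.id)
           FROM lookup l LEFT JOIN profile p ON p.title_id = l.id
           WHERE l.lookup_type = 'title' AND l.status = 'AdHoc'
           GROUP BY l.id
           ORDER BY COUNT(p.id) DESC, l.value"""
    ).fetchall()


def promote(conn, lookup_id):
    """Promote an ad hoc value into general usage."""
    conn.execute("UPDATE lookup SET status = 'Active' WHERE id = ?", (lookup_id,))


conn.executemany("INSERT INTO lookup (lookup_type, value) VALUES ('title', ?)",
                 [("Mr",), ("Ms",)])
dr_id = add_other(conn, "title", "Dr")        # a user typed "Dr" into the text box
conn.executemany("INSERT INTO profile (title_id) VALUES (?)", [(dr_id,), (dr_id,)])

before = dropdown_options(conn, "title")      # ad hoc value is not offered yet
usage = adhoc_title_usage(conn)
promote(conn, dr_id)
after = dropdown_options(conn, "title")       # now it is
```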
You could have one single dropdown table with an extra column to say what the drop-down is for, and limit the results with a WHERE clause.
At my current position, we implemented a LookupCode table that contains CodeGroup, Code, and Meaning columns, as well as some others (like active). That way all of your lookup values are in a single table, and you can do quick lookups to bind to your dropdown lists.
I need to save a list of user ids who viewed a page, streamed a song and / or downloaded it. What I do with the list is add to it and show it. I don't really need to save more info than that, and I came up with two solutions. Which one is better, or is there an even better solution I missed:
1. The KISS solution - 1 table with the song id as the primary key and a text field for each of the three interactions above (view, download, stream), each holding a comma-separated list of user ids. Adding to it will be just a concatenation operation.
2. The "best practice" solution - Have 3 tables with the song id as the primary key and a field for the id of the user that did the interaction. Each row has one user id, and I could add things like the date and other data.
One thing that makes me lean towards option 2 is that it may be easier to check whether the user has already voted on a song?
tl;dr version - Is it better to use a text field to save arrays as comma separated values, or have each item in the array in a separate table row.
Definitely the 2nd:
You'll be able to scale your application as it grows
It will be less programming language dependent
You'll be able to make queries faster and cleaner
It will be less painful for any other programmer coding / debugging your application later
Additionally, I'd add a new table called "operations" with their ID, so you can add different operations if you need later, storing the operation ID instead of a string on each row ("view", "download", "stream").
It's definitely better to have each item in a separate row. Manipulating text fields has performance disadvantages by itself. But if ever you want to find out which songs user 1234 has viewed/listened to/etc., you'd have to do something like
SELECT * FROM songactions WHERE userlist LIKE '%,1234,%' OR userlist LIKE '1234,%' OR userlist LIKE '%,1234' OR userlist='1234';
It'd be just horribly, horribly painful.
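For comparison, here is a minimal sketch of the one-row-per-item layout using Python's sqlite3 module. It is a variation on the answers above: a single `song_action` table with an `action` column instead of three tables or an operations table. The table name, index, helper functions, and sample ids are assumptions, and `INSERT OR IGNORE` is SQLite syntax (MySQL has `INSERT IGNORE`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE song_action (
        song_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        action  TEXT NOT NULL CHECK (action IN ('view', 'stream', 'download')),
        PRIMARY KEY (song_id, user_id, action)
    );
    -- Supports "what has this user done?" lookups.
    CREATE INDEX idx_song_action_user ON song_action (user_id, action);
""")


def record_action(conn, song_id, user_id, action):
    """Add one interaction; repeating the same one is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO song_action (song_id, user_id, action) VALUES (?, ?, ?)",
        (song_id, user_id, action),
    )


def users_for(conn, song_id, action):
    """The list to display for a song."""
    rows = conn.execute(
        "SELECT user_id FROM song_action WHERE song_id = ? AND action = ? "
        "ORDER BY user_id",
        (song_id, action),
    )
    return [user_id for (user_id,) in rows]


def songs_for(conn, user_id, action):
    """The reverse lookup that required four LIKE patterns with comma-separated lists."""
    rows = conn.execute(
        "SELECT song_id FROM song_action WHERE user_id = ? AND action = ? "
        "ORDER BY song_id",
        (user_id, action),
    )
    return [song_id for (song_id,) in rows]


record_action(conn, 10, 1234, "view")
record_action(conn, 10, 1234, "view")     # duplicate, ignored
record_action(conn, 11, 1234, "view")
record_action(conn, 10, 99, "stream")
```

The composite primary key also answers the "has this user already done this?" question with a single indexed lookup.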