Storing attributes with multiple integer values - mysql

I need to store a dynamic number of integer attributes (1-8). I'm storing them in individual columns in the database table, like:
attribute_1, attribute_2, ..., attribute_8
This makes for a fairly messy model when methods need to reference these, as well as an unwieldy database table and schema.
These are assigned default values (but are overridable on a form), and represent unique identifiers for the user.
For instance, a Brew is composed of up to eight batches before they are mixed together in a fermenter. The brewer might want to go back and refer to any one of these by its unique identifying number. I'm assigning these unique values based on the last highest value when creating a new Brew. However, the user may want to override these default values to some other numbers.
In most cases (smaller breweries), they'll probably only use the first two, but some larger breweries would use all eight.
There must be a better way to store these than having eight different attributes with the same name and a number at the end.
I'm using MySQL. Is there an easy/concise way to store an array or a JSON hash but still be able to edit these values on a form?

I would not store attributes like that. It will limit you in the future. Let's say you want to know which brews have used attribute_4. You will have to scan the entire brews table, open the attributes field, and deconstruct it to see if 4 is in there.
It is much better to separate Brew and Attributes into two tables and link them with a foreign key.
Another benefit is that you can add attributes easily.
Storing JSON is OK, as @max pointed out. I am just proposing the normalized database way of doing it.
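The two-table layout described in this answer could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table names (brews, batches), the column names, and the sample data are assumptions, not part of the original question.

```python
import sqlite3

# SQLite stands in for MySQL; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE brews (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE batches (
        id           INTEGER PRIMARY KEY,
        brew_id      INTEGER NOT NULL REFERENCES brews(id),
        batch_number INTEGER NOT NULL UNIQUE  -- assumed unique across all brews
    );
""")

conn.execute("INSERT INTO brews (id, name) VALUES (1, 'Pale Ale'), (2, 'Stout')")
conn.executemany(
    "INSERT INTO batches (brew_id, batch_number) VALUES (?, ?)",
    [(1, 101), (1, 102), (2, 103)],
)

def brews_using(batch_number):
    """Return the names of brews that contain the given batch number."""
    rows = conn.execute(
        "SELECT b.name FROM brews b "
        "JOIN batches t ON t.brew_id = b.id "
        "WHERE t.batch_number = ?",
        (batch_number,),
    ).fetchall()
    return [name for (name,) in rows]

def next_batch_number():
    """Default for a new batch: one more than the highest number so far."""
    (highest,) = conn.execute(
        "SELECT COALESCE(MAX(batch_number), 0) FROM batches"
    ).fetchone()
    return highest + 1
```

A brew with two batches simply has two rows in batches and one with eight has eight, so the form can render one input per row and prefill new inputs from next_batch_number().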

Related

Does storing int value as varchar in mysql affect performance heavily?

I'm working on a website that should be multilingual, and some products may have more fields than others (for example, a future product may have an extra feature that older products lack). Because of this, I decided to have a product table with the common fields that all products share and that are the same in every language (such as width and height), and to add three more tables for storing the extra fields, as below:
field (id,name)
field_name(field_id,lang_id,name)
field_value(product_id, field_id, lang_id, value)
By doing this I can fetch all the values from one table, but the problem is that the values can be of different types: a value could be a number or text, for example. I looked at the open-source project Drupal, which creates a table for each field type and retrieves a node's data with joins. I want to know which approach hurts performance more: having a table for each extra field, or storing all of the values in one table and converting their types on the fly by casting?
thank you in advance
Yes, but no. You are storing your data in an entity-attribute-value form (EAV). This is rather inefficient in general. Here are some issues:
As you have written it, you cannot do type checking.
You cannot set-up foreign key relationships in the database.
Fetching the results for a single row requires multiple joins or a group by.
You cannot write indexes on a specific column to speed access.
There are some work-arounds. You can get around the typing issue by having separate columns for different types. So, the data structure would have:
Name
Type
ValueString
ValueInt
ValueDecimal
Or whatever types you want to support.
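A minimal sketch of an EAV table with one value column per type, again using sqlite3 in place of MySQL. The table name field_value comes from the question; the snake_case column names, the CHECK constraint, and the helper functions are assumptions made for illustration. In MySQL the decimal column would more likely be DECIMAL than REAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE field_value (
        product_id    INTEGER NOT NULL,
        name          TEXT    NOT NULL,
        type          TEXT    NOT NULL
                      CHECK (type IN ('string', 'int', 'decimal')),
        value_string  TEXT,
        value_int     INTEGER,
        value_decimal REAL,
        PRIMARY KEY (product_id, name)
    )
""")

# Maps the declared type to the column that holds the value.
COLUMN_FOR_TYPE = {
    "string": "value_string",
    "int": "value_int",
    "decimal": "value_decimal",
}

def set_value(product_id, name, value_type, value):
    """Store a value in the column matching its declared type."""
    column = COLUMN_FOR_TYPE[value_type]  # KeyError for unsupported types
    conn.execute(
        f"INSERT INTO field_value (product_id, name, type, {column}) "
        "VALUES (?, ?, ?, ?)",
        (product_id, name, value_type, value),
    )

def get_value(product_id, name):
    """Return the stored value, or None if the field is not set."""
    row = conn.execute(
        "SELECT type, value_string, value_int, value_decimal "
        "FROM field_value WHERE product_id = ? AND name = ?",
        (product_id, name),
    ).fetchone()
    if row is None:
        return None
    value_type, as_string, as_int, as_decimal = row
    return {"string": as_string, "int": as_int, "decimal": as_decimal}[value_type]

set_value(1, "color", "string", "red")
set_value(1, "stock", "int", 7)
set_value(1, "weight", "decimal", 2.5)
```

Because numbers live in numeric columns, a condition such as value_int > 5 compares numerically without any casting.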
There are some other "tricks" if you want to go this route. The most important is to decimal align the numbers. So, instead of storing '1' and '10', you would store ' 1' and '10'. This makes the value more amenable to ordering.
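The effect of aligning numbers stored as strings can be seen in a few lines of Python. The sample values are made up; the point is only that string comparison works character by character, so padding to a common width restores numeric order.

```python
values = [1, 10, 2]

# Stored as plain strings, '10' sorts before '2'.
plain = sorted(str(v) for v in values)

# Right-align every value to the width of the longest one.
width = max(len(str(v)) for v in values)
padded = sorted(str(v).rjust(width) for v in values)

print(plain)   # ['1', '10', '2']
print(padded)  # [' 1', ' 2', '10']
```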
When faced with such a problem, I often advocate a hybrid approach. This approach would have a fixed record with the important properties all nicely located in columns with appropriate types and indexes -- columns such as:
ProductReleaseDate
ProductDescription
ProductCode
And whatever values are most useful. An EAV table can then be used for additional properties that are optional. This generally balances the power of the relational database to handle structured data along with the flexibility of an EAV approach to support variable columns.
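One way the hybrid approach might be laid out, sketched with sqlite3 in place of MySQL. The fixed columns follow the names given in the answer; the product_extra table, its columns, and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Fixed, typed, indexable columns for the important properties.
    CREATE TABLE product (
        id                   INTEGER PRIMARY KEY,
        product_code         TEXT NOT NULL UNIQUE,
        product_description  TEXT,
        product_release_date TEXT
    );
    CREATE INDEX idx_product_release ON product (product_release_date);

    -- EAV side table for optional properties only.
    CREATE TABLE product_extra (
        product_id INTEGER NOT NULL REFERENCES product(id),
        name       TEXT    NOT NULL,
        value      TEXT,
        PRIMARY KEY (product_id, name)
    );
""")

conn.execute("INSERT INTO product VALUES (1, 'P-100', 'Desk lamp', '2020-01-15')")
conn.executemany(
    "INSERT INTO product_extra VALUES (?, ?, ?)",
    [(1, "color", "black"), (1, "voltage", "230")],
)

def load_product(product_id):
    """Return the fixed columns plus a dict of optional extras."""
    row = conn.execute(
        "SELECT product_code, product_description, product_release_date "
        "FROM product WHERE id = ?",
        (product_id,),
    ).fetchone()
    if row is None:
        return None
    product = dict(zip(("code", "description", "release_date"), row))
    extras = conn.execute(
        "SELECT name, value FROM product_extra WHERE product_id = ?",
        (product_id,),
    ).fetchall()
    product["extras"] = dict(extras)
    return product
```

Queries that filter or sort on code or release date touch only the product table; the extras are fetched by product ID when a full record is needed.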

Set Data Type in mySQL

My knowledge of relational databases is more limited, but is there a SQL command that can be used to create a column that contains a set in each row?
I am trying to create a table with 2 columns. 1 for specific IDs and a 2nd for sets that correspond to these IDs.
I read about
http://dev.mysql.com/doc/refman/5.1/en/set.html
However, the SET data type requires that you know in advance what items may be in your set. I just want a variable-length list of items that don't repeat.
It would be much better to create that list of items as multiple rows in a second table. Then you could have as many items in the list as you want, and you could sort them, search for a specific item, make sure they're unique, etc.
See also my answer to Is storing a delimited list in a database column really that bad?
No, there's no MySQL data type for arbitrary sets. You can use a string containing a comma-delimited list; there are functions like FIND_IN_SET() that will operate on such values.
But this is poor database design. If you have an open-ended list, you should store it in a table with one row per value. This will allow them to be indexed, making searching faster.
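A sketch of the one-row-per-value design, with sqlite3 standing in for MySQL. The table name set_items and its columns are assumptions. The composite primary key is what enforces "items that don't repeat" within one set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE set_items (
        set_id INTEGER NOT NULL,
        item   TEXT    NOT NULL,
        PRIMARY KEY (set_id, item)  -- no duplicate items within a set
    )
""")

def add_item(set_id, item):
    """Return True if the item was added, False if it was already present."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO set_items (set_id, item) VALUES (?, ?)",
                (set_id, item),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def items(set_id):
    """Return the items of one set in sorted order."""
    rows = conn.execute(
        "SELECT item FROM set_items WHERE set_id = ? ORDER BY item",
        (set_id,),
    )
    return [item for (item,) in rows]
```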
MySQL doesn't support arrays, lists, or other data structures like that. It does, however, support strings, so use a string together with the FIND_IN_SET() function:
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_find-in-set
"SET" data type won't be a good choice here.
You can use the "VARCHAR" and store the values in CSV format. You handle them at application level.
Example: INSERT INTO my_table (id, myset) VALUES (1, '3,4,7');
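Handling such a CSV value at application level might look like the following Python sketch. The helper names are hypothetical, and find_in_set is only a rough imitation of MySQL's FIND_IN_SET() (1-based position, 0 when absent), not the real function.

```python
def to_csv(items):
    """Serialize distinct integers as a comma-delimited string."""
    return ",".join(str(i) for i in sorted(set(items)))

def from_csv(text):
    """Parse a comma-delimited string back into a list of integers."""
    return [int(part) for part in text.split(",") if part]

def find_in_set(needle, text):
    """Rough imitation of FIND_IN_SET: 1-based position, or 0 if absent."""
    if not text:
        return 0
    parts = text.split(",")
    return parts.index(needle) + 1 if needle in parts else 0

print(to_csv([7, 3, 4, 3]))         # 3,4,7
print(from_csv("3,4,7"))            # [3, 4, 7]
print(find_in_set("4", "3,4,7"))    # 2
```

Uniqueness here is enforced only by the application code, which is one reason the other answers recommend a separate table.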

Which is faster: Many rows or many columns?

In MySQL, is it generally faster/more efficient/scalable to return 100 rows with 3 columns, or 1 row with 100 columns?
In other words, when storing many key => value pairs related to a record, is it better to store each key => value pair in a separate row with the record_id as a key, or to have one row per record_id with a column for each key?
Also, assume that keys will need to be added/removed fairly regularly, which I assume would affect the long-term maintainability of the many-column approach once the table gets sufficiently large.
Edit: to clarify, by "a regular basis" I mean the addition or removal of a key once a month or so.
You should never add or remove columns on a regular basis.
http://en.wikipedia.org/wiki/Entity-Attribute-Value_model
There are a lot of bad things about this model and I would not use it if there was any other alternative. If you don't know the majority (except a few user customizable fields) of data columns you need for your application, then you need to spend more time in design and figure it out.
If your keys are preset (known at design time), then yes, you should put each key into a separate column.
If they are not known in design time, then you have to return your data as a list of key-value pairs which you should later parse outside the RDBMS.
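Returning key-value rows and assembling them outside the RDBMS could be sketched as below, with sqlite3 standing in for MySQL. The table name record_attrs, its columns, and the sample data are assumptions.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE record_attrs (
        record_id  INTEGER NOT NULL,
        attr_key   TEXT    NOT NULL,
        attr_value TEXT,
        PRIMARY KEY (record_id, attr_key)
    )
""")
conn.executemany(
    "INSERT INTO record_attrs VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "size", "L"), (2, "color", "blue")],
)

def load_records():
    """Group the key-value rows into one dict per record_id."""
    records = defaultdict(dict)
    query = "SELECT record_id, attr_key, attr_value FROM record_attrs"
    for record_id, key, value in conn.execute(query):
        records[record_id][key] = value
    return dict(records)
```

Adding or removing a key is then an INSERT or DELETE of rows rather than an ALTER TABLE.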
If you are storing key/value pairs, you should have a table with two columns, one for the key (make this the PK for the table) and one for the value (probably don't need this indexed at all). Remember, "The key, the whole key, and nothing but the key."
In the multi-column approach, you will find that your table grows without bound because removing a column will nuke all of its values and you won't want to do it. I speak from experience here, having worked on a legacy system that had one table with almost 1000 columns, most of which were bit fields. Eventually, you stop being able to make the case to delete any of the columns because someone might be using it, and the last time you did it, you had to work till 2 am rolling back to backups.
First: determine how frequently your data needs to be accessed. If the data always needs to be retrieved in one shot and most of it is used, then consider storing all the key/value pairs as a serialized value or as an XML value. If you need to do any sort of complex analysis on that data and you need the value pairs, then columns are OK, but limit them to values that you know you will need to query on. It's generally easier to design queries that use one column per parameter than one row per parameter. You will also find it easier to work with the returned values if they are all in one row rather than many.
Second: separate your most frequently accessed data and put it in its own table and the other data in another. 100 columns is a lot by the way so I recommend that you split your data into smaller chunks that will be more manageable.
Lastly: if you have data that may change frequently, then you should store each key (column name) in one table and store the values in another table against the key's numeric ID. This assumes that you will be using the same key more than once, and it should speed up your lookups.

Optimal Way to Store/Retrieve Array in Table

I currently have a table in MySQL that stores values normally, but I want to add a field to that table that stores an array of values, such as cities. Should I simply store that array as a CSV? Each row will need its own array, so I feel uneasy about making a new table and inserting 2-5 rows for each row inserted in the previous table.
I feel like this situation should have a name, I just can't think of it :)
Edit
number of elements - 2-5 (a selection from a dynamic list of cities, the array references the list, which is a table)
This field would not need to be searchable, simply retrieved alongside other data.
The "right" way would be to have another table that holds each value, but since you don't want to go that route, a delimited list should work. Just make sure that you pick a delimiter that won't show up in the data. You can also store the data as XML; depending on how you plan on interacting with the data, this may be a better route.
I would go with the idea of a field containing your comma (or other logical delimiter) separated values. Just make sure that your field is going to be big enough to hold your maximum array size. Then when you pull the field out, it should be easy to perform an explode() on the long string using your delimiter, which will then immediately populate your array in the code.
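The answers above refer to PHP's explode(); the same delimiter-based round trip is sketched here in Python purely for illustration. The pipe delimiter and the function names pack and unpack are assumptions. The check in pack guards against the problem mentioned above, a delimiter that shows up in the data.

```python
DELIMITER = "|"  # assumed never to occur inside a value

def pack(values):
    """Join values into one string, refusing values that contain the delimiter."""
    for value in values:
        if DELIMITER in value:
            raise ValueError(f"value contains delimiter: {value!r}")
    return DELIMITER.join(values)

def unpack(text):
    """Split a stored string back into a list; empty string means empty list."""
    return text.split(DELIMITER) if text else []

stored = pack(["Paris", "New York", "Rio"])
print(stored)          # Paris|New York|Rio
print(unpack(stored))  # ['Paris', 'New York', 'Rio']
```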
Maybe the word you're looking for is "normalize". As in, move the array to a separate table, linked to the first by means of a key. This offers several advantages:
The array size can grow almost indefinitely
Efficient storage
Ability to search for values in the array without having to use "like"
Of course, the decision of whether to normalize this data depends on many factors that you haven't mentioned, like the number of elements, whether or not the number is fixed, whether the elements need to be searchable, etc.
Is your application PHP? It might be worth investigating the functions serialize and unserialize.
These two functions allow you to easily store an array in the database, then recreate that array at a later time.
As others have mentioned, another table is the proper way to go.
But if you really don't want to do that(?), assuming you're using PHP with MySQL, why not use the serialize() and store a serialized value?
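PHP's serialize() and unserialize() are what these answers have in mind. As an analogous sketch in Python, the json module gives the same store-then-recreate round trip, though the stored format is JSON rather than PHP's serialization format. The city IDs are made-up sample values.

```python
import json

city_ids = [3, 17, 42]  # hypothetical IDs referencing a cities table

# Turn the list into a string that fits in a single text column.
stored = json.dumps(city_ids)

# Later, recreate the list from the stored string.
restored = json.loads(stored)

print(stored)    # [3, 17, 42]
print(restored)  # [3, 17, 42]
```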

Does MySQL store a string in an optimal way if the same string is stored in multiple rows?

I have a table where one of the columns is a sort of id string used to group several rows from the table. Let's say the column name is "map" and one of the values for map is e.g. "walmart". The column has an index on it, because I use it to filter those rows which belong to a certain map.
I have lots of such maps and I don't know how much space the different map values take up in the table. Does MySQL recognize that the same map value is stored for multiple rows, store it only once internally, and reference it with an internal numeric id?
Or do I have to replace the map string with a numeric id explicitly and use a different table to pair map strings to ids if I want to decrease the size of the table?
MySQL will store the whole data for every row, regardless of whether the data already exists in a different row.
If you have a limited set of options, you could use an ENUM field, else you could pull the names into another table and join on it.
I think MySQL will duplicate your content each time: it stores data row by row, unless you explicitly specify otherwise (putting the data in another table, as you suggested).
Using another table will mean you need to add a JOIN to some of your queries: you might want to think a bit about the size of your data (is it that big?) compared to the (small?) performance loss you may encounter because of that join.
Another solution would be using an ENUM datatype, at least if you know in advance which string you will have in your table, and there are only a few of those.
Finally, another solution might be to store an integer "code" corresponding to each string, and have those codes translated to strings by your application, entirely outside of the database (or use a table to store the correspondences, but have that table cached by your application instead of using joins in SQL queries).
It would not be as "clean", but it might be better for performance. Still, this may be a micro-optimization that is not necessary in your case...
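The lookup-table variant with an application-side cache might be sketched as follows, with sqlite3 standing in for MySQL. The table names maps and locations, their columns, and the helper map_id_for are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE maps (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE locations (
        id     INTEGER PRIMARY KEY,
        map_id INTEGER NOT NULL REFERENCES maps(id),
        label  TEXT
    );
    CREATE INDEX idx_locations_map ON locations (map_id);
""")

_map_ids = {}  # application-side cache: map name -> numeric id

def map_id_for(name):
    """Return the numeric id for a map name, creating the row if needed."""
    if name not in _map_ids:
        row = conn.execute(
            "SELECT id FROM maps WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            cursor = conn.execute("INSERT INTO maps (name) VALUES (?)", (name,))
            _map_ids[name] = cursor.lastrowid
        else:
            _map_ids[name] = row[0]
    return _map_ids[name]

for map_name, label in [("walmart", "A"), ("walmart", "B"), ("target", "C")]:
    conn.execute(
        "INSERT INTO locations (map_id, label) VALUES (?, ?)",
        (map_id_for(map_name), label),
    )
```

Each location row now carries a small integer instead of the full string, and filtering by map uses the integer index without a join once the ID is known.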
If you are using the same values over and over again, then there is a good functional reason to move it to a separate table, totally aside from disk space considerations: To avoid problems with inconsistent data.
Suppose you have a table of Stores, which includes a column for StoreName. Among the values in StoreName "WalMart" occurs 300 times, and then there's a "BalMart". Is that just a typo for "WalMart", or is that a different store?
Also, if there's other data associated with a store that would be constant across the chain, you should store it just once and not repeatedly.
Of course, if you're just showing locations on a map and you really don't care what they are, it's just a name to display, then this would all be irrelevant.
And if that's the case, then buying a bigger disk is probably a simpler solution than redesigning your database just to save a few bytes per record. Because if we're talking arbitrary strings for place names here, then trying to find duplicates and have look-ups for them is probably a lot of work for very little gain.