How to use key-value pairs in a relational database (MySQL)? - mysql

I want to use a relational database (MySQL) to store my data as key-value pairs.
The number of key-value pairs is only known at runtime.
I can create a simple table that stores the key and the value in separate columns.
Values can be of type INT, VARCHAR, TEXT or DATE.
The problem I am facing is this:
When I query a key whose value should be an integer, I need to use greater-than or less-than comparisons on it. The same applies when I need a BETWEEN query on date fields.
How can I achieve this?
------------------------------------------------Edit---------------------------------------------------
For clarity, here is the background to this question, divided into three parts:
1. Data 2. Use Case 3. Possible Designs
1. Data
Suppose I'm creating a data store for a country's census (just an example). The fields to store differ for men, women, boys and girls, and they also vary with the person's profession. The number of fields depends on the requirements and can grow to 500 or more.
2. Use Case
Show a paginated list of persons whose monthly income is between $7,000 and $10,000. The user can click any page number, and the database should fetch the data for that page directly. For example, if we show 10 results per page and the user clicks page 5, we should show persons 41 to 50.
Some of the values belonging to a particular group are descriptions that can be large, so they should be stored as TEXT.
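The page arithmetic in this use case is just an offset: page p with n rows per page skips (p - 1) × n matching rows. Below is a minimal sketch using Python's standard-library sqlite3 module as a stand-in for MySQL so it runs anywhere. The person table and its columns are hypothetical; MySQL accepts the same LIMIT ... OFFSET clause, although MySQL drivers for Python typically use %s placeholders instead of ?.

```python
import sqlite3

def fetch_page(conn, low, high, page, page_size=10):
    """Return one page of persons whose monthly income is in [low, high].

    Pages start at 1, so page 5 with 10 rows per page skips the first
    40 matches and returns matches 41..50.
    """
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id, name, income FROM person "
        "WHERE income BETWEEN ? AND ? "
        "ORDER BY id LIMIT ? OFFSET ?",  # stable order so pages never overlap
        (low, high, page_size, offset),
    ).fetchall()

# Hypothetical sample data: 100 people, all inside the income range.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, income INTEGER)")
conn.executemany(
    "INSERT INTO person (id, name, income) VALUES (?, ?, ?)",
    [(i, f"person{i}", 7000 + i * 10) for i in range(1, 101)],
)

page = fetch_page(conn, 7000, 10000, page=5)
print([row[0] for row in page])  # ids 41..50
```

A stable ORDER BY matters here: without it, the database may return matching rows in any order, so consecutive pages could overlap or skip rows.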
3. Possible Designs
I could create a separate table for each type of person and store its data in dedicated columns. The problem I see with this approach is that a MySQL table has a maximum row size of 65,535 bytes. Storing all the data horizontally could exceed this limit, since the number of fields is not fixed and can change with the requirements.
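To see why the 65,535-byte limit is a real concern, here is a rough back-of-the-envelope estimate in Python. It follows MySQL's documented counting rules as I understand them: a VARCHAR counts its maximum byte length plus a 1- or 2-byte length prefix, INT counts 4 bytes, DATE 3, and TEXT/BLOB columns only 9 to 12 bytes because their contents are stored separately. It ignores NULL-flag overhead and InnoDB's separate, page-size-based limit. The 500-column table is a hypothetical example.

```python
# Rough estimate of how columns count toward MySQL's 65,535-byte row size
# limit. Ignores NULL-flag overhead and InnoDB's own page-based limit.

ROW_SIZE_LIMIT = 65_535
FIXED_BYTES = {"INT": 4, "DATE": 3, "TEXT": 12}  # TEXT: upper end of its 9-12 bytes

def varchar_bytes(max_chars, bytes_per_char=4):
    """Bytes a VARCHAR(max_chars) counts; utf8mb4 needs up to 4 bytes per char."""
    max_bytes = max_chars * bytes_per_char
    length_prefix = 1 if max_bytes <= 255 else 2
    return max_bytes + length_prefix

def row_size(columns):
    """Sum the contributions of (type, length) column definitions."""
    total = 0
    for col_type, length in columns:
        if col_type == "VARCHAR":
            total += varchar_bytes(length)
        else:
            total += FIXED_BYTES[col_type]
    return total

wide = [("VARCHAR", 255)] * 500   # hypothetical: 500 VARCHAR(255) utf8mb4 columns
as_text = [("TEXT", None)] * 500  # the same 500 fields declared as TEXT
print(row_size(wide), row_size(wide) > ROW_SIZE_LIMIT)        # 511000 True
print(row_size(as_text), row_size(as_text) > ROW_SIZE_LIMIT)  # 6000 False
```

So 500 VARCHAR(255) columns overshoot this limit several times over, while the same fields as TEXT stay under it (though not necessarily under InnoDB's own limit).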
Instead of storing data horizontally, I can store it vertically using the Entity-Attribute-Value (EAV) design (key-value pairs). For now, the resulting increase in the number of rows is not a problem. With this design I can store data for men, women and children in the same table. But this approach has two problems:
I lose the data type of certain important fields, so I cannot query for the list of persons whose income is more than 1000.
To store all fields in a single value column, I would have to make it VARCHAR. But some fields hold large data that requires TEXT.
Considering these problems, instead of a single value column I thought of creating multiple value columns: value_int, value_varchar, value_date and value_text.
[Figure: DB structure]
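A minimal sketch of this multi-column EAV design, assuming a hypothetical person_attr table with one row per (person, attribute) and exactly one value_* column filled per row. Because the income lives in an integer column and the birth date in a date column, BETWEEN compares them as numbers and dates rather than as strings. SQLite (via Python's sqlite3) stands in for MySQL here; in MySQL, value_date would be a DATE and value_text a TEXT column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
-- One row per (person, attribute); exactly one value_* column is filled,
-- chosen by the attribute's type, so comparisons keep their native type.
CREATE TABLE person_attr (
    person_id     INTEGER NOT NULL REFERENCES person(id),
    attr          TEXT    NOT NULL,
    value_int     INTEGER,
    value_varchar TEXT,
    value_date    TEXT,   -- DATE in MySQL; ISO-8601 strings compare correctly here
    value_text    TEXT,
    PRIMARY KEY (person_id, attr)
);
CREATE INDEX idx_attr_int ON person_attr (attr, value_int);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany(
    "INSERT INTO person_attr (person_id, attr, value_int, value_date) VALUES (?, ?, ?, ?)",
    [(1, "monthly_income", 8500, None), (2, "monthly_income", 12000, None),
     (3, "monthly_income", 7000, None), (1, "birth_date", None, "1980-04-02"),
     (2, "birth_date", None, "1978-11-30"), (3, "birth_date", None, "1965-01-15")])

# Numeric range on an integer attribute: compares integers, not strings.
by_income = conn.execute("""
    SELECT p.name FROM person p
    JOIN person_attr a ON a.person_id = p.id
    WHERE a.attr = 'monthly_income' AND a.value_int BETWEEN ? AND ?
    ORDER BY p.id""", (7000, 10000)).fetchall()

# Date range on a date attribute.
by_birth = conn.execute("""
    SELECT p.name FROM person p
    JOIN person_attr a ON a.person_id = p.id
    WHERE a.attr = 'birth_date' AND a.value_date BETWEEN ? AND ?
    ORDER BY p.id""", ("1970-01-01", "1985-12-31")).fetchall()
print(by_income, by_birth)
```

The composite index on (attr, value_int) is intended to let a range query on one attribute avoid scanning every row; each additional attribute you filter on still needs its own join.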
For this problem I will be using MySQL and cannot change the database due to certain restrictions, so I am looking for a design that works in MySQL.
Is the key-value approach a good idea or not? Or is there another design I could use?

In very general terms, if you know the entities and attributes of your problem domain, and the data is relational, I'd use a relational schema (your "possible design 1"). If you actually run into the maximum row width, your problem domain probably contains logical subgroups of attributes, so you can split them into separate tables.
For instance:
Person (id, name, ...)
Person_demographics (person_id, age, location, ...)
Person_finance (person_id, income, wealth...)
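A minimal sketch of this split, using SQLite through Python's sqlite3 as a stand-in for MySQL. Only the columns the answer names are included, and the sample rows are made up. Each subgroup table uses the person's id as its primary key, so it is a 1:1 extension of Person and can be joined back cheaply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Person_demographics (
    person_id INTEGER PRIMARY KEY REFERENCES Person(id),  -- 1:1 with Person
    age INTEGER,
    location TEXT);
CREATE TABLE Person_finance (
    person_id INTEGER PRIMARY KEY REFERENCES Person(id),  -- 1:1 with Person
    income INTEGER,
    wealth INTEGER);
INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Person_demographics VALUES (1, 34, 'Leeds'), (2, 51, 'York');
INSERT INTO Person_finance VALUES (1, 8500, 120000), (2, 12000, 300000);
""")

# Typed columns make range filters ordinary comparisons.
rows = conn.execute("""
    SELECT p.name, d.age, f.income
    FROM Person p
    JOIN Person_demographics d ON d.person_id = p.id
    JOIN Person_finance f ON f.person_id = p.id
    WHERE f.income BETWEEN 7000 AND 10000""").fetchall()
print(rows)  # [('Ann', 34, 8500)]
```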
If you don't know the entities and attributes in advance, I recommend using MySQL's JSON or XML support. This gives you much better query options than EAV.
The problem with EAV-like solutions in your scenario is that any non-trivial query becomes very complicated: "find all responses where the salary is between x and y, the age is z, and the location is one of (a, b, c)" turns into a horrible mess of SQL, whereas with XPath it is fairly straightforward.
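To illustrate the kind of query meant here, this sketch keeps each response as a JSON document and filters it with SQLite's json_extract, called from Python's sqlite3 module. It assumes a SQLite build with JSON functions (built in since SQLite 3.38); MySQL's JSON_EXTRACT uses the same '$.key' path syntax. The response table and its keys are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE response (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")
docs = [
    {"salary": 8000, "age": 30, "location": "a"},
    {"salary": 9500, "age": 30, "location": "d"},
    {"salary": 4000, "age": 30, "location": "b"},
    {"salary": 7200, "age": 30, "location": "c", "pets": 2},  # extra keys are fine
]
conn.executemany("INSERT INTO response (doc) VALUES (?)",
                 [(json.dumps(d),) for d in docs])

# salary between x and y, age = z, location in (a, b, c): one flat WHERE clause.
rows = conn.execute("""
    SELECT id FROM response
    WHERE json_extract(doc, '$.salary') BETWEEN ? AND ?
      AND json_extract(doc, '$.age') = ?
      AND json_extract(doc, '$.location') IN ('a', 'b', 'c')
    ORDER BY id""", (7000, 10000, 30)).fetchall()
print(rows)  # [(1,), (4,)]
```

The same filter in EAV form would need one self-join (or one conditional aggregate) per attribute tested. In MySQL, frequently filtered keys can additionally be indexed through generated columns.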

Related

When dealing with databases, is adding a separate table a good thing when we could use a simple hash?

For example, here's the problem I faced. I have three tables: Products, Districtrates and Deliverycharges. In my app, a product's delivery charge is calculated from a predefined rate in the Districtrates table. If we want, we can also add a custom rate that overrides the predefined one. Each product can be available in all 25 districts or only some of them. So here's my solution:
Create the three tables mentioned above. The Districtrates table will have only 25 records, one for each district in my country. For each product, I will add one record per district (up to 25) to the Deliverycharges table, with the productID, the deliveryrateID and a custom rate value if there is one. Some products have fewer than 25 districts (only the ones where that product is available).
I could also store this as a simple hash in one cell of the Products table, like this: {district1: nil, district2: 234, district4: 543} (Ruby syntax). Here, if the value is nil, we take the default value from the Districtrates table. Here too, the hash would hold up to 25 districts. But the first method (creating a table) is easier to work with. The only problem is that it adds nearly 25 records per product.
So my question is: is this a good thing? This is only one scenario; there are others where we could use a simple array or hash in one cell instead of creating a table. Creating a table is easy to maintain, but is it the right way?
One of the main points of using a relational database is the ability to query (and update) the data in it using SQL.
That only works if you put the data in a form that the database actually understands. Traditionally, this means defining a table schema.
There are now extensions to let the database work with "semi-structured" data (such as XML/JSON/JSONB), but you should only need to go there when the data really does not fit into the relational model, otherwise you are giving up on a lot of features/performance.
If you put a serialized Ruby hash into a text column, you will have no good way to use it from SQL: no proper searching, indexing or efficient updates of these delivery rates.
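To make this concrete, here is a sketch of the three-table design from the question, using SQLite via Python's sqlite3 as a stand-in for MySQL. The default rates and the product are made up. The override logic is a single COALESCE, which picks the custom rate when it is not NULL and the district's default rate otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Districtrates (id INTEGER PRIMARY KEY, district TEXT, rate INTEGER NOT NULL);
CREATE TABLE Products (id INTEGER PRIMARY KEY, name TEXT);
-- One row per (product, district) where the product is available;
-- custom_rate is NULL when the default district rate applies.
CREATE TABLE Deliverycharges (
    productID INTEGER REFERENCES Products(id),
    deliveryrateID INTEGER REFERENCES Districtrates(id),
    custom_rate INTEGER,
    PRIMARY KEY (productID, deliveryrateID));
INSERT INTO Districtrates VALUES (1, 'district1', 100), (2, 'district2', 150), (4, 'district4', 200);
INSERT INTO Products VALUES (1, 'widget');
-- Same data as the hash {district1: nil, district2: 234, district4: 543}
INSERT INTO Deliverycharges VALUES (1, 1, NULL), (1, 2, 234), (1, 4, 543);
""")

charges = conn.execute("""
    SELECT d.district, COALESCE(c.custom_rate, d.rate) AS effective_rate
    FROM Deliverycharges c
    JOIN Districtrates d ON d.id = c.deliveryrateID
    WHERE c.productID = ?
    ORDER BY d.id""", (1,)).fetchall()
print(charges)  # [('district1', 100), ('district2', 234), ('district4', 543)]
```

Changing one district's rate for one product is then a single-row UPDATE, and questions such as "which products override the default in district 4?" become ordinary WHERE clauses.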

Redshift Usage - 1 row by 400 columns per user or (20-400) rows by 4 columns per user

We are building an analytics engine that has to store an attribute preference score for each user. We expect 400 attributes, and they may change (how often is not yet known). We plan to store this in Redshift.
My questions are:
Should we store 1 row per user with 400 columns (one column per attribute),
or should we use a table structure like
(uid, attribute id, attribute value, preference score), which would be 20-400 rows by 4 columns?
Which kind of storage would give better performance in Redshift?
Should we really consider NoSQL for this?
Note:
1. This is a backend for a real-time application with a growing number of users.
2. For processing, the table has to be read with all attribute information for one user, i.e. indirectly creating a 1×400 matrix at runtime.
Please help me decide which design would be ideal for such a use case. Thank you
You can use tables like the ones in this example and then use bitwise functions:
http://docs.aws.amazon.com/redshift/latest/dg/r_bitwise_examples.html
For your problem, I would suggest a two-table design. It's more pain in the beginning but will help in the future.
The first table would be a key-value table that stores all the base data and is fairly future-proof: you can add or remove attributes and this table will keep working.
The second is an N-column table (400 in your case), which you can build from the first. You can start the second table with a bare minimum set of columns, say only 50 of the 400, so that querying it is really fast. Its structure can be refreshed periodically to match the current reporting requirements, and you will always have the base table in case you need to backfill any data.
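A sketch of the two-table idea, with SQLite via Python's sqlite3 standing in for Redshift. It contains a base key-value table, a function that rebuilds the wide table from it with conditional aggregation (one MAX(CASE ...) per attribute), and a helper that reads one user's scores as a dense vector, as in the question's second note. Table, column and function names are hypothetical. Redshift supports CREATE TABLE AS and CASE expressions, but this exact SQL has not been verified there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base key-value table: future-proof, one row per (user, attribute).
CREATE TABLE user_attr_score (uid INTEGER, attribute_id INTEGER, score REAL,
                              PRIMARY KEY (uid, attribute_id));
INSERT INTO user_attr_score VALUES
    (1, 1, 0.9), (1, 2, 0.1), (1, 3, 0.5),
    (2, 1, 0.2), (2, 3, 0.7);
""")

def build_wide_table(conn, attribute_ids):
    """(Re)build a wide table with one column per selected attribute."""
    cols = ",\n".join(
        f"MAX(CASE WHEN attribute_id = {int(a)} THEN score END) AS attr_{int(a)}"
        for a in attribute_ids)
    conn.execute("DROP TABLE IF EXISTS user_scores_wide")
    conn.execute(f"CREATE TABLE user_scores_wide AS SELECT uid, {cols} "
                 "FROM user_attr_score GROUP BY uid")

def user_vector(conn, uid, n_attributes):
    """Read one user's scores from the base table as a dense list (None = missing)."""
    vec = [None] * n_attributes
    for attr_id, score in conn.execute(
            "SELECT attribute_id, score FROM user_attr_score WHERE uid = ?", (uid,)):
        vec[attr_id - 1] = score  # assumes attribute ids run from 1..n_attributes
    return vec

build_wide_table(conn, [1, 3])  # start with a subset of the attributes
print(conn.execute("SELECT * FROM user_scores_wide ORDER BY uid").fetchall())
# [(1, 0.9, 0.5), (2, 0.2, 0.7)]
print(user_vector(conn, 2, 3))  # [0.2, None, 0.7]
```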

What should I do if I can't have a different number of columns per table row?

Basically, I keep records of client information, and sometimes a client has multiple addresses (properties), e.g.:
id, name, phone, primaryProperty, properties
Since I cannot create a varying number of columns for each entry, I currently take the string value of a JavaScript array I create to hold all of the client's properties, e.g.:
['123 Fakestreet Faketown QC A1A1A1', '555 Falsestreet Falsetown QC B2B2B2']
then I convert it to a string and store it in MySQL:
"123 Fakestreet Faketown QC A1A1A1, 555 Falsestreet Falsetown QC B2B2B2"
Unfortunately, as cool as arrays are, this means I can never properly query the properties individually unless I echo the column value and use a for loop to rearrange its contents.
--
I realize I could probably get away with adding columns to my clients table to cover the maximum number of properties a client needs, but having lots of unused fields for all the other clients is a little strange, no?
I thought about creating a separate table for the properties, but then I would have trouble updating a client's information efficiently without calling another update.php file. I'm afraid of losing the internet connection between updating the two tables, if that makes sense.
Normalization is the answer. Just split it into two tables. The first contains the customer's unique data (name, shoe size, etc.) and the ID. The second contains the same ID and the addresses, one per row. Then you join the tables on the ID column as needed.
Cheers!
G.
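A minimal sketch of the two-table layout, using SQLite through Python's sqlite3 as a stand-in for MySQL. The table and column names are hypothetical, and the is_primary flag replaces the question's primaryProperty column. It also addresses the worry about a dropped connection: wrapping both inserts in one transaction means either both tables are written or neither is (MySQL's InnoDB tables support the same with START TRANSACTION / COMMIT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE client_properties (
    id INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES clients(id),
    address TEXT NOT NULL,
    is_primary INTEGER NOT NULL DEFAULT 0);
""")

def save_client(conn, name, phone, addresses):
    """Insert a client and all their addresses in one transaction.

    The first address is marked primary. If any insert fails, the whole
    transaction is rolled back, so no half-written client remains.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute("INSERT INTO clients (name, phone) VALUES (?, ?)", (name, phone))
        client_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO client_properties (client_id, address, is_primary) VALUES (?, ?, ?)",
            [(client_id, addr, int(i == 0)) for i, addr in enumerate(addresses)])
    return client_id

cid = save_client(conn, "Jane", "555-0100",
                  ["123 Fakestreet Faketown QC A1A1A1", "555 Falsestreet Falsetown QC B2B2B2"])

# Each address can now be queried on its own.
hits = conn.execute("""
    SELECT c.name, p.address FROM clients c
    JOIN client_properties p ON p.client_id = c.id
    WHERE p.address LIKE '%Falsetown%'""").fetchall()
print(hits)  # [('Jane', '555 Falsestreet Falsetown QC B2B2B2')]
```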

Should I use relations or split the result?

I'm creating a database that should contain coordinates, text sizes, etc.
My first table looks like this:
id, template_id, data_1, data_2, data_3, data_4, data_5, data_6, data_7, data_8
Each data_x field should have one of the following formats:
svg string;textsize
x;y;textheight;textwidth
x;y;imageheight;imagewidth
More formats could be added in the future.
My question is: should I use these formats (and split them using, e.g., PHP), or should I create a table for each format with relationships? What is the fastest approach, and what is best practice?
I hope I explained myself well enough.
First, you should not be storing these in separate columns. You should have another table with one row per table1 id and data item. It would have at least two columns:
Table1Id
DataColumn
It might also have an auto-incremented id, a sequence number to order the items, and so on.
As for how to store the data, that depends on how you are going to access it. If the database does not need to understand the contents, you can store each item in a single field. If you need to query them, you may have to go a level further and break them out into separate data tables, one per type of value, which the data column above would refer to.
In any case, the more important change at this point is to put the "array" of data values into separate rows of another table.
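A sketch of this suggestion, with SQLite via Python's sqlite3 standing in for MySQL: the data_1 ... data_8 columns become rows of a hypothetical table1_data table, with the auto-incremented id and sequence number mentioned above. The sample values follow the question's x;y;textheight;textwidth format but are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, template_id INTEGER);
-- One row per data item instead of columns data_1 ... data_8.
CREATE TABLE table1_data (
    id INTEGER PRIMARY KEY,          -- auto-incremented id
    table1_id INTEGER NOT NULL REFERENCES table1(id),
    seq INTEGER NOT NULL,            -- position of the item (1..n)
    data_column TEXT NOT NULL,       -- e.g. 'x;y;textheight;textwidth'
    UNIQUE (table1_id, seq));
""")

def store_items(conn, table1_id, template_id, items):
    """Store a variable number of data items as rows of table1_data."""
    with conn:  # both tables are written in one transaction
        conn.execute("INSERT INTO table1 (id, template_id) VALUES (?, ?)",
                     (table1_id, template_id))
        conn.executemany(
            "INSERT INTO table1_data (table1_id, seq, data_column) VALUES (?, ?, ?)",
            [(table1_id, seq, item) for seq, item in enumerate(items, start=1)])

store_items(conn, 1, 7, ["10;20;12;80", "5;5;100;200"])  # two items, no empty columns
rows = conn.execute(
    "SELECT seq, data_column FROM table1_data WHERE table1_id = 1 ORDER BY seq").fetchall()
print(rows)  # [(1, '10;20;12;80'), (2, '5;5;100;200')]
```

Adding a ninth item, or a new format, then needs no schema change.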

How to design the database when you need too many columns? [duplicate]

This question already has answers here:
How do you know when you need separate tables?
(9 answers)
Closed 9 years ago.
I have a table called cars, but each car has hundreds of attributes and they keep increasing over time (horsepower, torque, A/C, electric windows, etc.). My table has each attribute as a column. Is that the right way to do it when I have thousands of rows and hundreds of columns? I made each attribute a column to make advanced searching/filtering easier.
Using MySQL database.
Thanks
This is an interesting question IMHO, and the answer may depend on your specific data model and implementation. The most important factor in this case is data density:
How much of each row is actually filled, on average?
If most of your fields are always present, then data scope partition may be the way to go.
If most of your fields are empty, then a metadata-like structure (like @JayC suggested) may be more attractive.
Let's use the case you mentioned, and do some simulations.
In the first case, scope partitioning, the idea is to partition based on scope or usage. As an example of partitioning by usage, say the most retrieved fields are Model, Year, Maker and Color. These fields could form your main [CAR] table, which owns the ID field that uniquely identifies the vehicle.
Now say Engine, Horsepower, Torque and Cylinders are also used for searches from time to time, but less frequently. These could live in a secondary table [CAR_INFO_1], tied to the first table by the CAR_ID field, a foreign key. Create as many partitions as you need.
Advantage: simpler queries. You can combine all the information about a vehicle with a join query (for example inside a VIEW).
Downside: maintenance. Each new field must be added to the model itself, and you need an up-to-date data model to find where a field is actually stored (or hide that inside a view).
The metadata format is much more elegant, but demands more of your database engine. See @JayC's and @Nitzan Shaked's answers for details.
Advantages: 100% data density; you never have empty data values. Maintenance is also easier: a new attribute is created by adding a row to the metadata type table. The data structure is less complex as well.
Downside: complex queries, with more complex execution plans. Say you need all Ford cars made in 2010 that are blue. It would be trivial in the first case:
SELECT * FROM CAR WHERE Maker='Ford' AND Year=2010 AND Color='Blue'
Now the same query on a metadata-structured model.
Assume these two tables exist:
CAR_METADATA_TYPE
ID | DESC
1 | 'Maker'
2 | 'Year'
3 | 'Color'
and
CAR_METADATA (CAR_ID, METADATA_TYPE_ID, VALUE)
The query itself would look something like this:
SELECT CAR.* FROM CAR, CAR_METADATA MP1, CAR_METADATA MP2, CAR_METADATA MP3
WHERE MP1.CAR_ID = CAR.ID AND MP1.METADATA_TYPE_ID = 1 AND MP1.VALUE = 'Ford'
AND MP2.CAR_ID = CAR.ID AND MP2.METADATA_TYPE_ID = 2 AND MP2.VALUE = '2010'
AND MP3.CAR_ID = CAR.ID AND MP3.METADATA_TYPE_ID = 3 AND MP3.VALUE = 'Blue'
So it all depends on your needs. But given your case, my suggestion would be the metadata format.
(But clean up the model first: no repeated fields, and 1:N data in its own table instead of inline fields like Color1, Color2, Color3, that kind of stuff ;) )
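Here is a runnable version of the metadata example, using SQLite via Python's sqlite3 as a stand-in for MySQL and made-up sample cars. DESC is a reserved word, so the type table's column is named DESCRIPTION here. Besides the self-join query, it shows a common alternative (not from the answer): filter the metadata rows once and keep the cars that meet all three conditions with GROUP BY ... HAVING COUNT(*) = 3, which relies on the primary key allowing each attribute at most once per car.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CAR (ID INTEGER PRIMARY KEY);
CREATE TABLE CAR_METADATA_TYPE (ID INTEGER PRIMARY KEY, DESCRIPTION TEXT);
CREATE TABLE CAR_METADATA (CAR_ID INTEGER, METADATA_TYPE_ID INTEGER, VALUE TEXT,
                           PRIMARY KEY (CAR_ID, METADATA_TYPE_ID));
INSERT INTO CAR VALUES (1), (2), (3);
INSERT INTO CAR_METADATA_TYPE VALUES (1, 'Maker'), (2, 'Year'), (3, 'Color');
INSERT INTO CAR_METADATA VALUES
    (1, 1, 'Ford'), (1, 2, '2010'), (1, 3, 'Blue'),
    (2, 1, 'Ford'), (2, 2, '2010'), (2, 3, 'Red'),
    (3, 1, 'Fiat'), (3, 2, '2010'), (3, 3, 'Blue');
""")

# One self-join per attribute tested, as in the answer's query.
self_join = conn.execute("""
    SELECT CAR.ID FROM CAR
    JOIN CAR_METADATA MP1 ON MP1.CAR_ID = CAR.ID AND MP1.METADATA_TYPE_ID = 1 AND MP1.VALUE = 'Ford'
    JOIN CAR_METADATA MP2 ON MP2.CAR_ID = CAR.ID AND MP2.METADATA_TYPE_ID = 2 AND MP2.VALUE = '2010'
    JOIN CAR_METADATA MP3 ON MP3.CAR_ID = CAR.ID AND MP3.METADATA_TYPE_ID = 3 AND MP3.VALUE = 'Blue'
""").fetchall()

# Alternative: scan the metadata once and count how many conditions each car meets.
grouped = conn.execute("""
    SELECT CAR_ID FROM CAR_METADATA
    WHERE (METADATA_TYPE_ID = 1 AND VALUE = 'Ford')
       OR (METADATA_TYPE_ID = 2 AND VALUE = '2010')
       OR (METADATA_TYPE_ID = 3 AND VALUE = 'Blue')
    GROUP BY CAR_ID
    HAVING COUNT(*) = 3
""").fetchall()
print(self_join, grouped)  # [(1,)] [(1,)]
```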
I guess the obvious question is, then: why not have a table car_attrs(car, attr, value)? Each attribute is a row. Most queries can be re-written to use this form.
If it is all about features, create a features table, list all your features as rows with some kind of automatic id, and create a car_features table with foreign keys to both your cars table and your features table to associate cars with features, possibly along with any values attached to the relationship (one-passenger electric seat, etc.).
If you have ever-changing attributes, consider storing them in an XML blob or text structure in one column. This structure is not relational. The most important attributes are then duplicated in additional columns so you can write queries that search on them, since the blob will not be efficiently searchable from SQL queries. This cuts down the number of columns in the table and allows expansion without changing the database schema.
As others have suggested, if you want all the attributes in a table, use an attribute table to define them. Beyond that, it depends on your requirements and the needs of the application.
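A minimal sketch of the features / car_features design, with SQLite via Python's sqlite3 standing in for MySQL. The feature names and cars are made up, and cars_with_all is a hypothetical helper that finds cars having every requested feature by counting matches per car.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cars (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE features (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- Junction table: one row per (car, feature), plus an optional value.
CREATE TABLE car_features (
    car_id INTEGER NOT NULL REFERENCES cars(id),
    feature_id INTEGER NOT NULL REFERENCES features(id),
    value TEXT,
    PRIMARY KEY (car_id, feature_id));
INSERT INTO cars VALUES (1, 'Car A'), (2, 'Car B');
INSERT INTO features (id, name) VALUES (1, 'a/c'), (2, 'electric windows'), (3, 'electric seat');
INSERT INTO car_features VALUES (1, 1, NULL), (1, 3, 'one passenger'), (2, 1, NULL), (2, 2, NULL);
""")

def cars_with_all(conn, feature_names):
    """Names of cars that have every feature in feature_names (names assumed distinct)."""
    placeholders = ", ".join("?" for _ in feature_names)
    sql = f"""
        SELECT c.name FROM cars c
        JOIN car_features cf ON cf.car_id = c.id
        JOIN features f ON f.id = cf.feature_id
        WHERE f.name IN ({placeholders})
        GROUP BY c.id, c.name
        HAVING COUNT(*) = ?
        ORDER BY c.id"""
    return [row[0] for row in conn.execute(sql, (*feature_names, len(feature_names)))]

print(cars_with_all(conn, ["a/c", "electric seat"]))  # ['Car A']
```

Adding a new feature is an INSERT into features; the schema does not change.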
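A sketch of the blob-plus-duplicated-columns idea, with two substitutions stated up front: JSON replaces XML as the serialization format, and SQLite via Python's sqlite3 stands in for MySQL. Which attributes get their own columns (maker and year here) is a hypothetical choice; save_car writes the full blob and the copied columns in one statement so they cannot drift apart on insert.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cars (
    id INTEGER PRIMARY KEY,
    maker TEXT,          -- copied out of attrs so SQL can filter and index it
    year INTEGER,        -- copied out of attrs
    attrs TEXT NOT NULL  -- every attribute, serialized (JSON here instead of XML)
);
CREATE INDEX idx_cars_maker_year ON cars (maker, year);
""")

def save_car(conn, attrs):
    """Store all attributes in the blob and duplicate the important ones into columns."""
    with conn:
        cur = conn.execute(
            "INSERT INTO cars (maker, year, attrs) VALUES (?, ?, ?)",
            (attrs.get("maker"), attrs.get("year"), json.dumps(attrs)))
    return cur.lastrowid

save_car(conn, {"maker": "Ford", "year": 2010, "horsepower": 150, "a/c": True})
save_car(conn, {"maker": "Fiat", "year": 2012, "torque": 200})  # new attribute, no schema change

# Search on the duplicated columns, then decode the blob for everything else.
found = [json.loads(a) for (a,) in conn.execute(
    "SELECT attrs FROM cars WHERE maker = ? AND year = ?", ("Ford", 2010))]
print(found[0]["horsepower"])  # 150
```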