MySQL: how to structure this data and search it - mysql

I'm new to MySQL. Right now, I have this kind of structure in a MySQL database:
| keyID | Param | Value |
| 123 | Location | Canada |
| 123 | Cost | 34 |
| 123 | TransportMethod | Boat |
...
...
I have roughly 20 params per keyID, each with its own value. Given values for all 20 params, I want to search MySQL and figure out which keyID they belong to.
Firstly, how should I restructure the MySQL database? Should I have 20 param columns plus keyID?
Secondly (this relates to the first question), how would I write the query to find the keyID?

If your params are identical across different keys (or all params are a subset of some set of params that the objects may have), you should structure the database so that each column is a param, and the row corresponds to one KeyID and the values of its params.
| keyID | Location | Cost | TransportMethod | ... |
| 123 | Canada | 34 | Boat | ... |
| 124 | ... |
...
Then, to query for the keyID, you would use a SELECT ... FROM ... WHERE statement, such as:
SELECT keyID
FROM key_table
WHERE Location='Canada'
AND Cost=34
AND TransportMethod='Boat'
...
For more info, see http://www.w3schools.com/php/php_mysql_where.asp
Edit: if your params change across different objects (keyIDs), this will require a different approach, I think.

The design you show is called Entity-Attribute-Value. It breaks many rules of relational database design, and it's very hard to use with SQL.
In a relational database, you should have a separate column for each attribute type.
CREATE TABLE MyTable (
keyID SERIAL PRIMARY KEY,
Location VARCHAR(20),
Cost NUMERIC(9,2),
TransportMethod VARCHAR(10)
);
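A minimal sketch of searching the one-column-per-attribute table, assuming Python's built-in sqlite3 as a stand-in for MySQL so it runs without a server. The helper find_key_ids and row 124 are hypothetical, added only for illustration; the SELECT it builds has the same shape you would send to MySQL.

```python
import sqlite3

# SQLite stands in for MySQL here; INTEGER PRIMARY KEY replaces SERIAL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MyTable (
        keyID           INTEGER PRIMARY KEY,
        Location        VARCHAR(20),
        Cost            NUMERIC(9,2),
        TransportMethod VARCHAR(10)
    )
""")
# Row 123 comes from the question; row 124 is made up for contrast.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(123, "Canada", 34, "Boat"), (124, "Canada", 34, "Plane")],
)

ALLOWED_COLUMNS = {"Location", "Cost", "TransportMethod"}

def find_key_ids(conn, **params):
    """Return the keyIDs whose columns equal every given param value."""
    if not params:
        raise ValueError("at least one param is required")
    # Column names cannot be bound as parameters, so check them against a whitelist.
    unknown = set(params) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown params: {sorted(unknown)}")
    where = " AND ".join(f"{col} = ?" for col in params)
    sql = f"SELECT keyID FROM MyTable WHERE {where} ORDER BY keyID"
    return [row[0] for row in conn.execute(sql, list(params.values()))]

matches = find_key_ids(conn, Location="Canada", Cost=34, TransportMethod="Boat")
print(matches)  # [123]
```

With all 20 params as columns, a composite index on the most selective ones would likely keep this lookup fast.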

I agree that Nick's answer is probably best, but if you really want to keep your key/value format, you could accomplish what you want with a view (this is in PostgreSQL syntax, because that's what I'm familiar with, but the concept is the same for MySQL):
CREATE OR REPLACE VIEW myview AS
SELECT keyID,
MAX(CASE WHEN Param = 'Location' THEN Value END) AS Location,
MAX(CASE WHEN Param = 'Cost' THEN Value END) AS Cost,
....
FROM mytable
GROUP BY keyID;
Performance here is likely to be dismal, but if your queries are not frequent, it could get the job done.
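If the key/value table stays, the keyID can also be found without a view by counting how many of the requested (Param, Value) pairs each keyID matches. This is a different technique from the view above, shown as a rough sketch with Python's sqlite3 standing in for MySQL. It assumes a primary key on (keyID, Param) and that every Value is stored as text; row 124 and the helper name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        keyID INTEGER NOT NULL,
        Param TEXT    NOT NULL,
        Value TEXT,
        PRIMARY KEY (keyID, Param)   -- assumption: one value per param per key
    )
""")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (123, "Location", "Canada"), (123, "Cost", "34"), (123, "TransportMethod", "Boat"),
    # keyID 124 is invented so the search has something to reject.
    (124, "Location", "Canada"), (124, "Cost", "34"), (124, "TransportMethod", "Plane"),
])

def find_key_ids(conn, wanted):
    """Return keyIDs that match every (Param, Value) pair in `wanted`."""
    pairs = list(wanted.items())
    if not pairs:
        raise ValueError("at least one param is required")
    match = " OR ".join("(Param = ? AND Value = ?)" for _ in pairs)
    sql = (
        f"SELECT keyID FROM mytable WHERE {match} "
        "GROUP BY keyID HAVING COUNT(*) = ? ORDER BY keyID"
    )
    # Flatten the pairs, then append how many of them must match.
    args = [item for pair in pairs for item in pair] + [len(pairs)]
    return [row[0] for row in conn.execute(sql, args)]

found = find_key_ids(
    conn, {"Location": "Canada", "Cost": "34", "TransportMethod": "Boat"}
)
print(found)  # [123]
```

The COUNT(*) comparison is only reliable because (keyID, Param) is unique; without that constraint, duplicate rows could inflate the count.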

Related

Is there a way in MySQL to use aggregate functions in a sub section of binary column?

Suppose we have 2 numbers of 3 bits each attached together like '101100', which basically represents 5 and 4 combined. I want to be able to perform aggregation functions like SUM() or AVG() on this column separately for each individual 3-bit column.
For instance:
'101100'
'001001'
sum(first three columns) = 6
sum(last three columns) = 5
I have already tried the SUBSTRING() function; however, speed is the issue in that case, as this query will run on millions of rows regularly, and string matching will slow the query.
I am also open to any new databases or technologies that may support this functionality.
You can use the function conv() to convert any part of the string to a decimal number:
select
sum(conv(left(number, 3), 2, 10)) firstpart,
sum(conv(right(number, 3), 2, 10)) secondpart
from tablename
See the demo.
Results:
| firstpart | secondpart |
| --------- | ---------- |
| 6 | 5 |
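Since string handling was the speed concern, one alternative worth measuring is to store the packed value as an integer and split it with shifts and masks. This is not the conv() approach above, just a sketch of a different technique, run here through Python's sqlite3 as a stand-in; MySQL also has the >> and & operators. The integer column is an assumption about how the data could be stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: the 6-bit value is stored as an integer, not as a '101100' string.
conn.execute("CREATE TABLE tablename (number INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO tablename VALUES (?)",
    [(int("101100", 2),), (int("001001", 2),)],  # the two rows from the question
)

# (number >> 3) & 7 keeps the upper three bits, number & 7 the lower three.
firstpart, secondpart = conn.execute(
    "SELECT SUM((number >> 3) & 7), SUM(number & 7) FROM tablename"
).fetchone()
print(firstpart, secondpart)  # 6 5
```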
With the current understanding I have of your schema (which is next to none), the best solution would be to restructure your schema so that each data point is its own record instead of all the data points being in the same record. Doing this allows you to have a dynamic number of data points per entry. Your resulting table would look something like this:
id | data_type | value
ID is used to tie all of your data points together. If you look at your current table, this would be whatever you are using for the primary key. For this answer, I am assuming id INT NOT NULL but yours may have additional columns.
Data Type indicates what type of data is stored in that record. This would be the current table's column name. I will be using data_type_N as my values, but yours should be a more easily understood value (e.g. sensor_5).
Value is exactly what it says it is, the value of the data type for the given id. Your values appear to be all numbers under 8, so you could use a TINYINT type. If you have different storage types (VARCHAR, INT, FLOAT), I would create a separate column per type (val_varchar, val_int, val_float).
The primary key for this table now becomes a composite: PRIMARY KEY (id, data_type). Since your previously single record will become N records, the primary key will need to adjust to accommodate that.
You will also want to ensure that you have indexes that are usable by your queries.
Some sample values (using what you placed in your question) would look like:
1 | data_type_1 | 5
1 | data_type_2 | 4
2 | data_type_1 | 1
2 | data_type_2 | 1
Doing this, summing the values now becomes trivial. You would only need to ensure that data_type_N is summed with data_type_N. As an example, this would be used to sum your example values:
SELECT data_type,
SUM(value)
FROM my_table
WHERE id IN (1,2)
GROUP BY data_type
Here is an SQL Fiddle showing how it can be used.
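The layout and query above can be tried end to end with a short sketch. It uses Python's sqlite3 in place of MySQL purely so it is self-contained; the table name my_table and the sample rows are the ones from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        id        INTEGER NOT NULL,
        data_type TEXT    NOT NULL,
        value     INTEGER NOT NULL,   -- TINYINT would do in MySQL
        PRIMARY KEY (id, data_type)
    )
""")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, "data_type_1", 5),
    (1, "data_type_2", 4),
    (2, "data_type_1", 1),
    (2, "data_type_2", 1),
])

# Each result row is (data_type, total), so dict() gives a lookup by type.
totals = dict(conn.execute("""
    SELECT data_type, SUM(value)
    FROM my_table
    WHERE id IN (1, 2)
    GROUP BY data_type
"""))
print(totals)  # {'data_type_1': 6, 'data_type_2': 5}
```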

SQL: Text type with a lot of commonly used values

I have a table that basically looks like the following:
Timestamp | Service | Observation
----------+---------+------------
... | vm-1 | 15
... | vm-1 | 20
... | vm-1 | 20
... | vm-1 | 20
... | vm-1 | 20
... | vm-1 | 20
... | bvm-2 | 184
... | bvm-2 | 104
... | bvm-2 | 4
... | bvm-2 | 14
... | bvm-2 | 657
... | bvm-2 | 6
... | bvm-2 | 6
The Service column will not have many distinct values. I don't know at table creation time what all the possible values will be, so I can't use an enum, but the number of distinct values will grow very slowly (fewer than ~10 new distinct values per month), whereas I'll have thousands of new observations per day.
Right now I'm thinking of using a VARCHAR or MySQL's TEXT type for the Service column, but given the specifics of the situation, those seem wasteful.
Are databases usually smart about this sort of thing? Or is there some way I can hint to the database that this behavior is something that it can reliably exploit?
I'm using MySQL 5.7. I'd prefer something standards compliant or portable, but I'm also open to MySQL specific workarounds.
EDIT:
In other words, what I want is for the column to be treated like an enum, but have the database figure out dynamically based on the data that shows up in the table what the different enum values are.
Every time you need an enum, you should consider creating another table and referencing it. It's basic normalization. So create one table for the ServiceType with a name and an id field; the name can be VARCHAR and the id should be INT. The actual table then just uses the id instead of the service name.
You can write a simple stored procedure to do the inserting and looking up of duplicate names as well as a view to access the results so outside of the DB you barely know how it is internally handled.
Your stored procedure needs to:
1. Check if the service exists and insert it if not. INSERT IGNORE ... is probably your friend here.
2. Get the ID of the service with SELECT id INTO @serv_id FROM ServiceType WHERE name = [service_name];
3. Insert into the table with the service ID instead of the service name.
Don't over-optimize. TINYINT would save only three bytes per row over INT and would cap you at 255 services, so just use INT; it won't run out until you have billions of services.
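The three steps and the view can be sketched as follows. This is only an illustration under assumptions: Python's sqlite3 stands in for MySQL, so the insert is spelled INSERT OR IGNORE (SQLite's counterpart of INSERT IGNORE) and a Python function plays the role of the stored procedure. The names Observations, ObservationsByName, and record, and the placeholder timestamps, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ServiceType (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(64) NOT NULL UNIQUE   -- UNIQUE is what makes IGNORE work
    );
    CREATE TABLE Observations (
        ts          TEXT    NOT NULL,
        service_id  INTEGER NOT NULL REFERENCES ServiceType(id),
        observation INTEGER NOT NULL
    );
    -- The view hides the id so readers still see the service name.
    CREATE VIEW ObservationsByName AS
        SELECT o.ts, s.name AS service, o.observation
        FROM Observations o
        JOIN ServiceType s ON s.id = o.service_id;
""")

def record(conn, ts, service_name, observation):
    """Stand-in for the stored procedure: look up or create, then insert."""
    # Step 1: add the service if it is new; an existing name is left alone.
    conn.execute("INSERT OR IGNORE INTO ServiceType (name) VALUES (?)", (service_name,))
    # Step 2: fetch its id.
    (serv_id,) = conn.execute(
        "SELECT id FROM ServiceType WHERE name = ?", (service_name,)
    ).fetchone()
    # Step 3: store the observation with the id instead of the name.
    conn.execute("INSERT INTO Observations VALUES (?, ?, ?)", (ts, serv_id, observation))

# Timestamps are placeholders; services and values come from the question.
for ts, service, obs in [("t1", "vm-1", 15), ("t2", "vm-1", 20), ("t3", "bvm-2", 184)]:
    record(conn, ts, service, obs)

service_count = conn.execute("SELECT COUNT(*) FROM ServiceType").fetchone()[0]
rows = conn.execute(
    "SELECT service, observation FROM ObservationsByName ORDER BY ts"
).fetchall()
print(service_count, rows)  # 2 [('vm-1', 15), ('vm-1', 20), ('bvm-2', 184)]
```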
I think you have to create a new table to store the services; that table's primary key (service_id) can then take the place of the service text. The main table's service column should be an int type for storing the service id, so please change the service column type to int(4).
Hope this helps.

check if value is present in one of the database rows

I'm looking for a way to check if a value is present in one of the rows of the page column.
For example, how would I check whether the value '45' is present?
Id | page |
---------------
1 | 23 |
---------------
2 | |
---------------
3 | 33,45,55 |
---------------
4 | 45 |
---------------
The find_in_set function is just what you're looking for:
SELECT *
FROM mytable
WHERE FIND_IN_SET('45', page) > 0
You should not store values in lists. This is especially true in this case:
Values should be stored in the proper data type. You are storing numbers as characters.
Foreign key relationships should be properly defined.
SQL doesn't have very good string processing functions.
Resulting queries cannot make use of indexes.
SQL has a great data type for lists, called a table. In this case, you want a junction table.
Sometimes, you are stuck with other people's really bad design decisions. In that case, you can use find_in_set() as suggested by Mureinik.
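For comparison, a rough sketch of the junction-table layout, using Python's sqlite3 in place of MySQL so it is self-contained. The table name mytable_page and its column names are assumptions; the data is the table from the question, split on commas once at load time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY);
    -- Junction table: one row per (id, page) pair instead of a comma list.
    CREATE TABLE mytable_page (
        mytable_id INTEGER NOT NULL REFERENCES mytable(id),
        page       INTEGER NOT NULL,
        PRIMARY KEY (mytable_id, page)
    );
    CREATE INDEX idx_mytable_page_page ON mytable_page (page);
""")

# The original rows from the question, still in list form.
original = {1: "23", 2: "", 3: "33,45,55", 4: "45"}
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in original])
conn.executemany(
    "INSERT INTO mytable_page VALUES (?, ?)",
    [(i, int(p)) for i, csv in original.items() for p in csv.split(",") if p],
)

# A plain equality test on an integer column, which the index can serve.
ids_with_45 = [row[0] for row in conn.execute(
    "SELECT mytable_id FROM mytable_page WHERE page = ? ORDER BY mytable_id", (45,)
)]
print(ids_with_45)  # [3, 4]
```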

Mysql efficiently storing dynamic customer data in row or column

'customer_data' table:
id - int auto increment
user_id - int
json - TEXT field containing json object
tags - varchar 200
* id and user_id are indexed.
Each customer (user_id) may have multiple rows.
"json" is TEXT because it may be very large with many keys, or fairly small with a few keys containing short values.
I usually look up the json by user_id.
Problem: with over 100,000 rows, a query takes forever to complete. I understand that TEXT fields are very wasteful and that MySQL does not index them well.
Fix 1:
Convert the "json" field to multiple columns in the same table where some columns may be blank.
Fix 2:
Create another table with user_id|key|value, but I may end up with huge joins; won't that be much slower? Also, the key is a string but the value may be int or text, of various lengths. How do I reconcile that?
I know this is a pretty common use case; what are the "industry standards" for it?
UPDATE
So I guess Fix 2 is the best option, how would I query this table and get one row result, efficiently?
id | key | value
-------------------
1 | key_1 | A
2 | key_1 | D
1 | key_2 | B
1 | key_3 | C
2 | key_3 | E
result:
id | key_1 | key_2 | key_3
---------------------------
1 | A | B | C
2 | D | | E
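One common way to get that one-row-per-id result is conditional aggregation: group by id and pick each key's value with MAX(CASE ...). A minimal sketch, assuming Python's sqlite3 as a stand-in for MySQL and a hypothetical table name customer_kv. The column name key is double-quoted here because it is a keyword; in MySQL it would be written with backticks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_kv (
        id    INTEGER NOT NULL,
        "key" TEXT    NOT NULL,
        value TEXT,
        PRIMARY KEY (id, "key")
    )
""")
conn.executemany("INSERT INTO customer_kv VALUES (?, ?, ?)", [
    (1, "key_1", "A"), (2, "key_1", "D"),
    (1, "key_2", "B"),
    (1, "key_3", "C"), (2, "key_3", "E"),
])

# Each CASE yields the value for one key and NULL otherwise;
# MAX then collapses the group to the single non-NULL value, if any.
pivot = conn.execute("""
    SELECT id,
           MAX(CASE WHEN "key" = 'key_1' THEN value END) AS key_1,
           MAX(CASE WHEN "key" = 'key_2' THEN value END) AS key_2,
           MAX(CASE WHEN "key" = 'key_3' THEN value END) AS key_3
    FROM customer_kv
    GROUP BY id
    ORDER BY id
""").fetchall()
print(pivot)  # [(1, 'A', 'B', 'C'), (2, 'D', None, 'E')]
```

The column list has to be known when the query is written, so a new key means a new CASE line (or a query generated by the application).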
This answer is a bit outside the box defined in your question, but I'd suggest:
Fix 3: Use MongoDB instead of MySQL.
This is not to criticize MySQL at all -- MySQL is a great structured relational database implementation. However, you don't seem interested in using either the structured aspects or the relational aspects (either because of the specific use case and requirements or because of your own programming preferences, I'm not sure which). Using MySQL because relational architecture suits your use case (if it does) would make sense; using relational architecture as a workaround to make MySQL efficient for your use case (as seems to be the path you're considering) seems unwise.
MongoDB is another great database implementation, which is less structured and not relational, and is designed for exactly the sort of use case you describe: flexibly storing big blobs of json data with various identifiers, and storing/retrieving them efficiently, without having to worry about structural consistency between different records. JSON is Mongo's native document representation.

Hashed string in MySQL

Is there some kind of hashed string type in MySQL?
Let's say we have a table
user | action | target
-----------------------
1 | likes | 14
2 | follows | 190
I don't want to store "action" as text, because it takes much space and is slow to index. Actions are likely to be limited (up to 50 actions, I guess) but can be added/removed in the future. So I would like to avoid storing all actions by numbers in PHP. I would like to have a table that handles this transparently.
For example, table above would be stored as (1,1,14), (2,2,190) internally, and keys would be stored in another table (1 = likes, 2 = follows).
INSERT INTO table VALUES (41, "likes", 153)
Here "likes" is resolved to 1.
INSERT INTO table VALUES (23, "dislikes", 1245)
Here we have no key for "dislikes", so it is added and stored internally as 3.
Possible?
If you have a fixed (or reasonably fixed) set of values, then you can use an enum field. This is stored internally as a small integer index (1 or 2 bytes per value) and as a result takes very little disk space. Here is an example definition:
CREATE TABLE enum_test (
myEnum enum('enabled', 'disabled', 'unknown')
);
Yes, it is, with a subquery like this:
INSERT INTO table VALUES (23, (SELECT id FROM actions WHERE action="dislikes"), 1245)
This way the PHP side doesn't need to know the ID, only the action name, and the row is still stored in the database with an ID.
This assumes you have an 'actions' table:
id | action
-----------
1 | likes
2 | dislikes
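A small sketch of the subquery insert, again with Python's sqlite3 standing in for MySQL. The table name user_actions and the NOT NULL constraint are assumptions made for the example. Note what happens for an action that is not in the lookup table: the subquery returns NULL, so the insert is rejected rather than creating the action.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (
        id     INTEGER PRIMARY KEY,
        action VARCHAR(32) NOT NULL UNIQUE
    );
    CREATE TABLE user_actions (
        user      INTEGER NOT NULL,
        action_id INTEGER NOT NULL REFERENCES actions(id),
        target    INTEGER NOT NULL
    );
    INSERT INTO actions (id, action) VALUES (1, 'likes'), (2, 'dislikes');
""")

# The caller passes the action name; the subquery resolves it to an id.
insert_sql = (
    "INSERT INTO user_actions "
    "VALUES (?, (SELECT id FROM actions WHERE action = ?), ?)"
)
conn.execute(insert_sql, (23, "dislikes", 1245))
stored = conn.execute("SELECT * FROM user_actions").fetchall()
print(stored)  # [(23, 2, 1245)]

# An unknown action makes the subquery yield NULL, which NOT NULL rejects.
try:
    conn.execute(insert_sql, (7, "follows", 190))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```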
You want a table called "actions", and a foreign key called "action_id". That is how database normalization works:
user_actions:
user | action_id | target
-----------------------
1 | 1 | 14
2 | 2 | 190
actions:
id | name
--------------
1 | likes
2 | follows
As far as making insert into user_actions (1, 'likes', 47) work: You shouldn't care. Trying to make your SQL pretty is a pointless pursuit; you should never actually have to write any in your application code. The database interactions should be handled by a layer of models/business objects, and their internal implementation shouldn't matter to you.
As far as making insert into user_actions (1, 'dislikes', 47) automatically create new records in the actions table: That again isn't the database's job. Your models should be handling this.