Is there some kind of hashed string type in MySQL?
Let's say we have a table
user | action | target
-----------------------
1 | likes | 14
2 | follows | 190
I don't want to store "action" as text, because it takes a lot of space and is slow to index. Actions are likely to be limited in number (up to 50, I guess) but can be added or removed in the future. I would like to avoid maintaining the action-to-number mapping in PHP; instead, I would like a table that handles this transparently.
For example, table above would be stored as (1,1,14), (2,2,190) internally, and keys would be stored in another table (1 = likes, 2 = follows).
INSERT INTO table VALUES (41, "likes", 153)
Here "likes" is resolved to 1.
INSERT INTO table (23, "dislikes", 1245)
Here we have no key for "dislikes", so it is added and stored internally as 3.
Possible?
If you have a fixed (or reasonably fixed) set of values, then you can use an enum field. Internally an ENUM is stored as a compact integer index rather than as the string, so it takes very little disk space. Here is an example definition:
CREATE TABLE enum_test (
myEnum enum('enabled', 'disabled', 'unknown')
);
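Inserting then uses the string value directly; MySQL stores a 1-based index under the hood. A quick usage sketch:
INSERT INTO enum_test (myEnum) VALUES ('enabled');  -- stored internally as index 1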
Yes it is, with a subquery like this:
INSERT INTO table VALUES (23, (SELECT id FROM actions WHERE action = "dislikes"), 1245)
This way the PHP side needs to know only the action name, not its ID, and the value is still stored in the database as an ID.
This assumes you have an 'actions' table:
id | action
-----------
1 | like
2 | dislike
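For the subquery to resolve to exactly one id, the actions table needs a unique key on the action name. A minimal sketch, assuming the columns above:
CREATE TABLE actions (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  action VARCHAR(32) NOT NULL,
  UNIQUE KEY (action)
);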
You want a table called "actions", and a foreign key called "action_id". That is how database normalization works:
user_actions:
user | action_id | target
-----------------------
1 | 1 | 14
2 | 2 | 190
actions:
id | name
--------------
1 | likes
2 | follows
As far as making insert into user_actions (1, 'likes', 47) work: You shouldn't care. Trying to make your SQL pretty is a pointless pursuit; you should never actually have to write any in your application code. The database interactions should be handled by a layer of models/business objects, and their internal implementation shouldn't matter to you.
As far as making insert into user_actions (1, 'dislikes', 47) automatically create new records in the actions table: That again isn't the database's job. Your models should be handling this.
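That said, if your model layer talks SQL directly, the get-or-create step can be a two-statement sketch like this (assuming an actions table with a unique key on name):
-- Create the action if it is new; the unique key makes this a no-op otherwise
INSERT IGNORE INTO actions (name) VALUES ('dislikes');
-- Insert the user action, resolving the name to its id in the same statement
INSERT INTO user_actions (user, action_id, target)
SELECT 23, id, 1245 FROM actions WHERE name = 'dislikes';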
Related
I have a table that basically looks like the following:
Timestamp | Service | Observation
----------+---------+------------
... | vm-1 | 15
... | vm-1 | 20
... | vm-1 | 20
... | vm-1 | 20
... | vm-1 | 20
... | vm-1 | 20
... | bvm-2 | 184
... | bvm-2 | 104
... | bvm-2 | 4
... | bvm-2 | 14
... | bvm-2 | 657
... | bvm-2 | 6
... | bvm-2 | 6
The Service column will not have a lot of different values. I don't know at table creation time what all the possible values are going to be, so I can't use an enum, but the number of distinct values will grow very slowly (fewer than ~10 new distinct values per month), whereas I'll have thousands of new observations per day.
Right now I'm just thinking of using a VARCHAR or MySQL's TEXT type for the Service column, but given the specifics of the situation those seem wasteful.
Are databases usually smart about this sort of thing? Or is there some way I can hint to the database that this behavior is something that it can reliably exploit?
I'm using MySQL 5.7. I'd prefer something standards compliant or portable, but I'm also open to MySQL specific workarounds.
EDIT:
In other words, what I want is for the column to be treated like an enum, but have the database figure out dynamically based on the data that shows up in the table what the different enum values are.
Every time you need an enum you should consider creating another table and referencing it; that's basic normalization. So create one table for the ServiceType, with a name field (VARCHAR) and an id field (INT). The actual observation table then just uses the id instead of the service name.
You can write a simple stored procedure to do the inserting and duplicate-name lookup, as well as a view to access the results, so that outside the DB you barely notice how it is handled internally.
Your stored procedure needs to do three things (see the sketch after this list):
Check if the service exists and insert it if not. INSERT IGNORE ... is probably your friend here.
Get the ID of the service with SELECT id INTO @serv_id FROM ServiceType WHERE name = [service_name];
Insert into the table with the service ID instead of the service.
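Putting the three steps together, a minimal sketch (the table and column names are assumptions, and ServiceType.name needs a UNIQUE key for INSERT IGNORE to behave as described):
DELIMITER //
CREATE PROCEDURE add_observation(IN p_service VARCHAR(64), IN p_observation INT)
BEGIN
  -- Step 1: create the service if it does not exist yet
  INSERT IGNORE INTO ServiceType (name) VALUES (p_service);
  -- Step 2: look up its id
  SELECT id INTO @serv_id FROM ServiceType WHERE name = p_service;
  -- Step 3: store the observation with the id instead of the text
  INSERT INTO Observation (ts, service_id, observation)
  VALUES (NOW(), @serv_id, p_observation);
END //
DELIMITER ;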
Don't over-optimize. The storage difference between TINYINT and INT is only a few bytes per row, so just use INT; it won't overflow until you have billions of services.
I think you have to create a new table to store the services, and then that table's primary key (service_id) can be used in place of the service text. The service column in the main table should be an integer type for storing the service id, so please change the service column type to INT.
Hope it will be helpful.
I'm designing a database (MySQL) that will manage a fleet of vehicles.
The company has many garages across the city, and at each garage vehicles get serviced (an operation). An operation can be any of 3 types of service.
Table Vehicle, Table Garage, Table Operation, Table Operation Type 1, Table Operation Type 2, Table Operation Type 3.
Each Operation has the vehicle ID and garage ID, but how do I link it to the other tables (service tables) depending on which type of service the user chooses?
I would also like to add a billing table, but I'm lost at how to design the relationship between these tables.
If I have understood it correctly, I would suggest something like this (first of all, you shouldn't have three operation tables):
Vehicles Table
- id
- garage_id
Garages Table
- id
Operations/Services Table
- id
- vehicle_id
- garage_id
- type
Customer Table
- id
- service_id
billings Table
- id
- customer_id
You need six tables:
vehicle: id, ...
garage: id, ...
operation: id, vehicle_id, garage_id, operation_type (which can be one of the three options/operations available, with the possibility to be extended)
customer: id, ...
billing: id, customer_id, total_amount
billingoperation: id, billing_id, operation_id, item_amount
You definitely should not create three tables for operations: in the future, introducing a new operation would then involve creating a new table in the database.
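A hedged DDL sketch of the two central tables (the types and exact column lists are assumptions, and the vehicle, garage and billing tables are assumed to exist):
CREATE TABLE operation (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  vehicle_id INT NOT NULL,
  garage_id INT NOT NULL,
  operation_type VARCHAR(32) NOT NULL,  -- or an FK to a small lookup table
  FOREIGN KEY (vehicle_id) REFERENCES vehicle(id),
  FOREIGN KEY (garage_id) REFERENCES garage(id)
);
CREATE TABLE billingoperation (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  billing_id INT NOT NULL,
  operation_id INT NOT NULL,
  item_amount DECIMAL(10,2) NOT NULL,
  FOREIGN KEY (billing_id) REFERENCES billing(id),
  FOREIGN KEY (operation_id) REFERENCES operation(id)
);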
For the record, I disagree with everyone who is saying you shouldn't have multiple operation tables. I think that's perfectly fine, as long as it is done properly. In fact, I'm doing that with one of my products right now.
If I understand, at the core of your question, you're asking how to do table inheritance, because Op Type 1 and Op Type 2 (etc.) IS A Operation. The short answer is that you can't. The longer answer is that you can't...at least not without some helper logic.
I assume you have some sort of program that will pull data from the database, rather than you just writing SQL commands by hand. Working under that assumption, let's use this as a subset of your database:
Garage
------
GarageId | GarageLocation | etc.
---------|----------------|------
1 | 123 Main St. | XX
Operation
---------
OperationId | GarageId | TimeStarted | TimeEnded | OperationTypeDescId | OperationTypeId
------------|----------|-------------|-----------|---------------------|----------------
2 | 1 | noon | NULL | 2 | 2
OperationTypeDesc
-------------
OperationTypeDescId | Name | Description
--------------------|-------|-------------------------
1 | OpFoo | Do things with the stuff
2 | OpBar | Do stuff with the things
OpFoo
-----
OpID | Thing1 | Thing2
-----|--------|-------
1 | 123 | abc
OpBar
-----
OpID | Stuff1 | Stuff2
-----|--------|-------
1 | 456 | def
2 | 789 | ghi
Using this setup, you have the following information:
A garage has its information, plain and simple
An operation has a unique ID (OperationId), a garage where it was executed, an ID referencing the description of the operation, and the OperationType ID (more on this in a moment).
A pre-populated table of operation types. Each type has a unique ID (OperationTypeDescId), the name of the operation, and a human-readable description of what that operation is.
One table for each row in OperationTypeDesc. For convenience, the table name should be the same as the Name column.
Now we can begin to see where inheritance comes into play. In the operation table, the OperationTypeId references the OpId of the relevant table...the "relevant table" is determined by the OperationTypeDescId.
An example: Let's say we had the above data set. In this example we know that there is an operation happening in a garage at 123 Main St. We know it started at noon, and has not yet ended. We know the type of operation is "OpBar". Since we know we're doing an OpBar operation instead of an OpFoo operation, we can focus on only the OpBar-relevant attributes, namely Stuff1 and Stuff2. Since the Operation's OperationTypeId is 2, we know that Stuff1 is 789 and Stuff2 is ghi.
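Here is roughly how that lookup reads in SQL once you know (via reflection or a hand-written branch) that the type-specific table is OpBar; the table and column names are taken from the sample set above:
-- Common fields from Operation plus the OpBar-specific attributes
SELECT o.OperationId, o.TimeStarted, o.TimeEnded, b.Stuff1, b.Stuff2
FROM Operation o
JOIN OpBar b ON b.OpID = o.OperationTypeId
WHERE o.OperationId = 2;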
Now the tricky part. In your program, this is going to require Reflection. If you don't know what that is, it's the practice of getting a Type from the NAME of that type. In our example, we know what table to look at (OpBar) because of its name in the OperationTypeDesc table. Put another way, you don't automatically know what table to look in; reflection tells you that information.
Edit:
Csaba says "In the future if you would like to introduce a new operation that would involve creating a new table in the database". That is correct. You would also need to add a new row to the OperationTypeDesc table. Csaba implies this is a bad thing, and I disagree - with a few provisions. If you are going to be adding a new operation type frequently, then yes, he makes a very good point: you don't want to be creating new tables constantly. If, however, you know ahead of time what types of operations will be performed, and will very rarely add new types of operations, then I maintain this is the way to go. All of your info common to all operations goes in the Operation table, and all op-specific info goes into the relevant "sub-table".
There is one more very important note regarding this. Because of how this is designed, you, the human, must be aware of the design. Whenever you create a new operation type, it's not as simple as creating the new table. Specifically, you have to make sure that the new table name and the OperationTypeDesc "Name" entry are the same. Think of it as an extra constraint - an "INTEGER" column can only contain ints, otherwise the db won't allow the data. In the same manner, the "Name" column can only contain the name of an existing table. You the human must be aware of that constraint, because it cannot be (easily) automatically enforced.
Which of the following options, if any, is considered best practice when designing a table used to store user settings?
(OPTION 1)
USER_SETTINGS
-Id
-Code (example "Email_LimitMax")
-Value (example "5")
-UserId
(OPTION 2)
create a new table for each setting where, for example, notification settings would require you to create:
"USER_ALERT_SETTINGS"
-Id
-UserId
-EmailAdded (i.e true)
-EmailRemoved
-PasswordChanged
...
...
"USER_EMAIL_SETTINGS"
-Id
-UserId
-EmailLimitMax
....
(OPTION 3)
"USER"
-Name
...
-ConfigXML
Other answers have ably outlined the pros and cons of your various options.
I believe that your Option 1 (property bag) is the best overall design for most applications, especially if you build in some protections against the weaknesses of property bags.
See the following ERD:
In the above ERD, the USER_SETTING table is very similar to OP's. The difference is that instead of varchar Code and Value columns, this design has a FK to a SETTING table which defines the allowable settings (Codes) and two mutually exclusive columns for the value. One option is a varchar field that can take any kind of user input, the other is a FK to a table of legal values.
The SETTING table also has a flag that indicates whether user settings should be defined by the FK or by unconstrained varchar input. You can also add a data_type to the SETTING to tell the system how to encode and interpret the USER_SETTING.unconstrained_value. If you like, you can also add the SETTING_GROUP table to help organize the various settings for user-maintenance.
This design allows you to table-drive the rules around what your settings are. This is convenient, flexible and easy to maintain, while avoiding a free-for-all.
EDIT: A few more details, including some examples...
Note that the ERD, above, has been augmented with more column details (range values on SETTING and columns on ALLOWED_SETTING_VALUE).
Here are some sample records for illustration.
SETTING:
+----+------------------+-------------+--------------+-----------+-----------+
| id | description | constrained | data_type | min_value | max_value |
+----+------------------+-------------+--------------+-----------+-----------+
| 10 | Favourite Colour | true | alphanumeric | {null} | {null} |
| 11 | Item Max Limit | false | integer | 0 | 9001 |
| 12 | Item Min Limit | false | integer | 0 | 9000 |
+----+------------------+-------------+--------------+-----------+-----------+
ALLOWED_SETTING_VALUE:
+-----+------------+--------------+-----------+
| id | setting_id | item_value | caption |
+-----+------------+--------------+-----------+
| 123 | 10 | #0000FF | Blue |
| 124 | 10 | #FFFF00 | Yellow |
| 125 | 10 | #FF00FF | Pink |
+-----+------------+--------------+-----------+
USER_SETTING:
+------+---------+------------+--------------------------+---------------------+
| id | user_id | setting_id | allowed_setting_value_id | unconstrained_value |
+------+---------+------------+--------------------------+---------------------+
| 5678 | 234 | 10 | 124 | {null} |
| 7890 | 234 | 11 | {null} | 100 |
| 8901 | 234 | 12 | {null} | 1 |
+------+---------+------------+--------------------------+---------------------+
From these tables, we can see that some of the user settings which can be determined are Favourite Colour, Item Max Limit and Item Min Limit. Favourite Colour is a pick list of alphanumerics. Item min and max limits are numerics with allowable range values set. The SETTING.constrained column determines whether users are picking from the related ALLOWED_SETTING_VALUEs or whether they need to enter a USER_SETTING.unconstrained_value. The GUI that allows users to work with their settings needs to understand which option to offer and how to enforce both the SETTING.data_type and the min_value and max_value limits, if they exist.
Using this design, you can table drive the allowable settings including enough metadata to enforce some rudimentary constraints/sanity checks on the values selected (or entered) by users.
EDIT: Example Query
Here is some sample SQL using the above data to list the setting values for a given user ID:
-- DDL and sample data population...
CREATE TABLE SETTING
(`id` int, `description` varchar(16)
, `constrained` varchar(5), `data_type` varchar(12)
, `min_value` varchar(6) NULL , `max_value` varchar(6) NULL)
;
INSERT INTO SETTING
(`id`, `description`, `constrained`, `data_type`, `min_value`, `max_value`)
VALUES
(10, 'Favourite Colour', 'true', 'alphanumeric', NULL, NULL),
(11, 'Item Max Limit', 'false', 'integer', '0', '9001'),
(12, 'Item Min Limit', 'false', 'integer', '0', '9000')
;
CREATE TABLE ALLOWED_SETTING_VALUE
(`id` int, `setting_id` int, `item_value` varchar(7)
, `caption` varchar(6))
;
INSERT INTO ALLOWED_SETTING_VALUE
(`id`, `setting_id`, `item_value`, `caption`)
VALUES
(123, 10, '#0000FF', 'Blue'),
(124, 10, '#FFFF00', 'Yellow'),
(125, 10, '#FF00FF', 'Pink')
;
CREATE TABLE USER_SETTING
(`id` int, `user_id` int, `setting_id` int
, `allowed_setting_value_id` int NULL
, `unconstrained_value` varchar(6) NULL)
;
INSERT INTO USER_SETTING
(`id`, `user_id`, `setting_id`, `allowed_setting_value_id`, `unconstrained_value`)
VALUES
(5678, 234, 10, 124, NULL),
(7890, 234, 11, NULL, '100'),
(8901, 234, 12, NULL, '1')
;
And now the DML to extract a user's settings:
-- Show settings for a given user
select
US.user_id
, S1.description
, S1.data_type
, case when S1.constrained = 'true'
then AV.item_value
else US.unconstrained_value
end value
, AV.caption
from USER_SETTING US
inner join SETTING S1
on US.setting_id = S1.id
left outer join ALLOWED_SETTING_VALUE AV
on US.allowed_setting_value_id = AV.id
where US.user_id = 234
See this in SQL Fiddle.
Option 1 (as noted, "property bag") is easy to implement - very little up-front analysis. But it has a bunch of downsides.
If you want to constrain the valid values for UserSettings.Code, you need an auxiliary table for the list of valid tags. So you have either (a) no validation on UserSettings.Code – your application code can dump any value in, missing the chance to catch bugs – or (b) the maintenance burden of keeping the list of valid tags up to date.
UserSettings.Value probably has a string data type to accommodate all the different values that might go into it. So you have lost the true data type – integer, Boolean, float, etc. – and the data type checking that would be done by the RDBMS on insert of an incorrect value. Again, you have bought yourself a potential QA problem. Even for string values, you have lost the ability to constrain the length of the column.
You cannot define a DEFAULT value on the column based on the Code. So if you wanted EmailLimitMax to default to 5, you can’t do it.
Similarly, you can’t put a CHECK constraint on the Values column to prevent invalid values.
The property bag approach loses validation of SQL code. In the named column approach, a query that says “select Blah from UserSettings where UserID = x” will get a SQL error if Blah does not exist. If the SELECT is in a stored procedure or view, you will get the error when you apply the proc/view – way before the time the code goes to production. In the property bag approach, you just get NULL. So you have lost another automatic QA feature provided by the database, and introduced a possible undetected bug.
As noted, a query to find a UserID where conditions apply on multiple tags becomes harder to write – it requires one join into the table for each condition being tested.
Unfortunately, the Property Bag is an invitation for application developers to just stick a new Code into the property bag without analysis of how it will be used in the rest of application. For a large application, this becomes a source of “hidden” properties because they are not formally modeled. It’s like doing your object model with pure tag-value instead of named attributes: it provides an escape valve, but you’re missing all the help the compiler would give you on strongly-typed, named attributes. Or like doing production XML with no schema validation.
The column-name approach is self-documenting. The list of columns in the table tells any developer what the possible user settings are.
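To make the contrast concrete, here is a minimal sketch of the named-column approach (the setting names are invented; note that MySQL parses but does not enforce CHECK constraints before 8.0.16):
CREATE TABLE UserSettings (
  UserId INT PRIMARY KEY,
  EmailLimitMax INT NOT NULL DEFAULT 5,      -- DEFAULT per setting, unlike a property bag
  EmailAdded BOOLEAN NOT NULL DEFAULT FALSE,
  PasswordChanged BOOLEAN NOT NULL DEFAULT FALSE,
  CHECK (EmailLimitMax BETWEEN 0 AND 100)    -- enforced from MySQL 8.0.16 on
);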
I have used property bags; but only as an escape valve and I have often regretted it. I have never said “gee, I wish I had made that explicit column be a property bag.”
Consider this simple example.
If you have 2 tables, UserTable (containing user details) and SettingsTable (containing settings details), then create a new table UserSettings relating UserTable and SettingsTable, as shown below.
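A minimal sketch of that junction table (the column names are assumptions):
CREATE TABLE UserSettings (
  user_id INT NOT NULL,
  setting_id INT NOT NULL,
  value VARCHAR(255),
  PRIMARY KEY (user_id, setting_id),
  FOREIGN KEY (user_id) REFERENCES UserTable(id),
  FOREIGN KEY (setting_id) REFERENCES SettingsTable(id)
);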
Hope you will find the right solution from this example.
Each option has its place, and the choice depends on your specific situation. I am comparing the pros and cons for each option below:
Option 1: Pros:
Can handle many options
New options can easily be added
A generic interface can be developed to manage the options
Option 1: Cons
When a new option is added, it's more complex to update all user accounts with the new option
Option names can spiral out of control
Validation of allowed option values is more complex; additional metadata is needed for that
Option 2: Pros
Validation of each option is easier than option 1 since each option is an individual column
Option 2: Cons
A database update is required for each new option
With many options the database tables could become more difficult to use
It's hard to evaluate "best" because it depends on the kind of queries you want to run.
Option 1 (commonly known as "property bag", "name value pairs" or "entity-attribute-value" or EAV) makes it easy to store data whose schema you don't know in advance. However, it makes it hard - or sometimes impossible - to run common relational queries. For instance, imagine running the equivalent of
select count(*)
from USER_ALERT_SETTINGS
where EmailAdded = 1
and Email_LimitMax > 5
This would rapidly become very convoluted, especially because your database engine may not compare varchar fields in a numerically meaningful way (so "> 5" may not work the way you expect).
I'd work out the queries you want to run, and see which design supports those queries best. If all you have to do is check limits for an individual user, the property bag is fine. If you have to report across all users, it's probably not.
The same goes for JSON or XML - it's okay for storing individual records, but makes querying or reporting over all users harder. For instance, imagine searching for the configuration settings for email address "bob@domain.com" - this would require searching through all XML documents to find the "email address" node.
I'm new to MySQL. Right now, I have this kind of structure in my MySQL database:
| keyID | Param | Value
| 123 | Location | Canada
| 123 | Cost | 34
| 123 | TransportMethod | Boat
...
...
I have probably 20 params with unique values for each keyID. I want to be able to search in MySQL, given the 20 params with their values, and figure out which keyID matches.
Firstly, how should I restructure the MySQL database? Should I have 20 param columns plus keyID?
Secondly (this relates to the first question), how would I do the query to find the keyID?
If your params are identical across different keys (or all params are a subset of some set of params that the objects may have), you should structure the database so that each column is a param and each row corresponds to one keyID with the values of its params.
|keyID|Location|Cost|TransportMethod|...|...
|123 |Canada |34 |Boat ...
|124 | ...
...
Then to query for the keyID you would use a SELECT statement with a WHERE clause, such as:
SELECT keyID
FROM key_table
WHERE Location='Canada'
AND Cost=34
AND TransportMethod='Boat'
...
for more info see http://www.w3schools.com/php/php_mysql_where.asp
Edit: if your params change across different objects (keyIDs), this will require a different approach, I think.
The design you show is called Entity-Attribute-Value. It breaks many rules of relational database design, and it's very hard to use with SQL.
In a relational database, you should have a separate column for each attribute type.
CREATE TABLE MyTable (
keyID SERIAL PRIMARY KEY,
Location VARCHAR(20),
Cost NUMERIC(9,2),
TransportMethod VARCHAR(10)
);
I agree that Nick's answer is probably best, but if you really want to keep your key/value format, you could accomplish what you want with a view (this is in PostgreSQL syntax, because that's what I'm familiar with, but the concept is the same for MySQL):
CREATE OR REPLACE VIEW myview AS
SELECT keyID,
MAX(CASE WHEN Param = 'Location' THEN Value END) AS Location,
MAX(CASE WHEN Param = 'Cost' THEN Value END) AS Cost,
....
FROM mytable
GROUP BY keyID;
Performance here is likely to be dismal, but if your queries are not frequent, it could get the job done.
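Once the view is in place, a lookup reads like the wide-table design; for example, using the sample data from the question (Value is stored as text, so the number is quoted):
SELECT keyID
FROM myview
WHERE Location = 'Canada'
  AND Cost = '34';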
I've got two tables
A:
plant_ID | name
1 | tree
2 | shrubbery
20 | notashrubbery
B:
area_ID | name | plants
1 | forrest | *needhelphere*
Now I want the area to store any number of plants, in a specific order, and some plants might show up a number of times: e.g. 2,20,1,2,2,20,1
What's the most efficient way to store this array of plants?
Keeping in mind that if I perform a search to find areas with plant 2, I shouldn't get areas which are e.g. 1,20,232,12,20 (pad with leading 0s?). What would be the query for that?
If it helps, let's assume I have a database of no more than 99999999 different plants. And yes, this question doesn't have anything to do with plants....
Bonus Question
Is it time to step away from MySQL? Is there a better DB to manage this?
If you're going to be searching both by forest and by plant, sounds like you would benefit from a full-on many-to-many relationship. Ditch your plants column, and create a whole new areas_plants table (or whatever you want to call it) to relate the two tables.
If area 1 has plants 1 and 2, and area 2 has plants 2 and 3, your areas_plants table would look like this:
area_id | plant_id | sort_idx
-----------------------------
1 | 1 | 0
1 | 2 | 1
2 | 2 | 0
2 | 3 | 1
You can then look up relationships from either side, and use simple JOINs to get the relevant data from either table. No need to muck about in LIKE conditions to figure out if it's in the list, blah, bleh, yuck. I've been there for a legacy database. No fun. Use SQL to its greatest potential.
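Retrieving an area's plants in their stored order is then a simple join; a sketch, assuming the question's plant table is named plants:
-- List area 1's plants in the order they were stored
SELECT p.name
FROM areas_plants ap
JOIN plants p ON p.plant_ID = ap.plant_id
WHERE ap.area_id = 1
ORDER BY ap.sort_idx;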
How about this:
table: plants
plant_ID | name
1 | tree
2 | shrubbery
20 | notashrubbery
table: areas
area_ID | name
1 | forest
table: area_plant_map
area_ID | plant_ID | sequence
1 | 1 | 0
1 | 2 | 1
1 | 20 | 2
That's the standard normalized way to do it (with a mapping table).
To find all areas with a shrubbery (plant 2), do this:
SELECT *
FROM areas
INNER JOIN area_plant_map ON areas.area_ID = area_plant_map.area_ID
WHERE plant_ID = 2
You know this violates normal form?
Typically, one would have an areaplants table: area_ID, plant_ID with a unique constraint on the two and foreign keys to the other two tables. This "link" table is what gives you many-many or many-to-one relationships.
Queries on this are generally very efficient, they utilize indexes and do not require parsing strings.
Eight years after this question was asked, here are two ideas:
1. Use the JSON type
As of MySQL 5.7.8, MySQL supports a native JSON data type defined by RFC 7159 that enables efficient access to data in JSON (JavaScript Object Notation) documents.
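A sketch of what that could look like here (the table and column definitions are assumptions; requires MySQL 5.7.8+):
-- Store the ordered plant list as a JSON array
CREATE TABLE areas (
  area_ID INT PRIMARY KEY,
  name VARCHAR(255),
  plants JSON  -- e.g. '[2, 20, 1, 2, 2, 20, 1]'
);
-- Find areas containing plant 2. JSON_CONTAINS matches whole members,
-- so unlike LIKE '%2%' it will not falsely match 20 or 232.
SELECT area_ID FROM areas WHERE JSON_CONTAINS(plants, '2');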
2. Use your own codification
Turn the plants column into a string field (VARCHAR or TEXT, your choice; think about performance). Then you can represent the list as, for example, -21-30-2-4-20-, and filter using LIKE '%-2-%'.
If you somehow try one of these, I'd love it if you shared your performance results, with 100M rows as you suggested.
--
Remember that using either of these breaks the first rule of normalization, which says every column should hold a single value.
Your relation attributes should be atomic, not made up of multiple values like lists; lists are too hard to search. You need a new relation that maps the plants to the area_ID, where the area_ID/plant combination is the primary key.
Use many-to-many relationship:
CREATE TABLE plant (
plant_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255)
) ENGINE=INNODB;
CREATE TABLE area (
area_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255)
) ENGINE=INNODB;
CREATE TABLE plant_area_xref (
plant_id INT NOT NULL,
area_id INT NOT NULL,
sort_idx INT NOT NULL,
FOREIGN KEY (plant_id) REFERENCES plant(plant_id) ON DELETE CASCADE,
FOREIGN KEY (area_id) REFERENCES area(area_id) ON DELETE CASCADE,
PRIMARY KEY (plant_id, area_id, sort_idx)
) ENGINE=INNODB;
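A usage sketch against these tables, storing part of the question's ordered list for one area (the plant and area rows are assumed to exist already):
-- The same plant may appear several times at different sort positions
INSERT INTO plant_area_xref (plant_id, area_id, sort_idx)
VALUES (2, 1, 0), (20, 1, 1), (1, 1, 2), (2, 1, 3);
-- All areas containing plant 2, with no string parsing involved
SELECT DISTINCT area_id FROM plant_area_xref WHERE plant_id = 2;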
EDIT:
Just to answer your bonus question:
Bonus Question Is it time to step away from MySQL? Is there a better DB to manage this?
This has nothing to do with MySQL; it was just an issue of bad database design. You should use intersection tables and many-to-many relationships for cases like this in every RDBMS (MySQL, Oracle, MSSQL, PostgreSQL, etc.).