I am trying to make a simple item database in MySQL for a game. Here is what my three tables would look like:
items
itemId | itemName
-------------------
0001 | chest piece
0002 | sword
0003 | helmet
attributes (attribute lookup table)
attributeId | attributeName
---------------------------------
01 | strength
02 | agility
03 | intellect
04 | defense
05 | damage
06 | mana
07 | stamina
08 | description
09 | type
item_attributes (junction table)
itemId | attributeId | value (mixed type, bad?)
------------------------------------
0001 | 01 | 35
0001 | 03 | 14
0001 | 09 | armor
0001 | 08 | crafted by awesome elves
0002 | 09 | weapon
0002 | 05 | 200
0002 | 02 | 15
0002 | 08 | your average sword
0003 | 04 | 9000
0003 | 09 | armor
0003 | 06 | 250
My problem with this design is that the value column in the item_attributes table needs to use a varchar data type, since a value can be an int, char, or varchar. I think this is a bad approach because I would not be able to quickly sort my items by a particular attribute. It would also suffer a performance hit on a query such as "get items whose strength attribute has a value between 15 and 35".
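For example (a sketch using the table names above), even that simple range query forces a cast on every row, so no index on `value` can help:

```sql
-- Get items whose strength (attributeId 01) is between 15 and 35.
-- `value` is varchar, so it must be cast row by row.
SELECT i.itemId, i.itemName
FROM items AS i
JOIN item_attributes AS ia ON ia.itemId = i.itemId
WHERE ia.attributeId = 1              -- strength
  AND CAST(ia.value AS UNSIGNED) BETWEEN 15 AND 35;
```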
Here is my potential fix. I simply added a data_type column to the attributes table. So it would look something like this
attributes (attribute lookup table)
attributeId | attributeName | data_type
---------------------------------------------------
01 | strength | int
09 | type | char
08 | description | varchar
Then I would add three more columns to the item_attributes table: int, char, and varchar. Here is how the new item_attributes table would look.
item_attributes (junction table)
itemId | attributeId | value              | int  | char   | varchar
--------------------------------------------------------------------
0002   | 09          | weapon             | null | weapon | null
0002   | 05          | 200                | 200  | null   | null
0002   | 02          | 15                 | 15   | null   | null
0002   | 08          | your average sword | null | null   | your average sword
So now if I were to sort items by their strength attribute, I would use the int column; or to search for an item by its description, I would search the varchar column.
I still, however, think this design is a bit awkward. Now I have to look up the data_type column in the attributes table and dynamically determine which column in item_attributes is relevant to what I am looking for.
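As a sketch of that layout (the type-specific columns are renamed here, since int, char, and varchar are reserved words in MySQL; exactly one of the three *_value columns would be non-null per row, chosen according to attributes.data_type):

```sql
CREATE TABLE item_attributes (
  itemId        INT NOT NULL,
  attributeId   INT NOT NULL,
  int_value     INT          NULL,
  char_value    CHAR(16)     NULL,
  varchar_value VARCHAR(255) NULL,
  PRIMARY KEY (itemId, attributeId),
  FOREIGN KEY (itemId)      REFERENCES items (itemId),
  FOREIGN KEY (attributeId) REFERENCES attributes (attributeId)
);
```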
Any inputs would be greatly appreciated.
Thanks in advance.
EDIT 11/29/2010
Here is a detailed list of my items
--------------------------------------
http://wow.allakhazam.com/ihtml?27718
Aldor Defender's Legplates
Binds when picked up
LegsPlate
802 Armor
+21 Strength
+14 Agility
+21 Stamina
Item Level 99
Equip: Improves hit rating by 14.
--------------------------------------
http://wow.allakhazam.com/ihtml?17967
Refined Scale of Onyxia
Leather
Item Level 60
--------------------------------------
http://wow.allakhazam.com/ihtml?27719
Aldor Leggings of Puissance
Binds when picked up
LegsLeather
202 Armor
+15 Agility
+21 Stamina
Item Level 99
Equip: Increases attack power by 28.
Equip: Improves hit rating by 20.
--------------------------------------
http://wow.allakhazam.com/ihtml?5005
Emberspark Pendant
Binds when equipped
NeckMiscellaneous
+2 Stamina
+7 Spirit
Requires Level 30
Item Level 35
--------------------------------------
http://wow.allakhazam.com/ihtml?23234
Blue Bryanite of Agility
Gems
Requires Level 2
Item Level 10
+8 Agility
--------------------------------------
http://wow.allakhazam.com/ihtml?32972
Beer Goggles
Binds when picked up
Unique
HeadMiscellaneous
Item Level 10
Equip: Guaranteed by Belbi Quikswitch to make EVERYONE look attractive!
--------------------------------------
http://wow.allakhazam.com/ihtml?41118
Gadgetzan Present
Binds when picked up
Unique
Item Level 5
"Please return to a Season Organizer"
--------------------------------------
http://wow.allakhazam.com/ihtml?6649
Searing Totem Scroll
Unique
Quest Item
Requires Level 10
Item Level 10
Use:
--------------------------------------
http://wow.allakhazam.com/ihtml?6648
Stoneskin Totem Scroll
Unique
Quest Item
Requires Level 4
Item Level 4
Use:
--------------------------------------
http://wow.allakhazam.com/ihtml?27864
Brian's Bryanite of Extended Cost Copying
Gems
Item Level 10
gem test enchantment
--------------------------------------
EDIT #2
These 10 examples are not representative of all 35316 items data that I have collected.
NeckMiscellaneous means that item is in both categories of `Neck` and `Misc`.
Unique means only one such item can be used on a character.
Don't read too much into the "Action" entries; they are just quest descriptions.
When an item says `Equip: increase attack power by 28` it just means +28 attack power on the player character. It is the same as +15 agility.
There are a total of 241884 one-to-many item-attribute records, so that comes to about 241884/35316 ~= 8 attributes per item on average. Also, the data is mined from the website into a gigantic text file. There is NO "well formed" information identifying an item's type or category, so if the word "sword" appears on either the 3rd or 4th line, the item is automatically categorized as a sword.
The item might get changed on each new update of the game.
There is no universal attribute shared among the items besides `name`.
The item data is accessible through a web app. Unclear about what you mean by bits and vectors?
The regular expressions are used during the data-mining stage to clean up special characters and search for specific keywords in order to categorize the items, and also to extract attribute names and values. For example, from "+15 agility" the string "agility" is extracted as the attribute name and 15 as the value. (I don't understand much about questions 6 and 6.1. Does slog stand for server log here? Translate regexes to SQL?)
Model Diagram
Here is an example how a query looks like
select *
from itemattributestat
where item_itemId=251
item_itemId | attribute_attributeId | value | listOrder
=======================================================
'251', '9', '0', '1'
'251', '558', '0', '2'
'251', '569', '0', '3'
'251', '4', '802', '4'
'251', '583', '21', '5'
'251', '1', '14', '6'
'251', '582', '21', '7'
'251', '556', '99', '8'
'251', '227', '14', '9'
The list order is there to keep track of which attribute should be listed first, for formatting purposes.
create view itemDetail as
select Item_itemId as id, i.name as item, a.name as attribute, value
from ((itemattributestat join item as i on Item_itemId=i.itemId)
join attribute as a on Attribute_attributeId=a.attributeId)
order by Item_itemId asc, listOrder asc;
The above view produces the following with
select *
from itemdetail
where id=251;
id | item | attribute | value
'251', 'Aldor Defender''s Legplates', 'Binds when picked up', '0'
'251', 'Aldor Defender''s Legplates', 'Legs', '0'
'251', 'Aldor Defender''s Legplates', 'Plate', '0'
'251', 'Aldor Defender''s Legplates', 'Armor', '802'
'251', 'Aldor Defender''s Legplates', 'Strength', '21'
'251', 'Aldor Defender''s Legplates', 'Agility', '14'
'251', 'Aldor Defender''s Legplates', 'Stamina', '21'
'251', 'Aldor Defender''s Legplates', 'Item Level', '99'
'251', 'Aldor Defender''s Legplates', 'Equip: Improves hit rating by ##.', '14'
An attribute with value 0 means the attribute name represents the item type. In 'Equip: Improves hit rating by ##.', '14', the ## is a placeholder; the processed output in a browser will be 'Equip: Improves hit rating by 14.'
Why do you have an attribute table ?
Attributes are columns, not tables.
The website link tells us nothing.
The whole idea of a database is that you join the many small tables, as required, for each query, so you need to get used to that. Sure, it gives you a grid, but a short and sweet one, without Nulls. What you are trying to do is avoid tables; go with just one massive grid, which is full of Nulls.
(snip)
Do not prefix your attribute names (column names) with the table name, that is redundant. This will become clear to you when you start writing SQL which uses more than one table: then you can use the table name or an alias to prefix any column names that are ambiguous.
The exception is the PK, which is rendered fully, and used in that form wherever it is an FK.
Browse the site, and read some SQL questions.
After doing that, later on, you can think about whether you want strength and defense to be attributes (columns) of type; or not. Et cetera.
Responses to Comments 30 Nov 10
Excellent, you understand your data. Right. Now I understand why you had an Attribute table.
Please make sure those 10 examples are representative, I am looking at them closely.
Type:Gem Name:Emberspark Pendant ... Or, is NeckMiscellaneous a type ?
Is Unique a true ItemType ? I think Not
Action.Display "Please return to a Season Organizer"
Where are the Attributes for AttackPower and HitRating ?
How many different types of items (of 35,000) are there, ala my Product Cluster example. Another way of stating that question is, how many variations are there. I mean, meaningfully, not 3500 Items ÷ 8 Attributes ?
Will the item_attributes change without a release of s/w (eg. a new Inner Strength attribute) ?
Per Item, what Attributes are repeating (more than one); so far I see only Action ?
It is a game, so you need a db that is tight and very fast, maybe fully memory resident, right. No Nulls. No VAR Anything. Shortest Datatypes. Never Duplicate Anything (Don't Repeat Yourself). Are you happy with bits (booleans) and vectors ?
Do you need to easily translate those regexes into SQL, or are you happy with a serious slog for each (ie. once you get them working in SQL they are pretty stable and then you don't mess with it, unless you find a bug) (no sarcasm, serious question) ?
6.1 Or maybe it is the other way round: the db is disk-resident; you load it into memory once; you run the regexes on that during gameplay; occasionally writing to disk. Therefore there is no need to translate the regexes to SQL ?
Here's a Data Model of where I am heading; this is not at all certain; it will be modulated by your answers. To be clear:
Sixth Normal Form is The Row consists of the Primary Key and, at most, one Attribute.
I have drawn (6.1) not (6), because your data reinforces my belief that you need a pure 6NF Relational database
My Product Cluster Data Model, the better-than-EAV example, is 6NF, then Normalised again (Not in the Normal Form sense) by DataType, to reduce no of tables, which you have already seen. (EAV people usually go for one or a few gigantic tables.)
This is straight 5NF, with only the 2 tables on the right in 6NF.
Link to Game Data Model
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
Response to Edit #2 05 Dec 10
1.1. Ok, corrected.
1.2. Then IsUnique is an Indicator (boolean) for Item.
1.3. Action. I understand. So Where are you going to store it ?
1.4. NeckMiscellaneous means that item is in both categories of Neck and Misc. That means two separate Item.Name=Emberspark Pendant, each with a different Category.
2. and 5. So you do need a fast, memory-resident db. That's why I am trying to get you across the line, away from GridLand, into RelationalLand.
3. Ok, we stay with Fifth Normal Form; no need for 6NF or the Product Cluster (tables per Datatype). So far the Values are all Integers.
4. I can see additionally: Level, RequiredLevel, IsUnique, BindsPickedUp, BindsEquipped.
5. Bits are booleans { 0 | 1 }. Vectors are required for (Relational) projections. We will get to them later.
6. Ok, you've explained: you are not translating regular expressions to SQL. (Slog means hard labour.)
7. What is Category.ParentId ? Parent Category ? That has not come up before.
8. Attribute.GeneratedId ?
Please evaluate the Data Model (Updated). I have a few more columns, in addition to what you have in yours. If there is anything you do not understand in the Data Model, ask a specific question. You've read the Notation document, right ?
I have Action as a table, with ItemAction holding the Value:
Equip: increase attack power by 28 is Action.Name=Increase attack power by and ItemAction.Value=28.
I think having the data_type column just further complicates the design. Why not simply have type and description be columns on the items table? It stands to reason that every item would have each of those values, and if it doesn't then a null would do just fine in a text column.
You can even further normalize the type by having an item_types table and the type column in items would be a numeric foreign key to that table. Might not be necessary, but might make it easier to key off of the type on the items table.
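A minimal sketch of that normalization (names are illustrative):

```sql
CREATE TABLE item_types (
  typeId   INT PRIMARY KEY,
  typeName VARCHAR(32) NOT NULL UNIQUE
);

CREATE TABLE items (
  itemId      INT PRIMARY KEY,
  itemName    VARCHAR(64)  NOT NULL,
  description VARCHAR(255) NULL,   -- NULL is fine when an item has none
  typeId      INT NULL,
  FOREIGN KEY (typeId) REFERENCES item_types (typeId)
);
```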
Edit:
Thinking about this further, it seems like you may be trying to have your data tables match a domain model. Your items would have a series of attributes on them in the logic of the application. This is fine. Keep in mind that your application logic and your database persistence layout can be different. In fact, they should not rely on each other at all at a design level. In many small applications they will likely be the same. But there are exceptions. Code (presumably object-oriented, but not necessarily) and relational data have different designs and different limitations. De-coupling them from one another allows the developer to take advantage of their designs rather than be hindered by their limitations.
You are dealing with two common problems:
Entities that are similar to each other but not identical (all items have a name and description, but not necessarily an intellect).
A design in which you need to add attributes once the database is in production (you can pretty easily predict that at some point you'll need to add, for instance, a magic-resistance attribute to some items).
You've solved your problem by reinventing the EAV system in which you store both attribute names and values as data. And you've rediscovered some of the problems with this system (type checking, relational integrity).
In this case, I'd personally go with a solution midway between the relational one and the EAV one. I'd take the common columns and add them, as columns, to either the items table or, if items represents kinds of items, not individual ones, to the items_owners table. Those columns would include description and possibly type and in the example you gave, pretty much match up with the text columns. I'd then keep the existing layout for those attributes that are numerical ratings, making the value type int. That gives you type-checking and proper normalization across the integer attributes (you won't be storing lots of NULLs) at the expense of the occasional NULL type or description.
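A sketch of that midway design: the text-like data moves onto items, and the junction table keeps only the numeric attributes with a true INT value (attribute IDs as in the question):

```sql
CREATE TABLE item_attributes (
  itemId      INT NOT NULL,
  attributeId INT NOT NULL,
  value       INT NOT NULL,            -- always an integer now
  PRIMARY KEY (itemId, attributeId),
  KEY idx_attr_value (attributeId, value)
);

-- The earlier range query is now sargable:
SELECT itemId
FROM item_attributes
WHERE attributeId = 1                  -- strength
  AND value BETWEEN 15 AND 35;
```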
Related
Say that I have an e-commerce site with 100 kinds of products. Now I'm offering a voucher for customers. The voucher can be used for some of the products or for all of them.
So, the table I have now is like this:
| Voucher |
------------------
| id |
| voucher_number |
| created_at |
| expired_date |
| status | (available, unavailable)
| Voucher_detail |
------------------
| id |
| id_voucher |
| product_id |
So, the question is: if the voucher is set to be available for all products, there will be 100 records in voucher_detail, because there are 100 products. Isn't that a waste, given that the voucher will only be used for one product?
Or is there another database design that is better than this one?
Well, I don't think there is a better design for your situation. I mean, when designing the database, it should fit all the use-cases, and this way, you cover them all.
Of course, when having a voucher for all the products it needs 100 rows, but this perfectly suits the situation where you have the voucher for like 5 products, and it is a secure and sure way to know exactly each voucher for what products it can be used.
I was thinking that maybe you could keep all the products for a voucher in one column, separated by ',', and split them when reading (if you really care about size), but that just doesn't feel right.
You probably don't generate more than a small subset of the 2^100 - 1 possible combinations, correct? Perhaps you have only a few, and they can be itemized as "all", "clothing", "automotive", etc.? If so, consider working around that.
If you really need true/false for 100 things, consider the SET datatype. Since it is restricted to 64 bits, you will need at least 2 SET columns. Again, you may decide to 'group' the bits in some logical way. At one bit per item, we are talking about around a dozen bytes for the entire 100 choices.
"all" would probably be represented as every bit being on.
With the SET approach, you would have string names for each of the 100 cases. The SET would be any combination of them. Please study the manual on SET; the manipulations are a bit strange.
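A rough sketch of the SET idea (product names here are made up; MySQL caps a SET at 64 members, hence the two columns):

```sql
CREATE TABLE voucher (
  id             INT PRIMARY KEY,
  voucher_number VARCHAR(32),
  products_a     SET('p001','p002','p003'),   -- members 1..64 in practice
  products_b     SET('p065','p066','p067')    -- members 65..100
);

-- Does voucher 7 cover product p002?
SELECT FIND_IN_SET('p002', products_a) > 0 AS covered
FROM voucher
WHERE id = 7;
```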
I'm trying to design a first serious database and I arrived upon a problem. Here's a quick overview – I have event reports in .csv files, which I parse. They contain user card numbers, points and final places. I have no problem filling that data into my database, but I would also like to store in which events the users exist and how many points they have in each of them, instead of just adding up to a total sum of points across all their events.
So I thought I would make a table called events, which would have a column id (primary key, a_i), event_name (to be displayed on frontend next to respective points) and event_date (same thing, basically).
The next table would be called users, and it would have a column card_number (unique key), total_points (to display the total sum of all points), event_ids and points. There lies the problem – I would like to have in the last two columns a list of comma-separated values, which would "tie-in" with each other. Example:
`event_ids` = (2, 4, 12, 43)
`points` = (202, 11, 444, 1)
So that when I get this info in frontend, I would just loop through these values and get the event_id (which would be the same as id in the events table) and get all the info I need. However, it seems that list of values in a column in MySQL database is a big NO-NO.
So, how do I do this right? I hope you understand my problem; thank you in advance. I'd like to begin with best practices, and I would like an example.
I'm thinking something like this:
user_id | event_id | points
3 | 5 | 250
3 | 12 | 120
3 | 1 | 200
3 | 52 | 40
6 | 2 | 101
6 | 5 | 3
How would I do this?
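That table is exactly the junction-table answer; a sketch of the DDL and the queries it enables (names assumed from your description):

```sql
CREATE TABLE user_events (
  user_id  INT NOT NULL,
  event_id INT NOT NULL,
  points   INT NOT NULL,
  PRIMARY KEY (user_id, event_id),
  FOREIGN KEY (event_id) REFERENCES events (id)
);

-- Per-event points for user 3, with the event details for the frontend:
SELECT e.event_name, e.event_date, ue.points
FROM user_events AS ue
JOIN events AS e ON e.id = ue.event_id
WHERE ue.user_id = 3;

-- total_points no longer needs its own column; derive it:
SELECT SUM(points) AS total_points
FROM user_events
WHERE user_id = 3;
```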
Which of the following options, if any, is considered best practice when designing a table used to store user settings?
(OPTION 1)
USER_SETTINGS
-Id
-Code (example "Email_LimitMax")
-Value (example "5")
-UserId
(OPTION 2)
create a new table for each setting where, for example, notification settings would require you to create:
"USER_ALERT_SETTINGS"
-Id
-UserId
-EmailAdded (i.e true)
-EmailRemoved
-PasswordChanged
...
...
"USER_EMAIL_SETTINGS"
-Id
-UserId
-EmailLimitMax
....
(OPTION 3)
"USER"
-Name
...
-ConfigXML
Other answers have ably outlined the pros and cons of your various options.
I believe that your Option 1 (property bag) is the best overall design for most applications, especially if you build in some protections against the weaknesses of property bags.
See the following ERD:
In the above ERD, the USER_SETTING table is very similar to OP's. The difference is that instead of varchar Code and Value columns, this design has a FK to a SETTING table which defines the allowable settings (Codes) and two mutually exclusive columns for the value. One option is a varchar field that can take any kind of user input, the other is a FK to a table of legal values.
The SETTING table also has a flag that indicates whether user settings should be defined by the FK or by unconstrained varchar input. You can also add a data_type to the SETTING to tell the system how to encode and interpret the USER_SETTING.unconstrained_value. If you like, you can also add the SETTING_GROUP table to help organize the various settings for user-maintenance.
This design allows you to table-drive the rules around what your settings are. This is convenient, flexible and easy to maintain, while avoiding a free-for-all.
EDIT: A few more details, including some examples...
Note that the ERD, above, has been augmented with more column details (range values on SETTING and columns on ALLOWED_SETTING_VALUE).
Here are some sample records for illustration.
SETTING:
+----+------------------+-------------+--------------+-----------+-----------+
| id | description | constrained | data_type | min_value | max_value |
+----+------------------+-------------+--------------+-----------+-----------+
| 10 | Favourite Colour | true | alphanumeric | {null} | {null} |
| 11 | Item Max Limit | false | integer | 0 | 9001 |
| 12 | Item Min Limit | false | integer | 0 | 9000 |
+----+------------------+-------------+--------------+-----------+-----------+
ALLOWED_SETTING_VALUE:
+-----+------------+--------------+-----------+
| id | setting_id | item_value | caption |
+-----+------------+--------------+-----------+
| 123 | 10 | #0000FF | Blue |
| 124 | 10 | #FFFF00 | Yellow |
| 125 | 10 | #FF00FF | Pink |
+-----+------------+--------------+-----------+
USER_SETTING:
+------+---------+------------+--------------------------+---------------------+
| id | user_id | setting_id | allowed_setting_value_id | unconstrained_value |
+------+---------+------------+--------------------------+---------------------+
| 5678 | 234 | 10 | 124 | {null} |
| 7890 | 234 | 11 | {null} | 100 |
| 8901 | 234 | 12 | {null} | 1 |
+------+---------+------------+--------------------------+---------------------+
From these tables, we can see that some of the user settings which can be determined are Favourite Colour, Item Max Limit and Item Min Limit. Favourite Colour is a pick list of alphanumerics. Item min and max limits are numerics with allowable range values set. The SETTING.constrained column determines whether users are picking from the related ALLOWED_SETTING_VALUEs or whether they need to enter a USER_SETTING.unconstrained_value. The GUI that allows users to work with their settings needs to understand which option to offer and how to enforce both the SETTING.data_type and the min_value and max_value limits, if they exist.
Using this design, you can table drive the allowable settings including enough metadata to enforce some rudimentary constraints/sanity checks on the values selected (or entered) by users.
EDIT: Example Query
Here is some sample SQL using the above data to list the setting values for a given user ID:
-- DDL and sample data population...
CREATE TABLE SETTING
(`id` int, `description` varchar(16)
, `constrained` varchar(5), `data_type` varchar(12)
, `min_value` varchar(6) NULL , `max_value` varchar(6) NULL)
;
INSERT INTO SETTING
(`id`, `description`, `constrained`, `data_type`, `min_value`, `max_value`)
VALUES
(10, 'Favourite Colour', 'true', 'alphanumeric', NULL, NULL),
(11, 'Item Max Limit', 'false', 'integer', '0', '9001'),
(12, 'Item Min Limit', 'false', 'integer', '0', '9000')
;
CREATE TABLE ALLOWED_SETTING_VALUE
(`id` int, `setting_id` int, `item_value` varchar(7)
, `caption` varchar(6))
;
INSERT INTO ALLOWED_SETTING_VALUE
(`id`, `setting_id`, `item_value`, `caption`)
VALUES
(123, 10, '#0000FF', 'Blue'),
(124, 10, '#FFFF00', 'Yellow'),
(125, 10, '#FF00FF', 'Pink')
;
CREATE TABLE USER_SETTING
(`id` int, `user_id` int, `setting_id` int
, `allowed_setting_value_id` varchar(6) NULL
, `unconstrained_value` varchar(6) NULL)
;
INSERT INTO USER_SETTING
(`id`, `user_id`, `setting_id`, `allowed_setting_value_id`, `unconstrained_value`)
VALUES
(5678, 234, 10, '124', NULL),
(7890, 234, 11, NULL, '100'),
(8901, 234, 12, NULL, '1')
;
And now the DML to extract a user's settings:
-- Show settings for a given user
select
US.user_id
, S1.description
, S1.data_type
, case when S1.constrained = 'true'
then AV.item_value
else US.unconstrained_value
end value
, AV.caption
from USER_SETTING US
inner join SETTING S1
on US.setting_id = S1.id
left outer join ALLOWED_SETTING_VALUE AV
on US.allowed_setting_value_id = AV.id
where US.user_id = 234
See this in SQL Fiddle.
Option 1 (as noted, "property bag") is easy to implement - very little up-front analysis. But it has a bunch of downsides.
If you want to restrict the valid values for UserSettings.Code, you need an auxiliary table holding the list of valid tags. So you have either (a) no validation on UserSettings.Code, meaning your application code can dump any value in and you miss the chance to catch bugs, or (b) the ongoing maintenance of that list of valid tags.
UserSettings.Value probably has a string data type to accommodate all the different values that might go into it. So you have lost the true data type (integer, Boolean, float, etc.) and the data type checking that would be done by the RDBMS on insert of an incorrect value. Again, you have bought yourself a potential QA problem. Even for string values, you have lost the ability to constrain the length of the column.
You cannot define a DEFAULT value on the column based on the Code. So if you wanted EmailLimitMax to default to 5, you can’t do it.
Similarly, you can’t put a CHECK constraint on the Values column to prevent invalid values.
The property bag approach loses validation of SQL code. In the named column approach, a query that says “select Blah from UserSettings where UserID = x” will get a SQL error if Blah does not exist. If the SELECT is in a stored procedure or view, you will get the error when you apply the proc/view – way before the time the code goes to production. In the property bag approach, you just get NULL. So you have lost another automatic QA feature provided by the database, and introduced a possible undetected bug.
As noted, a query to find a UserID where conditions apply on multiple tags becomes harder to write – it requires one join into the table for each condition being tested.
Unfortunately, the Property Bag is an invitation for application developers to just stick a new Code into the property bag without analysis of how it will be used in the rest of application. For a large application, this becomes a source of “hidden” properties because they are not formally modeled. It’s like doing your object model with pure tag-value instead of named attributes: it provides an escape valve, but you’re missing all the help the compiler would give you on strongly-typed, named attributes. Or like doing production XML with no schema validation.
The column-name approach is self-documenting. The list of columns in the table tells any developer what the possible user settings are.
I have used property bags; but only as an escape valve and I have often regretted it. I have never said “gee, I wish I had made that explicit column be a property bag.”
Consider this simple example.
Say you have two tables: UserTable (containing user details) and SettingsTable (containing settings details). Then create a new table, UserSettings, relating UserTable and SettingsTable as shown below.
Hope you will find the right solution from this example.
Each option has its place, and the choice depends on your specific situation. I am comparing the pros and cons for each option below:
Option 1: Pros:
Can handle many options
New options can easily be added
A generic interface can be developed to manage the options
Option 1: Cons
When a new option is added, it's more complex to update all user accounts with the new option
Option names can spiral out of control
Validation of allowed option values is more complex, additional meta data is needed for that
Option 2: Pros
Validation of each option is easier than option 1 since each option is an individual column
Option 2: Cons
A database update is required for each new option
With many options the database tables could become more difficult to use
It's hard to evaluate "best" because it depends on the kind of queries you want to run.
Option 1 (commonly known as "property bag", "name value pairs" or "entity-attribute-value" or EAV) makes it easy to store data whose schema you don't know in advance. However, it makes it hard - or sometimes impossible - to run common relational queries. For instance, imagine running the equivalent of
select count(*)
from USER_ALERT_SETTINGS
where EmailAdded = 1
and Email_LimitMax > 5
This would rapidly become very convoluted, especially because your database engine may not compare varchar fields in a numerically meaningful way (so "> 5" may not work the way you expect).
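For comparison, a sketch of the same count against the Option 1 table: it needs one self-join per condition, plus an explicit cast (column names from the question):

```sql
SELECT COUNT(DISTINCT ea.UserId)
FROM USER_SETTINGS AS ea
JOIN USER_SETTINGS AS lm
  ON lm.UserId = ea.UserId
WHERE ea.Code = 'EmailAdded'     AND ea.Value = '1'
  AND lm.Code = 'Email_LimitMax' AND CAST(lm.Value AS UNSIGNED) > 5;
```

Every additional condition adds another self-join, and the optimizer has far less to work with than it would against real columns.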
I'd work out the queries you want to run, and see which design supports those queries best. If all you have to do is check limits for an individual user, the property bag is fine. If you have to report across all users, it's probably not.
The same goes for JSON or XML: it's okay for storing individual records, but it makes querying or reporting over all users harder. For instance, imagine searching for the configuration settings for the email address "bob@domain.com"; this would require searching through all the XML documents to find the "email address" node.
I have two tables:
Avatars:
Id | UserId | Name | Size
-----------------------------------------------
1 | 2 | 124.png | Large
2 | 2 | 124_thumb.png | Thumb
Profiles:
Id | UserId | Location | Website
-----------------------------------------------
1 | 2 | Dallas, Tx | www.example.com
These tables could be merged into something like:
User Meta:
Id | UserId | MetaKey | MetaValue
-----------------------------------------------
1 | 2 | location | Dallas, Tx
2 | 2 | website | www.example.com
3 | 2 | avatar_lrg | 124.png
4 | 2 | avatar_thmb | 124_thumb.png
This to me could be a cleaner, more flexible setup (at least at first glance). For instance, if I need to allow a "user status message", I can do so without touching the database.
However, the user's avatars will be pulled far more than their profile information.
So I guess my real questions are:
What kind of performance hit would this produce?
Is merging these tables just a really bad idea?
This is almost always a bad idea. What you are doing is a form of the Entity Attribute Value model. This model is sometimes necessary when a system needs a flexible attribute system to allow the addition of attributes (and values) in production.
This type of model is essentially built on metadata in lieu of real relational data. This can lead to referential integrity issues, orphan data, and poor performance (depending on the amount of data in question).
As a general matter, if your attributes are known up front, you want to define them as real data (i.e. actual columns with actual types) as opposed to string-based metadata.
In this case, it looks like users may have one large avatar and one small avatar, so why not make those columns on the user table?
We have a similar type of table at work that probably started with good intentions, but is now quite the headache to deal with. This is because it now has 100s of different "MetaKeys", and there is no good documentation about what is allowed and what each does. You basically have to look at how each is used in the code and figure it out from there. Thus, figure out how you will document this for future developers before you go down that route.
Also, to retrieve all the information about each user it is no longer a 1-row query, but an n-row query (where n is the number of fields on the user). Also, once you have that data, you have to post-process each of those based on your meta-key to get the details about your user (which usually turns out to be more of a development effort because you have to do a bunch of String comparisons). Next, many databases only allow a certain number of rows to be returned from a query, and thus the number of users you can retrieve at once is divided by n. Last, ordering users based on information stored this way will be much more complicated and expensive.
In general, I would say that you should make any fields that have specialized functionality or require ordering to be columns in your table. Since they will require a development effort anyway, you might as well add them as an extra column when you implement them. I would say your avatar pics fall into this category, because you'll probably have one of each, and will always want to display the large one in certain places and the small one in others. However, if you wanted to allow users to make their own fields, this would be a good way to do this, though I would make it another table that can be joined to from the user table. Below are the tables I'd suggest. I assume that "Status" and "Favorite Color" are custom fields entered by user 2:
User:
| Id | Name |Location | Website | avatarLarge | avatarSmall
----------------------------------------------------------------------
| 2 | iPityDaFu |Dallas, Tx | www.example.com | 124.png | 124_thumb.png
UserMeta:
Id | UserId | MetaKey | MetaValue
-----------------------------------------------
1 | 2 | Status | Hungry
2 | 2 | Favorite Color | Blue
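The split above can be sketched end to end. This is a minimal illustration using SQLite in-memory as a stand-in for MySQL (column names follow the tables above): the known-up-front fields, including the avatars, are real columns fetched in one row, while user-defined fields live in UserMeta and come back via a second query or join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Known-up-front fields are real columns; custom fields go in UserMeta.
cur.execute("""
    CREATE TABLE User (
        Id INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        Location TEXT,
        Website TEXT,
        AvatarLarge TEXT,
        AvatarSmall TEXT
    )""")
cur.execute("""
    CREATE TABLE UserMeta (
        Id INTEGER PRIMARY KEY,
        UserId INTEGER NOT NULL REFERENCES User(Id),
        MetaKey TEXT NOT NULL,
        MetaValue TEXT
    )""")

cur.execute(
    "INSERT INTO User VALUES (2, 'iPityDaFu', 'Dallas, Tx', "
    "'www.example.com', '124.png', '124_thumb.png')")
cur.executemany(
    "INSERT INTO UserMeta (UserId, MetaKey, MetaValue) VALUES (?, ?, ?)",
    [(2, 'Status', 'Hungry'), (2, 'Favorite Color', 'Blue')])

# One row fetches all the built-in fields...
row = cur.execute(
    "SELECT AvatarLarge, AvatarSmall FROM User WHERE Id = 2").fetchone()
print(row)  # ('124.png', '124_thumb.png')

# ...and the custom fields are pulled separately, keyed by UserId.
custom = cur.execute(
    "SELECT MetaKey, MetaValue FROM UserMeta WHERE UserId = 2 ORDER BY MetaKey"
).fetchall()
print(custom)  # [('Favorite Color', 'Blue'), ('Status', 'Hungry')]
```

Note how displaying a profile page only ever needs the one-row User query; the UserMeta lookup is paid only when the custom fields are actually shown.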
I'd stick with the original layout. Here are the downsides of replacing your existing table structure with a big table of key-value pairs that jump out at me:
Inefficient storage - since the data stored in the metavalue column is mixed, the column must be declared with the worst-case data type, even if all you would need to hold is a boolean for some keys.
Inefficient searching - should you ever need to do a lookup from the value in the future, the mishmash of data will make indexing a nightmare.
Inefficient reading - reading a single user record now means doing an index scan for multiple rows, instead of pulling a single row.
Inefficient writing - writing out a single user record is now a multi-row process.
Contention - having mixed your user data and avatar data together, you've forced threads that only care about one or the other to operate on the same table, increasing your risk of running into locking problems.
Lack of enforcement - your data constraints have now moved into the business layer. The database can no longer ensure that all users have all the attributes they should, or that those attributes are of the right type, etc.
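The "inefficient searching" point is exactly the range query from the question. Here is a small demonstration using SQLite in-memory as a stand-in (MySQL's cast syntax differs slightly, e.g. `CAST(value AS UNSIGNED)`): because the strength value sits in a mixed-type text column, the comparison needs a cast, and an ordinary index on `value` cannot be used for the numeric range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# EAV layout from the question: every value in one text column.
cur.execute(
    "CREATE TABLE item_attributes (itemId TEXT, attributeId TEXT, value TEXT)")
cur.executemany("INSERT INTO item_attributes VALUES (?, ?, ?)", [
    ('0001', '01', '35'),                        # strength
    ('0001', '08', 'crafted by awesome elves'),  # description
    ('0002', '02', '15'),                        # agility
    ('0002', '05', '200'),                       # damage
])

# "Get items whose strength is between 15 and 35" forces a cast of the
# text value on every candidate row before the comparison can happen.
rows = cur.execute("""
    SELECT itemId FROM item_attributes
    WHERE attributeId = '01'
      AND CAST(value AS INTEGER) BETWEEN 15 AND 35
""").fetchall()
print(rows)  # [('0001',)]
```

With strength as a real integer column, the same query is a plain indexed `WHERE strength BETWEEN 15 AND 35` with no cast at all.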
I am using a simple database design, and I think the best database example is e-commerce, because it raises a lot of problems and it's familiar territory from CMS work.
USERS TABLE
UID |int | PK NN UN AI
username |varchar(45) | UQ INDEX
password |varchar(100) | 100 varchar for $6$rounds=5000$ crypt php sha512
name |varchar(100) | 45 for first name 45 for last 10 for spaces
gender |bit | UN, 0 for women, 1 for men, lol.
phone |varchar(30) | see [2]
email |varchar(255) | see RFC 5322 [1]
verified |tinyint | UN INDEX
timezone |tinyint | -128 to 127, just enough for +7/-7 or +11/-11 UTC
timeregister |int | 31052010112030 for 31-05-2010 11:20:30
timeactive |int | 01062010110020 for 1-06-2010 11:00:20
COMPANY TABLE
CID |int | PK NN UN AI
name |varchar(45) |
address |varchar(100) | not quite sure about 100.
email |varchar(255) | see users.email, this is for the official email
phone |varchar(30) | see users.phone
link |varchar(255) | for www.website.com/companylink 255 is good.
imagelogo |varchar(255) | for the retrieving image logo & storing
imagelogosmall |varchar(255) | not great naming, huh? See the comments
yahoo |varchar(100) | don't know
linkin |varchar(100) | don't know
twitter |varchar(100) | does Twitter have a 100-char max username? Is that true?
description |TEXT | or varchar(30000) for company descriptions
shoutout |varchar(140) | status that companies can have.
verified |tinyint | UN INDEX
PRODUCT TABLE
PID |int | PK NN UN AI
CID |int | from whom? Santa? FK: company.cid, cascade delete
name |varchar(100) | the longest product name, maybe, hahaha.
description |TEXT | still confused about using varchar(30000)
imagelarge |varchar(255) | for the retrieving product image & storing
imagesmall |varchar(255) | for the retrieving small product image & storing
tag |varchar(45) | for tagging like stackoverflow ( index )
price |decimal(11,2) | that's in Zimbabwe dollars.
[1] see Using a regular expression to validate an email address
[2] see What's the longest possible worldwide phone number I should consider in SQL varchar(length) for phone
Why InnoDB specific?
Please see the quote in How to choose optimized datatypes for columns [innodb specific]?
It was getting off topic over there, so I had to create another question; people didn't understand what I was trying to say, or maybe I couldn't explain what I wanted there. This time it's about database design as well.
So again, please copy the example above, put in your changes plus comments just like the example, and give advice about it.
Remember, this is for MySQL with InnoDB. Read the quote at the link above. Thanks.
I'm going to answer this as if you're asking for advice on column definitions. I will not copy and paste your tables, because that's completely silly.
Don't store dates and times as integers. Use a DATETIME column instead.
Keep in mind that MySQL stores TIMESTAMPs as UTC but presents them in whatever time zone the connection is configured to use, while DATETIME values are stored exactly as given, with no conversion. It may be worth setting the connection time zone to UTC so that your separate time zone storage will work.
Keep in mind that not all time zones are full hour offsets from GMT. Daylight Saving Time can throw a monkey wrench in hour-based calculations as well. You may want to store the time zone string (i.e. "America/Los_Angeles") and figure out the proper offset at runtime.
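That last point is easy to sketch in application code. A minimal example, assuming the database holds UTC timestamps and a zone string like "America/Los_Angeles" per user (Python's `zoneinfo`, available since 3.9, carries the DST rules for you):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical stored values: timestamps kept in UTC in the database,
# the user's zone kept as a string column like "America/Los_Angeles".
stored_utc = datetime(2010, 7, 1, 18, 30, tzinfo=timezone.utc)
user_zone = ZoneInfo("America/Los_Angeles")

# Compute the proper offset at runtime; DST is handled automatically.
local = stored_utc.astimezone(user_zone)
print(local.isoformat())  # 2010-07-01T11:30:00-07:00 (July is PDT: -7, not -8)
```

An hour-offset column stored per user would silently be wrong for half the year here; the zone string stays correct across DST transitions.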
You do not need to specify a character count for integer columns.
Don't be afraid of TEXT columns. You have a lot of VARCHAR(255)s for data that can easily be longer than 255 characters, like a URL.
Keep in mind that optimizing for a specific database engine, or optimizing for storage on disk is the very last thing you should do. Make your column definitions fit the data. Premature optimization is one of the many roots of all evil in this world. Don't worry about tinyint vs smallint vs int vs bigint, or char vs varchar vs text vs bigtext. It won't matter for you 95% of the time as long as the data fits.
You should store all your dates/times in GMT. It's a best practice to convert them to 0 offset and then convert them back to whatever local time zone offset the user is currently in for display.
The maximum length of a URL in Internet Explorer (the lowest common denominator) is 2,000 characters (just use TEXT).
You don't set lengths on INT types (take them off!). INT is 32 bits (-2147483648 to 2147483647), BIGINT is 64 bits, TINYINT is 8 bits.
For bool/flags you can use BIT which is 1 or 0 (your "verified" for example)
VARCHAR(255) might be too small for "imagelarge" and "imagesmall" if it is to include the image file name and path (see above for max URL length).
If you are confused on how big a VARCHAR is too big and when to start using TEXT, just use TEXT!
See the MySQL manual, section 10.2. Numeric Types.
USERS TABLE
UID |int(11) | PK as primary key? NN as not null? UN AI
username |varchar(45) | UQ
password |varchar(200) | 200 is better.
name |varchar(100) | ok
gender |blob | f and M
phone |varchar(30) |
email |varchar(300) | that's 256, put 300 instead
verified |tinyint(1) | UN
timezone (delete) |datetime | this should be a PHP job
timeregister |datetime |
timeactive |datetime |
COMPANY TABLE
CID |int(11) | PK NN UN AI
name |varchar(45) |
address |varchar(100) | 100 is fine
email |varchar(255) |
phone |varchar(30) |
link |varchar(255) |
imagelogo |varchar(255) |
imagelogosmall |varchar(255) | the naming is just fine for me
yahoo |varchar(100) | see 3.
linkin |varchar(100) | linkin use real names, maybe 100 is great
twitter |varchar(20) | it's 20 (maybe 15)
description |TEXT |
shoutout |varchar(140) | seems ok.
verified |tinyint(1) | UN
PRODUCT TABLE
PID |int(11) | PK NN UN AI
CID |int(11) | FK: company.cid cascade delete & update
name |varchar(100) |
description |TEXT |
imagelarge |varchar(255) |
imagesmall |varchar(255) |
tag |varchar(45) |
price |decimal(11,2) |
In PHP, see php.net/manual/en/function.date-default-timezone-set.php
date_default_timezone_set('Region/Town'); // returns a bool, not a time
$time = date( 'Y-m-d H:i:s' );
echo $time;
http://www.eph.co.uk/resources/email-address-length-faq/
For what it's worth, the integer argument (e.g. INT(11)) is not meaningful for storage or optimization in any way. The argument does not indicate a max length or max range of values, it's only a hint for display. This confuses a lot of MySQL users, perhaps because they're used to CHAR(11) indicating max length. Not so with integers. TINYINT(1) and TINYINT(11) and TINYINT(255) are stored identically as an 8-bit integer, and they have the same range of values.
The max length of an email address is 320 characters: 64 for the local part, 1 for the @, and 255 for the domain.
I am not a fan of using VARCHAR(255) as a default string declaration. Why is 255 the right length? Is 254 just not long enough and 256 seems too much? The answer is that people believe that the length of each string is stored somewhere, and by limiting the length to 255 they can ensure that the length only takes 1 byte. They've "optimized" by allowing as long a string as they can while still keeping the length to 1 byte.
In fact, the length of the field is not stored in InnoDB. The offset of each field in the row is stored (see MySQL Internals InnoDB). If your total row length is 255 or less, the offsets use 1 byte. If your total row length could be longer than 255, the offsets use 2 bytes. Since you have several long fields in your row, it's almost certain to store the offsets in two bytes anyway. The ubiquitous value 255 may be optimized for some other RDBMS implementation, but not InnoDB.
Also, MySQL converts rows to a fixed-length format in some cases, padding variable-length fields as necessary. For example, when copying rows to a sort buffer, or storing in a MEMORY table, or preparing the buffer for a result set, it has to allocate memory based on the maximum length of the columns, not the length of usable data on a per-row basis. So if you declare VARCHAR columns far longer than you ever actually use, you're wasting memory in those cases.
This points out the hazard of trying to optimize too finely for a particular storage format.