Database design for user settings - mysql

Which of the following options, if any, is considered best practice when designing a table used to store user settings?
(OPTION 1)
USER_SETTINGS
-Id
-Code (example "Email_LimitMax")
-Value (example "5")
-UserId
(OPTION 2)
create a new table for each setting where, for example, notification settings would require you to create:
"USER_ALERT_SETTINGS"
-Id
-UserId
-EmailAdded (i.e true)
-EmailRemoved
-PasswordChanged
...
...
"USER_EMAIL_SETTINGS"
-Id
-UserId
-EmailLimitMax
....
(OPTION 3)
"USER"
-Name
...
-ConfigXML

Other answers have ably outlined the pros and cons of your various options.
I believe that your Option 1 (property bag) is the best overall design for most applications, especially if you build in some protections against the weaknesses of propety bags.
See the following ERD:
In the above ERD, the USER_SETTING table is very similar to OP's. The difference is that instead of varchar Code and Value columns, this design has a FK to a SETTING table which defines the allowable settings (Codes) and two mutually exclusive columns for the value. One option is a varchar field that can take any kind of user input, the other is a FK to a table of legal values.
The SETTING table also has a flag that indicates whether user settings should be defined by the FK or by unconstrained varchar input. You can also add a data_type to the SETTING to tell the system how to encode and interpret the USER_SETTING.unconstrained_value. If you like, you can also add the SETTING_GROUP table to help organize the various settings for user-maintenance.
This design allows you to table-drive the rules around what your settings are. This is convenient, flexible and easy to maintain, while avoiding a free-for-all.
EDIT: A few more details, including some examples...
Note that the ERD, above, has been augmented with more column details (range values on SETTING and columns on ALLOWED_SETTING_VALUE).
Here are some sample records for illustration.
SETTING:
+----+------------------+-------------+--------------+-----------+-----------+
| id | description | constrained | data_type | min_value | max_value |
+----+------------------+-------------+--------------+-----------+-----------+
| 10 | Favourite Colour | true | alphanumeric | {null} | {null} |
| 11 | Item Max Limit | false | integer | 0 | 9001 |
| 12 | Item Min Limit | false | integer | 0 | 9000 |
+----+------------------+-------------+--------------+-----------+-----------+
ALLOWED_SETTING_VALUE:
+-----+------------+--------------+-----------+
| id | setting_id | item_value | caption |
+-----+------------+--------------+-----------+
| 123 | 10 | #0000FF | Blue |
| 124 | 10 | #FFFF00 | Yellow |
| 125 | 10 | #FF00FF | Pink |
+-----+------------+--------------+-----------+
USER_SETTING:
+------+---------+------------+--------------------------+---------------------+
| id | user_id | setting_id | allowed_setting_value_id | unconstrained_value |
+------+---------+------------+--------------------------+---------------------+
| 5678 | 234 | 10 | 124 | {null} |
| 7890 | 234 | 11 | {null} | 100 |
| 8901 | 234 | 12 | {null} | 1 |
+------+---------+------------+--------------------------+---------------------+
From these tables, we can see that some of the user settings which can be determined are Favourite Colour, Item Max Limit and Item Min Limit. Favourite Colour is a pick list of alphanumerics. Item min and max limits are numerics with allowable range values set. The SETTING.constrained column determines whether users are picking from the related ALLOWED_SETTING_VALUEs or whether they need to enter a USER_SETTING.unconstrained_value. The GUI that allows users to work with their settings needs to understand which option to offer and how to enforce both the SETTING.data_type and the min_value and max_value limits, if they exist.
Using this design, you can table drive the allowable settings including enough metadata to enforce some rudimentary constraints/sanity checks on the values selected (or entered) by users.
EDIT: Example Query
Here is some sample SQL using the above data to list the setting values for a given user ID:
-- DDL and sample data population...
CREATE TABLE SETTING
(`id` int, `description` varchar(16)
, `constrained` varchar(5), `data_type` varchar(12)
, `min_value` varchar(6) NULL , `max_value` varchar(6) NULL)
;
INSERT INTO SETTING
(`id`, `description`, `constrained`, `data_type`, `min_value`, `max_value`)
VALUES
(10, 'Favourite Colour', 'true', 'alphanumeric', NULL, NULL),
(11, 'Item Max Limit', 'false', 'integer', '0', '9001'),
(12, 'Item Min Limit', 'false', 'integer', '0', '9000')
;
CREATE TABLE ALLOWED_SETTING_VALUE
(`id` int, `setting_id` int, `item_value` varchar(7)
, `caption` varchar(6))
;
INSERT INTO ALLOWED_SETTING_VALUE
(`id`, `setting_id`, `item_value`, `caption`)
VALUES
(123, 10, '#0000FF', 'Blue'),
(124, 10, '#FFFF00', 'Yellow'),
(125, 10, '#FF00FF', 'Pink')
;
CREATE TABLE USER_SETTING
(`id` int, `user_id` int, `setting_id` int
, `allowed_setting_value_id` varchar(6) NULL
, `unconstrained_value` varchar(6) NULL)
;
INSERT INTO USER_SETTING
(`id`, `user_id`, `setting_id`, `allowed_setting_value_id`, `unconstrained_value`)
VALUES
(5678, 234, 10, '124', NULL),
(7890, 234, 11, NULL, '100'),
(8901, 234, 12, NULL, '1')
;
And now the DML to extract a user's settings:
-- Show settings for a given user
select
US.user_id
, S1.description
, S1.data_type
, case when S1.constrained = 'true'
then AV.item_value
else US.unconstrained_value
end value
, AV.caption
from USER_SETTING US
inner join SETTING S1
on US.setting_id = S1.id
left outer join ALLOWED_SETTING_VALUE AV
on US.allowed_setting_value_id = AV.id
where US.user_id = 234
See this in SQL Fiddle.

Option 1 (as noted, "property bag") is easy to implement - very little up-front analysis. But it has a bunch of downsides.
If you want to restrain the valid values for UserSettings.Code, you need an auxiliary table for the list of valid tags. So you have either (a) no validation on UserSettings.Code – your application code can dump any value in, missing the chance to catch bugs, or you have to add maintenance on the new list of valid tags.
UserSettings.Value probably has a string data type to accommodate all the different values that might go into it. So you have lost the true data type – integer, Boolean, float, etc., and the data type checking that would be done by the RDMBS on insert of an incorrect values. Again, you have bought yourself a potential QA problem. Even for string values, you have lost the ability to constrain the length of the column.
You cannot define a DEFAULT value on the column based on the Code. So if you wanted EmailLimitMax to default to 5, you can’t do it.
Similarly, you can’t put a CHECK constraint on the Values column to prevent invalid values.
The property bag approach loses validation of SQL code. In the named column approach, a query that says “select Blah from UserSettings where UserID = x” will get a SQL error if Blah does not exist. If the SELECT is in a stored procedure or view, you will get the error when you apply the proc/view – way before the time the code goes to production. In the property bag approach, you just get NULL. So you have lost another automatic QA feature provided by the database, and introduced a possible undetected bug.
As noted, a query to find a UserID where conditions apply on multiple tags becomes harder to write – it requires one join into the table for each condition being tested.
Unfortunately, the Property Bag is an invitation for application developers to just stick a new Code into the property bag without analysis of how it will be used in the rest of application. For a large application, this becomes a source of “hidden” properties because they are not formally modeled. It’s like doing your object model with pure tag-value instead of named attributes: it provides an escape valve, but you’re missing all the help the compiler would give you on strongly-typed, named attributes. Or like doing production XML with no schema validation.
The column-name approach is self-documenting. The list of columns in the table tells any developer what the possible user settings are.
I have used property bags; but only as an escape valve and I have often regretted it. I have never said “gee, I wish I had made that explicit column be a property bag.”

Consider this simple example.
If you have 2 tables, UserTable(contains user details) and
SettingsTable(contains settings details). Then create a new table UserSettings for relating the UserTable and SettingsTable as shown below
Hope you will found the right solution from this example.

Each option has its place, and the choice depends on your specific situation. I am comparing the pros and cons for each option below:
Option 1: Pros:
Can handle many options
New options can easily be added
A generic interface can be developed to manage the options
Option 1: Cons
When a new option is added, its more complex to update all user accounts with the new option
Option names can spiral out of control
Validation of allowed option values is more complex, additional meta data is needed for that
Option 2: Pros
Validation of each option is easier than option 1 since each option is an individual column
Option 2: Cons
A database update is required for each new option
With many options the database tables could become more difficult to use

It's hard to evaluate "best" because it depends on the kind of queries you want to run.
Option 1 (commonly known as "property bag", "name value pairs" or "entity-attribute-value" or EAV) makes it easy to store data whose schema you don't know in advance. However, it makes it hard - or sometimes impossible - to run common relational queries. For instance, imagine running the equivalent of
select count(*)
from USER_ALERT_SETTINGS
where EmailAdded = 1
and Email_LimitMax > 5
This would rapidly become very convoluted, especially because your database engine may not compare varchar fields in a numerically meaningful way (so "> 5" may not work the way you expect).
I'd work out the queries you want to run, and see which design supports those queries best. If all you have to do is check limits for an individual user, the property bag is fine. If you have to report across all users, it's probably not.
The same goes for JSON or XML - it's okay for storing individual records, but makes querying or reporting over all users harder. For instance, imagine searching for the configuration settings for email adress "bob#domain.com" - this would require searching through all XML documents to find the node "email address".

Related

Horizontal scaling issues with row locking on a permissions system

The requirement
I am currently building a permissions system. One of the requirements is that it is horizontally scalable.
To achieve this, we have done the following
There is a single "compiled resource permission" table that looks something like this:
| user_id | resource_id | reason |
| 1 | 1 | 1 |
| 1 | 2 | 3 |
| 2 | 1 | 2 |
The structure of this table denotes user 1 has accesses to resource 1 & 2, and user 2 has access only to resource 1.
The "reason" column is a bitwise number which has bits switched on depending on "Why" they have that permission. Binary bit "1" denotes they are an admin, and binary bit "2" denotes they created the resource.
So user 1 has access to resource 1 because they are an admin. He has access to resource 2 because he is an admin and he created that resource. If he was no longer admin, he would still have access to ticket 2 but not ticket 1.
To figure out what needs to go into this table, we use a "patcher" class that programmatically loops around the users & resources passed to it and logically looks at all DB tables necessary to figure out what rows need adding and what rows need removing from the table.
How we are trying to scale and the problem
To horizontally scale this, we split the logic into chunks and give it to a number of "workers" on an async queue
This only seems to scale so far before it no longer speeds up or sometimes even row locking happens which slows it down.
Is there a particular type of row lock we can use to allow it to scale indefinitely?
Are we approaching this from completely the wrong angle? We have a lot of "Reasons" and a lot of complex permission logic that we need to be able to recompile fairly quickly
SQL Queries that run concurrently, for reference
When we are "adding" reasons:
INSERT INTO `compiled_permissions` (`user_id`, `resource_id`, `reason`) VALUES ((1,1,1), (1,2,3), (2,1,2)) ON DUPLICATE KEY UPDATE `reason` = `reason` | VALUES(`reason`);
When we are "removing" reasons:
UPDATE `compiled_permissions` SET `reason` = `reason` & ~ (CASE
(user_id = 1 AND resource_id = 1 THEN 2 ... CASE FOR EVERY "REASON REMOVAL")
ELSE `reason`
END)
WHERE (`user_id`, `resource_id`) IN ((1,1),(1,2) .. ETC )

Database design. How to approach this please?

I'm designing a database (MySQL) that will manage a fleet of vehicles.
Company has many garages across the city, at each garage, vehicles gets serviced (operation). An operation can be any of 3 types of services.
Table Vehicle, Table Garagae, Table Operation, Table Operation Type 1, Table Operation Type 2, Table Operation type 3.
Each Operation has the vehicle ID, garage ID, but how do I link it to the the other tables (service tables) depending on which type of service the user chooses?
I would also like to add a billing table, but I'm lost at how to design the relationship between these tables.
If I have fully understood it I would suggest something like this (first of all you shouldn't have three operation tables):
Vehicles Table
- id
- garage_id
Garages Table
- id
Operations/Services Table
- id
- vehicle_id
- garage_id
- type
Customer Table
- id
- service_id
billings Table
- id
- customer_id
You need six tables:
vechicle: id, ...
garage: id, ...
operation: id, vechicle_id, garage_id, operation_type (which can be
one of the tree options/operations available, with the possibility to be extended)
customer: id, ...
billing: id, customer_id, total_amount
billingoperation: id, billing_id, operation_id, item_amount
You definitely should not creat three tables for operations. In the future if you would like to introduce a new operation that would involve creating a new table in the database.
For the record, I disagree with everyone who is saying you shouldn't have multiple operation tables. I think that's perfectly fine, as long as it is done properly. In fact, I'm doing that with one of my products right now.
If I understand, at the core of your question, you're asking how to do table inheritance, because Op Type 1 and Op Type 2 (etc.) IS A Operation. The short answer is that you can't. The longer answer is that you can't...at least not without some helper logic.
I assume you have some sort of program that will pull data from the database, rather than you just writing sql commands by hand. Working under that assumption, let's use this as a subset of your database:
Garage
------
GarageId | GarageLocation | etc.
---------|----------------|------
1 | 123 Main St. | XX
Operation
---------
OperationId | GarageId | TimeStarted | TimeEnded | OperationTypeDescId | OperationTypeId
------------|----------|-------------|-----------|---------------------|----------------
2 | 1 | noon | NULL | 2 | 2
OperationTypeDesc
-------------
OperationTypeDescId | Name | Description
--------------------|-------|-------------------------
1 | OpFoo | Do things with the stuff
2 | OpBar | Do stuff with the things
OpFoo
-----
OpID | Thing1 | Thing2
-----|--------|-------
1 | 123 | abc
OpBar
-----
OpID | Stuff1 | Stuff2
-----|--------|-------
1 | 456 | def
2 | 789 | ghi
Using this setup, you have the following information:
A garage has it's information, plain and simple
An operation has a unique ID (OperationId), a garage where it was executed, an ID referencing the description of the operation, and the OperationType ID (more on this in a moment).
A pre-populated table of operation types. Each type has a unique ID (OperationTypeDescId), the name of the operation, and a human-readable description of what that operation is.
1 table for each row in OperationTypeDesc. For convenience, the table name should be the same as the Name column
Now we can begin to see where inheritance comes into play. In the operation table, the OperationTypeId references the OpId of the relevant table...the "relevant table" is determined by the OperationTypeDescId.
An example: Let's say we had the above data set. In this example we know that there is an operation happening in a garage at 123 Main St. We know it started at noon, and has not yet ended. We know the type of operation is "OpBar". Since we know we're doing an OpBar operation instead of an OpFoo operation, we can focus on only the OpBar-relevant attributes, namely stuff1 and stuff2. Since the Operations's OperationTypeId is 2, we know that Stuff1 is 789 and Stuff2 is ghi.
Now the tricky part. In your program, this is going to require Reflection. If you don't know what that is, it's the practice of getting a Type from the NAME of that type. In our example, we know what table to look at (OpBar) because of its name in the OperationTypeDesc table. Put another way, you don't automatically know what table to look in; reflection tells you that information.
Edit:
Csaba says "In the future if you would like to introduce a new operation that would involve creating a new table in the database". That is correct. You would also need to add a new row to the OperationTypeDesc table. Csaba implies this is a bad thing, and I disagree - with a few provisions. If you are going to be adding a new operation type frequently, then yes, he makes a very good point. you don't want to be creating new tables constantly. If, however, you know ahead of time what types of operations will be performed, and will very rarely add new types of operations, then I maintain this is the way to go. All of your info common to all operations goes in the Operation table, and all op-specific info goes into the relevant "sub-table".
There is one more very important note regarding this. Because of how this is designed, you, the human, must be aware of the design. Whenever you create a new operation type, it's not as simple as creating the new table. Specifically, you have to make sure that the new table name and the OperationTypeDesc "Name" entry are the same. Think of it as an extra constraint - an "INTEGER" column can only contain ints, otherwise the db won't allow the data. In the same manner, the "Name" column can only contain the name of an existing table. You the human must be aware of that constraint, because it cannot be (easily) automatically enforced.

Hashed string in MySQL

Is there some kind of hashed string type in MySQL?
Let's say we have a table
user | action | target
-----------------------
1 | likes | 14
2 | follows | 190
I don't want to store "action" as text, because it takes much space and is slow to index. Actions are likely to be limited (up to 50 actions, I guess) but can be added/removed in the future. So I would like to avoid storing all actions by numbers in PHP. I would like to have a table that handles this transparently.
For example, table above would be stored as (1,1,14), (2,2,190) internally, and keys would be stored in another table (1 = likes, 2 = follows).
INSERT INTO table (41, "likes", 153)
Here "likes" is resolved to 1.
INSERT INTO table (23, "dislikes", 1245)
Here we have no key for "dislikes" to it is added and stored internally as 3.
Possible?
If you have a fixed (or reasonably fixed) set of values, then you can use an enum field. This is implemented as a bitmask internally and as a result takes a small amount of disk space. Here is an example definition:
CREATE TABLE enum_test (
myEnum enum('enabled', 'disabled', 'unknown')
);
Yes it is, with a subquery like this:
INSERT INTO table (23, (SELECT id FROM actions WHERE action="dislikes") , 1245)
This way it is possible to don't know the ID from PHP side, but only the action name, and still input it in the database as an ID
This assuming you have a 'actions' table
id | action
-----------
1 | like
2 | dislike
You want a table called "actions", and a foreign key called "action_id". That is how database normalization works:
user_actions:
user | action_id | target
-----------------------
1 | 1 | 14
2 | 2 | 190
actions:
id | name
--------------
1 | likes
2 | follows
As far as making insert into user_actions (1, 'likes', 47) work: You shouldn't care. Trying to make your SQL pretty is a pointless pursuit; you should never actually have to write any in your application code. The database interactions should be handled by a layer of models/business objects, and their internal implementation shouldn't matter to you.
As far as making insert into user_actions (1, 'dislikes', 47) automatically create new records in the actions table: That again isn't the database's job. Your models should be handling this.

Mysql: how to structure this data and search it

I'm new to mysql. Right now, I have this kind of structure in mysql database:
| keyID | Param | Value
| 123 | Location | Canada
| 123 | Cost | 34
| 123 | TransportMethod | Boat
...
...
I have probably like 20 params with unique values for each Key ID. I want to be able to search in mysql given the 20 params with each of the values and figure out which keyID.
Firstly, how should I even restructure mysql database? Should I have 20 param columns + keyID?
Secondly, (relates to first question), how would I do the query to find the keyID?
If your params are identical across different keys (or all params are a subset of some set of params that the objects may have), you should structure the database so that each column is a param, and the row corresponds to one KeyID and the values of its params.
|keyID|Location|Cost|TransportMethod|...|...
|123 |Canada |34 |Boat ...
|124 | ...
...
Then to query for the keyID you would use a SELECT, FROM, and WHERE statement, such as,
SELECT keyID
FROM key_table
WHERE Location='Canada'
AND Cost=34
AND TransportMethod='Boat'
...
for more info see http://www.w3schools.com/php/php_mysql_where.asp
edit: if your params change across different objects (keyIDs) this will require a different approach I think
The design you show is called Entity-Attribute-Value. It breaks many rules of relational database design, and it's very hard to use with SQL.
In a relational database, you should have a separate column for each attribute type.
CREATE TABLE MyTable (
keyID SERIAL PRIMARY KEY,
Location VARCHAR(20),
Cost NUMERIC(9,2),
TransportMethod VARCHAR(10)
);
I agree that Nick's answer is probably best, but if you really want to keep your key/value format, you could accomplish what you want with a view (this is in PostgreSQL syntax, because that's what I'm familiar with, but the concept is the same for MySQL):
CREATE OR REPLACE VIEW myview AS
SELECT keyID,
MAX(CASE WHEN Param = 'Location' THEN Value END) AS Location,
MAX(CASE WHEN Param = 'Cost' THEN Value END) AS Cost,
....
FROM mytable;
Performance here is likely to be dismal, but if your queries are not frequent, it could get the job done.

Need a tip on simple MySQL db design

I am trying to make a simple item database using MySQL for a game. Here is what my 3 tables would look like
items
itemId | itemName
-------------------
0001 | chest piece
0002 | sword
0003 | helmet
attributes (attribute lookup table)
attributeId | attributeName
---------------------------------
01 | strength
02 | agility
03 | intellect
04 | defense
05 | damage
06 | mana
07 | stamina
08 | description
09 | type
item_attributes (junction table)
itemId | attributeId | value (mixed type, bad?)
------------------------------------
0001 | 01 | 35
0001 | 03 | 14
0001 | 09 | armor
0001 | 08 | crafted by awesome elves
0002 | 09 | weapon
0002 | 05 | 200
0002 | 02 | 15
0002 | 08 | your average sword
0003 | 04 | 9000
0003 | 09 | armor
0003 | 06 | 250
My problem with this design is that value column in item_attributes table needs to use varchar data type, since the value's data can be int, char, varchar. I think this is a bad approach because I would not be able to quickly sort my items based on particular attributes. It would also suffer performance hit when a query such as get items with attribute strength that has value between 15 and 35 is processed.
Here is my potential fix. I simply added a data_type column to the attributes table. So it would look something like this
attributes (attribute lookup table)
attributeId | attributeName | data_type
---------------------------------------------------
01 | strength | int
09 | type | char
08 | intellect | varchar
Then I would add 3 more columns to item_attributes table, int, char, varchar. Here is how the new item_attributes table would look like.
item_attributes (junction table)
itemId | attributeId | value | int | char | varchar
------------------------------------------------------------------------
0002 | 09 | weapon | null |weapon| null
0002 | 05 | 200 | 200 | null | null
0002 | 02 | 15 | 15 | null | null
0002 | 08 | your average sword | null | null | your average sword
So now if I were to sort items based on its strength attribute, I would use int column. Or search for an item based on its description, I would search the varchar column.
I still, however, believe my design is a bit weird. Now I would have to look up the data_type column in attribute table and dynamically determine which column in item_attributes table is relevant to what I am looking for.
Any inputs would be greatly appreciated.
Thanks in advance.
EDIT 11/29/2010
Here is a detailed list of my items
--------------------------------------
http://wow.allakhazam.com/ihtml?27718
Aldor Defender's Legplates
Binds when picked up
LegsPlate
802 Armor
+21 Strength
+14 Agility
+21 Stamina
Item Level 99
Equip: Improves hit rating by 14.
--------------------------------------
http://wow.allakhazam.com/ihtml?17967
Refined Scale of Onyxia
Leather
Item Level 60
--------------------------------------
http://wow.allakhazam.com/ihtml?27719
Aldor Leggings of Puissance
Binds when picked up
LegsLeather
202 Armor
+15 Agility
+21 Stamina
Item Level 99
Equip: Increases attack power by 28.
Equip: Improves hit rating by 20.
--------------------------------------
http://wow.allakhazam.com/ihtml?5005
Emberspark Pendant
Binds when equipped
NeckMiscellaneous
+2 Stamina
+7 Spirit
Requires Level 30
Item Level 35
--------------------------------------
http://wow.allakhazam.com/ihtml?23234
Blue Bryanite of Agility
Gems
Requires Level 2
Item Level 10
+8 Agility
--------------------------------------
http://wow.allakhazam.com/ihtml?32972
Beer Goggles
Binds when picked up
Unique
HeadMiscellaneous
Item Level 10
Equip: Guaranteed by Belbi Quikswitch to make EVERYONE look attractive!
--------------------------------------
http://wow.allakhazam.com/ihtml?41118
Gadgetzan Present
Binds when picked up
Unique
Item Level 5
"Please return to a Season Organizer"
--------------------------------------
http://wow.allakhazam.com/ihtml?6649
Searing Totem Scroll
Unique
Quest Item
Requires Level 10
Item Level 10
Use:
--------------------------------------
http://wow.allakhazam.com/ihtml?6648
Stoneskin Totem Scroll
Unique
Quest Item
Requires Level 4
Item Level 4
Use:
--------------------------------------
http://wow.allakhazam.com/ihtml?27864
Brian's Bryanite of Extended Cost Copying
Gems
Item Level 10
gem test enchantment
--------------------------------------
EDIT #2
These 10 examples are not representative of all 35316 items data that I have collected.
NeckMiscellaneous means that item is in both categories of `Neck` and `Misc`.
Unique means the only one item can be used on character.
Don’t read too much into the “Action”, they are just quest description
When an item says `Equip: increase attack power by 28` it just means +28 attack power on the player character. It is the same as +15 agility.
There are a total of 241884 one-to-many item-attribute records, so that comes about to 241884/35316 ~= 8 average attributes per item. Also the data is mined from the website into a gigantic text file. There is NO “well formed” information to identify an item’s type or category. So if the word “sword” appears on either 3rd or 4th line, it is automatically categorized as sword.
The item might get changed on each new update of the game.
There is no universal attribute shared amongst the item besides `name`
The item data is accessible through a web app. Unclear about what you mean by bits and vectors?
The regular expression is used during data mining stage to clean up the special character and search for specific keyword in order to categorize the items. Also to extract attribute name and value. For example, +15 agility would have string agility extracted as attribute name and 15 as value. (I don’t understand much about question 6 and 6.1. Slog stands for server log here? Translate regexes to SQL?)
Model Diagram
Here is an example how a query looks like
select *
from itemattributestat
where item_itemId=251
item_itemId | attribute_attributeId | value | listOrder
=======================================================
'251', '9', '0', '1'
'251', '558', '0', '2'
'251', '569', '0', '3'
'251', '4', '802', '4'
'251', '583', '21', '5'
'251', '1', '14', '6'
'251', '582', '21', '7'
'251', '556', '99', '8'
'251', '227', '14', '9'
The list order is here to keep track of which attribute should be listed first. For formatting purpose
create view itemDetail as
select Item_itemId as id, i.name as item, a.name as attribute, value
from ((itemattributestat join item as i on Item_itemId=i.itemId)
join attribute as a on Attribute_attributeId=a.attributeId)
order by Item_itemId asc, listOrder asc;
The above view produces the following with
select *
from itemdetail
where id=251;
id | item | attribute | value
'251', 'Aldor Defender''s Legplates', 'Binds when picked up', '0'
'251', 'Aldor Defender''s Legplates', 'Legs', '0'
'251', 'Aldor Defender''s Legplates', 'Plate', '0'
'251', 'Aldor Defender''s Legplates', 'Armor', '802'
'251', 'Aldor Defender''s Legplates', 'Strength', '21'
'251', 'Aldor Defender''s Legplates', 'Agility', '14'
'251', 'Aldor Defender''s Legplates', 'Stamina', '21'
'251', 'Aldor Defender''s Legplates', 'Item Level', '99'
'251', 'Aldor Defender''s Legplates', 'Equip: Improves hit rating by ##.', '14'
An attribute with value 0 means the attribute name represents the item type. 'Equip: Improves hit rating by ##.', '14' ## is place holder here, a processed output on a browser will be 'Equip: Improves hit rating by 14.'
Why do you have an attribute table ?
Attributes are columns, not tables.
The website link tells us nothing.
The whole idea of a database is that you join the many small tables, as required, for each query, so you need to get used to that. Sure, it gives you a grid, but a short and sweet one, without Nulls. What you are trying to do is avoid tables; go with just one massive grid, which is full of Nulls.
(snip)
Do not prefix your attribute names (column names) with the table name, that is redundant. This will become clear to you when you start writing SQL which uses more than one table: then you can use the table name or an alias to prefix any column names that are ambiguous.
The exception is the PK, which is rendered fully, and used in that form wherever it is an FK.
Browse the site, and read some SQL questions.
After doing that, later on, you can think about if you wantstrength and defense to be attributes (columns) of type; or not. Et cetera.
Responses to Comments 30 Nov 10
.
Excellent, you understand your data. Right. Now I understand why you had an Attribute table.
Please make sure those 10 examples are representative, I am looking at them closely.
Type:Gem Name:Emberspark Pendant ... Or, is NeckMiscellaneous a type ?
Is Unique a true ItemType ? I think Not
Action.Display "Please return to a Season Organizer"
Where are the Attrinutes for AttackPower and HitRating ?
.
How many different types of items (of 35,000) are there, ala my Product Cluster example. Another way of stating that question is, how many variations are there. I mean, meaningfully, not 3500 Items ÷ 8 Attributes ?
Will the item_attributes change without a release of s/w (eg. a new Inner Strength attribute) ?
Per Item, what Attributes are repeating (more than one); so far I see only Action ?
It is a game, so you need a db that is tight and very fast, maybe fully memory resident, right. No Nulls. No VAR Anything. Shortest Datatypes. Never Duplicate Anything (Don't Repeat Yourself). Are you happy with bits (booleans) and vectors ?
Do you need to easily translate those regexes into SQL, or are you happy with a serious slog for each (ie. once you get them working in SQL they are pretty stable and then you don't mess with it, unless you find a bug) (no sarcasm, serious question) ?
6.1 Or maybe it is the other way round: the db is disk-resident; you load it into memory once; you run the regexes on that during gameplay; occasionally writing to disk. Therefore there is no need to translate the regexes to SQL ?
Here's a Data Model of where I am heading, this not at all certain; it will be modulated by your answers. To be clear:
Sixth Normal Form is The Row consists of the Primary Key and, at most, one Attribute.
I have drawn (6.1) not (6), because your data reinforces my belief that you need a pure 6NF Relational database
My Product Cluster Data Model, the better-than-EAV example, is 6NF, then Normalised again (Not in the Normal Form sense) by DataType, to reduce no of tables, which you have already seen. (EAV people usually go for one or a few gigantic tables.)
This is straight 5NF, with only the 2 tables on the right in 6NF.
Link to Game Data Model
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
Response to Edit #2 05 Dec 10
1.1. Ok, corrected.
1.2. Then IsUnique is an Indicator (boolean) for Item.
1.3. Action. I understand. So Where are you going to store it ?
1.4. NeckMiscellaneous means that item is in both categories of Neck and Misc. That means two separate Item.Name=Emberspark Pendant, each with a different Category.
.
2. and 5. So you do need fast fast memory-resident db. That's why I am trying to get you across the line, away from GridLand, into RelationalLand.
.
3. Ok, we stay with Fifth Normal Form, no need for 6NF or the Product Cluster (tables per Datatype). Sofar the Values are all Integers.
.
4. I can see additionally: Level, RequiredLevel, IsUnique, BindsPickedUp, BindsEquipped.
.
5. Bits are booleans { 0 | 1 }. Vectors are required for (Relational) projections. We will get to them later.
.
6. Ok, you've explained, You are not translating regular expressions to SQL. (Slog means hard labour).
.
7. What is Category.ParentId ? Parent Category ? That has not come up before.
.
8. Attribute.GeneratedId ?
Please evaluate the Data Model (Updated). I have a few more columns, in addition to what you have in yours. If there is anything you do not understand in the Data Model, ask a specific question. You've read the Notation document, right ?
I have Action as a table, with ItemAction holding the Value:
Equip: increase attack power by 28 is Action.Name=Increase attack power by and ItemAction.Value=28.
I think having the data_type column just further complicates the design. Why not simply have type and description be columns on the items table? It stands to reason that every item would have each of those values, and if it doesn't then a null would do just fine in a text column.
You can even further normalize the type by having an item_types table and the type column in items would be a numeric foreign key to that table. Might not be necessary, but might make it easier to key off of the type on the items table.
Edit:
Thinking about this further, it seems like you may be trying to have your data tables match a domain model. Your items would have a series of attributes on them in the logic of the application. This is fine. Keep in mind that your application logic and your database persistence layout can be different. In fact, they should not rely on each other at all at a design level. In many small applications they will likely be the same. But there are exceptions. Code (presumably object-oriented, but not necessarily) and relational data have different designs and different limitations. De-coupling them from one another allows the developer to take advantage of their designs rather than be hindered by their limitations.
You are dealing with a two common problems:
Entities that are similar to each other but not identical (all items have a name and description, but not necessarily an intellect).
A design in which you need to add attributes once the database is in production (you can pretty easily predict that at some point you'll need to add, for instance, a magic-resistance attribute to some items).
You've solved your problem by reinventing the EAV system in which you store both attribute names and values as data. And you've rediscovered some of the problems with this system (type checking, relational integrity).
In this case, I'd personally go with a solution midway between the relational one and the EAV one. I'd take the common columns and add them, as columns, to either the items table or, if items represents kinds of items, not individual ones, to the items_owners table. Those columns would include description and possibly type and in the example you gave, pretty much match up with the text columns. I'd then keep the existing layout for those attributes that are numerical ratings, making the value type int. That gives you type-checking and proper normalization across the integer attributes (you won't be storing lots of NULLs) at the expense of the occasional NULL type or description.