MySQL table design - refer to multiple foreign rows - mysql

Minimized background (i.e., in bare pseudo-code detail)
I am making a record-keeping (among other things) PHP/MySQL app for my farm. There are lots of types of animals etc. that could have pictures (or other records, such as videos), but for simplicity I'll refer to just one of each (goats and pictures). Say the tables are approximately like so:
CREATE TABLE BMD_farmrecords_goats (
goat_id INT NOT NULL AUTO_INCREMENT,
goat_name TEXT,
-- ...more columns but whatever, unimportant...
PRIMARY KEY (goat_id)
);
CREATE TABLE BMD_farmrecords_pictures (
media_id INT NOT NULL AUTO_INCREMENT,
media_name TEXT,
media_description TEXT,
media_description_short TEXT,
media_date_upload DATE,
media_date_taken DATE,
media_uploader INT, -- foreign key constrained to user table but unimportant for question
media_type ENUM('jpg','gif','png'),
media_hidden BOOL,
media_category INT, -- foreign key constrained to category table but unimportant for question
PRIMARY KEY (media_id)
);
So the problem(s):
Obviously a picture could have multiple goats in it, so I can't just have one foreign key in the picture table referring to a goat.
There is more than one livestock table, which would also make that a poor choice, but I'm not worried about that right now.
Basically no optimization has been applied yet (i.e., no lengths set, TEXT rather than VARCHAR(length), etc.); I'm not worried about that until I populate the tables and see exactly how long I want everything.
So the question:
What is the best way to link a picture to multiple goats, in terms of (A) best performance and (B) best conformance to standards? I'm thinking I'll have to add an extra table:
CREATE TABLE BMD_farmrecords_goatpictures (
id INT NOT NULL AUTO_INCREMENT,
picture_id INT, -- foreign key to BMD_farmrecords_pictures->media_id
goat_id INT, -- foreign key to BMD_farmrecords_goats->goat_id
PRIMARY KEY (id)
);
So is there any better way to do that?
Of course, with that method I'll probably have to change the *_goats table into a parent *_animals table with a type field, and reference animal_id instead, but I'm not worried about that; I only want to know whether the extra table referencing both tables is the best method.
thanks;

Following the discussion, I'm just changing my original idea to use a composite primary key:
CREATE TABLE BMD_farmrecords_goatpictures (
picture_id INT, -- foreign key to BMD_farmrecords_pictures->media_id
goat_id INT, -- foreign key to BMD_farmrecords_goats->goat_id
PRIMARY KEY (picture_id, goat_id)
);
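As a rough illustration of how the composite-key link table behaves, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL purely so the example is self-contained; the shortened table names, the sample rows, and the helper functions are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
CREATE TABLE goats (goat_id INTEGER PRIMARY KEY, goat_name TEXT);
CREATE TABLE pictures (media_id INTEGER PRIMARY KEY, media_name TEXT);
CREATE TABLE goatpictures (
    picture_id INTEGER NOT NULL REFERENCES pictures (media_id),
    goat_id    INTEGER NOT NULL REFERENCES goats (goat_id),
    PRIMARY KEY (picture_id, goat_id)  -- at most one row per (picture, goat) pair
);
-- Invented sample data: one picture showing two goats.
INSERT INTO goats VALUES (1, 'Daisy'), (2, 'Billy');
INSERT INTO pictures VALUES (10, 'barn.jpg');
INSERT INTO goatpictures VALUES (10, 1), (10, 2);
""")

def goats_in_picture(picture_id):
    """Return the names of all goats linked to one picture."""
    rows = conn.execute(
        "SELECT g.goat_name FROM goatpictures gp "
        "JOIN goats g ON g.goat_id = gp.goat_id "
        "WHERE gp.picture_id = ? ORDER BY g.goat_name",
        (picture_id,),
    )
    return [name for (name,) in rows]

def link_is_rejected(picture_id, goat_id):
    """Try to add a link; report whether a constraint refused it."""
    try:
        conn.execute("INSERT INTO goatpictures VALUES (?, ?)", (picture_id, goat_id))
    except sqlite3.IntegrityError:
        return True
    return False
```

With this layout the composite primary key refuses a repeated (picture, goat) pair, and the foreign keys refuse links to goats or pictures that do not exist.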

Related

MySQL table for single column

This is a question about database design. Say I have several tables, some of which each have a common expiry field.
CREATE TABLE item (
id INT PRIMARY KEY
);
CREATE TABLE coupon (
id INT PRIMARY KEY,
expiry DATE NOT NULL,
FOREIGN KEY (id) REFERENCES item (id)
);
CREATE TABLE subscription (
id INT PRIMARY KEY,
expiry DATE NOT NULL,
FOREIGN KEY (id) REFERENCES item (id)
);
CREATE TABLE product (
id INT PRIMARY KEY,
name VARCHAR(32),
FOREIGN KEY (id) REFERENCES item (id)
);
The expiry column does need to be indexed so I can easily query by expiry.
My question is, should I pull the expiry column into another table like so?
CREATE TABLE item (
id INT PRIMARY KEY
);
CREATE TABLE expiry (
id INT PRIMARY KEY,
expiry DATE NOT NULL
);
CREATE TABLE coupon (
id INT PRIMARY KEY,
expiry_id INT NOT NULL,
FOREIGN KEY (id) REFERENCES item (id),
FOREIGN KEY (expiry_id) REFERENCES expiry (id)
);
CREATE TABLE subscription (
id INT PRIMARY KEY,
expiry_id INT NOT NULL,
FOREIGN KEY (id) REFERENCES item (id),
FOREIGN KEY (expiry_id) REFERENCES expiry (id)
);
CREATE TABLE product (
id INT PRIMARY KEY,
name VARCHAR(32),
FOREIGN KEY (id) REFERENCES item (id)
);
Another possible solution is to pull the expiry into another base "class" table.
CREATE TABLE item (
id INT PRIMARY KEY
);
CREATE TABLE expiring_item (
id INT PRIMARY KEY,
expiry DATE NOT NULL,
FOREIGN KEY (id) REFERENCES item (id)
);
CREATE TABLE coupon (
id INT PRIMARY KEY,
FOREIGN KEY (id) REFERENCES expiring_item (id)
);
CREATE TABLE subscription (
id INT PRIMARY KEY,
FOREIGN KEY (id) REFERENCES expiring_item (id)
);
CREATE TABLE product (
id INT PRIMARY KEY,
name VARCHAR(32),
FOREIGN KEY (id) REFERENCES item (id)
);
Given the nature of databases in that refactoring the table structure is difficult once they are being used, I am having trouble weighing the pros and cons of each approach.
From what I see, the first approach uses the least number of table joins, however, I will have redundant data for each expiring item. The second approach seems good, in that any time I need to add an expiry to an item I simply add a foreign key to that table. But, if I discover expiring items (or a subset of expiring items) actually share another attribute then I need to add another table for that. I like the third approach best, because it brings me closest to an OOP like hierarchy. However, I worry that is my personal bias towards OOP programming, and database tables do not use composition in the same way OOP class inheritance does.
Sorry for the poor SQL syntax ahead of time.
I would stick with the first design: 'redundant' data is still valid data, if only as a record of what was valid at a point in time, and it also allows for renewal with minimum impact. The second option makes little sense because an expiry is an arbitrary value with no real context outside the table that references it; unless it is associated with a coupon or a subscription, it is an orphan value. The third option makes no more sense: at what point does an item become "expiring"? As soon as it is defined? At a set period before expiry? At the end of the day, the expiry is a distinct attribute that happens to have the same name and purpose for both the coupon and the subscription, but the two are not related to each other or, as such, to the item.
Do not normalize "continuous" values such as datetime, float, int, etc. It makes it very inefficient to do any kind of range test on expiry.
Anyway, a DATE takes 3 bytes; an INT takes 4, so the change would increase the disk footprint for no good reason.
So, use the first, not the second. But...
As for the third, you say "expirations are independent", yet you propose having a single expiry?? Which is it??
If they are not independent, then another principle comes into play. "Don't have redundant data in a database." So, if the same expiry really applies to multiple connected tables, it should be in only one of the tables. Then the third schema is the best. (Exception: There may be a performance issue, but I doubt it.)
If there are different dates for coupon/subscription/etc, then you must not use the third.
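To make the range-test point concrete, here is a minimal sketch of the first design with an index on each expiry column and a single query over both tables. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the index names, the sample dates, and the `expiring_before` helper are assumptions made for the illustration, and dates are stored as ISO-8601 text because SQLite has no native DATE type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- expiry would be a DATE column in MySQL; ISO text sorts the same way here.
CREATE TABLE coupon (id INTEGER PRIMARY KEY, expiry TEXT NOT NULL);
CREATE TABLE subscription (id INTEGER PRIMARY KEY, expiry TEXT NOT NULL);
CREATE INDEX idx_coupon_expiry ON coupon (expiry);
CREATE INDEX idx_subscription_expiry ON subscription (expiry);
-- Invented sample rows.
INSERT INTO coupon VALUES (1, '2024-01-15'), (2, '2024-06-30');
INSERT INTO subscription VALUES (3, '2024-02-01'), (4, '2025-01-01');
""")

def expiring_before(cutoff):
    """List (kind, id, expiry) for everything that expires before the cutoff."""
    rows = conn.execute(
        "SELECT 'coupon' AS kind, id, expiry FROM coupon WHERE expiry < ? "
        "UNION ALL "
        "SELECT 'subscription', id, expiry FROM subscription WHERE expiry < ? "
        "ORDER BY expiry",
        (cutoff, cutoff),
    )
    return rows.fetchall()
```

Each branch of the UNION ALL compares the expiry column directly against the cutoff, which is the kind of range test that an extra lookup table for the date would make harder.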

What would be the best table structure for variable amount of combination?

I need some advice for the choice of my table structure.
I am working on a project where I need to save values that are a combination of a variable amount of other values.
For example:
A = b,c,d
B = z,r
I was thinking of saving the combinations in a JSON object inside a column, but I am afraid it will be slow for big requests and not easy to filter.
Another option is a fixed set of columns (containing NULL when not needed), but that would not be a good representation of the data, and filtering would also be hard.
Finally, I thought the best option would be many-to-many relations, but the joins might be too heavy. Are they?
Do you see any other alternative (besides switching to NoSQL)?
This shows the use of junction tables to avoid saving data in comma-separated lists, JSON, or other mechanisms that would be problematic in at least these areas:
Table scans (slowness, non-use of fast indexes)
Maintenance of data
Data integrity
Schema
create table cat
( -- categories
id int auto_increment primary key,
code varchar(20) not null,
description varchar(255) not null
);
create table subcat
( -- sub categories
id int auto_increment primary key,
code varchar(20) not null,
description varchar(255) not null
);
create table csJunction
( -- JUNCTION table for cat / sub categories
-- Note: you could ditch the id below, and go with composite PK on (catId,subCatId)
-- but this makes the PK (primary key) thinner when used elsewhere
id int auto_increment primary key,
catId int not null,
subCatId int not null,
CONSTRAINT fk_csj_cat FOREIGN KEY (catId) REFERENCES cat(id),
CONSTRAINT fk_csj_subcat FOREIGN KEY (subCatId) REFERENCES subcat(id),
unique key (catId,subCatId) -- prevents duplicates
);
insert cat(code,description) values('A','descr for A'),('B','descr for B'); -- id's 1,2 respectively
insert subcat(code,description) values('b','descr for b'),('c','descr for c'),('d','descr for d'); -- id's 1,2,3
insert subcat(code,description) values('r','descr for r'),('z','descr for z'); -- id's 4,5
-- Note that due to the thinness of PK's, chosen for performance, the below is by ID
insert csJunction(catId,subCatId) values(1,1),(1,2),(1,3); -- A gets b,c,d
insert csJunction(catId,subCatId) values(2,4),(2,5); -- B gets r,z
Good Errors
The following errors are good and expected, data is kept clean
insert csJunction(catId,subCatId) values(2,4); -- duplicates not allowed (Error: 1062)
insert csJunction(catId,subCatId) values(13,4); -- junk data violates FK constraint (Error: 1452)
Other comments
In response to your comments, data is cached only in so far as mysql has a Most Recently Used (MRU) strategy, no more or less than any data cached in memory versus physical lookup.
The fact that B may contain not only z,r at the moment, but it could also contain c as does A, does not mean there is a repeat. And as seen in the schema, no parent can duplicate its containment (or repeat) of a child, which would be a data problem anyway.
Note that one could easily go the route of PK's in cat and subcat using the code column. That would unfortunately cause wide indexes, and even wider composite indexes for the junction table. That would slow operations down considerably. Though the data maintenance could be visually more appealing, I lean toward performance over appearance any day.
I will add to this Answer when time permits to show such things as "What categories contain a certain subcategory", deletes, etc.
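The "what categories contain a certain subcategory" lookup mentioned above could look roughly like the following. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, trims the tables to the columns the query needs, and adds one hypothetical junction row (B also containing c) so the result has more than one category. The `categories_containing` helper is likewise an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cat (id INTEGER PRIMARY KEY, code TEXT NOT NULL);
CREATE TABLE subcat (id INTEGER PRIMARY KEY, code TEXT NOT NULL);
CREATE TABLE csJunction (
    id INTEGER PRIMARY KEY,
    catId INTEGER NOT NULL REFERENCES cat (id),
    subCatId INTEGER NOT NULL REFERENCES subcat (id),
    UNIQUE (catId, subCatId)  -- prevents duplicates
);
INSERT INTO cat (code) VALUES ('A'), ('B');                         -- ids 1,2
INSERT INTO subcat (code) VALUES ('b'), ('c'), ('d'), ('r'), ('z'); -- ids 1..5
INSERT INTO csJunction (catId, subCatId) VALUES (1,1),(1,2),(1,3),(2,4),(2,5);
INSERT INTO csJunction (catId, subCatId) VALUES (2,2);  -- hypothetical: B also gets c
""")

def categories_containing(subcat_code):
    """Return the codes of all categories that contain the given subcategory."""
    rows = conn.execute(
        "SELECT c.code FROM csJunction j "
        "JOIN cat c ON c.id = j.catId "
        "JOIN subcat s ON s.id = j.subCatId "
        "WHERE s.code = ? ORDER BY c.code",
        (subcat_code,),
    )
    return [code for (code,) in rows]
```

The query filters on the subcategory code and walks the junction table back to the categories, so both joins run over the narrow integer keys.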

Data Model, alternative to EAV

I have a product database, and I want to attach text, images, and videos to the products. I also want each entity (text, image, or video) to have a tag, for further organisation in the application.
I thought of using this model:
Content:
content_id|content_product_id|content_type|content_tag_id|content_url|content_title|content_text
Tag
tag_id|tag_name
This means using an Entity (content_product_id) - Attribute (content_tag_id) - Value (content_url or content_title|content_text) model.
After reading a lot, I understood that it is a bad idea to use this modeling pattern (it is described as a database antipattern that is unscalable and causes performance issues). Do you have an idea for an alternative?
I want to use Doctrine ORM, and I would like to find a method that will be easily compatible with that data mapper.
I'd create a general table for any type of content:
CREATE TABLE ProductContents(
content_id INT AUTO_INCREMENT PRIMARY KEY,
content_type INT NOT NULL
-- other general attributes like when it was created, by whom, etc.
);
For each text, image, or video, insert one row into this table. If you use an auto-increment primary key, this table is responsible for generating the id number.
For tags, now you simply have a many-to-many relationship between ProductContent and Tags. This is represented by an intersection table.
CREATE TABLE Tags (
tag_id INT AUTO_INCREMENT PRIMARY KEY,
tag TEXT NOT NULL
);
CREATE TABLE ProductContentTagged (
content_id INT,
tag_id INT,
PRIMARY KEY (content_id, tag_id),
FOREIGN KEY (content_id) REFERENCES ProductContents(content_id),
FOREIGN KEY (tag_id) REFERENCES Tags(tag_id)
);
Then if you have any attributes specific to each type of content, create auxiliary tables for each type, with a one-to-one relationship to the content table.
CREATE TABLE ProductContentTexts (
content_id INT PRIMARY KEY,
content TEXT NOT NULL,
FOREIGN KEY (content_id) REFERENCES ProductContents(content_id)
);
CREATE TABLE ProductContentImages (
content_id INT PRIMARY KEY,
image_path TEXT NOT NULL,
FOREIGN KEY (content_id) REFERENCES ProductContents(content_id)
);
CREATE TABLE ProductContentVideos (
content_id INT PRIMARY KEY,
video_path TEXT NOT NULL,
FOREIGN KEY (content_id) REFERENCES ProductContents(content_id)
);
Note these auxiliary tables don't have an auto-increment column. They don't need to -- they will always use the value that was generated by the ProductContents table, and you're responsible for inserting that value.
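The insert order described above (parent row first, then the auxiliary row reusing the generated id) might be sketched like this. The example uses Python's built-in sqlite3 module as a stand-in for MySQL; `IMAGE_TYPE` and `add_image` are hypothetical names, and `lastrowid` plays the role that `LAST_INSERT_ID()` would play in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
CREATE TABLE ProductContents (
    content_id INTEGER PRIMARY KEY AUTOINCREMENT,
    content_type INTEGER NOT NULL
);
CREATE TABLE ProductContentImages (
    content_id INTEGER PRIMARY KEY REFERENCES ProductContents (content_id),
    image_path TEXT NOT NULL
);
""")

IMAGE_TYPE = 2  # hypothetical type code; the source does not define the values

def add_image(image_path):
    """Insert the general row, then the image row that shares its id."""
    with conn:  # both inserts commit together, or both roll back
        cur = conn.execute(
            "INSERT INTO ProductContents (content_type) VALUES (?)",
            (IMAGE_TYPE,),
        )
        content_id = cur.lastrowid  # id generated by the parent table
        conn.execute(
            "INSERT INTO ProductContentImages (content_id, image_path) VALUES (?, ?)",
            (content_id, image_path),
        )
    return content_id
```

Wrapping both statements in one transaction is a design choice of this sketch, so that a general row never exists without its type-specific counterpart.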
Bill Karwin's answer is very good.
However, since you say:
I want to use Doctrine ORM, and I would like to find a method that will be easily compatible with that data mapper
I'll relate his answer to that particular ORM.
What Bill describes is inheritance. You have a superclass of "content", represented by a table that holds all the shared data. Then you have subclasses (text, image, video) that extend that superclass by adding content-type-specific columns.
Doctrine2 will do essentially what Bill has suggested when you use class-table inheritance. Once you configure your entities properly, it will create a set of tables very similar to what Bill describes.
So, with Doctrine you have the Content entity, which is extended by Image, Text, and Video.
As far as the tagging goes, you would just create a basic Tag entity, and Content would have a ManyToMany relationship to Tag. Doctrine will handle creating the intermediate table for you.

mysql - referencing one foreign key to multiple possible primary keys

I'd like to set up the following database scenario:
CREATE TABLE IF NOT EXISTS `points` (
`po_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`po_north` INT,
`po_east` INT,
PRIMARY KEY (`po_id`)
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `lines`(
`li_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`li_from` INT UNSIGNED NOT NULL,
`li_to` INT UNSIGNED NOT NULL,
PRIMARY KEY (`li_id`),
FOREIGN KEY (`li_from`) REFERENCES points(`po_id`),
FOREIGN KEY (`li_to`) REFERENCES points(`po_id`)
) ENGINE=InnoDB;
Now I want to set up a third table that stores some metadata, like who created or altered a point or a line:
CREATE TABLE IF NOT EXISTS `metadata` (
`me_type` ENUM('point','line') NOT NULL,
`me_type_id` INT UNSIGNED NOT NULL,
`me_permissions` VARCHAR(255) NOT NULL,
`me_created_by` INT UNSIGNED NOT NULL,
`me_created_on` DATETIME NOT NULL,
`me_last_modified_by` INT UNSIGNED NOT NULL,
`me_last_modified_on` DATETIME NOT NULL
) ENGINE=InnoDB;
My first approach was to use an ENUM with two types (point and line). But the problem remains that I cannot properly declare a foreign key that references one of the two tables. Is there a recommended solution for this problem in MySQL?
BTW:
The fields me_created_by and me_last_modified_by shall reference a table storing some user data.
Your case appears to be yet another instance of the design pattern known as "generalization specialization" or perhaps "table design for class inheritance".
If you think of points and lines as classes of objects, they are both subclasses of some more general class of objects. I'm not sure what name to give the superclass in this case. Here's one of several previous questions that address the same issue.
Extending classes in the database
Fowler gives an extensive treatment of the subject. Your case has an added wrinkle, because you are dealing with metadata. But that need not alter the design. You need a third table, which I'll call "Items" for lack of a better term. The key, "it_id" would be assigned an auto number, and you would add an item every time you add either a point or a line. The two columns "po_id" and "li_id" would not be assigned an auto number. Instead they would be foreign keys, referencing "it_id" in the Items table.
The references to points or lines in the metadata table would then be references to "items" and you could use that information to find information about points or lines as the case may be.
How helpful this is depends on what you are trying to do with the metadata.
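A minimal sketch of the "Items" supertype idea follows, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The `it_type` column, the `me_item_id` column name, the trimmed metadata table, and the `add_point` helper are assumptions added for the illustration; only `it_id`, `po_id`, and the overall shape come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
CREATE TABLE items (
    it_id INTEGER PRIMARY KEY AUTOINCREMENT,
    it_type TEXT NOT NULL CHECK (it_type IN ('point', 'line'))  -- assumed column
);
CREATE TABLE points (
    po_id INTEGER PRIMARY KEY REFERENCES items (it_id),  -- no auto number here
    po_north INTEGER,
    po_east INTEGER
);
CREATE TABLE metadata (
    me_item_id INTEGER PRIMARY KEY REFERENCES items (it_id),
    me_created_by INTEGER NOT NULL
);
""")

def add_point(north, east, created_by):
    """Create the item first, then the point and its metadata under the same id."""
    with conn:  # all three inserts commit together, or none do
        it_id = conn.execute(
            "INSERT INTO items (it_type) VALUES ('point')"
        ).lastrowid
        conn.execute("INSERT INTO points VALUES (?, ?, ?)", (it_id, north, east))
        conn.execute("INSERT INTO metadata VALUES (?, ?)", (it_id, created_by))
    return it_id
```

A lines table would follow the same pattern as points, with `li_id` referencing `items (it_id)`, so the metadata table needs only the one ordinary foreign key.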
Your tables points and lines should contain a foreign key to metadata – not the other way around. Doing so will save you from defining any more complicated table setups. Using this approach, a single metadata-entry could be re-used several times for many different points or lines. This isn't even MySQL specific but a general, normalized database structure.
You can do this using a trigger: define a trigger that creates the reference key for either the point or the line, based on the respective table, before you insert a record.

How would I create this MySQL Schema?

Suppose I have a blog post entity.
It has many attributes
It has comments attached to it.
It has many states (deleted/locked/invisible, etc).
It has many "tags". (keywords, school_id, user_id)
Obviously, comments should be its own table, with a many-to-one relationship to Blog table.
But what about "states" or "tags"? Would you put that in another table? Or would you stick that in many columns?
What about attributes...if they get too big? Because as my website grows, the blog post will have more and more attributes attached (title, author, blah, blah....). What happens if the attribute list goes as high as 100?
Here's a sample:
Again.. It's just a sample.. There are other approaches that you can use.
Here we go:
-- basic-basic blog
CREATE TABLE blog_entry (
blog_entry_id INT NOT NULL AUTO_INCREMENT,
blog_entry_title VARCHAR(255) NOT NULL,
blog_entry_text VARCHAR(4000) NOT NULL,
create_date DATETIME,
state_id INT,
PRIMARY KEY (blog_entry_id)
);
-- create a look-up table for your blog entry's state
CREATE TABLE be_state (
state_id INT NOT NULL AUTO_INCREMENT,
name CHAR(30) NOT NULL,
PRIMARY KEY (state_id)
);
-- create a look-up table for your blog entry's tag/s
CREATE TABLE be_tag (
tag_id INT NOT NULL AUTO_INCREMENT,
name CHAR(30) NOT NULL,
PRIMARY KEY (tag_id)
);
-- a table to store multiple tags to one entry
CREATE TABLE blog_entry_tags (
blog_entry_id INT NOT NULL,
tag_id INT NOT NULL,
PRIMARY KEY (blog_entry_id, tag_id)
);
-- a table to store definitions of attributes
CREATE TABLE be_attribute (
attribute_id INT NOT NULL AUTO_INCREMENT,
name CHAR(30),
PRIMARY KEY (attribute_id)
);
-- now have a table to which you can assign multiple attributes to one blog
-- of course, this is if I understand you correctly
-- where you want to have additional attributes
-- aside from the basic properties of a blog entry
-- and will allow you, if you choose to do it
-- to not necessarily have all attributes for each entry
CREATE TABLE blog_entry_attributes (
blog_entry_id INT NOT NULL,
attribute_id INT NOT NULL,
PRIMARY KEY (blog_entry_id, attribute_id)
-- PK enforces one blog entry may have only one attribute of its type
-- meaning, no multiple attributes of 'location' attribute,
-- for example, for one blog. Unless of course you wrote half the entry
-- in one location and finished it in the next.. then you should
-- NOT enforce this primary key
);
blog_entry - your main table, where the goods go
be_state - define them here, and insert their state_id values in blog_entry.state_id
be_tag - have multiple tags like we do here
blog_entry_tags - since you can have many tags for one blog entry, store them here by inserting blog_entry.blog_entry_id and the corresponding be_tag.tag_id together. Each tag can be applied once per blog entry, meaning you can't tag entry #1 (for example) with the tag php twice or more.
be_attribute - store attribute definitions here like location, author, etc
blog_entry_attributes - similar to blog_entry_tags where you can assign one or more than one be_attribute to a blog entry.
Again, this is just one approach.
First of all, states should be a tightly structured thing, so you should create separate columns for them. Think about what you need at the beginning; you can easily add one or two more columns later.
Tags such as keywords shouldn't be stored in columns, because their number grows rapidly over time. For those, build a table with id and keyword in it, and a link table with post_id and keyword_id. You could also omit the keyword_id and link post_id and keyword directly.
Make sure that both columns combined define the primary key, so you cannot end up with a keyword stored several times for one particular post.
Attributes can be handled the same way. It is not bad practice to create an attribute table with attribute_id, attribute_name, and maybe more information, plus a link table with attribute_id, post_id, and content.
You can also easily extend it to be multilingual by using attribute_ids.
Comments are the same: stored in a separate table with a link to a user and a post (comment_id, user_id, post_id, content), and maybe a parent_id, which can be a comment_id if you want comments to be commentable again.
That's it for a brief overview.
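For the commentable-comments idea, one possible sketch is a self-referencing parent_id column walked with a recursive query. The example uses Python's built-in sqlite3 module as a stand-in for MySQL (recursive common table expressions are available in MySQL 8.0 and later); the sample rows and the `thread` helper are assumptions, and the posts and users tables are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    parent_id INTEGER REFERENCES comments (comment_id),  -- NULL for top-level
    content TEXT NOT NULL
);
-- Invented sample thread on one post.
INSERT INTO comments VALUES
    (1, 100, 7, NULL, 'First!'),
    (2, 100, 8, 1, 'Reply to first'),
    (3, 100, 9, 2, 'Reply to the reply'),
    (4, 100, 7, NULL, 'Another top-level comment');
""")

def thread(root_id):
    """Return (comment_id, depth) for a comment and all of its nested replies."""
    rows = conn.execute(
        "WITH RECURSIVE t (comment_id, depth) AS ("
        "  SELECT comment_id, 0 FROM comments WHERE comment_id = ? "
        "  UNION ALL "
        "  SELECT c.comment_id, t.depth + 1 "
        "  FROM comments c JOIN t ON c.parent_id = t.comment_id"
        ") SELECT comment_id, depth FROM t ORDER BY depth, comment_id",
        (root_id,),
    )
    return rows.fetchall()
```

The depth value is only there to show nesting; an application could equally fetch all comments for a post and build the tree in code.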