MySQL - appropriate application of VIEWS or FOREIGN KEYS - mysql

Suppose I have a table that acts as an inventory of my house - inventory_items if you will. inventory_items contains everything I own, but only the most general information (i.e fields that will apply to everything I own, like a name, purchase date).
I then wish to have a separate table for electronics_data which is an inventory item, but has special information to store (lets say serial_number, wattage) and another for furniture_data which contains furniture specific information (number_of_legs, material).
In all instances, items in electronics_data will have a matching item in inventory_items linked by an id field. The same is true of furniture_data.
If I now wish to show a list of my inventory items, but include specific information from the child tables, logically I think to load the inventory_data, find out what type of item this is, and load the right information from the right table. I can think of two better ways:
1) Create a foreign key relationship between inventory_items and electronics_data - thus loading all items will get me all of my child data too. But, not all items in inventory_items will have a matching item in electronics_data so does this mean a foreign key can't work?
2) Create a view which loads the extra tables if a matching item exists in them, and load the view in my application. If I have lots of different 'types' of data, will this make my view unnecessarily slow (checking everything) and actually defeat the object of the view in the first place?
These are general questions - particularly 2) I would imagine is very data dependent.
Thanks!

1) Foreign keys will work, since the specialised tables are the child tables, so you need to make sure that each record in the child table has a corresponding record in the overall inventory_items table. The reverse is not necessarily true.
2) The view can left join the child tables on the inventory_items table. If the fields used in the join are indexed in all tables, then the operation is not that resource intensive. The biggest catch could be how you build the view, if you have lots of specialised child tables. But this is probably a wider application design question anyway (if you are looking at your electronic devices, then you probably do not want to see the fields from the furniture items table - in these specialised views I would use inner join, not left join).

well it will make your life easier if you could join the tables when extracting data. There are a lot of ways to join tables, in your case if all your tables have an I.D column then you could use an 'Equijoin' This is how you could do so
SELECT inventory_items.name, electronics_data.wattage, furniture_data.material
FROM inventory_items, electronics_data, furniture_data
WHERE inventory_items.i.d=electronics_data.i.d=furniture_data.id;
so with a join like this you can add as many columns as you wish but make sure to highlight the table they are from and in the 'WHERE' clause show where they are equal otherwise it wont return any data

I have posted an fairly detailed response to a similar question here, even how to define the views you mention. Note that the code shown in the view definition is for illustration only. It will not show the most efficient way to write it. Better ways should be fairly straight-forward, however.
A word about view performance. Take a view that joins very large tables in such a way that the query
select * from <view>
takes a long time, say 30 minutes. The query
select * from <view> where <criteria>
could take fractions of a second. In most modern DBMSs, the where criteria is merged with the existing query in the view definition to execute the query. It does not execute the view definition and then do the filtering. So test view performance with actual queries not "data dump" queries.

Related

How to set up relational database tables for this many-to-many relationship?

I have a type of data called a chain. Each chain is made up of a specific sequence of another type of data called a step. So a chain is ultimately made up of multiple steps in a specific order. I'm trying to figure out the best way to set this up in MySQL that will allow me to do the following:
Look up all steps in a chain, and get them in the right order
Look up all chains that contain a step
I'm currently considering the following table set up as the appropriate solution:
TABLE chains
id date_created
TABLE steps
id description
TABLE chains_steps (this would be used for joins)
chain_id step_id step_position
In the table chains_steps, the step_position column would be used to order the steps in a chain correctly. It seems unusual for a JOIN table to contain its own distinct piece of data, such as step_position in this case. But maybe it's not unusual at all and I'm just inexperienced/paranoid.
I don't have much experience in all this so I wanted to get some feedback. Are the three tables I suggested the correct way to do this? Are there any viable alternatives and if so, what are the advantages/drawback?
You're doing it right.
Consider a database containing the Employees and Projects tables, and how you'd want to link them in a many-to-many fashion. You'd probably come up with an Assignments table (or Project_Employees in some naming conventions).
At some point you'd decide you want not only to store each project assignment, but you'd also want to store when the assignment started, and when it finished. The natural place to put that is in the assignment itself; it doesn't make sense to store it either with the project or with the employee.
In further designs you might even find it necessary to store further information about the assignment, for example in an employee review process you may wish to store feedback related to their performance in that project, so you'd make the assignment the "one" end of a relationship with a Review table, which would relate back to Assignments with a FK on assignment_id.
So in short, it's perfectly normal to have a junction table that has its own data.
That looks fine, and it's not unusual for the join table to contain a position/rank field.
Look up all steps in a chain, and get them in the right order
SELECT * FROM chains_steps
LEFT JOIN steps ON steps.id = chains_steps.step_id
WHERE chains_steps.chain_id = ?
ORDER BY chains_steps.step_position ASC
Look up all chains that contain a step
SELECT DISTINCT chain_id FROM chains_steps
LEFT JOIN chains ON chains.id = chains_steps.chain_id
I think that the plan you've outlined is the correct approach. Don't worry too much about the presence of step_position on your mapping table. After all the step_position is a bit of data that is directly related to a step in the context of a chain. So the chains_steps table is the right place for it IMHO.
Some things to think about:
Foreign keys - use 'em!
Unique key on the chains_steps table - can a step be present in more than one position in a single chain? What about in different chains?
Good luck!

Best way to do a query with a large number of possible joins

On the project I'm working on we have an activity table and each activity can be linked to one of about 20 different "activity details" tables...
e.g. If the activity was of type "work", then it would have a corresponding activity_details_work record, if it was of type "sick leave" then it would have a corresponding activity_details_sickleave record and so on.
Currently we are loading the activities and then for each activity we have a separate query to go fetch the activity details from the relevant table. This obviously doesn't scale well if you have thousands of activities.
So my initial thought was to have a single query which fetches the activities and joins the details in one go e.g.
SELECT * FROM activity
LEFT JOIN activity_details_1_work ON ...
LEFT JOIN activity_details_2_sickleave ON ...
LEFT JOIN activity_details_3_travelwork ON ...
...etc...
LEFT JOIN activity_details_20_yearleave ON ...
But this will result in each record having 100's of fields, most of which are empty and that feels nasty.
Lazy-loading the details isn't really an option either as the details are almost always requested in the core logic, at least for the main types anyway.
Is there a super clever way of doing this that I'm not thinking of?
Thanks in advance
My suggestion is to define a view for each ActivityType, that is tailored specifically to that activity.
Then add an index on the Activity table lead by the ActivityType field. Cluster said index unless there is an overwhelming need for some other to be clustered (or performance benchmarking shows some other clustering selection to be more performant).
Is there a particular reason why this degree of denormalization was designed in? Is that reason well known?
Chances are your activity tables are like (date_from, date_to, with_who, descr) or something to that effect. As Pieter suggested, consider tossing in a type varchar or enum field in there, so as to deal with a single details table.
If there are rational reasons to keep the tables apart, consider adding triggers that maintain boolean/tinyint fields (has_work, has_sickleave, etc), or a bit string (has_activites_of_type where the first position amounts to has_work, the next to has_sickleave, etc.).
Either way, you'll probably be better off by fetching the activity's details in one or more separate queries -- if only to avoid field name collisions.
I don't think enum is the way to go, because as you say there might be 1000's of activities, then altering your activity table would become an issue.
There is no point doing a left join on a large number of tables either.
So the options that you have are :
See this The first comment might be useful.
I am guessing that your activity table has a field called activity_type_id.
Build a table called activity_types containing fields activity_type_id, activity_name, activity_details_table_name. First query in the following way
activity
inner join
activity_types
using( activity_type_id )
This query gives you the table name on which to query for the details.
This way you can add any new activity type just by adding a row in the activity_types table.

member action table data model suggestion

I'm trying to add an action table, but i'm currently at odds as to how to approach the problem.
Before i go into more detail.
We have members who can do different actions on our website
add an image
update an image
rate an image
post a comment on image
add a blog post
update a blog post
comment on a blog post
etc, etc
the action table allows our users to "Watch" other member's activities if they want to add them to their watch list.
I currently created a table called member_actions with the following columns
[UserID] [actionDate] [actionType] [refID]
[refID] can be a reference either to the image ID in the DB or blogpost ID, or an id column of another actionable table (eg. event)
[actionType] is an Enum column with action names such as (imgAdd,imgUpdate,blogAdd,blogUpdate, etc...)
[actionDate] will decide which records get deleted every 90 days... so we won't be keeping the actions forever
the current mysql query i cam up with is
SELECT act.*,
img.Title, img.FileName, img.Rating, img.isSafe, img.allowComment AS allowimgComment,
blog.postTitle, blog.firstImageSRC AS blogImg, blog.allowComments AS allowBlogComment,
event.Subject, event.image AS eventImg, event.stimgs, event.ends,
imgrate.Rating
FROM member_action act
LEFT JOIN member_img img ON (act.actionType="imgAdd" OR act.actionType="imgUpdate")
AND img.imgID=act.refID AND img.isActive AND img.isReady
LEFT JOIN member_blogpost blog ON (act.actionType="blogAdd" OR act.actionType="blogUpdate")
AND blog.id=act.refID AND blog.isPublished AND blog.isPublic
LEFT JOIN member_event event ON (act.actionType="eventAdd" OR act.actionType="eventUpdate")
AND event.id=act.refID AND event.isPublished
LEFT JOIN img_rating imgrate ON act.actionType="imgRate" AND imgrate.UserID=act.UserID AND imgrate.imgID=act.refID
LEFT JOIN member_favorite imgfav ON act.actionType="imgFavorite" AND imgfav.UserID=act.UserID AND imgfav.imgID=act.refID
LEFT JOIN img_comment imgcomm ON (act.actionType="imgComment" OR act.actionType="imgCommentReply") AND imgcomm.imgID=act.refID
LEFT JOIN blogpost_comment blogcomm ON (act.actionType="blogComment" OR act.actionType="blogCommentReply") AND blogcomm.blogPostID=act.refID
ORDER BY act.actionDate DESC
LIMIT XXXXX,20
Ok so basically, given that i'll be deleting actions older than 90 days every week or so... would it make sense to go with this query for displaying the member action history?
OR should i add a new text column in member_actions table called [actionData] where i can store a few details in json or xml format for fast querying of the member_action table.
It adds to the table size and reduces query complexity, but the table will be purged from periodically from old entries.
the assumption is that eventually we'll have no more than a few 100k members so would i'm concerned about the table size of the member_action table with it's text [actionData] column that will contain some specific details.
I'm leaning towards the [actionData] model but any recommendations or considerations will be appreciated.
another consideration is that it's possible that the table entries for img or blog could get deleted... so i could have action but no reference record...this sure does add to the problem.
thanks in advance
Because you are dealing with user interface issues, performance is key. All the joins will do take time, even with indexes. And, querying the database is likely to lock records in all the tables (or indexes), which can slow down inserts.
So, I lean towards denormalizing the data, by maintaining the text in the record.
However, a key consideration is whether the text can be updated after the fact. That is, you will load the data when it is created. Can it then change? The problem of maintaining the data in light of changes (which could involve triggers and stored procedures) could introduce a lot of additional complexity.
If the data is static, this is not an issue. As for table size, I don't think you should worry about that too much. Databases are designed to manage memory. It is maintaining the table in a page cache, which should contain pages for currently active members. You can always increase memory size, especially for 100,000 users which is well within the realm of today's servers.
I'd be wary of this approach - as you add kinds of actions that you want to monitor the join is going to keep growing (and the sparse extra columns in the select statement as well).
I don't think it would be that scary to have a couple of extra columns in this table - and this query sounds like it would be running fairly frequently, so making it efficient seems like it would be a good idea.

How do I join on the correct one-to-one table (super-type, sub-type model)?

I've been looking into first, second, and third normal forms, and I want to do a better job normalizing my tables. Part of this, I realized, was that I've never understood the purpose of one-to-one tables. From what I understand, "optional" data should be grouped into another table, leaving distinct entities intact, while avoiding the nuances of maintaining several NULL fields in one monolithic table.
So, a real-world scenario. In a CMS, I want to maintain several different types of "pages," making it extendable by additional plugins without affecting the original schema. I have these as sample tables so far:
Pages (title, path, type, etc.)
ContentPages (same as base page, but with keyword/description/content fields)
LinkedPages (same as base page, but contains a reference to another page)
ProductPages (same as base page, but with SKU and other ecomm-related info)
So far, so good. No NULLS. Self-documented design. Super-typing / Sub-typing is consistent between my PHP models and database. Everything's DANDY.
EXCEPT, given any page ID, I don't want to do a first query to get the base page info, figure out what type of page it is, and then get the corresponding sub-type information with another query. Do I have to keep track of this with application state (or URL), or is there a way to know which table to join on, while only knowing the page ID and nothing else?
This is really easy with only one table (obviously), as the NULL fields imply the type, or an ENUM can tell me what it is. Switching back to 1NF isn't an acceptable answer, as I already know how to do it. I want to learn this way ;)
UPDATE: Also wanted to mention that each of the sub-type properties is unique to that type. So, any common property shared by all types will, of course, go into the base page table. Sub-types won't share any other properties. This seemed like a logical way to group the sub-tables, but maybe I'm defeating the purpose of one-to-one tables with this arrangement...
It depends on who's asking the question. If your plugin is driving the query then it can start at its specific subtype and join in the supertype, which it knows must exist.
I don't know what your business requirements are, but it seems to me that if you are trying to keep things modular then you want to drive as many joins from the child side (i.e. the plugin side) as possible.
If you are going to have a query driven from the supertype to the subtype then you can use an outer join and just be ready in your code to handle null columns if the subtype in question isn't present. Obviously that approach is less modular, but I suppose there could be times when that is what you need or want to do.
you could create a view by left outer joining all the subtypes on the main Page table. The view could be queried by a single page_id and would return one row with many null values, the same as you'd get with one big 1st normal form page table.
is there a way to know which table to join on, while only knowing the
page ID and nothing else?
Well, in a supertype/subtype structure, you should know more than the page ID. You should also know the subtype.
Usually, a supertype/subtype structure for 'n' subtypes maps to
n + 1 tables, one for each subtype, plus one for the supertype, and
n updatable views, each of which joins the supertype with the appropriate subtype
So your application should usually be working with the views, not with the base tables. (Usually, but not always.)
If you're not using the views, then when you retrieve the page id numbers from the supertype, you should also retrieve the column that identifies the subtype. Don't have such a column? Fix that. And see this other relevant SO database design problem for a supertype/subtype with code, a description of the structure, and the logic behind it.

MySQL, per found record join a different parent table

I have the following parent <-> child datamodel:
(almost every line is a table, indented means child-of)
consumerGoods
food
meat
item
fruit
item
vegetable
item
The child-items of meat, fruit and vegetables are in the same table (named items) because they have identical attributes. In the items table I have fields that describes the parent and the parentId.
So an item record could be:
id:1
parentType:meat
parentId:4
price:3.25
expDate:2009-12-31
description:bacon
I'm now building a full text MySQL search for the contents of the description field in "items", but I also want each result to have the information of its parent table, so a "bacon-item" has the data that's in its parent record. I also want each returned result to have data that is in the parent food record and the parent consumerGoods record.
I've got the following query now, but I don't know how to join based on the value of a field in a record, or if that's even possible.
SELECT
*
FROM
item
WHERE MATCH
(description
AGAINST
('searchKey')
One way to do this is is to do multiple queries for each matching "item" record, but if I had a lot of results that would be a lot of queries and would also slow down any filtering I'd want to do for facet-based searching. Another option is to make a new table that contains all the parent item info for each item record and search through that, but then I'd have to constantly update that table if I add item records, which is redundant and quite some work.
I'd like to hear it if I'm thinking in the right direction, or if I'm totally misguided. Any suggestions welcome.
As a general rule of thumb your database structure should contain data, but should not itself be data. A sign that you're breaking this is when you feel that you have to join to a different table based on the data you're reading from some other table. At that point you need to back up and consider your overall data model because odds are very good that you're doing something not quite right.
You could join against a subquery containing the union of all parent types:
select *
from item
left join (
select 'meat' as type, Redness, '' as Ripeness from meat
union all
select 'fruit' as type, -1 as Redness, Ripeness from fruit
union all
select 'vegetable' as type, -1 as Redness, Ripeness from vegetable
) parent on parent.type = item.parentType
But if you can, redesign the database. Instead of the complex model, change it to one table of Items and one table of Categories. The categories should contain one row for meat, one for fruit, and one for vegetables.
Since your example is contrived, it's difficult to know what the actual information requirements are in your case. Damir's diagram shows you the correct way to model PKs and FKs when you have a super-type sub-type relationships.
This situation is one case of a pattern called "generalization-specialization". Almost any treatment of object modeling will deal with generalization-specialization, although it may use different terminology. However, if you want to find articles that help you build a relational database that uses specialization-generalization, search for "generalization specialization relational modeling".
The best of the articles will start by teaching you the same concept that Damir's response illustrated for you. From there, you will learn how to create queries and views that can search for either all kinds of items, or for particular kinds of items, if you know what you are searching for.
A sample view follows:
create view FruitItems as
select
c.ConsumerGoodsID,
Price,
Description,
ConsumerGoodType,
ExpiryDate,
FoodType,
IsTropic
from
ConsumerGoods c
INNER JOIN Food f on f.ConsumerGoodsID = c.ConsumerGoodsID
INNER JOIN Fruit fr on fr.ConsumerGoodsID = c.ConsumerGoodsID
Similarly, you could create views for VegetableItems, MeatItems, and HouseSupplyItems, and even one large view, namely Items, that's the union of each of the specialized views.
In the Items view IsTropic would be true for all tropical fruits, false for all non tropical fruits, and null for Meats, Vegetables, and HouseSupplies. I'm not going to show you the entire Item view for a contrived case, but you get the idea. Especially if you read the best of the articles on relational modeling of this pattern.
The Items view might be a little slow, but it could come in handy when you really don't know any better way to search. And if you search for Istropic = True, you'll automatically exclude all the Meats, Vegetables, and HouseSupplies.
As #Andomar suggested, the design is a bit off; having "multiple parent tables" does not map to DB foreign keys concept. Here is one possible suggestion. This one uses two levels of super-type/subtype relationships. Super-type table contains columns specific to all subtypes (categories), while subtype tables contain columns specific only to the category.