Implications of Supertype and Subtype - mysql

Is it bad to implement supertype and subtype to the entire data in a database? I need some advice on this before moving to this direction...
For instance,
I have these tables as objects and they are related,
users
pages
images
entities table as the supertype
entity_id entity_type
1 page
2 page
3 user
4 user
5 image
6 image
users table
user_id entity_id
1 3
2 4
pages table
page_id entity_id
1 1
2 2
images table
image_id entity_id
1 5
2 6
Here is the table that maps the images table to the entities table, because some images belong to certain pages (and maybe to blog posts, etc. in the future):
map_entity_image table
entity_id image_id
1 1
1 2
So I will insert a row into the entities table whenever a page, an image, a user, etc. is created.
At the end of the day, the number of rows in this table will grow very large. My worry is whether it can cope with that many rows. Will the database get slower and slower over time?
After all, is this a bad structure?
Or maybe I am doing supertype/subtype incorrectly?
edit:
I think the entity should have these data only,
entity_id entity_type
1 page
2 page
unless I want to attach images to users, etc. then it should be like this,
entity_id entity_type
1 page
2 page
3 user
4 user
maybe I am wrong...
EDIT:
So this is the query I use to find out how many images are attached to page id 1:
SELECT E.*, P.*, X.*, C.*
FROM entities E
LEFT JOIN pages P ON (P.entity_id = E.entity_id)
LEFT JOIN map_entity_image X ON (X.entity_id = E.entity_id)
LEFT JOIN images C ON (C.image_id = X.image_id)
WHERE P.page_id = 1
returns 2 images.

If all you need is to attach images to users and pages, I'm not sure a full-blown category (aka. "subclass", "subtype", "inheritance") hierarchy would be optimal.
Assuming pages/users can have multiple images, and any given image can be attached to multiple pages/users, and assuming you don't want to attach images to images, your model should probably look like this:
You could use category hierarchy to achieve similar result...
...but with so few subclasses I'd recommend against it (due to potential maintainability and performance issues). On the other hand, if there is a potential for adding new subclasses in the future, this might actually be the right solution (ENTITY_IMAGE will automatically "cover" all these new subclasses, so you don't need to introduce a new "link" table for each and every one of them).
BTW, there are 3 major ways to implement the category hierarchy, each with its own set of tradeoffs.
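A minimal sketch of the first option (one link table per owner type). The table names page_image and user_image are assumptions, not names from the original post, and the SQL is run against SQLite through Python's sqlite3 module rather than MySQL, so details such as column types will differ:

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users  (user_id  INTEGER PRIMARY KEY);
CREATE TABLE pages  (page_id  INTEGER PRIMARY KEY);
CREATE TABLE images (image_id INTEGER PRIMARY KEY);

-- One link table per owner type; the composite key blocks duplicate attachments.
CREATE TABLE page_image (
    page_id  INTEGER NOT NULL REFERENCES pages(page_id),
    image_id INTEGER NOT NULL REFERENCES images(image_id),
    PRIMARY KEY (page_id, image_id)
);
CREATE TABLE user_image (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    image_id INTEGER NOT NULL REFERENCES images(image_id),
    PRIMARY KEY (user_id, image_id)
);

INSERT INTO pages  VALUES (1), (2);
INSERT INTO users  VALUES (1), (2);
INSERT INTO images VALUES (1), (2);
INSERT INTO page_image VALUES (1, 1), (1, 2);
""")

# Count the images attached to page 1 without touching a supertype table.
image_count = conn.execute(
    "SELECT COUNT(*) FROM page_image WHERE page_id = ?", (1,)
).fetchone()[0]
print(image_count)
```

The trade-off matches the answer's point: every new owner type (blog posts, say) needs its own link table here, whereas a single ENTITY_IMAGE table would cover it automatically.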

Not exactly an answer to your question, but, what you are describing is not what most modelers would refer to as a "supertype".
This is analogous to super/sub classes in OOP. The supertype is a generic entity, and the subtype is a more specialized version of the generic entity.
The classic example is vehicles. A "vehicle" has a common set of attributes like "owner", "price", "make", "model". It doesn't matter whether it's a car, a bicycle or a boat. However, cars have "wheels", "doors", "engine-size" and "engine-type"; bicycles have "number-of-gears" and "terrain-type" (BMX, road etc.); and boats have "propellers", "sails" and "cabins".
There are two ways of implementing this.
Firstly there is a "rollup": you have one table which holds all the common attributes for a "vehicle" plus optional attributes for each type of vehicle.
Secondly there is a "rolldown": you have one table which holds only the common attributes for every vehicle, and one table for each vehicle type to hold the attributes specific to "cars", "bicycles" and "boats".
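The two layouts can be sketched side by side. This is only an illustration: the table names (vehicle_rollup, vehicle, car, bicycle), the sample rows, and the reduced set of columns are all assumptions, and the SQL runs on SQLite via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "Rollup": one table; subtype-specific columns are NULL where they do not apply.
CREATE TABLE vehicle_rollup (
    vehicle_id      INTEGER PRIMARY KEY,
    vehicle_type    TEXT NOT NULL,
    owner           TEXT NOT NULL,
    doors           INTEGER,   -- cars only
    number_of_gears INTEGER    -- bicycles only
);

-- "Rolldown": common attributes in one table, one extra table per subtype.
CREATE TABLE vehicle (
    vehicle_id INTEGER PRIMARY KEY,
    owner      TEXT NOT NULL
);
CREATE TABLE car (
    vehicle_id INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id),
    doors      INTEGER NOT NULL
);
CREATE TABLE bicycle (
    vehicle_id      INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id),
    number_of_gears INTEGER NOT NULL
);

INSERT INTO vehicle_rollup VALUES
    (1, 'car', 'Ann', 4, NULL),
    (2, 'bicycle', 'Bob', NULL, 21);
INSERT INTO vehicle VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO car     VALUES (1, 4);
INSERT INTO bicycle VALUES (2, 21);
""")

# The same question ("which cars, and how many doors?") in each layout.
rollup_cars = conn.execute(
    "SELECT owner, doors FROM vehicle_rollup WHERE vehicle_type = 'car'"
).fetchall()
rolldown_cars = conn.execute(
    "SELECT v.owner, c.doors FROM vehicle v JOIN car c ON c.vehicle_id = v.vehicle_id"
).fetchall()
```

The rollup needs no join but cannot declare subtype columns NOT NULL; the rolldown can, at the cost of one join per subtype.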

Related

Split similar data into two tables?

I have two sets of data that are near identical, one set for books, the other for movies.
So we have things such as:
Title
Price
Image
Release Date
Published
etc.
The only difference between the two sets of data is that Books have an ISBN field and Movies has a Budget field.
My question is, even though the data is similar should both be combined into one table or should they be two separate tables?
I've looked on SO at similar questions but am asking because most of the time my application will need to get a single list of both books and movies. It would be rare to get either books or movies. So I would need to lookup two tables for most queries if the data is split into two tables.
Doing this -- cataloging books and movies -- perfectly is the work of several lifetimes. Don't strive for perfection, because you'll likely never get there. Take a look at Worldcat.org for excellent cataloging examples. Just two:
https://www.worldcat.org/title/coco/oclc/1149151811
https://www.worldcat.org/title/designing-data-intensive-applications-the-big-ideas-behind-reliable-scalable-and-maintainable-systems/oclc/1042165662
My suggestion: Add a table called metadata. your titles table should have a one-to-many relationship with your metadata table.
Then, for example, titles might contain
title_id | title | price | release
103 | Designing Data-Intensive Applications | 34.96 | 2017
104 | Coco | 34.12 | 2017
Then metadata might contain
metadata_id | title_id | key | value
1 | 103 | ISBN-13 | 978-1449373320
2 | 103 | ISBN-10 | 1449373320
3 | 104 | budget | USD175000000
4 | 104 | EIDR | 10.5240/EB14-C407-C74B-C870-B5B6-C
5 | 104 | Sound Designer | Barney Jones
Then, if you want to get items with their ISBN-13 values (I'm not familiar with IBAN, but I guess that's the same sort of thing) you do this
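-- The join conditions must use the alias (isbn13), not the base table name.
-- KEY is a reserved word in MySQL, so quote that column name there.
SELECT titles.*, isbn13.value AS isbn13
FROM titles
LEFT JOIN metadata AS isbn13 ON titles.title_id = isbn13.title_id
AND isbn13.key = 'ISBN-13'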
This is a good way to go because it's future-proof. If somebody turns up tomorrow and wants, let's say, the name of the most important character in the book or movie, you can add it easily.
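A runnable sketch of the titles/metadata idea, using SQLite through Python's sqlite3 module. The column names meta_key and meta_value are assumptions chosen to sidestep reserved words, and only a subset of the sample rows is loaded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (
    title_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    price    REAL
);
-- One row per (title, attribute); new attribute kinds need no schema change.
CREATE TABLE metadata (
    metadata_id INTEGER PRIMARY KEY,
    title_id    INTEGER NOT NULL REFERENCES titles(title_id),
    meta_key    TEXT NOT NULL,
    meta_value  TEXT NOT NULL
);
INSERT INTO titles VALUES
    (103, 'Designing Data-Intensive Applications', 34.96),
    (104, 'Coco', 34.12);
INSERT INTO metadata VALUES
    (1, 103, 'ISBN-13', '978-1449373320'),
    (2, 103, 'ISBN-10', '1449373320'),
    (3, 104, 'budget',  'USD175000000');
""")

# LEFT JOIN keeps every title; titles without an ISBN-13 get NULL (None).
rows = conn.execute("""
    SELECT t.title, isbn13.meta_value AS isbn13
    FROM titles AS t
    LEFT JOIN metadata AS isbn13
           ON isbn13.title_id = t.title_id
          AND isbn13.meta_key = 'ISBN-13'
    ORDER BY t.title_id
""").fetchall()
```

Note the cost of this flexibility: every value is stored as text, so the database cannot check that a budget is numeric or that an ISBN is well formed.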
The only difference between the two sets of data is that Books have an
IBAN field and Movies has a Budget field.
Are you sure that the difference you have now will not grow into other differences that you may have to take into account in the future?
Are you sure that you will not have to deal with any other type of entity (other than books and movies) in the future, which would complicate things?
If the answer to both questions is "Yes" then you could use 1 table.
But if I had to design this, I would keep a separate table for each entity.
If needed, it's easy to combine their data in a View.
What is not easy, is to add or modify columns in a table, even naming them, just to match requirements of 2 or more entities.
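As a rough illustration of the view approach, assuming hypothetical books and movies tables and a view named catalog (none of these names come from the question), run on SQLite via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books  (book_id  INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL, isbn TEXT);
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL, budget INTEGER);

-- The view exposes only the shared columns plus a type discriminator.
CREATE VIEW catalog AS
    SELECT 'book'  AS item_type, book_id  AS item_id, title, price FROM books
    UNION ALL
    SELECT 'movie' AS item_type, movie_id AS item_id, title, price FROM movies;

INSERT INTO books  VALUES (1, 'Designing Data-Intensive Applications', 34.96, '978-1449373320');
INSERT INTO movies VALUES (1, 'Coco', 34.12, 175000000);
""")

# One query returns the combined list the application usually needs.
catalog = conn.execute(
    "SELECT item_type, title FROM catalog ORDER BY title"
).fetchall()
```

Because book_id and movie_id are independent sequences, item_id is only unique together with item_type.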
You must be very sure about future requests/features for your application.
I can't imagine what kind of books linked to movies you store, since many movies have different titles from the books they are based on. Example: 25 films that changed the name.
If you are sure that the data will always be identical for a book and its movie, you can create a new table, for example Productions, and store the attributes Title, Price, Image, Release Date, and Published there. Your Books and Movies tables can then store a foreign key to the Productions entity.
If something unexpected happens in the future, you will need to rebuild the structure or change your assumptions. Even so, it will be easier with the Productions entity: you just create a new row with the modified values and assign it to the selected Book or Movie.
A single table for both books and movies is the worst solution, because if one of the attributes diverges you will have to add a new row, and you will end up with one row for the first set (real book and non-existing movie) and another for the second set (non-existing book and real movie).
Of course, all of this assumes there may be changes in the future. If you are 100% sure there won't be, one table is enough, though it is not correct from a database normalization perspective.
I would personally create separate tables for books and movies.

Bill of Materials: One table for everything, or a table for each sub-level?

I am working with a client in manufacturing whose products are configurations of the same bunch of parts. I am creating a database that holds all valid products and their Bill of Materials. I need help on deciding a Bill Of Material schedule to implement.
The obvious solution is a many-to-many relationship with a junction table:
Table 1: Products
Table 2: Parts
Junction Table: products, parts, part quantities
However, there are multiple levels in my client's product;
-Assembly
-Sub-Assembly
-Component
-Part
and items from lower levels are allowed to be associated with any upper level item;
Assembly |Sub-assembly
Assembly |Component
Assembly |Part
Sub-Assembly |Component
Sub-Assembly |Part
Component |Part
and I suspect the client will want to add more levels in the future when new product lines are added.
Correct me if I am wrong, but I believe the above relation schedule would demand a growing integer sequence of junction tables and queries (0+1+1+2+3...) to display and export the full Bill of Materials which may eventually affect performance.
Someone suggested to put everything in one table:
Table 1: Assemblies, sub-assemblies, components, parts, etc...
Junction table: Children and Parents
This only requires one junction table to create infinite levels of many-to-many relationships. I don't know if I trust this solution, but I can't think of any issues other than accidentally making an item its own parent and creating an infinite loop and that it sounds disorganized.
I lack the experience to determine whether either or neither of these models will work for my client. I am sketching these models in MS Access, but I am open to moving this project to a more powerful platform if necessary. Any input is appreciated. Thank you.
-M
What you are describing is a hierarchy. As such it should take the form:
part_hierarchy:
part_id | parent_part_id | other | attributes | of | this | relationship
So part_id 1 may have a parent part_id 10 ("component"), which may in turn have a parent_part_id (when looked up itself in this table) of 12 ("Assembly"). It would look like:
part_id | parent_part_id
1 | 10
10 | 12
and parts table:
part_id | description
1 | widget
10 | widget component
12 | aircraft carrier
That's a little simplified since it doesn't take into account your product/part relationship, but it will all fit together using this methodology.
Nice and simple. Now it doesn't matter how deep the hierarchy goes. It's still just two columns (and any extra columns needed for attributes of this relationship, like create_date, last_changed_by_user, etc.).
I would suggest something more powerful than Access though, since it lacks the ability to pick apart a hierarchy using a recursive CTE, something that comes with SQL Server, Postgres, Oracle, and the like.
I would 100% avoid any schema that requires you to add more fields or tables as the hierarchy becomes deeper and more complex. That is a path that leads towards pain and regret.
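A minimal sketch of walking the part_hierarchy table with a recursive CTE, using the sample rows above. It runs on SQLite through Python's sqlite3 module; the CTE name bom, the depth column, and the constraints are assumptions added for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (part_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
CREATE TABLE part_hierarchy (
    part_id        INTEGER NOT NULL REFERENCES parts(part_id),
    parent_part_id INTEGER NOT NULL REFERENCES parts(part_id),
    PRIMARY KEY (part_id, parent_part_id),
    CHECK (part_id <> parent_part_id)   -- a part cannot be its own direct parent
);
INSERT INTO parts VALUES (1, 'widget'), (10, 'widget component'), (12, 'aircraft carrier');
INSERT INTO part_hierarchy VALUES (1, 10), (10, 12);
""")

# Start from the direct children of :root, then repeatedly add their children.
descendants = conn.execute("""
    WITH RECURSIVE bom(part_id, depth) AS (
        SELECT part_id, 1 FROM part_hierarchy WHERE parent_part_id = :root
        UNION ALL
        SELECT h.part_id, bom.depth + 1
        FROM part_hierarchy AS h
        JOIN bom ON h.parent_part_id = bom.part_id
    )
    SELECT p.description, bom.depth
    FROM bom JOIN parts AS p ON p.part_id = bom.part_id
    ORDER BY bom.depth
""", {"root": 12}).fetchall()
```

The CHECK constraint only blocks a part being its own direct parent; longer cycles (A under B under A) would still need to be prevented in application logic or a trigger.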
Since the level of nesting is arbitrary, use one table with a self-referencing parent_id foreign key.
While this is technically correct, navigating it requires a recursive query, which not every database supports (MS Access does not, and MySQL only gained recursive CTEs in version 8.0). However, a simple and effective way of making access to nested parts easy is to store a "path" to each component, which looks like a path in a file system.
For example, say part id 1 is a top level part that has a child whose id is 2, and part id 2 has a child part with id 3, the paths would be:
id parent_id path
1 null /1
2 1 /1/2
3 2 /1/2/3
Doing this means finding the tree of subparts for any part is simply:
select b.id
from parts a
join parts b on b.path like concat(a.path, '/%')  -- trailing '/' keeps /1 from matching /10
where a.id = ?
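A small runnable check of the path technique on SQLite via Python's sqlite3 module. Two details are assumptions rather than part of the answer: the pattern is built as path plus '/%' (so that /1 does not also match an unrelated /10), and SQLite's || operator stands in for concat():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parts(id),
    path      TEXT NOT NULL UNIQUE
);
-- Part 10 is an unrelated top-level part whose path shares the prefix '/1'.
INSERT INTO parts VALUES
    (1, NULL, '/1'),
    (2, 1,    '/1/2'),
    (3, 2,    '/1/2/3'),
    (10, NULL, '/10');
""")

# All descendants of part 1: every path that starts with '/1/'.
subparts = [row[0] for row in conn.execute("""
    SELECT b.id
    FROM parts AS a
    JOIN parts AS b ON b.path LIKE a.path || '/%'
    WHERE a.id = ?
    ORDER BY b.id
""", (1,))]
```

The price of the fast lookup is maintenance: moving a subtree means rewriting the path of every part beneath it.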

Efficient MySQL structure for linking features to accommodation listing

I'm building an accommodation rental site for a specific town.
It will include, Houses, Resorts, Hotels etc.
I'm looking for advice on how best to link Property Features (Air-Con, Swimming Pool etc.) to individual properties.
I have a table of around 50 Property Features set up as feature_id, feature_category, feature_name.
What would be the best way to store which features relate to which property?
Would a column in the property table (prop_features) containing an array of feature_id be the best way?
The only example I've managed to find and be able to dissect the DB showed the features added as feature_1, feature_2 etc. which seemed really inefficient as some properties may only have feature_1 and feature_49 for example.
Each one was added as a column to the property_table.
I'm new to creating databases from scratch, so I'd be very grateful for any advice on how best to start with this section of my project.
(It's also why I'm not having much luck Googling it, as I'm not sure how to put it in more general terms that might yield me a solution).
One solution would be to have an intermediate table that joins properties to features like so:
CREATE TABLE propertyfeatures (property_id INT, feature_id INT);
If we have a property called Acme Hotel (property id 1) that has air conditioning (feature id 2) and swimming pool (feature id 4), the data would look something like:
property_id | feature_id
1 2
1 4
To retrieve features per property (excluding properties without features) a simple query would be:
SELECT
p.property_name,
f.feature_name,
f.feature_category
FROM property AS p
INNER JOIN propertyfeatures AS pf
ON p.property_id = pf.property_id
INNER JOIN features AS f
ON pf.feature_id = f.feature_id
ORDER BY p.property_id, f.feature_id
Note: I have made assumptions about table and column names in your existing database. You'd have to adjust the above accordingly.
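For completeness, a sketch of the same three tables with keys added, run on SQLite through Python's sqlite3 module. The property and features table layouts follow the names used above, but the sample categories ('Comfort', 'Leisure') and the constraints are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE property (property_id INTEGER PRIMARY KEY, property_name TEXT NOT NULL);
CREATE TABLE features (
    feature_id       INTEGER PRIMARY KEY,
    feature_category TEXT NOT NULL,
    feature_name     TEXT NOT NULL
);
CREATE TABLE propertyfeatures (
    property_id INTEGER NOT NULL REFERENCES property(property_id),
    feature_id  INTEGER NOT NULL REFERENCES features(feature_id),
    PRIMARY KEY (property_id, feature_id)   -- each feature at most once per property
);
INSERT INTO property VALUES (1, 'Acme Hotel');
INSERT INTO features VALUES
    (2, 'Comfort', 'Air conditioning'),
    (4, 'Leisure', 'Swimming pool');
INSERT INTO propertyfeatures VALUES (1, 2), (1, 4);
""")

# One row per (property, feature) pair.
rows = conn.execute("""
    SELECT p.property_name, f.feature_name
    FROM property AS p
    INNER JOIN propertyfeatures AS pf ON pf.property_id = p.property_id
    INNER JOIN features AS f ON f.feature_id = pf.feature_id
    ORDER BY p.property_id, f.feature_id
""").fetchall()
```

Adding a new feature such as Free Wifi is then a single INSERT into features, with no schema change.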
The only example I've managed to find and be able to dissect the DB showed the features added as feature_1, feature_2 etc. which seemed really inefficient as some properties may only have feature_1 and feature_49 for example. Each one was added as a column to the property_table.
Although this can be done, you're correct in that it's inefficient, or rather, it's awkward to maintain. It's referred to as pivoting because you're changing unique row values into multiple columns. For example, what if a new feature (e.g. Free Wifi) was added? It's not a case of simply inserting a new row of data as it would be with the intermediate table, you'd have to create a new column to support that.
Not only that, but you would still have to define the feature columns manually or dynamically. For reference, take a look at MySQL Pivot Table which demonstrates both manual and dynamic methods.
One simple way would be to add another table to your database with the following columns. The keyword for this approach is "junction table"; it is a basic building block of database design.
property_identifier | feature_identifier (feature_id in your case)
In this table you can record the connection between the properties and specific features.
So you could say the property with property_id 1 has a pool (feature_id: 2) and a nice kitchen (feature_id: 23).
So the table would look like this:
property_id | feature_id
1 | 2
1 | 23

Database design to assign specific tags to an item

I am trying to build a little system that would assign specific tags to an item, or a person to be precise. So, I have a list of persons and a list of tags. I need to assign 3 specific tags to each person (that correspond to 3 different skills this person might have).
In a nutshell, the output would look like this :
Person 1 | webdesign, ux, jquery
Person 2 | blogging, photography, wordpress
Person 3 | graphic-design, 3d, inventor
...
For now, those lists are stored in two different tables :
persons
-------
person_id
person_name
tags
-------
tag_id
tag_name
My main goal is to avoid repetition and to simply assign 3 existing tags to an existing person.
Could you give me a few hints on how to implement this? I know that a three-table design is common for a tagging system, but is it relevant in my situation?
Thanks for your help.
If you want to ensure that you don't have any duplicates and to be able to add N tags to a person, then to properly implement a normalized design you would need a third table to link the tags to each person
persons_2_tags
--------------
person_id
tag_id
To guarantee uniqueness, you can either use a composite primary key, or add a unique index to the table including both columns.
See an example of the above in this SQL Fiddle.
If you need to enforce the 3 tag limit at the database level, you can add a third column to the persons_2_tags table (tag_number, for example) that is an enum with values of 1, 2, 3, and put a unique index on (person_id, tag_number). Insert logic would need to be handled at the application level, but the limit would be enforced by the index.
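A sketch of how that limit could be enforced, under some assumptions: SQLite (via Python's sqlite3 module) has no enum type, so a CHECK constraint plays that role here, and the helper try_insert is a hypothetical name used only for this demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT NOT NULL);
CREATE TABLE tags    (tag_id    INTEGER PRIMARY KEY, tag_name    TEXT NOT NULL UNIQUE);
CREATE TABLE persons_2_tags (
    person_id  INTEGER NOT NULL REFERENCES persons(person_id),
    tag_id     INTEGER NOT NULL REFERENCES tags(tag_id),
    tag_number INTEGER NOT NULL CHECK (tag_number IN (1, 2, 3)),
    PRIMARY KEY (person_id, tag_id),   -- no duplicate tag per person
    UNIQUE (person_id, tag_number)     -- one tag per slot, so at most 3 per person
);
INSERT INTO persons VALUES (1, 'Person 1');
INSERT INTO tags VALUES (1, 'webdesign'), (2, 'ux'), (3, 'jquery'), (4, 'blogging');
INSERT INTO persons_2_tags VALUES (1, 1, 1), (1, 2, 2), (1, 3, 3);
""")

def try_insert(person_id, tag_id, tag_number):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO persons_2_tags VALUES (?, ?, ?)",
                (person_id, tag_id, tag_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False

fourth_slot = try_insert(1, 4, 4)   # slot 4 violates the CHECK constraint
reused_slot = try_insert(1, 4, 3)   # slot 3 is already taken for person 1
```

This caps a person at three tags but does not require exactly three; a minimum would have to be checked in the application.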
Do your requirements specify "exactly" 3 tags?
The third table is recommended to stay normalized. It's a typical many-to-many relationship. This offers the greatest amount of flexibility since you can have an unlimited, yet unique, list of user/tag pairs.
You could have 3 columns in the user table for each tag. Performance would be improved at the expense of flexibility. Queries like, "List all the users with tag = 'X'" are a little harder. There may be several null values if you allow fewer than 3 tags. Of course in this setup, you'll have to create a new column and a lot of code to expand beyond three columns.
I think that I would probably do the three table design which Jeff O mentioned, however, just to present an alternative view...
If you're just talking about tags, that is, a short string with no other metadata, I don't know that you'd need a tags table. Essentially, the tag itself could be its own id.
persons (person_id, person_name);
tags (person_id, tag);
Yeah, you'd get a bit of repetition there, but they're short strings anyway and it shouldn't really make a difference.

Database Design for Rental Listings

I'm designing a simple database for a rental listings website,
sort of like classified ads but only for home/room rentals. This is what I've come up with thus far:
Question 1
For the "post" table, I actually wanted more information. For example, there would be a 'facilities' section where the users can select whether there's 'parking' available, do I need a separate table? Or just use 0 for no and 1 for yes?
Question 2
Here's what I did with the "category" table (sorry I don't know how to pretty print yet)
Category_ID 1 is Rent
Category_ID 2 is buildingType
For "categoryProperty" table
Category_ID 1 categoryPropertyID 1 House
Category_ID 1 categoryPropertyID 2 Room
Category_ID 2 categoryPropertyID 3 Apartment
Category_ID 2 categoryPropertyID 4 Condominium
Category_ID 2 categoryPropertyID 5 Detached
Does the above make sense?
Question 3
Users can post whether they are logged in or not. Just that logged in users/members have the advantage of tracking their ads/adjusting the availability.
How do I record the ads that a member has posted? Like their history.
Should I create a "postHistory" table and set the 'postHistory_ID' as FK to "member" table?
Thanks a lot in advance, I appreciate your help, especially just pointing me to the right direction.
Question 1:
make a separate table and make a One to One relation, that would be the simplest way:
POST -|-----|- EXTRAS
in EXTRAS you may have every extra field (parking=1/0, in_down_town=1/0,has_a_gost=1/0)
Question 2:
This does not make sense, you've two options:
In the Post table, create a "type_of_operation" column that can have two values (building_type, rent). Or you can create different tables, but that would make things more complicated (you should analyse whether the same type can be in both states, etc.).
Question 3:
I recommend that you make your users register, even with a really simple form (email + password).
Seems to be on the right track -- with respect to your specific questions:
Question #1: Assuming there's more than one type of facility (parking; swimming pool; gym) then you have a many-to-many relationship and you want 2 new tables: Facilities and PropertyFacilities. Each Property (or I guess "post") could have multiple rows in the PropertyFacilities table.
Question #2: Not really clear on what you're getting at -- is it that each property type can either be rented whole or rented per room?
Question #3: Good question, what you want to do is have an Active bit, or an ExpireDate, in your POST table -- then anything that becomes inactive or expired is automatically 'historical' data, no need to marshal it to a history table. Although you'll have to archive eventually of course.
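One way this could look, as a sketch only: the post table carries a nullable member_id (NULL for anonymous posts, which is an assumption about how logged-out posting would be stored) and an expire_date, and a member's history is just a query. Column names and sample rows are hypothetical; the SQL runs on SQLite via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (member_id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
CREATE TABLE post (
    post_id     INTEGER PRIMARY KEY,
    member_id   INTEGER REFERENCES member(member_id),  -- NULL for anonymous posts
    title       TEXT NOT NULL,
    expire_date TEXT NOT NULL                           -- ISO 8601 date string
);
INSERT INTO member VALUES (1, 'a@example.com');
INSERT INTO post VALUES
    (1, 1,    'Room near campus',  '2020-01-31'),
    (2, 1,    'Two-bedroom condo', '2999-12-31'),
    (3, NULL, 'Detached house',    '2999-12-31');
""")

today = "2024-06-01"   # fixed date so the example is reproducible

# ISO 8601 dates sort correctly as plain strings, so < and >= work here.
history = conn.execute(
    "SELECT title FROM post WHERE member_id = ? AND expire_date < ? ORDER BY post_id",
    (1, today),
).fetchall()
active = conn.execute(
    "SELECT title FROM post WHERE member_id = ? AND expire_date >= ? ORDER BY post_id",
    (1, today),
).fetchall()
```

With this layout no separate postHistory table is needed; the foreign key lives on post and points at member.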