Efficient MySQL structure for linking features to accommodation listings

I'm building an accommodation rental site for a specific town.
It will include houses, resorts, hotels, etc.
I'm looking for advice on how best to link Property Features (Air-Con, Swimming Pool etc.) to individual properties.
I have a table of around 50 Property Features set up as feature_id, feature_category, feature_name.
What would be the best way to store which features relate to which property?
Would a column in the property table (prop_features) containing an array of feature_id be the best way?
The only example I've managed to find and dissect stored the features as feature_1, feature_2, etc., each added as a column to the property table.
This seemed really inefficient, as some properties may only have feature_1 and feature_49, for example.
I'm new to creating databases from scratch, so I'd be very grateful for any advice on how best to start with this section of my project.
(It's also why I'm not having much luck Googling it, as I'm not sure how to put it in more general terms that might yield me a solution).

One solution would be to have an intermediate table that joins properties to features like so:
CREATE TABLE propertyfeatures (property_id INT, feature_id INT);
If we have a property called Acme Hotel (property id 1) that has air conditioning (feature id 2) and a swimming pool (feature id 4), the data would look something like:
property_id | feature_id
1           | 2
1           | 4
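To retrieve features per property (excluding properties without features), a simple query would be:
SELECT
    p.property_name,
    f.feature_name,
    f.feature_category
FROM property AS p
INNER JOIN propertyfeatures AS pf
    ON p.property_id = pf.property_id
INNER JOIN features AS f
    ON pf.feature_id = f.feature_id
-- ORDER BY rather than GROUP BY: grouping by property alone would collapse each
-- property's features into one arbitrary row (or fail under ONLY_FULL_GROUP_BY).
ORDER BY p.property_id;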
Note: I have made assumptions about table and column names in your existing database. You'd have to adjust the above accordingly.
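Here is a minimal, runnable sketch of the intermediate-table approach. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The property/features layouts and the feature categories are hypothetical sample data based on the names assumed above. The composite primary key is an addition: it stops the same feature from being linked to a property twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE property (property_id INTEGER PRIMARY KEY, property_name TEXT NOT NULL);
CREATE TABLE features (
    feature_id       INTEGER PRIMARY KEY,
    feature_category TEXT NOT NULL,
    feature_name     TEXT NOT NULL
);
-- Junction table: one row per (property, feature) pair.
CREATE TABLE propertyfeatures (
    property_id INTEGER NOT NULL REFERENCES property (property_id),
    feature_id  INTEGER NOT NULL REFERENCES features (feature_id),
    PRIMARY KEY (property_id, feature_id)
);
INSERT INTO property VALUES (1, 'Acme Hotel');
INSERT INTO features VALUES (2, 'Comfort', 'Air conditioning'), (4, 'Leisure', 'Swimming pool');
INSERT INTO propertyfeatures VALUES (1, 2), (1, 4);
""")

# One row per property/feature pair, ordered rather than grouped.
rows = conn.execute("""
    SELECT p.property_name, f.feature_name, f.feature_category
    FROM property AS p
    INNER JOIN propertyfeatures AS pf ON p.property_id = pf.property_id
    INNER JOIN features AS f ON pf.feature_id = f.feature_id
    ORDER BY p.property_id, f.feature_id
""").fetchall()

for row in rows:
    print(row)
```

If you want one row per property instead, aggregate the names with MySQL's GROUP_CONCAT (SQLite's group_concat) rather than grouping non-aggregated columns.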
The only example I've managed to find and dissect stored the features as feature_1, feature_2, etc., each added as a column to the property table. This seemed really inefficient, as some properties may only have feature_1 and feature_49, for example.
Although this can be done, you're correct that it's inefficient, or rather, awkward to maintain. It's referred to as pivoting, because you're turning unique row values into multiple columns. For example, what if a new feature (e.g. Free Wifi) were added? With the intermediate table, you would simply insert a new row of data; with the pivoted layout, you'd have to create a new column to support it.
Not only that, but you would still have to define the feature columns, either manually or dynamically. For reference, take a look at MySQL Pivot Table, which demonstrates both the manual and the dynamic methods.
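As a rough illustration of the manual method, here is a sketch that runs on SQLite through Python's sqlite3. The MAX(CASE ...) conditional-aggregation pattern is the same in MySQL. Every feature needs its own hand-written column, so adding Free Wifi would mean editing the query as well. Feature ids 2 and 4 follow the Acme Hotel example; property 2 is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE propertyfeatures (property_id INTEGER, feature_id INTEGER);
INSERT INTO propertyfeatures VALUES (1, 2), (1, 4), (2, 4);
""")

# Manual pivot: one hard-coded column expression per feature.
pivot = conn.execute("""
    SELECT property_id,
           MAX(CASE WHEN feature_id = 2 THEN 1 ELSE 0 END) AS has_air_con,
           MAX(CASE WHEN feature_id = 4 THEN 1 ELSE 0 END) AS has_pool
    FROM propertyfeatures
    GROUP BY property_id
    ORDER BY property_id
""").fetchall()
print(pivot)  # [(1, 1, 1), (2, 0, 1)]
```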

One simple way would be to add another table to your database with the following columns. The keyword for this approach is "junction table"; it is a basic concept in database design.
property_identifier | feature_identifier (feature_id in your case)
This table stores the connections between properties and specific features.
So you could say the property with property_id 1 has a pool (feature_id: 2) and a nice kitchen (feature_id: 23).
So the table would look like this:
property_id | feature_id
1 | 2
1 | 23
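Because each row is just a pair of ids, the same table also answers the question in the other direction. The sketch below uses SQLite via Python's sqlite3. The pool and kitchen ids follow the example above, and property 3 is made up. It finds every property with a pool and shows that a composite primary key rejects a duplicate pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE propertyfeatures (
    property_id INTEGER NOT NULL,
    feature_id  INTEGER NOT NULL,
    PRIMARY KEY (property_id, feature_id)  -- no duplicate pairs
);
INSERT INTO propertyfeatures VALUES (1, 2), (1, 23), (3, 2);
""")

# Which properties have a pool (feature_id 2)?
with_pool = [row[0] for row in conn.execute(
    "SELECT property_id FROM propertyfeatures WHERE feature_id = ? ORDER BY property_id", (2,))]
print(with_pool)  # [1, 3]

# Linking the same feature to the same property twice is refused.
try:
    conn.execute("INSERT INTO propertyfeatures VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

In MySQL you would usually also add a secondary index on (feature_id, property_id), so that this reverse lookup does not scan the whole table.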

Related

MySQL Schema Advice: Unpredictable Field Additions

A little overview of the problem.
Let's say I have a table named TableA with fixed properties: PropertyA, PropertyB and PropertyC. This has been enough for my own website's needs, but then I suddenly have clients who want custom fields on the site.
ClientA wants to add PropertyD and PropertyE.
ClientB wants to add PropertyF and PropertyG.
The catch is that these clients don't want each other's fields. Now imagine getting more clients: just adding nullable fields to TableA will become cumbersome, and I will end up with a messy table. At least, I assume that's the case; feel free to correct me. Would it be better to just do that?
Now, I thought of two solutions. I'm asking whether there's a better way, since I'm not confident about the trade-offs and their future performance.
Proposed Solution #1
data_id is not exactly a foreign key, but it stores whichever client property row is attached to a TableA row. client_id is the only foreign key present on both the property table and TableA.
It feels like some sort of anti-pattern. I imagine queries would be easy this way, but it requires the developer to know which property table to pick from. I'm also not sure whether having many tables is a bad thing.
Proposed Solution #2
I believe it's a bit more elegant and makes it easy to add more fields as necessary. Not to mention, these are the only tables I would need for everything else. To visualize it, I will add the requested properties to the Properties table like so:
Properties
-------------
1 | PropertyD
2 | PropertyE
3 | PropertyF
4 | PropertyG
Whenever I save any data, I would tag every property that is available, like so. For this example, I want to save data for ClientA, stored in the Clients table with id 1.
Property_Mapping
--------------------------------------------------------
property_id | table_a_id | property_value | client_id
--------------------------------------------------------
1 | 1 | PROPERTY_D_VALUE | 1
2 | 1 | PROPERTY_E_VALUE | 1
I imagine queries on this one could get complex, but it's a trade-off. I put client_id on Property_Mapping in case different clients want the same fields. Any advice?
You've discovered the Entity-Attribute-Value antipattern. It's a terrible idea for a relational database. It makes your queries far more complex, and it takes 4-10x the storage space.
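To make the query-complexity point concrete, here is a sketch that reads one TableA row back together with its custom properties. It runs on SQLite via Python's sqlite3. Table and column names follow the question's Property_Mapping example; the TableA column value is a placeholder. Every attribute needs its own conditional expression, and every value comes back as untyped text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER PRIMARY KEY, PropertyA TEXT);
CREATE TABLE Properties (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Property_Mapping (
    property_id INTEGER, table_a_id INTEGER, property_value TEXT, client_id INTEGER
);
INSERT INTO TableA VALUES (1, 'a-value');
INSERT INTO Properties VALUES (1, 'PropertyD'), (2, 'PropertyE');
INSERT INTO Property_Mapping VALUES
    (1, 1, 'PROPERTY_D_VALUE', 1),
    (2, 1, 'PROPERTY_E_VALUE', 1);
""")

# Reassembling one logical row means pivoting the mapping table back out.
row = conn.execute("""
    SELECT a.id, a.PropertyA,
           MAX(CASE WHEN p.name = 'PropertyD' THEN m.property_value END) AS PropertyD,
           MAX(CASE WHEN p.name = 'PropertyE' THEN m.property_value END) AS PropertyE
    FROM TableA AS a
    LEFT JOIN Property_Mapping AS m ON m.table_a_id = a.id AND m.client_id = ?
    LEFT JOIN Properties AS p ON p.id = m.property_id
    WHERE a.id = ?
    GROUP BY a.id, a.PropertyA
""", (1, 1)).fetchone()
print(row)  # (1, 'a-value', 'PROPERTY_D_VALUE', 'PROPERTY_E_VALUE')
```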
I covered some pros and cons of several alternatives in an old answer on Stack Overflow:
How to design a product table for many kinds of product where each product has many parameters
And in a presentation:
Extensible Data Modeling with MySQL
As an example of the trouble EAV causes, consider how you would respond if one of your clients says that PropertyD must be mandatory (i.e. the equivalent of NOT NULL) and PropertyE must be UNIQUE. Meanwhile, the other client says that PropertyG should be restricted to a finite set of values, so you should either use an ENUM data type, or use a foreign key to a table of allowed values.
But you can't implement any of these constraints with your Properties and Property_Mapping tables, because the values of all the properties are stored in the same column.
You lose features of relational databases when you use this antipattern, such as data types and constraints.
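The sketch below shows what is lost, using SQLite through Python's sqlite3 as a stand-in. With real columns, the database itself enforces NOT NULL and UNIQUE; the per-client table ClientA_TableA is a hypothetical name. The same bad values go into the generic property_value column without complaint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical per-client table with real columns and constraints.
CREATE TABLE ClientA_TableA (
    table_a_id INTEGER PRIMARY KEY,
    PropertyD  TEXT NOT NULL,   -- mandatory
    PropertyE  TEXT UNIQUE      -- must be unique
);
-- EAV storage: every property's value shares one untyped column.
CREATE TABLE Property_Mapping (
    property_id INTEGER, table_a_id INTEGER, property_value TEXT, client_id INTEGER
);
INSERT INTO ClientA_TableA VALUES (1, 'd1', 'e1');
INSERT INTO Property_Mapping VALUES (2, 1, 'e1', 1);
""")

def rejected(sql, params):
    """Return True if the database refuses the insert with a constraint error."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

# Real columns: a missing PropertyD and a duplicate PropertyE are both refused.
missing_d = rejected("INSERT INTO ClientA_TableA VALUES (?, ?, ?)", (2, None, "e2"))
dup_e = rejected("INSERT INTO ClientA_TableA VALUES (?, ?, ?)", (3, "d3", "e1"))

# EAV: the equivalent bad values (property 1 = D, property 2 = E) are stored silently.
eav_missing_d = rejected("INSERT INTO Property_Mapping VALUES (?, ?, ?, ?)", (1, 2, None, 1))
eav_dup_e = rejected("INSERT INTO Property_Mapping VALUES (?, ?, ?, ?)", (2, 3, "e1", 1))

print(missing_d, dup_e, eav_missing_d, eav_dup_e)  # True True False False
```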

Bill of Materials: One table for everything, or a table for each sub-level?

I am working with a manufacturing client whose products are configurations of the same set of parts. I am creating a database that holds all valid products and their Bills of Materials. I need help deciding which Bill of Materials schema to implement.
The obvious solution is a many-to-many relationship with a junction table:
Table 1: Products
Table 2: Parts
Junction Table: products, parts, part quantities
However, there are multiple levels in my client's products:
- Assembly
- Sub-Assembly
- Component
- Part
and items from lower levels are allowed to be associated with any upper-level item:
Assembly |Sub-assembly
Assembly |Component
Assembly |Part
Sub-Assembly |Component
Sub-Assembly |Part
Component |Part
and I suspect the client will want to add more levels in the future when new product lines are added.
Correct me if I am wrong, but I believe the above relationship scheme would demand a growing number of junction tables and queries (0+1+2+3+..., i.e. six for the four levels above) to display and export the full Bill of Materials, which may eventually affect performance.
Someone suggested to put everything in one table:
Table 1: Assemblies, sub-assemblies, components, parts, etc...
Junction table: Children and Parents
This requires only one junction table to create any number of levels of many-to-many relationships. I don't know if I trust this solution, but the only issues I can think of are accidentally making an item its own parent (creating an infinite loop), and that it sounds disorganized.
I lack the experience to determine whether either or neither of these models will work for my client. I am sketching these models in MS Access, but I am open to moving this project to a more powerful platform if necessary. Any input is appreciated. Thank you.
-M
What you are describing is a hierarchy. As such it should take the form:
part_hierarchy:
part_id | parent_part_id | other | attributes | of | this | relationship
So part_id 1 may have a parent part_id of 10 ("component"), which in turn may have a parent_part_id (when looked up itself in this table) of 12 ("assembly"). It would look like:
part_id | parent_part_id
1 | 10
10 | 12
and parts table:
part_id | description
1 | widget
10 | widget component
12 | aircraft carrier
That's a little simplified since it doesn't take into account your product/part relationship, but it will all fit together using this methodology.
Nice and simple. Now it doesn't matter how deep the hierarchy goes. It's still just two columns (plus any extra columns needed for attributes of this relationship, like create_date, last_changed_by_user, etc.).
I would suggest something more powerful than Access, though, since it lacks the ability to pick apart a hierarchy using a recursive CTE, something that comes with SQL Server, Postgres, Oracle, MySQL 8.0+ and the like.
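Here is a sketch of that recursive CTE, using the answer's part_hierarchy and parts tables and sample rows. It runs on SQLite through Python's sqlite3; MySQL 8.0+ accepts the same WITH RECURSIVE syntax. The query lists everything below part 12 with its depth. The depth column and the depth cap (a crude guard against the accidental cycles mentioned in the question) are illustrative additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (part_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
CREATE TABLE part_hierarchy (
    part_id        INTEGER NOT NULL REFERENCES parts (part_id),
    parent_part_id INTEGER NOT NULL REFERENCES parts (part_id),
    PRIMARY KEY (part_id, parent_part_id)
);
INSERT INTO parts VALUES (1, 'widget'), (10, 'widget component'), (12, 'aircraft carrier');
INSERT INTO part_hierarchy VALUES (1, 10), (10, 12);
""")

# Walk down from the top-level item, one level per recursion step.
exploded = conn.execute("""
    WITH RECURSIVE bom (part_id, depth) AS (
        SELECT part_id, 1 FROM part_hierarchy WHERE parent_part_id = ?
        UNION ALL
        SELECT h.part_id, bom.depth + 1
        FROM part_hierarchy AS h
        JOIN bom ON h.parent_part_id = bom.part_id
        WHERE bom.depth < 50  -- stop runaway recursion if a cycle sneaks in
    )
    SELECT bom.part_id, parts.description, bom.depth
    FROM bom JOIN parts USING (part_id)
    ORDER BY bom.depth
""", (12,)).fetchall()
print(exploded)  # [(10, 'widget component', 1), (1, 'widget', 2)]
```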
I would 100% avoid any schema that requires you to add more fields or tables as the hierarchy becomes deeper and more complex. That is a path that leads towards pain and regret.
Since the level of nesting is arbitrary, use one table with a self-referencing parent_id foreign key to itself.
While this is technically correct, navigating it requires a recursive query, which some databases (e.g. MySQL before 8.0, or Access) don't support. However, a simple and effective way to make nested parts easy to access is to store a "path" to each component, which looks like a path in a file system.
For example, say part id 1 is a top-level part that has a child with id 2, and part 2 has a child part with id 3; the paths would be:
id | parent_id | path
1  | null      | /1
2  | 1         | /1/2
3  | 2         | /1/2/3
Doing this means finding the tree of subparts for any part is simply:
select b.id
from parts a
-- '/%' rather than '%', so that '/1' does not also match '/10' or the part itself
join parts b on b.path like concat(a.path, '/%')
where a.id = ?
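Here is a runnable sketch of the path approach. It uses SQLite via Python's sqlite3, where || concatenates strings in place of MySQL's CONCAT. Part 10 is an extra, unrelated top-level part. It shows why the pattern should end in '/%': a bare '%' makes '/1' also match the part itself and '/10'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parts (id),
    path      TEXT NOT NULL
);
INSERT INTO parts VALUES
    (1, NULL, '/1'),
    (2, 1, '/1/2'),
    (3, 2, '/1/2/3'),
    (10, NULL, '/10');  -- unrelated part whose path shares the prefix '/1'
""")

def subparts(part_id, pattern_suffix):
    """Ids whose path starts with the given part's path followed by pattern_suffix."""
    return [row[0] for row in conn.execute("""
        SELECT b.id
        FROM parts AS a
        JOIN parts AS b ON b.path LIKE a.path || ?
        WHERE a.id = ?
        ORDER BY b.id
    """, (pattern_suffix, part_id))]

print(subparts(1, "%"))   # [1, 2, 3, 10]: the part itself and part 10 sneak in
print(subparts(1, "/%"))  # [2, 3]: descendants only
```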

Classpass.com like database design

I am trying to get my head around creating a Classpass-like database design. I'm new to database design, and there are a few things I'm not quite sure how to implement and can't quite get my head around.
You can check the classpass example:
https://classpass.com/classes
https://classpass.com/studios
EDIT 1: So here is the idea: each city has multiple neighbourhoods, each with multiple studios/venues.
After reading spencer7593's comment, here is what I came up with, and the things that are still not quite clear:
So what I am not quite sure about is:
I am not sure how to store the venue/studio address and geolocation. Is it better to have a Region table (id | name | parent_id) that stores the cities and neighborhoods recursively? Or to add foreign key constraints to separate city and neighborhood tables? Should I store the lat/lon in the venue table, in the address table, or even in a separate locations table? I would like to be able to perform searches like:
show me venues in that neighborhood or city
show me venues which are in radius XX from position
Each class should have a schedule, and currently I am not sure how to design it. For example: Spinning class, Mo, We, Fr from 9 AM till 10 AM. I would like to be able to do queries like:
show me venues, which have spinning classes on Mo
or show me all classes in category Spinning, Boxing for example
or even show me venues offering spinning classes
Should I create an extra table schedules here? Or just create some kind of view which creates the schedule? If it's an extra table, how should I describe start, end of each day of the week?
@Dimitar,
Even though @rhavendc is right that this question belongs on Database Administrators, I will answer your questions in order, to the best of my knowledge.
I am not sure how to store the venue/studio address and geolocation. [...]
You can easily look up geolocations on the web; take MyGeoPosition, for example.
I would like to be able to perform searches like
show me venues in that neighborhood or city.
You can do this easily. There are a few ways to do it, and each will require a bit of tweaking of your ERD. With the example I attached below, you can run a query that lists all the venues by joining through address_id and then city_id. The yellow entities are the ones I added to ensure integrity.
For example:
-- venue.name is using the "[table].[field]" format to help
-- the engine recognize where the field is coming from.
-- This is useful if you are pulling the fields of the
-- same name from different tables.
select venue.name, city.name
from venue join
address using (address_id) join
city using (city_id);
NOTE: You don't have to include city.name. I just threw it in so you can try it out and see the city each venue belongs to.
If you would like to do it by neighborhood, you would have to tweak the ERD I gave you by adding a neighbor_id column to the ADDRESS table (the NEIGHBORHOOD table's key needs the same column name for the USING clause below to work). I have attached the example below. From there, you can run a query like this:
Using this ERD:
-- Remember the format from the previously mentioned code.
select venue.name, neighborhood.name
from venue join
address using (address_id) join
neighborhood using (neighbor_id);
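Below is a self-contained sketch of the neighborhood version, using SQLite through Python's sqlite3. The table layout is an assumption reconstructed from the queries above, since the ERDs themselves are not reproduced here. The sample venues are made up. A WHERE clause has been added to answer "show me venues in that neighborhood".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (city_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE neighborhood (
    neighbor_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city_id     INTEGER NOT NULL REFERENCES city (city_id)
);
CREATE TABLE address (
    address_id  INTEGER PRIMARY KEY,
    street      TEXT,
    city_id     INTEGER NOT NULL REFERENCES city (city_id),
    neighbor_id INTEGER REFERENCES neighborhood (neighbor_id)
);
CREATE TABLE venue (
    venue_id   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    address_id INTEGER NOT NULL REFERENCES address (address_id)
);
INSERT INTO city VALUES (1, 'Springfield');
INSERT INTO neighborhood VALUES (1, 'Downtown', 1), (2, 'Riverside', 1);
INSERT INTO address VALUES (1, '1 Main St', 1, 1), (2, '5 River Rd', 1, 2);
INSERT INTO venue VALUES (1, 'Spin Studio', 1), (2, 'Box Gym', 2);
""")

# Venues in one neighborhood; join city USING (city_id) instead for the city version.
downtown = conn.execute("""
    SELECT venue.name, neighborhood.name
    FROM venue
    JOIN address USING (address_id)
    JOIN neighborhood USING (neighbor_id)
    WHERE neighborhood.name = ?
""", ("Downtown",)).fetchall()
print(downtown)  # [('Spin Studio', 'Downtown')]
```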
show me venues which are in radius XX from position
You can calculate the distance in miles, kilometers, etc. from latitude and longitude using the haversine formula.
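Here is a minimal haversine sketch in Python. It assumes coordinates in decimal degrees and a mean Earth radius of 6371 km, and the venue names and coordinates are made-up sample data. For large tables you would normally prefilter with a bounding box, or use MySQL's spatial functions such as ST_Distance_Sphere, before computing exact distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def venues_within(venues, lat, lon, radius_km):
    """Names of venues within radius_km of (lat, lon), nearest first."""
    hits = []
    for name, vlat, vlon in venues:
        distance = haversine_km(lat, lon, vlat, vlon)
        if distance <= radius_km:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]

# Hypothetical venues: (name, latitude, longitude).
venues = [("Spin Studio", 0.0, 0.05), ("Box Gym", 0.0, 0.5), ("Far Away Yoga", 10.0, 10.0)]
print(round(haversine_km(0, 0, 0, 1), 2))     # 111.19
print(venues_within(venues, 0.0, 0.0, 60.0))  # ['Spin Studio', 'Box Gym']
```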
Each class should have a schedule and currently I am not sure how to design it. For example: Spinning class, Mo, We, Fr from 9 AM till 10 AM. I would like to be able to do queries like:
show me venues, which have spinning classes on Mo
or show me all classes in category Spinning, Boxing for example
or even show me venues offering spinning classes
This can be easily derived from either of the ERDs I attached here. In the CLASS table, I added a field called parent_class_id, which references class_id in the same table. This self-reference is a form of recursion, and I know it is a bit of a headache to understand. It lets a class with an assigned parent class show that the same class is also offered at different times.
You can get this result by doing so:
-- Remember the format from the previously mentioned code.
select class1.name, class1.class_id, class2.class_id
from class as class1,
class as class2
where class1.parent_class_id = class2.class_id;
or even show me venues offering spinning classes
This may be a tricky one... If you are wondering which venues offer spinning classes, where spinning is part of or the whole name of the class rather than a category, it's simple.
Try this...
-- Remember the format from the previously mentioned code.
-- distinct: a venue with several spinning classes is listed only once
select distinct venue_id
from venue join
class using (venue_id)
where class_name = 'spinning';
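NOTE: Depending on the database and collation, comparisons with string literals may be case-sensitive (PostgreSQL and Oracle compare case-sensitively by default, while MySQL's default collations do not). To be safe, you could use where UPPER(class_name) = 'SPINNING'.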
If the class name may include words other than "spinning" in its name, use this instead: where UPPER(class_name) like '%SPINNING%'.
If you are wondering which classes are spinning classes, where spinning is a category, that's where the tricky bit comes in. I believe you would have to use a subquery for this.
Try this:
-- Remember the format from the previously mentioned code.
-- "in" rather than "=" still works if the subquery returns several categories
select class_id
from class join
class_category using (class_id)
where cat_id in (select cat_id
from category
where name = 'spinning');
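Here is a sketch of the category lookup, using SQLite via Python's sqlite3. Table and column names follow the query above, and the rows are made up. Using IN rather than = lets the subquery return several categories, e.g. Spinning and Boxing. MySQL would raise an error for = with a multi-row subquery, while SQLite would silently use only the first row. Applying lower() to the name sidesteps case differences.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (cat_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE class (class_id INTEGER PRIMARY KEY, class_name TEXT NOT NULL, venue_id INTEGER);
-- Junction table: a class can belong to several categories.
CREATE TABLE class_category (
    class_id INTEGER REFERENCES class (class_id),
    cat_id   INTEGER REFERENCES category (cat_id),
    PRIMARY KEY (class_id, cat_id)
);
INSERT INTO category VALUES (1, 'Spinning'), (2, 'Boxing'), (3, 'Yoga');
INSERT INTO class VALUES (1, 'Morning Ride', 1), (2, 'Fight Club', 2), (3, 'Sunrise Flow', 1);
INSERT INTO class_category VALUES (1, 1), (2, 2), (3, 3);
""")

# Classes in any of the requested categories.
class_ids = [row[0] for row in conn.execute("""
    SELECT class_id
    FROM class JOIN class_category USING (class_id)
    WHERE cat_id IN (SELECT cat_id FROM category WHERE lower(name) IN ('spinning', 'boxing'))
    ORDER BY class_id
""")]
print(class_ids)  # [1, 2]
```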
Again, depending on the engine and collation, literal comparisons may be case-sensitive, so make sure the case of your search string matches the stored data.
Should I create an extra table schedules here? Or just create some kind of view which creates the schedule? If it's an extra table, how should I describe start, end of each day of the week?
Yes and no. You could, but if you can understand recursion in database systems, you don't have to.
Hope this helps. :)
Entity Relationship Modeling.
An entity is a person, place, thing, concept or event that can be uniquely identified, is important to the business, and we can store information about.
Based on information in the question, some candidates to consider as entities might be:
studio
class
rating
neighborhood
city
For each entity, what uniquely identifies it? Figure out the candidate keys.
And figure out the relationships between the entities, and the cardinalities. (What is related to what, and how many, required or optional?)
Is a studio related to a class?
Can a studio have more than one class?
Can a studio have zero classes?
Can a class be related to more than one studio?
Is a neighborhood related to zero, one or more cities?
Can a studio be related to more than one neighborhood?
Once you've got the entities and relationships, getting the attributes assigned to each entity is pretty straightforward. Just make sure every attribute is dependent on the key, the whole key, and nothing but the key.
FIRST
Your question is not well suited to Stack Overflow; I think it would be better posted on Database Administrators.
SECOND
Here are some info for reading, just to give you a good start for building your database:
Data Modeling (It's kinda broad but it's for the better)
Logical Data Model (Short but comprehensive one)
THIRD
Basically, when designing your database, you should first know all the data your system will need and group it (where needed) to keep it small. Normalize it to reduce data redundancy.
EXAMPLE
Let's assume that the venue table is your main table, the center of all the transactions in your system. Then venue may have sub-data, for example a branch table holding the different branch locations... and branch may have sub-data too, for example schedule, teacher and/or class, which may also be related to each other (sub-data getting data from other sub-data)... and so forth with dependent tables.
You can also create independent tables that still have connections to others. For example, the neighborhood table may contain the neighborhood location and the main venue location (so it should reference the id of the selected venue in the venue table)... and so forth with related and independent tables.
NOTE
Just remember one-to-one and one-to-many relationships. If an item can hold many sub-items of a kind, split them into a separate table. If it will only ever hold one, keep it all in one table.

How to build database for variant management in a webshop

I am searching for guidelines on how to set up the database for an auction site.
My problem is that there are a lot of different product types, say paintings, clothes, computers, etc. They have different specifications, and it should be possible to put just Product A in size L up for auction, or the whole stock of Product B, for example.
How should I build my database for optimal performance (and coding) in this case?
I would suggest the following database/object structure:
[Auction] n..1 [Category] 1..n [Variation Attribute] 1..n [Attribute Value]
An auction then has a category and several attribute values, each of which also refers to its variation attribute:
[Auction] = [Category], [Name], [Description]
[Auction_AttrVal] = [AuctionID], [VarAttrID], [AttrValID]
First of all you can have some kind of category table, which holds items like "Paintings", "Clothes", "Computers". An auction / product is assigned to one category.
Each category then defines variation attributes for that specific category. An example would be "Size" for the category "Clothes" or "CPU" for the category "Computers". You can also add predefined values for the variation attributes to limit the number of variations and avoid differences like "3GHz" vs "3 GHz".
This mechanism also allows for easy filtering of search results. You select a category and simply load all variation attributes as filters (or add a flag to an attribute to declare it as such) and offer the values for filtering to the end-user.
Furthermore you can make variation attributes for a category mandatory to force users who create the auctions (I'm assuming it's Consumer-to-Consumer) to provide sufficient information for their auction.
The code will probably be quite generic and simple. The database structure is highly flexible and extensible. Performance is much better than having everything in one table. You should probably create an index on the AuctionID field of the Auction_AttrVal table. Please let me know if the database structure is not explained properly.
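Here is a compact sketch of this structure in SQLite through Python's sqlite3. The table names follow the notation above. The column names, the IsMandatory flag, the extra index and the sample data are assumptions. The composite primary key on Auction_AttrVal starts with AuctionID, so it also serves as the AuctionID index suggested above. The final query filters Clothes auctions by Size = L.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE VariationAttribute (
    VarAttrID   INTEGER PRIMARY KEY,
    CategoryID  INTEGER NOT NULL REFERENCES Category (CategoryID),
    Name        TEXT NOT NULL,
    IsMandatory INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE AttributeValue (
    AttrValID INTEGER PRIMARY KEY,
    VarAttrID INTEGER NOT NULL REFERENCES VariationAttribute (VarAttrID),
    Value     TEXT NOT NULL  -- predefined values avoid '3GHz' vs '3 GHz'
);
CREATE TABLE Auction (
    AuctionID   INTEGER PRIMARY KEY,
    CategoryID  INTEGER NOT NULL REFERENCES Category (CategoryID),
    Name        TEXT NOT NULL,
    Description TEXT
);
CREATE TABLE Auction_AttrVal (
    AuctionID INTEGER NOT NULL REFERENCES Auction (AuctionID),
    VarAttrID INTEGER NOT NULL REFERENCES VariationAttribute (VarAttrID),
    AttrValID INTEGER NOT NULL REFERENCES AttributeValue (AttrValID),
    PRIMARY KEY (AuctionID, VarAttrID)  -- one value per attribute per auction
);
CREATE INDEX idx_auction_attrval_value ON Auction_AttrVal (AttrValID);

INSERT INTO Category VALUES (1, 'Clothes'), (2, 'Computers');
INSERT INTO VariationAttribute VALUES (1, 1, 'Size', 1), (2, 2, 'CPU', 0);
INSERT INTO AttributeValue VALUES (1, 1, 'M'), (2, 1, 'L'), (3, 2, '3 GHz');
INSERT INTO Auction VALUES (1, 1, 'Product A', NULL), (2, 1, 'Product C', NULL);
INSERT INTO Auction_AttrVal VALUES (1, 1, 2), (2, 1, 1);
""")

# Clothes auctions where Size = 'L'.
size_l = conn.execute("""
    SELECT a.Name
    FROM Auction AS a
    JOIN Category AS c ON c.CategoryID = a.CategoryID
    JOIN Auction_AttrVal AS av ON av.AuctionID = a.AuctionID
    JOIN VariationAttribute AS va ON va.VarAttrID = av.VarAttrID
    JOIN AttributeValue AS v ON v.AttrValID = av.AttrValID
    WHERE c.Name = 'Clothes' AND va.Name = 'Size' AND v.Value = 'L'
""").fetchall()
print(size_l)  # [('Product A',)]
```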

Which way is better to implement custom fields in a web application

I have a self made web application in PHP and MySQL. The many different clients using my system would like to augment entities with custom fields. Each client would like to store their own additional data, so this should be done in a dynamic way.
For example, client1 would like to add a "color" property to their products, while client2 wants a field called "safety_level" for theirs.
I want a methodology that can be applied not only for products but for users and for any other entities as well.
Here are the two options I found optimal, but I can't decide which one is more effective:
OPTION 1:
For every entity, I make an [entityname]_custom_fields table in which I store the additional field values in a 1:1 relationship.
e.g.:
+---------------------------------------------+
|products_custom_fields |
+---------------------------------------------+
|product_id (PK and FK related to products.id)|
|safety_level |
|some_other_fields |
+---------------------------------------------+
pro: this table cannot have more records than the entity table (in this case products), which means fewer records, and it is quite easy to overview.
con: adding new fields or deleting old ones requires DDL queries. I don't want to entrust DDL to users... not even operators with admin permissions.
OPTION 2:
[entity]_custom_field_values will have an N:1 relationship to the [entity] table. Each row contains the type of the custom field and the value itself. In this case we need another table containing the custom field types, e.g.:
custom field values:
+----------------------------------------------------------------------+
|products_custom_field_values |
+----------------------------------------------------------------------+
|custom_field_id |
|custom_field_type (FK product_custom_field_types.custom_field_type_id)|
|value |
+----------------------------------------------------------------------+
custom field types:
+---------------------------------------------------------+
|products_custom_field_types |
+---------------------------------------------------------+
|custom_field_type_id (PK AUTO_INCREMENT) |
|product_id (FK related to products.id) |
+---------------------------------------------------------+
pro: managing fields is easy and does not require altering table structures
con: more records, with all kinds of custom field values in a big mess... which is not necessarily wrong, because that's the point of MySQL: to extract useful data from a big mess. The question is: what about efficiency and performance?
Note: this topic is covered in "SQL Antipatterns", which I strongly recommend you read.
I am a lazy person, which means I tend to apply YAGNI to my code. So this is my approach.
So. Let's assume that there are two groups of products:
ProductFoo
- productID
- name
- price
- color
- weight
- safety
ProductBar
- productID
- name
- price
- supply
- manufacturerID
In this case there are three common elements that go in the main Products table, and the custom parameters would be stored using table inheritance (it's a thing, google it). So basically you would end up with three tables: Products, ProductsFoo and ProductsBar, where the Products table has a "type" field and both "child tables" have a productID foreign key pointing to the parent table.
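Here is a minimal sketch of that table-inheritance layout (often called class table inheritance), in SQLite via Python's sqlite3. It uses the ProductFoo/ProductBar fields listed above. The type values, the CHECK constraint and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    productID INTEGER PRIMARY KEY,
    type      TEXT NOT NULL CHECK (type IN ('foo', 'bar')),
    name      TEXT NOT NULL,
    price     NUMERIC NOT NULL
);
-- Child tables share the parent's key and hold only the type-specific fields.
CREATE TABLE ProductsFoo (
    productID INTEGER PRIMARY KEY REFERENCES Products (productID),
    color TEXT, weight NUMERIC, safety TEXT
);
CREATE TABLE ProductsBar (
    productID INTEGER PRIMARY KEY REFERENCES Products (productID),
    supply INTEGER, manufacturerID INTEGER
);
INSERT INTO Products VALUES (1, 'foo', 'Helmet', 49.5), (2, 'bar', 'Bolt', 0.25);
INSERT INTO ProductsFoo VALUES (1, 'red', 0.8, 'high');
INSERT INTO ProductsBar VALUES (2, 5000, 7);
""")

# Common columns come from the parent, specific ones from the child.
foo_rows = conn.execute("""
    SELECT p.productID, p.name, p.price, f.color, f.safety
    FROM Products AS p JOIN ProductsFoo AS f USING (productID)
    WHERE p.type = 'foo'
""").fetchall()
print(foo_rows)  # [(1, 'Helmet', 49.5, 'red', 'high')]
```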
That's if you know at the development time, what "custom fields" each client will want.
Now, let's assume clients are being difficult and want to make up custom fields whenever they feel like it.
In this case I would simply create a Products.data field containing JSON with all the custom attributes for each product, and only "extract" special attributes to an inheriting table when a client wants to search by that custom attribute (indexing values inside a JSON blob is awkward if clients want to search by their new "wallywanker" attribute; in MySQL it takes a generated column or, from 8.0.13, a functional index).
You end up with the same basic structure, but the "sub-tables" only contain the attributes that are expected to be searchable.
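Here is a sketch of that hybrid, assuming SQLite through Python's sqlite3 and the json module. Arbitrary attributes live in a JSON text column. Only an attribute that clients need to search on (here a hypothetical safety_level) is copied into an indexed side table. Parsing happens in Python, so the sketch does not depend on SQLite's JSON functions; in MySQL you might use a JSON column with a generated column instead.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (productID INTEGER PRIMARY KEY, name TEXT NOT NULL, data TEXT);
-- Side table for the one attribute clients search on; indexed for lookups.
CREATE TABLE ProductsSafety (
    productID    INTEGER PRIMARY KEY REFERENCES Products (productID),
    safety_level TEXT NOT NULL
);
CREATE INDEX idx_safety_level ON ProductsSafety (safety_level);
""")

products = [
    (1, "Helmet", {"color": "red", "safety_level": "high"}),
    (2, "Gloves", {"color": "black"}),
    (3, "Goggles", {"safety_level": "medium", "tint": "amber"}),
]
for pid, name, attrs in products:
    conn.execute("INSERT INTO Products VALUES (?, ?, ?)", (pid, name, json.dumps(attrs)))
    # Copy only the searchable attribute out of the JSON blob.
    if "safety_level" in attrs:
        conn.execute("INSERT INTO ProductsSafety VALUES (?, ?)", (pid, attrs["safety_level"]))

high = conn.execute("""
    SELECT p.name, p.data
    FROM Products AS p JOIN ProductsSafety AS s USING (productID)
    WHERE s.safety_level = ?
""", ("high",)).fetchall()
name, data = high[0]
print(name, json.loads(data)["color"])  # Helmet red
```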
I hope this made sense.
If it is a company project, follow the standards used in previous projects.
Have a look at conventions such as Hungarian notation; that would make more sense than repeating a prefix. Also, your model name will most likely be your table name.
Also, if you are planning to use an ORM, it may have its own best practices.
https://en.wikipedia.org/wiki/Hungarian_notation