Relational databases: Integrate one tree structure into another - mysql

I'm currently designing a relational database table in MySQL for handling multiple categories, representing them later in a tree structure on the client side and filtering on them. Here is a picture of how the structure looks like:
So we have a root element which is set by default. We can after that add children to it (Level one). So far a table structure in the simplest case could be defined so:
| id | name | parent_id |
--------------------------------
1 All Categories NULL
2 History 1
However, I have a requirement that I need to include another tree structure type (Products) in the table (a corresponding API is available). The records from the other table have their own id types (UUID). Basically I need to ingest them in my table. A possible structure will look like so:
| id | UUID | name | parent_id |
----------------------------------------------------------
1 NULL All Categories NULL
2 NULL History 1
3 NULL Products 1
4 CN1001231232 Catalog electricity 3
5 CN1001231242 Catalog basic components 4
6 NULL Shipping 1
I am new to relational databases, but all of these possible NULL values for the UUID indicate (at least for me) to be bad design of database table. Is there a way of avoiding this, or even better way for this "ingestion"?

If you had a table for users, with columns first_name, middle_name, last_name but then a user signed up and said they have no middle name, you could just store NULL for that user's middle_name column. What's bad design about that?
NULL is used when an attribute is unknown or inapplicable on a given row. It seems appropriate for the case you describe, i.e. when records that did not come from the external source have no UUID, and need no UUID.
That said, some computer science theorists insist that NULL is never appropriate. There's a decades-old controversy about whether SQL should even have a NULL.
The alternative would be to create a second table, in which you store only the UUID and the reference to the entity in your first table. Then just don't store rows for the other id's.
| id | UUID |
-------------------
4 CN1001231232
5 CN1001231242
And don't store the UUID column in your first table. This eliminates the NULLs, but it means you need to do a JOIN of the two tables whenever you want to query the entities with their UUID's.

First make sure you actually have to combine these in the same table. Are the products categories? If they are categories and are used like categories then it makes sense to have them in the same table, but if they have categories then they should be kept separate and given a category/parent id.
If you're sure it's appropriate to store them in the same table then the way you have it is good with one adjustment. For the UUID you can use a separate naming scheme that makes it interchangeable with id for those entries and avoids collisions with the other uuids. For example:
| id | UUID | name | parent_id |
----------------------------------------------------------
1 CAT000000001 All Categories NULL
2 CAT000000002 History 1
3 CAT000000003 Products 1
4 CN1001231232 Catalog electricity 3
5 CN1001231242 Catalog basic components 4
6 CAT000000006 Shipping 1

Your requirements combine the two things relational database are not great with out of the box: modelling hierarchies, and inheritance (in the object-oriented sense).
Your design users the "single table inheritance" model (one of 3 competing options). It's the simplest option in terms of design.
In practical terms, you may want to add a column to explicitly state which type of record you're dealing with ("regular category" and "product category") so your queries are more obvious to others.

Related

PHP/MySQL - mapping table with 3 (not 2 as usually) columns to inherit permissions

Making CMS to manage restaurants. Briefly, there are restaurants that have their own divisions. Also, there are meals that need to be assigned to a restaurant (so all its divisions can see those meals) or individual division(s) (so only selected division(s) can see meals).
So I created tables:
restaurant_id | restaurant_name
1 | Restaurant 1
2 | Restaurant 2
division_id | restaurant_id | division_name
1 | 1 | 1-1
2 | 1 | 1-2
3 | 2 | 2-1
4 | 2 | 2-2
meal_id | meal_name
1 | Steak
Also created mapping table meals_to_restaurants_divisions that contains 3 columns - meal_id, restaurant_id, division_id
So, if I want to assign meal to Restaurant 1 and ALL its divisions, I would create record:
meal_id | restaurant_id | division_id
1 | 1 | null
If I want to assign meal only to division 2-2, I would create a record:
meal_id | restaurant_id | division_id
1 | null | 4
Could someone advise if such a scheme is correct? How could it be improved? I know someone will say I should only create mapping table with 2 records - meal_id and division_id and assign meal to all divisions instead of assigning to a restaurant, but here's the catch: if Restaurant 1 gets new division created in future, I want new division to inherit the same permissions (so if existing meal is assigned to a restaurant instead of division, all future divisions will inherit restaurant's permissions). Otherwise, I would need to manually edit every meal and assign a new division to it.
If someone is interested why I use null in restaurant_id in 2nd example, it's because if division's parent is changed later (division is assigned to another restaurant), I don't need to scan mapping table and change restaurant_id value there.
To evaluate if your model is correct, the most important aspect is that it fulfills all requirements. Since your description is not complete, we cannot verify that for you. You may e.g. check if devisions can belong to several restaurants at the same time. Or if you have other attributes for devisions (e.g. staff assigned to a devision, the order history, outstanding royalties or ratings on your website), that change or do not change when you move a devision to another restaurant. You could also think of that process as creating a new devision inside another restaurant and assigning all meals to the new devision (or maybe several devisions). It is more a conceptional question than a formal one, and if it is correct (and also if there is a better one) will depend on the requirements.
Formally, your model is correct, as long as you never set both restaurant_id and division_id in your "permission" table, because such an entry would not make any sense. (And as long as restaurant_id is not part of the key in your devision table, which is not the case according to your description).
Your table is basically a flattened 2-level-tree with some implicit conditions. A more general tree could look like
id | parent
---+-------
1 | null -- restaurant "1" (has no parent)
2 | 1 -- devision "1-1"
3 | 1 -- devision "1-2"
4 | null -- restaurant "2"
5 | 4 -- devision "2-1"
6 | 4 -- devision "2-2"
In this tree, a restaurant would be treated like a normal devision (or a devision would be like a sub-restaurant), and you could assign meals to just that id. If you want to assign meal 1 to restaurant 1 and all its divisions, you would create the record meal=1, id=1. If you want to assign meal 1 only to division "2-2", you would create a record meal=1, id=6.
It implicitly prevents you from a situation where you would assign both restaurant_id and division_id to a meal (which is the mentioned condition in your model), as such a combination does not exist anymore. You "flattened" the tree by moving the parents into the restaurant_id and the (1st-level-)children to the devision_id-column of your meals_to_restaurants_divisions-table.
The tree will not enforce that a restaurant cannot belong to another restaurant or that you cannot have sub-sub-devisions. Basically, you treat restaurants explicitly NOT as a special kind of devision (or vice versa).
So both models, while formally correct, have a lot a implicit conditions that they automatically enforce (like the maximum depth in your model), that you would have to enforce manually (like not setting both columns in your model, or the maximum depth in the tree) or that are not supported (like subdivisions in your model). You will have to compare that to your requirements.
A site note: a tree is actually a bad model for hierarchical data in a database, as it gets complicated to (recursively) query several levels deep, and there are better models. I just used it for simplicity - and since you only have 1st-level-children, this restriction does not apply in your case.

Database design. How to approach this please?

I'm designing a database (MySQL) that will manage a fleet of vehicles.
Company has many garages across the city, at each garage, vehicles gets serviced (operation). An operation can be any of 3 types of services.
Table Vehicle, Table Garagae, Table Operation, Table Operation Type 1, Table Operation Type 2, Table Operation type 3.
Each Operation has the vehicle ID, garage ID, but how do I link it to the the other tables (service tables) depending on which type of service the user chooses?
I would also like to add a billing table, but I'm lost at how to design the relationship between these tables.
If I have fully understood it I would suggest something like this (first of all you shouldn't have three operation tables):
Vehicles Table
- id
- garage_id
Garages Table
- id
Operations/Services Table
- id
- vehicle_id
- garage_id
- type
Customer Table
- id
- service_id
billings Table
- id
- customer_id
You need six tables:
vechicle: id, ...
garage: id, ...
operation: id, vechicle_id, garage_id, operation_type (which can be
one of the tree options/operations available, with the possibility to be extended)
customer: id, ...
billing: id, customer_id, total_amount
billingoperation: id, billing_id, operation_id, item_amount
You definitely should not creat three tables for operations. In the future if you would like to introduce a new operation that would involve creating a new table in the database.
For the record, I disagree with everyone who is saying you shouldn't have multiple operation tables. I think that's perfectly fine, as long as it is done properly. In fact, I'm doing that with one of my products right now.
If I understand, at the core of your question, you're asking how to do table inheritance, because Op Type 1 and Op Type 2 (etc.) IS A Operation. The short answer is that you can't. The longer answer is that you can't...at least not without some helper logic.
I assume you have some sort of program that will pull data from the database, rather than you just writing sql commands by hand. Working under that assumption, let's use this as a subset of your database:
Garage
------
GarageId | GarageLocation | etc.
---------|----------------|------
1 | 123 Main St. | XX
Operation
---------
OperationId | GarageId | TimeStarted | TimeEnded | OperationTypeDescId | OperationTypeId
------------|----------|-------------|-----------|---------------------|----------------
2 | 1 | noon | NULL | 2 | 2
OperationTypeDesc
-------------
OperationTypeDescId | Name | Description
--------------------|-------|-------------------------
1 | OpFoo | Do things with the stuff
2 | OpBar | Do stuff with the things
OpFoo
-----
OpID | Thing1 | Thing2
-----|--------|-------
1 | 123 | abc
OpBar
-----
OpID | Stuff1 | Stuff2
-----|--------|-------
1 | 456 | def
2 | 789 | ghi
Using this setup, you have the following information:
A garage has it's information, plain and simple
An operation has a unique ID (OperationId), a garage where it was executed, an ID referencing the description of the operation, and the OperationType ID (more on this in a moment).
A pre-populated table of operation types. Each type has a unique ID (OperationTypeDescId), the name of the operation, and a human-readable description of what that operation is.
1 table for each row in OperationTypeDesc. For convenience, the table name should be the same as the Name column
Now we can begin to see where inheritance comes into play. In the operation table, the OperationTypeId references the OpId of the relevant table...the "relevant table" is determined by the OperationTypeDescId.
An example: Let's say we had the above data set. In this example we know that there is an operation happening in a garage at 123 Main St. We know it started at noon, and has not yet ended. We know the type of operation is "OpBar". Since we know we're doing an OpBar operation instead of an OpFoo operation, we can focus on only the OpBar-relevant attributes, namely stuff1 and stuff2. Since the Operations's OperationTypeId is 2, we know that Stuff1 is 789 and Stuff2 is ghi.
Now the tricky part. In your program, this is going to require Reflection. If you don't know what that is, it's the practice of getting a Type from the NAME of that type. In our example, we know what table to look at (OpBar) because of its name in the OperationTypeDesc table. Put another way, you don't automatically know what table to look in; reflection tells you that information.
Edit:
Csaba says "In the future if you would like to introduce a new operation that would involve creating a new table in the database". That is correct. You would also need to add a new row to the OperationTypeDesc table. Csaba implies this is a bad thing, and I disagree - with a few provisions. If you are going to be adding a new operation type frequently, then yes, he makes a very good point. you don't want to be creating new tables constantly. If, however, you know ahead of time what types of operations will be performed, and will very rarely add new types of operations, then I maintain this is the way to go. All of your info common to all operations goes in the Operation table, and all op-specific info goes into the relevant "sub-table".
There is one more very important note regarding this. Because of how this is designed, you, the human, must be aware of the design. Whenever you create a new operation type, it's not as simple as creating the new table. Specifically, you have to make sure that the new table name and the OperationTypeDesc "Name" entry are the same. Think of it as an extra constraint - an "INTEGER" column can only contain ints, otherwise the db won't allow the data. In the same manner, the "Name" column can only contain the name of an existing table. You the human must be aware of that constraint, because it cannot be (easily) automatically enforced.

Most efficient, scalable mysql database design

I have interesting question about database design:
I come up with following design:
first table:
**Survivors:**
Survivor_Id | Name | Strength | Energy
second table:
**Skills:**
Skill_Id | Name
third table:
**Survivor_skills:**
Surviror_Id |Skill_Id | Level
In first table Survivors there will be many records and will grow from time to time.
In second table will be just few skills which can survivors learn (for example: recoon (higher view range), sniper (better accuracy), ...). Theese skills aren't like strength or energy which all survivors have.
Third table is the most interesting, there survivors and skills join together. Everything will work just fine but I am worried about data duplication.
For example: survivor with id 1 will have 5 skills so first table would look like this:
// survivor_id | level_id | level
1 | 1 | 2
1 | 2 | 3
1 | 3 | 1
1 | 4 | 5
1 | 5 | 1
First record: survivor with id 1 has skill with id 1 on level 2
Second record ...
Is this proper approach or should I use something different.
Looks good to me. If you are worried about data duplication:
1) your server-side code should be gear to not letting this happen
2) you could check before inserting if it already exists
3) you could use MYSQL: REPLACE INTO - this will replace duplicate rows if configure proerply, or insert new ones (http://dev.mysql.com/doc/refman/5.0/en/replace.html)
4) set a unique index on columns where you want only unique rows, e.g. level_id, level
I concur with the others - this is the proper approach.
However, there is one aspect which hasn't been discussed: the order of columns in the composite key {Surviror_Id, Skill_Id}, which will be governed by the kinds of queries you need to run...
If you need to find skills of the given survivor, the order needs to be: {Surviror_Id, Skill_Id}.
If you need to find survivors with the given skill, the order needs to be: {Skill_Id, Surviror_Id}.
If you need both, you'll need both the key (and the implied index) on {Surviror_Id, Skill_Id} and an index on {Skill_Id, Surviror_Id}1. Since InnoDB tables are clustered, accessing Level through that secondary index requires double-lookup - to avoid that, consider using a covering index {Skill_Id, Surviror_Id, Level} instead.
1 Or vice-verse.

Database structure for "products can be made of many products"

We are building a web database system and we need to allow some products to be made of other products. For example combining 2 or more products as a new product. We are using CakePhp and MySQL.
Here is the data structure diagram of our database:
https://www.dropbox.com/s/ksv22rt45uv69k9/Data%20Structure%20Diagram.png
Would we need to have self referencing products table or create a new table?
You can do either. There are pros and cons to both. Either way you will need a cross reference table.
The cross reference table can refer itself.
products in item
+---------------------+----------------------------+------------+
| product_identifier | product_identifier_child | quantity |
+---------------------+----------------------------+------------+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 1 |
+---------------------+----------------------------+------------+
On the bright side, this method means you only have one table of data and only one new cross reference table, and you can add new products as you see fit (along with multiple of the same products, say, with a gift basket). On the downside, your table will be trying to do two different things at the same time. Products that have other products in them may not have a model number. Also, how will you determine whether to check the inventory table? Are you going to have inventory for products that are made out of products, or would you sooner check stock on individual products in order to see if your combo products are in stock? The latter is much more flexible, and you can still reserve inventory this way. It just allows all of your inventory to be in the same unit scale in your inventory table.
To add more flexibility, you can create another table, base products, which has values only the building block products are going to have.
base products
+--------------------------+----------+--------------+
| base product identifier | brand | model number |
+--------------------------+----------+--------------+
You could then attach your inventories to your base products table, and your cross reference table would be products to base products.
The negative here is that now instead of two tables, you have three. However, I am a fan of more tables with fewer columns thanks to increased flexibility. Even if the table tasks are not completely different, letting each table specialize completely can make things a lot easier.
There are numerous ways to go but optimal situation is the one that requires no data duplication and no NULL values. Without stressing yourself out about getting all the way there, try to keep your NULL values out of indexed columns and make sure your name values are only showing up in one place.

Join items to a new item with many-to-many (n:m)

my rule of business is something like a used car/motobike dealership:
My table "stock" contains cars, so no two of the same products as each automobile belongs to a different owner.
Sometimes the owner has two cars that he wants to sell separately, but also wants to sell them together, eg:
Owner has a car and a motorcycle:
+----------------+
| id | Stock |
+----+-----------+
| 1 | car |
+----+-----------+
| 2 | motorcycle|
+----+-----------+
In case he wants to advertise or sell in two ways, the first would be the car for U$10.000 and motobike for U$5.000
But it also gives the option to sell both together for a lower price (car + bike U$ 12.000), eg:
+----+-----------+--------------------+-----------+
| id | id_poster | Stock | Price |
+----+-----------+--------------------+-----------+
| 1 | 1 | car | U$ 10.000 |
+----+-----------+--------------------+-----------+
| 2 | 2 | motorcycle | U$ 5.000 |
+----+-----------+--------------------+-----------+
| 1 | 3 | car | U$ 12.000 |
+----+-----------+--------------------+-----------+
| 2 | 3 | motorcycle | U$ 12.000 |
+----+-----------+--------------------+-----------+
This is the best way to do this?
My structure is already doing so (just as I believe to be the best way), I'm using foreign key and n:m, see my structure:
Ok, so if I'm understanding the question right, you're wondering if using a junction table is right. It's still difficult to tell from just your table structures. The poster table just has a price, and the stock table just has a title and description. It's not clear from those fields just what they're supposed to represent or how they're supposed to be used.
If you truly have a many-to-many relationship between stock and poster entities -- that is, a given stock can have 0, 1 or more poster, and a poster can have 0, 1 or more stock -- then you're fine. A junction table is the best way to represent a true many-to-many relationship.
However, I don't understand why you would want to store a price in poster like that. Why would one price need to be associated with multiple titles and descriptions? That would mean if you changed it in one spot that it would change for all related stock. Maybe that's what you want (say, if your site were offering both A1 and A0 size posters, or different paper weights with a single, flat price across the site regardless of the poster produced). However, there just aren't enough fields in your tables currently to see what you're trying to model or accomplish.
So: Is a junction table the best way to model a many-to-many relationship? Yes, absolutely. Are your data entities in a many-to-many relationship? I have no idea. There isn't enough information to be able to tell.
A price, in and of itself, may be one-to-one (each item has once price), one-to-many (each item has multiple prices, such as multiple currencies), or -- if you use a price category or type system like with paper sizes -- then each item has multiple price categories, and each price category applies to multiple items.
So, if you can tell me why a stock has multiple prices, or why a single poster price might apply to multiple stock, then I can tell you if using a junction table is correct in your situation.
Having seen your edit that includes your business rules, this is exactly the correct structure to use. One car can be in many postings, and one posting may have many cars. That's a classic many-to-many, and using a junction table is absolutely correct.
Not clear how the examples relate to your diagram because you use different terminology, but I think it's safe to say: If you want to store something like "this entity consists of orange, apple and pear" then the DB design you show is the correct way to do it. You'd have one poster entry, and three entries in the poster_has_stock pointing to the same poster and three elements in stock.
The structure which you're using is best solution in your case, no need to change, just 2 minor changes needed:
1. remove 2 indexes: fk_poster_has_stock_stock1_idx and fk_poster_has_stock_poster_idx, because they are primary keys already
2. stock_price field should use decimal data type (more precise)
You can read more about Decimal data type here
I think Your solution is nearly perfect. I think You may add "id" to "poster_has_stock" table. And of course change price type (it was written upper).
But You may consider second option with stock_id in poster table.
WHY?
-There should be no poster with no stock connected to it.
-In most cases there will be offers: one stock <=> one poster
This will allow You also to add as many dependend stocks to poster as You want.
You can also add poster_special_price DECIMAL (9,2) to poster table.
This will allow You easy to show:
price for stock item.
Special price for stock item with it's dependencies.
This will be also easier to manage in controller (create, update) - You will be adding poster already with stock, No transactions will be needed during adding new poster.
you may consider a new table that creates a relationship between the stock items such as:
stock_component
---------------
parent_stock_id
child_stock_id
child_qty
in this way, you can link up many children into one parent in the style of a bill of materials, then the rest of your links can continue to be simply related to stock_id of the appropriate parent.