In my project I have locations of the product, which may look like this:
- Location 1
-- SubLocation 1
-- SubLocation 2
- Location 2
-- SubLocation 3
-- SubLocation 4
Imagine a zones with subzones in the facilty.
I need to store that in DB and then retrive sometime later, like this: SubLocation 1 at Location 1.
My first guess is to have two tables with one to many ralationship, but that won't scale, it later I'll need to have something like this:
- Location 2
-- SubLocation 3
-- SubLocation 4
---- SubLocation 5
---- SubLocation 6
So my question is what's the best way to store such structure in relational database?
You can define parent_id reference FK to another record with id (roots have null parent_id).
To define hierarhy and retrieve all subtree in one query you can define an additional field path (VARCHAR). The field should have full path of ids separated with '_'
In your case SubLocation 5 has the path="2_4_5"
To retrieve all the children of SubLocation 4 you can use
select *
from myTable
where path like '2_4%';
There is level depth restriction (size of the path in fact) but for most cases it should work.
Dealing with hierarchical data is difficult in MySQL. So, although you might store the data in recursive tables, querying the data is (in general) not easy.
If you have a fixed set of hierarchies, such as three (I'm thinking "city", "state", "country"), then you can have a separate table for each entity. This works and is particularly useful in situations where the elements can change over time.
Alternatively, you can have a single table that flattens out the dimensions. So, "city", "state", and "country" are all stored on a single row. This flattens out the data, so it is no longer normalized. Updates become tedious. But if the data is rarely updated, then that is not an issue. This form is a "dimensional" form and used for OLAP solutions.
There are hybrid approaches, where you store each element in a single table, in a recursive form. However, the table also contains the "full path" to the top. For instance in your last example:
/location2/sublocation3
/location2/sublocation4
/location2/sublocation4/sublocation5
/location2/sublocation4/sublocation6
This facilitates querying the data. But it comes at the cost of maintenance. Changing a something such as sublocation4 requires changing many rows. Think triggers.
The easiest solution is to use different tables for different entities, if you can.
You can store it in one tabel and retreive sublocations using self join.
Related
I am creating a website where users can post a listing of their home. I have checkboxes where users can check the characteristics their home contains such as a pool, fireplace, attached/detached garage etc.
I had to designs in mind but I was wondering which is more correct:
Create a column in the home listing table for each characteristics and give it a type of enum('0','1') where 0 stands for not checked and 1 stands for checked
Create a table which holds all the characteristics a property can have like: garage, pool, fireplace etc.. and then create a second table that pulls the characteristic id and pairs it with a home listing id
For eg: home_1 has a pool so a row will be created like this:
| home_1 | 1 |
where home_1 is the listing id and 1 is the id of pool in the characteristics table
Which option should I go with?
Option 1 seems good, because if you go with 2nd option then there will be joins while querying the database. And join are expensive and time taking in MySQL.
more can be found here https://www.percona.com/blog/2013/07/19/what-kind-of-queries-are-bad-for-mysql/
If you want to query the data like "count all detached houses"
Enum with seperate columns will work faster and easier to handle db operations.
If you are willing to query houses ONLY ON addresses, price and such NOT those features. 2nd method is easier to develop and maintain.
In short, use 2nd method if u are not going to query those house characteristics
individually.
It all depends on your method of using the data after you save them. But the basic idea should be to consider mappings in these ways:
Go with the second option when:
If the two entities are many-many (many homes, many characteristics) you should go with the second option (even if it adds little cost of using joins in future).
Since your full db mapping is not known, I am proposing one more option IF the characteristics are independent of property. Meaning, if you are planning to use characteristics to reference some other entities of other tables, then it will be best again to go with your second option.
Go with the first option when
If it is just one-many relationship (one home, many characteristics), your first option works good because not only it would reduce cost while fetching but also will update/remove the dependent characteristics of home when your home record gets updated/deleted.
Lastly, Its only up to you to decide the mapping type and dependencies of data models.
I guess that title isn't very descriptive, so I will explain! I have table called users_favs where is stored all info about which posts user has liked, which post he has favourited and the same for comments. info there is stored as serealized array / or JSON who cares.
Question: What is better? Stay like this or to make 4 tables for each of the fields and store not in serealized version but like user_id => post_id???
What I think about second option is that after some time this field will be GIGANTIC. Also, I will need to make 4 queries (or with JOINS) to take all of the info from these tables.
Keeping it in 1 table means that you'll only need 1 table access and 0 joins to get all the data. While storing it in 4 tables, you'll need at least 1 table access and n-1 joins, when you need n fields of information. Your result set at the end of the query will probably be the same, so the amount of data send over the network is independent of your table structure.
I presume a scenario when you will have data for fav_categories and other columns are null. Similarly for columns fav_posts, liked_posts, liked_comments. So there is a high probability that in each row , only three columns will have data most of the time (id,user_id,any one of rest). If my assumptions are right and the use cases as well , then i would definitely go four four tables.
To add to above you can always choose from whether you want to make read-friendly or write-friendly.
This question already has answers here:
How do you know when you need separate tables?
(9 answers)
Closed 9 years ago.
I have a table called cars but each car has hundreds of attributes and they keep on increasing over time (horsepower, torque, a/c, electric windows, etc...) My table has each attribute as a column. Is that the right way to do it when I have thousands of rows and hundreds of columns? Also, I made each attribute a column so I facilitate advanced searching / filtering.
Using MySQL database.
Thanks
This is an interesting question IMHO, and the answer may depend on your specific data model and implementation. The most important factor in this case is data density.
How much of each row is actually filled up, in average?
If most of your fields are always present, then data scope partition may be the way to go.
If most of your fields are empty, then a metadata-like structure (like #JayC suggested) may be more attractive.
Let's use the case you mentioned, and do some simulations.
On the first case, scope partition, the idea is to implement partitions based on scope or usage. As an example of partitioning by usage, let's say that the most retrieved fields are Model, Year, Maker and Color. These fields may compose your main [CAR] table, the owner of the ID field which will exclusively identify the vehicle.
Now let's say that Engine, Horsepower, Torque and Cylinders are also used for searches from time to time, but not so frequently. These may exist on a secondary table [CAR_INFO_1], which is tied to the first table by the presence of the CAR_ID field, a foreign key. Proceed by creating as many partitions you need.
Advantage: Simpler queries. You may coalesce all information about a vehicle if you do a joint query (for example inside a VIEW).
Downside: Maintenance. Each new field must be implemented in the model itself, and you need an updated data model to locate where the field you need is actually stored (or abstract it inside a view.)
Metadata format is much more elegant, but demands more of your database engine. Check #JayC's and #Nitzan Shaked's answers for details.
Advantages: 100% data density. You'll never have empty Data values. Also maintenance - a new attribute is created by adding it as a row to the metadata identifier table. Data structure is less complex as well.
Downside: Complex queries, together with more complex execution plans. Let's say you need all Ford cars made in 2010 that are blue. It would be very trivial on the first case:
SELECT * FROM CAR WHERE Model='Ford' AND Year='2010' AND Color='Blue'
Now the same query on a metadata-structured model:
Assume the existence of this two tables,
CAR_METADATA_TYPE
ID DESC
1 'Model'
2 'Year'
3 'Color'
and
CAR_METADATA [CAR_ID], [METADATA_TYPE_ID], [VALUE]
The query itself would like something like this:
SELECT * FROM CAR, CAR_METADATA [MP1], CAR_METADATA [MP2], CAR_METADATA [MP3]
WHERE MP1.CAR_ID = CAR.ID AND MP1.METADATA_TYPE_ID = 1 AND MP1.Value='Ford'
AND MP2.CAR_ID = CAR.ID AND MP2.METADATA_TYPE_ID = 2 AND MP2.Value='2010'
AND MP3.CAR_ID = CAR.ID AND MP3.METADATA_TYPE_ID = 3 AND MP3.Value='Blue'
So, it all depends on you needs. But given your case, my suggestion would be the Metadata format.
(But do a model cleanup first - no repeated fields, 1:N data on their own table instead of inline fields like Color1, Color2, Color3, this kind of stuff ;) )
I guess the obvious question is, then: why not have a table car_attrs(car, attr, value)? Each attribute is a row. Most queries can be re-written to use this form.
If it is all about features, create a features table, list all your features as rows and give them some sort of automatic id, and create a car_features that with foreign keys to both your cars table and your features table that associates cars with features, maybe along with any values associated with the relationship (one passenger electric seat, etc.).
If you have ever changing attributes, then consider storing them in an XML blob or text structure in one column. This structure is not relational. The most important attributes will then be duplicated in additional columns so you can craft queries to search on them as the Blob will not be searchable from SQL queries. This will cut down on the amount of columns in that table and allow for expansion without changing the database schema.
As others as suggested, if you want all the attributes in a table, then use an attribute table to define them. Then will depend on your requirements and needs of the application.
I have an existing mysql DB that manages regulations for 50 states. The current setup is relational - three tables for EACH of the 50 states:
state_table contains the chapter/sub-chapter headings
item_table contains the end records
department_table contains the ID's to relate the two.
all combined it handles around 620,000 records
I'm not a DB design expert and have always utilized this as-is and gotten-by however, the nature of tables for all 50 states limits searching across all states etc. and I'm wondering if there is a better approach.
I'm wondering if I should consider combining this into either a single set of 3 relational tables for the entire nation or even a single table to handle everything.
I've asked this on other forums and have been told to read various volumes of DB schema and structures etc. so if there is someone who can just suggest the direction to go in and the pro's and con's of what I have vs the alternative that would be great!
thanks!
Here's the way it is, X 50
alabama
ID
Name
State
Parent
Description
alabama_department
Department - ID's from "alabama"
Item - ID's from "alabama_item"
alabama_item
ID
Name
Description
Keywords
Doc_ID
Effective_date
...
...
The Queries: I step through the heirarchy of chapter/sub-chapter/end-record via links this works fine but I'm starting to focus more on search capability and also thinking what I have is overkill and it sounds like a couple of you think so (overkill)
If I am correct in thinking you have 150 tables (3 * 50 states) Then:
You should have a 'states' table which includes a stateID and stateName. Then use ONE table for chapter/subchapters, ONE for departments, and ONE for end records and use the stateID to relate different records to a state.
You should not have 3 tables for each state, you can use one of each and just relate to a state table. This brings you to four tables instead of 150.
This question already has answers here:
Is storing a delimited list in a database column really that bad?
(10 answers)
Closed 4 years ago.
I'm building a database for a real estate company.. Those real estate properties are managed by the sales people using a CodeIgniter website.
How should I proceed with the database design for this table. The fields like address/location, price etc. are pretty straightforward but there are some sections like Kitchen Appliences, Property Usage etc... where the sales people can check multiple values for that field.
Furthermore, a property can have multiple people attached to it (from the table people), this can be an intermediaire, owner, seller, property lawyer etc... Should I use one field for this or just I create an extra table and normalize the bindings?
Is the best way to proceed just using one field and using serialized data or is there a better way for this?
In a relational database you should never, EVER, put more than one value in a single field - this violates first normal form.
With the sections like Kitchen Appliances, Property Usage etc... where the sales people can check multiple values for that field, it depends on whether it will always be the same set of multiple options that can be specified:
If there are always the same set of multiple options, you could include a boolean column on the property table for each of the options.
If there are likely to be a variety of different options applicable to different properties, it makes more sense to set up a separate table to hold the available options, and a link table to hold which options apply to which properties.
The first approach is simpler and can be expected to perform faster, while the latter approach is more flexible.
A similar consideration applies to the people associated with each house; however, given that a single property can have (for example) more than one owner, the second approach seems like the only one viable. I therefore suggest separate tables to hold people details (name, phone number, etc) and people roles (such as owner, seller, etc.) and a link table to link roles to properties and people.
You should create extra table for it....
For example...
Consider the scenario that 1 item may have many categories and 1 category may have many items...
Then you can create table like this...
Create three tables for that....
(1) tblItem :
fields:
itemId (PK)
itemName
etc..
(2) tblCategory :
fields:
categoryId (PK)
categoryName
etc..
(3) tblItemCategory :
fields:
itemId (PK/FK)
categoryId (PK/FK)
So according to your data you can create an extra table for it....
I think it would be optimize way....
If you want your data to be in third normal form, then you should think in terms of adding extra tables, rather than encoding multiple values into various fields. However it depends how far your brief goes.