I have the following information that should be retrieved by using several dependent select fields on a web form:
Users will be able to add new categories.
Food
- Fruits
  - Tropical
    - Pineapples
      - Pineapples - Brazil
      - Pineapples - Hawaii
    - Coconuts
  - Continental
    - Orange
- Fish
....
This data should come from a database.
I realize that creating a table for each category presented here is probably not a good schema, so I would like to ask: is there a standard way to deal with this?
I'm also aware of this schema example:
Managing Hierarchical Data in MySQL
Is there any other (perhaps more intuitive) way to store this type of information?
The link you provided describes the two standard ways for storing this type of information:
Adjacency List
Nested Sets
One issue your question didn't raise is whether all fruits have the same attributes or not.
If all fruits have the same attributes, then the answer that tells you to look at the link you provided and read about adjacency lists and nested sets is correct.
If new fruits can have new attributes, then a user that can add a new fruit can also add a new attribute. This can turn into a mess, real easily. If two users invent the same attribute, but give it a different name, that might be a problem. If two users invent different attributes, but give them the same name, that's another problem.
You might just as well say that, conceptually, each user has their own database, and no meaningful queries can be made that combine data from different users. Problem is, the mission of the database almost always includes, sooner or later, bringing together all the data from the different users.
That's where you face a nearly impossible data management issue.
Kawu gave you the answer.... a recursive relation (the table will be related to itself), aka a Pig's Ear relation.
Your example shows a parent with several children, but you didn't say whether an item can belong to more than one parent. Can an orange be in 'Tropical' and in 'Citrus'?
Each row has an id and a parent_id with the parent_id pointing to the id of another row.
id=1 name='Fruits' parent_id=0
id=2 name='Citrus' parent_id=1
id=3 name='Bitter Lemon' parent_id=2
id=4 name='Pink Grapefruit' parent_id=2
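A minimal sketch of the rows above, using Python's built-in sqlite3 module so it can be run as-is. The table name `category` and the helpers `children_of` and `subtree_of` are my own names for illustration. The first helper is the lookup a dependent select field would need; the second walks the whole tree with a recursive CTE (SQLite supports these, and so does MySQL from 8.0 on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " parent_id INTEGER NOT NULL)"  # 0 marks a root row, as in the rows above
)
conn.executemany(
    "INSERT INTO category (id, name, parent_id) VALUES (?, ?, ?)",
    [
        (1, "Fruits", 0),
        (2, "Citrus", 1),
        (3, "Bitter Lemon", 2),
        (4, "Pink Grapefruit", 2),
    ],
)


def children_of(parent_id):
    """Names of the direct children of one row (fills one select field)."""
    rows = conn.execute(
        "SELECT name FROM category WHERE parent_id = ? ORDER BY id",
        (parent_id,),
    )
    return [name for (name,) in rows]


def subtree_of(root_id):
    """(name, depth) for root_id and all of its descendants."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM category WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, tree.depth + 1
            FROM category AS c
            JOIN tree ON c.parent_id = tree.id
        )
        SELECT name, depth FROM tree ORDER BY depth, id
        """,
        (root_id,),
    )
    return rows.fetchall()
```

When the user picks 'Citrus' (id 2) in one select, `children_of(2)` gives the options for the next one.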
Here are some examples of schemas using this type of relation to provide unlimited parent-child relations:
Data model for product categories
Data model for organizations and people
I have a requirement to design a database for an ecommerce app with a vast range of product categories, from pins to planes. Products have different kinds of features. For example, a mobile phone has specific features like memory, camera megapixels, screen size etc., whilst a house has land size, number of storeys and rooms, garage size etc. Such specific features go on and on across all the products we have. Whilst all products share some common features, most features are very different and specific to each product. So designing the database has gotten a bit confusing. I'm doing it for the first time.
My query is about database design. Here is what I'm planning to do:
Create a master table with all fields that tells whether a field is common or specific, and map each field to its product category. All products will have the "common" fields, but "specific" fields will be shown only for one category.
table: ALL_COLUMNS
columns:
id,
name,
type(common or specific),
category(phone, car, laptop etc.)
Fetch the relevant fields from the ALL_COLUMNS table when showing the fields on the front end.
Store the user data in another table along with the mapped fields
table: ALL_USER_DATA
columns:
id,
columnid,
value
I don't know what the right way is or how established apps and sites do it. So I'd appreciate it if someone could tell me whether this is the right database architecture for an ecommerce app with a highly comprehensive and sparse set of categories and features.
Thank you all.
There are many possible answers to this question - see the "related" questions alongside this one.
The design for your ALL_USER_DATA table is commonly known as "entity/attribute/value" (EAV). It's widely considered horrible (search SO for why) - it's theoretically flexible, but imagine finding "airplanes made by Boeing with a wingspan of at least 20 metres suitable for pilots with a new qualification" - your queries become almost unintelligible really fast.
The alternative is to create a schema that can store polymorphic data types - again, look on Stack Overflow for how that might work.
The simple answer is that the relational model is not a good fit for this - you don't want to make a schema change for each new product type your store uses, and you don't want to have hundreds of different tables/columns.
My recommendation is to store the core, common information, and all the relationships in SQL, and to store the extended information as XML or JSON. MySQL is pretty good at querying JSON, and it's a native data type.
Your data model would be something like:
Categories
---------
category_id
parent_category_id
name
Products
--------
product_id
price
valid_for_sale
added_date
extended_properties (JSON/XML)
Category_products
-----------------
category_id
product_id
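To make the model above concrete, here is a rough sketch using Python's sqlite3 module. The column types, the sample rows and the helper `products_in_category` are assumptions for illustration. The JSON is stored as plain text and decoded in Python so the sketch runs on any SQLite build; in MySQL you would presumably declare the column as JSON and filter inside the query instead.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    parent_category_id INTEGER REFERENCES categories(category_id),
    name TEXT NOT NULL
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    price REAL NOT NULL,
    valid_for_sale INTEGER NOT NULL,
    added_date TEXT NOT NULL,
    extended_properties TEXT NOT NULL  -- JSON document stored as text
);
CREATE TABLE category_products (
    category_id INTEGER NOT NULL REFERENCES categories(category_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    PRIMARY KEY (category_id, product_id)
);
INSERT INTO categories VALUES (1, NULL, 'Electronics'), (2, 1, 'Phones');
""")

# Hypothetical sample products; the property names are made up.
sample_products = [
    (1, 499.0, 1, "2024-01-01", {"memory_gb": 128, "screen_inches": 6.1}),
    (2, 299.0, 1, "2024-01-02", {"memory_gb": 64, "screen_inches": 5.8}),
]
for product_id, price, valid, added, props in sample_products:
    conn.execute(
        "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
        (product_id, price, valid, added, json.dumps(props)),
    )
    conn.execute("INSERT INTO category_products VALUES (2, ?)", (product_id,))


def products_in_category(category_id):
    """(product_id, decoded properties) for saleable products in a category."""
    rows = conn.execute(
        """
        SELECT p.product_id, p.extended_properties
        FROM products AS p
        JOIN category_products AS cp ON cp.product_id = p.product_id
        WHERE cp.category_id = ? AND p.valid_for_sale = 1
        ORDER BY p.product_id
        """,
        (category_id,),
    )
    return [(product_id, json.loads(props)) for product_id, props in rows]


# The common columns are filtered in SQL, the category-specific ones afterwards.
big_memory = [
    product_id
    for product_id, props in products_in_category(2)
    if props.get("memory_gb", 0) >= 128
]
```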
I am searching for a guideline on how to set up my database for an auction site.
My problem is that there are a lot of different product types, say paintings, clothes, computers etc. They have different specifications, and it should be possible to put just Product A in size L up for auction, or the whole stock of Product B, for example.
How should I build my database for optimal performance, and ease of coding, in this case?
I would suggest the following database/object structure:
[Auction] n..1 [Category] 1..n [Variation Attribute] 1..n [Attribute Value]
An auction then has a category and several attribute values referring the variation attribute as well:
[Auction] = [Category], [Name], [Description]
[Auction_AttrVal] = [AuctionID], [VarAttrID], [AttrValID]
First of all you can have some kind of category table, which holds items like "Paintings", "Clothes", "Computers". An auction / product is assigned to one category.
Each category then defines variation attributes for this specific category. An example would be "Size" for the category "Clothes" or "CPU" for the category "Computers". You can also add predefined values for the variation attributes to limit the number of variations and avoid differentiations like "3GhZ" vs "3 GhZ".
This mechanism also allows for easy filtering of search results. You select a category and simply load all variation attributes as filters (or add a flag to an attribute to declare it as such) and offer the values for filtering to the end-user.
Furthermore you can make variation attributes for a category mandatory to force users who create the auctions (I'm assuming it's Consumer-to-Consumer) to provide sufficient information for their auction.
The code will probably be quite generic and simple. The database structure is highly flexible and extensible. Performance is much better than having all in one table. You probably should create an index (for the field AuctionID) for the Auction_AttrVal table. Please let me know if the database structure is not explained properly.
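In case it helps, the structure described above could be sketched like this with Python's sqlite3 module. All table and column names, the `mandatory` flag column and the sample rows are hypothetical; the index on the auction id is the one suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE variation_attribute (
    id INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES category(id),
    name TEXT NOT NULL,
    mandatory INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE attribute_value (
    id INTEGER PRIMARY KEY,
    var_attr_id INTEGER NOT NULL REFERENCES variation_attribute(id),
    value TEXT NOT NULL
);
CREATE TABLE auction (
    id INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES category(id),
    name TEXT NOT NULL,
    description TEXT
);
CREATE TABLE auction_attrval (
    auction_id INTEGER NOT NULL REFERENCES auction(id),
    var_attr_id INTEGER NOT NULL REFERENCES variation_attribute(id),
    attr_val_id INTEGER NOT NULL REFERENCES attribute_value(id)
);
CREATE INDEX idx_auction_attrval_auction ON auction_attrval (auction_id);

INSERT INTO category VALUES (1, 'Clothes'), (2, 'Computers');
INSERT INTO variation_attribute VALUES (1, 1, 'Size', 1), (2, 2, 'CPU', 0);
INSERT INTO attribute_value VALUES (1, 1, 'M'), (2, 1, 'L'), (3, 2, '3 GHz');
INSERT INTO auction VALUES
    (1, 1, 'Product A', 'Single item in size L'),
    (2, 1, 'Product A', 'Single item in size M');
INSERT INTO auction_attrval VALUES (1, 1, 2), (2, 1, 1);
""")


def filters_for_category(category_id):
    """Map each variation attribute of a category to its predefined values."""
    rows = conn.execute(
        """
        SELECT va.name, av.value
        FROM variation_attribute AS va
        JOIN attribute_value AS av ON av.var_attr_id = va.id
        WHERE va.category_id = ?
        ORDER BY va.id, av.id
        """,
        (category_id,),
    )
    filters = {}
    for name, value in rows:
        filters.setdefault(name, []).append(value)
    return filters


def auctions_with(category_id, attr_name, value):
    """Ids of auctions in a category that carry the given attribute value."""
    rows = conn.execute(
        """
        SELECT a.id
        FROM auction AS a
        JOIN auction_attrval AS aa ON aa.auction_id = a.id
        JOIN variation_attribute AS va ON va.id = aa.var_attr_id
        JOIN attribute_value AS av ON av.id = aa.attr_val_id
        WHERE a.category_id = ? AND va.name = ? AND av.value = ?
        ORDER BY a.id
        """,
        (category_id, attr_name, value),
    )
    return [auction_id for (auction_id,) in rows]
```

`filters_for_category` is what you would call to build the filter list for the end-user, and `auctions_with` applies one chosen filter.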
I have a site written in CakePHP with a MySQL database.
On my site I want to track the activities of every user. For example (like on this site), if a user inserts a product, I want to record this activity in my database.
I see two options:
1) One table called Activities with:
- id
- user_id
- title
- text
- type (the type of activity: comment, post edit)
2) Several tables, one per activity type:
- table activities_comment
- table activities_post
- table activities_badges
The problem is that a user's activities page can show different types of activities, and I don't know which of these solutions is better, because a comment has a title and a comment, a post has only a text, a badge has an external id pointing to its own table (for example), etc.
Help me please
I'm not familiar with CakePHP, but from purely database perspective your data model should probably look similar to this:
The symbol denotes category (aka. inheritance, subclass, subtype, generalization hierarchy etc.). Take a look at "Subtype Relationships" in ERwin Methods Guide for more info.
There are generally 3 strategies for implementing the category:
All types in single table. This requires a lot of NULLs and requires CHECKs to make sure separate subtypes are not inappropriately "intermingled".
All concrete types in separate tables (excluding the base, which is ACTIVITY in your case), which means common fields and relationships must be repeated in all child tables.
All types in separate tables (including the base). This implementation requires a little more JOINing, but is flexible and clean. It should be your default, unless there are strong reasons against it.
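As a rough illustration of the third strategy (base table plus one table per subtype), here is a sketch using Python's sqlite3 module. The table and column names are assumptions based on the fields mentioned in the question (a comment has a title and a comment, a post has a text, a badge points to another table), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base table: fields shared by every kind of activity.
CREATE TABLE activity (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    type TEXT NOT NULL CHECK (type IN ('comment', 'post', 'badge'))
);
-- One table per subtype; each shares its primary key with the base row.
CREATE TABLE activity_comment (
    activity_id INTEGER PRIMARY KEY REFERENCES activity(id),
    title TEXT NOT NULL,
    comment TEXT NOT NULL
);
CREATE TABLE activity_post (
    activity_id INTEGER PRIMARY KEY REFERENCES activity(id),
    body TEXT NOT NULL
);
CREATE TABLE activity_badge (
    activity_id INTEGER PRIMARY KEY REFERENCES activity(id),
    badge_id INTEGER NOT NULL
);
INSERT INTO activity VALUES
    (1, 10, 'comment'), (2, 10, 'post'), (3, 10, 'badge'), (4, 11, 'post');
INSERT INTO activity_comment VALUES (1, 'Nice', 'Great product');
INSERT INTO activity_post VALUES (2, 'Hello world'), (4, 'Other user');
INSERT INTO activity_badge VALUES (3, 7);
""")


def activities_of(user_id):
    """All activities of one user, with subtype columns NULL where unused."""
    rows = conn.execute(
        """
        SELECT a.id, a.type, c.title,
               COALESCE(c.comment, p.body) AS body,
               b.badge_id
        FROM activity AS a
        LEFT JOIN activity_comment AS c ON c.activity_id = a.id
        LEFT JOIN activity_post AS p ON p.activity_id = a.id
        LEFT JOIN activity_badge AS b ON b.activity_id = a.id
        WHERE a.user_id = ?
        ORDER BY a.id
        """,
        (user_id,),
    )
    return rows.fetchall()
```

The extra JOINs are the cost mentioned above; in exchange, the user's activity page is a single query over the base table.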
I am working on a reviews website. Basically you can choose a location and business type and optionally filter your search results by various business attributes. There are five tables at play here:
Businesses
----------
ID
Name
LocationID
Locations
---------
LocationID
LocationName
State
Attributes
----------
AttributeID
AttributeName
AttributeValues
---------------
AttributeValueID
ParentAttributeID
AttributeValue
BusinessAttributes
------------------
ID
AttributeID
AttributeValueID
So what I need is to work out the query to use (joins?) to get a business in a particular location based on attribute values.
For example, I want to find a barber in Santa Monica with these attributes:
Price: Cheap
Open Weekends: Yes
Cuts Womens Hair: Yes
These attributes are stored in the Attributes and AttributeValues tables and are linked to the business in the BusinessAttributes table.
So let's say I have these details from the search form:
LocationID=5&Price=Cheap&Open_Weekends=Yes&Customs_Womens_Hair=Yes
I need to build the query to return the businesses that match this location and attributes.
Thank you in advance for your help and I think StackOverflow is awesome.
Thinking about your data needs, you may be a perfect candidate for a schema-free document oriented database. On a recent episode of .Net Rocks (link to show), Michael Dirolf talked about his project MongoDB.
From what I understand, you could take each Business entity and store it in the database with all its associated attributes (LocationID, Price, Open_Weekends, Customs_Womens_Hair, Etc.). Each entity stored in the store can have different combinations of attributes because there is no schema. This natively accomplishes what you are trying to do with an Attribute and Attribute_Value table.
To search the database, just ask it for all entities that have the particular set of keys and values you need. No complex joins and no loss of performance. What you are doing is exactly what schema-free, document based databases are designed for.
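To show the idea only, here is a plain-Python sketch of "ask for all entities that have these keys and values". It is not MongoDB's API, just dictionaries standing in for stored documents. The business names are made up, and the keys are copied from the search form in the question.

```python
# Each business is one self-contained document; documents need not share keys.
businesses = [
    {
        "Name": "Barber A",
        "LocationID": 5,
        "Price": "Cheap",
        "Open_Weekends": "Yes",
        "Customs_Womens_Hair": "Yes",
    },
    {
        "Name": "Barber B",
        "LocationID": 5,
        "Price": "Cheap",
        "Open_Weekends": "No",
    },
    {
        "Name": "Barber C",
        "LocationID": 7,
        "Price": "Cheap",
        "Open_Weekends": "Yes",
        "Customs_Womens_Hair": "Yes",
    },
]


def find_matching(documents, criteria):
    """Return the documents that contain every key/value pair in criteria."""
    return [
        doc
        for doc in documents
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


criteria = {
    "LocationID": 5,
    "Price": "Cheap",
    "Open_Weekends": "Yes",
    "Customs_Womens_Hair": "Yes",
}
matching_names = [doc["Name"] for doc in find_matching(businesses, criteria)]
```

A document that lacks a requested key simply does not match, so no separate attribute tables are needed.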
Michael Dirolf: Yes, I think that a lot of the people who are switching are people who have sort of got themselves into corners where they are using relational database the way that we use MongoDB.
Richard Campbell: Right.
Michael Dirolf: So having columns that, a column key and a separate column value and inserting stuff that way so that they get done in schema and all sorts of crazy stuff like that…
Richard Campbell: Yeah, now in reflection I suddenly realized I just describe your perfect customer, a guy who has taken, you know, abusing SQL Server as they say. We’re going down this funny path and you just shouldn’t be here in the first place.
If you keep going down the path of building a relational attribute/value store, your performance will suffer from the combinatorial explosion that results.
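For anyone who stays on the relational route anyway, the lookup from the question boils down to "businesses that have ALL of the requested attribute values". One common way to write that is to match any requested pair and then count the matches per business. The sketch below uses Python's sqlite3 module; note the assumption that BusinessAttributes carries a BusinessID column pointing at Businesses (the question's table only lists ID), and that the sample rows and the helper `find_businesses` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Businesses (ID INTEGER PRIMARY KEY, Name TEXT, LocationID INTEGER);
CREATE TABLE Attributes (AttributeID INTEGER PRIMARY KEY, AttributeName TEXT);
CREATE TABLE AttributeValues (
    AttributeValueID INTEGER PRIMARY KEY,
    ParentAttributeID INTEGER REFERENCES Attributes(AttributeID),
    AttributeValue TEXT
);
-- Assumption: BusinessID ties each row to a business.
CREATE TABLE BusinessAttributes (
    BusinessID INTEGER REFERENCES Businesses(ID),
    AttributeID INTEGER REFERENCES Attributes(AttributeID),
    AttributeValueID INTEGER REFERENCES AttributeValues(AttributeValueID)
);
INSERT INTO Businesses VALUES (1, 'Barber A', 5), (2, 'Barber B', 5), (3, 'Barber C', 7);
INSERT INTO Attributes VALUES (1, 'Price'), (2, 'Open Weekends'), (3, 'Cuts Womens Hair');
INSERT INTO AttributeValues VALUES
    (1, 1, 'Cheap'), (2, 1, 'Expensive'),
    (3, 2, 'Yes'), (4, 2, 'No'),
    (5, 3, 'Yes'), (6, 3, 'No');
INSERT INTO BusinessAttributes VALUES
    (1, 1, 1), (1, 2, 3), (1, 3, 5),
    (2, 1, 1), (2, 2, 4), (2, 3, 5),
    (3, 1, 1), (3, 2, 3), (3, 3, 5);
""")


def find_businesses(location_id, wanted):
    """Names of businesses in a location matching every (attribute, value) pair."""
    if not wanted:
        raise ValueError("at least one attribute is required")
    pair_clause = " OR ".join(
        "(a.AttributeName = ? AND av.AttributeValue = ?)" for _ in wanted
    )
    params = [location_id]
    for name, value in wanted.items():
        params += [name, value]
    params.append(len(wanted))  # every requested attribute must have matched
    sql = f"""
        SELECT b.Name
        FROM Businesses AS b
        JOIN BusinessAttributes AS ba ON ba.BusinessID = b.ID
        JOIN Attributes AS a ON a.AttributeID = ba.AttributeID
        JOIN AttributeValues AS av ON av.AttributeValueID = ba.AttributeValueID
        WHERE b.LocationID = ? AND ({pair_clause})
        GROUP BY b.ID, b.Name
        HAVING COUNT(DISTINCT a.AttributeID) = ?
        ORDER BY b.Name
    """
    return [name for (name,) in conn.execute(sql, params)]
```

Every extra filter adds another pair to match and count, which gives a feel for why these queries get heavy as the attribute list grows.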
I'm trying to define a one-to-many relationship in a single table. For example, let's say I have a Groups table with these entries:
Group:
  Group_1:
    name: Atlantic Records
  Group_2:
    name: Capital Records
  Group_3:
    name: Gnarls Barkley
  Group_4:
    name: Death Cab For Cutie
  Group_5:
    name: Coldplay
  Group_6:
    name: Management Company
The group Coldplay could be a child of the group Capital Records and a child of the group Management Company and Gnarls Barkley could only be a child of Atlantic Records.
What is the best way to represent this relationship? I am using PHP and MySQL. Also, I am using PHP-Doctrine as my ORM, if that helps.
I was thinking that I would need to create a linking table called group_groups that would have 2 columns: owner_id and group_id. However, I'm not sure if that is the best way to do this.
Any insight would be appreciated. Let me know if I explained my problem well enough.
There are a number of possible issues with this approach, but with a minimal understanding of the requirements, here goes:
There appear to be really three 'entities' here: Artist/Band, Label/Recording Co. and Management Co.
Artists/Bands can have a Label/Recording Co
Artists/Bands can have a Management Co.
Label/Recording Co can have multiple Artists/Bands
Management Co can have multiple Artists/Bands
So there are one-to-many relationships between Recording Co and Artists and between Management Co and Artists.
Record each entity only once, in its own table, with a unique ID.
Put the key of the "one" in each instance of the "many" - in this case, Artist/Band would have both a Recording Co ID and a Management Co ID
Then your query will ultimately join Artist, Recording Co and Management Co.
With this structure, you don't need intersection tables, there is a clear separation of "entities" and the query is relatively simple.
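A quick sketch of that structure using Python's sqlite3 module. The table and column names are my own, and the rows come from the example in the question (Gnarls Barkley has no management company listed there, so that key is left NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recording_co (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE management_co (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- The "many" side carries the keys of both "one" sides.
CREATE TABLE artist (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    recording_co_id INTEGER REFERENCES recording_co(id),
    management_co_id INTEGER REFERENCES management_co(id)
);
INSERT INTO recording_co VALUES (1, 'Atlantic Records'), (2, 'Capital Records');
INSERT INTO management_co VALUES (1, 'Management Company');
INSERT INTO artist VALUES
    (1, 'Gnarls Barkley', 1, NULL),
    (2, 'Coldplay', 2, 1);
""")

# LEFT JOINs keep artists that have no label or no management company.
artist_overview = conn.execute(
    """
    SELECT a.name, r.name, m.name
    FROM artist AS a
    LEFT JOIN recording_co AS r ON r.id = a.recording_co_id
    LEFT JOIN management_co AS m ON m.id = a.management_co_id
    ORDER BY a.id
    """
).fetchall()
```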
A couple of options:
Easiest: If each group can only have one parent, then you just need a "ParentID" field in the main table.
If relationships can be more complex than that, then yes, you'd need some sort of linking table. Maybe even a "relationship type" column to define what kind of relationship between the two groups.
In this particular instance, you would be wise to follow Ken G's advice, since it does indeed appear that you are modeling three separate entities in one table.
In general, it is possible that this could come up -- If you had a "person" table and were modeling who everybody's friends were, for a contrived example.
In this case, you would indeed have a "linking" or associative or marriage table to manage those relationships.
I agree with Ken G and JohnMcG that you should separate Management and Labels. However they may be forgetting that a band can have multiple managers and/or multiple managers over a period of time. In that case you would need a many to many relationship.
management has many bands
band has many management
label has many bands
band has many labels
In that case your original idea of using a relationship table is correct. That is how many-to-many relationships are done. However, group_groups could be named better.
Ultimately it will depend on your requirements. For instance if you're storing CD titles then perhaps you would rather attach labels to a particular CD rather than a band.
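If you do end up needing many-to-many, a linking table could look roughly like the sketch below (Python's sqlite3 module). The names `music_group`, `group_relationship` and the `relationship_type` values are hypothetical; the rows follow the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE music_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Linking table: one row per owner/child pair, so a group can have many owners.
CREATE TABLE group_relationship (
    owner_id INTEGER NOT NULL REFERENCES music_group(id),
    group_id INTEGER NOT NULL REFERENCES music_group(id),
    relationship_type TEXT NOT NULL,
    PRIMARY KEY (owner_id, group_id)
);
INSERT INTO music_group VALUES
    (1, 'Atlantic Records'), (2, 'Capital Records'), (3, 'Gnarls Barkley'),
    (4, 'Death Cab For Cutie'), (5, 'Coldplay'), (6, 'Management Company');
INSERT INTO group_relationship VALUES
    (1, 3, 'label'),
    (2, 5, 'label'),
    (6, 5, 'management');
""")


def owners_of(group_name):
    """(owner name, relationship type) for every owner of the named group."""
    rows = conn.execute(
        """
        SELECT owner.name, rel.relationship_type
        FROM music_group AS child
        JOIN group_relationship AS rel ON rel.group_id = child.id
        JOIN music_group AS owner ON owner.id = rel.owner_id
        WHERE child.name = ?
        ORDER BY owner.id
        """,
        (group_name,),
    )
    return rows.fetchall()
```

Because the relationships live in their own table, groups and links can be loaded in any order, and a band changing managers is just a row change.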
This does appear to be a conflation of STI (single-table inheritance) and nested sets / tree structures. Nested set/trees are one parent to multiple children:
http://jgeewax.wordpress.com/2006/07/18/hierarchical-data-side-note/
http://www.dbmsmag.com/9603d06.html
http://www.sitepoint.com/article/hierarchical-data-database
I think best of all is to use NestedSet
http://www.doctrine-project.org/documentation/manual/1_0/en/hierarchical-data#nested-set
Just set actAs NestedSet
Yes, you would need a bridge that contained the fields you described. However, I would think your table should be split if it is following the same type of entities as you describe.
(I am assuming there is an id column which can be used for references).
You can add a column called parent_id (allow nulls) and store the id of the parent group in it. Then you can join using SQL like: "SELECT parent.*, child.* FROM `group` parent JOIN `group` child ON parent.id = child.parent_id" (the backticks are needed in MySQL because GROUP is a reserved word).
I do recommend using a separate table for this link because:
1. You cannot support multiple parents with a field. You have to use a separate table.
2. Import/Export/Delete is way more difficult with a field in the table because you may run into key conflicts. For example, if you try to import data, you need to make sure that you first import the parents and then children. With a separate table, you can import all groups and then all relationships without worrying about the actual order of the data.