I am developing a classifieds website similar to Quickr.com.
The main problem is that each category requires a different set of properties. For example, for a mobile phone the attributes might be Manufacturer, Operating System, Is Touch Screen, Is 3G enabled, etc., whereas for an apartment the attributes are Number of bedrooms, Is furnished, Which floor, Total area, etc. Since the attributes and the number of attributes vary for each category, I am keeping the attributes and their values in separate tables.
My current database structure is
Table classifieds_ads
This table stores all the ads. One record per ad.
ad_id
ad_title
ad_desc
ad_created_on
cat_id
Sample data
-----------------------------------------------------------------------------------------------
|ad_id | ad_title | ad_desc | ad_created_on | cat_id |
-----------------------------------------------------------------------------------------------
|1 | Nokia Phone | Nokia n97 phone for sale. Excellent condition | <timestamp> | 2 |
-----------------------------------------------------------------------------------------------
Table classifieds_cat
This table stores all the available categories. cat_id in the classifieds_ads table relates to cat_id in this table.
cat_id
category
parent_cid
Sample data
-------------------------------------------
|cat_id| category | parent_cid |
-------------------------------------------
|1 | Electronics | NULL |
|2 | Mobile Phone | 1 |
|3 | Apartments | NULL |
|4 | Apartments - Sale | 3 |
-------------------------------------------
Table classifieds_attribute
This table contains all the available attributes for a particular category. Relates to classifieds_cat table.
attr_id
cat_id
input_type
attr_label
attr_name
Sample data
-----------------------------------------------------------
|attr_id | cat_id | attr_label | attr_name |
-----------------------------------------------------------
|1 | 2 | Operating System | Operating_System |
|2 | 2 | Is Touch Screen | Touch_Screen |
|3 | 2 | Manufacturer | Manufacturer |
|4 | 3 | Bedrooms | Bedrooms |
|5 | 3 | Total Area | Area |
|6 | 3 | Posted By | Posted_By |
-----------------------------------------------------------
Table classifieds_attr_value
This table stores the attribute value for each ad in classifieds_ads table.
attr_val_id
attr_id
ad_id
attr_val
Sample data
---------------------------------------------
|attr_val_id | attr_id | ad_id | attr_val |
---------------------------------------------
|1 | 1 | 1 | Symbian OS |
|2 | 2 | 1 | 1 |
|3 | 3 | 1 | Nokia |
---------------------------------------------
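To make the design above concrete, here is a minimal sketch of the four tables using SQLite in place of MySQL, so it runs anywhere. Column types are assumptions, and ad_desc, ad_created_on and input_type are left out for brevity. It shows how an ad's attributes are reassembled with a join, and how a simple facet count (ads per value of one attribute) can be computed in plain SQL with GROUP BY.

```python
import sqlite3

# Minimal in-memory sketch of the EAV tables described above.
# Column types are assumptions; some columns are omitted for brevity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classifieds_cat (
    cat_id     INTEGER PRIMARY KEY,
    category   TEXT NOT NULL,
    parent_cid INTEGER REFERENCES classifieds_cat(cat_id)
);
CREATE TABLE classifieds_ads (
    ad_id    INTEGER PRIMARY KEY,
    ad_title TEXT NOT NULL,
    cat_id   INTEGER NOT NULL REFERENCES classifieds_cat(cat_id)
);
CREATE TABLE classifieds_attribute (
    attr_id    INTEGER PRIMARY KEY,
    cat_id     INTEGER NOT NULL REFERENCES classifieds_cat(cat_id),
    attr_label TEXT NOT NULL,
    attr_name  TEXT NOT NULL
);
CREATE TABLE classifieds_attr_value (
    attr_val_id INTEGER PRIMARY KEY,
    attr_id     INTEGER NOT NULL REFERENCES classifieds_attribute(attr_id),
    ad_id       INTEGER NOT NULL REFERENCES classifieds_ads(ad_id),
    attr_val    TEXT
);
INSERT INTO classifieds_cat VALUES (1, 'Electronics', NULL), (2, 'Mobile Phone', 1);
INSERT INTO classifieds_ads VALUES (1, 'Nokia Phone', 2);
INSERT INTO classifieds_attribute VALUES
    (1, 2, 'Operating System', 'Operating_System'),
    (2, 2, 'Is Touch Screen', 'Touch_Screen'),
    (3, 2, 'Manufacturer', 'Manufacturer');
INSERT INTO classifieds_attr_value VALUES
    (1, 1, 1, 'Symbian OS'), (2, 2, 1, '1'), (3, 3, 1, 'Nokia');
""")

def ad_attributes(ad_id):
    """Return (label, value) pairs for one ad, ordered by attribute id."""
    return conn.execute("""
        SELECT a.attr_label, v.attr_val
        FROM classifieds_attr_value v
        JOIN classifieds_attribute a ON a.attr_id = v.attr_id
        WHERE v.ad_id = ?
        ORDER BY a.attr_id
    """, (ad_id,)).fetchall()

def facet_counts(attr_id):
    """Count ads per distinct value of one attribute (a basic SQL facet)."""
    return conn.execute("""
        SELECT attr_val, COUNT(*)
        FROM classifieds_attr_value
        WHERE attr_id = ?
        GROUP BY attr_val
        ORDER BY attr_val
    """, (attr_id,)).fetchall()

print(ad_attributes(1))
print(facet_counts(3))
```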
========
Is this design okay?
Is it possible to index this data with solr?
How can I perform a faceted search on this data?
Does MySQL support field collapsing like solr?
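My suggestion is to remove cat_id from the classifieds_attribute table, then create a new table.
The new table would look like:
cat_attr
---------------------------
| id | cat_id | attr_id |
---------------------------
This lets an attribute that applies to several categories (for example Manufacturer) be defined once and linked to each of those categories, instead of being duplicated once per category, which decreases redundancy.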
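A minimal sketch of this suggestion, again in SQLite for portability. The 'Tablet' category (cat_id 5) is a hypothetical example added only to show one attribute row being shared by two categories; the UNIQUE constraint is also an assumption, not part of the suggestion above.

```python
import sqlite3

# Attributes no longer carry cat_id; cat_attr maps categories to attributes.
# 'Tablet' (cat_id 5) is a hypothetical category used for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE classifieds_cat (
    cat_id   INTEGER PRIMARY KEY,
    category TEXT NOT NULL
);
CREATE TABLE classifieds_attribute (
    attr_id    INTEGER PRIMARY KEY,
    attr_label TEXT NOT NULL
);
CREATE TABLE cat_attr (
    id      INTEGER PRIMARY KEY,
    cat_id  INTEGER NOT NULL REFERENCES classifieds_cat(cat_id),
    attr_id INTEGER NOT NULL REFERENCES classifieds_attribute(attr_id),
    UNIQUE (cat_id, attr_id)  -- each attribute linked at most once per category
);
INSERT INTO classifieds_cat VALUES (2, 'Mobile Phone'), (5, 'Tablet');
INSERT INTO classifieds_attribute VALUES (1, 'Operating System'), (3, 'Manufacturer');
-- Each attribute row exists once but is linked to both categories.
INSERT INTO cat_attr (cat_id, attr_id) VALUES (2, 1), (2, 3), (5, 1), (5, 3);
""")

def attributes_for(cat_id):
    """Return the attribute labels linked to one category."""
    return [label for (label,) in conn.execute("""
        SELECT a.attr_label
        FROM cat_attr ca
        JOIN classifieds_attribute a ON a.attr_id = ca.attr_id
        WHERE ca.cat_id = ?
        ORDER BY a.attr_id
    """, (cat_id,))]

print(attributes_for(2), attributes_for(5))
```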
Your design is fine, although I question why you are using hierarchical categories. I understand that you want to organize categories from an end-user standpoint. The hierarchy helps them drill down to the category that they are looking for. However, your schema allows for attribute values at every level. I would suggest that you only need (or possibly want) attributes at the leaf level.
It is certainly possible that you could come up with attributes that would be applicable at higher levels, but this would drastically complicate your management of the data since you'd have to spend a lot of time thinking about exactly how high up the chain a certain attribute belongs and whether or not there is some reason why a lower level might be an exception to the parent rule and so forth.
It also overcomplicates your retrieval, which is part of the reason for your question, I think.
I would suggest creating an additional table that will be used to manage the hierarchy of categories above the leaf level. It would look exactly like your classifieds_cat table, except that its self-referencing (involuted) relationship will obviously point to the new table itself. Then classifieds_cat.parent_cid becomes an FK to the new table rather than a self-referencing FK to classifieds_cat.
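A minimal sketch of that split, assuming SQLite and a hypothetical name category_group (with columns group_id, name, parent_group_id) for the new table. Because attributes reference classifieds_cat, which now holds only leaf categories, the database itself rejects an attribute attached to a non-leaf category such as Apartments.

```python
import sqlite3

# category_group holds the browsing hierarchy (self-referencing);
# classifieds_cat holds leaf categories only. Names of the new table and
# its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE category_group (
    group_id        INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    parent_group_id INTEGER REFERENCES category_group(group_id)
);
CREATE TABLE classifieds_cat (
    cat_id     INTEGER PRIMARY KEY,
    category   TEXT NOT NULL,
    parent_cid INTEGER NOT NULL REFERENCES category_group(group_id)
);
CREATE TABLE classifieds_attribute (
    attr_id    INTEGER PRIMARY KEY,
    cat_id     INTEGER NOT NULL REFERENCES classifieds_cat(cat_id),
    attr_label TEXT NOT NULL
);
INSERT INTO category_group VALUES (1, 'Electronics', NULL), (3, 'Apartments', NULL);
INSERT INTO classifieds_cat VALUES (2, 'Mobile Phone', 1), (4, 'Apartments - Sale', 3);
INSERT INTO classifieds_attribute VALUES (4, 4, 'Bedrooms');
""")

def try_add_attribute(attr_id, cat_id, label):
    """Insert an attribute; return False if the FK to a leaf category fails."""
    try:
        conn.execute("INSERT INTO classifieds_attribute VALUES (?, ?, ?)",
                     (attr_id, cat_id, label))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_add_attribute(5, 3, 'Total Area'))  # rejected: 3 is a group, not a leaf
print(try_add_attribute(5, 4, 'Total Area'))  # accepted: 4 is a leaf category
```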
I think this schema change will reduce your application and data management complexity.
Related
Let's assume I have two types of users in my system.
Those who can program and those who cannot.
I need to save both types of users in the same table.
The users who can program have many additional properties that the others don't have, defined in another table.
What are the advantages of each of the following solutions, and are there any better solutions?
Solution 1
One table containing a column with the corresponding property.
Table `users`:
----------------------------
| id | name | can_program |
----------------------------
| 1 | Karl | 1 |
| 2 | Ally | 0 |
| 3 | Blake | 1 |
----------------------------
Solution 2
Two tables related to each other via primary key and foreign key.
One of the tables containing the users and the other table only containing the id of those who can program.
Table users:
--------------
| id | name |
--------------
| 1 | Karl |
| 2 | Ally |
| 3 | Blake |
--------------
Table can_program:
---------------------
| id | can_program |
---------------------
| 1 | 1 |
| 3 | 1 |
---------------------
You have a 1-1 relationship between a user and the property that allows him to program. I would recommend storing this information as an additional column in the users table. Creating an additional table basically results in an additional storage structure with a 1-1 relationship to the original table.
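A minimal sketch of the single-column approach (Solution 1) in SQLite. The NOT NULL, DEFAULT and CHECK constraints are assumptions added to keep the flag boolean; they are not part of the question.

```python
import sqlite3

# The 1-1 property lives directly in users; the CHECK keeps it 0 or 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    can_program INTEGER NOT NULL DEFAULT 0 CHECK (can_program IN (0, 1))
);
INSERT INTO users VALUES (1, 'Karl', 1), (2, 'Ally', 0), (3, 'Blake', 1);
""")

programmers = [name for (name,) in conn.execute(
    "SELECT name FROM users WHERE can_program = 1 ORDER BY id")]
print(programmers)
```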
Why not just have some kind of programmer_profiles table that the users table has a one-to-many relationship with?
If there's an associated record in programmer_profiles then they can program, otherwise it's presumed they can't.
This is more flexible since you can add in other x_profiles tables that provide different properties even if some of these have the same names.
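A minimal sketch of the profile-table idea in SQLite. The favourite_language column is a hypothetical programmer-only property. Since the relationship is one-to-many, can_program is derived with EXISTS rather than a plain LEFT JOIN, so a user with several profile rows still appears only once.

```python
import sqlite3

# A row in programmer_profiles means the user can program.
# favourite_language is a hypothetical programmer-only property.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE programmer_profiles (
    id                 INTEGER PRIMARY KEY,
    user_id            INTEGER NOT NULL REFERENCES users(id),
    favourite_language TEXT
);
INSERT INTO users VALUES (1, 'Karl'), (2, 'Ally'), (3, 'Blake');
INSERT INTO programmer_profiles (user_id, favourite_language)
VALUES (1, 'Python'), (3, 'Go'), (3, 'C');
""")

# can_program is 1 if at least one profile row exists, otherwise 0.
rows = conn.execute("""
    SELECT u.name,
           EXISTS (SELECT 1 FROM programmer_profiles p WHERE p.user_id = u.id)
               AS can_program
    FROM users u
    ORDER BY u.id
""").fetchall()
print(rows)
```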
Please let me start by saying, I know this has been asked many times before and I've studied other questions (and answers) but after 2 days of reading questions and amending my database I can't get this to work as I want.
At the moment I have various tables, for example customer, supplier, product, banner, etc.
I have a table called custom_field which allows custom fields to be created and used against various other tables.
At the moment some of my tables look like this:
General Tables
==============
Customer
+-------------+---------------+
| customer_id | customer_name |
+-------------+---------------+
| 1 | Peter |
| 2 | Sally |
+-------------+---------------+
Banner
+-----------+-------------+--------------+
| banner_id | banner_name | banner_width |
+-----------+-------------+--------------+
| 1 | Easter | 100px |
| 2 | Xmas | 250px |
+-----------+-------------+--------------+
Tables for managing custom fields
=================================
Custom_Field
+----------+------------+----------------+-----------+
| field_id | field_name | field_label | item_type |
+----------+------------+----------------+-----------+
| 100 | fav_color | Favorite Color | customer |
| 101 | border | Border | banner |
+----------+------------+----------------+-----------+
Custom_Field_Value
+----------+----------+---------+-------------+
| value_id | field_id | item_id | field_value |
+----------+----------+---------+-------------+
| 1567 | 100 | 1 | Red |
| 1568 | 100 | 2 | Blue |
| 1569 | 101 | 1 | Solid |
| 1570 | 101 | 2 | Dotted |
+----------+----------+---------+-------------+
To clarify, item_id refers to a customer_id, a banner_id, or a supplier_id, etc. In the example above this means Peter has a "favorite color" custom field set to Red, and Sally has a "favorite color" custom field set to Blue.
The Easter Banner has a "border" custom field set to solid, and the Xmas Banner has a "border" custom field set to Dotted.
This all works fine, except there can be no foreign key or referential integrity set between Custom_Field_Value.item_id and Customer.customer_id (or Banner.banner_id), because item_id's context is described by the item_type field in the Custom_Field table.
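A small sketch (SQLite, trimmed-down tables) of the integrity gap being described: item_id cannot carry a REFERENCES clause because its target table depends on item_type, so a value for a customer that does not exist is accepted. The row with value_id 1571 and item_id 99 is hypothetical. Such orphans can only be found after the fact, with a separate check per item_type.

```python
import sqlite3

# item_id has no FOREIGN KEY: its target table varies with item_type.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE custom_field (
    field_id    INTEGER PRIMARY KEY,
    field_name  TEXT,
    field_label TEXT,
    item_type   TEXT          -- 'customer', 'banner', ...
);
CREATE TABLE custom_field_value (
    value_id    INTEGER PRIMARY KEY,
    field_id    INTEGER NOT NULL REFERENCES custom_field(field_id),
    item_id     INTEGER NOT NULL,   -- cannot reference one specific table
    field_value TEXT
);
INSERT INTO customer VALUES (1, 'Peter'), (2, 'Sally');
INSERT INTO custom_field VALUES (100, 'fav_color', 'Favorite Color', 'customer');
INSERT INTO custom_field_value VALUES (1567, 100, 1, 'Red'), (1568, 100, 2, 'Blue');
-- Customer 99 does not exist, yet the database accepts this row.
INSERT INTO custom_field_value VALUES (1571, 100, 99, 'Green');
""")

# Detecting orphans needs one query like this per item_type.
orphans = conn.execute("""
    SELECT v.value_id
    FROM custom_field_value v
    JOIN custom_field f ON f.field_id = v.field_id
    WHERE f.item_type = 'customer'
      AND NOT EXISTS (SELECT 1 FROM customer c WHERE c.customer_id = v.item_id)
""").fetchall()
print(orphans)
```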
I don't want to create multiple nullable foreign keys (not sure that would even work anyway) as it will become unmanageable.
I did try creating sub tables, for example customer_custom_field, and relate this between Customer and Custom_Field, but again it becomes unmanageable when you consider every table could potentially have custom fields.
A single field value would only ever apply to a single entity from another table.
As an aside I also want to create an Attachments table for managing uploaded attachments to a particular entity, and again that could apply to customers, suppliers, products and various other tables, so it's a similar issue.
Edit for future viewers: Aside from the accepted answer which helped me I found some really good info here .
I've got a database with a single table for displaying inventory on a website (RVs). It stores the typical info: year, make, model, etc. I originally made it with 6 extra columns for storing "special features", but I don't like having such a hard limit on what options can be listed. Since I've never messed with more than a single table my gut instinct was to just add 24 or so more columns to cover everything, but something in my head told me that there might be a better way. So when do I decide N columns is too many? The data in these columns will commonly not be unique.
(Sorry for crappy diagram)
Current table design:
-----------------------------------------------------------------------
| id | year | make | model | price | ft_1 | ft_2 | ft_3 | ft_4 | ft_5 |
-----------------------------------------------------------------------
| | | | | | | | | | |
-----------------------------------------------------------------------
Possible better design:
table #1
------------------------------------
| id | year | make | model | price |
------------------------------------
| | | | | |
------------------------------------
table #2
---------------------------------------------
| unique_id(?) | feature | unit_ref |
---------------------------------------------
| 0 | "Diesel Pusher" | 2,6,14 |
---------------------------------------------
I feel like a bonus of the second table might be that I could more easily populate a dropdown containing all the previously entered features to speed up adding new units to inventory.
Is this the right way to go about it, or should I just add more columns and be content?
Thanks.
Believe it or not, your best option would likely be to add a third table.
Since each record in your rvs table can be linked to multiple rows in the features table, and each feature can correspond to multiple RVs, you have a many-to-many relationship, which is inherently difficult to maintain in a relational DBMS. By adding a third "intersection" table you convert it into a one-to-many-to-one relationship, which can be enforced declaratively by the DBMS.
Your table structure would then become something like
rvs
------------------------------------
| id | year | make | model | price |
------------------------------------
| | | | | |
------------------------------------
features
--------------------------
| id | feature |
--------------------------
| 1192 | "Diesel Pusher" |
--------------------------
rv_features
----------------------
| rv_id | feature_id |
----------------------
| | |
----------------------
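A minimal sketch of these three tables with the declarative constraints that make the many-to-many relationship enforceable, using SQLite. Column types, the UNIQUE on feature text and integer ids (231 rather than 0231) are assumptions for illustration. The composite primary key stops an RV from listing the same feature twice, and the foreign keys reject links to features that don't exist.

```python
import sqlite3

# Three-table design with constraints; types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE rvs (
    id    INTEGER PRIMARY KEY,
    year  INTEGER,
    make  TEXT,
    model TEXT,
    price INTEGER
);
CREATE TABLE features (
    id      INTEGER PRIMARY KEY,
    feature TEXT NOT NULL UNIQUE       -- each feature text stored once
);
CREATE TABLE rv_features (
    rv_id      INTEGER NOT NULL REFERENCES rvs(id),
    feature_id INTEGER NOT NULL REFERENCES features(id),
    PRIMARY KEY (rv_id, feature_id)    -- an RV lists a feature at most once
);
INSERT INTO rvs VALUES (231, 2016, 'Travelmore', 'CampMaster', 750000);
INSERT INTO features VALUES (1192, 'Diesel Pusher 450hp'), (3209, 'diesel generator 25kW');
INSERT INTO rv_features VALUES (231, 3209);
""")

def link(rv_id, feature_id):
    """Associate an RV with a feature; return False if the DBMS rejects it."""
    try:
        conn.execute("INSERT INTO rv_features VALUES (?, ?)", (rv_id, feature_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(link(231, 3209))  # duplicate pair
print(link(231, 9999))  # feature 9999 does not exist
```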
How do you make use of this? Suppose you want to record the fact that the 2016 Travelmore CampMaster has a 25kW diesel generator. You would first add a record to rvs like
--------------------------------------------------
| id | year | make | model | price |
--------------------------------------------------
| 0231 | 2016 | Travelmore | CampMaster | 750000 |
| 2101 | 2016 | Travelmore | Domestant | 650000 |
--------------------------------------------------
(Note the value in the id column is entirely arbitrary; its sole purpose is to serve as the primary key which uniquely identifies the record. It can encode meaningful information, but it must be something that will not change throughout the life of the record it identifies.)
You then add (or already have) the generator in the features table:
--------------------------------
| id | feature |
--------------------------------
| 1192 | Diesel Pusher 450hp |
| 3209 | diesel generator 25kW |
--------------------------------
Finally, you associate the rv to the feature with a record in rv_features:
----------------------
| rv_id | feature_id |
----------------------
| 0231 | 3209 |
| 0231 | 1192 |
| 2101 | 3209 |
----------------------
(I've added a few other records to each table for context.)
Now, to retrieve the features of the 2016 CampMaster, you use the following SQL query:
SELECT r.year, r.make, r.model, f.feature
FROM rvs r
JOIN rv_features rf ON rf.rv_id = r.id
JOIN features f ON f.id = rf.feature_id
WHERE r.id = '0231';
to get
----------------------------------------------------------
| year | make | model | feature |
----------------------------------------------------------
| 2016 | Travelmore | CampMaster | diesel generator 25kW |
| 2016 | Travelmore | CampMaster | Diesel Pusher 450hp |
----------------------------------------------------------
To see the rvs with a 25kW generator, change the query to
SELECT r.year, r.make, r.model, f.feature
FROM rvs r
JOIN rv_features rf ON rf.rv_id = r.id
JOIN features f ON f.id = rf.feature_id
WHERE f.id = '3209';
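A runnable version of the two lookups against the sample rows above, using SQLite and integer ids (0231 becomes 231) purely for illustration. The ORDER BY clauses are added so the output order is deterministic.

```python
import sqlite3

# Sample data from the answer; SQLite and integer ids are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rvs (id INTEGER PRIMARY KEY, year INTEGER, make TEXT, model TEXT, price INTEGER);
CREATE TABLE features (id INTEGER PRIMARY KEY, feature TEXT NOT NULL);
CREATE TABLE rv_features (rv_id INTEGER, feature_id INTEGER, PRIMARY KEY (rv_id, feature_id));
INSERT INTO rvs VALUES (231, 2016, 'Travelmore', 'CampMaster', 750000),
                       (2101, 2016, 'Travelmore', 'Domestant', 650000);
INSERT INTO features VALUES (1192, 'Diesel Pusher 450hp'), (3209, 'diesel generator 25kW');
INSERT INTO rv_features VALUES (231, 3209), (231, 1192), (2101, 3209);
""")

BASE = """
    SELECT r.year, r.make, r.model, f.feature
    FROM rvs r
    JOIN rv_features rf ON rf.rv_id = r.id
    JOIN features f ON f.id = rf.feature_id
"""

def features_of(rv_id):
    """All features of one RV."""
    return conn.execute(BASE + " WHERE r.id = ? ORDER BY f.id", (rv_id,)).fetchall()

def rvs_with(feature_id):
    """All RVs that have one feature."""
    return conn.execute(BASE + " WHERE f.id = ? ORDER BY r.id", (feature_id,)).fetchall()

for row in features_of(231):
    print(row)
```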
Sherantha's link to A Quick-Start Tutorial on Relational Database Design actually looks like a good intro to table design and normalization; you might find it useful.
There is a thing called "third normal form" (3NF). Roughly, it says that every non-key column should depend on the key, the whole key, and nothing but the key, which pushes repeated data out into separate tables referenced by unique ids. Taken to the extreme, this would mean you make a table for year, a table for make, a table for model, etc., plus a table where you combine all these ids into one connected dataset.
But this is not always practical. I think the best way to approach this is something in between: tables for entries that repeat very often, but there is no need for an extra table of prices with unique ids; that would be overkill, I think.
Based on your scenario, if you believe the number of feature columns will stay the same, then there is no need for a second table. But if there is any possibility that the number of features will grow in the future, you should break your table into two (RVs and Features). Then create a third table that links RVs and features, since there is a many-to-many relationship. So I suggest you use three tables.
I think it would help you to become more familiar with relational database design. This is a short but great article I found earlier.
Following the tables in the article linked below, I have a category table and another table named "package" that stores a category id.
http://ftp.nchu.edu.tw/MySQL/tech-resources/articles/hierarchical-data.html
Category
+-------------+----------------------+--------+
| category_id | name | parent |
+-------------+----------------------+--------+
| 1 | ELECTRONICS | NULL |
| 2 | TELEVISIONS | 1 |
| 3 | TUBE | 2 |
| 4 | LCD | 2 |
| 5 | PLASMA | 2 |
| 6 | PORTABLE ELECTRONICS | 1 |
| 7 | MP3 PLAYERS | 6 |
| 8 | FLASH | 7 |
| 9 | CD PLAYERS | 6 |
| 10 | 2 WAY RADIOS | 6 |
+-------------+----------------------+--------+
Is there any way I can LEFT JOIN repeatedly until there is no parent left, without knowing in advance how many times I have to join?
Second question: my "package" table stores only the last/smallest (leaf) category id, for example "8 - FLASH". Is it good practice to store only the leaf category id and look up its parents by joining back to the category table? Will querying it back every time make the database heavy?
Thanks in advance!
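It is not possible to do such queries in MySQL versions before 8.0. MySQL 8.0 added recursive common table expressions (WITH RECURSIVE), which can walk up the parent chain in a single query.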
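For reference, here is a sketch of walking from a leaf up to the root in one query with a recursive CTE, run in SQLite so it is self-contained. MySQL 8.0 and later also support WITH RECURSIVE, so a query of this shape should carry over with minor changes (for example, the `?` placeholder is Python's sqlite3 style). The CTE name lineage and the depth column are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT, parent INTEGER);
INSERT INTO category VALUES
 (1, 'ELECTRONICS', NULL), (2, 'TELEVISIONS', 1), (3, 'TUBE', 2), (4, 'LCD', 2),
 (5, 'PLASMA', 2), (6, 'PORTABLE ELECTRONICS', 1), (7, 'MP3 PLAYERS', 6),
 (8, 'FLASH', 7), (9, 'CD PLAYERS', 6), (10, '2 WAY RADIOS', 6);
""")

def ancestors(category_id):
    """Return names from category_id up to its root, leaf first."""
    return [name for (name,) in conn.execute("""
        WITH RECURSIVE lineage(category_id, name, parent, depth) AS (
            -- anchor: the starting category
            SELECT category_id, name, parent, 0
            FROM category WHERE category_id = ?
            UNION ALL
            -- step: join each row to its parent; stops when parent is NULL
            SELECT c.category_id, c.name, c.parent, l.depth + 1
            FROM category c JOIN lineage l ON c.category_id = l.parent
        )
        SELECT name FROM lineage ORDER BY depth
    """, (category_id,))]

print(ancestors(8))
```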
If you need to keep this database structure, then the fastest approach is likely to select the relevant data from the table and then process it client-side into the appropriate array/join.
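A sketch of the client-side approach: fetch (category_id, name, parent) once, index it in a dictionary, and walk the parent chain in application code. Here rows is hard-coded from the sample table; in a real application it would come from something like SELECT category_id, name, parent FROM category.

```python
# Rows as they would come back from the category table.
rows = [
    (1, 'ELECTRONICS', None), (2, 'TELEVISIONS', 1), (3, 'TUBE', 2), (4, 'LCD', 2),
    (5, 'PLASMA', 2), (6, 'PORTABLE ELECTRONICS', 1), (7, 'MP3 PLAYERS', 6),
    (8, 'FLASH', 7), (9, 'CD PLAYERS', 6), (10, '2 WAY RADIOS', 6),
]

# Index by id so each step up the tree is a dictionary lookup.
by_id = {cid: (name, parent) for cid, name, parent in rows}

def path_to_root(category_id):
    """Return names from category_id up to the root, leaf first."""
    path = []
    seen = set()  # guards against accidental cycles in the data
    while category_id is not None and category_id not in seen:
        seen.add(category_id)
        name, parent = by_id[category_id]
        path.append(name)
        category_id = parent
    return path

print(path_to_root(8))
```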
The above may not work well if you cannot sufficiently narrow down the number of rows to SELECT out, in which case, recursively running multiple queries may be faster. On your second query, the best approach is to do something like WHERE id IN (list_of_parent_values) rather than 1 query per parent.
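A sketch of running multiple queries but batching them: the ancestors of several leaf categories are resolved together, with one WHERE category_id IN (...) query per level of the tree rather than one query per parent. SQLite is used for portability; the helper names fetch_ancestors and path_from are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT, parent INTEGER);
INSERT INTO category VALUES
 (1, 'ELECTRONICS', NULL), (2, 'TELEVISIONS', 1), (3, 'TUBE', 2), (4, 'LCD', 2),
 (5, 'PLASMA', 2), (6, 'PORTABLE ELECTRONICS', 1), (7, 'MP3 PLAYERS', 6),
 (8, 'FLASH', 7), (9, 'CD PLAYERS', 6), (10, '2 WAY RADIOS', 6);
""")

def fetch_ancestors(leaf_ids):
    """Return ({id: (name, parent)}, query_count) for leaf_ids and all ancestors."""
    known, queries = {}, 0
    frontier = set(leaf_ids)
    while frontier:
        placeholders = ",".join("?" * len(frontier))  # e.g. "?,?"
        rows = conn.execute(
            "SELECT category_id, name, parent FROM category "
            f"WHERE category_id IN ({placeholders})",
            tuple(frontier),
        ).fetchall()
        queries += 1
        for cid, name, parent in rows:
            known[cid] = (name, parent)
        # Next level up: parents that have not been fetched yet.
        frontier = {p for _, _, p in rows if p is not None and p not in known}
    return known, queries

def path_from(cid, known):
    """Walk the already-fetched rows from cid up to the root."""
    names = []
    while cid is not None:
        name, cid = known[cid]
        names.append(name)
    return names

known, queries = fetch_ancestors([8, 4])
print(queries, path_from(8, known), path_from(4, known))
```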
Lastly, if you can change your data structure, there is a way of storing special tree column values (the nested set model, which keeps left/right bounds for each node) so that all of the relevant nodes can be selected efficiently with a single SQL query. Inserting nodes and re-organising the tree requires more work, however.
There are a number of slightly differing implementations of this; see here for one such discussion:
http://web.archive.org/web/20110606032941/http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
awesome_nested_set is also a Ruby implementation of this pattern:
https://github.com/collectiveidea/awesome_nested_set
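A sketch of the nested set idea in SQLite. Each node gets lft/rgt numbers from a depth-first walk, so the ancestors of a node are exactly the rows whose [lft, rgt] interval contains the node's lft, and a single query returns the whole path. Computing lft/rgt in Python from the existing parent column is an illustrative shortcut; a real implementation would also maintain these numbers on insert and move.

```python
import sqlite3

rows = [
    (1, 'ELECTRONICS', None), (2, 'TELEVISIONS', 1), (3, 'TUBE', 2), (4, 'LCD', 2),
    (5, 'PLASMA', 2), (6, 'PORTABLE ELECTRONICS', 1), (7, 'MP3 PLAYERS', 6),
    (8, 'FLASH', 7), (9, 'CD PLAYERS', 6), (10, '2 WAY RADIOS', 6),
]

children = {}
for cid, _name, parent in rows:
    children.setdefault(parent, []).append(cid)

def number(node, counter, bounds):
    """Depth-first walk: assign lft on the way down and rgt on the way up."""
    counter[0] += 1
    left = counter[0]
    for child in children.get(node, []):
        number(child, counter, bounds)
    counter[0] += 1
    bounds[node] = (left, counter[0])

bounds, counter = {}, [0]
for root in children[None]:
    number(root, counter, bounds)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT,"
             " parent INTEGER, lft INTEGER, rgt INTEGER)")
conn.executemany("INSERT INTO category VALUES (?, ?, ?, ?, ?)",
                 [(cid, name, parent, *bounds[cid]) for cid, name, parent in rows])

def ancestors(name):
    """Path from the root down to the named node, in one query."""
    return [n for (n,) in conn.execute("""
        SELECT anc.name
        FROM category AS node, category AS anc
        WHERE node.lft BETWEEN anc.lft AND anc.rgt
          AND node.name = ?
        ORDER BY anc.lft
    """, (name,))]

print(ancestors('FLASH'))
```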
I am creating a database in which I have an artefact that can be associated with either a project, a production or a performance. I will call the relationship 'comes_from'. The related item can be a project or a more specific version of a project, such as a production or a performance.
I don't want to have separate foreign keys on my artefact for each possible value of the 'comes_from' relationship as it feels wrong to have multiple attributes for the same relationship. The only way I can think of doing this is having a separate table that stores the comes_from relationship containing the id of the referenced project or more specific version along with the table the item is located in.
artefact table
+-------------+------------+
| artefact_id | comes_from | -- Foreign key to comes_from
+-------------+------------+
| 1 | 7 |
| 2 | 8 |
+-------------+------------+
comes_from table
+---------------+-----------------+---------------------------------+
| comes_from_id | comes_from (FK) | comes_from_table (FK table) |
+---------------+-----------------+---------------------------------+
| 7 | 19 | project |
| 8 | 13 | performance |
| 9 | 21 | production |
+---------------+-----------------+---------------------------------+
project table
+-------------+
| project_id |
+-------------+
| 19 |
| 20 |
+-------------+
performance table
+-----------------+
| performance_id |
+-----------------+
| 13 |
| 14 |
+-----------------+
production table
+---------------+
| production_id |
+---------------+
| 21 |
| 22 |
+---------------+
Is there a better way to do this? I am not sure I can even resolve this relationship in a SQL query, and it may cause issues when I use Doctrine as an ORM on top of this database.
Your solution is good; the "comes_from_table" column could be a simple indexed VARCHAR or INT field acting as a discriminator. However, I would remove the "comes_from" column from the "artefact" table, drop the "comes_from_id" column, and use the "artefact_id" column directly to reference artefacts in the relationship table.
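A minimal sketch of that layout in SQLite. The relationship table is keyed by artefact_id itself and uses a text discriminator; the names artefact_comes_from, source_id and source_table, the CHECK constraint and the index are assumptions. Resolving the relationship takes one LEFT JOIN per possible target, each guarded by the discriminator, so only the matching branch can produce a value; a NULL source_id in the result would reveal a dangling reference, since source_id itself still cannot carry a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE artefact    (artefact_id    INTEGER PRIMARY KEY);
CREATE TABLE project     (project_id     INTEGER PRIMARY KEY);
CREATE TABLE performance (performance_id INTEGER PRIMARY KEY);
CREATE TABLE production  (production_id  INTEGER PRIMARY KEY);
CREATE TABLE artefact_comes_from (
    artefact_id  INTEGER PRIMARY KEY REFERENCES artefact(artefact_id),
    source_id    INTEGER NOT NULL,   -- no FK possible: target table varies
    source_table TEXT NOT NULL
        CHECK (source_table IN ('project', 'performance', 'production'))
);
CREATE INDEX idx_comes_from_source ON artefact_comes_from (source_table, source_id);
INSERT INTO artefact VALUES (1), (2);
INSERT INTO project VALUES (19), (20);
INSERT INTO performance VALUES (13), (14);
INSERT INTO production VALUES (21), (22);
INSERT INTO artefact_comes_from VALUES (1, 19, 'project'), (2, 13, 'performance');
""")

rows = conn.execute("""
    SELECT a.artefact_id, cf.source_table,
           COALESCE(p.project_id, pf.performance_id, pr.production_id) AS source_id
    FROM artefact a
    JOIN artefact_comes_from cf ON cf.artefact_id = a.artefact_id
    LEFT JOIN project p
           ON cf.source_table = 'project' AND p.project_id = cf.source_id
    LEFT JOIN performance pf
           ON cf.source_table = 'performance' AND pf.performance_id = cf.source_id
    LEFT JOIN production pr
           ON cf.source_table = 'production' AND pr.production_id = cf.source_id
    ORDER BY a.artefact_id
""").fetchall()
print(rows)
```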
Regarding Doctrine there shouldn't be any problem, I did something similar in the past using Symfony2 and Doctrine2 for an entity called Tags where a Tag could either belong to a contact or to a contact spouse. I also created a function in the repository file where I could pass the "tag_type" as a parameter so that I could get either the contact or the contact spouse tags.