I am restructuring a classifieds MySQL db where the different main sections are separated into separate tables. For example, sale items have their own table with unique IDs, jobs have their own table with unique IDs, and personals have their own table as well.
These sections all share a few common characteristics:
-id
-title
-body
-listing status
-poster
-reply email
-posting date
But they each have some separate information required as well:
-each has different sets and trees of categories to choose from (which affect the structure needed to store them)
-jobs need to store things like salary, start date, etc.
-sale items need to store things like prices, obo, etc.
Therefore, is it better practice to refactor the db now, while I still can, into a universal table that stores ALL the general listing info regardless of section, and then delegate the section-specific data to small tables? Or is it better to leave the current structure alone, with the sections kept separate?
Sounds like they are all separate entities that have nothing to do with each other (except for sharing some column definitions), right?
Do you ever want to do a SELECT like the following?
SELECT *
FROM main_entity
WHERE entity_type IN ('SALE_ITEM', 'JOB', 'PERSONAL');
Otherwise I don't think I would merge them into one table.
Don't use a single table. Go relational.
What I would recommend setting up is a so-called polymorphic relationship between your "main" table (the one with the common characteristics), and three tables containing specific information. The structure would look something like this:
Main table
id
title
...
category_name (VARCHAR or CHAR)
category_id (INTEGER)
Category table
id
(specific columns)
The category_name field should contain the table name of the specific category table, e.g. 'job_category', while category_id should point to the ID in that category table. An example would look like this:
# MAIN TABLE
id | title | ... | category_name | category_id
-------------------------------------------------------
123 | Some title | ... | job_category | 345
321 | Another title | ... | sale_category | 543
# SPECIFIC TABLE (job_category)
id | ...
---------
345 | ...
# SPECIFIC TABLE (sale_category)
id | ...
---------
543 | ...
Now, whenever you query the main table, you will immediately know which table to fetch the additional data from, and you will know the ID in that table. The only downside to this approach is that you have to perform two separate queries to fetch all the information for a single item. It would probably be possible to run both queries inside a single transaction, however.
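A minimal sketch of that two-query fetch, using Python's built-in sqlite3 module as a stand-in for MySQL. The salary and price columns, the sample values, and the fetch_listing helper are hypothetical. Because a table name cannot be passed as a bound parameter, the sketch checks category_name against a whitelist before building the second query.

```python
import sqlite3

# SQLite stands in for MySQL; specific columns and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main (
        id INTEGER PRIMARY KEY,
        title TEXT,
        category_name TEXT,   -- name of the specific table, e.g. 'job_category'
        category_id INTEGER   -- id of the row in that table
    );
    CREATE TABLE job_category  (id INTEGER PRIMARY KEY, salary INTEGER);
    CREATE TABLE sale_category (id INTEGER PRIMARY KEY, price INTEGER);
    INSERT INTO main VALUES (123, 'Some title', 'job_category', 345),
                            (321, 'Another title', 'sale_category', 543);
    INSERT INTO job_category  VALUES (345, 50000);
    INSERT INTO sale_category VALUES (543, 20);
""")

# Table names cannot be bound as parameters, so only accept known tables.
ALLOWED_TABLES = {"job_category", "sale_category"}

def fetch_listing(listing_id):
    """Query 1 reads the common row; query 2 reads the specific row."""
    title, table, specific_id = conn.execute(
        "SELECT title, category_name, category_id FROM main WHERE id = ?",
        (listing_id,),
    ).fetchone()
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected category table: {table}")
    specific = conn.execute(
        f"SELECT * FROM {table} WHERE id = ?", (specific_id,)
    ).fetchone()
    return title, table, specific

print(fetch_listing(123))  # ('Some title', 'job_category', (345, 50000))
```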
For fetching data the other way around (e.g. you search job_category for something), on the other hand, you can fetch the associated data from the main table with a JOIN. Remember to join not only on main.category_id = job_category.id but also on the category_name column. Otherwise, you may fetch data that belongs to one of the other categories.
For optimal performance, you may want to index the category_name and category_id columns. This would mostly speed up any queries that join the two tables, as described in the previous paragraph.
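The join pitfall and the index can be sketched as follows, again with Python's sqlite3 module in place of MySQL. The rows are hypothetical and deliberately give a job and a sale item the same specific id (345), so the version without the category_name condition returns a row from the wrong category.

```python
import sqlite3

# SQLite stands in for MySQL; salary/price columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main (id INTEGER PRIMARY KEY, title TEXT,
                       category_name TEXT, category_id INTEGER);
    CREATE TABLE job_category  (id INTEGER PRIMARY KEY, salary INTEGER);
    CREATE TABLE sale_category (id INTEGER PRIMARY KEY, price INTEGER);
    -- Composite index on the two polymorphic columns used by the join.
    CREATE INDEX idx_main_category ON main (category_name, category_id);
    -- A job and a sale item that happen to share the specific id 345.
    INSERT INTO main VALUES (1, 'Developer wanted', 'job_category', 345),
                            (2, 'Bike for sale', 'sale_category', 345);
    INSERT INTO job_category  VALUES (345, 50000);
    INSERT INTO sale_category VALUES (345, 20);
""")

# Correct: join on the id AND on the table name.
rows = conn.execute("""
    SELECT m.id, m.title, j.salary
    FROM job_category AS j
    JOIN main AS m
      ON m.category_id = j.id
     AND m.category_name = 'job_category'
    WHERE j.salary > 40000
""").fetchall()
print(rows)  # [(1, 'Developer wanted', 50000)]

# Wrong: without category_name, the sale listing sneaks in.
unsafe_rows = conn.execute("""
    SELECT m.id, m.title, j.salary
    FROM job_category AS j
    JOIN main AS m ON m.category_id = j.id
    WHERE j.salary > 40000
    ORDER BY m.id
""").fetchall()
print(unsafe_rows)  # [(1, 'Developer wanted', 50000), (2, 'Bike for sale', 50000)]
```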
Hope this helps!
A table has a CATEGORY_NAME/ID column, where we can either store an ID (and create another table listing the categories) or store the category name directly.
Which one is faster, and what is the difference between them?
CASE 1:
TABLE1
ID | CATEGORY_NAME
CASE 2
Here, to fetch the list we have to use one JOIN:
TABLE1
ID | CATEGORY_ID
CATEGORY_TABLE
ID | CATEGORY NAME
I suppose that by ID in the first case you mean the item ID.
In that case it comes down to code maintainability and storage. If a category name changes, you have to update every matching record in your large table, which will be a heavy query.
Also, storing category_id instead of the name saves storage space, since you are not replicating the category name in every record, just referring to its category_id.
But clearly it depends on the number of categories and the number of items.
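The rename cost described above can be made concrete with a small sketch (Python's sqlite3 module standing in for MySQL). The table names table1_case1, table1_case2 and category_table, and the 1000 sample rows, are hypothetical. Renaming one category rewrites every item row in CASE 1 but a single lookup row in CASE 2, which in exchange needs a JOIN to read names back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- CASE 1: category name copied into every item row.
    CREATE TABLE table1_case1 (id INTEGER PRIMARY KEY, category_name TEXT);
    -- CASE 2: item rows hold only an id into a lookup table.
    CREATE TABLE category_table (id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE table1_case2 (id INTEGER PRIMARY KEY,
                               category_id INTEGER REFERENCES category_table(id));
""")
# Hypothetical data: 1000 items, all in the same category.
conn.execute("INSERT INTO category_table VALUES (1, 'Books')")
conn.executemany("INSERT INTO table1_case1 VALUES (?, 'Books')",
                 [(i,) for i in range(1000)])
conn.executemany("INSERT INTO table1_case2 VALUES (?, 1)",
                 [(i,) for i in range(1000)])

# Renaming the category touches every item row in CASE 1 ...
case1 = conn.execute("UPDATE table1_case1 SET category_name = 'Novels' "
                     "WHERE category_name = 'Books'").rowcount
# ... but only one lookup row in CASE 2.
case2 = conn.execute("UPDATE category_table SET category_name = 'Novels' "
                     "WHERE id = 1").rowcount
print(case1, case2)  # 1000 1

# CASE 2 needs a JOIN to get the names back.
names = conn.execute("""
    SELECT DISTINCT c.category_name
    FROM table1_case2 AS t
    JOIN category_table AS c ON c.id = t.category_id
""").fetchall()
print(names)  # [('Novels',)]
```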
If TABLE1 is rather small (say, fewer than 10k rows at most), an indexed VARCHAR CATEGORY_NAME should work OK; otherwise it is better to use a separate table for storing categories.
Personally, I would go with option 2. The performance improvement here will not be dramatic, but storing dictionary values directly inside another table, instead of in their own lookup table, is bad practice.
Let's say we have a table with these records of tags:
Category ID
apples 1
orange 2
And then we have another table with a row
Data catID
... 1
With this setup we can retrieve this row only on the apples page. What is the proper way to assign both apples & orange to that row? Would I need to change the catID field from integer to varchar and just add the second id, so the value would be 1,2, and then change the query to something like:
select * from table where catID LIKE '%1%'
select * from table where catID LIKE '%2%'
instead of
select * from table where catID='1'
select * from table where catID='2'
I'm not sure if this is the proper way. Could someone tell me how you do it? Basically, I don't want to duplicate the whole row just to add another id to it.
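A quick way to see the catch in the LIKE idea: '%1%' is a substring match, so it also hits any catID that merely contains the digit 1, such as 10, 11 or 21. The sketch uses Python's sqlite3 module; the table t and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (data TEXT, catID TEXT)")
# Hypothetical rows: category ids stored as comma-separated text.
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("row A", "1,2"), ("row B", "10"), ("row C", "2")])

# Intended: rows in category 1. Actual: '10' also matches '%1%'.
matches = [r[0] for r in conn.execute(
    "SELECT data FROM t WHERE catID LIKE '%1%' ORDER BY data")]
print(matches)  # ['row A', 'row B']
```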
As others have already suggested, many-to-many relationship is represented in the physical model by a junction table. I'll do the leg work and illustrate that for you:
The CATEGORY_ITEM is the junction table. It has a composite PK consisting of FKs migrated from the other two tables. Example data...
CATEGORY:
CATEGORY_ID CATEGORY
----------- --------
1 Apple
2 Orange
ITEM:
ITEM_ID NAME
------- ----
1 Foo
2 Bar
CATEGORY_ITEM:
CATEGORY_ID ITEM_ID
----------- -------
1 1
2 1
1 2
The above means: "Foo is both Apple and Orange, Bar is only Apple".
The PK ensures that any given combination of category and item cannot exist more than once. The category is either connected to the item or it isn't; it cannot be connected multiple times.
Since you primarily want to search for items of a given category, the order of fields in the PK is {CATEGORY_ID, ITEM_ID}, so the underlying index can satisfy that query. The exact explanation is beyond the scope of this answer; if you are interested, I warmly recommend reading Use The Index, Luke!
And since InnoDB uses clustering, this will also store items belonging to the same category physically close together, which may be rather beneficial for I/O of the query above.
(If you wanted to query for categories of the given item, you'd need to flip the order of fields in the index.)
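The junction table above, sketched with Python's sqlite3 module in place of MySQL/InnoDB and using the same example data. WITHOUT ROWID makes SQLite store rows in primary-key order, which only loosely resembles InnoDB's clustering; the second, flipped index is an assumption covering the "categories of a given item" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: composite PK made of the two FKs, category first.
    CREATE TABLE category_item (
        category_id INTEGER NOT NULL REFERENCES category(category_id),
        item_id     INTEGER NOT NULL REFERENCES item(item_id),
        PRIMARY KEY (category_id, item_id)
    ) WITHOUT ROWID;
    -- Flipped index for "which categories does this item have?"
    CREATE INDEX category_item_by_item ON category_item (item_id, category_id);
    INSERT INTO category VALUES (1, 'Apple'), (2, 'Orange');
    INSERT INTO item VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO category_item VALUES (1, 1), (2, 1), (1, 2);
""")

def items_in(category_id):
    """Names of all items linked to one category."""
    return [name for (name,) in conn.execute("""
        SELECT i.name
        FROM category_item AS ci
        JOIN item AS i ON i.item_id = ci.item_id
        WHERE ci.category_id = ?
        ORDER BY i.name""", (category_id,))]

print(items_in(1))  # ['Bar', 'Foo']
print(items_in(2))  # ['Foo']

# The composite PK rejects linking the same category and item twice.
try:
    conn.execute("INSERT INTO category_item VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```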
Have you realized that indexing one row by two ids is a typical case of bidirectional relationship management in real projects? We need a smarter solution in the DB than the two-rows/junction-table solution. In MongoDB, you could use "low_id:high_id" as the "_id" field, add a field "uids_low_high", and index "uids_low_high" for "$in:[$id]" searches.
If I have a schema similar to this:
TABLE 1
id
column
other_column
etc
TABLE 2
id
table1_id
some_other_table_id
Is it a good idea to add a third table like this:
TABLE 3
id
table2_id
row_from_another_table_id
EDIT:
To make things clearer, consider a schema like this:
EVENTS
id
name
other_stuff
RANGES
id
time_from
time_to
max_people
etc
EVENTS_PLACES
id
event_id
place_id
What I want to do is to define a time range for an event. But a specific event in a specific place (EVENTS_PLACES) can 'override' these ranges. Also, an event can have multiple ranges.
I hope this makes the question a little bit more clear now.
It's always been my impression that a many-to-many relationship stored without a link table is a violation of Boyce-Codd Normal Form, and therefore a violation of a good relational database schema.
Therefore, relating data through a link table is, in fact, necessary to achieve BCNF and therefore good, if avoiding data update anomalies is good.
On to the specific schema example you presented. I think you want these logical tables (or entities):
-----------------------
EventClass
-----------------------
Id
Name
... Other attributes common to every instance
-
-----------------------
TimeSlot
-----------------------
Id
Start
End
-
-----------------------
Place
-----------------------
Id
Name
Address
MaxAttendance
... etc
-
----------------------
EventInstance
-----------------------
Id
EventClassId
TimeSlotId
PlaceId
PresenterName
...Other attributes specific to the instance
EventInstance is a relationship between EventClass, TimeSlot and Place; any attributes specific to the EventInstance should be stored on that entity. Any attributes common to a related group of events should be stored on the EventClass entity.
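A minimal sketch of these entities, using Python's sqlite3 module in place of MySQL. Column types and all sample rows are assumptions; End is quoted because END is an SQL keyword. The query lists every instance of one event class together with its place and time slot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EventClass (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE TimeSlot (Id INTEGER PRIMARY KEY, Start TEXT, "End" TEXT);
    CREATE TABLE Place (Id INTEGER PRIMARY KEY, Name TEXT, MaxAttendance INTEGER);
    -- Ternary relationship: one row per (event class, time slot, place).
    CREATE TABLE EventInstance (
        Id            INTEGER PRIMARY KEY,
        EventClassId  INTEGER NOT NULL REFERENCES EventClass(Id),
        TimeSlotId    INTEGER NOT NULL REFERENCES TimeSlot(Id),
        PlaceId       INTEGER NOT NULL REFERENCES Place(Id),
        PresenterName TEXT
    );
    INSERT INTO EventClass VALUES (1, 'SQL Workshop');
    INSERT INTO TimeSlot VALUES (1, '09:00', '11:00'), (2, '14:00', '16:00');
    INSERT INTO Place VALUES (1, 'Room A', 30), (2, 'Room B', 12);
    INSERT INTO EventInstance VALUES (1, 1, 1, 1, 'Alice'), (2, 1, 2, 2, 'Bob');
""")

# Every instance of event class 1 with where and when it happens.
schedule = conn.execute("""
    SELECT p.Name, t.Start, t."End", e.PresenterName
    FROM EventInstance AS e
    JOIN TimeSlot AS t ON t.Id = e.TimeSlotId
    JOIN Place AS p ON p.Id = e.PlaceId
    WHERE e.EventClassId = 1
    ORDER BY t.Start
""").fetchall()
print(schedule)
```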
It's all a question of database normalization. Generally speaking, the more normalized the data, the better. However, there is a case for compromise when performance is a concern: if the desired data is stored in the output format, select queries become simpler and faster, although updates might be hell.
I would counter the case for compromise by suggesting that, with the right indexes, materialized views, and indexes on materialized views, you can get the best of both worlds: the maintainability of fully normalized data with the speed of denormalized data. It does, however, require some skill and consideration to get the schema right.
So you have a relation between two tables with properties, and you have a subclass of that relation with some more properties. This is rare but possible.
Suppose that in your polygamous hetero dating site one or more Woman entities have a relation with one or more Man entities. These two tables may be coupled with a junction table, Relationship. Now some of them are married, which you consider a special type of relationship. So Marriage is a subclass of Relationship, and the Marriage table has a reference to the id in the Relationship table.
Of course, it may be simpler to solve such situations in another way, for example to simply have two junction tables between Man and Woman. But there are certainly situations in which you would want to extend on the relationship in the junction table.
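A sketch of Marriage as a subclass of the Relationship junction table, with Python's sqlite3 module standing in for MySQL. The column names (since, wedding_date) and the rows are hypothetical. Marriage's primary key doubles as its foreign key, so each marriage extends exactly one relationship and cannot exist without it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
    CREATE TABLE Man   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Woman (id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per man/woman relationship.
    CREATE TABLE Relationship (
        id       INTEGER PRIMARY KEY,
        man_id   INTEGER NOT NULL REFERENCES Man(id),
        woman_id INTEGER NOT NULL REFERENCES Woman(id),
        since    TEXT
    );
    -- Subclass: the PK is also the FK to the parent Relationship row.
    CREATE TABLE Marriage (
        relationship_id INTEGER PRIMARY KEY REFERENCES Relationship(id),
        wedding_date    TEXT
    );
    INSERT INTO Man VALUES (1, 'Tom');
    INSERT INTO Woman VALUES (1, 'Ann'), (2, 'Beth');
    INSERT INTO Relationship VALUES (10, 1, 1, '2001'), (11, 1, 2, '2005');
    INSERT INTO Marriage VALUES (10, '2003-06-01');
""")

# Every relationship, flagged as married when a Marriage row extends it.
rows = conn.execute("""
    SELECT r.id, w.name, m.relationship_id IS NOT NULL AS married
    FROM Relationship AS r
    JOIN Woman AS w ON w.id = r.woman_id
    LEFT JOIN Marriage AS m ON m.relationship_id = r.id
    ORDER BY r.id
""").fetchall()
print(rows)  # [(10, 'Ann', 1), (11, 'Beth', 0)]

# A Marriage pointing at a missing Relationship is rejected.
try:
    conn.execute("INSERT INTO Marriage VALUES (99, '2010-01-01')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```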
Another option would be to add a column to your TABLE2 that describes the nature of the connection between "things". For example, with a PERSON table and a RELATIONSHIPS table, you model your "objects" in the first table and the "links" in the second, e.g.:
+---------+---------+-------+--------+
| link_id | from_id | to_id | type |
+---------+---------+-------+--------+
| 1 | 2 | 3 | Mother |
| 2 | 8 | 3 | Sister |
+---------+---------+-------+--------+
With appropriate indices, this means you can do things like find all relationships for a given person, or find everyone who has a sister, etc. This is a simple example, but it starts to get interesting when the from_id and to_id can be different types of object i.e. not just people.
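The two lookups mentioned above, sketched with Python's sqlite3 module in place of MySQL and the rows from the table. Reading a row as "from_id is the type of to_id" is an assumption, as are the index choices and the placeholder person names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationships (
        link_id INTEGER PRIMARY KEY,
        from_id INTEGER NOT NULL REFERENCES person(id),
        to_id   INTEGER NOT NULL REFERENCES person(id),
        type    TEXT NOT NULL
    );
    -- One index per direction, plus one for lookups by relationship type.
    CREATE INDEX rel_from ON relationships (from_id);
    CREATE INDEX rel_to   ON relationships (to_id);
    CREATE INDEX rel_type ON relationships (type, to_id);
    INSERT INTO person VALUES (2, 'P2'), (3, 'P3'), (8, 'P8');
    INSERT INTO relationships VALUES (1, 2, 3, 'Mother'), (2, 8, 3, 'Sister');
""")

# All relationships person 3 takes part in, in either direction.
links_of_3 = [r[0] for r in conn.execute(
    "SELECT link_id FROM relationships WHERE from_id = ? OR to_id = ? "
    "ORDER BY link_id", (3, 3))]
print(links_of_3)  # [1, 2]

# Everyone who has a sister (row 2 reads "8 is the Sister of 3").
has_sister = [r[0] for r in conn.execute(
    "SELECT DISTINCT to_id FROM relationships WHERE type = 'Sister'")]
print(has_sister)  # [3]
```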
I've used this approach in the past when working with a very generic schema that aggregated data from a variety of other sources and had to be flexible. Clearly there's a trade-off between flexibility on one side and, e.g., speed and query complexity on the other. You have to decide whether it's useful in your case.
I'm trying to select some data from a MySQL database.
I have a table containing business details, and a separate one containing a list of trades. As we have multiple trades
business_details
id | business_name | trade_id | package_id
1 | Happy News | 12 | 1
This is the main table, contains the business name, the trade ID and the package ID
shop_trades
id | trade
1 | newsagents
This contains the trade type of the business
configuration_packages
id | name_of_trade_table
1 | shop_trades
2 | leisure_trades
This contains the name of the trade table to look in
So, basically, if I want to find the trade type (e.g., newsagent, fast food, etc) I look in the XXXX_trades table. But I first need to look up the name of XXXX from the configuration_packages table.
What I would normally do is 2 SQL queries:
SELECT business_details.*, configuration_packages.name_of_trade_table
FROM business_details, configuration_packages
WHERE business_details.package_id = configuration_packages.id
AND business_details.id = '1'
That gives me the name of the database table to look in for the trade name, so I then query that table:
SELECT trade FROM XXXX WHERE id='YYYY'
Where XXXX is the name of the table returned by the first query and YYYY is the trade ID, again returned by the first query.
Is there a way to combine these two queries so that I only run one?
I've used subqueries before, but only on the SELECT side of the query - not the FROM side.
Typically, this is handled by a union in a single query.
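One way to read that suggestion, sketched with Python's sqlite3 module standing in for MySQL: wrap every trades table in a UNION ALL subquery that also returns the table's name as a literal, then join that literal against name_of_trade_table. The sample rows (including 'Fun Park') and the trade_for helper are hypothetical, and each new XXXX_trades table would have to be added to the UNION by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business_details (id INTEGER PRIMARY KEY, business_name TEXT,
                                   trade_id INTEGER, package_id INTEGER);
    CREATE TABLE configuration_packages (id INTEGER PRIMARY KEY,
                                         name_of_trade_table TEXT);
    CREATE TABLE shop_trades    (id INTEGER PRIMARY KEY, trade TEXT);
    CREATE TABLE leisure_trades (id INTEGER PRIMARY KEY, trade TEXT);
    INSERT INTO configuration_packages VALUES (1, 'shop_trades'), (2, 'leisure_trades');
    INSERT INTO shop_trades    VALUES (1, 'newsagents');
    INSERT INTO leisure_trades VALUES (1, 'theme park');
    INSERT INTO business_details VALUES (1, 'Happy News', 1, 1), (2, 'Fun Park', 1, 2);
""")

# Each UNION branch tags its rows with the table they came from, so the
# join can match that tag against name_of_trade_table in a single query.
TRADE_QUERY = """
    SELECT b.business_name, t.trade
    FROM business_details AS b
    JOIN configuration_packages AS p ON p.id = b.package_id
    JOIN (
        SELECT 'shop_trades' AS source, id, trade FROM shop_trades
        UNION ALL
        SELECT 'leisure_trades', id, trade FROM leisure_trades
    ) AS t ON t.source = p.name_of_trade_table AND t.id = b.trade_id
    WHERE b.id = ?
"""

def trade_for(business_id):
    return conn.execute(TRADE_QUERY, (business_id,)).fetchone()

print(trade_for(1))  # ('Happy News', 'newsagents')
print(trade_for(2))  # ('Fun Park', 'theme park')
```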
Normalization gets you to a logical model. This helps better understand the data. It is common to denormalize when implementing the model. Subtypes as you have here are commonly implemented in two ways:
Separate tables, as you have, which makes retrieval difficult. This results in your question about how to retrieve the data.
A common table for all subtypes with a subtype indicator. This may result in columns which are always null for certain subtypes. It simplifies data access, and may alter the way that the subtypes are handled in code.
If the extra columns for a subtype are relatively rarely accessed, then you may use a hybrid implementation where the common columns are in the type table, and some or all of the subtype columns are in a subtype table. This is more complex to code.
That's not possible, and it sounds like a problem with your model.
Why don't you put shop_trades and leisure_trades into the same table, with one column to distinguish between the two?
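A minimal sketch of that merged table, using Python's sqlite3 module in place of MySQL. The trade_group column name, its CHECK list, and the rows are hypothetical. With one trades table, the two-step lookup becomes a single plain JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One trades table; trade_group says which former table a row came from.
    CREATE TABLE trades (
        id          INTEGER PRIMARY KEY,
        trade_group TEXT NOT NULL CHECK (trade_group IN ('shop', 'leisure')),
        trade       TEXT NOT NULL
    );
    CREATE TABLE business_details (
        id            INTEGER PRIMARY KEY,
        business_name TEXT,
        trade_id      INTEGER REFERENCES trades(id)
    );
    INSERT INTO trades VALUES (1, 'shop', 'newsagents'), (2, 'leisure', 'theme park');
    INSERT INTO business_details VALUES (1, 'Happy News', 1);
""")

# A single JOIN replaces the two queries.
row = conn.execute("""
    SELECT b.business_name, t.trade_group, t.trade
    FROM business_details AS b
    JOIN trades AS t ON t.id = b.trade_id
    WHERE b.id = ?
""", (1,)).fetchone()
print(row)  # ('Happy News', 'shop', 'newsagents')
```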
If this is possible, try this:
SELECT trade
FROM (SELECT 'TABLE_NAME' FROM 'INFORMATION_SCHEMA'.'TABLES'
WHERE 'TABLE_SCHEMA'='*schema name*')
WHERE id='YYYY'
UPDATE:
I think the code I have above is not possible. :|
I'm planning to build a database project.
One of the tables has a lot of attributes.
My question is: what is better, dividing the class into 2 separate tables or putting all of its attributes into one table? Below is an example:
create table User ( id, name, surname, ..., show_name, show_photos, ... )
or
create table User ( id, name, surname, ... )
create table UserPrivacy ( usr_id, show_name, show_photos, ... )
I suppose the performance is similar, since I can use indexes.
It's best to put all the attributes in the same table.
If you start storing attribute names in a table, you're storing metadata in your database, which breaks first normal form.
Besides, keeping them all in the same table simplifies your queries.
Would you rather have:
SELECT show_photos FROM User WHERE user_id = 1
Or
SELECT up.show_photos FROM User u
LEFT JOIN UserPrivacy up USING(user_id)
WHERE u.user_id = 1
Joins are okay, but keep them for associating separate entities and 1->N relationships.
There is a limit to the number of columns, and only if you think you might hit that limit would you do anything else.
There are legitimate reasons for storing name-value pairs in a separate table, but fear of adding columns isn't one of them. For example, a name-value table might, in some circumstances, make it easier for you to query a list of attributes. However, most database engines, as well as access layers such as PDO in PHP, include reflection methods with which you can easily get a list of a table's columns (an entity's attributes).
Also, please note that your id field on User should be user_id, not just id, unless you're using Ruby on Rails, which expects just id. 'user_id' is preferred because with just id, your joins look like this:
ON u.id = up.user_id
Which seems odd, and the preferred way is this:
ON u.user_id = up.user_id
or more simply:
USING(user_id)
Don't be afraid to 'add yet another attribute'. It's normal, and it's okay.
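Both query shapes from this answer can be run side by side with Python's sqlite3 module standing in for MySQL; the user row is made up. They return the same value, and the split schema simply needs the extra LEFT JOIN ... USING(user_id) to reach it.

```python
import sqlite3

# Option 1: privacy flags live on User itself.
one_table = sqlite3.connect(":memory:")
one_table.executescript("""
    CREATE TABLE User (user_id INTEGER PRIMARY KEY, name TEXT, show_photos INTEGER);
    INSERT INTO User VALUES (1, 'Ann', 1);
""")

# Option 2: privacy flags split into a 1:1 UserPrivacy table.
two_tables = sqlite3.connect(":memory:")
two_tables.executescript("""
    CREATE TABLE User (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE UserPrivacy (user_id INTEGER PRIMARY KEY REFERENCES User(user_id),
                              show_photos INTEGER);
    INSERT INTO User VALUES (1, 'Ann');
    INSERT INTO UserPrivacy VALUES (1, 1);
""")

single = one_table.execute(
    "SELECT show_photos FROM User WHERE user_id = 1").fetchone()
joined = two_tables.execute("""
    SELECT up.show_photos FROM User u
    LEFT JOIN UserPrivacy up USING(user_id)
    WHERE u.user_id = 1""").fetchone()
print(single, joined)  # (1,) (1,)
```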
I'd go with the 2 separate tables, especially if you are using an ORM. In most cases it's best to have each table correspond to a particular object, with its fields or "attributes" being the things required to describe that object.
You don't need 'show_photos' to describe a User but you do need it to describe UserPrivacy.
You should consider splitting the table if all of the privacy attributes are nullable and will most probably have values of NULL.
This will help you to keep the main table smaller.
If the privacy attributes will mostly be filled, there is no point in splitting the table, as it will require extra JOINs to fetch the data.
Since this appears to be a one to one relationship, I would normally keep it all in one table unless:
You would be near the limit of the number of bytes that can be stored in a row - then you should split it out.
Or if you will normally be querying the main table separately and won't need those fields much of the time.
If some columns of the table are not identified by, or dependent on, the primary key, or repeatedly use values from a definite/fixed set, make a different table for those columns and maintain a one-to-one relationship.
Why not have a User table and Features table, e.g.:
create table User ( id int primary key, name varchar(255) ... )
create table Features (
user_id int,
feature varchar(50),
enabled bit,
primary key (user_id, feature)
)
Then the data in your Features table would look like:
| user_id | feature | enabled
| -------------------------------
| 291 | show_photos | 1
| -------------------------------
| 291 | show_name | 1
| -------------------------------
| 292 | show_photos | 0
| -------------------------------
| 293 | show_name | 0
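A sketch of reading one flag from that Features table, with Python's sqlite3 module in place of MySQL (INTEGER 0/1 stands in for bit). The feature_enabled helper is hypothetical, and treating a missing row as "disabled" is an assumption, not something the answer specifies. Adding a new feature only needs new rows, not a new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Features (
        user_id INTEGER REFERENCES User(id),
        feature TEXT,     -- varchar(50) in the MySQL version
        enabled INTEGER,  -- 0/1 in place of MySQL's bit
        PRIMARY KEY (user_id, feature)
    );
    INSERT INTO User VALUES (291, 'a'), (292, 'b'), (293, 'c');
    INSERT INTO Features VALUES (291, 'show_photos', 1), (291, 'show_name', 1),
                                (292, 'show_photos', 0), (293, 'show_name', 0);
""")

def feature_enabled(user_id, feature, default=False):
    """Look up one flag; a missing row falls back to `default` (an assumption)."""
    row = conn.execute(
        "SELECT enabled FROM Features WHERE user_id = ? AND feature = ?",
        (user_id, feature)).fetchone()
    return bool(row[0]) if row else default

print(feature_enabled(291, "show_photos"))  # True
print(feature_enabled(292, "show_photos"))  # False
print(feature_enabled(292, "show_name"))    # False (no row, default used)
```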
I would suggest something different. It seems likely that in the future you will be asked to manage 'yet another attribute'. Rather than adding a column, you could just add a row to an attributes table:
TABLE Attribute
(
ID
Name
)
TABLE User
(
ID
...
)
TABLE UserAttributes
(
UserID FK User.ID
Attribute FK Attribute.ID
Value...
)
Good comments from everyone. I should have been clearer in my response.
We do this quite a bit to handle special cases where customers ask us to tailor our site for them in some way. We never 'pivot' the NVPs (name-value pairs) into columns in a query; we're always asking "should I do this here?" by looking for a specific attribute listed for a customer. If it is there, that's a 'true'. These features keep growing in number, and most of them would be false or NULL for most customers, so rather than having a ton of boolean columns, we use this approach, and it works well for us.
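The "should I do this here?" check can be sketched as an existence test on the attribute tables above, using Python's sqlite3 module in place of the real database. Column types, the composite primary key, the customer and attribute names, and the has_attribute helper are all hypothetical. A new special case only needs a row in Attribute, not an ALTER TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Attribute (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
    CREATE TABLE User (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE UserAttributes (
        UserID    INTEGER REFERENCES User(ID),
        Attribute INTEGER REFERENCES Attribute(ID),
        Value     TEXT,
        PRIMARY KEY (UserID, Attribute)
    );
    INSERT INTO Attribute VALUES (1, 'custom_logo'), (2, 'hide_prices');
    INSERT INTO User VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO UserAttributes VALUES (1, 1, 'acme.png');
""")

def has_attribute(user_id, name):
    """Presence of the attribute row means 'true'; absence means 'false'."""
    return conn.execute("""
        SELECT EXISTS (
            SELECT 1
            FROM UserAttributes AS ua
            JOIN Attribute AS a ON a.ID = ua.Attribute
            WHERE ua.UserID = ? AND a.Name = ?)""",
        (user_id, name)).fetchone()[0] == 1

print(has_attribute(1, "custom_logo"))  # True
print(has_attribute(2, "custom_logo"))  # False
```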