I had a question about whether or not my implementation idea is easy to work with/write queries for.
I currently have a table with many columns. Most of them hold the same kind of thing (items, split into Item 1, Item 2, Item 3, etc.).
So the table currently has the columns ID, Name, Item 1, Item 2, ..., Item 10.
I want to condense this into ID, Name, Item.
What I want is for Item to store multiple values as separate rows, i.e.:
ID    Name    Item
One   Hello   This
              That
              There
Something like the layout above. Is this a good idea, and how exactly would I go about doing it? The database will contain no numbers, and all of the information is static and will never change.
Can I do this using 1 database table (and would it be easy to match items of one ID to another ID), or would I need to create 2 tables and link them?
If so how exactly would I create 2 tables and make them relational?
Any ideas on how to implement this? Thanks!
This is a classic example of a denormalized database. Denormalization sometimes makes certain operations more efficient, but more often leads to inefficiencies. (For example, if one of your write queries was to change the name associated with an id, you would have to change many rows instead of a single one.) Denormalization should only be done for specific reasons after a fully normalized database has been designed. In your example, a normalized database design would be:
table_1: ID (key), Name
table_2: ID (foreign key mapped to table_1.ID), Item
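To make that concrete, here is a minimal sketch of the two-table layout. It uses Python's built-in sqlite3 module purely as a stand-in for whatever database you are actually running, so the DDL may need small adjustments there. The second row ("Two", "World") and its item are made-up sample data, and the composite primary key on table_2 is my own assumption to stop the same item being stored twice for one ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE table_1 (
        id   TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE table_2 (
        id   TEXT NOT NULL REFERENCES table_1(id),
        item TEXT NOT NULL,
        PRIMARY KEY (id, item)   -- assumption: one row per (id, item) pair
    );
""")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)",
                 [("One", "Hello"), ("Two", "World")])
conn.executemany("INSERT INTO table_2 VALUES (?, ?)",
                 [("One", "This"), ("One", "That"), ("One", "There"),
                  ("Two", "That")])

# All items that belong to a single ID.
items_of_one = [row[0] for row in conn.execute(
    "SELECT item FROM table_2 WHERE id = ? ORDER BY item", ("One",))]

# Items that two IDs have in common: join table_2 to itself on item.
shared = [row[0] for row in conn.execute("""
    SELECT a.item
    FROM table_2 AS a
    JOIN table_2 AS b ON a.item = b.item
    WHERE a.id = ? AND b.id = ?
    ORDER BY a.item""", ("One", "Two"))]
```

The self-join at the end is one way to match the items of one ID against another, which is awkward to do with Item 1 ... Item 10 columns.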
You're talking about a denormalized table, which SQL databases have a difficult time dealing with. Your Item field has a many-to-one relationship to the other fields. The correct thing to do is to make two tables. The typical example is albums and songs. Songs have a many-to-one relationship to albums, so you could structure your tables like this:
Table Album
album_id [Primary Key]
Title
Artist
Table Song
song_id [Primary Key]
album_id [Foreign Key album.album_id]
Title
Often this example is given with a third table Artist, and you could substitute the Artist field for an artist_id field which is a Foreign Key to an Artist table's artist_id.
Of course, in reality songs, albums, and artists are more complex. One song can be on multiple albums, multiple artists can be on one album, there are multiple versions of the same song, and there are even some songs which have no album release at all.
Example:
Album
album_id  Title  Artist
1         White  Beatles
2         Black  Metallica
Song
song_id  album_id  Title
1        2         Enter Sandman
2        1         Back in the USSR
3        2         Sad but True
4        2         Nothing Else Matters
5        1         Helter Skelter
To query this you just do a JOIN:
SELECT * FROM Album INNER JOIN Song ON Album.album_id = Song.album_id
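If you want to try this out, the example above can be sketched as a runnable script. I'm using Python's built-in sqlite3 here only because it needs no server; the column types and the NOT NULL constraints are my assumptions, and the "Orphan" row exists only to show what the foreign key buys you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE Album (
        album_id INTEGER PRIMARY KEY,
        Title    TEXT NOT NULL,
        Artist   TEXT NOT NULL
    );
    CREATE TABLE Song (
        song_id  INTEGER PRIMARY KEY,
        album_id INTEGER NOT NULL REFERENCES Album(album_id),
        Title    TEXT NOT NULL
    );
    INSERT INTO Album VALUES (1, 'White', 'Beatles'), (2, 'Black', 'Metallica');
    INSERT INTO Song VALUES
        (1, 2, 'Enter Sandman'),
        (2, 1, 'Back in the USSR'),
        (3, 2, 'Sad but True'),
        (4, 2, 'Nothing Else Matters'),
        (5, 1, 'Helter Skelter');
""")

# The JOIN from above, narrowed to one artist.
beatles_songs = conn.execute("""
    SELECT Album.Title, Album.Artist, Song.Title
    FROM Album
    INNER JOIN Song ON Album.album_id = Song.album_id
    WHERE Album.Artist = ?
    ORDER BY Song.song_id""", ("Beatles",)).fetchall()

# The foreign key refuses a song that points at a non-existent album.
try:
    conn.execute("INSERT INTO Song VALUES (6, 99, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```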
I don't think one table really makes sense in this case. Instead you can do:
Main Table:
ID
Name
Item Table:
ID
Item #
Item Value
Main_ID = Main Table.ID
Then when you do queries you can do a simple join
I will create 3 tables in mysql:
Movies: id-name-country
Tv-Series: id-name-country
Artists: id-name-country
Instead of entering country information into these tables separately, I am planning to create another table:
Countries: id-country
And I will make my first three tables take their country data from the Countries table. (That way, if the name of a country is misspelled, it only has to be corrected in one place, and the other tables will pick up the change automatically.)
Can I do this with "foreign keys"?
Is this the correct approach?
Your approach so far is correct, but ONLY IF by "country" in Movies, Tv-Series and Artists you mean the country ID and NOT the country name. And yes, you can use foreign keys (the country ID in each of those tables is a foreign key referencing Countries).
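A small sketch of the country lookup, assuming a country_id column in each table (that column name, the sample rows and the deliberate misspelling are all made up for illustration). It runs on Python's built-in sqlite3 rather than MySQL, so treat the DDL as approximate. Note that nothing is really "updated" in the other tables: they only store the ID, and the corrected name shows up because it is looked up through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE countries (
        id      INTEGER PRIMARY KEY,
        country TEXT NOT NULL UNIQUE
    );
    CREATE TABLE movies (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES countries(id)
    );
    CREATE TABLE artists (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES countries(id)
    );
    INSERT INTO countries VALUES (1, 'Frnace');  -- deliberately misspelled
    INSERT INTO movies  VALUES (1, 'Movie A', 1), (2, 'Movie B', 1);
    INSERT INTO artists VALUES (1, 'Artist A', 1);
""")

# Fix the spelling once, in the only place the name is stored.
conn.execute("UPDATE countries SET country = 'France' WHERE id = 1")

movie_countries = conn.execute("""
    SELECT m.name, c.country
    FROM movies AS m
    JOIN countries AS c ON c.id = m.country_id
    ORDER BY m.id""").fetchall()

artist_countries = conn.execute("""
    SELECT a.name, c.country
    FROM artists AS a
    JOIN countries AS c ON c.id = a.country_id
    ORDER BY a.id""").fetchall()
```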
Edit:
Side note: looking at your edit, I feel obliged to point out that if you are planning to link Movie/TV-Show with Artist, you need a 4th table to maintain the normalization you've got so far.
Edit2:
The usual way to decide whether you need tables is to check what kind of connection 2 tables or values have.
If it's 1 to many (like artist to country of origin), you are fine.
If you have Many to many, like Movie with Artist where 1 artist can be in multiple movies and 1 movie can have multiple artists you need a linking table.
If you have a 1 to 1 relation (like customer_ID and passport details in a banking system, where they could be stored separately in Customer and Passport tables, but joining them makes more sense because a bank only holds details of 1 valid passport for each customer, and 1 passport can only be used by 1 person), you can merge the tables (at the risk of not meeting third normal form).
I am designing a database application for an award. It has a 75 year history and numerous categories that have changed over time. Right now, the design I am thinking of has two kinds of tables:
entities
people
publishers
categories
novel
movie
author
artist
and such like. Each category has data particular to that category, for example:
NOVEL
title varchar(1024)
author int #FK into people table ID
publisher int #FK into publisher table ID
year year(4)
winner bool
or
ARTIST
name int
year year(4)
winner bool
So far so good. However, there are 38 (!) of these categories that have existed over time (some do not exist anymore) and I really can't imagine doing a query for say, all of the winners from 1963 by doing:
SELECT * from table1,table2,...,table38 WHERE year=1963 and winner=TRUE;
These tables will never be that large (each category usually has at most five nominees, so even after 100 years there would be at most 500 rows per table, and a lot fewer for the early ones that weren't continued). So this isn't a performance question. It is just that the query feels very, very wrong to me, if only because every query will have to be changed every time a new category is created or an old one removed. That happens every few years or so.
The questions then are:
is this query evidence that I've designed this wrong?
if not, is there a better way to do that query?
I keep thinking there must be some way to create a lookup table which pulls from other tables, but I could be misremembering. Is there some way of doing such a thing?
Many thanks,
Glenn
You could do that with 3 tables.
First one is entities. It contains data about all publishers/artist/etc.
entities
name varchar(1024)
publisher bool
Second is data where all data from all categories is stored.
data
title varchar(1024)
author/name int #FK into people table ID
publisher int #FK into publisher table ID
year year(4)
winner bool
category int #FK into category table ID
Third is category in which you can find all categories names with their IDs.
category
ID int
name varchar(1024)
Now you have to join only three tables.
select * from entities e, data d, category c where d.name=e.name and d.category=c.id and d.winner=TRUE and d.year=1963
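Here is a rough, runnable variation on the same idea, not a drop-in copy of the tables above: one nomination table for every category, with the category stored as a foreign key. The table and column names (nomination, category_id, person_id), the 0/1 winner flag and all sample rows are my assumptions, and it runs on Python's built-in sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE category (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE people (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE nomination (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES category(id),
        person_id   INTEGER NOT NULL REFERENCES people(id),
        title       TEXT,                       -- NULL where a category has no title
        year        INTEGER NOT NULL,
        winner      INTEGER NOT NULL DEFAULT 0  -- 0/1 stands in for bool
    );
    INSERT INTO category VALUES (1, 'novel'), (2, 'artist');
    INSERT INTO people VALUES (1, 'Person A'), (2, 'Person B'), (3, 'Person C');
    INSERT INTO nomination (category_id, person_id, title, year, winner) VALUES
        (1, 1, 'Title X', 1963, 1),
        (1, 2, 'Title Y', 1963, 0),
        (2, 3, NULL,      1963, 1),
        (2, 3, NULL,      1964, 1);
""")

def winners(year):
    """Return (category, person, title) for every winner of the given year."""
    return conn.execute("""
        SELECT c.name, p.name, n.title
        FROM nomination AS n
        JOIN category AS c ON c.id = n.category_id
        JOIN people   AS p ON p.id = n.person_id
        WHERE n.year = ? AND n.winner = 1
        ORDER BY c.name""", (year,)).fetchall()

winners_1963 = winners(1963)

# A new category is just new rows; the query itself does not change.
conn.execute("INSERT INTO category VALUES (3, 'movie')")
conn.execute(
    "INSERT INTO nomination (category_id, person_id, title, year, winner) "
    "VALUES (3, 2, 'Title Z', 1963, 1)")
winners_1963_after = winners(1963)
```

The trade-off is that columns which only make sense for some categories (title, publisher and so on) have to be nullable.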
You would do better to have a table for categories, either as key/value pairs or as a normal category table, and save only the category row's id in the other tables:
for example,
Table: Category
columns: id, name, slug, status, active_since, inactive_since etc...
In slug, you can keep a slugified form of the category name, which is convenient for queries and URLs: for example, the Industry Innovations category would be saved as industry-innovations.
In status, keep 0 or 1 to show whether the category is active now. You can also keep the dates when it became active and inactive in the active_since and inactive_since fields.
When you search, you can, for example, restrict the search to categories with status 1. I don't think your problem is complex; joining tables like this is very simple for MySQL.
There are projects where dozens of tables are joined and it is ok.
Let's say we have a table with these records of tags:
Category ID
apples 1
orange 2
And then we have another table with a row
Data catID
... 1
With this setup we can retrieve this row only in apples page, what is the proper way to assign both apples & orange to that row? Would I need to change catID field from integer to varchar and just add the second id so the value will be 1,2 and then edit the query to something like:
select * from table where catID LIKE '%1%'
select * from table where catID LIKE '%2%'
instead of
select * from table where catID='1'
select * from table where catID='2'
I'm not sure if this is the proper way? Could someone tell how you do it? Basically, I don't want to duplicate the whole row, just to add another id to it.
As others have already suggested, many-to-many relationship is represented in the physical model by a junction table. I'll do the leg work and illustrate that for you:
The CATEGORY_ITEM is the junction table. It has a composite PK consisting of FKs migrated from the other two tables. Example data...
CATEGORY:
CATEGORY_ID CATEGORY
----------- --------
1 Apple
2 Orange
ITEM:
ITEM_ID NAME
------- ----
1 Foo
2 Bar
CATEGORY_ITEM:
CATEGORY_ID ITEM_ID
----------- -------
1 1
2 1
1 2
The above means: "Foo is both Apple and Orange, Bar is only Apple".
The PK ensures any given combination of category and item cannot exist more than once. The category is either connected to the item or it isn't - it cannot be connected multiple times.
Since you primarily want to search for items of given category, the order of fields in the PK is {CATEGORY_ID, ITEM_ID} so the underlying index can satisfy that query. The exact explanation why is beyond this scope - if you are interested I warmly recommend reading Use The Index, Luke!.
And since InnoDB uses clustering, this will also store items belonging to the same category physically close together, which may be rather beneficial for I/O of the query above.
(If you wanted to query for categories of the given item, you'd need to flip the order of fields in the index.)
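For anyone who wants to run it, the junction-table example above can be sketched like this. It uses Python's built-in sqlite3 instead of MySQL/InnoDB, so the remarks about clustering do not carry over as-is; the column types and the items_in helper are my own additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        category    TEXT NOT NULL
    );
    CREATE TABLE item (
        item_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE category_item (
        category_id INTEGER NOT NULL REFERENCES category(category_id),
        item_id     INTEGER NOT NULL REFERENCES item(item_id),
        PRIMARY KEY (category_id, item_id)  -- category first: "items of a category"
    );
    INSERT INTO category VALUES (1, 'Apple'), (2, 'Orange');
    INSERT INTO item VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO category_item VALUES (1, 1), (2, 1), (1, 2);
""")

def items_in(category_name):
    """Names of all items linked to the given category (hypothetical helper)."""
    return [row[0] for row in conn.execute("""
        SELECT i.name
        FROM category AS c
        JOIN category_item AS ci ON ci.category_id = c.category_id
        JOIN item AS i ON i.item_id = ci.item_id
        WHERE c.category = ?
        ORDER BY i.name""", (category_name,))]

apple_items = items_in("Apple")
orange_items = items_in("Orange")

# The composite PK rejects connecting the same pair a second time.
try:
    conn.execute("INSERT INTO category_item VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```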
Have you realized that two ids indexing one row is a typical application of bidirectional relationship management in a real project? We need a smarter solution in DB rather than the two rows/junction table solution. In MongoDB, you could make "low_id:hight_id" as field "_id" and field "uids_low_high", and indexing the "uids_low_high" for "$in:[$id]" search.
I have a categories table with id, parent and name fields. The parent field allows a category to be a subcategory of another category.
Example categories table where there are two main categories (WIDGETS and THINGAMABOBS), and WIDGETS have 3 subcategories:
id 1, parent null, name "WIDGETS"
id 2, parent 1, name "GADGETS"
id 3, parent 1, name "DOOHICKEYS"
id 4, parent 1, name "GIZMOS"
id 5, parent null, name "THINGAMABOBS"
I have a products table with category field
Example products record where product is linked to the "GIZMOS" category:
id 1, category 4, name Contraption 5000
I want to be able to supply a category name in a SELECT statement and get back all products that are in that category. But not only do I want to find the above record by "GIZMOS", I also want to be able to find it by the name "WIDGETS", because GIZMOS is a child of WIDGETS. This should work across an unlimited number of levels (i.e. sub-sub-sub categories).
To make this even more complicated, I want to be able to assign a product to more than one category. Perhaps they would be separated by commas? i.e.: If I wanted the Contraption 5000 to exist in the Doohickeys and Thingamabobs categories, I would put 3,5 in the category field.
Is what I'm asking possible with a single select statement?
I'm going to start at the end of your question:
To make this even more complicated, I want to be able to assign a product to more than one category. Perhaps they would be separated by commas? i.e.: If I wanted the Contraption 5000 to exist in the Doohickeys and Thingamabobs categories, I would put 3,5 in the category field.
By doing this you are creating a many-to-many relationship, in which case you'll need a third table called products_categories that holds two columns: product_id and category_id; you'd remove the category column from your products table.
If you wanted a product with id=1 to belong to categories 3 and 5, you'd create two rows in products_categories:
product_id | category_id
------------------------
1 | 3
1 | 5
Now to the first part of your question...
What you'd need to do is create a recursive query, which I know can be done in SQL Server but honestly I'm not sure can be done in MySQL. If it can be, I'm sure someone else will come up with an appropriate answer for you.
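For what it's worth, MySQL 8.0 and later do support recursive common table expressions (WITH RECURSIVE), so a single statement is possible there. Below is a minimal sketch of such a query, run on Python's built-in sqlite3 because it needs no server; the syntax should carry over, but I have not tested it on MySQL. It combines the products_categories table from the first half of this answer with the categories from the question; the products_in helper and the subtree CTE name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE categories (
        id     INTEGER PRIMARY KEY,
        parent INTEGER REFERENCES categories(id),
        name   TEXT NOT NULL
    );
    CREATE TABLE products (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE products_categories (
        product_id  INTEGER NOT NULL REFERENCES products(id),
        category_id INTEGER NOT NULL REFERENCES categories(id),
        PRIMARY KEY (product_id, category_id)
    );
    INSERT INTO categories VALUES
        (1, NULL, 'WIDGETS'),
        (2, 1,    'GADGETS'),
        (3, 1,    'DOOHICKEYS'),
        (4, 1,    'GIZMOS'),
        (5, NULL, 'THINGAMABOBS');
    INSERT INTO products VALUES (1, 'Contraption 5000');
    INSERT INTO products_categories VALUES (1, 4);  -- linked to GIZMOS
""")

# Start from the named category, then keep adding children of what was found.
PRODUCTS_IN_CATEGORY = """
    WITH RECURSIVE subtree(id) AS (
        SELECT id FROM categories WHERE name = ?
        UNION ALL
        SELECT c.id
        FROM categories AS c
        JOIN subtree AS s ON c.parent = s.id
    )
    SELECT DISTINCT p.name
    FROM products AS p
    JOIN products_categories AS pc ON pc.product_id = p.id
    JOIN subtree AS s ON s.id = pc.category_id
    ORDER BY p.name
"""

def products_in(category_name):
    """Products in the category or in any of its descendants."""
    return [row[0] for row in conn.execute(PRODUCTS_IN_CATEGORY, (category_name,))]

in_gizmos = products_in("GIZMOS")
in_widgets = products_in("WIDGETS")
in_thingamabobs = products_in("THINGAMABOBS")
```

This assumes the parent links contain no cycles; with a cycle the recursion would not terminate on its own.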
Do this in your application code! It will be much simpler and easier to maintain.
See also this similar post (actually there are many posts on this topic)
I have a question about normalization.
Suppose I have an application dealing with songs.
First I thought about doing like this:
Songs Table:
id | song_title | album_id | publisher_id | artist_id
Albums Table:
id | album_title | etc...
Publishers Table:
id | publisher_name | etc...
Artists Table:
id | artist_name | etc...
Then I thought about normalization, and I figured I should get rid of album_id, publisher_id, and artist_id in the songs table and put them in intermediate tables like this:
Table song_album:
song_id, album_id
Table song_publisher
song_id, publisher_id
Table song_artist
song_id, artist_id
Now I can't decide which is the better way. I'm not an expert on database design, so if someone could point me in the right direction, it would be awesome.
Are there any performance issues between two approaches?
Thanks
Forget about performance issues. The question is: does this model represent the data correctly?
The intermediate tables are called "junction tables" and they are useful when you can have a many-to-many relationship. For example, if you store the song "We Are the World" in your database, then you are going to have many artists for that song. Each of those artists is also responsible for creating many other songs. Therefore, to represent the data correctly, you will have to use junction tables, just as you did in the second version.
That depends. If you can guarantee that a particular song always belongs to one single album, go for your first approach. If not, you have an n-to-n relationship and need a join table: that is your second approach. Both are completely ok in terms of normalization.
It is important that you design your database in a way you can map your data to it.
Don't worry about performance here. Performance depends more on how well your indexes are optimized and what your queries look like than on whether you have to do one more join (your second approach, the join table, would need one more join in every query).
The first structure is mixing up the semantics (e.g. writing the publisher name for each single song). The second structure will allow you to put invalid data in the database (e.g. one song can belong to two albums). Here is what I understood from the problem domain and my suggestions for the design:
One album is published by only one publisher, so you don't need to specify the publisher in every single song; you just need to put the publisher_ID in the Albums table. Also, if you keep the artist_ID in the Songs table, each of your songs can have only one artist at a time; but by putting the song_ID and artist_ID in a linkage table you can have multiple artists for one song (like when 2 singers sing one song together).
Also, for table names it is commonly advised to use the singular form.
Here is my suggested design:
Song Table:
id | song_title | album_id | ...
Album Table:
id | album_title | publisher_id | ...
Publisher Table:
id | publisher_name | ...
Artist Table:
id | artist_name | ...
Song_Artist Table:
song_id | artist_id | artist_role | ...
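A quick sketch of how this design could be created and queried, using Python's built-in sqlite3 as a stand-in for your actual database. The column types, the composite key on Song_Artist and every sample row ("Duet", "Artist X" and so on) are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Publisher (
        id             INTEGER PRIMARY KEY,
        publisher_name TEXT NOT NULL
    );
    CREATE TABLE Album (
        id           INTEGER PRIMARY KEY,
        album_title  TEXT NOT NULL,
        publisher_id INTEGER NOT NULL REFERENCES Publisher(id)
    );
    CREATE TABLE Song (
        id         INTEGER PRIMARY KEY,
        song_title TEXT NOT NULL,
        album_id   INTEGER NOT NULL REFERENCES Album(id)
    );
    CREATE TABLE Artist (
        id          INTEGER PRIMARY KEY,
        artist_name TEXT NOT NULL
    );
    CREATE TABLE Song_Artist (
        song_id     INTEGER NOT NULL REFERENCES Song(id),
        artist_id   INTEGER NOT NULL REFERENCES Artist(id),
        artist_role TEXT NOT NULL,
        PRIMARY KEY (song_id, artist_id)
    );
    INSERT INTO Publisher VALUES (1, 'Publisher P');
    INSERT INTO Album VALUES (1, 'Album A', 1);
    INSERT INTO Song VALUES (1, 'Duet', 1), (2, 'Solo', 1);
    INSERT INTO Artist VALUES (1, 'Artist X'), (2, 'Artist Y');
    INSERT INTO Song_Artist VALUES
        (1, 1, 'vocals'),
        (1, 2, 'vocals'),
        (2, 1, 'vocals');
""")

# One row per (song, artist); the publisher is reached through the album.
credits = conn.execute("""
    SELECT s.song_title, a.artist_name, sa.artist_role, p.publisher_name
    FROM Song AS s
    JOIN Album       AS al ON al.id = s.album_id
    JOIN Publisher   AS p  ON p.id = al.publisher_id
    JOIN Song_Artist AS sa ON sa.song_id = s.id
    JOIN Artist      AS a  ON a.id = sa.artist_id
    ORDER BY s.id, a.id""").fetchall()
```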
Songs can appear on multiple albums. Think of a greatest hits release. It's important to zoom out of the technical muck and consider the real world use of an application (or database).
I'd stick with the first one, for two reasons:
A song is only associated with one album, one publisher and one artist, so you don't need to create separate tables for them (if, for example, a song can have more than one artist, then create the song_artist table).
It's more efficient. With the second approach you'll need to make some joins.