Should a boolean field or a separate table be used? - mysql

In order to learn MySQL I'm building a music CD database, which is pretty complex but so far I'm doing rather well. I have set up, among others, a table with albums, another with artists and an album_artists one which links album_id's with artist_id's. But in an album with various artists, usually one, or some, of them are the main artists, so when making a query I shouldn't order them by alphabetical or id order, but by primary or secondary. Question is:
Should I make a separate table of secondary_artists, identical to the original album_artists one, or make a boolean isPrimary field in the album_artists table? Are both ways acceptable?

Many bibliographic / discographic schemes for recording multiple creators assign an ordinal number to each contributor. So, instead of a flag indicating "primary", your album_artist table could contain
album_id artist_id artist_order
So if
"Daylight Again" had album_id 314,
David Crosby had artist_id 87,
Stephen Stills had artist_id 33,
Graham Nash had artist_id 50,
your album_artist table would have these rows:
album_id artist_id artist_order
314 87 1
314 33 2
314 50 3
This would give you sufficient information to get the artists in the order mentioned in the work (which is the right order for most catalogs).
Don't put "secondary" artists in a different table.
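A minimal sketch of the ordinal approach, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere. The album and artist table layouts and the UNIQUE constraint on (album_id, artist_order) are assumptions added for illustration; only the album_artist columns come from the answer above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the SQL is kept portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album  (album_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE album_artist (
        album_id     INTEGER NOT NULL REFERENCES album(album_id),
        artist_id    INTEGER NOT NULL REFERENCES artist(artist_id),
        artist_order INTEGER NOT NULL,
        PRIMARY KEY (album_id, artist_id),
        UNIQUE (album_id, artist_order)  -- assumption: no two artists share a position
    );
    INSERT INTO album  VALUES (314, 'Daylight Again');
    INSERT INTO artist VALUES (87, 'David Crosby'), (33, 'Stephen Stills'), (50, 'Graham Nash');
    INSERT INTO album_artist VALUES (314, 87, 1), (314, 33, 2), (314, 50, 3);
""")

# List the artists of one album in the order they are credited.
credits = [row[0] for row in conn.execute("""
    SELECT a.name
    FROM album_artist AS aa
    JOIN artist AS a ON a.artist_id = aa.artist_id
    WHERE aa.album_id = ?
    ORDER BY aa.artist_order
""", (314,))]
print(credits)
```

With this layout, "primary" simply means artist_order = 1, so no separate flag or second table is needed.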

Related

Mysql tables link with each other

I will create 3 tables in mysql:
Movies: id-name-country
Tv-Series: id-name-country
Artists: id-name-country
Instead of entering country information into these tables separately, I am planning to create another table:
Countries: id-country
And I will make my first three tables take country data from the Countries table. (So that, if the name of one country is misspelled, it will be easy to correct it in just one place. Data in other tables will be updated automatically.)
Can I do this with "foreign keys"?
Is this the correct approach?
Your approach so far is correct, ONLY IF by "country" in Movies, Tv-Series and Artists you mean a country ID and NOT the country name itself. And yes, you can use foreign keys: the country ID in each of those tables is a foreign key referencing the id column of Countries.
Edit:
Side note: looking at your edit I feel obliged to point out that if you are planning to link Movie/TV-Show with Artist, you need a 4th table to maintain the normalization you've got so far.
Edit2:
The usual way to decide whether you need tables is to check what kind of connection 2 tables or values have.
If it's 1 to many (like artist to country of origin), you are fine.
If you have Many to many, like Movie with Artist where 1 artist can be in multiple movies and 1 movie can have multiple artists you need a linking table.
If you have a 1 to 1 relation (like customer_ID and passport details in a banking system, where they could be stored separately in Customer and Passport tables, but joining them makes more sense because a bank only holds details of 1 valid passport for each customer and 1 passport can only be used by 1 person), you can merge the tables (at the risk of not meeting the criteria of third normal form).
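The one-to-many case from this thread (many movies or artists pointing at one country) can be sketched as follows. This is an illustration only: it uses Python's sqlite3 in place of MySQL, the column name country_id and the sample rows are assumptions, and Tv-Series is left out because it would look the same as Movies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, country TEXT NOT NULL UNIQUE);
    CREATE TABLE movies (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES countries(id)
    );
    CREATE TABLE artists (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES countries(id)
    );
    INSERT INTO countries VALUES (1, 'Untied States');  -- deliberately misspelled
    INSERT INTO movies    VALUES (1, 'Example Movie', 1);
    INSERT INTO artists   VALUES (1, 'Example Artist', 1);
""")

# Fix the spelling once; every row that references id 1 sees the new name.
conn.execute("UPDATE countries SET country = 'United States' WHERE id = 1")
movie_country = conn.execute("""
    SELECT c.country FROM movies AS m JOIN countries AS c ON c.id = m.country_id
""").fetchone()[0]
artist_country = conn.execute("""
    SELECT c.country FROM artists AS a JOIN countries AS c ON c.id = a.country_id
""").fetchone()[0]

# The foreign key also rejects a country ID that does not exist.
try:
    conn.execute("INSERT INTO movies VALUES (2, 'Orphan Movie', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(movie_country, artist_country, rejected)
```

Nothing is copied into movies or artists when the name changes; the "automatic update" is just the join reading the single corrected row.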

How to correctly structure schema

I'm enrolled in a DBM/BI certificate program (more like a crash course) and I decided to embark on an independent project to implement everything I'm learning in real time. Long story short, I'll be analyzing data (boxofficemojo.com) about the top grossing 130 movies from the last 13 years (using MySQL Server/Workbench). First I'd like to map out a schema and then do some data mining/visualization. Here's how I've split it up so far:
"Movies"
Movie_ID (Primary )
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
"Rating"
Rating_ID (P)
Rating
"Release"
Release_ID (P)
Year
Month
Day
Movie_ID (F)
"Cast"
Director_Gender (P)
Lead_Gender (P)
Director_Name
Director_Name
Movie_ID (F)
"Studio"
Studio_ID (P)
Studio_Name
and these are my relationships so far:
rating to movies - one to many ( many movies can be rated R , a movie can only have 1 rating )
release to movies - one to many ( many movies can be released on the same weekend, a movie can only be released once)
cast to movies - one to many (directors/actors can make many movies, a movie can only have one cast)
studio to movies - many to many (movies can be attached to more than one studio, a studio can make more than one movie)
I know the schema is most likely not 100% correct, so should I include the primary keys from all the other tables as foreign keys in the "movies" table? And how are my relationships?
Thanks in advance.
This is related to the first answer by Leo, but I'll be more specific and add more observations.
First, the Release attributes are functionally dependent on Movie_ID (or Movies in general), so Release should not be a separate entity.
Second, and in relation to the first, you have Year, Month and Day in your Release entity. Why not make it a single Release_Date, which contains the year, month and day anyway?
Then you could make your Release attributes part of your Movies table.
Third, and in relation to the first, why not add a Movie_Title field?
So, all in all, you could have the following schema:
"Movies"
Movie_ID (Primary )
Movie_Title
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
Release_Date
You could easily query movies that were released in a certain year like this:
SELECT Movie_Title, Year(Release_Date) as Release_Year
FROM Movies
WHERE Year(Release_Date) = 2011
Or you could count it also by Year (or by Month)
SELECT Year(Release_Date) as Release_Year, COUNT(*) Number_of_Movies_in_a_Year
FROM Movies
GROUP BY Year(Release_Date)
ORDER BY Year(Release_Date)
Fourth, about your Cast entity you said "Directors/Actors can make many movies, a movie can only have one cast". But your Cast table has a Movie_ID attribute which is a FK (foreign key) to Movies, and that means a movie could have many Cast rows, because the FK is always on the many side. Besides, this entity is almost a violation of 4NF (fourth normal form). So probably the best way to do this is to specialize your Cast table into one row per person with a type, and relate it to the Movies table in a one-to-many relationship, so that one cast member or director can have many movies. It would look like this:
"Cast"
Cast_ID (PK)
Cast_Name
Cast_Gender
Cast_Type (values here could either be Director or Lead or could be simply letters like D or L)
And your Movies table could now be changed to this:
"Movies"
Movie_ID (Primary )
Movie_Title
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
Release_Date
Lead_ID (FK)
Cast_ID (FK)
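One possible reading of the two tables above, sketched with Python's sqlite3 in place of MySQL. The listing names its two foreign keys Lead_ID and Cast_ID; this sketch assumes the second one is meant to point at the director and calls it Director_ID, which is a hypothetical name. The people and movie titles are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "Cast" is quoted because CAST is also an SQL keyword.
    CREATE TABLE "Cast" (
        Cast_ID     INTEGER PRIMARY KEY,
        Cast_Name   TEXT NOT NULL,
        Cast_Gender TEXT,
        Cast_Type   TEXT NOT NULL CHECK (Cast_Type IN ('D', 'L'))  -- D = director, L = lead
    );
    CREATE TABLE Movies (
        Movie_ID    INTEGER PRIMARY KEY,
        Movie_Title TEXT NOT NULL,
        Director_ID INTEGER REFERENCES "Cast"(Cast_ID),  -- assumed column name
        Lead_ID     INTEGER REFERENCES "Cast"(Cast_ID)
    );
    INSERT INTO "Cast" VALUES (1, 'Director One', 'F', 'D'), (2, 'Lead One', 'M', 'L');
    INSERT INTO Movies VALUES (100, 'Movie A', 1, 2), (101, 'Movie B', 1, 2);
""")

# Join Cast twice: once for the director, once for the lead.
movie_credits = conn.execute("""
    SELECT m.Movie_Title, d.Cast_Name, l.Cast_Name
    FROM Movies AS m
    JOIN "Cast" AS d ON d.Cast_ID = m.Director_ID
    JOIN "Cast" AS l ON l.Cast_ID = m.Lead_ID
    ORDER BY m.Movie_Title
""").fetchall()
print(movie_credits)
```

The same person row is reused by both movies, which is the one-to-many direction the answer describes (one cast member, many movies).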
Lastly, you said "movies can be attached to more than one studio, a studio can make more than one movie". A many-to-many relationship is usually implemented with a bridge table between the two entities. So, if you call the bridge table Studio_Movie, it would look like this:
"Studio_Movie"
Studio_ID (PK, FK1)
Movie_ID (PK, FK2)
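A short sketch of the Studio_Movie bridge table, again with sqlite3 standing in for MySQL. The Movies and Studio tables are cut down to the columns needed here, and the sample studios and movies are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (Movie_ID  INTEGER PRIMARY KEY, Movie_Title TEXT NOT NULL);
    CREATE TABLE Studio (Studio_ID INTEGER PRIMARY KEY, Studio_Name TEXT NOT NULL);
    CREATE TABLE Studio_Movie (
        Studio_ID INTEGER NOT NULL REFERENCES Studio(Studio_ID),
        Movie_ID  INTEGER NOT NULL REFERENCES Movies(Movie_ID),
        PRIMARY KEY (Studio_ID, Movie_ID)  -- each pairing is stored once
    );
    INSERT INTO Movies VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO Studio VALUES (10, 'Studio X'), (20, 'Studio Y');
    INSERT INTO Studio_Movie VALUES (10, 1), (20, 1), (10, 2);
""")

# How many studios is each movie attached to?
studios_per_movie = conn.execute("""
    SELECT m.Movie_Title, COUNT(*) AS Number_of_Studios
    FROM Movies AS m
    JOIN Studio_Movie AS sm ON sm.Movie_ID = m.Movie_ID
    GROUP BY m.Movie_ID, m.Movie_Title
    ORDER BY m.Movie_Title
""").fetchall()
print(studios_per_movie)
```

The composite primary key is what lets one movie appear with several studios and one studio with several movies, while still blocking duplicate pairings.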
It looks OK to me.
I just think the "release" entity may be a little bit overkill (what's the use of knowing which movies were released at the same time?), so I think it could just be a set of movie attributes.
Also, your "cast" entity has two directors. Maybe you could normalize that and keep only 1 director (since movie 1<-->N director, it's just a matter of adding relationships).
About FKs, yes, you should add them. Your relationships look fine.
Good luck.

MySql: Store multiple choice data in database

I have a list of checkboxes in my form; the user may choose any of them, or all of them.
Say the user selects the types of sport he is interested in.
I need the best database structure to store this user choice, so that in future I can get all this data back.
I think I can just store each (userID, sport) choice as a new row in a database table. But that confuses me, because the table will grow quickly even with a small number of users.
Any ideas, brothers?
You can set up a many-to-many table such as:
FavoriteSports
------
id user_id sport_id
1 5 20
Where you have:
User
-------
id name
5 Mike
Sport
-----
id name
20 Football
This makes sense because a user has many sports, and a sport has many users.
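As a rough sketch of how the checkbox form could be saved into that structure (sqlite3 standing in for MySQL): the helper save_choices, the UNIQUE constraint, and the second sport 'Tennis' are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Sport (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE FavoriteSports (
        id       INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES User(id),
        sport_id INTEGER NOT NULL REFERENCES Sport(id),
        UNIQUE (user_id, sport_id)  -- assumption: a box can only be ticked once
    );
    INSERT INTO User  VALUES (5, 'Mike');
    INSERT INTO Sport VALUES (20, 'Football'), (21, 'Tennis');
""")

def save_choices(conn, user_id, sport_ids):
    """Replace a user's stored choices with the boxes ticked on the form."""
    with conn:  # commits on success, rolls back if anything fails
        conn.execute("DELETE FROM FavoriteSports WHERE user_id = ?", (user_id,))
        conn.executemany(
            "INSERT INTO FavoriteSports (user_id, sport_id) VALUES (?, ?)",
            [(user_id, sport_id) for sport_id in sport_ids],
        )

save_choices(conn, 5, [20, 21])

# Read the choices back, for example to re-tick the boxes later.
chosen = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM FavoriteSports AS f
    JOIN Sport AS s ON s.id = f.sport_id
    WHERE f.user_id = ?
    ORDER BY s.name
""", (5,))]
print(chosen)
```

One row per (user, sport) pair does grow with the number of ticks, but each row is only a couple of integers, which is the kind of table relational databases handle well.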
Deciding how to do this is called normalizing.
There are multiple ways to do this depending on how normalized you want your data.
The simplest way is what you described.
userID userName sport
Or you can have 2 tables
users
userID userName sportID
sports
sportID sport
Or you can have 3 tables
users
userID userName
sports
sportID sportName
user_sports
userID sportID
Where the user_sports table contains which user likes which sport.
Which method you choose depends on the relationships in your data and how much duplication you expect.
If you are only storing which sport a user has chosen, I would choose the second one. That prevents duplication of sport names but only allows one sport per user. If you want to allow users to choose multiple sports, use the third option.

MySQL Database column having multiple values

I had a question about whether or not my implementation idea is easy to work with/write queries for.
I currently have a database with multiple columns. Most of the columns are the same thing (items, but split into item 1, item 2, item 3 etc).
So I have currently in my database ID, Name, Item 1, Item 2 ..... Item 10.
I want to condense this into ID, Name, Item.
But what I want item to have is to store multiple values as different rows. I.e.
ID = One Name = Hello Item = This
That
There
Kind of like the format it looks like. Is this a good idea and how exactly would I go about doing this? I will be using no numbers in the database and all of the information will be static and will never change.
Can I do this using 1 database table (and would it be easy to match items of one ID to another ID), or would I need to create 2 tables and link them?
If so how exactly would I create 2 tables and make them relational?
Any ideas on how to implement this? Thanks!
This is a classic case of a denormalized database. Denormalization sometimes makes certain operations more efficient, but more often leads to inefficiencies. (For example, if one of your write queries was to change the name associated with an ID, you would have to change many rows instead of a single one.) Denormalization should only be done for specific reasons after a fully normalized database has been designed. In your example, a normalized database design would be:
table_1: ID (key), Name
table_2: ID (foreign key mapped to table_1.ID), Item
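One way the move from Item 1 ... Item 10 columns to the two tables above might look, sketched with sqlite3 in place of MySQL. The source table name wide and its column names Item1 to Item3 are assumptions (only three item columns are shown to keep it short).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed shape of the existing table, cut down to three item columns.
    CREATE TABLE wide (ID INTEGER PRIMARY KEY, Name TEXT, Item1 TEXT, Item2 TEXT, Item3 TEXT);
    INSERT INTO wide VALUES (1, 'Hello', 'This', 'That', 'There');

    CREATE TABLE table_1 (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE table_2 (
        ID   INTEGER NOT NULL REFERENCES table_1(ID),
        Item TEXT NOT NULL
    );

    -- One row per ID/Name, then one row per non-empty item.
    INSERT INTO table_1 SELECT ID, Name FROM wide;
    INSERT INTO table_2
        SELECT ID, Item1 FROM wide WHERE Item1 IS NOT NULL
        UNION ALL
        SELECT ID, Item2 FROM wide WHERE Item2 IS NOT NULL
        UNION ALL
        SELECT ID, Item3 FROM wide WHERE Item3 IS NOT NULL;
""")

# All items that belong to the row named 'Hello'.
items = [row[0] for row in conn.execute("""
    SELECT t2.Item
    FROM table_1 AS t1
    JOIN table_2 AS t2 ON t2.ID = t1.ID
    WHERE t1.Name = ?
    ORDER BY t2.Item
""", ('Hello',))]
print(items)
```

After the move, an ID can have any number of items without adding columns, and matching items between two IDs becomes a join on table_2.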
You're talking about a denormalized table, which SQL databases have a difficult time dealing with. Your Item field has a many-to-one relationship to the other fields. The correct thing to do is to make two tables. The typical example is an album and songs. Songs have a many-to-one relationship to albums, so you could structure your tables like this:
Table Album
album_id [Primary Key]
Title
Artist
Table Song
song_id [Primary Key]
album_id [Foreign Key album.album_id]
Title
Often this example is given with a third table Artist, and you could replace the Artist field with an artist_id field which is a Foreign Key to an Artist table's artist_id.
Of course, in reality songs, albums, and artists are more complex. One song can be on multiple albums, multiple artists can be on one album, there are multiple versions of the same song, and there are even some songs which have no album release at all.
Example:
Album
album_id Title Artist
1 White Beatles
2 Black Metallica
Song
song_id album_id Title
1 2 Enter Sandman
2 1 Back in the USSR
3 2 Sad but True
4 2 Nothing Else Matters
5 1 Helter Skelter
To query this you just do a JOIN:
SELECT * FROM Album INNER JOIN Song ON Album.album_id = Song.album_id
I don't think one table really makes sense in this case. Instead you can do:
Main Table:
ID
Name
Item Table:
ID
Item #
Item Value
Main_ID (foreign key to Main Table.ID)
Then when you do queries you can do a simple join.

mySQL and general database normalization question

I have a question about normalization.
Suppose I have an application dealing with songs.
First I thought about doing it like this:
Songs Table:
id | song_title | album_id | publisher_id | artist_id
Albums Table:
id | album_title | etc...
Publishers Table:
id | publisher_name | etc...
Artists Table:
id | artist_name | etc...
Then, thinking about normalization, I thought I should get rid of album_id, publisher_id, and artist_id in the songs table and put them in intermediate tables like this:
Table song_album:
song_id, album_id
Table song_publisher
song_id, publisher_id
Table song_artist
song_id, artist_id
Now I can't decide which is the better way. I'm not an expert on database design, so if someone could point me in the right direction, it would be awesome.
Are there any performance issues between two approaches?
Thanks
Forget about performance issues. The question is Does this model represent the data correctly?
The intermediate tables are called "junction tables" and they are useful when you can have a many-to-many relationship. For example, if you store the song "We Are the World" in your database, then you are going to have many artists for that song. Each of those artists is also responsible for creating many other songs. Therefore, to represent the data correctly, you will have to use junction tables, just as you did in the second version.
That depends. If you can guarantee that a particular song always belongs to one single album, go for your first approach. If not, you have an n-to-n relationship and need a join table: that is your second approach. Both are completely OK in terms of normalization.
It is important that you design your database in a way you can map your data to it.
Don't worry about performance here. Performance depends more on how well you have optimized your indexes and what your queries look like than on whether you have to do one more join operation (your second approach, the join table, would need one more join in every query).
The first structure is mixing up the semantics (e.g. writing the publisher name for each single song). The second structure will allow you to put invalid data in the database (e.g. one song can belong to two albums). Here is what I understood from the problem domain and my suggestions for the design:
One album is published by only one publisher, so you don't need to specify the publisher in every single song; you just need to put the publisher_ID in the Albums table. Also, if you keep the artist_ID in the Songs table, each of your songs can have only one artist at a time; but by putting the song_ID and artist_ID in a linkage table you can have multiple artists for one song (like when 2 singers sing one song together).
Also for table names it is always advised to use singular form.
Here is my suggested design:
Song Table:
id | song_title | album_id | ...
Album Table:
id | album_title | publisher_id | ...
Publisher Table:
id | publisher_name | ...
Artist Table:
id | artist_name | ...
Song_Artist Table:
song_id | artist_id | artist_role | ...
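A compact sketch of the suggested design, using sqlite3 as a stand-in for MySQL. The elided columns ("...") are simply left out, and the primary key on (song_id, artist_id) plus all sample names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Publisher (id INTEGER PRIMARY KEY, publisher_name TEXT NOT NULL);
    CREATE TABLE Album (
        id           INTEGER PRIMARY KEY,
        album_title  TEXT NOT NULL,
        publisher_id INTEGER NOT NULL REFERENCES Publisher(id)
    );
    CREATE TABLE Song (
        id         INTEGER PRIMARY KEY,
        song_title TEXT NOT NULL,
        album_id   INTEGER NOT NULL REFERENCES Album(id)
    );
    CREATE TABLE Artist (id INTEGER PRIMARY KEY, artist_name TEXT NOT NULL);
    CREATE TABLE Song_Artist (
        song_id     INTEGER NOT NULL REFERENCES Song(id),
        artist_id   INTEGER NOT NULL REFERENCES Artist(id),
        artist_role TEXT NOT NULL,
        PRIMARY KEY (song_id, artist_id)
    );
    INSERT INTO Publisher VALUES (1, 'Publisher P');
    INSERT INTO Album     VALUES (1, 'Album A', 1);
    INSERT INTO Song      VALUES (1, 'Duet D', 1);
    INSERT INTO Artist    VALUES (1, 'Singer One'), (2, 'Singer Two');
    INSERT INTO Song_Artist VALUES (1, 1, 'lead'), (1, 2, 'featured');
""")

# The publisher is reached through the album; the artists through the linkage table.
rows = conn.execute("""
    SELECT s.song_title, p.publisher_name, ar.artist_name, sa.artist_role
    FROM Song AS s
    JOIN Album       AS al ON al.id = s.album_id
    JOIN Publisher   AS p  ON p.id = al.publisher_id
    JOIN Song_Artist AS sa ON sa.song_id = s.id
    JOIN Artist      AS ar ON ar.id = sa.artist_id
    ORDER BY ar.artist_name
""").fetchall()
print(rows)
```

Here the duet gets two Song_Artist rows, while the publisher is stored once on the album instead of once per song.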
Songs can appear on multiple albums. Think of a greatest hits release. It's important to zoom out of the technical muck and consider the real-world use of an application (or database).
I'd stick with the first one, for two reasons:
A song is only associated with one album, one publisher and one artist, so you don't need to create separate tables for them (if, for example, a song can have more than one artist, then create the song_artist table).
It's more efficient. With the second approach you'll need to make some joins.