I'm enrolled in a DBM/BI certificate program (more like a crash course), and I decided to embark on an independent project to apply everything I'm learning in real time. Long story short, I'll be analyzing data from boxofficemojo.com about the 130 top-grossing movies of the last 13 years, using MySQL Server/Workbench. First I'd like to map out a schema and then do some data mining/visualization. Here's how I've split it up so far:
"Movies"
Movie_ID (Primary )
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
"Rating"
Rating_ID (P)
Rating
"Release"
Release_ID (P)
Year
Month
Day
Movie_ID (F)
"Cast"
Director_Gender (P)
Lead_Gender (P)
Director_Name
Director_Name
Movie_ID (F)
"Studio"
Studio_ID (P)
Studio_Name
And these are my relationships so far:
Rating to Movies - one-to-many (many movies can be rated R, a movie can only have one rating)
Release to Movies - one-to-many (many movies can be released on the same weekend, a movie can only be released once)
Cast to Movies - one-to-many (directors/actors can make many movies, a movie can only have one cast)
Studio to Movies - many-to-many (movies can be attached to more than one studio, a studio can make more than one movie)
I know the schema is most likely not 100% correct, so should I include the primary keys from all the other tables as foreign keys in the "Movies" table? And how do my relationships look?
Thanks in advance.
This builds on Leo's answer, but I'll be more specific and add a few more observations.
First, the Release attributes are functionally dependent on Movie_ID (that is, on the movie itself), so Release should not be a separate entity.
Second, and related to the first point, you have Year, Month and Day in your Release entity; why not use a single Release_Date column, which contains the year, month and day anyway?
Then the release date can simply become an attribute of your Movies table.
Third, why not add a Movie_Title field?
So, all in all, you could have the following schema:
"Movies"
Movie_ID (Primary )
Movie_Title
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
Release_Date
You could then easily query movies released in a certain year, like this:
SELECT Movie_Title, Year(Release_Date) as Release_Year
FROM Movies
WHERE Year(Release_Date) = 2011
Or you could count them by year (or by month):
SELECT Year(Release_Date) as Release_Year, COUNT(*) Number_of_Movies_in_a_Year
FROM Movies
GROUP BY Year(Release_Date)
ORDER BY Year(Release_Date)
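A quick way to try both queries without a MySQL server is the sketch below. It is only an illustration under stated assumptions: SQLite (via Python's built-in sqlite3 module) stands in for MySQL, and because SQLite has no YEAR() function, strftime('%Y', ...) cast to an integer plays that role. The sample titles and dates are hypothetical.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL. SQLite has no YEAR() function,
# so CAST(strftime('%Y', Release_Date) AS INTEGER) replaces YEAR(Release_Date).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        Movie_ID     INTEGER PRIMARY KEY,
        Movie_Title  TEXT NOT NULL,
        Dom_Revenue  INTEGER,
        Int_Revenue  INTEGER,
        OpWe_Revenue INTEGER,
        Budget       INTEGER,
        Release_Date TEXT  -- ISO 'YYYY-MM-DD' text; a DATE column in MySQL
    )""")
# Hypothetical sample rows, not real box-office data.
conn.executemany(
    "INSERT INTO Movies (Movie_Title, Release_Date) VALUES (?, ?)",
    [("Movie A", "2011-05-06"), ("Movie B", "2011-07-15"),
     ("Movie C", "2012-05-04")])

def movies_in_year(year):
    # Equivalent of: WHERE Year(Release_Date) = 2011
    return conn.execute(
        "SELECT Movie_Title FROM Movies "
        "WHERE CAST(strftime('%Y', Release_Date) AS INTEGER) = ? "
        "ORDER BY Movie_Title", (year,)).fetchall()

def movies_per_year():
    # Equivalent of the GROUP BY Year(Release_Date) query.
    return conn.execute(
        "SELECT CAST(strftime('%Y', Release_Date) AS INTEGER) AS Release_Year, "
        "COUNT(*) AS Number_of_Movies_in_a_Year "
        "FROM Movies GROUP BY Release_Year ORDER BY Release_Year").fetchall()

print(movies_in_year(2011))
print(movies_per_year())
```

Storing one date column instead of separate Year/Month/Day fields means the database validates the date as a whole, while the year or month can still be extracted whenever a query needs it.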
Fourth, about your Cast entity you said "directors/actors can make many movies, a movie can only have one cast". But your Cast table has a Movie_ID attribute that is a FK (foreign key) to Movies, which actually means a movie could have many Cast rows, because the FK is always on the "many" side. Besides, this entity comes close to violating 4NF (Fourth Normal Form). So probably the best way to do this is to specialize your Cast table, with one row per person and a type column for the role, and relate it to the Movies table so that there is a one-to-many relationship: a cast member or director can have many movies. It would look like this:
"Cast"
Cast_ID (PK)
Cast_Name
Cast_Gender
Cast_Type (values here could be either Director or Lead, or simply letters like D or L)
And your Movies table could then be changed to this:
"Movies"
Movie_ID (Primary )
Movie_Title
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
Release_Date
Lead_ID (FK to Cast, for the lead)
Cast_ID (FK to Cast, for the director)
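Because both foreign keys in Movies point at the same Cast table, a query that shows the lead and the director side by side has to join Cast twice under different aliases. Here is a minimal sketch of that, assuming SQLite (through Python's sqlite3) as a stand-in for MySQL; the director's FK is the one listed as Cast_ID above, the table name is quoted because CAST is an SQL keyword, and all names and rows are hypothetical.

```python
import sqlite3

# Sketch with SQLite standing in for MySQL. Cast has one row per person;
# Movies references it twice: Lead_ID (the lead) and Cast_ID (the director).
SCHEMA = """
CREATE TABLE "Cast" (
    Cast_ID     INTEGER PRIMARY KEY,
    Cast_Name   TEXT NOT NULL,
    Cast_Gender TEXT,
    Cast_Type   TEXT CHECK (Cast_Type IN ('D', 'L'))  -- Director or Lead
);
CREATE TABLE Movies (
    Movie_ID    INTEGER PRIMARY KEY,
    Movie_Title TEXT NOT NULL,
    Lead_ID     INTEGER REFERENCES "Cast"(Cast_ID),
    Cast_ID     INTEGER REFERENCES "Cast"(Cast_ID)  -- the director
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical people and movies.
conn.executemany('INSERT INTO "Cast" VALUES (?, ?, ?, ?)',
                 [(1, "Dir One", "F", "D"), (2, "Lead One", "M", "L"),
                  (3, "Lead Two", "F", "L")])
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)",
                 [(1, "Movie A", 2, 1), (2, "Movie B", 3, 1)])

def lead_and_director():
    # Join Cast once per role, using a different alias each time.
    return conn.execute("""
        SELECT m.Movie_Title, l.Cast_Name AS Lead_Name, d.Cast_Name AS Director_Name
        FROM Movies m
        JOIN "Cast" l ON l.Cast_ID = m.Lead_ID
        JOIN "Cast" d ON d.Cast_ID = m.Cast_ID
        ORDER BY m.Movie_Title""").fetchall()

print(lead_and_director())
```

The same director row (Dir One) serves both movies, which is the one-to-many relationship from Cast to Movies described above.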
Lastly, you said "movies can be attached to more than one studio, a studio can make more than one movie". A many-to-many relationship is usually implemented with a bridge (junction) table between the two entities. So, with a Studio_Movie bridge table, you would have this:
"Studio_Movie"
Studio_ID (PK, FK1)
Movie_ID (PK, FK2)
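To see how the bridge table is used, here is a small sketch, again with SQLite (Python's sqlite3) standing in for MySQL. The composite primary key (Studio_ID, Movie_ID) stops the same pairing from being stored twice, and queries reach from one side to the other by joining through Studio_Movie. The studio and movie rows are hypothetical.

```python
import sqlite3

# Sketch with SQLite standing in for MySQL: the bridge table's primary key
# is the (Studio_ID, Movie_ID) pair, and each column is also a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Studio (Studio_ID INTEGER PRIMARY KEY, Studio_Name TEXT NOT NULL);
CREATE TABLE Movies (Movie_ID INTEGER PRIMARY KEY, Movie_Title TEXT NOT NULL);
CREATE TABLE Studio_Movie (
    Studio_ID INTEGER NOT NULL REFERENCES Studio(Studio_ID),
    Movie_ID  INTEGER NOT NULL REFERENCES Movies(Movie_ID),
    PRIMARY KEY (Studio_ID, Movie_ID)
);
-- Hypothetical data: Movie A is a co-production of two studios.
INSERT INTO Studio VALUES (1, 'Studio X'), (2, 'Studio Y');
INSERT INTO Movies VALUES (1, 'Movie A'), (2, 'Movie B');
INSERT INTO Studio_Movie VALUES (1, 1), (2, 1), (1, 2);
""")

def studios_for(movie_title):
    # Movies -> Studio_Movie -> Studio
    return [name for (name,) in conn.execute("""
        SELECT s.Studio_Name
        FROM Movies m
        JOIN Studio_Movie sm ON sm.Movie_ID = m.Movie_ID
        JOIN Studio s ON s.Studio_ID = sm.Studio_ID
        WHERE m.Movie_Title = ?
        ORDER BY s.Studio_Name""", (movie_title,))]

def movie_count_per_studio():
    # LEFT JOIN keeps studios that have no movies yet (count 0).
    return conn.execute("""
        SELECT s.Studio_Name, COUNT(sm.Movie_ID)
        FROM Studio s
        LEFT JOIN Studio_Movie sm ON sm.Studio_ID = s.Studio_ID
        GROUP BY s.Studio_ID, s.Studio_Name
        ORDER BY s.Studio_Name""").fetchall()

print(studios_for("Movie A"))
print(movie_count_per_studio())
```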
It looks OK to me.
I just think the "Release" entity may be a little bit overkill (what's the use of knowing which movies were released at the same time?), so it could just be a set of movie attributes.
Also, your "Cast" entity has two directors (Director_Name appears twice). Maybe you could normalize that and keep only one director per row (since a movie can have N directors, it's just a matter of adding relationships).
About FKs, yes, you should add them. Your relationships look fine.
Good luck.
Related
I have the following two schemas:
Movies[title, year, director, country, rating, genre, gross, producer]
and
Actors[title, year, characterName, actor]
Now I have the following exercise:
Find character names that appeared in two movies produced in different countries.
My idea was the following, which doesn't really work:
SELECT characterName
FROM Actors a
JOIN Movies m
ON a.title=m.title
AND a.year=m.year
WHERE COUNT(m.title)=2
AND COUNT(DISTINCT(m.country)=2
GROUP BY m.title;
My idea was obviously to select the characterName and join both tables on title and year, because in combination they are unique. Then my plan was to get the unique movies (by grouping them) and find the ones with a count of 2, since we want two movies. I hope I'm right so far.
Now I'm stuck, because I don't really know how to check whether the two movies were produced in different countries.
You are on the right track. Here is a fixed version of your original query that should get you the results you expect:
select a.characterName
from actors a
inner join movies m
on m.title = a.title
and m.year = a.year
group by a.characterName
having count(distinct m.country) >= 2
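The fix works because aggregate conditions such as COUNT(DISTINCT ...) belong in HAVING, which filters whole groups after GROUP BY, rather than in WHERE, which filters individual rows. The sketch below checks the corrected query on a few hypothetical rows, with SQLite (Python's sqlite3) standing in for MySQL and the movies table trimmed to the columns the query needs.

```python
import sqlite3

# Sketch with SQLite standing in for MySQL; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (title TEXT, year INTEGER, country TEXT);
CREATE TABLE actors (title TEXT, year INTEGER, characterName TEXT, actor TEXT);
INSERT INTO movies VALUES ('Film 1', 2000, 'USA'), ('Film 2', 2005, 'UK'),
                          ('Film 3', 2010, 'USA');
-- Sherlock appears in a US and a UK film; Watson in two US films.
INSERT INTO actors VALUES ('Film 1', 2000, 'Sherlock', 'Actor A'),
                          ('Film 2', 2005, 'Sherlock', 'Actor B'),
                          ('Film 1', 2000, 'Watson', 'Actor C'),
                          ('Film 3', 2010, 'Watson', 'Actor D');
""")

def characters_in_different_countries():
    # One group per character; HAVING keeps groups spanning 2+ countries.
    return [name for (name,) in conn.execute("""
        SELECT a.characterName
        FROM actors a
        INNER JOIN movies m ON m.title = a.title AND m.year = a.year
        GROUP BY a.characterName
        HAVING COUNT(DISTINCT m.country) >= 2
        ORDER BY a.characterName""")]

print(characters_in_different_countries())
```

Watson is excluded even though he appears in two movies, because both were produced in the same country.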
Notes on your design:
It seems like you are using (title, year) as the primary key for the movies table. This does not look like a good design (what if two movies with the same title are produced in the same year?). You would be better off with an identity column (in MySQL, an AUTO_INCREMENT primary key) that you would reference as a foreign key in the actors table.
Better yet, you would probably want a separate table to store the master data of the actors, and a junction table, say appearances, that records which actor played which character in which movie.
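Those two suggestions can be sketched as below. This is one possible layout under assumptions: SQLite (Python's sqlite3) stands in for MySQL, where INTEGER PRIMARY KEY auto-assigns ids much like AUTO_INCREMENT does, and the table names movie, actor and appearances and all rows are hypothetical.

```python
import sqlite3

# Sketch of the suggested redesign, SQLite standing in for MySQL.
# Table names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER, country TEXT);
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Junction table: which actor played which character in which movie.
CREATE TABLE appearances (
    movie_id INTEGER NOT NULL REFERENCES movie(id),
    actor_id INTEGER NOT NULL REFERENCES actor(id),
    characterName TEXT NOT NULL,
    PRIMARY KEY (movie_id, actor_id, characterName)
);
""")
# With a surrogate key, two movies may share the same title and year.
m1 = conn.execute("INSERT INTO movie (title, year, country) "
                  "VALUES ('Twins', 1999, 'USA')").lastrowid
m2 = conn.execute("INSERT INTO movie (title, year, country) "
                  "VALUES ('Twins', 1999, 'France')").lastrowid
a1 = conn.execute("INSERT INTO actor (name) VALUES ('Actor A')").lastrowid
conn.execute("INSERT INTO appearances VALUES (?, ?, 'Hero')", (m1, a1))
conn.execute("INSERT INTO appearances VALUES (?, ?, 'Hero')", (m2, a1))

def countries_for_character(character):
    return sorted(c for (c,) in conn.execute("""
        SELECT DISTINCT m.country
        FROM appearances ap
        JOIN movie m ON m.id = ap.movie_id
        WHERE ap.characterName = ?""", (character,)))

print(m1 != m2, countries_for_character("Hero"))
```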
I'm very new to SQL, so please bear with me.
I've built a movie database and I'm trying to query it so that all my tables display properly.
I have a movies table with the columns movieID, title, releaseYear, directorID, genreID, and actorID.
Inside the table director, I have directorID and Director.
Using the query SELECT * FROM movies INNER JOIN director ON director.directorID = movies.directorID;, I'm able to get everything in the movies and director tables to display (which isn't exactly what I want, but it's on the right track).
My remaining tables are actor (with actorID and the actors' names), starring (with starringID, movieID, and actorID), genre (with genreID and 22 different genres), and moviegenres (with moviegenresID, moviesID, and genreID).
I'm a bit lost, and I apologize if this is confusing and messy, but I think I need to query the database so that all the tables show their data, each value associated with the correct column. For example, most movies have multiple genres and actors, which is why I separated them into tables of their own.
I can't figure out how to query everything to display properly in the result grid.
Thanks in advance
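One common way to get one row per movie is to join through the starring and moviegenres junction tables and collapse the many actors and genres into a single string per movie with GROUP_CONCAT, which both MySQL and SQLite provide. The sketch below is only one possible approach under assumptions: SQLite (Python's sqlite3) stands in for MySQL, the name columns actor.actorName and genre.genre are hypothetical guesses (the question does not name them), and the actorID/genreID columns of movies are left out because the junction tables already carry that information. DISTINCT matters because joining both junction tables multiplies rows (two actors times two genres gives four rows per movie).

```python
import sqlite3

# Sketch, SQLite standing in for MySQL. actor.actorName and genre.genre are
# hypothetical column names; the rows are hypothetical too.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE director (directorID INTEGER PRIMARY KEY, Director TEXT);
CREATE TABLE actor (actorID INTEGER PRIMARY KEY, actorName TEXT);
CREATE TABLE genre (genreID INTEGER PRIMARY KEY, genre TEXT);
CREATE TABLE movies (movieID INTEGER PRIMARY KEY, title TEXT,
                     releaseYear INTEGER, directorID INTEGER);
CREATE TABLE starring (starringID INTEGER PRIMARY KEY, movieID INTEGER, actorID INTEGER);
CREATE TABLE moviegenres (moviegenresID INTEGER PRIMARY KEY, moviesID INTEGER, genreID INTEGER);
INSERT INTO director VALUES (1, 'Dir One');
INSERT INTO actor VALUES (1, 'Actor A'), (2, 'Actor B');
INSERT INTO genre VALUES (1, 'Action'), (2, 'Comedy');
INSERT INTO movies VALUES (1, 'Movie A', 2010, 1);
INSERT INTO starring (movieID, actorID) VALUES (1, 1), (1, 2);
INSERT INTO moviegenres (moviesID, genreID) VALUES (1, 1), (1, 2);
""")

QUERY = """
SELECT m.title, m.releaseYear, d.Director,
       group_concat(DISTINCT a.actorName) AS actors,
       group_concat(DISTINCT g.genre) AS genres
FROM movies m
JOIN director d ON d.directorID = m.directorID
LEFT JOIN starring s ON s.movieID = m.movieID
LEFT JOIN actor a ON a.actorID = s.actorID
LEFT JOIN moviegenres mg ON mg.moviesID = m.movieID
LEFT JOIN genre g ON g.genreID = mg.genreID
GROUP BY m.movieID, m.title, m.releaseYear, d.Director
ORDER BY m.title
"""

def movie_rows():
    # SQLite does not guarantee the order inside group_concat, so sort here.
    return [(t, y, d, sorted(acts.split(",")) if acts else [],
             sorted(gens.split(",")) if gens else [])
            for t, y, d, acts, gens in conn.execute(QUERY)]

print(movie_rows())
```

In MySQL itself you could write GROUP_CONCAT(DISTINCT a.actorName ORDER BY a.actorName SEPARATOR ', ') to get a sorted, comma-and-space separated list directly in the result grid.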
Morning Guys,
I'm struggling to see how the following table can be broken up into 3NF. I know the normalization rules, but I can't see any data that needs to be moved. Here is how the table looks:
PlaylistID, PlaylistName, TrackID, Trackname, AlbumID, AlbumTitle, GenreID, GenreName, TrackSeconds, TrackBytes
The question is not very clear, but here goes:
You should create a separate table for each ID column and then move all the related columns into that table:
Playlist(ID, Name)
Track(ID, Name, Seconds, Bytes, AlbumID, GenreID)
Album(ID, Title)
Genre(ID, Name)
Since you will probably want to have the same track in more than one playlist, you need a many-to-many relation, which you should handle with a relation table:
PlaylistTrack(PlaylistID, TrackID)
This satisfies 3NF, because the playlist name, album title and genre name are no longer repeated on multiple rows: each non-key column now depends only on the key of its own table.
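A sketch of the decomposition, with SQLite (Python's sqlite3) standing in for MySQL and hypothetical rows, shows two things: each track's details are stored once even when it sits in several playlists, and the original wide row (PlaylistID through TrackBytes) can still be rebuilt with joins whenever it is needed.

```python
import sqlite3

# Sketch of the 3NF decomposition above, SQLite standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Playlist (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Album (ID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Genre (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Track (
    ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, Seconds INTEGER, Bytes INTEGER,
    AlbumID INTEGER REFERENCES Album(ID), GenreID INTEGER REFERENCES Genre(ID));
CREATE TABLE PlaylistTrack (
    PlaylistID INTEGER REFERENCES Playlist(ID),
    TrackID    INTEGER REFERENCES Track(ID),
    PRIMARY KEY (PlaylistID, TrackID));
-- Hypothetical rows: one track shared by two playlists.
INSERT INTO Playlist VALUES (1, 'Morning'), (2, 'Gym');
INSERT INTO Album VALUES (1, 'Album One');
INSERT INTO Genre VALUES (1, 'Rock');
INSERT INTO Track VALUES (1, 'Song One', 210, 5000000, 1, 1);
INSERT INTO PlaylistTrack VALUES (1, 1), (2, 1);
""")

def flat_rows():
    # Rebuild the original wide rows from the normalized tables.
    return conn.execute("""
        SELECT p.ID, p.Name, t.ID, t.Name, a.ID, a.Title, g.ID, g.Name,
               t.Seconds, t.Bytes
        FROM PlaylistTrack pt
        JOIN Playlist p ON p.ID = pt.PlaylistID
        JOIN Track t ON t.ID = pt.TrackID
        JOIN Album a ON a.ID = t.AlbumID
        JOIN Genre g ON g.ID = t.GenreID
        ORDER BY p.ID""").fetchall()

print(flat_rows())
```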
I am designing a database application for an award. It has a 75-year history and numerous categories that have changed over time. Right now, the design I am thinking of has two kinds of tables:
entities
  people
  publishers
categories
  novel
  movie
  author
  artist
and such like. Each category has data particular to that category, for example:
NOVEL
title varchar(1024)
author int #FK into people table ID
publisher int #FK into publisher table ID
year year(4)
winner bool
or
ARTIST
name int
year year(4)
winner bool
So far, so good. However, 38 (!) of these categories have existed over time (some no longer exist), and I really can't imagine querying for, say, all of the winners from 1963 by doing:
SELECT * from table1,table2,...,table38 WHERE year=1963 and winner=TRUE;
These tables will never be that large (each category usually has at most five nominees, so even after 100 years there would be at most 500 rows per table, and a lot fewer for the early categories that were discontinued). So this isn't a performance question. It's just that this query feels very, very wrong to me, if only because every query would have to be changed every time a new category is created or an old one is removed, which happens every few years or so.
The questions then are:
Is this query evidence that I've designed this wrong?
If not, is there a better way to do that query?
I keep thinking there must be some way to create a lookup table which pulls from other tables, but I could be misremembering. Is there some way of doing such a thing?
Many thanks,
Glenn
You could do that with three tables.
The first one is entities. It contains data about all publishers, artists, etc.
entities
name varchar(1024)
publisher bool
The second is data, where the data from all categories is stored.
data
title varchar(1024)
author/name int #FK into people table ID
publisher int #FK into publisher table ID
year year(4)
winner bool
category int #FK into category table ID
The third is category, which holds all the category names with their IDs.
category
ID int
name varchar(1024)
Now you have to join only three tables.
select *
from data d
join entities e on d.name = e.name
join category c on d.category = c.id
where d.winner = TRUE and d.year = 1963
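The point of this design is that one query covers every category, past or future, because adding a category means adding a row to category rather than creating a new table. Below is a minimal sketch, assuming SQLite (Python's sqlite3) as a stand-in for MySQL, booleans stored as 0/1, and a column called name where the answer writes "author/name"; all nominees are hypothetical.

```python
import sqlite3

# Sketch of the three-table design, SQLite standing in for MySQL.
# Booleans are 0/1; the "author/name" column is called name here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (name TEXT PRIMARY KEY, publisher INTEGER NOT NULL);
CREATE TABLE category (ID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE data (
    title    TEXT,
    name     TEXT REFERENCES entities(name),
    year     INTEGER NOT NULL,
    winner   INTEGER NOT NULL,
    category INTEGER REFERENCES category(ID));
-- Hypothetical nominees in two categories.
INSERT INTO entities VALUES ('Writer A', 0), ('Writer B', 0), ('Painter C', 0);
INSERT INTO category VALUES (1, 'Novel'), (2, 'Artist');
INSERT INTO data VALUES ('Book 1', 'Writer A', 1963, 1, 1),
                        ('Book 2', 'Writer B', 1963, 0, 1),
                        (NULL, 'Painter C', 1963, 1, 2),
                        ('Book 3', 'Writer B', 1964, 1, 1);
""")

def winners(year):
    # Same query no matter how many categories exist.
    return conn.execute("""
        SELECT c.name, d.name, d.title
        FROM data d
        JOIN entities e ON e.name = d.name
        JOIN category c ON c.ID = d.category
        WHERE d.winner = 1 AND d.year = ?
        ORDER BY c.name""", (year,)).fetchall()

print(winners(1963))
```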
You would be better off with a table for categories, either one that stores a category key value or just a normal category table, and then store only the category row's id in the other table.
For example:
Table: Category
columns: id, name, slug, status, active_since, inactive_since etc...
In slug, you can keep a slugified form of the category name to make it easy to use in queries and URLs: for example, the Industry Innovations category would be saved as industry-innovations.
In status, keep 0 or 1 to show whether the category is currently active. You can also store when it became active and when it became inactive in the active_since and inactive_since fields.
When you search, you can, for example, restrict the search to categories with status 1. I don't think your problem is complex, and it is very simple for MySQL to search when you join tables.
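The sketch below shows one way those columns could be used: status answers "is this category running now?", while active_since and inactive_since answer "which categories existed in a given year?". It assumes SQLite (Python's sqlite3) as a stand-in for MySQL, uses plain years instead of full dates for brevity, and slugify() is a hypothetical helper, not a library function.

```python
import re
import sqlite3

def slugify(name):
    # Hypothetical helper: lowercase, collapse non-alphanumerics into '-'.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Category (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        slug TEXT NOT NULL UNIQUE,
        status INTEGER NOT NULL CHECK (status IN (0, 1)),
        active_since INTEGER,
        inactive_since INTEGER)""")
# Hypothetical categories; years stand in for full dates.
for name, status, since, until in [
        ("Best Novel", 1, 1950, None),
        ("Industry Innovations", 1, 1990, None),
        ("Best Radio Drama", 0, 1950, 1970)]:
    conn.execute(
        "INSERT INTO Category (name, slug, status, active_since, inactive_since) "
        "VALUES (?, ?, ?, ?, ?)", (name, slugify(name), status, since, until))

def active_categories():
    return [n for (n,) in conn.execute(
        "SELECT name FROM Category WHERE status = 1 ORDER BY name")]

def categories_active_in(year):
    # Started on or before that year and not yet retired by then.
    return [n for (n,) in conn.execute(
        "SELECT name FROM Category WHERE active_since <= ? "
        "AND (inactive_since IS NULL OR inactive_since > ?) ORDER BY name",
        (year, year))]

print(slugify("Industry Innovations"), active_categories(), categories_active_in(1963))
```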
There are projects where dozens of tables are joined and it is ok.
I am new to database structure and design. Currently, I am in the process of creating a course catalog that will match course descriptions to course names and dates. I have sketched one table describing the courses, which includes course_code, name and all the other relevant information. Then I sketched another table linking those courses to when they will be taught.
I am missing a way to handle classes that are classified as all_year. I am also missing a way to label courses under a major. Since a course can belong to several majors, putting the data from one table into the other would force me to duplicate data. Any ideas on how to implement these two things in my table design? Or suggestions on how to restructure it? If possible, please show me a query to execute in my phpMyAdmin DB.
Example of table courses
id serial
course_code text
description text
Example of table course_dates
id serial
course_id serial
year date
semester
Example of table majors
major_id int
course_id int
So a populated database could contain the following:
Table courses
id course_code description
1 INF1000 "Basic programming"
2 INF1001 "More basic programming"
Table course_dates (0 for spring, 1 for fall)
id course_id year semester
1 1 2012 0
2 1 2013 1
3 2 2013 1
To link courses to majors (one course can belong to many majors, and one major has many courses, so this is a many-to-many relationship), you want to use a linking table with this structure:
table courses_majors
major_id int
course_id int
Remember to index this table as well; it's very important. Then you can populate it so that one course can belong to many majors and many courses to one major (a many-to-many relationship).
Then you can join the tables through this linking table:
select * from courses left join courses_majors on courses.id = courses_majors.course_id left join majors on courses_majors.major_id = majors.id
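Here is a runnable sketch of the linking table, its indexes and the join, with SQLite (Python's sqlite3) standing in for MySQL/phpMyAdmin. It assumes a majors table with id and name columns; the major names are hypothetical, and the courses rows come from the example above. The composite primary key already indexes lookups by major_id first, so a second index on course_id covers lookups from the course side.

```python
import sqlite3

# Sketch, SQLite standing in for MySQL; major names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (id INTEGER PRIMARY KEY, course_code TEXT, description TEXT);
CREATE TABLE majors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE courses_majors (
    major_id  INTEGER NOT NULL REFERENCES majors(id),
    course_id INTEGER NOT NULL REFERENCES courses(id),
    PRIMARY KEY (major_id, course_id)  -- also indexes lookups by major_id
);
-- Second index so lookups by course are fast too.
CREATE INDEX idx_courses_majors_course ON courses_majors (course_id);
INSERT INTO courses VALUES (1, 'INF1000', 'Basic programming'),
                           (2, 'INF1001', 'More basic programming');
INSERT INTO majors VALUES (1, 'Informatics'), (2, 'Mathematics');
-- INF1000 belongs to two majors; Informatics has two courses.
INSERT INTO courses_majors VALUES (1, 1), (2, 1), (1, 2);
""")

def majors_by_course():
    return conn.execute("""
        SELECT courses.course_code, majors.name
        FROM courses
        LEFT JOIN courses_majors ON courses.id = courses_majors.course_id
        LEFT JOIN majors ON courses_majors.major_id = majors.id
        ORDER BY courses.course_code, majors.name""").fetchall()

print(majors_by_course())
```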
Of course you can add a where clause, etc.
The other way is to create a table of majors:
majors
id int
name varchar
Then add a major_id column to your courses table. This gives each course exactly one major (a many-to-one relationship: many courses can belong to one major, but a course cannot belong to several majors).
As for all-year courses, I would just add a field to the courses table to account for this, probably a TINYINT set to 0 or 1.
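A sketch of that flag in use, under assumptions: SQLite (Python's sqlite3) stands in for MySQL, where all_year would be a TINYINT(1); the year column is a plain integer rather than a DATE; and a course flagged all_year is treated as offered in every term regardless of course_dates. The INF1100 row is hypothetical; the others follow the example data above.

```python
import sqlite3

# Sketch, SQLite standing in for MySQL (where all_year would be TINYINT(1)).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (
    id INTEGER PRIMARY KEY, course_code TEXT, description TEXT,
    all_year INTEGER NOT NULL DEFAULT 0 CHECK (all_year IN (0, 1)));
CREATE TABLE course_dates (
    id INTEGER PRIMARY KEY,
    course_id INTEGER NOT NULL REFERENCES courses(id),
    year INTEGER NOT NULL,
    semester INTEGER NOT NULL CHECK (semester IN (0, 1)));  -- 0 spring, 1 fall
INSERT INTO courses VALUES (1, 'INF1000', 'Basic programming', 0),
                           (2, 'INF1001', 'More basic programming', 0),
                           (3, 'INF1100', 'Thesis seminar', 1);
INSERT INTO course_dates VALUES (1, 1, 2012, 0), (2, 1, 2013, 1), (3, 2, 2013, 1);
""")

def offered(year, semester):
    # Offered if flagged all_year, or scheduled for that year and semester.
    return [code for (code,) in conn.execute("""
        SELECT c.course_code FROM courses c
        WHERE c.all_year = 1
           OR EXISTS (SELECT 1 FROM course_dates cd
                      WHERE cd.course_id = c.id AND cd.year = ? AND cd.semester = ?)
        ORDER BY c.course_code""", (year, semester))]

print(offered(2013, 1))
```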