Best approach to normalizing an existing multi-column, multi-string table? - mysql

I am new to MySQL, so any help would be much appreciated :-)
Let's take the movie db example:
movie_td (mov_id auto_increment pk, title, year, duration)
actor_td (act_id auto_increment pk, name)
director_td (dir_id auto_increment pk, name)
movie_actor_td (movie_id fk, actor_id fk)
movie_director_td (movie_id fk, director_id fk)
I understand how to load a .csv file into a single table where all the names are stored in one column, but it's a bit confusing to do this in a normalized format. If I already have all the data stored in one table, does it make sense to create a static mov_id first so that I can reference the rest of the columns to it? Or is there a better way of doing this?
Thanks!

If you store all the data in one table, you will run into problems as soon as a movie has multiple actors or more than one director.
The normalized design avoids the insert, update and delete anomalies that redundant data causes in database tables.
Also, with a single table you have to repeat the same actor or director name on every movie row that person is involved in. Updating the name in one row but not in the others then leaves the table with inconsistent names for the same actor or director.

If you go by definition, a relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain. (Source: wikipedia.org).
Hence, when you insert multiple comma-separated values into one field, you violate first normal form (1NF) itself! This happens because the data has a many-to-many relationship that you are not mapping correctly.
Moreover, you ask a very basic question: "If I already have all the data stored in one table, does it make sense to create a static mov_id first so that I can reference the rest of the columns to it?" Well, if you just want all the data in one place, why not go for XML? You would have one single file storing all the relevant data. But the fact is, you cannot run a complete application on XML. XML serves a different purpose than database tables do. You need a data structure that can be queried however you want, without worrying about how the storage happens. I would suggest you read Korth's book on database design.
Coming to the design of databases and table structures, it doesn't matter whether you know how to store a .csv file in a column. What matters is how long it will take to develop the complicated code needed to fetch values from that CSV column. It is always better to write a few simple queries than complicated search loops to fetch values.
Let's look at the example you have posted. I'll take only three tables from it.
Consider the table movie_td (I don't understand the reason behind the _td suffix, but I'll stick to it because you used it). This table stores information about a movie. In the real world, a movie may have many attributes (columns) such as title, release date (that too depends on the region, so a movie may have multiple release dates, but that's a different story altogether), running time, name of the director (I've only watched movies by a single director or a director duo so far; I'm yet to see a multi-director movie), etc.
We must consider two facts here:
A movie has multiple actors portraying multiple characters.
An actor may have acted in multiple movies.
This gives us a many-to-many relationship between actors and movies, and this is where the table movie_actor_td comes into the picture. This table stores which actors are cast in which movie, with movie_id and actor_id each being a foreign key. A movie may have multiple entries in this table, one per actor, and an actor may have multiple entries, one per movie, so the many-to-many relationship is maintained in both directions.
A major reason to have this sort of structure is querying. If you store the actors' names comma-separated in the movies table, you have no way to drill down into the actors' data using actor_id: you cannot get an actor's other details, like their date of birth and other biodata.
What if someone asks you how many movies the actor foo has done? Would you search for the actor's name in the CSV column of every row? How fast would that be?
But now that you have the given table structure, you can find that out by a simple query like this:
SELECT COUNT(*)
FROM movie_actor_td
WHERE actor_id IN (SELECT act_id
                   FROM actor_td
                   WHERE name = 'foo');
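To make the speed and correctness point concrete, here is a minimal sketch comparing the CSV-column approach with the normalized query. It assumes an in-memory SQLite database standing in for MySQL and uses a hypothetical movie_flat table with sample names. A LIKE '%foo%' search on the packed column also matches an actor called "foobar", so it over-counts. It also cannot use an ordinary index because of the leading wildcard.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the SQL used here is portable.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized design (hypothetical table): actor names packed into one column.
cur.execute("CREATE TABLE movie_flat (title TEXT, actors TEXT)")
cur.executemany(
    "INSERT INTO movie_flat VALUES (?, ?)",
    [("Movie A", "foo,bar"), ("Movie B", "foobar,baz"), ("Movie C", "foo")],
)
# Substring search also matches 'foobar', so this over-counts.
csv_count = cur.execute(
    "SELECT COUNT(*) FROM movie_flat WHERE actors LIKE '%foo%'"
).fetchone()[0]

# Normalized design from the question (movie_td omitted for brevity).
cur.executescript("""
CREATE TABLE actor_td (act_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
CREATE TABLE movie_actor_td (movie_id INTEGER, actor_id INTEGER);
INSERT INTO actor_td (name) VALUES ('foo'), ('bar'), ('foobar'), ('baz');
-- act_id values: foo=1, bar=2, foobar=3, baz=4; movie ids 1..3 as above
INSERT INTO movie_actor_td VALUES (1, 1), (1, 2), (2, 3), (2, 4), (3, 1);
""")
normalized_count = cur.execute("""
    SELECT COUNT(*)
    FROM movie_actor_td
    WHERE actor_id IN (SELECT act_id FROM actor_td WHERE name = 'foo')
""").fetchone()[0]

print(csv_count, normalized_count)  # 3 2
```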
Let's consider an even more complex example. For this, I'll take the liberty of adding a column character_name to the table movie_actor_td, since an actor usually plays a single character in a movie. Your movie_actor_td table would then look like:
movie_actor_td (movie_id, actor_id, character_name)
Now, say there is an actor who played James Bond in the movie GoldenEye, released in 1995. I don't know his name, and I want to know how many movies he did in 2002. I'd simply write a query like:
SELECT COUNT(*)
FROM movie_actor_td
WHERE movie_id IN (SELECT mov_id
                   FROM movie_td
                   WHERE year = 2002)
  AND actor_id IN (SELECT actor_id
                   FROM movie_actor_td
                   WHERE movie_id = (SELECT mov_id
                                     FROM movie_td
                                     WHERE title = 'GoldenEye'
                                       AND year = 1995)
                     AND character_name = 'James Bond');
Can you fetch that so easily if you have all the data stored in a single CSV column? I doubt that. I'd suggest you continue with the current schema in hand.
EDIT
You ask about creating a static mov_id first and then referencing all the other columns to it. I think you need to read further about primary keys, foreign keys and database constraints first. Then read about auto-incremented column values in MySQL.
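To tie the EDIT back to the original question, here is a minimal sketch of migrating an existing single table into the normalized tables without inventing IDs by hand. It assumes SQLite in place of MySQL and hypothetical sample rows whose actors column holds comma-separated names. The database assigns mov_id and act_id through auto-increment. The script reads each new id back (cursor.lastrowid here, LAST_INSERT_ID() in MySQL) and uses it to fill the link table. INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE, so each actor is stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE movie_td (
    mov_id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL, year INTEGER, duration INTEGER,
    UNIQUE (title, year));
CREATE TABLE actor_td (
    act_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE);
CREATE TABLE movie_actor_td (
    movie_id INTEGER NOT NULL REFERENCES movie_td (mov_id),
    actor_id INTEGER NOT NULL REFERENCES actor_td (act_id),
    PRIMARY KEY (movie_id, actor_id));
""")

# Hypothetical rows from the existing single table: actors packed into one string.
flat_rows = [
    ("Movie A", 1999, 120, "Alice, Bob"),
    ("Movie B", 2002, 95, "Bob, Carol"),
]

def get_or_create_actor(name):
    """Return act_id for name, inserting the actor only if not seen before."""
    conn.execute("INSERT OR IGNORE INTO actor_td (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT act_id FROM actor_td WHERE name = ?", (name,)).fetchone()[0]

with conn:  # one transaction for the whole migration
    for title, year, duration, actors in flat_rows:
        # The database assigns mov_id; we only read it back.
        cur = conn.execute(
            "INSERT INTO movie_td (title, year, duration) VALUES (?, ?, ?)",
            (title, year, duration))
        mov_id = cur.lastrowid
        for name in (a.strip() for a in actors.split(",")):
            conn.execute("INSERT INTO movie_actor_td VALUES (?, ?)",
                         (mov_id, get_or_create_actor(name)))

actor_count = conn.execute("SELECT COUNT(*) FROM actor_td").fetchone()[0]
link_count = conn.execute("SELECT COUNT(*) FROM movie_actor_td").fetchone()[0]
print(actor_count, link_count)  # 3 4
```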

Related

Database design - Many tables with unique tags or one table with all of them?

I'm working on a MySQL database for a car dealership. Since the product (a car) has a lot of features and unique values (gearbox, model, manufacturer...), I wonder how to design the database well.
Should I use:
Table cars
columns -> id, name, manufacturer, model, gearbox...
Or:
Table cars
columns -> id, name, manufacturer_id, gearbox_id...
Table manufacturers
columns -> id, name
Table gearbox
columns -> id, name
There are a lot of unique values, as I mentioned, and I don't think it's good to store them again and again. But if I create a lot of tables and link them to the product (car) table with link tables, there will be a lot of joins whenever I query all of the values.
And these are only a few of them; there are many more values I need to store for every product in the database.
You have 3 options here:
You could store each car as a separate table and then have a row corresponding to the gearbox, etc. This is awful, no one does it, don't do it.
You could serialize all the gearbox, etc. data as json strings and put them in your car cells. This is also awful, some people have stupidly done this, but not that often. Don't do it.
You could do things the normal, good way and implement separate tables for every class of object with foreign keys linking them. This is the way to go.
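A minimal sketch of the third option, assuming SQLite in place of MySQL and made-up sample data. The table names follow the question. An attribute with one value per car, such as gearbox or manufacturer, only needs a foreign key column on cars. Link tables are only required for many-to-many attributes. The feared joins then each look up a primary key, and the foreign keys reject values that are not in the lookup tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE manufacturers (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE gearbox (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE cars (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    manufacturer_id INTEGER NOT NULL REFERENCES manufacturers (id),
    gearbox_id INTEGER NOT NULL REFERENCES gearbox (id));
INSERT INTO manufacturers (id, name) VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO gearbox (id, name) VALUES (1, 'manual'), (2, 'automatic');
INSERT INTO cars (id, name, manufacturer_id, gearbox_id)
VALUES (1, 'Roadster', 1, 1), (2, 'Wagon', 2, 2), (3, 'Hatch', 1, 2);
""")

# One join per lookup table; each join matches on the lookup table's primary key.
rows = conn.execute("""
    SELECT c.name, m.name, g.name
    FROM cars c
    JOIN manufacturers m ON m.id = c.manufacturer_id
    JOIN gearbox g ON g.id = c.gearbox_id
    ORDER BY c.id
""").fetchall()

# Foreign keys reject ids that are not in the lookup table.
try:
    conn.execute("INSERT INTO cars VALUES (4, 'Mystery', 99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows)
print(rejected)  # True
```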

Tracking with a Database

I'm not looking for the answer, I am just looking for some guidance or a little clarity here. I need to design a database as if I worked for Redbox, and I'm trying to track movies, actors and directors. So I am assuming I need three different tables, but I just don't understand how to "track" it. Would I create a custom ID for each movie and something that tracks where the kiosks are? Like I said, I think I can do this, but I just don't fully understand it.
Any help is appreciated
In broad strokes here is what you need:
(Basic relational rules and strategy apply, so every table needs to have a Primary Key, and the keys will be used to relate the tables together).
movie:
One row per movie, with title, rating, year, etc.
person:
Add to that a related person table with one row for any person who might be a cast or crew member in any film.
credit, credit_type:
Now relate Movie <-> Person.
Since this is a many-to-many relationship, you need a table between the two. Typically this would be called "credit", and you need a credit_type table that describes the credit (actor, director, writer, producer, etc.).
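A minimal sketch of the movie/person/credit/credit_type structure described above. It assumes SQLite in place of MySQL, with hypothetical column names and sample rows. The composite primary key on credit lets the same person hold several credits on one movie, such as actor and director, while blocking exact duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE credit_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE credit (
    movie_id INTEGER NOT NULL REFERENCES movie (id),
    person_id INTEGER NOT NULL REFERENCES person (id),
    credit_type_id INTEGER NOT NULL REFERENCES credit_type (id),
    PRIMARY KEY (movie_id, person_id, credit_type_id));
INSERT INTO movie VALUES (1, 'Movie A', 2001), (2, 'Movie B', 2003);
INSERT INTO person VALUES (1, 'Pat'), (2, 'Sam');
INSERT INTO credit_type VALUES (1, 'actor'), (2, 'director');
-- Pat directs and acts in Movie A; Sam acts in both movies.
INSERT INTO credit VALUES (1, 1, 2), (1, 1, 1), (1, 2, 1), (2, 2, 1);
""")

def people_with_credit(movie_title, credit_name):
    """Names of everyone holding the given credit on the given movie."""
    return [r[0] for r in conn.execute("""
        SELECT p.name
        FROM credit c
        JOIN movie m ON m.id = c.movie_id
        JOIN person p ON p.id = c.person_id
        JOIN credit_type ct ON ct.id = c.credit_type_id
        WHERE m.title = ? AND ct.name = ?
        ORDER BY p.name""", (movie_title, credit_name))]

print(people_with_credit("Movie A", "actor"))     # ['Pat', 'Sam']
print(people_with_credit("Movie A", "director"))  # ['Pat']
```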
Of course that has nothing to do with your "tracking" question. For that you would need a slew of tables:
inventory:
Here is where you have one row for every copy of a movie that exists. It should be obvious that there will be a foreign key for a movie in this table. In the real world there would be an assigned id that would then be printed out as a barcode and attached to the disc and sleeve of the physical copy.
kiosk:
For every Kiosk there is a row, along with location information, which could be an address perhaps along with a note, in case there are multiple kiosks at the same location.
kiosk_bin:
Every kiosk has one or more bins (a 1-to-M relationship), each with a number identifying it.
I wouldn't do it this way, but you could for simplicity add a column in kiosk_bin that would be a foreign key to the inventory table. In this way you are able to indicate that an inventory (a single copy of one particular movie) is sitting in a kiosk_bin.
member:
These are the people subscribed to the service.
member_checkout:
When a member gets a movie from a kiosk/kiosk_bin, a row gets created here, with the inventory_id, and the date, and the system would update the kiosk_bin row to remove the inventory_id and show that the bin is now empty and could accept another inventory copy.
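The checkout step above can be sketched as a single transaction. This sketch assumes SQLite in place of MySQL and hypothetical column names and data. It also follows the simplified variant the answer mentions, where kiosk_bin holds a nullable inventory_id and NULL means the bin is empty. Recording the checkout and emptying the bin happen together, so a failure cannot leave one done without the other.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE inventory (id INTEGER PRIMARY KEY,
    movie_id INTEGER NOT NULL REFERENCES movie (id));
CREATE TABLE kiosk (id INTEGER PRIMARY KEY, address TEXT, note TEXT);
CREATE TABLE kiosk_bin (
    kiosk_id INTEGER NOT NULL REFERENCES kiosk (id),
    bin_number INTEGER NOT NULL,
    inventory_id INTEGER UNIQUE REFERENCES inventory (id),  -- NULL = empty bin
    PRIMARY KEY (kiosk_id, bin_number));
CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE member_checkout (
    id INTEGER PRIMARY KEY,
    member_id INTEGER NOT NULL REFERENCES member (id),
    inventory_id INTEGER NOT NULL REFERENCES inventory (id),
    checkout_date TEXT NOT NULL);
INSERT INTO movie VALUES (1, 'Movie A');
INSERT INTO inventory VALUES (100, 1);
INSERT INTO kiosk VALUES (1, '1 Main St', 'near entrance');
INSERT INTO kiosk_bin VALUES (1, 7, 100);
INSERT INTO member VALUES (1, 'Alex');
""")

def check_out(member_id, kiosk_id, bin_number, when):
    """Record the checkout and empty the bin in one transaction."""
    with conn:  # commits both statements together, or rolls back both
        row = conn.execute(
            "SELECT inventory_id FROM kiosk_bin"
            " WHERE kiosk_id = ? AND bin_number = ?",
            (kiosk_id, bin_number)).fetchone()
        if row is None or row[0] is None:
            raise ValueError("bin is empty")
        conn.execute(
            "INSERT INTO member_checkout (member_id, inventory_id, checkout_date)"
            " VALUES (?, ?, ?)", (member_id, row[0], when.isoformat()))
        conn.execute(
            "UPDATE kiosk_bin SET inventory_id = NULL"
            " WHERE kiosk_id = ? AND bin_number = ?", (kiosk_id, bin_number))
        return row[0]

copy_id = check_out(1, 1, 7, date(2024, 1, 15))
bin_now = conn.execute(
    "SELECT inventory_id FROM kiosk_bin WHERE kiosk_id = 1 AND bin_number = 7"
).fetchone()[0]
print(copy_id, bin_now)  # 100 None
```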
As you can see, this is non-trivial. Database design of any relatively complicated business process is going to be more than 3 tables, I'm sorry to say.
Here's an ERD that illustrates some of the basic movie to credit relations I did for another similar question. The tables were named a bit differently but you should be able to match them up.

Designing a schema for players and tournaments

I am a beginner in SQL. I have a simple MySQL database which contains two tables:
Players (id, name)
Tournaments (id, name, participants)
I want to save information about the participants of every tournament. My first idea is that participants should contain a large number of id fields from the players table, but that doesn't seem right.
How should I design this in the correct way?
Make another table called Participants with two fields: Player_ID and Tournament_ID. This table can hold as many rows as it needs to record who played in which tournament, and you can cross-reference it as needed.
Remove "participants" from the second table and add a third table: TournamentPlayers (playerid, tournamentid).
You should avoid storing multiple values in one field. That would break the first normal form of database design (1NF), which states that only atomic values can be stored in one field. Relational database systems are not well-suited to cope with non-normalized data, and you will have a hard time writing queries for non-1NF tables.
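A minimal sketch of the Participants table suggested above. It assumes SQLite in place of MySQL, with hypothetical column names and sample data. The composite primary key blocks signing the same player up twice. Counting participants per tournament, or listing a player's tournaments, becomes an ordinary join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tournaments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (player, tournament); the composite key blocks duplicate sign-ups.
CREATE TABLE participants (
    player_id INTEGER NOT NULL REFERENCES players (id),
    tournament_id INTEGER NOT NULL REFERENCES tournaments (id),
    PRIMARY KEY (player_id, tournament_id));
INSERT INTO players VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cid');
INSERT INTO tournaments VALUES (1, 'Spring Cup'), (2, 'Autumn Open');
INSERT INTO participants VALUES (1, 1), (2, 1), (3, 1), (1, 2);
""")

# LEFT JOIN keeps tournaments that have no participants yet (count 0).
counts = conn.execute("""
    SELECT t.name, COUNT(p.player_id)
    FROM tournaments t
    LEFT JOIN participants p ON p.tournament_id = t.id
    GROUP BY t.id, t.name
    ORDER BY t.id
""").fetchall()

ann_events = [r[0] for r in conn.execute("""
    SELECT t.name
    FROM participants p
    JOIN tournaments t ON t.id = p.tournament_id
    JOIN players pl ON pl.id = p.player_id
    WHERE pl.name = 'Ann'
    ORDER BY t.id""")]

try:
    conn.execute("INSERT INTO participants VALUES (1, 1)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(counts)             # [('Spring Cup', 3), ('Autumn Open', 1)]
print(ann_events)         # ['Spring Cup', 'Autumn Open']
print(duplicate_allowed)  # False
```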

What's the best approach to designing a database that keeps track of orders and wish lists?

The best way to describe this scenario is to use an example. Consider Netflix: do they store their orders (the DVDs they mail out) in a separate table from their member lists (NOT the members table, but a joiner table of members and movies, i.e. the list of movies each member has created), or are orders distinguished by additional information in the same row of the same table?
For those not familiar with Netflix, imagine a service that lets you create a wish list of movies. This wish list is subsequently sent to you incrementally, say two movies at a time.
I would like to implement a similar idea using a MySQL database, but I am unsure whether to create two tables (one for orders and one for lists) and dynamically move items from the lists table to the orders table (this process should be semi-automatic based on the member returning an item, where before a new one is sent out, a table with some controls will be checked to see if the user is still eligible/has not gone over his monthly limit)...
Thoughts and pros and cons would be fantastic!
EDIT: my current architecture is member, items, members_items. What I am asking is whether to store orders in the same table as members_items or to create a separate table.
Moving things from one database table to another to change its status is simply bad practice. In a RDBMS, you relate rows from one table to other rows in other tables using primary and foreign key constraints.
As for your example, I see about four tables just to get started. Keep in mind that this is a far cry from what Netflix, the granddaddy of movie rentals, actually runs.
A User table to house your members.
A Movie table that knows about all of the available movies.
A Wishlist or Queue table that has a one-to-many relationship between a User and Movies.
An Order or Rental table that maps users to the movies that are currently at home.
Statuses of the movies in the Movie table could be in yet another table where you relate a User to a Movie to a MovieStatus or something, which brings your table count to 6. To really lay this out and design it properly you may end up with even more, but hopefully this sort of gives you an idea of where to begin.
EDIT: Saw your update on exactly what you're looking for. I thought you were designing from scratch. The simple answer to your question is: have two tables. Wishlists (members_items, as you call them) and Orders (member_orders?) are fundamentally different, so I suggest keeping them separate.
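A minimal sketch of keeping the wish list and the orders in separate tables without moving rows between them. It assumes SQLite in place of MySQL. The member_orders columns, the priority column and the two-at-a-time limit are all hypothetical. Shipping inserts an order row. The next item to send is the highest-priority wish-list entry that has no order yet. Returning a copy just sets returned_on, which frees a slot, and members_items is never modified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Wish list: what the member wants, in priority order.
CREATE TABLE members_items (
    member_id INTEGER NOT NULL REFERENCES member (id),
    item_id INTEGER NOT NULL REFERENCES items (id),
    priority INTEGER NOT NULL,
    PRIMARY KEY (member_id, item_id));
-- Orders: what was actually sent, and whether it came back.
CREATE TABLE member_orders (
    id INTEGER PRIMARY KEY,
    member_id INTEGER NOT NULL REFERENCES member (id),
    item_id INTEGER NOT NULL REFERENCES items (id),
    shipped_on TEXT NOT NULL,
    returned_on TEXT);
INSERT INTO member VALUES (1, 'Kim');
INSERT INTO items VALUES (1, 'Movie A'), (2, 'Movie B'), (3, 'Movie C');
INSERT INTO members_items VALUES (1, 1, 1), (1, 2, 2), (1, 3, 3);
""")

MAX_AT_HOME = 2  # hypothetical plan limit: two movies at a time

def ship_next(member_id, today):
    """Ship the highest-priority wish-list item not yet ordered, if allowed."""
    with conn:
        at_home = conn.execute(
            "SELECT COUNT(*) FROM member_orders"
            " WHERE member_id = ? AND returned_on IS NULL",
            (member_id,)).fetchone()[0]
        if at_home >= MAX_AT_HOME:
            return None
        row = conn.execute("""
            SELECT w.item_id FROM members_items w
            WHERE w.member_id = ?
              AND NOT EXISTS (SELECT 1 FROM member_orders o
                              WHERE o.member_id = w.member_id
                                AND o.item_id = w.item_id)
            ORDER BY w.priority LIMIT 1""", (member_id,)).fetchone()
        if row is None:
            return None
        conn.execute(
            "INSERT INTO member_orders (member_id, item_id, shipped_on)"
            " VALUES (?, ?, ?)", (member_id, row[0], today))
        return row[0]

shipped = [ship_next(1, "2024-03-01") for _ in range(3)]
print(shipped)  # [1, 2, None]: the limit stops the third shipment

with conn:  # the member returns Movie A
    conn.execute("UPDATE member_orders SET returned_on = ?"
                 " WHERE member_id = 1 AND item_id = 1", ("2024-03-05",))
after_return = ship_next(1, "2024-03-06")
print(after_return)  # 3
```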
A problem with storing orders in the members table is that there's a variable number (0, 1, or several) of orders per member. The way to do this using a relational database is to have two separate tables.
I feel like they would store their movies as follows (simplified of course):
tables:
Titles
Members
Order
Order_Has_Titles
This way, each order has a foreign key to Members, and a pivot table links orders to titles, since an order can contain many titles and a title can appear in many orders.
When you have a many-to-many relationship in the database, you need to create a pivot table:
Order_Has_Titles:
ID (auto-inc)
Order_FkId (int 11)
Title_FkId (int 11)
This way you're able to include multiple movies in each order.
Of course this is simplified, and a real system would have many other components, but it shows the idea at a basic level.
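A minimal sketch of the Titles/Members/Order/Order_Has_Titles layout above, assuming SQLite in place of MySQL and made-up data. The table is named Orders here because ORDER is a reserved word in SQL, so a table literally named Order would have to be quoted in every query. The UNIQUE constraint on the pivot's two foreign keys is an added assumption that stops the same title from being listed twice in one order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Members (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Titles (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- "Orders" rather than "Order": ORDER is a reserved word in SQL.
CREATE TABLE Orders (
    ID INTEGER PRIMARY KEY,
    Member_FkId INTEGER NOT NULL REFERENCES Members (ID));
CREATE TABLE Order_Has_Titles (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Order_FkId INTEGER NOT NULL REFERENCES Orders (ID),
    Title_FkId INTEGER NOT NULL REFERENCES Titles (ID),
    UNIQUE (Order_FkId, Title_FkId));
INSERT INTO Members VALUES (1, 'Lee');
INSERT INTO Titles VALUES (1, 'Movie A'), (2, 'Movie B'), (3, 'Movie C');
INSERT INTO Orders VALUES (10, 1), (11, 1);
INSERT INTO Order_Has_Titles (Order_FkId, Title_FkId)
VALUES (10, 1), (10, 2), (11, 2), (11, 3);
""")

# Every title in every order placed by member 1.
rows = conn.execute("""
    SELECT o.ID, t.Name
    FROM Orders o
    JOIN Order_Has_Titles oht ON oht.Order_FkId = o.ID
    JOIN Titles t ON t.ID = oht.Title_FkId
    WHERE o.Member_FkId = 1
    ORDER BY o.ID, t.ID
""").fetchall()
print(rows)
# [(10, 'Movie A'), (10, 'Movie B'), (11, 'Movie B'), (11, 'Movie C')]
```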

Database Formatting for Album Tracks

I would like to store an album's track names in a single field in a database.
The number of tracks is arbitrary for each album.
Each album is one record in the table.
Each track must be linked to a specific URL which also should be stored in the database somewhere.
Is it possible to do this by storing them in a single field, or is a relational table for the track names/urls the only way to go?
Table: Album
ID/PK (your choice of primary key philosophy)
AlbumName
Table: Track
ID/PK (optional, could make AlbumFK, TrackNumber the primary key)
AlbumFK REFERENCES (Album.PK)
TrackNumber
TrackName
TrackURL
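With the Album and Track tables above, counting tracks per album is a plain GROUP BY. This sketch finds albums with more than 10 tracks. It assumes SQLite in place of MySQL, uses the (AlbumFK, TrackNumber) primary key option from the listing, and fills the tables with generated sample rows and placeholder URLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Album (ID INTEGER PRIMARY KEY, AlbumName TEXT NOT NULL);
CREATE TABLE Track (
    AlbumFK INTEGER NOT NULL REFERENCES Album (ID),
    TrackNumber INTEGER NOT NULL,
    TrackName TEXT NOT NULL,
    TrackURL TEXT,
    PRIMARY KEY (AlbumFK, TrackNumber));
INSERT INTO Album VALUES (1, 'Short EP'), (2, 'Long LP');
""")
# Sample data: 4 tracks on album 1, 12 on album 2 (URLs are placeholders).
tracks = [(1, n, f"EP track {n}", f"https://example.com/1/{n}") for n in range(1, 5)]
tracks += [(2, n, f"LP track {n}", f"https://example.com/2/{n}") for n in range(1, 13)]
conn.executemany("INSERT INTO Track VALUES (?, ?, ?, ?)", tracks)

big_albums = conn.execute("""
    SELECT a.AlbumName, COUNT(*) AS track_count
    FROM Album a
    JOIN Track t ON t.AlbumFK = a.ID
    GROUP BY a.ID, a.AlbumName
    HAVING COUNT(*) > 10
""").fetchall()
print(big_albums)  # [('Long LP', 12)]
```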
It's entirely possible: you could store the field as comma-separated or XML data, for example.
Whether it's sensible is another question. If you ever want to find out, for example, how many albums have more than 10 tracks, you won't be able to write a straightforward SQL query for that. You'll have to pull the data back into your application and dissect it there, which is not ideal.
Another option is to store the data in a separate "tracks" table (i.e. normalised), but also provide a view on those tables that gives the data as a single field in a denormalised manner. Then you get the benefit of properly structured data and the ability to query the data as a single field from the view.
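A minimal sketch of such a view, assuming SQLite in place of MySQL. The view name AlbumTrackList is hypothetical. In MySQL the aggregate would usually be written GROUP_CONCAT(TrackName ORDER BY TrackNumber SEPARATOR ', '). Its output is cut off at the group_concat_max_len limit, which defaults to 1024 bytes. Older SQLite versions do not accept ORDER BY inside group_concat, so this sketch orders a subquery instead, which SQLite does not formally guarantee to preserve. For that reason no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Album (ID INTEGER PRIMARY KEY, AlbumName TEXT NOT NULL);
CREATE TABLE Track (
    AlbumFK INTEGER NOT NULL REFERENCES Album (ID),
    TrackNumber INTEGER NOT NULL,
    TrackName TEXT NOT NULL,
    PRIMARY KEY (AlbumFK, TrackNumber));
INSERT INTO Album VALUES (1, 'Demo');
INSERT INTO Track VALUES (1, 1, 'Intro'), (1, 2, 'Song'), (1, 3, 'Outro');
-- Read-only denormalized view: one row per album, all tracks in one field.
-- The ordered subquery usually yields track order, but SQLite does not
-- guarantee that group_concat preserves it.
CREATE VIEW AlbumTrackList AS
SELECT AlbumFK, AlbumName, group_concat(TrackName, ', ') AS Tracks
FROM (SELECT t.AlbumFK, a.AlbumName, t.TrackName
      FROM Track t JOIN Album a ON a.ID = t.AlbumFK
      ORDER BY t.AlbumFK, t.TrackNumber)
GROUP BY AlbumFK, AlbumName;
""")

album_name, tracks = conn.execute(
    "SELECT AlbumName, Tracks FROM AlbumTrackList").fetchone()
print(album_name, tracks)
```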
The conventional approach would be to have one table with a row for each track (with any metadata), another table with a row for each album, and a third table that records which tracks are on which album(s) and in which order.
Use two tables, one for albums, and one for tracks.
Album
-----
Id
Name
Artist
etc...
Track
-----
Id
AlbumId(Foreign Key to Album Table)
Name
URL
You could also augment this with a third table that joins the TrackId and AlbumId fields (in which case the Track table would not have an AlbumId column). The advantage of this second approach is that it lets you reuse a recording when it appears on many albums (such as compilations).
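A minimal sketch of that third-table variant, assuming SQLite in place of MySQL. The AlbumTrack table and its Position column are hypothetical names. Track no longer carries an AlbumId, and one recording can appear on several albums at different positions. The UNIQUE constraint keeps each position on an album unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Album (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Artist TEXT);
-- No AlbumId here: a recording exists independently of any album.
CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, URL TEXT);
-- Junction table: which track appears on which album, and at which position.
CREATE TABLE AlbumTrack (
    AlbumId INTEGER NOT NULL REFERENCES Album (Id),
    TrackId INTEGER NOT NULL REFERENCES Track (Id),
    Position INTEGER NOT NULL,
    PRIMARY KEY (AlbumId, TrackId),
    UNIQUE (AlbumId, Position));
INSERT INTO Album VALUES (1, 'Debut', 'Band'), (2, 'Greatest Hits', 'Various');
INSERT INTO Track VALUES (1, 'Hit Song', 'https://example.com/t/1'),
                         (2, 'B-Side', 'https://example.com/t/2');
-- 'Hit Song' is track 1 on the original album and track 5 on the compilation.
INSERT INTO AlbumTrack VALUES (1, 1, 1), (1, 2, 2), (2, 1, 5);
""")

appearances = conn.execute("""
    SELECT a.Name, j.Position
    FROM AlbumTrack j
    JOIN Album a ON a.Id = j.AlbumId
    WHERE j.TrackId = 1
    ORDER BY a.Id
""").fetchall()
print(appearances)  # [('Debut', 1), ('Greatest Hits', 5)]
```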
The Wikipedia article on Database Normalization makes a reasonable effort to explain the purpose of normalization ... and the sorts of anomalies that the normalization rules are intended to prevent.