I am setting up a database where I have movies, directors, camera operator, composer of score etc.
Now for each movie there is a row with title, description etc. Also, there should be a column for the director, composer and so forth.
I do not want to repeat the name of a director for every row in the film table, so I put the directors in a different table, the composers in a different table and so on.
In the film table, I then reference with a foreign key to to data of the director, composer etc.
To get ALL data of a movie, I would create a view where I join all the tables together so that a single MySQL query would give me the human readable information of a movie.
Is this a good way to do this? Or are several left joins not a good idea?
The view would look like this (kindof)
SELECT `movies`.`title` as `title`, `movies`.`description` as [...],
`directors`.`name` as `name` [...], `composers`.`name` as `cname` [...]
from
`movies`
left join
`directors` on `movies`.`directors_id`=`directors`.`id`
left join
`movies`.`musicians_id` = `musicians`.`id` [...]
Would that be an efficient way to do it?
As commented by Gordon Linoff, this is called normalization and in general is a good way of storing the data.
You certainly need LEFT joins in your way of defining the tables. If a LEFT join is replaced with INNER JOIN, then there will be no row at all in the result for a movie that has no musician or no director.
However, your proposed normalization will end up with many tables for the different creative roles in the film industry. Even to cover the major creative roles for which there are Oscars and other awards, you will need ten or more tables.
A common way to avoid this is to have a single table for movies and a single table for creatives. A third table is used to connect the two in different roles. For example:
TABLE CREATIVE_ROLES_LINK ( LIKN_ID INT, MOVIE_ID INT, ROLE_NAME VARCHAR(30), PERSON_ID INT);
The above shows the basic idea. In practice, even more normalization can be done with ROLE_NAME stored in another table.
Related
I'm very new to SQL, so please bear with me.
I've built a movie database and I'm trying to query it so that all my tables display properly.
I have a movies table with the columns movieID, title, releaseYear, directorID, genreID, and actorID.
Inside the table director, I have directorID and Director.
Using the query SELECT * FROM movies INNER JOIN director ON director.directorID = movies.directorID;, I'm able to get everything in tables movies and director to display (which isn't exactly what I want, but it's in the right track).
My remaining tables are actor, (with actorID and actor's names) starring (with starringID, movieID, and actorID), genre (with genreID and 22 different genres), and moviegenres (with moviegenresID, moviesID, and genreID).
I'm a bit lost and I apologize if this is confusing and messy, but I'm thinking I need to query the database so that all the tables show the data and are associated with the correct column. For example, most movies have multiple genres and actors, which is why I separated them into tables of their own.
I can't figure out how to query everything to display properly in the result grid.
Thanks in advance
i am trying to implement a database which has multi valued attributes and create a filter based search. For example i want my people_table to contain id, name, address, hobbies, interests (hobbies and interests are multi-valued). The user will be able to check many attributes and sql will return only those who have all of them.
I made my study and i found some ways to implement this but i can't decide which one is the best.
The first one is to have one table with the basic info of people (id, name, address), two more for the multi-valued attributes and one more which contains only the keys of the other tables (i understand how to create this tables, i don't know yet how to implement the search).
The second one is to have one table with the basic info and then one for each attribute. So i will have 20 or more tables (football, paint, golf, music, hiking etc.) which they only contain the ids of the people. Then when the user checks the hobbies and the activities i am going to get the desired results with the use of the JOIN feature (i am not sure about the complexity, so i don't know how fast is going to be if the user do many checks).
The last one is an implementation that i didn't find on internet (and i know there is a reason :) ) but in my mind is the easiest to implement and the fastest in terms of complexity. Use only one table which will have the basic infos as normal and also all the attributes as boolean variables. So if i have 1000 people in my table there are going to be only 1000 loops and which i imagine with the use of AND condition are going to be fast enough.
So my question is: can i use the the third implementation or there is a big disadvantage that i don't get? And also which one of the first two ways do you suggest me to use?
That is a typical n to m relation. It works like this
persons table
------------
id
name
address
interests table
---------------
id
name
person_interests table
----------------------
person_id
interest_id
person_interests contains a record for each interest of a person. To get the interests of a person do:
select i.name
from interests i
join person_interests pi on pi.interest_id = i.id
join persons p on pi.person_id = p.id
where p.name = 'peter'
You could create also tables for hobbies. To get the hobbies do the same in a separate query. To get both in one query you can do something like this
select p.id, p.name,
i.name as interest,
h.name as hobby
from persons p
left join person_interests pi on pi.person_id = p.id
left join interests i on pi.interest_id = i.id
left join person_hobbies ph on ph.person_id = p.id
left join hobbies h on ph.hobby_id = h.id
where p.name = 'peter'
The basic way to deal with this is with a many-to-many join table. Each user can have many hobbies. Each hobby can have many users. That's basic stuff you can find information about anywhere, and #juergend already covered that.
The harder part is tracking different information about various hobbies and interests. Like if their hobby is "baseball" you might want to track what position they play, but if their hobby is "travel" you might want to track their favorite countries. Doing this with typical SQL relationships will lead to a rapid proliferation of tables and columns.
A hybrid approach is to use the new JSON data type to store some unstructured data. To expand on #juergend's example, you might add a field to Person_Interests which can store some of those details about that person's interest.
create table Person_Interests (
InterestID integer references Interests(ID),
PersonID integer references Persons(ID),
Details JSON
);
And now you could add that Person 45 has Interest 12 (travel), their favorite country is Djibouti, and they've been to 45 countries.
insert into person_interests
(InterestID, PersonID, Details)
(12, 45, '{"favorite_country": "Djibouti", "countries_visited": 45}');
And you can use JSON search functions to find, for example, everyone whose favorite country is Djibouti.
select p.id, p.name
from person_interests pi
join persons p on p.id = pi.personid
where pi.details->"$.favorite_country" = "Djibouti"
The advantage here is flexibility: interests and their attributes aren't limited by your database schema.
The disadvantages is performance. The JSON data type isn't the most efficient, and indexing a JSON column in MySQL is complicated. Good indexing is critical to good SQL performance. So as you figure out common patterns you might want to turn commonly used attributes into real columns in real tables.
The other option would be to use table inheritance. This is a feature of Postgres, not MySQL, and I'd recommend considering switching. Postgres also has better and more mature JSON support and JSON columns are easier to index.
With table inheritance, rather than having to write a completely new table for every different interest, you can make specific tables which inherit from a more generic one.
create table person_interests_travel (
FavoriteCountry text,
CountriesVisited text[]
) inherits(person_interests);
This still has InterestID, PersonID, and Details, but it's added some specific columns for tracking their favorite country and countries they've visited.
Note that text[]. Postgresql also supports arrays so you can store real lists without having to create another join table. You can also do this in MySQL with a JSON field, but arrays offer type constraints that JSON does not.
I've got 2 tables - dishes and ingredients:
in Dishes, I've got a list of pizza dishes, ordered as such:
In Ingredients, I've got a list of all the different ingredients for all the dishes, ordered as such:
I want to be able to list all the names of all the ingredients of each dish alongside each dish's name.
I've written this query that does not replace the ingredient ids with names as it should, instead opting to return an empty set - please explain what it that I'm doing wrong:
SELECT dishes.name, ingredients.name, ingredients.id
FROM dishes
INNER JOIN ingredients
ON dishes.ingredient_1=ingredients.id,dishes.ingredient_2=ingredients.id,dishes.ingredient_3=ingredients.id,dishes.ingredient_4=ingredients.id,dishes.ingredient_5=ingredients.id,dishes.ingredient_6=ingredients.id, dishes.ingredient_7=ingredients.id,dishes.ingredient_8=ingredients.id;
It would be great if you could refer to:
The logic of the DB structuring - am I doing it correctly?
The logic behind the SQL query - if the DB is built in the right fashion, then why upon executing the query I get the empty set?
If you've encountered such a problem before - one that requires a single-to-many relationship - how did you solved it in a way different than this, using PHP & MySQL?
Disregard The Text In Hebrew - Treat It As Your Own Language.
It seems to me that a better Database Structure would have a Dishes_Ingredients_Rel table, rather than having a bunch of columns for Ingredients.
DISHES_INGREDIENTS_REL
DishesID
IngredientID
Then, you could just do a much simpler JOIN.
SELECT Ingredients.Name
FROM Dishes_Ingredients_Rel
INNER JOIN Ingredients
ON Dishes_Ingredients.IngredientID = Ingredients.IngredientID
WHERE Dishes_Ingredients_Rel.DishesID = #DishesID
1. The logic of the DB structuring - am I doing it correctly?
This is denormalized data. To normalize it, you would restructure your database into three tables:
Pizza
PizzaIngredients
Ingredients
Pizza would have ID, name, and type where ID is the primary key.
PizzaIngredients would have PizzaId and IngredientId (this is a many-many table where the primary key is a composite key of PizzaId and IngredientID)
Ingredients has ID and name where ID is the primary key.
2. List all the names of all the ingredients of each dish alongside each dish's name. Something like this in MySQL (untested):
SELECT p.ID, p.name, GROUP_CONCAT(i.name) AS ingredients
FROM pizza p
INNER JOIN pizzaingredients pi ON p.ID = pi.PizzaID
INNER JOIN ingredients i ON pi.IngredientID = i.ID
GROUP BY p.id
3. If you've encountered such a problem before - one that requires a single-to-many relationship - how did you solved it in a way different than this, using PHP & MySQL?
Using a many-many relationship, since that what your example truly is. You have many pizzas which can have many ingredients. And many ingredients belong to many different pizzas.
The reason you are getting an empty result is because you are setting a join condition that never gets satisfied. During the INNER join execution the database engine compares each record of the first table with each record of the second one trying to find a match where the id of the ingredient table record being evaluated is equal to ingredient1 AND ingredient2 AND so on. It would return some result if you create a record in the first table with the same ingredient in all 8 columns (testing purposes only).
Regarding the database structure, you choose a denormalized one creating 8 columns for each ingredient. There are a lot of considerations possible on this data structure (performance, maintainability, or just think if you are asked to insert a dish with 9 ingredients for example) and I would personally go for a normalized data structure instead.
But if you want to keep this, you should write something like:
SELECT dishes.name, ingredients1.name, ingredients1.id, ingredients2.name, ingredients2.id, ...
FROM dishes
LEFT JOIN ingredients AS ingredients1 ON dishes.ingredient_1=ingredients1.id
LEFT JOIN ingredients AS ingredients2 ON dishes.ingredient_2=ingredients2.id
LEFT JOIN ingredients AS ingredients3 ON dishes.ingredient_3=ingredients3.id
...
The LEFT join is required to get a result for unmatched ingredients (0 value when no ingredient is set reading your example)
To make you understand my question I'll give you an example:
I have a chat web app with many rooms, let's say 5 rooms.
People can choose to stay only in one room and they choose it at login.
When they choose the room I have to retrieve the people already in the room, so I can structure my db in two ways:
each room one table with the people being records;
all the rooms in one table, people are the records and a column indicating the room they are in;
In the first case the query would be:
SELECT * FROM 'room_2' WHERE 1
In the second case the query would be:
SELECT * FROM 'rooms' WHERE room = 'room_2'
Which is the best?
I think the only parameter to consider is performance, right?
In this example, no, because people are all 'like' objects and should therefore be in the same table.
All people and rooms in one table with a primary key on people, in this simple example.
Table Rooms(pk_person, personName, table_id)
But I want to talk about a structure that you will want to consider as your website grows. You’ll want three tables, one for each object (chat rooms, people) and one for the relationships.
Chat_Rooms(pk_ChatId, ChatName, MaxOccupants, other unique attributes of a chat room)
People(pk_PersonID, FirstName, LastName, other unique attributes of a person)
Room_People_Join(pk_JoinId, fk_ChatId, fk_PersonID, EnterDateTime, ExitDateTime)
This is a “highly normalized” structure. Each table is a collection of like objects, the join allows for many to many relationships, and object rows are not duplicated. So, a Person with all their attributes (name, gender, age) is never duplicated in the person table. Also, the person table never defines which chat rooms a person is in, because a person could be in one, many, none, or may have entered and exit multiple times. The same concept applies to a chat room. A chat rooms features, such as background color, max occupants, etc. have nothing to do with people.
The Room_People_Join is the important one. This has a unique primary key for which chat rooms a person is in and when they were there. This table grows indefinitely, but it tracks usage. Including the relationship table is what logically normalizes your database.
So how do you know which users are currently in chat room 1? You join your people and rooms to the join table with their respective Primary and Foreign keys in your FROM clause, ask for the columns you want in your SELECT clause, and filter for chat room 1 and people who haven’t yet left.
SELECT p.FirstName, p.LastName, r.ChatName
FROM Room_People_Join j
JOIN People p ON j.fk_PersonID = p.pk_PersonID
JOIN Chat_Rooms r ON j.fk_ChatId = r.pk_ChatId
WHERE r.ExitDateTime IS NOT NULL
AND pk_ChatId = 1
Sorry that’s long winded, but I extrapolated your question for database growth.
The answer is very simple and strongly recommended - one database table for all rooms for sure! What if you will later like to create rooms dynamically!? For sure you would not create new tables dynamically.
I'm wondering if this its even posible.
I want to join 2 tables based on the data of table 1.
Example table 1 has column food with its data beeing "hotdog".
And I have a table called hotdog.
IS it possible to do a JOIN like.
SELECT * FROM table1 t join t.food on id = foodid
I know it doesnt work but, its even posible, is there a work arround?.
Thanks in advance.
No, you can't join to a different table per row in table1, not even with dynamic SQL as #Cade Roux suggests.
You could join to the hotdog table for rows where food is 'hotdog' and join to other tables for other specific values of food.
SELECT * FROM table1 JOIN hotdog ON id = foodid WHERE food = 'hotdog'
UNION
SELECT * FROM table1 JOIN apples ON id = foodid WHERE food = 'apples'
UNION
SELECT * FROM table1 JOIN soups ON id = foodid WHERE food = 'soup'
UNION
...
This requires that you know all the distinct values of food, and that all the respective food tables have compatible columns so you can UNION them together.
What you're doing is called polymorphic associations. That is, the foreign key in table1 references rows in multiple "parent" tables, depending on the value in another column of table1. This is a common design mistake of relational database programmers.
For alternative solutions, see my answers to:
Possible to do a MySQL foreign key to one of two possible tables?
Why can you not have a foreign key in a polymorphic association?
I also cover solutions for polymorphic associations in my presentation Practical Object Oriented Models In SQL, and in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Only with dynamic SQL. It is also possible to left join many different tables and use CASE based on type, but the tables would be all have to be known in advance.
It would be easier to recommend an appropriate design if we knew more about what you are trying to achieve, what your design currently looks like and why you've chosen that particular table design in the first place.
-- Say you have a table of foods:
id INT
foodtype VARCHAR(50) (right now it just contains 'hotdog' or 'hamburger')
name VARCHAR(50)
-- Then hotdogs:
id INT
length INT
width INT
-- Then hamburgers:
id INT
radius INT
thickness INT
Normally I would recommend some system for constraining only one auxiliary table to exist, but for simplicity, I'm leaving that out.
SELECT f.*, hd.length, hd.width, hb.radius, hb.thickness
FROM foods f
LEFT JOIN hotdogs hd
ON hd.id = f.id
AND f.foodtype = 'hotdog'
LEFT JOIN hamburgers hb
ON hb.id = f.id
AND f.foodtype = 'hamburger'
Now you will see that such a thing can be code generated (or even for a very slow prototype dynamic SQL on the fly) from SELECT DISTINCT foodtype FROM foods given certain assumptions about table names and access to the table metadata.
The problem is that ultimately whoever consumes the result of this query will have to be aware of new columns showing up whenever a new table is added.
So the question moves back to your client/consumer of the data - how is it going to handle the different types? And what does it mean for different types to be in the same set? And if it needs to be aware of the different types, what's the drawback of just writing different queries for each type or changing a manual query when new types are added given the relative impact of such a change anyway?