Mysql finding duplicates


I have been given an assignment for school. Things have mostly been going well, but one query I must write has me stumped. Here is a description of the two tables:
Movie: MovieId [pk], Title, Year, DirectorCode [fk]
Director: DirectorCode [pk], Name
I have to find any directors that have remade their own movie, and display the title of the movie, the director's name, and the years of the first and second release.
Even if you don't want to give me the answer, I would be very grateful for some hints.
Thanks

Assume that the remake has a different MovieId but the same title. Therefore you can look for movies that have the same title and the same DirectorCode.
Using GROUP BY and keeping the groups where COUNT(title) > 1 gives you the DirectorCode and titles to search for. In a second query you can then pull the full info for both movies (the original and the remake), because that info is lost in the GROUP BY. Another option is to select MIN(year) and MAX(year) in the grouped query to get the first and second year directly.
The HAVING keyword is useful here, since it filters on a GROUP BY aggregate. It is part of standard SQL, not a MySQL extension.
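A minimal sketch of the GROUP BY / HAVING idea, using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and the director names are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Director (DirectorCode INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Movie (
    MovieId INTEGER PRIMARY KEY,
    Title TEXT,
    Year INTEGER,
    DirectorCode INTEGER REFERENCES Director(DirectorCode)
);
INSERT INTO Director VALUES (1, 'Director A'), (2, 'Director B');
INSERT INTO Movie VALUES
    (1, 'First Film', 1934, 1),
    (2, 'First Film', 1956, 1),   -- same title, same director: a remake
    (3, 'Other Film', 1960, 1),
    (4, 'Other Film', 1998, 2);   -- same title, different director
""")

# Group by director and title; HAVING keeps only groups with more than one movie.
remakes = conn.execute("""
    SELECT d.Name, m.Title, MIN(m.Year) AS first_release, MAX(m.Year) AS remake
    FROM Movie m
    JOIN Director d ON d.DirectorCode = m.DirectorCode
    GROUP BY d.DirectorCode, d.Name, m.Title
    HAVING COUNT(*) > 1
""").fetchall()

print(remakes)  # [('Director A', 'First Film', 1934, 1956)]
```

Note that MIN/MAX only gives the first and last year, so this form fits best when a movie is remade at most once.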

You don't need to use having or a subquery, or even GROUP. You can do this in one query as long as it is assumed that the movie and remake titles are identical. Since they are remakes, I assume the titles will be identical with different years (otherwise, how can you identify a remake? You would need another field).
SELECT
    name
  , m1.title
  , m1.year
  , m2.year AS remake
FROM
    Movie m1
    JOIN Director d USING (directorcode)
    JOIN Movie m2 ON (
        d.directorcode = m2.directorcode
        AND m1.title = m2.title
        AND m1.year < m2.year
    )
The inner joins from Movie to Director and from Director to Movie again ensure that you will only get results if the same director is on two movies. Then the titles are compared (this could also be done in the WHERE clause). For organizational purposes, m1.year is required to be less than m2.year (also possible in the WHERE clause). Otherwise, 'remake' could be the earlier one, and every pair would show up twice.
One thing to note is that if a director remakes a movie twice, you will get three rows. E.g. if they remake a 2009 movie in 2010 and 2011, you will get rows with (year, remake) = (2009, 2010), (2009, 2011), and (2010, 2011). From the context of the question, it seems like a director will only remake a movie once, though.
I tested this out and it will not show results for movies that have been remade by a different director or not at all. If two directors remake the same movie twice (that's three remakes, two from a different director) you will get both of those directors. I think this is desirable.
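To make the row counts concrete, here is a small sketch of the same self-join run against invented sample data, with Python's sqlite3 module standing in for MySQL. The join is written with ON rather than USING, which should behave the same way here; the movie titles and director names are hypothetical.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Director (directorcode INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Movie (movieid INTEGER PRIMARY KEY, title TEXT,
                    year INTEGER, directorcode INTEGER);
INSERT INTO Director VALUES (1, 'Director A'), (2, 'Director B');
INSERT INTO Movie VALUES
    (1, 'Twice Remade', 2009, 1),
    (2, 'Twice Remade', 2010, 1),
    (3, 'Twice Remade', 2011, 1),
    (4, 'Borrowed Title', 1980, 1),
    (5, 'Borrowed Title', 2005, 2);   -- remade by a different director
""")

rows = conn.execute("""
    SELECT d.name, m1.title, m1.year, m2.year AS remake
    FROM Movie m1
    JOIN Director d ON d.directorcode = m1.directorcode
    JOIN Movie m2
      ON d.directorcode = m2.directorcode
     AND m1.title = m2.title
     AND m1.year < m2.year
    ORDER BY m1.year, m2.year
""").fetchall()

for row in rows:
    print(row)
# ('Director A', 'Twice Remade', 2009, 2010)
# ('Director A', 'Twice Remade', 2009, 2011)
# ('Director A', 'Twice Remade', 2010, 2011)
```

The movie remade by a different director produces no row, and the movie remade twice produces three.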

Related

SQL giving me lines that doesn't exist?

While using this:
SELECT borrowbook.studentusername, borrowbook.schoolbookid,borrowbook.date,borrowbook.deadline, book.title, student.email, student.fname, student.lname
FROM borrowbook, book, student
I get many lines, but in my database I have just four lines in the borrowbook table, and with this query I get some "lines" that don't exist. (Note: this runs through PHP on a website; I cannot seem to make it work in MySQL directly, so I think I have done something wrong.)
For example, a person who had borrowed one book (line 1 in my list of borrowed books) suddenly has borrowed ten different books that I have not registered anyone as borrowing, with the loan date and deadline simply taken from one of the four lines I have registered.
Even a person who is registered as borrowing one book suddenly shows up as if they had borrowed it four times with different dates. Dates and deadlines are taken from borrowbook, while the names of other students are taken from another table, even though they have never been used in a borrowbook line.
I have now tried this in different ways, with different content and different tables, but I still get many "made up" loans that are not registered.
I know very little, but I am grateful for all help I can get. Articles help as well.

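Without join conditions, every row of one table is combined with every row of the others, which is where the duplicate records come from. It is also better practice to use explicit joins instead of implicit ones. Assuming you have student.username and schoolbook.id fields, and an isbn column on both schoolbook and book, you can do something like this: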
SELECT borrowbook.studentusername, borrowbook.schoolbookid,borrowbook.date,borrowbook.deadline,
book.title,
student.email, student.fname, student.lname
FROM borrowbook
INNER JOIN student ON borrowbook.studentusername=student.username
INNER JOIN schoolbook ON borrowbook.schoolbookid=schoolbook.id
INNER JOIN book ON schoolbook.isbn=book.isbn
;

You haven't specified any JOIN conditions in your query, and because of that tables will be CROSS JOIN-ed, i.e., every record from the borrowbook table is paired with every record from the book table which is then paired with every record from the student table. So if you have X, Y and Z number of records in each table respectively, you will get X * Y * Z records as a result.
You probably want to add join conditions such as (I'm just guessing column names):
SELECT borrowbook.studentusername, borrowbook.schoolbookid,borrowbook.date,borrowbook.deadline, book.title, student.email, student.fname, student.lname
FROM borrowbook, book, student
WHERE borrowbook.book_id = book.id and borrowbook.student_id = student.id
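The X * Y * Z effect is easy to check on a toy database. The sketch below uses Python's sqlite3 module in place of MySQL, with the guessed column names from the query above (book_id, student_id, id) and invented rows: 4 loans, 3 books, 5 students.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; column names are guesses, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE student (id INTEGER PRIMARY KEY, fname TEXT);
CREATE TABLE borrowbook (book_id INTEGER, student_id INTEGER, deadline TEXT);
INSERT INTO book VALUES (1, 'Book 1'), (2, 'Book 2'), (3, 'Book 3');
INSERT INTO student VALUES (1, 'S1'), (2, 'S2'), (3, 'S3'), (4, 'S4'), (5, 'S5');
INSERT INTO borrowbook VALUES
    (1, 1, '2024-01-10'), (2, 1, '2024-01-12'),
    (3, 2, '2024-01-15'), (1, 4, '2024-01-20');
""")

# No join condition: every loan is paired with every book and every student.
no_condition = conn.execute(
    "SELECT COUNT(*) FROM borrowbook, book, student"
).fetchone()[0]

# With join conditions: each loan is paired only with its own book and student.
with_condition = conn.execute("""
    SELECT COUNT(*)
    FROM borrowbook
    INNER JOIN book ON borrowbook.book_id = book.id
    INNER JOIN student ON borrowbook.student_id = student.id
""").fetchone()[0]

print(no_condition, with_condition)  # 60 4
```

The 60 rows are 4 * 3 * 5, which matches the "made up" loans described in the question; with the conditions in place only the 4 real loans remain.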

SQL query with distinct values

I have the two following schemes:
Movies[title, year, director, country, rating, genre, gross, producer]
and
Actors[title, year, characterName, actor]
Now I have the following exercise
Find character names that appeared in two movies produced in different countries.
My idea was the following which doesn't really work:
SELECT characterName
FROM Actors a
JOIN Movies m
ON a.title=m.title
AND a.year=m.year
WHERE COUNT(m.title)=2
AND COUNT(DISTINCT(m.country)=2
GROUP BY m.title;
My idea was to obviously select the characterName and join both tables on title and year because they are unique values in combination. Then my plan was to get the movies that are unique (by grouping them) and find the ones with a count of 2 since we want two movies. I hope that I am right till now.
Now I have my problems, because I don't really know how to evaluate if the movies played in two different locations.
I want to somehow make sure that they play in different countries.

You are on the right track. Here is a fixed version of your original query, that should get you the results that you expect:
select a.characterName
from actors a
inner join movies m
on m.title = a.title
and m.year = a.year
group by a.characterName
having count(distinct m.country) >= 2
Notes on your design:
It seems like you are using (title, year) as the primary key for the movies table. This does not look like a good design (what if two movies with the same title are produced in the same year?). You would be better off with an identity column (in MySQL, an auto-incremented primary key) that you would reference as a foreign key in the actors table.
Better yet, you would probably want a separate table to store the master data of the actors, and a junction table, say appearances, that records which actor played which character in which movie.
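For what it's worth, the corrected query can be tried on a tiny data set. This is only a sketch: Python's sqlite3 module stands in for MySQL, the tables are reduced to the columns the query needs, and the films, characters and actors are invented.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; only the relevant columns, invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (title TEXT, year INTEGER, country TEXT);
CREATE TABLE Actors (title TEXT, year INTEGER, characterName TEXT, actor TEXT);
INSERT INTO Movies VALUES
    ('Film A', 2001, 'USA'),
    ('Film B', 2003, 'France'),
    ('Film C', 2005, 'USA');
INSERT INTO Actors VALUES
    ('Film A', 2001, 'Hero', 'Actor 1'),
    ('Film B', 2003, 'Hero', 'Actor 2'),       -- Hero: USA and France
    ('Film A', 2001, 'Sidekick', 'Actor 3'),
    ('Film C', 2005, 'Sidekick', 'Actor 3');   -- Sidekick: USA twice
""")

# Group per character and count the distinct countries of the movies they appear in.
characters = conn.execute("""
    SELECT a.characterName
    FROM Actors a
    INNER JOIN Movies m ON m.title = a.title AND m.year = a.year
    GROUP BY a.characterName
    HAVING COUNT(DISTINCT m.country) >= 2
""").fetchall()

print(characters)  # [('Hero',)]
```

'Sidekick' appears in two movies but both are from the same country, so only 'Hero' qualifies. Aggregates such as COUNT belong in HAVING, which is evaluated after grouping, rather than in WHERE.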

MySQL Querying a movie database

I'm very new to SQL, so please bear with me.
I've built a movie database and I'm trying to query it so that all my tables display properly.
I have a movies table with the columns movieID, title, releaseYear, directorID, genreID, and actorID.
Inside the table director, I have directorID and Director.
Using the query SELECT * FROM movies INNER JOIN director ON director.directorID = movies.directorID;, I'm able to get everything in the tables movies and director to display (which isn't exactly what I want, but it's on the right track).
My remaining tables are actor, (with actorID and actor's names) starring (with starringID, movieID, and actorID), genre (with genreID and 22 different genres), and moviegenres (with moviegenresID, moviesID, and genreID).
I'm a bit lost and I apologize if this is confusing and messy, but I'm thinking I need to query the database so that all the tables show the data and are associated with the correct column. For example, most movies have multiple genres and actors, which is why I separated them into tables of their own.
I can't figure out how to query everything to display properly in the result grid.
Thanks in advance

SQL Outer Join - improper execution

I am learning SQL. I have taken the basics course on Pluralsight, and now I am using MySQL through Treehouse, with dummy databases they've set up on their MySQL server. Once my training is complete I will be using SQL Server daily at work.
I ran into a two-part challenge yesterday that I had some trouble with.
The first question in the challenge was:
"We have a 'movies' table with a 'title' and 'genre_id' column and a
'genres' table which has an 'id' and 'name' column. Use an INNER JOIN
to join the 'movies' and 'genres' tables together only selecting the
movie 'title' first and the genre 'name' second."
Understanding how to properly set up JOINS has been a little confusing for me, because the concepts seem simple but like in cooking, execution is everything ---and I'm doing it wrong. I was able to figure this one out after some trial and error, work, and rewatching the Treehouse explanation a few times; here is how I solved the first question, with a Treehouse-accepted answer:
SELECT movies.title, genres.name FROM movies INNER JOIN genres ON movies.genre_id = genres.id;
--BUT--
The next question of the challenge I have not been so successful with, and I'm not sure where I'm going wrong. I would really like to get better with JOINS, and picking the brains of all you smartypantses is the best way I can think of to get an explanation for this specific (and I'm sure, pitifully simple for you guys) problem. Thanks for your help, here's where I'm stumped:
"Like before, bring back the movie 'title' and genre 'name' but use
the correct OUTER JOIN to bring back all movies, regardless of whether
the 'genre_id' is set or not."
This is the closest (?) solution that I've come up with, but I'm clearly doing something (maybe a lot) wrong here:
SELECT movies.title, genres.name FROM movies LEFT OUTER JOIN genres ON genres.id;
I had initially tried this (below) but when it didn't work, I decided to cut out the last portion of the statement, since it's mentioned in the requirement criteria that I need a dataset that doesn't care if genre_id is set in the movies table or not:
SELECT movies.title, genres.name FROM movies LEFT OUTER JOIN genres ON movies.genre_id = genres.id;
I know this is total noob stuff, but like I said, I'm learning, and the questions I researched on Stack and on the Internet at large were not necessarily geared for the same problem. I am very grateful to have your expertise and help to draw on. Thank you for taking the time to read this and help out if you choose to do so!

Your solution is correct:
SELECT movies.title, genres.name
FROM movies
LEFT OUTER JOIN genres ON movies.genre_id = genres.id
This is my interpretation:
When you say "left join" or "left outer join", it's not that you don't care whether genre_id is set in the movies table.
Rather, you want the genre of each movie to be shown, and for records where genre_id is not set you still want the movie shown [with genre = NULL for those records].
In general, a left join gives you:
all the records of the left table, each with its corresponding records in the other table if there are any, otherwise with NULL.
In your example, these two sets of records will be shown:
1- All the movies which have been assigned a genre
(giving movies.title, genres.name)
2- All other movies [which do not have a genre, i.e., genre_id is NULL]
(giving movies.title, NULL)
Example (with left join), assuming a title can occur on several rows of movies, one per genre:
Title, Genre
--------------
Movie1, Comedy
Movie1, Drama
Movie1, Family
Movie2, NULL
Movie3, Comedy
Movie3, Drama
Movie4, Comedy
Movie5, NULL
Example (with inner join):
Title, Genre
--------------
Movie1, Comedy
Movie1, Drama
Movie1, Family
Movie3, Comedy
Movie3, Drama
Movie4, Comedy
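The difference between the two joins can be reproduced with a few rows. This is a minimal sketch using Python's sqlite3 module in place of the Treehouse MySQL database; the table and column names come from the challenge text, the rows are invented, and here each movie has at most one genre.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies (title TEXT, genre_id INTEGER);
INSERT INTO genres VALUES (1, 'Comedy'), (2, 'Drama');
INSERT INTO movies VALUES
    ('Movie1', 1),
    ('Movie2', NULL),   -- no genre set
    ('Movie3', 2);
""")

inner_rows = conn.execute("""
    SELECT movies.title, genres.name
    FROM movies
    INNER JOIN genres ON movies.genre_id = genres.id
    ORDER BY movies.title
""").fetchall()

left_rows = conn.execute("""
    SELECT movies.title, genres.name
    FROM movies
    LEFT OUTER JOIN genres ON movies.genre_id = genres.id
    ORDER BY movies.title
""").fetchall()

print(inner_rows)  # [('Movie1', 'Comedy'), ('Movie3', 'Drama')]
print(left_rows)   # [('Movie1', 'Comedy'), ('Movie2', None), ('Movie3', 'Drama')]
```

Python shows SQL NULL as None. The ON condition is identical in both queries; only the join type decides whether Movie2 survives.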

Your specific question has already been answered, though.
I'd like to add another perspective on JOIN that I think will help you understand how to use it in the future (after that, I also recommend you follow this link: SQL JOINS).
This perspective is through the DB's eyes, which is "dumb" and can't guess what you really want it to do for you.
I hope it helps and won't confuse you too much:
Let's first understand what a join does (without using any SQL script), and then we'll understand better how to use it.
Say this is a movie list:
Armageddon
Batman
Cinderella
and a list of genres:
Action
Fantasy
Western
When you join both tables, the DB creates a new table in which, for each row in the movies table, you get all possible rows in the genres table, like this:
Armageddon <-> Action
Armageddon <-> Fantasy
Armageddon <-> Western
Batman <-> Action
Batman <-> Fantasy
Batman <-> Western
Cinderella <-> Action
Cinderella <-> Fantasy
Cinderella <-> Western
You can also see that the NEW table has 3*3 rows ([table 1 row count] multiplied by [table 2 row count]). Can you explain to yourself why? If so, let's continue to our second step...
In your DB, you keep track of which movie is which genre (identifying the genre by its id), so let's talk about NEW tables that look like this and hold info about each movie's genre:
1 - Armageddon - 1
2 - Armageddon - 2
4 - Batman - 1
5 - Batman - 2
6 - Batman - 3
7 - Cinderella - 2
And the genre:
1 - Action
2 - Fantasy
3 - Western
As we've just explained, joining both tables will get you... 18 rows (6*3=18. Why? Because for each row in the movies table, you get all possible rows from the genres table). I won't write out those 18 rows; I hope you get the point...
Each time you call a join (it doesn't matter which kind: LEFT/RIGHT/OUTER/INNER), the DB conceptually creates a new table with all possible combinations ([table 1 row count] multiplied by [table 2 row count]). Now you're probably thinking: how does the DB erase the rows I don't want?
First, you define an ON condition. You tell your DB: "please mark for me all rows that meet my condition: movies.genre_id = genres.id (but don't drop any unmarked rows yet!!!)".
Second, you tell your DB which kind of rows you want to drop (or edit!!!): this is where the JOIN kind comes in, which is a bit tricky.
INNER JOIN is easy to understand: you just tell the DB "drop all rows that don't meet my condition: movies.genre_id = genres.id" (and of course show me the updated table after you've dropped the rows I don't need).
LEFT/RIGHT JOINs are more complicated. Let's start with LEFT JOIN. You're telling your DB: "in case a row from my 1st table doesn't match my condition movies.genre_id = genres.id with any row from my 2nd table, mark the RIGHT part of the row (meaning the columns that come from my 2nd table) as null, AND LEAVE THE ROW."
That way, I know that this row in table1 doesn't have a matching row in table2.
In RIGHT JOIN, it's the opposite: you tell the DB that if a row from your 2nd table has no match, it should mark the LEFT side with null.
FULL JOIN tells your DB: "keep the unmatched rows from both sides: those from table1 with their RIGHT part marked as null, and those from table2 with their LEFT part marked as null" (it's a bit complicated to understand why the heck you'd need that, and you'll hardly need FULL JOIN in your first steps, so drop it for now).
In conclusion, my advice for you when you design your JOIN query:
First, understand what YOU want to get; see the illustration in the answer: SQL JOINS.
Then comes the part where you need to explain to your DB what it should do:
first, tell it which rows it should mark,
then, tell it which rows it should drop/edit.
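The "mark, then drop or edit" picture can be sketched in plain Python without any database. This is only a model of the mental picture described in this answer, not how a real engine executes a join. The first six movie rows are the ones listed earlier; the seventh, without a genre, is an invented row added to show the LEFT JOIN case, so the full combination table has 7*3 = 21 rows rather than 18.

```python
from itertools import product

# (title, genre_id) rows from the example, plus one invented movie with no genre.
movies = [
    ("Armageddon", 1), ("Armageddon", 2),
    ("Batman", 1), ("Batman", 2), ("Batman", 3),
    ("Cinderella", 2),
    ("Untitled", None),
]
genres = [(1, "Action"), (2, "Fantasy"), (3, "Western")]

# Step 0: the NEW table, every movie row paired with every genre row.
pairs = list(product(movies, genres))

def matches(movie, genre):
    """The ON condition: movies.genre_id = genres.id."""
    return movie[1] == genre[0]

# INNER JOIN: drop every pair that was not marked by the ON condition.
inner = [(m[0], g[1]) for m, g in pairs if matches(m, g)]

# LEFT JOIN: additionally keep each movie that was never marked,
# with None (NULL) in place of the right-hand columns.
matched_movies = {m for m, g in pairs if matches(m, g)}
left = inner + [(m[0], None) for m in movies if m not in matched_movies]

print(len(pairs), len(inner), len(left))  # 21 6 7
print(left[-1])                           # ('Untitled', None)
```

The only difference between the two results is the final row, which is exactly the unmatched left-table row that LEFT JOIN is asked to keep.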

Need support to properly query my DB when a lot different data are coming into play

Suppose you have this schema:
Of course it's over-simplified in this example, just pretend you have a collection of users, that are described with a lot of different tables like the ones drawn here.
You can assume that:
Any user has a name
Any user can speak one or more languages
Any user can own one or more titles or certifications
Any user can have one or more experiences
Suppose that you have to show this large amount of data to a third user, who needs to nimbly access, search and administer it with ease.
What is done:
My first approach was to show just the most relevant info for each user, to provide a clean first interface the admin could start from, and to allow him to filter the records shown using all the data available in the DB.
To cut a long story short, the admin can (or should be able to) "display all male users that speak English and worked for IBM" on screen, while seeing just a clean and simplified list of records that he'll be able to examine further in a different way if he needs to.
What my query looks like:
SELECT
    users.id as id,
    name,
    surname,
    etc,
    certificazioni.title as certifications,
    lingue.language as language,
    esperienze.company
FROM users
LEFT JOIN lingue ON users.id = lingue.iduser
LEFT JOIN certificazioni ON users.id = certificazioni.iduser
LEFT JOIN esperienze ON users.id = esperienze.iduser
GROUP BY users.id
ORDER BY users.id
I built an interface that, given some user input, is able to append conditions to this query, like this:
WHERE language = 'English' AND Sex = 'm'
Now the problem:
With this query I'm able to find out whether a certain user speaks English, is male, and so on, but it fails to find a user that speaks both English and Dutch, to name one example.
Why?
(From my point of view) it's because I'm failing to find a good approach, AND because the relations between users and the other tables are one-to-many in most cases, which causes the output of this query to look like this:
without GROUP BY
ID NAME SEX LANGUAGE COMPANY
-----------------------------
12 Alamo M English IBM
12 Alamo M Italian NBA
12 Alamo M Dutch NULL
12 Alamo M French NULL
(A combination of every different value of each language, experience and so on)
with GROUP BY
ID NAME SEX LANGUAGE COMPANY
-----------------------------
12 Alamo M Italian NBA
(These are of course flattened by the GROUP BY, with an evident loss of information)
Now the requirement:
My need is to find a different approach to this problem that respects the limits imposed on me and still allows me to query my DB efficiently in most cases.
I'm also uploading a screenshot of the platform, to better describe what kind of user input I expect:

One approach would be to use a subquery for each condition:
SELECT *
FROM users
WHERE
users.id IN (SELECT iduser FROM lingue WHERE language = 'English')
AND
users.id IN (SELECT iduser FROM lingue WHERE language = 'Dutch')
AND
...
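A quick sketch of the subquery-per-condition approach on sample data, with Python's sqlite3 module standing in for MySQL. The foreign key is assumed to be lingue.iduser, as in the question's own query, and the users and their languages are invented.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; iduser is assumed from the question's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE lingue (iduser INTEGER, language TEXT);
INSERT INTO users VALUES (12, 'Alamo'), (13, 'Bravo'), (14, 'Charlie');
INSERT INTO lingue VALUES
    (12, 'English'), (12, 'Italian'), (12, 'Dutch'), (12, 'French'),
    (13, 'English'),
    (14, 'Dutch');
""")

# One IN subquery per required language; a user must pass all of them.
both = conn.execute("""
    SELECT id, name
    FROM users
    WHERE id IN (SELECT iduser FROM lingue WHERE language = 'English')
      AND id IN (SELECT iduser FROM lingue WHERE language = 'Dutch')
    ORDER BY id
""").fetchall()

print(both)  # [(12, 'Alamo')]
```

Because each condition is checked against the lingue table independently, the one-to-many relation no longer gets in the way, and an interface can append one such clause per filter the admin selects.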

This is not trivial. For each language/skill/certification condition you add to a query you need to (inner) join another table. Note that when the same table appears more than once, each instance must have its own alias, and you can use group_concat to put multiple input row values on the same output row, e.g.
Select u.name, group_concat(c.title)
From users u
Inner join lingue l1
On u.id=l1.iduser
And l1.language='Italian'
Inner join lingue l2
On l1.iduser=l2.iduser
And l2.language='French'
Inner join certificazioni c
On u.id=c.iduser
Group by u.id, u.name
This lists the names and qualifications of people who speak both French and Italian and have at least one qualification. But you may find it simpler to denormalise the database into one or two tables and use full text search.
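Here is a small runnable sketch of the aliased self-join with group_concat, using Python's sqlite3 module instead of MySQL (SQLite also provides group_concat). The iduser column is assumed from the question's query, and the users, languages and certificate titles are invented.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; iduser is assumed, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE lingue (iduser INTEGER, language TEXT);
CREATE TABLE certificazioni (iduser INTEGER, title TEXT);
INSERT INTO users VALUES (12, 'Alamo'), (13, 'Bravo'), (14, 'Charlie');
INSERT INTO lingue VALUES
    (12, 'Italian'), (12, 'French'),
    (13, 'Italian'),
    (14, 'Italian'), (14, 'French');
INSERT INTO certificazioni VALUES (12, 'Cert A'), (12, 'Cert B'), (13, 'Cert C');
""")

# lingue is joined twice, once per required language, each time under its own alias.
rows = conn.execute("""
    SELECT u.name, group_concat(c.title)
    FROM users u
    INNER JOIN lingue l1 ON u.id = l1.iduser AND l1.language = 'Italian'
    INNER JOIN lingue l2 ON l1.iduser = l2.iduser AND l2.language = 'French'
    INNER JOIN certificazioni c ON u.id = c.iduser
    GROUP BY u.id, u.name
""").fetchall()

for name, titles in rows:
    # The order of concatenated titles is not guaranteed, so sort for display.
    print(name, sorted(titles.split(",")))
```

Only the user who speaks both languages and holds at least one certificate comes back: the second user lacks French, and the third has no certificate, so the inner joins drop them.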
