Selecting rows from one table using values from another table in MySQL

I currently have two MySQL tables in my database: Film and Film_Ratings_Report.
The primary key for Film is filmid, which is also used to identify the film ratings in the Film_Ratings_Report table.
I would like to know whether it is possible, using only a MySQL query, to search the ratings table, collect all film IDs that fit a certain criterion, and then use those IDs to get the film titles from the Film table. Below is the MySQL query I'm using, which isn't working:
SELECT *
FROM film
UNION SELECT filmid
FROM film_rating_report
WHERE rating = 'GE'
LIMIT 0,0
I am relatively green to MYSQL and would appreciate any help on this.
Thanks in Advance

SELECT * FROM film WHERE filmid IN
(SELECT filmid FROM film_rating_report WHERE rating = 'GE');
should work

It seems you want a semi-join, i.e. a join where only data from one of the two joined tables is needed. In this case, you want all rows from film for which there is a matching row in film_rating_report that satisfies the wanted condition (rating = 'GE').
This is not exactly equivalent to a usual join: even if there are two (or more) matching rows in the second table (two ratings of a film, both 'GE'), we still want the film to be shown once, not twice (or more) as it would be with a usual join.
There are various ways to write a semi-join; the most popular are:
using an EXISTS correlated subquery (#Justin's answer):
SELECT t1.*
FROM film t1
WHERE EXISTS (SELECT filmid
FROM film_rating_report t2
WHERE t2.rating = 'GE'
AND t2.filmid = t1.filmid);
using an IN (uncorrelated) subquery (#SG 86's answer):
(the usual caution about nullable columns applies to the negated form: NOT IN returns no rows at all if the subquery yields a NULL. Plain IN, as used here, simply never matches a NULL filmid, so it returns the same rows as the EXISTS version)
SELECT *
FROM film
WHERE filmid IN
( SELECT filmid
FROM film_rating_report
WHERE rating = 'GE'
);
using a usual JOIN with a GROUP BY to avoid the duplicate rows in the results (#Tomas' answer):
(and note that this specific use of GROUP BY works in MySQL only and in recent versions of Postgres, if you ever want to write a similar query in other DBMS, you'll have to include all columns: GROUP BY f.filmid, f.title, f.director, ...)
SELECT f.*
FROM film AS f
JOIN film_rating_report AS frr
ON f.filmid = frr.filmid
WHERE frr.rating = 'GE'
GROUP BY f.filmid ;
A variation on #Tomas's answer, where the GROUP BY is done in a derived table before the JOIN:
SELECT f.*
FROM film AS f
JOIN
( SELECT filmid
FROM film_rating_report
WHERE rating = 'GE'
GROUP BY filmid
) AS frr
ON f.filmid = frr.filmid ;
Which one to use depends on the RDBMS and the specific version you are using (for example, IN subqueries should be avoided in most versions of MySQL as they may produce inefficient execution plans), as well as on your table sizes, data distribution, indexes, etc.
I usually prefer the EXISTS solution but it never hurts to first test the various queries with the table sizes you have or expect to have in the future and try to find the best query-indexes combination for your case.
Addition: if there is a unique constraint on the film_rating_report (filmid, rating) combination, which means that no film will ever get two same ratings, or if there is an even stricter (but more plausible) unique constraint on film_rating_report (filmid) that means that every film has at most one rating, you can simplify the JOIN solutions to (and get rid of all the other queries):
SELECT f.*
FROM film AS f
JOIN film_rating_report AS frr
ON f.filmid = frr.filmid
WHERE frr.rating = 'GE' ;
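A minimal sketch of why the plain JOIN is only safe under such a unique constraint. It uses Python's built-in sqlite3 module in place of MySQL, purely so it runs without a server; the rows and the title column are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (filmid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_rating_report (filmid INTEGER, rating TEXT);
    INSERT INTO film VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    -- Film 1 has two 'GE' ratings, so there is no unique constraint here.
    INSERT INTO film_rating_report VALUES (1, 'GE'), (1, 'GE'), (2, 'PG'), (3, 'GE');
""")

# Semi-join with EXISTS: each matching film appears exactly once.
exists_rows = conn.execute("""
    SELECT f.filmid, f.title
    FROM film AS f
    WHERE EXISTS (SELECT 1
                  FROM film_rating_report AS frr
                  WHERE frr.rating = 'GE'
                    AND frr.filmid = f.filmid)
    ORDER BY f.filmid
""").fetchall()

# Plain JOIN without GROUP BY: film 1 is repeated once per matching rating.
join_rows = conn.execute("""
    SELECT f.filmid, f.title
    FROM film AS f
    JOIN film_rating_report AS frr ON f.filmid = frr.filmid
    WHERE frr.rating = 'GE'
    ORDER BY f.filmid
""").fetchall()

print(exists_rows)  # [(1, 'Alpha'), (3, 'Gamma')]
print(join_rows)    # [(1, 'Alpha'), (1, 'Alpha'), (3, 'Gamma')]
```

With a unique constraint on (filmid, rating) the second insert of (1, 'GE') would be rejected, and both queries would return the same rows.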

Preferred solution for this is to use join, and don't forget group by so that you don't have duplicate lines:
select film.*
from film
join film_rating_report on film.filmid = film_rating_report.filmid
and rating = 'GE'
group by film.filmid
EDIT: as correctly noted by #ypercube, I was wrong claiming that the performance of join & group by is better than using subqueries with exists or in - quite the opposite.

Query:
SELECT t1.*
FROM film t1
WHERE EXISTS (SELECT filmid
FROM film_rating_report t2
WHERE t2.rating = 'GE'
AND t2.filmid = t1.filmid);

I believe this will work, though without knowing your DB structure (consider posting the output of SHOW CREATE TABLE for your tables) I have no way to know for sure:
SELECT film.*
FROM (film)
LEFT JOIN film_rating_report ON film.filmid = film_rating_report.filmid AND film_rating_report.rating = 'GE'
WHERE film_rating_report.filmid IS NOT NULL
GROUP BY film.filmid
(The WHERE film_rating_report.filmid IS NOT NULL prevents rows that don't have the rating you are seeking from sneaking in. I added GROUP BY at the end because film_rating_report might match more than once; I can't be sure, as I have no visibility into the data stored in it.)

Related

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, which count_vote and which r.text_c do you expect to get for the ID? You don't tell the DBMS (which would be something like MAX(count_vote) and MIN(r.text_c), for instance). MySQL should throw an error, but instead silently replaces count_vote, r.text_c with ANY_VALUE(count_vote), ANY_VALUE(r.text_c). This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
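If the ORDER BY followed by GROUP BY was meant to pick the highest-voted review per fruit (an assumption, since the question does not say), one deterministic way to express that is a NOT EXISTS test. The sketch below uses Python's built-in sqlite3 module in place of MySQL, with made-up rows, and breaks ties on the lower r_id to mirror the original ORDER BY.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reviews (r_id INTEGER PRIMARY KEY, fruit_id INTEGER,
                          customer_id INTEGER, count_vote INTEGER, text_c TEXT);
    INSERT INTO Reviews VALUES
        (1, 10, 100, 5, 'ok'),
        (2, 10, 101, 9, 'great'),
        (3, 11, 100, 2, 'meh'),
        (4, 11, 102, 2, 'fine');
""")

# Keep a review only if no other review of the same fruit beats it:
# more votes wins, and the lower r_id breaks ties.
top_reviews = conn.execute("""
    SELECT r.fruit_id, r.r_id, r.count_vote, r.text_c
    FROM Reviews AS r
    WHERE NOT EXISTS (
        SELECT 1
        FROM Reviews AS better
        WHERE better.fruit_id = r.fruit_id
          AND (better.count_vote > r.count_vote
               OR (better.count_vote = r.count_vote AND better.r_id < r.r_id))
    )
    ORDER BY r.fruit_id
""").fetchall()

print(top_reviews)  # [(10, 2, 9, 'great'), (11, 3, 2, 'meh')]
```

Unlike GROUP BY without aggregates, every column here comes from one well-defined row per fruit.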
Your Categories table does not seem to be joined or related to the others, which produces a Cartesian product between all the rows.
If you want distinct results, don't use GROUP BY but DISTINCT, so you can avoid an unnecessary subquery,
and you don't need an ORDER BY in a subquery:
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT DISTINCT fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* your Categories table seems not joined/related to the others */
WHERE category = 'Fruits';
For better readability, use explicit JOIN syntax and avoid the old syntax based on comma-separated table names and WHERE conditions.
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate Cartesian join. It's also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.

Better way for SQL LEFT JOIN OR condition

I've done some searching without success and I want to know if there is some better way to rewrite sql query because this OR condition in the LEFT JOIN kills the performance:(
For e.g.:
SELECT DISTINCT * FROM computers
LEFT JOIN monitors ON computers.brand = monitors.brand
LEFT JOIN keyboards ON computers.type = keyboards.type
LEFT JOIN accessories ON accessories.id = keyboards.id OR accessories.id = monitors.id
GROUP BY computers.id
ORDER BY computers.id DESC
Sorry for dumb question, but is it possible to rewrite OR statement to improve performance?
I doubt it will make any difference, but you could try this:
SELECT DISTINCT *
FROM computers
LEFT JOIN monitors ON computers.brand = monitors.brand
LEFT JOIN keyboards ON computers.type = keyboards.type
LEFT JOIN accessories ON accessories.id IN (keyboards.id, monitors.id)
GROUP BY computers.id
ORDER BY computers.id DESC
You could also join to the same table twice, if you are comfortable having two sets of accessories columns (perhaps using coalesce() a bunch in the SELECT list):
SELECT DISTINCT * FROM computers
LEFT JOIN monitors ON computers.brand = monitors.brand
LEFT JOIN keyboards ON computers.type = keyboards.type
LEFT JOIN accessories a1 ON a1.id = keyboards.id
LEFT JOIN accessories a2 ON a2.id = monitors.id
GROUP BY computers.id
ORDER BY computers.id DESC
And, fwiw, this query would not be legal in most modern database engines. If you want to GROUP BY a field, the ANSI SQL standard says you can't also just put * (even with DISTINCT) in the SELECT list, because you haven't specified which values to keep and which to discard as the database rolls up the group... the results are undefined, and that's a bad thing.
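To make the difference in result shape between the OR join and the two-join version concrete, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. The column lists and rows are hypothetical, and DISTINCT and GROUP BY are left out so that the join behaviour itself is visible.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE computers (id INTEGER PRIMARY KEY, brand TEXT, type TEXT);
    CREATE TABLE monitors (id INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE keyboards (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE accessories (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO computers VALUES (1, 'X', 'T');
    INSERT INTO monitors VALUES (7, 'X');
    INSERT INTO keyboards VALUES (8, 'T');
    INSERT INTO accessories VALUES (7, 'stand'), (8, 'wrist rest');
""")

# OR in the join condition: one output row per matching accessory.
or_rows = conn.execute("""
    SELECT c.id, a.name
    FROM computers AS c
    LEFT JOIN monitors AS m ON c.brand = m.brand
    LEFT JOIN keyboards AS k ON c.type = k.type
    LEFT JOIN accessories AS a ON a.id = k.id OR a.id = m.id
    ORDER BY a.id
""").fetchall()

# Joining accessories twice: one row, with one accessory column per source.
two_join_rows = conn.execute("""
    SELECT c.id, a1.name AS keyboard_accessory, a2.name AS monitor_accessory
    FROM computers AS c
    LEFT JOIN monitors AS m ON c.brand = m.brand
    LEFT JOIN keyboards AS k ON c.type = k.type
    LEFT JOIN accessories AS a1 ON a1.id = k.id
    LEFT JOIN accessories AS a2 ON a2.id = m.id
""").fetchall()

print(or_rows)        # [(1, 'stand'), (1, 'wrist rest')]
print(two_join_rows)  # [(1, 'wrist rest', 'stand')]
```

The two forms are therefore not drop-in replacements for each other: the second returns the same information in a wider, shorter result.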
You are doing SELECT DISTINCT *, so it's checking that each entire record is unique across all rows it gets, which spans every joined table. The rows are probably unique already, and if your primary keys and unique indexes are set up correctly they definitely are, so just take DISTINCT out.
If your primary keys and indexes aren't set up, do that first: a primary key on each field named id.
On top of that, SELECT * incurs overhead since it has to figure out what the rest of your columns are.
Guessing without knowing what the table structure actually is: since you are grouping with GROUP BY computers.id, put that column in your SELECT instead and take it out of your GROUP BY.
SELECT DISTINCT computers.id

MySQL - Double primary key in EER table

Context
I have been fiddling with a small foosball hobby database to keep track of matches, players and goals, and came across a problem I don't quite know how to fix.
The match table has two foreign keys, both pointing to tID in the team table.
The idea was that I would later be able to do a SELECT to see which teams (by name) played against each other in a given match.
select * from `Fooseball`.`match`
INNER JOIN team T1
ON Fooseball.`match`.mHome_Team = T1.tID
INNER JOIN team T2
ON Fooseball.`match`.mAway_Team = T2.tID
WHERE mID=1
Question
1 Is there a better way to achieve this than creating two primary keys? Like an intermediate table?
2 How can I construct my select statement so I can name the tName columns "home" and "away" or something else? When I try
INNER JOIN team AS T1
Nothing changes.
Unstated additional requirements notwithstanding, this is pretty much how I would do it.
To rename columns in the result, you would do something like
SELECT m.mDate AS match_date, T1.tName AS home_team, T2.tName AS away_team
FROM Fooseball.`match` m
INNER JOIN team T1
ON m.mHome_Team = T1.tID
INNER JOIN team T2
ON m.mAway_Team = T2.tID
WHERE mID=1
For reporting, you can alias your columns with mixed case and spaces (eg. "Home Team") by enclosing the alias in double quotes.
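A quick way to check that the column aliases (rather than the table aliases) are what rename the output is to look at the column names the driver reports. The sketch below uses Python's built-in sqlite3 module in place of MySQL; the team names are made up, and mDate is omitted because the question does not show the full match schema.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (tID INTEGER PRIMARY KEY, tName TEXT);
    CREATE TABLE `match` (
        mID INTEGER PRIMARY KEY,
        mHome_Team INTEGER REFERENCES team (tID),
        mAway_Team INTEGER REFERENCES team (tID)
    );
    INSERT INTO team VALUES (1, 'Red'), (2, 'Blue');
    INSERT INTO `match` VALUES (1, 1, 2);
""")

# The team table is joined twice; the AS aliases on the selected columns
# decide what the result columns are called.
cursor = conn.execute("""
    SELECT m.mID AS match_id, T1.tName AS home_team, T2.tName AS away_team
    FROM `match` AS m
    INNER JOIN team AS T1 ON m.mHome_Team = T1.tID
    INNER JOIN team AS T2 ON m.mAway_Team = T2.tID
    WHERE m.mID = 1
""")
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)  # ['match_id', 'home_team', 'away_team']
print(rows)          # [(1, 'Red', 'Blue')]
```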

Best way to structure SQL queries with many inner joins?

I have an SQL query that needs to perform multiple inner joins, as follows:
SELECT DISTINCT adv.Email, adv.Credit, c.credit_id AS creditId, c.creditName AS creditName, a.Ad_id AS adId, a.adName
FROM placementlist pl
INNER JOIN
(SELECT Ad_id, List_id FROM placements) AS p
ON pl.List_id = p.List_id
INNER JOIN
(SELECT Ad_id, Name AS adName, credit_id FROM ad) AS a
ON ...
(few more inner joins)
My question is the following: How can I optimize this query? I was under the impression that, even though the way I currently query the database creates small temporary tables (inner SELECT statements), it would still be advantageous to performing an inner join on the unaltered tables as they could have about 10,000 - 100,000 entries (not millions). However, I was told that this is not the best way to go about it but did not have the opportunity to ask what the recommended approach would be.
What would be the best approach here?
To use derived tables such as
INNER JOIN (SELECT Ad_id, List_id FROM placements) AS p
is not recommended. Let the DBMS work out for itself which values it needs from
INNER JOIN placements AS p
instead of telling it (again) by kinda forcing it to create a view on the table with the two values only. (And using FROM tablename is even much more readable.)
With SQL you mainly say what you want to see, not how this is going to be achieved. (Well, of course this is just a rule of thumb.) So if no other columns except Ad_id and List_id are used from table placements, the dbms will find its best way to handle this. Don't try to make it use your way.
The same is true of the IN clause, by the way, where you often see WHERE col IN (SELECT DISTINCT colx FROM ...) instead of simply WHERE col IN (SELECT colx FROM ...). This does exactly the same, but with DISTINCT you tell the dbms "make your subquery's rows distinct before looking for col". But why would you want to force it to do so? Why not have it use just the method the dbms finds most appropriate?
Back to derived tables: Use them when they really do something, especially aggregations, or when they make your query more readable.
Moreover,
SELECT DISTINCT adv.Email, adv.Credit, ...
doesn't look too good either. Yes, sometimes you need SELECT DISTINCT, but usually you don't. Most often it is just a sign that you haven't thought your query through.
An example: you want to select clients that bought product X. In SQL you would say: where a purchase of X EXISTS for the client. Or: where the client is IN the set of the X purchasers.
select * from clients c where exists
(select * from purchases p where p.clientid = c.clientid and product = 'X');
Or
select * from clients where clientid in
(select clientid from purchases where product = 'X');
You don't say: Give me all combinations of clients and X purchases and then boil that down so I just get each client once.
select distinct c.*
from clients c
join purchases p on p.clientid = c.clientid and product = 'X';
Yes, it is very easy to just join all tables needed and then just list the columns to select and then just put DISTINCT in front. But it makes the query kind of blurry, because you don't write the query as you would word the task. And it can make things difficult when it comes to aggregations. The following query is wrong, because you multiply money earned with the number of money-spent records and vice versa.
select
sum(money_spent.value),
sum(money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
And the following may look correct, but is still incorrect (it only works when the values happen to be unique):
select
sum(distinct money_spent.value),
sum(distinct money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
Again: You would not say: "I want to combine each purchase with each earning and then ...". You would say: "I want the sum of money spent and the sum of money earned per user". So you are not dealing with single purchases or earnings, but with their sums. As in
select
(select sum(value) from money_spent where money_spent.userid = user.userid) as total_spent,
(select sum(value) from money_earned where money_earned.userid = user.userid) as total_earned
from user;
Or:
select
spent.total,
earned.total
from user
join (select userid, sum(value) as total from money_spent group by userid) spent
on spent.userid = user.userid
join (select userid, sum(value) as total from money_earned group by userid) earned
on earned.userid = user.userid;
So you see, this is where derived tables come into play.
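The three behaviours described above can be reproduced on a tiny data set. This is a sketch using Python's built-in sqlite3 module in place of MySQL, with hypothetical amounts for a single user: the true totals are 20 spent and 19 earned.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all amounts are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (userid INTEGER PRIMARY KEY);
    CREATE TABLE money_spent (userid INTEGER, value INTEGER);
    CREATE TABLE money_earned (userid INTEGER, value INTEGER);
    INSERT INTO user VALUES (1);
    INSERT INTO money_spent VALUES (1, 10), (1, 10);         -- true total: 20
    INSERT INTO money_earned VALUES (1, 5), (1, 7), (1, 7);  -- true total: 19
""")

joined = """
    FROM user
    JOIN money_spent ON money_spent.userid = user.userid
    JOIN money_earned ON money_earned.userid = user.userid
"""

# Joining both detail tables yields 2 x 3 = 6 rows, inflating both sums.
naive = conn.execute(
    "SELECT SUM(money_spent.value), SUM(money_earned.value)" + joined
).fetchone()

# SUM(DISTINCT ...) collapses equal amounts, so it is wrong in another way.
distinct = conn.execute(
    "SELECT SUM(DISTINCT money_spent.value), SUM(DISTINCT money_earned.value)" + joined
).fetchone()

# Aggregating in derived tables first gives one row per user and table.
derived = conn.execute("""
    SELECT spent.total, earned.total
    FROM user
    JOIN (SELECT userid, SUM(value) AS total FROM money_spent GROUP BY userid) AS spent
      ON spent.userid = user.userid
    JOIN (SELECT userid, SUM(value) AS total FROM money_earned GROUP BY userid) AS earned
      ON earned.userid = user.userid
""").fetchone()

print(naive)     # (60, 38)
print(distinct)  # (10, 12)
print(derived)   # (20, 19)
```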

MySQL - 3 tables, is this complex join even possible?

I have three tables: users, groups and relation.
Table users with fields: usrID, usrName, usrPass, usrPts
Table groups with fields: grpID, grpName, grpMinPts
Table relation with fields: uID, gID
A user can be placed in a group in two ways:
if he has collected the group's minimum number of points (users.usrPts > groups.grpMinPts ORDER BY groups.grpMinPts DESC LIMIT 1)
if his relation to the group is added manually in the relation table (user ID stored as uID and group ID as gID)
Can I create a single query to determine, for every user (or one specific user), which group he belongs to, where a manual relation (from the relation table) takes priority over comparing usrPts with grpMinPts? Also, I do not want a user to be shown twice (once with his group by points and once with his related group).
Thanks in advance! :) I tried:
SELECT * FROM users LEFT JOIN (relation LEFT JOIN groups ON relation.gID = groups.grpID) ON users.usrID = relation.uID
Using this I managed to extract specified relations (from relation table), but, I have no idea how to include user points, respecting above mentioned priority (specified first). I know how to do this in a few separated queries in php, that is simple, but I am curious, can it be done using one single query?
EDIT TO ADD:
Thanks to really educational technique using coalesce #GordonLinoff provided, I managed to make this query to work as I expected. So, here it goes:
SELECT o.usrID, o.usrName, o.usrPass, o.usrPts, t.grpID, t.grpName
FROM (
SELECT u.*, COALESCE(relationgroupid,groupid) AS thegroupid
FROM (
SELECT u.*, (
SELECT grpID
FROM groups g
WHERE u.usrPts > g.grpMinPts
ORDER BY g.grpMinPts DESC
LIMIT 1
) AS groupid, (
SELECT gID
FROM relation r
WHERE r.uID = u.usrID
) AS relationgroupid
FROM users u
)u
)o
JOIN groups t ON t.grpID = o.thegroupid
Also, if you are wondering, like I did, is this approach faster or slower than doing three queries and processing in php, the answer is that this is slightly faster way. Average time of this query execution and showing results on a webpage is 14 ms. Three simple queries, processing in php and showing results on a webpage took 21 ms. Average is based on 10 cases, average execution time was, really, a constant time.
Here is an approach that uses correlated subqueries to get each of the values. It then chooses the appropriate one using the precedence rule that if the relations exist use that one, otherwise use the one from the groups table:
select u.*,
coalesce(relationgroupid, groupid) as thegroupid
from (select u.*,
(select grpid from groups g where u.usrPts > g.grpMinPts order by g.grpMinPts desc limit 1
) as groupid,
(select gID from relation r where r.uID = u.usrID
) as relationgroupid
from users u
) u
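The precedence rule can be checked on a few rows. The sketch below uses Python's built-in sqlite3 module in place of MySQL, with the table and column names from the question (usrPass is omitted) and made-up users and groups. It assumes each user has at most one row in relation, and it quotes `groups` in backticks because GROUPS is a reserved word in MySQL 8.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (usrID INTEGER PRIMARY KEY, usrName TEXT, usrPts INTEGER);
    CREATE TABLE `groups` (grpID INTEGER PRIMARY KEY, grpName TEXT, grpMinPts INTEGER);
    CREATE TABLE relation (uID INTEGER, gID INTEGER);
    INSERT INTO `groups` VALUES (1, 'Bronze', 0), (2, 'Silver', 100), (3, 'Gold', 500);
    INSERT INTO users VALUES (1, 'ann', 150), (2, 'bob', 50), (3, 'cid', 600);
    -- bob only has Bronze-level points but was placed in Gold manually.
    INSERT INTO relation VALUES (2, 3);
""")

# COALESCE returns its first non-NULL argument, so the manual relation wins
# and the points-based group is used only when no relation row exists.
memberships = conn.execute("""
    SELECT u.usrID, u.usrName,
           COALESCE(
               (SELECT r.gID FROM relation AS r WHERE r.uID = u.usrID),
               (SELECT g.grpID
                FROM `groups` AS g
                WHERE u.usrPts > g.grpMinPts
                ORDER BY g.grpMinPts DESC
                LIMIT 1)
           ) AS thegroupid
    FROM users AS u
    ORDER BY u.usrID
""").fetchall()

print(memberships)  # [(1, 'ann', 2), (2, 'bob', 3), (3, 'cid', 3)]
```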
Try something like this
select user.name, group.name
from group
join relation on relation.gid = group.gid
join user on user.uid = relation.uid
union
select user.name, g1.name
from group g1
join group g2 on g2.minpts > g1.minpts
join user on user.pts between g1.minpts and g2.minpts