How can this query be optimised to prevent timeout - mysql

From the database below with schema
movieActor (actorID, movieID)
rental (rentalID, inventoryID, customerID)
inventory (inventoryID, movieID)
I am trying to list pairs of customers who rented movies featuring the same actor. The result set should be composed of three columns:
customerID1,customerID2,nOfCommonActors
for example
23 44 5
11 44 3
where the first row means that customers 23 and 44 each rented various movies, and 5 actors appear both in the movies customer 23 rented and in the movies customer 44 rented.
I came up with this query, but it takes so long to run that it times out without returning any result. How can I make it more efficient? (I am using MySQL.)
SELECT r1.customerID AS customerID1,
r2.customerID AS customerID2,
COUNT(DISTINCT fa.actorID) as nOfCommonActors
FROM movieActor AS fa
JOIN (SELECT r.customerID, i.movieID, fa.actorID
FROM rental AS r
JOIN inventory i
ON i.inventoryID=r.inventoryID
JOIN movieActor AS fa
ON fa.actorID=i.movieID
) AS r1
JOIN (SELECT r.customerID, i.movieID, fa.actorID
FROM rental AS r
JOIN inventory i
ON i.inventoryID=r.inventoryID
JOIN movieActor AS fa
ON fa.actorID=i.movieID
) AS r2
ON r2.actorID=r1.actorID
AND r1.customerID < r2.customerID
GROUP BY r1.customerID, r2.customerID
ORDER BY nOfCommonActors DESC;

The one thing I can think of is select distinct in the subqueries:
SELECT ca.customerID AS customerID1,
ca2.customerID AS customerID2,
COUNT(*) as nOfCommonActors
FROM (SELECT DISTINCT r.customerID, fa.actorID
FROM rental r JOIN
inventory i
ON i.inventoryID = r.inventoryID JOIN
movieActor fa
ON fa.movieID = i.movieID -- join on movieID; the question's query compares actorID to movieID
) ca JOIN
(SELECT DISTINCT r.customerID, fa.actorID
FROM rental r JOIN
inventory i
ON i.inventoryID = r.inventoryID JOIN
movieActor fa
ON fa.movieID = i.movieID -- same fix as above
) ca2
ON ca.actorID = ca2.actorID AND
ca.customerID < ca2.customerID
GROUP BY ca.customerID, ca2.customerID
ORDER BY nOfCommonActors DESC;
Your version is multiplying out the number of rows in the subqueries considerably. That makes the JOIN more expensive -- and all that extra work is for nought because you want COUNT(DISTINCT) anyway.
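To make the row multiplication concrete, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are invented for illustration, and the join on fa.movieID = i.movieID is an assumption about the intended condition (the schema lists movieID in both tables).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movieActor (actorID INTEGER, movieID INTEGER);
CREATE TABLE inventory  (inventoryID INTEGER, movieID INTEGER);
CREATE TABLE rental     (rentalID INTEGER, inventoryID INTEGER, customerID INTEGER);

-- actors 1 and 2 play in movie 10; actor 1 also plays in movie 11
INSERT INTO movieActor VALUES (1, 10), (2, 10), (1, 11);
INSERT INTO inventory  VALUES (100, 10), (101, 11);
-- customer 23 rents movie 10 twice; customer 44 rents movies 10 and 11
INSERT INTO rental VALUES (1, 100, 23), (2, 100, 23), (3, 100, 44), (4, 101, 44);
""")

# (customer, actor) pairs; {distinct} is filled with "" or "DISTINCT"
PAIRS = """
SELECT {distinct} r.customerID, fa.actorID
FROM rental r
JOIN inventory i   ON i.inventoryID = r.inventoryID
JOIN movieActor fa ON fa.movieID = i.movieID
"""

raw_rows = conn.execute(PAIRS.format(distinct="")).fetchall()
distinct_pairs = PAIRS.format(distinct="DISTINCT")
distinct_rows = conn.execute(distinct_pairs).fetchall()

# Self-join of the deduplicated pairs, so COUNT(*) already counts distinct actors
common = conn.execute(f"""
SELECT ca.customerID, ca2.customerID, COUNT(*) AS nOfCommonActors
FROM ({distinct_pairs}) ca
JOIN ({distinct_pairs}) ca2
  ON ca.actorID = ca2.actorID AND ca.customerID < ca2.customerID
GROUP BY ca.customerID, ca2.customerID
ORDER BY nOfCommonActors DESC
""").fetchall()

print(len(raw_rows), len(distinct_rows), common)
```

On this toy data the unfiltered subquery yields 7 rows, while the DISTINCT version yields 4, and the self-join only has to pair the smaller set.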

Splitting the query into steps with temporary tables lets the optimizer use statistics on the intermediate results to plot the best path. MySQL does not allow a TEMPORARY table to be referenced twice in the same query, so the set is materialized twice:
-- the join below uses movieID; the question's query compares actorID to movieID
CREATE TEMPORARY TABLE t1 AS
SELECT DISTINCT r.customerID, fa.actorID
FROM rental r JOIN
inventory i
ON i.inventoryID = r.inventoryID JOIN
movieActor fa
ON fa.movieID = i.movieID;
CREATE TEMPORARY TABLE t2 AS
SELECT DISTINCT r.customerID, fa.actorID
FROM rental r JOIN
inventory i
ON i.inventoryID = r.inventoryID JOIN
movieActor fa
ON fa.movieID = i.movieID;
SELECT t1.customerID AS customerID1,
t2.customerID AS customerID2,
COUNT(*) AS nOfCommonActors
FROM t1
JOIN t2 ON t1.actorID = t2.actorID AND t1.customerID < t2.customerID
GROUP BY t1.customerID, t2.customerID
ORDER BY nOfCommonActors DESC;

Related

Minimizing redundancy of MySQL query

I'm having a bit of trouble trying to reduce the redundancy of a query in MySQL. I currently have it working, but it feels like I have too much overhead because it uses a redundant subquery. What I am trying to do is use a DVD rental database to find which store location rented out the most DVDs in each month of 2005.
Here is the working query
SELECT b.month, c.store_id, b.maxRentals
FROM
(SELECT a.month, MAX(a.rentalCount) as maxRentals
FROM
(SELECT MONTH(rental.rental_date) as month, inventory.store_id, count(1) as rentalCount
FROM rental
INNER JOIN inventory
ON rental.inventory_id = inventory.inventory_id
WHERE YEAR(rental.rental_date) = 2005
GROUP BY MONTH(rental.rental_date), inventory.store_id
) a
GROUP BY a.month
) b
INNER JOIN
(SELECT MONTH(rental.rental_date) as month, inventory.store_id, count(1) as rentalCount
FROM rental
INNER JOIN inventory
ON rental.inventory_id = inventory.inventory_id
WHERE YEAR(rental.rental_date) = 2005
GROUP BY MONTH(rental.rental_date), inventory.store_id
) c
ON b.maxRentals = c.rentalCount
GROUP BY b.month;
Notice how the subquery with the alias of "c" is the exact same subquery of alias "a". I'm not sure if there's a way to get rid of this, as I can't inner join on an alias. Am I just stuck with a giant query, or is there something else I can do?
I am 90% certain this query will achieve your intentions:
SELECT MONTH(r.rental_date), i.store_id, COUNT(*)
FROM rental r
LEFT JOIN inventory i ON r.inventory_id = i.inventory_id
WHERE YEAR(r.rental_date) = 2005
GROUP BY MONTH(r.rental_date), i.store_id
Let me know how it goes!
Edit: to answer the question which store location has rented out more dvd's for each month in 2005:
SELECT x.rental_month, x.store_id, MAX(x.rental_count) FROM (
SELECT MONTH(r.rental_date) AS rental_month, i.store_id AS store_id, COUNT(*) AS rental_count
FROM rental r LEFT JOIN inventory i ON r.inventory_id = i.inventory_id
WHERE YEAR(r.rental_date) = 2005
GROUP BY MONTH(r.rental_date), i.store_id) x
GROUP BY x.rental_month, x.store_id
I was explicit by using aliases everywhere, you could probably omit some. Hopefully this helps...
Edit: Dirty hack:
SELECT x.rental_month, x.store_id, MAX(x.rental_count) FROM (
SELECT MONTH(r.rental_date) AS rental_month, i.store_id AS store_id, COUNT(*) AS rental_count
FROM rental r LEFT JOIN inventory i ON r.inventory_id = i.inventory_id
WHERE YEAR(r.rental_date) = 2005
GROUP BY MONTH(r.rental_date), i.store_id
ORDER BY MONTH(r.rental_date) ASC, COUNT(*) DESC) x
GROUP BY x.rental_month
Ref:
http://kristiannielsen.livejournal.com/6745.html
But then does this satisfy you, seeing as you do already have a working query...

Merge 2 SQL Queries/Tables

I spent so much time googling today, but I don't even know which keywords to use. So …
The project is an evaluation of a betting game (Football). I have 2 SQL Queries:
SELECT players.username, players.userid, matchdays.userid, matchdays.points, SUM(points) AS gesamt
FROM players INNER JOIN matchdays ON players.userid = matchdays.userid AND matchdays.season_id=5
GROUP BY players.username
ORDER BY gesamt DESC
And my second query:
SELECT max(matchday) as lastmd, points, players.username from players INNER JOIN matchdays ON players.userid = matchdays.userid WHERE matchdays.season_id=5 AND matchday=
(select max(matchday) from matchdays)group by players.username ORDER BY points DESC
The first one adds up the points of every matchday and shows the sum.
The second shows the points of the last gameday.
My Goal is to merge those 2 queries/tables so that the output is a table like
Rank | Username | Points last gameday | Overall points |
I don't even know where to start or what to look for. Any help would be appreciated ;)
Join the two queries. Use an inner join if every userid also has a row in the second query, and add userid to the second query's select list so it can be used in the join:
SET @rank = 0;
SELECT @rank := @rank + 1,
t1.username,
t2.points,
t1.gesamt
FROM (
SELECT players.username, players.userid puserid, matchdays.userid muserid, matchdays.points, SUM(points) AS gesamt
FROM players INNER JOIN matchdays ON players.userid = matchdays.userid AND matchdays.season_id=5
GROUP BY players.username
)t1
INNER JOIN
(
SELECT players.userid, max(matchday) as lastmd, points, players.username
from players INNER JOIN matchdays ON players.userid = matchdays.userid
WHERE matchdays.season_id=5 AND matchday=
(select max(matchday) from matchdays)group by players.username
)t2
ON t1.puserid = t2.userid
ORDER BY t1.gesamt DESC;
You can use conditional aggregation, i.e. sum the points only when the day is the last day:
SELECT
p.username,
SUM(case when m.matchday = (select max(matchday) from matchdays) then m.points end)
AS last_day_points,
SUM(m.points) AS total_points
FROM players p
INNER JOIN matchdays m ON p.userid = m.userid AND m.season_id = 5
GROUP BY p.userid
ORDER BY total_points DESC;
Or with a join instead of a non-correlated subquery (MySQL should come to the same execution plan):
SELECT
p.username,
SUM(case when m.matchday = last_day.matchday then m.points end) AS last_day_points,
SUM(m.points) AS total_points
FROM players p
INNER JOIN matchdays m ON p.userid = m.userid AND m.season_id = 5
CROSS JOIN
(
select max(matchday) as matchday
from matchdays
) last_day
GROUP BY p.userid
ORDER BY total_points DESC;
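As a quick check of the conditional-aggregation idea, here is a sketch that runs the join variant against Python's built-in sqlite3 module (a stand-in for MySQL). The sample players and points are invented, restricting the last matchday to season 5 is an added assumption, and the rank column is produced in Python with enumerate rather than in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players   (userid INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE matchdays (userid INTEGER, season_id INTEGER, matchday INTEGER, points INTEGER);
INSERT INTO players VALUES (1, 'anna'), (2, 'ben');
INSERT INTO matchdays VALUES
  (1, 5, 1, 3), (1, 5, 2, 4),
  (2, 5, 1, 6), (2, 5, 2, 0),
  (1, 4, 1, 9);  -- other season, must be ignored
""")

rows = conn.execute("""
SELECT p.username,
       -- only rows from the last matchday contribute to this sum
       SUM(CASE WHEN m.matchday = last_day.matchday THEN m.points END) AS last_day_points,
       SUM(m.points) AS total_points
FROM players p
JOIN matchdays m ON p.userid = m.userid AND m.season_id = 5
CROSS JOIN (SELECT MAX(matchday) AS matchday
            FROM matchdays
            WHERE season_id = 5) last_day   -- assumption: last matchday of season 5
GROUP BY p.userid, p.username
ORDER BY total_points DESC
""").fetchall()

# Rank | Username | Points last gameday | Overall points
ranked = [(rank, *row) for rank, row in enumerate(rows, start=1)]
for line in ranked:
    print(line)
```

Numbering the rows in the application avoids relying on user variables being evaluated in ORDER BY order.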

Intersection in mysql for multiple values

The intersect keyword is not available in mysql. I want to know how to implement the following in mysql db. My tables are:
customer(cid,city,name,state)
orders(cid,oid,date)
product(pid,price,productname)
lineitem(lid,pid,oid,totalquantity,totalprice)
I want the products bought by all the customers of a particular city 'X', i.e. every customer in city 'X' should have bought the product. I managed to select the oids and the pids of customers living in that particular city. Now I need to select the pids that are present in all the oids.
Example.
Oid Pid
2400 1
2400 2
2401 3
2401 1
2402 1
2403 1
2403 3
The answer from the above input should be 1 because it is present in all oid's. The query which I used to get the oid's and pid's:
select t.oid,l.pid
from lineitem l
join (select o.oid,c1.cid
from orders o
join (select c.cid
from customer c
where c.city='X') c1
where o.cid=c1.cid) t on l.oid=t.oid
Now I need to intersect all the oids and get the result. The query should not depend on the data.
Try:
select pid, count(*)
from (select t.oid, l.pid
from lineitem l
join (select o.oid, c1.cid
from orders o
join (select c.cid from customer c where c.city = 'X') c1
where o.cid = c1.cid) t
on l.oid = t.oid) x
group by pid
having count(*) = (select count(*)
from (select distinct oid
from lineitem l
join (select o.oid, c1.cid
from orders o
join (select c.cid
from customer c
where c.city = 'X') c1
where o.cid = c1.cid) t
on l.oid = t.oid) y);
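The counting idea behind this query can be tried on the example rows from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; city_items is a hypothetical table holding the (oid, pid) pairs that the question's query already produces for city 'X'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical table: output of the question's oid/pid query for one city
CREATE TABLE city_items (oid INTEGER, pid INTEGER);
INSERT INTO city_items VALUES
  (2400, 1), (2400, 2), (2401, 3), (2401, 1), (2402, 1), (2403, 1), (2403, 3);
""")

# Keep a pid only if it occurs in as many distinct orders as exist overall
rows = conn.execute("""
SELECT pid
FROM city_items
GROUP BY pid
HAVING COUNT(DISTINCT oid) = (SELECT COUNT(DISTINCT oid) FROM city_items)
""").fetchall()

print(rows)
```

COUNT(DISTINCT oid) guards against a product appearing twice in one order. If the requirement is really per customer rather than per order, the same pattern would presumably count distinct cid values instead.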
I think you can achieve what you want by using IN

MySQL gives syntax error on subquery with valid syntax

I'm trying to find the film that has been rented the most without using limit. I'm trying to use the following query:
SELECT f.title, f.film_id
FROM film f
JOIN inventory i ON f.film_id = i.film_id
JOIN rental r ON r.inventory_id = i.inventory_id
GROUP BY f.film_id
HAVING COUNT(r.rental_id) = MAX(
SELECT COUNT(r2.rental_id)
FROM rental r2, inventory i2
WHERE i2.inventory_id = r2.inventory_id
GROUP BY i2.film_id);
but MySQL tells me that I have a syntax error somewhere near SELECT COUNT(r2.rental_id) FROM rental r2, inventory. However, when I run the subquery independently, it returns the expected table. Am I doing something massively wrong?
Relevant database schema:
film(film_id, title, description, release_year, language_id, original_language_id, rental_duration, rental_rate, length, replacement_cost, rating, special_features, last_update)
inventory(inventory_id, film_id, store_id, last_update)
rental(rental_id, rental_date, inventory_id, customer_id, return_date, staff_id, last_update)
You can't use MAX() over a result set, but you can use
someValue >= ALL (subquery)
to achieve what you're attempting, because ALL requires that the preceding operator be true for all values in the set.
Try this:
SELECT f.title, f.film_id
FROM film f
JOIN inventory i ON f.film_id = i.film_id
JOIN rental r ON r.inventory_id = i.inventory_id
GROUP BY f.film_id
HAVING COUNT(r.rental_id) >= ALL (
SELECT COUNT(r2.rental_id)
FROM rental r2, inventory i2
WHERE i2.inventory_id = r2.inventory_id
GROUP BY i2.film_id);
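A related form applies MAX() to the per-film counts after wrapping them in a derived table. The sketch below uses that form because it runs on Python's built-in sqlite3 module, which does not support the ALL quantifier; it is a swapped-in variant of the answer's query, not the same statement, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film      (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental    (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER);
INSERT INTO film      VALUES (1, 'A'), (2, 'B');
INSERT INTO inventory VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO rental    VALUES (1, 10), (2, 11), (3, 20);  -- film A: 2 rentals, film B: 1
""")

rows = conn.execute("""
SELECT f.title, f.film_id
FROM film f
JOIN inventory i ON f.film_id = i.film_id
JOIN rental r    ON r.inventory_id = i.inventory_id
GROUP BY f.film_id, f.title
HAVING COUNT(r.rental_id) = (
    -- MAX() works here because the counts are first named in a derived table
    SELECT MAX(cnt) FROM (
        SELECT COUNT(r2.rental_id) AS cnt
        FROM rental r2
        JOIN inventory i2 ON i2.inventory_id = r2.inventory_id
        GROUP BY i2.film_id) per_film)
""").fetchall()

print(rows)
```

Like the >= ALL version, this returns every film tied for the highest rental count and needs no LIMIT.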
I don't have a database to test in, but this should work (edited to use LIMIT 1 instead of SELECT TOP 1 for MySQL):
SELECT f.title, f.film_id
FROM film f
JOIN inventory i ON f.film_id = i.film_id
JOIN rental r ON r.inventory_id = i.inventory_id
GROUP BY f.film_id
HAVING COUNT(r.rental_id) = (SELECT COUNT(r2.rental_id)
FROM rental r2, inventory i2
WHERE i2.inventory_id = r2.inventory_id
GROUP BY i2.film_id
ORDER BY COUNT(r2.rental_id) desc
LIMIT 1);

MySQL - Looking for Top 10 records by Month

I will preface with Table Structures:
revshare r : contains info for a purchase including orderNo, sales, commission, itemid, EventDate
Products p: contains information around a product including a PID (Product ID) and is used to join to the Merchants table to get Merchant information.
Merchants m: contains information about the merchant the product was purchased from, including MerchantName
Question
I am trying to create a MySQL query to pull top 10 itemid's ordered by sum of commission for a given month. The entire data set I would like to get is from 2011-2013 so each year would populate 120 records (10 per month).
I created a query to pull one month's worth of data and planned on using a UNION ALL to create a list with 10 records from each query (each individual query representing a month's top 10 itemid's).
Query1
This query accurately returns me the top 10 itemid's based on total commission of those items in the given month period.
SELECT
m.MerchantName,
Count(r.OrderNo),
sum(r.commission)
FROM revshare r
LEFT JOIN Products p ON r.itemid = p.PID
LEFT JOIN Merchants m ON p.MID = m.MID
WHERE r.EventDate between '2011-01-01' and '2011-01-31'
GROUP by r.itemid
ORDER by 3 DESC LIMIT 10
When I try to UNION this query with another so that I can get records for the next month, between '2011-02-01' and '2011-02-31', I get an error: "ERROR: Incorrect usage of UNION and ORDER BY". I know this is because apparently you cannot use ORDER BY on any of a set of UNION'd queries except the last. I could pull the entire data set and then use Excel or Pentaho BI to show only the top 10, but that is not efficient given the huge data sets in the revshare table.
Below is the query with the UNION ALL that doesn't work. Does anyone have any better method of pulling this data?
Any help is greatly appreciated.
Regards,
-Chris
Query 2 (doesn't work because of the ORDER BY statement)
SELECT
m.MerchantName,
Count(r.OrderNo),
sum(r.commission)
FROM revshare r
LEFT JOIN Products p ON r.itemid = p.PID
LEFT JOIN Merchants m ON p.MID = m.MID
WHERE r.EventDate between '2011-01-01' and '2011-01-31'
GROUP by r.itemid
ORDER by 3 DESC LIMIT 10
UNION ALL
SELECT
m.MerchantName,
Count(r.OrderNo),
sum(r.commission)
FROM revshare r
LEFT JOIN Products p ON r.itemid = p.PID
LEFT JOIN Merchants m ON p.MID = m.MID
WHERE r.EventDate between '2011-02-01' and '2011-02-31'
GROUP by r.itemid
ORDER by 3 DESC LIMIT 10
Ok, try this....
SELECT * FROM (
SELECT
m.MerchantName,
Count(r.OrderNo),
sum(r.commission)
FROM
revshare r
LEFT JOIN Products p ON r.itemid = p.PID
LEFT JOIN Merchants m ON p.MID = m.MID
WHERE
r.EventDate between '2011-01-01' and '2011-01-31'
GROUP by
r.itemid
ORDER by
3 DESC LIMIT 10
) AS RESULT1
UNION ALL
SELECT * FROM (
SELECT
m.MerchantName,
Count(r.OrderNo),
sum(r.commission)
FROM
revshare r
LEFT JOIN Products p ON r.itemid = p.PID
LEFT JOIN Merchants m ON p.MID = m.MID
WHERE
r.EventDate between '2011-02-01' and '2011-02-28'
GROUP by
r.itemid
ORDER by
3 DESC LIMIT 10
) AS RESULT2
Since you already started down the path of unioning queries together, here is the right approach:
select t.*
from ((SELECT '2011-01' as yyyymm, m.MerchantName, Count(r.OrderNo) as cnt, sum(r.commission) as comm
FROM revshare r LEFT JOIN
Products p
ON r.itemid = p.PID LEFT JOIN
Merchants m
ON p.MID = m.MID
WHERE r.EventDate between '2011-01-01' and '2011-01-31'
GROUP by r.itemid
ORDER by comm DESC
LIMIT 10
) union all
(SELECT '2011-02' as yyyymm, m.MerchantName, Count(r.OrderNo) as cnt, sum(r.commission) as comm
FROM revshare r LEFT JOIN
Products p
ON r.itemid = p.PID LEFT JOIN
Merchants m
ON p.MID = m.MID
WHERE r.EventDate between '2011-02-01' and '2011-02-28'
GROUP by r.itemid
ORDER by comm DESC LIMIT 10
) union all
. . .
) t
order by 1, comm desc
In other words, you need to use subqueries for the union all. Note that I also added in yyyymm to identify the month.
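Writing 36 such branches by hand is error-prone (note the invalid '2011-02-31' in the question), so the UNION ALL can be generated. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL, a simplified revshare table without the Products and Merchants joins, and dates stored as 'YYYY-MM-DD' text. The helpers month_bounds and top_items_query are hypothetical names, and each branch is wrapped as SELECT * FROM (...) because that form also runs in SQLite.

```python
import calendar
import sqlite3

def month_bounds(year, month):
    """Return (first_day, last_day) of a month as 'YYYY-MM-DD' strings."""
    last_day = calendar.monthrange(year, month)[1]
    return f"{year}-{month:02d}-01", f"{year}-{month:02d}-{last_day:02d}"

TOP_N = 2  # the question uses 10; 2 keeps the sample data small

# One branch per month: the ORDER BY / LIMIT live inside the derived table
ONE_MONTH = f"""
SELECT * FROM (
  SELECT ? AS yyyymm, itemid, COUNT(*) AS cnt, SUM(commission) AS comm
  FROM revshare
  WHERE EventDate BETWEEN ? AND ?
  GROUP BY itemid
  ORDER BY comm DESC
  LIMIT {TOP_N}
)"""

def top_items_query(months):
    """Build the UNION ALL statement and its bind parameters."""
    parts, params = [], []
    for year, month in months:
        start, end = month_bounds(year, month)
        parts.append(ONE_MONTH)
        params += [f"{year}-{month:02d}", start, end]
    # final ordering: month label, then commission descending
    return "\nUNION ALL\n".join(parts) + "\nORDER BY 1, 4 DESC", params

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revshare (itemid INTEGER, commission REAL, EventDate TEXT);
INSERT INTO revshare VALUES
  (1, 5.0, '2011-01-03'), (1, 1.0, '2011-01-20'), (2, 4.0, '2011-01-15'),
  (3, 1.0, '2011-01-31'), (2, 7.0, '2011-02-01'), (3, 2.0, '2011-02-28'),
  (1, 0.5, '2011-02-10');
""")

sql, params = top_items_query([(2011, 1), (2011, 2)])
rows = conn.execute(sql, params).fetchall()
for row in rows:
    print(row)
```

Passing the dates as bind parameters keeps the generated statement free of hand-typed literals, and calendar.monthrange supplies the correct last day for each month, including leap years.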