I have this statement:
SELECT board.*, numlikes
FROM board
LEFT JOIN (SELECT
pins.board_id, COUNT(source_user_id) AS numlikes
FROM likes
INNER JOIN pins ON pins.id = likes.pin_id
GROUP BY pins.board_id) likes ON board.id = likes.board_id
WHERE who_can_tag = ''
ORDER BY numlikes DESC LIMIT 10
But I need to also join these other two statements to it:
SELECT COUNT(owner_user_id)
FROM repin
INNER JOIN pins ON pins.id = repin.from_pin_id
WHERE pins.board_id = '$id'
and
SELECT COUNT(is_following_board_id)
FROM follow
WHERE is_following_board_id = '$id'
I managed to get the first one joined but I'm having trouble with the others - thinking it might get too long.
Is there a quicker way to execute?
Ideally, start with the smallest result set, and then start joining to the next smallest table.
You don't want the database to do full table joins on a bunch of big tables, and then at the end have a where clause that removes 99% of the rows the database just created.
In Oracle, I do a:
SELECT *
FROM big_table bt
JOIN DUAL ON bt.best_filter_column='the_value'
--now there are only a few rows
JOIN other_table_1 ...
LEFT JOIN outer_join_tables ...
Include all OUTER JOINS last, since they don't drop any rows, so hopefully you've already filtered out a lot of rows.
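As a rough sketch of how the two extra counts from the question could be folded in: each count is pre-aggregated per board in its own derived table and then LEFT JOINed to board, so the outer joins come last and cannot multiply rows. The aliases and the output names numrepins and numfollowers are made up for illustration, and I'm assuming who_can_tag is a column of board.

```sql
-- Sketch only: the aliases, numrepins and numfollowers are illustrative names.
SELECT b.*, l.numlikes, r.numrepins, f.numfollowers
FROM board AS b
LEFT JOIN (
    -- likes per board, as in the original statement
    SELECT pins.board_id, COUNT(source_user_id) AS numlikes
    FROM likes
    INNER JOIN pins ON pins.id = likes.pin_id
    GROUP BY pins.board_id
) AS l ON l.board_id = b.id
LEFT JOIN (
    -- repins per board, grouped instead of filtered by '$id'
    SELECT pins.board_id, COUNT(owner_user_id) AS numrepins
    FROM repin
    INNER JOIN pins ON pins.id = repin.from_pin_id
    GROUP BY pins.board_id
) AS r ON r.board_id = b.id
LEFT JOIN (
    -- followers per board
    SELECT is_following_board_id AS board_id,
           COUNT(is_following_board_id) AS numfollowers
    FROM follow
    GROUP BY is_following_board_id
) AS f ON f.board_id = b.id
WHERE b.who_can_tag = ''
ORDER BY l.numlikes DESC
LIMIT 10;
```

Boards with no likes, repins or followers come back with NULL in the corresponding column; wrap the columns in COALESCE(..., 0) if you would rather see zeros.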
Related
The query below is grabbing some information about a category of toys and showing the most recent sale price for three levels of condition (e.g., Brand New, Used, Refurbished). The price for each sale is almost always different. One other thing: the sales table row IDs are not necessarily in chronological order (e.g., a toy with a sale id of 5 could have been sold later than a toy with a sale id of 10).
This query works but is not performant. It runs in a manageable amount of time, usually about 1s. However, I need to add yet another left join to include some more data, which causes the query time to balloon up to about 9s, no bueno.
Here is the working but nonperformant query:
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN (
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
) AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
But like I said it's slow. The sales table has about 200k rows.
What I tried to do was create the subquery as a view, e.g.,
CREATE VIEW sales_view AS
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
Then replace the subquery with the view, like
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN sales_view AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
Unfortunately, this change causes the query to no longer grab the most recent sale, and the sales price it returns is no longer the most recent.
Why is it that the table view doesn't return the same result as the same select as a subquery?
After reading just about every top-n-per-group stackoverflow question and blog article I could find, getting a query that actually worked was fantastic. But now that I need to extend the query one more step I'm running into performance issues. If anybody wants to sidestep the above question and offer some ways to optimize the original query, I'm all ears!
Thanks for any and all help.
The solution to the subquery performance issue was to use the answer provided here: Groupwise maximum
I thought that this approach could only be used when querying a single table, but indeed it works even when you've joined many other tables. You just have to left join the same table twice using the s.date_sold < s2.date_sold join condition and make sure the where clause looks for the null value in the second table's id column.
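For anyone landing here, a minimal sketch of what that double left join could look like against the tables in the question. It is trimmed to the sales part, and sale_id is a stand-in for the primary key of sales, whose real name isn't shown above.

```sql
-- Sketch only: sale_id is a hypothetical name for the sales primary key.
SELECT t.toy_id, t.toy_name, cp.catalog_product_id,
       s.condition_id, s.date_sold, s.sold_price
FROM toys AS t
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN sales AS s
       ON s.catalog_product_id = cp.catalog_product_id
      AND s.invalid = 0
      AND s.condition_id <= 3
-- s2 looks for a later valid sale of the same product in the same condition
LEFT JOIN sales AS s2
       ON s2.catalog_product_id = s.catalog_product_id
      AND s2.condition_id = s.condition_id
      AND s2.invalid = 0
      AND s.date_sold < s2.date_sold
WHERE t.toy_category_id = 1
  AND s2.sale_id IS NULL   -- no later sale was found, so s is the most recent
ORDER BY t.toy_id ASC, s.condition_id ASC;
```

Two caveats: if two sales of the same product and condition share the same date_sold, both rows are returned, and the result is one row per catalog product and condition rather than per toy. An index on sales covering catalog_product_id, condition_id and date_sold would probably help the self-join, but check with EXPLAIN.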
I have a query that fetches data from six tables, but it takes too much time to fetch the data. The browser keeps loading and sometimes shows nothing as a result. When I run this query directly in the MySQL database, it takes a long time to execute.
SELECT SQL_CALC_FOUND_ROWS movies.*,
curriculums.name AS curriculum,
teachers.name AS teacher,
movie_sub_categories.name AS sub_cat_name,
movie_categories.name AS cat_name
FROM movies
LEFT JOIN curriculums on movies.curriculum_id = curriculums.id
LEFT JOIN teachers on movies.teacher_id = teachers.id
LEFT JOIN movies_movie_sub_categories on movies.id = movies_movie_sub_categories.movie_id
LEFT JOIN movie_sub_categories on movies_movie_sub_categories.movie_sub_category_id = movie_sub_categories.id
LEFT JOIN movie_categories on movie_sub_categories.movie_category_id = movie_categories.id
ORDER BY id LIMIT 0, 50
Here is all of my table structure:
That's not a very exciting query -- it simply delivers the first 50 rows of whichever table id belongs to. When JOINing, please qualify columns so we know what is going on.
Do you really need LEFT?
Also, count the rows in movies just once and reuse that number, rather than recomputing it on every request with SQL_CALC_FOUND_ROWS.
Assuming you need LEFT and id belongs to movies, then this should run a lot faster:
SELECT movies.*, c.name AS curriculum,
t.name AS teacher, msc.name AS sub_cat_name,
mc.name AS cat_name
FROM ( SELECT id FROM movies ORDER BY id LIMIT 0, 50 ) AS m
JOIN movies USING(id)
LEFT JOIN curriculums AS c ON movies.curriculum_id = c.id
LEFT JOIN teachers AS t ON movies.teacher_id = t.id
LEFT JOIN movies_movie_sub_categories AS mmsc ON movies.id = mmsc.movie_id
LEFT JOIN movie_sub_categories AS msc ON mmsc.movie_sub_category_id = msc.id
LEFT JOIN movie_categories AS mc ON msc.movie_category_id = mc.id
ORDER BY m.id
Please use SHOW CREATE TABLE; we need to see if you have sufficient indexes, such as
mmsc: INDEX(movie_id)
The table movies_movie_sub_categories needs an index on movie_id and a separate index on movie_sub_category_id. Without those two indexes, the query optimizer will be forced to scan every record twice (since the query has two separate join clauses that reference that table).
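If SHOW CREATE TABLE shows those indexes are missing, something along these lines would add them. The index names are made up, and if the table already has a composite primary key starting with movie_id, the first index is redundant.

```sql
-- Index names are illustrative; pick whatever fits your naming scheme.
ALTER TABLE movies_movie_sub_categories
    ADD INDEX idx_mmsc_movie_id (movie_id),
    ADD INDEX idx_mmsc_sub_category_id (movie_sub_category_id);
```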
I've the following code, which pulls out the most recent row from a table called wwlassessments.
It works, but what I'm trying to do is show all the rows matching the WHERE criteria in the wwlstatements table, regardless of whether there's an entry in the wwlassessments table.
I've tried changing the 2nd JOIN to a LEFT JOIN, but this just provides thousands of inaccurate results.
I'm sure it's very simple, but I can't for the life of me work out what I need to change! Thanks in advance.
SELECT s.*,
a.*
FROM wwlstatements s
LEFT JOIN wwlassessments a ON a.id = s.id
JOIN (SELECT n.id,n.pupilID,
MAX(n.dateAchieved) AS max_achieved_date
FROM wwlassessments n
where n.pupilID='114631705547'
GROUP BY n.id) y ON y.id = a.id
AND y.max_achieved_date = a.dateAchieved
WHERE s.`category`='Reading'
ORDER BY s.`statementID` ASC
I have a query that looks like this:
select `adverts`.*
from `adverts`
inner join `advert_category` on `advert_category`.`advert_id` = `adverts`.`id`
inner join `advert_location` on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
and `advert_category`.`category_id` = ?
order by `updated_at` desc
The problem here is I have a huge database and this response is absolutely ravaging my database.
What I really need is to do the first join, and then do the where clause. This will whittle down my response from like 100k rows to less than 10k. Then I want to do the other join, in order to whittle down the response again so I can get the advert_location on the category items.
Doing it as is just isn't viable.
So, how do I go about using a join and a where condition, and then after getting that response doing a further join with a where condition?
Thanks
This is your query, written a bit simpler so I can read it:
select a.*
from adverts a inner join
advert_category ac
on ac.advert_id = a.id inner join
advert_location al
on al.advert_id = a.id
where al.location_id = ? and
ac.category_id = ?
order by a.updated_at desc;
I am speculating that advert_category and advert_location have multiple rows per advert. In that case, you are getting a Cartesian product for each advert.
A better way to write the query uses exists:
select a.*
from adverts a
where exists (select 1
from advert_location al
where al.advert_id = a.id and al.location_id = ?
) and
exists (select 1
from advert_category ac
where ac.advert_id = a.id and ac.category_id = ?
)
order by a.updated_at desc;
For this version, you want indexes on advert_location(advert_id, location_id), advert_category(advert_id, category_id), and probably adverts(updated_at, id).
You can write the 1st join in a Derived Table including a WHERE-condition and then do the 2nd join (but a decent optimizer might resolve the Derived Table again and do what it thinks is best based on statistics):
select adverts.*
from
(
select `adverts`.*
from `adverts`
inner join `advert_category`
on `advert_category`.`advert_id` =`adverts`.`id`
where `advert_category`.`category_id` = ?
) as adverts
inner join `advert_location`
on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
order by `updated_at` desc
MySQL will reorder inner joins for you during optimization, regardless of how you wrote them in your query. Inner join is the same in either direction (in algebra this is called commutative), so this is safe to do.
You can see the result of join reordering if you use EXPLAIN on your query.
If you don't like the order MySQL chose for your joins, you can override it with this kind of syntax:
from `adverts`
straight_join `advert_category` ...
https://dev.mysql.com/doc/refman/5.7/en/join.html says:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.
Once the optimizer has decided on the join order, it always does one join at a time, in that order. This is called the nested-loop join method.
There isn't really any way to "do the join then do the where clause". Conditions are combined together when looking up rows for joined tables. But this is a good thing, because you can then create a compound index that helps match rows based on both join conditions and where conditions.
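A sketch of what such compound indexes might look like for this query, followed by an EXPLAIN to see which join order and indexes the optimizer actually picks. The index names are made up, and the literals 1 and 2 only stand in for the ? placeholders.

```sql
-- Index names are illustrative. Each index leads with the column that is
-- filtered in WHERE, followed by the column used in the join condition.
ALTER TABLE advert_location
    ADD INDEX idx_al_location_advert (location_id, advert_id);
ALTER TABLE advert_category
    ADD INDEX idx_ac_category_advert (category_id, advert_id);

-- Inspect the chosen join order and index usage.
EXPLAIN
SELECT a.*
FROM adverts AS a
INNER JOIN advert_category AS ac ON ac.advert_id = a.id
INNER JOIN advert_location AS al ON al.advert_id = a.id
WHERE al.location_id = 1
  AND ac.category_id = 2
ORDER BY a.updated_at DESC;
```

Whether these are the best column orders depends on your data, so treat the EXPLAIN output as the deciding evidence rather than this sketch.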
PS: When asking a query optimization question, you should include the EXPLAIN output, and also run SHOW CREATE TABLE <tablename> for each table, and include the result. Then we don't have to guess at the columns and indexes in your table.
I have a movie database that has these tables: new_movies, ratings, critic_ratings, colors
I'm trying to execute this SELECT statement which will combine these 4 tables on the same movie using 'mid' (movie id):
SELECT DISTINCT
new_movies.*,
movies_db.*,
ratings.rating,
ratings.count,color,
critic_ratings.rating AS critic_ratings
FROM
new_movies
INNER JOIN
movies_db
ON
new_movies.mid = movies_db.mid
LEFT JOIN
ratings
ON
new_movies.mid = ratings.mid
LEFT JOIN
colors
ON
new_movies.mid = colors.mid
LEFT JOIN
critic_ratings
ON
new_movies.mid = critic_ratings.mid
ORDER BY
title ASC
But I get this error:
The SELECT would examine more than
MAX_JOIN_SIZE rows; check your WHERE
and use SET SQL_BIG_SELECTS=1 or SET
SQL_MAX_JOIN_SIZE=# if the SELECT is
okay
How do I properly do this query?
If you don't want to enable big selects, you could rewrite this using correlated sub-queries. (I don't know if you'll still hit the limit or not though.) Note that each sub-query must return at most one row per movie, otherwise MySQL raises an error.
SELECT DISTINCT
new_movies.*,
movies_db.*,
(SELECT rating FROM ratings WHERE new_movies.mid = ratings.mid) AS rating,
(SELECT count FROM ratings WHERE new_movies.mid = ratings.mid) AS rating_count,
(SELECT color FROM colors WHERE new_movies.mid = colors.mid) AS colour,
(SELECT rating FROM critic_ratings WHERE new_movies.mid = critic_ratings.mid) AS critic_ratings
FROM
new_movies
INNER JOIN
movies_db
ON new_movies.mid = movies_db.mid
ORDER BY
title ASC
Also, it's worth testing whether the LEFT JOINs are actually the cause. Can you execute the following?
SELECT DISTINCT
new_movies.*,
movies_db.*
FROM
new_movies
INNER JOIN
movies_db
ON new_movies.mid = movies_db.mid
ORDER BY
title ASC
Why do you have both a movies_db and a new_movies table? Surely a release date field would be sufficient for that, and it would cut out a join too.
To that end, I would create a view of that data and query that instead.
But back to your query:
SELECT DISTINCT
new_movies.*,
movies_db.*,
ratings.rating,
ratings.count,
color,
critic_ratings.rating AS critic_ratings
FROM
new_movies
INNER JOIN
movies_db
ON
new_movies.mid = movies_db.mid
LEFT JOIN
ratings
ON
new_movies.mid = ratings.mid
LEFT JOIN
colors
ON
new_movies.mid = colors.mid
LEFT JOIN
critic_ratings
ON
new_movies.mid = critic_ratings.mid
ORDER BY
title ASC
I can't see anything obvious... perhaps you can post the results of an explain query?
There is no problem with your query per se. It's just that you're selecting all movies (no WHERE, no LIMIT), and since you're joining ratings, for example, it will join all ratings to each movie. You are simply reaching the maximum number of rows allowed for joins.
I'm not sure why you'd need to select all movies. Perhaps you can use a limit. Otherwise you can just try the solutions in the error message.
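For reference, the settings named in the error message can be applied per session roughly like this. As far as I know, max_join_size is the name current MySQL versions use for what the message calls SQL_MAX_JOIN_SIZE, and the number below is an arbitrary example.

```sql
-- Option 1: switch the row-estimate check off for the current session only.
SET SESSION sql_big_selects = 1;

-- Option 2: keep the check but raise the limit (arbitrary example value).
-- Setting max_join_size to a non-default value turns sql_big_selects back off.
-- SET SESSION max_join_size = 100000000;
```

Either way this only lifts the guard; it does not make the query any cheaper, so a LIMIT or an index on mid in the joined tables is still worth considering.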