I have a table cities with columns id, name, region_id and a table orders with columns id, city_id.
I need to select all rows from orders whose city belongs to a given region_id.
With EXISTS:
SELECT `o`.`id`
FROM `orders` as `o`
WHERE EXISTS (SELECT `id`
FROM `cities`
WHERE `id` = `o`.`city_id` AND `region_id` = ".$region_id.")
With JOIN:
SELECT `o`.`id`
FROM `orders` as `o`
LEFT JOIN `cities` as `c` ON `o`.`city_id` = `c`.`id`
WHERE `c`.`region_id` = ".$region_id."
What is better in this case?
I would say JOIN is the better one due to readability. Fair enough, the EXISTS one is easy to understand, but at a quick glance the JOIN has less repetition and is easier to read.
If you are coming at this from a performance perspective, I would advise you to benchmark it on your own system.
I would also recommend taking advantage of PDO if you are dealing with user input.
You should learn to use parameters, so you don't munge query strings. Stuffing parameter values into a query string can lead to unexpected syntax errors and SQL injection vulnerabilities.
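To illustrate the point about parameters, here is a minimal sketch. It is not the asker's PHP/PDO code: it uses Python's built-in sqlite3 module and made-up sample rows purely so that it runs on its own, and the helper name orders_in_region is hypothetical. The idea carries over to PDO placeholders: the value is bound separately and is never pasted into the SQL text.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, city_id INTEGER);
    INSERT INTO cities VALUES (1, 'A', 10), (2, 'B', 20);
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 1);
""")


def orders_in_region(conn, region_id):
    """Return order ids for a region; region_id is bound, not concatenated."""
    sql = """
        SELECT o.id
        FROM orders o
        JOIN cities c ON o.city_id = c.id
        WHERE c.region_id = ?
        ORDER BY o.id
    """
    return [row[0] for row in conn.execute(sql, (region_id,))]


print(orders_in_region(conn, 10))
# A hostile string is compared as a plain value; it cannot change the query.
print(orders_in_region(conn, "10 OR 1=1"))
```

With string concatenation, the second call would have turned the WHERE clause into a condition that matches every row.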
The LEFT JOIN is superfluous. Your WHERE clause turns the outer join into an inner join:
SELECT o.id
FROM orders o JOIN
cities c
ON o.city_id = c.id
WHERE c.region_id = ".$region_id."
(I removed the backticks. They are unnecessary so they only serve to clutter the query.)
As to which is better, the two can do different things. The JOIN version can return multiple rows for a given order if cities has multiple rows for the same id. Admittedly, this is unlikely with a column called id, but the semantics are slightly in favor of EXISTS. Note: this is only because you are not selecting any other columns from c. If you were, the JOIN would be a no-brainer.
For the most part, the choice is a matter of taste. You might want to run both versions, check the execution plans, and see which is faster on your data.
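The difference in semantics can be seen with a small sketch. It assumes a contrived cities table that has no primary key and contains a repeated id (unlikely in practice, as noted), and it runs on Python's sqlite3 rather than MySQL just to be self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Contrived: no primary key on cities.id, so the same id appears twice.
    CREATE TABLE cities (id INTEGER, name TEXT, region_id INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, city_id INTEGER);
    INSERT INTO cities VALUES (1, 'York', 7), (1, 'York (duplicate)', 7), (2, 'Leeds', 8);
    INSERT INTO orders VALUES (100, 1), (101, 2);
""")

# JOIN: one output row per matching (order, city) pair.
join_ids = [r[0] for r in conn.execute("""
    SELECT o.id
    FROM orders o
    JOIN cities c ON o.city_id = c.id
    WHERE c.region_id = ?
""", (7,))]

# EXISTS: at most one output row per order, however many cities match.
exists_ids = [r[0] for r in conn.execute("""
    SELECT o.id
    FROM orders o
    WHERE EXISTS (SELECT 1
                  FROM cities c
                  WHERE c.id = o.city_id AND c.region_id = ?)
""", (7,))]

print(join_ids, exists_ids)
```

If cities.id is a real primary key, both forms return the same rows and the choice is back to taste and the execution plan.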
I am studying select queries for MySQL join functions.
I am slightly confused by the query below. My understanding is that it joins attributes from multiple tables with the ON clause and then filters the result set with the WHERE clause.
Is this correct? What other functionality does this provide? Are there better alternatives?
The tables, attributes, and schema are not relevant to this question, specifically just the ON and WHERE interaction. Thanks in advance for any insight you can provide, appreciated.
SELECT DISTINCT Movies.title
FROM Rentals
INNER JOIN Customers
INNER JOIN Copies
INNER JOIN Movies ON Rentals.customerNum=Customers.customerNum
AND Rentals.barcode=Copies.barcode
AND Copies.movieNum=Movies.movieNum
WHERE Customers.givenName='Chad'
AND Customers.familyName='Black';
INNER JOIN (and the outer joins) are binary operators that should be followed by an ON clause. Your particular syntax works in MySQL, but will not work in any other database (because it is missing two ON clauses).
I would recommend writing the query as:
SELECT DISTINCT m.title
FROM Movies m JOIN
Copies co
ON co.movieNum = m.movieNum JOIN
Rentals r
ON r.barcode = co.barcode JOIN
Customers c
ON c.customerNum = r.customerNum
WHERE c.givenName = 'Chad' AND
c.familyName = 'Black';
You should always put the JOIN conditions in the ON clause, with one ON per JOIN. This also introduces table aliases, which make the query easier to write and to read.
The WHERE clause has additional filtering conditions. These could also be in ON clauses, but I think the query reads better with them in the WHERE clause. You can glance at the query and see: "We are getting something from a bunch of tables for Chad Black".
Ordinary inner JOIN operations only generate result rows for table rows matching their ON condition. They suppress any rows that don't match. That means you can move the contents of ON clauses to your WHERE clause and get the same result set. Still, don't do that; JOINs are easier to understand when they have ON clauses.
If you use LEFT JOIN, a kind of outer join, you get rows from the first table you mention that don't match any rows in the second table according to the ON clause.
SELECT a.name, b.name
FROM a
LEFT JOIN b ON a.a_id = b.a_id
gives you a result set containing all rows of a, with NULL values in b.name wherever the ON condition did not match.
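A small sketch of that behaviour, using made-up rows for a and b and Python's sqlite3 so it can be run as is. It also shows why a filter on the second table behaves differently in WHERE than in ON: in WHERE it discards the NULL-extended rows, so the LEFT JOIN acts like an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (a_id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 'b1');
""")

# Plain LEFT JOIN: every row of a survives; unmatched rows get NULL for b.name.
left_rows = conn.execute("""
    SELECT a.name, b.name
    FROM a
    LEFT JOIN b ON a.a_id = b.a_id
    ORDER BY a.a_id
""").fetchall()

# Filter on b in WHERE: the NULL-extended row fails the test and disappears.
where_rows = conn.execute("""
    SELECT a.name, b.name
    FROM a
    LEFT JOIN b ON a.a_id = b.a_id
    WHERE b.name = 'b1'
    ORDER BY a.a_id
""").fetchall()

# Same filter in ON: it only decides which b rows match; all rows of a remain.
on_rows = conn.execute("""
    SELECT a.name, b.name
    FROM a
    LEFT JOIN b ON a.a_id = b.a_id AND b.name = 'no such name'
    ORDER BY a.a_id
""").fetchall()

print(left_rows, where_rows, on_rows)
```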
I'm creating a product filter for an e-commerce store. I have a product table, a characteristics table and a table in which I store product_id, characteristic_id and a single filter value.
shop_products - id, name
shop_characteristics - id, values (json)
shop_values - product_id, characteristic_id, value
I can build a query to get all the products by a single value like this:
SELECT `p`.* FROM `shop_products` `p`
LEFT JOIN `shop_values` `fv` ON `p`.`id` = `fv`.`product_id`
WHERE ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='outdoor'))
It works fine. I can also modify this query to get all the products matching multiple values that belong to the very same characteristics group (have an identical characteristic_id) like this:
SELECT `p`.* FROM `shop_products` `p`
LEFT JOIN `shop_values` `fv` ON `p`.`id` = `fv`.`product_id`
WHERE ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='outdoor'))
OR ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='indoor'))
But when I try to create a query for multiple conditions with different characteristic_id values, I get nothing:
SELECT `p`.* FROM `shop_products` `p`
LEFT JOIN `shop_values` `fv` ON `p`.`id` = `fv`.`product_id`
WHERE ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='outdoor'))
AND ((`fv`.`characteristic_id`=5) AND (`fv`.`value`='white'))
My guess is that it does not work because I am using the AND operator wrongly here: no single record in the shop_values table has both characteristic_id 3 and characteristic_id 5.
So my question is how to combine or modify my query to get all related products. Or is it a flaw to store data like this, and do I need a different kind of shop_values table?
Use aggregation. You can also use tuples with the in clause. So:
SELECT p.*
FROM shop_products p JOIN
shop_values v
ON p.id = v.product_id
WHERE (v.characteristic_id, v.value) IN ( (3, 'outdoor'), (5, 'white'))
GROUP BY p.id
HAVING COUNT(DISTINCT v.characteristic_id) = 2;
Notes:
Unnecessarily escaping column and table aliases (with backticks) just makes the query harder to write and to read.
In general, using SELECT p.* and GROUP BY p.id is really, really bad form. The one exception is when you are grouping by a unique or primary key. This latter form is actually supported in the ANSI standard.
A LEFT JOIN is not needed. You need to find matches between the tables for the logic to work.
The use of AND and OR is fine for the WHERE clause. MySQL happens to support tuples with IN, which somewhat simplifies the logic.
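Here is a rough, runnable sketch of the aggregation approach with invented products. It uses Python's sqlite3 so it is self-contained, and it spells the filter with AND/OR instead of the tuple IN form, since the tuple syntax shown above is the MySQL one. The naive AND query is included to show why it returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop_products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shop_values (product_id INTEGER, characteristic_id INTEGER, value TEXT);
    -- Hypothetical sample data.
    INSERT INTO shop_products VALUES (1, 'tent'), (2, 'lamp'), (3, 'chair');
    INSERT INTO shop_values VALUES
        (1, 3, 'outdoor'), (1, 5, 'white'),
        (2, 3, 'outdoor'), (2, 5, 'black'),
        (3, 3, 'indoor'),  (3, 5, 'white');
""")

# Naive version: no single shop_values row can satisfy both conditions.
naive = conn.execute("""
    SELECT p.id
    FROM shop_products p
    JOIN shop_values v ON p.id = v.product_id
    WHERE (v.characteristic_id = 3 AND v.value = 'outdoor')
      AND (v.characteristic_id = 5 AND v.value = 'white')
""").fetchall()

# Aggregation: keep rows matching either condition, then require that
# both characteristics were matched for the same product.
matched = conn.execute("""
    SELECT p.id, p.name
    FROM shop_products p
    JOIN shop_values v ON p.id = v.product_id
    WHERE (v.characteristic_id = 3 AND v.value = 'outdoor')
       OR (v.characteristic_id = 5 AND v.value = 'white')
    GROUP BY p.id
    HAVING COUNT(DISTINCT v.characteristic_id) = 2
""").fetchall()

print(naive, matched)
```

The number in the HAVING clause has to equal the number of distinct characteristics being filtered on, so it should be generated together with the conditions.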
I have a query that looks like this:
select `adverts`.*
from `adverts`
inner join `advert_category` on `advert_category`.`advert_id` = `adverts`.`id`
inner join `advert_location` on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
and `advert_category`.`category_id` = ?
order by `updated_at` desc
The problem here is that I have a huge database and this query is absolutely ravaging it.
What I really need is to do the first join and then apply the where clause. This will whittle down my result from something like 100k rows to less than 10k. Then I want to do the other join, in order to whittle down the result again so I can get the advert_location on the category items.
Doing it as is just isn't viable.
So, how do I go about using a join and a where condition, and then after getting that response doing a further join with a where condition?
Thanks
This is your query, written a bit simpler so I can read it:
select a.*
from adverts a inner join
advert_category ac
on ac.advert_id = a.id inner join
advert_location al
on al.advert_id = a.id
where al.location_id = ? and
ac.category_id = ?
order by a.updated_at desc;
I am speculating that advert_category and advert_location have multiple rows per advert. In that case, you are getting a Cartesian product for each advert.
A better way to write the query uses exists:
select a.*
from adverts a
where exists (select 1
from advert_location al
where al.advert_id = a.id and al.location_id = ?
) and
exists (select 1
from advert_category ac
where ac.advert_id = a.id and ac.category_id = ?
)
order by a.updated_at desc;
For this version, you want indexes on advert_location(advert_id, location_id), advert_category(advert_id, category_id), and probably adverts(updated_at, id).
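To make the speculation concrete, here is a small sketch. It assumes, hypothetically, that the link tables have no unique constraint and contain repeated rows for the same advert, and it runs on Python's sqlite3 rather than MySQL so that it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adverts (id INTEGER PRIMARY KEY, updated_at TEXT);
    -- Hypothetical link tables without unique constraints, so rows can repeat.
    CREATE TABLE advert_category (advert_id INTEGER, category_id INTEGER);
    CREATE TABLE advert_location (advert_id INTEGER, location_id INTEGER);
    INSERT INTO adverts VALUES (1, '2020-01-01'), (2, '2020-01-02');
    INSERT INTO advert_category VALUES (1, 5), (1, 5), (2, 5);
    INSERT INTO advert_location VALUES (1, 9), (1, 9), (2, 8);
    -- The indexes suggested for the EXISTS version.
    CREATE INDEX idx_al ON advert_location (advert_id, location_id);
    CREATE INDEX idx_ac ON advert_category (advert_id, category_id);
""")

params = (9, 5)  # location_id, category_id

# Two joins: each advert is repeated (matching categories x matching locations) times.
join_ids = [r[0] for r in conn.execute("""
    SELECT a.id
    FROM adverts a
    JOIN advert_category ac ON ac.advert_id = a.id
    JOIN advert_location al ON al.advert_id = a.id
    WHERE al.location_id = ? AND ac.category_id = ?
    ORDER BY a.updated_at DESC
""", params)]

# EXISTS: each advert appears at most once.
exists_ids = [r[0] for r in conn.execute("""
    SELECT a.id
    FROM adverts a
    WHERE EXISTS (SELECT 1 FROM advert_location al
                  WHERE al.advert_id = a.id AND al.location_id = ?)
      AND EXISTS (SELECT 1 FROM advert_category ac
                  WHERE ac.advert_id = a.id AND ac.category_id = ?)
    ORDER BY a.updated_at DESC
""", params)]

print(join_ids, exists_ids)
```

If each (advert_id, category_id) and (advert_id, location_id) pair is unique, the join version does not multiply rows, and the choice comes down to the execution plan.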
You can write the first join in a derived table that includes a WHERE condition and then do the second join (but a decent optimizer might unfold the derived table again and do what it thinks is best based on statistics):
select adverts.*
from
(
select `adverts`.*
from `adverts`
inner join `advert_category`
on `advert_category`.`advert_id` =`adverts`.`id`
where `advert_category`.`category_id` = ?
) as adverts
inner join `advert_location`
on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
order by `updated_at` desc
MySQL will reorder inner joins for you during optimization, regardless of how you wrote them in your query. Inner join is the same in either direction (in algebra this is called commutative), so this is safe to do.
You can see the result of join reordering if you use EXPLAIN on your query.
If you don't like the order MySQL chose for your joins, you can override it with this kind of syntax:
from `adverts`
straight_join `advert_category` ...
https://dev.mysql.com/doc/refman/5.7/en/join.html says:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.
Once the optimizer has decided on the join order, it always does one join at a time, in that order. This is called the nested-loop join method.
There isn't really any way to "do the join then do the where clause". Conditions are combined together when looking up rows for joined tables. But this is a good thing, because you can then create a compound index that helps match rows based on both join conditions and where conditions.
PS: When asking a query optimization question, you should include the EXPLAIN output, and also run SHOW CREATE TABLE <tablename> for each table and include the result. Then we don't have to guess at the columns and indexes in your tables.
I have two tables called profiles and details. The details table has index on the column city. Here is my query:
select *
from profiles p
left outer join details d
use index(details_city)
on (p.id = d.pid)
where (d.city = ‘york’ or p.city = 'york')
order by p.id
When I do the explain on it, I can see that index from details table on city column is not even being used.
Is there any restriction in MySQL that stops it from using indexes in such cases?
The OR wrecks all attempts at optimizing. This should work much faster, especially if the tables are large:
SELECT p.*, d.*
FROM (
-- Get what you can from `profiles`:
( SELECT id
FROM profiles
WHERE city = 'york'
)
UNION DISTINCT
-- Get what you can by starting from `details`:
( SELECT p.id
FROM profiles p
JOIN details d ON (p.id = d.pid)
WHERE d.city = 'york' )
) AS u
JOIN profiles p ON p.id = u.id
LEFT JOIN details d ON d.pid = p.id
ORDER BY p.id
Each inner SELECT will use a different index, hence can be optimized. You will need these indexes:
d: INDEX(city, pid), INDEX(pid)
p: PRIMARY KEY(id), INDEX(city, id)
And you should not need any form of USE INDEX.
(And don't use the funny apostrophes: ‘york’.)
(OUTER is optional and has no impact.)
(If you need city LIKE '%york%', consider FULLTEXT instead.)
Why, pray tell, do you have city in both tables?!? Fixing that may lead to the real solution.
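As a sanity check on the rewrite, the sketch below runs both forms on a few invented rows and compares the results. It uses Python's sqlite3 so it is self-contained. SQLite does not accept the parenthesised SELECTs or UNION DISTINCT, so the union is written with plain UNION, which also removes duplicates. It also assumes at most one details row per profile; with several, the two forms can return different details rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profiles (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE details (pid INTEGER, city TEXT);
    -- Hypothetical sample data; profile 3 has no details row.
    INSERT INTO profiles VALUES (1, 'york'), (2, 'leeds'), (3, 'bath'), (4, 'york');
    INSERT INTO details VALUES (1, 'leeds'), (2, 'york'), (4, 'york');
    -- Indexes along the lines suggested above.
    CREATE INDEX details_city_pid ON details (city, pid);
    CREATE INDEX details_pid ON details (pid);
    CREATE INDEX profiles_city_id ON profiles (city, id);
""")

# Original form: OR across columns of two different tables.
or_rows = conn.execute("""
    SELECT p.id, p.city, d.city
    FROM profiles p
    LEFT JOIN details d ON p.id = d.pid
    WHERE d.city = 'york' OR p.city = 'york'
    ORDER BY p.id
""").fetchall()

# Rewritten form: collect ids from each table separately, then join back.
union_rows = conn.execute("""
    SELECT p.id, p.city, d.city
    FROM (
        SELECT id FROM profiles WHERE city = 'york'
        UNION
        SELECT p.id FROM profiles p JOIN details d ON p.id = d.pid
        WHERE d.city = 'york'
    ) AS u
    JOIN profiles p ON p.id = u.id
    LEFT JOIN details d ON d.pid = p.id
    ORDER BY p.id
""").fetchall()

print(or_rows == union_rows, union_rows)
```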
I suspect you would see the same behavior with an inner join. In this statement, the predicate in the WHERE clause negates the "outerness" of the LEFT JOIN. I don't think it has anything to do with the LEFT JOIN.
With the predicate in the WHERE clause... city LIKE '%...', MySQL can't use an index range scan operation. It has to evaluate the value of city for every row in the table (or every row that isn't otherwise filtered out.)
Plus, you're returning every column from the details table, and MySQL can't satisfy that using just an index, it's going to have to visit pages in the underlying table to get the values of those columns.
MySQL is judging a different access plan to have a lower cost than using the index with a leading column of city. There's an equality comparison in the join predicate (p.id = d.pid), and MySQL can use an index with a leading column of pid to satisfy that.
The index most likely to be beneficial to this query is a composite index:
... ON details (pid, city)
I have five tables in my database. Members, items, comments, votes and countries. I want to get 10 items. I want to get the count of comments and votes for each item. I also want the member that submitted each item, and the country they are from.
After posting here and elsewhere, I started using subselects to get the counts, but this query is taking 10 seconds or more!
SELECT `items_2`.*,
(SELECT COUNT(*)
FROM `comments`
WHERE (comments.Script = items_2.Id)
AND (comments.Active = 1))
AS `Comments`,
(SELECT COUNT(votes.Member)
FROM `votes`
WHERE (votes.Script = items_2.Id)
AND (votes.Active = 1))
AS `votes`,
`countrys`.`Name` AS `Country`
FROM `items` AS `items_2`
INNER JOIN `members` ON items_2.Member=members.Id AND members.Active = 1
INNER JOIN `members` AS `members_2` ON items_2.Member=members.Id
LEFT JOIN `countrys` ON countrys.Id = members.Country
GROUP BY `items_2`.`Id`
ORDER BY `Created` DESC
LIMIT 10
My question is whether this is the right way to do this, whether there's a better way to write this statement, or whether there's a whole different approach that would be better. Should I run the subselects separately and aggregate the information?
Yes, you can rewrite the subqueries as aggregate joins (see below), but I am almost certain that the slowness is due to missing indices rather than to the query itself. Use EXPLAIN to see what indices you can add to make your query run in a fraction of a second.
For the record, here is the aggregate join equivalent.
SELECT `items_2`.*,
c.cnt AS `Comments`,
v.cnt AS `votes`,
`countrys`.`Name` AS `Country`
FROM `items` AS `items_2`
INNER JOIN `members` ON items_2.Member=members.Id AND members.Active = 1
INNER JOIN `members` AS `members_2` ON items_2.Member=members.Id
LEFT JOIN (
SELECT Script, COUNT(*) AS cnt
FROM `comments`
WHERE Active = 1
GROUP BY Script
) AS c
ON c.Script = items_2.Id
LEFT JOIN (
SELECT votes.Script, COUNT(*) AS cnt
FROM `votes`
WHERE Active = 1
GROUP BY Script
) AS v
ON v.Script = items_2.Id
LEFT JOIN `countrys` ON countrys.Id = members.Country
GROUP BY `items_2`.`Id`
ORDER BY `Created` DESC
LIMIT 10
However, because you are using LIMIT 10, you are almost certainly as well off (or better off) with the subqueries that you currently have than with the aggregate join equivalent I provided above for reference.
This is because a bad optimizer (and MySQL's is far from stellar) could, in the case of the aggregate join query, end up performing the COUNT(*) aggregation over the full contents of the comments and votes tables before wastefully throwing away everything but 10 values (your LIMIT). In the case of your original query it will, from the start, only look at the strict minimum as far as the comments and votes tables are concerned.
More precisely, using subqueries in the way that your original query does typically results in what is called nested loops with index lookups. Using aggregate joins typically results in merge or hash joins with index scans or table scans. The former (nested loops) are more efficient than the latter (merge and hash joins) when the number of loops is small (10 in your case.) The latter, however, get more efficient when the former would result in too many loops (tens/hundreds of thousands or more), especially on systems with slow disks but lots of memory.
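For reference, here is a cut-down, runnable comparison of the two shapes on invented data, using Python's sqlite3 and only the items, comments and votes tables. It says nothing about speed. It only checks that the results agree, and it points out one detail: with LEFT JOINs to the aggregated subqueries, items that have no comments or votes get NULL rather than 0, so this sketch wraps the counts in COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (Id INTEGER PRIMARY KEY, Created TEXT);
    CREATE TABLE comments (Script INTEGER, Active INTEGER);
    CREATE TABLE votes (Script INTEGER, Member INTEGER, Active INTEGER);
    -- Hypothetical sample data.
    INSERT INTO items VALUES (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-03');
    INSERT INTO comments VALUES (1, 1), (1, 1), (1, 0), (2, 1);
    INSERT INTO votes VALUES (1, 11, 1), (3, 12, 1), (3, 13, 0);
""")

# Correlated subqueries: one small lookup per returned item.
subquery_rows = conn.execute("""
    SELECT i.Id,
           (SELECT COUNT(*) FROM comments c
            WHERE c.Script = i.Id AND c.Active = 1) AS Comments,
           (SELECT COUNT(*) FROM votes v
            WHERE v.Script = i.Id AND v.Active = 1) AS Votes
    FROM items i
    ORDER BY i.Created DESC
    LIMIT 10
""").fetchall()

# Aggregate joins: counts are computed per Script, then joined to items.
join_rows = conn.execute("""
    SELECT i.Id, COALESCE(c.cnt, 0) AS Comments, COALESCE(v.cnt, 0) AS Votes
    FROM items i
    LEFT JOIN (SELECT Script, COUNT(*) AS cnt
               FROM comments WHERE Active = 1 GROUP BY Script) AS c
           ON c.Script = i.Id
    LEFT JOIN (SELECT Script, COUNT(*) AS cnt
               FROM votes WHERE Active = 1 GROUP BY Script) AS v
           ON v.Script = i.Id
    ORDER BY i.Created DESC
    LIMIT 10
""").fetchall()

print(subquery_rows == join_rows, subquery_rows)
```

Which of the two is faster on real data is best decided by looking at EXPLAIN and timing both, as suggested above.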