SQL query with JOIN - mysql

I'm creating a product filter for an e-commerce store. I have a products table, a characteristics table, and a table in which I store product_id, characteristic_id and a single filter value.
shop_products - id, name
shop_characteristics - id, values (json)
shop_values - product_id, characteristic_id, value
I can build a query to get all the products by a single value like this:
SELECT `p`.* FROM `shop_products` `p`
LEFT JOIN `shop_values` `fv` ON `p`.`id` = `fv`.`product_id`
WHERE ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='outdoor'))
It works fine. I can also modify this query to get all the products by multiple values that belong to the same characteristics group (have an identical characteristic_id) like this:
SELECT `p`.* FROM `shop_products` `p`
LEFT JOIN `shop_values` `fv` ON `p`.`id` = `fv`.`product_id`
WHERE ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='outdoor'))
OR ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='indoor'))
But when I try to create a query for multiple conditions with different characteristic_id values, I get nothing:
SELECT `p`.* FROM `shop_products` `p`
LEFT JOIN `shop_values` `fv` ON `p`.`id` = `fv`.`product_id`
WHERE ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='outdoor'))
AND ((`fv`.`characteristic_id`=5) AND (`fv`.`value`='white'))
My guess is that it does not work because I am using the AND operator wrongly here: no single record in the shop_values table has both characteristic_id 3 and characteristic_id 5.
So my question is: how do I combine or modify my query to get all related products? Or is it a design flaw to store data like this, and do I need a different kind of shop_values table?

Use aggregation. You can also use tuples with the in clause. So:
SELECT p.*
FROM shop_products p JOIN
shop_values v
ON p.id = v.product_id
WHERE (v.characteristic_id, v.value) IN ( (3, 'outdoor'), (5, 'white'))
GROUP BY p.id
HAVING COUNT(DISTINCT v.characteristic_id) = 2;
Notes:
Unnecessarily escaping column and table aliases (with backticks) just makes the query harder to write and to read.
In general, using SELECT p.* and GROUP BY p.id is really, really bad form. The one exception is when you are grouping by a unique or primary key. This latter form is actually supported in the ANSI standard.
A LEFT JOIN is not needed. You need to find matches between the tables for the logic to work.
The use of AND and OR is fine for the WHERE clause. MySQL happens to support tuples with IN, which somewhat simplifies the logic.
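To see the aggregation approach run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and the filters list are made up for illustration, and the tuple IN is spelled out as OR-ed pairs because SQLite's row-value syntax differs from MySQL's. The HAVING count is the number of distinct characteristics being filtered on, so several values for the same characteristic still count once.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop_products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shop_values (product_id INTEGER, characteristic_id INTEGER, value TEXT);
    INSERT INTO shop_products VALUES (1, 'lamp A'), (2, 'lamp B'), (3, 'lamp C');
    INSERT INTO shop_values VALUES
        (1, 3, 'outdoor'), (1, 5, 'white'),
        (2, 3, 'outdoor'), (2, 5, 'black'),
        (3, 3, 'indoor'),  (3, 5, 'white');
""")

# Hypothetical filter selection: one (characteristic_id, value) pair per choice.
filters = [(3, "outdoor"), (5, "white")]

# Build "(characteristic_id = ? AND value = ?) OR ..." with bound parameters.
pair_clause = " OR ".join(
    "(v.characteristic_id = ? AND v.value = ?)" for _ in filters
)
sql = f"""
    SELECT p.id, p.name
    FROM shop_products p
    JOIN shop_values v ON p.id = v.product_id
    WHERE {pair_clause}
    GROUP BY p.id, p.name
    HAVING COUNT(DISTINCT v.characteristic_id) = ?
"""

# A product must match every distinct characteristic that was filtered on.
required = len({cid for cid, _ in filters})
params = [x for pair in filters for x in pair] + [required]

matching = conn.execute(sql, params).fetchall()
print(matching)
```

Only the product that is both outdoor and white survives the HAVING clause; products matching a single pair are dropped.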

Related

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, what count_vote and what r.text_c do you expect to get for the ID? You don't tell the DBMS (which would be something like MAX(count_vote) and MIN(r.text_c), for instance). MySQL should throw an error, but silently replaces count_vote, r.text_c by ANY_VALUE(count_vote), ANY_VALUE(r.text_c) instead. This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
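For illustration only: if the intent was "the highest-voted review per fruit, ties broken by the lowest r_id" (a guess based on the ORDER BY, not something the question states), a deterministic version could look like the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL, the column lists and sample rows are assumptions, and the Categories table is left out because its relation to the other tables is unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schemas; only the columns used by the original query are modelled.
conn.executescript("""
    CREATE TABLE Fruits (fruit_id INTEGER PRIMARY KEY, fruit_name TEXT);
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Reviews (r_id INTEGER PRIMARY KEY, fruit_id INTEGER,
                          customer_id INTEGER, count_vote INTEGER, text_c TEXT);
    INSERT INTO Fruits VALUES (1, 'apple'), (2, 'pear');
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Reviews VALUES
        (1, 1, 1, 5, 'crisp'), (2, 1, 2, 9, 'sweet'), (3, 1, 1, 9, 'juicy'),
        (4, 2, 2, 3, 'soft');
""")

# Keep a review only if no other review of the same fruit beats it:
# more votes wins, and on equal votes the lower r_id wins.
top_review_sql = """
    SELECT b.fruit_name, u.name, r.count_vote, r.text_c
    FROM Fruits b
    JOIN Reviews r ON r.fruit_id = b.fruit_id
    JOIN Customers u ON u.customer_id = r.customer_id
    WHERE NOT EXISTS (
        SELECT 1
        FROM Reviews better
        WHERE better.fruit_id = r.fruit_id
          AND (better.count_vote > r.count_vote
               OR (better.count_vote = r.count_vote AND better.r_id < r.r_id))
    )
    ORDER BY b.fruit_id
"""

top_reviews = conn.execute(top_review_sql).fetchall()
print(top_reviews)
```

Unlike GROUP BY without aggregates, this states explicitly which row is wanted per fruit, so the result does not depend on how the DBMS happens to pick values.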
Your Categories table does not seem to be joined/related to the others, so this produces a Cartesian product between all the rows.
If you want distinct results, don't use GROUP BY but DISTINCT, so you can avoid an unnecessary subquery,
and you don't need an ORDER BY in a subquery.
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT distinct fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* Your Categories table seems not joined/related to the others */
WHERE category = "Fruits";
For better readability you should use explicit join syntax and avoid the old join syntax based on comma-separated table names and WHERE conditions.
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate Cartesian join. It's also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.

MYSQL server: Write a query to display cars which was not taken on rent

Use Cars and Rentals tables to retrieve records.
CARS(car_id, car_name, car_type)
RENTALS(rental_id, cust_id, car_id, pickup_date, km, fare)
SELECT c.car_id, c.car_name, c.car_type
FROM cars as c, rentals as r
WHERE c.car_id=r.car_id and r.pickup_date=null
ORDER BY c.car_id;
I've tried this but output is NO ROWS SELECTED
I would recommend NOT EXISTS for this query:
select c.*
from cars c
where not exists (select 1 from rentals r where r.car_id = c.car_id);
That said, your query has multiple errors:
The comma in the FROM clause is very last-century. Use JOIN.
= NULL is always going to filter out all rows. Almost all comparisons to NULL return NULL which is treated as "false". The correct comparison is IS NULL, but I don't think that is needed.
You can specify equivalent logic using LEFT JOIN, but I think NOT EXISTS is closer to the statement of the question.
Your original intent was to join the two tables and filter on rows that did not match on the right side of the join. This technique is sometimes called an anti-join.
Your attempt failed because you need a left join rather than an (implicit) inner join, and because you did not properly check for nullity (this requires the IS NULL operator).
The left join solution can be phrased as:
select c.*
from cars c
left join rentals r on r.car_id = c.car_id
where r.car_id is null
order by c.car_id
Note that I check for nullity on column car_id rather than pickup_date - any column that is not nullable can do, however I find the intent clearer when using the joining column.
A simple way to do this is with NOT IN:
SELECT car_id, car_name, car_type
FROM cars
WHERE car_id NOT IN (SELECT car_id FROM rentals)
ORDER BY car_id;
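The three variants above can be compared side by side. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL, with made-up sample rows. It also shows one caveat of NOT IN: if rentals.car_id can be NULL (an assumption, the question does not say), NOT IN returns no rows at all, while NOT EXISTS and the LEFT JOIN form are unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (car_id INTEGER PRIMARY KEY, car_name TEXT, car_type TEXT);
    CREATE TABLE rentals (rental_id INTEGER PRIMARY KEY, cust_id INTEGER,
                          car_id INTEGER, pickup_date TEXT, km INTEGER, fare REAL);
    INSERT INTO cars VALUES (1, 'Civic', 'sedan'), (2, 'Jimny', 'suv'),
                            (3, 'Polo', 'hatchback');
    INSERT INTO rentals VALUES (10, 100, 1, '2020-01-05', 120, 45.0);
""")

queries = {
    "not_exists": """
        SELECT c.car_id FROM cars c
        WHERE NOT EXISTS (SELECT 1 FROM rentals r WHERE r.car_id = c.car_id)
        ORDER BY c.car_id
    """,
    "left_join": """
        SELECT c.car_id FROM cars c
        LEFT JOIN rentals r ON r.car_id = c.car_id
        WHERE r.car_id IS NULL
        ORDER BY c.car_id
    """,
    "not_in": """
        SELECT car_id FROM cars
        WHERE car_id NOT IN (SELECT car_id FROM rentals)
        ORDER BY car_id
    """,
}

def unrented(sql):
    """Return the car_ids produced by one of the queries above."""
    return [row[0] for row in conn.execute(sql)]

# With clean data all three forms agree.
before = {name: unrented(sql) for name, sql in queries.items()}

# A rental row with a NULL car_id makes every NOT IN comparison evaluate to NULL.
conn.execute("INSERT INTO rentals VALUES (11, 101, NULL, '2020-02-01', 0, 0.0)")
after = {name: unrented(sql) for name, sql in queries.items()}

print(before)
print(after)
```

If car_id in rentals is declared NOT NULL, the three forms are interchangeable in terms of results and the choice comes down to readability and the execution plan.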

MySQL EXISTS or JOIN

I have a table cities with columns id, name, region_id and a table orders with columns id, city_id.
I need to select all rows from orders whose city has a given region_id.
With EXISTS:
SELECT `o`.`id`
FROM `orders` as `o`
WHERE EXISTS (SELECT `id`
FROM `cities`
WHERE `id` = `o`.`city_id` AND `region_id` = ".$region_id.")
With JOIN:
SELECT `o`.`id`
FROM `orders` as `o`
LEFT JOIN `cities` as `c` ON `o`.`city_id` = `c`.`id`
WHERE `c`.`region_id` = ".$region_id."
What is better in this case?
I would say JOIN is the better one due to readability. Fair enough, the EXISTS one is easy to understand, but at a quick glance the JOIN has less repetition and is easier to read.
If you are coming at this from a performance perspective, I would advise you to benchmark it on your own system.
I would also recommend you to take advantage of PDO if you are dealing with user inputs.
You should learn to use parameters, so you don't munge query strings. Stuffing parameter values into a query string can lead to unexpected syntax errors and SQL injection vulnerabilities.
The LEFT JOIN is superfluous. Your WHERE clause turns the outer join into an inner join:
SELECT o.id
FROM orders o JOIN
cities c
ON o.city_id = c.id
WHERE c.region_id = ".$region_id."
(I removed the backticks. They are unnecessary so they only serve to clutter the query.)
As to which is better, the two can do different things. The JOIN version can return multiple rows for a given order, if cities has multiple rows for the same id. Admittedly, this is unlikely with a column called id, but the semantics are slightly in favor of EXISTS. Note: This is only because you are not selecting any other columns from c. If you were, the JOIN would be a no-brainer.
For the most part, the choice is a matter of taste. You might want to run both versions, check the execution plans, and see which is faster on your data.
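As a rough illustration of both points (bound parameters instead of string concatenation, and the two forms returning the same orders when cities.id is unique), here is a small sketch. It uses Python's sqlite3 module rather than PHP/PDO and MySQL, so the `?` placeholder style and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, city_id INTEGER);
    INSERT INTO cities VALUES (1, 'Lyon', 7), (2, 'Nice', 8);
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 1), (103, NULL);
""")

region_id = 7  # in a real application this would come from user input

exists_sql = """
    SELECT o.id
    FROM orders o
    WHERE EXISTS (SELECT 1
                  FROM cities c
                  WHERE c.id = o.city_id AND c.region_id = ?)
    ORDER BY o.id
"""

join_sql = """
    SELECT o.id
    FROM orders o
    JOIN cities c ON o.city_id = c.id
    WHERE c.region_id = ?
    ORDER BY o.id
"""

# The value is passed separately from the SQL text, never spliced into it.
exists_ids = [row[0] for row in conn.execute(exists_sql, (region_id,))]
join_ids = [row[0] for row in conn.execute(join_sql, (region_id,))]

print(exists_ids, join_ids)
```

With a primary key on cities.id, each order can match at most one city, so both lists are identical here.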

Should I use GROUP BY when I use IN clause?

I'm working on someone else's project. There is a query like this:
SELECT posts.id, posts.title, posts.body, posts.keywords
FROM posts
INNER JOIN pivot ON pivot.post_id = posts.id
INNER JOIN tags ON tags.id = pivot.tag_id
WHERE tags.name IN ( :keywords )
GROUP BY posts.id
The new policy is to replace IN with =. So the query I've written looks like this:
SELECT posts.id, posts.title, posts.body, posts.keywords
FROM posts
INNER JOIN pivot ON pivot.post_id = posts.id
INNER JOIN tags ON tags.id = pivot.tag_id
WHERE tags.name = :keyword
GROUP BY posts.id
Now I want to know: is GROUP BY redundant in this case? I ask because I think the purpose of the GROUP BY is to remove duplicate posts that are matched by more than one keyword.
First things first, when using GROUP BY within a SELECT statement every column that is not included within the grouping clause should be wrapped up with an aggregate function.
Just because MySQL allows this kind of odd behaviour doesn't make it best practice. Other DBMSs, PostgreSQL for example, reject such a query unless the grouping column is the table's primary key.
That said, what MySQL does internally is give you a single record for each posts.id, but arbitrary values, potentially from different rows, for all non-aggregated and non-grouped columns.
You should be using DISTINCT from what I can see.
Answer to your question
Replacing IN with = doesn't affect grouping at all, so you are free to go with it, especially if you are passing a single value rather than a list to that query. But GROUP BY is not redundant in either case (or it should be removed completely from both): removing it could change the output you receive.
If, for instance, you grouped by a unique column and joined that table to another with a 1:1 relationship, GROUP BY would be redundant. As a second example, constructing a proper WHERE clause with the right conditions might make it redundant as well.
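A small experiment makes the duplicate-row behaviour concrete. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the sample data, the UNIQUE constraint on tags.name and the composite key on pivot are assumptions, since the real schema is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE pivot (post_id INTEGER, tag_id INTEGER,
                        PRIMARY KEY (post_id, tag_id));
    INSERT INTO posts VALUES (1, 'Joins'), (2, 'Indexes');
    INSERT INTO tags VALUES (1, 'mysql'), (2, 'sql');
    INSERT INTO pivot VALUES (1, 1), (1, 2), (2, 1);
""")

base_sql = """
    SELECT {distinct} posts.id
    FROM posts
    INNER JOIN pivot ON pivot.post_id = posts.id
    INNER JOIN tags ON tags.id = pivot.tag_id
    WHERE {where}
    ORDER BY posts.id
"""

def post_ids(where, params, distinct=False):
    """Run the join with the given WHERE clause, optionally de-duplicated."""
    sql = base_sql.format(distinct="DISTINCT" if distinct else "", where=where)
    return [row[0] for row in conn.execute(sql, params)]

# Two keywords: post 1 carries both tags, so it comes back twice.
in_plain = post_ids("tags.name IN (?, ?)", ("mysql", "sql"))
in_distinct = post_ids("tags.name IN (?, ?)", ("mysql", "sql"), distinct=True)

# One keyword: with unique tag names and unique (post_id, tag_id) pairs,
# each post can match at most once.
eq_plain = post_ids("tags.name = ?", ("mysql",))

print(in_plain, in_distinct, eq_plain)
```

Under those uniqueness assumptions the single-keyword query has nothing to de-duplicate; without them (duplicate tag names or duplicate pivot rows), duplicates can still appear and DISTINCT remains the clearer way to remove them.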

Joining Results From Another Table

I'm dealing with a large query that maps data from one table into a CSV file, so it essentially looks like a basic select query--
SELECT * FROM item_table
--except that * is actually a hundred lines of CASE, IF, IFNULL, and other logic.
I've been told to add a "similar items" line to the select statement, which should be a string of comma-separated item numbers. The similar items are found in a category_table, which can join to item_table on two data points, column_a and column_b, with category_table.category_id having the data that identifies the similar items.
Additionally, I've been told NOT to use a subquery.
So I need to join category_table and group_concat item numbers from that table having the same category_id value (but not having the item number of whatever the current record would be).
If I can only do it with a subquery regardless of the instructions, I will accept that, but I want to do it with a join and group_concat as instructed if possible--I just can't figure it out. How can I do this?
You can make use of a MySQL "feature" called hidden columns.
I am going to assume you have an item id in the item table that uniquely identifies each row. And, if I have your logic correct, the following query does what you want:
select i.*, group_concat(c.category_id)
from item_table i left outer join
category_table c
on i.column_a = c.column_a and
i.column_b = c.column_b and
i.item_id <> c.category_id
group by i.item_id
I think this is what you're looking for, although I wasn't sure what uniquely identified your item_table so I used column_a and column_b (those may be incorrect):
SELECT
...,
GROUP_CONCAT(ct.category_id separator ',') CategoryIDs
FROM item_table i
JOIN category_table ct ON i.column_a = ct.column_a AND
i.column_b = ct.column_b
GROUP BY i.column_a, i.column_b
I've used a regular INNER JOIN, but if the category_table might not have any related records, you may need to use a LEFT JOIN instead to get your desired results.
Maybe something like this?
SELECT i.*, GROUP_CONCAT(c.category_id) AS similar_items
FROM item_table i
INNER JOIN category_table c ON (i.column_a = c.column_a AND
i.column_b = c.column_b)
GROUP BY i.column_a, i.column_b
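If the goal is a list of the other item numbers that share a category, rather than the category ids themselves, one possible reading is a join that goes out to category_table and back to item_table. The sketch below is only that: it assumes an item_id column exists (as the first answer does), assumes column_a and column_b together identify an item in both tables, and uses Python's sqlite3 module with made-up rows in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: item_id and the sample values are assumptions.
conn.executescript("""
    CREATE TABLE item_table (item_id INTEGER PRIMARY KEY, column_a TEXT, column_b TEXT);
    CREATE TABLE category_table (column_a TEXT, column_b TEXT, category_id INTEGER);
    INSERT INTO item_table VALUES
        (1, 'a1', 'b1'), (2, 'a2', 'b2'), (3, 'a3', 'b3'), (4, 'a4', 'b4');
    INSERT INTO category_table VALUES
        ('a1', 'b1', 10), ('a2', 'b2', 10), ('a3', 'b3', 10), ('a4', 'b4', 20);
""")

# item -> its category row -> all rows in that category -> the other items.
similar_sql = """
    SELECT i.item_id, GROUP_CONCAT(DISTINCT other.item_id) AS similar_items
    FROM item_table i
    LEFT JOIN category_table c
           ON c.column_a = i.column_a AND c.column_b = i.column_b
    LEFT JOIN category_table c2
           ON c2.category_id = c.category_id
    LEFT JOIN item_table other
           ON other.column_a = c2.column_a AND other.column_b = c2.column_b
          AND other.item_id <> i.item_id
    GROUP BY i.item_id
    ORDER BY i.item_id
"""

# GROUP_CONCAT order is not guaranteed, so sort the ids after splitting.
similar = {
    item_id: sorted(int(x) for x in csv.split(",")) if csv else []
    for item_id, csv in conn.execute(similar_sql)
}
print(similar)
```

The LEFT JOINs keep items that have no similar items (their list is empty), and no subquery is involved, which matches the constraint in the question.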