I have three tables, with structures like this:
Trips: id
Users: id
users_has_trips: users_id, trips_id
My current query:
SELECT trips.id
FROM trips
LEFT JOIN users_has_trips ON users_has_trips.trips_id = trips.id
WHERE users_has_trips.users_id != '1'
I don't want to select a trips.id if users_has_trips.users_id is set to a certain value (such as 1). users_has_trips has multiple rows with the same trips_id, so when I eliminate the row that has the undesired users_id, the other rows for that trip still show up.
For example, there are trips.id values of 1 and 2, and users_has_trips has the (trips_id, users_id) rows (1, 1), (1, 2), and (1, 3).
When I run the query above, WHERE eliminates the row (1, 1), but still grabs the trips.id of 1 from rows (1, 2) and (1, 3), and also the trips.id of 2.
The desired outcome is to not select the trips.id value of 1 (because my users_id is associated with it) and only grab the trips.id value of 2.
SELECT *
FROM trips AS t
WHERE t.id NOT IN
(
SELECT trips_id
FROM users_has_trips
WHERE users_id = '1'
)
Or rephrasing your question:
Only show those trips where no row exists for a given user:
SELECT *
FROM trips AS t
WHERE NOT EXISTS
(
SELECT *
FROM users_has_trips AS ut
WHERE ut.users_id = '1'
AND ut.trips_id = t.id
)
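As a quick check of the two queries above, here is a minimal sketch that loads the example rows from the question into an in-memory SQLite database through Python's sqlite3 module. Using SQLite instead of MySQL is an assumption made only to keep the example self-contained; the column names follow the question's query (trips_id, users_id).

```python
import sqlite3

# In-memory SQLite database: an assumption for this demo (the question uses MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (id INTEGER PRIMARY KEY);
    CREATE TABLE users_has_trips (trips_id INTEGER, users_id INTEGER);
    INSERT INTO trips (id) VALUES (1), (2);
    INSERT INTO users_has_trips (trips_id, users_id)
    VALUES (1, 1), (1, 2), (1, 3);
""")

# NOT IN: drop every trip that has at least one row for user 1.
not_in_ids = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM trips AS t
    WHERE t.id NOT IN (SELECT trips_id FROM users_has_trips WHERE users_id = 1)
    ORDER BY t.id
""")]

# NOT EXISTS: the same exclusion, written as a correlated subquery.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM trips AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM users_has_trips AS ut
        WHERE ut.users_id = 1 AND ut.trips_id = t.id
    )
    ORDER BY t.id
""")]

print(not_in_ids, not_exists_ids)  # [2] [2]
```

Both forms drop trip 1 entirely. One difference worth knowing: if the subquery can return a NULL trips_id, NOT IN matches no rows at all, while NOT EXISTS is unaffected.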
I have wondered about the general performance of a query depending on whether a specific subselect (subquery) is located in the WHERE or the FROM clause. I didn't find a sufficient explanation of which way is better. Are there rules for how we should apply subselects in this kind of query?
I prepared the following example:
Query FROM
SELECT name
FROM users a
JOIN (SELECT user_id, AVG(score) as score
FROM scores GROUP BY user_id
) b ON a.id=b.user_id
WHERE b.score > 15;
Query WHERE
SELECT name
FROM users
WHERE
(SELECT AVG(score) as score
FROM scores WHERE scores.user_id=users.id GROUP BY user_id
) > 15;
Tables:
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(30));
CREATE TABLE scores (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
score INT);
INSERT INTO users(name)
VALUES ('John'), ('Max'), ('Dan'), ('Alex');
INSERT INTO scores(user_id, score)
VALUES
(1, 20),
(1, 19),
(2, 15),
(2, 10),
(3, 20),
(3, 18),
(4, 13),
(4, 16),
(4, 15);
In both cases, scores needs INDEX(user_id, score) for performance.
It is hard to predict which will run faster.
There are times when a query similar to the first formulation is excellent. This is because it goes off and focuses on b, efficiently calculating all the AVGs at once. Then it reaches over to the other table for the final info.
Let's tweak the second version slightly by adding some other test in the WHERE clause. Now the second one might be faster.
This may be even better:
SELECT name
FROM ( SELECT user_id -- Don't fetch AVG if not needed
FROM scores GROUP BY user_id
HAVING AVG(score) > 15 -- Note
) b
JOIN users a ON a.id = b.user_id;
(The swapping of FROM and JOIN is not an optimization; it is just to show what order the Optimizer will perform the steps.)
In some other situations, EXISTS( SELECT ... ) is beneficial. (But I don't see such a situation in your case.)
Your question was about general optimization. I'm trying to emphasize that there is no general answer.
I think this query is faster than the ones you give above, as it has no subqueries.
SELECT u.name
FROM users u
JOIN scores s
ON (s.user_id = u.id)
GROUP BY s.user_id
HAVING AVG(s.score) > 15
You can see it at this link: http://sqlfiddle.com/#!9/b050f9/16
It shows the execution time of the following three SELECT queries:
SELECT name
FROM users a
JOIN (SELECT user_id, AVG(score) as score
FROM scores GROUP BY user_id
) b ON a.id=b.user_id
WHERE b.score > 15;
SELECT name
FROM users
WHERE
(SELECT AVG(score) as score
FROM scores WHERE scores.user_id=users.id GROUP BY user_id
) > 15;
SELECT u.name
FROM users u
JOIN scores s
ON (s.user_id = u.id)
GROUP BY s.user_id
HAVING AVG(s.score) > 15
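Before comparing timings, it can help to confirm that the formulations return the same rows. The sketch below does that on the sample data from the question, using Python's sqlite3 module as a stand-in for MySQL. That substitution is an assumption: AUTO_INCREMENT is replaced by SQLite's INTEGER PRIMARY KEY, and ORDER BY is added only to make the comparison deterministic. It says nothing about which form is faster on a given MySQL version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(30));
    CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INT, score INT);
    INSERT INTO users (name) VALUES ('John'), ('Max'), ('Dan'), ('Alex');
    INSERT INTO scores (user_id, score) VALUES
        (1, 20), (1, 19), (2, 15), (2, 10), (3, 20), (3, 18),
        (4, 13), (4, 16), (4, 15);
""")

queries = {
    # Subquery in FROM (derived table).
    "from": """
        SELECT a.name
        FROM users a
        JOIN (SELECT user_id, AVG(score) AS score
              FROM scores GROUP BY user_id) b ON a.id = b.user_id
        WHERE b.score > 15
        ORDER BY a.name
    """,
    # Correlated subquery in WHERE.
    "where": """
        SELECT name
        FROM users
        WHERE (SELECT AVG(score) FROM scores
               WHERE scores.user_id = users.id) > 15
        ORDER BY name
    """,
    # Plain join with GROUP BY / HAVING, no subquery.
    "join": """
        SELECT u.name
        FROM users u
        JOIN scores s ON s.user_id = u.id
        GROUP BY s.user_id
        HAVING AVG(s.score) > 15
        ORDER BY u.name
    """,
}

results = {label: [row[0] for row in conn.execute(sql)]
           for label, sql in queries.items()}
for label, names in results.items():
    print(label, names)  # each line lists ['Dan', 'John']
```

On this data only John (average 19.5) and Dan (average 19) exceed 15, so all three forms agree. Any speed comparison still needs EXPLAIN and realistic row counts on the actual server.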
I have a query that joins several tables (3 or 4) and gets me results as expected.
SELECT DISTINCT test_title, stt_id FROM student_tests
LEFT JOIN student_test_answers ON sta_stt_num = stt_id
JOIN tests ON stt_test_id = test_id
WHERE student_test_answer_id IS NULL
I have another query that shows another set of data, it basically is this:
SELECT test_id, COUNT(*) AS theCount FROM tests
JOIN test_questions ON test_id= tq_test_id
WHERE type= 'THE_TYPE'
GROUP BY test_id
HAVING theCount = 1
So basically I want to NOT include the results of this second query in the first one. The test_id would be the joining field.
I have tried WHERE NOT EXISTS ( -the above query- ), but that returns no results, which is not correct. I also tried NOT IN ( ).
Is there a better way of doing this?
Try something like this:
SELECT a.*
FROM (
[OtherQuery] -- your first query; it must also select test_id
) a
LEFT JOIN (
SELECT test_id, COUNT(*) AS theCount FROM tests
JOIN test_questions ON test_id = tq_test_id
WHERE type = 'THE_TYPE'
GROUP BY test_id
HAVING theCount = 1
) excluded ON excluded.test_id = a.test_id
WHERE excluded.test_id IS NULL
Here is my answer. A LEFT OUTER JOIN gives you the participant (test). If there are no matches in test_questions, it still returns the test rows, but with NULL for the test_questions columns. So if you then look for any test_questions.test_id that is NULL, you will get what you are looking for.
I would also be specific in your COUNT clause rather than using COUNT(*), to make sure MySQL knows what you truly want to count.
create database test;
use test;
create table test
(
test_id int,
`the_type` varchar(20)
);
create table test_questions
(
test_question_id int,
test_id int,
`the_type` varchar(20)
);
insert into test values (1, 'history');
insert into test values (2, 'chemistry');
insert into test values (3, 'reading');
insert into test_questions values (1, 1, 'hard question');
insert into test_questions values (2, 1, 'medium question');
insert into test_questions values (3, 2, 'hard question');
insert into test_questions values (4, 2, 'easy question');
select * from test;
select * from test_questions;
select t.test_id, count(distinct t.test_id)
from test t
left outer join test_questions tq on tq.test_id = t.test_id
where
tq.test_id is null
group by
t.test_id
As written in the comment you should be able to do it like this:
SELECT
DISTINCT test_title,
olc_stt_i_num
FROM
student_tests
LEFT JOIN
student_test_answers
ON olc_sta_i_stt_num = olc_stt_i_num
INNER JOIN
ol_class_tests
ON stt_test_num = test_num
WHERE
student_test_answer_id IS NULL
-- added this: replace test_id with real column
AND test_id NOT IN (
SELECT
test_id
FROM
tests
JOIN
test_questions
ON test_id= tq_test_id
WHERE
type= 'THE_TYPE'
GROUP BY
test_id
HAVING
COUNT(*) = 1
)
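To see the NOT IN exclusion in isolation, here is a small sketch using Python's sqlite3 module. The full schema is not given in the question, so everything here is an assumption made for illustration: the tables tests(test_id, test_title) and test_questions(tq_id, tq_test_id, type), the hypothetical tq_id column, and the sample rows. SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema; the real tables have more columns.
    CREATE TABLE tests (test_id INTEGER PRIMARY KEY, test_title TEXT);
    CREATE TABLE test_questions (
        tq_id INTEGER PRIMARY KEY, tq_test_id INT, type TEXT);
    INSERT INTO tests VALUES
        (1, 'one typed question'),
        (2, 'two typed questions'),
        (3, 'other type only');
    INSERT INTO test_questions (tq_test_id, type) VALUES
        (1, 'THE_TYPE'),
        (2, 'THE_TYPE'), (2, 'THE_TYPE'),
        (3, 'OTHER');
""")

# Keep every test except those with exactly one question of THE_TYPE.
kept = [row[0] for row in conn.execute("""
    SELECT t.test_id
    FROM tests AS t
    WHERE t.test_id NOT IN (
        SELECT tq_test_id
        FROM test_questions
        WHERE type = 'THE_TYPE'
        GROUP BY tq_test_id
        HAVING COUNT(*) = 1
    )
    ORDER BY t.test_id
""")]
print(kept)  # [2, 3]
```

Only test 1 is excluded. This also suggests why the NOT EXISTS attempt returned nothing: an uncorrelated NOT EXISTS is false for every outer row as soon as the subquery returns any row at all, so it has to be correlated on test_id to behave like NOT IN.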
What exactly is the second query? I only see one query here, and no sub-query. Also, a sqlfiddle of your precise schema would be helpful.
Anyhow, I think you want some sort of left excluding join. It looks something like this:
select test.test_id, count(*) as theCount
from tests test
join test_questions tq
on tq.test_id = test.test_id
left join tests excluded_test
on excluded_test.test_id = test.test_id
where tq.type = 'THE_TYPE'
and << Some condition that explains what excluded_test is >>
and excluded_test.test_id is null
group by test.test_id
EDIT: Yeah, there's definitely a lot of details missing from the original question (which had in some ways been mended), and which are still missing. Knowing the full table-structure of your example is important here, so it is difficult to provide a good concrete answer.
I use this query to insert multiple rows into my table user.
insert into user
(select 'bbb', coalesce(max(subid),0)+1 from user where name = 'bbb')
union
(select 'ccc', coalesce(max(subid),0)+1 from user where name = 'ccc');
How can I achieve the same result in a single select query?
Not quite. The problem is what happens when the names are not in the table. You can do this:
insert into user(name, subid)
select n.name, coalesce(max(u.subid), 0) + 1
from (select 'bbb' as name union all select 'ccc') n left join
user u
on u.name = n.name
group by n.name;
It still has a union (all) for constructing the names, but the calculation of subid is only expressed once.
When using insert it is a good idea to list the columns explicitly.
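A small sketch of how this behaves, using Python's sqlite3 module as a stand-in for MySQL (an assumption, as are the two pre-existing 'bbb' rows). It groups by n.name and uses COALESCE(MAX(subid), 0) + 1, so a name that is not yet in the table starts at 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (name TEXT, subid INTEGER);
    -- Assumed starting state: 'bbb' already has two rows, 'ccc' has none.
    INSERT INTO user (name, subid) VALUES ('bbb', 1), ('bbb', 2);
""")

# One INSERT ... SELECT computes the next subid for every listed name.
conn.execute("""
    INSERT INTO user (name, subid)
    SELECT n.name, COALESCE(MAX(u.subid), 0) + 1
    FROM (SELECT 'bbb' AS name UNION ALL SELECT 'ccc') AS n
    LEFT JOIN user AS u ON u.name = n.name
    GROUP BY n.name
""")

rows = conn.execute(
    "SELECT name, subid FROM user ORDER BY name, subid"
).fetchall()
print(rows)  # [('bbb', 1), ('bbb', 2), ('bbb', 3), ('ccc', 1)]
```

Note that computing MAX + 1 this way is not safe under concurrent inserts unless the surrounding transaction or a unique constraint on (name, subid) guards against duplicates.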
I have an item-to-item similarity matrix set up with these tables:
items (id, ...) (Primary key `id`)
similarities (item1_id, item2_id, similarity) (Index on `item1_id` and `item2_id`)
The similarities table contains pairs of ids with a similarity index, i.e.:
item1_id item2_id similarity
1 2 0.3143
2 3 0.734
For efficient storage "reverse pairs" are omitted, i.e. there's only one pair (1,2), there's no redundant pair (2,1). That means the foreign key for an item may be either item1_id or item2_id.
Now I want to find items that are similar to a bunch of other items, sorted by descending similarity. I'm using this query:
SELECT `Item`.*
FROM `items` AS `Item`
LEFT JOIN `similarities` AS `Similarity`
ON (`Item`.`id` = `Similarity`.`item1_id`
AND `Similarity`.`item2_id` IN (1, 2, 3, ...))
OR (`Item`.`id` = `Similarity`.`item2_id`
AND `Similarity`.`item1_id` IN (1, 2, 3, ...))
WHERE `Similarity`.`item1_id` IN (1, 2, 3, ...)
OR `Similarity`.`item2_id` IN (1, 2, 3, ...)
GROUP BY `Item`.`id`
ORDER BY `Similarity`.`similarity` desc
It's extremely slow though, it takes 4-5 seconds for ~100,000 items and ~30,000 similarity pairs. It seems the JOIN is extremely costly. Here's the query EXPLAINed:
select_type table type possible_keys key key_len ref rows Extra
SIMPLE Similarity index_merge item1_id,item2_id item1_id,item2_id 110,110 NULL 31 Using sort_union(item1_id,...
SIMPLE Item ALL PRIMARY NULL NULL NULL 136600 Using where; Using join buffer
What can I do to speed this up? Worst case I would do it in two separate queries, but I'd prefer one JOIN query if possible.
I didn't actually try this but maybe it points you in the right direction. The idea is to make a temp result of the UNION of (unique) id, similarity pairs from similarities, then join items with that.
SELECT Item.*, s.other_item_id, s.similarity
FROM items AS Item
JOIN
(
SELECT item1_id AS id, item2_id AS other_item_id, similarity FROM similarities
UNION
SELECT item2_id AS id, item1_id AS other_item_id, similarity FROM similarities
) AS s ON s.id = Item.id
WHERE s.other_item_id IN (1, 2, 3, ...)
ORDER BY s.similarity DESC;
In your original query you don't need to restrict the ids from similarities in both the JOIN condition and the WHERE clause.
I am wondering whether joining to the items table twice will perform better than the two queries.
Pardon the pseudo-code-ish SELECT portion of this statement - I think you'll actually need a CASE for every field value...
SELECT
CASE WHEN `Item2`.`id` IS NULL THEN
`Item1`.`id`
ELSE `Item2`.`id`
END AS `id`,
CASE WHEN `Item2`.`id` IS NULL THEN
`Item1`.`name`
ELSE `Item2`.`name`
END AS `name`,
CASE WHEN `Item2`.`id` IS NULL THEN
`Item1`.`description`
ELSE `Item2`.`description`
END AS `description`,
[and so on]
FROM `items` AS `Item1`
LEFT OUTER JOIN `similarities` AS `Similarity`
ON (`Item1`.`id` = `Similarity`.`item1_id`)
RIGHT OUTER JOIN `items` AS `Item2`
ON (`Item2`.`id` = `Similarity`.`item2_id`)
WHERE `Similarity`.`item1_id` IN (1, 2, 3, ...)
OR `Similarity`.`item2_id` IN (1, 2, 3, ...)
ORDER BY `Similarity`.`similarity` desc
Thanks to the inspirations, I ended up with this query:
SELECT `Item`.*
FROM `items` AS `Item`
JOIN (
SELECT `item1_id` AS `id`, `similarity`
FROM `similarities`
WHERE `similarities`.`item2_id` IN (1, 2, 3, ...)
UNION
SELECT `item2_id` AS `id`, `similarity`
FROM `similarities`
WHERE `similarities`.`item1_id` IN (1, 2, 3, ...)
) AS `SimilarityUnion` ON `SimilarityUnion`.`id` = `Item`.`id`
GROUP BY `SimilarityUnion`.`id`
ORDER BY `SimilarityUnion`.`similarity` DESC
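Here is a runnable sketch of the same UNION idea, using Python's sqlite3 module in place of MySQL (an assumption). The first two similarity rows come from the question; the other two rows and the input set (1, 3) are made up for illustration. One deliberate variation: it takes MAX(similarity) per item, because an item can match through both branches of the union, and ordering a grouped result by a non-aggregated column leaves the chosen value undefined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY);
    CREATE TABLE similarities (item1_id INT, item2_id INT, similarity REAL);
    INSERT INTO items (id) VALUES (1), (2), (3), (4), (5);
    -- (1, 2) and (2, 3) are from the question; (4, 1) and (3, 5) are invented.
    INSERT INTO similarities VALUES
        (1, 2, 0.3143), (2, 3, 0.734), (4, 1, 0.5), (3, 5, 0.9);
""")

# Items similar to items 1 and 3, best match first.
ranked = conn.execute("""
    SELECT i.id, MAX(su.similarity) AS best
    FROM items AS i
    JOIN (
        SELECT item1_id AS id, similarity
        FROM similarities WHERE item2_id IN (1, 3)
        UNION ALL
        SELECT item2_id AS id, similarity
        FROM similarities WHERE item1_id IN (1, 3)
    ) AS su ON su.id = i.id
    GROUP BY i.id
    ORDER BY best DESC
""").fetchall()

similar_ids = [row[0] for row in ranked]
print(similar_ids)  # [5, 2, 4]
```

Item 2 is reached twice (via item 1 at 0.3143 and via item 3 at 0.734); the aggregate keeps the higher score, so it ranks between items 5 and 4.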
I've got two tables:
User (id, name, etc)
UserRight (user_id, right_id)
I want to find the users who have rights 1, 2 and 3, but no users who only have one or two of these. Also, the number of rights will vary, so searches for (1,2,3) and (1,2,3,4,5,6,7) should work with much the same query.
Essentially:
SELECT *
FROM User
WHERE (
SELECT right_id
FROM tblUserRight
WHERE user_id = id
ORDER BY user_id ASC
) = (1,2,3)
Is this possible in MySQL?
SELECT u.id, u.name ...
FROM User u
JOIN UserRight r on u.id = r.user_id
WHERE right_id IN (1,2,3)
GROUP BY u.id, u.name ...
HAVING COUNT(DISTINCT right_id) = 3
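Since the number of rights varies, the IN list and the expected count can be built from the same list of ids. The sketch below shows one way to do that with Python's sqlite3 module. SQLite in place of MySQL, the helper name users_with_all_rights, and the three sample users are all assumptions made for illustration.

```python
import sqlite3

def users_with_all_rights(conn, right_ids):
    """Return the names of users who hold every right in right_ids."""
    wanted = sorted(set(right_ids))  # duplicates would break the count check
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT u.name
        FROM User AS u
        JOIN UserRight AS r ON r.user_id = u.id
        WHERE r.right_id IN ({placeholders})
        GROUP BY u.id, u.name
        HAVING COUNT(DISTINCT r.right_id) = ?
        ORDER BY u.name
    """
    # Bind the ids for the IN list, then the number of distinct ids required.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE UserRight (user_id INT, right_id INT);
    INSERT INTO User VALUES (1, 'Adam'), (2, 'Bono'), (3, 'Cher');
    INSERT INTO UserRight VALUES
        (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 1), (3, 2);
""")

print(users_with_all_rights(conn, [1, 2, 3]))  # ['Adam']
print(users_with_all_rights(conn, [2, 3]))     # ['Adam', 'Bono']
```

Only the placeholders are interpolated into the SQL text; the ids themselves are passed as bound parameters.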
You can also do this using PIVOT, especially if you want a visual representation. I did this on SQL Server - you may be able to translate it.
Declare @User Table (id Int, name Varchar (10))
Declare @UserRight Table (user_id Int, right_id Int)
Insert Into @User Values (1, 'Adam')
Insert Into @User Values (2, 'Bono')
Insert Into @User Values (3, 'Cher')
Insert Into @UserRight Values (1, 1)
Insert Into @UserRight Values (1, 2)
Insert Into @UserRight Values (1, 3)
--Insert Into @UserRight Values (2, 1)
Insert Into @UserRight Values (2, 2)
Insert Into @UserRight Values (2, 3)
Insert Into @UserRight Values (3, 1)
Insert Into @UserRight Values (3, 2)
--Insert Into @UserRight Values (3, 3)
SELECT *
FROM @User U
INNER JOIN @UserRight UR
ON U.id = UR.User_Id
PIVOT
(
SUM (User_Id)
FOR Right_Id IN ([1], [2], [3])
) as xx
WHERE 1=1
SELECT *
FROM @User U
INNER JOIN @UserRight UR
ON U.id = UR.User_Id
PIVOT
(
SUM (User_Id)
FOR Right_Id IN ([1], [2], [3])
) as xx
WHERE 1=1
AND [1] IS NOT NULL
AND [2] IS NOT NULL
AND [3] IS NOT NULL
In response to the errors pointed out in my answer, here is a solution with COUNT and a subquery:
SELECT *
FROM User
WHERE 3 = (
SELECT Count(user_id)
FROM tblUserRight
WHERE right_id IN (1,2,3)
AND user_id = User.id
)
An optimizer may of course change this to Martin Smith's solution (i.e. by using a group by).