Subselect in WHERE or FROM clause? - mysql

I have wondered about the general performance of a query depending on whether a specific subselect (subquery) is located in the WHERE or the FROM clause. I couldn't find a sufficient explanation of which way is better. Are there rules for where we should place the subselect in this kind of query?
I prepared the following example.
Query FROM
SELECT name
FROM users a
JOIN (SELECT user_id, AVG(score) as score
FROM scores GROUP BY user_id
) b ON a.id=b.user_id
WHERE b.score > 15;
Query WHERE
SELECT name
FROM users
WHERE
(SELECT AVG(score) as score
FROM scores WHERE scores.user_id=users.id GROUP BY user_id
) > 15;
Tables:
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(30));
CREATE TABLE scores (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
score INT);
INSERT INTO users(name)
VALUES ('John'), ('Max'), ('Dan'), ('Alex');
INSERT INTO scores(user_id, score)
VALUES
(1, 20),
(1, 19),
(2, 15),
(2, 10),
(3, 20),
(3, 18),
(4, 13),
(4, 16),
(4, 15);

In both cases, scores needs INDEX(user_id, score) for performance.
It is hard to predict which will run faster.
There are times when a query similar to the first formulation is excellent. This is because it goes off and focuses on b, efficiently calculating all the AVGs at once. Then it reaches over to the other table for the final info.
Let's tweak the second version slightly by adding some other test in the WHERE clause. Now the second one might be faster.
This may be even better:
SELECT name
FROM ( SELECT user_id -- Don't fetch AVG if not needed
FROM scores GROUP BY user_id
HAVING AVG(score) > 15 -- Note: the filter moves into HAVING
) b
JOIN users a ON a.id = b.user_id;
(The swapping of FROM and JOIN is not an optimization; it is just to show what order the Optimizer will perform the steps.)
In some other situations, EXISTS( SELECT ... ) is beneficial. (But I don't see such in your case.)
Your question was about general optimization. I'm trying to emphasize that there is no general answer.

I think this query is faster than the ones you give above, as it has no subqueries.
SELECT u.name
FROM users u
JOIN scores s
ON (s.user_id = u.id)
GROUP BY s.user_id
HAVING AVG(s.score) > 15
You can see it at this link: http://sqlfiddle.com/#!9/b050f9/16
It shows the execution time of the following 3 SELECT queries:
SELECT name
FROM users a
JOIN (SELECT user_id, AVG(score) as score
FROM scores GROUP BY user_id
) b ON a.id=b.user_id
WHERE b.score > 15;
SELECT name
FROM users
WHERE
(SELECT AVG(score) as score
FROM scores WHERE scores.user_id=users.id GROUP BY user_id
) > 15;
SELECT u.name
FROM users u
JOIN scores s
ON (s.user_id = u.id)
GROUP BY s.user_id
HAVING AVG(s.score) > 15
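As a quick sanity check, the formulations discussed above can be run against the sample data from the question. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for MySQL (an assumption, as are the index name and the dictionary labels), so it confirms that the queries agree on the result but says nothing about how MySQL's optimizer would time them.

```python
import sqlite3

# SQLite stands in for MySQL here; INTEGER PRIMARY KEY replaces AUTO_INCREMENT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(30));
CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INT, score INT);
CREATE INDEX idx_scores_user_score ON scores (user_id, score);
INSERT INTO users (name) VALUES ('John'), ('Max'), ('Dan'), ('Alex');
INSERT INTO scores (user_id, score) VALUES
  (1, 20), (1, 19), (2, 15), (2, 10), (3, 20), (3, 18),
  (4, 13), (4, 16), (4, 15);
""")

queries = {
    "subquery in FROM": """
        SELECT name FROM users a
        JOIN (SELECT user_id, AVG(score) AS score
              FROM scores GROUP BY user_id) b ON a.id = b.user_id
        WHERE b.score > 15""",
    "subquery in WHERE": """
        SELECT name FROM users
        WHERE (SELECT AVG(score) FROM scores
               WHERE scores.user_id = users.id GROUP BY user_id) > 15""",
    "HAVING inside derived table": """
        SELECT name
        FROM (SELECT user_id FROM scores
              GROUP BY user_id HAVING AVG(score) > 15) b
        JOIN users a ON a.id = b.user_id""",
    "plain JOIN with GROUP BY": """
        SELECT u.name FROM users u
        JOIN scores s ON s.user_id = u.id
        GROUP BY s.user_id HAVING AVG(s.score) > 15""",
}

# Sort the names so the comparison does not depend on row order.
results = {label: sorted(row[0] for row in conn.execute(sql))
           for label, sql in queries.items()}
for label, names in results.items():
    print(f"{label}: {names}")
```

To compare speed rather than correctness, each statement would have to be run (and EXPLAINed) on MySQL itself with a realistic amount of data.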

Related

how to optimize query searching for similar ids referenced in different tables

in my query here https://www.db-fiddle.com/f/nfJzZoYC5gEXLu8hrw4JT2/1
SELECT id, COUNT(DISTINCT tc1.co_id), COUNT(DISTINCT tc2.co_id)
FROM test t
INNER JOIN test_corelation_1 tc1 ON tc1.test_id = t.id AND tc1.co_id IN (
SELECT co_id
FROM test_corelation_1
WHERE test_id IN (1, 2, 5)
GROUP BY co_id
)
INNER JOIN test_corelation_2 tc2 ON tc2.test_id = t.id AND tc2.co_id IN (
SELECT co_id
FROM test_corelation_2
WHERE test_id IN (1, 2, 5)
GROUP BY co_id
)
GROUP BY t.id
ORDER BY (COUNT(DISTINCT tc1.co_id) + COUNT(DISTINCT tc2.co_id)) ASC;
I am trying to get all the ids from table test that share similar ids correlated to the ids 1, 2, 5, then sort them from least to most similar by counting the matches, which results in this:
id | COUNT(DISTINCT tc1.co_id) | COUNT(DISTINCT tc2.co_id)
3  | 1                         | 3
2  | 3                         | 7
1  | 5                         | 6
But it gets very slow the more ids I check for similarities, and I do not know how to optimize it further. I thought of using a CTE, but it produced the same results in the optimizer's EXPLAIN output.
The main means we have to speed up queries are indexes. But of course we should ensure that our queries are as straight-forward as possible. In your case you don't need the test table at all. Just get your counts from the two other tables and join them. As you are only interested in test IDs that exist in both tables, this is a mere inner join.
select test_id, c1.count_1, c2.count_2
from
(
select test_id, count(*) as count_1
from test_corelation_1
where co_id in (select co_id from test_corelation_1 where test_id in (1, 2, 5))
group by test_id
) c1
join
(
select test_id, count(*) as count_2
from test_corelation_2
where co_id in (select co_id from test_corelation_2 where test_id in (1, 2, 5))
group by test_id
) c2 using (test_id)
order by c1.count_1 + c2.count_2, test_id;
I recommend these indexes for the query:
create index idx1 on test_corelation_1 (co_id, test_id);
create index idx2 on test_corelation_2 (co_id, test_id);
(In case the DBMS wants to work with indexes on (test_id, co_id), too, it already has these indexes, as these are the tables' primary keys.)
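A small sketch of the join-of-two-aggregates approach above, run on made-up data. Everything about the schema here is an assumption, since the real tables live only in the linked fiddle: the two-column layout, the primary key on (test_id, co_id), and all the rows are hypothetical, and Python's sqlite3 module stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; the actual fiddle schema is not shown in the post.
conn.executescript("""
CREATE TABLE test_corelation_1 (test_id INT, co_id INT, PRIMARY KEY (test_id, co_id));
CREATE TABLE test_corelation_2 (test_id INT, co_id INT, PRIMARY KEY (test_id, co_id));
CREATE INDEX idx1 ON test_corelation_1 (co_id, test_id);
CREATE INDEX idx2 ON test_corelation_2 (co_id, test_id);
INSERT INTO test_corelation_1 VALUES (1, 10), (2, 10), (3, 10), (3, 11), (5, 11), (4, 99);
INSERT INTO test_corelation_2 VALUES (1, 20), (3, 20), (2, 21), (3, 21), (4, 22);
""")

rows = conn.execute("""
SELECT test_id, c1.count_1, c2.count_2
FROM (
  SELECT test_id, COUNT(*) AS count_1
  FROM test_corelation_1
  WHERE co_id IN (SELECT co_id FROM test_corelation_1 WHERE test_id IN (1, 2, 5))
  GROUP BY test_id
) c1
JOIN (
  SELECT test_id, COUNT(*) AS count_2
  FROM test_corelation_2
  WHERE co_id IN (SELECT co_id FROM test_corelation_2 WHERE test_id IN (1, 2, 5))
  GROUP BY test_id
) c2 USING (test_id)
ORDER BY c1.count_1 + c2.count_2, test_id
""").fetchall()

for row in rows:
    print(row)
```

With this data, test 4 drops out because it shares no co_id with tests 1, 2, or 5, and test 5 drops out because it has no rows in the second table, which is the inner-join behaviour described above.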
You seem to have written the query in a very awkward way. I would write it this way, using direct joins to subqueries which find the counts for the two test correlation tables:
SELECT t.id, COALESCE(tc1.cnt1, 0) AS cnt1, COALESCE(tc2.cnt2, 0) AS cnt2
FROM test t
LEFT JOIN
(
SELECT co_id, COUNT(*) AS cnt1
FROM test_corelation_1
WHERE test_id IN (1, 2, 5)
GROUP BY co_id
) tc1
ON tc1.test_id = t.id
LEFT JOIN
(
SELECT co_id, COUNT(*) AS cnt2
FROM test_corelation_2
WHERE test_id IN (1, 2, 5)
GROUP BY co_id
) tc2
ON tc2.test_id = t.id
ORDER BY cnt1 + cnt2;
Count queries are notoriously difficult to optimize, but the subqueries above have a WHERE clause, so the following indices might help:
CREATE INDEX tc_idx_1 ON test_corelation_1 (test_id, co_id);
CREATE INDEX tc_idx_2 ON test_corelation_2 (test_id, co_id);

Getting sum from a left table of leftjoined table

Below are the tables and the SQL query. I am doing a left join and trying to get SUM of a column that's in the left table and count from the right table.
Is it possible to get both in 1 query?
https://www.db-fiddle.com/f/3QuxG1DLgWJ8aGXNbnnwU1/1
select
s.test,
count(distinct s.name),
sum(s.score) score, -- need accurate score
count(a.id) attempts -- need accurate attempt count
from question s
left join attempts a on s.id = a.score_id
group by s.test
create table question (
id int auto_increment primary key,
test varchar(25),
name varchar(25),
score int
);
create table attempts (
id int auto_increment primary key,
score_id int,
attempt_no int
);
insert into question (test, name, score) values
('test1','name1', 10),
('test1','name2', 15),
('test1','name3', 20),
('test1','name4', 25),
('test2','name1', 15),
('test2','name2', 25),
('test2','name3', 30),
('test2','name4', 20);
insert into attempts (score_id, attempt_no) values
(1, 1),
(1, 2),
(1, 3),
(1, 4),
(2, 1),
(2, 2),
(2, 3),
(2, 4);
You need to pre-aggregate before the join:
select q.test, count(distinct q.name),
sum(q.score) score, -- need accurate score
sum(a.num_attempts) attempts -- need accurate attempt count
from question q left join
(select a.score_id, count(*) as num_attempts
from attempts a
group by a.score_id
) a
on q.id = a.score_id
group by q.test;
Here is a db-fiddle.
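To see why pre-aggregating matters, the sketch below runs the original query and the pre-aggregated one side by side on the data from the question. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption), and the COALESCE around the attempt count is my addition, so that a test with no attempts reports 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (id INTEGER PRIMARY KEY, test VARCHAR(25), name VARCHAR(25), score INT);
CREATE TABLE attempts (id INTEGER PRIMARY KEY, score_id INT, attempt_no INT);
INSERT INTO question (test, name, score) VALUES
  ('test1', 'name1', 10), ('test1', 'name2', 15), ('test1', 'name3', 20), ('test1', 'name4', 25),
  ('test2', 'name1', 15), ('test2', 'name2', 25), ('test2', 'name3', 30), ('test2', 'name4', 20);
INSERT INTO attempts (score_id, attempt_no) VALUES
  (1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4);
""")

# Joining first repeats each question row once per attempt, inflating SUM(score).
naive = conn.execute("""
SELECT s.test, SUM(s.score), COUNT(a.id)
FROM question s
LEFT JOIN attempts a ON s.id = a.score_id
GROUP BY s.test
ORDER BY s.test
""").fetchall()

# Aggregating attempts first leaves at most one joined row per question.
pre_aggregated = conn.execute("""
SELECT q.test, SUM(q.score), COALESCE(SUM(a.num_attempts), 0)
FROM question q
LEFT JOIN (SELECT score_id, COUNT(*) AS num_attempts
           FROM attempts GROUP BY score_id) a ON q.id = a.score_id
GROUP BY q.test
ORDER BY q.test
""").fetchall()

print("join then aggregate:", naive)
print("aggregate then join:", pre_aggregated)
```

In the first result the score for test1 is inflated because questions 1 and 2 are each counted four times; in the second, each question contributes its score exactly once.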
As Gordon said above, you can pre-aggregate, but his answer will get you the incorrect number of attempts, unfortunately. This is due to an issue with how you're structuring your DB schema. It looks like your question table really records scores of attempts at questions, and your attempts table is unnecessary. You should really have a question table that simply contains an ID and a name for the question, and an attempts table that contains an attempt ID, question ID, name, and score.
create table question (
id int auto_increment primary key,
test varchar(25)
);
create table attempts (
id int auto_increment primary key,
question_id int,
name varchar(25),
score int
);
Then your query becomes as simple as:
select
q.id as question_id,
count(distinct a.name) as attempters,
sum(a.score) as total_score,
count(a.id) as total_attempts
from question q join attempts a on q.id = a.question_id
group by q.id

Delete all duplicate rows in mysql

I have MySQL data that was imported from a CSV file and contains multiple duplicate rows.
I picked all non-duplicates using the DISTINCT feature.
Now I need to delete all duplicates using an SQL command.
Note that I don't need any of the duplicates; I just need to fetch only the non-duplicates.
Thanks.
For example, if the number 0123332546666 is repeated 11 times, I want to delete 12 of them.
Mysql table format
ID, PhoneNumber
Just COUNT the number of duplicates (with GROUP BY) and filter with HAVING. Then supply the query result to the DELETE statement:
DELETE FROM Table1 WHERE PhoneNumber IN (SELECT a.PhoneNumber FROM (
SELECT COUNT(*) AS cnt, PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING cnt>1
) AS a);
http://sqlfiddle.com/#!9/a012d21/1
complete fiddle:
schema:
CREATE TABLE Table1
(`ID` int, `PhoneNumber` int)
;
INSERT INTO Table1
(`ID`, `PhoneNumber`)
VALUES
(1, 888),
(2, 888),
(3, 888),
(4, 889),
(5, 889),
(6, 111),
(7, 222),
(8, 333),
(9, 444)
;
delete query:
DELETE FROM Table1 WHERE PhoneNumber IN (SELECT a.PhoneNumber FROM (
SELECT COUNT(*) AS cnt, PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING cnt>1
) AS a);
You could try a left join against a subquery that finds the min id for each PhoneNumber, and delete the rows that do not match:
delete m
from m_table m
left join (
select min(id) as id, PhoneNumber -- alias needed so the join can reference t.id
from m_table
group by PhoneNumber
) t on t.id = m.id
where t.PhoneNumber is null
Otherwise, if you want to delete all the duplicates without keeping even a single row, you could use:
delete m
from m_table m
INNER join (
select PhoneNumber
from m_table
group by PhoneNumber
having count(*) > 1
) t on t.PhoneNumber= m.PhoneNumber
Instead of deleting from the table, I would suggest creating a new one:
create table table2 as
select min(id) as id, phonenumber
from table1
group by phonenumber
having count(*) = 1;
Why? Deleting rows has a lot of overhead. If you are bringing the data in from an external source, then treat the first landing table as a staging table and the second as the final table.
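The approaches in the answers above differ in what they leave behind, which is easy to miss. The sketch below tries three of them on the sample rows from the first answer's fiddle. It is an illustration only: Python's sqlite3 module stands in for MySQL (an assumption), so the deletes are written with plain IN / NOT IN subqueries rather than MySQL's multi-table DELETE syntax, and the helper name fresh_table is hypothetical.

```python
import sqlite3

ROWS = [(1, 888), (2, 888), (3, 888), (4, 889), (5, 889),
        (6, 111), (7, 222), (8, 333), (9, 444)]

def fresh_table():
    """Return a new in-memory database holding the sample Table1 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ID INT, PhoneNumber INT)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", ROWS)
    return conn

# 1) Delete every row whose number occurs more than once (no survivor kept).
conn = fresh_table()
conn.execute("""
DELETE FROM Table1 WHERE PhoneNumber IN (
  SELECT PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING COUNT(*) > 1)
""")
only_unique = conn.execute("SELECT ID, PhoneNumber FROM Table1 ORDER BY ID").fetchall()

# 2) Keep the row with the smallest ID for each number, delete the rest.
conn = fresh_table()
conn.execute("""
DELETE FROM Table1 WHERE ID NOT IN (
  SELECT MIN(ID) FROM Table1 GROUP BY PhoneNumber)
""")
one_per_number = conn.execute("SELECT ID, PhoneNumber FROM Table1 ORDER BY ID").fetchall()

# 3) Leave the staging table alone and copy the non-duplicates to a new table.
conn = fresh_table()
conn.execute("""
CREATE TABLE table2 AS
SELECT MIN(ID) AS id, PhoneNumber FROM Table1
GROUP BY PhoneNumber HAVING COUNT(*) = 1
""")
copied = conn.execute("SELECT id, PhoneNumber FROM table2 ORDER BY id").fetchall()

print(only_unique)
print(one_per_number)
print(copied)
```

Variants 1 and 3 match what the question asks for (numbers that occur more than once disappear entirely), while variant 2 keeps one row per number.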

Select rows with max value (value is generated by join)

I have a database that can be summarized like this:
teacher (tid, f_name, l_name);
subject (sid, title);
teacher_subject (tid, sid);
What I want is to get the teachers that teach the most subjects, I have seen some similar but not duplicate questions here and couldn't patch up the solutions to get to what I want, this is in short what I've written:
select max(num_subs) from
(select t.f_name, t.l_name, count(t.tid) num_subs
from teacher t
join teacher_subject ts
on t.tid = ts.tid
group by t.tid)
max_subs;
But couldn't go any further. I'm sure there's a way to it as I was sometimes getting too close to it but never reached.
This is a little awkward in MySQL because it has no limit clause that allows for ties and, before version 8.0, no window functions, but here you go:
select *
from teacher
where tid in
(
select tid
from teacher_subject
group by tid
having count(*) =
(
select count(*)
from teacher_subject
group by tid
order by count(*) desc
limit 1
)
);
Just for the record, in standard SQL this is merely:
select *
from teacher t
order by (select count(*) from teacher_subject ts where ts.tid = t.tid) desc
fetch first 1 row with ties;
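A minimal sketch of the MySQL-style query above on made-up data, to show that ties are kept. The teacher names and the subject assignments are hypothetical, and Python's sqlite3 module stands in for MySQL (an assumption); the query text itself is the one from the answer, with an ORDER BY added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: teachers 1 and 2 teach two subjects each, teacher 3 teaches one.
conn.executescript("""
CREATE TABLE teacher (tid INTEGER PRIMARY KEY, f_name TEXT, l_name TEXT);
CREATE TABLE teacher_subject (tid INT, sid INT);
INSERT INTO teacher VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cid', 'Fox');
INSERT INTO teacher_subject VALUES (1, 1), (1, 2), (2, 2), (2, 3), (3, 1);
""")

top_teachers = conn.execute("""
select tid, f_name, l_name
from teacher
where tid in
(
  select tid
  from teacher_subject
  group by tid
  having count(*) =
  (
    select count(*)
    from teacher_subject
    group by tid
    order by count(*) desc
    limit 1
  )
)
order by tid
""").fetchall()

print(top_teachers)
```

The innermost subquery finds the highest subject count, and the HAVING clause then keeps every teacher who reaches it, so both tied teachers are returned.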

SELECT all EXCEPT results in a subquery

I have a query that joins several tables (3 or 4) and gets me results as expected.
SELECT DISTINCT test_title, stt_id FROM student_tests
LEFT JOIN student_test_answers ON sta_stt_num = stt_id
JOIN tests ON stt_test_id = test_id
WHERE student_test_answer_id IS NULL
I have another query that shows another set of data, it basically is this:
SELECT test_id, COUNT(*) AS theCount FROM tests
JOIN test_questions ON test_id= tq_test_id
WHERE type= 'THE_TYPE'
GROUP BY test_id
HAVING theCount = 1
So basically I want to NOT include the results of this second query in the first one. The test_id would be the joining field.
I have tried a WHERE NOT EXISTS ( -the above query -) but that returns no results which is not correct. I also tried 'NOT IN ( )'
Is there a better way of doing this?
Try something like this:
SELECT o.test_id, o.theCount
FROM (SELECT test_id, COUNT(*) AS theCount FROM tests
JOIN test_questions ON test_id= tq_test_id
WHERE type= 'THE_TYPE'
GROUP BY test_id
HAVING theCount = 1) o -- "outer" is a reserved word in MySQL, so use another alias
LEFT JOIN (
[OtherQuery]
) a ON o.test_id = a.test_id
WHERE a.test_id IS NULL
Here is my answer. Left outer Join gives you the participant (test). If there are no matches in test_questions then it'll still return the test rows, but null for test_questions. So if you then look for any test_question.test_id that is null, you will get what you are looking for.
I would also be specific in your count clause and not do a count(*) just to ensure that mysql knows what you truly want to count.
create database test;
use test;
create table test
(
test_id int,
`the_type` varchar(20)
);
create table test_questions
(
test_question_id int,
test_id int,
`the_type` varchar(20)
);
insert into test values (1, 'history');
insert into test values (2, 'chemistry');
insert into test values (3, 'reading');
insert into test_questions values (1, 1, 'hard question');
insert into test_questions values (2, 1, 'medium question');
insert into test_questions values (3, 2, 'hard question');
insert into test_questions values (4, 2, 'easy question');
select * from test;
select * from test_questions;
select t.test_id, count(distinct t.test_id)
from test t
left outer join test_questions tq on tq.test_id = t.test_id
where
tq.test_id is null
group by
t.test_id
As written in the comment you should be able to do it like this:
SELECT
DISTINCT test_title,
olc_stt_i_num
FROM
student_tests
LEFT JOIN
student_test_answers
ON olc_sta_i_stt_num = olc_stt_i_num
INNER JOIN
ol_class_tests
ON stt_test_num = test_num
WHERE
student_test_answer_id IS NULL
-- added this: replace test_id with real column
AND test_id NOT IN (
SELECT
test_id
FROM
tests
JOIN
test_questions
ON test_id= tq_test_id
WHERE
type= 'THE_TYPE'
GROUP BY
test_id
HAVING
COUNT(*) = 1
)
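The NOT IN approach above, and one likely reason the asker's NOT EXISTS attempt returned nothing, can be illustrated on a toy schema. The tables, columns, and rows below are hypothetical (the real schema is not shown in the question), the first query is reduced to a plain select from tests, and Python's sqlite3 module stands in for MySQL. The point of the sketch is that a NOT EXISTS whose subquery never refers to the outer row is false for every row as soon as the subquery returns anything at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: test 1 has one THE_TYPE question, test 2 has two, test 3 has none.
conn.executescript("""
CREATE TABLE tests (test_id INTEGER PRIMARY KEY, test_title TEXT);
CREATE TABLE test_questions (tq_id INTEGER PRIMARY KEY, tq_test_id INT, type TEXT);
INSERT INTO tests VALUES (1, 'History'), (2, 'Chemistry'), (3, 'Reading');
INSERT INTO test_questions (tq_test_id, type) VALUES
  (1, 'THE_TYPE'), (2, 'THE_TYPE'), (2, 'THE_TYPE'), (3, 'OTHER');
""")

# The asker's second query: tests with exactly one THE_TYPE question.
excluded = """
  SELECT test_id FROM tests
  JOIN test_questions ON test_id = tq_test_id
  WHERE type = 'THE_TYPE'
  GROUP BY test_id
  HAVING COUNT(*) = 1
"""

# NOT IN compares each outer test_id against the subquery's result set.
not_in = conn.execute(
    f"SELECT test_id FROM tests WHERE test_id NOT IN ({excluded}) ORDER BY test_id"
).fetchall()

# Pasting the same subquery into NOT EXISTS: it does not depend on the outer row.
uncorrelated = conn.execute(
    f"SELECT test_id FROM tests WHERE NOT EXISTS ({excluded})"
).fetchall()

# NOT EXISTS only excludes selectively once the subquery references the outer row.
correlated = conn.execute("""
SELECT t.test_id FROM tests t
WHERE NOT EXISTS (
  SELECT 1 FROM test_questions q
  WHERE q.tq_test_id = t.test_id AND q.type = 'THE_TYPE'
  GROUP BY q.tq_test_id
  HAVING COUNT(*) = 1)
ORDER BY t.test_id
""").fetchall()

print(not_in, uncorrelated, correlated)
```

One caveat with NOT IN in general: if the subquery can return NULL, the comparison is never true and the outer query returns no rows, so the correlated NOT EXISTS form is often the safer choice.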
What exactly is the second query? I only see one query here, and no sub-query. Also, a sqlfiddle of your precise schema would be helpful.
Anyhow, I think you want some sort of left excluding join. It looks something like this:
select test.test_id, count(*) as theCount
from tests test
join test_questions tq
on tq.test_id = test.test_id
left join tests excluded_test
on excluded_test.test_id = test.test_id
where tq.type = 'THE_TYPE'
and << Some condition that explains what excluded_test is >>
and excluded_test.test_id is null
EDIT: Yeah, there's definitely a lot of details missing from the original question (which had in some ways been mended), and which are still missing. Knowing the full table-structure of your example is important here, so it is difficult to provide a good concrete answer.