I have a query that joins several tables (3 or 4) and gets me results as expected.
SELECT DISTINCT test_title, stt_id FROM student_tests
LEFT JOIN student_test_answers ON sta_stt_num = stt_id
JOIN tests ON stt_test_id = test_id
WHERE student_test_answer_id IS NULL
I have another query that returns a different set of data; it is basically this:
SELECT test_id, COUNT(*) AS theCount FROM tests
JOIN test_questions ON test_id= tq_test_id
WHERE type= 'THE_TYPE'
GROUP BY test_id
HAVING theCount = 1
So basically, I want to exclude the results of this second query from the first one. test_id would be the joining field.
I have tried WHERE NOT EXISTS (...the above query...), but that returns no results, which is not correct. I also tried NOT IN (...).
Is there a better way of doing this?
Try something like this:
SELECT q.*
FROM (
[OtherQuery] -- the first query; it must also select test_id
) q
LEFT JOIN (
SELECT test_id, COUNT(*) AS theCount FROM tests
JOIN test_questions ON test_id = tq_test_id
WHERE type = 'THE_TYPE'
GROUP BY test_id
HAVING theCount = 1
) a ON q.test_id = a.test_id
WHERE a.test_id IS NULL
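The LEFT JOIN ... IS NULL pattern can be checked on a small scale. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The schema is a hypothetical simplification, and the inner SELECT on tests stands in for the first query. Test 1 has exactly one 'THE_TYPE' question, so it is the only one that should disappear:

```python
import sqlite3

# Hypothetical, simplified schema: only the columns the two queries need.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tests (test_id INTEGER PRIMARY KEY, test_title TEXT);
CREATE TABLE test_questions (tq_test_id INTEGER, type TEXT);
INSERT INTO tests VALUES (1, 'History'), (2, 'Chemistry'), (3, 'Reading');
INSERT INTO test_questions VALUES
    (1, 'THE_TYPE'), (2, 'THE_TYPE'), (2, 'THE_TYPE'), (3, 'OTHER');
""")

# Left side: stand-in for the first query (it must expose test_id).
# Right side: the second query. Unmatched rows keep a NULL a.test_id.
rows = conn.execute("""
SELECT q.test_id, q.test_title
FROM (SELECT test_id, test_title FROM tests) AS q
LEFT JOIN (
    SELECT tq_test_id AS test_id
    FROM test_questions
    WHERE type = 'THE_TYPE'
    GROUP BY tq_test_id
    HAVING COUNT(*) = 1
) AS a ON q.test_id = a.test_id
WHERE a.test_id IS NULL
ORDER BY q.test_id
""").fetchall()
print(rows)  # [(2, 'Chemistry'), (3, 'Reading')]
```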
Here is my answer. A LEFT OUTER JOIN returns every row of the left table (test). If a test has no matching rows in test_questions, the test row is still returned, with NULLs in the test_questions columns. So if you then filter for rows where test_questions.test_id is NULL, you get what you are looking for.
I would also be specific in the COUNT clause rather than using COUNT(*), so that it is clear to MySQL (and to the reader) exactly what you want to count.
create database test;
use test;
create table test
(
test_id int,
`the_type` varchar(20)
);
create table test_questions
(
test_question_id int,
test_id int,
`the_type` varchar(20)
);
insert into test values (1, 'history');
insert into test values (2, 'chemistry');
insert into test values (3, 'reading');
insert into test_questions values (1, 1, 'hard question');
insert into test_questions values (2, 1, 'medium question');
insert into test_questions values (3, 2, 'hard question');
insert into test_questions values (4, 2, 'easy question');
select * from test;
select * from test_questions;
select t.test_id, count(distinct t.test_id)
from test t
left outer join test_questions tq on tq.test_id = t.test_id
where
tq.test_id is null
group by
t.test_id
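To try this answer's schema without a MySQL server, here is a sketch that uses Python's sqlite3 module as a stand-in (SQLite accepts these CREATE and INSERT statements, but it is not MySQL). Following the advice above, it counts tq.test_question_id rather than using COUNT(*), so a test with no questions reports 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (test_id INT, the_type VARCHAR(20));
CREATE TABLE test_questions (test_question_id INT, test_id INT, the_type VARCHAR(20));
INSERT INTO test VALUES (1, 'history'), (2, 'chemistry'), (3, 'reading');
INSERT INTO test_questions VALUES
    (1, 1, 'hard question'), (2, 1, 'medium question'),
    (3, 2, 'hard question'), (4, 2, 'easy question');
""")

# Tests with no questions: the LEFT JOIN leaves every tq column NULL for them,
# and COUNT(column) skips NULLs, so the count is 0 rather than 1.
rows = conn.execute("""
SELECT t.test_id, COUNT(tq.test_question_id) AS question_count
FROM test t
LEFT OUTER JOIN test_questions tq ON tq.test_id = t.test_id
WHERE tq.test_id IS NULL
GROUP BY t.test_id
""").fetchall()
print(rows)  # [(3, 0)]
```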
As written in the comment, you should be able to do it like this:
SELECT
DISTINCT test_title,
olc_stt_i_num
FROM
student_tests
LEFT JOIN
student_test_answers
ON olc_sta_i_stt_num = olc_stt_i_num
INNER JOIN
ol_class_tests
ON stt_test_num = test_num
WHERE
student_test_answer_id IS NULL
-- added this: replace test_id with real column
AND test_id NOT IN (
SELECT
test_id
FROM
tests
JOIN
test_questions
ON test_id= tq_test_id
WHERE
type= 'THE_TYPE'
GROUP BY
test_id
HAVING
COUNT(*) = 1
)
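A likely reason the question's NOT EXISTS attempt returned nothing is that its subquery was not correlated with the outer row. Such a subquery returns the same rows for every outer row, so NOT EXISTS is false everywhere as soon as that subquery returns anything. The following is a minimal sketch in SQLite (a stand-in for MySQL, with hypothetical data) comparing NOT IN, an uncorrelated NOT EXISTS, and a correlated NOT EXISTS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tests (test_id INTEGER PRIMARY KEY, test_title TEXT);
CREATE TABLE test_questions (tq_test_id INTEGER, type TEXT);
INSERT INTO tests VALUES (1, 'History'), (2, 'Chemistry'), (3, 'Reading');
INSERT INTO test_questions VALUES
    (1, 'THE_TYPE'), (2, 'THE_TYPE'), (2, 'THE_TYPE'), (3, 'OTHER');
""")

EXCLUDED = """
SELECT tq_test_id FROM test_questions
WHERE type = 'THE_TYPE'
GROUP BY tq_test_id
HAVING COUNT(*) = 1"""

not_in = conn.execute(
    f"SELECT test_id FROM tests WHERE test_id NOT IN ({EXCLUDED}) ORDER BY test_id"
).fetchall()

# Uncorrelated: the subquery ignores the outer row, so it is all-or-nothing.
uncorrelated = conn.execute(
    f"SELECT test_id FROM tests WHERE NOT EXISTS ({EXCLUDED})"
).fetchall()

# Correlated: the subquery only looks at questions of the current test.
correlated = conn.execute("""
SELECT t.test_id FROM tests t
WHERE NOT EXISTS (
    SELECT 1 FROM test_questions tq
    WHERE tq.tq_test_id = t.test_id AND tq.type = 'THE_TYPE'
    GROUP BY tq.tq_test_id
    HAVING COUNT(*) = 1)
ORDER BY t.test_id""").fetchall()

print(not_in, uncorrelated, correlated)  # [(2,), (3,)] [] [(2,), (3,)]
```

A related pitfall is that NOT IN returns no rows at all if the subquery produces a NULL. That is one reason the correlated NOT EXISTS form is often preferred.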
What exactly is the second query? I only see one query here, and no sub-query. Also, a sqlfiddle of your precise schema would be helpful.
Anyhow, I think you want some sort of left excluding join. It looks something like this:
select test.test_id, count(*) as theCount
from tests test
join test_questions tq
on tq.test_id = test.test_id
left join tests excluded_test
on excluded_test.test_id = test.test_id
-- the exclusion condition belongs in the ON clause; in the WHERE clause it would also filter out the NULL rows
and << Some condition that explains what excluded_test is >>
where tq.type = 'THE_TYPE'
and excluded_test.test_id is null
group by test.test_id
EDIT: There are definitely a lot of details missing from the original question (some have since been filled in), and some are still missing. Knowing the full table structure of your example is important here, which makes it difficult to provide a good concrete answer.
Related
I'm trying to write a simple SQL query to show all possible combinations of data in a single table. Here's the table:
| id | fruit |
| -: | ------ |
| 1 | apple |
| 2 | orange |
| 3 | pear |
| 4 | plum |
I've only got as far as pairing all the data using CROSS JOIN: "apple,orange", "apple,pear", etc.
SELECT t1.fruit, t2.fruit
FROM fruits t1
CROSS JOIN fruits t2
WHERE t1.fruit < t2.fruit
Instead I'm looking for all unique combinations in alphabetical order, e.g.
apple
apple,orange
apple,orange,pear
apple,orange,pear,plum
apple,pear
apple,plum
apple,orange,plum
apple,pear,plum
orange
orange,pear
orange,pear,plum
orange,plum
pear
pear,plum
plum
That is, as long as a combination appears once, it doesn't need to appear again in a different order; for example, with apple,orange there is no need for orange,apple.
This should work for any table size.
Result here
Note: this requires MySQL 8+.
-- TABLE
CREATE TABLE IF NOT EXISTS `fruits`
(
`id` int(6) NOT NULL,
`fruit` char(20)
);
INSERT INTO `fruits` VALUES (1, 'apple');
INSERT INTO `fruits` VALUES (2, 'orange');
INSERT INTO `fruits` VALUES (3, 'pear');
INSERT INTO `fruits` VALUES (4 ,'plum');
-- QUERY
WITH RECURSIVE cte ( combination, curr ) AS (
SELECT
CAST(t.fruit AS CHAR(80)),
t.id
FROM
fruits t
UNION ALL
SELECT
CONCAT(c.combination, ', ', CAST( t.fruit AS CHAR(100))),
t.id
FROM
fruits t
INNER JOIN
cte c
ON (c.curr < t.id)
)
SELECT combination FROM cte;
Credit:
Code adapted from this answer
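The recursive query above can be run in SQLite through Python's sqlite3 module as a stand-in for MySQL 8. This sketch makes two assumed syntax swaps: || instead of CONCAT, and no CAST because SQLite text has no declared length. It builds every non-empty subset exactly once, which for four fruits is 15 combinations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fruits (id INT NOT NULL, fruit CHAR(20));
INSERT INTO fruits VALUES (1, 'apple'), (2, 'orange'), (3, 'pear'), (4, 'plum');
""")

# Anchor: every fruit on its own. Recursive step: append only fruits whose
# id is larger than the last one used, so each subset is built once, in id order.
combos = [row[0] for row in conn.execute("""
WITH RECURSIVE cte (combination, curr) AS (
    SELECT fruit, id FROM fruits
    UNION ALL
    SELECT c.combination || ',' || t.fruit, t.id
    FROM fruits t
    JOIN cte c ON c.curr < t.id
)
SELECT combination FROM cte ORDER BY combination
""")]
print(len(combos))  # 15
print(combos[:4])   # ['apple', 'apple,orange', 'apple,orange,pear', 'apple,orange,pear,plum']
```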
EDIT: This query doesn't give all the possible combinations.
Below query should work:
WITH RECURSIVE cte AS (
SELECT A.id,
CONCAT(A.fruit,',',GROUP_CONCAT(B.fruit ORDER BY B.id)) AS combinations,
COUNT(*) AS count_of_delims
FROM fruits A
INNER JOIN fruits B
ON A.id<B.id
GROUP BY A.id,A.fruit
UNION ALL
SELECT id,
SUBSTRING_INDEX(combinations,',',count_of_delims),
count_of_delims-1
FROM cte
WHERE count_of_delims>0
)
SELECT combinations FROM cte ORDER BY id;
Here is a working example in DB Fiddle.
I have MySQL data that was imported from a CSV file and contains many duplicate entries.
I selected all the non-duplicates using DISTINCT.
Now I need to delete all the duplicates with an SQL command.
Note: I don't need any of the duplicates; I only need to keep the non-duplicates.
Thanks.
For example, if the number 0123332546666 is repeated 11 times, I want to delete 12 of them.
MySQL table format:
ID, PhoneNumber
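Just COUNT the number of duplicates (with GROUP BY) and filter with HAVING. Then pass the query result to the DELETE statement. The extra derived table (AS a) is there because MySQL does not allow a DELETE to select directly from the table it is deleting from: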
DELETE FROM Table1 WHERE PhoneNumber IN (SELECT a.PhoneNumber FROM (
SELECT COUNT(*) AS cnt, PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING cnt>1
) AS a);
http://sqlfiddle.com/#!9/a012d21/1
complete fiddle:
schema:
CREATE TABLE Table1
(`ID` int, `PhoneNumber` int)
;
INSERT INTO Table1
(`ID`, `PhoneNumber`)
VALUES
(1, 888),
(2, 888),
(3, 888),
(4, 889),
(5, 889),
(6, 111),
(7, 222),
(8, 333),
(9, 444)
;
delete query:
DELETE FROM Table1 WHERE PhoneNumber IN (SELECT a.PhoneNumber FROM (
SELECT COUNT(*) AS cnt, PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING cnt>1
) AS a);
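The fiddle can be replayed locally. Here is a sketch with Python's sqlite3 module as a stand-in for MySQL, using the same data. SQLite does not need the extra derived table, but it accepts it. HAVING repeats COUNT(*) rather than relying on the cnt alias. Note that every copy of a duplicated number is removed, not all but one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INT, PhoneNumber INT);
INSERT INTO Table1 (ID, PhoneNumber) VALUES
    (1, 888), (2, 888), (3, 888), (4, 889), (5, 889),
    (6, 111), (7, 222), (8, 333), (9, 444);
""")

# Same shape as the answer's DELETE: collect numbers seen more than once,
# then delete every row carrying one of those numbers.
conn.execute("""
DELETE FROM Table1 WHERE PhoneNumber IN (
    SELECT a.PhoneNumber FROM (
        SELECT COUNT(*) AS cnt, PhoneNumber
        FROM Table1 GROUP BY PhoneNumber HAVING COUNT(*) > 1
    ) AS a
)""")
remaining = [r[0] for r in conn.execute("SELECT ID FROM Table1 ORDER BY ID")]
print(remaining)  # [6, 7, 8, 9]
```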
You could try using a LEFT JOIN with a subquery that finds the minimum id for each PhoneNumber, and delete the rows that have no match:
delete m
from m_table m
left join (
select min(id) as id, PhoneNumber -- alias needed: the join below refers to t.id
from m_table
group by PhoneNumber
) t on t.id = m.id
where t.PhoneNumber is null;
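SQLite has no multi-table DELETE, so this sketch, which uses Python's sqlite3 module as a stand-in, swaps the LEFT JOIN ... IS NULL for an equivalent NOT IN over the same "lowest id per PhoneNumber" subquery. The data mirrors the fiddle from the previous answer, with the table renamed to m_table. One row per number survives:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE m_table (id INT, PhoneNumber INT);
INSERT INTO m_table VALUES
    (1, 888), (2, 888), (3, 888), (4, 889), (5, 889),
    (6, 111), (7, 222), (8, 333), (9, 444);
""")

# Keep the row with the smallest id for each PhoneNumber; delete the rest.
conn.execute("""
DELETE FROM m_table
WHERE id NOT IN (SELECT MIN(id) FROM m_table GROUP BY PhoneNumber)
""")
kept = conn.execute("SELECT id, PhoneNumber FROM m_table ORDER BY id").fetchall()
print(kept)  # [(1, 888), (4, 889), (6, 111), (7, 222), (8, 333), (9, 444)]
```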
Otherwise, if you want to delete all the duplicates without keeping even a single row of each, you could use:
delete m
from m_table m
INNER join (
select PhoneNumber
from m_table
group by PhoneNumber
having count(*) > 1
) t on t.PhoneNumber= m.PhoneNumber
Instead of deleting from the table, I would suggest creating a new one:
create table table2 as
select min(id) as id, phonenumber
from table1
group by phonenumber
having count(*) = 1;
Why? Deleting rows has a lot of overhead. If you are bringing the data in from an external source, then treat the first landing table as a staging table and the second as the final table.
I have been wondering about the general performance of a query depending on whether a specific subselect (subquery) is placed in the WHERE clause or in the FROM clause. I didn't find a sufficient explanation of which way is better. Are there any rules for how we should apply subselects in this kind of query?
I prepared the following example:
Query FROM
SELECT name
FROM users a
JOIN (SELECT user_id, AVG(score) as score
FROM scores GROUP BY user_id
) b ON a.id=b.user_id
WHERE b.score > 15;
Query WHERE
SELECT name
FROM users
WHERE
(SELECT AVG(score) as score
FROM scores WHERE scores.user_id=users.id GROUP BY user_id
) > 15;
Tables:
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(30));
CREATE TABLE scores (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
score INT);
INSERT INTO users(name)
VALUES ('John'), ('Max'), ('Dan'), ('Alex');
INSERT INTO scores(user_id, score)
VALUES
(1, 20),
(1, 19),
(2, 15),
(2, 10),
(3, 20),
(3, 18),
(4, 13),
(4, 16),
(4, 15);
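Before comparing speed, it helps to confirm that the two formulations return the same rows. The following is a sketch in SQLite via Python's sqlite3 module (a stand-in; its plans and timings say nothing definite about MySQL). It loads the question's data, runs both queries, and prints each query plan. On MySQL the equivalent tool is EXPLAIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(30));
CREATE TABLE scores (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INT, score INT);
CREATE INDEX scores_user_score ON scores (user_id, score);  -- composite index
INSERT INTO users (name) VALUES ('John'), ('Max'), ('Dan'), ('Alex');
INSERT INTO scores (user_id, score) VALUES
    (1, 20), (1, 19), (2, 15), (2, 10), (3, 20), (3, 18),
    (4, 13), (4, 16), (4, 15);
""")

query_from = """
SELECT name FROM users a
JOIN (SELECT user_id, AVG(score) AS score FROM scores GROUP BY user_id) b
  ON a.id = b.user_id
WHERE b.score > 15
ORDER BY name"""

query_where = """
SELECT name FROM users
WHERE (SELECT AVG(score) FROM scores
       WHERE scores.user_id = users.id GROUP BY user_id) > 15
ORDER BY name"""

from_rows = conn.execute(query_from).fetchall()
where_rows = conn.execute(query_where).fetchall()
print(from_rows, where_rows)  # [('Dan',), ('John',)] [('Dan',), ('John',)]

# Same result, different plans: the plan is what actually differs.
for q in (query_from, query_where):
    for step in conn.execute("EXPLAIN QUERY PLAN " + q):
        print(step)
```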
In both cases, scores needs INDEX(user_id, score) for performance.
It is hard to predict which will run faster.
There are times when a query similar to the first formulation is excellent. This is because it goes off and focuses on b, efficiently calculating all the AVGs at once. Then it reaches over to the other table for the final info.
Let's tweak the second version slightly by adding some other test in the WHERE clause. Now the second one might be faster.
This may be even better:
SELECT name
FROM ( SELECT user_id -- Don't fetch AVG if not needed
FROM scores GROUP BY user_id
HAVING AVG(score) > 15 -- Note
) b
JOIN users a ON a.id = b.user_id;
(The swapping of FROM and JOIN is not an optimization; it is just to show what order the Optimizer will perform the steps.)
In some other situations, EXISTS( SELECT ... ) is beneficial. (But I don't see such a situation in your case.)
Your question was about general optimization. I'm trying to emphasize that there is no general answer.
I think this query is faster than the ones above, as it has no subqueries.
SELECT u.name
FROM users u
JOIN scores s
ON (s.user_id = u.id)
GROUP BY s.user_id
HAVING AVG(s.score) > 15
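The no-subquery version here and the derived-table HAVING version from the previous answer should agree on results. The following sketch checks that on the question's data using Python's sqlite3 module as a stand-in for MySQL. It only checks results; as the previous answer notes, speed has to be measured on the real engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(30));
CREATE TABLE scores (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INT, score INT);
INSERT INTO users (name) VALUES ('John'), ('Max'), ('Dan'), ('Alex');
INSERT INTO scores (user_id, score) VALUES
    (1, 20), (1, 19), (2, 15), (2, 10), (3, 20), (3, 18),
    (4, 13), (4, 16), (4, 15);
""")

# Join first, then group and filter on the aggregate.
join_group = conn.execute("""
SELECT u.name FROM users u
JOIN scores s ON (s.user_id = u.id)
GROUP BY s.user_id
HAVING AVG(s.score) > 15
ORDER BY u.name""").fetchall()

# Aggregate and filter inside a derived table, then join.
derived_having = conn.execute("""
SELECT name
FROM (SELECT user_id FROM scores GROUP BY user_id HAVING AVG(score) > 15) b
JOIN users a ON a.id = b.user_id
ORDER BY name""").fetchall()

print(join_group, derived_having)  # [('Dan',), ('John',)] [('Dan',), ('John',)]
```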
You can see it on this link: http://sqlfiddle.com/#!9/b050f9/16
It shows the execution time of the following three SELECT queries:
SELECT name
FROM users a
JOIN (SELECT user_id, AVG(score) as score
FROM scores GROUP BY user_id
) b ON a.id=b.user_id
WHERE b.score > 15;
SELECT name
FROM users
WHERE
(SELECT AVG(score) as score
FROM scores WHERE scores.user_id=users.id GROUP BY user_id
) > 15;
SELECT u.name
FROM users u
JOIN scores s
ON (s.user_id = u.id)
GROUP BY s.user_id
HAVING AVG(s.score) > 15
I have two related tables as follows:
USERS
user_id (PK)
USERACTIONS
user_action_id (PK)
user_id (FK)
user_action (int)
Whenever a user performs an action, a new row is inserted into the USERACTIONS table. I need a query that fetches the USERACTIONS rows of users who performed only a particular set of actions, say (1,2), but not (3,4).
So I have a query like this:
select * from USERACTIONS where (1,2) in(select user_action from USERACTIONS where user_id=100) and user_id=100;
The problem is that the above query doesn't work: supplying (1,2) means the subquery is also expected to return two columns, which is understandable. This is the error I get:
ERROR: subquery has too few columns
Giving a single value say (1) or (2) works perfectly. I want to know if there is any way I can use the same query and compare the subquery's result with multiple values? I prefer the same query because the case demonstrated here is just a part of a large query.
Please note that the query should not list users who performed (1,2,3,4); only those who performed just (1,2) should be listed. Also, user_action values can be any random integer.
Any alternate queries are welcome but would prefer changes in the same query. Thanks in advance.
Try this:
SELECT USERS.user_id, USERACTIONS.user_action
FROM USERACTIONS
LEFT JOIN USERS ON USERS.user_id = USERACTIONS.user_id where USERACTIONS.user_action in (1,2);
This works for your query.
Put the action numbers in both the IN and NOT IN clauses:
SELECT a.user_id
FROM
(SELECT DISTINCT user_id
from
USERACTIONS
WHERE user_action
IN (1,2)) a
LEFT JOIN
(SELECT DISTINCT user_id
from
USERACTIONS
WHERE user_action
NOT IN (1,2)) b
ON a.user_id = b.user_id
WHERE b.user_id IS NULL
;
CREATE TABLE USERACTIONS (id INT NOT NULL AUTO_INCREMENT
, PRIMARY KEY(id)
, user_action INT
, user_id INT
);
INSERT USERACTIONS VALUES (NULL,1,100),(NULL,2,100),(NULL,3,100), (NULL,1,101),(NULL,2,101);
SELECT a.user_id
FROM
(SELECT DISTINCT user_id
from
USERACTIONS
WHERE user_action
IN (1,2)) a
LEFT JOIN
(SELECT DISTINCT user_id
from
USERACTIONS
WHERE user_action
NOT IN (1,2)) b
ON a.user_id = b.user_id
WHERE b.user_id IS NULL
;
| user_id |
| ------: |
| 101 |
db<>fiddle here
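Joining the two user sets on a.user_id <> b.user_id only happens to work with two users. The following sketch uses Python's sqlite3 module as a stand-in for MySQL and adds a hypothetical user 102 who performed only action 4. With that user present, the <> join also returns user 100, while an anti-join (LEFT JOIN ... IS NULL) returns only 101:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE USERACTIONS (id INTEGER PRIMARY KEY, user_action INT, user_id INT);
INSERT INTO USERACTIONS (user_action, user_id) VALUES
    (1, 100), (2, 100), (3, 100),   -- did 1 and 2, but also 3
    (1, 101), (2, 101),             -- did only 1 and 2
    (4, 102);                       -- hypothetical extra user: only action 4
""")

A = "SELECT DISTINCT user_id FROM USERACTIONS WHERE user_action IN (1,2)"
B = "SELECT DISTINCT user_id FROM USERACTIONS WHERE user_action NOT IN (1,2)"

# Joining on <> pairs each user in A with every *other* user in B.
not_equal = sorted(r[0] for r in conn.execute(
    f"SELECT DISTINCT a.user_id FROM ({A}) a JOIN ({B}) b ON a.user_id <> b.user_id"))

# Anti-join: keep users in A that have no row at all in B.
anti = sorted(r[0] for r in conn.execute(
    f"SELECT a.user_id FROM ({A}) a LEFT JOIN ({B}) b ON a.user_id = b.user_id "
    "WHERE b.user_id IS NULL"))

print(not_equal, anti)  # [100, 101] [101]
```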
I see typical SO answers that aren't answering OP's question, but rather trying to steer them in a different direction. I know this is old, but if anyone stumbles upon this, I believe this will be more helpful.
I too have a large, enterprise solution where the WHERE check is MUCH more performant in a subquery than using a JOIN.
You can set a variable in your WHERE clause and use it afterwards. I am currently trying to find a better way to do this without setting a variable, but something like this works:
SELECT * FROM USERACTIONS
WHERE
( (@useraction :=
(select user_action from USERACTIONS where user_id=100 LIMIT 1))
= 1
OR @useraction = 2)
AND user_id=100;
What you are doing is creating a variable in your WHERE clause, setting that variable, then using it later. This is encapsulated, so it can match either one of the conditions.
There is a problem when I change SELECT to DELETE:
DELETE
FROM mitarbeiter
WHERE mitarbeiter.pers_nr=
(SELECT mitarbeiter.pers_nr
FROM mitarbeiter
LEFT JOIN kunde ON kunde.betreuer=mitarbeiter.pers_nr
group by mitarbeiter.pers_nr
order by count(*)
limit 1);
The error says...
Table 'mitarbeiter' is specified twice, both as a target for 'DELETE' and as a separate source for data
How can I change that?
You can replace the logic with a JOIN:
DELETE m
FROM mitarbeiter m JOIN
(SELECT m2.pers_nr
FROM mitarbeiter m2 LEFT JOIN
kunde k
ON k.betreuer = m2.pers_nr
GROUP BY m2.pers_nr
ORDER BY COUNT(*)
LIMIT 1
) m2
ON m.pers_nr = m2.pers_nr;
That said, the logic can probably be simplified, but it is strange. You are using COUNT(*) with a LEFT JOIN, so even non-matches in the second table get a count of 1. Knowing your intentions -- with sample data and desired results -- would help others figure out if another approach would work better.
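The COUNT(*) point is easy to see on a small scale. The following sketch uses Python's sqlite3 module as a stand-in for MySQL, plus a hypothetical employee 'Intern' who has no customers. With COUNT(*), the Intern and 'Scum' tie at 1. Counting a column from the right-hand table (COUNT(k.kunde_id)) tells them apart:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mitarbeiter (pers_nr VARCHAR(10) PRIMARY KEY);
CREATE TABLE kunde (kunde_id INT PRIMARY KEY, betreuer VARCHAR(10) NOT NULL);
INSERT INTO mitarbeiter VALUES ('Scum'), ('Worker'), ('Intern');  -- 'Intern' is hypothetical
INSERT INTO kunde VALUES (1, 'Scum'), (2, 'Worker'), (3, 'Worker');
""")

rows = conn.execute("""
SELECT m.pers_nr,
       COUNT(*)          AS rows_after_join,  -- an unmatched employee still yields 1 row
       COUNT(k.kunde_id) AS real_customers    -- NULLs from the LEFT JOIN are not counted
FROM mitarbeiter m
LEFT JOIN kunde k ON k.betreuer = m.pers_nr
GROUP BY m.pers_nr
ORDER BY m.pers_nr
""").fetchall()
print(rows)  # [('Intern', 1, 0), ('Scum', 1, 1), ('Worker', 2, 2)]
```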
Instead of a WHERE clause with a subquery, you could try a JOIN against the same subquery used as an aliased derived table:
DELETE m.*
FROM mitarbeiter m
INNER JOIN (
SELECT mitarbeiter.pers_nr
FROM mitarbeiter
LEFT JOIN kunde ON kunde.betreuer=mitarbeiter.pers_nr
group by mitarbeiter.pers_nr
order by count(*)
limit 1
) t ON t.pers_nr = m.pers_nr
Another workaround you could try is saving your subquery in a temporary table, then calling on rows from that table.
DROP TABLE IF EXISTS tempTable;
CREATE TEMPORARY TABLE tempTable
SELECT mitarbeiter.pers_nr
FROM mitarbeiter
LEFT JOIN kunde
ON kunde.betreuer=mitarbeiter.pers_nr
GROUP BY mitarbeiter.pers_nr
ORDER BY COUNT(*)
LIMIT 1;
DELETE FROM mitarbeiter
WHERE mitarbeiter.pers_nr
IN (
SELECT * FROM tempTable
);
DROP TABLE tempTable; -- cleanup
Warning: this may take a much longer time and/or larger space than using JOIN as in the other solutions.
Welcome to Stack Overflow! Does this help point you in the right direction?
I often find that result visualization is easier for my feeble brain when I use Common Table Expressions.
Please note that MySQL versions prior to 8.0 don't support the WITH clause.
CREATE TABLE IF NOT EXISTS mitarbeiter (
pers_nr varchar(10),
PRIMARY KEY (pers_nr)
) DEFAULT CHARSET=utf8;
INSERT INTO mitarbeiter (pers_nr) VALUES
('Scum'),
('Worker'),
('Manager'),
('President');
CREATE TABLE IF NOT EXISTS kunde (
kunde_id int(3) NOT NULL,
betreuer varchar(10) NOT NULL,
PRIMARY KEY (kunde_id)
) DEFAULT CHARSET=utf8;
INSERT INTO kunde (kunde_id, betreuer) VALUES
(1, 'Scum'),
(2, 'Worker'),
(3, 'Worker'),
(4, 'Manager'),
(5, 'Manager'),
(6, 'Manager'),
(7, 'President'),
(8, 'President'),
(9, 'President'),
(10, 'President');
WITH s1
AS
(SELECT betreuer
, count(1) AS kunde_count_by_pers_nr -- JJAUSSI: find the kunde count by pers_nr
FROM kunde
GROUP BY betreuer
),
s2
AS
(SELECT MIN(kunde_count_by_pers_nr) AS kunde_count_by_pers_nr_min -- JJAUSSI: find the lowest kunde_count
FROM s1),
s3
AS
(SELECT s1.betreuer
FROM s1 INNER JOIN s2
ON s1.kunde_count_by_pers_nr = s2.kunde_count_by_pers_nr_min -- JJAUSSI: Retrieve all the betreuer values with the lowest kunde_count
)
SELECT * -- JJAUSSI: Test this result and see if it contains the records you expect to delete
FROM s3;
-- DELETE -- JJAUSSI: Once you are confident in the results from s3, use this DELETE in place of the final SELECT above (it must directly follow the WITH clause)
-- FROM mitarbeiter
-- WHERE pers_nr IN (SELECT betreuer
-- FROM s3);
SELECT * -- JJAUSSI: Check for the desired results (we successfully got rid of 'Scum'!)
FROM mitarbeiter;
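Here the DELETE directly follows the WITH clause, in place of the test SELECT. The following sketch shows that in SQLite, via Python's sqlite3 module as a stand-in for MySQL 8, using this answer's sample data. Only 'Scum' is deleted. Note that s1 only sees employees who have at least one kunde row, so an employee with no customers would never be a candidate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mitarbeiter (pers_nr VARCHAR(10) PRIMARY KEY);
CREATE TABLE kunde (kunde_id INT PRIMARY KEY, betreuer VARCHAR(10) NOT NULL);
INSERT INTO mitarbeiter VALUES ('Scum'), ('Worker'), ('Manager'), ('President');
INSERT INTO kunde VALUES
    (1, 'Scum'), (2, 'Worker'), (3, 'Worker'),
    (4, 'Manager'), (5, 'Manager'), (6, 'Manager'),
    (7, 'President'), (8, 'President'), (9, 'President'), (10, 'President');
""")

# The answer's CTEs, with the DELETE taking the place of the final SELECT.
conn.execute("""
WITH s1 AS (SELECT betreuer, COUNT(1) AS kunde_count_by_pers_nr
            FROM kunde GROUP BY betreuer),
     s2 AS (SELECT MIN(kunde_count_by_pers_nr) AS kunde_count_by_pers_nr_min FROM s1),
     s3 AS (SELECT s1.betreuer FROM s1
            JOIN s2 ON s1.kunde_count_by_pers_nr = s2.kunde_count_by_pers_nr_min)
DELETE FROM mitarbeiter WHERE pers_nr IN (SELECT betreuer FROM s3)
""")
remaining = [r[0] for r in conn.execute("SELECT pers_nr FROM mitarbeiter ORDER BY pers_nr")]
print(remaining)  # ['Manager', 'President', 'Worker']
```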