Trying to understand the difference between IN and EXISTS - mysql

I'm currently taking a course and during one of the tests, I came across this question.
The math_students and english_students tables have the following columns:
student_id, grade, first_name, last_name
Using a subquery, find out what grade levels are represented in both the math and english classes.
The query I used was this.
select distinct grade
from math_students
where grade in (
select grade
from english_students
);
However, it was graded as incorrect and the correct answer was given as
SELECT grade
FROM math_students
WHERE EXISTS (
SELECT grade
FROM english_students
);
I would really appreciate it if someone could help me understand the difference in the two queries because the output was the same in both cases. Also, why doesn't the query contain distinct?

The version with EXISTS is incorrect. Period. It is answering the "question":
Return all grades for math students if there is at least one English student.
Not very useful. The correct EXISTS would be:
SELECT DISTINCT grade
FROM math_students ms
WHERE EXISTS (
SELECT 1
FROM english_students es
WHERE ms.grade = es.grade
);
If your database supports it, I would expect you to also be learning:
select grade
from math_students
intersect
select grade
from english_students;
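To see concretely how these queries differ, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE math_students (student_id INTEGER, grade INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE english_students (student_id INTEGER, grade INTEGER, first_name TEXT, last_name TEXT);
    INSERT INTO math_students VALUES (1, 6, 'Ann', 'A'), (2, 6, 'Bob', 'B'), (3, 9, 'Cy', 'C');
    INSERT INTO english_students VALUES (4, 6, 'Dee', 'D'), (5, 7, 'Eli', 'E');
""")

def grades(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# IN: keeps math grades that also occur in english_students.
in_result = grades(
    "SELECT DISTINCT grade FROM math_students "
    "WHERE grade IN (SELECT grade FROM english_students)")

# Uncorrelated EXISTS: true for every row as long as english_students is not empty.
uncorrelated = grades(
    "SELECT grade FROM math_students "
    "WHERE EXISTS (SELECT grade FROM english_students)")

# Correlated EXISTS: the subquery is tied to the outer row through ms.grade.
correlated = grades(
    "SELECT DISTINCT grade FROM math_students ms "
    "WHERE EXISTS (SELECT 1 FROM english_students es WHERE ms.grade = es.grade)")

# INTERSECT: grades present in both tables, duplicates removed.
intersect = grades(
    "SELECT grade FROM math_students INTERSECT SELECT grade FROM english_students")

print(in_result, uncorrelated, correlated, intersect)
```

With this data the uncorrelated EXISTS returns every math grade, duplicates included, while the other three queries return only the shared grade. The graded "correct" answer and the IN query can only look identical when the data happens to hide the difference.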

The "In" clause checks whether a value is a member of a provided list. The list can be either hard-coded or the result of a query.
In your example if the table english_students was defined as
id name grade
-- ---- -----
1 Joe 6
2 Bill 8
3 Sue 7
The query:
select distinct grade
from math_students
where grade in (
select grade
from english_students
);
Will evaluate to:
select distinct grade
from math_students
where grade in (
"6","8","7"
);
Conceptually, the query in parentheses is evaluated first and the outer query is evaluated afterwards,
so since you are saying
select grade
from english_students
it would return "6","8","7" and then execute the outer query.
Your second query:
SELECT grade
FROM math_students
WHERE EXISTS (
SELECT grade
FROM english_students
);
"Exists" checks for just that: the existence of data (or objects). So you are saying "select all data in math_students if any data exists in english_students", as Gordon pointed out.
To answer the question in the title: the big distinction between "In" and "Exists" is that "In" evaluates to "give me anything that matches one of these values", whereas "Exists" evaluates to "give me anything, as long as the subquery returns at least one row".
The second query only appears not to need a distinct because of the data: if several math students share a grade, it returns that grade more than once.
As Gordon pointed out, "intersect" returns only the rows that appear in both result sets, with duplicates removed.

Related

Count only 1 if there's a row data exist?

I have a capstone project that I need to finish tomorrow. It was suggested that I count the number of student violators by gender/course/violation.
The only problem: is there a query such that, if a student has committed two or more violations, that student still counts as only 1 student violator?
This is the image sample of my student violation table.
Sorry for my bad grammar.
I would use SELECT DISTINCT for such cases.
For example:
SELECT DISTINCT StudentNumber FROM my_table WHERE TypeOfViolation="SomeType";
You can refer here for more details:
https://www.w3schools.com/sql/sql_distinct.asp
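To turn the DISTINCT idea into an actual count per group, COUNT(DISTINCT ...) combined with GROUP BY is one option. The sketch below uses Python's sqlite3 module as a stand-in database. The Gender and Course columns and all sample rows are assumptions, since the original table was only shown as an image.

```python
import sqlite3

# Hypothetical layout: only StudentNumber and TypeOfViolation appear in the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (StudentNumber TEXT, Gender TEXT, Course TEXT, TypeOfViolation TEXT);
    INSERT INTO my_table VALUES
        ('S1', 'F', 'BSIT', 'Late'),
        ('S1', 'F', 'BSIT', 'Late'),      -- second offense by the same student
        ('S2', 'M', 'BSIT', 'Late'),
        ('S3', 'M', 'BSCS', 'Uniform');
""")

# Each student is counted once per gender, however many violations they have.
per_gender = conn.execute("""
    SELECT Gender, COUNT(DISTINCT StudentNumber) AS violators
    FROM my_table
    GROUP BY Gender
    ORDER BY Gender
""").fetchall()

print(per_gender)
```

Swapping Gender for Course or TypeOfViolation in both the SELECT list and the GROUP BY gives the other breakdowns.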

sql table design to fetch records with multiple inclusion and exclusion conditions

We want to select customers based on following parameters i.e. customer should be in:
specific city i.e. cityId=1,2,3...
specific customerId should be excluded i.e. customerId=33,2323,34534...
specific age i.e. 5 years, 7 years, 72 years...
These inclusion & exclusion lists can be arbitrarily long.
How should we design database for this:
Create separate table 'customerInclusionCities' for these inclusion cities and do like:
select * from customers where cityId in (select cityId from customerInclusionCities)
We do the same for age: create a table 'customerEligibleAge' with one entry per eligible age:
i.e. select * from customers where age in (select age from customerEligibleAge)
and create a separate table 'customerIdToBeExcluded' for excluding customers:
i.e. select * from customers where customerId not in (select customerId from customerIdToBeExcluded)
OR
Create One table with Category and Ids.
i.e. Category1 for cities, Category2 for CustomerIds to be excluded.
Which approach is better, creating one table for these parameters OR creating separate tables for each list i.e. age, customerId, city?
IN ( SELECT ... ) can be very slow. Do your query as a single SELECT without subqueries. I assume all 3 columns are in the same table? (If not, that adds complexity.) The WHERE clause will probably have 3 IN ( constants ) clauses:
SELECT ...
FROM tbl
WHERE cityId IN (1,2,3...)
AND customerId NOT IN (33,2323,34534...)
AND age IN (5, 7, 72)
Have (at least):
INDEX(cityId),
INDEX(age)
(Negated things are unlikely to be able to use an index.)
The query will use one of the indexes; having both will give the Optimizer a choice of which it thinks is better.
Or...
SELECT c.*
FROM customers AS c
JOIN customerInclusionCities AS ci ON ci.cityId = c.cityId
JOIN customerEligibleAge AS ce ON c.age = ce.age
LEFT JOIN customerIdToBeExcluded AS ex ON c.customerId = ex.customerId
WHERE ex.customerId IS NULL
Suggested indexes (probably as PRIMARY KEY):
customerInclusionCities: (cityId)
customerEligibleAge: (age)
customerIdToBeExcluded: (customerId)
In order to discuss further, please provide SHOW CREATE TABLE for each table and EXPLAIN SELECT ... for any of the queries that actually work.
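As a rough check that the constant-list query and the join-based query select the same customers, here is a small sketch using Python's sqlite3 module in place of MySQL. Table names follow the question; the column types and sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerId INTEGER PRIMARY KEY, cityId INTEGER, age INTEGER);
    CREATE TABLE customerInclusionCities (cityId INTEGER PRIMARY KEY);
    CREATE TABLE customerEligibleAge (age INTEGER PRIMARY KEY);
    CREATE TABLE customerIdToBeExcluded (customerId INTEGER PRIMARY KEY);
    INSERT INTO customers VALUES (1, 1, 5), (2, 1, 40), (33, 2, 7), (4, 9, 72), (5, 3, 72);
    INSERT INTO customerInclusionCities VALUES (1), (2), (3);
    INSERT INTO customerEligibleAge VALUES (5), (7), (72);
    INSERT INTO customerIdToBeExcluded VALUES (33);
""")

# Variant 1: lists supplied as constants in a single SELECT.
with_constants = [r[0] for r in conn.execute("""
    SELECT customerId FROM customers
    WHERE cityId IN (1, 2, 3)
      AND customerId NOT IN (33)
      AND age IN (5, 7, 72)
    ORDER BY customerId
""")]

# Variant 2: lists stored in tables; the LEFT JOIN ... IS NULL acts as an anti-join.
with_joins = [r[0] for r in conn.execute("""
    SELECT c.customerId
    FROM customers AS c
    JOIN customerInclusionCities AS ci ON ci.cityId = c.cityId
    JOIN customerEligibleAge AS ce ON ce.age = c.age
    LEFT JOIN customerIdToBeExcluded AS ex ON ex.customerId = c.customerId
    WHERE ex.customerId IS NULL
    ORDER BY c.customerId
""")]

print(with_constants, with_joins)
```

Both variants return the same customers here. Which one performs better on real data depends on list sizes and indexes, which is why the EXPLAIN output matters.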
If that is the only operation you use the database for, I recommend the first solution. It is also very simple to deploy.
The second solution fills the DB up with junk.

SQL - IIF clause outside of select statement?

I'm working in MS Access 2010 doing a series of UNIONs inside of a SELECT statement. After all of these UNIONs are complete, I need to add a couple of new columns using IIF statements (i.e., based on the value of a ch column). However, I can't seem to find out how or where to properly squeeze in this syntax so that the IIFs check against an entire column in the final 'unioned' table. Simply put, after the below syntax (which is trimmed and simplified for this question) runs, I want to then append a new variable column (e.g., 'asterisk') in this final table based on whether an existing column cell (e.g., 'grades') contains an asterisk. Help and patience is greatly appreciated -- I am just becoming familiar with the syntax principles of SQL.
SELECT * FROM(
SELECT class, teacher, student
grades2010 AS grades
WHERE NOT(grades2010 IS NULL OR grades2010="")
UNION
SELECT class, teacher, student
grades2011 AS grades
WHERE NOT(grades2011 IS NULL OR grades2011="")
UNION
SELECT class, teacher, student
grades2012 AS grades
WHERE NOT(grades2012 IS NULL OR grades2012="")
)
Here's an example of the type of IIF I'd want to run -- see if 'grades' contains an asterisk and create an indicator:
IIF(InStr([grades],"*")>0,"YES","NO") AS asterisk
So... You are on the right track. It would look something like:
SELECT
IIF(InStr([grades],"*")>0,"YES","NO") AS asterisk,
class,
teacher,
student
FROM(
SELECT class, teacher, student
grades2010 AS grades
WHERE NOT(grades2010 IS NULL OR grades2010="")
UNION
SELECT class, teacher, student
grades2011 AS grades
WHERE NOT(grades2011 IS NULL OR grades2011="")
UNION
SELECT class, teacher, student
grades2012 AS grades
WHERE NOT(grades2012 IS NULL OR grades2012="")
) as mysubquery
But... as posted, grades is not actually a column in your UNION subquery (there is no comma between student and the grades20xx AS grades expression), so this won't work as is. You'll need to make sure that the grades column is produced by each SELECT statement in your subquery to do anything with it in your main SELECT.
Also worth mentioning is that your schema isn't the best. Instead of having a table for each year, you should really just have a single table with year as a column. This will get you out of having to do these nasty, and often slow, UNION queries. You could fix that up pretty quickly by making a table and using that UNION query to do an INSERT from each of your year-based tables.
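The single-table design suggested above could look roughly like the sketch below. It runs on SQLite through Python's sqlite3 module rather than Access, so the IIF/InStr expression is written as a CASE with instr(). The grades_all table, its year column, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table with a year column instead of one source per year (hypothetical layout).
    CREATE TABLE grades_all (class TEXT, teacher TEXT, student TEXT, year INTEGER, grades TEXT);
    INSERT INTO grades_all VALUES
        ('Math', 'Lee', 'Ann', 2010, 'A*'),
        ('Math', 'Lee', 'Ann', 2011, 'B'),
        ('Math', 'Lee', 'Bob', 2010, ''),
        ('Math', 'Lee', 'Bob', 2012, NULL);
""")

# The indicator column is computed per row; no UNION is needed any more.
rows = conn.execute("""
    SELECT student, year, grades,
           CASE WHEN instr(grades, '*') > 0 THEN 'YES' ELSE 'NO' END AS asterisk
    FROM grades_all
    WHERE NOT (grades IS NULL OR grades = '')
    ORDER BY student, year
""").fetchall()

print(rows)
```

In Access itself the CASE expression would stay as the IIF(InStr([grades],"*")>0,"YES","NO") from the question.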

Selecting specific records to run query on

I am trying to select a small number of records in a somewhat large database and run some queries on them.
I am incredibly new to programming so I am pretty well lost.
What I need to do is select all records where the Registraton# column equals a certain number, and then run the query on just those results.
I can put up what the db looks like and a more detailed explanation if needed, although I think it may be something simple that I am just missing.
Filtering records in a database is done with the WHERE clause.
For example, if you wanted to get all records from a Persons table where FirstName = 'David':
SELECT
FirstName,
LastName,
MiddleInitial,
BirthDate,
NumberOfChildren
FROM
Persons
WHERE
FirstName = 'David'
Your question indicates you've figured this much out, but are just missing the next piece.
If you need to query within the results of the above result set to only include people with more than two children, you'd just add to your WHERE clause using the AND keyword.
SELECT
FirstName,
LastName,
MiddleInitial,
BirthDate,
NumberOfChildren
FROM
Persons
WHERE
FirstName = 'David'
AND
NumberOfChildren > 2
Now, there ARE some situations where you really need to use a subquery. For example:
Assuming that each person has a PersonId and each person has a FatherId that corresponds to another person's PersonId...
PersonId  FirstName  LastName  FatherId...
1         David      Stratton  0
2         Matthew    Stratton  1
Select FirstName,
LastName
FROM
Person
WHERE
FatherId IN (Select PersonId
From Person
WHERE FirstName = 'David')
Would return all of the children with a Father named David. (Using the sample data, Matthew would be returned.)
http://www.w3schools.com/sql/sql_where.asp
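The FatherId subquery above can be tried end to end with a small sketch. It uses Python's sqlite3 module as a generic SQL engine; the third row (Sam Jones) is invented so that the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, FatherId INTEGER);
    INSERT INTO Person VALUES
        (1, 'David',   'Stratton', 0),
        (2, 'Matthew', 'Stratton', 1),
        (3, 'Sam',     'Jones',    0);   -- extra row, not in the original sample
""")

# The inner query finds the ids of everyone named David;
# the outer query keeps people whose FatherId is one of those ids.
children = conn.execute("""
    SELECT FirstName, LastName
    FROM Person
    WHERE FatherId IN (SELECT PersonId FROM Person WHERE FirstName = 'David')
""").fetchall()

print(children)
```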
Would this be any use to you?
SELECT * from table_name WHERE Regestration# = number
I do not know what you have done up to now, but I imagine that you have a SQL query somewhere like
SELECT col1, col2, col3
FROM table
Append a where clause
SELECT col1, col2, col3
FROM table
WHERE "Registraton#" = number
See SO question SQL standard to escape column names?.
Try this:
SELECT *
FROM tableName
WHERE RegistrationNo = 'valueHere'
I am not certain about my solution, but I would suggest using a view. You create a view based on the records you need, run your queries against it, and then you can drop the view.
View description: A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.
Example:
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition
For more information: http://www.w3schools.com/sql/sql_view.asp
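A minimal sketch of the view approach, run on SQLite through Python's sqlite3 module. The tableName and RegistrationNo names are borrowed from the earlier answer; the amount column, the view name selected_records, and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableName (RegistrationNo TEXT, amount INTEGER);
    INSERT INTO tableName VALUES ('A1', 10), ('A1', 15), ('B2', 99);

    -- The view exposes only the records for one registration number.
    CREATE VIEW selected_records AS
        SELECT * FROM tableName WHERE RegistrationNo = 'A1';
""")

# Any further query can now target the view instead of the full table.
total = conn.execute("SELECT SUM(amount) FROM selected_records").fetchone()[0]
print(total)

# Drop the view once it is no longer needed; the underlying table is untouched.
conn.execute("DROP VIEW selected_records")
remaining = conn.execute("SELECT COUNT(*) FROM tableName").fetchone()[0]
```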

Exercise Help - Select the register number of the students with the average highest score

I am studying SQL at my university, and I was practicing this exercise and got stuck! I have a database that stores all the exams passed by a student for a specific teaching.
These are the tables in the database:
Student(**number**, name, birthday)
Exam(**student_number**, **teaching_code**, vote) *(store the exams passed by the students with vote)*
Teaching(**code**, name)
Where number is the primary key for the Student table, and code is for Teaching, "student_number" references "number" in Student, and "teaching_code" references "code" in Teaching.
The exercise asks to select the students’ numbers with the average highest score.
I know how to write a query which gives me a table containing the average for each students but I don't know how to select the highest from it or how to show the corresponding student number!
The solution with the limit command doesn't work if several students share the same highest average...
The query to show the average score per student is:
select avg(e.vote) as Average from STUDENT s, EXAM e
where s.number = e.student_number
group by s.number
EDIT:
I tried the MAX function in SQL, I have tried this:
select MAX( avg(e.vote) ) as Average from STUDENT s, EXAM e
where s.number = e.student_number
group by s.number
but it says "Error Code: 1111. Invalid use of group function"
The solution is probably a nested query, but I can't work it out.
SELECT MAX(Expression)
FROM tables
WHERE Condition
You can check the documentation on how to use MAX and find highest value.
If you want to select the top 5 or 10 or 20... highest, use ORDER BY ... DESC with the LIMIT clause in MySQL (the TOP clause is SQL Server syntax).
select MAX(avg_vote)
FROM (
SELECT student_number, AVG(vote) as avg_vote
FROM Exam
GROUP BY student_number ) t
I haven't checked that query.
A query to select the maximum average may look like that.
What a pity MySQL doesn't seem to support CTEs (they were only added in MySQL 8.0); with them this would have been so simple.
select t.student_number, t.avg_vote
FROM (SELECT student_number, AVG(vote) as avg_vote
FROM Exam
GROUP BY student_number) t
WHERE t.avg_vote = (select max(avg_vote)
FROM (SELECT student_number, AVG(vote) as avg_vote
FROM Exam
GROUP BY student_number) t2)
Continue studying hard, being presented with a likely answer is far less useful than actually reaching the conclusion yourself. Actually, if you're lucky, my proposal will not work (I haven't tested) so that you will have to come up with the right modification yourself!
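For anyone who wants to check the nested-query idea against ties, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The Exam columns follow the schema in the question; the votes are invented so that two students share the highest average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Exam (student_number INTEGER, teaching_code TEXT, vote INTEGER);
    INSERT INTO Exam VALUES
        (1, 'DB',   28), (1, 'ALGO', 30),   -- average 29
        (2, 'DB',   29),                    -- average 29 (tie)
        (3, 'DB',   20);                    -- average 20
""")

# The inner derived table computes one average per student; the scalar subquery
# finds the maximum of those averages; the outer filter keeps every student
# whose average equals it.
best = conn.execute("""
    SELECT t.student_number, t.avg_vote
    FROM (SELECT student_number, AVG(vote) AS avg_vote
          FROM Exam
          GROUP BY student_number) t
    WHERE t.avg_vote = (SELECT MAX(avg_vote)
                        FROM (SELECT AVG(vote) AS avg_vote
                              FROM Exam
                              GROUP BY student_number) m)
    ORDER BY t.student_number
""").fetchall()

print(best)
```

Unlike ORDER BY ... LIMIT 1, this keeps both tied students. Note that every derived table carries an alias (t and m), which MySQL requires.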