Join Predicates And Comparison Operators - mysql

While trying to better understand how SQL joins work, I've run into difficulty understanding the result set returned by a query. The tables I'm using are:
Employees
Id Name Salary Gender City
1 Sam 2500 Male London
5 Todd 3100 Male Toronto
3 John 4500 Male New York
6 Jack 7000 Male Shangri La
4 Sara 5500 Female Tokyo
2 Pam 6500 Female Sydney
and Gender:
ID Gender
1 Male
2 Female
3 Unknown
The first query returns all the columns from an inner join on the Gender column -
SELECT *
FROM Employees
INNER JOIN Gender
ON Employees.Gender = Gender.Gender
The returned result -
Id Name Salary Gender City ID Gender
1 Sam 2500 Male London 1 Male
5 Todd 3100 Male Toronto 1 Male
3 John 4500 Male New York 1 Male
6 Jack 7000 Male Shangri La 1 Male
4 Sara 5500 Female Tokyo 2 Female
2 Pam 6500 Female Sydney 2 Female
Which is pretty much what I expected. However when I changed the comparison operator -
SELECT *
FROM Employees
INNER JOIN Gender
ON Employees.Gender != Gender.Gender
What I originally thought would return an empty set, returned this -
Id Name Salary Gender City ID Gender
4 Sara 5500 Female Tokyo 1 Male
2 Pam 6500 Female Sydney 1 Male
1 Sam 2500 Male London 2 Female
5 Todd 3100 Male Toronto 2 Female
3 John 4500 Male New York 2 Female
6 Jack 7000 Male Shangri La 2 Female
1 Sam 2500 Male London 3 Unknown
5 Todd 3100 Male Toronto 3 Unknown
3 John 4500 Male New York 3 Unknown
6 Jack 7000 Male Shangri La 3 Unknown
4 Sara 5500 Female Tokyo 3 Unknown
2 Pam 6500 Female Sydney 3 Unknown
While I can kind of see how the not-equals (!=) operator would return this result, it raises the question of which comparisons are useful in join predicates and which aren't. Does the type of join (inner, right, left, ...) adversely affect the returned result, or can the join type and the comparison be combined into predictable behavior? In other words, does the comparison always have to be equality (=)? Also, if there are any sources out there that could help me, that would be great. Thanks.

An inner join is a subset of the Cartesian product of both tables. In the Cartesian product, every row in one table is paired with every row of the other table.
So, if one table has n rows and the other m rows, then the Cartesian product has n * m rows.
The ON clause filters these rows. Equality is the most common filter, and such joins are often called equi-joins. They offer the most optimization opportunities and are typically the most efficient.
(Outer joins are similar but have a mechanism to include unmatched rows.)
Normally, join predicates contain at least one equality comparison (although this is not necessary). Other comparisons -- including subqueries using exists/in -- are allowed and often useful.
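To make the row counts concrete, here is a minimal sketch that loads the two tables from the question into an in-memory SQLite database (standing in for MySQL, which is an assumption of this example) and counts the rows produced by a cross join, the equality join and the != join. The helper name count_rows is made up for illustration.

```python
import sqlite3

# SQLite in memory stands in for MySQL; the join semantics shown are standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (Id INTEGER, Name TEXT, Salary INTEGER, Gender TEXT, City TEXT);
    CREATE TABLE Gender (ID INTEGER, Gender TEXT);
    INSERT INTO Employees VALUES
        (1, 'Sam',  2500, 'Male',   'London'),
        (5, 'Todd', 3100, 'Male',   'Toronto'),
        (3, 'John', 4500, 'Male',   'New York'),
        (6, 'Jack', 7000, 'Male',   'Shangri La'),
        (4, 'Sara', 5500, 'Female', 'Tokyo'),
        (2, 'Pam',  6500, 'Female', 'Sydney');
    INSERT INTO Gender VALUES (1, 'Male'), (2, 'Female'), (3, 'Unknown');
""")


def count_rows(join_sql):
    """Count the rows produced by joining Employees to Gender with the given join clause."""
    return conn.execute("SELECT COUNT(*) FROM Employees e " + join_sql).fetchone()[0]


cross = count_rows("CROSS JOIN Gender g")                                # every pairing: 6 * 3
equal = count_rows("INNER JOIN Gender g ON e.Gender = g.Gender")         # pairings that match
not_equal = count_rows("INNER JOIN Gender g ON e.Gender != g.Gender")    # pairings that do not match

print(cross, equal, not_equal)  # 18 6 12
```

Because neither Gender column contains NULL here, the two filters split the Cartesian product exactly: 6 matching pairs plus 12 non-matching pairs make up all 18.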
Any decent documentation or tutorial or book should be able to explain this.
In a case like yours, NOT EXISTS is often what is actually intended. To find employees whose gender is not in the reference table:
SELECT e.*
FROM Employees e
WHERE NOT EXISTS (SELECT 1 FROM Gender g WHERE e.Gender = g.Gender)
Of course, such a query would be unnecessary if you used the primary key to reference the table and included proper foreign key declarations.
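As a rough illustration of the difference, the sketch below runs the NOT EXISTS query next to the != join on a cut-down copy of the tables in SQLite (used here in place of MySQL). The employee Alex with gender 'Other' is hypothetical and was added only so that the NOT EXISTS query has something to return; with the original data it returns no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (Id INTEGER, Name TEXT, Gender TEXT);
    CREATE TABLE Gender (ID INTEGER, Gender TEXT);
    -- Sam and Sara come from the question; Alex is a hypothetical row for illustration.
    INSERT INTO Employees VALUES (1, 'Sam', 'Male'), (4, 'Sara', 'Female'), (7, 'Alex', 'Other');
    INSERT INTO Gender VALUES (1, 'Male'), (2, 'Female'), (3, 'Unknown');
""")

# Anti-join: employees whose gender has no matching row in the reference table.
unmatched = conn.execute("""
    SELECT e.Name
    FROM Employees e
    WHERE NOT EXISTS (SELECT 1 FROM Gender g WHERE e.Gender = g.Gender)
    ORDER BY e.Id
""").fetchall()

# The != join instead pairs each employee with every reference row that differs.
pairs = conn.execute("""
    SELECT e.Name, g.Gender
    FROM Employees e
    INNER JOIN Gender g ON e.Gender != g.Gender
    ORDER BY e.Id, g.ID
""").fetchall()

print(unmatched)   # [('Alex',)]
print(len(pairs))  # 7: Sam and Sara pair with 2 rows each, Alex with all 3
```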

Related

I am trying to return combined rows in the same table based on a key in a second table

I probably haven't explained this very well in the title but I have two tables. Here is a simple version.
channel_data
entry_id channel_id first_name last_name model other_fields
1 4 John Smith
2 4 Jane Doe
3 4 Bill Evans
4 15 235
5 15 765
6 15 543
7 15 723
8 15 354
9 15 976
10 1 xxx
11 2 yyy
12 3 123
channel_titles
entry_id author_id channel_id
1 101 4
2 102 4
3 103 4
4 101 15
5 101 15
6 101 15
7 102 15
8 102 15
9 103 15
10 101 1
11 102 2
12 103 3
I am not able to re-model the data unfortunately.
I need to list all the rows with a channel_id 15 from channel_data and beside them the first_name and last_name which has the same author_id from channel_titles.
What I want to return is this:
Model First Name Last Name
---------------------------------
235 John Smith
765 John Smith
543 John Smith
723 Jane Doe
354 Jane Doe
976 Bill Evans
If Model was in one table and Names were in another this would be much simpler but I'm not sure how to go about this when they are in the same table.
========================================
Edited to clarify.
I need to get each model with a channel_id 15 from channel_data
For each model I need to look up the entry_id in channel_titles to find the author_id
I need to find the row with that author_id AND channel_id 4 in channel titles (each row with channel_id 4 has a unique author_id).
I need to take the entry_id of this row back to channel_data and get the first_name and last_name to go with the model.
I am well aware that the data is not structured well but that is what I have to work with. I am trying to accomplish a very small task in a much larger system, remodelling the data is not an option at this point.
I think sub-queries might be what I am looking for but this is not my area at all usually.
Ok, that is convoluted. However, based on your description, this query should give you the results you want. The WHERE and JOIN descriptions follow the logic you have described in your question.
SELECT cd1.model, cd2.first_name, cd2.last_name
FROM channel_data cd1
JOIN channel_titles ct1 ON ct1.entry_id = cd1.entry_id
JOIN channel_titles ct2 ON ct2.channel_id = 4 AND ct2.author_id = ct1.author_id
JOIN channel_data cd2 ON cd2.entry_id = ct2.entry_id
WHERE cd1.channel_id = 15
ORDER BY cd1.entry_id
Output:
model first_name last_name
235 John Smith
765 John Smith
543 John Smith
723 Jane Doe
354 Jane Doe
976 Bill Evans
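The query can be checked against the sample data with a small script. This is a sketch using an in-memory SQLite database rather than MySQL; rows 10 to 12 of channel_data are left out because their column placement is unclear in the question and they do not affect this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE channel_data (entry_id INTEGER, channel_id INTEGER,
                               first_name TEXT, last_name TEXT, model INTEGER);
    CREATE TABLE channel_titles (entry_id INTEGER, author_id INTEGER, channel_id INTEGER);
    INSERT INTO channel_data VALUES
        (1, 4, 'John', 'Smith', NULL), (2, 4, 'Jane', 'Doe', NULL), (3, 4, 'Bill', 'Evans', NULL),
        (4, 15, NULL, NULL, 235), (5, 15, NULL, NULL, 765), (6, 15, NULL, NULL, 543),
        (7, 15, NULL, NULL, 723), (8, 15, NULL, NULL, 354), (9, 15, NULL, NULL, 976);
    INSERT INTO channel_titles VALUES
        (1, 101, 4), (2, 102, 4), (3, 103, 4),
        (4, 101, 15), (5, 101, 15), (6, 101, 15), (7, 102, 15), (8, 102, 15), (9, 103, 15),
        (10, 101, 1), (11, 102, 2), (12, 103, 3);
""")

rows = conn.execute("""
    SELECT cd1.model, cd2.first_name, cd2.last_name
    FROM channel_data cd1
    JOIN channel_titles ct1 ON ct1.entry_id = cd1.entry_id             -- model row -> author_id
    JOIN channel_titles ct2 ON ct2.channel_id = 4
                           AND ct2.author_id = ct1.author_id           -- author's channel 4 row
    JOIN channel_data cd2 ON cd2.entry_id = ct2.entry_id               -- that row's names
    WHERE cd1.channel_id = 15
    ORDER BY cd1.entry_id
""").fetchall()

for row in rows:
    print(row)
```

Each alias plays one role: cd1 holds the model, ct1 maps it to an author, ct2 finds that author's channel 4 entry, and cd2 supplies the name stored on that entry.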

prediction on rowwise data or progressive data

I am working on employee attrition analysis with a table that has one row per employee (Id, name, Date_Join, Date_Relieving, Dept, Role, etc.):
eID eName Joining Relieving Dept Married Experience
123 John Doe 10Oct15 12Oct16 HR No 12
234 Jen Doee 01jan16 -NA- HR No 11 (ie she is available)
I can run a regression on this data to find the beta coefficients.
But I've seen another approach too, where each employee has multiple entries, one for each month between the joining date and the current month or relieving month (say Employee A joined in January and left in December, so he'll have 12 entries), with columns such as experience and marital status updated in each entry:
eID eName Dept Married Experience
123 John Doe HR No 0
123 John Doe HR No 1
123 John Doe HR Yes 2
123 John Doe HR Yes 3
Can someone explain what differentiates the two approaches, and what the outcome of the second approach is?
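For reference, the expansion from one row per employee to one row per month can be sketched as follows. The function name expand_to_monthly_rows and its parameters months_observed and married_from are hypothetical, and the sketch assumes that marital status is the only column that changes over time.

```python
def expand_to_monthly_rows(employee, months_observed, married_from=None):
    """Turn one employee record into one row per month of service.

    months_observed: number of monthly rows to create (hypothetical parameter).
    married_from: month index from which Married becomes "Yes", or None if never.
    """
    rows = []
    for month in range(months_observed):
        married = married_from is not None and month >= married_from
        rows.append({
            "eID": employee["eID"],
            "eName": employee["eName"],
            "Dept": employee["Dept"],
            "Married": "Yes" if married else "No",
            "Experience": month,  # months of experience at the start of this row's month
        })
    return rows


john = {"eID": 123, "eName": "John Doe", "Dept": "HR"}
monthly_rows = expand_to_monthly_rows(john, months_observed=4, married_from=2)
for row in monthly_rows:
    print(row["eID"], row["eName"], row["Dept"], row["Married"], row["Experience"])
```

With these arguments the sketch reproduces the four example rows shown above for John Doe.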

Relational Algebra: Natural join with NULL value

Table 1
Customer id city
John 1 LA
Nancy 2 NULL
Table 2
Customer $ in the pocket
John 20
Nancy 30
I am wondering what happens if Table 1 is natural joined with Table 2. My guess is that the result would have 4 attributes and both John and Nancy would appear.
But my friend told me that only John will appear, Nancy won't because there is a null value.
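In the case above, your friend is wrong and you are right! The natural join compares only the common attribute (Customer), and the NULL is in city, which takes no part in the comparison.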
Let's see a case where it would be otherwise:
Table 'Customer'
Id Name AccNo
1 John 44
2 Nancy NULL
Table 'Account'
AccNo $_in_Pocket
44 20
45 30
Here, with a natural join, we would get all attributes for John, but Nancy would be missing from the results because her NULL AccNo does not compare equal to any AccNo in 'Account'.
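Both cases can be tried out in a SQL engine. The sketch below uses SQLite's NATURAL JOIN as a stand-in for the relational algebra operator; the column name pocket replaces "$ in the pocket" and is an assumption made to keep the identifiers simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Case 1: the only common attribute is Customer; the NULL sits in city.
    CREATE TABLE Table1 (Customer TEXT, id INTEGER, city TEXT);
    CREATE TABLE Table2 (Customer TEXT, pocket INTEGER);
    INSERT INTO Table1 VALUES ('John', 1, 'LA'), ('Nancy', 2, NULL);
    INSERT INTO Table2 VALUES ('John', 20), ('Nancy', 30);

    -- Case 2: the common attribute is AccNo, and Nancy's AccNo is NULL.
    CREATE TABLE Customer (Id INTEGER, Name TEXT, AccNo INTEGER);
    CREATE TABLE Account (AccNo INTEGER, pocket INTEGER);
    INSERT INTO Customer VALUES (1, 'John', 44), (2, 'Nancy', NULL);
    INSERT INTO Account VALUES (44, 20), (45, 30);
""")

first_case = conn.execute(
    "SELECT * FROM Table1 NATURAL JOIN Table2 ORDER BY id"
).fetchall()
second_case = conn.execute(
    "SELECT * FROM Customer NATURAL JOIN Account ORDER BY Id"
).fetchall()

print(first_case)   # both John and Nancy, four attributes each
print(second_case)  # only John
```

The NULL only removes a row when it appears in an attribute that the join actually compares.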

How does the NOT IN logical operator work to generate this output?

I have the following tables and data:
course:
course# ctitle units
--------- ------------------- -----
ACCT 201 Financial Account 3
CHEM 356 Organic Chemistry 4
HIST 101 US History 5
MINS 235 Database Design 4
MINS 301 Intro to Business IS 3
MINS 350 Systems Analysis 4
PHED 434 Advanced Gym 2
class:
class# course# sec# semyr
------ ---------- ----- -----
203 ACCT 201 03 F11
204 ACCT 201 04 F11
307 MINS 301 07 F11
418 MINS 235 04 F11
438 MINS 350 01 F11
624 PHED 434 02 F11
student:
sid sname major
--- ---------- ----------
1 Bob MINS
2 Mary POMG
3 Joe MGMT
4 Sue MKTG
5 Jim ACCT
class_student:
class sid grade
----- ---- -----
203 2 B
203 5 D
204 1 C
204 4 C
307 1 B
307 2 B
307 4 A
418 1 A
418 2 B
418 5 C
438 1 B
438 4 C
634 5 F
grade:
grade grade_pts
----- ---------
A 4
B 3
C 2
D 1
F 0
When I run the following query:
SELECT *
FROM STUDENT
WHERE SID NOT IN
( SELECT SID
FROM CLASS_STUDENT
WHERE GRADE IN ('A' , 'B')
)
ORDER BY SID;
I think Oracle would generate this output.
sid sname major
--- ----- -----
1 bob mins
4 sue mktg
5 jim acct
5 jim acct
5 jim acct
I would like to understand how NOT IN logical operator works. How does the NOT IN operator work in the above query to generate the output?
It returns the students who have no A or B grade in any class.
NOT IN filters out the rows whose value appears in the list that follows it.
In words:
select all students except the ones that have an A or B grade
SELECT SID FROM CLASS_STUDENT WHERE GRADE IN ( 'A' , 'B' )
selects the sid of every student with an A or B grade,
which results in this list of sid values:
SID
2
1
2
4
1
2
1
then
SELECT * FROM STUDENT
WHERE SID NOT IN ...
will result in:
SID sname Major
3 joe mgmt
5 jim acct
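The two steps above can be reproduced with a short script. This is a sketch that loads the student and class_student data from the question into an in-memory SQLite database rather than Oracle; the variable names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid INTEGER, sname TEXT, major TEXT);
    CREATE TABLE class_student (class INTEGER, sid INTEGER, grade TEXT);
    INSERT INTO student VALUES
        (1, 'Bob', 'MINS'), (2, 'Mary', 'POMG'), (3, 'Joe', 'MGMT'),
        (4, 'Sue', 'MKTG'), (5, 'Jim', 'ACCT');
    INSERT INTO class_student VALUES
        (203, 2, 'B'), (203, 5, 'D'), (204, 1, 'C'), (204, 4, 'C'),
        (307, 1, 'B'), (307, 2, 'B'), (307, 4, 'A'), (418, 1, 'A'),
        (418, 2, 'B'), (418, 5, 'C'), (438, 1, 'B'), (438, 4, 'C'), (634, 5, 'F');
""")

# Step 1: the subquery on its own, one sid per A or B grade (duplicates included).
ab_sids = [row[0] for row in conn.execute(
    "SELECT sid FROM class_student WHERE grade IN ('A', 'B')"
)]

# Step 2: the full query keeps only students whose sid is absent from that list.
remaining = conn.execute("""
    SELECT sid, sname, major
    FROM student
    WHERE sid NOT IN (SELECT sid FROM class_student WHERE grade IN ('A', 'B'))
    ORDER BY sid
""").fetchall()

print(sorted(set(ab_sids)))  # [1, 2, 4]
print(remaining)             # [(3, 'Joe', 'MGMT'), (5, 'Jim', 'ACCT')]
```

Joe appears in the result even though he has no grades at all, because his sid is simply not in the list.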
The subquery (SELECT SID FROM CLASS_STUDENT WHERE GRADE IN ( 'A' , 'B' )) selects the SIDs of all students that have an A or B grade in at least one class. When used with an IN operator, the list is implicitly deduplicated. It appears in your data that students 1, 2, and 4 all have an A or B in at least one class, so would be included in the result set of this subquery.
Then, the full query is simply selecting all rows from STUDENT that are not included in the list returned by the subquery. So I think you are getting two rows, with SIDs 3 and 5.
Your "expected result" makes little sense. There is no reason to expect multiple rows for the same student when your query is selecting all students then filtering some out.
What your query is doing is showing students who do not have an A or B grade in any class. I suspect that what you want is to show each student-class combination for which the grade is worse than a B (that seems consistent with your expected result). To do this, I would suggest driving the query off of the CLASS_STUDENT table, and joining to STUDENT to get the student information (and perhaps to CLASS and COURSE to get the name of the course, if desired).
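One possible shape for that suggestion is sketched below, again using an in-memory SQLite database in place of Oracle. Treating "worse than a B" as "grade not A or B" is an assumption of this sketch; the grade table's grade_pts column could be used for the comparison instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid INTEGER, sname TEXT, major TEXT);
    CREATE TABLE class_student (class INTEGER, sid INTEGER, grade TEXT);
    INSERT INTO student VALUES
        (1, 'Bob', 'MINS'), (2, 'Mary', 'POMG'), (3, 'Joe', 'MGMT'),
        (4, 'Sue', 'MKTG'), (5, 'Jim', 'ACCT');
    INSERT INTO class_student VALUES
        (203, 2, 'B'), (203, 5, 'D'), (204, 1, 'C'), (204, 4, 'C'),
        (307, 1, 'B'), (307, 2, 'B'), (307, 4, 'A'), (418, 1, 'A'),
        (418, 2, 'B'), (418, 5, 'C'), (438, 1, 'B'), (438, 4, 'C'), (634, 5, 'F');
""")

# Drive the query from class_student so each low grade yields its own row,
# then join to student for the name and major.
low_grades = conn.execute("""
    SELECT s.sid, s.sname, s.major, cs.class, cs.grade
    FROM class_student cs
    JOIN student s ON s.sid = cs.sid
    WHERE cs.grade NOT IN ('A', 'B')
    ORDER BY s.sid, cs.class
""").fetchall()

for row in low_grades:
    print(row)
```

Here a student can appear several times, once per class with a grade below B, which is the kind of repetition the expected output in the question shows for student 5.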

How do I count rows in MySQL?

I know just about the basic usage of COUNT(*) and I wonder if I can use it or some other function to get the following result.
I have a table with people and the products they have purchased (product_code). I have a second table which maps each product_code to a single product_category.
Using a simple SELECT I can combine both tables to get:
first last product_code product_category
John BGood 100 Food
John BGood 29 Beverage
John BGood 30 Beverage
Rita Black 25 Fashion
Betty Rock 36 Electronics
Betty Rock 72 Food
Betty Rock 100 Food
Betty Rock 36 Electronics
But what I would like is to count, for each person, the number of products they purchased from each category. product_category is an enum with 5 possible values (the four above and Other). I would like to get a table like:
first last product_category count
John BGood Food 1
John BGood Beverage 2
John BGood Fashion 0
John BGood Electronics 0
John BGood Other 0
Betty ...
SELECT first, last, product_category, COUNT(product_code)
FROM <table>
GROUP BY first, last, product_category
ORDER BY last, first
Try this query
SELECT first, last, product_category, count(product_category)
FROM <table_name>
GROUP BY first, last, product_category
Append GROUP BY person_id, product_category to your SELECT.
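A plain GROUP BY only returns the person and category combinations that actually occur, so the rows with a count of 0 from the desired output would be missing. One way to get them, sketched here in SQLite rather than MySQL, is to pair every person with every category first and then LEFT JOIN the purchases. The tables purchases and categories and the columns first_name and last_name are hypothetical names chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- purchases holds the combined result shown in the question (hypothetical table name).
    CREATE TABLE purchases (first_name TEXT, last_name TEXT,
                            product_code INTEGER, product_category TEXT);
    INSERT INTO purchases VALUES
        ('John', 'BGood', 100, 'Food'), ('John', 'BGood', 29, 'Beverage'),
        ('John', 'BGood', 30, 'Beverage'), ('Rita', 'Black', 25, 'Fashion'),
        ('Betty', 'Rock', 36, 'Electronics'), ('Betty', 'Rock', 72, 'Food'),
        ('Betty', 'Rock', 100, 'Food'), ('Betty', 'Rock', 36, 'Electronics');
    -- categories lists the five enum values (hypothetical helper table).
    CREATE TABLE categories (product_category TEXT);
    INSERT INTO categories VALUES
        ('Food'), ('Beverage'), ('Fashion'), ('Electronics'), ('Other');
""")

rows = conn.execute("""
    SELECT p.first_name, p.last_name, c.product_category,
           COUNT(x.product_code) AS purchase_count   -- counts only matched purchases
    FROM (SELECT DISTINCT first_name, last_name FROM purchases) p
    CROSS JOIN categories c
    LEFT JOIN purchases x
           ON x.first_name = p.first_name
          AND x.last_name = p.last_name
          AND x.product_category = c.product_category
    GROUP BY p.first_name, p.last_name, c.product_category
    ORDER BY p.last_name, p.first_name, c.product_category
""").fetchall()

counts = {(first, last, category): n for first, last, category, n in rows}
print(counts[("John", "BGood", "Beverage")], counts[("John", "BGood", "Fashion")])  # 2 0
```

COUNT(x.product_code) ignores the NULLs produced by the LEFT JOIN, which is what turns an unmatched person and category pair into a count of 0.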