Converting a Join Query into a Sub Query - mysql

I'm attempting to write a sub query that would produce the same results as the join query shown below.
SELECT Department_to_major.DNAME
FROM Department_to_major
INNER JOIN Course
ON Department_to_major.Dcode = Course.OFFERING_DEPT
WHERE Course.COURSE_NAME LIKE '%INTRO%'
GROUP BY Department_to_major.DNAME
However each attempt has produced errors.
Is there a way to write this as a sub query?

You can use the query below:
SELECT DNAME FROM Department_to_major WHERE
Dcode IN (SELECT OFFERING_DEPT FROM Course
WHERE COURSE_NAME LIKE '%INTRO%')
You have used a GROUP BY clause, but there is no aggregate function in the query. Does your query work correctly?

Here is a way to use a subquery:
SELECT DISTINCT dm.DNAME
FROM Department_to_major dm
WHERE EXISTS (SELECT 1
FROM Course c
WHERE dm.Dcode = c.OFFERING_DEPT AND
c.COURSE_NAME LIKE '%INTRO%'
);
I assume the GROUP BY is to prevent duplicates in the output; SELECT DISTINCT does the same thing.
That said, storing the department code and name in Department_to_major is not a good data structure, because the department name is (presumably) repeated multiple times. I would expect you to have just a Departments table, with one row per department.
Then the query would look like:
SELECT d.DNAME
FROM Departments d
WHERE EXISTS (SELECT 1
FROM Course c
WHERE d.Dcode = c.OFFERING_DEPT AND
c.COURSE_NAME LIKE '%INTRO%'
);
And the SELECT DISTINCT/GROUP BY is unnecessary.

Try the below query. I am assuming that you have used "GROUP BY" clause to make DNAME field unique.
SELECT DISTINCT DNAME
FROM Department_to_major
WHERE Dcode IN (SELECT OFFERING_DEPT
FROM Course
WHERE COURSE_NAME LIKE '%INTRO%');
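The queries in this thread can be compared side by side on a small data set. The following is only a minimal sketch: it uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the MAJOR column and all rows are made-up assumptions, since the real table definitions were not posted.

```python
import sqlite3

# Hypothetical sample data; the MAJOR column and every row are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department_to_major (Dcode TEXT, DNAME TEXT, MAJOR TEXT);
CREATE TABLE Course (COURSE_NAME TEXT, OFFERING_DEPT TEXT);
INSERT INTO Department_to_major VALUES
  ('CS', 'Computer Science', 'Software'),
  ('CS', 'Computer Science', 'Data'),
  ('MA', 'Mathematics', 'Pure'),
  ('HI', 'History', 'Modern');
INSERT INTO Course VALUES
  ('INTRO TO PROGRAMMING', 'CS'),
  ('INTRO TO DATABASES', 'CS'),
  ('INTRO TO CALCULUS', 'MA'),
  ('MEDIEVAL EUROPE', 'HI');
""")

JOIN_QUERY = """
SELECT Department_to_major.DNAME
FROM Department_to_major
INNER JOIN Course ON Department_to_major.Dcode = Course.OFFERING_DEPT
WHERE Course.COURSE_NAME LIKE '%INTRO%'
GROUP BY Department_to_major.DNAME
"""

IN_QUERY = """
SELECT DNAME
FROM Department_to_major
WHERE Dcode IN (SELECT OFFERING_DEPT FROM Course
                WHERE COURSE_NAME LIKE '%INTRO%')
"""

EXISTS_QUERY = """
SELECT DISTINCT dm.DNAME
FROM Department_to_major dm
WHERE EXISTS (SELECT 1 FROM Course c
              WHERE dm.Dcode = c.OFFERING_DEPT
                AND c.COURSE_NAME LIKE '%INTRO%')
"""

def names(sql):
    """Run a query and return the DNAME values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

print(names(JOIN_QUERY))    # one row per department name
print(names(IN_QUERY))      # one row per matching Department_to_major row
print(names(EXISTS_QUERY))  # DISTINCT collapses repeated names
```

With this sample data the IN version without DISTINCT repeats a department name once per Department_to_major row, which is why the answers that add DISTINCT match the original GROUP BY output.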

Related

Can and should you loop over query returns from sub-query?

I have a query that returns the employee ID of employees that have a specific skill at a company.
SELECT competences.employee_id
FROM competences
WHERE service_id = 2
Returns
1
2
4
7
I now want to use this query's result to find the names of these employees
in the employee table. I've tried this, which of course didn't work, and I can't figure out how to do it.
SELECT employee.first_name, employee.last_name
FROM employee
WHERE employee.employee_id =
(SELECT competences.employee_id
FROM competences
WHERE service_id = 2)
How do I use the sub-query to get the employees?
This is a job for JOIN. You should read about JOIN; many tutorials are available.
Try something like this:
SELECT DISTINCT employee.first_name, employee.last_name
FROM employee
JOIN competences ON employee.employee_id = competences.employee_id
WHERE competences.service_id = 2
Using IN yields the same results as the JOIN. However, older MySQL versions (before 5.6) tend to execute it as a so-called dependent subquery, which can be very bad for performance when your tables get large. DISTINCT removes any possible duplicate employee names. If we knew the names and meanings of the columns we could offer more specific advice for generating the most useful query.
You can simply use the IN instead of "="
SELECT first_name, last_name
FROM employee
WHERE employee_id IN (SELECT employee_id
FROM competences
WHERE service_id = 2)
Use IN instead of the equals sign:
SELECT employee.first_name, employee.last_name
FROM employee
WHERE employee.employee_id in
(SELECT competences.employee_id
FROM competences
WHERE service_id = 2)
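To see that the IN and JOIN versions agree, here is a small sketch. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, and the names, the extra level column, and the rows are hypothetical. One behaviour is not reproduced here: MySQL rejects the original "=" comparison with error 1242 ("Subquery returns more than 1 row"), whereas SQLite silently compares against the first row only.

```python
import sqlite3

# Hypothetical data; employees 1, 2, 4 and 7 have service_id 2 as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE competences (employee_id INTEGER, service_id INTEGER, level TEXT);
INSERT INTO employee VALUES
  (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox'),
  (4, 'Di', 'Orr'), (7, 'Ed', 'Kim');
INSERT INTO competences VALUES
  (1, 2, 'basic'), (1, 2, 'advanced'), (2, 2, 'basic'),
  (3, 1, 'basic'), (4, 2, 'basic'), (7, 2, 'basic');
""")

IN_QUERY = """
SELECT first_name, last_name
FROM employee
WHERE employee_id IN (SELECT employee_id FROM competences WHERE service_id = 2)
"""

JOIN_QUERY = """
SELECT employee.first_name, employee.last_name
FROM employee
JOIN competences ON employee.employee_id = competences.employee_id
WHERE competences.service_id = 2
"""

JOIN_DISTINCT_QUERY = JOIN_QUERY.replace("SELECT", "SELECT DISTINCT", 1)

def run(sql):
    """Return the result rows of a query, sorted for easy comparison."""
    return sorted(conn.execute(sql).fetchall())

print(run(IN_QUERY))             # each matching employee once
print(run(JOIN_QUERY))           # Ann appears twice: two competence rows match
print(run(JOIN_DISTINCT_QUERY))  # DISTINCT removes the repeat
```

The plain JOIN repeats an employee for every matching competences row, so DISTINCT is what makes it line up with the IN version.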

Which of these is the better query, and why?

I have to get the names of the Departments and the number of Employees in it. Test is my schema.
So I came up with two queries that give me the same result:
First
SELECT Department.Departmentname,
(
SELECT COUNT(*)
FROM test.Employee
WHERE Employee.Departmentid = Department.idDepartment
) AS NumberOfEmployees
FROM test.Department;
Second
SELECT Department.Departmentname AS NAme,COUNT(Employee.idEmployee) AS Employee_COUNT
FROM test.Department
LEFT JOIN test.Employee
ON Employee.Departmentid = Department.idDepartment
GROUP BY Employee.Departmentid ;
Which of the two is the best and efficient way to get the required result? Any other solution is welcome.
Please explain why a particular solution is better
My preference for expressing the logic is the second query, which I would write as:
SELECT d.Departmentname AS Name, COUNT(e.idEmployee) AS Employee_COUNT
FROM test.Department d LEFT JOIN
test.Employee e
ON e.Departmentid = d.idDepartment
GROUP BY d.Departmentname;
Note the use of table aliases and the fact that the GROUP BY uses the same columns as the SELECT. However, in MySQL, this query will not use an index on DepartmentName for the group by. That means that the GROUP BY is doing a file sort, a relatively expensive operation.
When you write the query like this:
SELECT d.Departmentname,
(SELECT COUNT(*)
FROM test.Employee e
WHERE e.Departmentid = d.idDepartment
) AS NumberOfEmployees
FROM test.Department d;
No explicit group by is needed. With an index on Employee(DepartmentId) this will use the index for the count(*), so this version would normally perform better in MySQL.
The difference in performance is probably negligible until you start having thousands or tens of thousands of rows.
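The two formulations from this answer can be checked against each other on a tiny data set. This is a sketch under assumptions: SQLite through Python's sqlite3 stands in for MySQL, the test. schema prefix is dropped, and the rows and the index name are hypothetical. It says nothing about MySQL's execution plans; it only confirms that both queries return the same counts, including a zero for a department with no employees.

```python
import sqlite3

# Hypothetical sample rows; 'Legal' deliberately has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (idDepartment INTEGER PRIMARY KEY, Departmentname TEXT);
CREATE TABLE Employee (idEmployee INTEGER PRIMARY KEY, Departmentid INTEGER);
CREATE INDEX idx_employee_department ON Employee (Departmentid);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering'), (3, 'Legal');
INSERT INTO Employee VALUES (10, 2), (11, 2), (12, 2), (13, 1);
""")

JOIN_QUERY = """
SELECT d.Departmentname AS Name, COUNT(e.idEmployee) AS Employee_COUNT
FROM Department d
LEFT JOIN Employee e ON e.Departmentid = d.idDepartment
GROUP BY d.Departmentname
"""

SUBQUERY_QUERY = """
SELECT d.Departmentname,
       (SELECT COUNT(*)
        FROM Employee e
        WHERE e.Departmentid = d.idDepartment) AS NumberOfEmployees
FROM Department d
"""

def run(sql):
    """Return the result rows of a query, sorted for easy comparison."""
    return sorted(conn.execute(sql).fetchall())

print(run(JOIN_QUERY))
print(run(SUBQUERY_QUERY))
```

COUNT(e.idEmployee) skips the NULL produced by the LEFT JOIN for an empty department, which is why it reports 0 there just as the correlated COUNT(*) does.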

Nested SQL Query, what is actually occurring at each nesting?

I was wondering how this query works:
SELECT empname FROM Employee WHERE not exists (
SELECT projid FROM Project WHERE not exists (
SELECT empid, projid FROM Assigned WHERE empid = Employee.empid and projid = Project.projid
)
)
It is supposed to return the names of all employees who are assigned to every project, and it does work. However, I am confused as to how and why it works correctly.
Schema is:
Employee(empID INT,empName VARCHAR(100),job VARCHAR(100),deptID INT,salary INT);
Assigned(empID INT,projID INT,role VARCHAR(100));
Project(projID INT,title VARCHAR(100),budget INT,funds INT);
I am new to SQL so a detailed/simple explanation would be appreciated.
When I need to try to understand what's going on, I look for the inner-most query and work my way outwards. In your case, let's start with:
SELECT empid, projid
FROM Assigned
WHERE empid = Employee.empid and projid = Project.projid
This matches all records in the Assigned table whose empid and projid equal those of the current rows of the outer queries (hence the Employee.empid and Project.projid references).
Assume there are 5 projects in the Project table and Employee1 is assigned to each. That would return 5 records. Also assume Employee2 is assigned to 1 of those projects, thus returning 1 record.
Next look at:
SELECT projid FROM Project WHERE not exists (
...
)
For the employee being checked, this selects every projid from the Project table that has no match (not exists) in the previous query, that is, every project the employee is not assigned to. In other words, Employee1 (assigned to all 5 projects) would return no projects from this query, but Employee2 (assigned to 1 project) would return 4 projects.
Finally, look at
SELECT empname FROM Employee WHERE not exists (
...
)
Just as with the 2nd query, this selects every employee from the Employee table for whom the previous query returns no records (again, not exists). The previous query returns nothing for an employee who is assigned to every project, such as Employee1, and one or more records for an employee who is missing some projects, such as Employee2. In other words, Employee1 would be returned, since no projects came back from the previous query, and Employee2 would not, since 1 or more projects came back.
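The walkthrough above can be replayed with the same numbers (5 projects, Employee1 on all of them, Employee2 on one). This is only a sketch: it uses SQLite via Python's sqlite3 instead of MySQL, trims the schema to the columns the query touches, and the project titles, roles, and the missing_projects helper are hypothetical.

```python
import sqlite3

# Hypothetical data matching the walkthrough: Employee1 has all 5 projects,
# Employee2 has only project 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (empID INT, empName VARCHAR(100));
CREATE TABLE Project (projID INT, title VARCHAR(100));
CREATE TABLE Assigned (empID INT, projID INT, role VARCHAR(100));
INSERT INTO Employee VALUES (1, 'Employee1'), (2, 'Employee2');
INSERT INTO Project VALUES (1, 'P1'), (2, 'P2'), (3, 'P3'), (4, 'P4'), (5, 'P5');
INSERT INTO Assigned VALUES
  (1, 1, 'dev'), (1, 2, 'dev'), (1, 3, 'dev'), (1, 4, 'dev'), (1, 5, 'dev'),
  (2, 3, 'dev');
""")

def missing_projects(empid):
    """Middle query for one employee: projects that have no Assigned row."""
    sql = """
    SELECT projid FROM Project WHERE NOT EXISTS (
        SELECT empid, projid FROM Assigned
        WHERE empid = ? AND projid = Project.projid)
    ORDER BY projid
    """
    return [row[0] for row in conn.execute(sql, (empid,))]

FULL_QUERY = """
SELECT empname FROM Employee WHERE NOT EXISTS (
    SELECT projid FROM Project WHERE NOT EXISTS (
        SELECT empid, projid FROM Assigned
        WHERE empid = Employee.empid AND projid = Project.projid))
"""

print(missing_projects(1))  # no missing projects for Employee1
print(missing_projects(2))  # the 4 projects Employee2 is not assigned to
print([row[0] for row in conn.execute(FULL_QUERY)])
```

An employee passes the outer NOT EXISTS exactly when the list of missing projects is empty, which is the "assigned to every project" condition.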
Hope this helps. Here's some additional information about EXISTS:
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
And from that article:
What kind of store is present in all cities?
SELECT DISTINCT store_type FROM stores WHERE NOT EXISTS (
SELECT * FROM cities WHERE NOT EXISTS (
SELECT * FROM cities_stores
WHERE cities_stores.city = cities.city AND cities_stores.store_type = stores.store_type));
The last example is a double-nested NOT EXISTS query. That is, it has
a NOT EXISTS clause within a NOT EXISTS clause. Formally, it answers
the question “does a city exist with a store that is not in Stores”?
But it is easier to say that a nested NOT EXISTS answers the question
“is x TRUE for all y?”
Good luck.
A NOT EXISTS (subquery) predicate will return TRUE when the resultset from the subquery has no rows. It will return FALSE when a matching row is found.
Essentially, the query is asking
for each row in Employee, check each row in the Project table to see whether the Assigned table has a row whose empid matches the Employee row and whose projid matches the Project row.
The row from Employee will be returned only if no Project row lacks such a match, that is, only if the employee is assigned to every project.
Note that the expressions in the SELECT list of the subquery are not important; all that is being checked is whether that subquery returns one (or more) rows or not. (Normally, we use a literal 1 in the SELECT list; that reminds us that what we are checking is whether a row is found or not.)
I would typically write that query in a style that looks like this:
SELECT e.empname
FROM Employee e
WHERE NOT EXISTS
( SELECT 1
FROM Project p
WHERE NOT EXISTS
( SELECT 1
FROM Assigned a
WHERE a.empid = e.empid
AND a.projid = p.projid
)
)
(And I read the "SELECT 1" as "select one row".)
The resultset from that query is essentially equivalent to the resultset from this (usually much less efficient) query, whose subquery lists the employees that are missing at least one project:
SELECT e.empname
FROM Employee e
WHERE e.empid NOT IN
( SELECT m.empid
FROM Employee m
CROSS
JOIN Project p
LEFT
JOIN Assigned a
ON a.empid = m.empid
AND a.projid = p.projid
WHERE a.empid IS NULL
AND m.empid IS NOT NULL
GROUP
BY m.empid
)
The NOT IN query can be a little easier to understand, because you can run that subquery and see that it returns something. (What can be kind of confusing about the NOT EXISTS subquery is that it doesn't matter what expressions are returned in the SELECT list; what matters is whether a row is returned or not.) There are some "gotchas" with the NOT IN subquery besides really bad performance; you need to be careful to ensure that the subquery does not return a NULL value, because then the NOT IN (NULL,...) will never return true.
An equivalent resultset can be returned using an anti-join pattern as well:
SELECT e.empname
FROM Employee e
LEFT
JOIN ( SELECT m.empid
FROM Employee m
CROSS
JOIN Project p
LEFT
JOIN Assigned a
ON a.empid = m.empid
AND a.projid = p.projid
WHERE a.empid IS NULL
GROUP
BY m.empid
) o
ON o.empid = e.empid
WHERE o.empid IS NULL
In that query, we are looking for "matches" on empid. The LEFT keyword tells MySQL to also return any rows from Employee (the table on the left side of the JOIN) which do not have a match. For those rows, a NULL value is returned in place of the values of the columns that would have been returned if there had been a matching row. The "trick" is then to throw out all the rows that matched. We do that by checking for a NULL in a column that would not be NULL if there had been a match.
If I were going to write this query using a NOT EXISTS predicate, I would probably actually favor writing it like this:
SELECT e.empname
FROM Employee e
WHERE NOT EXISTS
( SELECT 1
FROM Project p
LEFT
JOIN Assigned a
ON a.projid = p.projid
AND a.empid = e.empid
WHERE a.projid IS NULL
)

Using aggregate functions in SQL query

I have my Table structure like this ::
ATT_Table : Fields - Act_ID, Assigned_To_ID, Percent_Complete(Integer value)
Act_ID is primary key, Assigned_To_ID is referenced to Emp_ID in Employee_Table.
Employee_Table : Fields - Emp_ID, F_Name.
Emp_ID is primary key.
Now at a particular point in time, 1 or more activities can be assigned to same person. My goal is write a query to calculate a person's load. I want to count the number of activities assigned to a particular person (can be more than 1) then take the average of their percent_Complete.
For example if person A is assigned A1, A2, A3(Act_ID). Then corresponding (Percent_Complete values addition)/3. Basically an average. In my final query result I want:
Name, Number of activities assigned(Count), load value(Avg).
How do I do this? Do I have to use a nested WHERE IN clause? Thanks.
SELECT MIN(F_Name) Employee_Name ,
COUNT(1) Activities_Assigned ,
AVG(Percent_Complete) Load_Value
FROM ATT_Table a
INNER JOIN Employee_Table e ON a.Assigned_To_ID = e.Emp_ID
GROUP BY e.Emp_ID
I may be missing some nuance, but it sounds like you can just: join the tables, group by employee, COUNT and AVG for the load.
Try the following:
select Emp_ID, F_Name, count(Act_ID), avg(Percent_Complete)
from ATT_Table, Employee_Table where ATT_Table.Assigned_To_ID = Employee_Table.Emp_ID
group by Emp_ID, F_Name
As Dmitri said, something like
SELECT Employee_Table.F_Name, COUNT(*) AS activities, AVG(Percent_Complete) AS load_value
FROM ATT_Table JOIN Employee_Table ON ATT_Table.Assigned_to_ID = Employee_Table.Emp_ID
WHERE Employee_Table.Emp_ID = 42
GROUP BY Employee_Table.Emp_ID
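The join, group, COUNT and AVG approach from the answers can be tried on a few rows. This is a minimal sketch assuming SQLite (through Python's sqlite3) in place of the asker's database; the employee names and activity values are hypothetical.

```python
import sqlite3

# Hypothetical rows: Ann has three activities, Bob has one, Cy has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee_Table (Emp_ID INTEGER PRIMARY KEY, F_Name TEXT);
CREATE TABLE ATT_Table (
  Act_ID INTEGER PRIMARY KEY,
  Assigned_To_ID INTEGER REFERENCES Employee_Table (Emp_ID),
  Percent_Complete INTEGER
);
INSERT INTO Employee_Table VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO ATT_Table VALUES (101, 1, 30), (102, 1, 60), (103, 1, 90), (104, 2, 50);
""")

LOAD_QUERY = """
SELECT e.F_Name,
       COUNT(*) AS Activities_Assigned,
       AVG(a.Percent_Complete) AS Load_Value
FROM ATT_Table a
INNER JOIN Employee_Table e ON a.Assigned_To_ID = e.Emp_ID
GROUP BY e.Emp_ID, e.F_Name
ORDER BY e.Emp_ID
"""

rows = conn.execute(LOAD_QUERY).fetchall()
for name, activities, load_value in rows:
    print(name, activities, load_value)
```

Because this is an INNER JOIN, an employee with no activities (Cy here) does not appear at all; a LEFT JOIN starting from Employee_Table would be needed to list such employees.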

SELECT * WHERE NOT EXISTS

I think I'm going down the right path with this one... Please bear with me as my SQL isn't the greatest
I'm trying to query a database to select everything from one table where certain cells don't exist in another. That much doesn't make a lot of sense, but I'm hoping this piece of code will:
SELECT * from employees WHERE NOT EXISTS (SELECT name FROM eotm_dyn)
So basically I have one table with a list of employees and their details. Then another table with some other details, including their name. Where their name is not in the eotm_dyn table, meaning there is no entry for them, I would like to see exactly who they are, or in other words, see what exactly is missing.
The above query returns nothing, but I know there are 20ish names missing so I've obviously not gotten it right.
Can anyone help?
You didn't join the table in your query.
Your original query will always return nothing unless there are no records at all in eotm_dyn, in which case it will return everything.
Assuming these tables should be joined on employeeID, use the following:
SELECT *
FROM employees e
WHERE NOT EXISTS
(
SELECT null
FROM eotm_dyn d
WHERE d.employeeID = e.id
)
You can join these tables with a LEFT JOIN keyword and filter out the NULLs, but this will likely be less efficient than using NOT EXISTS.
SELECT * FROM employees WHERE name NOT IN (SELECT name FROM eotm_dyn)
OR
SELECT * FROM employees WHERE NOT EXISTS (SELECT * FROM eotm_dyn WHERE eotm_dyn.name = employees.name)
OR
SELECT * FROM employees LEFT OUTER JOIN eotm_dyn ON eotm_dyn.name = employees.name WHERE eotm_dyn.name IS NULL
You can do a LEFT JOIN and assert the joined column is NULL.
Example:
SELECT * FROM employees a LEFT JOIN eotm_dyn b on (a.joinfield=b.joinfield) WHERE b.name IS NULL
SELECT * from employees
WHERE NOT EXISTS (SELECT name FROM eotm_dyn)
Never returns any records unless eotm_dyn is empty. You need some kind of criterion on SELECT name FROM eotm_dyn, like
SELECT * from employees
WHERE NOT EXISTS (
SELECT name FROM eotm_dyn WHERE eotm_dyn.employeeid = employees.employeeid
)
assuming that the two tables are linked by a foreign key relationship. At this point you could use a variety of other options including a LEFT JOIN. The optimizer will typically handle them the same in most cases, however.
You can also have a look at this related question. That user reported that using a join provided better performance than using a sub query.
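The difference between the uncorrelated and the correlated NOT EXISTS is easy to reproduce. The following sketch assumes SQLite via Python's sqlite3 rather than MySQL, and it assumes that the tables are linked through employees.id and eotm_dyn.employeeID; the rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: Bob and Di have no entry in eotm_dyn.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE eotm_dyn (employeeID INTEGER, name TEXT);
INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Di');
INSERT INTO eotm_dyn VALUES (1, 'Ann'), (3, 'Cy');
""")

# The subquery does not mention the outer row, so it is either empty for
# every employee or non-empty for every employee.
UNCORRELATED = """
SELECT name FROM employees
WHERE NOT EXISTS (SELECT name FROM eotm_dyn)
"""

# The subquery is evaluated per employee because it refers to e.id.
CORRELATED = """
SELECT e.name FROM employees e
WHERE NOT EXISTS (SELECT NULL FROM eotm_dyn d WHERE d.employeeID = e.id)
"""

# Anti-join: keep only the employees for which the LEFT JOIN found nothing.
LEFT_JOIN = """
SELECT e.name FROM employees e
LEFT JOIN eotm_dyn d ON d.employeeID = e.id
WHERE d.employeeID IS NULL
"""

def run(sql):
    """Return the names produced by a query as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

print(run(UNCORRELATED))  # empty, because eotm_dyn has rows
print(run(CORRELATED))    # the employees missing from eotm_dyn
print(run(LEFT_JOIN))     # same employees via the anti-join
```

Which of the correlated NOT EXISTS and the LEFT JOIN form runs faster depends on the server version and indexes, so it is worth checking with EXPLAIN on the real tables.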