I'm trying to query all the data from the MySQL Employees sample database. (structure and row counts)
I'm trying to do this in a way that preserves the relationships between all the tables, but I've never had to query a database this large, and honestly, I'm getting lost in joins and such. I would really appreciate any help.
Thanks!
Edit: SQL:
SELECT *
FROM
`employees`
LEFT JOIN `salaries`
ON `employees`.`emp_no` = `salaries`.`emp_no`
LEFT JOIN `dept_manager`
ON `employees`.`emp_no` = `dept_manager`.`emp_no`
LEFT JOIN `titles`
ON `employees`.`emp_no` = `titles`.`emp_no`,
JOIN
`departments` on `departments`
ON `departments`.`dept_no` = `dept_emp`.`dept_no`
LIMIT 50;
I'll try to explain joins as simply as possible.
Let's say we have to store some relational information: a person (name, surname) and a few e-mail addresses assigned to that person.
First, we save the person information in the table persons. We need a unique row identifier, personId, that lets us point to an exact person in the persons table.
table persons
personId, name, surname
1, John, Foo
2, Mike, Bar
With a table like this, we can store each person's e-mail addresses in another table. Here a unique row identifier is not strictly necessary, but it's good practice to keep one in every table. The personId field tells us which person owns each e-mail address.
table emails
emailId, personId, address
1, 1, john.foo#company.com
2, 1, john.foo#goomail.com
3, 2, mike.bar#company.com
4, 2, mike.bar#yaaho.com
Now we can select data with join.
SELECT persons.Name, persons.Surname, emails.address
FROM
persons
join
emails
ON
persons.personId = emails.personId
That query will return data (persons with addresses assigned to them).
John, Foo, john.foo#company.com
John, Foo, john.foo#goomail.com
Mike, Bar, mike.bar#company.com
Mike, Bar, mike.bar#yaaho.com
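If you want to try this without setting up a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The function name persons_with_emails and the ORDER BY are my additions; the tables and rows are the ones from the example above.

```python
import sqlite3

def persons_with_emails():
    """Build the two example tables in memory and run the join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE persons (personId INTEGER PRIMARY KEY, name TEXT, surname TEXT);
        CREATE TABLE emails (emailId INTEGER PRIMARY KEY, personId INTEGER, address TEXT);
        INSERT INTO persons VALUES (1, 'John', 'Foo'), (2, 'Mike', 'Bar');
        INSERT INTO emails VALUES
            (1, 1, 'john.foo#company.com'),
            (2, 1, 'john.foo#goomail.com'),
            (3, 2, 'mike.bar#company.com'),
            (4, 2, 'mike.bar#yaaho.com');
    """)
    # Same join as above; ORDER BY only makes the row order predictable.
    rows = conn.execute("""
        SELECT persons.name, persons.surname, emails.address
        FROM persons
        JOIN emails ON persons.personId = emails.personId
        ORDER BY emails.emailId
    """).fetchall()
    conn.close()
    return rows

for row in persons_with_emails():
    print(row)
```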
Is that clear?
If joins are still not clear to you, maybe visit phpcademy.org or thenewboston.org; they have very good tutorials on many languages and technologies, including SQL.
Added later:
You can also join multiple tables, like this:
SELECT master_t.primarykey, master_t.key1, master_t.key2, master_t.other, master_t.fields, detail_t_1.something
FROM
master_t
JOIN
detail_t_1
ON master_t.key1 = detail_t_1.key1
JOIN
detail_t_2
ON master_t.key2 = detail_t_2.key2
Added later:
If there are multiple "detail" rows (like the 2 e-mails per person in my example), the returned data will contain the same person multiple times, once for each e-mail address.
In SQL, the returned data is always in the form of a table, not a tree.
Try this:
SELECT * FROM -- you can specify fields instead of *
employees e
JOIN salaries s ON e.emp_no = s.emp_no
JOIN titles t ON e.emp_no = t.emp_no
JOIN dept_emp de ON e.emp_no = de.emp_no
JOIN departments d ON d.dept_no = de.dept_no
JOIN dept_manager dm ON d.dept_no = dm.dept_no
JOIN employees edm ON dm.emp_no = edm.emp_no -- second instance of employees to join the employee who is a dept manager
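To see what a join chain like this returns, here is a rough sketch on a trimmed-down copy of the schema in an in-memory SQLite database. The column lists are cut to the bare minimum and the sample rows are invented; the real Employees database has more columns (dates and so on). Note how the employee with two salary rows shows up twice.

```python
import sqlite3

def all_employee_data():
    """Join a tiny, made-up version of the Employees tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, first_name TEXT);
        CREATE TABLE salaries (emp_no INTEGER, salary INTEGER);
        CREATE TABLE titles (emp_no INTEGER, title TEXT);
        CREATE TABLE departments (dept_no TEXT PRIMARY KEY, dept_name TEXT);
        CREATE TABLE dept_emp (emp_no INTEGER, dept_no TEXT);
        CREATE TABLE dept_manager (emp_no INTEGER, dept_no TEXT);
        INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO salaries VALUES (1, 50000), (1, 55000), (2, 40000);
        INSERT INTO titles VALUES (1, 'Manager'), (2, 'Engineer');
        INSERT INTO departments VALUES ('d001', 'Research');
        INSERT INTO dept_emp VALUES (1, 'd001'), (2, 'd001');
        INSERT INTO dept_manager VALUES (1, 'd001');
    """)
    rows = conn.execute("""
        SELECT e.first_name, s.salary, t.title, d.dept_name,
               edm.first_name AS manager
        FROM employees e
        JOIN salaries s ON e.emp_no = s.emp_no
        JOIN titles t ON e.emp_no = t.emp_no
        JOIN dept_emp de ON e.emp_no = de.emp_no
        JOIN departments d ON d.dept_no = de.dept_no
        JOIN dept_manager dm ON d.dept_no = dm.dept_no
        JOIN employees edm ON dm.emp_no = edm.emp_no
        ORDER BY e.emp_no, s.salary
    """).fetchall()
    conn.close()
    return rows

for row in all_employee_data():
    print(row)
```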
I am using the Chinook database for a project and I have two difficult queries to execute, but both provide errors.
I am looking for all the orders (invoice) that were sent to 'New York' and contain tracks that belong to more than one genre. [InvoiceId, amount of products, total1, total2]. Total1 should be unitprice*quantity and total2 is total. It should show only 2 rows.
So far I have come up with this. I have also tried switching up with left join, full outer join, etc
CREATE TEMPORARY TABLE temp AS
SELECT *
FROM track join invoiceline USING (TrackId)
WHERE (select * from track t1 where EXISTS (select * from track t2 where t1.GenreId <> t2.GenreId));
SELECT invoice.InvoiceId, invoiceline.Quantity, invoiceline.UnitPrice*invoiceline.Quantity, invoice.Total
FROM (SELECT * FROM invoice JOIN invoiceline
WHERE invoice.BillingCity LIKE '%New York%') JOIN temp cc ON invoiceline.TrackId
GROUP BY invoiceline.InvoiceId;
DROP TABLE temp;
It provides the error:
Operand should contain 1 column(s)
I am looking for clients (in couples) that have bought more than two of the same tracks. It should provide 14 rows.
Until now I have come up with this.
SELECT CONCAT(FIRSTNAME,',', LASTNAME) AS name1 FROM customer
JOIN invoice ON customer.CustomerId = invoice.CustomerId
JOIN invoiceline ON invoice.InvoiceId = invoiceline.InvoiceId
JOIN track ON invoiceline.TrackId = track.TrackId
UNION
(
SELECT CONCAT(FIRSTNAME,',', LASTNAME) AS name2 FROM customer
JOIN invoice ON customer.CustomerId = invoice.CustomerId
JOIN invoiceline ON invoice.InvoiceId = invoiceline.InvoiceId
JOIN track ON invoiceline.TrackId = track.TrackId
);
So A) Does anybody know why it provides that error?
B) Could anyone give any tips or suggest a better way to write these queries?
Here are two helpful schemas: ER diagram, relational diagram
Answer to your first question:
The error comes up because the subquery in your WHERE clause selects * (many columns), while WHERE expects a single value. Comparing every track against every other track is also very redundant.
Instead, count the genre ids and keep the track ids with a count greater than 1, as shown below:
CREATE TEMPORARY TABLE temp AS
SELECT invoiceline.* -- SELECT * would fail here: track and invoiceline both have a UnitPrice column
FROM track JOIN invoiceline USING (TrackId)
WHERE TrackId IN
(SELECT TrackId FROM (SELECT TrackId, COUNT(DISTINCT GenreId) AS genres FROM track GROUP BY 1 HAVING genres > 1) AS multi_genre); -- a derived table needs an alias
SELECT invoice.InvoiceId, SUM(cc.Quantity), SUM(cc.UnitPrice * cc.Quantity), invoice.Total
FROM invoice JOIN temp cc ON invoice.InvoiceId = cc.InvoiceId
WHERE invoice.BillingCity LIKE '%New York%'
GROUP BY invoice.InvoiceId, invoice.Total;
DROP TABLE temp;
I have assumed that track id is the primary key here.
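If each track has exactly one GenreId (which is what a primary key on TrackId implies), counting genres per track can never exceed 1, so another option is to count distinct genres per invoice with HAVING. This is only a sketch of that alternative, not the approach above: the tables are cut-down stand-ins for the Chinook ones, the rows are invented, and the function name is mine.

```python
import sqlite3

def multi_genre_invoices(city="New York"):
    """Invoices billed to `city` whose lines cover more than one genre."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE track (TrackId INTEGER PRIMARY KEY, GenreId INTEGER);
        CREATE TABLE invoice (InvoiceId INTEGER PRIMARY KEY, BillingCity TEXT, Total REAL);
        CREATE TABLE invoiceline (InvoiceLineId INTEGER PRIMARY KEY, InvoiceId INTEGER,
                                  TrackId INTEGER, UnitPrice REAL, Quantity INTEGER);
        INSERT INTO track VALUES (1, 1), (2, 2), (3, 1);
        INSERT INTO invoice VALUES (1, 'New York', 2.97), (2, 'New York', 1.98), (3, 'Boston', 1.98);
        INSERT INTO invoiceline VALUES
            (1, 1, 1, 0.99, 1), (2, 1, 2, 0.99, 1), (3, 1, 3, 0.99, 1),  -- two genres
            (4, 2, 1, 0.99, 1), (5, 2, 3, 0.99, 1),                      -- one genre
            (6, 3, 1, 0.99, 1), (7, 3, 2, 0.99, 1);                      -- wrong city
    """)
    rows = conn.execute("""
        SELECT i.InvoiceId,
               SUM(il.Quantity) AS products,
               ROUND(SUM(il.UnitPrice * il.Quantity), 2) AS total1,
               i.Total AS total2
        FROM invoice i
        JOIN invoiceline il ON il.InvoiceId = i.InvoiceId
        JOIN track t ON t.TrackId = il.TrackId
        WHERE i.BillingCity = ?
        GROUP BY i.InvoiceId, i.Total
        HAVING COUNT(DISTINCT t.GenreId) > 1
    """, (city,)).fetchall()
    conn.close()
    return rows

print(multi_genre_invoices())
```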
For the second question, I assume that you want to find customers buying the same records. You can use a query like the one below:
SELECT invoiceline.TrackId, group_concat(customer.CustomerId) as customers FROM customer
JOIN invoice ON customer.CustomerId = invoice.CustomerId
JOIN invoiceline ON invoice.InvoiceId = invoiceline.InvoiceId
JOIN track ON invoiceline.TrackId = track.TrackId
group by 1
This will give you comma separated customer ids who have bought the same track. Also, use customer id instead of first name and last name since some customers can have the same name. Using primary key is best.
Since you mentioned that you want customers who buy the same records in couples, I would suggest reading up on market basket analysis, or association analysis using the apriori algorithm. You can import your dataset into R or Python, whichever you are more comfortable with, and build a visualization. In my experience Python is faster and handles more data but its visualizations are weaker, while R is a bit slow with large amounts of data but has good visualizations for the apriori algorithm.
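If you would rather stay in SQL, pairs of customers can also be found with a self-join. The sketch below assumes a flattened table purchase(CustomerId, TrackId) standing in for the customer, invoice and invoiceline joins; that table, the sample rows and the function name are all made up for illustration.

```python
import sqlite3

def customer_pairs(min_shared=2):
    """Pairs of customers who bought more than `min_shared` of the same tracks."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE purchase (CustomerId INTEGER, TrackId INTEGER);
        INSERT INTO purchase VALUES
            (1, 10), (1, 11), (1, 12), (1, 13),
            (2, 10), (2, 11), (2, 12),
            (3, 10), (3, 11);
    """)
    # a.CustomerId < b.CustomerId keeps each pair once and drops self-pairs.
    rows = conn.execute("""
        SELECT a.CustomerId, b.CustomerId, COUNT(DISTINCT a.TrackId) AS shared
        FROM purchase a
        JOIN purchase b
          ON a.TrackId = b.TrackId
         AND a.CustomerId < b.CustomerId
        GROUP BY a.CustomerId, b.CustomerId
        HAVING COUNT(DISTINCT a.TrackId) > ?
        ORDER BY a.CustomerId, b.CustomerId
    """, (min_shared,)).fetchall()
    conn.close()
    return rows

print(customer_pairs())
```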
I'm trying to learn SQL and have this question: print the name of all employees together with the name of their supervisor.
Here is my Employees table:
Table Employee
How do I use a SELECT command in new query to get the result out? I tried with a self join.
Ty
Max (SQL NOOB)
Try the query below. Here t1 is the employee and t2 is that employee's manager:
SELECT t1.EmployeeName AS EmployeeName, t2.EmployeeName AS Manager
FROM Employee t1
LEFT JOIN Employee t2 ON t1.ManagerID = t2.EmployeeID
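Here is a small sketch of the self-join you can run with Python's sqlite3 module. The column names EmployeeID, EmployeeName and ManagerID come from the query above; the sample rows are invented, since the table in the question was posted as an image.

```python
import sqlite3

def employees_with_managers():
    """Self-join Employee to pair each employee with their manager."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY,
                               EmployeeName TEXT,
                               ManagerID INTEGER);
        INSERT INTO Employee VALUES
            (1, 'Alice', NULL),   -- top of the hierarchy, no manager
            (2, 'Bob', 1),
            (3, 'Carol', 1),
            (4, 'Dave', 2);
    """)
    # LEFT JOIN keeps Alice even though she has no manager row to match.
    rows = conn.execute("""
        SELECT e.EmployeeName, m.EmployeeName AS Manager
        FROM Employee e
        LEFT JOIN Employee m ON e.ManagerID = m.EmployeeID
        ORDER BY e.EmployeeID
    """).fetchall()
    conn.close()
    return rows

for row in employees_with_managers():
    print(row)
```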
An inner join should do it. Here's the syntax, but you'll have to apply it to your database:
SELECT c.name, o.name
FROM cats c
INNER JOIN owners o
ON c.owner_id = o.id
"cats c" basically means "Cats table, which I'm nicknaming c." "owners o" basically means "Owners table, which I'm nicknaming o." In your select, you tell it which fields you want by their name, and which table they're in: [table nickname].[fieldname]. The ON is where you specify a common field from each table, which tells it how to join the two tables together. In the case of this example, we're saying that we want the owner whose id is equal to the owner_id field of the cats table. The owner_id field is a foreign key that references the owners table.
I know answers like this are frustrating when you're learning, but the truth is you'll learn better if you have to figure a bit out for yourself, and I can't do your homework for you (I mean that nicely..)
I have this very basic problem and just can't figure out how to do it. I have two tables.
The first one, users, contains two columns: id which is just number representing a person and sex. Second column doesn't matter now.
The other table, orders has columns: id_user, time, state. The id_user refers to id in the first table. state has three different values (finished, canceled, new). I need to make a table that would show count of finished state (how many finished states one person has) next to the id of that person. I can run this thing:
select * , count(state) as CountOfFinished from orders
where state = 'finished'
group by id_user;
to show me that information, but I need a table that shows id, sex, CountOfFinished.
I copied the first table to a new one, but don't know how to add the CountOfFinished column next to these two. I don't even know how to make a column out of it so I can join it or something.
Any idea what should I do?
You don't need a new table. What you want is a JOIN, or in this case, a LEFT JOIN:
SELECT
u.id,
u.sex,
COUNT(o.state) AS CountOfFinished -- COUNT never returns NULL, so no ISNULL wrapper is needed
FROM users u
LEFT JOIN orders o
ON o.id_user = u.id
AND o.state = 'finished'
GROUP BY
u.id, u.sex
The above query will list all users and the number of finished orders. If you want to list only users with at least one finished order, use INNER JOIN.
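A quick way to check this behaviour is an in-memory SQLite database. In this sketch the sample rows and the function name are invented; the table and column names follow the question. Users with no finished orders still appear, with a count of 0.

```python
import sqlite3

def finished_counts():
    """Count finished orders per user, keeping users with none."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, sex TEXT);
        CREATE TABLE orders (id_user INTEGER, time TEXT, state TEXT);
        INSERT INTO users VALUES (1, 'F'), (2, 'M'), (3, 'F');
        INSERT INTO orders VALUES
            (1, '2015-01-01', 'finished'),
            (1, '2015-01-02', 'finished'),
            (1, '2015-01-03', 'canceled'),
            (2, '2015-01-04', 'new');
    """)
    # The state filter lives in the ON clause, so unmatched users survive
    # the LEFT JOIN and COUNT(o.state) counts zero non-NULL values for them.
    rows = conn.execute("""
        SELECT u.id, u.sex, COUNT(o.state) AS CountOfFinished
        FROM users u
        LEFT JOIN orders o
               ON o.id_user = u.id
              AND o.state = 'finished'
        GROUP BY u.id, u.sex
        ORDER BY u.id
    """).fetchall()
    conn.close()
    return rows

print(finished_counts())
```

If the state filter were moved to a WHERE clause instead, users 2 and 3 would disappear from the result, because the filter would then run after the join.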
To insert into a new table, create the table first and use INSERT INTO:
CREATE TABLE FinishedOrderCountByUser(
UserId INT,
Sex CHAR(1),
CountOfFinished INT
);
INSERT INTO FinishedOrderCountByUser(UserId, Sex, CountOfFinished)
SELECT
u.id,
u.sex,
COUNT(o.state) AS CountOfFinished
FROM users u
LEFT JOIN orders o
ON o.id_user = u.id
AND o.state = 'finished'
GROUP BY
u.id, u.sex
Here's one example: I have Car, User, Member, and Dealer tables. At the moment I'm trying to get the name of a dealer who owns a car by matching their userids up.
Select userid, name FROM `User` WHERE userid IN
(SELECT userid FROM `Car` WHERE userid IN
(SELECT userid FROM `Member` WHERE userid IN
(SELECT userid FROM `Dealer`)))
This does what I want, but I can't help feeling there's a better way of doing it. Currently the userid in Car is a FK of the userid in Dealer, which is a FK of the userid in Member, which is a FK of the userid in User, which stores the name.
Can I go straight to getting all the userids and names of dealers whose id is in the Car table, whilst making sure they're actually a Dealer?
Basically, your schema is a downstream schema:
Users -> Members -> Dealer -> Car
The good thing is that you have set up all the keys you need here.
So to select anything from any table, just follow the chain between Users and the data you want, for example:
SELECT u.*
FROM `Car` car, `Dealer` dealer, `Member` member, `User` u
WHERE
dealer.userid = car.userid AND
member.userid = dealer.userid AND
u.userid = member.userid
The reason I went upstream when matching records is that we want to make as few matching operations as possible. The User table is likely to contain the most records. So I start by matching the Car table against Dealer to get every userid that has a match, then go from Dealer to Member, and finally to User. This way the User records are matched against far fewer records than they would have been if I had gone from User to Member to Dealer to Car.
But this is not a foolproof solution; it depends on your data. If one user may have multiple cars, it may be better to go downstream.
Use JOIN instead of subqueries to fetch the data.
Try this:
SELECT U.userid, U.NAME
FROM `User` U
INNER JOIN Car C ON U.userid = C.userid
INNER JOIN Member M ON C.userid = M.userid
INNER JOIN Dealer D ON M.userid = D.userid;
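One thing to watch with the JOIN version: a dealer who owns several cars comes back once per car, whereas the nested IN query returns each user once. Adding DISTINCT is one way to get the same result. Here is a small sketch in Python with sqlite3; the car_id column, the sample rows and the function name are assumptions for the demo.

```python
import sqlite3

def dealers_with_cars(distinct=True):
    """Names of dealers who own at least one car, via INNER JOINs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE "User" (userid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Member (userid INTEGER PRIMARY KEY);
        CREATE TABLE Dealer (userid INTEGER PRIMARY KEY);
        CREATE TABLE Car (car_id INTEGER PRIMARY KEY, userid INTEGER);
        INSERT INTO "User" VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO Member VALUES (1), (2), (3);
        INSERT INTO Dealer VALUES (1), (2);
        INSERT INTO Car VALUES (100, 1), (101, 1);  -- Ann owns two cars
    """)
    select = "SELECT DISTINCT" if distinct else "SELECT"
    rows = conn.execute(select + """ U.userid, U.name
        FROM "User" U
        INNER JOIN Car C ON U.userid = C.userid
        INNER JOIN Member M ON C.userid = M.userid
        INNER JOIN Dealer D ON M.userid = D.userid
        ORDER BY U.userid
    """).fetchall()
    conn.close()
    return rows

print(dealers_with_cars(distinct=False))  # Ann appears once per car
print(dealers_with_cars(distinct=True))   # Ann appears once
```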
I don't understand the concept of a left outer join, a right outer join, or indeed why we need to use a join at all! The question I am struggling with and the table I am working from is here: Link
Question 3(b)
Construct a command in SQL to solve the following query, explaining why it had to employ the
(outer) join method. [5 Marks]
“Find the name of each staff member and his/her dependent spouse, if any”
Question 3(c) -
Construct a command in SQL to solve the following query, using (i) the join method, and (ii) the
subquery method. [10 Marks]
“Find the identity name of each staff member who has worked more than 20 hours on the
Computerization Project”
Can anyone please explain this to me simply?
Joins are used to combine two related tables together.
In your example, you can combine the Employee table and the Department table, like so:
SELECT FNAME, LNAME, DNAME
FROM
EMPLOYEE INNER JOIN DEPARTMENT ON EMPLOYEE.DNO=DEPARTMENT.DNUMBER
This would result in a recordset like:
FNAME LNAME DNAME
----- ----- -----
John Smith Research
John Doe Administration
I used an INNER JOIN above. INNER JOINs combine two tables so that only records with matches in both tables are displayed, and they are joined in this case, on the department number (field DNO in Employee, DNUMBER in Department table).
LEFT JOINs allow you to combine two tables when you have records in the first table but might not have records in the second table. For example, let's say you want a list of all the employees, plus any dependents:
SELECT EMPLOYEE.FNAME as employee_first, EMPLOYEE.LNAME as employee_last, DEPENDENT.FNAME as dependent_first, DEPENDENT.LNAME as dependent_last
FROM
EMPLOYEE INNER JOIN DEPENDENT ON EMPLOYEE.SSN=DEPENDENT.ESSN
The problem here is that if an employee doesn't have a dependent, then their record won't show up at all -- because there's no matching record in the DEPENDENT table.
So, you use a left join which keeps all the data on the "left" (i.e. the first table) and pulls in any matching data on the "right" (the second table):
SELECT EMPLOYEE.FNAME as employee_first, EMPLOYEE.LNAME as employee_last, DEPENDENT.FNAME as dependent_first, DEPENDENT.LNAME as dependent_last
FROM
EMPLOYEE LEFT JOIN DEPENDENT ON EMPLOYEE.SSN=DEPENDENT.ESSN
Now we get all of the employee records. If there is no matching dependent(s) for a given employee, the dependent_first and dependent_last fields will be null.
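To compare the two joins side by side, here is a minimal sketch with Python's sqlite3 module. It assumes the column names used in the queries above (SSN, ESSN, FNAME, LNAME) and uses two made-up employees, only one of whom has a dependent.

```python
import sqlite3

def employee_dependents(join_type="LEFT"):
    """Run the employee/dependent query with INNER or LEFT JOIN."""
    if join_type not in ("INNER", "LEFT"):
        raise ValueError("join_type must be 'INNER' or 'LEFT'")
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EMPLOYEE (SSN INTEGER PRIMARY KEY, FNAME TEXT, LNAME TEXT);
        CREATE TABLE DEPENDENT (ESSN INTEGER, FNAME TEXT, LNAME TEXT);
        INSERT INTO EMPLOYEE VALUES (1, 'John', 'Smith'), (2, 'Jane', 'Doe');
        INSERT INTO DEPENDENT VALUES (1, 'Alice', 'Smith');
    """)
    rows = conn.execute(f"""
        SELECT EMPLOYEE.FNAME, EMPLOYEE.LNAME, DEPENDENT.FNAME, DEPENDENT.LNAME
        FROM EMPLOYEE {join_type} JOIN DEPENDENT ON EMPLOYEE.SSN = DEPENDENT.ESSN
        ORDER BY EMPLOYEE.SSN
    """).fetchall()
    conn.close()
    return rows

print(employee_dependents("INNER"))  # only John, who has a dependent
print(employee_dependents("LEFT"))   # John and Jane; Jane's dependent fields are None
```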
example (not using your example tables :-)
I have a car rental company.
Table car
id: integer primary key autoincrement
licence_plate: varchar
purchase_date: date
Table customer
id: integer primary key autoincrement
name: varchar
Table rental
id: integer primary key autoincrement
car_id: integer
bike_id: integer
customer_id: integer
rental_date: date
Simple right? I have 10 records for cars because I have 10 cars.
I've been running this business for 10 years, so I've got 1000 customers.
And I rent the cars about 20x per year per car = 10 years x 10 cars x 20 = 2000 rentals.
If I store everything in one big table I've got 10x1000x2000 = 20 million records.
If I store it in 3 tables I've got 10+1000+2000 = 3010 records.
That's 3 orders of magnitude, so that's why I use 3 tables.
But because I use 3 tables (to save space and time) I have to use joins in order to get the data out again
(at least if I want names and licence plates instead of numbers).
Using inner joins
All rentals for customer 345?
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id)
INNER JOIN car on (car.id = rental.car_id)
WHERE customer.id = 345;
That's an INNER JOIN, because we only want to know about cars linked to rentals linked to customers that actually happened.
Notice that we also have a bike_id, linking to the bike table, which is pretty similar to the car table but different.
How would we get all bike + car rentals for customer 345?
We can try and do this
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id)
INNER JOIN car on (car.id = rental.car_id)
INNER JOIN bike on (bike.id = rental.bike_id)
WHERE customer.id = 345;
But that will give an empty set!!
This is because a rental can either be a bike_rental OR a car_rental, but not both at the same time.
And the non-working inner join query will only give results for all rentals where we rent out both a bike and a car in the same transaction.
We are trying to get a boolean OR relationship using a boolean AND join.
Using outer joins
In order to solve this we need an outer join.
Let's solve it with left join
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id) -- link always
LEFT JOIN car on (car.id = rental.car_id) -- link half of the time
LEFT JOIN bike on (bike.id = rental.bike_id) -- link the other half of the time
WHERE customer.id = 345;
Look at it this way. An inner join is an AND and a left join is an OR, as in the following pseudocode:
if a=1 AND a=2 then {this is always false, no result}
if a=1 OR a=2 then {this might be true or not}
If you create the tables and run the query you can see the result.
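If you don't have a database at hand, here is a rough sketch that creates cut-down versions of the tables in an in-memory SQLite database and runs both queries. The bike table's frame_no column, the sample rows and the function name are my assumptions; the rest follows the tables above.

```python
import sqlite3

def rentals_for(customer_id):
    """Return (inner_join_rows, left_join_rows) for one customer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE car (id INTEGER PRIMARY KEY, licence_plate TEXT);
        CREATE TABLE bike (id INTEGER PRIMARY KEY, frame_no TEXT);
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE rental (id INTEGER PRIMARY KEY, car_id INTEGER,
                             bike_id INTEGER, customer_id INTEGER);
        INSERT INTO car VALUES (1, 'AB-123');
        INSERT INTO bike VALUES (7, 'F-7');
        INSERT INTO customer VALUES (345, 'Pat');
        INSERT INTO rental VALUES (1, 1, NULL, 345),   -- a car rental
                                  (2, NULL, 7, 345);   -- a bike rental
    """)
    template = """
        SELECT rental.id, car.licence_plate, bike.frame_no
        FROM customer
        INNER JOIN rental ON rental.customer_id = customer.id
        {kind} JOIN car ON car.id = rental.car_id
        {kind} JOIN bike ON bike.id = rental.bike_id
        WHERE customer.id = ?
        ORDER BY rental.id
    """
    inner_rows = conn.execute(template.format(kind="INNER"), (customer_id,)).fetchall()
    left_rows = conn.execute(template.format(kind="LEFT"), (customer_id,)).fetchall()
    conn.close()
    return inner_rows, left_rows

inner_rows, left_rows = rentals_for(345)
print(inner_rows)  # empty: no rental has both a car and a bike
print(left_rows)   # both rentals, with None for the missing side
```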
On terminology
A left join is the same as a left outer join.
A join with no extra prefix is an inner join.
There's also a full outer join. In 25 years of programming I've never used that.
Why left join
Well, there are two tables involved. In the example we linked customer to rental with an inner join. In an inner join both tables must link, so there is no difference between the left table (customer) and the right table (rental).
The next link was a left join between rental (left) and car (right). On the left side all rows are kept, while rows on the right side don't have to match. This is why it's a left join.
You use outer joins when you need all of the results from one of the join tables, whether there is a matching row in the other table or not.
I think Question 3(b) is confusing because its entire premise is wrong: you don't have to use an outer join to "solve the query". For example, consider this (following the style of syntax in the exam paper is probably wise):
SELECT FNAME, LNAME, DEPENDENT_NAME
FROM EMPLOYEE, DEPENDENT
WHERE SSN = ESSN
AND RELATIONSHIP = 'SPOUSE'
UNION
SELECT FNAME, LNAME, NULL
FROM EMPLOYEE
WHERE SSN NOT IN
(SELECT ESSN
FROM DEPENDENT
WHERE RELATIONSHIP = 'SPOUSE')
In general:
JOIN joins two tables together.
Use INNER JOIN when you want to "look up" something, such as the detailed information behind a specific column.
Use OUTER JOIN when you want to "show everything", such as listing all the rows of a table even when the other table has no match for them.