I have two tables and need to write a relational algebra query that selects the names of all teams that are not working with any client.
I have these relations
team( id, name )
client( id, name, teamId )
teamId ⊆ team.id
Could you please help me with what the query would be in relational algebra? I was thinking about joining these two tables and selecting the rows where Client.teamId is NULL, but I don't know how to write it formally.
Here are the steps that must be done:
Join Team and Client on Team.id = Client.TeamId and project this relation on Team.id and Team.name. You obtain the id and name of all the teams that work for some client.
Subtract from the relation Team the relation obtained in the previous step: in this way you get all the teams that do not work for any client.
Project the relation obtained in the previous step on Team.name. In this way you obtain the names of the teams that do not work for any client.
Notations for relational algebra vary; here is the expression in a typical notation:
πname (team - πid,name(team ⨝id=teamId client))
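As a rough cross-check, the three steps above can be run as SQL, where EXCEPT plays the role of the set difference. This is only a sketch using Python's sqlite3 module; the sample rows are hypothetical and not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT,
                         teamId INTEGER REFERENCES team(id));
    -- Hypothetical sample rows (assumption, not from the question).
    INSERT INTO team VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO client VALUES (10, 'Acme', 1), (11, 'Globex', 1), (12, 'Initech', 3);
""")

# Inner EXCEPT: team minus the (id, name) pairs of teams that have a client.
# Outer SELECT: the final projection on name.
query = """
    SELECT name FROM (
        SELECT id, name FROM team
        EXCEPT
        SELECT team.id, team.name
        FROM team JOIN client ON team.id = client.teamId
    )
"""
idle_teams = [row[0] for row in conn.execute(query)]
print(idle_teams)
```

The difference is taken on (id, name) before projecting on name, as in the expression above, so two distinct teams that happen to share a name are not confused with each other.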
I'm trying to find out which schools had students that did not complete their exams in 2018. I've got 3 tables set up: ExamInfo, ExamEntry and Students. I'm going to try to use the ExamInfo table to get information from the Students table, though I obviously only want the information for students that did not complete their exam in 2018. Note: I'm looking for students that attended but did not complete the exam; for this particular exam you can treat a completed exam as a passed exam.
Within ExamInfo I have the columns:
ExamInfo_Date --when exam took place, using to get year() condition
ExamInfo_ExamNo --unique student exam ID used to connect with other tables
ExamInfo_Completed --1 if completed, 0 if not.
...
Within ExamEntry I have the related columns:
ExamEntry_ExamNo --connected to ExamInfo table
ExamEntry_StudentId --unique studentId used to connect to Students table
ExamEntry_Date -- this is same as ExamInfo_Date if any relevance.
...
Within Students I have following columns:
Students_Id --this is related to ExamEntry_StudentId, PRIMARY KEY
Students_School --this is the school of which I wish to be my output.
...
I want my output to simply be a list of all schools that had students that did not complete their exams in 2018. Though my issue is with getting from the ExamInfo table, to finding the schools of which students did not complete their exam.
So far I've:
SELECT a.Students_School, YEAR(l.ExamInfo_Date), l.ExamInfo_Completed
FROM ExamInfo l ??JOIN?? Students a
WHERE YEAR(l.ExamInfo_Date) = 2018
AND l.ExamInfo_Completed = 0
;
I'm not even sure if going through the ExamEntry table is necessary. I'm sure I'm meant to use a join, though unsure of how to appropriately use it. Also, with my 3 different SELECT columns, I only wish for Students_School column to be output:
Students_School
---------------
Applederry
Barnet Boys
...
Clearly, you need a JOIN -- two in fact. Your schema has exams, students, and a junction/association table that represents the many-to-many relationship between these entities.
So, I would expect the FROM clause to look like:
FROM ExamInfo e JOIN
ExamEntry ee
ON ee.ExamEntry_ExamNo = e.ExamInfo_ExamNo JOIN
Students s
ON ee.ExamEntry_StudentId = s.Students_Id
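Putting that FROM clause together with the WHERE conditions from the question gives a complete query. The sketch below runs it in SQLite through Python, so two things are assumptions: the sample rows are invented, and SQLite has no YEAR() function, so strftime('%Y', ...) stands in for it (in the question's dialect the condition would stay YEAR(e.ExamInfo_Date) = 2018). DISTINCT keeps each school from being listed once per student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ExamInfo (ExamInfo_ExamNo INTEGER PRIMARY KEY,
                           ExamInfo_Date TEXT, ExamInfo_Completed INTEGER);
    CREATE TABLE ExamEntry (ExamEntry_ExamNo INTEGER, ExamEntry_StudentId INTEGER,
                            ExamEntry_Date TEXT);
    CREATE TABLE Students (Students_Id INTEGER PRIMARY KEY, Students_School TEXT);
    -- Hypothetical sample rows (assumption, not from the question).
    INSERT INTO Students VALUES (1, 'Applederry'), (2, 'Barnet Boys'),
                                (3, 'Applederry'), (4, 'Cedar Hall');
    INSERT INTO ExamInfo VALUES (100, '2018-05-01', 0), (101, '2018-05-01', 1),
                                (102, '2018-06-10', 0), (103, '2017-05-01', 0),
                                (104, '2018-06-10', 0);
    INSERT INTO ExamEntry VALUES (100, 1, '2018-05-01'), (101, 2, '2018-05-01'),
                                 (102, 3, '2018-06-10'), (103, 4, '2017-05-01'),
                                 (104, 2, '2018-06-10');
""")

query = """
    SELECT DISTINCT s.Students_School
    FROM ExamInfo e
    JOIN ExamEntry ee ON ee.ExamEntry_ExamNo = e.ExamInfo_ExamNo
    JOIN Students s   ON ee.ExamEntry_StudentId = s.Students_Id
    WHERE strftime('%Y', e.ExamInfo_Date) = '2018'  -- SQLite stand-in for YEAR()
      AND e.ExamInfo_Completed = 0
    ORDER BY s.Students_School
"""
schools = [row[0] for row in conn.execute(query)]
print(schools)
```

Only Students_School is selected, which matches the single-column output the question asks for; the year and completion flag are needed only in the WHERE clause.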
I was given the following question:
Write a SQL statement to make a join on the tables salesman, customer and orders in such a form that the same column of each table will appear once and only the relational rows will come.
I executed the following query:
SELECT * FROM orders NATURAL JOIN customer NATURAL JOIN salesman;
However, I was not expecting the following result:
My doubt lies in step 2.
Why didn't I get the rows with salesman_id 5002, 5003 & 5007?
I know that natural join uses the common columns to finalize the rows.
Here all the Salesman_ids are present in the result from step 1.
Why isn't the final result equal to the table resulting from step 1 with non duplicate columns from salesman added to it?
... the same column of each table will appear once
Yes Natural Join does that.
... and only the relational rows will come.
I don't know what that means.
I disagree with those who are saying: do not use Natural Join. But it is certainly true that if you plan to use Natural Join for your queries, you must design the schema so that (loosely speaking) 'same column name means same thing'.
Then this exercise is teaching you the dangers of having same-named columns that do not mean the same thing. The danger is sometimes called the 'connection trap' or 'join trap'. (Not really a trap: you just need to learn ways to write queries over poorly-designed schemas.)
A more precise way to put that: if you have columns named the same in two different tables, the column must be a key of at least one of them. So:
city is not a key in any of those tables,
so should not get 'captured' in a Natural Join.
salesman_id is not a key in table customer,
so should not get 'captured' in the join from table orders.
The main way to fix up this query is by renaming some columns to avoid 'capture' (see below). It's also worth mentioning that some dialects of SQL allow:
SELECT *
FROM orders
NATURAL JOIN customer ON customer_id
...
The ON column(s) phrase means: validate that the only columns in common between the two tables are those named. Otherwise reject the query. So your query would be rejected.
Renaming means that you shouldn't use SELECT *. (Anyway, that's dangerous for 'production code' because your query might produce different columns each time there's a schema change.) The easiest way to tackle this might be to create three Views for your three base tables, with the 'accidental' same-named columns given some other name. For this one query:
SELECT ord_no, purch_amt, ord_date, customer_id,
salesman_id AS order_salesman_id
FROM orders
NATURAL JOIN (SELECT customer_id, cust_name,
city AS cust_city, grade,
salesman_id AS cust_salesman_id
FROM customer) AS customer_grr
NATURAL JOIN (SELECT salesman_id, name,
city AS salesman_city,
commission
FROM salesman) AS salesman_grr
I'm using explicit AS to show renaming. Most dialects of SQL allow you to omit that keyword; just put city cust_city, ....
Why isn't the final result equal to the table resulting from step 1 with [...]?
Because natural join doesn't work how you expect--whatever that is, since you don't say.
In terms of relational algebra: Natural join returns the rows
• whose column set is the union of the input column sets and
• that have a subrow in both inputs.
In business terms: Every table & query result holds the rows that make some statement template--its (characteristic) predicate--its "meaning"--into a true statement. The designer gives the predicates for the base tables. Here, something like:
Orders = rows where
order [ord_no] ... and was sold by salesman [salesman_id] to customer [customer_id]
Customer = rows where
customer [customer_id] has name [cust_name] and lives in city [city]
and ... and is served by salesman [salesman_id]
Salesman = rows where
salesman [salesman_id] has name [name] and works in city [city] ...
Natural join is defined so that if each input holds the rows that make its predicate into a true statement then their natural join holds the rows that make the AND/conjunction of those predicates into a true statement. So (your query):
Orders natural join Customer natural join Salesman = rows where
order [ord_no] ... and was sold by salesman [salesman_id] to customer [customer_id]
and customer [customer_id] has name [cust_name] and lives in city [city]
and ... and is served by salesman [salesman_id]
and salesman [salesman_id] has name [name] and works in city [city] ...
So that natural join is asking for rows where, among other things, the customer lives in the city that the salesman works in. If that's not what you want, then you shouldn't use that expression.
Note how the meaning of a natural join of tables is a (simple) function of the meanings of its input tables. That's so for all the relational operators. So every query expression has a meaning, built from its base table meanings & relational operators.
Is there any rule of thumb to construct SQL query from a human-readable description?
Why didn't I get the rows with salesman_id 5002, 5003 & 5007?
Because those salesmen don't work in a city in which one of their customers lives.
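The effect is easy to reproduce on a tiny data set. The sketch below uses Python's sqlite3 module with invented rows (the table and column names follow the question, the values are assumptions): one customer lives in the salesman's city and one does not. The plain natural join silently also matches on city, while the renamed version matches on the keys only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesman (salesman_id INTEGER, name TEXT, city TEXT, commission REAL);
    CREATE TABLE customer (customer_id INTEGER, cust_name TEXT, city TEXT,
                           grade INTEGER, salesman_id INTEGER);
    CREATE TABLE orders (ord_no INTEGER, purch_amt REAL, ord_date TEXT,
                         customer_id INTEGER, salesman_id INTEGER);
    -- Hypothetical sample rows (assumption, not the data from the question).
    INSERT INTO salesman VALUES (5001, 'James', 'New York', 0.15),
                                (5002, 'Nail', 'Paris', 0.13);
    INSERT INTO customer VALUES (3001, 'Nick', 'New York', 100, 5001),
                                (3002, 'Brad', 'London', 200, 5002);
    INSERT INTO orders VALUES (70001, 150.5, '2012-10-05', 3001, 5001),
                              (70002, 65.26, '2012-10-05', 3002, 5002);
""")

# Plain natural join: city is shared by customer and salesman, so it is
# used as a join column and order 70002 (London vs Paris) drops out.
natural_rows = [r[0] for r in conn.execute(
    "SELECT ord_no FROM orders NATURAL JOIN customer NATURAL JOIN salesman "
    "ORDER BY ord_no")]

# Renaming the accidentally shared columns leaves only the keys in common.
renamed_rows = [r[0] for r in conn.execute("""
    SELECT ord_no
    FROM orders
    NATURAL JOIN (SELECT customer_id, cust_name, city AS cust_city, grade,
                         salesman_id AS cust_salesman_id
                  FROM customer) AS customer_r
    NATURAL JOIN (SELECT salesman_id, name, city AS salesman_city, commission
                  FROM salesman) AS salesman_r
    ORDER BY ord_no
""")]
print(natural_rows, renamed_rows)
```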
I am trying to implement a database that has multi-valued attributes and to create a filter-based search. For example, I want my people_table to contain id, name, address, hobbies, interests (hobbies and interests are multi-valued). The user will be able to check many attributes, and SQL will return only those people who have all of them.
I did some research and found several ways to implement this, but I can't decide which one is best.
The first is to have one table with the basic info of people (id, name, address), two more for the multi-valued attributes, and one more that contains only the keys of the other tables (I understand how to create these tables; I don't yet know how to implement the search).
The second is to have one table with the basic info and then one for each attribute. So I will have 20 or more tables (football, paint, golf, music, hiking, etc.) that contain only the ids of the people. Then, when the user checks the hobbies and the activities, I get the desired results using JOIN (I am not sure about the complexity, so I don't know how fast it will be if the user checks many boxes).
The last is an implementation that I didn't find on the internet (and I know there is a reason :) ), but to my mind it is the easiest to implement and the fastest in terms of complexity: use only one table that has the basic info as usual and also all the attributes as boolean columns. So if I have 1000 people in my table there will be only 1000 iterations, which I imagine will be fast enough with AND conditions.
So my question is: can I use the third implementation, or is there a big disadvantage that I'm missing? And which of the first two ways do you suggest?
That is a typical n-to-m relation. It works like this:
persons table
------------
id
name
address
interests table
---------------
id
name
person_interests table
----------------------
person_id
interest_id
person_interests contains a record for each interest of a person. To get the interests of a person do:
select i.name
from interests i
join person_interests pi on pi.interest_id = i.id
join persons p on pi.person_id = p.id
where p.name = 'peter'
You could create also tables for hobbies. To get the hobbies do the same in a separate query. To get both in one query you can do something like this
select p.id, p.name,
i.name as interest,
h.name as hobby
from persons p
left join person_interests pi on pi.person_id = p.id
left join interests i on pi.interest_id = i.id
left join person_hobbies ph on ph.person_id = p.id
left join hobbies h on ph.hobby_id = h.id
where p.name = 'peter'
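The question also asks how to return only the people who have all of the checked attributes. One common way to do that with this schema is to filter on the checked names, group by person, and keep the groups whose count equals the number of checked items. The sketch below is one possible version under stated assumptions: the helper name people_with_all and the sample rows are hypothetical, and interest names are assumed to be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE interests (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE person_interests (person_id INTEGER, interest_id INTEGER,
                                   PRIMARY KEY (person_id, interest_id));
    -- Hypothetical sample rows (assumption).
    INSERT INTO persons VALUES (1, 'peter', 'A St'), (2, 'anna', 'B St'), (3, 'tom', 'C St');
    INSERT INTO interests VALUES (1, 'golf'), (2, 'music'), (3, 'hiking');
    INSERT INTO person_interests VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1);
""")

def people_with_all(conn, wanted):
    """Return names of persons that have every interest in `wanted`."""
    wanted = sorted(set(wanted))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT p.name
        FROM persons p
        JOIN person_interests pi ON pi.person_id = p.id
        JOIN interests i ON i.id = pi.interest_id
        WHERE i.name IN ({placeholders})
        GROUP BY p.id, p.name
        HAVING COUNT(DISTINCT i.id) = ?   -- matched every checked interest
        ORDER BY p.name
    """
    return [row[0] for row in conn.execute(query, (*wanted, len(wanted)))]

print(people_with_all(conn, ["golf", "music"]))
print(people_with_all(conn, ["golf", "music", "hiking"]))
```

Only the placeholders are interpolated into the SQL text; the checked values themselves are passed as bound parameters.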
The basic way to deal with this is with a many-to-many join table. Each user can have many hobbies. Each hobby can have many users. That's basic stuff you can find information about anywhere, and @juergend already covered that.
The harder part is tracking different information about various hobbies and interests. Like if their hobby is "baseball" you might want to track what position they play, but if their hobby is "travel" you might want to track their favorite countries. Doing this with typical SQL relationships will lead to a rapid proliferation of tables and columns.
A hybrid approach is to use the new JSON data type to store some unstructured data. To expand on @juergend's example, you might add a field to Person_Interests which can store some of those details about that person's interest.
create table Person_Interests (
InterestID integer references Interests(ID),
PersonID integer references Persons(ID),
Details JSON
);
And now you could add that Person 45 has Interest 12 (travel), their favorite country is Djibouti, and they've been to 45 countries.
insert into person_interests
(InterestID, PersonID, Details)
values
(12, 45, '{"favorite_country": "Djibouti", "countries_visited": 45}');
And you can use JSON search functions to find, for example, everyone whose favorite country is Djibouti.
select p.id, p.name
from person_interests pi
join persons p on p.id = pi.personid
where pi.details->"$.favorite_country" = "Djibouti"
The advantage here is flexibility: interests and their attributes aren't limited by your database schema.
The disadvantage is performance. The JSON data type isn't the most efficient, and indexing a JSON column in MySQL is complicated. Good indexing is critical to good SQL performance. So as you figure out common patterns you might want to turn commonly used attributes into real columns in real tables.
The other option would be to use table inheritance. This is a feature of Postgres, not MySQL, and I'd recommend considering switching. Postgres also has better and more mature JSON support and JSON columns are easier to index.
With table inheritance, rather than having to write a completely new table for every different interest, you can make specific tables which inherit from a more generic one.
create table person_interests_travel (
FavoriteCountry text,
CountriesVisited text[]
) inherits(person_interests);
This still has InterestID, PersonID, and Details, but it's added some specific columns for tracking their favorite country and countries they've visited.
Note the text[] type. PostgreSQL also supports arrays, so you can store real lists without having to create another join table. You can also do this in MySQL with a JSON field, but arrays offer type constraints that JSON does not.
I am studying for SQL exam, and I came across this fact, regarding subqueries:
2. Main query and subquery can get data from different tables
When is a case when this feature would be useful? I find it difficult to imagine such a case.
Millions of situations call for finding information in different tables, it's the basis of relational data. Here's an example:
Find the emergency contact information for all students who are in a chemistry class:
SELECT Emergency_Name, Emergency_Phone
FROM tbl_StudentInfo
WHERE StudentID IN (SELECT b.StudentID
FROM tbl_ClassEnroll b
WHERE Subject = 'Chemistry')
SELECT * FROM tableA
WHERE id IN (SELECT id FROM tableB)
There are plenty of reasons to get data from different tables, such as selecting something in the main query based on one or more subqueries over other tables. The uses are endless.
For example, choose customers in the main query based on regions and their values:
SELECT * FROM customers
WHERE country IN(SELECT name FROM country WHERE name LIKE '%land%')
Or choose products in the main query whose price is greater or lower than the average income of customers, and so on...
You could do something like,
SELECT SUM(trans) as 'Transactions', branch.city as 'city'
FROM account
INNER JOIN branch
ON branch.bID = account.bID
GROUP BY branch.city
HAVING SUM(account.trans) < 0;
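To see what this query actually returns, here is a small sketch run through Python's sqlite3 module. The table layouts and rows are assumptions made up for the illustration; only the column names used in the query come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (bID INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE account (aID INTEGER PRIMARY KEY, bID INTEGER, trans INTEGER);
    -- Hypothetical sample rows (assumption).
    INSERT INTO branch VALUES (1, 'Leeds'), (2, 'York');
    INSERT INTO account VALUES (1, 1, 100), (2, 1, -30), (3, 2, -50), (4, 2, 20);
""")

# Leeds sums to +70 and York to -30; HAVING keeps only the negative total.
query = """
    SELECT branch.city, SUM(account.trans) AS total
    FROM account
    INNER JOIN branch ON branch.bID = account.bID
    GROUP BY branch.city
    HAVING SUM(account.trans) < 0
"""
losses = list(conn.execute(query))
print(losses)
```

Dropping the HAVING clause and adding ORDER BY total DESC would list every city with its total instead, which is what a most-profitable ranking would need.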
This would let a company identify which branches are making a loss (the HAVING clause keeps only the cities whose summed transactions are negative). That could show whether the company needs to change its marketing approach in certain regions, in theory allowing it to become more dynamic and reactive to changes in the economy at any given time.
I have a schema design question for my application and hope I can get some advice. It is very similar to Role-Based Access Control, but differs a bit in the details.
Spec:
For one company, there are 4 roles: Company (Boss) / Department (Manager) / Team (Leader) / Member (Sales), and there are about 1 million customer records. Each customer record can be owned by someone, who could be the Boss, a Manager, a Leader, or a Sales member. If the record's owner is a Sales member, then his superiors (his leader / manager / boss) can see the record as well, but others, such as workmates at the same level, cannot, unless a superior shares the customer with them. If the record's owner is the boss, no one except the boss himself can see it.
My Design is like this (I want to improve it to make it more simple and clear):
Table:
departments:
id (P.K. department id)
d_name (department name)
p_id (parent department id)
employees
id (P.K. employee id)
e_name (employee name)
employee_roles
id (P.K.)
e_id (employee id)
d_id (department id)
customers
id (P.K. customer id)
c_name (customer name)
c_phone (customer phone)
permissions
id (P.K.)
c_id (customer id)
e_id (owner employee id)
d_id (the department this customer belongs to)
share_to (id of the employee this customer is shared with)
P.S.: each employee can have multiple roles; for example, employee A can be the manager of department_I and at the same time a sales member of department_II >> Team_X.
So, when an employee logs in to the application, by querying the employee_roles table we can get all of the department ids and sub-department ids, and save them into an array.
Then I can use this array to query from permissions table and join it with customers table to get all the customers this employee should see. The SQL might look like this:
SELECT * FROM customers AS a INNER JOIN permissions AS b ON a.id =
b.c_id AND (b.d_id IN ${DEP_ARRAY} OR e_id = ${LOGIN_EMPLOYEE_ID} OR
share_to = ${LOGIN_EMPLOYEE_ID})
I don't really like the above SQL, especially the "IN" clause, since I am afraid it will slow down the query, since there are about 1 million records or even more in the customer table; and, there will be as many records as the customers table in the permissions table, the INNER JOIN might be very slow too. (So what I care about is the performance like everyone :))
To my best knowledge, this is the best design I can work out, could you teachers please help to give me some advice on this question? If you need anything more info, please let me know.
Any advice would be appreciated!
Thanks a million in advance!!
Do not use an array; use a table, i.e. the value of a select statement. And stop worrying about performance until you know more of the basics about thinking in terms of tables and queries.
The point of the relational model is that if you structure your data as tables then you can looplessly describe the output table and the DBMS figures out how to calculate it. See this. Do not think about "joining"; think about describing the result. Whatever the DBMS ends up doing is its business not yours. Only after you become knowledgeable about variations in descriptions and options for descriptions will you have basic knowledge to learn about performance.
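As a sketch of what "use a table, not an array" could look like here: a recursive common table expression can describe the employee's departments and all their sub-departments, and that result is used directly in place of ${DEP_ARRAY}. Everything below is an assumption-laden illustration run in SQLite through Python: the sample rows and the helper name visible_customers are hypothetical, and the filter simply mirrors the three conditions of the query in the question without judging whether they fully implement the spec.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, d_name TEXT, p_id INTEGER);
    CREATE TABLE employee_roles (id INTEGER PRIMARY KEY, e_id INTEGER, d_id INTEGER);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, c_name TEXT, c_phone TEXT);
    CREATE TABLE permissions (id INTEGER PRIMARY KEY, c_id INTEGER, e_id INTEGER,
                              d_id INTEGER, share_to INTEGER);
    -- Hypothetical sample rows (assumption).
    INSERT INTO departments VALUES (1, 'Company', NULL), (2, 'Dept I', 1),
                                   (3, 'Team X', 2), (4, 'Dept II', 1);
    INSERT INTO employee_roles VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4);
    INSERT INTO customers VALUES (100, 'C-A', NULL), (101, 'C-B', NULL), (102, 'C-C', NULL);
    INSERT INTO permissions VALUES (1, 100, 3, 3, NULL),
                                   (2, 101, 4, 4, 2),
                                   (3, 102, 4, 4, NULL);
""")

QUERY = """
    WITH RECURSIVE visible_departments(id) AS (
        -- departments in which the employee holds a role ...
        SELECT d_id FROM employee_roles WHERE e_id = :emp
        UNION
        -- ... plus every department below them
        SELECT d.id
        FROM departments d
        JOIN visible_departments v ON d.p_id = v.id
    )
    SELECT DISTINCT c.id
    FROM customers c
    JOIN permissions p ON p.c_id = c.id
    WHERE p.d_id IN (SELECT id FROM visible_departments)
       OR p.e_id = :emp
       OR p.share_to = :emp
    ORDER BY c.id
"""

def visible_customers(employee_id):
    """Customer ids the given employee may see under the question's filter."""
    return [row[0] for row in conn.execute(QUERY, {"emp": employee_id})]

print(visible_customers(2))  # manager of Dept I in the sample data
```

The whole description is handed to the DBMS in one statement, so there is no application-side array to build and no list to interpolate into the SQL text.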