I am reading a book on SQL and am stuck on an example based on the database schema shown in the image below.
The book gives the following example:
Suppose we wish to answer the query “List the names of instructors
along with the titles of courses that they teach.” The query can be written in
SQL as follows:
select name, title
from instructor natural join teaches, course
where teaches.course_id = course.course_id;
The book then states:
"Note that teaches.course_id in the where clause refers to the course_id field of the natural join result, since this field in turn came from the teaches relation."
It goes on to state, in bold:
"It is not possible to use attribute names containing the original relation names, for instance instructor.name or teaches.course_id, to refer to attributes in the natural join result; we can, however, use attribute names such as name and course_id, without the relation names."
(Referring to the query above.) If that is not possible, how was the author able to write the condition as
teaches.course_id = course.course_id
How can teaches.course_id refer to the course_id attribute of the natural join result? The author's two statements seem to contradict each other. Please explain the author's point of view.
Ignore what the book has to say about NATURAL JOIN. Just avoid it. NATURAL JOIN is a bug waiting to happen. Why? The join keys are defined simply by naming conventions on columns in the tables -- any columns that happen to have the same names are used. In fact, NATURAL JOIN ignores properly defined FOREIGN KEY relationships, and it hides the keys actually used for matching.
So, be explicit and use the ON or USING clauses instead. These are explicit about the keys being used and the code is more understandable and maintainable.
Then, follow a simple rule: Never use commas in the FROM clause; always use explicit JOIN syntax.
So, a good way to write your query would be something like this:
select i.name, c.title
from instructor i inner join
teaches t
on t.instructor_id = i.instructor_id inner join
course c
on t.course_id = c.course_id;
Note that there is no where clause and all the columns are qualified, meaning that they specify the table they are coming from.
Also, I don't see an instructor_id column in the teaches table, so this is just an example of what reasonable code would look like.
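To see why the implicit keys matter, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The table layout loosely follows the book's university schema, but the exact columns and all rows are assumptions made up for illustration. Because instructor and course both happen to have a dept_name column, chaining NATURAL JOIN silently adds that column to the join keys and drops a row that the explicit join keeps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature schema: instructor and course both carry dept_name.
conn.executescript("""
CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT);
CREATE TABLE teaches    (id INTEGER, course_id TEXT);
CREATE TABLE course     (course_id TEXT PRIMARY KEY, title TEXT, dept_name TEXT);
INSERT INTO instructor VALUES (1, 'Katz', 'Comp. Sci.'), (2, 'Crick', 'Biology');
INSERT INTO teaches    VALUES (1, 'CS-101'), (2, 'BIO-101'), (2, 'CS-101');
INSERT INTO course     VALUES ('CS-101', 'Intro to CS', 'Comp. Sci.'),
                              ('BIO-101', 'Intro to Biology', 'Biology');
""")

# NATURAL JOIN matches on every shared column name: id, then course_id AND dept_name.
natural_rows = conn.execute("""
    SELECT name, title
    FROM instructor NATURAL JOIN teaches NATURAL JOIN course
    ORDER BY name, title
""").fetchall()

# Explicit joins state the keys, so dept_name plays no part in the match.
explicit_rows = conn.execute("""
    SELECT i.name, c.title
    FROM instructor i
    INNER JOIN teaches t ON t.id = i.id
    INNER JOIN course c ON c.course_id = t.course_id
    ORDER BY i.name, c.title
""").fetchall()

print(natural_rows)
print(explicit_rows)
```

In this data the Biology instructor who teaches a Comp. Sci. course is missing from the NATURAL JOIN result, and nothing in the query text hints at why.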
I have the database shown in the image below. I want
to count all students from Subject with name Psychology and class with name Class5.
the percentage of students with status "Something" from subject with name Psychology and class with name Class5.
All students and the class name from Class "Class6" that are male.
I've tried for example
(in english:)
SELECT COUNT(student_name) AS NumberOfStudents FROM student_srms JOIN class_srms JOIN subject_srms WHERE class_srms.class_name='Class5' AND subject_srms.subject_name='Psychology'
But this returns NumberOfStudents = 20, which is the total number of student entries.
The issue likely stems from your FROM clause. It's not enough to just say JOIN. You need to specify the relationship of the columns between the two tables being joined with an ON clause:
FROM student_srms
JOIN class_srms
ON student_srms.student_id = class_srms.student_id
JOIN subject_srms
ON class_srms.subject_id = subject_srms.subject_id
I believe MySQL has a NATURAL JOIN, which joins, without an ON clause, on the columns that have the same name in both tables. That feels dirty to me and could cause failures later in an application's lifecycle if new columns are introduced that share names but not relationships, so I would just steer clear of it.
I have a suspicion that your diagram showing tables/columns is incorrect based on the error you are reporting in the comments. Instead, try (and I'm totally guessing blind here at this point):
FROM student_srms
JOIN student_class
ON student_srms.student_id = student_class.student_id
JOIN class_srms
ON student_class.class_id = class_srms.class_id
JOIN subject_srms
ON class_srms.subject_id = subject_srms.subject_id
That adds in that student_class relationship table so you can make the jump from student to class tables. Fingers crossed.
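The effect of leaving out the ON clause can be reproduced with a small sketch. It uses Python's built-in sqlite3 module, which, like MySQL, treats a JOIN without ON as a cross product. The two-table layout and the rows are simplified assumptions, not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified schema: each student belongs to exactly one class.
conn.executescript("""
CREATE TABLE class_srms   (class_id INTEGER PRIMARY KEY, class_name TEXT);
CREATE TABLE student_srms (student_id INTEGER PRIMARY KEY, student_name TEXT, class_id INTEGER);
INSERT INTO class_srms   VALUES (5, 'Class5'), (6, 'Class6');
INSERT INTO student_srms VALUES (1, 'Ann', 5), (2, 'Bob', 5), (3, 'Cid', 6);
""")

# Without ON, every student is paired with every class, so the WHERE
# filter on class_name keeps one pairing per student: all of them.
count_without_on = conn.execute("""
    SELECT COUNT(student_name)
    FROM student_srms
    JOIN class_srms
    WHERE class_srms.class_name = 'Class5'
""").fetchone()[0]

# With ON, a student is only paired with the class it actually belongs to.
count_with_on = conn.execute("""
    SELECT COUNT(student_name)
    FROM student_srms
    JOIN class_srms ON class_srms.class_id = student_srms.class_id
    WHERE class_srms.class_name = 'Class5'
""").fetchone()[0]

print(count_without_on, count_with_on)
```

The first count equals the total number of students, which matches the symptom in the question; the second counts only the students in Class5.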
For example, a bookpub database contains the following tables (pseudocode):
book (key: isbn)
bookauthor (key:author_id, isbn)
author (key: author_id)
The following query returns the number of books by each author:
select lastname, firstname, count(isbn)
from author
join bookauthor using (author_id)
group by lastname, firstname;
However, the following query also produces identical results in MySQL without complaint:
select lastname, firstname, count(isbn)
from author
join bookauthor using (author_id)
group by author_id;
So why shouldn't author_id be used instead of lastname, firstname?
I might add that the formal SQL spec contains the following:
All non-aggregate groups in a SELECT expression list or HAVING expression list must be included in the GROUP BY clause.
Can somebody please interpret this? What is a "non-aggregate group"? Why not just say "columns"? Furthermore, what is an "expression list"? Does an expression in this case always evaluate to a column?
No SQL implementation is 100% true to the ANSI definition. Some things are missing, some things are added, some things are just different.
In MySQL's case, it was chosen to not enforce the restriction you mention:
All non-aggregate groups in a SELECT expression list or HAVING expression list must be included in the GROUP BY clause.
This allows the GROUP BY primary_key syntax that you have noticed, instead of the clunky (and actually slightly more costly) GROUP BY property1, property2, property3, etc. It's clean and elegant.
There are downsides, however; misuse and misunderstanding are rife among web developers because of MySQL, and the flexibility allows bugs to slip through undetected. I recommend avoiding it in most cases, as the performance gains are minimal and the potential for bugs can be huge.
An example of a bug that slips through could be:
SELECT
person.name,
address.city
FROM
person
INNER JOIN
address
ON address.person_id = person.id
GROUP BY
person.id
MySQL will pretty much always allow that code to execute, even if the address table can have multiple entries per person (I've lived at more than one address).
The code could possibly need to be as follows, but MySQL will not enforce this unless the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5):
SELECT
person.name,
address.move_in_date,
address.city
FROM
person
INNER JOIN
address
ON address.person_id = person.id
GROUP BY
person.id,
address.id
The more joins involved, the more chance the GROUP BY needs to include multiple primary keys, or other fields.
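The difference between the two GROUP BY clauses above can be seen in a small sketch. It uses Python's built-in sqlite3 module, which accepts non-aggregated columns outside the GROUP BY much like MySQL's permissive mode does; the schema is cut down from the example and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one person with two addresses.
conn.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE address (id INTEGER PRIMARY KEY, person_id INTEGER, city TEXT);
INSERT INTO person  VALUES (1, 'Alice');
INSERT INTO address VALUES (10, 1, 'Leeds'), (11, 1, 'York');
""")

# Grouping by person.id alone collapses both addresses into one row;
# which city is reported is left to the engine.
loose_rows = conn.execute("""
    SELECT person.name, address.city
    FROM person
    INNER JOIN address ON address.person_id = person.id
    GROUP BY person.id
""").fetchall()

# Adding address.id to the GROUP BY keeps one row per address.
strict_rows = conn.execute("""
    SELECT person.name, address.city
    FROM person
    INNER JOIN address ON address.person_id = person.id
    GROUP BY person.id, address.id
    ORDER BY address.id
""").fetchall()

print(loose_rows)
print(strict_rows)
```

The first query returns a single row for Alice with one of her two cities and no indication that the other exists; the second returns both.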
The behavior you get is that MySQL arbitrarily chooses what values to return when the code is ambiguous. It is explicitly non-deterministic. The following code could give the city from one address and the city's population from another address :-/
SELECT
person.name,
address.move_in_date,
address.city,
city.population
FROM
person
INNER JOIN
address
ON address.person_id = person.id
INNER JOIN
city
ON address.city_id = city.id
GROUP BY
person.id
People then try to abuse this with "tricks" like the following...
SELECT
person.name,
address.move_in_date,
address.city,
city.population
FROM
person
INNER JOIN
address
ON address.person_id = person.id
INNER JOIN
city
ON address.city_id = city.id
GROUP BY
person.id
ORDER BY
person.id,
city.population DESC
This happens to cause the MySQL engine to choose the city with the highest population. Useful for finding the most populous city each person has lived in? Well, it's not actually guaranteed to work. It's still arbitrary; if the tables are being written to, or the database is in a distributed environment, or the MySQL code changes, etc, the behavior could change.
But people do it anyway. Because "well, it's always worked for me so far!"...
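For comparison, here is one way to ask for the most populous city per person without relying on how the engine happens to pick rows. It is only a sketch: it runs against Python's built-in sqlite3 module, the city.name column and all rows are assumptions, and a correlated subquery is just one of several standard techniques for this kind of query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, loosely following the example above.
conn.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE city    (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
CREATE TABLE address (id INTEGER PRIMARY KEY, person_id INTEGER, city_id INTEGER);
INSERT INTO person  VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO city    VALUES (1, 'Leeds', 800000), (2, 'York', 200000);
INSERT INTO address VALUES (10, 1, 2), (11, 1, 1), (12, 2, 2);
""")

# Keep only the rows whose city population equals the maximum
# population among all cities that the same person has lived in.
biggest_city = conn.execute("""
    SELECT p.name, c.name, c.population
    FROM person p
    INNER JOIN address a ON a.person_id = p.id
    INNER JOIN city c ON c.id = a.city_id
    WHERE c.population = (
        SELECT MAX(c2.population)
        FROM address a2
        INNER JOIN city c2 ON c2.id = a2.city_id
        WHERE a2.person_id = p.id
    )
    ORDER BY p.id
""").fetchall()

print(biggest_city)
```

Note that if a person has lived in two cities with the same maximum population, this query returns both rows rather than silently choosing one.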
In the GROUP BY clause you list the fields and expressions whose values will partition your result set. For those groups you can calculate aggregate functions like COUNT, SUM, etc.
MySQL lets you select non-aggregate expressions or fields that are not present in the GROUP BY clause, but this is non-standard SQL. The result will be non-deterministic if those fields have more than one value within a group.
If you group by the primary key of a table, the result will be deterministic for that table's columns, because there is only one row for each key.
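One practical consequence for the bookpub example is that grouping by author_id and grouping by lastname, firstname are not always interchangeable. The sketch below uses Python's built-in sqlite3 module and invented rows in which two different authors share the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: authors 1 and 2 are different people named John Smith.
conn.executescript("""
CREATE TABLE author     (author_id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
CREATE TABLE bookauthor (author_id INTEGER, isbn TEXT);
INSERT INTO author VALUES (1, 'Smith', 'John'), (2, 'Smith', 'John'), (3, 'Jones', 'Ann');
INSERT INTO bookauthor VALUES (1, 'isbn-a'), (1, 'isbn-b'), (2, 'isbn-c'), (3, 'isbn-d');
""")

# Grouping by name merges the two John Smiths into a single group.
by_name = conn.execute("""
    SELECT lastname, firstname, COUNT(isbn)
    FROM author
    JOIN bookauthor USING (author_id)
    GROUP BY lastname, firstname
    ORDER BY lastname
""").fetchall()

# Grouping by the primary key keeps one group per author; lastname and
# firstname are single-valued within each group, so they are well defined.
by_id = conn.execute("""
    SELECT lastname, firstname, COUNT(isbn)
    FROM author
    JOIN bookauthor USING (author_id)
    GROUP BY author_id
    ORDER BY author_id
""").fetchall()

print(by_name)
print(by_id)
```

With these rows the name-based grouping reports one John Smith with three books, while the key-based grouping reports two authors with two books and one book.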
I am trying to solve this "issue", but so far without success. What I'd like to do is create a query that selects all friends of a specific actor. Let's say I want to get the first name, last name and age of Jason Statham's friends.
Below is an image of the tables.
PS: Are those tables organized correctly (especially the foreign keys)?
Thanks in advance
Does this do what you're looking for?
SELECT actors.first_name,
actors.last_name
FROM actors
WHERE actors.login IN
(
SELECT friendslist.loginf
FROM friendslist
WHERE friendslist.logina = 'xstad'
)
You'll need to include the Actors table twice - once for the focus person (Jason Statham) and once for his friends.
SELECT CONCAT(A.first_name," ",A.last_name) AS Actor, CONCAT(B.first_name," ",B.last_name) AS Friend
FROM Actors AS A
JOIN [Friends List] AS F on A.login=F.loginA
JOIN Actors AS B on F.loginF=B.login
ORDER BY A.last_name, B.last_name
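The self-join above lists every actor together with every friend. Narrowed down to a single actor, it could look like the following sketch, which runs against Python's built-in sqlite3 module. The table name friendslist, the columns logina and loginf, and all rows are assumptions taken from the other answers or invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data.
conn.executescript("""
CREATE TABLE actors      (login TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE friendslist (logina TEXT, loginf TEXT);
INSERT INTO actors VALUES ('xstad', 'Jason', 'Statham'), ('xwill', 'Bruce', 'Willis'),
                          ('xjoli', 'Angelina', 'Jolie'), ('xpitt', 'Brad', 'Pitt');
INSERT INTO friendslist VALUES ('xstad', 'xwill'), ('xstad', 'xjoli'), ('xpitt', 'xjoli');
""")

# Alias a is the focus person, alias b is each of that person's friends.
friends = conn.execute("""
    SELECT b.first_name, b.last_name
    FROM actors AS a
    JOIN friendslist AS f ON f.logina = a.login
    JOIN actors AS b ON b.login = f.loginf
    WHERE a.first_name = ? AND a.last_name = ?
    ORDER BY b.last_name
""", ("Jason", "Statham")).fetchall()

print(friends)
```

Filtering on the name rather than on the login keeps the query readable, at the cost of matching every actor with that name; filtering on a.login is the stricter choice.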
If Friends_List is rows where "[loginA] and [loginF] are friends" then rows should appear in pairs, < a1,a2> and < a2,a1>. I'll assume it means "[loginA] considers [loginF] a friend" and that in your query "all friends of an actor" means "all actors that an actor considers a friend".
You can do this with natural join. It joins on common columns and returns only one column with that name. If you want two differently named columns joined to each other then you have to rename one. Unfortunately in SQL you can't do that by just mentioning the one column, you must mention them all.
-- first_name, last_name of rows satisfying: [login] has name [first_name] [last_name] AND 'xstad' considers [login] a friend
SELECT first_name, last_name
FROM Actors
NATURAL JOIN
(SELECT loginA, loginF AS login
FROM
(SELECT * FROM Friends_List WHERE loginA='xstad') AS f
) AS friends
(You could collapse the nested selects but I am illustrating the structure.)
You could also use NATURAL JOIN without renaming, but explicitly equating columns:
-- first_name, last_name of rows satisfying: [login] has name [first_name] [last_name] AND [loginA] considers [loginF] a friend AND loginA='xstad' AND login=loginF
SELECT first_name,last_name
FROM Actors
NATURAL JOIN Friends_List
WHERE loginA='xstad' AND login=loginF
EDIT
Natural join supports a design and programming style that most SQL programmers are not familiar with. In this approach relational algebra operators parallel logic operators: the rows of a NATURAL JOIN result satisfy the AND of the statements that the argument rows satisfy; UNION corresponds to OR; EXCEPT to AND NOT; PROJECT on all but some columns {C,...} to EXISTS C,...; etc. This is simpler than having to deal with the dotted duplicate columns from SQL INNER JOIN. Unfortunately SQL does not give all the relevant support (e.g. renaming columns, projecting out columns, no SQL pseudo-3VL, optimizations).
EDIT corrected 1st query
I think the design is straightforward so no explanation is required.
Question: Is there a way to indicate the language of the name column in the courses table? Maybe by linking it with the languages table?
Edit: Or maybe separate the name-language pair into another table with an id and reference it in the courses table?
Edit2: The course language and the name language may be different.
Question: Is there a way to indicate the language of the name column in the courses table? Maybe by linking it with the languages table?
There's no need. The following query will give you what you want:
SELECT c.name, COALESCE(l.name,'default') as language
FROM courses c
LEFT JOIN courses_has_languages cl ON (cl.courses_course_id = c.course_id)
LEFT JOIN languages l ON (l.language_id = cl.languages_language_id)
Of course it would be even better if you just renamed your columns so the query can be rewritten as:
SELECT c.name, COALESCE(l.name,'default') as language
FROM courses c
LEFT JOIN courses_has_languages cl ON (cl.course_id = c.id)
LEFT JOIN languages l ON (l.id = cl.language_id)
But that's just my preference.
If I understand you it sounds like you're on the right track.
Make language_id a foreign key in the courses table that points back to languages.
Assuming that the diagram correctly models the data then no, you do not need any additional relationships.
Instead, you will retrieve the languages for a course by JOINing the tables in a SELECT statement. If this is an operation you will perform frequently, you can encapsulate that SELECT statement by CREATEing a VIEW.
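As a rough illustration of wrapping that SELECT in a VIEW, the sketch below uses Python's built-in sqlite3 module. The table and column names follow the queries quoted earlier in this thread; the view name course_languages and all rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one course with two languages, one course with none.
conn.executescript("""
CREATE TABLE courses   (course_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE languages (language_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE courses_has_languages (courses_course_id INTEGER, languages_language_id INTEGER);
INSERT INTO courses   VALUES (1, 'Algebra'), (2, 'History');
INSERT INTO languages VALUES (1, 'English'), (2, 'German');
INSERT INTO courses_has_languages VALUES (1, 1), (1, 2);

-- The view hides the two joins behind a single name.
CREATE VIEW course_languages AS
    SELECT c.name AS course_name, COALESCE(l.name, 'default') AS language
    FROM courses c
    LEFT JOIN courses_has_languages cl ON cl.courses_course_id = c.course_id
    LEFT JOIN languages l ON l.language_id = cl.languages_language_id;
""")

# Callers can now query the view like an ordinary table.
rows = conn.execute("""
    SELECT course_name, language
    FROM course_languages
    ORDER BY course_name, language
""").fetchall()

print(rows)
```

The LEFT JOINs keep courses that have no language assigned, which is why the second course shows up with the fallback value.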
I have a table called faq. This table consists of the fields faq_id and faq_subject.
I have another table called article which consists of article_id,ticket_id,a_body and which stores articles in a specific ticket. Naturally there is also a table "ticket" with fields ticket_id,ticket_number.
I want to retrieve a result table in format:
ticket_number,faq_id,faq_subject.
In order to do this I need to search for faq_id in the article.a_body field using the LIKE operator with % wildcards.
My question is, how can I do this dynamically so that a single SQL statement returns one result table in the format ticket_number,faq_id,faq_subject?
I tried multiple configurations of UNION ALL, LEFT JOIN, LEFT OUTER JOIN statements, but they all return either too many rows, or have different problems.
Is this even possible with MySQL, and is it possible to write an SQL statement which includes #variables and can take care of this?
First off, that kind of a design is problematic. You have certain data embedded within another column, which is going to cause logic as well as performance problems (since you can't index the a_body in such a way that it will help the JOIN). If this is a one-time thing then that's one issue, but otherwise you're going to have problems with this design.
Second, consider this example: You're searching for faq_id #123. You have an article that includes faq_id 4123. You're going to end up with a false match there. You can embed the faq_id values in the text with some sort of mark-up (for example, [faq_id:123]), but at that point you might as well be saving them off in another table as well.
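The false match described above is easy to reproduce. The following sketch uses Python's built-in sqlite3 module, so string concatenation is written with || where MySQL would use CONCAT; the rows and the [faq_id:...] mark-up are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the article mentions FAQ 4123 only.
conn.executescript("""
CREATE TABLE faq     (faq_id INTEGER PRIMARY KEY, faq_subject TEXT);
CREATE TABLE ticket  (ticket_id INTEGER PRIMARY KEY, ticket_number TEXT);
CREATE TABLE article (article_id INTEGER PRIMARY KEY, ticket_id INTEGER, a_body TEXT);
INSERT INTO faq     VALUES (123, 'Reset password'), (4123, 'Billing');
INSERT INTO ticket  VALUES (1, 'T-1');
INSERT INTO article VALUES (1, 1, 'See [faq_id:4123] for details');
""")

# Bare substring search: '%123%' also matches the text '4123'.
naive_rows = conn.execute("""
    SELECT t.ticket_number, f.faq_id
    FROM article a
    INNER JOIN faq f ON a.a_body LIKE '%' || f.faq_id || '%'
    INNER JOIN ticket t ON t.ticket_id = a.ticket_id
    ORDER BY f.faq_id
""").fetchall()

# Searching for the full mark-up pins down both ends of the number.
marked_rows = conn.execute("""
    SELECT t.ticket_number, f.faq_id
    FROM article a
    INNER JOIN faq f ON a.a_body LIKE '%[faq_id:' || f.faq_id || ']%'
    INNER JOIN ticket t ON t.ticket_id = a.ticket_id
    ORDER BY f.faq_id
""").fetchall()

print(naive_rows)
print(marked_rows)
```

Even with the mark-up, the join still has to scan every article body for every FAQ row, which is the performance concern raised in the first point.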
The following query should work (I think that MySQL supports CAST, if not then you might need to adjust that).
SELECT
T.ticket_number,
F.faq_id,
F.faq_subject
FROM
article A
INNER JOIN faq F ON
A.a_body LIKE CONCAT('%', F.faq_id, '%')
INNER JOIN ticket T ON
T.ticket_id = A.ticket_id
EDIT: Corrected to use CONCAT
SELECT DISTINCT t.ticket_number, f.faq_id, f.faq_subject
FROM faq f
INNER JOIN article a ON (a.a_body RLIKE CONCAT('faq_id: ', f.faq_id))
INNER JOIN ticket t ON (t.ticket_id = a.ticket_id)
WHERE somecriteria