Combining LIKE and EXISTS? - mysql

Here is the database I'm using: https://drive.google.com/file/d/1ArJekOQpal0JFIr1h3NXYcFVngnCNUxg/view?usp=sharing
Find the papers whose title contain the string 'data' and where at least one author is
from the department with deptnum 100. List the panum and title of these papers. You
must use the EXISTS operator. Ensure your query is case-insensitive.
I'm unsure how to output the total number of papers for each academic.
My attempt at this question:
SELECT panum, title
FROM department NATURAL JOIN paper
WHERE UPPER(title) LIKE ('%data%') AND EXISTS (SELECT deptnum FROM
department WHERE deptnum = 100);
This seems to come up empty. I'm not sure what I'm doing wrong, can LIKE and EXISTS be combined?
Thank you.

Don't use natural join! It is an abomination because it does not make use of explicitly declared foreign key relationships. Explicitly list your join keys, so the queries are more understandable and more maintainable.
That said, your subquery is the problem. I would expect a query more like this:
SELECT p.panum, p.title
FROM paper p
WHERE lower(p.title) LIKE '%data%' AND
EXISTS (SELECT 1
FROM authors a
WHERE a.author = p.author AND -- or whatever the column should be
a.deptnum = 100
);

Since the task requires EXISTS, the operator needs to be applied to the author table, not the department table. The query inside EXISTS needs to be correlated with the outer query on paper, so there should be no JOIN at the top level:
SELECT p.PANUM, p.TITLE
FROM paper p
WHERE p.Title LIKE ('%data%') AND EXISTS (
SELECT *
FROM author a
JOIN academic ac ON ac.ACNUM=a.ACNUM
WHERE a.PANUM=p.PANUM AND ac.DEPTNUM=100
)
Note that since the author table lacks DEPTNUM, you need a join inside the EXISTS subquery to bring in the matching academic row for its DEPTNUM column.
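As a quick check of the correlated EXISTS approach, here is a minimal sketch using Python's sqlite3 module with an in-memory database. The table and column names follow the answer above, but the sample rows are invented for illustration, and SQLite stands in for MySQL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE paper    (panum INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE academic (acnum INTEGER PRIMARY KEY, deptnum INTEGER);
    CREATE TABLE author   (panum INTEGER, acnum INTEGER);

    -- Hypothetical sample data, not taken from the linked database.
    INSERT INTO paper VALUES
        (1, 'Big Data Systems'),   -- title matches, one author in dept 100
        (2, 'Database Tuning'),    -- title matches, no author in dept 100
        (3, 'Compiler Design');    -- title does not match
    INSERT INTO academic VALUES (10, 100), (20, 200);
    INSERT INTO author VALUES (1, 10), (1, 20), (2, 20), (3, 10);
""")

query = """
    SELECT p.panum, p.title
    FROM paper p
    WHERE LOWER(p.title) LIKE '%data%'        -- case-insensitive match
      AND EXISTS (SELECT 1
                  FROM author a
                  JOIN academic ac ON ac.acnum = a.acnum
                  WHERE a.panum = p.panum     -- correlation with outer row
                    AND ac.deptnum = 100)
    ORDER BY p.panum
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only paper 1 satisfies both conditions; paper 2 matches the title pattern but has no author from department 100.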

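The condition UPPER(title) LIKE ('%data%') cannot match under a case-sensitive comparison, since an uppercased version of whatever is in title will never contain the lowercase letters data. Either compare UPPER(title) against '%DATA%' or compare LOWER(title) against '%data%'.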

select p.TITLE,p.PANUM from PAPER p where TITLE like '%data%'
AND EXISTS(
SELECT * FROM AUTHOR a join ACADEMIC d
on d.ACNUM=a.ACNUM where d.DEPTNUM=100 AND a.PANUM=p.PANUM)

Related

Using the WHERE clause in conjunction with NATURAL JOIN SQL?

I am reading a book on SQL and I am stuck on an example which is related to a database schema as shown below in the image.
The example below solves the query as stated in the book :
Suppose we wish to answer the query “List the names of instructors
along with the titles of courses that they teach.” The query can be written in
SQL as follows:
select name, title
from instructor natural join teaches, course
where teaches.course_id = course.course_id;
Now the book states that
"Note that teaches.course_id in the where clause refers to the course_id field of the natural join result, since this field in turn came from the teaches relation."
Again the book states in BOLD that:
"It is not possible to use attribute names containing the original relation names, for instance instructor.name or teaches.course_id, to refer to attributes in the natural join result; we can, however, use attribute names such as name and course_id, without the relation names."
(Referring to the query above) If it is not possible, then how was the author able to write the condition as
teaches.course_id = course.course_id
How can teaches.course_id refer to the natural join attribute "course_id"? The author's argument seems ambiguous to me. Please explain the author's point of view.
Ignore what the book has to say about NATURAL JOIN. Just avoid it. NATURAL JOIN is a bug waiting to happen. Why? The join keys are defined simply by naming conventions on columns in the tables -- any columns that happen to have the same names are used. In fact, NATURAL JOIN ignores properly defined FOREIGN KEY relationships, and it hides the keys actually used for matching.
So, be explicit and use the ON or USING clauses instead. These are explicit about the keys being used and the code is more understandable and maintainable.
Then, follow a simple rule: Never use commas in the FROM clause; always use explicit JOIN syntax.
So, a good way to write your query would be something like this:
select i.name, c.title
from instructor i inner join
teaches t
on t.instructor_id = i.instructor_id inner join
course c
on t.course_id = c.course_id;
Note that there is no where clause and all the columns are qualified, meaning that they specify the table they are coming from.
Also, I don't see an instructor_id column in the teaches table, so this is just an example of what reasonable code would look like.
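To make the "bug waiting to happen" concrete, here is a small sketch using Python's sqlite3 module. The schema is an assumption made for illustration (instructor and course both carry a dept_name column, and teaches links them by id and course_id); it is not taken from the image in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: dept_name appears in both instructor and course.
    CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT);
    CREATE TABLE course     (course_id TEXT PRIMARY KEY, title TEXT, dept_name TEXT);
    CREATE TABLE teaches    (id INTEGER, course_id TEXT);

    INSERT INTO instructor VALUES (1, 'Ada', 'CS'), (2, 'Bob', 'Math');
    INSERT INTO course VALUES ('CS101', 'Intro to CS', 'CS'),
                              ('MA201', 'Algebra', 'Math');
    -- Ada (CS) also teaches a Math course.
    INSERT INTO teaches VALUES (1, 'CS101'), (1, 'MA201'), (2, 'MA201');
""")

# NATURAL JOIN matches on every shared column name, including dept_name.
natural_rows = conn.execute("""
    SELECT name, title
    FROM instructor NATURAL JOIN teaches NATURAL JOIN course
    ORDER BY name, title
""").fetchall()

# Explicit join keys: only the intended columns are compared.
explicit_rows = conn.execute("""
    SELECT i.name, c.title
    FROM instructor i
    INNER JOIN teaches t ON t.id = i.id
    INNER JOIN course c ON c.course_id = t.course_id
    ORDER BY i.name, c.title
""").fetchall()

print(natural_rows)
print(explicit_rows)
```

With this data the NATURAL JOIN version silently drops Ada's Algebra course, because it also requires instructor.dept_name to equal course.dept_name. The explicit version returns all three teaching assignments.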

How to create this very complex MySQL query

I am having serious trouble creating the proper query for the following:
I have 3 tables.
authors ( author_id(int), author_name(varchar) ),
books ( book_id(int), book_title(varchar) )
contribution ( book_id(int), author_id(int), prec(double)) -- stores the percentage of each author's involvement in the creation of a specific book (so one book may have more than one author).
The query I find so difficult has to return book_id, book_title and, in a third column, all authors of that book concatenated with commas and ordered by percentage of participation. The result should have as many rows as there are books in the books table, and each row should hold the title and the authors of one book. How can such a MySQL query be written?
The name of the third table and the join criteria are a little vague, but this should be close...
The key here is the function GROUP_CONCAT(), which combines multiple rows into one value per group, based on the GROUP BY columns. GROUP_CONCAT also lets you define a separator as well as an ORDER BY.
SELECT B.Book_ID
, B.Book_Title
, group_concat(A.Author_name order by ABP.Prec Desc separator ', ') as Authors
FROM Author_Book_Percent ABP
INNER JOIN AUTHORS A
on ABP.Author_ID = A.Author_ID
INNER JOIN BOOKS B
on ABP.Book_ID = B.Book_ID
GROUP BY B.Book_ID, B.Book_Title
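For readers who want to see what the ordered concatenation produces, here is a rough sketch in Python with sqlite3. It does not call GROUP_CONCAT itself: whether ORDER BY is accepted inside an aggregate depends on the SQLite version, so the grouping is emulated in Python with itertools.groupby instead. The table names follow the question (contribution rather than Author_Book_Percent), and the sample rows are invented.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, author_name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, book_title TEXT);
    CREATE TABLE contribution (book_id INTEGER, author_id INTEGER, prec REAL);

    -- Hypothetical sample data.
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO books VALUES (1, 'SQL Basics'), (2, 'Joins in Depth');
    INSERT INTO contribution VALUES (1, 1, 30.0), (1, 2, 70.0), (2, 3, 100.0);
""")

# One row per (book, author), already sorted the way the final list should read.
rows = conn.execute("""
    SELECT b.book_id, b.book_title, a.author_name
    FROM books b
    INNER JOIN contribution c ON c.book_id = b.book_id
    INNER JOIN authors a ON a.author_id = c.author_id
    ORDER BY b.book_id, c.prec DESC
""").fetchall()

# Collapse consecutive rows of the same book into one comma-separated string,
# which is the effect of GROUP_CONCAT(... ORDER BY prec DESC SEPARATOR ', ').
result = [
    (book_id, title, ", ".join(name for _, _, name in group))
    for (book_id, title), group in groupby(rows, key=lambda r: (r[0], r[1]))
]
print(result)
```

Ben is listed before Ann for the first book because his percentage is higher.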

Is INTERSECT preferred over subquery?

I am working on question
Find all students who do not appear in the Likes table (as a student who likes or is liked) and return their names and grades. Sort by grade, then by name within each grade.
I proposed doing the following, getting all people who don't have Likes and intersecting those with the people who don't like anyone:
SELECT name, grade
FROM Highschooler h1
LEFT JOIN Likes l1
ON (l1.ID1 = h1.ID)
WHERE l1.ID1 IS NULL
INTERSECT
SELECT name, grade
FROM Highschooler h1
LEFT JOIN Likes l1
ON (l1.ID2 = h1.ID)
WHERE l1.ID2 IS NULL
ORDER BY grade, name
An alternative way to do this is with a subquery (as I found online)
select name, grade from Highschooler H1
where H1.ID not in (select ID1 from Likes union select ID2 from Likes)
order by grade, name;
Which way is preferred? I think my method is more readable.
Like Adrien said, both are acceptable ways of doing it.
I think your way is more readable because of the capitalized keywords and better indentation, not because of the query itself.
This is how I would do it, for example:
SELECT name, grade
FROM Highschooler H1
LEFT JOIN Likes AS L1 ON L1.ID1 = H1.ID OR L1.ID2 = H1.ID
WHERE L1.ID1 IS NULL
ORDER BY grade, name
This case would be called an Anti-Join. Read more about those here
https://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/
Here is another nice blog post about the different kind of joins with some diagrams to accompany them:
http://blog.jooq.org/2015/10/06/you-probably-dont-use-sql-intersect-or-except-often-enough/
No, your query is not very readable. Read it in plain English: 1. Make a large list of all pupils and their likes. 2. If no likes record is found for a pupil, keep the pupil record nonetheless. 3. Then remove all records that do contain likes (thus only keeping those pupils who don't have likes). 4.-6. Do the same with the other person in the likes table. 7. Intersect the two result sets.
The original task was only: Find pupils that don't exist in the likes table.
select name, grade
from highschooler
where not exists
(
select *
from likes
where likes.id1 = highschooler.id
or likes.id2 = highschooler.id
)
order by grade, name;
Keep your queries as simple as possible. SQL is made to read more or less like you would word the task in English. It's not always possible to formulate a task in such simple form, but you can always try :-)
The anti-join pattern you are using in your intersect query is a trick to overcome weaknesses in young DBMS that don't deal well with IN and EXISTS yet. You should use tricks only when necessary, not when dealing with such a simple task as the one given.
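To compare the approaches from this thread side by side, here is a minimal sketch that runs the INTERSECT query, the NOT IN query and the NOT EXISTS query against the same data using Python's sqlite3 module. The sample students and likes are invented, and SQLite is only a stand-in for whichever DBMS the exercise uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Highschooler (ID INTEGER PRIMARY KEY, name TEXT, grade INTEGER);
    CREATE TABLE Likes (ID1 INTEGER, ID2 INTEGER);

    -- Hypothetical sample data.
    INSERT INTO Highschooler VALUES
        (1, 'Alice', 9), (2, 'Brad', 9), (3, 'Cora', 10),
        (4, 'Dan', 10), (5, 'Eve', 11);
    -- Alice likes Brad, Brad likes Cora; Dan and Eve never appear in Likes.
    INSERT INTO Likes VALUES (1, 2), (2, 3);
""")

queries = {
    "intersect": """
        SELECT name, grade
        FROM Highschooler h1
        LEFT JOIN Likes l1 ON l1.ID1 = h1.ID
        WHERE l1.ID1 IS NULL
        INTERSECT
        SELECT name, grade
        FROM Highschooler h1
        LEFT JOIN Likes l1 ON l1.ID2 = h1.ID
        WHERE l1.ID2 IS NULL
        ORDER BY grade, name
    """,
    "not_in": """
        SELECT name, grade
        FROM Highschooler h1
        WHERE h1.ID NOT IN (SELECT ID1 FROM Likes UNION SELECT ID2 FROM Likes)
        ORDER BY grade, name
    """,
    "not_exists": """
        SELECT name, grade
        FROM Highschooler
        WHERE NOT EXISTS (SELECT * FROM Likes
                          WHERE Likes.ID1 = Highschooler.ID
                             OR Likes.ID2 = Highschooler.ID)
        ORDER BY grade, name
    """,
}

results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
for label, rows in results.items():
    print(label, rows)
```

All three return the same rows here. Two differences are worth keeping in mind: INTERSECT removes duplicate (name, grade) pairs, so two unliked students with the same name and grade would collapse into one row, and NOT IN returns no rows at all if the subquery produces a NULL.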

Efficiently selecting from many-to-many relation in H2

I'm using H2, and I have a database of books (table Entries) and authors (table Persons), connected through a many-to-many relationship, itself stored in a table Authorship.
The database is fairly large (900'000+ persons and 2.5M+ books).
I'm trying to efficiently select the list of all books authored by at least one author whose name matches a pattern (LIKE '%pattern%'). The trick here is that the pattern should severely restrict the number of matching authors, and each author has a reasonably small number of associated books.
I tried two queries:
SELECT p.*, e.title FROM (SELECT * FROM Persons WHERE name LIKE '%pattern%') AS p
INNER JOIN Authorship AS au ON au.authorId = p.id
INNER JOIN Entries AS e ON e.id = au.entryId;
and:
SELECT p.*, e.title FROM Persons AS p
INNER JOIN Authorship AS au ON au.authorId = p.id
INNER JOIN Entries AS e ON e.id = au.entryId
WHERE p.name like '%pattern%';
I expected the first one to be much faster, as I'm joining a much smaller (sub)table of authors, however they both take as long. So long in fact that I can manually decompose the query into three selects and find the result I want faster.
When I try to EXPLAIN the queries, I observe that indeed they are very similar (a full join on the tables and only then a WHERE clause), so my question is: how can I achieve a fast select, that relies on the fact that the filter on authors should result in a much smaller join with the other two tables?
Note that I tried the same queries with MySQL and got results in line with what I expected (selecting first is much faster).
Thank you.
OK, here is something that finally worked for me.
Instead of running the query:
SELECT p.*, e.title FROM (SELECT * FROM Persons WHERE name LIKE '%pattern%') AS p
INNER JOIN Authorship AS au ON au.authorId = p.id
INNER JOIN Entries AS e ON e.id = au.entryId;
...I ran:
SELECT title FROM Entries e WHERE id IN (
SELECT entryId FROM Authorship WHERE authorId IN (
SELECT id FROM Persons WHERE name LIKE '%pattern%'
)
)
It's not exactly the same query, because now I don't get the author id as a column in the result, but that does what I wanted: take advantage of the fact that the pattern restricts the number of authors to a very small value to search only through a small number of entries.
What is interesting is that this worked great with H2 (much, much faster than the join), but with MySQL it is terribly slow. (This has nothing to do with the LIKE '%pattern%' part, see comments in other answers.) I suppose queries are optimized differently.
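The remark that the two forms are "not exactly the same query" goes beyond the missing author column. The following sketch, written with Python's sqlite3 module and invented sample data, shows that the join returns a title once per matching author while the nested IN form returns each title once. It says nothing about performance in H2 or MySQL; it only illustrates the difference in results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Entries (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Authorship (authorId INTEGER, entryId INTEGER);

    -- Hypothetical sample data: entry 1 has two authors matching the pattern.
    INSERT INTO Persons VALUES (1, 'Jane Smith'), (2, 'John Smith'), (3, 'Li Wei');
    INSERT INTO Entries VALUES (1, 'Co-written Book'), (2, 'Solo Book'), (3, 'Other Book');
    INSERT INTO Authorship VALUES (1, 1), (2, 1), (1, 2), (3, 3);
""")

join_titles = [row[0] for row in conn.execute("""
    SELECT e.title
    FROM Persons AS p
    INNER JOIN Authorship AS au ON au.authorId = p.id
    INNER JOIN Entries AS e ON e.id = au.entryId
    WHERE p.name LIKE '%Smith%'
    ORDER BY e.title
""")]

in_titles = [row[0] for row in conn.execute("""
    SELECT title FROM Entries e WHERE id IN (
        SELECT entryId FROM Authorship WHERE authorId IN (
            SELECT id FROM Persons WHERE name LIKE '%Smith%'
        )
    )
    ORDER BY title
""")]

print(join_titles)
print(in_titles)
```

If the join form is used and only distinct books are wanted, SELECT DISTINCT (or the IN form) is needed.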
SELECT * FROM Persons WHERE name LIKE '%pattern%' will always take a long time on a 900,000+ row table no matter what you do, because when the pattern starts with a % MySQL can't use an index and has to do a full table scan. You should look into full-text indexes and functions.
Well, since the like condition starts with a wildcard it will result in a full table scan which is always slow, no internal caching can take place.
If you want to do full-text searches, MySQL is not the best bet you have. Look into other software (Solr, for instance) to solve this kind of problem.

Help me figure out a MySQL query

These are tables I have:
Class
- id
- name
Order
- id
- name
- class_id (FK)
Family
- id
- order_id (FK)
- name
Genus
- id
- family_id (FK)
- name
Species
- id
- genus_id (FK)
- name
I'm trying to make a query to get a list of Class, Order, and Family names that do not have any Species under them. You can see that the tables form a hierarchy from Class all the way down to Species. Each table has a Foreign Key (FK) that relates to the table immediately above it in the hierarchy.
Trying to get this at work, but I am not doing so well.
Any help would be appreciated!
Meta-answer (comment on the two previous answers):
Using IN tends to degrade to something very like an OR (a disjunction) of all terms in the IN. Bad performance.
Doing a left join and looking for null is an improvement, but it's obscurantist. If we can say what we mean, let's say it in a way that's closest to how we'd say it in natural language:
select f.name
from family f left join genus g on f.id = g.family_id
WHERE NOT EXISTS (select * from species c where c.genus_id = g.id);
We want where something doesn't exist, so if we can say "where not exists" all the better. And, the select * in the subquery doesn't mean it's really bringing back a whole row, so it's not an "optimization" to replace select * with select 1, at least not on any modern RDBMS.
Further, where a family has many genera (and in biology, most families do), we're going to get one row per (family, genus) when all we care about is the family. So let's get one row per family:
select DISTINCT f.name
from family f left join genus g on f.id = g.family_id
WHERE NOT EXISTS (select * from species c where c.genus_id = g.id);
This is still not optimal. Why? Well it fulfills the OP's requirement, in that it finds "empty" genera, but it fails to find families that have no genera, "empty" families. Can we make it do that too?
select f.name
from family f
WHERE NOT EXISTS (
select * from genus g
join species c on c.genus_id = g.id
where g.family_id = f.id);
We can even get rid of the distinct, because we're not joining family to anything. And that is an optimization.
Comment from OP:
That was a very lucid explanation. However, I'm curious as to why using IN or disjunctions is bad for performance. Can you elaborate on that or point me to a resource where I can learn more about the relative performance cost of different DB operations?
Think of it this way. Say that there was no IN operator in SQL. How would you fake an IN?
By a series of ORs:
where foo in (1, 2, 3)
is equivalent to
where ( foo = 1 ) or ( foo = 2 ) or (foo = 3 )
Ok, you say, but that still doesn't tell me why it's bad. It's bad because there's often no decent way to use a key or index to look this up. So what you get is either a) a table scan, where for each disjunction (or'd predicate or element of an IN list), the row gets tested, until a test is true or the list is exhausted. Or b) you get a table scan for each of these disjunctions. The second case (b) may actually be better, which is why you sometimes see a select with an OR turned into one select for each leg of the OR union'd together:
select * from table where x = 1 or x = 3 ;
select * from table where x = 1
union select * from table where x = 3 ;
Now this is not to say you can never use an OR or an IN list. And in some cases, the query optimizer is smart enough to turn an IN list into a join -- and the other answers you were given are precisely the cases where that's most likely.
But if we can explicitly turn our query into a join, well, we don't have to wonder if the query optimizer is smart. And in general, joins are what the database is best at doing.
Well, just giving this a quick and dirty shot, I'd write something like this. I spend most of my time using Firebird so the MySQL syntax may be a little different, but the idea should be clear
select f.name
from family f left join genus g on f.id = g.family_id
left join species s on g.id = s.genus_id
where ( s.id is null )
if you want to enforce there being a genus then you just remove the "left" portion of the join from family to genus.
I hope I'm not misunderstanding the question and thus, leading you down the wrong path. Good luck!
edit: Actually, re-reading this I think this will just catch families where there's no species within a genus. You could add a " and ( g.id is null )" too, I think.
Sub-select to the rescue...
select f.name from family as f, genus as g
where
f.id = g.family_id and
g.id not in (select genus_id from species);
SELECT f.name
FROM family f
WHERE NOT EXISTS (
SELECT 1
FROM genus g
JOIN species s
ON g.id = s.genus_id
WHERE g.family_id = f.id
)
Note that, unlike pure LEFT JOIN solutions, this is more efficient.
It does not select ALL rows filtering out those with NOT NULL values, but instead selects at most one row from genus and species.
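The NOT EXISTS query and the double LEFT JOIN query from this thread do not return the same families, which the following sketch illustrates with Python's sqlite3 module. The family names and rows are invented labels chosen to cover four cases: a family whose genus has species, a family with an empty genus, a family with no genera, and a family with one populated and one empty genus.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE family  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE genus   (id INTEGER PRIMARY KEY, family_id INTEGER, name TEXT);
    CREATE TABLE species (id INTEGER PRIMARY KEY, genus_id INTEGER, name TEXT);

    -- Hypothetical sample data.
    INSERT INTO family VALUES (1, 'Full'), (2, 'EmptyGenus'),
                              (3, 'NoGenus'), (4, 'Mixed');
    INSERT INTO genus VALUES (1, 1, 'g1'), (2, 2, 'g2'),
                             (3, 4, 'g3'), (4, 4, 'g4');
    INSERT INTO species VALUES (1, 1, 's1'), (2, 3, 's2');
""")

# Families with no species at all, directly or through any genus.
not_exists_rows = [row[0] for row in conn.execute("""
    SELECT f.name
    FROM family f
    WHERE NOT EXISTS (SELECT 1
                      FROM genus g
                      JOIN species s ON g.id = s.genus_id
                      WHERE g.family_id = f.id)
    ORDER BY f.name
""")]

# Anti-join over both levels: also reports families with at least one empty genus.
left_join_rows = [row[0] for row in conn.execute("""
    SELECT f.name
    FROM family f
    LEFT JOIN genus g ON f.id = g.family_id
    LEFT JOIN species s ON g.id = s.genus_id
    WHERE s.id IS NULL
    ORDER BY f.name
""")]

print(not_exists_rows)
print(left_join_rows)
```

The LEFT JOIN version additionally reports 'Mixed', because one of its genera is empty even though the family as a whole does have a species. Which result is wanted depends on how the original requirement is read.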