Can not determine what the WHERE clause should be - mysql

I'm stuck with creating a MySQL query. Below is my database structure.
authors (author_id and author_name)
books (book_id and book_title)
books_authors is the link table (book_id and author_id)
Result of all books and authors:
I need to get all the books for certain author, but if a book has 2 authors the second one must be displayed also. For example the book "Good Omens" with book_id=2 has two authors. When I run the query I get the books for the author_id=1 but I can not include the second author - "Neil Gaiman" in the result. The query is:
SELECT * FROM books
LEFT JOIN books_authors
ON books.book_id=books_authors.book_id
LEFT JOIN authors
ON books_authors.author_id=authors.author_id
WHERE books_authors.author_id=1
And below is the result:

You need to change the WHERE clause to execute a subselect like this:
SELECT b.*, a.*
FROM books b
LEFT JOIN books_authors ba ON ba.book_id = b.book_id
LEFT JOIN authors a ON a.author_id = ba.author_id
WHERE b.book_id IN (
SELECT book_id
FROM books_authors
WHERE author_id=1)
The problem with your query is that the WHERE clause is not only filtering the books you are getting in the result set, but also the book-author associations.
With this subquery you first use the author id to filter books, and then you use those book ids to fetch all the associated authors.
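To see the difference concretely, here is a small sketch that runs both versions. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the sample rows (including the assumption that author_id=1 is Terry Pratchett) are invented for illustration, not taken from the question's data.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, author_name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, book_title TEXT);
    CREATE TABLE books_authors (book_id INTEGER, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Terry Pratchett'), (2, 'Neil Gaiman');
    INSERT INTO books VALUES (1, 'Mort'), (2, 'Good Omens'), (3, 'Coraline');
    INSERT INTO books_authors VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")

# Filtering on the link row itself drops the co-author's link row.
filtered_links = conn.execute("""
    SELECT b.book_title, a.author_name
    FROM books b
    JOIN books_authors ba ON ba.book_id = b.book_id
    JOIN authors a ON a.author_id = ba.author_id
    WHERE ba.author_id = 1
    ORDER BY b.book_id, a.author_id
""").fetchall()

# Filtering on book_id keeps every link row of the selected books.
all_coauthors = conn.execute("""
    SELECT b.book_title, a.author_name
    FROM books b
    JOIN books_authors ba ON ba.book_id = b.book_id
    JOIN authors a ON a.author_id = ba.author_id
    WHERE b.book_id IN (SELECT book_id FROM books_authors WHERE author_id = 1)
    ORDER BY b.book_id, a.author_id
""").fetchall()

print(filtered_links)
print(all_coauthors)
```

Only the second result contains the ('Good Omens', 'Neil Gaiman') row.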
As an aside, I do think that the suggestion to substitute the OUTER JOINs with INNER JOINs in this specific case should apply. The first LEFT OUTER JOIN on books_authors is certainly useless because the WHERE clause guarantees that at least one row exists in that table for each selected book_id. The second LEFT OUTER JOIN is probably useless as I expect the author_id to be primary key of the authors table, and I expect the books_authors table to have a foreign key and a NOT NULL constraint on author_id... which all means you should not have a books_authors row that does not reference a specific authors row.
If this is true and confirmed, then the query should be:
SELECT b.*, a.*
FROM books b
JOIN books_authors ba ON ba.book_id = b.book_id
JOIN authors a ON a.author_id = ba.author_id
WHERE b.book_id IN (
SELECT book_id
FROM books_authors
WHERE author_id=1)
Notice that INNER JOINs may very well be more efficient than OUTER JOINs in most cases (they give the engine more choice on how to execute the statement and fetch the result). So you should avoid OUTER JOINs if not strictly necessary.
I added aliases and removed the redundant columns from the result set.

You don't need a subquery for this:
SELECT *
FROM books_authors ba
JOIN books b
ON b.book_id = ba.book_id
JOIN books_authors ba2
ON ba2.book_id = b.book_id
JOIN authors a
ON a.author_id = ba2.author_id
WHERE ba.author_id = 1

You're pretty close... basically you need to identify all unique book ids for which author_id = ?. Then join that with the books_authors table again to get all of the authors associated with those book ids. Then join to books and authors to get your book and author names.
Hopefully the following is very clear in this regard, but if it's not just let me know and I'll help explain it in more detail
SELECT a.*, d.* FROM books as a
INNER JOIN (SELECT book_id FROM books_authors WHERE author_id=?) as b
ON a.book_id=b.book_id
INNER JOIN books_authors as c
ON b.book_id=c.book_id
INNER JOIN authors AS d
ON d.author_id = c.author_id
Btw you could also structure this with a WHERE EXISTS clause. I don't think you'll see much of a performance difference either way, but just FYI you can try that if need be. Use EXPLAIN to view the execution plan for the query. If it's problematic, there are other ways to skin this cat.
Also, make sure you pay attention to indices. Whether you use the method here, or the method described by Frazz, a compound (multi-column) index may make a big difference for you. That is, consider indexing books_authors by both book_id and by (author_id, book_id). Whether you should use an additional join or an IN or an EXISTS subquery... lots of ways to skin the cat. No matter what, though, having a multicolumn index on books_authors is likely to help you out, especially if this table is large.
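For reference, a rough sketch of the WHERE EXISTS variant together with the two suggested indexes. It runs on Python's built-in sqlite3 rather than MySQL, so the plan output comes from SQLite's EXPLAIN QUERY PLAN (on MySQL you would use plain EXPLAIN). The index names and sample rows are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; index names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, author_name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, book_title TEXT);
    CREATE TABLE books_authors (book_id INTEGER, author_id INTEGER);
    -- Indexes along the lines suggested above.
    CREATE INDEX idx_ba_book ON books_authors (book_id);
    CREATE INDEX idx_ba_author_book ON books_authors (author_id, book_id);
    INSERT INTO authors VALUES (1, 'Terry Pratchett'), (2, 'Neil Gaiman');
    INSERT INTO books VALUES (1, 'Mort'), (2, 'Good Omens'), (3, 'Coraline');
    INSERT INTO books_authors VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")

# Keep a book only if some link row ties it to the requested author,
# then list every author of the books that survive.
query = """
    SELECT b.book_title, a.author_name
    FROM books b
    JOIN books_authors ba ON ba.book_id = b.book_id
    JOIN authors a ON a.author_id = ba.author_id
    WHERE EXISTS (
        SELECT 1
        FROM books_authors x
        WHERE x.book_id = b.book_id AND x.author_id = ?
    )
    ORDER BY b.book_id, a.author_id
"""
exists_rows = conn.execute(query, (1,)).fetchall()

# SQLite's plan output; inspect it to see whether the indexes are used.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()

print(exists_rows)
for step in plan:
    print(step)
```

Which index the engine actually picks depends on the data and the engine, so treat the plan as something to check rather than assume.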

Related

How to select all the authors from database with the number of books assigned to them?

I've the following DB structure:
Authors(id,name);
Books(id,title,authorId);
I want to select all fields from authors and the number of books they are assigned to. I've managed to get the result, but only for the authors that are assigned to at least one book, which is not what I want. I tried with the following query:
SELECT books.*,authors.*
FROM authors
FULL OUTER JOIN books
ON authors.id = books.authorId;
but it doesn't work.
I guess that you want a left join and aggregation:
select a.id, a.name, count(b.id)
from authors a
left join books b on b.authorId = a.id
group by a.id, a.name
outer join will bring back authors without books. Instead use inner join and your results will only bring back authors with at least 1 book.
I would recommend a correlated subquery:
SELECT a.*,
(SELECT COUNT(*)
FROM books b
WHERE a.id = b.authorId
) as num_books
FROM authors a;
This allows you to use SELECT a.* from authors. If you put a GROUP BY in the outer query, you either need to list all the columns separately or be using a database that allows you to aggregate by a primary key, while selecting other columns (this is standard functionality but most databases do not support it).
You definitely need LEFT JOIN and GROUP BY, but the details are not clear enough from the task description. Let's try something like
SELECT a.*, ab.count
FROM authors AS a
LEFT JOIN (
SELECT authorId, COUNT(*) AS count
FROM books
GROUP BY authorId
) AS ab ON a.id = ab.authorId;
also, if you don't want to get NULL for some authors, you can apply such expression:
IFNULL(ab.count, 0) AS count
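One detail worth checking with any LEFT JOIN count: COUNT(*) also counts the NULL-extended row that the join produces for an author without books, while COUNT(b.id) skips it. A small sketch, using Python's built-in sqlite3 in place of MySQL and invented sample rows (the num_books alias is also just an illustrative name):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, authorId INTEGER);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books VALUES (10, 'A1', 1), (11, 'A2', 1), (12, 'B1', 2);
""")

# COUNT(*) counts the NULL-extended row, so 'Cy' is reported with 1 book.
count_star = conn.execute("""
    SELECT a.name, COUNT(*)
    FROM authors a
    LEFT JOIN books b ON b.authorId = a.id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()

# COUNT(b.id) ignores NULLs, so 'Cy' is reported with 0 books.
count_col = conn.execute("""
    SELECT a.name, COUNT(b.id)
    FROM authors a
    LEFT JOIN books b ON b.authorId = a.id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()

# Pre-aggregated derived table; IFNULL turns the missing count into 0.
derived = conn.execute("""
    SELECT a.name, IFNULL(ab.num_books, 0)
    FROM authors a
    LEFT JOIN (
        SELECT authorId, COUNT(*) AS num_books
        FROM books
        GROUP BY authorId
    ) ab ON a.id = ab.authorId
    ORDER BY a.id
""").fetchall()

print(count_star)
print(count_col)
print(derived)
```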

MYSQL: Select Query with multiple values from one column

I am currently working with a MySQL database which has three tables:
Books, Keywords and KeywordAssignment.
The tables Books and Keywords are in a many to many relationship therefore the table KeywordAssignment.
Now to my question: I want to search for a book with multiple (max: up to ten) keywords.
I've already tried a self join:
SELECT BookID
FROM Keywords K1 INNER JOIN
Keywords K2
ON K1.KeywordAssignmentID=K2.KeywordAssignmentID INNER JOIN
KeywordAssignment
ON KeywordAssignment.KeywordAssignmentID=K1.KeywordAssignmentID INNER JOIN
Books
ON KeywordAssignment.BookID=Books.BookID
WHERE K1.Keyword='Magic' AND K2.Keyword='Fantasy'
The problem is it only works if the given Keyword are in the right order. If they aren't there are more than one.
I appreciate your help thank you very much!
You need to GROUP BY BookID and a HAVING clause with the condition that both keywords are linked to that BookID:
SELECT b.BookID, b.Title
FROM Books b
INNER JOIN KeywordAssignment ka ON ka.BookID = b.BookID
INNER JOIN Keywords k ON k.KeywordID = ka.KeywordID
WHERE k.Keyword IN ('Magic', 'Fantasy')
GROUP BY b.BookID, b.Title
HAVING COUNT(DISTINCT k.Keyword) = 2
This code will return books that are linked to both 'Magic' and 'Fantasy'.
If you want either of the 2 keywords then remove the HAVING clause.
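Since the question mentions up to ten keywords, the same GROUP BY / HAVING pattern can be driven by a list. The sketch below is only an illustration: it uses Python's built-in sqlite3 instead of MySQL, the column layout of KeywordAssignment (BookID, KeywordID) is assumed, and the rows are invented.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema details and rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (BookID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Keywords (KeywordID INTEGER PRIMARY KEY, Keyword TEXT);
    CREATE TABLE KeywordAssignment (BookID INTEGER, KeywordID INTEGER);
    INSERT INTO Books VALUES (1, 'Book A'), (2, 'Book B'), (3, 'Book C');
    INSERT INTO Keywords VALUES (1, 'Magic'), (2, 'Fantasy'), (3, 'Horror');
    INSERT INTO KeywordAssignment VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 3);
""")

def books_with_all(keywords):
    """Return the books linked to every keyword in the given list."""
    wanted = sorted(set(keywords))  # duplicates would break the count check
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT b.BookID, b.Title
        FROM Books b
        JOIN KeywordAssignment ka ON ka.BookID = b.BookID
        JOIN Keywords k ON k.KeywordID = ka.KeywordID
        WHERE k.Keyword IN ({placeholders})
        GROUP BY b.BookID, b.Title
        HAVING COUNT(DISTINCT k.Keyword) = ?
        ORDER BY b.BookID
    """
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()

both = books_with_all(["Magic", "Fantasy"])
both_reversed = books_with_all(["Fantasy", "Magic"])
print(both, both_reversed)
```

The order of the keywords does not matter here, which was the problem with the self join.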
If I understand your question correctly, you want to query for books that have multiple key words. The key word there is have. I don't have MYSQL but the query should look something like this:
SELECT B.BookID, COUNT(*) as NumberOfKeywords FROM Books B
INNER JOIN KeywordAssignment KA
ON B.BookID = KA.BookID
INNER JOIN Keywords K
ON KA.KeywordID = K.KeywordID
GROUP BY B.BookID
HAVING NumberOfKeywords > 0 AND NumberOfKeywords <= 10
What we are doing is grouping by each book and then selecting the ones that have more than 0 and at most 10 keywords.

MySQL Join with looped reference query

I have a slightly complex table structure that I'm trying to query for a search function, but my queries keep timing out. Basically, it's a book search, and I'm focusing on the subject portion of that search.
The subjects table is simple (id and title), but there's a link table that refers it back to itself called subjects_subjects, which complicates things.
**subjects_subjects**
id (key)
subject_id (reference to subjects table)
see_subject_id (another reference to subjects table)
The reason for the looping reference is to catch subjects that don't contain any books, but point to subjects that do. For example, there's no books under the 'Travel' subject, so that subject has a link to 'Explorers' and 'Voyages' that do contain books. The point is to make searching easier.
So what I'm trying to do is allow the user to search for 'Travel', but return results from 'Explorers' and 'Voyages'. Here's my query that times out:
SELECT
BK.id,
BK.title
FROM
books BK
LEFT OUTER JOIN
books_subjects BS
ON BS.book_id = BK.id
WHERE
BS.subject_id IN (1639,3173)
OR BS.subject_id IN
(
SELECT
SS.see_subject_id
FROM
subjects_subjects SS
WHERE
SS.subject_id IN (1639,3173)
)
GROUP BY
BK.books_id
Extra info: There are 17000 books and over 3000 subjects in the database, with roughly 84000 book/subject references.
Can anyone help me figure out where am I going wrong here?
You're doing two things that MySQL optimizes poorly:
OR in the WHERE clause.
IN (SELECT ...)
Instead of OR, use two queries that you combine with UNION. And instead of IN (SELECT ...), use a JOIN.
Also, you shouldn't use LEFT JOIN if you don't need to return rows from the first table with no matches in the second table, use INNER JOIN.
SELECT b.id, b.title
FROM books AS b
JOIN books_subjects AS bs ON bs.book_id = b.id
WHERE bs.subject_id IN (1639, 3173)
UNION
SELECT b.id, b.title
FROM books AS b
JOIN books_subjects AS bs ON bs.book_id = b.id
JOIN subjects_subjects AS ss ON bs.subject_id = ss.see_subject_id
WHERE ss.subject_id IN (1639, 3173)
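A quick way to sanity-check the UNION rewrite is to run it on a toy data set. The sketch below uses Python's built-in sqlite3 rather than MySQL, so it says nothing about timing; the subject ids and rows are invented ('Travel' is id 1 and has no books of its own).

```python
import sqlite3

# In-memory SQLite stands in for MySQL; ids and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE subjects (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE books_subjects (book_id INTEGER, subject_id INTEGER);
    CREATE TABLE subjects_subjects (
        id INTEGER PRIMARY KEY, subject_id INTEGER, see_subject_id INTEGER
    );
    INSERT INTO subjects VALUES
        (1, 'Travel'), (2, 'Explorers'), (3, 'Voyages'), (4, 'Cooking');
    INSERT INTO subjects_subjects VALUES (1, 1, 2), (2, 1, 3);
    INSERT INTO books VALUES (1, 'Book X'), (2, 'Book Y'), (3, 'Book Z');
    -- Book X sits under both Explorers and Voyages.
    INSERT INTO books_subjects VALUES (1, 2), (1, 3), (2, 3), (3, 4);
""")

# First branch: books filed directly under the searched subject.
# Second branch: books filed under subjects it points to.
# UNION removes the duplicate hit for Book X.
travel_books = conn.execute("""
    SELECT b.id, b.title
    FROM books AS b
    JOIN books_subjects AS bs ON bs.book_id = b.id
    WHERE bs.subject_id IN (1)
    UNION
    SELECT b.id, b.title
    FROM books AS b
    JOIN books_subjects AS bs ON bs.book_id = b.id
    JOIN subjects_subjects AS ss ON bs.subject_id = ss.see_subject_id
    WHERE ss.subject_id IN (1)
    ORDER BY 1
""").fetchall()

print(travel_books)
```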

Subquery returns more than one row error result from mysql

I have a question, I am trying to go through a data base and display all the books by an author based on a search where I use an author name to get isbn and then find all the details of that isbn...well if the author wrote one book it is displaying one row but when the author wrote more than one book it is giving me an error...what am I doing wrong can you please help...Here is my code.
SELECT* FROM books WHERE isbn=(SELECT isbn FROM books_authors
WHERE author_id IN
(SELECT author_id FROM authors WHERE first_name ="J.K."))
Change to:
SELECT* FROM books WHERE isbn in (SELECT isbn FROM books_authors
WHERE author_id IN
(SELECT author_id FROM authors WHERE first_name ="J.K."))
You can't have isbn=(subquery) if the subquery returns multiple results.
Try:
SELECT * FROM books
WHERE isbn IN (SELECT isbn FROM books_authors
WHERE author_id IN (SELECT author_id
FROM authors
WHERE first_name ="J.K."
)
) ;
Note that the outmost select should also have WHERE isbn IN....
NEVER use nested sub-queries when you can get the desired result using a join, particularly on MySQL:
SELECT b.*
FROM books b
INNER JOIN books_authors ba
ON b.isbn=ba.isbn
INNER JOIN authors a
ON ba.author_id=a.author_id
WHERE a.first_name ="J.K.";
While you can use 'IN' in place of '=' in your original query, the optimizer won't be able to do much with the query; it is inflexible and difficult to maintain.
You can write this query without a subselect or an IN clause by using joins:
SELECT * FROM books
INNER JOIN books_authors on books_authors.isbn = books.isbn
INNER JOIN authors on authors.author_id = books_authors.author_id
WHERE authors.first_name = "J.K.";
Anyway, your error happens because you are using = instead of IN in the first part of your query: SELECT* FROM books WHERE isbn=(SELECT isbn FROM ... If you want to keep your query, you should use SELECT* FROM books WHERE isbn IN (SELECT isbn FROM .....

MySQL: how to get result from 2 tables without repeating results?

I've got 3 tables: book, publisher, book_category
For a particular book category (fantasy) I have to display list of publisher names supplying that genre.
publisher_name and category_name are linked through book table, so my query is:
SELECT publisher.publisher_name
FROM publisher, book, book_category
WHERE publisher.publisher_id = book.publisher_id
AND book.category_id = book_category.category_id
AND category_name = 'fantasy';
But the result I'm getting is repeating the name of publisher if there's more than one fantasy book supplied by that publisher.
Let's say I've got The Hobbit and The Lord of the Rings,both are fantasy and are supplied by the same PublisherA.
In that case the result of my query is:
PublisherA
PublisherA
Is it possible to get that result just once? Even if there's much more than 2 fantasy books
published by the same publisher?
Just use distinct if you only need publisher_name
SELECT distinct publisher.publisher_name
by the way, try to use JOIN syntax... to join tables
SELECT distinct p.publisher_name
FROM publisher p
join book b on b.publisher_id = p.publisher_id
join book_Category bc on bc.category_id = b.category_id
where bc.category_name = 'fantasy'
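A tiny sketch of the effect, run with Python's built-in sqlite3 as a stand-in for MySQL. The rows mirror the example in the question (two fantasy books from PublisherA); the extra publisher, category, and book are invented.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; ids and the crime rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, publisher_name TEXT);
    CREATE TABLE book_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE book (
        book_id INTEGER PRIMARY KEY, title TEXT,
        publisher_id INTEGER, category_id INTEGER
    );
    INSERT INTO publisher VALUES (1, 'PublisherA'), (2, 'PublisherB');
    INSERT INTO book_category VALUES (1, 'fantasy'), (2, 'crime');
    INSERT INTO book VALUES
        (1, 'The Hobbit', 1, 1),
        (2, 'The Lord of the Rings', 1, 1),
        (3, 'Some Crime Novel', 2, 2);
""")

join_sql = """
    FROM publisher p
    JOIN book b ON b.publisher_id = p.publisher_id
    JOIN book_category bc ON bc.category_id = b.category_id
    WHERE bc.category_name = 'fantasy'
"""

# One row per matching book, so PublisherA shows up twice.
with_duplicates = conn.execute("SELECT p.publisher_name" + join_sql).fetchall()

# DISTINCT collapses identical result rows into one.
without_duplicates = conn.execute(
    "SELECT DISTINCT p.publisher_name" + join_sql
).fetchall()

print(with_duplicates)
print(without_duplicates)
```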
Use DISTINCT
SELECT DISTINCT publisher.publisher_name
FROM publisher, book, book_category
WHERE publisher.publisher_id = book.publisher_id
AND book.category_id = book_category.category_id
AND category_name = 'fantasy';
Try adding this to the end of the query: GROUP BY publisher.publisher_name
Everyone is mentioning DISTINCT, which is correct (better than GROUP BY in MySQL, because of the way the optimizer is set up), but I figured I would also add a modification for performance enhancements.
Currently you have implicit cross joins to get to the other tables, and making these explicit INNER JOINs will increase efficiency because of the order of filtering. Example:
SELECT DISTINCT Publisher.publisher_name
FROM publisher Publisher
INNER JOIN book Book ON Publisher.publisher_id = Book.publisher_id
INNER JOIN book_category Book_Category ON Book.category_id = Book_Category.category_id
WHERE Book_Category.category_name = 'fantasy';
In the original query, you bring in the complete record set of all three tables (publisher, book, book_category), and then from that set you join on the respective keys, and then return the result set. In this new query, your join to Book_Category happens based only upon the record set returned from the join between Publisher and Book. If there is filtering that happens based on this join, you will see a performance increase.
You also have the added benefit of being ANSI-compliant, as well as explicit coding to improve ease of maintenance.