How to create this very complex MySQL query - mysql

I am having serious trouble creating the proper query for the following:
I have 3 tables.
authors ( author_id(int), author_name(varchar) ),
books ( book_id(int), book_title(varchar) )
contribution ( book_id(int), author_id(int), prec(double)) -- stores the percentage of each author's involvement in the creation of a specific book (so one book may have more than one author).
The query that is so difficult (for me right now) has to return book_id, book_title and, in a third column, all authors of that book concatenated with commas and ordered by percentage of participation. So the result should have as many rows as there are books in the books table, and for every book I need the title and the authors in a third column. How can such a MySQL query be written?

The third table's name and the join criteria are a little vague, but this should be close...
The key here is the function GROUP_CONCAT(), which combines the values from multiple rows into one string per GROUP BY group. It also lets you define a separator as well as an ORDER BY.
SELECT B.Book_ID
, B.Book_Title
, group_concat(A.Author_name order by ABP.Prec Desc separator ', ') as Authors
FROM contribution ABP
INNER JOIN AUTHORS A
on ABP.Author_ID = A.Author_ID
INNER JOIN BOOKS B
on ABP.Book_ID = B.Book_ID
GROUP BY B.Book_ID, B.Book_Title

Related

SQL: Column Must Appear in the GROUP BY Clause Or Be Used in an Aggregate Function

I'm doing what I would have expected to be a fairly straightforward query on a modified version of the imdb database:
select primary_name, release_year, max(rating)
from titles natural join primary_names natural join title_ratings
group by year
having title_category = 'film' and year > 1989;
However, I'm immediately running into
"column must appear in the GROUP BY clause or be used in an aggregate function."
I've tried researching this but have gotten confusing information; some examples I've found for this problem look structurally identical to mine, where others state that you must group every single selected parameter, which defeats the whole purpose of a group as I'm only wanting to select the maximum entry per year.
What am I doing wrong with this query?
Expected result: table with 3 columns which displays the highest-rated movie of each year.
If you want the maximum entry per year, then you should do something like this:
select r.*
from ratings r
where r.rating = (select max(r2.rating) from ratings r2 where r2.year = r.year) and
r.year > 1989;
In other words, group by is the wrong approach to writing this query.
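As a rough illustration of the correlated-subquery idea, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL. The single ratings table, its title column and the sample rows are assumptions made up for the demo; the real IMDb-derived schema is not shown in the question.

```python
import sqlite3

# Hypothetical single-table layout and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings (title TEXT, year INTEGER, rating REAL);
    INSERT INTO ratings VALUES
        ('Film A', 1989, 9.0),
        ('Film B', 1990, 7.5),
        ('Film C', 1990, 8.2),
        ('Film D', 1991, 6.9),
        ('Film E', 1991, 8.8);
""")

# For each row, the subquery looks up the best rating of that row's year;
# only rows whose rating equals that maximum are kept.
top_per_year = conn.execute("""
    SELECT r.title, r.year, r.rating
    FROM ratings r
    WHERE r.rating = (SELECT MAX(r2.rating)
                      FROM ratings r2
                      WHERE r2.year = r.year)
      AND r.year > 1989
    ORDER BY r.year
""").fetchall()

print(top_per_year)
```

Note that if two films tie for the best rating in a year, both rows come back.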
I would also strongly encourage you to forget that natural join exists at all. It is an abomination. It uses the names of common columns for joins. It does not even use properly declared foreign key relationships. In addition, you cannot see what columns are used for the join.
While I am at it, another piece of advice: qualify all column names in queries that have more than one table reference. That is, include the table alias in the column name.
If you want to display all the columns, you can use a window function like this:
select primary_name, year, max(rating) over (partition by year) as rating
from titles
natural join primary_names
natural join ratings
where title_type = 'film' and year > 1989;

Subquery confusion in SQL

I'm using just a basic SQL program to run small queries, so the answer should be pretty simple and short. I have a database with a couple of tables:
tAuthors with fAuthorID and fAuthorName; tBooks with fAuthorID, fPubID, etc. (I think I'm only going to be using one of those two); and tPublishers with fPubID and fPubName.
What I have been trying to do is list the names of all authors who have a book published by the publisher with ID number 12, in alphabetical order. I've got the alphabetical part down but can't seem to get the correct author names. This is what I have, but it only pulls one author, and I believe there are 7 authors in total with publisher ID 12 attached to them.
SELECT `fAuthorName`, `fAuthorID`
FROM `tAuthors`
WHERE `fAuthorID` IN (
SELECT `fPubID`
FROM `tPublishers`
WHERE `fPubID` = 12
)
ORDER BY `fAuthorName` ASC;
Might be easier to do a join. Authors table connects to Books table by the author id and the books table connects to the publishers table by the publisher id. Once they are all joined you can just filter by pub id and sort.
SELECT a.`fAuthorName`, a.`fAuthorID`
FROM `tAuthors` a
JOIN `tBooks` b ON (a.fAuthorID = b.fAuthorID)
JOIN `tPublishers` p ON (b.fPubID = p.fPubID)
WHERE p.`fPubID` = 12
ORDER BY a.`fAuthorName` ASC;
You can do it with the following query, using tBooks instead of tPublishers:
SELECT `fAuthorName`, `fAuthorID`
FROM `tAuthors`
WHERE `fAuthorID` IN (
SELECT `fAuthorID`
FROM `tBooks`
WHERE `fPubID` = 12
)
ORDER BY `fAuthorName` ASC;
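To compare the join and the IN subquery side by side, here is a small sketch run against an in-memory SQLite database from Python rather than MySQL. The fBookID column and all sample rows are assumptions for the demo. It suggests one practical difference: a plain join repeats an author once per matching book, while the IN form lists each author once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tAuthors (fAuthorID INTEGER PRIMARY KEY, fAuthorName TEXT);
    -- fBookID is a hypothetical key column; the question elides the full layout.
    CREATE TABLE tBooks (fBookID INTEGER PRIMARY KEY, fAuthorID INTEGER, fPubID INTEGER);
    INSERT INTO tAuthors VALUES (1, 'Zed'), (2, 'Amy'), (3, 'Bob');
    -- Amy has two books with publisher 12; Bob publishes elsewhere.
    INSERT INTO tBooks VALUES (10, 1, 12), (11, 2, 12), (12, 2, 12), (13, 3, 7);
""")

# Join version: one output row per (author, book) pair.
join_rows = conn.execute("""
    SELECT a.fAuthorName
    FROM tAuthors a
    JOIN tBooks b ON a.fAuthorID = b.fAuthorID
    WHERE b.fPubID = 12
    ORDER BY a.fAuthorName
""").fetchall()

# IN version: each author appears at most once.
in_rows = conn.execute("""
    SELECT fAuthorName
    FROM tAuthors
    WHERE fAuthorID IN (SELECT fAuthorID FROM tBooks WHERE fPubID = 12)
    ORDER BY fAuthorName
""").fetchall()

print(join_rows)
print(in_rows)
```

If the join form is preferred, adding DISTINCT to its SELECT list would remove the repeated names.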

Avoiding a MySql Loop With Query

I've got three tables.
Table 1 is packages.
Table 2 is package_to_keyword.
Table 3 is keywords.
Packages can be connected to multiple keywords. So if I query package_to_keyword Joining keywords and packages I can search for all packages that relate to a certain keyword. That's the part I understand.
NOW... my question is, how do I retrieve packages that match a LIST of keywords? Right now, in php, I loop a sql statement and then loop through all the results for each keyword and do an array_intersect() to filter down to the packages that show up in all the result sets. However, I'm sure this is bad, because I'm running a query for each keyword when I'm certain SQL is built to handle this type of relationship, I'm just not sure what type of query to perform.
The key is the list of keywords can be any number of words. If I use something like IN ('keyword','array','returns','all','results') I just get a list for all the packages that have a relationship with ANY of the keywords when I just want packages that have a relationship with ALL of the keywords.
Thoughts?
select title
from packages p
inner join pack_to_tag pt on p.index = pt.pIndex
inner join keywords w on w.index = pt.kindex
where word in ('keyword','array','returns','all','results')
group by title
having count(*) = 5
First, the "PreQuery" (the QualifiedProducts result) looks only at the packages joined to ANY of the keywords you are looking for, and returns one or more rows per package. The GROUP BY with HAVING then confirms that each package matched as many keywords as you ARE EXPECTING. Then join to the package table for the final results.
select
p.*
from
( select ptt.pIndex
from pack_to_tag ptt
join keywords k
on ptt.kindex = k.index
and k.word in ( 'word1', 'word2', 'word3' )
group by
ptt.pIndex
having
count(*) = 3 ) QualifiedProducts
join packages p
on QualifiedProducts.pIndex = p.index
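Since the keyword list can be any length, the count in the HAVING clause has to be computed from the list itself. Below is a minimal sketch of that idea in Python with sqlite3 standing in for MySQL. The helper name packages_with_all_keywords, the id columns (used instead of index) and the sample data are all assumptions. It uses COUNT(DISTINCT ...) so that duplicate link rows cannot inflate the count.

```python
import sqlite3

def packages_with_all_keywords(conn, words):
    """Return titles of packages linked to every keyword in `words`."""
    wanted = sorted(set(words))          # drop duplicates so the count is right
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT p.title
        FROM packages p
        JOIN pack_to_tag pt ON pt.pIndex = p.id
        JOIN keywords k ON k.id = pt.kIndex
        WHERE k.word IN ({placeholders})
        GROUP BY p.id, p.title
        HAVING COUNT(DISTINCT k.word) = ?
        ORDER BY p.title
    """
    # Bind the keywords first, then the number of keywords for HAVING.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

# Hypothetical schema and data for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE packages (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT);
    CREATE TABLE pack_to_tag (pIndex INTEGER, kIndex INTEGER);
    INSERT INTO packages VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO keywords VALUES (1, 'red'), (2, 'green'), (3, 'blue');
    INSERT INTO pack_to_tag VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

print(packages_with_all_keywords(conn, ["red", "green"]))
print(packages_with_all_keywords(conn, ["red"]))
```

Only the placeholders are interpolated into the SQL text; the keywords themselves are passed as bound parameters.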

MySQL search for item with all tags

I'm working on a search engine for an online library, but I'm kind of stuck here. When searching for tags, OR searches (ie books with "tag1" OR "tag2") work fine, but the AND searches are giving me some trouble.
The tables (and their columns) I use for this are:
books | book_id, other_info
tagmap | map_id, book_id, tag_id
tags | tag_id, tag_text
Since a bunch of other search options can be en/disabled by the user, the query is generated by PHP. When searching for books with the tags "tag1" AND "tag2", the following query is generated:
SELECT DISTINCT b.book_id, b.other_info
FROM books b, tagmap tm, tags t
WHERE b.book_id = "NA"
OR ( (t.tag_text IN ("tag1", "tag2"))
AND tm.tag_id = t.tag_id
AND b.book_id = tm.book_id )
HAVING COUNT(tm.book_id)=2
The WHERE line (which doesn't give any results) is there so that additional parameters may be strung to the query more easily. I know this can be handled a lot nicer, but for now that doesn't matter.
When doing an OR search (same query but without the HAVING COUNT line), it returns the two books in the database that have either of those tags, but when searching for the one book in the database that has BOTH tags, it returns nothing.
What's wrong with the query? Is this not the/a way to do it? What am I overlooking?
Thanks!
EDIT: As per request, the data from each table relating to the book that should be returned:
books table:
book_id 110
tagmap table:
book_id 110 110
tag_id 15 16
tags table:
tag_id 15 16
tag_text tag1 tag2
SOLUTION: All I had to do was include
GROUP BY b.book_id
before the HAVING COUNT line. Simple as that. The answer provided by taz is also worth looking into, especially if you're aiming for optimising your search queries.
The comma-separated list of tables in your FROM clause functions like an inner join, so your query is selecting all of the rows in the tagmap table and the tags table that have the same tag ID, and of those rows, all of the rows from the books table and the tagmap table that have the same book ID. The HAVING clause then requires that two rows be returned from that result set with the same book ID. There can only be one row in the books table with any given book ID (assuming book ID is the primary key of the books table), so this condition is never met.
What you want is a join without the books table. You are looking for the same book ID appearing twice in the results of the OR clauses (I believe), so you don't want to join the books table with those results because that will ensure you can never have the same book ID in the results more than once.
Edit: conceptually, you are combining two different things. You are looking for tags and tagmap rows for the same book, and you are also getting the book info for each of those books. So you are actually pulling duplicate other_info data for every instance of the same book ID in the tagmap table, and then using DISTINCT to reduce that duplicate data down to one row, because all you want is the book ID and other_info. I would consider using two queries or a subquery to do this. There may be other (better) ways as well. I'd have to play around with it to figure it out.
For starters, try
SELECT tm.book_id, b.other_info
FROM tagmap tm inner join tags t
on tm.tag_id = t.tag_id
left join books b
on tm.book_id = b.book_id
WHERE t.tag_text IN ('tag1', 'tag2')
GROUP BY tm.book_id, b.other_info
HAVING count(tm.book_id) = 2
-- INTERSECT requires MySQL 8.0.31 or later
SELECT book_id FROM tagmap JOIN tags USING (tag_id) WHERE tag_text = "tag1"
INTERSECT
SELECT book_id FROM tagmap JOIN tags USING (tag_id) WHERE tag_text = "tag2"
Wrap this whole thing as a subquery to select the other book info you need:
SELECT book_id, other_info FROM books WHERE book_id IN
(
...
)
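To check that the two approaches discussed in this thread agree, here is a small sketch that runs both the INTERSECT form and the GROUP BY / HAVING form against an in-memory SQLite database from Python (SQLite is used here only because it ships with Python; it is not the asker's MySQL setup). Book 110 and tags 15 and 16 come from the question's sample data; book 111 and the other_info values are made up for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, other_info TEXT);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT);
    CREATE TABLE tagmap (map_id INTEGER PRIMARY KEY, book_id INTEGER, tag_id INTEGER);
    -- Book 111 is a hypothetical extra row that carries only tag1.
    INSERT INTO books VALUES (110, 'has both tags'), (111, 'has tag1 only');
    INSERT INTO tags VALUES (15, 'tag1'), (16, 'tag2');
    INSERT INTO tagmap (book_id, tag_id) VALUES (110, 15), (110, 16), (111, 15);
""")

# INTERSECT form: books tagged tag1, intersected with books tagged tag2.
intersected = conn.execute("""
    SELECT book_id, other_info
    FROM books
    WHERE book_id IN (
        SELECT book_id FROM tagmap JOIN tags USING (tag_id) WHERE tag_text = 'tag1'
        INTERSECT
        SELECT book_id FROM tagmap JOIN tags USING (tag_id) WHERE tag_text = 'tag2'
    )
""").fetchall()

# GROUP BY form: keep books that matched two distinct tags from the list.
grouped = conn.execute("""
    SELECT b.book_id, b.other_info
    FROM books b
    JOIN tagmap tm ON tm.book_id = b.book_id
    JOIN tags t ON t.tag_id = tm.tag_id
    WHERE t.tag_text IN ('tag1', 'tag2')
    GROUP BY b.book_id, b.other_info
    HAVING COUNT(DISTINCT t.tag_id) = 2
""").fetchall()

print(intersected)
print(grouped)
```

The GROUP BY form scales to any number of tags by changing the IN list and the count, whereas the INTERSECT form needs one more SELECT per tag.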
OK, it looks like I ended up with a fairly complex solution for a MySQL database (it should work for other databases too). The database structure is this:
source (id)
tag(id)
tags(source, tag)
It covers these requirements:
Fetch sources with ANY of the selected tags OR with ALL of the selected tags
Fetch sources which do not have any of the excluded tags
The query:
SELECT source.id
FROM source
-- optional join, if you need to include sources with tags:
LEFT JOIN tags tags_include ON source.id = tags_include.source
-- optional join, if you need to exclude sources with tags
LEFT JOIN tags tags_exclude ON source.id = tags_exclude.source
-- here list of excluded tags
and tags_exclude.tag in(1)
WHERE
-- optional condition, tags which you need to include
tags_include.tag IN (2, 3)
-- optional condition, which will exclude sources with excluded tags
and tags_exclude.source is null
GROUP BY source.id
-- optional having, in case when you include tags with "AND" strategy
-- count should be equal to count of selected tags
-- if you fetch sources with "OR" strategy, ignore this having
HAVING count(1) >= 2
ORDER BY source.id DESC
LIMIT 0, 50;
So, this query will fetch the 50 newest sources which have the tags with IDs 2 and 3 and do not have the tag with ID 1.

mysql derived tables, performance, alternative

I have the following tables,
link_books_genres, *table structure -> book_id,genre_id*
genres, *table structure -> genre_id,genre_name*
Given a set of book_ids, I want to form the following result,
result_set structure -> genre_id, genre_name, count(book_id).
I wrote this query,
SELECT one.genre_id,
one.genre_name,
two.count
FROM genres as one,(SELECT genre_id,
count(book_id) as count
FROM link_f2_books_lists GROUP BY genre_id) as two
WHERE one.genre_id = two.genre_id;
I don't know if that's the best solution, but I want it optimized if possible, or validated if it is already well formed.
P.S. It's done with ruby on rails, so any rails oriented approach would also be fine.
Your query is not using the SQL-92 JOIN syntax but the older implicit join syntax. The explicit syntax has been around for 20 years now, so it's time you started using it.
It's also not a good idea to use keywords like COUNT as aliases. You could use cnt or book_count instead:
SELECT one.genre_id,
one.genre_name,
two.cnt
FROM
genres AS one
INNER JOIN
( SELECT genre_id,
COUNT(book_id) AS cnt
FROM link_f2_books_lists
GROUP BY genre_id
) AS two
ON one.genre_id = two.genre_id ;
MySQL usually is a bit faster with COUNT(*), so if book_id cannot be NULL, changing COUNT(book_id) to COUNT(*) will be a small performance improvement.
Of course, you can rewrite the join without the derived table:
SELECT one.genre_id,
one.genre_name,
COUNT(*) AS cnt
FROM
genres AS one
INNER JOIN
link_f2_books_lists AS two
ON one.genre_id = two.genre_id
GROUP BY one.genre_id ;
In both versions, you can change INNER JOIN to LEFT OUTER JOIN so that genres without any books (0 count) are shown. But then do use COUNT(two.book_id) and not COUNT(*), for correct results.
The above versions (and yours) will not include those genres. That's one good reason to use the JOIN syntax: the change needed is very simple. Try that with your WHERE version!
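The COUNT(two.book_id) versus COUNT(*) point is easy to see on a tiny data set. The sketch below uses Python's sqlite3 module rather than MySQL, with the link_books_genres table name from the question's description and made-up rows. After a LEFT OUTER JOIN, a genre with no books still produces one row with NULLs on the right side, which COUNT(*) counts and COUNT(column) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE TABLE link_books_genres (book_id INTEGER, genre_id INTEGER);
    -- Hypothetical data: 'poetry' has no books at all.
    INSERT INTO genres VALUES (1, 'fantasy'), (2, 'poetry');
    INSERT INTO link_books_genres VALUES (100, 1), (101, 1);
""")

query = """
    SELECT g.genre_name, {count_expr} AS cnt
    FROM genres AS g
    LEFT OUTER JOIN link_books_genres AS l ON l.genre_id = g.genre_id
    GROUP BY g.genre_id, g.genre_name
    ORDER BY g.genre_id
"""

# COUNT(l.book_id) skips the NULL produced for the unmatched genre.
by_column = conn.execute(query.format(count_expr="COUNT(l.book_id)")).fetchall()
# COUNT(*) counts that NULL-extended row as one.
by_star = conn.execute(query.format(count_expr="COUNT(*)")).fetchall()

print(by_column)
print(by_star)
```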
The LEFT JOIN versions can also be written like this:
SELECT one.genre_id,
one.genre_name,
( SELECT COUNT(*)
FROM link_f2_books_lists AS two
WHERE one.genre_id = two.genre_id
) AS cnt
FROM
genres AS one ;
Regarding performance, there is nothing better than testing it yourself. It all depends on the version of MySQL you use (newer versions have a better optimizer that can consider more options when creating an execution plan and may identify different versions of the query as equivalent), the size of your tables, the indexes you have, the distribution of the data (how many different genres? how many books per genre on average? etc.), your memory (and other MySQL) settings and probably many other factors that I'm forgetting now.
One piece of advice: an index on (genre_id, book_id) will be useful in most cases, for all the versions.
As general advice, it's usually good to have both a (genre_id, book_id) and a (book_id, genre_id) index on the many-to-many table.
SELECT one.genre_id, one.genre_name, count(two.book_id)
FROM genres as one, link_books_genres as two
WHERE one.genre_id=two.genre_id
GROUP BY one.genre_id, one.genre_name