Select matching pairs of attributes from related table - mysql

I have two tables, one is books, the other is editions. Each edition has an attribute like "hardback" or "paperback" in a column "edtype". There can be multiple edition rows related to each book row.
select books.id, books.name, editions.id, editions.edtype from books
join editions on books.id=editions.bookid;
would return something like:
1 name1 1 hardback
1 name1 2 paperback
2 name2 3 paperback
3 name3 4 hardback
What I want to be able to do is find books that have (1) both a hardback and paperback edition:
1 name1
(2) A paperback but not a hardback edition
2 name2
(3) A hardback but not a paperback edition
3 name3
I have the following query, which I think does (1), but I'm not quite convinced (because the number of rows it returns appears too low):
select id, name
from books
left join editions e1 on ( books.id = e1.bookid and e1.edtype = 'paperback')
left join editions e2 on ( books.id = e2.bookid and e2.edtype = 'hardback')
where e1.edtype is not null
and e2.edtype is not null
group by b_id
;
While I'm looking at just two attributes right now, it would be nice to be able to expand that to multiple matching attributes in the editions table.
Any help gratefully received.

Let's start with your query, improved a bit to use table aliases and have distinct column names:
select b.id as book_id, b.name, e.id as edition_id, e.edtype
from books b join
editions e
on b.id = e.bookid;
You want to find books, so that suggests aggregation by books. Then we need to do something to understand the edtype. Try this:
select b.id as book_id, b.name,
(case when min(edtype) = max(edtype)
then min(edtype)
else 'Both'
end) as EdTypes
from books b join
editions e
on b.id = e.bookid
group by b.id, b.name;
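To check the aggregation above against the sample rows from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite database rather than MySQL; the table and column names are the ones from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE editions (id INTEGER PRIMARY KEY, bookid INTEGER, edtype TEXT);
INSERT INTO books VALUES (1, 'name1'), (2, 'name2'), (3, 'name3');
INSERT INTO editions VALUES
    (1, 1, 'hardback'), (2, 1, 'paperback'),
    (3, 2, 'paperback'), (4, 3, 'hardback');
""")

# One row per book: the single edition type it has,
# or 'Both' when the smallest and largest edtype differ.
rows = conn.execute("""
    SELECT b.id, b.name,
           CASE WHEN MIN(e.edtype) = MAX(e.edtype)
                THEN MIN(e.edtype)
                ELSE 'Both'
           END AS edtypes
    FROM books b
    JOIN editions e ON b.id = e.bookid
    GROUP BY b.id, b.name
    ORDER BY b.id
""").fetchall()

for row in rows:
    print(row)
```

Note that the min/max comparison only distinguishes two values cleanly; with a third edition type present, 'Both' would simply mean "more than one type".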

Following Gordon's code above I put in a couple of sub queries, and I think this does what I need (at least if I am only comparing two attributes):
select * from (
select books.name, bookid, (case when min(edtype) = max(edtype)
then min(edtype)
else 'Both'
end) as EdTypes
from
(select * from editions where (edtype = 'paperback' or edtype = 'hardback'))
as e2
join books on e2.bookid = books.id
group by bookid
) as titlelist
where edtypes = 'Both';
So, first select rows that are exclusively the ones you are interested in - so you have a min and max that makes sense. Then process with the case statement, join to books and eliminate duplicates. Then select each type in turn to get a count.
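For the case where more than two attributes need to match, one common alternative (not the approach used above) is to filter to the wanted types and compare COUNT(DISTINCT ...) with the number of wanted types. The sketch below assumes Python's sqlite3 with an in-memory database in place of MySQL; the helper name books_with_all is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE editions (id INTEGER PRIMARY KEY, bookid INTEGER, edtype TEXT);
INSERT INTO books VALUES (1, 'name1'), (2, 'name2'), (3, 'name3');
INSERT INTO editions VALUES
    (1, 1, 'hardback'), (2, 1, 'paperback'),
    (3, 2, 'paperback'), (4, 3, 'hardback');
""")

def books_with_all(conn, wanted):
    """Hypothetical helper: books that have every edition type in `wanted`."""
    wanted = sorted(set(wanted))  # duplicates would break the count comparison
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT b.id, b.name
        FROM books b
        JOIN editions e ON e.bookid = b.id
        WHERE e.edtype IN ({placeholders})
        GROUP BY b.id, b.name
        HAVING COUNT(DISTINCT e.edtype) = ?
        ORDER BY b.id
    """
    return conn.execute(sql, [*wanted, len(wanted)]).fetchall()

# Case (1): both a hardback and a paperback edition.
both = books_with_all(conn, ["hardback", "paperback"])

# Case (2): a paperback but no hardback, using conditional aggregation.
# The comparison evaluates to 0 or 1, so SUM counts the matching rows.
paperback_only = conn.execute("""
    SELECT b.id, b.name
    FROM books b
    JOIN editions e ON e.bookid = b.id
    GROUP BY b.id, b.name
    HAVING SUM(e.edtype = 'paperback') > 0
       AND SUM(e.edtype = 'hardback') = 0
""").fetchall()

print(both, paperback_only)
```

Case (3) follows by swapping the two literals in the HAVING clause.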

Related

Understanding use of multiple SUMs with LEFT JOINS in mysql

Using the GROUP BY command, it is possible to LEFT JOIN multiple tables and still get the desired number of rows from the first table.
For example,
SELECT b.title
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
However, since behind the scenes MySQL pairs every matching orders row with every matching authors row for a book, if you include more than one SUM you get incorrect values based on all the hidden rows. (The problem is explained fairly well here.)
SELECT b.title,SUM(o.id) as sales,SUM(a.id) as authors
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
There are a number of answers on SO about this, most using sub-queries in the JOINS but I am having trouble applying them to this fairly simple case.
How can you adjust the above so that you get the correct SUMs?
Edit
Example
books
id|title|authorid
1|Huck Finn|1
2|Tom Sawyer|1
3|Python Cookbook|2
orders
id|bookid
1|1
2|1
3|2
4|2
5|3
6|3
authors
id|author
1|Twain
2|Beazley
2|Jones
The "correct answer" for total # of authors of the Python Cookbook is 2. However, because there are two joins and the overall dataset is expanded by the join on number of orders, SUM(a.id) will be 4.
You are correct that by joining multiple tables you would not get the expected results.
But in this case you should use COUNT() instead of SUM() and count the distinct orders or authors.
Also by your design you should count the names of the authors and not the ids of the table authors:
SELECT b.title,
COUNT(DISTINCT o.id) as sales,
COUNT(DISTINCT a.author) as authors
FROM books `b`
LEFT JOIN orders `o` ON o.bookid = b.id
LEFT JOIN authors `a` ON b.authorid = a.id
GROUP BY b.id, b.title
See the demo.
Results:
| title | sales | authors |
| --------------- | ----- | ------- |
| Huck Finn | 2 | 1 |
| Tom Sawyer | 2 | 1 |
| Python Cookbook | 2 | 2 |
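The row multiplication and the COUNT(DISTINCT ...) fix can be reproduced with the sample data from the question. This is a sketch that assumes Python's sqlite3 and an in-memory SQLite database instead of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER, title TEXT, authorid INTEGER);
CREATE TABLE orders (id INTEGER, bookid INTEGER);
CREATE TABLE authors (id INTEGER, author TEXT);  -- id repeats, as in the question
INSERT INTO books VALUES
    (1, 'Huck Finn', 1), (2, 'Tom Sawyer', 1), (3, 'Python Cookbook', 2);
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3);
INSERT INTO authors VALUES (1, 'Twain'), (2, 'Beazley'), (2, 'Jones');
""")

joins = """
    FROM books b
    LEFT JOIN orders o ON o.bookid = b.id
    LEFT JOIN authors a ON b.authorid = a.id
    GROUP BY b.id, b.title
    ORDER BY b.id
"""

# Plain COUNT sees every order/author combination produced by the two joins.
inflated = conn.execute(
    "SELECT b.title, COUNT(o.id), COUNT(a.author)" + joins
).fetchall()

# COUNT(DISTINCT ...) collapses the repeated combinations again.
distinct = conn.execute(
    "SELECT b.title, COUNT(DISTINCT o.id), COUNT(DISTINCT a.author)" + joins
).fetchall()

print(inflated)
print(distinct)
```

For the Python Cookbook the two orders and two authors produce four joined rows, which is where the inflated count of 4 comes from.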
When dealing with separate aggregates, it is good style to aggregate before joining.
Your data model is horribly confusing, making it look like a book is written by one author only (referenced by books.authorid), while this "ID" is not an author's ID at all.
Your main problem is: You don't count! We count with COUNT. But you are mistakenly adding up ID values with SUM.
Here is a proper query, where I am aggregating before joining and using alias names to fight confusion and thus enhance the query's readability and maintainability.
SELECT
b.title,
COALESCE(o.order_count, 0) AS sales,
COALESCE(a.author_count, 0) AS authors
FROM (SELECT title, id AS book_id, authorid AS author_group_id FROM books) b
LEFT JOIN
(
SELECT id as author_group_id, COUNT(*) as author_count
FROM authors
GROUP BY id
) a ON a.author_group_id = b.author_group_id
LEFT JOIN
(
SELECT bookid AS book_id, COUNT(*) as order_count
FROM orders
GROUP BY bookid
) o ON o.book_id = b.book_id
ORDER BY b.title;
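A runnable sketch of the aggregate-before-join idea, assuming Python's sqlite3 with an in-memory database in place of MySQL. The fourth book ('Unsold Book') is a hypothetical row added only to show what COALESCE does when a book has no orders and no author rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER, title TEXT, authorid INTEGER);
CREATE TABLE orders (id INTEGER, bookid INTEGER);
CREATE TABLE authors (id INTEGER, author TEXT);
INSERT INTO books VALUES
    (1, 'Huck Finn', 1), (2, 'Tom Sawyer', 1), (3, 'Python Cookbook', 2),
    (4, 'Unsold Book', 3);  -- hypothetical: no orders, no author rows
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3);
INSERT INTO authors VALUES (1, 'Twain'), (2, 'Beazley'), (2, 'Jones');
""")

# Each derived table is reduced to one row per key before it is joined,
# so the joins can no longer multiply rows.
rows = conn.execute("""
    SELECT b.title,
           COALESCE(o.order_count, 0)  AS sales,
           COALESCE(a.author_count, 0) AS authors
    FROM books b
    LEFT JOIN (SELECT bookid, COUNT(*) AS order_count
               FROM orders
               GROUP BY bookid) o ON o.bookid = b.id
    LEFT JOIN (SELECT id, COUNT(*) AS author_count
               FROM authors
               GROUP BY id) a ON a.id = b.authorid
    ORDER BY b.title
""").fetchall()

for row in rows:
    print(row)
```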
I don't think your query would work the way you expected.
Assume one book could have 3 authors.
For authors:
You would then have three rows for that book in your books table, one for each author.
So a
SUM(b.authorid)
gives you the correct answer in your case.
For Orders:
you must use a subselect like
LEFT JOIN (SELECT SUM(id) o_sum,bookid FROM orders GROUP BY bookid) `o`
ON o.bookid = b.id
You should really reconsider your approach with books and authors.

Optimization of MySQL query have millions of records

OBJECTIVE: Need query to count all "distinct" leads outside of current company that do not exist in current company. The query needs to account for millions of records between multiple tables (lead_details, domains, company)
EXAMPLE:
company 1 -> domain 1 -> lead 1 lead_details records exists.
company 2 -> domain 2 -> lead 1 lead_details records exists.
company 2 -> domain 2 -> lead 2 lead_details records exists.
company 3 -> domain 3 -> lead 2 lead_details records exists.
company 3 -> domain 3 -> lead 3 lead_details records exists.
RESULT: If I run the query for the data above on company 1, the result should be a count of (2), since lead 2 and lead 3 are unique and do not exist in company 1.
| domain_id | domain_name | company_id | company_name | lead_id | lead_count |
| --------- | ----------- | ---------- | ------------ | ------- | ---------- |
| 2 | D2 | 2 | C2 | 2 | 2 |
| 3 | D3 | 3 | C3 | 3 | 1 |
Here is my Query, Please let me know if anyone has any better suggestion.
SELECT al.*
FROM (
SELECT
d.id AS domain_id,
d.name AS domain_name,
c.id AS company_id,
c.name AS company_name,
ld.lead_id,
count(ld.lead_id) as lead_count
FROM domains d
INNER JOIN company c
ON (c.id = d.company_id AND c.id != 1)
INNER JOIN lead_details ld
ON (ld.domain_id = d.id)
GROUP BY ld.lead_id
) al
LEFT JOIN (
SELECT ld.lead_id FROM domains d
INNER JOIN company c
ON (c.id = d.company_id AND c.id = 1)
INNER JOIN lead_details ld
ON (ld.domain_id = d.id)
) ccl
ON al.lead_id = ccl.lead_id
WHERE ccl.lead_id IS NULL;
I have almost a million rows, so I need to figure out a better solution.
Plan A
The pattern
FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...
is inefficient, especially in older versions of MySQL. This is because neither of the subqueries has any indexes, so (in older versions) a repeated full table scan is needed of one of the subqueries.
The better method is to try to reformulate as
FROM t1 ...
JOIN t2 ... ON ...
JOIN t3 ... ON ...
LEFT JOIN t4 ... ON ...
LEFT JOIN t5 ... ON ...
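As an illustration of querying the base tables directly instead of joining two derived tables, here is one possible reformulation for the lead count. It is a sketch under several assumptions: it uses NOT EXISTS rather than the LEFT JOIN ... IS NULL form, it runs on Python's sqlite3 with an in-memory database instead of MySQL, the column layout is inferred from the question, and the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
CREATE TABLE lead_details (domain_id INTEGER, lead_id INTEGER);
-- Hypothetical index so the NOT EXISTS probe can seek by lead_id.
CREATE INDEX idx_lead_details_lead ON lead_details (lead_id, domain_id);
INSERT INTO company VALUES (1, 'C1'), (2, 'C2'), (3, 'C3');
INSERT INTO domains VALUES (1, 'D1', 1), (2, 'D2', 2), (3, 'D3', 3);
INSERT INTO lead_details VALUES (1, 1), (2, 1), (2, 2), (3, 2), (3, 3);
""")

def count_foreign_leads(conn, company_id):
    """Count distinct leads seen in other companies but not in `company_id`."""
    sql = """
        SELECT COUNT(DISTINCT ld.lead_id)
        FROM lead_details ld
        JOIN domains d ON d.id = ld.domain_id
        WHERE d.company_id <> ?
          AND NOT EXISTS (
              SELECT 1
              FROM lead_details ld1
              JOIN domains d1 ON d1.id = ld1.domain_id
              WHERE d1.company_id = ?
                AND ld1.lead_id = ld.lead_id
          )
    """
    return conn.execute(sql, (company_id, company_id)).fetchone()[0]

print(count_foreign_leads(conn, 1))
```

Whether this is faster than the original on a million rows depends on the MySQL version and the real indexes, so it is worth comparing the EXPLAIN output of both forms.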
Plan B
This is closer to what you have...
CREATE TEMPORARY TABLE ccl
( INDEX(lead_id) )
SELECT ... -- the stuff that is after LEFT JOIN
Then replace that subquery with just ccl. This provides the index that is missing from the original query.
Plan C
Summary Table. (This may or may not be practical for your query, since you are looking for distinct leads that do not exist in the current company.) Every month (or week or whatever), calculate subtotals for the last month and store them in another table. Then the query against this other table will be much faster.

Looking for help on joining query

I'm taking my very first programming (SQL) class. I've got no tech background whatsoever and I'm having a little trouble getting the code down.
Here is what the database looks like.
BOOK
BOOK_CODE (UNIQUE)
BOOKTITLE
PUBLISHER_CODE
BOOKTYPE
PRICE
INVENTORY
BOOK_CODE (UNIQUE)
BRANCH_NUM (UNIQUE)
ON_HAND
The question is: list the books (titles) that are available in branches 3 and 4 (in both at the same time).
I'm thinking I need to use the following columns: booktitle on the book table, book_code on both tables (book and inventory), and branch_num on the inventory table.
Also, the answer can only show the book titles available in branches 3 and 4 (at the same time) and no other columns.
Sorry if I'm making no sense. Like I said, I'm a n00b.
select distinct BOOKTITLE from BOOK a, INVENTORY b
where a.BOOK_CODE = b.BOOK_CODE and a.BOOK_CODE in
(select distinct p.BOOK_CODE from Inventory p, Inventory q where p.BOOK_CODE =
q.BOOK_CODE
and p.BRANCH_NUM = 3 AND q.BRANCH_NUM = 4);
SELECT BOOKTITLE FROM BOOK
WHERE BOOK_CODE IN (SELECT BOOK_CODE FROM INVENTORY WHERE BRANCH_NUM = 3 AND ON_HAND > 0)
AND BOOK_CODE IN (SELECT BOOK_CODE FROM INVENTORY WHERE BRANCH_NUM = 4 AND ON_HAND > 0);
OR
SELECT BOOKTITLE FROM
(
SELECT BOOK.BOOK_CODE, BOOKTITLE, COUNT(*) AS BRANCH_COUNT FROM BOOK
INNER JOIN INVENTORY ON BOOK.BOOK_CODE = INVENTORY.BOOK_CODE
AND INVENTORY.BRANCH_NUM IN (3, 4)
GROUP BY BOOK.BOOK_CODE, BOOKTITLE
) B
WHERE B.BRANCH_COUNT = 2;
Please try this:
SELECT BK.BOOKTITLE
FROM BOOK BK
INNER JOIN INVENTORY INV1
ON INV1.BOOK_CODE = BK.BOOK_CODE
INNER JOIN INVENTORY INV2
ON INV2.BOOK_CODE = BK.BOOK_CODE
WHERE INV1.BRANCH_NUM = 3
AND INV2.BRANCH_NUM = 4
Your question tells you the fields you need from the table, which gives you this as a starting point:
SELECT booktitle FROM book . . .
That alone will give you a list of every booktitle in the table book, but you want to filter it down to only those books in both branch_num 3 and 4.
When you JOIN two or more tables, you're looking to match rows based on some shared value; book_code, in this case.
Since you want to see only those books in both tables, you'll want to use an INNER JOIN. (I'm assuming ON_HAND is an INT describing the number of copies in the branch.)
SELECT booktitle FROM book
INNER JOIN inventory USING (book_code)
WHERE inventory.on_hand > 0
That query will return a list of every book that is available in any branch. Modifying the query, you can restrict it only to those books in branch_num 3:
SELECT booktitle FROM book
INNER JOIN inventory USING (book_code)
WHERE inventory.on_hand > 0 AND inventory.branch_num = 3
...but you need books in both branches 3 and 4, so you'll need to do another join on the inventory table. But how can you distinguish the two? Use table aliases:
SELECT booktitle FROM book
INNER JOIN inventory AS inventory1 USING (book_code)
INNER JOIN inventory AS inventory2 USING (book_code)
WHERE inventory1.on_hand > 0 AND inventory1.branch_num = 3 AND inventory2.on_hand > 0 AND inventory2.branch_num = 4
That should give you what you're looking for.
For a much better explanation of joins, see MySQL Joins and A Visual Explanation of SQL Joins.
[Note: this problem could also be solved using subqueries, which you should try to do.]
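A small runnable check of the "available in both branches" logic, written with GROUP BY and HAVING instead of a double join. It is a sketch that assumes Python's sqlite3 with an in-memory database; all sample rows are hypothetical, and "available" is assumed to mean ON_HAND > 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (book_code TEXT PRIMARY KEY, booktitle TEXT);
CREATE TABLE inventory (
    book_code TEXT, branch_num INTEGER, on_hand INTEGER,
    PRIMARY KEY (book_code, branch_num)
);
-- Hypothetical sample rows.
INSERT INTO book VALUES
    ('A1', 'In Both'), ('B1', 'Only Branch 3'),
    ('C1', 'Out Of Stock In 4'), ('D1', 'Branches 2 And 4');
INSERT INTO inventory VALUES
    ('A1', 3, 2), ('A1', 4, 1),
    ('B1', 3, 5),
    ('C1', 3, 1), ('C1', 4, 0),
    ('D1', 2, 3), ('D1', 4, 3);
""")

# Keep only stocked rows for branches 3 and 4, then require that
# both branch numbers are present for the same book.
titles = [r[0] for r in conn.execute("""
    SELECT b.booktitle
    FROM book b
    JOIN inventory i ON i.book_code = b.book_code
    WHERE i.branch_num IN (3, 4)
      AND i.on_hand > 0
    GROUP BY b.book_code, b.booktitle
    HAVING COUNT(DISTINCT i.branch_num) = 2
""")]

print(titles)
```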

MySQL selecting rows with a max id and matching other conditions

Using the tables below as an example and the listed query as a base query, I want to add a way to select only rows with a max id! Without having to do a second query!
TABLE VEHICLES
id vehicleName
----- --------
1 cool car
2 cool car
3 cool bus
4 cool bus
5 cool bus
6 car
7 truck
8 motorcycle
9 scooter
10 scooter
11 bus
TABLE VEHICLE NAMES
nameId vehicleName
------ -------
1 cool car
2 cool bus
3 car
4 truck
5 motorcycle
6 scooter
7 bus
TABLE VEHICLE ATTRIBUTES
nameId attribute
------ ---------
1 FAST
1 SMALL
1 SHINY
2 BIG
2 SLOW
3 EXPENSIVE
4 SHINY
5 FAST
5 SMALL
6 SHINY
6 SMALL
7 SMALL
And the base query:
select a.*
from vehicle a
join vehicle_names b using(vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
group
by a.id
having count(distinct c.attribute) = 2;
So what I want to achieve is to select rows with certain attributes, that match a name but only one entry for each name that matches where the id is the highest!
So a working solution in this example would return the below rows:
id vehicleName
----- --------
2 cool car
10 scooter
if it was using some sort of max on the id
at the moment I get all the entries for cool car and scooter.
My real world database follows a similar structure and has 10's of thousands of entries in it so a query like above could easily return 3000+ results. I limit the results to 100 rows to keep execution time low as the results are used in a search on my site. The reason I have repeats of "vehicles" with the same name but only a different ID is that new models are constantly added but I keep the older one around for those that want to dig them up! But on a search by car name I don't want to return the older cards just the newest one which is the one with the highest ID!
The correct answer would adapt the query I provided above that I'm currently using and have it only return rows where the name matches but has the highest id!
If this isn't possible, suggestions on how I can achieve what I want without massively increasing the execution time of a search would be appreciated!
If you want to keep your logic, here is what I would do:
select a.*
from vehicle a
left join vehicle a2 on (a.vehicleName = a2.vehicleName and a.id < a2.id)
join vehicle_names b on (a.vehicleName = b.vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct c.attribute) = 2;
Which yields:
+----+-------------+
| id | vehicleName |
+----+-------------+
| 2 | cool car |
| 10 | scooter |
+----+-------------+
2 rows in set (0.00 sec)
As others said, normalization could be done on a few levels:
Keeping your current vehicle_names table as the primary lookup table, I would change:
update vehicle a
inner join vehicle_names b using (vehicleName)
set a.vehicleName = b.nameId;
alter table vehicle change column vehicleName nameId int;
create table attribs (
attribId int auto_increment primary key,
attribute varchar(20),
unique key attribute (attribute)
);
insert into attribs (attribute)
select distinct attribute from vehicle_attribs;
update vehicle_attribs a
inner join attribs b using (attribute)
set a.attribute=b.attribId;
alter table vehicle_attribs change column attribute attribId int;
Which leads to the following query:
select a.id, b.vehicleName
from vehicle a
left join vehicle a2 on (a.nameId = a2.nameId and a.id < a2.id)
join vehicle_names b on (a.nameId = b.nameId)
join vehicle_attribs c on (a.nameId=c.nameId)
inner join attribs d using (attribId)
where d.attribute in ('SMALL', 'SHINY')
and b.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct d.attribute) = 2;
The table does not seem normalized; however, this lets you do the following:
select max(id), vehicleName
from VEHICLES
group by vehicleName
having count(*)>=2;
I'm not sure I completely understand your model, but the following query satisfies your requirements as they stand. The first sub query finds the latest version of the vehicle. The second query satisfies your "and" condition. Then I just join the queries on vehiclename (which is the key?).
select a.id
,a.vehiclename
from (select a.vehicleName, max(id) as id
from vehicle a
where vehicleName like '%coo%'
group by vehicleName
) as a
join (select b.vehiclename
from vehicle_names b
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
group by b.vehiclename
having count(distinct c.attribute) = 2
) as b on (a.vehicleName = b.vehicleName);
If this "latest vehicle" logic is something you will need to do a lot, a small suggestion would be to create a view (see below) which returns the latest version of each vehicle. Then you could use the view instead of the find-max-query. Note that this is purely for ease-of-use, it offers no performance benefits.
select *
from vehicle a
where id = (select max(b.id)
from vehicle b
where a.vehiclename = b.vehiclename);
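To see the "latest row per name" filter combined with the attribute condition on the sample data from the question, here is a sketch. It assumes Python's sqlite3 with an in-memory SQLite database rather than MySQL, and uses the correlated MAX(id) subquery as the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_names (nameId INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_attribs (nameId INTEGER, attribute TEXT);
INSERT INTO vehicle VALUES
    (1, 'cool car'), (2, 'cool car'), (3, 'cool bus'), (4, 'cool bus'),
    (5, 'cool bus'), (6, 'car'), (7, 'truck'), (8, 'motorcycle'),
    (9, 'scooter'), (10, 'scooter'), (11, 'bus');
INSERT INTO vehicle_names VALUES
    (1, 'cool car'), (2, 'cool bus'), (3, 'car'), (4, 'truck'),
    (5, 'motorcycle'), (6, 'scooter'), (7, 'bus');
INSERT INTO vehicle_attribs VALUES
    (1, 'FAST'), (1, 'SMALL'), (1, 'SHINY'), (2, 'BIG'), (2, 'SLOW'),
    (3, 'EXPENSIVE'), (4, 'SHINY'), (5, 'FAST'), (5, 'SMALL'),
    (6, 'SHINY'), (6, 'SMALL'), (7, 'SMALL');
""")

rows = conn.execute("""
    SELECT v.id, v.vehicleName
    FROM vehicle v
    JOIN vehicle_names n ON n.vehicleName = v.vehicleName
    JOIN vehicle_attribs c ON c.nameId = n.nameId
    WHERE c.attribute IN ('SMALL', 'SHINY')
      AND v.vehicleName LIKE '%coo%'
      -- keep only the newest row for each vehicle name
      AND v.id = (SELECT MAX(v2.id)
                  FROM vehicle v2
                  WHERE v2.vehicleName = v.vehicleName)
    GROUP BY v.id, v.vehicleName
    HAVING COUNT(DISTINCT c.attribute) = 2
    ORDER BY v.id
""").fetchall()

print(rows)
```

Note that 'scooter' matches the '%coo%' pattern, which is why it appears next to 'cool car'.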
Without going into a proper redesign of your model, you could:
1) Add a column IsLatest that your application manages.
This is not perfect, but it will satisfy your question (until the next problem; see the note at the end).
All you need, when you add a new entry, is to issue queries such as
UPDATE vehicle
SET IsLatest = 0
WHERE vehicleName = #name AND IsLatest = 1
INSERT the new row into vehicle
UPDATE vehicle
SET IsLatest = 1
WHERE id = #last_inserted_id
in a transaction or a trigger.
2) Alternatively, you can find the max id before you issue your query:
SELECT MAX(id)
FROM vehicle
WHERE vehicleName = #name
3) You can do it in a single SQL statement, and given an index on (vehicleName, id) it should actually have decent speed:
select a.*
from vehicle a
join vehicle_names b ON a.vehicleName = b.vehicleName
join vehicle_attribs c ON b.nameId = c.nameId AND c.attribute = 'SMALL'
join vehicle_attribs d ON b.nameId = d.nameId AND d.attribute = 'SHINY'
left join vehicle notmax ON a.vehicleName = notmax.vehicleName AND a.id < notmax.id
where a.vehicleName like '%coo%'
AND notmax.id IS NULL
I have removed your GROUP BY and HAVING and replaced it with another join (assuming that only single attribute per nameId is possible).
I have also used one of the ways to find max per group and that is to join a table on itself and filter out a row for which there are no records that have a bigger id for a same name.
There are other ways; search SO for 'max per group sql'. Also see here, though it is not complete.

Struggling with MySQL query using joins

I have 3 tables for storing information about books:
books
-----
id
title
authors
-------
id
name
books_to_authors
----------------
book_id
author_id
When a book's information is displayed, I then want to be able to select any other books by the same authors.
I have the current book id available from the first query, but I can't figure out where to start to achieve what I need as there can be multiple authors. Obviously with just one of them it would be simple, so I'm really struggling with this. Any help would be much appreciated!
I think this ought to do it. Just replace the ? with the ID of the book they are currently viewing and this will give you all the books by the same authors.
SELECT b.*
FROM books b
INNER JOIN books_to_authors b2a ON b2a.book_id = b.id
WHERE b2a.author_id IN (
SELECT author_id FROM books_to_authors WHERE book_id = ?
)
If you want to exclude the book they are currently viewing, you can change the query like this:
SELECT b.*
FROM books b
INNER JOIN books_to_authors b2a ON b2a.book_id = b.id
WHERE b2a.author_id IN (
SELECT author_id FROM books_to_authors WHERE book_id = ?
)
AND b.id <> ?
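Here is a sketch of the second query run with bound parameters, assuming Python's sqlite3 and an in-memory database instead of MySQL. The sample rows and the function name other_books_by_same_authors are hypothetical. DISTINCT is added because a book that shares two authors with the current one would otherwise be returned twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books_to_authors (book_id INTEGER, author_id INTEGER);
-- Hypothetical sample rows.
INSERT INTO books VALUES (1, 'Book A'), (2, 'Book B'), (3, 'Book C'), (4, 'Book D');
INSERT INTO authors VALUES (1, 'Author X'), (2, 'Author Y'), (3, 'Author Z');
INSERT INTO books_to_authors VALUES
    (1, 1), (1, 2),   -- Book A: X and Y
    (2, 1), (2, 2),   -- Book B: X and Y (shares both authors with Book A)
    (3, 2),           -- Book C: Y
    (4, 3);           -- Book D: Z
""")

def other_books_by_same_authors(conn, book_id):
    """Books sharing at least one author with `book_id`, excluding it."""
    sql = """
        SELECT DISTINCT b.id, b.title
        FROM books b
        JOIN books_to_authors b2a ON b2a.book_id = b.id
        WHERE b2a.author_id IN (
            SELECT author_id FROM books_to_authors WHERE book_id = ?
        )
          AND b.id <> ?
        ORDER BY b.id
    """
    # The same id is bound twice, once for each ? placeholder.
    return conn.execute(sql, (book_id, book_id)).fetchall()

print(other_books_by_same_authors(conn, 1))
```

Binding the id as a parameter also avoids building the SQL string by concatenation.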
$book_id = (your code for retrieving book_id);
$db_query = "SELECT b.*
FROM books b
INNER JOIN books_to_authors bta ON bta.book_id = b.id
WHERE bta.author_id IN (
SELECT author_id FROM books_to_authors WHERE book_id = ".$book_id."
)";
I presumed that you are using php. If I'm wrong, just use SQL query string, and ignore the rest...
You're looking for the query below. I see some solutions with subqueries and I'd highly recommend not using subqueries. They are slower than running 2 queries:
Having the book id you do SELECT author_id FROM books_to_authors WHERE book_id = '{$book_id}'
Get the author id and then run this:
SELECT books.id, books.title, authors.name
FROM books
RIGHT JOIN books_to_authors ON (books_to_authors.book_id = books.id)
RIGHT JOIN authors ON (authors.id = books_to_authors.author_id)
WHERE authors.id = '{$author_id}'