The issue:
I want to select the products that have both option_value_id 1 and option_value_id 3. But, as you can see, it also shows the products that have only one of those option_value_ids.
I tried using AND instead of IN, but that obviously returns no results.
The answer might be simple, but I just can't seem to figure it out at the moment.
Could someone help me out? Even a small hint can be appreciated.
This is called Relational Division, and here is one way to do it:
SELECT *
FROM TableName
WHERE Product_ID IN(SELECT Product_ID
FROM TableName
WHERE option_value_id IN(1, 3)
GROUP BY Product_ID
HAVING COUNT(DISTINCT option_value_id) = 2);
This will give you:
| ID | PRODUCT_ID | OPTION_VALUE_ID |
-------------------------------------
| 1 | 1 | 1 |
| 3 | 1 | 3 |
| 13 | 2 | 3 |
| 14 | 2 | 1 |
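A minimal runnable sketch of the subquery approach, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; no MySQL server is involved). The table name products_attributes and all rows are hypothetical, chosen so the result matches the output above. Product 5 deliberately repeats option 1 to show why COUNT(DISTINCT ...) is safer than a plain COUNT when (product_id, option_value_id) is not unique:

```python
import sqlite3

# In-memory SQLite database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products_attributes ("
    "id INTEGER PRIMARY KEY, product_id INTEGER, option_value_id INTEGER)"
)
conn.executemany(
    "INSERT INTO products_attributes VALUES (?, ?, ?)",
    [
        (1, 1, 1), (2, 3, 1), (3, 1, 3), (4, 4, 3),
        (13, 2, 3), (14, 2, 1),
        (20, 5, 1), (21, 5, 1),  # product 5: option 1 twice, option 3 never
    ],
)

# Relational division: keep products that have every option in the list.
rows = conn.execute(
    """
    SELECT id, product_id, option_value_id
    FROM products_attributes
    WHERE product_id IN (SELECT product_id
                         FROM products_attributes
                         WHERE option_value_id IN (1, 3)
                         GROUP BY product_id
                         HAVING COUNT(DISTINCT option_value_id) = 2)
    ORDER BY id
    """
).fetchall()
print(rows)  # [(1, 1, 1), (3, 1, 3), (13, 2, 3), (14, 2, 1)]
```

With COUNT(option_value_id) = 2 instead, product 5 would wrongly qualify because its two rows both carry option 1.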
This is an example of looking at things as a set. I think the best approach is to use SQL's aggregation, particularly the HAVING clause. In MySQL syntax, this looks like:
select pa.product_id
from Product_Attributes pa
group by pa.product_id
having max(pa.option_value_id = 1) = 1 and
max(pa.option_value_id = 3) = 1
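A sketch of the MAX(condition) trick, run with Python's sqlite3 as a stand-in for MySQL (assumed to behave the same here, since both evaluate a comparison to 1 or 0). The rows are hypothetical; product 2 also carries an extra option 7 to show that additional options do not prevent a match:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_Attributes (product_id INTEGER, option_value_id INTEGER)")
conn.executemany(
    "INSERT INTO Product_Attributes VALUES (?, ?)",
    [(1, 1), (1, 3), (2, 3), (2, 1), (2, 7), (3, 1), (4, 3)],
)

# Each comparison yields 1 (true) or 0 (false) per row, so MAX(...) = 1
# means "at least one row in this product's group satisfied the condition".
rows = conn.execute(
    """
    SELECT pa.product_id
    FROM Product_Attributes pa
    GROUP BY pa.product_id
    HAVING MAX(pa.option_value_id = 1) = 1
       AND MAX(pa.option_value_id = 3) = 1
    ORDER BY pa.product_id
    """
).fetchall()
print(rows)  # [(1,), (2,)]
```

Because each condition is tested independently, duplicate rows cannot cause false matches with this form.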
Try this:
SELECT product_id
FROM products_attributes
WHERE option_value_id IN (1, 3)
GROUP BY product_id
HAVING COUNT(*) = 2
This is a common problem called Relational Division; there is even a tag on SO for it: sql-match-all
Usually, there is a unique constraint on (product_id, option_value_id), so one solution is to use 2 joins (N joins if you want to check for N attributes):
SELECT p.* -- whatever columns you need
FROM product AS p -- from the `product` table
JOIN products_attributes AS pa1
ON pa1.option_value_id = 1
AND pa1.product_id = p.product_id
JOIN products_attributes AS pa2
ON pa2.option_value_id = 3
AND pa2.product_id = p.product_id ;
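A runnable sketch of the one-join-per-attribute approach, using Python's sqlite3 as a stand-in for MySQL. The product names and rows are hypothetical, and the UNIQUE constraint encodes the assumption the answer relies on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products_attributes (
        product_id INTEGER,
        option_value_id INTEGER,
        UNIQUE (product_id, option_value_id)  -- the constraint the answer assumes
    );
    INSERT INTO product VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO products_attributes VALUES (1, 1), (1, 3), (2, 3), (2, 1), (3, 1), (4, 3);
    """
)

# One join per required option value; a product survives only if both joins match.
rows = conn.execute(
    """
    SELECT p.product_id, p.name
    FROM product AS p
    JOIN products_attributes AS pa1
      ON pa1.option_value_id = 1 AND pa1.product_id = p.product_id
    JOIN products_attributes AS pa2
      ON pa2.option_value_id = 3 AND pa2.product_id = p.product_id
    ORDER BY p.product_id
    """
).fetchall()
print(rows)  # [(1, 'A'), (2, 'B')]
```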
There is a similar question, with more than 10 different ways to achieve the same result (and benchmarks for Postgres): How to filter SQL results in a has-many-through relation
This question is a bit complicated for me, and I can't explain it in one sentence, so the title may seem quite ambiguous.
I have 3 tables in my MySQL database, their structure is shown below:
word_list (5 million rows)
+-----+--------+
| wid | word |
+-----+--------+
| 1 | foo |
| 2 | bar |
| 3 | hello |
+-----+--------+
paper_word_relation (10 million rows)
+-----+-------+
| pid | word |
+-----+-------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 3 |
+-----+-------+
paper_citation_relation (80K rows)
+----------+--------+
| pid_from | pid_to |
+----------+--------+
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 3 |
+----------+--------+
I want to find out how many papers contain word W and cite papers that also contain word W (for each word in the list).
I use two inner joins to do this, but it is extremely slow when the word is popular (above 50 s), while it is quite fast when the word is rarely used (below 0.1 s). Here is my code:
SELECT COUNT(*) FROM (
SELECT a.pid_from, a.pid_to, b.word FROM paper_citation_relation AS a
INNER JOIN paper_word_relation AS b ON a.pid_from = b.pid
INNER JOIN paper_word_relation AS c ON a.pid_to = c.pid
WHERE b.word = 2 AND c.word = 2) AS d
How can I do this faster? Is my query not efficient enough, or is the problem the amount of data?
The only solution I can come up with is deleting the words that occur fewer than 2 times in the paper_word_relation table (about 4 million words occur only once).
Thanks!
If you are only concerned with getting the count, you should not first put the results into a derived table and then count its rows. This may create unnecessary temporary tables holding lots of data in memory. You can count the rows directly.
I also think that you need to count the number of unique papers: because of the many-to-many relationships in the paper_citation_relation table, a single paper may appear in several rows.
SELECT COUNT(DISTINCT a.pid_from)
FROM paper_citation_relation AS a
INNER JOIN paper_word_relation AS b ON a.pid_from = b.pid
INNER JOIN paper_word_relation AS c ON a.pid_to = c.pid
WHERE b.word = 2 AND c.word = 2
For performance, you will need the following indexes:
A composite index on (pid_from, pid_to) in the paper_citation_relation table.
A composite index on (pid, word) in the paper_word_relation table.
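A small runnable sketch of the COUNT(DISTINCT ...) query together with the two composite indexes, using Python's sqlite3 as a stand-in for MySQL and the sample rows from the question. The index names idx_cr and idx_wr and the helper citing_papers are hypothetical. Passing the word as a parameter lets the same query run once per word:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE paper_word_relation (pid INTEGER, word INTEGER);
    CREATE TABLE paper_citation_relation (pid_from INTEGER, pid_to INTEGER);
    -- the composite indexes suggested above (names are made up)
    CREATE INDEX idx_cr ON paper_citation_relation (pid_from, pid_to);
    CREATE INDEX idx_wr ON paper_word_relation (pid, word);
    INSERT INTO paper_word_relation VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
    INSERT INTO paper_citation_relation VALUES (1, 2), (1, 3), (1, 4), (2, 1), (2, 3);
    """
)

QUERY = """
SELECT COUNT(DISTINCT a.pid_from)
FROM paper_citation_relation AS a
INNER JOIN paper_word_relation AS b ON a.pid_from = b.pid
INNER JOIN paper_word_relation AS c ON a.pid_to = c.pid
WHERE b.word = ? AND c.word = ?
"""

def citing_papers(word):
    """Count distinct papers containing `word` that cite a paper containing `word`."""
    return conn.execute(QUERY, (word, word)).fetchone()[0]

print(citing_papers(1))  # 2
print(citing_papers(2))  # 0
```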
We could possibly optimize the query further by removing one join and using conditional filtering in HAVING. You will need to benchmark it, though.
SELECT COUNT(*)
FROM (
SELECT a.pid_from
FROM paper_citation_relation AS a
INNER JOIN paper_word_relation AS b
ON (a.pid_from = b.pid OR
a.pid_to = b.pid)
GROUP BY a.pid_from
HAVING SUM(a.pid_from = b.pid AND b.word = 2) > 0 AND
SUM(a.pid_to = b.pid AND b.word = 2) > 0
) AS dt
After the first 1:n join you get the same pid_to multiple times, and your next join is no longer 1:n but n:m, creating a possibly huge intermediate result before the final DISTINCT. It's similar to a CROSS JOIN, and it gets worse for popular words, e.g. 10*10 vs. 1000*1000 rows.
You must remove the duplicates before the join. Note that this counts each cited paper (pid_to) once, whereas @MadhurBhaiya's answer counts each citing paper (pid_from) once, so the two numbers can differ:
SELECT Count(*) -- no more DISTINCT needed
FROM
(
SELECT DISTINCT cr.pid_to -- reducing m to 1
FROM paper_citation_relation AS cr
JOIN paper_word_relation AS wr
ON cr.pid_from = wr.pid
WHERE wr.word = 2
) AS dt
JOIN paper_word_relation AS wr
ON dt.pid_to = wr.pid -- 1:n join again
WHERE wr.word = 2
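A runnable sketch of the "deduplicate first, then join" query, using Python's sqlite3 as a stand-in for MySQL and the question's sample rows; the helper name cited_papers is hypothetical. Reusing the alias wr inside the derived table and in the outer query is legal because the two are separate scopes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE paper_word_relation (pid INTEGER, word INTEGER);
    CREATE TABLE paper_citation_relation (pid_from INTEGER, pid_to INTEGER);
    INSERT INTO paper_word_relation VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
    INSERT INTO paper_citation_relation VALUES (1, 2), (1, 3), (1, 4), (2, 1), (2, 3);
    """
)

# The derived table collapses duplicate pid_to values *before* the second join,
# so the second join stays 1:n instead of n:m.
QUERY = """
SELECT COUNT(*)
FROM (SELECT DISTINCT cr.pid_to
      FROM paper_citation_relation AS cr
      JOIN paper_word_relation AS wr ON cr.pid_from = wr.pid
      WHERE wr.word = ?) AS dt
JOIN paper_word_relation AS wr ON dt.pid_to = wr.pid
WHERE wr.word = ?
"""

def cited_papers(word):
    """Count distinct cited papers containing `word` whose citer also contains it."""
    return conn.execute(QUERY, (word, word)).fetchone()[0]

print(cited_papers(1), cited_papers(2))  # 2 0
```

The final COUNT(*) assumes each (pid, word) pair appears only once in paper_word_relation.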
If you want to count the number of papers that have been cited, you need to get a distinct list of pids (either pid_from or pid_to) from paper_citation_relation first and then join to the specific word.
SELECT Count(*)
FROM
( -- get a unique list of cited or citing papers
SELECT pid_from AS pid -- citing
FROM paper_citation_relation
UNION -- DISTINCT by default
SELECT pid_to -- cited
FROM paper_citation_relation
) AS dt
JOIN paper_word_relation AS wr
ON wr.pid = dt.pid
WHERE wr.word = 2 -- now check for the searched word
The number returned by this might be slightly higher (it counts a paper regardless of whether it is cited or citing).
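A runnable sketch of the UNION variant, using Python's sqlite3 as a stand-in for MySQL and the question's sample rows; the helper name papers_in_citations is hypothetical. It shows that UNION (without ALL) yields each paper once, whether it appears as pid_from, as pid_to, or both:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE paper_word_relation (pid INTEGER, word INTEGER);
    CREATE TABLE paper_citation_relation (pid_from INTEGER, pid_to INTEGER);
    INSERT INTO paper_word_relation VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
    INSERT INTO paper_citation_relation VALUES (1, 2), (1, 3), (1, 4), (2, 1), (2, 3);
    """
)

# UNION removes duplicates, so each citing or cited paper appears once
# before being matched against the searched word.
QUERY = """
SELECT COUNT(*)
FROM (SELECT pid_from AS pid FROM paper_citation_relation
      UNION
      SELECT pid_to FROM paper_citation_relation) AS dt
JOIN paper_word_relation AS wr ON wr.pid = dt.pid
WHERE wr.word = ?
"""

def papers_in_citations(word):
    """Count papers containing `word` that cite or are cited by any paper."""
    return conn.execute(QUERY, (word,)).fetchone()[0]

print(papers_in_citations(1), papers_in_citations(2))  # 2 1
```

For word 2 this returns 1 while the stricter queries return 0, which illustrates why this count can be higher.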
Let's say I've got this database:
book
| idBook | name |
|--------|----------|
| 1 |Book#1 |
category
| idCateg| category |
|--------|----------|
| 1 |Adventures|
| 2 |Science F.|
book_categ
| id | idBook | idCateg | DATA |
|--------|--------|----------|--------|
| 1 | 1 | 1 | (null) |
| 2 | 1 | 2 | (null) |
I'm trying to select only the books that are in category 1 AND category 2, something like this:
SELECT book.* FROM book,book_categ
WHERE book_categ.idCateg = 1 AND book_categ.idCateg = 2
Obviously, this gives 0 results because each row has only one idCateg. It does work with OR, but the results are not what I need. I've also tried to use a join, but I just can't get the results I expect.
Here is the SQLFiddle of my current project with my current DB; the data at the beginning is just a sample.
Any help will be really appreciated.
Solution using EXISTS:
select *
from book b
where exists (select 'x'
from book_categ x
where x.idbook = b.idbook
and x.idcateg = 1)
and exists (select 'x'
from book_categ x
where x.idbook = b.idbook
and x.idcateg = 2)
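A runnable sketch of the EXISTS solution, using Python's sqlite3 as a stand-in for MySQL. Book#1 and its categories come from the question; Book#2 (category 1 only) is hypothetical and is added so the filter has something to exclude:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (idBook INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_categ (id INTEGER PRIMARY KEY, idBook INTEGER, idCateg INTEGER, DATA TEXT);
    INSERT INTO book VALUES (1, 'Book#1'), (2, 'Book#2');
    -- Book#1 is in categories 1 and 2; the hypothetical Book#2 only in category 1
    INSERT INTO book_categ VALUES (1, 1, 1, NULL), (2, 1, 2, NULL), (3, 2, 1, NULL);
    """
)

# One EXISTS test per required category; both must hold for a book to be returned.
rows = conn.execute(
    """
    SELECT *
    FROM book b
    WHERE EXISTS (SELECT 'x' FROM book_categ x
                  WHERE x.idBook = b.idBook AND x.idCateg = 1)
      AND EXISTS (SELECT 'x' FROM book_categ x
                  WHERE x.idBook = b.idBook AND x.idCateg = 2)
    """
).fetchall()
print(rows)  # [(1, 'Book#1')]
```

Since EXISTS only checks for at least one matching row, duplicate book_categ rows cannot multiply the output.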
Solution using join with an inline view:
select *
from book b
join (select idbook
from book_categ
where idcateg in (1, 2)
group by idbook
having count(*) = 2) x
on b.idbook = x.idbook
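You could try using ALL instead of IN (if you only want values that match all criteria to be returned). Note, however, that in MySQL ALL must be followed by a subquery, and a single idCateg value can never equal both 1 and 2, so this form will not return books that are in both categories: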
SELECT book.*
FROM book, book_categ
WHERE book_categ.idCateg = ALL(1 , 2)
One way to get the result is to join the book_categ table twice, something like this:
SELECT b.*
FROM book b
JOIN book_categ c1
ON c1.idBook = b.idBook
AND c1.idCateg = 1
JOIN book_categ c2
ON c2.idBook = b.idBook
AND c2.idCateg = 2
This assumes that (idBook, idCateg) is constrained to be unique in the book_categ table. If it isn't unique, then this query can return duplicate rows. Adding a GROUP BY clause or the DISTINCT keyword will eliminate any generated duplicates.
There are several other queries that can generate the same result.
For example, another approach to finding the idBook values that are in two categories is to get all the rows with idCateg values of 1 or 2, then GROUP BY idBook and get a count of DISTINCT values...
SELECT b.*
FROM book b
JOIN ( SELECT d.idBook
FROM book_categ d
WHERE d.idCateg IN (1,2)
GROUP BY d.idBook
HAVING COUNT(DISTINCT d.idCateg) = 2
) c
ON c.idBook = b.idBook
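A runnable sketch comparing COUNT(DISTINCT ...) with COUNT(*) in the inline view, using Python's sqlite3 as a stand-in for MySQL. The Book#2 rows are hypothetical: it is listed twice in category 1 and never in category 2. The helper books_in_1_and_2 is also hypothetical and interpolates only the two fixed aggregate strings below, never user input:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (idBook INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_categ (id INTEGER PRIMARY KEY, idBook INTEGER, idCateg INTEGER);
    INSERT INTO book VALUES (1, 'Book#1'), (2, 'Book#2');
    -- hypothetical Book#2: category 1 listed twice, category 2 never
    INSERT INTO book_categ VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 1);
    """
)

def books_in_1_and_2(count_expr):
    """Run the inline-view query with the given aggregate in HAVING."""
    sql = f"""
        SELECT b.idBook, b.name
        FROM book b
        JOIN (SELECT d.idBook
              FROM book_categ d
              WHERE d.idCateg IN (1, 2)
              GROUP BY d.idBook
              HAVING {count_expr} = 2) c
          ON c.idBook = b.idBook
        ORDER BY b.idBook
    """
    return conn.execute(sql).fetchall()

with_distinct = books_in_1_and_2("COUNT(DISTINCT d.idCateg)")
with_star = books_in_1_and_2("COUNT(*)")
print(with_distinct)  # [(1, 'Book#1')]
print(with_star)      # [(1, 'Book#1'), (2, 'Book#2')]  (Book#2 wrongly included)
```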
Not sure if this is possible, but I have a schema like this:
id | user_id | thread_id
1 | 1 | 1
2 | 4 | 1
3 | 1 | 2
4 | 3 | 2
I am trying to retrieve the thread_id where user_id = 1 and 4. I know that IN (1,4) does not fit my needs, as it's pretty much an OR and will pull up record 3 as well, and EXISTS only returns a boolean.
You may use JOIN (that answer already exists) or HAVING, like this:
SELECT
thread_id,
COUNT(1) AS user_count
FROM
t
WHERE
user_id IN (1,4)
GROUP BY
thread_id
HAVING
user_count=2
HAVING fits better when there are many ids (because with JOIN you need one join per id). This is a bit tricky, however: you may use the = comparison only if your records are unique per (user_id, thread_id). If, for example, a user_id can repeat within a thread, use HAVING COUNT(DISTINCT user_id) = 2 instead.
Try this with a join. I guess you need an AND condition, i.e. both user_id 1 and user_id 4 must appear in the same thread:
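A runnable sketch of the HAVING approach with COUNT(DISTINCT user_id), using Python's sqlite3 as a stand-in for MySQL. Rows 1 to 4 come from the question; thread 3 (user 1 twice, user 4 never) is hypothetical and shows why a plain count can give a false match:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, user_id INTEGER, thread_id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 1, 1), (2, 4, 1), (3, 1, 2), (4, 3, 2),
        (5, 1, 3), (6, 1, 3),  # hypothetical thread 3: user 1 twice, user 4 never
    ],
)

# COUNT(DISTINCT user_id) ignores repeated (user_id, thread_id) rows,
# so thread 3 is not mistaken for a thread containing both users.
rows = conn.execute(
    """
    SELECT thread_id
    FROM t
    WHERE user_id IN (1, 4)
    GROUP BY thread_id
    HAVING COUNT(DISTINCT user_id) = 2
    ORDER BY thread_id
    """
).fetchall()
print(rows)  # [(1,)]
```

With COUNT(*) = 2 (or >= 2), thread 3 would be returned as well.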
SELECT
t1.thread_id
FROM
t t1
JOIN t t2
ON (t1.thread_id = t2.thread_id)
WHERE t1.user_id = 1
AND t2.user_id = 4
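A runnable sketch of the self-join, using Python's sqlite3 as a stand-in for MySQL and the question's rows in a table assumed to be named t. The two copies must be joined on thread_id (not user_id), so that a user-1 row and a user-4 row from the same thread are paired. DISTINCT is an addition here to guard against repeated rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, user_id INTEGER, thread_id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 4, 1), (3, 1, 2), (4, 3, 2)],
)

# Pair every row for user 1 with every row for user 4 in the same thread.
rows = conn.execute(
    """
    SELECT DISTINCT t1.thread_id
    FROM t AS t1
    JOIN t AS t2 ON t1.thread_id = t2.thread_id
    WHERE t1.user_id = 1
      AND t2.user_id = 4
    """
).fetchall()
print(rows)  # [(1,)]
```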
Please find the DB structure below:
| id | account_number | referred_by |
+----+-----------------+--------------+
| 1 | ac203003 | ac203005 |
+----+-----------------+--------------+
| 2 | ac203004 | ac203005 |
+----+-----------------+--------------+
| 3 | ac203005 | ac203004 |
+----+-----------------+--------------+
I want to achieve the following results:
id, account_number, total_referred
1, ac203005, 2
2, ac203003, 0
3, ac203004, 1
I am using the following query:
SELECT id, account_number,
(SELECT count(*) FROM `member_tbl` WHERE referred_by = account_number) AS total_referred
FROM `member_tbl`
GROUP BY id, account_number
but it's not giving the expected results. Please help, thanks.
You need to use table aliases to do this correctly:
SELECT id, account_number,
(SELECT count(*)
FROM `member_tbl` t2
WHERE t2.referred_by = t1.account_number
) AS total_referred
FROM `member_tbl` t1;
Your original query had referred_by = account_number. Without aliases, these would come from the same row -- and the value would be 0.
Also, I removed the outer group by. It doesn't seem necessary, unless you want to remove duplicates.
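A runnable sketch of the aliased correlated subquery, using Python's sqlite3 as a stand-in for MySQL and the rows from the question. It shows that once t2.referred_by is compared with the outer t1.account_number, each account gets its own referral count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member_tbl (id INTEGER PRIMARY KEY, account_number TEXT, referred_by TEXT)")
conn.executemany(
    "INSERT INTO member_tbl VALUES (?, ?, ?)",
    [(1, "ac203003", "ac203005"), (2, "ac203004", "ac203005"), (3, "ac203005", "ac203004")],
)

# t1 is the outer row; t2 scans the table for members referred by t1's account.
rows = conn.execute(
    """
    SELECT t1.id, t1.account_number,
           (SELECT COUNT(*)
            FROM member_tbl t2
            WHERE t2.referred_by = t1.account_number) AS total_referred
    FROM member_tbl t1
    ORDER BY t1.id
    """
).fetchall()
print(rows)  # [(1, 'ac203003', 0), (2, 'ac203004', 1), (3, 'ac203005', 2)]
```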
One idea is to join the table on itself. This way you can avoid the subquery. There might be performance gains with this approach.
select b.id, b.account_number, count(a.referred_by)
from member_tbl b left join member_tbl a
on a.referred_by=b.account_number
group by b.id, b.account_number;
SQL fiddle: http://sqlfiddle.com/#!2/b1393/2
Another test, with more data: http://sqlfiddle.com/#!2/8d216/1
select t1.account_number, count(t2.referred_by)
from (select account_number from member_tbl) t1
left join member_tbl t2 on
t1.account_number = t2.referred_by
group by t1.account_number;
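A runnable sketch of the LEFT JOIN version, using Python's sqlite3 as a stand-in for MySQL and the question's rows. It reads member_tbl directly instead of through the derived table, which should not change the result. The key detail is COUNT(t2.referred_by): it counts only matched (non-NULL) rows, so an account nobody referred gets 0, whereas COUNT(*) would give it 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member_tbl (id INTEGER PRIMARY KEY, account_number TEXT, referred_by TEXT)")
conn.executemany(
    "INSERT INTO member_tbl VALUES (?, ?, ?)",
    [(1, "ac203003", "ac203005"), (2, "ac203004", "ac203005"), (3, "ac203005", "ac203004")],
)

# LEFT JOIN keeps accounts with no referrals; their t2 columns are NULL.
rows = conn.execute(
    """
    SELECT t1.account_number, COUNT(t2.referred_by) AS total_referred
    FROM member_tbl t1
    LEFT JOIN member_tbl t2 ON t1.account_number = t2.referred_by
    GROUP BY t1.account_number
    ORDER BY t1.account_number
    """
).fetchall()
print(rows)  # [('ac203003', 0), ('ac203004', 1), ('ac203005', 2)]
```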
I have a join table named languages_services that basically joins the table services and languages.
I need to find a service that is able to serve both ENGLISH (language_id=1) and ESPANOL (language_id=2).
table languages_services
------------------------
service_id | language_id
------------------------
1 | 1
1 | 2
1 | 3
2 | 1
2 | 3
With the data provided above, I want to test for language_id=1 AND language_id=2, where the result would look like this:
QUERY RESULT
------------
service_id
------------
1
It should not return service_id=2, because that service doesn't serve Espanol.
Any tips on this is greatly appreciated!
SELECT
service_id
FROM
languages_services
WHERE
language_id = 1
OR language_id = 2
GROUP BY
service_id
HAVING
COUNT(*) = 2
Or...
SELECT
service_id
FROM
languages_services
WHERE
language_id IN (1,2)
GROUP BY
service_id
HAVING
COUNT(*) = 2
If you're always looking at 2 languages you could do it with joins, but the aggregate version is easier to adapt to differing numbers of language_ids. (Add an OR or an item to the IN list, and change COUNT(*) = 2 to COUNT(*) = 3, etc.)
Be aware, however, that this scales very poorly. And with this table structure there isn't much you can do about that.
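A sketch of how the aggregate version adapts to any number of languages, using Python's sqlite3 as a stand-in for MySQL and the question's rows. The helper services_speaking is hypothetical: it builds one placeholder per language id, binds the values as parameters, and uses COUNT(DISTINCT language_id) so duplicate rows cannot inflate the count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languages_services (service_id INTEGER, language_id INTEGER)")
conn.executemany(
    "INSERT INTO languages_services VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3)],
)

def services_speaking(language_ids):
    """Service ids that serve every language in language_ids (relational division)."""
    ids = sorted(set(language_ids))
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT service_id FROM languages_services "
        f"WHERE language_id IN ({placeholders}) "
        "GROUP BY service_id "
        "HAVING COUNT(DISTINCT language_id) = ? "
        "ORDER BY service_id"
    )
    return [row[0] for row in conn.execute(sql, (*ids, len(ids)))]

print(services_speaking([1, 2]))     # [1]
print(services_speaking([1, 3]))     # [1, 2]
print(services_speaking([1, 2, 3]))  # [1]
```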
EDIT: Example using a join for 2 languages:
SELECT
lang1.service_id
FROM
languages_services AS lang1
INNER JOIN
languages_services AS lang2
ON lang1.service_id = lang2.service_id
WHERE
lang1.language_id = 1
AND lang2.language_id = 2
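A runnable sketch of the two-language self-join, using Python's sqlite3 as a stand-in for MySQL and the rows from the question. It shows that a service is returned only when an English row (lang1) and an Espanol row (lang2) exist for the same service_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languages_services (service_id INTEGER, language_id INTEGER)")
conn.executemany(
    "INSERT INTO languages_services VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3)],
)

# lang1 must be an English row and lang2 an Espanol row for the same service.
rows = conn.execute(
    """
    SELECT lang1.service_id
    FROM languages_services AS lang1
    INNER JOIN languages_services AS lang2
        ON lang1.service_id = lang2.service_id
    WHERE lang1.language_id = 1
      AND lang2.language_id = 2
    """
).fetchall()
print(rows)  # [(1,)]
```

Each extra language needs another join, which is why the GROUP BY/HAVING form is easier to extend.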