mysql intersection, comparison, opposite of UNION? - mysql

I'm trying to compare two set of resutls aving hard time to undesrtand how subqueries work and if they are efficient. I'm not gonna explain all my tables, but just think i have apair of arrays...i might do it in php but i wonder if i can do it in mysql right away...
this is my query to check how many items user 1 has in lists he owns
SELECT DISTINCT *
FROM list_tb
INNER JOIN item_to_list_tb
ON list_tb.list_id = item_to_list_tb.list_id
WHERE list_tb.user_id = 1
ORDER BY item_to_list_tb.item_id DESC
this is my query to check how many items user 2 has in lists he owns
SELECT DISTINCT *
FROM list_tb
INNER JOIN item_to_list_tb
ON list_tb.list_id = item_to_list_tb.list_id
WHERE list_tb.user_id = 1
ORDER BY item_to_list_tb.item_id DESC
now the problem is that i would intersect those results to check how many item_id they have in common...
thanks!!!

Unfortunately, MySQL does not support the Intersect predicate. However, one way to accomplish that goal would be to exclude List_Tb.UserId from your Select and Group By and then count by distinct User_Id:
Select ... -- everything except List_Tb.UserId
From List_Tb
Inner Join Item_To_List_Tb
On List_Tb.List_Id = Item_To_List_Tb.List_Id
Where List_Tb.User_Id In(1,2)
Group By ... -- everything except List_Tb.UserId
Having Count( Distinct List_Tb.User_Id ) = 2
Order By item_to_list_tb.item_id Desc
Obviously you would replace the ellipses with the actual columns you want to return and on which you wish to group.

Related

mysql group by with where returns totally other results

I try to select in a one to many relation childs where records of childs is 1. For fetching childs with one record I use the following query.
Here is the simple query which works if I do not use wherestatement
select a.iid,
account_id,
count(*) as props
from accounts_prop a
group by a.account_id
having props = 1
when I use where I get back totally other result. In this case I get records which shows that props are having 1 record but actually having more than one
select a.iid,
account_id,
count(*) as props
from accounts_prop a
where a.von >= '2017-08-25'
group by a.account_id
having props = 1
What I'm missing in this case
the where condition filter you original rows so
where a.von >= '2017-08-25'
reduce the number of rows involved in query
the having clause work on the result of a query so in you have filter with a where (or not ) you obtain different result
In your case in the first query your count is calculate on all the rows in your in the second query your count is calculated only on a subset
This can explain why you obtain different resul from the two query
Upon closer inspection, your original query is not even determinate:
SELECT
a.iid, -- OK, but which value of iid do we choose?
a.account_id,
COUNT(*) AS props
FROM accounts_prop a
GROUP BY a.account_id
HAVING props = 1
The query makes no sense because you are selecting the iid column while aggregating on other columns. Which value of iid should be chosen for each account? This is not clear, and your query would not even run on certain versions of MySQL or most other databases.
Instead, put your logic to find accounts with only a single record into a subquery, and then join to that subquery to get the full records for each match:
SELECT a1.*
FROM accounts_prop a1
INNER JOIN
(
SELECT account_id
FROM accounts_prop
GROUP BY account_id
HAVING COUNT(*) = 1
) a2
ON a1.account_id = a2.account_id
WHERE a1.von >= '2017-08-25'

Why does a MySQL query with a Dependent Subquery take so much longer to execute than executing each statement individually?

I'm running two queries.
The first one gets unique IDs. This executes in ~350ms.
select parent_id
from duns_match_sealed_air_072815
group by duns_number
Then I paste those IDs into this second query. With >10k ids pasted in, it also executes in about ~350ms.
select term, count(*) as count
from companies, business_types, business_types_to_companies
where
business_types.id = business_types_to_companies.term_id
and companies.id = business_types_to_companies.company_id
and raw_score > 25
and diversity = 1
and company_id in (paste,ten,thousand,ids,here)
group by term
order by count desc;
When I combine these queries into one it takes a long time to execute. I don't know how long because I stopped it after minutes.
select term, count(*) as count
from companies, business_types, business_types_to_companies
where
business_types.id = business_types_to_companies.term_id
and companies.id = business_types_to_companies.company_id
and raw_score > 25
and diversity = 1
and company_id in (
select parent_id
from duns_match_sealed_air_072815
group by duns_number
)
group by term
order by count desc;
What is going on?
It's down to the way it processes the query - I believe it has to run your embedded query once for each row, whereas using two queries allows you to store the result.
Hope this helps!
The query has been re-written using JOIN, but particularly I've used EXISTS instead of IN. This is a short in the dark. It is possible that there may be many values generated in the sub-query causing the outer query to struggle while it goes through matching each item returned from the sub-query.
select term, count(*) as count
from companies c
inner join business_types_to_companies bc on bc.company_id = c.id
inner join business_types b on b.id = bc.term_id
where
raw_score > 25
and diversity = 1
and exists (
select 1
from duns_match_sealed_air_072815
where parent_id = c.id
)
group by term
order by count desc;
First, with respect, your subquery doesn't use GROUP BY in a sensible way.
select parent_id /* wrong GROUP BY */
from duns_match_sealed_air_072815
group by duns_number
In fact, it misuses the pernicious MySQL extension to GROUP BY. Read this. http://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html . I can't tell what your application logic intends from this query, but I can tell you that it actually returns an unpredictably selected parent_id value associated with each distinct duns_number value.
Do you want
select MIN(parent_id) parent_id
from duns_match_sealed_air_072815
group by duns_number
or something like that? That one selects the lowest parent ID associated with each given number.
Sometimes MySQL has a hard time optimizing the WHERE .... IN () query pattern. Try a join instead. Like this:
select term, count(*) as count
from companies
join (
select MIN(parent_id) parent_id
from duns_match_sealed_air_072815
group by duns_number
) idlist ON companies.id = idlist.parent_id
join business_types_to_companies ON companies.id = business_types_to_companies.company_id
join business_types ON business_types.id = business_types_to_companies.term_id
where raw_score > 25
and diversity = 1
group by term
order by count desc
To optimize this further we'll need to see the table definitions and the output from EXPLAIN.

Subquery returns more than 1 row

im geting this error when trying to do 2 counts inside of my query
first ill show you the query:
$sql = mysql_query("select c.id, c.number, d.name,
(select count(*) from `parts` where `id_container`=c.id group by `id_car`) as packcount,
(select count(*) from `parts` where `id_container`=c.id) as partcount
from `containers` as c
left join `destinations` as d on (d.id = c.id_destination)
order by c.number asc") or die(mysql_error());
now the parts table has 2 fields that i need to use in the count:
id_car
id_container
id_car = the ID of the car the part is for
id_container = the ID of the container the part is in
for packcount all i want is a count of the total cars per container
for partcount all i want it a count of the total parts per container
It's because of GROUP BY You're using
Try something like
(select count(distinct id_car) from `parts` where `id_container`=c.id)
in You're subquery (can't check right now)
EDIT
PFY - I think UNIQUE is for indexes
Your grouping in your first sub-query is causing multiple rows to be returned, you will probably need to run separate queries to get the results you are looking for.
This subquery may return more than one row.
(select count(*) from `parts` where `id_container`=c.id group by `id_car`) as packcount, ...
so, i'd suggest to try something of the following:
(select count(DISTINCT `id_car`) from `parts` where `id_container`=c.id) as packcount, ...
see: COUNT(DISTINCT) on dev.mysql.com
and: QA on stackoverflow

How can I make these two queries into one?

I have two tables, one for downloads and one for uploads. They are almost identical but with some other columns that differs them. I want to generate a list of stats for each date for each item in the table.
I use these two queries but have to merge the data in php after running them. I would like to instead run them in a single query, where it would return the columns from both queries in each row grouped by the date. Sometimes there isn't any download data, only upload data, and in all my previous tries it skipped the row if it couldn't find log data from both rows.
How do I merge these two queries into one, where it would display data even if it's just available in one of the tables?
SELECT DATE(upload_date_added) as upload_date, SUM(upload_size) as upload_traffic, SUM(upload_files) as upload_files
FROM packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
ORDER BY upload_date DESC
SELECT DATE(download_date_added) as download_date, SUM(download_size) as download_traffic, SUM(download_files) as download_files
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
ORDER BY download_date DESC
I want to get result rows like this:
date, upload_traffic, upload_files, download_traffic, download_files
All help appreciated!
Your two queries can be executed and then combined with the UNION cluase along with an extra field to identify Uploads and Downloads on separate lines:
SELECT
'Uploads' TransmissionType,
DATE(upload_date_added) as TransmissionDate,
SUM(upload_size) as TransmissionTraffic,
SUM(upload_files) as TransmittedFileCount
FROM
packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
ORDER BY upload_date DESC
UNION
SELECT
'Downloads',
DATE(download_date_added),
SUM(download_size),
SUM(download_files)
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
ORDER BY download_date DESC;
Give it a Try !!!
What you're asking can only work for rows that have the same add date for upload and download. In this case I think this SQL should work:
SELECT
DATE(u.upload_date_added) as date,
SUM(u.upload_size) as upload_traffic,
SUM(u.upload_files) as upload_files,
SUM(d.download_size) as download_traffic,
SUM(d.download_files) as download_files
FROM
packages_uploads u, packages_downloads d
WHERE u.upload_date_added = d.download_date_added
AND u.upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY date
ORDER BY date DESC
Without knowing the schema is hard to give the exact answer so please see the following as a concept not a direct answer.
You could try left join, im not sure if the table package exists but the following may be food for thought
SELECT
p.id,
up.date as upload_date
dwn.date as download_date
FROM
package p
LEFT JOIN package_uploads up ON
( up.package_id = p.id WHERE up.upload_date = 'etc' )
LEFT JOIN package_downloads dwn ON
( dwn.package_id = p.id WHERE up.upload_date = 'etc' )
The above will select all the packages and attempt to join and where the value does not join it will return null.
There is number of ways that you can do this. You can join using primary key and foreign key. In case if you do not have relationship between tables,
You can use,
LEFT JOIN / LEFT OUTER JOIN
Returns all records from the left table and the matched
records from the right table. The result is NULL from the
right side when there is no match.
RIGHT JOIN / RIGHT OUTER JOIN
Returns all records from the right table and the matched
records from the left table. The result is NULL from the left
side when there is no match.
FULL OUTER JOIN
Return all records when there is a match in either left or right table records.
UNION
Is used to combine the result-set of two or more SELECT statements.
Each SELECT statement within UNION must have the same number of,
columns The columns must also have similar data types The columns in,
each SELECT statement must also be in the same order.
INNER JOIN
Select records that have matching values in both tables. -this is good for your situation.
INTERSECT
Does not support MySQL.
NATURAL JOIN
All the column names should be matched.
Since you dont need to update these you can create a view from joining tables then you can use less query in your PHP. But views cannot update. And you did not mentioned about relationship between tables. Because of that I have to go with the UNION.
Like this,
CREATE VIEW checkStatus
AS
SELECT
DATE(upload_date_added) as upload_date,
SUM(upload_size) as upload_traffic,
SUM(upload_files) as upload_files
FROM packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
ORDER BY upload_date DESC
UNION
SELECT
DATE(download_date_added) as download_date,
SUM(download_size) as download_traffic,
SUM(download_files) as download_files
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
ORDER BY download_date DESC
Then anywhere you want to select you just need one line:
SELECT * FROM checkStatus
learn more.

MySQL Group By and HAVING

I'm a MySQL query noobie so I'm sure this is a question with an obvious answer.
But, I was looking at these two queries. Will they return different result sets? I understand that the sorting process would commence differently, but I believe they will return the same results with the first query being slightly more efficient?
Query 1: HAVING, then AND
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 100
AND user_id NOT IN (SELECT user_id FROM banned_users)
Query 2: WHERE, then HAVING
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN(SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 100
Actually the first query will be less efficient (HAVING applied after WHERE).
UPDATE
Some pseudo code to illustrate how your queries are executed ([very] simplified version).
First query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_user
3. Group, count, etc.
4. Exclude records from the first result set if they are presented in the second
Second query
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_user
3. Exclude records from the first result set if they are presented in the second
4. Group, count, etc.
The order of steps 1,2 is not important, mysql can choose whatever it thinks is better. The important difference is in steps 3,4. Having is applied after GROUP BY. Grouping is usually more expensive than joining (excluding records can be considering as join operation in this case), so the fewer records it has to group, the better performance.
You have already answers that the two queries will show same results and various opinions for which one is more efficient.
My opininion is that there will be a difference in efficiency (speed), only if the optimizer yields with different plans for the 2 queries. I think that for the latest MySQL versions the optimizers are smart enough to find the same plan for either query so there will be no difference at all but off course one can test and see either the excution plans with EXPLAIN or running the 2 queries against some test tables.
I would use the second version in any case, just to play safe.
Let me add that:
COUNT(*) is usually more efficient than COUNT(notNullableField) in MySQL. Until that is fixed in future MySQL versions, use COUNT(*) where applicable.
Therefore, you can also use:
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN
( SELECT user_id FROM banned_users )
GROUP BY user_id
HAVING COUNT(*) >= 100
There are also other ways to achieve same (to NOT IN) sub-results before applying GROUP BY.
Using LEFT JOIN / NULL :
SELECT fp.user_id
FROM forum_posts AS fp
LEFT JOIN banned_users AS bu
ON bu.user_id = fp.user_id
WHERE bu.user_id IS NULL
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Using NOT EXISTS :
SELECT fp.user_id
FROM forum_posts AS fp
WHERE NOT EXISTS
( SELECT *
FROM banned_users AS bu
WHERE bu.user_id = fp.user_id
)
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Which of the 3 methods is faster depends on your table sizes and a lot of other factors, so best is to test with your data.
HAVING conditions are applied to the grouped by results, and since you group by user_id, all of their possible values will be present in the grouped result, so the placing of the user_id condition is not important.
To me, second query is more efficient because it lowers the number of records for GROUP BY and HAVING.
Alternatively, you may try the following query to avoid using IN:
SELECT `fp`.`user_id`
FROM `forum_posts` `fp`
LEFT JOIN `banned_users` `bu` ON `fp`.`user_id` = `bu`.`user_id`
WHERE `bu`.`user_id` IS NULL
GROUP BY `fp`.`user_id`
HAVING COUNT(`fp`.`id`) >= 100
Hope this helps.
No it does not gives same results.
Because first query will filter records from count(id) condition
Another query filter records and then apply having clause.
Second Query is correctly written