Too many rows when Joining to same table twice

I have two tables, Product and Benchmark.
A benchmark is linked to only one product. There can only be one benchmark per year per product.
I would like to retrieve every product's name for a set of years, and count how many benchmarks there are for each product.
SELECT p.name,
p.id,
COUNT(p.id) AS nb_benchmark
FROM product p
INNER JOIN benchmark b0 ON b0.product_id = p.id
INNER JOIN benchmark b1 ON b1.product_id = p.id
WHERE p.owner = "MyCompany"
AND b0.year = 2011
AND b1.year = 2012
GROUP BY p.id
ORDER BY nb_benchmark DESC
But the count is wrong: it's way too high, and it even gives me more results than there actually are in the database. I guess it's because of the JOINs, but I don't know how to build the query.

Remember that the basis of an SQL join is the Cartesian product of the rows in the referenced tables, which is then reduced by filters and join conditions. You are joining TWICE to the benchmark table, which, judging from the nature of your query, has many rows per product per benchmark year, so every product row is multiplied by both sets of matches.
For example, 1 product with 3 benchmark rows for 2011 and 3 for 2012:
FROM product p -- 1 Product Row
INNER JOIN benchmark b0 ON b0.product_id = p.id -- 1 x 3 = 3
INNER JOIN benchmark b1 ON b1.product_id = p.id -- 1 x 3 x 3 = 9
So the multiple joins to benchmark introduce duplicate rows for each product, and those duplicates are then counted.
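To see the multiplication concretely, here is a small sketch of the answer's hypothetical case (one product with three benchmark rows in 2011 and three in 2012), run in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here and the rows are made up; the year filters are placed in the ON clauses. It also checks the COUNT(DISTINCT ...) sum that the asker eventually used.

```python
import sqlite3


def fan_out_demo():
    """Return (naive count, distinct-id count) for the hypothetical product."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, owner TEXT);
        CREATE TABLE benchmark (id INTEGER PRIMARY KEY, product_id INTEGER, year INTEGER);
        INSERT INTO product VALUES (1, 'Widget', 'MyCompany');
    """)
    # Hypothetical data: three 2011 rows and three 2012 rows for product 1
    rows = [(1, year) for year in (2011, 2012) for _ in range(3)]
    con.executemany("INSERT INTO benchmark (product_id, year) VALUES (?, ?)", rows)

    joins = """
        FROM product p
        JOIN benchmark b0 ON b0.product_id = p.id AND b0.year = 2011
        JOIN benchmark b1 ON b1.product_id = p.id AND b1.year = 2012
        WHERE p.owner = 'MyCompany'
        GROUP BY p.id
    """
    # Every 2011 row pairs with every 2012 row: 3 x 3 joined rows are counted
    naive = con.execute("SELECT COUNT(p.id) " + joins).fetchone()[0]
    # Counting distinct benchmark ids per alias removes the duplication
    distinct = con.execute(
        "SELECT COUNT(DISTINCT b0.id) + COUNT(DISTINCT b1.id) " + joins
    ).fetchone()[0]
    con.close()
    return naive, distinct


print(fan_out_demo())  # (9, 6)
```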
You can use COUNT(DISTINCT xx) to count distinct values, so your query should be of the form:
SELECT p.name,
p.id,
COUNT(DISTINCT p.id) AS distinct_products,
COUNT(DISTINCT b.name) AS distinct_benchmark_names
-- etc
FROM ...
Other Notes
For correctness' sake, you should GROUP BY both p.id and p.name. MySQL allows grouping by p.id alone, but other RDBMSs are stricter.

Try this:
SELECT p.name,
p.id,
COUNT(b0.id) AS nb_benchmark
FROM product p
INNER JOIN benchmark b0 ON b0.product_id = p.id
WHERE p.owner = "MyCompany"
AND b0.year IN (2011, 2012)
GROUP BY p.name, p.id
ORDER BY nb_benchmark DESC

I have found a way to achieve what I wanted:
SELECT p.name, p.id, COUNT(DISTINCT(b0.id)) + COUNT(DISTINCT(b1.id)) as nb_benchmark
FROM product p
INNER JOIN benchmark b0 ON b0.product_id = p.id AND b0.year = 2011
INNER JOIN benchmark b1 ON b1.product_id = p.id AND b1.year = 2012
WHERE
p.owner = "myCompany"
GROUP BY p.id
ORDER BY nb_benchmark DESC

Try this.
SELECT p.id, p.name, b.nb_benchmark
FROM product p
JOIN (
/* number of benchmarks per product for years 2011 and 2012 */
SELECT product_id, COUNT(*) AS nb_benchmark
FROM benchmark
WHERE year = 2011 OR year = 2012
GROUP BY product_id
) b ON p.id = b.product_id
WHERE p.owner = "MyCompany"
ORDER BY nb_benchmark DESC
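The following sketch, again using SQLite through Python's sqlite3 as a stand-in for MySQL with hypothetical data, checks that the single-join query with year IN (2011, 2012) and the pre-aggregated derived table above return the same counts. Single quotes are used for the owner literal because standard SQL (and SQLite) treat double-quoted text as an identifier.

```python
import sqlite3


def count_benchmarks():
    """Run both counting strategies on made-up data and return their rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, owner TEXT);
        CREATE TABLE benchmark (id INTEGER PRIMARY KEY, product_id INTEGER, year INTEGER);
        INSERT INTO product VALUES (1, 'Alpha', 'MyCompany'), (2, 'Beta', 'MyCompany'),
                                   (3, 'Gamma', 'OtherCo');
        -- Alpha: 2010, 2011, 2012; Beta: 2011 only; Gamma belongs to another owner
        INSERT INTO benchmark (product_id, year) VALUES
            (1, 2010), (1, 2011), (1, 2012), (2, 2011), (3, 2012);
    """)
    # One join, filtered to the wanted years, so no row multiplication
    single_join = con.execute("""
        SELECT p.id, p.name, COUNT(b0.id) AS nb_benchmark
        FROM product p
        JOIN benchmark b0 ON b0.product_id = p.id
        WHERE p.owner = 'MyCompany' AND b0.year IN (2011, 2012)
        GROUP BY p.id, p.name
        ORDER BY nb_benchmark DESC, p.id
    """).fetchall()
    # Aggregate benchmark first, then join one summary row per product
    derived = con.execute("""
        SELECT p.id, p.name, b.nb_benchmark
        FROM product p
        JOIN (SELECT product_id, COUNT(*) AS nb_benchmark
              FROM benchmark
              WHERE year IN (2011, 2012)
              GROUP BY product_id) b ON p.id = b.product_id
        WHERE p.owner = 'MyCompany'
        ORDER BY b.nb_benchmark DESC, p.id
    """).fetchall()
    con.close()
    return single_join, derived


print(count_benchmarks())
# ([(1, 'Alpha', 2), (2, 'Beta', 1)], [(1, 'Alpha', 2), (2, 'Beta', 1)])
```

Both forms count a product that has a benchmark in only one of the two years (Beta in the sample data), whereas the question's original pair of INNER JOINs would drop such a product.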

Related

Get total sum and count of a column in MySql

Is a nested SELECT statement possible in SQL? I'm working on a problem and I can't seem to get the data that I want. This is the SQL that I'm querying:
SELECT derived.municipality, count(*) as counts, derived.bearing
from (SELECT m.name as municipality, count(*) as totalcount, sum(f.no_of_bearing_trees) as bearing
from farmer_profile f
inner join barangay b on f.barangay_id = b.id
inner join municipality m on b.municipality_id = m.id
inner join province p on m.province_id = p.id
group by b.name) as derived
group by derived.municipality, derived.bearing
Here is the sample data I'm working with. I want to get the sum of all the bearing values and the total counts when I put a WHERE clause at the bottom (e.g. where derived.bearing < 20): all the bearings below 20 should be totaled, along with their counts. I'm not sure whether another subquery is needed.

I suspect that you want to filter on municipalities whose bearing sum is less than 20. If so, you can use a having clause for this:
select
m.name as municipality,
count(*) as totalcount,
sum(f.no_of_bearing_trees) as bearing
from farmer_profile f
inner join barangay b on f.barangay_id = b.id
inner join municipality m on b.municipality_id = m.id
inner join province p on m.province_id = p.id
group by m.name
having sum(f.no_of_bearing_trees) < 20
MySQL extends standard SQL by allowing select-list aliases in the HAVING clause, so you can also write:
having bearing < 20
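A minimal sketch of the HAVING approach, run on SQLite through Python's sqlite3 with made-up rows. It groups by municipality, as the answer describes, and leaves out the province join because the sample has no province table. HAVING filters whole groups after aggregation, whereas a WHERE clause would filter individual farmer rows before they are summed.

```python
import sqlite3


def municipalities_below(limit):
    """Municipalities whose total bearing trees are below `limit` (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE municipality (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE barangay (id INTEGER PRIMARY KEY, name TEXT, municipality_id INTEGER);
        CREATE TABLE farmer_profile (id INTEGER PRIMARY KEY, barangay_id INTEGER,
                                     no_of_bearing_trees INTEGER);
        INSERT INTO municipality VALUES (1, 'North'), (2, 'South');
        INSERT INTO barangay VALUES (1, 'B1', 1), (2, 'B2', 1), (3, 'B3', 2);
        INSERT INTO farmer_profile (barangay_id, no_of_bearing_trees) VALUES
            (1, 5), (1, 4), (2, 6), (3, 30), (3, 2);
    """)
    rows = con.execute("""
        SELECT m.name AS municipality,
               COUNT(*) AS totalcount,
               SUM(f.no_of_bearing_trees) AS bearing
        FROM farmer_profile f
        JOIN barangay b ON f.barangay_id = b.id
        JOIN municipality m ON b.municipality_id = m.id
        GROUP BY m.name
        -- evaluated per group, after SUM() has been computed
        HAVING SUM(f.no_of_bearing_trees) < ?
        ORDER BY m.name
    """, (limit,)).fetchall()
    con.close()
    return rows


print(municipalities_below(20))  # [('North', 3, 15)]
```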

Mysql - optimisation - multiple group_concat & joins using having

I've looked at similar group_concat mysql optimisation threads but none seem relevant to my issue, and my mysql knowledge is being stretched with this one.
I have been tasked with improving the speed of a script with an extremely heavy Mysql query contained within.
The query in question uses GROUP_CONCAT to build lists of the colours, tags and sizes relevant to a particular product. It then uses HAVING / FIND_IN_SET to filter these concatenated lists for the attributes set by the user controls, and displays the results.
In the example below it's looking for all products with product_tag=1, product_colour=18 and product_size=17. So this could be a blue product (colour) in medium (size) for a male (tag).
The shop_products table contains about 3500 rows, so it is not particularly large, but the query below takes around 30 seconds to execute. It works OK with 1 or 2 joins, but adding the third just kills it.
SELECT shop_products.id, shop_products.name, shop_products.default_image_id,
GROUP_CONCAT( DISTINCT shop_product_to_colours.colour_id ) AS product_colours,
GROUP_CONCAT( DISTINCT shop_products_to_tag.tag_id ) AS product_tags,
GROUP_CONCAT( DISTINCT shop_product_colour_to_sizes.tag_id ) AS product_sizes
FROM shop_products
LEFT JOIN shop_product_to_colours ON shop_products.id = shop_product_to_colours.product_id
LEFT JOIN shop_products_to_tag ON shop_products.id = shop_products_to_tag.product_id
LEFT JOIN shop_product_colour_to_sizes ON shop_products.id = shop_product_colour_to_sizes.product_id
WHERE shop_products.category_id = '50'
GROUP BY shop_products.id
HAVING((FIND_IN_SET( 1, product_tags ) >0)
AND(FIND_IN_SET( 18, product_colours ) >0)
AND(FIND_IN_SET( 17, product_sizes ) >0))
ORDER BY shop_products.name ASC
LIMIT 0 , 30
I was hoping somebody could advise on a better way to structure this query without restructuring the database (which isn't really an option at this point without weeks of data migration and script changes), or give any general advice on optimisation. EXPLAIN currently returns the output below (as you can see, the indexes are all over the place!).
id | select_type | table                        | type | possible_keys                        | key         | key_len | ref                          | rows | Extra
1  | SIMPLE      | shop_products                | ref  | category_id,category_id_2            | category_id | 2       | const                        | 3225 | Using where; Using temporary; Using filesort
1  | SIMPLE      | shop_product_to_colours      | ref  | product_id,product_id_2,product_id_3 | product_id  | 4       | candymix_db.shop_products.id | 13   |
1  | SIMPLE      | shop_products_to_tag         | ref  | product_id,product_id_2              | product_id  | 4       | candymix_db.shop_products.id | 4    |
1  | SIMPLE      | shop_product_colour_to_sizes | ref  | product_id                           | product_id  | 4       | candymix_db.shop_products.id | 133  |

Rewrite the query to use WHERE instead of HAVING. WHERE is applied while MySQL searches for rows, so it can use indexes. HAVING is applied after the rows have been selected, to filter the already-selected result, so by design it can't use indexes.
You can do it, for example, this way:
SELECT p.id, p.name, p.default_image_id,
GROUP_CONCAT( DISTINCT pc.colour_id ) AS product_colours,
GROUP_CONCAT( DISTINCT pt.tag_id ) AS product_tags,
GROUP_CONCAT( DISTINCT ps.tag_id ) AS product_sizes
FROM shop_products p
JOIN shop_product_to_colours pc_test ON p.id = pc_test.product_id AND pc_test.colour_id = 18
JOIN shop_products_to_tag pt_test ON p.id = pt_test.product_id AND pt_test.tag_id = 1
JOIN shop_product_colour_to_sizes ps_test ON p.id = ps_test.product_id AND ps_test.tag_id = 17
JOIN shop_product_to_colours pc ON p.id = pc.product_id
JOIN shop_products_to_tag pt ON p.id = pt.product_id
JOIN shop_product_colour_to_sizes ps ON p.id = ps.product_id
WHERE p.category_id = '50'
GROUP BY p.id
ORDER BY p.name ASC
Update
We are joining each table twice.
The first join checks whether the product has the required value (the condition that FIND_IN_SET used to test).
The second join provides all of the product's values for GROUP_CONCAT.
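Here is a small sketch of the double-join idea, run on SQLite through Python's sqlite3 with hypothetical rows (SQLite has GROUP_CONCAT but no FIND_IN_SET, which this form no longer needs). The *_test joins act as the filter, and the second set of joins supplies the full value lists for the products that survive.

```python
import sqlite3


def filtered_products():
    """Products having colour 18, tag 1 and size 17, with all their values listed."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE shop_products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
        CREATE TABLE shop_product_to_colours (product_id INTEGER, colour_id INTEGER);
        CREATE TABLE shop_products_to_tag (product_id INTEGER, tag_id INTEGER);
        CREATE TABLE shop_product_colour_to_sizes (product_id INTEGER, tag_id INTEGER);
        INSERT INTO shop_products VALUES (1, 'Shirt', 50), (2, 'Hat', 50);
        -- product 1 has colour 18, tag 1 and size 17; product 2 lacks size 17
        INSERT INTO shop_product_to_colours VALUES (1, 18), (1, 3), (2, 18);
        INSERT INTO shop_products_to_tag VALUES (1, 1), (1, 2), (2, 1);
        INSERT INTO shop_product_colour_to_sizes VALUES (1, 17), (1, 16), (2, 16);
    """)
    rows = con.execute("""
        SELECT p.id, p.name,
               GROUP_CONCAT(DISTINCT pc.colour_id),
               GROUP_CONCAT(DISTINCT pt.tag_id),
               GROUP_CONCAT(DISTINCT ps.tag_id)
        FROM shop_products p
        -- test joins: keep only products that have every requested value
        JOIN shop_product_to_colours pc_test
          ON p.id = pc_test.product_id AND pc_test.colour_id = 18
        JOIN shop_products_to_tag pt_test
          ON p.id = pt_test.product_id AND pt_test.tag_id = 1
        JOIN shop_product_colour_to_sizes ps_test
          ON p.id = ps_test.product_id AND ps_test.tag_id = 17
        -- list joins: collect every value of the surviving products
        JOIN shop_product_to_colours pc ON p.id = pc.product_id
        JOIN shop_products_to_tag pt ON p.id = pt.product_id
        JOIN shop_product_colour_to_sizes ps ON p.id = ps.product_id
        WHERE p.category_id = 50
        GROUP BY p.id, p.name
        ORDER BY p.name
    """).fetchall()
    con.close()
    # GROUP_CONCAT order is unspecified, so turn each list into a sorted tuple
    return [(pid, name, *(tuple(sorted(int(v) for v in s.split(","))) for s in lists))
            for pid, name, *lists in rows]


print(filtered_products())  # [(1, 'Shirt', (3, 18), (1, 2), (16, 17))]
```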
Update 2
As @Matt Raines commented, if we don't need to list the product values with GROUP_CONCAT, the query becomes even simpler:
SELECT p.id, p.name, p.default_image_id
FROM shop_products p
JOIN shop_product_to_colours pc ON p.id = pc.product_id
JOIN shop_products_to_tag pt ON p.id = pt.product_id
JOIN shop_product_colour_to_sizes ps ON p.id = ps.product_id
WHERE p.category_id = '50'
AND (pc.colour_id = 18 AND pt.tag_id = 1 AND ps.tag_id = 17)
GROUP BY p.id
ORDER BY p.name ASC
This will select all products that have all three filtered attributes.

If I understand this question correctly, what you need to do is:
Find a list of all of the shop_products.id values that have the correct tag/colour/size options.
Get a list of all of the tag/colour/size combinations available for those product ids.
I was trying to make you an SQLFiddle for this, but the site seems broken at the moment. Try something like:
SELECT shop_products.id, shop_products.name, shop_products.default_image_id,
GROUP_CONCAT( DISTINCT shop_product_to_colours.colour_id ) AS product_colours,
GROUP_CONCAT( DISTINCT shop_products_to_tag.tag_id ) AS product_tags,
GROUP_CONCAT( DISTINCT shop_product_colour_to_sizes.tag_id ) AS product_sizes
FROM
shop_products INNER JOIN
(SELECT DISTINCT shop_products.id AS id
FROM
shop_products
LEFT JOIN shop_product_to_colours ON shop_products.id = shop_product_to_colours.product_id
LEFT JOIN shop_products_to_tag ON shop_products.id = shop_products_to_tag.product_id
LEFT JOIN shop_product_colour_to_sizes ON shop_products.id = shop_product_colour_to_sizes.product_id
WHERE
shop_products.category_id = '50'
AND shop_products_to_tag.tag_id = 1
AND shop_product_to_colours.colour_id = 18
AND shop_product_colour_to_sizes.tag_id = 17
) matches ON shop_products.id = matches.id
LEFT JOIN shop_product_to_colours ON shop_products.id = shop_product_to_colours.product_id
LEFT JOIN shop_products_to_tag ON shop_products.id = shop_products_to_tag.product_id
LEFT JOIN shop_product_colour_to_sizes ON shop_products.id = shop_product_colour_to_sizes.product_id
GROUP BY shop_products.id
ORDER BY shop_products.name ASC
LIMIT 0 , 30;
The problem with your first approach is that it requires the database to build every combination of values for every product and only then filter. In my example, I'm filtering down the product ids first and then generating the combinations.
My query is untested, as I don't have a MySQL environment handy and SQLFiddle is down, but it should give you the idea.
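As a rough back-of-the-envelope check of that explanation, multiplying the rows estimates from the question's EXPLAIN output gives the order of magnitude of joined rows MySQL may have to build before grouping. These figures are optimizer estimates per lookup, so treat the result as an approximation rather than an exact count.

```python
from math import prod

# "rows" column from the question's EXPLAIN output, one entry per joined table
explain_rows = {
    "shop_products": 3225,
    "shop_product_to_colours": 13,
    "shop_products_to_tag": 4,
    "shop_product_colour_to_sizes": 133,
}

# Each join multiplies the rows per product, so the number of joined rows
# built before GROUP BY / HAVING is roughly the product of the estimates.
estimated_joined_rows = prod(explain_rows.values())
print(estimated_joined_rows)  # 22304100
```

Roughly 22 million intermediate rows for about 3 thousand products is consistent with the slowdown the asker sees when the third join is added.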

First, I aliased your tables to improve readability.
SP = shop_products
PC = shop_product_to_colours
PT = shop_products_to_tag
PS = shop_product_colour_to_sizes
Next, your HAVING should be a WHERE, since you are explicitly looking for specific values. There is no need to query the entire system only to throw records away after the result has been built. Third, you had LEFT JOINs, but when a WHERE or HAVING condition on the joined table does not allow for NULL, the LEFT JOIN effectively becomes an inner JOIN (both sides required). Finally, your WHERE clause has quotes around the category ID you are looking for, but that column is probably an integer anyhow, so remove the quotes.
Now, for indexes and optimization. To help with the criteria, grouping and JOINs, I would have the following composite (multi-column) indexes instead of indexes on individual columns only.
table                          index
shop_products                  ( category_id, id, name )
shop_product_to_colours        ( product_id, colour_id )
shop_products_to_tag           ( product_id, tag_id )
shop_product_colour_to_sizes   ( product_id, tag_id )
Revised query
SELECT
SP.id,
SP.name,
SP.default_image_id,
GROUP_CONCAT( DISTINCT PC.colour_id ) AS product_colours,
GROUP_CONCAT( DISTINCT PT.tag_id ) AS product_tags,
GROUP_CONCAT( DISTINCT PS.tag_id ) AS product_sizes
FROM
shop_products SP
JOIN shop_product_to_colours PC
ON SP.id = PC.product_id
AND PC.colour_id = 18
JOIN shop_products_to_tag PT
ON SP.id = PT.product_id
AND PT.tag_id = 1
JOIN shop_product_colour_to_sizes PS
ON SP.id = PS.product_id
AND PS.tag_id = 17
WHERE
SP.category_id = 50
GROUP BY
SP.id
ORDER BY
SP.name ASC
LIMIT
0 , 30
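A sketch of the composite indexes together with the revised query (using the name-plus-id grouping from the final comment), run on SQLite through Python's sqlite3. The index names (idx_sp and so on) and the sample rows are hypothetical, and SQLite's EXPLAIN QUERY PLAN output is not the same as MySQL's EXPLAIN, so this only illustrates that the lookups can go through the composite indexes.

```python
import sqlite3


def revised_query_with_indexes():
    """Return (result rows, query-plan detail lines) for the revised query."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE shop_products (id INTEGER PRIMARY KEY, name TEXT,
                                    category_id INTEGER, default_image_id INTEGER);
        CREATE TABLE shop_product_to_colours (product_id INTEGER, colour_id INTEGER);
        CREATE TABLE shop_products_to_tag (product_id INTEGER, tag_id INTEGER);
        CREATE TABLE shop_product_colour_to_sizes (product_id INTEGER, tag_id INTEGER);
        -- composite indexes as suggested above (index names are hypothetical)
        CREATE INDEX idx_sp ON shop_products (category_id, name, id);
        CREATE INDEX idx_pc ON shop_product_to_colours (product_id, colour_id);
        CREATE INDEX idx_pt ON shop_products_to_tag (product_id, tag_id);
        CREATE INDEX idx_ps ON shop_product_colour_to_sizes (product_id, tag_id);
        INSERT INTO shop_products VALUES (1, 'Shirt', 50, 10), (2, 'Hat', 50, 11),
                                         (3, 'Sock', 7, 12);
        INSERT INTO shop_product_to_colours VALUES (1, 18), (2, 18), (3, 18);
        INSERT INTO shop_products_to_tag VALUES (1, 1), (2, 1), (3, 1);
        INSERT INTO shop_product_colour_to_sizes VALUES (1, 17), (2, 16), (3, 17);
    """)
    sql = """
        SELECT SP.id, SP.name, SP.default_image_id,
               GROUP_CONCAT(DISTINCT PC.colour_id) AS product_colours,
               GROUP_CONCAT(DISTINCT PT.tag_id) AS product_tags,
               GROUP_CONCAT(DISTINCT PS.tag_id) AS product_sizes
        FROM shop_products SP
        JOIN shop_product_to_colours PC ON SP.id = PC.product_id AND PC.colour_id = 18
        JOIN shop_products_to_tag PT ON SP.id = PT.product_id AND PT.tag_id = 1
        JOIN shop_product_colour_to_sizes PS ON SP.id = PS.product_id AND PS.tag_id = 17
        WHERE SP.category_id = 50
        GROUP BY SP.name, SP.id
        ORDER BY SP.name, SP.id
        LIMIT 30
    """
    rows = con.execute(sql).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text
    plan = [r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql)]
    con.close()
    return rows, plan


rows, plan = revised_query_with_indexes()
print(rows)  # [(1, 'Shirt', 10, '18', '1', '17')]
for line in plan:
    print(line)
```

The sample also makes one property of this query visible: because PC, PT and PS are joined only on the requested values, the GROUP_CONCAT columns can only ever contain 18, 1 and 17. If the full lists of a product's colours, tags and sizes are needed, the second set of unfiltered joins from the first answer is still required.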
One final comment. Since you are ordering by the name but grouping by the id, the final sort may cause a delay. However, if you change it to group by the name plus the id, the groups will still be unique by id, and an adjusted index on your shop_products table
table            index
shop_products    ( category_id, name, id )
will help both the grouping and the ordering, since the rows will already be in that order in the index.
GROUP BY
SP.name,
SP.id
ORDER BY
SP.name ASC,
SP.id

How to retrieve sql results with different calculated values for same column with join and group by?

Hi. Below I have written queries to retrieve total hours, last-month total hours and current-month total hours. All of these are calculated from the hours and spent_on columns of the time_entries table. Sorry if the table formatting is not good.
The following three queries give correct results.
Query#1
select p.name,
FORMAT(sum(te.hours), 2) AS totalhours
from projects p
left join time_entries te on p.id = te.project_id
group by p.id
Result#1
name        totalhours
----------  ----------
project A   4932.18
project B   534.02
Query#2
select p.name,
FORMAT(sum(te_last_mo.hours), 2) AS totalhours_last_mo
from projects p
left join time_entries te_last_mo on p.id = te_last_mo.project_id
where te_last_mo.spent_on >= DATE_FORMAT(NOW() - INTERVAL 1 MONTH, '%Y-%m-01') and te_last_mo.spent_on < DATE_FORMAT(NOW(), '%Y-%m-01')
group by p.id
Result#2
name        total_hours_last_mo
----------  -------------------
project A   1726.72
project B   157.75
Query#3
select p.name,
FORMAT(sum(te_this_mo.hours), 2) AS totalhours_this_mo
from projects p
left join time_entries te_this_mo on p.id=te_this_mo.project_id
where te_this_mo.spent_on >= DATE_FORMAT(NOW(), '%Y-%m-01') and te_this_mo.spent_on < DATE_FORMAT(NOW() + INTERVAL 1 MONTH, '%Y-%m-01')
group by p.id
Result#3
name        total_hours_this_mo
----------  -------------------
project A   421.19
project B   41.26
The above results and query are correct.
Now I want a result like this, but I can't figure out how:
name        total_hours  total_hours_last_mo  total_hours_this_mo
----------  -----------  -------------------  -------------------
project A   4932.18      1726.72              421.19
project B   534.02       157.75               41.26
To combine these three hour columns I wrote the query below, but it returns wrong results, maybe because it joins the same table three times.
select p.name,
FORMAT(sum(te.hours), 2) AS totalhours,
FORMAT(sum(te_last_mo.hours), 2) AS totalhours_last_mo,
FORMAT(sum(te_this_mo.hours), 2) AS totalhours_this_mo
from projects p
left join time_entries te on p.id = te.project_id
left join time_entries te_last_mo on p.id = te_last_mo.project_id
and te_last_mo.spent_on >= DATE_FORMAT(NOW() - INTERVAL 1 MONTH, '%Y-%m-01') and te_last_mo.spent_on < DATE_FORMAT(NOW(), '%Y-%m-01')
left join time_entries te_this_mo on p.id = te_this_mo.project_id
where te_this_mo.spent_on >= DATE_FORMAT(NOW(), '%Y-%m-01') and te_this_mo.spent_on < DATE_FORMAT(NOW() + INTERVAL 1 MONTH, '%Y-%m-01')
group by p.id
Any solution would be appreciated. Thanks in advance.

Run the query with the joins but without the aggregations to see how those joins combine and why they lead to wrong results.
You can achieve the desired result by using a single join and moving the date criteria into the aggregate calculations:
select p.name,
FORMAT(sum(te.hours), 2) AS totalhours,
FORMAT(sum(
IF(spent_on >= DATE_FORMAT(NOW() - INTERVAL 1 MONTH, '%Y-%m-01') and spent_on < DATE_FORMAT(NOW(), '%Y-%m-01'),
hours, NULL)
), 2) AS totalhours_last_mo,
FORMAT(sum(
IF(spent_on >= DATE_FORMAT(NOW(), '%Y-%m-01') and spent_on < DATE_FORMAT(NOW() + INTERVAL 1 MONTH, '%Y-%m-01'),
hours, NULL)
), 2) AS totalhours_this_mo
from projects p
left join time_entries te on p.id = te.project_id
group by p.id
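A sketch comparing the asker's triple join with this conditional-aggregation form, on SQLite through Python's sqlite3 with made-up time entries. SQLite has no IF() or FORMAT(), so CASE WHEN stands in for IF() and the sums are left unformatted; date(..., 'start of month') computes the month boundaries from a fixed today parameter instead of NOW(), so the output is reproducible.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE time_entries (project_id INTEGER, spent_on TEXT, hours REAL);
    INSERT INTO projects VALUES (1, 'project A'), (2, 'project B');
    INSERT INTO time_entries VALUES
        (1, '2024-03-10', 5), (1, '2024-04-02', 2), (1, '2024-04-30', 3),
        (1, '2024-05-06', 4), (2, '2024-04-15', 1);
"""

# Month boundaries relative to :today (stand-ins for the DATE_FORMAT(NOW() ...) calls)
LAST_START = "date(:today, 'start of month', '-1 month')"
THIS_START = "date(:today, 'start of month')"
NEXT_START = "date(:today, 'start of month', '+1 month')"

# The asker's combined query: the three joins multiply each other's rows
TRIPLE_JOIN = f"""
    SELECT p.name, SUM(te.hours), SUM(te_last_mo.hours), SUM(te_this_mo.hours)
    FROM projects p
    LEFT JOIN time_entries te ON p.id = te.project_id
    LEFT JOIN time_entries te_last_mo ON p.id = te_last_mo.project_id
         AND te_last_mo.spent_on >= {LAST_START} AND te_last_mo.spent_on < {THIS_START}
    LEFT JOIN time_entries te_this_mo ON p.id = te_this_mo.project_id
    WHERE te_this_mo.spent_on >= {THIS_START} AND te_this_mo.spent_on < {NEXT_START}
    GROUP BY p.id, p.name
    ORDER BY p.id
"""

# One join; the month criteria move into the aggregates (CASE WHEN instead of IF)
CONDITIONAL = f"""
    SELECT p.name,
           SUM(te.hours),
           SUM(CASE WHEN te.spent_on >= {LAST_START} AND te.spent_on < {THIS_START}
                    THEN te.hours END),
           SUM(CASE WHEN te.spent_on >= {THIS_START} AND te.spent_on < {NEXT_START}
                    THEN te.hours END)
    FROM projects p
    LEFT JOIN time_entries te ON p.id = te.project_id
    GROUP BY p.id, p.name
    ORDER BY p.id
"""


def run(sql, today="2024-05-20"):
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(sql, {"today": today}).fetchall()
    con.close()
    return rows


print(run(TRIPLE_JOIN))  # [('project A', 28.0, 20.0, 32.0)]
print(run(CONDITIONAL))  # [('project A', 14.0, 5.0, 4.0), ('project B', 1.0, 1.0, None)]
```

In the triple join, project A's 4 entries meet 2 last-month rows and 1 this-month row, so every sum is inflated, and project B disappears because the WHERE on te_this_mo removes its NULL row. The conditional form reads each entry once.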

You could dump the results of the three queries into temporary tables (making sure every project is present in each) and then join those tables on the project to pull it all together.

Try something like:
select A.name, A.totalhours, B.totalhours_last_mo, C.totalhours_this_mo
from
(select p.id, p.name,
FORMAT(sum(te.hours), 2) AS totalhours
from projects p
left join time_entries te on p.id = te.project_id
group by p.id, p.name) AS A
LEFT JOIN
(select p.id,
FORMAT(sum(te_last_mo.hours), 2) AS totalhours_last_mo
from projects p
inner join time_entries te_last_mo on p.id = te_last_mo.project_id
where te_last_mo.spent_on >= DATE_FORMAT(NOW() - INTERVAL 1 MONTH, '%Y-%m-01') and te_last_mo.spent_on < DATE_FORMAT(NOW(), '%Y-%m-01')
group by p.id) AS B
ON A.id = B.id
LEFT JOIN
(select p.id,
FORMAT(sum(te_this_mo.hours), 2) AS totalhours_this_mo
from projects p
inner join time_entries te_this_mo on p.id = te_this_mo.project_id
where te_this_mo.spent_on >= DATE_FORMAT(NOW(), '%Y-%m-01') and te_this_mo.spent_on < DATE_FORMAT(NOW() + INTERVAL 1 MONTH, '%Y-%m-01')
group by p.id) AS C
ON A.id = C.id
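The derived-table idea (one pre-aggregated row per project, joined back together) can be sketched on SQLite through Python's sqlite3 with made-up data; the subqueries play the role of the temporary tables, and a fixed today parameter replaces NOW(). Each derived table has at most one row per project, so the LEFT JOINs cannot multiply rows, and joining on the project id rather than the name avoids mixing up two projects that share a name.

```python
import sqlite3


def hours_via_derived_tables(today="2024-05-20"):
    """Total, last-month and this-month hours per project (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE time_entries (project_id INTEGER, spent_on TEXT, hours REAL);
        INSERT INTO projects VALUES (1, 'project A'), (2, 'project B');
        INSERT INTO time_entries VALUES
            (1, '2024-03-10', 5), (1, '2024-04-02', 2), (1, '2024-04-30', 3),
            (1, '2024-05-06', 4), (2, '2024-04-15', 1);
    """)
    rows = con.execute("""
        SELECT A.name, A.totalhours, B.totalhours_last_mo, C.totalhours_this_mo
        FROM (SELECT p.id, p.name, SUM(te.hours) AS totalhours
              FROM projects p LEFT JOIN time_entries te ON p.id = te.project_id
              GROUP BY p.id, p.name) AS A
        LEFT JOIN (SELECT project_id, SUM(hours) AS totalhours_last_mo
                   FROM time_entries
                   WHERE spent_on >= date(:today, 'start of month', '-1 month')
                     AND spent_on <  date(:today, 'start of month')
                   GROUP BY project_id) AS B ON B.project_id = A.id
        LEFT JOIN (SELECT project_id, SUM(hours) AS totalhours_this_mo
                   FROM time_entries
                   WHERE spent_on >= date(:today, 'start of month')
                     AND spent_on <  date(:today, 'start of month', '+1 month')
                   GROUP BY project_id) AS C ON C.project_id = A.id
        ORDER BY A.id
    """, {"today": today}).fetchall()
    con.close()
    return rows


print(hours_via_derived_tables())
# [('project A', 14.0, 5.0, 4.0), ('project B', 1.0, 1.0, None)]
```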

How can I get the sum of a column?

I have 3 tables: activities, tasks and requirements. I want to return the total duration of all the tasks for a specific requirement. This is my query:
SELECT r.id as req_id,
r.project_id,
r.name as req_name,
r.cost,r.estimated,
p.name as project_name,
v.name AS `status` ,
t.taskid,
(SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(duration)))
FROM activities a
WHERE a.taskid = t.taskid) AS worked
FROM requirements r
INNER JOIN projects p
ON p.projectid = r.project_id
INNER JOIN `values` v
ON v.id = r.r_status_id
LEFT JOIN tasks t
on t.id_requirement = r.id
WHERE 1 = 1
ORDER BY req_id desc
In the result, req_id 48 appears twice (the last two rows). I want it to appear only once, with worked equal to the sum of those two rows. How can I manage that?

Include your activities table in the JOIN, GROUP BY all the requirement columns you need, and add a SUM. Since you are aggregating tasks, you cannot have taskid in the SELECT clause.
SELECT r.id as req_id,
r.project_id,
r.name as req_name,
r.cost,r.estimated,
p.name as project_name,
v.name AS `status` ,
SEC_TO_TIME(SUM(TIME_TO_SEC(a.duration))) AS worked
FROM requirements r
INNER JOIN projects p ON p.projectid = r.project_id
INNER JOIN `values` v ON v.id = r.r_status_id
LEFT JOIN tasks t ON t.id_requirement = r.id
LEFT JOIN activities a ON a.taskid=t.taskid
WHERE 1 = 1
GROUP BY r.id, r.project_id, r.name,r.cost,r.estimated,p.name, v.name
ORDER BY req_id desc
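A minimal sketch of this GROUP BY approach on SQLite through Python's sqlite3. SQLite has no TIME_TO_SEC or SEC_TO_TIME, so the sample assumes a hypothetical duration_sec column holding seconds and formats the total in Python; the tables are cut down to the columns the aggregation needs, and the data is invented.

```python
import sqlite3


def sec_to_time(seconds):
    """Format seconds as HH:MM:SS, similar to MySQL's SEC_TO_TIME for small values."""
    if seconds is None:
        return None
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def worked_per_requirement():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE requirements (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tasks (taskid INTEGER PRIMARY KEY, id_requirement INTEGER);
        CREATE TABLE activities (id INTEGER PRIMARY KEY, taskid INTEGER,
                                 duration_sec INTEGER);
        INSERT INTO requirements VALUES (47, 'Login'), (48, 'Reports');
        -- requirement 48 has two tasks, requirement 47 has none
        INSERT INTO tasks VALUES (1, 48), (2, 48);
        INSERT INTO activities (taskid, duration_sec) VALUES (1, 3600), (1, 1800), (2, 900);
    """)
    rows = con.execute("""
        SELECT r.id AS req_id, r.name AS req_name, SUM(a.duration_sec) AS worked
        FROM requirements r
        LEFT JOIN tasks t ON t.id_requirement = r.id
        LEFT JOIN activities a ON a.taskid = t.taskid
        GROUP BY r.id, r.name
        ORDER BY req_id DESC
    """).fetchall()
    con.close()
    return [(req_id, name, sec_to_time(worked)) for req_id, name, worked in rows]


print(worked_per_requirement())  # [(48, 'Reports', '01:45:00'), (47, 'Login', None)]
```

Requirement 48 now appears once with the combined time of both tasks, and the LEFT JOINs keep requirement 47 even though it has no tasks.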

The joins in your query appear to be creating extra rows. There is probably a way to fix the logic directly, possibly by pre-aggregating some results in the FROM clause.
Your duplicates appear to be complete duplicates (every column is exactly the same). The easy way to fix the problem is to use SELECT DISTINCT, so just start your query with:
SELECT DISTINCT r.id as req_id, r.project_id, r.name as req_name,
. . .
I suspect that one of your underlying tables has duplicated rows that you are not expecting, but that is another issue.

mysql group by not doing what I am expecting?

SELECT
bp.product_id,bs.step_number,
p.price, pd.name as product_name
FROM
builder_product bp
JOIN builder_step bs ON bp.builder_step_id = bs.builder_step_id
JOIN builder b ON bp.builder_id = b.builder_id
JOIN product p ON p.product_id = bp.product_id
JOIN product_description pd ON p.product_id = pd.product_id
WHERE b.builder_id = '74' and bs.optional != '1'
group by bs.step_number
ORDER by bs.step_number, p.price
but here are my results:
product_id  step_number  price     product_name
88          1            575.0000  Lenovo Thinkcentre POS PC
244         2            559.0000  Touchscreen with MSR - Firebox 15"
104         3            285.0000  Remote Order Printer - Epson
97          4            395.0000  Aldelo Lite
121         5            549.0000  Cash Register Express - Pro
191         6            349.0000  Integrated Payment Processing
155         7            369.0000  Accessory - Posiflex 12.1" LCD Customer Display

That's not how GROUP BY is supposed to work. If you group by a number of columns, your select can only return:
The columns you group by
Aggregate functions on other columns, such as MIN(), MAX(), AVG()...
So you'd need to do this:
SELECT
bs.step_number,
MIN(p.price) AS min_price, pd.name as product_name
FROM
builder_product bp
JOIN builder_step bs ON bp.builder_step_id = bs.builder_step_id
JOIN builder b ON bp.builder_id = b.builder_id
JOIN product p ON p.product_id = bp.product_id
JOIN product_description pd ON p.product_id = pd.product_id
WHERE b.builder_id = '74' and bs.optional != '1'
group by bs.step_number, pd.name
ORDER by bs.step_number, min_price
(MySQL's relaxed syntax accepts your original query and returns values from an arbitrary row of each group for the non-grouped columns, but other DBMSs, and MySQL itself when ONLY_FULL_GROUP_BY is enabled, will reject it with an error.)

Join to a subselect of the table that contains only the minimum value of each group.
In this example, MIN(Amt) per myGroup returns the lowest dollar amount in each group.
I then join this back to the main table with an inner join, limiting the records to those that match that minimum.
Select A.myGROUP, A.Amt
from mtest A
INNER JOIN (Select myGroup, min(Amt) as minAmt from mtest group by mygroup) B
ON B.myGroup=A.mygroup
and B.MinAmt = A.Amt
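A sketch of this groupwise-minimum join on SQLite through Python's sqlite3. The item column and the rows are hypothetical, added so that the result shows which row holds the minimum; in the question above, that would be the product for each step_number.

```python
import sqlite3


def cheapest_per_group():
    """Rows whose Amt equals the minimum Amt of their myGroup (made-up data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE mtest (myGroup TEXT, item TEXT, Amt REAL);
        INSERT INTO mtest VALUES
            ('a', 'x', 5), ('a', 'y', 3), ('b', 'z', 7), ('b', 'w', 7), ('c', 'v', 1);
    """)
    rows = con.execute("""
        SELECT A.myGroup, A.item, A.Amt
        FROM mtest A
        -- B holds one row per group: the group's minimum amount
        JOIN (SELECT myGroup, MIN(Amt) AS minAmt
              FROM mtest
              GROUP BY myGroup) B
          ON B.myGroup = A.myGroup
         AND B.minAmt = A.Amt
        ORDER BY A.myGroup, A.item
    """).fetchall()
    con.close()
    return rows


print(cheapest_per_group())
# [('a', 'y', 3.0), ('b', 'w', 7.0), ('b', 'z', 7.0), ('c', 'v', 1.0)]
```

When two rows in a group tie for the minimum (group b here), the join returns both, so add a tie-breaker if exactly one row per group is required.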

Yes. Each distinct group key is returned only once, and this problem is not easily solved. Run two separate queries and combine the results afterwards. If this is not an option, create a temporary table holding the minimum price for each step and join it to the tables in the query.
