For the longest time I couldn't figure out why an incorrect COUNT(*) values were being returned. After incrementally removing parts of my query I finally realized that joining tables were the reason behind the incorrect values.
This is the query I'm working with:
SELECT `profiles`.`logo` AS logo,
`companies`.`company_name`,
`companies`.`url_slug`,
count(*)
FROM (`companies`)
JOIN `users` ON `users`.`id` = `companies`.`user_id`
JOIN `categories` ON `categories`.`company_id` = `companies`.`id`
JOIN `products` ON `products`.`company_id` = `companies`.`id`
JOIN `profiles` ON `profiles`.`company_id` = `companies`.`id`
WHERE `users`.`last_login` IS NOT NULL
AND `categories`.`category_id` = '3'
AND `products`.`active` = 1
AND `products`.`xmp_1` = 1
AND `products`.`xmp_2` = 1
AND `profiles`.`field_a` = 1
GROUP BY `companies`.`id`
Running this in my SQL program returns 28 rows, but the COUNT(*) row returns something like 1400. I'm not sure where to head from here. I need a column returned that returns the 28 instead of 1400.
SQL Fiddle with sample data: http://sqlfiddle.com/#!2/97d2d/9
There's no way to do this with a simple query. Chris Hayes shows a way to do this with sub-queries and there are a variety of ways to do that.
The reason is that aggregation functions (like COUNT) only work on the GROUP that a row represents... in this case, each row represents the set of results that have the same company id. There's no way for field in a row to show aggregation information that extends beyond its content, in a simple query.
With subqueries, you can generate the first table, and then aggregate over that. Or, you could just use the fact that the total number of rows returned is included as metadata on the data that is sent back to whichever client submitted the query in the first place, thereby removing the need to include it in the data.
The following query works:
SELECT *, COUNT(*) FROM (
SELECT `profiles`.`logo` AS logo,
`companies`.`company_name`,
`companies`.`url_slug`
FROM (`companies`)
JOIN `users` ON `users`.`id` = `companies`.`user_id`
JOIN `categories` ON `categories`.`company_id` = `companies`.`id`
JOIN `products` ON `products`.`company_id` = `companies`.`id`
JOIN `profiles` ON `profiles`.`company_id` = `companies`.`id`
WHERE `users`.`last_login` IS NOT NULL
#AND `categories`.`category_id` = '3'
AND `products`.`active` = 1
#AND `products`.`xmp_1` = 1
#AND `products`.`xmp_2` = 1
AND `profiles`.`field_a` = 1
GROUP BY `companies`.`id`
) AS stuff
I think somehow your original COUNT(*) is looking at pre-join-condition or pre-grouping amounts.
Related
I have this two version of the same query. Both produce same results (164 rows). But the second one takes .5 sec while the 1st one takes 17 sec. Can someone explain what's going on here?
TABLE organizations : 11988 ROWS
TABLE transaction_metas : 58232 ROWS
TABLE contracts_history : 219469 ROWS
# TAKES 17 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
# TAKES .6 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
left join (select * from `transaction_metas` where contract_token in (select token from `contracts_history` where seller_id = 850)) as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Explain Results:
First Query: https://prnt.sc/hjtiw6
Second Query: https://prnt.sc/hjtjjg
As based on my debugging of the first query it was clear that left join to transaction_metas table was making it slow, So I tried to limit its rows instead of joining to the full table. It seems to work but I don't understand why.
Join is a set of combinations from rows in your tables. That in mind, in the first query the engine combines all the results to filter just after. In second case one it applies the filter before it tries make the combinations.
The best case would make use of filter in JOIN clause without subquery.
Much like this:
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
AND `contracts_history`.`seller_id` = '850'
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token`
AND `tm`.`field` = 1
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Note: When you reduce the size of the join tables by filtering with subqueries, it may allow the rows fit into the buffer. Nice trick to small buffer limit.
A Better explication:
https://dev.mysql.com/doc/refman/5.5/en/explain-output.html
I am attempting to count the number of rows from a given query. But count returns more rows than it should. What is happening?
This query returns only 1 row.
select *
from `opportunities`
inner join `companies` on `opportunities`.`company_id` = `companies`.`id`
left join `opportunityTags` on `opportunities`.`id` = `opportunityTags`.`opportunity_id`
where `opportunities`.`isPublished` = '1' and `opportunities`.`Company_id` = '1'
group by `opportunities`.`id` ;
This query returns that there are 3 rows.
select count(*) as aggregate
from `opportunities`
inner join `companies` on `opportunities`.`company_id` = `companies`.`id`
left join `opportunityTags` on `opportunities`.`id` = `opportunityTags`.`opportunity_id`
where `opportunities`.`isPublished` = '1' and `opportunities`.`Company_id` = '1'
group by `opportunities`.`id`;
When you select count(*) it is counting before the group by. You can probably (unfortunately my realm is SQL Server and I don't have a mySQL instance to test) fix this by using the over() function.
For example:
select count(*) over (partition by `opportunities`.`id`)
EDIT: Actually doesn't look like this is available in mySQL, my bad. How about just wrapping the whole thing in a new select statement? It's not the most elegant solution, but will give you the figure you're after.
I have a MySQL-question which I can't seem to figure out.
I have two query's and some PHP and I think it should be possible to merge this into one query.
The thing I want to do is: show all users that completed all modules of a certain programm with id=1
Query 1 -> this gives all modules of the programm:
SELECT `id`
FROM `modules`
JOIN `linkModuleProgramm` ON `module`.`id` = `linkModuleProgramm`.`module`
WHERE `linkModuleProgramm`.`programm` = 1
Query 2 -> this gives the user and the number of modules of the programm that are completed by this user:
SELECT `user`.`id`,
COUNT(*) as 'count'
FROM `user`
JOIN `linkUserModule` ON `user`.`id` = `linkUserModule`.`user`
WHERE `linkUserModule`.`status` = 1
AND `linkUserModule`.`module` IN (Query1)
GROUP BY `user`.`id`
Then PHP filters the users from Query2 where 'count' is equal to the number of results from Query1.
Anyone has a suggestion?
Looks like I haven't understood your question right the first time. Another try, using two queries and MySQL variable:
SET #MODULESCOUNT = (
SELECT COUNT(*)
FROM `modules`
JOIN `linkModuleProgramm` ON `module`.`id` = `linkModuleProgramm`.`module`
WHERE `linkModuleProgramm`.`programm` = 1
);
SELECT `user`.`id`,
FROM `user`
JOIN `linkUserModule` ON `user`.`id` = `linkUserModule`.`user`
WHERE `linkUserModule`.`status` = 1
AND `linkUserModule`.`module` IN (Query1)
GROUP BY `user`.`id`
HAVING COUNT(*)=#MODULESCOUNT ;
There is also a possibility to use 'subquery' instead of variable:
SELECT `user`.`id`,
FROM `user`
JOIN `linkUserModule` ON `user`.`id` = `linkUserModule`.`user`
WHERE `linkUserModule`.`status` = 1
AND `linkUserModule`.`module` IN (Query1)
GROUP BY `user`.`id`
HAVING COUNT(*)=
(
SELECT COUNT(*)
FROM `modules`
JOIN `linkModuleProgramm` ON `module`.`id` = `linkModuleProgramm`.`module`
WHERE `linkModuleProgramm`.`programm` = 1
)
but I don't like subqueries inside 'where' or 'having' sections, mysql sometimes decides to do unreasonably inefficient things because of them.
Everything in the following query results in one line for each invBlueprintTypes row with the correct information. But I'm trying to add something to it. See below the codeblock.
Select
blueprintType.typeID,
blueprintType.typeName Blueprint,
productType.typeID,
productType.typeName Item,
productType.portionSize,
blueprintType.basePrice * 0.9 As bpoPrice,
productGroup.groupName ItemGroup,
productCategory.categoryName ItemCategory,
blueprints.productionTime,
blueprints.techLevel,
blueprints.researchProductivityTime,
blueprints.researchMaterialTime,
blueprints.researchCopyTime,
blueprints.researchTechTime,
blueprints.productivityModifier,
blueprints.materialModifier,
blueprints.wasteFactor,
blueprints.maxProductionLimit,
blueprints.blueprintTypeID
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
So what I need to get in here is the following table with the columns below it so I can use the values timestamp and sort the entire result by profitHour
tablename: invBlueprintTypesPrices
columns: blueprintTypeID, timestamp, profitHour
I need this information with the following select in mind. Using a select to show my intention of the JOIN/in-query select or whatever that can do this.
SELECT * FROM invBlueprintTypesPrices
WHERE blueprintTypeID = blueprintType.typeID
ORDER BY timestamp DESC LIMIT 1
And I need the main row from table invBlueprintTypes to still show even if there is no result from the invBlueprintTypesPrices. The LIMIT 1 is because I want the newest row possible, but deleting the older data is not a option since history is needed.
If I've understood correctly I think I need a subquery select, but how to do that? I've tired adding the exact query that is above with a AS blueprintPrices after the query's closing ), but did not work with a error with the
WHERE blueprintTypeID = blueprintType.typeID
part being the focus of the error. I have no idea why. Anyone who can solve this?
You'll need to use a LEFT JOIN to check for NULL values in invBlueprintTypesPrices. To mimic the LIMIT 1 per TypeId, you can use the MAX() or to truly make sure you only return a single record, use a row number -- this depends on whether you can have multiple max time stamps for each type id. Assuming not, then this should be close:
Select
...
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Left Join (
SELECT MAX(TimeStamp) MaxTime, TypeId
FROM invBlueprintTypesPrices
GROUP BY TypeId
) blueprintTypePrice On blueprints.blueprintTypeID = blueprintTypePrice.typeID
Left Join invBlueprintTypesPrices blueprintTypePrices On
blueprintTypePrice.TypeId = blueprintTypePrices.TypeId AND
blueprintTypePrice.MaxTime = blueprintTypePrices.TimeStamp
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
Order By
blueprintTypePrices.profitHour
Assuming you might have the same max time stamp with 2 different records, replace the 2 left joins above with something similar to this getting the row number:
Left Join (
SELECT #rn:=IF(#prevTypeId=TypeId,#rn+1,1) rn,
TimeStamp,
TypeId,
profitHour,
#prevTypeId:=TypeId
FROM (SELECT *
FROM invBlueprintTypesPrices
ORDER BY TypeId, TimeStamp DESC) t
JOIN (SELECT #rn:=0) t2
) blueprintTypePrices On blueprints.blueprintTypeID = blueprintTypePrices.typeID AND blueprintTypePrices.rn=1
You don't say where you are putting the subquery. If in the select clause, then you have a problem because you are returning more than one value.
You can't put this into the from clause directly, because you have a correlated subquery (not allowed).
Instead, you can put it in like this:
from . . .
(select *
from invBLueprintTypesPrices ibptp
where ibtp.timestamp = (select ibptp2.timestamp
from invBLueprintTypesPrices ibptp2
where ibptp.blueprintTypeId = ibptp2.blueprintTypeId
order by timestamp desc
limit 1
)
) ibptp
on ibptp.blueprintTypeId = blueprintType.TypeID
This identifies the most recent records for all the blueprintTypeids in the subquery. It then joins in the one that matches.
I am trying to create an update query and making little progress in getting the right syntax.
The following query is working:
SELECT t.Index1, t.Index2, COUNT( m.EventType )
FROM Table t
LEFT JOIN MEvents m ON
(m.Index1 = t.Index1 AND
m.Index2 = t.Index2 AND
(m.EventType = 'A' OR m.EventType = 'B')
)
WHERE (t.SpecialEventCount IS NULL)
GROUP BY t.Index1, t.Index2
It creates a list of triplets Index1,Index2,EventCounts.
It only does this for case where t.SpecialEventCount is NULL. The update query I am trying to write should set this SpecialEventCount to that count, i.e. COUNT(m.EventType) in the query above. This number could be 0 or any positive number (hence the left join). Index1 and Index2 together are unique in Table t and they are used to identify events in MEvent.
How do I have to modify the select query to become an update query? I.e. something like
UPDATE Table SET SpecialEventCount=COUNT(m.EventType).....
but I am confused what to put where and have failed with numerous different guesses.
I take it that (Index1, Index2) is a unique key on Table, otherwise I would expect the reference to t.SpecialEventCount to result in an error.
Edited query to use subquery as it didn't work using GROUP BY
UPDATE
Table AS t
LEFT JOIN (
SELECT
Index1,
Index2,
COUNT(EventType) AS NumEvents
FROM
MEvents
WHERE
EventType = 'A' OR EventType = 'B'
GROUP BY
Index1,
Index2
) AS m ON
m.Index1 = t.Index1 AND
m.Index2 = t.Index2
SET
t.SpecialEventCount = m.NumEvents
WHERE
t.SpecialEventCount IS NULL
Doing a left join with a subquery will generate a giant
temporary table in-memory that will have no indexes.
For updates, try avoiding joins and using correlated
subqueries instead:
UPDATE
Table AS t
SET
t.SpecialEventCount = (
SELECT COUNT(m.EventType)
FROM MEvents m
WHERE m.EventType in ('A','B')
AND m.Index1 = t.Index1
AND m.Index2 = t.Index2
)
WHERE
t.SpecialEventCount IS NULL
Do some profiling, but this can be significantly faster in some cases.
my example
update card_crowd as cardCrowd
LEFT JOIN
(
select cc.id , count(1) as num
from card_crowd cc LEFT JOIN
card_crowd_r ccr on cc.id = ccr.crowd_id
group by cc.id
) as tt
on cardCrowd.id = tt.id
set cardCrowd.join_num = tt.num;