I am trying to create an update query and making little progress in getting the right syntax.
The following query is working:
SELECT t.Index1, t.Index2, COUNT( m.EventType )
FROM Table t
LEFT JOIN MEvents m ON
(m.Index1 = t.Index1 AND
m.Index2 = t.Index2 AND
(m.EventType = 'A' OR m.EventType = 'B')
)
WHERE (t.SpecialEventCount IS NULL)
GROUP BY t.Index1, t.Index2
It creates a list of triplets Index1,Index2,EventCounts.
It only does this for case where t.SpecialEventCount is NULL. The update query I am trying to write should set this SpecialEventCount to that count, i.e. COUNT(m.EventType) in the query above. This number could be 0 or any positive number (hence the left join). Index1 and Index2 together are unique in Table t and they are used to identify events in MEvent.
How do I have to modify the select query to become an update query? I.e. something like
UPDATE Table SET SpecialEventCount=COUNT(m.EventType).....
but I am confused what to put where and have failed with numerous different guesses.
I take it that (Index1, Index2) is a unique key on Table, otherwise I would expect the reference to t.SpecialEventCount to result in an error.
Edited query to use subquery as it didn't work using GROUP BY
UPDATE
Table AS t
LEFT JOIN (
SELECT
Index1,
Index2,
COUNT(EventType) AS NumEvents
FROM
MEvents
WHERE
EventType = 'A' OR EventType = 'B'
GROUP BY
Index1,
Index2
) AS m ON
m.Index1 = t.Index1 AND
m.Index2 = t.Index2
SET
t.SpecialEventCount = m.NumEvents
WHERE
t.SpecialEventCount IS NULL
Doing a left join with a subquery will generate a giant
temporary table in-memory that will have no indexes.
For updates, try avoiding joins and using correlated
subqueries instead:
UPDATE
Table AS t
SET
t.SpecialEventCount = (
SELECT COUNT(m.EventType)
FROM MEvents m
WHERE m.EventType in ('A','B')
AND m.Index1 = t.Index1
AND m.Index2 = t.Index2
)
WHERE
t.SpecialEventCount IS NULL
Do some profiling, but this can be significantly faster in some cases.
my example
update card_crowd as cardCrowd
LEFT JOIN
(
select cc.id , count(1) as num
from card_crowd cc LEFT JOIN
card_crowd_r ccr on cc.id = ccr.crowd_id
group by cc.id
) as tt
on cardCrowd.id = tt.id
set cardCrowd.join_num = tt.num;
Related
Still trying to learn my way around indexes, shouldn't the JOINs in the outer query be using the index on the Primary Key? Are indexes not working in combination with subqueries? Thanks!
SELECT SQL_BIG_RESULT
I.item_group_id
FROM
(
SELECT SQL_BIG_RESULT
MAX(ITM.id) as max_id
FROM a_movements M
JOIN a_items_to_movements ITM ON ITM.movement_id = M.id -- Index used
WHERE M.warehouse_id IN (...) -- Index used
GROUP BY ITM.item_id
ORDER BY NULL
) X
JOIN a_items_to_movements ITM ON ITM.id = X.max_id -- Index not used
JOIN a_movements M ON M.id = ITM.movement_id
AND M.direction = 0
AND M.settled IS NOT NULL
JOIN a_items I ON I.id = ITM.item_id -- Index not used
GROUP BY I.item_group_id
ORDER BY NULL
EDIT: attached EXPLAIN output here: https://imgur.com/PdO3mIo
I have a query like this . I have compound index for CC.key1,CC.key2.
I am executing this in a big database
Select * from CC where
( (
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) > 2
AND
CC.key3='new'
)
OR
(
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) <= 2
)
)
limit 10000;
I tried to make it as inner join , but its getting slower . How can i optimize this query ?
The trick here is being able to articulate a query for the problem:
SELECT *
FROM CC t1
INNER JOIN
(
SELECT cc.key1, cc.key2
FROM CC cc
LEFT JOIN Service s
ON cc.key1 = s.sr2 AND
cc.key2 = s.sr1
GROUP BY cc.key1, cc.key2
HAVING COUNT(*) <= 2 OR
SUM(CASE WHEN cc.key = 'new' THEN 1 ELSE 0 END) > 2
) t2
ON t1.key1 = t2.key1 AND
t1.key2 = t2.key2
Explanation:
Your original two subqueries would only add to the count if a given record in CC, with a given key1 and key2 value, matched to a corresponding record in the Service table. The strategy behind my inner query is to use GROUP BY to count the number of times that this happens, and use this instead of your subqueries. The first count condition is your bottom subquery, and the second one is the top.
The inner query finds all key1, key2 pairs in CC corresponding to records which should be retained. And recognize that these two columns are the only criteria in your original query for determining whether a record from CC gets retained. Then, this inner query can be inner joined to CC again to get your final result set.
In terms of performance, even this answer could leave something to be desired, but it should be better than a massive correlated subquery, which is what you had.
Basically get the Columns that must not have a duplicate then join them together. Example:
select *
FROM Table_X A
WHERE exists (SELECT 1
FROM Table_X B
WHERE 1=1
and a.SHOULD_BE_UNIQUE = b.SHOULD_BE_UNIQUE
and a.SHOULD_BE_UNIQUE2 = b.SHOULD_BE_UNIQUE2
/* excluded because these columns are null or can be Duplicated*/
--and a.GENERIC_COLUMN = b.GENERIC_COLUMN
--and a.GENERIC_COLUMN2 = b.GENERIC_COLUMN2
--and a.NULL_COLUMN = b.NULL_COLUMN
--and a.NULL_COLUMN2 = b.NULL_COLUMN2
and b.rowid > a.ROWID);
Where SHOULD_BE_UNIQUE and SHOULD_BE_UNIQUE2 are columns that shouldn't be repeated and have unique columns and the GENERIC_COLUMN and NULL_COLUMNS can be ignored so just leave them out of the query.
Been using this approach when we have issues in Duplicate Records.
With the limited information you've given us, this could be a rewrite using 'simplified' logic:
SEELCT *
FROM CC NATURAL JOIN
( SELECT key1, key2, COUNT(*) AS tally
FROM Service
GROUP
BY key1, key2 ) AS t
WHERE key3 = 'new' OR tally <= 2;
Not sure whether it will perform better but might give you some ideas of what to try next?
I have a query which is pretty that contains LEFT JOIN subquery. It takes 20 minutes to load completely.
Here is my query:
UPDATE orders AS o
LEFT JOIN (
SELECT obe_order_master_id, COUNT(id) AS count_files, id, added
FROM customer_instalments
GROUP BY obe_order_master_id
) AS oci ON oci.obe_order_master_id = SUBSTRING(o.order_id, 4)
SET o.final_customer_file_id = oci.id,
o.client_work_delivered = oci.added
WHERE oci.count_files = 1
Is there any way that I can make this query runs faster?
Move Where condition in Temp Table and replace WHERE with HAVING Clause, this will eliminate unnecessary rows from temp table so reduce the filtering and may help to improve performance
UPDATE orders AS o
LEFT JOIN (
SELECT obe_order_master_id, id, added
FROM customer_instalments
GROUP BY obe_order_master_id
HAVING COUNT(id) = 1
) AS oci ON oci.obe_order_master_id = SUBSTRING(o.order_id, 4)
SET o.final_customer_file_id = oci.id,
o.client_work_delivered = oci.added
I would suggest to create separate column for Order_id substring and make index on it. Then use this column in WHERE.
For the longest time I couldn't figure out why an incorrect COUNT(*) values were being returned. After incrementally removing parts of my query I finally realized that joining tables were the reason behind the incorrect values.
This is the query I'm working with:
SELECT `profiles`.`logo` AS logo,
`companies`.`company_name`,
`companies`.`url_slug`,
count(*)
FROM (`companies`)
JOIN `users` ON `users`.`id` = `companies`.`user_id`
JOIN `categories` ON `categories`.`company_id` = `companies`.`id`
JOIN `products` ON `products`.`company_id` = `companies`.`id`
JOIN `profiles` ON `profiles`.`company_id` = `companies`.`id`
WHERE `users`.`last_login` IS NOT NULL
AND `categories`.`category_id` = '3'
AND `products`.`active` = 1
AND `products`.`xmp_1` = 1
AND `products`.`xmp_2` = 1
AND `profiles`.`field_a` = 1
GROUP BY `companies`.`id`
Running this in my SQL program returns 28 rows, but the COUNT(*) row returns something like 1400. I'm not sure where to head from here. I need a column returned that returns the 28 instead of 1400.
SQL Fiddle with sample data: http://sqlfiddle.com/#!2/97d2d/9
There's no way to do this with a simple query. Chris Hayes shows a way to do this with sub-queries and there are a variety of ways to do that.
The reason is that aggregation functions (like COUNT) only work on the GROUP that a row represents... in this case, each row represents the set of results that have the same company id. There's no way for field in a row to show aggregation information that extends beyond its content, in a simple query.
With subqueries, you can generate the first table, and then aggregate over that. Or, you could just use the fact that the total number of rows returned is included as metadata on the data that is sent back to whichever client submitted the query in the first place, thereby removing the need to include it in the data.
The following query works:
SELECT *, COUNT(*) FROM (
SELECT `profiles`.`logo` AS logo,
`companies`.`company_name`,
`companies`.`url_slug`
FROM (`companies`)
JOIN `users` ON `users`.`id` = `companies`.`user_id`
JOIN `categories` ON `categories`.`company_id` = `companies`.`id`
JOIN `products` ON `products`.`company_id` = `companies`.`id`
JOIN `profiles` ON `profiles`.`company_id` = `companies`.`id`
WHERE `users`.`last_login` IS NOT NULL
#AND `categories`.`category_id` = '3'
AND `products`.`active` = 1
#AND `products`.`xmp_1` = 1
#AND `products`.`xmp_2` = 1
AND `profiles`.`field_a` = 1
GROUP BY `companies`.`id`
) AS stuff
I think somehow your original COUNT(*) is looking at pre-join-condition or pre-grouping amounts.
Everything in the following query results in one line for each invBlueprintTypes row with the correct information. But I'm trying to add something to it. See below the codeblock.
Select
blueprintType.typeID,
blueprintType.typeName Blueprint,
productType.typeID,
productType.typeName Item,
productType.portionSize,
blueprintType.basePrice * 0.9 As bpoPrice,
productGroup.groupName ItemGroup,
productCategory.categoryName ItemCategory,
blueprints.productionTime,
blueprints.techLevel,
blueprints.researchProductivityTime,
blueprints.researchMaterialTime,
blueprints.researchCopyTime,
blueprints.researchTechTime,
blueprints.productivityModifier,
blueprints.materialModifier,
blueprints.wasteFactor,
blueprints.maxProductionLimit,
blueprints.blueprintTypeID
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
So what I need to get in here is the following table with the columns below it so I can use the values timestamp and sort the entire result by profitHour
tablename: invBlueprintTypesPrices
columns: blueprintTypeID, timestamp, profitHour
I need this information with the following select in mind. Using a select to show my intention of the JOIN/in-query select or whatever that can do this.
SELECT * FROM invBlueprintTypesPrices
WHERE blueprintTypeID = blueprintType.typeID
ORDER BY timestamp DESC LIMIT 1
And I need the main row from table invBlueprintTypes to still show even if there is no result from the invBlueprintTypesPrices. The LIMIT 1 is because I want the newest row possible, but deleting the older data is not a option since history is needed.
If I've understood correctly I think I need a subquery select, but how to do that? I've tired adding the exact query that is above with a AS blueprintPrices after the query's closing ), but did not work with a error with the
WHERE blueprintTypeID = blueprintType.typeID
part being the focus of the error. I have no idea why. Anyone who can solve this?
You'll need to use a LEFT JOIN to check for NULL values in invBlueprintTypesPrices. To mimic the LIMIT 1 per TypeId, you can use the MAX() or to truly make sure you only return a single record, use a row number -- this depends on whether you can have multiple max time stamps for each type id. Assuming not, then this should be close:
Select
...
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Left Join (
SELECT MAX(TimeStamp) MaxTime, TypeId
FROM invBlueprintTypesPrices
GROUP BY TypeId
) blueprintTypePrice On blueprints.blueprintTypeID = blueprintTypePrice.typeID
Left Join invBlueprintTypesPrices blueprintTypePrices On
blueprintTypePrice.TypeId = blueprintTypePrices.TypeId AND
blueprintTypePrice.MaxTime = blueprintTypePrices.TimeStamp
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
Order By
blueprintTypePrices.profitHour
Assuming you might have the same max time stamp with 2 different records, replace the 2 left joins above with something similar to this getting the row number:
Left Join (
SELECT #rn:=IF(#prevTypeId=TypeId,#rn+1,1) rn,
TimeStamp,
TypeId,
profitHour,
#prevTypeId:=TypeId
FROM (SELECT *
FROM invBlueprintTypesPrices
ORDER BY TypeId, TimeStamp DESC) t
JOIN (SELECT #rn:=0) t2
) blueprintTypePrices On blueprints.blueprintTypeID = blueprintTypePrices.typeID AND blueprintTypePrices.rn=1
You don't say where you are putting the subquery. If in the select clause, then you have a problem because you are returning more than one value.
You can't put this into the from clause directly, because you have a correlated subquery (not allowed).
Instead, you can put it in like this:
from . . .
(select *
from invBLueprintTypesPrices ibptp
where ibtp.timestamp = (select ibptp2.timestamp
from invBLueprintTypesPrices ibptp2
where ibptp.blueprintTypeId = ibptp2.blueprintTypeId
order by timestamp desc
limit 1
)
) ibptp
on ibptp.blueprintTypeId = blueprintType.TypeID
This identifies the most recent records for all the blueprintTypeids in the subquery. It then joins in the one that matches.