I have two tables, one with data and another with values for that data. I need to select all the data rows whose values satisfy a condition like this:
(data=1 OR data=2) AND (data=3 OR data=4)
Each info row can of course have many values, so I suspect this needs a GROUP BY, but that is as far as I got. I've tried something like this:
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.info_id WHERE val IN (1) AND val IN (2) GROUP BY id
Of course it doesn't work, because no single row can hold two different values in the same field. Could you help, please?
You can do this with a having clause:
SELECT id
FROM t1 LEFT JOIN t2 ON t1.id = t2.info_id
GROUP BY id
having (sum(data = 1) > 0 or sum(data = 2) > 0) and
(sum(data = 3) > 0 or sum(data = 4) > 0)
Each expression like sum(data = 1) counts the number of rows that match that value, within the rows where id is the same.
Note: this returns the ids that match the condition. To get the original data, you need to join back to the tables.
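As a rough illustration of the HAVING approach and of joining back to get the original rows, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (t1.id, t1.name, t2.info_id, t2.val) and the sample rows are assumptions made up for the demo, and val IN (1, 2) is used as a shorthand for the two sum(...) > 0 tests joined by OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: t1 holds the data, t2 holds its values.
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t2 (info_id INTEGER, val INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- id 1 has 1 and 3 (matches), id 2 has 1 and 2 (second group missing),
    -- id 3 has 2 and 4 (matches).
    INSERT INTO t2 VALUES (1, 1), (1, 3), (2, 1), (2, 2), (3, 2), (3, 4);
""")

# The inner query finds the matching ids; the outer join fetches the full rows.
matching = conn.execute("""
    SELECT t1.id, t1.name
    FROM t1
    JOIN (
        SELECT info_id
        FROM t2
        GROUP BY info_id
        HAVING SUM(val IN (1, 2)) > 0
           AND SUM(val IN (3, 4)) > 0
    ) m ON m.info_id = t1.id
    ORDER BY t1.id
""").fetchall()

print(matching)
```

The boolean-as-integer trick works in MySQL and SQLite; other engines would need SUM(CASE WHEN ... THEN 1 ELSE 0 END) instead.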
I ran the following query against DB2 on Linux:
select * from schemaname.A t1 LEFT OUTER JOIN schemaname.B t2 on t1.SSN = t2.mem_ssn
where t2.mem_ssn = t1.ssn
and t2.ind= 'Y'
and t1.ind = 'Y'
and t1.yyyy = '2018'
and t2.yyyy = '2018'
and t1.plan = '1340'
This gives 143 records.
Whereas the following query returns 141 records:
select * from schemaname.A where ind = 'Y' and yyyy = '2018' and plan = '1340' and ssn in
(select mem_ssn from schemaname.B where yyyy = '2018' and ind = 'Y')
Why the difference?
Your where conditions turn the left join into an inner join. Hence, some rows are being filtered out from schemaname.A because there are no matches in schemaname.B.
Put all conditions on the second table in the on clause:
select *
from schemaname.A t1 LEFT OUTER JOIN
schemaname.B t2
on t1.SSN = t2.mem_ssn and
t2.mem_ssn = t1.ssn and
t2.ind = 'Y' and
t2.yyyy = '2018'
where t1.ind = 'Y' and
t1.yyyy = '2018' and
t1.plan = '1340';
Conditions on the first table belong in the where clause. Note: I assume that all the constant values are strings, even those that look like numbers. If they are really numbers, you should drop the single quotes.
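To see the effect of moving a second-table condition from where to on, here is a small sketch with Python's sqlite3 module. The two-column tables a and b and their rows are invented for the demo; only the placement of the b.ind = 'Y' test differs between the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of schemaname.A and schemaname.B.
    CREATE TABLE a (ssn TEXT, ind TEXT);
    CREATE TABLE b (mem_ssn TEXT, ind TEXT);
    INSERT INTO a VALUES ('111', 'Y'), ('222', 'Y');
    INSERT INTO b VALUES ('111', 'Y'), ('222', 'N');
""")

# Condition on b in WHERE: rows without a qualifying b match are dropped,
# so the LEFT JOIN behaves like an INNER JOIN.
where_rows = conn.execute("""
    SELECT a.ssn, b.mem_ssn
    FROM a LEFT JOIN b ON a.ssn = b.mem_ssn
    WHERE a.ind = 'Y' AND b.ind = 'Y'
    ORDER BY a.ssn
""").fetchall()

# Condition on b in ON: every qualifying row of a survives,
# with NULLs where b has no qualifying match.
on_rows = conn.execute("""
    SELECT a.ssn, b.mem_ssn
    FROM a LEFT JOIN b ON a.ssn = b.mem_ssn AND b.ind = 'Y'
    WHERE a.ind = 'Y'
    ORDER BY a.ssn
""").fetchall()

print(where_rows)
print(on_rows)
```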
The first select really works like an inner join, because the where clause requires non-null values from t2.
Still, the difference in row counts comes from mem_ssn not being the primary key in t2.
E.g. if a particular value of mem_ssn appears three times in t2, the first select returns all three joined rows, but the second select, with the subselect, returns that value only once (if it appears only once in t1).
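The row-count difference can be reproduced with a tiny sketch in Python's sqlite3 module. The tables and the single ssn value are made up; the point is only that a join repeats the t1 row once per match, while IN tests for membership.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: one row in a, three matching rows in b.
    CREATE TABLE a (ssn TEXT);
    CREATE TABLE b (mem_ssn TEXT);
    INSERT INTO a VALUES ('111');
    INSERT INTO b VALUES ('111'), ('111'), ('111');
""")

# The join produces one output row per matching pair.
join_count = conn.execute(
    "SELECT COUNT(*) FROM a JOIN b ON a.ssn = b.mem_ssn"
).fetchone()[0]

# The IN subquery keeps each row of a at most once.
in_count = conn.execute(
    "SELECT COUNT(*) FROM a WHERE ssn IN (SELECT mem_ssn FROM b)"
).fetchone()[0]

print(join_count, in_count)
```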
I have a query like the one below, and a compound index on CC.key1, CC.key2.
I am executing it against a big database:
Select * from CC where
( (
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) > 2
AND
CC.key3='new'
)
OR
(
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) <= 2
)
)
limit 10000;
I tried to rewrite it as an inner join, but it got slower. How can I optimize this query?
The trick here is being able to articulate a query for the problem:
SELECT *
FROM CC t1
INNER JOIN
(
SELECT cc.key1, cc.key2
FROM CC cc
LEFT JOIN Service s
ON cc.key1 = s.sr2 AND
cc.key2 = s.sr1
GROUP BY cc.key1, cc.key2
HAVING COUNT(*) <= 2 OR
SUM(CASE WHEN cc.key3 = 'new' THEN 1 ELSE 0 END) > 2
) t2
ON t1.key1 = t2.key1 AND
t1.key2 = t2.key2
Explanation:
Your original two subqueries would only add to the count if a given record in CC, with a given key1 and key2 value, matched to a corresponding record in the Service table. The strategy behind my inner query is to use GROUP BY to count the number of times that this happens, and use this instead of your subqueries. The first count condition is your bottom subquery, and the second one is the top.
The inner query finds all key1, key2 pairs in CC corresponding to records which should be retained. And recognize that these two columns are the only criteria in your original query for determining whether a record from CC gets retained. Then, this inner query can be inner joined to CC again to get your final result set.
In terms of performance, even this answer could leave something to be desired, but it should be better than a massive correlated subquery, which is what you had.
Basically, pick the columns that must not contain duplicates and join the table to itself on them. Example:
select *
FROM Table_X A
WHERE exists (SELECT 1
FROM Table_X B
WHERE 1=1
and a.SHOULD_BE_UNIQUE = b.SHOULD_BE_UNIQUE
and a.SHOULD_BE_UNIQUE2 = b.SHOULD_BE_UNIQUE2
/* excluded because these columns are null or can be Duplicated*/
--and a.GENERIC_COLUMN = b.GENERIC_COLUMN
--and a.GENERIC_COLUMN2 = b.GENERIC_COLUMN2
--and a.NULL_COLUMN = b.NULL_COLUMN
--and a.NULL_COLUMN2 = b.NULL_COLUMN2
and b.rowid > a.ROWID);
Here SHOULD_BE_UNIQUE and SHOULD_BE_UNIQUE2 are the columns whose combination should hold unique values, while the GENERIC_COLUMN and NULL_COLUMN columns can be ignored, so just leave them out of the query.
We have been using this approach whenever we run into duplicate records.
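A minimal sketch of that self-join check, run through Python's sqlite3 module. It relies on SQLite's implicit rowid as a stand-in for the ROWID pseudo-column used above, and the table contents are invented. Note that the query returns every copy of a duplicate except the last one inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: the first two rows share the same key pair.
    CREATE TABLE table_x (
        should_be_unique  TEXT,
        should_be_unique2 TEXT,
        generic_column    TEXT
    );
    INSERT INTO table_x VALUES
        ('k1', 'x', 'first'),
        ('k1', 'x', 'second'),
        ('k2', 'y', 'only');
""")

# A row is reported when a later row (higher rowid) carries the same key pair.
duplicates = conn.execute("""
    SELECT a.rowid, a.generic_column
    FROM table_x a
    WHERE EXISTS (SELECT 1
                  FROM table_x b
                  WHERE a.should_be_unique  = b.should_be_unique
                    AND a.should_be_unique2 = b.should_be_unique2
                    AND b.rowid > a.rowid)
    ORDER BY a.rowid
""").fetchall()

print(duplicates)
```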
With the limited information you've given us, this could be a rewrite using 'simplified' logic:
SELECT *
FROM CC NATURAL JOIN
( SELECT key1, key2, COUNT(*) AS tally
FROM Service
GROUP
BY key1, key2 ) AS t
WHERE key3 = 'new' OR tally <= 2;
Not sure whether it will perform better but might give you some ideas of what to try next?
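To make the simplification concrete: (count > 2 AND key3 = 'new') OR count <= 2 reduces to key3 = 'new' OR count <= 2. The sketch below, using Python's sqlite3 module, checks on invented sample data that the original correlated-subquery form and a pre-aggregated LEFT JOIN return the same rows. The column mapping (key1 = sr2, key2 = sr1) is taken from the question; everything else is an assumption, and whether the rewrite is actually faster depends on the engine and indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CC (key1 INTEGER, key2 INTEGER, key3 TEXT);
    CREATE TABLE Service (sr1 INTEGER, sr2 INTEGER);
    INSERT INTO CC VALUES (1, 10, 'new'), (2, 20, 'old'),
                          (3, 30, 'old'), (4, 40, 'new');
    -- Three services for (1,10) and (2,20), one for (3,30), none for (4,40).
    INSERT INTO Service VALUES (10, 1), (10, 1), (10, 1),
                               (20, 2), (20, 2), (20, 2),
                               (30, 3);
""")

# Original shape: the count subquery is evaluated per CC row.
correlated = conn.execute("""
    SELECT key1 FROM CC
    WHERE ((SELECT COUNT(*) FROM Service s
            WHERE CC.key1 = s.sr2 AND CC.key2 = s.sr1) > 2
           AND CC.key3 = 'new')
       OR (SELECT COUNT(*) FROM Service s
           WHERE CC.key1 = s.sr2 AND CC.key2 = s.sr1) <= 2
    ORDER BY key1
""").fetchall()

# Rewrite: aggregate Service once, then LEFT JOIN so CC rows with
# no services still appear with a tally of 0.
pre_aggregated = conn.execute("""
    SELECT cc.key1
    FROM CC cc
    LEFT JOIN (SELECT sr2, sr1, COUNT(*) AS tally
               FROM Service
               GROUP BY sr2, sr1) t
           ON cc.key1 = t.sr2 AND cc.key2 = t.sr1
    WHERE cc.key3 = 'new' OR COALESCE(t.tally, 0) <= 2
    ORDER BY cc.key1
""").fetchall()

print(correlated, pre_aggregated)
```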
I am trying to perform a query based on the output of a sub-query. I would join the tables and perform one big query but that forces the query to search the entire table of one of the tables. The goal is to have the who (from table1), the what (from table2) and the when (from table3).
Here's what I have,
SELECT
DB1.TB1.`Date`,
DB1.TB1.`Sequence`,
DB1.TB1.`InstanceId`
FROM
(
SELECT
DB1.TB2.`UserName` AS USER,
DB1.TB2.`FirstName`,
DB1.TB2.`LastName`,
DB1.TB3.`ObjectName` AS OBJECT,
DB1.TB3.`ObjectType`
FROM
DB1.TB2
INNER JOIN DB1.TB3 ON DB1.TB2.`UserName` = DB1.TB3.`UserName`
INNER JOIN DB1.TB4 ON DB1.TB3.`ObjectName` = DB1.TB4.`ObjectName`
WHERE
DB1.TB4.`ADD` = 'Y' AND
DB1.TB2.`ADUC` NOT LIKE 'ServiceAccount' AND
DB1.TB3.`ObjectName` NOT IN ('ThisAdmin','ThatAdmin')
) AS MySubQuery
WHERE
DB1.TB1.`UserName` LIKE 'USER' AND
DB1.TB1.`ActionDetail` LIKE '%OBJECT%'
ORDER BY
DB1.TB1.`Date` DESC LIMIT 1
Everything in the following query results in one line for each invBlueprintTypes row with the correct information. But I'm trying to add something to it. See below the codeblock.
Select
blueprintType.typeID,
blueprintType.typeName Blueprint,
productType.typeID,
productType.typeName Item,
productType.portionSize,
blueprintType.basePrice * 0.9 As bpoPrice,
productGroup.groupName ItemGroup,
productCategory.categoryName ItemCategory,
blueprints.productionTime,
blueprints.techLevel,
blueprints.researchProductivityTime,
blueprints.researchMaterialTime,
blueprints.researchCopyTime,
blueprints.researchTechTime,
blueprints.productivityModifier,
blueprints.materialModifier,
blueprints.wasteFactor,
blueprints.maxProductionLimit,
blueprints.blueprintTypeID
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
What I need to bring in is the following table, with the columns listed below it, so that I can use its timestamp value and sort the entire result by profitHour:
tablename: invBlueprintTypesPrices
columns: blueprintTypeID, timestamp, profitHour
I need this information with the following select in mind. I'm using a plain select only to show what I intend the JOIN, in-query select, or whatever construct can do this, to achieve:
SELECT * FROM invBlueprintTypesPrices
WHERE blueprintTypeID = blueprintType.typeID
ORDER BY timestamp DESC LIMIT 1
And I need the main row from table invBlueprintTypes to still show even if there is no result from invBlueprintTypesPrices. The LIMIT 1 is there because I want the newest row possible, but deleting the older data is not an option since the history is needed.
If I've understood correctly, I think I need a subquery select, but how do I do that? I've tried adding the exact query above, followed by AS blueprintPrices after the query's closing ), but it did not work; the error pointed at the
WHERE blueprintTypeID = blueprintType.typeID
part. I have no idea why. Can anyone solve this?
You'll need to use a LEFT JOIN to check for NULL values in invBlueprintTypesPrices. To mimic the LIMIT 1 per TypeId, you can use the MAX() or to truly make sure you only return a single record, use a row number -- this depends on whether you can have multiple max time stamps for each type id. Assuming not, then this should be close:
Select
...
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Left Join (
SELECT MAX(timestamp) MaxTime, blueprintTypeID
FROM invBlueprintTypesPrices
GROUP BY blueprintTypeID
) blueprintTypePrice On blueprints.blueprintTypeID = blueprintTypePrice.blueprintTypeID
Left Join invBlueprintTypesPrices blueprintTypePrices On
blueprintTypePrice.blueprintTypeID = blueprintTypePrices.blueprintTypeID AND
blueprintTypePrice.MaxTime = blueprintTypePrices.timestamp
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
Order By
blueprintTypePrices.profitHour
Assuming you might have the same max time stamp with 2 different records, replace the 2 left joins above with something similar to this getting the row number:
Left Join (
SELECT @rn:=IF(@prevTypeId=blueprintTypeID,@rn+1,1) rn,
timestamp,
blueprintTypeID,
profitHour,
@prevTypeId:=blueprintTypeID
FROM (SELECT *
FROM invBlueprintTypesPrices
ORDER BY blueprintTypeID, timestamp DESC) t
JOIN (SELECT @rn:=0) t2
) blueprintTypePrices On blueprints.blueprintTypeID = blueprintTypePrices.blueprintTypeID AND blueprintTypePrices.rn=1
You don't say where you are putting the subquery. If in the select clause, then you have a problem because you are returning more than one value.
You can't put this into the from clause directly, because you have a correlated subquery (not allowed).
Instead, you can put it in like this:
from . . .
(select *
from invBlueprintTypesPrices ibptp
where ibptp.timestamp = (select ibptp2.timestamp
from invBlueprintTypesPrices ibptp2
where ibptp.blueprintTypeId = ibptp2.blueprintTypeId
order by timestamp desc
limit 1
)
) ibptp
on ibptp.blueprintTypeId = blueprintType.TypeID
This identifies the most recent records for all the blueprintTypeids in the subquery. It then joins in the one that matches.
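Here is a small runnable sketch of the "latest price row per blueprint, but keep blueprints without prices" idea, using Python's sqlite3 module. The tables are cut down to the columns named in the question, the rows are invented, and timestamps are plain integers for brevity. It puts the correlated "newest timestamp" test in the ON clause of a LEFT JOIN, which is one possible arrangement rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the two tables.
    CREATE TABLE invBlueprintTypes (blueprintTypeID INTEGER PRIMARY KEY);
    CREATE TABLE invBlueprintTypesPrices (
        blueprintTypeID INTEGER,
        timestamp       INTEGER,
        profitHour      REAL
    );
    INSERT INTO invBlueprintTypes VALUES (1), (2), (3);
    -- Blueprint 1 has two price rows, blueprint 2 has one, blueprint 3 none.
    INSERT INTO invBlueprintTypesPrices VALUES
        (1, 100, 5.0), (1, 200, 7.5), (2, 150, 3.0);
""")

latest = conn.execute("""
    SELECT b.blueprintTypeID, p.timestamp, p.profitHour
    FROM invBlueprintTypes b
    LEFT JOIN invBlueprintTypesPrices p
           ON p.blueprintTypeID = b.blueprintTypeID
          AND p.timestamp = (SELECT MAX(p2.timestamp)
                             FROM invBlueprintTypesPrices p2
                             WHERE p2.blueprintTypeID = b.blueprintTypeID)
    ORDER BY p.profitHour DESC, b.blueprintTypeID
""").fetchall()

print(latest)
```

Blueprint 3 still appears, with NULLs for the price columns, because the filter lives in the ON clause rather than in WHERE.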
Heya!, I have the below query:
SELECT t1.pm_id
FROM fb_user_pms AS t1,
fb_user_pm_replies AS t2
WHERE (t1.pm_id = '{$pm_id}'
AND t1.profile_author = '{$username}'
OR t1.pm_author = '{$username}'
AND t1.pm_id = t2.pm_id
AND t2.pm_author = '{$username}'
AND COUNT(t2.reply_id) > 0)
AND t1.deleted = 0
However, I'm getting a grouping error. My guess is it's caused by the AND COUNT(t2.reply_id) > 0?
How can I rectify the above query to make it work?
Hope someone can help.
Cheers!
The aggregate function COUNT can't go in the WHERE clause. You should use a GROUP BY and put it in the HAVING clause.
SELECT t1.pm_id
FROM fb_user_pms AS t1
JOIN fb_user_pm_replies AS t2 ON t1.pm_id = t2.pm_id
WHERE (
(t1.pm_id = '{$pm_id}' AND t1.profile_author = '{$username}') OR
(t1.pm_author = '{$username}' AND t2.pm_author = '{$username}')
) AND t1.deleted = 0
GROUP BY t1.pm_id
HAVING COUNT(t2.reply_id) > 0
If t2.reply_id is a NOT NULL column then you don't need the HAVING clause at all.
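The sketch below, using Python's sqlite3 module, shows the same behaviour on a toy schema: the aggregate in WHERE is rejected, while GROUP BY plus HAVING works. Table names, columns and rows are simplified assumptions, and the exact error text differs between database engines. A LEFT JOIN is used here on purpose so that the HAVING test actually filters something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified message and reply tables.
    CREATE TABLE pms (pm_id INTEGER PRIMARY KEY);
    CREATE TABLE replies (reply_id INTEGER PRIMARY KEY, pm_id INTEGER);
    INSERT INTO pms VALUES (1), (2);
    INSERT INTO replies VALUES (10, 1), (11, 1);  -- pm 2 has no replies
""")

# An aggregate in WHERE is rejected when the statement is compiled.
try:
    conn.execute("""
        SELECT p.pm_id
        FROM pms p JOIN replies r ON r.pm_id = p.pm_id
        WHERE COUNT(r.reply_id) > 0
    """)
    where_failed = False
except sqlite3.Error as exc:
    where_failed = True
    print("rejected:", exc)

# The same test is accepted in HAVING, after grouping.
having_rows = conn.execute("""
    SELECT p.pm_id
    FROM pms p LEFT JOIN replies r ON r.pm_id = p.pm_id
    GROUP BY p.pm_id
    HAVING COUNT(r.reply_id) > 0
""").fetchall()

print(having_rows)
```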
The error is because you can't use an aggregate function (COUNT, MIN, MAX, AVG, etc) in the WHERE clause, without it being inside a subquery. Only the HAVING clause allows you to use aggregates without being wrapped in subqueries.
But checking that there is more than zero replies is not necessary with an INNER JOIN, which already guarantees at least one reply associated with the fb_user_pms record. The JOIN also means that the information in t1 will be repeated for every matching record in fb_user_pm_replies. For example, if a fb_user_pms record has three related fb_user_pm_replies records, you'll see the fb_user_pms record in the result set three times.
The query you want to use is:
SELECT t1.pm_id
FROM fb_user_pms AS t1
WHERE t1.pm_id = '{$pm_id}'
AND '{$username}' IN (t1.profile_author, t1.pm_author)
AND t1.deleted = 0
AND EXISTS(SELECT NULL
FROM fb_user_pm_replies AS t2
WHERE t2.pm_id = t1.pm_id
AND t2.pm_author = '{$username}')
The EXISTS clause returns true or false, based on the WHERE criteria. It also won't duplicate t1 results.
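A quick sketch of the duplication difference, using Python's sqlite3 module with an invented message that has three replies. The schema is trimmed to the columns used above, and the values are bound with ? placeholders instead of being interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed-down tables with one message and three replies.
    CREATE TABLE fb_user_pms (
        pm_id INTEGER PRIMARY KEY,
        pm_author TEXT, profile_author TEXT, deleted INTEGER
    );
    CREATE TABLE fb_user_pm_replies (
        reply_id INTEGER PRIMARY KEY, pm_id INTEGER, pm_author TEXT
    );
    INSERT INTO fb_user_pms VALUES (1, 'alice', 'bob', 0);
    INSERT INTO fb_user_pm_replies VALUES
        (1, 1, 'alice'), (2, 1, 'alice'), (3, 1, 'alice');
""")

pm_id, username = 1, "alice"

# JOIN: the message row is repeated once per matching reply.
join_rows = conn.execute("""
    SELECT t1.pm_id
    FROM fb_user_pms AS t1
    JOIN fb_user_pm_replies AS t2 ON t2.pm_id = t1.pm_id
    WHERE t1.pm_id = ? AND t2.pm_author = ? AND t1.deleted = 0
""", (pm_id, username)).fetchall()

# EXISTS: the message row appears once, however many replies match.
exists_rows = conn.execute("""
    SELECT t1.pm_id
    FROM fb_user_pms AS t1
    WHERE t1.pm_id = ?
      AND ? IN (t1.profile_author, t1.pm_author)
      AND t1.deleted = 0
      AND EXISTS (SELECT NULL
                  FROM fb_user_pm_replies AS t2
                  WHERE t2.pm_id = t1.pm_id
                    AND t2.pm_author = ?)
""", (pm_id, username, username)).fetchall()

print(join_rows, exists_rows)
```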
The condition with COUNT must be in the HAVING part. It cannot be part of the WHERE part.
Without a GROUP BY, the SELECT part must also use aggregate functions, for example MAX(t1.pm_id).