Count non-null values from a left-joined table - MySQL

Count non-null values directly in the SELECT list (not using WHERE) on a left-joined table.
count(*) as comments: I need this to count only the non-null values. An inner join is not a solution, because it would leave content with zero comments out of count(distinct (t1.postId)) as no_of_content.
select t1.tagId as tagId, count(distinct (t1.postId)) as no_of_content, count(*) as comments
from content_created as t1
left join comment_created as t2
on t1.postId=t2.postId
where
( (t1.tagId = "S2036623" )
or (t1.tagId = "S97422" )
)
group BY 1

Posting sample data would help us answer this more precisely, but you can change your count function to:
COUNT(CASE WHEN t2.postId IS NOT NULL THEN 1 END) as comments
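A minimal way to check this idea, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL and a few invented rows: tag S2036623 has one post with two comments and one post with none. The CASE has to test the right-hand table's postId (t2.postId) with IS NOT NULL. Testing IS NULL would count the posts without comments instead.

```python
import sqlite3

# SQLite stands in for MySQL; table/column names follow the question, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content_created (tagId TEXT, postId INTEGER);
    CREATE TABLE comment_created (postId INTEGER);
    INSERT INTO content_created VALUES ('S2036623', 1), ('S2036623', 2), ('S97422', 3);
    -- post 1 has two comments, post 2 has none, post 3 has one
    INSERT INTO comment_created VALUES (1), (1), (3);
""")

rows = conn.execute("""
    SELECT t1.tagId,
           COUNT(DISTINCT t1.postId) AS no_of_content,
           -- only rows where the LEFT JOIN actually found a comment
           COUNT(CASE WHEN t2.postId IS NOT NULL THEN 1 END) AS comments,
           -- COUNT(*) also counts the NULL-extended row produced for post 2
           COUNT(*) AS all_rows
    FROM content_created AS t1
    LEFT JOIN comment_created AS t2 ON t1.postId = t2.postId
    WHERE t1.tagId IN ('S2036623', 'S97422')
    GROUP BY t1.tagId
    ORDER BY t1.tagId
""").fetchall()
print(rows)  # [('S2036623', 2, 2, 3), ('S97422', 1, 1, 1)]
```

Post 2 still contributes to no_of_content, but not to comments, which is the behaviour an inner join cannot give.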

COUNT(expr) only counts non-null values. What you need to do is reference a column of the right-hand (left-joined) table explicitly: instead of count(*), use count(right_joined_table.join_key).
Here's a full example using BigQuery:
with left_table as (
select num
from unnest(generate_array(1,10)) as num
), right_table as (
select num
from unnest(generate_array(2,10,2)) as num
)
select
count(*) as total_rows,
count(l.num) as left_table_counts,
count(r.num) as non_null_counts
from left_table as l
left outer join right_table as r
on l.num = r.num
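This returns total_rows = 10, left_table_counts = 10 and non_null_counts = 5: every left-side row survives the join, but only the five even numbers find a match in right_table.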
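The same experiment can be reproduced without BigQuery. This sketch assumes SQLite via Python's sqlite3 and swaps BigQuery's generate_array for recursive CTEs that produce the same two number sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# generate_array() is BigQuery-only; recursive CTEs build 1..10 and 2,4,...,10 instead.
row = conn.execute("""
    WITH RECURSIVE
      left_table(num) AS (
        SELECT 1 UNION ALL SELECT num + 1 FROM left_table WHERE num < 10
      ),
      right_table(num) AS (
        SELECT 2 UNION ALL SELECT num + 2 FROM right_table WHERE num < 10
      )
    SELECT COUNT(*)     AS total_rows,
           COUNT(l.num) AS left_table_counts,
           COUNT(r.num) AS non_null_counts  -- r.num is NULL for odd numbers, so skipped
    FROM left_table AS l
    LEFT OUTER JOIN right_table AS r ON l.num = r.num
""").fetchone()
print(row)  # (10, 10, 5)
```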

Related

SQL Complex update query filter distinct values only

I have 3 tables with the following columns.
Table: A with columns: newColumnTyp1, typ2
Table: B with columns: typ2, tableC_id_fk
Table: C with columns: id, typ1
I want to update A.newColumnTyp1 with values from C.typ1 using the following logic:
if A.typ2=B.typ2 and B.tableC_id_fk=C.id
The values must be distinct: if either of the conditions above gives multiple results, the row should be ignored. For example, A.typ2=B.typ2 may give multiple results; in that case it should be skipped.
Edit:
The values must be distinct: if either of the conditions above gives multiple results, take only one value and ignore the rest. For example, A.typ2=B.typ2 may give multiple results; in that case just take any one value, because all the results from A.typ2=B.typ2 will have the same B.tableC_id_fk.
I have tried:
SELECT DISTINCT C.typ1, B.typ2
FROM C
LEFT JOIN B ON C.id = B.tableC_id_fk
LEFT JOIN A ON B.typ2= A.typ2
It gives me a result table with two columns, typ1 and typ2.
My plan was to then filter this new table, compare its typ2 values with A.typ2, and update A.newColumnTyp1.
I tried something like this, but it failed:
update A set newColumnTyp1= (
SELECT C.typ1 from
SELECT DISTINCT C.typ1, B.typ2
FROM C
LEFT JOIN B ON C.id = B.tableC_id_fk
LEFT JOIN A ON B.typ2= A.type2
where A.typ2=B.typ2);
I would consider an updatable CTE (as supported by SQL Server) together with a window function:
with cte as (
select a.newColumnTyp1, c.typ1, count(*) over(partition by a.typ2) cnt
from a
inner join b on b.typ2 = a.typ2
inner join c on c.id = b.tableC_id_fk
)
update cte
set newColumnTyp1 = typ1
where cnt = 1
Update: if the columns have the same name, alias one of them:
with cte as (
select a.typ1, c.typ1 typ1c, count(*) over(partition by a.typ2) cnt
from a
inner join b on b.typ2 = a.typ2
inner join c on c.id = b.tableC_id_fk
)
update cte
set typ1 = typ1c
where cnt = 1
I think I would approach this as:
update a
set newColumnTyp1 = bc.min_typ1
from (select b.typ2, min(c.typ1) as min_typ1, max(c.typ1) as max_typ1
from b join
c
on b.tableC_id_fk = c.id
group by b.typ2
) bc
where bc.typ2 = a.typ2 and
bc.min_typ1 = bc.max_typ1;
The subquery determines whether typ1 is always the same. If so, it is used for updating.
I should note that you might want the most common value assigned, instead of requiring unanimity. If that is what you want, then you can ask another question.
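The unanimity rule from the last answer can be sketched portably. This version assumes SQLite via Python's sqlite3 and invented rows. Instead of UPDATE ... FROM, it uses correlated subqueries, with COUNT(DISTINCT c.typ1) = 1 standing in for the MIN = MAX check. Under the edited requirement ("just take any one value"), you would drop the WHERE clause and keep MIN().

```python
import sqlite3

# SQLite stands in for the original DBMS; table/column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (typ2 TEXT, newColumnTyp1 TEXT);
    CREATE TABLE b (typ2 TEXT, tableC_id_fk INTEGER);
    CREATE TABLE c (id INTEGER, typ1 TEXT);
    INSERT INTO c VALUES (1, 'x'), (2, 'y');
    -- 'p' matches twice but always the same typ1; 'q' matches two different typ1 values
    INSERT INTO b VALUES ('p', 1), ('p', 1), ('q', 1), ('q', 2);
    -- 'r' has no match at all
    INSERT INTO a (typ2) VALUES ('p'), ('q'), ('r');
""")

conn.execute("""
    UPDATE a
    SET newColumnTyp1 = (SELECT MIN(c.typ1)
                         FROM b JOIN c ON b.tableC_id_fk = c.id
                         WHERE b.typ2 = a.typ2)
    -- only update when every matching B/C row agrees on typ1
    WHERE (SELECT COUNT(DISTINCT c.typ1)
           FROM b JOIN c ON b.tableC_id_fk = c.id
           WHERE b.typ2 = a.typ2) = 1
""")
result = conn.execute("SELECT typ2, newColumnTyp1 FROM a ORDER BY typ2").fetchall()
print(result)  # [('p', 'x'), ('q', None), ('r', None)]
```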

Show id even if the result is empty?

I have this SQL query:
SELECT
vinculo.id,
data start,
count(*) title
from
atendimento_regulacao
join vinculo on vinculo.id = atendimento_regulacao.vinculo_id
where data = '2019-07-02'
group by vinculo.usuario_id, atendimento_regulacao.data
The result is empty because no records exist where data = '2019-07-02'.
How can I still show the id, like below?
id | start | title
1 | |
You can use a CROSS JOIN to generate the rows and LEFT JOIN to bring in the results:
select v.id, d.dte as start, count(ar.vinculo_id) as num_title
from (select '2019-07-02' as dte) d cross join
vinculo v left join
atendimento_regulacao ar
on v.id = ar.vinculo_id and ar.data = d.dte
group by v.id, d.dte;
If you really want to aggregate by v.usuario_id, then include it in both the select and group by.
Notes:
The structure of the query easily extends to multiple dates.
The GROUP BY uses the same columns as the SELECT.
Table aliases make the query easier to write and to read.
Qualify all column references in a query that has more than one table reference.
The COUNT() uses a column from ar so it can return 0.
For the specific case of a single date, you can drop the CROSS JOIN and put the date directly into the ON clause:
select v.id, '2019-07-02' as start,
count(ar.vinculo_id) as num_title
from vinculo v left join
atendimento_regulacao ar
on v.id = ar.vinculo_id and ar.data = '2019-07-02'
group by v.id;
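To see why the date belongs in the ON clause rather than in WHERE, here is a small sketch. It assumes SQLite via Python's sqlite3 and invented rows: vinculo 1 has a record only on another day, and vinculo 2 has two records on 2019-07-02.

```python
import sqlite3

# SQLite stands in for MySQL; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vinculo (id INTEGER PRIMARY KEY);
    CREATE TABLE atendimento_regulacao (vinculo_id INTEGER, data TEXT);
    INSERT INTO vinculo VALUES (1), (2);
    INSERT INTO atendimento_regulacao VALUES
        (1, '2019-07-01'), (2, '2019-07-02'), (2, '2019-07-02');
""")

# Filtering in WHERE after an inner join drops vinculo 1 entirely.
where_filter = conn.execute("""
    SELECT v.id, COUNT(*)
    FROM atendimento_regulacao ar
    JOIN vinculo v ON v.id = ar.vinculo_id
    WHERE ar.data = '2019-07-02'
    GROUP BY v.id
""").fetchall()

# Moving the date into the LEFT JOIN's ON clause keeps every vinculo row.
on_filter = conn.execute("""
    SELECT v.id, '2019-07-02' AS start, COUNT(ar.vinculo_id) AS num_title
    FROM vinculo v
    LEFT JOIN atendimento_regulacao ar
      ON v.id = ar.vinculo_id AND ar.data = '2019-07-02'
    GROUP BY v.id
    ORDER BY v.id
""").fetchall()
print(where_filter)  # [(2, 2)]
print(on_filter)     # [(1, '2019-07-02', 0), (2, '2019-07-02', 2)]
```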
Use a RIGHT JOIN, and wrap the count as below; otherwise it shows 0 whenever there is nothing to count.
SELECT v.id, a.data start,
NULLIF(COUNT(a.vinculo_id), 0) title
FROM atendimento_regulacao a
RIGHT JOIN vinculo v
ON v.id = a.vinculo_id
AND a.data = '2019-07-02'
GROUP BY v.id, a.data;
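If the goal is a blank title (NULL) instead of 0, NULLIF(COUNT(...), 0) is one way to get it. The sketch below assumes SQLite via Python's sqlite3 and invented rows. It is written as the equivalent LEFT JOIN with vinculo on the left, because older SQLite versions do not support RIGHT JOIN. MAX(a.data) is used so that start is NULL when nothing matched.

```python
import sqlite3

# SQLite stands in for MySQL; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vinculo (id INTEGER PRIMARY KEY);
    CREATE TABLE atendimento_regulacao (vinculo_id INTEGER, data TEXT);
    INSERT INTO vinculo VALUES (1), (2);
    INSERT INTO atendimento_regulacao VALUES (1, '2019-07-01'), (2, '2019-07-02');
""")

rows = conn.execute("""
    SELECT v.id,
           MAX(a.data) AS start,                   -- NULL when nothing matched
           NULLIF(COUNT(a.vinculo_id), 0) AS title -- a count of 0 becomes NULL (blank)
    FROM vinculo v
    LEFT JOIN atendimento_regulacao a
      ON v.id = a.vinculo_id AND a.data = '2019-07-02'
    GROUP BY v.id
    ORDER BY v.id
""").fetchall()
print(rows)  # [(1, None, None), (2, '2019-07-02', 1)]
```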

Fetch rows where a LEFT JOIN subquery is null (not found)

How to fetch rows where a joined subquery is null?
SELECT *
FROM bank_recon b
LEFT JOIN (
SELECT o.bank_recon_id
FROM data_voucher_ocr_bank o
LEFT JOIN data_voucher v ON v.id=o.data_voucher_id
WHERE v.is_ocr_verified=1
LIMIT 1
) s ON s.bank_recon_id=b.id
WHERE s IS NULL
Update:
When I run the subquery on its own, it returns a row or not depending on whether is_ocr_verified is set:
SELECT o.bank_recon_id
FROM data_voucher_ocr_bank o
LEFT JOIN data_voucher v ON v.id=o.data_voucher_id
WHERE v.is_ocr_verified=1 && o.bank_recon_id=320062
But when I use this query, the row is fetched no matter what!?
SELECT b.txt, b.amount
FROM bank_recon b
LEFT JOIN (
SELECT o.bank_recon_id
FROM data_voucher_ocr_bank o
LEFT JOIN data_voucher v ON v.id=o.data_voucher_id
WHERE v.is_ocr_verified=1
LIMIT 1
) s ON s.bank_recon_id=b.id
WHERE b.id=320062 && s.bank_recon_id IS NULL
Specify a column in your WHERE clause, not just the subquery alias:
WHERE s.bank_recon_id IS NULL
An anti-join (which is what you are trying to apply here) is a technique we use when the straightforward NOT IN or NOT EXISTS has performance issues in a given DBMS.
Provided data_voucher_ocr_bank.bank_recon_id cannot be null, we can use:
SELECT txt, amount
FROM bank_recon
WHERE id NOT IN
(
SELECT bank_recon_id
FROM data_voucher_ocr_bank
WHERE data_voucher_id IN (SELECT id FROM data_voucher WHERE is_ocr_verified = 1)
);
(Otherwise we'd add AND bank_recon_id IS NOT NULL or use NOT EXISTS instead.)
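The three forms can be compared on a toy data set. This sketch assumes SQLite via Python's sqlite3 and invented rows. It also leaves out the question's LIMIT 1: that LIMIT keeps at most one row in the derived table, so at most one bank_recon row can ever be excluded, which is why nearly everything came back. The voucher join is written as an inner join, which is equivalent to the question's LEFT JOIN once WHERE v.is_ocr_verified = 1 is applied.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bank_recon (id INTEGER PRIMARY KEY, txt TEXT, amount REAL);
    CREATE TABLE data_voucher (id INTEGER PRIMARY KEY, is_ocr_verified INTEGER);
    CREATE TABLE data_voucher_ocr_bank (bank_recon_id INTEGER, data_voucher_id INTEGER);
    INSERT INTO bank_recon VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30), (4, 'd', 40);
    INSERT INTO data_voucher VALUES (100, 1), (101, 0), (102, 1);
    -- recon 1 and 2 have a verified voucher, 3 only an unverified one, 4 none at all
    INSERT INTO data_voucher_ocr_bank VALUES (1, 100), (3, 101), (2, 102);
""")

# Anti-join: no LIMIT 1 in the derived table, and the IS NULL test names a column.
anti_join = conn.execute("""
    SELECT b.id
    FROM bank_recon b
    LEFT JOIN (SELECT o.bank_recon_id
               FROM data_voucher_ocr_bank o
               JOIN data_voucher v ON v.id = o.data_voucher_id
               WHERE v.is_ocr_verified = 1) s ON s.bank_recon_id = b.id
    WHERE s.bank_recon_id IS NULL
    ORDER BY b.id
""").fetchall()

# NOT IN (safe here because bank_recon_id is never NULL in the sample rows).
not_in = conn.execute("""
    SELECT id
    FROM bank_recon
    WHERE id NOT IN (SELECT bank_recon_id
                     FROM data_voucher_ocr_bank
                     WHERE data_voucher_id IN
                           (SELECT id FROM data_voucher WHERE is_ocr_verified = 1))
    ORDER BY id
""").fetchall()

# NOT EXISTS: unaffected by NULLs in bank_recon_id.
not_exists = conn.execute("""
    SELECT b.id
    FROM bank_recon b
    WHERE NOT EXISTS (SELECT 1
                      FROM data_voucher_ocr_bank o
                      JOIN data_voucher v ON v.id = o.data_voucher_id
                      WHERE v.is_ocr_verified = 1 AND o.bank_recon_id = b.id)
    ORDER BY b.id
""").fetchall()
print(anti_join, not_in, not_exists)  # [(3,), (4,)] three times
```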

Combining an INNER JOIN query with a COUNT of different values

I am trying to link two tables that share a similar column. I need to find out how many values differ between table1.column1 and table2.column1:
My current query:
SELECT i10_descr.i10_code, gems_pcsi9.i10_code
FROM i10_descr INNER JOIN gems_pcsi9 ON i10_descr.i10_code = gems_pcsi9.i10_code
ORDER BY i10_descr.i10_code;
I know this query shows the codes that match in both tables, but I cannot figure out how to COUNT the codes that are missing from, or different in, either table.
Also, I have to compute the ratio of codes.
Any help, tips, or direction is much appreciated.
Thanks
You could use an anti-join pattern to get a list of i10_code that exist in one table, but not the other. For example:
SELECT i.i10_code
FROM i10_descr i
LEFT
JOIN gems_pcsi9 g
ON g.i10_code = i.i10_code
WHERE g.i10_code IS NULL
ORDER BY i.i10_code
If you just want a count, you could use COUNT(i.i10_code) and/or COUNT(DISTINCT i.i10_code) in the SELECT list and remove the ORDER BY clause.
To get the i10_code in the gems table that aren't in the i10 table, you'd do the same thing but invert the query so that gems is the "driving" table. e.g.
SELECT COUNT(DISTINCT g.i10_code) AS cnt_diff
FROM gems_pcsi9 g
LEFT
JOIN i10_descr i
ON i.i10_code = g.i10_code
WHERE i.i10_code IS NULL
If you want to combine the number of differences, you can combine the two queries by making them inline views:
SELECT d.cnt_diff + e.cnt_diff AS total_diff
FROM (
SELECT COUNT(DISTINCT g.i10_code) AS cnt_diff
FROM gems_pcsi9 g
LEFT
JOIN i10_descr i
ON i.i10_code = g.i10_code
WHERE i.i10_code IS NULL
) d
CROSS
JOIN (
SELECT COUNT(DISTINCT i.i10_code) AS cnt_diff
FROM i10_descr i
LEFT
JOIN gems_pcsi9 g
ON g.i10_code = i.i10_code
WHERE g.i10_code IS NULL
) e
NOTE: the COUNT aggregate omits NULL values. The query would need to be tweaked if you also wanted to "count" rows that had NULL values for i10_code. Use COUNT(DISTINCT ...) if you want just the number of distinct values that differ; a plain COUNT(i10_code) gives the number of rows. These two results differ if you have multiple rows with the same i10_code value.
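A small sketch of the combined difference query and of the COUNT versus COUNT(DISTINCT ...) point, assuming SQLite via Python's sqlite3 and invented codes. 'A' exists only in i10_descr. 'D' and 'E' exist only in gems_pcsi9, with 'E' stored twice.

```python
import sqlite3

# SQLite stands in for MySQL; the codes are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE i10_descr (i10_code TEXT);
    CREATE TABLE gems_pcsi9 (i10_code TEXT);
    INSERT INTO i10_descr VALUES ('A'), ('B'), ('C'), ('C');
    INSERT INTO gems_pcsi9 VALUES ('B'), ('C'), ('D'), ('E'), ('E');
""")

# Codes in gems_pcsi9 but not in i10_descr: rows D, E, E -> 3 rows, 2 distinct codes.
gems_only = conn.execute("""
    SELECT COUNT(g.i10_code), COUNT(DISTINCT g.i10_code)
    FROM gems_pcsi9 g
    LEFT JOIN i10_descr i ON i.i10_code = g.i10_code
    WHERE i.i10_code IS NULL
""").fetchone()

# Both directions combined through two inline views.
total_diff = conn.execute("""
    SELECT d.cnt_diff + e.cnt_diff AS total_diff
    FROM (SELECT COUNT(DISTINCT g.i10_code) AS cnt_diff
          FROM gems_pcsi9 g
          LEFT JOIN i10_descr i ON i.i10_code = g.i10_code
          WHERE i.i10_code IS NULL) d
    CROSS JOIN
         (SELECT COUNT(DISTINCT i.i10_code) AS cnt_diff
          FROM i10_descr i
          LEFT JOIN gems_pcsi9 g ON g.i10_code = i.i10_code
          WHERE g.i10_code IS NULL) e
""").fetchone()[0]
print(gems_only, total_diff)  # (3, 2) 3
```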
To get a "ratio" of codes, assuming that at this point the "differences" don't matter, get a count of codes from each table. The queries that do that can be used as inline views:
SELECT d.cnt / e.cnt AS ratio_cnt_g_over_cnt_i
, d.cnt AS cnt_g
, e.cnt AS cnt_i
FROM (
SELECT COUNT(DISTINCT g.i10_code) AS cnt
FROM gems_pcsi9 g
) d
CROSS
JOIN (
SELECT COUNT(DISTINCT i.i10_code) AS cnt
FROM i10_descr i
) e
An alternative method is to use union all with aggregation:
select in_i10descr, in_gems_pcsi9, count(*) as numcodes
from (select code, max(in_i10descr) as in_i10descr, max(in_gems_pcsi9) as in_gems_pcsi9
from ((select i10_descr.i10_code as code, 1 as in_i10descr, 0 as in_gems_pcsi9
from i10_descr
) union all
(select gems_pcsi9.i10_code, 0, 1
from gems_pcsi9
)
) t
group by code
) c
group by in_i10descr, in_gems_pcsi9;
In one pass, this counts the codes that appear only in i10_descr, only in gems_pcsi9, and in both tables.
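Here is a runnable version of the UNION ALL approach, assuming SQLite via Python's sqlite3 and the same invented codes as a toy data set. SQLite does not accept parenthesized SELECTs as UNION ALL operands, so the branches are written without their own parentheses.

```python
import sqlite3

# SQLite stands in for MySQL; the codes are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE i10_descr (i10_code TEXT);
    CREATE TABLE gems_pcsi9 (i10_code TEXT);
    INSERT INTO i10_descr VALUES ('A'), ('B'), ('C'), ('C');
    INSERT INTO gems_pcsi9 VALUES ('B'), ('C'), ('D'), ('E'), ('E');
""")

rows = conn.execute("""
    SELECT in_i10descr, in_gems_pcsi9, COUNT(*) AS numcodes
    FROM (SELECT code,
                 MAX(in_i10descr) AS in_i10descr,      -- 1 if the code is in i10_descr
                 MAX(in_gems_pcsi9) AS in_gems_pcsi9   -- 1 if the code is in gems_pcsi9
          FROM (SELECT i10_code AS code, 1 AS in_i10descr, 0 AS in_gems_pcsi9
                FROM i10_descr
                UNION ALL
                SELECT i10_code, 0, 1
                FROM gems_pcsi9) t
          GROUP BY code) c
    GROUP BY in_i10descr, in_gems_pcsi9
    ORDER BY in_i10descr, in_gems_pcsi9
""").fetchall()
print(rows)  # [(0, 1, 2), (1, 0, 1), (1, 1, 2)]
```

Reading the output: two codes (D, E) are only in gems_pcsi9, one (A) is only in i10_descr, and two (B, C) are in both.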

MySQL Inner Join with where clause sorting and limit, subquery?

The following query returns one row per invBlueprintTypes row with the correct information, but I'm trying to add something to it (see below the code block).
Select
blueprintType.typeID,
blueprintType.typeName Blueprint,
productType.typeID,
productType.typeName Item,
productType.portionSize,
blueprintType.basePrice * 0.9 As bpoPrice,
productGroup.groupName ItemGroup,
productCategory.categoryName ItemCategory,
blueprints.productionTime,
blueprints.techLevel,
blueprints.researchProductivityTime,
blueprints.researchMaterialTime,
blueprints.researchCopyTime,
blueprints.researchTechTime,
blueprints.productivityModifier,
blueprints.materialModifier,
blueprints.wasteFactor,
blueprints.maxProductionLimit,
blueprints.blueprintTypeID
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
What I need to add is the following table, with the columns listed below, so that I can use the timestamp values and sort the entire result by profitHour:
tablename: invBlueprintTypesPrices
columns: blueprintTypeID, timestamp, profitHour
This SELECT shows what I intend the JOIN (or subquery, or whatever can do this) to return:
SELECT * FROM invBlueprintTypesPrices
WHERE blueprintTypeID = blueprintType.typeID
ORDER BY timestamp DESC LIMIT 1
I also need the main row from invBlueprintTypes to still show even if there is no matching row in invBlueprintTypesPrices. The LIMIT 1 is there because I want the newest row possible; deleting the older data is not an option, since the history is needed.
If I've understood correctly, I need a subquery, but how do I do that? I've tried adding the exact query above, with AS blueprintPrices after its closing ), but it failed with an error pointing at the
WHERE blueprintTypeID = blueprintType.typeID
part. I have no idea why. Can anyone solve this?
You'll need a LEFT JOIN so that blueprints with no row in invBlueprintTypesPrices still show up (with NULL values). To mimic the LIMIT 1 per type id, you can use MAX(), or, to truly make sure you only return a single record, use a row number. Which one you need depends on whether you can have multiple max timestamps for each type id. Assuming not, this should be close:
Select
...
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Left Join (
SELECT MAX(timestamp) MaxTime, blueprintTypeID
FROM invBlueprintTypesPrices
GROUP BY blueprintTypeID
) blueprintTypePrice On blueprints.blueprintTypeID = blueprintTypePrice.blueprintTypeID
Left Join invBlueprintTypesPrices blueprintTypePrices On
blueprintTypePrice.blueprintTypeID = blueprintTypePrices.blueprintTypeID AND
blueprintTypePrice.MaxTime = blueprintTypePrices.timestamp
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
Order By
blueprintTypePrices.profitHour
If you might have the same max timestamp on 2 different records, replace the 2 left joins above with something like this, which numbers the rows per type using MySQL user variables:
Left Join (
SELECT @rn := IF(@prevTypeId = blueprintTypeID, @rn + 1, 1) rn,
timestamp,
blueprintTypeID,
profitHour,
@prevTypeId := blueprintTypeID
FROM (SELECT *
FROM invBlueprintTypesPrices
ORDER BY blueprintTypeID, timestamp DESC) t
JOIN (SELECT @rn := 0, @prevTypeId := NULL) t2
) blueprintTypePrices On blueprints.blueprintTypeID = blueprintTypePrices.blueprintTypeID AND blueprintTypePrices.rn = 1
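The MAX() plus double LEFT JOIN pattern from this answer can be checked on a reduced schema. This sketch assumes SQLite via Python's sqlite3, keeps only invBlueprintTypes.blueprintTypeID from the main query, and uses invented price rows. Type 3 has no price history, so the LEFT JOINs keep it with NULLs.

```python
import sqlite3

# SQLite stands in for MySQL; schema is reduced and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invBlueprintTypes (blueprintTypeID INTEGER PRIMARY KEY);
    CREATE TABLE invBlueprintTypesPrices (
        blueprintTypeID INTEGER, timestamp INTEGER, profitHour REAL);
    INSERT INTO invBlueprintTypes VALUES (1), (2), (3);
    -- type 1 has an older and a newer price; type 3 has no price history at all
    INSERT INTO invBlueprintTypesPrices VALUES
        (1, 100, 5.0), (1, 200, 7.5), (2, 150, 3.0);
""")

rows = conn.execute("""
    SELECT bp.blueprintTypeID, p.timestamp, p.profitHour
    FROM invBlueprintTypes AS bp
    LEFT JOIN (SELECT blueprintTypeID, MAX(timestamp) AS MaxTime
               FROM invBlueprintTypesPrices
               GROUP BY blueprintTypeID) AS latest
      ON latest.blueprintTypeID = bp.blueprintTypeID
    LEFT JOIN invBlueprintTypesPrices AS p
      ON p.blueprintTypeID = latest.blueprintTypeID
     AND p.timestamp = latest.MaxTime
    ORDER BY p.profitHour DESC  -- SQLite sorts NULLs last in DESC order
""").fetchall()
print(rows)  # [(1, 200, 7.5), (2, 150, 3.0), (3, None, None)]
```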
You don't say where you are putting the subquery. If in the select clause, then you have a problem because you are returning more than one value.
You can't put this into the from clause directly, because you have a correlated subquery (not allowed).
Instead, you can put it in like this:
from . . .
(select *
from invBlueprintTypesPrices ibptp
where ibptp.timestamp = (select ibptp2.timestamp
from invBlueprintTypesPrices ibptp2
where ibptp.blueprintTypeId = ibptp2.blueprintTypeId
order by ibptp2.timestamp desc
limit 1
)
) ibptp
on ibptp.blueprintTypeId = blueprintType.TypeID
This identifies the most recent records for all the blueprintTypeids in the subquery. It then joins in the one that matches.
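A runnable sketch of this correlated-subquery variant, attached with a LEFT JOIN so that types without prices are kept. It assumes SQLite via Python's sqlite3, a reduced schema and invented rows. If two rows share the newest timestamp for a type, both would still be returned.

```python
import sqlite3

# SQLite stands in for MySQL; schema is reduced and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invBlueprintTypes (blueprintTypeID INTEGER PRIMARY KEY);
    CREATE TABLE invBlueprintTypesPrices (
        blueprintTypeID INTEGER, timestamp INTEGER, profitHour REAL);
    INSERT INTO invBlueprintTypes VALUES (1), (2), (3);
    INSERT INTO invBlueprintTypesPrices VALUES
        (1, 100, 5.0), (1, 200, 7.5), (2, 150, 3.0);
""")

rows = conn.execute("""
    SELECT bp.blueprintTypeID, ibptp.timestamp, ibptp.profitHour
    FROM invBlueprintTypes AS bp
    LEFT JOIN (SELECT *
               FROM invBlueprintTypesPrices p
               -- keep only the row(s) carrying the newest timestamp for each type
               WHERE p.timestamp = (SELECT p2.timestamp
                                    FROM invBlueprintTypesPrices p2
                                    WHERE p2.blueprintTypeID = p.blueprintTypeID
                                    ORDER BY p2.timestamp DESC
                                    LIMIT 1)) AS ibptp
      ON ibptp.blueprintTypeID = bp.blueprintTypeID
    ORDER BY bp.blueprintTypeID
""").fetchall()
print(rows)  # [(1, 200, 7.5), (2, 150, 3.0), (3, None, None)]
```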