BigQuery - Left Join two tables ON a column OR another

I want to perform a left join on two tables. The field I will join by is an email address. Because the table on the left has two email fields, which may differ, I want the following: if the join on the first email fails and returns null values, perform a second join on the other email field. Lastly, I want to discard the entries that were not matched by either of the two joins.
I have thought of doing something like this:
SELECT *
FROM a
LEFT JOIN b
ON
a.address1 = b.email
OR a.address2 = b.email
However, this returned an error message saying LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
What is the correct way of achieving this?

OR in JOIN conditions makes it really hard to optimize queries, so BigQuery makes it hard to use OR.
I think you can rephrase the query by doing:
SELECT *
FROM a CROSS JOIN
UNNEST(ARRAY[address1, address2]) address LEFT JOIN
b
ON address = b.email;

You might be able to phrase your logic using a union:
SELECT * FROM a LEFT JOIN b ON a.address1 = b.email
UNION
SELECT * FROM a LEFT JOIN b ON a.address2 = b.email;

Consider the approach below (BigQuery):
SELECT * EXCEPT(row_key, priority)
FROM (
SELECT *, TO_JSON_STRING(a) row_key, IF(b.email IS NULL, 3, 1) priority
FROM `project.dataset.tableA` a LEFT JOIN `project.dataset.tableB` b
ON a.address1 = b.email
UNION ALL
SELECT *, TO_JSON_STRING(a), IF(b.email IS NULL, 3, 2)
FROM `project.dataset.tableA` a LEFT JOIN `project.dataset.tableB` b
ON a.address2 = b.email
)
WHERE true
QUALIFY ROW_NUMBER() OVER(PARTITION BY row_key ORDER BY priority NULLS LAST) = 1
Or you can use a less verbose, more generic version of the above. It avoids redundant code fragments, which is useful if you need more than just two conditions:
SELECT * EXCEPT(row_key, priority)
FROM (
SELECT a.*, b.*, TO_JSON_STRING(a) row_key, IF(b.email IS NULL, 3, c.priority) priority
FROM `project.dataset.tableA` a,
UNNEST([STRUCT(address1 as address, 1 as priority), (address2, 2)]) c
LEFT JOIN `project.dataset.tableB` b
ON c.address = b.email
)
WHERE TRUE
QUALIFY ROW_NUMBER() OVER(PARTITION BY row_key ORDER BY priority NULLS LAST) = 1
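The fallback the question asks for (match on address1 first, then on address2, and drop rows that match neither) can also be sketched with two plain equality joins, which sidesteps the OR restriction. This is a different technique from the answers above, not a restatement of them. The sketch below runs against an in-memory SQLite database from Python rather than BigQuery; the sample rows, the id and name columns, and the assumption that b.email is unique are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: row 1 matches on both addresses,
# row 2 only on address2, row 3 on neither.
conn.executescript("""
CREATE TABLE a (id INTEGER, address1 TEXT, address2 TEXT);
CREATE TABLE b (email TEXT, name TEXT);
INSERT INTO a VALUES
  (1, 'x@example.com', 'y@example.com'),
  (2, 'missing@example.com', 'z@example.com'),
  (3, 'none1@example.com', 'none2@example.com');
INSERT INTO b VALUES
  ('x@example.com', 'X'), ('y@example.com', 'Y'), ('z@example.com', 'Z');
""")

# Join b twice, once per address column. Each ON clause is a simple equality.
# The address1 match (b1) wins when it exists; otherwise the address2 match (b2)
# is used. The WHERE clause discards rows that matched neither.
fallback_rows = conn.execute("""
SELECT a.id,
       COALESCE(b1.email, b2.email) AS matched_email,
       CASE WHEN b1.email IS NOT NULL THEN b1.name ELSE b2.name END AS name
FROM a
LEFT JOIN b AS b1 ON a.address1 = b1.email
LEFT JOIN b AS b2 ON a.address2 = b2.email
WHERE b1.email IS NOT NULL OR b2.email IS NOT NULL
ORDER BY a.id
""").fetchall()

print(fallback_rows)
```

If b.email is not unique, each join can multiply rows, and the ROW_NUMBER-based versions above are the safer choice.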

Related

SELECT query with JOIN returns duplicate of each row

I am running a SELECT query to return addresses in a table associated with a certain "applicant code" and I'd like to join a table to also return (in the same row) the name of that applicant.
Therefore my query as of now is
SELECT a.id, a.created_at, a.updated_at, a.code, a.applicant_code, a.form_code, a.address_line_1, a.address_line_2, a.town_city, a.county_state, a.country, a.post_code, a.start_date, a.end_date, a.type, ap.first_name, ap.last_name
FROM sfs_addresses a
JOIN sfs_personal_details ap ON a.form_code = ap.form_code
WHERE a.form_code = ? AND a.applicant_code = ?
The query works, and I get the right columns and values in each row, but it returns two of each, like so:
ID
===
1
1
2
2
3
3
4
4
If I remove the JOIN it works fine. I have tried adding DISTINCT (it makes no difference). I'm lost.

EDIT: Based on this answer and the comments, the OP realized that the JOIN condition should be on applicant_code rather than form_code.
You have duplicates in the second table based on the JOIN key you are using (I question if the JOIN is correct).
If you just want one row arbitrarily, you can use row_number():
SELECT a.*, ap.first_name, ap.last_name
FROM sfs_addresses a JOIN
(SELECT ap.*,
ROW_NUMBER() OVER (PARTITION BY ap.form_code ORDER BY ap.form_code) as seqnum
FROM sfs_personal_details ap
) ap
ON a.form_code = ap.form_code AND ap.seqnum = 1
WHERE a.form_code = ? AND a.applicant_code = ?;
You can change the column in the ORDER BY to choose which row you get, for instance the oldest or the most recent.
Note: form_code seems like an odd JOIN column for a table called "personal details". So, you might just need to fix the JOIN condition.
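To see why the join key matters, here is a small sketch run against an in-memory SQLite database from Python. It assumes, as the EDIT above suggests, that sfs_personal_details carries an applicant_code column; the trimmed-down table layouts and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one form (F1) shared by two applicants (A1, A2);
# applicant A1 has two addresses.
conn.executescript("""
CREATE TABLE sfs_addresses (id INTEGER, form_code TEXT, applicant_code TEXT);
CREATE TABLE sfs_personal_details (form_code TEXT, applicant_code TEXT, first_name TEXT);
INSERT INTO sfs_addresses VALUES (1, 'F1', 'A1'), (2, 'F1', 'A1');
INSERT INTO sfs_personal_details VALUES ('F1', 'A1', 'Ann'), ('F1', 'A2', 'Bob');
""")

# Joining on form_code pairs every address with every applicant on the form,
# so each address id appears once per applicant.
by_form = conn.execute("""
SELECT a.id, ap.first_name
FROM sfs_addresses a
JOIN sfs_personal_details ap ON a.form_code = ap.form_code
WHERE a.form_code = ? AND a.applicant_code = ?
ORDER BY a.id, ap.first_name
""", ("F1", "A1")).fetchall()

# Joining on applicant_code pairs each address with its own applicant only.
by_applicant = conn.execute("""
SELECT a.id, ap.first_name
FROM sfs_addresses a
JOIN sfs_personal_details ap ON a.applicant_code = ap.applicant_code
WHERE a.form_code = ? AND a.applicant_code = ?
ORDER BY a.id
""", ("F1", "A1")).fetchall()

print(by_form)
print(by_applicant)
```

In this data the extra rows differ in first_name, which would also explain why DISTINCT made no difference in the question.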

The relation between the two tables is one-to-many. To return non-duplicate rows, use DISTINCT:
SELECT distinct a.id, a.created_at, a.updated_at, a.code, a.applicant_code, a.form_code, a.address_line_1, a.address_line_2, a.town_city, a.county_state, a.country, a.post_code, a.start_date, a.end_date, a.type, ap.first_name, ap.last_name
FROM sfs_addresses a
JOIN sfs_personal_details ap ON a.form_code = ap.form_code
WHERE a.form_code = ? AND a.applicant_code = ?

SQL Complex update query filter distinct values only

I have 3 tables with following columns.
Table: A with column: newColumnTyp1, typ2
Table: B with column: typ2, tableC_id_fk
Table: C with column: id, typ1
I wanted to update values in A.newColumnTyp1 from C.typ1 by following logic:
if A.typ2=B.typ2 and B.tableC_id_fk=C.id
The values must be distinct: if any of the conditions above gives multiple results, the row should be ignored. For example, A.typ2=B.typ2 may give multiple results; in that case it should be ignored.
Edit:
The values must be distinct: if any of the conditions above gives multiple results, take only one value and ignore the rest. For example, A.typ2=B.typ2 may give multiple results; in that case just take any one value and ignore the rest, because all the results from A.typ2=B.typ2 will have the same B.tableC_id_fk.
I have tried:
SELECT DISTINCT C.typ1, B.typ2
FROM C
LEFT JOIN B ON C.id = B.tableC_id_fk
LEFT JOIN A ON B.typ2= A.typ2
It gives me a result table with two columns, typ1 and typ2.
My idea was to then filter this new table, compare its typ2 value with A.typ2, and update A.newColumnTyp1.
I tried something like this, but it failed:
update A set newColumnTyp1= (
SELECT C.typ1 from
SELECT DISTINCT C.typ1, B.typ2
FROM C
LEFT JOIN B ON C.id = B.tableC_id_fk
LEFT JOIN A ON B.typ2= A.type2
where A.typ2=B.typ2);

I am thinking of an updateable CTE and window functions:
with cte as (
select a.newColumnTyp1, c.typ1, count(*) over(partition by a.typ2) cnt
from a
inner join b on b.typ2 = a.typ2
inner join c on c.id = b.tableC_id_fk
)
update cte
set newColumnTyp1 = typ1
where cnt = 1
Update: if the columns have the same name, then alias one of them:
with cte as (
select a.typ1, c.typ1 typ1c, count(*) over(partition by a.typ2) cnt
from a
inner join b on b.typ2 = a.typ2
inner join c on c.id = b.tableC_id_fk
)
update cte
set typ1 = typ1c
where cnt = 1

I think I would approach this as:
update a
set newColumnTyp1 = bc.min_typ1
from (select b.typ2, min(c.typ1) as min_typ1, max(c.typ1) as max_typ1
from b join
c
on b.tableC_id_fk = c.id
group by b.typ2
) bc
where bc.typ2 = a.typ2 and
bc.min_typ1 = bc.max_typ1;
The subquery determines whether typ1 is always the same. If so, it is used for updating.
I should note that you might want the most common value assigned, instead of requiring unanimity. If that is what you want, then you can ask another question.
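The unanimity check (update only when MIN and MAX of typ1 agree) can be tried out on a toy data set. The sketch below uses an in-memory SQLite database from Python and, because UPDATE ... FROM is not available in every engine or version, swaps in a correlated subquery that expresses the same rule. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: 'p' maps to the same C row twice (unanimous),
# 'q' maps to two different C rows (ambiguous), 'r' has no match in B.
conn.executescript("""
CREATE TABLE a (typ2 TEXT, newColumnTyp1 TEXT);
CREATE TABLE b (typ2 TEXT, tableC_id_fk INTEGER);
CREATE TABLE c (id INTEGER, typ1 TEXT);
INSERT INTO a (typ2) VALUES ('p'), ('q'), ('r');
INSERT INTO b VALUES ('p', 1), ('p', 1), ('q', 1), ('q', 2);
INSERT INTO c VALUES (1, 'T1'), (2, 'T2');

-- The subquery yields a value only when every matching typ1 is identical.
-- COALESCE keeps the existing value when the subquery yields nothing.
UPDATE a
SET newColumnTyp1 = COALESCE(
    (SELECT MIN(c.typ1)
     FROM b JOIN c ON b.tableC_id_fk = c.id
     WHERE b.typ2 = a.typ2
     GROUP BY b.typ2
     HAVING MIN(c.typ1) = MAX(c.typ1)),
    newColumnTyp1);
""")

updated = conn.execute(
    "SELECT typ2, newColumnTyp1 FROM a ORDER BY typ2"
).fetchall()
print(updated)
```

Only the row for 'p' is filled in; the ambiguous 'q' and the unmatched 'r' are left untouched.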

Get results from second table joined even if the initial select fails and vice versa

I am using the SELECT statement below to join the property table with the epc table. The EPC is not always available for a property. I also want the epc row if the property does not exist.
SELECT p.dateAdded, p.paon, p.saon, p.street, p.locality, p.townCity, p.district, p.county, p.propertyType,
p.propertyType, p.oldNew, p.postcode, p.tenure, p.ppd, p.bedrooms, p.bathrooms, p.receptions, p.lastSalePrice, p.lastTransferDate,
e.INSPECTION_DATE, e.TOTAL_FLOOR_AREA, e.CURRENT_ENERGY_RATING, e.POTENTIAL_ENERGY_RATING, e.CURRENT_ENERGY_EFFICIENCY, e.POTENTIAL_ENERGY_EFFICIENCY,
e.PROPERTY_TYPE
FROM property p
LEFT JOIN epc e ON p.postcode = e.POSTCODE AND CONCAT(p.paon, ', ', p.street) = e.ADDRESS1
WHERE p.paon = 8 AND p.postcode = "TS6 9LN"
ORDER BY e.INSPECTION_DATE, p.lastTransferDate DESC
LIMIT 1
Is it possible to select both tables but if 1 doesn't exist, select the 1 that does?

You need a FULL OUTER JOIN. Unfortunately, MySQL does not implement this part of the SQL standard. You can simulate a full outer join with two outer joins, though the query becomes long and possibly quite cumbersome and error-prone.
For example:
select a.col1, b.col2
from table_a a
LEFT join table_b b on ...
union -- here we union both outer joins
select a.col1, b.col2
from table_a a
RIGHT join table_b b on ...
In the second SELECT the table roles are inverted, since it uses a RIGHT JOIN instead of a LEFT JOIN.
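Here is a small runnable sketch of that emulation, using an in-memory SQLite database from Python. Since older SQLite versions lack RIGHT JOIN, the second branch is written as a LEFT JOIN with the tables swapped, which returns the same rows. The id join column and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: id 1 exists only in table_a, id 3 only in table_b,
# id 2 in both.
conn.executescript("""
CREATE TABLE table_a (id INTEGER, col1 TEXT);
CREATE TABLE table_b (id INTEGER, col2 TEXT);
INSERT INTO table_a VALUES (1, 'a1'), (2, 'a2');
INSERT INTO table_b VALUES (2, 'b2'), (3, 'b3');
""")

# First branch keeps every row of table_a; second branch keeps every row of
# table_b. UNION removes the matched rows that both branches produce.
full_rows = conn.execute("""
SELECT a.col1, b.col2
FROM table_a a LEFT JOIN table_b b ON a.id = b.id
UNION
SELECT a.col1, b.col2
FROM table_b b LEFT JOIN table_a a ON a.id = b.id
""").fetchall()

print(sorted(full_rows, key=repr))
```

Note that UNION also collapses genuine duplicate rows, so this emulation is only exact when the selected columns identify each row.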

How can I merge these two left joins into a single one?

How can I merge these two left joins: http://sqlfiddle.com/#!9/1d2954/69/0
SELECT d.`id`, (adcount + bdcount)
FROM `docs` d
LEFT JOIN
(
SELECT da.`doc_id`, COUNT(da.`doc_id`) AS adcount FROM `docs_scod_a` da
INNER JOIN `scod_a` a ON a.`id` = da.`scod_a_id`
WHERE a.`ver_a` IN ('AA', 'AB')
GROUP BY da.`doc_id`
) ad ON ad.`doc_id` = d.`id`
LEFT JOIN
(
SELECT db.`doc_id`, COUNT(db.`doc_id`) AS bdcount FROM `docs_scod_b` db
INNER JOIN `scod_b` b ON b.`id` = db.`scod_b_id`
WHERE b.`ver_b` IN ('BA', 'BB')
GROUP BY db.`doc_id`
) bd ON bd.`doc_id` = d.`id`
into a single left join, just to ease its use in my code, without making it any slower?

Let me first emphasize that your method of doing the calculation is the better method. You have two separate dimensions and aggregating them separately is often the most efficient method for doing the calculation. It is also the most scalable method.
That said, your query should be equivalent to this version:
SELECT d.id,
count(distinct a.id),
count(distinct b.id)
FROM docs d left join
docs_scod_a da
ON da.doc_id = d.id LEFT JOIN
scod_a a
ON a.id = da.scod_a_id AND a.ver_a IN ('AA', 'AB') LEFT JOIN
docs_scod_b db
ON db.doc_id = d.id LEFT JOIN
scod_b b
ON b.id = db.scod_b_id AND b.ver_b IN ('BA', 'BB')
GROUP BY d.id
ORDER BY d.id;
This query is more expensive than it looks, because the COUNT(DISTINCT) incurs additional overhead compared to COUNT().
And here is the SQL Fiddle.
And, because LEFT JOIN can return NULL values, your query is more correctly written as:
SELECT d.`id`, COALESCE(adcount, 0) + COALESCE(bdcount, 0)
If you were having problems with the results, this small change might fix those problems.

Performance may be a big problem, depending on sizes of each table. It appears to be an "inflate-deflate" situation since it first "inflates" the number of rows via JOIN, then "deflates" via GROUP BY. The formulation below avoids inflation-deflation.
But first, if I understand this subquery correctly, this
SELECT da.`doc_id`, COUNT(da.`doc_id`) AS adcount
FROM `docs_scod_a` da
INNER JOIN `scod_a` a ON a.`id` = da.`scod_a_id`
WHERE a.`ver_a` IN ('AA', 'AB')
GROUP BY da.`doc_id`
can be rewritten as
SELECT `doc_id`,
( SELECT COUNT(*)
FROM `scod_a`
WHERE `id` = da.`scod_a_id`
AND `ver_a` IN ('AA', 'AB')
) AS adcount
FROM `docs_scod_a` AS da
If that is correct, then the entire query becomes
SELECT d.id,
( SELECT COUNT(*)
FROM docs_scod_a ds
JOIN scod_a s ON s.id = ds.scod_a_id
WHERE ds.doc_id = d.id
AND s.ver_a IN ('AA', 'AB')
) +
( SELECT COUNT(*)
FROM docs_scod_b ds
JOIN scod_b s ON s.id = ds.scod_b_id
WHERE ds.doc_id = d.id
AND s.ver_b IN ('BA', 'BB')
)
FROM docs AS d
Which needs these indexes:
docs_scod_a: (doc_id, scod_a_id), (scod_a_id, doc_id)
docs_scod_b: (doc_id, scod_b_id), (scod_b_id, doc_id)
scod_a: (ver_a, id)
scod_b: (ver_b, id)
docs: -- presumably has PRIMARY KEY(id)
Note the lack of GROUP BY.
docs_scod_a smells like a many-to-many mapping table. I recommend you follow the tips here.
(No COALESCE is needed since COUNT will simply return zero.)
(I don't know whether my version is better (faster or whatever) than Gordon's, nor whether my indexes will help his formulation.)
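As a quick sanity check, the original two-join query (with the COALESCE fix) and the correlated-subquery version can be compared on a toy data set. The sketch below does that in an in-memory SQLite database from Python. The sample rows are hypothetical, and it says nothing about relative performance on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: doc 1 has one qualifying A link and one qualifying B link,
# doc 2 has one qualifying A link, doc 3 has no links at all.
conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY);
CREATE TABLE scod_a (id INTEGER PRIMARY KEY, ver_a TEXT);
CREATE TABLE scod_b (id INTEGER PRIMARY KEY, ver_b TEXT);
CREATE TABLE docs_scod_a (doc_id INTEGER, scod_a_id INTEGER);
CREATE TABLE docs_scod_b (doc_id INTEGER, scod_b_id INTEGER);
INSERT INTO docs VALUES (1), (2), (3);
INSERT INTO scod_a VALUES (1, 'AA'), (2, 'AC');
INSERT INTO scod_b VALUES (1, 'BA');
INSERT INTO docs_scod_a VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO docs_scod_b VALUES (1, 1);
""")

# Two pre-aggregated derived tables, with COALESCE guarding the NULLs
# that LEFT JOIN produces for docs without links.
two_joins = conn.execute("""
SELECT d.id, COALESCE(ad.adcount, 0) + COALESCE(bd.bdcount, 0)
FROM docs d
LEFT JOIN (SELECT da.doc_id, COUNT(da.doc_id) AS adcount
           FROM docs_scod_a da JOIN scod_a a ON a.id = da.scod_a_id
           WHERE a.ver_a IN ('AA', 'AB')
           GROUP BY da.doc_id) ad ON ad.doc_id = d.id
LEFT JOIN (SELECT db.doc_id, COUNT(db.doc_id) AS bdcount
           FROM docs_scod_b db JOIN scod_b b ON b.id = db.scod_b_id
           WHERE b.ver_b IN ('BA', 'BB')
           GROUP BY db.doc_id) bd ON bd.doc_id = d.id
ORDER BY d.id
""").fetchall()

# Correlated subqueries: no GROUP BY and no COALESCE needed.
subqueries = conn.execute("""
SELECT d.id,
       (SELECT COUNT(*) FROM docs_scod_a ds
        JOIN scod_a s ON s.id = ds.scod_a_id
        WHERE ds.doc_id = d.id AND s.ver_a IN ('AA', 'AB'))
     + (SELECT COUNT(*) FROM docs_scod_b ds
        JOIN scod_b s ON s.id = ds.scod_b_id
        WHERE ds.doc_id = d.id AND s.ver_b IN ('BA', 'BB'))
FROM docs AS d
ORDER BY d.id
""").fetchall()

print(two_joins)
print(subqueries)
```

Both formulations return the same totals here, including a zero for the document with no links.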

Joining 1 table twice in the same SQL query

I have joined one table twice in the same query, and I keep getting error messages saying that the tables in the FROM clause have the same exposed names. Even using AS does not seem to work. Any ideas or suggestions?
Here is the query I am using:
select Contact.*, PERSON.*, address.*
from address
full join Contact
on address.uprn = Contact.uprn
full join PERSON
on Contact.contactno = PERSON.contact
full join address
on address.uprn = PERSON.driveruprn

select Contact.*, PERSON.*, a1.*, a2.*
from address a1
full join Contact
on a1.uprn = Contact.uprn
full join PERSON
on Contact.contactno = PERSON.contact
full join address a2
on a2.uprn = PERSON.driveruprn
However, there is no FULL JOIN in MySQL. A workaround is:
select * from t1
left join t2 ON t1.id = t2.id
union
select * from t1
right join t2 ON t1.id = t2.id

You have to alias the second and subsequent usages of a table:
select ...
from address <---first usage
join contact ...
join person ...
join address AS other_address ... <---second usage
^^^^^^^^^^^^^^^^
Doesn't really matter exactly where you do the aliases, but if you use a single table multiple times, all but ONE of those usages have to have unique aliases.

This is probably because you have the same field name in different tables.
Change it like this to make sure the field names are unique:
SELECT
Contact.field1 as c_field1, Contact.field2 as c_field2 ...,
PERSON.field1 as p_field1, PERSON.field2 as p_field2 ...,
address.field1 as a_field1, address.field2 as a_field2 ...

You need to use a separate alias on each of the address table references in your query to avoid the error you are seeing:
SELECT Contact.*, PERSON.*, a1.*, a2.*
FROM address a1 INNER JOIN Contact ON a1.uprn = Contact.uprn
INNER JOIN PERSON ON Contact.contactno = PERSON.contact
INNER JOIN address a2 ON a2.uprn = PERSON.driveruprn
By the way, there is no FULL JOIN in MySQL, so I have replaced them with INNER JOIN which is likely what you had in mind.
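A minimal sketch of the aliasing fix, run against an in-memory SQLite database from Python. The street column and the sample rows are invented for illustration; the point is only that a1 and a2 let the same address table play two roles in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the contact lives at uprn 10, the driver is at uprn 20.
conn.executescript("""
CREATE TABLE address (uprn INTEGER, street TEXT);
CREATE TABLE Contact (contactno INTEGER, uprn INTEGER);
CREATE TABLE PERSON (contact INTEGER, driveruprn INTEGER);
INSERT INTO address VALUES (10, 'Home St'), (20, 'Depot Rd');
INSERT INTO Contact VALUES (1, 10);
INSERT INTO PERSON VALUES (1, 20);
""")

# a1 is the contact's address, a2 is the driver's address.
# Each use of the address table has its own alias, so the names do not clash.
aliased_rows = conn.execute("""
SELECT Contact.contactno, a1.street, a2.street
FROM address a1
JOIN Contact ON a1.uprn = Contact.uprn
JOIN PERSON ON Contact.contactno = PERSON.contact
JOIN address a2 ON a2.uprn = PERSON.driveruprn
""").fetchall()

print(aliased_rows)
```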
