Fuzzy match on a left join - mysql

I am looking to join two tables where the join key is, say, a 90% match rather than an exact one.
Taking the example below, I want to join table A and table B on the phone number. You can see that the phone numbers differ slightly (the international code). I'd like the end result to look like table C.
I imagine it will be something like the query below, except that the join would match on 90% of the phone_number:
select
a.*,
b.most_recent_booking_date
from a
left join b
on a.phone_number = b.phone_number
Hope that's clear and any help would be great! Cheers!
Table A
Phone number    Most recent call date
441234567891    01/05/22
441234567892    02/05/22
Table B
Phone number     Most recent booking date
+441234567891    03/05/22
+441234567892    04/05/22
Table C
Phone number    Most recent call date    Most recent booking date
441234567891    01/05/22                 03/05/22
441234567892    02/05/22                 04/05/22

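You can try something like this, but I don't like it; as Demeteor says, you should have an ID to join on. Note that I use a left join here in case there is no matching data in table #T2. I also considered a computed column that removes the +, so that you could join on that instead. I will also get told off for SQL injection if the phone number could be dodgy. The example below is written in SQL Server syntax (temp tables, GO, and + for string concatenation); in MySQL, + does not concatenate strings, so the pattern would need to be built with CONCAT('%', T1.PhoneNumber) instead.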
CREATE TABLE #T1 (
PhoneNumber VARCHAR(20) NOT NULL,
CallDate DATE NOT NULL
);
CREATE TABLE #T2 (
PhoneNumber VARCHAR(20) NOT NULL,
BookingDate DATE NOT NULL
);
INSERT INTO #T1 (PhoneNumber, CallDate)
VALUES
('441234567891', '20220501'),
('441234567892', '20220502');
INSERT INTO #T2 (PhoneNumber, BookingDate)
VALUES
('+441234567891', '20220503'),
('+441234567892', '20220504');
GO
SELECT *
FROM #T1 AS T1
LEFT JOIN #T2 AS T2 ON T2.PhoneNumber LIKE '%' + T1.PhoneNumber;
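For MySQL, which the question title mentions, the same idea might look like the sketch below. It assumes the tables are named a and b with a phone_number column in each, as in the question's own query. The second variant additionally assumes that the only difference between the two formats is a leading +.

```sql
-- Variant 1: suffix match, the MySQL equivalent of LIKE '%' + ...
SELECT a.*,
       b.most_recent_booking_date
FROM a
LEFT JOIN b
  ON b.phone_number LIKE CONCAT('%', a.phone_number);

-- Variant 2: strip the leading + and compare for equality
-- (assumes the + is the only formatting difference)
SELECT a.*,
       b.most_recent_booking_date
FROM a
LEFT JOIN b
  ON TRIM(LEADING '+' FROM b.phone_number) = a.phone_number;
```

Neither variant can use an ordinary index on b.phone_number, so on large tables a stored, normalized phone number column is likely the better long-term option.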

Related

MySQL select and match two tables and update column based on matched data

This seems difficult to me, which is why I need your help. Basically, I have two tables named xp_pn_resale and xp_guru_properties. I need to set the column postal_code in table xp_pn_resale based on data from the other table. Here are my tables.
For my xp_pn_resale table, I wrote this query to show you the data:
SELECT postal_code,
block,
concat(block,' ', street_name) as address
FROM xp_pn_resale
where street_name like '%ANG MO KIO%';
And I get a result like this
As you can see, there are null values, and some postal_code values are filled in because I updated them manually based on what I searched. I just want to fill postal_code automatically from the query result I get from the other table.
Here is my xp_guru_properties table; I wrote this query to show you the data:
SELECT property_name as GURU_PROPERTY_NAME,
property_type as GURU_PROPERTY_TYPE ,
JSON_UNQUOTE(JSON_EXTRACT(xp_guru_properties.json, '$.postcode') )as GURU_POSTCODE
FROM xp_guru_properties
where property_type like '%HDB%' AND property_name like '%ang mo kio%';
And the result is like this
xp_guru_properties has a column property_name, which closely matches the concatenated block and street_name columns from the other table; I aliased it as GURU_PROPERTY_NAME.
As you can see, there is a derived column named GURU_POSTCODE. The values of that column are what I want to fill into the postal_code column of the xp_pn_resale table. I have been updating postal_code manually by running
UPDATE xp_pn_resale
SET postal_code = 560110
WHERE street_name LIKE '%ANG MO KIO%'
AND block = 110
which is very tedious. Does anyone know how I could update it automatically based on the queries I showed? Help will be appreciated.
EDIT: I wrote a JOIN query like this, but it is for the Lengkong Tiga records, for which I had already filled in all the postal_code values manually:
select distinct
JSON_UNQUOTE(json_extract(g.json, '$.postcode')) postcode,
JSON_UNQUOTE(json_extract(g.json, '$.name')) name,
JSON_UNQUOTE(json_extract(g.json, '$.streetnumber') )streetnumber,
p.block, p.street_name, p.postal_code
from xp_pn_resale p
inner join xp_guru_properties g
on g.property_name = concat(p.block, ' ', p.street_name)
where g.property_type like '%HDB%' AND g.property_name like '%Lengkong Tiga%'
I got result like this
Join the two tables and update.
UPDATE xp_pn_resale AS r
JOIN xp_guru_properties AS p ON concat(r.block,' ', r.street_name) = p.property_name
SET r.postal_code = JSON_UNQUOTE(JSON_EXTRACT(p.json, '$.postcode'))
WHERE r.street_name like '%ANG MO KIO%'
AND p.property_type like '%HDB%' AND p.property_name like '%ang mo kio%'
AND r.postal_code IS NULL
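Before running an update like this, it may be worth previewing which rows would change. The following is only a sketch that reuses the same join and filters as a SELECT; the column aliases current_postal_code and new_postal_code are made up for illustration.

```sql
-- Preview: rows that the UPDATE would touch, and the value each would receive
SELECT r.block,
       r.street_name,
       r.postal_code AS current_postal_code,
       JSON_UNQUOTE(JSON_EXTRACT(p.json, '$.postcode')) AS new_postal_code
FROM xp_pn_resale AS r
JOIN xp_guru_properties AS p
  ON CONCAT(r.block, ' ', r.street_name) = p.property_name
WHERE r.street_name LIKE '%ANG MO KIO%'
  AND p.property_type LIKE '%HDB%'
  AND r.postal_code IS NULL;
```

If one address matches several rows in xp_guru_properties, it will appear more than once here, which is a sign that the update would pick one of the postcodes arbitrarily.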

SQL Joining two tables but overwriting the first if it exists in the second

I can't get this query to work the way I want it to. I have two tables with almost identical data but want one to override the other if it exists. An example would be easier than trying to explain:
There is an unadjusted balance table:
and a separate table of adjustments for each balance:
The desired output takes the unadjusted balances and applies any existing adjustments on top of them (if is_current=1), essentially replacing the row but still keeping the original unadjusted current_balance.
The desired output would be something like this:
Here is my current query, which is not working how I want: it is flipping values and missing current_balance. I've been trying this for hours and can't get anywhere:
SELECT
*
FROM
(
SELECT
balance_adjustments.name,
balance_adjustments.user_id,
balance_adjustments.amount_owed,
balance_adjustments.when_to_pay,
balance_adjustments.current_balance
FROM
balance_adjustments
WHERE
balance_adjustments.when_to_pay = '2018-11-05'
AND balance_adjustments.is_current = true
UNION ALL
SELECT
unadjusted_balance.name,
unadjusted_balance.user_id,
unadjusted_balance.amount_owed,
unadjusted_balance.when_to_pay,
unadjusted_balance.current_balance
FROM
unadjusted_balance
LEFT OUTER JOIN balance_adjustments ON balance_adjustments.user_id = unadjusted_balance.user_id
AND balance_adjustments.name = unadjusted_balance.name
AND balance_adjustments.when_to_pay = unadjusted_balance.when_to_pay
AND balance_adjustments.is_current = true
WHERE
unadjusted_balance.when_to_pay = '2018-11-05'
AND balance_adjustments.name IS NULL
) AS table1
some additional commands to help anyone set this scenario up to test:
CREATE TABLE balance_adjustments
(
name varchar(30),
user_id varchar(30),
amount_owed float,
when_to_pay datetime,
current_balance float,
is_current boolean
);
CREATE TABLE unadjusted_balance
(
name varchar(30),
user_id varchar(30),
amount_owed float,
when_to_pay datetime,
current_balance float
);
insert into balance_adjustments values ('ricardo', '82340001', 100.00, '2018-11-05', null, 1);
insert into balance_adjustments values ('ricardo', '82340001', 33.00, '2018-11-05', null, 0);
insert into unadjusted_balance values ('joseph', '82340000', 2400.00, '2018-11-05', 4049.00);
insert into unadjusted_balance values ('ricardo', '82340001', 899.00, '2018-11-05', 500.00);
thanks for any help
If you just want to replace the amount_owed with the value from the balance_adjustments table where is_current is 1 (and I am assuming there is only one such row per user), then a simple LEFT JOIN and COALESCE will suffice. The COALESCE ensures that the NULL values produced by unmatched rows in the balance_adjustments table are replaced by the original value from the unadjusted_balance table. It's not clear from your question whether you want to replace the when_to_pay field as well; I have assumed you do in this query. If you don't, just replace COALESCE(ba.when_to_pay, ub.when_to_pay) AS when_to_pay with ub.when_to_pay.
SELECT ub.name, ub.user_id,
COALESCE(ba.amount_owed, ub.amount_owed) AS amount_owed,
COALESCE(ba.when_to_pay, ub.when_to_pay) AS when_to_pay,
ub.current_balance
FROM unadjusted_balance ub
LEFT JOIN balance_adjustments ba ON ba.user_id = ub.user_id AND ba.is_current
ORDER BY ub.name
Output:
name user_id amount_owed when_to_pay current_balance
joseph 82340000 2400 2018-11-05 00:00:00 4049
ricardo 82340001 100 2018-11-05 00:00:00 500
If you want to overwrite the unadjusted balance with balance adjustments, then do the following:
1. Select all unadjusted balances.
2. Left join to adjustments so you can get any adjustments, making sure to keep only those that have is_current = 1.
3. Use the sum of amount_owed in adjustments to get the overwrite amount.
4. To default to the original amount if there are no adjustments, use COALESCE outside of the SUM, with the original amount as the second parameter.
COALESCE returns the first value if it is not null, or the value of the next parameter otherwise. The result of the SUM aggregate will be null if no rows are matched by the left join.
Query
SELECT ub.name
, ub.user_id
, COALESCE(SUM(ba.amount_owed), ub.amount_owed) AS amount_owed
, ub.when_to_pay
, ub.current_balance
FROM unadjusted_balance AS ub
LEFT JOIN balance_adjustments AS ba ON ba.user_id = ub.user_id AND ba.is_current = 1
GROUP BY ub.name, ub.user_id, ub.amount_owed, ub.when_to_pay, ub.current_balance;
I wasn't quite sure if you could have more than one adjustment where is_current is equal to 1. If there is at most going to be one row, then omit the aggregate and group by and just pass ba.amount_owed as the first parameter in the coalesce.

SQL - Column in field list is ambiguous

I have two tables, BOOKING and WORKER. Basically there is a table for workers and a table to keep track of what each worker has to do in a time frame, i.e. a booking. I'm trying to check if there is an available worker for a job, so I query the bookings to check whether the requested time has available workers between the start and end dates. However, I get stuck on the next part, which is returning the list of workers that do have that time available. I read that I could join the tables based on a shared column, so I tried doing an inner join on the WORKER_NAME column, but when I do this I get an ambiguity error. This leads me to believe I misunderstood the concept. Does anyone understand what I'm trying to do and know how to do it, or know why I get the error below? Thanks guys!
CREATE TABLE WORKER (
ID INT NOT NULL AUTO_INCREMENT,
WORKER_NAME varchar(80) NOT NULL,
WORKER_CODE INT,
WORKER_WAGE INT,
PRIMARY KEY (ID)
);
CREATE TABLE BOOKING (
ID INT NOT NULL AUTO_INCREMENT,
WORKER_NAME varchar(80) NOT NULL,
START DATE NOT NULL,
END DATE NOT NULL,
PRIMARY KEY (ID)
);
query
SELECT *
FROM WORKER
INNER JOIN BOOKING
ON WORKER_NAME = WORKER_NAME
WHERE (START NOT BETWEEN '2010-10-01' AND '2010-10-10')
ORDER BY ID
#1052 - Column 'WORKER_NAME' in on clause is ambiguous
In your query, the column WORKER_NAME exists in both tables; in this case, you must reference the table name as part of the column identifier. The same applies to ID in the ORDER BY clause, since both tables have an ID column.
SELECT *
FROM WORKER
INNER JOIN BOOKING
ON WORKER.WORKER_NAME = BOOKING.WORKER_NAME
WHERE (START NOT BETWEEN '2010-10-01' AND '2010-10-10')
ORDER BY WORKER.ID
In your query, the WORKER_NAME and ID columns exist in both tables, where WORKER_NAME retains the same meaning and ID is re-purposed; in this case, you must either specify that you are using WORKER_NAME as the join search condition or 'project away' (rename or omit) the duplicate ID column.
Because the ID columns are AUTO_INCREMENT, I assume (hope!) they have no business meaning. Therefore, they could both be omitted, allowing a natural join that will cause duplicate columns to be 'projected away'. This is one of those situations where one wishes SQL had a WORKER ( ALL BUT ( ID ) ) type syntax; instead, one is required to do it longhand. It might be easier in the long run to opt for a consistent naming convention and rename the columns to WORKER_ID and BOOKING_ID respectively.
You would also need to identify a business key to order on, e.g. ( START, WORKER_NAME ):
SELECT *
FROM
( SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE FROM WORKER ) AS W
NATURAL JOIN
( SELECT WORKER_NAME, START, END FROM BOOKING ) AS B
WHERE ( START NOT BETWEEN '2010-10-01' AND '2010-10-10' )
ORDER BY START, WORKER_NAME;
This is good, but it's returning the start and end times as well. I just want the WORKER rows. I can't take the start and end out, because then SQL doesn't recognize the where clause.
Two approaches spring to mind: push the where clause to the subquery:
SELECT *
FROM
( SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE FROM WORKER ) AS W
NATURAL JOIN
( SELECT WORKER_NAME, START, END
FROM BOOKING
WHERE START NOT BETWEEN '2010-10-01' AND '2010-10-10' ) AS B
ORDER BY START, WORKER_NAME;
Alternatively, replace SELECT * with a list of columns you want to SELECT:
SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE
FROM
( SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE FROM WORKER ) AS W
NATURAL JOIN
( SELECT WORKER_NAME, START, END FROM BOOKING ) AS B
WHERE START NOT BETWEEN '2010-10-01' AND '2010-10-10'
ORDER BY START, WORKER_NAME;
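Since the underlying goal in the question is to list workers who are free in a given period, a different formulation may be closer to it: return each worker for whom no booking overlaps the requested range. This is only a sketch under assumptions not stated in the thread, namely that a booking blocks the worker for every day from START to END inclusive, and that the requested period is 2010-10-01 to 2010-10-10 as in the queries above.

```sql
-- Workers with no booking that overlaps 2010-10-01 .. 2010-10-10.
-- Two ranges overlap when each one starts on or before the other ends.
SELECT W.WORKER_NAME, W.WORKER_CODE, W.WORKER_WAGE
FROM WORKER AS W
WHERE NOT EXISTS (
    SELECT 1
    FROM BOOKING AS B
    WHERE B.WORKER_NAME = W.WORKER_NAME
      AND B.START <= '2010-10-10'
      AND B.END   >= '2010-10-01'
)
ORDER BY W.WORKER_NAME;
```

Unlike the join versions, this also returns workers who have no bookings at all, and it returns only WORKER columns, so there are no duplicate column names to disambiguate.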
This error occurs when you reference a field that exists in more than one table, so you should qualify it with the table name or alias. For instance, in the example below I write cod.coordinator so that the DBMS knows which coordinator I want:
SELECT project__number, surname, firstname, cod.coordinator
FROM coordinators AS co
JOIN hub_applicants AS ap ON co.project__number = ap.project_id
JOIN coordinator_duties AS cod ON co.coordinator = cod.email

field in subquery based on age of row instead of "group by"

I can't seem to get this query right. I have tables like this (simplified):
person: PersonID, ...other stuff...
contact: ContactID, PersonID, ContactDate, ContactTypeID, Description
I want to get a list of all the people who had a contact of a certain type (or types) but none of another type(s) that occurred later. An easy-to-understand example: Checking for records of gifts received without having sent a thank-you card afterward. There might have been other previous thank-you cards sent (pertaining to other gifts), but if the most recent occurrence of a Gift Received (we'll say that's ContactTypeID=12) was not followed by a Thank You Sent (ContactTypeID=11), the PersonID should be in the result set. Another example: A mailing list would be made up of everyone who has opted in (12) without having opted out (11) more recently.
My attempt at a query is this:
SELECT person.PersonID FROM person
INNER JOIN (SELECT PersonID,ContactTypeID,MAX(ContactDate) FROM contact
WHERE ContactTypeID IN (12,11) GROUP BY PersonID) AS seq
ON person.PersonID=seq.PersonID
WHERE seq.ContactTypeID IN (12)
It seems that the ContactTypeID returned in the subquery is for the last record entered in the table, regardless of which record has the max date. But I can't figure out how to fix it. Sorry if this has been asked before (almost everything has!), but I don't know what terms to search for.
Wow. A system to check who has been good and sent thank yous. I think I would be in your list...
Anyway. Give this a go. The idea is to create two views: the first with personId and the time of the most recently received gift and the second with personId and the most recently sent thanks. Join them together using a left outer join to ensure that people who have never sent a thank you are included and then add in a comparison between the most recently received time and the most recent thanks time to find impolite people:
select g.personId,
g.mostRecentGiftReceivedTime,
t.mostRecentThankYouTime
from
(
select p.personId,
max(ContactDate) as mostRecentGiftReceivedTime
from person p inner join contact c on p.personId = c.personId
where c.ContactTypeId = 12
group by p.personId
) g
left outer join
(
select p.personId,
max(ContactDate) as mostRecentThankYouTime
from person p inner join contact c on p.personId = c.personId
where c.ContactTypeId = 11
group by p.personId
) t on g.personId = t.personId
where t.mostRecentThankYouTime is null
or t.mostRecentThankYouTime < g.mostRecentGiftReceivedTime;
Here is the test data I used:
create table person (PersonID int unsigned not null primary key);
create table contact (
ContactID int unsigned not null primary key,
PersonID int unsigned not null,
ContactDate datetime not null,
ContactTypeId int unsigned not null,
Description varchar(50) default null
);
insert into person values (1);
insert into person values (2);
insert into person values (3);
insert into person values (4);
insert into contact values (1,1,'2013-05-01',12,'Person 1 Got a present');
insert into contact values (2,1,'2013-05-03',11,'Person 1 said "Thanks"');
insert into contact values (3,1,'2013-05-05',12,'Person 1 got another present. Lucky person 1.');
insert into contact values (4,2,'2013-05-01',11,'Person 2 said "Thanks". Not sure what for.');
insert into contact values (5,2,'2013-05-08',12,'Person 2 got a present.');
insert into contact values (6,3,'2013-04-25',12,'Person 3 Got a present');
insert into contact values (7,3,'2013-04-30',11,'Person 3 said "Thanks"');
insert into contact values (8,3,'2013-05-02',12,'Person 3 got another present. Lucky person 3.');
insert into contact values (9,3,'2013-05-05',11,'Person 3 said "Thanks" again.');
insert into contact values (10,4,'2013-04-30',12,'Person 4 got his first present');
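The same question can also be phrased with NOT EXISTS, which some readers may find closer to the wording "a gift that was not followed by a thank-you". This is an alternative sketch, not the query from the answer above; it uses the same person/contact tables and the same type codes (12 for gift received, 11 for thank you sent).

```sql
-- People with at least one gift (type 12) that has no thank-you (type 11)
-- dated on or after it. If the most recent gift was thanked, every earlier
-- gift is covered by that same thank-you, so this is equivalent to checking
-- only the most recent gift.
SELECT DISTINCT g.PersonID
FROM contact AS g
WHERE g.ContactTypeID = 12
  AND NOT EXISTS (
      SELECT 1
      FROM contact AS t
      WHERE t.PersonID = g.PersonID
        AND t.ContactTypeID = 11
        AND t.ContactDate >= g.ContactDate
  )
ORDER BY g.PersonID;
```

With the test data above, this returns PersonID 1, 2 and 4: person 3's latest gift (2013-05-02) is followed by a thank-you on 2013-05-05, while persons 1, 2 and 4 each have a most recent gift with no later thank-you.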

MYSQL: show registers, relate and rename

The instructions are the following:
Generate a report of all the products purchased by the customers where it appears: the id of the customer, the customer's full name, the city, the state, the ID number, the date of sale, the product code, the product name, the quantity sold and finally a message that says "you paid" or "payment Pending" status depending on the payment status where 0 = paid and 1 = pending. This report should appear sorted alphabetically first by state and then by customer name.
what I tried is this:
select cli_nom, cli_city, cli_state, fac_num, fac_saledate, prod_cod, fac_total, fac_status
where fac_status = 0 as paid and fac_status = 1 as pending
from factures, products, clients order by cli_state, cli_nom, asc;
Which absolutely didn't work. I'm not sure about the syntax to rename or mask a column.
The table structures are the following:
table clientes:
1. cli_nom varchar(100)
2. cli_state varchar(100)
3. cli_city varchar(100)
4. cli_id int(11)
5. cli_status int(11)
6. cli_dateofsale date
table products:
1. prod_cod int(11)
2. prod_categ char(1)
3. prod_nom varchar(100)
4. prod_price double
5. prod_descrip varchar(100)
6. prod_discount float
table facturas:
1. fac_num int(11)
2. fac_datesold date
3. fac_cli_id int(11)
4. fac_status int
5. fac_total float
You are having trouble with the structure of the query.
When you want to query something, the complete statement has a form something like this:
Select [fields]
from [table(s)] -- this is where the inner joins go
where [filter rows]
group by [fields to group]
having [filtering groups]
order by [fields]
Of course, queries can get much bigger and more complicated than this, but it will give you the initial concepts.
You always have to respect this order; in your query you are putting a where inside the select.
If you want to change what is shown depending on some evaluation, but you will ALWAYS show something (you are not filtering, you are choosing what to show according to the value), you can use a CASE expression.
In this example, you could do something like this:
select cli_nom, cli_city, cli_state, fac_num, fac_saledate,
prod_cod, fac_total, fac_status,
CASE when fac_status = 0 then 'You Paid'
when fac_status = 1 then 'payment Pending'
else 'Not sure about state' END as payment_status
from factures
inner join products on -- put here how you relate products with factures
inner join clients on -- put here how you relate clients with products/factures
order by cli_state, cli_nom asc;
If you don't know how to use INNER JOIN, here is some info.
Basically, it is a clause that is used to relate two tables,
something like
(..)
from Table1 A
INNER JOIN Table2 B on A.id = B.id
(A and B are aliases representing the tables they are set on).
This means that it will compare every row from Table1 to every row from Table2, and when the condition is met (in this case, id from Table1 [A.id] equals id from Table2 [B.id]), the combined row is shown (that is, all the columns of the row from Table1 plus all the columns of the row from Table2).
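To make the CASE and INNER JOIN ideas concrete, here is a minimal sketch against the table structures listed in the question. It assumes that facturas.fac_cli_id refers to clientes.cli_id, which the question does not state explicitly, and it leaves products out because the posted structure shows no column linking a product to an invoice. The alias payment_status is made up for illustration.

```sql
-- One row per invoice, with the customer's details and a readable status.
-- Assumption: fac_cli_id in facturas points at cli_id in clientes.
SELECT c.cli_id,
       c.cli_nom,
       c.cli_city,
       c.cli_state,
       f.fac_num,
       f.fac_datesold,
       CASE WHEN f.fac_status = 0 THEN 'you paid'
            WHEN f.fac_status = 1 THEN 'payment Pending'
            ELSE 'unknown status'
       END AS payment_status
FROM facturas AS f
INNER JOIN clientes AS c ON c.cli_id = f.fac_cli_id
ORDER BY c.cli_state, c.cli_nom;
```

Adding the product code, name and quantity would need a further join through whatever table records which products belong to each invoice.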