SQL: Update table by mapping two columns to each other - mysql

I have the following two tables:
Table A
+------+-------+
| User | Value |
+------+-------+
| 3    | a     |
| 4    | b     |
| 5    | c     |
| 6    | d     |
+------+-------+
Table B
+------+-------+
| User | Value |
+------+-------+
| 1    |       |
| 4    |       |
| 5    |       |
| 9    |       |
+------+-------+
My job is to take each user from Table A (and their corresponding value), match it to the same user in Table B, and write that value there. So, from the example above, Table B should look like this after running the script:
Table B
+------+-------+
| User | Value |
+------+-------+
| 1    |       |
| 4    | b     |
| 5    | c     |
| 9    |       |
+------+-------+
My question: how can I construct an SQL query that does this efficiently, given that Table A contains 300,000+ entries and Table B contains 70,000 entries?
NOTES: In Table A, neither the User field nor the Value field is unique. In Table B, however, both User and Value are unique and should not appear more than once. Neither column is a primary key in either table.

This could work:
update table_b as b
inner join table_a as a on a.User = b.User
set b.value = a.value

If a user appears more than once in table_a, the value written by the query above is not predictable. In real-world situations you would more likely want a deterministic value, such as the greatest value for each user. In that case you would want:
update table_b as b
inner join (
select user, max(value) as value from table_a
group by user ) as a_max on a_max.user = b.user
set b.value = a_max.value
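To check the "greatest value per user" logic without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. SQLite has no multi-table UPDATE ... JOIN, so the same idea is written with correlated subqueries: MAX() picks one value per user, and WHERE EXISTS restricts the update to users that appear in table_a (inner-join behaviour). The row (4, 'a') is a hypothetical duplicate added to show that duplicates no longer make the result arbitrary.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (user INTEGER, value TEXT);
CREATE TABLE table_b (user INTEGER, value TEXT);
-- Question data plus a hypothetical duplicate (4, 'a') to exercise MAX()
INSERT INTO table_a VALUES (3, 'a'), (4, 'b'), (4, 'a'), (5, 'c'), (6, 'd');
INSERT INTO table_b (user) VALUES (1), (4), (5), (9);
-- Index on the join column of table_a
CREATE INDEX idx_table_a_user ON table_a (user);
""")

# Greatest value per user; WHERE EXISTS leaves users without a match untouched.
conn.execute("""
UPDATE table_b
SET value = (SELECT MAX(a.value) FROM table_a AS a WHERE a.user = table_b.user)
WHERE EXISTS (SELECT 1 FROM table_a AS a WHERE a.user = table_b.user)
""")
conn.commit()

rows = conn.execute("SELECT user, value FROM table_b ORDER BY user").fetchall()
print(rows)  # [(1, None), (4, 'b'), (5, 'c'), (9, None)]
```

The result matches the expected Table B from the question; the MySQL derived-table version above expresses the same per-user aggregation as a join.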

Your question is unclear about what to do with values that are already in b. If you use a left join, rows of b that have no match in a will explicitly be set to NULL:
update table_b b left join
table_a a
on a.User = b.User
set b.value = a.value;
If you want to keep the existing values for non-matches, use an inner join instead.
Note that this might be inefficient, but it should be OK if an index exists on table_a(User).
If table_a had very few distinct users and lots of duplicates, you might want to aggregate it before doing the join.
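The difference between the left-join and inner-join variants can be sketched in SQLite (again a stand-in, since it lacks UPDATE ... JOIN). Assigning a correlated subquery to every row behaves like the left join: a row with no match gets NULL. Adding WHERE EXISTS behaves like the inner join: non-matching rows keep their value. The pre-filled value 'old' for user 1 is hypothetical, chosen so the difference is visible.

```python
import sqlite3


def update_table_b(keep_existing: bool):
    """Run the update and return table_b's rows (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE table_a (user INTEGER, value TEXT);
    CREATE TABLE table_b (user INTEGER, value TEXT);
    INSERT INTO table_a VALUES (4, 'b'), (5, 'c');
    INSERT INTO table_b VALUES (1, 'old'), (4, NULL), (5, NULL);
    """)
    # Without a WHERE clause every row is assigned; unmatched rows get NULL
    # (left-join semantics).
    sql = ("UPDATE table_b SET value = "
           "(SELECT MAX(a.value) FROM table_a AS a WHERE a.user = table_b.user)")
    if keep_existing:
        # Only touch rows that have a match (inner-join semantics).
        sql += " WHERE EXISTS (SELECT 1 FROM table_a AS a WHERE a.user = table_b.user)"
    conn.execute(sql)
    rows = conn.execute("SELECT user, value FROM table_b ORDER BY user").fetchall()
    conn.close()
    return rows


print(update_table_b(keep_existing=False))  # [(1, None), (4, 'b'), (5, 'c')]
print(update_table_b(keep_existing=True))   # [(1, 'old'), (4, 'b'), (5, 'c')]
```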

Related

Does MySQL automatically use the coalesce function during a join between tables?

During a table join, when does MySQL use this function?
The single result column that replaces two common columns is defined
using the coalesce operation. That is, for two t1.a and t2.a the
resulting single join column a is defined as a = COALESCE(t1.a, t2.a),
where:
COALESCE(x, y) = (CASE WHEN x IS NOT NULL THEN x ELSE y END)
https://dev.mysql.com/doc/refman/8.0/en/join.html
I know what the function does, but I want to know when it is used during the join operation. This just makes no sense to me! Can someone show me an example?
That passage refers to redundant column elimination in NATURAL joins and joins with USING: it describes which columns appear in the result and how the duplicates are removed.
The column order is described just above the section you quoted:
First, the coalesced common columns of the two joined tables, in the order in which they occur in the first table
Second, columns unique to the first table, in the order in which they occur in that table
Third, columns unique to the second table, in the order in which they occur in that table
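The three ordering rules can be modeled in a few lines of Python. This is only an illustration of the documented rule, not MySQL code; the function name join_result_columns is hypothetical, and `using=None` stands for NATURAL JOIN, where every column name present in both tables counts as common.

```python
def join_result_columns(t1_cols, t2_cols, using=None):
    """Model the SELECT * column order for NATURAL JOIN / JOIN ... USING."""
    if using is None:
        # NATURAL JOIN: every column name that occurs in both tables
        common = [c for c in t1_cols if c in t2_cols]
    else:
        # USING: only the listed columns, in first-table order
        common = [c for c in t1_cols if c in using]
    first_only = [c for c in t1_cols if c not in common]
    second_only = [c for c in t2_cols if c not in common]
    return common + first_only + second_only


print(join_result_columns(["a", "b", "c"], ["a", "b", "d"], using=["b"]))
# ['b', 'a', 'c', 'a', 'd']
print(join_result_columns(["a", "b", "c"], ["a", "b", "d"]))
# ['a', 'b', 'c', 'd']
```

Both outputs agree with the t1/t2 example in the answer.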
Example
t1
| a | b | c |
| 1 | 1 | 1 |
t2
| a | b | d |
| 1 | 1 | 1 |
A join with USING:
SELECT * FROM t1 JOIN t2 USING (b);
returns b coalesced into a single column (because of USING), followed by the columns unique to the first table, then those unique to the second table. Because a is not listed in USING, it is not coalesced and appears twice.
| b | a | c | a | d |
| 1 | 1 | 1 | 1 | 1 |
Whereas a natural join:
SELECT * FROM t1 NATURAL JOIN t2;
coalesces all common columns of both tables (a and b), followed by the column unique to the first table (c), then the one unique to the second table (d).
| a | b | c | d |
| 1 | 1 | 1 | 1 |
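The COALESCE part only changes anything when one side of the join can be NULL, that is, in outer joins. The sketch below is a pure-Python model (hypothetical helper names, hypothetical data, NULL represented as None, and no NULLs in the inputs) of the single column a in `SELECT * FROM t1 RIGHT JOIN t2 USING (a)`: for a t2 row with no partner in t1, t1.a is NULL-padded, and COALESCE(t1.a, t2.a) falls through to t2.a instead of showing NULL.

```python
def coalesce(x, y):
    # COALESCE(x, y) = CASE WHEN x IS NOT NULL THEN x ELSE y END
    return x if x is not None else y


def right_join_using_a(t1_a, t2_a):
    """Return (t1.a, t2.a, coalesced a) for each row of t1 RIGHT JOIN t2 USING (a)."""
    rows = []
    for a2 in t2_a:
        matches = [a1 for a1 in t1_a if a1 == a2]
        if not matches:
            # No partner in t1: the t1 side is padded with NULL (None here)
            matches = [None]
        for a1 in matches:
            rows.append((a1, a2, coalesce(a1, a2)))
    return rows


print(right_join_using_a([1], [1, 2]))
# [(1, 1, 1), (None, 2, 2)]
```

In the second row t1.a is NULL, yet the coalesced column still shows 2, which is why MySQL defines the merged column with COALESCE rather than simply taking the first table's column.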

Efficient way to get DISTINCT rows of Table A when JOINing with Table B

Simple problem. Given example tables:
Table A:
id | type
---+-----
1 | A
2 | B
3 | C
Table B:
id | a_id | type
---+------+-----
1 | 1 | X
2 | 2 | Y
3 | 1 | X
4 | 3 | Z
(there are additional columns, which I omitted, in order to clarify the problem)
The query:
SELECT a.*
FROM a a
INNER JOIN b b ON b.a_id = a.id
WHERE b.type = 'X'
Result:
id | type
---+-----
1 | A
1 | A
SQL Fiddle: http://sqlfiddle.com/#!2/e6138f/1
But I only want distinct rows of Table A. I know I could use SELECT DISTINCT a.*, but our Table A has about 40 columns, and this SELECT can return 100-10000 rows. Isn't that extremely slow if the database has to compare every column?
Or is MySQL intelligent enough to focus on just the primary key for the DISTINCT operation?
Thanks in advance :)
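Use EXISTS instead of an explicit join. EXISTS only checks whether at least one matching row exists in b, so each row of a is returned at most once and no DISTINCT is needed:
select a.*
from a
where exists (select 1 from b where b.a_id = a.id and b.type = 'X');
For performance, create an index on b(a_id, type).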
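A quick way to see the difference is to run both queries on the example data. This sketch uses Python's sqlite3 module as a stand-in for MySQL; the index name idx_b_a_id_type is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, type TEXT);
INSERT INTO a VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO b VALUES (1, 1, 'X'), (2, 2, 'Y'), (3, 1, 'X'), (4, 3, 'Z');
-- Composite index supporting the EXISTS lookup
CREATE INDEX idx_b_a_id_type ON b (a_id, type);
""")

# The join repeats a row of a once per matching row of b.
join_rows = conn.execute(
    "SELECT a.* FROM a INNER JOIN b ON b.a_id = a.id WHERE b.type = 'X'"
).fetchall()

# EXISTS is a semi-join: each row of a appears at most once.
exists_rows = conn.execute(
    "SELECT a.* FROM a WHERE EXISTS "
    "(SELECT 1 FROM b WHERE b.a_id = a.id AND b.type = 'X')"
).fetchall()

print(join_rows)    # [(1, 'A'), (1, 'A')]
print(exists_rows)  # [(1, 'A')]
```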

SQL Delete on inner join on MISSING data

My question is almost identical to SQL DELETE with INNER JOIN, but I want to delete on non-equal!
MY PROBLEM IN BRIEF:
There are two tables, bus_stops and bus_routes:
bus_routes {id, bus_route_id, ..other columns..}
bus_stops {id, bus_route_id, ..other columns..}
Some routes have been deleted, but their bus stops remain, and I need to delete those too. That is, I need to delete only the bus_stops that have NO associated bus route!
Something like:
DELETE bs.* FROM bus_stops AS bs
INNER JOIN bus_routes AS br
ON bs.bus_route_id <> br.bus_route_id
But the above code will definitely not work.
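Use a LEFT JOIN instead. Every bus stop is kept, stops whose route no longer exists get NULL for the br columns, and the WHERE clause selects exactly those. This query will work: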
DELETE bs.*
FROM bus_stops AS bs
LEFT JOIN bus_routes AS br
ON bs.bus_route_id = br.bus_route_id
WHERE br.bus_route_id IS NULL
Conceptually, a join in SQL starts from the Cartesian product of both tables: every record of table A is combined with every record of table B. The join condition then eliminates the combinations that do not match it.
If you use an INNER JOIN with not equal (<>), every record will be deleted as soon as the other table contains at least two distinct values, because each record then differs from at least one of them. A small example:
Table A      Table C
| B |        | D |
=====        =====
| 1 |        | 1 |
| 2 |        | 2 |
The Cartesian product of A X C is:
| B | D
==========
| 1 | 1
| 1 | 2
| 2 | 1
| 2 | 2
If you now use B <> D to select the values, the result will be:
| B | D
==========
| 1 | 2
| 2 | 1
This would delete both records.
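The same reasoning can be traced in plain Python: build the Cartesian product of the two single-column tables from the example, keep the pairs that satisfy B <> D, and collect which rows of C take part in a surviving pair (those are the rows an INNER JOIN ... ON B <> D delete would remove).

```python
from itertools import product

table_a_b = [1, 2]  # column B of table A
table_c_d = [1, 2]  # column D of table C

# Every row of A combined with every row of C
cartesian = list(product(table_a_b, table_c_d))

# The join condition B <> D keeps only the pairs whose values differ
not_equal_pairs = [(b, d) for (b, d) in cartesian if b != d]

# Each D value occurs in at least one surviving pair, so every row of C matches
deleted_d = sorted({d for (_, d) in not_equal_pairs})

print(cartesian)        # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(not_equal_pairs)  # [(1, 2), (2, 1)]
print(deleted_d)        # [1, 2]
```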
As a solution try an outer join or a subquery.
Example (subquery):
DELETE FROM C WHERE NOT EXISTS (SELECT * FROM A WHERE A.B = C.D)
Example (outer join; MySQL's multi-table DELETE must name the table to delete from before FROM):
DELETE C FROM C LEFT JOIN A ON C.D = A.B WHERE A.B IS NULL
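Applied to the bus tables, the anti-join can be tried out in SQLite through Python's sqlite3 module (a stand-in for MySQL). SQLite has no multi-table DELETE, so the sketch previews the orphans with the LEFT JOIN ... IS NULL query and deletes them with the NOT EXISTS form. The route and stop ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bus_routes (id INTEGER PRIMARY KEY, bus_route_id INTEGER);
CREATE TABLE bus_stops (id INTEGER PRIMARY KEY, bus_route_id INTEGER);
INSERT INTO bus_routes VALUES (1, 10), (2, 20);
-- Stops 3 and 4 belong to route 30, which no longer exists
INSERT INTO bus_stops VALUES (1, 10), (2, 20), (3, 30), (4, 30);
""")

# Anti-join preview: stops whose LEFT JOIN partner is missing (NULL)
orphans = conn.execute("""
SELECT bs.id FROM bus_stops AS bs
LEFT JOIN bus_routes AS br ON bs.bus_route_id = br.bus_route_id
WHERE br.bus_route_id IS NULL
ORDER BY bs.id
""").fetchall()

# Delete the same rows with NOT EXISTS
conn.execute("""
DELETE FROM bus_stops
WHERE NOT EXISTS (
    SELECT 1 FROM bus_routes AS br
    WHERE br.bus_route_id = bus_stops.bus_route_id
)
""")
remaining = conn.execute("SELECT id FROM bus_stops ORDER BY id").fetchall()

print(orphans)    # [(3,), (4,)]
print(remaining)  # [(1,), (2,)]
```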

Update table using certain values from another table

I'm faced with a problem where I need to update one table based on values stored in another. However, the second table contains rows which are not relevant to the query. For example:
Table1
id | active
------------
1 | Yes
2 | Yes
3 | Yes
4 | Yes
Table2
id | type | value
--------------------
1 | date | 2011
1 | name | Glen
2 | date | 2012
2 | name | Mike
I want to read the values of type 'date' and skip name, and update table1 in the process.
I've put together the following:
UPDATE table1 a, tabel2 b
SET a.active='no'
WHERE a.id = b.id
AND b.type='date'
AND b.value='2011'
This doesn't seem to work well at all.
Any help would be great.
id is the key which joins the tables.
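The multi-table UPDATE syntax itself is valid MySQL once the misspelled table name tabel2 is corrected to table2:
UPDATE table1 a, table2 b
SET a.active='no'
WHERE a.id = b.id
AND b.type='date'
AND b.value='2011'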
Try this:
UPDATE table1
SET active = 'no'
WHERE id
IN (
SELECT id FROM table2 WHERE type = 'date' AND value = '2011'
)
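The IN-subquery form is plain SQL, so it can be checked directly against the example data with Python's sqlite3 module (used here only as a stand-in for MySQL). Only id 1 has a 'date' row with value '2011', so it is the only row set to 'no'; the 'name' rows are ignored by the type filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, active TEXT);
CREATE TABLE table2 (id INTEGER, type TEXT, value TEXT);
INSERT INTO table1 VALUES (1, 'Yes'), (2, 'Yes'), (3, 'Yes'), (4, 'Yes');
INSERT INTO table2 VALUES (1, 'date', '2011'), (1, 'name', 'Glen'),
                          (2, 'date', '2012'), (2, 'name', 'Mike');
""")

# Deactivate every id that has a 'date' row equal to '2011'
conn.execute("""
UPDATE table1
SET active = 'no'
WHERE id IN (SELECT id FROM table2 WHERE type = 'date' AND value = '2011')
""")

rows = conn.execute("SELECT id, active FROM table1 ORDER BY id").fetchall()
print(rows)  # [(1, 'no'), (2, 'Yes'), (3, 'Yes'), (4, 'Yes')]
```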
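This will also work with a natural join (id is the only common column). MySQL does not allow a subquery to select from the table being updated, so put the natural join in the UPDATE itself rather than in a subquery on table1:
UPDATE table1
NATURAL JOIN table2
SET active='no'
WHERE
type='date'
AND value='2011'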

Retrieving as much data as possible using two keys, one of which is corrupted

I'm trying to add a column to one table (some_data) from another (all_info). Originally, id_a and id_b together would be a sufficient foreign key, and I'd use them to uniquely identify the values in all_info to bring over to some_data. However, somewhere down the line some of the id_b values in some_data got corrupted. So I want to add a column to some_data that still uses the value from all_info when there is an exact match on id_a and id_b, or, if there is no exact match but all_info has only one entry for that id_a, assumes that entry is the one we want (and replaces the corrupted id_b in some_data).
So, given two tables,
some_data
id_a | id_b
------------
1 | a
2 | b
3 | c
4 | d
all_info
id_a | id_b | val
--------------------
1 | a | v_i
2 | c | v_x
2 | b | v_ii
3 | d | v_iv
3 | e | v_v
4 | f | v_vi
I'd like to obtain:
id_a | id_b | val
------------------
1 | a | v_i
2 | b | v_ii
3 | c | NULL
4 | f | v_vi
So far I've thought of two approaches, the more rudimentary of which is:
SELECT sd.*, ai.val
FROM some_data sd
LEFT OUTER JOIN all_info ai
ON sd.id_a = ai.id_a
AND (sd.id_b = ai.id_b OR COUNT(*) = 1)
Of course that itself wouldn't work (and it also doesn't accomplish my secondary goal of replacing the bad id_b values), but after trying various GROUP BYs and SELECTs with the COUNT() function, I couldn't find anything SQL would accept. I also tried populating the column with SET commands but again couldn't find a way to make it work.
As a side note, looking at the data it seems as if all_info has AT MOST one row that matches some_data on both id_a and id_b. Also, when id_a and id_b do match it is safe to assume that the match is correct, given the complexity of id_b.
Your select would be something like this (the fallback join also supplies the corrected id_b):
SELECT sd.id_a,
CASE WHEN ai.id_a IS NULL THEN COALESCE(ai2.id_b, sd.id_b) ELSE sd.id_b END as id_b,
CASE WHEN ai.id_a IS NULL THEN ai2.val ELSE ai.val END as val
FROM some_data sd
LEFT JOIN all_info ai
ON sd.id_a = ai.id_a AND sd.id_b = ai.id_b
LEFT JOIN
(SELECT id_a, MIN(id_b) id_b, MIN(val) val
FROM all_info
GROUP BY id_a
HAVING COUNT(*) = 1
) ai2 ON sd.id_a = ai2.id_a
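To confirm that the two-join approach produces the desired output, here is a sketch that runs a corrected version of the SELECT (consistent aliases sd/ai/ai2, and the fallback join on ai2.id_a) against the example data in SQLite via Python's sqlite3 module, used as a stand-in for MySQL. The first LEFT JOIN finds exact matches; the second joins only the id_a values that occur exactly once in all_info, since HAVING COUNT(*) = 1 discards the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_data (id_a INTEGER, id_b TEXT);
CREATE TABLE all_info (id_a INTEGER, id_b TEXT, val TEXT);
INSERT INTO some_data VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO all_info VALUES (1, 'a', 'v_i'), (2, 'c', 'v_x'), (2, 'b', 'v_ii'),
                            (3, 'd', 'v_iv'), (3, 'e', 'v_v'), (4, 'f', 'v_vi');
""")

rows = conn.execute("""
SELECT sd.id_a,
       -- exact match wins; otherwise fall back to the single entry, if any
       CASE WHEN ai.id_a IS NULL THEN COALESCE(ai2.id_b, sd.id_b) ELSE sd.id_b END AS id_b,
       CASE WHEN ai.id_a IS NULL THEN ai2.val ELSE ai.val END AS val
FROM some_data sd
LEFT JOIN all_info ai
  ON sd.id_a = ai.id_a AND sd.id_b = ai.id_b
LEFT JOIN (
    SELECT id_a, MIN(id_b) AS id_b, MIN(val) AS val
    FROM all_info
    GROUP BY id_a
    HAVING COUNT(*) = 1
) ai2 ON sd.id_a = ai2.id_a
ORDER BY sd.id_a
""").fetchall()

print(rows)
# [(1, 'a', 'v_i'), (2, 'b', 'v_ii'), (3, 'c', None), (4, 'f', 'v_vi')]
```

The output matches the table the question asks for: id_a 3 has two candidates and no exact match, so it stays NULL.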
You could run this to fix your data:
UPDATE some_data sd
JOIN
(SELECT id_a, MIN(id_b) id_b
FROM all_info
GROUP BY id_a
HAVING COUNT(*) = 1
) ai ON sd.id_a = ai.id_a AND sd.id_b <> ai.id_b
SET sd.id_b = ai.id_b
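The repair can be rehearsed on a copy of the data before touching the real table. This sketch uses SQLite through Python's sqlite3 module as a stand-in for MySQL; because SQLite has no UPDATE ... JOIN, the join to the single-entry subquery is rewritten with correlated subqueries. It only rewrites id_b when all_info holds exactly one row for that id_a and the stored id_b differs from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_data (id_a INTEGER, id_b TEXT);
CREATE TABLE all_info (id_a INTEGER, id_b TEXT, val TEXT);
INSERT INTO some_data VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO all_info VALUES (1, 'a', 'v_i'), (2, 'c', 'v_x'), (2, 'b', 'v_ii'),
                            (3, 'd', 'v_iv'), (3, 'e', 'v_v'), (4, 'f', 'v_vi');
""")

# Correlated-subquery equivalent of the MySQL UPDATE ... JOIN above
conn.execute("""
UPDATE some_data
SET id_b = (SELECT MIN(ai.id_b) FROM all_info ai WHERE ai.id_a = some_data.id_a)
WHERE id_a IN (SELECT id_a FROM all_info GROUP BY id_a HAVING COUNT(*) = 1)
  AND id_b <> (SELECT MIN(ai.id_b) FROM all_info ai WHERE ai.id_a = some_data.id_a)
""")
conn.commit()

rows = conn.execute("SELECT id_a, id_b FROM some_data ORDER BY id_a").fetchall()
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'f')]
```

Only the corrupted row for id_a 4 changes; id_a 2 and 3 have several entries in all_info and are left alone.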