I'm trying to compare two tables in different databases (or rather, looking for the best way to do this).
Table in database one:
id int(11)
lastmod int(11)
Table in database two:
id int(11)
timestamp int(11)
Both tables have matching ids and timestamps, though their other columns differ. The id is not unique in db1: one row in db2 corresponds to many rows in db1. Over time, records in database two will be updated (the data in one unimportant column changes). I need to compare timestamps, matching on id, to find which records I need to update in database one.
Performance is also a concern, because both tables have more than 5,000,000 records.
What is the best (most efficient) way to find the records that need to be updated?
Assuming that id is a primary key in both tables, then the following should be efficient:
select *
from db1.table t1 join
db2.table t2
on t1.id = t2.id and
t1.lastmod <> t2.timestamp
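Note that this assumes two things: first, that the id is unique in each table, and second, that the timestamp columns are not NULL. A <> comparison against NULL never evaluates to true, so such rows would be silently skipped.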
EDIT:
If the situation is that you have multiple modifications in t1 and are trying to compare the results to t2, which has only one row, then aggregate t1 first to get the most recent modification date and proceed from there:
select *
from (select t1.id, max(t1.lastmod) as lastmod
from db1.table t1
group by t1.id
) t1 join
db2.table t2
on t1.id = t2.id and
t1.lastmod <> t2.timestamp
If you are really looking for a record with more than one modification in t1, then add a having count(*) > 1 to the subquery.
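To see how the aggregated join behaves on the one-to-many case, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here, and the table names db1_table and db2_table, the index name, and the sample rows are all made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the two MySQL databases (an assumption
# made so the example is runnable); table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db1_table (id INTEGER, lastmod INTEGER);
    CREATE TABLE db2_table (id INTEGER PRIMARY KEY, timestamp INTEGER);
    -- A composite index lets MAX(lastmod) per id be read from the index.
    CREATE INDEX idx_db1_id_lastmod ON db1_table (id, lastmod);
    INSERT INTO db1_table VALUES (1, 100), (1, 150), (2, 200), (3, 300), (3, 310);
    INSERT INTO db2_table VALUES (1, 150), (2, 250), (3, 310);
""")

# Collapse db1 to one row per id (latest lastmod), then keep only the ids
# whose latest lastmod differs from the timestamp stored in db2.
stale_rows = conn.execute("""
    SELECT t1.id, t1.lastmod, t2.timestamp
    FROM (SELECT id, MAX(lastmod) AS lastmod
          FROM db1_table
          GROUP BY id) AS t1
    JOIN db2_table AS t2
      ON t1.id = t2.id
     AND t1.lastmod <> t2.timestamp
    ORDER BY t1.id
""").fetchall()

print(stale_rows)
```

With this sample data only id 2 comes back, because ids 1 and 3 already have a db1 row whose latest lastmod equals the db2 timestamp. On tables with millions of rows, an index on (id, lastmod) in db1 is probably worth testing, though the actual gain depends on the MySQL version and data distribution.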
Related
I'm currently in the process of converting data, in a table, which is why I've created a new table, identical to the old one, but empty.
I've run my data converter, and I have a difference in row count.
How do I select all rows that are different from the two tables, leaving out the primary key identifier (that differs on every entry).
select * from (
SELECT 'Table1',t1.* FROM table1 t1 WHERE
(t1.id)
NOT IN (SELECT t2.id FROM table2 t2)
UNION ALL
SELECT 'Table2',t2.* FROM table2 t2 WHERE
(t2.id)
NOT IN (SELECT t1.id FROM table1 t1))temp order by id;
You can add more columns to the WHERE clauses to compare on more fields.
Try and see if this helps.
I have two MySQL tables that have the exact same structure and mostly the same data. Some of the rows would be different between the two because my client updated the old website instead of the new website. There are hundreds of records and a column is not in place for the last modified date. I have created a new database on localhost and imported the old and new tables. All of the rows of data will need to be compared and differences between the old and new databases will need to be returned. Once the differences are identified, would there be a way to easily migrate the updated data from the old table to the new table? I am a MySQL novice, but I can usually muddle my way through issues. Thanks in advance for your assistance.
I have been looking at the following code, but I am not sure if it is the best answer.
SELECT *,'table_1' AS o FROM table_1
UNION
SELECT *,'table_2' AS o FROM table_2
WHERE some_id IN (
SELECT some_id
FROM (
SELECT * FROM table_1
UNION
SELECT * FROM table_2
) AS x
GROUP BY some_id
HAVING COUNT(*) > 1
)
ORDER BY some_id, o;
This should do the trick. The subselect used in the WHERE clause finds the primary keys of all rows where every value is the same across both tables. You then exclude rows with those primary keys from the unioned result set. How you go about reconciling the differences is a totally different story :)
SELECT * FROM (
SELECT *, 'table 1' FROM table_1
UNION ALL
SELECT *, 'table 2' FROM table_2
) AS combined
WHERE combined.primary_key_field
NOT IN (
SELECT t1.primary_key_field
FROM table_1 AS t1
INNER JOIN table_2 AS t2
ON t1.primary_key_field = t2.primary_key_field
AND t1.some_other_field = t2.some_other_field
AND ... /* join on all fields in tables */
)
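For a quick sanity check of that query shape, the sketch below runs it on two tiny tables using Python's sqlite3 module as a stand-in for MySQL. The two-column layout, the source alias, and the sample values are assumptions made for illustration; with real tables the join would need one condition per column.

```python
import sqlite3

# SQLite in memory is used only so the example runs anywhere; the
# two-column schema and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (primary_key_field INTEGER PRIMARY KEY, some_other_field TEXT);
    CREATE TABLE table_2 (primary_key_field INTEGER PRIMARY KEY, some_other_field TEXT);
    INSERT INTO table_1 VALUES (1, 'same'), (2, 'old value'), (3, 'only in 1');
    INSERT INTO table_2 VALUES (1, 'same'), (2, 'new value');
""")

# Stack both tables, then drop every key whose row is identical in both.
diff_rows = conn.execute("""
    SELECT * FROM (
        SELECT *, 'table 1' AS source FROM table_1
        UNION ALL
        SELECT *, 'table 2' AS source FROM table_2
    ) AS combined
    WHERE combined.primary_key_field NOT IN (
        SELECT t1.primary_key_field
        FROM table_1 AS t1
        INNER JOIN table_2 AS t2
            ON t1.primary_key_field = t2.primary_key_field
           AND t1.some_other_field = t2.some_other_field
    )
    ORDER BY combined.primary_key_field, source
""").fetchall()

for row in diff_rows:
    print(row)
```

Key 1 is identical in both tables and is filtered out. Key 2 appears twice (once per table, so the two versions can be compared side by side), and key 3 appears once because it exists only in table_1. Note that an equality join treats NULL as unequal to NULL, so rows with NULLs in compared columns would be reported as different.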
A single INSERT INTO ... SELECT query will do.
insert into table_new
select * from table_old
where some_id NOT IN (select some_id from table_new)
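It is worth checking what this statement does and does not copy. The sketch below is a minimal, assumed setup (Python's sqlite3 in place of MySQL, a made-up name column, invented sample rows) that runs the same INSERT ... SELECT and then inspects the result.

```python
import sqlite3

# Hypothetical two-column tables; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_old (some_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_new (some_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_old VALUES (1, 'a'), (2, 'b-edited'), (3, 'c');
    INSERT INTO table_new VALUES (1, 'a'), (2, 'b');
""")

# Copy over only the rows whose key is missing from table_new.
conn.execute("""
    INSERT INTO table_new
    SELECT * FROM table_old
    WHERE some_id NOT IN (SELECT some_id FROM table_new)
""")

new_rows = conn.execute(
    "SELECT some_id, name FROM table_new ORDER BY some_id"
).fetchall()
print(new_rows)
```

Row 3 is copied because its key was missing, but row 2 keeps its table_new value even though table_old holds an edited version. So this approach handles rows that exist only in the old table; rows that exist in both but differ would still need a separate UPDATE.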
In MySQL, I have a record that references the id of another record. For example:
Table 1
id bigint
table2ref bigint
Table 2
id bigint
Where table2ref simply references Table2.id.
Is there a way to list all records in table 1 that reference a record in table 2 where that record doesn't exist?
If you want the data from table2 as well, use a LEFT JOIN as in dognose's answer. If you only want the data from table1, use a subquery, like this:
SELECT * FROM table1 WHERE table2ref NOT IN (
SELECT id FROM table2
)
Essentially, this reads "get everything from table1, then subtract all rows whose table2ref matches an id in table2."
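One thing to watch with NOT IN is NULL handling: if the subquery ever returns a NULL, the NOT IN test can never be true and the query returns nothing. The sketch below illustrates this with Python's sqlite3 as a stand-in for MySQL, and shows NOT EXISTS as an alternative formulation (my substitution, not part of the answer above). The nullable table2.id is a deliberate assumption for the demo; a real primary key column would not contain NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, table2ref INTEGER);
    CREATE TABLE table2 (id INTEGER);  -- deliberately nullable for the demo
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 99);
    INSERT INTO table2 VALUES (10), (20);
""")

NOT_IN = """
    SELECT id FROM table1
    WHERE table2ref NOT IN (SELECT id FROM table2)
    ORDER BY id
"""
# NOT EXISTS variant; the IS NOT NULL filter mirrors NOT IN, which also
# skips rows whose table2ref is NULL.
NOT_EXISTS = """
    SELECT t1.id FROM table1 AS t1
    WHERE t1.table2ref IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM table2 AS t2 WHERE t2.id = t1.table2ref)
    ORDER BY t1.id
"""

def ids(query):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(query)]

orphans_not_in = ids(NOT_IN)
orphans_not_exists = ids(NOT_EXISTS)

# Add a NULL to the referenced column and run both queries again.
conn.execute("INSERT INTO table2 VALUES (NULL)")
orphans_not_in_with_null = ids(NOT_IN)
orphans_not_exists_with_null = ids(NOT_EXISTS)

print(orphans_not_in, orphans_not_exists)
print(orphans_not_in_with_null, orphans_not_exists_with_null)
```

Both forms find the dangling reference in row 3 at first. Once table2 contains a NULL id, the NOT IN version returns an empty list while NOT EXISTS still reports row 3.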
You are looking for a LEFT JOIN - everything where the entry in table 2 is not existing will have a null for the table2.id after joining:
SELECT
table1.id, table1.table2ref, table2.id
FROM
table1
LEFT JOIN
table2
ON
table1.table2ref = table2.id
WHERE
ISNULL(table2.id) -- only those records with missing reference.
See also: http://giannopoulos.net/wp-content/uploads/2013/05/BHVicYICMAAdHGv.jpg
(first column, second row)
I have:
simple_table
|- first_id
|- second_id
SELECT * FROM table t1 JOIN table t2
ON [many many conditions]
ON t1.id IN (SELECT first_id FROM simple_table)
AND t2 = (
SELECT second_id FROM simple_table WHERE t1.id = first_id //4th row, can return NULL
)
Questions:
How to handle situation where 4th row return null?
Can I use t1 & t2 alias inside subqueries?
Updated [extra explanation]
I have a very big table. I need to iterate through the table and check some conditions. simple_table provides the ids of the table entities whose conditions I should check. I mean:
simple_table
first_id second_id
11 128
table
id <other_fields>
................
11 <other_data>
...............
128 <other_data>
So, I should check whether those two entities in table satisfy the right conditions relative to one another.
The question is unclear, but given the update the query should work better if there is an index on the ID of the big table (probably it's there already as the PK).
As the condition seems to be on the same table the easiest query will be
SELECT ...
FROM bigtable t1
INNER JOIN simple_table st ON t1.ID IN (st.first_id, st.second_id)
or
SELECT ...
FROM bigtable t1
INNER JOIN simple_table st ON t1.ID = st.first_id
INNER JOIN bigtable t2 ON st.second_id = t2.ID
to get the two rows from bigtable on the same row of the result.
The second query will make the checks easier to write; the first will be faster but will most probably need a GROUP BY to return the wanted results.
Some performance tests on the OP's machine are needed to find the fastest one.
If one of the IDs in simple_table is NULL, the first query will only consider the other one, and the code will have to check for that. In the second query, an inner join on a NULL second_id drops the pair entirely.
You can use the alias of the tables in the subqueries, and you'll need to do that as you'll probably have the same table in the subqueries.
The relative conditions to check are still undisclosed by the OP, so that's all I can help with.
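As a rough illustration of the two-join form and the NULL case, here is a small sketch using Python's sqlite3 in place of MySQL. The val column and the sample rows are invented, and switching the second join to a LEFT JOIN is my own variation, offered as one possible way to keep pairs whose second_id is NULL.

```python
import sqlite3

# Hypothetical schema: bigtable has one extra column, val, to display.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bigtable (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE simple_table (first_id INTEGER, second_id INTEGER);
    INSERT INTO bigtable VALUES (11, 5), (128, 7), (200, 9);
    INSERT INTO simple_table VALUES (11, 128), (200, NULL);
""")

# One result row per pair. The LEFT JOIN keeps pairs whose second_id is
# NULL (or points at a missing row) and fills the t2 columns with NULL.
pairs = conn.execute("""
    SELECT t1.id, t1.val, t2.id, t2.val
    FROM simple_table AS st
    INNER JOIN bigtable AS t1 ON t1.id = st.first_id
    LEFT JOIN bigtable AS t2 ON t2.id = st.second_id
    ORDER BY t1.id
""").fetchall()

for pair in pairs:
    print(pair)
```

The pair (11, 128) comes back with both rows side by side, while the pair with a NULL second_id comes back with None in the t2 columns. Any relative condition can then be written against t1 and t2 in a WHERE clause, keeping in mind that comparisons against the NULL side will not evaluate to true.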
I am trying to find all the records that are in t1 but not in t2. I know there are more records in t1 than in t2 because when I run
select count(*)
from t1;
select count(*)
from t2;
I get 21,500 records and 21,000 records respectively. But the problem is these tables are not normalized, there are no primary keys, therefore I cannot do something like this:
SELECT id FROM t1
where t1.id
not in (
SELECT t2.id
FROM t2
where t2.id is not null);
or this
SELECT t1.id, t2.id
FROM t1
LEFT JOIN t2
ON t1.id = t2.id
where t2.id is null
as both return nothing: the id numbers match perfectly, and both tables seem to contain exactly the same set of ids. There must be another field that does not match.
UPDATE
I ended up doing this:
select id, count(id)
from t1
group by id;
select id, count(id)
from t2
group by id
Both queries returned the same number of ids (claim numbers), along with a count of how many times each one shows up. I copied and pasted the results into Excel, subtracted one count from the other, and used conditional formatting to show only the differences that are not zero. This gave me all the ids that showed up more often in one table than in the other. (A sloppy solution, but it resolved the issue.)
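The same per-id count comparison can probably be done in one query instead of a spreadsheet. The sketch below is one way to do it, run through Python's sqlite3 as a stand-in for MySQL, with made-up sample ids. The in_t1 and in_t2 marker columns are names introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2), (2), (3), (3), (3);
    INSERT INTO t2 VALUES (1), (2), (3), (3), (3);
""")

# Tag each row with the table it came from, stack both tables, and keep
# only the ids whose per-table counts differ.
count_diffs = conn.execute("""
    SELECT id, SUM(in_t1) AS t1_count, SUM(in_t2) AS t2_count
    FROM (
        SELECT id, 1 AS in_t1, 0 AS in_t2 FROM t1
        UNION ALL
        SELECT id, 0 AS in_t1, 1 AS in_t2 FROM t2
    ) AS both_tables
    GROUP BY id
    HAVING SUM(in_t1) <> SUM(in_t2)
    ORDER BY id
""").fetchall()

print(count_diffs)
```

Here id 2 is reported because it occurs twice in t1 but once in t2. The UNION ALL form also catches ids that exist in only one table, since the missing side simply sums to zero.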
You have two problems: a bad database design, and bogus data somehow being inserted into your tables.
I don't know if this will work without indexes.
A left outer join should get you started (look up the syntax).
You should end up with something like:
t1.id t2.id
1 1
2 2
3 null
4 4
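Try to fix the tables by adding a primary key on 'id' to both tables using MySQL (this will fail with a duplicate-entry error while duplicate ids are still present, so those have to be cleaned up first):
ALTER TABLE t1 ADD PRIMARY KEY (id);
ALTER TABLE t2 ADD PRIMARY KEY (id);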
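Before adding the key, it may help to list the ids that would block it, since a primary key column cannot hold duplicate or NULL values. The following sketch shows such a check using Python's sqlite3 as a stand-in for MySQL; the sample rows are invented, and the same SELECT statements should carry over to MySQL largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2), (2), (3), (NULL);
""")

# ids that occur more than once would violate a primary key.
duplicate_ids = conn.execute("""
    SELECT id, COUNT(*) AS occurrences
    FROM t1
    WHERE id IS NOT NULL
    GROUP BY id
    HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()

# Rows without an id also have to be fixed or removed first.
null_id_count = conn.execute(
    "SELECT COUNT(*) FROM t1 WHERE id IS NULL"
).fetchone()[0]

print(duplicate_ids, null_id_count)
```

Once both checks come back empty (no duplicates, zero NULL ids), the ALTER TABLE statements have a much better chance of succeeding.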