Compare data for two versions of the same MySQL database table - mysql

I have two MySQL tables that have the exact same structure and mostly the same data. Some of the rows would be different between the two because my client updated the old website instead of the new website. There are hundreds of records and a column is not in place for the last modified date. I have created a new database on localhost and imported the old and new tables. All of the rows of data will need to be compared and differences between the old and new databases will need to be returned. Once the differences are identified, would there be a way to easily migrate the updated data from the old table to the new table? I am a MySQL novice, but I can usually muddle my way through issues. Thanks in advance for your assistance.
I have been looking at the following code, but I am not sure if it is the best answer.
SELECT *,'table_1' AS o FROM table_1
UNION
SELECT *,'table_2' AS o FROM table_2
WHERE some_id IN (
SELECT some_id
FROM (
SELECT * FROM table_1
UNION
SELECT * FROM table_2
) AS x
GROUP BY some_id
HAVING COUNT(*) > 1
)
ORDER BY some_id, o;

This should do the trick. You are finding the primary keys for all rows where the every value is the same across both tables in the subselect used in the where clause. You then exclude rows with those primary keys from the unioned result set. Now how you go about reconciling the differences is a totally different story :)
SELECT * FROM (
SELECT *, 'table 1' FROM table_1
UNION ALL
SELECT *, 'table 2' FROM table_2
) AS combined
WHERE combined.primary_key_field
NOT IN (
SELECT t1.primary_key_field
FROM table_1 AS t1
INNER JOIN table_2 AS t2
ON t1.primary_key_field = t2.primary_key_field
AND t1.some_other_field = t2.some_other_field
AND ... /* join on all fields in tables */
)

A insert into select single query will do.
insert into table_new
select * from table_old
where some_id NOT IN (select some_id from table_new)

Related

Select the same columns from a list of tables

I am looking to run this query on a list of tables.
SELECT Description,Code,count(*) as count
FROM table1
group by Description,code
having count(*) > 1
I will have to run this query on 30+ different tables, I was wondering If I could change the from statement and just list off the table names.
In addition, is there some functionality that will add the name of the table that it came from in a seperate column to distinguish where the results came from?
Thanks in advance
You might use UNION ALL to put it together. Unless you need some dynamic table selection.
SELECT Description,Code,count(*) as count, 'table1' as tableNane
FROM table1
group by Description,code
having count(*) > 1
UNION ALL
SELECT Description,Code,count(*) as count, 'table2' as tableNane
FROM table2
group by Description,code
having count(*) > 1
...
Actualy I like #Shubhradeep Majumdar version. It will generate more concise code.
SELECT Description,Code, Count(Code), tableName FROM (
SELECT Description,Code, 'table1' as tableName
FROM table1
UNION ALL
SELECT Description,Code, 'table2' as tableName
FROM table2
) tables
GROUP BY tableName, Description, Code
HAVING COUNT(Code) > 1
But there might be a little catch to it. It is more elegant code, but it might actually be slower than first version. The problem is that tableName is appended at every record before grouping while in my first version you do that on already processed data.
Carrying over from #Marek's answer, You could first append all the tables to a table with union all.
select *, 'tab1' as tabnm from tab1
union all
select *, 'tab2' as tabnm from tab2
union all
select *, 'tab3' as tabnm from tab3
-- and so on...
And then use your code to process that final table.
will save you a great deal of time.
EDITED with a column specifying the table name

Compare two identical tables MySQL

I'm currently in the process of converting data, in a table, which is why I've created a new table, identical to the old one, but empty.
I've run my data converter, and I have a difference in row count.
How do I select all rows that are different from the two tables, leaving out the primary key identifier (that differs on every entry).
select * from (
SELECT 'Table1',t1.* FROM table1 t1 WHERE
(t1.id)
NOT IN (SELECT t2.id FROM table2 t2)
UNION ALL
SELECT 'Table2',t2.* FROM table2 t2 WHERE
(t2.id)
NOT IN (SELECT t1.id FROM table1 t1))temp order by id;
You can add more columns in where columns to check on more info.
Try and see if this helps.

SQL Join 2 tables with almost same field

I need to join two tables in SQL. There are no common fields. But the one table have a field with the value krin1001 and I need it to be joined with the row in the other table where the value is 1001.
The idea behind the joining is i have multiple customers, but in the one table there customer id is 'krin1001' 'krin1002' and so on, in this table is how much they have sold. In the other table there customer is is '1001' '1002' and so on, and in this table is there name and adress and so on. So it will always be the first 4 charakters i need to strip from the field before matching and joining. It might not always be 'krin' i need it to work with 'khjo1001' also, and it still needs to join on the '1001' value from the other table.
Is that possible?
Hope you can help me.
You need to use substring:
ON SUBSTRING(TableA.Field, 5, 4) = TableB.Field
Or Right:
ON RIGHT(TableA.Field, 4) = TableB.Field
You can also try to use CHARINDEX function for join operation. If value from table1 contains value from table2 row will be included in result set.
;WITH table1 AS(
SELECT 'krin1001' AS val
UNION ALL
SELECT 'xxx'
UNION ALL
SELECT 'xyz123'
),
table2 AS(
SELECT '1001' AS val
UNION ALL
SELECT '12345'
UNION ALL
SELECT '123'
)
SELECT * FROM table1 AS t
JOIN table2 AS T2 ON CHARINDEX(T2.val, T.val) > 0
Use it as:
SELECT
*
FROM table t1
INNER JOIN table t2 ON RIGHT(t1.col1, 4) = t2.col1;

Copy rows if value exists x amount of times

I have two tables Board1 and Board2 with the identical structure. They both have a primary index column of id. I have a THIRD table called Table1, which has a non-indexed column board_id, where the same board_id occurs multiple times. board_id always corresponds to an id in Board1. Board2 is currently empty, and I want to add rows from Board1, but only where the same board_id occurs at least six times in Table1. Table1 will be changing periodically, so I'll be needing to do the query in the future, but without doubling id rows which are already in Board2.
So to recap:
There are three tables: Board1, Board2, and Table1. I want to copy rows from Board1 to Board2, but only where the id in the Board1 occurs (at least) six times in Table1 as `board_id'.
I'd appreciate any help!
EDIT: I'm dreadfully sorry, but I realized I made a huge mistake in my question. I've rewritten it to reflect what I actually needed. I'm truly sorry.
You can do it like this
INSERT INTO Table2
SELECT
id,
board_id
FROM (SELECT
b.id,
b.board_id,
bl.Count
FROM board as b
LEFT JOIN (SELECT
board_id,
COUNT(board_id) as `Count`
FROM board
GROUP BY board_id) as bl
on bl.board_id = b.board_id
group by b.id
having bl.Count >= 6) as L
If you need more columns you can select them in inner and outer queries.
Fiddle Demo for Select
Here is what you asked for, with fiddle
INSERT Table2
SELECT
*
FROM
Table1
JOIN
(
SELECT
Board_Id,
count(*) cnt
FROM
Table1
GROUP BY
Board_Id
) BoardIds
ON BoardIds.Board_Id = Table1.Board_Id
WHERE
BoardIds.cnt > 5
AND
NOT EXISTS (SELECT id FROM Table2 WHERE Table2.id = Table1.id)
Try something like the below:
Add your column names where specified (excluding any ID columns), as I'm assuming each row will have a unique ID, so you won't be able to GROUP and COUNT by doing SELECT * FROM Table1
You may need to test / validate this
INSERT INTO Board2 (Your Column Names)
SELECT (Your Column Names)
FROM Board1
WHERE id (IN (SELECT board_id
FROM Table1
GROUP BY (board_id)
HAVING (COUNT(*) >= 6))
AND board_id NOT IN(SELECT DISTINCT board_id FROM Board2)

Insert into a table with a select statement in mysql

I have a table that has training history that has been modified by many different users over the years. This has cause the same training record to be entered twice. I want to create a table that replicates the main table and insert all duplicate records.
What constitutes a duplicate record is if the employee_id, course_code, and completion_date all match.
I can create the duplicate table and I have a select statement that appears to pull duplicates, but it pulls only one of them and I need it to pull both (or more) of them. This is because one person may have entered the training record with a different course name but the id, code, and date are the same so it is a duplicate entry. So by pulling all the duplicates I can validate that that is the case.
Here is my SELECT statement:
SELECT *
FROM
training_table p1
JOIN
training_table p2 ON (
p1.employee_id = p2.employee_id
AND p1.course_code = p2.course_code
AND p1.completion.date = p2.completion_date)
GROUP BY p1.ssn;
The query runs and returns what appear to be unique rows. I would like all of the duplicates. And whenever I try to INSERT it into an identical table I get an error stating my column count doesn't match my value count.
Any help would be great.
This will select any duplicate rows for insertion into your new table.
SELECT p1.*
FROM training_table p1
JOIN
(SELECT employee_id, course_code, completion_date
FROM training_table
GROUP BY employee_id, course_code, completion_date
HAVING COUNT(*) > 1
) dups
ON p1.employee_id = dups.employee_id
AND p1.course_code = dups.course_code
AND p1.completion_date = dups.completion_date
;
Try to use CROSS JOIN (Cartesian Product Join) instead JOIN only. For insert try INSERT INTO TABLE (column1, column2, column3) SELECT column1, column2, column3 FROM TABLE; in same order.
Thanks for the help. I had discovered the answer shortly after I posted the question (even though I had looked for the answer for over an hour :) ) Here is what I used:
SELECT *
FROM training_table mto
WHERE EXISTS
(
SELECT 1
FROM training_table mti
WHERE mti.employee_id = mto.employee_ie
AND mti.course_code = mto.course_code
AND mti.completion_date = mto.completion_date
LIMIT 1, 1
)
I just added the INSERT statement and it worked.
Thanks.