Merging databases - adding rows that do not exist in one db to another - MySQL

I have 2 databases from a WordPress website.
An issue happened and 50% of my posts disappeared.
I have a copy of database 1 from 03.03.21
and the existing database 2 of the website from 24.03.21.
So database 1 has many posts that were later deleted,
and database 2 has some new posts that do not exist in the older database 1.
Is there any software or a way to merge these 2 databases?
To compare the databases and add to the newer database the entries that exist only in the older one?
I could do this manually, but one post has entries in many tables, so it's going to be hard to recover the deleted posts.

There is no easy solution, but you could try to make a "merge" locally for testing purposes.
Here's how I would do it; I can't guarantee it will work.
1. Load the oldest backup into the server, let's say in a database named merge_target.
2. Load the 2nd backup (the most recent one) into the same server, let's say in a merge_source database.
3. Define a logical order in which to merge the tables; this depends on the presence of foreign keys:
If a table A has a foreign key referencing table B, then you will need to merge table B before table A (a query to list these dependencies is sketched below, after the example).
This may not work depending on your database structure (and I have never worked with WordPress myself).
4. Write and execute queries for each table, with some rules:
SELECT from the merge_source database
INSERT into the merge_target database
if a row already exists in merge_target (i.e. it has the same primary key or unique key), you can use MySQL features depending on what you want to do:
INSERT ... ON DUPLICATE KEY UPDATE if the existing row should be updated
INSERT IGNORE if the row should just be skipped
REPLACE if you really need to delete and re-insert the row
This could look like the following query (here with ON DUPLICATE KEY UPDATE, using the wp_posts table as an example):
INSERT INTO merge_target.wp_posts (col_a, col_b, col_c)
SELECT
src.col_a
, src.col_b
, src.col_c
FROM merge_source.wp_posts AS src
ON DUPLICATE KEY UPDATE
col_b = src.col_b
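To work out the merge order for step 3, you can list the foreign key dependencies from information_schema. A minimal sketch, assuming the recent backup was loaded into merge_source as above:
SELECT
  table_name
, referenced_table_name
FROM information_schema.key_column_usage
WHERE table_schema = 'merge_source'
  AND referenced_table_name IS NOT NULL
ORDER BY table_name;
Each row means "merge referenced_table_name before table_name". Note that stock WordPress tables generally declare no foreign keys, so this may return nothing; in that case any table order should work.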
Documentation:
INSERT ... SELECT
ON DUPLICATE KEY UPDATE
REPLACE
INSERT IGNORE is covered on the INSERT documentation page
Not sure it will help, but I wrote a database migration framework in PHP; you can take a look: Fregata.

Related

How to alter and update a large table to add composite key columns from another table

We have two very large tables in our MySQL (MariaDB) database. Table_1 holds a many-to-many map. It has an auto-incremented primary key and a composite key of two columns.
Table_2 refers to the primary key of Table_1. We want to fix this obvious error in design by:
Using a composite primary key on Table_1
Adding the two columns to Table_2
Populating the composite key in Table_2 by copying data from Table_1, and creating an index on it
Preferably deleting the auto-incremented key column from both tables
These tables have ~300M rows and are in the ~10GB size range. We need to make these updates within a ~6 hour service window.
I'm investigating how to do this efficiently and running trials on a replica DB. So far I have not tried to run anything with actual data, because ordinary scripts would be insufficient.
I'm not an experienced DB admin, so I need someone to shed some light on how to get this done.
My question is: what would be the best approach/tips to do this efficiently?
Things I have attempted so far:
I read about the new instant ADD COLUMN feature, but our production DB is on MariaDB version 10.0, which is older.
I followed the suggestions in this answer and ran the script below on a recent DB version with instant ADD COLUMN support (the ALTER TABLE was instant). The table had ~50M rows (1/6th of the original). It took about two hours, and that excludes creating the new indexes. Therefore this would not be sufficient.
SET join_buffer_size = 4 * 50 * 1024 * 1024; -- 50M keys of 4 bytes each
SET optimizer_switch='mrr=on,mrr_cost_based=off,mrr_sort_keys=on,optimize_join_buffer_size=on';
SET join_cache_level = 8;
UPDATE TABLE_2
JOIN TABLE_1 ON TABLE_1_Id = TABLE_2_FKT1_Id
SET
TABLE_2_KeyPart_1 = TABLE_1_KeyPart_1,
TABLE_2_KeyPart_2 = TABLE_1_KeyPart_2
I am also considering evaluating this tool:
https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html
Plan A: Use Percona's tool: pt-online-schema-change.
Plan B: Use a competing product: gh-ost.
Plan C: Don't use UPDATE, that is the killer. Instead, rebuild the table(s) in a straightforward way, then use RENAME TABLE to swap the new version into place.
Partitioning is unlikely to help in any way. Daniel's link helps with doing a lengthy UPDATE, but trades off time (it takes longer) versus invasiveness (which is not an issue because you have a maintenance window).
Some more details on Plan C (which I prefer for this case):
CREATE TABLE(s) ... -- with new names, and all the new features except secondary indexes
INSERT INTO new SELECT ... FROM old table(s)
RENAME TABLE real1 TO old1,
new1 TO real1,
real2 TO old2,
new2 TO real2;
test -- you can still undo the RENAME if necessary
DROP TABLE old1, old2;
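A hedged sketch of Plan C for this case; the key-part column names are taken from the UPDATE in the question, while the types and other_col are assumptions to be adapted to the real schema:
-- new copy of Table_1: composite PK, no auto-increment id
CREATE TABLE Table_1_new (
  TABLE_1_KeyPart_1 INT NOT NULL,
  TABLE_1_KeyPart_2 INT NOT NULL,
  PRIMARY KEY (TABLE_1_KeyPart_1, TABLE_1_KeyPart_2)
) ENGINE=InnoDB;
INSERT INTO Table_1_new
  SELECT TABLE_1_KeyPart_1, TABLE_1_KeyPart_2 FROM TABLE_1;
-- new copy of Table_2: the key parts are filled in by the JOIN,
-- so no separate giant UPDATE is needed
CREATE TABLE Table_2_new (
  TABLE_2_KeyPart_1 INT NOT NULL,
  TABLE_2_KeyPart_2 INT NOT NULL,
  other_col INT  -- plus whatever else Table_2 actually holds
) ENGINE=InnoDB;
INSERT INTO Table_2_new
  SELECT t1.TABLE_1_KeyPart_1, t1.TABLE_1_KeyPart_2, t2.other_col
  FROM TABLE_2 AS t2
  JOIN TABLE_1 AS t1 ON t1.TABLE_1_Id = t2.TABLE_2_FKT1_Id;
-- add secondary indexes here, then swap:
RENAME TABLE TABLE_1 TO TABLE_1_old, Table_1_new TO TABLE_1,
             TABLE_2 TO TABLE_2_old, Table_2_new TO TABLE_2;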

Normalize data before loading to database or use database?

I have some data which I want to add to an existing MySQL database. The new data may contain entries which are already saved in the DB. Since some of my columns are unique, I get, as expected, an ER_DUP_ENTRY error.
Bulk Insert
Let's say I want to use the following statement to save "A", "B" and "C" in a column named names of table mytable, and "A" is already saved there.
insert into mytable (names) values ("A"), ("B"), ("C");
Is there a way to directly use a bulk insert to save "B" and "C" while ignoring "A"? Or do I have to build an insert statement for every new row? This leads to another question:
Normalize Data
Should I make sure no duplicate entries are uploaded before the actual insert statement? In my case I would need to select the data from the database, eliminate the duplicates, and then perform the insert shown above. Or is that a task which is supposed to be done by the database?
If you have UNIQUE constraints that are blocking import, you have a few ways you can work around that:
INSERT IGNORE INTO mytable ...
If any individual rows violate a UNIQUE constraint, they are skipped. Other rows are inserted.
REPLACE INTO mytable ...
If any rows violate a UNIQUE constraint, DELETE the existing row, then INSERT the new row. Keep in mind the side effects of doing this, e.g. if you have foreign keys that cascade on delete referencing the deleted row, or if the INSERT generates a new auto-increment id.
INSERT INTO mytable ... ON DUPLICATE KEY UPDATE ...
More flexibility. This does not delete the original row, but allows you to set new values for any columns you choose, on a case-by-case basis. See also my answer to "INSERT IGNORE" vs "INSERT ... ON DUPLICATE KEY UPDATE"
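Applied to the example from the question (assuming names has a UNIQUE constraint and 'A' is already in the table), the three options would look roughly like this:
-- skips 'A', inserts 'B' and 'C'
INSERT IGNORE INTO mytable (names) VALUES ('A'), ('B'), ('C');
-- deletes the existing 'A' row and re-inserts it, then inserts 'B' and 'C'
REPLACE INTO mytable (names) VALUES ('A'), ('B'), ('C');
-- keeps the existing 'A' row; with a single column there is little to
-- update, but in a real table you would list the columns to refresh
INSERT INTO mytable (names) VALUES ('A'), ('B'), ('C')
ON DUPLICATE KEY UPDATE names = VALUES(names);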
If you want to use bulk-loading with mysqlimport or the SQL statement equivalent LOAD DATA INFILE, there are options that match the INSERT IGNORE or REPLACE solutions, but not the INSERT...ON DUPLICATE KEY UPDATE solution.
Read docs for more information:
https://dev.mysql.com/doc/refman/8.0/en/insert.html
https://dev.mysql.com/doc/refman/8.0/en/replace.html
https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
https://dev.mysql.com/doc/refman/8.0/en/mysqlimport.html
https://dev.mysql.com/doc/refman/8.0/en/load-data.html
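For the bulk-loading route, the matching keywords are IGNORE and REPLACE, placed before INTO TABLE. A sketch with a hypothetical CSV file:
LOAD DATA INFILE '/tmp/names.csv'
IGNORE  -- or REPLACE
INTO TABLE mytable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(names);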
In some situations, I like to do this:
LOAD DATA into a temp table
Clean up the data
Normalize as needed. (2 SQLs per column that needs normalizing -- details)
Augment Summary table(s) (INSERT .. ON DUPLICATE KEY .. SELECT x, y, count(*), sum(z), .. GROUP BY x,y)
Copy clean data from temp table to real table(s) ("Fact" table). (INSERT [IGNORE] .. SELECT [DISTINCT] .. or IODKU with SELECT.)
More on Normalizing:
I do it outside any transactions. There are multiple reasons why this is better.
At worst (as a result of other failures), I occasionally throw an unused entry in the normalization table. No big deal.
No burning of AUTO_INCREMENT ids (except in edge cases).
Very fast.
Since REPLACE is a DELETE plus INSERT it is almost guaranteed to be worse than IODKU. However, both burn ids when the rows exist.
If at all possible, do not "loop" through the rows; instead find SQL statements to handle them all at once.
Depending on the details, de-dup in step 2 (if lots of dups) or in step 5 (dups are uncommon).
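As an illustration of the "2 SQLs per column" normalization step above, here is a hedged sketch; the staging table tmp_load, the normalization table names_norm (with id AUTO_INCREMENT PRIMARY KEY and a UNIQUE name), and the columns are all hypothetical:
-- SQL 1: insert names not yet in the normalization table (the LEFT JOIN
-- avoids burning AUTO_INCREMENT ids, unlike a plain INSERT IGNORE)
INSERT INTO names_norm (name)
  SELECT DISTINCT t.name
  FROM tmp_load AS t
  LEFT JOIN names_norm AS n ON n.name = t.name
  WHERE n.id IS NULL;
-- SQL 2: copy the ids back into the staging table
UPDATE tmp_load AS t
JOIN names_norm AS n ON n.name = t.name
SET t.name_id = n.id;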

Update all rows of a single column from one table to the same table in another database

Ok, so I have a database in my testing environment called 'Food'. In this database, there is a table called 'recipe', with a column called 'source'.
This same database exists in my local environment. However, I just received an updated database (in my local environment) where all the column values (for 'source') have changed.
Is there any way I can migrate the 'source' column from my local to my test environment without changing the values of any other column? There are 1186 rows in the 'recipe' table of the 'Food' database in my test environment that need ONLY the 'source' column updated.
You need some way to uniquely identify your Recipes. If both tables have a surrogate key that remained constant, use that. Otherwise figure out some way to match up the new data with your test data: you might already have a unique index in mind or you might need to decide on a combination of fields that uniquely identify your Recipes.
On a side note, why can't you just overwrite all the columns? It is just test data, right?
If only a column has changed and you have IDs (or keys) on your rows, you could follow these steps:
create an intermediate table locally
insert keys and new source values there (either those which have changed or all)
use mysqldump to selectively export the table from the local database
copy the dumped table to the remote database server
import it there
join it with the production table in an UPDATE statement to replace the values (see the sketch after this list)
drop the intermediate table on the server
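A hedged sketch of those steps, assuming recipe has an integer primary key id (the intermediate table name is made up):
-- locally: intermediate table holding just the key and the new values
CREATE TABLE Food.recipe_source_fix (
  id INT PRIMARY KEY,
  source VARCHAR(255)
);
INSERT INTO Food.recipe_source_fix (id, source)
  SELECT id, source FROM Food.recipe;
-- in a shell: mysqldump Food recipe_source_fix > fix.sql
-- copy fix.sql to the test server and import it, then run there:
UPDATE Food.recipe AS r
JOIN Food.recipe_source_fix AS f ON f.id = r.id
SET r.source = f.source;
DROP TABLE Food.recipe_source_fix;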

Augment and Prune a MySQL table

I need a little advice concerning a MySQL operation:
There is a database A which holds several tables. With a query I selected a set of entries out of this database to copy these results into another table in database B.
Now the table in database B contains the results of my query on database A.
For instance the query is:
SELECT names.name, ages.age FROM A.names AS names, A.ages AS ages WHERE ages.name = names.name;
And to copy these results into database B I would run:
INSERT INTO B.persons SELECT names.name, ages.age FROM A.names AS names, A.ages AS ages WHERE ages.name = names.name;
Here's my question: When the data of database A has changed I want to run an "update" on the table of database B.
So, the easy and dirty approach would be: truncate the table in database B, re-run the query on database A, and copy the results back to database B.
But isn't there a smarter way, so that only new result rows of that query are copied over, and entries in database B which are no longer in database A get deleted?
In short: Is there a way to "augment" the table of database B with new entries and "prune" old entries out?
Thanks for your help
I would do two things:
1) Ensure you have a primary key in database B: an integer surrogate key, or at a minimum a unique combination of columns
2) Use logical deletes instead of physical deletes, i.e. have a boolean deleted column
Point 2 ensures you never have to delete and lose data; you just update the flag, and in your queries add WHERE deleted = 0 (or WHERE deleted IS NULL).
When combined with a primary key, it means everything can be handled easily by an INSERT ... ON DUPLICATE KEY UPDATE, which will insert new rows and update existing ones - and that means it can perform your 'deletes' at the same time too (a sketch follows below).
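Here is a hedged sketch of that flag-based approach, using the tables from the question; it assumes B.persons has name as its primary (or unique) key plus a deleted column, both of which are assumptions:
-- step 1: pessimistically mark everything as deleted
UPDATE B.persons SET deleted = 1;
-- step 2: upsert the current result rows from A; rows still present in A
-- are (re)inserted or un-deleted, rows gone from A keep deleted = 1
INSERT INTO B.persons (name, age, deleted)
  SELECT names.name, ages.age, 0
  FROM A.names AS names
  JOIN A.ages AS ages ON ages.name = names.name
ON DUPLICATE KEY UPDATE
  age = VALUES(age),
  deleted = VALUES(deleted);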
What you describe sounds like you want to replicate the table. There is no simple quick fix for that. You could of course write some application logic to do it, but it would not be very efficient, as it would have to compare each entry in each table and then delete or update accordingly.
One solution would be to set up a foreign key between A and B and cascade updates and deletes to B. But this would only partly solve the problem: it would drop rows in B if they were deleted in A, and it would update a key column in B if it were updated in A, but it would not update the other columns. Note also that this would require your table type to be InnoDB.
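A hedged sketch of that cascade setup; the constraint name is made up, and it assumes B.persons has a name column referencing the (indexed) name column in A.names:
ALTER TABLE B.persons
  ADD CONSTRAINT fk_persons_name
  FOREIGN KEY (name) REFERENCES A.names (name)
  ON UPDATE CASCADE
  ON DELETE CASCADE;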
Another would be to run inserts on B with A's values but use
INSERT ... ON DUPLICATE KEY UPDATE ...
Again, this would work fine for updates but not for deletes.
You could try to set up actual MySQL replication, but this is perhaps beyond the scope of your problem and is more involved.
Finally, you could set up the foreign key index as described above and write a trigger so that whenever an update is applied to A, the corresponding key row in B is also updated. This seems like a plausible solution for you, while not the cleanest, I would admit.
It would seem that a small batch script, run periodically on whichever environment you are running, would be the best way to duplicate the table and achieve what you are looking for.

How do I remove redundant Primary Keys from a MySQL Table?

I have been developing an app for some time. This involves entering and deleting a lot of useless data in the tables. Now that I want to go to production, I want to get rid of all the data and also reset all the 'IDs' (primary keys) so that the live system can start fresh with sensible IDs like 1, 2, 3, etc.
Using MySQL and PHP / CodeIgniter.
Many many thanks for your help!
I would normally use TRUNCATE - this both removes the data and resets the AUTO_INCREMENT counter.
Note that MySQL will perform a row-by-row deletion if there is a foreign key relationship, which is quite convenient (compared to SQL Server).
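For example, with a hypothetical table name:
TRUNCATE TABLE posts;  -- removes all rows and resets AUTO_INCREMENT to 1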
If your PK is auto-increment, you can do
ALTER TABLE tbl AUTO_INCREMENT = 1
Make sure the table is empty before executing the query.