Augment and Prune a MySQL table


I need a little advice concerning a MySQL operation:
There is a database A which contains several tables. With a query I selected a set of entries from this database and copied the results into a table in database B.
Now the table in database B contains the results of my query on database A.
For instance the query is:
SELECT names.name, ages.age FROM A.names names, A.ages ages WHERE ages.name = names.name;
And to copy these results into database B I would run:
INSERT INTO B.persons (SELECT names.name, ages.age FROM A.names names, A.ages ages WHERE ages.name = names.name);
Here's my question: When the data of database A has changed I want to run an "update" on the table of database B.
So, the easy and dirty approach would be: Truncate the table in database B, re-run the query on database A and copy the result back to database B.
But isn't there a smarter way so that only new result rows of that query will be copied and those entries in database B which are not in database A anymore get deleted?
In short: Is there a way to "augment" the table of database B with new entries and "prune" old entries out?
Thanks for your help

I would do two things:
1) Ensure the table in database B has a primary key, either an integer or, at a minimum, a unique combination of columns
2) Use logical deletes instead of physical deletes, i.e. have a boolean deleted column
Point 2 ensures you never have to delete and lose data; you just update the flag and add where deleted = 0 (or where deleted is null) to your queries.
Combined with a primary key, this means everything can be handled by an INSERT ... ON DUPLICATE KEY UPDATE, which inserts new rows and updates existing ones, so it can perform your 'deletes' at the same time too.
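A minimal sketch of how those two points could fit together, not a definitive implementation. The schema is an assumption (the question does not show it): B.persons is taken to have name as its primary key plus a deleted flag. The idea is to flag every row as deleted first and let the upsert clear the flag for every row the source query still returns.

```sql
-- Assumed schema, not taken from the question.
CREATE TABLE IF NOT EXISTS B.persons (
    name    VARCHAR(100) NOT NULL PRIMARY KEY,
    age     INT,
    deleted TINYINT NOT NULL DEFAULT 0
);

START TRANSACTION;

-- Step 1: provisionally flag every existing row as deleted.
UPDATE B.persons SET deleted = 1;

-- Step 2: insert rows that are new; rows that already exist are
-- refreshed and un-flagged. Rows no longer returned by the source
-- query keep deleted = 1.
INSERT INTO B.persons (name, age, deleted)
SELECT names.name, ages.age, 0
FROM A.names AS names, A.ages AS ages
WHERE ages.name = names.name
ON DUPLICATE KEY UPDATE
    age     = VALUES(age),   -- VALUES() is deprecated in recent MySQL 8.0 releases but still accepted
    deleted = 0;

COMMIT;
```

With a transactional engine such as InnoDB, wrapping both statements in one transaction keeps readers from seeing the intermediate state in which every row is flagged.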

What you describe sounds like you want to replicate the table. There is no simple quick fix for that. You could of course write some application logic to do it, but it would not be very efficient, as it would have to compare each entry in each table and then delete or update accordingly.
One solution would be to set up a foreign key between A and B and cascade updates and deletes to B. But this would only partly solve the problem: it would drop rows in B if they were deleted in A, and it would update a key column in B if it were updated in A, but it would not update the other columns. Note also that this requires the tables to use the InnoDB storage engine.
Another would be to run inserts on B with A's values but use
INSERT ... ON DUPLICATE KEY UPDATE ...
Again, this would work fine for updates but not for deletes.
You could try to set up actual MySQL replication, but this is perhaps beyond the scope of your problem and is more involved.
Finally, you could set up the foreign key as described above and write a trigger so that whenever an update is applied to A, the corresponding row in B is also updated. This seems like a plausible solution for you, though I admit it is not the cleanest.
It would seem that a small batch script, run periodically on whichever environment you're running on, that duplicates the table would be the best way to achieve what you are looking for.
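As a rough sketch of what such a periodic batch could run, assuming B.persons has a primary or unique key on name (an assumption, the question does not show the schema): one statement augments the table and a second one prunes it.

```sql
-- Augment: add new rows and refresh rows that already exist.
INSERT INTO B.persons (name, age)
SELECT names.name, ages.age
FROM A.names AS names, A.ages AS ages
WHERE ages.name = names.name
ON DUPLICATE KEY UPDATE age = VALUES(age);

-- Prune: delete rows in B that the source query no longer returns.
DELETE p
FROM B.persons AS p
LEFT JOIN (
    SELECT names.name
    FROM A.names AS names, A.ages AS ages
    WHERE ages.name = names.name
) AS src ON src.name = p.name
WHERE src.name IS NULL;
```

Both statements reference A and B by database name, so this only applies when the two databases live on the same MySQL server.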

Related

Pentaho Kettle (Spoon) - Delete Records From Different Tables

I'm trying to delete records in my target table based on whether records exist in source table. I tried using a 'Delete' step, but I noticed that this step is based on a conditional clause.
My condition is quite simple "if the record/row does NOT exist in table A [source], delete the record/row from table B [destination]".
I also read about using a 'Merge Rows (diff)' step, but it seems to check/compare the entire set of tables for differences.
The table has several million records with many hundreds of columns on a MySQL server, so I need to do this in the most efficient way.
I'm reading table A with a Table input step and this SQL command:
SELECT id, user, password, attribute, op FROM viewuserradiusunisulma
Any help would be appreciated.

If your source and target tables are in the same database, you can use a SQL query to delete all records in tableB that don't have a corresponding entry in tableA:
DELETE FROM tableB WHERE NOT EXISTS (SELECT 1 FROM tableA WHERE tableA.id = tableB.id)
If the source and destination tables are not in the same database, you would have to go through all rows in tableB and check whether each record exists in tableA. If your source tableA has a limited number of rows, loading the key values into memory and then performing a stream lookup instead of a database lookup would be much faster. I'd probably try that even with a higher number of rows, because of the significant performance impact.
note: I hope I haven't messed up the sql syntax, I'm thinking almost exclusively in abap at the moment and that messes with my memory a bit. So please test this on some backup before firing away.
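One possible variant for the cross-database case, sketched under assumptions rather than taken from the answer: copy only the key values of tableA into a staging table next to tableB (for example with a Table input and Table output step), then prune in batches so that no single statement touches millions of rows. The staging table name tableA_keys and the integer id column are hypothetical.

```sql
-- Hypothetical staging table holding only the keys of the source table.
CREATE TABLE tableA_keys (
    id INT NOT NULL PRIMARY KEY
);

-- ... load the key values of tableA into tableA_keys here ...
-- Caution: if this load fails and the table stays empty,
-- the DELETE below would remove every row of tableB.

-- Prune in batches; repeat until the statement affects 0 rows.
DELETE FROM tableB
WHERE NOT EXISTS (
    SELECT 1 FROM tableA_keys AS k WHERE k.id = tableB.id
)
LIMIT 10000;

DROP TABLE tableA_keys;
```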

I found the solution. In this case, I check the records, then report, update and enter the new data

Merging databases - adding rows that not existing in one db to another

I have two databases from a WordPress website.
An issue occurred and 50% of my posts disappeared.
I have database 1, a copy from 03.03.21,
and database 2, the website's current database from 24.03.21.
So database 1 contains many posts that have since been deleted,
and database 2 has some new posts that do not exist in the older database 1.
Is there any software or a way to merge these two databases,
i.e. to compare them and add the entries that exist only in the older database to the newer one?
I could do this manually, but one post has entries in many tables, so recovering the deleted posts that way is going to be hard.

There is no easy solution but you could try to make a "merge" locally for testing purposes.
Here's how I would do it; I can't guarantee it will work.
1. Load the oldest backup into the server, let's say in a database named merge_target.
2. Load the 2nd backup (the most recent one) into the same server, let's say in a merge_source database.
3. Define a logical order to execute the merge for each table, this depends on the presence of foreign keys:
If a table A has a foreign key referencing table B, then you will need to merge table B before table A.
This may not work depending on your database structure (and I never worked with WordPress myself).
4. Write and execute queries for each table, with some rules:
- SELECT from the merge_source database
- INSERT into the merge_target database
- if a row already exists in merge_target (i.e. they have the same primary key or unique key), you can use MySQL features depending on what you want to do:
  - INSERT ... ON DUPLICATE KEY UPDATE if the existing row should be updated
  - INSERT IGNORE if the row should just be skipped
  - REPLACE if you really need to delete and re-insert the row
This could look like the following query (here with ON DUPLICATE KEY UPDATE):
INSERT INTO merge_target (col_a, col_b, col_c)
SELECT
    col_a
    , col_b
    , col_c
FROM merge_source
ON DUPLICATE KEY UPDATE
    merge_target.col_b = merge_source.col_b
Documentation:
INSERT ... SELECT
ON DUPLICATE KEY UPDATE
REPLACE
INSERT IGNORE is in the INSERT documentation page
Not sure it will help but I wrote a database migration framework in PHP, you can still take a look: Fregata.
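For comparison, the INSERT IGNORE variant could look like the sketch below. The table name some_table is a placeholder and col_a is assumed to be its primary key; neither comes from a real WordPress schema. The first query only previews how many rows would be added.

```sql
-- Preview: rows present in merge_source but missing from merge_target.
SELECT COUNT(*)
FROM merge_source.some_table AS s
LEFT JOIN merge_target.some_table AS t ON t.col_a = s.col_a
WHERE t.col_a IS NULL;

-- Merge: copy the missing rows; rows whose key already exists are skipped.
INSERT IGNORE INTO merge_target.some_table (col_a, col_b, col_c)
SELECT col_a, col_b, col_c
FROM merge_source.some_table;
```

Note that INSERT IGNORE also downgrades other errors (not only duplicate keys) to warnings, so checking SHOW WARNINGS after a test run is worthwhile.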

re-inserting a table record and updating an auto increment primary index

I'm running MariaDB 5.5.56.
I'm looking to copy an entire row in a database, change one column, then insert the entire row back into the original database (I don't want to have to specify the individual fields because there's a lot of them). The problem I'm running into is how to deal with an auto-increment/primary key column.
example:
create temporary table t_ownership like ownership;
insert into t_ownership (select * from ownership where name='x' LIMIT 1);
update t_ownership set id='something else';
insert into ownership (select * from t_ownership);
I have a column "recno" that is an auto-increment that will create a collision in the database when I try to re-insert the slightly changed record back into the original table.
Something like this seems to work but doesn't result in an insert:
insert into ownership (select * from t_ownership) ON DUPLICATE KEY UPDATE recno=LAST_INSERT_ID(ownership.recno);
The above statement executes without error but does not add a row to table ownership.
So I think I'm close but not quite there...
What would be the best way to do this? I'd like to avoid doing an insert where I manually specify field/values. I just need to regenerate a new A.I. recno column on the insert.

NULL values inserted into auto-incremented fields end up just getting the next auto-increment value, behaving equivalent to INSERTing without specifying the field; so you should be able to update the source (temp copy) to have NULL for that field.
However, one potential issue that could present itself in scenarios like yours is that the CREATE TEMPORARY TABLE ... LIKE could result in a table that would not allow you to set such fields to NULL; this would require you to either ALTER the temporary table, or create it in a more explicit manner. Either way, it now makes code/queries that do not specify columns even more reliant on knowing columns.
Personally, I would take this route in the first place.
INSERT INTO theTable([list all but the auto-inc column])
SELECT [list all but the auto-inc column, with any replacements or modifications desired]
FROM ...[original query]...
It accomplishes the task in one query, makes the queries more self documenting, and only at the cost of a little typing (most of which a decent database browser, or query builder, will do for you).
The only real argument in favor of your current approach is that the table involved can be changed without necessarily breaking your queries; but that raises the question of whether it would be better for such table changes to break the queries, forcing them to be re-examined. If it is not an issue, it is a minor revision; but the alternative is queries that remain valid yet can cause unexpected behavior by copying information they were never intended to copy.
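Both routes can be sketched as follows. The column some_col stands in for the many columns the question does not list and is hypothetical. The second variant swaps in a different trick from the NULL approach described above: unless the NO_AUTO_VALUE_ON_ZERO SQL mode is enabled, inserting 0 into an AUTO_INCREMENT column also generates the next value, which avoids having to make recno nullable.

```sql
-- Variant 1: explicit column list, one statement, recno left out
-- so that it is generated automatically.
INSERT INTO ownership (name, id, some_col)
SELECT name, 'something else', some_col
FROM ownership
WHERE name = 'x'
LIMIT 1;

-- Variant 2: keep the temporary-table approach from the question,
-- but reset recno to 0 before copying the row back.
CREATE TEMPORARY TABLE t_ownership LIKE ownership;
INSERT INTO t_ownership SELECT * FROM ownership WHERE name = 'x' LIMIT 1;
UPDATE t_ownership SET id = 'something else', recno = 0;
INSERT INTO ownership SELECT * FROM t_ownership;
DROP TEMPORARY TABLE t_ownership;
```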

Why not delete the old row and insert the updated row?

I have a table (MySQL) in which some rows need to be updated when a user wants to.
I know the right way is just to use the SQL UPDATE statement, and I'm not asking 'Which is faster: delete and insert, or just update?'. But since the update operation on my table takes more time to code (because of the table's relations), why shouldn't I delete the old row and insert the updated one?

Yes, you can delete and insert. But what keeps the record in your database if the program crashes a moment before it can insert the data?
UPDATE keeps this from happening. It keeps the data in your database and changes only the values that need to be changed. It may be complicated to use in your database, but you can be certain that your record is still safe.
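To make the difference concrete, here is a small sketch using a hypothetical users table (id, email). The UPDATE is a single atomic statement. The delete-and-insert pair is only crash-safe if both statements run inside one transaction on a transactional engine such as InnoDB, which is an assumption about the setup.

```sql
-- Single statement: the row exists before and after, whatever happens.
UPDATE users SET email = 'new@example.com' WHERE id = 42;

-- Delete and insert: without the transaction, a crash between the two
-- statements would leave the row missing.
START TRANSACTION;
DELETE FROM users WHERE id = 42;
INSERT INTO users (id, email) VALUES (42, 'new@example.com');
COMMIT;
```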

Finally I got the answer!
In an RDBMS there are relations between records, and one record might have dependencies. In such situations you cannot simply delete and insert a new record, because foreign key constraints get in the way: with cascading deletes, the records that depend on the main record (e.g. a user's posts depending on the user record) will be deleted too, and data is lost!
If you have no dependencies between records (not as an exception, but by the nature of the data model, as in NoSQL) and you have some problem updating a record (e.g. file checking), you can use this approach.

MySQL: Best way to update a large table

I have a table with huge amount of data. The source of data is an external api. Every few hours, I need to sync the database so that the changes are up to date from the external api. I am doing a full sync (api doesn't allow delta sync).
While sync happens, I want to make sure that the data from the database is also available for read. So, I am following below steps:
1. I have a column in the table which acts as a flag for whether or not the data is readable. Only rows with the flag set are read.
2. I insert all the data from the api into the table.
3. Once all the data is written, I delete all the rows in the table that have the flag set.
4. After the deletion, I update the table and set the flag for all the rows.
Table has around ~50 million rows and is expected to grow. There is a customerId field in the table. Sync usually happens based on customerId by passing it to the api.
My problem is, step 3 and 4 above are taking a lot of time. Queries are something like:
Step 3 --> delete from foo where customer_id=12345678 and flag=1
Step 4 --> update foo set flag=1 where customer_id=12345678
I have tried partitioning the table based on customer_id, and it works great where a customer_id has few rows, but for some customer_ids the number of rows in a single partition goes up to ~5 million.
Around 90% of data doesn't change between two syncs. How can I make this fast?
I was thinking of using just the update queries instead of insert queries and then check if there was any update. If not, I can issue an insert query for the same row. This way any updates will be taken care of along with the insert. But I am not sure if the operation will block read queries for this while update is in progress.

For your setup (read only data, full sync), the fastest way to update the table is to not update at all, but to import the data into a different table and to rename it afterwards to make it the new table.
Create a table like your original table, e.g. use
create table foo_import like foo;
If you have e.g. triggers, add them too.
From now on, let the import api write its (full) sync to this new table.
After a sync is done, swap the two tables:
RENAME TABLE foo TO foo_tmp,
foo_import TO foo,
foo_tmp to foo_import;
It will (literally) just require a second.
This command is atomic: it will wait for transactions that access these tables to finish, it will not present a situation where there is no table foo and it will completely fail (and not do anything) if one of the tables doesn't exist or foo_tmp already exists.
As a final step, empty your import table (that now contains your old data) to be ready for your next import:
truncate foo_import;
This will again just require a second.
The rest of your queries probably assume that flag=1. Until (if at all) you update the code to not use the flag anymore, you can set its default value to 1 to keep it compatible, e.g. use
alter table foo modify column flag tinyint default 1;
Since you don't have foreign keys, it doesn't have to bother you, but for others with a similar problem it might be useful to know that foreign keys will get adjusted, so foreign keys that are referencing foo will reference foo_import after renaming the tables. To make them point to the new table foo again, they have to be dropped and recreated. Everything else (e.g. views, queries, procedures) will resolve by the current name, so they will always access the current foo.
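For readers who do have such foreign keys, re-pointing one after the swap could look like this sketch. The child table bar, its column foo_id, the constraint name fk_bar_foo and the id key on foo are all hypothetical.

```sql
-- After the RENAME, the constraint follows the old table,
-- which is now called foo_import. Drop it ...
ALTER TABLE bar DROP FOREIGN KEY fk_bar_foo;

-- ... and recreate it against the table currently named foo.
ALTER TABLE bar
    ADD CONSTRAINT fk_bar_foo
    FOREIGN KEY (foo_id) REFERENCES foo (id);
```

Recreating the constraint makes the server check existing rows in bar against the new foo (unless foreign_key_checks is disabled), which can take a while on large tables.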

CREATE TABLE `new` LIKE `real`;
Load `new` by whatever means you have; take as long as needed.
RENAME TABLE `real` TO `old`, `new` TO `real`;
DROP TABLE `old`;
The RENAME is atomic and "instantaneous"; real is "always" available.
(I don't see the need for flag.)
OR...
Since you are actually updating a chunk of a table, consider these...
If the chunk is small...
Load the new data into a tmp table
DELETE the old rows
INSERT ... SELECT ... to move the new rows in. (Having the new data already in a table is probably the fastest way to achieve this.)
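A sketch of those three steps for one customer, assuming InnoDB and assuming foo has no auto-increment surrogate key that could collide when rows are copied with SELECT * (otherwise list the columns explicitly). The staging table name foo_new is hypothetical.

```sql
-- 1. Load the fresh data for one customer into a staging table.
CREATE TEMPORARY TABLE foo_new LIKE foo;
-- ... insert the API result for customer 12345678 into foo_new here ...

-- 2. and 3. Swap the customer's rows in a single transaction, so readers
-- see either the old rows or the new ones, never an empty chunk.
START TRANSACTION;
DELETE FROM foo WHERE customer_id = 12345678;
INSERT INTO foo SELECT * FROM foo_new;
COMMIT;

DROP TEMPORARY TABLE foo_new;
```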
If the chunk is big, and you don't want to lock the table for "too long", there are some other tricks. But first, is there some form of unique row number for each row for the customer? (I'm thinking about batch-moving a bunch of rows at a time, but need more specifics before spelling it out.)
