Recording MySQL DELETE statements

We have a MySQL->Oracle ETL using Informatica that works great for all statements except DELETE. Unfortunately, the DELETE makes the record go away such that Informatica never sees it again to remove/expire it in Oracle.
How have people gone about recording MySQL DELETE statements?
The tables are InnoDB (ACID-compliant) with unique primary keys on all records (auto_increment). We're using the open-source MySQL on Windows.
We'd prefer not to use a general query log for performance reasons. We'd also prefer to keep the stock MySQL binary and not recompile our own special DELETE statement.

A possible solution is to never delete anything from your database. I avoid deleting from the database because the information is then lost forever; instead, I mark records as invalid or obsolete by adding an appropriate column to the table.
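For example, a minimal sketch of that soft-delete approach in MySQL, using a hypothetical orders table (the column names are illustrative, not from the original setup):

ALTER TABLE orders ADD COLUMN is_deleted TINYINT(1) NOT NULL DEFAULT 0;
ALTER TABLE orders ADD COLUMN deleted_at DATETIME NULL;

-- Instead of: DELETE FROM orders WHERE id = 42;
UPDATE orders SET is_deleted = 1, deleted_at = NOW() WHERE id = 42;

Informatica then sees the flag/timestamp change like any other update and can expire the row on the Oracle side.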
Another, similar solution is to use a trigger that inserts the record you are about to delete into an audit table, and then delete it. Informatica can read that audit table and apply the same removal/expiry on the Oracle side.
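A rough sketch of such a trigger in MySQL, again with hypothetical orders and orders_deleted tables (pick whatever columns you need to expire the row in Oracle):

CREATE TRIGGER orders_before_delete
BEFORE DELETE ON orders
FOR EACH ROW
    INSERT INTO orders_deleted (order_id, deleted_at)
    VALUES (OLD.id, NOW());

Because the trigger fires before the row disappears, the audit table keeps a durable record of every delete for the ETL to pick up.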

Related

CDC not working for update data on SQL Server 2014 SP2

Please help me fix this problem. I use SQL Server 2014 Service Pack 2 and have enabled CDC on my database and table.
It worked for INSERT and DELETE operations (the tracking records were added to the CDC change table), but when I perform an UPDATE operation, nothing is added to the CDC table.
What should I check or fix?
Are you sure the table has nothing?
1) It depends. In certain cases, for example when you update a column that is part of the clustered index but is not the primary key, the system treats the UPDATE statement as a combination of DELETE/INSERT operations, so you may see one DELETE and one INSERT row instead of an update.
Note that such INSERT/DELETE operations may appear in the wrong order (this CDC bug is confirmed by MS, and they are fixing it at the moment), so the Get Net Changes function may return incorrect results (duplicate or missing rows).
2) What kind of updates did you perform? If you update a column to the same value, no change is recorded. Please post your scripts so we can understand the problem better.
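One way to check what CDC actually recorded is to query the change function for the capture instance directly; updates show up as operation 3 (values before the update) and 4 (values after the update). A sketch, assuming a capture instance named dbo_MyTable:

DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_MyTable');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

-- __$operation: 1 = delete, 2 = insert, 3 = update (before), 4 = update (after)
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, 'all update old')
WHERE __$operation IN (3, 4);

If rows with operations 3 and 4 show up here, the updates are being captured and the issue is in how the net-changes query is being read.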

How to subscribe to update,delete and inserts on a mysql table?

I would like to get a notification when updates, inserts, or deletes happen in certain MySQL (or MariaDB) InnoDB tables.
I need to track these changes from another process as soon as possible.
I was thinking maybe I could subscribe to the mysql binary log?
Can somebody explain how this can be done?
Is there, for example, a log-reading API that MySQL offers?
Does the game change when I use a Galera cluster?
You can use mysqlbinlog with the --stop-never option to get all insert, update, and delete statements (see the mysqlbinlog documentation).
You can also use the C++ library MySQL Replication Listener, which is based on the binlog API.
I don't know if this will help you, but I like to use a separate table to track the changes. If I have a table called "site_visitors", I'll create another table called "site_visitors_log" that is immediately written to with the information I need (IP addresses, timestamp, etc.) right after data is inserted into "site_visitors". Very convenient.
A TRIGGER is your friend here. From the MySQL documentation:
A trigger is defined to activate when a statement inserts, updates, or deletes rows in the associated table.
See the MySQL trigger documentation; there are some examples there, too.
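As a sketch of that idea for the site_visitors example above (column names are made up): an AFTER UPDATE trigger writes a row to the log table, and you would create similar AFTER INSERT and AFTER DELETE triggers (using OLD.id for deletes). The other process can then poll the log table.

CREATE TRIGGER site_visitors_after_update
AFTER UPDATE ON site_visitors
FOR EACH ROW
    INSERT INTO site_visitors_log (visitor_id, change_type, changed_at)
    VALUES (NEW.id, 'update', NOW());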

Best way to conditionally insert using triggers

I want to create a SQL trigger that inserts a new row if and only if it passes a given condition. I can think of a couple ways to do this, but I'm not sure which is the best or correct way.
Do an AFTER INSERT trigger and then delete the new row if it fails the condition.
Do a BEFORE INSERT trigger and raise an application error if it fails.
???
Option 1 creates a race condition; I would explicitly avoid it.
Option 2 is likely to make INSERTs noticeably slower, but it can work (see the sketch after these options).
Option 3 is to use a stored procedure, but you'll probably need to call the proc for each row inserted, and unless you set up security correctly you may not actually prevent users from inserting data directly.
Option 4 is to insert everything into a staging or transaction table, and then use a broker or procedure with queries or views to move only valid data to the live table. This is extremely old school and relatively nasty, since you're no longer using the RDBMS the way a modern RDBMS is meant to be used. Expect lots of problems with key violations and synchronization, and you have the same security problem as Option 3. This method is usually only used today for bulk import and export.
Option 5 is to validate your data in the application instead of the DB. This will work, but it runs into problems as soon as your customers access the RDBMS directly: you hit the same security problem as Option 3, and it won't prevent programs outside your application from storing invalid data.
Option 6 is to use an RDBMS that supports CHECK constraints, which is just about everything except MySQL or MariaDB. MS SQL Server, Oracle, DB2, PostgreSQL, and even MS Access and SQLite support CHECK constraints. It's moderately ridiculous that MySQL doesn't.
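A minimal sketch of Option 2 in MySQL (5.5 or later, which introduced SIGNAL), with a hypothetical orders table and condition:

DELIMITER $$
CREATE TRIGGER orders_before_insert
BEFORE INSERT ON orders
FOR EACH ROW
BEGIN
    -- Reject the row up front instead of inserting it and deleting it afterwards
    IF NEW.amount <= 0 THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'amount must be positive';
    END IF;
END$$
DELIMITER ;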

One-way database sync to MySQL

I have a VFP-based application with a directory full of DBFs. I use ODBC in .NET to connect to and perform transactions on this database. I want to mirror this data to MySQL running on my web host.
Notes:
This will be a one-way mirror only: VFP to MySQL.
Only inserts and updates must be supported. Deletes don't matter
Not all tables are required. In fact, I would prefer to use a defined SELECT statement to mirror only pseudo-views of the necessary data.
I do not have the luxury of a "timemodified" stamp on any VFP records.
I don't have a ton of data records (maybe a few thousand total), nor do I have a ton of concurrent users on the MySQL side, but I want to be as efficient as possible.
Proposed Strategy for Inserts (doesn't seem that bad...):
Build a temp table in MySQL and insert all primary keys of the VFP table/view I want to mirror
Run "SELECT primaryKey FROM tempTable WHERE primaryKey NOT IN (SELECT primaryKey FROM mirroredTable)" on the MySQL side to identify missing records
Generate and run the necessary INSERT sql for those records
Blow away the temp table
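A sketch of that check on the MySQL side, using the tempTable / mirroredTable / primaryKey names from the steps above; the LEFT JOIN is just an equivalent way of writing the NOT IN anti-join:

CREATE TEMPORARY TABLE tempTable (primaryKey INT NOT NULL PRIMARY KEY);

-- ... bulk-insert all primary keys from the VFP table/view into tempTable ...

SELECT t.primaryKey
FROM tempTable t
LEFT JOIN mirroredTable m ON m.primaryKey = t.primaryKey
WHERE m.primaryKey IS NULL;   -- keys present in VFP but missing in MySQL

DROP TEMPORARY TABLE tempTable;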
Proposed Strategy for Updates (seems really heavyweight, and probably breaks open queries against the dropped MySQL table):
Build a temp table in MySQL and insert ALL records from the VFP table/view I want to mirror
Drop the existing MySQL table
Rename the temp table to the original table name
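If you do go the rebuild-and-swap route, a single RENAME TABLE statement can swap both tables atomically, which at least narrows the window in which queries see a missing table. A sketch using the names above (the staging table must be a regular table here, since RENAME TABLE does not work on TEMPORARY tables):

CREATE TABLE tempTable LIKE mirroredTable;

-- ... load tempTable with the full extract from the VFP view ...

RENAME TABLE mirroredTable TO mirroredTable_old,
             tempTable     TO mirroredTable;

DROP TABLE mirroredTable_old;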
These are just the first strategies that come to mind; I'm sure there are more effective ways of doing it (especially on the update side).
I'm looking for some alternate strategies here. Any brilliant ideas?
It sounds like you're going for something small, but you might try glancing at some replication design patterns. Microsoft has documented some data replication patterns here and that is a good starting point. My suggestion is to check out the simple Move Copy of Data pattern.
Are your VFP tables in a VFP database (DBC)? If so, you should be able to use triggers on that database to record what data needs to be updated in MySQL.

SQL Server / MySQL / Access - speeding up inserting many rows in an inefficient manner

SETUP
I have to insert a couple million rows into either SQL Server 2000/2005, MySQL, or Access. Unfortunately I don't have an easy way to use bulk insert or BCP or any of the other ways that a normal human would go about this. The inserts will happen on one particular database, but the code needs to be DB-agnostic -- so I can't do bulk copy, or SELECT INTO, or BCP. I can, however, run specific queries before and after the inserts, depending on which database I'm importing into.
eg.
If IsSqlServer() Then
    DisableTransactionLogging();
ElseIf IsMySQL() Then
    DisableMySQLIndices();
End If

... do inserts ...

If IsSqlServer() Then
    EnableTransactionLogging();
ElseIf IsMySQL() Then
    EnableMySQLIndices();
End If
QUESTION
Are there any interesting things I can do to SQL Server that might speed up these inserts?
For example, is there a command I could issue to tell SQL Server, "Hey, don't bother recording these transactions in the transaction log".
Or maybe I could say, "Hey, I have a million rows coming in, so don't update your index until I'm totally finished".
ALTER INDEX [IX_TableIndex] ON Table DISABLE
... inserts
ALTER INDEX [IX_TableIndex] ON Table REBUILD
(Note: Above index disable only works on 2005, not 2000. Bonus points if you know a way to do this on 2000).
What about MySQL, and Access?
The single biggest thing that will kill performance here is the fact that (it sounds like) you're executing a million different INSERTs against the DB. Each INSERT is treated as a single operation. If you can do this as a single operation, then you will almost certainly have a huge performance improvement.
Both MySQL and SQL Server support 'selects' of constant expressions without a table name, so this should work as one statement:
INSERT INTO MyTable(ID, name)
SELECT 1, 'Fred'
UNION ALL SELECT 2, 'Wilma'
UNION ALL SELECT 3, 'Barney'
UNION ALL SELECT 4, 'Betty'
It's not clear to me if Access supports that, not having Access available. HOWEVER, Access does support constants in a SELECT, as far as I can tell, and you can coerce the above into ANSI SQL-92 (which should be supported by all 3 engines; it's about as close to 'DB agnostic' as you'll get) by just adding
FROM OneRowTable
to the end of every individual SELECT, where 'OneRowTable' is a table with just one row of dummy data.
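So the portable version of the earlier statement would look something like this (OneRowTable being that one-row dummy table):

INSERT INTO MyTable(ID, name)
SELECT 1, 'Fred'   FROM OneRowTable
UNION ALL SELECT 2, 'Wilma'  FROM OneRowTable
UNION ALL SELECT 3, 'Barney' FROM OneRowTable
UNION ALL SELECT 4, 'Betty'  FROM OneRowTable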
This should let you insert a million rows of data with far fewer than a million INSERT statements -- and things like index reshuffling will be done once rather than a million times. You may have much less need for other optimisations after that.
Is this a regular process or a one-time event?
I have, in the past, just scripted out the current indexes, dropped them, inserted the rows, and then re-added the indexes.
SQL Server Management Studio can script out the indexes from the right-click menus...
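A rough sketch of that drop/re-add approach with the hypothetical index from the question (SomeColumn stands in for the real indexed column(s)); the DROP INDEX / CREATE INDEX pair also works on SQL Server 2000, where ALTER INDEX ... DISABLE is not available:

-- Old-style syntax, valid on SQL Server 2000 and later
DROP INDEX [Table].[IX_TableIndex]

-- ... do the inserts ...

CREATE NONCLUSTERED INDEX [IX_TableIndex] ON [Table] (SomeColumn)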
For SQL Server:
You can set the recovery model to "Simple" so your transaction log is kept small. Do not forget to set it back afterwards.
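For example (with a hypothetical database name):

ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;

-- ... do the inserts ...

ALTER DATABASE MyDatabase SET RECOVERY FULL;
-- Take a full or differential backup afterwards to restart the log backup chain.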
Disabling the indexes is actually a good idea. This works on SQL Server 2005, but not on SQL Server 2000.
alter index [INDEX_NAME] on [TABLE_NAME] disable
And to enable
alter index [INDEX_NAME] on [TABLE_NAME] rebuild
And then just insert the rows one by one. You have to be patient, but at least it is somewhat faster.
If it is a one-time thing (or it happens often enough to justify automating this), also consider dropping/disabling all indexes and then adding/re-enabling them again when the insert is done.
The trouble with setting the recovery model to Simple is that it affects any other users entering data at the same time and thus will make their changes unrecoverable.
Same thing with disabling the indexes: it disables them for everyone and may make the database run slower than a slug.
Suggest you run the import in batches.
If this is not something that needs to be read terribly quickly, you can do an "Insert Delayed" into the table on MySQL. This allows your code to continue running without having to wait for the insert to actually happen. This does have some limitations, but if your primary concern is to get the program to finish quickly, this may help. Be warned that there is a nice long list of situations where this may not act as expected. Check the docs.
I do not know if this functionality works for Access or MS SQL, though.
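The syntax is just an extra keyword on the INSERT, shown here against the hypothetical MyTable from the earlier answer; note that DELAYED is MySQL-specific and only applies to certain storage engines such as MyISAM, MEMORY, and ARCHIVE:

INSERT DELAYED INTO MyTable (ID, name) VALUES (5, 'Pebbles');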
Have you considered using the Factory pattern? I'm guessing you're writing the code for this, so with the factory pattern you could code up a factory that returns a concrete "IDataInserter"-type class that does the work for you.
This would still allow you to be database agnostic and get the fastest method for each type of database.
SQL Server 2000/2005, MySQL, and Access can all load directly from a tab/CR-delimited text file; they just have different commands to do it. If you've got the case statement to determine which DB you're importing into, just figure out each one's preferred command for importing a text file.
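For example, the MySQL and SQL Server flavours look roughly like this (file path, table name, and delimiters are hypothetical; Access would use its own text-import facility):

-- MySQL
LOAD DATA INFILE 'C:/import/rows.txt'
INTO TABLE MyTable
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n';

-- SQL Server
BULK INSERT MyTable
FROM 'C:\import\rows.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\r\n');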
Can you use DTS (2000) or SSIS (2005) to build a package to do this? DTS and SSIS can both pull from the same source and pipe out to the different potential destinations. Go for SSIS if you can. There's a lot of good, fast technology in there along with functionality to embed the IsSQLServer, IsMySQL, etc. logic.
It's worth considering breaking your inserts into smaller batches; a single transaction with lots of queries will be slow.
You might consider using SQL Server's bulk-logged recovery model during your bulk insert.
http://msdn.microsoft.com/en-us/library/ms190422(SQL.90).aspx
http://msdn.microsoft.com/en-us/library/ms190203(SQL.90).aspx
You might also disable the indexes on the target table during your inserts.