According to the MySQL Performance Blog, the new Percona Server releases announced yesterday (May 6) both include the open source version of the MySQL Audit Plugin.
The task I want to accomplish is: log the tables affected by cascaded trigger execution during a single UPDATE query run. E.g. when UPDATE MY_TABLE … is executed, the {BEFORE,AFTER} UPDATE triggers may update other tables, which may have their own triggers, and so on.
Currently I use a homegrown solution; inside all triggers I put something like:
IF (
    SELECT COUNT(*)
    FROM `information_schema`.`ROUTINES`
    WHERE specific_name = 'my_own_log'
      AND routine_schema = 'my_schema'
) > 0 THEN
    CALL my_own_log('FOO_TRIGGER', 'Hi, I''m about to update MY_TABLE');
END IF;
In production I don't have the my_own_log procedure defined, and since the information_schema lookup is well optimized, I don't incur any performance penalty.
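For reference, here is a minimal sketch of what such a my_own_log procedure and its backing table could look like in the development schema (the trigger_log table and the parameter names are just an illustration, not part of the original setup):

CREATE TABLE my_schema.trigger_log (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    trigger_name VARCHAR(64) NOT NULL,
    message VARCHAR(255) NOT NULL,
    logged_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE PROCEDURE my_schema.my_own_log (IN p_trigger VARCHAR(64), IN p_message VARCHAR(255))
BEGIN
    -- Append one row per call; the calling trigger passes its own name and a message.
    INSERT INTO my_schema.trigger_log (trigger_name, message)
    VALUES (p_trigger, p_message);
END//
DELIMITER ;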
The question is whether I could switch to an enterprise solution (the aforementioned audit plugin) to harvest information about which tables were affected by cascaded trigger execution. FYI: the only similar question I have found here does not come with an applicable answer.
Thanks for any suggestions.
Plugin auditing is designed to register outside interactions with the server; it is used to track intrusions and related activity, not interactions of the server with itself (such as triggers and procedures).
These internal activities will not generate audit plugin events, by design. From the documentation:
http://dev.mysql.com/doc/refman/5.6/en/audit-log-plugin-logging-control.html
The MySQL server calls the audit log plugin to write an element whenever an auditable event occurs, such as when it completes execution of an SQL statement received from a client. Typically the first element written after server startup has the server description and startup options. Elements following that one represent events such as client connect and disconnect events, executed SQL statements, and so forth. Only top-level statements are logged, not statements within stored programs such as triggers or stored procedures. Contents of files referenced by statements such as LOAD DATA INFILE are not logged.
For now, you are better off with your homegrown solution. You could try to improve its performance so you can turn it on in the production environment.
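One possible tweak, just as a sketch (the user variable name is made up and this is untested against your setup): cache the existence check once per connection in a session user variable, so the information_schema lookup does not run for every affected row:

-- Inside each trigger body:
IF @my_own_log_exists IS NULL THEN
    SET @my_own_log_exists = (
        SELECT COUNT(*) > 0
        FROM `information_schema`.`ROUTINES`
        WHERE specific_name = 'my_own_log'
          AND routine_schema = 'my_schema'
    );
END IF;
IF @my_own_log_exists THEN
    CALL my_own_log('FOO_TRIGGER', 'Hi, I''m about to update MY_TABLE');
END IF;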
Related
I would like to track all the DB changes happening on a particular DB using one log table.
I have checked many solutions, but they all give one audit table for each table in the DB. How can we track them all in one single table with the help of a trigger?
The table columns may look like this (a rough CREATE TABLE sketch follows the list):
id -- primary key
db_name -- DB name
version -- ignore it (I have this column in my table)
event_type -- DDL/DML command name
object_name -- table/procedure/trigger/function name which was changed
object_type -- type, like table, procedure, trigger
sql_command -- query executed by the user
username -- who executed it
updated_on -- timestamp
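A rough sketch of that table as a CREATE TABLE statement (the table name and column types are assumptions, adjust to your needs):

CREATE TABLE change_log (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    db_name VARCHAR(64) NOT NULL,
    version VARCHAR(32) NULL,           -- application-specific, see above
    event_type VARCHAR(16) NOT NULL,    -- DDL/DML command name
    object_name VARCHAR(128) NOT NULL,  -- table/procedure/trigger/function that changed
    object_type VARCHAR(32) NOT NULL,   -- table, procedure, trigger, ...
    sql_command TEXT NOT NULL,          -- query executed by the user
    username VARCHAR(128) NOT NULL,     -- who executed it
    updated_on TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);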
Thanks in advance.
A trigger that is fired when DDL commands are executed (so you can log them) does not exist in MySQL. But you may want to use log files, especially the General Query Log:
The general query log is a general record of what mysqld is doing. The server writes information to this log when clients connect or disconnect, and it logs each SQL statement received from clients. The general query log can be very useful when you suspect an error in a client and want to know exactly what the client sent to mysqld.
The log is disabled by default, and enabling it may reduce performance a bit. It will also not include indirect changes (e.g. DDL statements executed inside a procedure).
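For example, the general query log can be switched on at runtime and even written to a table so it can be queried (standard MySQL syntax; no restart needed, since both variables are dynamic):

-- Send the general log to the mysql.general_log table instead of a file, then enable it.
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';

-- Later, inspect what the server received from clients:
SELECT event_time, user_host, command_type, argument
FROM mysql.general_log
ORDER BY event_time DESC
LIMIT 50;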
If you can install a plugin, a slightly more configurable (and more performant) alternative would be an audit plugin; see MySQL Enterprise Audit, or any free implementation, e.g. this one, or write your own. It will basically log the same things as the general log, though.
Another good source of information is the information schema and the performance schema. From those you can collect basically every piece of information you need (especially the log of recently executed queries) and generate your log table from it, but it would require some work to gather all the data you want. It is also not triggered by actions, so you have to check for changes periodically yourself (e.g. compare the data in INFORMATION_SCHEMA.TABLES with a saved copy to keep track of added, deleted and renamed tables).
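A minimal sketch of that "saved copy" approach for tables (the snapshot table name and the 'my_db' schema are placeholders):

-- Take a snapshot once...
CREATE TABLE table_snapshot AS
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'my_db';

-- ...and later list tables that have appeared since the snapshot was taken.
SELECT t.TABLE_SCHEMA, t.TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES t
LEFT JOIN table_snapshot s
       ON s.TABLE_SCHEMA = t.TABLE_SCHEMA AND s.TABLE_NAME = t.TABLE_NAME
WHERE t.TABLE_SCHEMA = 'my_db'
  AND s.TABLE_NAME IS NULL;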
On the other hand, a periodic mysqldump followed by a diff against the most recent version might be a lot easier.
I currently have a PostgreSQL database, because one of the pieces of software we're using only supports this particular database engine. I then have a query which summarizes and splits the data from the app into a more useful format.
In my MySQL database, I have a table which contains an identical schema to the output of the query described above.
What I would like to develop is an hourly cron job which will run the query against the PostgreSQL database, then insert the results into the MySQL database. During the hour period, I don't expect to ever see more than 10,000 new rows (and that's a stretch) which would need to be transferred.
Both databases are on separate physical servers, continents apart from one another. The MySQL instance runs on Amazon RDS - so we don't have a lot of control over the machine itself. The PostgreSQL instance runs on a VM on one of our servers, giving us complete control.
The duplication is, unfortunately, necessary because the PostgreSQL database only acts as a collector for the information, while the MySQL database has an application running on it which needs the data. For simplicity, we're wanting to do the move/merge and delete from PostgreSQL hourly to keep things clean.
To be clear - I'm a network/sysadmin guy - not a DBA. I don't really understand all of the intricacies necessary in converting one format to the other. What I do know is that the data being transferred consists of 1xVARCHAR, 1xDATETIME and 6xBIGINT columns.
The closest guess I have for an approach is to use some scripting language to make the query, convert results into an internal data structure, then split it back out to MySQL again.
In doing so, are there any particular good or bad practices I should be wary of when writing the script? Or - any documentation that I should look at which might be useful for doing this kind of conversion? I've found plenty of scheduling jobs which look very manageable and well-documented, but the ongoing nature of this script (hourly run) seems less common and/or less documented.
Open to any suggestions.
Use the same database system on both ends and use replication
If your remote end was also PostgreSQL, you could use streaming replication with hot standby to keep the remote end in sync with the local one transparently and automatically.
If the local end and remote end were both MySQL, you could do something similar using MySQL's various replication features like binlog replication.
Sync using an external script
There's nothing wrong with using an external script. In fact, even if you use DBI-Link or similar (see below), you probably have to use an external script (or psql) from a cron job to initiate replication, unless you're going to use PgAgent to do it.
Either accumulate rows in a queue table maintained by a trigger procedure, or make sure you can write a query that always reliably selects only the new rows. Then connect to the target database and INSERT the new rows.
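If the source table has a monotonically increasing key, the "reliably selects only the new rows" query can be as simple as this sketch (table and column names are hypothetical):

-- last_sync holds the highest id that has already been copied to MySQL.
SELECT *
FROM source_table
WHERE id > (SELECT last_copied_id FROM last_sync)
ORDER BY id;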
If the rows to be copied are too big to comfortably fit in memory, you can use a cursor and read them with FETCH.
I'd do the work in this order:
Connect to PostgreSQL
Connect to MySQL
Begin a PostgreSQL transaction
Begin a MySQL transaction. If your MySQL is using MyISAM, go and fix it now.
Read the rows from PostgreSQL, possibly via a cursor or with DELETE FROM queue_table RETURNING *
Insert them into MySQL
DELETE any rows from the queue table in PostgreSQL if you haven't already.
COMMIT the MySQL transaction.
If the MySQL COMMIT succeeded, COMMIT the PostgreSQL transaction. If it failed, ROLLBACK the PostgreSQL transaction and try the whole thing again.
The PostgreSQL COMMIT is incredibly unlikely to fail because it's a local database, but if you need perfect reliability you can use two-phase commit on the PostgreSQL side, where you:
PREPARE TRANSACTION in PostgreSQL
COMMIT in MySQL
then either COMMIT PREPARED or ROLLBACK PREPARED in PostgreSQL depending on the outcome of the MySQL commit.
This is likely too complicated for your needs, but is the only way to be totally sure the change happens on both databases or neither, never just one.
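A sketch of that two-phase variant on the PostgreSQL side (it requires max_prepared_transactions > 0; the transaction identifier is arbitrary):

-- After copying the rows and deleting them from the queue table:
PREPARE TRANSACTION 'sync_to_mysql';

-- ... COMMIT the MySQL transaction from your script ...

-- Then, depending on the MySQL outcome, finish on the PostgreSQL side with either:
COMMIT PREPARED 'sync_to_mysql';
-- or:
ROLLBACK PREPARED 'sync_to_mysql';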
BTW, seriously, if your MySQL is using MyISAM table storage, you should probably remedy that. It's vulnerable to data loss on crash, and it can't be transactionally updated. Convert to InnoDB.
Use DBI-Link in PostgreSQL
Maybe it's because I'm comfortable with PostgreSQL, but I'd do this using a PostgreSQL function that uses DBI-Link via PL/PerlU to do the job.
When replication should take place, I'd run a PL/PgSQL or PL/Perl procedure that uses DBI-Link to connect to the MySQL database and insert the data in the queue table.
Many examples exist for DBI-Link, so I won't repeat them here. This is a common use case.
Use a trigger to queue changes and DBI-link to sync
If you only want to copy new rows and your table is append-only, you could write a trigger procedure that appends all newly INSERTed rows into a separate queue table with the same definition as the main table. When you want to sync, your sync procedure can then in a single transaction LOCK TABLE the_queue_table IN EXCLUSIVE MODE;, copy the data, and DELETE FROM the_queue_table;. This guarantees that no rows will be lost, though it only works for INSERT-only tables. Handling UPDATE and DELETE on the target table is possible, but much more complicated.
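A sketch of that trigger and the sync step, assuming queue_table has the same column layout as main_table (both names are placeholders):

CREATE OR REPLACE FUNCTION queue_new_row() RETURNS trigger AS $$
BEGIN
    -- Copy every newly inserted row into the queue table.
    INSERT INTO queue_table SELECT NEW.*;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER main_table_queue
AFTER INSERT ON main_table
FOR EACH ROW EXECUTE PROCEDURE queue_new_row();

-- The sync step, all in one transaction:
BEGIN;
LOCK TABLE queue_table IN EXCLUSIVE MODE;
-- ... copy the rows to MySQL here (DBI-Link, an FDW, or the external script) ...
DELETE FROM queue_table;
COMMIT;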
Add MySQL to PostgreSQL with a foreign data wrapper
Alternatively, for PostgreSQL 9.1 and above, I might consider using the MySQL Foreign Data Wrapper, ODBC FDW or JDBC FDW to allow PostgreSQL to see the remote MySQL table as if it were a local table. Then I could just use a writable CTE to copy the data.
WITH moved_rows AS (
DELETE FROM queue_table RETURNING *
)
INSERT INTO mysql_table
SELECT * FROM moved_rows;
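A rough sketch of wiring that up with mysql_fdw (the option names follow mysql_fdw's documentation but may differ between FDW versions, and writable foreign tables need a recent enough PostgreSQL and FDW; the column list mirrors the 1xVARCHAR, 1xDATETIME, 6xBIGINT layout mentioned above):

CREATE EXTENSION mysql_fdw;

CREATE SERVER mysql_server
    FOREIGN DATA WRAPPER mysql_fdw
    OPTIONS (host 'mysql.example.com', port '3306');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER mysql_server
    OPTIONS (username 'sync_user', password 'secret');

CREATE FOREIGN TABLE mysql_table (
    label      varchar,
    created_at timestamp,
    v1 bigint, v2 bigint, v3 bigint, v4 bigint, v5 bigint, v6 bigint
) SERVER mysql_server
  OPTIONS (dbname 'target_db', table_name 'target_table');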
In short you have two scenarios:
1) Make the destination pull the data from the source into its own structure
2) Make the source push the data from its structure to the destination
I'd rather try the second one: look around and find a way to create a PostgreSQL trigger, some special "virtual" table, or maybe a PL/pgSQL function. Then, instead of an external script, you'll be able to execute the procedure by running a query from cron, or possibly from inside Postgres (there are some possibilities for scheduling operations).
I'd choose the second scenario because Postgres is much more flexible, and when manipulating data in special, DIY ways you will simply have more possibilities.
An external script probably isn't a good solution, e.g. because you will need to treat binary data with special care, or convert dates from DATE to VARCHAR and then back to DATE again. Inside an external script, various text-stored data will probably be just strings, and you will need to quote them too.
How are triggers implemented inside a SQL database engine? I am not referring to the SQL language-level trigger definitions but rather their underlying implementations inside Oracle, SQL Server, MySQL, etc. How can the database engine scalably manage hundreds or thousands of triggers? Do they use a publish-subscribe model like with an observer/listener pattern? Any pointers to relevant literature on the subject would also be appreciated.
I did google for "database trigger implementation", but all I found was information on SQL trigger definitions, which again is not what I'm looking for.
Triggers are callbacks, so the implementation can be as simple as function pointers in C. Normally, though, a user is not expected to write user-defined procedural code in the RDBMS in C; you would need to support some other "higher-level" language, so the relevant programming pattern is a DSL. The number of triggers (scalability) itself is not a problem, because there are usually only one or two per table, and only DML events fire them. The implementation challenge is elsewhere: in the areas of consistency and concurrency semantics.
You can explore the source code of open source databases.
For example, PostgreSQL's trigger implementation.
First off, triggers are pieces of code that are run when a particular event (e.g. INSERT/UPDATE/DELETE on a particular table) occurs in the database. Triggers are executed implicitly BEFORE or AFTER the DML statement, and triggers cannot be executed explicitly like stored procedures.
There are also two types of triggers - STATEMENT LEVEL triggers and ROW LEVEL triggers.
The STATEMENT LEVEL triggers are fired BEFORE or AFTER a statement is executed.
The ROW LEVEL triggers are fired BEFORE or AFTER an operation is performed on each individual row affected by the operation.
So we have 12 types of triggers:
1. BEFORE INSERT STATEMENT
2. BEFORE INSERT ROW
3. AFTER INSERT STATEMENT
4. AFTER INSERT ROW
5. BEFORE UPDATE STATEMENT
6. BEFORE UPDATE ROW
7. AFTER UPDATE STATEMENT
8. AFTER UPDATE ROW
9. BEFORE DELETE STATEMENT
10. BEFORE DELETE ROW
11. AFTER DELETE STATEMENT
12. AFTER DELETE ROW
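Note that not every DBMS implements all twelve; MySQL, for instance, only provides the row-level variants. A minimal example of one of them, with made-up table and column names:

DELIMITER //
CREATE TRIGGER orders_after_update
AFTER UPDATE ON orders
FOR EACH ROW
BEGIN
    -- Record the old and new totals for every updated row.
    INSERT INTO orders_audit (order_id, old_total, new_total, changed_at)
    VALUES (OLD.id, OLD.total, NEW.total, NOW());
END//
DELIMITER ;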
Multiple triggers can be defined for the same event, with their order of execution specified.
Whenever we run a DML query (INSERT/UPDATE/DELETE) on a database, that query runs inside a transaction. Hence, when a query runs:
The necessary locks are taken (table or row locks, depending on the engine)
Any BEFORE STATEMENT triggers are found and executed
The actual SQL statement is executed row by row; for each affected row, any BEFORE EACH ROW triggers are executed, the row change is applied, and then any AFTER EACH ROW triggers are executed
Any AFTER STATEMENT triggers are found and executed
If an error occurs at any point, the changes made by the statement and its triggers are rolled back
Different DBMS manage transactions differently. Refer to their documentation for details.
Many DBMSs keep triggers in text form only, unlike stored procedures, which are compiled.
It is considered good practice to call stored procedures from inside a trigger body, keeping the trigger itself small, as stored procedures generally perform better.
A really weird (for me) problem has been occurring lately. In an application that accepts user-submitted data, the following happens at random:
Rows from the Database Table where the user submitted data is stored are disappearing.
Please note that there is NO DELETE, DROP, TRUNCATE or other SQL statement issued on the database table except from the INSERT statement.
Could this be a bug in MySQL? I did some research on mysql.com (forums, bugs, etc.) and found 2 similar cases, but without getting a solid answer (just suggestions).
Some info you might find useful:
Storage Engine: InnoDB
User Submitted Data sanitized and checked for SQL Injection attempts
Appreciate any suggestions, info.
regards,
Here are 3 possibilities:
The data never got to the database in the first place. Something happened elsewhere, so the data disappeared. Maybe intermittent network issues, an overloaded server, or an application bug.
A database transaction was not committed and got rolled back. Maybe a bug in your application code, maybe some invalid data screwed things up, maybe a concurrency exception occurred, etc.
A bug in MySQL.
I'd look at 1. and 2. first.
A table on which you only ever insert (and presumably select) and never update or delete should be really stable. Are you absolutely certain you're protecting thoroughly against SQL injection attacks? Because those could (of course) delete rows and such if successful.
You haven't mentioned which table engine you're using (there are several), but it's well worth running whatever diagnostic tools there are for it on the table in question. For instance, on a MyISAM table, run myisamchk. Or more generically (this works for several table types), use the CHECK TABLE statement.
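For example (the table name is a placeholder; CHECK TABLE works for both MyISAM and InnoDB tables):

CHECK TABLE user_submissions;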
Have you had issues with the underlying storage? It may be worth checking for those.
Activating the binlog and periodically monitoring for DELETE queries can help identify the culprit.
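A quick way to check and dig through the binary log from a client (the log file name is just an example; use whatever SHOW BINARY LOGS returns on your server):

-- Is binary logging enabled at all?
SHOW VARIABLES LIKE 'log_bin';

-- List the available binary logs, then inspect one of them for DELETE statements.
SHOW BINARY LOGS;
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 200;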
One more case to add to the above: there can also be separate client-side and server-side parts of the application. Changes initiated on the client side can be processed on the server side by additional code logic.
For example, in our case a local admin panel updated order information with pay_date = NULL, and the PHP website processed this table to clean up overdue orders. As the PHP logic was developed by another programmer, it looked strange when an order update resulted in records disappearing after some time.
The same applies to cron jobs that work on the MySQL database on a schedule.
I am running a couple of databases on MySQL 5.0.45 and am trying to get my legacy database to sync with a revised schema, so I can run both side by side. I am doing this by adding triggers to the new database but I am running into problems with replication. My set up is as follows.
Server "master"
Database "legacydb", replicates to server "slave".
Database "newdb", has triggers which update "legacydb" and no replication.
Server "slave"
Database "legacydb"
My updates to "newdb" run fine, and set off my triggers. They update "legacydb" on "master" server. However, the changes are not replicated down to the slaves. The MySQL docs say that for simplicity replication looks at the current database context (e.g. "SELECT DATABASE();" ) when deciding which queries to replicate rather than looking at the product of the query. My trigger is run from the context of database "newdb", so replication ignores the updates.
I have tried moving the update statement to a stored procedure in "legacydb". This works fine (i.e. data replicates to slave) when I connect to "master" and manually run "USE newdb; CALL legacydb.do_update('Foobar', 1, 2, 3, 4);". However, when this procedure is called from a trigger it does not replicate.
So far my thinking on how to fix this has been one of the following.
Force the trigger to set a new current database. This would be easiest, but I don't think this is possible. This is what I hoped to achieve with the stored procedure.
Replicate both databases, and have triggers in both master and slave. This would be possible, but a pain to set up.
Force the replication to pick up all changes to "legacydb", regardless of the current database context.
If replication runs at too high a level, it will never even see any updates run by my trigger, in which case no amount of hacking is going to achieve what I want.
Any help on how to achieve this would be greatly appreciated.
This may have something to do with it:
A stored function acquires table locks before executing, to avoid inconsistency in the binary log due to mismatch of the order in which statements execute and when they appear in the log. Statements that invoke a function are recorded rather than the statements executed within the function. Consequently, stored functions that update the same underlying tables do not execute in parallel.
In contrast, stored procedures do not acquire table-level locks. All statements executed within stored procedures are written to the binary log.
Additionally, there is a whole list of restrictions on stored routines and triggers:
http://dev.mysql.com/doc/refman/5.0/en/routine-restrictions.html