MySQL/MariaDB: selective synchronization of tables between instances - mysql

I have to synchronize some tables from one MySQL database to another (a different server on a different machine).
The transfer should only include specific tables, and only rows of those tables with a particular characteristic (e.g. a column named transfer set to 1).
It should also be automatic/transparent, fast, and run in short cycles (at least every 20s).
I have tried different approaches, but none of them met all the requirements.
Database synchronization with Galera works fine but cannot exclude tables/rows.
mysqldump is not automatic (it must be started manually) and cannot exclude either.
Is there no other way to do this job besides writing my own code that runs permanently?

Such a partial sync must be performed with a specially created scheme.
A possible realization:
Check whether your server instances support the FEDERATED storage engine (depending on the build it may have to be enabled explicitly).
Check whether the destination server may access the data stored on the source server using CREATE SERVER.
Create a server object attached to the remote source server and the required remote database, and check that the remote data is accessible.
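A minimal sketch of this step, including the FEDERATED table that the later queries rely on; the host, credentials, and remote table definition are placeholders:
CREATE SERVER src_server
FOREIGN DATA WRAPPER mysql
OPTIONS (HOST 'source.example.com', PORT 3306,
         USER 'sync_user', PASSWORD 'secret', DATABASE 'sourceDB');

-- Federated table in the service database, mirroring the remote table's definition.
CREATE TABLE serviceDB.federated_table (
  id                  INT NOT NULL,
  payload             VARCHAR(255),
  must_be_transferred TINYINT UNSIGNED NOT NULL DEFAULT 0,
  created_at          TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id)
) ENGINE=FEDERATED
  CONNECTION='src_server/source_table';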
On the destination server, create an event procedure that is executed every 20s (and keep it disabled at first). I recommend creating it in a separate service database. In this event procedure execute queries like
SET @event_start_timestamp = CURRENT_TIMESTAMP;
INSERT INTO localDB.local_table ( {columns} )
SELECT {columns}
FROM serviceDB.federated_table
WHERE must_be_transferred
  AND created_at < @event_start_timestamp;
UPDATE serviceDB.federated_table
SET must_be_transferred = FALSE
WHERE must_be_transferred
  AND created_at < @event_start_timestamp;
The destination server sends the corresponding SELECT query to the remote source server, which executes it and sends the output back. The received rows are inserted locally. Then the destination server sends the UPDATE, which drops the flag.
Enable the event scheduler.
Enable the event procedure.
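For example, assuming the event body has been wrapped in a hypothetical stored procedure serviceDB.sync_tables():
CREATE EVENT serviceDB.sync_event
  ON SCHEDULE EVERY 20 SECOND
  DISABLE
  DO CALL serviceDB.sync_tables();

SET GLOBAL event_scheduler = ON;

-- Once everything has been verified:
ALTER EVENT serviceDB.sync_event ENABLE;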
Ensure that your event procedure executes fast enough; it must finish its work before the next firing. I.e. run your code as a regular stored procedure and check its execution time. You may need to increase the scheduling interval.
You can prevent such overlapping firings with a static flag in a service table created in your service database: if the flag is set (the previous event has not finished its work yet), the event procedure simply exits. I recommend performing this check in any case.
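As a sketch, a named lock can serve as such a guard instead of a flag table (GET_LOCK returns 1 only when no other session holds the lock); the procedure name matches the hypothetical one used above:
DELIMITER //
CREATE PROCEDURE serviceDB.sync_tables()
BEGIN
  IF GET_LOCK('sync_tables_running', 0) = 1 THEN
    -- ... the INSERT/UPDATE statements from the event body go here ...
    DO RELEASE_LOCK('sync_tables_running');
  END IF;
END//
DELIMITER ;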
You must handle the following situation:
the destination receives a row;
the source updates this row;
the destination marks this row as synchronized.
A possible solution:
The flag must_be_transferred should not be boolean but an (unsigned tiny) integer, with the following values: 0 - not needed in sync; 1 - needed in sync; 2 - selected for copying; 3 - selected for copying but altered after selection.
Algorithm (sketched below):
dest. updates the rows marked with a non-zero value and sets them to 2;
dest. copies the rows using the condition must_be_transferred & 2 (i.e. flag IN (2,3));
dest. clears the flag using the expression must_be_transferred ^ 2 and the above condition;
src. marks altered rows as needed in sync using the expression must_be_transferred | 1.
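A sketch of the revised event body with this integer flag, reusing the {columns} placeholder from the earlier queries:
-- 1. Mark everything that needs syncing as "selected for copying".
UPDATE serviceDB.federated_table
SET must_be_transferred = 2
WHERE must_be_transferred <> 0;

-- 2. Copy the selected rows.
INSERT INTO localDB.local_table ( {columns} )
SELECT {columns}
FROM serviceDB.federated_table
WHERE must_be_transferred IN (2, 3);

-- 3. Clear the "selected" bit: 2 becomes 0 (done), 3 becomes 1 (needs another pass).
UPDATE serviceDB.federated_table
SET must_be_transferred = must_be_transferred ^ 2
WHERE must_be_transferred IN (2, 3);

-- 4. On the source side, anything that alters a row sets the "needs sync" bit, e.g.:
-- UPDATE source_table SET must_be_transferred = must_be_transferred | 1 WHERE id = ...;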

Related

I can't use a temp table in SSIS

I have a problem using a temp table as an OLE DB Source in SSIS.
I create the temp table in an Execute T-SQL Statement Task and I set DelayValidation : True and RetainSameConnection : True, but the problem is not solved.
Background
What's likely happening here is that the table does not exist at the moment. Temporary tables come in two variants: local and global.
A local temporary table uses a name with a single sharp/pound/hash/octothorpe prepended to it, i.e. #TEMP. The only query that can use that instance of the temporary table is the one that creates it. This is why the advice across the internet says you need to set RetainSameConnection to true: to ensure the connection that created the table is reused in the data flow. Otherwise, you're at the mercy of connection pooling; maybe the same connection is used in both places, maybe not, and believe me that's an unpleasant bit of randomness to try and debug. The reason for DelayValidation on the data flow is that as the package starts, the engine will validate that all the data looks as expected before it does any work. As the precursor step is what gets the data flow task into the expected state, we need to tell the execution to only validate the task immediately before execution. Validation always happens; it's just a matter of when you pay the price.
A global temporary table is defined with a double sharp/etc sign prepended to it, ##TEMP. This is accessible by any process, not just the connection that created it. It will live until the creating connection goes away (or explicitly drops it).
Resolution
Once the package is designed (the metadata is established in the data flow), using a local temporary table is going to work just fine. While developing it, though, it's impossible to use a local temporary table as a source in the data flow: if you execute the precursor step, that connection opens up, creates the temporary table, and then the connection goes away, as does the temporary table.
I would resolve this with the following steps:
Copy the query in your Execute SQL Task into a window in SSMS and create your local temporary table as a global temporary table, thus ##TEMP.
Create an SSIS variable called SourceQuery of type String with a value of SELECT * FROM ##TEMP;
Modify the "Data access mode" from the OLE DB Source from "SQL Command" to "SQL Command from Variable" and use the variable User::SourceQuery
Complete the design of the Data Flow
Save the package to ensure the metadata is persisted
Change the query in your variable from referencing ##TEMP to #TEMP
Save again.
Drop the ##TEMP table or close the connection
Run the package to ensure everything is working as expected.
Steps 2, 3, and 6 in the above allows you to emulate the magician pulling the tablecloth out from underneath all the dishes.
If you were to manually edit the query in the data flow itself from ##TEMP to #TEMP, the validation fires and since there is no #TEMP table available, it'll report VS_NEEDSNEWMETADATA and likely doesn't let you save the package. Using a variable as the query source provides a level of indirection that gets us around the "validate-on-change"/reinitialize metadata step.
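For reference, the pieces of SQL involved might look roughly like this (the table and column names are invented):
-- Execute SQL Task (design time, using a global temp table so the OLE DB Source can see it):
CREATE TABLE ##TEMP (id INT, amount DECIMAL(10, 2));

INSERT INTO ##TEMP (id, amount)
SELECT id, amount
FROM dbo.SourceTable;

-- Value of the SSIS variable User::SourceQuery at design time:
--   SELECT * FROM ##TEMP;
-- After the metadata is saved, switch both the task and the variable to #TEMP.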

mysql define variable across all sessions

I have a table with a few variables (let's say tbl_param). I have to query tbl_param every time I need to use one of them. I have a case where I need to use those variables inside triggers, so I have to query tbl_param each time those triggers are executed. But what if I am going to use the same variable for every user that connects to the db? It would be logical to set them only once, since they would not change often (only when the variable in question gets updated in tbl_param). I know I can set session variables, but that would not solve the problem, as they would be accessible only for the duration of one connection. So when a new connection is made, I would need to query tbl_param again. Can I define, for instance, a new system variable that gets loaded when the server boots up and that I could update as tbl_param gets updated? Or is there another alternative?
There are system variables that can be defined in the mysql.cnf (or mysql.ini) file, but this requires file permissions on that file. On my local server (Ubuntu 20.04.2) it is /etc/mysql/mariadb.conf.d/50-server.cnf. This would not work on a remote server, because you usually don't have access to the system files (/etc).
I found an alternative that serves the purpose you have in mind: SET a session variable (wait for it; I know session variables disappear in other sessions), but initialize its value from a table. I.e. always initialize your session variable on connection from a table (and update the table as required).
For example, in the case of using a PHP application (MIS) to disable MySQL triggers:
To disable a trigger on some table for some specific record(s), instead of dropping the triggers, inserting the records, and then recreating those triggers, just rewrite the trigger with a minor change so that it disables itself based on this session variable.
Then your MIS would always initialize a session variable to a value fetched from the table, and based on this value the triggers are skipped or executed.
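A sketch of that setup, assuming tbl_param has param_name/param_value columns and a hypothetical some_table whose trigger sets a created_at column:
-- Run once per connection, right after the application connects:
SET @triggers_disabled = (SELECT param_value
                          FROM tbl_param
                          WHERE param_name = 'triggers_disabled');

-- Trigger rewritten to honour the session variable:
DELIMITER //
CREATE TRIGGER trg_example BEFORE INSERT ON some_table
FOR EACH ROW
BEGIN
  IF IFNULL(@triggers_disabled, 0) = 0 THEN
    -- original trigger body goes here
    SET NEW.created_at = NOW();
  END IF;
END//
DELIMITER ;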

Is there a way to communicate application context to a DB connection in non-Sybase DB servers (similar to set_appcontext in Sybase)?

Sybase has a way for the application to communicate "context" data - such as the application's end-user name, etc. - to the database connection session. Context data is basically just a set of key-value pairs which is stored/retrieved via the set_appcontext/get_appcontext stored procs.
QUESTION:
Do other major DB servers (MSSQL/Oracle/MySQL) have a facility for communicating app context to the session similar to Sybase's set_appcontext?
Details:
One specific practical use of app context is when you have an application with middle tier connecting to the database as a very specific generic database user (examples include "webuser"/"http" for a web app back-end running on web server or "myappserver" user for an app server).
When that happens, we still want for the database session to know who the END user (e.g. actual user using the app client) is, either for access control or (more relevant to my interest), for an audit/history trigger to be able to determine which end user made the change and log that end user info into an audit table.
Please note that the info is set at the session level, which means that any inserts/updates/deletes executed within that session are able to use the context data without it being passed to each individual SQL statement - this is VERY important for, say, a trigger.
As a very specific example of why it's useful, let's say you have an app server starting a DB session on behalf of a client within which you insert/update/delete rows in 5 distinct tables. You want to have audit tables for each of those 5 tables, which include "which end user made each change" info.
Using context data, you can simply retrieve the "end user" data from the app context in the trigger and store it as part of the audit table record. Without using the app context, you will need to (1) add an "end user" column to every one of those 5 tables (instead of to only the audit tables) and (2) change your app server to insert or set-on-update the value of that column in EVERY SQL statement that the app server issues. Oh, and this doesn't even get into how this can be done if you're deleting a row.
Oracle has a couple of different ways of accomplishing this. First off, you have the DBMS_APPLICATION_INFO package. Although you can use this to set arbitrary context information, it is generally used for tracing an application. You would normally set the module to be the name of the application and the action to be a description of the particular business process. You can then reference this information from V$SESSION and monitor long-running operations via V$SESSION_LONGOPS.
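For instance, an application might tag its session like this (the module and action values are arbitrary):
begin
  dbms_application_info.set_module( module_name => 'MYAPP',
                                    action_name => 'load_invoices' );
end;
/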
Oracle also has the ability to create a database object called a context. This is a more flexible way to populate session-level context. You can create a new context and then create whatever attributes you'd like within that context. And all of your code can simply reference the context. For example
SQL> create context my_ctx
2 using pkg_ctx;
Context created.
SQL> create package pkg_ctx
2 as
3 procedure set_context;
4 end;
5 /
Package created.
SQL> create or replace package body pkg_ctx
2 as
3 procedure set_context
4 as
5 begin
6 dbms_session.set_context( 'MY_CTX', 'USERNAME', 'Justin Cave' );
7 end;
8 end;
9 /
Package body created.
SQL> exec pkg_ctx.set_context;
PL/SQL procedure successfully completed.
SQL> select sys_context( 'MY_CTX', 'USERNAME' )
2 from dual;
SYS_CONTEXT('MY_CTX','USERNAME')
-------------------------------------------------------------------------------
Justin Cave
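An audit trigger can then pull the end user out of the context; a minimal sketch, assuming hypothetical orders and orders_audit tables:
create or replace trigger trg_orders_audit
after insert or update or delete on orders
for each row
begin
  insert into orders_audit( order_id, changed_by, changed_at )
  values( coalesce( :new.id, :old.id ),
          sys_context( 'MY_CTX', 'USERNAME' ),
          sysdate );
end;
/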
For PostgreSQL you can create a custom variable class which is a configuration setting in postgresql.conf. Something like this:
custom_variable_classes = 'myvars'
(Setting this requires a server restart if I'm not mistaken)
Now through SQL you can read and write it in the following way:
set myvars.some_flag = 'true';
select current_setting('myvars.some_flag');
Note that you can "dynamically" define new "variables", as long as they are all prefixed with myvars. The individual values do not need to be declared in postgresql.conf.
Originally this was intended for add-on modules to allow the definition of custom configuration options, so it is a slight abuse of the feature, but it should work nevertheless.
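As a sketch of the audit use case from the question, a trigger function could read such a setting (the orders tables and the myvars.end_user key are assumptions following the same pattern as above):
CREATE OR REPLACE FUNCTION audit_orders() RETURNS trigger AS $$
BEGIN
    INSERT INTO orders_audit (order_id, changed_by, changed_at)
    VALUES (NEW.id, current_setting('myvars.end_user'), now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_audit
AFTER INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE PROCEDURE audit_orders();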

SQL Server: unique key for batch loads

I am working on a data warehousing project where several systems are loading data into a staging area for subsequent processing. Each table has a "loadId" column which is a foreign key against the "loads" table, which contains information such as the time of the load, the user account, etc.
Currently, the source system calls a stored procedure to get a new loadId, adds the loadId to each row that will be inserted, and then calls a third sproc to indicate that the load is finished.
My question is, is there any way to avoid having to pass back the loadId to the source system? For example, I was imagining that I could get some sort of connection Id from Sql Server, that I could use to look up the relevant loadId in the loads table. But I am not sure if Sql Server has a variable that is unique to a connection?
Does anyone know?
Thanks,
I assume the source systems are writing/committing the inserts into your source tables, and multiple loads are NOT running at the same time...
If so, have the source load call a stored proc, newLoadStarting(), prior to starting the load. This stored proc will update the load table (create a new row, record the start time).
Put a trigger on your loadID column that will get max(loadID) from this table and insert it as the current load id.
For completeness you could add an endLoading() proc which sets an end date and de-activates that particular load.
If you are running multiple loads at the same time in the same tables...stop doing that...it's not very productive.
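A rough T-SQL sketch of that approach; the loads table, stagingTable, and its rowId key are all invented names:
CREATE PROCEDURE newLoadStarting
AS
BEGIN
    -- Record a new load with its start time.
    INSERT INTO dbo.loads (startTime, isActive)
    VALUES (GETDATE(), 1);
END;
GO

-- Stamp freshly inserted staging rows with the current (max) loadId.
CREATE TRIGGER trg_staging_loadId ON dbo.stagingTable
AFTER INSERT
AS
BEGIN
    UPDATE s
    SET s.loadId = (SELECT MAX(loadId) FROM dbo.loads)
    FROM dbo.stagingTable AS s
    JOIN inserted AS i ON i.rowId = s.rowId;
END;
GO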
A local temp table (with one pound sign, #temp) is unique to the session: dump the ID in there, then select from it.
BTW, this will only work if you use the same connection.
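In T-SQL that looks something like this (assuming @loadId holds the value returned by the stored procedure, and #load is an arbitrary name):
CREATE TABLE #load (loadId INT);
INSERT INTO #load (loadId) VALUES (@loadId);

-- Later statements on the same connection can read it back:
SELECT loadId FROM #load;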
In the end, I went for the following solution "pattern", pretty similar to what Markus was suggesting:
I created a table with a loadId column, default null (plus some other audit info like createdDate and createdByUser);
I created a view on the table that hides the loadId and audit columns, and only shows rows where loadId is null;
The source systems load and view data through the view, not the table;
When they are done, the source system calls a "sp__loadFinished" procedure, which puts the right value in the loadId column and does some other logging (number of rows received, date called, etc). I generate this from a template as it is repetitive.
Because loadId now has a value for all those rows, it is no longer visible to the source system and it can start another load if required.
I also arrange for each source system to have its own schema, which is the only thing it can see and is its default on logon. The view and the sproc are in this schema, but the underlying table is in a "staging" schema containing data across all the sources. I ensure there are no collisions through a naming convention.
Works like a charm, including the one case where a load can only be complete if two tables have been updated.
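A simplified T-SQL sketch of this pattern; the schema, table, and column names are invented, and the real sp__loadFinished is generated from a template:
CREATE TABLE staging.customer (
    id            INT,
    name          VARCHAR(100),
    loadId        INT NULL,
    createdDate   DATETIME DEFAULT GETDATE(),
    createdByUser SYSNAME  DEFAULT SUSER_SNAME()
);
GO

-- What the source system sees: only rows not yet claimed by a load.
CREATE VIEW sourceA.customer
AS
SELECT id, name
FROM staging.customer
WHERE loadId IS NULL;
GO

CREATE PROCEDURE sourceA.sp__loadFinished
AS
BEGIN
    DECLARE @loadId INT;

    -- Log the load and pick up its id.
    INSERT INTO staging.loads (loadDate, loadedBy)
    VALUES (GETDATE(), SUSER_SNAME());
    SET @loadId = SCOPE_IDENTITY();

    -- Claiming the rows makes them disappear from the view.
    UPDATE staging.customer
    SET loadId = @loadId
    WHERE loadId IS NULL;
END;
GO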

When a new row in database is added, an external command line program must be invoked

Is it possible for MySQL database to invoke an external exe file when a new row is added to one of the tables in the database?
I need to monitor the changes in the database, so when a relevant change is made, I need to do some batch jobs outside the database.
Chad Birch has a good idea with using MySQL triggers and a user-defined function. You can find out more in the MySQL CREATE TRIGGER Syntax reference.
But are you sure that you need to call an executable right away when the row is inserted? It seems like that method will be prone to failure, because MySQL might spawn multiple instances of the executable at the same time. If your executable fails, then there will be no record of which rows have been processed yet and which have not. If MySQL is waiting for your executable to finish, then inserting rows might be very slow. Also, if Chad Birch is right, then you will have to recompile MySQL, so it sounds difficult.
Instead of calling the executable directly from MySQL, I would use triggers to simply record the fact that a row got INSERTED or UPDATED: record that information in the database, either with new columns in your existing tables or with a brand new table called say database_changes. Then make an external program that regularly reads the information from the database, processes it, and marks it as done.
Your specific solution will depend on what parameters the external program actually needs.
If your external program needs to know which row was inserted, then your solution could be like this: Make a new table called database_changes with fields date, table_name, and row_id, and for all the other tables, make a trigger like this:
DELIMITER //
CREATE TRIGGER `my_trigger`
AFTER INSERT ON `table_name`
FOR EACH ROW BEGIN
  INSERT INTO `database_changes` (`date`, `table_name`, `row_id`)
  VALUES (NOW(), 'table_name', NEW.id);
END//
DELIMITER ;
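The database_changes table itself might be created like this (the column types and the surrogate id key are assumptions):
CREATE TABLE `database_changes` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `date` DATETIME NOT NULL,
  `table_name` VARCHAR(64) NOT NULL,
  `row_id` INT UNSIGNED NOT NULL,
  PRIMARY KEY (`id`)
);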
Then your batch script can do something like this:
Select the first row in the database_changes table.
Process it.
Remove it.
Repeat 1-3 until database_changes is empty.
With this approach, you can have more control over when and how the data gets processed, and you can easily check to see whether the data actually got processed (just check to see if the database_changes table is empty).
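Each iteration of the batch script then boils down to something like the following (using the surrogate id from the sketch above; the literal 123 stands for the id just read):
SELECT `id`, `table_name`, `row_id`
FROM `database_changes`
ORDER BY `id`
LIMIT 1;

-- ... run the external program for that row ...

DELETE FROM `database_changes` WHERE `id` = 123;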
You could do what replication does: hook into the 'binary log'. Set up your server as a 'master server', and instead of adding a 'slave server', run mysqlbinlog. You'll get a stream of every command that modifies your database.
Or step in 'between' the client and server: check out MySQL Proxy. You point it at your server, and point your client(s) at the proxy. It lets you interpose Lua scripts to monitor, analyze or transform any SQL command.
I think it's going to require adding a User-Defined Function, which I believe requires recompilation:
MySQL FAQ - Triggers: Can triggers call an external application through a UDF?
I think it's really a MUCH better idea to have some external process poll changes to the table and execute the external program - you could also have a column which contains the status of this external program run (e.g. "pending", "failed", "success") - and just select rows where that column is "pending".
It depends how soon the batch job needs to be run. If it's something which needs to be run "sooner or later" and can fail and need to be retried, definitely have an app polling the table and running them as necessary.
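A sketch of that status-column variant (the table, id, and column names are assumptions):
ALTER TABLE `some_table`
  ADD COLUMN `batch_status` ENUM('pending', 'success', 'failed')
      NOT NULL DEFAULT 'pending';

-- The polling app picks up work with:
SELECT * FROM `some_table` WHERE `batch_status` = 'pending';

-- ...and marks each row once the external program has run:
UPDATE `some_table` SET `batch_status` = 'success' WHERE `id` = 123;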