How can I synchronize two MariaDB databases from a given date onward?

I am trying to replicate my production database and, in the process, clean up some tables. In the new copy I got rid of the old entries (they were moved to a "historical" db), which took a lot of time and a long chain of deletes. Now that I have it the way I want it, I need to synchronize the newest entries from the production database (which is running and cannot be stopped), but I don't want the old entries to come back. The intention is to switch between the two databases at some point.
I thought of using Navicat's Data Synchronization tool, but it wouldn't allow me to filter the entries based on dates (or apply any filter at all, for that matter). Can anyone suggest a method or tool for this task?
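Something along these lines is essentially what I'm after, just automated for every table (a rough sketch only; it assumes both databases live on the same server, the schemas match, and the table/column names and cut-off date are placeholders):

INSERT INTO clean_db.orders
SELECT *
FROM   prod_db.orders
WHERE  created_at >= '2024-01-01'          -- the cut-off date
ON DUPLICATE KEY UPDATE
       status     = VALUES(status),        -- columns that may have changed
       updated_at = VALUES(updated_at);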
Thanks in advance.

Related

mysql automatic replication of partial data

I have to create a dashboard based on a table in MySQL, using only today's data.
This db backs a service with a massive amount of data and continuous reads and writes, so I'd like to replicate part of this table (only today's data) into a "slave" instance.
Is it possible to do this in MySQL, without scripting?
Thanks
MySQL has no built-in feature to replicate a subset of rows. There are replication filters to replicate a subset of schemas or tables, but not rows.
One workaround could be to replicate fully to the replica, then on the replica delete any data that is more than one day old.
But this would work only for a database that is INSERT-only. If you also have UPDATE and DELETE operations replicated, they might find that they are trying to change rows that are missing. If you use ROW-based binary logs, this would result in a replication error when it can't find the row, and replication would stop.
It might work if you only use STATEMENT-based binary logs, but I've never tried it so I can't predict what other problems might occur. Also, you can't fully prevent ROW-based binary logs from occurring, because individual sessions can change their binary log format.
I think you're going to need a bespoke solution no matter what. Probably not using replication, but just an ETL job to query the current day's data and import it into another MySQL instance (not a replica).
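As a rough sketch, that ETL step could boil down to a query like the one below, assuming the source table has an indexed created_at column (table and column names are placeholders):

SELECT *
FROM   orders
WHERE  created_at >= CURDATE()
  AND  created_at <  CURDATE() + INTERVAL 1 DAY;
-- the job then loads this result set into the separate dashboard instance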

Best way for incremental load in ssis

I am getting 600,000 rows daily from my source and I need to dump them into the SQL Server destination, which would be an incremental load.
Now, as the destination table is likely to grow day by day, which would be the best approach for the incremental load? I have a few options in mind:
Lookup Task
Merge Join
SCD
etc.
Please suggest the best option that will perform well for an incremental load.
Look at Andy Leonard's excellent Stairway to Integration Services series or Todd McDermid's videos on how to use the free SSIS Dimension Merge SCD component. Both will address how to do it right far better than I could explain in this box.
Merge Join is a huge performance problem, as it requires sorting all records up front, and should not be used for this.
We process many multimillion-record files daily and generally place them in a staging table, then do a hash compare against the data in our change data tracking tables to see whether the data is different from what is on prod, and only load the rows that are new or different. Because we do the comparison outside of our production database, we have very little impact on prod: instead of checking millions of records against prod, we are only dealing with the 247 rows that it actually needs. In fact, for our busiest server, all of this processing happens on a separate server except for the last step that goes to prod.
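As a rough illustration of that staging-plus-hash-compare step (names are made up, and the RowHash column is assumed to be computed when the staging table is loaded, e.g. with HASHBYTES):

INSERT INTO dbo.Customer_Delta (CustomerId, Name, City, RowHash)
SELECT s.CustomerId, s.Name, s.City, s.RowHash
FROM   dbo.Customer_Staging AS s
LEFT JOIN dbo.Customer_Tracking AS t
       ON t.CustomerId = s.CustomerId
WHERE  t.CustomerId IS NULL          -- brand new row
   OR  t.RowHash <> s.RowHash;       -- existing row whose contents changed
-- only the rows in Customer_Delta are pushed on to production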
If you only need to insert them, it doesn't actually matter.
If you need something like "if exists, update, else insert", I suggest creating an OLE DB Source where you query your 600,000 rows and check whether they exist with a Lookup task against the existing data source. Since the existing data source is (or tends to be) HUGE, be careful with the way you configure the caching mode. I would go with a partial cache with some memory limit, ordered by the ID you are looking up (this detail is very important because of the way the caching works).

Transfer MySQL data from machineX to machineY

I want to collect MySQL data from 10 different machines and aggregate it into one big MySQL db on a different machine. All machines are Linux based.
What is the "mysqldump" syntax if I want to do this periodically to collect only the "delta" data?
Are there any other ways to achieve this?
This isn't natively supported in MySQL. You could use replication, but a replica can have only a single master, not 10 masters. I know of two workable options:
1) Script something up that switches the replica between its masters in a round-robin fashion (a sketch of one switch step follows this list). You might wish to refer to http://code.google.com/p/mysql-mmre/ or http://thenoyes.com/littlenoise/?p=117.
2) Use an ETL tool.
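As a rough sketch, one round-robin step run on the aggregate replica would look something like this (host, credentials, and binlog coordinates are placeholders your script would track and fill in):

STOP SLAVE;
CHANGE MASTER TO
    MASTER_HOST     = 'machine3',
    MASTER_USER     = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000042',  -- coordinates saved the last time
    MASTER_LOG_POS  = 107;                 -- this master was replicated
START SLAVE;
-- let it catch up, record the new coordinates, then move on to the next master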
If you get stuck, we (Percona) can help you. This is a common request, but not an easy one, because each case is different.
mysqldump can't generate incremental backups, as it doesn't have any way of determining which rows (or what parts of the schema!) have changed since the last backup, or indeed even when the last backup was. For that you'd need something which could read the MySQL binlog and convert it into a bunch of INSERT/UPDATE/DELETE statements; I'm not aware of anything that exists quite like that.
The current "state of the art" in MySQL backups is generally considered to be Percona XtraBackup.
Multiple master-slave? Have each of the 10 as masters, and the aggregate server as a slave to all 10. This assumes that the data you are aggregating is different on each of the 10. If the data is the same (or similar) on all 10 and you want to interleave it as well as integrate it, then this won't work.

What is the best way to update (or replace) an entire database table on a live machine?

I'm being given a data source weekly that I'm going to parse and put into a database. The data will not change much from week to week, but I should be updating the database on a regular basis. Besides this weekly update, the data is static.
For now rebuilding the entire database isn't a problem, but eventually this database will be live and people could be querying the database while I'm rebuilding it. The amount of data isn't small (couple hundred megabytes), so it won't load that instantaneously, and personally I want a bit more of a foolproof system than "I hope no one queries while the database is in disarray."
I've thought of a few different ways of solving this problem, and was wondering what the best method would be. Here's my ideas so far:
Instead of replacing entire tables, query for the difference between my current database and what I want to place in the database. This seems like it could be an unnecessary amount of work, though.
Creating dummy data tables, then doing a table rename (or having the server code point towards the new data tables).
Just telling users that the site is going through maintenance and put the system offline for a few minutes. (This is not preferable for obvious reasons, but if it's far and away the best answer I'm willing to accept that.)
Thoughts?
I can't speak for MySQL, but PostgreSQL has transactional DDL. This is a wonderful feature, and means that your second option, loading new data into a dummy table and then executing a table rename, should work great. If you want to replace the table foo with foo_new, you only have to load the new data into foo_new and run a script to do the rename. This script should execute in its own transaction, so if something about the rename goes bad, both foo and foo_new will be left untouched when it rolls back.
The main problem with that approach is that it can get a little messy to handle foreign keys from other tables that key on foo. But at least you're guaranteed that your data will remain consistent.
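A minimal sketch of that load-then-rename step, assuming the fresh data set has already been loaded into foo_new:

BEGIN;
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE foo_new RENAME TO foo;
COMMIT;
-- once you're happy with the result: DROP TABLE foo_old;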
A better approach in the long term, I think, is just to perform the updates on the data directly (your first option). Once again, you can stick all the updating in a single transaction, so you're guaranteed all-or-nothing semantics. Even better would be online updates, just updating the data directly as new information becomes available. This may not be an option for you if you need the results of someone else's batch job, but if you can do it, it's the best option.
BEGIN;
-- Swap in the new data inside a single transaction; "my_table" and the
-- staging source are placeholders for your own table names.
DELETE FROM my_table;
INSERT INTO my_table SELECT * FROM my_table_staging;
COMMIT;
Users will see the changeover instantly when you hit commit. Any queries started before the commit will run on the old data, anything afterwards will run on the new data. The database will actually clear the old table once the last user is done with it. Because everything is "static" (you're the only one who ever changes it, and only once a week), you don't have to worry about any lock issues or timeouts. For MySQL, this depends on InnoDB. PostgreSQL does it, and SQL Server calls it "snapshotting," and I can't remember the details off the top of my head since I rarely use the thing.
If you Google "transaction isolation" + the name of whatever database you're using, you'll find appropriate information.
We solved this problem by using PostgreSQL's table inheritance/constraints mechanism.
You create a trigger that auto-creates sub-tables partitioned based on a date field.
This article was the source I used.
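A stripped-down sketch of that kind of trigger (PostgreSQL inheritance-style partitioning; the measurements table, its columns, and the weekly scheme are all placeholders):

CREATE TABLE measurements (
    id         bigserial,
    created_at date NOT NULL,
    payload    text
);

-- Routes each insert into a weekly child table, creating it on demand.
CREATE OR REPLACE FUNCTION measurements_insert() RETURNS trigger AS $$
DECLARE
    child text := 'measurements_' || to_char(NEW.created_at, 'IYYY_IW');
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I (CHECK (created_at >= %L AND created_at < %L)) INHERITS (measurements)',
        child,
        date_trunc('week', NEW.created_at)::date,
        (date_trunc('week', NEW.created_at) + interval '1 week')::date);
    EXECUTE format('INSERT INTO %I VALUES ($1.*)', child) USING NEW;
    RETURN NULL;  -- the row has already been stored in the child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_partition
BEFORE INSERT ON measurements
FOR EACH ROW EXECUTE PROCEDURE measurements_insert();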
Which database server are you using? SQL 2005 and above provides a locking method called "Snapshot". It allows you to open a transaction, do all of your updates, and then commit, all while users of the database continue to view the pre-transaction data. Normally, your transaction would lock your tables and block their queries, but snapshot locking would be perfect in your case.
More info here: http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx
But it requires SQL Server, so if you're using something else....
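As a rough sketch of what that looks like (SQL Server; database and table names are placeholders):

-- Enable row versioning so readers are not blocked by the bulk update
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Writer: replace the data inside one transaction
BEGIN TRANSACTION;
    DELETE FROM dbo.WeeklyData;
    INSERT INTO dbo.WeeklyData SELECT * FROM dbo.WeeklyData_Staging;
COMMIT TRANSACTION;

-- Readers that opt into snapshot isolation keep seeing the old rows
-- until the writer commits
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT COUNT(*) FROM dbo.WeeklyData;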
Several database systems (since you didn't specify yours, I'll keep this general) offer the SQL:2003 standard statement called MERGE, which will basically allow you to do the following (see the sketch after this list):
insert new rows into a target table from a source which don't exist there yet
update existing rows in the target table based on new values from the source
optionally even delete rows from the target that don't show up in the import table anymore
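A bare-bones sketch of such a MERGE (using SQL Server 2008's syntax for the optional delete clause; table and column names are made up):

MERGE INTO target AS t
USING weekly_import AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.name = s.name, t.price = s.price
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, name, price) VALUES (s.id, s.name, s.price)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;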
SQL Server 2008 is the first Microsoft offering to have this statement - see the SQL Server documentation for more details.
Other database systems will probably have similar implementations - it's a SQL:2003 standard statement, after all.
Marc
Use different table names (mytable_[yyyy]_[wk]) and a view to provide a constant name (mytable). Once a new table is completely imported, update your view so that it uses that table.
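A rough sketch of that swap (MySQL syntax; the week-specific names below are just examples):

CREATE TABLE mytable_2024_07 LIKE mytable_2024_06;
-- ...load the new week's data into mytable_2024_07...
CREATE OR REPLACE VIEW mytable AS SELECT * FROM mytable_2024_07;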

Best way to archive live MySQL database

We have a live MySQL database that is 99% INSERTs, around 100 per second. We want to archive the data each day so that we can run queries on it without affecting the main, live database. In addition, once the archive is completed, we want to clear the live database.
What is the best way to do this without (if possible) locking INSERTs? We use INSERT DELAYED for the queries.
http://www.maatkit.org/ has mk-archiver
archives or purges rows from a table to another table and/or a file. It is designed to efficiently “nibble” data in very small chunks without interfering with critical online transaction processing (OLTP) queries. It accomplishes this with a non-backtracking query plan that keeps its place in the table from query to query, so each subsequent query does very little work to find more archivable rows.
Another alternative is to simply create a new database table each day. MyISAM does have some advantages for this, since INSERTs to the end of the table generally don't block anyway, and there is a MERGE table type to bring them all back together. A number of websites log their httpd traffic to tables like that.
With MySQL 5.1, there are also partitioned tables that can do much the same.
I use MySQL partitioned tables and I've achieved wonderful results in all aspects.
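A bare-bones sketch of such a partitioned table (MySQL 5.1+; table, column, and partition names are placeholders):

CREATE TABLE events (
    id      INT NOT NULL AUTO_INCREMENT,
    created DATE NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (id, created)
) PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
    PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Archiving a day then becomes a cheap metadata operation instead of a big DELETE:
ALTER TABLE events DROP PARTITION p20240101;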
Sounds like replication is the best solution for this. After the initial sync the slave gets updates via the Binary Log, thus not affecting the master DB at all.
More on replication.
mk-archiver is an elegant tool to archive MySQL data.
http://www.maatkit.org/doc/mk-archiver.html
MySQL replication would work perfectly for this.
Master -> the live server.
Slave -> a different server on the same network.
Could you keep two mirrored databases around? Write to one and keep the second as an archive. Switch every, say, 24 hours (or however long you deem appropriate). Into the database that was the archive, insert all of today's activity; then the two databases should match. Use this as the new live db. Take the archived database and do whatever you want with it. You can back up / extract / read all you want now that it's not being actively written to.
It's kind of like having mirrored RAID, where you can take one drive offline for backup, resync it, then take the other drive offline for backup.
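A rough sketch of one way to implement that switch in MySQL, per table, using an atomic multi-table rename (database and table names are placeholders):

RENAME TABLE live.events      TO live.events_prev,
             archive.events   TO live.events,
             live.events_prev TO archive.events;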