Skip some columns replication on slave in MySQL


I have an identical table on master and slave. Both have many columns, with price among them. Is there any way to set individual values for the price column on the slave so that replication won't overwrite them?
Our system works in many countries that have individual prices, but we would also like to share some common data from the central database.
I thought up some solutions:
1. Extract all country-specific columns to other tables that won't be replicated. This would require a lot of changes in our source code.
2. Create a proxy database with only the common shared columns that replicates from the master, and then set it as the master of the country-based database with the full column definition. It looks ugly to me.
Is there something better? I would appreciate any help.

I really don't think replication is designed to do "almost-replication" like you're asking. I don't think you can replicate just part of a table.
Your first solution sounds the most reasonable to me.
In general, I'd come up with a software solution that works with no replication, or with all tables being replicated. Then add replication in production to add redundancy or performance. Anything else sounds like it'll be pretty complex to develop and test.
(Me: I'm not exactly a MySQL expert, though I have set up MySQL replication a couple times.)

There is a slave-skip-columns patch in Percona Server 5.1. You might want to take a look.

This is an old question, but comes up first in Google so I thought this is relevant and useful: https://dev.mysql.com/doc/refman/5.5/en/replication-features-differing-tables.html
In short, if your common columns in a table, on master and on slave, are of the same type and in the same order, then replication works even if there are extra columns at the end on either master or slave. For mismatching data types, there are some rules.
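As a rough illustration of the extra-columns rule, here is a minimal sketch assuming a hypothetical `products` table with an `id` key and the `price` column from the question (the table name, `id`, and `local_price` are invented for the example). The linked manual page also asks for extra columns to have a default value, and with statement-based logging an INSERT on the master that omits its column list would no longer match the slave's column count.

```sql
-- Run on the slave only. `products`, `id` and `local_price` are hypothetical names.
-- ADD COLUMN appends at the end by default, so the new column comes after
-- every column shared with the master, and it has a default value.
ALTER TABLE products
  ADD COLUMN local_price DECIMAL(10,2) NULL DEFAULT NULL;

-- Country-specific values are written directly on the slave.
-- Replicated updates from the master do not reference this column.
UPDATE products SET local_price = 19.90 WHERE id = 42;

-- Reads on the slave can fall back to the replicated master price
-- when no local value has been set.
SELECT id, COALESCE(local_price, price) AS effective_price
FROM products
WHERE id = 42;
```

Note that this keeps the replicated `price` column intact and stores the local value next to it, so the application on the slave has to read the combined value.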

Related

MySQL replication/synchronization: purge from master but not from slave

I came across this problem a few days ago and have been tinkering with and pondering several different approaches, but I cannot seem to find a good answer:
I have two MySQL servers, one master/hot and one slave/archive. All write requests go to the master and shall also (eventually) be replicated/copied to the slave. However, certain data on the master grows "stale" after a while (say a week) and shall then be purged, to keep the master's tables short. This purge should, however, not affect the slave. How can I go about achieving this?
Essentially, my master database acts sort of like a "hot" database, where data is fresh and is purged once it goes old. It should contain data that users might need quickly, and thus we want to keep the tables small. My slave, on the other hand, works more like an archive, which should contain all data, regardless of "hotness". Queries to the slave don't need to execute quickly, and the slave's data can lag behind by a few minutes, but it needs to contain all records since our beginning of time.
My initial thought was to use ordinary replication, but can I somehow filter certain queries so that they do not affect the slave? I was thinking of creating a purge query that removes old data from the master but doesn't affect the slave. From reading the MySQL documentation, it seems that this filtering can only be done at the database or table level.
Another thought was to do this via an external application that manually SELECTs data from the master and INSERTs it into the slave, using some clever logic to decide what data to select. This works well for log tables, which only ever add data, but it doesn't work well for tables that represent state, such as user settings. This approach will probably also involve a lot of special cases, as I cannot find a good, consistent way of describing all tables in our database (there are log tables, state tables, config tables and a few which I cannot really categorize).
None of these approaches seem to solve the problem in a simple fashion, but I feel I cannot be the first to have this problem. Any ideas are welcome, and thanks in advance.
If more info is needed, feel free to comment and I'll edit it in

Just use regular replication. When you delete data on the master, run the delete in a session with binary logging switched off:
SET sql_log_bin = 0;
DELETE FROM my_table WHERE whatever = true;
SET sql_log_bin = 1;
This prevents those statements from being written to the binary log, so they won't be replicated to the slave.
read more about it here
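For the weekly purge described in the question, the same idea might look like the sketch below. The `created_at` column and the batch size are assumptions for illustration only. `sql_log_bin` is a session variable and changing it needs an elevated privilege (SUPER on older versions), so the purge job has to run under an account that is allowed to set it.

```sql
-- Run on the master, in a dedicated purge session.
-- `created_at` is a hypothetical timestamp column on my_table.
SET sql_log_bin = 0;

-- Delete in limited batches so each statement stays short.
-- Repeat this statement until ROW_COUNT() reports 0 affected rows.
DELETE FROM my_table
WHERE created_at < NOW() - INTERVAL 7 DAY
LIMIT 10000;

-- Re-enable logging before the session does any other writes,
-- otherwise those writes would silently skip the slave as well.
SET sql_log_bin = 1;
```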

How may I synchronize two mariadb databases from a date on?

I am trying to replicate my production database and, in the process, clean some tables. In the new one, I got rid of old entries (moved to a "historical" db). This took a lot of time and a long chain of deletes. Now that I have it the way I want, I need to synchronize the newest entries from the production database (which is running and cannot be stopped), but I don't want the old entries to come back. The intention is to switch between the two databases at some point.
I thought of using Navicat's Data Synchronization tool, but it wouldn't allow me to filter the entries based on dates (or any filter at all, for what it's worth). Can anyone suggest a method/tool to use for this task?
Thanks in advance.

Performing Heavy Crunching On a Table Without Affecting the Table

I'm looking for some general advice on the best way to perform heavy crunching/data-mining on a database table, without affecting the performance of regular site queries on the table. Some of the calculations may involve joining several tables, and involve complex sorting and ordering. So "use better indexes" isn't always the solution.
This question isn't really specific. I'm looking for a general way to solve a problem that's come up many times over the years. So I don't have a specific table schema to show, a specific query to show. I've considered dumping the table first using mysqldump, and then re-importing the table under a different name, and then performing my heavy crunching on that temp table. My sysadmin hates the idea, so I'm looking for any other solutions people have come up with to deal with this type of problem.

If your "heavy crunching" is all read only and you are not doing anything that needs to be written back into your production data, use a Master/Slave replication and use the Slave for all your reporting and data analysis needs. The replication link will keep the values up to date on the Slave, and you can hit the Slave with as much load as you want without slowing down the Master which is serving your production system.

If you want to avoid affecting performance of your production database, the only solution I have used previously is to run your queries on another database server.
I would take a backup of the entire database and then restore it on a separate server.
Obviously, you cannot do this if you want to analyze real-time data. But for most analysis, a snapshot from the previous day is sufficient.

Replication with lots of temporary table writes

I've got a database which I intend to replicate for backup reasons (performance is not a problem at the moment).
We've set up the replication correctly and tested it and all was fine.
Then we realized that it replicates all the writes to the temporary tables, which in effect meant that replication of one day's worth of data took almost two hours for the idle slave.
The reason for that is that we recompute some of the data in our db via cronjob every 15 mins to ensure it's in sync (it takes ~3 minutes in total, so it is unacceptable to do those operations during a web request; instead we just store the modifications without attempting to recompute anything while in the web request, and then do all of the work in bulk). In order to process that data efficiently, we use temporary tables (as there's lots of interdependencies).
Now, the first problem is that temporary tables do not persist if we restart the slave while it's in the middle of processing transactions that use that temp table. That can be avoided by not using temporary tables, although this has its own issues.
The more serious problem is that the slave could easily catch up in less than half an hour if it weren't for all that recomputation (which it does one run after the other, so there's no benefit in rebuilding the data every 15 minutes... and you can literally see it stuck at, say, 11:15, only to quickly catch up and get stuck again at 11:30, etc.).
One solution we came up with is to move all that recomputation out of the replicated db, so that the slave doesn't replicate it. But it has disadvantages in that we'd have to prune the tables it eventually updates, making our slave in effect "castrated", ie. we'd have to recompute everything on it before we could actually use it.
Did anyone have a similar problem and/or how would you solve it? Am I missing something obvious?

I've come up with the solution. It makes use of replicate-do-db mentioned by Nick. Writing it down here in case somebody had a similar problem.
The problem with just using replicate-(wild-)do* options in this case (like I said, we use temp tables to repopulate a central table) is that either you ignore temp tables and repopulate the central one with no data (which causes further problems as all the queries relying on the central table being up-to-date will produce different results) or you ignore the central table, which has a similar problem. Not to mention, you have to restart mysql after adding any of those options to my.cnf. We wanted something that would cover all those cases (and future ones) without the need for any further restart.
So, what we decided to do is to split the database into the "real" and a "workarea" databases. Only the "real" database is replicated (I guess you could decide on a convention of table names to be used for replicate-wild-do-table syntax).
All the temporary table work is happening in "workarea" db, and to avoid the dependency problem mentioned above, we won't populate the central table (which sits in "real" db) by INSERT ... SELECT or RENAME TABLE, but rather query the tmp tables to generate a sort of a diff on the live table (ie. generate INSERT statements for new rows, DELETE for the old ones and update where necessary).
This way the only queries that are replicated are exactly the updates that are required and nothing else, i.e. some (most?) of the recomputation queries happening every fifteen minutes might not even make their way to the slave, and the ones that do will be minimal and not computationally expensive at all, just simple INSERTs and DELETEs.
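The diff step described above could be sketched roughly as follows. This is not the author's actual code: the database names `app_real` and `app_workarea`, the tables `central` and `central_new`, and the columns `id` and `val` are all hypothetical. The three queries only find the differences; the cron job would then issue one literal INSERT, DELETE or UPDATE against `app_real.central` per result row, so that nothing in the replicated statements refers to the work area.

```sql
-- Hypothetical schema:
--   app_real.central         live table, replicated      (id PRIMARY KEY, val)
--   app_workarea.central_new freshly recomputed, local   (id PRIMARY KEY, val)

-- Rows that are new: emit one INSERT INTO app_real.central per result row.
SELECT n.id, n.val
FROM app_workarea.central_new AS n
LEFT JOIN app_real.central AS c ON c.id = n.id
WHERE c.id IS NULL;

-- Rows that no longer exist: emit one DELETE per result row.
SELECT c.id
FROM app_real.central AS c
LEFT JOIN app_workarea.central_new AS n ON n.id = c.id
WHERE n.id IS NULL;

-- Rows whose value changed: emit one UPDATE per result row.
-- <=> is MySQL's NULL-safe equality, so NULL versus non-NULL counts as a change.
SELECT n.id, n.val
FROM app_workarea.central_new AS n
JOIN app_real.central AS c ON c.id = n.id
WHERE NOT (c.val <=> n.val);
```

When a recomputation changes nothing, all three queries return no rows and nothing is written to the replicated database for that run.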

In MySQL, as of 5.0 I believe, you can do table wildcards to replicate specific tables. There are a number of command-line options that can be set but you can also do this via your MySQL config file.
[mysqld]
replicate-do-db = db1
replicate-do-table = db2.mytbl2
replicate-wild-do-table = database_name.%
replicate-wild-do-table = another_db.%
The idea being that you tell it to not replicate any tables other than the ones you specify.

Best way to archive live MySQL database

We have a live MySQL database that is 99% INSERTs, around 100 per second. We want to archive the data each day so that we can run queries on it without affecting the main, live database. In addition, once the archive is completed, we want to clear the live database.
What is the best way to do this without (if possible) locking INSERTs? We use INSERT DELAYED for the queries.

http://www.maatkit.org/ has mk-archiver
archives or purges rows from a table to another table and/or a file. It is designed to efficiently “nibble” data in very small chunks without interfering with critical online transaction processing (OLTP) queries. It accomplishes this with a non-backtracking query plan that keeps its place in the table from query to query, so each subsequent query does very little work to find more archivable rows.
Another alternative is to simply create a new database table each day. MyISAM does have some advantages for this, since INSERTs to the end of the table don't generally block anyway, and there is a MERGE table type to bring them all back together. A number of websites log their httpd traffic to tables like that.
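A minimal sketch of the table-per-day idea, assuming hypothetical table names and a made-up two-column log layout:

```sql
-- One MyISAM table per day; names and columns are illustrative only.
CREATE TABLE access_log_20240101 (
  ts  DATETIME     NOT NULL,
  url VARCHAR(255) NOT NULL,
  KEY (ts)
) ENGINE=MyISAM;

CREATE TABLE access_log_20240102 LIKE access_log_20240101;

-- A MERGE table presents the daily tables as one table for querying.
-- Its column and index definitions must match the underlying tables.
CREATE TABLE access_log_all (
  ts  DATETIME     NOT NULL,
  url VARCHAR(255) NOT NULL,
  KEY (ts)
) ENGINE=MERGE UNION=(access_log_20240101, access_log_20240102) INSERT_METHOD=NO;

-- When a new day's table is created, the union list is redefined.
-- ALTER TABLE access_log_all UNION=(access_log_20240101, access_log_20240102, access_log_20240103);
```

Archiving a day then amounts to taking its table out of the union list and moving or dropping that table.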
With MySQL 5.1, there are also partitioned tables that can do much the same.
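With partitioning, a daily layout might look like the sketch below. The table, columns and partition names are hypothetical. MySQL requires the partitioning column to be part of every unique key, which is why `created` is included in the primary key here.

```sql
-- Illustrative table partitioned by day.
CREATE TABLE events (
  id      BIGINT       NOT NULL AUTO_INCREMENT,
  created DATE         NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id, created)
)
PARTITION BY RANGE (TO_DAYS(created)) (
  PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
  PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Make room for the next day by splitting the catch-all partition.
ALTER TABLE events REORGANIZE PARTITION pmax INTO (
  PARTITION p20240103 VALUES LESS THAN (TO_DAYS('2024-01-04')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Once a day has been archived elsewhere, drop its partition.
-- This removes the rows without a row-by-row DELETE.
ALTER TABLE events DROP PARTITION p20240101;
```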

I use MySQL partitioned tables and I've achieved wonderful results in all aspects.

Sounds like replication is the best solution for this. After the initial sync, the slave gets its updates via the binary log, so queries run against the slave put very little load on the master DB.
More on replication.

mk-archiver is an elegant tool for archiving MySQL data.
http://www.maatkit.org/doc/mk-archiver.html

MySQL replication would work perfectly for this.
Master -> the live server.
Slave -> a different server on the same network.
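Pointing the slave at the master could look roughly like this on the MySQL versions discussed here. The host name, account, password and binary log coordinates are placeholders, not values from the question. The sketch also assumes binary logging is enabled on the master and that the two servers have different server IDs. Recent MySQL releases spell these statements CHANGE REPLICATION SOURCE TO and START REPLICA.

```sql
-- Run on the slave. All values are placeholders; the log file and position
-- come from SHOW MASTER STATUS on the master, taken at the moment the
-- initial copy of the data was made.
CHANGE MASTER TO
  MASTER_HOST     = 'live-db.example.internal',
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = 'change-me',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS  = 4;

START SLAVE;

-- Shows whether the I/O and SQL threads are running and how far behind
-- the slave currently is.
SHOW SLAVE STATUS;
```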

Could you keep two mirrored databases around? Write to one, keep the second as an archive. Switch every, say, 24 hours (or however long you deem appropriate). Into the database that was the archive, insert all of today's activity. Then the two databases should match. Use this as the new live db. Take the archived database and do whatever you want to it. You can backup/extract/read all you want now that it's not being actively written to.
It's kind of like having mirrored RAID, where you can take one drive offline for backup, resync it, then take the other drive out for backup.
