What happens when I change my replication factor manually? - hadoop2

What will happen when I change my replication factor manually from 3 to 4? Does anything change for the past data, and how is replication handled after the change?
Please provide all possible solutions.

If you change the replication factor manually, it will apply only to data written to the cluster from then on. It won't change the replication factor of the old data.
Suppose you have data with 3 replicas and you now decide to change the replication factor to 4: the old data will keep 3 replicas, but any new data you ingest will get 4.
If you also want to change the replication of your old data, you have to do it manually. For that, Hadoop provides a command:
hdfs dfs -setrep -w [replication_factor] [/dir/path]
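For example, to raise the old data to 4 replicas and wait for the re-replication to finish (the path below is only a placeholder for your own directory):
hdfs dfs -setrep -w 4 /user/hadoop/olddata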

Related

MySQL - row filter replication

I've been having a hard time figuring out how to configure a distributed database system in MySQL. I have 3 servers, each with the same database model. I have configured them to use replication, so there is 1 master and 2 slaves. Everything works fine, but what I want to do next is to "filter" the replicated data.
Let's say that I have two tables: customers and products. I want to replicate all content of the products table to both slaves (I have that part working), but I want to replicate customers from Europe to Slave1 and customers from Asia to Slave2. So the master will contain all information about customers, but each slave only a part of it. How can I achieve that?
As far as I know, replication itself doesn't support this kind of filtering. I am not sure, but it seems like partitioning is not the answer either.
Are there any built-in mechanisms in MySQL that may help? If not, how would you resolve this situation?
For this you have to use a custom way to read the data and, based on your business logic, write it to slave 1 or slave 2.
For instance, you can read change events from the binlog and, based on the row data, store them in different slave databases. However, there may be a small delay.
There is a very good system that does exactly this: Maxwell (https://github.com/zendesk/maxwell), which uses Kafka to publish database change events including the query type and the old and new data. On that basis you can write a Kafka consumer that reads the events and pushes them into your slave databases. Maxwell reads the binlog the same way a replication slave does and is reliable.
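As a rough illustration (not an exact recipe), starting Maxwell with a Kafka producer looks something like the command from its quickstart; the host, credentials and topic below are placeholders:
bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' --producer=kafka --kafka.bootstrap.servers=localhost:9092 --kafka_topic=maxwell
Your consumer would then read the JSON change events from that topic and apply each one to slave 1 or slave 2 depending on the customer's region.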

Run multi mysql services on same machine

I have a Linux machine (Ubuntu 14.04, 32 GB RAM, 8 cores, ...).
I want to run several slaves on this machine (currently 5 slave replicas, but I will need more).
I use master-slave MySQL replication.
From my point of view there are 2 options to do that:
1. use mysqld_multi - set up 5 instances (I have done this in the past with 2 instances)
2. use Docker containers - each one with a MySQL slave
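For reference, option 1 would look roughly like what I used back then in my.cnf; the ports, paths and server IDs below are only placeholders:
[mysqld_multi]
mysqld     = /usr/bin/mysqld_safe
mysqladmin = /usr/bin/mysqladmin

[mysqld1]
port      = 3307
socket    = /var/run/mysqld/mysqld1.sock
datadir   = /var/lib/mysql1
server-id = 11

[mysqld2]
port      = 3308
socket    = /var/run/mysqld/mysqld2.sock
datadir   = /var/lib/mysql2
server-id = 12
The instances are then started with mysqld_multi start 1,2.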
What is the best solution?
Which one will be easier to maintain (and to add to)?
Thanks for the help.
The issues I'm trying to solve are:
I have performance problems, and with the architecture we use I'm unable to use a cluster, so I want to use load balancing and split reads from writes.
On one of my machines I need to split data from 1 master to different slaves by some column value, and I want all the slaves to be on the same machine.
I would suggest reading the book High Performance MySQL.
If you're having performance problems but your server has enough resources to add multiple slaves, then you should tweak the configuration of your master MySQL instance to better utilize those resources. If you're attempting to split writes to slaves, those changes will never be propagated back to the master. If you want to use slaves to increase read performance, you can do that, but I would only suggest it once you've maxed out the box the master instance is running on.
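The kind of tweaks meant here are things like sizing InnoDB to the machine; the values below are only an illustration and depend entirely on your workload:
[mysqld]
innodb_buffer_pool_size = 24G
innodb_log_file_size    = 512M
max_connections         = 500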
I would need more information on why you think you need to do this ("On one of my machines I need to split data from 1 master to different slaves by some column value") to be able to comment on it. On the surface it feels like a bad idea, but there could be a reason for it.

Best way of backing up mysql clustered database

I have a MySQL Cluster database spread across 2 servers.
I want to create a backup system for this database based on the following requirements:
1. Recovery/restore should be very easy and quick. Even better if I can switch the connection string at any time I like.
2. The backup must be like snapshots, so I want to keep copies from different days (and maybe keep the latest 7 days, for example).
3. The copy of the database does not have to be clustered.
The best way to back up a MySQL Cluster is to use the native backup mechanism, which is initiated with the START BACKUP command in the ndb_mgm management client.
Backup is easy (just a single command) and relatively quick. Restore is a bit trickier, but it is at least faster and more reliable than using mysqldump. See also:
http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-backup.html
and
http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-programs-ndb-restore.html
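To illustrate (the backup ID, node IDs and path below are placeholders), a backup is started from the management client with
ndb_mgm -e "START BACKUP"
and restored on each data node with ndb_restore, e.g.
ndb_restore --nodeid=1 --backupid=1 --restore_meta --backup_path=/var/lib/mysql-cluster/BACKUP/BACKUP-1
ndb_restore --nodeid=1 --backupid=1 --restore_data --backup_path=/var/lib/mysql-cluster/BACKUP/BACKUP-1
ndb_restore --nodeid=2 --backupid=1 --restore_data --backup_path=/var/lib/mysql-cluster/BACKUP/BACKUP-1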
2) The backups are consistent snapshots and are distinguishable by an auto-incrementing backup ID, so keeping several snapshots is easily possible.
3) The backup is clustered by default (every data node stores its backup files on its own file system), but you should either have the backup directory pointing to a shared file system mount, or copy the files from all nodes to a central place once a backup has finished.
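As an illustration of the shared-mount approach (the path is a placeholder), the backup directory is set in the cluster's config.ini:
[ndbd default]
BackupDataDir=/mnt/shared/cluster-backups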

MySQL Master <=(Slave,Master)=> Slave

I want to know if a server can be a slave and a master at the same time. Our problem is that we have lots of mobile units that need to be synced to the master, but they only need 6 out of the hundreds of tables on the master. All the extra tables serve no purpose on the slave except for delaying synchronization and adding data costs.
We want to create a smaller schema, say mobileSchema, that contains only the 6 tables, synced to their counterparts in masterSchema. Is this possible? Either to have schemas sync internally, or to have some master/slave-master/slave configuration where the middle server is a slave to the bigger server and a master to the mobile units?
If the answer is no, would anyone have any alternate solutions to propose? We're trying to avoid having to sync the different schemas/databases manually, as that can get real ugly real fast.
Raza
AFAIK you can't natively sync schemas internally.
In your case you can do something like this:
Enable binary logging on your main server.
Create another server to act as a proxy and configure it to replicate from the main server.
Configure the 'proxy' to only replicate the tables you need for the remote servers (replicate-do-table).
Enable binary logging and log-slave-updates on the 'proxy'.
Configure your remote units to replicate from the proxy.
You will probably also need to enable encryption for the remote connections.
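A minimal sketch of what the 'proxy' server's my.cnf might contain; the table names here are hypothetical, and you would repeat replicate-do-table for each of the 6 tables the mobile units need:
[mysqld]
server-id          = 2
log-bin            = mysql-bin
log-slave-updates
replicate-do-table = masterSchema.customers
replicate-do-table = masterSchema.orders
# ...one line per table the mobile units need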
You might like to look at replication filters.
You can do filtering on the master, so it only logs part of the changes.
Or you can do filtering on the replica(s): the master logs all changes and the replica downloads all the logs, but the replica only applies a subset of the changes. This is good if you want some replicas to replay one subset of changes and other replicas to replay a different subset.
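A hedged example of both flavors in my.cnf; the database and table names are made up:
# on the master: only write changes for this database to the binary log
binlog-do-db = shop
# on a replica: download all the logs, but only apply changes for this table
replicate-do-table = shop.products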

Cross Data Center: MySQL Replication vs Simple File Copying?

Does it make sense to simply copy the mysql\data files instead of using MySQL replication between data centers? I have the impression that MySQL replication might be complex when done across data centers. And if I just copy, I could easily switch to the other data center without worrying about which one is the primary and which is a slave. Any thoughts?
MySQL with the InnoDB storage engine uses multiversioning of rows. This means there may be changes to the database that are not yet committed (and possibly will be rolled back!). If you simply copy the files, you will end up in an inconsistent state.
If you are using MyISAM, copying the files is safe. Replication, however, will transfer only the changes, while copying will transfer the entire database each time, which is not wise with large databases.
Replication synchronizes the databases between data centers "live".
Copying the whole database, on the other hand, takes a lot of time, and the databases will be out of sync as soon as the first change is made.
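If you do decide to copy instead of replicating, a consistent logical dump is usually safer than copying raw files; a minimal sketch (the file name is a placeholder):
mysqldump --single-transaction --all-databases > dump.sql
--single-transaction gives a consistent snapshot for InnoDB tables without locking; for MyISAM you would need FLUSH TABLES WITH READ LOCK around a raw file copy.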