Partitioning DB tables in MySQL

What is the best strategy to make a clustered MySQL deployment in which some tables of the DB are placed on one node and some other tables are placed on another node while acting as a single coherent DB from the application's perspective?
Let's say I have 2 data nodes, A and B, and a database with 5 tables; I want tables 1, 2, and 3 placed on node A and tables 4 and 5 placed on node B.
Does this require a clustered deployment, or can a typical MySQL deployment handle it? If so, how?
How about having table 4 replicated on both A and B?

MySQL will allow for transparent access to tables stored on other instances using the federated engine (this has been available for a long time).
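A minimal sketch of the federated approach, assuming a schema named mydb, a simple table4, and placeholder host/credentials (note that the FEDERATED engine is disabled by default and must be enabled on the instance holding the pointer table):

    -- On node B: the real table, stored locally as InnoDB.
    CREATE TABLE mydb.table4 (
        id   INT NOT NULL AUTO_INCREMENT,
        name VARCHAR(64),
        PRIMARY KEY (id)
    ) ENGINE=InnoDB;

    -- On node A: a FEDERATED table with an identical definition that
    -- transparently forwards reads and writes to node B.
    CREATE TABLE mydb.table4 (
        id   INT NOT NULL AUTO_INCREMENT,
        name VARCHAR(64),
        PRIMARY KEY (id)
    ) ENGINE=FEDERATED
      CONNECTION='mysql://app_user:app_pass@node-b:3306/mydb/table4';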
MySQL does provide a feature called partitioning - which is applied to tables to distribute the data across different filesystems - but this is something very different.
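For contrast, a rough sketch of native partitioning (table, column, and path names invented); this spreads one table's rows across filesystems on a single server, it does not span servers:

    -- DATA DIRECTORY per partition works for MyISAM, and for InnoDB
    -- when innodb_file_per_table is enabled.
    CREATE TABLE mydb.orders (
        id      INT NOT NULL,
        created DATE NOT NULL
    )
    PARTITION BY RANGE (YEAR(created)) (
        PARTITION p2010 VALUES LESS THAN (2011) DATA DIRECTORY = '/disk1/mysql',
        PARTITION p2011 VALUES LESS THAN (2012) DATA DIRECTORY = '/disk2/mysql',
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );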
How about having table 4 replicated on both A and B?
You can set up MySQL replication to copy only specific tables (see replicate-wild-do-table); however, mixing federation and replication is going to get very confusing very quickly - get it wrong and you will trash your data. Use one or the other. Not both.
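A minimal my.cnf sketch on the replica, assuming the schema is called mydb:

    # Replicate only table4; all other tables are ignored.
    # The option accepts % and _ wildcards in both parts.
    [mysqld]
    replicate-wild-do-table = mydb.table4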

Related

how to create mysql database for cadence

If we want to create MySQL databases for Cadence, and we want 10 shards, should we create a set of Cadence tables for each shard? And if we want to use 5 machines to host the MySQL databases for those 10 shards, how should we do that?
Assuming we want 10 shards for Cadence, should we create a set of Cadence tables for each shard?
No. You will only need one set of tables for the whole Cadence cluster.
The sharding mechanism is implemented within the Cadence server. Unless you have a sharded MySQL solution, you don't need to worry about sharding at all when setting up the database schema.
If you do have a sharded MySQL, just make sure to use shardID as the partition (sharding) key for the tables.
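Purely as an illustration of that point (this is not the real Cadence schema; the table and column names here are invented), the idea is that shardID leads the key your sharded MySQL routes on:

    -- Hypothetical table; shard_id leads the primary key, so a sharded
    -- MySQL layer (or native partitioning) can route rows by shard.
    CREATE TABLE cadence_example (
        shard_id INT NOT NULL,
        run_id   BINARY(16) NOT NULL,
        data     BLOB,
        PRIMARY KEY (shard_id, run_id)
    )
    PARTITION BY HASH (shard_id) PARTITIONS 10;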
Sharding in Cadence is only needed for the History service (that's why it's called numHistoryShards in the config).
More reading about the sharding:
https://cadenceworkflow.io/docs/operation-guide/setup/#static-configuration
Typically you will need 2K shards in production if MySQL is the database.
Here are some references that might help you, for MySQL and PostgreSQL.

MySQL replication with custom query for reverse hashes

I have a MySQL DB with a quickly growing amount of data.
I'd like to use some web based tool that plugs into the DB so that I can analyze data and create reports.
The idea would be to use replication in order to give R/O access to the slave DB instead of having to worry about security issues on the master (which also contains data not relevant to this project, but just as important).
The master DB contains strings that are hashed (SHA-1) at the source and, on the slave, they need to go back to their original form using a reverse-hash database.
This will allow whatever tool I plug into the slave-DB (living on another server) to work straight out of the box.
My question is: what is the best way to do replication while somehow reshaping the slave-DB with the mentioned strings back into the source format?
Example:

    MASTER DB            SLAVE DB
    a8866ghfde332as      John Smith
    a8fwe43kf3e3t42      Rose White
The slave DB should already contain the tables with the strings reversed; the reversal should NOT happen at query time.
How do you guys think I should approach this?
Is replication the way to go?
Thank you for any help!
EDIT
I should specify some details:
the slave DB would also contain a reverse hash (lookup) table
the amount of source strings is limited so there's little risk of collisions
the best option would be to replicate only certain tables to the slave, where the slave DB does a reverse-hash lookup every time there is an INSERT and saves the reversed hash in another table (or column), ready to be read by the web-based tool
The setup I am willing to use is mainly focused on NOT having anything connect to the master other than the source (which creates records in the DB) and the slave DB itself.
This would result in better security by having the reverse lookup table sitting in a DB (the slave) that is NOT in direct contact with the source of data.
So, even if somebody hacks the source and makes it to the master DB, no useful data could be retrieved, since the strings in question are hashed.
It is easiest, simplest, and most foolproof to replicate everything from master to slave in MySQL, so plan to replicate everything unless you have an extremely compelling reason not to.
That said, MySQL has absolutely no problem with the slave having tables that the master does not have -- tables created directly on the slave will not cause a problem if no tables with conflicting schema+table names exist on the master.
You don't want to try to have the slave "fix" the data on the way in, because that's not something MySQL replication is designed to do, nor is it something readily accomplished. Triggers on the slave's tables will fire only when the master writes events to its binlog in statement mode, which is not as reliable as row mode nor as flexible as mixed mode. And even if you had this working, you would lose the ability to compare master and slave data sets with table checksums, which is an important part of the ongoing maintenance of master/slave replication.
However... I do see a way to accomplish what you want to do with the reverse hash tables: create the lookup tables on the slave, and then create views that reconstruct the data in its desired form by joining the replicated tables to the lookup tables, and run your queries on the slave against these views.
When a view simply joins properly indexed tables and doesn't include anything unusual like aggregate functions (e.g. COUNT()) or UNION or EXISTS, the server will process queries against the view as if the underlying tables had been queried directly, using all available and appropriate indexes... so this approach should not cause significant performance penalties. In fact, you can declare indexes on the replicated tables on the slave that you don't have or need on the master (except for UNIQUE indexes, which wouldn't make sense there), and these can be designed as needed for the slave-specific workload.
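A minimal sketch of that idea, with invented table and column names (CHAR(40) assumes hex-encoded SHA-1 digests):

    -- On the slave only: the reverse-hash lookup table.
    CREATE TABLE hash_lookup (
        hash     CHAR(40)     NOT NULL PRIMARY KEY,
        original VARCHAR(255) NOT NULL
    );

    -- A view that presents the replicated table with hashes resolved;
    -- unmatched hashes fall back to the raw hash value.
    CREATE VIEW people_plain AS
    SELECT t.id,
           COALESCE(l.original, t.name_hash) AS name
    FROM   replicated_people AS t
           LEFT JOIN hash_lookup AS l ON l.hash = t.name_hash;

The reporting tool then queries people_plain instead of the replicated table.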
Hash functions are not injective: two different inputs can produce the same output. As such, it would not always be possible to accurately rebuild the data on the slave.
On a simple level, and to demonstrate this, consider a hashing function for integers that happens to return the square of the input: -1 => 1, 0 => 0, 1 => 1, 2 => 4, 3 => 9, etc. Now consider the inverse, the square root: 1 => -1 and 1, 4 => -2 and 2, etc.
It may be better to only replicate the specific data needed for your project to the slaves, and do it without hashing.

Merge Two MySql DATABASES (not tables) via Perl

I have two identical (in structure) databases residing on separate backend servers.
I need to come up with some logic to 'merge' their data into a single database on a third server.
My initial design is to load their data (by table) into memory using a combination of Perl hashes and arrays and merging them there, then doing a single massive write to a local DB (also identical in structure).
I would repeat for all tables (4-5).
I've seen posts about merging tables, but not sure if I can use some of those responses as my tables reside in separate databases (let alone separate machines).
My question is: am I stuck with having to load the results into memory first, or are there features of MySQL that I can use to my advantage?
What "mu" said needs addressing, but I'm not sure I'd go with this approach at all.
Get the two databases onto the target server using standard mysql dump/restore
Merge them into the third DB using standard queries (sketched below)
You should let MySQL do the heavy lifting.
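For example, after restoring the two dumps into schemas db_a and db_b on the target server (all names here are placeholders), the merge itself is plain SQL:

    -- Seed the merged table from the first source.
    INSERT INTO merged.customers
    SELECT * FROM db_a.customers;

    -- Fold in the second source; INSERT IGNORE skips rows whose
    -- primary/unique key already exists (use ON DUPLICATE KEY UPDATE
    -- instead if the second source should win on conflicts).
    INSERT IGNORE INTO merged.customers
    SELECT * FROM db_b.customers;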

replicate specific data between 2 mysql databases

I'm trying to replicate data between 2 MySQL databases. The issue is that only some rows need to be transferred to the second MySQL server, based on specific criteria.
I have 2 MySQL servers. The first one is intranet only, there is an application that reads/writes to it. The second MySQL server is online and the application connecting to it is read only.
I need to find a way to get the data from the first server to the second based on specific criteria (some rows are labeled as private and should not be synchronized). I tried to do it with a trigger on the first server (a trigger on insert/update), but I have way too many tables; it's very time-consuming to do it like that.
What approaches do I have? Dumping the entire data set is not an option, as there will be a lot of records and the online server cannot afford to go offline just to get the information. Add to that that not all the records are for public usage.
1 - disable replication
2 - on the intranet, create an empty database and a view based on a query that shows exactly the rows you want to replicate to your internet server
3 - replicate the new database (the one containing the view) to a new database on your internet server
4 - on your internet server, you can cron a script that inserts the new rows into your desired table; think about using dumps and LOAD DATA INFILE (see the sketch below), it should go very quickly.
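A sketch of steps 2 and 4, with invented schema, table, and column names. One caveat to keep in mind: replication copies a view's definition rather than its rows, so the cron job in step 4 is what actually moves the data:

    -- Step 2, on the intranet server: a view exposing only public rows.
    CREATE DATABASE filtered;
    CREATE VIEW filtered.customers_public AS
    SELECT * FROM intranet_db.customers
    WHERE  is_private = 0;

    -- Step 4, on the internet server: bulk-load a dump of those rows.
    LOAD DATA INFILE '/tmp/customers_public.csv'
    INTO TABLE online_db.customers
    FIELDS TERMINATED BY ',';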

MySQL Database Replication for Multiple Tables

Previously, I have done the following database replication:
(1) I have 2 tables within 1 database in Machine A
(2) I update 2 tables in Machine A
(3) Machine A will replicate 2 tables to Machine B. Machine B will also contain 2 tables within 1 database.
Now, I would like to accomplish the following:
(1) I have Table A, within 1 database in Machine A.
(2) I have Table B, within 1 database in Machine B.
(3) I would like to replicate Table A and Table B to Machine C.
(4) Machine C will have Table A and Table B, within ONE database.
Is it possible for this to be accomplished through database replication?
Unfortunately, a MySQL server can slave from only one master. So for instance you could run two separate instances of MySQL on different ports on Machine C, slaving from Machine A and Machine B respectively, but you cannot slave from both in one server.
Depending on your situation, doing that might get you close enough that some other replication technique (like periodically using mysqldump to copy one table across to Machine C) would work. It just depends on your needs for the slaves: how big the tables are (i.e. how quickly they can be copied via a non-slaving method), how out of date is acceptable, whether you really need them in one DB or one server is good enough, etc.
After a second thought, there is one type of multi-master replication which is possible and might serve your needs if you just want the data in one DB and don't really need Machine C. In that case, you could actually have one of the servers be the Master for Table A and Slave for Table B, while the other is Master for Table B (and if necessary, Slave for Table A). Decent looking explanation.
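A my.cnf sketch of that arrangement, with placeholder schema and table names; each server is a master for its own table and filters its slave thread down to the other table:

    # Server 1: master for mydb.table_a, slave for mydb.table_b only.
    [mysqld]
    server-id          = 1
    log-bin            = mysql-bin
    replicate-do-table = mydb.table_b

    # Server 2: master for mydb.table_b, slave for mydb.table_a only.
    [mysqld]
    server-id          = 2
    log-bin            = mysql-bin
    replicate-do-table = mydb.table_a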
Multi-master replication isn't really possible unless you're using MySQL Cluster, and even then I don't think you can use the example you're talking about unless the two tables are really the same data, simply in separate partitions.