Can I join between two MySQL tables stored on separate machines? - mysql

I have a relatively light query that needs information from a local MySQL table along with another MySQL table that is stored on a physically separate machine (on the same network). I'm keen to avoid setting up replication just to facilitate this light query, which only needs to be executed once a day.
Is there any way that I can join with a table on a remote machine using one query? Or run a SELECT INTO a local table?
Notes
I'm using C# & .NET 4.

This can be done by using the FEDERATED storage engine for the remote table.
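A rough sketch of what that looks like, with placeholder column names and connection details (the local definition must mirror the remote table's structure, and the FEDERATED engine is disabled by default, so it has to be enabled on the local server first):

-- On the local server: a FEDERATED table that points at the remote table.
CREATE TABLE remote_orders (
  order_id INT NOT NULL,
  rate     DECIMAL(10,2),
  quantity INT,
  PRIMARY KEY (order_id)
)
ENGINE=FEDERATED
CONNECTION='mysql://fed_user:fed_pass@remote-host:3306/remote_db/orders';

-- It can then be joined like any local table...
SELECT l.*, r.rate
FROM local_orders AS l
JOIN remote_orders AS r ON r.order_id = l.order_id;

-- ...or used to populate a local copy (the SELECT INTO idea from the question):
INSERT INTO local_orders_copy (order_id, rate, quantity)
SELECT order_id, rate, quantity FROM remote_orders;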

Related

How to create MySQL databases for Cadence

We want to create MySQL databases for Cadence. Assuming we want 10 shards for Cadence, should we create a set of MySQL Cadence tables for each shard? And if we want to use 5 machines to host the MySQL databases for those 10 shards, how should we do that?
Assuming we want 10 shards for Cadence, should we create a set of MySQL Cadence tables for each shard?
No. You will only need one set of tables for the whole Cadence cluster.
The sharding mechanism is implemented within the Cadence server. Unless you have a sharded MySQL solution, you don't need to worry about sharding at all when setting up the database schema.
If you do have a sharded MySQL, just make sure to use shardID as the partition (sharding) key for the table.
Sharding in Cadence is only needed for the History service (that's why it's called numHistoryShards in the config).
More reading about the sharding:
https://cadenceworkflow.io/docs/operation-guide/setup/#static-configuration
Typically you will need 2K shards in production if MySQL is the database.
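To visualize the sharded-MySQL case mentioned above, here is a purely hypothetical table (placeholder names, not Cadence's actual schema) showing what "use shardID as the sharding key" means in practice, with shard_id leading the primary key so a sharded MySQL layer can route by it:

-- Illustration only; Cadence ships its own schema.
CREATE TABLE executions_example (
  shard_id    INT          NOT NULL,
  workflow_id VARCHAR(255) NOT NULL,
  run_id      VARCHAR(64)  NOT NULL,
  data        BLOB,
  PRIMARY KEY (shard_id, workflow_id, run_id)  -- shard_id as the leading/sharding key
);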
Here is some reference material that might help you for MySQL and PostgreSQL.

How to have a centralized MySQL database?

I am trying to set up a MySQL database that takes data from 3 other MySQL databases. The data to be copied would come from a query that standardizes the data format. The method would need to either be run daily as a script or synced in real time; either would be fine for this project.
For example:
The query from source DB:
SELECT order_id, rate, quantity
FROM orders
WHERE date_order_placed = CURDATE()
Then I want to insert the results of that query into a destination DB.
The databases are on separate hosts.
I have tried creating scripts that run CSV and SQL exports/imports, without success. I have also tried using the Python pymysql library, but it seemed like overkill. I'm pretty lost, haha.
Thanks :)
Plan A:
Connect to source. SELECT ... INTO OUTFILE.
Connect to destination. LOAD DATA INFILE from the output above (a sketch follows).
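A rough sketch of Plan A, using the query from the question and a placeholder file path (the source server's secure_file_priv setting must allow writing there, and the file has to be copied from the source host to the destination host before loading):

-- On the source server:
SELECT order_id, rate, quantity
  INTO OUTFILE '/tmp/orders_today.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM orders
 WHERE date_order_placed = CURDATE();

-- On the destination server, after copying the file over:
LOAD DATA INFILE '/tmp/orders_today.csv'
  INTO TABLE orders_combined
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  (order_id, rate, quantity);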
Plan B (both MySQL):
Set up replication from the source (as a Master) to the destination (as a Slave).
Plan C (3 MySQL servers):
Multi-source replication to allow gathering data from two sources into a single, combined destination.
I think MariaDB 10.0 is when multi-source replication was introduced. Caution: MariaDB's GTIDs are different from MySQL's, but I think there is a way to make the replication you seek work. (It may be as simple as turning off GTIDs?)
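For what it's worth, a rough sketch of the MariaDB multi-source syntax (host names, credentials and connection names are placeholders); MySQL 5.7+ has an equivalent using CHANGE MASTER TO ... FOR CHANNEL:

-- On the combined destination server:
CHANGE MASTER 'source1' TO
  MASTER_HOST='db1.example.com', MASTER_USER='repl', MASTER_PASSWORD='secret',
  MASTER_USE_GTID=slave_pos;
CHANGE MASTER 'source2' TO
  MASTER_HOST='db2.example.com', MASTER_USER='repl', MASTER_PASSWORD='secret',
  MASTER_USE_GTID=slave_pos;
START ALL SLAVES;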
Plan D (as mentioned):
Some ETL software.
Please ponder which Plan you would like to pursue, then ask for help in focusing on one. Meanwhile, your question is too broad.

MEMSQL vs. MySQL

I need to start off by pointing out that I am by no means a database expert. I know my way around programming applications in several languages that require database backends, and am relatively familiar with MySQL, Microsoft SQL Server and now MEMSQL - but again, not an expert at databases, so your input is very much appreciated.
I have been working on developing an application that has to cross-reference several different tables. One very simple example of an issue I recently had is that I have to:
On a daily basis, pull down 600K to 1M records into a temporary table.
Compare what has changed between this new data pull and the old one. Record that information on a separate table.
Repopulate the table with the new records.
Step #2 runs a query similar to:
SELECT * FROM (NEW TABLE) LEFT JOIN (OLD TABLE) ON (JOINED FIELD) WHERE (OLD TABLE.FIELD) IS NULL
In this case, I'm comparing the two tables on a given field and then pulling the information of what has changed.
In MySQL (v5.6.26, x64), my query times out. I'm running 4 vCPUs and 8 GB of RAM but note that the rest of my configuration is default configuration (did not tweak any parameters).
In MEMSQL (v5.5.8, x64), my query runs in about 3 seconds on the first try. I'm running the exact same virtual server configuration with 4 vCPUs and 8 GB of RAM, also note that the rest of my configuration is default configuration (did not tweak any parameters).
Also, in MEMSQL, I am running a single node configuration. Same thing for MySQL.
I love the fact that using MEMSQL allowed me to keep developing my project, and the even bigger cross-table calculation queries and views I'm coming across run fantastically on MEMSQL... but in an ideal world I'd use MySQL. I've already come across the fact that I need to use a different set of tools to manage my instance (i.e. MySQL Workbench works relatively well with a MEMSQL server, but I actually need to build views and tables using the open source SQL Workbench and the MySQL Java adapter; same thing for the Visual Studio MySQL connector: it works, but can be painful at times, and for some reason I can add queries but can't add table adapters)... sorry, I'll submit a separate question for that :)
Considering both virtual machines are exactly the same configuration, and SSD-backed, can anyone give me any recommendations on how to tweak my MySQL instance to run big queries like the one above? I understand I can also create an in-memory database, but I've read there might be some persistence issues with doing that; not sure.
Thank you!
The most likely reason this happens is that you don't have an index on your joined field in one or both tables. According to this article:
https://www.percona.com/blog/2012/04/04/join-optimizations-in-mysql-5-6-and-mariadb-5-5/
Vanilla MySQL only supports nested-loop joins, which require an index to perform well (otherwise they take quadratic time).
Both MemSQL and MariaDB support the so-called hash join, which does not require you to have indexes on the tables but consumes more memory. Since your dataset is negligibly small for modern RAM sizes, that extra memory overhead is not noticeable in your case.
So all you need to do to address the issue is add indexes on the joined field in both tables.
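For example (table and column names here are placeholders for your actual schema):

ALTER TABLE new_table ADD INDEX idx_joined_field (joined_field);
ALTER TABLE old_table ADD INDEX idx_joined_field (joined_field);

-- Verify the join now uses the index:
EXPLAIN
SELECT new_table.*
FROM new_table
LEFT JOIN old_table ON old_table.joined_field = new_table.joined_field
WHERE old_table.joined_field IS NULL;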
Also, please describe the issues you are facing with the open source tools when connecting to MemSQL in a separate question, or at chat.memsql.com, so that we can fix them in the next version (I work for MemSQL, and compatibility with MySQL tools is one of our priorities).

MySQL cloning aggregated database from an existing database

We have a MySQL database based on InnoDB. We are looking to build an analytics system for this data. We are thinking of creating a cloned database that denormalizes the data to avoid joins and uses MyISAM for faster querying. This second database will also help avoid putting extra load on the main database, to which the data is written.
Apart from this, we are also creating some extra tables that will store aggregated numbers to avoid recalculation.
I am wondering how I can sync these tables once a day to keep them updated. It looks similar to the master-slave configuration of MySQL, which uses the binary log. But in our case, the second database is not an exact slave. Are there any reliable open-source tools, or any other ideas, I can use to write an 'update mechanism'?
Thanks in advance.

One-way database sync to MySQL

I have a VFP-based application with a directory full of DBFs. I use ODBC in .NET to connect and perform transactions on this database. I want to mirror this data to mySQL running on my webhost.
Notes:
This will be a one-way mirror only. VFP to mySQL
Only inserts and updates must be supported. Deletes don't matter
Not all tables are required. In fact, I would prefer to use a defined SELECT statement to only mirror pseudo-views of the necessary data
I do not have the luxury of a "timemodified" stamp on any VFP records.
I don't have a ton of data records (maybe a few thousand total) nor do I have a ton of concurrent users on the mySQL side, but I want to be as efficient as possible.
Proposed Strategy for Inserts (doesn't seem that bad...):
Build temp table in mySQL, insert all primary keys of the VFP table/view I want to mirror
Run "SELECT primaryKey from tempTable not in (SELECT primaryKey from mirroredTable)" on mySQL side to identify missing records
Generate and run the necessary INSERT sql for those records
Blow away the temp table
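A rough sketch of that flow on the mySQL side, assuming the placeholder names above (an anti-join is shown as an equivalent alternative to the NOT IN form):

-- 1. Temp table holding the primary keys pulled from VFP via .NET:
CREATE TEMPORARY TABLE tempTable (primaryKey INT NOT NULL PRIMARY KEY);
-- ... bulk-insert the VFP primary keys here ...

-- 2. Keys present in VFP but missing from the mirror:
SELECT t.primaryKey
FROM tempTable AS t
LEFT JOIN mirroredTable AS m ON m.primaryKey = t.primaryKey
WHERE m.primaryKey IS NULL;

-- 3. Generate and run the INSERTs for those keys, then clean up:
DROP TEMPORARY TABLE tempTable;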
Proposed Strategy for Updates (seems really heavyweight, and dropping the table probably breaks open queries on the mySQL side; a swap sketch follows the steps):
Build temp table in mySQL and insert ALL records from VFP table/view I want to mirror
Drop existing mySQL table
Alter tempTable name to new table name
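One way to soften the "drop the live table" concern is to stage into a regular (non-TEMPORARY) table and then swap names atomically with RENAME TABLE, which exchanges both names in a single statement; a sketch with the placeholder names above:

-- Load tempTable with the full current VFP data set first, then:
RENAME TABLE mirroredTable TO mirroredTable_old,
             tempTable     TO mirroredTable;
DROP TABLE mirroredTable_old;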
These are just the first strategies that come to mind, I'm sure there are more effective ways of doing it (especially the update side).
I'm looking for some alternate strategies here. Any brilliant ideas?
It sounds like you're going for something small, but you might try glancing at some replication design patterns. Microsoft has documented some data replication patterns here and that is a good starting point. My suggestion is to check out the simple Move Copy of Data pattern.
Are your VFP tables in a VFP database (DBC)? If so, you should be able to use triggers on that database to capture information about what data needs to be updated in MySQL.