I want to collect MySQL data from 10 different machines and aggregate into a one big MySQL db on a different machine. All machines are Linux based.
What is the "mysqldump" syntax if I want to do this periodically to collect only the "delta" data?
Are there any other ways to achieve this?
This isn't natively supported in MySQL. You could use replication, but a replica can have only a single master, not 10 masters. I know of two workable options:
1) is to script something up that switches the replica between masters in a round-robin fashion. You might wish to refer to http://code.google.com/p/mysql-mmre/ or http://thenoyes.com/littlenoise/?p=117.
2) is to use an ETL tool.
If you get stuck, we (Percona) can help you. This is a common request, but not an easy one, because each case is different.
mysqldump can't generate incremental backups, as it doesn't have any way of determining which rows (or what parts of the schema!) have changed since the last backup, or indeed even when the last backup was. For that you'd need something which could read the MySQL binlog and convert it into a bunch of INSERT/UPDATE/DELETE statements; I'm not aware of anything that exists quite like that.
The current "state of the art" in MySQL backups is generally considered to be Percona XtraBackup.
Multiple Master Slave? Have each of the 10 as Masters, and the aggregate a slave to all 10. This assumes that the data you are aggregating is different on each of the 10. If the data is the same (or similar) on all 10 and you want to interleave it as well as integrate it then this won't work.
Related
I have to create a dashboard based on a table in mysql, and only on today datas
This db is used on a service with a massive data quantity, and continous read and write data, so I'd like to replicate in a "slave" instance part of this table (only today data).
Is it possible to do it in Mysql, without scripting?
Thanks
MySQL has no built-in feature to replicate a subset of rows. There are replication filters to replicate a subset of schemas or tables, but not rows.
One workaround could be to replicate fully to the replica, then on the replica delete any data that is more than one day old.
But this would work only for a database that is INSERT-only. If you also have UPDATE and DELETE operations replicated, they might find that they are trying to change rows that are missing. If you use ROW-based binary logs, this would result in a replication error when it can't find the row, and replication would stop.
It might work if you only use STATEMENT-based binary logs, but I've never tried it so I can't predict what other problems might occur. Also, you can't fully prevent ROW-based binary logs from occurring, because individual sessions can change their binary log format.
I think you're going to need a bespoke solution no matter what. Probably not using replication, but just an ETL job to query the current day's data and import it into another MySQL instance (not a replica).
I need to start off by pointing out that by no means am I a database expert in any way. I do know how to get around to programming applications in several languages that require database backends, and am relatively familiar with MySQL, Microsoft SQL Server and now MEMSQL - but again, not an expert at databases so your input is very much appreciated.
I have been working on developing an application that has to cross reference several different tables. One very simple example of an issue I recently had, is I have to:
On a daily basis, pull down 600K to 1M records into a temporary table.
Compare what has changed between this new data pull and the old one. Record that information on a separate table.
Repopulate the table with the new records.
Running #2 is a query similar to:
SELECT * FROM (NEW TABLE) LEFT JOIN (OLD TABLE) ON (JOINED FIELD) WHERE (OLD TABLE.FIELD) IS NULL
In this case, I'm comparing the two tables on a given field and then pulling the information of what has changed.
In MySQL (v5.6.26, x64), my query times out. I'm running 4 vCPUs and 8 GB of RAM but note that the rest of my configuration is default configuration (did not tweak any parameters).
In MEMSQL (v5.5.8, x64), my query runs in about 3 seconds on the first try. I'm running the exact same virtual server configuration with 4 vCPUs and 8 GB of RAM, also note that the rest of my configuration is default configuration (did not tweak any parameters).
Also, in MEMSQL, I am running a single node configuration. Same thing for MySQL.
I love the fact that using MEMSQL allowed me to continue developing my project, and I'm coming across even bigger cross-table calculation queries and views that I can run that are running fantastically on MEMSQL... but, in an ideal world, i'd use MySQL. I've already come across the fact that I need to use a different set of tools to manage my instance (i.e.: MySQL Workbench works relatively well with a MEMSQL server but I actually need to build views and tables using the open source SQL Workbench and the mysql java adapter. Same thing for using the Visual Studio MySQL connector, works, but can be painful at times, for some reason I can add queries but can't add table adapters)... sorry, I'll submit a separate question for that :)
Considering both virtual machines are exactly the same configuration, and SSD backed, can anyone give me any recommendations on how to tweak my MySQL instance to run big queries like the one above on MySQL? I understand I can also create an in-memory database but I've read there might be some persistence issues with doing that, not sure.
Thank you!
The most likely reason this happens is because you don't have index on your joined field in one or both tables. According to this article:
https://www.percona.com/blog/2012/04/04/join-optimizations-in-mysql-5-6-and-mariadb-5-5/
Vanilla MySQL only supports nested loop joins, that require the index to perform well (otherwise they take quadratic time).
Both MemSQL and MariaDB support so-called hash join, which does not require you to have indexes on the tables, but consumes more memory. Since your dataset is negligibly small for modern RAM sizes, that extra memory overhead is not noticed in your case.
So all you need to do to address the issue is to add indexes on joined field in both tables.
Also, please describe the issues you are facing with the open source tools when connect to MemSQL in a separate question, or at chat.memsql.com, so that we can fix it in the next version (I work for MemSQL, and compatibility with MySQL tools is one of the priorities for us).
I am trying to replicate my production database, and in the process clean some tables. In the new one, I got rid of old entries (moved to a "historical" db). This implied a lot of time and concatenation of deletes. Now that I have it as I wanted to, I need to synchronize the newest entries in the production database (which is running and cannot be stopped), but I don't want the old entries to come back. The intention is to switch between the two databases at some point.
I thought of using Navicat's Data Synchronization tool, but it wouldn't allow me to filter the entries based on dates (or any filter at all, for what it's worth). Can anyone suggest a method/tool to use for this task?
Thanks in advance.
We have a moderately large production MySQL database. Periodically, we will run commands, usually via a rails migration, that while running, bog down the database. As a specific example, we might add an index to a large table.
Is there any method that can reduce the priority MySQL gives to a specific task. A sort of "nice" within MySQL itself? I found this, which is what inspired the question:
PostgreSQL tips and tricks
Since adding an index causes the work to be done within the DB and MySQL process, lowering the priority of the Rails migration process seems like it won't help. Are there other ways we can lower the priority?
We use multiple, replicated database servers to make changes like this.
In our case, db1 is the master, replicated to db2. (db1->db2).
Start by making the change to db2. If things lock, replication will stall, but that's OK.
Move your traffic to db2. Any remnant traffic going to db1 will replicate over, so you won't lose anything.
Once there's no traffic on db1, rebuild it as a slave of db2 (db2->db1).
That's the general idea and you get very little downtime and you don't have to pull an all-nighter! We actually have three servers, so it's a little more complicated, but not much.
Good luck.
Unfortunately, there is no simple way to do this: commands that alter the database structure don't have a priority option.
If your tables are MyISAM, you can try this:
mysqlhotcopy to make a backup of the table
import that backup it into a different database server (one that's not under load)
make the changes there
make a mysqlhotcopy backup of the altered table
import it into the live server
Note that this may or may not be faster than adding the index on the live server, depending on the time it takes you to transfer the table back and forth.
We have a live MySQL database that is 99% INSERTs, around 100 per second. We want to archive the data each day so that we can run queries on it without affecting the main, live database. In addition, once the archive is completed, we want to clear the live database.
What is the best way to do this without (if possible) locking INSERTs? We use INSERT DELAYED for the queries.
http://www.maatkit.org/ has mk-archiver
archives or purges rows from a table to another table and/or a file. It is designed to efficiently “nibble” data in very small chunks without interfering with critical online transaction processing (OLTP) queries. It accomplishes this with a non-backtracking query plan that keeps its place in the table from query to query, so each subsequent query does very little work to find more archivable rows.
Another alternative is to simply create a new database table each day. MyIsam does have some advantages for this, since INSERTs to the end of the table don't generally block anyway, and there is a merge table type to being them all back together. A number of websites log the httpd traffic to tables like that.
With Mysql 5.1, there are also partition tables that can do much the same.
I use mysql partition tables and I've achieve wonderful results in all aspects.
Sounds like replication is the best solution for this. After the initial sync the slave gets updates via the Binary Log, thus not affecting the master DB at all.
More on replication.
MK-ARCHIVER is a elegant tool to archive MYSQL data.
http://www.maatkit.org/doc/mk-archiver.html
MySQL replication would work perfectly for this.
Master -> the live server.
Slave -> a different server on the same network.
Could you keep two mirrored databases around? Write to one, keep the second as an archive. Switch every, say, 24 hours (or however long you deem appropriate). Into the database that was the archive, insert all of todays activity. Then the two databases should match. Use this as the new live db. Take the archived database and do whatever you want to it. You can backup/extract/read all you want now that its not being actively written to.
Its kind of like having mirrored raid where you can take one drive offline for backup, resync it, then take the other drive out for backup.