How would I create a snapshot of a MySQL instance in WSL? - mysql

We use MySQL for our app. During dev and testing, we have to constantly reload the MySQL database, sometimes with very large amounts of data. In Linux, I would create an LVM snapshot to do quick rollbacks during testing to reduce turn-around time from 10-15 minutes to 30 seconds.
A number of our devs use Windows Subsystem for Linux (WSL) for their dev environment. It works great and allows them to mix the use of their Windows and Linux tools. Alas, unless things have changed recently, we don't have the full capability of LVM snapshots under WSL.
Are there any scripts we can use to create a MySQL snapshot and quickly restore to that rollback point? I'm aware we can manually work back through the binary logs with mysqlbinlog, but that's rather tedious. I'd love a way to:
run a script start-point.sh
do our testing
run a script rollback.sh
profit
I'm very familiar with filesystem snapshots and so haven't done much with MySQL snapshots to date. The Google results are kind of a mess, and I'm hoping there is an obvious "oh yeah, just use script ABC at this github link" if possible.
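To make the shape concrete, a dump-based version of those two scripts might look something like this (database name, credentials, and paths are placeholders; a faster, filesystem-level equivalent is what I'm really after):

```bash
#!/usr/bin/env bash
# start-point.sh -- capture a restore point (dump-based stand-in for a snapshot;
# "myapp" and the dump path are placeholders)
set -euo pipefail
mysqldump --single-transaction --routines --triggers myapp > /tmp/myapp-restore-point.sql
```

```bash
#!/usr/bin/env bash
# rollback.sh -- throw away everything since the restore point and reload it
set -euo pipefail
mysql -e "DROP DATABASE IF EXISTS myapp; CREATE DATABASE myapp;"
mysql myapp < /tmp/myapp-restore-point.sql
```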
Thanks all.
P.S. If there is a better way of doing this in WSL I am all ears!

Related

Best practice - Developing on a local mysql server / RDS?

I am developing a Drupal site using MariaDB.
The import process of a 77 MB dump file locally (a Docker container running MariaDB) takes about 2 minutes.
The same import to an Amazon RDS (db.m4.large) running a MariaDB database takes more than 30 minutes.
Isn't Amazon RDS supposed to be quicker?
What is the recommended practice for having a quick dev environment for SQL? (The local Docker service is running too slow.)
Thanks,
Yaron
If you are already on RDS, just use a snapshot.
Take a snapshot from production. (or find one of the automated snapshots)
Create a new DB from the snapshot
It's very fast and avoids the latency and the millions of individual queries that an import involves.
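If you want to script it instead of clicking through the console, the AWS CLI version is roughly this (instance and snapshot identifiers are placeholders):

```bash
# Snapshot the production instance
aws rds create-db-snapshot \
    --db-instance-identifier prod-db \
    --db-snapshot-identifier prod-db-for-dev

# Wait until the snapshot is ready
aws rds wait db-snapshot-available \
    --db-snapshot-identifier prod-db-for-dev

# Bring up a fresh dev instance from it
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier dev-db \
    --db-snapshot-identifier prod-db-for-dev \
    --db-instance-class db.m4.large
```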
However, this is just one very crude approach to making a dev environment.
Some people have scripts that create the data set for DEV from scratch. This might be more appropriate, and even necessary, if for example you have a large database and developers who like to work locally on their computers.
Some people have scripts that sanitize DEV to eliminate sensitive and personal data, which you could run after the snapshot.
Some people even have DEV as a replica of the main DB and modify the DEV db so that additional usage doesn't clash with the replicated changes. This is a bit delicate though.
Often Dev and Tests use dummy data, and Staging uses real data (cloned from Production and possibly sanitized).

What is an efficient way to maintain a local readonly copy of a live remote MySQL database?

I maintain a server that runs daily cron jobs to aggregate data sources and generate reports, accessible by a private Ruby on Rails application.
One of our data sources is a partial dump of one of our partner's databases. The partner runs an active application and the MySQL DB has hundreds of tables. They have given us read-only access to a relatively underpowered readonly slave of their application DB.
Because of latency issues and performance bottlenecking on their slave DB, we have been maintaining a limited local copy of their DB. We only need about 20 tables for our reports, so I only dump those tables. We also only need the data to a daily granularity, so realtime sync is not a requirement.
For a few months, I had implemented a nightly cron which streamed the dump of the necessary tables into a local production_tmp database. Then, when all tables were imported, I dropped production and renamed production_tmp to production. This was working until the DB grew to over 25GB, and we started running into disk space limitations.
For now, I have removed the redundancy step and am just streaming the dump straight into production on our local server. This feels a bit flimsy to me, and I would like to implement a safer approach. Also, currently doing the full dump/load takes our server over 2 hours, and I'd like to implement an approach that doesn't take as long. The database will only keep growing, so I'd like to implement something future proof.
Any suggestions would be appreciated!
I take it you have never heard of, or considered MySQL Replication?
The idea is that you do your backup & restore once, and then configure the replica to "subscribe" to a continuous stream of changes as they are made on the primary MySQL instance. Any change applied to the primary is applied automatically to the replica within seconds. You don't have to do the backup & restore procedure again, unless the replica gets damaged.
It takes some care to set up and keep working, but it's a much more efficient method of keeping two instances in sync.
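A rough sketch of the replica side, assuming the initial backup has already been restored (host, user, password, and binlog coordinates are placeholders taken from that backup):

```bash
# Point the replica at the primary and start replicating
mysql -u root -p <<'SQL'
CHANGE MASTER TO
  MASTER_HOST='primary.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='repl-password',
  MASTER_LOG_FILE='mysql-bin.000042',
  MASTER_LOG_POS=154;
START SLAVE;
SQL
# Then check replication health with: SHOW SLAVE STATUS\G
```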
@SusannahPotts mentions hot backup and/or incremental backup. You can get both of these features for free, without paying for MySQL Enterprise, by using Percona XtraBackup.
You can also consider using MySQL Transportable Tablespaces.
You'll need filesystem access to run either Percona XtraBackup or MySQL Enterprise Backup. It's not possible to use these physical backup tools for Amazon RDS, for example.
One alternative is to create a replication slave in the same network as the live system, and run Percona XtraBackup on that slave, where you do have filesystem access.
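As a rough sketch, the XtraBackup cycle on that slave looks like this (paths are placeholders; the prepare step makes the copied files consistent before you use them):

```bash
# Take a physical backup on a host where you have filesystem access
xtrabackup --backup --target-dir=/data/backups/full

# Make the backup consistent so it can be restored
xtrabackup --prepare --target-dir=/data/backups/full

# To restore: stop mysqld, copy the files into an empty datadir, fix ownership:
#   xtrabackup --copy-back --target-dir=/data/backups/full
#   chown -R mysql:mysql /var/lib/mysql
```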
Another option is to stream the binary logs to another host (see https://dev.mysql.com/doc/refman/5.6/en/mysqlbinlog-backup.html) and then transfer them periodically to your local instance and replay them.
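In its simplest form, that streaming setup is something like this (host, user, and starting binlog name are placeholders):

```bash
# Continuously pull the primary's binlogs into the local directory (MySQL 5.6+)
mysqlbinlog --read-from-remote-server --host=primary.example.com \
    --user=repl --password --raw --stop-never mysql-bin.000001

# Later, replay the accumulated logs into the local instance:
#   mysqlbinlog mysql-bin.[0-9]* | mysql -u root -p
```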
Each of these solutions has pros and cons. It's hard to recommend which solution is best for you, because you aren't sharing full details about your requirements.
This was working until the DB grew to over 25GB, and we started running into disk space limitations.
A few questions here:
Why don't you just increase the available disk space for your database? 25 GB is next to nothing in terms of disk space.
Why don't you modify your script to work table by table: dump table1, import it as table1_tmp, drop table1_prod, rename table1_tmp to table1_prod; rinse and repeat (a rough sketch follows below).
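Something along these lines, assuming a staging database as the "_tmp" area (connection details and table names are placeholders):

```bash
# Refresh one table at a time so only one table's worth of disk is duplicated
for t in orders customers invoices; do
    mysqldump -h partner-slave --single-transaction partnerdb "$t" | mysql production_tmp
    mysql -e "DROP TABLE IF EXISTS production.$t;
              RENAME TABLE production_tmp.$t TO production.$t;"
done
```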
Other than that:
Why don't you ask your partner for a system with enough performance to run your reports on? I'm quite sure they would prefer that over having you download sensitive data to your "local site" every day.
Last thought (requires MySQL Enterprise Backup https://www.mysql.de/products/enterprise/backup.html):
Rather than dumping, downloading and importing 25 GB every day:
Create a full backup
Download and import
Use differential or incremental backups from then on.
The next day you download (and import) only the data-delta: https://dev.mysql.com/doc/mysql-enterprise-backup/4.0/en/mysqlbackup.incremental.html
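If the Enterprise licence is a blocker, the same daily-delta idea works with Percona XtraBackup, roughly like this (paths are placeholders; point --incremental-basedir at the previous day's directory to chain the deltas):

```bash
# Day 0: one full physical backup
xtrabackup --backup --target-dir=/backups/base

# Each following day: copy only the pages changed since the last backup
xtrabackup --backup --target-dir=/backups/inc-$(date +%F) \
    --incremental-basedir=/backups/base
```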

Node.js, Express, MySQL - update schema

I have a small app running on a production server. In the next update the db schema will change; this means the production database schema will need to change and there will need to be some data manipulation.
What's the best way to do this? I.e., run a one-off script to complete these tasks when I deploy to the production server?
Stack:
Nodejs
Expressjs
MySQL using node mysql
Codeship
Elasticbeanstalk
Thanks!
"The best way" depends on your circumstances. Is this a rather seldom occurrence, or is it likely to happen on a regular basis? How many production servers are there? Are there other environments, e.g. for integration tests, staging etc.? Do your developers have an own DB environment on their machines? Does your process involve continuous integration?
The more complex your landscape is, the better it is to use solutions like Todd R suggested (Liquibase, Flywaydb).
If you just have one production server and it can be down for maintenance for a few hours, then it could be sufficient to:
Schedule a maintenance downtime with your stakeholders and users
Shutdown the server
Create a backup
Update the database structure and contents as necessary
Deploy software updates
Restart the server
Test the result (manually or automatically)
Inform your stakeholders and users
If anything goes wrong, rollback to a backed up version of the database and your software.
Having database update scripts is advisable. Having tested them at least once is even more advisable. Creating a backup in advance is essential.
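A minimal sketch of such a one-off update script, assuming a mysqldump backup and a single SQL migration file (database and file names are placeholders):

```bash
#!/usr/bin/env bash
# Deploy-time migration: back up first, then apply the schema and data changes
set -euo pipefail

mysqldump --single-transaction --routines myapp > backup-before-migration.sql
mysql myapp < migration-001.sql

# If the deploy goes wrong, roll back with:
#   mysql myapp < backup-before-migration.sql
```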
http://www.liquibase.org/ or http://flywaydb.org/ - pretty "heavy" for one time use, but if you'll need to change the schema again in the future, probably worth investing the time to learn one of these.

How can I keep my development server database up to date?

I develop websites for several small clients and I would love to be able to keep my local databases up to date with each client's production servers. (I'm thinking nightly updates) Many of their databases are in the hundreds of megabytes, so I feel like creating and transferring complete dumps every night is excessive.
Here are the harebrained ideas I have come up with so far:
Create a dump on the server and rsync it with the previous night's dump on my local machine. That should only transfer the changed bits of the file, right? (Sketched below.)
Create a dump on the server, and locally diff it against last night's dump. Transfer only that diff. Maybe it could also send the md5 of the original dump so I could be sure I was applying the diff to the same base file.
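For the first idea, the moving parts would be roughly this (host, database, and paths are placeholders; the dump stays uncompressed so rsync's delta transfer has something to work with):

```bash
# On the client's server: fresh dump each night
mysqldump --single-transaction clientdb > /backups/clientdb.sql

# On my machine: only the changed blocks come over the wire
rsync -avz --partial server:/backups/clientdb.sql ~/dumps/clientdb.sql
mysql clientdb_local < ~/dumps/clientdb.sql
```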
Getting ssh access to (most) of these servers is possible, but requires getting my clients to call their hosting providers with that rather technical request, which is something I would rather not have to ask them to do. Bonus points if you can suggest a solution that I could implement via ftp/php.
Edit:
@Jacob's answer prompted some further Googling which led to this thread: Compare two MySQL databases
This question (in several forms) seems pretty common. Most of the need, and therefore most of the tools, seems to focus on keeping the schema up to date rather than the data. Also, most of the tools seem to be commercial and GUI-based.
So far the best-looking option seems to be pt-table-sync from the Percona Toolkit, although it looks like it might be a pain to set up on OS X.
I am downloading Navicat to test right now. It runs on OS X but is a commercial GUI program.
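For pt-table-sync, the invocation I'd be testing looks roughly like this (hosts, database, and table are placeholders; --dry-run first, then swap in --execute):

```bash
# Make the local copy of one table match the remote read-only slave
pt-table-sync --dry-run \
    h=remote-slave.example.com,D=clientdb,t=orders \
    h=localhost,D=clientdb,t=orders
```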
I think the most efficient way to set this up is to run rsync each night, doing an incremental backup by syncing yesterday's backup to today's and then running that day's rsync. In terms of backing up the different websites, I would look at having the sites as slaves which back up to your server (the master) using the replication feature; this will allow you to keep all the databases up to date and backed up.
http://dev.mysql.com/doc/refman/5.0/en/replication.html

Should I stick only to AWS RDS Automated Backup or DB Snapshots?

I am using AWS RDS for MySQL. When it comes to backup, I understand that Amazon provides two types of backup - automated backup and database (DB) snapshot. The difference is explained here. However, I am still confused: should I stick to automated backup only or both automated and manual (db snapshots)?
What do you think? What setup do you use yourselves? I have heard from others that automated backups are not reliable, since the database can end up unrecoverable when the DB instance crashes, so DB snapshots are the way to rescue you. But if I take daily DB snapshots with settings similar to the automated backup, I am going to pay quite a lot of money.
I hope someone can enlighten me or advise me on the right setup.
From personal experience, I recommend doing both. I have the automated backup set to 8 days, and I also have a script that takes a snapshot once per day and deletes snapshots older than 7 days. The reason is that, from what I understand, there are certain situations where you cannot restore from the automated backup. For example, if you accidentally deleted your RDS instance and did not take a final snapshot, you would not be able to access the automated backups that were done. But it is also good to have the automated backups turned on because that will provide you the point-in-time restore.
Hope this helps.
EDIT
To answer your comment, I use a certain naming convention when my script creates the snapshots. Something like:
autosnap-instancename-2012-03-23
When it goes to do the cleanup, it retrieves all the snapshots, looks for that naming convention, parses the date, and deletes any older than a certain date.
I think you could also look at the snapshot creation date, but this is just how I ended up doing it.
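A sketch of that daily script with the AWS CLI (instance name and the 7-day cutoff are placeholders; the cleanup relies on ISO dates comparing correctly as strings):

```bash
#!/usr/bin/env bash
set -eu
INSTANCE=instancename
TODAY=$(date +%F)
CUTOFF=$(date -d '7 days ago' +%F)

# Take today's snapshot
aws rds create-db-snapshot \
    --db-instance-identifier "$INSTANCE" \
    --db-snapshot-identifier "autosnap-$INSTANCE-$TODAY"

# Delete our own snapshots whose embedded date is older than the cutoff
aws rds describe-db-snapshots --db-instance-identifier "$INSTANCE" \
    --snapshot-type manual \
    --query 'DBSnapshots[].DBSnapshotIdentifier' --output text |
  tr '\t' '\n' | grep "^autosnap-$INSTANCE-" | while read -r snap; do
    snap_date=${snap#autosnap-$INSTANCE-}
    if [[ "$snap_date" < "$CUTOFF" ]]; then
        aws rds delete-db-snapshot --db-snapshot-identifier "$snap"
    fi
done
```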
Just from personal experience, yesterday I accidentally deleted a table and had to restore from an RDS snapshot. The latest snapshot was only 10 minutes old, which was perfect. However, Amazon RDS took about 3 hours to get the snapshot online, during which time, the affected section of our site was completely offline.
So if you need to make a very quick recovery, do NOT depend on RDS backups.
Keep in mind, you can't download your snapshot to inspect it as a database dump. Your only option is to wait for it to load into a new database instance. So if you're only looking to restore a single table, RDS backups can make it a very painful process.
No blame to Amazon on this- they are awesome. But just something to keep in mind when planning, because it was a learning experience for us.
There are some situations where an automated backup cannot recover the specific table you want, even though it has a point-in-time recovery feature. I suggest you enable the Backtracking feature for this kind of recovery. You can also use the AWS Backup service to manage backups of Amazon RDS DB instances; backups managed by AWS Backup are considered manual DB snapshots.
Also, you will need to keep automated backups enabled if you want to create a read replica of the DB instance to improve read performance. The retention period for automated backups must be between 1 and 35 days, so you can keep it at a minimum of 1 day.
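One caveat with that suggestion: Backtracking is only available on Aurora MySQL clusters, not on standard RDS for MySQL instances. Where it does apply, the rewind itself is a single call (cluster identifier and timestamp are placeholders):

```bash
# Rewind the Aurora cluster in place to an earlier point in time
aws rds backtrack-db-cluster \
    --db-cluster-identifier my-aurora-cluster \
    --backtrack-to 2020-01-01T12:00:00Z
```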