ClearDB - does it take time to read the newest data? - mysql

I just used mySQL workbench to connect to my clearDB account which is connected to an azure web app. The problem is even thought I ran a query that drops/creates tables in the newly made schema that mirrors exactly the tables and data in my previous live server, I go to mysite.azurewebsites.com/wp-admin and the error is in establishing data connection. Site could not be found. Check if your database contains the following pages: wp_blogs, ..........
What could be the problem? Does this process just need a bit of time to propagate all the data?
EDIT: something to note, which might be a factor, when I ran the last query, it also included dropping/adding the table "wp_users" so all previous data was wiped and replaced with the info from a previous live server.

Normally you will see any changes made immediately. But because your database is hosted on a geoseparated cluster in circular replication there are some rare circumstances where this might not be true.
Specifically, if your delete/write went to one master and your read query went to another. Data propagation is normally immediate but if one of the nodes is offline or the system is unusually busy there can be a delay.

Related

AWS RDS showing outdated data for Multiple AZ instance (MySQL)

My RDS instance was showing outdated data temporarily.
I ran a SELECT query on my data. I then ran a query to delete data from a table and another to add new data to the table. I ran a SELECT query and it was showing the old data.
I ran the SELECT query AGAIN and THEN it finally showed me the new data.
Why would this happen? I never had these issues locally or on my normal non AZ instances. Is there a way to avoid this happening?
I am running MySQL 5.6.23
According to the Amazon RDS Multi-AZ FAQs, this might be expected.
Specifically this:
You may observe elevated latencies relative to a standard DB Instance deployment in a single Availability Zone as a result of the synchronous data replication performed on your behalf.
Of course, it depends on the frequency of the delays you're observing and what is the increased latency you're seeing, but an option would be to contact AWS support in case the issue is frequently reproducible.
As embarrassing as this is... it was an issue in our Spring Java code and not AWS.
A method modified a database entity object. The method itself wasn't transactional but was called from a transactional context which would persist any changes on entities to the database.
It looked like it was rolling back changes, but what it was doing was just overwriting data. My guess is it overwrote the data a while ago so until someone tried to modify it we just assumed it was the correct data.

RDS Read Replica Considerations

We hired an intern and want to let him play around with our data to generate useful reports. Currently we just took a database snapshot and created a new RDS instance that we gave him access to. But that is out of date almost immediately due to changes on the production database.
What we'd like is a live (or close-to-live) mirror of our actual database that we can give him access to without worrying about him modifying any real data or accidentally bringing down our production database (eg by running a silly query like SELECT (*) FROM ourbigtable or a really slow join).
Would a read replica be suitable for this purpose? It looks like it would at least be staying up to date but I'm not clear what would happen if a read replica went down or if data was accidentally changed on it or any other potential liabilities.
The only thing I could find related to this was this SO question and this has me a bit worried (emphasis mine):
If you're trying to pre-calculate a lot of data and otherwise modify
what's on the read replica you need to be really careful you're not
changing data -- if the read is no longer consistent then you're in
trouble :)
TL;DR Don't do it unless you really know what you're doing and you
understand all the ramifications.
And bluntly, MySQL replication can be quirky in my experience, so even
knowing what is supposed to happen and what does happen if there's as
the master tries to write updated data to slave you've also
updated.... who knows.
Is there any risk to the production database if we let an intern have at it on an unreferenced read replica?
We've been running read-replicas of our production databases for a couple years now without any significant issues. All of our sales, marketing, etc. people who need the ability to run queries are provided access to the replica. It's worked quite well and has been stable for the most part. The production databases are locked down so that only our applications can connect to it, and the read-replicas are accessible only via SSL from our office. Setting up the security is pretty important since you would be creating all the user accounts on the master database and they'd then get replicated to the read-replica.
I think we once saw a read-replica get into a bad state due to a hardware-related issue. The great thing about read-replicas though is that you can simply terminate one and create a new one any time you want/need to. As long as the new replica has the exact same instance name as the old one its DNS, etc. will remain unchanged, so aside from being briefly unavailable everything should be pretty much transparent to the end users. Once or twice we've also simply rebooted a stuck read-replica and it was able to eventually catch up on its own as well.
There's no way that data on the read-replica can be updated by any method other than processing commands sent from the master database. RDS simply won't allow you to run something like an insert, update, etc. on a read-replica no matter what permissions the user has. So you don't need to worry about data changing on the read-replica causing things to get out of sync with the master.
Occasionally the replica can get a bit behind the production database if somebody submits a long running query, but it typically catches back up fairly quickly once the query completes. In all our production environments we have a few monitors set up to keep an eye on replication and to also check for long running queries. We make use of the pmp-check-mysql-replication-delay command in the Percona Toolkit for MySQL to keep an eye on replication. It's run every few minutes via Nagios. We also have a custom script that's run via cron that checks for long running queries. It basically parses the output of the "SHOW FULL PROCESSLIST" command and sends out an e-mail if a query has been running for a long period of time along with the username of the person running it and the command to kill the query if we decide we need to.
With those checks in place we've had very little problem with the read-replicas.
The MySQL replication works in a way that what happens on the slave has no effect on the master.
A replication slave asks for a history of events that happened on the master and applies them locally. The master never writes anything on the slaves: the slaves read from the master and do the writing themselves. If the slave fails to apply the events it read from the master, it will stop with an error.
The problematic part of this style of data replication is that if you modify the slave and later modify the master, you might have a different value on the slave than on the master. This can be avoided by turning on the global read_onlyvariable.

mySQL Table Sync

I am looking for suggestions on the best way to sync mySQL tables (myISAM) from 2 different databases.
Currently we use Navicat to sync tables from our production server to our test server but we have been running into many problems. Just about everyday we have been running into a sync failure on a table.
We get the error below a lot of the times, not to mention Navicat spams our e-mails with successful and unsuccessful syncs(is there anyway to just receive only the unsuccessful syncs?). I also know altering the table in anyway will cause a failure to sync. So altering the table in anyway must be done to the master first (This makes sense but is there any way around this?).
-[Sync] Finished - Unsuccessful Synchronization: List index out of bounds (0)
Is there any reason to not use the Navicat sync? My boss suggested using mySQL replication instead but my first concern is finding why we have so many problems because it seems like we just are misusing the sync thus giving us all these problems.
Thanks.
sync tables from our production server to our test server
It sounds like you're trying to replicate your production environment in your test environment, right?
A common pattern to follow in this situation is using a tool like mysqldump to create a backup of the entire database, then later import the backup into the test environment. By doing a complete backup and restore, you're not only ensuring that you have at least one backup method that's known to work, you're also ensuring that the test database can never contain modifications that a sync tool might miss. (Sync tools generally require a primary or unique key on each table to operate effectively.)
The backup and reimport process should be an easy thing for you to automate. At my workplace, we perform a mysqldump-based database dump every night, and perform optional imports into each developer's personal copy of the dev environment early the following morning.

Development and Production Database?

I'm working with PHP & mySQL. I've finally got my head around source control and am quite happy with the whole development (testing) v production v repository thing for the PHP part.
My new quandary is what to do with the database. Do I create one for the test environment and one for the production environment? I currently have just the one which both environments use, leaving my test data sitting there. I kind of feel that I should have two, but I'm nervous in terms of making sure that my production database looks and feels exactly the same as my test one.
Any thoughts on which way to go? And, if you think the latter, what the best way is to keep the two databases the same (apart from the data, of course...)?
Each environment should have a separate database. Script all of the database objects (tables, views, procedures, etc) and store the scripts in source control. The scripts are applied first to the development database, then promoted to test (QA, UAT, etc), then production. By applying the same scripts to each database, they should all be the same in the end.
If you have data that needs to be loaded (code tables, lookup values, etc), script that data load as part of the database creation process.
By scripting everything and keeping it in source control, a database structure can be recreated at any time for any given build level.
You should definitely have two. As far as keeping them in sync, you should always create DDL for creating your database objects. Treat these scripts as you do you PHP code - keep them in version control. Anytime you have to modify the test database, make a script to do so, and check it in. Then you can propogate those changes to the production system once you are ready.
As a minimum one database for each development workstation and one for production. Besides that you should have one for the test environment unless you are only one developer and have a similar setup as the production environment.
See also
How do you version your database schema?
It's a common question and has been asked and answered many times.
Thomas Owens: Replication is not usable for versioning schemas - it is for duplicating data. You never want to replicate from dev to production or vice versa.
Once I've deployed my database, any changes made to my development database(s), are done in an SQL script (not a tool), and the script is saved, and numbered.
deploy.001.description.sql
deploy.002.description.sql
deploy.003.description.sql
... etc..
Then I run each of those scripts in order when I deploy.
Then I archive them into a directory called something like
\deploy.YYMMDD\
And start all over.
If I make a mistake, I never go back to the previous deploy script, I'll create a new script and put my fix in there.
Good luck
One thing I've been working with is creating a VM with the database installed. you can save the VM as a playfile, including its data. What you can do then is take a snapshot of the playfile, and start up as many different VM's as you want. They can all be identical, or you can modify one or another. Here's the good thing: assuming you have a dev version of the database that you want to go out, you can simply start that VM on your production server instead of the current server.
It's another problem altogether if you have production data that is not on your dev machines. In that case though, one thing you can do is set up a tracking VM. Run replication from your main DB to the tracking VM. When you get to a point where you need to run some alters on the production database, first stop the slave and save a snapshot.
Start an instance of that snapshot, take it out of slave mode entirely, apply your changes, and point your QA box at that database. If it works as intended, you can run the patches against your main production database. If not, bring up the snapshot, and get it replicating off the master again until you are ready to repeat the update test.
I was having the same dilemmas. I got stuck thinking that there was a clear dichotomy between production db versus development db. I.e they were two sides of a coin and never the twain shall meet.
A lot of problems disappeared when I stopped making my application 'think' in terms of "Either production db OR development db". Instead my application uses a local db.
When its running on my virtual (dev) machine, that local db happens to be a dev db. My application doesn't really 'know' that though.
So, for the main part, the problem disappears.
But sometimes I want to run tests using live data, or move data from the code into the live production db and see the results quickly.
This is when I added the concept of a live-read-only db connection. The application treats this differently. Its a bit like how your application might treat a web service like Google Apps. Its 'some external resource that your app uses'.
By default my app uses the local db and in some very special conditions (in the test suite) it also uses the live-readonly db. (Because its a read-only connection I don't fear making a mess of the live data during tests).
So rather than asking the question "dev db OR production db?", my app asks "local db OR live-read-only db".
Obviously my situation could be different to yours, but I found this 'breakthrough in understanding' to be most helpful for me.

Full complete MySQL database replication? Ideas? What do people do?

Currently I have two Linux servers running MySQL, one sitting on a rack right next to me under a 10 Mbit/s upload pipe (main server) and another some couple of miles away on a 3 Mbit/s upload pipe (mirror).
I want to be able to replicate data on both servers continuously, but have run into several roadblocks. One of them being, under MySQL master/slave configurations, every now and then, some statements drop (!), meaning; some people logging on to the mirror URL don't see data that I know is on the main server and vice versa. Let's say this happens on a meaningful block of data once every month, so I can live with it and assume it's a "lost packet" issue (i.e., god knows, but we'll compensate).
The other most important (and annoying) recurring issue is that, when for some reason we do a major upload or update (or reboot) on one end and have to sever the link, then LOAD DATA FROM MASTER doesn't work and I have to manually dump on one end and upload on the other, quite a task nowadays moving some .5 TB worth of data.
Is there software for this? I know MySQL (the "corporation") offers this as a VERY expensive service (full database replication). What do people out there do? The way it's structured, we run an automatic failover where if one server is not up, then the main URL just resolves to the other server.
We at Percona offer free tools to detect discrepancies between master and server, and to get them back in sync by re-applying minimal changes.
pt-table-checksum
pt-table-sync
GoldenGate is a very good solution, but probably as expensive as the MySQL replicator.
It basically tails the journal, and applies changes based on what's committed. They support bi-directional replication (a hard task), and replication between heterogenous systems.
Since they work by processing the journal file, they can do large-scale distributed replication without affecting performance on the source machine(s).
I have never seen dropped statements but there is a bug where network problems could cause relay log corruption. Make sure you dont run mysql without this fix.
Documented in the 5.0.56, 5.1.24, and 6.0.5 changelogs as follows:
Network timeouts between the master and the slave could result
in corruption of the relay log.
http://bugs.mysql.com/bug.php?id=26489