I'm setting up a high-availability environment for a customer. A pair of load-balanced hosts serves HTTP requests to our Rails application, which is deployed in parallel on both hosts.
Additionally, there are two MySQL hosts available. I want to run MySQL Cluster (NDB) on both hosts (i.e., multi-master) to have a fully redundant configuration. I'm specifically trying to avoid a master-slave configuration based on database replication; I feel like that makes the writable node a single point of failure.
I'm looking for some guidance on how best to interface our Rails app to a multi-master MySQL cluster like this. Almost all of the database adapters I can find are for master-slave setups. Failover_adapter sounds very promising, but it's woefully outdated. I haven't managed to turn up anything similar developed in the last five years.
Is anyone aware of any gems to use or approaches to take to serve as an adapter between a Rails application and a multi-master MySQL cluster like I've described?
I've just recently set this up, and there isn't much work to be done.
The mysql2 adapter specifies the engine as InnoDB, which is obviously not suitable for clustering. You need to add this to an initializer:
class ActiveRecord::ConnectionAdapters::Mysql2Adapter
  def create_table(table_name, options = {})
    super(table_name, options.reverse_merge(:options => "ENGINE=NDB"))
  end
end
For reference, this is the original method in the adapter:
def create_table(table_name, options = {})
  super(table_name, options.reverse_merge(:options => "ENGINE=InnoDB"))
end
Another key point is that you don't want migrations running on both hosts at the same time, so in your Capistrano deploy.rb:
task :migrate, :max_hosts => 1 do
  # sleep 0.5
  run "cd #{release_path} && bundle exec rake db:migrate RAILS_ENV=#{rails_env}"
end
:max_hosts => 1 prevents Capistrano from running the migrations in parallel. This is important because you don't want the cluster executing CREATE TABLE and similar statements from two hosts at the same time. It may even be worth adding the delay I've commented out above, just for a little extra safety.
One more key aspect. Don't forget to set:
DataMemory =
IndexMemory =
The default values are extremely low. Typically IndexMemory is around DataMemory/5 to DataMemory/10.
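For reference, these live in the [ndbd default] section of the cluster's config.ini on the management node. The numbers below are only placeholders; size them to your data set:
[ndbd default]
# Total memory for table data; the default is far too small for real use
DataMemory = 2048M
# Roughly DataMemory/5 to DataMemory/10
IndexMemory = 256M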
One more pitfall I've seen so far: on your mysqld nodes, make sure you set:
ndb_autoincrement_prefetch_sz
to at least 100, otherwise bulk inserts will take forever. The default is 1.
EDIT:
ndb_autoincrement_prefetch_sz
Leave this variable completely alone and don't set it: it can cause auto-increment values to get out of sync across the cluster, which is a nightmare to debug.
Additionally, make sure your NDB nodes don't run on the same server as the NDB MGM nodes.
Happy coding.
I ultimately was not able to find an adapter solution that did what I wanted.
I settled on using the mysql2 adapter and pointing it at a reverse proxy (I used HAProxy) in front of my database cluster, which handles load balancing and failover between the master nodes.
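For anyone going the same route, the relevant part of my haproxy.cfg looked roughly like this (the IPs are placeholders, and the mysql-check user has to be created on the MySQL nodes first):
listen mysql-cluster
    bind 0.0.0.0:3306
    mode tcp
    option mysql-check user haproxy_check
    balance roundrobin
    server mysql-node1 10.0.0.11:3306 check
    server mysql-node2 10.0.0.12:3306 check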
FINAL UPDATE: We fixed this problem by finding a way to accomplish our goals without forking. But forking was the cause of the problem.
---Original Post---
I'm running a Ruby on Rails stack; our MySQL server is separate but housed at the same site as our app servers. (We've tried swapping it out for a different MySQL server with double the specs, but saw no improvement.)
During business hours we get a handful of these, from no particular query:
ActiveRecord::StatementInvalid: Mysql2::Error: Lost connection to MySQL server during query
Most of the queries that fail are really simple, and there seems to be no pattern between one query and another. This all started when I upgraded from Rails 4.1 to 4.2.
I'm at a loss as to what to try. Our database server sits at less than 5% CPU throughout the day. I do get bug reports from users who have random interactions fail because of this, so it's not queries that have been running for hours or anything like that; of course, when they retry the exact same thing, it works.
Our servers are configured by cloud66.
So, in short: our MySQL server is going away for some reason, but it's not for lack of resources. It's also a brand-new server, as we migrated from another server when this problem started.
This also happens to me on localhost while developing features sometimes, so I don't believe it's a load issue.
We're running the following:
ruby 2.2.5
rails 4.2.6
mysql2 0.4.8
UPDATE: per the first answer below I increased our max_connections variable to 500 last night, and confirmed the increase via
show global variables like 'max_connections';
I'm still getting dropped connections; the first one today was dropped only a few minutes ago...
ActiveRecord::StatementInvalid: Mysql2::Error: Lost connection to MySQL server during query
I ran select * from information_schema.processlist; and I got 36 rows back. Does this mean my app servers were running 36 connections at that moment? or can a process be multiple connections?
UPDATE: I just set net_read_timeout = 60 (it was 30 before) I'll see if that helps
UPDATE: It didn't help, I'm still looking for a solution...
Here's my database.yml with the credentials removed.
production:
  adapter: mysql2
  encoding: utf8
  host: localhost
  database:
  username:
  password:
  port: 3306
  reconnect: true
The connection to MySQL can be disrupted in a number of ways, but I would recommend revisiting Mario Carrion's answer, since it's very wise advice.
It seems likely that connection is disrupted because it's being shared with the other processes, causing communication protocol errors...
...this could easily happen if the connection pool is process bound, which I believe it is, in ActiveRecord, meaning that the same connection could be "checked-out" a number of times simultaneously in different processes.
The solution is that database connections must be established only AFTER the fork statement in the application server.
I'm not sure which server you're using, but if you're using a warmup feature - don't.
If you're running any database calls before the first network request - don't.
Either of these actions could potentially initialize the connection pool before forking occurs, causing the MySQL connection pool to be shared between processes while the locking system isn't.
I'm not saying this is the only possible reason for the issue; as stated by #sloth-jr, there are other options... but most of them seem less likely given your description.
Sidenote:
I ran select * from information_schema.processlist; and I got 36 rows back. Does this mean my app servers were running 36 connections at that moment? or can a process be multiple connections?
Each process could hold a number of connections. In your case, you might have up to 500X36 connections. (see edit)
In general, the number of connections in the pool can often be the same as the number of threads in each process (it shouldn't be less than the number of threads, or contention will slow you down). Sometimes it's good to add a few more, depending on your application.
EDIT:
I apologize for ignoring the fact that the process count was referencing the MySQL data and not the application data.
The process count you showed is the MySQL server data, which seems to use a thread per connection IO scheme. The "Process" data actually counts active connections and not actual processes or threads (although it should translate to the number of threads as well).
This means that out of a possible 500 connections per application process (i.e., if you're using 8 processes for your application, that would be 8 x 500 = 4,000 allowed connections), your application has only opened 36 connections so far.
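If you want to see how many connections MySQL has actually been handling, the standard status counters are an easy check:
SHOW GLOBAL STATUS LIKE 'Threads_connected';     -- connections open right now
SHOW GLOBAL STATUS LIKE 'Max_used_connections';  -- high-water mark since the server started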
This indicates a timeout error. It's usually a general resource or connection error.
I would check your MySQL config for max connections on MySQL console:
show global variables like 'max_connections';
And ensure the number of pooled connections used by Rails database.yml is less than that:
pool: 10
Note that database.yml reflects number of connections that will be pooled by a single Rails process. If you have multiple processes or other servers like Sidekiq, you'll need to add them together.
Increase max_connections if necessary in your MySQL server config (my.cnf), assuming your kit can handle it.
[mysqld]
max_connections = 100
Note other things might be blocking too, e.g. open files, but looking at connections is a good starting point.
You can also monitor active queries:
select * from information_schema.processlist;
as well as monitoring the MySQL slow log.
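If the slow log isn't already enabled, one way to switch it on at runtime (assuming you have the SUPER privilege; the one-second threshold is just an example):
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- log anything slower than 1 second
SHOW VARIABLES LIKE 'slow_query_log_file';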
One issue may be a long-running update command. If you have a slow-running command that affects a lot of records (e.g. a whole table), it might be blocking even the simplest queries. This means you could see random queries timeout, but if you check MySQL status, the real cause is another long-running query.
Things you did not mention but should take a look at:
Are you using Unicorn? If so, are you reconnecting and disconnecting in your after_fork and before_fork hooks? (A sketch follows this list.)
Is reconnect: true set in your database.yml configuration?
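For example, with Unicorn the usual pattern looks roughly like this (a sketch, not your actual config):
# config/unicorn.rb
before_fork do |server, worker|
  # Close the master's connection so workers don't inherit and share it
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # Each worker opens its own connection after the fork
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end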
Well, at first glance this sounds like your web server is keeping MySQL sessions open and sometimes a user runs into a timeout. Try disabling MySQL session keep-alive.
It will be a hog, but you only use 5% anyway...
Other tips:
Enable the MySQL slow query log and take a look.
Write a short script which pulls and logs the MySQL processlist every minute, and cross-check the log against the timeouts (a rough sketch follows this list).
Look at the pool size in your DB connection config, or set one!
http://guides.rubyonrails.org/configuring.html#database-pooling
It should be in line with the max_connections MySQL is configured to allow!
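For the processlist logger, a minimal sketch along these lines would do (host and credentials are placeholders):
# processlist_logger.rb - dump the MySQL processlist to a file once a minute
require 'mysql2'
require 'time'

client = Mysql2::Client.new(host: "db-host", username: "monitor", password: "secret")

loop do
  rows = client.query("SELECT id, user, host, db, command, time, state, info " \
                      "FROM information_schema.processlist")
  File.open("processlist.log", "a") do |f|
    stamp = Time.now.utc.iso8601
    rows.each { |row| f.puts("#{stamp} #{row.inspect}") }
  end
  sleep 60
end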
Good luck!
Find out if your database is limited in terms of concurrent connections, because normally an SQL database is supposed to support more than one active connection.
(Contact your network provider)
Would you mind posting some of your queries? The MySQL documentation has this to say about it:
https://dev.mysql.com/doc/refman/5.7/en/error-lost-connection.html
TL;DR:
Network problems: are any of your boxes renewing leases periodically, or experiencing other network connection errors (netstat / ss), firewall timeouts, etc.? Not sure how managed your hosts are by cloud66...
Query timed out. This can happen if you've got commands backed up behind blocking statements (e.g., ALTERs or locking backups on MyISAM tables). How simple are your queries? No Cartesian products in play? EXPLAINing a query could help.
Exceeding max_allowed_packet. Are you storing pictures, video content, etc.?
There are lots of possibilities here, and without more information it will be difficult to pinpoint this.
I would look first at mysql_error.log, then work from the DB server back to your application.
UPDATE: this didn't work.
Here's the solution. Special thanks to #Myst for pointing out that forking can cause issues; I had no idea to look at this particular code. The errors seemed random because we forked in this fashion in several places.
It turns out that when I was forking processes, Rails was using the same database connection for all forked processes. This created a situation where, when one of the processes (the parent process?) terminated the database connection, the remaining process would have its connection interrupted.
The solution was to change this code:
def recalculate_completion
  Process.fork do
    if self.course
      self.course.user_groups.includes(user: [:events]).each do |ug|
        ug.recalculate_completion
      end
    end
  end
end
into this code:
def recalculate_completion
  # Drop the parent's connection before forking so it isn't shared with the child
  ActiveRecord::Base.remove_connection
  Process.fork do
    # The child establishes its own, separate connection
    ActiveRecord::Base.establish_connection
    if self.course
      self.course.user_groups.includes(user: [:events]).each do |ug|
        ug.recalculate_completion
      end
    end
    ActiveRecord::Base.remove_connection
  end
  # The parent re-establishes its connection after forking
  ActiveRecord::Base.establish_connection
end
Making this change stopped the errors from our servers and everything appears to be working well now. If anyone has any more info as to why this worked I would be happy to hear it, as I would like to have a deeper understanding of this.
Edit: it turns out this didn't work either.... we still got dropped connections but not as often.
If you have query cache enabled, please reset it and it should work.
RESET QUERY CACHE;
I run a pretty customized cluster for processing large amounts of scientific data, based on a basic LAMP design. In general, I run a separate MySQL server with around 128GB of RAM and about 1TB of storage. Separately, I run a head node that serves as an NFS mount point for the data input of my process, and a web server to display results. Finally, I typically have a few compute nodes that get their jobs from a MySQL table, get the data from NFS, do some heavy lifting, then put results into MySQL.
I have come across a dataset I would like to process which is pretty large (1TB of input data), and I don't really have the hardware on hand to handle it. As a result, I began investigating Google Compute Engine and the like, and the prospect of scaling instances to process these data rapidly, with the results stored in a MySQL instance. Upon completion, the MySQL tables could be dumped from the cloud and brought up locally for analysis. I would have no problem deploying a MySQL server, along with the rest of the LAMP pieces and the compute nodes, but I can't quite figure out how I would do this in the cloud.
A major sticking point seems to be the lack of read/write NFS, which would allow me to get the data onto several instances, crunch it, then push the results to MySQL. This is a necessary step for me: I could queue hundreds of jobs from the webserver, then have the instances (as many as 50-100) pick the jobs up by connecting to a centralized MySQL instance to find out what jobs an instance needs to do and where the data is, process the data (there is a file conversion that happens which makes write access necessary), crunch the data, then load the results into MySQL. I hope I'm explaining my situation clearly. This seems like a great example of a CPU-intensive process that would scale nicely in the cloud; I just can't seem to put all the pieces together... Any input is appreciated!
It sounds quite possible; I've been doing similar things in GCE for a while now.
NFS mount - you just need to configure it as you would normally. Set up the NFS server on the head node, and then configure the clients on the slave nodes to mount it. Here and here are some basic configuration instructions for CentOS 6 that I used to get NFS up and running; a rough sketch of the steps follows.
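As a sketch of what that amounts to on CentOS 6 (the export path and subnet are placeholders):
# On the head node: export a directory and start NFS
echo "/export/data 10.240.0.0/16(rw,sync,no_root_squash)" >> /etc/exports
service nfs start
exportfs -a

# On each compute node: mount it
mkdir -p /mnt/data
mount -t nfs head-node:/export/data /mnt/data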
Setting up a LAMP stack is very straightforward. These machines run pretty much vanilla Linux distros, so you can just use yum or apt-get to install components.
For the cluster, you will probably end up having an image for the head node you use once, and then another image for the slave nodes that you replicate for each one.
For the scheduler, I've used Condor and SGE successfully, but I'm sure the other ones would work just as well.
Hope this helps.
I launched an Amazon EC2 instance with Amazon Linux and Amazon EBS as the root volume. I also started Tomcat 7 and MySQL 5.5 on this EBS volume.
Later I decided to change from Amazon Linux to Ubuntu. To do that, I need to launch another Amazon EC2 instance with a new EBS root volume. Now I want to copy Tomcat 7 and MySQL from the older EBS volume to the new one. I have tables and data in MySQL which I don't want to lose, and an application running on Tomcat. How do I go about it?
A couple of thoughts and suggestions.
First, if you are going to be having any kind of significant load on your database, running it on EBS-backed volume is probably not a great idea as EBS-backed storage is incredibly slow relative to the machine's local/ephemeral storage (/mnt). Now obviously you don't want DB data on ephemeral storage, so there is really nothing you can do about it if you want to run MySQL on EC2. So my suggestion would be to utilize an RDS instance for your DB if your infrastructure requirements allow for it.
Second, if this is a production application, you are undoubtedly going to have some down time as you make this transition. The question is whether you need to absolutely minimize the amount of downtime. If so, then you need to have an idea as to the size of your database. Is it going to take a long time to dump/load? If not, you could probably just get your new instance up and running, and tested on an older copy of your database and then just dump and load the current database at the time of cutover.
If it is a large database then perhaps you can turn on MySQL binary logging. Then make a dump of the database at a known binary log position. Then install this dump on your new instance. Then when ready to cutover, you can replay the binary logs on the new instance to bring it current. Similarly, you could just set up the DB on the new instance as a replica until the cutover, at which point you make it the master.
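A sketch of that dump-plus-binlog approach (the binlog file name and position are placeholders you would read from the header of the dump):
# On the old server: take a consistent dump that records the binlog coordinates
mysqldump --single-transaction --master-data=2 --all-databases > dump.sql

# On the new instance: load the dump
mysql < dump.sql

# At cutover: replay everything written since the dump, using the file/position
# recorded near the top of dump.sql
mysqlbinlog --start-position=<POSITION> mysql-bin.NNNNNN | mysql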
You may even consider just using rsync to sync the physical database files if you don't want to mess with binary logging, though this can be a problematic approach if you are not that familiar with dealing with the actual physical database files.
As far as your application goes, that should be much simpler to migrate assuming it is just a collection of files. I would not copy the Tomcat7 installation itself, but rather just install Tomcat on Ubuntu and then adjust the configuration to match current.
As far as the cutover itself goes, this should be pretty straightforward, and the approach would vary depending on whether you are using an Elastic IP for your server or whether it is behind a load balancer.
We have a Rails 3.1.3 app, connecting to MySQL via the mysql2 gem. Standard config. We also have a handful of Resque workers performing background jobs. The DB hostname we point to (in database.yml) is actually a Virtual IP (VIP) that points to either node1 or node2.
Behind the scenes, the two MySQL servers (nodes) are setup in a High Availability configuration. The data folders are replicated via DRBD, with mysqld only running on the "active" node. When the cluster detects that node1 is not available, it starts mysqld on node2 and points the VIP to it.
If you want more details on the specific setup, it's very similar to this MySQL HA cookbook.
Here's the issue: When a failover happens, it takes approx 30-60 seconds to complete, during which there is no MySQL server available. Any Resque jobs that are currently running fail badly.
Here's the question(s): How can we tell ActiveRecord to reconnect after a delay? Maybe attempt several reconnects with a backoff timer? Or is there a better way of dealing with this?
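To make the question concrete, the kind of thing we're imagining is roughly this (purely illustrative; nothing like it exists in our codebase yet):
# Wrap a Resque job's DB work in a retry-with-backoff helper
def with_db_retry(max_attempts = 5)
  attempts = 0
  begin
    yield
  rescue ActiveRecord::StatementInvalid, Mysql2::Error
    attempts += 1
    raise if attempts >= max_attempts
    sleep(2 ** attempts)                      # back off: 2, 4, 8, 16 seconds...
    ActiveRecord::Base.connection.reconnect!  # force a fresh connection
    retry
  end
end

# e.g. inside a job's perform:
# with_db_retry { SomeModel.find(id).do_the_work }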
Your HA setup is going to cause you infinite amounts of pain in the future. Use database-layer replication instead of block-device-layer replication; MySQL Proxy was designed to do this.
I have a couple of jobs that use a shared resource (database), which sometimes can cause builds to fail in the (rare) event that the jobs happen to get triggered simultaneously.
Given jobs A through E, for example, is there any way to specify that A and C should never be run concurrently?
Other than the aforementioned resource, the builds are independent of each other (not, e.g., in an upstream/downstream relationship).
A "brute-force" way would be limiting number of executors to one, but that obviously is less than ideal if most jobs could well be executed concurrently and there's no lack of computing resources on the build server.
There are currently 2 ways of doing this:
Use the Throttle Concurrent Builds plugin.
Set up those jobs to run on a slave having only 1 executor.
The Locks and Latches plugin here should help.
This question is probably a dupe of How do I ensure that only one of a certain category of job runs at once in Hudson?
That's an old question, but the topic can still be relevant, especially when running application tests on Jenkins.
The Lockable Resources Plugin allows you to define lockable resources that can be used by builds. If your build requires a resource, it takes the lock. If a second build requires the same resource (which is then already locked), it will be queued until the resource is free.
Although the docs use computers or printers as examples for lockable resources, the database example from above should work as well.
In contrast to the Locks and Latches Plugin mentioned in the answers from 2012, this plugin seems to be currently maintained (as of ~2016).
Have a look at the External Resource Dispatcher Jenkins plugin, which was first published in November 2012. This (relatively) new plugin seems to exactly cover this use case.
N.B. you don't need physical or virtual hardware for a slave/node, you can set up "slaves" that run on the master server.
Manage Jenkins > Manage Nodes > New node
and make a "dumb slaves" each with its own root directory.
Create a few slaves, execute them when the server boots, and then you have essentially created pools of executors.
You might have, say...
db - only one executor in your case.
compile - limit according to hardware or # of CPUs.
scripts - have many executors for all those little jobs that Jenkins is good at doing.
Old question, and whether this will work for your application I can't be sure as you didn't mention details of your application. However, I wanted to add the way that I handled this in our Rails application test suite.
Our application's database configuration (database.yml) isn't in the source repository. Instead, it lives in /var/lib/configs/uniquing_database.yml on the VM which runs our Jenkins instance.
One of the steps of our build process involves copying this config file to the project workspace:
cp /var/lib/jenkins/configs/myapp_unique_database.yml config/database.yml
and that config takes the workspace and build-number information exposed to the environment by Jenkins into account in order to create a uniquely named database for that job and its specific execution:
test:
  adapter: postgresql
  encoding: unicode
  host: 127.0.0.1
  port: 5432
  database: myapp_test<%= ENV['JOB_NAME'].split('/').last %><%= ENV['BUILD_NUMBER'] %>
The rest of our build proceeds without any knowledge or care that it's running in a distinct database. Finally, at the end of our build, we make sure to drop that database so we don't have a bunch of test databases polluting the file system:
RAILS_ENV=test bundle exec rake db:drop