MySQL is really unstable - Considering migration

How can I make a more robust database server? I know this is a pretty general question, but I'm getting all sorts of errors and downtime using MySQL with a simple service that does some selects and 2 inserts. Nothing complex, no JOINs...
At first my host got blocked by the DB server because of too many connections, so I increased the max_used_connections parameter. And my client is actually closing its connections and handling the pool correctly.
For some days it worked OK. Today I woke up and the MySQL server was down. I tried to restart it with service mysql restart and it failed. I tried to connect and got error 1040: too many connections.
So I couldn't even restart the server.
I did killall mysqld and tried to start the server with service mysql start. The service wouldn't start. I ran su - mysql -s /bin/sh -c "/usr/bin/mysqld_safe > /dev/null 2>&1 &" and the server finally started.
I also increased the max_connections limit, because the server would just die after a few seconds when starting it today.
Then one table was corrupt: ERROR 144 (HY000): Table '<table_name>' is marked as crashed and last (automatic?) repair failed
I tried REPAIR TABLE <table_name>, but after some time the server died again: ERROR 2006 (HY000): MySQL server has gone away. I tried this same process (everything from killall onward) about 3 times, and every time the server died.
Now I'm running REPAIR TABLE <table_name> QUICK and the server doesn't seem to be dying, but it's taking a really long time to fix a 200k row table.
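For reference, this is roughly the check/repair sequence I've been running (a sketch; the table name is a placeholder for my results table):
-- Check whether the MyISAM table is flagged as crashed
CHECK TABLE results;
-- Full repair; this is the step that kept killing the server
REPAIR TABLE results;
-- Repair only the index file; this is what is running now
REPAIR TABLE results QUICK;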
By the way, I only have two tables, one with 200k rows and another with 11M rows. I peak at about 2000 concurrent users, which results in about 100 queries per second max.
Is it normal that MySQL crashes this easily? Should I migrate to another, more scalable database like Cassandra? I actually prefer a relational database for my case (my results table has a request_id, an FK to a requests table, which in turn has a key_id FK to a keys table), but I wouldn't mind losing the relational benefits for actually good uptime.

I'm not sure that your question (Is it normal that MySQL crashes this easily? Should I migrate to another, more scalable database like Cassandra?) can be answered objectively, so here is my subjective answer:
No, it's not normal, and you probably shouldn't migrate, since you want a relational database and MySQL is well supported and really well known.
More info about your problem (also subjective):
MySQL is really really stable. I have MySQL services that have been running for years with regular use and have not had issues. I'm sure many others have had similar experiences. Many large companies use MySQL with great success.
What concerns me is that killall and service restart didn't work. Especially since you are running on an Ubuntu LTS version that is over 2 years old, and hence pretty stable itself. While I cannot say this with absolute certainty, it looks like you may have hardware issues, or at the very least a corrupt hard drive. You may also have software and/or kernel options or settings that are causing problems.
My suggestion to you would be to do a full hardware check (at a minimum check RAM and HDD), and do a full reinstall.
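For example, something along these lines (a sketch, assuming a Linux box with smartmontools and memtester installed; the device name is a placeholder):
# Disk health: overall SMART status plus the usual warning counters
smartctl -H /dev/sda
smartctl -a /dev/sda | grep -i -E 'reallocated|pending|uncorrectable'
# Quick userspace RAM test (1 GB, 3 passes); a full memtest86+ boot test is more thorough
memtester 1024M 3
# Kernel log: look for I/O or filesystem errors around the times MySQL died
dmesg | grep -i -E 'i/o error|ata|ext4|xfs'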

Ran out of disk space: will my live application mysql database be corrupted?

I have a large Drupal 7 site with many concurrent logged-in users running on a server where disk space recently ran out (MySQL InnoDB behind Memcached on Ubuntu 16.04). How/why this happened is a discussion for another day.
I've cleared up the disk space issue, and the site seems to be running fine as far as general interaction indicates, but the Drupal log is full of errors like this:
Uncaught exception 'PDOException' with message 'SQLSTATE[HY000]: General error: 3 Error writing file '/tmp/MYGWmIvU' (Errcode: 28 - No space left on device)' in /var/www/pixelscrapper.com/public_html/includes/database/database.inc:2229
My question now is: will the mysql database that is running Drupal likely be corrupted/flaky at this point? i.e. On a scale of 0-10, how vital is it that I restore the database to a point before the disk space ran out?
(Anything other than 0 means I will likely restore the database--but there are other things that went wrong as well here, which means that we would lose quite a few days of data if I need to restore, which is a huge drag. C'est la vie, etc.).
My assumption is that the data in MySQL may be more or less fine, but that I cannot rely on the integrity of the actual Drupal data (users, nodes, etc.) which are made up of collections of many database rows...
Out-of-space crashes can cause serious damage to databases. The sole fact that your database is able to start up and run apparently as usual is already a good indication that it was not totally messed up by the incident.
The next thing you can do is perform an in-depth scan using the mysqlcheck program:
The mysqlcheck client performs table maintenance: It checks, repairs, optimizes, or analyzes tables.
Check all tables in all databases:
$ mysqlcheck -A
Check and repair all tables in all databases:
$ mysqlcheck -A -r
NB: as explained in the documentation, you had better shut down your application and make a backup before you run this.
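For example, something along these lines (a sketch; the path and options are placeholders to adapt to your setup):
# Stop the application, then take a dump before attempting any repair
mysqldump --all-databases --single-transaction --routines --events > /backup/pre_repair_dump.sql
# Check everything first, then repair only if the check reports problems
mysqlcheck -A
mysqlcheck -A -r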

What is the risk of `mysql_upgrade` without doing `mysqldump`?

After upgrading MySQL from 5.5 to 5.7 a few months ago, I forgot to run mysql_upgrade.
Now I'm facing some problems: the mysql, sys, and performance_schema databases are missing and root privileges are broken. A lot of Access denied for user 'root'... messages pop up when I try to do anything related to MySQL user privileges.
This Stack Exchange answer should solve my problem. But I need to know it won't affect any of the schemas, data, etc.
Because my database is pretty huge: it amounts to 10 GB and consists of about 50 tables. I'm afraid something bad could happen. I know the answer will be mysqldump.
But a full backup will take a long time, maybe an hour, and the business won't accept that downtime.
So what is the risk of mysql_upgrade without doing mysqldump?
The risk of doing anything administrative to your database without backups is unacceptably high... not because of any limitations in MySQL per se, but because we're talking about something critical to your business. You should be backing it up no less often than the time interval of data you are willing to lose.
If you are using InnoDB, then use the --single-transaction option of mysqldump and there should be no locking, because MVCC handles the consistency. If you are not using InnoDB, that is a problem in itself, but using --skip-lock-tables should minimize locking.
Note that it should be quite safe to kill a mysqldump in progress if you find it is causing issues -- find the thread-id of the dump using SHOW PROCESSLIST; and then KILL QUERY #; where # is the ID of the dump connection from the process list.
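A sketch of what that looks like in practice (the path and the thread ID are placeholders):
# Non-blocking logical backup of InnoDB tables
mysqldump --single-transaction --routines --all-databases > /backup/pre_upgrade_dump.sql
-- From another session, if the dump needs to be stopped:
SHOW PROCESSLIST;      -- note the Id of the mysqldump connection
KILL QUERY 12345;      -- 12345 stands for that Id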
The potential problem with the answer you cited is that 5.1 to 5.5 is a supported upgrade path, because those two versions are sequential; 5.5 to 5.7 isn't. You should have upgraded to 5.6 and then 5.7, running the appropriate versions of mysql_upgrade both before and after each step (appropriate meaning the version of the utility matching the version of the server running at the moment).
You may be in a more delicate situation than you imagine... or you may not.
Given similar circumstances, I would not want to do anything less than stop the server completely, clone it by copying all the files to a new machine, and test the remediation steps against the clone.
If this system is business-critical, it should have a live, running replica server, that could be promoted to master and permanently replace this machine in the event of a failure. In a circumstance like this one, you would apply your fixes to the replica and promote it.
Access denied for user 'root'... may or may not be related to the schema incompatibilities.

Lost connection to MySQL server during query on random simple queries

FINAL UPDATE: We fixed this problem by finding a way to accomplish our goals without forking. But forking was the cause of the problem.
---Original Post---
I'm running a Ruby on Rails stack; our MySQL server is separate, but housed at the same site as our app servers. (We've tried swapping it out for a different MySQL server with double the specs, but no improvement was seen.)
During business hours we get a handful of these, from no particular query:
ActiveRecord::StatementInvalid: Mysql2::Error: Lost connection to MySQL server during query
Most of the queries that fail are really simple, and there seems to be no pattern between one query and another. This all started when I upgraded from Rails 4.1 to 4.2.
I'm at a loss as to what to try. Our database server is at less than 5% CPU throughout the day. I do get bug reports from users who have random interactions fail due to this, so it's not queries that have been running for hours or anything like that; of course, when they retry the exact same thing, it works.
Our servers are configured by cloud66.
So, in short: our MySQL server is going away for some reason, but it's not because of a lack of resources, and it's also a brand-new server, as we migrated from another server when this problem started.
This also happens to me on localhost while developing features sometimes, so I don't believe it's a load issue.
We're running the following:
ruby 2.2.5
rails 4.2.6
mysql2 0.4.8
UPDATE: per the first answer below I increased our max_connections variable to 500 last night, and confirmed the increase via
show global variables like 'max_connections';
I'm still getting dropped connections; the first one today happened only a few minutes ago...
ActiveRecord::StatementInvalid: Mysql2::Error: Lost connection to MySQL server during query
I ran select * from information_schema.processlist; and I got 36 rows back. Does this mean my app servers were running 36 connections at that moment? Or can a process be multiple connections?
UPDATE: I just set net_read_timeout = 60 (it was 30 before); I'll see if that helps.
UPDATE: It didn't help; I'm still looking for a solution...
Here's my database.yml with credentials removed:
production:
  adapter: mysql2
  encoding: utf8
  host: localhost
  database:
  username:
  password:
  port: 3306
  reconnect: true
The connection to MySQL can be disrupted by a number of means, but I would recommend revisiting Mario Carrion's answer since it's a very wise answer.
It seems likely that connection is disrupted because it's being shared with the other processes, causing communication protocol errors...
...this could easily happen if the connection pool is process bound, which I believe it is, in ActiveRecord, meaning that the same connection could be "checked-out" a number of times simultaneously in different processes.
The solution is that database connections must be established only AFTER the fork statement in the application server.
I'm not sure which server you're using, but if you're using a warmup feature - don't.
If you're running any database calls before the first network request - don't.
Either of these actions could potentially initialize the connection pool before forking occurs, causing the MySQL connection pool to be shared between processes while the locking system isn't.
I'm not saying this is the only possible reason for the issue, as stated by #sloth-jr, there are other options... but most of them seem less likely according to your description.
Sidenote:
I ran select * from information_schema.processlist; and I got 36 rows back. Does this mean my app servers were running 36 connections at that moment? or can a process be multiple connections?
Each process could hold a number of connections. In your case, you might have up to 500X36 connections. (see edit)
In general, the number of connections in the pool can often be the same as the number of threads in each process (it shouldn't be less than the number of threads, or contention will slow you down). Sometimes it's good to add a few more, depending on your application.
EDIT:
I apologize for ignoring the fact that the process count was referencing the MySQL data and not the application data.
The process count you showed is the MySQL server data, which seems to use a thread-per-connection IO scheme. The "Process" data actually counts active connections, not actual processes or threads (although it should translate to the number of threads as well).
This means that out of the possible 500 connections per application process (i.e., if you're using 8 processes for your application, that would be 8X500=4,000 allowed connections), your application has only opened 36 connections so far.
This indicates a timeout error. It's usually a general resource or connection error.
I would check your MySQL config for max connections on MySQL console:
show global variables like 'max_connections';
And ensure the number of pooled connections used by Rails database.yml is less than that:
pool: 10
Note that database.yml reflects the number of connections that will be pooled by a single Rails process. If you have multiple processes or other servers like Sidekiq, you'll need to add them together.
Increase max_connections if necessary in your MySQL server config (my.cnf), assuming your kit can handle it.
[mysqld]
max_connections = 100
Note other things might be blocking too, e.g. open files, but looking at connections is a good starting point.
You can also monitor active queries:
select * from information_schema.processlist;
as well as monitoring the MySQL slow log.
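If the slow log isn't enabled yet, a minimal my.cnf sketch (the file path and threshold are only examples):
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 1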
One issue may be a long-running update command. If you have a slow-running command that affects a lot of records (e.g. a whole table), it might block even the simplest queries. This means you could see random queries time out, but if you check MySQL status, the real cause is another long-running query.
Things you did not mention but you should take a look:
Are you using unicorn? If so, are you reconnecting and disconnecting in your after_fork and before_fork hooks (see the sketch after this list)?
Is reconnect: true set in your database.yml configuration?
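For reference, a minimal sketch of what those unicorn hooks usually look like (assuming ActiveRecord; adapt to your own config/unicorn.rb):
# config/unicorn.rb (sketch)
before_fork do |server, worker|
  # Close the master's connection so forked workers don't inherit and share it
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
end

after_fork do |server, worker|
  # Each worker opens its own connection after the fork
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end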
Well, at first glance this sounds like your web server is keeping the MySQL sessions open and sometimes a user runs into a timeout. Try disabling the keep-alive for MySQL sessions.
It will be a resource hog, but you only use 5% anyway...
Other tips:
Enable the MySQL slow query log and take a look.
Write a short script that pulls and logs the MySQL process list every minute and cross-check the log with the timeouts (see the sketch after this list).
Look at the pool size in your DB connection, or set one! See http://guides.rubyonrails.org/configuring.html#database-pooling; it should be in line with the max_connections MySQL is configured for!
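A sketch of such a logging script as a cron entry (the path is a placeholder; assumes the mysql client can authenticate via ~/.my.cnf or the local socket):
# /etc/cron.d/mysql-processlist (sketch): log the process list once a minute
* * * * * root { date; mysql -e 'SHOW FULL PROCESSLIST'; } >> /var/log/mysql-processlist.log 2>&1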
Good luck!
Find out if your database is limited in the number of simultaneous connections it allows, because normally an SQL database is supposed to have more than one active connection.
(Contact your network provider.)
Would you mind posting some of your queries? The MySQL documentation has this to say about it:
https://dev.mysql.com/doc/refman/5.7/en/error-lost-connection.html
TL;DR:
Network problems: are any of your boxes renewing leases periodically, or experiencing other network connection errors (netstat / ss), firewall timeouts, etc.? Not sure how managed your hosts are by cloud66...
Query timed out. This can happen if you've got commands backed up behind blocking statements (e.g. ALTERs or locking backups on MyISAM tables). How simple are your queries? No cartesian products in play? EXPLAIN on the query could help.
Exceeding max_allowed_packet. Are you storing pictures, video content, etc.?
There are lots of possibilities here, and without more information it will be difficult to pinpoint this.
Look first at mysql_error.log, then work your way from the DB server back to your application.
UPDATE: this didn't work.
Here's the solution. Special thanks to #Myst for pointing out that forking can cause issues; I had no idea to look at this particular code. The errors seemed random because we forked in this fashion in several places.
It turns out that when I was forking processes, Rails was using the same database connection for all forked processes. This created a situation where, when one of the processes (the parent process?) terminated the database connection, the remaining processes would have their connection interrupted.
The solution was to change this code:
def recalculate_completion
  Process.fork do
    if self.course
      self.course.user_groups.includes(user:[:events]).each do |ug|
        ug.recalculate_completion
      end
    end
  end
end
into this code:
def recalculate_completion
  # Give up the parent's connection so the forked child doesn't share it
  ActiveRecord::Base.remove_connection
  Process.fork do
    # The forked child opens its own connection...
    ActiveRecord::Base.establish_connection
    if self.course
      self.course.user_groups.includes(user:[:events]).each do |ug|
        ug.recalculate_completion
      end
    end
    # ...and releases it when its work is done
    ActiveRecord::Base.remove_connection
  end
  # The parent reconnects after forking
  ActiveRecord::Base.establish_connection
end
Making this change stopped the errors from our servers and everything appears to be working well now. If anyone has any more info as to why this worked I would be happy to hear it, as I would like to have a deeper understanding of this.
Edit: it turns out this didn't work either... we still got dropped connections, but not as often.
If you have the query cache enabled, please reset it and it should work.
RESET QUERY CACHE;

SQL databases get corrupted after every backup

I have a problem with my Unix server. This started a week ago. One day, after a backup (I used to keep 3 backup files), I visited a website on the server but it wouldn't work. I restarted the server and it seemed to be working fine, except for the mysql service. My attempts to restart it failed. Then I figured that was because the server was full, so I deleted one of the backups, cleaned up some space, and the mysql service restarted successfully.
Then I figured out that tables in one of the databases (MyISAM tables) were corrupt. So I repaired them with the myisamchk command via SSH and all worked fine. However, the very next day I woke up and they were corrupt again (even though MySQL was working fine), and this time there was no disk space problem on the server. I repaired them again. The next day the same thing happened, and this time InnoDB tables that were part of another database were corrupt as well. I've fixed them too, so now all is working well, but I guess the same thing will happen after tonight's backup.
I can't identify the problem and I don't know what logs to look into to understand the problem. Can anyone please help me out? Thanks very much in advance.
No easy answer here. My immediate thought is that the database is still busy when the backups commence, possibly corrupting indexes, interfering with caches, etc. Turn on full logging and check for problems when the backup starts. Maybe you will find something.
Look for the my.cnf file. On my CentOS it is located at /etc/my.cnf. It will have a config setting for the location of the error log.
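You can also ask the running server directly where the error log goes:
SHOW VARIABLES LIKE 'log_error';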
My strongest suspect is an OOM kill by the kernel, or some other issue that results from running the system out of memory. Try this:
Start top on the server and press M to sort by memory, so the biggest memory user is at the top.
Note the PID of mysqld.
Manually perform the backup while you observe the value of the RES column (resident memory size) in the top output.
Once the backup is over, see if the PID of mysqld has changed.
If the PID has changed (meaning a restart took place), and you saw the memory footprint of mysqld take up something comparable to the total amount of system memory, then my suspicion is correct, and we need to lower some settings in my.cnf to make it use less memory, e.g. key_buffer_size and innodb_buffer_pool_size.
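You can also check whether the kernel has already OOM-killed mysqld (a sketch; log paths vary by distribution, and the my.cnf values below are only example figures):
# Look for OOM killer activity involving mysqld
dmesg | grep -i -E 'out of memory|oom|killed process'
grep -i -E 'out of memory|oom' /var/log/syslog /var/log/messages 2>/dev/null

# Watch mysqld's resident memory without keeping top open
ps -C mysqld -o pid,rss,vsz,cmd

# If memory is the culprit, lower the big buffers in my.cnf, for example:
# [mysqld]
# key_buffer_size         = 128M
# innodb_buffer_pool_size = 512M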
EDIT - From the log you posted, there are additional issues, although it is not clear how they could be contributing to the table corruption. Your server appears to be running with --skip-innodb, and your backup script is not able to deal with the absence of the InnoDB storage engine: it prints exception error messages but nevertheless continues. It is also attempting to do a repair, which is failing due to the lack of system privileges (error 1 is Operation not permitted). It is possible that encountering those errors triggers some faulty logic in your backup script that leaves the tables corrupted.
At this point I would recommend disabling the MySQL backup in the cPanel tool and running mysqldump or some other solution (e.g. Xtrabackup, https://www.percona.com/doc/percona-xtrabackup/2.3/index.html) from a cron job instead.
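For example, a nightly dump driven by cron (a sketch; the schedule and paths are placeholders, and the mysql client is assumed to read credentials from ~/.my.cnf):
# /etc/cron.d/mysql-backup (sketch): nightly logical dump at 03:30
30 3 * * * root mysqldump --all-databases --routines > /backup/nightly-$(date +\%F).sql 2>> /var/log/mysql-backup.err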
EDIT2 - From the test results: the manual backup does not run the system out of memory and does not crash the server. The jury is still out on the automatic one.
Don't kill mysqld; shut it down gracefully.
Switch from MyISAM to InnoDB; the latter does not suffer from that 'error'.

Power Outage Affecting MySQL Server - Shuts Down On Its Own

I am new to dealing with MySQL settings and admin-type issues. About 4-5 hours ago, I had two power outages within 30 minutes of each other. As a result, my computer shut down both times, while in the middle of what I can only assume was around 20-30 commands on MySQL at the time. After the first, MySQL was unaffected. But after the second, something happened. MySQL Server cannot remain open for more than a few seconds at a time (before the outage, this was not a problem). I am running MySQL Server 5.1.
I can manually start MySQL Server using the admin command line (I am running this on Windows): net start mysql. I get a message saying "The MySQL service was started successfully". Then I run a command or (at most) two, and then everything stops working again with a 2013 "Lost connection to MySQL server during query" error. Then I have to restart the MySQL Server all over again.
I have some important data in the database which I cannot reach because the connection times out before I can get it out. Is there a way I can fix this connection problem easily? I know my data is in there, because I have gotten a fair amount of it out.
Any help would be appreciated. Please let me know what other information you might need, and how I can get it. I have been trying to find the error log for mysql, and have not found it yet.
And, yes, if I get through this, and even if I don't, I will make sure to create a system to back up the data on a regular basis so these types of failures aren't so catastrophic in the future.
Thanks in advance