Why is large MySQL database crashing app?

Why is large MySQL database crashing app? - openshift

I have a webapp which I'm trying to deploy to OpenShift, which serves historical plots of market data. The MySQL database is around 600MB and has >12,000,000 rows over about 6 tables. If I deploy and run the app, it may run at most a couple minutes, but then it crashes with exceptions related to null database connections. I tried restarting mysql and looking at the quota (~750MB of 1024MB) to no avail. If I reduce the database size drastically, the webapp runs fine. So far a couple hours with no crashes. I also have another webapp on OpenShift that uses a small MySQL database that runs just fine.
From what I could tell from reading the documentation is that a gear provides up to 1 GB of storage, and I'm under that amount. If I lower my data footprint drastically, MySQL behaves fine. Any ideas regarding why with a 600MB database, the app fails and what limitations are causing failure? The webapp has also been running solidly on a VPS for about 2 years with no issues, so I'm pretty confident the problem lies within OpenShift's MySQL cartridge.
Edit: Additional Information
The MySQL cartridge is on a small gear and shared with the webapp.
The number of connections is 5 from a pool

Related

Applications downs due to heavy MySQL server load

We have a 2GB Digital Ocean server, and it is dedicated for a MySQL server of other two PHP servers. we are using Percona MySQL Server 5.6 on this server. We configured MySQL replication and these configuration is working fine
Our issue is sometime our site monitoring tools reporting that some of the URL hosted with this server is down (May be this is happening once in a week or two). When I am checking, I could see that Mysql Master server load is too much high (May be 35 - 40), so the MySQL server was not responded. # that I usually do a MySQl service restart, this restart cause to server load become normal and the sites started working after service restart.
This is a back-end MySQL database server of 20-25 PHP applications (WordPress, Drupal and some custom applications server).
Here are my questions,
Why this server load automatically goes down, after a spikes happens?
Is there any way in which database is causing issues? So that I can identify the application too.
How can I identify the root cause of this issues

Depending upon your working dataset, a 2GB server providing access for 20-25 PHP applications (WordPress, Drupal and some custom applications server) could be the issue.
For example, if you have a 1.4GB buffer pool (assuming all tables are InnnoDB) and 10GB of data, then your various applications could end up competing for resources, such as I/O, buffer pool pages, Adaptive Hash Index, query cache. They could also, assuming caching is used, be invalidating theit caches within a similar timeframe, thus sending expensive queries to the database.
Whilst a load of 50 is something that you would normally want to avoid, the load average is not something that you should concern yourself with if showing in isolation.
The use of the uninterruptible state has since grown in the Linux
kernel, and nowadays includes uninterruptible lock primitives. If the
load average is a measure of demand in terms of running and waiting
threads (and not strictly threads wanting hardware resources), then
they are still working the way we want them to.
http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
If the issue is happening once per week then it is starting to sound like a batch process, or cache expiration issue - too much happening at once for the resources available.
The best thing to do is to monitor and look for the cause. Since you are already using Percona Server, using PMM should give you the perfect insight to find the cause, although it works with Oracle MySQL, MariaDB, Aurora, etc. You can try a demo to see the insights that you can gain:
https://pmmdemo.percona.com. The software is Open Source and free to use.
You can look in QAN to find the most expensive queries, whilst looking at Prometheus data to give an insight into the host itself. There are some recommendations to get the most from PMM, depending upon your flavour of MySQL.

AWS MySql RDS: "Got error 28 from storage engine"

I’m working on a relatively small application, serving about 1,500 users and running on a Mysql database that is about 300 megs. The entire system runs on AWS with a single dedicated EC2 node running the Grails application on Tomcat 8 and a single dedicated Mysql RDS instance. The system has been running live in production for about three years with no database issues. The two largest tables contain about 40k records. The application is built using Grails and Java 1.7.
Yesterday our application began throwing the following exception, with the underlying error message of:
"Got error 28 from storage engine"
The logs available from the RDS admin web console are empty.
Googling has not revealed any promising leads that have helped us resolve the issue, other than most messages point out that the disk is out of space. Given that most search results refer to disk space. Being software developers rather than DBA's with significant Mysql expertise, we boosted the storage space of the Mysql RDS instance. Unfortunately, today our application is still sporadically throwing the same exception. Having created our Mysql DRS instance with 15 gigs of space -- which is several orders of magnitude of additional space than our application makes use of -- we are at a lost as to what is the root cause of this issue. Our guess is that there is possibly some out of the box Mysql limitation that we are hitting up against but have no idea what it may be or how to solve it. Indeed, the whole reason we host onRDS was to avoid issues of this type.
Doing some Googling, this seems like a somewhat common Mysql error but that does not have any concrete trail for us to follow. Most suggestions talk about checking the filesystem or "inode" space. Given that this is a hosted Mysql RDS instance on AWS, I am unsure if or how to check such things. Looking at the CloudWatch for the RDS instance, I can see that the CPU is idle and that the instance is dramatically under the 15 gig storage limit.
Does anyone have any suggestions for us to investigate?
Given that we are new to RDS, can you please point us to any documentation or -- even better -- suggest what settings we can tweak in the RDS console -- to help prevent this error from occuring? Ideally, we moved to RDS thinking that if this was a mysql sizing or scaling issue that moving to RDS would solve the problem. As a last resort, this morning we then deleted about 20k rows of unessential data. Unfortunately, the issue persists and we continue to experience the issue.
A few questions:
Are there any RDS settings we can adjust to avoid this issue?
Can this be solved by moving to a larger RDS instance, perhaps with more memory?
Would we experience this issue if we moved to Aurora?

Well from your comment this is definitely a low on storage issue. Because 13 Gb is very less storage. you can check the free available storage in the dashboard. Check this screenshot below in the "Storage" metric under monitoring if it goes ahead of the red line you will start getting Error 28. You will have to increase the storage of your RDS instance or free up some space. I will suggest increase the storage to avoid this issue in future.

MySQL hanging in Writing to Net

I have problem, when MySQL thread sometime stuck at status "Writing to net".
I have 4 Apache server (2.4) (requests are load-balanced on them) a 1 MySQL (MariaDB 10). Apache is executing php56. All Apache servers have same configuration. All servers runs on CentOS 7. SElinux is disabled on Apache servers for debug reasons. No problems in audit logs on DB server. All servers are virtual and located on same cluster (VMware).
Problem appear only on specific pages and specific queries to DB.
Usually there is around 100-200 separate queries on page and most of them takes 0.0001-0.0010 s. But then I have one query that takes around 1-2sec. The query itself take much lesser time (around 0.0045s).
Problematic query returns around 8984 rows and when executed from CLI from debug script, it is executed fast as expected.
Strange is that in time some Apache servers execute that page quickly, and some slowly. It changes (during day). Also I tried remove one Apache server from cluster and then send same request. If server is not under any load, it usually responds fast.
All server have enough resources (CPU and RAM) so it is definitely not load issue. They usually have around 4-10 active Apache workers (prefork) and have capacity for 100 active workers.
I tried debugging with tcpdump and when requesting page, I can see packet flow for fast queries and then it stops for a while and resumes. Not sure if the problem is on MySQL server or on Apache server.
My guess is that I am hitting some kind of limit, but I have no idea which one.

The solution is quite odd.
First few more details:
All Apache severs have same application data (PHP files, images, etc.) Mounted from NFS. The NFS share was working fine (low latency, no data corruption).
Solution:
When I was desperate I went through every possible log. Then I noticed that iptables are dropping some packets from NFS server. Well I said to myself that I should probably fix that, even when its not related.
But after I allowed all traffic from NFS to my Apache servers, MySQL status "writing to net" disappeared and all websites started to respond quickly.

Amazon RDS (Mysql2::Error 110)

I've had a Rails application running in production for the past 6 months, with weekly deployments, without any issue.
Now, I've been having a recurring issue for about 3 weeks and it seems to get worst every week.
When my app boots and reaches the point where it's trying to connect to the DB, I get this error :
Can't connect to MySQL server on '***.amazonaws.com' (110) (Mysql2::Error)
AFAIK, this error tells me that I've reached MySQL's max connections limit.
From the configs, I should be able to open 296 connections. My app is set to run 7 instances with each a database connection pool of 5, so it can't really exceed 70 connections when deploying a new instance.
I've never seen the connection count go above 20 in either the AWS RDS Console or the SHOW PROCESSLIST command.
I don't think it has anything to do with either Rails or my application server (Puma), since I can't connect through the MySQL Command-Line Tool when the issue occurs.
Has anyone had a similar issue with MySQL on RDS or MySQL itself?

The database pool isn't per application, it's per process. If it's threading/multi process per instance it could be using more than that. Have you tried restarting mysql? It sounds like you have some hanging connections for whatever reason.

I've been getting these issues recently. Could it be related to the pending-restart change of parameter group on my RDS Instance? I sure hope not. As I understand a pending change should have no effect on the current performance.

2 Rails applications utilizing 1 MySQL database - what are the risks?

ENVIRONMENT:
Ubuntu hardy LTS, Apache 2, Passenger
Virtual Server One: rails 2.3.8 application accessing MySQL common_database
Virtual Server Two: rails 3.0.4 application accessing the same MySQL common_database
The first application gets regular use from our clients. The second application is released but seeing light usage currently. The database seems to be working fine.
Someone advised me that this configuration could wind up corrupting the database. We anticipate that the second application will eventually get heavy usage. It would kill some momentum if the database became corrupted. A few more facts:
Neither application has the ability to change database structure.
Both apps will be accessing the same tables at the same time.
It is impossible for both apps to attempt to update the same record at the same time.
Has anyone experienced corruption of a MySQL database that is accessed in this kind of configuration? Were you able to overcome the problem? How?

What was the case made for risk of corruption?
As long as you aren't using rake db:migrate and both apps have the same models, you have roughly the same risks as if two or more web servers with the same application version have been spun up. If there is a divergence in expectation of what the database schema looks like between the two apps, you may run into issues, especially if the divergence is related to foreign keys, or application logic that is based on magic values.
With any concurrent manipulation of a database you run into a category of issues around modifying the same records simultaneously (who wins when two clients try to modify the same piece of data; what kind of locking model do you use, etc).

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008