Google Cloud SQL - CPU at 100% - MySQL

Earlier we noticed that our master DB CPU started spiking.
There wasn't any unusual traffic volume or load. Also, the earlier spikes coincide with the Google backups, but it looks like there wasn't one on the 19th, despite the operations logs saying one was run. I'm guessing the Google backup went wrong on the server and got out of control the next morning when it eventually ran.
I've cloned that server and moved the traffic across to the new server, and the CPU has now dropped to 10-20%, but this is still a lot higher than normal (1-5%).
Things that I've checked:
- Process list
- Traffic volumes
- DB/Table sizes
Any ideas on how to get to the bottom of what's causing the change, or how to fix it?

High CPU usage in a database can be caused by a bunch of different things: an inefficient or runaway query, a backup process gone wrong, or a few other likely suspects.
If your app can tolerate downtime, you could try shutting the instance down and restarting it to get a fresh state.
If you have a support package, you can also open a ticket and ask them to look into the spike further. If you don't, you can still open an issue on the Cloud SQL issue tracker, but the response time might not be as fast.
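If you want to dig in yourself first, the statement digest summary is a good starting point, since it accumulates across all connections. A minimal sketch, assuming MySQL 5.6+ with performance_schema enabled (on Cloud SQL this can be turned on via the performance_schema flag):

    -- Top statements by total execution time since the stats were last reset.
    -- Timer columns are in picoseconds, hence the division to get seconds.
    SELECT digest_text,
           count_star          AS executions,
           sum_timer_wait/1e12 AS total_seconds,
           avg_timer_wait/1e12 AS avg_seconds
    FROM performance_schema.events_statements_summary_by_digest
    ORDER BY sum_timer_wait DESC
    LIMIT 10;

If one digest dominates the total, that query (or whatever schedules it) is the first thing to explain.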

Related

Why is an idle Cloud SQL instance showing 10 qps of write requests?

I have connected an App Maker app to a 2nd generation MySQL instance on GCP.
All seems to work fine, but I noticed that the Cloud Console believes this instance sees 10 write ops per second at all times, even when nothing should be running.
The SQL logs seem to say that there are no requests. Billing does not look off, so I'm wondering if I'm seeing something like prober requests, although 10 QPS is a bit high for that, and I would expect to see something in the logs.
Any insights would be very much appreciated.
Update: It looks like any GCP MySQL instance has a heartbeat every 2 seconds, or every second if automatic backups are enabled.
These heartbeats seem pretty cheap in terms of CPU utilization, but they appear to make storage grow slowly over time.
I'm still interested to know whether the heartbeat frequency can be tuned lower for non-replicated setups (the replication heartbeat frequency can be tuned).
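Two related sketches, assuming direct MySQL access (on managed Cloud SQL some of these statements may not be permitted). The first verifies the write rate from the server's own counters rather than the console graphs; the second is the replication-side knob mentioned above, which does not control the instance heartbeat:

    -- Sample the write counters twice, ~10 seconds apart, and divide the
    -- deltas by the interval to get the actual write rate.
    SHOW GLOBAL STATUS WHERE Variable_name IN
        ('Com_insert', 'Com_update', 'Com_delete', 'Innodb_rows_inserted');

    -- On a replica, the replication heartbeat interval is tunable:
    STOP SLAVE;
    CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD = 10;  -- seconds
    START SLAVE;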

AWS MySQL RDS huge CPU spike and rapid storage loss - possible attack?

As you can see from the screenshot attached, we have experienced a sudden CPU spike and storage loss. We nearly lost all storage and had to increase it manually.
When we check the database size, it still has the size from before this occurred, so it seems it's not database related. We have checked a lot of things (slow logs etc.) but couldn't find the problem.
Is it possible there has been an attack? Any other ideas why this happened, and how can we recover our free storage?
Thank you.
It's hard to say what the exact issue is, but looking at the graph, it looks like you have some huge query running against your database that is filling up the temp space. When it runs out of room, the query gets killed, which then flushes a bunch of writes to disk; those writes could either be related to the query/statement or simply be unrelated queued inserts/updates.
You need to look at the slow query log, if it's enabled, to see if there's anything unusual there, and also check your application(s) to see if they were trying to execute a ridiculous query/statement that was hammering the database.
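A sketch of those checks, assuming MySQL on RDS (these are standard MySQL variables and counters):

    -- Is the slow query log enabled, and where does it go?
    SHOW VARIABLES LIKE 'slow_query_log%';
    SHOW VARIABLES LIKE 'long_query_time';

    -- How often have queries spilled in-memory temp tables to disk?
    -- A high Created_tmp_disk_tables relative to Created_tmp_tables points
    -- at exactly the kind of temp-space-hungry query described above.
    SHOW GLOBAL STATUS LIKE 'Created_tmp%';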

Unexpected CPU spikes

I'm working with SQL Server 2008 R2. In the development environment, from time to time, I can see that the CPU is under load for a couple of minutes (around 55%-80%, while normal is 1-2%. Yes, really: normal load in my development environment is almost none!). During these CPU pressure periods, automated tests sometimes get timeout errors.
I just experienced a timeout in Activity Monitor; it looks like this:
Typically, during these pressure moments, it looks like this:
The problem is that I can't understand why this is happening! There are continuously executing automated tests, but they are not creating a heavy workload. During performance tests the system works well, and when it slows down there is always a good explanation for that.
I'm trying to resolve the issue by:
- Running a trace. But during those CPU spikes there is "nothing special" going on; no expensive queries.
- Using SQL Activity Monitor. Everything seems normal except the CPU (just 1-2 waiting tasks, low I/O, ~5 requests/sec), and the recent expensive queries are not that expensive.
- Querying data, using the famous sp_WhoIsActive and sys.dm_exec_requests. As far as I understand, nothing unusual there either.
About my server:
- There is a small number of databases, and I know them well.
- Service Broker is in use.
- A trace is running most of the time.
I do suspect that some background process is causing the problems, but I don't really get it. Can you please give some hints/ideas on how to resolve this?
It could be some internal SQL Server job, like a big index rebuild.
Wait for the spike and run sp_who2 'active', then check the CPU time column.
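For reference, during the spike that looks something like this (sp_who2 is a standard, if undocumented, system procedure; the DMV query is an alternative that includes the statement text):

    -- Who is active right now, with cumulative CPU per session:
    EXEC sp_who2 'active';

    -- Or the DMV route: current requests ordered by CPU, with their SQL text.
    SELECT r.session_id,
           r.cpu_time,
           r.status,
           r.wait_type,
           t.text AS query_text
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    ORDER BY r.cpu_time DESC;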
Actually, how are you 100% sure that SQL Server is responsible? Couldn't it be an OS issue?
I have faced the same issue and raised a case with Microsoft.
The Microsoft engineer said there was no issue on the SQL Server side when the CPU spiked. The issue was eventually resolved by Microsoft; it actually turned out to be IIS, not SQL Server.
IIS needed to be restarted every 29 days to keep the application performing well.

How to find out what is causing a slow down of the application?

This is not the typical question, but I'm out of ideas and don't know where else to go. If there are better places to ask this, just point me there in the comments. Thanks.
Situation
We have this web application that uses Zend Framework, so it runs in PHP on an Apache web server. We use MySQL for data storage and memcached for object caching.
The application has a very unusual usage and load pattern. It is a mobile web application where, every full hour, a cronjob looks through the database for users that have some information waiting or an action to take, and sends this information to an (external) notification server, which pushes these notifications to them. After the users get these notifications, they go to the app and use it, mostly for a very short time. An hour later, the same thing happens.
Problem
In the last few weeks, usage of the application has really started to grow. In the last few days we have encountered very high load and a doubling of application response times during and after the sending of these notifications (so basically every hour). The server doesn't crash or stop responding to requests; it just gets slower and slower and often takes 20 minutes to recover, until the same thing starts again at the full hour.
We have extensive monitoring in place (New Relic, collectd), but I can't figure out what's wrong; I can't find the bottleneck. That's where you come in:
Can you help me figure out what's wrong and maybe how to fix it?
Additional information
The server is a 16-core Intel Xeon (8 cores with hyperthreading, I think) with 12GB RAM, running Ubuntu 10.04 (Linux 3.2.4-20120307 x86_64). Apache is 2.2.x and PHP is version 5.3.2-1ubuntu4.11.
If any configuration information would help analyze the problem, just comment and I will add it.
Graphs
- info: phpinfo(), APC status, memcache status
- collectd: processes, CPU, Apache, load, MySQL, vmem, disk
- New Relic: application performance, server overview, processes, network, disks
(Sorry the graphs are GIFs and don't cover the same time period, but I think the most important info is in there.)
The problem is almost certainly MySQL-based. If you look at the final graph (mysql/mysql_threads), you can see the number of threads hits 200 (which I assume is your setting for max_connections) at 20:00. Once max_connections has been hit, things do tend to take a while to recover.
Using mtop to monitor MySQL just before the hour will really help you figure out what is going on, but if you cannot install it you can just use SHOW PROCESSLIST;. You will need to establish your connection to MySQL before the problem hits. You will probably see lots of processes queued with only one process currently executing; that will be the most likely culprit.
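A minimal sketch of that approach; information_schema.processlist is the queryable equivalent of SHOW PROCESSLIST:

    -- Connect before the top of the hour, then watch what piles up.
    SHOW VARIABLES LIKE 'max_connections';

    -- Longest-running statements first; 'state' shows what they're stuck on.
    SELECT id, user, db, time, state, info
    FROM information_schema.processlist
    WHERE command != 'Sleep'
    ORDER BY time DESC;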
Having identified the query causing the problems, you can attack your code. Without understanding how your application actually works, my best guess is that wrapping the problem query(ies) in an explicit transaction will probably solve the problem.
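For illustration only (the actual statements aren't shown in the question), the suggestion amounts to something like this, so the hourly batch commits once instead of autocommitting every statement:

    START TRANSACTION;
    -- ... the per-user updates the hourly cron job performs go here ...
    COMMIT;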
Good luck!

How to benchmark and optimize a really database-intensive Rails action?

There is an action in the admin section of a client's site, say Admin::Analytics (which I did not build but have to maintain), that compiles site usage analytics by performing a couple dozen rather intensive database queries. This functionality has always been a bottleneck to application performance whenever the analytics report is being compiled. But the bottleneck has become so bad lately that, when accessed, the site comes to a screeching halt and hangs indefinitely. Until yesterday I never had a reason to run the "top" command on the server, but doing so I realized that Admin::Analytics#index causes mysqld to spin at upwards of 350%+ CPU on the quad-core production VPS.
I have downloaded fresh copies of the production data and the production log. However, when I access Admin::Analytics#index locally on my development box, using the production data, it loads in about 10-12 seconds (and utilizes ~150%+ of my dual-core CPU), which sadly is normal. I suppose there could be a discrepancy in MySQL settings that has suddenly come into play. Also, a mysqldump of the database is now 531 MB, when it was only 336 MB 28 days ago. Anyway, I do not have root access on the VPS, so tweaking mysqld performance would be cumbersome, and I would really like to get to the exact cause of this problem. However, the production logs don't contain info on the queries; they merely report how long these requests took, which averages out to a few minutes apiece (although they seem to have caused mysqld to stall for much longer than that, and in one instance prompted me to ask our host to reboot mysqld just to get our site back up).
I suppose I could try upping the log level in production to get info on the database queries being performed by Admin::Analytics#index, but at the same time I'm afraid to replicate this behavior in production, because I don't feel like calling our host up to restart mysqld again! This action contains a single database request in its controller, and a couple dozen prepared statements embedded in its view!
How would you proceed to benchmark/diagnose and optimize/fix this action?!
(Aside: Obviously I would like to completely replace this functionality with Google Analytics or a similar solution, but I need to fix this problem before proceeding.)
I'd recommend taking a look at this article:
http://axonflux.com/building-and-scaling-a-startup
Particularly, query_reviewer and newrelic have been life-savers for me.
I appreciate all the help with this, but what turned out to be the fix was to implement a couple of indexes on the analytics table to cater to the queries in this action. A simple Rails migration to add the indexes, and the action now loads in less than a second, both on my dev box and in prod!
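For illustration (the post doesn't give the actual table or column names, so these are hypothetical), the fix boils down to adding indexes that match the columns the analytics queries filter and group on, e.g.:

    -- Hypothetical names; a composite index covering the WHERE/GROUP BY
    -- columns lets MySQL skip the full-table scans that were pinning the CPU.
    CREATE INDEX index_analytics_on_user_id_and_created_at
        ON analytics (user_id, created_at);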