I am using Couchbase Server Community Edition 4.0.0. I see the log lines below repeatedly in the babysitter.log file; the frequency is higher when the cluster is under load.
WARNING 473: Slow GET operation on connection (10.3.4.14:55846 => 10.3.9.13:11210): 4325 ms
WARNING 1057: Slow DELETE operation on connection (10.3.2.23:46152 => 10.3.9.13:11210): 1280 ms
I can find little to no documentation for these warnings. What do these logs mean, and how can I debug the cause of the slow operations further?
At some point we added logging to Couchbase to help identify bigger problems that come from resource exhaustion, by correlating them with smaller problems that happen earlier. These are some of those warnings.
They are generally safe to ignore, but they do indicate that your system might be under heavy load, and that the process may not be getting enough time to send, receive and handle results.
I'd also say that 4.0 is quite old at this stage; Couchbase has improved a great deal since then and ships much newer runtimes. In particular, this message comes from an Erlang process, one of the components whose runtime we have updated. On a newer version I'd expect you to see less of this, and perhaps a little more detail when it does appear.
Related
I've been working with XAMPP, which I installed just under a year ago. I recently installed a few frontend tools for MySQL (to see which one I am most comfortable with).
Now, for the past two days, whenever I go to localhost/phpmysql, I get this warning:
Fatal error: Maximum execution time of 30 seconds exceeded in C:\xampp\phpMyAdmin\lib..
I understand that the maximum execution time is being exceeded here. I found a few posts on Stack Overflow that explain how to fix it. All well and good up to here.
I have a question and a concern.
Question
Why this error all of a sudden when, as I clearly remember, I did nothing to change MySQL's default settings?
Concern
I am working on a project which uses a database (it's important, I cannot lose it). When phpMyAdmin is refreshed after the warning, it starts working normally, as if there had never been a problem. I'll need a couple of weeks to finish my project. Can I continue with this timeout error without risking my database, or should I try to fix it right away?
DCoder's answer is good: try changing the maximum execution time (the error comes from PHP's max_execution_time setting in php.ini, not from MySQL itself). You can also try to find out which query is making the noise by using the slow query log (you can activate it, and it is very useful for tracking down messy queries).
Check this page for more information
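For reference, on MySQL 5.1 and later the slow query log can be switched on at runtime; a minimal sketch (the 2-second threshold is just an example):

    -- Enable the slow query log without restarting the server (MySQL 5.1+)
    SET GLOBAL slow_query_log = 'ON';
    -- Log any statement that takes longer than 2 seconds
    SET GLOBAL long_query_time = 2;
    -- Confirm the settings and see where the log file is written
    SHOW VARIABLES LIKE 'slow_query%';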
If you're running MySQL in a Windows environment, I strongly suggest putting it on a Linux server instead, because it's a lot slower on Windows.
About your concern: there is no real danger to your data, and changing the slow query log option or the maximum execution time doesn't compromise the databases... but make a backup before changing anything, just for extra safety.
Apologies for the fairly generic nature of the question - I'm simply hoping someone can contribute some suggestions and/or ideas as I'm out of both!
The background:
We run a fairly large (35M hits/month, peaking around 170 connections/sec) site which offers free software downloads (strictly legal) and which is written in ASP.NET 2 (VB.NET :( ). We have two web servers sat behind a dedicated hardware load balancer; both are fairly chunky machines running Windows Server 2012 Pro 64-bit and IIS 8. We serve extensionless URLs by using a custom 404 page which parses out the requested URL and Server.Transfers appropriately. Because of this particular component, we have to run in classic pipeline mode.
DB-wise we use MySQL, with two replicated DBs; reads are mainly done from the slave. DB access is via a DevArt library and is extensively cached.
The Problem:
We recently (in the past few months) moved from older servers running Windows Server 2003 and IIS 6. In the process, we also upgraded the DevArt component and MySQL (5.1). Since then, we have suffered intermittent scalability issues, which have become significantly worse as we have added more content. We recently increased the number of programs from 2000 to 4000, and this caused response times to increase from <300ms to over 3000ms (measured with New Relic). To my mind this points either to a bottleneck in the DB (relatively unlikely, given the extensive caching and what DB monitoring shows) or to a badly written query or code problem.
We also regularly see spikes which seem to coincide with cache refreshes, which could support the badly-written-query argument. Unfortunately all caching is done for x minutes from retrieval, so the cause can't always be pinpointed accurately.
All our caching uses locks (as in "What is the best way to lock cache in asp.net?"), so it could be that one specific operation is taking a while and backing up requests behind it.
The problem is... I can't find it!! Can anyone suggest from experience some tools or methods? I've tried to load test, I've profiled the code, I've read through it line by line... NewRelic Pro was doing a good job for us, but the trial expired and for political reasons we haven't purchased a full licence yet. Maybe WinDbg is the way forward?
Looking forward to any insight anyone can add :)
It is not a good idea to guess at a solution; things can get painful or expensive quickly. You really should start with some standard/common triage techniques and make an educated decision.
The standard process for troubleshooting performance problems in a data-driven app goes like this:
Review DB indexes (unlikely to be the issue) and tune as needed.
Check resource utilization: CPU and RAM. If your CPU is maxed out, consider adding/upgrading the CPU, optimizing the code, or splitting your tiers. If your RAM is maxed out, consider adding RAM or splitting your tiers. I realize that you just bought new hardware, but you also changed OS and IIS, so all bets are off. Take the 10 minutes to confirm that you have enough CPU and RAM, so you can confidently eliminate those from the list.
Check HDD usage: if your disk queue length goes above 1 very often (more than once per 10 seconds), upgrade disk bandwidth or scale out your disk (RAID, multiple MDF/LDF files, DB partitioning). Check this on each MySQL box.
Check network bandwidth (very unlikely, but check it anyway)
Code: a) Consider upgrading to .NET 3.5 (or above); it was designed for better scalability and has much better options for caching. b) Use newer/improved caching. c) Pick through the code for harsh queries and DB usage (a quick sketch follows below). I have had really good experiences with Red Gate ANTS, but equivalent products work well too.
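For (c), a cheap first pass over any suspect query is EXPLAIN. The query below is purely hypothetical (your table and column names will differ); it is only meant to show what to look for in the output:

    -- Hypothetical example of EXPLAIN-ing a suspect query
    EXPLAIN SELECT p.title, p.download_count
    FROM programs AS p
    JOIN categories AS c ON c.id = p.category_id
    WHERE c.name = 'utilities'
    ORDER BY p.download_count DESC;
    -- In the output, watch for type = ALL (full table scan), key = NULL
    -- (no index used), and large estimates in the rows column.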
And then things get more specific to your architecture, code and platform.
There are also some locking mechanisms for the Application variable, but they are rarely the cause of lockups.
You might want to keep an eye on your application pool recycle statistics. If you have a memory leak (or connection leak, etc.), IIS might seem to freeze when the pool tops out and restarts.
I'm working with SQL Server 2008 R2. In my development environment, from time to time I can see that the CPU is under load for a couple of minutes (around 55%-80%, while normal is 1-2%; yes, really, normal load in my development environment is almost none!). During these CPU pressure periods, automated tests sometimes get timeout errors.
I just experienced a timeout in Activity Monitor; it looks like this:
Typically, during these pressure moments, it looks like this:
The problem is that I can't understand why it is happening! Automated tests are executed continuously, but they do not create a heavy workload. During performance tests the system works well, and when it does slow down there is always a good explanation for it.
I'm trying to resolve the issue by:
Running a trace. But during those CPU spikes there is "nothing special" going on: no expensive queries.
Using SQL Activity Monitor. Everything seems normal except the CPU (just 1-2 waiting tasks, low I/O, ~5 requests/sec), and the recent expensive queries are not that expensive.
Querying data, using the famous sp_WhoIsActive and sys.dm_exec_requests. As far as I understand, nothing unusual there either.
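For reference, the kind of snapshot I take with the DMVs is roughly this (nothing exotic):

    -- Rank currently executing requests by CPU consumed so far
    SELECT r.session_id, r.status, r.command, r.cpu_time, r.wait_type,
           t.text AS sql_text
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.session_id <> @@SPID  -- leave out this monitoring query itself
    ORDER BY r.cpu_time DESC;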
About my server
There is a small number of databases, and I know them well.
We use Service Broker.
A trace is running most of the time.
I suspect that some background process is causing the problems, but I can't pin it down. Can you please give me some hints/ideas on how to resolve this?
It could be some internal SQL Server job, like a big index rebuild.
Wait for the spike and run sp_who2 'active', then check the CPUTime column.
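Concretely, something like the following. (The session_id <= 50 cutoff for internal tasks is a common rule of thumb, not a hard guarantee.)

    -- Current activity, with per-session CPU in the CPUTime column
    EXEC sp_who2 'active';

    -- Check whether internal/background tasks are the ones burning CPU
    SELECT session_id, command, cpu_time, wait_type
    FROM sys.dm_exec_requests
    WHERE session_id <= 50
    ORDER BY cpu_time DESC;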
Actually, how are you 100% sure that SQL Server is responsible? Couldn't it be an OS issue?
I faced the same issue and raised a case with Microsoft.
The Microsoft engineer said there was no issue on the SQL DB side when the CPU spiked. The issue was finally resolved by Microsoft; it actually turned out to be on IIS, not SQL Server.
IIS needed to be restarted every 29 days to keep the application performing well.
This is not a typical question, but I'm out of ideas and don't know where else to go. If there is a better place to ask this, just point me there in the comments. Thanks.
Situation
We have a web application that uses Zend Framework, so it runs in PHP on an Apache web server. We use MySQL for data storage and memcached for object caching.
The application has a very distinctive usage and load pattern. It is a mobile web application where, every full hour, a cronjob looks through the database for users that have some information waiting or an action to do, and sends this information to an (external) notification server, which pushes these notifications to them. After the users get these notifications, they go to the app and use it, mostly for a very short time. An hour later, the same thing happens.
Problem
In the last few weeks, usage of the application has really started to grow. In the last few days we have encountered very high load and a doubling of application response times during and after the sending of these notifications (so basically every hour). The server doesn't crash or stop responding to requests; it just gets slower and slower, and often takes 20 minutes to recover, until the same thing starts again at the full hour.
We have extensive monitoring in place (New Relic, collectd), but I can't figure out what's wrong; I can't find the bottleneck. That's where you come in:
Can you help me figure out what's wrong and maybe how to fix it?
Additional information
The server is a 16 core Intel Xeon (8 cores with hyperthreading, I think) and 12GB RAM running Ubuntu 10.04 (Linux 3.2.4-20120307 x86_64). Apache is 2.2.x and PHP is Version 5.3.2-1ubuntu4.11.
If any configuration information would help analyze the problem, just comment and I will add it.
Graphs
info: phpinfo(), APC status, memcache status
collectd: processes, CPU, Apache, load, MySQL, vmem, disk
New Relic: application performance, server overview, processes, network, disks
(Sorry the graphs are GIFs and not from the same time period, but I think the most important info is in there.)
The problem is almost certainly MySQL-based. If you look at the final graph, mysql/mysql_threads, you can see the number of threads hit 200 (which I assume is your setting for max_connections) at 20:00. Once max_connections has been hit, things do tend to take a while to recover.
Using mtop to monitor MySQL just before the hour will really help you figure out what is going on, but if you cannot install it you can just use SHOW PROCESSLIST;. You will need to establish your connection to MySQL before the problem hits. You will probably see lots of processes queued up with only one process currently executing; that one will be the most likely culprit.
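If mtop isn't available, the bare minimum from a mysql prompt opened before the hour would be something like:

    -- Confirm the ceiling the thread graph is hitting
    SHOW VARIABLES LIKE 'max_connections';
    -- See how close you currently are to it
    SHOW GLOBAL STATUS LIKE 'Threads_connected';
    -- Queued statements show up here with growing values in the Time column
    SHOW FULL PROCESSLIST;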
Having identified the query causing the problems, you can attack your code. Without knowing how your application actually works, my best guess is that wrapping the problem query (or queries) in an explicit transaction will probably solve the problem.
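A minimal sketch of that idea; the table and column names are placeholders, since I don't know your schema. Grouped into one explicit transaction, the statements pay the commit cost once instead of once per statement under autocommit:

    START TRANSACTION;
    -- Hypothetical per-user updates that would otherwise each run
    -- as their own implicit (autocommit) transaction:
    UPDATE notifications SET sent_at = NOW() WHERE user_id = 42;
    UPDATE users SET last_notified_at = NOW() WHERE id = 42;
    COMMIT;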
Good luck!
I've written a C program that runs multiple threads and uses MySQL. During testing I repeatedly saw the error "MySQL server has gone away" (with hours in between), so I maximized MySQL's wait_timeout setting. But now I get the error "Lost connection to MySQL server during query". These errors only occur when I run the program on a multi-core processor.
Do you know what's wrong, or what I have to do to get my threaded program running reliably?
If you've got a multithreaded program that behaves differently on a 1-core system and a multicore system (works on 1-core, has bugs on multicore), it's written incorrectly: that's a sure sign of a race condition. The code is actually incorrect, and when scheduled just wrong it will trample on its own data; that is happening in practice on the multicore system and not on the 1-core one.
Actually, the same problem could occur on the 1-core system too; it's just less likely and rarer, because the threads can't be scheduled truly simultaneously, so one thread has to preempt the other at exactly the wrong time for you to see the buggy behavior. This is why, if you're writing multithreaded code, you should always test and debug it on a multicore host: you're much more likely to actually see the evidence of race conditions, whereas on a 1-core host they can stay hidden for much longer.
I don't know which libraries you're using, but either they are not thread-safe or you're not using them in a thread-safe fashion. (With the MySQL client library in particular, a common mistake of exactly this kind is sharing one connection handle across threads; the documented rule is one connection per thread, or explicit locking around every use of a shared one.)