Joomla 2.5.14 - MySQL memory issue

Can someone explain this Joomla error to me, and what to do to fix it? It shows up with debug on. We seem to have a memory leak: MySQL runs slowly, then the site crashes.
Function Location
JSite->dispatch() /home/greatfam/public_html/index.php:52
JComponentHelper::renderComponent() /home/greatfam/public_html/includes/application.php:197
JComponentHelper::executeComponent() /home/greatfam/public_html/libraries/joomla/application/component/helper.php:351
require_once() /home/greatfam/public_html/libraries/joomla/application/component/helper.php:383
JController->execute() /home/greatfam/public_html/components/com_content/content.php:16
ContentController->display() /home/greatfam/public_html/libraries/joomla/application/component/controller.php:761
JController->display() /home/greatfam/public_html/components/com_content/controller.php:74
ContentViewArticle->display() /home/greatfam/public_html/libraries/joomla/application/component/controller.php:722
JView->get() /home/greatfam/public_html/components/com_content/views/article/view.html.php:32
ContentModelArticle->getItem() /home/greatfam/public_html/plugins/system/jat3/jat3/core/joomla/view.php:348
JError::raiseError() /home/greatfam/public_html/components/com_content/models/article.php:172
JError::raise() /home/greatfam/public_html/libraries/joomla/error/error.php:251

A nice and easy way to get started: turn on Joomla debug in the Global Configuration.
Then reload the frontpage and examine the output at the bottom of the page closely.
There you will find details of the memory used by each module and the list of queries run. This will give you a head start and limit the number of items you need to debug (most likely a single module is eating up all your memory).
If "after dispatch" is taking too long, the culprit could be either a plugin or the component being shown on the page.
If nothing notable shows up there (many queries, say more than 50, high memory consumption, or a long time spent on a single item), you might want to look at the Apache error_log and the MySQL log, and verify system limits.
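If you prefer flipping the switch in the file rather than in the admin UI, the same setting lives in configuration.php; a minimal sketch of the relevant JConfig properties for Joomla 2.5 (leave the rest of the class untouched):

<?php
class JConfig {
    // ...all of your existing settings stay as they are...
    public $debug = '1';      // '1' shows the query list and per-module memory at the bottom of each page
    public $debug_lang = '0'; // language debugging is not needed for this problem
}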

Related

NullPointerExceptions while executing LoadTest on WSO2BPS

While performing load tests on WSO2 BPS 3.2.0 we've run into a problem.
Let me tell you more about our project and our actions.
Our BPS process is designed to manage interactions with 3 systems. Basically it is split into two parts: the first one to CREATE INSTANCE in one of the systems, then wait a bit, and then SELECT OFFER in the instance context.
In real life it looks like this: the user wants to get a product, the application asks a system for offers, and then the user selects an offer from the available ones.
In BPS the first part is a straightforward process; the second part is split across two flows: one to refresh the information with new offers, and another to wait for the user to choose one of them.
Our aim is to sustain about 1000-1500 simultaneous threads in the load test. The external systems are simulated by mockups driven by LoadUI.
We can achieve our goal if we disable "Process-Level Monitoring Events" in the deployment descriptor of our process (set it to "none"). Everything runs well and smoothly for hours.
But if we enable this feature (and we need to), everything fails with an error very soon (at around run 100-200):
[2015-07-28 17:47:02,573] ERROR {org.wso2.carbon.bpel.core.ode.integration.BPELProcessProxy} - Error processing response for MEX null
java.lang.NullPointerException
at org.wso2.carbon.bpel.core.ode.integration.BPELProcessProxy.onResponse(BPELProcessProxy.java:402)
at org.wso2.carbon.bpel.core.ode.integration.BPELProcessProxy.onAxisServiceInvoke(BPELProcessProxy.java:187)
at
[....Et cetera....]
After the first appearance of this error, another type appears: other threads simply fail after the timeout.
The database seems to be fine (by the way, it is MySQL 5.6.25). The dashboard shows no extreme levels of input or output.
So I think BPS itself is the bottleneck. We have given it an 8 GB heap, and its configuration options are set for extreme numbers of threads (negative values where possible, and otherwise just ridiculously large ones like 100000).
Has anyone ever faced this problem? Any help is much appreciated.
Solved in BPS 3.5.0; refer to the release notes.

Why is there a 10 second delay in this http conversation?

I've got a weird situation. The first time I hit an embedded web server (uclinux/boa) at 10.1.10.29, I get a 10-second delay in the browser window before things start happening. "First time" means I haven't hit the machine in a few days. Browser type/OS doesn't matter (the source is 10.1.10.20).
I've got a Wireshark capture of it happening, including the detail of frame 296.
Note that packet 374 doesn't show up until around 10 seconds after 296. The packets between those two aren't from the machine in question. It just sits there for 10 seconds and then decides to retransmit. How is this supposed to work?
The main reason is most certainly that the code was swapped out of memory.
MS-Windows is really bad in that regard. If a program does not get used for "too long", it gets swapped out of memory. Period. When you come back to it, it has to be re-read from the hard drive.
The one good reason Windows does that is to defragment kernel memory; for that purpose, it works.
You have similar problems under Linux, but only if your server needs the memory. In other words, if you have tons of processes and they all fight for as much memory as possible, then the least-used software is likely to get swapped out. Otherwise it will stay in place.
If you were to use the Cassandra database system, you would notice this on any computer that runs anything other than Cassandra. If you run only Cassandra, it remains fast all the time. If you run other software that uses a lot of memory, Cassandra is slow on first access. This is particularly noticeable.
I want to add the answer that solved our problem: we had the same 10-second delay, after which everything worked, and then after 5 minutes of inactivity another 10-second delay appeared.
First of all, we Wiresharked everything and tried to find some kind of error in the code, or in the way the computer or server handled the network traffic. We found nothing out of the ordinary.
After much searching we found it was a DNS "problem". In the DNS server that the client computer used, there were two entries for the server's domain name. One was correct and one (the first one in the list) was wrong.
So removing the wrong DNS entry solved the problem.
This means the problem was that the computer tried the first address it got, waited 10 seconds for a reply, didn't get one, and moved on to the second address in the list. This produces no error messages, because this is how DNS is supposed to work. That is why all our Wireshark logs showed nothing but a 10-second wait, with no error and no apparent reason, before everything jumped into life and worked for as long as the DNS record was valid (5 minutes in our case), after which the whole procedure started over.
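If you want a quick way to check whether a client resolves the server's name to more than one address, here is a minimal PHP sketch (the hostname is a placeholder; gethostbynamel() simply returns every IPv4 address the resolver hands back):

<?php
// List every IPv4 address the resolver returns for a hostname.
// A stale or duplicate DNS entry shows up as an unexpected extra address.
$hostname = 'server.example.com'; // placeholder - use your server's name
$addresses = gethostbynamel($hostname);

if ($addresses === false) {
    echo "Could not resolve $hostname\n";
} else {
    foreach ($addresses as $ip) {
        echo "$hostname -> $ip\n";
    }
}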
Hope this helps someone who has a similar problem.

How can I investigate these mystery Django crashes?

A Django site (hosted on Webfaction) that serves around 950k pageviews a month is experiencing crashes that I haven't been able to figure out how to debug. At unpredictable intervals (averaging about once per day, but not at the same time each day), all requests to the site start to hang/timeout, making the site totally inaccessible until we restart Apache. These requests appear in the frontend access logs as 499s, but do not appear in our application's logs at all.
In poring over the server logs (including those generated by django-timelog) I can't seem to find any pattern in which pages are hit right before the site goes down. For the most recent crash, all the pages hit right before the site went down were standard render-to-response operations using templates that seem pretty straightforward and work well the rest of the time. The requests right before the crash do not take longer according to timelog, and I haven't been able to replicate the crashes intentionally via load testing.
Webfaction says it isn't a case of overrunning our allowed memory usage, or else they would have notified us. One thing to note is that the database is not restarted (just the app/Apache) when we bring the site back up.
How would you go about investigating this type of recurring issue? It seems like there must be a line of code somewhere that's hanging - do you have any suggestions about a process for finding it?
I once had some issues like this, and it basically boiled down to my misunderstanding of thread safety within Django middleware. The Django middleware is, I believe, instantiated once and shared among all threads, and those threads were thrashing the values set on a custom middleware class I had. My solution was to rewrite my middleware so it did not use instance or class attributes that changed, and to switch the critical parts of my application away from threads entirely on my uwsgi server, as threading seemed to be an overall performance downside for my app. Threaded uwsgi setups seem to work best when you have views that complete at different rates (some long-running views and some fast ones).
Since you can't really describe what the failure conditions are until you can replicate the crash, you may need to force the situation with ab (Apache Benchmark). If you don't want to do this against your production site, you might replicate the site in a subdomain. Warning: ab can beat the ever-loving crap out of a server, so RTM. You might also want to give the WF admins a heads-up about what you are going to do.
Update for comment:
I was suggesting using the exact same machine so that the subdomain name was the only difference. Given that you used a different machine, there are a large number of subtle (and not so subtle) environmental differences that could keep the error from manifesting. If the new machine is OK, and if you are willing to walk away from the problem without actually solving it, you might simply make it your production machine and be happy. Personally I tend to obsess about stuff like this, but then again I'm also retired and have plenty of time to play with my toes. :-)

php cli script hangs with no messages

I've written a PHP script that runs via SSH and nohup, meant to process records from a database and do stuff with them (e.g. process some images, update some rows).
It works fine with small loads, up to maybe 10k records. I have some larger datasets that process around 40k records (not a lot, I realize, but it adds up to a lot of work when each record requires the download and processing of up to 50 images).
The larger datasets can take days to process. Sometimes I'll see memory errors in my debug logs, which are clear enough, but sometimes the script just appears to "die" or go zombie on me. The tail of the debug log just stops with no error messages, the tail of the nohup log ends with no error, and the process still shows up in a ps listing, looking like this:
26075 pts/0 S 745:01 /usr/bin/php ./import.php
but no work is getting done.
Can anyone give me some ideas on why a process would just quit? The obvious things (like a PHP script timeout and memory issues) are not a factor, as far as I can tell.
Thanks for any tips
PS: this is hosted on a GoDaddy VDS (not my choice). I sort of suspect that GoDaddy has some kind of limits that kick in on me despite whatever overrides I put in the code (such as set_time_limit(0);).
Very likely the OOM killer. If you really, really, really want to stay out of its reach, have your process (running as root) write -17 to /proc/self/oom_adj. Caution: the kernel usually knows better. Evading the OOM killer can actually cripple the same RDBMS that you are trying to query. What a vicious cycle that would be :)
You probably want to stagger your queries instead, based on what you read from /proc/loadavg and /proc/meminfo. If load or swap usage keeps climbing, you need to back off, especially as a background process :)
Additionally, monitor IOWAIT while you run. This can be averaged from /proc/stat when compared with the time the system booted. Note it when you start and as you progress.
Unfortunately, the serial killer known as the OOM killer does not maintain a body count that is accessible beyond parsing kernel messages.
Or your cron job keeps hitting its ulimit-capped heap allocation. Either way, your job needs to back off when appropriate, or guard against its own demise (as noted above) before doing any work.
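To illustrate the back-off idea, here is a rough PHP sketch; the thresholds, fetch_next_batch() and process_batch() are placeholders rather than anything from the original import script:

<?php
// Hypothetical batch loop that backs off when the box is under pressure.
// Thresholds and the fetch/process functions are placeholders - tune for your system.

function system_under_pressure(float $maxLoad, int $minFreeKb): bool
{
    // The 1-minute load average is the first field of /proc/loadavg.
    $load = (float) strtok(file_get_contents('/proc/loadavg'), ' ');

    // MemAvailable (or MemFree on older kernels) from /proc/meminfo, in kB.
    $freeKb = 0;
    if (preg_match('/^MemAvailable:\s+(\d+)/m', file_get_contents('/proc/meminfo'), $m)) {
        $freeKb = (int) $m[1];
    }

    return $load > $maxLoad || $freeKb < $minFreeKb;
}

while ($batch = fetch_next_batch()) {          // placeholder for your record query
    while (system_under_pressure(4.0, 200 * 1024)) {
        sleep(30);                             // back off while load is high or memory is low
    }
    process_batch($batch);                     // placeholder for the image work
}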
As a side note, you probably should not be doing what you are doing on shared hosting. If it's that big, it's time to get a VPS (at least), where you have some control over which process gets to do what.

Is there any way to monitor the number of CAS stackwalks that are occurring?

I'm working with a time-sensitive desktop application that uses P/Invoke extensively, and I want to make sure the code is not wasting a lot of time on CAS stackwalks.
I have used the SuppressUnmanagedCodeSecurity attribute where I think it is necessary, but I might have missed a few places. Does anyone know if there is a way to monitor the number of CAS stackwalks that are occurring and, better yet, pinpoint the source of the security demands?
You can use the Process Explorer tool (from Sysinternals) to monitor your process.
Bring up Process Explorer, select your process, and right-click to show "Properties". Then, on the .NET tab, select the .NET CLR Security object to monitor. Process Explorer will show counters for:
Total Runtime Checks
Link Time Checks
% Time in RT Checks
Stack Walk Depth
These are the standard security performance counters described here: http://msdn.microsoft.com/en-us/library/adcbwb64.aspx
You could also use Perfmon or write your own code to monitor these counters.
As far as I can tell, the only one that is really useful is the first one, Total Runtime Checks. You could keep an eye on it while you are debugging to see if it is increasing substantially. If so, you need to examine what is causing the security demands.
I don't know of any other tools that will tell you when a stackwalk is being triggered.