NullPointerExceptions while executing LoadTest on WSO2BPS - mysql

While performing loadtests on WSO2 BPS 3.2.0 we`ve ran onto the problem.
Let me tell you more about out project and our actions.
Our BPS process is designed to manage some interactions with 3 systems. Basically it is "spread" on two parts - first one to CREATE INSTANCE in one of systems, then waiting a bit, and then SELECT OFFER in instance context.
In real life it looks like: user wants to get a product, the application asks system for an offers and then the user selects offer from available ones.
IN BPS the first part is a straight-forward process, the second part is spread on two flows - one to refresh information with a new offers, and another is to wait if the user chooses one of them.
Our aim is to stand about 1000-1500 simulatious threads on the load-test. An external systems are simulated by mockups executed by LoadUI.
We can achieve our goal if we disable "Process-Level Monitoring Events" in deployment descriptor (set it to "none") of our process. Everything goes well and smooth for hours.
But if we enable this feature (and we need to), everything falls with an error very soon (on about 100-200 run):
[2015-07-28 17:47:02,573] ERROR {org.wso2.carbon.bpel.core.ode.integration.BPELProcessProxy} - Error processing response for MEX null
java.lang.NullPointerException
at org.wso2.carbon.bpel.core.ode.integration.BPELProcessProxy.onResponse(BPELProcessProxy.java:402)
at org.wso2.carbon.bpel.core.ode.integration.BPELProcessProxy.onAxisServiceInvoke(BPELProcessProxy.java:187)
at
[....Et cetera....]
After the first appearance of this error another one type appears - other threads just fall after the timeout.
It seems that database is ok (by the way, it is MySQL 5.6.25). The dashboard shows no extreme levels of input or output.
So I think the BPS itself makes a bottleneck. We have gave it 8gb heap and its conf options are set for extreme amounts of threads (if it possible negative values are set and if not - just ridiculously big like 100000).
Anyone has ever faced this problem? Appreciate any help very much.

Solved in BPS 3.5.0 version, refer to release-notes

Related

Joomla 2.5.14 mysql - memory issue

Can someone explain this Joomla error to me -- what to do to fix it. It shows up with debug on. We seem to have a memory leak? Mysql runs slow than site crashes.
Function Location
JSite->dispatch() /home/greatfam/public_html/index.php:52
JComponentHelper::renderComponent() /home/greatfam/public_html/includes/application.php:197
JComponentHelper::executeComponent() /home/greatfam/public_html/libraries/joomla/application/component/helper.php:351
require_once() /home/greatfam/public_html/libraries/joomla/application/component/helper.php:383
JController->execute() /home/greatfam/public_html/components/com_content/content.php:16
ContentController->display() /home/greatfam/public_html/libraries/joomla/application/component/controller.php:761
JController->display() /home/greatfam/public_html/components/com_content/controller.php:74
ContentViewArticle->display() /home/greatfam/public_html/libraries/joomla/application/component/controller.php:722
JView->get() /home/greatfam/public_html/components/com_content/views/article/view.html.php:32
ContentModelArticle->getItem() /home/greatfam/public_html/plugins/system/jat3/jat3/core/joomla/view.php:348
JError::raiseError() /home/greatfam/public_html/components/com_content/models/article.php:172
JError::raise() /home/greatfam/public_html/libraries/joomla/error/error.php:251
A nice and easy way to get started is: turn Joomla debug on in the Global configuration.
Then reload the frontpage, and examine closely the output at the bottom of the page.
There you will find the details of the memory used by each module, and the list of queries run. This will give you a head start and limit the number of items you need to debug (there will be a single module eating up all your memory).
If "after dispatch" is taking too long, then it could be either a plugin or the component being shown on the page.
If nothing "notable" shows up here (a lot of queries, more than 50, or high memory consumption, or long time for a single item, you might want to look at the apache error_log and mysql log and verify system limits.

How to force a multihopping topology with xbee zb?

I use some xbee (s2) modules with zb stack for mesh networking evaluation. Therefore a multi hopping environment has to be created. The problem is, that the firmware handles the association for themselves and there is no way deeper into the stack as the api provides. To force the path of the data, without to disturb the routing mechanism, I have tried to measure, I had to put them outside their reach. To get only the next hop in association isn't that easy. I used the least power level of the output, but the distance for the test setup is to large and the rf characteristics of the environment change undetermined.
Therefore my question, has anyone experience with this issue?
Regards, Toby
I don't think it's possible through software and coordinator/routers. You could change the Node Join Time (ATNJ) to force a new router to join through a particular router (disable Node Join on all nodes except one), but that would only affect joining. Once joined to the network, the router will discover that other nodes are within range.
You could possibly do it with sleepy end devices. You can use the ATNJ trick to force an end device to join through a single router, and it will always send its messages to that router. But you won't get that many hops -- end device sends to its parent router, which sends to the target's parent router, which sends to the target end device.
You'll likely need to physically limit the range of the radios to force hopping, as demonstrated in the video you linked of Digi's K-Node test equipment with a network of over 1000 radios. They're putting the radios in RF-shielded boxes and using wired antenna connections with software-controlled attenuators to connect the modules to each other.
If you have XBee modules with the U.fl or RPSMA connector, and don't connect an antenna, it should significantly reduce the range of the module. Otherwise, with a wire whip or integrated PCB antenna, you need to put each radio in some sort of box that attenuates the signal. Perhaps someone else can offer advice on materials that will reduce the signal's range without completely blocking it.
ZigBee nodes try to automatically form an Ad-Hoc network. That is why they join the network with the strongest connection (best network coverage) available on that moment. These modules are designed in such a way, that you do not have to care much about establishing a reliable communication. They will solve networking problems most of the time.
What you want to do, is somehow force a different situation. You want to create a specific topology, in order to get some multi-hopping. That will not be the normal behavior of the nods. But you can still get what you want with some of the AT Commands.
The mentioned command "NJ" should work for you. This command locks joins after a certain time (in seconds). Let us think of a simple ZigBee network with three nodes: one Coordinator, one Router and one End-Device. Switch on the Coordinator with "NJ" set to, let us say, two minutes. Then quickly switch on the Router, so it can associate with the Coordinator within these two minutes. After these two minutes, the Coordinator will be locked and will not accept more joins. At that moment you can start the End-Device, which will have to associate with the Router necessarily. This way, you will see that messages between End-Device and Coordinator go through the Router, as you wanted.
You may get a bigger network applying this idea several times, without needing to play with the module's antennas. You can control the AT Parameters remotely (i.e. from a Computer connected to the Coordinator), so you can use some code to help you initialize the network.

php cli script hangs with no messages

I've written a PHP script that runs via SSH and nohup, meant to process records from a database and do stuff with them (eg. process some images, update some rows).
It works fine with small loads, up to maybe 10k records. I have some larger datasets that process around 40k records (not a lot, I realize, but it adds up to a lot of work when each record requires the download and processing of up to 50 images).
The larger datasets can take days to process. Sometimes I'll see in my debug logs memory errors, which are clear enough-- but sometimes the script just appears to "die" or go zombie on me. My tail of the debug log just stops, with no error messages, the tail of the nohup log ends with no error, and the process is still showing in a ps list, looking like this--
26075 pts/0 S 745:01 /usr/bin/php ./import.php
but no work is getting done.
Can anyone give me some ideas on why a process would just quit? The obvious things (like a php script timeout and memory issues) are not a factor, as far as I can tell.
Thanks for any tips
PS-- this is hosted on a godaddy VDS (not my choice). I am sort of suspecting that godaddy has some kind of limits that might kick in on me despite what overrides I put in the code (such as set_time_limit(0);).
Very likely the OOM killer. If you really , really really want to stay out of its reach, as root, have your process write -17 to /proc/self/oom_adj. Caution: The kernel usually knows better. Evading the OOM killer can actually cripple the same RDBMS that you are trying to query. What a vicious cycle that would be :)
You probably (instead) want to stagger queries based on what you read from /proc/loadavg and /proc/meminfo. If you increase loads or swap exponentially, you need to back off, especially as a background process :)
Additionally, monitor IOWAIT while you run. This can be averaged from /proc/stat when compared with the time the system booted. Note it when you start and as you progress.
Unfortunately, the serial killer known as the OOM killer does not maintain a body count that is accessible beyond parsing kernel messages.
Or, your cron job keeps hitting its ulimited amount of allocated heap. Either way, your job needs to back off when appropriate, or prevent its own demise (as noted above) prior to doing any work.
As a side note, you probably should not be doing what you are doing on shared hosting. If its that big, its time to get a VPS (at least) where you have some control over what process gets to do what.

Simple scalable work/message queue with delay

I need to set up a job/message queue with the option to set a delay for the task so that it's not picked up immediately by a free worker, but after a certain time (can vary from task to task). I looked into a couple of linux queue solutions (rabbitmq, gearman, memcacheq), but none of them seem to offer this feature out of the box.
Any ideas on how I could achieve this?
Thanks!
I've used BeanstalkD to great effect, using the delay option on inserting a new job to wait several seconds till the item becomes available to be reserved.
If you are doing longer-term delays (more than say 30 seconds), or the jobs are somewhat important to perform (abeit later), then it also has a binary logging system so that any daemon crash would still have a record of the job. That said, I've put hundreds of thousands of live jobs through Beanstalkd instances and the workers that I wrote were always more problematical than the server.
You could use an AMQP broker (such as RabbitMQ) and I have an "agent" (e.g. a python process built using pyton-amqplib) that sits on an exchange an intercepts specific messages (specific routing_key); once a timer has elapsed, send back the message on the exchange with a different routing_key.
I realize this means "translating/mapping" routing keys but it works. Working with RabbitMQ and python-amqplib is very straightforward.

Is there any way to monitor the number of CAS stackwalks that are occurring?

I'm working with a time sensitive desktop application that uses p/invoke extensively, and I want to make sure that the code is not wasting a lot of time on CAS stackwalks.
I have used the SuppressUnmanagedCodeSecurity attribute where I think it is necessary, but I might have missed a few places. Does anyone know if there is a way to monitor the number of CAS stackwalks that are occurring, and better yet pinpoint the source of the security demands?
You can use the Process Explorer tool (from Sysinternals) to monitor your process.
Bring up Process Explorer, select your process and right click to show "Properties". Then, on the .NET tab, select the .NET CLR Security object to monitor. Process Explorer will show counters for
Total Runtime Checks
Link Time Checks
% Time in RT Checks
Stack Walk Depth
These are standard security performance counters described here ->
http://msdn.microsoft.com/en-us/library/adcbwb64.aspx
You could also use Perfmon or write your own code to monitor these counters.
As far as I can tell, the only one that is really useful is item 1. You could keep an eye on that while you are debugging to see if it is increasing substantially. If so, you need to examine what is causing the security demands.
I don't know of any other tools that will tell you when a stackwalk is being triggered.