Atmel Studio Dummy_Handler - exception

Occasionally I get an unexpected interrupt, and my code hangs inside Dummy_Handler() in exceptions.c of the Atmel Software Framework (ASF). I am using the ATSAM3X8E microcontroller on the Arduino Due.
void Dummy_Handler(void)
{
    while (1) {
    }
}
Any ideas how to determine which interrupt it was?
Of course I could replace this single handler with unique dummy handlers, one for each exception (there are about fifty of them). For example, I could change each line of this form in the same exceptions.c file:
void HardFault_Handler ( void ) __attribute__ ((weak, alias("Dummy_Handler")));
to this
void HardFault_Handler ( void ) __attribute__ ((weak, alias("Dummy_HardFault_Handler")));
Etc. Or I could try to reason out which interrupt my code might have generated. But who has that kind of time?

This MCU has an Interrupt Program Status Register (IPSR) that gives some clue as to the source. ASF wraps it in the function __get_IPSR() in core_cmFunc.h:
uint32_t phantomISR = 9999;

void Dummy_Handler(void)
{
    while (1) {
        phantomISR = __get_IPSR();
    }
}
This global variable can then be monitored at runtime. (In my case I paused the assembly code for this loop-of-death and saw the value 3 in the R3 register.) The Atmel MCU documentation explains the value:
ISR_NUMBER
This is the number of the current exception:
0 = Thread mode
1 = Reserved
2 = NMI
3 = Hard fault
4 = Memory management fault
5 = Bus fault
6 = Usage fault
7-10 = Reserved
11 = SVCall
12 = Reserved for Debug
13 = Reserved
14 = PendSV
15 = SysTick
16 = IRQ0
…
45 = IRQ29
Both times this happened to me it was the Hard Fault, a kind of blue-screen-of-death for the Arduino Due. So I also installed a Hard Fault handler of my own:
ISR(HardFault_Handler)
{
    while (1) {
    }
}
This, too, is detectable in debug mode by pausing. Of course the sequel question is: what causes a Hard Fault? I'm guessing a memory wipe or infinite recursion.
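To chase the cause further, a common Cortex-M technique (a sketch of my own, not ASF code; the GCC inline assembly and variable names are assumptions) is to have the Hard Fault handler pull the program counter out of the stacked exception frame, which points at or near the faulting instruction:

volatile uint32_t faultPC = 0;
volatile uint32_t faultLR = 0;

void HardFault_Handler(void)
{
    uint32_t *frame;
    /* Bit 2 of EXC_RETURN (in LR) says whether the main or the process
       stack was in use when the fault occurred. */
    __asm volatile (
        "tst lr, #4    \n"
        "ite eq        \n"
        "mrseq %0, msp \n"
        "mrsne %0, psp \n"
        : "=r" (frame));
    faultPC = frame[6];  /* stacked PC: where the fault happened */
    faultLR = frame[5];  /* stacked LR: who called that code */
    while (1) {
    }
}

Pausing the debugger and looking faultPC up in the linker map file then narrows the search considerably.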

Related

How to transfer a float array (without serializing/deserializing) from Scala (JeroMQ) to C (ZMQ)?

Currently, I am using a JSON library to serialize the data at the sender (JeroMQ), and deserialize at the receiver (C, ZMQ). But, while parsing, the JSON library starts to consume a lot of memory and the OS kills the process. So, I want to send the float array as it is, i.e. without using JSON.
The existing sender code is below (syn0 and syn1 are Double arrays). If syn0 and syn1 are around 100 MB each, the process is killed while parsing the received arrays, i.e. the last line of the snippet below:
import org.zeromq.ZMQ
import com.codahale.jerkson
socket.connect("tcp://localhost:5556")
socket.send(json.JSONObject(Map("syn0"->json.JSONArray(List.fromArray(syn0Global)))).toString())
println("SYN0 Request sent”)
val reply_syn0 = socket.recv(0)
println("Response received after syn0: " + new String(reply_syn0))
logInfo("Sending Syn1 request … , size : " + syn1Global.length )
socket.send(json.JSONObject(Map("syn1"->json.JSONArray(List.fromArray(syn1Global)))).toString())
println("SYN1 Request sent")
val reply_syn1 = socket.recv(0)
socket.send(json.JSONObject(Map("foldComplete"->"Done")).toString())
println("foldComplete sent")
// Get the reply.
val reply_foldComplete = socket.recv(0)
val processedSynValuesJson = new String(reply_foldComplete)
val processedSynValues_jerkson = jerkson.Json.parse[Map[String,List[Double]]](processedSynValuesJson)
Can these arrays be transferred without using JSON?
Here I am transferring a float array between two C programs:
//client.c
#include <zmq.h>
#include <stdio.h>

int main (void)
{
    printf ("Connecting to hello world server…\n");
    void *context = zmq_ctx_new ();
    void *requester = zmq_socket (context, ZMQ_REQ);
    zmq_connect (requester, "tcp://localhost:5555");
    int request_nbr;
    float send_buffer[10];
    float recv_buffer[10];
    for (int i = 0; i < 10; i++)
        send_buffer[i] = i;
    for (request_nbr = 0; request_nbr != 10; request_nbr++) {
        printf ("Sending Hello %d…\n", request_nbr);
        zmq_send (requester, send_buffer, 10*sizeof(float), 0);
        zmq_recv (requester, recv_buffer, 10*sizeof(float), 0);
        printf ("Received World %.3f\n", recv_buffer[5]);
    }
    zmq_close (requester);
    zmq_ctx_destroy (context);
    return 0;
}
//server.c
#include <zmq.h>
#include <stdio.h>
#include <assert.h>

int main (void)
{
    // Socket to talk to clients
    void *context = zmq_ctx_new ();
    void *responder = zmq_socket (context, ZMQ_REP);
    int rc = zmq_bind (responder, "tcp://*:5555");
    assert (rc == 0);
    float recv_buffer[10];
    float send_buffer[10];
    while (1) {
        zmq_recv (responder, recv_buffer, 10*sizeof(float), 0);
        printf ("Received Hello\n");
        for (int i = 0; i < 10; i++)
            send_buffer[i] = recv_buffer[i]+5;
        zmq_send (responder, send_buffer, 10*sizeof(float), 0);
    }
    return 0;
}
Finally, my unsuccessful attempt at doing something similar using Scala (below is the client code):
def main(args: Array[String]) {
    val context = ZMQ.context(1)
    val socket = context.socket(ZMQ.REQ)
    println("Connecting to hello world server…")
    socket.connect("tcp://localhost:5555")
    val msg: Array[Float] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    val bbuf = java.nio.ByteBuffer.allocate(4 * msg.length)
    bbuf.asFloatBuffer.put(java.nio.FloatBuffer.wrap(msg))
    for (request_nbr <- 1 to 10) {
        // Two likely culprits: a REQ socket must recv() a reply between
        // sends, and bbuf's position may need rewinding between iterations.
        socket.sendByteBuffer(bbuf, 0)
    }
}
SER/DES? Size? No: an underlying transport-philosophy constraint matters more.
You started with a 0.1 GB sizing for the transport payload and reported that the JSON library's allocations cause your O/S to kill the process.
Next, in another post, you asked about a 0.762 GB transport payload.
But there is a more important issue in ZeroMQ transport orchestration than the choice of an external data-serialiser SER/DES policy.
No one forbids you from trying to send as big a BLOB as possible, but a JSON-decorated string has already shown you the dark side of such approaches, and there are other reasons not to proceed this way.
ZeroMQ is without question a great and powerful toolbox. Still, it takes some time to gain the insight necessary for a genuinely smart and highly performant deployment that makes the most of this powerful work-horse.
One side-effect of the feature-rich internal ecosystem "under the hood" is a little-known policy hidden in the message-delivery concept.
One may send any reasonably sized message, but delivery is not guaranteed: a message is either delivered completely, or nothing goes out at all.
Ouch?!
Yes, not guaranteed.
Based on this core Zero-Guarantee philosophy, one has to take due care in deciding on steps and measures, all the more so if you plan to move Gigabyte BEASTs there and back.
If you indeed still need to move gigabytes (refer to the comment above, under the OP) and have no other choice, real SUT testing can quantitatively confirm that sending the whole volume of data segmented into small messages, with careful re-assembly on the receiving side, yields a much faster and much safer end-to-end solution than brute-force instructing the code to dump about a GB of data onto whatever resources happen to be available (the Zero-Copy principle of ZeroMQ cannot and will not, per se, save you in these efforts).
For details on another hidden trap, related to the not-fully-Zero-Copy implementation, read Martin SUSTRIK's (co-father of ZeroMQ) remarks on Zero-Copy "till-kernel-boundary-only" (so at least double the memory-space allocations are to be expected...).
Solution:
Redesign the architecture so as to propagate small-sized messages, if not outright keeping the original data structure "mirrored" in the remote process(es), instead of attempting to keep one-shot giga-transfers survivable. A sketch of the chunked approach follows.
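As an illustration of what "small-sized messages" can look like on the wire, here is a minimal C sketch (my own; the chunk size and helper name are illustrative assumptions, not the OP's code) that streams a large float array as fixed-size multipart message parts:

#include <zmq.h>
#include <stddef.h>

#define CHUNK_FLOATS 4096   /* illustrative chunk size */

/* Send `count` floats as a series of message parts. Every part except the
   last is flagged ZMQ_SNDMORE, so the pieces travel as one multipart
   message and arrive either all together or not at all. */
static void send_chunked(void *socket, const float *data, size_t count)
{
    size_t sent = 0;
    while (sent < count) {
        size_t n = count - sent;
        if (n > CHUNK_FLOATS)
            n = CHUNK_FLOATS;
        int flags = (sent + n < count) ? ZMQ_SNDMORE : 0;
        zmq_send(socket, data + sent, n * sizeof(float), flags);
        sent += n;
    }
}

The receiver loops on zmq_recv and checks the ZMQ_RCVMORE socket option (via zmq_getsockopt) after each part to know when the array is complete.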
The best next step?
While it does not solve your trouble within a few SLOCs, the best thing you can do, if you are serious about investing your intellectual powers in distributed processing, is to read Pieter HINTJENS' lovely book "Code Connected, Vol. 1".
Yes, it takes some time to generate one's own insight, but it will raise you in many aspects onto another level of professional code design. Worth the time. Worth the effort.
You'll need to serialize the data in some form or fashion: ultimately you're taking a structure in memory on one side and instructing the other side on how to rebuild that structure (bonus points for using two separate languages, where the structure in memory is likely different anyway). I'd suggest trying a different JSON library first, as that appears to be where the problem lies, but there are more efficient protocols you could be using. Protocol Buffers enjoy good support across many languages; that might be the place I'd start.

Accuracy of Task.Delay

I'm developing a Windows 10 Universal App in C#/XAML. I'm using await Task.Delay(delayInMilliseconds) to suspend a method for a given time. My scenario is somewhat real-time, so it's very sensitive to time changes, and I need to make sure that when I suspend a method for, let's say, 8 milliseconds, it is suspended for 8 milliseconds. I have noticed that the actual time span for which Task.Delay suspends the method differs from what is passed as the delay parameter by 1 up to 30 ms, the length of the deviation being different in each call. So when I want to sleep for 8 milliseconds, my system sleeps for 9 to 39 milliseconds, and this completely ruins my scenario.
So, my question is: what is the best way to substitute Task.Delay and achieve good precision? For the time being I use this method:
public static void Delay1(int delay)
{
    Stopwatch s = Stopwatch.StartNew();
    while (true)
    {
        if (s.Elapsed.TotalMilliseconds > delay)
        {
            return;
        }
    }
}
but I guess it consumes a lot of resources, namely 100% of one processor core. If a user has a small number of cores, this busy-wait would be very inefficient.
According to MSDN, it's not possible to achieve better accuracy, due to the resolution of the system clock:
This method depends on the system clock. This means that the time
delay will approximately equal the resolution of the system clock if
the millisecondsDelay argument is less than the resolution of the
system clock, which is approximately 15 milliseconds on Windows
systems.
Seems like I've found a satisfactory solution: using ManualResetEvent(false).WaitOne(delay) instead of await Task.Delay(delayInMilliseconds).
I've used the following code to test both methods:
async void Probe()
{
    for (int i = 0; i < 1000; i++)
    {
        // METHOD 1
        await Task.Delay(3);

        // METHOD 2
        new System.Threading.ManualResetEvent(false).WaitOne(3);
    }
}
With each method tested on its own, the code above should take exactly 3 seconds to execute. With METHOD 2 it took around 3300 ms, so the error is about 0.3 ms per call. Acceptable for me. But METHOD 1 took around 15 seconds to execute (as mentioned by others and explained above), which gives a totally unacceptable error for my scenario.
EDIT: WaitOne() is a blocking call, so you'll probably want to run it as a task to get it off the UI thread (or any other thread with a message pump). @Abel has pointed out another high-res timer approach that is already baked into a task and will run on an alternate thread, as shown here: https://stackoverflow.com/a/62588903/111575. That approach makes CPU-intensive calls to Thread.SpinWait() and Thread.Sleep(0) for small intervals.
You should use multimedia timers. Those are much more accurate.
Take a look here:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/2024e360-d45f-42a1-b818-da40f7d4264c/accurate-timer
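For reference, the multimedia-timer machinery is plain Win32 (winmm). Below is a minimal C sketch of the underlying idea, assuming winmm.lib is linked; whether this is callable from inside a Universal App sandbox is a separate question:

/* Sketch: temporarily raise the global timer resolution to 1 ms so that
   ordinary waits wake up much closer to their target time. */
#include <windows.h>
#include <mmsystem.h>   /* timeBeginPeriod / timeEndPeriod; link winmm.lib */

void precise_sleep_ms(DWORD ms)
{
    timeBeginPeriod(1);   /* request 1 ms scheduler granularity */
    Sleep(ms);            /* now typically wakes within about 1 ms */
    timeEndPeriod(1);     /* always pair with timeBeginPeriod */
}

While the higher resolution is in effect, it can also tighten the timings of WaitOne() and Task.Delay() in the same process.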
After working hard I found a solution that sleeps the thread for a specific time, with an error of just 3 to 4 microseconds on my Core 2 Duo CPU at 3.00 GHz.
Here is the code (a C# translation is given at the end). Use the ThreadSleepHandler class:
Public Class ThreadSleepHandler
    Private Stopwatch As New Stopwatch
    Private SpinStopwatch As New Stopwatch
    Private Average As New TimeSpan
    Private Spinner As Threading.SpinWait

    Public Sub Sleep(Time As TimeSpan)
        Stopwatch.Restart()
        If Average.Ticks = 0 Then Average = GetSpinTime()
        While Stopwatch.Elapsed < Time
            If Stopwatch.Elapsed + Average < Time Then
                ' Integer division (\) keeps the Ticks value a Long
                Average = TimeSpan.FromTicks((Average + GetSpinTime()).Ticks \ 2)
            End If
        End While
    End Sub

    Public Function GetSpinTime() As TimeSpan
        SpinStopwatch.Restart()
        Spinner.SpinOnce()
        SpinStopwatch.Stop()
        Return SpinStopwatch.Elapsed
    End Function
End Class
Here is example code:
Sub Main()
    Dim handler As New ThreadSleepHandler
    Dim stopwatch As New Stopwatch
    Do
        stopwatch.Restart()
        handler.Sleep(TimeSpan.FromSeconds(1))
        stopwatch.Stop()
        Console.WriteLine(stopwatch.Elapsed)
    Loop
End Sub
For C# programmers, here is the code (I have converted it, but I am not sure it is exact):
using System;
using System.Diagnostics;
using System.Threading;

static class Program
{
    public static void Main()
    {
        ThreadSleepHandler handler = new ThreadSleepHandler();
        Stopwatch stopwatch = new Stopwatch();
        do {
            stopwatch.Restart();
            handler.Sleep(TimeSpan.FromSeconds(1));
            stopwatch.Stop();
            Console.WriteLine(stopwatch.Elapsed);
        } while (true);
    }
}

public class ThreadSleepHandler
{
    private Stopwatch Stopwatch = new Stopwatch();
    private Stopwatch SpinStopwatch = new Stopwatch();
    private TimeSpan Average = new TimeSpan();
    private SpinWait Spinner;

    public void Sleep(TimeSpan Time)
    {
        Stopwatch.Restart();
        if (Average.Ticks == 0)
            Average = GetSpinTime();
        while (Stopwatch.Elapsed < Time) {
            if (Stopwatch.Elapsed + Average < Time) {
                Average = TimeSpan.FromTicks((Average + GetSpinTime()).Ticks / 2);
            }
        }
    }

    public TimeSpan GetSpinTime()
    {
        SpinStopwatch.Restart();
        Spinner.SpinOnce();
        SpinStopwatch.Stop();
        return SpinStopwatch.Elapsed;
    }
}
Notes:
ThreadSleepHandler is not thread-safe; you cannot share a single instance between multiple threads.
The first sleep will not be very accurate.
In my tests, I found that a DispatcherTimer at 20 ms intervals will deliver sufficiently smooth animation, if you must do it in code. Doing it in XAML is another alternative, as already mentioned.

cudaDeviceReset causes memory leak?

I tried the following code with CUDA 7.0.
If I set n_repeat to 1 and remove the last cudaDeviceReset, the code runs fine.
If I set n_repeat to 1 and keep the cudaDeviceReset, the code runs to completion, but my memory-leak detector reports a leak after the program exits.
If I set n_repeat to 2 and keep the cudaDeviceReset, I get an error the second time cublasCreate is reached. The error code is CUBLAS_STATUS_NOT_INITIALIZED.
Can someone tell me what the problem is here, and whether cudaDeviceReset is meant for cleaning up between different runs of using the GPU, like what I'm trying to do here?
int device_id_ = 0;
cublasHandle_t blas_;
curandGenerator_t rand_gen_;
long alloc_size = 1000;
char* raw_;
int n_repeat = 2;

for (int i = 0; i < n_repeat; ++i) {
    CHECK_CUDA(cudaSetDevice(device_id_));
    CHECK_CUDA(cublasCreate(&blas_));
    CHECK_CUDA(curandCreateGenerator(&rand_gen_, CURAND_RNG_PSEUDO_DEFAULT));
    CHECK_CUDA(cudaMalloc((void **)&raw_, alloc_size));
    CHECK_CUDA(curandDestroyGenerator(rand_gen_));
    CHECK_CUDA(cublasDestroy(blas_));
    CHECK_CUDA(cudaFree(raw_));
    CHECK_CUDA(cudaDeviceReset());
}
I had the same problem, even with the example from Robert Crovella (CUDA 7, Ubuntu 14.04, K40c).
Adding cudaDeviceSynchronize() after cudaSetDevice() and before cublasCreate() made it work for me.
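For clarity, that suggestion applied to the loop from the question would look like this (a sketch; CHECK_CUDA is assumed to be the OP's error-checking macro, which would also have to accept the cublasStatus_t and curandStatus_t return types):

for (int i = 0; i < n_repeat; ++i) {
    CHECK_CUDA(cudaSetDevice(device_id_));
    CHECK_CUDA(cudaDeviceSynchronize());  /* let the previous reset settle before re-initializing */
    CHECK_CUDA(cublasCreate(&blas_));
    CHECK_CUDA(curandCreateGenerator(&rand_gen_, CURAND_RNG_PSEUDO_DEFAULT));
    CHECK_CUDA(cudaMalloc((void **)&raw_, alloc_size));
    CHECK_CUDA(curandDestroyGenerator(rand_gen_));
    CHECK_CUDA(cublasDestroy(blas_));
    CHECK_CUDA(cudaFree(raw_));
    CHECK_CUDA(cudaDeviceReset());
}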

openCV / unhandled exception or msvcp100d.dll

I do realise that this problem is pretty common, but I have spent around four days trying to fix it on my own, using all the advice I found on the Internet, and unfortunately I've failed.
I managed to make OpenCV 2.4.6 work with my Visual Studio 2012, or at least that's what I assumed after I was able to stream video from my webcam with this example:
#include "stdafx.h"
#include "opencv2/opencv.hpp"
int main( int argc, const char** argv )
{
CvCapture* capture;
IplImage* newImg;
while (true)
{
capture = cvCaptureFromCAM(-1);
newImg = cvQueryFrame( capture );
cvNamedWindow("Window1", CV_WINDOW_AUTOSIZE);
cvShowImage("Window1", newImg);
int c = cvWaitKey(10);
if( (char)c == 27 ) { exit(0); }
}
cvReleaseImage(&newImg);
return 0;
}
Everything worked fine, so I decided to play around with it, and I made an attempt to use a simple image-processing operation such as converting RGB to grayscale. I modified my code to the following:
#include "stdafx.h"
#include "opencv2/opencv.hpp"
int main( int argc, const char** argv )
{
CvCapture* capture;
IplImage* img1;
IplImage* img2;
while (true)
{
capture = cvCaptureFromCAM(-1);
img1 = cvQueryFrame( capture );
img2 = cvCreateImage(cvGetSize(img1),IPL_DEPTH_8U,1);
cvCvtColor(img1,img2,CV_RGB2GRAY);
cvNamedWindow("Window1", CV_WINDOW_AUTOSIZE);
cvNamedWindow("Window2", CV_WINDOW_AUTOSIZE);
cvShowImage("Window1", img1);
cvNamedWindow("Window2", CV_WINDOW_AUTOSIZE);
int c = cvWaitKey(10);
if( (char)c == 27 ) { exit(0); }
}
cvReleaseImage(&img1);
cvReleaseImage(&img2);
return 0;
}
And that's where the nightmare started. I keep getting:
Unhandled exception at 0x000007FEFD57AA7D in opencvbegginer.exe: Microsoft C++ exception: cv::Exception at memory location 0x000000000030F920.
I did some research and tried a few solutions, such as exchanging opencv_core246.lib for opencv_core246d.lib, etc. For a second I hoped it might work, but then reality punched me again with a missing msvcp100d.dll. I updated all the redistributable packages, but it didn't change the fact that I keep getting this error. Looking for another way to fix it, I found a forum suggesting I go to the C/C++ properties and change the Runtime Library to MTd, so I tried that as well, but, as you can tell by now, it didn't work.
At this moment I have simply run out of ideas on how to fix this, so I would be really grateful for any help.
Cheers
PS. An important thing to add: when I got the unhandled exception, OpenCV "spoke to me", saying:
OpenCV Error: Bad argument in
unknown function, file ......\scr\opencv\modules\core\src\array.cpp,
line 1238
However, I assumed back then that I was simply not clever enough with my idiot-resistant code, so I tried a few other pieces of code written by competent people; unfortunately, I keep getting exactly the same error (the rest is the same as well, after I change the things mentioned above).
If img1 == NULL, then it crashes on cvGetSize(img1). Try enclosing the code after cvQueryFrame in an if (img1 != NULL) block.
However, if it returns NULL for every frame, it means there is something wrong with your camera, your drivers, or the way you capture the frames.
You should also move the cvNamedWindow calls outside of the loop, since there is no need for the windows to be recreated for every frame.
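Putting those suggestions together, a sketch of what the corrected loop could look like (my own rearrangement; it also opens the camera once instead of on every iteration and actually displays the grayscale image):

#include "opencv2/opencv.hpp"

int main(int argc, const char** argv)
{
    CvCapture* capture = cvCaptureFromCAM(-1);    /* open the camera once */
    if (capture == NULL)
        return -1;                                /* camera/driver problem */
    cvNamedWindow("Window1", CV_WINDOW_AUTOSIZE); /* create windows once */
    cvNamedWindow("Window2", CV_WINDOW_AUTOSIZE);
    while (true)
    {
        IplImage* img1 = cvQueryFrame(capture);
        if (img1 != NULL)                         /* skip the frame if the grab failed */
        {
            IplImage* img2 = cvCreateImage(cvGetSize(img1), IPL_DEPTH_8U, 1);
            cvCvtColor(img1, img2, CV_RGB2GRAY);
            cvShowImage("Window1", img1);
            cvShowImage("Window2", img2);
            cvReleaseImage(&img2);                /* img1 belongs to the capture; img2 is ours */
        }
        int c = cvWaitKey(10);
        if ((char)c == 27)
            break;
    }
    cvReleaseCapture(&capture);
    return 0;
}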

Is the stack limit of 5287 in AS3 variable or predefined?

I ran a test just now:
function overflow(stack:int = 0):void
{
    if (stack < 5290)
    {
        trace(stack);
        overflow(stack + 1);
    }
}
overflow();
This always throws a StackOverflow error after 5287 calls.
Error #1023: Stack overflow occurred.
Is this limit variable (depending on machine specs, environment, etc.) or is it a flat value defined somewhere? If I change the if statement to less than 5287, I don't get the error.
Obviously it's variable. Since all the calculations you really do are located on the stack (the disassembly reports show pushbyte instructions and other stack-based, non-operand arithmetic), this value only reports how many function contexts can be put on the stack before it overflows.
I have decided to run some tests of recursion thresholds, based on the article referenced in baris's comment. The results were pretty surprising. Test environment: FlashDevelop 3.3.4 RTM, Flash Player debugger 10.1.53.64, flash compile mode: release. ("Debug" mode didn't change the numbers significantly; I checked that too.)
Locals number       Iterations (static int)   Iterations (Math.random())
0                   5306                      -
1                   4864                      4856
2                   4850                      4471
3                   4474                      4149
4                   4153                      3870
5                   3871                      3868
6                   3869                      3621
7                   3620                      3404
8                   3403                      3217
9                   3210                      3214
10                  3214                      3042
11                  3042                      3045
10 mixed            3042    (1 local assigned Math.random(), the other 9 a static int)
10 advancedRandom   2890    (1 local assigned a custom random with 1 parameter)
Note: all of these values vary within a margin of about ten between subsequent executions. "Static int" and "Math.random()" designate what is assigned to the locals within the recursively called function. This leads me to assume the following:
1. Including function calls in the recursive function adds to the function context.
2. Memory for locals is assigned along with their type, in chunks of more than 8 bytes, because adding a local does not always decrease the recursion limit.
3. Adding more than one call to a certain function does not add more memory to the function context.
4. The "memory chunk" is most likely 16 bytes long, because this value is 2^N, an addition of one int or Number local does not always decrease recursion, and it is more than 8, since the raw value of a Number variable takes 8 bytes as a double-precision float.
Assuming #4 is correct, the best-fitting value for the function context size appeared to be 172 bytes, giving a total stack size of 912,632 bytes (5306 × 172). This largely confirms my initial assumption that the stack size is actually 1 megabyte in Flash 10. Flash 11 showed somewhat higher numbers when I opened the test SWF in its debugger, but I didn't test it extensively.
Hm, this is interesting. I took a look at the link that Barış gave. It seems like it might be to do with 'method complexity' after all, but I am not sure how to test it further. I am using Flash CS5, publishing for Flash Player 10, ActionScript 3 (of course).
Original:
function overflow(stack:int = 0):void {
    if (stack < 5290) {
        trace(stack);
        overflow(stack + 1);
    }
}
// gives 5287
Now adding a single Math.random() call to the overflow() method:
function overflow(stack:int = 0):void {
    Math.random();
    if (stack < 5290) {
        trace(stack);
        overflow(stack + 1);
    }
}
// gives 4837
Adding multiple Math.random() calls makes no difference, nor does storing the result in a local variable or adding another parameter to the overflow() method to "carry" the randomly generated value:
function overflow(stack:int = 0):void {
    Math.random();
    Math.random();
    if (stack < 5290) {
        trace(stack);
        overflow(stack + 1);
    }
}
// still gives 4837
At this point I tried different Math calls, such as:
// just the change to that 1 line:
Math.pow() // gives 4457
Math.random(), Math.sqrt(), Math.tan(), Math.log() // gives 4837
Interestingly, it doesn't seem to matter what you pass in to the Math call; the count remains constant:
Math.sqrt(5) vs Math.sqrt(Math.random()) // gives 4837
Math.tan(5) vs Math.tan(Math.random()) // gives 4837
Math.pow(5, 7) vs Math.pow(Math.random(), Math.random()) // 4457
Until I chained 3 of them:
Math.tan(Math.log(Math.random())); // gives 4457
It looks like two Math calls from that "group" are "equal" to one Math.pow() call? =b Mixing Math.pow() with something else doesn't seem to decrease the value further, though:
Math.pow(Math.random(), Math.random()); // gives 4457
However, chaining two Math.pow()'s:
Math.pow(Math.pow(Math.random(), Math.random()), Math.random()); // 4133
I could go on and on, but I wonder if there is some pattern:
Results:     5287   4837   4457   4133
Differences:    450    380    324
Must be variable! I just compiled your sample and I get to 5274 before the stack overflow.
@baris: that's for the mxmlc compiler.
+1 for stack overflow question ^^