The Web API specs mention that importScripts() loads scripts synchronously.
Here's the Chromium code which implements this.
However, my understanding is that Web APIs run in a separate thread pool. Since importScripts is a Web API offering to begin with, it should run separately from the worker thread and not pause worker execution. What, then, does "synchronously" mean in this context?
(Answering my own question, because I have an insight)
There are two facts pointed out in the question:
- importScripts is synchronous
- importScripts does not run on the worker thread (i.e. it runs outside the V8 runtime)
My misconception was that these two facts are contradictory. They're not!
Being a Web API, importScripts is implemented as native browser code, but that does not guarantee concurrency.
Another example is calling XMLHttpRequest.send() on a request that was opened with async = false.
Bottom line: running separately from the JS runtime thread does not make a call asynchronous.
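To see how those two facts coexist, here is a minimal C sketch (all names invented for illustration; this is not Chromium's actual mechanism) of a call that is serviced by a separate thread yet remains synchronous: the caller hands off the work, then blocks on a condition variable until the helper signals completion.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done_cv = PTHREAD_COND_INITIALIZER;
static int done = 0;

/* The "native" side: runs on its own thread, does the actual work. */
static void *helper(void *arg)
{
    printf("helper thread: loading %s\n", (const char *)arg);
    /* ... do the actual loading here ... */
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_signal(&done_cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* The caller's "importScripts": returns only after the helper finishes,
 * even though the work happened on a different thread. */
static void import_script_sync(const char *url)
{
    pthread_t tid;
    done = 0;
    pthread_create(&tid, NULL, helper, (void *)url);
    pthread_mutex_lock(&lock);
    while (!done)                       /* the calling thread blocks here */
        pthread_cond_wait(&done_cv, &lock);
    pthread_mutex_unlock(&lock);
    pthread_join(tid, NULL);
}

int main(void)
{
    import_script_sync("util.js");
    printf("caller resumes only after the load completed\n");
    return 0;
}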
Related
If I register a callback via cudaStreamAddCallback(), what thread is going to run it?
The CUDA documentation says that cudaStreamAddCallback
adds a callback to be called on the host after all currently enqueued items in the stream have completed. For each cudaStreamAddCallback call, a callback will be executed exactly once. The callback will block later work in the stream until it is finished.
but says nothing about how the callback itself is called.
Just to flesh out comments so that this question has an answer and will fall off the unanswered queue:
The short answer is that this is an internal implementation detail of the CUDA runtime and you don't need to worry about it.
The longer answer is that if you look carefully at the operation of the CUDA runtime, you will notice that context establishment on a device (be it explicit via the driver API, or implicit via the runtime API) spawns a small thread pool. It is these threads which are used to implement features of the runtime like stream command queues and callback operations. Again, an internal implementation detail which the programmer doesn't need to know about.
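If you want to observe this yourself, the sketch below (my illustration, not NVIDIA's documentation; it assumes a Linux host where pthread_self() identifies host threads) prints the thread id from inside a stream callback so you can compare it with the main thread's id:

#include <stdio.h>
#include <pthread.h>
#include <cuda_runtime.h>

/* Runs on a CUDA runtime-internal host thread, not on main's thread. */
static void CUDART_CB my_callback(cudaStream_t stream, cudaError_t status,
                                  void *userData)
{
    printf("callback thread: %lu (status %d)\n",
           (unsigned long)pthread_self(), (int)status);
}

int main(void)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    printf("main thread:     %lu\n", (unsigned long)pthread_self());

    /* Queue the callback; it fires after earlier work in the stream. */
    cudaStreamAddCallback(stream, my_callback, NULL, 0);

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}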
I want my CPU and GPU computation to overlap; however, my GPU code contains some synchronous function calls like cudaBindTextureToArray() and cudaUnbindTexture() for which no asynchronous counterparts exist. Will these calls break GPU-CPU concurrency?
In general, the functions that may be asynchronous are listed here:
- Kernel launches;
- Memory copies between two addresses to the same device memory;
- Memory copies from host to device of a memory block of 64 KB or less;
- Memory copies performed by functions that are suffixed with Async;
- Memory set function calls.
Asynchronous functions usually have an Async suffix, and they will usually accept a stream parameter.
Functions that don't meet the above description should be assumed to be synchronous. Specific exceptions (like cudaSetDevice()) are usually evident from their description.
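For example (these prototypes are from the runtime API; the comments are my own summary), the Async-suffixed copy differs from the synchronous one exactly by the trailing stream parameter:

/* synchronous: generally blocks the host thread until the copy completes */
cudaError_t cudaMemcpy(void *dst, const void *src, size_t count,
                       cudaMemcpyKind kind);

/* asynchronous: queued into a stream, returns to the host immediately */
cudaError_t cudaMemcpyAsync(void *dst, const void *src, size_t count,
                            cudaMemcpyKind kind, cudaStream_t stream);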
In the context of a single-device system, synchronous functions (with the exception of specific stream-synchronizing functions like cudaStreamSynchronize and cudaStreamWaitEvent) will:
1) Wait to begin until all CUDA activity has completed (i.e. all previous CUDA API calls and kernel calls have completed)
2) Execute their designated activity (e.g. cudaMemcpy() will begin the designated copy operation after step 1 is complete)
3) Release the calling (host) thread after step 2 is complete
Therefore the calling (host) thread is blocked from the moment the cudaMemcpy() call is made until all previous CUDA activity is complete and the cudaMemcpy() call itself is complete. I think most people would say this may "break" GPU-CPU concurrency, because for the duration of the sequence described above (steps 1-3) the CPU thread is effectively doing nothing.
Whether or not it makes much difference in your application will depend on what is happening before and after the synchronous call in question.
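As a rough sketch of the overlapping pattern (hypothetical kernel and function names, error checking omitted, and the host buffer is assumed to be pinned so that the copy really is asynchronous): any synchronous call placed between the queued work and do_cpu_work(), such as the cudaBindTextureToArray() mentioned in the question, would block the host at that point instead.

#include <cuda_runtime.h>

__global__ void my_kernel(float *d, int n)   /* hypothetical GPU work */
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

void do_cpu_work(void);   /* your CPU-side computation (hypothetical) */

void overlap_example(float *h_pinned, float *d_buf, int n, cudaStream_t s)
{
    /* queued asynchronously: the host thread is released immediately */
    cudaMemcpyAsync(d_buf, h_pinned, n * sizeof(float),
                    cudaMemcpyHostToDevice, s);
    my_kernel<<<(n + 255) / 256, 256, 0, s>>>(d_buf, n);

    do_cpu_work();            /* CPU runs while the GPU is busy */

    cudaStreamSynchronize(s); /* block only when the results are needed */
}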
I'm trying to run two procedures in parallel. Since Tcl is an interpreter, it processes procedures one by one. Can someone explain, with an example, how I can use multi-threading in Tcl?
These days, the usual way to do multi-threading in Tcl is to use its Thread extension — it is developed alongside the Tcl core, but on certain platforms (such as various Linux-based OSes) you might need to install a separate package to make this extension available.
The threading model the Thread extension implements is "one thread per interpreter". This means each thread can "host" just one Tcl interpreter (and an unlimited number of its child interpreters), but no code executed by any thread may access interpreters hosted in other threads. This, in turn, means that when you work with threads in Tcl, you have to master the idea of multiple interpreters.
The classical approach to exchanging data between interpreters running in different threads is message passing: you post scripts to the input queue of the target interpreter running in a different thread and then wait for a reply. On the other hand, thread-shared variables (which implement shared memory with locking) are also available. Another available feature is support for thread pools.
Read the "Tcl and threads" wiki page and the Thread extension's manual pages.
The code examples are on the wiki. Here's just one of them.
Please note that if the procedures you think have to run in parallel are mostly I/O-bound (that is, they read something from the network and/or send something there) rather than CPU-bound (doing heavy computations), you might have better results with the event-based approach to processing: Tcl has built-in support for the event loop, and you can make Tcl execute your code when the next chunk of data can be read from a channel (such as a network socket) or written to one.
I have a VxWorks application running on an ARM microcontroller.
First let me summarize the application:
The application consists of a 3rd-party stack and a gateway application.
We have implemented an operating system abstraction (OSA) layer to support OS independence.
The underlying stack has its own memory management and control facility, which holds memory blocks in a doubly linked list.
For instance, we don't call malloc/new and free/delete directly. Instead we call the OSA layer's routines (XXAlloc, XXFree, XXReAlloc): XXAlloc gets the memory from the OS, puts it in a list, and then returns it to the application, and when freeing the memory we again use XXFree.
In fact each block is a struct which holds:
- magic numbers marking the beginning and end of the memory block
- the size the user requested
- the size actually allocated (which differs due to alignment)
- previous and next pointers
- a pointer to the piece of memory given back to the application
- the link register value showing where in the application XXAlloc was called
With this block structure the stack can check whether a block is corrupted or not.
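A sketch of what such a block header might look like in C. All the field names and magic values here are invented for illustration; the real stack's layout will differ:

#include <stddef.h>
#include <string.h>

#define XX_MAGIC_HEAD 0xDEADBEEFul
#define XX_MAGIC_TAIL 0xFEEDFACEul

typedef struct xx_block {
    unsigned long    magic_head;  /* marks the beginning of the block   */
    size_t           user_size;   /* size the caller asked for          */
    size_t           real_size;   /* actual size after alignment        */
    struct xx_block *prev, *next; /* doubly linked list of blocks       */
    void            *user_ptr;    /* memory handed back to the caller   */
    void            *caller_lr;   /* link register: who called XXAlloc  */
    /* ... user data follows, terminated by XX_MAGIC_TAIL ... */
} xx_block_t;

/* Corruption check: both magic numbers must still be intact. */
int xx_block_ok(const xx_block_t *b)
{
    unsigned long tail;
    memcpy(&tail, (const char *)b->user_ptr + b->user_size, sizeof tail);
    return b->magic_head == XX_MAGIC_HEAD && tail == XX_MAGIC_TAIL;
}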
We also have a pthread library ported from Linux, which we use to:
- create/terminate threads (currently there are 22 threads)
- create synchronization objects (events, mutexes, ...)
There is a main task created by taskSpawn, and this task later creates the other threads.
That was a description of the application and its VxWorks interface.
The problem is:
One of the tasks suddenly gets destroyed by VxWorks, with no information about what went wrong.
I also have a JTAG debugger, and it hits the VxWorks taskDestroy() routine, but the call stack gives no information, neither PC nor r14.
I'm suspicious of a specific routine in the code where a huge XXAlloc is done, but the problem occurs so sporadically that I have no clue how to map it to the source code.
I think the OS detects an exception and handles it silently.
Any help would be great.
Regards
It's resolved.
I did an isolated test: I allocated 20 MB with malloc, memset it with 0x55, and stopped my application's thread.
Then I wrote another thread which checks whether anything other than 0x55 has been written into my 20 MB.
And guess what!! Some other thread, belonging to other components on the CPU (someone else developed them), writes into my allocated space.
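For reference, the isolated test can be as small as this sketch (sizes and the 0x55 pattern as described above; thread creation via the ported pthread library):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

#define CANARY_SIZE (20u * 1024u * 1024u)   /* 20 MB */
#define CANARY_BYTE 0x55

static unsigned char *canary;

/* Scan the buffer forever, reporting any byte that is no longer 0x55. */
static void *scan_thread(void *arg)
{
    for (;;) {
        for (size_t i = 0; i < CANARY_SIZE; i++) {
            if (canary[i] != CANARY_BYTE)
                printf("corruption at offset %zu: 0x%02x\n", i, canary[i]);
                /* set a breakpoint here and inspect who wrote it */
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    canary = malloc(CANARY_SIZE);
    memset(canary, CANARY_BYTE, CANARY_SIZE);
    pthread_create(&tid, NULL, scan_thread, NULL);
    pthread_join(tid, NULL);   /* the scanner runs forever */
    return 0;
}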
Thanks for your help
If your task exits, taskDestroy() is called. If you are suspicious of the huge XXAlloc, verify that the allocation code is not calling exit() when memory is exhausted. I've been bitten by this behavior in a third-party OSAL before.
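That is, the pattern to look for in the OSAL is something like this hypothetical sketch, where an out-of-memory condition silently ends the calling task instead of reporting failure:

#include <stdlib.h>

/* hypothetical OSAL allocation path -- the dangerous pattern */
void *XXAlloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        exit(1);   /* ends the calling task; VxWorks then runs taskDestroy() */
    return p;
}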
Sounds like you are debugging after integration; this can be a hell of a job.
I suggest breaking the problem into smaller pieces.
Process
1) You can get more insight by instrumenting the code and/or using VxWorks instrumentation (depending on which version). This allows you to get more visibility into what happens. Be sure to log everything to a file, so you can move back in time from the point where the task ends. Instrumentation is a worthwhile investment, as it will be handy on more occasions. An interesting set of hooks in VxWorks: taskHookLib (see the sketch after this list).
2) Memory allocation/deallocation is very fundamental functionality. It would be my first candidate for thorough (unit) testing in a well-defined multi-threaded environment. If you have done this and no errors are found, I'd first start to look at why the task has ended.
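As an example of such a hook, a delete hook from taskHookLib can log which task is about to go away before the information is lost. This is a sketch for VxWorks 5.x, where a task ID is the address of its WIND_TCB; details vary between versions:

#include <vxWorks.h>
#include <taskLib.h>
#include <taskHookLib.h>
#include <logLib.h>

/* Called by the kernel whenever a task is about to be deleted. */
static void myDeleteHook(WIND_TCB *pTcb)
{
    /* in VxWorks 5.x the task ID is the TCB address */
    logMsg("task %s is being deleted\n",
           (int)taskName((int)pTcb), 0, 0, 0, 0, 0);
}

void installDeleteHook(void)
{
    taskDeleteHookAdd((FUNCPTR)myDeleteHook);
}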
Other possible causes
A task will also end when its work is done, so it may be a return caused by a not-so-endless loop. Especially if it is always the same task, this would be my guess.
Also, some versions of VxWorks have MMU support, which must be considered.
I have an Application A (from some company X). This application allows me to extend its functionality by writing my own functions.
I tell Application A to call my user functions via Application A's configuration file (this is how it knows that it must call user-written functions). Application A uses function pointers, which I must register with Application A prior to it calling my user-written functions.
If there is a bug or fault in my user-written functions in production, Application A will stop functioning; for example, if I have a segmentation fault in my user-written functions.
Application A loads my user-written functions from a shared DLL file. This means that my user-written functions run in Application A's process address space.
I wish to handle certain signals, like segmentation fault, divide by zero, and stack overflow, but Application A has its own signal handlers for these.
How can I install my own signal handlers to catch the exceptions in my user-written functions, so that I can clean up gracefully without affecting Application A much? Since my user functions are called in Application A's process, the OS will call the signal handlers installed by Application A, not those in my user functions.
How can I change this? I want the OS to call the signal handlers written in my functions, but only for signals raised by my functions, which are asynchronous in nature.
Note: I do not have the source code of Application A and I cannot make any changes to it, because it's controlled by a different company.
I will be using C, and only C, on Linux, Solaris, and probably Windows platforms.
You do not specify which platform you're working with, so I'll answer for Linux; it should be valid for Windows as well.
When you set your signal handler, the system call that you use returns the previous handler. It does this so that you can restore it once you are no longer interested in handling that signal.
Linux man page for signal
MSDN entry on signal
Since you are a shared library loaded into the application, you should have no problems manipulating the signal handlers. Just make sure to override the minimum you need, in order to reduce the chances of disrupting the application itself (some applications use signals for async notifications).
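A sketch of that pattern in C: install your handlers only for the duration of your own function, keep what signal() returned, and put Application A's handlers back before returning. The sigsetjmp/siglongjmp recovery point is my own illustration; note that longjmp-ing out of a SIGSEGV handler is a pragmatic, not strictly portable, technique:

#include <signal.h>
#include <setjmp.h>

static sigjmp_buf recovery_point;

static void my_handler(int sig)
{
    siglongjmp(recovery_point, sig);   /* jump back out of the fault */
}

int my_user_function(void)
{
    /* save Application A's handlers, install ours */
    void (*old_segv)(int) = signal(SIGSEGV, my_handler);
    void (*old_fpe)(int)  = signal(SIGFPE,  my_handler);
    int result = 0;

    if (sigsetjmp(recovery_point, 1) == 0) {
        /* ... the risky user code goes here ... */
    } else {
        /* we faulted: clean up and report failure gracefully */
        result = -1;
    }

    /* restore Application A's handlers before returning to it */
    signal(SIGSEGV, old_segv);
    signal(SIGFPE,  old_fpe);
    return result;
}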
The cleanest way to do this would be to run your application code in a separate process that communicates with the embedded shared-DLL code via some IPC mechanism. You could handle whatever signals you want in your own process without affecting the other process. Typically, the conditions you mention (seg fault, divide by zero, stack overflow) indicate bugs in the program and will result in termination. There isn't much you can do to "handle" these beyond fixing the root cause of the bug.
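On Linux or Solaris, a sketch of that separation could use fork() and waitpid(): run the risky function in a child process and let the parent detect a crash from the wait status, without touching any signal handlers in Application A's process (function names here are hypothetical):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern void risky_user_function(void);   /* hypothetical user code */

int run_isolated(void)
{
    pid_t pid = fork();
    if (pid == 0) {                 /* child: do the dangerous work */
        risky_user_function();
        _exit(0);
    }

    int status;
    waitpid(pid, &status, 0);
    if (WIFSIGNALED(status)) {      /* child crashed (e.g. SIGSEGV) */
        fprintf(stderr, "user code died on signal %d\n", WTERMSIG(status));
        return -1;
    }
    return 0;
}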
In C++ on Windows (compiling with MSVC's /EHa so that structured exceptions become catchable), you can catch these by putting your code in a try-catch; note that standard C++ on Linux or Solaris will not deliver a segfault to catch ( ... ):
try
{
    // here goes your code
}
catch ( ... )
{
    // with MSVC's /EHa this also catches structured exceptions such as
    // access violations; portable C++ catches only thrown C++ exceptions,
    // so do not rely on this for segfaults on other platforms
}