In Chrome's Puppeteer, at ExecutionContextDescription.auxData you can find an object containing the following properties: isDefault, type and frameId
frameId is fairly straightforward to understand, but the other properties doesn't seem documented anywhere. What makes an execution context the default one? Isn't the execution context always just one for each frame at any given time?
From https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#overview :
Frame has at least one execution context - the default execution context - where the frame's JavaScript is executed. A Frame might have additional execution contexts that are associated with extensions.
Related
How can I create a CUDA context?
The first call of CUDA is slow and I want to create the context before I launch my kernel.
The canonical way to force runtime API context establishment is to call cudaFree(0). If you have multiple devices, call cudaSetDevice() with the ID of the device you want to establish a context on, then cudaFree(0) to establish the context.
EDIT: Note that as of CUDA 5.0, it appears that the heuristics of context establishment are slightly different and cudaSetDevice() itself establishes context on the device is it called on. So the explicit cudaFree(0) call is no longer necessary (although it won't hurt anything).
Using the runtime API: cudaDeviceSynchronize, cudaDeviceGetLimit, or anything that actually accesses the context should work.
I'm quite certain you're not using the driver API, as it doesn't do that sort of lazy initialization, but for others' benefit the driver call would be cuCtxCreate.
When calling model.rayIntersect() in the Autodesk Forge viewer, I noticed that the intersects returned did not always reflect the accurate intersections unless I wait on the GEOMETRY_LOADED_EVENT.
From inspecting the non-minified source code of the viewer (here) it does not appear to me that waiting on GEOMETRY_LOADED_EVENT is necessary based on any of the operations in the rayIntersect() function. It is my understanding that we could get the mesh data of objects in the viewer simply from the fragments, which does not require the GEOMETRY_LOADED_EVENT. Is there another event I could wait on before calling model.rayIntersect() that may fire more quickly?
I am working to perform this intersection calculation on large models in a headless form of the viewer, so waiting on the GEOMETRY_LOADED_EVENT can take quite some time, so I would prefer not to wait for it to finish.
The hit-testing logic in Forge Viewer is pretty complex and may use different approaches (such as hit-testing a BVH, hit-testing individual meshes, or checking pixels in an "ID buffer") depending on your environment and settings.
The BVH is computed by the viewer after it receives the "fragment list" with bounding boxes of all fragments (it's an async operation that may take a while), and the ID buffer is generated as part of the standard rendering pipeline, so for these to work, you should actually wait for the Autodesk.Viewing.GEOMETRY_EVENT_LOADED event.
CUDA 10 added runtime API calls for putting streams (= queues) in "capture mode", so that instead of executing, they are returned in a "graph". These graphs can then be made to actually execute, or they can be cloned.
But what is the rationale behind this feature? Isn't it unlikely to execute the same "graph" twice? After all, even if you do run the "same code", at least the data is different, i.e. the parameters the kernels take likely change. Or - am I missing something?
PS - I skimmed this slide deck, but still didn't get it.
My experience with graphs is indeed that they are not so mutable. You can change the parameters with 'cudaGraphHostNodeSetParams', but in order for the change of parameters to take effect, I had to rebuild the graph executable with 'cudaGraphInstantiate'. This call takes so long that any gain of using graphs is lost (in my case). Setting the parameters only worked for me when I build the graph manually. When getting the graph through stream capture, I was not able to set the parameters of the nodes as you do not have the node pointers. You would think the call 'cudaGraphGetNodes' on a stream captured graph would return you the nodes. But the node pointer returned was NULL for me even though the 'numNodes' variable had the correct number. The documentation explicitly mentions this as a possibility but fails to explain why.
Task graphs are quite mutable.
There are API calls for changing/setting the parameters of task graph nodes of various kinds, so one can use a task graph as a template, so that instead of enqueueing the individual nodes before every execution, one changes the parameters of every node before every execution (and perhaps not all nodes actually need their parameters changed).
For example, See the documentation for cudaGraphHostNodeGetParams and cudaGraphHostNodeSetParams.
Another useful feature is the concurrent kernel executions. Under manual mode, one can add nodes in the graph with dependencies. It will explore the concurrency automatically using multiple streams. The feature itself is not new but make it automatic becomes useful for certain applications.
When training a deep learning model it happens often to re-run the same set of kernels in the same order but with updated data. Also, I would expect Cuda to do optimizations by knowing statically what will be the next kernels. We can imagine that Cuda can fetch more instructions or adapt its scheduling strategy when knowing the whole graph.
CUDA Graphs is trying to solve the problem that in the presence of too many small kernel invocations, you see quite some time spent on the CPU dispatching work for the GPU (overhead).
It allows you to trade resources (time, memory, etc.) to construct a graph of kernels that you can use a single invocation from the CPU instead of doing multiple invocations. If you don't have enough invocations, or your algorithm is different each time, then it won't worth it to build a graph.
This works really well for anything iterative that uses the same computation underneath (e.g., algorithms that need to converge to something) and it's pretty prominent in a lot of applications that are great for GPUs (e.g., think of the Jacobi method).
You are not going to see great results if you have an algorithm that you invoke once or if your kernels are big; in that case the CPU invocation overhead is not your bottleneck. A succinct explanation of when you need it exists in the Getting Started with CUDA Graphs.
Where task graph based paradigms shine though is when you define your program as tasks with dependencies between them. You give a lot of flexibility to the driver / scheduler / hardware to do scheduling itself without much fine-tuning from the developer's part. There's a reason why we have been spending years exploring the ideas of dataflow programming in HPC.
I've been reading a bit about how to use the inline metadata when using the ASC 2.0 compiler.
However I can't find any source of info explaining why I should use them.
Anyone knows?.
Functions induce overhead in any programming language. Per ActionScript, when function execution begins, a number of objects and properties are created.
First, an activation object is created that stores the parameters and local variables declared in a function body. It's an internal mechanism that cannot be directly accessed.
Second, a scope chain is created that contains an ordered list of objects that Flash platform checks for identifier declarations. Every function that executes has a scope chain that is stored in an internal property.
Function closures maintain a snapshot of a function and its lexical environment.
Moving code inline reduces the creation of these objects, and how references are maintained on the stack. Per Flash, you may see 4x performance increase.
Of course, there are tradeoffs - without the inline keyword induces code complexity; as well, inlining code increasing the amount of bytecode. Besides larger applications, the virtual machine spends additional time verifying and JIT compiling.
To simplify, inline is some sort of copy/paste of code. Since method calls are expensive and cost execution time, using inline keyword will copy/paste the body of the method each time the method call is present in your code so the method call will be replaced by the body of the method instead. Since this is done at compilation time it will increase in theory the size of the resulting app (if an inline method is called 10 times its body will be copied and pasted 10 times) but since all calls will be replaced you will gain speed execution. This is of course only relevant for demanding code execution like loops running at each frame for example.
Summary:
I'm trying to find out if a single method can be executed twice in overlap when executing on a single thread. Or if two different methods can be executed in overlap, where when they share access to a particular variable, some unwanted behaviour can occur.
Ex of a single method:
var ball:Date;
method1 ():Date {
ball = new Date();
<some code here>
return ball;
}
Questions:
1) If method1 gets fired every 20ms using the event system, and the whole method takes more than 20ms to execute, will the method be executed again in overlap?
2) Are there any other scenarios in a single thread environment where a method(s) can be executed in overlap, or is the AVM2 limited to executing 1 method at a time?
Studies: I've read through https://www.adobe.com/content/dam/Adobe/en/devnet/actionscript/articles/avm2overview.pdf which explains that the AVM2 has a stack for running code, and the description for methods makes it seem that if there isn't a second stack, the stack system can only accomodate 1 method execution at a time. I'd just like to double check with the StackeOverflow experts to see for sure.
I'm dealing with some time sensitive data, and have to make sure a method isn't changing a variable that is being accessed by another method at the same time.
ActionScript is single-threaded; although, can support concurrency through ActionScript workers, which are multiple SWF applications that run in parallel.
There are asynchronous patterns, if you want a nested function, or anonymous function to execute within the scope chain of a function.
What I think you're referring to is how AVM2 executes event-driven code, to which you should research AVM2 marshalled slice. Player events are executed at the beginning of the slice.
Heavy code execution will slow frame rate.
It's linear - blocking synchronously. Each frame does not invoke code in parallel.
AVM2 executes 20 millisecond marshalled slices, which depending on frame rate executes user actions, invalidations, and rendering.