How to get the RAM footprint of a specific object in JRuby

Is there a way to query the RAM footprint of a specific object in JRuby?

Have you heard about jmap, jhat, and visualvm?
jmap outputs the object memory maps / heap memory details for the given Java process.
jhat is a heap analysis tool that reads a Java heap dump and lets you query it with a SQL-like language (OQL) to dig into the details.
visualvm is yet another Java tool for viewing details about Java applications as they run in the JVM.
I found this post from Charles Nutter to be very helpful in getting me started in JRuby profiling and memory inspection.
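For reference, here is a minimal command-line sketch of that workflow; the PID and file names are placeholders:

# Per-class histogram: live instance counts and total bytes per class
jmap -histo:live <jvm-pid>

# Dump the whole heap to a file, then browse/query it with jhat (OQL)
# at http://localhost:7000
jmap -dump:format=b,file=jruby-heap.hprof <jvm-pid>
jhat jruby-heap.hprof

Note that in a JRuby process, Ruby objects appear under their Java implementation classes (for example org.jruby.RubyObject), so these tools report per-class footprints rather than a single number for one Ruby object.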

Related

What does it mean when a dataset build "OOM"s?

I hear this term used a lot: something about a build OOMing, or running out of memory. What does that mean? I'm asking in the context of running a dataset build in Transforms Python or Transforms SQL.
OOM == OutOfMemory
In a Transform, this happens when the JVM tries to allocate more heap memory than it has available or can free using GC (garbage collection). It can occur, for example, in your driver when materializing huge query plans, or in an executor when dealing with very large array columns or other data that can't fit into memory.
It can also happen when JVM and non-JVM processes together use more off-heap memory than is available, i.e. more than the unused heap memory plus memoryOverhead (this applies to both the driver and the executors).
Typical causes are not having enough main JVM memory or memoryOverhead, or using too much Python memory, e.g. calling .collect() on the driver or running memory-hungry UDFs on an executor.
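To make the driver-side case concrete, here is a hedged PySpark sketch; the DataFrame is synthetic and the pattern, not the exact numbers, is the point:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100_000_000)  # stand-in for a large dataset

# Risky: collect() materializes every row in the driver's Python process,
# so on a genuinely large DataFrame this is a classic driver OOM.
# rows = df.collect()

# Safer: keep the work distributed, or bound what comes back to the driver.
preview = df.limit(100).collect()
print(len(preview))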

QEMU/QMP alert when writing to memory

I'm using QEMU to test some software for a personal project, and I would like to know whenever the program writes to memory. The best solution I have come up with is to manually add print statements to the file responsible for writing to memory, which, if I understand correctly, would require recompiling that object file and rebuilding QEMU. But I came across QMP, which uses JSON commands to manipulate QEMU and has an entire list of commands, found here: https://raw.githubusercontent.com/Xilinx/qemu/master/qmp-commands.hx.
After looking through that list, I didn't see anything that would do what I want. I'm a fairly new programmer and not that advanced, and I was wondering whether anyone has an idea of a better way to go about this.
Powerful tracing features were recently (9 June 2016) added to mainline QEMU.
See the qemu/docs/tracing.txt file for the manual.
There are many events that can be traced; see the qemu/trace-events file for the full list.
As far as I understand the code, the "guest_mem_before" event is the one you need in order to observe guest memory writes.
Details:
Tracing hooks are placed at the following functions:
qemu/tcg/tcg-op.c (tcg_gen_qemu_st*): TCG generation of all guest store instructions
qemu/include/exec/cpu_ldst_template.h: all non-TCG memory accesses (fetch/translation time, helpers, devices)
There historically hasn't been any support in QEMU for tracing all guest memory accesses, because there isn't any one place in QEMU where you could easily add print statements to trace them. This is because most guest memory accesses go through the "fast path", where we directly generate native host instructions that look up the host RAM address in a data structure (QEMU's TLB) and perform the load or store. Only if this fast path doesn't find a hit in the TLB do we fall back to a slow path written in C.
The recent trace-events event 'tcg guest_mem_before' can be used to trace virtual memory accesses, but note that it won't tell you:
whether the access succeeded or faulted
what the data being loaded or stored was
the physical address that's accessed
You'll also need to rebuild QEMU to enable it (unlike most trace events, which are compiled into QEMU by default and can be enabled at runtime).
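For reference, a hedged sketch of the workflow; flag spellings have varied across QEMU versions, so treat the exact options as assumptions to check against docs/tracing.txt:

# Build with a tracing backend compiled in; 'simple' writes binary trace files
./configure --enable-trace-backends=simple
make

# List the events to enable, one name per line
echo guest_mem_before > /tmp/events

# Run the guest with tracing on; output lands in trace-<pid>
# (disk.img is a placeholder for your guest image)
qemu-system-x86_64 -trace events=/tmp/events disk.img

# Pretty-print the binary trace with the bundled script
scripts/simpletrace.py trace-events trace-<pid>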

Serving caffe models from GPU - Achieving parallelism

I am looking for options to serve parallel predictions from a Caffe model on a GPU. Since the GPU comes with limited memory, what options are available to achieve parallelism while loading the net only once?
I have successfully wrapped my segmentation net with tornado wsgi + flask, but at the end of the day this is essentially equivalent to serving from a single process. https://github.com/BVLC/caffe/blob/master/examples/web_demo/app.py.
Is having my own copy of the net in each process a strict requirement, given that the net is read-only after training is done? Is it possible to rely on fork for parallelism?
I am working on a sample app which serves results from a segmentation model. It relies on copy-on-write: the net is loaded once in the master, and the forked children share memory references to it. I am having trouble getting this setup to work under a web server: I get a memory error when I try to initialize the model. The web server I am using is uWSGI.
Has anyone achieved parallelism in the serving layer by loading the net only once (since GPU memory is limited)? I would be grateful if anyone could point me in the right direction.
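One pattern that sidesteps the fork problem (a CUDA context created in the parent process cannot safely be used from forked children) is to keep the net in a single dedicated GPU worker process and have the web workers hand requests to it over a queue. A minimal sketch, assuming pycaffe and placeholder model file names:

import multiprocessing as mp

def gpu_worker(requests, responses):
    # Touch the GPU only inside this process, so the CUDA context is
    # created after the fork, never shared across one.
    import caffe
    caffe.set_mode_gpu()
    # deploy.prototxt / weights.caffemodel are placeholders for your model
    net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)
    while True:
        req_id, batch = requests.get()
        # "data" is the conventional input blob name; adjust for your net
        responses.put((req_id, net.forward_all(data=batch)))

if __name__ == "__main__":
    requests, responses = mp.Queue(), mp.Queue()
    mp.Process(target=gpu_worker, args=(requests, responses), daemon=True).start()
    # Web workers (uWSGI, gunicorn, ...) put (id, preprocessed batch) on
    # `requests` and wait for the matching id on `responses`.

With this layout the net lives in GPU memory exactly once, and concurrency in the web tier is decoupled from the single CUDA context.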

Understanding 'Native Memory profiling' in Chrome developer tools

I am building an application with a simple search panel with a few search attributes and a result panel. In the result panel, I render the data in tabular form using SlickGrid.
After a few searches (each an AJAX call to the server), the page becomes very heavy and eventually crashes. I have checked the DOM node count and the JavaScript heap usage for possible memory leaks, and I couldn't find anything wrong there. However, when I ran the experimental native memory profiler, I saw that the "JavaScript external resources" section uses 600+ MB of memory. On running the garbage collector, it drops to a few MBs. I have a couple of questions here:
What contributes to the "JavaScript external resources" section? I thought it corresponds to the JSON data / JavaScript sources that get transferred from the server. FYI, the gzipped JSON response from the server is ~1 MB.
Why does Chrome not release the memory proactively instead of letting the page crash? Again, when I manually run the garbage collector, it releases the memory used by "JavaScript external resources".
How do I fix the original problem?
The JS heap profiler takes a snapshot of the objects on the JavaScript heap, but JavaScript code may also use native memory via typed arrays: Int8Array, Uint8Array, Uint8ClampedArray, Int16Array, Uint16Array, Int32Array, Uint32Array, Float32Array, and Float64Array.
So when you take a heap snapshot, it contains only the small wrapper objects that point to native memory blocks.
Unfortunately, the heap snapshot provides no data about the native memory used by these kinds of objects.
The native heap snapshot is able to count that memory, which is how we now know that the page uses native memory via an array or via an external string.
I'd also like to know how you checked that the page has no memory leaks. Did you use the three-snapshot technique, or just check particular objects?

Read already allocated memory / vector in Thrust

I am loading a simple variable into GPU memory using Mathematica:
mem = CUDAMemoryLoad[{1, 2, 3}]
And get the following result:
CUDAMemory["<135826556>", "Integer32"]
Now, with this data in the GPU memory I want to access it from a separate .cu program (outside of Mathematica), using Thrust.
Is there any way to do this? If so, can someone please explain how?
No, there isn't a way to do this. CUDA contexts are private, and there is no way in the standard APIs for one process to access memory allocated in another process's context.
During the CUDA 4 release cycle, a new API called cudaIpc was released. It allows two processes with CUDA contexts running on the same host to export and exchange handles to GPU memory allocations. The API is only supported on Linux hosts running with unified addressing support. To the best of my knowledge, Mathematica doesn't currently support this.