How do I obtain a view of all of the modules, the users, mode, job type, as well of the specifics about the spark module (number of executors, memory etc) in Code Workbook?
You can access this view using the admin account: https://<FOUNDRY_URL>/workspace/vector/admin
This will provide you with a detailed view of the modules, number of executors, driver cores & memory, executor cores & memory.
Related
I have one GPU at my disposal for deployment but multiple models need to be deployed. I don't want to allocate the full GPU memory to the first deployed model because then I can't deploy my subsequent models. While training, this could be controlled using gpu_memory_fraction parameter.
I am using the following command to deploy my model -
tensorflow_model_server --port=9000 --model_name=<name of model> --model_base_path=<path where exported models are stored &> <log file path>
Is there a flag that I can set to control the gpu memory allocation?
Thanks
The new TF Serving allowed to set flag per_process_gpu_memory_fraction in this pull request
I have just add one flag to config gpu memory fraction.
https://github.com/zhouyoulie/serving
I'm using QEMU to test some software for a personal project and I would like to know whenever the program is writing to memory. The best solution I have come up with is to manually add print statements in the file responsible for writing to memory. Which this would require remaking the object for the file and building QEMU, if I'm correct. But I came across QMP which uses JSON commands to manipulate QEMU, which has an entire list of commands, found here: https://raw.githubusercontent.com/Xilinx/qemu/master/qmp-commands.hx.
But after looking at that I didn't really see anything that would do what I want. I am sort of a new programmer and am not that advanced. And was wondering if anyone had some idea how to go about this a better way.
Recently (9 jun 2016) there were added powerful tracing features to mainline QEMU.
Please see qemu/docs/tracing.txt file as manual.
There are a lot of events that could be traced, see
qemu/trace_events file for list of them.
As i can understand the code, the "guest_mem_before" event is that you need to view guest memory writes.
Details:
There are tracing hooks placed at following functions:
qemu/tcg/tcg-op.c: tcg_gen_qemu_st * All guest stores instructions tcg-generation
qemu/include/exec/cpu_ldst_template.h all non-tcg memory access (fetch/translation time, helpers, devices)
There historically hasn't been any support in QEMU for tracing all guest memory accesses, because there isn't any one place in QEMU where you could easily add print statements to trace them. This is because more guest memory accesses go through the "fast path", where we directly generate native host instructions which look up the host RAM address in a data structure (QEMU's TLB) and perform the load or store. It's only if this fast path doesn't find a hit in the TLB that we fall back to a slow path that's written in C.
The recent trace-events event 'tcg guest_mem_before' can be used to trace virtual memory accesses, but note that it won't tell you:
whether the access succeeded or faulted
what the data being loaded or stored was
the physical address that's accessed
You'll also need to rebuild QEMU to enable it (unlike most trace events which are compiled into QEMU by default and can be enabled at runtime.)
I am looking for options to serve parallel predictions using caffe model from GPU. Since GPU comes with limited memory, what are the options available to achieve parallelism by loading the net only once?
I have successfully wrapped my segmentation net with tornado wsgi + flask. But at the end of the day, this is most equivalent serving from a single process. https://github.com/BVLC/caffe/blob/master/examples/web_demo/app.py.
Is having my own copy of net for each process a strict requirement, since the net is read-only after the training is done? Is it possible to rely on fork for parallelism?
I am working on a sample app which serves result from segmentation model. It utilizes copy on write and loads the net in the master once and serve memory references for the forked children. I am having trouble starting this setup in a web server setting. I get a memory error when I try to initialize the model. The web server I am using here is uwsgi.
Have anyone achieved parallelism by loading the net only once (since GPU memory is limited) and achieved parallelism for serving layer? I would be grateful if any one of you can point me in the right direction.
As stated in the title, how can I collect the data when memory is accessed?
I have done some modification in the function cpu_physical_memory_rw
but it does not work.
There is no single place where you can intercept all writes to guest RAM, because for speed reasons QEMU's fast path code for RAM access involves directly generated x86 instructions which load the data from the host memory which has been assigned as guest RAM, and execution doesn't come out to a C function at all. QEMU is designed for speed, not ease of instrumentation.
I'm working on a cluster with a lot of nodes, and each node has two gpus. In the cluster, I can't launch "nvidia-smi" to check which device is busy. My code selects the best device (with cudaChooseDevice) in terms of capability, but when the cluster assign me the same node for two different jobs, then I have two tasks running on the same gpu.
My question is: There is a way to check at runtime if the device is busy or not?
Thanks
Your cluster managers should install and use cluster management (job-scheduling) software that allows them to assign and track GPUs just like CPUs and memory. There are a number of job schedulers that can do this. Even without explicit GPU support in the job-scheduler, it's possible to build job entry/exit scripts that will assign GPUs properly.
You can effectively include the same functionality that nvidia-smi uses by embedding NVML in your applications. Any query or data item reported on by nvidia-smi can be accessed programmatically through NVML.
It's also not clear to me why you could not launch a script for your job which checks which devices are busy using nvidia-smi, then picks an un-busy device.
But keep in mind that any runtime check you might do would be subject to the behavior of other applications. If those applications (whether launched by you or other users) have unusual behavior, your runtime check can easily be defeated.