I am new to Flink on Yarn deployment.
As far as I know, Flink always reserves resources (CPU cores, RAM) when running on YARN, and a Flink cluster is treated as a single YARN application.
However, is there any way to configure it so that YARN allocates resources to Flink dynamically?
At the moment there is no such way. However, the bits and pieces needed to implement dynamic resource allocation are already in place. See the YarnFlinkResourceManager for more information.
Update
With Flink 1.5.0, full resource elasticity was added. This means that Flink now dynamically allocates new containers if it needs more slots to execute a job.
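For example, with a per-job submission to YARN (the jar path and the parallelism value below are just placeholders), something like

./bin/flink run -m yarn-cluster -p 8 ./path/to/my-job.jar

makes Flink's ResourceManager request only as many YARN containers as are needed to provide the requested slots, instead of reserving a fixed set of TaskManagers up front.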
Related
All the information I have found about QEMU relates to the Linux kernel, U-Boot, or ELF binaries, so I can't quite figure out how to load a binary blob from an embedded device to a specific address and execute part of it. The code I want to run only does arithmetic, so there are no hardware dependencies involved.
I would start qemu with something like
qemu-arm -singlestep -g8000
attach gdb, set the initial register state, and jump to my starting address to single-step through it.
But how do I initially load the binary data to a specific address and eventually set up an additional RAM range?
how to load a binary blob from an embedded device into a specific address and execute part of it.
You can load a binary blob into softmmu QEMU with the generic loader (-device loader).
I would start qemu with something like
qemu-arm -singlestep -g8000
This command line is the linux-user QEMU invocation. It emulates a userspace Linux process of the guest architecture; it is unprivileged and does not support any devices, including the generic loader. Try using qemu-system-arm instead.
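For instance, a softmmu invocation could look roughly like this (the virt board, the load address and the file name are just placeholders for your setup):

qemu-system-arm -M virt -cpu cortex-a15 -S -s \
    -device loader,file=blob.bin,addr=0x40000000,force-raw=on

-S freezes the CPU at startup and -s opens a gdb server on port 1234, so you can attach gdb with target remote :1234, set the registers and $pc, and single-step much like in the linux-user workflow.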
It's in fact easy with the Unicorn framework, which works on top of QEMU. Based on the example in the website's documentation section, I wrote a Python script that loads the data, sets the registers, adds a hook that prints the important per-step information, and starts execution at the desired address, running until a target address is reached.
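For reference, a minimal sketch of such a script (the file name, addresses, mapping sizes and register values are placeholders for whatever your blob needs):

from unicorn import Uc, UC_ARCH_ARM, UC_MODE_ARM, UC_HOOK_CODE
from unicorn.arm_const import UC_ARM_REG_R0, UC_ARM_REG_SP

LOAD_ADDR  = 0x10000            # where the blob gets mapped
START_ADDR = LOAD_ADDR + 0x40   # entry point inside the blob
END_ADDR   = LOAD_ADDR + 0x80   # address at which emulation stops

blob = open("dump.bin", "rb").read()

mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)
mu.mem_map(LOAD_ADDR, 0x100000)     # code/data region
mu.mem_map(0x200000, 0x10000)       # additional RAM range, e.g. for the stack
mu.mem_write(LOAD_ADDR, blob)

mu.reg_write(UC_ARM_REG_R0, 0x1234)     # initial register state
mu.reg_write(UC_ARM_REG_SP, 0x210000)

def hook_code(uc, address, size, user_data):
    # per-step information printed for every executed instruction
    print("pc=0x%x size=%d r0=0x%x" % (address, size, uc.reg_read(UC_ARM_REG_R0)))

mu.hook_add(UC_HOOK_CODE, hook_code)
mu.emu_start(START_ADDR, END_ADDR)
print("r0 after run: 0x%x" % mu.reg_read(UC_ARM_REG_R0))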
I am using Compute Engine for embarrassingly parallel scientific calculations. Some of my calculations require a single core and some require 64-core machines. I am currently using my own scripts: I have a qsub-like command that creates a new instance with the required number of cores, boots it from a custom image with the pre-installed software, connects to a storage bucket via gcsfuse, runs the required command, and then kills the instance once it is done.
Do I really need to do all of that with my own scripts, or is there a tool I should use instead? I would much rather use a ready-made tool for all of the management.
My usage fluctuates widely (hundreds of cores in parallel for 3 hours, then 2 days with nothing, etc.), so I don't want constant-sized machines: I want to be billed by the minute for my computations.
You may want to use the autoscaling feature for managed instance groups in Google Compute Engine (GCE). This feature adds more instances to your instance group when there is more load (upscaling) and removes instances when there is less load (downscaling). You can define the autoscaling policy based on CPU utilization, load-balancer utilization, or requests per second. Please refer to the autoscaler decisions documentation to understand the decisions the autoscaler may make when scaling instance groups.
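For example, autoscaling can be attached to an existing managed instance group with something like the following (the group name, zone and thresholds are placeholders):

gcloud compute instance-groups managed set-autoscaling my-group \
    --zone us-central1-b \
    --min-num-replicas 1 --max-num-replicas 100 \
    --target-cpu-utilization 0.75 --cool-down-period 90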
I have one GPU at my disposal for deployment, but multiple models need to be deployed. I don't want to allocate the full GPU memory to the first deployed model, because then I can't deploy the subsequent models. During training, this could be controlled with the gpu_memory_fraction parameter.
I am using the following command to deploy my model -
tensorflow_model_server --port=9000 --model_name=<name of model> --model_base_path=<path where exported models are stored> &> <log file path>
Is there a flag that I can set to control the gpu memory allocation?
Thanks
The new TF Serving allows setting the per_process_gpu_memory_fraction flag; it was added in this pull request.
I have just added a flag to configure the GPU memory fraction:
https://github.com/zhouyoulie/serving
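With such a build, the serving command from the question would become something like the following (the fraction value is only an example, and the flag name mirrors the one added in the pull request/fork above):

tensorflow_model_server --port=9000 --model_name=<name of model> --model_base_path=<path where exported models are stored> --per_process_gpu_memory_fraction=0.3 &> <log file path>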
I'm using QEMU to test some software for a personal project, and I would like to know whenever the program writes to memory. The best solution I have come up with is to manually add print statements to the file responsible for writing to memory, which, if I understand correctly, would require recompiling that object file and rebuilding QEMU. But I came across QMP, which uses JSON commands to manipulate QEMU and has an entire list of commands, found here: https://raw.githubusercontent.com/Xilinx/qemu/master/qmp-commands.hx.
After looking at that, though, I didn't really see anything that would do what I want. I am a fairly new programmer and not that advanced, and was wondering if anyone had an idea of a better way to go about this.
Recently (9 Jun 2016), powerful tracing features were added to mainline QEMU.
Please see the qemu/docs/tracing.txt file as a manual.
There are a lot of events that can be traced; see the qemu/trace-events file for the list of them.
As far as I understand the code, the "guest_mem_before" event is what you need in order to see guest memory writes.
Details:
There are tracing hooks placed at following functions:
qemu/tcg/tcg-op.c: tcg_gen_qemu_st* (TCG code generation for all guest store instructions)
qemu/include/exec/cpu_ldst_template.h (all non-TCG memory accesses: fetch/translation time, helpers, devices)
There historically hasn't been any support in QEMU for tracing all guest memory accesses, because there isn't any one place in QEMU where you could easily add print statements to trace them. This is because most guest memory accesses go through the "fast path", where we directly generate native host instructions which look up the host RAM address in a data structure (QEMU's TLB) and perform the load or store. It's only if this fast path doesn't find a hit in the TLB that we fall back to a slow path that's written in C.
The recent trace-events event 'tcg guest_mem_before' can be used to trace virtual memory accesses, but note that it won't tell you:
whether the access succeeded or faulted
what the data being loaded or stored was
the physical address that's accessed
You'll also need to rebuild QEMU to enable it (unlike most trace events, which are compiled into QEMU by default and can be enabled at runtime).
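Roughly, the steps look like this (the backend choice and paths below are just one possibility, and the exact procedure may differ between QEMU versions):

# 1. in the QEMU source tree, remove the "disable" keyword from the
#    guest_mem_before line in the trace-events file, then rebuild
#    with a non-nop trace backend:
./configure --enable-trace-backends=simple
make

# 2. list the event(s) to enable at runtime and pass the file to QEMU:
echo "guest_mem_before_exec" > /tmp/events
qemu-system-arm ... -trace events=/tmp/events

# 3. pretty-print the binary trace produced by the simple backend:
scripts/simpletrace.py trace-events trace-<pid>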
I am looking for options to serve parallel predictions from a Caffe model on a GPU. Since the GPU comes with limited memory, what options are available to achieve parallelism while loading the net only once?
I have successfully wrapped my segmentation net with tornado wsgi + flask, but at the end of the day this is essentially equivalent to serving from a single process: https://github.com/BVLC/caffe/blob/master/examples/web_demo/app.py.
Is having my own copy of the net in each process a strict requirement, since the net is read-only after training is done? Is it possible to rely on fork for parallelism?
I am working on a sample app which serves results from a segmentation model. It utilizes copy-on-write: it loads the net in the master once and serves memory references to the forked children. I am having trouble getting this setup to work in a web server setting: I get a memory error when I try to initialize the model. The web server I am using here is uWSGI.
Has anyone achieved parallelism for the serving layer by loading the net only once (since GPU memory is limited)? I would be grateful if anyone could point me in the right direction.
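For concreteness, this is roughly the setup I am describing, reduced to plain multiprocessing instead of uwsgi (the model paths, the input blob name "data" and the worker count are placeholders, and whether a GPU-backed net can actually be shared across fork like this is exactly what I am unsure about):

import multiprocessing as mp
import caffe

# load the net once in the master; forked workers are meant to reuse it
# through copy-on-write instead of each holding their own copy
caffe.set_mode_gpu()
net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)

def worker(task_queue):
    while True:
        blob = task_queue.get()
        if blob is None:          # poison pill shuts the worker down
            break
        net.blobs["data"].data[...] = blob
        print(net.forward().keys())

if __name__ == "__main__":
    queue = mp.Queue()
    workers = [mp.Process(target=worker, args=(queue,)) for _ in range(4)]
    for w in workers:
        w.start()
    # a producer would queue.put() input arrays here, then queue.put(None)
    # once per worker to let them exit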