Griffon applications - memory usage - Swing

I have a question about Griffon.
Is there a way to decrease the memory consumption of Griffon applications?
A sample Griffon application process (just a single window with a
label) takes about 80 MB on Windows. Is there a way to change something to
visibly decrease this baseline memory usage?
Griffon is a great solution, but my customer complains that such a simple
application uses that much memory (more than e.g. Word or Outlook, and
comparable to most full-blown Java applications).

A barebones Griffon app (created by calling create-app and nothing more) reports 49M of memory usage; in terms of file size it's a bit above 7M. A Java-based Griffon app (griffon create-app sample --file-type=java) comes in at 42M of memory usage, with the same file size.
This is of course using the default settings provided by the run-app command. Further memory configuration settings may be applied to limit and streamline resource consumption.
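
The biggest lever is the JVM itself: the reported number includes the whole JVM plus its default heap and permanent-generation sizing, not just the application. As a hedged starting point (standard HotSpot flags; exactly where to put them depends on the Griffon version, typically the JAVA_OPTS environment variable read by the launch scripts or the build configuration), something like

    -client -Xms16m -Xmx64m -XX:MaxPermSize=96m

caps the heap and perm gen well below the defaults. Lowering -Xmx alone won't make the Task Manager figure vanish (a good chunk of the footprint is the JVM and the loaded classes themselves), but it stops the heap from growing far beyond what a small Swing app actually needs.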

Related

Using CUDA GPUs at prediction time for high throughput streams

We're trying to develop a Natural Language Processing application that has a user-facing component. The user can call models through an API and get the results back.
The models are pretrained using Keras with Theano. We use GPUs to speed up training, and prediction is also sped up significantly by using the GPU. Currently, we have a machine with two GPUs. However, at runtime (i.e. when running the user-facing bits) there is a problem: multiple Python processes sharing the GPUs via CUDA do not seem to offer a parallelism speed-up.
We're using nvidia-docker with libgpuarray (pygpu), Theano and Keras.
The GPUs are still mostly idle, but adding more Python workers does not speed up the process.
What is the preferred way of solving the problem of running GPU models behind an API? Ideally we'd utilize the existing GPUs more efficiently before buying new ones.
I imagine we want some sort of buffer that collects requests before sending them off to the GPU, rather than acquiring a lock for each HTTP call?
This is not an answer to your more general question, but rather an answer based on how I understand the scenario you described.
If someone has coded a system which uses a GPU for some computational task, they have (hopefully) taken the time to parallelize its execution so as to benefit from the full resources the GPU can offer, or something close to that.
That means that if you add a second similar task - even in parallel - the total amount of time to complete them should be similar to the amount of time to complete them serially, i.e. one after the other, since there are very few underutilized GPU resources for the second task to benefit from. In fact, it could even be the case that both tasks end up slower (if, say, they both make heavy use of the L2 cache and thrash it when running together).
At any rate, when you want to improve performance, a good thing to do is profile your application, in this case using the nvprof profiler or its nvvp visual frontend.
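
That said, the "buffer" idea from the question is the usual way to raise utilization without adding more processes: a single thread (or process) owns the GPU model and drains a queue of incoming requests into one batched predict call at a time, so the HTTP workers never touch CUDA directly. A minimal sketch in Python with hypothetical names; it assumes a Keras-style model whose predict() accepts a stacked batch:

    # Sketch: one GPU-owning worker batches queued requests into single predict calls.
    # Hypothetical setup; load_model() and the model's predict() are assumptions.
    import queue
    import threading
    import numpy as np

    request_queue = queue.Queue()      # holds (input_array, result_holder) pairs
    MAX_BATCH = 32
    MAX_WAIT_S = 0.01                  # flush a partial batch after 10 ms

    def gpu_worker(model):
        while True:
            items = [request_queue.get()]          # block for the first request
            try:
                while len(items) < MAX_BATCH:      # then gather a batch, briefly
                    items.append(request_queue.get(timeout=MAX_WAIT_S))
            except queue.Empty:
                pass                               # flush whatever we collected
            batch = np.stack([inp for inp, _ in items])
            preds = model.predict(batch)           # one GPU call for the whole batch
            for (_, holder), pred in zip(items, preds):
                holder["result"] = pred
                holder["done"].set()

    def predict_one(x):
        """Called from each HTTP handler; blocks until its batch comes back."""
        holder = {"done": threading.Event()}
        request_queue.put((x, holder))
        holder["done"].wait()
        return holder["result"]

    # threading.Thread(target=gpu_worker, args=(load_model(),), daemon=True).start()

Because each predict() call now covers a whole batch, the GPU spends its time on large kernels instead of many tiny ones, which is usually where the missing speed-up comes from.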

Serving caffe models from GPU - Achieving parallelism

I am looking for options to serve parallel predictions from a Caffe model on the GPU. Since the GPU comes with limited memory, what options are available to achieve parallelism while loading the net only once?
I have successfully wrapped my segmentation net with tornado wsgi + flask. But at the end of the day, this is essentially equivalent to serving from a single process. https://github.com/BVLC/caffe/blob/master/examples/web_demo/app.py.
Is having my own copy of the net for each process a strict requirement, given that the net is read-only after training is done? Is it possible to rely on fork for parallelism?
I am working on a sample app which serves results from the segmentation model. It utilizes copy-on-write, loading the net in the master once and serving memory references to the forked children. I am having trouble starting this setup in a web server setting: I get a memory error when I try to initialize the model. The web server I am using here is uwsgi.
Has anyone achieved parallelism in the serving layer while loading the net only once (since GPU memory is limited)? I would be grateful if anyone could point me in the right direction.
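
Not an answer from Caffe experience specifically, but the copy-on-write idea above can be sketched with plain multiprocessing (load_net and run_net below are hypothetical placeholders). One caveat: this only works cleanly while the net lives in ordinary host memory; CUDA contexts generally do not survive fork, which is a common source of exactly this kind of initialization error, so a GPU-mode net usually has to be created after the fork in a single designated process.

    # Sketch: load a read-only net once in the parent; forked workers reuse the
    # same physical pages via copy-on-write. load_net()/run_net() are placeholders.
    import multiprocessing as mp

    net = None  # populated in the parent before forking

    def load_net():
        # stand-in for building the net (CPU mode) from the trained weights
        return {"weights": "large read-only blob"}

    def run_net(model, item):
        # stand-in for a forward pass
        return "prediction for %s" % item

    def worker(item):
        # `net` was inherited from the parent; as long as workers only read it,
        # the OS never copies the underlying pages.
        return run_net(net, item)

    if __name__ == "__main__":
        net = load_net()                 # load once, before any fork happens
        ctx = mp.get_context("fork")     # fork start method (default on Linux)
        with ctx.Pool(processes=4) as pool:
            print(pool.map(worker, range(8)))

Under uwsgi or gunicorn the equivalent is loading the net at module import time in the master (gunicorn's --preload, or uwsgi without --lazy-apps); if the net needs the GPU, that import-time initialization is the part to move into a single post-fork worker instead.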

How to reduce memory usage when creating Windows Metro applications?

I created a Metro-style app for Windows 8, but while using the app its memory usage keeps going up; not a single byte is released after it starts running. I think this is because memory is not being handled properly. Does anybody have suggestions for handling memory effectively?
I think Windows RT handles memory on its own ... so there is no manual garbage collection or releasing.
I have a pretty memory intensive app and I find that the following really impacts memory:
lots of UI elements in XAML files
large predefined arrays (e.g., string[,] arr = new string[100,100])
not using incremental loading and ItemsControl
If you provide more info about what your app contains/does, the community can help you a bit more.

Windows Phone - Background Task - Broken at DataContractJsonSerializer.WriteObject

I am using a Background Task in Windows Phone Mango. I need to send data to the server in JSON format, but when the DataContractJsonSerializer.WriteObject function is executed, nothing happens thereafter.
Has anyone experienced the same with a Background Task in Windows Phone Mango?
It is possible that the operation is taking your app over the 6MB memory limit, and the phone is killing it.
You can run with the debugger attached: http://msdn.microsoft.com/en-us/library/microsoft.phone.scheduler.scheduledactionservice.launchfortest(v=vs.92).aspx
This will let you see what is happening. Also consider logging the amount of memory your app is using to see if you are approaching the limit: http://msdn.microsoft.com/en-us/library/microsoft.phone.info.devicestatus(v=vs.92).aspx
Be careful calling any type of serialization library (or any other library for that matter) as it will very quickly bump your memory usage over the 6MB limit, which will silently kill your agent with no errors.
Also note that on a real device your agent will typically start with 4-4.5 meg used already, significantly higher than on the emulator. That means all your code and the libraries it calls need to use less than 1.5 meg in a worst-case scenario.

In-memory function calls

What are in-memory function calls? Could someone please point me to some resource discussing this technique and its advantages? I need to learn more about them and at the moment do not know where to go. Google does not seem to help, as it takes me to the domain of cognition and the nervous system, etc.
Assuming your explanatory comment is correct (I'd have to see the original source of your question to know for sure), it's probably a matter of either (a) function binding times or (b) demand paging.
Function Binding
When a program starts, the linker/loader finds all function references in the executable file that aren't resolvable within the file. It searches all the linked libraries to find the missing functions, and then iterates. At least the Linux ld.so(8) linker/loader supports two modes of operation: with LD_BIND_NOW, all symbol references are resolved at program start-up. This is excellent for finding errors, and it means there's no penalty for the first use of a function versus repeated use of a function, but it can drastically increase application load time. Without LD_BIND_NOW, functions are resolved as they are needed. This is great for small programs that link against huge libraries, as only the few functions actually needed get resolved, but for larger programs it might require re-loading libraries from disk over and over during the lifetime of the program, and that can drastically influence response time while the application is running.
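
To see the two modes from the outside: the dynamic linker honors the LD_BIND_NOW environment variable per process. A small Unix-only sketch (it assumes /usr/bin/python3 is available as a conveniently library-heavy binary; the measured difference will be small and noisy for anything this tiny):

    # Sketch (Unix-only): forcing eager binding for a child process via LD_BIND_NOW
    # and timing the start-up difference against the default lazy binding.
    import os
    import subprocess
    import time

    def timed_run(extra_env):
        env = {**os.environ, **extra_env}
        start = time.perf_counter()
        subprocess.run(["/usr/bin/python3", "-c", "pass"], env=env, check=True)
        return time.perf_counter() - start

    print("lazy binding (default):       ", timed_run({}))
    print("eager binding (LD_BIND_NOW=1):", timed_run({"LD_BIND_NOW": "1"}))

For a program that links against large libraries but calls only a few of their functions, the gap between the two modes grows accordingly.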
Demand Paging
Modern operating system kernels juggle more virtual memory than physical memory. Each application thinks it has access to an entire machine's worth of memory: 4 gigabytes for 32-bit applications, or much, much more for 64-bit applications, regardless of the actual amount of physical memory installed in the machine. Each page of memory needs a backing store, drive space that will be used to store that page if it must be shoved out of physical memory under memory pressure. If it is purely data, then it gets stored in a swap partition or swap file. If it is executable code, then it is simply dropped, because it can be reloaded from the file in the future if needed.

Note that this doesn't happen on a function-by-function basis; instead, it happens on pages, which are a hardware-dependent feature. Think 4096 bytes on most 32-bit platforms, perhaps more or less on other architectures, and upwards of 2 or 4 megabytes with huge pages. If there is a reference to a missing page, the memory management unit signals a page fault, and the kernel loads the missing page from disk and restarts the process.
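
Demand paging can be watched from user space: mapping a file reserves addresses but no physical pages, and the minor-fault counter climbs only as pages are actually touched. A Unix-only sketch (fault accounting details vary by OS):

    # Sketch (Unix-only): pages of a mapped file are faulted in only when touched.
    import mmap
    import resource
    import tempfile

    PAGE = mmap.PAGESIZE                       # typically 4096 bytes

    def minor_faults():
        return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

    with tempfile.TemporaryFile() as f:
        f.truncate(1024 * PAGE)                # a 1024-page sparse file, unread so far
        m = mmap.mmap(f.fileno(), 0)           # mapping reserves addresses, not RAM

        before = minor_faults()
        for page in range(0, 1024, 2):         # touch every other page...
            _ = m[page * PAGE]                 # ...each first access is a page fault
        print("minor faults while touching pages:", minor_faults() - before)
        m.close()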