Why does Flash Builder 4.6 Profiler seem to leak Strings, whereas Debug mode GC's as expected

Why does Flash Builder 4.6 Profiler seem to leak Strings, whereas Debug mode GC's as expected - actionscript-3

While unit profiling my classes I noticed that the String class endlessly accumulates (eating up over 90% of the memory in my sizable app). Luckily this is only while running in Profiler mode of Flash Builder 4.6. In debug or deployment (as AIR) memory usage levels off as expected using embedded on-screen memory profilier (Mr Doobs Stats).
To verify I made a test app that was simply a URLLoader continuously loading a text file. When running in Profilier mode using URLLoaderDataFormat.String the String data is never GC'd and grows continuously whereas using URLLoaderDataFormat.BINARY the data is nearly immediately GC'd and stays level.
I hesitate to call this a bug, because possibly it's necessary part of the way the Profilier works… but perhaps this is abnormal for the Profiler? This is the essence of my StackOverflow inquiry.
At any rate, this burned up a couple work-days for me, so if you're Googling wondering why the String class is growing like crazy and never getting GC'd consider measuring your apps memory usage outside the Profilier to verify. In my case I was mislead into thinking I had run into some problem with Master Strings — though it's good understand Master Strings and their impact on memory (see:) don't get mislead like I did.

Related

External FLASH slow verification with STM32cubeProgrammer

I'm working with an STM32F469 chip with a Micron MT25Q Quad_SPI Flash. To program the Flash, there needs to be an external loader program developed. That's all working, but the problem is that verification of the QSPI Flash is extremely slow.
Looking in the log file, it shows that the Flash is being programmed in 150K byte blocks. However, the verification is being done in 1K byte blocks. In addition, the chip is re-initialized before each block check. I've tried this with both through STM32cubeIDE and in STM32cubeProgrammer directly.
The external programmer program include the correct chip configuration information and specifies a 64K page size. I don't see how to get the programmer to use a larger block size. It looks like it understands what part of the SRAM is used and is using the balance of the 256K in the on-board SRAM for programming the QSPI Flash. It could use the same size for reading the data back or use the Verify() function in the external loader. It's calling Read() and then checking the data itself.
Any thought or hints?
Let me add some observations on creating a new external loader. The first observation is "Don't." If you can pick a supported external chip and pin it out to use an existing loader, then do that. STM provides just 4 example programs but they must have 50 external loaders. If the hardware design copies the schematic for a demo board that has an external loader, you should be fine and avoid doing the development work.
The external loader is not a complete executable. It provides a set of functions to do basic operations like Init(), Erase(), Read() and Write(). The trick is that there is no main() and no start-up code is run when the program starts.
The external loader is an ELF file, renamed to "*.stldr". The programming tool looks into the debug information to find the location of the functions. It then sets the registers to provide the parameters, the PC to run the function, and then let's it run. There's some super-clever work going on to make this work. The programmer looks at the returned value (R0) to see if things pass or not. It can also figure out if the function has crashed the core or otherwise timed-out.
What makes writing the external super fun is that the debugger is running the program so there's no debugger available to see what the code is doing. I settled on outputting errors, and encoded information, on the return() from the called functions to give hints as to what was happening.
The external loader isn't a "full" program. Without the startup code, lots of on-chip stuff isn't set up and some just isn't going to work. At least I couldn't figure it out. I'm not sure if it wasn't configured right or the debugger was blocking its use. Looking at the example external loaders, they are written in a very simple way and do not call the HAL or use interrupts. You'll need to provide core set-up functions to configure the clock chains. That Hal_Delay() method will never return as the timers and/or interrupts aren't working. I could never make them work and suspect the NVIC was somehow being disabled. I ended up replacing the HAL_delay() function with a for loop that spun based on core clock rate and a the instruction cycles per loop.
The app note suggests developing a stand-alone program to debug the basic capabilities. That's a good idea but a challenge. Prior to starting the external loader, I had the QSPI doing the needed operations but from a C++ application calling the HAL. Creating an external loader from that was a long exercise in stripping out and replacing functionality. A hint is that the examples are written at a register level. I'm not that good to deal directly with the QuadSPI peripheral and the chip's instruction set at the same time.
The normal start-up of a program is eliminated. Everything that's done before the main() is called (E.g., in startup_stm32f469nihx.s) is up to you. This includes setting the clock chains to boost the core clock and get the peripheral buses working. The program runs in the on-chip SRAM so any initialized variables are loaded correctly. There's no moving data needed but the stack and uninitialized data areas could/should still be zeroed.
I hope this helps someone!

Today I've faced the same issue.
I was able to improve the Verify speed with two simple steps, but Verify still much slower than programming and this is strange...
If anyone find a way to change the 1KB block read of STM32CubeProgrammer I would like to know =).
Follow the changes I made to improve a bit the performance.
Add a kind of lock in the Init Function to avoid multiple initializations. This was the most significant change because I'am checking the Flash ID in my initialization proccess. Other aproachs could be safer but this simple code snippet worked to me.
int Init(void)
{
static uint32_t lock;
if(lock != 0x43213CA5)
{
lock = 0x43213CA5;
/* Init procedure goes here */
}
return(1);
}
Cache a page instead of reading the external memory for each call. This will help more if your external memory page read has too much overhead, otherwise this idea won't give relevant results.

Chrome memory measurement now almost flat for longer test runs

In order to check our web application for memory leaks, I run a machine which does the following:
it runs automated End-to-End tests over (almost) the entire application in Chrome
after each block of tests, it goes to a state of the web application where almost nothing happens
it triggers gc(); for garbage collection
it saves totalJSHeapSize, and usedJSHeapSize to a file
it plots out the results for each test run to a graph
That way, we can see how much the memory increases and which are the problematic parts of our application: At some point the memory increases, at some point it decreases.
Till yesterday, it looked like this:
Bright red (upper line): totalJSHeapSize, light red (lower line): usedJSHeapSize
Yesterday, I updated Chrome to version 69. And now the chart looks quite different:
The start and end amount of memory used (usedJSHeapSize) is almost the same. But as you can clearly see, the way it changes over the course of the test (ca. 1,5h) is quite different.
My questions are now:
Is this a change in reality or in measurement? I.e. did Chrome change its memory handling? Or just the way it puts out memory values via totalJSHeapSize, and usedJSHeapSize?
Concerning memory leaks, is it good news or bad news for me? Like: Before I had dozens of spots where memory increases, now I have just three. Is this true? Or are the memory leaks in the now flat areas still there and hidden?
I'm also thankful for any background information on how Chrome changed its memory measurement.
Some additional info:
The VM runs under KUbuntu 18.04
It's a single web page application done with AngularJS 1.6
The outcome of the memory measurement is quite stable - both before and after the update of Chrome
EDIT:
It seems this was a bug of Chrome version 69. At least, with an update to Chrome 70, this strange behavior is gone and everything looks almost as before.

I don't think you should be worry about it. This can happen due to the memory manager used inside the chrome. You didn't mentioned the version of your first memory graph, possibility that the memory manager used between these two version is different. Chrome was using the TCMalloc which take the large chunk of memory from the OS and manage it, once the memory shortage happenned with TCMalloc then it ask again a big chunk of memory from OS and start managing it. So the later graph what you are seeing have less up and downs (but bigger then previous one) due to that. Hope it answered your query.
As you mentioned that
The outcome of the memory measurement is quite stable - both before and after the update of Chrome
You don't need to really worry about it, the way previously chrome was allocating memory and how it does with new version is different(possible different memory manager) that's it.

Do I Have a Memory Leak / Garbage Collection Bug?

I'm building an Air application, which, due to the workflow of the animators, needs to load a lot of external swf files.
I load them all via FileStream / Loader Objects at the boot process and then store them in an object for further use.
The instant they get loaded, I use a gotoAndStop(1) command to make them stop looping (the original files have no scripts whatsoever).
After the load process I can see the system memory go slowly but consistently up.
When I manually invoke Garbage Collection with the System.gc() command,
the memory gets cleared again.
If I let the application run for hours it seems the Garbage Collector does not run.
Any ideas what the problem might be? Or should I just forget about it and just occasionally run System.gc() manually?
Thanks a lot!

The Garbage Collector will only run when it needs to, so it is entierly possible that it would go a very long time before running (especially if you have a lot of RAM available).
The important thing is the memory is cleared when it is run. This tells me that you don't have a leak, as it can be cleared up.
Also, how are you measuring the system memory? If you are doing it through Task Manager those numbers aren't really to be relied upon.
I recommend Process Explorer . Monitor the 'Private Bytes' column instead.

Windows Phone - Background Task - Broken at DataContractJsonSerializer.WriteObject

I am using Background Task in Windows Phone Mango. I need to send data to server using JSON format. But when DataContractJsonSerializer.WriteObject function is executed, nothing happens thereafter.
Has anyone experienced the same with Background Task in Windows Phone Mango?

It is possible that the operation is taking your app over the 6MB memory limit, and the phone is killing it.
You can run with the debugger attached: http://msdn.microsoft.com/en-us/library/microsoft.phone.scheduler.scheduledactionservice.launchfortest(v=vs.92).aspx
This will let you see what is happening. Also consider logging the amount of memory your app is using to see if you are approaching the limit: http://msdn.microsoft.com/en-us/library/microsoft.phone.info.devicestatus(v=vs.92).aspx

Be careful calling any type of serialization library (or any other library for that matter) as it will very quickly bump your memory usage over the 6MB limit, which will silently kill your agent with no errors.
Also note that on a real device your agent will typically start with 4-4.5 meg used already, significantly higher than on the emulator. That means all your code and the libraries it calls need to use less than 1.5 meg in a worst-case scenario.

How to determine why a task destroys , VxWorks?

I have a VxWorks application running on ARM uC.
First let me summarize the application;
Application consists of a 3rd party stack and a gateway application.
We have implemented an operating system abstraction layer to support OS in-dependency.
The underlying stack has its own memory management&control facility which holds memory blocks in a doubly linked list.
For instance ; we don't directly perform malloc/new , free/delege .Instead we call OSA layer's routines and it gets the memory from OS and puts it in a list then returns this memory to application.(routines : XXAlloc , XXFree,XXReAlloc)
And when freeing the memory we again use XXFree.
In fact this block is a struct which has
-magic numbers indication the beginning and end of memory block
-size that user requested allocated
-size in reality due to alignment issue previous and next pointers
-pointer to piece of memory given back to application. link register that shows where in the application xxAlloc is called.
With this block structure stack can check if a block is corrupted or not.
Also we have pthread library which is ported from Linux that we use to
-create/terminate threads(currently there are 22 threads)
-synchronization objects(events,mutexes..)
There is main task called by taskSpawn and later this task created other threads.
this was a description of application and its VxWorks interface.
The problem is :
one of tasks suddenly gets destroyed by VxWorks giving no information about what's wrong.
I also have a jtag debugger and it hits the VxWorks taskDestoy() routine but call stack doesn't give any information neither PC or r14.
I'm suspicious of specific routine in code where huge xxAlloc is done but problem occurs
very sporadic giving no clue that I can map it to source code.
I think OS detects and exception and performs its handling silently.
any help would be great
regards

It resolved.
I did an isolated test. Allocated 20MB with malloc and memset with 0x55 and stopped thread of my application.
And I wrote another thread which checks my 20MB if any data else than 0x55 is written.
And quess what!! some other thread which belongs other components in CPU (someone else developed them) write my allocated space.
Thanks 4 your help

If your task exits, taskDestroy() is called. If you are suspicious of huge xxAlloc, verify that the allocation code is not calling exit() when memory is exhausted. I've been bitten by this behavior in a third party OSAL before.

Sounds like you are debugging after integration; this can be a hell of a job.
I suggest breaking the problem into smaller pieces.
Process
1) you can get more insight by instrumenting the code and/or using VxWorks intrumentation (depending on which version). This allows you to get more visibility in what happens. Be sure to log everything to a file, so you move back in time from the point where the task ends. Instrumentation is a worthwile investment as it will be handy in more occasions. Interesting hooks in VxWorks: Taskhooklib
2) memory allocation/deallocation is very fundamental functionality. It would be my first candidate for thorough (unit) testing in a well-defined multi-thread environment. If you have done this and no errors are found, I'd first start to look why the tas has ended.
other possible causes
A task will also end when the work is done.. so it may be a return caused by a not-so-endless loop. Especially if it is always the same task, this would be my guess.
And some versions of VxWorks have MMU support which must be considered.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008