Browser Memory consumption / leak issues - google-chrome

I need help understanding this testing process.
Our Quality Assurance (QA) team is using Performance Monitor (from Microsoft) to test browser memory consumption and leaks.
Steps the QA team follows:
Open the web browser and log in to our webapp.
Note down the initial virtual bytes from the tool (shown in the screenshot).
Perform some operation (let's say a search) a couple of times.
Note down the final virtual bytes from the tool.
Calculate the difference between the last and first virtual bytes allocated (after converting virtual bytes to MB).
Divide this difference by the total number of clicks performed by the user.
Note down the result (the per-click growth in MB).
Now, this result should be less than 1. (This threshold is decided by them.)
If it's greater than 1, they say our webapp has memory leaks.
For Firefox and Chrome, this figure is less than 1 for us, but for IE 10 and 11 (both 32- and 64-bit) it is more than 1.
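To make the arithmetic concrete (with made-up numbers, since I don't have QA's actual readings): if the tool shows 800 MB of virtual bytes after login and 850 MB after 60 clicks, the figure is (850 - 800) / 60 ≈ 0.83 MB per click, which would pass their threshold of 1.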
Questions:
Is this some standard practice they are following?
How correct is their analysis process?
How can I convince them otherwise, if their analysis is not right?
How should I go about fixing this problem?
P.S. I'm not able to get more information from our QA team.
P.S. We use AngularJS for the client.

Note down the initial virtual bytes from the tool (shown in the screenshot).
Virtual bytes are nearly meaningless on 64-bit systems because large chunks of address space can be reserved ahead of time without actually being backed by RAM or swap. Of course the amount is somewhat correlated with actual memory use, but it's just that: "somewhat".
Calculate the difference between the last and first virtual bytes allocated (after converting virtual bytes to MB).
This calculation can be meaningless for a different reason. Browsers use complex memory management systems (custom allocators and garbage collectors) which may not immediately release memory back to the operating system after they have used it. This means that for some amount of time their memory usage may appear only to grow, never shrink, even when you close tabs.
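As a concrete illustration of that allocator behavior, here is a minimal C sketch (Linux-specific, since it reads /proc/self/status; the exact numbers depend on your allocator):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Print this process's resident set size as reported by Linux. */
    static void print_rss(const char *label) {
        char line[256];
        FILE *f = fopen("/proc/self/status", "r");
        if (!f) return;
        while (fgets(line, sizeof line, f))
            if (strncmp(line, "VmRSS:", 6) == 0)
                printf("%s %s", label, line);
        fclose(f);
    }

    int main(void) {
        enum { N = 100000, SZ = 4096 };
        static char *blocks[N];           /* static: keep the stack small */

        print_rss("before alloc:");
        for (int i = 0; i < N; i++) {     /* allocate ~400 MB in small chunks */
            blocks[i] = malloc(SZ);
            if (!blocks[i]) return 1;
            memset(blocks[i], 1, SZ);     /* touch it so it's really backed */
        }
        print_rss("after alloc: ");
        for (int i = 0; i < N; i++)
            free(blocks[i]);
        /* VmRSS often stays high here: the allocator keeps freed chunks on
           its own free lists rather than returning them to the OS at once. */
        print_rss("after free:  ");
        return 0;
    }

Watching an external tool (like Performance Monitor) during the "after free" phase shows exactly what QA is measuring: memory the program has logically released but the process still holds.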
How should I go about fixing this problem?
Use the built-in memory tracking tools of the browsers, e.g. about:memory in Firefox.

Related

Does the position of a function/method in a program matter in terms of increasing/decreasing speed at a lower level (memory)?

Let's say I write a program which contains many functions/methods. In this program, some functions are used many more times than others.
In this case, does the positioning of a function/method matter in terms of altering speed at a lower level (memory)?
As I am currently learning Computer Organization & Architecture, this doubt came to mind.
RAM itself is "flat", with equal performance at any address (except for NUMA local vs. remote memory in a multi-socket machine, or mixed-size DIMMs on a single socket leading to only partial dual-channel benefits; see footnote 1).
i-cache and iTLB locality can make a difference, so grouping "hot" functions together can be useful even if you don't just inline them.
Locality also matters for demand paging of code in from disk: If a whole block of your executable is "cold", e.g. only needed for error handling, program startup doesn't have to wait for it to get page-faulted in from disk (or even soft page faults if it was hot in the OS's pagecache). Similarly, grouping "startup" code into a page can allow the OS to drop that "clean" page later when it's no longer needed, freeing up physical memory for more caching.
Compilers like GCC do this, putting CRT startup code like _start (which eventually calls main) into a .init section in the same program segment (mapped by the program loader) as .text and .fini, just to group startup code together. Any C++ non-const static-initializer functions would also go in that section.
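As a sketch of how you can give the compiler that kind of grouping hint yourself (the hot/cold attributes are GCC/Clang extensions; the subsection names .text.hot/.text.unlikely are what GCC typically uses, not a guarantee):

    #include <stdio.h>
    #include <stdlib.h>

    /* Hot functions are optimized harder and grouped together (often in a
       .text.hot subsection); cold ones are pushed aside (.text.unlikely)
       so the frequently-executed pages stay dense in i-cache and iTLB. */
    __attribute__((hot)) static int fast_path(int x) {
        return x * 2 + 1;                    /* called constantly */
    }

    __attribute__((cold)) static void fatal(const char *msg) {
        fprintf(stderr, "fatal: %s\n", msg); /* almost never executed */
        exit(1);
    }

    int main(void) {
        int v = fast_path(20);
        if (v != 41)
            fatal("unexpected result");
        printf("%d\n", v);
        return 0;
    }

Profile-guided optimization (-fprofile-generate / -fprofile-use) achieves the same grouping automatically from real run data.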
Footnote 1: Usually; IIRC it's possible for a computer with one 4 GB and one 8 GB stick of memory to run dual-channel for the first 8 GB of physical address space, but only single-channel for the last 4 GB, so half the memory bandwidth. I think some real-life Intel chipsets/CPU memory controllers are like that.
But unless you were making an embedded system, you don't choose where in physical memory the OS loads your program. It's also much more normal for computers to use matched memory on multi-channel memory controllers so the whole range of memory can be interleaved between channels.
BTW, locality matters for DRAM itself: it's laid out in a row/column setup, and switching rows takes an extra DDR controller command vs. just reading another column in the same open "page". DRAM pages aren't the same thing as virtual-memory pages; a DRAM page is memory in the same row on the same channel, and is often 2 KiB. See What Every Programmer Should Know About Memory? for more details than you'll probably ever want about DDR DRAM, and some really good stuff about cache and memory layout.

What's the difference between "memory" and "memory footprint" fields on Chrome's task manager?

I'm using Chrome 64 and noticed that there are two fields called "memory" in Chrome's task manager. See the picture below:
I can't find any explanation of the difference between these fields on Chrome, there's no tooltips available (at least not on macOS). The "memory footprint" field seems to be new, because I don't recall seeing it before yesterday.
In Chrome, the Memory column represents Shared Memory + Private Memory. If you enable those two columns and add the numbers, you will find they match the Memory column. In your computer's task manager or activity monitor you can see that these values match the Shared Memory Size and Private Memory Size.
The Memory Footprint column matches the number of MB reported for the Memory column of the process within the Task Manager or Activity Monitor.
Real Memory in a Mac's Activity Monitor maps to the RSS (Resident Set Size) in Unix. The link below explains this.
https://forums.macrumors.com/threads/memory-vs-real-memory.1749505/#post-19295944
The Memory column in a Mac's Activity Monitor roughly correlates to the Private Memory Size, though it seems to come out slightly smaller. This column will match the Memory Footprint column in Chrome.
Please note that this answer references Mac because that's what I'm currently using. The column names and answer would change slightly for Linux and Windows system monitor and task manager.
As Josh pointed out, it reports "Private Memory Footprint" as described in "consistent memory metrics".
Disclaimer: I'm writing this answer as I do some testing and observation because I had this question myself and this is the only relevant result I found through a Google search. Here goes...
I'm comparing the processes in Chrome's task manager with those in Sysinternals' Process Explorer (for Windows). In doing so, I see that the "Memory footprint" in Chrome is identical to the "Private Bytes" shown in Process Explorer for every process ID.
Private Bytes is the size of memory that a process has allocated to it (but not necessarily actively using) that cannot be shared with other processes.
So in line with what Josh and Patrick answered, the memory footprint represents memory reserved entirely for that process.
Unfortunately, I can't come to a conclusion on what "Memory" represents specifically. I would expect it to be equivalent to the "Working Set", but that doesn't match up with what Process Explorer shows.
Things also get a little muddier... If you right-click on the column headers in Chrome's task manager, you'll see there's another column available titled "Private memory". If you enable it, you'll see the numbers match the "Memory" column very closely, but not exactly (off by 200K at most). :| This is a confusing title, given that we have already confirmed the "Memory footprint" to represent the private memory footprint.
I don't know what the minuscule difference between "Memory" and "Private memory" is here, but I speculate that one or both columns represent the private memory allocated to the process that is actively in use (in contrast to the private-bytes definition I gave above). Or it could be an old calculation that they kept in there for some reason. I really am just guessing here.
Sorry I could not be of more help, but since there seems to be no answer to this out there, I wanted to share what I could figure out and hopefully spur the conversation a bit so someone more knowledgeable can add to it.

Slow text input in html field from barcode scanner

I have a webpage on my LAN used to input barcodes in real time into a db through a field (framework: Django + PostgreSQL + nginx). It works fine, but lately we have a customer that uses 72-char barcodes (Code Matrix) which slow down input, because before the next scan the user must wait for the last one to finish redrawing in the field (it takes about 1-2 seconds, drawing one character after the other).
Is there a way to reduce latency of drawing scanned text in the html field?
Ideally, the whole scanned barcode would be shown at once, not one character after the other. The scanner is set to append an "Enter" after the scanned text.
In the end, as Brad stated, the problem is related more to the scanner's settings (USB in HID mode), although PC speed is also an issue. After several tests on a dual-core Linux machine, I estimate the delay is due 85% to the scanner and 15% to the PC/browser combo.
To solve the problem I first searched for and downloaded the complete manual of our 2D barcode scanner (306 pages), then focused on USB Keystroke Delay as a cause, but the default setting was already 'No Delay'.
The setting that affected reading speed was USB Polling Interval, an option that applies only to the USB HID Keyboard Emulation Device.
The polling interval determines the rate at which data can be sent between the scanner and the host computer. A lower number indicates a faster data rate: the default was 8 ms, which I lowered to 3 ms without problems. Lower values weren't any faster, probably because the threshold where the PC becomes the bottleneck had been reached.
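To sanity-check the arithmetic (roughly, assuming the usual HID keyboard behavior of one report per key-down and one per key-up): 72 characters x 2 reports x 8 ms ≈ 1.15 s per scan, which matches the observed 1-2 second delay; at 3 ms the same scan takes about 0.43 s.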
CAUTION: Ensure your host machine can handle the selected data rate; selecting a data rate that is too fast for your host machine may result in lost data. In my case, when I lowered the polling interval to 1 ms there was no data loss on the working PC, but when testing inside a virtual machine there was data loss as soon as I reached 6 ms.
Another interesting thing is that browsers tend to respond significantly more slowly after a long period of use with many tabs open (a couple of hours in my case), probably due to caching.
Tests were done with Firefox and Chromium browsers on an old dual-core PC running Lubuntu (Linux).
This probably has nothing to do with your page, but rather with the speed of the scanner interface. Most of these scanners intentionally rate-limit their input so as not to overflow the computer's buffer, avoiding dropped characters. Think about it... when you copy/paste text, it doesn't take a long time to redraw the characters; everything appears instantly.
Most of those scanners are configurable. Check to see if there is an option on your scanner to increase its character rate.
On Honeywell and many other brands of scanner, the USB Keystroke Interval is marked as an INTERCHARACTER DELAY.
Also, if there is a baud rate setting, that would be something to increase.

In-memory function calls

What are in-memory function calls? Could someone please point me to some resources discussing this technique and its advantages? I need to learn more about them and at the moment do not know where to look. Google does not seem to help, as it takes me to the domain of cognition and nervous systems, etc.
Assuming your explanatory comment is correct (I'd have to see the original source of your question to know for sure...), it's probably a matter of either (a) function binding times or (b) demand paging.
Function Binding
When a program starts, the linker/loader finds all function references in the executable file that aren't resolvable within the file. It searches all the linked libraries to find the missing functions, and then iterates. The Linux ld.so(8) linker/loader supports (at least) two modes of operation: LD_BIND_NOW forces all symbol references to be resolved at program startup. This is excellent for finding errors, and it means there's no penalty for the first use of a function vs. repeated uses, but it can drastically increase application load time. Without LD_BIND_NOW, functions are resolved as they are needed. This is great for small programs that link against huge libraries, as only the few functions actually needed get resolved; but for larger programs, it might require re-loading libraries from disk over and over during the lifetime of the program, and that can noticeably affect response time while the application is running.
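The same eager-vs-lazy choice is exposed programmatically through dlopen(3); here is a minimal Linux sketch (libm.so.6 is assumed present; link with -ldl on older glibc):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void) {
        /* RTLD_NOW: resolve all of the library's function references up
           front (the dlopen analogue of LD_BIND_NOW); RTLD_LAZY: leave
           them to be resolved as each one is first called. */
        void *lib = dlopen("libm.so.6", RTLD_LAZY);
        if (!lib) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }
        /* dlsym itself always resolves the named symbol immediately. */
        double (*cosine)(double) = (double (*)(double))dlsym(lib, "cos");
        if (cosine)
            printf("cos(0) = %f\n", cosine(0.0));
        dlclose(lib);
        return 0;
    }

You can also force eager binding for any dynamically linked program from the outside by running it as LD_BIND_NOW=1 ./prog.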
Demand Paging
Modern operating system kernels juggle more virtual memory than physical memory. Each application thinks it has access to an entire machine's worth of memory: 4 gigabytes for 32-bit applications, or much, much more for 64-bit applications, regardless of the actual amount of physical memory installed in the machine. Each page of memory needs a backing store, the drive space that will be used to store that page if it must be shoved out of physical memory under memory pressure. If it is purely data, then it gets stored in a swap partition or swap file. If it is executable code, then it is simply dropped, because it can be reloaded from the file in the future if needed. Note that this doesn't happen on a function-by-function basis; instead, it happens on pages, which are a hardware-dependent feature. Think 4096 bytes on most 32-bit platforms, perhaps more or less on other architectures, and, with special frameworks, upwards of 2 or 4 megabytes. If there is a reference to a missing page, the memory management unit signals a page fault, and the kernel loads the missing page from disk and restarts the process.
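Here is a minimal Linux sketch of watching demand paging from user space, using the soft-fault counter from getrusage(2):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/resource.h>
    #include <unistd.h>

    static long soft_faults(void) {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;    /* faults served without disk I/O */
    }

    int main(void) {
        long page = sysconf(_SC_PAGESIZE);
        size_t len = 1024 * (size_t)page;

        /* Reserve 1024 pages of address space; nothing is backed yet. */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;

        long before = soft_faults();
        memset(p, 1, len);      /* first touch faults each page in */
        long after = soft_faults();
        printf("soft faults while touching: %ld (~1 per %ld-byte page)\n",
               after - before, page);
        munmap(p, len);
        return 0;
    }

Touching a page of a file-backed mapping that isn't in the OS page cache would instead show up in ru_majflt, the hard (disk-reading) fault counter.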

What are the advantages of memory-mapped files?

I've been researching memory mapped files for a project and would appreciate any thoughts from people who have either used them before, or decided against using them, and why?
In particular, I am concerned about the following, in order of importance:
concurrency
random access
performance
ease of use
portability
I think the advantage is really that you reduce the amount of data copying required over traditional methods of reading a file.
If your application can use the data "in place" in a memory-mapped file, it can come in without being copied; if you use a system call (e.g. Linux's pread()), that typically involves the kernel copying the data from its own buffers into user space. This extra copying not only takes time, but also decreases the effectiveness of the CPU's caches by accessing an extra copy of the data.
If the data actually have to be read from the disk (as in physical I/O), then the OS still has to read them in, and a page fault probably isn't any better performance-wise than a system call; but if they don't (i.e. they're already in the OS cache), performance should in theory be much better.
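A sketch of the two approaches side by side (Linux; data.bin is just a placeholder name for an existing, non-empty file):

    #define _POSIX_C_SOURCE 200809L
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("data.bin", O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0) return 1;
        size_t len = (size_t)st.st_size;

        /* pread(): the kernel copies from the page cache into our buffer. */
        char *buf = malloc(len);
        if (!buf || pread(fd, buf, len, 0) != (ssize_t)len) return 1;

        /* mmap(): our pages *are* the page cache; no extra copy is made. */
        char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) return 1;

        printf("first byte via pread: %d, via mmap: %d\n", buf[0], map[0]);
        munmap(map, len);
        free(buf);
        close(fd);
        return 0;
    }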
On the downside, there's no asynchronous interface to memory-mapped files: if you attempt to access a page which isn't mapped in, it generates a page fault and then makes the thread wait for the I/O.
The obvious disadvantage to memory mapped files is on a 32-bit OS - you can easily run out of address space.
I have used a memory mapped file to implement an 'auto complete' feature while the user is typing. I have well over 1 million product part numbers stored in a single index file. The file has some typical header information but the bulk of the file is a giant array of fixed size records sorted on the key field.
At runtime the file is memory mapped, cast to a C-style struct array, and we do a binary search to find matching part numbers as the user types. Only a few memory pages of the file are actually read from disk -- whichever pages are hit during the binary search.
Concurrency - I had an implementation problem where the file would sometimes get memory-mapped multiple times in the same process space. This was a problem, as I recall, because sometimes the system couldn't find a large enough free block of virtual memory to map the file into. The solution was to map the file only once and thunk all calls to it. In retrospect, using a full-blown Windows service would have been cool.
Random Access - The binary search is certainly random access, and lightning fast.
Performance - The lookup is extremely fast. As the user types, a popup window displays a list of matching product part numbers, and the list shrinks as they continue to type. There is no noticeable lag while typing.
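A minimal sketch of that layout (the record format and file name are invented for illustration, not the poster's actual code):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Hypothetical fixed-size record; the file is sorted on part_number
       and both fields are assumed NUL-terminated. */
    struct part {
        char part_number[32];
        char description[96];
    };

    static int cmp(const void *key, const void *rec) {
        return strcmp((const char *)key,
                      ((const struct part *)rec)->part_number);
    }

    int main(int argc, char **argv) {
        if (argc != 2) return 1;
        int fd = open("parts.idx", O_RDONLY);   /* hypothetical index file */
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) return 1;

        /* Map the file and treat it as a C array of records. Only the
           pages the binary search actually touches are read from disk. */
        struct part *parts = mmap(NULL, st.st_size, PROT_READ,
                                  MAP_PRIVATE, fd, 0);
        if (parts == MAP_FAILED) return 1;

        size_t n = (size_t)st.st_size / sizeof(struct part);
        struct part *hit = bsearch(argv[1], parts, n,
                                   sizeof(struct part), cmp);
        if (hit)
            printf("%s: %s\n", argv[1], hit->description);
        else
            printf("%s: not found\n", argv[1]);

        munmap(parts, st.st_size);
        close(fd);
        return 0;
    }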
Memory mapped files can be used to either replace read/write access, or to support concurrent sharing. When you use them for one mechanism, you get the other as well.
Rather than lseeking and writing and reading around in a file, you map it into memory and simply access the bits where you expect them to be.
This can be very handy, and depending on the virtual memory interface can improve performance. The performance improvement can occur because the operating system now gets to manage this former "file I/O" along with all your other programmatic memory access, and can (in theory) leverage the paging algorithms and so forth that it is already using to support virtual memory for the rest of your program. It does, however, depend on the quality of your underlying virtual memory system. Anecdotes I have heard say that the Solaris and *BSD virtual memory systems may show better performance improvements than the VM system of Linux--but I have no empirical data to back this up. YMMV.
Concurrency comes into the picture when you consider the possibility of multiple processes using the same "file" through mapped memory. In the read/write model, if two processes wrote to the same area of the file, you could be pretty much assured that one process's data would arrive in the file, overwriting the other's. You'd get one, or the other -- but not some weird intermingling. I have to admit I am not sure whether this behavior is mandated by any standard, but it is something you could pretty much rely on. (It's actually a good follow-up question!)
In the mapped world, in contrast, imagine two processes both "writing". They do so by doing "memory stores", which result in the O/S paging the data out to disk--eventually. But in the meantime, overlapping writes can be expected to occur.
Here's an example. Say I have two processes both writing 8 bytes at offset 1024. Process 1 is writing '11111111' and process 2 is writing '22222222'. If they use file I/O, then you can imagine, deep down in the O/S, there is a buffer full of 1s and a buffer full of 2s, both headed to the same place on disk. One of them is going to get there first, and the other one second. In this case, the second one wins. However, if I am using the memory-mapped-file approach, process 1 is going to do a memory store of 4 bytes, followed by another memory store of 4 bytes (let's assume that's the maximum memory-store size). Process 2 will be doing the same thing. Based on when the processes run, you can expect to see any of the following:
11111111
22222222
11112222
22221111
The solution to this is to use explicit mutual exclusion--which is probably a good idea in any event. You were sort of relying on the O/S to do "the right thing" in the read/write file I/O case, anyway.
The classic mutual exclusion primitive is the mutex. For memory-mapped files, I'd suggest you look at a mutex stored in the mapped region itself, set up using (e.g.) pthread_mutex_init() with the process-shared attribute.
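A minimal sketch of such a mutex living inside the shared mapping itself (POSIX; link with -lpthread; an anonymous mapping stands in for the real file-backed one):

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/mman.h>

    /* Layout of the shared region: the lock lives next to the data. */
    struct shared {
        pthread_mutex_t lock;
        char data[1024];
    };

    int main(void) {
        struct shared *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (s == MAP_FAILED) return 1;

        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        /* The crucial bit: mark the mutex as usable across processes. */
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&s->lock, &attr);

        pthread_mutex_lock(&s->lock);     /* e.g. around the 8-byte write */
        snprintf(s->data, sizeof s->data, "written under the lock");
        pthread_mutex_unlock(&s->lock);

        printf("%s\n", s->data);
        return 0;
    }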
Edit, with one gotcha: when you are using mapped files, there is a temptation to embed pointers to the data in the file within the file itself (think of a linked list stored in the mapped file). You don't want to do that, as the file may be mapped at different absolute addresses at different times, or in different processes. Instead, use offsets within the mapped file.
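A minimal sketch of the offset idiom (the node layout is invented, and a static buffer stands in for the real mapping):

    #include <stdint.h>
    #include <stdio.h>

    /* Store offsets from the start of the mapping, never raw pointers:
       the base address can differ between runs and between processes. */
    struct node {
        int32_t  value;
        uint64_t next_off;               /* 0 means "end of list" */
    };

    static struct node *at(void *base, uint64_t off) {
        return off ? (struct node *)((char *)base + off) : NULL;
    }

    int main(void) {
        static uint64_t region[8];       /* 64 bytes, 8-byte aligned */
        void *base = region;

        /* One node at offset 16 linking to another at offset 32. */
        struct node *a = at(base, 16), *b = at(base, 32);
        a->value = 1; a->next_off = 32;
        b->value = 2; b->next_off = 0;

        for (struct node *n = at(base, 16); n; n = at(base, n->next_off))
            printf("%d\n", n->value);    /* prints 1 then 2 */
        return 0;
    }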
Concurrency would be an issue.
Random access is easier
Performance is good to great.
Ease of use. Not as good.
Portability - not so hot.
I've used them on a Sun system a long time ago, and those are my thoughts.