Chrome DevTools first memory heap snapshot is mysteriously large

I'm using the Profiles tab in the Chrome developer tools to record memory heap snapshots. My app has a memory leak, so I'm expecting the snapshots to gradually increase in size, which they do. But for reasons I don't understand, the first snapshot is always artificially large... creating a seemingly deceptive drop in memory between the first and second. All subsequent snapshots gradually increase as expected.
I know there is often extra memory used at the beginning of a page load, due to caching and other setup. But the same thing happens no matter when I take the first snapshot. It could be 30 seconds after the page is loaded or 30 minutes. Same pattern. My only guess is that the profiler tool itself is interacting with the memory somehow, but that seems like a stretch.
Any ideas what's going on here?

Right before a memory snapshot is taken, Chrome tries to collect the garbage. It doesn't collect it thoroughly, though; it only does a predefined number of passes (this magic number seems to be 7). Therefore, when the first snapshot is taken there might still be some uncollected garbage left.
Before taking the first snapshot, try going to the "Timeline" tab and forcing garbage collection manually.
From what I've tested, this always reduces the size of the first snapshot.
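Alternatively, you can trigger collection from the page itself: a minimal sketch, assuming Chrome was launched with the V8 flag --js-flags="--expose-gc" (which makes a global gc() available):

    // Assumes Chrome was started with:  google-chrome --js-flags="--expose-gc"
    function collectGarbageBeforeSnapshot() {
      if (typeof window.gc === 'function') {
        // A few passes give the collector a chance to reclaim lingering garbage.
        for (let i = 0; i < 3; i++) {
          window.gc();
        }
      } else {
        console.warn('gc() is not exposed; relaunch Chrome with --js-flags="--expose-gc"');
      }
    }
    // Call collectGarbageBeforeSnapshot() from the console, then take the snapshot.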

Related

How should I best troubleshoot a slow memory buildup in a Chrome background tab?

My React app does not have a memory leak; it has remained open for weeks with steady memory consumption.
However, when left in the background, or if I lock my Windows session for a couple of days, the memory builds up quite a bit.
It gets back to normal quickly when I bring it back to the foreground, though.
I strongly suspect some rendering work is being queued up, but I'm not sure how to pin down the exact cause.
How can I analyze the issue in a structured way?
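A starting point might be to sample Chrome's non-standard performance.memory counters together with the tab's visibility state, so that heap growth can later be correlated with background periods; a rough sketch:

    // Chrome-only: performance.memory is non-standard, but fine for trend logging.
    const samples = [];
    setInterval(() => {
      if (performance.memory) {
        samples.push({
          time: new Date().toISOString(),
          visibility: document.visibilityState,   // 'visible' or 'hidden'
          usedJSHeapSize: performance.memory.usedJSHeapSize,
          totalJSHeapSize: performance.memory.totalJSHeapSize,
        });
      }
    }, 60 * 1000); // Chrome throttles timers in background tabs, so samples may be sparser

    // When the tab comes back to the foreground, dump the log from the console:
    //   copy(JSON.stringify(samples, null, 2))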

Chrome DevTools makes tab memory steadily increase until it crashes

I am using Chrome to run a page and it runs fine for hours on end; I can attest that no memory problems arise when it's running by itself. Whenever I use DevTools to debug something, however, the tab's memory footprint just keeps increasing (as if there were a memory leak) to the point where Chrome kills the tab for hitting its memory limit.
Some things I could figure out: it happens if and only if the DevTools panel is open (regardless of which panels or options are active), and it happens on pretty much any website (websites that load more into memory tend to bloat faster). I also found that, if the DevTools panel is closed, the memory usage sharply drops back to normal (something like from 900 MB to 200 MB, which is what it would be without DevTools), and reopening it makes the tab start to bloat again from scratch. I tried forcing the garbage collector from the Performance tab, but that didn't do anything either. I don't know what causes this, and I couldn't find anything that refers to this issue.
Any help in this matter is appreciated.

Chrome memory measurement now almost flat for longer test runs

In order to check our web application for memory leaks, I run a machine which does the following:
it runs automated End-to-End tests over (almost) the entire application in Chrome
after each block of tests, it goes to a state of the web application where almost nothing happens
it triggers gc() to force garbage collection
it saves totalJSHeapSize, and usedJSHeapSize to a file
it plots out the results for each test run to a graph
That way, we can see how much the memory increases and which parts of our application are problematic: at some points the memory increases, at others it decreases.
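The harness itself isn't shown here, but the measure-and-log step looks roughly like this sketch (using Puppeteer and placeholder names purely for illustration):

    // Illustrative only - the real harness and URLs are not shown in the question.
    const fs = require('fs');
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch({
        args: ['--js-flags=--expose-gc'],           // makes gc() callable in the page
      });
      const page = await browser.newPage();
      await page.goto('https://example.com/app');   // placeholder for the application

      // ... run one block of end-to-end tests, then navigate to the idle state ...

      const sample = await page.evaluate(() => {
        gc();                                       // force garbage collection
        return {
          totalJSHeapSize: performance.memory.totalJSHeapSize,
          usedJSHeapSize: performance.memory.usedJSHeapSize,
        };
      });
      fs.appendFileSync('memory.log', JSON.stringify(sample) + '\n');

      await browser.close();
    })();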
Till yesterday, it looked like this:
Bright red (upper line): totalJSHeapSize, light red (lower line): usedJSHeapSize
Yesterday, I updated Chrome to version 69. And now the chart looks quite different:
The start and end amounts of memory used (usedJSHeapSize) are almost the same. But as you can clearly see, the way it changes over the course of the test (approx. 1.5 h) is quite different.
My questions are now:
Is this a change in reality or in measurement? I.e. did Chrome change its memory handling? Or just the way it puts out memory values via totalJSHeapSize, and usedJSHeapSize?
Concerning memory leaks, is it good news or bad news for me? Before, I had dozens of spots where the memory increased; now I have just three. Is this true? Or are the memory leaks in the now-flat areas still there, just hidden?
I'm also thankful for any background information on how Chrome changed its memory measurement.
Some additional info:
The VM runs under Kubuntu 18.04
It's a single web page application done with AngularJS 1.6
The outcome of the memory measurement is quite stable - both before and after the update of Chrome
EDIT:
It seems this was a bug in Chrome version 69. At least, with an update to Chrome 70, this strange behavior is gone and everything looks almost as it did before.
I don't think you should worry about it. This can happen because of the memory manager used inside Chrome. You didn't mention which version produced your first memory graph, so it's possible that a different memory manager is used between the two versions. Chrome was using TCMalloc, which takes a large chunk of memory from the OS and manages it; once TCMalloc runs short, it asks the OS for another big chunk and starts managing that as well. So the later graph you are seeing has fewer ups and downs (but bigger ones) than the previous one because of that. I hope that answers your query.
As you mentioned:
The outcome of the memory measurement is quite stable - both before and after the update of Chrome
So you don't really need to worry about it; the way Chrome allocated memory previously and the way it does so in the new version are simply different (possibly a different memory manager), that's all.

Chrome garbage collector going crazy

Using Version 48.0.2564.109 m.
We have a javascript web app (built with ExtJS). In Chrome, when we leave our app sitting there for a while, the GC starts going nuts. In Task Manager, you can see the CPU constantly spinning around 25%.
I took timeline snapshots and CPU profiles, and you can see the GC trying, about 10 times a second, to collect memory, but collecting 0 B.
Our app is a large enterprise application and does use quite a bit of memory and updates the screen periodically.
But there is absolutely no JavaScript code running during this time, so I can't see that it is something our app is actively doing.
Does anyone know what could be triggering this?
It is killing performance of our app.
Also, it only happens when our tab is active. If you switch to a different tab, the CPU dies down and the GC stops.
Is there other data I need to collect to help determine this?
What is your app's current JS heap size? You can check it by recording a timeline with the memory checkbox enabled.
It looks like your app is close to the V8 memory limit, so V8 is trying to free some memory. If it is expected for the app to use that much memory, you can increase the limit on your host with something like: --js-flags="--max-old-space-size=2048"
Otherwise it might be just a memory leak in your code. Use heap profiler to hunt it down.
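A quick way to see how close the app is to the limit, without recording a timeline, is a console check against Chrome's non-standard performance.memory (rough sketch):

    // Paste into the DevTools console of the affected tab (Chrome only).
    const m = performance.memory;
    console.log('used:  ' + (m.usedJSHeapSize  / 1048576).toFixed(1) + ' MB');
    console.log('total: ' + (m.totalJSHeapSize / 1048576).toFixed(1) + ' MB');
    console.log('limit: ' + (m.jsHeapSizeLimit / 1048576).toFixed(1) + ' MB');

    // To raise the limit, Chrome itself has to be launched with the flag, e.g.:
    //   google-chrome --js-flags="--max-old-space-size=2048"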

Associative cache simulation - Dealing with a Faulty Scheme

While working on simulating a fully associative cache (in MIPS assembly), a couple of questions came to mind, based on some information I read online.
According to some notes from the University of Maryland
Finding a slot: At most, one slot should match. If there is more than one slot that matches, then you have a faulty fully-associative cache scheme. You should never have more than one copy of the cache line in any slot of a fully-associative cache. It's hard to maintain multiple copies, and doesn't make sense. The slots could be used for other cache lines.
Does that mean I should check the whole tag list every time in order to look for a second match? After all, if I don't, I will never "realize" there is a fault with the cache; yet checking every single time seems quite inefficient.
In the case where I do check, and somehow manage to find a second match, meaning a faulty cache scheme, what should I do then? The best answer would be to fix my implementation, but I'm interested in how to handle it during execution if this situation should arise.
If more than one valid slot matches an address, then that means that when a previous search for the same address was executed, either a valid slot that should have matched the address was not used (perhaps because it was not checked in the first place), or more than one invalid slot was used to store a line that wasn't in the cache at all.
Without a doubt, this should be considered a bug.
But if we've just decided not to fix the bug (maybe we'd rather not commit that much hardware to a better implementation) the most obvious option is to pick one of the slots to invalidate. It will then be available for other cache lines.
As for how to pick which one to invalidate: if one of the duplicate lines is clean, invalidate that one in preference to a dirty cache line. If more than one cache line is dirty and they disagree, you have an even bigger bug to fix, but at any rate your cache is out of sync and it probably doesn't matter which you pick.
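A tiny JavaScript sketch of that selection policy (the field names are invented for illustration; the question itself is about a MIPS simulation):

    // Each slot: { valid, dirty, tag, data }.  Given every slot whose tag matched,
    // decide which duplicate to invalidate: prefer a clean copy over a dirty one.
    function pickDuplicateToInvalidate(matchingSlots) {
      if (matchingSlots.length < 2) return null;    // no duplicates, nothing to do
      const clean = matchingSlots.find(s => !s.dirty);
      if (clean) return clean;                      // invalidate the clean copy
      // Several dirty copies that may disagree: a bigger bug, but the cache is
      // already out of sync, so it barely matters which one we drop.
      return matchingSlots[0];
    }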
Edit: here's how I might implement hardware to do this:
First off, it doesn't make a whole lot of sense to start with the assumption of duplicates; rather, we'll work around that at the appropriate time later. There are a few possibilities for what must happen when caching a new line:
The line is already in the cache: no action is needed.
The line is not in the cache, but there are invalid slots available: place the new line into one of the available slots.
The line is not in the cache and there are no invalid slots available: another valid line must be evicted and the new line takes its place.
Picking an eviction candidate has performance consequences. Clean cache lines can be evicted for free, but if chosen poorly, eviction can cause another cache miss in the near future. Consider the case where all but one cache line is dirty: if the single clean cache line is always the one evicted, then many sequential reads alternating between two addresses will cause a cache miss on every read. Cache invalidation is one of the two hard problems in computer science (the other being "naming things") and is out of the scope of this exact question.
I would probably implement a search that checks for the correct slot to act on for each of these. Then another block would pick the first from that list and act on it.
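As a software analogue of that search, a JavaScript sketch (with hypothetical helpers chooseEvictionCandidate and writeBack) might look like:

    // One access to a fully associative cache, covering the three cases above.
    function access(cache, tag) {
      // Case 1: the line is already in the cache - no action needed.
      const hit = cache.slots.find(s => s.valid && s.tag === tag);
      if (hit) return hit;

      // Case 2: not in the cache, but an invalid slot is available.
      const free = cache.slots.find(s => !s.valid);
      if (free) {
        Object.assign(free, { valid: true, dirty: false, tag });
        return free;
      }

      // Case 3: no invalid slot - evict a valid line and take its place.
      const victim = chooseEvictionCandidate(cache.slots); // hypothetical policy helper
      if (victim.dirty) writeBack(victim);                 // hypothetical write-back
      Object.assign(victim, { valid: true, dirty: false, tag });
      return victim;
    }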
Now, getting back to the question: what are the conditions under which duplicates could possibly enter the cache? If memory accesses are strictly ordered, and the implementation (as above) is correct, I don't think duplicates are possible at all, and thus there's no need to check for them.
Now let's consider a more implausible case, where a single cache is shared across two CPU cores. We're going to do the simplest thing that could work and duplicate everything except the cache memory itself for each core; thus the slot-searching hardware is not shared. To support this, an extra bit per slot is used as a mutex: the search hardware cannot use a slot that is locked by the other core. Specifically:
If the address is in the cache, try to lock the slot and return that slot. If the slot is already locked, stall until it is free.
If the address is not in the cache, find an unlocked slot that is invalid or valid but evictable.
In this case we actually can end up in a position where two slots share the same address. If both cores try to write to an address that is not in the cache, they will end up getting different slots, and a duplicate line will occur. First, let's think about what could happen:
Both lines were reads from main memory. They will be the same value and they will both be clean. It is correct to evict either.
Both lines were writes. Both will be dirty, but probably not equal. This is a race condition that should have been resolved by the application by issuing memory fences or some other memory-ordering instructions. We cannot guess which one should be used; if there were no cache, the race condition would persist into RAM. It is correct to evict either.
One line was a read and one was a write. The written line is dirty but the read line is clean. Once again, this race condition would have persisted into RAM if there were no intervening cache, but the reader could have seen a different value. Evicting the clean line is right by RAM, and also has the side effect of always favoring read-then-write ordering.
So now we know what to do about it, but where does this logic belong? First, let's think about what could happen if we don't do anything: a subsequent cache access for the same address on either core could return either line. Even if neither core is issuing writes, reads could keep coming up different, alternating between the two values. This breaks every conceivable idea about memory ordering.
One solution might be to just say that dirty lines belong to one core only: to the other core, the line is not simply dirty, but dirty and owned by another core, and therefore invisible.
In the case of two concurrent reads, both lines are identical, unlocked and interchangeable. It doesn't matter which line a core gets for subsequent operations.
In the case of concurrent writes, both lines are out of sync, but mutually invisible. Although the race condition that this creates is unfortunate, it still leads to a reasonable memory ordering, as if all of the operations that happen on the discarded line happened before any of the operations on the cleaned line.
If a read and a write happen concurrently, the dirty line is invisible to the reading core. However, the clean line is visible to both cores, and would cause memory ordering to break down for the writer. Future writes could even cause it to lock both (because both would be dirty).
That last case pretty much dictates that dirty lines be preferred to clean ones. This forces at least some extra hardware to look for dirty lines first, and clean lines only if no dirty lines were found. So now we have a new concurrent cache implementation:
If the address is in the cache and is dirty and owned by the requesting core, use that slot.
If the address is in the cache but clean:
for reads, just use that slot;
for writes, mark the slot as dirty and use that slot.
If the address is not in the cache and there are invalid slots, use an invalid slot.
If there are no invalid slots, evict a line and use that slot.
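Spelled out as a JavaScript sketch (again with hypothetical names; the per-slot mutex bit is modelled here as a lockedBy field), the per-core lookup might read:

    // slots: { valid, dirty, owner, lockedBy, tag }; `core` identifies the requester.
    function concurrentAccess(cache, core, tag, isWrite) {
      // 1. Dirty and owned by the requesting core: use that slot.
      let slot = cache.slots.find(s =>
        s.valid && s.tag === tag && s.dirty && s.owner === core);
      if (slot) return slot;

      // 2. In the cache but clean (and not locked by the other core).
      slot = cache.slots.find(s =>
        s.valid && s.tag === tag && !s.dirty && s.lockedBy === null);
      if (slot) {
        if (isWrite) { slot.dirty = true; slot.owner = core; } // a write takes ownership
        return slot;
      }

      // 3. Not (visibly) in the cache: use an invalid, unlocked slot if available.
      slot = cache.slots.find(s => !s.valid && s.lockedBy === null);

      // 4. Otherwise evict a line and reuse its slot (eviction policy as discussed).
      if (!slot) slot = chooseEvictionCandidate(cache.slots, core); // hypothetical helper

      Object.assign(slot, { valid: true, tag, dirty: isWrite, owner: isWrite ? core : null });
      return slot;
    }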
We're getting closer, but there's still a hole in the implementation: what if both cores access the same address, but not concurrently? The simplest thing is probably to just say that dirty lines are really invisible to other cores: being in the cache but dirty is the same as not being in the cache at all.
Now all we have to think about is actually providing the tool for applications to synchronize. I'd probably provide a tool that just explicitly flushes a line if it is dirty. This would invoke the same hardware that is used during eviction, but mark the line as clean instead of invalid.
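Continuing the same hypothetical sketch, the explicit flush would be roughly:

    // Explicitly flush a dirty line so that it becomes visible to the other core.
    function flush(cache, core, tag) {
      const slot = cache.slots.find(s =>
        s.valid && s.tag === tag && s.dirty && s.owner === core);
      if (slot) {
        writeBack(slot);      // same hypothetical write-back path eviction would use
        slot.dirty = false;   // mark the line clean instead of invalid
        slot.owner = null;    // no longer owned by a single core
      }
    }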
To make a long post short, the idea is to deal with the duplicates not by removing them, but by making sure they cannot lead to further memory ordering issues, and leaving the deduplication work to the application or eventual eviction.