What are Zombies and what causes them? Are there Zombie processes and Zombie objects?

I can find questions about zombies but none that directly addresses what they are and why and how they occur. There are a couple that address what zombie processes are in the context of answering a specific question but don't address the cause.
There are also questions regarding zombie processes and questions about Objective-C/Cocoa-related zombie objects. What are the differences, or how are these related? Is an "EXC_BAD_ACCESS" on Mac/iPhone (or a similar error on other platforms) synonymous with a zombie?
How can one prevent zombies and are there any best practices that will help avoid them?
It would be helpful to have this information in one place. This question is intended to be platform/language agnostic, if possible.

Zombie processes and zombie objects are totally unrelated. A zombie process arises when a parent starts a child process and the child ends, but the parent hasn't yet picked up the child's exit code. The process object has to stay around until this happens - it is dead and consumes almost nothing beyond its entry in the process table, but it still exists - hence, 'zombie'.
Zombie objects are a debugging feature of Cocoa / CoreFoundation to help you catch memory errors. Normally, when an object's refcount drops to zero it is freed immediately, which makes use-after-free bugs hard to track down. Instead, if zombie objects are enabled (e.g. via NSZombieEnabled), the object's memory isn't instantly freed; it is just marked as a zombie, and any further attempt to use it is logged, so you can track down where in the code the object was used past its lifetime.
EXC_BAD_ACCESS is your run-of-the-mill "you used a bad pointer" exception, like if I did:
*(int *)0x42 = 5;
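By contrast, the kind of bug that zombie objects are meant to catch is an ordinary use-after-free. A hedged, plain C++ illustration (my own, nothing Cocoa-specific): without a zombie-style debugging aid, the following may crash with a bad-access error, print garbage, or appear to work, depending on what has reused the freed memory.

#include <iostream>
#include <string>

int main() {
    std::string *s = new std::string("still here?");
    delete s;                  // the object's lifetime ends here
    std::cout << *s << '\n';   // undefined behavior: the object is used after free
}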

When a process ends, much of its state still exists in the kernel, because its parent may still want to look at a few things, like its return value, which needs to be stored someplace. When the parent calls wait() or waitpid(), it tells the kernel to throw all of that away because it's done with it. Until it does so, the child retains a pid and uses up resources. Those un-reaped child processes are called zombies. Even killing a zombie won't remove it; it must be reaped (wait-ed-upon) by its parent. If the parent dies, zombies are passed to "init" on Unix systems, one of whose jobs is to wait on orphaned children and clean them up.
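A minimal sketch of that life cycle (my own illustration, POSIX-style code written as C++): the child exits, lingers as a zombie while the parent does other work, and disappears once the parent reaps it with waitpid().

#include <cstdio>
#include <unistd.h>       // fork, sleep
#include <sys/wait.h>     // waitpid, WEXITSTATUS

int main() {
    pid_t child = fork();
    if (child == 0) {
        return 42;                    // child exits immediately...
    }
    sleep(30);                        // ...and shows up as <defunct> in `ps` while
                                      // the parent dawdles: that's the zombie
    int status = 0;
    waitpid(child, &status, 0);       // the parent reaps the child...
    std::printf("child exited with %d\n", WEXITSTATUS(status));
    return 0;                         // ...and the zombie is gone
}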
I've never heard of "zombie objects", but I assume that it refers to things that have either not been cleaned up by the garbage collector, or that have circular references or some such thing, such that they are not going to be cleaned up by the garbage collector. The metaphor is pretty similar: fork()==malloc(), wait()==free() at a certain level. (Not a perfect metaphor, of course.)


How does a copying garbage collector ensure objects are not accessed while copied?

On collection, the garbage collector copies all live objects into another memory space, thus discarding all garbage objects in the process. A forward pointer to the copied object in new space is installed into the 'old' version of an object to ensure the collector updates all remaining references to the object correctly and doesn't erroneously copy the same object twice.
This obviously works quite well for stop-the-world-collectors. However, since pause times are long with stop-the-world, nowadays most garbage collectors allow the mutator threads to run concurrently with the collector, only stopping the mutators for a short time to do the initial stack scan.
So how can the collector ensure that the 'old' version of an object is not accessed by the mutator while/after copying it? I imagine the mutators could check for the forward pointer with some sort of read barrier; however, this seems too costly to me since variables are read so often.
The Loaded Value Barrier implemented in Azul's Generational Pauseless Garbage Collector is an example of a solution to this problem. You can read about it in the article The Azul Garbage Collector posted on InfoQ in early 2011.
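To make the idea concrete, here is a rough C++ sketch (my own illustration with made-up names, not Azul's actual code) of a self-healing loaded-value check: every reference load tests for a forwarding pointer and repairs the slot it loaded from, so the cost is paid roughly once per stale slot rather than on every read.

struct Object {
    Object *forward = nullptr;   // set once the object has been copied to to-space
    // ... payload ...
};

// Hypothetical read barrier, called on every reference load; 'slot' is the
// field the reference was loaded from.
Object *load_ref(Object **slot) {
    Object *obj = *slot;
    if (obj != nullptr && obj->forward != nullptr) {
        obj = obj->forward;      // follow the forwarding pointer to the new copy
        *slot = obj;             // self-heal the slot so later loads are cheap
    }
    return obj;
}

In a real concurrent collector the slot repair would be done with an atomic compare-and-swap, and the "is it forwarded?" test is usually folded into a cheap tag or address-range check; the sketch only shows the shape of the idea.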
You pretty much need to use a read barrier or a write barrier. You're apparently already aware of read barriers so I won't try to get into them.
Write barriers work because, as long as you prevent writes from happening, you simply don't care whether somebody accesses the old or the new copy of the data. You set the write barrier, copy the data, and then start adjusting pointers. After the copy is made, you don't really care whether somebody reads the old or the new copy, because the write barrier ensures they're identical. Once you're done adjusting pointers, everything will be working with the new data, so you revoke the write barrier.
There has been some work done with using page protection bits to mark an area of memory as read-only to create a write-barrier on fairly standard hardware. At least the last time I looked into it, however, this was still pretty much at a proof of concept stage -- working, but too slow to be very practical.
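As a rough, hedged illustration of that page-protection approach (my own C++ sketch, not taken from any real collector): a region is made read-only with mprotect(), the first store into it faults, and the SIGSEGV handler records the dirty page and re-enables writes so the store can be retried. A real collector does far more bookkeeping, and strictly speaking not everything done in a signal handler here is guaranteed async-signal-safe.

#include <csignal>
#include <cstdio>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

static char  *protected_page = nullptr;
static size_t page_size      = 0;

static void on_fault(int, siginfo_t *info, void *) {
    // A real GC would record this page as dirty and queue it for rescanning.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
    std::uintptr_t page = addr & ~(std::uintptr_t)(page_size - 1);
    mprotect(reinterpret_cast<void *>(page), page_size, PROT_READ | PROT_WRITE);
}   // returning retries the faulting store, which now succeeds

int main() {
    page_size = sysconf(_SC_PAGESIZE);
    protected_page = static_cast<char *>(mmap(nullptr, page_size,
        PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    struct sigaction sa = {};
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, nullptr);
    mprotect(protected_page, page_size, PROT_READ);   // the "write barrier" is armed
    protected_page[0] = 'x';                          // faults once, handler disarms it
    std::printf("wrote through the barrier: %c\n", protected_page[0]);
}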
Disclaimer: this is related to Java's Shenandoah GC only.
Your reasons are absolutely spot on for Shenandoah! Some details here for example.
Until fairly recently, Shenandoah had write and read barriers for all primitive and reference types. The read barrier was actually just a single indirection via the forwarding pointer, as you assume. Because reads are so much more numerous than writes, the read barriers were cumulatively more expensive than the write barriers (even though the write barrier itself was much more complicated), for the sole reason that there are so darn many of them.
But things changed in JDK 13, when a Load Reference Barrier was implemented. Now only loads have a barrier; writes happen the usual way. If you think about it, this makes perfect sense: in order to write to a field of an object, you need to read that object first, so if your barrier preserves the "to-space invariant", you will always read from the most recent and correct copy of the object, without needing a separate read barrier that chases a forwarding pointer on every access.

Why are RAII and garbage collection mutually exclusive?

While I think I understand the gist of the problem (i.e. a good GC tracks objects, not scope), I don't know enough about the subject to convince others.
Can you give me an explanation on why there are no garbage-collected languages with deterministic destructors?
They are NOT mutually exclusive. Feel free to use C++ with libgc (the Boehm-Demers-Weiser collector). You can still use RAII, smart pointers, and manual deletion, but with the GC running you can also just "forget" to delete some objects.
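For instance, a hedged sketch of what that mix can look like (assuming libgc is installed and the program is linked with -lgc; the names and details are only illustrative): stack objects still get deterministic RAII destruction, while GC_MALLOC'd memory is simply abandoned and reclaimed whenever the collector runs.

#include <gc.h>          // Boehm-Demers-Weiser collector; link with -lgc
#include <cstdio>
#include <fstream>

int main() {
    GC_INIT();
    {
        std::ofstream log("run.log");   // RAII: flushed and closed deterministically
        int *big = static_cast<int *>(GC_MALLOC(1000 * sizeof(int)));
        big[0] = 1;
        log << "first element: " << big[0] << "\n";
        // no delete/free for 'big': once unreachable, the collector reclaims it
    }                                   // 'log' is destroyed right here
    GC_gcollect();                      // optionally force a collection
    std::printf("heap size: %lu bytes\n", (unsigned long)GC_get_heap_size());
}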
@Andy's answer regarding resources being disposed of too late misses the important point: it isn't the delay in releasing resources which is crucial semantically, but rather the order of release.
The reason GC tends not to order release well is that it would require a topological sort on ordering requirements (dependencies) and that's an expensive algorithm.
Nevertheless, OCaml's GC has an interesting facility where you can attach a finaliser to an object. If the object becomes unreachable, the finaliser is run; however, the object is not deleted (because the finaliser could make it reachable again - in that case you can even attach another finaliser). These finalisers can provide some control over ordering.
From Wikipedia, after noting that tracing garbage collectors are the most common type:
Tracing garbage collection is not deterministic. An object which becomes eligible for garbage collection will usually be cleaned up eventually, but there is no guarantee when (or even if) that will happen.
Therefore, relying on RAII could lead to the resource being disposed of too late.
As a result, for example, Java has a guideline to "avoid finalizers" (Item 6 in "Effective Java" by Joshua Bloch): "Nothing time-critical should ever be done in a finalizer."
The garbage collector can't run all the time (refcounting gets closer, but generally doesn't count as garbage collection), so it doesn't even try; that would be plainly impractical. Therefore, there is an inevitable delay between an object becoming unreachable (e.g. because the only reference goes out of scope) and the GC collecting it and possibly firing a finalizer. This delay is not deterministic... unless you force the GC into a deterministic schedule - then deterministic destruction, in the strictest sense of the word, is possible, although still impractical - but that gets pretty close to "GC running all the time", which is incredibly impractical.
So GC and deterministic cleanup are mutually exclusive because the GC does all the cleanup, and it cannot afford to be deterministic; it must instead maximize its efficiency.
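To make the contrast concrete, a minimal C++ sketch (my own, with illustrative names) of the deterministic, ordered cleanup that RAII gives you and that a tracing GC's finalizers deliberately do not promise:

#include <cstdio>

struct Resource {
    const char *name;
    explicit Resource(const char *n) : name(n) { std::printf("acquire %s\n", name); }
    ~Resource()                                 { std::printf("release %s\n", name); }
};

int main() {
    Resource db("database connection");
    Resource tx("transaction");   // conceptually depends on the connection
    // destructors run here, in reverse order: 'tx' first, then 'db',
    // at a point that is known from the source code alone
    return 0;
}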

Tri-Color Incremental Updating GC: Does it need to scan each stack twice?

Let me give you a short introduction to a tri-color GC (in case somebody reads it who has never heard of it); if you don't care, skip it and jump to The Problem.
How a Tri-Color GC Works
In a tri-color GC an object has one of three possible colors: white, gray, and black. A tri-color GC can be described as follows:
1. All objects are initially white.
2. All objects reachable because a global variable or a stack variable refers to them ("the root objects") are colored gray.
3. We take any gray object, find all references it has to white objects, and color those white objects gray. Then we color the object itself black.
4. We continue at step 3 as long as we have gray objects.
5. If we have no gray objects any longer, all remaining objects are either white or black.
All black objects have been proven to be reachable and must stay alive. All white objects are unreachable and can be deleted.
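A hedged, single-threaded C++ sketch of steps 1-5 (my own illustration; the Obj type and the worklist are made up):

#include <vector>

enum class Color { White, Gray, Black };

struct Obj {
    Color color = Color::White;
    std::vector<Obj *> refs;                      // outgoing references of this object
};

void mark(const std::vector<Obj *> &roots) {
    std::vector<Obj *> worklist;                  // the current set of gray objects
    for (Obj *r : roots) {                        // step 2: roots are colored gray
        r->color = Color::Gray;
        worklist.push_back(r);
    }
    while (!worklist.empty()) {                   // steps 3 and 4
        Obj *obj = worklist.back();
        worklist.pop_back();
        for (Obj *child : obj->refs) {
            if (child->color == Color::White) {   // gray every white referent
                child->color = Color::Gray;
                worklist.push_back(child);
            }
        }
        obj->color = Color::Black;                // fully scanned, so black
    }
    // step 5: everything still white is unreachable and may be swept
}

int main() {
    Obj a, b, c;
    a.refs = { &b };          // a -> b; c is referenced by nothing
    mark({ &a });             // afterwards a and b are Black, c is still White (garbage)
}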
So far this is not too complicated… at least if the GC is StW (Stop the World), meaning it will pause all threads while collecting garbage. If it is concurrent, a tri-color GC has an invariant that must hold true at all times:
A black object must not refer to a white object!
This holds true automatically for a StW GC, since every object that is colored black has been examined previously and all white objects it was pointing to were colored gray, thus a black object may only refer to other black objects or gray objects.
If threads are not paused, they can execute code that would break this invariant. There are several ways to prevent this:
Capture all read accesses to pointers and check whether the object being read is white. If it is, color that object gray immediately. If a ref to this object is later assigned to a black object, it won't matter: the object is gray, not white, any longer (this implementation uses a read barrier).
Capture all write accesses to pointers and check whether the assigned object is white and the object it is being stored into is black. If so, color the white object gray. This is the more obvious way of doing things, but it also needs a bit more processing time per hit (this implementation uses a write barrier).
Since read accesses are much more common than write accesses, the second possibility is the favored one: even though it involves more processing time when the barrier is hit, the barrier is hit far less often. A GC working like that is called an "incremental updating GC".
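A hedged sketch of that variant-2 write barrier, reusing the Obj and Color types from the marking sketch above (again my own illustration, not any particular collector): every pointer store goes through a helper that restores the invariant before performing the actual store.

void write_ref(Obj *holder, Obj *&field, Obj *new_target,
               std::vector<Obj *> &worklist) {
    if (new_target != nullptr &&
        holder->color == Color::Black &&
        new_target->color == Color::White) {
        new_target->color = Color::Gray;          // restore the invariant:
        worklist.push_back(new_target);           // no black -> white edge survives
    }
    field = new_target;                           // the actual pointer store
}
// e.g. instead of `blackObj->refs[0] = whiteObj;` the mutator would call
// write_ref(blackObj, blackObj->refs[0], whiteObj, worklist);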
There is an alternative to both techniques, called SatB (Snapshot at the Beginning). This variation works slightly differently, exploiting the fact that it is not really necessary to uphold the invariant at all times: it does not matter if a black object refers to a white one, as long as the GC knows that this white object was, and still is, reachable during the current GC cycle (either because there are still gray objects referring to it as well, or because a ref to it is pushed onto an explicit stack that the GC also considers when it runs out of gray objects). SatB collectors are used more often in practice, because they have some advantages, but IMHO they are harder to implement.
I'm referring here to an incremental updating GC that uses variant 2: whenever the code tries to make a black object point to a white object, it immediately colors the white object gray. That way this object won't be missed in the collection cycle.
The Problem
So much about tri-color GCs. But there is one thing I don't understand about tri-color GCs. Let's assume we have an object A, that is referred to by the stack and itself refers to an object B.
stack -> A.ref -> B
Now the GC starts a cycle, halts the thread, scans the stack and sees A as directly accessible, coloring A gray. Once it is done with scanning the whole stack, it unpauses the thread again and starts processing at step (3). Before it starts doing anything, it is preempted (can happen) and the thread runs again and executes the following code:
localRef = A.ref; // localRef points to B
A.ref = NULL; // Now only the stack points to B
sleep(10000); // Sleep for the whole GC cycle
The invariant has not been violated: B was white but was never assigned to a black object, so its color has not changed; it is still white. A no longer refers to B, so while processing the "gray" A, B won't change its color and A will become black. At the end of the cycle, B is still white and looks like garbage. However, localRef is referring to B, thus it is not garbage.
The Question
Am I right that a tri-color GC must scan the stack of each thread twice? Once at the very beginning, to identify root objects (which get colored gray), and again before deleting the white objects, as those might be referenced by the stack even though no other object refers to them any longer. No description of the algorithm I've seen so far mentions anything about scanning the stack twice. They all only say that when used concurrently, it is important that the invariant is enforced at all times, otherwise reachable objects are missed. But as far as I can see, that is not enough. The stack must be treated like a single big object: once scanned, the "stack is black", and every ref update on the stack must cause the referenced object to be colored gray.
If that is really the case, using incremental updating may be more tricky than I initially thought and has some performance drawbacks, since stack changes are the most frequent ones of all.
A bit of terminology:
Let me give some names so that explanations are clearer.
A variable is any slot for data, which may contain a pointer and may change over time. This includes global variables, local variables, CPU registers and fields in allocated objects.
In a tricolor incremental or concurrent GC, there are three types of variables:
the true roots, which are always accessible (CPU registers, global variables);
the fast variables, which are scanned in a stop-the-world fashion;
the slow variables, which are handled with the colors. Slow variables are fields in colored objects.
The "true roots" and "fast variables" will be hereafter collectively called roots.
The application threads are called the mutators because they change the contents of variables.
With an incremental or concurrent GC, GC pauses occur regularly. The world is stopped (mutators are paused), and the roots are scanned. This scan reveals a number of references to colored objects. Object colors are adjusted accordingly (such white objects are made grey).
When the GC is incremental, some object scanning activity takes place: some grey objects are scanned (and painted black), greying referenced white objects. This activity (the "marking") is maintained for some time, but not necessarily as long as there are grey objects. At some point, the marking stops and the world is awakened. The GC is called "incremental" because the GC cycle is performed in small increments, interleaved with mutator activity.
In a concurrent GC, scanning of grey objects occurs concurrently with mutator activity. The world is then awakened as soon as the roots have been scanned. With a concurrent GC, access barriers are a tad complex to implement because they must handle concurrent access from the GC thread; but at a conceptual level, this is not very different from an incremental GC. A concurrent GC can be viewed as an optimization over incremental GC, which takes advantage of the presence of multiple CPU cores (a concurrent GC has little advantage over an incremental GC when there is only one core).
Roots need not be protected by an access barrier, since they are scanned with the world stopped. The GC mark phase ends when the following conditions are simultaneously met:
the roots have just been scanned;
all objects are either black or white, but not grey.
Since the roots are scanned only while the world is stopped, this situation can occur only during a pause. At that point, the sweep phase begins, during which white objects are released. The sweep can be done incrementally or concurrently; objects created during the sweep are immediately painted black. When the sweep is finished, a new GC mark phase can take place: objects (which are all black at that point) are all repainted white (this is done atomically by simply changing the way color bits are interpreted).
Variable Classification:
With that being said, I can now answer your question. With the description above, the question becomes: what are the roots? This is actually up to the implementation; there are several possibilities, and trade-offs.
True roots must always be scanned; true roots are the CPU register contents and the global variables. Note that the stacks are not true roots; only the current stack frame pointer is.
Since fast variables are accessed without barriers, it is customary to make stack frames fast variables (i.e. roots as well). This is because while write accesses are rare system-wide, they are quite common in the local variables. It has been measured (on some Lisp programs) that about 99% of writes (of a pointer value) have a local variable as target.
Fast variables are often extended even further, in the case of a generational GC: the "young generation" consists of a special allocation area for new objects, limited in length, and scanned as fast variables. The bright side of fast variables is fast access (hence the name); the downside is that all these fast variables may be scanned only during a pause (the world is stopped). There is a trade-off on the size of the fast variables, which often translates into a limit on the young generation size. A larger young generation promotes average performance (by reducing the number of access barriers) at the cost of longer pauses.
At the other extreme, you may have no fast variables at all, and no roots but the true roots. The stack frames are then handled as objects, each with their own color. Pauses are then minimal (a mere snapshot of a dozen registers) but barriers must be used even for access to local variables. This is expensive, but has some advantages:
Hard guarantees on pause times can be made. This is difficult if stack frames are roots, because each new thread has its own stack, so the roots total size may grow up to arbitrary amounts as new threads are launched. If only CPU registers and global variables (no more than a few dozens in typical cases, and their number is known at compilation time) are roots, then pauses can be kept very short.
This allows for dynamic allocation of stack frames in the heap. This is needed if you play with co-routines and continuations, as with Scheme's call/cc primitive. In such a case, frames are no longer handled as a pure "stack". Proper handling of continuations in a GC-aware language mostly requires that function frames be allocated dynamically.
It is possible to make stack frames non-root while keeping a young generation as root. Guarantees on pause times can still be made (depending on the young generation size, which is fixed) and some trickery can be applied to make sure that stack frames are in the young generation when their function is active. This can ensure barrier-free access to local variables. None of this is really free, but it can be made efficient enough for most purposes.
Another Conceptual View:
Another way to view root-handling is the following: roots are the variables for which the tricolor rule (no black-to-white pointer) is not maintained at all times; these variables are allowed to be mutated without constraint. But they must be brought back in line regularly, by stopping the world and scanning them.
In practice, the mutators are racing with the GC. Mutators create new objects and point to them; each pause creates new grey objects. In a concurrent or incremental GC, if you let the mutators play with roots for too long, then each pause may create a big batch of new grey objects. In the worst case, the GC cannot scan objects fast enough to keep up with the rate of grey-object creation. This is an issue because white objects can be released only during the sweep phase, which is reached only if the GC can at some point complete its marking. A usual implementation strategy for an incremental GC is to scan grey objects, during each pause, for a total size which is proportional to the total size of the roots. Thus, pause time remains bounded by the total size of the roots, and if the proportionality factor is well balanced then it can be guaranteed that the GC will ultimately terminate its marking phase and enter the sweeping phase.
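As a hedged sketch of that pacing strategy (hypothetical types and names, my own illustration, not taken from any collector): each pause spends a scanning budget proportional to the size of the roots, so the pause length stays bounded even when many objects are still grey.

#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

struct GcObject {                     // hypothetical layout, for illustration only
    bool black = false;
    std::size_t size = 16;            // bytes this object costs to scan
    std::vector<GcObject *> refs;
};

// One pause of an incremental collector: scan grey objects for a total size
// proportional (factor k) to the total size of the roots, then hand control
// back to the mutators.
void incremental_mark_step(std::deque<GcObject *> &greys,
                           std::size_t root_bytes, double k) {
    std::size_t budget = static_cast<std::size_t>(k * root_bytes);
    while (!greys.empty() && budget > 0) {
        GcObject *obj = greys.front();
        greys.pop_front();
        if (obj->black) continue;                        // may have been queued twice
        for (GcObject *child : obj->refs)
            if (!child->black) greys.push_back(child);   // grey the children
        obj->black = true;
        budget -= std::min(budget, obj->size);           // spend the scanning budget
    }
    // marking is complete only when 'greys' is empty at the end of a pause
}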
In a concurrent GC, things are a bit more complex, because the mutators roam freely in the wild. A possible implementation would make a little bit of incremental marking while the world is still stopped.
Bibliography:
Garbage Collection: Algorithms for Automatic Dynamic Memory Management: a must-read book on garbage collection.
Thomas obviously has the best answer. However, I'm just going to add a little side-note here.
The root nodes can be conceptually thought of as black nodes, because any object referred to by a root node must be grey or black.
Therefore, in order to maintain the invariant, assignment to a root variable should automatically grey the assigned object.

Why do garbage collectors freeze execution?

I was thinking about garbage collection on the way home, and I began wondering, why does the garbage collector totally freeze execution of a program? Personally I would have designed it to block any threads which try to allocate a new object, but threads which were running would be left alone.
I can't imagine any situation where this would be a problem compared to how a garbage collector currently works.
I was thinking about garbage collection on the way home, and I began wondering, why does the garbage collector totally freeze execution of a program?
There is a trade-off between latency and throughput in GC design. You can either process heap-allocated blocks individually ("incremental") or you can batch them up and process them all at the same time ("stop the world"). Fully incremental collection never totally freezes a program and it has very low latency but it also has very poor throughput. Stop the world garbage collectors have the worst possible latency (freezing the program for seconds or even minutes at a time) but near-optimal throughput.
All of the major production GCs today provide a middle ground, typically with generational collection with the per-thread nursery generations collected in batches and incremental or concurrent collection of the shared old generation. Thus, only nursery collections incur pauses and nursery size is bounded so pause times are kept low, e.g. 10-100ms in .NET with the workstation GC.
For a simple GC algorithm that never pauses, see Baker's Treadmill. For more information on garbage collection I highly recommend the Memory Management Reference and the Garbage Collection Handbook.
There is a lot of misinformation in the other answers here. Jon Skeet wrote some source code and started discussing it from the point of view of garbage collection. You need to be very careful doing this because there is little correspondence between source code and what the GC sees. The compiler does instruction block rearrangements, register allocation, promotion and so on, all of which affect what is visible to the GC at run time. In particular, scope in source code is not carried through to compiled code and is typically replaced with the related concept of liveness. Jon also wrote that you must pause in order to get the global roots. That is not strictly true although it is the most efficient way to get the global roots and the resulting pause is almost always tiny (sub-millisecond) because you're just copying less than a kB of stack from each thread.
Powerlord wrote that moving collectors must block reads and, therefore, all threads that read. This is also not true. The simplest counter example is immutable data: referential transparency means you can read from any copy safely.
Kico wrote that pauses are required to determine reachability. This is also not true. See Dijkstra's research about "on-the-fly" collectors and any recent real-time GC such as Staccato.
Jerry Coffin wrote the best answer, but moving isn't the reason GCs pause. There are GCs that don't move but do pause (e.g. HLVM's) and those that do move but don't pause (e.g. Staccato).
Modern garbage collectors (in .NET and Java, anyway) don't actually "stop the world" - they do all kinds of clever things to collect concurrently.
However, you might want to consider a situation like this:
object x = null;
object y = new object();
...        // suppose the GC examines x here and sees null
x = y;     // now the only reference to the object is x
y = null;  // when the GC later examines y, it sees null as well
Now, suppose the GC looks at x, then the lines below the ... run, and then the GC looks at y - it won't have seen any live objects... but the object should still be live.
Basically there needs to be a certain amount of pausing in order to get a consistent set of references. Then there's compaction, reference reassignment etc. However, it isn't nearly as bad as it used to be in terms of requiring everything to be stopped for the whole of the GC cycle. It does, however, get painful to think about :)
In addition to what Kico Lobo said, Garbage Collectors can also move things around in memory.
Therefore, they don't just have to block threads that write to memory, but also threads that read from memory.
Which is every thread.
Most GCs stop execution because objects can move in memory during a collection cycle (at least with most reasonably recent designs). That means either reading or writing almost any object at the wrong time can cause a problem.
There are collectors that have been designed around the idea of just blocking reads (or writes) to the specific parts of memory being modified at a given time, so as long as execution only uses objects that aren't (currently) being moved around, it can proceed unhindered. The problem is that most typical hardware doesn't provide efficient support for this, so even though they work in principle, they're fairly inefficient in practice. There has been at least one attempt at adapting that type of algorithm to use the write protection available in a typical paging unit, but I'm not aware of its having been used for much other than research and experimentation.
The primary alternative is to make the collector incremental -- i.e. have it do only a small amount of work at a time, so even though other execution gets stopped, it only has to stop for a little while at any given time.
With multi-core machines becoming so common, however, I'd expect to see more work put into garbage collection algorithms that can run in parallel with other execution. Up until recently, the primary emphasis was on minimizing the total time/effort spent on garbage collection. The growing number of cores available is likely to (often) mean that doing more total work in garbage collection may be easily justified, if doing so allows the mainstream of the code to run with fewer hindrances.
Edit: You might want to read Paul Wilson's Survey of Uniprocessor Garbage Collection Techniques. This isn't definitive (especially given its age), but it's at least a reasonable starting point.
Because that's the only way it can ensure that the references it is going to clean up are not being used by anyone else.
If it didn't freeze execution, it could not ensure that.

How many resources does an unremoved event listener consume?

Let's say I've got an event listener function that is listening for an event that will never be called again throughout the lifespan of the program. The listening object will never need to be garbage collected.
How much memory does this use?
If it's negligible, I'd rather not remove the listener because having the removeEventListener() statement makes my code less readable.
That depends entirely on how large and complex the listener is. In many cases, the memory impact is negligible, however, the object you're holding in memory may be keeping several other objects in memory. If one of them is a streaming video or something, it may be sucking on your memory, processor, and network.
You can also set useWeakReference to true when you first add the event listeners. This makes the link between the listener and the event dispatcher weak, so that the dispatcher doesn't keep the listener in memory if it's no longer referenced anywhere else. More on that here.
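That flag is ActionScript-specific, but the weak-reference idea is general. As a hedged analogy in C++ (purely illustrative, not the ActionScript mechanism), a dispatcher that holds its listeners through weak references does not keep them alive:

#include <iostream>
#include <memory>
#include <vector>

struct Listener {
    void onEvent() { std::cout << "handled\n"; }
};

struct Dispatcher {
    std::vector<std::weak_ptr<Listener>> listeners;   // weak: does not own the listeners
    void dispatch() {
        for (auto &w : listeners)
            if (auto l = w.lock()) l->onEvent();       // only still-alive listeners run
    }
};

int main() {
    Dispatcher d;
    auto l = std::make_shared<Listener>();
    d.listeners.push_back(l);   // roughly the effect of a weakly-referenced listener
    d.dispatch();               // prints "handled"
    l.reset();                  // the last strong reference goes away
    d.dispatch();               // prints nothing: the listener can be collected
}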
Still, it's never a good idea to leave objects in memory that are not going to be used again and there's no reason to avoid removeEventListener(). Striving for code readability before making it work correctly is never a good idea. If you're that concerned with the way your code looks, put the removeEventListener() calls inside a method called cleanupUnusedListeners() or something. Indeed, I would say that omitting it is less readable because when you're looking for the source of your memory leak, it will be harder to find the spot where you DIDN'T put a removeEventListener(). It may not be pretty but that's just the way it is, Jack.
It's negligible, unless you have thousands of them. Check out how EventDispatcher works and take a look at its source code.