Copy on Write (COW) - what happens after a page is modified?

I'm learning about the copy-on-write technique. I understand that the parent and child process share the same address space, and that when the parent or child wants to modify a page, that page is first copied into the process's private memory and then modified.
So my question is: assume the child process modifies a page, then completes and terminates. What happens to the modified data? Is it still there and visible to the parent process and to other child processes?
In short, if the child process modifies a page, what happens next for the parent and the other child processes with respect to that modified page/data?
I have read about the COW concept and understand its basic principles, but I'm not sure how deep my understanding goes.

In short - the parent does not have access to the child process's data. Neither do any of its siblings. The moment the child process terminates, all of its modifications are lost.
Remember, COW is just an optimization. From the processes' point of view, they don't even realize it is copy-on-write. From their perspective, each process has its own copy of the memory space.
Long answer, what happens behind the scenes:
*Note: I am simplifying some corner cases; not everything is 100% accurate, but this is the idea.
Each process has its own page table, which maps process virtual addresses to physical pages.
At some point, the parent process calls fork. At this step, a child process is created and its VMA descriptors are duplicated (there are certain rules on how that is done, with an intermediate chain etc.; I'm not going to deep-dive into this). What is important is that at this stage, child and parent virtual addresses point to the same physical pages.
Next, all of those pages are marked read-only in both page tables.
At this point, if either the child or the parent tries to write to such a page, it will cause a page fault. They can read freely, however.
Now assume the child writes to a page. This causes a page fault, which is caught by the kernel. The kernel recognizes that this is a COW page, so it creates a separate copy of the physical page for the child and maps the child's virtual address to that copy.
So at this point, the child and the parent have the same virtual address pointing to two different physical pages.
That should answer your question. The parent cannot access another process's physical pages. The virtual address is the same, but that does not matter. When the child dies, its pages are recycled, and all of its changes are lost.
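To make this concrete, here is a minimal C sketch of the point above (my own illustration, assuming a POSIX system, not part of the original answer). The child writes to a heap page inherited across fork() and exits; the parent then looks at the same virtual address and still sees the original contents:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* Heap page inherited by the child via fork(); initially shared copy-on-write. */
    char *buf = malloc(4096);
    strcpy(buf, "original");

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        /* Child: this write triggers the COW page fault; the kernel gives
           the child its own private copy of the page before the write lands. */
        strcpy(buf, "modified by child");
        printf("child sees:  %s\n", buf);
        _exit(0);
    }

    /* Parent: wait for the child to terminate, then read the same virtual
       address. The parent's page was never touched, so it prints "original". */
    waitpid(pid, NULL, 0);
    printf("parent sees: %s\n", buf);

    free(buf);
    return 0;
}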

Related

Google realtime object pool

This question is a little "meta" for SO, but there doesn't seem to be a better place to ask it...
According to Google, realtime collaborative objects are never deleted from the model. So it makes sense to pool objects where possible, rather than not-really-delete them and subsequently create new ones, thus preventing an unnecessary increase in file-size and overhead.
And here's the problem: in an "undo" scenario, this would mean pulling a deleted object out of the trash pool. But "undo" only applies to operations by the local user, and I can't see how the realtime engine could cope if that "deleted" object had already been claimed by a different user.
My question is, am I missing something or wrong-thinking, and/or is there an alternative to a per-user pool?
(It also occurs to me that as a feature, the API could handle pooling deleted objects, automatically minimizing file-bloat.)
I think you have to be very careful about reusing objects in the way you describe. It's really hard to get right. Are you actually running into size issues? In general, as long as you don't constantly create and throw out objects, it shouldn't be a big deal.
You can delete the contents of the collab object when it's not being used, to free up space. That should generally be enough.
(Note, yes, the API could theoretically handle this object cleanup automatically. It turns out to be a really tricky problem to get right, due to features like undo. It might show up as a future feature if it becomes a real issue for people.)
Adding to Cheryl's answer, the one thing that I see as particularly challenging (actually, impossible) is the pulling-an-object-from-the-pool stuff:
Let's say you have a pool of objects, which (currently) contains a single object O1.
When a client needs a new object it will first check the pool. If the pool is not empty it will pull an object from there (the O1 object) and use it, right?
Now, consider the scenario where two clients (a.k.a. editors/collaborators) need a new object at the same time. Each of these clients will run the logic described in the previous paragraph. That is: both clients will check whether the pool is empty and both clients will pull O1 off of the pool.
So, the losing client will "think" for some time that it succeeded. It will grab an object from the pool and will do some things with it. Later on it will receive an event (E) that tells it that the object was actually pulled by another client. At this point the "losing" client will need to create another object and re-apply whatever changes it made to the first object to this second object.
Given that you do not know if/when the (E) event is going to fire, it actually means that every client needs to be prepared to replace every collaborative object it uses with a new one. This seems quite difficult. Making it more difficult is the fact that you cannot make model changes from event handlers (as this will trump the redo/undo stack). So the actual reaction to the (E) event needs to be carried out outside of the (E) event handler. Thus, in the time between receiving the (E) event and the fix to the model, your UI layer will not be able to use the model.

Does Removing a component also remove the children present in it?

I am working on how to reduce the memory usage in my code and got to know that removing a component also removes the children present inside it. If this happens, the memory usage should decrease, but it is increasing.
I have a titlewindow which contains hboxes, and those hboxes have canvases as children, which contain images. Now, if I use removeChild(titlewindow):
Do all the hboxes, canvases and images present in it get removed or not?
If they get removed, is the memory usage reduced or not? How can I do that in Flex?
Yeah, everything pretty much gets removed with it, as long as you then set the value of titleWindow to null and don't ever re-add those children. As for whether this clears out any memory or not, it basically will under two conditions:
The garbage collector runs afterwards. This can be expensive, and thus Adobe's designed it to not necessarily just keep happening over and over again at regular intervals. Instead it tends to happen when Flash Player or AIR is running out of memory in its current heap, at which point the garbage collector will check first to see if it can free up enough space within the current heap before anything more is grabbed from the operating system.
You don't have any non-orphaned references to these children anywhere else. By "non-orphaned", I mean that if the only places where you still have references to them are themselves without any references in the rest of your program, this condition is still met.
There is at least one exception to this rule, and that is that the garbage collector can single out multiple objects in your program as GCRoots. A GCRoot is never garbage-collected, period. So if you orphan off a GCRoot (make it so that neither it nor any of its descendants have any references anywhere outside of themselves), the garbage collector basically just doesn't care. The GCRoot will be left in there, and any references it has to any objects are thus considered live and active. Additionally there are certain occasions when the garbage collector will simply not be able to tell whether something in memory is a reference or not, so it'll just assume that it is and potentially fail to delete something. Usually this is not a problem, but if your program is big enough and is not doing a lot of object-pooling, I can tell you from experience that reacting specifically to this can on rare occasions be a necessity.
Try setting the titlewindow to null after removing them:
removeChild(titlewindow); // detach the window (and its whole child tree) from the display list
titlewindow = null;       // drop your own reference so the garbage collector can reclaim it
The garbage collector will remove all your boxes from memory if there are no more references to them from your main code. It should be okay to ignore explicitly removing the children, as long as the only references to them are from the parent, i.e. titlewindow and its children are an isolated group of objects. But make sure you also remove any event listeners that anything might have registered to with removeEventListener().
Also, there is no guarantee when the garbage collector actually runs, so if it looks like your memory is increasing, it might just mean the GC hasn't had a chance to clear up the memory yet. Here's an SO question on how to force GC to run. (when debugging, System.gc() usually works for me).

Multiprocessing data sharing

I was wondering how Google Chrome works with regard to its multiprocess architecture. From what I understand, there is one process which renders everything, and every page has one additional process associated with it. My question is, if a page loads a 100MB picture, how does it pass it to the renderer process?
In other words, what is the fastest way to pass (copy?) data from one process to another?
In yet other words, if one process produces 100 MB of data, how do you let another process read it? (Note that the data was produced after the process forked.)
Edit: If the child process creates the data and the parent process doesn't know the size of the data in advance, how do you pass the data from child to parent? I mean, a "shared block of memory" has to be created by the parent, right? So how does the parent know how much space to allocate?
General name for this is IPC - Inter Process Communication.
http://en.wikipedia.org/wiki/Inter-process_communication
Now, I do not know how Chrome implements it, but I hope you get the idea. If I had to choose one, I'd say shared memory or a pipe, but it could be (almost) any of those.
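On the edit about not knowing the size in advance: a pipe side-steps that problem, because the parent just reads until end-of-file, however much the child wrote. Here is a minimal C sketch (my own illustration, assuming a POSIX system; it is not a claim about how Chrome actually does it):
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];                     /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) == -1) {
        perror("pipe");
        return 1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        /* Child: produce data whose size the parent never needs to know. */
        close(fds[0]);
        const char *data = "data produced after the fork";
        write(fds[1], data, strlen(data));
        close(fds[1]);              /* closing the write end signals EOF to the parent */
        _exit(0);
    }

    /* Parent: read until EOF; no size has to be agreed on in advance. */
    close(fds[1]);
    char chunk[4096];
    ssize_t n;
    while ((n = read(fds[0], chunk, sizeof chunk)) > 0) {
        fwrite(chunk, 1, (size_t)n, stdout);
    }
    printf("\n");
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return 0;
}
With a fixed shared-memory block, by contrast, someone has to decide on a size up front, which is why a stream-style mechanism is usually simpler when the amount of data is unknown.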

Clojure agents: rate limiting?

Okay, so I have this small procedural SVG editor in Clojure.
It has a code pane where the user creates code that generates an SVG document, and a preview pane. The preview pane is updated whenever the code changes.
Right now, on a text change event, the code gets recompiled on the UI thread (Ewwwww!) and the preview pane updated. The compilation step should instead happen asynchronously, and agents seem a good answer to that problem: ask an agent to recompile the code on an update, and pass the result to the image pane.
I have not yet used agents, and I do not know whether they work with an implicit queue, but I suspect so. In my case, I have zero interest in computing "intermediate" steps (think about fast keystrokes: if a keystroke happens before a recompilation has been started, I simply want to discard the recompilation) -- ie I want a send to overwrite any pending agent computation.
How do I make that happen? Any hints? Or even a code sample? Is my rambling even making sense?
Thanks!
You describe a problem that has more to do with execution flow control than with shared state management. Hence, you might want to set STM aside for a moment and look into futures: like agents, they're executed in a thread pool, but unlike agents they can be stopped by calling future-cancel, and you can inspect their status with future-cancelled?.
There is no strong guarantee that the thread the future is executing on can actually be stopped. Still, your code will be able to try to cancel the future and move on to scheduling the next recompilation.
Agents do indeed work on a queue: each function gets the state of the agent and produces the next state of the agent. Agents track an identity over time. This sounds like a little more than you need; atoms are a slightly better fit for your task and are used in a very similar manner.

Why do people always encourage a single js file for a website?

I have read some website development materials on the Web, and every time a person asks about the organization of a website's js, css, html and php files, people suggest a single js file for the whole website. And the argument is speed.
I clearly understand that the fewer requests there are, the faster the page responds. But I have never understood the single-js argument. Suppose you have 10 webpages and each webpage needs a js function to manipulate the dom objects on it. If you put 10 functions in a single js file and let that js execute on every single webpage, 9 out of 10 functions are doing useless work. CPU time is wasted searching for non-existing dom objects.
I know that CPU time on an individual client machine is trivial compared to bandwidth on a single server machine. I am not saying that you should have many js files on a single webpage. But I don't see anything going wrong if every webpage refers to 1 to 3 js files and those js files are cached on the client machine. There are many good ways to do caching. For example, you can use an expiry date or you can include a version number in your js file names. Compared to cramming the functionality for all the needs of many webpages of a website into one big js file, I far prefer splitting the js code into smaller files.
Any criticism/agreement on my argument? Am I wrong? Thank you for your suggestion.
A function does 0 work unless called. So 9 uncalled functions are 0 work, just a little extra space.
A client only has to make 1 request to download 1 big JS file, then it is cached on every other page load. Less work than making a small request on every single page.
I'll give you the answer I always give: it depends.
Combining everything into one file has many great benefits, including:
less network traffic - you might be retrieving one file, but you're sending/receiving multiple packets and each transaction has a series of SYN, SYN-ACK, and ACK messages sent across TCP. A large majority of the transfer time is establishing the session and there is a lot of overhead in the packet headers.
one location/manageability - although you may only have a few files, it's easy for functions (and class objects) to grow between versions. When you do the multiple-file approach, sometimes functions from one file call functions/objects from another file (e.g. ajax in one file, then arithmetic functions in another - your arithmetic functions might grow to need to call the ajax and have a certain variable type returned). What ends up happening is that your set of files needs to be seen as one version, rather than each file being its own version. Things get hairy down the road if you don't have good management in place, and it's easy to fall out of line with Javascript files, which are always changing. Having one file makes it easy to manage the version between each of your pages across your (1 to many) websites.
Other topics to consider:
dormant code - you might think that the uncalled functions are potentially reducing performance by taking up space in memory, and you'd be right, however this cost is so minuscule that it doesn't matter. Functions are indexed in memory, and while the index table may grow, it's super trivial when dealing with small projects, especially given the hardware today.
memory leaks - this is probably the largest reason why you wouldn't want to combine all the code, however it is a small issue given the amount of memory in systems today and the better garbage collection browsers have. Also, this is something that you, as a programmer, have the ability to control. Quality code leads to fewer problems like this.
Why it depends?
While it's easy to say throw all your code into one file, that would be wrong. It depends on how large your code is, how many functions, who maintains it, etc. Surely you wouldn't pack your locally written functions into the JQuery package and you may have different programmers that maintain different blocks of code - it depends on your setup.
It also depends on size. Some programmers embed encoded images as ASCII in their files to reduce the number of files sent, and these can bloat files. Surely you don't want to package everything into one 50MB file, especially if there are core functions that are needed for the page to load.
So, to bring my response to a close, we'd need more information about your setup, because it depends. Surely 3 files is acceptable regardless of size, combining where you see fit. It probably wouldn't really hurt network traffic, but 50 files is unreasonable. I use the hand rule (no more than 5), but surely you'll see a benefit from combining five 1KB files into one 5KB file.
Two reasons that I can think of:
Less network latency. Each .js requires another request/response to the server it's downloaded from.
Fewer bytes on the wire and less memory. If it's a single file you can strip out unnecessary characters and minify the whole thing.
The Javascript should be designed so that the extra functions don't execute at all unless they're needed.
For example, you can define a set of functions in your script but only call them in (very short) inline <script> blocks in the pages themselves.
My line of thought is that you have fewer requests. When you make a request in the header of the page, it stalls the output of the rest of the page; the user agent cannot render the rest of the page until the javascript files have been obtained. Also, javascript files download synchronously; they queue up instead of being pulled at once (at least that is the theory).