What is differential execution? - language-agnostic

I stumbled upon a Stack Overflow question, How does differential execution work?, which has a VERY long and detailed answer. All of it made sense... but when I was done I still had no idea what the heck differential execution actually is. What is it really?

REVISED. This is my Nth attempt to explain it.
Suppose you have a simple deterministic procedure that executes repeatedly, always following the same sequence of statement executions or procedure calls.
The procedure calls themselves write anything they want sequentially to a FIFO, and they read the same number of bytes from the other end of the FIFO.**
The procedures being called are using the FIFO as memory, because what they read is the same as what they wrote on the prior execution.
So if their arguments happen to be different this time from last time, they can see that, and do anything they want with that information.
To get it started, there has to be an initial execution in which only writing happens, no reading.
Symmetrically, there should be a final execution in which only reading happens, no writing.
So there is a "global" mode register containing two bits, r and w: one that enables reading and one that enables writing.
The initial execution is done in mode 01, so only writing is done.
The procedure calls can see the mode, so they know there is no prior history.
If they want to create objects they can, and put the identifying information in the FIFO (no need to store in variables).
The intermediate executions are done in mode 11, so both reading and writing happen, and the procedure calls can detect data changes.
If there are objects to be kept up to date,
their identifying information is read from and written to the FIFO,
so they can be accessed and, if necessary, modified.
The final execution is done in mode 10, so only reading happens.
In that mode, the procedure calls know they are just cleaning up.
If there were any objects being maintained, their identifiers are read from the FIFO, and they can be deleted.
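The three modes can be sketched in a few lines of Python. This is only an illustration under assumptions, not the original implementation: the names History, exchange, label and run are hypothetical, values are stored as Python objects rather than bytes, and a single deque plays the role of the FIFO (each call reads from the front and writes to the back).

```python
from collections import deque

READ, WRITE = 0b10, 0b01                 # the r and w bits of the mode register
SHOW, UPDATE, ERASE = 0b01, 0b11, 0b10   # write-only, read+write, read-only

class History:
    """Hypothetical FIFO plus mode register shared by every call in a pass."""
    def __init__(self):
        self.fifo = deque()
        self.mode = SHOW

    def exchange(self, value=None):
        """Read last pass's value (if reading), write this pass's (if writing)."""
        old = self.fifo.popleft() if self.mode & READ else None
        if self.mode & WRITE:
            self.fifo.append(value)
        return old

def label(h, text, log):
    """A leaf procedure that reacts to whatever differs from the prior pass."""
    old = h.exchange(text)
    if h.mode == SHOW:
        log.append(("create", text))       # no prior history
    elif h.mode == ERASE:
        log.append(("delete", old))        # just cleaning up
    elif old != text:
        log.append(("modify", old, text))  # argument changed since last pass

def run(h, mode, texts, log):
    """One pass: the same sequence of calls, executed in the given mode."""
    h.mode = mode
    for t in texts:
        label(h, t, log)

h, log = History(), []
run(h, SHOW, ["a", "b"], log)
run(h, UPDATE, ["a", "c"], log)
run(h, ERASE, ["a", "c"], log)
```

After these three passes the log holds two creates, one modify (b to c) and two deletes, and the FIFO is empty again.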
But real procedures do not always follow the same sequence.
They contain IF statements (and other ways of varying what they do).
How can that be handled?
The answer is a special kind of IF statement (and its terminating ENDIF statement).
Here's how it works.
It writes the boolean value of its test expression, and it reads the value that the test expression had last time.
That way, it can tell if the test expression has changed, and take action.
The action it takes is to temporarily alter the mode register.
Specifically, x is the prior value of the test expression, read from the FIFO (if reading is enabled, else 0), and y is the current value of the test expression, written to the FIFO (if writing is enabled).
(Actually, if writing is not enabled, the test expression is not even evaluated, and y is 0.)
Then x,y simply MASKs the mode register r,w.
So if the test expression has changed from True to False, the body is executed in read-only mode. Conversely if it has changed from False to True, the body is executed in write-only mode.
If the result is 00, the code inside the IF..ENDIF statement is skipped.
(You might want to think a bit about whether this covers all cases - it does.)
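The masking rule can be checked case by case with a small sketch. The class DE, the method if_mode and the helper body_mode are hypothetical names introduced here; the test expression is passed as a zero-argument callable, which is one way (an assumption, not necessarily the original technique) to avoid evaluating it when writing is disabled.

```python
from collections import deque

READ, WRITE = 0b10, 0b01   # the r and w bits of the mode register

class DE:
    """Hypothetical holder for the FIFO and the mode register."""
    def __init__(self, mode):
        self.fifo = deque()
        self.mode = mode

    def if_mode(self, test):
        """Return the mode for the IF body: (x, y) masks (r, w)."""
        # x: prior value of the test, read only if reading is enabled
        x = int(self.fifo.popleft()) if self.mode & READ else 0
        # y: current value of the test, evaluated and written only if writing
        y = 0
        if self.mode & WRITE:
            y = int(bool(test()))
            self.fifo.append(y)
        return self.mode & ((x << 1) | y)

def body_mode(mode, prior, current):
    """Mode seen by the IF body, given the enclosing mode and both test values."""
    de = DE(mode)
    if mode & READ:
        de.fifo.append(prior)   # pretend the prior pass wrote this value
    return de.if_mode(lambda: current)

for prior in (False, True):
    for current in (False, True):
        print(prior, current, format(body_mode(0b11, prior, current), "02b"))
```

In mode 11 the four combinations give 00 (skip), 01 (body runs write-only), 10 (body runs read-only) and 11 (body runs as a normal update). A caller would save the mode register before the body and restore it at ENDIF.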
It may not be obvious, but these IF..ENDIF statements can be arbitrarily nested, and they can be extended to all other kinds of conditional statements like ELSE, SWITCH, WHILE, FOR, and even calling pointer-based functions. It is also the case that the procedure can be divided into sub-procedures to any extent desired, including recursive, as long as the mode is obeyed.
(There is a rule that must be followed, called the erase-mode rule, which is that in mode 10 no computation of any consequence, such as following a pointer or indexing an array, should be done. Conceptually, the reason is that mode 10 exists only for the purpose of getting rid of stuff.)
So it is an interesting control structure that can be exploited to detect changes, typically data changes, and take action on those changes.
Its use in graphical user interfaces is to keep some set of controls or other objects in agreement with program state information. For that use, the three modes are called SHOW(01), UPDATE(11), and ERASE(10).
The procedure is initially executed in SHOW mode, in which controls are created, and information relevant to them populates the FIFO.
Then any number of executions are made in UPDATE mode, where the controls are modified as necessary to stay up to date with program state.
Finally, there is an execution in ERASE mode, in which the controls are removed from the UI, and the FIFO is emptied.
The benefit of doing this is that, once you've written the procedure to create all the controls, as a function of the program's state, you don't have to write anything else to keep it updated or clean up afterward.
Anything you don't have to write means less opportunity to make mistakes.
(There is a straightforward way to handle user input events without having to write event handlers and create names for them. This is explained in one of the videos linked below.)
In terms of memory management, you don't have to make up variable names or data structures to hold the controls. It only uses enough storage at any one time for the currently visible controls, while the potentially visible controls can be unlimited. Also, there is never any concern about garbage collection of previously used controls - the FIFO acts as an automatic garbage collector.
In terms of performance, when it is creating, deleting, or modifying controls, it is spending time that needs to be spent anyway.
When it is simply updating controls, and there is no change, the cycles needed to do the reading, writing, and comparison are microscopic compared to altering controls.
Another performance and correctness consideration, relative to systems that update displays in response to events, is that such a system requires that every event be responded to, and none twice, otherwise the display will be incorrect, even though some event sequences may be self-canceling. Under differential execution, update passes may be performed as often or as seldom as desired, and the display is always correct at the end of a pass.
Here is an extremely abbreviated example where there are 4 buttons, of which buttons 2 and 3 are conditional on a boolean variable.
In the first pass, in Show mode, the boolean is false, so only buttons 1 and 4 appear.
Then the boolean is set to true and pass 2 is performed in Update mode, in which buttons 2 and 3 are instantiated and button 4 is moved, giving the same result as if the boolean had been true on the first pass.
Then the boolean is set false and pass 3 is performed in Update mode, causing buttons 2 and 3 to be removed and button 4 to move back up to where it was before.
Finally pass 4 is done in Erase mode, causing everything to disappear.
(In this example, the changes are undone in the reverse of the order in which they were made, but that is not necessary. Changes can be made and unmade in any order.)
Note that, at all times, the FIFO, consisting of Old and New concatenated together, contains exactly the parameters of the visible buttons plus the boolean value.
The point of this is to show how a single "paint" procedure can also be used, without change, for arbitrary automatic incremental updating and erasing.
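The four passes of the button example can be reproduced with a rough Python sketch. Everything named here (DE, when, button, paint, run, the state dictionary, and the "row" used as a stand-in for layout) is an assumption made for illustration; a real implementation would create and move actual controls instead of appending strings to a list.

```python
from collections import deque
from contextlib import contextmanager

READ, WRITE = 0b10, 0b01
SHOW, UPDATE, ERASE = 0b01, 0b11, 0b10

class DE:
    def __init__(self):
        self.fifo, self.mode, self.row, self.ops = deque(), SHOW, 0, []

    def exchange(self, value=None):
        old = self.fifo.popleft() if self.mode & READ else None
        if self.mode & WRITE:
            self.fifo.append(value)
        return old

    @contextmanager
    def when(self, test):
        """The special IF: mask the mode for the body, restore it at ENDIF."""
        saved = self.mode
        x = int(self.fifo.popleft()) if saved & READ else 0
        y = 0
        if saved & WRITE:                 # test is not evaluated in mode 10
            y = int(bool(test()))
            self.fifo.append(y)
        self.mode = saved & ((x << 1) | y)
        try:
            yield self.mode != 0          # False means: skip the body
        finally:
            self.mode = saved

    def button(self, name):
        new = None
        if self.mode & WRITE:             # only surviving buttons take up a row
            new = (name, self.row)
            self.row += 1
        old = self.exchange(new)
        if self.mode == SHOW:
            self.ops.append(f"add {name} row {new[1]}")
        elif self.mode == ERASE:
            self.ops.append(f"remove {old[0]}")
        elif old != new:
            self.ops.append(f"move {name} row {old[1]}->{new[1]}")

def paint(de, state):
    """The single 'paint' procedure, used unchanged for every pass."""
    de.button("b1")
    with de.when(lambda: state["extra"]) as active:
        if active:
            de.button("b2")
            de.button("b3")
    de.button("b4")

def run(de, mode, state):
    de.mode, de.row, de.ops = mode, 0, []
    paint(de, state)
    return de.ops

de, state = DE(), {"extra": False}
pass1 = run(de, SHOW, state)
state["extra"] = True
pass2 = run(de, UPDATE, state)
state["extra"] = False
pass3 = run(de, UPDATE, state)
pass4 = run(de, ERASE, state)
```

Pass 1 adds b1 and b4; pass 2 adds b2 and b3 and moves b4 down; pass 3 removes b2 and b3 and moves b4 back; pass 4 removes b1 and b4 and leaves the FIFO empty.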
I hope it is clear that it works for arbitrary depth of sub-procedure calls, and arbitrary nesting of conditionals, including switch, while and for loops, calling pointer-based functions, etc.
If I have to explain that, then I'm open to potshots for making the explanation too complicated.
Finally, there are a couple of crude but short videos posted here.
** Technically, they have to read the same number of bytes they wrote last time. So, for example, they might have written a string preceded by a character count, and that's OK.
ADDED: It took me a long time to be sure this would always work.
I finally proved it.
It is based on a Sync property, roughly meaning that at any point in the program the number of bytes written on the prior pass equals the number read on the subsequent pass.
The idea behind the proof is to do it by induction on program length.
The toughest case to prove is the case of a section of program consisting of s1 followed by an IF(test) s2 ENDIF, where s1 and s2 are subsections of the program, each satisfying the Sync property.
To do it in text-only is eye-glazing, but here I've tried to diagram it:
It defines the Sync property, and shows the number of bytes written and read at each point in the code, and shows that they are equal.
The key points are that 1) the value of the test expression (0 or 1) read on the current pass must equal the value written on the prior pass, and 2) the condition of Sync(s2) is satisfied.
This satisfies the Sync property for the combined program.

I read all the stuff I can find and watched the video and will take a shot at a first-principles description.
Overview
This is a DSL-based design pattern for implementing user interfaces and perhaps other state-oriented subsystems in a clean, efficient manner. It focuses on the problem of changing the GUI configuration to match current program state, where that state includes the condition of GUI widgets themselves, e.g. the user selects tabs, radio buttons, and menu items, and widgets appear/disappear in arbitrarily complex ways.
Description
The pattern assumes:
A global collection C of objects that needs periodic updates.
A family of types for those objects, where instances have parameters.
A set of operations on C:
Add A P - Put a new object A into C with parameters P.
Modify A P - Change the parameters of object A in C to P.
Delete A - Remove object A from C.
An update of C consists of a sequence of such operations to transform C to a given target collection, say C'.
Given current collection C and target C', the goal is to find the update with minimum cost. Each operation has unit cost.
The set of possible collections is described in a domain-specific language (DSL) that has the following commands:
Create A H - Instantiate some object A, using optional hints H, and add it to the global state. (Note no parameters here.)
If B Then T Else F - Conditionally execute command sequence T or F based on Boolean function B, which can depend on anything in the running program.
In all the examples,
The global state is a GUI screen or window.
The objects are UI widgets. Types are button, dropdown box, text field, ...
Parameters control widget appearance and behavior.
Each update consists of adding, deleting, and modifying (e.g. relocating) any number of widgets in the GUI.
The Create commands are making widgets: buttons, dropdown boxes, ...
The Boolean functions depend on the underlying program state including the condition of GUI controls themselves. So changing a control can affect the screen.
Missing links
The inventor never explicitly states it, but a key idea is that we run the DSL interpreter over the program that represents all possible target collections (screens) every time we expect any combination of the Boolean function values B has changed. The interpreter handles the dirty work of making the collection (screen) consistent with the new B values by emitting a sequence of Add, Delete, and Modify operations.
There is a final hidden assumption: The DSL interpreter includes some algorithm that can provide the parameters for the Add and Modify operations based on the history of Creates executed so far during its current run. In the GUI context, this is the layout algorithm, and the Create hints are layout hints.
Punch line
The power of the technique lies in the way complexity is encapsulated in the DSL interpreter. A stupid interpreter would start by Deleting all the objects (widgets) in the collection (screen), then Add a new one for each Create command as it sees them while stepping through the DSL program. Modify would never occur.
Differential execution is just a smarter strategy for the interpreter. It amounts to keeping a serialized recording of the interpreter's last execution. This makes sense because the recording captures what's currently on the screen. During the current run, the interpreter consults the recording to make decisions about how to bring about the target collection (widget configuration) with operations having least cost. This boils out to never Deleting an object (widget) only to Add it again later for a cost of 2. DE will always Modify instead, which has a cost of 1. If we happen to run the interpreter in some case where the B values have not changed, the DE algorithm will generate no operations at all: the recorded stream already represents the target.
As the interpreter executes commands, it is also setting up the recording for its next run.
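To make the cost difference concrete, here is a small sketch that compares the "stupid" strategy with one that consults a recording of what is already on screen. It is a simplification under assumptions: the recording is modeled as a dictionary from object name to parameters rather than a serialized stream, the function names are hypothetical, and this is not the inventor's on-line algorithm, only the same cost accounting.

```python
def naive_update(current, target):
    """Delete everything, then Add everything the target needs."""
    ops = [("Delete", name) for name in current]
    ops += [("Add", name, params) for name, params in target.items()]
    return ops

def differential_update(current, target):
    """Emit only the operations needed to turn `current` into `target`."""
    ops = [("Delete", name) for name in current if name not in target]
    for name, params in target.items():
        if name not in current:
            ops.append(("Add", name, params))
        elif current[name] != params:
            ops.append(("Modify", name, params))   # cost 1 instead of Delete+Add
    return ops

# Parameters here are hypothetical (column, row) positions.
current = {"ok": (0, 0), "cancel": (0, 1)}
target = {"ok": (0, 0), "help": (0, 1), "cancel": (0, 2)}

naive_ops = naive_update(current, target)
diff_ops = differential_update(current, target)
unchanged_ops = differential_update(target, target)
```

With unit costs, the naive update costs 5 operations here, the differential one costs 2, and re-running against an unchanged target costs 0.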
An analogous algorithm
The algorithm has the same flavor as minimum edit distance (MED). However DE is a simpler problem than MED because there are no "repeated characters" in the DE serialized execution strings as there are in MED. This means we can find an optimal solution with a straightforward on-line greedy algorithm rather than dynamic programming. That's what the inventor's algorithm does.
Strengths
My take is that this is a good pattern for implementing systems with many complex forms where you want total control over placement of widgets with your own layout algorithm and/or the "if else" logic of what's visible is deeply nested. If there are K nests of "if elses" N deep in the form logic, then there are K*2^N different layouts to get right. Traditional form design systems (at least the ones I've used) don't support larger K, N values very well at all. You tend to end up with large numbers of similar layouts and ad hoc logic to select them that's ugly and hard to maintain. This DSL pattern seems a way to avoid all that. In systems with enough forms to offset the DSL interpreter's cost, it would even be cheaper during initial implementation. Separation of concerns is also a strength. The DSL programs abstract the content of forms while the interpreter is the layout strategy, acting on hints from the DSL. Getting the DSL and layout hint design right seems like a significant and cool problem in itself.
Questionable...
I'm not sure that avoiding Delete/Add pairs in favor of Modify is worth all the trouble in modern systems. The inventor seems most proud of this optimization, but the more important idea is a concise DSL with conditionals to represent forms, with layout complexity isolated in the DSL interpreter.
Recap
The inventor has so far focused on deep details of how the interpreter makes its decisions. This is confusing because it's directed at trees while the forest is of greater interest. This is a description of the forest.

DDD: The conundrum of Side-Effect-Free functions

I apologize for so many questions, but I felt that they make the most sense only when treated as a unit
Note - all quotes are from DDD: Tackling Complexity in the Heart of Software ( pages 250 and 251 )
1)
Operations can be broadly divided into two categories, commands and
queries.
...
Operations that return results without producing side effects are
called functions. A function can be called multiple times and return
the same value each time.
...
Obviously, you can't avoid commands in most software systems, but the
problem can be mitigated in two ways. First, you can keep the commands
and queries strictly segregated in different operations. Ensure that
the methods that cause changes do not return domain data and are kept
as simple as possible. Perform all queries and calculations in methods
that cause no observable side effects
a) Author implies that a query is a function since it doesn't produce side effects. He also notes that function will always return same value, by which I assume he means that for the same input we will always get the same output?
b) Assume we have a method QandC(int entityId) which queries for a specific domain entity, extracts certain values from it, uses them to initialize a new Value Object, and returns this VO to the caller. According to the above quote, isn't QandC a function, since it doesn't change any state?
c) But author also argues that for same input a function will always produce same output, which isn't the case with QandC, since if we place several calls to QandC, it will produce different results, assuming that in the time between the two calls this entity was modified or even deleted. As such, how can we claim QandC is a function?
d)
Ensure that the methods that cause changes do not return domain data
...
Reason being that the state of returned non-VO may be changed in some future operations and as such the side effects of such methods are unpredictable?
e)
Ensure that the methods that cause changes do not return domain data
...
Is a query method that returns an entity still considered a function, even if it doesn't change any state?
2)
VALUE OBJECTS are immutable, which implies that, apart from
initializers called only during creation, all their operations are
functions.
...
An operation that mixes logic or calculations with state change
should be refactored into two separate operations. But by definition,
this segregation of side effects into simple command methods only
applies to ENTITIES. After completing the refactoring to separate
modification from querying, consider a second refactoring to move the
responsibility for the complex calculations into a VALUE OBJECT. The
side effect often can be completely eliminated by deriving a VALUE
OBJECT instead of changing existing state, or by moving the entire
responsibility into a VALUE OBJECT.
a)
VALUE OBJECTS are immutable, which implies that, apart from
initializers called only during creation, all their operations are
functions ... But by definition, this segregation of side effects into
simple command methods only applies to ENTITIES.
I think author is saying all methods defined on VOs are functions, which doesn't make sense, since even though a method defined on a VO can't change its own state, it still can change the state of other, non-VO objects?!
b) Assuming method defined on an entity doesn't change any state, do we consider such a method as being a function, even though it is defined on an entity?
c)
... consider a second refactoring to move the responsibility for the
complex calculations into a VALUE OBJECT.
Why is author suggesting we should only refactor from entities those function that perform complex calculations? Why instead shouldn't we also refactor simpler functions?
d)
... consider a second refactoring to move the responsibility for the
complex calculations into a VALUE OBJECT.
In any case, why is author suggesting we should refactor functions out of entities and place them inside VOs? Just because it makes it more apparent to the client that this operation MAY be a function?
e)
The side effect often can be completely eliminated by deriving a VALUE
OBJECT instead of changing existing state, or by moving the entire
responsibility into a VALUE OBJECT.
This doesn't make sense, since it appears author is arguing if we move a command ( ie operation which changes the state ) into a VO, then we will in essence eliminate any side-effects, even if command is changing the state. So any ideas, what was author actually trying to say?
UPDATE:
1b)
It depends on the perspective. A database query does not change state
and thus has no side effects, however it isn't deterministic by
nature, since as you point out the data can change. In the book, the
author is referring to functions associated with value object and
entities, which don't themselves make external calls. Therefore, the
rules don't apply to QandC.
So author was describing only functions that don't make external calls and as such QandC isn't a type of function that author was describing?
1c)
QandC does not itself change state - there are no side effects. The
underlying state may be changed out of band however. Due to this, it
is not a pure function.
But it also isn't the Side-Effect-Free function in the sense author defined them?
1d)
Again, this is based on CQS.
I know I'm repeating myself, but I assume discussion in the book is based on CQS and CQS doesn't consider QandC as Side Effect Free function due to a chance of entity returned by QandC having its state modified ( by some other operation ) sometime in the future?
1e)
It is considered a query from the CQRS perspective, but it cannot be
called a function in the sense that a pure function on a VO is a
function due to lack of determinism.
I don't quite understand what you were trying to say ( the confusing part is in bold ). Perhaps that while QandC is considered a query, it is not considered a function because it returns an entity, and as such the side-effects are unpredictable, which makes QandC non-deterministic by nature?
So author is only making those statements ( see quote in 1e ) under the implicit assumption that no operation defined in VO will ever try to change the state of non-VO objects?
2d)
Given that VOs are immutable, they are a fitting place to house pure
functions. This is another step towards freeing domain knowledge from
technical constraints.
I don't understand why moving function from entity to VO would help free domain knowledge from technical constraints ( I'm also not really sure what you mean by technical – technical as in technology-related or... )?
I assume other reason for putting function in VO is because it is that much more obvious ( to client ) that this is a function?
2e)
I view this as a hint towards event-sourcing. Instead of changing
existing state, you add a new event which represents the change. There
is still a net side effect, however existing state remains stable.
I must confess I know nothing about event sourcing, since I'd like to first wrap my head around DDD. Anyway, so author didn't imply that just moving a command to VO would automatically eliminate side-effects, but instead some additional actions would have to be taken ( such as implementing event-sourcing ), only he "forgot" to mention that part?
SECOND UPDATE:
2d)
One of the defining characteristics of an entity is its identity ....
By placing business logic into VOs you can consider it outside of the
context of an entity's identity. This makes it easier to test this
logic, among other things.
I somewhat understand the point you're making ( when thinking about the concept from distance ), but on the other hand I really don't. Why would function within an entity be influenced by an identity of this entity ( assuming this function is pure function, in other words it doesn't change state and is deterministic )?
2e)
Yes that is my understanding of it - there is still a net "side
effect". However, there are different ways to attain a side effect.
One way is to mutate existing state. Another way is to make the state
change explicit with an object representing that change.
I - Just to be sure ... From your answer I gather that author didn't imply that side-effects would be eliminated simply by moving a command into VO?
II - OK, if I understand you correctly, we can move a command into VOs ( even though VOs shouldn't change the state of anything and as such shouldn't cause any side-effects ) and this command inside VO is still allowed to produce some sort of side effects, but this side effect is somehow more acceptable ( OR MORE CONTROLLABLE ) by making state change explicit ( which I interpret as the thing that changed is returned to the caller as VO )?
3) I must say that I still don't quite understand why state-changing method SC shouldn't return domain objects. Perhaps because non-VO may be changed in some future operations and as such the side effects of SC are very unpredictable?
THIRD UPDATE:
Delegating the management of state to the entity and the
implementation of behavior to VOs creates certain advantages. One is
basic partitioning of responsibilities.
a) You're saying that even though a method describes a behavior of an entity ( and thus entity containing this method adheres to SRP ) and as such belongs in the entity, it may still be a good idea to move it into VO? Thus in essence, we would partition a responsibility of an entity into two even smaller responsibilities?
b) But won't moving behavior into VO basically turn this entity into a mere data container ( I understand that entity will still manage its state, but still ... )?
thank you
1a) Yes. The discourse on separating queries from commands is based on the Command-query separation principle.
1b) It depends on the perspective. A database query does not change state and thus has no side effects, however it isn't deterministic by nature, since as you point out the data can change. In the book, the author is referring to functions associated with value object and entities, which don't themselves make external calls. Therefore, the rules don't apply to QandC. Determinism could be fabricated however, offering degrees of "pureness". For instance, a serializable transaction could be created which can ensure that data doesn't change for its duration.
1c) QandC does not itself change state - there are no side effects. The underlying state may be changed out of band however. Due to this, it is not a pure function. However, the restriction that QandC doesn't change state is still valuable. The value is fittingly demonstrated by CQRS which is the application of CQS in distributed scenarios.
1d) Again, this is based on CQS. Another take on this is the Tell-Don't-Ask principle. Given an understanding of these principles however, the rule can be bent IMO. A side-effecting method could return a VO representing the result for instance. However, in certain scenarios such as CQRS + Event Sourcing it could be desirable for commands to return void.
1e) It is considered a query from the CQRS perspective, but it cannot be called a function in the sense that a pure function on a VO is a function due to lack of determinism.
2a) No, a VO function shouldn't change state of anything, it should instead return a new object.
2b) Yes.
2c) Because functional purity tends to become more important in more complex scenarios. However, as you point out, it isn't a clear and definitive rule. It shouldn't be based on complexity as much as it is based on the domain at hand.
2d) Given that VOs are immutable, they are a fitting place to house pure functions. This is another step towards freeing domain knowledge from technical constraints.
2e) I view this as a hint towards event-sourcing. Instead of changing existing state, you add a new event which represents the change. There is still a net side effect, however existing state remains stable.
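A minimal sketch of points 1d and 2a, assuming a hypothetical Money value object and Account entity (neither appears in the book excerpt): the VO's operation returns a new object instead of changing anything, while the entity keeps a command that returns nothing and a separate query.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
    """Hypothetical value object: immutable, so its operations are functions."""
    amount: int
    currency: str

    def add(self, other):
        if other.currency != self.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount + other.amount, self.currency)  # new object

class Account:
    """Hypothetical entity: owns identity and state, delegates the calculation."""
    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance

    def deposit(self, money):
        """Command: changes state and returns no domain data."""
        self.balance = self.balance.add(money)

    def current_balance(self):
        """Query: returns a value object and changes nothing."""
        return self.balance

account = Account("A-1", Money(10, "USD"))
before = account.current_balance()
result = account.deposit(Money(5, "USD"))
after = account.current_balance()
```

Because Money is frozen, the value held in `before` is unaffected by the later deposit, which is one reason returning a VO from a query is harmless.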
UPDATE
1b) Yes.
1c) It is a side-effect free function, however it is not a deterministic function because it cannot be thought to always return the same value given the same input. For example, the function that returns the current time is a side-effect free function, but it certainly does not return the same value in subsequent calls.
1d) QandC can be thought of as side-effect free, but not pure. Another way to look at functional purity is as referential transparency - the ability to replace a function call by its value without changing program behavior. In other words, asking the question does not change the answer. QandC can guarantee that, but only within a context such as a transaction. So QandC can be thought of as a function, but only in a specific context.
1e) I think the confusing part is that the author is talking specifically about functions on VOs and entities - not database queries, whereas we are talking about both. My statement extends the discussion to database queries and CQRS given certain restrictions, i.e. an ambient transaction.
2d) I can see how what I said was a bit vague, I was getting lazy. One of the defining characteristics of an entity is its identity. It maintains its identity throughout its life-cycle while its state may change. By placing business logic into VOs you can consider it outside of the context of an entity's identity. This makes it easier to test this logic, among other things.
2e) Yes that is my understanding of it - there is still a net "side effect". However, there are different ways to attain a side effect. One way is to mutate existing state. Another way is to make the state change explicit with an object representing that change.
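The distinction between "side-effect free" and "pure" in 1c and 1d can be illustrated with a small sketch. The repository is modeled as a plain dictionary and the snapshot copy stands in for a transaction's stable view; both are assumptions for illustration, as are the function names.

```python
def total(prices):
    """Pure: same input, same output, no side effects."""
    return sum(prices)

def q_and_c(entity_id, repository):
    """Side-effect free, but the result depends on state others may change."""
    name, price = repository[entity_id]
    return (name.upper(), price)   # stands in for the returned value object

repository = {7: ("widget", 10)}
first = q_and_c(7, repository)
repository[7] = ("widget", 12)     # out-of-band change by some other operation
second = q_and_c(7, repository)

snapshot = dict(repository)        # stands in for a transaction's stable view
inside_a = q_and_c(7, snapshot)
inside_b = q_and_c(7, snapshot)
```

The two unrestricted calls disagree even though q_and_c itself changed nothing, while the two calls against the snapshot agree, which is the sense in which QandC is a function only within a specific context.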
UPDATE 2
2d) This particular point can be argued or can be a matter of preference. One perspective is that the idea is based on the single-responsibility principle (SRP). The responsibility of an entity is the association of an identity with behavior and state. Behavior combines input with existing state to produce state transitions. Delegating the management of state to the entity and the implementation of behavior to VOs creates certain advantages. One is basic partitioning of responsibilities. Another is more subtle and perhaps more arguable. It is the idea that logic can be considered in a stateless manner. This makes such logic easier to think about, more like a mathematical equation where all changes are explicit - no hidden state.
2e.1) Yes, eliminating a net side effect would alter behavior, which is not the goal.
2e.2) Yes.
3) Commands returning void have several advantages. One is that they become naturally more adept in async scenarios - no need to wait for a result. Another is that it allows you to represent the operation as a single command object - again, because there is no return value. This applies in CQRS and also event sourcing. In these cases, any command output is dispatched as an event instead of a result. But again, if these requirements don't apply returning a result object can be appropriate.
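One way to picture "making the state change explicit with an object representing that change" is the following sketch. It is a toy model under assumptions (the event classes, apply_event and replay are hypothetical names), not a description of any particular event-sourcing framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

def apply_event(balance, event):
    """Pure function: old state plus one event gives the new state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event: {event!r}")

def replay(events, start=0):
    """Derive current state by folding over the recorded events."""
    balance = start
    for event in events:
        balance = apply_event(balance, event)
    return balance

events = []                       # existing entries are never modified
events.append(Deposited(100))     # a command's output is recorded as an event
events.append(Withdrawn(30))
balance = replay(events)
```

There is still a net side effect (the event list grows), but earlier state stays stable and every change is visible as a value.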
UPDATE 3
a) Yes, and this is a specific type of partitioning.
b) The responsibility of the entity is to coordinate behavior by delegating to VOs and applying the resulting state changes.

Check for equality before assigning?

Is it a good practice to assign a value only if it's not equal to the assignee? For example, would:
bool isVisible = false;
if (TextBox1.Visible != isVisible)
    TextBox1.Visible = isVisible;
be more desirable than:
bool isVisible = false;
TextBox1.Visible = isVisible;
Furthermore, does the answer depend on the data type, like an object with a costlier Equals method versus an object with a costlier assignment method?
From a readability standpoint, I'd definitely prefer the second way -- just assign the darn thing.
Some object properties have semantics that require that assigning a value the property already holds will have a particular effect. For example, setting an object's "text" may force a redraw, even if the value doesn't change. When dealing with such objects, unless one wants to force the action to take place, one should often test and set if unequal.
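As an illustration of a property whose setter has an effect even when the value is unchanged, here is a sketch with a hypothetical TextBox class that counts repaints. The class and the helper set_visible_if_changed are made up for this example and are not any real UI toolkit's API.

```python
class TextBox:
    """Hypothetical control whose setter repaints on every assignment."""
    def __init__(self):
        self._visible = True
        self.repaints = 0

    @property
    def visible(self):
        return self._visible

    @visible.setter
    def visible(self, value):
        self._visible = value
        self.repaints += 1        # side effect happens even if value is the same

def set_visible_if_changed(box, value):
    """Test and set only if unequal, so redundant assignments cost nothing."""
    if box.visible != value:
        box.visible = value

plain = TextBox()
for _ in range(3):
    plain.visible = True          # three redundant assignments, three repaints

guarded = TextBox()
for _ in range(3):
    set_visible_if_changed(guarded, True)   # no repaint at all
set_visible_if_changed(guarded, False)      # a real change: one repaint
```

Whether the guard is worthwhile depends on whether the setter already performs this check internally, as many window systems do.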
Generally, with fields, there is no advantage to doing a comparison before a set. There is one notable exception, however: if many concurrently-running threads will be wanting to set a field to the same value, and it's likely to already hold that value, caching behavior may be very bad if all the threads are unconditionally writing to that field, since a processor that wants to write to it will have to acquire the cache line from the most recent processor that wrote it. By contrast, if all the processors are simply reading the field and deciding to leave it alone, they can all share the cache line, resulting in much better performance.
Your instincts seem about right - it depends on the cost of the operations.
In your example, making a text box visible or invisible, the cost of the test is imperceptible (just check a bit in the window structure) and the cost of assignment is also typically imperceptible (repaint the window). In fact, if you set the "visible" bit to its existing value you'll still incur the function call cost, but the window manager will check the bit and return immediately. In this case, just go ahead and assign it.
But in other cases it might matter. For example, suppose you have a cached copy of a long string or binary object, and whenever you assign a new value it gets saved back to a database. Then you might find that the cost of testing for equality every time is worth it to save unnecessary writes to the database. No doubt you can imagine more expensive scenarios.
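The database case might look roughly like this. `CachedBlob` is a hypothetical name, a plain dict stands in for the database, and the `writes` counter stands in for the expensive round trip; the sketch assumes the equality test is much cheaper than the write.

```python
class CachedBlob:
    """Hypothetical write-through cache: every real assignment hits the store."""

    def __init__(self, store, key, value):
        self.store = store   # any dict-like backing store (assumed)
        self.key = key
        self.value = value
        self.writes = 0
        self._save()

    def _save(self):
        self.store[self.key] = self.value
        self.writes += 1     # stands in for an expensive database write

    def assign(self, new_value):
        # Compare first; the test is assumed cheaper than the write it avoids.
        if new_value == self.value:
            return
        self.value = new_value
        self._save()


db = {}
blob = CachedBlob(db, "doc", b"x" * 1000)
blob.assign(b"x" * 1000)  # same bytes: write skipped
blob.assign(b"y" * 1000)  # different bytes: written through
```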
So in the general case you've got at least these primary variables to consider: the cost of the test, the cost of the assignment, and the relative frequencies of assigning a new value versus assigning the same value.

cache behaviour on redundant writes

Edit - I guess the question I asked was too long so I'm making it very specific.
Question: Suppose a memory location is in the L1 cache, is not marked dirty, and holds a value X. What happens if you try to write X to the same location? Is there any CPU that would see that such a write is redundant and skip it?
For example is there an optimization which compares the two values and discards a redundant write back to the main memory? Specifically how do mainstream processors handle this? What about when the value is a special value like 0? If there's no such optimization even for a special value like 0, is there a reason?
Motivation: We have a buffer that can easily fit in the cache. Multiple threads could potentially use it by recycling it amongst themselves. Each use involves writing to n locations (not necessarily contiguous) in the buffer. Recycling simply means setting all values to 0. Each time we recycle, size - n locations are already 0. Intuitively, it seems to me that avoiding so many redundant write-backs would make recycling faster, hence the question.
Doing this check in code wouldn't make sense, since the branch instruction itself might cause an unnecessary cache miss (if (buf[i]) {...}).
I am not aware of any processor that does the optimization you describe - eliminating writes to clean cache lines that would not change the value - but it's a good question, a good idea, great minds think alike and all that.
I wrote a great big reply, and then I remembered: this is called "Silent Stores" in the literature. See "Silent Stores for Free", K. Lepak and M. Lipasti, UWisc, MICRO-33, 2000.
Anyway, in my reply I described some of the implementation issues.
By the way, topics like this are often discussed in the Usenet newsgroup comp.arch.
I also write about them on my wiki, http://comp-arch.net
Your suggested hardware optimization would not reduce the latency. Consider the operations at the lowest level:
1. The old value at the location is loaded from the cache to the CPU (assuming it is already in the cache).
2. The old and new values are compared.
3. If the old and new values are different, the new value is written to the cache. Otherwise it is ignored.
Step 1 may actually take longer than steps 2 and 3. This is because steps 2 and 3 cannot start until the old value from step 1 has been brought into the CPU. The situation would be the same if it were implemented in software.
Now consider simply writing the new value to the cache without checking the old value. This is actually faster than the three-step process above, for two reasons. First, there is no need to wait for the old value. Second, the CPU can simply schedule the write operation in an output buffer. The output buffer can perform the cache write while the ALU starts working on something else.
So far, the only latencies involved are that of between the CPU and the cache, not between the cache and the main memory.
The situation is more complicated in modern microprocessors, because their caches are organized into cache lines. When a single byte is written to a line that is not already in the cache, the complete cache line has to be loaded first, because the part of the line that is not overwritten has to keep its old values.
http://blogs.amd.com/developer/tag/sse4a/
Read
Cache hit: Data is read from the cache line to the target register
Cache miss: Data is moved from memory to the cache, and read into the target register
Write
Cache hit: Data is moved from the register to the cache line
Cache miss: The cache line is fetched into the cache, and the data from the register is moved to the cache line
This is not an answer to your original question on computer-architecture, but might be relevant to your goal.
In this discussion, all array indices start at zero.
Assuming n is much smaller than size, change your algorithm so that it saves two pieces of information:
The full-size array, with size elements
An array of n elements plus a counter, used to emulate a set container. Duplicate values are allowed.
Every time a non-zero value is written to the index k in the full-size array, insert the value k to the set container.
When the full-size array needs to be cleared, get each value stored in the set container (which will contain k, among others), and set each corresponding index in the full-size array to zero.
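The index-logging scheme above can be sketched as follows. `RecyclableBuffer` and its method names are made up for illustration, and the sketch assumes each use makes at most n non-zero writes (duplicates included), as in the question's description.

```python
class RecyclableBuffer:
    def __init__(self, size, n):
        self.data = [0] * size
        self.touched = [0] * n  # fixed array of n indices, emulating a set
        self.count = 0          # number of valid entries in self.touched

    def write(self, k, value):
        if value != 0:
            # Duplicates are allowed; assumes at most n non-zero writes per use.
            self.touched[self.count] = k
            self.count += 1
        self.data[k] = value

    def clear(self):
        # Zero only the logged indices instead of sweeping the whole array.
        for i in range(self.count):
            self.data[self.touched[i]] = 0
        self.count = 0


buf = RecyclableBuffer(size=16, n=4)
buf.write(3, 7)
buf.write(9, 2)
buf.write(3, 5)  # duplicate index, logged a second time
buf.clear()
```

Clearing costs time proportional to the number of writes rather than to size, which is the point when n is much smaller than size.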
A similar technique, known as a two-level histogram or radix histogram, can also be used.
Two pieces of information are stored:
The full-size array, with size elements
A boolean array of ceil(size / M) elements, where M is the radix and ceil is the ceiling function.
Every time a non-zero value is written to index k in the full-size array, the element floor(k / M) in the boolean array should be marked.
Let's say, bool_array[j] is marked. This corresponds to the range from j*M to (j+1)*M-1 in the full-size array.
When the full-size array needs to be cleared, scan the boolean array for marked elements and clear the corresponding range in the full-size array for each one.
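A rough sketch of the block-flag variant, using the same hypothetical naming style. The `radix` parameter is M from the text; resetting each flag after clearing its block is an added detail that the description implies but does not state.

```python
class BlockFlaggedBuffer:
    def __init__(self, size, radix):
        self.size = size
        self.radix = radix                       # M in the text
        self.data = [0] * size
        n_blocks = (size + radix - 1) // radix   # ceil(size / M)
        self.marked = [False] * n_blocks

    def write(self, k, value):
        if value != 0:
            self.marked[k // self.radix] = True  # floor(k / M)
        self.data[k] = value

    def clear(self):
        for j, flag in enumerate(self.marked):
            if not flag:
                continue
            start = j * self.radix
            end = min(start + self.radix, self.size)  # last block may be short
            for i in range(start, end):
                self.data[i] = 0
            self.marked[j] = False


buf = BlockFlaggedBuffer(size=10, radix=4)
buf.write(9, 1)  # marks block 2, which covers indices 8..9
marked_after_write = list(buf.marked)
buf.clear()
```

Compared with logging individual indices, this needs no bound on the number of writes, at the cost of clearing up to M elements per marked block.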

High-level/semantic optimization

I'm writing a compiler, and I'm looking for resources on optimization. I'm compiling to machine code, so anything at runtime is out of the question.
What I've been looking for lately is less code optimization and more semantic/high-level optimization. For example:
free(malloc(400)); // should be completely optimized away
Even if these functions were completely inlined, they could eventually call OS memory functions which can never be inlined. I'd love to be able to eliminate that statement completely without building special-case rules into the compiler (after all, malloc is just another function).
Another example:
string Parenthesize(string str) {
    StringBuilder b; // similar to C#'s class of the same name
    foreach(str : ["(", str, ")"])
        b.Append(str);
    return b.Render();
}
In this situation I'd love to be able to initialize b's capacity to str.Length + 2 (enough to exactly hold the result, without wasting memory).
To be completely honest, I have no idea where to begin in tackling this problem, so I was hoping for somewhere to get started. Has there been any work done in similar areas? Are there any compilers that have implemented anything like this in a general sense?
To do an optimization across two or more operations, you have to understand the
algebraic relationship between those operations. If you view operations
in their problem domain, they often have such relationships.
Your free(malloc(400)) is possible because free and malloc are inverses
in the storage allocation domain.
Lots of operations have inverses. What is needed is to teach the compiler that they are inverses,
and to demonstrate that the result of one flows unconditionally into the other.
You have to make sure that your inverses really are inverses
and there isn't a surprise somewhere; a/x*x looks like just the value a,
but if x is zero you get a trap. If you don't care about the trap, it is an inverse;
if you do care about the trap then the optimization is more complex:
(if (x==0) then trap() else a)
which is still a good optimization if you think divide is expensive.
Other "algebraic" relationships are possible. For instance, there are
many idempotent operations: zeroing a variable (setting anything to the same
value repeatedly), etc. There are operations where one operand acts
like an identity element; X+0 ==> X for any 0. If X and 0 are matrices,
this is still true and a big time saving.
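As an illustration only, the algebraic rules mentioned above can be written as rewrites over a toy expression tree. The tuple encoding and the hard-coded rules are assumptions of this sketch; a real compiler would need the relationships declared per operation and a dataflow proof that each rewrite is safe. The division rule also assumes exact arithmetic, since with truncating integer division a/x*x is not a.

```python
def simplify(expr):
    """Bottom-up rewrite of a tiny expression tree.

    Expressions are ints, variable names (str), or tuples (op, arg, ...).
    """
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]

    # Identity element: X + 0 ==> X and 0 + X ==> X
    if op == "+":
        if args[1] == 0:
            return args[0]
        if args[0] == 0:
            return args[1]

    # Inverses: free(malloc(n)) ==> nop, assuming malloc's result flows only into free
    if op == "free" and isinstance(args[0], tuple) and args[0][0] == "malloc":
        return ("nop",)

    # Guarded inverse: (a / x) * x ==> if x == 0 then trap() else a
    if (op == "*" and isinstance(args[0], tuple)
            and args[0][0] == "/" and args[0][2] == args[1]):
        a, x = args[0][1], args[1]
        return ("if", ("==", x, 0), ("trap",), a)

    return (op, *args)


print(simplify(("free", ("malloc", 400))))
print(simplify(("*", ("/", "a", "x"), "x")))
```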
Other optimizations can occur when you can reason abstractly about what the code
is doing. "Abstract interpretation" is a set of techniques for reasoning about
values by classifying results into various interesting bins (e.g., this integer
is unknown, zero, negative, or positive). To do this you need to decide what
bins are helpful, and then compute the abstract value at each point. This is useful
when there are tests on categories (e.g., "if (x<0) { ... " and you know
abstractly that x is less than zero); you can then optimize away the conditional.
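A small sketch of the sign "bins" described above, assuming unbounded mathematical integers (no overflow) and only + and * as operations. All names here are made up for the example.

```python
NEG, ZERO, POS, UNKNOWN = "neg", "zero", "pos", "unknown"


def sign_of(n):
    return NEG if n < 0 else ZERO if n == 0 else POS


def abs_add(a, b):
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b and a != UNKNOWN:
        return a            # neg + neg is neg, pos + pos is pos
    return UNKNOWN          # mixed signs or unknown operands


def abs_mul(a, b):
    if ZERO in (a, b):
        return ZERO
    if UNKNOWN in (a, b):
        return UNKNOWN
    return POS if a == b else NEG


def abs_eval(expr, env):
    """Compute the abstract sign of an int, a variable name, or (op, left, right)."""
    if isinstance(expr, int):
        return sign_of(expr)
    if isinstance(expr, str):
        return env.get(expr, UNKNOWN)
    op, left, right = expr
    l, r = abs_eval(left, env), abs_eval(right, env)
    if op == "+":
        return abs_add(l, r)
    if op == "*":
        return abs_mul(l, r)
    raise ValueError(f"unsupported operator: {op}")


def less_than_zero(expr, env):
    """Return True/False when the test `expr < 0` is decided, else None."""
    s = abs_eval(expr, env)
    if s == NEG:
        return True
    if s in (ZERO, POS):
        return False
    return None


env = {"x": NEG, "y": POS}
decided = less_than_zero(("*", "x", "y"), env)    # neg * pos is always negative
undecided = less_than_zero(("+", "x", "y"), env)  # neg + pos could be anything
```

When the result is True or False the compiler can drop the conditional and keep one branch; None means the test has to stay.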
Another way is to define what a computation is doing symbolically, and simulate the computation to see the outcome. That is how you computed the effective size of the required buffer; you computed the buffer size symbolically before the loop started,
and simulated the effect of executing the loop for all iterations.
For this you need to be able to construct symbolic formulas
representing program properties, compose such formulas, and often simplify
such formulas when they get unusably complex (this kind of fades into the abstract
interpretation scheme). You also want such symbolic computation to take into
account the algebraic properties I described above. Tools that do this well are good at constructing formulas, and program transformation systems are often good foundations for this. One source-to-source program transformation system that can be used to do this
is the DMS Software Reengineering Toolkit.
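To make the symbolic simulation concrete, here is a rough sketch for the Parenthesize example: it walks the loop's pieces and sums their lengths as a linear formula instead of as numbers. The piece encoding and function names are assumptions of the sketch and have nothing to do with how DMS or any real compiler represents formulas.

```python
from collections import Counter


def length_of(piece):
    """Symbolic length: a literal adds a constant, a variable adds 1 * len(var)."""
    kind, value = piece
    if kind == "lit":
        return Counter({"const": len(value)})
    return Counter({f"len({value})": 1})


def appended_length(pieces):
    """Simulate `foreach piece: b.Append(piece)` and add up the lengths symbolically."""
    total = Counter()
    for piece in pieces:
        total += length_of(piece)
    return total


def render(total):
    """Format the linear formula, variables first and the constant last."""
    terms = []
    for name, coeff in sorted(total.items()):
        if name == "const":
            continue
        terms.append(name if coeff == 1 else f"{coeff} * {name}")
    if total["const"]:
        terms.append(str(total["const"]))
    return " + ".join(terms) or "0"


pieces = [("lit", "("), ("var", "str"), ("lit", ")")]
capacity = render(appended_length(pieces))
print(capacity)  # len(str) + 2
```

The resulting formula is what the compiler would emit as the builder's initial capacity before the loop.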
What's hard is to decide which optimizations are worth doing, because you can end
up keeping track of vast amounts of stuff, which may not pay off. Computer cycles
are getting cheaper, and so it makes sense to track more properties of the code in the compiler.
The Broadway framework might be in the vein of what you're looking for. Papers on "source-to-source transformation" will probably also be enlightening.

Automatic spell checking of words in a text

[EDIT]In Short: How would you write an automatic spell checker? The idea is that the checker builds a list of words from a known good source (a dictionary) and automatically adds new words when they are used often enough. Words which haven't been used for a while should be phased out. So if I delete part of a scene which contains "Mungrohyperiofier", the checker should remember it for a while and when I type "Mung<Ctrl+Space>" in another scene, it should offer it again. If I don't use the word for, say, a few days, it should forget about it.
At the same time, I'd like to avoid adding typos to the dictionary.[/EDIT]
I want to write a text editor for SciFi stories. The editor should offer word completion for any word used anywhere in the current story. It will only offer a single scene of the story for editing (so you can easily move scenes around).
This means I have three sets:
The set of all words in all other scenes
The set of words in the current scene before I started editing it
The set of words in the current editor
I need to store the sets somewhere as it would be too expensive to build the list from scratch every time. I think a simple plain text file with one-word-per-line is enough for that.
As the user edits the scene, we have these situations:
She deletes a word. This word is not used anywhere else in the current scene.
She types a word which is new
She types a word which already exists
She types a word which already exists but makes a typo
She corrects a typo in a word which is in set #2.
She corrects a typo in a word which is in set #1 (i.e. the typo is elsewhere, too).
She deletes a word which she plans to use again. After the deletion, the word is no longer in the sets #1 and #3, though.
The obvious strategy would be to rebuild the word sets when a scene is saved and build set #1 from a word-list file per scene.
So my question is: Is there a clever strategy to keep words which aren't used anywhere anymore but still be able to phase out typos? If possible, this strategy should work in the background without the user even noticing what is going on (i.e. I want to avoid having to grab the mouse to select "add word to dictionary" from the menu).
[EDIT] Based on a comment from grieve
So you want to write a spelling checker. Here's Peter Norvig's paper about writing a spelling corrector. It describes a simple and robust spelling corrector. You can use the already-written part of the book, plus a reference list (say from a free dictionary) for the language model.
I would also go to existing open-source spelling checkers, such as aspell and hunspell, to get some ideas.
The structure you should use is a trie. Tail/suffix compression will help with memory. You can use a pseudo reference counting GC for keeping track of usage.
For the actual nodes, you would probably need no more than a 32-bit integer: 21 bits for the Unicode code point, and the rest for various other tags and information.
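A minimal sketch of a trie with per-word use counts, to show how completion and reference counting could fit together. It uses ordinary Python objects rather than the packed 32-bit nodes or suffix compression suggested above, it never prunes empty branches, and all names are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}  # char -> Node
        self.count = 0      # how many live uses of the word ending here


class WordTrie:
    def __init__(self):
        self.root = Node()

    def _find(self, text):
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.count += 1

    def release(self, word):
        """Drop one use; at count 0 the word is no longer offered."""
        node = self._find(word)
        if node is not None and node.count > 0:
            node.count -= 1

    def complete(self, prefix):
        start = self._find(prefix)
        if start is None:
            return []
        found, stack = [], [(prefix, start)]
        while stack:
            text, node = stack.pop()
            if node.count > 0:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((text + ch, child))
        return sorted(found)


trie = WordTrie()
for w in ["mung", "mungo", "mungo", "moon"]:
    trie.add(w)
before = trie.complete("mung")
trie.release("mung")
after = trie.complete("mung")
```

Delaying the release (for example by a few days) instead of applying it immediately would give the "remember it for a while" behavior the question asks for.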
This reminds me of what I have been told about garbage collection in modern LISP implementations:
Data, when created, is put in "pool 1".
When there is a need to garbage collect, the garbage collector looks in pool 1 for unused entries and removes them.
Then any remaining entry is moved to pool 2.
Pool 2 is examined only when more memory is needed than pool 1 can release.
Data from pool 2 that survives a garbage collection is put in pool 3, and so on.
The idea is to dynamically place the data in a pool corresponding to its lifetime...
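Applied to the word list, the pool idea might look roughly like this. It is one possible reading of the analogy, not a description of any real garbage collector: `WordPools`, the number of generations, and the `in_use` set are all assumptions of the sketch.

```python
class WordPools:
    def __init__(self, generations=3):
        self.pools = [set() for _ in range(generations)]

    def add(self, word):
        # New words always start in the youngest pool.
        if not any(word in pool for pool in self.pools):
            self.pools[0].add(word)

    def collect(self, in_use, level=0):
        """Sweep one pool: drop unused words, promote the survivors."""
        pool = self.pools[level]
        survivors = {w for w in pool if w in in_use}
        pool.clear()
        target = min(level + 1, len(self.pools) - 1)  # oldest pool keeps its own
        self.pools[target] |= survivors

    def known(self):
        return set().union(*self.pools)


pools = WordPools()
pools.add("Mungrohyperiofier")
pools.add("teh")  # a typo that gets corrected before the next sweep
pools.collect(in_use={"Mungrohyperiofier"})
after_first = pools.known()
pools.collect(in_use=set())  # word deleted, but only pool 0 is swept
after_second = pools.known()
```

A typo that is corrected quickly never leaves pool 0 and disappears at the next sweep, while a word that survived a sweep sits in an older pool that is examined less often, so it is remembered for longer after being deleted.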