Terminology: opposite of "zero copy"?

We're benchmarking some code that we've converted to use sendfile(), the Linux zero-copy system call. What's the term for the traditional read()/write() loop that sendfile() replaces? I.e., in our report I want to say "zero-copy is X millisecs, and ??? is Y millisecs." What word or phrase should I use?
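For concreteness, here is a minimal sketch of the two variants being timed, assuming Linux (sendfile() into a regular file needs kernel 2.6.33 or later). copy_readwrite and copy_sendfile are hypothetical helper names, and error handling is reduced to returning -1.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Traditional approach: each chunk is copied from the kernel into a
   user-space buffer by read() and back into the kernel by write(). */
ssize_t copy_readwrite(int out_fd, int in_fd)
{
    char buf[64 * 1024];
    ssize_t total = 0, n;
    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                       /* write() may be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}

/* Zero-copy approach: the kernel moves the data itself; nothing passes
   through a user-space buffer. Uses an explicit offset, so the file
   position of in_fd is left untouched. */
ssize_t copy_sendfile(int out_fd, int in_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;
    off_t offset = 0;
    ssize_t total = 0;
    while (offset < st.st_size) {
        ssize_t s = sendfile(out_fd, in_fd, &offset, (size_t)(st.st_size - offset));
        if (s <= 0)                             /* error, or file shrank */
            return -1;
        total += s;
    }
    return total;
}
```

In the read()/write() variant every byte crosses the user/kernel boundary twice; in the sendfile() variant it stays in the kernel, which is the copy the benchmark is meant to eliminate.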

"Traditional data copying approach"

Programmed I/O (PIO) would be appropriate, I think.

Just "copy"?


What's that CS "big word" term for the same action always having the same effect

There's a computer science term for this that escapes me, one of those words that ends with "-icity".
It means something like: a given action will always produce the same result, i.e. there won't be any hysteresis, or the action will not alter the functioning of the system...
Ring a bell, anyone? Thanks.
Apologies for the tagging: I'm only tagging it Java because I learned about this in a Java class back in school, and I figure that crowd tends to have more CS background...
This could mean two different things:
deterministic - meaning that given the same initial state, the same operation (with exactly the same data) will always produce the same resulting state (and optional output). - http://en.wikipedia.org/wiki/Deterministic_algorithm
i.e. the same action has the same effect, assuming you start from the same place in the same system. (Nothing random about it, nothing fed in from the outside that could affect the result...)
idempotent - meaning that applying a function once, e.g. f(x) = v, produces the same result as applying it multiple times, e.g. f(f(f(x))) = v - http://en.wikipedia.org/wiki/Idempotence
i.e. one or more applications of the function yield the same value, given the same initial value.
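To make the distinction concrete, a small C sketch (the function names are illustrative): clamp_nonneg is idempotent, increment is deterministic but not idempotent, and noisy is not deterministic in the sense above because its result also depends on hidden generator state.

```c
#include <stdlib.h>

/* Idempotent: once the value is non-negative, applying the function again
   changes nothing, so clamp_nonneg(clamp_nonneg(x)) == clamp_nonneg(x). */
int clamp_nonneg(int x)
{
    return x < 0 ? 0 : x;
}

/* Deterministic but not idempotent: the same input always gives the same
   output, but every further application changes the value again. */
int increment(int x)
{
    return x + 1;
}

/* Depends on hidden state (the rand() generator), not only on the
   argument, so two calls with the same x can return different values. */
int noisy(int x)
{
    return x + rand() % 2;
}
```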
You mean idempotent?
Referential transparency is also used in some CS circles.
Nullipotent?
Deterministic.
Are you looking for invariant?
http://en.wikipedia.org/wiki/Invariant_%28computer_science%29
In computer science, a predicate is called an invariant to a sequence of operations if the predicate always evaluates at the end of the sequence to the same value as before starting the sequence.
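A small C illustration of that definition, using an illustrative predicate (the sum of an array) that a sequence of swap operations cannot change:

```c
#include <stddef.h>

/* The quantity we expect to be invariant: the sum of the elements. */
long sum(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* A sequence of operations (swaps) that reverses the array in place.
   A swap never changes the sum, so "sum(a, n) == S" evaluates to the same
   value after the whole sequence as before it. */
void reverse(int *a, size_t n)
{
    for (size_t i = 0, j = n; i + 1 < j; i++, j--) {
        int tmp = a[i];
        a[i] = a[j - 1];
        a[j - 1] = tmp;
    }
}
```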
Side-effect-free?
In math, a function 'f' is idempotent if multiple applications do not change the result.
You mean idempotence?
“...or the action will not alter the functioning of the system...”
Are you looking for ‘idempotence’?
The "ends with -icity" part of your question makes me think you might be looking for monotonicity, even though it does not quite match the description/definition of the word. From the Wikipedia article:
In mathematics, a monotonic function (or monotone function) is a function which preserves the given order. This concept first arose in calculus, and was later generalized to the more abstract setting of order theory.
The Wikipedia article illustrates this with plots of three functions, A, B and C (the images are not reproduced here).
A and B are both monotonic (increasing and decreasing, respectively), while C is not monotonic.
You mean an atomic block of code?
The A in ACID.
Atomicity - states that database modifications must follow an “all or nothing” rule. Each transaction is said to be “atomic.” If one part of the transaction fails, the entire transaction fails.
It sounds like what you're describing would be a memoryless function. Although the term memorylessness is usually used for stochastic distributions, I don't quite remember if it has a programming equivalent...

Any reason to use hex notation for null pointers?

I'm currently improving the part of our COM component that logs all external calls into a file. For pointers we write something like (IInterface*)0x12345678 with the value being equal to the actual address.
Currently no difference is made for null pointers - they are displayed as 0x0 which IMO is suboptimal and inelegant. Changing this behaviour is not a problem at all. But first I'd like to know - is there any real advantage in representing null pointers in hex?
In C or C++, you should be able to use the standard %p formatting code, which will then make your pointers look like everybody else's.
I'm not sure how null pointers are formatted by %p on Win32; on Linux with glibc you get "(nil)".
Using the notation 0x0 (IMO) makes it clearer that it's referring to an address (even if it's not the internal representation of the null pointer). (In actual code, I would prefer to use the NULL macro, though, but it sounds like you're talking specifically about debugging spew.)
It gives some context, just like I prefer using '\0' for the NUL-terminator.
It's a stylistic preference, though, so do what appeals to you (and to your colleagues).
Personally, I'd print 0x0 to the log file[*]. Some day when someone comes to parse the file automatically, the more uniform the data is the better. I don't find 0x0 difficult to read, so it seems silly to have a special case in the writer code, and another special case in the reader code, for no benefit that I can think of.
0x0 is preferable to 0 for grepping the log for NULLs, too: saves you having to figure out that you should be grepping for )0 or something funny.
I wouldn't write 0x0 for a null pointer constant in C or C++, though. I write non-null addresses so unbelievably rarely that there's nothing for the nulls to be uniform with. I guess if I was defining a bunch of constants to represent the memory map of some device, and the zero address was significant in that memory map, then I might write it 0x0 in that context.
[*] Or perhaps 0x00000000. I like 32-bit pointers to be printed 8 chars long, because when I read/remember a pointer I start out in pairs from the left. If it turns out to have 7 chars, I get horribly confused at the end ;-). 64-bit pointers it doesn't matter, because I can't remember a number that long anyway...
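A sketch of the fixed-width style suggested in this answer, assuming C99 <inttypes.h>; format_ptr is a hypothetical helper name. The digit count is derived from the pointer size, so NULL comes out as (IInterface*)0x00000000 on a 32-bit build and with 16 zeros on a 64-bit build:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Writes "(Type*)0x" followed by two hex digits per byte of a pointer,
   zero-padded, into out. Returns snprintf's result. */
int format_ptr(char *out, size_t size, const char *type, const void *p)
{
    return snprintf(out, size, "(%s*)0x%0*" PRIxPTR,
                    type, (int)(2 * sizeof p), (uintptr_t)p);
}
```

Because every pointer is printed with the same width, grepping the log for NULLs becomes a search for one fixed string, and a parser needs no special case.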
It's all positive zero in the end.
There is: you can always convert them back to a number (0) with no additional effort. The only disadvantage is readability.
There is no reason to prefer (SomeType*)0x0 to (SomeType*)0.
As an aside: In C, the null pointer constant is a somewhat strange construct; the compiler recognizes (SomeType*)0 as "the null pointer", even if the internal representation on some machine might differ from the numerical value 0. It is more like NULL in SQL -- not a "real" pointer value. In practice, all machines I know of model the null pointer as the "0" address.
I am pretty sure the hex notation is a result of the layout of memory. Memory is word-aligned, where a word is 32 bits if you are on a 32-bit processor. These words are grouped into pages, which are arranged in page tables, etc. Hex notation is the only way to make sense of these arrangements (unless you really like using your calculator).
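To illustrate why hex lines up with this layout, assuming 4 KiB pages (common, but platform-dependent): the offset within a page is exactly the last three hex digits, so 0x12345678 splits by eye into page 0x12345 and offset 0x678, whereas in decimal (305419896) the split is invisible. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumes 4 KiB pages: the low 12 bits (three hex digits) are the offset
   within the page; the remaining bits are the page number. */
#define PAGE_SHIFT_4K       12u
#define PAGE_OFFSET_MASK_4K ((UINT32_C(1) << PAGE_SHIFT_4K) - 1u)

uint32_t page_number(uint32_t addr)
{
    return addr >> PAGE_SHIFT_4K;
}

uint32_t page_offset(uint32_t addr)
{
    return addr & PAGE_OFFSET_MASK_4K;
}
```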
In my opinion, it's about readability. Think about it: if you were to look at 0, what does that mean? Does it mean it's an unsigned integer? If it were 0x0, then instinctively it has something to do with binary notation, more likely something platform-dependent.
Since the tag is language-agnostic, note that the null pointer is spelled differently from language to language: in Delphi/Object Pascal it is 'nil', in C# it is 'null', and in C/C++ it is 'NULL'.
Look, for example, at Section 5 of the C FAQ, on null pointers, specifically questions 5.4, 5.5, 5.6 and 5.7, for insight into this.
In a nutshell, the usage and notation of null pointers depend on:
the language used;
the semantics and syntax of the language specification;
the type of compiler;
the type of platform, in terms of how memory is accessed, the processor, its bit width...
Hope this helps,
Best regards,
Tom.

2D non-polynomial function fitting from the command line

I just wrote a simple Unix command line utility that could be implemented a lot more efficiently. I can measure its performance by just running it on a number of inputs and measuring the time it takes. This will produce a set of pairs of numbers, s t, where s is the input size and t the processing time. In order to determine the performance characteristics of my utility, I need to fit a function through these data points. I can do this manually, but I prefer to be lazy and let a utility do it for me.
Does such a utility exist?
Its input is a sequence of pairs of numbers.
Its output is a formula that expresses how the second number depends on the first, plus an error measure.
One step of the way is to have a utility that does this just for polynomials.
This has been discussed here but it didn't produce a ready-to-use solution.
The next step is to extend the utility to try non-polynomial terms: negative-degree polynomials (as in y = 1/x) and logarithmic terms (as in y = x log x) will need to be tried as well. One idea to cope with the non-polynomial terms is to just surround the polynomial fitting with x and y scale transformations. I don't know whether that will do. This question is related but not exactly the same.
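Not the ready-made tool being asked for, but a minimal sketch of the "scale transformation" idea from the question, to show how little machinery it needs: for each candidate term g(s), fit t = a + b*g(s) by ordinary least squares and keep the candidate with the smallest residual sum of squares. The candidate list and all names are assumptions; an s*log(s) candidate would be added the same way using log() from <math.h> (link with -lm) and is left out only to keep the sketch free of libm.

```c
#include <stddef.h>

/* Candidate terms g(s) for the model t = a + b*g(s). */
static double g_linear(double s)     { return s; }
static double g_quadratic(double s)  { return s * s; }
static double g_cubic(double s)      { return s * s * s; }
static double g_reciprocal(double s) { return 1.0 / s; }   /* needs s != 0 */

struct fit { const char *name; double a, b, rss; };

/* Ordinary least squares for t = a + b*g(s); rss is the residual sum of
   squares. n must be > 0. */
static struct fit fit_one(const char *name, double (*g)(double),
                          const double *s, const double *t, size_t n)
{
    double sg = 0, st = 0, sgg = 0, sgt = 0;
    for (size_t i = 0; i < n; i++) {
        double x = g(s[i]);
        sg += x; st += t[i]; sgg += x * x; sgt += x * t[i];
    }
    struct fit f = { name, 0, 0, 0 };
    double denom = n * sgg - sg * sg;           /* 0 if all g(s) are equal */
    f.b = denom != 0 ? (n * sgt - sg * st) / denom : 0;
    f.a = (st - f.b * sg) / n;
    for (size_t i = 0; i < n; i++) {
        double r = t[i] - (f.a + f.b * g(s[i]));
        f.rss += r * r;
    }
    return f;
}

/* Tries every candidate term and returns the best-fitting one. */
struct fit best_fit(const double *s, const double *t, size_t n)
{
    static const struct { const char *name; double (*g)(double); } cand[] = {
        { "s", g_linear }, { "s^2", g_quadratic },
        { "s^3", g_cubic }, { "1/s", g_reciprocal },
    };
    struct fit best = fit_one(cand[0].name, cand[0].g, s, t, n);
    for (size_t k = 1; k < sizeof cand / sizeof cand[0]; k++) {
        struct fit f = fit_one(cand[k].name, cand[k].g, s, t, n);
        if (f.rss < best.rss)
            best = f;
    }
    return best;
}
```

Real timing data is noisy and often needs several terms at once (e.g. a + b*s + c*s*log(s)), which calls for multiple regression rather than this one-term comparison.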
As I said, I'm lazy: I'm not looking for ideas on how to write this myself, I'm looking for a reliable result of a project that has already done it for me. Any suggestions?
I believe that SAS has this, RS/1 has this, and I think Mathematica has this. Excel and most spreadsheets have a primitive form of it, and there are usually add-ons available for more advanced forms. Lots of lab-analysis and statistical-analysis tools have features like this.
Re: command-line tools:
SAS, RS/1 and Minitab were all command line tools 20 years ago when I used them. I bet at least one of them still has this capability.

Can every recursion be converted into iteration?

A reddit thread brought up an apparently interesting question:
Tail-recursive functions can trivially be converted into iterative functions. Others can be transformed by using an explicit stack. Can every recursion be transformed into iteration?
The (counter?)example in the post is the pair:
(define (num-ways x y)
  (cond ((= x 0) 1)
        ((= y 0) 1)
        (else (num-ways2 x y))))
(define (num-ways2 x y)
  (+ (num-ways (- x 1) y)
     (num-ways x (- y 1))))
Can you always turn a recursive function into an iterative one? Yes, absolutely. If memory serves, this follows from the equivalence results behind the Church-Turing thesis: in lay terms, what is computable by recursive functions is computable by an iterative model (such as the Turing machine), and vice versa. This does not tell you precisely how to do the conversion, but it does say that it's definitely possible.
In many cases, converting a recursive function is easy. Knuth offers several techniques in "The Art of Computer Programming". And often, a thing computed recursively can be computed by a completely different approach in less time and space. The classic example of this is Fibonacci numbers or sequences thereof. You've surely met this problem in your degree plan.
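For the Fibonacci example, a C sketch of both forms (the function names are illustrative): the naive recursive definition makes an exponential number of calls, while the iterative version carries the last two values forward.

```c
/* Naive recursion: the number of calls grows exponentially with n. */
unsigned long long fib_rec(unsigned n)
{
    return n < 2 ? n : fib_rec(n - 1) + fib_rec(n - 2);
}

/* Iterative version: O(n) time, O(1) space. Exact up to n = 93 with a
   64-bit unsigned long long. */
unsigned long long fib_iter(unsigned n)
{
    unsigned long long a = 0, b = 1;   /* fib(i), fib(i + 1) */
    for (unsigned i = 0; i < n; i++) {
        unsigned long long next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```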
On the flip side of this coin, we can certainly imagine a programming system so advanced as to treat a recursive definition of a formula as an invitation to memoize prior results, thus offering the speed benefit without the hassle of telling the computer exactly which steps to follow in the computation of a formula with a recursive definition. Dijkstra almost certainly did imagine such a system. He spent a long time trying to separate the implementation from the semantics of a programming language. Then again, his non-deterministic and multiprocessing programming languages are in a league above the practicing professional programmer.
In the final analysis, many functions are just plain easier to understand, read, and write in recursive form. Unless there's a compelling reason, you probably shouldn't (manually) convert these functions to an explicitly iterative algorithm. Your computer will handle that job correctly.
I can see one compelling reason. Suppose you've a prototype system in a super-high level language like [donning asbestos underwear] Scheme, Lisp, Haskell, OCaml, Perl, or Pascal. Suppose conditions are such that you need an implementation in C or Java. (Perhaps it's politics.) Then you could certainly have some functions written recursively but which, translated literally, would explode your runtime system. For example, infinite tail recursion is possible in Scheme, but the same idiom causes a problem for existing C environments. Another example is the use of lexically nested functions and static scope, which Pascal supports but C doesn't.
In these circumstances, you might try to overcome political resistance to the original language. You might find yourself reimplementing Lisp badly, as in Greenspun's (tongue-in-cheek) tenth law. Or you might just find a completely different approach to solution. But in any event, there is surely a way.
Is it always possible to write a non-recursive form for every recursive function?
Yes. A simple formal proof is to show that µ-recursion and a non-recursive calculus such as GOTO are both Turing complete. Since all Turing-complete calculi are strictly equivalent in expressive power, all recursive functions can be implemented by the non-recursive Turing-complete calculus.
Unfortunately, I’m unable to find a good, formal definition of GOTO online, so here’s one:
A GOTO program is a sequence of commands P executed on a register machine, where each command is one of the following:
HALT, which halts execution
r = r + 1 where r is any register
r = r – 1 where r is any register
GOTO x where x is a label
IF r ≠ 0 GOTO x where r is any register and x is a label
A label, followed by any of the above commands.
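One way to see that such programs need no recursion is to write their interpreter, which is itself a single loop. A minimal C sketch, assuming labels are represented as instruction indices and that decrementing a zero register leaves it at zero (a common register-machine convention, but an assumption here); all names are hypothetical:

```c
#include <stddef.h>

/* One command of the GOTO language; labels are instruction indices. */
enum op { HALT, INC, DEC, GOTO, IFNZ_GOTO };
struct cmd { enum op op; int r; int target; };

/* Runs prog on registers reg[] with a single loop:
   fetch an instruction, evaluate it, repeat. */
void run(const struct cmd *prog, unsigned long *reg)
{
    size_t pc = 0;
    for (;;) {
        const struct cmd *c = &prog[pc++];          /* fetch */
        switch (c->op) {                            /* evaluate */
        case HALT:      return;
        case INC:       reg[c->r]++; break;
        case DEC:       if (reg[c->r] > 0) reg[c->r]--; break;
        case GOTO:      pc = (size_t)c->target; break;
        case IFNZ_GOTO: if (reg[c->r] != 0) pc = (size_t)c->target; break;
        }
    }
}

/* Example program: r0 = r0 + r1 (r1 is consumed). */
static const struct cmd add_prog[] = {
    /* 0 */ { IFNZ_GOTO, 1, 2 },   /* if r1 != 0 goto 2 */
    /* 1 */ { HALT, 0, 0 },
    /* 2 */ { DEC, 1, 0 },         /* r1 = r1 - 1 */
    /* 3 */ { INC, 0, 0 },         /* r0 = r0 + 1 */
    /* 4 */ { GOTO, 0, 0 },        /* goto 0 */
};
```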
However, the conversion between recursive and non-recursive functions isn’t always trivial (except by mindless manual re-implementation of the call stack).
For further information see this answer.
Recursion is implemented as stacks or similar constructs in the actual interpreters or compilers. So you can certainly convert a recursive function to an iterative counterpart, because that's how it's always done (if automatically). You'll just be duplicating the compiler's work in an ad hoc and probably very ugly and inefficient manner.
Basically yes. In essence, what you end up having to do is replace method calls (which implicitly push state onto the stack) with explicit stack pushes that remember where the 'previous call' had got up to, and then execute the 'called method' instead.
I'd imagine that the combination of a loop, a stack and a state machine could be used for all scenarios by basically simulating the method calls. Whether or not this is going to be 'better' (either faster, or more efficient in some sense) is not really possible to say in general.
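A sketch of that loop/stack/state-machine combination, applied to the num-ways pair from the question and translated to C (assuming non-negative arguments, hence unsigned long; the names are illustrative). Each stack frame records which step of the body it has reached, the information a real call stack keeps through the return address:

```c
#include <stddef.h>
#include <stdlib.h>

/* One simulated call of num-ways. */
struct frame {
    unsigned long x, y;
    int state;                 /* 0 = just called, 1 = first call returned,
                                  2 = second call returned */
    unsigned long long acc;    /* result of the first recursive call */
};

/* Iterative num-ways. Returns 0 only if the stack cannot be allocated
   (real results are always >= 1). */
unsigned long long num_ways_iter(unsigned long x, unsigned long y)
{
    size_t cap = (size_t)x + (size_t)y + 2;     /* deepest chain of nested calls */
    struct frame *st = malloc(cap * sizeof *st);
    if (st == NULL)
        return 0;
    size_t top = 0;
    unsigned long long ret = 0;   /* value "returned" by the call that just finished */
    st[top++] = (struct frame){ x, y, 0, 0 };
    while (top > 0) {
        struct frame *f = &st[top - 1];
        switch (f->state) {
        case 0:                                   /* entering the body */
            if (f->x == 0 || f->y == 0) {         /* base cases */
                ret = 1;
                top--;                            /* return to caller */
                break;
            }
            f->state = 1;
            st[top++] = (struct frame){ f->x - 1, f->y, 0, 0 };   /* num-ways(x-1, y) */
            break;
        case 1:                                   /* first call returned */
            f->acc = ret;
            f->state = 2;
            st[top++] = (struct frame){ f->x, f->y - 1, 0, 0 };   /* num-ways(x, y-1) */
            break;
        default:                                  /* second call returned */
            ret = f->acc + ret;
            top--;
            break;
        }
    }
    free(st);
    return ret;
}
```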
Recursive function execution flow can be represented as a tree.
The same logic can be done by a loop, which uses a data-structure to traverse that tree.
Depth-first traversal can be done using a stack, breadth-first traversal can be done using a queue.
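A C sketch of both traversals on a binary tree, with the tree size capped at 64 nodes to keep the example short (an assumption of this sketch, not a requirement of the technique); the names are illustrative:

```c
#include <stddef.h>

struct node { int value; struct node *left, *right; };

/* Depth-first (pre-order) traversal with an explicit stack instead of
   recursion. Writes visited values to out and returns how many. */
size_t dfs(struct node *root, int *out)
{
    struct node *stack[64];
    size_t top = 0, n = 0;
    if (root) stack[top++] = root;
    while (top > 0) {
        struct node *cur = stack[--top];
        out[n++] = cur->value;
        if (cur->right) stack[top++] = cur->right;   /* pushed first, so ... */
        if (cur->left)  stack[top++] = cur->left;    /* ... left is visited first */
    }
    return n;
}

/* Breadth-first (level-order) traversal: the same loop with a FIFO queue. */
size_t bfs(struct node *root, int *out)
{
    struct node *queue[64];
    size_t head = 0, tail = 0, n = 0;
    if (root) queue[tail++] = root;
    while (head < tail) {
        struct node *cur = queue[head++];
        out[n++] = cur->value;
        if (cur->left)  queue[tail++] = cur->left;
        if (cur->right) queue[tail++] = cur->right;
    }
    return n;
}
```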
So, the answer is: yes. Why: https://stackoverflow.com/a/531721/2128327.
Can any recursion be done in a single loop? Yes, because a Turing machine does everything it does by executing a single loop:
1. fetch an instruction,
2. evaluate it,
3. goto 1.
Yes, using explicitly a stack (but recursion is far more pleasant to read, IMHO).
Yes, it's always possible to write a non-recursive version. The trivial solution is to use a stack data structure and simulate the recursive execution.
In principle it is always possible to remove recursion and replace it with iteration in a language that has infinite state both for data structures and for the call stack. This is a basic consequence of the Church-Turing thesis.
Given an actual programming language, the answer is not as obvious. The problem is that it is quite possible to have a language where the amount of memory that can be allocated in the program is limited but where the amount of call stack that can be used is unbounded (32-bit C, where the address of stack variables is not accessible). In this case, recursion is more powerful simply because it has more memory it can use; there is not enough explicitly allocatable memory to emulate the call stack. For more detail on this, see this discussion.
All computable functions can be computed by Turing Machines and hence the recursive systems and Turing machines (iterative systems) are equivalent.
Sometimes replacing recursion is much easier than that. Recursion used to be the fashionable thing taught in CS in the 1990s, so a lot of average developers from that time figured that if you solved something with recursion, it was a better solution. So they would use recursion instead of looping backwards to reverse order, or silly things like that. So sometimes removing recursion is a simple "duh, that was obvious" type of exercise.
This is less of a problem now, as the fashion has shifted towards other technologies.
Recursion is nothing more than calling the same function again: each call pushes a frame onto the stack, and once the call returns, its frame is removed from the stack. So one can always use an explicit stack to manage these calls of the same operation using iteration.
So yes, all recursive code can be converted to iteration.
Removing recursion is a complex problem and is feasible under well-defined circumstances.
The following cases are among the easy ones:
tail recursion
direct linear recursion
Apart from the explicit stack, another pattern for converting recursion into iteration is the use of a trampoline.
Here, the functions either return the final result or a closure of the function call that they would otherwise have performed. Then the initiating (trampolining) function keeps invoking the returned closures until the final result is reached.
This approach works for mutually recursive functions, but I'm afraid it only works for tail-calls.
http://en.wikipedia.org/wiki/Trampoline_(computers)
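C has no closures, so in this sketch "the function call it would otherwise have performed" is represented as a function pointer plus its argument; that representation is an assumption of this example, not part of the pattern. The mutually recursive is_even/is_odd pair is illustrative:

```c
#include <stdbool.h>
#include <stddef.h>

/* Either a finished result, or the next call to make. */
struct step {
    bool done;
    bool result;
    struct step (*next)(unsigned long);
    unsigned long arg;
};

static struct step is_odd_step(unsigned long n);

/* Written in tail-call style: instead of calling each other directly,
   the functions return the call to perform next. */
static struct step is_even_step(unsigned long n)
{
    if (n == 0) return (struct step){ true, true, NULL, 0 };
    return (struct step){ false, false, is_odd_step, n - 1 };
}

static struct step is_odd_step(unsigned long n)
{
    if (n == 0) return (struct step){ true, false, NULL, 0 };
    return (struct step){ false, false, is_even_step, n - 1 };
}

/* The trampoline: keeps invoking the returned calls until a result
   appears, so the C call stack does not grow with n. */
bool trampoline(struct step s)
{
    while (!s.done)
        s = s.next(s.arg);
    return s.result;
}

bool is_even(unsigned long n)
{
    return trampoline(is_even_step(n));
}
```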
I'd say yes: a function call is nothing but a goto and a stack operation (roughly speaking). All you need to do is imitate the stack that's built while invoking functions and do something similar to a goto (you can imitate gotos in languages that don't explicitly have this keyword, too).
Have a look at the following entries on wikipedia, you can use them as a starting point to find a complete answer to your question.
Recursion in computer science
Recurrence relation
Here is a paragraph that may give you a hint on where to start:
Solving a recurrence relation means obtaining a closed-form solution: a non-recursive function of n.
Also have a look at the last paragraph of this entry.
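Applied to the num-ways pair from the question (assuming non-negative integer arguments), the recurrence is Pascal's rule, so it has a closed form:

```latex
\begin{aligned}
N(x, y) &=
  \begin{cases}
    1, & x = 0 \text{ or } y = 0,\\
    N(x-1, y) + N(x, y-1), & \text{otherwise,}
  \end{cases}\\
N(x, y) &= \binom{x+y}{x},
  \quad\text{since } \binom{y}{0} = \binom{x}{x} = 1
  \text{ and } \binom{x+y-1}{x-1} + \binom{x+y-1}{x} = \binom{x+y}{x}.
\end{aligned}
```

For example, N(2, 2) = C(4, 2) = 6, computed without any recursion at all.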
It is possible to convert any recursive algorithm to a non-recursive one, but often the logic is much more complex and doing so requires the use of a stack. In fact, recursion itself uses a stack: the function stack.
More Details: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Functions
tazzego, recursion means that a function will call itself whether you like it or not. When people are talking about whether or not things can be done without recursion, they mean this and you cannot say "no, that is not true, because I do not agree with the definition of recursion" as a valid statement.
With that in mind, just about everything else you say is nonsense. The only other thing you say that is not nonsense is that you cannot imagine programming without a call stack. Yet programming without one was done for decades before call stacks became popular: old versions of FORTRAN lacked a call stack, and they worked just fine.
By the way, there exist Turing-complete languages that offer only recursion as a means of looping (e.g. Haskell, which has no built-in loop construct). There also exist Turing-complete languages that offer only iteration as a means of looping (e.g. FORTRAN IV). Because both kinds of language are Turing complete (the equivalence usually cited alongside the Church-Turing thesis), anything possible in a recursion-only language can be done in a non-recursive language and vice versa.
Here is an iterative algorithm:
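def howmany(x, y)
  a = {}
  # Fill the table one anti-diagonal at a time: every cell with m + k == n
  # only needs cells with m + k == n - 1, which the previous pass filled.
  for n in (0..x+y)
    for m in (0..n)
      a[[m, n-m]] = if m == 0 or n-m == 0 then 1 else a[[m-1, n-m]] + a[[m, n-m-1]] end
    end
  end
  return a[[x, y]]
end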

What exactly is the danger of using magic debug values (such as 0xDEADBEEF) as literals?

It goes without saying that using hard-coded, hex literal pointers is a disaster:
int *i = 0xDEADBEEF;
// god knows if that location is available
However, what exactly is the danger in using hex literals as variable values?
int i = 0xDEADBEEF;
// what can go wrong?
If these values are indeed "dangerous" due to their use in various debugging scenarios, then this means that even if I do not use these literals, any program that during runtime happens to stumble upon one of these values might crash.
Anyone care to explain the real dangers of using hex literals?
Edit: just to clarify, I am not referring to the general use of constants in source code. I am specifically talking about debugging-scenario issues that might arise from the use of hex values, with 0xDEADBEEF as the specific example.
There's no more danger in using a hex literal than any other kind of literal.
If your debugging session ends up executing data as code without you intending it to, you're in a world of pain anyway.
Of course, there's the normal "magic value" vs "well-named constant" code smell/cleanliness issue, but that's not really the sort of danger I think you're talking about.
With few exceptions, nothing is "constant".
We prefer to call them "slow variables" -- their value changes so slowly that we don't mind recompiling to change them.
However, we don't want to have many instances of 0x07 all through an application or a test script, where each instance has a different meaning.
We want to put a label on each constant that makes it totally unambiguous what it means.
if( x == 7 )
What does "7" mean in the above statement? Is it the same thing as
d = y / 7;
Is that the same meaning of "7"?
Test Cases are a slightly different problem. We don't need extensive, careful management of each instance of a numeric literal. Instead, we need documentation.
We can -- to an extent -- explain where "7" comes from by including a tiny bit of a hint in the code.
assertEquals( 7, someFunction(3,4), "Expected 7, see paragraph 7 of use case 7" );
A "constant" should be stated -- and named -- exactly once.
A "result" in a unit test isn't the same thing as a constant, and requires a little care in explaining where it came from.
A hex literal is no different than a decimal literal like 1. Any special significance of a value is due to the context of a particular program.
I believe the concern raised in the IP address formatting question earlier today was not related to the use of hex literals in general, but the specific use of 0xDEADBEEF. At least, that's the way I read it.
There is a concern with using 0xDEADBEEF in particular, though in my opinion it is a small one. The problem is that many debuggers and runtime systems have already co-opted this particular value as a marker value to indicate unallocated heap, bad pointers on the stack, etc.
I don't recall off the top of my head just which debugging and runtime systems use this particular value, but I have seen it used this way several times over the years. If you are debugging in one of these environments, the 0xDEADBEEF constant in your code will be indistinguishable from the values in unallocated RAM or whatever, so at best your RAM dumps will be less useful, and at worst you will get warnings from the debugger.
Anyhow, that's what I think the original commenter meant when he told you it was bad for "use in various debugging scenarios."
There's no reason why you shouldn't assign 0xdeadbeef to a variable.
But woe betide the programmer who tries to assign decimal 3735928559, or octal 33653337357, or worst of all: binary 11011110101011011011111011101111.
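A quick C check of this answer's point, that these spellings all denote one and the same number (the constant names are illustrative; standard C has no binary literal before C23, so the binary form appears only in a comment). uint32_t is used because 0xDEADBEEF is larger than INT_MAX for a 32-bit int:

```c
#include <stdint.h>

/* Binary: 11011110101011011011111011101111 */
static const uint32_t MAGIC_HEX = 0xDEADBEEF;
static const uint32_t MAGIC_DEC = 3735928559u;
static const uint32_t MAGIC_OCT = 033653337357;
```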
Big Endian or Little Endian?
One danger is when constants are assigned to an array or structure with different-sized members; the endianness of the compiler or machine (including JVM vs CLR) will affect the ordering of the bytes.
This issue is true of non-constant values, too, of course.
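Here's an admittedly contrived example. What is the value of buffer[0] after the last line?
#include <stdint.h>
#include <string.h>
const uint32_t TEST[] = { 0x01BADA55, 0xDEADBEEF };  /* uint32_t: 0xDEADBEEF does not fit in a 32-bit int */
unsigned char buffer[sizeof(TEST)];
memcpy(buffer, TEST, sizeof(TEST));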
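On a little-endian machine such as x86, buffer[0] in that example ends up as 0x55 (the low byte of 0x01BADA55); on a big-endian machine it is 0x01. A sketch of how to check the host order, and of a byte-order-independent alternative (the function names are hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Reports the host byte order by inspecting the first byte in memory of
   a known 32-bit value. */
int is_little_endian(void)
{
    const uint32_t probe = 0x01BADA55;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x55;      /* little-endian stores the low byte first */
}

/* Byte-order-independent serialization: most significant byte first, so
   buf[0] is the top byte of v on every machine. */
void put_be32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}
```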
I don't see any problem with using it as a value. It's just a number, after all.
There's no danger in using a hard-coded hex value for a pointer (like your first example) in the right context. In particular, when doing very low-level hardware development, this is the way you access memory-mapped registers. (Though it's best to give them names with a #define, for example.) But at the application level you shouldn't ever need to do an assignment like that.
I use CAFEBABE
I haven't seen it used by any debuggers before.
int *i = 0xDEADBEEF;
// god knows if that location is available
int i = 0xDEADBEEF;
// what can go wrong?
The danger that I see is the same in both cases: you've created a flag value that has no immediate context. There's nothing about i in either case that will remind me, 100, 1000 or 10000 lines later, that there is a potentially critical flag value associated with it. What you've planted is a landmine bug: if I don't remember to check for it in every possible use, I could be faced with a terrible debugging problem. Every use of i will now have to look like this:
if (i != 0xDEADBEEF) { // Curse the original designer to oblivion
// Actual useful work goes here
}
Repeat the above for all of the 7000 instances where you need to use i in your code.
Now, why is the above worse than this?
if (isIProperlyInitialized()) { // Which could just be a boolean
// Actual useful work goes here
}
At a minimum, I can spot several critical issues:
Spelling: I'm a terrible typist. How easily will you spot 0xDAEDBEEF in a code review? Or 0xDEADBEFF? On the other hand, I know that my compiler will barf immediately on isIProperlyInitialised() (insert the obligatory s vs. z debate here).
Exposure of meaning. Rather than trying to hide your flags in the code, you've intentionally created a method that the rest of the code can see.
Opportunities for coupling. It's entirely possible that a pointer or reference is connected to a loosely defined cache. An initialization check could be overloaded to check first if the value is in cache, then to try to bring it back into cache and, if all that fails, return false.
In short, it's just as easy to write the code you really need as it is to create a mysterious magic value. The code-maintainer of the future (who quite likely will be you) will thank you.
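A C sketch of the alternative this answer recommends: the "not yet initialized" state lives in its own named field instead of being encoded as a magic value, so every int value remains usable as real data. The struct and function names are hypothetical, with is_properly_initialized standing in for isIProperlyInitialized():

```c
#include <stdbool.h>

struct checked_int {
    int value;
    bool initialized;          /* explicit state instead of a sentinel */
};

void checked_set(struct checked_int *c, int v)
{
    c->value = v;
    c->initialized = true;
}

bool is_properly_initialized(const struct checked_int *c)
{
    return c->initialized;
}
```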