Octave force deepcopy - octave

The question
What are the ways of coercing octave to create a real copy of whatever object? Structures are the main interest.
My underlying problem
In my problem I'm obtaining a rather large structure from another function in a loop but for the current task only a few pieces of it are needed. For example:
for i=1:many
res=solver(params);
store1{i}=res.string1;
store2{i}=res.arr(:,1);
end
res is a sizable chunk of data and due to lazy-copy those store-s are references to tiny portions of bytes in that chunk. After I store those tiny portions, I don't need res itself, however, since middle of that chunk is referenced by store, the memory area is unfit for res obtained on the next iteration (they are of the same size) and thus another sizable piece of memory is allocated, which is then again crossed by few tiny links an so on.
Without storing parts of res, the program successfully keeps the memory consumption same after first couple of iterations.
So how do I make a complete copy of structure field?
I've tried using struct-related functions like rmfield but those keep references instead of their own objects.
I've tried to wrap the assignment of in its own function:
new_struct=copy( rmfield(old_struct,"bigdata"));
function c=copy(a);
c=a;
end;
This by the way doesn't work even for arrays.
I'm interested in method applicable to any generic variable.
Minimal working example of the problem
a=cell(3,1);
for i=1:length(a);
r=rand(100000,1000);
a{i}=r(1:100,end);
whos; fflush(stdout);
pause(2);
end;
The above code will cause memory usage to gradually grow by far more than 8.08 kb reported by whos due to references stored by a{i} blocking bigger memory block than they actually need. If you force the proper copy, the problem is not present.

Numerical arrays
For numeric types addition of zero is enough to warrant a new array.
c=a+0;
Strings
For string which is 1 x n char array, something along the following lines will work:
c=[a "a"](1:end-1);
Multidimensional char arrays will require concatenation with a column:
c=[a true(size(a,1),1)](:,1:end-1);
Here true is used to generate dummy array of size compatible with char. (There seems to be no procedural method of generating char array of arbitrary size) char(zeros(size(a,1),1)) and char(true(size(a,1),1)) caused excess memory usage during their creation on some calls.
Note that empty concatenation c=[a ""]; will not result in a copying. Also it is possible to do c=[a+0 ""]; which will result in a copying due to +0 but that one infers type conversions to and from double which is 8 times larger in size. (char(zeros( doesn't seem to cause that)
Other types
In general you can use casting for the types allowed by it in order to not tailor the expressions manually as I had to do above:
typelist={"double","single","char"}; %full list of supported types is available in the link
class_of_a = typelist{ isa(a,typelist) };
c=typecast( [typecast(a,'single'); single(1)] (1:end-1), class_of_a);
Single is seemingly smallest datatype available in octave.
Note that logical is not supported by this method.
Copying structures
Apparently you'd have to write your own function to go around struct fields, copy them with above methods and recursively go to substructs.
(As it doesn't involve complexities relevant here, I'd rather leave that to be done by those who actually needs that, my own problem being solved by +0's.)

Related

What does "cleanup" in NEXT_INST_F and NEXT_INST_V mean?

I am plowing TCL source code and get confused at macro NEXT_INST_F and NEXT_INST_V in tclExecute.c. Specifically the cleanup parameter of the macro.
Initially I thought cleanup means the net number of slots consumed/popped from the stack, e.g. when 3 objects are popped out and 1 object pushed in, cleanup is 2.
But I see INST_LOAD_STK has cleanup set to 1, shouldn't it be zero since one object is popped out and 1 object is pushed in?
I am lost reading the code of NEXT_INST_F and NEXT_INST_V, there are too many jumps.
Hope you can clarify the semantic of cleanup for me.
The NEXT_INST_F and NEXT_INST_V macros (in the implementation of Tcl's bytecode engine) clean up the state of the operand stack and push the result of the operation before going to the next instruction. The only practical difference between the two is that one is designed to be highly efficient when the number of stack locations to be cleaned up is a constant number (from a small range: 0, 1 and 2 — this is the overwhelming majority of cases), and the other is less efficient but can handle a variable number of locations to clean up or a number outside the small range. So NEXT_INST_F is basically an optimised version of NEXT_INST_V.
The place where macros are declared in tclExecute.c has this to say about them:
/*
* The new macro for ending an instruction; note that a reasonable C-optimiser
* will resolve all branches at compile time. (result) is always a constant;
* the macro NEXT_INST_F handles constant (nCleanup), NEXT_INST_V is resolved
* at runtime for variable (nCleanup).
*
* ARGUMENTS:
* pcAdjustment: how much to increment pc
* nCleanup: how many objects to remove from the stack
* resultHandling: 0 indicates no object should be pushed on the stack;
* otherwise, push objResultPtr. If (result < 0), objResultPtr already
* has the correct reference count.
*
* We use the new compile-time assertions to check that nCleanup is constant
* and within range.
*/
However, instructions can also directly manipulate the stack. This complicates things quite a lot. Most don't, but that's not the same as all. If you were to view this particular load of code as one enormous pile of special cases, you'd not be very wrong.
INST_LOAD_STK (a.k.a loadStk if you're reading disassembly of some Tcl code) is an operation that will pop an unparsed variable name from the stack and push the value read from the variable with that name. (Or an error will be thrown.) It is totally expected to pop one value and push another (from objResultPtr) since we are popping (and decrementing the reference count) of the variable name value, and pushing and incrementing the reference count of a different value that was read from the variable.
The code to read and write variables is among the most twisty in the bytecode engine. Far more goto than is good for your health.

TFPCustomHashTable constructor use 196613 constant. Why use this particular value?

Following code is part of contnrs unit of FreePascal
constructor TFPCustomHashTable.Create;
begin
CreateWith(196613,#RSHash);
end;
I am curious about 196613. I know it is hash table size. Is there any particular reason why this value is used?
In my test, constructor took about 3-4 ms to execute, which in my particular situation is not accepted. I suspect this is related to this constant value.
Update:
My question are:
Why 196613 is chosen? Why not other prime number?
Does it affect constructor call execution time?
196613 is one of the numbers being recommended for hash tables sizes (prime and far away from powers of two), for more info see e.g. https://planetmath.org/goodhashtableprimes.
It affects the constructor call execution time, yes. You can always construct TFPCustomHashTable using CreateWith and pass a size of your choice (any number is fine, as resizing algorithm checks for suitable sizes anyways) as well as your own hash function (or the pre-defined RSHash):
MyHashTable:=TFPCustomHashTable.CreateWith(193,#RSHash);
Keep in mind though, that resizing a hash table is an expensive operation as it requires to recalculate the hash function for all elements, so starting with a too small value is neither a good idea.

Constructor/Function overload signature lookup time complexity?

I was reading up on the std::string class in C++ and noticed there are quite a few different constructors available giving us a wide set of initialization features. This got me wondering how a compiler picks which constructor to choose when given parameters, or in the case of overloads, how a compiler matches a function signature with a given set of parameters.
If we have the following functions declared in pseudo-code:
function f1(int numberHere) {
//....do something
}
function f1(int numberHere, string stringHere) {
//....do something
}
And I decide to call f1(4), there are obviously two options to choose from, but what if there are 10000 options/signatures? Would it take proportionally longer? If so, what takes longer? Does the compiler have some sneaky O(n) way to index overloads such that it can call the right one in O(1) time once the program is running or would it compile in O(1) no matter how many overloads exist but take longer to run the finished result because of on-the-fly signature matching?
Can this question even be answered effectively?
Thanks!
Matching function signatures is actually not different from any other search or lookup problem. There are three basic ways to do it depending on the data structure you are storing the available function signatures in:
Use an unsorted list or array and get O(n) time complexity.
Use a sorted array or a tree-like structure and get O(log(n)). (You can sort by type of 1st argument, then 2nd and so on, assuming that each type has an integer id assigned to it.)
Use a hash map and get O(1).
But I doubt that time complxity has any practical relevance in this case. It describes the asymptotic behaviour of algorithms for large values of n. Even for n=100, an unsorted array search might be faster than hash map lookup because it has less overhead.
And from a usability point of view it is a very bad idea to design an API having functions with 10 or even 100 overloads.

How do interpreters load their values?

I mean, interpreters work on a list of instructions, which seem to be composed more or less by sequences of bytes, usually stored as integers. Opcodes are retrieved from these integers, by doing bit-wise operations, for use in a big switch statement where all operations are located.
My specific question is: How do the object values get stored/retrieved?
For example, let's (non-realistically) assume:
Our instructions are unsigned 32 bit integers.
We've reserved the first 4 bits of the integer for opcodes.
If I wanted to store data in the same integer as my opcode, I'm limited to a 24 bit integer. If I wanted to store it in the next instruction, I'm limited to a 32 bit value.
Values like Strings require lots more storage than this. How do most interpreters get away with this in an efficient manner?
I'm going to start by assuming that you're interested primarily (if not exclusively) in a byte-code interpreter or something similar (since your question seems to assume that). An interpreter that works directly from source code (in raw or tokenized form) is a fair amount different.
For a typical byte-code interpreter, you basically design some idealized machine. Stack-based (or at least stack-oriented) designs are pretty common for this purpose, so let's assume that.
So, first let's consider the choice of 4 bits for op-codes. A lot here will depend on how many data formats we want to support, and whether we're including that in the 4 bits for the op code. Just for the sake of argument, let's assume that the basic data types supported by the virtual machine proper are 8-bit and 64-bit integers (which can also be used for addressing), and 32-bit and 64-bit floating point.
For integers we pretty much need to support at least: add, subtract, multiply, divide, and, or, xor, not, negate, compare, test, left/right shift/rotate (right shifts in both logical and arithmetic varieties), load, and store. Floating point will support the same arithmetic operations, but remove the logical/bitwise operations. We'll also need some branch/jump operations (unconditional jump, jump if zero, jump if not zero, etc.) For a stack machine, we probably also want at least a few stack oriented instructions (push, pop, dupe, possibly rotate, etc.)
That gives us a two-bit field for the data type, and at least 5 (quite possibly 6) bits for the op-code field. Instead of conditional jumps being special instructions, we might want to have just one jump instruction, and a few bits to specify conditional execution that can be applied to any instruction. We also pretty much need to specify at least a few addressing modes:
Optional: small immediate (N bits of data in the instruction itself)
large immediate (data in the 64-bit word following the instruction)
implied (operand(s) on top of stack)
Absolute (address specified in 64 bits following instruction)
relative (offset specified in or following instruction)
I've done my best to keep everything about as minimal as is at all reasonable here -- you might well want more to improve efficiency.
Anyway, in a model like this, an object's value is just some locations in memory. Likewise, a string is just some sequence of 8-bit integers in memory. Nearly all manipulation of objects/strings is done via the stack. For example, let's assume you had some classes A and B defined like:
class A {
int x;
int y;
};
class B {
int a;
int b;
};
...and some code like:
A a {1, 2};
B b {3, 4};
a.x += b.a;
The initialization would mean values in the executable file loaded into the memory locations assigned to a and b. The addition could then produce code something like this:
push immediate a.x // put &a.x on top of stack
dupe // copy address to next lower stack position
load // load value from a.x
push immediate b.a // put &b.a on top of stack
load // load value from b.a
add // add two values
store // store back to a.x using address placed on stack with `dupe`
Assuming one byte for each instruction proper, we end up around 23 bytes for the sequence as a whole, 16 bytes of which are addresses. If we use 32-bit addressing instead of 64-bit, we can reduce that by 8 bytes (i.e., a total of 15 bytes).
The most obvious thing to keep in mind is that the virtual machine implemented by a typical byte-code interpreter (or similar) isn't all that different from a "real" machine implemented in hardware. You might add some instructions that are important to the model you're trying to implement (e.g., the JVM includes instructions to directly support its security model), or you might leave out a few if you only want to support languages that don't include them (e.g., I suppose you could leave out a few like xor if you really wanted to). You also need to decide what sort of virtual machine you're going to support. What I've portrayed above is stack-oriented, but you can certainly do a register-oriented machine if you prefer.
Either way, most of object access, string storage, etc., comes down to them being locations in memory. The machine will retrieve data from those locations into the stack/registers, manipulate as appropriate, and store back to the locations of the destination object(s).
Bytecode interpreters that I'm familiar with do this using constant tables. When the compiler is generating bytecode for a chunk of source, it is also generating a little constant table that rides along with that bytecode. (For example, if the bytecode gets stuffed into some kind of "function" object, the constant table will go in there too.)
Any time the compiler encounters a literal like a string or a number, it creates an actual runtime object for the value that the interpreter can work with. It adds that to the constant table and gets the index where the value was added. Then it emits something like a LOAD_CONSTANT instruction that has an argument whose value is the index in the constant table.
Here's an example:
static void string(Compiler* compiler, int allowAssignment)
{
// Define a constant for the literal.
int constant = addConstant(compiler, wrenNewString(compiler->parser->vm,
compiler->parser->currentString, compiler->parser->currentStringLength));
// Compile the code to load the constant.
emit(compiler, CODE_CONSTANT);
emit(compiler, constant);
}
At runtime, to implement a LOAD_CONSTANT instruction, you just decode the argument, and pull the object out of the constant table.
Here's an example:
CASE_CODE(CONSTANT):
PUSH(frame->fn->constants[READ_ARG()]);
DISPATCH();
For things like small numbers and frequently used values like true and null, you may devote dedicated instructions to them, but that's just an optimization.

C# Data structure Algorithm

I recently gave a interview to one of the TOP software company. I was completely stuck with only one question asked by interviewer to me, which was
Q. I have a machine with 512 mb / 1 GB RAM and I have to sort a file (XML, or any) of 4 GB size. How will I proceed? What will be the data structure, and which sorting algorithm will I use and how?
Do you think it is achievable? If yes then can you please explain?
Thanks in advance!
The answer the interviewer might want maybe how you manage to efficiently sort the data set which exceeds system memory.The following section is taken from Wikipedia:
Memory usage patterns and index
sorting
When the size of the array to be
sorted approaches or exceeds the
available primary memory, so that
(much slower) disk or swap space must
be employed, the memory usage pattern
of a sorting algorithm becomes
important, and an algorithm that might
have been fairly efficient when the
array fit easily in RAM may become
impractical. In this scenario, the
total number of comparisons becomes
(relatively) less important, and the
number of times sections of memory
must be copied or swapped to and from
the disk can dominate the performance
characteristics of an algorithm. Thus,
the number of passes and the
localization of comparisons can be
more important than the raw number of
comparisons, since comparisons of
nearby elements to one another happen
at system bus speed (or, with caching,
even at CPU speed), which, compared to
disk speed, is virtually
instantaneous.
For example, the popular recursive
quicksort algorithm provides quite
reasonable performance with adequate
RAM, but due to the recursive way that
it copies portions of the array it
becomes much less practical when the
array does not fit in RAM, because it
may cause a number of slow copy or
move operations to and from disk. In
that scenario, another algorithm may
be preferable even if it requires more
total comparisons.
One way to work around this problem,
which works well when complex records
(such as in a relational database) are
being sorted by a relatively small key
field, is to create an index into the
array and then sort the index, rather
than the entire array. (A sorted
version of the entire array can then
be produced with one pass, reading
from the index, but often even that is
unnecessary, as having the sorted
index is adequate.) Because the index
is much smaller than the entire array,
it may fit easily in memory where the
entire array would not, effectively
eliminating the disk-swapping problem.
This procedure is sometimes called
"tag sort".[5]
Another technique for overcoming the
memory-size problem is to combine two
algorithms in a way that takes
advantages of the strength of each to
improve overall performance. For
instance, the array might be
subdivided into chunks of a size that
will fit easily in RAM (say, a few
thousand elements), the chunks sorted
using an efficient algorithm (such as
quicksort or heapsort), and the
results merged as per mergesort. This
is less efficient than just doing
mergesort in the first place, but it
requires less physical RAM (to be
practical) than a full quicksort on
the whole array.
Techniques can also be combined. For
sorting very large sets of data that
vastly exceed system memory, even the
index may need to be sorted using an
algorithm or combination of algorithms
designed to perform reasonably with
virtual memory, i.e., to reduce the
amount of swapping required.
Use Divide and Conquer.
Here's the pseudocode:
function sortFile(file)
if fileTooBigForMemory(file)
pair<firstHalfOfFile, secondHalfOfFile> = breakIntoTwoHalves()
sortFile(firstHalfOfFile)
sortFile(secondHalfOfFile)
else
sortCharactersInFile(file)
endif
MergeTwoHalvesInOrder(firstHalfOfFile, secondHalfOfFile)
end
Two well-known algorithms that fall in to the divide and conquer category are merge sort and quick sort algorithm. So you could use them for implementation.
As for the data structure, a char array containing characters in the file could do. If you want to be more object oriented, wrap it in a class called File:
class File {
private char[] characters;
//methods to access and mutate 'characters'
}
There is a nice post on the Guido van Rossum blog which has something to suggest. Beware that the code is in Python.
Split your file to chunks which fit into memory.
Sort each chunk using quick sort and save it to a separate file.
Then merge result files and you get your result.
I would use a multiway merge. There is an excellent book called Managing Gigabytes that shows several different ways of doing it. They also go into sort based inversion for files that are larger than physical memory. Look around page 240 for a pretty detailed algorithm on sorting through chunks on disk.
The post above is correct in that you split the file and sort each portion.
Say you have the 4GB file and only want to load a max of 512MB. That means you need to split the file into 8 chunks minimum. If you are not sure how much extra overhead your sort is going to use, you might even double that number to be safe to 16 chunks.
The 16 files are then sorted one at a time to be in a guaranteed order. So now you have chunk 0-15 as sorted files.
Now you open 16 file handles to those files and read one entry at a time, writing the lowest one to the final output. Since you know each of the files is already sorted, taking the lowest from each means you are then writing them in the correct order to the final output.
I have used such a system in C# for sorting large collections of spam words from emails. The original system required all of them to load into RAM in order to sort them and build a dictionary for spam counts. Once the file grew over 2 GB the in memory structures were requiring 6+GB of RAM and taking over 24 hours to sort due to paging and VM. The new system using the chunking above sorted the entire file in under 40 minutes. That was an impressive speedup for such a simple change.
I played with various load options (1/4 system memory per chunk, etc). It turned out that for our situation the best option was about 1/10 system memory. Then Windows had enough memory left over for decent File I/O buffering to offset the increased file traffic. And the machine was left very responsive to other processes running on it.
And yes, I do frequently like to ask these types of questions in interviews as well. Just to see if people can think outside the box. What do you do when you can't just use .Sort() on a list?
Just simulate a virtual memory, overload the array index operator, []
Find a quicksort implementation that sorts an array in C++ or C#. overload the indexer operator [] which will read from and save to file. That way, you can just plug existing sort algorithms, you just change what happens behind the scenes on those []
here's one example of simulating virtual memory on C#
source: http://msdn.microsoft.com/en-us/library/aa288465(VS.71).aspx
// indexer.cs
// arguments: indexer.txt
using System;
using System.IO;
// Class to provide access to a large file
// as if it were a byte array.
public class FileByteArray
{
Stream stream; // Holds the underlying stream
// used to access the file.
// Create a new FileByteArray encapsulating a particular file.
public FileByteArray(string fileName)
{
stream = new FileStream(fileName, FileMode.Open);
}
// Close the stream. This should be the last thing done
// when you are finished.
public void Close()
{
stream.Close();
stream = null;
}
// Indexer to provide read/write access to the file.
public byte this[long index] // long is a 64-bit integer
{
// Read one byte at offset index and return it.
get
{
byte[] buffer = new byte[1];
stream.Seek(index, SeekOrigin.Begin);
stream.Read(buffer, 0, 1);
return buffer[0];
}
// Write one byte at offset index and return it.
set
{
byte[] buffer = new byte[1] {value};
stream.Seek(index, SeekOrigin.Begin);
stream.Write(buffer, 0, 1);
}
}
// Get the total length of the file.
public long Length
{
get
{
return stream.Seek(0, SeekOrigin.End);
}
}
}
// Demonstrate the FileByteArray class.
// Reverses the bytes in a file.
public class Reverse
{
public static void Main(String[] args)
{
// Check for arguments.
if (args.Length == 0)
{
Console.WriteLine("indexer <filename>");
return;
}
FileByteArray file = new FileByteArray(args[0]);
long len = file.Length;
// Swap bytes in the file to reverse it.
for (long i = 0; i < len / 2; ++i)
{
byte t;
// Note that indexing the "file" variable invokes the
// indexer on the FileByteStream class, which reads
// and writes the bytes in the file.
t = file[i];
file[i] = file[len - i - 1];
file[len - i - 1] = t;
}
file.Close();
}
}
Use the above code to roll your own array class. Then just use any array sorting algorithms.