Actionscript, improving xor performance? - actionscript-3

I am using the below code to process some large files.
var joinedBytes:ByteArray = new ByteArray;
joinedBytes.length = _chunkSize;
for (var i:Number = 0; i < _chunkSize; i++) {
joinedBytes.writeByte(_xorBytesBuffer[i]^_rndBytesBuffer[i]);
}
It's taking about 2.5 seconds to process 10 MB of data on a desktop.
Is this normal performance?
Does any way exist to speed it up?
I think that some of the time is writing to the byte array.
EDIT:
_xorBytesBuffer and _rndBytesBuffer are both byte arrays.

I didn't test everything. I could be wrong somewhere below but...
ByteArray is faster
The [i] suggests you are using a Vector or Array. Using ByteArrays for _xorBytesBuffer and _rndBytesBuffer should speed things up.
You also want to operate on larger units of data, i.e. writeUnsignedInt() instead of writeByte().
See also this question
uint is faster vs Number
And if you only have 10 MB, you should use var i:uint instead of Number.
Another thing: you can replace i++ with ++i, though I didn't really test whether this has much impact. I only heard that it's faster.
Remove additional steps.
You could even try something like:
for (var i:uint = 0; i < _chunkSize;) {
joinedBytes.writeByte(_xorBytesBuffer[i]^_rndBytesBuffer[i++]);
}
Please let us know if _rndBytesBuffer[i++] makes any difference ;)
Wait, I just said not to use indices but other ByteArrays... Well, if you still want to try the above, let us know how it performs ;)
Make sure your condition check is as simple as possible.
Make sure you have something like var _chunkSize:uint instead of
function _chunkSize(){return something;}
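The advice above to work on larger units than a single byte can be sketched outside ActionScript as well. The following C++ is only an illustration under stated assumptions: `xor_buffers` is a hypothetical helper, not code from the question, and whether the same change pays off in the Flash runtime would still need to be measured there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the original post): XOR two equally sized
// buffers, handling four bytes per iteration and then the leftover tail.
std::vector<std::uint8_t> xor_buffers(const std::vector<std::uint8_t>& a,
                                      const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());  // both inputs must have the same length
    std::vector<std::uint8_t> out(a.size());
    std::size_t i = 0;

    // Main loop: one 32-bit XOR replaces four single-byte XORs.
    for (; i + 4 <= a.size(); i += 4) {
        std::uint32_t x, y;
        std::memcpy(&x, &a[i], 4);  // memcpy sidesteps alignment and aliasing issues
        std::memcpy(&y, &b[i], 4);
        x ^= y;
        std::memcpy(&out[i], &x, 4);
    }

    // Tail: at most three bytes remain when the size is not a multiple of 4.
    for (; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return out;
}
```

XOR works bit by bit, so the byte order used when packing four bytes into one word does not change the result, as long as both inputs and the output use the same order.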

Related

Pros and Cons of i != n vs i < n in an int for loop

What are the pros and cons of using one or the other iteration functions ?
function (int n) {
for (int i = 1; i != n; ++i) { ... }
}
vs
function (int n) {
for (int i = 1; i < n; i++) { ... }
}
I think the main argument against the first version is that it is a much less common idiom.
Remembering that code is read more often than it is written, it does not make sense to use a less familiar form of for loop if there isn't a very clear advantage to doing so. All it achieves is distracting anyone working on the code in future.
So primarily for code maintenance reasons (by others as well as the original coder) I would favour the more common second format.
The version with < will work correctly if n is less than 1. The version with != will go into an infinite loop (well, probably not infinite, as integer variables wrap around in most languages).
Using < also generalizes better. E.g.
for (i = start; i < end; i += increment)
This will work even if end - start is not a multiple of increment.
The first one is quite dangerous and could cause an infinite loop.
If n is ever less than 1, the loop will never exit.
Also if something changes i inside the loop, so that it skips the value of n, then again the loop will never exit.
Edit: OK to be more precise when I say never exit, it will ultimately exit one way or another, but it won't be in the manner most sane developers expect. I can just imagine the look on the poor guy that debugs your code that calls the database 2 billion times.
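To make the difference concrete, here is a small C++ sketch. Both function names and the `cap` parameter are made up for this illustration; the cap exists only so the `!=` variant can report a runaway loop instead of actually spinning.

```cpp
#include <cassert>

// Hypothetical helper: iterations of `for (i = start; i < end; i += step)`.
int count_with_less_than(int start, int end, int step) {
    assert(step > 0);
    int iterations = 0;
    for (int i = start; i < end; i += step) {
        ++iterations;
    }
    return iterations;
}

// Same loop with `!=`. Returns -1 if `cap` iterations pass without i ever
// being equal to end, which is the "never exits" case described above.
int count_with_not_equal(int start, int end, int step, int cap) {
    assert(step > 0);
    int iterations = 0;
    for (int i = start; i != end; i += step) {
        if (iterations == cap) {
            return -1;  // i stepped over end, or started beyond it
        }
        ++iterations;
    }
    return iterations;
}
```

With `start = 1, end = 10, step = 1` both variants agree. With `end = 0`, or with `start = 0, end = 10, step = 3`, only the `<` version terminates on its own.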

Running AS3 Function Asynchronously

I'm having a bit of trouble making sense of some of the tutorials for this online, thus why I'm asking here. (Using ActionScript 3, Adobe AIR, and Flash Professional CS5.5)
I have a very heavy function in my AS3 document class that I need to run asynchronously, so it doesn't stop the code on the MovieClip itself (don't ask me why, it just needs to be that way.)
So, simply put, how do I run this document class function (StartNow) asynchronously? The code can be placed in the document class or on the movieclip, I don't care where. It seems to be a relatively simple and common practice, but all my research is digging up nothing.
Thanks!
If your target is Flash Player 11.4 or later, there are Worker objects which can be assigned such a heavy function. I had no FP11, and eventually made a procedural generator that took more than 300 seconds per iteration in total. I had to use a state-based approach, paired with an enter-frame listener. In my case, the entire complex generation process was split into logical chunks that were small enough to be completed within a reasonable timespan, with a variable tracking the current generation phase. So, when another frame called the generation function, it read the last completed step from that variable, performed one extra step with its set of data, stored the new value and exited for the frame. This is not a truly asynchronous process but a pseudo-multitasking approach; it can suit you if the function that makes your SWF lag can be split up.
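The state-based approach described above is not specific to Flash. As a rough sketch in C++ (the `ChunkedSum` type and its fields are invented for this example, and the summation merely stands in for the heavy work), the job keeps its progress in a member variable and does one slice per call:

```cpp
#include <cassert>

// Hypothetical job: sums the integers in [0, end), a few per call.
struct ChunkedSum {
    int next;         // first index not yet processed (the saved "phase")
    int end;          // one past the last index to process
    int perStep;      // how many indices to handle per call
    long long total;  // result accumulated so far

    ChunkedSum(int endIndex, int amountPerStep)
        : next(0), end(endIndex), perStep(amountPerStep), total(0) {
        assert(amountPerStep > 0);
    }

    // Does one slice of work and returns true while more work remains.
    // A frame or timer callback would call this once per tick.
    bool step() {
        int stop = next + perStep;
        if (stop > end) {
            stop = end;
        }
        for (; next < stop; ++next) {
            total += next;  // stand-in for the heavy calculation
        }
        return next < end;
    }
};
```

In ActionScript the equivalent of calling `step()` would sit inside the enter-frame handler, which removes its own listener once the job reports that nothing is left.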
In Flash there is no such thing as running a function asynchronously; you have to do it yourself, unless you want to use Workers (like Vesper said). Workers give you a separate process. Otherwise, you have to break your calculation into parts. This is how you do it:
Imagine 'trace' is a very heavy operation. It's not, but just to illustrate. This simple for-loop runs within a single frame and causes a lower framerate, since it's all calculated before the frame actually renders.
for(var i:int = 0; i < 1000; i ++)
{
trace(i); // heavy calculation here
}
So you have to break the calculation into parts so that it can run over time.
To do that, you have to create a function which just takes a part of the loop every time:
calculatePart(0, 1000, 20);
function calculatePart(startIndex:int, endIndex:int, amountPerRun:int)
{
for(var i:int = startIndex; (i < startIndex + amountPerRun) && (i < endIndex); i ++)
{
trace(i); // heavy calculation here
}
if (i < endIndex)
{
calculatePart(i, endIndex, amountPerRun);
}
}
This is actually the same function as the simple for-loop in the first code, it also outputs 1000 traces. It is prepared to run in parts, but this isn't async yet. We can now change the function easily, so the function operates over time.
I use setTimeout for this. You can also use an ENTER_FRAME event listener or the Timer class, but for the sake of this example I try to keep it clear.
calculatePart(0, 1000, 20, 100);
function calculatePart(startIndex:int, endIndex:int, amountPerRun:int, timeBeforeNextRun:Number)
{
for(var i:int = startIndex; (i < startIndex + amountPerRun) && (i < endIndex); i ++)
{
trace(i); // heavy calculation here
}
if (i < endIndex)
{
setTimeout(calculatePart, timeBeforeNextRun, i, endIndex, amountPerRun, timeBeforeNextRun);
}
}
As you can see, I've added a timeBeforeNextRun parameter. If you run the example, you can see a pause of 100 milliseconds between each batch of 20 traces.
If you set it very low, the calculation will be attempted very quickly, but of course you cannot gain extra speed just by trying to do more in less time. You have to play with the time and amount variables and test which combination actually gives better performance (or less lag).
// less time between runs, fewer calculations per run
calculatePart(0, 1000, 30, 10);
// more calculations per run, more time between runs
calculatePart(0, 1000, 100, 30);
Hope this helps.
If you want a smarter calculation of the time, I found this utility class very helpful; it measures how much time the calculation actually took and adjusts the delay itself.

Can someone translate this C++ into AS3?

This code stores the sqrt() of the numbers from 0 to 4095 in a table, and I would like to translate it into Actionscript 3.
unsigned short int_sqrt_x1024[4096];
for (int i=0; i<sizeof(int_sqrt_x1024)/sizeof(int_sqrt_x1024[0]); i++)
int_sqrt_x1024[i] = (int)(sqrtf((float)i + 0.5f) * 1024.0f);
I've done it halfway, but the 'sizeof' parts got me. I haven't got a clue what to do with those!
So based on your suggestions I've come up with this, what do you think???:
var int_sqrt_x1024:Vector.<uint> = new Vector.<uint>(4096,true)
for (var i:int = 0; i < int_sqrt_x1024.length; i++)
int_sqrt_x1024[i] = Math.sqrt( i + 0.5) * 1024;
You can find the definition of sizeof HERE. To the best of my knowledge, there is no analogous operator in AS3. I have never encountered anything like it in documentation, and searches reveal nothing.
In fact, the closest thing I can find to it is the completely unrelated ByteArray, which I can guarantee would not achieve the same end, as one is an advanced data type and the other is an operator. Their usages aren't even similar.
I am curious, what is the goal of this code? Perhaps there is another way to achieve the same end. (And apparently from reading comments, there is actually a better way.)
EDIT: See Basic's comment below...there may be something similar.
Sorry, I can't provide a translation since I don't know Actionscript, but I think this will help you out too:
The C sizeof-Operator returns the size in bytes of its argument. This is not something you need to concern yourself with in a "managed" language like Actionscript. What the C code you posted (I don't really see anything in it that would necessarily make it C++) does, is iterating through the loop (size_of_the_array_in_bytes / size_of_one_array_element_in_bytes) times. In your case, that complicated expression would simply evaluate to 4096.
In other words, make a loop that executes the store of the square root 4096 times.
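To see that the sizeof expression really is just the element count, here is the original table wrapped in a small C++ function. The name `fill_sqrt_table` is made up for this sketch; the table name and the formula come from the question.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

unsigned short int_sqrt_x1024[4096];

// Hypothetical wrapper: fills the table and returns how many entries it wrote.
std::size_t fill_sqrt_table() {
    // Total size in bytes divided by the size of one element in bytes.
    const std::size_t count = sizeof(int_sqrt_x1024) / sizeof(int_sqrt_x1024[0]);
    assert(count == 4096);  // the "complicated expression" is just the length

    for (std::size_t i = 0; i < count; ++i) {
        // Same formula as the question: sqrt(i + 0.5) scaled by 1024, truncated.
        const float scaled = std::sqrt(static_cast<float>(i) + 0.5f) * 1024.0f;
        int_sqrt_x1024[i] = static_cast<unsigned short>(scaled);
    }
    return count;
}
```

In ActionScript the `length` property of the Vector plays the role of `count`, which is what the loop in the question's own translation already uses.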
The C-code you're using as a basis seems to be pretty poorly written. I can't seem to find a reason one would use such a complicated, verbose and unreadable way to fill a simple lookup table. IMO, it should be something like this:
#define LOOKUPTABLE_LENGTH 4096
unsigned short int_sqrt_x1024[LOOKUPTABLE_LENGTH];
for (int i=0; i<LOOKUPTABLE_LENGTH; i++)
int_sqrt_x1024[i] = (int)(sqrtf((float)i + 0.5f) * 1024.0f);
Much more readable, no?

Do you extract the call to determine the length of an array/string from the for-header?

I've recently noticed a coworker of mine doing
int len = foo.length();
for (int i = 0; i < len; ++i)
doStuff(foo[i]);
I'm aware that this was considered good practice in C, where strlen() ran in O(length_of_string). But I'd expect newer languages (say, Java or Python) to store the length of the String alongside the characters, thus allowing length() to run in O(1). I usually write:
for (int i = 0; i < foo.length(); ++i)
doStuff(foo[i]);
Saving a line of code. But my Co-Worker got me wondering.... is this really good practice, or is it unreasonable to expect the O(1) behaviour?
(As a related question: can't modern compilers extract the strlen() call from inside the for-header automatically these days?)
These statements are actually two different statements.
int len = foo.length(); // Will run once
for (int i = 0; i<len;++i)
Here i<len will be checked every loop, len is just a variable that can be read though.
for (int i = 0; i < foo.length(); ++i)
Here i < foo.length() contains a function call, and since the length of foo can change within the loop itself (You could e.g. strip characters off of foo instead of incrementing i) the function foo.length() will be called every iteration.
There are some languages in which foo might be a constant and foo.length() could be optimised out by the compiler, but it's better to be safe than sorry.
Additionally some languages might allow something like this:
for (int i=0, len=foo.length();i<len;++i)
which still saves you the line.
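A quick way to check how often the length is really queried is to count the calls. The C++ below is a sketch with invented names (`CountingString`, `visit_calling_each_time`, `visit_with_cached_length`); it says nothing about what an optimizing compiler would do with a real string class whose length call it can see through.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper that counts how often its length is queried.
struct CountingString {
    std::string text;
    mutable int lengthCalls;

    explicit CountingString(const std::string& t) : text(t), lengthCalls(0) {}

    std::size_t length() const {
        ++lengthCalls;
        return text.size();
    }
};

// Length in the loop condition: queried once per condition check.
int visit_calling_each_time(const CountingString& foo) {
    int visited = 0;
    for (std::size_t i = 0; i < foo.length(); ++i) {
        ++visited;
    }
    return visited;
}

// Length hoisted into the loop initializer: queried exactly once.
int visit_with_cached_length(const CountingString& foo) {
    int visited = 0;
    for (std::size_t i = 0, len = foo.length(); i < len; ++i) {
        ++visited;
    }
    // Sanity check: caching must not change how many elements are visited.
    assert(static_cast<std::size_t>(visited) == foo.text.size());
    return visited;
}
```

For a five-character string the first loop queries the length six times (five passing checks plus the final failing one), the second only once.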
First, calling foo.length() on each iteration of the loop will in any case require more resources than using a temporary variable to store its result.
Furthermore, your version may cause errors when the code is refactored. For example, this loop will never end:
for (int i = 0; i < foo.length(); ++i)
{
doStuff(foo[i]);
// a few lines of code, written by someone else
doWork(foo); // Passing by reference
}
void doWork(Foo fooObj)
{
// some work
fooObj.Add(new SomeObject());
}
This isn't very language agnostic. It depends on how smart the compiler is. If you can explicitly state that the length is constant, the whole loop can be inlined, so no tests happen at all. When it comes to Java, I would bet the compiler can get pretty smart, so you don't have to be that explicit. You are right about java.lang.String precomputing its length. When it comes to complexity, it helps to define which operations you are counting. Strictly speaking, on a Turing machine you have to be O(n) in order to find the end ("$") of the input.

for-loop mechanism efficiency tips

As I am using for-loops on large multi-dim arrays, any saving on the for-loop mechanism itself is meaningful.
Accordingly, I am looking for any tips on how to reduce this overhead.
e.g. : counting down using uint instead of int and != 0 as stop instead of >0 allows the CPU to do less work (heard it once, not sure it is always true)
One important suggestion: move as much calculation to the outer loop as possible. Not all compilers can do that automatically. For example, instead of:
for row = 0 to 999
for col = 0 to 999
cell[row*1000+col] = row * 7 + col
use:
for row = 0 to 999
x = row * 1000
y = row * 7
for col = 0 to 999
cell[x+col] = y + col
Try to make your loops contiguous in memory, this will optimize cache usage. That is, don't do this:
for (int i = 0; i < m; i++)
for (int j = 0; j < n; j++)
s += arr[j][i];
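For comparison, here are both traversal orders over a grid stored row by row in one flat array. This is a C++ sketch with made-up function names; the two functions return the same sum and differ only in the order in which memory is touched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cache-friendly order: the inner loop walks consecutive elements.
long long sum_row_major(const std::vector<int>& cells,
                        std::size_t rows, std::size_t cols) {
    assert(cells.size() == rows * cols);
    long long s = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            s += cells[r * cols + c];  // address grows by one element each step
        }
    }
    return s;
}

// The order warned about above: the inner loop jumps `cols` elements per step.
long long sum_column_major(const std::vector<int>& cells,
                           std::size_t rows, std::size_t cols) {
    assert(cells.size() == rows * cols);
    long long s = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            s += cells[r * cols + c];  // strided access
        }
    }
    return s;
}
```

How much the second version costs depends on the grid size relative to the cache, so it is worth timing both on realistic data.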
If processing images, convert two loops to one loop on the pixels with a single index.
Don't make loops that will run zero times, as the pipeline is optimized to assume a loop will continue rather than end.
Have you measured the overhead? Do you know how much time is spent processing the for loops vs. how much time is spent executing your application code? What is your goal?
Loop-unrolling can be one way. That is:
for (i=0; i<N; i++) {
a[i]=...;
}
transforms into:
for (i=0; i<N; i+=4) {
a[i]=...;
a[i+1]=...;
a[i+2]=...;
a[i+3]=...;
}
You will need special handling when N is not a multiple of 4 in the example above.
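One common way to do that special handling is a short cleanup loop after the unrolled part. The C++ below is a sketch; `fill_doubled_unrolled` and the choice of `a[i] = 2 * i` as the loop body are placeholders for whatever the real loop computes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: sets a[i] = 2 * i, unrolled by 4 with a tail loop.
void fill_doubled_unrolled(std::vector<int>& a) {
    const std::size_t n = a.size();
    const std::size_t unrolledEnd = n - (n % 4);  // largest multiple of 4 <= n
    std::size_t i = 0;

    // Unrolled part: four elements per iteration, never reading past the end.
    for (; i < unrolledEnd; i += 4) {
        a[i]     = static_cast<int>(2 * i);
        a[i + 1] = static_cast<int>(2 * (i + 1));
        a[i + 2] = static_cast<int>(2 * (i + 2));
        a[i + 3] = static_cast<int>(2 * (i + 3));
    }

    // Cleanup: the remaining zero to three elements.
    for (; i < n; ++i) {
        a[i] = static_cast<int>(2 * i);
    }
    assert(i == n);  // every element was written exactly once
}
```

Many compilers perform this transformation themselves at higher optimization levels, so a hand-unrolled loop should be benchmarked against the plain one.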
First, don't sweat the small stuff. Details like counting up versus counting down are usually completely irrelevant in running time. Humans are notoriously bad at spotting areas in code that need to be sped up. Use a profiler. Pay little or no attention to any part of the loop that is not repeated, unless the profiler says otherwise. Remember that what is written in an inner loop is not necessarily executed in an inner loop, as modern compilers are pretty smart about avoiding unnecessary repetition.
That being said, be very wary of unrolling loops on modern CPUs. The tighter they are, the better they will fit into cache. In a high-performance application I worked on last year, I improved performance significantly by using loops instead of straight-line code, and tightening them up as much as I could. (Yes, I profiled; the function in question took up 80% of the run time. I also benchmarked times over typical input, so I knew the changes helped.)
Moreover, there's no harm in developing habits that favor efficient code. In C++, you should get in the habit of using pre-increment (++i) rather than post-increment (i++) to increment loop variables. It usually doesn't matter, but it can make a significant difference, it doesn't make code less readable or writable, and it won't hurt.
This isn't a language agnostic question, it depends highly on not only language, but also compiler. Most compilers I believe will compile these two equivalently:
for (int i = 0; i < 10; i++) { /* ... */ }
int i = 0;
while (i < 10) {
// ...
i++;
}
In most languages/compilers, the for loop is just syntactic sugar for the latter while loop. Foreach is another question again, and how it's implemented is highly dependent on the language/compiler, but it's generally less efficient than a normal for/while loop. How much less efficient is, again, language and compiler dependent.
Your best bet would probably be to run some benchmarks with several different variations on a theme and see what comes out on top.
Edit: To that end, the suggestions here will probably save you more time rather than worrying about the loop itself.
BTW, unless you need post-increment, you should always use the pre-increment operator. It is only a minor difference, but it is more efficient.
Internally this is the difference:
Post Increment
i++;
is the same as:
int postincrement( int &i )
{
int itmp = i;
i = i + 1;
return itmp;
}
Pre Increment
++i;
is the same as:
int preincrement( int &i )
{
i = i + 1;
return i;
}
I agree with @Greg. First thing you need to do is put some benchmarking in place. There will be little point optimising anything until you prove where all your processing time is being spent. "Premature optimisation is the root of all evil"!
As your loops will have O(n^d) complexity (d=dimension), what really counts is what you put INTO the loop, not the loop itself. Optimizing a few cycles away in the loop framework from millions of cycles of an inefficient algorithm inside the loop is just snake oil.
By the way, is it good to use short instead of int in for-loop if Int16 capacity is guaranteed to be enough?
There is not enough information to answer your question accurately. What are you doing inside your loops? Does the calculation in one iteration depend on a value calculated in a previous iteration? If not, you can almost cut your time in half by simply using 2 threads, assuming you have at least a dual core processor.
Another thing to look at is how you are accessing your data, if you are doing large array processing, to make sure that you access the data sequentially as it is stored in memory, avoiding flushing your L1/L2 cache on every iteration (seen this before on smaller L1 caches, the difference can be dramatic).
Again, I would look at what is inside the loop first, where most of the gains (>99%) will be, rather than the outer loop plumbing.
But then again, if your loop code is I/O bound, then any time spent on optimization is wasted.
I think most compilers would probably do this anyway: stepping down to zero should be more efficient, as a check for zero is very fast for the processor. Again though, any compiler worth its salt would do this with most loops anyway. You need to look at what the compiler is doing.
There is some relevant information among the answers to another stackoverflow question, how cache memory works. I found the paper by Ulrich Drepper referred to in this answer especially useful.