Number type and bitwise operations - actionscript-3

I want to pack epoch milliseconds into 6 bytes, but I have a problem. Let me introduce it:
trace(t);
for (var i:int = 6; i > 0; i--) {
    dataBuffer.writeByte(((t >>> 8*(i-1)) & 255));
    trace(dataBuffer[dataBuffer.length - 1]);
}
Output:
1330454496254
131
254
197
68
131
254
What am I doing wrong?

I'm just guessing, but I think your t variable is getting automatically converted to an int before the bit operation takes effect, which of course destroys the value.
I don't think it's possible to use Number in bit operations - AS3 only supports them on ints.
Depending on how you acquire the value in t, you may want to start with 2 ints (a high and a low half) and then extract the bytes from those.
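To illustrate the idea, here is a minimal sketch in C++ (which has a native 64-bit integer; the variable names are mine). In AS3 you would compute the high and low halves with division and modulo on the Number before using any bit operator:
#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical sketch: pack an epoch-millisecond timestamp into 6 big-endian
    // bytes by splitting it into a high and a low 32-bit half first, so every
    // shift stays within 32 bits (the only width AS3 bit operators work on).
    uint64_t t = 1330454496254ULL;                 // epoch milliseconds
    uint32_t hi = (uint32_t)(t / 4294967296ULL);   // upper half (t / 2^32)
    uint32_t lo = (uint32_t)(t % 4294967296ULL);   // lower half (t mod 2^32)

    unsigned char bytes[6];
    bytes[0] = (hi >> 8) & 0xFF;    // bits 47..40
    bytes[1] = hi & 0xFF;           // bits 39..32
    bytes[2] = (lo >> 24) & 0xFF;   // bits 31..24
    bytes[3] = (lo >> 16) & 0xFF;   // bits 23..16
    bytes[4] = (lo >> 8) & 0xFF;    // bits 15..8
    bytes[5] = lo & 0xFF;           // bits 7..0

    for (int i = 0; i < 6; i++)
        printf("%02X ", (unsigned)bytes[i]);       // expected: 01 35 C5 44 83 FE
    printf("\n");
    return 0;
}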

The Number type is an IEEE 754 64-bit double-precision number, which is quite a different format to your normal int. The bits aren't lined up quite the same way. What you're looking for is a ByteArray representation of a normal 64-bit int type, which of course doesn't exist in ActionScript 3.
Here's a function that converts a Number object into its "int64" equivalent:
private function numberToInt64Bytes(n:Number):ByteArray
{
    // Write the IEEE 754 64-bit double-precision number to a byte array.
    var b:ByteArray = new ByteArray();
    b.writeDouble(n);
    // Get the biased exponent.
    var e:int = ((b[0] & 0x7F) << 4) | (b[1] >> 4);
    // Unbiased exponent (position of the leading 1 bit).
    var s:int = e - 1023;
    // Number of bits to shift towards the right.
    var x:int = (52 - s) % 8;
    // Read and write positions in the byte array.
    var r:int = 8 - int((52 - s) / 8);
    var w:int = 8;
    // Clear the exponent bits from the first two bytes (the sign bit is kept).
    b[0] &= 0x80;
    b[1] &= 0xF;
    // Add the "hidden" fraction bit.
    b[1] |= 0x10;
    // Shift everything.
    while (w > 1) {
        if (--r > 0) {
            if (w < 8)
                b[w] |= b[r] << (8 - x);
            b[--w] = b[r] >> x;
        } else {
            b[--w] = 0;
        }
    }
    // Now you've got your 64-bit signed two's complement integer.
    return b;
}
Note that it works only with integers within a certain range (a Number can represent integers exactly only up to 2^53), and it doesn't handle special values like "not a number" and infinity. It probably also fails in other cases.
Here's a usage example:
var n:Number = 1330454496254;
var bytes:ByteArray = numberToInt64Bytes(n);
trace("bytes:",
bytes[0].toString(16),
bytes[1].toString(16),
bytes[2].toString(16),
bytes[3].toString(16),
bytes[4].toString(16),
bytes[5].toString(16),
bytes[6].toString(16),
bytes[7].toString(16)
);
Output:
bytes: 0 0 1 35 c5 44 83 fe
It should be useful for serializing data in AS3 later to be read by a Java program.
Homework assignment: Write int64BytesToNumber()
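For the record, here is a rough sketch of the reverse direction in C++ terms (not AS3; the function name is mine): assemble the 8 big-endian two's complement bytes back into a signed 64-bit value. In AS3 you would accumulate into a Number by multiplying by 256 per byte, since its shifts are limited to 32 bits.
#include <cstdint>
#include <cstdio>

// Rebuild the signed value from the 8 big-endian two's complement bytes
// produced by numberToInt64Bytes() above.
int64_t int64BytesToValue(const unsigned char b[8]) {
    uint64_t u = 0;
    for (int i = 0; i < 8; i++)
        u = (u << 8) | b[i];    // most significant byte first
    return (int64_t)u;          // reinterpret as signed two's complement
}

int main() {
    unsigned char bytes[8] = { 0x00, 0x00, 0x01, 0x35, 0xC5, 0x44, 0x83, 0xFE };
    printf("%lld\n", (long long)int64BytesToValue(bytes)); // 1330454496254
    return 0;
}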

Related

Converting uint32_t to binary in C

The main problem I'm having is reading out values in binary in C. Python and C# have some really quick/easy functions to do this; I found a topic about how to do it in C++, and a topic about how to convert an int to binary in C, but not how to convert a uint32_t to binary in C.
What I am trying to do is to read, bit by bit, the 32 bits of the DR_REG_RNG_BASE address of an ESP32 (this is the address where the values of the ESP's hardware random number generator are stored).
So for the moment I was doing that:
#define DR_REG_RNG_BASE 0x3ff75144
void printBitByBit( ){
    // READ_PERI_REG is the ESP32 function to read DR_REG_RNG_BASE
    uint32_t rndval = READ_PERI_REG(DR_REG_RNG_BASE);
    int i;
    for (i = 1; i <= 32; i++){
        int mask = 1 << i;
        int masked_n = rndval & mask;
        int thebit = masked_n >> i;
        Serial.printf("%i", thebit);
    }
    Serial.println("\n");
}
At first I thought it was working well. But in fact it gives me binary representations that are totally wrong. Any ideas?
The code you've shown has a number of errors/issues.
First, bit positions for a uint32_t (32-bit unsigned integer) are zero-based: they run from 0 through 31, not from 1 through 32 as your code assumes. Thus, your code is (effectively) ignoring the lowest bit (bit #0); further, when you do the 1 << i on the last loop iteration (when i == 32), your mask will (most likely) have a value of zero (although that shift is, technically, undefined behaviour for the signed integer your code uses), so you'll also drop the highest bit.
Second, your code prints (from left-to-right) the lowest bit first, but you want (presumably) to print the highest bit first, as is normal. So, you should run the loop with the i index starting at 31 and decrement it to zero.
Also, your code mixes and mingles unsigned and signed integer types. This sort of thing is best avoided – so it's better to use uint32_t for the intermediate values used in the loop.
Lastly (as mentioned by Eric in the comments), there is a far simpler way to extract "bit n" from an unsigned integer: just use value >> n & 1.
I don't have access to an Arduino platform but, to demonstrate the points made in the above discussion, here is a standard, console-mode C++ program that compares the output of your code to versions with the aforementioned corrections applied:
#include <iostream>
#include <cstdio>
#include <cstdint>
#include <inttypes.h>

int main()
{
    uint32_t test = 0x84FF0048uL;
    int i;

    // Your code ...
    for (i = 1; i <= 32; i++) {
        int mask = 1 << i;
        int masked_n = test & mask;
        int thebit = masked_n >> i;
        printf("%i", thebit);
    }
    printf("\n");

    // Corrected limits/order/types ...
    for (i = 31; i >= 0; --i) {
        uint32_t mask = (uint32_t)(1) << i;
        uint32_t masked_n = test & mask;
        uint32_t thebit = masked_n >> i;
        printf("%" PRIu32, thebit);
    }
    printf("\n");

    // Better ...
    for (i = 31; i >= 0; --i) {
        printf("%" PRIu32, test >> i & 1);
    }
    printf("\n");

    return 0;
}
The three lines of output (first one wrong, as you know; last two correct) are:
001001000000000111111110010000-10
10000100111111110000000001001000
10000100111111110000000001001000
Notes:
(1) On the use of the funny-looking "%" PRIu32 format specifier for printing the uint32_t types, see: printf format specifiers for uint32_t and size_t.
(2) The cast on the (uint32_t)(1) constant will ensure that the bit-shift is safe, even when int and unsigned are 16-bit types; without that, you would get undefined behaviour in such a case.
When printing out a binary string representation of a number, you print the Most Significant Bit (MSB) first, whether the number is a uint32_t or a uint16_t. So you need a mask for detecting whether the MSB is a 1 or a 0, i.e. a mask of 0x80000000, which you shift down on each iteration.
#define DR_REG_RNG_BASE 0x3ff75144
void printBitByBit( ){
    // READ_PERI_REG is the ESP32 function to read DR_REG_RNG_BASE
    uint32_t rndval = READ_PERI_REG(DR_REG_RNG_BASE);
    Serial.println(rndval, HEX); // print out the value in hex for verification purposes
    uint32_t mask = 0x80000000;
    for (int i = 0; i < 32; i++) {
        Serial.print((rndval & mask) ? "1" : "0");
        mask = mask >> 1;
    }
    Serial.println("\n");
}
For Arduino, there are actually a couple of built-in functions that can print out the binary string representation of a number. Serial.print(x, BIN) allows you to specify the number base as the 2nd function argument.
Another function that can achieve the same result is itoa(x, str, base), which is not part of standard ANSI C or C++, but is available in Arduino to convert the number x to a string str in the specified number base.
char str[33];
itoa(rndval, str, 2);
Serial.println(str);
However, neither function pads with leading zeros; see the results here:
36E68B6D // rndval in HEX
00110110111001101000101101101101 // print by our function
110110111001101000101101101101 // print by Serial.print(rndval, BIN)
110110111001101000101101101101 // print by itoa(rndval, str, 2)
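If the padding matters, it is easy to build the zero-padded string by hand. A minimal sketch (standard console C++ with printf; on Arduino you would use Serial.print instead):
#include <cstdint>
#include <cstdio>

int main() {
    uint32_t rndval = 0x36E68B6D;   // the example value shown above
    char str[33];
    // Walk from bit 31 (MSB) down to bit 0 so leading zeros are kept.
    for (int i = 0; i < 32; i++)
        str[i] = ((rndval >> (31 - i)) & 1u) ? '1' : '0';
    str[32] = '\0';
    printf("%s\n", str);            // 00110110111001101000101101101101
    return 0;
}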
BTW, Arduino is C++, so don't use the c tag for your post. I changed it for you.

Arrays better for serialization than Vectors

If I serialize the exact analogue of an array and a vector into ByteArrays, I get almost twice the size for the vector. Check this code:
var arr:Array = [];
var vec:Vector.<int> = new Vector.<int>;
for (var i:int = 0; i < 100; ++i) {
    arr.push(i);
    vec.push(i);
}
var b:ByteArray;
b = new ByteArray();
b.writeObject(arr);
trace("arr",b.length); // arr 204
b = new ByteArray();
b.writeObject(vec);
trace("vec",b.length); // vec 404
Seems like a buggy or unoptimized implementation on Adobe's part...? Or am I missing something here?
According to the AMF 3 specification, AMF stores arrays in a separate manner optimized for dense arrays, and also stores ints in an optimized manner to minimize the number of bytes used for small ints. This way, your array of values 0-99 gets stored as:
00000000 0x09 ; array marker
00000001 0x81 0x49 ; array length in U29A format = 100
00000003 0x01 ; UTF8-empty - this array is all dense
00000004 to 000000CB - integer data, first integer marker 0x04, then an integer in U29.
All your ints are less than 128, so each is represented as a single byte equal to the integer. This makes a small int in an array take 2 bytes instead of a full 4, or even 8 for large integers.
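For reference, here is a sketch of the variable-length U29 encoding described above (C++ for illustration; the function name is mine, not part of any AMF library). The low 7 bits of each byte carry data and the high bit flags that more bytes follow, except in the 4-byte form where the last byte carries a full 8 bits:
#include <cstdint>
#include <cstdio>

// Encode a value in AMF3's U29 format; returns the number of bytes written.
int writeU29(uint32_t v, unsigned char out[4]) {
    if (v < 0x80) {                    // 1 byte: 0xxxxxxx
        out[0] = v;
        return 1;
    } else if (v < 0x4000) {           // 2 bytes: 1xxxxxxx 0xxxxxxx
        out[0] = 0x80 | (v >> 7);
        out[1] = v & 0x7F;
        return 2;
    } else if (v < 0x200000) {         // 3 bytes
        out[0] = 0x80 | (v >> 14);
        out[1] = 0x80 | ((v >> 7) & 0x7F);
        out[2] = v & 0x7F;
        return 3;
    } else {                           // 4 bytes, last byte holds 8 bits
        out[0] = 0x80 | (v >> 22);
        out[1] = 0x80 | ((v >> 15) & 0x7F);
        out[2] = 0x80 | ((v >> 8) & 0x7F);
        out[3] = v & 0xFF;
        return 4;
    }
}

int main() {
    unsigned char buf[4];
    int n = writeU29(99, buf);               // a small element: one byte, 0x63
    printf("99     -> %d byte(s): %02X\n", n, buf[0]);
    n = writeU29((100 << 1) | 1, buf);       // U29A array length 100 -> 0x81 0x49
    printf("length -> %d byte(s): %02X %02X\n", n, buf[0], buf[1]);
    return 0;
}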
Now, for Vector: as a Vector is dense by default, it's better not to convert an integer to U29 format, because anything bigger than 0x40000000 gets converted to DOUBLE (aka Number), and this is bad for vectors. So a Vector.<int> is stored as is, with 4 bytes per integer inside the vector.
00000000 0x0D ; vector-int marker
00000001 0x81 0x49 ; U29 length of the vector = 100
00000003 0x00 ; fixed vector marker, 0 is not fixed
00000004 to 00000193 - integers in U32
So, for small integers an array takes less space than a Vector of ints, but for large integers an array can take up to 9 bytes per int stored, while a Vector will always use 4 bytes per integer.
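A quick byte count against the traced sizes, following the layout above (a sketch in C++; values 0-99, so 2 bytes per array element):
#include <cstdio>

int main() {
    int array_bytes  = 1 /*array marker*/ + 2 /*U29A length*/ + 1 /*empty UTF-8*/
                     + 100 * (1 /*int marker*/ + 1 /*U29 value < 0x80*/);
    int vector_bytes = 1 /*vector-int marker*/ + 2 /*U29 length*/ + 1 /*fixed flag*/
                     + 100 * 4 /*raw 32-bit ints*/;
    printf("array: %d, vector: %d\n", array_bytes, vector_bytes); // array: 204, vector: 404
    return 0;
}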
Consider the following alteration to your code:
var arr:Array = [];
var vec:Vector.<int> = new Vector.<int>;
for (var i:int = 0; i < 100; ++i) {
    var x:int = Math.floor(Math.random() * 4294967295); // 0 to 0xFFFFFFFE
    arr.push(x);
    vec.push(x);
    trace(x);
}
var b:ByteArray;
b = new ByteArray();
b.writeObject(arr);
trace("arr",b.length); // some variable value will be shown, up to 724
b = new ByteArray();
b.writeObject(vec);
trace("vec",b.length); // here the value will always be 404

Why does uint break my for loop?

This is not really a problem, as the fix is simple and pretty costless. I'm guessing it's some property of for or uint that I don't understand, and I would just like to know what is going on, so...
Using ActionScript 3 I set up a for loop to run backwards through the elements of a Vector.
var limit:uint = myVector.length-1;
for(var a:uint = limit; a >= 0; a--)
{
    trace(a);
}
When I run this it outputs 2, 1, 0 as expected but then moves on to 4294967295 and begins counting down from there until the loop times out and throws an Error #1502.
The fix is simply to type a as int rather than uint, but I don't get why. Surely I am dealing with values of 0 and greater, so uint is the correct data type, right?
I guess that 4294967295 is the max value for uint, but how does my count get there?
If you do
var myUint:uint = 0;
trace(myUint - 1);
Then the output is -1, so why, in my loop, should a suddenly jump back up to 4294967295?
Sorry for the slightly rambling question and cheers for any help.
You are close. As you said, your loop gives you 2, 1, 0, 4294967295. This is because a uint can't be negative. Your loop will always run while a >= 0, and since a can never be -1 to break the loop condition, it continues to loop forever.
var myUint:uint = 0;
trace(myUint - 1);
I didn't test this but what is probably happening is that myUint is being converted to an int and then having 1 subtracted. The following code should be able to confirm this.
var myUint:uint = 0;
trace((myUint - 1) is uint);
trace((myUint - 1) is int);
To fix your loop you could use int or you could use a for each(var x:Type in myVector) loop if you don't need the index (a).
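The wrap-around itself is easy to demonstrate outside AS3 as well; uint32_t in C++ behaves like AS3's uint here (a minimal sketch):
#include <cstdint>
#include <cstdio>

int main() {
    uint32_t a = 0;
    a--;                 // there is no -1 in an unsigned type, so the value wraps
    printf("%u\n", a);   // 4294967295, which is why a >= 0 can never become false
    return 0;
}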
An iteration and an a-- still occur when a is 0; since the type is unsigned, decrementing past zero wraps around to the maximum value for the type. This is how negative values behave in unsigned types.
Change your code into:
var limit:uint = myVector.length;
for (var a:uint = limit; a-- > 0; )
{
    trace(a); // visits limit-1 down to 0, then stops before a wraps
}
In binary, the first bit of an int is the sign bit; in a uint, that same bit is just part of the value.
A one-byte int looks like: 1000 0000 == -128 and 0111 1111 == 127
A one-byte uint looks like: 1000 0000 == 128 and 0111 1111 == 127
This is it.

AS3 ByteArray readShort

I have to read a sequence of bytes that was written in different ways (writeByte, writeShort and writeMultiByte) and display them as a list of hex bytes on screen.
My problem is converting the number 1500; I tried other numbers and the results were correct...
here is a an example:
var bytes:Array = [];
var ba:ByteArray = new ByteArray();
ba.writeShort(1500);
ba.position = 0;
for (var i=0; i<ba.length; i++)
{
    bytes.push(ba.readByte().toString(16));
}
trace(bytes); // 5,-24 but I'm expecting 5,DC
The method readByte reads a signed byte (range -128 to 127). The most significant bit defines the sign. For numbers greater than 127 (like DC) that bit is 1 and the number is seen as negative. The two's complement of the negative byte is used to get the signed value: DC is 1101 1100 in binary, its complement is 0010 0011 (hex 23), a one is added (hex 24), and the value is regarded as negative, which gives you the -24 (in hex) you are seeing.
You should use readUnsignedByte to read values from 0 to 255.
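A small sketch of the same arithmetic in C++ (not AS3): 0xDC read as a signed byte is -0x24, and reading it unsigned, or masking with 0xFF, gives back the expected DC:
#include <cstdint>
#include <cstdio>

int main() {
    int8_t  signedByte   = (int8_t)0xDC;    // what readByte() effectively returns
    uint8_t unsignedByte = 0xDC;            // what readUnsignedByte() returns
    printf("%d\n", signedByte);             // -36, i.e. -0x24
    printf("%X\n", unsignedByte);           // DC
    printf("%X\n", signedByte & 0xFF);      // DC again, after masking
    return 0;
}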
As there is no real Byte type in AS3, readByte() returns an int. You can try this instead:
for (var i=0; i<ba.length; i++)
{
    bytes.push(ba[i].toString(16));
}

OCR: weighted Levenshtein distance

I'm trying to create an optical character recognition system with a dictionary.
In fact I don't have an implemented dictionary yet =)
I've heard that there are simple metrics based on Levenshtein distance which take into account the different distances between different symbols. E.g. 'N' and 'H' are very close to each other, and d("THEATRE", "TNEATRE") should be less than d("THEATRE", "TOEATRE"), which is impossible using basic Levenshtein distance.
Could you help me locate such a metric, please?
This might be what you are looking for: http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance (some working code is kindly included in the link)
Update:
http://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html
Here is an example (C#) where the weight of the "replace character" operation depends on the distance between character codes:
static double WeightedLevenshtein(string b1, string b2) {
    b1 = b1.ToUpper();
    b2 = b2.ToUpper();
    double[,] matrix = new double[b1.Length + 1, b2.Length + 1];
    for (int i = 1; i <= b1.Length; i++) {
        matrix[i, 0] = i;
    }
    for (int i = 1; i <= b2.Length; i++) {
        matrix[0, i] = i;
    }
    for (int i = 1; i <= b1.Length; i++) {
        for (int j = 1; j <= b2.Length; j++) {
            double distance_replace = matrix[(i - 1), (j - 1)];
            if (b1[i - 1] != b2[j - 1]) {
                // Cost of replace
                distance_replace += Math.Abs((float)(b1[i - 1]) - b2[j - 1]) / ('Z' - 'A');
            }
            // Cost of remove = 1
            double distance_remove = matrix[(i - 1), j] + 1;
            // Cost of add = 1
            double distance_add = matrix[i, (j - 1)] + 1;
            matrix[i, j] = Math.Min(distance_replace,
                                    Math.Min(distance_add, distance_remove));
        }
    }
    return matrix[b1.Length, b2.Length];
}
You can see how it works here: http://ideone.com/RblFK
A few years too late, but the following Python package (with which I am NOT affiliated) allows arbitrary weighting of all the Levenshtein edit operations, ASCII character mappings, etc.
https://github.com/infoscout/weighted-levenshtein
pip install weighted-levenshtein
Also this one (also not affiliated):
https://github.com/luozhouyang/python-string-similarity
I've recently created a Python package that does exactly that: https://github.com/zas97/ocr_weighted_levenshtein.
In my Weighted-Levenshtein implementation the distance between "THEATRE" and "TNEATRE" is 1.3, while the distance between "THEATRE" and "TOEATRE" is 1.42.
Other examples: d("O", "0") is 0.06 and d("e", "c") is 0.57.
These distances have been calculated by running multiple OCRs on a synthetic dataset and doing statistics on the most common OCR errors. I hope it helps someone =)
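For anyone who wants to roll their own: per-pair costs like the ones such a package learns can be plugged straight into the same dynamic programme as the C# example above. A sketch in C++, with made-up placeholder costs (a real table would come from OCR confusion statistics):
#include <algorithm>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Substitution cost looked up in a per-pair table; 1.0 for unrelated characters.
// The costs below are placeholders, not measured values.
static double substCost(char a, char b) {
    if (a == b) return 0.0;
    static const std::map<std::pair<char, char>, double> table = {
        {{'H', 'N'}, 0.3}, {{'N', 'H'}, 0.3},   // visually similar: cheap
        {{'O', '0'}, 0.1}, {{'0', 'O'}, 0.1},
    };
    auto it = table.find({a, b});
    return it != table.end() ? it->second : 1.0;
}

static double weightedLevenshtein(const std::string& s1, const std::string& s2) {
    std::vector<std::vector<double>> d(s1.size() + 1, std::vector<double>(s2.size() + 1, 0.0));
    for (size_t i = 1; i <= s1.size(); i++) d[i][0] = (double)i;
    for (size_t j = 1; j <= s2.size(); j++) d[0][j] = (double)j;
    for (size_t i = 1; i <= s1.size(); i++)
        for (size_t j = 1; j <= s2.size(); j++)
            d[i][j] = std::min({ d[i - 1][j - 1] + substCost(s1[i - 1], s2[j - 1]),
                                 d[i - 1][j] + 1.0,      // deletion
                                 d[i][j - 1] + 1.0 });   // insertion
    return d[s1.size()][s2.size()];
}

int main() {
    printf("%.2f\n", weightedLevenshtein("THEATRE", "TNEATRE")); // 0.30 with these toy costs
    printf("%.2f\n", weightedLevenshtein("THEATRE", "TOEATRE")); // 1.00
    return 0;
}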