Big unicode problems - AS3 - actionscript-3

I made a program where people can type in 4 characters, and it inserts the corresponding Unicode character into a TextFlow element. I had a lot of problems with this, but in the end I succeeded with some help. Then a new problem appeared when I typed "dddd" or "ddd1" as a test.
I got the error
- "An unpaired Unicode surrogate was encountered in the input."
I spent about 2 days testing for that, and there was absolutely no event I could listen for that would let me catch the error before it occurred.
The code:
str = "dddd"
num = parseInt(str,16)
res = String.fromCharCode(num)
Actually, when the error occurs, res shows up as "?" in the console, but if you test for it with if (res == "?") it returns false.
MY QUESTION:
I searched and searched and found absolutely no description of this error in Adobe's AS3 reference, but after 2 days I found this page for JavaScript: http://scripts.sil.org/cms/scripts/page.php?item_id=IWS-Chapter04a
It says that
- The code units in the range 0xD800–0xDFFF, serve a special purpose, however. These code units, known as surrogate code units
So now i test with:
if ((num > 0 && num < 0xD800) || (num > 0xDFFF && num < 0xFFFF)) {
    // get unicode character
}
My question is simply whether I understood this correctly: will this actually prevent the error from occurring? I'm no Unicode specialist and don't really know how to test for it. Since there are tens of thousands of characters, I might have missed one, and that would mean users could accidentally trigger the error and risk crashing the application.

You are correct. A code point ("high surrogate") between 0xD800-0xDBFF must be paired with a code point ("low surrogate") between 0xDC00-0xDFFF. Those are reserved for use in UTF-16 - when needing to address the higher planes that don't fit in 16 bits - and hence those code points can't appear on their own. For example:
0xD802 DC01 corresponds to (I'll leave out the 0x hex markers):
10000 + (high - D800) * 0400 + (low - DC00)
10000 + (D802 - D800) * 0400 + (DC01 - DC00)
= 10000 + 0002 * 0400 + 0001
= 10801, i.e. the UTF-16 pair D802 DC01 encodes the code point U+10801
... just adding that bit of info in case you later need to support it.
I haven't tested the AS3 functionality for the following, but you may want to also test the input below - you won't get the surrogate error for these, but might get another error message:
0xFFFE and 0xFFFF (when using higher planes, also any code point "ending" with those bits, e.g. 0x1FFFE and 0x1FFFF; 0x2FFFE and 0x2FFFF etc.) Those are "non-characters".
The same goes for 0xFDD0-0xFDEF - also "non-characters".
AS3 actually uses UTF-16 to store its strings, but even if it didn't, the surrogate code points would still have no meaning outside pairs - the code points are reserved and can't be used in other Unicode encodings either (e.g. UTF-8 or UTF-32)
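The range check and the pair arithmetic above can be sketched in Python. This is only an illustration, not AS3 code, and the function names (is_surrogate, combine_surrogates, hex_to_char) are made up for the example:

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # high (lead) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF    # low (trail) surrogates


def is_surrogate(code_unit):
    """Return True if the 16-bit value is reserved for surrogate pairs."""
    return HIGH_START <= code_unit <= LOW_END


def combine_surrogates(high, low):
    """Combine a high/low surrogate pair into a single code point."""
    if not HIGH_START <= high <= HIGH_END:
        raise ValueError("not a high surrogate: %#x" % high)
    if not LOW_START <= low <= LOW_END:
        raise ValueError("not a low surrogate: %#x" % low)
    return 0x10000 + (high - HIGH_START) * 0x400 + (low - LOW_START)


def hex_to_char(text):
    """Parse hex digits; return the character, or None for a lone surrogate."""
    num = int(text, 16)
    if is_surrogate(num):
        return None
    return chr(num)


print(hex_to_char("dddd"))                      # None: 0xDDDD is a low surrogate
print(hex(combine_surrogates(0xD802, 0xDC01)))  # 0x10801
```

Rejecting the whole 0xD800-0xDFFF range up front means the conversion is never attempted for the inputs that trigger the error.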

What's the proper use of output property in Octave?

I am not sure what the output value is for when using fminunc.
>>options = optimset('GradObj','on','MaxIter','1');
>>initialTheta=zeros(2,1);
>>[optTheta, functionVal, exitFlag, output, grad, hessian]=
fminunc(@CostFunc,initialTheta,options);
>> output
output =
scalar structure containing the fields:
iterations = 11
successful = 10
funcCount = 21
Even when I set the maximum number of iterations to 1, it still reports iterations = 11.
Could anyone please explain why this is happening?
Please also help me with the grad and hessian outputs, i.e. what they are used for.
Given we don't have the full code, I think the easiest thing for you to do to understand exactly what is happening is to just set a breakpoint in fminunc.m itself, and follow the logic of the code. This is one of the nice things about working with Octave, since the source code is provided and you can check it freely (there's often useful information in octave source code in fact, such as references to papers which they relied on for the implementation, etc).
From a quick look, it doesn't seem like fminunc expects a maxiter of 1. Have a look at line 211:
211 while (niter < maxiter && nfev < maxfev && ! info)
Since niter is initialised just before (at line 176) with the value of 1, in theory this loop will never be entered if your maxiter is 1, which defeats the whole point of the optimization.
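To see why that condition would skip the loop, here is a minimal Python sketch that mimics only the `niter < maxiter` part of the quoted test, assuming niter really does start at 1. It ignores nfev and info, increments niter once per pass as a simplification, and count_outer_passes is a hypothetical helper, not part of Octave:

```python
def count_outer_passes(maxiter, start=1):
    """Count how often a loop guarded by `niter < maxiter` would run."""
    niter = start
    passes = 0
    while niter < maxiter:
        passes += 1
        niter += 1
    return passes


print(count_outer_passes(1))   # 0: the loop body is never entered
print(count_outer_passes(11))  # 10
```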
There are other interesting things happening in there too, e.g. the inner while loop starting at line 272:
272 while (! suc && niter <= maxiter && nfev < maxfev && ! info)
This uses short-circuit evaluation to first check whether the previous iteration was "unsuccessful", before checking whether the number of iterations is still within "maxiter".
In other words, if the previous iteration was successful, you don't get to run the inner loop at all, and you never get to increment niter.
What flags an iteration as "successful" seems to be defined by the ratio of "actual vs predicted reduction", as per the following (non-consecutive) lines:
286 actred = (fval - fval1) / (abs (fval1) + abs (fval));
...
295 prered = -t/(abs (fval) + abs (fval + t));
296 ratio = actred / prered;
...
321 if (ratio >= 1e-4)
322 ## Successful iteration.
...
326 nsuciter += 1;
...
328 endif
329
330 niter += 1;
In other words, it seems like fminunc will respect your maxiters ignoring whether these have been "successful" or "unsuccessful", with the exception that it does not like to "end" the algorithm at a "successful" turn (since the success condition needs to be fulfilled first before the maxiters condition is checked).
Obviously this is an academic point, since you shouldn't even be entering this inner loop when you couldn't even make it past the outer loop in the first place.
I cannot really know exactly what is going on without knowing your specific code, but you should be able to follow easily if you run your code with a breakpoint at fminunc. The maths behind that implementation may be complex, but the code itself seems fairly simple and straightforward enough to follow.
Good luck!

AS3 parseInt() method limitations

I have this issue in converting a HEX string to Number in as3
I have a value
str = "24421bff100317"; decimal value = 10205787172373271
but when I parseInt it I get
parseInt(str, 16) = 10205787172373272
Can anyone please tell me what am I doing wrong here
Looks like adding one ("24421bff100318") works fine. I have to assume that means this is a case of precision error.
Because only a finite set of numbers can be represented in the memory available, there will be times when the computer has to approximate. This is common when working with decimals and very large numbers. It's visible, for example, in this snippet, where the computer apparently can't add basic decimals:
for(var i=0;i<3;i+=0.2){
trace(i);
}
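There are a few workarounds if accuracy at this level is critical. In languages that have one, a wider integer type helps ("long" instead of "int" in Java). In AS3, parseInt returns a Number, which is a 64-bit floating point value that represents integers exactly only up to 2^53 (9007199254740992); your value is larger than that, so Number alone will not help. The remaining option is to break the hex string down into smaller parts, parse each part separately, and combine them yourself.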
For further reading to understand this topic (since I do think it's fascinating), look up "precision errors" and "data types".
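The rounding can be reproduced in Python, whose float is an IEEE 754 double; the sketch assumes AS3's Number behaves the same way. split_hex is a hypothetical helper showing the "break it into smaller parts" idea:

```python
hex_text = "24421bff100317"

exact = int(hex_text, 16)   # Python integers are arbitrary precision
as_double = float(exact)    # nearest IEEE 754 double

print(exact)                # 10205787172373271
print(int(as_double))       # 10205787172373272
print(exact > 2 ** 53)      # True: beyond the exactly representable range


def split_hex(text, low_digits=8):
    """Split a hex string into (high, low) integers that each fit a double."""
    high_text, low_text = text[:-low_digits], text[-low_digits:]
    return int(high_text or "0", 16), int(low_text, 16)


high, low = split_hex(hex_text)
print(high, low)                      # 2376219 4279239447
print(high * 16 ** 8 + low == exact)  # True
```

Each half is well below 2^53, so both can be stored exactly; the caller then has to do any further arithmetic on the two halves separately.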

Actionscript regex false negative

The list of words is very long, I cannot paste the actual code that bugs out here.
The regex whitelist has approx. 4500 words in it, separated by a |
Both regexes, whitelist and whitelist2, include the word hello, but the test for each returns a different result, and I have no idea why. Testing the same thing in JavaScript gives correct results.
Here is the actionscript for testing.
The line for whitelist might not be visible entirely, try copying pasting the code from the link below in your text/code editor.
http://wonderfl.net/c/jTmb/
Edit1: The problem I'm facing is that sometimes the words are not an exact match.
For example, saturdays needs to match saturday.
That's why I was using regex.
About the string length.
I tried to check the length of the string and its being reported correctly.
http://wonderfl.net/c/a9yp/
Edit2:
Test showing it works in javascript
http://tinyurl.com/m74hmdj
Actual answer...
This question led me into finding some interesting AS3 limitations for the first time...
Your regex fails at the length it has by the word "metabrushite". As far as I can tell from various tests, this is where it hits the longest supported length of a regex in AS3: 31391 characters. Any regex longer than that seems to always return false on a call to test(). Note that "hello" appears in the list before "metabrushite", so it's not a matter of truncation - the regex simply silently fails to work at all - e.g. a regex that should always return true for all words, still returns false if it's that long.
The limit seems a rather arbitrary number, so it's hard to tell exactly what makes this limit.
Again, you should really not be using regex for a task like this, but if you feel you have to, you'll need to split it up into several regexes, none of which exceeds the maximum length.
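Splitting the word list into several shorter patterns could look roughly like the following Python sketch. It only illustrates the chunking idea; the helper names and the small max_length are assumptions for the example, and in AS3 the limit to stay under would be the one observed above:

```python
import re


def build_chunked_patterns(words, max_length):
    """Group words into alternation patterns no longer than max_length.

    A single word longer than max_length still gets a pattern of its own.
    """
    patterns, current, current_len = [], [], 0
    for word in words:
        escaped = re.escape(word)
        # +1 accounts for the "|" separator when the chunk is non-empty.
        extra = len(escaped) + (1 if current else 0)
        if current and current_len + extra > max_length:
            patterns.append(re.compile("|".join(current)))
            current, current_len = [], 0
            extra = len(escaped)
        current.append(escaped)
        current_len += extra
    if current:
        patterns.append(re.compile("|".join(current)))
    return patterns


def matches_any(patterns, text):
    """True if any chunk matches somewhere inside text (substring match)."""
    return any(p.search(text) for p in patterns)


words = ["hello", "saturday", "metabrushite", "world"]
patterns = build_chunked_patterns(words, max_length=20)
print(len(patterns))                       # 2
print(matches_any(patterns, "saturdays"))  # True
print(matches_any(patterns, "xyz"))        # False
```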
Side note:
Another interesting thing, which I haven't examined more closely, is that creating the RegExp from a single-statement concatenated string, i.e.:
trace("You'll never see this traced if too many words are added below.");
var s:String = "firstword|" +
"secondword|" +
... +
"lastword";
... will fail for even shorter resulting strings. This seems to be due to a max length imposed on the length of a single statement, and has nothing to do with regex. It doesn't freeze; it doesn't output an error or even the first trace. The script is simply silently excluded from the swf and hence never executed.
I'm thinking @tsiki is right about the max length of an AS3 regex.
This is really a comment, but since I'd like to include a bit of code, I'm putting it as an answer:
Since you're not using the regex for anything other than a list of words separated by |, consider using an array instead. Another advantage of this approach is that it will be quite a bit faster.
// This is just a way of reusing your list,
// rather than manually transforming it to an array:
var whitelist:Array = "abasement|abastardize|abastardize|..."
.split("|");
// Simply use .toLowerCase() on the input string to make it case insensitive,
// assuming all your whitelist words are lower case.
trace(whitelist.indexOf("hello") >= 0);
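For comparison, the same lookup idea in Python, using a set instead of an array. The short word list is a stand-in for the real one, and contains_whitelisted is a hypothetical helper covering the "saturdays should match saturday" case from the question:

```python
whitelist_text = "abasement|abastardize|hello|saturday"  # shortened sample list
whitelist = frozenset(whitelist_text.split("|"))


def is_whitelisted(word):
    """Exact, case-insensitive lookup."""
    return word.lower() in whitelist


def contains_whitelisted(word):
    """True if any whitelist entry occurs inside the word (e.g. plurals)."""
    lowered = word.lower()
    return any(entry in lowered for entry in whitelist)


print(is_whitelisted("Hello"))            # True
print(is_whitelisted("saturdays"))        # False
print(contains_whitelisted("saturdays"))  # True
```

The exact lookup is a single hash probe; the substring variant still has to walk the whole list, much like the "loop, indexOf" test.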
ETA: Performance
Here are some performance comparisons.
_array is pre-initialized to a lower case array of strings, split by |.
_regex is pre-initialized to your regex.
_search is pre-initialized to a given word to search for.
I'm using your words up to (and including) words starting with L - to get around the max regex length limitation:
The code for each test:
regex.test:
_regex.test(_search);
array.indexOf:
_array.indexOf(_search.toLowerCase()) >= 0;
loop over array:
for (var j:int = 0; j < _array.length; j++)
{
if (_array[j] == _search)
{
break;
}
}
Update: loop, indexOf (check if search string is substring of item in whitelist):
for (var j:int = 0; j < _array.length; j++)
{
if (_search.indexOf(_array[j]) !== -1)
{
break;
}
}
The AS3 compiler doesn't do any unfair optimization of this simple code (such as skipping executions due to not using the result - it's not all that clever).
10 runs, 1000 iterations each, FP 11.4.402.278 - release version:
Method           Search for   Avg.      Min     Max     Iter.
---------------------------------------------------------------------------
array.indexOf    "abasement"  0.0 ms    0 ms    0 ms    0 ms
regex.test       "abasement"  18.4 ms   14 ms   22 ms   0.0184 ms
loop over array  "abasement"  0.0 ms    0 ms    0 ms    0 ms
loop, indexOf    "abasement"  0.0 ms    0 ms    0 ms    0 ms
array.indexOf    "hello"      31.1 ms   25 ms   42 ms   0.0311 ms
regex.test       "hello"      326.8 ms  309 ms  347 ms  0.3268 ms
loop over array  "hello"      59.4 ms   50 ms   69 ms   0.0594 ms
loop, indexOf    "hello"      97.4 ms   92 ms   105 ms  0.0974 ms
Avg. = average time for the 1000 iterations in each run
Min = Minimum time for the 1000 iterations in each run
Max = Maximum time for the 1000 iterations in each run
Iter. = Calculated time for a single iteration on average
It's quite clear that looping over the array and comparing each value is faster than using a regex. You could do a fair bit of comparison before it would catch up to the time the regex comparison spends. And in any event, we're dealing with fractions of milliseconds for a single lookup - it's really premature optimization, unless you're doing hundreds of lookups in a short period of time. If we were talking optimization, a Vector.<String> might speed up things slightly more, compared to Array.
The main point of this whole thing is that, except for relatively complex scenarios, a regex is unlikely to be more efficient than a tailored parser/comparer/lookup - that goes for all languages. It's designed to be a general purpose tool, not to do things the smartest way in every case (or pretty much any case for that matter).

Function types declarations in Mathematica

I have bumped into this problem several times, concerning the types of input data declarations Mathematica understands for functions.
It seems Mathematica understands the following type declarations:
_Integer,
_List,
_?MatrixQ,
_?VectorQ
However, _Real and _Complex declarations, for instance, sometimes cause the function not to evaluate. Any idea why?
What's the general rule here?
When you do something like f[x_]:=Sin[x], what you are doing is defining a pattern replacement rule. If you instead say f[x_smth]:=5 (if you try both, do Clear[f] before the second example), you are really saying "wherever you see f[x], check if the head of x is smth and, if it is, replace by 5". Try, for instance,
Clear[f]
f[x_smth]:=5
f[5]
f[smth[5]]
So, to answer your question, the rule is that in f[x_hd]:=1;, hd can be anything and is matched to the head of x.
One can also have more complicated definitions, such as f[x_] := Sin[x] /; x > 12, which will match if x>12 (of course this can be made arbitrarily complicated).
Edit: I forgot about the Real part. You can certainly define Clear[f]; f[x_Real] := Sin[x] and it works for e.g. f[12.]. But you have to keep in mind that, while Head[12.] is Real, Head[12] is Integer, so your definition won't match f[12].
Just a quick note since no one else has mentioned it. You can pattern match for multiple Heads - and this is quicker than using the conditional matching of ? or /;.
f[x:(_Integer|_Real)] := True (* function definition goes here *)
For simple functions acting on Real or Integer arguments, it runs in about 75% of the time as the similar definition
g[x_] /; Element[x, Reals] := True (* function definition goes here *)
(which as WReach pointed out, runs in 75% of the time
as g[x_?(Element[#, Reals]&)] := True).
The advantage of the latter form is that it works with Symbolic constants such as Pi - although if you want a purely numeric function, this can be fixed in the former form with the use of N.
The most likely problem is the input you're using to test the functions. For instance,
f[x_Complex]:= Conjugate[x]
f[x + I y]
f[3 + I 4]
returns
f[x + I y]
3 - I 4
The reason the second one works while the first one doesn't is revealed when looking at their FullForms
x + I y // FullForm == Plus[x, Times[ Complex[0,1], y]]
3 + I 4 // FullForm == Complex[3,4]
Internally, Mathematica transforms 3 + I 4 into a Complex object because each of the terms is numeric, but x + I y does not get the same treatment as x and y are Symbols. Similarly, if we define
g[x_Real] := -x
and use it:
g[ 5 ] == g[ 5 ]
g[ 5. ] == -5.
The key here is that 5 is an Integer which is not recognized as a subset of Real, but by adding the decimal point it becomes Real.
As acl pointed out, the pattern _Something means match to anything with Head === Something, and both the _Real and _Complex cases are very restrictive in what is given those Heads.

Rot13 for numbers

EDIT: Now a Major Motion Blog Post at http://messymatters.com/sealedbids
The idea of rot13 is to obscure text, for example to prevent spoilers. It's not meant to be cryptographically secure but to simply make sure that only people who are sure they want to read it will read it.
I'd like to do something similar for numbers, for an application involving sealed bids. Roughly I want to send someone my number and trust them to pick their own number, uninfluenced by mine, but then they should be able to reveal mine (purely client-side) when they're ready. They should not require further input from me or any third party.
(Added: Note the assumption that the recipient is being trusted not to cheat.)
It's not as simple as rot13 because certain numbers, like 1 and 2, will recur often enough that you might remember that, say, 34.2 is really 1.
Here's what I'm looking for specifically:
A function seal() that maps a real number to a real number (or a string). It should not be deterministic -- seal(7) should not map to the same thing every time. But the corresponding function unseal() should be deterministic -- unseal(seal(x)) should equal x for all x. I don't want seal or unseal to call any webservices or even get the system time (because I don't want to assume synchronized clocks). (Added: It's fine to assume that all bids will be less than some maximum, known to everyone, say a million.)
Sanity check:
> seal(7)
482.2382 # some random-seeming number or string.
> seal(7)
71.9217 # a completely different random-seeming number or string.
> unseal(seal(7))
7 # we always recover the original number by unsealing.
You can pack your number as a 4-byte float together with another random float into a double and send that. The client then just has to pick up the first four bytes. In Python:
import struct, random
def seal(f):
    # pack the bid and a random float side by side, then reinterpret the 8 bytes as one double
    return struct.unpack("d", struct.pack("ff", f, random.random()))[0]
def unseal(f):
    # reinterpret the double as two floats and keep the first one (the bid)
    return struct.unpack("ff", struct.pack("d", f))[0]
>>> unseal( seal( 3))
3.0
>>> seal(3)
4.4533985422978706e-009
>>> seal(3)
9.0767582382536571e-010
Here's a solution inspired by Svante's answer.
M = 9999 # Upper bound on bid.
seal(x) = M * randInt(9,99) + x
unseal(x) = x % M
Sanity check:
> seal(7)
709936
> seal(7)
509956
> unseal(seal(7))
7
This needs tweaking to allow negative bids though:
M = 9999 # Numbers between -M/2 and M/2 can be sealed.
seal(x) = M * randInt(9,99) + x
unseal(x) =
m = x % M;
if m > M/2 return m - M else return m
A nice thing about this solution is how trivial it is for the recipient to decode -- just mod by 9999 (and if that's 5000 or more then it was a negative bid so subtract another 9999). It's also nice that the obscured bid will be at most 6 digits long. (This is plenty security for what I have in mind -- if the bids can possibly exceed $5k then I'd use a more secure method. Though of course the max bid in this method can be set as high as you want.)
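A minimal Python sketch of the scheme above, including the tweak for negative bids. It assumes integer bids strictly between -M/2 and M/2 and uses random.randint for the randInt of the pseudocode:

```python
import random

M = 9999  # modulus; bids must lie strictly between -M/2 and M/2


def seal(bid):
    """Hide the bid by adding a random multiple of M."""
    if not -M / 2 < bid < M / 2:
        raise ValueError("bid out of range")
    return M * random.randint(9, 99) + bid


def unseal(sealed):
    """Recover the bid as the remainder mod M, mapped back to negatives."""
    m = sealed % M
    return m - M if m > M / 2 else m


print(unseal(seal(7)))    # 7
print(unseal(seal(-42)))  # -42
```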
Instructions for Lay Folk
Pick a number between 9 and 99 and multiply it by 9999, then add your bid.
This will yield a 5 or 6-digit number that encodes your bid.
To unseal it, divide by 9999, subtract the part to the left of the decimal point, then multiply by 9999.
(This is known to children and mathematicians as "finding the remainder when dividing by 9999" or "mod'ing by 9999", respectively.)
This works for nonnegative bids less than 9999 (if that's not enough, use 99999 or as many digits as you want).
If you want to allow negative bids, then the magic 9999 number needs to be twice the biggest possible bid.
And when decoding, if the result is greater than half of 9999, ie, 5000 or more, then subtract 9999 to get the actual (negative) bid.
Again, note that this is on the honor system: there's nothing technically preventing you from unsealing the other person's number as soon as you see it.
If you're relying on honesty of the user and only dealing with integer bids, a simple XOR operation with a random number should be all you need, an example in C#:
static Random rng = new Random();
static string EncodeBid(int bid)
{
int i = rng.Next();
return String.Format("{0}:{1}", i, bid ^ i);
}
static int DecodeBid(string encodedBid)
{
string[] d = encodedBid.Split(":".ToCharArray());
return Convert.ToInt32(d[0]) ^ Convert.ToInt32(d[1]);
}
Use:
int bid = 500;
string encodedBid = EncodeBid(bid); // encodedBid is something like 54017514:4017054 and will be different each time
int decodedBid = DecodeBid(encodedBid); // decodedBid is 500
Converting the decode process to a client side construct should be simple enough.
Is there a maximum bid? If so, you could do this:
Let max-bid be the maximum bid and a-bid the bid you want to encode. Multiply max-bid by a rather large random number (if you want to use base64 encoding in the last step, max-rand should be (2^24/max-bid)-1, and min-rand perhaps half of that), then add a-bid. Encode this, e.g. through base64.
The recipient then just has to decode and find the remainder modulo max-bid.
What you want to do (a Commitment scheme) is impossible to do client-side-only. The best you could do is encrypt with a shared key.
If the client doesn't need your cooperation to reveal the number, they can just modify the program to reveal the number. You might as well have just sent it and not displayed it.
To do it properly, you could send a secure hash of your bid + a random salt. That commits you to your bid. The other client can commit to their bid in the same way. Then you each share your bid and salt.
[edit] Since you trust the other client:
Sender:
Let M be your message
K = random 4-byte key
C1 = M xor hash(K) //hash optional: hides patterns in M xor K
//(you can repeat or truncate hash(K) as necessary to cover the message)
//(could also xor with output of a PRNG instead)
C2 = K append C1 //they need to know K to reveal the message
send C2 //(convert bytes to hex representation if needed)
Receiver:
receive C2
K = C2[:4]
C1 = C2[4:]
M = C1 xor hash(K)
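The exchange could be sketched in Python roughly as follows. The choice of SHA-256 as the hash and the helper name _keystream are assumptions made for the example; the transmitted value is the 4-byte key followed by the XOR-ed message, hex encoded:

```python
import hashlib
import os


def _keystream(key, length):
    """Repeat SHA-256(key) until it covers `length` bytes."""
    digest = hashlib.sha256(key).digest()
    repeats = length // len(digest) + 1
    return (digest * repeats)[:length]


def seal(message):
    """Return hex of: random 4-byte key + (message XOR hash(key))."""
    key = os.urandom(4)
    pad = _keystream(key, len(message))
    c1 = bytes(m ^ p for m, p in zip(message, pad))
    return (key + c1).hex()


def unseal(sealed_hex):
    """Split off the key, then undo the XOR."""
    c2 = bytes.fromhex(sealed_hex)
    key, c1 = c2[:4], c2[4:]
    pad = _keystream(key, len(c1))
    return bytes(c ^ p for c, p in zip(c1, pad))


print(unseal(seal(b"bid: 14.23")))  # b'bid: 14.23'
```

As with the pseudocode, this only obscures the message: anyone holding the sealed value can unseal it at any time.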
Are you aware that you need a larger 'sealed' set of numbers than your original, if you want that to work?
So you need to restrict your real numbers somehow, or store extra info that you don't show.
One simple way is to write a message like:
"my bid is: $14.23: aduigfurjwjnfdjfugfojdjkdskdfdhfddfuiodrnfnghfifyis"
All that junk is randomly-generated, and different every time.
Send the other person the SHA256 hash of the message. Have them send you the hash of their bid. Then, once you both have the hashes, send the full message, and confirm that their bid corresponds to the hash they gave you.
This gives rather stronger guarantees than you need - it's actually not possible from them to work out your bid before you send them your full message. However, there is no unseal() function as you describe.
This simple scheme has various weaknesses that a full zero-knowledge scheme would not have. For example, if they fake you out by sending you a random number instead of a hash, then they can work out your bid without revealing their own. But you didn't ask for bullet-proof. This prevents both accidental and (I think) undetectable cheating, and uses only a commonly-available command line utility, plus a random number generator (dice will do).
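The hash-then-reveal exchange can be sketched in Python with hashlib. The function names are hypothetical, and the random junk is generated with the secrets module rather than dice:

```python
import hashlib
import secrets


def make_commitment(bid_text):
    """Return (message, digest): keep message private, publish digest."""
    salt = secrets.token_hex(16)  # the random junk, different every time
    message = "my bid is: %s: %s" % (bid_text, salt)
    digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
    return message, digest


def verify_commitment(message, digest):
    """Check a revealed message against the digest received earlier."""
    actual = hashlib.sha256(message.encode("utf-8")).hexdigest()
    return secrets.compare_digest(actual, digest)


message, digest = make_commitment("$14.23")
print(verify_commitment(message, digest))                           # True
print(verify_commitment(message.replace("14.23", "9.99"), digest))  # False
```

Each side sends only the digest first, and reveals the full message once both digests have been exchanged.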
If, as you say, you want them to be able to recover your bid without any further input from you, and you are willing to trust them only to do it after posting their bid, then just encrypt using any old symmetric cipher (gpg --symmetric, perhaps) and the key, "rot13". This will prevent accidental cheating, but allow undetectable cheating.
One idea that popped into my mind was to maybe base your algorithm on the mathematics
used for secure key sharing.
If you want to give two persons, Bob and Alice, half a key each so
that only when combining them they will be able to open whatever the key locks, how do you do that? The solution to this comes from mathematics. Say you have two points A (-2,2) and B (2,0) in a x/y coordinate system.
               |
       A       +
               |
               C
               |
---+---+---+---|---+---B---+---+---+---
               |
               +
               |
               +
If you draw a straight line between them it will cross the y axis at exactly one single point, C (0,1).
If you only know one of the points A or B it is impossible to tell where it will cross.
Thus you can let the points A and B be the shared keys which when combined will reveal the y-value
of the crossing point (i.e. 1 in this example) and this value is then typically used as
a real key for something.
For your bidding application you could let seal() and unseal() swap the y-value between the C and B points
(deterministic) but have the A point vary from time to time.
This way seal(y-value of point B) will give completely different results depending on point A,
but unseal(seal(y-value of point B)) should return the y-value of B which is what you ask for.
PS
It is not required to have A and B on different sides of the y-axis, but is much simpler conceptually to think of it this way (and I recommend implementing it that way as well).
With this straight line you can then share keys between several persons so that only two of
them are needed to unlock whatever. It is possible to use curve types other than straight lines to create other
key sharing properties (i.e. 3 out of 3 keys are required etc).
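The geometric idea can be sketched in Python. The helper names and the fixed x positions -2 and 2 are assumptions taken from the example points; Fraction keeps the arithmetic exact:

```python
from fractions import Fraction


def make_shares(secret, slope):
    """Two points on the line y = slope * x + secret, at x = -2 and x = 2."""
    return (-2, secret - 2 * slope), (2, secret + 2 * slope)


def y_intercept(point_a, point_b):
    """Recover the value where the line through both points crosses x = 0."""
    (xa, ya), (xb, yb) = point_a, point_b
    if xa == xb:
        raise ValueError("points must have different x values")
    slope = Fraction(yb - ya, xb - xa)
    return ya - slope * xa


print(y_intercept((-2, 2), (2, 0)))  # 1

a, b = make_shares(1, Fraction(-1, 2))
print(a == (-2, 2), b == (2, 0))     # True True
```

Varying the slope gives different points A and B for the same crossing value, which is what makes the sealed output differ from one call to the next.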
Pseudo code:
encode:
value = 2000
key = random(0..255); // our key is only 1 byte
// 'sealing it'
value = value XOR key;
// add key
sealed = (value << 16) | key
decode:
key = sealed & 0xFF
unsealed = key XOR (sealed >> 16)
Would that work?
Since it seems that you are assuming that the other person doesn't want to know your bid until after they've placed their own, and can be trusted not to cheat, you could try a variable rotation scheme:
from random import randint
def seal(input):
    # shift every character code by a random offset r and prepend r itself
    r = randint(0, 50)
    obfuscate = [str(r)] + [str(ord(c) + r) for c in '%s' % input]
    return ':'.join(obfuscate)
def unseal(input):
    tmp = input.split(':')
    r = int(tmp.pop(0))
    deobfuscate = [chr(int(c) - r) for c in tmp]
    return ''.join(deobfuscate)
# I suppose you would put your bid in here, for 100 dollars
tmp = seal('$100.00') # --> '1:37:50:49:49:47:49:49' (output varies)
print(unseal(tmp)) # --> '$100.00'
At some point (I think we may have already passed it) this becomes silly, and because it is so easy, you should just use simple encryption, where the message recipient always knows the key - the person's username, perhaps.
If the bids are fairly large numbers, how about a bitwise XOR with some predetermined random-ish number? XORing again will then retrieve the original value.
You can change the number as often as you like, as long as both client and server know it.
You could set a different base (like 16, 17, 18, etc.) and keep track of which base you've "sealed" the bid with...
Of course, this presumes large numbers (> the base you're using, at least). If they were decimal, you could drop the point (for example, 27.04 becomes 2704, which you then translate to base 29...)
You'd probably want to use base 17 to 36 (only because some people might recognize hex and be able to translate it in their head...)
This way, you would have numbers like G4 or Z3 or KW (depending on the numbers you're sealing)...
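A small Python sketch of the base-change idea. to_base is a hypothetical helper; decoding uses the built-in int(text, base), which accepts bases up to 36:

```python
import string

DIGITS = string.digits + string.ascii_uppercase  # 0-9 then A-Z


def to_base(number, base):
    """Write a non-negative integer in the given base (2 to 36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    out = []
    while number:
        number, remainder = divmod(number, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


sealed = to_base(2704, 29)  # 27.04 with the point dropped, in base 29
print(sealed)               # 367
print(int(sealed, 29))      # 2704
```

As the example shows, small numbers can come out as plain-looking digits, so the result is not always obviously "sealed".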
Here's a cheap way to piggyback off rot13:
Assume we have a function gibberish() that generates something like "fdjk alqef lwwqisvz" and a function words(x) that converts a number x to words, eg, words(42) returns "forty two" (no hyphens).
Then define
seal(x) = rot13(gibberish() + words(x) + gibberish())
and
unseal(x) = rot13(x)
Of course the output of unseal is not an actual number and is only useful to a human, but that might be ok.
You could make it a little more sophisticated with words-to-number function that would also just throw away all the gibberish words (defined as anything that's not one of the number words -- there are less than a hundred of those, I think).
Sanity check:
> seal(7)
fhrlls hqufw huqfha frira afsb ht ahuqw ajaijzji
> seal(7)
qbua adfshua hqgya ubiwi ahp wqwia qhu frira wge
> unseal(seal(7))
sueyyf udhsj seven ahkua snsfo ug nuhdj nwnvwmwv
I know this is silly but it's a way to do it "by hand" if all you have is rot13 available.
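For what it's worth, here is a rough Python sketch of the idea, using the standard rot13 codec. It assumes a tiny words() vocabulary covering only 0 to 9 (NUMBER_WORDS is made up for the example), and its unseal also does the "throw away the gibberish" step:

```python
import codecs
import random
import string

# Hypothetical tiny vocabulary; a real words() would cover larger numbers.
NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def gibberish(count=3):
    """Random lowercase 'words' that are never number words."""
    words = []
    while len(words) < count:
        length = random.randint(2, 8)
        word = "".join(random.choice(string.ascii_lowercase)
                       for _ in range(length))
        if word not in NUMBER_WORDS:
            words.append(word)
    return " ".join(words)


def seal(digit):
    """rot13 of: gibberish + number word + gibberish."""
    plain = " ".join([gibberish(), NUMBER_WORDS[digit], gibberish()])
    return codecs.encode(plain, "rot13")


def unseal(sealed):
    """Undo rot13, drop the gibberish, and map the word back to a number."""
    plain = codecs.decode(sealed, "rot13")
    found = [w for w in plain.split() if w in NUMBER_WORDS]
    return NUMBER_WORDS.index(found[0])


print(unseal(seal(7)))  # 7
```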