formatting numbers in different culture - windows-phone-8

I have below XAML
<TextBlock HorizontalAlignment="{Binding AyaHorizentalAlignment}" Padding="20,0,30,0" Text="{Binding Aya}" Foreground="Black" FontSize="50" TextWrapping="Wrap" />
and Code behind
System.Globalization.CultureInfo culture = new System.Globalization.CultureInfo("fa-IR");
for (int rowIndex = startingAya; rowIndex < totalAyas; rowIndex++)
{
quranTextByLine = Regex.Replace(reader.ReadLine(), @"[|\d|]", string.Empty) + "﴿" + string.Format(culture,"{0}",counter++) + "﴾";
quranTranslationByLine = Regex.Replace(translationReader.ReadLine(), @"[|\d|]", string.Empty);
_sura.Add(new Sura() { Aya = quranTextByLine, AyaTranslation = quranTranslationByLine });
}
I want to show "counter" in Farsi/Arabic digits, for example "۱۲۳۴۵۶۷۸۹", but I don't know how to format "counter". (I also have the fonts available.)
Thanks,

The 'full' .NET Framework contains the property NumberFormatInfo.DigitSubstitution. If you were developing in .NET, this would be the property you would use if you wanted numbers to be formatted using Arabic-Indic digits. However, this property isn't available in the Windows Phone version of the framework.
When a property or method exists in the full .NET Framework but is missing from Windows Phone, the underlying functionality is normally missing from the Windows Phone version of the framework as well. As a result, you will have to convert numbers to strings using Arabic-Indic digits yourself.
It's possible to write a conversion method that converts a number to a string using Latin digits, and then replaces the Latin digits with Arabic-Indic digits, for example:
private const char LatinDigitZero = '0';
private const char ArabicIndicDigitZero = '\u0660';
public static string ConvertToArabicIndicString(int number)
{
string result = string.Format("{0:d}", number);
for (int digit = 0; digit <= 9; ++digit)
{
result = result.Replace((char)(LatinDigitZero + digit), (char)(ArabicIndicDigitZero + digit));
}
return result;
}
This could easily be adapted to take a CultureInfo object as an additional parameter.
This uses the fact that the Arabic-Indic digits corresponding to the Latin digits 0 to 9 are Unicode characters \u0660 to \u0669, respectively.
I gave this a quick test and as far as I could tell it worked. However, I'm not at all fluent in any language that uses Arabic-Indic digits so I can't say whether this method converts the number to a string correctly. In particular, I don't know whether this method generates a number with the digits in the right order or handles negative numbers correctly. Hopefully, any necessary such modifications should be easy for you to make.
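One detail worth checking against the question: the answer maps to U+0660 to U+0669 (Arabic-Indic), while the digits shown in the question, "۱۲۳۴۵۶۷۸۹", are the Extended Arabic-Indic forms used for Persian, which sit at U+06F0 to U+06F9. The same offset trick works with either zero as the base. A minimal sketch of that idea, written in JavaScript rather than the answer's C#; the helper name toLocalDigits and the constant names are hypothetical:

```javascript
// Code points of the digit zero in each block of the Unicode charts.
const ARABIC_INDIC_ZERO = 0x0660;          // used for Arabic
const EXTENDED_ARABIC_INDIC_ZERO = 0x06f0; // used for Persian (Farsi)

// Hypothetical helper: formats an integer with Latin digits, then swaps each
// digit for the character at the same offset from zeroCodePoint.
// A leading minus sign is left untouched.
function toLocalDigits(number, zeroCodePoint) {
  return String(Math.trunc(number)).replace(/[0-9]/g, (digit) =>
    String.fromCharCode(zeroCodePoint + (digit.charCodeAt(0) - 0x30))
  );
}

console.log(toLocalDigits(123, EXTENDED_ARABIC_INDIC_ZERO)); // ۱۲۳
console.log(toLocalDigits(123, ARABIC_INDIC_ZERO));
```

The result keeps the most significant digit first in memory, the same logical order as the Latin string it was built from.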


Can Tesseract OCR recognize subscripts and superscripts?

I have problems with the general recognition of subscript and superscript in text fragments.
Example-image:
I used Tesseract 4.1.1 with the training data available under https://github.com/tesseract-ocr/tessdata_best. The numerous options had default values except:
tessedit_create_hocr = 1 (to get result as HOCR)
hocr_font_info = 1 (to get additional font infos like font size)
hocr_char_boxes = 1 (to get character-based result)
The language was set to eng. The subscripts/superscripts were not recognized correctly with page segmentation mode 3 (PSM_AUTO_OSD), 11 (PSM_SPARSE_TEXT), or 12 (PSM_SPARSE_TEXT_OSD).
In the output the sub/sup-fragments were all more or less wrong:
"SubtextSub" is recognized as "Subtextsu,"
"SuptextSub" is recognized as "Suptexts?"
"P0" is recognized as "Po"
"P100" is recognized as "P1go"
"a2+b2" is recognized as "a+b?"
Using Tesseract for OCR, is there a way to:
optimize subscript/superscript handling
get information about recognized subscripts/superscripts (in the hOCR output, ideally for each character)
Working on the quality of the image, as suggested in other questions/answers on this topic, didn't really change anything.
Following these two links from the Tesseract Google group, it at first really seemed to be a question of training:
link1 and link2.
But after some experiments I found that the OEM_DEFAULT OCR engine mode just doesn't provide the needed information. I found a partial solution to the problem. It is partial because I now get most of the information about sub/superscripts, and the recognized characters are right in most cases, but not for all characters.
Using the OEM_TESSERACT_ONLY OCR engine mode (the legacy mode) and some API methods provided by Tess4J, I came up with the following Java test class:
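// Imports assumed for Tess4J and JNA; adjust to the Tess4J version in use.
import static net.sourceforge.tess4j.TessAPI1.*;

import java.awt.image.BufferedImage;
import java.io.File;
import java.nio.IntBuffer;

import com.sun.jna.Pointer;
import net.sourceforge.tess4j.ITessAPI.*;
import net.sourceforge.tess4j.TessAPI1;
import net.sourceforge.tess4j.util.ImageIOHelper;

public class SubSupEvaluator {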
public void determineSubSupCharacters(BufferedImage image) {
//1. initialize Tesseract and set image infos
TessBaseAPI handle = TessAPI1.TessBaseAPICreate();
try {
int bpp = image.getColorModel().getPixelSize();
int bytespp = bpp / 8;
int bytespl = (int) Math.ceil(image.getWidth() * bpp / 8.0);
TessBaseAPIInit2(handle, new File("./tessdata/").getAbsolutePath(), "eng", TessOcrEngineMode.OEM_TESSERACT_ONLY);
TessBaseAPISetPageSegMode(handle, TessPageSegMode.PSM_AUTO_OSD);
TessBaseAPISetImage(handle, ImageIOHelper.convertImageData(image), image.getWidth(), image.getHeight(), bytespp, bytespl);
//2. start actual OCR run
TessBaseAPIRecognize(handle, null);
//3. iterate over the result character-wise
TessResultIterator ri = TessBaseAPIGetIterator(handle);
TessPageIterator pi = TessResultIteratorGetPageIterator(ri);
TessPageIteratorBegin(pi);
do {
//determine character
Pointer ptr = TessResultIteratorGetUTF8Text(ri, TessPageIteratorLevel.RIL_SYMBOL);
String character = ptr.getString(0);
TessDeleteText(ptr); //release memory
//determine position information
IntBuffer leftB = IntBuffer.allocate(1);
IntBuffer topB = IntBuffer.allocate(1);
IntBuffer rightB = IntBuffer.allocate(1);
IntBuffer bottomB = IntBuffer.allocate(1);
TessPageIteratorBoundingBox(pi, TessPageIteratorLevel.RIL_SYMBOL, leftB, topB, rightB, bottomB);
//write info to console
System.out.println(String.format("%s - position [%d %d %d %d], subscript: %b, superscript: %b", character, leftB.get(), topB.get(),
rightB.get(), bottomB.get(), TessAPI1.TessResultIteratorSymbolIsSubscript(ri) == TessAPI1.TRUE,
TessAPI1.TessResultIteratorSymbolIsSuperscript(ri) == TessAPI1.TRUE));
} while (TessPageIteratorNext(pi, TessPageIteratorLevel.RIL_SYMBOL) == TessAPI1.TRUE);
} finally {
TessBaseAPIDelete(handle); //release memory
}
}
}
The legacy mode only works with the 'normal' training data. Using the '-best' training data raises an error.
There is very little information on this topic.
One option to improve sub/superscript character recognition (even if not the position itself) is to preprocess the image, e.g. with cv2 or PIL (Pillow), and then run Tesseract on it.
See
How to detect subscript numbers in an image using OCR?
Related (but otherwise not answering the question):
https://www.mail-archive.com/tesseract-ocr@googlegroups.com/msg19434.html
https://github.com/tesseract-ocr/tesseract/blob/master/src/ccmain/superscript.cpp
What do you think about getting Tesseract to recognize single letters?
Tesseract does not recognize single characters
I tried it with the option --psm 10
tesseract imTstg.png out5 --psm 10
but it did not seem to work. I am thinking about just running YOLO to detect the single letters.

ActionScript3 - add thousands separator to negative values

This question relates to an animated map template which we have developed at the UK's Office for National Statistics. It has been applied to many datasets and geographies without problem. For example:
http://www.ons.gov.uk/ons/interactive/vp3-census-map/index.html
http://www.statistica.md/pageview.php?l=ro&idc=390&id=3807
The .fla calls on a supporting .as file (see below) to introduce a thousands separator (in the UK a comma, in Germany a full stop (period)), which is defined elsewhere.
However, the dataset I am currently mapping has large negative values, and it turns out that the ORIGINAL HELPER FUNCTION below does not like negative values with 3, 6, 9 or 12 (etc.) digits.
-100 to -999, for instance, are rendered as NaN,100 to NaN,999.
This is because such values are treated as being 4 characters long: they are split, the comma is introduced, and the minus sign is misinterpreted.
I reckon the approach must be to use absolute values, add in the comma and then (for the negative values) add the minus sign back in afterwards. But so far, trials of the ADAPTED HELPER FUNCTION have produced only errors. :-(
Can anyone tell me how to put the minus sign back in, please?
Many thanks.
Bruce Mitchell
==================================================================================
//ORIGINAL HELPER FUNCTION: ACCEPTS A NUMBER AND RETURNS A STRING WITH THOUSANDS SEPARATOR ATTACHED IF NECESSARY
function addThouSep(num) {
/*
a. Acquire the number - 'myTrendValue' or 'myDataValue' - from function calcValues
b. Record it (still as a number) to data precision.
1. Turn dataORtrend into a string
2. See if there is a decimal in it.
3. If there isn't, just run the normal addThouSep.
4. If there is, run addThouSep just on the first bit of the string - then add the decimal back on again at the end.
*/
var myNum:Number = correctFPE(num); // Create number variable myNum and populate it with 'num'
// (myTrendValue or myDataValue from calcValues function) passed thru 'correctFPE'
var strNum:String = myNum+""; // Create string version of the dataORtrend number - so instead of 63, you get '63'
var myArray = strNum.split("."); // Create array representing elements of strNum, split by decimal point.
//trace(myArray.length); // How long is the array?
if (myArray.length==1) { // Integer, no decimal.
if (strNum.length < 4)//999 doesn't need a comma.
return strNum;
return addThouSep(strNum.slice(0, -3))+xmlData.thouSep+strNum.slice(-3);
}
else { // Float, with decimal
if (myArray[0].length < 4)//999 doesn't need a comma
return strNum;
return (addThouSep(myArray[0].slice(0, -3))+xmlData.thouSep+myArray[0].slice(-3)+"."+myArray[1]);
}
}
==================================================================================
//ADAPTED HELPER FUNCTION: ACCEPTS A NUMBER AND RETURNS A STRING WITH THOUSANDS SEPARATOR ATTACHED IF NECESSARY
function addThouSep(num) {
/*
a. Acquire the number - 'myTrendValue' or 'myDataValue' - from function calcValues
b. Record it (still as a number) to data precision.
1. Turn dataORtrend into a string
2. See if there is a decimal in it.
3. If there isn't, just run the normal addThouSep.
4. If there is, run addThouSep just on the first bit of the string - then add the decimal back on again at the end.
*/
var myNum:Number = correctFPE(num); // Create number variable myNum and populate it with 'num'
// (myTrendValue or myDataValue from calcValues function) passed thru 'correctFPE'
var myAbsNum:Number = Math.abs(myNum); // ABSOLUTE value of myNum
var strNum:String = myAbsNum+""; // Create string version of the dataORtrend number - so instead of 63, you get '63'
var myArray = strNum.split("."); // Create array representing elements of strNum, split by decimal point.
//trace(myArray.length); // How long is the array?
if (myNum <0){ // negatives
if (myArray.length==1) { // Integer, no decimal.
if (strNum.length < 4)//999 doesn't need a comma.
return strNum;
return addThouSep(strNum.slice(0, -3))+xmlData.thouSep+strNum.slice(-3);
}
else { // Float, with decimal
if (myArray[0].length < 4)//999 doesn't need a comma
return strNum;
return (addThouSep(myArray[0].slice(0, -3))+xmlData.thouSep+myArray[0].slice(-3)+"."+myArray[1]);
}
}
else // positive
if (myArray.length==1) { // Integer, no decimal.
if (strNum.length < 4)//999 doesn't need a comma.
return strNum;
return addThouSep(strNum.slice(0, -3))+xmlData.thouSep+strNum.slice(-3);
}
else { // Float, with decimal
if (myArray[0].length < 4)//999 doesn't need a comma
return strNum;
return (addThouSep(myArray[0].slice(0, -3))+xmlData.thouSep+myArray[0].slice(-3)+"."+myArray[1]);
}
}
==================================================================================
If you're adding commas often (or need to support numbers with decimals) then you may want a highly optimized utility function and go with straightforward string manipulation:
public static function commaify( input:Number ):String
{
var split:Array = input.toString().split( '.' ),
front:String = split[0],
back:String = ( split.length > 1 ) ? "." + split[1] : null,
pos:int = input < 0 ? 2 : 1,
commas:int = Math.floor( (front.length - pos) / 3 ),
i:int = 1;
for ( ; i <= commas; i++ )
{
pos = front.length - (3 * i + i - 1);
front = front.slice( 0, pos ) + "," + front.slice( pos );
}
if ( back )
return front + back;
else
return front;
}
While less elegant, it's stable and performant. You can find a comparison suite in my answer to a similar question: https://stackoverflow.com/a/13410560/934195
Why not use something simple like this function I've made?
function numberFormat(input:Number):String
{
var base:String = input.toString();
base = base.split("").reverse().join("");
base = base.replace(/\d{3}(?=\d)/g, "$&,");
return base.split("").reverse().join("");
}
Tests:
trace( numberFormat(-100) ); // -100
trace( numberFormat(5000) ); // 5,000
trace( numberFormat(-85600) ); // -85,600
Explanation:
Convert the input number to a string.
Reverse it.
Use .replace() to find all occurrences of three digits followed by another digit. We use $&, as the replacement, which means each match is replaced with the matched text itself plus a comma.
Reverse the string again and return it.
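The steps above can be combined with the approach described in the question (work on the absolute value, then put the sign back) so that decimals are handled as well. This is a minimal sketch in JavaScript rather than ActionScript 3, under the assumption that the separator is passed in as a parameter instead of being read from xmlData.thouSep; the function name groupThousands is hypothetical:

```javascript
// Hypothetical helper: groups the integer part of a number in threes.
function groupThousands(input, separator = ",") {
  const sign = input < 0 ? "-" : "";
  // Work on the absolute value so the minus sign never counts as a digit.
  const [whole, fraction] = String(Math.abs(input)).split(".");
  // Reverse, add a separator after every three digits that are followed by
  // another digit, then reverse back.
  const reversed = whole.split("").reverse().join("");
  const grouped = reversed.replace(/\d{3}(?=\d)/g, (match) => match + separator);
  const result = grouped.split("").reverse().join("");
  // Re-attach the sign and, if present, the untouched fractional part.
  return sign + result + (fraction === undefined ? "" : "." + fraction);
}

console.log(groupThousands(-100));       // -100
console.log(groupThousands(-85600));     // -85,600
console.log(groupThousands(1234567.25)); // 1,234,567.25
```

Splitting off the fraction first matters because the reverse-and-replace step would otherwise also insert separators into the digits after the decimal point.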
Did you try using the built-in number formatting options that support localized number values?
Localized Formatting with NumberFormatter

Convert a signed int to 8 digit hex in flex

How can I convert an int into an 8-digit hexadecimal string in Flex?
I need a function similar to C#'s ToString("X8"), which does the job in C#.
What is the equivalent in Flex?
As described in the docs, it's pretty much the same:
var myInt:int = 255;
var hex:String = myInt.toString(16);
trace(hex); //outputs "ff"
See http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/int.html#toString()
If it's colors you're after: the docs describe how to handle that case too.
There is however no built-in way to add the leading zeros. You can use a method like this one to do that:
public function pad(s:String, pattern:String="0", minChars:int=8):String {
while (s.length < minChars) s = pattern + s;
return s;
}
trace(pad(hex)); //000000ff
Note: this is for 6 digit hex colors but could easily be modified to any number of hex digits.
I found a lot of ways of outputting padded hex values that relied heavily on string padding.
I wasn't really happy with any of those, so this is what I came up with (as a bonus, it fits on one line). You could even shorten it by removing the toUpperCase() call, as case is irrelevant for hex digits.
"0x"+ (i+0x1000000).toString(16).substr(1,6).toUpperCase()
If you want to clamp the value to the range between black and white and put that in a function:
public static function toHexColor(i:Number):String {
return i<0 ? "0x000000" : i>0xFFFFFF ? "0xFFFFFF" : "0x"+ (i+0x1000000).toString(16).substr(1,6).toUpperCase() ;
}
Here is a more expanded version with comments:
public static function toHexColor(i:Number):String {
//enforce ceiling and floor
if(i>0xFFFFFF){ return "0xFFFFFF";}
if(i<0){return "0x000000";}
//add the "magic" number
i += 0x1000000;
//append the 0x and strip the extra 1
return "0x"+ i.toString(16).substr(1,6).toUpperCase();
}
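The question asks about signed values, which the snippets above either do not address or clamp to zero. C#'s ToString("X8") prints a negative int as its 32-bit two's complement pattern, so -1 becomes FFFFFFFF. A minimal sketch of that behavior in JavaScript (not ActionScript 3, so treat the port as an assumption); the function name toHex8 is hypothetical:

```javascript
// Hypothetical helper: 8-digit upper-case hex of a 32-bit signed integer.
function toHex8(value) {
  // ">>> 0" reinterprets the 32-bit pattern as unsigned, so negative inputs
  // yield their two's complement digits instead of a leading minus sign.
  return (value >>> 0).toString(16).toUpperCase().padStart(8, "0");
}

console.log(toHex8(255));  // 000000FF
console.log(toHex8(-1));   // FFFFFFFF
console.log(toHex8(-255)); // FFFFFF01
```

Without the unsigned shift, JavaScript's (-255).toString(16) returns "-ff", which padding alone cannot turn into the C# result.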

Are there functions in ColdFusion to get just 2 lines of text from a string?

I know this works in other languages, but wanted to see if there is existing code/functions.
This string can be populated from numerous different queries, but they need to be all displayed the same way, same length etc.
I have a function, to control string length by word count, but I would prefer to make sure that I have at least 2 sentences or 2 lines of text at most.
Thanks
I had a similar task at my job. You have to pick an arbitrary number, and it looks like you've chosen 190. That being said, you can't just hope that the characters/words returned are relevant. You have to ensure that they are if it's something you care about, which it seems like you do, looking at your comments.
Try to find the keyword in the string and use the mid() function to get a certain number of characters on either side of the keyword:
<cfscript>
max_chars = 190;
full_article = #the full article#;
keyword_position = find(keyword, full_article);
if( keyword_position != 0 ) {
excerpt = mid(full_article,
keyword_position + len(keyword) / 2 - max_chars / 2, // start so that the keyword sits mid-excerpt
max_chars);
}
</cfscript>
...or something like that. I'll leave it to you to make sure that you're not trying to get characters before the start of the full_article, or after the end of it, and adding ellipses and stuff.
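The bounds checking and ellipses left as an exercise above could look roughly like this. It is a minimal sketch in JavaScript rather than CFML, with 0-based indexes instead of ColdFusion's 1-based ones; the function name excerptAround is hypothetical and the default of 190 is taken from the answer:

```javascript
// Hypothetical helper: returns up to maxChars characters centered on keyword.
function excerptAround(text, keyword, maxChars = 190) {
  const index = text.indexOf(keyword);
  if (index === -1) {
    return null; // keyword not present; the caller decides on a fallback
  }
  const center = index + Math.floor(keyword.length / 2);
  // Clamp the window so it never starts before 0 or runs past the end.
  let start = Math.max(0, center - Math.floor(maxChars / 2));
  start = Math.min(start, Math.max(0, text.length - maxChars));
  const end = Math.min(text.length, start + maxChars);
  // Mark each side that was cut off.
  return (start > 0 ? "..." : "") + text.slice(start, end) + (end < text.length ? "..." : "");
}

const article = "a".repeat(300) + "KEY" + "b".repeat(300);
console.log(excerptAround(article, "KEY", 11)); // ...aaaaKEYbbbb...
```

When the keyword is near the start or end of the text, the window slides rather than shrinks, so the excerpt keeps its full length whenever the text is long enough.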
Try something like fullLeft, or dig through the other string manipulation UDFs at CFLib. If you're looking for something more specific, could you show us a comparable function in another language? Then we'd be better able to point you to something similar.
_TestString = "I know this works in other languages, but wanted to see if there is existing code/functions. This string can be populated from numerous different queries, but they need to be";
if ( len(_TestString) GT 190)
{
_TestString = Left(_TestString,190) & "...";
}
That will output:
I know this works in other languages, but wanted to see if there is existing code/functions. This string can be populated from numerous different queries, but they need to be all displayed t...
You probably don't want to do anything more than that. String manipulation can get expensive for no reason, and you shouldn't waste processing on the display layer unless you have to.
CFLIB has plenty of string manipulation functions on offer. You may find abbreviate() is useful, especially for search results: http://cflib.org/udf/abbreviate
<cfscript>
/**
* Abbreviates a given string to roughly the given length, stripping any tags, making sure the ending doesn't chop a word in two, and adding an ellipsis character at the end.
* Fix by Patrick McElhaney
* v3 by Ken Fricklas kenf@accessnet.net, takes care of too many spaces in text.
*
* @param string String to use. (Required)
* @param len Length to use. (Required)
* @return Returns a string.
* @author Gyrus (kenf@accessnet.net, gyrus@norlonto.net)
* @version 3, September 6, 2005
*/
function abbreviate(string,len) {
var newString = REReplace(string, "<[^>]*>", " ", "ALL");
var lastSpace = 0;
newString = REReplace(newString, " \s*", " ", "ALL");
if (len(newString) gt len) {
newString = left(newString, len-2);
lastSpace = find(" ", reverse(newString));
lastSpace = len(newString) - lastSpace;
newString = left(newString, lastSpace) & " &##8230;";
}
return newString;
}
</cfscript>
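For readers who do not use CFML, the idea behind abbreviate() (strip tags, collapse whitespace, cut at the last word boundary, append an ellipsis) can be sketched in JavaScript. This is not a line-by-line port of the UDF: it trims the text, cuts one character short of the limit, and appends the ellipsis character directly instead of an HTML entity, all of which are choices made for this illustration:

```javascript
// Illustrative re-implementation of the idea, not the CFLib UDF itself.
function abbreviate(text, maxLength) {
  // Replace tags with a space, then collapse runs of whitespace.
  let result = text.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
  if (result.length > maxLength) {
    // Leave room for the ellipsis, then back up to the last full word.
    result = result.slice(0, maxLength - 1);
    const lastSpace = result.lastIndexOf(" ");
    if (lastSpace > 0) {
      result = result.slice(0, lastSpace);
    }
    result += "\u2026";
  }
  return result;
}

console.log(abbreviate("<b>Hello</b> wonderful world", 12)); // Hello…
```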

Unicode, VBScript and HTML

I have the following radio box:
<input type="radio" value="&#39321;">&#39321;</input>
As you can see, the value is a Unicode numeric character reference. It represents the following Chinese character: 香
So far so good.
I have a VBScript that reads the value of that particular radio button and saves it into a variable. When I display the content with a message box, the Chinese Character appears. Additionally I have a variable called uniVal where I assign the unicode of the Chinese character directly:
radioVal = < read value of radio button >
MsgBox radioVal ' yields chinese character
uniVal = "&#39321;"
MsgBox uniVal ' yields unicode representation
Is there a way to read the radio box value such that the Unicode string is preserved and NOT interpreted as the Chinese character?
For sure, I could try to recreate the Unicode reference of the character, but the methods I found in VBScript are not working correctly due to VBScript's implicit UTF-16 setting (instead of UTF-8). So the following method does not work correctly for all characters:
Function StringToUnicode(str)
result = ""
For x=1 To Len(str)
result = result & "&#"&ascw(Mid(str, x, 1))&";"
Next
StringToUnicode = result
End Function
Cheers
Chris
I found a solution:
The following JavaScript function actually works:
function convert(value) {
var tstr = value;
var bstr = '';
for (var i = 0; i < tstr.length; i++) { // declare i locally instead of leaking a global
if(tstr.charCodeAt(i)>127)
{
bstr += '&#' + tstr.charCodeAt(i) + ';';
}
else
{
bstr += tstr.charAt(i);
}
}
return bstr;
}
I call this function from my VBScript... :)
Here is a VBScript function that will always return a positive value for the Unicode code point of a given character:
Function PositiveUnicode(s)
Dim val : val = AscW(s)
If (val And &h8000) <> 0 Then
PositiveUnicode = (val And &h7FFF) + &h8000&
Else
PositiveUnicode = CLng(val)
End If
End Function
This will save you loading two script engines to achieve a simple operation.
"not working correctly due to VBScript's implicit UTF-16 setting (instead of UTF-8)."
This issue has nothing to do with UTF-8. It is purely the result of AscW's use of the signed integer type.
As to why you have to recreate the &#xxxxx; encodings that you sent: this is a result of how HTML (and XML) work. The use of this character reference is a convenience that the specification does not require to remain intact. Since the character encoding of the document is quite capable of representing that character, the DOM is at liberty to convert it.
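To make the signed-integer point concrete: 香 is U+9999, which is 39321 in decimal. That is above 32767, so read as a signed 16-bit integer it comes out as -26215. A small JavaScript sketch of the round trip; the helper names toSigned16 and toUnsigned16 are hypothetical:

```javascript
// Reinterprets a UTF-16 code unit as a signed 16-bit integer,
// which is the form in which AscW reports it.
function toSigned16(codeUnit) {
  return (codeUnit << 16) >> 16;
}

// Reverses that: masks a signed 16-bit value back to 0..65535.
function toUnsigned16(value) {
  return value & 0xffff;
}

const codeUnit = "\u9999".charCodeAt(0); // the character from the question
console.log(codeUnit);                   // 39321
console.log(toSigned16(codeUnit));       // -26215
console.log(toUnsigned16(-26215));       // 39321
```

Masking with 0xFFFF has the same effect as the And/add arithmetic in PositiveUnicode above.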