Actionscript problems with social share encoding - actionscript-3

I'm trying to make some "social share" buttons on my site, but the URLs I generate just don't get decoded by these services.
One example, for Twitter:
private function twitter(e:Event):void {
    var message:String = "Message with special chars âõáà";
    var url:String = "http://www.twitter.com/home?status=";
    var link:URLRequest = new URLRequest( url + escape(message) );
}
But when Twitter opens up, the message is:
Message with special chars %E2%F5%E1%E0
Something similar is happening with Facebook and Orkut (but these two hide the special chars).
Does anyone know why this is happening?

The problem is that the escape() function doesn't take UTF-8 encoding into account. The function you want for encoding the query string using UTF-8 is encodeURIComponent().
So, let's say you have an "ñ" (eñe in Spanish, or n plus tilde). I'm using "ñ" because I remember both its code point and its UTF-8 representation, since I always use it for debugging, but the same applies to any other non-ASCII character.
Say you have the string "Año" ("year" in Spanish, by the way).
The code points (both in Unicode and iso-8859-1) are:
A: 0x41
ñ: 0xf1
o: 0x6f
If you call escape(), you'll get this:
A: A
ñ: %F1
o: o
"A" and "o" don't need to be encoded. The "ñ" is encoded as "%" plus its code point, which is 0xf1.
But Twitter, Facebook, etc. expect UTF-8. The single byte 0xf1 is not a valid UTF-8 sequence; in UTF-8, "ñ" is represented by a two-byte sequence. Meaning, "ñ" should be encoded as:
0xC3
0xB1
This is what encodeURIComponent() does. It will encode "Año" this way:
A: A
ñ: %C3%B1
o: o
So, to sum up, instead of this:
var link:URLRequest = new URLRequest( url + escape(message) );
try this
var link:URLRequest = new URLRequest( url + encodeURIComponent(message) );
And it should work fine.
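The difference can be checked outside Flash as well. The following is a minimal sketch in JavaScript (not ActionScript), assuming a runtime such as Node.js where the legacy global escape() is still available; the variable names are made up for illustration.

```javascript
// Compare the legacy escape() with encodeURIComponent() on the example word.
const word = "Año";

// escape() emits one %XX per Latin-1 code point: "ñ" (0xF1) becomes %F1.
const legacy = escape(word);

// encodeURIComponent() percent-encodes the UTF-8 bytes: "ñ" becomes %C3%B1.
const utf8 = encodeURIComponent(word);

console.log(legacy); // A%F1o
console.log(utf8);   // A%C3%B1o

// Building the share URL from the question with UTF-8 percent-encoding.
const base = "http://www.twitter.com/home?status=";
const shareUrl = base + encodeURIComponent("Message with special chars âõáà");
console.log(shareUrl);
```

Each of the accented characters in the message ends up as two percent-encoded bytes, which is the form the receiving services are expected to decode.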

Related

Write a CSV file in Shift-JIS (MFC VC++, Windows Embedded - WinCE)

As the title says, I have been trying to write data that the user enters into a CEdit control to a file.
The system is a handheld terminal running Windows CE, in which my test application is running, and I try to enter test data (Japanese characters in Romaji, Hiragana, Katakana and Kanji mixed along with normal English alphanumeric data) that initially is displayed in a CListCtrl. The characters display properly on the handheld display screen in my test application UI.
Finally, I try to read back the data from the List control and write it to a text CSV file. The data I get on reading back from the control is correct, but on writing it to the CSV, things mess up and my CSV file is unreadable and shows strange symbols and nonsense alphanumeric garbage.
I searched for this and ended up at a similar question on Stack Overflow:
UTF-8, CString and CFile? (C++, MFC)
I tried some of their suggestions and finally ended up with a proper UTF-8 CSV file.
The write-to-csv-file code goes like this:
CStdioFile cCsvFile = CStdioFile();
cCsvFile.Open(cFileName, CFile::modeCreate|CFile::modeWrite);
char BOM[3]={0xEF, 0xBB, 0xBF}; // Utf-8 BOM
cCsvFile.Write(BOM,3); // Write the BOM first
for(int i = 0; i < M_cDataList.GetItemCount(); i++)
{
    CString cDataStr = _T("\"") + M_cDataList.GetItemText(i, 0) + _T("\",");
    cDataStr += _T("\"") + M_cDataList.GetItemText(i, 1) + _T("\",");
    cDataStr += _T("\"") + M_cDataList.GetItemText(i, 2) + _T("\"\r\n");
    CT2CA outputString(cDataStr, CP_UTF8);
    cCsvFile.Write(outputString, ::strlen(outputString));
}
cCsvFile.Close();
So far it is OK.
Now, for my use case, I would like to change things a bit such that the CSV file is encoded as Shift-JIS, not UTF-8.
For Shift-JIS, what BOM do I use, and what changes should I make to the above code?
Thank you for any suggestions and help.
The code page for Shift-JIS is apparently 932. Use WideCharToMultiByte and MultiByteToWideChar for conversion (the CW2A and CA2W helper classes wrap them). For example:
CStringW source = L"日本語ABC平仮名ABCひらがなABC片仮名ABCカタカナABC漢字ABC①";
CStringA destination = CW2A(source, 932);
CStringW convertBack = CA2W(destination, 932);
//Testing:
ASSERT(source == convertBack);
AfxMessageBox(convertBack);
As far as I can tell, there is no BOM for Shift-JIS. Perhaps you just want to work with UTF-16. For example:
CStdioFile file;
file.Open(L"utf16.txt", CFile::modeCreate | CFile::modeWrite| CFile::typeUnicode);
BYTE bom[2] = { 0xFF, 0xFE };
file.Write(bom, 2);
CString str = L"日本語";
file.WriteString(str);
file.Close();
P.S. According to this page, there are some differences between code page 932 and Shift-JIS, although I couldn't reproduce any errors.
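The BOM bytes used in the two snippets above (EF BB BF for UTF-8, FF FE for little-endian UTF-16) are simply the encoded forms of the character U+FEFF. A minimal sketch in JavaScript, assuming Node.js and its Buffer class (not MFC), shows where they come from:

```javascript
// U+FEFF is the byte order mark; its byte form depends on the encoding.
const BOM = "\uFEFF";

const utf8Bom = Buffer.from(BOM, "utf8");
const utf16leBom = Buffer.from(BOM, "utf16le");

console.log(utf8Bom);    // <Buffer ef bb bf>
console.log(utf16leBom); // <Buffer ff fe>

// A UTF-16LE payload for the three-character test string, BOM first.
const payload = Buffer.from(BOM + "日本語", "utf16le");
console.log(payload.length); // 8 (2 bytes of BOM + 3 characters * 2 bytes)
```

Shift-JIS has no such marker because it is a byte-oriented encoding with no byte-order ambiguity, so a Shift-JIS file would simply start with the data.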

Character encoding issue when using Google Apps Script to extract data from web page

I have written a script using Google Apps Script to extract text from a web page into Google Sheets. I only need this script to work with a specific web page, so it does not need to be versatile. The script works almost exactly as I want it to except that I have run into a character encoding problem. I am extracting both Hebrew and English text. The meta tag in the HTML has charset=Windows-1255. The English extracts perfectly, but the Hebrew displays as black diamonds containing a question mark.
I found this question that says to pass the data into a blob then use the getDataAsString method to convert to another encoding. I tried converting to different encodings and got different results. UTF-8 displays the black diamonds with question marks, UTF-16 displays Korean, ISO 8859-8 returns an error and says it's not a valid parameter, and the original Windows-1255 displays one Hebrew character but a bunch of other gibberish.
However, I am able to copy and paste the Hebrew text into Google Sheets manually and it displays correctly.
I have even tested passing Hebrew directly from Google Apps Script code like so:
function passHebrew() {
return "וַיְדַבֵּר";
}
This displays the Hebrew text properly on Google Sheets.
My code is as follows:
function parseText(book, chapter) {
    //var bk = book;
    //var ch = chapter;
    var bk = '04'; //hard-coded for testing purposes
    var ch = '01'; //hard-coded for testing purposes
    var url = 'http://www.mechon-mamre.org/p/pt/pt' + bk + ch + '.htm';
    var xml = UrlFetchApp.fetch(url).getContentText();
    //I had to "fix" these xml errors for XmlService.parse(xml) below
    //to function.
    xml = xml.replace('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">', '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">');
    xml = xml.replace('<LINK REL="stylesheet" HREF="p.css" TYPE="text/css">', '<LINK REL="stylesheet" HREF="p.css" TYPE="text/css"></LINK>');
    xml = xml.replace('<meta http-equiv="Content-Type" content="text/html; charset=Windows-1255">', '<meta http-equiv="Content-Type" content="text/html; charset=Windows-1255"></meta>');
    xml = xml.replace(/ALIGN=CENTER/gi, 'ALIGN="CENTER"');
    xml = xml.replace(/<BR>/gi, '<BR></BR>');
    xml = xml.replace(/class=h/gi, 'class="h"');
    //This section is the specific route to the table in the page I want
    var document = XmlService.parse(xml);
    var body = document.getRootElement().getChildren("BODY");
    var maintable = body[0].getChildren("TABLE");
    var maintablechildren = maintable[0].getChildren();
    //This creates a two-dimensional array so that I can store the Hebrew
    //in the first column and the English in the second column
    var array = new Array(maintablechildren.length);
    for (var i = 0; i < maintablechildren.length; i++) {
        array[i] = new Array(2);
    }
    //This is where the table gets parsed into the array
    for (var i = 0; i < maintablechildren.length; i++) {
        var verse = maintablechildren[i].getChildren();
        //This is where the encoding problem occurs.
        //I originally tried verse[0].getText() but it didn't work.
        array[i][0] = Utilities.newBlob(verse[0].getText()).getDataAsString('UTF-8');
        //This array receives the English text and works fine.
        array[i][1] = verse[1].getText();
    }
    return array;
}
What am I overlooking, misunderstanding, or doing wrong? I don't have a very good understanding of how encoding works so I don't understand why converting it to UTF-8 isn't working.
Your problem occurs before the lines you've commented as an encoding problem: the default encoding used by UrlFetchApp is munging the Unicode text from the start.
You should use the variation of the .getContentText() method that returns the content of an HTTP response encoded as a string of the given charset. For your case:
var xml = UrlFetchApp.fetch(url).getContentText("Windows-1255");
That should be all you need to change, and the blob() work-around is no longer needed. (It's harmless, though.) Other comments:
The logical OR operator (||) is very helpful for setting default values. I've tweaked the first few lines to enable testing but still let the function operate normally with arguments.
The way you're setting up an empty array before populating it with strings is Bad JavaScript; it's complex code that isn't needed, so toss it. Instead, we'll declare an empty array, then push() rows onto it.
The .replace() functions can be reduced with more clever RegExp use; I've included the URLs for demos of the really tricky ones.
There were \n newline characters in the text which I guessed were unnecessary for your purposes, so I added a replace() for them as well.
Here's what you're left with:
function parseText(book, chapter) {
    var bk = book || '04'; //default value for testing purposes
    var ch = chapter || '01'; //default value for testing purposes
    var url = 'http://www.mechon-mamre.org/p/pt/pt' + bk + ch + '.htm';
    var xml = UrlFetchApp.fetch(url).getContentText("Windows-1255");
    //I had to "fix" these xml errors for XmlService.parse(xml) below
    //to function.
    xml = xml.replace(/(<!DOCTYPE.*EN")>/gi, '$1 "">')
        .replace(/(<(LINK|meta).*>)/gi,'$1</$2>') // https://regex101.com/r/nH3pU8/1
        .replace(/(<.*?=)([^"']*?)([ >])/gi,'$1"$2"$3') // https://regex101.com/r/eP7wO7/1
        .replace(/<BR>/gi, '<BR/>')
        .replace(/\n/g, '');
    //This section is the specific route to the table in the page I want
    var document = XmlService.parse(xml);
    var body = document.getRootElement().getChildren("BODY");
    var maintable = body[0].getChildren("TABLE");
    var maintablechildren = maintable[0].getChildren();
    //This is where the table gets parsed into the array
    var array = [];
    for (var i = 0; i < maintablechildren.length; i++) {
        var verse = maintablechildren[i].getChildren();
        //I originally tried verse[0].getText() but it didn't work. It does now!
        var hebrew = verse[0].getText();
        //This array receives the English text and works fine.
        var english = verse[1].getText();
        array.push([hebrew,english]);
    }
    return array;
}
Results
[
[
"  וַיְדַבֵּר יְהוָה אֶל-מֹשֶׁה בְּמִדְבַּר סִינַי, בְּאֹהֶל מוֹעֵד:  בְּאֶחָד לַחֹדֶשׁ הַשֵּׁנִי בַּשָּׁנָה הַשֵּׁנִית, לְצֵאתָם מֵאֶרֶץ מִצְרַיִם--לֵאמֹר.",
" And the LORD spoke unto Moses in the wilderness of Sinai, in the tent of meeting, on the first day of the second month, in the second year after they were come out of the land of Egypt, saying:"
],
[
"  שְׂאוּ, אֶת-רֹאשׁ כָּל-עֲדַת בְּנֵי-יִשְׂרָאֵל, לְמִשְׁפְּחֹתָם, לְבֵית אֲבֹתָם--בְּמִסְפַּר שֵׁמוֹת, כָּל-זָכָר לְגֻלְגְּלֹתָם.",
" 'Take ye the sum of all the congregation of the children of Israel, by their families, by their fathers' houses, according to the number of names, every male, by their polls;"
],
[
"  מִבֶּן עֶשְׂרִים שָׁנָה וָמַעְלָה, כָּל-יֹצֵא צָבָא בְּיִשְׂרָאֵל--תִּפְקְדוּ אֹתָם לְצִבְאֹתָם, אַתָּה וְאַהֲרֹן.",
" from twenty years old and upward, all that are able to go forth to war in Israel: ye shall number them by their hosts, even thou and Aaron."
],
...
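To illustrate the root cause described above (bytes decoded with the wrong charset), here is a minimal sketch in plain JavaScript, assuming Node.js rather than Apps Script. The decoder is hypothetical and deliberately partial: it only handles ASCII and the 27 Hebrew letter bytes of Windows-1255 (0xE0 to 0xFA), not the vowel points that the real page also contains.

```javascript
// Partial Windows-1255 decoder, for illustration only.
function decodeWindows1255Letters(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b < 0x80) {
      out += String.fromCharCode(b);                   // ASCII maps to itself
    } else if (b >= 0xE0 && b <= 0xFA) {
      out += String.fromCharCode(0x05D0 + (b - 0xE0)); // alef .. tav
    } else {
      out += "\uFFFD";                                 // not handled in this sketch
    }
  }
  return out;
}

// Three Hebrew letters, a space, and "A", as Windows-1255 bytes.
const bytes = [0xE0, 0xE1, 0xE2, 0x20, 0x41];

// Decoded with the charset the page declares: readable Hebrew.
const right = decodeWindows1255Letters(bytes);

// Decoded as UTF-8: the Hebrew bytes are invalid sequences and turn into
// U+FFFD replacement characters (the "black diamonds"), while ASCII survives.
const wrong = Buffer.from(bytes).toString("utf8");

console.log(right);
console.log(wrong);
```

This also shows why the English column was never affected: ASCII bytes mean the same thing in both encodings.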

Conversion of Accented characters( e.g characters Ç, ü, é ,â) to respective non-accented characters in as3

Can anyone please help me convert special accented characters to their respective non-accented characters in ActionScript (AS3)? Please let me know if there is a predefined method, like toLowerCase, that does this. If no such method exists, let me know the logic for the conversion. Thanks in advance.
This is a very quick and dirty solution, but it should work. There is also some grunt work involved in building the map of characters; you may want to find a way to pull in a JSON or XML file for this purpose instead of hard-coding it.
var map:Dictionary = new Dictionary();
map["â"] = "a";
map["ã"] = "a";
map["ë"] = "e";
//... complete the map using this site: https://docs.oracle.com/cd/E29584_01/webhelp/mdex_basicDev/src/rbdv_chars_mapping.html
function removeDiacritics($s:String):String
{
    for(var $key:String in map)
    {
        $s = $s.split($key).join(map[$key]);
    }
    return $s;
}
var s:String = "this is ã tëst";
trace(s); // "this is ã tëst";
trace(removeDiacritics(s)); // this is a test
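For comparison, here is a sketch of a different technique in JavaScript (not ActionScript): instead of a hand-built map, it decomposes each character with Unicode normalization and strips the combining marks. It relies on String.prototype.normalize(), which exists in modern JavaScript; whether AS3 offers an equivalent is not confirmed here, so treat this as an illustration of the idea rather than a drop-in replacement.

```javascript
// Decompose accented characters (NFD), then drop the combining marks
// in the range U+0300..U+036F, leaving the base letters.
function removeDiacritics(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(removeDiacritics("this is ã tëst")); // this is a test
console.log(removeDiacritics("Çüéâ"));           // Cuea
```

Characters that have no decomposition (for example "ø" or "ß") pass through unchanged, so a small explicit map may still be needed for those.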

preg_replace not working

I have this function in my website.
function autolink($content) {
    $pattern = "/>>[0-9]/i";
    $replacement = ">>$0";
    return preg_replace($pattern, $replacement, $content, -1);
}
This is for turning certain characters into a clickable hyperlink.
For example, it is useful when a user in a thread types '>>4' to refer to reply number 4.
But it's not working. The characters are not converted into a hyperlink; they just remain as plain, non-clickable text.
Could someone tell me what is wrong with the function?
So the objective is to convert:
This is a reference to the >>4 reply
...into a hyperlinked:
This is a reference to the &gt;&gt;4 reply
...where "&gt;" is the HTML entity equivalent of ">". (Remember, you don't want to create HTML issues.)
The problems: (1) you forgot to escape the quotes in the replacement (2) since you want to isolate the number, you need to use parentheses to create a sub-pattern for later reference.
Once you do this, you arrive at:
function autolink($contents) {
    return preg_replace( "/>>([0-9])/i",
        "&gt;&gt;$1",
        $contents,
        -1
    );
}
Good luck
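The same two ideas (a capturing group for the number, and an entity-encoded ">" in the output) can be sketched in JavaScript. This is an illustration under stated assumptions, not the original PHP: the href format "#reply-N" is hypothetical, and the pattern uses \d+ so that multi-digit reply numbers are matched as well.

```javascript
// Turn ">>N" into a link to reply N. The "#reply-" anchor format is a
// made-up convention for this sketch; substitute the site's real one.
function autolink(content) {
  return content.replace(/>>(\d+)/g, '<a href="#reply-$1">&gt;&gt;$1</a>');
}

console.log(autolink("This is a reference to the >>4 reply"));
// This is a reference to the <a href="#reply-4">&gt;&gt;4</a> reply
```

Using single quotes around the replacement avoids having to escape the double quotes of the href attribute, which is the first problem the answer points out.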

Unicode, VBScript and HTML

I have the following radio box:
<input type="radio" value="&#39321;">香</input>
As you can see, the value is a Unicode character reference. It represents the following Chinese character: 香
So far so good.
I have a VBScript that reads the value of that particular radio button and saves it into a variable. When I display the content with a message box, the Chinese character appears. Additionally I have a variable called uniVal where I assign the Unicode reference of the Chinese character directly:
radioVal = < read value of radio button >
MsgBox radioVal ' yields chinese character
uniVal = "&#39321;"
MsgBox uniVal ' yields unicode representation
Is there a way to read the radio box value so that the Unicode reference string is preserved and NOT interpreted as the Chinese character?
For sure, I could try to recreate the Unicode reference of the character, but the methods I found in VBScript are not working correctly due to VBScript's implicit UTF-16 setting (instead of UTF-8). So the following method does not work correctly for all characters:
Function StringToUnicode(str)
    result = ""
    For x=1 To Len(str)
        result = result & "&#"&ascw(Mid(str, x, 1))&";"
    Next
    StringToUnicode = result
End Function
Cheers
Chris
I found a solution:
JavaScript can do this with a function that actually works:
function convert(value) {
    var tstr = value;
    var bstr = '';
    // declare i locally so it does not leak into the global scope
    for(var i=0; i<tstr.length; i++) {
        if(tstr.charCodeAt(i)>127)
        {
            bstr += '&#' + tstr.charCodeAt(i) + ';';
        }
        else
        {
            bstr += tstr.charAt(i);
        }
    }
    return bstr;
}
I call this function from my VBScript... :)
Here is a VBScript function that will always return a positive value for the Unicode code point of a given character:
Function PositiveUnicode(s)
    Dim val : val = AscW(s)
    If (val And &h8000) <> 0 Then
        PositiveUnicode = (val And &h7FFF) + &h8000&
    Else
        PositiveUnicode = CLng(val)
    End If
End Function
This will save you loading two script engines to achieve a simple operation.
"not working correctly due to VBScripts implicit UTF-16 setting (instead of UTF-8)."
This issue has nothing to do with UTF-8. It is purely the result of AscW's use of the signed integer type.
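The signed-integer effect can be reproduced in a few lines. The sketch below is JavaScript (not VBScript), and both function names are made up for illustration. It assumes, as described above, that AscW hands back a signed 16-bit value, so the character 香 (U+9999, decimal 39321) would arrive as 39321 - 65536 = -26215.

```javascript
// Map a signed 16-bit value (what AscW is described as returning)
// back to the unsigned UTF-16 code unit.
function positiveCodeUnit(signed16) {
  return signed16 & 0xFFFF;
}

// Escape every non-ASCII character as a decimal numeric character reference.
function toNumericEntities(text) {
  let out = "";
  for (const ch of text) {          // iterates by code point, not code unit
    const cp = ch.codePointAt(0);
    out += cp > 127 ? "&#" + cp + ";" : ch;
  }
  return out;
}

console.log(positiveCodeUnit(-26215));  // 39321
console.log(toNumericEntities("a香b")); // a&#39321;b
```

Iterating by code point also keeps characters outside the Basic Multilingual Plane in one reference instead of splitting them into two surrogate values.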
As to why you have to recreate the &#xxxxx; encodings that you sent: this is a result of how HTML (and XML) work. The use of this character reference is a convenience that the specification does not require to remain intact. Since the character encoding of the document is quite capable of representing that character, the DOM is at liberty to convert it.