Parsing Special characters in google apps script - google-apps-script

I am trying to pars a xml file into new google users,
This xml file contains the usernames made from the front and last name of the specified user. However when creating users it generates an error becaus some names have ilegal caracters in them.
These ilegal caracts are stuff like
ï ë é í
etc
Now i made a function which loops trough every username from the xml file and checks every caracter if it is one of the ilegal caracters and replaces them with there "normal" counterpart (fe) ä --> a.
but is there a way to automaticly convert all special caracters to there normal counterpart?
so i am searching for a function with the interface var
Username = ConvertUserName(UserNameWithIlegalCaracters);
now i am looping trough the caracters in the username and checking for each of them if they are (by refering to a list) if they are ilegal caracters and replacing them.
However now the program will fail if there are any ilegal caracters added to the names which are not in my list.

Anytime you're looking for an answer about how to do something in Google Apps Script, but it has little-or-nothing to do with Google services, check javascript first.
Search javascript diacritics: 68 results
You'll most likely find the function you need in Remove accents/diacritics in a string in JavaScript, which compares the performance of several options.
Below is the one that was fastest. To use, var userName = removeDiacritics(userNameWithIllegalCharacters);
var defaultDiacriticsRemovalap = [
{'base':'A', 'letters':'\u0041\u24B6\uFF21\u00C0\u00C1\u00C2\u1EA6\u1EA4\u1EAA\u1EA8\u00C3\u0100\u0102\u1EB0\u1EAE\u1EB4\u1EB2\u0226\u01E0\u00C4\u01DE\u1EA2\u00C5\u01FA\u01CD\u0200\u0202\u1EA0\u1EAC\u1EB6\u1E00\u0104\u023A\u2C6F'},
{'base':'AA','letters':'\uA732'},
{'base':'AE','letters':'\u00C6\u01FC\u01E2'},
{'base':'AO','letters':'\uA734'},
{'base':'AU','letters':'\uA736'},
{'base':'AV','letters':'\uA738\uA73A'},
{'base':'AY','letters':'\uA73C'},
{'base':'B', 'letters':'\u0042\u24B7\uFF22\u1E02\u1E04\u1E06\u0243\u0182\u0181'},
{'base':'C', 'letters':'\u0043\u24B8\uFF23\u0106\u0108\u010A\u010C\u00C7\u1E08\u0187\u023B\uA73E'},
{'base':'D', 'letters':'\u0044\u24B9\uFF24\u1E0A\u010E\u1E0C\u1E10\u1E12\u1E0E\u0110\u018B\u018A\u0189\uA779'},
{'base':'DZ','letters':'\u01F1\u01C4'},
{'base':'Dz','letters':'\u01F2\u01C5'},
{'base':'E', 'letters':'\u0045\u24BA\uFF25\u00C8\u00C9\u00CA\u1EC0\u1EBE\u1EC4\u1EC2\u1EBC\u0112\u1E14\u1E16\u0114\u0116\u00CB\u1EBA\u011A\u0204\u0206\u1EB8\u1EC6\u0228\u1E1C\u0118\u1E18\u1E1A\u0190\u018E'},
{'base':'F', 'letters':'\u0046\u24BB\uFF26\u1E1E\u0191\uA77B'},
{'base':'G', 'letters':'\u0047\u24BC\uFF27\u01F4\u011C\u1E20\u011E\u0120\u01E6\u0122\u01E4\u0193\uA7A0\uA77D\uA77E'},
{'base':'H', 'letters':'\u0048\u24BD\uFF28\u0124\u1E22\u1E26\u021E\u1E24\u1E28\u1E2A\u0126\u2C67\u2C75\uA78D'},
{'base':'I', 'letters':'\u0049\u24BE\uFF29\u00CC\u00CD\u00CE\u0128\u012A\u012C\u0130\u00CF\u1E2E\u1EC8\u01CF\u0208\u020A\u1ECA\u012E\u1E2C\u0197'},
{'base':'J', 'letters':'\u004A\u24BF\uFF2A\u0134\u0248'},
{'base':'K', 'letters':'\u004B\u24C0\uFF2B\u1E30\u01E8\u1E32\u0136\u1E34\u0198\u2C69\uA740\uA742\uA744\uA7A2'},
{'base':'L', 'letters':'\u004C\u24C1\uFF2C\u013F\u0139\u013D\u1E36\u1E38\u013B\u1E3C\u1E3A\u0141\u023D\u2C62\u2C60\uA748\uA746\uA780'},
{'base':'LJ','letters':'\u01C7'},
{'base':'Lj','letters':'\u01C8'},
{'base':'M', 'letters':'\u004D\u24C2\uFF2D\u1E3E\u1E40\u1E42\u2C6E\u019C'},
{'base':'N', 'letters':'\u004E\u24C3\uFF2E\u01F8\u0143\u00D1\u1E44\u0147\u1E46\u0145\u1E4A\u1E48\u0220\u019D\uA790\uA7A4'},
{'base':'NJ','letters':'\u01CA'},
{'base':'Nj','letters':'\u01CB'},
{'base':'O', 'letters':'\u004F\u24C4\uFF2F\u00D2\u00D3\u00D4\u1ED2\u1ED0\u1ED6\u1ED4\u00D5\u1E4C\u022C\u1E4E\u014C\u1E50\u1E52\u014E\u022E\u0230\u00D6\u022A\u1ECE\u0150\u01D1\u020C\u020E\u01A0\u1EDC\u1EDA\u1EE0\u1EDE\u1EE2\u1ECC\u1ED8\u01EA\u01EC\u00D8\u01FE\u0186\u019F\uA74A\uA74C'},
{'base':'OI','letters':'\u01A2'},
{'base':'OO','letters':'\uA74E'},
{'base':'OU','letters':'\u0222'},
{'base':'OE','letters':'\u008C\u0152'},
{'base':'oe','letters':'\u009C\u0153'},
{'base':'P', 'letters':'\u0050\u24C5\uFF30\u1E54\u1E56\u01A4\u2C63\uA750\uA752\uA754'},
{'base':'Q', 'letters':'\u0051\u24C6\uFF31\uA756\uA758\u024A'},
{'base':'R', 'letters':'\u0052\u24C7\uFF32\u0154\u1E58\u0158\u0210\u0212\u1E5A\u1E5C\u0156\u1E5E\u024C\u2C64\uA75A\uA7A6\uA782'},
{'base':'S', 'letters':'\u0053\u24C8\uFF33\u1E9E\u015A\u1E64\u015C\u1E60\u0160\u1E66\u1E62\u1E68\u0218\u015E\u2C7E\uA7A8\uA784'},
{'base':'T', 'letters':'\u0054\u24C9\uFF34\u1E6A\u0164\u1E6C\u021A\u0162\u1E70\u1E6E\u0166\u01AC\u01AE\u023E\uA786'},
{'base':'TZ','letters':'\uA728'},
{'base':'U', 'letters':'\u0055\u24CA\uFF35\u00D9\u00DA\u00DB\u0168\u1E78\u016A\u1E7A\u016C\u00DC\u01DB\u01D7\u01D5\u01D9\u1EE6\u016E\u0170\u01D3\u0214\u0216\u01AF\u1EEA\u1EE8\u1EEE\u1EEC\u1EF0\u1EE4\u1E72\u0172\u1E76\u1E74\u0244'},
{'base':'V', 'letters':'\u0056\u24CB\uFF36\u1E7C\u1E7E\u01B2\uA75E\u0245'},
{'base':'VY','letters':'\uA760'},
{'base':'W', 'letters':'\u0057\u24CC\uFF37\u1E80\u1E82\u0174\u1E86\u1E84\u1E88\u2C72'},
{'base':'X', 'letters':'\u0058\u24CD\uFF38\u1E8A\u1E8C'},
{'base':'Y', 'letters':'\u0059\u24CE\uFF39\u1EF2\u00DD\u0176\u1EF8\u0232\u1E8E\u0178\u1EF6\u1EF4\u01B3\u024E\u1EFE'},
{'base':'Z', 'letters':'\u005A\u24CF\uFF3A\u0179\u1E90\u017B\u017D\u1E92\u1E94\u01B5\u0224\u2C7F\u2C6B\uA762'},
{'base':'a', 'letters':'\u0061\u24D0\uFF41\u1E9A\u00E0\u00E1\u00E2\u1EA7\u1EA5\u1EAB\u1EA9\u00E3\u0101\u0103\u1EB1\u1EAF\u1EB5\u1EB3\u0227\u01E1\u00E4\u01DF\u1EA3\u00E5\u01FB\u01CE\u0201\u0203\u1EA1\u1EAD\u1EB7\u1E01\u0105\u2C65\u0250'},
{'base':'aa','letters':'\uA733'},
{'base':'ae','letters':'\u00E6\u01FD\u01E3'},
{'base':'ao','letters':'\uA735'},
{'base':'au','letters':'\uA737'},
{'base':'av','letters':'\uA739\uA73B'},
{'base':'ay','letters':'\uA73D'},
{'base':'b', 'letters':'\u0062\u24D1\uFF42\u1E03\u1E05\u1E07\u0180\u0183\u0253'},
{'base':'c', 'letters':'\u0063\u24D2\uFF43\u0107\u0109\u010B\u010D\u00E7\u1E09\u0188\u023C\uA73F\u2184'},
{'base':'d', 'letters':'\u0064\u24D3\uFF44\u1E0B\u010F\u1E0D\u1E11\u1E13\u1E0F\u0111\u018C\u0256\u0257\uA77A'},
{'base':'dz','letters':'\u01F3\u01C6'},
{'base':'e', 'letters':'\u0065\u24D4\uFF45\u00E8\u00E9\u00EA\u1EC1\u1EBF\u1EC5\u1EC3\u1EBD\u0113\u1E15\u1E17\u0115\u0117\u00EB\u1EBB\u011B\u0205\u0207\u1EB9\u1EC7\u0229\u1E1D\u0119\u1E19\u1E1B\u0247\u025B\u01DD'},
{'base':'f', 'letters':'\u0066\u24D5\uFF46\u1E1F\u0192\uA77C'},
{'base':'g', 'letters':'\u0067\u24D6\uFF47\u01F5\u011D\u1E21\u011F\u0121\u01E7\u0123\u01E5\u0260\uA7A1\u1D79\uA77F'},
{'base':'h', 'letters':'\u0068\u24D7\uFF48\u0125\u1E23\u1E27\u021F\u1E25\u1E29\u1E2B\u1E96\u0127\u2C68\u2C76\u0265'},
{'base':'hv','letters':'\u0195'},
{'base':'i', 'letters':'\u0069\u24D8\uFF49\u00EC\u00ED\u00EE\u0129\u012B\u012D\u00EF\u1E2F\u1EC9\u01D0\u0209\u020B\u1ECB\u012F\u1E2D\u0268\u0131'},
{'base':'j', 'letters':'\u006A\u24D9\uFF4A\u0135\u01F0\u0249'},
{'base':'k', 'letters':'\u006B\u24DA\uFF4B\u1E31\u01E9\u1E33\u0137\u1E35\u0199\u2C6A\uA741\uA743\uA745\uA7A3'},
{'base':'l', 'letters':'\u006C\u24DB\uFF4C\u0140\u013A\u013E\u1E37\u1E39\u013C\u1E3D\u1E3B\u017F\u0142\u019A\u026B\u2C61\uA749\uA781\uA747'},
{'base':'lj','letters':'\u01C9'},
{'base':'m', 'letters':'\u006D\u24DC\uFF4D\u1E3F\u1E41\u1E43\u0271\u026F'},
{'base':'n', 'letters':'\u006E\u24DD\uFF4E\u01F9\u0144\u00F1\u1E45\u0148\u1E47\u0146\u1E4B\u1E49\u019E\u0272\u0149\uA791\uA7A5'},
{'base':'nj','letters':'\u01CC'},
{'base':'o', 'letters':'\u006F\u24DE\uFF4F\u00F2\u00F3\u00F4\u1ED3\u1ED1\u1ED7\u1ED5\u00F5\u1E4D\u022D\u1E4F\u014D\u1E51\u1E53\u014F\u022F\u0231\u00F6\u022B\u1ECF\u0151\u01D2\u020D\u020F\u01A1\u1EDD\u1EDB\u1EE1\u1EDF\u1EE3\u1ECD\u1ED9\u01EB\u01ED\u00F8\u01FF\u0254\uA74B\uA74D\u0275'},
{'base':'oi','letters':'\u01A3'},
{'base':'ou','letters':'\u0223'},
{'base':'oo','letters':'\uA74F'},
{'base':'p','letters':'\u0070\u24DF\uFF50\u1E55\u1E57\u01A5\u1D7D\uA751\uA753\uA755'},
{'base':'q','letters':'\u0071\u24E0\uFF51\u024B\uA757\uA759'},
{'base':'r','letters':'\u0072\u24E1\uFF52\u0155\u1E59\u0159\u0211\u0213\u1E5B\u1E5D\u0157\u1E5F\u024D\u027D\uA75B\uA7A7\uA783'},
{'base':'s','letters':'\u0073\u24E2\uFF53\u00DF\u015B\u1E65\u015D\u1E61\u0161\u1E67\u1E63\u1E69\u0219\u015F\u023F\uA7A9\uA785\u1E9B'},
{'base':'t','letters':'\u0074\u24E3\uFF54\u1E6B\u1E97\u0165\u1E6D\u021B\u0163\u1E71\u1E6F\u0167\u01AD\u0288\u2C66\uA787'},
{'base':'tz','letters':'\uA729'},
{'base':'u','letters': '\u0075\u24E4\uFF55\u00F9\u00FA\u00FB\u0169\u1E79\u016B\u1E7B\u016D\u00FC\u01DC\u01D8\u01D6\u01DA\u1EE7\u016F\u0171\u01D4\u0215\u0217\u01B0\u1EEB\u1EE9\u1EEF\u1EED\u1EF1\u1EE5\u1E73\u0173\u1E77\u1E75\u0289'},
{'base':'v','letters':'\u0076\u24E5\uFF56\u1E7D\u1E7F\u028B\uA75F\u028C'},
{'base':'vy','letters':'\uA761'},
{'base':'w','letters':'\u0077\u24E6\uFF57\u1E81\u1E83\u0175\u1E87\u1E85\u1E98\u1E89\u2C73'},
{'base':'x','letters':'\u0078\u24E7\uFF58\u1E8B\u1E8D'},
{'base':'y','letters':'\u0079\u24E8\uFF59\u1EF3\u00FD\u0177\u1EF9\u0233\u1E8F\u00FF\u1EF7\u1E99\u1EF5\u01B4\u024F\u1EFF'},
{'base':'z','letters':'\u007A\u24E9\uFF5A\u017A\u1E91\u017C\u017E\u1E93\u1E95\u01B6\u0225\u0240\u2C6C\uA763'}
];
var diacriticsMap = {};
for (var i=0; i < defaultDiacriticsRemovalap.length; i++){
var letters = defaultDiacriticsRemovalap[i].letters;
for (var j=0; j < letters.length ; j++){
diacriticsMap[letters[j]] = defaultDiacriticsRemovalap[i].base;
}
}
// "what?" version ... http://jsperf.com/diacritics/12
function removeDiacritics (str) {
return str.replace(/[^\u0000-\u007E]/g, function(a){
return diacriticsMap[a] || a;
});
}

Related

Pasting Bold text into a doc after being taken form Spreadsheet

So I've been working through a problem of pulling structured data from a spreadsheet and them using App Script to insert it into a Template Google Doc.
I have it working simply as concatenated strings, but I'm trying to do it with the BODY class so if I want to put the end product into Gmail it could be easier. Or if I want to retain table structure....
So, everything is fine and dandy, except for this one bit of code. I'm struggling with setBold. It's a weird syntax in that it's a boolean operation, right?
So here is what I have and it's pretty easy to grok I think:
for(var i = 0; i < num; i++) {
var songName = String(dataArray[i][1]);
var sWs = String(dataArray[i][2]);
var pub = String(dataArray[i][3]);
newText.editAsText().appendText('SONG NAME:'+ nLi).setBold(true);
newText.editAsText().appendText(songName + brk).setBold(false);
newText.editAsText().appendText('SONGWRITER(S):' + nLi);
But it's coming out as:
SONG NAME:
I have also tried this code:
newText.editAsText().setBold(true);
newText.editAsText().appendText('SONG NAME:'+ nLi);
newText.editAsText().setBold(true);
newText.editAsText().appendText(songName + brk);
newText.editAsText().appendText('SONGWRITER(S):' + nLi);
Thinking of the setBold as setting then unsetting a flag.
Neither worked.
I prefer using styles:
function addboldtext() {
const doc=DocumentApp.getActiveDocument();
const body=doc.getBody();
const style1={};
style1[DocumentApp.Attribute.BOLD]=true;
style1[DocumentApp.Attribute.FOREGROUND_COLOR]='#000000';//you can add all of the attributes that you wish
body.appendParagraph("This is text").setAttributes(style1)
}

Is there a simple way to have a local webpage display a variable passed in the URL?

I am experimenting with a Firefox extension that will load an arbitrary URL (only via HTTP or HTTPS) when certain conditions are met.
With certain conditions, I just want to display a message instead of requesting a URL from the internet.
I was thinking about simply hosting a local webpage that would display the message. The catch is that the message needs to include a variable.
Is there a simple way to craft a local web page so that it can display a variable passed to it in the URL? I would prefer to just use HTML and CSS, but adding a little inline javascript would be okay if absolutely needed.
As a simple example, when the extension calls something like:
folder/messageoutput.html?t=Text%20to%20display
I would like to see:
Message: Text to display
shown in the browser's viewport.
You can use the "search" property of the Location object to extract the variables from the end of your URL:
var a = window.location.search;
In your example, a will equal "?t=Text%20to%20display".
Next, you will want to strip the leading question mark from the beginning of the string. The if statement is just in case the browser doesn't include it in the search property:
var s = a.substr(0, 1);
if(s == "?"){s = substr(1);}
Just in case you get a URL with more than one variable, you may want to split the query string at ampersands to produce an array of name-value pair strings:
var R = s.split("&");
Next, split the name-value pair strings at the equal sign to separate the name from the value. Store the name as the key to an array, and the value as the array value corresponding to the key:
var L = R.length;
var NVP = new Array();
var temp = new Array();
for(var i = 0; i < L; i++){
temp = R[i].split("=");
NVP[temp[0]] = temp[1];
}
Almost done. Get the value with the name "t":
var t = NVP['t'];
Last, insert the variable text into the document. A simple example (that will need to be tweaked to match your document structure) is:
var containingDiv = document.getElementById("divToShowMessage");
var tn = document.createTextNode(t);
containingDiv.appendChild(tn);
getArg('t');
function getArg(param) {
var vars = {};
window.location.href.replace( location.hash, '' ).replace(
/[?&]+([^=&]+)=?([^&]*)?/gi, // regexp
function( m, key, value ) { // callback
vars[key] = value !== undefined ? value : '';
}
);
if ( param ) {
return vars[param] ? vars[param] : null;
}
return vars;
}

Character encoding issue when using Google Apps Script to extract data from web page

I have written a script using Google Apps Script to extract text from a web page into Google Sheets. I only need this script to work with a specific web page, so it does not need to be versatile. The script works almost exactly as I want it to except that I have run into a character encoding problem. I am extracting both Hebrew and English text. The meta tag in the HTML has charset=Windows-1255. The English extracts perfectly, but the Hebrew displays as black diamonds containing a question mark.
I found this question that says to pass the data into a blob then use the getDataAsString method to convert to another encoding. I tried converting to different encodings and got different results. UTF-8 displays the black diamonds with question marks, UTF-16 displays Korean, ISO 8859-8 returns an error and says it's not a valid parameter, and the original Windows-1255 displays one Hebrew character but a bunch of other gibberish.
However, I am able to copy and paste the Hebrew text into Google Sheets manually and it displays correctly.
I have even tested passing Hebrew directly from Google Apps Script code like so:
function passHebrew() {
return "וַיְדַבֵּר";
}
This displays the Hebrew text properly on Google Sheets.
My code is as follows:
function parseText(book, chapter) {
//var bk = book;
//var ch = chapter;
var bk = '04'; //hard-coded for testing purposes
var ch = '01'; //hard-coded for testing purposes
var url = 'http://www.mechon-mamre.org/p/pt/pt' + bk + ch + '.htm';
var xml = UrlFetchApp.fetch(url).getContentText();
//I had to "fix" these xml errors for XmlService.parse(xml) below
//to function.
xml = xml.replace('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">', '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">');
xml = xml.replace('<LINK REL="stylesheet" HREF="p.css" TYPE="text/css">', '<LINK REL="stylesheet" HREF="p.css" TYPE="text/css"></LINK>');
xml = xml.replace('<meta http-equiv="Content-Type" content="text/html; charset=Windows-1255">', '<meta http-equiv="Content-Type" content="text/html; charset=Windows-1255"></meta>');
xml = xml.replace(/ALIGN=CENTER/gi, 'ALIGN="CENTER"');
xml = xml.replace(/<BR>/gi, '<BR></BR>');
xml = xml.replace(/class=h/gi, 'class="h"');
//This section is the specific route to the table in the page I want
var document = XmlService.parse(xml);
var body = document.getRootElement().getChildren("BODY");
var maintable = body[0].getChildren("TABLE");
var maintablechildren = maintable[0].getChildren();
//This creates a two-dimensional array so that I can store the Hebrew
//in the first column and the English in the second column
var array = new Array(maintablechildren.length);
for (var i = 0; i < maintablechildren.length; i++) {
array[i] = new Array(2);
}
//This is where the table gets parsed into the array
for (var i = 0; i < maintablechildren.length; i++) {
var verse = maintablechildren[i].getChildren();
//This is where the encoding problem occurs.
//I originally tried verse[0].getText() but it didn't work.
array[i][0] = Utilities.newBlob(verse[0].getText()).getDataAsString('UTF-8');
//This array receives the English text and works fine.
array[i][1] = verse[1].getText();
}
return array;
}
What am I overlooking, misunderstanding, or doing wrong? I don't have a very good understanding of how encoding works so I don't understand why converting it to UTF-8 isn't working.
Your problem occurs before the lines you've commented as an encoding problem: because the default encoding for UrlFetchApp is munging the unicode text from the start.
You should use the variation of the .getContentText() method that Returns the content of an HTTP response encoded as a string of the given charset. For your case:
var xml = UrlFetchApp.fetch(url).getContentText("Windows-1255");
That should be all you need to change, although the blob() work-around is no longer needed. (It's harmless, though.) Other comments:
The logical OR operator (||) is very helpful for setting default values. I've tweaked the first few lines to enable testing but still let the function operate normally with arguments.
The way you're setting up an empty array before populating it with strings is Bad JavaScript; it's complex code that isn't needed, so toss it. Instead, we'll declare the array Array, then push() rows onto it.
The .replace() functions can be reduced with more clever RegExp use; I've included the URLs for demos of the really tricky ones.
There were \n newline characters in the text which I guessed were unnecessary for your purposes, so added a replace() for them as well.
Here's what you're left with:
function parseText(book, chapter) {
var bk = book || '04'; //hard-coded for testing purposes
var ch = chapter || '01'; //hard-coded for testing purposes
var url = 'http://www.mechon-mamre.org/p/pt/pt' + bk + ch + '.htm';
var xml = UrlFetchApp.fetch(url).getContentText("Windows-1255");
//I had to "fix" these xml errors for XmlService.parse(xml) below
//to function.
xml = xml.replace(/(<!DOCTYPE.*EN")>/gi, '$1 "">')
.replace(/(<(LINK|meta).*>)/gi,'$1</$2>') // https://regex101.com/r/nH3pU8/1
.replace(/(<.*?=)([^"']*?)([ >])/gi,'$1"$2"$3') // https://regex101.com/r/eP7wO7/1
.replace(/<BR>/gi, '<BR/>')
.replace(/\n/g, '')
//This section is the specific route to the table in the page I want
var document = XmlService.parse(xml);
var body = document.getRootElement().getChildren("BODY");
var maintable = body[0].getChildren("TABLE");
var maintablechildren = maintable[0].getChildren();
//This is where the table gets parsed into the array
var array = [];
for (var i = 0; i < maintablechildren.length; i++) {
var verse = maintablechildren[i].getChildren();
//I originally tried verse[0].getText() but it didn't work.** It does now!
var hebrew = verse[0].getText();
//This array receives the English text and works fine.
var english = verse[1].getText();
array.push([hebrew,english]);
}
return array;
}
Results
[
[
"  וַיְדַבֵּר יְהוָה אֶל-מֹשֶׁה בְּמִדְבַּר סִינַי, בְּאֹהֶל מוֹעֵד:  בְּאֶחָד לַחֹדֶשׁ הַשֵּׁנִי בַּשָּׁנָה הַשֵּׁנִית, לְצֵאתָם מֵאֶרֶץ מִצְרַיִם--לֵאמֹר.",
" And the LORD spoke unto Moses in the wilderness of Sinai, in the tent of meeting, on the first day of the second month, in the second year after they were come out of the land of Egypt, saying:"
],
[
"  שְׂאוּ, אֶת-רֹאשׁ כָּל-עֲדַת בְּנֵי-יִשְׂרָאֵל, לְמִשְׁפְּחֹתָם, לְבֵית אֲבֹתָם--בְּמִסְפַּר שֵׁמוֹת, כָּל-זָכָר לְגֻלְגְּלֹתָם.",
" 'Take ye the sum of all the congregation of the children of Israel, by their families, by their fathers' houses, according to the number of names, every male, by their polls;"
],
[
"  מִבֶּן עֶשְׂרִים שָׁנָה וָמַעְלָה, כָּל-יֹצֵא צָבָא בְּיִשְׂרָאֵל--תִּפְקְדוּ אֹתָם לְצִבְאֹתָם, אַתָּה וְאַהֲרֹן.",
" from twenty years old and upward, all that are able to go forth to war in Israel: ye shall number them by their hosts, even thou and Aaron."
],
...

Can Microsoft Translator answer "does a word exist" queries?

I want to know if the API can do something like this:
I send a word in an predetermined language
API answer if the word exist in that language.
To add some context to the question, the idea is to develop a Scrabble like game, and I'm investigating a method to detect valid words, for all (or most common) languages that is.
I've already asked for a solution in one of their forums, but they are kind of dead.
I tested the MS translator service.
var result = MST.TranslateText("xyz", "en", "de"); // custom routine that calls MS service
var result2 = MST.TranslateText("dog", "en", "de");
var result2 = MST.TranslateText("sdfasfgd", "en", "de");
Result = XYZ // source xyz
Result2 = Hund // source dog
Result3 = sdfasfgd // sdfasfgd
Looks like when not found or a translation is not possible the string
is returned untouched.
The only strange behavior i've noted is conversion to uppercase for Some 3 letter scenarios that
arent obvious TLAs in either langauge.
public string TranslateText(string sourceText, string fromLang, string toLang) {
var httpRequestProperty = GetAuthorizationRequestHeader();
var msTransClient = new TranslatorService.LanguageServiceClient();
// Creates a block within which an OperationContext object is in scope.
using (var scope = new OperationContextScope(msTransClient.InnerChannel))
{
OperationContext.Current.OutgoingMessageProperties[HttpRequestMessageProperty.Name] = httpRequestProperty;
//Keep appId parameter blank as we are sending access token in authorization header.
var translationResult = msTransClient.Translate("", sourceText, fromLang, toLang, "text/plain", "");
return translationResult;
}
}

Losing leading 0s when string converts to array

I have a textInput control that sends .txt value to an array collection. The array collection is a collection of US zip codes so I use a regular expression to ensure I only get digits from the textInput.
private function addSingle(stringLoader:ArrayCollection):ArrayCollection {
arrayString += (txtSingle.text) + '';
var re:RegExp = /\D/;
var newArray:Array = arrayString.split(re);
The US zip codes start at 00501. Following the debugger, after the zip is submitted, the variable 'arrayString' is 00501. But once 'newArray' is assigned a vaule, it removes the first two 0s and leaves me with 501. Is this my regular expression doing something I'm not expecting? Could it be the array changing the value? I wrote a regexp test in javascript.
<script type="text/javascript">
var str="00501";
var patt1=/\D/;
document.write(str.match(patt1));
</script>
and i get null, which leads me to believe the regexp Im using is fine. In the help docs on the split method, I dont see any reference to leading 0s being a problem.
**I have removed the regular expression from my code completely and the same problem is still happening. Which means it is not the regular expression where the problem is coming from.
Running this simplified case:
var arrayString:String = '00501';
var re:RegExp = /\D/;
var newArray:Array = arrayString.split(re);
trace(newArray);
Yields '00501' as expected. There's nothing in the code you've posted that would strip leading zeros. You may want to dig around a bit more.
This smells suspiciously like Number coercion: Number('00501') yields 501. Read through the docs for implicit conversions and check if any pop up in your code.
What about this ?
/^\d+$/
You can also specify exactly 5 numbers like this :
/^\d{5}$/
I recommend just getting the zip codes instead of splitting on non-digits (especially if 'arrayString' might have multiple zip codes):
var newArray:Array = [];
var pattern:RegExp = /(\d+)/g;
var zipObject:Object;
while ((zipObject = pattern.exec(arrayString)) != null)
{
newArray.push(zipObject[1]);
}
for (var i:int = 0; i < newArray.length; i++)
{
trace("zip code " + i + " is: " + newArray[i]);
}