I use a TextInput component in Flex 4.5 to enter some English text. I use the restrict attribute to limit keyboard input to the characters a-zA-Z. The problem is that if I copy a word in another language, I can still paste it into the TextInput component. Is there a way to prevent that? If not, how can I validate the input against a specified language?
I found out that the Unicode range for Chinese (and related) ideographs is \u4E00 to \u9FFF, so I wrote the following:
var chRE:RegExp = new RegExp("[\u4E00-\u9FFF]", "g");
if (inputTI.text.match(chRE)) {
trace("chinese");
}
else {
trace("other");
}
But if I type the word 'hello' into the TextInput, it still traces "chinese". What is the error?
Since I cannot use a Unicode range with RegExp (my fault, or a bug?), I wrote the following function to check whether a word consists entirely of Chinese characters.
private function isChinese(word:String):Boolean
{
var wlength:int = word.length;
for (var i:int = 0; i < wlength; i++) {
var charCode:Number = word.charCodeAt(i);
if (charCode < 0x4E00 || charCode > 0x9FFF) { // range is inclusive, so 0x4E00 and 0x9FFF themselves are valid
return false;
}
}
return true;
}
The String.match() method returns an array which will always test to true, even if it's empty (see docs here: http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/String.html#match%28%29)
Use the RegExp.test() method instead to see if it matches:
// Check your character ranges:
//var chRE:RegExp = new RegExp("[\u4E00-\u9FFF]", "g"); // \u9FFF is unrecognised and is causing issues.
var chRE:RegExp = new RegExp("[\u4E00]+", "g"); // This works.
if (chRE.test(inputTI.text)) {
trace("chinese");
}
else {
trace("other");
}
You'll need to check the character ranges too - I couldn't get it to match with \u9FFF in the regex.
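The difference between "contains a character in the range" and "consists only of characters in the range" can be sketched as follows. This is a JavaScript sketch rather than ActionScript, so regex behaviour in Flex may differ in details; the helper names containsCjk and isAllCjk are made up for illustration. The range U+4E00 to U+9FFF is the one quoted in the question.

```javascript
// Range quoted in the question: U+4E00..U+9FFF (inclusive).
// No "g" flag here: with "g", test() keeps state in lastIndex between calls.
const CJK_CHAR = /[\u4E00-\u9FFF]/;
const ALL_CJK = /^[\u4E00-\u9FFF]+$/;

// True if at least one character falls inside the range.
function containsCjk(text) {
  return CJK_CHAR.test(text);
}

// True only if the text is non-empty and every character is inside the range.
function isAllCjk(text) {
  return ALL_CJK.test(text);
}
```

A boolean-returning test is the safer check here, because the truthiness of a match result is exactly what went wrong in the original snippet.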
I made a formula to extract some Wikipedia data in Google Sheets, which works fine. Here is the formula:
=regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Geography')]]"))),"\[[^\]]+\]","")&char(10)&char(10)&iferror(regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Education')]]"))),"\[[^\]]+\]",""))
Where D2 is a URL like https://en.wikipedia.org/wiki/Abbeville,_Alabama
This extracts some Geography and Education data from the Wikipedia page. The trouble is that IMPORTXML only runs a few times before it fails due to quota limits.
So I thought it might be better to use Apps Script, where the limits on fetching and parsing are much higher. However, I could not see a good way of using XPath in Apps Script. Older posts on the web discuss a deprecated service called Xml, but it no longer seems to work. There is a service called XmlService which looks like it may do the job, but you can't just plug in an XPath expression, and it looks like a lot of work to get to the result. Are there any solutions where you can just plug in XPath?
Here is an alternative solution I actually do in a case like this.
I have used XmlService, but only for parsing the content, not for XPath. The code relies on the element tags and has been pretty consistent in my tests so far. It might need tweaks when certain other tags appear in the result, in which case you would have to add them to the exclusion condition.
I tested the code below against both links:
https://en.wikipedia.org/wiki/Abbeville,_Alabama#Geography
https://en.wikipedia.org/wiki/Montgomery,_Alabama#Education
My test shows that the formula above did not return the proper output for the second link, while the code does (maybe because the section was too long).
Code:
function getGeoAndEdu(path) {
var data = UrlFetchApp.fetch(path).getContentText();
// wikipedia is divided into sections, if output is cut, increase the number
var regex = /.{1,100000}/g;
var results = [];
// flag to determine if matches should be added
var foundFlag = false;
do {
var m = regex.exec(data);
// exec() returns null once the input is exhausted, so guard before reading m[0]
if (m && foundFlag) {
// if another header is found during generation of data, stop appending the matches
if (matchTag(m[0], "<h2>"))
foundFlag = false;
// exclude tables, sub-headers and divs containing image description
else if(matchTag(m[0], "<div") || matchTag(m[0], "<h3") ||
matchTag(m[0], "<td") || matchTag(m[0], "<th"))
continue;
else
results.push(m[0]);
}
// start capturing if either IDs are found
if (m != null && (matchTag(m[0], "id=\"Geography\"") ||
matchTag(m[0], "id=\"Education\""))) {
foundFlag = true;
}
} while (m);
var output = results.map(function (str) {
// clean tags for XmlService
str = str.replace(/<[^>]*>/g, '').trim();
var decode = XmlService.parse('<d>' + str + '</d>');
// convert HTML entity codes to text
return decode.getRootElement().getText();
// filter blank results due to cleaning and empty sections
// separate data and remove citations before returning output
}).filter(result => result.trim().length > 1).join("\n").replace(/\[\d+\]/g, '');
return output;
}
// check if tag is found in string
function matchTag(string, tag) {
var regex = RegExp(tag);
return string.match(regex) && string.match(regex)[0] == tag;
}
Output and difference: [screenshots not preserved: formula ending output, script ending output, and the ending of the Education section on Wikipedia]
Note:
UrlFetchApp is still subject to quotas, but its limits should be better than IMPORTXML's, depending on your account type.
Reference:
Apps Script Quotas
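The tag and citation cleanup used in the script above can be tried on its own, outside Apps Script. The sketch below is plain JavaScript; the function name cleanFragment is hypothetical, and the citation pattern is the same one the original spreadsheet formula uses (anything in square brackets), which is broader than the \[\d+\] pattern in the script.

```javascript
// Strip markup and bracketed citation markers from one HTML fragment.
function cleanFragment(htmlFragment) {
  return htmlFragment
    .replace(/<[^>]*>/g, '')     // drop every tag, keep the text between them
    .replace(/\[[^\]]+\]/g, '')  // drop markers such as [3] or [citation needed]
    .replace(/\s+/g, ' ')        // collapse runs of whitespace
    .trim();
}
```

For example, cleanFragment('<p>Abbeville is a <b>city</b>.<sup>[3]</sup></p>') gives 'Abbeville is a city.'. Regex-based tag stripping is only a rough approach and will not cope with every page layout.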
Sorry, I got very busy this week so I didn't reply. I took a look at your answer, which seems to work fine, but it was quite code-heavy. I wanted something I would understand, so I coded my own solution. Not that mine is any simpler; it's just my own code, so it's easier for me to follow:
function getTextBetweenTags(html, paramatersInFirstTag, paramatersInLastTag) { //finds text values between 2 tags and removes internal tags to leave plain text.
//eg getTextBetweenTags(html,[['class="mw-headline"'],['id="Geography"']],[['class="wikitable mw-collapsible mw-made-collapsible"']])
// Note: you may want to replace &#number; entities with the corresponding character
var openingTagPos = null;
var closingTagPos = null;
var previousChar = '';
var readingTag = false;
var newTag = '';
var tagEnd = false;
var regexFirstTagParams = [];
var regexLastTagParams = [];
//prepare regexes to test for parameters in opening and closing tags. put regexes in arrays so each condition can be tested separately
for (var i in paramatersInFirstTag) {
regexFirstTagParams.push(new RegExp(escapeRegex(paramatersInFirstTag[i][0])))
}
for (var i in paramatersInLastTag) {
regexLastTagParams.push(new RegExp(escapeRegex(paramatersInLastTag[i][0])))
}
var startTagIndex = null;
var endTagIndex = null;
var matches = 0;
for (var i = 0; i < html.length - 1; i++) {
var nextChar = html.substr(i, 1);
if (nextChar == '<' && previousChar != '\\') {
readingTag = true;
}
if (nextChar == '>' && previousChar != '\\') { //if end of tag found, check tag matches start or end tag
readingTag = false;
newTag += nextChar;
//test for firstTag
if (startTagIndex == null) {
var alltestsPass = true;
for (var j in regexFirstTagParams) {
if (!regexFirstTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
startTagIndex = i + 1;
//console.log('Start Tag',startTagIndex)
matches++;
}
}
//test for lastTag
else if (startTagIndex != null) {
var alltestsPass = true;
for (var j in regexLastTagParams) {
if (!regexLastTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
endTagIndex = i + 1;
matches++;
}
}
if(startTagIndex && endTagIndex) break;
newTag = '';
}
if (readingTag) newTag += nextChar;
previousChar = nextChar;
}
if (matches < 2) return 'No matches';
else return html.substring(startTagIndex, endTagIndex).replace(/<[^>]+>/g, '');
}
function escapeRegex(string) {
if (string == null) return string;
return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}
My function requires an array of attributes for the start tag and an array of attributes for the end tag. It gets any text in between and removes any tags found in between. One issue I also noticed was that there were often special characters (e.g. &#number; entities), so they need to be replaced. I did that outside the scope of the function above.
The function could be easily improved to check the tag type (eg h2), but it wasn't necessary for the wikipedia case.
Here is a function where I called the above function. The html variable is just the result of UrlFetchApp.fetch('some wikipedia city url').getContentText();
function getWikiTexts(html) {
var geography = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Geography']], [['class="mw-headline"']]);
var economy = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Economy']], [['class="mw-headline"']]);
var education = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Education']], [['class="mw-headline"']]);
var returnString = '';
if (geography != 'No matches' && !/Wikipedia/.test(geography)) returnString += geography + '\n';
if (economy != 'No matches' && !/Wikipedia/.test(economy)) returnString += economy + '\n';
if (education != 'No matches' && !/Wikipedia/.test(education)) returnString += education + '\n';
return returnString
}
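For the special characters mentioned above (numeric entities of the form &#number;), a small helper along these lines could be applied to the extracted text. This is only a sketch: the name decodeNumericEntities is made up, and it deliberately handles numeric entities only, leaving named ones such as &amp; untouched.

```javascript
// Replace decimal (&#160;) and hexadecimal (&#xA0;) entities with their characters.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, function (entity, hex, dec) {
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave the entity as-is if the number is not a valid Unicode code point.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : entity;
  });
}
```

It could be called on each section, for example decodeNumericEntities(geography), before the strings are concatenated.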
Thanks for posting your answer.
I'm trying to create a Google Apps Script that will format certain parts of a paragraph. For example, text that is underlined will become bolded/italicized as well.
One docs add-on I have tried has a similar feature: https://imgur.com/a/5Cw6Irn (this is exactly what I'm trying to achieve)
How can I write a function that will select a certain type of text and format it?
I managed to write a script that iterates through every single letter in a paragraph and checks if it's underlined, but it becomes extremely slow as the paragraph gets longer, so I'm looking for a faster solution.
function textUnderline() {
var selectedText = DocumentApp.getActiveDocument().getSelection();
if(selectedText) {
var elements = selectedText.getRangeElements();
for (var index = 0; index < elements.length; index++) {
var element = elements[index];
if(element.getElement().editAsText) {
var text = element.getElement().editAsText();
var textLength = text.getText().length;
//For every single character, check if it's underlined and then format it
for (var i = 0; i < textLength; i++) {
if(text.isUnderline(i)) {
text.setBold(i, i, true);
text.setBackgroundColor(i,i,'#ffff00');
} else {
text.setFontSize(i, i, 8);
}
}
}
}
}
}
Use getTextAttributeIndices:
There is no need to check each character in the selection. You can use getTextAttributeIndices() to get the indices in which the text formatting changes. This method:
Retrieves the set of text indices that correspond to the start of distinct text formatting runs.
You just need to iterate through these indices (that is, check the indices in which text formatting changes), which are a small fraction of all character indices. This will greatly increase efficiency.
Code sample:
function textUnderline() {
var selectedText = DocumentApp.getActiveDocument().getSelection();
if(selectedText) {
var elements = selectedText.getRangeElements();
for (var index = 0; index < elements.length; index++) {
var element = elements[index];
if(element.getElement().editAsText) {
var text = element.getElement().editAsText();
var textRunIndices = text.getTextAttributeIndices();
var textLength = text.getText().length;
for (let i = 0; i < textRunIndices.length; i++) {
const startOffset = textRunIndices[i];
const endOffset = i + 1 < textRunIndices.length ? textRunIndices[i + 1] - 1 : textLength - 1;
if (text.isUnderline(textRunIndices[i])) {
text.setBold(startOffset, endOffset, true);
text.setBackgroundColor(startOffset, endOffset,'#ffff00');
} else {
text.setFontSize(startOffset, endOffset, 8);
}
}
}
}
}
}
Reference:
getTextAttributeIndices()
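The offset arithmetic in the code sample can be checked in isolation, without a document. The sketch below is plain JavaScript; runsFromIndices is a hypothetical helper name, and its first argument stands in for the array returned by getTextAttributeIndices().

```javascript
// Turn the start offsets of formatting runs into inclusive {start, end} pairs.
// startIndices: ascending offsets where formatting changes (first one is 0).
// textLength: length of the text the offsets refer to.
function runsFromIndices(startIndices, textLength) {
  return startIndices.map(function (start, i) {
    var isLastRun = i + 1 >= startIndices.length;
    return {
      start: start,
      // A run ends one character before the next run starts,
      // and the last run ends at the final character of the text.
      end: isLastRun ? textLength - 1 : startIndices[i + 1] - 1
    };
  });
}
```

With start offsets [0, 5, 9] and a text length of 12, this yields the runs 0-4, 5-8 and 9-11, so only three formatting checks are needed instead of twelve.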
Based on the example shown in the animated gif, it seems your procedure needs to
handle a selection
set properties if the selected region is of some format (e.g. underlined)
set properties if the selected region is NOT of some format (e.g. not underlined)
finish as fast as possible
and your example code achieves all these goals except the last one.
The problem is that you are calling the text.set...() functions at each index position. Each call is synchronous and blocks the code until the document is updated, thus your run time grows linearly with each character in the selection.
My suggestion is to build up a collection of subranges from the selection range and then for each subrange use text.set...(subrange.start, subrange.end) to apply the formatting. Now the run time will be dependent on chunks of characters, rather than single characters. i.e., you will only update when the formatting switches back and forth from, in your example, underlined to not underlined.
Here is some example code that implements this subrange idea. I separated the specific predicate function (text.isUnderline) and specific formatting effects into their own functions so as to separate the general idea from the specific implementation.
// run this function with selection
function transformUnderlinedToBoldAndYellow() {
transformSelection("isUnderline", boldYellowOrSmall);
}
function transformSelection(stylePredicateKey, stylingFunction) {
const selectedText = DocumentApp.getActiveDocument().getSelection();
if (!selectedText) return;
const getStyledSubRanges = makeStyledSubRangeReducer(stylePredicateKey);
selectedText.getRangeElements()
.reduce(getStyledSubRanges, [])
.forEach(stylingFunction);
}
function makeStyledSubRangeReducer(stylePredicateKey) {
return function(ranges, rangeElement) {
const {text, start, end} = unwrapRangeElement(rangeElement);
if (start >= end) return ranges; // filter out empty selections
const range = {
text, start, end,
styled: [], notStyled: [] // we will extend our range with subranges
};
const getKey = (isStyled) => isStyled ? "styled" : "notStyled";
let currentKey = getKey(text[stylePredicateKey](start));
range[currentKey].unshift({start: start});
for (let index = start + 1; index <= end; ++index) {
const isStyled = text[stylePredicateKey](index);
if (getKey(isStyled) !== currentKey) { // we are switching styles
range[currentKey][0].end = index - 1; // note end of this style
currentKey = getKey(isStyled);
range[currentKey].unshift({start: index}); // start new style range
}
}
ranges.push(range);
return ranges;
}
}
// a helper function to unwrap a range selection, deals with isPartial,
// maps RangeElement => {text, start, end}
function unwrapRangeElement(rangeElement) {
const isPartial = rangeElement.isPartial();
const text = rangeElement.getElement().asText();
return {
text: text,
start: isPartial
? rangeElement.getStartOffset()
: 0,
end: isPartial
? rangeElement.getEndOffsetInclusive()
: text.getText().length - 1
};
}
// apply specific formatting to satisfy the example
function boldYellowOrSmall(range) {
const {text, start, end, styled, notStyled} = range;
styled.forEach(function setTextBoldAndYellow(range) {
text.setBold(range.start, range.end || end, true);
text.setBackgroundColor(range.start, range.end || end, '#ffff00');
});
notStyled.forEach(function setTextSmall(range) {
text.setFontSize(range.start, range.end || end, 8);
});
}
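The subrange idea itself can be sketched independently of the Document service. In the plain JavaScript below, groupRuns is a hypothetical helper and predicate stands in for a per-index check such as text.isUnderline(index); it assumes start <= end and inclusive offsets.

```javascript
// Group consecutive indices in [start, end] that share the same predicate result.
function groupRuns(start, end, predicate) {
  var runs = [];
  var runStart = start;
  var current = Boolean(predicate(start));
  for (var index = start + 1; index <= end; index++) {
    var value = Boolean(predicate(index));
    if (value !== current) {
      // The style switched: close the run that just ended and open a new one.
      runs.push({ start: runStart, end: index - 1, styled: current });
      runStart = index;
      current = value;
    }
  }
  // Close the final run at the end of the selection.
  runs.push({ start: runStart, end: end, styled: current });
  return runs;
}
```

Each returned run carries an explicit end offset, so the formatting step can make one setter call per run rather than one per character.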
I'm a fiction writer and I used to do my writing in MS Word. I've written some macros to help me edit fiction text, and one of them checks a paragraph and marks (in red) any duplicated (or triplicated, etc.) words. Example:
"I came **home**. And while at **home** I did this and that."
The word "home" is used twice, so it is worth checking whether I can rephrase the sentence.
Now I mostly use Google Docs for writing, but I still have to do my editing in MS Word, mostly because of this macro, as I have not been able to program it in Google Apps Script.
function PobarvajBesede() {
var doc = DocumentApp.getActiveDocument();
var cursor = DocumentApp.getActiveDocument().getCursor();
var surroundingText = cursor.getSurroundingText().getText();
var WordsString = WORDS(surroundingText);
Logger.log(WordsString);
//so far, so good. But this doesn't work:
var SortedWordsString = SORT(WordsString[1],1,False);
// and I'm lost.
}
function WORDS(input) {
var input = input.toString();
var inputSplit = input.split(" ");
// Logger.log(inputSplit);
inputSplit = inputSplit.toString();
var punctuationless = inputSplit.replace(/[.,\/#!$%\?^&\*;:{}=\-_`~()]/g," ");
var finalString = punctuationless.replace(/\s{2,}/g," ");
finalString = finalString.toLowerCase();
return finalString.split(" ") ;
}
If I could only get a list of words (in uppercase, longer than 3 characters), sorted by the number of their appearances in the logger, it would help me a lot:
HOME (2)
AND (1)
...
Thank you.
Flow:
Transform the string to upper case and strip every character that is not a letter A-Z or a space
After splitting the string into a word array, reduce the array to an object of word:count
Map the reduced object to a 2D array [[word,count of this word],[..],...] and sort the array by the inner array's count.
Snippet:
function wordCount(str) {
str = str || 'I came **home**. And while at **home** I did this and that.';
var countObj = str
.toUpperCase() //'I CAME **HOME**...'
.replace(/[^A-Z ]/g, '') //'I CAME HOME...'
.split(' ') //['I', 'CAME',..]
.reduce(function(obj, word) {
if (word.length >= 3) {
obj[word] = obj[word] ? ++obj[word] : 1;
}
return obj;
}, {}); //{HOME:2,DID:1}
return Object.keys(countObj)
.map(function(word) {
return [word, countObj[word]];
}) //[['HOME',2],['CAME',1],...]
.sort(function(a, b) {
return b[1] - a[1];
});
}
console.info(wordCount());
To read and practice:
Object
Array methods
This is a combination of TheMaster's answer and some of my own work. I needed to learn more about the way he did it, so I spent some learning time on it today. This function eliminates some problems I was having with carriage returns, and it also removes words that only appear once. You should probably pick TheMaster's solution, as I couldn't have done it without his work.
function getDuplicateWords() {
var str=DocumentApp.getActiveDocument().getBody().getText();
var countObj = str
.toUpperCase()
.replace(/\n/g,' ')
.replace(/[^A-Z ]/g, '')
.split(' ')
.reduce(function(obj, word) {
if (word.length >= 2) {
obj[word] = obj[word] ? ++obj[word] : 1;
}
return obj;
}, {});
var oA=Object.keys(countObj).map(function(word){return [word, countObj[word]];}).filter(function(elem){return elem[1]>1;}).sort(function(a,b){return b[1]-a[1]});
var userInterface=HtmlService.createHtmlOutput(oA.join("<br />"));
DocumentApp.getUi().showSidebar(userInterface);
}
function onOpen() {
DocumentApp.getUi().createMenu('MyMenu')
.addItem('Get Duplicates','getDuplicateWords' )
.addToUi();
}
And yes, I was having problems getting the results to change in my last solution.
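To get the "HOME (2)" style of output asked for in the question, the counting steps above could be combined with a formatting step, roughly as follows. This is a plain JavaScript sketch; formatWordCounts and its minLength parameter are made up for illustration, and splitting on any whitespace is an assumption that also covers line breaks.

```javascript
// Count words of at least minLength letters and format them as "WORD (count)".
function formatWordCounts(text, minLength) {
  var counts = new Map();
  text
    .toUpperCase()
    .replace(/[^A-Z\s]/g, '')   // keep only letters A-Z and whitespace
    .split(/\s+/)               // split on spaces, tabs and line breaks
    .forEach(function (word) {
      if (word.length >= minLength) {
        counts.set(word, (counts.get(word) || 0) + 1);
      }
    });
  return Array.from(counts.keys())
    .sort(function (a, b) {
      // Most frequent first; ties are ordered alphabetically.
      return counts.get(b) - counts.get(a) || (a < b ? -1 : 1);
    })
    .map(function (word) {
      return word + ' (' + counts.get(word) + ')';
    });
}
```

In Apps Script the result could be logged with Logger.log(formatWordCounts(surroundingText, 4).join('\n')), where surroundingText is the variable from the question.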
I have 'myTextarea' so users can put their text into it, and they should use only "numbers" and ",".
Text they input must be like this:
2,4,6,2,67,43,...number, comma, number, comma and so on.
This line of code helps me:
levelTextarea.restrict = "0-9,";
But the problem is that users can type many commas in a row
2,,,,3,44,5,6,5,5....
and I need number,comma, number...
I will convert their input into an array.
Is it possible to validate input text, something like:
if (myTextarea is valid)
{
continue
}
else
{
trace ("invalid input");
}
There may be a better way, but one simple way that comes to mind is just doing this:
var myValue:String = myTextarea.text;
while(myValue.indexOf(",,") >= 0){
myValue = myValue.replace(",,",",");
}
Of course, if you just want an array of numbers at the end of the day, you could just do this instead:
//create the array
var arr:Array = myTextarea.text.split(",");
//loop backwards through the array and remove anything that is empty
for(var i:int=arr.length-1;i>=0;i--){
if(!arr[i] || arr[i] == ""){
arr.splice(i,1);
continue;
}
//convert the value to a number
arr[i] = Number(arr[i]);
}
Now you'd have an array of all the numbers (originally separated by commas) from the text input.
This is one way to do it:
var a:String="4,4,4,4";
var valid:Boolean=true;
for(var i:int=0;i<a.length-1;i++)
{
if(a.charAt(i)=="," && a.charAt(i)==a.charAt(i+1))
{
trace(a.charAt(i));
valid=false;
}
}
Only here I used strings.
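The "is the input valid" check from the question can also be expressed as a single regular expression. The sketch below is JavaScript, not ActionScript, so treat it as an illustration of the pattern rather than tested Flash code; the function names are made up.

```javascript
// One or more digits, then zero or more groups of "comma followed by digits".
// This rejects empty input, doubled commas, and leading or trailing commas.
var NUMBER_LIST = /^\d+(,\d+)*$/;

function isValidNumberList(text) {
  return NUMBER_LIST.test(text);
}

// Returns an array of numbers, or null when the input is not valid.
function parseNumberList(text) {
  return isValidNumberList(text) ? text.split(',').map(Number) : null;
}
```

Unlike the cleanup loops above, this rejects bad input instead of silently repairing it, which fits the if/else shape sketched in the question.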
I have a Google Spreadsheet with some data, and I want to write a custom function to use in the sheet, which accepts a range of cells and a delimiter character, takes each cell value, splits it by the delimiter, and counts the total.
For example
Column A has the following values in rows 1-3: {"Sheep","Sheep,Dog","Cat"}
My function would be called like this: =CountDelimitedValues(A1:A3;",");
It should return the value: 4 (1+2+1)
The problem I am having is in my custom script I get errors like
"TypeError: cannot get function GetValues from type Sheep"
This is my current script:
function CountArrayList(arrayList, delimiter) {
var count = 0;
//for (i=0; i<array.length; i++)
//{
//count += array[i].split(delimiter).length;
//}
var newArray = arrayList.GetValues();
return newArray.ToString();
//return count;
}
I understand that the parameter arraylist is receiving an array of objects from the spreadsheet, however I don't know how to get the value out of those objects, or perhaps cast them into strings.
Alternatively I might be going about this in the wrong way? I have another script which extracts the text from a cell between two characters which works fine for a single cell. What is it about a range of cells that is different?
That's something you can achieve without a script, using a plain old formula:
=SUM(ARRAYFORMULA(LEN(A1:A3)-LEN(SUBSTITUTE(A1:A3; ","; "")) + 1))
Credit goes here: https://webapps.stackexchange.com/q/37744/29140
Something like this works:
function CountArrayList(arrayList) {
return arrayList.toString().split(',').length
}
Wouldn't that be sufficient?
Edit: oops, sorry, I forgot the user-defined delimiter, so like this:
function CountArrayList(arrayList,del) {
return arrayList.toString().split(del).length
}
Usage: =CountArrayList(A1:C1;",")
NOTE: in the example above it would be dangerous to use a delimiter other than ",", since toString() joins the array elements with commas. If you really need to do so, try using a regex to change the commas to your delimiter and apply the split on that.
Try it like this:
function CountArrayList(arrayList,del) {
return arrayList.toString().replace(/,/g,del).split(del).length
}
Another solution I found was that I needed to implicitly cast the objects in the array being passed to strings.
For example, this function accepts the array of cells and outputs their contents as a single string with del as the delimiter (much like Array.join(), but skipping blank cells). Note the TrimString function and that it is passed an element of the array.
function ArrayToString(array,del) {
var string = "";
for (var i = 0; i < array.length; i++) {
if (array[i] != null) {
var trimmedString = TrimString(array[i]);
if (trimmedString != "") {
if (string.length > 0) {
string += del;
}
string += trimmedString;
}
}
}
return string;
}
Below is the TrimString function.
function TrimString(string) {
var value = "";
if (string != "" && string != null) {
var newString = "";
newString += string;
var frontStringTrimmed = newString.replace(/^\s*/,"");
var backStringTrimmed = frontStringTrimmed.replace(/\s*$/,"");
value = backStringTrimmed;
}
return value;
}
What I found is that this code threw a TypeError unless I included the declaration of the newString variable, and added the array element object to it, implicitly casting the array element object as a string. Otherwise the replace() functions could not be called.
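The original question (count all delimited values in a range) can also be answered without joining everything into one string. A custom function that is given a multi-cell range receives a two-dimensional array of cell values, and a single cell arrives as a plain value, so both cases are handled in the sketch below. The function name COUNT_DELIMITED_VALUES is made up, and skipping blank cells and empty pieces is an assumption about the desired behaviour.

```javascript
// Count the delimiter-separated values in every cell of a range.
function COUNT_DELIMITED_VALUES(range, delimiter) {
  // Normalise a single cell value to the same 2D shape as a range.
  var rows = Array.isArray(range) ? range : [[range]];
  var count = 0;
  rows.forEach(function (row) {
    var cells = Array.isArray(row) ? row : [row];
    cells.forEach(function (cell) {
      // String() converts numbers, dates and other cell values explicitly.
      var text = String(cell).trim();
      if (text === '') return; // ignore blank cells
      count += text.split(delimiter).filter(function (piece) {
        return piece.trim() !== '';
      }).length;
    });
  });
  return count;
}
```

In the sheet this would be called as =COUNT_DELIMITED_VALUES(A1:A3;","), which gives 4 for the Sheep / Sheep,Dog / Cat example. Converting with String() also avoids the TypeError described above, since it never calls string methods on a non-string cell value.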