Count instances of text string and replace with number - google-apps-script

I am trying to find ways to speed up adding footnotes to a Google slides document. What I want is a script that looks for every instance of a text string throughout the document (say ‘*’) and then replaces each instance of that string with the number corresponding to that instance e.g. the first * gets replaced with 1, second * gets replaced with 2, and so on. I realise this method can only be used once but this would still save me a lot of time. Is there an easy way to do this? I can’t work out how to replace with a variable but it seems like it should be possible.

Assume our sample slides contain the word "replace" in several text boxes.
If we want to replace all occurrences of a string (e.g. "replace"), we need to traverse all shapes on each slide and replace every occurrence with the counter.
Code:
function myFunction() {
    var presentation = SlidesApp.getActivePresentation();
    var slides = presentation.getSlides();
    var counter = 0;
    // traverse each slide
    slides.forEach(function (slide) {
        var shapes = slide.getShapes();
        // traverse each shape
        shapes.forEach(function (shape) {
            // get its text content
            var text = shape.getText();
            var string = text.asString();
            // replace all occurrences of string (e.g. "replace")
            // by an incrementing number
            string = string.replace(/replace/g, function () {
                return ++counter;
            });
            // set the shape's text
            text.setText(string);
        });
    });
}
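The question uses `*` as its marker, and `*` is a regular-expression metacharacter, so writing `/*/g` in place of `/replace/g` would not even parse. A minimal sketch of the numbering step as a plain function, assuming the marker can be any literal string: it escapes the marker before building the `RegExp` and returns the counter so numbering continues across shapes and slides. The name `numberMarkers` is hypothetical.

```javascript
// Hypothetical helper: replace every literal occurrence of `marker` in `text`
// with an incrementing number, starting after `counter`.
// Returns the new text plus the last number used, so the caller can carry
// the counter on to the next shape or slide.
function numberMarkers(text, marker, counter) {
  // Escape regex metacharacters such as * so the marker is matched literally
  var escaped = marker.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var pattern = new RegExp(escaped, 'g');
  var result = text.replace(pattern, function () {
    counter += 1;
    return String(counter);
  });
  return { text: result, counter: counter };
}

// Example: two text boxes processed in document order
var first = numberMarkers('Revenue grew*. Costs fell*.', '*', 0);
var second = numberMarkers('See appendix*.', '*', first.counter);
console.log(first.text);  // "Revenue grew1. Costs fell2."
console.log(second.text); // "See appendix3."
```

Inside `myFunction`, the body of the inner loop could then become `var r = numberMarkers(text.asString(), '*', counter); counter = r.counter; text.setText(r.text);`.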

This is not a ready-made solution, rather an approach to the task.
You can download all the text in your presentation as a TXT file (File > Download > Plain Text (.txt)).
Then you can process this text with a JavaScript snippet like this:
// your text with markers (#)
var txt = `
doleste # atus etur, consequi odi quos alit audipsunt as is est# ant.
consequi # odi quos alit audipsunt es vere ipsam aut am
doluptae et que nonse # um volupta aped ulloreictat as is est ant.
`;
// get every marker plus three characters before and after it
var find_for = txt.match(/...#.../g);
console.log(find_for); // Output: [ 'te # at', 'est# an', 'ui # od', 'se # um' ]
// replace marker with numbers 1, 2, 3...
var replace_with = find_for.map((m,i) => m.replace(/#/, i+1));
console.log(replace_with); // Output: [ 'te 1 at', 'est2 an', 'ui 3 od', 'se 4 um' ]
This way you will get two arrays: find_for and replace_with.
Then you will need a script that performs the text replacements:
'te # at' --> 'te 1 at'
'est# an' --> 'est2 an'
'ui # od' --> 'ui 3 od'
'se # um' --> 'se 4 um'
That, I believe, is a fairly simple task.
But errors can occur if two markers have the same neighboring characters. You will probably want to take four or five neighboring characters around each marker (....#....), or an asymmetric window (......#...). It's up to you.
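Once `find_for` and `replace_with` exist, the replacement step could look like the sketch below. It assumes a script bound to the presentation and uses `Presentation.replaceAllText(findText, replaceText, matchCase)`, which replaces every occurrence of a string across all slides and returns the number of replacements. Because it replaces *every* occurrence, the sketch first refuses to run if a context string appears twice, which is the ambiguity described above. The function name `applyNumbering` is hypothetical.

```javascript
// Hypothetical helper: apply each findFor[i] -> replaceWith[i] pair.
// `presentation` is expected to be a SlidesApp Presentation, e.g.
// SlidesApp.getActivePresentation(); only replaceAllText() is used.
function applyNumbering(presentation, findFor, replaceWith) {
  // replaceAllText() changes every occurrence, so a duplicated context string
  // would give both markers the same number: stop before changing anything.
  var seen = {};
  findFor.forEach(function (s) {
    if (seen[s]) {
      throw new Error('Ambiguous context "' + s + '": take more neighboring characters');
    }
    seen[s] = true;
  });
  var total = 0;
  findFor.forEach(function (s, i) {
    // matchCase = true so only the exact context string is replaced
    total += presentation.replaceAllText(s, replaceWith[i], true);
  });
  return total; // should equal findFor.length if every context was found once
}
```

Usage: `Logger.log(applyNumbering(SlidesApp.getActivePresentation(), find_for, replace_with));`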

Related

How to extract invoices data from an image in android app?

My task is to extract text from a scanned document/JPG and then get only the 6 values listed below, so that I can auto-fill a form in my next screen/activity.
I used the Google Cloud Vision API in my Android app on the Blaze (paid) plan, and I got the result as a text block, but I want to extract only some of the information from it. How can I achieve that?
Bills and receipts can differ every time, but I want these 6 values from every invoice's text block:
Vendor
Account
Description
Due Date
Invoice Number
Amount
Is there any tool or third-party library I can use for this in my Android development?
Note: I don't think a sample receipt or bill image is needed, because it can be any type of bill or invoice; we just need to extract the 6 values above from the extracted text.
In the following scenarios I will create two fictitious bill formats, then write the algorithm to parse them. I will write only the algorithm because I don't know Java.
Picture the two bills in a first column and, in a second column, the text data obtained from OCR software. That text is like a plain text file with no logic in it, but we know certain keywords that give it meaning. Below is the algorithm that turns the meaningless text into a well-structured JSON object.
// Text obtained from BILL format 1
var TEXT_FROM_OCR = `Invoice no 12 Amount 55$
Vendor name BusinessTest 1 Account No 1213113
Due date 2019-12-07
Description Lorem ipsum dolor est`;
// Text obtained from BILL format 2 (overwrites format 1: parse one bill at a time)
var TEXT_FROM_OCR = ` BusinessTest22
Invoice no 19 Amount 12$
Account 4564544 Due date 2019-12-15
Description
Lorem ipsum dolor est
Another description line
Last description line`;
// This JavaScript object describes the logic behind the text
var TEMPLATES = {
    "bill_template_1": {
        "vendor": {
            "line_no_start": null, // null means unknown and will be ignored by our text parser
            "line_no_end": null, // null means unknown and will be ignored by our text parser
            "start_delimiter": "Vendor name", // the searched value starts immediately after this start delimiter
            "end_delimiter": "Account", // the searched value ends just before this end delimiter
            "value_found": null // save the value we found here
        },
        "account": {
            "line_no_start": null, // null means unknown and will be ignored by our text parser
            "line_no_end": null, // null means unknown and will be ignored by our text parser
            "start_delimiter": "Account No", // the searched value starts immediately after this start delimiter
            "end_delimiter": null, // extract everything until the end of the current line
            "value_found": null // save the value we found here
        },
        "description": {
            // apply the same logic as above
        },
        "due_date": {
            // apply the same logic as above
        },
        "invoice_number": {
            // apply the same logic as above
        },
        "amount": {
            // apply the same logic as above
        }
    },
    "bill_template_2": {
        "vendor": {
            "line_no_start": 0, // extract data from line 0
            "line_no_end": 0, // extract data until line 0
            "start_delimiter": null, // ignored, because our delimiter is a complete line
            "end_delimiter": null, // ignored, because our delimiter is a complete line
            "value_found": null // save the value we found here
        },
        "account": {
            "line_no_start": null, // null means unknown and will be ignored by our text parser
            "line_no_end": null, // null means unknown and will be ignored by our text parser
            "start_delimiter": "Account", // the searched value starts immediately after this start delimiter
            "end_delimiter": "Due date", // the searched value ends just before this end delimiter
            "value_found": null // save the value we found here
        },
        "description": {
            "line_no_start": 4, // first line after the "Description" heading in format 2
            "line_no_end": 99999, // a very big number, meaning "until the end of the file"
            "start_delimiter": null, // ignored, because our delimiter is a complete line
            "end_delimiter": null, // ignored, because our delimiter is a complete line
            "value_found": null // save the value we found here
        },
        "due_date": {
            // apply the same logic as above
        },
        "invoice_number": {
            // apply the same logic as above
        },
        "amount": {
            // apply the same logic as above
        }
    }
};
// ALGORITHM
// 1. Split TEXT_FROM_OCR into an array of lines (each index is one line of the file)
var LINES = TEXT_FROM_OCR.split(/\r?\n/);
var MAXIMUM_SCORE = 6; // we are looking to extract 6 values out of 6
Object.keys(TEMPLATES).forEach(function (TEMPLATE_TO_PARSE) {
    var PARSE_METADATA = TEMPLATES[TEMPLATE_TO_PARSE];
    var SCORE = 0; // for each field we find, we increment the score
    Object.keys(PARSE_METADATA).forEach(function (SEARCHED_FIELD_NAME) {
        var DELIMITERS_METADATA = PARSE_METADATA[SEARCHED_FIELD_NAME];
        // Search by line first (!= null also skips fields with no rule defined yet)
        if (DELIMITERS_METADATA.line_no_start != null && DELIMITERS_METADATA.line_no_end != null) {
            // Initialize the value with an empty string
            DELIMITERS_METADATA.value_found = '';
            // Concatenate the lines in range, stopping at the last line of the file
            var LAST_LINE = Math.min(DELIMITERS_METADATA.line_no_end, LINES.length - 1);
            for (var LINE_NO = DELIMITERS_METADATA.line_no_start; LINE_NO <= LAST_LINE; LINE_NO++) {
                DELIMITERS_METADATA.value_found += LINES[LINE_NO];
            }
            // We have found a good value, continue to the next field
            SCORE++;
            return;
        }
        // Search by text delimiters
        if (DELIMITERS_METADATA.start_delimiter != null) {
            // Search for the start delimiter inside each line of the file
            for (var k = 0; k < LINES.length; k++) {
                var LINE_CONTENT = LINES[k];
                var FOUND_AT = LINE_CONTENT.indexOf(DELIMITERS_METADATA.start_delimiter);
                if (FOUND_AT > -1) {
                    // The value starts at the offset we found + the length of the start delimiter
                    var START_POSITION = FOUND_AT + DELIMITERS_METADATA.start_delimiter.length;
                    // By default, extract everything until the end of the current line
                    var END_POSITION = LINE_CONTENT.length;
                    // However, if an end delimiter is defined and found on this line, stop just before it
                    if (DELIMITERS_METADATA.end_delimiter != null) {
                        var END_AT = LINE_CONTENT.indexOf(DELIMITERS_METADATA.end_delimiter, START_POSITION);
                        if (END_AT > -1) {
                            END_POSITION = END_AT;
                        }
                    }
                    // Extract the value (substring takes an end index; substr would take a length)
                    DELIMITERS_METADATA.value_found = LINE_CONTENT.substring(START_POSITION, END_POSITION).trim();
                    // We have found a good value, increment the score
                    SCORE++;
                    // Stop scanning lines: move on to the next field
                    break;
                }
            }
        }
    });
    console.log(TEMPLATE_TO_PARSE + ' obtained a score of ' + SCORE + ' out of ' + MAXIMUM_SCORE);
});
At the end you will know which template extracted the most data, and based on that, which one to use for that bill. Feel free to ask anything in the comments. Since I spent 45 minutes writing this answer, I'll surely answer your comments as well. :)
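The final step, picking the template with the highest score, could be sketched as below. It assumes templates shaped like `TEMPLATES` above (fields without a rule simply contribute nothing), and the names `extractField` and `pickBestTemplate` are hypothetical. Unlike the loop above, this sketch returns the extracted values instead of writing them into the template objects, joins multi-line values with spaces, and only counts non-empty values toward the score.

```javascript
// Hypothetical helper: extract one field from the OCR lines using a
// template entry shaped like the TEMPLATES fields above.
function extractField(lines, spec) {
  if (spec.line_no_start != null && spec.line_no_end != null) {
    // Line-based field: join the requested lines that actually exist
    var picked = lines.slice(spec.line_no_start, spec.line_no_end + 1);
    return picked.length ? picked.join(' ').trim() : null;
  }
  if (spec.start_delimiter != null) {
    for (var i = 0; i < lines.length; i++) {
      var at = lines[i].indexOf(spec.start_delimiter);
      if (at === -1) continue;
      var start = at + spec.start_delimiter.length;
      var end = lines[i].length; // default: until the end of the line
      if (spec.end_delimiter != null) {
        var endAt = lines[i].indexOf(spec.end_delimiter, start);
        if (endAt !== -1) end = endAt;
      }
      return lines[i].substring(start, end).trim();
    }
  }
  return null; // not found, or no rule defined for this field
}

// Score every template on the OCR text and return the best one.
// On a tie, the template listed first wins.
function pickBestTemplate(ocrText, templates) {
  var lines = ocrText.split(/\r?\n/);
  var best = null;
  Object.keys(templates).forEach(function (name) {
    var values = {};
    var score = 0;
    Object.keys(templates[name]).forEach(function (field) {
      values[field] = extractField(lines, templates[name][field]);
      if (values[field]) score++;
    });
    if (best === null || score > best.score) {
      best = { template: name, score: score, values: values };
    }
  });
  return best;
}
```

For example, `pickBestTemplate(TEXT_FROM_OCR, TEMPLATES).template` names the template to use for that bill.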

Partial replace in docs what matches only and preserve formatting

Let's assume that this is the first paragraph of our Google document:
Wo1rd word so2me word he3re last.
We need to search and replace some parts of the text, but the edit history must show that only those parts changed, and we must not lose the formatting (bold, italic, color, etc.).
What I have understood so far: capturing groups don't work in replaceText(), as described in the documentation. We can use plain JS replace(), but it only works on strings, and our Google document is a tree of objects, not a string. So I made a lot of attempts and settled on the code attached below.
What I can't solve: how to replace only part of what I've found. Capturing groups are a very powerful and suitable tool, but I can't use them for the replacement. Either they don't work, or I end up replacing the whole paragraph, which is unacceptable because the edit history will show a full-paragraph replacement and the paragraphs will lose their formatting. What if the pattern we search for occurs in every paragraph, but only one letter must change? We would see the whole document replaced in the history, and it would be hard to find what really changed.
My first idea was to take the string that replace() gives me, compare it with the paragraph's contents symbol by symbol, and replace whatever differs. But I understand that this only works if we are sure exactly one letter changed. If the replacement deletes or adds words, how can the two be kept in sync? That is a much bigger problem.
None of the topics I've found and read three times helped or got me past this dead end.
So, are there any ideas on how to solve this problem?
function RegExp_test() {
    var docParagraphs = DocumentApp.getActiveDocument().getBody().getParagraphs();
    var i = 0, text0, text1, test1, re, rt, count;
    // equivalent of .asText() ???
    text0 = docParagraphs[i].editAsText(); // obj
    // equivalent of .editAsText().getText(), .asText().getText()
    text1 = docParagraphs[i].getText(); // str
    if (text1 !== '') {
        re = new RegExp(/(?:([Ww]o)\d(rd))|(?:([Ss]o)\d(me))|(?:([Hh]e)\d(re))/g); // v1
        // re = new RegExp(/(?:([Ww]o)\d(rd))/); // v2
        count = (text1.match(re) || []).length; // re v1: 7, re v2: 3
        if (count) {
            test1 = text1.match(re); // v1: ["Wo1rd", "Wo", "rd", , , , , ]
            // for (var j = 0; j < count; j++) {
            //     test1 = text1.match(re)[j];
            // }
            text0.replaceText("(?:([Ww]o)\\d(rd))", '\1-A-\2'); // GAS func
            // #1: \1, \2 etc - didn't work: " -A- word so2me word he3re last."
            test1 = text0.getText();
            // js func, text2 OK: "Wo1rd word so-B-me word he3re last.", just in memory now
            text1 = text1.replace(/(?:([Ss]o)\d(me))/, '$1-B-$2'); // working with str, not obj
            // rt OK: "Wo1rd word so-B-me word he-C-re last."
            rt = text1.replace(/(?:([Hh]e)\d(re))/, '$1-C-$2');
            // #2: we used capturing groups ok, but replaced whole line and lost all formatting
            text0.replaceText(".*", rt);
            test1 = text0.getText();
        }
    }
    Logger.log('Test finished');
}
Found a solution. It's fairly primitive, but it can serve as a base for a more complex procedure that handles any number of capture groups, detects them, mixes them, etc. If anyone wants to improve it, you're welcome!
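function replaceTextCG(text0, re, to) {
    // Handles patterns with exactly two capture groups, where group 1 starts the
    // match and group 2 ends it, e.g. /([Ww]o)\d(rd)/g; only the text between
    // the groups is replaced. The pattern needs the g flag (re.lastIndex is used).
    var res, pos_f, pos_l;
    var matches = text0.getText().match(re);
    var count = (matches || []).length;
    // '$1A$2' -> ['$1', 'A', '$2']: to[1] is the literal text placed between the groups
    to = to.replace(/(\$\d+)/g, ',$1,').replace(/^,/, '').replace(/,$/, '').split(",");
    re.lastIndex = 0; // always start from the beginning of the paragraph
    for (var i = 0; i < count; i++) {
        res = re.exec(text0.getText());
        if (!res) break;
        pos_f = res.index + res[1].length; // first character after group 1
        pos_l = re.lastIndex - res[2].length - 1; // last character before group 2
        text0.deleteText(pos_f, pos_l); // the end offset is inclusive
        text0.insertText(pos_f, to[1]);
        // keep lastIndex in step with the edited text if the lengths differ
        re.lastIndex += to[1].length - (pos_l - pos_f + 1);
    }
    return count;
}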
function RegExp_test() {
    var docParagraphs = DocumentApp.getActiveDocument().getBody().getParagraphs();
    var i = 0, text0, count;
    // equivalent of .asText() ???
    text0 = docParagraphs[i].editAsText(); // obj
    if (text0.getText() !== '') {
        count = replaceTextCG(text0, /(?:([Ww]o)\d(rd))/g, '$1A$2');
        count = replaceTextCG(text0, /(?:([Ss]o)\d(me))/g, '$1B$2');
        count = replaceTextCG(text0, /(?:([Hh]e)\d(re))/g, '$1C$2');
    }
    Logger.log('Test finished');
}
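The string-comparison idea from the question can also be made to work when words are added or removed: trim the common prefix and common suffix of the old and new paragraph text, and only the middle span remains to be deleted and inserted. A minimal sketch, assuming one `replace()` call at a time (if a single call changes several separate places, the whole span from the first to the last difference is rewritten as one edit). The names `minimalEdit` and `applyMinimalEdit` are hypothetical; `deleteText(startOffset, endOffsetInclusive)` and `insertText(offset, text)` are the Apps Script `Text` methods already used above. Characters outside the changed span are never touched, so their formatting stays as it was.

```javascript
// Hypothetical helper: find the smallest single span that differs between
// oldText and newText. Returns null if they are equal.
function minimalEdit(oldText, newText) {
  if (oldText === newText) return null;
  // Common prefix
  var start = 0;
  var maxPrefix = Math.min(oldText.length, newText.length);
  while (start < maxPrefix && oldText[start] === newText[start]) start++;
  // Common suffix, never overlapping the prefix
  var oldEnd = oldText.length - 1;
  var newEnd = newText.length - 1;
  while (oldEnd >= start && newEnd >= start && oldText[oldEnd] === newText[newEnd]) {
    oldEnd--;
    newEnd--;
  }
  return {
    start: start,
    endInclusive: oldEnd, // < start means nothing to delete
    insert: newText.substring(start, newEnd + 1) // '' means nothing to insert
  };
}

// Apply the change to an Apps Script Text element (e.g. paragraph.editAsText()).
function applyMinimalEdit(text, newString) {
  var edit = minimalEdit(text.getText(), newString);
  if (!edit) return false; // nothing changed
  if (edit.endInclusive >= edit.start) {
    text.deleteText(edit.start, edit.endInclusive);
  }
  if (edit.insert !== '') {
    text.insertText(edit.start, edit.insert);
  }
  return true;
}
```

With the sample paragraph, `applyMinimalEdit(text0, text0.getText().replace(/(?:([Ss]o)\d(me))/, '$1-B-$2'))` should delete only the `2` and insert `-B-` in its place.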

Is there a simple way to have a local webpage display a variable passed in the URL?

I am experimenting with a Firefox extension that will load an arbitrary URL (only via HTTP or HTTPS) when certain conditions are met.
With certain conditions, I just want to display a message instead of requesting a URL from the internet.
I was thinking about simply hosting a local webpage that would display the message. The catch is that the message needs to include a variable.
Is there a simple way to craft a local web page so that it can display a variable passed to it in the URL? I would prefer to just use HTML and CSS, but adding a little inline javascript would be okay if absolutely needed.
As a simple example, when the extension calls something like:
folder/messageoutput.html?t=Text%20to%20display
I would like to see:
Message: Text to display
shown in the browser's viewport.
You can use the "search" property of the Location object to extract the variables from the end of your URL:
var a = window.location.search;
In your example, a will equal "?t=Text%20to%20display".
Next, you will want to strip the leading question mark from the beginning of the string. The if statement is just in case the browser doesn't include it in the search property:
var s = a;
if (s.substr(0, 1) == "?") { s = s.substr(1); }
Just in case you get a URL with more than one variable, you may want to split the query string at ampersands to produce an array of name-value pair strings:
var R = s.split("&");
Next, split the name-value pair strings at the equal sign to separate the name from the value. Store the name as a key of an object and the value as the corresponding property value, decoding URL escapes such as %20 along the way:
var L = R.length;
var NVP = {};
var temp;
for (var i = 0; i < L; i++) {
    temp = R[i].split("=");
    // decode "%20" (and "+") back into spaces
    NVP[decodeURIComponent(temp[0])] = decodeURIComponent((temp[1] || "").replace(/\+/g, " "));
}
Almost done. Get the value with the name "t":
var t = NVP['t'];
Last, insert the variable text into the document. A simple example (that will need to be tweaked to match your document structure) is:
var containingDiv = document.getElementById("divToShowMessage");
var tn = document.createTextNode(t);
containingDiv.appendChild(tn);
getArg('t');
function getArg(param) {
    var vars = {};
    window.location.href.replace(location.hash, '').replace(
        /[?&]+([^=&]+)=?([^&]*)?/gi, // regexp
        function (m, key, value) { // callback
            // decode URL escapes such as %20 so the text displays as typed
            vars[key] = value !== undefined ? decodeURIComponent(value) : '';
        }
    );
    if (param) {
        return vars[param] ? vars[param] : null;
    }
    return vars;
}
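Current browsers, Firefox included, also provide `URLSearchParams`, which strips the leading `?`, splits the pairs and percent-decodes the values (`%20` and `+` become spaces). A minimal sketch, assuming the page contains an element with id `divToShowMessage` as in the first answer and that the script runs after that element; `messageFromQuery` is a hypothetical name.

```javascript
// Build the message from a query string such as "?t=Text%20to%20display".
function messageFromQuery(search) {
  var t = new URLSearchParams(search).get('t'); // null if there is no "t" parameter
  return 'Message: ' + (t === null ? '' : t);
}

// In messageoutput.html (place the <script> after the target element):
if (typeof document !== 'undefined') {
  var target = document.getElementById('divToShowMessage');
  if (target) {
    // createTextNode keeps the value as plain text, so markup in the URL is not rendered
    target.appendChild(document.createTextNode(messageFromQuery(window.location.search)));
  }
}
```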

Can Google App Scripts access the location of footnote superscripts programmatically?

Is it possible to use DocumentApp to find the location of footnote references in the body?
Searching the body or an element using editAsText() or findText() does not show the superscript footnote markers.
For example, in the following document:
This is a riveting story with statistics!1 You can see other stuff here too.
body.getText() returns 'This is a riveting story with statistics! You can see other stuff here too.' No reference, no 1
If I want to replace, edit, or manipulate text around the footnote reference (e.g. 1 ), how can I find its location?
It turns out that the footnote reference is indexed as a child in the Doc. So you can get the index of the footnote reference, insert some text at that index, and then remove the footnote from its parent.
function performConversion(docu) {
    var footnotes = docu.getFootnotes(); // get the footnotes
    var noteText = footnotes.map(function (note) {
        // reformat the footnote text with parentheses and save it in an array
        return '((' + note.getFootnoteContents().getText() + ' ))';
    });
    footnotes.forEach(function (note, index) {
        var paragraph = note.getParent(); // get the paragraph
        var noteIndex = paragraph.getChildIndex(note); // get the footnote's "child index"
        paragraph.insertText(noteIndex, noteText[index]); // insert the formatted text before the footnote in the paragraph
        note.removeFromParent(); // delete the original footnote
    });
}
You can use getFootnotes() to edit footnotes. getFootnotes() returns an array of Footnote objects, so you need to iterate over them.
You can log the locations (i.e. the parent paragraphs) of the footnotes with Logger.log(), like this:
function getFootnotes() {
    var doc = DocumentApp.openById('...');
    var footnotes = doc.getFootnotes();
    for (var i = 0; i < footnotes.length; i++) {
        var textLocation = footnotes[i].getParent().getText(); // text of the paragraph containing the footnote
        Logger.log(textLocation);
    }
}
To get the paragraph text only up to the footnote superscript, you can use:
textLocation = footnotes[i].getPreviousSibling().asText().getText();
In your case it should return only "This is a riveting story with statistics!", because [1] comes right after the word statistics!
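If you need the footnote's character position rather than the surrounding text, one approach is to add up the lengths of the Text siblings that precede the footnote in its paragraph. This sketch assumes that only Text elements contribute characters to the paragraph's `getText()` (inline images and other element types are skipped); `footnoteOffset` and `logFootnoteOffsets` are hypothetical names. The result should be an offset into `footnote.getParent().getText()`.

```javascript
// Hypothetical helper: character offset of a footnote reference inside
// footnote.getParent().getText(), i.e. where the superscript sits.
// Assumes only TEXT siblings contribute characters to getText().
function footnoteOffset(footnote) {
  var offset = 0;
  var el = footnote.getPreviousSibling(); // null when there is no previous sibling
  while (el) {
    if (el.getType() == DocumentApp.ElementType.TEXT) {
      offset += el.asText().getText().length;
    }
    el = el.getPreviousSibling();
  }
  return offset;
}

// Example (Apps Script): log where each footnote sits in its paragraph
function logFootnoteOffsets() {
  DocumentApp.getActiveDocument().getFootnotes().forEach(function (note) {
    Logger.log(note.getParent().getText() + ' -> footnote at offset ' + footnoteOffset(note));
  });
}
```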

Creating a line graph with highcharts and data in an external csv

I've read through the Highcharts how-to, checked the demo galleries, searched Google, and read numerous nearly identical threads here on Stack Overflow, yet I cannot get it to work.
I'm logging data in a csv file in the form of date,value.
Here's what the data looks like:
1355417598678,22.25
1355417620144,22.25
1355417625616,22.312
1355417630851,22.375
1355417633906,22.437
1355417637134,22.437
1355417641239,22.5
1355417641775,22.562
1355417662373,22.125
1355417704368,21.625
And this is how far I've managed to get the code:
http://jsfiddle.net/whz7P/
This renders a chart, but with no series or data at all. I think I'm fudging things up while formatting the data so it can be interpreted in highcharts.
Anyone able to give a helping hand?
So, you have the data structure shown above, right?
Then you split it into an array of lines, so each array item is a line.
Then for each line you do the following.
var items = line.split(';'); // wrong, use ','
But there isn't a ; in the line; you should split on , instead.
The result will be a multidimensional array in which each item is an array with the following structure. It will be stored in a var named data.
"1355417598678","22.25" // date in utc, value
Both entries are still strings at this point; convert them to numbers (parseInt for the timestamp, parseFloat for the value), because Highcharts expects numeric [x, y] pairs. Then you can pass the array directly to your series.
var serie = {
data: data,
name: 'serie1' // chose a name
}
The result will be a working chart.
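So everything can be summed up as follows.
var lines = data.split('\n');
lines = lines
    .filter(function (line) {
        return line.trim() !== ''; // skip empty lines, e.g. after the final newline
    })
    .map(function (line) {
        var parts = line.split(',');
        // Highcharts needs numbers: [timestamp in ms, value]
        return [parseInt(parts[0], 10), parseFloat(parts[1])];
    });
var series = {
    data: lines,
    name: 'serie1'
};
options.series.push(series);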
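Because the first column holds epoch milliseconds, the x axis should also be a datetime axis. A minimal sketch that turns the whole CSV into an options object, assuming comma-separated `timestamp,value` lines as in the question; `buildChartOptions` and the title text are hypothetical. In Highcharts 5 and later it can be passed to `Highcharts.chart('container', options)`; older versions use `new Highcharts.Chart(options)` with `chart.renderTo` set.

```javascript
// Hypothetical helper: "timestamp,value" CSV text -> Highcharts options object.
function buildChartOptions(csvText) {
  var points = csvText
    .split(/\r?\n/)
    .filter(function (line) { return line.trim() !== ''; }) // ignore blank/trailing lines
    .map(function (line) {
      var parts = line.split(','); // the sample data is comma-separated
      return [Number(parts[0]), parseFloat(parts[1])]; // numbers, not strings
    });
  return {
    chart: { type: 'line' },
    title: { text: 'Logged values' },
    xAxis: { type: 'datetime' }, // x values are epoch milliseconds
    series: [{ name: 'serie1', data: points }]
  };
}

// In the page (assumes jQuery, Highcharts and a <div id="container">):
// $.get('data.csv', function (csv) {
//   Highcharts.chart('container', buildChartOptions(csv));
// });
```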
Looking at your line.split part:
$.get('data.csv', function(data) {
// Split the lines
var lines = data.split('\n');
$.each(lines, function(lineNo, line) {
var items = line.split(';');
It looks like you are trying to split on a semicolon (;) instead of a comma (,), which is what your sample CSV data uses.
You need to put
$(document).ready(function() {
in the 1st line, and
});
in the last line of the javascript to make this work.
Could you upload your CSV file? Is it identical to what you wrote in your original post? I ran into the same problem, and it turned out there were errors in the data file.