How to extract invoice data from an image in an Android app? - ocr

My task is to extract text from a scanned document/JPG and then pick out only the 6 values listed below, so that I can auto-fill a form on my next screen/activity.
I used the Google Cloud Vision API in my Android app on the Blaze (paid) plan, and I get the result as a text block, but I only want to extract some of the information from it. How can I achieve that?
Bills or receipts can differ every time, but I want these 6 things out of every invoice's text block, e.g.:
Vendor
Account
Description
Due Date
Invoice Number
Amount
Is there any tool or third-party library available that I can use in my Android development?
Note - I don't think a sample receipt or bill image is needed, because it can be any type of bill or invoice; we just need to extract the 6 things mentioned above from the extracted text.

In the following scenarios I will create two fictitious bill formats, then write an algorithm to parse them. I will write only the algorithm because I don't know Java.
For each of the two bills we have a picture and the text data obtained from the OCR software. The text is like a plain text file, with no logic implemented, but we know certain keywords that can give it meaning. Below is the algorithm that translates the unstructured text into a logically structured JSON object.
// Text obtained from BILL format 1
var TEXT_FROM_OCR = `Invoice no 12 Amount 55$
Vendor name BusinessTest 1 Account No 1213113
Due date 2019-12-07
Description Lorem ipsum dolor est`;
// Text obtained from BILL format 2 (use one sample or the other)
var TEXT_FROM_OCR = `BusinessTest22
Invoice no 19 Amount 12$
Account 4564544 Due date 2019-12-15
Description
Lorem ipsum dolor est
Another description line
Last description line`;
// This is a JavaScript object literal (JSON plus comments) which describes the logic behind the text
var TEMPLATES = {
    "bill_template_1": {
        "vendor": {
            "line_no_start": null, // Unknown; will be ignored by our text parsers
            "line_no_end": null, // Unknown; will be ignored by our text parsers
            "start_delimiter": "Vendor name", // Searched value starts immediately after this start_delimiter
            "end_delimiter": "Account", // Searched value ends just before this end_delimiter
            "value_found": null // Save here the value we found
        },
        "account": {
            "line_no_start": null, // Unknown; will be ignored by our text parsers
            "line_no_end": null, // Unknown; will be ignored by our text parsers
            "start_delimiter": "Account No", // Searched value starts immediately after this start_delimiter
            "end_delimiter": null, // Extract everything until the end of the current line
            "value_found": null // Save here the value we found
        },
        "description": {
            // apply same logic as above
        },
        "due_date": {
            // apply same logic as above
        },
        "invoice_number": {
            // apply same logic as above
        },
        "amount": {
            // apply same logic as above
        },
    },
    "bill_template_2": {
        "vendor": {
            "line_no_start": 0, // Extract data from line zero
            "line_no_end": 0, // Extract data until line zero
            "start_delimiter": null, // Ignore this, because our delimiter is a complete line
            "end_delimiter": null, // Ignore this, because our delimiter is a complete line
            "value_found": null // Save here the value we found
        },
        "account": {
            "line_no_start": null, // Unknown; will be ignored by our text parsers
            "line_no_end": null, // Unknown; will be ignored by our text parsers
            "start_delimiter": "Account", // Searched value starts immediately after this start_delimiter
            "end_delimiter": "Due date", // Searched value ends just before this end_delimiter
            "value_found": null // Save here the value we found
        },
        "description": {
            "line_no_start": 4, // Extract data from line 4 (zero-based: the first line after the "Description" keyword)
            "line_no_end": 99999, // Extract data until line 99999 (a very big number which means EOF)
            "start_delimiter": null, // Ignore this, because our delimiter is a complete line
            "end_delimiter": null, // Ignore this, because our delimiter is a complete line
            "value_found": null // Save here the value we found
        },
        "due_date": {
            // apply same logic as above
        },
        "invoice_number": {
            // apply same logic as above
        },
        "amount": {
            // apply same logic as above
        },
    }
}
// ALGORITHM (pseudocode)
// 1. Convert the TEXT_FROM_OCR variable into an array (each index is one line of the file).
// In JavaScript we would do something like this (the regex accepts both "\r\n" and "\n"):
TEXT_FROM_OCR = TEXT_FROM_OCR.split(/\r?\n/);
var MAXIMUM_SCORE = 6; // we are looking to extract 6 values, out of 6
foreach TEMPLATES as TEMPLATE_TO_PARSE => PARSE_METADATA {
    SCORE = 0; // for each field we find, we increment the score
    foreach PARSE_METADATA as SEARCHED_FIELD_NAME => DELIMITERS_METADATA {
        // Search by line first
        if (DELIMITERS_METADATA['line_no_start'] !== NULL && DELIMITERS_METADATA['line_no_end'] !== NULL) {
            // Initialize the value with an empty string
            DELIMITERS_METADATA['value_found'] = '';
            // Never read past the last line of the file
            LAST_LINE = MIN(DELIMITERS_METADATA['line_no_end'], LENGTH(TEXT_FROM_OCR) - 1);
            // Concatenate the value found across these lines
            for (LINE_NO = DELIMITERS_METADATA['line_no_start']; LINE_NO <= LAST_LINE; LINE_NO++) {
                // Add the lines one by one, as defined by your delimiters
                DELIMITERS_METADATA['value_found'] += TEXT_FROM_OCR[LINE_NO];
            }
            // We have found a good value, continue to the next field
            SCORE++;
            continue;
        }
        // Search by text delimiters
        if (DELIMITERS_METADATA['start_delimiter'] !== NULL) {
            // Search for the text inside each line of the file
            foreach TEXT_FROM_OCR as LINE_CONTENT {
                // If we found start_delimiter on this line, then let's parse it
                if (LINE_CONTENT.indexOf(DELIMITERS_METADATA['start_delimiter']) > -1) {
                    // The start position of our searched value is the offset we found + the total length of the start delimiter
                    START_POSITION = LINE_CONTENT.indexOf(DELIMITERS_METADATA['start_delimiter']) + LENGTH(DELIMITERS_METADATA['start_delimiter']);
                    // By default we extract all data from START_POSITION until the end of the current line
                    END_POSITION = LENGTH(LINE_CONTENT);
                    // However, if there is an end delimiter defined, we will use that
                    if (DELIMITERS_METADATA['end_delimiter'] !== NULL) {
                        // If we found the end delimiter on this line, we will use its offset as END_POSITION
                        if (LINE_CONTENT.indexOf(DELIMITERS_METADATA['end_delimiter']) > -1) {
                            END_POSITION = LINE_CONTENT.indexOf(DELIMITERS_METADATA['end_delimiter']);
                        }
                    }
                    // Extract the value we found (substring takes a start and an end position; substr would take a length)
                    DELIMITERS_METADATA['value_found'] = LINE_CONTENT.substring(START_POSITION, END_POSITION);
                    // We have found a good value, increment the score
                    SCORE++;
                    // Break out of this foreach: we found a good value and need to move to the next field
                    break;
                }
            }
        }
    }
    print(TEMPLATE_TO_PARSE + " obtained a score of " + SCORE + " out of " + MAXIMUM_SCORE);
}
At the end you will know which template extracted the most data, and based on this, which one to use for that bill. Feel free to ask anything in the comments. Since I spent 45 minutes writing this answer, I'll surely answer your comments as well. :)
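For anyone who wants something they can run, here is a minimal JavaScript sketch of the same template-scoring idea. It is not the exact pseudocode above: the names `extractField` and `parseWithTemplates`, the shortened rule keys (`start`, `end`, `lineStart`, `lineEnd`) and the sample text are my own assumptions for illustration, and it only covers the first fictitious bill format.

```javascript
// Hypothetical sample: OCR text laid out like fictitious bill format 1.
const ocrText = [
  "Invoice no 12 Amount 55$",
  "Vendor name BusinessTest 1 Account No 1213113",
  "Due date 2019-12-07",
  "Description Lorem ipsum dolor est",
].join("\n");

// One template; each field has either delimiters or a line range.
const templates = {
  bill_template_1: {
    vendor: { start: "Vendor name", end: "Account" },
    account: { start: "Account No", end: null },
    description: { start: "Description", end: null },
    due_date: { start: "Due date", end: null },
    invoice_number: { start: "Invoice no", end: "Amount" },
    amount: { start: "Amount", end: null },
  },
};

// Extract one field from the OCR lines, or return null if nothing matches.
function extractField(lines, rule) {
  // Rule based on line numbers: join the requested lines, clamped to the file.
  if (rule.lineStart != null && rule.lineEnd != null) {
    const last = Math.min(rule.lineEnd, lines.length - 1);
    if (rule.lineStart > last) return null;
    return lines.slice(rule.lineStart, last + 1).join(" ").trim();
  }
  if (rule.start == null) return null;
  // Rule based on delimiters: take the first line containing the start delimiter.
  for (const line of lines) {
    const at = line.indexOf(rule.start);
    if (at === -1) continue;
    const from = at + rule.start.length;
    let to = line.length; // default: until the end of the line
    if (rule.end != null) {
      const endAt = line.indexOf(rule.end, from);
      if (endAt !== -1) to = endAt;
    }
    return line.substring(from, to).trim();
  }
  return null;
}

// Run every template against the text and count how many fields each one filled.
function parseWithTemplates(text, allTemplates) {
  const lines = text.split(/\r?\n/);
  const results = {};
  for (const [name, rules] of Object.entries(allTemplates)) {
    const values = {};
    let score = 0;
    for (const [field, rule] of Object.entries(rules)) {
      const value = extractField(lines, rule);
      values[field] = value;
      if (value !== null && value !== "") score++;
    }
    results[name] = { score, values };
  }
  return results;
}

const parsed = parseWithTemplates(ocrText, templates);
console.log(parsed.bill_template_1.score); // 6
console.log(parsed.bill_template_1.values.vendor); // BusinessTest 1
```

The template with the highest score is the one to trust for that bill. On Android the same logic would have to be ported to Java or Kotlin, and real OCR output will need more forgiving matching than exact keywords.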


How to get a value after a character, from a string based on a text match using formula?

I got the following value:
tradicional;cropped$9$10;mullet$5$7
In cell A1, I can choose between tradicional, cropped and mullet. In cell A2, I pick 1, or 2.
If I pick cropped and 2, the value to be returned would be 10.
If I pick mullet and 1, the value to be returned would be 5.
I'd go for len and left, but I don't see how this is going to work using the matching criteria.
Here's a practical example: https://docs.google.com/spreadsheets/d/1dFzXmtKj15EzApTKUKv8yF7_mAIB1COPSgMLMMmFE4E/edit?usp=sharing
Appreciate your help.
Description
You can split the text string on the semicolon ";" into 3 parts. Then, depending on which part you choose, you can split that part on the dollar sign "$", pick the "item" and return it as an integer. I leave it to you to figure out how to incorporate this into your script.
Script (Test Case)
function makeAChoice() {
  try {
    console.log("You chose "+getChoice("cropped",2));
    console.log("You chose "+getChoice("mullet",1));
    console.log("You chose "+getChoice("somethingelse",1));
  }
  catch(err) {
    console.log(err);
  }
}
function getChoice(choice,item) {
  try {
    var text = "tradicional;cropped$9$10;mullet$5$7";
    text = text.split(";");
    text = text.filter( s => s.includes(choice) );
    if( text.length < 1 ) throw "Error choice ["+choice+"] not found!";
    text = text[0].split("$");
    return parseInt(text[item]);
  }
  catch(err) {
    console.log(err);
  }
}
Console.log
8:32:38 AM Notice Execution started
8:32:38 AM Info You chose 10
8:32:38 AM Info You chose 5
8:32:38 AM Info Error choice [somethingelse] not found!
8:32:38 AM Info You chose undefined
8:32:38 AM Notice Execution completed
Reference
https://www.w3schools.com/jsref/jsref_split.asp
https://www.w3schools.com/jsref/jsref_filter.asp
https://www.w3schools.com/jsref/jsref_includes.asp
https://www.w3schools.com/jsref/jsref_parseint.asp
Per my comments to your original post, I feel that there is a lot we don't know about your bigger goal. But as you aren't able to provide that, this solution will work for your one exact example.
Place the following formula in C4:
=ArrayFormula(IFERROR(VLOOKUP(A4;SPLIT(FLATTEN(SPLIT(E4;";"));"$");B4+1;FALSE)))
(See the new sheet "Erik Help.")
The inner SPLIT splits the E4 string at every semicolon.
FLATTEN sends that all to one column.
The outer SPLIT then splits at each "$".
VLOOKUP can then try to find the Col-A term in the first column of the resulting virtual chart. If found, it will return the column value that matches the Col-B value + 1 (since column 1 of the virtual array is the labels, e.g., 'tradicional,' etc.).
If no match is found for both the Col-A and Col-B data, then IFERROR returns null.
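If the nested formula is hard to follow, the same steps can be mirrored in plain JavaScript. This is only an illustrative sketch and not part of the sheet: `lookupPrice` is a made-up name, and it returns `null` where the formula would leave the cell blank.

```javascript
// Mirror of the formula's steps: split on ";", split on "$", then look up.
function lookupPrice(data, label, option) {
  // Inner SPLIT + FLATTEN: one row per semicolon-separated group.
  // Outer SPLIT: break each row into columns at "$".
  const rows = data.split(";").map(function (group) {
    return group.split("$");
  });
  // VLOOKUP with exact match (FALSE) on the first column.
  const row = rows.find(function (r) {
    return r[0] === label;
  });
  // Column 0 holds the label, so option 1 is row[1], option 2 is row[2].
  if (!row || option < 1 || row[option] === undefined) {
    return null; // IFERROR: nothing to return
  }
  return Number(row[option]);
}

const source = "tradicional;cropped$9$10;mullet$5$7";
console.log(lookupPrice(source, "cropped", 2)); // 10
console.log(lookupPrice(source, "mullet", 1)); // 5
console.log(lookupPrice(source, "tradicional", 1)); // null
```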

Using DataTables how to display a running total of an amount entered in each row?

http://live.datatables.net/dalogaci/1/edit
I have an amount of money to be disbursed and am using DataTables to display a list of people and allow entry of an amount next to each person (their share of the disbursement). I want to provide a running total of the amounts entered into the table so I can warn when the total to be disbursed has been reached or passed.
Kind regards,
Glyn
You can use the following approach.
In my case, I display the running total in a <div>, rather than an input box, as the value is only for display purposes:
<div id="showsum">Grand Total: $0.00</div>
The script for this - which I have tried to explain with comments in the code:
<script type="text/javascript">
// define the table variable here so the doSum()
// function will have access to it, when needed:
var table;
// reads each value from the final column in the table, checks
// if the value is a number (as opposed to blank), and then
// keeps a running total. Ensure we round fractions of pennies
// as needed.
//
// When handling money, use a big number library - see this:
// https://stackoverflow.com/questions/1458633/how-to-deal-with-floating-point-number-precision-in-javascript
//
function doSum() {
//var foop = table.columns(5).nodes().to$();
var sum = 0.0;
// this gets each node (cell) in the final column:
table.columns(5).nodes().to$()[0].forEach(function (item) {
// see if the display value is a number (i.e. not blank):
var amt = parseFloat($('input', item ).val());
if (!isNaN(amt)) {
sum += amt;
}
});
// round and display to 2 decimal places:
sum = (Math.round((sum + Number.EPSILON) * 100) / 100).toFixed(2);
$('#showsum').text("Grand Total: $" + sum);
}
$(document).ready(function() {
table = $('#example').DataTable( {
"columnDefs": [ {
"targets": 5,
"data": function ( row, type, val, meta ) {
// note the use of onchange="doSum()" in the following:
return '<input type="number" min="0" max="99999.99" step=".01" placeholder="0.00" onchange="doSum()">';
}
} ]
} );
} );
</script>
For a change to be added to the grand total, you have to hit "enter", or click outside of the input field, if you type the value in manually.
Because you are dealing with money, the code should really use a "big number" library to eliminate the risk of inaccuracies in fractions of pennies (due to the limitations of floating-point arithmetic). For an example, see the Stack Overflow question linked in the code comments above.
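A lighter alternative to a big number library, for simple totals like this one, is to add up whole cents as integers and only format the result at the end. The sketch below is an assumption-laden illustration, not part of the DataTables code above: `toCents`, `sumCents` and `formatCents` are hypothetical helpers, and they assume non-negative amounts typed with a leading digit and at most two decimal places.

```javascript
// Convert an entered amount such as "12.34" to whole cents.
// Returns null for blank or malformed input.
function toCents(value) {
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(String(value).trim());
  if (!match) return null;
  const whole = parseInt(match[1], 10);
  // "5" after the decimal point means 50 cents, so pad on the right.
  const fraction = parseInt((match[2] || "").padEnd(2, "0"), 10);
  return whole * 100 + fraction;
}

// Add up every valid amount, skipping blanks and malformed entries.
function sumCents(values) {
  return values.reduce(function (total, v) {
    const cents = toCents(v);
    return cents === null ? total : total + cents;
  }, 0);
}

// Format whole cents as a string with two decimal places.
function formatCents(cents) {
  const whole = Math.floor(cents / 100);
  const fraction = String(cents % 100).padStart(2, "0");
  return whole + "." + fraction;
}

console.log(0.1 + 0.2); // 0.30000000000000004 (floating point)
console.log(formatCents(sumCents(["0.10", "0.20"]))); // 0.30
console.log(formatCents(sumCents(["0.10", "0.20", "", "19.99"]))); // 20.29
```

Inside `doSum()` the values read from the inputs would be passed through `toCents` instead of `parseFloat`.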
Final note: I see this question was down-voted. I think that may have been because you only link to your demo code, instead of showing the relevant parts in the question itself. The link to the demo is useful - but showing code in the question itself is generally a "must-do", I think.

How to create an array (or one long string) of hyperlinks in google docs using google script

Or: what does a Google Docs hyperlink look like in raw format?
I tried the following:
var links;
var nameArr = ["1", "2", "3", "4", "5", "6"];
var tempArr = ["11", "12", "13", "14", "15", "16"];
for (i = 0; i < nameArr.length; i++) {
  nameArr[i].setLinkUrl("https://en.wikipedia.org/wiki/" + tempArr[i] + "/detection");
  links = links + ", " + nameArr[i];
}
I get an error, as I can't use setLinkUrl on a string, only on a Text object, and I didn't find a way to cast a string into Text.
Also, if I paste it as is, the "http..." shows as a regular string, not a link.
I want to get something like this:
1, 2, 3 ... as links, and paste it into a Google Docs document.
Links are "rich" features of the associated element (usually Text). So to add a link to generic text, first you must get the associated Text element, and then invoke setLinkUrl on it.
As with other rich format methods, appended elements inherit the formatting specification of the preceding sibling element. Thus, if you format the last element of a parent, the next element appended to the parent will likely also be formatted in that manner. I explicitly specify a nullstring URL for the separator text to avoid the link extending beyond the actual display text. (This means that if you programmatically append to the Paragraph after calling this function, that appended text will have the same URL as the last display text from your array.)
This simple function takes a Paragraph as input, along with the array of display text and the URIs, and adds them to the end.
/**
 * Create links at the end of the given paragraph with the given text and the given urls.
 * @param {GoogleAppsScript.Document.Paragraph} pg The paragraph to hold the link array
 * @param {string[]} values The display text associated with the given links
 * @param {string[]} links The URI for the given link text
 * @param {string} [separator] text that should separate the given links. Default is comma + space, `", "`
 * @returns {GoogleAppsScript.Document.Paragraph} the input paragraph, for chaining
 */
function appendLinkArray(pg, values, links, separator) {
  if (!pg || !values || !links)
    return;
  if (!values.length || !links.length || values.length > links.length)
    throw new Error("Bad input arguments");
  if (separator === undefined)
    separator = ", ";
  // Add a space before the link array if there isn't one at the end of any existing text.
  if (pg.getText() && (!pg.getText().match(/ $/) || !pg.getText().match(/ $/).length))
    pg.appendText(" ").setLinkUrl("");
  // Add each link display text as a new `Text` object, and set its link url.
  links.forEach(function (url, i) {
    var text = values[i] || url;
    pg.appendText(text)
      .setLinkUrl(0, text.length - 1, url);
    if (separator && i < links.length - 1)
      pg.appendText(separator).setLinkUrl("");
  });
  return pg;
}

Search Array of Strings with Non-sensitivity and Non-exact Match

Notice: I have made a few changes to the original question as my problem was not with commas within string.
I have a function I've been working on to exclude a cell value from a new array that contains a string I am searching for. I am doing this in order to put together a list for .setHiddenValues, since .setVisibleValues is not supported/implemented yet.
Here are my requirements for the sake of clarity:
Currently working:
Able to handle numbers as well as strings
Can search for lowercase and uppercase. visibleValueStr is user inputted so it can't be so sensitive.
colValueArr may have strings with commas within.
Still working on:
visibleValueStr can be a single value or array.
Case sensitivity("apple" to match "Apple")
Not exact matches("apple" to match "apple and banana")
Here is the function I currently have with the above met/unmet conditions:
function getHiddenValueArray(colValueArr,visibleValueArr){
var flatUniqArr = colValueArr.map(function(e){return e[0].toString();})
.filter(function(e,i,a){
return (a.indexOf(e.toString())==i && visibleValueArr.indexOf(e.toString()) == -1);
})
return flatUniqArr;
}
Please let me know what other info I need. I will update this question as I continue to do my research in the meanwhile.
Clarification from comments:
User inputs input(s) on HTML form and the variable is passed on as visibleValueArr.
When using Logger.log(visibleValueArr).
[apple, banana]
When using Logger.log(colValueArr).
[[Apple],[apple],[apple],[apple and banana],[apple],[banana, and apple],
[apple, and banana],[orange],[orange, and banana],[kiwi],[kiwi, and orange],
[strawberry]]
So when I use:
SpreadsheetApp.newFilterCriteria().setHiddenValues(newArray).build();
newArray should be the hidden values. In this case it should be:
orange
kiwi
kiwi, and orange
strawberry
Basically anything that does not contain what visibleValueArr is.
Instead, it returns all values back, hiding them all.
When I use [Apple, Banana], the "Apple" and "Banana" values are left out of newArray as they should be, but "Apple and Banana" and "Apple, and Banana" are not.
In addition, I would also like to understand what the e,i,a in function(e,i,a) represent. I'm trying to apply .toLowerCase() in different places to see if that resolves part of my issue but I'm not sure where to do it.
Issues:
Case sensitivity("apple" to match "Apple")
Not exact matches("apple" to match "apple and banana")
Solution:
Use regex-search with case insensitivity
Modified Script:
function getHiddenValueArray(colValueArr,visibleValueArr){
  /*colValueArr = [["Apple"],["apple"],["orange"],["Apple, and Banana"]];
  visibleValueArr = ['apple','banana'];*/
  var flatUniqArr = colValueArr.map(function(e){return e[0].toString();})
    .filter(function(e,i,a){
      return (a.indexOf(e)==i && !(visibleValueArr.some(function(f){
        return e.search(new RegExp(f,'i'))+1;
      })));
    });
  //Logger.log(flatUniqArr); will log orange
  return flatUniqArr;
}
References:
String#search
Array#some
Array#filter
Array#map
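On the side question about `function(e,i,a)`: these are the three arguments that `Array#filter` passes to its callback, namely the current element, its index, and the array being filtered. The sketch below spells that out in a variant of the script. It is an illustration rather than a drop-in replacement: `getHiddenValues` and `escapeRegExp` are hypothetical names, and the escaping step is my own addition so that user input containing characters such as "." or "(" is matched literally.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function getHiddenValues(colValueArr, visibleValueArr) {
  // One case-insensitive pattern per visible value.
  const patterns = visibleValueArr.map(function (v) {
    return new RegExp(escapeRegExp(String(v)), "i");
  });
  return colValueArr
    .map(function (row) {
      return String(row[0]); // flatten [[x],[y]] to [x, y] as strings
    })
    .filter(function (e, i, a) {
      // e = current element, i = its index, a = the whole array being filtered
      const firstOccurrence = a.indexOf(e) === i; // drops duplicates
      const isVisible = patterns.some(function (p) {
        return p.test(e); // partial, case-insensitive match
      });
      return firstOccurrence && !isVisible;
    });
}

const columnValues = [
  ["Apple"], ["apple"], ["apple"], ["apple and banana"], ["apple"],
  ["banana, and apple"], ["apple, and banana"], ["orange"],
  ["orange, and banana"], ["kiwi"], ["kiwi, and orange"], ["strawberry"],
];
console.log(getHiddenValues(columnValues, ["apple", "banana"]));
// [ 'orange', 'kiwi', 'kiwi, and orange', 'strawberry' ]
```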

ActionScript3 - add thousands separator to negative values

This question relates to an animated map template which we have developed at the UK's Office for National Statistics. It has been applied to many datasets and geographies, and put to many uses, without problems. For example,
http://www.ons.gov.uk/ons/interactive/vp3-census-map/index.html
http://www.statistica.md/pageview.php?l=ro&idc=390&id=3807
The .fla calls on a supporting .as file (see below) to introduce a thousands separator (in the UK a comma, in Germany a full stop (period)), which is defined elsewhere.
However, the dataset I am currently mapping has large negative values, and it turns out that the ORIGINAL HELPER FUNCTION below does not like negative values with 3, 6, 9 or 12 (etc.) digits.
-100 to -999 for instance are rendered NaN,100 to NaN,999.
This is because such values are recognised as being 4 digits long. They are being split, the comma introduced, and the -ve sign is misunderstood.
I reckon the approach must be to use absolute values, add in the comma and then (for the negative values) add the -ve sign back in afterwards. But so far, trials of the ADAPTED HELPER FUNCTION have produced only errors. :-(
Can anyone tell me how to put the -ve sign back in, please?
Many thanks.
Bruce Mitchell
==================================================================================
//ORIGINAL HELPER FUNCTION: ACCEPTS A NUMBER AND RETURNS A STRING WITH THOUSANDS SEPARATOR ATTACHED IF NECESSARY
function addThouSep(num) {
/*
a. Acquire the number - 'myTrendValue' or 'myDataValue' - from function calcValues
b. Record it (still as a number) to data precision.
1. Turn dataORtrend into a string
2. See if there is a decimal in it.
3. If there isn't, just run the normal addThouSep.
4. If there is, run addThouSep just on the first bit of the string - then add the decimal back on again at the end.
*/
var myNum:Number = correctFPE(num); // Create number variable myNum and populate it with 'num'
// (myTrendValue or myDataValue from calcValues function) passed thru 'correctFPE'
var strNum:String = myNum+""; // Create string version of the dataORtrend number - so instead of 63, you get '63'
var myArray = strNum.split("."); // Create array representing elements of strNum, split by decimal point.
//trace(myArray.length); // How long is the array?
if (myArray.length==1) { // Integer, no decimal.
if (strNum.length < 4)//999 doesn't need a comma.
return strNum;
return addThouSep(strNum.slice(0, -3))+xmlData.thouSep+strNum.slice(-3);
}
else { // Float, with decimal
if (myArray[0].length < 4)//999 doesn't need a comma
return strNum;
return (addThouSep(myArray[0].slice(0, -3))+xmlData.thouSep+myArray[0].slice(-3)+"."+myArray[1]);
}
}
==================================================================================
//ADAPTED HELPER FUNCTION: ACCEPTS A NUMBER AND RETURNS A STRING WITH THOUSANDS SEPARATOR ATTACHED IF NECESSARY
function addThouSep(num) {
/*
a. Acquire the number - 'myTrendValue' or 'myDataValue' - from function calcValues
b. Record it (still as a number) to data precision.
1. Turn dataORtrend into a string
2. See if there is a decimal in it.
3. If there isn't, just run the normal addThouSep.
4. If there is, run addThouSep just on the first bit of the string - then add the decimal back on again at the end.
*/
var myNum:Number = correctFPE(num); // Create number variable myNum and populate it with 'num'
// (myTrendValue or myDataValue from calcValues function) passed thru 'correctFPE'
var myAbsNum:Number = Math.abs(myNum); // ABSOLUTE value of myNum
var strNum:String = myAbsNum+""; // Create string version of the dataORtrend number - so instead of 63, you get '63'
var myArray = strNum.split("."); // Create array representing elements of strNum, split by decimal point.
//trace(myArray.length); // How long is the array?
if (myNum <0){ // negatives
if (myArray.length==1) { // Integer, no decimal.
if (strNum.length < 4)//999 doesn't need a comma.
return strNum;
return addThouSep(strNum.slice(0, -3))+xmlData.thouSep+strNum.slice(-3);
}
else { // Float, with decimal
if (myArray[0].length < 4)//999 doesn't need a comma
return strNum;
return (addThouSep(myArray[0].slice(0, -3))+xmlData.thouSep+myArray[0].slice(-3)+"."+myArray[1]);
}
}
else // positive
if (myArray.length==1) { // Integer, no decimal.
if (strNum.length < 4)//999 doesn't need a comma.
return strNum;
return addThouSep(strNum.slice(0, -3))+xmlData.thouSep+strNum.slice(-3);
}
else { // Float, with decimal
if (myArray[0].length < 4)//999 doesn't need a comma
return strNum;
return (addThouSep(myArray[0].slice(0, -3))+xmlData.thouSep+myArray[0].slice(-3)+"."+myArray[1]);
}
}
==================================================================================
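The approach described in the question (work on the absolute value, insert the separators, then put the minus sign back) can be sketched as follows. Note the assumptions: this is JavaScript rather than ActionScript 3, `addThousandsSeparator` is a hypothetical name, the separator is passed in as a parameter instead of being read from `xmlData.thouSep`, and the `correctFPE` step is left out.

```javascript
// Format a number with a thousands separator, handling negatives
// by formatting the absolute value and re-attaching the sign.
function addThousandsSeparator(num, separator) {
  const sign = num < 0 ? "-" : "";
  // Split the absolute value into integer and decimal parts.
  const parts = String(Math.abs(num)).split(".");
  let whole = parts[0];
  let grouped = "";
  // Peel off groups of three digits from the right.
  while (whole.length > 3) {
    grouped = separator + whole.slice(-3) + grouped;
    whole = whole.slice(0, -3);
  }
  grouped = whole + grouped;
  const fraction = parts.length > 1 ? "." + parts[1] : "";
  return sign + grouped + fraction;
}

console.log(addThousandsSeparator(-100, ",")); // -100
console.log(addThousandsSeparator(-999, ",")); // -999
console.log(addThousandsSeparator(-85600, ",")); // -85,600
console.log(addThousandsSeparator(1234567.89, ",")); // 1,234,567.89
```

Because the sign is removed before any digits are counted, values from -100 to -999 are treated as three digits and no longer pick up a separator. Very large or very small numbers that JavaScript prints in exponent notation are not handled by this sketch.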
If you're adding commas often (or need to support numbers with decimals) then you may want a highly optimized utility function and go with straightforward string manipulation:
public static function commaify( input:Number ):String
{
var split:Array = input.toString().split( '.' ),
front:String = split[0],
back:String = ( split.length > 1 ) ? "." + split[1] : null,
pos:int = input < 0 ? 2 : 1,
commas:int = Math.floor( (front.length - pos) / 3 ),
i:int = 1;
for ( ; i <= commas; i++ )
{
pos = front.length - (3 * i + i - 1);
front = front.slice( 0, pos ) + "," + front.slice( pos );
}
if ( back )
return front + back;
else
return front;
}
While less elegant, it's stable and performant; you can find a comparison suite in my answer to a similar question: https://stackoverflow.com/a/13410560/934195
Why not use something simple like this function I've made?
function numberFormat(input:Number):String
{
var base:String = input.toString();
base = base.split("").reverse().join("");
base = base.replace(/\d{3}(?=\d)/g, "$&,");
return base.split("").reverse().join("");
}
Tests:
trace( numberFormat(-100) ); // -100
trace( numberFormat(5000) ); // 5,000
trace( numberFormat(-85600) ); // -85,600
Explanation:
Convert the input number to a string.
Reverse it.
Use .replace() to find all occurrences of three digits followed by another digit. We use "$&," as the replacement, which means: replace each occurrence with the text that was matched ($&), plus a comma.
Reverse the string again and return it.
Did you try using the built-in number formatting options that support localized number values?
See: Localized Formatting with NumberFormatter