String replacement in OCR'd purchase receipts - google-apps-script

What I have is an OCR'd Walmart receipt in a Google Document (Walmart allows you to email to yourself a .jpg version of your receipt, and this image can be opened with Google Docs, during which it applies OCR to extract text. The result is excellent with very few errors.)
Here is a link to the actual receipt from the OCR:
https://docs.google.com/document/d/1zSV09UGajna4DPtrHUrB6F82NugpYjaftMjomoKVXpE/edit?usp=sharing
I have OCR'd hundreds of Walmart receipts using Google Docs. The OCR'd document suffers from some formatting issues, so I have written some code to help regularize it, as a part of a larger goal to get all of my receipts into a database.
While I am able to solve many of the formatting replacements, I am stuck trying to replace the percent sign that comes after "TAX 1" and "TAX 2" with a tab character, so that I can then push down the "TAX 2" entry to a new line. I noticed that the % sign will always be followed by a newline character and then the actual numerical value of the tax (for both "TAX 1" and "TAX 2") on the next line:
Example OCR Text:
SUBTOTAL 126.61 TAX 1 6.750 %
7.78 TAX 2 2.000 %
0.23 TOTAL 134.62
Desired Output Text:
SUBTOTAL 126.61
TAX 1 6.750 % 7.78
TAX 2 2.000 % 0.23
TOTAL 134.62
Objective:
Each (SUBTOTAL, TAX 1, TAX 2, and TOTAL) gets a new line. (this works)
There should be a tab after each (SUBTOTAL, TAX 1, TAX 2, and TOTAL) so that the numeric value for each is a tab-stop away. (this works)
I would like to replace the (space + percent sign + newline character) with just a percent sign and a tab, thinking the 7.78 should "rise" one line up as the newline character is taken out. (This is what is failing.)
I can do this using the CTRL-F "Find and Replace" menu in the Google Docs UI, using regex options without any problem, but I can't write an Apps Script function to do the same. I have searched everywhere. I realize that the RegEx in GAS is limited, but I don't know enough to tell whether that is my problem and what a workaround could be. Likewise, I don't know enough RegEx to rule out that I am simply overlooking something.
Here's the code excerpt I use for formatting:
var body = DocumentApp.getActiveDocument().getBody();
/**
* other formatting stuff
*/
//Find SUBTOTAL, remove the space before SUBTOTAL and move it down one line.
body.replaceText(' SUBTOTAL', '\n\nSUBTOTAL\t');
//Find TAX 1, remove the space before TAX 1 and move it down one line.
body.replaceText(' TAX 1', '\nTAX 1\t');
//Find TAX 2, remove the space before TAX 2 and move it down one line.
body.replaceText(' TAX 2', '\nTAX 2\t');
//Find TOTAL, replace it.
body.replaceText('TOTAL', '\nTOTAL\t');
//Find PERCENT SIGN AND ADD A NEWLINE AFTER IT, replace it all with a tab character.
body.replaceText("[ %\n]","\t");
The first 4 replaces work great. It's the last one (the percent sign) that doesn't work. I've tried to escape that percent sign like this:
body.replaceText("[ \%\n]","\t");
and
body.replaceText("[ \\%\n]","\t");
I've tried to remove the braces like this:
body.replaceText(" \%\n","\t");
and
body.replaceText(" \\%\n","\t");
But each gives different results, frankly - messing up the entire receipt text badly.
So the percent sign is the problem - I think.
How can I fix the formatting for the "TAX 1" and "TAX 2" lines?
Example fulltext OCR'd receipt: https://docs.google.com/document/d/1zSV09UGajna4DPtrHUrB6F82NugpYjaftMjomoKVXpE/edit?usp=sharing

You want to replace the text of the shared Document with the values you want. Those values can be retrieved by the script in my comment. If my understanding is correct, how about this?
The sample script in my comment retrieves the whole text and converts it to the values you want. To apply that result to the Document, how about the sample script below? To use it, please do the following.
Open the shared document.
Open script editor.
Copy and paste the sample script to the script editor.
Run myFunction().
Authorize the scopes.
See the Document.
Sample script :
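function myFunction() {
  var body = DocumentApp.getActiveDocument().getBody();
  var newText = body.getText()
    // Keep only the block from SUBTOTAL up to (not including) VISA.
    .match(/(SUBTOTAL[\s\S]+?)VISA/)[1]
    // Start each TAX entry on its own line.
    .replace(/TAX/g, "\nTAX")
    // Start TOTAL on its own line, followed by a tab.
    .replace(/ TOTAL/g, "\nTOTAL\t")
    // Pull the tax amount up onto the line with its percent sign.
    .replace(/%\n/g, "%\t");
  // Replace the whole document body with the reformatted block.
  body.clear();
  body.setText(newText);
}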
If this was not what you want, I'm sorry.
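One possible reason the body.replaceText attempts in the question fail is that the pattern has to match across a line break, which replaceText does not appear to handle; working on the plain string from body.getText() avoids that. As a minimal sketch of that string-only step, the hypothetical helper formatTotals below (not part of the original script) reshapes just the totals block, assuming the OCR text looks like the example in the question:

```javascript
// Hypothetical helper: reshapes the totals block of the OCR'd receipt text.
// Assumes the layout shown in the question's "Example OCR Text".
function formatTotals(text) {
  return text
    // Join each tax amount to the line that ends with its percent sign.
    .replace(/%\n/g, "%\t")
    // Put TAX n and TOTAL on their own lines, each followed by a tab.
    // The leading space keeps this from matching the TOTAL inside SUBTOTAL.
    .replace(/ (TAX \d|TOTAL) /g, "\n$1\t")
    // Separate SUBTOTAL from its amount with a tab.
    .replace(/SUBTOTAL /g, "SUBTOTAL\t");
}

var sample = "SUBTOTAL 126.61 TAX 1 6.750 %\n7.78 TAX 2 2.000 %\n0.23 TOTAL 134.62";
console.log(formatTotals(sample));
```

In Apps Script the result could then be written back with body.setText(...), keeping in mind that setText replaces the existing body contents with plain text.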


Google Sheets - Combine multiple IF Functions into one cell

I'm trying to produce a SKU in Google Sheets for a product using the values of three variants (Title, Colour and Size)
The product is 'Lightweight trainers' with colour variants of 'Red' and 'Blue', and the sizes range from 5 - 12.
Link to spreadsheet
https://docs.google.com/spreadsheets/d/1trq0X3MjR-n2THFnT8gYYlwKscnQavCeeZ8L-ifYaHw/edit?usp=sharing
Aim
I'm hoping to have a SKU that displays the product, the colour variant and the shoes size.
Example: LW-1-8 (Lightweight trainer, colour Red, size 8)
Product is Lightweight Trainers with a value of LW.
Colour variant 'Red' with a value of 1 and 'Blue' with a value of 2.
Shoe size variant = number ranging from 5 to 12.
Here's what I have so far, joining the colour and size variants.
=IFS(I2="Red",1,I2="Blue",2)&"-"& IFS(K2="5",5,K2="6",6,K2="7",7,K2="8",8,K2="9",9,K2="10",10,K2="11",11,K2="12",12)
However, I'm getting stuck in joining the data in column B with this function.
Any help with combining this data from multiple cells into one would be greatly appreciated.
TL;DR
=ARRAYFORMULA(IF(B2:B<>"", IFS(B2:B="Lightweight Trainers", "LW")&"-"&IFS(I2:I="Red", 1, I2:I="Blue", 2)&"-"&K2:K,))
Answer
What you want is basically:
<title>-<color number>-<shoe size>
To convert this to a function we can split it into each part and take it step by step:
Step 1: Title
For the first part -the title- we need to match the value with the shorthand. A simple list in an IFS is enough.
IFS(B2="Lightweight Trainers", "LW")
Obviously for now it only has a single value (Lightweight Trainers) but you could add more:
IFS(B2="Lightweight Trainers", "LW", B2="Heavyweight Trainers", "HW")
Step 2: color number
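Similar to the previous step, it’s a mapping using IFS:
IFS(I2="Red", "-1", I2="Blue", "-2")
Here the dash is part of each value, so it only appears when a colour is matched. (The combined formula in Step 4 uses plain numbers and adds the dashes between the parts instead.)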
Step 3: shoe size
In this case we can simply get the value:
K2
Step 4: Adding everything together
We only need to add it with the dashes in between:
=IFS(B2="Lightweight Trainers", "LW")&"-"&IFS(I2="Red", 1, I2="Blue", 2)&"-"&K2
Step 5: Extending for the entire column automatically
We will use ARRAYFORMULA to add a single formula to the first cell and get it automatically extended to the entire column. We first add it to the formula we already have, and then extend the ranges to the entire column:
=ARRAYFORMULA(IFS(B2:B="Lightweight Trainers", "LW")&"-"&IFS(I2:I="Red", 1, I2:I="Blue", 2)&"-"&K2:K)
Remember to remove all the values in the column so array formula doesn’t override them (it would generate an error).
As you can see, the formula generates errors for the rows that have no values. A good way of handling this is to skip the rows without a title. For a single row, that would be:
=IF(B2<>"", [the entire formula],)
Notice the last comma: it leaves the value-if-false argument empty, so rows without a title stay blank.
So putting everything together and extending its range to the column, is:
=ARRAYFORMULA(IF(B2:B<>"", IFS(B2:B="Lightweight Trainers", "LW")&"-"&IFS(I2:I="Red", 1, I2:I="Blue", 2)&"-"&K2:K,))
Adding this to N2 should work.
Final notes
It seems that you use 150 when the size is not a whole number. If you want to keep that functionality you may use:
IF(K2-int(K2)=0, K2, 150)
On the last component and expand it the same way.
You may also want to prevent having two dashes when a value is missing (LW-5 instead of LW--5). To do so, I’d recommend adding it to each component instead of the formula that adds them together.
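The same mapping can also be sketched as a script function, for instance as a custom function in Apps Script. This is only an illustration: buildSku and the two lookup tables are hypothetical names, the colour numbers follow the question (Red = 1, Blue = 2), and empty parts are dropped so that no double dash appears:

```javascript
// Hypothetical lookup tables; extend them as products and colours are added.
var TITLE_CODES = { "Lightweight Trainers": "LW" };
var COLOUR_CODES = { "Red": 1, "Blue": 2 };

// Returns the code for a key, or "" when the key is not in the table.
function lookup(table, key) {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : "";
}

// Builds <title>-<colour number>-<shoe size>, skipping any missing part.
function buildSku(title, colour, size) {
  var parts = [lookup(TITLE_CODES, title), lookup(COLOUR_CODES, colour), size];
  return parts
    .filter(function (part) {
      return part !== "" && part !== null && part !== undefined;
    })
    .join("-");
}

console.log(buildSku("Lightweight Trainers", "Red", 8));
```

Filtering before joining is one way to apply the advice above about dashes: a row with no colour yields LW-8 rather than LW--8.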
References
IFS (Docs Editors Help)
IF (Docs Editors Help)
ARRAYFORMULA (Docs Editors Help)
try in N2:
=IFS(I2="Red",1,I2="Blue",2)&"-"&
IFS(K2=5,5,K2=6,6,K2=7,7,K2=8,8,K2=9,9,K2=10,10,K2=11,11,K2=12,12)
or use:
=IF(I2="red", 1, IF(I2="blue", 2, ))&IF((K2>=5)*(K2<=12), "-"&K2, )

Customize the vAxis values in a Google Sheets Line Graph

I'm making a Sheet to be used by elementary students, to track some energy "usage" in their class, based on a date. To make it dead easy, I've created dropdowns for their choices (text).
In order to make a graph, I've changed the choices into numbers ("All"=3, "Some"=2, "None"=1, "N/A"=0) onto another tab (using Apps Script). This makes a nice graph, but the vertical axis of course shows the numbers. I'm hoping there is a way to swap them out for the text.
I've tried the 'ticks' option, but nothing changes:
var vAxisOptions = {
0: {
ticks: [{v:0, f:'N/A'}, {v:1, f:'None'}, {v:2, f:'Some'}, {v:3, f:'All'}, {v:4, f:''}],
maxValue: 4,
gridlines: {count: 5} //add an extra line of space to see the lines better
}
};
And then apply it by .setOption('vAxes', vAxisOptions).
I suspect this just isn't possible, but is it? Thanks!!
Example: https://docs.google.com/spreadsheets/d/1zOeXJy92LdCmhdLW6MmLA0JCNVk34kk50B3tQGACwlY/edit?usp=sharing
p.s. Click the "View Results" button to make the graph if you make data changes
There is an open issue on the Google Issue Tracker for this, and it is being worked on.
You can go there and click the star next to the issue's title to receive updates on it.

Trying to have + on positive and - on negative

This is the code I'm trying to make work...
"=IF (C11>0, "+",OTHERWISE "")
Trying to make it so that if the cell next to it, C11, is over 0, it gets a "+" sign; otherwise, if it's a negative number (e.g. -5), nothing shows up, since the number already carries its minus sign.
This is my workaround to having to insert the apostrophe in front of the + every time I want it to show up outside of a formula interaction.
Select every cell that you want to format (use ctrl-a to select all cells).
In the Format menu mouse-over Number, then mouse-over More Formats and select Custom Number Format. In the box presented to you type +0.00;-0.00.
Press Apply. This will format your numbers as desired.
See the Google Sheets documentation for more details.

Way to add a calculated number of X's to a form input?

I have certain product codes with a varying number of letters/digits, e.g. 53HD6J, HH88WBD3 (anywhere from 5 to 10 letters/digits). For our barcode to scan these correctly, there have to be 13 letters/digits. I don't want to make the user input -XXXX after each code, but rather have Access calculate the difference between 13 and the length of the code and fill the remainder with X's. Is this possible with either VBA or an expression?
I currently am using about 6 IIFs in one formula to fill remaining blanks with X's but hoping there is an easier way.
I have a form to enter in the batch number (product code). Once that form is submitted it links to a report that is printed. On the report are those batch numbers (53HD6J, HH88WBD3). The spot I want to have this feature is in a text box right next to the codes where Access determines the length of the codes and computes the remaining X's to add. This is in barcode font so this text box is where the 53HD6JXXXXXXX would go. Hope that clears it up!
So I have that part figured out. My problem now is my barcode font reads the text no matter what and translates it still so barcode shows up when the batch number is blank (I have four spots for batch codes to be inputted). So what I had before was =IIf([Text31]="",""&[Text31]&"","") which seemed to work. Hopefully I can continue this with the new formula. If that's unclear let me know.
**(The "" & & "" is so the barcode can be scanned).
My formula was wrong right above with the IIf. I figured it out! Forgot I had used ' Like "*" '. Thanks!
You can do what you want with String() and Left().
Here is an example from the Access Immediate window:
product_code = "53HD6J"
? product_code & String(13, "X")
53HD6JXXXXXXXXXXXXX
? Left(product_code & String(13, "X"), 13)
53HD6JXXXXXXX
Based on the update to your question, I think you can use that approach for the Control Source of a text box where you want to display the "expanded" product code.
Pretend your report has a text box named txtProduct_code where the raw product code, such as 53HD6J, is displayed. And there is a second text box where you want to display that value with the required number of X characters (53HD6JXXXXXXX).
Use this as the Control Source property of that second text box:
= Left([txtProduct_code] & String(13, "X"), 13)
Alternatively, you could make it a field expression in the report's Record Source query.
SELECT
product_code,
Left(product_code & String(13, "X"), 13) AS expanded_product_code
FROM YourTable;
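For comparison, the pad-then-truncate idea behind Left(... & String(13, "X"), 13) can be sketched outside Access as well. The JavaScript function below is purely illustrative (padCode is a hypothetical name), and it returns an empty string for a blank code on the assumption that no barcode should be shown in that case:

```javascript
// Hypothetical helper: pads a product code with a filler character up to a
// fixed width, mirroring Left(code & String(width, filler), width).
function padCode(code, width, filler) {
  // Assumption: a blank batch number should produce no barcode text at all.
  if (!code) {
    return "";
  }
  // Append a full run of filler characters, then keep the first `width`.
  return (code + filler.repeat(width)).slice(0, width);
}

console.log(padCode("53HD6J", 13, "X"));
```

Appending a full-width run of filler and truncating avoids computing the difference between 13 and the code length explicitly.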

Google Apps Script Chart Use row 1 as headers

I have the following chart:
var ss = ... // get sheet
var chart = ss.newChart().asLineChart();
chart.setTitle("My Title");
chart.addRange(dataSheet.getRange(1, 1, ss.getLastRow(), 4));
chart.setColors(["#3c78d8", "#a61c00", "#38761d"]);
chart.setXAxisTitle("X Title");
chart.setYAxisTitle("Y Title");
chart.setPosition(1, 1, 0, 0);
chart.setCurveStyle(Charts.CurveStyle.SMOOTH);
ss.insertChart(chart.build());
This code will show the first row as part of the data, instead of using it to label the legend. As a side note, if I use asColumnChart instead of a line chart, it does the right thing.
How can I tell a chart to use the first row as headers using Google Apps Script?
First off - your code is OK. I took a copy and made minor changes to it just to use a single sheet for illustration:
The problem is likely with your data, specifically the first row. Here's that same example, with some of the "Headers" changed from text into numbers. I believe it is behaving in the bad way you're describing.
One more time, but with the numbers in the header forced to be text. (In the spreadsheet, I input them as '4 and '5 - the leading single quote tells Spreadsheet to treat as text.)
Just take a look at the data types in your first row, and make sure that they are Text.
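A quick way to check this from a script is to look at the JavaScript types of the first row. The sketch below assumes the header row has already been read into an array (for example, the first row of what getValues() returns, where numeric cells arrive as numbers); findNonTextHeaders is a hypothetical helper name:

```javascript
// Hypothetical helper: returns the positions of header cells that are not text.
// headerRow is assumed to be one row of cell values, e.g. ["Date", 4, 5].
function findNonTextHeaders(headerRow) {
  var positions = [];
  headerRow.forEach(function (value, index) {
    // Numbers, booleans and dates are not strings, so they get flagged here.
    if (typeof value !== "string") {
      positions.push(index);
    }
  });
  return positions;
}

console.log(findNonTextHeaders(["Date", 4, 5, "Total"]));
```

An empty result suggests every header is already text; any listed position points at a cell that could be re-entered with a leading single quote, as described above.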