Anchor Base not working in UiPath for PDF extraction - OCR

I want to extract certain text and numbers from a PDF invoice, one of which is the total amount. The problem is that the position of the total amount changes from PDF to PDF depending on how many items there are. If there are many items, the total amount field is lower on the page; if there are fewer items, it is higher up. See the image below for reference: there are only 2 items in that invoice, so the total field is higher up. But I also have invoices with 15 items, where the total field is either lower on the page or on the next page.
How do I extract it, then? I tried using Anchor Base but it is not working!
This is what I have done so far:
1.) Use a for each loop to open every PDF in the folder one by one.
2.) For each PDF, send a hotkey that fits one full page to the window.
3.) Use Anchor Base (the "Total" label in the image below is the anchor and the amount is the value to be extracted).
4.) Use a message box to display the value.
5.) Close the PDF.

Two potential solutions.
Use UiPath Document Understanding
The Community License includes a certain amount of Document Understanding (DU) data. With it you can set up templates and use anchor-based extraction, token selection, custom area selectors, etc.
Read Lines Approach
Convert the PDF to text (for example with the Read PDF Text activity, or Read PDF With OCR for scanned invoices).
Look through the extracted text and find a phrase or keyword you could use as your anchor. Going by your example, you might use "Total: ".
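Then use Invoke Code (I'll use C# for the example below).
Arguments: in_text (String, In: the text extracted from the PDF) | out_totalAmount (String, Out)
Code:
// in_text holds the text itself, not a file path, so split it into lines instead of calling File.ReadLines
var totalLine = in_text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None).Last(l => l.TrimStart().StartsWith("Total: "));
// Keep what follows the last colon and trim the space, e.g. "Total: 1,234.50" -> "1,234.50"
out_totalAmount = totalLine.Split(':').Last().Trim();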
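To test the "last line starting with Total:" idea against sample text before wiring it into UiPath, here is a minimal Python sketch of the same logic. It is not UiPath code; it assumes the extracted text puts the label and the amount on one line as "Total: <amount>" with no currency symbol, and extract_total is a hypothetical helper name. Because it searches the text rather than screen positions, it does not matter how many items push the total down or onto the next page.

```python
from decimal import Decimal
from typing import Optional


def extract_total(pdf_text: str, label: str = "Total:") -> Optional[Decimal]:
    """Return the amount on the last line that starts with `label`, or None.

    Assumes (hypothetically) lines such as "Total: 1,234.50" in the text.
    """
    matches = [line.strip() for line in pdf_text.splitlines()
               if line.strip().startswith(label)]
    if not matches:
        return None  # label not found, e.g. OCR misread it
    # Keep only what follows the label and drop thousands separators.
    amount = matches[-1][len(label):].strip().replace(",", "")
    return Decimal(amount)


two_items = "Item A 10.00\nItem B 5.50\nTotal: 15.50\n"
# 15 item lines, then a form feed (page break), then the total on page 2.
many_items = "\n".join(f"Item {i} 1.00" for i in range(15)) + "\n\f\nTotal: 15.00"
print(extract_total(two_items))   # 15.50
print(extract_total(many_items))  # 15.00
```

Taking the last match mirrors the Last() call in the C# snippet; a "Subtotal:" line is not picked up because it does not start with the label.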


SharePoint (lists) - Multiline clamping (display only the first line)

I have a column that contains multiline data, and I am trying to keep only the first line visible and hide the rest, so that the list displays nicely with minimal gaps between rows. Ideally I'd do this with a JSON column formatting script. I tried limiting the column to a certain number of characters, which works, but then when you open the record, the multiline field no longer shows its full contents, only the characters we limited the display to in the first place.
I was thinking about putting the first line in a separate single-line text column and the rest in another column that I can just hide. But when I hide that column, my Power Automate flow fails because it detects that the column is not available (...). If I can get that to work, my problem is solved.
Appreciate everyone's input.
There is no such function to hide part of a multiline value; we can only show or hide the whole value in the column. By default, SharePoint displays the first 5 lines and hides the rest.
Elaborating on the comment: Power Automate does not have to work with the view that the user sees in the SharePoint browser interface. You can create a new view in SharePoint and include only the columns you need for the workflow. Give the view a nice, descriptive name.
Then, in Power Automate, use the Get items action and, under Advanced options > Limit Columns by View, specify which view to use when returning the list items. In the screenshot, I'm using a view I called wfView. This returns the columns of that view, plus some of SharePoint's default fluff.
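The split the question proposes (first line in a visible single-line column, the rest in a column the flow can still read through its own view) is simple string handling. Here is a minimal Python sketch of that logic; Python only stands in for whatever the Power Automate flow would do, and split_first_line is a hypothetical helper name. It truncates the first line to 255 characters because that is the limit of a SharePoint single line of text column.

```python
from typing import Tuple

SINGLE_LINE_MAX = 255  # character limit of a SharePoint single line of text column


def split_first_line(value: str) -> Tuple[str, str]:
    """Split a multiline value into (first line, remaining lines).

    The first part is meant for the visible single-line column and the
    remainder for the hidden column. Windows and Unix line breaks both work.
    """
    lines = value.splitlines()
    if not lines:
        return "", ""
    return lines[0][:SINGLE_LINE_MAX], "\n".join(lines[1:])


first, rest = split_first_line("Line one\r\nLine two\nLine three")
print(first)  # Line one
print(rest)   # prints "Line two" and "Line three" on separate lines
```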

SSRS Tablix Group Reset Page Number and Page Name not working when Exported to Word

I have an SSRS (.rdl) report with a tablix whose details group is set to put a page break between group items, reset the page number and set the page name (as per http://blogs.msdn.com/b/robertbruckner/archive/2010/04/25/report-design-reset-page-number-on-group.aspx).
This works correctly when rendered to HTML or as a PDF.
When rendered to Word the page numbers do not reset and the page name never changes (the page name is always the value set on the first page). The page breaks work as expected.
I have read (at https://msdn.microsoft.com/en-us/library/dd283105.aspx#ReportHeadersFooters) that complex expressions must be converted into runs of simple expressions in order to display correctly when exporting to Word. I have done this but the problem persists.
Is there any way to make the tablix group reset page number and page name functionality work when exporting to Word?
If not is there a way of achieving the same effect when exporting a report to Word from SSRS?
This is an older question, but I recently ran into this issue myself, so this might help someone else.
Many examples out there use casting for the page number display (CStr or ToString()). Whenever I used those methods, the counts were off when exporting to Word (either .doc or .docx).
The only way I could get it to work is with three separate text boxes in the footer with these expressions:
=Globals!PageNumber
"of"
=Globals!TotalPages
Avoid those other approaches. Three separate text boxes were the only way I could get this to work.
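To check a Word export against the HTML or PDF rendering, it helps to know which "N of M" label each physical page should carry when numbering resets on every group. Here is a minimal Python model of that expected numbering, assuming each group's page count is known; the group sizes below are made up for illustration, and page_labels is a hypothetical helper. It models the intended output, not the SSRS renderer itself.

```python
from typing import List


def page_labels(pages_per_group: List[int], reset_per_group: bool = True) -> List[str]:
    """Return the "N of M" footer label expected on each physical page."""
    total_pages = sum(pages_per_group)
    labels = []
    overall = 0  # running page number across the whole report
    for group_pages in pages_per_group:
        for n in range(1, group_pages + 1):
            overall += 1
            if reset_per_group:
                # Numbering restarts at 1 and the total is the group's own page count.
                labels.append(f"{n} of {group_pages}")
            else:
                labels.append(f"{overall} of {total_pages}")
    return labels


print(page_labels([1, 2, 1]))
# ['1 of 1', '1 of 2', '2 of 2', '1 of 1']
print(page_labels([1, 2, 1], reset_per_group=False))
# ['1 of 4', '2 of 4', '3 of 4', '4 of 4']
```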

Export SWF image to PDF or PNG

I made a simple certificate maker application using Adobe Flash, with ActionScript for the calculations. It has 2 input frames. Frame 1 takes basic information from the user, e.g. name, address, phone number, etc.; the user then clicks the Next button to go to frame 2, which takes numerical data such as marks in several subjects for the first, second and third term examinations. Pressing Next again takes the user to frame 3, which shows the final certificate after calculating grades and cumulative percentages from the numerical data. Since this is an SWF movie, I right-click the final result and print it. There is no mechanism to save it (read-only) for future re-printing or viewing. I was thinking of adding a Save button that converts the final output, with a background image and some dynamic fields, to PDF or PNG. How can I do it?
Use AlivePDF, you can find it here:
http://alivepdf.bytearray.org/
Example code showing how to use it is here:
http://snipplr.com/view/45819/

Some of the page numbers are repeated when exporting to word

I have a report with 50 pages. It has one list control, which contains a table control. I am grouping in the list with a page break at the end, because I want each group on its own page. However, when a group has a bit more data than fits on one page, it moves onto a second page.
The problem is that where one group's data spans more than one page, some of the pages get the same page number. I am using this expression in the footer:
format(Globals!PageNumber & "of" & Globals!TotalPages)
The report has 50 pages, but after exporting to Word the total shows 45, because 5 or 6 page numbers are repeated, giving "1 of 45" instead of "1 of 50".
Note that I am using SSRS 2005.
This is an older question, but I recently ran into this issue myself, so this might help someone else.
Many examples out there use casting for the page number display (CStr or ToString()). Whenever I used those methods, the counts were off when exporting to Word (either .doc or .docx).
The only way I could get it to work is with three separate text boxes in the footer with these expressions:
=Globals!PageNumber
"of"
=Globals!TotalPages
Avoid those other approaches (including the wrapping Format function; I don't think it's doing anything). Three separate text boxes were the only way I could get this to work.

Mimicking Spreadsheet Style in a MS-Access Report

I've been tasked with creating a report in MS-Access that looks exactly like a spreadsheet that a vendor supplies to us for my company to fill in.
The number of records per page is about 40 and there are usually 3-6 pages that need to be prepared. Each month there is a new report sent out and I just got finished writing it all in manually while looking at a report I generated. The purpose of this is to avoid manually transcribing the data.
They are adamant about using their format and will not accept a different report, so I'm trying to be sneaky about it.
Problems
I can duplicate the header of the spreadsheet and the rows just fine, I've just run into a few snags.
Blank rows need to be displayed on the last page of the report, instead of empty whitespace followed by the page footer.
There is whitespace between the Details section and the Page Footer. The page footer should instead look like another row of cells, except that it holds the text Page Total and the page total value.
The second item happens because the Page Footer always appears at the bottom of the page in a set location as opposed to where the records ended (even if they took up the entire page).
Ideas
If there were some way to create a group based on the page, I could place it right after the Details section, so that the page total would line up directly under the last row instead of sitting at the bottom of the page, while still displaying the page total.
Insert blank rows into the data to make up the number of rows on the page; is this possible? I could calculate how many extra rows I would need to complete the page, but how would I insert those rows into the data source?
Create a new Excel spreadsheet from a template and just write the rows there.
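The "how many extra rows" calculation from the second idea is plain arithmetic: with n records and r rows per page, the last page needs (-n) mod r blank rows. Here is a minimal Python sketch of that arithmetic together with the per-page totals; it does not touch Access, assumes a fixed 40 rows per page (the question says "about 40"), and paginate is a hypothetical helper using made-up sample data.

```python
from decimal import Decimal
from typing import List, Optional, Tuple

ROWS_PER_PAGE = 40  # assumption: the question says "about 40" records per page


def paginate(amounts: List[Decimal], rows_per_page: int = ROWS_PER_PAGE
             ) -> List[Tuple[List[Optional[Decimal]], Decimal]]:
    """Split amounts into pages, pad the last page with blank rows (None),
    and pair each page with its page total."""
    pages = []
    for start in range(0, len(amounts), rows_per_page):
        rows: List[Optional[Decimal]] = list(amounts[start:start + rows_per_page])
        blanks = rows_per_page - len(rows)  # non-zero only on the last page
        rows.extend([None] * blanks)
        page_total = sum((r for r in rows if r is not None), Decimal("0"))
        pages.append((rows, page_total))
    return pages


amounts = [Decimal("1.25")] * 95  # hypothetical data: 95 records
pages = paginate(amounts)
print(len(pages))                           # 3
print(pages[-1][0].count(None))             # 25
print([str(total) for _, total in pages])   # ['50.00', '50.00', '18.75']
```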
I'm using MS-Access 2007 here with a MS-Access 2003 MDB.
Any help is greatly appreciated.
If you need gridlines to print at the end of an Access report, one option is to create a background bitmap that you insert into the report's picture property.
This would be rather fussy, as you could use it only if your headers and footers are identical on all pages, and you'd have to be sure that controls entirely cover the whole detail area so that the background graphic will not show through except on pages where there is blank space. Also, if you altered the width of your detail fields, you'd need to edit the graphic to harmonize with those changes.
Let me just say that I consider the insistence on replicating the look of the spreadsheet to be incredibly boneheaded. What purpose is served by these gridlines except to replicate the visual appearance of a spreadsheet? Are they going to use the grid to write things in? If not, then it's just a really idiotic requirement.
Start by turning a copy of their Excel report into a template file. Remove the data, but keep headers, formatting, and formulas as needed (some data manipulation will be easier in Access).
This way you can enter and store the data in Access, then use VBA to fill in a spreadsheet based on the template file, instead of having users fill in the spreadsheet in Excel.
You'll run into different issues of how to place the results of a query to a worksheet and filling in formulas in specific fields, etc., but those can be later questions to post.