How to use name of child element in PLC tag? - osisoft

I have multiple child elements with different numbers:
ex:
001_Child1
030_Child2
109_Child3
I am trying to create an attribute within a library that will allow me to essentially take the number at the beginning of the child name and assign it to a point within a PLC tag.
Ex:
001_Child1 looks to: PLC(1).Cnt
030_Child2 looks to: PLC(30).Cnt
109_Child3 looks to PLC(109).Cnt
I have tried several Left commands to strip the 0's, but they end up removing 0's in places I don't want, e.g. 030 becomes 3 and 109 becomes 19.
I have also tried removing just the first two digits, which obviously results in 0 for 030 and 9 for 109. I am at a loss as to what to do, considering I want to stick to one attribute for the entire thing.
Does anyone know what to do/can help?

Solution I came up with:
Attribute 1:
Left(%element%, 3)
Results in 001, 030, 109 etc.
Attribute 2:
Replace('Attribute1', "0", " ")
Results in "  1" or " 3 " or "1 9"
Attribute 3:
LTrim('Attribute2')
Removes all the blanks on the left side.
Results in "1", "3 ", "1 9"
Attribute 4:
Replace('Attribute3', " ", "0")
Replaces all blank spaces with a zero
Results in "1", "30", "109"
So far so good!
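The four attribute steps above can be checked outside the attribute editor. The following is a minimal Python sketch of the same string logic only; the function name plc_index is made up for illustration and nothing here is an AF or PI function.

```python
def plc_index(element_name: str) -> str:
    """Mirror the four attribute steps on a plain string (illustration only)."""
    prefix = element_name[:3]             # Attribute 1: Left(%element%, 3)
    spaced = prefix.replace("0", " ")     # Attribute 2: every zero becomes a blank
    trimmed = spaced.lstrip(" ")          # Attribute 3: LTrim drops the leading blanks
    return trimmed.replace(" ", "0")      # Attribute 4: remaining blanks go back to zeros


for name in ("001_Child1", "030_Child2", "109_Child3"):
    print(name, "looks to: PLC(" + plc_index(name) + ").Cnt")
```

Only the leading zeros are lost because LTrim removes blanks on the left side alone, so inner and trailing zeros survive the round trip. One edge case to be aware of: a prefix of 000 ends up as an empty string.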

How to add more XPATH in parsefilter.json in stormcrawler

I am using StormCrawler (v1.16) and Elasticsearch (v7.5.0) to extract data from about 5k news websites. I have added some XPath patterns for extracting the author name in parsefilter.json.
Parsefilter.json is as shown below:
{
  "com.digitalpebble.stormcrawler.parse.ParseFilters": [
    {
      "class": "com.digitalpebble.stormcrawler.parse.filter.XPathFilter",
      "name": "XPathFilter",
      "params": {
        "canonical": "//*[@rel=\"canonical\"]/@href",
        "parse.description": [
          "//*[@name=\"description\"]/@content",
          "//*[@name=\"Description\"]/@content"
        ],
        "parse.title": [
          "//TITLE",
          "//META[@name=\"title\"]/@content"
        ],
        "parse.keywords": "//META[@name=\"keywords\"]/@content",
        "parse.datePublished": "//META[@itemprop=\"datePublished\"]/@content",
        "parse.author": [
          "//META[@itemprop=\"author\"]/@content",
          "//input[@id=\"authorname\"]/@value",
          "//META[@name=\"article:author\"]/@content",
          "//META[@name=\"author\"]/@content",
          "//META[@name=\"byline\"]/@content",
          "//META[@name=\"dc.creator\"]/@content",
          "//META[@name=\"byl\"]/@content",
          "//META[@itemprop=\"authorname\"]/@content",
          "//META[@itemprop=\"article:author\"]/@content",
          "//META[@itemprop=\"byline\"]/@content",
          "//META[@itemprop=\"dc.creator\"]/@content",
          "//META[@rel=\"authorname\"]/@content",
          "//META[@rel=\"article:author\"]/@content",
          "//META[@rel=\"byline\"]/@content",
          "//META[@rel=\"dc.creator\"]/@content",
          "//META[@rel=\"author\"]/@content",
          "//META[@id=\"authorname\"]/@content",
          "//META[@id=\"byline\"]/@content",
          "//META[@id=\"dc.creator\"]/@content",
          "//META[@id=\"author\"]/@content",
          "//META[@class=\"authorname\"]/@content",
          "//META[@class=\"article:author\"]/@content",
          "//META[@class=\"byline\"]/@content",
          "//META[@class=\"dc.creator\"]/@content",
          "//META[@class=\"author\"]/@content"
        ]
      }
    },
I have also made a change in crawler-conf.yaml, shown below.
indexer.md.mapping:
  - parse.author=author
metadata.persist:
  - author
The issue I am facing: I only get a result for the first pattern (i.e. "//META[@itemprop=\"author\"]/@content") of "parse.author". What changes should I make so that all patterns are used as input?
"What changes I should do so that all patterns can be taken as input."
I read this as "How can I make a single XPath expression that tries all the different ways an author can appear in the document?"
Simplest approach: join all the expressions you already have into a single one with the XPath union operator |:
input[...]|meta[...]|meta[...]|meta[...]
And since this potentially selects more than one node, we could state explicitly that we only care about the first match:
(input[...]|meta[...]|meta[...]|meta[...])[1]
This probably works, but it will be very long and hard to read. XPath can do better.
Your expressions are all pretty repetitive, which is a good starting point for reducing the size of the expression. For example, these two are the same, except for the attribute value:
//meta[@class='author']/@content|//meta[@class='authorname']/@content
We could use or and it would get shorter already:
//meta[@class='author' or @class='authorname']/@content
But when you have 5 or 6 potential values, it is still pretty long. Next try, a predicate for the attribute:
//meta[@class[.='author' or .='authorname']]/@content
A little shorter, as we don't need to type @class all the time. But still pretty long with 5 or 6 potential values. How about a value list and a substring search (I'm using / as a delimiter character):
//meta[contains(
  '/author/authorname/',
  concat('/', @class, '/')
)]/@content
Now we can easily expand the list of valid values, and even look at different attributes, too:
//meta[contains(
  '/author/authorname/article:author/',
  concat('/', @class|@id, '/')
)]/@content
And since we're looking for almost the same possible strings across multiple possible attributes, we could use a fixed list of values that all possible attributes are checked against:
//meta[
  contains(
    '/author/article:author/authorname/dc.creator/byline/byl/',
    concat('/', @name|@itemprop|@rel|@id|@class, '/')
  )
]/@content
Combined with the first two points, we could end up with this:
(
  //meta[
    contains(
      '/author/article:author/authorname/dc.creator/byline/byl/',
      concat('/', @name|@itemprop|@rel|@id|@class, '/')
    )
  ]/@content
  |
  //input[
    @id='authorname'
  ]/@value
)[1]
Caveat: This only works as expected when a <meta> never has both e.g. @name and @rel, or, if it does, when both have the same value. Otherwise concat('/', @name|@itemprop|@rel|@id|@class, '/') might pick the wrong one. It's a calculated risk; I think it's unusual for this to happen in HTML. But you need to decide, since you're the one who knows your input data.
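To see why the value list is wrapped in delimiters, here is a small Python sketch of the same substring test. It only models the string comparison; it does not evaluate XPath, and the names ALLOWED, is_author_value and is_author_value_naive are invented for this illustration.

```python
# The same delimited list used in the XPath expression above.
ALLOWED = "/author/article:author/authorname/dc.creator/byline/byl/"


def is_author_value(value: str) -> bool:
    # Equivalent in spirit to contains(ALLOWED, concat('/', value, '/')):
    # the candidate is wrapped in delimiters before the substring search.
    return f"/{value}/" in ALLOWED


def is_author_value_naive(value: str) -> bool:
    # Without the delimiters, any fragment of a listed value would match.
    return value in ALLOWED


for candidate in ("byline", "byl", "auth", "creator", ""):
    print(repr(candidate), is_author_value(candidate), is_author_value_naive(candidate))
```

With the delimiters, only whole list entries match: "auth" and "creator" are rejected, while the naive version accepts them. An empty value, which is what a missing attribute turns into, becomes "//" and is rejected as well.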

String replacement in OCR'd purchase receipts

What I have is an OCR'd Walmart receipt in a Google Document (Walmart allows you to email to yourself a .jpg version of your receipt, and this image can be opened with Google Docs, during which it applies OCR to extract text. The result is excellent with very few errors.)
Here is a link to the actual receipt from the OCR:
https://docs.google.com/document/d/1zSV09UGajna4DPtrHUrB6F82NugpYjaftMjomoKVXpE/edit?usp=sharing
I have OCR'd hundreds of Walmart receipts using Google Docs. The OCR'd document suffers from some formatting issues, so I have written some code to help regularize it, as a part of a larger goal to get all of my receipts into a database.
While I am able to solve many of the formatting replacements, I am stuck trying to replace the percent sign that comes after "TAX 1" and "TAX 2" with a tab character, so that I can then push down the "TAX 2" entry to a new line. I noticed that the % sign will always be followed by a newline character and then the actual numerical value of the tax (for both "TAX 1" and "TAX 2") on the next line:
Example OCR Text:
SUBTOTAL 126.61 TAX 1 6.750 %
7.78 TAX 2 2.000 %
0.23 TOTAL 134.62
Desired Output Text:
SUBTOTAL 126.61
TAX 1 6.750 % 7.78
TAX 2 2.000 % 0.23
TOTAL 134.62
Objective:
Each (SUBTOTAL, TAX 1, TAX 2, and TOTAL) gets a new line. (this works)
There should be a tab after each (SUBTOTAL, TAX 1, TAX 2, and TOTAL) so that the numeric value for each is a tab-stop away. (this works)
I would like to replace the (space+percent sign+newline character) with just a percent sign and a tab, thinking the 7.78 should "rise" one line up as the newline character is taken out. (this is what is failing)
I can do this using the CTRL-F "Find and Replace" menu in the Google Docs UI, using regex options, without any problem, but I can't write an Apps Script function to do the same. I have searched everywhere. I realize that the RegEx in GAS is limited. But I don't know enough to know if that is my problem AND what a workaround could be. Likewise, I don't know enough RegEx to rule out that I am simply overlooking something unrelated to the limited version GAS supports.
Here's the code excerpt I use for formatting:
var body = DocumentApp.getActiveDocument().getBody();
/**
* other formatting stuff
*/
//Find SUBTOTAL, remove the space before SUBTOTAL and move it down one line.
body.replaceText(' SUBTOTAL', '\n\nSUBTOTAL\t');
//Find TAX 1, remove the space before TAX 1 and move it down one line.
body.replaceText(' TAX 1', '\nTAX 1\t');
//Find TAX 2, remove the space before TAX 2 and move it down one line.
body.replaceText(' TAX 2', '\nTAX 2\t');
//Find TOTAL, replace it.
body.replaceText('TOTAL', '\nTOTAL\t');
//Find PERCENT SIGN AND ADD A NEWLINE AFTER IT, replace it all with a tab character.
body.replaceText("[ %\n]","\t");
The first 4 replaces work great. It's the last one (the percent sign) that doesn't work. I've tried to escape that percent sign like this:
body.replaceText("[ \%\n]","\t");
and
body.replaceText("[ \\%\n]","\t");
I've tried to remove the braces like this:
body.replaceText(" \%\n","\t");
and
body.replaceText(" \\%\n","\t");
But each gives different results, frankly - messing up the entire receipt text badly.
So the percent sign is the problem - I think.
How can I fix the formatting for the "TAX 1" and "TAX 2" lines?
Example fulltext OCR'd receipt: https://docs.google.com/document/d/1zSV09UGajna4DPtrHUrB6F82NugpYjaftMjomoKVXpE/edit?usp=sharing
You want to replace the text of the shared Document with the values you want. Those values can be produced by the script in my comment. If my understanding is correct, how about this?
The sample script in my comment retrieves the whole text and converts it to the values you want. To apply that result to the Document, try the sample script below. To use it, please do the following:
Open the shared document.
Open script editor.
Copy and paste the sample script to the script editor.
Run myFunction().
Authorize the scopes.
See the Document.
Sample script :
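function myFunction() {
  var body = DocumentApp.getActiveDocument().getBody();
  var newText = body.getText()
    // Keep only the block from SUBTOTAL up to (not including) VISA.
    .match(/(SUBTOTAL[\s\S]+?)VISA/)[1]
    // Start a new line before every TAX entry.
    .replace(/TAX/g, "\nTAX")
    // Start a new line before TOTAL (SUBTOTAL does not match because of the leading space).
    .replace(/ TOTAL/g, "\nTOTAL\t")
    // Pull the amount that follows each percent sign up onto the same line.
    .replace(/%\n/g, "%\t");
  // Replace the whole document body with the reformatted text.
  body.clear();
  body.setText(newText);
}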
If this was not what you want, I'm sorry.
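For comparison, the same idea can be tried on the example OCR text outside Google Docs. This is a minimal sketch using Python's re module, not Apps Script, so it says nothing about what replaceText itself supports; the function name tidy_totals is made up. One point that holds in any regex flavour: square brackets form a character class, so a pattern like "[ %\n]" matches any single space, percent sign or newline rather than the three-character sequence, which would explain why the whole receipt got mangled.

```python
import re

OCR_TEXT = "SUBTOTAL 126.61 TAX 1 6.750 %\n7.78 TAX 2 2.000 %\n0.23 TOTAL 134.62"


def tidy_totals(text: str) -> str:
    # Pull each tax amount up onto its label line: "%\n7.78" becomes "%\t7.78".
    text = re.sub(r"%\n", "%\t", text)
    # Start a new line before "TAX <digit>" and "TOTAL". The required leading
    # space keeps the TOTAL inside SUBTOTAL from matching.
    return re.sub(r" +(TAX \d|TOTAL)\b", r"\n\1", text)


for line in tidy_totals(OCR_TEXT).split("\n"):
    print(repr(line))
```

The order matters here: the percent-sign step runs first so that the amounts are already on the right line before the labels are pushed down.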

Text Box Formatting

How do I display address values in one row in a Text Box?
Currently the values appear on multiple rows, one under another, for example:
5,Irivine Place
po box 2345
usa
I'm looking for something like:
5,Irivine Place,po box 2345,usa
If your address is one row in the database, then linefeeds must also be stored in it. You need to replace these with something else, i.e. a ", ", e.g.
=Replace(Fields!Address.Value,VBCRLF,", ")
If the stored value ends with a linefeed, you would also need to get rid of the final ", " in the line, so it then becomes:
=LEFT(Replace(Fields!Address.Value,VBCRLF,", "),LEN(Replace(Fields!Address.Value,VBCRLF,", "))-2)
Of course you will need to substitute VBCRLF with whatever linefeed character your database is using.
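The replace-then-trim logic can be illustrated on a plain string. This is a hedged Python sketch, not a report expression; one_line_address is a made-up name and the sample value assumes CRLF line breaks with one trailing line break.

```python
def one_line_address(address: str, sep: str = ", ") -> str:
    # splitlines() handles \r\n, \n and \r, so the exact linefeed
    # character stored in the database does not matter here.
    parts = [line.strip() for line in address.splitlines()]
    # Dropping empty parts avoids a dangling separator at the end.
    return sep.join(part for part in parts if part)


stored = "5,Irivine Place\r\npo box 2345\r\nusa\r\n"
print(one_line_address(stored))
```

Joining the non-empty lines sidesteps the question of whether a trailing separator has to be cut off, which the LEFT/LEN expression has to handle explicitly.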

Way to add a calculated number of X's to a form input?

I have certain product codes with a varying number of letters/digits, e.g. 53HD6J, HH88WBD3 (varies between 5 and 10 letters/digits). In order for our barcode to scan these correctly, there have to be 13 letters/digits. I don't want to make the user input -XXXX after each code, but rather have Access calculate the difference between 13 and the length of the code and fill the remainder with X's. Is this possible either by VBA or an expression?
I currently am using about 6 IIFs in one formula to fill remaining blanks with X's but hoping there is an easier way.
I have a form to enter in the batch number (product code). Once that form is submitted it links to a report that is printed. On the report are those batch numbers (53HD6J, HH88WBD3). The spot I want to have this feature is in a text box right next to the codes where Access determines the length of the codes and computes the remaining X's to add. This is in barcode font so this text box is where the 53HD6JXXXXXXX would go. Hope that clears it up!
So I have that part figured out. My problem now is my barcode font reads the text no matter what and translates it still so barcode shows up when the batch number is blank (I have four spots for batch codes to be inputted). So what I had before was =IIf([Text31]="",""&[Text31]&"","") which seemed to work. Hopefully I can continue this with the new formula. If that's unclear let me know.
**(The "" & & "" is so the barcode can be scanned).
My formula was wrong right above with the IIf. I figured it out! Forgot I had used ' Like "*" '. Thanks!
You can do what you want with String() and Left().
Here is an example from the Access Immediate window:
product_code = "53HD6J"
? product_code & String(13, "X")
53HD6JXXXXXXXXXXXXX
? Left(product_code & String(13, "X"), 13)
53HD6JXXXXXXX
Based on the update to your question, I think you can use that approach for the Control Source of a text box where you want to display the "expanded" product code.
Pretend your report has a text box named txtProduct_code where the raw product code, such as 53HD6J, is displayed. And there is a second text box where you want to display that value with the required number of X characters (53HD6JXXXXXXX).
Use this as the Control Source property of that second text box:
= Left([txtProduct_code] & String(13, "X"), 13)
Alternatively, you could make it a field expression in the report's Record Source query.
SELECT
product_code,
Left(product_code & String(13, "X"), 13) AS expanded_product_code
FROM YourTable;
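The pad-then-truncate trick is not specific to Access. Here is a minimal Python sketch of the same idea, with an extra guard that returns an empty string for a blank code so that no barcode text is produced for empty batch slots; pad_code and its parameters are names invented for this illustration.

```python
def pad_code(code: str, width: int = 13, fill: str = "X") -> str:
    # A blank batch slot should stay blank instead of becoming 13 X's.
    if not code:
        return ""
    # Append more fill characters than could ever be needed, then cut to width.
    return (code + fill * width)[:width]


for code in ("53HD6J", "HH88WBD3", ""):
    print(repr(pad_code(code)))
```

Appending a full 13 fill characters and truncating avoids computing the difference between 13 and the code length, which is what makes the chain of IIf calls unnecessary.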

SQL query - Replace/Move some parts of content

I need to update about 2000 records in MySQL
I have a column 'my_content' in table 'my_table' with the following value
Title: some title here<br />Author: John Smith<br />Size: 2MB<br />
I have created 3 new columns (my_title, my_author and my_size) and now I need to separate the content of 'my_content' like this
'my_title'
some title here
'my_author'
John Smith
'my_size'
2MB
As you can imagine the title, author and size are always different for each row.
What I'm thinking is to query the following, but I'm not great at SQL queries and I'm not sure what the actual query would look like.
This is what I'm trying to do:
Within 'my_content' find everything that starts with "title:..." and ends with "...<br />au" and move it to 'my_title'
Within 'my_content' find everything that starts with "thor:..." and ends with "...<br />s" and move it to 'my_author'
Within 'my_content' find everything that starts with "ize:..." and ends with "...<br />" and move it to 'my_size'
I just don't know how to write a query to do this.
Once all the content is in the new columns, I can just find and delete the content that's not needed any more, for example 'thor:' , etc.
You can use INSTR to find the index of your delimiters and SUBSTRING to select out the part you want. So, for instance, the author would be
SUBSTR(my_content,
INSTR(my_content, "Author: ") + 8,
INSTR(my_content, "Size: ") - INSTR(my_content, "Author: ") - 8)
You'd need a bit more work to trim the <br /> and any surrounding whitespace.
Please try the below (using the column and table names from the question):
SELECT SUBSTRING(SUBSTRING_INDEX(my_content,'<br />',1),LOCATE('Title: ',my_content)+7) as my_title,
SUBSTRING(SUBSTRING_INDEX(my_content,'<br />',2),LOCATE('Author: ',my_content)+8) as my_author,
SUBSTRING(SUBSTRING_INDEX(my_content,'<br />',3),LOCATE('Size: ',my_content)+6) as my_size
FROM my_table;
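Before running an UPDATE on about 2000 rows, it can help to check the splitting logic on a sample value. The following Python sketch does the same split on the example row from the question; it is an illustration of the parsing only, not MySQL code, and split_content is a made-up name.

```python
ROW = "Title: some title here<br />Author: John Smith<br />Size: 2MB<br />"


def split_content(content: str) -> dict:
    fields = {}
    # Each "Label: value" pair is terminated by "<br />".
    for chunk in content.split("<br />"):
        # Split at the first ": " only, so a colon inside the value survives.
        label, sep, value = chunk.partition(": ")
        if sep:
            fields[label.strip().lower()] = value.strip()
    return fields


print(split_content(ROW))
```

Splitting on the "<br />" delimiter first and on the label second follows the same order as the SUBSTRING_INDEX and LOCATE combination in the query above.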