Extract string from html file

Extract string from html file - html

Given this portion of a html file, I am looking for a way to extract the text starting from "Metronidazole ...." to the end under "INDICATIONS & USAGE".
Any suggestions?
<div class="Section" data-sectionCode="34067-9">
<a name="section-4"></a>
<p></p>
<h1>
<span class="Bold">INDICATIONS & USAGE
</span>
</h1>
<p class="First">Metronidazole vaginal gel USP, 0.75% is indicated in the treatment of bacterial vaginosis (formerly referred to as <span class="Italics">Haemophilus</span> vaginitis, <span class="Italics">Gardnerella</span> vaginitis, nonspecific vaginitis, <span class="Italics">Corynebacterium</span> vaginitis, or anaerobic vaginosis).</p>
<dl>
<dt></dt>
<dd>
<p class="First">
<span class="Bold">NOTE:</span> For purposes of this indication, a clinical diagnosis of bacterial vaginosis is usually defined by the presence of a homogeneous vaginal discharge that (a) has a pH of greater than 4.5, (b) emits a “fishy” amine odor when mixed with a 10% KOH solution, and (c) contains clue cells on microscopic examination. Gram’s stain results consistent with a diagnosis of bacterial vaginosis include (a) markedly reduced or absent <span class="Italics">Lactobacillus</span> morphology, (b) predominance of <span class="Italics">Gardnerella</span> morphotype, and (c) absent or few white blood cells.</p>
</dd>
</dl>
<p>Other pathogens commonly associated with vulvovaginitis, e.g., <span class="Italics">Trichomonas vaginalis</span>, <span class="Italics">Chlamydia trachomatis</span>, <span class="Italics">N</span>. <span class="Italics">gonorrhoeae</span>, <span class="Italics">Candida albicans</span>, and <span class="Italics">Herpes simplex</span> virus should be ruled out.</p>
</div>
INDICATIONS & USAGE
Metronidazole vaginal gel USP, 0.75% is indicated in the treatment of bacterial vaginosis (formerly referred to as Haemophilus vaginitis, Gardnerella vaginitis, nonspecific vaginitis, Corynebacterium vaginitis, or anaerobic vaginosis).
NOTE: For purposes of this indication, a clinical diagnosis of bacterial vaginosis is usually defined by the presence of a homogeneous vaginal discharge that (a) has a pH of greater than 4.5, (b) emits a “fishy” amine odor when mixed with a 10% KOH solution, and (c) contains clue cells on microscopic examination. Gram’s stain results consistent with a diagnosis of bacterial vaginosis include (a) markedly reduced or absent Lactobacillus morphology, (b) predominance of Gardnerella morphotype, and (c) absent or few white blood cells.
Other pathogens commonly associated with vulvovaginitis, e.g., Trichomonas vaginalis, Chlamydia trachomatis, N. gonorrhoeae, Candida albicans, and Herpes simplex virus should be ruled out.

You can use some old school tricks like,
First convert the NSString to Character Array (Source Array).
Create an empty target Character Array.
Start adding the character from source to target Array using logic.
If you find '<' (start of html tag) character stop adding character in target Array till you find '>' (end of html tag) character.
Or you could use another Trick like
NSString* startTag = #"<";
NSString* endTag = #">";
NSString* replacementString = #"";
while ([str rangeOfString:startTag].length != 0 && [str rangeOfString:endTag].length != 0)
{
NSRange range1 = [str rangeOfString:startTag];
NSRange range2 = [str rangeOfString:endTag];
if(range1.location>range2.location)
break;
NSRange newRange;
newRange.length =range2.location-range1.location+range2.length;
newRange.location = range1.location;
str = [str stringByReplacingCharactersInRange:newRange withString:replacementString];
}
then you can find the text as you said under "INDICATIONS & USAGE" by rangeOfString Method , your desire text should be after this range.

Related

How to replace numbered list elements with an identifier containing the number

I have gotten amazing help here today!
I'm trying to do something else. I have a numbered list of questions in a Google Doc, and I'd like to replace the numbers with something else.
For example, I'd like to replace the numbers in a list such as:
The Earth is closest to the Sun in which month of the year?
~July
~June
=January
~March
~September
In Australia (in the Southern Hemisphere), when are the days the shortest and the nights the longest?
~in late December
~in late March
=in late June
~in late April
~days and nights are pretty much the same length throughout the year in Australia
With:
::Q09:: The Earth is closest to the Sun in which month of the year?
~July
~June
=January
~March
~September
::Q11:: In Australia (in the Southern Hemisphere), when are the days the shortest and the nights the longest?
~in late December
~in late March
=in late June
~in late April
~days and nights are pretty much the same length throughout the year in Australia
I've tried using suggestions from previous posts but have come up only with things such as the following, which doesn't seem to work.
Thank you for being here!!!
function questionName2(){
var body = DocumentApp.getActiveDocument().getBody();
var text = body.editAsText();
var pattern = "^[1-9]";
var found = body.findText(pattern);
var matchPosition = found.getStartOffset();
while(found){
text.insertText(matchPosition,'::Q0');
found = body.findText(pattern, found);
}
}

Regular expressions
Text.findText(searchPattern) uses a string that will be parsed as a regular expression using Google's RE2 library for the searchPattern. Using a string in this way requires we add an extra backslash whenever we are removing special meaning from a character, such as matching the period after the question number, or using a character matching set like \d for digits.
^\\s*\\d+?\\. will match a set of digits, of any non-zero length, that begin a line, with any length (including zero) of leading white space. \d is for digits, + is one or more, and the combination +? makes the match lazy. The lazy part is not required here, but it's my habit to default to lazy to avoid bugs. An alternative would be \d{1,2} to specifically match 1 to 2 digits.
To extract just the digits from the matched text, we can use a JavaScript RegExp object. Unlike the Doc regular expression, this regular expression will not require extra backslashes and will allow us to use capture groups using parentheses.
^\s*(\d+?)\. is almost the same as above, except no extraneous slashes and we will now "save" the digits so we can use them in our replacement string. We mark what we want to save using parentheses. Because this will be a normal JavaScript regular expression literal, we will wrap the whole thing in slashes: /^\s*(\d+?)\./, but the starting and ending / are just to indicate this is a RegExp literal.
text elements and text strings
Text.findText can return more than just the exact match we asked for: it returns the entire element that contains the text plus indices for what the regular expression matched. In order to perform search and replace with capture groups, we have to use the indices to delete the old text and then insert the new text.
The following assignments get us all the data we need to do the search and replace: first the element, then the start & stop indices, and finally extracting the matched text string using slice (note that slice uses an exclusive end, whereas the Doc API uses an inclusive end, hence the +1).
var found = DocumentApp.getActiveDocument().getBody().findText(pattern);
var matchStart = found.getStartOffset();
var matchEnd = found.getEndOffsetInclusive();
var matchElement = found.getElement().asText();
var matchText = matchElement.getText().slice(matchStart, matchEnd + 1);
Caveats
As Tanaike pointed out in the comments, this assumes the numbering is not List Items, which automatically generates numbers, but numbers you typed in manually. If you are using an automatically generated list of numbers, the API does not allow you to edit the format of the numbering.
This answer also assumes that in the example, when you mapped "9." to "::Q09::" and "10." to "::Q11::", that the mapping of 10 to 11 was a typo. If this was intended, please update the question to clarify the rules for why the numbering might change.
Also assumed is that the numbers are supposed to be less than 100, given the example zero padding of "Q09". The example should be flexible enough to allow you to update this to a different padding scheme if needed.
Full example
Since the question did not use any V8 features, this assumes the older Rhino environment.
/**
* Replaces "1." with "::Q01::"
*/
function updateQuestionNumbering(){
var text = DocumentApp.getActiveDocument().getBody();
var pattern = "^\\s*\\d+?\\.";
var found = text.findText(pattern);
while(found){
var matchStart = found.getStartOffset();
var matchEnd = found.getEndOffsetInclusive();
var matchElement = found.getElement().asText();
var matchText = matchElement.getText().slice(matchStart, matchEnd + 1);
matchElement.deleteText(matchStart, matchEnd);
matchElement.insertText(matchStart, matchText.replace(/^\s*(\d+?)\./, replacer));
found = text.findText(pattern, found);
}
/**
* #param {string} _ - full match (ignored)
* #param {string} number - the sequence of digits matched
*/
function replacer(_, number) {
return "::Q" + padStart(number, 2, "0") + "::";
}
// use String.prototype.padStart() in V8 environment
// above usage would become `number.padStart(2, "0")`
function padStart(string, targetLength, padString) {
while (string.length < targetLength) string = padString + string;
return string;
}
}

Convert HTML to PDF in Power Automate

I am working on Power Automate and trying to convert an HTML page into a pdf file. However, before the HTML page content loads completely, the conversion process takes place. As a result, the pdf file created is either blank or has a loading symbol.
I believe the need is to add a manual delay of a few seconds between page load and pdf file conversion, but am unable to do so.
Below is the concerned 'definition' file JSON code extracted after exported the Power Automate Flow. The connector for pdf conversion can be searched as "operationId":"ConvertFileByPath"
{"name":"611652cf-aec0-4733-8871-b0f0f40af783","id":"/providers/Microsoft.Flow/flows/611652cf-aec0-4733-8871-b0f0f40af783","type":"Microsoft.Flow/flows","properties":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_logicflows","displayName":"Enterprise
Assessment Tool","definition":{"metadata":{"workflowEntityId":null,"creator":{"id":"4205125c-5d6f-4e96-b565-709e4a8dcbde","type":"User","tenantId":"971f0e31-00d6-4e42-b8e0-47b342bc4455"},"provisioningMethod":"FromDefinition","failureAlertSubscription":true,"clientLastModifiedTime":"2020-03-20T08:55:47.5792257Z"},"$schema":"https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#","contentVersion":"1.0.0.0","parameters":{"$connections":{"defaultValue":{},"type":"Object"},"$authentication":{"defaultValue":{},"type":"SecureObject"}},"triggers":{"When_a_new_response_is_submitted":{"type":"OpenApiConnectionWebhook","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_microsoftforms","connectionName":"shared_microsoftforms_2","operationId":"CreateFormWebhook"},"parameters":{"form_id":"MQ4fl9YAQk644EezQrxEVVwSBUJvXZZOtWVwnkqNy95UNkVCVlJZSVlHQlBEWFhOMkFJUE5PT1pSWS4u"},"authentication":"#parameters('$authentication')"}}},"actions":{"Apply_to_each":{"foreach":"#triggerOutputs()?['body/value']","actions":{"Get_response_details":{"runAfter":{},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_microsoftforms","connectionName":"shared_microsoftforms_2","operationId":"GetFormResponseById"},"parameters":{"form_id":"MQ4fl9YAQk644EezQrxEVVwSBUJvXZZOtWVwnkqNy95UNkVCVlJZSVlHQlBEWFhOMkFJUE5PT1pSWS4u","response_id":"#items('Apply_to_each')?['resourceData/responseId']"},"authentication":"#parameters('$authentication')"}},"Add_a_row_into_a_table":{"runAfter":{"Get_response_details":["Succeeded"]},"metadata":{"016AIIWUWO74RBMREYLVAIQPOYYNXRWO3P":"/Enterprise
Assessment Tool.xlsx","tableId":"{9C186D95-CBB6-477E-8699-17B3089A0368}","01BSP3ENPNPETDO5YJUVBI2X6MEG5ARSKV":"/Enterprise Assessment Tool.xlsx"},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_excelonlinebusiness","connectionName":"shared_excelonlinebusiness_1","operationId":"AddRowV2"},"parameters":{"source":"me","drive":"b!wjpSLG6KfES6F9-MfcSm-aeq5llhoTVMpfXuBQBmrywjSt-PswuwSbm1_6BG5sFo","file":"01BSP3ENPNPETDO5YJUVBI2X6MEG5ARSKV","table":"{9C186D95-CBB6-477E-8699-17B3089A0368}","item/ID":"#items('Apply_to_each')?['resourceData/responseId']","item/Your
name":"#outputs('Get_response_details')?['body/r6b6a573a836044eab553ba2e0ab92446']","item/Organization’s Name":"#outputs('Get_response_details')?['body/ra2af29025bff44bbac289f9c61c76666']","item/Your Email Address":"#outputs('Get_response_details')?['body/rf505d38dc6334aa7b69d2c77f230f09e']","item/Providing
clear and effective leadership":"#outputs('Get_response_details')?['body/r4b5eade301884d9b8629c9b38d9a3c2d']","item/Anticipating opportunities and threats to keep us ahead of change":"#outputs('Get_response_details')?['body/rca90ae117bc347f9886072f118ceb630']","item/Willingness
to take on risks for long-term growth, even if that could decrease current year profits":"#outputs('Get_response_details')?['body/rc8656608760f4e19abc1d78ecacbf825']","item/Making disciplined IT investment decisions":"#outputs('Get_response_details')?['body/r14226cb30f604f20a626157ec1786086']","item/Using
IT to gain competitive advantage":"#outputs('Get_response_details')?['body/r963c2865db4d4e90b20fa82547a48c62']","item/Articulating a clear and consistent vision to employees, consumers and partners":"#outputs('Get_response_details')?['body/r01698107f2ea4286a8f59ec20f17f3b8']","item/Enabling
the enterprise to navigate change":"#outputs('Get_response_details')?['body/r2c02b757c2444656a4c32388e2353724']","item/Fostering and changing the culture in IT":"#outputs('Get_response_details')?['body/r09d450406bfb477bb55cfd5dd572eb09']","item/Please
rate the clarity and consistency of your enterprise’s overall business strategy_x002e_":"#outputs('Get_response_details')?['body/r5bdc0ce9ba0b46348149fb2422b4e041']","item/Please indicate the nature of your organization’s CIO’s (or the most senior IT
leader’s) relationship with the CEO (or most senior Business executive)_x002e_":"#outputs('Get_response_details')?['body/r29c1767ce4bd4da6bec697ede635a908']","item/Has your organization faced any of these situations in the past four years? Please select
all that apply_x002e_":"#outputs('Get_response_details')?['body/rb4f8047c739d42938a52145e9a9cbef1']","item/Please share some details regarding the business disruption that you faced in the past four years_x002e_":"#outputs('Get_response_details')?['body/r7652d524e6524ba19fe6c629c4869aa0']","item/External
disruption of your business environment":"#outputs('Get_response_details')?['body/rd3d4bb9074a2411d9234551e0092500c']","item/Adverse regulatory intervention":"#outputs('Get_response_details')?['body/rc01f99fd45e74f5695e7fbcbcd66b6a4']","item/Cyber security
issue":"#outputs('Get_response_details')?['body/reb816f5b028b43ccb22128f0bac6bc00']","item/IT Service failure":"#outputs('Get_response_details')?['body/r95c7899a06e74c04b6a4f96fb905abe8']","item/Product~1service failure":"#outputs('Get_response_details')?['body/r15ea7200480b45febe0384a5ad4fe683']","item/Operating
cost pressure":"#outputs('Get_response_details')?['body/r952f66721ebf48b6bb9e23f64273ae6a']","item/Labor disruption":"#outputs('Get_response_details')?['body/ree57c3756cc04cf8b0d37e158256b1ef']","item/Shifting consumer demand":"#outputs('Get_response_details')?['body/r712b33b666694e80be27b2bcfb4bba86']","item/Funding
shortfall":"#outputs('Get_response_details')?['body/r493be3af3d244d638863da8b4e68fadc']","item/Organizational disruption":"#outputs('Get_response_details')?['body/r90bcccc8a1c3475186f5454677e9199a']","item/Some other disruptive business situation":"#outputs('Get_response_details')?['body/r939f57c83c054ab08d6c64c33bb5c62b']","item/The
overall business performance of the enterprise":"#outputs('Get_response_details')?['body/r729caa64609d4b998cfd200997bdca41']","item/Speed at which new business initiatives are launched":"#outputs('Get_response_details')?['body/r3061360c935d4f0093939e6b64bec6b7']","item/Ability
to fund new business initiatives":"#outputs('Get_response_details')?['body/rb186360b5d6a455d8ed624c1ab6a8ac0']","item/Speed at which business initiatives are successfully completed":"#outputs('Get_response_details')?['body/r3061360c935d4f0093939e6b64bec6b7']","item/Ability
to use data to achieve intended outcomes":"#outputs('Get_response_details')?['body/r2f1fc86f6d6f4cf6a78528bdf5b8e7e2']","item/Ability to attract the right talent to fill our needs":"#outputs('Get_response_details')?['body/rc3a23d40f20f49b78cab3880f0c8db38']","item/Ability
to get value from new business initiatives":"#outputs('Get_response_details')?['body/r4e4371bf1cdc472c8935bc398f7751b5']","item/IT budget growth":"#outputs('Get_response_details')?['body/rc0a78909625144acbc4d9d94658b0c74']","item/Operating cost competitiveness":"#outputs('Get_response_details')?['body/rac841e5b9e064ee88ec84dd54f48e40e']","item/Reputation
as an innovative enterprise":"#outputs('Get_response_details')?['body/r5ebbb6aebc8e4623bf987dca1e01afd1']","item/Our long-term viability":"#outputs('Get_response_details')?['body/rb83eaa31847f4306a54fa6375d5d544d']","item/The stability of the leadership
team (CEO and downward)":"#outputs('Get_response_details')?['body/r7ac35c66284e4215b4dd6ac173134b54']"},"authentication":"#parameters('$authentication')"}}},"runAfter":{},"type":"Foreach"},"Refresh_a_dataset":{"runAfter":{"Delay_2":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_powerbi","connectionName":"shared_powerbi","operationId":"RefreshDataset"},"parameters":{"groupid":"42cf205d-726b-418a-b227-d03cbcaa9f6b","datasetid":"36269b8f-45cd-43c7-a2fa-a3995da63c51"},"authentication":"#parameters('$authentication')"}},"Delay":{"runAfter":{"Refresh_a_dataset":["Succeeded"]},"type":"Wait","inputs":{"interval":{"count":1,"unit":"Minute"}}},"Delay_2":{"runAfter":{"Apply_to_each":["Succeeded"]},"type":"Wait","inputs":{"interval":{"count":1,"unit":"Minute"}}},"Apply_to_each_3":{"foreach":"#triggerOutputs()?['body/value']","actions":{"Get_response_details_3":{"runAfter":{},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_microsoftforms","connectionName":"shared_microsoftforms_2","operationId":"GetFormResponseById"},"parameters":{"form_id":"MQ4fl9YAQk644EezQrxEVVwSBUJvXZZOtWVwnkqNy95UNkVCVlJZSVlHQlBEWFhOMkFJUE5PT1pSWS4u","response_id":"#items('Apply_to_each_3')?['resourceData/responseId']"},"authentication":"#parameters('$authentication')"}},"Convert_HTML_to_PDF":{"runAfter":{"Get_response_details_3":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_encodiandocumentmanager","connectionName":"shared_encodiandocumentmanager","operationId":"HtmlToPDF"},"parameters":{"operation/outputFilename":"Enterprise
Fitness Assessment Report_#{outputs('Get_response_details_3')?['body/ra2af29025bff44bbac289f9c61c76666']}","operation/htmlData":"
<!DOCTYPE html>\n
<html>\n\n
<head>\n
<script>
\
n\ nwindow.addEventListener('load', function() {\
nsetInterval(function() {\
ndocument.getElementById(\"delayedText\").style.visibility = \"visible\";\n},10000);\n\n}, false);\n\n/*window.onload = function(){\n \n var theDelay = 60;\n var timer = setTimeout(\"showText()\",theDelay*1000)\n}\nfunction showText(){\n document.getElementById(\"delayedText\").style.visibility = \"visible\";\n}*/\n\n\n
</script>\n</head>\n\n
<body>\n
<div id=\ "delayedText\" style=\ "visibility:hidden\">This is a test\n\n<iframe width=\ "1140\" height=\ "541.25\" src=\
"https://app.powerbi.com/reportEmbed?reportId=d27f0160-eb09-442b-a7ab-ded938ed33ec&autoAuth=true&ctid=971f0e31-00d6-4e42-b8e0-47b342bc4455&config=eyJjbHVzdGVyVXJsIjoiaHR0cHM6Ly93YWJpLXdlc3QtdXMtcmVkaXJlY3QuYW5hbHlzaXMud2luZG93cy5uZXQvIn0%3D\" frameborder=\ "0\" allowFullScreen=\ "true\"></iframe>\n</div>\n\n</body>\n
</html>","operation/pageOrientation":"Landscape","operation/pageSize":"A4","operation/viewPort":"Default","operation/MarginTop":25,"operation/MarginBottom":25,"operation/MarginRight":25,"operation/MarginLeft":25,"operation/enableBookmarks":true,"operation/enableJavaScript":true,"operation/enableHyperlinks":true,"operation/createPdfForm":false,"operation/decodeHtmlData":true,"operation/cssType":"Screen","operation/repeatTableHeader":true,"operation/repeatTableFooter":true,"operation/splitImages":false,"operation/splitTextLines":false,"operation/encoding":"UTF8","operation/FinalOperation":true},"authentication":"#parameters('$authentication')"}},"Create_file":{"runAfter":{"Convert_HTML_to_PDF":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_onedriveforbusiness","connectionName":"shared_onedriveforbusiness","operationId":"CreateFile"},"parameters":{"folderPath":"/Enterprise
Assessment Reports","name":"#outputs('Convert_HTML_to_PDF')?['body/Filename']","body":"#outputs('Convert_HTML_to_PDF')?['body/FileContent']"},"authentication":"#parameters('$authentication')"},"runtimeConfiguration":{"contentTransfer":{"transferMode":"Chunked"}}},"Send_an_email":{"runAfter":{"Create_file":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_office365","connectionName":"shared_office365","operationId":"SendEmailV2"},"parameters":{"emailMessage/To":"#outputs('Get_response_details_3')?['body/rf505d38dc6334aa7b69d2c77f230f09e']","emailMessage/Subject":"Gartner
Enterprise Fitness Assessment Report","emailMessage/Body":"
<p>Hi #{outputs('Get_response_details_3')?['body/r6b6a573a836044eab553ba2e0ab92446']}<br>\n<br>\nThanks for submitting your response! Please view the attachement for your organisation's assessment.<br>\n<br>\nTeam PRM</p>","emailMessage/From":"Prakhar.Gupta#gartner.com","emailMessage/Attachments":[{"Name":"#outputs('Convert_HTML_to_PDF')?['body/Filename']","ContentBytes":"#outputs('Convert_HTML_to_PDF')?['body/FileContent']"}]},"authentication":"#parameters('$authentication')"}}},"runAfter":{"Send_an_email_(V2)":["Succeeded"]},"type":"Foreach"},"Send_an_email_(V2)":{"runAfter":{"Create_file_3":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_office365","connectionName":"shared_office365","operationId":"SendEmailV2"},"parameters":{"emailMessage/To":"Prakhar.Gupta#gartner.com","emailMessage/Subject":"test","emailMessage/Body":"
<!DOCTYPE html>\n
<html>\n\n
<head>\n
<script>
\
n\ nwindow.addEventListener('load', function() {\
nsetInterval(function() {\
ndocument.getElementById(\"delayedText\").style.visibility = \"visible\";\n},10000);\n\n}, false);\n\n/*window.onload = function(){\n \n var theDelay = 60;\n var timer = setTimeout(\"showText()\",theDelay*1000)\n}\nfunction showText(){\n document.getElementById(\"delayedText\").style.visibility = \"visible\";\n}*/\n\n\n
</script>\n</head>\n\n
<body>\n
<div id=\ "delayedText\" style=\ "visibility:hidden\">This is a test\n\n<iframe width=\ "1140\" height=\ "541.25\" src=\
"https://app.powerbi.com/reportEmbed?reportId=d27f0160-eb09-442b-a7ab-ded938ed33ec&autoAuth=true&ctid=971f0e31-00d6-4e42-b8e0-47b342bc4455&config=eyJjbHVzdGVyVXJsIjoiaHR0cHM6Ly93YWJpLXdlc3QtdXMtcmVkaXJlY3QuYW5hbHlzaXMud2luZG93cy5uZXQvIn0%3D\" frameborder=\ "0\" allowFullScreen=\ "true\"></iframe>\n</div>\n\n</body>\n
</html>"},"authentication":"#parameters('$authentication')"}},"Create_file_2":{"runAfter":{"Delay":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_onedriveforbusiness","connectionName":"shared_onedriveforbusiness","operationId":"CreateFile"},"parameters":{"folderPath":"/Enterprise
Assessment Reports","name":"Test.html","body":"
<!DOCTYPE html>\n
<html>\n\n
<head>\n
<script>
\
n\ nwindow.addEventListener('load', function() {\
nsetInterval(function() {\
ndocument.getElementById(\"delayedText\").style.visibility = \"visible\";\n},10000);\n\n}, false);\n\n/*window.onload = function(){\n \n var theDelay = 60;\n var timer = setTimeout(\"showText()\",theDelay*1000)\n}\nfunction showText(){\n document.getElementById(\"delayedText\").style.visibility = \"visible\";\n}*/\n\n\n
</script>\n</head>\n\n
<body>\n
<div id=\ "delayedText\" style=\ "visibility:hidden\">This is a test\n\n<iframe width=\ "1140\" height=\ "541.25\" src=\
"https://app.powerbi.com/reportEmbed?reportId=d27f0160-eb09-442b-a7ab-ded938ed33ec&autoAuth=true&ctid=971f0e31-00d6-4e42-b8e0-47b342bc4455&config=eyJjbHVzdGVyVXJsIjoiaHR0cHM6Ly93YWJpLXdlc3QtdXMtcmVkaXJlY3QuYW5hbHlzaXMud2luZG93cy5uZXQvIn0%3D\" frameborder=\ "0\" allowFullScreen=\ "true\"></iframe>\n</div>\n\n</body>\n
</html>"},"authentication":"#parameters('$authentication')"},"runtimeConfiguration":{"contentTransfer":{"transferMode":"Chunked"}}},"Delay_3":{"runAfter":{"Create_file_2":["Succeeded"]},"type":"Wait","inputs":{"interval":{"count":3,"unit":"Minute"}}},"Convert_file_using_path":{"runAfter":{"Delay_3":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_onedriveforbusiness","connectionName":"shared_onedriveforbusiness","operationId":"ConvertFileByPath"},"parameters":{"path":"#outputs('Create_file_2')?['body/Path']","type":"PDF"},"authentication":"#parameters('$authentication')"}},"Create_file_3":{"runAfter":{"Convert_file_using_path":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_onedriveforbusiness","connectionName":"shared_onedriveforbusiness","operationId":"CreateFile"},"parameters":{"folderPath":"/Enterprise
Assessment Reports","name":"#outputs('Convert_file_using_path')?['headers/x-ms-file-name']","body":"#outputs('Convert_file_using_path')?['body']"},"authentication":"#parameters('$authentication')"},"runtimeConfiguration":{"contentTransfer":{"transferMode":"Chunked"}}}},"outputs":{},"description":"Track
Microsoft Forms responses in an Excel Online (Business) spreadsheet. The spreadsheet must have columns: SubmissionTime, ResponderEmail."},"connectionReferences":{"shared_microsoftforms_2":{"connectionName":"shared-microsoftform-ff875ca3-62f2-4c71-bed9-8d02ce26ada2","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_microsoftforms","tier":"NotSpecified"},"shared_excelonlinebusiness_1":{"connectionName":"shared-excelonlinebu-aabd11c2-e15f-4595-a539-d4ffe5ecd544","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_excelonlinebusiness","tier":"NotSpecified"},"shared_powerbi":{"connectionName":"shared-powerbi-07a589e5-e541-4241-83c7-2e5ba184ec9f","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_powerbi","tier":"NotSpecified"},"shared_encodiandocumentmanager":{"connectionName":"shared-encodiandocum-29f09c50-052b-4d59-8c60-7876ab0cf806","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_encodiandocumentmanager","tier":"NotSpecified"},"shared_onedriveforbusiness":{"connectionName":"shared-onedriveforbu-2262692c-87ba-4be3-a32d-febd64f70219","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_onedriveforbusiness","tier":"NotSpecified"},"shared_office365":{"connectionName":"shared-office365-28369e56-7ed4-431e-9b7b-ba4930b0f010","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_office365","tier":"NotSpecified"}},"flowFailureAlertSubscribed":false}}

You can use the Run after property on the action to get the HTML.
So once the action return then you convert your PDF

How to create joined string of all text under span classes using XPATH

I'm unable to properly collect the data from the span classes I am looking for. I want to create a list (or really, a combined string) of all of the text from all of these span classes.
The classes I'm looking at are embedded under other classes, but it seems from the output I'm currently getting, that my code is able to location the number of occurrences of the class, and just not extract the text.
<div class="author-group" id="author-group">
<a class="author size-m workspace-trigger" name="bau2" href="#!">
<span class="content">
<span class="text given-name">Jane</span>
<span class="text surname">Doe</span>
<span class="author-ref" id="baff1">
<sup>a</sup></span></span></a>
They're all under their own as shown above, and all of them are under the same .
From this, I would want to be able to get Jane Doe. This class repeats multiple times, and the end goal is to get "Jane Doe; Sam Smith; Joe Gregory." This is my relevant code thus far.
doc <- read_html(x)
just_scripts <- html_nodes(doc, "script") %>% html_text()
sur_author = html_nodes(doc, xpath = '//span[#class="text surname"]/text()') %>%
html_attr('content')
given_author = html_nodes(doc, xpath = '//span[#class="text given-name"]/text()') %>%
html_attr('content')
Given_Author <- paste(given_author, collapse=" ; ")
Sur_Author <- paste(sur_author, collapse=" ; ")
Outside of this function, I have the code writing into an excel spreadsheet and I'm getting results like this:
NA ; NA ; NA
It seems to be able to determine how many authors there are, and properly create a space for each, yet it cannot extract the actual text of the authors names into my file.

R Parses incomplete text from webpages (HTML)

I am trying to parse the plain text from multiple scientific articles for subsequent text analysis. So far I use a R script by Tony Breyal based on the packages RCurl and XML. This works fine for all targeted journals, except for those published by http://www.sciencedirect.com. When I try to parse the articles from SD (and this is consistent for all tested journals I need to access from SD), the text object in R just stores the first part of the whole document in it. Unfortunately, I am not too familiar with html, but I think the problem should be in the SD html code, since it works in all other cases.
I am aware that some journals are not open accessible, but I have access authorisations and the problems also occur in open access articles (check the example).
This is the code from Github:
htmlToText <- function(input, ...) {
###---PACKAGES ---###
require(RCurl)
require(XML)
###--- LOCAL FUNCTIONS ---###
# Determine how to grab html for a single input element
evaluate_input <- function(input) {
# if input is a .html file
if(file.exists(input)) {
char.vec <- readLines(input, warn = FALSE)
return(paste(char.vec, collapse = ""))
}
# if input is html text
if(grepl("</html>", input, fixed = TRUE)) return(input)
# if input is a URL, probably should use a regex here instead?
if(!grepl(" ", input)) {
# downolad SSL certificate in case of https problem
if(!file.exists("cacert.perm")) download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.perm")
return(getURL(input, followlocation = TRUE, cainfo = "cacert.perm"))
}
# return NULL if none of the conditions above apply
return(NULL)
}
# convert HTML to plain text
convert_html_to_text <- function(html) {
doc <- htmlParse(html, asText = TRUE)
text <- xpathSApply(doc, "//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]", xmlValue)
return(text)
}
# format text vector into one character string
collapse_text <- function(txt) {
return(paste(txt, collapse = " "))
}
###--- MAIN ---###
# STEP 1: Evaluate input
html.list <- lapply(input, evaluate_input)
# STEP 2: Extract text from HTML
text.list <- lapply(html.list, convert_html_to_text)
# STEP 3: Return text
text.vector <- sapply(text.list, collapse_text)
return(text.vector)
}
This is now my code and an example article:
target <- "http://www.sciencedirect.com/science/article/pii/S1754504816300319"
temp.text <- htmlToText(target)
The unformatted text stops somewhere in the Method section:
DNA was extracted using the MasterPure™ Yeast DNA Purification Kit
(Epicentre, Madison, Wisconsin, USA) following the manufacturer's
instructions.
Any suggestions/ideas?
P.S. I also tried html_text based on rvest with the same outcome.

You can prbly use your existing code and just add ?np=y to the end of the URL, but this is a bit more compact:
library(rvest)
library(stringi)
target <- "http://www.sciencedirect.com/science/article/pii/S1754504816300319?np=y"
pg <- read_html(target)
pg %>%
html_nodes(xpath=".//div[#id='centerContent']//child::node()/text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]") %>%
stri_trim() %>%
paste0(collapse=" ") %>%
write(file="output.txt")
A bit of the output (total for that article was >80K):
Fungal Ecology Volume 22 , August 2016, Pages 61–72 175394|| Species richness
influences wine ecosystem function through a dominant species Primrose J. Boynton a , , ,
Duncan Greig a , b a Max Planck Institute for Evolutionary Biology, Plön, 24306, Germany
b The Galton Laboratory, Department of Genetics, Evolution, and Environment, University
College London, London, WC1E 6BT, UK Received 9 November 2015, Revised 27 March 2016,
Accepted 15 April 2016, Available online 1 June 2016 Corresponding editor: Marie Louise
Davey Abstract Increased species richness does not always cause increased ecosystem function.
Instead, richness can influence individual species with positive or negative ecosystem effects.
We investigated richness and function in fermenting wine, and found that richness indirectly
affects ecosystem function by altering the ecological dominance of Saccharomyces cerevisiae .
While S. cerevisiae generally dominates fermentations, it cannot dominate extremely species-rich
communities, probably because antagonistic species prevent it from growing. It is also diluted
from species-poor communities,

HTML5 input pattern for French licence plate number

I'm searching for a html pattern to check an input field containing a licence plate number.
Problem is we have many possible patterns :
AA-123-ZZ
1234-AZ-09
123-ABC-90
Can you help me write such a pattern ?
Cherry on the cake would be if the user can write the - or not.
Thank's

This should cover the three input options as specified:
<form action="carCheck.asp" method="post">
Number Plate: <input type="text" name="number plate" pattern="^([A-Za-z]{2}-?[0-9]{3}-?[A-Za-z]{2})?([0-9]{4}-?[A-Za-z]{2}-?[0-9]{2})?([0-9]{3}-?[A-Za-z]{3}-?[0-9]{2})?$" title="French Number Plate">
<input type="submit">
</form>
Edit: also worth considering is restricted/unused characters in French Number plates (I,O,U)...
pattern="^((?![IOUiou])[A-Za-z]{2}-?[0-9]{3}-?(?![IOUiou])[A-Za-z]{2})?([0-9]{4}-?(?![IOUiou])[A-Za-z]{2}-?[0-9]{2})?([0-9]{3}-?(?![IOUiou])[A-Za-z]{3}-?[0-9]{2})?$"
EDIT: 2nd pattern above to allow lowercase alpha as well as uppercase.
This should cover:
Aa-999-Aa and Aa999Aa
9999-Aa-99 and 9999Aa99
999-AaA-99 and 999AaA99

How about:
pattern="^[A-Z0-9]{1,4}-?[A-Z0-9]{1,4}-?[A-Z0-9]{1,4}$"
3 groups of A-Z/0-9 (1 to 4 symbols), separated by (maybe missing) hypens.
Edit: if you want each group to contain only letters or only numbers, pattern will be the following:
pattern="^([A-Z]{1,4}|[0-9]{1,4})-?([A-Z]{1,4}|[0-9]{1,4})-?([A-Z]{1,4}|[0-9]{1,4})$"
Also, Paul McCombie's answer below contains an amendment on characters unused in license plates, you may want to look at it too.
Update:
pattern="^([A-HJ-NP-TV-Z]{2}|[0-9]{3,4})-?([A-HJ-NP-TV-Z]{2,3}|[0-9]{3})-?([A-HJ-NP-TV-Z]{2}|[0-9]{2})$"

There are 2 solution for your problem That need Regex:
HTML5 input pattern.
Using javaScript to validate input with regex.Test now
HTML5 input pattern
The pattern attribute specifies a regular expression that the element's value is checked against.
<input type="text" name="licence" pattern="(\w+-\w+-\w+)"title="Your input">
Javascript Regex
var re = /(\w+-\w+-\w+)/;
var str = '123-ABC-90';
var m;
if ((m = re.exec(str)) !== null) {
if (m.index === re.lastIndex) {
re.lastIndex++;
}
// View your result using the m-variable.
// eg m[0] etc.
}

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Extract string from html file - html

Related

How to replace numbered list elements with an identifier containing the number

Convert HTML to PDF in Power Automate

How to create joined string of all text under span classes using XPATH

R Parses incomplete text from webpages (HTML)

HTML5 input pattern for French licence plate number

Categories

Resources