Converting text to HTML format in R

I have a dataframe with lots of text data that needs to be formatted into HTML. Here is one example with sub-bullets. Not all lists have sub-bullets. Sub-bullets are those that start with o.
text <- "• Direct the Department’s technical/analytical activities and drive
collaborative relationships with internal personnel and vendors to improve financial
operations and detect, mitigate and prevent financial risk
• Perform supervisory and managerial responsibilities as leader of the program
o Set direction to ensure goals and objectives align with corporate and division strategy
o Select management and other key personnel; oversee talent development and succession
planning
o Collaborate with executive colleagues to develop and execute corporate initiatives and
department strategy
o Oversee the preparation and execution of department’s Annual Financial Plan and budget
o Manage merit pay in accordance with specified objectives and guidelines
• Perform other duties as assigned"
The formatted version is supposed to look like this.
<ul>
<li>Direct the Department's technical/analytical activities and drive collaborative relationships with internal personnel and vendors to improve financial operations and detect, mitigate and prevent financial risk</li>
<li>Perform supervisory and managerial responsibilities as leader of the program
<ul>
<li>Set direction to ensure goals and objectives align with corporate and division strategy</li>
<li>Select management and other key personnel; oversee talent development and succession planning</li>
<li>Collaborate with executive colleagues to develop and execute corporate initiatives and department strategy</li>
<li>Oversee the preparation and execution of department's Annual Financial Plan and budget</li>
<li>Manage merit pay in accordance with specified objectives and guidelines</li>
</ul>
</li>
<li>Perform other duties as assigned</li>
</ul>
The "text" above is an example of the type of data. Some entries have sub-bullets, others do not.
I have no problem adding the <li> tags; the tricky part is finding the sub-bullets and surrounding them with <ul> and </ul>.
This code does most of it, but how do I put the result back into the dataframe dfClean?
library(stringr)

for (k in 1:6) {
  # Use str_split() from the stringr package to split the text into lines
  temp_sentence <- str_split(dfClean[k, 24], "\\r\\n")[[1]]
  # Find the indices of the lines starting with "o " (sub-bullets).
  # grep() returns integer(0) when nothing matches; requiring the space
  # avoids matching wrapped lines such as "operations and detect ..."
  o_indices <- grep("^o ", temp_sentence)
  # Use paste0() to build the HTML list
  html_list <- "<ul>\n"
  for (i in seq_along(temp_sentence)) {
    if (length(o_indices) > 0) {
      if (i == o_indices[1]) {
        html_list <- paste0(html_list, "\t<ul>\n") # add <ul> before sub-bullets
      }
      # add extra tab for sub-bullets
      if (i %in% o_indices) {
        html_list <- paste0(html_list, "\t\t<li>", temp_sentence[i], "</li>\n")
      } else {
        html_list <- paste0(html_list, "\t<li>", temp_sentence[i], "</li>\n")
      }
      # add </ul> after the last sub-bullet
      if (i == o_indices[length(o_indices)]) {
        html_list <- paste0(html_list, "\t</ul>\n")
      }
    } else {
      html_list <- paste0(html_list, "\t<li>", temp_sentence[i], "</li>\n")
    }
  }
  print(k)
  html_list <- paste0(html_list, "</ul>\n")
  # remove the bullet markers ("• " and "o ")
  html_list <- gsub("<li>(•|o) ", "<li>", html_list)
  # Print the HTML list (wrapped lines still end up as their own <li>)
  cat(html_list)
}
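For reference, the nesting logic described above can be sketched independently of the dataframe. This is a minimal Python sketch, not R and not the asker's code; the function name bullets_to_html is hypothetical. It assumes every bullet starts with "• " and every sub-bullet with "o " (marker plus a space), and that any other non-empty line is a wrapped continuation of the previous item.

```python
import html


def bullets_to_html(text):
    """Turn '• ' bullets and 'o ' sub-bullets into a nested <ul> list."""
    tree = []          # [[parent_text, [child_text, ...]], ...]
    in_child = False   # True when a wrapped line should extend a sub-bullet
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("• "):
            tree.append([line[2:], []])
            in_child = False
        elif line.startswith("o ") and tree:
            tree[-1][1].append(line[2:])
            in_child = True
        elif tree and in_child:
            tree[-1][1][-1] += " " + line   # wrapped sub-bullet line
        elif tree:
            tree[-1][0] += " " + line       # wrapped top-level line
        else:
            tree.append([line, []])         # text before any bullet
            in_child = False

    out = ["<ul>"]
    for parent, children in tree:
        parent = html.escape(parent, quote=False)
        if not children:
            out.append(f"<li>{parent}</li>")
            continue
        # keep the parent <li> open so the nested list sits inside it
        out.append(f"<li>{parent}")
        out.append("<ul>")
        out.extend(f"<li>{html.escape(c, quote=False)}</li>" for c in children)
        out.append("</ul>")
        out.append("</li>")
    out.append("</ul>")
    return "\n".join(out)


sample = "• A first\nwrapped\n• B\no c1\no c2\nmore\n• D"
result = bullets_to_html(sample)
print(result)
```

Grouping the lines into parents and children first, and only then rendering, means several separate groups of sub-bullets in one text are handled without tracking the first and last sub-bullet index. The same per-cell logic could be applied to each row and the returned string stored in a new column.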

Related

JSON value access in jQuery

I am getting a JSON response from an Ajax page like this:
[{"City":"","Email":["khyatiramaswamy9#gmail.com"],"Father\u2019s Name":"Khyati Ramaswamy","Job_Title":[],"Name":"Khyati Ramaswamy","PERSONAL DETAILS":"Khyati Ramaswamy \n\nSobha Hillview \n\nKanakpura road\n\nBangalore - 560052\n\n+91-9920374975 (M)\n\n\n\n\n\n\n\n\n\n\tOBJECTIVE \t\n\n \n\nI seek to give my best on the work front to grow and explore my skills to contribute to the organization that offers professional growth.\n\n\n\n\tWORK EXPERIENCE\t\n\n\n\nDubai4u Investments 2019\n\n\n\nSales and Operations Manager\n\n\n\nResponsibilities \n\n\u2022 Responsible for overall operations of the company\n\n\u2022 Managing Sales Team \n\n\u2022 Managing overall coordination with Clients\n\n.\n\nHop \u2013 Jump \t\t\t\t\t\t\t\t\t\t2004 - 2006\n\n\n\nI started my own business \u2018Hop \u2013 Jump\u2019 a specialist at event planning and organizing\n\n\n\nResponsibilities\n\n\n\nI was the founder of the Company that specialized in customizing parties and events for Clients \n\nI established a strong network with caterers, decorators, anchors, performing artists, party suppliers that could help set up the event as per the specifications and budgets of the client.\n\nWe generated business through word of mouth, partnering with caterers & decorators to generate leads\n\n\n\n\n\nHathway\t\t\t\t\t\t\t\t\t\t2003 - 2004\n\n\n\nWorked a Public Relations Officer \n\n\n\nResponsibilities\n\n\n\nUndertook collections and follow up \n\n\n\n\n\nNisus Integrated Marketing Solutions Pvt Ltd. 
Media\t\t\t\t2000 - 2002\n\n\n\nWorked a Business Development Manager \n\n\n\nResponsibilities\n\n\n\nGrow the market for Nisus by working out solutions that establish it as a one destination marketing solutions provider.\n\nDrive not just regular but also new business for Nisus\n\nMarket research to recognize changing requirements and trends to offer better solutions to clients\n\nProactively understanding client requirements and offering integrated marketing solutions \n\nLooking at non convention business opportunities and tapping its potential\n\nEstablish relations with vendors who are more competitive and upbeat\n\n\n\n\n\nNisus Integrated Marketing Solutions Pvt Ltd. Media\t\t\t\t1197 - 1998\n\n\n\nWorked a Sales and development Executive \n\n\n\nResponsibilities\n\n\n\nLead Generation and follow up with potential customers to maximize business opportunities\n\nBuild and manage database to pool in more business \n\nTele marketing to establish and sell Nisus\u2019 marketing solutions across media.\n\n\n\n\n\n\tEDUCATION\t\n\n\n\nQualification: \tPostgraduate Diploma in Advertising and Marketing \n\nFrom: \t\tBharti Vidya Bhavan\u2019s Rajendra Prasad Institute of Communication and Management \t\n\nYear: \t\t2000-2001\n\n\n\nQualification: \tB. Com \n\nFrom: \t\tNagpur University\t\n\nYear: \t\t1999-2000\n\n \n\n\tINTERESTS \t\n\n\n\nReading \n\nEvents Planning and Organizing\n\nTravelling & backpacking\n\nCooking and Nutrition Planning\n\n\n\n\tPERSONAL DETAILS\t\n\n\n\nDate of Birth: 9th November 1979\n\nGender: Female\n\nMarital Status: Married\n\nLanguages Know: English, Guajarati, Hindi\n\nBasic Skills: Word, Power Point\n\nEmail Address- khyatiramaswamy9#gmail.com","Phone":["+91-9920374975"],"Skill":[],"State":"","colleges":[],"pin_code":"560052","spoken_languages":["English","Hindi"],"universities":[]}]
When I read the city value like this, alert(data.city), it comes back as undefined.
Ajax code:
<div class="col-lg-6">
<span class="pf-title">Resume</span>
<div class="pf-field">
<input type="file" name="resume" id="resume" accept=".doc,.docx,.pdf" value="" class="resume"/> <?php if($row['Upload_Resume'] == "" || $row['Upload_Resume'] == "NULL") { ?> Update Resume <?php } else{ ?> <a href="All-resumes/<?php echo $row['Upload_Resume']; ?>" target="blank" class="btn btn-danger" style="font-size: 10px;" >View Or Download Resume</a><?php } ?>
<i class="fa fa-upload"></i>
</div>
</div>
$(document).ready(function () {
  // Upload the selected resume as soon as the file input changes
  $(".resume").on('change', function () {
    var name = $("#resume").val();
    var fd = new FormData();
    var files = $('#resume')[0].files;
    if (files.length > 0) {
      fd.append('file', files[0]);
      $.ajax({
        url: 'get_resume_data.php',
        type: 'post',
        data: fd,
        contentType: false,
        processData: false,
        success: function (dataa) {
          // Parse the response text, then read the first element
          var dataVal = JSON.parse(dataa);
          alert(dataVal);
          alert(dataVal[0].City);
        },
      });
    } else {
      alert("Please select a file.");
    }
  });
});
Ajax Page code:
$filename = $_FILES['file']['name'];
$files =$filename;
$postData = curl_file_create(realpath($files),mime_content_type($files),basename($files));
$data = array('file' => $postData);
$request = curl_init('');
curl_setopt_array($request, array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => '',
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 0,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => 'POST',
CURLOPT_POSTFIELDS => $data,
));
$result = curl_exec($request);
curl_close($request);
echo $result;
This is what I am trying to do. Thanks in advance.
In comments you mention
its only alert [object][object] when alert(data),alert(data[0]),but alert(data[0].City) return null value
That is only possible if the sample data in your question does not match what the server actually returns (at least in some scenarios).
However, if we go by your sample Ajax response, we can check that it is valid JSON by assigning it directly to a variable. You wouldn't do this in your actual code; this answer only illustrates that the JSON in your sample is valid.
let data = [{"City":"Hello","Email":["khyatiramaswamy9#gmail.com"],"Father\u2019s Name":"Khyati Ramaswamy","Job_Title":[],"Name":"Khyati Ramaswamy","PERSONAL DETAILS":"Khyati Ramaswamy \n\nSobha Hillview \n\nKanakpura road\n\nBangalore - 560052\n\n+91-9920374975 (M)\n\n\n\n\n\n\n\n\n\n\tOBJECTIVE \t\n\n \n\nI seek to give my best on the work front to grow and explore my skills to contribute to the organization that offers professional growth.\n\n\n\n\tWORK EXPERIENCE\t\n\n\n\nDubai4u Investments 2019\n\n\n\nSales and Operations Manager\n\n\n\nResponsibilities \n\n\u2022 Responsible for overall operations of the company\n\n\u2022 Managing Sales Team \n\n\u2022 Managing overall coordination with Clients\n\n.\n\nHop \u2013 Jump \t\t\t\t\t\t\t\t\t\t2004 - 2006\n\n\n\nI started my own business \u2018Hop \u2013 Jump\u2019 a specialist at event planning and organizing\n\n\n\nResponsibilities\n\n\n\nI was the founder of the Company that specialized in customizing parties and events for Clients \n\nI established a strong network with caterers, decorators, anchors, performing artists, party suppliers that could help set up the event as per the specifications and budgets of the client.\n\nWe generated business through word of mouth, partnering with caterers & decorators to generate leads\n\n\n\n\n\nHathway\t\t\t\t\t\t\t\t\t\t2003 - 2004\n\n\n\nWorked a Public Relations Officer \n\n\n\nResponsibilities\n\n\n\nUndertook collections and follow up \n\n\n\n\n\nNisus Integrated Marketing Solutions Pvt Ltd. 
Media\t\t\t\t2000 - 2002\n\n\n\nWorked a Business Development Manager \n\n\n\nResponsibilities\n\n\n\nGrow the market for Nisus by working out solutions that establish it as a one destination marketing solutions provider.\n\nDrive not just regular but also new business for Nisus\n\nMarket research to recognize changing requirements and trends to offer better solutions to clients\n\nProactively understanding client requirements and offering integrated marketing solutions \n\nLooking at non convention business opportunities and tapping its potential\n\nEstablish relations with vendors who are more competitive and upbeat\n\n\n\n\n\nNisus Integrated Marketing Solutions Pvt Ltd. Media\t\t\t\t1197 - 1998\n\n\n\nWorked a Sales and development Executive \n\n\n\nResponsibilities\n\n\n\nLead Generation and follow up with potential customers to maximize business opportunities\n\nBuild and manage database to pool in more business \n\nTele marketing to establish and sell Nisus\u2019 marketing solutions across media.\n\n\n\n\n\n\tEDUCATION\t\n\n\n\nQualification: \tPostgraduate Diploma in Advertising and Marketing \n\nFrom: \t\tBharti Vidya Bhavan\u2019s Rajendra Prasad Institute of Communication and Management \t\n\nYear: \t\t2000-2001\n\n\n\nQualification: \tB. Com \n\nFrom: \t\tNagpur University\t\n\nYear: \t\t1999-2000\n\n \n\n\tINTERESTS \t\n\n\n\nReading \n\nEvents Planning and Organizing\n\nTravelling & backpacking\n\nCooking and Nutrition Planning\n\n\n\n\tPERSONAL DETAILS\t\n\n\n\nDate of Birth: 9th November 1979\n\nGender: Female\n\nMarital Status: Married\n\nLanguages Know: English, Guajarati, Hindi\n\nBasic Skills: Word, Power Point\n\nEmail Address- khyatiramaswamy9#gmail.com","Phone":["+91-9920374975"],"Skill":[],"State":"","colleges":[],"pin_code":"560052","spoken_languages":["English","Hindi"],"universities":[]}]
alert( data[0].City ); // alerts Hello, not null
I only changed the City property value of [0] to "Hello" as opposed to an empty string. However, even an empty string will not return null.
I didn't want to keep extending comments and provided this answer only to illustrate that something else must be wrong with your code that you don't show.
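The same two points (the payload is an array, and key names are case-sensitive) can also be checked outside the browser. This is a minimal Python sketch rather than jQuery, using a shortened stand-in for the response; the stand-in string is an assumption that keeps only three keys of the original object.

```python
import json

# Shortened stand-in for the response in the question
# (assumed shape: a JSON array holding one object).
raw = '[{"City": "", "Name": "Khyati Ramaswamy", "pin_code": "560052"}]'

data = json.loads(raw)

record = data[0]          # the payload is a list, so index into it first
city = record["City"]     # keys are case-sensitive: "City", not "city"

print(repr(city))             # an empty string, which is not null
print(record.get("city"))     # the lowercase key does not exist
```

Reading a lowercase city directly off the array fails for both reasons at once, which is consistent with the undefined result reported in the question.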

Computing a multinomial logistic multilevel regression using glmer in R

Problem: I'm trying to compute a multinomial logistic multilevel regression. I am trying to follow this approach:
multinomial logistic multilevel models in R
Details: Therefore I computed six separate models with glmer from the lme4 package in R.
I would like to investigate the influence that meaning in life has on people's everyday lives.
As dependent variables I have pleasant days, meaningful days, pleasant-meaningful days and meaningful-unpleasant days.
I have always made pairwise comparisons as described in the link and have always excluded the other cases.
Question I: Is my approach correct?
library(lme4)

# Model 1 (intercept only)
# comparision_1: meaningful-pleasant days vs. pleasant days
eelModel.1 <- glmer(comparision_1 ~ 1 + (1 | subject_id), data = data, family = binomial(), na.action = na.omit)
summary(eelModel.1)
# comparision_2: meaningful-pleasant days vs. meaningful days
eelModel.2 <- glmer(comparision_2 ~ 1 + (1 | subject_id), data = data, family = binomial(), na.action = na.omit)
# ... and so on.

# Model 2 (meaning_in_life as predictor)
# comparision_1: meaningful-pleasant days vs. pleasant days
Mode2.1 <- glmer(comparision_1 ~ meaning_in_life + (1 | subject_id), data = data, family = binomial(), na.action = na.omit)
summary(Mode2.1)
# comparision_2: meaningful-pleasant days vs. meaningful days
Mode2.2 <- glmer(comparision_2 ~ meaning_in_life + (1 | subject_id), data = data, family = binomial(), na.action = na.omit)
# ... and so on.
Question II: Are the estimates from the Output the log-odds? Or do I have to compute them?
Thanks for your help,
Christoph
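On Question II: binomial() uses the logit link by default, so the fixed-effect estimates in the summary are on the log-odds scale. Exponentiating a coefficient gives an odds ratio, and the inverse logit turns a log-odds value (for example an intercept) into a probability. A minimal Python sketch of that arithmetic follows; the helper names and the value 0.5 are hypothetical and are not results from the models above.

```python
import math


def odds_ratio(log_odds):
    """Exponentiate a log-odds coefficient to get an odds ratio."""
    return math.exp(log_odds)


def probability(log_odds):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


estimate = 0.5  # hypothetical coefficient, not taken from the question
print(odds_ratio(estimate))
print(probability(estimate))
```

In a multilevel model these values are conditional on the random effect, so a probability computed from the intercept applies to a subject whose random intercept is zero.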

Convert HTML to PDF in Power Automate

I am working on Power Automate and trying to convert an HTML page into a pdf file. However, before the HTML page content loads completely, the conversion process takes place. As a result, the pdf file created is either blank or has a loading symbol.
I believe the need is to add a manual delay of a few seconds between page load and pdf file conversion, but am unable to do so.
Below is the relevant 'definition' file JSON, extracted after exporting the Power Automate flow. The connector for PDF conversion can be found by searching for "operationId":"ConvertFileByPath"
{"name":"611652cf-aec0-4733-8871-b0f0f40af783","id":"/providers/Microsoft.Flow/flows/611652cf-aec0-4733-8871-b0f0f40af783","type":"Microsoft.Flow/flows","properties":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_logicflows","displayName":"Enterprise
Assessment Tool","definition":{"metadata":{"workflowEntityId":null,"creator":{"id":"4205125c-5d6f-4e96-b565-709e4a8dcbde","type":"User","tenantId":"971f0e31-00d6-4e42-b8e0-47b342bc4455"},"provisioningMethod":"FromDefinition","failureAlertSubscription":true,"clientLastModifiedTime":"2020-03-20T08:55:47.5792257Z"},"$schema":"https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#","contentVersion":"1.0.0.0","parameters":{"$connections":{"defaultValue":{},"type":"Object"},"$authentication":{"defaultValue":{},"type":"SecureObject"}},"triggers":{"When_a_new_response_is_submitted":{"type":"OpenApiConnectionWebhook","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_microsoftforms","connectionName":"shared_microsoftforms_2","operationId":"CreateFormWebhook"},"parameters":{"form_id":"MQ4fl9YAQk644EezQrxEVVwSBUJvXZZOtWVwnkqNy95UNkVCVlJZSVlHQlBEWFhOMkFJUE5PT1pSWS4u"},"authentication":"#parameters('$authentication')"}}},"actions":{"Apply_to_each":{"foreach":"#triggerOutputs()?['body/value']","actions":{"Get_response_details":{"runAfter":{},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_microsoftforms","connectionName":"shared_microsoftforms_2","operationId":"GetFormResponseById"},"parameters":{"form_id":"MQ4fl9YAQk644EezQrxEVVwSBUJvXZZOtWVwnkqNy95UNkVCVlJZSVlHQlBEWFhOMkFJUE5PT1pSWS4u","response_id":"#items('Apply_to_each')?['resourceData/responseId']"},"authentication":"#parameters('$authentication')"}},"Add_a_row_into_a_table":{"runAfter":{"Get_response_details":["Succeeded"]},"metadata":{"016AIIWUWO74RBMREYLVAIQPOYYNXRWO3P":"/Enterprise
Assessment Tool.xlsx","tableId":"{9C186D95-CBB6-477E-8699-17B3089A0368}","01BSP3ENPNPETDO5YJUVBI2X6MEG5ARSKV":"/Enterprise Assessment Tool.xlsx"},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_excelonlinebusiness","connectionName":"shared_excelonlinebusiness_1","operationId":"AddRowV2"},"parameters":{"source":"me","drive":"b!wjpSLG6KfES6F9-MfcSm-aeq5llhoTVMpfXuBQBmrywjSt-PswuwSbm1_6BG5sFo","file":"01BSP3ENPNPETDO5YJUVBI2X6MEG5ARSKV","table":"{9C186D95-CBB6-477E-8699-17B3089A0368}","item/ID":"#items('Apply_to_each')?['resourceData/responseId']","item/Your
name":"#outputs('Get_response_details')?['body/r6b6a573a836044eab553ba2e0ab92446']","item/Organization’s Name":"#outputs('Get_response_details')?['body/ra2af29025bff44bbac289f9c61c76666']","item/Your Email Address":"#outputs('Get_response_details')?['body/rf505d38dc6334aa7b69d2c77f230f09e']","item/Providing
clear and effective leadership":"#outputs('Get_response_details')?['body/r4b5eade301884d9b8629c9b38d9a3c2d']","item/Anticipating opportunities and threats to keep us ahead of change":"#outputs('Get_response_details')?['body/rca90ae117bc347f9886072f118ceb630']","item/Willingness
to take on risks for long-term growth, even if that could decrease current year profits":"#outputs('Get_response_details')?['body/rc8656608760f4e19abc1d78ecacbf825']","item/Making disciplined IT investment decisions":"#outputs('Get_response_details')?['body/r14226cb30f604f20a626157ec1786086']","item/Using
IT to gain competitive advantage":"#outputs('Get_response_details')?['body/r963c2865db4d4e90b20fa82547a48c62']","item/Articulating a clear and consistent vision to employees, consumers and partners":"#outputs('Get_response_details')?['body/r01698107f2ea4286a8f59ec20f17f3b8']","item/Enabling
the enterprise to navigate change":"#outputs('Get_response_details')?['body/r2c02b757c2444656a4c32388e2353724']","item/Fostering and changing the culture in IT":"#outputs('Get_response_details')?['body/r09d450406bfb477bb55cfd5dd572eb09']","item/Please
rate the clarity and consistency of your enterprise’s overall business strategy_x002e_":"#outputs('Get_response_details')?['body/r5bdc0ce9ba0b46348149fb2422b4e041']","item/Please indicate the nature of your organization’s CIO’s (or the most senior IT
leader’s) relationship with the CEO (or most senior Business executive)_x002e_":"#outputs('Get_response_details')?['body/r29c1767ce4bd4da6bec697ede635a908']","item/Has your organization faced any of these situations in the past four years? Please select
all that apply_x002e_":"#outputs('Get_response_details')?['body/rb4f8047c739d42938a52145e9a9cbef1']","item/Please share some details regarding the business disruption that you faced in the past four years_x002e_":"#outputs('Get_response_details')?['body/r7652d524e6524ba19fe6c629c4869aa0']","item/External
disruption of your business environment":"#outputs('Get_response_details')?['body/rd3d4bb9074a2411d9234551e0092500c']","item/Adverse regulatory intervention":"#outputs('Get_response_details')?['body/rc01f99fd45e74f5695e7fbcbcd66b6a4']","item/Cyber security
issue":"#outputs('Get_response_details')?['body/reb816f5b028b43ccb22128f0bac6bc00']","item/IT Service failure":"#outputs('Get_response_details')?['body/r95c7899a06e74c04b6a4f96fb905abe8']","item/Product~1service failure":"#outputs('Get_response_details')?['body/r15ea7200480b45febe0384a5ad4fe683']","item/Operating
cost pressure":"#outputs('Get_response_details')?['body/r952f66721ebf48b6bb9e23f64273ae6a']","item/Labor disruption":"#outputs('Get_response_details')?['body/ree57c3756cc04cf8b0d37e158256b1ef']","item/Shifting consumer demand":"#outputs('Get_response_details')?['body/r712b33b666694e80be27b2bcfb4bba86']","item/Funding
shortfall":"#outputs('Get_response_details')?['body/r493be3af3d244d638863da8b4e68fadc']","item/Organizational disruption":"#outputs('Get_response_details')?['body/r90bcccc8a1c3475186f5454677e9199a']","item/Some other disruptive business situation":"#outputs('Get_response_details')?['body/r939f57c83c054ab08d6c64c33bb5c62b']","item/The
overall business performance of the enterprise":"#outputs('Get_response_details')?['body/r729caa64609d4b998cfd200997bdca41']","item/Speed at which new business initiatives are launched":"#outputs('Get_response_details')?['body/r3061360c935d4f0093939e6b64bec6b7']","item/Ability
to fund new business initiatives":"#outputs('Get_response_details')?['body/rb186360b5d6a455d8ed624c1ab6a8ac0']","item/Speed at which business initiatives are successfully completed":"#outputs('Get_response_details')?['body/r3061360c935d4f0093939e6b64bec6b7']","item/Ability
to use data to achieve intended outcomes":"#outputs('Get_response_details')?['body/r2f1fc86f6d6f4cf6a78528bdf5b8e7e2']","item/Ability to attract the right talent to fill our needs":"#outputs('Get_response_details')?['body/rc3a23d40f20f49b78cab3880f0c8db38']","item/Ability
to get value from new business initiatives":"#outputs('Get_response_details')?['body/r4e4371bf1cdc472c8935bc398f7751b5']","item/IT budget growth":"#outputs('Get_response_details')?['body/rc0a78909625144acbc4d9d94658b0c74']","item/Operating cost competitiveness":"#outputs('Get_response_details')?['body/rac841e5b9e064ee88ec84dd54f48e40e']","item/Reputation
as an innovative enterprise":"#outputs('Get_response_details')?['body/r5ebbb6aebc8e4623bf987dca1e01afd1']","item/Our long-term viability":"#outputs('Get_response_details')?['body/rb83eaa31847f4306a54fa6375d5d544d']","item/The stability of the leadership
team (CEO and downward)":"#outputs('Get_response_details')?['body/r7ac35c66284e4215b4dd6ac173134b54']"},"authentication":"#parameters('$authentication')"}}},"runAfter":{},"type":"Foreach"},"Refresh_a_dataset":{"runAfter":{"Delay_2":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_powerbi","connectionName":"shared_powerbi","operationId":"RefreshDataset"},"parameters":{"groupid":"42cf205d-726b-418a-b227-d03cbcaa9f6b","datasetid":"36269b8f-45cd-43c7-a2fa-a3995da63c51"},"authentication":"#parameters('$authentication')"}},"Delay":{"runAfter":{"Refresh_a_dataset":["Succeeded"]},"type":"Wait","inputs":{"interval":{"count":1,"unit":"Minute"}}},"Delay_2":{"runAfter":{"Apply_to_each":["Succeeded"]},"type":"Wait","inputs":{"interval":{"count":1,"unit":"Minute"}}},"Apply_to_each_3":{"foreach":"#triggerOutputs()?['body/value']","actions":{"Get_response_details_3":{"runAfter":{},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_microsoftforms","connectionName":"shared_microsoftforms_2","operationId":"GetFormResponseById"},"parameters":{"form_id":"MQ4fl9YAQk644EezQrxEVVwSBUJvXZZOtWVwnkqNy95UNkVCVlJZSVlHQlBEWFhOMkFJUE5PT1pSWS4u","response_id":"#items('Apply_to_each_3')?['resourceData/responseId']"},"authentication":"#parameters('$authentication')"}},"Convert_HTML_to_PDF":{"runAfter":{"Get_response_details_3":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_encodiandocumentmanager","connectionName":"shared_encodiandocumentmanager","operationId":"HtmlToPDF"},"parameters":{"operation/outputFilename":"Enterprise
Fitness Assessment Report_#{outputs('Get_response_details_3')?['body/ra2af29025bff44bbac289f9c61c76666']}","operation/htmlData":"
<!DOCTYPE html>\n
<html>\n\n
<head>\n
<script>\n\nwindow.addEventListener('load', function() {\nsetInterval(function() {\ndocument.getElementById(\"delayedText\").style.visibility = \"visible\";\n},10000);\n\n}, false);\n\n/*window.onload = function(){\n \n var theDelay = 60;\n var timer = setTimeout(\"showText()\",theDelay*1000)\n}\nfunction showText(){\n document.getElementById(\"delayedText\").style.visibility = \"visible\";\n}*/\n\n\n
</script>\n</head>\n\n
<body>\n
<div id=\"delayedText\" style=\"visibility:hidden\">This is a test\n\n<iframe width=\"1140\" height=\"541.25\" src=\"https://app.powerbi.com/reportEmbed?reportId=d27f0160-eb09-442b-a7ab-ded938ed33ec&autoAuth=true&ctid=971f0e31-00d6-4e42-b8e0-47b342bc4455&config=eyJjbHVzdGVyVXJsIjoiaHR0cHM6Ly93YWJpLXdlc3QtdXMtcmVkaXJlY3QuYW5hbHlzaXMud2luZG93cy5uZXQvIn0%3D\" frameborder=\"0\" allowFullScreen=\"true\"></iframe>\n</div>\n\n</body>\n
</html>","operation/pageOrientation":"Landscape","operation/pageSize":"A4","operation/viewPort":"Default","operation/MarginTop":25,"operation/MarginBottom":25,"operation/MarginRight":25,"operation/MarginLeft":25,"operation/enableBookmarks":true,"operation/enableJavaScript":true,"operation/enableHyperlinks":true,"operation/createPdfForm":false,"operation/decodeHtmlData":true,"operation/cssType":"Screen","operation/repeatTableHeader":true,"operation/repeatTableFooter":true,"operation/splitImages":false,"operation/splitTextLines":false,"operation/encoding":"UTF8","operation/FinalOperation":true},"authentication":"#parameters('$authentication')"}},"Create_file":{"runAfter":{"Convert_HTML_to_PDF":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_onedriveforbusiness","connectionName":"shared_onedriveforbusiness","operationId":"CreateFile"},"parameters":{"folderPath":"/Enterprise
Assessment Reports","name":"#outputs('Convert_HTML_to_PDF')?['body/Filename']","body":"#outputs('Convert_HTML_to_PDF')?['body/FileContent']"},"authentication":"#parameters('$authentication')"},"runtimeConfiguration":{"contentTransfer":{"transferMode":"Chunked"}}},"Send_an_email":{"runAfter":{"Create_file":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_office365","connectionName":"shared_office365","operationId":"SendEmailV2"},"parameters":{"emailMessage/To":"#outputs('Get_response_details_3')?['body/rf505d38dc6334aa7b69d2c77f230f09e']","emailMessage/Subject":"Gartner
Enterprise Fitness Assessment Report","emailMessage/Body":"
<p>Hi #{outputs('Get_response_details_3')?['body/r6b6a573a836044eab553ba2e0ab92446']}<br>\n<br>\nThanks for submitting your response! Please view the attachement for your organisation's assessment.<br>\n<br>\nTeam PRM</p>","emailMessage/From":"Prakhar.Gupta#gartner.com","emailMessage/Attachments":[{"Name":"#outputs('Convert_HTML_to_PDF')?['body/Filename']","ContentBytes":"#outputs('Convert_HTML_to_PDF')?['body/FileContent']"}]},"authentication":"#parameters('$authentication')"}}},"runAfter":{"Send_an_email_(V2)":["Succeeded"]},"type":"Foreach"},"Send_an_email_(V2)":{"runAfter":{"Create_file_3":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_office365","connectionName":"shared_office365","operationId":"SendEmailV2"},"parameters":{"emailMessage/To":"Prakhar.Gupta#gartner.com","emailMessage/Subject":"test","emailMessage/Body":"
<!DOCTYPE html>\n
<html>\n\n
<head>\n
<script>\n\nwindow.addEventListener('load', function() {\nsetInterval(function() {\ndocument.getElementById(\"delayedText\").style.visibility = \"visible\";\n},10000);\n\n}, false);\n\n/*window.onload = function(){\n \n var theDelay = 60;\n var timer = setTimeout(\"showText()\",theDelay*1000)\n}\nfunction showText(){\n document.getElementById(\"delayedText\").style.visibility = \"visible\";\n}*/\n\n\n
</script>\n</head>\n\n
<body>\n
<div id=\"delayedText\" style=\"visibility:hidden\">This is a test\n\n<iframe width=\"1140\" height=\"541.25\" src=\"https://app.powerbi.com/reportEmbed?reportId=d27f0160-eb09-442b-a7ab-ded938ed33ec&autoAuth=true&ctid=971f0e31-00d6-4e42-b8e0-47b342bc4455&config=eyJjbHVzdGVyVXJsIjoiaHR0cHM6Ly93YWJpLXdlc3QtdXMtcmVkaXJlY3QuYW5hbHlzaXMud2luZG93cy5uZXQvIn0%3D\" frameborder=\"0\" allowFullScreen=\"true\"></iframe>\n</div>\n\n</body>\n
</html>"},"authentication":"#parameters('$authentication')"}},"Create_file_2":{"runAfter":{"Delay":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_onedriveforbusiness","connectionName":"shared_onedriveforbusiness","operationId":"CreateFile"},"parameters":{"folderPath":"/Enterprise
Assessment Reports","name":"Test.html","body":"
<!DOCTYPE html>\n
<html>\n\n
<head>\n
<script>\n\nwindow.addEventListener('load', function() {\nsetInterval(function() {\ndocument.getElementById(\"delayedText\").style.visibility = \"visible\";\n},10000);\n\n}, false);\n\n/*window.onload = function(){\n \n var theDelay = 60;\n var timer = setTimeout(\"showText()\",theDelay*1000)\n}\nfunction showText(){\n document.getElementById(\"delayedText\").style.visibility = \"visible\";\n}*/\n\n\n
</script>\n</head>\n\n
<body>\n
<div id=\"delayedText\" style=\"visibility:hidden\">This is a test\n\n<iframe width=\"1140\" height=\"541.25\" src=\"https://app.powerbi.com/reportEmbed?reportId=d27f0160-eb09-442b-a7ab-ded938ed33ec&autoAuth=true&ctid=971f0e31-00d6-4e42-b8e0-47b342bc4455&config=eyJjbHVzdGVyVXJsIjoiaHR0cHM6Ly93YWJpLXdlc3QtdXMtcmVkaXJlY3QuYW5hbHlzaXMud2luZG93cy5uZXQvIn0%3D\" frameborder=\"0\" allowFullScreen=\"true\"></iframe>\n</div>\n\n</body>\n
</html>"},"authentication":"#parameters('$authentication')"},"runtimeConfiguration":{"contentTransfer":{"transferMode":"Chunked"}}},"Delay_3":{"runAfter":{"Create_file_2":["Succeeded"]},"type":"Wait","inputs":{"interval":{"count":3,"unit":"Minute"}}},"Convert_file_using_path":{"runAfter":{"Delay_3":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_onedriveforbusiness","connectionName":"shared_onedriveforbusiness","operationId":"ConvertFileByPath"},"parameters":{"path":"#outputs('Create_file_2')?['body/Path']","type":"PDF"},"authentication":"#parameters('$authentication')"}},"Create_file_3":{"runAfter":{"Convert_file_using_path":["Succeeded"]},"type":"OpenApiConnection","inputs":{"host":{"apiId":"/providers/Microsoft.PowerApps/apis/shared_onedriveforbusiness","connectionName":"shared_onedriveforbusiness","operationId":"CreateFile"},"parameters":{"folderPath":"/Enterprise
Assessment Reports","name":"#outputs('Convert_file_using_path')?['headers/x-ms-file-name']","body":"#outputs('Convert_file_using_path')?['body']"},"authentication":"#parameters('$authentication')"},"runtimeConfiguration":{"contentTransfer":{"transferMode":"Chunked"}}}},"outputs":{},"description":"Track
Microsoft Forms responses in an Excel Online (Business) spreadsheet. The spreadsheet must have columns: SubmissionTime, ResponderEmail."},"connectionReferences":{"shared_microsoftforms_2":{"connectionName":"shared-microsoftform-ff875ca3-62f2-4c71-bed9-8d02ce26ada2","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_microsoftforms","tier":"NotSpecified"},"shared_excelonlinebusiness_1":{"connectionName":"shared-excelonlinebu-aabd11c2-e15f-4595-a539-d4ffe5ecd544","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_excelonlinebusiness","tier":"NotSpecified"},"shared_powerbi":{"connectionName":"shared-powerbi-07a589e5-e541-4241-83c7-2e5ba184ec9f","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_powerbi","tier":"NotSpecified"},"shared_encodiandocumentmanager":{"connectionName":"shared-encodiandocum-29f09c50-052b-4d59-8c60-7876ab0cf806","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_encodiandocumentmanager","tier":"NotSpecified"},"shared_onedriveforbusiness":{"connectionName":"shared-onedriveforbu-2262692c-87ba-4be3-a32d-febd64f70219","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_onedriveforbusiness","tier":"NotSpecified"},"shared_office365":{"connectionName":"shared-office365-28369e56-7ed4-431e-9b7b-ba4930b0f010","source":"Embedded","id":"/providers/Microsoft.PowerApps/apis/shared_office365","tier":"NotSpecified"}},"flowFailureAlertSubscribed":false}}
You can use the Run after property on the action to get the HTML.
Once that action has returned, you can run the PDF conversion.

Extract academic publication information from IDEAS

I want to extract the list of publications from a specific IDEAS page, retrieving the title, authors, and year of each paper, but I am a bit stuck. Inspecting the page shows that all the information is inside the div class="tab-pane fade show active" [...]. Within it, each h3 holds a year of publication, and each li class="list-group-item downfree" [...] holds one paper with its author(s) (as shown in this image). In the end, what I want is a dataframe with three columns: title, author, and year.
Nonetheless, while I am able to retrieve each paper's name, I get confused when I also try to add the year and author(s). What I have written so far is the following short code:
from requests import get
url = 'https://ideas.repec.org/s/rtr/wpaper.html'
response = get(url)
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
containers = soup.findAll("div", {'class': 'tab-pane fade show active'})
title_list = []
year_list = []
for container in containers:
    year = container.findAll('h3')
    year_list.append(int(year[0].text))
    title_containers = container.findAll("li", {'class': 'list-group-item downfree'})
    title = title_containers[0].a.text
    title_list.append(title)
What I get are two lists of only one element each. This is because the initial containers has a size of 1. As for retrieving the author name(s), I have no idea; I tried several ways without success. I think I have to split the titles using 'by' as a separator.
I hope someone can help me or redirect me to another discussion that deals with a similar situation. Thank you in advance, and apologies for my (probably) silly question; I am still a beginner in web scraping with BeautifulSoup.
You can get the desired information like this:
from requests import get
import pprint
from bs4 import BeautifulSoup
url = 'https://ideas.repec.org/s/rtr/wpaper.html'
response = get(url)
soup = BeautifulSoup(response.text, 'html.parser')
container = soup.select_one("#content")
title_list = []
author_list = []
year_list = [int(h.text) for h in container.find_all('h3')]
for panel in container.select("div.panel-body"):
    title_list.append([x.text for x in panel.find_all('a')])
    author_list.append([x.next_sibling.strip() for x in panel.find_all('i')])
result = list(zip(year_list, title_list, author_list))
pp = pprint.PrettyPrinter(indent=4, width=250)
pp.pprint(result)
outputs:
[ ( 2020,
['The Role Of Public Procurement As Innovation Lever: Evidence From Italian Manufacturing Firms', 'A voyage in the role of territory: are territories capable of instilling their peculiarities in local production systems'],
['Francesco Crespi & Serenella Caravella', 'Cristina Vaquero-Piñeiro']),
( 2019,
[ 'Probability Forecasts and Prediction Markets',
'R&D Financing And Growth',
'Mission-Oriented Innovation Policies: A Theoretical And Empirical Assessment For The Us Economy',
'Public Investment Fiscal Multipliers: An Empirical Assessment For European Countries',
'Consumption Smoothing Channels Within And Between Households',
'A critical analysis of the secular stagnation theory',
'Further evidence of the relationship between social transfers and income inequality in OECD countries',
'Capital accumulation and corporate portfolio choice between liquidity holdings and financialisation'],
[ 'Julia Mortera & A. Philip Dawid',
'Luca Spinesi & Mario Tirelli',
'Matteo Deleidi & Mariana Mazzucato',
'Enrico Sergio Levrero & Matteo Deleidi & Francesca Iafrate',
'Simone Tedeschi & Luigi Ventura & Pierfederico Asdrubal',
'Stefano Di Bucchianico',
"Giorgio D'Agostino & Luca Pieroni & Margherita Scarlato",
'Giovanni Scarano']),
( 2018, ...
I got the years using a list comprehension. For the titles and authors, I appended a list to title_list and author_list for each div element with the class panel-body, again using list comprehensions: the a elements give the titles, and next_sibling of each i element gives the authors. Then I zipped the three lists and cast the result to a list. Finally, I pretty-printed the result.
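The question asked for three columns (title, author, year), while result above groups the papers by year. A minimal sketch of flattening that structure into one row per paper follows. The helper name flatten_result is hypothetical, the sample data is copied from the output above, and it assumes the title and author lists for each year have the same length.

```python
def flatten_result(result):
    """Turn [(year, [titles], [authors]), ...] into one dict per paper."""
    rows = []
    for year, titles, authors in result:
        # zip pairs each title with the author string at the same position;
        # this assumes both lists are the same length within a year
        for title, author in zip(titles, authors):
            rows.append({"title": title, "author": author, "year": year})
    return rows


# Sample data copied from the output above (both 2020 entries, first 2019 entry)
sample = [
    (2020,
     ["The Role Of Public Procurement As Innovation Lever: Evidence From Italian Manufacturing Firms",
      "A voyage in the role of territory: are territories capable of instilling their peculiarities in local production systems"],
     ["Francesco Crespi & Serenella Caravella", "Cristina Vaquero-Piñeiro"]),
    (2019,
     ["Probability Forecasts and Prediction Markets"],
     ["Julia Mortera & A. Philip Dawid"]),
]

rows = flatten_result(sample)
for row in rows:
    print(row["year"], "|", row["title"], "|", row["author"])
```

If pandas is installed, passing rows to pandas.DataFrame should give the three-column dataframe directly, since it accepts a list of dicts.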

R Parses incomplete text from webpages (HTML)

I am trying to parse the plain text from multiple scientific articles for subsequent text analysis. So far I have been using an R script by Tony Breyal based on the packages RCurl and XML. This works fine for all targeted journals except those published by http://www.sciencedirect.com. When I try to parse articles from SD (and this is consistent for all the SD journals I have tested), the text object in R stores only the first part of the document. Unfortunately, I am not too familiar with HTML, but I think the problem lies in the SD HTML code, since it works in all other cases.
I am aware that some journals are not open access, but I have access authorisation, and the problem also occurs in open access articles (check the example).
This is the code from Github:
htmlToText <- function(input, ...) {
  ###---PACKAGES ---###
  require(RCurl)
  require(XML)
  ###--- LOCAL FUNCTIONS ---###
  # Determine how to grab html for a single input element
  evaluate_input <- function(input) {
    # if input is a .html file
    if(file.exists(input)) {
      char.vec <- readLines(input, warn = FALSE)
      return(paste(char.vec, collapse = ""))
    }
    # if input is html text
    if(grepl("</html>", input, fixed = TRUE)) return(input)
    # if input is a URL, probably should use a regex here instead?
    if(!grepl(" ", input)) {
      # download SSL certificate in case of https problem
      if(!file.exists("cacert.perm")) download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.perm")
      return(getURL(input, followlocation = TRUE, cainfo = "cacert.perm"))
    }
    # return NULL if none of the conditions above apply
    return(NULL)
  }
  # convert HTML to plain text
  convert_html_to_text <- function(html) {
    doc <- htmlParse(html, asText = TRUE)
    text <- xpathSApply(doc, "//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]", xmlValue)
    return(text)
  }
  # format text vector into one character string
  collapse_text <- function(txt) {
    return(paste(txt, collapse = " "))
  }
  ###--- MAIN ---###
  # STEP 1: Evaluate input
  html.list <- lapply(input, evaluate_input)
  # STEP 2: Extract text from HTML
  text.list <- lapply(html.list, convert_html_to_text)
  # STEP 3: Return text
  text.vector <- sapply(text.list, collapse_text)
  return(text.vector)
}
This is now my code and an example article:
target <- "http://www.sciencedirect.com/science/article/pii/S1754504816300319"
temp.text <- htmlToText(target)
The unformatted text stops somewhere in the Method section:
DNA was extracted using the MasterPure™ Yeast DNA Purification Kit
(Epicentre, Madison, Wisconsin, USA) following the manufacturer's
instructions.
Any suggestions/ideas?
P.S. I also tried html_text based on rvest with the same outcome.
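For comparison, the text-node filter used in convert_html_to_text above (keep text unless it sits under script, style, noscript, or form) can be sketched with Python's standard library. This is only an illustration of the filtering idea: it swaps the XPath approach for html.parser, the class and function names are hypothetical, and unlike the R code it also drops whitespace-only chunks. It does not address the truncated download itself.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are not inside script/style/noscript/form."""

    SKIPPED = {"script", "style", "noscript", "form"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # how many skipped elements we are currently inside
        self.chunks = []      # text kept so far

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # keep only non-empty text outside any skipped element
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # same final step as collapse_text: join the pieces with single spaces
    return " ".join(parser.chunks)


sample_html = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<p>DNA was extracted.</p><script>var x = 1;</script>"
    "<form><input value='q'>Search</form><p>Results follow.</p>"
    "</body></html>"
)
print(html_to_text(sample_html))
```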
You can probably use your existing code and just add ?np=y to the end of the URL, but this is a bit more compact:
library(rvest)
library(stringi)
target <- "http://www.sciencedirect.com/science/article/pii/S1754504816300319?np=y"
pg <- read_html(target)
pg %>%
  html_nodes(xpath=".//div[@id='centerContent']//child::node()/text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]") %>%
  stri_trim() %>%
  paste0(collapse=" ") %>%
  write(file="output.txt")
A bit of the output (total for that article was >80K):
Fungal Ecology Volume 22 , August 2016, Pages 61–72 175394|| Species richness
influences wine ecosystem function through a dominant species Primrose J. Boynton a , , ,
Duncan Greig a , b a Max Planck Institute for Evolutionary Biology, Plön, 24306, Germany
b The Galton Laboratory, Department of Genetics, Evolution, and Environment, University
College London, London, WC1E 6BT, UK Received 9 November 2015, Revised 27 March 2016,
Accepted 15 April 2016, Available online 1 June 2016 Corresponding editor: Marie Louise
Davey Abstract Increased species richness does not always cause increased ecosystem function.
Instead, richness can influence individual species with positive or negative ecosystem effects.
We investigated richness and function in fermenting wine, and found that richness indirectly
affects ecosystem function by altering the ecological dominance of Saccharomyces cerevisiae .
While S. cerevisiae generally dominates fermentations, it cannot dominate extremely species-rich
communities, probably because antagonistic species prevent it from growing. It is also diluted
from species-poor communities,