I am trying to scrape text from a news article - I am doing this as follows:
library(rvest)
url <- "https://www.bbc.com/future/article/20220823-how-auckland-worlds-most-spongy-city-tackles-floods"
final <- url %>%
  read_html() %>%
  html_elements(".article__body-content p") %>%
  html_text()
This seems to have worked, but I am trying to combine the results of this code into a single object. For example, the current results look like this:
[1] "Tangled mats of muddy vegetation line the footpaths of Underwood Park, a narrow stripe of green winding along a creek beneath the small volcanic cone of Ōwairaka (Mt Albert) in Auckland, New Zealand. In the water, clumps of sticks and the occasional plastic bag are marooned on protruding rocks and branches."
[2] "A winter storm swept through the city overnight, dropping heavy rain, and Te Auaunga (Oakley Creek), one of the city’s longest urban streams, has overflowed its banks."
[3] "\"But that’s supposed to happen,\" says Julie Fairey, chair of the Puketāpapa local board, who is showing me around Underwood and the neighbouring Walmsley Park."
I would like to make a single object of this text, with the surrounding quotation marks removed - for example:
final <- "Tangled mats of muddy vegetation line the footpaths of Underwood Park, a narrow stripe of green winding along a creek beneath the small volcanic cone of Ōwairaka (Mt Albert) in Auckland, New Zealand. In the water, clumps of sticks and the occasional plastic bag are marooned on protruding rocks and branches.
A winter storm swept through the city overnight, dropping heavy rain, and Te Auaunga (Oakley Creek), one of the city’s longest urban streams, has overflowed its banks.
But that’s supposed to happen, says Julie Fairey, chair of the Puketāpapa local board, who is showing me around Underwood and the neighbouring Walmsley Park."
When I inspected the results, I initially thought they formed a list - but the object is actually a character vector. Had it been a list, I could have used the "unlist" command. Now I am not sure how to proceed.
Can someone please show me how to proceed?
Thanks!
The output from html_text() is a character vector, with one string per paragraph. We can join them into a single string with paste() and its collapse argument.
library(rvest)
library(magrittr)
final <- url %>%
  read_html() %>%
  html_elements(".article__body-content p") %>%
  html_text() %>%
  paste(collapse = "\n")
Now, we check the output:
cat(final, sep = "\n")
Tangled mats of muddy vegetation line the footpaths of Underwood Park, a narrow stripe of green winding along a creek beneath the small volcanic cone of Ōwairaka (Mt Albert) in Auckland, New Zealand. In the water, clumps of sticks and the occasional plastic bag are marooned on protruding rocks and branches.
A winter storm swept through the city overnight, dropping heavy rain, and Te Auaunga (Oakley Creek), one of the city’s longest urban streams, has overflowed its banks.
"But that’s supposed to happen," says Julie Fairey, chair of the Puketāpapa local board, who is showing me around Underwood and the neighbouring Walmsley Park.
The connected parks are designed to collect excess stormwater, soak it up like a sponge, and slowly release it back into the creek. The debris left behind is evidence this "secret infrastructure" is working, Fairey says. The two parks are flanked on both sides by public housing developments. "This stuff is designed to flood so that the houses don’t," she says.
It wasn’t always this way, Fairey tells me, as we watch a black shag drying its wings on a rock. Less than a decade ago, the waterway was a concrete-lined culvert that ran through seldom-visited muddy fields. When it flooded, water sloshed into the surrounding suburbs. It collected engine oil, sediment and rubbish and sucked this unhealthy mixture out into the city’s famous harbour, rendering the beaches unsafe to swim.
But in 2016, work began to free Te Auaunga from rigid concrete, and restore it to a more natural, meandering shape. Its banks are now lush with native vegetation like harakeke (flax) and tī kouka (cabbage trees), as well as reeds, ferns and other filtering wetland plants.
The changes have increased this part of the city’s ability to absorb excess rainfall, an attribute sometimes called “sponginess”. Auckland was recently named the most spongy global city in a report by multinational architecture and design firm Arup, thanks to its geography, soil type, and urban design – but experts warn it may not lead the pack for long.
As climate change intensifies extreme weather events worldwide, what can other cities learn from Auckland's successes – and failures?
The connected parks around Te Auaunga creek in Auckland are designed to soak up excess stormwater like a sponge (Credit: Kate Evans)
....
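As a side note, if you would rather end up with one flowing block of text than newline-separated paragraphs, collapsing with a single space works the same way (a minimal variant of the pipeline above):

# Same pipeline, but joining the paragraphs with spaces instead of newlines
final_one_para <- url %>%
  read_html() %>%
  html_elements(".article__body-content p") %>%
  html_text() %>%
  paste(collapse = " ")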
I'm currently having a small problem with getting containers to work properly in Inform 7. I've created a bed, which is marked as an enterable container. When the scene begins, the player starts out in the bed, and that appears to work OK. You can leave the container just fine. BUT you can't re-enter it. If I try to enter/get in/go inside the bed, I get a message saying "you can't go that way" and I don't understand why. I'm quite new, so there's probably something super simple that I'm missing here. Any ideas? Thanks so much!
Here's my code:
A Warm Cabin is a room. "A one room cabin with a fireplace on the south wall. There's a single window, frosted over from the cold."
Coming to is a scene. Coming to begins when Unconscious ends.
When coming to begins:
    move the player to the double bed;
    try waking up;
    continue the action.
The Double Bed is a container. The double bed is in a warm cabin. The double bed is enterable and fixed in place. The description of the double bed is "A double bed strewn with soft, botanically-embroidered quilts and over-fluffed pillows. . . It's quite cozy."
Instead of entering the double bed: try going inside.
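A hedged guess at the culprit, based only on the code shown: "try going inside" converts every attempt to enter the bed into the going action, with "inside" treated as a direction. Since the Warm Cabin has no room mapped inside, the going action fails with the standard "You can't go that way" response. Deleting that final Instead rule should let the built-in entering action handle the enterable container; a minimal sketch to test the idea in isolation:

A Warm Cabin is a room.

The Double Bed is an enterable container in the Warm Cabin. It is fixed in place.

Test me with "enter bed / exit / get in bed".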
I am setting up JSON-LD tags on my product pages for Facebook's "Dynamic Ads & Commerce".
https://developers.facebook.com/docs/marketing-api/catalog/reference/
Two tags it states are required:

id - Required for dynamic ads and commerce.
availability - Required for dynamic ads and commerce.
I've added these, but it causes Google to reject my JSON-LD markup:
Invalid enum value in field 'availability'
How can I please both parties here? If I am adding JSON-LD markup, it may as well have all the required elements to be valid for both Google and Facebook (Catalog).
[
{
"@context":"http:\/\/www.schema.org",
"@type":"Product",
"name":"Silentnight Safe Nights Toddler Bedset – 4.5 Tog",
"title":"Silentnight Safe Nights Toddler Bedset – 4.5 Tog",
"url":"http:\/\/myexampledomain.co.uk\/product\/silentnight-safe-nights-toddler-bedset-4-5-tog\/",
"link":"http:\/\/myexampledomain.co.uk\/product\/silentnight-safe-nights-toddler-bedset-4-5-tog\/",
"image":"http:\/\/myexampledomain.co.uk\/app\/uploads\/sites\/3\/2019\/10\/silentnight-safe-nights-toddler-bedset-4.5-tog-pack.jpg",
"image_link":"http:\/\/myexampledomain.co.uk\/app\/uploads\/sites\/3\/2019\/10\/silentnight-safe-nights-toddler-bedset-4.5-tog-pack.jpg",
"description":"The Silentnight Toddler Duvet and Pillow set makes a great first bed set for your little one. The cot bed sized set includes a snuggly duvet and a soft pillow that is specially designed for children. The slim profile of the pillow offers your child just the right amount of support they need. It's important that small children don't overheat in bed, so the light and soft duvet comes in a 4.5 tog weight rating. Both the pillow and duvet are made from a smooth polycotton cover offering breathable comfort. The products are filled with anti-allergy hollowfibre that actively defends against the bacteria and dust mites that can cause allergies providing a cleaner, fresher and altogether a safer option for a good night's sleep. Our anti-allergy fibres are approved by the British Allergy Foundation, which means they have the ultimate seal of approval. Our fibres have been scientifically tested and are proven to reduce or remove allergens from the indoor environment.\nSuitable only for children over 12 months. Both the pillow and duvet are fully machine washable, and thanks to the easy care polycotton covers, the products have great recovery and can be washed time and time again.",
"sku":"506984LS",
"id":17214,
"productID":17214,
"offers":[
{
"@type":"Offer",
"price":"30.99",
"priceCurrency":"GBP",
"url":"http:\/\/myexampledomain.co.uk\/product\/silentnight-safe-nights-toddler-bedset-4-5-tog\/",
"gtin":"5012701506984",
"gtin8":"5012701506984",
"condition":"new",
"availability":"in stock",
"inventoryLevel":54
}
],
"brand":"Silentnight"
}
]
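Google's complaint is about the value rather than the field: for a schema.org Offer, availability is expected to be an ItemAvailability enumeration value such as https://schema.org/InStock, not the bare string "in stock" (and itemCondition likewise takes values like https://schema.org/NewCondition). Facebook also supports schema.org-style product markup, and as far as I can tell its catalog scraper recognises those enumeration URLs too, so switching to them should satisfy both parties - worth confirming in Facebook's catalog diagnostics. A sketch of the offer block with just those values changed (all other keys as in the markup above):

{
"@type":"Offer",
"price":"30.99",
"priceCurrency":"GBP",
"itemCondition":"https:\/\/schema.org\/NewCondition",
"availability":"https:\/\/schema.org\/InStock",
"url":"http:\/\/myexampledomain.co.uk\/product\/silentnight-safe-nights-toddler-bedset-4-5-tog\/"
}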
I'm trying to get a username from an Instagram page, and I need to use part of the data I get from "data = soup.find_all('script')[3]".
It looks like this:
(script type="text/javascript">window._sharedData = {"config":{"csrf_token":"hIuZDxW17bTXz5EDLY25ftqivOOrLEeZ","viewer":null,"viewerId":null},"supports_es6":false,"country_code":"RU","language_code":"en","locale":"en_US","entry_data":{"PostPage":[{"graphql":{"shortcode_media":{"__typename":"GraphImage","id":"1968747493659350883","shortcode":"BtSZWokAZdj","dimensions":{"height":640,"width":640},"gating_info":null,"media_preview":"ACoq5miitSxxIGTHPXPGcd8ZFAGXRXSSWypFsAAZ/lzjpn/Csm5sjAu7Ib8MUAUaKU0lABVq0lMUqsPUA/Q8VVpynBB9CKAOtuOFB9CD+uP5Gq19HuiOPTP5Ul1exhdgy7kdF7fU/wCGatJiRPqv5ZFIZybnP4UynOpUlT1HFNpiClDFeRSUUATLcSJ904+lPF5MvR2H41WooAc7lzuY5J702iigD//Z","display_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","display_resources":[{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":640,"config_height":640},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":750,"config_height":750},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":1080,"config_height":1080}],"accessibility_caption":"Image may contain: one or more people and closeup","is_video":false,"should_log_client_event":false,"tracking_token":"eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiN2Q1Yjg2NmY5OGIwNDVhNWIxMmRhNjEwZTA3NDY1MmYxOTY4NzQ3NDkzNjU5MzUwODgzIn0sInNpZ25hdHVyZSI6IiJ9","edge_media_to_tagged_user":{"edges":[]},"edge_media_to_caption":{"edges":[{"node":{"text":"\u2022\nScars show your story. \nYour pain. \nYour hate.\nYour sadness and despair. \nThey make you who you are, and one of a kind with every different mark. \nSome stay, some go.\nSome brighter, some lighter.\nSome bigger, some smaller.\nSome deeper, some one the surface. \nBut they are really all the same, you see?\nThey are all scars, just telling different points of our life, our story. \nOur souvenir throughout our whole life, that shows us how much we've grown. \nHow much we have overcome. How strong we've become.\nHow brave and courageous we've become from the hardest and darkest times of our life. 
\u2022\n\u2022\n\u2022\n\u2022\n#poem #cuts #selfharm #tatoo #dark #pain #sad #lonely #anxiety #depressed"}}]},"caption_is_edited":true,"has_ranked_comments":false,"edge_media_to_comment":{"count":1,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"comments_disabled":false,"taken_at_timestamp":1548913011,"edge_media_preview_like":{"count":17,"edges":[]},"edge_media_to_sponsor_user":{"edges":[]},"location":null,"viewer_has_liked":false,"viewer_has_saved":false,"viewer_has_saved_to_collection":false,"viewer_in_photo_of_you":false,"viewer_can_reshare":true,"owner":{"id":"10173498181","is_verified":false,"profile_pic_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/9a17134e8d0a36efec53f1da5cac1f38/5D14BC0F/t51.2885-19/s150x150/47690762_475199173011446_4764198224049209344_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","username":"devils..tea.","blocked_by_viewer":false,"followed_by_viewer":false,"full_name":"depressed\ud83e\udd40","has_blocked_viewer":false,"is_private":false,"is_unpublished":false,"requested_by_viewer":false}......
There is a "username" part near the end of the blockquote. I thought it was a string, but I can't catch it. So if it's not a string, what is it - a class? Which method should I use to retrieve the username from "username":"devils..tea."? Thank you in advance for any help.
....
import requests
from bs4 import BeautifulSoup

req = requests.get(url)  # url is the Instagram post address (defined earlier)
soup = BeautifulSoup(req.text, "lxml")
data = soup.find_all('script')[3]
# find_all_next(string="username") only matches whole strings equal to "username",
# so it comes back empty here
username = data.find_all_next(string="username")
print(username)
You could use a regex:
import re
data = '''
(script type="text/javascript">window._sharedData = {"config":{"csrf_token":"hIuZDxW17bTXz5EDLY25ftqivOOrLEeZ","viewer":null,"viewerId":null},"supports_es6":false,"country_code":"RU","language_code":"en","locale":"en_US","entry_data":{"PostPage":[{"graphql":{"shortcode_media":{"__typename":"GraphImage","id":"1968747493659350883","shortcode":"BtSZWokAZdj","dimensions":{"height":640,"width":640},"gating_info":null,"media_preview":"ACoq5miitSxxIGTHPXPGcd8ZFAGXRXSSWypFsAAZ/lzjpn/Csm5sjAu7Ib8MUAUaKU0lABVq0lMUqsPUA/Q8VVpynBB9CKAOtuOFB9CD+uP5Gq19HuiOPTP5Ul1exhdgy7kdF7fU/wCGatJiRPqv5ZFIZybnP4UynOpUlT1HFNpiClDFeRSUUATLcSJ904+lPF5MvR2H41WooAc7lzuY5J702iigD//Z","display_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","display_resources":[{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":640,"config_height":640},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":750,"config_height":750},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":1080,"config_height":1080}],"accessibility_caption":"Image may contain: one or more people and closeup","is_video":false,"should_log_client_event":false,"tracking_token":"eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiN2Q1Yjg2NmY5OGIwNDVhNWIxMmRhNjEwZTA3NDY1MmYxOTY4NzQ3NDkzNjU5MzUwODgzIn0sInNpZ25hdHVyZSI6IiJ9","edge_media_to_tagged_user":{"edges":[]},"edge_media_to_caption":{"edges":[{"node":{"text":"\u2022\nScars show your story. \nYour pain. \nYour hate.\nYour sadness and despair. \nThey make you who you are, and one of a kind with every different mark. \nSome stay, some go.\nSome brighter, some lighter.\nSome bigger, some smaller.\nSome deeper, some one the surface. \nBut they are really all the same, you see?\nThey are all scars, just telling different points of our life, our story. \nOur souvenir throughout our whole life, that shows us how much we've grown. \nHow much we have overcome. How strong we've become.\nHow brave and courageous we've become from the hardest and darkest times of our life. 
\u2022\n\u2022\n\u2022\n\u2022\n#poem #cuts #selfharm #tatoo #dark #pain #sad #lonely #anxiety #depressed"}}]},"caption_is_edited":true,"has_ranked_comments":false,"edge_media_to_comment":{"count":1,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"comments_disabled":false,"taken_at_timestamp":1548913011,"edge_media_preview_like":{"count":17,"edges":[]},"edge_media_to_sponsor_user":{"edges":[]},"location":null,"viewer_has_liked":false,"viewer_has_saved":false,"viewer_has_saved_to_collection":false,"viewer_in_photo_of_you":false,"viewer_can_reshare":true,"owner":{"id":"10173498181","is_verified":false,"profile_pic_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/9a17134e8d0a36efec53f1da5cac1f38/5D14BC0F/t51.2885-19/s150x150/47690762_475199173011446_4764198224049209344_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","username":"devils..tea.","blocked_by_viewer":false,"followed_by_viewer":false,"full_name":"depressed\ud83e\udd40","has_blocked_viewer":false,"is_private":false,"is_unpublished":false,"requested_by_viewer":false}......
'''
# non-greedy, so the match stops at the first "blocked" key after each username
r = re.compile(r'"username":"(.*?)(?=","blocked)')
print(r.findall(data))
Or, for those of us who don't like regex (nudge, nudge #QHarr :D), you can try this:
data = [your quote above]
data_list = data.split(",")
for i in data_list:
    if 'username' in i:
        print(i)
Output:
"username":"devils..tea."
I have to do this repeatedly, so I was wondering if there was a workaround....
Here is a set of amenities for an apartment; they have to be transferred to a larger list containing those amenities by clicking off multiple choice circles. Is there a way to edit the script to do it all at once?
Example:
accessible
air conditioning
dishwasher
garage
hardwood floors
parking
patio / balcony
gym
in unit laundry
cats allowed
dogs allowed
pet friendly
basketball court
bathtub
bbq/grill
bike storage
business center
carpet
ceiling fan
clubhouse
game room
granite counters
microwave
oven
package receiving
playground
pool table
range
refrigerator
stainless steel
walk in closets
I think you just want to check all of the checkboxes on the amenities list, so something like this will work:
// tick every amenity checkbox
$('div.whitelist-searchable-amenity input[type=checkbox]').each(function(){
    $(this).prop('checked', true);
});
https://jsfiddle.net/r62qgou3/
Edit:
I am not sure what you mean by "copying text to script", but I modified the script to check whether the values you are looking for are present in the document. It searches the values of the checkboxes, and you can then use your own script to "copy" each matching value elsewhere. Just add more values to the array if you need more keywords.
// keywords to look for; add more values to the array as needed
var x = [
    "accessible",
    "air conditioning",
    "dishwasher",
    "garage"
];

$('div.whitelist-searchable-amenity input[type=checkbox]').each(function(){
    if (x.indexOf(this.value) > -1) {
        // this checkbox's value matched a keyword - "copy" it here
        alert("copy here");
    }
});
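If the end goal is to tick only the amenities from your own list and carry their values across, a variant that combines the two snippets above (same selector assumption) could look like this:

// tick only the boxes whose values match the keyword list, collecting the values as we go
var wanted = ["accessible", "air conditioning", "dishwasher", "garage"];
var copied = [];

$('div.whitelist-searchable-amenity input[type=checkbox]').each(function(){
    if (wanted.indexOf(this.value) > -1) {
        $(this).prop('checked', true);  // check the matching box
        copied.push(this.value);        // keep the value for the larger list
    }
});

console.log(copied);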