How to extract something I want in html using 'xpath'

The HTML looks like this:
<img alt="Papa's Cupcakeria To Go!" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-old-hires="" class="a-dynamic-image a-stretch-vertical" id="landingImage" data-a-dynamic-image="{"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L.png":[512,512],"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SX425_.png":[425,425],"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SX466_.png":[466,466],"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SY450_.png":[450,450],"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SY355_.png":[355,355]}" style="max-width:512px;max-height:512px;">
I want to get "https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L.png". Right now I'm using
extract_item(hxs.xpath("//img[@id='landingImage']/@data-a-dynamic-image"))
but what I get is the entire content of that attribute.
How can I get only the first URL?

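If you just want the first URL:
# This assumes the extracted value still contains &quot; entities, so splitting on ";" isolates each URL
full_content = extract_item(hxs.xpath("//img[@id='landingImage']/@data-a-dynamic-image"))
list_contents = full_content.split(";")
# list_contents[0] is "{&quot"; list_contents[1] is the first URL followed by "&quot"
first_image = list_contents[1].replace("&quot", "")
print(first_image)
Also, you can refer to this for extracting a URL using a regex.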
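An alternative to splitting on ";" is to treat the attribute value as what it appears to be: a JSON object mapping each image URL to its [width, height]. The sketch below is only an illustration under that assumption; first_image_url and sample are hypothetical names, and in practice the input would be whatever extract_item returns.

```python
import html
import json


def first_image_url(attr_value):
    """Return the first URL key of a data-a-dynamic-image value, or None."""
    # Decode &quot; entities in case the value came from raw markup;
    # this leaves the text unchanged if the parser already decoded it.
    decoded = html.unescape(attr_value)
    # Assumption: the attribute holds a JSON object of URL -> [width, height].
    sizes = json.loads(decoded)
    # Dicts keep insertion order (Python 3.7+), so the first key is the
    # first URL as written in the attribute.
    return next(iter(sizes), None)


# Shortened sample value taken from the question's HTML.
sample = (
    "{&quot;https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L.png&quot;:[512,512],"
    "&quot;https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SX425_.png&quot;:[425,425]}"
)
print(first_image_url(sample))
```

This avoids depending on the position of ";" characters, which only exist while the entities are still encoded.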

Related

how to make img link to a simple code (Html and Css)

I want to replace all my img links with a short word/code in HTML and CSS.
Example:
//Not like this
<img src="https://img1.com">
<img src="https://img2.com">
<img src="https://img3.com">
//I want to do something a little bit more like this instead
value01 = https://img1.com
value02 = https://img2.com
value03 = https://img3.com
<img src="value01">
<img src="value02">
<img src="value03">
I don't know what to do; I am new to HTML and CSS.

I don't think you can do this in plain HTML, because the <img> tag simply embeds the image named in its src attribute. Maybe you could do it in Python. Otherwise, you can do this:
<b>
<img src="value1.jpg" alt="Value1">
</b>
Source:
img tag html
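If generating the page with Python is an option, as hinted above, a rough sketch could look like this. It uses the standard library's string.Template; IMAGE_URLS and page are hypothetical names, and the URLs are the placeholders from the question.

```python
from string import Template

# Hypothetical name-to-URL mapping, mirroring value01..value03 in the question.
IMAGE_URLS = {
    "value01": "https://img1.com",
    "value02": "https://img2.com",
    "value03": "https://img3.com",
}

# The page refers to images by short name; substitute() fills in the URLs.
page = Template(
    '<img src="$value01">\n'
    '<img src="$value02">\n'
    '<img src="$value03">'
)
html_out = page.substitute(IMAGE_URLS)
print(html_out)
```

The resulting text would still have to be written to an .html file or served by a web framework; the browser itself never sees the short names.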

There are two ways I can think of.
Note: the first option only works if you save the page with a .php extension.
Method 1: create a separate PHP file and store the image links in variables, like this:
<?php
$img = 'https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/1200px-Python-logo-notext.svg.png';
?>
Next, include this file in the main page/index:
<?php
include "./page/images.php";
?>
<html>
<img src="<?php echo $img; ?>" alt="" srcset="">
</html>
Method 2: save the image to a folder that is easy to target.
Create a folder inside the same folder as your main page.
For example, I created a folder called (img) in the same folder as my index.html and saved the image there with a short name.
To access that image, I would reference it like this:
<img src="image/img.png" alt="" srcset="">

How do I search for an attribute using BeautifulSoup?

I am trying to scrape a page that contains the following HTML.
<div class="FeedCard urn:publicid:ap.org:db2b278b7e4f9fea9a2df48b8508ed14 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
<div class="FeedCard urn:publicid:ap.org:2f23aa3df0f2f6916ad458785dd52c59 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
As you can see, "FeedCard " is something they have in common. Therefore, I am trying to use a regular expression in conjunction with BeautifulSoup. Here is the code I've tried.
pattern = r"\AFeedCard"
for card in soup.find('div', 'class'==re.compile(pattern)):
    print(card)
    print('**********')
I'm expecting it to give me each one of the divs from above, with the asterisks separating them. Instead, it is giving me the entire HTML of the page in a single instance.
Thank you,

There is no need for a regular expression here. Just use a CSS selector or the BS4 API:
from bs4 import BeautifulSoup
html = """\
<div class="FeedCard urn:publicid:ap.org:db2b278b7e4f9fea9a2df48b8508ed14 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
Item 1
</div>
<div class="FeedCard urn:publicid:ap.org:2f23aa3df0f2f6916ad458785dd52c59 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
Item 2
</div>
"""
soup = BeautifulSoup(html, "html.parser")
for card in soup.select(".FeedCard"):
    print(card.text.strip())
Prints:
Item 1
Item 2
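For completeness, the regular-expression attempt from the question can also be made to work. In the original call, 'class'==re.compile(pattern) is a comparison that evaluates to False rather than a filter, and find() returns a single element instead of a list. A minimal sketch of the corrected form, using find_all with the class_ keyword, follows; the HTML is a trimmed version of the question's markup plus one extra div added here for contrast.

```python
import re

from bs4 import BeautifulSoup

html = """\
<div class="FeedCard urn:publicid:ap.org:db2b278b7e4f9fea9a2df48b8508ed14 Component-wireStory-0-2-116 card-0-2-117">Item 1</div>
<div class="FeedCard urn:publicid:ap.org:2f23aa3df0f2f6916ad458785dd52c59 Component-wireStory-0-2-116 card-0-2-117">Item 2</div>
<div class="Other">Item 3</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ is the keyword form of the filter; the compiled pattern is tested
# against the element's CSS classes.
pattern = re.compile(r"\AFeedCard")
cards = soup.find_all("div", class_=pattern)

texts = [card.get_text(strip=True) for card in cards]
print(texts)
```

find_all returns every matching div, so looping over cards and printing a separator between them gives the output the question was after.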

cannot get tag however it is appear on html

I am trying a scraping job using BeautifulSoup and its find methods. I get the HTML with the lxml parser as follows:
import requests
from bs4 import BeautifulSoup
result = requests.get('https://wuzzuf.net/jobs/p/xgUqkfYngXZL-Senior-Python-Developer-Remote---Part-Time-Cairo-Egypt?o=2&l=sp&t=sj&a=python|search-v3|hpb')
# print(result.status_code)
soup1 = BeautifulSoup(result.content, "html5lib")
sections = soup1.find('section', class_="css-3kx5e2")
divs = sections.find_all('div')
spans = sections.find_all('span')
span = divs[3].find('span', class_='css-47jx3m')
divs[3]
I get the following:
<div class="css-rcl8e5"><span class="css-wn0avc">Salary<!-- -->:</span></div>
However, the original HTML is:
<div class="css-rcl8e5"><span class="css-wn0avc">Salary<!-- -->:</span>
<span class="css-47jx3m"><span class="css-8il94u">Confidential, Hourly Based</span>
</span>
</div>
I need to get the span with class="css-8il94u", which has the text 'Confidential, Hourly Based', but it does not appear.
Thanks.

HTML data update for XML column with new value in SQL Server

I have some experience using XQuery to update XML data, and I tried to apply the same logic to HTML data in SQL Server, but it is not working as expected.
For example, I have an XML column value (actually HTML data) as below:
DECLARE @template xml = '<div>
<div id="divHeader">Congratulation<div id="Salutation">ravi</div></div><br/>
<div>From now you are a part of the Company<div id="cmpnyUserDetails"></div></div><br/>
<div id="clickSection">Please Click Here to Access Your New Features</div>
</div>'
I would like to change the value of the div with id "Salutation" to "New Value", and append the href value to a valid link, using XQuery.
SET @template.modify('replace value of (//div[id=("Salutation")]/text())[1] with "New Value"')
SELECT @template AS data
But it's not working.
Can someone please suggest how to make this work?
Thanks a ton in advance,
Ravi.

You were close. Notice the @id vs. your id: @id tests the id attribute, while a bare id looks for a child element named id.
Example
SET @template.modify('replace value of (//div[@id=("Salutation")]/text())[1] with "New Value"')
SELECT @template AS data
Returns
<div>
<div id="divHeader">Congratulation<div id="Salutation">New Value</div></div>
<br />
<div>From now you are a part of the Company<div id="cmpnyUserDetails" /></div>
<br />
<div id="clickSection">Please Click Here to Access Your New Features</div>
</div>
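The same attribute-versus-child-element distinction can be seen outside SQL Server. The sketch below uses Python's xml.etree.ElementTree, which supports a limited XPath subset, on the template from the question; it only illustrates the predicate difference and is not a replacement for the modify() call.

```python
import xml.etree.ElementTree as ET

template = ET.fromstring(
    "<div>"
    '<div id="divHeader">Congratulation<div id="Salutation">ravi</div></div><br/>'
    '<div>From now you are a part of the Company<div id="cmpnyUserDetails"></div></div><br/>'
    '<div id="clickSection">Please Click Here to Access Your New Features</div>'
    "</div>"
)

# [id='Salutation'] tests a child element named <id>, so nothing matches.
no_match = template.findall(".//div[id='Salutation']")
print(no_match)

# [@id='Salutation'] tests the id attribute and finds the target div.
target = template.find(".//div[@id='Salutation']")
target.text = "New Value"
updated = ET.tostring(target, encoding="unicode")
print(updated)
```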

xpath find specific link in page

I'm trying to get the email to a friend link from this page using xpath.
http://www.guardian.co.uk/education/2009/oct/14/30000-miss-university-place
The link itself is wrapped up in tags like this
<li><a class="rollover sendlink" href="http://www.guardian.co.uk/email/354237257" title="Opens an email form" name="&lid={pageToolbox}{Email a friend}&lpos={pageToolbox}{2}"><img src="http://static.guim.co.uk/static/80163/common/images/icon_email-friend.gif" alt="" class="trail-icon" /><span>Send to a friend</span></a></li>
I'm using this for my query, but it's not quite right.
$links = $xpath->query("//a/span[text()='Send to a friend']/@href");

You're trying to get the href of the span there. I think you want:
$links = $xpath->query("//a[span/text()='Send to a friend']/@href");

You need to use something like this (since href is an attribute of a):
$links = $xpath->query("//a[span/text()='Send to a friend']/@href");

The href is an attribute of the anchor, hence you need:
$links = $xpath->query("//a[span[text()='Send to a friend']]/@href");

Try this:
$links = $xpath->query("//a[span='Send to a friend']/@href");
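To see why the predicate has to sit on the a element, here is a small sketch in Python using xml.etree.ElementTree, which supports the a[span='text'] form but not /@href, so the attribute is read with get(). The snippet is a trimmed copy of the question's markup (the name attribute is dropped because its raw & characters are not valid XML), wrapped in a ul added here as a root.

```python
import xml.etree.ElementTree as ET

snippet = (
    "<ul><li>"
    '<a class="rollover sendlink" href="http://www.guardian.co.uk/email/354237257" '
    'title="Opens an email form">'
    '<img src="http://static.guim.co.uk/static/80163/common/images/icon_email-friend.gif" '
    'alt="" class="trail-icon" />'
    "<span>Send to a friend</span></a>"
    "</li></ul>"
)
root = ET.fromstring(snippet)

# Selecting the span and asking for href finds nothing: the span has no href.
span_href = root.find(".//a/span").get("href")
print(span_href)

# Selecting the anchor that contains the span, then reading its href, works.
links = [a.get("href") for a in root.findall(".//a[span='Send to a friend']")]
print(links)
```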
