xpath find specific link in page - html

I'm trying to get the email to a friend link from this page using xpath.
http://www.guardian.co.uk/education/2009/oct/14/30000-miss-university-place
The link itself is wrapped up in tags like this
<li><a class="rollover sendlink" href="http://www.guardian.co.uk/email/354237257" title="Opens an email form" name="&lid={pageToolbox}{Email a friend}&lpos={pageToolbox}{2}"><img src="http://static.guim.co.uk/static/80163/common/images/icon_email-friend.gif" alt="" class="trail-icon" /><span>Send to a friend</span></a></li>
I'm using this for my query, but it's not quite right.
$links = $xpath->query("//a/span[text()='Send to a friend']/@href");

You're trying to get the href of the span there. I think you want
$links = $xpath->query("//a[span/text()='Send to a friend']/@href");

You need to use something like this (since href is an attribute of a):
$links = $xpath->query("//a[span/text()='Send to a friend']/@href");

The href is an attribute of the anchor hence you need:-
$links = $xpath->query("//a[span[text()='Send to a friend']]/@href");

Try this:
$links = $xpath->query("//a[span='Send to a friend']/@href");
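
The difference between the failing query and the working ones is where the context node ends up: stepping into the span selects the span, while a predicate in square brackets only filters the anchors. A minimal sketch of that distinction, using Python's standard-library ElementTree (which supports only a subset of XPath and has no attribute axis, so the attribute is read with .get()). The fragment is a simplified, well-formed copy of the markup from the question, and the second list item is a made-up example.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the question's markup (assumption: the name attribute
# and the img element are dropped so the fragment parses as XML).
fragment = """
<ul>
  <li><a class="rollover sendlink" href="http://www.guardian.co.uk/email/354237257" title="Opens an email form"><span>Send to a friend</span></a></li>
  <li><a href="http://www.guardian.co.uk/other"><span>Something else</span></a></li>
</ul>
"""
root = ET.fromstring(fragment)

# Stepping into the span selects the span itself, which has no href.
span = root.find(".//a/span[.='Send to a friend']")
span_href = span.get("href")

# A predicate filters the anchors but keeps the anchor as the result,
# so its href attribute is reachable.
anchor = root.find(".//a[span='Send to a friend']")
anchor_href = anchor.get("href")

print(span_href)
print(anchor_href)
```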

Related

how to make img link to a simple code (Html and Css)

I want to replace all my image URLs with short names (like variables) in HTML and CSS.
Example:
//Not like this
<img src="https://img1.com">
<img src="https://img2.com">
<img src="https://img3.com">
//I want to do something a little bit more like this instead
value01 = https://img1.com
value02 = https://img2.com
value03 = https://img3.com
<img src="value01">
<img src="value02">
<img src="value03">
I don't know what to do; I am new to HTML and CSS.
I don't think you can do this in plain HTML, because the <img> tag only embeds the image named in its src attribute and HTML has no variables. You could generate the markup with a language such as Python; otherwise, write the tag directly:
<b>
<img src="value1.jpg" alt="Value1" >
</b>
There are two ways I can think of.
Note: the first option only works if the page is saved with a .php extension.
Method 1: create a separate PHP file and store the image links in variables, like this.
<?php
$img = 'https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/1200px-Python-logo-notext.svg.png';
?>
Next, include this file in the main page/index.
<?php
include "./page/images.php";
?>
<html>
<img src="<?php echo $img; ?>" alt="" srcset="">
</html>
Method 2: save the image in a folder that is easy to reference.
Create a folder inside the folder that holds your main page.
For example, I created a folder called img next to my index.html and saved the image there under a short name.
To use that image, I reference it like this:
<img src="img/img.png" alt="" srcset="">
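
The idea behind the PHP method (keep the URLs in one place and substitute them when the page is generated) can also be sketched in Python, which one answer above mentions as an option. This is only an illustration under stated assumptions: the IMAGES mapping and the img_tag helper are hypothetical names, and the URLs are the placeholders from the question.

```python
from html import escape

# Hypothetical lookup table: short names mapped to the image URLs
# from the question.
IMAGES = {
    "value01": "https://img1.com",
    "value02": "https://img2.com",
    "value03": "https://img3.com",
}

def img_tag(key, alt=""):
    """Build an <img> tag for the URL stored under the given short name."""
    src = escape(IMAGES[key], quote=True)
    return '<img src="{}" alt="{}">'.format(src, escape(alt, quote=True))

# Generate the three tags from the question in order.
page = "\n".join(img_tag(key) for key in ("value01", "value02", "value03"))
print(page)
```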

How to get the text of img alt inside <a> tag

I have a page with the following HTML fragment:
<div class="shop cf">
<a class="shop-logo js-shop-logo" href="/m/3870/GMobile">
<noscript>
<img alt="GMobile" class="js-lazy" data-src="//a.scdn.gr/ds/shops/logos/3870/mid_20160920155600_71ff515d.jpeg" src="//a.scdn.gr/ds/shops/logos/3870/mid_20160920155600_71ff515d.jpeg" />
</noscript>
<img alt="GMobile" class="js-lazy" data-src="//a.scdn.gr/ds/shops/logos/3870/mid_20160920155600_71ff515d.jpeg" src="//c.scdn.gr/assets/transparent-325472601571f31e1bf00674c368d335.gif" />
</a>
</div>
I want to get the alt of the first img inside the div with class "shop cf", so I do:
Set seller = Doc.querySelectorAll("img")
wks.Cells(i, "D").Value = seller.getAttribute("alt").Content(0)
I get nothing. What did I forget to include?
Can I get it from the <noscript> tag?
I tried the following as well:
Set seller = Doc.getElementsByClassName("js-lazy")
wks.Cells(i, "D").Value = seller.getAttribute("alt")
Use element with attribute selector
CSS:
img[alt]
VBA:
ie.document.querySelector("img[alt]")
You may need to add
ie.document.querySelector("img[alt]").getAttribute("alt")
To include the class use
ie.document.querySelector("img.js-lazy[alt]")
If more than one element then use querySelectorAll and index into returned nodeList e.g.
Set list = ie.document.querySelectorAll("img.js-lazy[alt]")
list.item(0).getAttribute("alt")
list.item(1).getAttribute("alt")
Have you tried it this way (in JavaScript)?
let lazy1 = document.querySelectorAll(".js-lazy")[0]
let lazyalt = lazy1.getAttribute("alt");
let shop = document.querySelector('.shop');
shop.classList.add(lazyalt);
console.log(lazyalt)
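
For comparison, the selection logic of img[alt] (every img that carries an alt attribute, then index into the result) can be sketched outside the browser with Python's standard-library HTML parser. The AltCollector class is a hypothetical name, and the src values are shortened placeholders rather than the real URLs. This parser treats the contents of noscript as ordinary markup, so the img inside it is counted too.

```python
from html.parser import HTMLParser

class AltCollector(HTMLParser):
    """Collect the alt text of every img element that has an alt attribute."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also arrive here.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt is not None:
                self.alts.append(alt)

# Shortened copy of the markup from the question (src values are placeholders).
markup = """
<div class="shop cf">
  <a class="shop-logo js-shop-logo" href="/m/3870/GMobile">
    <noscript>
      <img alt="GMobile" class="js-lazy" src="logo.jpeg" />
    </noscript>
    <img alt="GMobile" class="js-lazy" src="transparent.gif" />
  </a>
</div>
"""

collector = AltCollector()
collector.feed(markup)
first_alt = collector.alts[0]
print(first_alt)
```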

How to extract something I want in html using 'xpath'

The HTML looks like this:
<img alt="Papa's Cupcakeria To Go!" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-old-hires="" class="a-dynamic-image a-stretch-vertical" id="landingImage" data-a-dynamic-image="{"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L.png":[512,512],"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SX425_.png":[425,425],"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SX466_.png":[466,466],"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SY450_.png":[450,450],"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SY355_.png":[355,355]}" style="max-width:512px;max-height:512px;">
I want to get "https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L.png". Right now I'm using
extract_item(hxs.xpath("//img[@id='landingImage']/@data-a-dynamic-image"))
but what I get is the entire content of that attribute.
How can I get only the first URL?
If you just want the first URL:
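full_content = extract_item(hxs.xpath("//img[@id='landingImage']/@data-a-dynamic-image"))
# The raw attribute value still contains &quot; entities, so splitting on ";"
# leaves the first URL (plus a trailing "&quot") in the second piece.
list_contents = full_content.split(";")
first_image = list_contents[1].replace("&quot","")
print(first_image)
Alternatively, you can extract the URL with a regular expression.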
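
Since the attribute value is a JSON object mapping each URL to its [width, height], another option is to decode it as JSON instead of splitting the string. A minimal sketch, assuming the raw value still contains &quot; entities as the answer above implies; the raw string here is a shortened, hand-copied sample with only two of the entries, and html.unescape is a no-op if the extractor has already decoded the entities.

```python
import html
import json

# Shortened sample of the attribute value (assumption: entity-encoded quotes).
raw = (
    "{&quot;https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L.png&quot;:[512,512],"
    "&quot;https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SX425_.png&quot;:[425,425]}"
)

# Decode the entities, then parse the JSON object.
sizes = json.loads(html.unescape(raw))

# Dicts keep insertion order, so the first key is the first URL in the attribute.
first_url = next(iter(sizes))

# Picking by area avoids relying on the order at all.
largest_url = max(sizes, key=lambda url: sizes[url][0] * sizes[url][1])

print(first_url)
print(largest_url)
```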

Web scraping using excel VBA

I am looking at the HTML below:
<h1 class="wer wer">
<a href="http://somelink.com" rel="bookmark" title="Permanent Link to Title of this page that covers some random topic">
Short title of this page...</a>
</h1>
I am currently using the below code to pull out innertext ("Short title of this page...")
For Each ele In .document.all
    Select Case ele.classname
        Case "wer wer"
            RowCount = RowCount + 1
            sht.Range("A" & RowCount) = ele.innertext
    End Select
Next ele
How can I modify this code to pull out title ("Permanent Link to Title of this page that covers some random topic") and href ("http://somelink.com")?
Any help would be much appreciated. Thanks.
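Select the element with a CSS attribute selector. The attribute value must be quoted because it contains characters such as : and /.
.document.querySelector("a[href='http://somelink.com']").innerText
a[href='http://somelink.com'] matches the first a element whose href is 'http://somelink.com'. To read the values asked about, call .getAttribute("title") and .getAttribute("href") on the same element.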
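
If the href is not known in advance, the alternative is to find the anchor through its parent heading and read both attributes from it. A rough sketch of that logic with Python's standard-library HTML parser, not VBA; the HeadingLinks class is a hypothetical name and the markup is copied from the question.

```python
from html.parser import HTMLParser

class HeadingLinks(HTMLParser):
    """Collect (href, title) of anchors found inside h1 elements with class 'wer'."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and "wer" in (attrs.get("class") or "").split():
            self.in_heading = True
        elif tag == "a" and self.in_heading:
            self.links.append((attrs.get("href"), attrs.get("title")))

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_heading = False

markup = """
<h1 class="wer wer">
<a href="http://somelink.com" rel="bookmark" title="Permanent Link to Title of this page that covers some random topic">
Short title of this page...</a>
</h1>
"""

parser = HeadingLinks()
parser.feed(markup)
href, title = parser.links[0]
print(href)
print(title)
```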

How to concatenate .png to the a href

I have the following code inside my asp.net MVC view:-
<img class="thumbnailimag" src="~/Content/uploads/@item.ID.ToString()" + ".png" />
but I am unable to concatenate the .png to my href and src. Can anyone advise, please?
Thanks
You'll want to wrap the code in parentheses (an explicit Razor expression). No need to call .ToString() then:
href="~/Content/uploads/@(item.ID).png"
Your quotes are not closed properly.
href='@string.Format("~/Content/uploads/{0}.png", item.ID)'
Complete code:
<a href='@string.Format("~/Content/uploads/{0}.png", item.ID)'><img class="thumbnailimag" src='@string.Format("~/Content/uploads/{0}.png", item.ID)' /></a>
Alternatively, declare fileName outside of the href (IMO this makes it more readable):
@{
    var fileName = item.ID.ToString() + ".png";
}
<img class="thumbnailimag" src="~/Content/uploads/@fileName" />