I'm attempting to use VBA to scrape the link to a .gif file from this HTML fragment:
<div class="row">
<div class="col-md-12">
<div id='imageDiv' style='width:99%'>
<img style='width:99% !important; border:5px solid silver;' src="http://www.[rest of link].gif" alt="" />
</div>
My code is below:
parent_url = "http://www.[webpage url]"
objIE.navigate parent_url
While objIE.Busy Or objIE.readyState <> 4
DoEvents
Wend
For Each ele In objIE.document.getElementsByTagName("imageDiv")
If InStr(ele.Style, "width") > 0 Then
ws1.Cells(2, 2) = ele.innerText: Exit For
End If
Next
objIE.Quit
This doesn't write anything to the spreadsheet even when I try a number of different element types.
Any pointers about what I'm doing wrong here?
TIA
Firstly, as pointed out by Tim Williams, imageDiv is an id and not a tag, so it can be reached via the .getElementById() method, which returns a single HTML element, in contrast to .getElementsByTagName(), which returns a collection of HTML elements.
A tag in HTML, in its simplest form, looks like <TagName>Inner Text</TagName>.
So in your case, the tag name you are looking for is img and the id you're looking for is imageDiv.
So, if you want to get the element whose id is imageDiv and then get its img elements, and more specifically its first img element you would have to do it like so:
Dim img As HTMLImg
Set img = objIE.document.getElementById("imageDiv").getElementsByTagName("img")(0)
Secondly, the innerText is not what you are looking for. What you need is the src.
This can be reached like so:
Debug.Print img.src
To take advantage of the .src property, we store the element in an HTMLImg variable.
The code above will print the following to your immediate window:
http://www.[rest%20of%20link].gif/
References Used: Microsoft HTML Object Library
Try:
For Each ele In objIE.document.getElementById("imageDiv").getElementsByTagName("img")
In addition to the answers given, it is faster to use a CSS selector with .querySelector, which returns a single node:
Debug.Print objIE.document.querySelector("#imageDiv img").src
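A minimal sketch of the same selector with a guard, in case the img is not in the document when the code runs. It assumes objIE and ws1 are set up as in the question; the procedure name WriteGifLink is made up for illustration.

```vba
' Hypothetical helper: writes the .gif link into B2, or reports that it was not found.
' Assumes objIE is an InternetExplorer instance that has finished loading
' and ws1 is the target Worksheet, as in the question.
Sub WriteGifLink(ByVal objIE As Object, ByVal ws1 As Object)
    Dim img As Object

    ' First img inside the element with id "imageDiv";
    ' with IE's document object this normally comes back as Nothing when there is no match
    Set img = objIE.document.querySelector("#imageDiv img")

    If img Is Nothing Then
        Debug.Print "No img found inside #imageDiv"
    Else
        ws1.Cells(2, 2).Value = img.src
    End If
End Sub
```

Call it after the wait loop and before objIE.Quit, for example: WriteGifLink objIE, ws1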
I have done a few VBA + IE automations before, mainly selecting buttons and lists (including automated data input, e.g. for logins and dates). However, I have not copied specific data from IE into Excel before.
The question is how to extract this data from IE into Excel, namely the number 257 (which changes every day). An additional issue is the structure of the page (it is Google Analytics). Before you advise me to use "Export" on the GA page, please note that most of my options are blocked. Google Drive and Google Docs are also out of the equation.
The part of the source code is:
<div class="_GAlF _GALn">P R R</div>
<div class="_GAef" id="ID-layout-1536671725872"><div class="_GANY"><div class="_GAxN"><img width="75" height="18" class="_GANU" alt="" src="s/cleardot.gif"></div><div><div class="_GAeS _GAHeb _GAA6">257</div></div><div><span class="_GAkhb">% of Total:</span> <span class="_GAvQb">0.04%</span> <span class="_GAqs">(601,038)</span></div></div></div>
<div class="_GANY"><div class="_GAxN"><img width="75" height="18" class="_GANU" alt="" src="s/cleardot.gif"></div><div><div class="_GAeS _GAHeb _GAA6">257</div></div><div><span class="_GAkhb">% of Total:</span> <span class="_GAvQb">0.04%</span> <span class="_GAqs">(601,038)</span></div></div>
<div class="_GAxN"><img width="75" height="18" class="_GANU" alt="" src="s/cleardot.gif"></div>
<div><div class="_GAeS _GAHeb _GAA6">257</div></div>
<div class="_GAeS _GAHeb _GAA6">257</div>
Can the value be perhaps identified through the "ID-layout", which seems to be unique to this particular box? Yet that would have to descend to the area which holds the value of 257 anyway. Please advise. Thank you.
For the HTML shown, you could attempt to narrow the match down by combining the parent id with an attribute selector on the class:
ThisWorkbook.Worksheets("Sheet1").Cells(1,1) = ie.document.querySelector("#ID-layout-1536671725872 [class='_GAeS _GAHeb _GAA6']").innerText
The id selector is added to localize the class match, given the small HTML sample. "#" is the CSS id selector and "[]" is the attribute selector; [class='...'] matches the whole class attribute string exactly.
The selector combination is applied via the .querySelector method of document. Provided you are using a version of IE above 8, this should work fine.
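As a hedged illustration of the difference between the attribute selector and chained class selectors, the sketch below tries the exact attribute match first and falls back to the chained form, which matches an element carrying all three classes in any order. The procedure name ReadMetric is made up, and ie is assumed to be an InternetExplorer instance that has already loaded the page.

```vba
' Hypothetical helper: reads the metric value into Sheet1!A1 if the node can be found.
' Assumes ie is an InternetExplorer instance with the page fully loaded.
Sub ReadMetric(ByVal ie As Object)
    Dim node As Object

    ' Exact match on the full class attribute string (order and spacing must match)
    Set node = ie.document.querySelector("#ID-layout-1536671725872 [class='_GAeS _GAHeb _GAA6']")

    ' Fallback: chained class selectors, which ignore the order of the class names
    If node Is Nothing Then
        Set node = ie.document.querySelector("#ID-layout-1536671725872 ._GAeS._GAHeb._GAA6")
    End If

    If node Is Nothing Then
        Debug.Print "Metric node not found"
    Else
        ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Value = node.innerText
    End If
End Sub
```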
If the page does not rely heavily on JavaScript to render its content, you may be able to skip opening IE and issue an XMLHTTP request instead:
Dim html As New HTMLDocument 'Tools > References > add a reference to Microsoft HTML Object Library
Dim sResponse As String, startPos As Long
With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", URL, False 'URL is a String holding the page address
    .send
    sResponse = StrConv(.responseBody, vbUnicode)
End With
'Drop anything before the doctype, but only if one is present (Mid$ fails on a start position of 0)
startPos = InStr(1, sResponse, "<!DOCTYPE ")
If startPos > 0 Then sResponse = Mid$(sResponse, startPos)
With html
    .body.innerHTML = sResponse
    ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .querySelector("#ID-layout-1536671725872 [class='_GAeS _GAHeb _GAA6']").innerText
End With
Is class "_GAeS _GAHeb _GAA6" used only at this line?
If it is, this should work:
IE.document.getElementsByClassName("_GAeS _GAHeb _GAA6")(0).innerText
I'm creating an automation that will go through almost 110 pages with VBA. These pages have an identical layout. I need to go from one page to the next automatically by "clicking" the next button. At the very end of every page there is a "button" (an anchor inside a list item) that says "Next page". The problem is that the source code does not contain an ID, which would have made it easy to refer to with:
getElementById("id").Click
Opening the browser works fine. I've tried something like this, but it doesn't work:
Dim ieDoc As Object
Dim links As Object
Dim link As Object
Set ieDoc = ieApp.Document
Set links = ieDoc.Anchors
For Each link In links
If link.innerHTML = "innerHTML" Then
link.Click
Exit For
End If
Next link
I have tried almost everything I could find on Stack Overflow, but nothing worked for my needs.
This is the source code of the "Next" button that I'm trying to click:
<li class="pager-next"><a title="Next page" href="/fi/tyosuhde- edut/kayttokohdehakupage=1&service_type=lunch&keywords=&city=&service=&service_areas=&payment_method=&municipality=&service_coupon_code=&items_per_page=50">seuraava ›</a></li>
I guess the problem is that the class name is on the "li" and not on the "a"?
Could someone help me?
EDIT
Found a workaround!!:
Set pages = doc.getElementsByTagName("a")
For Each page In pages
    If (page.getAttribute("title") = "Siirry seuraavalle sivulle") Then
        page.Click
        Exit For 'stop looping: the page is about to change
    End If
Next page
You will need to re-acquire the HTML document after each page load, because a reference taken from the previous page no longer points at the new one.
After each load, try:
ieApp.document.querySelector("a[title=""Next page""]").Click
CSS Selector
More info about CSS Selectors: CSS selectors
EDIT:
In your case the selector that matches the actual HTML is
ieApp.document.querySelector("a[title=""Siirry seuraavalle sivulle""]").Click
Note there is no space between the "a" and the "[", and you will need to leave enough time between clicks to allow the new page to load.
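One way to combine the click with the waiting is sketched below. It is only an outline under stated assumptions: ieApp is an InternetExplorer instance already showing the first page, the procedure name WalkPages and the maxPages cap are made up for illustration, and the fixed one-second pause after each click is a guess that may need tuning for the site.

```vba
' Hypothetical helper: visits up to maxPages pages by clicking the "next" link.
' Assumes ieApp is an InternetExplorer instance already showing the first page.
Sub WalkPages(ByVal ieApp As Object, ByVal maxPages As Long)
    Dim nextLink As Object
    Dim resumeAt As Date
    Dim pageNo As Long

    For pageNo = 1 To maxPages
        ' Wait for the current page before touching its document
        Do While ieApp.Busy Or ieApp.readyState <> 4
            DoEvents
        Loop

        ' ... read whatever is needed from ieApp.document here ...

        ' Query the document again on every pass; the previous page's elements are gone
        Set nextLink = ieApp.document.querySelector("a[title='Siirry seuraavalle sivulle']")
        If nextLink Is Nothing Then Exit For   ' no "next" link: last page reached

        nextLink.Click

        ' Short pause so the navigation has started before Busy is checked again
        resumeAt = Now + TimeSerial(0, 0, 1)
        Do While Now < resumeAt
            DoEvents
        Loop
    Next pageNo
End Sub
```

With roughly 110 pages, a call such as WalkPages ieApp, 150 leaves some headroom while still guaranteeing that the loop ends.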
How can I parse this HTML code:
163 Punti<
I want to extract "163 Punti". I've searched on Google but didn't find anything.
Could someone help me? Thanks
The element doesn't have an ID. So you can't get the element by using GetElementById method (which is the surest way to identify an element). However you can use other methods.
Dim allLinks As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("A")
For Each link As HtmlElement In allLinks
If link.GetAttribute("href") = "http://member.20dollars2surf.com/points.php" Then
Dim linkText As String = link.InnerHtml
MessageBox.Show(linkText)
End If
Next
The above code will work properly only if there is only one link on the page with that URL. Otherwise you will need to further customize this code.
Bear in mind that I know only a bit about HTML:
There is a site I'm trying to interact with using a WebBrowser. The site has a textarea element as follows:
<textarea name="ctl00$ContentPlaceHolderMain$txtCallDesc" rows="2" cols="20" id="ctl00_ContentPlaceHolderMain_txtCallDesc" tabindex="205" style="width: 100%; height: 80px; font-size: 8pt"></textarea>
From what I've read, the generated ID of the textarea signifies that it's placed inside another form of some sort, and I'm not sure if this is where I'm running into my problem.
Once the page has loaded, I have something like the following in a button:
Dim theCol As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("textarea")
For Each curElement As HtmlElement In theCol
ListBox1.Items.Add(curElement.TagName)
Next
Nothing populates in the list. I've also tried using the ID of the text box gathered by the 'inspect element' feature of Chrome:
Dim value As HtmlElement = WebBrowser1.Document.GetElementById("ctl00_ContentPlaceHolderMain_txtCallDesc")
MsgBox(value.GetAttribute("value"))
No matter what I do, I can't seem to get the program to recognize that there ARE textarea elements in the document. The source for the page is far too long to spam everyone with here, but is there anything I'm missing that I should be looking out for? Perhaps I need to get another element first, then search within it?
Edit:
The element I'm trying to get seems to be within an iFrame, but it looks like it's from the same domain so the same origin policy shouldn't come into play, should it?
<iframe id="mainFrame" width="100%" height="100%" frameborder="0" class="mainFrame" name="Main" src="/Calls/OpenCalls.aspx">
Using the code shown in Get Iframe HTML:
For i = 0 To WebBrowser1.Document.Window.Frames.Count - 1
Dim frameDoc = WebBrowser1.Document.Window.Frames(i)
Dim theCol = frameDoc.Document.GetElementsByTagName("textarea")
For Each curElement As HtmlElement In theCol
ListBox1.Items.Add(String.Format("TagName: {0} Id:{1}", curElement.TagName, curElement.Id))
Next
Next
The essential part being the use of WebBrowser1.Document.Window.Frames.
You can't reference elements inside an iframe directly since they are inside another document. So first get a reference to the document element inside the iframe and then you can query it the same way.
Dim frameDoc = WebBrowser1.Document.GetElementById("mainFrame").DomElement.contentWindow.Document
And the rest you already know...
Dim theCol = frameDoc.GetElementsByTagName("textarea")
For Each curElement In theCol
ListBox1.Items.Add(curElement.TagName)
Next
Is it possible to get the contents of a tag from a WebBrowser control, like this: <div class="desc">contents</div>, and then strip all the HTML from it?
Say WebBrowser1 has a website loaded into it. I want to extract the source code from it and find this:
<div class="desc"><b>these are the contents I want</b></div>
and extract it like this: these are the contents I want
Dim divs = WebBrowser1.Document.Body.GetElementsByTagName("div")
For Each d As HtmlElement In divs
If d.GetAttribute("className") = "desc" Then
Return d.InnerText
End If
Next