Web scraping by link text - html

I have some experience scraping by tag name or class name. In this particular case, however, the class name is not unique, and the link changes every time the page is accessed, so I cannot use a direct link. The only unique combination is the class plus the link text. What code would access, for example, "Budget and Forecast updating" with a_1_610 and "Budget and Forecast updating" with a_1_611?
My code (edited according to QHarr's answer):
Sub GoToLiinosBot()
'This will load a webpage in IE
Dim ie As InternetExplorer
Dim HWNDSrc As Long
Dim elements As Object
Set ie = Nothing
Set ie = New InternetExplorerMedium
ie.Visible = True
ie.Navigate "http://link.com"
With ie
Do
DoEvents
Loop Until ie.ReadyState = READYSTATE_COMPLETE
End With
Application.Wait (Now + TimeValue("0:00:04"))
Debug.Print ie.Document.querySelector(".data .a_1_611").innerText
'Unload IE
Set ie = Nothing
End Sub
Here is source code:

They are class names, not ids. If the ordering changes, you will probably need a loop that tests the innerText of each node; otherwise, for the example shown in the image, you want the first match for the selector:
.data .a_1_611
Which is
ie.document.querySelector(".data .a_1_611").click
nth-of-type is useful for fixed position selection but more expensive than class selectors.
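The loop mentioned above, which tests the innerText of each matching node, could look roughly like the following. This is a minimal sketch, not the answerer's code: the helper name ClickByClassAndText is hypothetical, the link text is taken from the question, and it assumes the page runs in a document mode where querySelectorAll is available.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer).
' Clicks the first node matching cssSelector whose trimmed text equals linkText.
' Returns True when a node was clicked, False when nothing matched.
Public Function ClickByClassAndText(ByVal doc As Object, _
                                    ByVal cssSelector As String, _
                                    ByVal linkText As String) As Boolean
    Dim nodes As Object
    Dim i As Long

    Set nodes = doc.querySelectorAll(cssSelector)

    ' Walk the node list by index and compare the visible text of each node.
    For i = 0 To nodes.Length - 1
        If Trim$(nodes.Item(i).innerText) = linkText Then
            nodes.Item(i).Click
            ClickByClassAndText = True
            Exit Function
        End If
    Next i
End Function
```

It would be called as, for example, ClickByClassAndText ie.document, ".data .a_1_611", "Budget and Forecast updating".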

Related

Addressing specific item within HTML string using VBA

I am trying to use VBA to populate a webform and I am struggling to correctly address the field I want to populate.
This is one line of the table in HTML
Here is the overall HTML structure
<th>
<label for="location_sales_target_2_2022_15_target_val">2022w15 (03/04)</label>
</th>
<td>
<input name="location_sales_target[2][2022/15][id]" id="location_sales_target_2_2022_15_id" type="hidden" value="12751">
<input name="location_sales_target[2][2022/15][target_val]" id="location_sales_target_2_2022_15_target_val" type="text" value="0.00">
</td>
<td>£2,097.33</td>
I need to be able to address the final value field and update its value.
This is the VBA code I have so far. My VBA is limited and this is copied from online and modified.
Sub IEWebScrape1()
Dim IE As InternetExplorer 'Reference to Microsoft Internet Controls
Set IE = New InternetExplorer
With IE
.Visible = True
.Navigate2 "https://"
'we add a loop to be sure the website is loaded and ready.
'Does not work consistently. Cannot be relied upon.
Do While .Busy = True Or .readyState <> READYSTATE_COMPLETE 'Equivalent = .ReadyState <> 4
' DoEvents - worth considering. Know implications before you use it.
Application.Wait (Now + TimeValue("00:00:01")) 'Wait 1 second, then check again.
Loop
'Print info in immediate window
With .document 'the source code HTML "below" the displayed page.
Debug.Print .getElementById("sf_admin_container").Children(1).getElementsByTagName("tr")(16).textContent
End With '.document
' .Quit 'close the application window
End With 'ie
End Sub
The VBA code produces this result, which confirms it is correctly referencing the record I am trying to reference.
2022W15 (03/04)
£2,097.33
How do I correctly address the specific element within the record?
I have solved the problem now. I was completely confused about how to use the getElement... methods; the problem was solved by using getElementById correctly.
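The poster did not share the final code, so the following is only a guess at what "correct use of getElementById" might look like. The id comes from the HTML in the question; the helper name SetInputValueById and the value "1500.00" are placeholders.

```vba
Option Explicit

' Hypothetical helper: writes newValue into the input element with the given id.
Public Sub SetInputValueById(ByVal doc As Object, _
                             ByVal inputId As String, _
                             ByVal newValue As String)
    Dim field As Object

    Set field = doc.getElementById(inputId)

    ' getElementById returns Nothing when no element has that id.
    If field Is Nothing Then
        Debug.Print "No element with id: " & inputId
        Exit Sub
    End If

    field.Value = newValue
End Sub

' Example call, placed after the page has loaded ("1500.00" is a placeholder value).
Public Sub DemoSetTarget(ByVal IE As Object)
    SetInputValueById IE.document, "location_sales_target_2_2022_15_target_val", "1500.00"
End Sub
```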

Select HTML Menu Item via VBA

I'm fairly new to HTML, so please bear with me on this. I am using Excel VBA to interact with a website, with the intent to automate this interaction.
Problem Statement
I have a web page with (what looks like) a button that I need to click. In the HTML it is listed as a menu item. I'm able to successfully click other buttons on the page, but those have <button> tags.
I have tried to use the click method by selecting it by ID but I get an 'Object Variable or With block variable not set' error.
Sub WD_auto()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLSel As MSHTML.IHTMLElement
IE.Visible = True
IE.navigate "https://wd3.myworkday.com/redacted/d/home.htmld"
Do While IE.Busy = True Or IE.readyState <> 4: DoEvents: Loop
Set HTMLDoc = IE.document
Set HTMLSel = HTMLDoc.getElementById("88831e18a0894109a83c10bc9a9be6c7")
HTMLSel.Click
End Sub
The block of HTML that I think I need to interact with is shown below.
<div class="GNMRENADFGC GNMRENADBHC GNMRENADHHC" tabindex="-2"
id="88831e18a0894109a83c10bc9a9be6c7" role="menuitem"
aria-posinset="1" aria-setsize="3">
Any pointers or literature would be appreciated.
OK, after some more research I have a solution. I am just looping over all the div tags and checking the innerText property of each until I find the one I want to click.
It works, but if anyone has a more elegant solution I'm all ears. The loop seems a bit wasteful, I would prefer to just refer directly to, and then click the element.
Set ElementsA = IE.document.getElementsByTagName("div")
For Each ElementA In ElementsA
On Error Resume Next
If ElementA.innerText = "User Name and Password login" Then
ElementA.Click
Exit For
End If
Next ElementA
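One possible way to refer to the element more directly is sketched below. It rests on two assumptions that the thread does not confirm: that the original "Object variable not set" error occurred because the menu item had not been rendered yet when getElementById ran, and that the role and aria-posinset attributes shown in the HTML above are stable enough to select on. WaitForElement is a hypothetical helper.

```vba
Option Explicit

' Hypothetical helper: polls the document until cssSelector matches an element
' or the timeout expires. Returns the element, or Nothing on timeout.
Public Function WaitForElement(ByVal doc As Object, _
                               ByVal cssSelector As String, _
                               Optional ByVal timeoutSeconds As Long = 10) As Object
    Dim deadline As Date
    Dim found As Object

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        Set found = doc.querySelector(cssSelector)
        If Not found Is Nothing Then Exit Do
        DoEvents    ' let the browser keep rendering while we wait
    Loop While Now < deadline

    Set WaitForElement = found
End Function

' Example use: select the menu item by its attributes instead of looping over every div.
Public Sub ClickFirstMenuItem(ByVal IE As Object)
    Dim menuItem As Object

    Set menuItem = WaitForElement(IE.document, "div[role='menuitem'][aria-posinset='1']")
    If Not menuItem Is Nothing Then menuItem.Click
End Sub
```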

Using MS Excel VBA to extract data from complex HTML/JS

A short introduction: I consider myself an intermediate VBA coder without any significant HTML experience. I would like to extract data from an HTML/JS webpage using MS Excel VBA. I have spent a couple of hours testing my code on various pages, as well as looking through training materials, forums, and Q&A pages.
I am desperately asking for your help. (Office 2013, IE 11.0.96)
The goal is to get the FX rate from a certain Bloomberg webpage. The long-term goal is to run a macro over various exchange rates and write each working day's rate to an Excel table, but I can handle that part.
I would be happy either with
(1) the current rate (span class="priceText__1853e8a5") or
(2) previous closing (section class="dataBox opreviousclosingpriceonetradingdayago numeric") or
(3) opening rate (section class="dataBox openprice numeric").
My issue is that I cannot fetch the part of the html code where the rate is.
Dim IE As Object
Dim div As Object, holdingsClass As Object, botoes As Object
Dim html As HTMLDocument
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = False
.Navigate "https://www.bloomberg.com/quote/EURHKD:CUR"
Do Until .ReadyState = 4: DoEvents: Loop
End With
Set html = IE.document
Set div = IE.document.getElementById("leaderboard") 'works just fine, populates the objects
Set holdingsClass = IE.document.getElementsByclass("dataBox opreviousclosingpriceonetradingdayago numeric") 'i am not sure is it a class element at all
Set botoes = IE.document.getElementsByTagName("dataBox openprice numeric") 'i am not sure is it a tag name at all
Range("a1").Value = div.textContent 'example how i would place it by using .textContent
Range("A2").Value = holdingsClass.textContent
Range("A3").Value = botoes.textContent
Much appreciate your help!
Instead of digging through html why not use Bloomberg API to request the specific rate?
Likely would be faster and would save you a lot of time in the future doing the same kind of thing.
Please see my similar project, where I created a macro to pull historical FX rates from the European Central Bank.
https://github.com/dmegaffi/VBA-GET-Requests/blob/master/FX%20-%20GET.xlsm
If you right-click the webpage element you want in Chrome and select Inspect, it will bring up the details of that element. You can also press F12 to bring up the HTML of any page. This also works in other browsers.
Is this the element you're looking for?
[Screenshot of the mentioned webpage]
Based on your code above, you could reference this element with IE.document.getElementsByClassName("priceText__1853e8a5"). Elements in HTML can share classes but can't share IDs, so this call returns a collection rather than a single element; if another element also has the class priceText__1853e8a5, you need to pick the right one by index. Then, of course, you have to read the text within the element, since at this point you'd just have the <span> itself and would need the text inside of it.
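As a rough illustration of selecting by class and then reading the text, the sketch below takes the first element in the returned collection. The helper name FirstTextByClass is hypothetical, and it assumes the page loads in a document mode where getElementsByClassName is available.

```vba
Option Explicit

' Hypothetical helper: returns the text of the first element carrying the given
' class, or an empty string when no element matches.
Public Function FirstTextByClass(ByVal doc As Object, ByVal className As String) As String
    Dim matches As Object

    ' The result is a collection, even when only one element has the class.
    Set matches = doc.getElementsByClassName(className)

    If matches.Length > 0 Then
        FirstTextByClass = matches(0).innerText
    End If
End Function
```

For the page in the question this would be called as Range("A1").Value = FirstTextByClass(IE.document, "priceText__1853e8a5").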
Hope this helps.
To address your questions generally, see below.
(1)the current rate (span class="priceText__1853e8a5")
That can be written as a CSS query selector of:
span.priceText__1853e8a5
(2) previous closing (section class="dataBox opreviousclosingpriceonetradingdayago numeric")
That can be written as a CSS query selector of:
.dataBox.opreviousclosingpriceonetradingdayago.numeric
(3) opening rate (section class="dataBox openprice numeric")
That can be written as a CSS query selector of:
.dataBox.openprice.numeric
They are applied with the querySelector method of HTMLDocument, or with querySelectorAll when there is more than one match and you need a match later than the first.
E.g.
Debug.Print IE.document.querySelector("span.priceText__1853e8a5").innerText
If there is more than one match, use querySelectorAll:
Debug.Print IE.document.querySelectorAll("span.priceText__1853e8a5").item(0).innerText
In the above you replace 0 with the appropriate index where your target element is found.
Observing the page, the actual selectors appear to be as follows, but I think this website is probably using ECMAScript syntax that is not supported in legacy browsers such as Internet Explorer, or is attempting blocked cross-domain requests.
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.bloomberg.com/quote/EURHKD:CUR"
While .Busy Or .readyState < 4: DoEvents: Wend
With .document
Debug.Print "Current: " & .querySelector(".priceText__1853e8a5").innerText
Debug.Print "Prev close: " & .querySelector(".value__b93f12ea").innerText
Debug.Print "Open: " & .querySelector(".value__b93f12ea").innerText
End With
.Quit
End With
End Sub
Using Selenium Basic and Chrome the page renders fine:
Option Explicit
Public Sub GetInfo()
Dim d As WebDriver
Set d = New ChromeDriver
Const URL = "https://www.bloomberg.com/quote/EURHKD:CUR"
With d
.Start "Chrome"
.get URL
Debug.Print "Current: " & .FindElementByCss(".priceText__1853e8a5").Text
Debug.Print "Prev close: " & .FindElementByCss(".value__b93f12ea").Text
Debug.Print "Open: " & .FindElementByCss(".value__b93f12ea").Text
.Quit
End With
End Sub

VBA Excel Run time error 438 / getElementbyClassName

I'm a newbie, attempting to web scrape aspect ratio details from the imdb.com website.
I've plundered some code from YouTube and adapted it using Inspect Element.
The code opens imdb and runs a search by title but returns a Run Time error 438.
Ideally I'd like it to return the HTML of the top result so I could then click the top result and follow through to the page with tech details, from where I could get the aspect ratio information and paste it into a cell.
Unfortunately I get a fail from my Click instruction - haven't even got to the point of extracting the aspect ratio info.
Can anyone see where I've gone wrong?
Many thanks,
Nick
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("Title").Row And Target.Column = Range("Title").Column Then
Dim ie As New InternetExplorer
ie.Visible = True
ie.navigate "https://www.imdb.com/find?ref_=nv_sr_fn&q=" & Range("Title").Value
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Dim doc As HTMLDocument
Set doc = ie.document
Dim sDD As String
doc.getElementsByTagName("a").Click
End If
End Sub
So, addressing your code
You can shorten the cell test to Target.Address = Range("Title").Address.
You don't want the first a tag element. You want the first search result a tag element.
You can use a CSS selector combination to get the first search result a tag element as shown below.
I use a CSS selector combination of .result_text a to target elements within parent class result_text with tag a. The . is a class selector.
This combination is known as a descendant selector.
Using the search term Red October in the sheet, this is the first result of the CSS query:
It is a relative link with base string https://www.imdb.com.
Applying it via the querySelector method means only the first match is returned, i.e. the top result.
VBA:
Option Explicit
Private Sub Worksheet_Change(ByVal Target As Range)
Application.EnableEvents = False
If Target.Address = Range("Title").Address Then
Dim ie As New InternetExplorer
ie.Visible = True
ie.navigate "https://www.imdb.com/find?ref_=nv_sr_fn&q=" & Range("Title").value
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Dim doc As HTMLDocument
Set doc = ie.document
doc.querySelector(".result_text a").Click
'other code
End If
Application.EnableEvents = True
End Sub
This line of code:
doc.getElementsByTagName("a")
gives you the Collection of Hyperlinks in your HTML Document. That is, it gives you ALL the elements that match your given criteria, if any are available.
However, some issues may arise:
There may not be any hyperlinks available - So there are no elements to click on.
You are not referencing any element to click. If you want the first one in the collection of found items, you could go with the index, as suggested. Else, you might look for another clicking criteria (such as what is its text or another given attribute).
Even still, a found element might not be clickable by your browser, if, for example, it is shadowed by another element.
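The first two points can be handled by checking the size of the collection and indexing into it before clicking. The sketch below is one way to do that; ClickLinkAt is a hypothetical helper, and it does not address the third point about elements hidden behind others.

```vba
Option Explicit

' Hypothetical helper: clicks the hyperlink at the given zero-based position.
' Returns False when the page has no link at that position.
Public Function ClickLinkAt(ByVal doc As Object, ByVal index As Long) As Boolean
    Dim links As Object

    ' This is a collection; calling .Click on it directly raises error 438.
    Set links = doc.getElementsByTagName("a")

    If index >= 0 And index < links.Length Then
        links(index).Click
        ClickLinkAt = True
    End If
End Function
```

For example, ClickLinkAt doc, 0 clicks the first hyperlink in the document, which may not be the first search result.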

Excel VBA: Webpage HTML not showing when I navigate to new page

I'm trying to create a VBA macro in Excel that:
1. Navigates to a webpage
2. Searches the HTML document for all elements with the tag name "input"
3. Prints the attributes of each element found (name, type, and value)
4. Clicks the button on the webpage to navigate to the second webpage
5. Searches the HTML document on the second page for all elements with the tag name "input"
6. Prints the attributes of each element found (name, type, and value)
Everything works fine up until Step 5. When I try to search the HTML document, for some reason it doesn't search the HTML document of the second page; instead it looks at the HTML of the initial webpage from Step 2 and prints out the same results as in Step 3.
Could you guys please take a look at my code to see what I am doing wrong? I listed my code below and tried to make comments to make it readable.
Sub C_R()
Dim ie As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLInput As MSHTML.IHTMLElement
Dim HTMLButtons As MSHTML.IHTMLElementCollection
Dim HTMLButton As MSHTML.IHTMLElement
'Opens Internet Explorer and navigates to website.
ie.Visible = True
ie.navigate "http://openaccess.sb-court.org/OpenAccess/"
Do While ie.ReadyState <> READYSTATE_COMPLETE
Loop
'Searches HTML(initial page) to find all elements with "input" tag name.
'Prints attributes of each element (name, type, and value).
Set HTMLDoc = ie.Document
Set HTMLButtons = HTMLDoc.getElementsByTagName("input")
Debug.Print "Initial Page"
For Each HTMLButton In HTMLButtons
Debug.Print HTMLButton.getAttribute("name"), HTMLButton.getAttribute("type"), HTMLButton.getAttribute("value")
Next HTMLButton
'Navigates to second page
HTMLButtons(1).Click
Do While ie.ReadyState <> READYSTATE_COMPLETE
Loop
'Searches HTML(second page) to find all elements with "input" tag name.
'Prints attributes of each element (name, type, and value).
Set HTMLDoc = ie.Document
Set HTMLButtons = HTMLDoc.getElementsByTagName("input")
Debug.Print "Second Page"
For Each HTMLButton In HTMLButtons
Debug.Print HTMLButton.getAttribute("name"), HTMLButton.getAttribute("type"), HTMLButton.getAttribute("value")
Next HTMLButton
End Sub
Any help you can provide will be greatly appreciated. Thank you so much.
It looks like the code is continuing past your second loop even though the second webpage hasn't completely loaded.
Do While ie.ReadyState <> READYSTATE_COMPLETE
Loop
To see this, put a break point at the second loop, and wait for the second web page to load. Then continue the code, and it should work fine.
You will need to either add a wait time in the loop, which may not always work, or find another way to tell if the ie.ReadyState is complete.
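One possible "other way" is sketched below: first wait for the browser to leave the old page, then wait for the new one to finish loading, with a timeout so the loop cannot hang forever. This assumes that clicking the button changes the URL, which has not been verified for this site; WaitForNewPage is a hypothetical helper.

```vba
Option Explicit

' Hypothetical helper: waits for a navigation away from previousUrl to complete.
' Returns True when the new page finished loading, False on timeout.
Public Function WaitForNewPage(ByVal ie As Object, _
                               ByVal previousUrl As String, _
                               Optional ByVal timeoutSeconds As Long = 15) As Boolean
    Dim deadline As Date

    deadline = DateAdd("s", timeoutSeconds, Now)

    ' Phase 1: wait until the browser has left the old page.
    ' Assumption: the click leads to a different URL.
    Do While ie.LocationURL = previousUrl
        If Now >= deadline Then Exit Function
        DoEvents
    Loop

    ' Phase 2: wait for the new page to finish loading (4 = READYSTATE_COMPLETE).
    Do While ie.Busy Or ie.readyState <> 4
        If Now >= deadline Then Exit Function
        DoEvents
    Loop

    WaitForNewPage = True
End Function
```

In the code from the question, this would mean storing ie.LocationURL in a variable before HTMLButtons(1).Click, then calling WaitForNewPage(ie, oldUrl) before setting HTMLDoc again.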