Get values from a website with same ID - html

I'm trying to write a local VBScript that reads some values from a webpage. I know that I can use the following code to get a specific element:
IE.document.GetElementById("id-to-find")
My problem is that the same ID ("hiddencardetailsenrollid") appears on more than one element, so I need to extract all of them. This is the markup that repeats:
carId: <span id="hiddencardetailscarid">10972203</span>,
enrollId: <span id="hiddencardetailsenrollid">11147540</span>.
Do you have any suggestions for doing this? I thought of using a For loop to read through the whole HTML document, but I do not know how to approach it.
Any help will be appreciated.
Edit: Here is the full screenshot of the source code. As you can see, the elements have exactly the same labels, but carId and enrollId have different values. I can't copy and paste the code; Stack Overflow returns an error (I suppose because of the "table" tag):

If you did have multiple elements with the same ID, which you shouldn't, you could use the answer from this question (courtesy of @peter) and slightly modify it:
Dim HTMLDoc, XML, URL, table
Set HTMLDoc = CreateObject("HTMLFile")
Set XML = CreateObject("MSXML2.XMLHTTP")
URL = "http://www.verizonwireless.com/b2c/store/controller?item=phoneFirst&action=viewPhoneDetail&selectedPhoneId=5723"
With XML
.Open "GET", URL, False
.Send
HTMLDoc.Write .responseText
End With
Set spans = HTMLDoc.getElementsByTagName("span")
for each span in spans
WScript.Echo span.innerHTML
next
'=><SPAN>Set Location</SPAN>
'=>Set Location
'=><SPAN>Submit</SPAN>
'=>Submit
Note that getElementById returns a single element rather than a collection, so you cannot simply swap it in for getElementsByTagName and loop over the result. Instead, keep the getElementsByTagName("span") loop and, inside it, compare each span's id property with "hiddencardetailsenrollid". But again, you should not have multiple HTML elements with the same id.
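Because getElementById hands back only one element, one way to collect every duplicate is to walk all the span elements and filter on the id. The following is a minimal sketch under stated assumptions, written as late-bound VBA (in VBScript, drop the "As ..." clauses and use WScript.Echo instead of Debug.Print). The sample markup and the second pair of values are made up for illustration; only the id names come from the question. With a live browser you would loop over IE.document instead of the HTMLFile object.

```vba
Public Sub ListEnrollIds()
    Dim doc As Object, span As Object
    Dim html As String

    ' Hypothetical sample page: the same two ids appear twice.
    html = "<html><body>" & _
           "<span id=""hiddencardetailscarid"">10972203</span>" & _
           "<span id=""hiddencardetailsenrollid"">11147540</span>" & _
           "<span id=""hiddencardetailscarid"">10000001</span>" & _
           "<span id=""hiddencardetailsenrollid"">10000002</span>" & _
           "</body></html>"

    Set doc = CreateObject("HTMLFile")
    doc.write html
    doc.Close

    ' Walk every <span> and keep only those whose id matches.
    For Each span In doc.getElementsByTagName("span")
        If span.ID = "hiddencardetailsenrollid" Then
            Debug.Print span.innerText
        End If
    Next span
End Sub
```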

Related

Obtain Innertext from Web Element with Variable Path - Selenium

I have a VBA macro that I'm running in Excel 2016. The macro brings back information from the internet using Chrome and Selenium WebDriver. The macro iterates through several similar webpages, but some pages have a few more or less lines than others. Hence, the XPath to the innertext I'm interested in varies slightly from page to page. Here is a snippet of the source code for the element, it is the "242" that I'm trying to locate and extract.
<div ng-repeat="squarefootage in improvement.SquareFootage" class="ng-scope">
<div>
<span class="labelSquareFootage ng-binding">ATTACHED GARAGE AREA </span><span class="result ng-binding">242</span>
</div>
</div>
As a workaround I'm just grabbing the entire source code for the page and then parsing it with INSTR to find what I'm looking for. I was wondering if there was a more elegant method to find an element with a variable path? Is there something in WebDriver that would work like
WDriver.FindElementbyInnerHTML
?
Here is a link to the website, you can look at a few different addresses and see how the path changes from page (address) to page (next address).
You could gather all nodes with the matching class, loop until the desired garage text is found, and then take the nextSibling:
Public Sub Demo()
'Your code to get to page and enter address and search, open heading, then....
Dim html As MSHTML.HTMLDocument
Set html = New MSHTML.HTMLDocument
html.body.innerHTML = WDriver.PageSource
Dim nodes As Object, node As Object, i As Long
Set nodes = html.querySelectorAll(".labelSquareFootage")
For i = 0 To nodes.Length - 1
Set node = nodes.Item(i)
If InStr(node.innerText, "ATTACHED GARAGE AREA") > 0 Then
Debug.Print node.NextSibling.innerText
Exit For
End If
Next i
End Sub
For xpath, you could try
//*[text()[contains(.,'ATTACHED GARAGE AREA')]]/following-sibling::span
if the desired value is the next span node. This searches for the desired text in the .innerText then takes the nextSibling span.
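As a rough sketch, the XPath could be applied directly through the driver. This assumes WDriver is a SeleniumBasic driver object, which exposes FindElementByXPath and a Text property on the returned element; other Selenium wrappers for VBA may name these differently. PrintGarageArea is a hypothetical helper name.

```vba
Public Sub PrintGarageArea(ByVal WDriver As Object)
    ' XPath from the answer: find the node whose text contains the label,
    ' then step to the span that follows it.
    Const GARAGE_XPATH As String = _
        "//*[text()[contains(.,'ATTACHED GARAGE AREA')]]/following-sibling::span"

    ' Assumption: SeleniumBasic-style API (FindElementByXPath, .Text).
    Debug.Print WDriver.FindElementByXPath(GARAGE_XPATH).Text
End Sub
```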

Concatenate Rich Text Fields (HTML) and display result on Access form

I have an access database which deals with "articles" and "items" which are all textual stuff. An article is composed of several items. Each item has a rich text field and I wish to display the textual content of an article by concatenating all rich text fields of its items.
I have written a VBA program which concatenates the items' rich text fields and feeds the result into an unbound TextBox control on my form (Textbox.Text = resulting string), but it does not work: I get an error message saying "this property parameter is too long".
If I try to feed a single text field into the TextBox control, I get another error stating "Impossible to update the recordset", which I do not understand. Which recordset is this about?
Each item field is typically something like this (I use square brackets instead of "<" and ">" because otherwise the post does not display properly): "[div][font ...]Content[/font] [/div]", with "[em]" tags also included.
Given this problem, I have a number of questions:
1) How do you feed an HTML string into an unbound TextBox control?
2) Is it OK to concatenate these HTML strings, or should I modify the tags, for example keep only one "[div]" block instead of several in a row (suppress the intermediate div tags)?
3) What control should I use to display the result?
You might well answer that I could use a subform displaying the different items that make up an article. Yes, but it is impossible to have a variable height for each item, and reading the whole article that way is very cumbersome.
Thank you for any advice you may provide.
It works for me with a simple function:
Public Function ConcatHtml()
Dim RS As Recordset
Dim S As String
Set RS = CurrentDb.OpenRecordset("tRichtext")
Do While Not RS.EOF
' Visually separate the records, it works with and without this line
If S <> "" Then S = S & "<br>"
S = S & RS!rText & vbCrLf
RS.MoveNext
Loop
RS.Close
ConcatHtml = S
End Function
and an unbound textbox with control source =ConcatHtml().
In your case you'd have to add the article foreign key as parameter to limit the item records you concatenate.
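A sketch of that parameterised variant might look as follows. The table tRichtext and the field rText are taken from the function above; the foreign key field name ArticleID and the function name ConcatHtmlForArticle are assumptions made for illustration, so substitute your own names.

```vba
Public Function ConcatHtmlForArticle(ByVal ArticleID As Long) As String
    Dim RS As DAO.Recordset
    Dim S As String

    ' Assumption: tRichtext has a numeric foreign key field named ArticleID.
    Set RS = CurrentDb.OpenRecordset( _
        "SELECT rText FROM tRichtext WHERE ArticleID = " & ArticleID, _
        dbOpenSnapshot)

    Do While Not RS.EOF
        ' Visually separate the items of the article.
        If S <> "" Then S = S & "<br>"
        S = S & RS!rText & vbCrLf
        RS.MoveNext
    Loop
    RS.Close

    ConcatHtmlForArticle = S
End Function
```

The unbound textbox could then use a control source such as =ConcatHtmlForArticle([ArticleID]), assuming the form exposes the article key under that name.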
The "rich text" feature of a textbox is only intended for simple text.
We use the web browser control to display a larger amount of HTML text, and load it like this:
Private Sub Form_Current()
LoadWebPreview
End Sub
Private Sub HtmlKode_AfterUpdate()
LoadWebPreview
End Sub
Private Sub LoadWebPreview()
' Let the browser control finish the rendering of its standard content.
While Me!WebPreview.ReadyState <> acComplete
DoEvents
Wend
' Avoid the pop-up warning about running scripts.
Me!WebPreview.Silent = True
' Show body as it would be displayed in Outlook.
Me!WebPreview.Document.body.innerHTML = Me!HtmlBody.Value
End Sub

VBA Excel Run time error 438 / getElementbyClassName

I'm a newbie, attempting to web scrape aspect ratio details from the imdb.com website.
I've plundered some code from YouTube and adapted it using Inspect Element.
The code opens imdb and runs a search by title but returns a Run Time error 438.
Ideally I'd like it to return the HTML of the top result, so that I could then click the top result, follow it through to the page with the technical details, read the aspect ratio information from there and paste it into a cell.
Unfortunately I get a fail from my Click instruction - haven't even got to the point of extracting the aspect ratio info.
Can anyone see where I've gone wrong?
Many thanks,
Nick
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("Title").Row And Target.Column = Range("Title").Column Then
Dim ie As New InternetExplorer
ie.Visible = True
ie.navigate "https://www.imdb.com/find?ref_=nv_sr_fn&q=" & Range("Title").Value
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Dim doc As HTMLDocument
Set doc = ie.document
Dim sDD As String
doc.getElementsByTagName("a").Click
End If
End Sub
So, addressing your code:
You can shorten the row and column comparison to Target.Address = Range("Title").Address.
You don't want the first a tag element on the page. You want the a tag element of the first search result.
You can use a CSS selector combination to get that element, as shown below.
I use the CSS selector combination .result_text a to target elements with tag a inside a parent with class result_text. The . is a class selector.
This combination is known as a descendant selector.
With the search term Red October in the sheet, the first result of the CSS query is a relative link whose base string is https://www.imdb.com.
Applying the selector via the querySelector method means only the first matched result is returned, i.e. the top result.
VBA:
Option Explicit
Private Sub Worksheet_Change(ByVal Target As Range)
Application.EnableEvents = False
If Target.Address = Range("Title").Address Then
Dim ie As New InternetExplorer
ie.Visible = True
ie.navigate "https://www.imdb.com/find?ref_=nv_sr_fn&q=" & Range("Title").value
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Dim doc As HTMLDocument
Set doc = ie.document
doc.querySelector(".result_text a").Click
'other code
End If
Application.EnableEvents = True
End Sub
This line of code:
doc.getElementsByTagName("a")
gives you the Collection of Hyperlinks in your HTML Document. That is, it gives you ALL the elements that match your given criteria, if any are available.
However, some issues may arise:
There may not be any hyperlinks available, so there are no elements to click on.
You are not referencing any single element to click. Calling .Click on the collection itself is what raises run-time error 438, because the collection has no Click method. If you want the first item in the collection, you could use its index, as suggested. Otherwise, you might pick the element by another criterion (such as its text or another attribute).
Even then, a found element might not be clickable in the browser if, for example, it is covered by another element.
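The first two points can be handled with a small guard before clicking. This is only a sketch: ClickFirstLink is a hypothetical helper, and doc is assumed to be the HTMLDocument obtained from ie.document in the question.

```vba
Public Sub ClickFirstLink(ByVal doc As Object)
    Dim links As Object

    ' getElementsByTagName returns a collection, never a single element.
    Set links = doc.getElementsByTagName("a")

    If links.Length = 0 Then
        Debug.Print "No hyperlinks found on the page."
    Else
        ' Index into the collection to get one element, then click it.
        links.Item(0).Click
    End If
End Sub
```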

Retrieving the text between the <div> with VBA

I am trying to get a text string from inside a div on a webpage, but I can't seem to figure out how it is stored in the element.
Set eleval = objIE.Document.getElementsByClassName("outputValue")(0)
Debug.Print (eleval.innerText)
I have tried this and variations thereof, but my string just reads as "".
I mainly need help with how this type of data is referenced in VBA.
<div class="outputValue">"text data that I want"</div>
Here is a screenshot of the page in question, I cannot give a link since it requires a company login to reach.
With the .querySelector method, make sure the page is fully loaded before attempting the lookup.
Example delays can be added with Application.Wait Now + TimeSerial(h, m, s)
Set eleval = objIE.Document.querySelector("div[class='outputValue']")
Debug.Print eleval.innerText
If it is the first of its className on the page you could also use:
Set eleval = objIE.Document.querySelector(".outputValue")
If there is more than one and it is at a later index you can use
Set eleval = objIE.Document.querySelectorAll(".outputValue")
And then access items from the nodeList returned with
Debug.Print eleval.Item(0).innerText 'or replace 0 with the appropriate index.
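As an alternative to a fixed delay, a common pattern is to poll the browser until it reports that loading has finished. The sketch below assumes objIE is the InternetExplorer object from the question; WaitForPage is a hypothetical helper name. Content that the page injects later via script may still need an extra wait.

```vba
Public Sub WaitForPage(ByVal objIE As Object)
    ' Value of READYSTATE_COMPLETE, declared locally so that the sketch
    ' also compiles without a reference to Microsoft Internet Controls.
    Const READYSTATE_COMPLETE As Long = 4

    ' Yield to the browser until the document is reported as complete.
    Do While objIE.Busy Or objIE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub
```

Calling WaitForPage objIE right after objIE.navigate and before the querySelector lines would be one way to use it.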
Dim elaval as Variant
elaval = Trim(Doc.getElementsByTagName("div")(X).innerText)
msgbox elaval
where X is the zero-based index of your div among all the div elements on the page.

VBA Excel Scraping

I am getting started with learning about scraping. I have a page that is behind a login, and I remember reading that you should not rely on the (1), (2) or (3) index after getElementsByTagName, but should rather target something more unique such as a class name or ID. Can someone please tell me why the getElementsByTagName version works:
Dim Companyname As String
Companyname = ie.document.getElementsByTagName("span")(1).innertext
but the getElementsByClassName version does not work:
Dim Companyname As String
Companyname = ie.document.getElementsByClassName("account-website-name").innertext
This is the text that I am scraping
<span class="account-website-name" data-journey-name="true">Dwellington Journey</span>
getELEMENTbyProperty vs getELEMENTSbyProperty
There are primarily two distinct types of commands to retrieve one or more elements from a web page's .Document; those that return a single object and those that return a collection of objects.
Getting an ELEMENT
When getElementById is used, you are asking for a single object (e.g. MSHTML.IHTMLElement). In this case the properties (e.g. .Value, .innerText, .outerHtml, etc.) can be retrieved directly. There isn't supposed to be more than a single unique id property within an HTML body, so this function should safely return the only element within the ie.document that matches.
'typical VBA use of getElementById
Dim CompanyName As String
CompanyName = ie.document.getElementById("CompanyID").innerText
Caveat: I've noticed a growing number of web designers who seem to think that using the same id for multiple elements is oh-key-doh-key as long as the id's are within different parent elements like different <div> elements. AFAIK, this is patently wrong but seems to be a growing practise. Be careful on what is returned when using .getElementById.
Getting ELEMENTS
When using getElementsByTagName, getElementsByClassName, etc., where the word Elements is plural, you are returning a collection (e.g. MSHTML.IHTMLElementCollection) of objects, even if that collection contains only one element or none at all. If you want to use these to directly access a property of one of the elements within the collection, an ordinal index number must be supplied so that a single element within the collection is referenced. The index number within these collections is zero-based (i.e. the first element is at (0)).
'retrieve the text from the third <span> element on a webpage
Dim CompanyName As String
CompanyName = ie.document.getElementsByTagName("span")(2).innerText
'output all <span> classnames to the Immediate window until the right one comes along
'retrieve the text from the first <span> element with a classname of 'account-website-name'
Dim e as long, es as long
es = ie.document.getElementsByTagName("span").Length - 1
For e = 0 To es
Debug.Print ie.document.getElementsByTagName("span")(e).className
If ie.document.getElementsByTagName("span")(e).className = "account-website-name" Then
CompanyName = ie.document.getElementsByTagName("span")(e).innerText
Exit For
End If
Next e
'same thing, different method
Dim eSPN as MSHTML.IHTMLElement, ecSPNs as MSHTML.IHTMLElementCollection
Set ecSPNs = ie.document.getElementsByTagName("span")
For Each eSPN in ecSPNs
Debug.Print eSPN.className
If eSPN.className = "account-website-name" Then
CompanyName = eSPN.innerText
Exit For
End If
Next eSPN
Set eSPN = Nothing: Set ecSPNs = Nothing
To summarize, if the document method you call uses Elements (plural) rather than Element (singular), it returns a collection, and you must append an index number if you wish to treat one of the elements within the collection as a single element.
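Applied to the class-name call from the question, that means indexing into the collection before reading innerText. A minimal sketch, assuming ie is the InternetExplorer object from the question and that the page renders in a document mode where getElementsByClassName is available; GetCompanyName is a hypothetical helper name.

```vba
Public Function GetCompanyName(ByVal ie As Object) As String
    Dim matches As Object

    ' Plural "Elements": this is a collection, even for a single match.
    Set matches = ie.document.getElementsByClassName("account-website-name")

    ' Take the first match by its zero-based index, if there is one.
    If matches.Length > 0 Then
        GetCompanyName = matches(0).innerText
    End If
End Function
```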
CSS selector:
You can achieve the same thing with a CSS selector of .account-website-name
The "." means className. This will return a collection of matching elements if there are more than one.
VBA:
You apply the selector with the .querySelectorAll method of .document. This returns a nodeList which you traverse the .Length of, accessing items by index, starting from 0.
Dim aNodeList As Object, i As Long
Set aNodeList = ie.document.querySelectorAll(".account-website-name")
For i = 0 To aNodeList.Length -1
Debug.Print aNodeList.Item(i).innerText
' Debug.Print aNodeList(i).innerText ''<== sometimes this syntax instead
Next