Excel VBA: using HTML DOM

I need to know what I can do with the HTML DOM from Excel VBA. For example, I know that I can find an element by its ID with
ie.document.getElementById().
I will be working with an HTML table whose elements have no IDs, so I think the lookup will be something like child -> child -> sibling -> child...
Can anybody please show me a piece of code that gets the text "hello" from this example table? The first node will be found by its ID.
...
<table id="something">
<tr>
<td></td><td></td>
</tr>
<tr>
<td></td><td>hello</td>
</tr>
...

I'm looking at this type of thing at the moment...
I believe something like the line below should do it:
ie.document.getElementById("something").getElementsByTagName("td")(3).innerText
How it works:
It searches the HTML document for the element whose ID is "something" (IDs are meant to be unique; if the page repeats one, the first match is returned). It then collects the TD (table data) elements inside that element and takes the item at index 3, which is where the text is (the index is zero-based, so 0 would be the first TD).
Let us know if this works.
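For reference, here is a minimal sketch of that lookup that can be tried without opening a browser. It assumes a late-bound htmlfile document (created with CreateObject) that is loaded with the sample table from the question; the Sub name GetHelloFromTable is made up for illustration. It also shows the row/cell route through the Rows and Cells collections, which avoids counting TD elements across the whole table.

```vba
Option Explicit

' Hypothetical demo: parses the sample table from the question in memory.
Sub GetHelloFromTable()
    Dim doc As Object
    Dim tbl As Object

    ' Late-bound MSHTML document; no project reference is needed.
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table id=""something"">" & _
        "<tr><td></td><td></td></tr>" & _
        "<tr><td></td><td>hello</td></tr>" & _
        "</table>"

    Set tbl = doc.getElementById("something")
    If tbl Is Nothing Then
        Debug.Print "Table not found"
        Exit Sub
    End If

    ' Route 1: flat, zero-based index over every TD inside the table.
    Debug.Print tbl.getElementsByTagName("td")(3).innerText

    ' Route 2: second row, second cell (both collections are zero-based).
    Debug.Print tbl.Rows(1).Cells(1).innerText
End Sub
```

Both Debug.Print lines target the same cell, so they are expected to show the same text in the Immediate window.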

Many HTML documents identify table elements by name rather than by ID.
I commonly use some form of what is shown below:
Set HTML = IE.Document
' Collect every TD inside the element with ID "something"
Set TDs = HTML.getElementById("something").getElementsByTagName("td")
For Each TD In TDs
If TD.getAttribute("name") = "myname" Then
'code in here if you are looking for a name to the element
ElseIf TD.getAttribute("value") = "myValue" Then
'Code in here if you are looking for a specific value
ElseIf TD.innerText = "mytext" Then
'more code in here if you are looking for a specific inner text
End If
Next TD

Related

How to click on "Submit" button from a website

I'm new to VBA and have stumbled on a problem: I couldn't get VBA code to automatically click a "Submit" button on a website. I have tweaked my code many times, but it always skips the line "e.Click". Below is my latest code and an image of the website's elements.
Hope someone can shed some light here.
Set tags = objIE.Document.getElementById("alltab").getElementsByTagName("a")
For Each e In tags
If e.getAttribute("alt") = "Submit a Contract" Then
e.Click
End If
next
[Image: website's elements]
You can simply use an attribute = value selector for the alt attribute. It is fast, and there is no need to loop over an entire collection if you don't have to. Also, in any loop you would want to Exit For once the element is found.
objIE.document.querySelector("[alt='Submit a Contract']").click
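A slightly more defensive sketch of the same idea: querySelector returns Nothing when no element matches, so checking the result before clicking makes a silent miss visible. The function name ClickSubmitLink is hypothetical, and objIE is assumed to be an InternetExplorer object that has already finished loading the page.

```vba
' Hypothetical helper: returns True if the link was found and clicked.
Function ClickSubmitLink(objIE As Object) As Boolean
    Dim target As Object

    ' Attribute = value selector; the alt text is taken from the question.
    Set target = objIE.document.querySelector("[alt='Submit a Contract']")

    If target Is Nothing Then
        Debug.Print "No element with alt='Submit a Contract' was found"
        ClickSubmitLink = False
    Else
        target.Click
        ClickSubmitLink = True
    End If
End Function
```

If the function reports that nothing was found, the alt text or the page state (for example, a page that is still loading) is the first thing to check.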
Please check your code: from your screenshot, it seems that the hyperlink (the HTML a tag) is not inside the alltab table.
Try to find the table that actually contains the link, either by ID or by class name.
Find the table by its ID (this requires adding an ID attribute to the second table):
Set tags = doc.getElementById("table id").getElementsByTagName("a")
For Each e In tags
If e.getAttribute("alt") = "Submit a Contract" Then
e.Click
End If
Next e
or
Find the table by class name:
Set tags = doc.getElementsByClassName("belowDealButtonBox")(0).getElementsByTagName("a")
For Each e In tags
If e.getAttribute("alt") = "Submit a Contract" Then
e.Click
End If
Next e

How to click on a button on a webpage using <td> and <tr>?

I am trying to click on the first "Completed" button in the highlighted part of the webpage below.
Here is a piece of the code of the web page:
I tried to click on the first "Completed" button in many different ways, such as:
For Each element In ie3.getElementsByTagName("main_table_data_right_border main_table_data_bottom_border")(5)
If element.innerText = "Completed" Then
' Application.Wait (Now + TimeValue("0:03:00"))
element.Click
Application.Wait (Now + TimeValue("0:00:20"))
Exit For
Else
End If
Next
Or
doc.querySelector("#divPage > table.advancedSearch_table > tbody"). _
getElementsByTagName("tr")(3).getElementsByTagName("td")(5).Children(0).Click
But none of them seem to work. When I debug the code and I go through this part and this particular line, nothing really happens. So the button is not being clicked.
Can anyone help me with that?
You could use the getElementsByTagName method to find the hyperlink. Please refer to the following sample:
VBA code to find the hyperlink and click it (in this sample, I only target a specific cell in the first data row. If you want to click every hyperlink, use a For Each statement to loop through the collection).
Sub Test()
Dim ie As Object
Dim doc As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Navigate "http://localhost:54382/HtmlPage47.html"
' Wait until the page has finished loading (READYSTATE_COMPLETE = 4)
Do While ie.Busy Or ie.ReadyState <> 4
DoEvents
Loop
Set doc = ie.document
' Row 1 is the first data row (row 0 is the header); cell 5 is the Status column
doc.getElementsByTagName("tr")(1).getElementsByTagName("td")(5).getElementsByTagName("a")(0).Click
End Sub
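The wait loop in the sample can be factored into a small helper. This is only a sketch: the name WaitForPage and the 30-second default are assumptions, and READYSTATE_COMPLETE is the value 4 used in the sample. It yields with DoEvents so Excel stays responsive, and it gives up after a timeout instead of looping forever.

```vba
' Hypothetical helper: waits until the page is loaded or the timeout expires.
Function WaitForPage(ie As Object, Optional timeoutSeconds As Double = 30) As Boolean
    Const READYSTATE_COMPLETE As Long = 4
    Dim deadline As Date

    deadline = DateAdd("s", CLng(timeoutSeconds), Now)

    Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
        If Now > deadline Then
            WaitForPage = False   ' timed out before the page finished loading
            Exit Function
        End If
        DoEvents                  ' let Excel process pending events
    Loop

    WaitForPage = True
End Function
```

A caller could then write something like "If WaitForPage(ie) Then Set doc = ie.document" and handle the timeout case separately.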
Code in the Web page:
<div>
<table class="main_table" style="text-align:center;">
<tr class="main_table_header">
<td></td>
<td>Export Type</td>
<td>Criteria</td>
<td>Rep./List</td>
<td>Creation Date</td>
<td>Status</td>
<td>Reference</td>
</tr>
<tr class="main_table_data">
<td>
<input id="Checkbox1" type="checkbox" />
</td>
<td>Activites</td>
<td>Process Date from 2019/07/02 to 2019/07/02</td>
<td>For an advanced search</td>
<td>2019/07/03</td>
<td><a onclick="javascript:alert('hello AA')" id="link1" href="#">Completed</a> (601 lines)</td>
<td>662602308</td>
</tr>
<tr class="main_table_data">
<td>
<input id="Checkbox1" type="checkbox" />
</td>
<td>Activites</td>
<td>Process Date from 2019/07/02 to 2019/07/02</td>
<td>For an advanced search</td>
<td>2019/07/03</td>
<td><a onclick="javascript:alert('hello BB')" href="#">Completed</a> (601 lines)</td>
<td>662602308</td>
</tr>
<tr class="main_table_data">
<td>
<input id="Checkbox1" type="checkbox" />
</td>
<td>Activites</td>
<td>Process Date from 2019/07/02 to 2019/07/02</td>
<td>For an advanced search</td>
<td>2019/07/03</td>
<td><a onclick="javascript:alert('hello CC')" href="#">Completed</a> (601 lines)</td>
<td>662602308</td>
</tr>
<tr class="main_table_data">
<td>
<input id="Checkbox1" type="checkbox" />
</td>
<td>Activites</td>
<td>Process Date from 2019/07/02 to 2019/07/02</td>
<td>For an advanced search</td>
<td>2019/07/03</td>
<td><a onclick="javascript:alert('hello DD')" href="#">Completed</a> (601 lines)</td>
<td>662602308</td>
</tr>
</table>
</div>
The result is like this:
I see you are a bit confused as to how to access HTML elements, so I'll take this opportunity to demonstrate the logic of doing so in a very detailed manner, which I also believe to be very intuitive. There are other ways to do it, but I believe the following one is the most comprehensive and intuitive one and ideal for a beginner.
Firstly, I will go ahead and assume that ie3 is an InternetExplorer object.
When you use this object to navigate to a page, you can access the html of that page by using the ie3.document, which holds an HTML document object.
To take full advantage of the HTML document object you should add a reference to the Microsoft HTML Object Library. This Library will allow you to use a number of HTML elements which make your life easier.
In your case, the elements you want to be able to access are:
HTML tables and their rows and cells
HTML anchor elements (<a>)
So my declarations would be the following:
Dim ie3 As New InternetExplorer 'To be used to navigate to the page of interest
Dim doc As HTMLDocument 'this will hold the HTML document corresponding to the page
Dim toBeClicked As HTMLAnchorElement 'To be used to store the <a></a> element
Dim table As HTMLTable 'To be used to store the table element
Dim tableRow As HTMLTableRow 'To be used to store a row of the table element
Dim tableCell As HTMLTableCell 'To be used to store a cell of the table element
Assuming that you have already used ie3 to navigate to the website of interest, you can store its HTML document in doc like so:
Set doc = ie3.document
Once you have access to the HTML document of the webpage, you can also get access to its elements in a number of ways, some more targeted than others. Below I am demonstrating the most common methods to do that, using the table element as an example.
If the table has a unique ID, you can get access to it by using the .getElementById() method. This method returns a single element. In your case, the table you're after, doesn't have an ID.
If the table belongs to a class, you can get access to it by using the .getElementsByClassName() method. This method returns a collection of elements, all of which belong to the same class. To get access to a member of this collection you can use a (item index) kind of notation. The first member has an index of 0. In your case the table belongs to class "advancedSearch_table", which happens to only have one member.
If there's no class or ID, you can use the .getElementsByTagName() method. This method returns a collection of all the elements that have the same tag. In your case you would need all the tables in the document. To get access to a member of this collection you can use a (item index) kind of notation. The first member has an index of 0. Tags in HTML look like so: <tagName attribute="something">Something</tagName>.
Below I demonstrate all three methods. You can use either one of the first two:
Set table = doc.getElementsByClassName("advancedSearch_table")(0)
Set table = doc.getElementsByTagName("table")(0)
Set table = doc.getElementById("ID of the table") 'only for demonstration purposes; it doesn't apply to your case, as the table has no ID.
Keep in mind that in your case, there is only one table in the document and there's only one element that belongs to the class "advancedSearch_table". This means that you need the first element of the corresponding collections. That's why I use 0 as index.
By the same logic as above, now that the table has been stored, you can get access to its rows and cells. More specifically, you need the 5th cell of the 4th row. That's where the link that you want to click is:
Set tableRow = table.getElementsByTagName("tr")(3)
Set tableCell = tableRow.getElementsByTagName("td")(4)
Finally, now that the cell of interest has been stored, you can access the anchor element and click it. Again, there's only one anchor element in the cell, so it's going to be the first one in the corresponding collection:
Set toBeClicked = tableCell.getElementsByTagName("a")(0)
toBeClicked.Click
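Put together, the steps above might look like the following sketch. It uses late binding (As Object) so that it compiles without the library reference, and it checks each step so that a wrong index prints a message instead of raising an error. The function name ClickLinkInCell is hypothetical; the indexes are zero-based, as in the walkthrough.

```vba
' Hypothetical helper: clicks the first <a> in the given row/cell of a table.
' Returns True only if a link was found and clicked.
Function ClickLinkInCell(table As Object, rowIndex As Long, cellIndex As Long) As Boolean
    Dim rowCells As Object
    Dim links As Object

    If table Is Nothing Then Exit Function

    ' Rows and Cells are zero-based collections.
    If rowIndex < 0 Or rowIndex >= table.Rows.Length Then
        Debug.Print "Row index out of range: " & rowIndex
        Exit Function
    End If

    Set rowCells = table.Rows(rowIndex).Cells
    If cellIndex < 0 Or cellIndex >= rowCells.Length Then
        Debug.Print "Cell index out of range: " & cellIndex
        Exit Function
    End If

    Set links = rowCells(cellIndex).getElementsByTagName("a")
    If links.Length = 0 Then
        Debug.Print "No link found in row " & rowIndex & ", cell " & cellIndex
        Exit Function
    End If

    links(0).Click
    ClickLinkInCell = True
End Function
```

For the 5th cell of the 4th row, a call could look like: ClickLinkInCell doc.getElementsByTagName("table")(0), 3, 4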
BONUS
If you want to click on all the "Completed" links, one by one, you need to loop through the corresponding elements. Here are two ways to do it:
Click on the anchor in the 5th cell of each row:
For Each tableRow In table.Rows
Set toBeClicked = tableRow.getElementsByTagName("td")(4).getElementsByTagName("a")(0)
toBeClicked.Click
Next tableRow
Loop through all rows and through all cells of the table, find the inner text that you're looking for and click the corresponding anchor:
For Each tableRow In table.Rows
For Each tableCell In tableRow.Cells
If tableCell.innerText = "Something" Then
Set toBeClicked = tableCell.getElementsByTagName("a")(0)
toBeClicked.Click
End If
Next tableCell
Next tableRow
Here, once you click the "Completed" hyperlink, JavaScript is executed and opens an Excel file. You can call that script directly with ie3.Navigate "javascript:openExcelFile('t83_Kerrfinancialadvisorsinc/455X3/ExportActivity_66260230820190703122002139.xlsx')"
Since it's tied with a hyperlink, you can also try using
element.Click
element.FireEvent ("onclick")
or you can use execScript
Call ie3.document.parentWindow.execScript("your script in webpage", "JavaScript")

VBA to click a dynamic href

I'm trying to click a link on a website with the tag:
<a href="/dbget-bin/www_bget?dr:D01441">D01441</a>
However, I'm doing this after searching for a unique item (I have an array of >9000 unique items), and the "D01441" part is different for each item, and I don't know in advance what it will be for each. The following code is in a loop that goes through each item and searches for it one at a time. After searching, I would like to click on a link that appears (the code above) and do more things on that next web page.
Dim IE As Object
Dim ele As Object
Set IE = CreateObject("InternetExplorer.Application")
...
For Each ele In IE.document.getElementsByTagName("a")
If ele.Href = "/dbget-bin/www_bget?dr:D01441" Then
ele.Click
Exit For
End If
Next
The above code doesn't work and I'm not sure why. But once I get it to work, I don't know how to modify the "D01441" part so that I can click on any searched item's link. Here's more html around the link I want:
<tbody>
<tr> ... </tr>
<tr>
<td class = "data1">
<a href = "/dbget-bin/www_bget?dr:D01441">D01441</a>
</td>
<td class = "data1">..</td>
<td class = "data1">..</td>
...
EDIT: To try to deal with the changing "D01441", I tried using InStr but it doesn't work either:
For Each ele In IE.document.getElementsByTagName("a")
If InStr(ele.Href, "/dbget-bin/www_bget?dr:") = 1 Then
MsgBox "There"
ele.Click
Exit For
End If
Next
CSS selectors:
Try using a CSS selector combination applied via querySelector method of document to target the common start part of the href.
Applying the selector combination:
IE.document.querySelector("a[href^='/dbget-bin/www_bget?dr:']").Click
Understanding the selector combination:
This uses a CSS selector combination to target the element with:
a[href^='/dbget-bin/www_bget?dr:']
This selects an element with an a tag whose href attribute value starts with
'/dbget-bin/www_bget?dr:'. The ^ means "starts with".
Query in action:
Here is the selector in action on your HTML sample:
Side note:
If you have multiple elements with a tags and an href that starts with /dbget-bin/www_bget?dr:, it will match the first one, in most instances. If that is the case seeing more HTML would help. I think there are a few problems with that HTML sample because in theory a more selective CSS query might be .data1 a[href^='/dbget-bin/www_bget?dr:'], so as to include the parent element class of data1, "." being a class selector.
@QHarr's answer is the elegant and best solution, but...
To address your issue of getting the part number from the href, you can use InStrRev like this:
Dim partNumber As String
Dim colonPosition As Long
For Each ele In IE.document.getElementsByTagName("a")
' href is returned as a full URL (e.g. "https://..."), so look for the last colon
If InStr(1, ele.href, "www_bget?dr:", vbTextCompare) > 0 Then
colonPosition = InStrRev(ele.href, ":")
partNumber = Mid$(ele.href, colonPosition + 1)
Debug.Print partNumber
End If
Next ele
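As a possible explanation for why the loops in the question never match: in Internet Explorer's DOM, the href property of an anchor returns the fully resolved URL (including scheme and host) and not the relative text written in the HTML, so neither the equality test nor the InStr(...) = 1 test succeeds. The sketch below looks for the fragment anywhere in the URL instead. The names FindLinkByHrefPart and ClickFirstDrugLink are hypothetical.

```vba
' Hypothetical helper: returns the first <a> whose resolved href contains
' hrefPart, or Nothing if there is no such link.
Function FindLinkByHrefPart(doc As Object, hrefPart As String) As Object
    Dim anchor As Object

    For Each anchor In doc.getElementsByTagName("a")
        ' "> 0" rather than "= 1": the fragment is not at the start of a full URL.
        If InStr(1, anchor.href, hrefPart, vbTextCompare) > 0 Then
            Set FindLinkByHrefPart = anchor
            Exit Function
        End If
    Next anchor

    Set FindLinkByHrefPart = Nothing
End Function

' Hypothetical usage: IE is assumed to be a loaded InternetExplorer object.
Sub ClickFirstDrugLink(IE As Object)
    Dim link As Object

    Set link = FindLinkByHrefPart(IE.document, "/dbget-bin/www_bget?dr:")
    If link Is Nothing Then
        Debug.Print "No matching link on this page"
    Else
        Debug.Print "Clicking " & link.innerText
        link.Click
    End If
End Sub
```

Because the match is on the common prefix only, the same call works for every searched item regardless of the ID that follows "dr:".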

Get data from a web table with table tag

I have this code in HTML:
<table cellspacing = "0" cellpadding = "0" width = "100%" border="0">
<td class="TOlinha2"><span id="Co">140200586125</span>
I already have a VBA function that accesses a web site, logs in and goes to the right page. Now I'm trying to take the td tags inside a table in HTML. The value I want is 140200586125, but I want a lot of td tags, so I intend to use a for loop to get those tds and put them in a worksheet.
I have tried both:
.document.getElementByClass()
and:
.document.getElementyById()
but neither worked.
Appreciate the help. I'm from Brazil, so sorry about any English mistakes.
There is not enough HTML to determine if the TOlinha2 is a consistent class name for all the tds within the table of interest; and is limited only to this table. If it is then you can indeed use .querySelectorAll
You could use the CSS selector:
ie.document.querySelectorAll(".TOlinha2")
Where "." stands for className.
You cannot iterate over the returned NodeList with a For Each Loop. See my question Excel crashes when attempting to inspect DispStaticNodeList. Excel will crash and you will lose any unsaved data.
You have to loop over the length of the nodeList, e.g.
Dim nodeList As Object
Dim i As Long
Set nodeList = ie.document.querySelectorAll(".TOlinha2")
For i = 0 To nodeList.Length - 1
Debug.Print nodeList(i).innerText
Next i
Sometimes you need different syntax which is:
Debug.Print nodeList.Item(i).innerText
You can narrow this CSS selector further with more qualifying elements; for example, requiring that the element is inside a tbody (i.e. a table), inside a tr (table row), and has the class name TOlinha2:
ie.document.querySelectorAll("tbody tr .TOlinha2")
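Since the goal in the question is to put the values in a worksheet, the index-based loop can write straight into cells. The following is a sketch under assumptions: the Sub name NodeListToColumn is hypothetical, the target is column A starting at row 1, and the cells are formatted as text so that long numeric strings such as 140200586125 are not converted to numbers.

```vba
' Hypothetical helper: writes each node's text into column A of a worksheet.
Sub NodeListToColumn(nodeList As Object, ws As Worksheet)
    Dim i As Long

    ' Index loop rather than For Each; Length is the number of matched nodes.
    For i = 0 To nodeList.Length - 1
        With ws.Cells(i + 1, 1)
            .NumberFormat = "@"                      ' keep the value as text
            .Value = nodeList.Item(i).innerText
        End With
    Next i
End Sub
```

A call could look like: NodeListToColumn ie.document.querySelectorAll(".TOlinha2"), ActiveSheet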
Since you mentioned you need to retrieve multiple <td> tags, it makes more sense to retrieve the entire collection than to use getElementById() to get them one at a time.
Based on your HTML above, this would match all <span> nodes directly inside a <td> with class='TOlinha2':
Dim nodeList As Object
Dim i As Long
Set nodeList = ie.document.querySelectorAll("td.TOlinha2 > span")
' Index loop: For Each over this kind of node list is unreliable in VBA
For i = 0 To nodeList.Length - 1
MsgBox nodeList.Item(i).innerText ' This should return the text within the <span>
Next i

extracting text from a specific <h> element using GetElementById

I have created a VBS script file that reads an XML data file.
Within the XML data file, the HTML data I need is embedded in a CDATA section:
<![CDATA[ 'other interesting HTML data here' ]]>
I have extracted this HTML data using XPath and inserted it into a Div element (myDiv) that is held in a variable (it's not written to a document).
So, for example, the contents of myDiv.innerHTML look like this:
<table>
<tr><td>text in cell 1</td></tr>
<tr><td><h1 id="myId1">my text for H1</h1></td></tr>
<tr><td><h2 id="myId2">my text for h2</h2></td></tr>
</table>
What I want to do at first is simply select the tag whose Id matches "myId1", so I used a statement like this:
MyIdText = MyDiv.getElementById("myId1")
However, the application I am using says "Err 438, Object doesn't support this property or method".
I am a bit of a newbie with code and can understand some of the basic fundamentals, but I get a bit lost when it becomes more complex (sorry). I have looked through other postings on this board, and all of them seem to relate to HTML and JavaScript, not VBScript (the application I am using will not allow JavaScript).
Am I using the code wrong?
To use getElementById() you should write: document.getElementById("myId1"). This way you tell the browser to search inside 'document' for the specified ID. getElementById is a method of the document object, not of individual elements such as your Div, so calling it on MyDiv generates the above error.
To extract the text inside the specific H element:
MyIdText = document.getElementById("myId1").textContent
Many thanks for the help. Unfortunately, I know a little VBS and even less about the DOM, and I am trying to learn both by experimenting. There are certain restrictions within the environment/application I am working with (it's called ASCE and it's a tool for managing Safety Cases, but that's not important right now).
However, so that we are comparing apples with apples, I have tried to experiment within an HTML page to get a better understanding of what the DOM/VBS commands can actually do. I have had some partial success, but still can't understand why it falls over where it does.
Here is the exact file I am experimenting with; I have added comment text for each section:
<html>
<head>
<table border=1>
<tr>
<td>text in cell 1</td>
</tr>
<tr>
<td><h1 id="myId1">my text for H1</h1></td>
</tr>
<tr>
<td><h2 id="myId2">my text for h2</h2></td>
</tr>
</table>
<script type="text/vbscript">
DoStuff
Sub DoStuff
' Section 1: Get a node with the Id value of "myId1" from the above HTML
' and assign it to the variable 'GetValue'
' This works fine :-)
Dim GetValue
GetValue = document.getElementById("myId1").innerHTML
MsgBox "the text=" & GetValue
' Section 2: Create a query that assigns all of the <h1> tags in the document
' to the variable 'MyH1Tags'.
' I assumed that this would be a collection of <h1> tags, so I set up a loop to iterate
' through however many there were, but this fails as the browser says that this object
' doesn't support this property or method - This is where I am stuck
Dim MyH1Tags
Dim H1Tag
MyH1Tags = document.getElementsByTagName("h1") ' this works
For Each H1Tag in MyH1Tags ' this is where it falls over
MSgbox "Hello"
Next
' Section 3: Create a new Div element 'NewDiv' and then insert some HTML 'MyHTML'
' into 'NewDiv'. Create a query 'MyHeadings' that extracts all h1 headings from 'NewDiv'
' then loop round for however many h1 headings there are in 'MyHeadings'
' and display the text content. This works Ok
Dim NewDiv
Dim MyHTML
Dim MyHeadings
Dim MyHeading
Set NewDiv = document.createElement("DIV")
MyHTML="<h1 id=""a"">heading1</h1><h2 id=""b"">Heading2</h2>"
NewDiv.innerHTML=MyHTML
Set MyHeadings = NewDiv.getElementsByTagName("h1")
For Each MyHeading in MyHeadings
Msgbox "MyHeading=" & MyHeading.innerHTML
Next
'Section 4: Do a combination of Section 1 (that works) and Section 3 (that works)
' by creating a new Div element 'NewDiv2' and then paste into it some HTML
' 'MyHTML2' and then attempt to create a query that extracts the inner HTML from
' an id attribute with the value of "a". But this doesn't work either.
' I have tried "Set MyId = NewDiv2.getElementById("a").innerHTML" and
' also tried "Set MyId = NewDiv2.getElementById("a")" and it always falls over
' at the same line.
Dim NewDiv2
Dim MyHTML2
Dim MyId
Set NewDiv2 = document.createElement("DIV")
MyHTML2="<h1 id=""a"">heading1</h1><h2 id=""b"">Heading2</h2>"
NewDiv2.innerHTML=MyHTML2
MyId = NewDiv2.getElementById("a").innerHTML
End Sub
</script>
</head>
<body>