Traverse an HTML DOM table with VBA

I have an HTML table something like this:
<table class="Example">
<tr>
<th>Header1</th>
<td>Value1</td>
</tr>
<tr>
<th>Header2</th>
<td>Value2</td>
</tr>
</table>
I want to find the <th> equal to "Header2" and then return the corresponding <td>, i.e. "Value2", which is inside the same <tr>.
I know I can easily use an index number, e.g. getElementsByTagName("td")(1), to find this value, but this is not feasible because the rows may be in a different order on each page.
I've tried doing this in various ways with no success. Hopefully, the following code indicates what I'm trying to do:
Public Declare PtrSafe Function SetForegroundWindow Lib "user32" (ByVal hWnd As LongPtr) As Long
Sub WebSearch()
Dim URL As String
Dim IE As Object
Dim HWNDSrc As LongPtr
Dim html As HTMLDocument
Dim Example As IHTMLElement
Dim TableRows As IHTMLElementCollection
Dim TableRow As IHTMLElement
Dim RowChildren As IHTMLElementCollection
Dim RowChild As IHTMLElement
Dim TableHeader As IHTMLElement
Dim TableData As IHTMLElement
URL = "https://..."
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.Navigate URL
Do While IE.Busy: DoEvents: Loop
Do Until IE.ReadyState = 4: DoEvents: Loop
HWNDSrc = IE.HWND
SetForegroundWindow HWNDSrc
Set html = IE.document
On Error Resume Next
Set Example = html.getElementsByClassName("Example")(0)
'''''''' Trying to get Result
Set TableRows = Example.Children
For Each TableRow In TableRows
Set RowChildren = TableRow.Children
For Each RowChild In RowChildren
Set TableHeader = RowChild.getElementsByTagName("th")(0)
Set TableData = TableHeader.NextSibling
If TableHeader.innerText = "Header2" Then MsgBox TableData.innerText
Next
Next
IE.Quit
Set IE = Nothing
Application.StatusBar = ""
End Sub
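A minimal sketch of one way to do this, not taken from the original thread: rather than following NextSibling (which can return the whitespace text node between the </th> and <td> tags) or iterating the table's Children (in the parsed DOM these are usually an implicit <tbody>, not the rows), loop over every <tr> in the table and compare the text of its first <th>. GetValueByHeader and DemoGetValueByHeader are hypothetical names, and late binding is used so no extra library reference is assumed.

```vba
' Hypothetical helper: returns the text of the first <td> in the row whose
' first <th> text equals headerText, or "" if no row matches.
Public Function GetValueByHeader(ByVal tbl As Object, ByVal headerText As String) As String
    Dim tr As Object, headerCells As Object, dataCells As Object
    For Each tr In tbl.getElementsByTagName("tr")
        Set headerCells = tr.getElementsByTagName("th")
        Set dataCells = tr.getElementsByTagName("td")
        ' Only rows that have both a header cell and a data cell can match
        If headerCells.Length > 0 And dataCells.Length > 0 Then
            If Trim$(headerCells.Item(0).innerText) = headerText Then
                GetValueByHeader = Trim$(dataCells.Item(0).innerText)
                Exit Function
            End If
        End If
    Next tr
End Function

' Demo on an in-memory copy of the table from the question
Public Sub DemoGetValueByHeader()
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table class=""Example"">" & _
        "<tr><th>Header1</th><td>Value1</td></tr>" & _
        "<tr><th>Header2</th><td>Value2</td></tr></table>"
    Debug.Print GetValueByHeader(doc.getElementsByTagName("table").Item(0), "Header2")
End Sub
```

In the question's macro, the nested loop could then be replaced with something like MsgBox GetValueByHeader(Example, "Header2"), where Example is the table found by getElementsByClassName.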

Related

Web scraping: IE Navigate method works vs. MSXML2.XMLHTTP60 not working

I am pulling data from the NSE site.
The URL is: https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=28MAY2020#
I can successfully extract the item using Internet Explorer, but this method is slow,
so I moved to the MSXML2.XMLHTTP60 method. However, this method returns an empty string.
Please find my code below:
Method 1: works fine
Sub OI_Slow_Method()
Dim ie As InternetExplorer
Set ie = New InternetExplorer
Dim Link As String
Link = ActiveSheet.Range("C4").Value
ie.Visible = False
ie.navigate Link
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Dim doc As HTMLDocument
Set doc = ie.document
Dim objElement As HTMLObjectElement
Dim sDD As String
doc.Focus
ActiveSheet.Cells(1, 1).Value = doc.getElementById("openInterest").innerText 'Open Interest Value
ie.Quit
Set doc = Nothing
Set ie = Nothing
End Sub
'--------------------------
Method 2: help required with this method only
Sub OI_Fast_Method()
Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
Set xhr = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=30APR2020#", False
.send
html.body.innerHTML = StrConv(.responseBody, vbUnicode)
End With
Debug.Print html.getElementById("openInterest").innerText
'Output: only question marks are returned inside the SPAN, i.e. <SPAN id=openInterest>??</SPAN>
End Sub
I think Tim hit the nail on the head, as always. You are getting the raw HTML response, and the value you want is not in it. You can do a data dump from the page loaded in Internet Explorer and get what you want.
Sub DumpData()
Dim ie As Object, itm As Object
Dim URL As String, RowCount As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
URL = "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=28MAY2020#"
'Wait for site to fully load
ie.Navigate2 URL
Do While ie.Busy Or ie.readyState <> 4
DoEvents
Loop
With Sheets("Sheet1")
.Cells.ClearContents
RowCount = 1
For Each itm In ie.Document.all
.Range("B" & RowCount) = Left(itm.innerText, 1024)
RowCount = RowCount + 1
Next itm
End With
End Sub
Then you would have to parse the text. It's not hard, but it will be a little extra labor.
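As a rough illustration of that parsing step, the function below assumes (unverified for the NSE page) that the dumped text puts a label such as "Open Interest" on its own line, with the value on the next non-blank line. ValueAfterLabel is a hypothetical helper name.

```vba
' Hypothetical helper: returns the first non-blank line after the line that
' equals label, or "" if the label is not found.
' Assumes label and value are on separate lines of the page text.
Public Function ValueAfterLabel(ByVal pageText As String, ByVal label As String) As String
    Dim lines() As String, i As Long, j As Long
    ' Normalise CRLF / LF line endings, then split into lines
    lines = Split(Replace(pageText, vbCr, ""), vbLf)
    For i = LBound(lines) To UBound(lines)
        If Trim$(lines(i)) = label Then
            For j = i + 1 To UBound(lines)
                If Len(Trim$(lines(j))) > 0 Then
                    ValueAfterLabel = Trim$(lines(j))
                    Exit Function
                End If
            Next j
        End If
    Next i
End Function
```

It could be called on the page text directly, e.g. ValueAfterLabel(ie.Document.body.innerText, "Open Interest"), with the label adjusted to whatever actually appears on the page.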
Another option may be to download the entire contents of the website, save it as a text file, import the data, and then parse that data.
Sub Sample()
Dim ie As Object
Dim retStr As String
Set ie = CreateObject("internetexplorer.application")
With ie
.Navigate "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=28MAY2020#"
.Visible = True
End With
Do While ie.readystate <> 4: Wait 5: Loop
DoEvents
retStr = ie.document.body.innerText
'~~> Write the above to a text file
Dim fileNum As Integer
Dim FlName As String
'~~> Change this to the relevant path
FlName = "C:\Users\ryans\OneDrive\Desktop\Sample.Txt"
fileNum = FreeFile()
Open FlName For Output As #fileNum
Print #fileNum, retStr
Close #fileNum
End Sub
Private Sub Wait(ByVal nSec As Long)
nSec = nSec + Timer
While nSec > Timer
DoEvents
Wend
End Sub
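For the "import the data" step, a hedged sketch of reading the saved file back into a String so it can be parsed in VBA. ReadTextFile is a hypothetical name; it opens the file in Binary mode and reads LOF bytes into a buffer, which assumes the file was written in the system's ANSI code page, as Print # does.

```vba
' Hypothetical helper: reads a whole text file back into a String.
Public Function ReadTextFile(ByVal filePath As String) As String
    Dim fileNum As Integer
    Dim content As String
    fileNum = FreeFile()
    Open filePath For Binary Access Read As #fileNum
    ' Size the buffer to the file length; Get fills it with the file's bytes
    content = Space$(LOF(fileNum))
    Get #fileNum, , content
    Close #fileNum
    ReadTextFile = content
End Function
```

With the path used above, this would be retStr = ReadTextFile("C:\Users\ryans\OneDrive\Desktop\Sample.Txt"); note that Print # adds a trailing CR LF to the saved text.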
I couldn't get either of your code samples to run on my machine.

How to download a table from a web page with VBA?

I'm trying to download a table from this page to Excel with VBA: http://www.merval.sba.com.ar/Vistas/Cotizaciones/Acciones.aspx --> table "Panel General".
I can download the table "Panel Merval", but I couldn't download the other table.
I use this code for table "Panel Merval":
Sub GetTable()
Dim ieApp As InternetExplorer
Dim ieDoc As Object
Dim ieTable As Object
Dim clip As DataObject
'create a new instance of ie
Set ieApp = New InternetExplorer
'you don’t need this, but it’s good for debugging
ieApp.Visible = False
'now that we’re in, go to the page we want
ieApp.Navigate "http://www.merval.sba.com.ar/Vistas/Cotizaciones/Acciones.aspx"
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
'get the table based on the table’s id
Set ieDoc = ieApp.Document
Set ieTable = ieDoc.all.Item("ctl00_ContentCentral_tcAcciones_tpMerval_grdMerval")
'copy the table's HTML to the clipboard and paste it to the sheet
If Not ieTable Is Nothing Then
Set clip = New DataObject
clip.SetText "" & ieTable.outerHTML & ""
clip.PutInClipboard
Sheet1.Select
Sheet1.Range("b2").Select
Sheet1.PasteSpecial "Unicode Text"
End If
'close 'er up
ieApp.Quit
Set ieApp = Nothing
End Sub
Or this one:
Public Sub PanelLider()
Dim oDom As Object: Set oDom = CreateObject("htmlFile")
Dim x As Long, y As Long
Dim oRow As Object, oCell As Object
Dim vData As Variant
Dim link As String
link = "http://www.merval.sba.com.ar/Vistas/Cotizaciones/Acciones.aspx"
y = 1: x = 1
With CreateObject("msxml2.xmlhttp")
.Open "GET", link, False
.Send
oDom.body.innerHTML = .ResponseText
End With
With oDom.getElementsByTagName("table")(27)
Dim dataObj As Object
Set dataObj = CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
dataObj.SetText "<table>" & .innerHTML & "</table>"
dataObj.PutInClipboard
End With
Sheets(2).Paste Sheets(2).Cells(1, 1)
End Sub
Could someone help me download the table "Panel General"?
Many thanks.
Selenium
The following gets the table using SeleniumBasic:
Option Explicit
Public Sub GetTable()
Dim html As New HTMLDocument, htable As HTMLTable, headers()
headers = Array("Especie", "Hora Cotización", "Cierre Anterior", "Precio Apertura", "Precio Máximo", _
"Precio Mínimo", "Último Precio", "Variación Diaria", "Volumen Efectivo ($)", "Volumen Nominal", "Precio Prom. Pon")
With New ChromeDriver
.get "http://www.merval.sba.com.ar/Vistas/Cotizaciones/Acciones.aspx"
.FindElementById("__tab_ctl00_ContentCentral_tcAcciones_tpGeneral").Click
Do
DoEvents
Loop While .FindElementById("ctl00_ContentCentral_tcAcciones_tpGeneral_dgrGeneral", timeout:=7000).Text = vbNullString
html.body.innerHTML = .PageSource
Set htable = html.getElementById("ctl00_ContentCentral_tcAcciones_tpGeneral_dgrGeneral")
WriteTable2 htable, headers, 1, ActiveSheet
.Quit
End With
End Sub
Public Sub WriteTable2(ByVal htable As HTMLTable, ByRef headers As Variant, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
If ws Is Nothing Then Set ws = ActiveSheet
Dim tRow As Object, tCell As Object, tr As Object, td As Object, R As Long, c As Long
R = startRow: c = 1
With ws
Set tRow = htable.getElementsByTagName("tr")
For Each tr In tRow
Set tCell = tr.getElementsByTagName("td")
For Each td In tCell
.Cells(R, c).Value = td.innerText
c = c + 1
Next td
R = R + 1: c = 1
Next tr
.Cells(startRow, 1).Resize(1, UBound(headers) + 1) = headers
End With
End Sub
References (VBE > Tools > References):
Microsoft HTML Object Library
Selenium Type Library
With IE (Using WriteTable2 sub from above):
Option Explicit
Public Sub GetInfo()
Dim ie As New InternetExplorer, html As HTMLDocument, hTable As HTMLTable, headers(), a As Object
headers = Array("Especie", "Hora Cotización", "Cierre Anterior", "Precio Apertura", "Precio Máximo", _
"Precio Mínimo", "Último Precio", "Variación Diaria", "Volumen Efectivo ($)", "Volumen Nominal", "Precio Prom. Pon")
Application.ScreenUpdating = False
With ie
.Visible = True
.navigate "http://www.merval.sba.com.ar/Vistas/Cotizaciones/Acciones.aspx"
While .Busy Or .readyState < 4: DoEvents: Wend
.document.getElementById("__tab_ctl00_ContentCentral_tcAcciones_tpGeneral").Click
Do
DoEvents
On Error Resume Next
Set hTable = .document.getElementById("ctl00_ContentCentral_tcAcciones_tpGeneral_dgrGeneral")
On Error GoTo 0
Loop While hTable Is Nothing
WriteTable2 hTable, headers, 1, ActiveSheet
.Quit '<== Remember to quit application
Application.ScreenUpdating = True
End With
End Sub
References (VBE > Tools > References):
Microsoft Internet Controls
Microsoft HTML Object Library
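The Do ... Loop While hTable Is Nothing loop in GetInfo spins forever if the tab never loads. A hedged alternative is to poll with a deadline; WaitForElementById and its timeoutSeconds parameter are hypothetical names, and because Now has one-second resolution the timeout is approximate.

```vba
' Hypothetical helper: polls doc.getElementById(elementId) until it returns
' an element or timeoutSeconds pass; returns Nothing on timeout.
Public Function WaitForElementById(ByVal doc As Object, ByVal elementId As String, _
                                   Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim el As Object
    Dim deadline As Date
    deadline = Now + timeoutSeconds / 86400#   ' 86400 seconds per day
    Do
        On Error Resume Next                   ' document may be mid-refresh
        Set el = doc.getElementById(elementId)
        On Error GoTo 0
        If Not el Is Nothing Then Exit Do
        DoEvents
    Loop While Now < deadline
    Set WaitForElementById = el
End Function
```

Inside GetInfo, the polling loop could then become Set hTable = WaitForElementById(.document, "ctl00_ContentCentral_tcAcciones_tpGeneral_dgrGeneral", 15), followed by a check that hTable is not Nothing before calling WriteTable2.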

Unable to click a hyperlink with an anchor tag on a webpage

After testing different approaches, I am stuck on finding the right way in Visual Basic for Applications to trigger the element below:
I want to click a hyperlink whose text does not stay the same: it shows a different number on every attempt. Below is my VBA code:
Dim MyBrowser As InternetExplorer
Dim MyHTML_Element As IHTMLElement
Dim myURL As String
Dim htmlInput As HTMLInputElement
Dim htmlColl As IHTMLElementCollection
Dim p As String
Dim link As Object
Dim I As Integer
Dim ie As SHDocVw.InternetExplorer
Dim doc As MSHTML.HTMLDocument
myURL = "url............."
Set MyBrowser = New InternetExplorer
MyBrowser.Silent = True
MyBrowser.navigate myURL
MyBrowser.Visible = True
Do
DoEvents
Loop Until MyBrowser.readyState = READYSTATE_COMPLETE
Set HTMLDoc = MyBrowser.Document
If htmldoc.all.item(i).innerText = Range("K20").Value Then ' Range is equal to cell value "4000123486736"
htmldoc.all.item(i).Click <------- not working (both lines)
Please also see the IE inspector output appended below:
Of course this cannot work
If htmldoc.all.item(i).innerText = Range("K20").Value Then ' Range is equal to cell value "4000123486736"
htmldoc.all.item(i).Click <------- not working (both lines)
because there is no loop that defines i.
I suggest looping through only the link tags (<a>):
Dim LinkItem As Variant
For Each LinkItem In HTMLDoc.getElementsByTagName("a")
If Trim$(LinkItem.innerText) = CStr(Range("K20").Value) Then 'compare as text; a numeric cell value never equals a string Variant
LinkItem.Click
Exit For 'stop looping when link was found
End If
Next LinkItem

Unable to select the child element of an <li> tag in a website tab

I am using the VBA code below with Internet Explorer to automate a web portal:
Dim MyBrowser As InternetExplorer
Dim MyHTML_Element As IHTMLElement
Dim myURL As String
Dim htmlInput As HTMLInputElement
Dim htmlColl As IHTMLElementCollection
Dim p As String
Dim link As Object
Dim I As Integer
Dim ie As SHDocVw.InternetExplorer
Dim doc As MSHTML.HTMLDocument
On Error GoTo Err_Clear
myURL = "url............."
Set MyBrowser = New InternetExplorer
MyBrowser.Silent = True
MyBrowser.navigate myURL
MyBrowser.Visible = True
Do
Loop Until MyBrowser.readyState = READYSTATE_COMPLETE
Set HTMLDoc = MyBrowser.Document
HTMLDoc.getElementsByTagName("a").Item(7).Click <-----Error
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
But it raises an error on the marked line. Below is what the web inspector shows:
Where am I making a mistake in simply clicking the tab? Thanks.
You should put a DoEvents into your page-loading loop:
Do
DoEvents
Loop Until MyBrowser.readyState = READYSTATE_COMPLETE
Get the li element by its id, then the child anchor within it:
HTMLDoc.getElementById("current").getElementsByTagName("a")(0).Click
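If the page has not finished loading, or the id differs, getElementById returns Nothing and the one-liner fails with an "Object variable not set" error. A guarded sketch follows; ClickFirstChildLink is a hypothetical name, and the id "current" should be checked against the page's actual markup.

```vba
' Hypothetical helper: clicks the first <a> inside the element with the given
' id (e.g. the <li id="current"> tab) and reports whether a link was found.
Public Function ClickFirstChildLink(ByVal doc As Object, ByVal parentId As String) As Boolean
    Dim parentEl As Object, links As Object
    Set parentEl = doc.getElementById(parentId)
    If parentEl Is Nothing Then Exit Function          ' no element with that id
    Set links = parentEl.getElementsByTagName("a")
    If links.Length = 0 Then Exit Function             ' element has no anchor
    links.Item(0).Click
    ClickFirstChildLink = True
End Function
```

A possible call is If Not ClickFirstChildLink(HTMLDoc, "current") Then MsgBox "Tab link not found".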

How can I get the "tr id" from HTML code?

I want to use VBA to extract the URL addresses of different links from a web page, but without success. Does anyone have any idea why my code doesn't work?
My code is below:
Sub Test()
Dim URL As String
Dim IE As New InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim dictObj As Object: Set dictObj = CreateObject("Scripting.Dictionary")
Dim tRowID As String
URL = "http://www.flashscore.com/soccer/england/premier-league/"
With IE
.Navigate URL
.Visible = True
Do Until .ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
Set HTMLdoc = .Document
End With
With HTMLdoc
Set tblSet = .getElementById("fs-results")
Set mTbl = tblSet.getElementsByTagName("tbody")(1)
Set tRows = mTbl.getElementsByTagName("tr")
With dictObj
For Each tRow In tRows
tRowID = Mid(tRow.id, 5)
If Len(tRowID) > 0 And Not .Exists(tRowID) Then
.Add tRowID, Empty
End If
Next tRow
End With
End With
For Each Key In dictObj
Debug.Print Key
Next Key
IE.Quit
Set IE = Nothing
End Sub
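A self-contained sketch of the row-id collection as a reusable function (UniqueRowIds is a hypothetical name). It reads the id property instead of getAttribute("id"), so rows without an id give an empty string rather than possibly Null, and it drops the first skipChars characters as the original Mid(..., 5) does. The "g_1_" prefix used in the test is only an assumption about how such ids look.

```vba
' Hypothetical helper: collects the distinct row ids of a table, dropping the
' first skipChars characters of each id (the question used Mid(id, 5)).
Public Function UniqueRowIds(ByVal tbl As Object, Optional ByVal skipChars As Long = 4) As Object
    Dim ids As Object, tr As Object, rowId As String
    Set ids = CreateObject("Scripting.Dictionary")
    For Each tr In tbl.getElementsByTagName("tr")
        rowId = tr.id                        ' "" when the row has no id
        If Len(rowId) > skipChars Then       ' skip rows without a usable id
            rowId = Mid$(rowId, skipChars + 1)
            If Not ids.Exists(rowId) Then ids.Add rowId, Empty
        End If
    Next tr
    Set UniqueRowIds = ids
End Function
```

Iterating the returned Dictionary with For Each yields its keys, so For Each Key In UniqueRowIds(mTbl): Debug.Print Key: Next Key would print the same list as the question's final loop.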