Scraping data from website with dynamic array function in vba - html

I'd like to understand how an array is used when scraping data from a website. I'm currently using this VBA to copy data from a website. The code scrapes the data I want, but when it writes to the destination worksheet, everything ends up in cell A1. This VBA was developed for a previous project, where it works fine, so I'm not sure which part went wrong.
Sub CopyFromHKAB()
Dim ie As Object, btnmore As Object, tbl As Object
Dim rr As Object, cc As Object, r As Integer, c As Integer, i As Integer, j As Integer
ThisWorkbook.Sheets("data").UsedRange.Clear
Set ie = CreateObject("internetexplorer.application")
With ie
.Visible = True
.navigate "https://www.hkab.org.hk/DisplayMemberAction.do?sectionid=4&subsectionid=0"
Do
DoEvents
Loop While .readyState <> 4 Or .Busy
Set tbl = .document.getElementsByClassName("etxtmed")(2)
End With
'get data from table
r = tbl.Rows.Length - 1
c = tbl.Rows(0).Cells.Length - 1
ReDim arr(0 To r, 0 To c)
Set rr = tbl.Rows
For i = 0 To r
Set cc = rr(i).Cells
For j = 0 To c
arr(i, j) = cc(j).innertext
Next
Next
ie.Quit
Application.ScreenUpdating = False
ThisWorkbook.Sheets("Sheet1").Cells(1, 1).Resize(r + 1, c + 1) = arr
With ThisWorkbook.Sheets("data")
.UsedRange.WrapText = False
.Columns.AutoFit
End With
End Sub

The tables are nested, so you need to pick the right one: change the index to 3. With index 2 you are picking up the shared parent, so all the listings sit inside a single child element, which is why your current output lands in one cell.
You then need to adjust your code to skip the first row.
N.B. You don't actually need IE for this, as the content you want is static; you can use XMLHTTP. Also, you are writing the data to one sheet (Sheet1) but formatting a different one (data).
Sub CopyFromHKAB()
Dim ie As Object, btnmore As Object, tbl As Object
Dim rr As Object, cc As Object, r As Integer, c As Integer, i As Integer, j As Integer
ThisWorkbook.Sheets("data").UsedRange.Clear
Set ie = CreateObject("internetexplorer.application")
With ie
.Visible = True
.navigate "https://www.hkab.org.hk/DisplayMemberAction.do?sectionid=4&subsectionid=0"
Do
DoEvents
Loop While .readyState <> 4 Or .Busy
Set tbl = .document.getElementsByClassName("etxtmed")(3)
End With
'get data from table
r = tbl.Rows.Length - 1
c = tbl.Rows(1).Cells.Length - 1
ReDim arr(0 To r, 0 To c)
Set rr = tbl.Rows
For i = 1 To r
Set cc = rr(i).Cells
For j = 0 To c
arr(i - 1, j) = cc(j).innertext
Next
Next
ie.Quit
'Application.ScreenUpdating = False
ThisWorkbook.Sheets("Sheet1").Cells(1, 1).Resize(r + 1, c + 1) = arr
With ThisWorkbook.Worksheets("data")
.UsedRange.WrapText = False
.Columns.AutoFit
End With
End Sub
I would consider switching to XHR to avoid the overhead of a browser, and using querySelectorAll with a CSS selector list to target only the nodes of interest.
Option Explicit
Public Sub GetHKABInfo()
'tools > references > Microsoft HTML Object Library
Dim html As MSHTML.HTMLDocument, xhr As Object
Set xhr = CreateObject("MSXML2.XMLHTTP")
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://www.hkab.org.hk/DisplayMemberAction.do?sectionid=4&subsectionid=0", False
.setRequestHeader "User-Agent", "Safari/537.36"
.send
html.body.innerHTML = .responseText
End With
Dim arr() As Variant, nodes As MSHTML.IHTMLDOMChildrenCollection, i As Long
Set nodes = html.querySelectorAll(".etxtmed .etxtmed td")
ReDim arr(1 To nodes.Length - 1)
For i = LBound(arr) To UBound(arr)
arr(i) = nodes.Item(i).innertext
Next
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(UBound(arr), 1) = Application.Transpose(arr)
End Sub
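When several nested elements share a class name, it can help to list every match with its index before hard-coding one. The following is a minimal diagnostic sketch, not part of the original answers: the procedure name ListMatchesByClass is hypothetical, and it assumes the indices seen through XMLHTTP match those seen in the IE-rendered page, which may not hold if the page is altered by script.

```vba
Option Explicit

'Diagnostic sketch: prints each element carrying the class, with its index,
'so the right index can be chosen by inspection in the Immediate window.
'Requires: Tools > References > Microsoft HTML Object Library
Public Sub ListMatchesByClass()
    Const URL As String = "https://www.hkab.org.hk/DisplayMemberAction.do?sectionid=4&subsectionid=0"
    Const CLASS_NAME As String = "etxtmed"
    Dim html As MSHTML.HTMLDocument, nodes As Object, node As Object, i As Long

    Set html = New MSHTML.HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        html.body.innerHTML = .responseText
    End With

    Set nodes = html.querySelectorAll("." & CLASS_NAME)
    For i = 0 To nodes.Length - 1
        Set node = nodes.Item(i)
        If UCase$(node.tagName) = "TABLE" Then
            'For tables, the row count usually shows which match is the data table
            Debug.Print i, node.tagName, "rows: " & node.Rows.Length
        Else
            Debug.Print i, node.tagName
        End If
    Next i
End Sub
```

A parent table that wraps the whole listing will typically report very few rows, while the table that actually holds the listings reports many.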

Related

Unable to get the exact element class table when scraping data from web using VBA

I would like to scrape a table from the website below.
Based on the page source, the table seems to belong to the element class etxtmed, so I wrote the VBA below. After running this code I found that it only scrapes part of the data.
I thought this was because ("etxtmed")(0) refers to the first "etxtmed" table, so I tried several larger index values. VBA first reports "Element not exist" and then raises Run-time error '91': Object variable or With block variable not set at the line r = tbl.Rows.Length - 1. Is it because I targeted the wrong table class?
Sub CopyRateFromHKAB()
Dim ie As Object, btnmore As Object, tbl As Object
Dim rr As Object, cc As Object, r As Integer, c As Integer, i As Integer, j As Integer
ThisWorkbook.Sheets("Sheet2").UsedRange.Clear
Set ie = CreateObject("internetexplorer.application")
With ie
'.Visible = True
.navigate "https://www.hkab.org.hk/DisplayInterestSettlementRatesAction.do?lang=en"
Do
DoEvents
Loop While .readyState <> 4 Or .Busy
Set tbl = .document.getElementsByClassName("etxtmed")(0)
If tbl Is Nothing Then
MsgBox "Element not exist"
End If
End With
'get data from table
r = tbl.Rows.Length - 1
c = tbl.Rows(0).Cells.Length - 1
ReDim arr(0 To r, 0 To c)
Set rr = tbl.Rows
For i = 0 To r
Set cc = rr(i).Cells
For j = 0 To c
arr(i, j) = cc(j).innertext
Next
Next
ie.Quit
Application.ScreenUpdating = False
ThisWorkbook.Sheets("Sheet2").Cells(1, 1).Resize(r + 1, c + 1) = arr
With ThisWorkbook.Sheets("Sheet2")
.UsedRange.WrapText = False
.Columns.AutoFit
End With
End Sub
The table you want is inside an iframe (src="/hibor/listRates.do?lang=en&Submit=Detail"), so you need to request that page directly.
Option Explicit
Sub CopyRateFromHKAB()
Const URL = "https://www.hkab.org.hk/hibor/listRates.do?lang=en&Submit=Detail"
Dim HTMLDoc As Object, request As Object
' get web page
Set HTMLDoc = CreateObject("HTMLfile")
Set request = CreateObject("MSXML2.XMLHTTP")
With request
.Open "GET", URL, False
.send
HTMLDoc.body.innerHTML = .responseText
End With
' parse html table
Dim wb As Workbook, r As Long, c As Long, arr
Dim tbl As Object, t As Object, tr As Object, td As Object
Set wb = ThisWorkbook
Set tbl = HTMLDoc.getElementsByClassName("etxtmed")
If tbl Is Nothing Then
MsgBox "No tables found", vbExclamation
Exit Sub
Else
If tbl(2) Is Nothing Then
MsgBox "Table not found", vbExclamation
Exit Sub
Else
r = tbl(2).Rows.Length
ReDim arr(1 To r, 1 To 3)
r = 1
For Each tr In tbl(2).Rows
c = 1
For Each td In tr.Cells
arr(r, c) = td.innerText
c = c + 1
Next
r = r + 1
Next
End If
'copy to sheet
With wb.Sheets("Sheet2")
.Cells(1, 1).Resize(UBound(arr), UBound(arr, 2)) = arr
.UsedRange.WrapText = False
.Columns.AutoFit
End With
End If
MsgBox "Done", vbInformation
End Sub
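To find out whether a page embeds its data in an iframe without opening the browser's developer tools, the raw response can be searched for iframe tags. This is a rough sketch under stated assumptions: the names ExtractIframeSources and PrintIframeSources are hypothetical, and the regular expression only handles quoted src attributes, so it is a quick check rather than a full HTML parser.

```vba
Option Explicit

'Returns the src value of every <iframe> tag found in a block of HTML text.
Public Function ExtractIframeSources(ByVal htmlText As String) As Collection
    Dim re As Object, m As Object, results As New Collection
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = True
        .IgnoreCase = True
        'Captures the quoted value that follows src= inside an <iframe ...> tag
        .Pattern = "<iframe[^>]*\ssrc\s*=\s*[""']([^""']*)[""']"
    End With
    For Each m In re.Execute(htmlText)
        results.Add m.SubMatches(0)
    Next m
    Set ExtractIframeSources = results
End Function

'Prints the iframe sources of the page from the question.
Public Sub PrintIframeSources()
    Dim src As Variant
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.hkab.org.hk/DisplayInterestSettlementRatesAction.do?lang=en", False
        .send
        For Each src In ExtractIframeSources(.responseText)
            Debug.Print src
        Next src
    End With
End Sub
```

A relative src such as the one quoted above has to be prefixed with the site's scheme and host before it can be requested on its own.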

Loop through page numbers when href contains doPostBack() in webpage

I need to scrape data from every page by clicking the page numbers shown on the webpage below.
The sample website mentioned here looks similar to my HTML webpage.
The sample web page is this Webpage.
The code I have is below:
Sub Test()
Dim IE As Object
Dim i As Long, strText As String
Dim y As Long, z As Long, wb As Excel.Workbook, ws As Excel.Worksheet
Dim myBtn As Object
Dim Table As Object, tbody As Object, datarow As Object, thlist As Object, trlist As Object
Set wb = Excel.ActiveWorkbook
Set ws = wb.ActiveSheet
Sheets("Data").Select
Set IE = CreateObject("InternetExplorer.Application")
my_url = "webpage.com" 'placeholder URL
With IE
.Visible = True
.navigate my_url
Do Until Not IE.Busy And IE.readyState = 4
DoEvents
Loop
End With
Set doc = IE.document
y = 1
z = 1
Application.Wait Now + TimeValue("00:00:02")
Set tbody = IE.document.getElementsByTagName("table")(0).getElementsByTagName("tbody")(0)
Set thlist = tbody.getElementsByTagName("tr")(0).getElementsByTagName("th")
Dim ii As Integer
For ii = 0 To thlist.Length - 1
ws.Cells(z, y).Value = thlist(ii).innerText
y = y + 1
Next ii
Set datarow = tbody.getElementsByTagName("tr")
y = 1
z = 2
Dim jj As Integer
Dim datarowtdlist As Object
For jj = 1 To datarow.Length - 4
Set datarowtdlist = datarow(jj).getElementsByTagName("td")
Dim hh As Integer, x As Integer
x = y
For hh = 0 To datarowtdlist.Length - 1
ws.Cells(z, x).Value = datarowtdlist(hh).innerText
x = x + 1
Next hh
z = z + 1
Next jj
Set IE = Nothing
End Sub
I'm happy to clarify if my question is not clear.
Thanks for the support.
The next page is retrieved by incrementing the __EVENTARGUMENT of the __doPostBack call (e.g. from 1 to 2, 2 to 3, and so on) and then triggering __doPostBack with the new value. The last page has been reached when the final td node in the pagination area no longer has a child element whose href contains the __EVENTTARGET (sb$grd). Using this logic you can loop, incrementing the page number, with a clear exit condition, as shown below.
For more info about this function with ASP.NET see my answer here.
Public Sub LoopPages()
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Navigate2 "https://www.mfa.gov.tr/sub.ar.mfa?dcabec54-44b3-4aaa-a725-70d0caa8a0ae"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Dim i As Long
i = 1
Do
Debug.Print i
Debug.Print .document.querySelector(".sub_lstitm").innerText
If .document.querySelectorAll("tr:nth-child(1) td:last-child [href*='sb$grd']").length = 0 Then Exit Do
.document.parentWindow.execScript "__doPostBack('sb$grd','Page$" & i + 1 & "');"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
'do something with new page
i = i + 1
Loop
Stop 'stops at 185
.Quit
End With
End Sub

HTML page update for every dropdown change event

I want to pull calls and puts data from the NSE website. The page has two dropdown boxes, and the table values change on every dropdown change event.
https://www.nseindia.com/option-chain
I can successfully update the HTML table values for the first dropdown using HTML events.
I am dispatching the same HTML event to update the table values when the second dropdown's index changes, but it does not work. Can anyone please help?
My source code is below; please suggest a fix.
'Option Data Pulling
Sub Options_Pull_Active_List()
Dim Opt_Url As String
Opt_Url = ActiveSheet.Range("G3").Value
Dim ie As New InternetExplorer
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Top = 0
ie.Left = 0
ie.Width = 1000
ie.Height = 750
ie.AddressBar = 0
ie.StatusBar = 0
ie.Toolbar = 0
ie.navigate Opt_Url
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Dim doc As HTMLDocument
Set doc = ie.document
Dim objElement As HTMLObjectElement
Dim sDD As String
doc.Focus
Dim k1, k2, Opto, Sana As Integer
k1 = ActiveSheet.Range("AL7").Value
k2 = ActiveSheet.Range("AL8").Value
For Opto = k1 To k2
ActiveSheet.Range("AK10").Value = Opto
Sana = ActiveSheet.Range("AL10").Value
'instrumentType
doc.getElementById("select_symbol").Focus 'Setting Instrument Type
Application.Wait Now + TimeValue("00:00:04")
doc.getElementById("select_symbol").selectedIndex = Sana
Dim objEvent
Set objEvent = doc.createEvent("HTMLEvents")
objEvent.initEvent "change", False, True
doc.getElementById("select_symbol").dispatchEvent objEvent 'This code successfully able to updat the HTML page
Application.Wait Now + TimeValue("00:00:05")
doc.getElementById("expirySelect").selectedIndex = 2
doc.getElementById("expirySelect").dispatchEvent objEvent 'This same code does not update the table values
'-------------------------
Application.Wait Now + TimeValue("00:00:03")
Do While ie.Busy Or ie.readyState <> 4
DoEvents
Loop
'------------------
Dim hTable As Object, hBody As Object, hTR As Object, hTD As Object
Dim tb As Object, bb As Object, tr As Object, td As Object
Dim i As Integer
y = 5 'Column E in Excel
Z = 25 'Row 25 in Excel
Set hTable = doc.getElementsByTagName("Table")
'Set hTable = doc.getElementById("historicalData")
For Each tb In hTable
Set hBody = tb.getElementsByTagName("tbody")
For Each bb In hBody
Set hTR = bb.getElementsByTagName("tr")
For Each tr In hTR
Set hTD = tr.getElementsByTagName("td")
y = 5 ' Resets back to Default Column Set in the Excel Sheet
For i = 0 To hTD.Length - 1 'collection is zero-based, so the last valid index is Length - 1
Sheets("M1 - Option Data Pull").Cells(Z, y).Value = hTD(i).innerText
If Trim(Sheets("M1 - Option Data Pull").Cells(Z, y).Value) = "-" Then
Sheets("M1 - Option Data Pull").Cells(Z, y).Value = 0
End If
If i = 11 Then
If hTD(i + 1).getAttribute("class") = "bg-yellow" And Sheets("M1 - Option Data Pull").Range("O1").Value = 0 Then
Sheets("M1 - Option Data Pull").Range("O1").Value = 1
Sheets("M1 - Option Data Pull").Range("O2").Value = hTD(i).innerText
End If
End If
y = y + 1
Next i
DoEvents
Z = Z + 1
Next tr
Exit For
Next bb
Exit For
Next tb
'-------------------
Next Opto
ie.Quit
Set doc = Nothing
Set ie = Nothing
Application.DisplayAlerts = True
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub

Basics of webscraping

I want to get the prices of all the models of the Maruti Alto using web scraping. I tried the code below, but I am not able to get the data.
Sub Basics_Of_Web_Macro()
Dim myIE As Object
Dim myIEDoc As Object
'Start Internet Explorer
Set myIE = CreateObject("InternetExplorer.Application")
'if you want to see the window set this to True
myIE.Visible = False
'Now we open the page we'd like to use as a source for information
myIE.navigate "https://www.marutisuzuki.com/channels/arena/price-list/alto-price-in-mumbai-in-maharashtra"
'We wait for the Explorer to actually open the page and finish loading
While myIE.Busy
DoEvents
Wend
'Now lets read the HTML content of the page
Set myIEDoc = myIE.document
'Time to grab the information we want
Range("A1") = myIEDoc.Title
'Then we'll get something from the inner page content by using the ID
Range("B1") = myIEDoc.Class("priceInfo clearfix")
End Sub
XHR:
You could use an XMLHTTP request and avoid the browser. Loop over the nodeList returned by selecting the class name cols. Start a new row after every four elements and reset the column to 1, which turns the flat nodeList into a tabular layout for output.
VBE > Tools > References > Microsoft HTML Object Library
Option Explicit
Public Sub GetPrices()
Dim html As HTMLDocument
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.marutisuzuki.com/channels/arena/price-list/alto-price-in-mumbai-in-maharashtra", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
Dim listings As Object, i As Long, r As Long, c As Long, results()
Set listings = html.querySelectorAll(".cols")
ReDim results(1 To (listings.Length - 2) / 4, 1 To 4)
r = 1: c = 1
For i = 0 To listings.Length - 2
If i Mod 4 = 0 And i > 0 Then r = r + 1: c = 1
results(r, c) = listings.item(i).innerText
c = c + 1
Next
ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
Internet Explorer:
Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
' Microsoft HTML Object Library (needed for the HTMLDocument declaration)
Public Sub GetPrices()
Dim html As HTMLDocument
Set html = New HTMLDocument
Dim ie As New InternetExplorer
With ie
.Visible = True
.Navigate2 "https://www.marutisuzuki.com/channels/arena/price-list/alto-price-in-mumbai-in-maharashtra"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim listings As Object, i As Long, r As Long, c As Long, results()
Set listings = .document.querySelectorAll(".cols")
ReDim results(1 To (listings.Length - 2) / 4, 1 To 4)
r = 1: c = 1
For i = 0 To listings.Length - 2
If i Mod 4 = 0 And i > 0 Then r = r + 1: c = 1
results(r, c) = listings.item(i).innerText
c = c + 1
Next
.Quit
End With
ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
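The row and column bookkeeping used in both versions above can be isolated in a small helper, which also avoids relying on how ReDim rounds a fractional bound. This is only a sketch: the name ReshapeToRows is hypothetical, it takes a plain array rather than a nodeList (the innerText values would first be copied into an array), and it assumes at least one item.

```vba
Option Explicit

'Copies a flat list of values into a 1-based 2D array, filling row by row.
'Assumes items holds at least one element.
Public Function ReshapeToRows(ByVal items As Variant, ByVal columnsPerRow As Long) As Variant
    Dim total As Long, rowCount As Long, i As Long, results() As Variant
    total = UBound(items) - LBound(items) + 1
    'Integer ceiling: a partly filled last row still gets its own row
    rowCount = (total + columnsPerRow - 1) \ columnsPerRow
    ReDim results(1 To rowCount, 1 To columnsPerRow)
    For i = 0 To total - 1
        results(i \ columnsPerRow + 1, i Mod columnsPerRow + 1) = items(LBound(items) + i)
    Next i
    ReshapeToRows = results
End Function

Public Sub DemoReshape()
    Dim grid As Variant
    grid = ReshapeToRows(Array("a", "b", "c", "d", "e", "f", "g", "h"), 4)
    Debug.Print UBound(grid, 1), UBound(grid, 2) 'row count 2, column count 4
    Debug.Print grid(2, 1)                       'first value of the second row: e
End Sub
```

The returned array can be written to a sheet in one step with Resize, in the same way the answers above write results.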

Extract Table from Webpage in Excel using VBA

How can I extract the table below from a webpage into Excel?
Table:
Company | Bonus Ratio | Announcement | Record | Ex-Bonus
Code:
Dim ie As SHDocVw.InternetExplorer
Set ie = New InternetExplorerMedium
Set ie = CreateObject("InternetExplorer.Application")
While ie.busy
DoEvents
Wend
ie.Visible = True
While ie.busy
DoEvents
Wend
Dim NavURL As String
NavURL = "https://www.moneycontrol.com/stocks/marketinfo/bonus/homebody.php?sel_year=2015"
ie.Navigate NavURL
While ie.busy
DoEvents
Wend
Set doc = ie.document
Set hTable = doc.GetElementsByTagName("table")
y = 2 'Column B in Excel
z = 7 'Row 7 in Excel
For Each td In hTable
Set hHead = tb.GetElementsByTagName("td")
For Each hh In hHead
Set hTR = hh.GetElementsByTagName("tr")
For Each tr In hTR
Webpage: https://www.moneycontrol.com/stocks/marketinfo/bonus/homebody.php?sel_year=2015
The Bonus Ratio should stay exactly as shown on the webpage, or be stored as text.
When the table is copied into Excel, the Bonus Ratio is converted to a decimal.
Your hTable is a collection as opposed to a single element. Your code should be throwing an error.
You want to target the specific table and then loop the table rows and cells within rows. You want to check if the second column is being processed so you can protect the formatting of the ratios. You also want to monitor the row number to handle the merged cells at the top.
Option Explicit
Public Sub GetInfo()
Const URL As String = "https://www.moneycontrol.com/stocks/marketinfo/bonus/homebody.php?sel_year=2015"
Dim html As HTMLDocument, hTable As HTMLTable, ws As Worksheet, headers()
headers = Array("Company", "Bonus Ratio", "Announcement", "Record", "Ex-bonus")
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.Send
html.body.innerHTML = .responseText
End With
Set hTable = html.querySelector("table.dvdtbl")
Dim td As Object, tr As Object, r As Long, c As Long
r = 1
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
For Each tr In hTable.getElementsByTagName("tr")
r = r + 1: c = 1
If r > 3 Then
For Each td In tr.getElementsByTagName("td")
.Cells(r - 2, c) = IIf(c = 2, "'" & td.innerText, td.innerText)
c = c + 1
Next
End If
Next
End With
End Sub
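The answer above protects the ratios by prefixing each value with an apostrophe. Another option, sketched here as an assumption rather than as the answerer's method, is to format the target column as Text before writing, so that values such as 1:2 are stored as typed. The procedure name WriteRatioAsText and the sample value are illustrative only.

```vba
Option Explicit

Public Sub WriteRatioAsText()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1") 'assumed sheet name, as in the answer above
    ws.Columns(2).NumberFormat = "@"           'format column B as Text before writing
    ws.Cells(2, 2).Value = "1:2"               'sample ratio, kept as text instead of a time/decimal
End Sub
```

The format has to be applied before the values are written; changing it afterwards does not restore values Excel has already converted.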