HTML Webpage Query with Various Search Parameters - html

I am attempting to import data from a website that requires certain search parameters. There are a selection of about 5 options that is required on the website. I'm trying to build a script that will query the website, select specific search parameters and search. From there import the results to my excel sheet.
The code I created is not working. I am new to VBA so would appreciate the help.
This is what I have:
Private Sub RegulatoryDataPull_Click()
Dim eRow As Long
Dim objIE As Object
Dim HDoc As HTMLDocument
Dim HEle As HTMLUListElement
Set objIE = CreateObject("InternetExplorer.Application") ' Create document object.
Set HDoc = objIE.document ' Create HTML element (<ul>) object.
Set HEle = HDoc.getElementById("dnn_ctr85406_StateNetDB_resultsCount") ' Get the element reference using its ID.
Set sht = Sheets("Sheet1")
eRow = Sheet1.Cells(Rows.Count, 7).End(x1Up.Offset(7, 0)).Row
With objIE
.Visible = True
.navigate "https://www.ncsl.org/research/energy/energy-legislation-tracking-database.aspx"
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
Var arr = [document.querySelectorAll('["name=dnn$ctr85406$StateNetDB$ckBxTopics$16"],[name="dnn$ctr85406$StateNetDB$ckBxTopics$5"],[name="dnn$ctr85406$StateNetDB$ckBxTopics$3"],[name="dnn$ctr85406$StateNetDB$ckBxTopics$8"]')]
Topics.Item(0).Value = Topicchoice
Set States = .document.getElementsByName("dnn$ctr85406$StateNetDB$ckBxAllStates")
States.Item(0).Value = Stateschoice
Set Status = .document.getElementsByName("dnn$ctr85406$StateNetDB$ddlStatus")
Status.Item(0).Value = Statuschoice
Set Year = .document.getElementsByName("dnn$ctr85406$StateNetDB$ddlYear")
Year.Item(0).Value = Yearchoice
.document.getElementById("dnn_ctr85406_StateNetDB_btnSearch").Click
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
Dim ele As Object
' Loop through elements inside the <ul> element and find <br>, which has the texts we want.
With HEle
For ele = 0 To .getElementsByTagName("br").Length - 1
Debug.Print .getElementsByTagName("br").Item(ele).getElementsByTagName("br").Item(0).innerHTML
End Select
Next ele
End With
Set objIE = Nothing
End Sub

Welcome to SO! I copy-pasted your code in Excel-VBA and it indeed crashed. In that case the easiest thing to do is step through it with F8 (don't just run the code with F5/a button). That does help in finding the line where the code blocks/crashes. After some modifications I came up with this code that works on my machine. It's by no means finished, but should give you a good start.
Private Sub RegulatoryDataPullTWO()
Dim eRow As Long
Dim objIE As Object
Dim HDoc As HTMLDocument
Dim HEle As HTMLUListElement
Set objIE = CreateObject("InternetExplorer.Application") ' Create document object.
objIE.Visible = True
objIE.navigate "https://www.ncsl.org/research/energy/energy-legislation-tracking-database.aspx"
Do While objIE.Busy Or objIE.readyState <> 4
DoEvents
Loop
Set HDoc = objIE.document ' Create HTML element (<ul>) object.
Set Top1 = objIE.document.getElementsByName("dnn$ctr85406$StateNetDB$ckBxTopics$16")
Top1.Item(0).Value = True
Set States = objIE.document.getElementsByName("dnn$ctr85406$StateNetDB$ckBxAllStates")
States.Item(0).Value = True
Set Status = objIE.document.getElementsByName("dnn$ctr85406$StateNetDB$ddlStatus")
Status.Item(0).Value = "Adopted"
Set yr = objIE.document.getElementsByName("dnn$ctr85406$StateNetDB$ddlYear")
yr.Item(0).Value = "2019"
objIE.document.getElementById("dnn_ctr85406_StateNetDB_btnSearch").Click
Do While objIE.Busy Or objIE.readyState <> 4
DoEvents
Loop
Set HEle = HDoc.getElementById("dnn_ctr85406_StateNetDB_resultsCount") ' Get the element reference using its ID.
Set HList = HDoc.getElementById("dnn_ctr85406_StateNetDB_linkList")
Set Sht = Sheets("Sheet1")
Debug.Print HEle.outerText
Sht.Range("B2").Value = HEle.outerText
ResRw = 3
For e = 0 To HList.getElementsByTagName("a").Length - 1
Set lnk = HList.getElementsByTagName("a").Item(e)
'Debug.Print e1.outerText, e1.outerHTML
If lnk.outerText <> "Bill Text Lookup" And lnk.outerText <> "*" Then
Debug.Print Replace(Replace(lnk.ParentNode.innerText, Chr(10), ""), Chr(13), "")
Debug.Print lnk.ParentNode.NextSibling.NextSibling.innerText
Sht.Range("A" & ResRw).Value = Replace(Replace(lnk.ParentNode.innerText, Chr(10), ""), Chr(13), "")
Sht.Range("B" & ResRw).Value = lnk.ParentNode.NextSibling.NextSibling.innerText
ResRw = ResRw + 1
End If
Next e
Set objIE = Nothing
End Sub

Related

Error Handling on VBA Excel (Web Scraper)

This is the program I created. The goal of this program is to visit each link on a specific cell range and get the "a href" of each listed links.
I used a list of links but there's a certain link that ends with .pdf and from there I get a type mismatch. Is there a way that I could make my program continue and just skip the error that it got from a specific link?
This is the link that causes the error https://ir-web-assets-v.s3.amazonaws.com/uploads/nuggets/5d40644eafe17554cf969aab/Islands_Locals_Program_Guest_FAQ.pdf
Sub extensiveScrape()
Dim extractedLinks As Range 'Links taken from RUN
Dim urls As String 'Links taken from Extensive Search
Dim appIE As Object
Dim LastRow As Long 'Number of rows
Dim rCell As Range
Dim rRng As Range
Dim html2 As HTMLDocument
Dim itemEle As Object
Dim linkurl As Object
Dim y As Integer
Application.ScreenUpdating = False
Set appIE = CreateObject("InternetExplorer.Application")
Set sht = ThisWorkbook.Worksheets("results")
LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
Set rRng = sht.Range("A1:A" & LastRow)
For Each rCell In rRng.Cells
With appIE
.navigate rCell.Value
.Visible = True
End With
Do While appIE.readyState <> 4: Wait 5: Loop
Application.StatusBar = "Scraping Extensively..."
DoEvents
Set html2 = appIE.document
Set itemEle = html2.getElementsByTagName("a")
y = 1
For Each linkurl In itemEle
Sheets("results").Range("B" & y).Value = linkurl
y = y + 1
Next
'rCell.Offset(0, 1).Value = itemEle
Next rCell
appIE.Quit
Set appIE = Nothing
Application.StatusBar = ""
Application.ScreenUpdating = True
End Sub
Private Sub Wait(ByVal nSec As Long)
nSec = nSec + Timer
While nSec > Timer
DoEvents
Wend
End Sub

Excel VBA to Login to Website and Grab TR & TD Elements

I am testing the code below. I think this is very close, but I can't seem to login to the site for some reason.
Sub Website_Login_Test()
Dim oHTML_Element As IHTMLElement
Dim sURL As String
On Error GoTo Err_Clear
sURL = "https://login.my_site_here.jsp?"
Set oBrowser = New InternetExplorer
oBrowser.Silent = True
oBrowser.timeout = 60
oBrowser.navigate sURL
oBrowser.Visible = True
Do
' Wait till the Browser is loaded
Loop Until oBrowser.readyState = READYSTATE_COMPLETE
Set HTMLDoc = oBrowser.document
For Each oHTML_Element In HTMLDoc
Debug.Print oHTML_Element
Next
HTMLDoc.all.UserId.Value = "my_id"
HTMLDoc.all.Password.Value = "my_pass"
For Each oHTML_Element In HTMLDoc.getElementsByTagName("Button")
Debug.Print oHTML_Element.Name
'oHTML_Element.Click: Exit For
'Debug.Print oHTML_Element.Type
'If oHTML_Element.Type = "submit" Then oHTML_Element.Click: Exit For
Next
' oBrowser.Refresh ' Refresh If Needed
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
Call Test
End Sub
Sub Test()
Dim ie As Object, i As Long, strText As String
Dim doc As Object, hTable As Object, hBody As Object, hTR As Object, hTD As Object
Dim tb As Object, bb As Object, tr As Object, td As Object
Dim y As Long, z As Long, wb As Excel.Workbook, ws As Excel.Worksheet
Set wb = Excel.ActiveWorkbook
Set ws = wb.ActiveSheet
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
y = 1 'Column A in Excel
z = 1 'Row 1 in Excel
ie.navigate "https://after_login_move_to_page_for_scraping.jsp"
Do While ie.busy: DoEvents: Loop
Do While ie.readyState <> 4: DoEvents: Loop
Set doc = ie.document
Set hTable = doc.getElementsByTagName("table")
For Each tb In hTable
Set hBody = tb.getElementsByTagName("tbody")
For Each bb In hBody
Set hTR = bb.getElementsByTagName("tr")
For Each tr In hTR
Set hTD = tr.getElementsByTagName("td")
y = 1 ' Resets back to column A
For Each td In hTD
ws.Cells(z, y).Value = td.innertext
y = y + 1
Next td
DoEvents
z = z + 1
Next tr
Exit For
Next bb
Exit For
Next tb
End Sub
I can't login to the site, so I can't do the scraping, but I think the code is pretty close. Here is the HTML for the id object, the password object and the button object. What am I doing wrong?
I think you must trigger the keypress event of the input fields. If there are other events you must trigger, have a look here, how you can find them:
Automate IE via Excel to fill in a dropdown and continue
Sub WebsiteLogin()
Const url As String = "https://login.my_site_here.jsp"
Const userName As String = "Here Your LogIn Name"
Const passWord As String = "Here Your Password"
Dim ie As Object
Dim htmlDoc As Object
Dim nodeInputUserName As Object
Dim nodeInputPassWord As Object
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate url
Do Until ie.readyState = 4: DoEvents: Loop
Set htmlDoc = ie.document
'Set the log in name
Set nodeInputUserName = htmlDoc.getElementById("USERID")
nodeInputUserName.Value = userName
Call TriggerEvent(htmlDoc, nodeInputUserName, "onkeypress")
'Set the password
Set nodeInputPassWord = htmlDoc.getElementById("PASSWORD")
nodeInputPassWord.Value = passWord
Call TriggerEvent(htmlDoc, nodeInputPassWord, "onkeypress")
'Click submit button
htmlDoc.querySelector("a[role='button']").Click
End Sub
This is the procedure to trigger events:
Private Sub TriggerEvent(htmlDocument As Object, htmlElementWithEvent As Object, eventType As String)
Dim theEvent As Object
htmlElementWithEvent.Focus
Set theEvent = htmlDocument.createEvent("HTMLEvents")
theEvent.initEvent eventType, True, False
htmlElementWithEvent.dispatchEvent theEvent
End Sub

Extract the details from a Table using VBA gets object variable or with block variable not set

My script runs for few row and then i a getting object variable or with block variable not set error.
I am using the below script to extract the 5,6,7 value from the NSEIndia website.
I get the value of a stock from the same Excel and update the same excel with the values from the nseindia website.
Sub Stock_Basic_Update_NSE()
Dim ie As InternetExplorer
Dim webpage As HTMLDocument
Dim ws As Worksheet
For Item = 23 To 1505
Set ws = ThisWorkbook.Worksheets("NSE Stocks Details")
sSearch = ws.Range("A" & Item).Value
'sSearch = Filestk.Worksheets("Sheet1").Range("E1").Value
Set ie = New InternetExplorer
'ie.Visible = True
ie.navigate ("https://www.nseindia.com/get-quotes/equity?symbol=" & sSearch)
Do While ie.readyState = 4: DoEvents: Loop
Do Until ie.readyState = 4: DoEvents: Loop
While ie.Busy
DoEvents
Wend
Set webpage = ie.document
ws.Cells(Item, 3).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(5).innerText
ws.Cells(Item, 4).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(6).innerText
ws.Cells(Item, 5).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(7).innerText
ie.Quit
Set ie = Nothing
Next Item
End Sub
You had some errors in your code and you hadn't wait for the full document to load. Try the following code. I have commented it. So you can see, what I have changed and why. I have tried it with the top 50 symbols.
Sub Stock_Basic_Update_NSE()
'Declare always all variables
Dim ie As Object 'I switched this from early to late binding (not required)
Dim nodeTable As Object
Dim ws As Worksheet
Dim item As Long
Dim sSearch As String
'Use this outside the loop. You only need it once
Set ws = ThisWorkbook.Worksheets("NSE Stocks Details")
For item = 23 To 1505
sSearch = ws.Range("A" & item).Value
Set ie = CreateObject("internetexplorer.application")
ie.Visible = False
'Encode symbols that are restricted for using in URLs. Like &, : or ?
ie.navigate ("https://www.nseindia.com/get-quotes/equity?symbol=" & WorksheetFunction.EncodeURL(sSearch))
'It's not "While = 4" because 4 stands for "readystate = complete"
'If you want to use "= 4" you must use "Until" instead of "While"
'It doesn't matter what you use
Do While ie.readyState <> 4: DoEvents: Loop
'Manual break to load dynamic content after the IE reports the page load was complete
'This was your main problem
Application.Wait (Now + TimeSerial(0, 0, 2))
'The needed html table has an ID. If possible use always that instead of class names
'because an html ID is unique if the standard is kept
'Also use a variable to save the elements
'So you don't need to shorten the html document string in most cases because
'it's only needed one time
Set nodeTable = ie.document.getElementByID("equityInfo")
ws.Cells(item, 3).Value = nodeTable.getElementsByTagName("td")(5).innerText
ws.Cells(item, 4).Value = nodeTable.getElementsByTagName("td")(6).innerText
ws.Cells(item, 5).Value = nodeTable.getElementsByTagName("td")(7).innerText
'Clean up
ie.Quit
Set ie = Nothing
Next item
End Sub

HTML Element Collection filled from Previous Webpage Rather than Redirected Webpage VBA

The code below navigates to a webpage, fills search boxes with queries, and submits to the results page. However, the final element collection in the script, tdtags, which is defined after the redirect, is pulling data from the original search page, rather than the results page. I currently have the while ie.busy loop and a timed delay in the script, neither of which works. I have also tried waiting until an element only present in the results page becomes available in the html, but this also does not work.
Dim twb As Workbook
Dim ie As Object
Set twb = ThisWorkbook
twb.Activate
Set ie = CreateObject("internetexplorer.application")
'church = Sheets("Control").Range("A2").Value
'minister = Sheets("Control").Range("A4").Value
location = "London" 'Sheets("Control").Range("A6").Value
'denomination = Sheets("Control").Range("A8").Value
With ie
.navigate "http://www.ukchurch.org/index.php"
.Visible = True
Do While .Busy Or .ReadyState <> 4
DoEvents
Loop
End With
Application.Wait (Now + TimeValue("00:00:02"))
Set intags = ie.document.getelementsbytagname("input")
For Each intag In intags
If intag.getattribute("name") = "name" Then
If church <> "" Then
intag.Value = church
End If
ElseIf intag.getattribute("name") = "minister" Then
If minister <> "" Then
intag.Value = minister
End If
ElseIf intag.getattribute("name") = "location" Then
If location <> "" Then
intag.Value = location
End If
Else
End If
Next intag
Set dropopt = ie.document.getelementsbytagname("select")
For Each dropo In dropopt
If dropo.classname = "DenominationDropDown" Then
Set opttags = dropo.getelementsbytagname("option")
For Each opt In opttags
If opt.innertext = denomination Then
opt.Selected = True
End If
Next opt
End If
Next dropo
On Error Resume Next
For Each intag In intags
If intag.getattribute("src") = "images/ukchurch/button-go.jpg" Then
intag.Click
Do While ie.Busy Or ie.ReadyState <> 4
DoEvents
Loop
Application.Wait (Now + TimeValue("00:00:03"))
Exit For
End If
Next intag
Application.Wait (Now + TimeValue("00:00:03"))
Set tdtags = ie.document.getelementsbytagname("td")
For Each td In tdtags
If td.classname = "pText" Then
Debug.Print td.innertext
Debug.Print ie.locationURL
pagecount = Right(td.innertext, InStr(td.innertext, ":"))
End If
Next td
Debug.Print pagecount
End Sub
Any diagnosis would be appreciated.
Automating IE is a pain, so avoid it.
The following function requests the results page directly.
Public Function GetSearchResult(Optional ByVal ResultPage As Integer = 0, Optional ByVal ChurchName As String = "", Optional ByVal Minister As String = "", Optional ByVal ChurchLocation As String = "", Optional ByVal Denomination As String = "") As Object
Dim Request As Object: Set Request = CreateObject("MSXML2.serverXMLHTTP")
Dim Result As Object: Set Result = CreateObject("htmlfile")
Request.Open "POST", "http://www.ukchurch.org/searchresults1.php", False
Request.setRequestHeader "content-type", "application/x-www-form-urlencoded"
Request.send IIf(ResultPage = 0, "", "page=" & ResultPage & "&") & "name=" & ChurchName & "&minister=" & Minister & "&location=" & ChurchLocation & "&denomination=" & Denomination
Result.body.innerHTML = Request.responseText
Set GetSearchResult = Result
End Function
An example which prints the contents of the td with classname pText inside the table containing the search results
Sub Main()
Dim Document As Object
Set Document = GetSearchResult(ChurchLocation:="London")
Dim ResultRows as Object
Dim ResultRow As Object
Set ResultRows = Document.getElementsByTagName("table")(8).getElementsByTagName("td")
For Each ResultRow in ResultRows
If ResultRow.Classname = "pText" Then
Debug.print ResultRow.innerText
End If
Next
End Sub
Update
You need to add a couple of References to your VBA project to make the following code work.
In the VBA Editor, Goto the Tools Menu, Click References and in the dialog that opens add a check next to the following two items: Microsoft XML, v6.0 and Microsoft HTML Object Library (
Public Function GetChurchDetails(ByVal ChurchID As String) As MSHTML.HTMLDocument
Dim Request As New MSXML2.ServerXMLHTTP60
Dim Result As New MSHTML.HTMLDocument
Request.Open "GET", "http://www.ukchurch.org/churchdetails.php?churchid=" & ChurchID, False
Request.send
Result.body.innerHTML = Request.responseText
Set GetChurchDetails = Result
End Function
Sub Main2()
Dim Document As MSHTML.HTMLDocument
Dim Church As MSHTML.HTMLDocument
Set Document = GetSearchResult(ChurchLocation:="London")
Dim ResultRows As MSHTML.IHTMLElementCollection
Dim ResultRow As MSHTML.IHTMLElement
Dim ChurchID As String
'Set ResultRows = Document.getElementsByTagName("table")(8).getElementsByTagName("td")
' all result links on searchresults1.php have a classname of resultslink which makes getting them much easier
Set ResultRows = Document.getElementsByClassName("resultslink")
For Each ResultRow In ResultRows
ChurchID = ResultRow.getAttribute("href")
ChurchID = Mid(ChurchID, InStr(1, ChurchID, "=") + 1)
Set Church = GetChurchDetails(ChurchID)
' code to read data from the page using Church as the Document
' eg: Church.getElemenetsByTagName("td").....
Next
End Sub
You only need to use the "post" mode when your submitting data, for everything else you can use "get"

scrape html without id in vba

I'm trying to get month-to-date and year-to-date return values from the website
http://us.spindices.com/indices/equity/sp-oil-gas-exploration-production-select-industry-index
into an Excel spreadsheet using VBA. The problem is that there is no "id= " in the code of the page, which I understand would make this process a lot simpler. There is also the matter of which time period (year-to-date or month-to-date) is visible, but I'd be happy with scraping just the MTD values for now.
Here is my code:
Sub Get_Change()
'attempting to scrape Barclay's website
Dim appIE As Object
Dim MyVar As String
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://us.spindices.com/indices/equity/sp-oil-gas-exploration-production-select-industry-index"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Range("A1").Value = "Working..."
Loop
Set TDelements = appIE.document.getElementsbyClassName("performance-chart-table")
For Each TDelement In TDelements
If TDelement.class = "change" Then
MyVar = TDelement.class.innerText("Value")
End If
Next
Range("A1").Value = MyVar
appIE.Quit
Set appIE = Nothing
End Sub
If I can get a way to set the 'MyVar' variable to the current MTD or YTD value, I'll be done, but I'm having a hard time since there is not a unique identifier for either of these values. Any ideas?
I've recently watched some CSS training videos and I can tell you the CSS selector syntax is powerful and I'd recommend it. This is the same syntax that javascript/web developers use to select elements when using JQuery.
I think you should try using
document.queryselectorall
or in your case because you have drilled in to the document to get the "performance-chart-table" call queryselectorall off of that variable, TDelements.
Documentation at http://www.w3schools.com/jsref/met_document_queryselectorall.asp
and you supply as a parameter a CSS selector string the syntax of which can be found at http://www.w3schools.com/cssref/css_selectors.asp
And I've gone and done it for you....
Sub Get_Change()
'* Tools-References Microsoft HTML Object Library
'attempting to scrape Barclay's website
Dim appIE As Object
Dim MyVar As String
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://us.spindices.com/indices/equity/sp-oil-gas-exploration-production-select-industry-index"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Range("A1").Value = "Working..."
Loop
Dim htmlDoc As MSHTML.HTMLDocument
Set htmlDoc = appIE.document
Dim TDelements2 As MSHTML.IHTMLElementCollection
Set TDelements2 = htmlDoc.getElementsByClassName("performance-chart-table")
While TDelements2.Length < 1
DoEvents
Application.Wait (Now() + TimeSerial(0, 0, 3))
Set TDelements2 = htmlDoc.getElementsByClassName("performance-chart-table")
Wend
Dim oHTMLTablePerformanceChartTable As MSHTML.HTMLTable
Set oHTMLTablePerformanceChartTable = TDelements2.Item(0)
Dim objChangeCollection As MSHTML.IHTMLDOMChildrenCollection
Set objChangeCollection = oHTMLTablePerformanceChartTable.querySelectorAll(".change")
'Debug.Assert objChangeCollection.Length = 2
Dim objChange2 As Object
Set objChange2 = objChangeCollection.Item(1)
MyVar = objChange2.innerText
'Set TDelements = appIE.document.getElementsByClassName("performance-chart-table")
'
'For Each TDelement In TDelements
' TDelements.querySelectorAll (".change")
' If TDelement.class = "change" Then
' MyVar = TDelement.class.innerText("Value")
'
' End If
'Next
Range("A1").Value = MyVar
appIE.Quit
Set appIE = Nothing
End Sub