VBA: script to extract from a HTML table? - html

Apologies for the basic question but I've only been coding about a week!
To give some context I have a basic task where I have to manually search and type in a number to retrieve a persons qualifications and expiry dates which could be multiple per person. I've managed to write a script that searches each person in turn from a list of qualification numbers I have on an excel spreadsheet from this site:
https://www.bindt.org/Certification/pcn-certificate-verification/
My script then currently clicks the more information next to each qualification to expand the data (unsure if this is required in all honesty).
My problem is I have no idea how to extract the data back into excel, I've not managed to extract it by using class names, grabbing the whole table etc.
Would it be possible to get some help with writing the code to grab the data?
For an example if you use the number 213474 on thee site it will give you an idea of what I need to grab.
Many thanks in advance!
Tom
Edit 8/8/21
Hi Raymond,
I've put my code below so far, it all falls down at:
Set Table = IE.document.getElementByClassName("container")(1)
I do have another issue where when I go to the second person in the loop as they all have different qualification numbers the go to error 10 doesn't seem to work.
My main issue is no matter what I try to grab off the results be it tbody, tr etc I can't seem to find any combination that works.
Sub PCN()
Dim IE As Object
Dim lastrow As Integer
ThisWorkbook.Sheets("PCN").Range("A1").Select
lastrow = ActiveSheet.Cells.SpecialCells(xlCellTypeLastCell).Row
Dim i As Integer
Dim rng As Range
Set rng = Application.Range("c2:C" & lastrow)
For i = 2 To lastrow
Debug.Print rng.Cells(RowIndex:=i, ColumnIndex:="c").Value
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate "https://www.bindt.org/Certification/pcn-certificate-verification/pcn-verification-form"
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
IE.document.getElementById("txtNumber").Value = ThisWorkbook.Sheets("PCN").Range("C" & i).Value
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
IE.document.getElementById("submit").Click
On Error GoTo 10
IE.document.getElementById("certificate-link-1").Click
IE.document.getElementById("certificate-link-2").Click
IE.document.getElementById("certificate-link-3").Click
IE.document.getElementById("certificate-link-4").Click
IE.document.getElementById("certificate-link-5").Click
IE.document.getElementById("certificate-link-6").Click
IE.document.getElementById("certificate-link-7").Click
IE.document.getElementById("certificate-link-8").Click
IE.document.getElementById("certificate-link-9").Click
IE.document.getElementById("certificate-link-10").Click
IE.document.getElementById("certificate-link-11").Click
IE.document.getElementById("certificate-link-12").Click
IE.document.getElementById("certificate-link-13").Click
IE.document.getElementById("certificate-link-14").Click
IE.document.getElementById("certificate-link-15").Click
IE.document.getElementById("certificate-link-16").Click
IE.document.getElementById("certificate-link-17").Click
IE.document.getElementById("certificate-link-18").Click
IE.document.getElementById("certificate-link-19").Click
IE.document.getElementById("certificate-link-20").Click
IE.document.getElementById("certificate-link-21").Click
IE.document.getElementById("certificate-link-22").Click
10
Set Table = IE.document.getElementByClassName("container")(1)
Set tbody = Table.getElementsByTagName("tbody")(0)
trs = tbody.getElementsByTagName("tr")
Do While trs Is Nothing
DoEvents
Application.Wait Now + TimeValue("00:00:02")
trs = tbody.getElementsByTagName("tr")
Loop
Dim myvalue As String: myvalue = trs.Cells(3).innerText
IE.Quit
Next
End Sub

Related

Excel VBA browser scraping loop: Reload page for each loop

I am having a bit trouble when web scraping, basically I have a loop that goes to a website and searches for a specific company and clicks on search and then goes to the company page and extracts the company's founding date. (https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx). I basically have a large table but I have a smaller one here for simplicity, and the loop goes from top to bottom of the table in excel and extracts the founding date in the B column next to the name.
{Table example}. My code is as follows:
Option Explicit
Sub click_search()
Dim i As SHDocVw.InternetExplorer
Dim idoc As MSHTML.HTMLDocument
Dim output As Range
Dim txt_input As Range
Dim wb As Workbook
Dim ws As Worksheet
Dim n As Integer
Set wb = ActiveWorkbook
Set ws = wb.Sheets(1)
Set i = New InternetExplorer
i.Visible = True
i.Navigate "https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx"
Do While i.ReadyState <> READYSTATE_COMPLETE
Loop
Set idoc = i.Document
On Error Resume Next
For n = 1 To 3
idoc.getElementById("ctl00_ContentPlaceHolder1_frmEntityName").Value = Cells(n, 1).Value
Application.Wait Now + #12:00:01 AM#
idoc.getElementById("ctl00_ContentPlaceHolder1_btnSubmit").click
Application.Wait Now + #12:00:15 AM#
idoc.getElementById("ctl00_ContentPlaceHolder1_rptSearchResults_ctl00_lnkbtnEntityName").click
Application.Wait Now + #12:00:10 AM#
Dim V As Variant
Set V = idoc.getElementById("ctl00_ContentPlaceHolder1_lblIncDate")
Cells(n, 2) = V.innerHTML
i.Quit
Set i = New InternetExplorer
i.Visible = True
i.Navigate "https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx"
Application.Wait Now + #12:00:10 AM#
Next n
End Sub
The problem is that I have to quit internet explorer and restart it in order to be able to search for the next company, this is where I have a problem in my loop as it just works for the very first iteration and kind of gets stuck on the page in the second iteration of the loop. It works as intended and gets the correct date for the first company but for the second company it kind of gets stuck on the start page and wont search at all. Just reloading the page wont work either as that will cause the browser to get stuck because you have to manually accept that you want to reload the page in the browser. Any help at all on how to fix this issue is greatly appreciated.
Thanks in advance!
Removed the last big chunk of the code at i.Quit and replaced it with:
i.Navigate "https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx"
Application.Wait Now + #12:00:10 AM#

Excel:VBA: How to make script dynamic?

I have the following script:
Sub Test1()
Dim IE As Object
Set IE = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
IE.Visible = True
IE.navigate "https://share.amazon.com/sites/IPV/Lists/IPV%20Appeals%20tracker/NewForm.aspx?Source=https%3A%2F%2Fshare%2Eamazon%2Ecom%2Fsites%2FIPV%2FLists%2FIPV%2520Appeals%2520tracker%2FAllItems%2Easpx&RootFolder="
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
IE.document.getElementById("Title_fa564e0f-0c70-4ab9-b863-0177e6ddd247_$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("a1")
IE.document.getElementById("Seller_x0020_ID_2b32b5fa-ace8-44d5-bba5-5c0e321671ed_$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("b1")
IE.document.getElementById("Seller_bba5-5c0e321671ed_$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("c1")
IE.document.getElementById("Seller_bba5-5c0e321671ed_$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("d1")
IE.document.getElementById("Seller_bba5-5c0e321671ed_$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("e1")
End Sub
This script is suppose to copy the values in the following row range in excel and autofill a website form.
The problem with this is that i have applied it for just the first row, but I wanted a way to be able to select which row to apply this to.
I hope my explanation makes sense.
Concatenate in row number e.g.
Dim rowNumber As Long
rowNumber = 2
IE.document.getElementById("Title_fa564e0f-0c70-4ab9-b863-0177e6ddd247_$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("A" & rowNumber)
If using in a loop then you would concatenate in the loop variable and make the id also a variable.
Same principle for your other items.
You could also pick the row number up from a cell
IE.document.getElementById("Title_fa564e0f-0c70-4ab9-b863-0177e6ddd247_$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("A" & ActiveSheet.Range("A1").value)
Avoid ActiveSheet and use the actual sheet name however.

Copying columns of values from Excel worksheet to an HTML table in a website

I'm hoping to copy values from an excel table into an html table. If anyone has experience doing this between Excel 2013 and Internet Explorer 11, please let me know and I can give details. Thank you
The following is a little section of the code for the website that I am trying to automate the fill-in process for:
Table and Source Code for Charge Numbers for the Site
This is showing just one part of the process (the charge numbers), which we have anywhere from 10-25 each week with their corresponding hours charged to them each day (as seen in the table matrix).
There are 7 charge numbers on each page and Charge Numbers all have name="chargeNo" and increase id names with each box downward like so:
'1) id="chargeNo0"
'2) id="chargeNo1"
'3) id="chargeNo2"
'4) id="chargeNo3"
'5) id="chargeNo4"
'6) id="chargeNo5"
'7) id="chargeNo6"
Each day of the week also has set names as follows:
Table and Source Code for Hours Charged Each Day
'Saturday: name="hrs0"
'Sunday: name="hrs1"
'Monday: name="hrs2"
'Tuesday: name="hrs3"
'Wed: name="hrs4"
'Thurs: name="hrs5"
'Fri: name="hrs6"
I'm trying to automate the process of pulling the arrays of charge times I have from Excel and inputting them into their respective textboxes.
This script will convert elements in ColumnA into an array.
Sub MakeArray()
Dim arr As Variant
With ActiveSheet
arr = WorksheetFunction.Transpose(.Range(.[A1], .Cells(Rows.Count, "A").End(xlUp)))
End With
End Sub
Sub MakeString()
Dim s As String
Const DELIMITER = ","
With ActiveSheet
s = Join(WorksheetFunction.Transpose(.Range(.[A1], .Cells(Rows.Count, "A").End(xlUp))), DELIMITER)
End With
End Sub
If you want to pass the array to an HTML TextBox, try this.
Sub CreateArrayAndPassToHTMLTextbox()
Dim ie As Object
Dim arr As Variant
Const DELIMITER = ","
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "http://home.arcor.de/peter.schleif/SearchForTDelement.html"
While ie.Busy: DoEvents: Wend
With ActiveSheet
arr = WorksheetFunction.Transpose(.Range(.[A1], .Cells(Rows.Count, "A").End(xlUp)))
If Not IsArray(arr) Then arr = Array(arr)
ie.document.getElementById("trackField").Value = Join(arr, DELIMITER)
End With
End Sub
If, for instance, you want to pass data to a ComboBox, try this.
Sub passValueToComboBox1()
Dim ie As Object
Dim oHTML_Element As IHTMLElement
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "http://peterschleif.bplaced.net/excel/combobox/index.php"
While ie.Busy Or ie.readyState <> 4: DoEvents: Wend
Set oHTML_Element = ie.document.getElementsByName("selectedReportClass")(0)
If Not oHTML_Element Is Nothing Then oHTML_Element.Value = "com.db.moap.report.FUBU7"
For Each oHTML_Element In ie.document.getElementsByTagName("input")
If oHTML_Element.Type = "submit" Then oHTML_Element.Click: Exit For
Next
End Sub

I need help inputting a column from excel in VBA with html macro.

I selected 2 cells and named them "perch" and "stats". using what i inputted in the cell, i was able to run a search and retrieve the data i wanted from the html code(which i changed for safety). But my problem is want to input an entire column not just a single cell. And produce multiple results in a single column. Also the number of inputs will change periodically ~300, so i can't stipulate a specific range.
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("perch").Row And _
Target.Column = Range("perch").Column Then
Dim IE As New InternetExplorer
'IE.Visible = True
IE.navigate "http://"google/not_real_link"
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim sTD As String
sTD = Trim(Doc.getElementsByTagName("td")(33).innerText)
IE.Quit
Range("stats").Value = sTD
End If
End Sub
You'll usually see the Worksheet_SelectionChange event used instaed of the Worksheet_Change.
Here are some tips on using these events:
Turn off EnableEvents. If you don't you might re-trigger the event and get stuck in an infinite loop
Use the Intersect method to test to see if your range and the Target intersect.
You should use Target.Cells.Count to see if multiple cells are being effected.
Private Sub Worksheet_SelectionChange(ByVal Target As Range)
If Not Intersect(Target, Range("perch")) Is Nothing Then
Application.EnableEvents = False
Application.EnableEvents = True
End If
End Sub
I'm not really sure how you want to handle multiple values.
Creating a function to add value(s) to the end of a list often helps.
Sub AddStatusRow(value As Variant)
Const StatColumn = 5
Dim lastRow As Long
lastRow = Range(Cells(1, StatColumn), Cells(Rows.count, StatColumn)).End(xlUp).Row + 1
Cells(lastRow, StatColumn) = value
End Sub
You might want to consider iterating over the rows of the table instead of Doc.getElementsByTagName("td")(33).innerText.
You can find lots of great examples of what you want to do from the URL below.
http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm

Extracting multiple search results

I have created a VBA application that allows you to extract search results from the canada411.ca site. You simply insert values into to the values "Where" and "What" and "Title", "Location, and "Phone" will spit out. In my code What = "Name". Here is my code:
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("Name").Row And _
Target.Column = Range("Name").Column Then
End If
If Target.Row = Range("Where").Row And _
Target.Column = Range("Where").Column Then
'Set Variables What and Where from Canada411.ca to Values on Excel WorkSheet
Dim IE As New InternetExplorer
IE.Visible = True
IE.navigate ("http://canada411.yellowpages.ca/search/si/1/") & _
Range("Name").Value & "/" & Range("Where").Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
'Extract from Canada411.ca Source element (first search result)
Range("Title").Value = Trim(Doc.getElementsByTagName("h3")(0).innerText)
Range("Phone").Value = Trim(Doc.getElementsByTagName("h4")(0).innerText)
Range("Location").Value = Trim(Doc.getElementsByClassName("address")(0).innerText)
IE.Quit
'Extract for Second Search result
'Third Search result etc.
End If
End Sub
My problem is that I don't know how to get the remaining results on the page, I can only get the first result on the first page. The source code for the subsequent search results are the same as the first, but I cannot seem to make it work. (Perhaps there is a shortcut after you have the code for the first one, to get the rest?) I am new to VBA and HTML and appreciate the help.
Well, you have two options.
1) Learn how to navigate the DOM using the Tools->References library 'Microsoft HTML Object Library' and extract that way.
2) It is possible to pull the web page into a Excel worksheet and then you only need to pull out data from each cell. Much easier but ties you to Excel. Use the Macro recorder and then use the GUI , on the Ribbon Data->From Web and follow the wizard.
In your link change the « 1 » for a 2, 3, 4 ... These are the page numbers !
http://canada411.yellowpages.ca/search/si/1/
http://canada411.yellowpages.ca/search/si/2/
http://canada411.yellowpages.ca/search/si/3/
...