Excel VBA browser scraping loop: Reload page for each loop - html

I am having a bit trouble when web scraping, basically I have a loop that goes to a website and searches for a specific company and clicks on search and then goes to the company page and extracts the company's founding date. (https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx). I basically have a large table but I have a smaller one here for simplicity, and the loop goes from top to bottom of the table in excel and extracts the founding date in the B column next to the name.
{Table example}. My code is as follows:
Option Explicit
Sub click_search()
Dim i As SHDocVw.InternetExplorer
Dim idoc As MSHTML.HTMLDocument
Dim output As Range
Dim txt_input As Range
Dim wb As Workbook
Dim ws As Worksheet
Dim n As Integer
Set wb = ActiveWorkbook
Set ws = wb.Sheets(1)
Set i = New InternetExplorer
i.Visible = True
i.Navigate "https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx"
Do While i.ReadyState <> READYSTATE_COMPLETE
Loop
Set idoc = i.Document
On Error Resume Next
For n = 1 To 3
idoc.getElementById("ctl00_ContentPlaceHolder1_frmEntityName").Value = Cells(n, 1).Value
Application.Wait Now + #12:00:01 AM#
idoc.getElementById("ctl00_ContentPlaceHolder1_btnSubmit").click
Application.Wait Now + #12:00:15 AM#
idoc.getElementById("ctl00_ContentPlaceHolder1_rptSearchResults_ctl00_lnkbtnEntityName").click
Application.Wait Now + #12:00:10 AM#
Dim V As Variant
Set V = idoc.getElementById("ctl00_ContentPlaceHolder1_lblIncDate")
Cells(n, 2) = V.innerHTML
i.Quit
Set i = New InternetExplorer
i.Visible = True
i.Navigate "https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx"
Application.Wait Now + #12:00:10 AM#
Next n
End Sub
The problem is that I have to quit internet explorer and restart it in order to be able to search for the next company, this is where I have a problem in my loop as it just works for the very first iteration and kind of gets stuck on the page in the second iteration of the loop. It works as intended and gets the correct date for the first company but for the second company it kind of gets stuck on the start page and wont search at all. Just reloading the page wont work either as that will cause the browser to get stuck because you have to manually accept that you want to reload the page in the browser. Any help at all on how to fix this issue is greatly appreciated.
Thanks in advance!

Removed the last big chunk of the code at i.Quit and replaced it with:
i.Navigate "https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx"
Application.Wait Now + #12:00:10 AM#

Related

VBA: script to extract from a HTML table?

Apologies for the basic question but I've only been coding about a week!
To give some context I have a basic task where I have to manually search and type in a number to retrieve a persons qualifications and expiry dates which could be multiple per person. I've managed to write a script that searches each person in turn from a list of qualification numbers I have on an excel spreadsheet from this site:
https://www.bindt.org/Certification/pcn-certificate-verification/
My script then currently clicks the more information next to each qualification to expand the data (unsure if this is required in all honesty).
My problem is I have no idea how to extract the data back into excel, I've not managed to extract it by using class names, grabbing the whole table etc.
Would it be possible to get some help with writing the code to grab the data?
For an example if you use the number 213474 on thee site it will give you an idea of what I need to grab.
Many thanks in advance!
Tom
Edit 8/8/21
Hi Raymond,
I've put my code below so far, it all falls down at:
Set Table = IE.document.getElementByClassName("container")(1)
I do have another issue where when I go to the second person in the loop as they all have different qualification numbers the go to error 10 doesn't seem to work.
My main issue is no matter what I try to grab off the results be it tbody, tr etc I can't seem to find any combination that works.
Sub PCN()
Dim IE As Object
Dim lastrow As Integer
ThisWorkbook.Sheets("PCN").Range("A1").Select
lastrow = ActiveSheet.Cells.SpecialCells(xlCellTypeLastCell).Row
Dim i As Integer
Dim rng As Range
Set rng = Application.Range("c2:C" & lastrow)
For i = 2 To lastrow
Debug.Print rng.Cells(RowIndex:=i, ColumnIndex:="c").Value
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate "https://www.bindt.org/Certification/pcn-certificate-verification/pcn-verification-form"
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
IE.document.getElementById("txtNumber").Value = ThisWorkbook.Sheets("PCN").Range("C" & i).Value
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
IE.document.getElementById("submit").Click
On Error GoTo 10
IE.document.getElementById("certificate-link-1").Click
IE.document.getElementById("certificate-link-2").Click
IE.document.getElementById("certificate-link-3").Click
IE.document.getElementById("certificate-link-4").Click
IE.document.getElementById("certificate-link-5").Click
IE.document.getElementById("certificate-link-6").Click
IE.document.getElementById("certificate-link-7").Click
IE.document.getElementById("certificate-link-8").Click
IE.document.getElementById("certificate-link-9").Click
IE.document.getElementById("certificate-link-10").Click
IE.document.getElementById("certificate-link-11").Click
IE.document.getElementById("certificate-link-12").Click
IE.document.getElementById("certificate-link-13").Click
IE.document.getElementById("certificate-link-14").Click
IE.document.getElementById("certificate-link-15").Click
IE.document.getElementById("certificate-link-16").Click
IE.document.getElementById("certificate-link-17").Click
IE.document.getElementById("certificate-link-18").Click
IE.document.getElementById("certificate-link-19").Click
IE.document.getElementById("certificate-link-20").Click
IE.document.getElementById("certificate-link-21").Click
IE.document.getElementById("certificate-link-22").Click
10
Set Table = IE.document.getElementByClassName("container")(1)
Set tbody = Table.getElementsByTagName("tbody")(0)
trs = tbody.getElementsByTagName("tr")
Do While trs Is Nothing
DoEvents
Application.Wait Now + TimeValue("00:00:02")
trs = tbody.getElementsByTagName("tr")
Loop
Dim myvalue As String: myvalue = trs.Cells(3).innerText
IE.Quit
Next
End Sub

How can I scrape multiple pages/links at once using VBA?

I'm currrently trying to scrape info from this Reddit Page. My goal is to make excel open all the posts in new tabs and then I want to scrape information from each of those pages, since the starting page doesn't have as much information.
I've been trying for the last few hours to figure this out, but I'm admittedly pretty confused about how to do it, just overall unsure what to do next, so any pointers would be greatly appreciated!
Here is my current code, it works decently enough but as I said, I'm not sure what I should do next to open the links it finds one by one and scrape each page for data.
The links are scraped off that first page and then added to my spreadsheet right now, but if possible I'd like to just skip that step and scrape them all at once.
Thanks! :)
Sub GetData()
Dim objIE As InternetExplorer
Dim itemEle As Object
Dim upvote As Integer, awards As Integer, animated As Integer
Dim postdate As String, upvotepercent As String, oc As String, filetype As String, linkurl As String, myhtmldata As String, visiComments As String, totalComments As String, removedComments As String
Dim y As Integer
Set objIE = New InternetExplorer
objIE.Visible = False
objIE.navigate (ActiveCell.Value)
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
y = 1
For Each itemEle In objIE.document.getElementsByClassName("flat-list buttons")
visiComments = itemEle.getElementsByTagName("a")(0).innerText
linkurl = itemEle.getElementsByTagName("a")(0).href
Sheets("Sheet1").Range("A" & y).Value = visiComments
Sheets("Sheet1").Range("B" & y).Value = linkurl
y = y + 1
Next
End Sub
You should be able to gather the urls then visit in a loop and write results from page visited to array, then array to sheet. Add this after your existing line
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Add:
Dim nodeList As Object , i As Long, urls(), results()
Note: You are only potentially gaining on the page loads, as VBA is single threaded. To do that you would need to store a reference to each tab, or open all first, then loop through relevant open windows to do the scrape. My preference would be to keep in same tab to be honest.
Set nodeList = ie.document.querySelectorAll(".comments")
Redim urls(0 To nodeList.Length-1)
Redim results(1 to nodeList.Length, 1 to 3)
'Store all urls in an array to later loop
For i = 0 To nodeList.Length -1
urls(i) = nodeList.item(i).href
Next
For i = LBound(urls) To UBound(urls)
ie.Navigate2 urls(i)
While ie.Busy Or ie.Readystate <> 4: DoEvents:Wend
'may need a pause here
results(i + 1, 1) = ie.document.querySelector("a.title").innerText 'title
results(i + 1, 2) = ie.document.querySelector(".number").innerText 'upvotes
results(i + 1, 3) = ie.document.querySelector(".word").NextSibling.nodeValue '%
Next
ActiveSheet.Cells(1,1).Resize(UBound(results,1) , UBound(results,2)) = results

Excel Macros - Using Excel Data to scrape HTML page

As with many other questions, I'm really new to using Excel macros and HTML.
I'm building a table with entries of items from Dungeons and Dragons (DnD) that I want to get descriptions of from a particular web page: http://www.5esrd.com/gamemastering/magic-items/ . For example, An entry in my table would be Adamantine Armor, so I would want the macro to search the web page for that listing, scrape the description that's entered on the webpage and save that to the excel document. In this case, that specific description (as per the 5esrd webpage) is as follows:
Armor (medium or heavy, but not hide), uncommon
This suit of armor is reinforced with adamantine, one of the hardest substances in existence. While you’re wearing it, any critical hit against you becomes a normal hit.
On this page, all the items are links to pages that have them grouped by type (Sword, wand, etc.). I would like to iterate through my row of items in Excel, search for that item on the page and scrape its description into my excel table.
I was following along this page: https://www.wiseowl.co.uk/blog/s393/scrape-website-html.htm, which scrapes data from stackoverflow's home page. It directly targets some of the tables that the web page uses to organize its entries. The DnD page I'm trying to use doesn't organize its entries into tables like this, so I'm a little lost with how to proceed.
If anyone could help point me in the right direction, I would be very grateful!
This is the code I use for scraping a web page on our company site. I put all my data in column B, so make adjustments accordingly. This should get you started.
Sub TestScrape()
' SCRAPE A WEB PAGE
Dim ieDoc As New HTMLDocument 'ieDocDocument
Dim tdCollection As Object 'table that has the javascript attributes and contains the element I want to click
Dim AnchorLinks As Object
Dim tdElements As Object
Dim tdElement As Object
Dim AnchorLink As Object
Dim lRow As Long
Dim ie As InternetExplorer
Dim cls As IHTMLElementCollection
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Left = 0
.TheaterMode = True '<<-- Comment this out if you don't want Theater Mode
.Navigate url:="Enter your URL here"
While .ReadyState < 4 Or .Busy: DoEvents: Wend
End With
Application.Wait Now + TimeSerial(0, 0, 4)
ieDoc.body.innerHTML = ie.Document.body.innerHTML
With ieDoc.body
Set AnchorLinks = .getElementsByTagName("table")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 2).Value = tdElement.innerText
lRow = lRow + 1
Next
Application.Wait (Now + TimeValue("0:00:1"))
With ie
.TheaterMode = False
.Quit
End With
Set ie = Nothing
End Sub

Excel VB Searching for text on a webpage and Copying information in the same Element

I am relatively new VBA.
I am trying to use this code to grab a bit of information from a website. When I do it by Element I have to search for the tag name which is tr and use a number next to it to define which one I want to use. The problem with that is it changes frequently with the position on the website. Currently the Keyword I want to search for and the information it contains is like so:
<tr>
<td class="nt">Operations</td>
<td>Windows</td>
</tr>
So if I can search by the class "Operations", and get the information "Windows", that would help. Also, I currently having an error
Next without For
If possible, is there a way I can use this to do multiple searches before I close the page? So I look for multiple specific words and input that data into different cells before moving onto the next column where it would repeat until completed at the end of the x value. I currently only have it set to x=2 To 5 but I would like to increase that to 10 or higher in the future.
The current code looks like this.
Private Sub Worksheet_Change(ByVal Target As Range)
For x = 2 To 5
If Target.Row = Cells(x, 35).Row And _
Target.Column = Cells(x, 35).Column Then
'If Target.Row = Range("ManufacturerPartNumber").Row And _
'Target.Column = Range("ManufacturerPartNumber").Column Then
Dim IE As New InternetExplorer
'IE.Visible = True
'For x = 2 To 5
'IE.navigate "" & Range("Website_1").Value
IE.navigate "" & Cells(x, 35).Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim sDD As String
sDD = InStr(1, IE.document.body.innerHTML, "Processor Model")
'sDD = Trim(Doc.getElementsByTagName("Processor Model")(1).innerText) 'Use this with tag like dd and number for which it appears like 0 or 1
IE.Quit
Dim aDD As Variant
aDD = Split(sDD, ",")
Cells(x, 44).Value = aDD(0)
'Range("ProcessorNumberCd").Value = aDD(0)
'Range("OSProvided").Value = aDD(0)
Next x
End If
'MsgBox "Complete"
End Sub
I think you want to grab the 'inner text'. Take a look at the example below.
Sub Scraper()
Dim item As Long
Dim priceStr As String
Dim priceTag As Object
Dim priceTable As Object
item = "10011" 'this will eventually be placed in a loop for multiple searches
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
' navigate and download the web page
objIE.Navigate "www.google.com"
Do While objIE.ReadyState <> 4 Or objIE.Busy
DoEvents
Loop
'objIE.Document.getElementsByTagName("input")(0).Value = item
'objIE.Document.getElementByID("FDI").Click
Set priceTable = objIE.Document.getElementByID("price_FGC")
Set priceTag = priceTable.getElementsByTagName("u")(3)
priceStr = priceTag.innerText
Sheet1.Range("A1").Value = priceStr
objIE.Quit
End Sub
Also, check out this link for several other ways of how to do other, similar things.
http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm

Copying columns of values from Excel worksheet to an HTML table in a website

I'm hoping to copy values from an excel table into an html table. If anyone has experience doing this between Excel 2013 and Internet Explorer 11, please let me know and I can give details. Thank you
The following is a little section of the code for the website that I am trying to automate the fill-in process for:
Table and Source Code for Charge Numbers for the Site
This is showing just one part of the process (the charge numbers), which we have anywhere from 10-25 each week with their corresponding hours charged to them each day (as seen in the table matrix).
There are 7 charge numbers on each page and Charge Numbers all have name="chargeNo" and increase id names with each box downward like so:
'1) id="chargeNo0"
'2) id="chargeNo1"
'3) id="chargeNo2"
'4) id="chargeNo3"
'5) id="chargeNo4"
'6) id="chargeNo5"
'7) id="chargeNo6"
Each day of the week also has set names as follows:
Table and Source Code for Hours Charged Each Day
'Saturday: name="hrs0"
'Sunday: name="hrs1"
'Monday: name="hrs2"
'Tuesday: name="hrs3"
'Wed: name="hrs4"
'Thurs: name="hrs5"
'Fri: name="hrs6"
I'm trying to automate the process of pulling the arrays of charge times I have from Excel and inputting them into their respective textboxes.
This script will convert elements in ColumnA into an array.
Sub MakeArray()
Dim arr As Variant
With ActiveSheet
arr = WorksheetFunction.Transpose(.Range(.[A1], .Cells(Rows.Count, "A").End(xlUp)))
End With
End Sub
Sub MakeString()
Dim s As String
Const DELIMITER = ","
With ActiveSheet
s = Join(WorksheetFunction.Transpose(.Range(.[A1], .Cells(Rows.Count, "A").End(xlUp))), DELIMITER)
End With
End Sub
If you want to pass the array to an HTML TextBox, try this.
Sub CreateArrayAndPassToHTMLTextbox()
Dim ie As Object
Dim arr As Variant
Const DELIMITER = ","
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "http://home.arcor.de/peter.schleif/SearchForTDelement.html"
While ie.Busy: DoEvents: Wend
With ActiveSheet
arr = WorksheetFunction.Transpose(.Range(.[A1], .Cells(Rows.Count, "A").End(xlUp)))
If Not IsArray(arr) Then arr = Array(arr)
ie.document.getElementById("trackField").Value = Join(arr, DELIMITER)
End With
End Sub
If, for instance, you want to pass data to a ComboBox, try this.
Sub passValueToComboBox1()
Dim ie As Object
Dim oHTML_Element As IHTMLElement
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "http://peterschleif.bplaced.net/excel/combobox/index.php"
While ie.Busy Or ie.readyState <> 4: DoEvents: Wend
Set oHTML_Element = ie.document.getElementsByName("selectedReportClass")(0)
If Not oHTML_Element Is Nothing Then oHTML_Element.Value = "com.db.moap.report.FUBU7"
For Each oHTML_Element In ie.document.getElementsByTagName("input")
If oHTML_Element.Type = "submit" Then oHTML_Element.Click: Exit For
Next
End Sub