Cannot Read New Page HTML Excel VBA - html

I am trying to scrape data from a results page after entering search variables on the previous page. I cannot get site.Document to reflect the new page's HTML, at least not consistently: it reads correctly in only 5-10% of cases. I have looked at other solutions for this problem, and most of them come down to not waiting for the new page's HTML to appear, but I have added waits of 5, 10 and even 20 seconds and that doesn't seem to do the trick.
Any advice or pointers would be welcome.
<code>Sub ParseInternet()
Dim post_code As String
Dim house_num As String
post_code = CStr(Sheet1.Cells(9, 2).Value) 'get search data from worksheet
house_num = CStr(Sheet1.Cells(9, 1).Value)
If post_code = "" Then
MsgBox ("House Name /Number and postcode MUST be entered")
Exit Sub
End If
Set site = CreateObject("InternetExplorer.application")
Dim url As String
url = "http://landregistry.data.gov.uk/app/ppd"
site.Navigate url
While site.ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
site.Visible = True
site.Document.getElementById("paon").Value = house_num
site.Document.getElementById("postcode").Value = post_code
Set my_classes = site.Document.getElementsByClassName("btn btn-primary")
For Each my_class In my_classes
my_class.Click
Next my_class
While site.ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
Application.Wait (Now + TimeValue("0:00:10"))
Dim a As String
Dim b As String
b = "property by searching for"
'''STILL PREVIOUS PAGE HTML IN .DOC
Set HTMLLI = site.Document.getElementsByTagName("div")
For xli = 0 To HTMLLI.Length - 1
a = site.Document.getElementsByTagName("div")(xli).innerText
If InStr(site.Document.getElementsByTagName("div")(xli).innerText, b) Then
Sheet1.Cells(9, 4).Value = site.Document.getElementsByTagName("div")(xli).innerText
Exit For
End If
Next xli
site.Quit
Set site = Nothing
End Sub
</code>
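One way to make the wait depend on the results page itself, rather than on a fixed pause, is to poll the live document until some text that only the results page contains shows up, and give up after a timeout. The following is a minimal sketch and not a confirmed fix for this site; the function name WaitForText and its parameters are assumptions made for illustration.

```vba
Option Explicit

' Hypothetical helper: polls the browser's current document until
' `needle` appears in the body text or `maxSeconds` have elapsed.
' Returns True if the text was found, False on timeout.
Public Function WaitForText(ByVal ie As Object, ByVal needle As String, _
                            ByVal maxSeconds As Long) As Boolean
    Dim deadline As Date
    Dim bodyText As String

    deadline = DateAdd("s", maxSeconds, Now)
    Do
        DoEvents
        bodyText = ""
        ' The document or its body may not exist yet while the page is loading
        On Error Resume Next
        bodyText = ie.Document.body.innerText
        On Error GoTo 0
        If InStr(1, bodyText, needle, vbTextCompare) > 0 Then
            WaitForText = True
            Exit Function
        End If
    Loop While Now < deadline
End Function
```

In the macro above it could replace the fixed Application.Wait, for example `If WaitForText(site, "property by searching for", 20) Then ...`, provided the chosen text does not also occur on the search page.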

Related

Can we fetch the specific data via using urls in vba

I have 15 different URLs, and I need to fetch the price from each website into a particular column in Excel. Can you please help me out? It's my first VBA program; I tried, but it shows me a syntax error.
Sub myfile()
Dim IE As New InternetExplorer
Dim url As String
Dim item As HTMLHtmlElement
Dim Doc As HTMLDocument
Dim tagElements As Object
Dim element As Object
Dim lastRow
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
IE.navigate url
IE.Visible = True
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
lastRow = Sheet1.UsedRange.Rows.Count + 1
Set tagElements = Doc.all.tags("tr")
For Each element In tagElements
If InStr(element.innerText, "ciq-price") > 0 And InStr(element.className, "ciq-product-name") > 0 Then
Sheet1.Cells(lastRow, 1).Value = element.innerText
' Exit the for loop once you get the temperature to avoid unnecessary processing
Exit For
End If
Next
IE.Quit
Set IE = Nothing
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
You can't simply copy a web scraping macro for your own purposes. Every page has its own HTML structure, so you must write a separate web scraping macro for each page.
I can't explain everything about web scraping with VBA here. Please start your research with the search terms "excel vba web scraping" and "document object model". You also need knowledge of HTML and CSS and, ideally, JavaScript.
The error message "User-defined type not defined" occurs because you use early binding without a reference to the libraries Microsoft HTML Object Library and Microsoft Internet Controls. You can set these references via Tools -> References... For the differences between early and late binding, see "Early Binding v/s Late Binding" and, for deeper information from Microsoft, "Using early binding and late binding in Automation".
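To make the difference concrete, here is a small sketch that declares the browser object both ways behind a conditional-compilation switch. The switch name EARLY_BINDING and the procedure name BindingDemo are assumptions for illustration; the early-bound branch only compiles after the Microsoft Internet Controls reference has been added.

```vba
Option Explicit

' Hypothetical switch: set to True only after adding the reference
' "Microsoft Internet Controls" via Tools -> References...
#Const EARLY_BINDING = False

Sub BindingDemo()
#If EARLY_BINDING Then
    ' Early binding: the type is checked at compile time and named
    ' constants such as READYSTATE_COMPLETE are available
    Dim ie As InternetExplorer
    Set ie = New InternetExplorer
#Else
    ' Late binding: no reference needed, but the object is a generic
    ' Object and constants must be written as numbers (4 = complete)
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
#End If

    ie.Visible = True
    ie.Quit
    Set ie = Nothing
End Sub
```

With late binding the workbook runs on machines where the references are not ticked, at the cost of losing IntelliSense and named constants.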
To get the prices from the shown url you can use the following macro. I use late binding:
Option Explicit
Sub myfile()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim element As Object
Dim item As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
Set tagElements = IE.document.getElementsByClassName("ciq-online-offer-item ")
For Each element In tagElements
Set item = element.getElementsByTagName("td")(1)
ActiveSheet.Cells(lastRow, 1).Value = Trim(item.innerText)
lastRow = lastRow + 1
Next
IE.Quit
Set IE = Nothing
End Sub
Edit for a second Example:
The new link leads to an offer. I assume the price of the product is to be fetched. No loop is needed for this. You just have to find out in which HTML segment the price is and then you can decide how to get it. In the end there are only two lines of VBA that write the price into the Excel spreadsheet.
I'm in Germany and Excel has automatically set the currency sign from Dollar to Euro. This is of course wrong. Depending on where you are, this may have to be intercepted.
Sub myfile2()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://www.wayfair.com/kitchen-tabletop/pdx/cuisinart-air-fryer-toaster-oven-cui3490.html"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
'Break for 3 seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
Set tagElements = IE.document.getElementsByClassName("BasePriceBlock BasePriceBlock--highlight")(0)
ActiveSheet.Cells(lastRow, 1).Value = Trim(tagElements.innerText)
IE.Quit
Set IE = Nothing
End Sub
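Regarding the currency sign mentioned above: one possible way to stop Excel from reinterpreting the scraped price is to format the target cell as text before writing to it. This is only a sketch under that assumption; the procedure name and the sample price string are placeholders, not values taken from the page.

```vba
Option Explicit

Sub WritePriceAsText()
    Dim priceText As String

    ' Placeholder standing in for Trim(tagElements.innerText)
    priceText = "$199.95"

    With ActiveSheet.Cells(1, 1)
        .NumberFormat = "@"   ' text format: the string is stored as typed
        .Value = priceText
    End With
End Sub
```

The cell then holds a string rather than a number, so it has to be converted explicitly if calculations are needed later.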

Need Excel VBA to navigate website and download specific files

Trying to understand how to interact with a website in a specific way. This is part of a larger code I'm working on that will loop through a list of ContractorIDs. What I need to do from here is the following:
1) Navigate to this website: https://ufr.osd.state.ma.us/WebAccess/SearchDetails.asp?ContractorID=042786217&FilingYear=2018&nOrgPage=7&Year=2018
2) Find the link that says "UFR Filing with Audited Financials" and click on it. (If it's not there, end the sub.)
3) On the ensuing page, find the link that is identified under "Document Category" as "UFR Excel Template" and click on it. (In this case, the link says "15-UFR18.xls"; however, since there's no consistent link naming scheme, the correct link will always have to be identified by the label under "Document Category" as mentioned. If the link doesn't exist, exit the sub.)
4) On the ensuing page, click the "Download" link at the top and save the file under the following file path (which would be created at this time): C:\Documents\042786217\2018.
Edit: Code below gets me to the point where the download button is clicked, then I get the Open/Save/Cancel dialog box. Nearly there, just need to figure out how to save the file into a specific path.
Option Explicit
Sub UFRScraper()
If MsgBox("UFR Scraper will run now. Do you wish to continue?", vbYesNo) = vbNo Then Exit Sub
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Dim ele As Object
Dim tbl_Providers As ListObject: Set tbl_Providers = ThisWorkbook.Worksheets("tbl_ProviderList").ListObjects("tbl_Providers")
Dim FEIN As String: FEIN = ""
Dim FEINList As Range: Set FEINList = tbl_Providers.ListColumns("FEIN").DataBodyRange
Dim ProviderName As String: ProviderName = ""
Dim ProviderNames As Range: Set ProviderNames = tbl_Providers.ListColumns("Provider Name").DataBodyRange
Dim FiscalYear As String: FiscalYear = ""
Dim urlUFRDetails As String: urlUFRDetails = ""
Dim i As Integer
' Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
' Show (True)/Hide (False) IE
IE.Visible = True
i = 1
For i = 1 To 3 'Limited to 3 during testing. Change when ready.
FEIN = FEINList(i, 1)
ProviderName = ProviderNames(i, 1)
urlUFRDetails = "https://ufr.osd.state.ma.us/WebAccess/SearchDetails.asp?ContractorID=" & FEIN & "&FilingYear=2018&nOrgPage=1&Year=2018"
IE.Navigate urlUFRDetails
' Wait while IE loading...
'IE ReadyState = 4 signifies the webpage has loaded (the first loop is set to avoid inadvertently skipping over the second loop)
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
'Step 2 is done here
Dim filingFound As Boolean: filingFound = False
For Each ele In IE.Document.getElementsByTagName("a")
If ele.innerText = "UFR Filing with Audited Financials" Then
filingFound = True
IE.Navigate ele.href
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
Exit For
End If
Next ele
If filingFound = False Then
GoTo Skip
End If
'Step 3
Dim j As Integer: j = 0
Dim UFRFileFound As Boolean: UFRFileFound = False
For Each ele In IE.Document.getElementsByTagName("li")
j = j + 1
If ele.innerText = "UFR Excel Template" Then
UFRFileFound = True
IE.Navigate "https://ufr.osd.state.ma.us/WebAccess/documentviewact.asp?counter=" & j - 4
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
Exit For
End If
Next ele
If UFRFileFound = False Then
GoTo Skip
End If
'Step 4
IE.Document.getElementById("LinkButton2").Click
'**Built in wait time to avoid accidentally overloading server with repeated quick requests during development and testing**
Skip:
Application.Wait (Now + TimeValue("0:00:03"))
MsgBox "Loop " & i & " complete."
Next i
'Unload IE
IE.Quit
Set IE = Nothing
Set objElement = Nothing
Set objCollection = Nothing
MsgBox "Process complete!"
End Sub
I have tried step 3 in a somewhat lengthy way, but could not provide complete download code: after one successful manual attempt, even a manual download now returns the message "The File Could Not Be Retrieved" (maybe a server-side constraint).
The code below only takes you as far as the cell containing the href of the xls file:
Dim doc As HTMLDocument
Dim Tbl As HTMLTable, Cel As HTMLTableCell, Rw As HTMLTableRow, Col As HTMLTableCol
Set doc = IE.document
For Each ele In IE.document.getElementsByClassName("boxedContent")
For Each Tbl In ele.getElementsByTagName("table")
For Each Rw In Tbl.Rows
For Each Cel In Rw.Cells
'Debug.Print Cel.innerText
If InStr(1, Cel.innerText, "UFR Excel Template") > 0 Then
Debug.Print Rw.Cells(2).innerText & " - " & Rw.Cells(2).innerHTML
End If
Next
Next Rw
Next Tbl
Next
Once the href is available, a PtrSafe-declared API function, WinHttpRequest or other methods could be used to download the file. I would welcome more efficient answers to this case from experts like @QHarr and others.
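As a rough illustration of the WinHttpRequest route, the sketch below fetches a URL and writes the response bytes to disk with an ADODB.Stream, both late bound. The function name DownloadFile is an assumption, the target folder must already exist, and because WinHTTP does not share cookies with the Internet Explorer session it may not work if the server requires that session.

```vba
Option Explicit

' Hypothetical helper: downloads fileUrl and saves it as savePath.
' Returns True when the server answered with HTTP status 200.
Public Function DownloadFile(ByVal fileUrl As String, _
                             ByVal savePath As String) As Boolean
    Dim http As Object
    Dim stream As Object

    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", fileUrl, False   ' False = synchronous request
    http.Send

    If http.Status = 200 Then
        Set stream = CreateObject("ADODB.Stream")
        stream.Type = 1                    ' 1 = adTypeBinary
        stream.Open
        stream.Write http.ResponseBody
        stream.SaveToFile savePath, 2      ' 2 = adSaveCreateOverWrite
        stream.Close
        DownloadFile = True
    End If
End Function
```

A call would look like `DownloadFile(href, "C:\Documents\042786217\2018\15-UFR18.xls")`, where href is the link found by the loop above.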

How to scrape address information from Google Maps?

I'm trying to create a macro that pulls a list of addresses from Excel and inputs each one into Google Maps.
It then pulls the address line, city/zip and country from Google Maps back into Excel.
It works up to the point where it scrapes the information from Google Maps.
Sub AddressLookup()
Application.ScreenUpdating = False
For i = 1 To Sheet1.Cells(Rows.Count, 1).End(xlUp).Row
Dim IE As InternetExplorer
Dim itemELE As Object
Dim address As String
Dim city As String
Dim country As String
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "https://www.google.com/maps"
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Search As MSHTML.HTMLDocument
Set Search = IE.document
Search.all.q.Value = Cells(i, 1).Value
Dim ele As MSHTML.IHTMLElement
Dim eles As MSHTML.IHTMLElementCollection
Set eles = Search.getElementsByTagName("button")
For Each ele In eles
If ele.ID = "searchbox-searchbutton" Then
ele.click
Else
End If
Next ele
For Each itemELE In IE.document.getElementsByClassName("widget-pane widget-pane-visible")
address = itemELE.getElementsByClassName("section-hero-header-description")(0).getElementsByTagName("h1")(0).innerText
city = itemELE.getElementsByClassName("section-hero-header-description")(0).getElementsByTagName("h2")(0).innerText
country = itemELE.getElementsByClassName("section-hero-header-description")(0).getElementsByTagName("h2")(1).innerText
Next
Cells(i, 2).Value = Trim(address)
Cells(i, 3).Value = Trim(city)
Cells(i, 4).Value = Trim(country)
MsgBox country
Next
Application.ScreenUpdating = True
End Sub
This answer uses the OpenStreetMap Nominatim API with the VBA-Web WebRequest.
Unlike scraping with Internet Explorer, this API is designed for this purpose (faster, more reliable, more information). This can be done with the Geocoding API too, but you need an API key and have to keep track of the cost.
If you use https://nominatim.openstreetmap.org/search, respect their Usage Policy; better still, run your own installation.
Public Function GeocodeRequestNominatim(ByVal sAddress As String) As Dictionary
Dim Client As New WebClient
Client.BaseUrl = "https://nominatim.openstreetmap.org/"
Dim Request As New WebRequest
Dim Response As WebResponse
Dim address As Dictionary
With Request
.Resource = "search/"
.AddQuerystringParam "q", sAddress
.AddQuerystringParam "format", "json"
.AddQuerystringParam "polygon", "1"
.AddQuerystringParam "addressdetails", "1"
End With
Set Response = Client.Execute(Request)
If Response.StatusCode = WebStatusCode.Ok Then
Set address = Response.Data(1)("address")
Set GeocodeRequestNominatim = address
'Dim Part As Variant
'For Each Part In address.Items
' Debug.Print Part
'Next Part
Else
Debug.Print "Error: " & Response.StatusCode & " - " & Response.Content
End If
End Function
Example (prints the country; for other fields, have a look at the returned JSON string in the example on the Nominatim website):
Debug.Print GeocodeRequestNominatim("united nations headquarters,USA")("country")
The geocoding API is no longer "free" though I actually believe with a billing account set-up you can scrape for free if you remain within a certain threshold. As a new release (maps/APIs has been updated) I think the expectation is that these APIs are used in conjunction with actual maps (but don't quote me on that).
Please note the following:
1) Use a proper wait for page load and after .click
While ie.Busy Or ie.readyState < 4: DoEvents: Wend
2) Use .Navigate2 rather than .Navigate
3) Use ids, as they are faster for selection. They are generally unique, so no looping is required
4) In this case additional time is needed to allow for url to update and map to zoom etc. I have added a timed loop for this. I show a single example as it is clear you know how to loop.
Option Explicit
Public Sub GetInfo()
Dim ie As New InternetExplorer, arr() As String, address As String, city As String, country As String
Dim addressElement As Object, t As Single, result As String 'Timer returns a Single
Const MAX_WAIT_SEC As Long = 10 '<==adjust time here
With ie
.Visible = True
.Navigate2 "https://www.google.com/maps"
While .Busy Or .readyState < 4: DoEvents: Wend
With .document
.querySelector("#searchboxinput").Value = "united nations headquarters,USA"
.querySelector("#searchbox-searchbutton").Click
End With
While .Busy Or .readyState < 4: DoEvents: Wend
t = Timer
Do
DoEvents
On Error Resume Next
Set addressElement = .document.querySelector(".section-info-line span.widget-pane-link")
result = addressElement.innerText
If Timer - t > MAX_WAIT_SEC Then Exit Do
On Error GoTo 0
Loop While addressElement Is Nothing
If InStr(result, ",") > 0 Then
arr = Split(result, ",")
address = arr(0)
city = arr(1)
country = arr(2)
With ActiveSheet
.Cells(1, 2).Value = Trim$(address)
.Cells(1, 3).Value = Trim$(city)
.Cells(1, 4).Value = Trim$(country)
End With
End If
Debug.Print .document.URL
.Quit
End With
End Sub
In terms of selectors -
Commercial addresses:
.section-info-line span.widget-pane-link
And feedback from OP re: residential:
.section-hero-header div.section-hero-header-description
After running your code and inspecting Google's address search result, I was able to retrieve the entire address block 'City, Province Postal_Code' by referencing the span tag inside the section-hero-header-subtitle class.
Without making any other changes to your code, add the following line above your For Each loop (the one that loops through the "widget-pane widget-pane-visible" class) and step through the code using F8.
Debug.Print IE.Document.getElementsByClassName("section-hero-header-subtitle")(0).getElementsByTagName("span")(0).innerText

Excel VB Searching for text on a webpage and Copying information in the same Element

I am relatively new to VBA.
I am trying to use this code to grab a bit of information from a website. When I do it by Element I have to search for the tag name which is tr and use a number next to it to define which one I want to use. The problem with that is it changes frequently with the position on the website. Currently the Keyword I want to search for and the information it contains is like so:
<tr>
<td class="nt">Operations</td>
<td>Windows</td>
</tr>
So if I could search for the label "Operations" and get the information "Windows", that would help. Also, I am currently getting this error:
Next without For
If possible, is there a way I can use this to do multiple searches before I close the page? So I look for multiple specific words and input that data into different cells before moving onto the next column where it would repeat until completed at the end of the x value. I currently only have it set to x=2 To 5 but I would like to increase that to 10 or higher in the future.
The current code looks like this.
Private Sub Worksheet_Change(ByVal Target As Range)
For x = 2 To 5
If Target.Row = Cells(x, 35).Row And _
Target.Column = Cells(x, 35).Column Then
'If Target.Row = Range("ManufacturerPartNumber").Row And _
'Target.Column = Range("ManufacturerPartNumber").Column Then
Dim IE As New InternetExplorer
'IE.Visible = True
'For x = 2 To 5
'IE.navigate "" & Range("Website_1").Value
IE.navigate "" & Cells(x, 35).Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim sDD As String
sDD = InStr(1, IE.document.body.innerHTML, "Processor Model")
'sDD = Trim(Doc.getElementsByTagName("Processor Model")(1).innerText) 'Use this with tag like dd and number for which it appears like 0 or 1
IE.Quit
Dim aDD As Variant
aDD = Split(sDD, ",")
Cells(x, 44).Value = aDD(0)
'Range("ProcessorNumberCd").Value = aDD(0)
'Range("OSProvided").Value = aDD(0)
Next x
End If
'MsgBox "Complete"
End Sub
I think you want to grab the 'inner text'. Take a look at the example below.
Sub Scraper()
Dim item As Long
Dim priceStr As String
Dim priceTag As Object
Dim priceTable As Object
item = 10011 'this will eventually be placed in a loop for multiple searches
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
' navigate and download the web page
objIE.Navigate "www.google.com"
Do While objIE.ReadyState <> 4 Or objIE.Busy
DoEvents
Loop
'objIE.Document.getElementsByTagName("input")(0).Value = item
'objIE.Document.getElementByID("FDI").Click
Set priceTable = objIE.Document.getElementByID("price_FGC")
Set priceTag = priceTable.getElementsByTagName("u")(3)
priceStr = priceTag.innerText
Sheet1.Range("A1").Value = priceStr
objIE.Quit
End Sub
Also, check out this link for several other ways of how to do other, similar things.
http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm
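For the table layout shown in the question (a label cell followed by a value cell in the same row), a lookup by label text avoids depending on the row's position. The following is a minimal sketch under that assumption; the function name ValueForLabel is hypothetical, and it expects the document of an already loaded page.

```vba
Option Explicit

' Hypothetical helper: returns the text of the second cell of the first
' table row whose first cell equals labelText, or "" if none matches.
Public Function ValueForLabel(ByVal doc As Object, _
                              ByVal labelText As String) As String
    Dim tr As Object
    Dim tds As Object

    For Each tr In doc.getElementsByTagName("tr")
        Set tds = tr.getElementsByTagName("td")
        If tds.Length >= 2 Then
            If Trim$(tds(0).innerText) = labelText Then
                ValueForLabel = Trim$(tds(1).innerText)
                Exit Function
            End If
        End If
    Next tr
End Function
```

Because the function only reads the document, it can be called several times before IE.Quit, for example `Cells(x, 44).Value = ValueForLabel(IE.document, "Operations")` followed by further calls with other labels and target columns.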

Getting data from HTML source in VBA (excel)

I'm trying to collect data from a website, which should be manageable once the source is in string form. Looking around I've assembled some possible solutions but have run into problems with all of them:
1) Use InternetExplorer.Application to open the URL and then access the innerHTML
2) Use Inet
3) Use a Shell command to run wget
Here are the problems I'm having:
1) When I store the innerHTML into a string, it's not the entire source, only a fraction
2) ActiveX does not allow the creation of the Inet object (error 429)
3) I've got the htm into a folder on my computer; how do I get it into a string in VBA?
Code for 1:
Sub getData()
Dim url As String, ie As Object, state As Integer
Dim text As Variant, startS As Integer, endS As Integer
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
url = "http://www.eoddata.com/stockquote/NASDAQ/AAPL.htm"
ie.Navigate url
state = 0
Do Until state = 4
DoEvents
state = ie.readyState
Loop
text = ie.Document.Body.innerHTML
startS = InStr(ie.Document.Body.innerHTML, "7/26/2012")
endS = InStr(ie.Document.Body.innerHTML, "7/25/2012")
text = Mid(ie.Document.Body.innerHTML, startS, endS - startS)
MsgBox text
ie.Quit
End Sub
If I were trying to pull the opening price for 08/10/12 off that page, which is similar to what I assume you are doing, I'd do something like this:
Set ie = New InternetExplorer
With ie
.navigate "http://eoddata.com/stockquote/NASDAQ/AAPL.htm"
.Visible = False
While .Busy Or .readyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set objHTML = .document
DoEvents
End With
Set elementONE = objHTML.getElementsByTagName("TD")
For i = 0 To elementONE.Length - 2 'collection is zero-based; stop one short because Item(i + 1) is read below
elementTWO = elementONE.Item(i).innerText
If elementTWO = "08/10/12" Then
MsgBox (elementONE.Item(i + 1).innerText)
Exit For
End If
Next i
DoEvents
ie.Quit
DoEvents
Set ie = Nothing
You can modify this to run through the HTML and pull whatever data you want. Reading Item(i + 2) instead would return the high price, and so on.
Since there are a lot of dates on that page you might also want to make it check that it is between the Recent End of Day Prices and the Company profile.