How to scrape address information from Google Maps? - html

I'm trying to create a macro that pulls a list of addresses from Excel and inputs each one into Google Maps.
It then pulls the address line, city/zip and country from Google Maps back into Excel.
It works up to the point where it scrapes the information from Google Maps.
Sub AddressLookup()
Application.ScreenUpdating = False
For i = 1 To Sheet1.Cells(Rows.Count, 1).End(xlUp).Row
Dim IE As InternetExplorer
Dim itemELE As Object
Dim address As String
Dim city As String
Dim country As String
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "https://www.google.com/maps"
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Search As MSHTML.HTMLDocument
Set Search = IE.document
Search.all.q.Value = Cells(i, 1).Value
Dim ele As MSHTML.IHTMLElement
Dim eles As MSHTML.IHTMLElementCollection
Set eles = Search.getElementsByTagName("button")
For Each ele In eles
If ele.ID = "searchbox-searchbutton" Then
ele.click
Else
End If
Next ele
For Each itemELE In IE.document.getElementsByClassName("widget-pane widget-pane-visible")
address = itemELE.getElementsByClassName("section-hero-header-description")(0).getElementsByTagName("h1")(0).innerText
city = itemELE.getElementsByClassName("section-hero-header-description")(0).getElementsByTagName("h2")(0).innerText
country = itemELE.getElementsByClassName("section-hero-header-description")(0).getElementsByTagName("h2")(1).innerText
Next
Cells(i, 2).Value = Trim(address)
Cells(i, 3).Value = Trim(city)
Cells(i, 4).Value = Trim(country)
MsgBox country
Next
Application.ScreenUpdating = True
End Sub

This answer uses the OpenStreetMap Nominatim API with the VBA-Web WebRequest.
In opposite to scraping withInternet Explorerthis is designed for this purpose (faster, more reliable, more information). This can be done with the Geocode API too, but you need an API-Key and keep track of the cost.
If you use https://nominatim.openstreetmap.org/search respect their Usage Policy, but better have your own installation.
Public Function GeocodeRequestNominatim(ByVal sAddress As String) As Dictionary
Dim Client As New WebClient
Client.BaseUrl = "https://nominatim.openstreetmap.org/"
Dim Request As New WebRequest
Dim Response As WebResponse
Dim address As Dictionary
With Request
.Resource = "search/"
.AddQuerystringParam "q", sAddress
.AddQuerystringParam "format", "json"
.AddQuerystringParam "polygon", "1"
.AddQuerystringParam "addressdetails", "1"
End With
Set Response = Client.Execute(Request)
If Response.StatusCode = WebStatusCode.Ok Then
Set address = Response.Data(1)("address")
Set GeocodeRequestNominatim = address
'Dim Part As Variant
'For Each Part In address.Items
' Debug.Print Part
'Next Part
Else
Debug.Print "Error: " & Response.StatusCode & " - " & Response.Content
End If
End Function
Example (prints country, for other fields have a look at the returned JSON-String in example on Nomination Website):
Debug.Print GeocodeRequestNominatim("united nations headquarters,USA")("country")

The geocoding API is no longer "free" though I actually believe with a billing account set-up you can scrape for free if you remain within a certain threshold. As a new release (maps/APIs has been updated) I think the expectation is that these APIs are used in conjunction with actual maps (but don't quote me on that).
Please note the following:
1) Use a proper wait for page load and after .click
While ie.Busy Or ie.readyState < 4: DoEvents: Wend
2) Use .Navigate2 rather than .Navigate
3) Use ids as faster for selections. They generally are unique so no looping required
4) In this case additional time is needed to allow for url to update and map to zoom etc. I have added a timed loop for this. I show a single example as it is clear you know how to loop.
Option Explicit
Public Sub GetInfo()
Dim ie As New InternetExplorer, arr() As String, address As String, city As String, country As String
Dim addressElement As Object, t As Date, result As String
Const MAX_WAIT_SEC As Long = 10 '<==adjust time here
With ie
.Visible = True
.Navigate2 "https://www.google.com/maps"
While .Busy Or .readyState < 4: DoEvents: Wend
With .document
.querySelector("#searchboxinput").Value = "united nations headquarters,USA"
.querySelector("#searchbox-searchbutton").Click
End With
While .Busy Or .readyState < 4: DoEvents: Wend
t = Timer
Do
DoEvents
On Error Resume Next
Set addressElement = .document.querySelector(".section-info-line span.widget-pane-link")
result = addressElement.innerText
If Timer - t > MAX_WAIT_SEC Then Exit Do
On Error GoTo 0
Loop While addressElement Is Nothing
If InStr(result, ",") > 0 Then
arr = Split(result, ",")
address = arr(0)
city = arr(1)
country = arr(2)
With ActiveSheet
.Cells(1, 2).Value = Trim$(address)
.Cells(1, 3).Value = Trim$(city)
.Cells(1, 4).Value = Trim$(country)
End With
End If
Debug.Print .document.URL
.Quit
End With
End Sub
In terms of selectors -
Commercial addresses:
.section-info-line span.widget-pane-link
And feedback from OP re: residential:
.section-hero-header div.section-hero-header-description

After running your code and inspecting Google's address search result, I was able to retrieve the entire address block 'City, Province Postal_Code' by referencing the span tag inside the section-hero-header-subtitle class.
Without making any other changes to your code, add the following line above your For-Each loop (that loops through widget-pane widget-pane visible class) and step thru the code using F8.
Debug.Print IE.Document.getElementsByClassName("section-hero-header-subtitle")(0).getElementsByTagName("span")(0).innerText

Related

Extract the details from a Table using VBA gets object variable or with block variable not set

My script runs for few row and then i a getting object variable or with block variable not set error.
I am using the below script to extract the 5,6,7 value from the NSEIndia website.
I get the value of a stock from the same Excel and update the same excel with the values from the nseindia website.
Sub Stock_Basic_Update_NSE()
Dim ie As InternetExplorer
Dim webpage As HTMLDocument
Dim ws As Worksheet
For Item = 23 To 1505
Set ws = ThisWorkbook.Worksheets("NSE Stocks Details")
sSearch = ws.Range("A" & Item).Value
'sSearch = Filestk.Worksheets("Sheet1").Range("E1").Value
Set ie = New InternetExplorer
'ie.Visible = True
ie.navigate ("https://www.nseindia.com/get-quotes/equity?symbol=" & sSearch)
Do While ie.readyState = 4: DoEvents: Loop
Do Until ie.readyState = 4: DoEvents: Loop
While ie.Busy
DoEvents
Wend
Set webpage = ie.document
ws.Cells(Item, 3).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(5).innerText
ws.Cells(Item, 4).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(6).innerText
ws.Cells(Item, 5).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(7).innerText
ie.Quit
Set ie = Nothing
Next Item
End Sub
You had some errors in your code and you hadn't wait for the full document to load. Try the following code. I have commented it. So you can see, what I have changed and why. I have tried it with the top 50 symbols.
Sub Stock_Basic_Update_NSE()
'Declare always all variables
Dim ie As Object 'I switched this from early to late binding (not required)
Dim nodeTable As Object
Dim ws As Worksheet
Dim item As Long
Dim sSearch As String
'Use this outside the loop. You only need it once
Set ws = ThisWorkbook.Worksheets("NSE Stocks Details")
For item = 23 To 1505
sSearch = ws.Range("A" & item).Value
Set ie = CreateObject("internetexplorer.application")
ie.Visible = False
'Encode symbols that are restricted for using in URLs. Like &, : or ?
ie.navigate ("https://www.nseindia.com/get-quotes/equity?symbol=" & WorksheetFunction.EncodeURL(sSearch))
'It's not "While = 4" because 4 stands for "readystate = complete"
'If you want to use "= 4" you must use "Until" instead of "While"
'It doesn't matter what you use
Do While ie.readyState <> 4: DoEvents: Loop
'Manual break to load dynamic content after the IE reports the page load was complete
'This was your main problem
Application.Wait (Now + TimeSerial(0, 0, 2))
'The needed html table has an ID. If possible use always that instead of class names
'because an html ID is unique if the standard is kept
'Also use a variable to save the elements
'So you don't need to shorten the html document string in most cases because
'it's only needed one time
Set nodeTable = ie.document.getElementByID("equityInfo")
ws.Cells(item, 3).Value = nodeTable.getElementsByTagName("td")(5).innerText
ws.Cells(item, 4).Value = nodeTable.getElementsByTagName("td")(6).innerText
ws.Cells(item, 5).Value = nodeTable.getElementsByTagName("td")(7).innerText
'Clean up
ie.Quit
Set ie = Nothing
Next item
End Sub

Can we fetch the specific data via using urls in vba

I have 15 different URLs, and I need to fetch price from the particular website in Excel a particular column, can you please help me out. It's my first VBA program and I try but it show my syntax error.
Sub myfile()
Dim IE As New InternetExplorer Dim url As String Dim item As
HTMLHtmlElement Dim Doc As HTMLDocument Dim tagElements As Object
Dim element As Object Dim lastRow Application.ScreenUpdating =
False Application.DisplayAlerts = False Application.EnableEvents =
False Application.Calculation = xlCalculationManual url =
"https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
IE.navigate url IE.Visible = True Do DoEvents Loop Until
IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
lastRow = Sheet1.UsedRange.Rows.Count + 1 Set tagElements =
Doc.all.tags("tr") For Each element In tagElements
If InStr(element.innerText, "ciq-price")> 0 And
InStr(element.className, "ciq-product-name") > 0 Then
Sheet1.Cells(lastRow, 1).Value = element.innerText
' Exit the for loop once you get the temperature to avoid unnecessary processing
Exit For End If Next
IE.Quit Set IE = Nothing Application.ScreenUpdating = True
Application.DisplayAlerts = True Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
You can't copy any web scraping macro for your purposes. Every page has it's own HTML code structure. So you must write for every page an own web scraping macro.
I can't explain all about web scraping with VBA here. Please start your recherche for information with "excel vba web scraping" and "document object model". Further you need knowlege about HTML and CSS. In best case also about JavaScript:
The error message user-defined type not defined ocours because you use early binding without a reference to the libraries Microsoft HTML Object Library and Microsoft Internet Controls. You can read here how to set a reference via Tools -> References... and about the differences between early and late binding Early Binding v/s Late Binding and here deeper information from Microsoft Using early binding and late binding in Automation
To get the prices from the shown url you can use the following macro. I use late binding:
Option Explicit
Sub myfile()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim element As Object
Dim item As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
Set tagElements = IE.document.getElementsByClassName("ciq-online-offer-item ")
For Each element In tagElements
Set item = element.getElementsByTagName("td")(1)
ActiveSheet.Cells(lastRow, 1).Value = Trim(item.innerText)
lastRow = lastRow + 1
Next
IE.Quit
Set IE = Nothing
End Sub
Edit for a second Example:
The new link leads to an offer. I assume the price of the product is to be fetched. No loop is needed for this. You just have to find out in which HTML segment the price is and then you can decide how to get it. In the end there are only two lines of VBA that write the price into the Excel spreadsheet.
I'm in Germany and Excel has automatically set the currency sign from Dollar to Euro. This is of course wrong. Depending on where you are, this may have to be intercepted.
Sub myfile2()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://www.wayfair.com/kitchen-tabletop/pdx/cuisinart-air-fryer-toaster-oven-cui3490.html"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
'Break for 3 seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
Set tagElements = IE.document.getElementsByClassName("BasePriceBlock BasePriceBlock--highlight")(0)
ActiveSheet.Cells(lastRow, 1).Value = Trim(tagElements.innerText)
IE.Quit
Set IE = Nothing
End Sub

Need Excel VBA to navigate website and download specific files

Trying to understand how to interact with a website in a specific way. This is part of a larger code I'm working on that will loop through a list of ContractorIDs. What I need to do from here is the following:
Navigate to this website: https://ufr.osd.state.ma.us/WebAccess/SearchDetails.asp?ContractorID=042786217&FilingYear=2018&nOrgPage=7&Year=2018
Find the link that says "UFR Filing with Audited Financials" and click on it. (if it's not there, end the sub)
On the ensuing page, find the link that is identified under "Document Category" as "UFR Excel Template" and click on it. (in this case, the link says "15-UFR18.xls", however since there's no consistent link naming scheme, the correct link will always have to be identified by the label under "Document Category" as mentioned. If the link doesn't exist, exit sub.)
On the ensuing page, click the "Download" link at the top and save the file under the following file path (which would be created at this time): C:\Documents\042786217\2018.
Edit: Code below gets me to the point where the download button is clicked, then I get the Open/Save/Cancel dialog box. Nearly there, just need to figure out how to save the file into a specific path.
Option Explicit
Sub UFRScraper()
If MsgBox("UFR Scraper will run now. Do you wish to continue?", vbYesNo) = vbNo Then Exit Sub
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Dim ele As Object
Dim tbl_Providers As ListObject: Set tbl_Providers = ThisWorkbook.Worksheets("tbl_ProviderList").ListObjects("tbl_Providers")
Dim FEIN As String: FEIN = ""
Dim FEINList As Range: Set FEINList = tbl_Providers.ListColumns("FEIN").DataBodyRange
Dim ProviderName As String: ProviderName = ""
Dim ProviderNames As Range: Set ProviderNames = tbl_Providers.ListColumns("Provider Name").DataBodyRange
Dim FiscalYear As String: FiscalYear = ""
Dim urlUFRDetails As String: urlUFRDetails = ""
Dim i As Integer
' Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
' Show (True)/Hide (False) IE
IE.Visible = True
i = 1
For i = 1 To 3 'Limited to 3 during testing. Change when ready.
FEIN = FEINList(i, 1)
ProviderName = ProviderNames(i, 1)
urlUFRDetails = "https://ufr.osd.state.ma.us/WebAccess/SearchDetails.asp?ContractorID=" & FEIN & "&FilingYear=2018&nOrgPage=1&Year=2018"
IE.Navigate urlUFRDetails
' Wait while IE loading...
'IE ReadyState = 4 signifies the webpage has loaded (the first loop is set to avoid inadvertently skipping over the second loop)
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
'Step 2 is done here
Dim filingFound As Boolean: filingFound = False
For Each ele In IE.Document.getElementsByTagName("a")
If ele.innerText = "UFR Filing with Audited Financials" Then
filingFound = True
IE.Navigate ele.href
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
Exit For
End If
Next ele
If filingFound = False Then
GoTo Skip
End If
'Step 3
Dim j As Integer: j = 0
Dim UFRFileFound As Boolean: UFRFileFound = False
For Each ele In IE.Document.getElementsByTagName("li")
j = j + 1
If ele.innerText = "UFR Excel Template" Then
UFRFileFound = True
IE.Navigate "https://ufr.osd.state.ma.us/WebAccess/documentviewact.asp?counter=" & j - 4
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
Exit For
End If
Next ele
If UFRFileFound = False Then
GoTo Skip
End If
'Step 4
IE.Document.getElementById("LinkButton2").Click
'**Built in wait time to avoid accidentally overloading server with repeated quick requests during development and testing**
Skip:
Application.Wait (Now + TimeValue("0:00:03"))
MsgBox "Loop " & i & " complete."
Next i
'Unload IE
IE.Quit
Set IE = Nothing
Set objElement = Nothing
Set objCollection = Nothing
MsgBox "Process complete!"
End Sub
I have tried step 3 with some what lengthy way. but could not provide complete download code as (after one successful manual attempt) at present even manual download attempt causing massage "The File Could Not Be Retrieved" (maybe server side constrain)
Code only take you down to the cell containing href of the xlx file
Dim doc As HTMLDocument
Dim Tbl As HTMLTable, Cel As HTMLTableCell, Rw As HTMLTableRow, Col As HTMLTableCol
Set doc = IE.document
For Each ele In IE.document.getElementsByClassName("boxedContent")
For Each Tbl In ele.getElementsByTagName("table")
For Each Rw In Tbl.Rows
For Each Cel In Rw.Cells
'Debug.Print Cel.innerText
If InStr(1, Cel.innerText, "UFR Excel Template") > 0 Then
Debug.Print Rw.Cells(2).innerText & " - " & Rw.Cells(2).innerHTML
End If
Next
Next Rw
Next Tbl
Next
Once the href is available PtrSafe Function or WinHTTPrequest or other methods could be used to download the file. Welcome and eager to learn some more efficient answers in this case from experts like #QHarr and others.

VBA Web Scraping using getElementsByClassName to names and addresses

I'm trying to extract the clinic name and corresponding address for all the clinics from the following web page: https://medimap.ca/Location/Calgary,%20AB,%20Canada
I'm having issues locating the exact area where I should be drilling down into. All the clinic names have the same class name of "_1FLG5" and the addresses are all "_1-Gov" . However, when I run through the below code nothing happens - no errors just nothing.
I'm also unsure if the reference after .getElementsByClassName is correct, as I want the inner text from the same row as where the "_1FLG5" is I referenced (0) and since I wanted the text from two rows below "_1-Gov" I referenced (2).
Option Explicit
Sub GetClinicData()
Dim objIE As InternetExplorer
Dim clinicEle As Object
Dim clinicAdd As Object
Dim clinicName As String
Dim address As String
Dim y As Integer
Dim x As Integer
Set objIE = New InternetExplorer
objIE.Visible = False
objIE.navigate "https://medimap.ca/Location/Calgary,%20AB,%20Canada"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
y = 1
For Each clinicEle In objIE.document.getElementsByClassName("_1FLG5")
clinicName = clinicEle.getElementsByClassName("_1FLG5")(0).innerText
Sheets("Sheet1").Range("A" & y).Value = clinicName
y = y + 1
Next
x = 1
For Each clinicAdd In objIE.document.getElementsByClassName("_1-Gov")
clinicAdd = clinicAdd.getElementsByClassName("_1-Gov")(2).innerText
Sheets("Sheet1").Range("B" & x).Value = clinicAdd
x = x + 1
Next
End Sub
Content is dynamically loaded so you need a wait condition to ensure content loaded - otherwise your collections end up being of length 0. I use querySelectorAll to apply the class names which return nodeList you For Loop over the .Length of. Ideally you should add a timeout condition to the loop. I show a timed loop here.
Option Explicit
'VBE > Tools > References: Microsoft Internet Controls
Public Sub GetData()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.Navigate2 "https://medimap.ca/Location/Calgary,%20AB,%20Canada"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim clinics As Object, addresses As Object, i As Long
With .document
Do
Set clinics = .querySelectorAll("._1FLG5")
Set addresses = .querySelectorAll("._1-Gov")
Loop While clinics.Length = 0
For i = 0 To clinics.Length - 1
With ThisWorkbook.Worksheets("Sheet1")
.Cells(i + 1, 1) = Trim$(clinics.item(i).innerText)
.Cells(i + 1, 2) = Trim$(addresses.item(i).innerText)
End With
Next
End With
.Quit
End With
End Sub

How do I pull value from external website by Element ID with Excel / VBA?

I'm trying to retrieve values from external websites by element ID using VBA and add them to my excel table. The website URL's are indicated in column A. Column B and C are for my retrieved values.
URL example
Element ID name: "youtube-user-page-country"
Excel Pic
Bellow is my poor attempt:
Sub getCountry()
Dim IE As New InternetExplorer
IE.Visible = False
IE.navigate Worksheets("Sheet1").Range(A3).Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim getCountry As String
getCountry = Trim(Doc.getElementsByTagName("youtube-user-page-country").innerText)
Worksheets("Sheet1").Range(B31).Value = getCountry
End Sub
The code isn't working showing problems with object definition.
Could anyone give me tips on where I'm going wrong?
I've been a macro recorder user and the switch has quite a steep learning curve :-)
Thanks for any help !
I think I get what you are after. There were a few issues:
You want to use getElementByID.
Naming a string getCountry and the SubRoutine getCountry containing it is not a good idea. You can do it, but don't.
Always fully qualify your sheet references so you know what workbook and sheet you are working with
Here's the revised code, I have it working on my end.
Sub getCountry()
Dim IE As Object: Set IE = CreateObject("InternetExplorer.Application")
Dim ws As Worksheet: Set ws = ThisWorkbook.Sheets("Sheet1")
Dim Country As String
With IE
.Visible = False
.navigate ws.Range("A3").Value
Do
DoEvents
Loop Until .readyState = 4
End With
Country = Trim$(IE.document.getElementByID("youtube-user-page-country").innerText)
ws.Range("B31").Value2 = Country
IE.Quit
End Sub
You can use this to dump the data to your spreadsheet.
Sub DumpData()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "http://finance.yahoo.com/q?s=sbux&ql=1"
'Wait for site to fully load
IE.Navigate2 URL
Do While IE.Busy = True
DoEvents
Loop
RowCount = 1
With Sheets("Sheet1")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.ID
.Range("C" & RowCount) = itm.classname
.Range("D" & RowCount) = Left(itm.innertext, 1024)
RowCount = RowCount + 1
Next itm
End With
End Sub
Thanks Joel!!!