Need Excel VBA to navigate website and download specific files - html

Trying to understand how to interact with a website in a specific way. This is part of a larger macro I'm working on that will loop through a list of ContractorIDs. What I need to do from here is the following:
1. Navigate to this website: https://ufr.osd.state.ma.us/WebAccess/SearchDetails.asp?ContractorID=042786217&FilingYear=2018&nOrgPage=7&Year=2018
2. Find the link that says "UFR Filing with Audited Financials" and click on it. (If it's not there, end the sub.)
3. On the next page, find the link whose "Document Category" is "UFR Excel Template" and click on it. (In this case the link says "15-UFR18.xls", but since there's no consistent link naming scheme, the correct link always has to be identified by its "Document Category" label. If the link doesn't exist, exit the sub.)
4. On the next page, click the "Download" link at the top and save the file under the following path (which would be created at this time): C:\Documents\042786217\2018
Edit: The code below gets me to the point where the download button is clicked; then I get the Open/Save/Cancel dialog box. Nearly there, I just need to figure out how to save the file to a specific path.
Option Explicit
Sub UFRScraper()
If MsgBox("UFR Scraper will run now. Do you wish to continue?", vbYesNo) = vbNo Then Exit Sub
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Dim ele As Object
Dim tbl_Providers As ListObject: Set tbl_Providers = ThisWorkbook.Worksheets("tbl_ProviderList").ListObjects("tbl_Providers")
Dim FEIN As String: FEIN = ""
Dim FEINList As Range: Set FEINList = tbl_Providers.ListColumns("FEIN").DataBodyRange
Dim ProviderName As String: ProviderName = ""
Dim ProviderNames As Range: Set ProviderNames = tbl_Providers.ListColumns("Provider Name").DataBodyRange
Dim FiscalYear As String: FiscalYear = ""
Dim urlUFRDetails As String: urlUFRDetails = ""
Dim i As Integer
' Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
' Show (True)/Hide (False) IE
IE.Visible = True
i = 1
For i = 1 To 3 'Limited to 3 during testing. Change when ready.
FEIN = FEINList(i, 1)
ProviderName = ProviderNames(i, 1)
urlUFRDetails = "https://ufr.osd.state.ma.us/WebAccess/SearchDetails.asp?ContractorID=" & FEIN & "&FilingYear=2018&nOrgPage=1&Year=2018"
IE.Navigate urlUFRDetails
' Wait while IE loading...
'IE ReadyState = 4 signifies the webpage has loaded (the first loop is set to avoid inadvertently skipping over the second loop)
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
'Step 2 is done here
Dim filingFound As Boolean: filingFound = False
For Each ele In IE.Document.getElementsByTagName("a")
If ele.innerText = "UFR Filing with Audited Financials" Then
filingFound = True
IE.Navigate ele.href
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
Exit For
End If
Next ele
If filingFound = False Then
GoTo Skip
End If
'Step 3
Dim j As Integer: j = 0
Dim UFRFileFound As Boolean: UFRFileFound = False
For Each ele In IE.Document.getElementsByTagName("li")
j = j + 1
If ele.innerText = "UFR Excel Template" Then
UFRFileFound = True
IE.Navigate "https://ufr.osd.state.ma.us/WebAccess/documentviewact.asp?counter=" & j - 4
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
Exit For
End If
Next ele
If UFRFileFound = False Then
GoTo Skip
End If
'Step 4
IE.Document.getElementById("LinkButton2").Click
'**Built in wait time to avoid accidentally overloading server with repeated quick requests during development and testing**
Skip:
Application.Wait (Now + TimeValue("0:00:03"))
MsgBox "Loop " & i & " complete."
Next i
'Unload IE
IE.Quit
Set IE = Nothing
Set objElement = Nothing
Set objCollection = Nothing
MsgBox "Process complete!"
End Sub

I have tried step 3 in a somewhat lengthy way, but I could not provide complete download code because, after one successful manual attempt, even a manual download now fails with the message "The File Could Not Be Retrieved" (possibly a server-side constraint).
The code only takes you down to the cell containing the href of the .xls file:
'Requires a reference to Microsoft HTML Object Library; IE and ele are the objects from the question's code
Dim doc As HTMLDocument
Dim Tbl As HTMLTable, Cel As HTMLTableCell, Rw As HTMLTableRow, Col As HTMLTableCol
Set doc = IE.document
For Each ele In IE.document.getElementsByClassName("boxedContent")
For Each Tbl In ele.getElementsByTagName("table")
For Each Rw In Tbl.Rows
For Each Cel In Rw.Cells
'Debug.Print Cel.innerText
If InStr(1, Cel.innerText, "UFR Excel Template") > 0 Then
'Cells is zero-based, so Cells(2) is the third cell of the row
Debug.Print Rw.Cells(2).innerText & " - " & Rw.Cells(2).innerHTML
End If
Next
Next Rw
Next Tbl
Next
Once the href is available, a PtrSafe-declared API function, WinHttpRequest or another method could be used to download the file. I'd welcome, and am eager to learn from, more efficient answers from experts like @QHarr and others.
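A minimal sketch of the missing "save to a specific path" part, assuming the href found by the loop above can be turned into an absolute URL that the server serves directly. BuildSavePath, EnsureFolder and DownloadFile are hypothetical helper names. WinHttp does not share Internet Explorer's session cookies, so if the site only releases the file inside the browser session (the "The File Could Not Be Retrieved" message hints at server-side checks), this approach may fail.

```vba
Option Explicit

' Hypothetical helper: builds the target folder, e.g. C:\Documents\042786217\2018
Public Function BuildSavePath(ByVal rootFolder As String, ByVal fein As String, _
                              ByVal filingYear As String) As String
    If Right$(rootFolder, 1) = "\" Then rootFolder = Left$(rootFolder, Len(rootFolder) - 1)
    BuildSavePath = rootFolder & "\" & fein & "\" & filingYear
End Function

' Hypothetical helper: creates every missing folder on a drive-letter path
' (MkDir only creates one level at a time)
Public Sub EnsureFolder(ByVal folderPath As String)
    Dim parts() As String, current As String, k As Long
    parts = Split(folderPath, "\")
    current = parts(0)                               ' the drive, e.g. "C:"
    For k = 1 To UBound(parts)
        If Len(parts(k)) > 0 Then
            current = current & "\" & parts(k)
            If Len(Dir(current, vbDirectory)) = 0 Then MkDir current
        End If
    Next k
End Sub

' Hypothetical helper: downloads fileUrl with WinHttp and writes the raw bytes to
' savePath via ADODB.Stream. Returns False unless the server answers with HTTP 200.
Public Function DownloadFile(ByVal fileUrl As String, ByVal savePath As String) As Boolean
    Dim http As Object, stream As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", fileUrl, False
    http.send
    If http.Status <> 200 Then Exit Function
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1                                  ' adTypeBinary
    stream.Open
    stream.Write http.responseBody
    stream.SaveToFile savePath, 2                    ' adSaveCreateOverWrite
    stream.Close
    DownloadFile = True
End Function
```

Inside the loop this would be used roughly as `EnsureFolder BuildSavePath("C:\Documents", FEIN, "2018")` followed by `DownloadFile(fileUrl, BuildSavePath("C:\Documents", FEIN, "2018") & "\" & fileName)`, where fileUrl and fileName are placeholders for values taken from the "UFR Excel Template" row.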

Related

Extract the details from a Table using VBA gets object variable or with block variable not set

My script runs for a few rows and then I get an "Object variable or With block variable not set" error.
I am using the script below to extract the 5th, 6th and 7th values from the NSE India website.
I read each stock symbol from the workbook and update the same workbook with the values from the NSE India website.
Sub Stock_Basic_Update_NSE()
Dim ie As InternetExplorer
Dim webpage As HTMLDocument
Dim ws As Worksheet
For Item = 23 To 1505
Set ws = ThisWorkbook.Worksheets("NSE Stocks Details")
sSearch = ws.Range("A" & Item).Value
'sSearch = Filestk.Worksheets("Sheet1").Range("E1").Value
Set ie = New InternetExplorer
'ie.Visible = True
ie.navigate ("https://www.nseindia.com/get-quotes/equity?symbol=" & sSearch)
Do While ie.readyState = 4: DoEvents: Loop
Do Until ie.readyState = 4: DoEvents: Loop
While ie.Busy
DoEvents
Wend
Set webpage = ie.document
ws.Cells(Item, 3).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(5).innerText
ws.Cells(Item, 4).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(6).innerText
ws.Cells(Item, 5).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(7).innerText
ie.Quit
Set ie = Nothing
Next Item
End Sub
You had some errors in your code and you didn't wait for the full document to load. Try the following code. I have commented it so you can see what I have changed and why. I have tested it with the top 50 symbols.
Sub Stock_Basic_Update_NSE()
'Declare always all variables
Dim ie As Object 'I switched this from early to late binding (not required)
Dim nodeTable As Object
Dim ws As Worksheet
Dim item As Long
Dim sSearch As String
'Use this outside the loop. You only need it once
Set ws = ThisWorkbook.Worksheets("NSE Stocks Details")
For item = 23 To 1505
sSearch = ws.Range("A" & item).Value
Set ie = CreateObject("internetexplorer.application")
ie.Visible = False
'Encode symbols that are restricted for using in URLs. Like &, : or ?
ie.navigate ("https://www.nseindia.com/get-quotes/equity?symbol=" & WorksheetFunction.EncodeURL(sSearch))
'It's not "While = 4" because 4 stands for "readystate = complete"
'If you want to use "= 4" you must use "Until" instead of "While"
'It doesn't matter what you use
Do While ie.readyState <> 4: DoEvents: Loop
'Manual break to load dynamic content after the IE reports the page load was complete
'This was your main problem
Application.Wait (Now + TimeSerial(0, 0, 2))
'The needed html table has an ID. If possible, always use the ID instead of class names,
'because an html ID is unique if the standard is kept
'Also store the element in a variable, so the lookup is done only once
'instead of repeating the long getElementsByClassName(...) expression for every cell
Set nodeTable = ie.document.getElementByID("equityInfo")
'Skip the symbol if the table was not found (unknown symbol or content not loaded yet)
If Not nodeTable Is Nothing Then
ws.Cells(item, 3).Value = nodeTable.getElementsByTagName("td")(5).innerText
ws.Cells(item, 4).Value = nodeTable.getElementsByTagName("td")(6).innerText
ws.Cells(item, 5).Value = nodeTable.getElementsByTagName("td")(7).innerText
End If
'Clean up
ie.Quit
Set ie = Nothing
Next item
End Sub
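The fixed two-second Application.Wait in the answer's code either wastes time or is too short on a slow connection. A sketch of a polling alternative, assuming the table with id "equityInfo" only appears once the dynamic content is loaded (if the table is created before its cells are filled, a short extra wait may still be needed). WaitForElementById and SecondsElapsed are hypothetical helper names.

```vba
Option Explicit

' Seconds between two Timer readings; Timer resets to 0 at midnight, so add a day if it wrapped
Public Function SecondsElapsed(ByVal startTime As Double, ByVal nowTime As Double) As Double
    If nowTime < startTime Then nowTime = nowTime + 86400
    SecondsElapsed = nowTime - startTime
End Function

' Polls the live IE document until the element exists or the timeout runs out.
' Returns Nothing on timeout, so the caller can skip the row instead of hitting error 91.
Public Function WaitForElementById(ByVal ie As Object, ByVal elementId As String, _
                                   Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim startTime As Double, found As Object
    startTime = Timer
    Do
        DoEvents
        On Error Resume Next                 ' the document can be briefly unavailable while loading
        Set found = ie.document.getElementById(elementId)
        On Error GoTo 0
        If Not found Is Nothing Then
            Set WaitForElementById = found
            Exit Function
        End If
    Loop While SecondsElapsed(startTime, Timer) < timeoutSeconds
End Function
```

In the loop, `Set nodeTable = WaitForElementById(ie, "equityInfo", 10)` would replace both the Application.Wait line and the getElementByID line, followed by an `If Not nodeTable Is Nothing Then` check before writing the cells.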

Can we fetch the specific data via using urls in vba

I have 15 different URLs and I need to fetch the price from each website into a particular column in Excel. Can you please help me out? It's my first VBA program; I tried, but it shows a syntax error.
Sub myfile()
Dim IE As New InternetExplorer
Dim url As String
Dim item As HTMLHtmlElement
Dim Doc As HTMLDocument
Dim tagElements As Object
Dim element As Object
Dim lastRow
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
IE.navigate url
IE.Visible = True
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
lastRow = Sheet1.UsedRange.Rows.Count + 1
Set tagElements = Doc.all.tags("tr")
For Each element In tagElements
If InStr(element.innerText, "ciq-price") > 0 And InStr(element.className, "ciq-product-name") > 0 Then
Sheet1.Cells(lastRow, 1).Value = element.innerText
' Exit the for loop once you get the temperature to avoid unnecessary processing
Exit For
End If
Next
IE.Quit
Set IE = Nothing
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
You can't just copy any web scraping macro for your purposes. Every page has its own HTML structure, so you must write a separate web scraping macro for every page.
I can't explain everything about web scraping with VBA here. Please start your research by searching for "excel vba web scraping" and "document object model". You also need some knowledge of HTML and CSS, and ideally of JavaScript too.
The error message "User-defined type not defined" occurs because you use early binding without a reference to the libraries Microsoft HTML Object Library and Microsoft Internet Controls. You can set a reference via Tools -> References... in the VBA editor. For the differences between early and late binding, see "Early Binding v/s Late Binding", and for deeper information, Microsoft's article "Using early binding and late binding in Automation".
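A small sketch of the difference, assuming you want one module that can be switched between the two styles. LATE_BINDING, IE_READYSTATE_COMPLETE and OpenPage are hypothetical names. The point it shows: with late binding no reference is needed, but named library constants such as READYSTATE_COMPLETE (used in the question's code) are no longer defined, so their numeric value has to be written out.

```vba
Option Explicit

' 1 = late binding (no references needed)
' 0 = early binding (needs Tools -> References -> Microsoft Internet Controls)
#Const LATE_BINDING = 1

' Numeric value of READYSTATE_COMPLETE; with late binding the named constant is undefined
Public Const IE_READYSTATE_COMPLETE As Long = 4

Public Function OpenPage(ByVal url As String) As Object
#If LATE_BINDING Then
    Dim ie As Object                     ' type resolved at run time
    Set ie = CreateObject("InternetExplorer.Application")
#Else
    Dim ie As InternetExplorer           ' type checked at compile time, with IntelliSense
    Set ie = New InternetExplorer
#End If
    ie.Visible = True
    ie.navigate url
    ' Wait until the browser is idle and the document is complete
    Do While ie.Busy Or ie.readyState <> IE_READYSTATE_COMPLETE
        DoEvents
    Loop
    Set OpenPage = ie
End Function
```

Only the active #If branch is compiled, so the early-binding lines do not cause "User-defined type not defined" while LATE_BINDING is 1.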
To get the prices from the shown url you can use the following macro. I use late binding:
Option Explicit
Sub myfile()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim element As Object
Dim item As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
Set tagElements = IE.document.getElementsByClassName("ciq-online-offer-item ")
For Each element In tagElements
Set item = element.getElementsByTagName("td")(1)
ActiveSheet.Cells(lastRow, 1).Value = Trim(item.innerText)
lastRow = lastRow + 1
Next
IE.Quit
Set IE = Nothing
End Sub
Edit for a second example:
The new link leads to an offer. I assume the price of the product is to be fetched. No loop is needed for this. You just have to find out in which HTML segment the price is and then you can decide how to get it. In the end there are only two lines of VBA that write the price into the Excel spreadsheet.
I'm in Germany, and Excel automatically changed the currency sign from dollar to euro, which is of course wrong. Depending on where you are, you may have to handle this.
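One way to intercept it, sketched under the assumption that the scraped text looks like "$229.99" (US format). ParseUsdPrice and WritePrice are hypothetical helpers: the text is converted to a plain number first, and the dollar format is then set explicitly, so Excel's locale no longer guesses the currency. Val always uses "." as the decimal separator, independent of the Windows regional settings.

```vba
Option Explicit

' Hypothetical helper: turns scraped text such as "$1,229.99" into a number
Public Function ParseUsdPrice(ByVal priceText As String) As Double
    Dim cleaned As String
    cleaned = Replace$(Trim$(priceText), "$", vbNullString)   ' drop the currency sign
    cleaned = Replace$(cleaned, ",", vbNullString)            ' drop US thousands separators
    ParseUsdPrice = Val(cleaned)                               ' Val is locale-independent
End Function

' Hypothetical helper: writes the number and forces a dollar format on the cell.
' NumberFormat uses US format codes; digit separators are still shown per local settings.
Public Sub WritePrice(ByVal target As Range, ByVal priceText As String)
    target.Value = ParseUsdPrice(priceText)
    target.NumberFormat = "$#,##0.00"
End Sub
```

In myfile2 below, `WritePrice ActiveSheet.Cells(lastRow, 1), tagElements.innerText` would take the place of the line that assigns `Trim(tagElements.innerText)`.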
Sub myfile2()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://www.wayfair.com/kitchen-tabletop/pdx/cuisinart-air-fryer-toaster-oven-cui3490.html"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
'Break for 3 seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
Set tagElements = IE.document.getElementsByClassName("BasePriceBlock BasePriceBlock--highlight")(0)
ActiveSheet.Cells(lastRow, 1).Value = Trim(tagElements.innerText)
IE.Quit
Set IE = Nothing
End Sub

How to scrape address information from Google Maps?

I'm trying to create a macro that pulls a list of addresses from Excel and inputs each one into Google Maps.
It then pulls the address line, city/zip and country from Google Maps back into Excel.
It works up to the point where it scrapes the information from Google Maps.
Sub AddressLookup()
Application.ScreenUpdating = False
For i = 1 To Sheet1.Cells(Rows.Count, 1).End(xlUp).Row
Dim IE As InternetExplorer
Dim itemELE As Object
Dim address As String
Dim city As String
Dim country As String
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "https://www.google.com/maps"
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Search As MSHTML.HTMLDocument
Set Search = IE.document
Search.all.q.Value = Cells(i, 1).Value
Dim ele As MSHTML.IHTMLElement
Dim eles As MSHTML.IHTMLElementCollection
Set eles = Search.getElementsByTagName("button")
For Each ele In eles
If ele.ID = "searchbox-searchbutton" Then
ele.click
Else
End If
Next ele
For Each itemELE In IE.document.getElementsByClassName("widget-pane widget-pane-visible")
address = itemELE.getElementsByClassName("section-hero-header-description")(0).getElementsByTagName("h1")(0).innerText
city = itemELE.getElementsByClassName("section-hero-header-description")(0).getElementsByTagName("h2")(0).innerText
country = itemELE.getElementsByClassName("section-hero-header-description")(0).getElementsByTagName("h2")(1).innerText
Next
Cells(i, 2).Value = Trim(address)
Cells(i, 3).Value = Trim(city)
Cells(i, 4).Value = Trim(country)
MsgBox country
Next
Application.ScreenUpdating = True
End Sub
This answer uses the OpenStreetMap Nominatim API with the VBA-Web WebRequest.
Unlike scraping with Internet Explorer, this API is designed for the purpose (faster, more reliable, more information). This can also be done with the Google Geocoding API, but you need an API key and must keep track of the cost.
If you use https://nominatim.openstreetmap.org/search, respect their Usage Policy; better still, run your own installation.
'Requires the VBA-Web modules (WebClient, WebRequest, WebResponse) and, on Windows, a reference to Microsoft Scripting Runtime for the Dictionary type
Public Function GeocodeRequestNominatim(ByVal sAddress As String) As Dictionary
Dim Client As New WebClient
Client.BaseUrl = "https://nominatim.openstreetmap.org/"
Dim Request As New WebRequest
Dim Response As WebResponse
Dim address As Dictionary
With Request
.Resource = "search/"
.AddQuerystringParam "q", sAddress
.AddQuerystringParam "format", "json"
.AddQuerystringParam "polygon", "1"
.AddQuerystringParam "addressdetails", "1"
End With
Set Response = Client.Execute(Request)
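If Response.StatusCode = WebStatusCode.Ok Then
'Nominatim returns a JSON array, which is empty when nothing matches the address
If Response.Data.Count > 0 Then
Set address = Response.Data(1)("address")
Set GeocodeRequestNominatim = address
End If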
'Dim Part As Variant
'For Each Part In address.Items
' Debug.Print Part
'Next Part
Else
Debug.Print "Error: " & Response.StatusCode & " - " & Response.Content
End If
End Function
Example (prints the country; for other fields, look at the JSON string returned in the example on the Nominatim website):
Debug.Print GeocodeRequestNominatim("united nations headquarters,USA")("country")
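If you'd rather not add VBA-Web, a plain WinHttp request can call the same search endpoint; this is a swapped-in technique, not the answer's method. The sketch returns the raw JSON, which you would still parse with a JSON library such as VBA-JSON. BuildNominatimUrl and NominatimSearchJson are hypothetical names, and the User-Agent string is a placeholder: the Usage Policy asks for a User-Agent that identifies your application and for at most one request per second.

```vba
Option Explicit

' Builds the search URL; EncodeURL (Excel 2013+) percent-encodes the address text
Public Function BuildNominatimUrl(ByVal address As String) As String
    BuildNominatimUrl = "https://nominatim.openstreetmap.org/search?q=" & _
        Application.WorksheetFunction.EncodeURL(address) & "&format=json&addressdetails=1"
End Function

' Returns the raw JSON text, or "" if the request does not return HTTP 200
Public Function NominatimSearchJson(ByVal address As String) As String
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", BuildNominatimUrl(address), False
    ' Placeholder identification; replace with your own application name and contact
    http.setRequestHeader "User-Agent", "MyExcelGeocoder/1.0 (contact@example.com)"
    http.send
    If http.Status = 200 Then NominatimSearchJson = http.responseText
    ' Application.Wait has one-second granularity, so wait 2 s to stay under 1 request/s
    Application.Wait Now + TimeSerial(0, 0, 2)
End Function
```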
The Google Geocoding API is no longer "free", though I believe that with a billing account set up you can use it at no cost as long as you stay within a certain threshold. Since the Maps APIs were updated, I think the expectation is that these APIs are used together with an actual map (but don't quote me on that).
Please note the following:
1) Use a proper wait for page load and after .click
While ie.Busy Or ie.readyState < 4: DoEvents: Wend
2) Use .Navigate2 rather than .Navigate
3) Use ids for selections, as they are faster. They are generally unique, so no looping is required.
4) In this case, additional time is needed to allow the URL to update, the map to zoom, etc. I have added a timed loop for this. I show a single example, as it is clear you know how to loop.
Option Explicit
Public Sub GetInfo()
Dim ie As New InternetExplorer, arr() As String, address As String, city As String, country As String
Dim addressElement As Object, t As Single, result As String
Const MAX_WAIT_SEC As Long = 10 '<==adjust time here
With ie
.Visible = True
.Navigate2 "https://www.google.com/maps"
While .Busy Or .readyState < 4: DoEvents: Wend
With .document
.querySelector("#searchboxinput").Value = "united nations headquarters,USA"
.querySelector("#searchbox-searchbutton").Click
End With
While .Busy Or .readyState < 4: DoEvents: Wend
t = Timer
Do
DoEvents
On Error Resume Next
Set addressElement = .document.querySelector(".section-info-line span.widget-pane-link")
result = addressElement.innerText
'Restore error handling before a possible Exit Do
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While addressElement Is Nothing
arr = Split(result, ",")
'Need at least three comma-separated parts: address, city, country
If UBound(arr) >= 2 Then
address = arr(0)
city = arr(1)
country = arr(2)
With ActiveSheet
.Cells(1, 2).Value = Trim$(address)
.Cells(1, 3).Value = Trim$(city)
.Cells(1, 4).Value = Trim$(country)
End With
End If
Debug.Print .document.URL
.Quit
End With
End Sub
In terms of selectors -
Commercial addresses:
.section-info-line span.widget-pane-link
And feedback from OP re: residential:
.section-hero-header div.section-hero-header-description
After running your code and inspecting Google's address search result, I was able to retrieve the entire address block 'City, Province Postal_Code' by referencing the span tag inside the section-hero-header-subtitle class.
Without making any other changes to your code, add the following line above your For Each loop (the one that loops through the "widget-pane widget-pane-visible" class) and step through the code using F8.
Debug.Print IE.Document.getElementsByClassName("section-hero-header-subtitle")(0).getElementsByTagName("span")(0).innerText

VBA reads HTML from the old page after clicking submit button

I am not a programmer, but I have managed to learn a few things in VBA. Now, on a certain website, I face a problem that does not occur on others.
What should happen: a form on the page is filled in with data, the submit button is clicked, and then I read some data from the result page.
The first phase works fine, but no matter what I do, VBA still seems to read data from the page as it was before submit was clicked.
The code is:
Sub VIES2()
'Start Internet Explorer and pause further action until it reaches the ready state
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate "http://ec.europa.eu/taxation_customs/vies/?locale=pl"
Do While IE.ReadyState <> 4: DoEvents: Loop
'Fill in the form with the appropriate values and click the check button
IE.document.getElementbyId("countryCombobox").Value = "IT"
IE.document.getElementbyId("number").Value = "01802840023"
IE.document.getElementbyId("requesterCountryCombobox").Value = "IT"
IE.document.getElementbyId("requesterNumber").Value = "01802840023"
IE.document.getElementbyId("submit").Click
'Test retrieving the description and the query identifier
For t = 1 To 999999
Next t
Application.Wait Now + TimeValue("00:00:10")
Do While IE.ReadyState <> 4: DoEvents: Loop
For t = 1 To 999999
Next t
Application.Wait Now + TimeValue("00:00:10")
MsgBox IE.LocationURL
Set Text = IE.document.getElementsbyClassName("layout-content")
For Each Element In Text
MsgBox Element.innerText
Next
Set Test = IE.document.getElementsbyTagName("TABLE")
For Each Element In Test
MsgBox Element.innerText
Next
End Sub
I have tried adding breaks, various wait loops and Application.Wait, as suggested in similar questions where that seems to have worked. Here, even long after the page has fully loaded, the code still reads the old page; at least, the URL and some of the data I pull suggest that this is the case.
UPDATE: I should also add that I have tried to make the macro refresh the page, but that clears the input content. What is interesting is that the target URL is:
http://ec.europa.eu/taxation_customs/vies/vatResponse.html
If I change the initial page to this, the browser instantly redirects to the original page with a notification that initial data is needed. The macro then fills in the data and clicks the submit button. In this case, IE.LocationURL reports this URL:
http://ec.europa.eu/taxation_customs/vies/vatResponse.html
but, judging by the content I get with getElementsByClassName, it still reads elements from the initial page:
http://ec.europa.eu/taxation_customs/vies/?locale=pl
This worked to print out the VAT response table
Note:
PtrSafe is only understood by VBA7 (Office 2010 and later, both 32-bit and 64-bit); on older versions, remove the PtrSafe keyword.
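A sketch that avoids editing the Declare by hand: conditional compilation on the built-in VBA7 constant picks the right declaration, so the same module compiles in old and new Office versions. PauseMs is a hypothetical wrapper that sleeps in 100 ms slices and calls DoEvents in between, so Excel and the IE window stay responsive during the wait.

```vba
Option Explicit

' VBA7 is a built-in compiler constant (True in Office 2010 and later)
#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

' Hypothetical wrapper: pauses for roughly the given number of milliseconds
Public Sub PauseMs(ByVal milliseconds As Long)
    Dim remaining As Long
    remaining = milliseconds
    Do While remaining > 0
        Sleep IIf(remaining > 100, 100, remaining)   ' sleep at most 100 ms at a time
        DoEvents                                     ' let Excel process pending events
        remaining = remaining - 100
    Loop
End Sub
```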
Code:
Option Explicit
Declare PtrSafe Sub sleep Lib "kernel32" Alias "Sleep" (ByVal dwmilliseconds As Long)
Public Sub VIES2()
Application.ScreenUpdating = False
Dim IE As Object
'Start Internet Explorer and pause further action until it reaches the ready state
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate "http://ec.europa.eu/taxation_customs/vies/?locale=pl"
Do While IE.ReadyState <> 4: DoEvents: Loop
'Fill in the form with the appropriate values and click the check button
IE.document.getElementById("countryCombobox").Value = "IT"
IE.document.getElementById("number").Value = "01802840023"
IE.document.getElementById("requesterCountryCombobox").Value = "IT"
IE.document.getElementById("requesterNumber").Value = "01802840023"
IE.document.getElementById("submit").Click
sleep (5000) 'or increase to 10000
Dim tbl As Object
Set tbl = IE.document.getElementById("vatResponseFormTable")
Dim ws As Worksheet
Set ws = ActiveWorkbook.Worksheets.Add
ws.Name = "Results"
Dim rng As Range, currentRow As Object, currentColumn As Object, i As Long, outputRow As Long
outputRow = outputRow + 1
Set rng = ws.Range("B" & outputRow)
For Each currentRow In tbl.Rows
For Each currentColumn In currentRow.Cells
rng.Value = currentColumn.outerText
Set rng = rng.Offset(, 1)
i = i + 1
Next currentColumn
outputRow = outputRow + 1
Set rng = rng.Offset(1, -i)
i = 0
Next currentRow
Application.ScreenUpdating = True
End Sub
Although QHarr's solution works on my end, I'm providing another one with no hard-coded delay in the script.
Using IE, as in your question:
Sub Get_Data()
Dim HTML As HTMLDocument, post As Object, elems As Object
Dim elem As Object, r&, c&
With New InternetExplorer
.Visible = False
.navigate "http://ec.europa.eu/taxation_customs/vies/?locale=pl"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
With HTML
.getElementById("countryCombobox").Value = "IT"
.getElementById("number").Value = "01802840023"
.getElementById("requesterCountryCombobox").Value = "IT"
.getElementById("requesterNumber").Value = "01802840023"
.getElementById("submit").Click
Do: Set post = .getElementById("vatResponseFormTable"): DoEvents: Loop While post Is Nothing
For Each elems In post.Rows
For Each elem In elems.Cells
c = c + 1: Cells(r + 1, c) = elem.innerText
Next elem
c = 0: r = r + 1
Next elems
End With
.Quit
End With
End Sub
Reference to add to the library:
1. Microsoft Internet Controls
2. Microsoft HTML Object Library
Using an XMLHTTP request (much faster than IE):
Sub Get_Data()
Dim elems, elem As Object
Dim QueryString$, S$, r&, c&
QueryString = "memberStateCode=IT&number=01802840023&traderName=&traderStreet=&traderPostalCode=&traderCity=&requesterMemberStateCode=IT&requesterNumber=01802840023&action=check&check=Weryfikuj"
With New XMLHTTP60 'class name in the Microsoft XML, v6.0 library
.Open "POST", "http://ec.europa.eu/taxation_customs/vies/vatResponse.html", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
.send QueryString
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
For Each elems In .getElementById("vatResponseFormTable").Rows
For Each elem In elems.Cells
c = c + 1: Cells(r + 1, c) = elem.innerText
Next elem
c = 0: r = r + 1
Next elems
End With
End Sub
Reference to add to the library:
1. Microsoft XML, V6
2. Microsoft HTML Object Library
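The QueryString above is hard-coded for one VAT number. A sketch for building it from your own values, so the request can run in a loop over many numbers; FormEncode is a hypothetical helper that percent-encodes each value with Excel's EncodeURL (Excel 2013+) and assumes the field names themselves need no encoding.

```vba
Option Explicit

' Builds an application/x-www-form-urlencoded body from parallel key/value arrays
Public Function FormEncode(ByVal keys As Variant, ByVal values As Variant) As String
    Dim k As Long, parts() As String
    ReDim parts(LBound(keys) To UBound(keys))
    For k = LBound(keys) To UBound(keys)
        If Len(values(k)) = 0 Then
            parts(k) = keys(k) & "="                 ' empty fields are sent as "name="
        Else
            parts(k) = keys(k) & "=" & Application.WorksheetFunction.EncodeURL(CStr(values(k)))
        End If
    Next k
    FormEncode = Join(parts, "&")
End Function
```

For example, `FormEncode(Array("memberStateCode", "number"), Array(countryCode, vatNumber))` would produce the first two fields of the query string; the remaining fields from the original string are added to both arrays in the same order.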
Most of the time, you should first check whether a REST or SOAP service is available for this kind of task.
Using an Internet Explorer instance for this is total overkill.
Try this simple function, that uses the SOAP service to validate VAT numbers:
Function IsVatValid(country_code, vat_number)
Dim objHTTP As Object
Dim xmlDoc As Object
Dim sURL As String, sEnv As String
Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
sURL = "http://ec.europa.eu/taxation_customs/vies/services/checkVatService"
sEnv = "<s11:Envelope xmlns:s11='http://schemas.xmlsoap.org/soap/envelope/'>" & _
"<s11:Body>" & _
"<tns1:checkVat xmlns:tns1='urn:ec.europa.eu:taxud:vies:services:checkVat:types'>" & _
"<tns1:countryCode>" & country_code & "</tns1:countryCode>" & _
"<tns1:vatNumber>" & vat_number & "</tns1:vatNumber>" & _
"</tns1:checkVat>" & _
"</s11:Body>" & _
"</s11:Envelope>"
objHTTP.Open "Post", sURL, False
objHTTP.setRequestHeader "Content-Type", "text/xml"
objHTTP.setRequestHeader "SOAPAction", "checkVatService"
objHTTP.send (sEnv)
objHTTP.waitForResponse
Set xmlDoc = CreateObject("HTMLFile")
xmlDoc.body.innerHTML = objHTTP.responsetext
IsVatValid = CBool(xmlDoc.getElementsByTagName("valid")(0).innerHTML)
Set xmlDoc = Nothing
Set objHTTP = Nothing
End Function
And then you can simply validate all your vat numbers:
Debug.Print IsVatValid("IT", "01802840023")
>>> True
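The function above reads the reply through an HTMLFile document. An alternative sketch uses the MSXML 6 XML parser instead, with an XPath that matches the element by its local name, so whatever namespace prefix the service puts on "valid" does not matter. ParseVatValidFlag is a hypothetical name, and the sample reply in the check below is hand-written, not captured from the service.

```vba
Option Explicit

' Reads the <valid> flag from a checkVat SOAP reply; returns False if it is missing
Public Function ParseVatValidFlag(ByVal soapXml As String) As Boolean
    Dim doc As Object, node As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If Not doc.LoadXML(soapXml) Then Exit Function          ' malformed XML -> False
    ' local-name() ignores prefixes such as ns2: or tns1:
    Set node = doc.SelectSingleNode("//*[local-name()='valid']")
    If Not node Is Nothing Then ParseVatValidFlag = (LCase$(Trim$(node.Text)) = "true")
End Function
```

Inside IsVatValid, `IsVatValid = ParseVatValidFlag(objHTTP.responsetext)` would then replace the three HTMLFile lines.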

Web Query where there are multiple Frames

My goal is to scrape the source code of a web page.
The site seems to have different Frames which is why my code won't work properly.
I tried to modify some code I found online that should solve the frame issue.
The following code raises an error (Object required) at:
Set profileFrame = .document.getElementById("profileFrame")
Public Sub IE_Automation()
'Needs references to Microsoft Internet Controls and Microsoft HTML Object Library
Dim baseURL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim profileFrame As HTMLIFrame
Dim slotsDiv As HTMLDivElement
'example URL with multiple frames
baseURL = "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
Set IE = New InternetExplorer
With IE
.Visible = True
'Navigate to the main page
.navigate baseURL & "/publictrophy/index.htm?onlinename=ace_anubis"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
'Get the profileFrame iframe and navigate to it
Set profileFrame = .document.getElementById("profileFrame")
.navigate baseURL & profileFrame.src
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = .document
End With
'Display all the text in the profileFrame iframe
MsgBox HTMLdoc.body.innerText
'Display just the text in the slots_container div
Set slotsDiv = HTMLdoc.getElementById("slots_container")
MsgBox slotsDiv.innerText
End Sub
Hmm, I'm not exactly sure what you are doing here, but can you try the code below?
Option Explicit
Sub Sample()
Dim ie As Object
Dim links As Variant, lnk As Variant
Dim rowcount As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
'Wait for site to fully load
'ie.Navigate2 URL
Do While ie.Busy Or ie.readyState <> 4 'wait until idle and READYSTATE_COMPLETE
DoEvents
Loop
Set links = ie.document.getElementsByTagName("a")
rowcount = 1
With Sheets("Sheet1")
For Each lnk In links
'Debug.Print lnk.innerText
'If lnk.classname Like "*Real Statistics Examples Part 1*" Then
.Range("A" & rowcount) = lnk.innerText
rowcount = rowcount + 1
'Exit For
'End If
Next
End With
End Sub
General:
I think in your research you may have come across this question and misunderstood how it relates/doesn't relate to your circumstance.
I don't think iFrames are relevant to your query. If you are after the list of names, their details and the URLs to their pages, you can use the code below.
CSS Selectors
To target the elements of interest, I use the following two CSS selectors. These use style information on the page to target the elements:
.SearchResults-link
.SearchResults-item
"." means class, which is like saying .getElementsByClassName. The first gets the links, and the second gets the description information on the first page.
With respect to the first CSS selector: The actual link required is dynamically constructed, but we can use the fact that the actual profile URLs have a common base string of "https://www.xing.com/profile/", which is then followed by the profileName. So, in function GetURL, we parse the outerHTML returned by the CSS selector to get the profileName and concatenate it with the BASESTRING constant to get our actual profile link.
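A worked example of the parsing step just described, on a hypothetical outerHTML string (the real markup on xing.com may differ). ProfileUrlFromOuterHtml is a hypothetical name that performs the same steps as GetURL, one per line, so the intermediate values are easy to follow. It assumes the link text itself contains no "/".

```vba
Option Explicit

' For <a class="SearchResults-link" href="/profile/Jane_Doe42">Jane Doe</a>:
Public Function ProfileUrlFromOuterHtml(ByVal htmlSection As String) As String
    Const BASESTRING As String = "https://www.xing.com/profile/"
    Dim arr() As String, piece As String
    arr = Split(htmlSection, "/")                     ' last piece is "a>" from the closing tag
    piece = arr(UBound(arr) - 1)                      ' Jane_Doe42">Jane Doe<
    piece = Split(piece, ">")(0)                      ' Jane_Doe42"
    piece = Replace$(piece, Chr$(34), vbNullString)   ' Jane_Doe42
    ProfileUrlFromOuterHtml = BASESTRING & piece
End Function
```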
Code:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.xing.com/publicsearch/query?search%5Bq%5D=IT"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim a As Object, exitTime As Date, linksNodeList As Object, profileNodeList As Object
' exitTime = Now + TimeSerial(0, 0, 5) '<== uncomment this section if timing problems
'
' Do
' DoEvents
' On Error Resume Next
' Set linksNodeList = .document.querySelectorAll(".SearchResults-link")
' On Error GoTo 0
' If Now > exitTime Then Exit Do
' Loop While linksNodeList Is Nothing
Set linksNodeList = .document.querySelectorAll(".SearchResults-link") '<== comment this out if uncommented section above
Set profileNodeList = .document.querySelectorAll(".SearchResults-item")
Dim i As Long
For i = 0 To profileNodeList.Length - 1
Debug.Print "Profile link: " & GetURL(linksNodeList.item(i).outerHTML)
Debug.Print "Basic info: " & profileNodeList.item(i).innerText
Next i
End With
End Sub
Public Function GetURL(ByVal htmlSection As String) As String
Const BASESTRING As String = "https://www.xing.com/profile/"
Dim arr() As String
arr = Split(htmlSection, "/")
GetURL = BASESTRING & Replace$(Split((arr(UBound(arr) - 1)), ">")(0), Chr$(34), vbNullString)
End Function