Unable to export data from the currently open web page using VBA - html

I want to automate delivery-status checks for my regular couriers from various service providers, such as Blue Dart.
I have the Docket Numbers. I tried to do this with VBA, but it is unable to fetch data from the web page.
My code enters the Docket Number from a cell into the home page, which then redirects to another page where the delivery status is shown in a table.
Sub GetCourseList()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
Dim IEWindows As SHDocVw.ShellWindows
Dim IEwindow As SHDocVw.InternetExplorer
Dim IEDocument As MSHTML.HTMLDocument
Dim BreadcrumbDiv As MSHTML.HTMLElementCollection
Set IEWindows = New SHDocVw.ShellWindows
'create new instance of IE. use reference to return current open IE if
'you want to use open IE window. Easiest way I know of is via title bar.
IE.Navigate "http://www.bluedart.com/maintracking.html"
'go to web page listed inside quotes
IE.Visible = True
While IE.busy
DoEvents 'wait until IE is done loading page.
Wend
IE.Document.All("numbers").Value = ThisWorkbook.Sheets("sheet1").Range("A1")
Application.SendKeys "~"
Dim URL As String
Dim qt As QueryTable
Dim ws As Worksheet
Set ws = Worksheets.Add
For Each IEwindow In IEWindows
If InStr(IEwindow.LocationURL, "your URL or some unique string") <> 0 Then ' Found it
Set IEDocument = IEwindow.Document
URL = IEwindow.LocationURL
Set qt = ws.QueryTables.Add( _
Connection:="URL;" & URL, _
Destination:=Range("F1"))
With qt
.RefreshOnFileOpen = True
.Name = "bluedart"
.FieldNames = True
.WebSelectionType = xlAllTables
.Refresh BackgroundQuery:=False
End With
End If
Next
End Sub

Your code never interacts with the page that is generated after the Docket Number is entered and confirmed. There are two ways to get there.
Emulate the browser interaction, for example in Internet Explorer: click the "Go" element on the page after the Docket Number has been entered, then wait for the result page to load:
While IE.Busy Or IE.Readystate <> 4
DoEvents
Wend
Alternatively, send a POST request with the proper parameters, including the Docket Number.
Even then, a web query against this page will not return the data, because its URL is:
http://www.bluedart.com/servlet/RoutingServlet
Try opening this link. Nothing is displayed, because the content at this URL is generated by a POST request and the parameters needed to generate it are not part of the URL.
Instead of a query, read the data by locating HTML elements, such as tables, in the HTML document. This works for both methods mentioned above.
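The POST route could be sketched roughly as follows. This is a sketch under assumptions, not a confirmed recipe: the field name "numbers" is taken from the question's code, but whether the servlet accepts that single field (without cookies or hidden fields) is not verified, so compare against the real request in the browser's network tools. UrlEncode and BuildFormBody are hypothetical helper names, and the encoder only handles ASCII input.

```vba
Option Explicit

' Percent-encode a string for a form body (assumption: ASCII input only).
Function UrlEncode(ByVal s As String) As String
    Dim i As Long, ch As String, code As Long, result As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        code = AscW(ch)
        Select Case code
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126
                result = result & ch    ' unreserved: 0-9 A-Z a-z - . _ ~
            Case Else
                result = result & "%" & Right$("0" & Hex$(code), 2)
        End Select
    Next i
    UrlEncode = result
End Function

' Build a single name=value pair for an application/x-www-form-urlencoded body.
Function BuildFormBody(ByVal fieldName As String, ByVal fieldValue As String) As String
    BuildFormBody = UrlEncode(fieldName) & "=" & UrlEncode(fieldValue)
End Function

Sub FetchStatusByPost()
    Dim http As Object, doc As Object
    Dim tbl As Object, tr As Object, td As Object
    Dim r As Long, c As Long

    ' Send the docket number as a form POST (field name is an assumption)
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "POST", "http://www.bluedart.com/servlet/RoutingServlet", False
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.send BuildFormBody("numbers", CStr(ThisWorkbook.Sheets("sheet1").Range("A1").Value))

    ' Load the response into an HTML document so its tables can be walked
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = http.responseText

    ' Write every table cell to the sheet, starting at F1.
    ' Nested tables are visited too, so some rows may appear more than once.
    r = 1
    For Each tbl In doc.getElementsByTagName("table")
        For Each tr In tbl.Rows
            c = 6
            For Each td In tr.Cells
                ThisWorkbook.Sheets("sheet1").Cells(r, c).Value = td.innerText
                c = c + 1
            Next td
            r = r + 1
        Next tr
    Next tbl
End Sub
```

If the response contains no tables, the request probably needs more fields than this sketch sends.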

Related

How to access the Web using VBA? Please check my code

To cut down on repetitive work, I tried to access a web site used in my company from VBA.
The code I wrote works for ordinary sites such as Google and YouTube.
However, I don't know why it cannot access the company site.
VBA stops at this line:
Set HTMLDoc = IE_ctrl.document
Thank you in advance.
I also noticed one difference (in the VBA Locals window, values and types) between a normal site and the company site.
Please see the two pictures below.
Sub a()
Dim IE_ctrl As InternetExplorer
Dim HTMLDoc As HTMLDocument
Dim input_Data As IHTMLElement
Dim URL As String
URL = "https://www.google.com"
Set IE_ctrl = New InternetExplorer
IE_ctrl.Silent = True
IE_ctrl.Visible = True
IE_ctrl.navigate URL
Wait_Browser IE_ctrl
Set HTMLDoc = IE_ctrl.document
Wait_Browser IE_ctrl
Set input_Data = HTMLDoc.getElementsByClassName("text").Item
input_Data.Click
End Sub
Sub Wait_Browser(Browser As InternetExplorer, Optional t As Integer = 1)
While Browser.Busy
DoEvents
Wend
Application.Wait DateAdd("s", t, Now)
End Sub
Normal site (works): [image]
Company site (fails): [image]
You can try the following code. Please read the comments. I can't say any more because I don't know the page or its HTML.
Sub a()
'Late binding is enough for what is needed here
Dim ie As Object
Dim nodeInputData As Object
Dim url As String
url = "https://www.google.com"
'To reach a company page, initialize Internet Explorer via its Windows
'class ID (GUID). This helps when security rules block access to pages
'opened by the other ways of initializing IE.
'In most cases this does not work for pages on the public web.
'More information:
'https://blogs.msdn.microsoft.com/ieinternals/2011/08/03/default-integrity-level-and-automation/
Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
ie.Visible = True
ie.navigate url
'Wait for the document to load
Do Until ie.readyState = 4: DoEvents: Loop
'Extra pause, if necessary, for dynamic content that is still loading
'after IE reports that the page is ready
'(The three TimeSerial arguments are hours, minutes, seconds)
Application.Wait (Now + TimeSerial(0, 0, 1))
'I don't know your HTML. If you only want to click a button,
'you don't need a variable:
'ie.document.getElementsByClassName("text")(0).Click
'does the same as
Set nodeInputData = ie.document.getElementsByClassName("text")(0)
nodeInputData.Click
'A short explanation of getElementsByClassName() and getElementsByTagName():
'Both methods return a node collection of all HTML elements that match
'the criterion in the brackets, because any number of HTML elements can
'have the given class name or tag name. If, for example, 3 HTML elements
'with the class name "Text" are found, getElementsByClassName("Text")
'returns a node collection with three elements.
'These have the indices 0 to 2, as in an array. Individual elements are
'addressed via these indices, given in round brackets.
End Sub

VBA code to scrape data using html/javascript won't work

I want VBA code that searches a website based on the input in the first column. The range is A1 to A102. The code works fine except for one thing: it copies the data from the Excel cell and pastes it into the website's search box, but it doesn't click the search button automatically. Any suggestions are welcome.
I know how to scrape data from websites, but this search button has a specific class. Which class should I use to click it? This question relates to both VBA and JavaScript/HTML.
When I click Inspect element, I get the button ID "nav-search-submit-text" and the class "nav-search-submit-text nav-sprite".
Neither works.
Thanks
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
Call getdata
End If
End Sub
Sub getdata()
Dim i As Long
Dim URL As String
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
'Set IE.Visible = True to make IE visible, or False for IE to run in the background
IE.Visible = True
URL = "https://www.amazon.co.uk"
'Navigate to URL
IE.Navigate URL
'making sure the page is done loading
Do
DoEvents
Loop Until IE.ReadyState = 4
'attempting to search date based on date value in cell
IE.Document.getElementById("twotabsearchtextbox").Value = ActiveCell.Value
'Sheets("Sheet1").Range("A1:A102").Text
'Select the date picker box and press Enter to 'activate' the new date
IE.Document.getElementById("twotabsearchtextbox").Select
'clicking the search button
IE.Document.getElementsByClassName("nav-sprite").Click
'Call nextfunction
End Sub
To do web scraping with Excel, you need both VBA and HTML, plus CSS and at least some JS. Above all, you should be familiar with the DOM (Document Object Model). With only VBA or only HTML you will not get far.
It's a mystery to me why you want to do it the complicated way when you can do it simply via the URL. For your solution you have to use the class nav-input. This class occurs twice in the HTML document, and the search button is its second occurrence. Since the indices of a node collection start at 0, you have to click the element with index 1.
Sub getdata()
Dim URL As String
Dim IE As Object
URL = "https://www.amazon.co.uk"
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True ' True to make IE visible, or False for IE to run in the background
IE.Navigate URL 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until IE.ReadyState = 4
'enter the search term from the active cell
IE.Document.getElementById("twotabsearchtextbox").Value = ActiveCell.Value
'clicking the search button
IE.Document.getElementsByClassName("nav-input")(1).Click
End Sub
Edit: Solution for opening an offer with a known ASIN
If you know the ASIN, you can open an offer on the Amazon website directly. The ASIN in the active cell can be passed as a parameter to the Sub getdata() and used in the URL. Note that this does not work reliably: if you press Enter to finish the input, the active cell is then the one below the cell you edited.
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
Call getdata(ActiveCell.Value)
End If
End Sub
In the Sub getdata(), the URL is then built from the ASIN that was passed in:
Sub getdata(searchTerm As String)
Dim URL As String
Dim IE As Object
'Use the right base url
URL = "https://www.amazon.co.uk/dp/" & searchTerm
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True ' True to make IE visible, or False for IE to run in the background
IE.Navigate URL 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until IE.ReadyState = 4
End Sub
It's also possible to do all of that in the Worksheet_Change event of the worksheet (including getting the price and the offer title):
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
With CreateObject("InternetExplorer.Application")
.Visible = True ' True to make IE visible, or False for IE to run in the background
.Navigate "https://www.amazon.co.uk/dp/" & ActiveCell 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until .ReadyState = 4
'Get Price
ActiveCell.Offset(0, 1).Value = .document.getElementByID("priceblock_ourprice").innertext
'Get offer title
ActiveCell.Offset(0, 2).Value = .document.getElementByID("productTitle").innertext
End With
End If
End Sub
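The ActiveCell caveat mentioned in this answer can probably be avoided by reading the edited cells from the event's Target argument instead. The following is a minimal sketch, assuming it sits in the worksheet's own code module and that the getdata(searchTerm As String) procedure from the answer exists in the project. The variable names changed and cell are illustrative.

```vba
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim changed As Range, cell As Range

    ' Only react to edits inside the watched range
    Set changed = Intersect(Target, Me.Range("A1:A102"))
    If changed Is Nothing Then Exit Sub

    ' Target can span several cells (for example after a paste),
    ' so each edited cell is handled on its own
    For Each cell In changed.Cells
        ' Skip error values and cells that were cleared
        If Not IsError(cell.Value) Then
            If Len(cell.Value) > 0 Then getdata CStr(cell.Value)
        End If
    Next cell
End Sub
```

Because Target refers to the cells that were actually changed, it does not matter where the selection moves after Enter is pressed.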

Can MSXML2.XMLHTTP retrieve ALL of the HTML data for a given webpage?

With dynamic web pages that display a table of retrieved data, I’ve found that both MSXML2.XMLHTTP and the Internet Explorer object usually can’t access this data. A good example is https://www.tiff.net/tiff/films.html. Both techniques won’t retrieve any of the movie data – just the surrounding web page. The code I’ve tried is as follows:
Function getHTTP(ByVal sReq As String) As Variant
On Error GoTo onErr
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", sReq, False
.send
getHTTP = StrConv(.responseBody, 64)
End With
Exit Function
onErr: MsgBox "Error " & Err & ": " & Err.Description, 49, "Error opening site"
End Function
Function GetHTML(ByVal strURL As String) As Variant
Dim oIE As InternetExplorer
Dim hElm As IHTMLElement
Set oIE = New InternetExplorer
oIE.Navigate strURL
Do While (oIE.Busy Or oIE.ReadyState <> READYSTATE_COMPLETE)
DoEvents
Loop
Set hElm = oIE.Document.all.tags("html").Item(0)
GetHTML = hElm.outerHTML
Set oIE = Nothing
Set hElm = Nothing
End Function
But there is a way to manually retrieve the movie data – just follow these steps with Microsoft Edge or Internet Explorer:
Right-click on one of the movies
Choose “inspect element.” The DevTools console opens.
At the bottom-left of the screen, click on the “html” tab.
Right-click the tab. Choose “copy.”
Open notepad and paste what you’ve copied.
You now have the movie data and can save it to a file for parsing. My question: Is there any way to get this data programmatically?
Why JSON? Because the page loads its data as JSON.
To see this: open Google Chrome, press F12, load the URL, and go to the Network tab.
Code:
Sub getHTTP()
Dim Url As String, data As String
Dim xml As Object, JSON As Object, colObj, item, c1, c2
Url = "https://www.tiff.net/data/films-events-2018.json?q=1513263947586"
Set xml = CreateObject("MSXML2.ServerXMLHTTP")
With xml
.Open "GET", Url, False
.send
data = .responseText
End With
Set JSON = JsonConverter.ParseJson(data)
Set colObj = JSON("items")
For Each item In colObj
Debug.Print item("title")
Debug.Print item("description")
For Each c1 In item("cast")
Debug.Print c1
Next
For Each c2 In item("countries")
Debug.Print c2
Next
Next
End Sub
Installation of JsonConverter
Download the latest release
Import JsonConverter.bas into your project (Open VBA Editor, Alt + F11; File > Import File)
Add Dictionary reference/class
For Windows-only, include a reference to "Microsoft Scripting Runtime"
For Windows and Mac, include VBA-Dictionary
Here are the film titles using IE (you can use same process to get directors)
Option Explicit
Public Sub GetFilms()
Dim IE As New InternetExplorer, html As HTMLDocument, films As Object, i As Long
With IE
.Visible = True
.navigate "https://www.tiff.net/tiff/films.html"
While .Busy Or .readyState < 4: DoEvents: Wend
Set films = .document.querySelectorAll("[target=_self]")
For i = 0 To films.Length - 1
Debug.Print films.item(i).innerText
Next
.Quit '<== Remember to quit application
End With
End Sub
With the URL provided, XHR only returns the initial HTML, before any script has loaded the film data, but IE works fine because it runs the page's scripts.
If you inspect the HTML you can see each film has the following commonality:
There is an attribute within the a tag called target whose value is _self.
You can use an attribute CSS selector to gather all of these matching elements using the querySelectorAll method of document.
CSS selector: [target=_self]
I would be interested to know whether the film descriptions can be retrieved by parsing the HTML. I had thought the comments were obscuring the film descriptions. A regex that in theory selects the text within them, "<!-- react-text: \d+ -->([^...].+?(?=<))", seems to fail when applied to .innerHTML, as did attempts to swap out the comment start and finish with a regex.
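One thing that might be worth trying for the comment markers mentioned above is to delete every HTML comment first and only then read the remaining text. The sketch below is untested against this page and does not claim to solve the description problem. StripHtmlComments is a hypothetical helper name, and it relies on the VBScript.RegExp object that ships with Windows.

```vba
Option Explicit

' Remove all HTML comments, such as <!-- react-text: 12 --> markers,
' from a string of markup and return what is left.
Function StripHtmlComments(ByVal html As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True                  ' replace every match, not just the first
        .Pattern = "<!--[\s\S]*?-->"    ' lazy match, also across line breaks
        StripHtmlComments = .Replace(html, "")
    End With
End Function
```

It could then be applied to an element's .innerHTML before any further parsing, for example StripHtmlComments(films.item(i).innerHTML).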

VBA - problems with getting html from a website after hitting submit button

I am trying to scrape data from a section of a web page. To get into that section I need to fill in a captcha security code and click a button, but that is fine because the security code is actually written in the HTML of the page. So I create an IE object, navigate to the web page, read the captcha security code, write it into the proper box, click the submit button, and then get the HTML document so I can scrape data out of it.
Although I execute the steps in exactly the order I described, the HTML document I get appears to be the one from the page before the captcha validation, not the one after it.
Does anyone know what I must do to get the correct HTML document, and consequently be able to scrape the data I really want? Thank you.
The code of the subprocedure follows:
'Getting National fuel prices from ANP
Sub subANPNationalFuelPrices()
'Creating variables for the URL and the HTML files
Dim urlANP As String: urlANP = "http://www.anp.gov.br/preco/prc/Resumo_Semanal_Index.asp"
Dim htmlANP1 As HTMLDocument
'Creating the IE object
Dim IE As InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
'Making sure that the webpage is fully load
IE.navigate (urlANP)
Do While IE.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Getting your data"
DoEvents
Loop
Set htmlANP1 = IE.document
'Getting the Captcha Password
Dim strCaptchaPassword As String
Dim colMyCollection As IHTMLElementCollection
Set colMyCollection = htmlANP1.getElementById("divQuadro").all
Dim objLabel As IHTMLElement
For Each objLabel In colMyCollection
strCaptchaPassword = strCaptchaPassword & objLabel.innerText
Next objLabel
'Getting the input box object and getting it the correct password
Dim objInputBox As IHTMLElement
Set objInputBox = htmlANP1.getElementById("txtValor")
objInputBox.Value = strCaptchaPassword
'Getting the submit button object and clicking it
Dim objInputButton As IHTMLElement
Set objInputButton = htmlANP1.getElementById("image1")
objInputButton.Click
'Getting the true rich data HTML
Set htmlANP1 = IE.document
'Extracting the data from the html document
Dim rngValues As range: Set rngValues = Sheet1.range("B17")
Dim strValues(35) As String
Dim dblValues(35) As Double
Dim objElement1 As IHTMLElement
Set objElement1 = htmlANP1.getElementsByTagName("TABLE")(1)
Dim colCollection1 As IHTMLElementCollection
Set colCollection1 = objElement1.all
Dim intTempCount As Integer
Dim objTempElement As IHTMLElement
intTempCount = 32
For Each objTempElement In colCollection1
Sheet1.Cells(intTempCount, 3) = objTempElement.tagName
Sheet1.Cells(intTempCount, 4) = objTempElement.innerText
intTempCount = intTempCount + 1
Next objTempElement
End sub
You are not waiting for the new web page to load after clicking the button on the captcha. Either check the ready state of IE again, or end your code there by starting a timer that restarts your code in X seconds and then checks the ready state of IE and the document.
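Checking the ready state again after the click could look roughly like this. It is a sketch only: WaitForPage is a hypothetical helper name, the 30-second default is arbitrary, and the one-second pause before the loop is an assumption that the navigation has started by then.

```vba
Option Explicit

' Block until IE reports the page as loaded, or raise an error on timeout.
Sub WaitForPage(ByVal ie As Object, Optional ByVal timeoutSeconds As Long = 30)
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)

    ' Right after a click IE may still report the old page as complete,
    ' so give the navigation a moment to start (assumed to be enough)
    Application.Wait Now + TimeSerial(0, 0, 1)

    Do While ie.Busy Or ie.readyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
        If Now > deadline Then
            Err.Raise vbObjectError + 513, "WaitForPage", "Timed out waiting for the page"
        End If
    Loop
End Sub
```

In the question's code it would be called as WaitForPage IE between objInputButton.Click and the second Set htmlANP1 = IE.document.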
I scrape a system that uses iframes, so IE.ReadyState isn't very reliable. Usually I have to wait for another element to exist, but IsObject(element) hasn't been very reliable either. Instead, I use a loop in my main code that calls a function. If I'm waiting for something to load and I know that, after the page loads, there is an element with the ID "UserName", then I do this:
...
Do Until IsErr(doc, "UserName") = False: DoEvents: Loop
...
Function IsErr(doc As HTMLDocument, ID As String) As Boolean
IsErr = True
On Error GoTo ExitFunction:
Debug.Print left(doc.getElementById(ID).innerHTML, 1)
IsErr = False
Exit Function
ExitFunction:
End Function
I could write a single loop that keeps retrying the Debug.Print, but the error handling would be a nightmare. With a separate function for the printing, the function exits after the error, the loop calls it again, and this repeats until the element exists.
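A variation on this idea adds a timeout, so the loop cannot run forever if the element never appears. This is a sketch under assumptions: WaitForElement is a hypothetical name, it takes the IE object rather than a document so that a freshly loaded document is picked up on each pass, and it only looks in the top-level document, not inside iframes.

```vba
Option Explicit

' Poll for an element by ID. Returns the element, or Nothing on timeout.
Function WaitForElement(ByVal ie As Object, ByVal elementId As String, _
                        Optional ByVal timeoutSeconds As Long = 30) As Object
    Dim deadline As Date, found As Object
    deadline = DateAdd("s", timeoutSeconds, Now)

    Do
        DoEvents
        Set found = Nothing
        On Error Resume Next        ' the document can be unavailable mid-navigation
        Set found = ie.document.getElementById(elementId)
        On Error GoTo 0
        If Not found Is Nothing Then Exit Do
    Loop While Now < deadline

    Set WaitForElement = found
End Function
```

A caller can then test the result, for example If WaitForElement(ie, "UserName") Is Nothing Then MsgBox "Page did not load in time".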

Problems Using VBA to Submit a Web Page - Using the click button function but web page won't submit

I am writing VBA code to pull data from a website (https://app.buzzsumo.com/top-content). The code runs without errors, but I still can't get the web page to actually submit the form when the click command runs. I have tried many different approaches and combinations of submitting the form and clicking the submit button, but none has worked so far. Below is my current code.
Sub clickFormButton()
Dim ie As Object
Dim form As Variant
Dim button As Variant
'add the “Microsoft Internet Controls” reference in VBA Project
Set ie = CreateObject("InternetExplorer.Application")
'using input box to enter URL I am serching for
Search_URL = InputBox("Enter URL to Search For")
With ie
.Visible = True
.navigate ("https://app.buzzsumo.com/#/top-content")
'Ensure that the web page downloads completely
While ie.ReadyState <> 4
DoEvents
Wend
'assigning the input variables to the html elements of the form
ie.document.getElementsByName("q").Item.innertext = Search_URL
'finding and clicking the button
Set objInputs = ie.document.getElementsByTagName("input")
For Each ele In objInputs
If ele.Title Like "Press Enter to Search" Then
ele.Click
End If
Next ele
End With
End Sub
I have also tried other methods to find and click the button such as:
'Dim i As Variant
'Set form = ie.document.getElementsByClassName("btn btn-highlight")
'For i = 1 To 5
'If form.Item(i).DefaultValue = "Search!" Then
'Set button = form.Item(i)
'button.Click
'End If
'Next i
Please provide any recommendations on what I may be missing, or on how I can get this code to actually submit the form and advance to the search results. Thanks in advance for any help you can provide!
Here are some additional details: Unfortunately the element I am trying to click (the "Search" button) does not have an ID or Name associated with it. This is why I was trying alternative approaches, such as looping through all of the objects and trying to find the one with the right “Title”. Here is the code for the element from the DOM Explorer:
<input title="Press Enter to search" class="btn btn-highlight" type="submit" ng-disabled="topContentSearchForm.$invalid" value="Search!"/>
The only attributes associated with it are:
class: btn btn-highlight
type: submit
ng-disabled: topContentSearchForm.$invalid
value: Search!
title: Press Enter to Search
Please let me know if there is another way to find the element's ID or name, or another way to click the button without these attributes. Thanks
I know this is an old post but... I have been using this effectively..
'click login
Set htmlDoc = .document
'wait for the document before collecting its inputs
Do While htmlDoc.readyState <> "complete": DoEvents: Loop
Set htmlColl = htmlDoc.getElementsByTagName("input")
For Each htmlInput In htmlColl
If Trim(htmlInput.Type) = "submit" Then
htmlInput.Click
Exit For
End If
Next htmlInput
A couple of ideas. First, about this wait loop:
While ie.ReadyState <> 4
DoEvents
Wend
If the page runs JavaScript, use Application.Wait Now + TimeSerial(0, 0, 4) (that is, wait 4 seconds) instead.
Second, I don't understand why you need to loop through all the objects on the web page. It is easier to open the page in IE, press F12 and select the element in the DOM Explorer. That gives you the ID or Name of the button, which you can then use with ie.document.GetElementByID("buttonID").Click or ie.document.GetElementsByName("buttonName").Item.Click
Let me know if this helps.
Edit: After inspecting this particular web page, it turns out that the button has no ID or Name attribute, so I had to resort to the following:
Dim i As integer
Set form = ie.document.getElementsByClassName("btn btn-highlight")
On Error Resume Next
For i = 1 To 20
If form.Item(i).DefaultValue = "Search!" Then
form.Item(i).Click
End If
Next i
The relevant button turned out to be the fourth item. (I had to step through the loop manually, because the third item navigated away to a pricing page and I had to go back.) The full code follows. Note that you will need to repeat this exercise if the web page changes.
Sub clickFormButton()
Dim ie As Object
Dim form As Variant
Dim button As Variant
'add the “Microsoft Internet Controls” reference in VBA Project
Set ie = CreateObject("InternetExplorer.Application")
'using input box to enter URL I am serching for
Search_URL = InputBox("Enter URL to Search For")
With ie
.Visible = True
.navigate ("https://app.buzzsumo.com/#/top-content")
End With
'wait for page to load
Application.Wait Now + TimeSerial(0, 0, 5)
'assigning the input variables to the html elements of the form
ie.document.getElementsByName("q").Item.InnerText = Search_URL
'finding and clicking the button
ie.document.getElementsByClassName("btn btn-highlight").Item(4).Click
End Sub
It looks like you could simply build the URL string yourself. For example, if you put "abcd" in the search field, the resulting URL is:
https://app.buzzsumo.com/top-content?result_type=total&type=articles&num_days=360&tfc=false&general_article&infographic&video&page=1&guest_post&giveaway&interview&links_sitewide=true&unique_domains=true&backlinks=false&q=abcd&offset=0
Note the q=abcd portion, which is the search query.
So, here is a quick idea that may work, as long as you're not trying to abuse their system by sending thousands of automated requests:
Sub FetchWebsite()
Dim ie As Object
Dim form As Variant
Dim button As Variant
Dim url As String
'add the “Microsoft Internet Controls” reference in VBA Project
Set ie = CreateObject("InternetExplorer.Application")
'using input box to enter URL I am serching for
Search_URL = InputBox("Enter URL to Search For")
'### BUILD THE FULL URL
url = "https://app.buzzsumo.com/top-content?result_type=total&type=articles&num_days=360&tfc=false&general_article&infographic&video&page=1&guest_post&giveaway&interview&links_sitewide=true&unique_domains=true&backlinks=false&q=" & Search_URL & "&offset=0"
With ie
.Visible = True
.navigate url
End With
'wait for page to load
Do
DoEvents
Loop While ie.Busy Or ie.ReadyState <> 4
AppActivate "Internet Explorer"
End Sub
I did some poking around in the Locals window and this should also work, modified from your code. This would be the Form.Submit that I mentioned in comment on OP.
Sub clickFormButton()
Dim ie As InternetExplorer
Dim form As Variant
Dim button As Variant
Dim ele As HTMLFormElement
'add the “Microsoft Internet Controls” reference in VBA Project
Set ie = CreateObject("InternetExplorer.Application")
'using input box to enter URL I am serching for
Search_URL = InputBox("Enter URL to Search For")
With ie
.Visible = True
.navigate ("https://app.buzzsumo.com/#/top-content")
End With
'wait for page to load
Do
DoEvents
Loop While ie.Busy Or ie.ReadyState <> 4
'assigning the input variables to the html elements of the form
ie.document.getElementsByName("q").Item.InnerText = Search_URL
'finding and clicking the button
ie.document.getElementsByClassName("btn btn-highlight").Item(4).form.submit
End Sub
CSS selector:
You can use the CSS selector #search-btn > div, which matches a div that is a direct child of the element with the id search-btn. "#" selects by id (a class would use "." instead).
VBA:
Use .querySelector method to apply CSS selector:
ie.document.querySelector("#search-btn > div").Click
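Given the attributes listed in the question (type="submit" and value="Search!"), an attribute selector might be another way to reach the button without an ID or Name. The sketch below assumes the page still has those attributes and that IE renders it in a document mode that supports querySelector. ClickSearchButton is a hypothetical helper name.

```vba
Option Explicit

' Click the first submit input whose value is "Search!", if there is one.
Sub ClickSearchButton(ByVal ie As Object)
    Dim btn As Object

    ' Attribute selector built from the attributes shown in the question
    Set btn = ie.document.querySelector("input[type='submit'][value='Search!']")

    If btn Is Nothing Then
        Debug.Print "Search button not found"
    Else
        btn.Click
    End If
End Sub
```

Checking for Nothing first avoids a run-time error when the selector matches no element, for example because the page has not finished rendering.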