VBA code to scrape data using html/javascript won't work - html

I want VBA code that searches a website based on the values entered in the first column (range A1 to A102). The code works except for one thing: it copies the value from the Excel cell and pastes it into the website's search box, but it does not click the search button automatically. I welcome any suggestions.
I know how to scrape data from websites, but this search button needs a specific class. Which class should I use to make the click? This question is relevant to both VBA and JavaScript/HTML experts.
When I click Inspect element, I get the button ID "nav-search-submit-text" and the class "nav-search-submit-text nav-sprite".
Neither works.
Thanks
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
Call getdata
End If
End Sub
Sub getdata()
Dim i As Long
Dim URL As String
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
'Set IE.Visible = True to make IE visible, or False for IE to run in the background
IE.Visible = True
URL = "https://www.amazon.co.uk"
'Navigate to URL
IE.Navigate URL
'making sure the page is done loading
Do
DoEvents
Loop Until IE.ReadyState = 4
'attempting to search date based on date value in cell
IE.Document.getElementById("twotabsearchtextbox").Value = ActiveCell.Value
'Sheets("Sheet1").Range("A1:A102").Text
'Select the date picker box and press Enter to 'activate' the new date
IE.Document.getElementById("twotabsearchtextbox").Select
'clicking the search button
IE.Document.getElementsByClassName("nav-sprite").Click
'Call nextfunction
End Sub

To use web scraping with Excel, you need both VBA and HTML, plus CSS and at least some JavaScript. Above all, you should be familiar with the DOM (Document Object Model). With only VBA or only HTML you will not get far.
It's a mystery to me why you want to do it in a complicated way when you can do it simply via the URL. For your solution you have to use the class nav-input. This class occurs twice in the HTML document, and the search button is the second occurrence. Since the indices of a node collection start at 0, you have to click the element with index 1.
Sub getdata()
Dim URL As String
Dim IE As Object
URL = "https://www.amazon.co.uk"
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True ' True to make IE visible, or False for IE to run in the background
IE.Navigate URL 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until IE.ReadyState = 4
'write the search term from the active cell into the search box
IE.Document.getElementById("twotabsearchtextbox").Value = ActiveCell.Value
'clicking the search button
IE.Document.getElementsByClassName("nav-input")(1).Click
End Sub
Edit: Solution to open offer with known ASIN
You can open an offer on the Amazon website directly if you know its ASIN. The ASIN in the active cell can be passed as a parameter to the Sub getdata(). Note that this does not work reliably: if you press Enter to finish the input, the active cell is the one below the cell you just edited.
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
Call getdata(ActiveCell.Value)
End If
End Sub
The Sub getdata() then opens the URL built from the passed ASIN:
Sub getdata(searchTerm As String)
Dim URL As String
Dim IE As Object
'Use the right base url
URL = "https://www.amazon.co.uk/dp/" & searchTerm
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True ' True to make IE visible, or False for IE to run in the background
IE.Navigate URL 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until IE.ReadyState = 4
End Sub
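The ActiveCell caveat noted above can be sidestepped by reading the edited cell from the event's own argument instead of from ActiveCell. The following is only a sketch: it assumes getdata(searchTerm As String) is the Sub shown above and that the handler lives in the worksheet's code module (so Me refers to that sheet). The names changed and cell are illustrative.

```vba
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim changed As Range
    Dim cell As Range

    'Only react to edits inside A1:A102 of this sheet
    Set changed = Intersect(Target, Me.Range("A1:A102"))
    If changed Is Nothing Then Exit Sub

    'Target can span several cells (for example after a paste)
    For Each cell In changed.Cells
        If Not IsError(cell.Value) Then
            If Len(CStr(cell.Value)) > 0 Then
                'Pass the edited cell's value, not ActiveCell.Value
                getdata CStr(cell.Value)
            End If
        End If
    Next cell
End Sub
```

Because the value comes from Target, it does not matter where the selection moves after Enter is pressed.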
It's also possible to do it all in the worksheet_change event of the worksheet (including reading the price and the offer title):
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
With CreateObject("InternetExplorer.Application")
.Visible = True ' True to make IE visible, or False for IE to run in the background
.Navigate "https://www.amazon.co.uk/dp/" & ActiveCell 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until .ReadyState = 4
'Get Price
ActiveCell.Offset(0, 1).Value = .document.getElementByID("priceblock_ourprice").innertext
'Get offer title
ActiveCell.Offset(0, 2).Value = .document.getElementByID("productTitle").innertext
End With
End If
End Sub

Related

How to access the Web using VBA? Please check my code

To reduce repetitive work, I tried to use VBA to access a website we use at my company.
I wrote the code below and confirmed that it can access normal sites such as Google and YouTube.
But I don't know why it cannot access the company site.
VBA stops at this line:
Set HTMLDoc = IE_ctrl.document
Thank you in advance.
I also found one difference (VBA Locals values and types) between a normal site and the company site.
Please see the two pictures below.
Sub a()
Dim IE_ctrl As InternetExplorer
Dim HTMLDoc As HTMLDocument
Dim input_Data As IHTMLElement
Dim URL As String
URL = "https://www.google.com"
Set IE_ctrl = New InternetExplorer
IE_ctrl.Silent = True
IE_ctrl.Visible = True
IE_ctrl.navigate URL
Wait_Browser IE_ctrl
Set HTMLDoc = IE_ctrl.document
Wait_Browser IE_ctrl
Set input_Data = HTMLDoc.getElementsByClassName("text").Item
input_Data.Click
End Sub
Sub Wait_Browser(Browser As InternetExplorer, Optional t As Integer = 1)
While Browser.Busy
DoEvents
Wend
Application.Wait DateAdd("s", t, Now)
End Sub
Normal site (works): [screenshot not available]
Company site (error): [screenshot not available]
You can try the following code. Please read the comments. I can't say any more because I don't know the page or its HTML.
Sub a()
'Use late binding for what you need
Dim ie As Object
Dim nodeInputData As Object
Dim url As String
url = "https://www.google.com"
'Use the Windows GUID to initialize Internet Explorer if you
'want to access a company page. This helps if there are
'security rules that block access when IE is initialized in other ways.
'It doesn't work in most cases for pages on the "real" web.
'Read here for more information:
'https://blogs.msdn.microsoft.com/ieinternals/2011/08/03/default-integrity-level-and-automation/
Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
ie.Visible = True
ie.navigate url
'Waiting for the document to load
Do Until ie.readyState = 4: DoEvents: Loop
'If necessary, wait a little longer for dynamic content that is loaded
'after IE reports that loading is complete
'(The three TimeSerial values are: hours, minutes, seconds)
Application.Wait (Now + TimeSerial(0, 0, 1))
'I don't know your HTML. If you only want to click a button,
'you don't need a variable:
'ie.document.getElementsByClassName("text")(0).Click
'does the same as
Set nodeInputData = ie.document.getElementsByClassName("text")(0)
nodeInputData.Click
'A short explanation of getElementsByClassName() and getElementsByTagName():
'Both methods create a node collection of all HTML elements that match
'the criterion in the brackets, because there can be any number of
'HTML elements with the specified class name or tag name. If, for example,
'3 HTML elements with the class name "Text" are found,
'getElementsByClassName("Text") creates a node collection with three elements.
'These have the indices 0 to 2, as in an array. The individual elements are
'addressed via these indices, which are given in round brackets.
End Sub
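Since a node collection can also be empty, indexing it with (0) raises a runtime error when nothing matches. The sketch below checks the collection's Length first. ClickFirstByClass is a hypothetical helper name, and doc is assumed to be the document object of a page that has already finished loading.

```vba
'Hypothetical helper: clicks the first element with the given class, if any.
'Returns True when an element was found and clicked.
Function ClickFirstByClass(doc As Object, className As String) As Boolean
    Dim nodes As Object

    Set nodes = doc.getElementsByClassName(className)

    'An empty collection has Length 0; nodes(0) would fail in that case
    If nodes.Length = 0 Then
        Debug.Print "No element with class '" & className & "' found."
        Exit Function
    End If

    nodes(0).Click
    ClickFirstByClass = True
End Function
```

It could be called as ClickFirstByClass(ie.document, "text") once the page has loaded.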

unable to EXPORT data from current open web page using VBA

I want to automate checking the delivery status of my regular courier shipments with service providers such as Blue Dart.
I have the docket numbers and tried to do this with VBA, but it is unable to fetch data from the web page.
My code enters the docket number from a cell on the home page, which then redirects to another page where the delivery status is shown in a table.
Sub GetCourseList()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
Dim IEWindows As SHDocVw.ShellWindows
Dim IEwindow As SHDocVw.InternetExplorer
Dim IEDocument As MSHTML.HTMLDocument
Dim BreadcrumbDiv As MSHTML.HTMLElementCollection
Set IEWindows = New SHDocVw.ShellWindows
'create new instance of IE. use reference to return current open IE if
'you want to use open IE window. Easiest way I know of is via title bar.
IE.Navigate "http://www.bluedart.com/maintracking.html"
'go to web page listed inside quotes
IE.Visible = True
While IE.busy
DoEvents 'wait until IE is done loading page.
Wend
IE.Document.All("numbers").Value = ThisWorkbook.Sheets("sheet1").Range("A1")
Application.SendKeys "~"
Dim URL As String
Dim qt As QueryTable
Dim ws As Worksheet
Set ws = Worksheets.Add
For Each IEwindow In IEWindows
If InStr(IEwindow.LocationURL, "your URL or some unique string") <> 0 Then ' Found it
Set IEDocument = IEwindow.Document
URL = IEwindow.LocationURL
Set qt = ws.QueryTables.Add( _
Connection:="URL;" & URL, _
Destination:=Range("F1"))
With qt
.RefreshOnFileOpen = True
.Name = "bluedart"
.FieldNames = True
.WebSelectionType = xlAllTables
.Refresh BackgroundQuery:=False
End With
End If
Next
End Sub
Your code does not interact in any way with the page generated after the Docket Number is entered and confirmed. That could be done by:
Emulating browser interaction (Internet Explorer will do): click the "Go" element on the page after the Docket Number has been entered, then wait with:
While IE.Busy Or IE.Readystate <> 4
DoEvents
Wend
It can also be achieved by creating a POST request with the proper parameters, including the Docket Number.
Even once this is achieved, it still won't be possible to get the data with a web query from this page, because its URL is:
http://www.bluedart.com/servlet/RoutingServlet
Try to open this link. Nothing will display, because the content at this URL is generated via the POST method and the parameters needed to generate it are not included in the URL.
Instead of a query, with either method the data can be read by finding HTML elements, such as tables, in the HTML document.
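Reading a table straight from the HTML document might look roughly like the sketch below. It assumes the results page has finished loading in IE and that the status table is the first table element on the page. That position is a guess and has to be checked against the real page. DumpFirstTable is a hypothetical helper name.

```vba
'Hypothetical helper: copies the first HTML table of a loaded document
'into the given worksheet, starting at cell A1.
Sub DumpFirstTable(doc As Object, ws As Worksheet)
    Dim tables As Object
    Dim tbl As Object
    Dim r As Long, c As Long

    Set tables = doc.getElementsByTagName("table")
    If tables.Length = 0 Then
        Debug.Print "No table found in the document."
        Exit Sub
    End If

    'Assumption: the delivery status is in the first table
    Set tbl = tables(0)

    'HTML collections are zero-based, worksheet cells are one-based
    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows(r).Cells.Length - 1
            ws.Cells(r + 1, c + 1).Value = tbl.Rows(r).Cells(c).innerText
        Next c
    Next r
End Sub
```

After the wait loop, a call such as DumpFirstTable IE.Document, ws would fill the sheet without needing a QueryTable.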

Cycling Through List of URLs Using Excel VBA

I am much more familiar with Excel now, but one thing is still baffling me - how to cycle through URLs in a loop. My current conundrum is that I have this list of URLs of packages, and need to obtain the status of each package on each page using its HTML. What I currently have to cycle through the list is:
Sub TrackingDeliveryStatusResults()
Dim IE As Object
Dim URL As Range
Dim wb1 As Workbook, ws1 As Worksheet
Dim filterRange As Range
Dim copyRange As Range
Dim lastRow As Long
Set wb1 = Application.Workbooks.Open("\\S51\******\Folders\******\TrackingDeliveryStatus.xls")
Set ws1 = wb1.Worksheets("TrackingDeliveryStatusResults")
Set IE = New InternetExplorer
With IE
.Visible = True
For Each URL In Range("C2:C & lastRow")
.Navigate URL.Value
While .Busy Or .ReadyState <> 4: DoEvents: Wend
MsgBox .Document.body.innerText
Next
End With
End Sub
And the list of URLs
My goal here is:
Cycle through each URL (inserts URL in IE and keeps going without opening new tabs)
Obtain the status of the item for each URL from the HTML element
FedEx: Delivered (td class="status")
UPS: Delivered (id="tt_spStatus")
USPS: Arrived at USPS Facility (class="info-text first")
Finish the loop and save as a csv if at all possible (I've already done that, so I'm just posting the code portion I'm having a problem with).
My understanding is that I have to code a different if statement for each different url, since all of them have different HTML tags for their delivery status. Loops are simple, but to loop through webpages is new to me. The code has been throwing me errors no matter what changes I make.
The IE object opens up but then Excel hits an error and the code stops running.
OK, I'll start with the proper syntax to get your code going, and I will edit this answer with further code.
Sub Sample()
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.EnableEvents = True
Dim wsSheet As Worksheet, Rows As Long, links As Variant, IE As Object, link As Variant
Set wb = ThisWorkbook
Set wsSheet = wb.Sheets("Sheet1")
Set IE = New InternetExplorer
Rows = wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row
links = wsSheet.Range("A1:A" & Rows)
With IE
.Visible = True
For Each link In links
.navigate (link)
While .Busy Or .ReadyState <> 4: DoEvents: Wend
MsgBox .Document.body.innerText
Next link
End With
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub
This will get you looping. I think you had some general syntax issues, which you can see by comparing with my code. To loop with For Each, link has to be of type Object or Variant, and I set links to Variant, assuming each value will default to a string.
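The per-carrier branching described in the question could be kept out of the loop with a small lookup function. This is only a sketch: the host names fedex.com, ups.com and usps.com are assumptions (the URL list is not shown), the selectors are derived from the class and id values listed in the question, and StatusSelector and ShowStatus are hypothetical names. querySelector needs the page to render in IE8 document mode or later.

```vba
'Hypothetical helper: maps a tracking URL to the CSS selector
'of the element that holds the delivery status.
Function StatusSelector(url As String) As String
    Dim u As String
    u = LCase$(url)

    If InStr(u, "fedex.com") > 0 Then
        StatusSelector = "td.status"          'td class="status"
    ElseIf InStr(u, "usps.com") > 0 Then
        StatusSelector = ".info-text.first"   'class="info-text first"
    ElseIf InStr(u, "ups.com") > 0 Then
        StatusSelector = "#tt_spStatus"       'id="tt_spStatus"
    End If
    'Returns an empty string for unknown carriers
End Function

'Hypothetical usage: call after the page has finished loading.
Sub ShowStatus(ie As Object, url As String)
    Dim selector As String
    Dim node As Object

    selector = StatusSelector(url)
    If Len(selector) = 0 Then
        Debug.Print "Unknown carrier: " & url
        Exit Sub
    End If

    Set node = ie.Document.querySelector(selector)
    If node Is Nothing Then
        Debug.Print "Status element not found: " & url
    Else
        Debug.Print node.innerText
    End If
End Sub
```

Inside the For Each loop, ShowStatus IE, CStr(link) would replace the MsgBox line.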

Problems Using VBA to Submit a Web Page - Using the click button function but web page won't submit

I am writing a VBA code to pull data from a website (https://app.buzzsumo.com/top-content). I have a functional code that runs without errors however I still can't get the webpage to actually submit the form when the click command runs. I have tried many different approaches and combinations of submitting the form/clicking the submit button but none have seemed to work so far. Below is my current code.
Sub clickFormButton()
Dim ie As Object
Dim form As Variant
Dim button As Variant
'add the “Microsoft Internet Controls” reference in VBA Project
Set ie = CreateObject("InternetExplorer.Application")
'using input box to enter URL I am serching for
Search_URL = InputBox("Enter URL to Search For")
With ie
.Visible = True
.navigate ("https://app.buzzsumo.com/#/top-content")
'Ensure that the web page downloads completely
While ie.ReadyState <> 4
DoEvents
Wend
'assigning the input variables to the html elements of the form
ie.document.getElementsByName("q").Item.innertext = Search_URL
'finding and clicking the button
Set objInputs = ie.document.getElementsByTagName("input")
For Each ele In objInputs
If ele.Title Like "Press Enter to Search" Then
ele.Click
End If
Next ele
End With
End Sub
I have also tried other methods to find and click the button such as:
'Dim i As Variant
'Set form = ie.document.getElementsByClassName("btn btn-highlight")
'For i = 1 To 5
'If form.Item(i).DefaultValue = "Search!" Then
'Set button = form.Item(i)
'button.Click
'End If
'Next i
Please provide any recommendations on what I may be missing or how I can get this code to actually submit the form and advance to the search results. Thanks in advance for any help you can provide!
Here are some additional details: Unfortunately the element I am trying to click (the "Search" button) does not have an ID or Name associated with it. This is why I was trying alternative approaches, such as looping through all of the objects and trying to find the one with the right “Title”. Here is the code for the element from the DOM explorer:
<input title="Press Enter to search" class="btn btn-highlight" type="submit" ng-disabled="topContentSearchForm.$invalid" value="Search!"/>
The only attributes associated with it are:
class: btn btn-highlight
type: submit
ng-disabled: topContentSearchForm.$invalid
value: Search!
title: Press Enter to Search
Please let me know if there is another way to find the element ID/name, or another way to click the button without these attributes. Thanks
I know this is an old post but... I have been using this effectively..
'click login
Set htmlDoc = .document
Set htmlColl = htmlDoc.getElementsByTagName("input")
Do While htmlDoc.readyState <> "complete": DoEvents: Loop
For Each htmlInput In htmlColl
If Trim(htmlInput.Type) = "submit" Then
htmlInput.Click
Exit For
End If
Next htmlInput
A couple of ideas:
While ie.ReadyState <> 4
DoEvents
Wend
If the page runs JavaScript, use Application.Wait Now + TimeSerial(0, 0, 4) (basically wait for 4 seconds) instead.
Second, I don't understand why you need to loop through all the objects on the web page. The easier way would be to open the page in IE, hit F12 and select the element in the DOM Explorer. You can then get the ID or Name of the button and use ie.document.GetElementByID("buttonID").Click or ie.document.GetElementsByName("buttonName").Item.Click
Let me know if this helps.
Edit: After inspecting the particular webpage it appears that the ID and Name attributes for that button are missing. So I had to resort to the following:
Dim i As integer
Set form = ie.document.getElementsByClassName("btn btn-highlight")
On Error Resume Next
For i = 1 To 20
If form.Item(i).DefaultValue = "Search!" Then
form.Item(i).Click
End If
Next i
The relevant button is clicked for the fourth item (I had to step through the loop manually because the third item navigated away to a pricing page, so I had to go back). The full code follows. Note that you will need to repeat this exercise if the web page changes.
Sub clickFormButton()
Dim ie As Object
Dim form As Variant
Dim button As Variant
'add the “Microsoft Internet Controls” reference in VBA Project
Set ie = CreateObject("InternetExplorer.Application")
'using input box to enter URL I am serching for
Search_URL = InputBox("Enter URL to Search For")
With ie
.Visible = True
.navigate ("https://app.buzzsumo.com/#/top-content")
End With
'wait for page to load
Application.Wait Now + TimeSerial(0, 0, 5)
'assigning the input variables to the html elements of the form
ie.document.getElementsByName("q").Item.InnerText = Search_URL
'finding and clicking the button
ie.document.getElementsByClassName("btn btn-highlight").Item(4).Click
End Sub
It looks like you could potentially just build the URL string. For example, if you put "abcd" in the search field, the resulting URL will be:
https://app.buzzsumo.com/top-content?result_type=total&type=articles&num_days=360&tfc=false&general_article&infographic&video&page=1&guest_post&giveaway&interview&links_sitewide=true&unique_domains=true&backlinks=false&q=abcd&offset=0
Note the q=abcd portion, which is the search query.
So, and this is just a quick idea that may work as long as you're not trying to abuse their system by sending thousands of automated requests:
Sub FetchWebsite()
Dim ie As Object
Dim form As Variant
Dim button As Variant
Dim url As String
'add the “Microsoft Internet Controls” reference in VBA Project
Set ie = CreateObject("InternetExplorer.Application")
'using input box to enter URL I am serching for
Search_URL = InputBox("Enter URL to Search For")
'### BUILD THE FULL URL
url = "https://app.buzzsumo.com/top-content?result_type=total&type=articles&num_days=360&tfc=false&general_article&infographic&video&page=1&guest_post&giveaway&interview&links_sitewide=true&unique_domains=true&backlinks=false&q=" & Search_URL & "&offset=0"
With ie
.Visible = True
.navigate url
End With
'wait for page to load
Do
DoEvents
Loop While ie.ReadyState <> 4 Or ie.Busy
AppActivate "Internet Explorer"
End Sub
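When the search term is itself a URL, as the InputBox prompt suggests, characters such as & and ? would break the query string unless they are encoded. A minimal sketch follows, assuming Excel 2013 or later on Windows, where WorksheetFunction.EncodeURL is available. BuildSearchUrl is a hypothetical helper name, and the base URL is copied from the code above.

```vba
'Hypothetical helper: builds the search URL with an encoded query value.
Function BuildSearchUrl(query As String) As String
    Const BASE_URL As String = "https://app.buzzsumo.com/top-content?result_type=total&type=articles&num_days=360&tfc=false&general_article&infographic&video&page=1&guest_post&giveaway&interview&links_sitewide=true&unique_domains=true&backlinks=false&q="

    'EncodeURL percent-encodes reserved characters such as & ? / and spaces
    BuildSearchUrl = BASE_URL & _
        Application.WorksheetFunction.EncodeURL(query) & "&offset=0"
End Function
```

The result can be passed to .navigate in place of the hand-built url string.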
I did some poking around in the Locals window and this should also work, modified from your code. This would be the Form.Submit that I mentioned in comment on OP.
Sub clickFormButton()
Dim ie As InternetExplorer
Dim form As Variant
Dim button As Variant
Dim ele As HTMLFormElement
'add the “Microsoft Internet Controls” reference in VBA Project
Set ie = CreateObject("InternetExplorer.Application")
'using input box to enter URL I am serching for
Search_URL = InputBox("Enter URL to Search For")
With ie
.Visible = True
.navigate ("https://app.buzzsumo.com/#/top-content")
End With
'wait for page to load
Do
DoEvents
Loop While ie.ReadyState <> 4 Or ie.Busy
'assigning the input variables to the html elements of the form
ie.document.getElementsByName("q").Item.InnerText = Search_URL
'finding and clicking the button
ie.document.getElementsByClassName("btn btn-highlight").Item(4).form.submit
End Sub
CSS selector:
You can use the CSS selector #search-btn > div, which selects a div that is a direct child of the element with id search-btn. "#" denotes an id (a class would use ".").
VBA:
Use .querySelector method to apply CSS selector:
ie.document.querySelector("#search-btn > div").Click
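In CSS, # selects by id and . selects by class, and both forms can be passed to querySelector. The sketch below guards against a selector that matches nothing. ClickBySelector is a hypothetical helper name, and the second selector in the usage note is built from the attributes of the input element quoted in the question, so it is an assumption about the live page.

```vba
'Hypothetical helper: clicks the first element matching a CSS selector.
'Returns True when an element was found and clicked.
Function ClickBySelector(doc As Object, selector As String) As Boolean
    Dim node As Object

    'querySelector returns the first match, or Nothing if there is none
    Set node = doc.querySelector(selector)
    If node Is Nothing Then
        Debug.Print "Nothing matches selector: " & selector
        Exit Function
    End If

    node.Click
    ClickBySelector = True
End Function
```

For example, ClickBySelector(ie.document, "#search-btn > div") selects by id, while ClickBySelector(ie.document, "input.btn.btn-highlight[type='submit']") selects by class and attribute.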

Getting data from HTML source in VBA (excel)

I'm trying to collect data from a website, which should be manageable once the source is in string form. Looking around I've assembled some possible solutions but have run into problems with all of them:
1. Use InternetExplorer.Application to open the url and then access the inner HTML
2. Inet
3. Use Shell command to run wget
Here are the problems I'm having:
1. When I store the innerHTML into a string, it's not the entire source, only a fraction
2. ActiveX does not allow the creation of the Inet object (error 429)
3. I've got the htm into a folder on my computer, how do I get it into a string in VBA?
Code for 1:
Sub getData()
Dim url As String, ie As Object, state As Integer
Dim text As Variant, startS As Integer, endS As Integer
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
url = "http://www.eoddata.com/stockquote/NASDAQ/AAPL.htm"
ie.Navigate url
state = 0
Do Until state = 4
DoEvents
state = ie.readyState
Loop
text = ie.Document.Body.innerHTML
startS = InStr(ie.Document.Body.innerHTML, "7/26/2012")
endS = InStr(ie.Document.Body.innerHTML, "7/25/2012")
text = Mid(ie.Document.Body.innerHTML, startS, endS - startS)
MsgBox text
End Sub
If I were trying to pull the opening price for 08/10/12 from that page, which is similar to what I assume you are doing, I'd do something like this:
Set ie = New InternetExplorer
With ie
.navigate "http://eoddata.com/stockquote/NASDAQ/AAPL.htm"
.Visible = False
While .Busy Or .readyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set objHTML = .document
DoEvents
End With
Set elementONE = objHTML.getElementsByTagName("TD")
'the collection is zero-based; stop one short so Item(i + 1) stays in range
For i = 0 To elementONE.Length - 2
elementTWO = elementONE.Item(i).innerText
If elementTWO = "08/10/12" Then
MsgBox (elementONE.Item(i + 1).innerText)
Exit For
End If
Next i
DoEvents
ie.Quit
DoEvents
Set ie = Nothing
You can modify this to run through the HTML and pull whatever data you want. Item(i + 2) would return the high price, etc.
Since there are a lot of dates on that page, you might also want to check that the match lies between the "Recent End of Day Prices" and "Company profile" sections.
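One way to restrict the search to that part of the page is to cut the text between the two headings out first and look for the date only inside it. This is a sketch: TextBetween is a hypothetical helper name, and it assumes both headings appear literally in the page text.

```vba
'Hypothetical helper: returns the text between two markers,
'or an empty string if either marker is missing.
Function TextBetween(source As String, startMarker As String, _
                     endMarker As String) As String
    Dim startPos As Long
    Dim endPos As Long

    'Long rather than Integer: page sources can exceed 32,767 characters
    startPos = InStr(1, source, startMarker, vbTextCompare)
    If startPos = 0 Then Exit Function

    'Continue searching after the end of the start marker
    startPos = startPos + Len(startMarker)
    endPos = InStr(startPos, source, endMarker, vbTextCompare)
    If endPos = 0 Then Exit Function

    TextBetween = Mid$(source, startPos, endPos - startPos)
End Function
```

For example, TextBetween(objHTML.body.innerText, "Recent End of Day Prices", "Company profile") would return only the price section, which can then be searched for the date.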