Getting data from HTML source in VBA (Excel)

I'm trying to collect data from a website, which should be manageable once the source is in string form. Looking around, I've assembled some possible solutions but have run into problems with all of them:
1. Use InternetExplorer.Application to open the URL and then read the innerHTML
2. Use the Inet control
3. Use a Shell command to run wget
Here are the problems I'm having:
1. When I store the innerHTML in a string, it's not the entire source, only a fraction of it.
2. ActiveX does not allow the creation of the Inet object (error 429).
3. I've got the .htm file in a folder on my computer; how do I get it into a string in VBA?
Code for 1:
Sub getData()
Dim url As String, ie As Object, state As Integer
Dim text As Variant, startS As Long, endS As Long 'Long, not Integer: InStr positions can exceed 32767 on a large page
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
url = "http://www.eoddata.com/stockquote/NASDAQ/AAPL.htm"
ie.Navigate url
state = 0
Do Until state = 4
DoEvents
state = ie.readyState
Loop
text = ie.Document.Body.innerHTML
startS = InStr(ie.Document.Body.innerHTML, "7/26/2012")
endS = InStr(ie.Document.Body.innerHTML, "7/25/2012")
text = Mid(ie.Document.Body.innerHTML, startS, endS - startS)
MsgBox text
ie.Quit
End Sub
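For problem 3, and as a possible way around problems 1 and 2, here is a minimal sketch. ReadTextFile reads a saved .htm file into a String with Scripting.FileSystemObject. GetPageSource swaps in a technique that is not on the list above, MSXML2.XMLHTTP, which returns the raw HTML the server sends without driving IE or creating the Inet control. The function names and the C:\Temp path are hypothetical. FileSystemObject reads the file as ANSI text by default, so non-ASCII characters in a UTF-8 page may come out garbled, and GetPageSource does not include anything the page's scripts add after it loads.

```vba
' Reads an entire text file (for example a saved .htm page) into one String.
Function ReadTextFile(ByVal filePath As String) As String
    Dim fso As Object, ts As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(filePath, 1) ' 1 = ForReading
    ' ReadAll raises an error on an empty file, so check first
    If Not ts.AtEndOfStream Then ReadTextFile = ts.ReadAll
    ts.Close
End Function

' Alternative: fetch the HTML exactly as the server sends it, without IE.
' Content added later by JavaScript on the page is not included.
Function GetPageSource(ByVal url As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False ' False = synchronous: wait for the response
    http.send
    If http.Status = 200 Then GetPageSource = http.responseText
End Function

Sub DemoGetSource()
    Dim html As String
    html = ReadTextFile("C:\Temp\AAPL.htm") ' hypothetical path
    ' html = GetPageSource("http://www.eoddata.com/stockquote/NASDAQ/AAPL.htm")
    MsgBox Len(html) & " characters read"
End Sub
```

Once the source is in a String, the InStr/Mid approach from the question works on it unchanged.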

If I were trying to pull the opening price for 08/10/12 off that page, which I assume is similar to what you are doing, I'd do something like this:
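'Early binding: InternetExplorer and READYSTATE_COMPLETE need a reference to Microsoft Internet Controls
Dim ie As InternetExplorer, objHTML As Object, elementONE As Object
Dim elementTWO As String, i As Long
Set ie = New InternetExplorer
With ie
.navigate "http://eoddata.com/stockquote/NASDAQ/AAPL.htm"
.Visible = False
While .Busy Or .readyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set objHTML = .document
DoEvents
End With
Set elementONE = objHTML.getElementsByTagName("TD")
'The collection is zero-based; stop one early so Item(i + 1) stays in range
For i = 0 To elementONE.Length - 2
elementTWO = elementONE.Item(i).innerText
If elementTWO = "08/10/12" Then
MsgBox elementONE.Item(i + 1).innerText 'the cell after the date holds the opening price
Exit For
End If
Next i
DoEvents
ie.Quit
DoEvents
Set ie = Nothing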
You can modify this to run through the HTML and pull whatever data you want. Item(i + 2) would return the high price, and so on.
Since there are a lot of dates on that page, you might also want to check that the match lies between the Recent End of Day Prices section and the Company Profile section.
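To reuse that lookup for other columns, it can be wrapped in a small helper. This is a sketch with a hypothetical name, CellAfterDate, and it assumes the layout described above (the date cell followed by open, high, and so on). Because getElementsByTagName also works on a single element, passing one table element instead of the whole document is one way to keep the search inside a single section.

```vba
' Returns the text of the TD that sits colOffset cells after the first TD whose
' text equals dateText, or "" when the date is not found.
' colOffset follows the answer above: 1 = open, 2 = high, and so on.
Function CellAfterDate(ByVal container As Object, ByVal dateText As String, _
                       ByVal colOffset As Long) As String
    Dim tds As Object, i As Long
    Set tds = container.getElementsByTagName("TD")
    ' Zero-based collection; stop early so i + colOffset stays in range
    For i = 0 To tds.Length - 1 - colOffset
        If Trim$(tds.Item(i).innerText) = dateText Then
            CellAfterDate = tds.Item(i + colOffset).innerText
            Exit Function
        End If
    Next i
End Function
```

For example, `MsgBox CellAfterDate(objHTML, "08/10/12", 2)` would show the high price for that date, assuming the column order above holds.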

Related

VBA code to scrape data using html/javascript won't work

I want to write VBA code that searches a website based on the input in the first column (range A1:A102). The code works except for one thing: it copies my data from the Excel cell and pastes it into the website's search box, but it doesn't click the search button automatically. I welcome any good suggestions.
I know how to scrape data from websites, but there is a specific class for this search button. Which class should I use to click it? This question concerns both VBA and JavaScript/HTML.
When I click Inspect Element, I see the button ID "nav-search-submit-text" and the class "nav-search-submit-text nav-sprite".
Neither of them works.
Thanks
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
Call getdata
End If
End Sub
Sub getdata()
Dim i As Long
Dim URL As String
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
'Set IE.Visible = True to make IE visible, or False for IE to run in the background
IE.Visible = True
URL = "https://www.amazon.co.uk"
'Navigate to URL
IE.Navigate URL
'making sure the page is done loading
Do
DoEvents
Loop Until IE.ReadyState = 4
'attempting to search date based on date value in cell
IE.Document.getElementById("twotabsearchtextbox").Value = ActiveCell.Value
'Sheets("Sheet1").Range("A1:A102").Text
'Select the date picker box and press Enter to 'activate' the new date
IE.Document.getElementById("twotabsearchtextbox").Select
'clicking the search button
IE.Document.getElementsByClassName("nav-sprite").Click
'Call nextfunction
End Sub
To do web scraping with Excel, you need to know both VBA and HTML, plus CSS and at least some JavaScript. Above all, you should be familiar with the DOM (Document Object Model). With only VBA or only HTML you will not get far.
It's a mystery to me why you want to do it the complicated way when you can do it simply via the URL. For your approach, you have to use the class nav-input. This class occurs twice in the HTML document, and the search button is the second element with that class. Since the collection returned by getElementsByClassName is zero-based, you have to click the element with index 1.
Sub getdata()
Dim URL As String
Dim IE As Object
URL = "https://www.amazon.co.uk"
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True ' True to make IE visible, or False for IE to run in the background
IE.Navigate URL 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until IE.ReadyState = 4
'enter the search term from the active cell
IE.Document.getElementById("twotabsearchtextbox").Value = ActiveCell.Value
'clicking the search button
IE.Document.getElementsByClassName("nav-input")(1).Click
End Sub
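The URL route mentioned above could look like this. It is a sketch: SearchViaUrl is a hypothetical name, and the /s?k= search format is an assumption about Amazon's URLs, not something taken from this thread, so check it against the live site. WorksheetFunction.EncodeURL needs Excel 2013 or later.

```vba
' Hypothetical helper: opens the search results page directly from a URL,
' so no text box or button has to be located on the page.
' ASSUMPTION: Amazon accepts searches in the form /s?k=<term>.
Sub SearchViaUrl(ByVal searchTerm As String)
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    ' EncodeURL escapes spaces and special characters in the search term
    IE.Navigate "https://www.amazon.co.uk/s?k=" & _
        Application.WorksheetFunction.EncodeURL(searchTerm)
    ' 4 = READYSTATE_COMPLETE
    Do While IE.Busy Or IE.ReadyState <> 4: DoEvents: Loop
End Sub
```

It would be called as `SearchViaUrl ActiveCell.Value`, with the same ActiveCell caveat described below.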
Edit: Solution to open offer with known ASIN
You can open a product page on Amazon directly if you know its ASIN. To use the ASIN in the active cell in the URL, pass it as a parameter to getdata(). (This does not work reliably: if you press Enter to finish the input, the active cell is the one below the cell you just edited.)
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
Call getdata(ActiveCell.Value)
End If
End Sub
getdata() then opens the URL built from the ASIN that was passed in:
Sub getdata(searchTerm As String)
Dim URL As String
Dim IE As Object
'Use the right base url
URL = "https://www.amazon.co.uk/dp/" & searchTerm
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True ' True to make IE visible, or False for IE to run in the background
IE.Navigate URL 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until IE.ReadyState = 4
End Sub
It's also possible to do all of this in the worksheet's Worksheet_Change event, including getting the price and the product title:
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
With CreateObject("InternetExplorer.Application")
.Visible = True ' True to make IE visible, or False for IE to run in the background
.Navigate "https://www.amazon.co.uk/dp/" & ActiveCell 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until .ReadyState = 4
'Get Price
ActiveCell.Offset(0, 1).Value = .document.getElementByID("priceblock_ourprice").innertext
'Get offer title
ActiveCell.Offset(0, 2).Value = .document.getElementByID("productTitle").innertext
End With
End If
End Sub
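The ActiveCell caveat above can be avoided by reading the edited cells from Target, which Excel passes to Worksheet_Change. This is a sketch that assumes the same element IDs as above (priceblock_ourprice, productTitle), which Amazon may have changed since. It also skips a missing element instead of stopping with error 91, and switches events off while writing the results so that the writes do not re-trigger the event.

```vba
' Goes in the sheet's code module. Uses Target instead of ActiveCell, so the
' cell that was actually edited is used even after pressing Enter.
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim changed As Range, cell As Range
    Dim priceEl As Object, titleEl As Object
    Set changed = Intersect(Target, Me.Range("A1:A102"))
    If changed Is Nothing Then Exit Sub

    For Each cell In changed.Cells
        If Len(Trim$(cell.Text)) > 0 Then
            With CreateObject("InternetExplorer.Application")
                .Visible = True
                .Navigate "https://www.amazon.co.uk/dp/" & Trim$(cell.Text)
                Do While .Busy Or .ReadyState <> 4: DoEvents: Loop
                ' getElementById returns Nothing when no element has that id
                Set priceEl = .Document.getElementById("priceblock_ourprice")
                Set titleEl = .Document.getElementById("productTitle")
                ' Writing to columns B and C fires Worksheet_Change again
                Application.EnableEvents = False
                If Not priceEl Is Nothing Then cell.Offset(0, 1).Value = priceEl.innerText
                If Not titleEl Is Nothing Then cell.Offset(0, 2).Value = Trim$(titleEl.innerText)
                Application.EnableEvents = True
                .Quit
            End With
        End If
    Next cell
End Sub
```

Pasting several ASINs into column A at once then processes each of them, one browser window at a time.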

Navigating In IE Using Excel VBA By ID

Problem: I'm trying to copy information from Excel into a webpage and then press a button. Eventually this needs to go all the way to printing a PDF into a folder. I tried the code below but can't work out why it isn't working, and searching Google didn't help.
Website: https://www.easymapmaker.com/advanced
Attempted Code:
Sub MapMacro()
AddressGrid = Range("A1").Value
Set IE = CreateObject("InternetExplorer.Application")
WebSite = "https://www.easymapmaker.com/advanced"
With IE
.Visible = True
.navigate WebSite
Do While IE.Busy Or IE.readyState <> 4
DoEvents
Loop
On Error Resume Next 'This is here in case fields cant be found.
Set Address = IE.Document.getElementsByID("sourceData")
Address.Value = AddressGrid
Set Element = IE.Document.getElementsByID("optionButton")
Element.Click
Do While IE.Busy Or IE.readyState <> 4
DoEvents
Loop
End With
End Sub
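To clarify, I got this working by changing the lines below. getElementsByID is not a DOM method; the method is getElementById (singular), and it returns a single element, so its Value can be set directly: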
Set Address = IE.Document.getElementsByID("sourceData")
Address.Value = AddressGrid
to:
IE.Document.getElementByID("sourceData").Value = AddressGrid
Now, AddressGrid refers to only one cell. If you want a range, say "A1:D4", you could try this method (MSForms.DataObject needs a reference to the Microsoft Forms 2.0 Object Library):
Dim clipTXT As MSForms.DataObject
Range("A1:D4").Copy
Set clipTXT = New MSForms.DataObject
clipTXT.GetFromClipboard
IE.Document.getElementByID("sourceData").Value = clipTXT.GetText
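If you'd rather not depend on the clipboard (or on the Forms reference), the range can be turned into text directly. RangeToTabText is a hypothetical helper, and it assumes the sourceData box accepts the same tab-separated text that pasting a copied range would produce. Note that .Text returns what the cell displays, so a column that is too narrow can give ####.

```vba
' Builds tab-separated text (one line per row) from a range, similar to what
' copying the range puts on the clipboard, without touching the clipboard.
Function RangeToTabText(ByVal rng As Range) As String
    Dim r As Long, c As Long, result As String
    For r = 1 To rng.Rows.Count
        If r > 1 Then result = result & vbCrLf
        For c = 1 To rng.Columns.Count
            If c > 1 Then result = result & vbTab
            ' .Text is the displayed text, so it follows the cell's number format
            result = result & rng.Cells(r, c).Text
        Next c
    Next r
    RangeToTabText = result
End Function
```

Usage: `IE.Document.getElementById("sourceData").Value = RangeToTabText(Range("A1:D4"))`.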

VBA Scraping with IE11 - HTML Elements not indexing in VBA Local - Call javascript instead?

I've been scraping away for the last couple of months with no issues, but now I have a site I'd like to access that is more complicated than simple HTML. When I inspect the elements, I can find the tag, ID, and name of an input box I'd like to fill, but when I run the code, there are no elements in the VBA Locals window for my ie object.
I believe this is due to AJAX, but I'm too new to AJAX/JavaScript to know for sure.
This is a screenshot of the input box I'd like to fill out:
and this is the code I'm using:
Sub Findie()
Dim ie As Object
Dim userName As Variant
Dim passWord As Variant
Dim strHTML As String
Dim sht As Worksheet
Dim itemRng As Range
Dim testObj As Object
Set sht = Sheets("Sheet1")
Set itemRng = sht.Range("A2")
userName = "***#***.com"
passWord = "********"
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
strHTML = "http://[MYWEBSITE].com"
ie.Navigate strHTML
'wait for browser
While ie.ReadyState <> 4 '4 = READYSTATE_COMPLETE; the named constant needs a reference to Microsoft Internet Controls
DoEvents
Wend
Application.Wait (Now + #12:00:03 AM#)
ie.Document.getElementsByName("username")(0).Value = userName
ie.Document.getElementsByName("password")(0).Value = passWord
ie.Document.getElementsByClassName("submit")(0).Click
While ie.ReadyState <> 4
DoEvents
Wend
Application.Wait (Now + #12:00:01 AM#)
ie.Navigate "[MYWEBSITE]" 'after logging in, navigating to this page saves time
While ie.ReadyState <> 4
DoEvents
Wend
ie.Document.getElementsByName("productId")(0).Value = "12345"
End Sub
This is not a page-loading issue either; it doesn't work even when I step through the code.
I'm thinking it would be much easier if I could run the scripts that are on the server, but I've never done that from VBA and I'm not sure how. Here is a screenshot of the scripts available on the server. I don't know where to go from here, or which method is best.
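If the field is inserted by script after the page reports it has loaded, polling for it with a timeout is usually more reliable than fixed Application.Wait calls. This is a sketch with hypothetical names (WaitForElementByName, FillProductId). It compares ReadyState with 4, the value of READYSTATE_COMPLETE, so it does not need a reference to Microsoft Internet Controls. If it never finds the field, the input may sit inside an iframe, whose elements belong to that frame's document rather than to ie.Document.

```vba
' Polls until an element with the given name attribute exists, or until
' timeoutSec seconds have passed. Returns the first match, or Nothing.
Function WaitForElementByName(ByVal ie As Object, ByVal elementName As String, _
                              ByVal timeoutSec As Long) As Object
    Dim deadline As Date, matches As Object
    deadline = Now + TimeSerial(0, 0, timeoutSec)
    Do
        DoEvents
        ' Only query the DOM once the browser is idle and the document is complete
        If Not ie.Busy And ie.ReadyState = 4 Then
            Set matches = ie.Document.getElementsByName(elementName)
            If matches.Length > 0 Then
                Set WaitForElementByName = matches(0)
                Exit Function
            End If
        End If
    Loop While Now < deadline
End Function

' Example use after logging in and navigating to the product page
Sub FillProductId(ByVal ie As Object)
    Dim box As Object
    Set box = WaitForElementByName(ie, "productId", 15)
    If box Is Nothing Then
        MsgBox "productId not found; it may be inside an iframe"
    Else
        box.Value = "12345"
    End If
End Sub
```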

Cycling Through List of URLs Using Excel VBA

I am much more familiar with Excel now, but one thing is still baffling me - how to cycle through URLs in a loop. My current conundrum is that I have this list of URLs of packages, and need to obtain the status of each package on each page using its HTML. What I currently have to cycle through the list is:
Sub TrackingDeliveryStatusResults()
Dim IE As Object
Dim URL As Range
Dim wb1 As Workbook, ws1 As Worksheet
Dim filterRange As Range
Dim copyRange As Range
Dim lastRow As Long
Set wb1 = Application.Workbooks.Open("\\S51\******\Folders\******\TrackingDeliveryStatus.xls")
Set ws1 = wb1.Worksheets("TrackingDeliveryStatusResults")
Set IE = New InternetExplorer
With IE
.Visible = True
For Each URL In Range("C2:C & lastRow")
.Navigate URL.Value
While .Busy Or .ReadyState <> 4: DoEvents: Wend
MsgBox .Document.body.innerText
Next
End With
End Sub
And the list of URLs
My goal here is:
1. Cycle through each URL (load each one in the same IE window, without opening new tabs)
2. Obtain the status of the item for each URL from its HTML element:
   - FedEx: Delivered (td class="status")
   - UPS: Delivered (id="tt_spStatus")
   - USPS: Arrived at USPS Facility (class="info-text first")
3. Finish the loop and save as a CSV if at all possible (I've already done that part, so I'm only posting the code I'm having a problem with).
My understanding is that I have to code a different if statement for each different url, since all of them have different HTML tags for their delivery status. Loops are simple, but to loop through webpages is new to me. The code has been throwing me errors no matter what changes I make.
The IE object opens up but then Excel hits an error and the code stops running.
OK, I'll start with the proper syntax to get your code going, and I will edit this answer with further code.
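Sub Sample()
'New InternetExplorer (early binding) needs a reference to Microsoft Internet Controls
Dim wb As Workbook, wsSheet As Worksheet, lastRow As Long
Dim links As Variant, link As Variant, IE As Object
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.EnableEvents = False
Set wb = ThisWorkbook
Set wsSheet = wb.Sheets("Sheet1")
Set IE = New InternetExplorer
'Last used row in column A ("Rows" is avoided as a variable name; it clashes with the Rows property)
lastRow = wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row
links = wsSheet.Range("A1:A" & lastRow).Value
'A single cell gives a scalar, not an array, so wrap it for For Each
If Not IsArray(links) Then links = Array(links)
With IE
.Visible = True
For Each link In links
.navigate CStr(link)
While .Busy Or .ReadyState <> 4: DoEvents: Wend
MsgBox .Document.body.innerText
Next link
.Quit
End With
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub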
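This will get you looping. I think you had some general syntax issues; compare your code with mine to see the differences. In a For Each loop the loop variable has to be an Object or a Variant, so link is declared As Variant. links is a Variant too, because assigning a multi-cell range's values to it produces a 2-D array, which For Each walks element by element. In your version, "C2:C & lastRow" was a single literal string (the & sat inside the quotes), and lastRow was never assigned.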
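For the per-carrier part of the question, one option is a single function that picks the selector from the URL and is called in place of the MsgBox line. This is a sketch: GetDeliveryStatus is a hypothetical name, the selectors are the ones listed in the question, and carrier pages change often and may fill in the status with JavaScript, so treat an empty result as "not found" rather than as an error. getElementsByClassName also requires the page to render in IE9 mode or later.

```vba
' Picks the status element by carrier, using the tags listed in the question.
' Returns "" when nothing matches.
Function GetDeliveryStatus(ByVal doc As Object, ByVal url As String) As String
    Dim el As Object, matches As Object
    If InStr(1, url, "fedex", vbTextCompare) > 0 Then
        ' May also match other elements with class "status"; the first one wins
        Set matches = doc.getElementsByClassName("status")
    ElseIf InStr(1, url, "usps", vbTextCompare) > 0 Then
        ' Matches elements that carry both classes
        Set matches = doc.getElementsByClassName("info-text first")
    ElseIf InStr(1, url, "ups.com", vbTextCompare) > 0 Then
        Set el = doc.getElementById("tt_spStatus")
    End If
    If Not matches Is Nothing Then
        If matches.Length > 0 Then Set el = matches(0)
    End If
    If Not el Is Nothing Then GetDeliveryStatus = Trim$(el.innerText)
End Function
```

Inside the With IE loop above, the MsgBox line would become `MsgBox GetDeliveryStatus(.Document, CStr(link))`, or the result could be written to a cell next to the URL.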

Interaction with button on webpage using excel vba working only in step debugging mode

I am trying to fetch data from a website and put it in an Excel worksheet.
Scenario:
The webpage contains a button with id=btnAllRun.
When I click the button, a table is generated dynamically, with the information in tr tags inside it.
Using a macro, I need to count the number of such tr tags and put the count in the worksheet.
Code:
Dim IE as Object
Sub Button_Click()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate URL
Do
DoEvents
Loop Until IE.ReadyState = 4 'waiting for the webpage to load
Set Elmt = IE.Document.getElementById("btnAllRun") 'get button elmt for All Running
Elmt.Click 'Clicking button
Do
DoEvents
Loop Until IE.ReadyState = 4
Set Tbl = IE.Document.getElementById("gvRunning") 'gets table elmt containing tr tags
Sheets("XXX").Range("B36") = Tbl.Rows.Length - 1
End Sub
When I run the macro I get "Object variable or With block variable not set", but when I run the same macro in step-debugging mode I get the correct results.
Any help on this would be greatly appreciated!!
If it works in debug mode, the code is not waiting long enough: the DoEvents/ReadyState loop can finish before the page has built the elements you need. In such cases I use a custom Wait routine that forces the code to pause for a fixed time before continuing. On slower systems you may have to increase the time.
Wait 2 basically pauses the code for 2 seconds.
Try this. It has worked for me on many occasions.
Dim IE As Object
Sub Button_Click()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate URL
Do While IE.ReadyState <> 4: DoEvents: Loop
Wait 2
Set Elmt = IE.Document.getElementById("btnAllRun")
Wait 2
Elmt.Click
Do While IE.ReadyState <> 4: DoEvents: Loop
Wait 2
Set Tbl = IE.Document.getElementById("gvRunning")
Sheets("XXX").Range("B36") = Tbl.Rows.Length - 1
'
'~~> Rest of the code
'
End Sub
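Private Sub Wait(ByVal nSec As Long)
'Measure elapsed time so the loop still ends if Timer resets to 0 at midnight
Dim startT As Single, elapsed As Single
startT = Timer
Do
DoEvents
elapsed = Timer - startT
If elapsed < 0 Then elapsed = elapsed + 86400
Loop While elapsed < nSec
End Sub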
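Fixed pauses either waste time or turn out too short on a slow day. An alternative is to poll until the table actually exists, with a timeout. This is a sketch with hypothetical names (WaitForTable, CountRunningRows). It assumes gvRunning is not already on the page before the click; otherwise the loop would return the old table straight away.

```vba
' Polls for the table with the given id until it has at least minRows rows,
' or until timeoutSec seconds pass. Returns the table, or Nothing on timeout.
Function WaitForTable(ByVal ie As Object, ByVal tableId As String, _
                      ByVal minRows As Long, ByVal timeoutSec As Long) As Object
    Dim deadline As Date, tbl As Object
    deadline = Now + TimeSerial(0, 0, timeoutSec)
    Do
        DoEvents
        ' Only query the DOM once the browser is idle and the document is complete
        If Not ie.Busy And ie.ReadyState = 4 Then
            Set tbl = ie.Document.getElementById(tableId)
            If Not tbl Is Nothing Then
                If tbl.Rows.Length >= minRows Then
                    Set WaitForTable = tbl
                    Exit Function
                End If
            End If
        End If
    Loop While Now < deadline
End Function

' Example: click the button, then wait for at least the header row
Sub CountRunningRows(ByVal ie As Object)
    Dim tbl As Object
    ie.Document.getElementById("btnAllRun").Click
    Set tbl = WaitForTable(ie, "gvRunning", 1, 15)
    If tbl Is Nothing Then
        MsgBox "gvRunning did not appear within 15 seconds"
    Else
        Sheets("XXX").Range("B36").Value = tbl.Rows.Length - 1 ' minus the header row
    End If
End Sub
```

The loop returns as soon as the table is there, so it is usually faster than fixed pauses and only runs for the full timeout when something has gone wrong.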