To automate repetitive work, I am trying to access a website that is used in my company from VBA.
I wrote the code below and confirmed that it can access normal sites such as Google and YouTube.
However, I don't know why it cannot access the company site.
VBA stops at this line:
Set HTMLDoc = IE_ctrl.document
Thank you in advance.
I also noticed one difference (in the values and types shown in the VBA Locals window) between the normal site and the company site.
Please see the two pictures below.
Sub a()
Dim IE_ctrl As InternetExplorer
Dim HTMLDoc As HTMLDocument
Dim input_Data As IHTMLElement
Dim URL As String
URL = "https://www.google.com"
Set IE_ctrl = New InternetExplorer
IE_ctrl.Silent = True
IE_ctrl.Visible = True
IE_ctrl.navigate URL
Wait_Browser IE_ctrl
Set HTMLDoc = IE_ctrl.document
Wait_Browser IE_ctrl
Set input_Data = HTMLDoc.getElementsByClassName("text").Item
input_Data.Click
End Sub
Sub Wait_Browser(Browser As InternetExplorer, Optional t As Integer = 1)
While Browser.Busy
DoEvents
Wend
Application.Wait DateAdd("s", t, Now)
End Sub
Normal site (works): [screenshot not available]
Company site (fails): [screenshot not available]
You can try the following code. Please read the comments. I can't say more because I don't know the page or its HTML.
Sub a()
'Use late binding for what you need
Dim ie As Object
Dim nodeInputData As Object
Dim url As String
url = "https://www.google.com"
'Use the Windows GUID to initialize Internet Explorer if you
'want to access a company page. This helps if there are
'security rules you can't get past with other ways of initializing IE.
'This does not work in most cases for pages on the public web.
'Read here for more information:
'https://blogs.msdn.microsoft.com/ieinternals/2011/08/03/default-integrity-level-and-automation/
Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
ie.Visible = True
ie.navigate url
'Waiting for the document to load
Do Until ie.readyState = 4: DoEvents: Loop
'If there is dynamic content that is still being loaded
'after IE reports that loading is complete, add a short pause
'(The three arguments are: hours, minutes, seconds)
Application.Wait (Now + TimeSerial(0, 0, 1))
'I don't know your HTML. If you only want to click a button,
'you don't need a variable
'ie.document.getElementsByClassName("text")(0).Click
'does the same as
Set nodeInputData = ie.document.getElementsByClassName("text")(0)
nodeInputData.Click
'A short explanation of getElementsByClassName() and getElementsByTagName():
'Both methods create a node collection of all HTML elements that were found
'by the criteria in the brackets. This is because there can be any number of
'HTML elements with the specified class name or tag name. If, for example,
'3 HTML elements with the class name "Text" were found, a node collection
'with three elements is created by getElementsByClassName("Text").
'These have the indices 0 to 2, as in an array. The individual elements are
'addressed via these indices, which are given in round brackets.
End Sub
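The index behaviour described in the comments above can be tried without opening a browser. This is only a minimal sketch, assuming a late-bound "htmlfile" document and invented HTML (the procedure name DemoNodeCollection is hypothetical). It uses getElementsByTagName, because getElementsByClassName may not be available in such a document, depending on its document mode.

```vba
Sub DemoNodeCollection()
    Dim doc As Object
    Dim nodes As Object
    Dim i As Long

    'In-memory HTML document (no browser window); the HTML is invented for this demo
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<p>first</p><p>second</p><p>third</p>"

    'The method returns a collection, even if only one element matches
    Set nodes = doc.getElementsByTagName("p")
    Debug.Print "Elements found: " & nodes.Length

    'Indices run from 0 to Length - 1
    For i = 0 To nodes.Length - 1
        Debug.Print i, nodes(i).innerText
    Next i
End Sub
```

Run it from the VBE and watch the Immediate window. Asking for an index outside that range does not give you a usable element, which is why calling .Click on the collection itself (without an index) fails.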
I want to write VBA code that searches a website based on input made in the first column. The range is A1 to A102. The code works fine except for one thing: it copies my data from the Excel cell and pastes it into the search box of the website, but it doesn't click the search button automatically. I welcome any good suggestions from experts.
I know how to scrape data from websites, but this search button has a specific class. Which class should I use to trigger the click? This question is relevant to both VBA and JavaScript/HTML experts.
When I click Inspect element, I get the button ID "nav-search-submit-text" and the class "nav-search-submit-text nav-sprite".
Neither of them works.
Thanks
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
Call getdata
End If
End Sub
Sub getdata()
Dim i As Long
Dim URL As String
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
'Set IE.Visible = True to make IE visible, or False for IE to run in the background
IE.Visible = True
URL = "https://www.amazon.co.uk"
'Navigate to URL
IE.Navigate URL
'making sure the page is done loading
Do
DoEvents
Loop Until IE.ReadyState = 4
'attempting to search date based on date value in cell
IE.Document.getElementById("twotabsearchtextbox").Value = ActiveCell.Value
'Sheets("Sheet1").Range("A1:A102").Text
'Select the date picker box and press Enter to 'activate' the new date
IE.Document.getElementById("twotabsearchtextbox").Select
'clicking the search button
IE.Document.getElementsByClassName("nav-sprite").Click
'Call nextfunction
End Sub
To use web scraping with Excel, you must be able to use both VBA and HTML, and additionally CSS and at least some JS. Above all, you should be familiar with the DOM (Document Object Model). With only VBA or only HTML you will not get far.
It's a mystery to me why you want to do it in a complicated way when you can do it simply via the URL. For your solution you have to use the class nav-input. This class exists twice in the HTML document. The search button is the second element with the class nav-input. Since the indices of a node collection start at 0, you have to click the element with index 1.
Sub getdata()
Dim URL As String
Dim IE As Object
URL = "https://www.amazon.co.uk"
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True ' True to make IE visible, or False for IE to run in the background
IE.Navigate URL 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until IE.ReadyState = 4
'enter the search term from the active cell into the search box
IE.Document.getElementById("twotabsearchtextbox").Value = ActiveCell.Value
'clicking the search button
IE.Document.getElementsByClassName("nav-input")(1).Click
End Sub
Edit: Solution to open an offer with a known ASIN
You can open an offer on the Amazon web page directly if you know the ASIN. The ASIN in the active cell can be passed as a parameter to the Sub getdata() and used in the URL. Note that this does not work reliably: if you have to press Enter to finish the input, the active cell is then the one below the desired cell.
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
Call getdata(ActiveCell.Value)
End If
End Sub
In the Sub getdata(), the URL is then opened with the ASIN that was passed in:
Sub getdata(searchTerm As String)
Dim URL As String
Dim IE As Object
'Use the right base url
URL = "https://www.amazon.co.uk/dp/" & searchTerm
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True ' True to make IE visible, or False for IE to run in the background
IE.Navigate URL 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until IE.ReadyState = 4
End Sub
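The ActiveCell caveat mentioned above can probably be sidestepped by reading the edited cells from the Target argument of the event instead. The following is a sketch of that variant, not the original answer's code; it assumes that getdata(searchTerm As String) from above is reachable from the worksheet module.

```vba
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim changed As Range
    Dim cell As Range

    'Only the edited cells that lie inside the watched range
    Set changed = Intersect(Target, Me.Range("A1:A102"))
    If changed Is Nothing Then Exit Sub

    For Each cell In changed.Cells
        'Skip error values and cleared cells
        If Not IsError(cell.Value) Then
            If Len(cell.Value) > 0 Then getdata CStr(cell.Value)
        End If
    Next cell
End Sub
```

Target is the range that was actually changed, so it does not matter where the selection moves after Enter is pressed. Be aware that pasting many values at once would open one browser window per cell with this sketch.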
It's also possible to do all of that in the worksheet_change event of the worksheet (including getting the price and the offer title):
Private Sub worksheet_change(ByVal target As Range)
If Not Intersect(target, Range("A1:A102")) Is Nothing Then
With CreateObject("InternetExplorer.Application")
.Visible = True ' True to make IE visible, or False for IE to run in the background
.Navigate "https://www.amazon.co.uk/dp/" & ActiveCell 'Navigate to URL
'making sure the page is done loading
Do: DoEvents: Loop Until .ReadyState = 4
'Get Price
ActiveCell.Offset(0, 1).Value = .document.getElementByID("priceblock_ourprice").innertext
'Get offer title
ActiveCell.Offset(0, 2).Value = .document.getElementByID("productTitle").innertext
End With
End If
End Sub
I have a problem with my VBA code.
I would like to open this website: https://www.tnt.com/express/en_us/site/tracking.html and fill the "Shipment numbers" search box with the values of the active cells from an Excel file. To begin with, I tried to enter just a fixed text, for example "777777".
I wrote the code below, but unfortunately the search box stays empty and no error is raised. I have tried everything and have no idea what I should change in my code.
Any clues? Thank you in advance.
HTML:
<input class="__c-form-field__text ng-touched ng-dirty ng-invalid" formcontrolname="query" pbconvertnewlinestocommasonpaste="" pbsearchhistorynavigation="" shamselectalltextonfocus="" type="search">
VBA:
Sub TNT2_tracker()
Dim objIE As InternetExplorer
Dim aEle As HTMLLinkElement
Dim y As Integer
Dim result As String
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate "https://www.tnt.com/express/en_us/site/tracking.html"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Dim webpageelement As Object
For Each webpageelement In objIE.document.getElementsByClassName("input")
If webpageelement.Class = "__c-form-field__text ng-pristine ng-invalid ng-touched" Then
webpageelement.Value = "777"
End If
Next webpageelement
End Sub
You could use querySelector with the class name to find an element.
Something like:
'Find the input box
objIE.document.querySelector("input.__c-form-field__text").value = "test"
'Find the search button and do a click
objIE.document.querySelector("button.__c-btn").Click
There is no need to loop through the elements, unless the site allows you to search for multiple tracking numbers at the same time.
It seems automating this page is a little tricky. Changing the value of the input field directly doesn't work: nothing happens when the submit button is clicked.
A look at the DOM inspector shows several events for the input field. I checked them, and it seems the value has to be pasted from the clipboard by triggering the paste event of the shipment number field.
For this to work without Internet Explorer prompting, its security settings for the Internet zone must allow pasting from the clipboard. I'm using a German version of IE, so I have trouble explaining where to find the setting.
This macro works for me:
Sub TNT2_tracker()
Dim browser As Object
Dim url As String
Dim nodeDivWithInputField As Object
Dim nodeInputShipmentNumber As Object
Dim textToClipboard As Object
'Dataobject by late binding to use the clipboard
Set textToClipboard = CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
url = "https://www.tnt.com/express/en_us/site/tracking.html"
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set browser = CreateObject("internetexplorer.application")
browser.Visible = True
browser.navigate url
Do Until browser.ReadyState = 4: DoEvents: Loop
'Manual pause to let the page load completely
'Application.Wait (Now + TimeSerial(pause_hours, pause_minutes, pause_seconds))
Application.Wait (Now + TimeSerial(0, 0, 3))
'Get div element with input field for shipment number
Set nodeDivWithInputField = browser.Document.getElementsByClassName("pb-search-form-input-group")(0)
If Not nodeDivWithInputField Is Nothing Then
'If we got the div element ...
'First child element is the input field
Set nodeInputShipmentNumber = nodeDivWithInputField.FirstChild
'Put shipment number to clipboard
textToClipboard.setText "7777777"
textToClipboard.PutInClipboard
'Insert value by trigger paste event of the input field
Call TriggerEvent(browser.Document, nodeInputShipmentNumber, "paste")
'Click button
browser.Document.getElementsByClassName("__c-btn")(0).Click
Else
MsgBox "No input field for shipment number found."
End If
End Sub
And this procedure triggers an HTML event:
Private Sub TriggerEvent(htmlDocument As Object, htmlElementWithEvent As Object, eventType As String)
Dim theEvent As Object
htmlElementWithEvent.Focus
Set theEvent = htmlDocument.createEvent("HTMLEvents")
theEvent.initEvent eventType, True, False
htmlElementWithEvent.dispatchEvent theEvent
End Sub
As @Stavros Jon alludes to, there is a browserless way using an XHR GET request to the API. It returns JSON, so you ideally need a JSON parser to handle the response.
I use jsonconverter.bas as the JSON parser. Download the raw code from here and add it to a standard module called JsonConverter. You then need to go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
Example request with dummy tracking number (deliberately passed as string):
Option Explicit
Public Sub TntTracking()
Dim json As Object, ws As Worksheet, trackingNumber As String
trackingNumber = "1234567" 'test input value. Currently this is not a valid input but is for demo.
Set ws = ThisWorkbook.Worksheets("Sheet1") 'for later use if writing something specific out
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.tnt.com/api/v3/shipment?con=" & trackingNumber & "&searchType=CON&locale=en_US&channel=OPENTRACK", False
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
'do something with results
Debug.Print json("tracker.output")("notFound").Count > 0
Debug.Print JsonConverter.ConvertToJson(json("tracker.output")("notFound"))
End Sub
I want to automate checking the delivery status of my regular courier shipments from various service providers such as Blue Dart.
I have the docket numbers. I tried to do this with VBA, but it is unable to fetch the data from the web page.
My code enters the docket number from a cell on the home page, which then redirects to another page where the delivery status is shown in a table.
Sub GetCourseList()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
Dim IEWindows As SHDocVw.ShellWindows
Dim IEwindow As SHDocVw.InternetExplorer
Dim IEDocument As MSHTML.HTMLDocument
Dim BreadcrumbDiv As MSHTML.HTMLElementCollection
Set IEWindows = New SHDocVw.ShellWindows
'create new instance of IE. use reference to return current open IE if
'you want to use open IE window. Easiest way I know of is via title bar.
IE.Navigate "http://www.bluedart.com/maintracking.html"
'go to web page listed inside quotes
IE.Visible = True
While IE.busy
DoEvents 'wait until IE is done loading page.
Wend
IE.Document.All("numbers").Value = ThisWorkbook.Sheets("sheet1").Range("A1")
Application.SendKeys "~"
Dim URL As String
Dim qt As QueryTable
Dim ws As Worksheet
Set ws = Worksheets.Add
For Each IEwindow In IEWindows
If InStr(IEwindow.LocationURL, "your URL or some unique string") <> 0 Then ' Found it
Set IEDocument = IEwindow.Document
URL = IEwindow.LocationURL
Set qt = ws.QueryTables.Add( _
Connection:="URL;" & URL, _
Destination:=Range("F1"))
With qt
.RefreshOnFileOpen = True
.Name = "bluedart"
.FieldNames = True
.WebSelectionType = xlAllTables
.Refresh BackgroundQuery:=False
End With
End If
Next
End Sub
Your code never interacts with the page that is generated after the docket number has been entered and confirmed. That could be done in one of two ways:
By emulating browser interaction, for example with Internet Explorer: click the "Go" element on the page after the docket number has been entered, then wait with:
While IE.Busy Or IE.Readystate <> 4
DoEvents
Wend
Alternatively, by sending a POST request with the proper parameters, including the docket number.
Even then, it still won't be possible to get the data with a web query on this page, because its URL is:
http://www.bluedart.com/servlet/RoutingServlet
Try to open this link. Nothing will be displayed, because the content at this URL is generated via the POST method, and the parameters needed to generate it are not part of the URL.
Instead of a query, the data can be read by finding HTML elements, such as tables, in the HTML document. This applies to both methods mentioned above.
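Reading a table through HTML elements could look roughly like the sketch below. It assumes the results page has already finished loading in Internet Explorer; the helper name WriteTableToSheet is hypothetical, and which table on the page holds the delivery status is not known here, so the index in the usage line is a placeholder.

```vba
'Copies every cell of an HTML table element to a worksheet,
'starting at the cell topLeft
Sub WriteTableToSheet(tbl As Object, topLeft As Range)
    Dim r As Long, c As Long

    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows(r).Cells.Length - 1
            topLeft.Offset(r, c).Value = tbl.Rows(r).Cells(c).innerText
        Next c
    Next r
End Sub
```

A call might be WriteTableToSheet IE.Document.getElementsByTagName("table")(0), ws.Range("F1"), where index 0 is an assumption to be checked against the real page in the DOM inspector.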
To debug my code, I tried to split it up into smaller parts. The following code in Sheet1 has been driving me crazy for hours:
Sub Scrapping_Data()
Dim IE As Object, EURUSD1 As String, EURUSD2 As String
Application.ScreenUpdating = False
Range("A:B").Clear
Set IE = CreateObject("internetexplorer.application")
With IE
.Navigate "http://uk.investing.com/currencies/streaming-forex-rates-majors"
.Visible = False
End With
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set FOREX = IE.document.getElementById("pair_1")
EURUSD1 = FOREX.Cells(1).innerHTML
EURUSD2 = FOREX.Cells(2).innerHTML
IE.Quit
Set IE = Nothing
Range("A1").Value = EURUSD1
Range("B1").Value = EURUSD2
End Sub
The first time I ran it, it worked fine. When I ran it a second time, run-time error '91' occurred. I then pressed F8 to step through, but nothing unusual happened: the code worked fine, and Sheet1 had values in Cells(1,1) and Cells(1,2). I ran it again and this time run-time error '13' occurred. Again I pressed F8, and again the code worked fine. As I kept running the code, the errors kept occurring, and stepping with F8 didn't help to find the problem. What is wrong with my code, and how can I fix it?
What I also don't understand is that my laptop gets slower every time I run the code, and I have had to restart it manually many times.
The following requires that you go into the VBE's Tools ► References and place checkmarks beside Microsoft HTML Object Library and Microsoft XML, v6.0.
This is an XMLHttpRequest equivalent of the Internet Explorer web scrape of the same URL.
Option Explicit
Sub tournamentFixtures()
'declare the objects with early binding
Dim htmlBDY As New HTMLDocument, xmlHTTP As New MSXML2.XMLHTTP60
'declare the regular variables
Dim sURL As String, ws As Worksheet
'set a var object to the destination worksheet
Set ws = Worksheets("Sheet1")
'assign the URL to a string var
sURL = "http://uk.investing.com/currencies/streaming-forex-rates-majors"
'isolate all commands to the MSXML2.XMLHTTP60 object
With xmlHTTP
'initiate the URL
.Open "GET", sURL, False
'set hidden header information
.setRequestHeader "User-Agent", "XMLHTTP/1.0"
'get the page data
.send
'safety check to make sure we got the web page's data
If .Status <> 200 Then GoTo bm_safe_Exit
'if here you got the page data - copy it to the local var
htmlBDY.body.innerHTML = .responseText
End With
'localize all commands to the page data
With htmlBDY
'check if the element ID exists
If Not .getElementById("pair_1") Is Nothing Then
'it exists - get the data directly to the worksheet
With .getElementById("pair_1")
ws.Range("A1") = .Cells(1).innerText
ws.Range("B1") = .Cells(2).innerText
End With
Else
'it doesn't exist - bad page data
MsgBox "there is no 'pair_1' on this page"
End If
End With
bm_safe_Exit:
'clean up all of the objects that were instantiated
Set htmlBDY = Nothing: Set xmlHTTP = Nothing: Set ws = Nothing
End Sub
I have commented virtually every line so you can follow what is happening. This may need some tweaking. I ran it ~40 times and it failed once but that could have been my own Internet connection. Consider this a starting point where you can do your own research to accomplish your goals. If you continue to have problems with this new code, please do not paste this into another question and ask why it doesn't work without doing some research and attempting a solution yourself. StackOverflow is a site for professional and enthusiast programmers.
I gave up trying to offer solutions to web scraping problems because page technology changes too fast to keep up on a peripheral level. You have to be involved in the immediate changes to be able to respond to them quickly and my own interests lie elsewhere. I responded to this request because you actually supplied the URL to test against (something few people asking questions actually think is important - go figure) and I thought the static dimming of the var would help.
The construction and destruction of an InternetExplorer object takes time, up to a few seconds even on the fastest systems. You can wait an appropriate amount of time for it to relinquish all of the DLLs and other resources it has loaded, or you can declare your IE as a static object that will be reused on subsequent reruns of the sub procedure.
Option Explicit
Sub Scrapping_Data()
Static IE As Object
Dim EURUSD1 As String, EURUSD2 As String
Application.ScreenUpdating = False
With Worksheets("Sheet1") 'KNOW what worksheet you are on!!!!!
.Range("A:B").Clear
End With
If IE Is Nothing Then
Set IE = CreateObject("internetexplorer.application")
With IE
.Visible = True
'.Visible = False
.Silent = True
End With
End If
With IE
.Navigate "http://uk.investing.com/currencies/streaming-forex-rates-majors"
Do While .ReadyState <> 4: DoEvents: Loop
With .document.getElementById("pair_1")
EURUSD1 = .Cells(1).innerHTML
EURUSD2 = .Cells(2).innerHTML
End With
End With
With Worksheets("Sheet1") 'KNOW what worksheet you are on!!!!!
.Range("A1") = EURUSD1
.Range("B1") = EURUSD2
End With
IE.Navigate "about:blank"
End Sub
The caveat here is that you will have to destruct the InternetExplorer object yourself at some point in the future. Closing the workbook will close the VBA project but leave the IE object 'orphaned'.
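One possible way to handle that cleanup is to hold the browser in a module-level variable rather than a Static one, so that a separate procedure can close it. This is a sketch of a variant, not the code above; the names scraperIE, GetScraperIE and CloseScraperIE are hypothetical.

```vba
Option Explicit

'Module-level instead of Static, so that other procedures can reach it
Private scraperIE As Object

'Returns the shared browser, creating it on first use
Public Function GetScraperIE() As Object
    If scraperIE Is Nothing Then
        Set scraperIE = CreateObject("InternetExplorer.Application")
    End If
    Set GetScraperIE = scraperIE
End Function

'Call this when scraping is finished
Public Sub CloseScraperIE()
    If Not scraperIE Is Nothing Then
        On Error Resume Next   'the window may already have been closed by hand
        scraperIE.Quit
        On Error GoTo 0
        Set scraperIE = Nothing
    End If
End Sub
```

The scraping procedure would then work with GetScraperIE() instead of its own Static variable, and CloseScraperIE could be called from the Workbook_BeforeClose event so that no orphaned IE process is left behind.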
Given all of the HTML5 debris that comes along with that web page, have you considered moving to XMLHttpRequest? And if you are wondering: yes, that would be a new question under a different set of [tags].
I am trying to scrape data out of a section of a web page. To get into the section I need to fill in a captcha security code and hit a button, but that is not a problem because the security code is actually written in the HTML of the page. So I create an IE object, navigate it to the web page, get the captcha security code, write it into the proper box, hit the submit button and then get the HTML document so I can scrape data out of it.
Even though I am executing the steps exactly in the order I mentioned, it seems that the HTML document I get is not the one from the page after the captcha validation, but the one from the page before it.
Does anyone know what I must do to get the correct HTML document and consequently be able to scrape the data I really want? Thank you.
The subprocedure's code follows:
'Getting National fuel prices from ANP
Sub subANPNationalFuelPrices()
'Creating variables for the URL and the HTML files
Dim urlANP As String: urlANP = "http://www.anp.gov.br/preco/prc/Resumo_Semanal_Index.asp"
Dim htmlANP1 As HTMLDocument
'Creating the IE object
Dim IE As InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
'Making sure that the webpage is fully load
IE.navigate (urlANP)
Do While IE.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Getting your data"
DoEvents
Loop
Set htmlANP1 = IE.document
'Getting the Captcha Password
Dim strCaptchaPassword As String
Dim colMyCollection As IHTMLElementCollection
Set colMyCollection = htmlANP1.getElementById("divQuadro").all
Dim objLabel As IHTMLElement
For Each objLabel In colMyCollection
strCaptchaPassword = strCaptchaPassword & objLabel.innerText
Next objLabel
'Getting the input box object and getting it the correct password
Dim objInputBox As IHTMLElement
Set objInputBox = htmlANP1.getElementById("txtValor")
objInputBox.Value = strCaptchaPassword
'Getting the submit button object and clicking it
Dim objInputButton As IHTMLElement
Set objInputButton = htmlANP1.getElementById("image1")
objInputButton.Click
'Getting the true rich data HTML
Set htmlANP1 = IE.document
'Extracting the data from the html document
Dim rngValues As range: Set rngValues = Sheet1.range("B17")
Dim strValues(35) As String
Dim dblValues(35) As Double
Dim objElement1 As IHTMLElement
Set objElement1 = htmlANP1.getElementsByTagName("TABLE")(1)
Dim colCollection1 As IHTMLElementCollection
Set colCollection1 = objElement1.all
Dim intTempCount As Integer
Dim objTempElement As IHTMLElement
intTempCount = 32
For Each objTempElement In colCollection1
Sheet1.Cells(intTempCount, 3) = objTempElement.tagName
Sheet1.Cells(intTempCount, 4) = objTempElement.innerText
intTempCount = intTempCount + 1
Next objTempElement
End sub
You are not waiting for the new web page to load after clicking the button on the captcha. Either check the ready state of IE again, or end your code at that point by starting a timer that restarts your code after X seconds and then checks the ready state of IE and the document.
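Checking the ready state again could be wrapped in a small helper, as in the following sketch. The name WaitForPage and the 30 second default are assumptions, not part of the original code.

```vba
'Waits until the browser reports that loading has finished,
'or raises an error after timeoutSeconds
Sub WaitForPage(ByVal browser As Object, Optional ByVal timeoutSeconds As Double = 30)
    Dim started As Double

    started = Timer   'seconds since midnight; not safe across midnight

    Do While browser.Busy Or browser.readyState <> 4   '4 = READYSTATE_COMPLETE
        DoEvents
        If Timer - started > timeoutSeconds Then
            Err.Raise vbObjectError + 513, "WaitForPage", _
                      "Page did not finish loading in time"
        End If
    Loop
End Sub
```

In the question's code it would be called as WaitForPage IE directly after objInputButton.Click and before the second Set htmlANP1 = IE.document. If the navigation has not started yet at the moment of the call, the loop can return too early, so a short Application.Wait before it may still be needed.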
I do scraping on a system using iFrame so using IE.Readystate isn't very reliable. Usually I have to wait for another element to 'exist', but using IsObject(element) hasn't been very reliable either. What I've had to do is use a loop in my main code that calls a function so if I'm waiting for something to load and I know that after the page loads, there's an element with the ID "UserName", then I do this..
...
Do Until IsErr(doc, "UserName") = False: Loop
...
Function IsErr(doc As HTMLDocument, ID As String) As Boolean
IsErr = True
On Error GoTo ExitFunction:
Debug.Print left(doc.getElementById(ID).innerHTML, 1)
IsErr = False
Exit Function
ExitFunction:
End Function
I could just write a loop that keeps trying the Debug.Print itself, but that would be a nightmare with the error handling. If you use a separate function for the printing, the function can exit after the error, the loop calls it again, and this repeats until the element exists.
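The same idea can be sketched with a timeout, so that the loop cannot run forever if the element never appears. This is a variation under stated assumptions, not the code above: the name WaitForElement and the 15 second default are hypothetical.

```vba
'Polls the document until an element with the given ID exists.
'Returns the element, or Nothing if timeoutSeconds elapsed first.
Function WaitForElement(ByVal doc As Object, ByVal id As String, _
                        Optional ByVal timeoutSeconds As Double = 15) As Object
    Dim started As Double
    Dim node As Object

    started = Timer   'seconds since midnight; not safe across midnight

    Do
        DoEvents   'let the browser keep loading
        Set node = Nothing
        On Error Resume Next   'the document may not be accessible yet
        Set node = doc.getElementById(id)
        On Error GoTo 0
        If Not node Is Nothing Then Exit Do
    Loop While Timer - started < timeoutSeconds

    Set WaitForElement = node
End Function
```

A caller would check the result, for example: If WaitForElement(doc, "UserName") Is Nothing Then MsgBox "Page did not load".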