I have a problem with the VBA code.
I would like to open this website: https://www.tnt.com/express/en_us/site/tracking.html and in Shipment numbers search box I would like to put active cells from Excel file. At the beginning I tried to put only a specific text for example: "777777".
I wrote the below code but unfortunately, the search button is empty and there is no error. I tried everything and I have no idea what should I change in my code.
Any clues? Thank you in advance.
HTML:
<input class="__c-form-field__text ng-touched ng-dirty ng-invalid" formcontrolname="query" pbconvertnewlinestocommasonpaste="" pbsearchhistorynavigation="" shamselectalltextonfocus="" type="search">
VBA:
Sub TNT2_tracker()
Dim objIE As InternetExplorer
Dim aEle As HTMLLinkElement
Dim y As Integer
Dim result As String
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate "https://www.tnt.com/express/en_us/site/tracking.html"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Dim webpageelement As Object
For Each webpageelement In objIE.document.getElementsByClassName("input")
If webpageelement.Class = "__c-form-field__text ng-pristine ng-invalid ng-touched" Then
webpageelement.Value = "777"
End If
Next webpageelement
End Sub
You could use the querySelector + class name to find an element.
something like
'Find the input box
objIE.document.querySelector("input.__c-form-field__text").value = "test"
'Find the search button and do a click
objIE.document.querySelector("button.__c-btn").Click
No need to loop through elements. Unless the site allows you to search multiple tracking numbers at the same time.
It seems automating this page is a litte tricky. If you change the value of the input field it doesn' t work. Nothing happen by clicking the submit button.
A look in the dom inspector shows several events for the input field. I checked them out and it seems we need to paste the value over the clipboard by trigger the paste event of the shipping field.
In order for this to work without Internet Explorer prompting, its security settings for the Internet zone must be set to allow pasting from the clipboard. I'm using a German version of IE, so I have problems explaining how to find the setting.
This macro works for me:
Sub TNT2_tracker()
Dim browser As Object
Dim url As String
Dim nodeDivWithInputField As Object
Dim nodeInputShipmentNumber As Object
Dim textToClipboard As Object
'Dataobject by late binding to use the clipboard
Set textToClipboard = CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
url = "https://www.tnt.com/express/en_us/site/tracking.html"
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set browser = CreateObject("internetexplorer.application")
browser.Visible = True
browser.navigate url
Do Until browser.ReadyState = 4: DoEvents: Loop
'Manual break for loading the page complitly
'Application.Wait (Now + TimeSerial(pause_hours, pause_minutes, pause_seconds))
Application.Wait (Now + TimeSerial(0, 0, 3))
'Get div element with input field for shipment number
Set nodeDivWithInputField = browser.Document.getElementsByClassName("pb-search-form-input-group")(0)
If Not nodeDivWithInputField Is Nothing Then
'If we got the div element ...
'First child element is the input field
Set nodeInputShipmentNumber = nodeDivWithInputField.FirstChild
'Put shipment number to clipboard
textToClipboard.setText "7777777"
textToClipboard.PutInClipboard
'Insert value by trigger paste event of the input field
Call TriggerEvent(browser.Document, nodeInputShipmentNumber, "paste")
'Click button
browser.Document.getElementsByClassName("__c-btn")(0).Click
Else
MsgBox "No input field for shipment number found."
End If
End Sub
And this function to trigger a html event:
Private Sub TriggerEvent(htmlDocument As Object, htmlElementWithEvent As Object, eventType As String)
Dim theEvent As Object
htmlElementWithEvent.Focus
Set theEvent = htmlDocument.createEvent("HTMLEvents")
theEvent.initEvent eventType, True, False
htmlElementWithEvent.dispatchEvent theEvent
End Sub
As #Stavros Jon alludes to..... there is a browserless way using xhr GET request via API. It returns json and thus you ideally need to use a json parser to handle the response.
I use jsonconverter.bas as the json parser to handle the response. Download raw code from here and add to standard module called JsonConverter . You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
Example request with dummy tracking number (deliberately passed as string):
Option Explicit
Public Sub TntTracking()
Dim json As Object, ws As Worksheet, trackingNumber As String
trackingNumber = "1234567" 'test input value. Currently this is not a valid input but is for demo.
Set ws = ThisWorkbook.Worksheets("Sheet1") 'for later use if writing something specific out
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.tnt.com/api/v3/shipment?con=" & trackingNumber & "&searchType=CON&locale=en_US&channel=OPENTRACK", False
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
'do something with results
Debug.Print json("tracker.output")("notFound").Count > 0
Debug.Print JsonConverter.ConvertToJson(json("tracker.output")("notFound"))
End Sub
Related
In order to improve the repeatitive work, I tried to access the Web site which is using in company using VBA.
So, I made code using VBA. And I checked it could be access the normal site such as google, youtube...
But, I don't know why it could not be access the company site.
VBA stopped this line
Set HTMLDoc = IE_ctrl.document
Thank you in advanced.
And I checked one different things(VBA Local values, type) between Normal and company site.
please check below 2 pictures.
Sub a()
Dim IE_ctrl As InternetExplorer
Dim HTMLDoc As HTMLDocument
Dim input_Data As IHTMLElement
Dim URL As String
URL = "https://www.google.com"
Set IE_ctrl = New InternetExplorer
IE_ctrl.Silent = True
IE_ctrl.Visible = True
IE_ctrl.navigate URL
Wait_Browser IE_ctrl
Set HTMLDoc = IE_ctrl.document
Wait_Browser IE_ctrl
Set input_Data = HTMLDoc.getElementsByClassName("text").Item
input_Data.Click
End Sub
Sub Wait_Browser(Browser As InternetExplorer, Optional t As Integer = 1)
While Browser.Busy
DoEvents
Wend
Application.Wait DateAdd("s", t, Now)
End Sub
Normal site(operating well.)
enter image description here
Company site(operating error.)
enter image description here
You can try the following code. Please read the comments. I can't say anymore because I don't know the page or the html of the page.
Sub a()
'Use late binding for what you need
Dim ie As Object
Dim nodeInputData As Object
Dim url As String
url = "https://www.google.com"
'Use the windows GUID to initialize the Internet Explorer, if you
'want to get access to a company page. This helps if there are
'security rules you can't access over other ways of initializing IE
'This don't work in most cases for pages in the "real" web
'Read here for more infos:
'https://blogs.msdn.microsoft.com/ieinternals/2011/08/03/default-integrity-level-and-automation/
Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
ie.Visible = True
ie.navigate url
'Waiting for the document to load
Do Until ie.readyState = 4: DoEvents: Loop
'If necessary, if there is dynamic content that must be loaded,
'after the ie reports, loading was ready
'(The last three values are: hours, minutes, seconds)
Application.Wait (Now + TimeSerial(0, 0, 1))
'I don't know your html. If you only want to click a button,
'you don't need a varable
'ie.document.getElementsByClassName("text")(0).Click
'will do the same like
Set nodeInputData = ie.document.getElementsByClassName("text")(0)
nodeInputData.Click
'A short explanation of getElementsByClassName() and getElementsByTagName():
'Both methods create a node collection of all html elements that was found
'by the creteria in the brackets. This is because there can be any number of
'html elements with specified class names or tag names. If, for example,
'3 html elements with the class name "Text" were found, a node collection
'with three elements is created by getElementsByClassName("Text").
'These have the indices 0 to 2, as in an array. The individual elements are
'also addressed via these indices. They are indicated in round brackets.
End Sub
I'm new to web scraping and the HTML language.
I'm trying to write a code in VBA to extract data from the following website:
https://companies.govmu.org:4343/MNSOnlineSearch/
I have an Excel sheet with over 5000 company names and their respective "File No" in columns A and B respectively, and I need to input their "Status" (either "Live" or "Defunct") in column C. This will be done after searching for each company by "File No" and then extracting their status to the Excel sheet.
The issue is that I can't seem to get the element containing the data that I need.
I've already written the bit of code which will extract the "File No" from my Excel sheet, paste it on the webpage in the "File No" search box, and run the search. (You can try searching C5113, as an example).
However, on the resulting webpage, I've tried getting the element containing the data that I need, but it does not work.
For example, I tried to MsgBox (MsgBox is my personal way to check whether my variable contains the data I need) the inner HTML of the tag fieldset (fs) with ID "CompanyList" as shown in the code below, but it returns an error.
I've also tried with another variable named div, of data type HTMLDivElement, and then getting the element by ID "companies".
And finally, I've also tried looping through a variable of type IHTMLElementCollection to look for the element that I need, but it still does not show the element that I need (it shows other elements that I don't need).
Option Explicit
Sub ExtractStatusDetails()
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim resultHtml As HTMLDocument
Dim fs As IHTMLElement
Dim searchBoxes As IHTMLElementCollection
Dim searchButton As Object
Dim homePage As String
homePage = "https://companies.govmu.org:4343/MNSOnlineSearch/"
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate homePage
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set html = ie.document
Set searchBoxes = html.getElementsByClassName("col-md-6 col-lg-4")
searchBoxes(0).innerHTML = Replace(searchBoxes(0).innerHTML, "placeholder", "value")
searchBoxes(0).innerHTML = Replace(searchBoxes(0).innerHTML, "Search company by File No...", "C63")
Set searchButton = searchBoxes(0).getElementsByClassName("btn btn-large btn-primary btn-raised")
searchButton(0).Click
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set resultHtml = ie.document
Set fs = resultHtml.getElementById("CompanyList")
MsgBox fs.innerHTML
ie.Quit
End Sub
The page does an xmlhttp POST request which retrieves data from a backend data store (likely Oracle GlassFish > JDBC API > data repository e.g. MySQL) . It returns all similar matches, possibly including exact.
You can find the POST request in the network traffic of browser dev tools after you enter the fileNo and press the search button.
Below is a function you can call in a loop over your fileNos to retrieve the company status
Option Explicit
Public Sub test()
Dim fileNo As String, xmlhttp As Object
fileNo = "C5113"
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
MsgBox GetCompanyStatus(fileNo, xmlhttp)
End Sub
Public Function GetCompanyStatus(ByVal fileNo As String, ByVal xmlhttp As Object) As String
Dim html As HTMLDocument, body As String, fileNos As Object, i As Long
Set html = New HTMLDocument
body = "tabs=tab-1&searchByName=&searchByFileNo=PLACEHOLDER&submitCompanies=&searchByBusName=&searchByBRN=&searchByIncDateFrom=&searchByIncDateTo=&doAction=search"
With xmlhttp
.Open "POST", "https://companies.govmu.org:4343/MNSOnlineSearch/GetCompanies", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send Replace$(body, "PLACEHOLDER", fileNo)
html.body.innerHTML = .responseText
Set fileNos = html.querySelectorAll("td.tdFileNo")
If fileNos.Length > 0 Then
For i = 0 To fileNos.Length - 1
If fileNos.item(i).innerText = fileNo Then
GetCompanyStatus = html.querySelectorAll("td.tdStatus").item(i).innerText
Exit Function
End If
Next i
End If
GetCompanyStatus = "Not found"
End With
End Function
I would instead consider how you can group your requests. As you can post partial file numbers you could cut down on the number of requests considerably by doing in batches with partial file numbers e.g. search for C5 or C51; the backend then does something like "C5%" to return all matches starting with the specified string, and then loop those results searching for your file numbers of interest that fall within that range.
You could have a dictionary with fileNo as key and status as value and update this as you loop the results returned by a request. I think the number of keys is constrained by Long, so no problem I think for storing all your fileNos at the start, in a dictionary, and updating later during requests. You could even have multiple dictionaries that host ranges of fileNos, like the volumes of the good old Encyclopædia Britannica. This would limit loops to dicts you hope to populate from the same request, for example. Is something to play around with an explore.
With dynamic web pages that display a table of retrieved data, I’ve found that both MSXML2.XMLHTTP and the Internet Explorer object usually can’t access this data. A good example is https://www.tiff.net/tiff/films.html. Both techniques won’t retrieve any of the movie data – just the surrounding web page. The code I’ve tried is as follows:
Function getHTTP(ByVal sReq As String) As Variant
On Error GoTo onErr
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", sReq, False
.send
getHTTP = StrConv(.responseBody, 64)
End With
Exit Function
onErr: MsgBox "Error " & Err & ": " & Err.Description, 49, "Error opening site"
End Function
Function GetHTML(ByVal strURL As String) As Variant
Dim oIE As InternetExplorer
Dim hElm As IHTMLElement
Set oIE = New InternetExplorer
oIE.Navigate strURL
Do While (oIE.Busy Or oIE.ReadyState <> READYSTATE_COMPLETE)
DoEvents
Loop
Set hElm = oIE.Document.all.tags("html").Item(0)
GetHTML = hElm.outerHTML
Set oIE = Nothing
Set hElm = Nothing
End Function
But there is a way to manually retrieve the movie data – just follow these steps with Microsoft Edge or Internet Explorer:
Right-click on one of the movies
Choose “inspect element." The DevTools console opens.
At the bottom-left of the screen, click on the “html” tab.
Right-click the tab. Choose “copy.”
Open notepad and paste what you’ve copied.
You now have the movie data and can save it to a file for parsing. My question: Is there any way to get this data programmatically?
Why Json? Because the page is loaded using json data
To View: Use Google Chrome --> Press F12 --> Load URL --> Goto Network tab
Code:
Sub getHTTP()
Dim Url As String, data As String
Dim xml As Object, JSON As Object, colObj, item
Url = "https://www.tiff.net/data/films-events-2018.json?q=1513263947586"
Set xml = CreateObject("MSXML2.ServerXMLHTTP")
With xml
.Open "GET", Url, False
.send
data = .responseText
End With
Set JSON = JsonConverter.ParseJson(data)
Set colObj = JSON("items")
For Each item In colObj
Debug.Print item("title")
Debug.Print item("description")
For Each c1 In item("cast")
Debug.Print c1
Next
For Each c2 In item("countries")
Debug.Print c2
Next
Next
End Sub
Output
Installation of JsonConverter
Download the latest release
Import JsonConverter.bas into your project (Open VBA Editor, Alt + F11; File > Import File)
Add Dictionary reference/class
For Windows-only, include a reference to "Microsoft Scripting Runtime"
For Windows and Mac, include VBA-Dictionary
Tree View of Data
Here are the film titles using IE (you can use same process to get directors)
Option Explicit
Public Sub GetFilms()
Dim IE As New InternetExplorer, html As HTMLDocument, films As Object, i As Long
With IE
.Visible = True
.navigate "https://www.tiff.net/tiff/films.html"
While .Busy Or .readyState < 4: DoEvents: Wend
Set films = .document.querySelectorAll("[target=_self]")
For i = 0 To films.Length - 1
Debug.Print films.item(i).innerText
Next
.Quit '<== Remember to quit application
End With
End Sub
XHR is too fast for this, with the URL provided, but IE is just fine.
If you inspect the HTML you can see each film has the following commonality:
There is an attribute within the a tag called target whose value is _self.
You can use an attribute CSS selector to gather all of these matching elements using the querySelectorAll method of document.
CSS selector (sample):
I would be interested in if this can be solved for getting the film descriptions by parsing the HTML. I had thought the presence of the comments was obscuring the film descriptions. A regex which selects the text within these in theory "<!-- react-text: \d+ -->([^...].+?(?=<))" seems to fail when applied to the .innerHTML as did attempts to swop out the comment start and finish with regex.
I am trying to scrap data out of a section of a webpage. To get into the section I need to fill in a captcha security code and hit a button, but that is alright because the security code is actually written in the html of the page. So, I am creating an IE object, driving it to the webpage, getting the captcha security code, writing it in the proper box, hitting the submit button and then getting the html document so I can scrap data out of it.
Nonetheless I am executing the steps exatcly in the order I mentioned, it seems that the html document that is being gotten is not the one from the page after I pass through the captcha validation, but from the page before the captcha validation.
Would anyone know what must I do to get the correct html document and conseuently be able to scrap the data I really want? Thank you.
The subprocedure's code follows next:
'Getting National fuel prices from ANP
Sub subANPNationalFuelPrices()
'Creating variables for the URL and the HTML files
Dim urlANP As String: urlANP = "http://www.anp.gov.br/preco/prc/Resumo_Semanal_Index.asp"
Dim htmlANP1 As HTMLDocument
'Creating the IE object
Dim IE As InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
'Making sure that the webpage is fully load
IE.navigate (urlANP)
Do While IE.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Getting your data"
DoEvents
Loop
Set htmlANP1 = IE.document
'Getting the Captcha Password
Dim strCaptchaPassword As String
Dim colMyCollection As IHTMLElementCollection
Set colMyCollection = htmlANP1.getElementById("divQuadro").all
Dim objLabel As IHTMLElement
For Each objLabel In colMyCollection
strCaptchaPassword = strCaptchaPassword & objLabel.innerText
Next objLabel
'Getting the input box object and getting it the correct password
Dim objInputBox As IHTMLElement
Set objInputBox = htmlANP1.getElementById("txtValor")
objInputBox.Value = strCaptchaPassword
'Getting the submit button object and clicking it
Dim objInputButton As IHTMLElement
Set objInputButton = htmlANP1.getElementById("image1")
objInputButton.Click
'Getting the true rich data HTML
Set htmlANP1 = IE.document
'Extracting the data from the html document
Dim rngValues As range: Set rngValues = Sheet1.range("B17")
Dim strValues(35) As String
Dim dblValues(35) As Double
Dim objElement1 As IHTMLElement
Set objElement1 = htmlANP1.getElementsByTagName("TABLE")(1)
Dim colCollection1 As IHTMLElementCollection
Set colCollection1 = objElement1.all
Dim intTempCount As Integer
Dim objTempElement As IHTMLElement
intTempCount = 32
For Each objTempElement In colCollection1
Sheet1.Cells(intTempCount, 3) = objTempElement.tagName
Sheet1.Cells(intTempCount, 4) = objTempElement.innerText
intTempCount = intTempCount + 1
Next objTempElement
End sub
You are not waiting for the new webpage to load after clicking the button on the captcha. Either check the ready state of IE again or end you code here be starting a timer which starts your code off again in X seconds AND then checks the ready state of IE and Document.
I do scraping on a system using iFrame so using IE.Readystate isn't very reliable. Usually I have to wait for another element to 'exist', but using IsObject(element) hasn't been very reliable either. What I've had to do is use a loop in my main code that calls a function so if I'm waiting for something to load and I know that after the page loads, there's an element with the ID "UserName", then I do this..
...
Do Until IsErr(doc, "UserName") = False: Loop
...
Function IsErr(doc As HTMLDocument, ID As String) As Boolean
IsErr = True
On Error GoTo ExitFunction:
Debug.Print left(doc.getElementById(ID).innerHTML, 1)
IsErr = False
Exit Function
ExitFunction:
End Function
I could just do a loop statement that keeps trying to debug it, but that would be a nightmare with the error handling so if you use a separate function for the printing, it can exit the function after the error, then the loop re-initiates the function and it will do this forever until the next element exists.
I would like to know how can I export data from VB6 textbox to a HTML textbox? It could be a simple html page or an asp page.
for example, on my VB6 form, i have an name field. Upon clicking of a button on the VB6 form, the data in the name field will be exported to a textbox on the html page.
Thank you all for help and time for reading this.
To see this demo in action, and be able to follow it through and learn how to grab from it what you need:
Create a form with a lable over a textbox, and stick 1 command buttons on the form. Don't rename any of them - the program expects the text1, command1
The following CODE is the complete FORM CODE to copy/paste into it.
Add refernce to your project from (Project=>References)Microsoft Internet Controls,Microsoft HTML Object Library,
Option Explicit
Public TargetIE As SHDocVw.InternetExplorer
Private Sub Command1_Click() ' Send text to first IE-document found
GetTheIEObjectFromSystem
SendTextToActiveElementWithSubmitOptionSet (False)
End Sub
Private Sub Form_Load()
Me.Text1 = "This is a sample text message set and submitted programmatically" 'make text1 multiline in design
Me.Command1.Caption = "Text to the first IE browser document found"
End Sub
Public Sub GetTheIEObjectFromSystem(Optional ByVal inurl As String = ".") ' "." will be found in ALL browser URLs
Dim SWs As New SHDocVw.ShellWindows
Dim IE As SHDocVw.InternetExplorer
Dim Doc As Object
For Each IE In SWs
If TypeOf IE.Document Is HTMLDocument Then ' necessary to avoid Windows Explorer
If InStr(IE.LocationURL, inurl) > 0 Then
Set TargetIE = IE
Exit For
End If
End If
Next
Set SWs = Nothing
Set IE = Nothing
End Sub
Private Sub SendTextToActiveElementWithSubmitOptionSet(ByVal bSubmitIt As Boolean)
Dim TheActiveElement As IHTMLElement
Set TheActiveElement = TargetIE.Document.activeElement
If Not TheActiveElement.isTextEdit Then
MsgBox "Active element is not a text-input system"
Else
TheActiveElement.Value = Me.Text1.Text
Dim directParent As IHTMLElement
If bSubmitIt Then
Dim pageForm As IHTMLFormElement
Set directParent = TheActiveElement.parentElement
' find its parent FORM element by checking parent nodes up and up and up until found or BODY
Do While (UCase(directParent.tagName) <> "FORM" And UCase(directParent.tagName <> "BODY"))
Set directParent = directParent.parentElement
Loop
If UCase(directParent.tagName) = "FORM" Then
Set pageForm = directParent
pageForm.submit 'intrinsic Form-element Method
Else
MsgBox ("Error: No form unit for submitting the text on this page!")
End If
End If
Set pageForm = Nothing
Set directParent = Nothing
End If
Set TheActiveElement = Nothing
Set TargetIE = Nothing
End Sub