Website scraping: website search box has no value - html

I am trying to cross-check a large body of data against a specific website (https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx).
The goal is to search for many company names from a larger list in Excel to get their founding dates. For now I am starting with a single name to get it running. I am having trouble in my main code because the input element has no value attribute in the HTML:
<input name="ctl00$ContentPlaceHolder1$frmEntityName" type="text" id="ctl00_ContentPlaceHolder1_frmEntityName" tabindex="4" size="30" maxlength="120" class="txtNormal" onkeyup="KeyEvent1(this.id)">
I tried the following:
Sub click_search()
Dim i As SHDocVw.InternetExplorer
Set i = New InternetExplorer
i.Visible = True
i.Navigate "https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx"
Do While i.ReadyState <> READYSTATE_COMPLETE
Loop
Dim idoc As MSHTML.HTMLDocument
Set idock = i.Document
idoc.getElementsByTagName("input").Item("ctl00$ContentPlaceHolder1$frmEntityName").Value = "10X Genomics Inc"
End Sub
I believe the problem is that the HTML has no value="" attribute to begin with; it only appears in the HTML after you type something in.
How do I fix this, and how do I then click the search button?
The error is
"Object variable or With block variable not set"

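Always use Option Explicit at the top of every VBA code file. Here it would have caught the typo in Set idock = i.Document: idock is an undeclared variable, so idoc is never set and stays Nothing, which is what raises "Object variable or With block variable not set".
If the webpage in question contains ids for the elements you are interested in, use getElementById() to access them. The following code runs, but the search does not find any records.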
Option Explicit
'Requires references to Microsoft Internet Controls and Microsoft HTML Object Library
Sub click_search()
Dim i As SHDocVw.InternetExplorer
Dim idoc As MSHTML.HTMLDocument
Set i = New InternetExplorer
i.Visible = True
i.Navigate "https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx"
'Wait until the page is loaded; DoEvents keeps Excel responsive
Do While i.Busy Or i.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set idoc = i.Document
'Address the elements by their id attributes
idoc.getElementById("ctl00_ContentPlaceHolder1_frmEntityName").Value = "10X Genomics Inc"
idoc.getElementById("ctl00_ContentPlaceHolder1_frmFileNumber").Value = "1"
idoc.getElementById("ctl00_ContentPlaceHolder1_btnSubmit").Click
End Sub
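Since the goal is to look up many names from Excel, the single-name code can be wrapped in a loop. The sketch below is a hypothetical extension, not part of the answer: it assumes the names are in column A of Sheet1 starting at row 2, fills only the entity name field (not the file number), and leaves reading the founding date open because the results page markup is not shown here. IsPageReady and SearchAllNames are illustrative names.

```vba
Option Explicit

' Hypothetical helper: True once IE is idle and reports READYSTATE_COMPLETE (4).
Public Function IsPageReady(ByVal isBusy As Boolean, ByVal readyState As Long) As Boolean
    IsPageReady = (Not isBusy) And (readyState = 4)
End Function

' Hypothetical batch version of click_search.
' Assumption: company names are in Sheet1, column A, starting at row 2.
Public Sub SearchAllNames()
    Dim ie As Object, doc As Object, ws As Worksheet
    Dim lastRow As Long, r As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True

    For r = 2 To lastRow
        ie.navigate "https://icis.corp.delaware.gov/Ecorp/EntitySearch/NameSearch.aspx"
        Do Until IsPageReady(ie.Busy, ie.readyState): DoEvents: Loop

        Set doc = ie.document
        doc.getElementById("ctl00_ContentPlaceHolder1_frmEntityName").Value = CStr(ws.Cells(r, "A").Value)
        doc.getElementById("ctl00_ContentPlaceHolder1_btnSubmit").Click

        ' Give the postback a moment to start; right after Click, Busy may still be False
        Application.Wait Now + TimeSerial(0, 0, 1)
        Do Until IsPageReady(ie.Busy, ie.readyState): DoEvents: Loop

        ' TODO: read the founding date from ie.document here. The results page
        ' markup is not shown in the question, so this step is left open.
    Next r

    ie.Quit
End Sub
```

Navigating back to the search page on every iteration keeps each search independent of the previous results page, at the cost of one extra page load per name.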


How to access the Web using VBA? Please check my code

To cut down on repetitive work, I tried to use VBA to access a website that is used within my company.
So I wrote the code below and confirmed that it can access normal sites such as Google and YouTube.
But I don't know why it cannot access the company site.
VBA stops at this line:
Set HTMLDoc = IE_ctrl.document
Thank you in advance.
I also noticed one difference (the values and types in the VBA Locals window) between the normal site and the company site.
Please check the two pictures below.
Sub a()
Dim IE_ctrl As InternetExplorer
Dim HTMLDoc As HTMLDocument
Dim input_Data As IHTMLElement
Dim URL As String
URL = "https://www.google.com"
Set IE_ctrl = New InternetExplorer
IE_ctrl.Silent = True
IE_ctrl.Visible = True
IE_ctrl.navigate URL
Wait_Browser IE_ctrl
Set HTMLDoc = IE_ctrl.document
Wait_Browser IE_ctrl
Set input_Data = HTMLDoc.getElementsByClassName("text").Item
input_Data.Click
End Sub
Sub Wait_Browser(Browser As InternetExplorer, Optional t As Integer = 1)
While Browser.Busy
DoEvents
Wend
Application.Wait DateAdd("s", t, Now)
End Sub
[Screenshot: normal site (works)]
[Screenshot: company site (error)]
You can try the following code. Please read the comments. I can't say any more because I don't know the page or its HTML.
Sub a()
'Use late binding for the objects you need
Dim ie As Object
Dim nodeInputData As Object
Dim url As String
url = "https://www.google.com"
'Use the class GUID (InternetExplorerMedium) to initialize Internet Explorer
'if you want to access a company page. This helps if there are
'security rules that block the other ways of initializing IE.
'This doesn't work in most cases for pages on the "real" web.
'Read here for more information:
'https://blogs.msdn.microsoft.com/ieinternals/2011/08/03/default-integrity-level-and-automation/
Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
ie.Visible = True
ie.navigate url
'Wait for the document to load
Do Until ie.readyState = 4: DoEvents: Loop
'If necessary: pause a little longer when dynamic content is still loading
'after IE reports that loading is complete
'(The three arguments of TimeSerial are: hours, minutes, seconds)
Application.Wait (Now + TimeSerial(0, 0, 1))
'I don't know your HTML. If you only want to click a button,
'you don't need a variable:
'ie.document.getElementsByClassName("text")(0).Click
'does the same as
Set nodeInputData = ie.document.getElementsByClassName("text")(0)
nodeInputData.Click
'A short explanation of getElementsByClassName() and getElementsByTagName():
'Both methods create a node collection of all HTML elements that match
'the criteria in the brackets. This is because there can be any number of
'HTML elements with a given class name or tag name. If, for example,
'3 HTML elements with the class name "text" were found, a node collection
'with three elements is created by getElementsByClassName("text").
'These have the indices 0 to 2, as in an array. The individual elements are
'also addressed via these indices, given in round brackets.
End Sub

VBA Web search button - GetElementsbyClassName

I have a problem with my VBA code.
I would like to open this website: https://www.tnt.com/express/en_us/site/tracking.html and enter the values of the active cells from my Excel file in the Shipment numbers search box. To start, I tried entering only a specific text, for example "777777".
I wrote the code below, but unfortunately the search box stays empty and there is no error. I have tried everything and I have no idea what I should change in my code.
Any clues? Thank you in advance.
HTML:
<input class="__c-form-field__text ng-touched ng-dirty ng-invalid" formcontrolname="query" pbconvertnewlinestocommasonpaste="" pbsearchhistorynavigation="" shamselectalltextonfocus="" type="search">
VBA:
Sub TNT2_tracker()
Dim objIE As InternetExplorer
Dim aEle As HTMLLinkElement
Dim y As Integer
Dim result As String
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate "https://www.tnt.com/express/en_us/site/tracking.html"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Dim webpageelement As Object
For Each webpageelement In objIE.document.getElementsByClassName("input")
If webpageelement.Class = "__c-form-field__text ng-pristine ng-invalid ng-touched" Then
webpageelement.Value = "777"
End If
Next webpageelement
End Sub
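You could use querySelector with the tag and class name to find an element. (Your loop never matches anything because getElementsByClassName("input") looks for elements whose class is "input", not for input tags; also, the property holding the class list is className, not Class.)
Something like: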
'Find the input box
objIE.document.querySelector("input.__c-form-field__text").value = "test"
'Find the search button and do a click
objIE.document.querySelector("button.__c-btn").Click
There is no need to loop through the elements, unless the site lets you search for multiple tracking numbers at the same time.
Automating this page seems to be a little tricky. If you only change the value of the input field, it doesn't work: nothing happens when you click the submit button.
A look in the DOM inspector shows several events attached to the input field. I checked them, and it seems we need to paste the value via the clipboard by triggering the paste event of the shipment number field.
For this to work without Internet Explorer prompting, IE's security settings for the Internet zone must allow pasting from the clipboard. I'm using a German version of IE, so I can't easily tell you the exact English name of the setting.
This macro works for me:
Sub TNT2_tracker()
Dim browser As Object
Dim url As String
Dim nodeDivWithInputField As Object
Dim nodeInputShipmentNumber As Object
Dim textToClipboard As Object
'MSForms DataObject via late binding, used to access the clipboard
Set textToClipboard = CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
url = "https://www.tnt.com/express/en_us/site/tracking.html"
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set browser = CreateObject("internetexplorer.application")
browser.Visible = True
browser.navigate url
Do Until browser.ReadyState = 4: DoEvents: Loop
'Extra pause so the page can finish loading completely
'Application.Wait (Now + TimeSerial(pause_hours, pause_minutes, pause_seconds))
Application.Wait (Now + TimeSerial(0, 0, 3))
'Get div element with input field for shipment number
Set nodeDivWithInputField = browser.Document.getElementsByClassName("pb-search-form-input-group")(0)
If Not nodeDivWithInputField Is Nothing Then
'If we got the div element ...
'First child element is the input field
Set nodeInputShipmentNumber = nodeDivWithInputField.FirstChild
'Put shipment number to clipboard
textToClipboard.setText "7777777"
textToClipboard.PutInClipboard
'Insert value by trigger paste event of the input field
Call TriggerEvent(browser.Document, nodeInputShipmentNumber, "paste")
'Click button
browser.Document.getElementsByClassName("__c-btn")(0).Click
Else
MsgBox "No input field for shipment number found."
End If
End Sub
And this procedure to trigger an HTML event:
Private Sub TriggerEvent(htmlDocument As Object, htmlElementWithEvent As Object, eventType As String)
Dim theEvent As Object
htmlElementWithEvent.Focus
Set theEvent = htmlDocument.createEvent("HTMLEvents")
theEvent.initEvent eventType, True, False
htmlElementWithEvent.dispatchEvent theEvent
End Sub
As @Stavros Jon alludes to, there is a browserless way: an XHR GET request to the site's API. It returns JSON, so ideally you should use a JSON parser to handle the response.
I use JsonConverter.bas as the JSON parser. Download the raw code from here and add it to a standard module called JsonConverter. You then need to go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
Example request with a dummy tracking number (deliberately passed as a string):
Option Explicit
Public Sub TntTracking()
Dim json As Object, ws As Worksheet, trackingNumber As String
trackingNumber = "1234567" 'test input value. This is not currently a valid tracking number; it is for demonstration only.
Set ws = ThisWorkbook.Worksheets("Sheet1") 'for later use if writing something specific out
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.tnt.com/api/v3/shipment?con=" & trackingNumber & "&searchType=CON&locale=en_US&channel=OPENTRACK", False
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
'do something with results
Debug.Print json("tracker.output")("notFound").Count > 0
Debug.Print JsonConverter.ConvertToJson(json("tracker.output")("notFound"))
End Sub
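To cover the original goal of tracking the values in the selected cells, the request can be repeated per cell. This is a minimal sketch, assuming the same endpoint and response shape as the example above, that JsonConverter is installed as described, and that writing True/False ("not found") into the column to the right of each cell is acceptable. TntShipmentUrl and TntTrackSelection are hypothetical names.

```vba
Option Explicit

' Hypothetical helper: builds the same request URL as the example above.
Public Function TntShipmentUrl(ByVal trackingNumber As String) As String
    TntShipmentUrl = "https://www.tnt.com/api/v3/shipment?con=" & trackingNumber & _
                     "&searchType=CON&locale=en_US&channel=OPENTRACK"
End Function

' Hypothetical sketch: one GET request per non-empty selected cell.
' Assumption: the response has the same "tracker.output" / "notFound" shape.
Public Sub TntTrackSelection()
    Dim cell As Range, json As Object, con As String
    If TypeName(Selection) <> "Range" Then Exit Sub

    For Each cell In Selection.Cells
        con = Trim$(CStr(cell.Value))
        If Len(con) > 0 Then
            With CreateObject("MSXML2.XMLHTTP")
                .Open "GET", TntShipmentUrl(con), False
                .send
                Set json = JsonConverter.ParseJson(.responseText)
            End With
            ' True means the API reported this number as not found
            cell.Offset(0, 1).Value = (json("tracker.output")("notFound").Count > 0)
        End If
    Next cell
End Sub
```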

Check if website checkbox is checked with Excel VBA

I am trying to write a VBA macro that reads the HTML, checks whether a specific checkbox is checked, and writes either "checked" or "unchecked" in a cell. But I am having difficulties with VBA as I do not use it often; any advice will be appreciated.
HTML
<input id="foo1" type="checkbox" name="Device" value="iPad"
checked="checked">
VBA
Sub getValue()
Dim IE As Object: Set IE = CreateObject("InternetExplorer.Application")
Dim ws As Worksheet: Set ws = ThisWorkbook.Sheets("Sheet1")
Dim Country As String
With IE
.Visible = False
.navigate ws.Range("A3").Value
Do
DoEvents
Loop Until .readyState = 4
End With
Set oShell = CreateObject("WScript.Shell")
Dim document
document.getElementById("checkBox(iPad)")
Item(0).Checked = True
End Sub
Try
Debug.Print ie.document.querySelector("#foo1").getAttribute("checked") ="checked"
Without a URL to test with, I am not sure whether there is a .Checked property you can evaluate for True (ie.document.querySelector("#foo1").Checked).
Without more HTML it is hard to say whether this will be able to access the required element. There may be forms, frames or iframes to negotiate.
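Putting this together for the original goal of writing "checked" or "unchecked" to a cell, a sketch might look like the following. It assumes the URL is in Sheet1!A3 as in the question, that the result should go to B3, and, per the caveat above, that the element exposes a .Checked property. CheckboxText and WriteCheckboxState are hypothetical names.

```vba
Option Explicit

' Hypothetical helper: maps the checkbox state to the text written to the sheet.
Public Function CheckboxText(ByVal isChecked As Boolean) As String
    If isChecked Then CheckboxText = "checked" Else CheckboxText = "unchecked"
End Function

' Hypothetical sketch. Assumptions: URL in Sheet1!A3, result written to B3,
' checkbox id "foo1" as in the posted HTML.
Public Sub WriteCheckboxState()
    Dim ie As Object, ws As Worksheet, box As Object

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.navigate ws.Range("A3").Value
    Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop

    Set box = ie.document.querySelector("#foo1")
    If box Is Nothing Then
        ws.Range("B3").Value = "not found"   ' element may be inside a frame/iframe
    Else
        ws.Range("B3").Value = CheckboxText(box.Checked)
    End If

    ie.Quit
End Sub
```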
This can also be done by looping through all the input elements on the page until you find the one with the right name. This will of course be slower than querySelector, but it can be useful if you need to change multiple input elements.
Dim objCollection As Object
Dim i As Long
Set objCollection = ie.Document.getElementsByTagName("input")
i = 0
'Loop through all input elements and uncheck those named "Device"
While i < objCollection.Length
If objCollection(i).Name = "Device" Then
objCollection(i).Checked = False
End If
i = i + 1
Wend
If you only have one checkbox, I would certainly go with querySelector, as @QHarr suggested.

VBA - getElementById works for simple website but not another?

I'm currently stuck getting a VBA script to retrieve the value of an input box from this Sudoku website. However, I was able to get the value of a paragraph element with the id "contact" from my own, simpler website using the same code (after switching the URL and id names, of course).
Any attempt to research further brings up articles and blogs that discuss what I've already done correctly, so I suspect I am not researching it properly.
Here is my code:
Sub GetTable()
Dim ieApp As InternetExplorer
Dim ieDoc As Object
Dim sudokuCell As Object
Dim url, id, content As String
Dim i As Integer
Set ieApp = New InternetExplorer
ieApp.Visible = True
url = "http://www.websudoku.com/"
ieApp.navigate url
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.READYSTATE = READYSTATE_COMPLETE: DoEvents: Loop
Set ieDoc = ieApp.document
If ieDoc Is Nothing Then
MsgBox ("Nothing")
'Else
' MsgBox ("Something")
End If
For i = 0 To 8
Set sudokuCell = ieDoc.getElementById("f00")
content = sudokuCell.innerText
MsgBox (content)
Next i
ieApp.Quit
Set ieApp = Nothing
End Sub
And here is an example of the html for a cell which is blank:
<td class="g0" id="c00"><input class="d0" size="2" autocomplete="off"
name="8iz6n11" maxlength="1" onblur="j8(this)" id="f00"></td>
And here is one for cell that is prefilled with a number:
<td class="f0" id="c10"><input class="s0" size="2" autocomplete="off"
name="s8iz6n21" readonly="" value="7" id="f10"></td>
I have tried both "c00" and "f00" without success. Also, while I believe the problem at hand is that I am not retrieving the element, I am concerned that the .innerText property won't retrieve the values.
First: the website uses a FRAME, so your VBA code is not actually accessing the frame's document. You need to navigate to the frame's own URL instead; change your url variable to the following (the frame's src property):
url = "http://view.websudoku.com/?"
Second: the elements you are trying to get values from are INPUT elements, so you should use the Value property instead of innerText:
content = sudokuCell.Value
Third and last: I am not sure what your code is supposed to do inside the loop, as it keeps reading the value of the same f00 element. I assume you intend to loop through all the input elements, and that you only hit the FRAME issue explained above, so the loop itself should not give you trouble.
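A sketch of such a loop, reading all 81 inputs from the frame URL into Sheet1!A1:I9. It assumes the ids run from f00 to f88 and that the first digit is the column and the second the row; if the grid comes out transposed, swap the digits in SudokuCellId. SudokuCellId and ReadSudokuGrid are hypothetical names, and the target sheet is an assumption.

```vba
Option Explicit

' Hypothetical helper. Assumption: id = "f" & column digit & row digit.
Public Function SudokuCellId(ByVal row As Long, ByVal col As Long) As String
    SudokuCellId = "f" & col & row
End Function

' Hypothetical sketch: copies the puzzle into Sheet1!A1:I9.
Public Sub ReadSudokuGrid()
    Dim ie As Object, doc As Object, cellInput As Object
    Dim r As Long, c As Long

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate "http://view.websudoku.com/?"   ' the frame's src, per the answer
    Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop
    Set doc = ie.document

    For r = 0 To 8
        For c = 0 To 8
            Set cellInput = doc.getElementById(SudokuCellId(r, c))
            If Not cellInput Is Nothing Then
                ' Blank cells give an empty Value; prefilled cells give the digit
                ThisWorkbook.Worksheets("Sheet1").Cells(r + 1, c + 1).Value = cellInput.Value
            End If
        Next c
    Next r

    ie.Quit
End Sub
```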

VBA - problems with getting html from a website after hitting submit button

I am trying to scrape data from a section of a webpage. To get into that section I need to fill in a captcha security code and hit a button, but that is fine because the security code is actually written in the HTML of the page. So I am creating an IE object, navigating it to the webpage, getting the captcha security code, writing it in the proper box, hitting the submit button and then getting the HTML document so I can scrape data from it.
Although I am executing the steps exactly in the order I mentioned, it seems that the HTML document I get is not the one from the page after the captcha validation but the one from before it.
Would anyone know what I must do to get the correct HTML document and consequently be able to scrape the data I really want? Thank you.
The subprocedure's code follows next:
'Getting National fuel prices from ANP
Sub subANPNationalFuelPrices()
'Creating variables for the URL and the HTML files
Dim urlANP As String: urlANP = "http://www.anp.gov.br/preco/prc/Resumo_Semanal_Index.asp"
Dim htmlANP1 As HTMLDocument
'Creating the IE object
Dim IE As InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
'Making sure that the webpage is fully load
IE.navigate (urlANP)
Do While IE.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Getting your data"
DoEvents
Loop
Set htmlANP1 = IE.document
'Getting the Captcha Password
Dim strCaptchaPassword As String
Dim colMyCollection As IHTMLElementCollection
Set colMyCollection = htmlANP1.getElementById("divQuadro").all
Dim objLabel As IHTMLElement
For Each objLabel In colMyCollection
strCaptchaPassword = strCaptchaPassword & objLabel.innerText
Next objLabel
'Getting the input box object and getting it the correct password
Dim objInputBox As IHTMLElement
Set objInputBox = htmlANP1.getElementById("txtValor")
objInputBox.Value = strCaptchaPassword
'Getting the submit button object and clicking it
Dim objInputButton As IHTMLElement
Set objInputButton = htmlANP1.getElementById("image1")
objInputButton.Click
'Getting the true rich data HTML
Set htmlANP1 = IE.document
'Extracting the data from the html document
Dim rngValues As range: Set rngValues = Sheet1.range("B17")
Dim strValues(35) As String
Dim dblValues(35) As Double
Dim objElement1 As IHTMLElement
Set objElement1 = htmlANP1.getElementsByTagName("TABLE")(1)
Dim colCollection1 As IHTMLElementCollection
Set colCollection1 = objElement1.all
Dim intTempCount As Integer
Dim objTempElement As IHTMLElement
intTempCount = 32
For Each objTempElement In colCollection1
Sheet1.Cells(intTempCount, 3) = objTempElement.tagName
Sheet1.Cells(intTempCount, 4) = objTempElement.innerText
intTempCount = intTempCount + 1
Next objTempElement
End sub
You are not waiting for the new webpage to load after clicking the captcha button. Either check the ready state of IE again, or end your code there by starting a timer (for example with Application.OnTime) that restarts your code after X seconds AND then checks the ready state of both IE and the document.
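A minimal sketch of the first option, assuming late-bound IE and a 30-second default timeout (both arbitrary choices). VBA's Timer resets at midnight, so the elapsed-time helper allows for that. WaitForIE and ElapsedSeconds are hypothetical names.

```vba
Option Explicit

' Hypothetical helper: seconds between two Timer readings,
' allowing for Timer's reset to 0 at midnight.
Public Function ElapsedSeconds(ByVal startT As Single, ByVal nowT As Single) As Single
    If nowT >= startT Then
        ElapsedSeconds = nowT - startT
    Else
        ElapsedSeconds = 86400 - startT + nowT
    End If
End Function

' Hypothetical helper: waits until IE is idle and both IE and its document
' report "complete". Returns False if timeoutSec passes first.
Public Function WaitForIE(ByVal ie As Object, Optional ByVal timeoutSec As Single = 30) As Boolean
    Dim t0 As Single
    t0 = Timer
    Do
        DoEvents
        If Not ie.Busy And ie.readyState = 4 Then
            If ie.document.readyState = "complete" Then
                WaitForIE = True
                Exit Function
            End If
        End If
    Loop While ElapsedSeconds(t0, Timer) < timeoutSec
End Function
```

In the question's code, the call would go between objInputButton.Click and Set htmlANP1 = IE.document, preceded by a short Application.Wait so IE has time to start the new request: If WaitForIE(IE) Then Set htmlANP1 = IE.document.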
I scrape a system that uses iFrames, so IE.ReadyState isn't very reliable. Usually I have to wait for another element to 'exist', but using IsObject(element) hasn't been very reliable either. What I do instead is loop in my main code and call a function from that loop. For example, if I know that after the page loads there is an element with the ID "UserName", I do this:
...
Do Until IsErr(doc, "UserName") = False: DoEvents: Loop
...
Function IsErr(doc As HTMLDocument, ID As String) As Boolean
IsErr = True
'If the element doesn't exist yet, .innerHTML raises an error and we jump out
On Error GoTo ExitFunction
Debug.Print Left(doc.getElementById(ID).innerHTML, 1)
IsErr = False
Exit Function
ExitFunction:
End Function
I could write a loop that keeps trying to print the element directly, but the error handling would be a nightmare. With the printing in a separate function, the function exits as soon as the error occurs, and the loop calls it again, repeating until the element exists. Note that this loops forever if the element never appears.
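A variant of the same idea with a timeout, so the loop cannot run forever. It is a sketch, not the answerer's code: it uses On Error Resume Next instead of a separate error-handling function, returns the element (or Nothing after timeoutSec seconds), and assumes doc is re-read from IE.document after any navigation, since an old document object may not reflect the new page. WaitForElement is a hypothetical name.

```vba
Option Explicit

' Hypothetical variant of IsErr: polls for an element by id and returns it,
' or returns Nothing once timeoutSec seconds have passed.
Public Function WaitForElement(ByVal doc As Object, ByVal id As String, _
                               Optional ByVal timeoutSec As Long = 30) As Object
    Dim deadline As Date
    Dim el As Object

    deadline = Now + TimeSerial(0, 0, timeoutSec)
    Do
        Set el = Nothing
        On Error Resume Next          ' the document may be mid-reload and raise errors
        Set el = doc.getElementById(id)
        On Error GoTo 0
        If Not el Is Nothing Then
            Set WaitForElement = el
            Exit Function
        End If
        DoEvents
    Loop While Now < deadline
End Function
```

Usage: Set el = WaitForElement(IE.document, "UserName", 20), then check If el Is Nothing to detect a timeout.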