Visual Basic - Extracting text from a webpage - html

My program is designed to search Amazon and extract bits of data from the results - such as product name and price.
So far, my program pastes what the user has typed into txt_Search and presses 'Go' on the page. Here is the code so far:
Private Sub btn_Search_Click(sender As Object, e As EventArgs) Handles btn_Search.Click
Dim SearchTerm As String
'Set SearchTerm as text input
SearchTerm = txt_Search.Text
'Identify the search text box
Dim AmazonElementCollection As HtmlElementCollection = web_Browser.Document.GetElementsByTagName("Input")
For Each curElement As HtmlElement In AmazonElementCollection
Dim searchtextbox As String = curElement.GetAttribute("id").ToString
If searchtextbox = "twotabsearchtextbox" Then
'Populate the search text box
curElement.SetAttribute("Value", searchterm)
End If
Next
'BUTTON CLICK: Search button
'<input type="submit" value="Go" class="nav-submit-input" title="Go">
For Each curElement As HtmlElement In AmazonElementCollection
'Find the Go button
If curElement.GetAttribute("value").Equals("Go") Then
'Function to click the button
curElement.InvokeMember("click")
End If
Next
End Sub
This all works fine; it runs against a WebBrowser control called web_Browser. But how do I take text off the resulting page and put it into a textbox? For example, I would like the title of the first result to be put into a textbox called txt_Title. Thank you.

You need to add an event handler for the DocumentCompleted event of your web_Browser control.
In that event handler, you can use the same method to find your results element as you have done to find the search text box and the search button, that is: loop through the elements of the document and use the properties of the elements to identify the one you want.
This might look something like this:
Add event handler:
If curElement.GetAttribute("value").Equals("Go") Then
'Add the handler
AddHandler web_Browser.DocumentCompleted, AddressOf myDocumentCompleted
'Function to click the button
curElement.InvokeMember("click")
End If
Event handler implementation:
Private Sub myDocumentCompleted(ByVal sender As Object, _
ByVal e As WebBrowserDocumentCompletedEventArgs)
Dim AmazonElementCollection As HtmlElementCollection = web_Browser.Document.GetElementsByTagName("something you need to figure out")
For Each curElement As HtmlElement In AmazonElementCollection
Dim resultText As String = curElement.GetAttribute("id").ToString
If resultText = "something you need to figure out" Then
'Populate the text box
'InnerText returns the element's text without any markup
txt_Title.Text = curElement.InnerText
End If
Next
End Sub
Note: I haven't run this through a compiler or tested it, so be prepared for some problems, but I hope you get the idea.
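Two things are worth guarding against in a handler like this: AddHandler runs on every click, so handlers can pile up, and DocumentCompleted can fire more than once per page (once for each frame). The following is a minimal sketch of one way to handle both, not a tested implementation. The id "result-title-id" is a placeholder for illustration only; the real id has to be read from the page source.

```vbnet
Private Sub myDocumentCompleted(ByVal sender As Object, _
        ByVal e As WebBrowserDocumentCompletedEventArgs)
    'DocumentCompleted also fires for frames; only act on the top-level page
    If Not e.Url.Equals(web_Browser.Url) Then Return

    'Detach so repeated searches do not stack up duplicate handlers
    RemoveHandler web_Browser.DocumentCompleted, AddressOf myDocumentCompleted

    '"result-title-id" is a hypothetical id, not Amazon's real markup
    Dim titleElement As HtmlElement = web_Browser.Document.GetElementById("result-title-id")
    If titleElement IsNot Nothing Then
        txt_Title.Text = titleElement.InnerText
    End If
End Sub
```

When the target element has an id, GetElementById avoids the loop entirely; when it only has a class or a position in the page, the loop over GetElementsByTagName shown above is still needed.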

Related

VBA Web search button - GetElementsbyClassName

I have a problem with the VBA code.
I would like to open this website: https://www.tnt.com/express/en_us/site/tracking.html and put the active cells from an Excel file into the Shipment numbers search box. To begin with, I tried to enter only a specific text, for example "777777".
I wrote the code below, but unfortunately the search field stays empty and there is no error. I have tried everything and have no idea what I should change in my code.
Any clues? Thank you in advance.
HTML:
<input class="__c-form-field__text ng-touched ng-dirty ng-invalid" formcontrolname="query" pbconvertnewlinestocommasonpaste="" pbsearchhistorynavigation="" shamselectalltextonfocus="" type="search">
VBA:
Sub TNT2_tracker()
Dim objIE As InternetExplorer
Dim aEle As HTMLLinkElement
Dim y As Integer
Dim result As String
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate "https://www.tnt.com/express/en_us/site/tracking.html"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Dim webpageelement As Object
For Each webpageelement In objIE.document.getElementsByClassName("input")
If webpageelement.Class = "__c-form-field__text ng-pristine ng-invalid ng-touched" Then
webpageelement.Value = "777"
End If
Next webpageelement
End Sub
You could use querySelector with the class name to find an element, something like:
'Find the input box
objIE.document.querySelector("input.__c-form-field__text").value = "test"
'Find the search button and do a click
objIE.document.querySelector("button.__c-btn").Click
There is no need to loop through elements, unless the site allows you to search multiple tracking numbers at the same time.
It seems automating this page is a little tricky. Changing the value of the input field doesn't work: nothing happens when you click the submit button.
A look in the DOM inspector shows several events for the input field. I checked them out, and it seems we need to paste the value from the clipboard by triggering the paste event of the shipment number field.
In order for this to work without Internet Explorer prompting, its security settings for the Internet zone must be set to allow pasting from the clipboard. I'm using a German version of IE, so I have problems explaining how to find the setting.
This macro works for me:
Sub TNT2_tracker()
Dim browser As Object
Dim url As String
Dim nodeDivWithInputField As Object
Dim nodeInputShipmentNumber As Object
Dim textToClipboard As Object
'Dataobject by late binding to use the clipboard
Set textToClipboard = CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
url = "https://www.tnt.com/express/en_us/site/tracking.html"
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set browser = CreateObject("internetexplorer.application")
browser.Visible = True
browser.navigate url
Do Until browser.ReadyState = 4: DoEvents: Loop
'Manual pause so the page can load completely
'Application.Wait (Now + TimeSerial(pause_hours, pause_minutes, pause_seconds))
Application.Wait (Now + TimeSerial(0, 0, 3))
'Get div element with input field for shipment number
Set nodeDivWithInputField = browser.Document.getElementsByClassName("pb-search-form-input-group")(0)
If Not nodeDivWithInputField Is Nothing Then
'If we got the div element ...
'First child element is the input field
Set nodeInputShipmentNumber = nodeDivWithInputField.FirstChild
'Put shipment number to clipboard
textToClipboard.setText "7777777"
textToClipboard.PutInClipboard
'Insert value by triggering the paste event of the input field
Call TriggerEvent(browser.Document, nodeInputShipmentNumber, "paste")
'Click button
browser.Document.getElementsByClassName("__c-btn")(0).Click
Else
MsgBox "No input field for shipment number found."
End If
End Sub
And this procedure triggers an HTML event:
Private Sub TriggerEvent(htmlDocument As Object, htmlElementWithEvent As Object, eventType As String)
Dim theEvent As Object
htmlElementWithEvent.Focus
Set theEvent = htmlDocument.createEvent("HTMLEvents")
theEvent.initEvent eventType, True, False
htmlElementWithEvent.dispatchEvent theEvent
End Sub
As @Stavros Jon alludes to, there is a browserless way using an XHR GET request to the site's API. It returns JSON, so ideally you need a JSON parser to handle the response.
I use jsonconverter.bas as the JSON parser. Download the raw code from here and add it to a standard module called JsonConverter. You then need to go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
Example request with dummy tracking number (deliberately passed as string):
Option Explicit
Public Sub TntTracking()
Dim json As Object, ws As Worksheet, trackingNumber As String
trackingNumber = "1234567" 'test input value. Currently this is not a valid input but is for demo.
Set ws = ThisWorkbook.Worksheets("Sheet1") 'for later use if writing something specific out
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.tnt.com/api/v3/shipment?con=" & trackingNumber & "&searchType=CON&locale=en_US&channel=OPENTRACK", False
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
'do something with results
Debug.Print json("tracker.output")("notFound").Count > 0
Debug.Print JsonConverter.ConvertToJson(json("tracker.output")("notFound"))
End Sub

Call a vb6 function from a VB6 WebBrowser [duplicate]

I am working with the VB6 WebBrowser control. I need to open a VB6 form when the user clicks a particular link in the WebBrowser, like this:
In HTML
<html>
<head>
<body>
Click To show VB6 Form2
</body>
</html>
I don't have any idea how to do it. I thought it might be done through a text file: when the link is clicked, it writes a code like 002 to the file.
Then, in the VB form, a Timer checks the file once a second, and when it detects that the file contains 002 it shows the form.
Can it be done this way, or is there a shorter way?
Pick a better naming scheme like:
<a href="#vb-showform2">Click To show VB6 Form2</a>
<a href="#vb-waffles">Waffles</a>
Then intercept link clicks via the BeforeNavigate2 event, look at the url and if it matches #vb-* run your code:
Private Sub WebBrowserCtrl_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, Flags As Variant, TargetFrameName As Variant, PostData As Variant, Headers As Variant, Cancel As Boolean)
'// get #vb-XXX command from url
Dim pos As Long: pos = InStrRev(URL, "#vb-")
If pos Then
Cancel = True '// stop default navigation
URL = Mid$(URL, pos + 4)
Select Case LCase$(URL)
Case "showform2": Form2.Show
'...
Case "waffles": MsgBox "Waffles."
Case Else: MsgBox "Unknown Command " & URL
End Select
End If
End Sub
Instead of putting the form name inside the href attribute, I believe a better method is to set your own data attribute and use that; it seems to me a much cleaner way to do such a task.
In my example, inside the href attribute I'm using the classic void(0) to prevent the link navigation; otherwise your external link to VB forms could break the browser history, with unexpected results.
To use the WebBrowser control, you should already have added a reference to Microsoft Internet Controls in your VB project. What you need next is a reference to the Microsoft HTML Object Library, the type library contained in the mshtml.tlb file.
Assuming your WebBrowser control is called "WebBrowser1", and you have three additional forms called "Form1", "Form2" and "Form3", in the form where you placed the WebBrowser control put this piece of code:
Private HTMLdoc As MSHTML.HTMLDocument
' Create a Web Page to test the navigation '
' You can skip this phase after your test are successfully executed '
Private Sub Form_Load()
Dim HTML As String
WebBrowser1.Navigate "about:blank"
HTML = "<html>"
HTML = HTML & "<title>Open a VB Form from a Link</title>"
HTML = HTML & "<body>"
HTML = HTML & "<a data-vb='Form1' href='javascript:void(0)'>Click To show Form1</a>"
HTML = HTML & "<br>"
HTML = HTML & "<a data-vb='Form2' href='javascript:void(0)'>Click To show Form2</a>"
HTML = HTML & "<br>"
HTML = HTML & "<a data-vb='Form3' href='javascript:void(0)'>Click To show Form3</a>"
HTML = HTML & "<br>"
HTML = HTML & "</body>"
HTML = HTML & "</html>"
WebBrowser1.Document.Write HTML
End Sub
' This will load and show the form specified in the data-vb attribute of the link '
Private Sub WebBrowser1_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, Flags As Variant, TargetFrameName As Variant, PostData As Variant, Headers As Variant, Cancel As Boolean)
Dim frm As Form, FormName As String
If Not (WebBrowser1.Document Is Nothing) Then
Set HTMLdoc = WebBrowser1.Document
FormName = vbNullString & HTMLdoc.activeElement.getAttribute("data-vb")
If Not FormName = vbNullString Then
Set frm = Forms.Add(FormName)
frm.Show
End If
End If
End Sub
An additional note:
You can get the content of the clicked link in the following way:
HTMLdoc.activeElement.toString
Obviously, for all links in my test page, the result will be javascript:void(0), which is the same as the URL parameter of the BeforeNavigate2 event.
Another useful piece of information that you can get from the HTMLDocument, and that isn't available in the BeforeNavigate2 event, is for example:
HTMLdoc.activeElement.outerHTML
the result will be:
<a data-vb="Form2" href="javascript:void(0)">Click To show Form2</a>
To do this with a button instead of a link, add the button to the document along with a bit of script (VBScript here):
<input type="button" id="MyButton1_id" style="cursor: pointer" name=MyButton1 value="Show It!">
<SCRIPT LANGUAGE="VBScript">
Sub MyButton1_OnClick()
location.href = "event:button1_show"
End Sub
</SCRIPT>
Then in the BeforeNavigate2 event:
Public Sub webBrowser_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, flags As Variant, TargetFrameName As Variant, PostData As Variant, Headers As Variant, Cancel As Boolean)
Select Case LCase$(URL)
Case "event:button1_show"
Cancel = True
Form2.Show
Case "event:other_stuff"
'other stuff to do, etc
End Select
End Sub

Refer to Form Control Without Specific Naming

I have a form with numerous images, each of which performs a series of actions when clicked. I can create a Private Sub with all of the actions for each button, however I think this is inefficient. Rather, I'd like to record all the actions in one Macro and then call this Macro when each image is clicked. To do so, I'd need the single Macro to refer to the current image selected and not refer to any image by name. Is this possible?
My current code includes the following:
Me.Image001.BorderColor = RGB(1, 1, 1)
Me.Image001.BorderWidth = 2
Me.Image001.BorderStyle = 1
I'd need to amend this so that it amends the border colour/width/style etc of whichever image is selected, and not a specific named image ('Image001').
Thanks!
You should use event sinking.
With event sinking you can bind your own procedures to an event.
You can see an example here: http://p2p.wrox.com/access-vba/37472-event-triggered-when-any-control-changed.html
In simple words, you create a class module in which you bind the event to your specific implementation. Then, on the form you are interested in, you create a collection in which you register the controls whose events you want to "follow".
Here is my event sinking for checkboxes (I have a lot of them).
First, a class module named SpecialEventHandler:
Option Compare Database
Private WithEvents chkbx As CheckBox
Private m_Form As Form
Private Const mstrEventProcedure = "[Event Procedure]"
Public Function init(chkbox As CheckBox, frm As Form)
Set chkbx = chkbox
Set m_Form = frm
'Debug.Print frm.Name
chkbx.AfterUpdate = mstrEventProcedure
End Function
Private Sub chkbx_AfterUpdate()
'your Code here
End Sub
Private Sub Class_Terminate()
Set chkbx = Nothing
Set m_Form = Nothing
End Sub
Then, on the form where you want to use event sinking:
Option Compare Database
Dim spEventHandler As SpecialEventHandler
Private colcheckBoxes As New Collection
Private Sub Form_Open(Cancel As Integer)
Dim ctl As Control
For Each ctl In Me.Detail.Controls
Select Case ctl.ControlType
Case acCheckBox
Set spEventHandler = New SpecialEventHandler
spEventHandler.init Controls(ctl.NAME), Me
colcheckBoxes.Add spEventHandler
End Select
Next ctl
End Sub
You could always create a standard Sub, then call it from the Click event of each image. Something like:
Public Sub changeColor(frm As Form, ctrl As Control)
frm.Controls(ctrl.Name).BorderColor = RGB(1, 1, 1)
frm.Controls(ctrl.Name).BorderWidth = 2
frm.Controls(ctrl.Name).BorderStyle = 1
End Sub
So when you click an image, all you have to do is:
Private Sub Image001_Click()
changeColor Me, Image001
End Sub

selstart returns position 0 if text is entered in memo field (not clicked)

I have a memo field and a list. What I want to accomplish is this: if I am typing something in the memo field and then click on a text record in the list, that text should be inserted into the memo at the position where the cursor was.
After some research and googling I managed to do it, using the .SelStart property.
But it seems to me that SelStart has a bug. It works only if I click somewhere in the memo (then everything works great). But if I was typing something and then click on text in the list (without previously clicking in the memo field), SelStart returns position 0.
This causes me a huge problem.
Can anyone help? Thank you.
As you found out, the problem is that the cursor position is lost when you move away from the memo.
This is probably because Access form controls are not "real" controls: they are real Windows controls only while they have the focus. The rest of the time, they are more like images of the control pasted onto the form.
So, what you need to do is track the cursor position (and the length of the currently selected text) during various interactions:
- when the user moves the cursor using the keyboard (KeyUp event)
- when the user clicks inside the memo (Click event, to position the cursor or select text using the mouse)
- when the memo initially gets the focus (GotFocus event; the first time, the whole text is selected and the cursor is at position 0)
To test this, I made a small form and added the following code to it:
'----------------------------------------------------------
' Track the position of the cursor in the memo
'----------------------------------------------------------
Private currentPosition As Long
Private currentSelLen As Long
Private Sub txtMemo_Click()
RecordCursorPosition
End Sub
Private Sub txtMemo_GotFocus()
RecordCursorPosition
End Sub
Private Sub txtMemo_KeyUp(KeyCode As Integer, Shift As Integer)
RecordCursorPosition
End Sub
Private Sub RecordCursorPosition()
currentPosition = txtMemo.SelStart
currentSelLen = txtMemo.SelLength
End Sub
'----------------------------------------------------------
' Insert when the user double-clicks the listbox or presses the button
'----------------------------------------------------------
Private Sub listSnippets_DblClick(Cancel As Integer)
InsertText
End Sub
Private Sub btInsert_Click()
InsertText
End Sub
'----------------------------------------------------------
' Do the actual insertion of text
'----------------------------------------------------------
Private Sub InsertText()
If Len(Nz(listSnippets.Value, vbNullString)) = 0 Then Exit Sub
Echo False 'Avoid flickering during update
' Update the Memo content
Dim oldstr As String
oldstr = Nz(txtMemo.Value, vbNullString)
If Len(oldstr) = 0 Then
txtMemo.Value = listSnippets.Value
Else
txtMemo.Value = Left$(oldstr, currentPosition) & _
listSnippets.Value & _
Mid$(oldstr, currentPosition + currentSelLen + 1)
End If
'We will place the cursor after the inserted text
Dim newposition As Long
newposition = currentPosition + Len(listSnippets.Value)
txtMemo.SetFocus
txtMemo.SelStart = newposition
txtMemo.SelLength = 0
currentPosition = newposition
currentSelLen = 0
Echo True
End Sub
I have made a test accdb database that you can download so you can see the details and play around with this.

ListBox with html element

Can anyone offer me some advice? I currently have a listbox that holds a list of images from any website. They are grabbed from the website via this method:
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
Dim PageElements As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("img")
For Each CurElement As HtmlElement In PageElements
imagestxt.Items.Add(imagestxt.Text & CurElement.GetAttribute("src") & Environment.NewLine)
Next
Timer1.Enabled = True
End Sub
I then use the picture control method to get the image and display it.
pic1.Image = New Bitmap(New MemoryStream(New WebClient().DownloadData(imagestxt.SelectedItem.ToString)))
This method pulls the images and title from the HTML.
Private Function StrHTML12() As Boolean
Dim htmlDocument As HtmlDocument = WebBrowser1.Document
ListBox1.Items.Clear()
For Each element As HtmlElement In htmlDocument.All
ListBox1.Items.Add(element.TagName)
If element.TagName.ToUpper = "IMG" Then
imgtags.Items.Add(element.OuterHtml.ToString)
End If
If element.TagName.ToUpper = "TITLE" Then
titletags.Items.Add(element.OuterHtml.ToString)
Timer1.Enabled = False
End If
Next
End Function
This is a counting method to count how many empty alt="" or empty img alt='' there are on the page.
Basically, what I am looking to do is this:
I want a program that checks each image's alt attribute (alt='' or img alt=''). If the developer of the website hasn't put anything in the alt attribute, I want the image to show in a picture box, with the alt text next to it or underneath it. But I have no idea how.
counter = InStr(counter + 1, strHTML, "<img alt=''")
counter = InStr(counter + 1, strHTML, "alt=''")
counter = InStr(counter + 1, strHTML, "alt=""")
The above seems really slow and messy. Is there a better way of doing it?
I do not have VB installed so I have not been able to test the code. I'm also not familiar with the datagridview component so have not attempted to integrate my code with it.
The code below should get you the title of the page and loop through all the img tags that have no alt text (or empty alt text).
HtmlElement.GetAttribute(sAttr) returns the value of the attribute or an empty string.
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
Dim Title As String
Dim ImSrc As String
Dim PageElements As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("img")
' This line might need to be adjusted, see below
Title = WebBrowser1.Document.GetElementsByTagName("title")(0).InnerText
For Each CurElement As HtmlElement In PageElements
If CurElement.GetAttribute("alt") = "" Then
' CurElement does not have alt-text
ImSrc = CurElement.GetAttribute("src") ' This image has no alt text
Else
' CurElement has alt-text
End If
Next
Timer1.Enabled = True
End Sub
The line that gets the title might need to be changed as I'm unsure how collections can be accessed. You want the first (hopefully only) element returned from the GetElementsByTagName function.
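To tie this back to the question, here is a rough sketch of how the same loop could feed a list of images that lack alt text. It is untested and makes some assumptions: lstMissingAlt (a ListBox) and lblTitle (a Label) are hypothetical control names, and the HtmlDocument.Title property is used for the page title instead of indexing the title element. It is meant to replace the handler above, not to sit next to it.

```vbnet
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    'lblTitle and lstMissingAlt are hypothetical controls added for this sketch
    lblTitle.Text = WebBrowser1.Document.Title
    lstMissingAlt.Items.Clear()

    For Each img As HtmlElement In WebBrowser1.Document.GetElementsByTagName("img")
        'GetAttribute returns an empty string when alt is missing or empty
        If String.IsNullOrEmpty(img.GetAttribute("alt")) Then
            lstMissingAlt.Items.Add(img.GetAttribute("src"))
        End If
    Next
End Sub
```

Selecting an entry in the list could then load that src into the picture box with the same WebClient call used in the question, which avoids searching the raw HTML string with InStr.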