get HTMLDocument from HttpWebRequest without HtmlAgilityPack - html

I'm trying to write a function that returns an "htmlDocument" using "HttpWebRequest" instead of a browser but I'm stuck with transferring of innerhtml.
I don't understand how to set value of "mWebPage" because VB doesn't accept "New" for HTMLDocument
I know that I can use "HtmlAgilityPack" but I would like to test my current code, changing only web request and not to change all parsing code.(To do this I need an HtmlDocument)
After this test, I'll try to change also the parsing code.
Function mWebRe(ByVal mUrl As String) As HTMLDocument
Dim request As HttpWebRequest = CType(WebRequest.Create(mUrl), HttpWebRequest)
' Set some reasonable limits on resources used by this request
request.MaximumAutomaticRedirections = 4
request.MaximumResponseHeadersLength = 4
' Set credentials to use for this request.
request.Credentials = CredentialCache.DefaultCredentials
'Here I've tryed many types
Dim mWebPage As HTMLDocument
Try
Dim request2 As HttpWebRequest = WebRequest.Create(mUrl)
Dim response2 As HttpWebResponse = request2.GetResponse()
Dim reader2 As StreamReader = New StreamReader(response2.GetResponseStream())
Dim WebContent As String = reader2.ReadToEnd()
'This is my last attempt
'This gives Null Reference Exception
mWebPage.Body.InnerHtml = WebContent
Catch ex As Exception
MsgBox(ex.ToString)
End Try
Return mWebPage
End Function
I've tryed many ways (also import HTML Object Library) but nothing worked :(

Okay this is becoming more of a hack by the minute, but this should work.
First, you'll need to instantiate your WebBrowser control at the class level:
Private m_objWebBrowser As WebBrowser
Next add an Event Handler for the DocumentCompleted Event that contains all your HTML parsing data. You get an instance of the HtmlDocument using the OpenNew method of the WebBrowser control.
Private Sub HandleParsing(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
'Use your code for generating WebContent.
Dim WebContent As String = "<html></html>"
Dim mWebPage As HtmlDocument = DirectCast(sender, WebBrowser).Document.OpenNew(True)
mWebPage.Write(WebContent)
End Sub
Finally, you can trigger all of this by wiring up the Event Handler and navigating to some page or Html file on disk (DocumentCompleted fires asynchronously):
AddHandler m_objWebBrowser.DocumentCompleted, AddressOf HandleParsing
m_objWebBrowser.Navigate("www.google.com")

I found a solution on the web and modified my code as below:
To make it work you must activate reference to "Microsoft HTML object library" (in .Com references)
It is obsolete but it seems to be the only way to make an html document without using webbrowser.
I Hope it helps someone else.
Function mWebRe(ByVal mUrl As String) As MSHTML.HTMLDocument
Dim request As HttpWebRequest = WebRequest.Create(mUrl)
Dim doc As MSHTML.IHTMLDocument2 = New MSHTML.HTMLDocument
' Set some reasonable limits on resources used by this request
request.MaximumAutomaticRedirections = 4
request.MaximumResponseHeadersLength = 4
' Set credentials to use for this request.
request.Credentials = CredentialCache.DefaultCredentials
Try
Dim response As HttpWebResponse = request.GetResponse()
Dim reader As StreamReader = New StreamReader(response.GetResponseStream())
Dim WebContent As String = reader.ReadToEnd()
doc.clear()
doc.write(WebContent)
doc.close()
'To make sure that the data is fully load.
While (doc.readyState <> "complete")
'This for more waiting (if needed)
'System.Threading.Thread.Sleep(1000)
Application.DoEvents()
End While
Catch ex As Exception
MsgBox(ex.ToString)
End Try
Return doc
End Function

Related

How can I show my JSON results in a Textbox instead of writing to the Console?

I'm running into a little problem that I haven't found a way to to solve.
I haven't found a forum where this specific problem is addressed, I really hope to find some help.
Here is my code:
Imports System.IO
Imports System.Net
Imports Newtonsoft.Json.Linq
Public Class Form1
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim request As HttpWebRequest
Dim response As HttpWebResponse = Nothing
Dim reader As StreamReader
request = DirectCast(WebRequest.Create("https://pastebin.com/raw/dWjmfW8N"), HttpWebRequest)
response = DirectCast(request.GetResponse(), HttpWebResponse)
reader = New StreamReader(response.GetResponseStream())
Dim jsontxt As String
jsontxt = reader.ReadToEnd()
Dim myJObject = JObject.Parse(jsontxt)
For Each match In myJObject("matches")
Console.WriteLine(match("http")("host").ToString)
Next
End Sub
End Class
Here is the output:
223.16.205.13
190.74.163.58
71.7.168.29
117.146.53.244
31.170.146.28
118.36.122.169
123.7.117.78
113.61.154.182
36.48.37.191
113.253.179.234
124.13.29.41
180.122.74.183
121.157.114.93
39.78.35.216
176.82.1.100
201.143.142.75
222.117.29.229
89.228.209.185
59.153.89.245
148.170.162.37
112.160.243.23
62.101.254.177
190.141.161.149
121.132.177.79
79.165.124.174
118.39.91.43
220.83.82.58
220.161.101.195
190.218.188.86
123.241.174.77
219.71.218.113
81.198.205.2
1.64.205.1
190.204.66.180
203.163.241.36
36.34.148.33
221.124.127.89
115.29.210.231
39.121.63.13
178.160.38.191
117.146.55.217
149.91.99.49
220.93.231.104
49.245.71.40
211.44.70.107
37.119.247.51
222.101.54.200
178.163.102.223
119.198.145.129
188.26.240.141
115.29.233.160
190.164.29.145
94.133.185.144
181.37.196.134
116.88.213.9
115.2.194.11
1.226.12.161
178.63.73.210
49.149.194.242
14.32.29.251
59.0.191.68
58.122.168.43
142.129.230.137
105.145.89.51
201.243.97.65
175.37.162.102
186.88.141.126
105.148.43.100
60.179.173.21
69.115.51.207
90.171.193.132
14.64.76.165
121.127.95.80
175.211.168.48
99.240.74.72
58.153.174.2
119.77.168.142
121.170.47.232
58.243.20.124
199.247.243.234
47.111.76.211
93.72.213.251
218.32.44.73
220.83.90.204
119.158.102.20
95.109.55.204
106.5.19.223
190.199.215.69
190.218.57.249
36.102.72.163
219.78.162.215
177.199.151.96
196.93.125.34
211.58.150.166
180.131.163.40
93.156.97.81
159.89.22.81
130.0.55.156
186.93.202.111
195.252.44.173
What I want to do is to transfer that console output to my Textbox1.Text. Can anyone please show me a way to solve this?
A somewhat simplified method, using WebClient's DownloadStringTaskAsync to download the JSON.
You don't need special treatment here, strings that represent IpAddresses are just numbers and dots and the source encoding is probably UTF8.
After that, just parse the JSON and Select() the property values you care about, transform the resulting Enumerable(Of JToken) to an array of strings and set the array as the source of a TextBox.Lines property.
You can store the lines collection for any other use, in case it's needed.
Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Using client As New WebClient()
Dim json = Await client.DownloadStringTaskAsync([The URL])
Dim parsed = JObject.Parse(json)
Dim lines = parsed("matches").
Where(Function(jt) jt("http") IsNot Nothing).
Select(Function(jt) jt("http")("host").ToString()).ToArray()
TextBox1.Lines = lines
End Using
End Sub
There's no need to transfer anything. If you want the data in a TextBox then put it in a TextBox. You can then output the same data using Console.WriteLine or Debug.WriteLine. You can use a loop:
Dim hosts As New List(Of String)
For Each match In myJObject("matches")
hosts.Add(match("http")("host").ToString())
Next
Dim text = String.Join(Environment.NewLine, hosts)
myTextBox.Text = text
Console.WriteLine(text)
You could also use LINQ:
Dim text = String.Join(Environment.NewLine, myJObject("matches").Select(Function(match) match("http")("host").ToString()))
myTextBox.Text = text
Console.WriteLine(text)
Alternative approach to display collection of things in Winforms are ListView, DataGridView or other collection controls depends on desired usage.
Add ListView control in designer and next code will fill it with received values.
Shared ReadOnly client As HttpClient = New HttpClient()
Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim response As HttpResponseMessage =
Await client.GetAsync("https://pastebin.com/raw/dWjmfW8N")
response.EnsureSuccessStatusCode()
Dim jsonBody As String = Await response.Content.ReadAsStringAsync()
Dim myJObject = JObject.Parse(jsonBody)
ListView1.Items.Clear()
For Each match In myJObject("matches")
ListView1.Items.Add(match("http")("host").ToString)
Next
End Sub

filling a html auto search box and obtaining the results

I am trying to fill in a search box on a web page that as it is filled in it auto searches for the results. The website is https://pcpartpicker.com/products/motherboard/. If you go there and type in a motherboard manufacturer of motherboard name you can see how it begins to narrow down the possible selections. I have code that will fill in the search box but nothing happens.
Sub GetMotherboards()
Dim ie As InternetExplorer
Set ie = New InternetExplorer
Dim doc As HTMLDocument
Dim objText As DataObject
Dim objArticleContents As Object
Dim objLinksCollection As Object
Dim objToClipBoard As DataObject
Dim r As Object
Dim prodRating As String
Dim prodName As String
Dim lngNumberOfVideos As Long
Dim strURL As String
Dim strNewString As String, strStr As String, strTestChar As String
Dim bFlag As Boolean
strURL = "https://pcpartpicker.com/products/motherboard/" ' Range("J5").Value
With ie
.navigate strURL
.Visible = True
Do While .readyState <> 4: DoEvents: Loop
Application.Wait Now + #12:00:02 AM#
Set doc = ie.document
End With
bFlag = False
With doc
Set objArticleContents = .getElementsByClassName("subTitle__form")
Stop
Set ele = .getElementsByClassName("subTitle__form")(0)
Set form = .getElementsByClassName("subTitle__form")(0).getElementsByClassName("form-label xs-inline")(1)
Set inzputz = ele.getElementsByClassName("text-input")(0)
Call .getElementsByClassName("text-input")(0).setAttribute("placeholder", "MSI B450 TOMAHAWK") '.setAttribute("part_category_search", "MSI B450 TOMAHAWK")
End With
End Sub
After reading some posts here (which I now can't find) my thinking is that there is/ are event listeners and functions that need to be included in this code but that is over my head. Could someone please help me figure this out.
Tim Williams has a post here (an answer to a post) which discussed this but now I can't find it.
You can avoid the expense of a browser and perform the same xhr GET request the page does that returns json. You will need a json parser to handle the response.
Json library:
I use jsonconverter.bas. Download raw code from here and add to standard module called JsonConverter . You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
I show a partial implementation which makes requests for different categories and products and uses both full and partial string searches. It is a partial implementation in that I read responses into json objects and also print the json strings but do not attempt to access all items within json object. That can be refined upon more detail from you. For demo puposes I access ("result")("data") which gives you the price and name info. Part of the original response json has html as value for accessor ("result")("html"). This has description info e.g.Socket/CPU with motherboard items.
Option Explicit
Public Sub ProductSearches()
Dim xhr As Object, category As String, items()
Set xhr = CreateObject("MSXML2.XMLHTTP")
category = "motherboard"
items = Array("Gigabyte B450M DS3H", "MSI B450 TOMAHAWK", "random string")
PrintListings items, xhr, category
category = "memory"
items = Array("Corsair Vengeance") 'partial search
PrintListings items, xhr, category
End Sub
Public Function GetListings(ByVal xhr As Object, ByVal category As String, ByVal item As String) As Object
Dim json As Object
With xhr
.Open "GET", "https://pcpartpicker.com/products/" & category & "/fetch/?xslug=&location=&search=" & item, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
Set json = JsonConverter.ParseJson(.responseText)("result")("data")
Set GetListings = json
End With
End Function
Public Sub PrintListings(ByRef items(), ByVal xhr As Object, ByVal category As String)
'Partially implemented. You need to decide what to do with contents of json object
Dim json As Object, i As Long
For i = LBound(items) To UBound(items)
Set json = GetListings(xhr, category, items(i))
'Debug.Print Len(JsonConverter.ConvertToJson(json)) ' Len(JsonConverter.ConvertToJson(json)) =2 i.e {} then no results
Debug.Print JsonConverter.ConvertToJson(json) 'demo purposes only
'do something with json
Next
End Sub
Json parsing:
Read about using JsonConverter and parsing json in vba here, here and here.
You need to execute the keyup event after you place your value into your textbox.
You can accomplish this by using the execScript method.
So, after you load the webpage, create a variable for your input/textbox. In the below example, it's tb. Set the .Value property to your search text (which I used "MSI") then fire the keyup event via script.
Dim tb As HTMLInputElement
Set tb = IE.document.getElementById("part_category_search")
tb.Value = "MSI"
IE.document.parentWindow.execScript "$('#part_category_search').keyup()"
I am not overly familiar with jQuery, so this script targets all inputs on the webpage. But I've tested it and it works for your search.
Here was the full code I used in testing if you want to shorten yours:
Dim IE As InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "https://pcpartpicker.com/products/motherboard/"
Do While IE.Busy Or IE.readyState < 4
DoEvents
Loop
Dim tb As HTMLInputElement
Set tb = IE.document.getElementById("part_category_search")
tb.Value = "MSI"
IE.document.parentWindow.execScript "$('#part_category_search').keyup()"

Get website's inner text without webbrowser

I want to get website's inner text through code.
I can already get it's inner html with code below, but i can't find any code that's getting URL's inner text without webbrowser.
This code is getting text from website in webbrowser, but i need same thing, just without webbrowser.
Dim sourceString As String = WebBrowser1.Document.Body.InnerText
With HtmlAgilityPack...
Private Sub ToolStripButton1_Click(sender As Object, e As EventArgs) Handles ToolStripButton1.Click
Dim doc As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
With New Net.WebClient
doc.LoadHtml(.DownloadString("https://example.com"))
.Dispose()
End With
Debug.Print(doc.DocumentNode.Name)
PrintChildNodes(doc.DocumentNode)
Debug.Print(doc.DocumentNode.Element("html").Element("body").InnerText)
End Sub
Sub PrintChildNodes(Node As HtmlAgilityPack.HtmlNode, Optional Indent As Integer = 1)
For Each Child As HtmlAgilityPack.HtmlNode In Node.ChildNodes
Debug.Print("{0}{1}", String.Empty.PadLeft(Indent, vbTab), Child.Name)
PrintChildNodes(Child, Indent + 1)
Next
End Sub
**Taken from **
Wolfwyrd
In this question HTTP GET in VB.NET
Try
Dim fr As System.Net.HttpWebRequest
Dim targetURI As New Uri("http://whatever.you.want.to.get/file.html")
fr = DirectCast(HttpWebRequest.Create(targetURI), System.Net.HttpWebRequest)
If (fr.GetResponse().ContentLength > 0) Then
Dim str As New System.IO.StreamReader(fr.GetResponse().GetResponseStream())
Response.Write(str.ReadToEnd())
str.Close();
End If
Catch ex As System.Net.WebException
'Error in accessing the resource, handle it
End Try
You will get Html as well as http headers. Don't think this will work by itself with https.

Parsing html table containing images to datatable attribute

i used the following code to parse html table inner text to datatable (using Html-Agility-Pack):
Imports System.Net
Public Sub ParseHtmlTable(byval HtmlFilePath as String)
Dim webStream As Stream
Dim webResponse = ""
Dim req As FileWebRequest
Dim res As FileWebResponse
' REQUEST PAGE (We are requesting Google Finance Page with NSE:RENUKA Stock Info
req = WebRequest.Create("file:///" & HtmlFilePath)
req.Method = "GET" ' Method of sending HTTP Request(GET/POST)
res = req.GetResponse ' Send Request
webStream = res.GetResponseStream() ' Get Response
Dim webStreamReader As New StreamReader(webStream)
Dim htmldoc As New HtmlAgilityPack.HtmlDocument
htmldoc.LoadHtml(webStreamReader.ReadToEnd())
Dim nodes As HtmlAgilityPack.HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//table/tbody/tr")
Dim dtTable As New DataTable("Table1")
Dim Headers As List(Of String) = nodes(0).Elements("th").Select(Function(x) x.InnerText.Trim).ToList
For Each Hr In Headers
dtTable.Columns.Add(Hr)
Next
For Each node As HtmlAgilityPack.HtmlNode In nodes
Dim Row = node.Elements("td").Select(Function(x) x.InnerText.Trim).ToArray
dtTable.Rows.Add(Row)
Next
dtTable.WriteXml("G:\1.xml", XmlWriteMode.WriteSchema)
End Sub
How to parse an html table containing images to a Datatable and saving images as binary or saving their links using VB.net
I found the answer finally. images look like:
<img src="img.jpg"/>
We can use
.SelectNodes("./img").Attributes("src").Value()
To return the image path on the node containing it

Download Json Obeject from URL by VBA

I have a MS Access project that requires me retrieve and parse a Json object from a URL. I have done parse part, but I cannot figure out the correct way to retrieve the Json from the URL. If I copy and paste the URL on IE, it will automatically download the Json object as .json file for me. I have searched solution by Google, and none of them works for me. I think the problem is that the URL looks like "https://******.com/rest/external/session/123", which is not similar to a standard XML HTTP request URL. So most solutions which use XMLHTTP request does not work for me.
I have tried to use following code to get it from URL. But all I get is homepage DOM tree instead of Json.
Dim wb As XMLHTTP
Set wb = New XMLHTTP
wb.Open "POST", "https://******.com/rest/external/session/123", False
wb.send
Do Until wb.Status = 200 And wb.ReadyState = 4
DoEvents
Loop
Debug.Print wb.responseText
Anyone has any idea about what I should do here?
Any help is appreciated!
Updated:
I have tried both POST and GET http request. And it gave me the same result
Following are the processes captured by fiddler.
This is captured processes if I copy the url directly on IE
This is captured processes if I use the code above
Just explaining the code logic below. You will need to work on it to build your own code.
Option Compare Database
Dim ApiUrl As String
Dim reader As New XMLHTTP60
Dim coll As Collection
Dim Json As New clsJSONParser
Public Sub ApiInitalisation()
ApiUrl = "http://private-anon-73376961e-count.apiary-mock.com/"
End Sub
Public Sub GetPerson()
On Error GoTo cmdLogIn_Click_Err
'For API
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim contact As Variant
Api.ApiInitalisation
ApiUrl = ApiUrl & "users/5428a72c86abcdee98b7e359"
reader.Open "GET", ApiUrl, False
'reader.setRequestHeader "Accept", "application/json"
reader.send
'Temporay variable to store the response
Dim egTran As String
' Add data to Table
If reader.Status = 200 Then
Set db = CurrentDb
Set rs = db.OpenRecordset("tblPerson", dbOpenDynaset, dbSeeChanges)
egTran = "[" & reader.responseText & "]"
Set coll = Json.parse(egTran)
For Each contact In coll
rs.AddNew
rs!FName = contact.Item("name")
rs!Mobile = contact.Item("phoneNumber")
rs!UserID = contact.Item("deviceId")
rs!SID = contact.Item("_id")
rs.Update
Next
Else
MsgBox "Unable to import data."
End If
End Sub