Get website's inner text without webbrowser - html

I want to get a website's inner text through code.
I can already get its inner HTML with the code below, but I can't find any code that gets a URL's inner text without a WebBrowser control.
This code gets the text from a website loaded in a WebBrowser, but I need the same thing without a WebBrowser.
Dim sourceString As String = WebBrowser1.Document.Body.InnerText

With HtmlAgilityPack:
Private Sub ToolStripButton1_Click(sender As Object, e As EventArgs) Handles ToolStripButton1.Click
    Dim doc As New HtmlAgilityPack.HtmlDocument()
    ' Download the page source; Using disposes the WebClient afterwards.
    Using client As New Net.WebClient()
        doc.LoadHtml(client.DownloadString("https://example.com"))
    End Using
    Debug.Print(doc.DocumentNode.Name)
    ' Dump the tag tree, then the text of the <body> element.
    PrintChildNodes(doc.DocumentNode)
    Debug.Print(doc.DocumentNode.Element("html").Element("body").InnerText)
End Sub
' Recursively prints each child node's name, indented by its depth.
Sub PrintChildNodes(Node As HtmlAgilityPack.HtmlNode, Optional Indent As Integer = 1)
    For Each Child As HtmlAgilityPack.HtmlNode In Node.ChildNodes
        Debug.Print("{0}{1}", New String(ControlChars.Tab, Indent), Child.Name)
        PrintChildNodes(Child, Indent + 1)
    Next
End Sub
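The InnerText of a parsed document can also include the contents of script and style elements, which a browser would not display. The following is only a minimal sketch of one way to drop them before reading the text. It assumes the HtmlAgilityPack package is referenced; the module name InnerTextSketch and the helper GetInnerText are hypothetical names used for illustration.

```vb
Imports System.Linq

Module InnerTextSketch
    ' Hypothetical helper: returns the text of an HTML string
    ' with <script> and <style> contents removed.
    Function GetInnerText(html As String) As String
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when nothing matches, so guard before looping.
        Dim hidden = doc.DocumentNode.SelectNodes("//script|//style")
        If hidden IsNot Nothing Then
            ' Copy to a list so removal does not disturb the enumeration.
            For Each node In hidden.ToList()
                node.Remove()
            Next
        End If

        ' DeEntitize turns entities such as &amp; back into plain characters.
        Return HtmlAgilityPack.HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim()
    End Function
End Module
```

The string passed in could come from WebClient.DownloadString, as in the answer above, so no WebBrowser control is involved.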

Taken from Wolfwyrd's answer to the question "HTTP GET in VB.NET":
Try
    Dim targetURI As New Uri("http://whatever.you.want.to.get/file.html")
    Dim fr As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(targetURI), System.Net.HttpWebRequest)
    ' Request the page once and reuse the response; Using closes it afterwards.
    Using webResponse As System.Net.WebResponse = fr.GetResponse()
        If webResponse.ContentLength > 0 Then
            Using str As New System.IO.StreamReader(webResponse.GetResponseStream())
                ' Response.Write is the ASP.NET page output; use any other sink as needed.
                Response.Write(str.ReadToEnd())
            End Using
        End If
    End Using
Catch ex As System.Net.WebException
    'Error in accessing the resource, handle it
End Try
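ReadToEnd returns only the response body, i.e. the HTML markup rather than the rendered inner text; the HTTP headers are exposed separately through the response's Headers property. HttpWebRequest also accepts https URLs.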

Related

How can I show my JSON results in a Textbox instead of writing to the Console?

I'm running into a little problem that I haven't found a way to solve.
I haven't found a forum where this specific problem is addressed, so I really hope to find some help.
Here is my code:
Imports System.IO
Imports System.Net
Imports Newtonsoft.Json.Linq
Public Class Form1
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim request As HttpWebRequest
Dim response As HttpWebResponse = Nothing
Dim reader As StreamReader
request = DirectCast(WebRequest.Create("https://pastebin.com/raw/dWjmfW8N"), HttpWebRequest)
response = DirectCast(request.GetResponse(), HttpWebResponse)
reader = New StreamReader(response.GetResponseStream())
Dim jsontxt As String
jsontxt = reader.ReadToEnd()
Dim myJObject = JObject.Parse(jsontxt)
For Each match In myJObject("matches")
Console.WriteLine(match("http")("host").ToString)
Next
End Sub
End Class
Here is the output:
223.16.205.13
190.74.163.58
71.7.168.29
117.146.53.244
31.170.146.28
118.36.122.169
123.7.117.78
113.61.154.182
36.48.37.191
113.253.179.234
124.13.29.41
180.122.74.183
121.157.114.93
39.78.35.216
176.82.1.100
201.143.142.75
222.117.29.229
89.228.209.185
59.153.89.245
148.170.162.37
112.160.243.23
62.101.254.177
190.141.161.149
121.132.177.79
79.165.124.174
118.39.91.43
220.83.82.58
220.161.101.195
190.218.188.86
123.241.174.77
219.71.218.113
81.198.205.2
1.64.205.1
190.204.66.180
203.163.241.36
36.34.148.33
221.124.127.89
115.29.210.231
39.121.63.13
178.160.38.191
117.146.55.217
149.91.99.49
220.93.231.104
49.245.71.40
211.44.70.107
37.119.247.51
222.101.54.200
178.163.102.223
119.198.145.129
188.26.240.141
115.29.233.160
190.164.29.145
94.133.185.144
181.37.196.134
116.88.213.9
115.2.194.11
1.226.12.161
178.63.73.210
49.149.194.242
14.32.29.251
59.0.191.68
58.122.168.43
142.129.230.137
105.145.89.51
201.243.97.65
175.37.162.102
186.88.141.126
105.148.43.100
60.179.173.21
69.115.51.207
90.171.193.132
14.64.76.165
121.127.95.80
175.211.168.48
99.240.74.72
58.153.174.2
119.77.168.142
121.170.47.232
58.243.20.124
199.247.243.234
47.111.76.211
93.72.213.251
218.32.44.73
220.83.90.204
119.158.102.20
95.109.55.204
106.5.19.223
190.199.215.69
190.218.57.249
36.102.72.163
219.78.162.215
177.199.151.96
196.93.125.34
211.58.150.166
180.131.163.40
93.156.97.81
159.89.22.81
130.0.55.156
186.93.202.111
195.252.44.173
What I want to do is to transfer that console output to my Textbox1.Text. Can anyone please show me a way to solve this?
A somewhat simplified method uses WebClient's DownloadStringTaskAsync to download the JSON.
You don't need special treatment here: strings that represent IP addresses are just digits and dots, and the source encoding is probably UTF-8.
After that, parse the JSON, Select() the property values you care about, convert the resulting sequence to an array of strings, and assign the array to the TextBox.Lines property.
You can keep the lines array for any other use, in case it's needed.
Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Using client As New WebClient()
Dim json = Await client.DownloadStringTaskAsync([The URL])
Dim parsed = JObject.Parse(json)
Dim lines = parsed("matches").
Where(Function(jt) jt("http") IsNot Nothing).
Select(Function(jt) jt("http")("host").ToString()).ToArray()
TextBox1.Lines = lines
End Using
End Sub
There's no need to transfer anything. If you want the data in a TextBox then put it in a TextBox. You can then output the same data using Console.WriteLine or Debug.WriteLine. You can use a loop:
Dim hosts As New List(Of String)
For Each match In myJObject("matches")
hosts.Add(match("http")("host").ToString())
Next
Dim text = String.Join(Environment.NewLine, hosts)
myTextBox.Text = text
Console.WriteLine(text)
You could also use LINQ:
Dim text = String.Join(Environment.NewLine, myJObject("matches").Select(Function(match) match("http")("host").ToString()))
myTextBox.Text = text
Console.WriteLine(text)
An alternative way to display a collection of items in WinForms is a ListView, DataGridView, or another collection control, depending on the desired usage.
Add a ListView control in the designer; the following code fills it with the received values.
Shared ReadOnly client As HttpClient = New HttpClient()
Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim response As HttpResponseMessage =
Await client.GetAsync("https://pastebin.com/raw/dWjmfW8N")
response.EnsureSuccessStatusCode()
Dim jsonBody As String = Await response.Content.ReadAsStringAsync()
Dim myJObject = JObject.Parse(jsonBody)
ListView1.Items.Clear()
For Each match In myJObject("matches")
ListView1.Items.Add(match("http")("host").ToString)
Next
End Sub

Convert image stored on remote folder to base64 in web app

In my web app I need to convert the catalog images to Base64 strings.
The catalog is rendered in a ListView, using Web Forms on .NET Framework 4.
Protected Sub ProductsLv_ItemDataBound(ByVal sender As Object, ByVal e As System.Web.UI.WebControls.ListViewItemEventArgs) Handles ProductsLv.ItemDataBound
If e.Item.ItemType = ListViewItemType.DataItem Then
Dim dataRow = DirectCast(e.Item.DataItem, DataRowView)
Dim path = Replace("~/Products/Immagine.ashx?FileName=" & dataRow("ImageName"), "\\machine\Foto\", "")
path = "http://" & Me.Request.Url.Host & ResolveUrl(path)
Dim sBase64 As String = "data:image/jpeg;base64,"
Using w As New System.Net.WebClient()
Dim buffer As Byte() = w.DownloadData(path)
sBase64 &= Convert.ToBase64String(buffer)
End Using
DirectCast(e.Item.FindControl("myIMG"), System.Web.UI.WebControls.Image).ImageUrl = sBase64
End If
End Sub
The code above seems to work, because I can find the string in the image source of the HTML produced by the server response, but the images are not visible.
What mistake am I making?
The value of sBase64 is

ImageUrl needs a relative or absolute path, so try setting the “src” attribute instead, as follows.
DirectCast(e.Item.FindControl("myIMG"), System.Web.UI.WebControls.Image).Attributes("src") = sBase64
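The value assigned to src is a data URI: a "data:" prefix, a MIME type, the ";base64," marker, and the encoded bytes. A minimal sketch of building one is shown here; the module name DataUriSketch and the helper ToDataUri are hypothetical, and the MIME type passed in is assumed to match the actual image format.

```vb
Imports System

Module DataUriSketch
    ' Hypothetical helper: builds a data URI from raw bytes and a MIME type.
    Function ToDataUri(bytes As Byte(), mimeType As String) As String
        Return "data:" & mimeType & ";base64," & Convert.ToBase64String(bytes)
    End Function
End Module
```

In the handler above, the bytes returned by DownloadData could then be assigned with Attributes("src") = ToDataUri(buffer, "image/jpeg").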

get HTMLDocument from HttpWebRequest without HtmlAgilityPack

I'm trying to write a function that returns an "HTMLDocument" using "HttpWebRequest" instead of a browser, but I'm stuck on transferring the inner HTML.
I don't understand how to set the value of "mWebPage", because VB doesn't accept "New" for HTMLDocument.
I know that I can use "HtmlAgilityPack", but I would like to test my current code by changing only the web request, without changing all the parsing code. (To do this I need an HtmlDocument.)
After this test, I'll try to change the parsing code as well.
Function mWebRe(ByVal mUrl As String) As HTMLDocument
Dim request As HttpWebRequest = CType(WebRequest.Create(mUrl), HttpWebRequest)
' Set some reasonable limits on resources used by this request
request.MaximumAutomaticRedirections = 4
request.MaximumResponseHeadersLength = 4
' Set credentials to use for this request.
request.Credentials = CredentialCache.DefaultCredentials
'Here I've tried many types
Dim mWebPage As HTMLDocument
Try
Dim request2 As HttpWebRequest = WebRequest.Create(mUrl)
Dim response2 As HttpWebResponse = request2.GetResponse()
Dim reader2 As StreamReader = New StreamReader(response2.GetResponseStream())
Dim WebContent As String = reader2.ReadToEnd()
'This is my last attempt
'This gives Null Reference Exception
mWebPage.Body.InnerHtml = WebContent
Catch ex As Exception
MsgBox(ex.ToString)
End Try
Return mWebPage
End Function
I've tried many ways (also importing the HTML Object Library) but nothing worked :(
Okay this is becoming more of a hack by the minute, but this should work.
First, you'll need to instantiate your WebBrowser control at the class level:
Private m_objWebBrowser As WebBrowser
Next add an Event Handler for the DocumentCompleted Event that contains all your HTML parsing data. You get an instance of the HtmlDocument using the OpenNew method of the WebBrowser control.
Private Sub HandleParsing(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
'Use your code for generating WebContent.
Dim WebContent As String = "<html></html>"
Dim mWebPage As HtmlDocument = DirectCast(sender, WebBrowser).Document.OpenNew(True)
mWebPage.Write(WebContent)
End Sub
Finally, you can trigger all of this by wiring up the Event Handler and navigating to some page or Html file on disk (DocumentCompleted fires asynchronously):
AddHandler m_objWebBrowser.DocumentCompleted, AddressOf HandleParsing
m_objWebBrowser.Navigate("www.google.com")
I found a solution on the web and modified my code as below.
To make it work you must add a reference to the "Microsoft HTML Object Library" (under COM references).
It is obsolete, but it seems to be the only way to create an HTML document without using a WebBrowser.
I hope it helps someone else.
Function mWebRe(ByVal mUrl As String) As MSHTML.HTMLDocument
Dim request As HttpWebRequest = WebRequest.Create(mUrl)
Dim doc As MSHTML.IHTMLDocument2 = New MSHTML.HTMLDocument
' Set some reasonable limits on resources used by this request
request.MaximumAutomaticRedirections = 4
request.MaximumResponseHeadersLength = 4
' Set credentials to use for this request.
request.Credentials = CredentialCache.DefaultCredentials
Try
Dim response As HttpWebResponse = request.GetResponse()
Dim reader As StreamReader = New StreamReader(response.GetResponseStream())
Dim WebContent As String = reader.ReadToEnd()
doc.clear()
doc.write(WebContent)
doc.close()
'To make sure that the data is fully load.
While (doc.readyState <> "complete")
'This for more waiting (if needed)
'System.Threading.Thread.Sleep(1000)
Application.DoEvents()
End While
Catch ex As Exception
MsgBox(ex.ToString)
End Try
Return doc
End Function

ListBox with html element

Can anyone offer me some advice? I currently have a ListBox that holds a list of images from any website. They are grabbed from the website via this method:
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
Dim PageElements As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("img")
For Each CurElement As HtmlElement In PageElements
imagestxt.Items.Add(imagestxt.Text & CurElement.GetAttribute("src") & Environment.NewLine)
Next
Timer1.Enabled = True
End Sub
I then use the picture control to get the image and display it.
pic1.Image = New Bitmap(New MemoryStream(New WebClient().DownloadData(imagestxt.SelectedItem.ToString)))
This method pulls the images and title from the HTML.
Private Function StrHTML12() As Boolean
Dim htmlDocument As HtmlDocument = WebBrowser1.Document
ListBox1.Items.Clear()
For Each element As HtmlElement In htmlDocument.All
ListBox1.Items.Add(element.TagName)
If element.TagName.ToUpper = "IMG" Then
imgtags.Items.Add(element.OuterHtml.ToString)
End If
If element.TagName.ToUpper = "TITLE" Then
titletags.Items.Add(element.OuterHtml.ToString)
Timer1.Enabled = False
End If
Next
End Function
This is a counting method to count how many empty alt="" or img alt='' attributes there are on the page.
Basically, what I am looking to do is this:
Have a program that checks each image and looks at its alt attribute. If the developer hasn't put anything in the alt attribute on the website, I want the image to show in a picture box, with the alt text either next to it or underneath it or something, but I have no idea how.
counter = InStr(counter + 1, strHTML, "<img alt=''")
counter = InStr(counter + 1, strHTML, "alt=''")
counter = InStr(counter + 1, strHTML, "alt=""")
The above seems really slow and messy. Is there a better way of doing it?
I do not have VB installed so I have not been able to test the code. I'm also not familiar with the datagridview component so have not attempted to integrate my code with it.
The code below should get you the title of the page, and loop through all the img tags that do not have (or have empty) alt-text
HtmlElement.GetAttribute(sAttr) returns the value of the attribute or an empty string.
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    Dim Title As String
    Dim ImSrc As String
    Dim PageElements As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("img")
    ' This line might need to be adjusted, see below
    Title = WebBrowser1.Document.GetElementsByTagName("title")(0).InnerText
    For Each CurElement As HtmlElement In PageElements
        If CurElement.GetAttribute("alt") = "" Then
            ' CurElement does not have alt-text
            ImSrc = CurElement.GetAttribute("src") ' This Image has no Alt Text
        Else
            ' CurElement has alt-text
        End If
    Next
    Timer1.Enabled = True
End Sub
The line that gets the title might need to be changed as I'm unsure how collections can be accessed. You want the first (hopefully only) element returned from the GetElementsByTagName function.
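If the page source is available as a string, the same check can be done without the InStr searches from the question. The sketch below swaps in HtmlAgilityPack and an XPath query, which is a different technique from the WebBrowser-based code in this answer. It assumes the HtmlAgilityPack package is referenced; the module name MissingAltSketch and the helper FindImagesWithoutAlt are hypothetical.

```vb
Imports System.Collections.Generic

Module MissingAltSketch
    ' Hypothetical helper: returns the src of every <img> whose alt
    ' attribute is missing, empty, or only whitespace.
    Function FindImagesWithoutAlt(html As String) As List(Of String)
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        Dim result As New List(Of String)
        Dim nodes = doc.DocumentNode.SelectNodes("//img[not(@alt) or normalize-space(@alt)='']")
        ' SelectNodes returns Nothing when no image matches.
        If nodes Is Nothing Then Return result

        For Each img In nodes
            result.Add(img.GetAttributeValue("src", String.Empty))
        Next
        Return result
    End Function
End Module
```

Each returned src could then be added to a ListBox or loaded into a PictureBox, and the list's Count gives the number of images lacking alt text.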

vb.NET WebRequest to read aspx page to string, access denied?

I'm trying to make an executable in VS2008 that will read a webpage source code using a vb.NET function into a string variable. The problem is that the page is not *.html but rather *.aspx.
I need a way to execute the aspx and get the displayed html into a string.
The page I want to read is any page of this type: http://www.realtor.ca/PropertyDetails.aspx?PropertyID=9620716
I have tried the following code, which works properly for html pages, but generates the wrong source code with "access denied" for the page title when I pass in the above aspx page.
Dim myReq As WebRequest = WebRequest.Create(url)
Dim myWebResponse As WebResponse = myReq.GetResponse()
Dim dataStream As Stream = myWebResponse.GetResponseStream()
Dim reader As New StreamReader(dataStream, System.Text.Encoding.UTF8)
Dim responseFromServer As String = reader.ReadToEnd()
Any suggestions or ideas?
I get the same thing while running wget from the command line:
wget http://www.realtor.ca/PropertyDetails.aspx?PropertyID=9620716
I guess the server relies on something being set in the browser before the response is delivered, e.g. a cookie. You might want to try using a WebBrowser control (it doesn't have to be visible) in the following way (this works):
Public Class Form1
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
AddHandler WebBrowser1.DocumentCompleted, New WebBrowserDocumentCompletedEventHandler(AddressOf DocumentCompletedHandler)
WebBrowser1.Navigate("http://www.realtor.ca/PropertyDetails.aspx?PropertyID=9620716")
End Sub
Private Sub DocumentCompletedHandler(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
Console.WriteLine(WebBrowser1.DocumentText)
End Sub
End Class
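If the server does depend on browser-supplied values, another thing to try is sending them from HttpWebRequest directly. This is only a sketch under that assumption: whether this particular site checks the User-Agent or cookies is not confirmed, the User-Agent string is an arbitrary example, and the module name BrowserLikeRequestSketch and the helper CreateBrowserLikeRequest are hypothetical.

```vb
Imports System.Net

Module BrowserLikeRequestSketch
    ' Hypothetical helper: prepares (but does not send) a request that
    ' carries a browser-style User-Agent and a cookie container.
    Function CreateBrowserLikeRequest(url As String, cookies As CookieContainer) As HttpWebRequest
        Dim request = DirectCast(WebRequest.Create(url), HttpWebRequest)
        ' Example value only; any realistic browser string could be used.
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        request.Accept = "text/html"
        ' Cookies set by one response are stored here and sent on later requests.
        request.CookieContainer = cookies
        Return request
    End Function
End Module
```

Reusing the same CookieContainer across requests lets a first request collect any cookie the server sets, so a second request to the same page can present it.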