Parsing an HTML table containing images into a DataTable

I used the following code to parse the inner text of an HTML table into a DataTable (using Html Agility Pack):
Imports System.IO
Imports System.Net
Public Sub ParseHtmlTable(ByVal HtmlFilePath As String)
Dim req As FileWebRequest
Dim res As FileWebResponse
' Request the local HTML file through a file:/// URI
req = CType(WebRequest.Create("file:///" & HtmlFilePath), FileWebRequest)
req.Method = "GET" ' Read the file
res = CType(req.GetResponse(), FileWebResponse) ' Open the file
Dim webStreamReader As New StreamReader(res.GetResponseStream())
Dim htmldoc As New HtmlAgilityPack.HtmlDocument
htmldoc.LoadHtml(webStreamReader.ReadToEnd())
webStreamReader.Close() ' Also closes the underlying response stream
res.Close()
Dim nodes As HtmlAgilityPack.HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//table/tbody/tr")
Dim dtTable As New DataTable("Table1")
Dim Headers As List(Of String) = nodes(0).Elements("th").Select(Function(x) x.InnerText.Trim).ToList
For Each Hr In Headers
dtTable.Columns.Add(Hr)
Next
For Each node As HtmlAgilityPack.HtmlNode In nodes
Dim Row = node.Elements("td").Select(Function(x) x.InnerText.Trim).ToArray
' The header row has only <th> cells, so skip rows without <td> cells
If Row.Length > 0 Then dtTable.Rows.Add(Row)
Next
dtTable.WriteXml("G:\1.xml", XmlWriteMode.WriteSchema)
End Sub
How can I parse an HTML table containing images into a DataTable using VB.NET, saving the images either as binary data or as their links?

I finally found the answer. The images look like:
<img src="img.jpg"/>
We can use
.SelectSingleNode("./img").Attributes("src").Value
to return the image path from the node (for example, the td) that contains it.
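A minimal sketch that combines the table parser with the img lookup above, assuming Html Agility Pack is referenced, header cells are <th>, and data cells are <td>. Each cell stores the src of its first <img> when it has one, otherwise its trimmed text. The module and function names are hypothetical, and this stores links rather than binary image data.

```vbnet
Imports System.Data
Imports System.Linq
Imports HtmlAgilityPack

' Hypothetical helper module (names are illustrative, not from the question).
Module HtmlTableWithImages

    ' Returns the src of the first <img> inside the cell, or the cell's trimmed text if it has none.
    Function CellValue(cell As HtmlNode) As String
        Dim img As HtmlNode = cell.SelectSingleNode(".//img")
        If img IsNot Nothing Then
            Return img.GetAttributeValue("src", "")
        End If
        Return cell.InnerText.Trim()
    End Function

    ' Converts the rows of the first <table> in a local HTML file into a DataTable.
    Function ParseHtmlTable(htmlFilePath As String) As DataTable
        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.Load(htmlFilePath) ' reads the file directly; no FileWebRequest needed

        Dim dt As New DataTable("Table1")
        ' "//tr" under the first table matches rows with or without a <tbody>.
        Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes("(//table)[1]//tr")
        If rows Is Nothing Then Return dt ' SelectNodes returns Nothing when nothing matches

        ' Header row: each <th> becomes a column name.
        For Each th As HtmlNode In rows(0).Elements("th")
            dt.Columns.Add(th.InnerText.Trim())
        Next

        ' Data rows: keep only rows whose <td> count matches the header.
        For Each tr As HtmlNode In rows.Skip(1)
            Dim values As String() = tr.Elements("td").Select(Function(td) CellValue(td)).ToArray()
            If values.Length = dt.Columns.Count Then dt.Rows.Add(values)
        Next
        Return dt
    End Function

End Module
```

For example, call ParseHtmlTable with a placeholder path such as "C:\pages\table.html", then save the result with dt.WriteXml("G:\1.xml", XmlWriteMode.WriteSchema) as in the question. To store the images themselves as binary, one option is a Byte() column filled with File.ReadAllBytes when src is a local path; that variant is not shown here.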

Related

Add a JSON object to an HTTP request body in VB.NET

I have an API in VB.NET, and I want to make an HTTP request to another API from one of my methods. I've created an instance of the WebClient class and added my headers, but the problem is that I can't add my JSON object as the body of the HTTP request. Here is my code:
Dim webc As New WebClient
webc.Headers.Add("Content-Type: application/json")
webc.Headers.Add("Authorization: " + testHeader)
webc.Headers("x-iyzi-rnd") = random_string
Dim url = "https://sandbox-api.iyzipay.com/"
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
Dim html As Byte() = webc.DownloadData(url)
Dim utf As UTF8Encoding = New UTF8Encoding()
Dim response As String = utf.GetString(html)
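DownloadData always issues a GET and never sends a body. A minimal sketch, assuming the target API accepts a JSON POST: WebClient.UploadString writes a string into the request body using the HTTP method you pass. The function name and parameters are hypothetical, and the actual resource path under the sandbox URL is not given in the question.

```vbnet
Imports System.Net
Imports System.Text

Module JsonPostSketch

    ' Hypothetical helper: POSTs jsonBody to url and returns the response body as text.
    Function PostJson(url As String, jsonBody As String, authorization As String, rnd As String) As String
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
        Using client As New WebClient()
            client.Encoding = Encoding.UTF8 ' encoding used for the request body
            client.Headers(HttpRequestHeader.ContentType) = "application/json"
            client.Headers(HttpRequestHeader.Authorization) = authorization
            client.Headers("x-iyzi-rnd") = rnd
            ' UploadString sends jsonBody as the request body; DownloadData cannot.
            Return client.UploadString(url, "POST", jsonBody)
        End Using
    End Function

End Module
```

Usage would look like PostJson(url, "{""key"":""value""}", testHeader, random_string), where the JSON is a placeholder for your real payload.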

Download a JSON object from a URL with VBA

I have an MS Access project that requires me to retrieve and parse a JSON object from a URL. I have done the parsing part, but I cannot figure out the correct way to retrieve the JSON from the URL. If I paste the URL into IE, it automatically downloads the JSON object as a .json file. I have searched Google for solutions, and none of them works for me. I think the problem is that the URL looks like "https://******.com/rest/external/session/123", which is not like a standard XMLHTTP request URL, so most solutions that use an XMLHTTP request do not work for me.
I have tried the following code to get it from the URL, but all I get is the homepage's DOM tree instead of the JSON.
Dim wb As XMLHTTP
Set wb = New XMLHTTP
wb.Open "POST", "https://******.com/rest/external/session/123", False
wb.send
Do Until wb.Status = 200 And wb.ReadyState = 4
DoEvents
Loop
Debug.Print wb.responseText
Does anyone have any idea what I should do here?
Any help is appreciated!
Update:
I have tried both POST and GET HTTP requests, and both gave the same result.
The following are the sessions captured by Fiddler.
[Screenshot: Fiddler sessions captured when I paste the URL directly into IE]
[Screenshot: Fiddler sessions captured when I use the code above]
The code below only illustrates the logic; you will need to adapt it to build your own code.
Option Compare Database
Dim ApiUrl As String
Dim reader As New XMLHTTP60
Dim coll As Collection
Dim Json As New clsJSONParser
Public Sub ApiInitalisation()
ApiUrl = "http://private-anon-73376961e-count.apiary-mock.com/"
End Sub
Public Sub GetPerson()
On Error GoTo cmdLogIn_Click_Err
'For API
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim contact As Variant
Api.ApiInitalisation
ApiUrl = ApiUrl & "users/5428a72c86abcdee98b7e359"
reader.Open "GET", ApiUrl, False
'reader.setRequestHeader "Accept", "application/json"
reader.send
'Temporary variable to store the response
Dim egTran As String
' Add data to Table
If reader.Status = 200 Then
Set db = CurrentDb
Set rs = db.OpenRecordset("tblPerson", dbOpenDynaset, dbSeeChanges)
egTran = "[" & reader.responseText & "]"
Set coll = Json.parse(egTran)
For Each contact In coll
rs.AddNew
rs!FName = contact.Item("name")
rs!Mobile = contact.Item("phoneNumber")
rs!UserID = contact.Item("deviceId")
rs!SID = contact.Item("_id")
rs.Update
Next
rs.Close
Else
MsgBox "Unable to import data."
End If
Exit Sub
cmdLogIn_Click_Err:
MsgBox "Error " & Err.Number & ": " & Err.Description
End Sub

VB.NET: from a string to a ListBox, line by line

I made a web request to get the HTML code of a website, and then I extract the wanted links with HtmlAgilityPack like this:
'webrequest'
Dim rt As String = TextBox1.Text
Dim wRequest As WebRequest
Dim WResponse As WebResponse
Dim SR As StreamReader
wRequest = WebRequest.Create(rt)
WResponse = wRequest.GetResponse
SR = New StreamReader(WResponse.GetResponseStream)
rt = SR.ReadToEnd
TextBox2.Text = rt
'htmlagility to extract the links'
Dim htmlDoc1 As New HtmlDocument()
htmlDoc1.LoadHtml(rt)
Dim links = htmlDoc1.DocumentNode.SelectNodes("//*[@id='catlist-listview']/ul/li/a")
Dim hrefs = links.Cast(Of HtmlNode).Select(Function(x) x.GetAttributeValue("href", ""))
'join the `hrefs`, separated by newline, into one string'
textbox3.text = String.Join(Environment.NewLine, hrefs)
The links look like this:
http://wantedlink1
http://wantedlink2
http://wantedlink3
http://wantedlink4
http://wantedlink5
http://wantedlink6
http://wantedlink7
Now I want to add each line of the string to a ListBox instead of a TextBox, one item per line.
There are about 400 http://wantedlink entries.
In your case, hrefs already holds an IEnumerable(Of String). Joining the items into one string and then splitting it again to make it work is roundabout. Since String.Split() returns an array, you probably only need to project hrefs into an array for .AddRange() to work:
ListBox1.Items.AddRange(hrefs.ToArray())
Use the AddRange method of the ListBox's Items collection and pass it the Lines array of the TextBox.
Hint: It's one line of code.
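A sketch of the hint, assuming TextBox3 is multiline and holds one link per line as in the question; the module and method names are hypothetical. Filtering out blank lines and wrapping the call in BeginUpdate/EndUpdate are optional additions, not part of the one-line hint.

```vbnet
Imports System.Linq
Imports System.Windows.Forms

' Hypothetical helper module for the TextBox.Lines approach.
Module ListBoxFillSketch

    ' Adds every non-empty line of a multiline TextBox as a separate ListBox item.
    Sub FillFromTextBox(target As ListBox, source As TextBox)
        ' TextBox.Lines returns one string per line; drop blank ones (e.g. from a trailing newline).
        Dim lines As String() = source.Lines.Where(Function(l) l.Trim().Length > 0).ToArray()
        target.BeginUpdate() ' suspend repainting while hundreds of items are added
        target.Items.AddRange(lines)
        target.EndUpdate()
    End Sub

End Module
```

Called as FillFromTextBox(ListBox1, TextBox3), this gives one ListBox item per link.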
It's OK, I found the answer:
Dim linklist = String.Join(Environment.NewLine, hrefs)
Dim parts As String() = linklist.Split(New String() {Environment.NewLine},
StringSplitOptions.None)
ListBox1.Items.AddRange(parts)
This adds all 400 links to the ListBox.

Get an HTMLDocument from HttpWebRequest without HtmlAgilityPack

I'm trying to write a function that returns an HTMLDocument using HttpWebRequest instead of a browser, but I'm stuck on transferring the inner HTML.
I don't understand how to set the value of mWebPage, because VB doesn't accept New for HTMLDocument.
I know that I can use HtmlAgilityPack, but I would like to test my current code by changing only the web request and not the parsing code (to do this I need an HtmlDocument).
After this test, I'll try to change the parsing code as well.
Function mWebRe(ByVal mUrl As String) As HTMLDocument
Dim request As HttpWebRequest = CType(WebRequest.Create(mUrl), HttpWebRequest)
' Set some reasonable limits on resources used by this request
request.MaximumAutomaticRedirections = 4
request.MaximumResponseHeadersLength = 4
' Set credentials to use for this request.
request.Credentials = CredentialCache.DefaultCredentials
'I've tried many types here
Dim mWebPage As HTMLDocument
Try
Dim request2 As HttpWebRequest = WebRequest.Create(mUrl)
Dim response2 As HttpWebResponse = request2.GetResponse()
Dim reader2 As StreamReader = New StreamReader(response2.GetResponseStream())
Dim WebContent As String = reader2.ReadToEnd()
'This is my last attempt
'This gives Null Reference Exception
mWebPage.Body.InnerHtml = WebContent
Catch ex As Exception
MsgBox(ex.ToString)
End Try
Return mWebPage
End Function
I've tried many ways (including importing the HTML Object Library), but nothing worked :(
Okay, this is becoming more of a hack by the minute, but it should work.
First, you'll need to instantiate your WebBrowser control at the class level:
Private m_objWebBrowser As New WebBrowser()
Next, add an event handler for the DocumentCompleted event that contains all your HTML parsing code. You get an instance of HtmlDocument by calling the OpenNew method of the WebBrowser control's Document.
Private Sub HandleParsing(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
'Use your code for generating WebContent.
Dim WebContent As String = "<html></html>"
Dim mWebPage As HtmlDocument = DirectCast(sender, WebBrowser).Document.OpenNew(True)
mWebPage.Write(WebContent)
End Sub
Finally, you can trigger all of this by wiring up the Event Handler and navigating to some page or Html file on disk (DocumentCompleted fires asynchronously):
AddHandler m_objWebBrowser.DocumentCompleted, AddressOf HandleParsing
m_objWebBrowser.Navigate("www.google.com")
I found a solution on the web and modified my code as below.
To make it work, you must add a reference to the "Microsoft HTML Object Library" (under COM references).
It is obsolete, but it seems to be the only way to create an HTML document without using a WebBrowser.
I hope it helps someone else.
Function mWebRe(ByVal mUrl As String) As MSHTML.HTMLDocument
Dim request As HttpWebRequest = WebRequest.Create(mUrl)
Dim doc As MSHTML.IHTMLDocument2 = New MSHTML.HTMLDocument
' Set some reasonable limits on resources used by this request
request.MaximumAutomaticRedirections = 4
request.MaximumResponseHeadersLength = 4
' Set credentials to use for this request.
request.Credentials = CredentialCache.DefaultCredentials
Try
Dim response As HttpWebResponse = request.GetResponse()
Dim reader As StreamReader = New StreamReader(response.GetResponseStream())
Dim WebContent As String = reader.ReadToEnd()
doc.clear()
doc.write(WebContent)
doc.close()
'Make sure that the data is fully loaded.
While (doc.readyState <> "complete")
'Uncomment for extra waiting (if needed)
'System.Threading.Thread.Sleep(1000)
Application.DoEvents()
End While
Catch ex As Exception
MsgBox(ex.ToString)
End Try
Return doc
End Function

Extract a specific HTML string from a website's HTML source in VB.NET

I have the full HTML source code of the website. I want to extract the data inside a specific div tag.
Here is my code:
Dim html As String
Dim request As WebRequest = WebRequest.Create("https://www.crowdsurge.com/store/index.php?storeid=1056&menu=detail&eventid=41815")
Using response As WebResponse = request.GetResponse()
Using reader As New StreamReader(response.GetResponseStream())
html = reader.ReadToEnd()
End Using
End Using
Dim pattern1 As String = "<div class = ""ei_value ei_date"">(.*)"
Dim m As Match = Regex.Match(html, pattern1)
If m.Success Then
MsgBox(m.Groups(1).Value)
End If
An easier approach for parsing HTML (especially from a source that you don't control) is to use the HTML Agility Pack, which would allow you to do something a little like:
Dim req As WebRequest = WebRequest.Create("https://www.crowdsurge.com/store/index.php?storeid=1056&menu=detail&eventid=41815")
Dim doc As New HtmlAgilityPack.HtmlDocument()
Using res As WebResponse = req.GetResponse()
doc.Load(res.GetResponseStream())
End Using
Dim nodes = doc.DocumentNode.SelectNodes("//div[@class='ei_value ei_date']")
If nodes IsNot Nothing Then
For Each node In nodes
MsgBox(node.InnerText)
Next
End If
(I've assumed Option Infer)
Try this:
Dim pattern1 As String = "<div class\s*=\s*""ei_value ei_date"">(.*?)</div>"
or
Dim pattern1 As String = "<div class=""ei_value ei_date"">(.*?)</div>"
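By default, "." does not match newlines, so if the div's content spans several lines neither pattern above will match. A sketch, assuming the markup from the question, that enables RegexOptions.Singleline, tolerates whitespace around "=", and collects every match instead of only the first. The module and function names are hypothetical; for markup you don't control, the Html Agility Pack approach above is more robust.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module DivRegexSketch

    ' Returns the trimmed inner text of every <div class="ei_value ei_date"> in html.
    Function ExtractEventDates(html As String) As List(Of String)
        ' (.*?) is lazy, so each match stops at the first closing </div> (nested divs would break this).
        Dim pattern As String = "<div\s+class\s*=\s*""ei_value ei_date""\s*>(.*?)</div>"
        ' Singleline lets "." cross line breaks; IgnoreCase also accepts <DIV ...>.
        Dim options As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase
        Dim results As New List(Of String)
        For Each m As Match In Regex.Matches(html, pattern, options)
            results.Add(m.Groups(1).Value.Trim())
        Next
        Return results
    End Function

End Module
```

With made-up input such as "<div class=""ei_value ei_date"">" & vbLf & "Sat 12 Oct</div>", the function returns a list containing the single item "Sat 12 Oct".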