I have the full HTML source of the website, and I want to extract the data inside a specific div tag.
Here is my code:
Dim request As WebRequest = WebRequest.Create("https://www.crowdsurge.com/store/index.php?storeid=1056&menu=detail&eventid=41815")
Using response As WebResponse = request.GetResponse()
Using reader As New StreamReader(response.GetResponseStream())
html = reader.ReadToEnd()
End Using
End Using
Dim pattern1 As String = "<div class = ""ei_value ei_date"">(.*)"
Dim m As Match = Regex.Match(html, pattern1)
If m.Success Then
MsgBox(m.Groups(1).Value)
End If
An easier approach for parsing HTML (especially from a source that you don't control) is to use the HTML Agility Pack, which would allow you to do something a little like:
Dim req As WebRequest = WebRequest.Create("https://www.crowdsurge.com/store/index.php?storeid=1056&menu=detail&eventid=41815")
Dim doc As New HtmlDocument()
Using res As WebResponse = req.GetResponse()
doc.Load(res.GetResponseStream())
End Using
Dim nodes = doc.DocumentNode.SelectNodes("//div[@class='ei_value ei_date']")
If nodes IsNot Nothing Then
For Each node In nodes
MsgBox(node.InnerText)
Next
End If
(I've assumed Option Infer)
Try this:
Dim pattern1 As String = "<div class\s*=\s*""ei_value ei_date"">(.*?)</div>"
or
Dim pattern1 As String = "<div class=""ei_value ei_date"">(.*?)</div>"
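As a rough illustration of why the lazy quantifier and the closing tag matter, here is a minimal sketch run against a made-up HTML fragment. The sample string and the module name RegexDivDemo are assumptions and are not taken from the real page. RegexOptions.Singleline is added on the assumption that the div content might span several lines.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexDivDemo
    Sub Main()
        ' Hypothetical fragment standing in for the downloaded page source
        Dim html As String = "<div class=""ei_value ei_date"">Fri 21 Jun</div><div class=""other"">x</div>"

        ' \s* tolerates optional spaces around "="; (.*?) stops at the first </div>
        Dim pattern As String = "<div class\s*=\s*""ei_value ei_date"">(.*?)</div>"

        ' Singleline lets "." match line breaks inside the div
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        If m.Success Then
            Console.WriteLine(m.Groups(1).Value)
        End If
    End Sub
End Module
```

With a greedy (.*) the capture would run on to the last </div> in the input. Note that even the lazy version stops too early if the target div contains nested divs, which is one reason an HTML parser tends to be more robust.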
I used the following code to parse the inner text of an HTML table into a DataTable (using the Html Agility Pack):
Imports System.Data
Imports System.IO
Imports System.Linq
Imports System.Net
Public Sub ParseHtmlTable(ByVal HtmlFilePath As String)
Dim webStream As Stream
Dim webResponse = ""
Dim req As FileWebRequest
Dim res As FileWebResponse
' Request the local HTML file through a file:// URI
req = WebRequest.Create("file:///" & HtmlFilePath)
req.Method = "GET" ' Method of sending HTTP Request(GET/POST)
res = req.GetResponse ' Send Request
webStream = res.GetResponseStream() ' Get Response
Dim webStreamReader As New StreamReader(webStream)
Dim htmldoc As New HtmlAgilityPack.HtmlDocument
htmldoc.LoadHtml(webStreamReader.ReadToEnd())
Dim nodes As HtmlAgilityPack.HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//table/tbody/tr")
Dim dtTable As New DataTable("Table1")
Dim Headers As List(Of String) = nodes(0).Elements("th").Select(Function(x) x.InnerText.Trim).ToList
For Each Hr In Headers
dtTable.Columns.Add(Hr)
Next
For Each node As HtmlAgilityPack.HtmlNode In nodes.Skip(1) ' Skip the header row, which has no td cells
Dim Row = node.Elements("td").Select(Function(x) x.InnerText.Trim).ToArray
dtTable.Rows.Add(Row)
Next
dtTable.WriteXml("G:\1.xml", XmlWriteMode.WriteSchema)
End Sub
How can I parse an HTML table containing images into a DataTable, saving the images as binary or saving their links, using VB.NET?
I finally found the answer. The images look like:
<img src="img.jpg"/>
We can use
.SelectSingleNode("./img").Attributes("src").Value
on the node that contains the image to return the image path.
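A minimal sketch of collecting the image links with the Html Agility Pack, assuming the images sit inside table cells. The sample markup and the module name ImageSrcDemo are made up for illustration.

```vbnet
Imports System
Imports HtmlAgilityPack

Module ImageSrcDemo
    Sub Main()
        ' Hypothetical table fragment; the real markup may differ
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<table><tr><td><img src=""img.jpg""/></td><td>Name</td></tr></table>")

        ' //td//img finds every image nested anywhere inside a table cell
        Dim images = doc.DocumentNode.SelectNodes("//td//img")

        ' SelectNodes returns Nothing when there is no match
        If images IsNot Nothing Then
            For Each img As HtmlNode In images
                ' GetAttributeValue falls back to "" when src is missing
                Console.WriteLine(img.GetAttributeValue("src", ""))
            Next
        End If
    End Sub
End Module
```

GetAttributeValue is used here instead of Attributes("src").Value so that an image without a src attribute does not raise a NullReferenceException.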
I made a web request to get the HTML of a website, and then I extract the
wanted links with HtmlAgilityPack,
like this:
'webrequest'
Dim rt As String = TextBox1.Text
Dim wRequest As WebRequest
Dim WResponse As WebResponse
Dim SR As StreamReader
wRequest = WebRequest.Create(rt)
WResponse = wRequest.GetResponse
SR = New StreamReader(WResponse.GetResponseStream)
rt = SR.ReadToEnd
TextBox2.Text = rt
'htmlagility to extract the links'
Dim htmlDoc1 As New HtmlDocument()
htmlDoc1.LoadHtml(rt)
Dim links = htmlDoc1.DocumentNode.SelectNodes("//*[@id='catlist-listview']/ul/li/a")
Dim hrefs = links.Cast(Of HtmlNode).Select(Function(x) x.GetAttributeValue("href", ""))
'join the `hrefs`, separated by newline, into one string'
textbox3.text = String.Join(Environment.NewLine, hrefs)
The links look like this:
http://wantedlink1
http://wantedlink2
http://wantedlink3
http://wantedlink4
http://wantedlink5
http://wantedlink6
http://wantedlink7
Now I want to add every line of the string to a ListBox instead of a TextBox,
one item per line.
There are about 400 http://wantedlink entries.
hrefs in your case is already an IEnumerable(Of String). Joining the items into one string and then splitting it again just to make this work is a roundabout approach. Since .AddRange() takes an array (which is what String.Split() gave you), you probably only need to convert hrefs to an array for .AddRange() to work:
ListBox1.Items.AddRange(hrefs.ToArray())
Use the AddRange method of the listbox's items collection and pass it the lines array of the textbox.
AddRange
Lines
Hint: It's one line of code.
It's OK, I found the answer:
Dim linklist = String.Join(Environment.NewLine, hrefs)
Dim parts As String() = linklist.Split(New String() {Environment.NewLine},
StringSplitOptions.None)
ListBox1.Items.AddRange(parts)
This adds all 400 links to the ListBox.
I'm trying to write a function that returns an "HtmlDocument" using "HttpWebRequest" instead of a browser, but I'm stuck on transferring the inner HTML.
I don't understand how to set the value of "mWebPage", because VB doesn't accept "New" for HTMLDocument.
I know that I can use "HtmlAgilityPack", but I would like to test my current code by changing only the web request, without changing all the parsing code. (To do this I need an HtmlDocument.)
After this test, I'll try to change the parsing code as well.
Function mWebRe(ByVal mUrl As String) As HTMLDocument
Dim request As HttpWebRequest = CType(WebRequest.Create(mUrl), HttpWebRequest)
' Set some reasonable limits on resources used by this request
request.MaximumAutomaticRedirections = 4
request.MaximumResponseHeadersLength = 4
' Set credentials to use for this request.
request.Credentials = CredentialCache.DefaultCredentials
'Here I've tried many types
Dim mWebPage As HTMLDocument
Try
Dim request2 As HttpWebRequest = WebRequest.Create(mUrl)
Dim response2 As HttpWebResponse = request2.GetResponse()
Dim reader2 As StreamReader = New StreamReader(response2.GetResponseStream())
Dim WebContent As String = reader2.ReadToEnd()
'This is my last attempt
'This gives Null Reference Exception
mWebPage.Body.InnerHtml = WebContent
Catch ex As Exception
MsgBox(ex.ToString)
End Try
Return mWebPage
End Function
I've tried many ways (including importing the HTML Object Library), but nothing worked :(
Okay this is becoming more of a hack by the minute, but this should work.
First, you'll need to instantiate your WebBrowser control at the class level:
Private m_objWebBrowser As WebBrowser
Next add an Event Handler for the DocumentCompleted Event that contains all your HTML parsing data. You get an instance of the HtmlDocument using the OpenNew method of the WebBrowser control.
Private Sub HandleParsing(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
'Use your code for generating WebContent.
Dim WebContent As String = "<html></html>"
Dim mWebPage As HtmlDocument = DirectCast(sender, WebBrowser).Document.OpenNew(True)
mWebPage.Write(WebContent)
End Sub
Finally, you can trigger all of this by wiring up the Event Handler and navigating to some page or Html file on disk (DocumentCompleted fires asynchronously):
AddHandler m_objWebBrowser.DocumentCompleted, AddressOf HandleParsing
m_objWebBrowser.Navigate("www.google.com")
I found a solution on the web and modified my code as shown below.
To make it work, you must add a reference to the "Microsoft HTML Object Library" (under the COM references).
It is obsolete, but it seems to be the only way to create an HTML document without using a WebBrowser.
I hope it helps someone else.
Function mWebRe(ByVal mUrl As String) As MSHTML.HTMLDocument
Dim request As HttpWebRequest = WebRequest.Create(mUrl)
Dim doc As MSHTML.IHTMLDocument2 = New MSHTML.HTMLDocument
' Set some reasonable limits on resources used by this request
request.MaximumAutomaticRedirections = 4
request.MaximumResponseHeadersLength = 4
' Set credentials to use for this request.
request.Credentials = CredentialCache.DefaultCredentials
Try
Dim response As HttpWebResponse = request.GetResponse()
Dim reader As StreamReader = New StreamReader(response.GetResponseStream())
Dim WebContent As String = reader.ReadToEnd()
doc.clear()
doc.write(WebContent)
doc.close()
'Wait until the document has fully loaded.
While (doc.readyState <> "complete")
'This for more waiting (if needed)
'System.Threading.Thread.Sleep(1000)
Application.DoEvents()
End While
Catch ex As Exception
MsgBox(ex.ToString)
End Try
Return doc
End Function
I've written a script that creates an HTML file based on a SQL query, and it has become necessary to email that HTML. Most of our execs use BlackBerrys, so I want to send the HTML file as the message body. I found a roundabout way to get this done: I added a WebBrowser control, have it load the file, and then use the code below to send. The problem I'm facing is that if I automate the code fully, it only emails part of the HTML document, whereas if I add a button and have it call the email function, it sends correctly. I have added a wait function in several different locations, thinking the HTML might not be fully created before emailing. I have to get this 100% automated. Is there a way to make .HTMLBody use the actual HTML file stored on the C: drive (the actual path is C:\Turnover.html)? Thanks, all, for any help.
Public Sub Email()
Dim strdate
Dim iCfg As Object
Dim iMsg As Object
strdate = Date.Today.TimeOfDay
iCfg = CreateObject("CDO.Configuration")
iMsg = CreateObject("CDO.Message")
With iCfg.Fields
.Item("http://schemas.microsoft.com/cdo/configuration/sendusing") = 1
.Item("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25
.Item("http://schemas.microsoft.com/cdo/configuration/smtpserver") = "xxxxx.com"
.Item("http://schemas.microsoft.com/cdo/configuration/smtpauthenticate") = 1
.Item("http://schemas.microsoft.com/cdo/configuration/sendemailaddress") = """Turnover Report"" <TurnoverReports@xxxxx.com>"
.Update()
End With
With iMsg
.Configuration = iCfg
.Subject = "Turnover Report"
.To = "xxxxx@xxxxx.com"
'.Cc = ""
.HTMLBody = WebBrowserReportView.DocumentText
.Send()
End With
iMsg = Nothing
iCfg = Nothing
End Sub
I used the function below to read in the local HTML file, then set
TextBox2.Text = getHTML("C:\Turnover2.html")
and also
.HTMLBody = TextBox2.Text
Private Function getHTML(ByVal address As String) As String
Dim rt As String = ""
Dim wRequest As WebRequest
Dim wResponse As WebResponse
Dim SR As StreamReader
wrequest = WebRequest.Create(address)
wResponse = wrequest.GetResponse
SR = New StreamReader(wResponse.GetResponseStream)
rt = SR.ReadToEnd
SR.Close()
Return rt
End Function
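For a file that is already on the local disk, a simpler alternative to going through WebRequest is File.ReadAllText. This is a swapped-in technique, not what the code above does. The sketch below is a minimal, self-contained version; the helper name ReadHtmlFile and the temporary file name are hypothetical.

```vbnet
Imports System
Imports System.IO

Module HtmlBodyDemo
    ' Hypothetical helper: returns the whole file as one string
    Function ReadHtmlFile(ByVal filePath As String) As String
        Return File.ReadAllText(filePath)
    End Function

    Sub Main()
        ' Write a small throwaway file so the sketch runs on its own
        Dim tempPath As String = Path.Combine(Path.GetTempPath(), "Turnover_demo.html")
        File.WriteAllText(tempPath, "<html><body><h1>Turnover Report</h1></body></html>")

        ' In the email routine this string would be assigned to .HTMLBody
        Console.WriteLine(ReadHtmlFile(tempPath))
    End Sub
End Module
```

Reading the file directly avoids both the WebBrowser control and the intermediate TextBox, so nothing depends on a control having finished loading.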
I have tried a few things, like converting the HTML to XML and then using an XML navigator to get the input elements, but I get lost whenever I start this process.
What I am trying to do is navigate to a website whose address is taken from textbox1.text.
Then I want to download the HTML, parse out the input elements (username, password, etc.), and place each element by type (id or name) into the RichTextBox with the attribute beside the name.
Example.
Username id="username"
Password id="password"
Any clues on how to properly use an HTML-to-XML converter, reader, or parser?
Thanks
It sounds like you just need a good HTML parsing library (instead of trying to use an XML parser). The HTML Agility Pack often fits this need. There are other options as well.
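A minimal sketch of that approach with the Html Agility Pack, assuming the page source is already in a string. The sample markup and the module name InputListDemo are made up, and the output format only approximates the example in the question.

```vbnet
Imports System
Imports HtmlAgilityPack

Module InputListDemo
    Sub Main()
        ' Hypothetical login markup standing in for the downloaded page
        Dim html As String = "<div><input type=""text"" id=""username""/><input type=""password"" name=""password""/></div>"
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when the page has no input elements
        Dim inputs = doc.DocumentNode.SelectNodes("//input")
        If inputs Is Nothing Then Return

        For Each node As HtmlNode In inputs
            Dim id As String = node.GetAttributeValue("id", "")
            Dim name As String = node.GetAttributeValue("name", "")
            ' Prefer id; fall back to name when id is absent
            If id <> "" Then
                Console.WriteLine(id & " id=""" & id & """")
            ElseIf name <> "" Then
                Console.WriteLine(name & " name=""" & name & """")
            End If
        Next
    End Sub
End Module
```

In a Windows Forms application, the Console.WriteLine calls would be replaced by appending each line to the RichTextBox.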
Something like the code below uses a StreamReader to read the source of the page into the string result:
Dim uri As String = "https://www.yourUrl.com"
Dim objRequest As HttpWebRequest = CType(WebRequest.Create(uri), HttpWebRequest)
Dim result As String
objRequest.Method = "GET"
Dim objResponse As HttpWebResponse = objRequest.GetResponse()
Dim sr As StreamReader
sr = New StreamReader(objResponse.GetResponseStream())
result = sr.ReadToEnd()
sr.Close
Then use a regular expression (regex) to extract the attributes needed, for example something like this:
Dim pattern As String = "(?<=Username id="")\w+"
Dim m0 As MatchCollection = Regex.Matches(result, pattern, RegexOptions.Singleline)
Dim strUserID As String = ""
For Each m As Match In m0
'Extract the value for the username id (the last match wins)
strUserID = m.Value
Next
You'll need to change the pattern so it can pick up the other attributes you want to find, but this shouldn't be difficult.