VB.net Fill Textbox with HTML string

I have a string from an HTML-enabled email that looks something like this:
</div><span color="red">Hi how are you?!</span></div><table><tr><td>Company Information</td></tr></table>
and so on (it's a long string, but you get the idea). It contains <div>, <span>, <table> tags and so forth.
I want to display the text in the textbox formatted like HTML (that is, laid out according to the <table>, <span> and other tags) while removing the literal tag text such as <span> from the textbox's text.
I need this because the page throws an error when it reads <span> and similar tags, treating them as a security issue.
My current way of reading the whole text and removing the offending tags is:
If Not DsAds.Tables(0).Rows(0).Item(0) Is DBNull.Value Then
Dim bodyInitial As String = DsAds.Tables(0).Rows(0).Item(0).ToString()
Dim newbody As String = bodyInitial.Replace("<br>", vbNewLine)
newbody = newbody.Replace("<b>", "")
newbody = newbody.Replace("</b>", "")
Bodylisting.Text = newbody
End If
I tried to incorporate the following:
Dim bodyInitial As String = DsAds.Tables(0).Rows(0).Item(0).ToString()
Dim myEncodedString As String
' Encode the string.
myEncodedString = bodyInitial.HttpUtility.HtmlEncode(bodyInitial)
Dim myWriter As New StringWriter()
' Decode the encoded string.
HttpUtility.HtmlDecode(bodyInitial, myWriter)
but I get errors with HttpUtility and strings.
Question:
So my question is, is there a way to actually see the HTML formatting and fill the textbox that way, or do I have to stick with my .Replace method?
<asp:TextBox ID="Bodylisting" runat="server" style="float:left; margin:10px; padding-bottom:500px;" Width="95%" TextMode="MultiLine" ></asp:TextBox>

I suggest you investigate HtmlAgilityPack. This library includes a parser giving you the ability to 'select' the <span> tags. Once you have them, you can strip them out, or grab the InnerHtml and do further processing. This is an example of how I do something similar with it.
Private Shared Function StripHtml(html As String, xPath As String) As String
Dim htmlDoc As New HtmlDocument()
htmlDoc.LoadHtml(html)
If xPath.Length > 0 Then
Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes(xPath)
If Not invalidNodes Is Nothing Then
For Each node As HtmlNode In invalidNodes
node.ParentNode.RemoveChild(node, True)
Next
End If
End If
Return htmlDoc.DocumentNode.WriteContentTo()
End Function
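If the goal is only to show the readable text in the textbox, a minimal sketch along the same lines follows. It assumes the HtmlAgilityPack package is referenced; `ToPlainText` and the module name `PlainTextSketch` are hypothetical names, not part of the library or of the answer above.

```vb
Imports HtmlAgilityPack

Module PlainTextSketch
    ' Hypothetical helper (not part of HtmlAgilityPack): returns the text of an
    ' HTML fragment with every tag removed.
    Public Function ToPlainText(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        ' InnerText joins the text nodes and drops the markup;
        ' DeEntitize converts entities such as &amp; back to characters.
        Return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText)
    End Function
End Module
```

It could then be called as `Bodylisting.Text = ToPlainText(bodyInitial)`. Note that a multi-line TextBox renders as a <textarea>, which shows plain text only, so the table layout itself is not preserved.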

Related

VBA Microsoft HTML Object Library HTML Document HTMLUnknownElement not displaying innerHTML or innerText

Using VBA 7.1 on a MS Access for Office 365 MSO version 16.0.x 64 bit
I have a reference to Microsoft HTML Object Library (mshtml.dll) 11.0.x set
I have the following code
Dim myHTMLDoc As New HTMLDocument
Dim myEnvironTag As HTMLUnknownElement
myHTMLDoc.body.innerHTML = "some text <environ>EnvironmentURL</environ> some other text"
For Each myEnvironTag In myHTMLDoc.getElementsByTagName("environ")
MsgBox myEnvironTag.innerHTML
MsgBox myEnvironTag.innerText
Next myEnvironTag
From the string <environ>EnvironmentURL</environ>
I am trying to return the string EnvironmentURL
In the code sample above both MessageBoxes are returning zero length strings.
Any idea how to return the string inside custom tags like this?
Thanks in advance
This workaround works but... not sure why it doesn't work without the replace function changing it into an anchor tag?
Dim myHTMLDoc As New HTMLDocument
Dim myEnvironTag As HTMLAnchorElement
myHTMLDoc.body.innerHTML = Replace(Replace("some text <environ>EnvironmentURL</environ> some other text", "<environ>", "<a>"), "</environ>", "</a>")
For Each myEnvironTag In myHTMLDoc.getElementsByTagName("a")
MsgBox myEnvironTag.innerHTML
MsgBox myEnvironTag.innerText
Next myEnvironTag
I was able to run the code snippet from your question and got the desired results ("EnvironmentURL"), so I'm not sure what is the cause here.
Maybe this will depend on your version of Excel, but since HTMLDocument was designed to handle HTML code and "environ" is not a supported element in HTML, maybe an XML parser would be better suited in this task. (More info here)
With that in mind, here's an example of code to extract environ tags and print the text value between the tags :
Dim MyXml As String
'The string must be wrapped in a single root element (an opening and a closing tag) to be valid XML.
MyXml = "<xml>some text <environ>EnvironmentURL</environ> some other text</xml>"
'Make sure to add a reference to the Microsoft XML library in your VBA project
Dim objXML As MSXML2.DOMDocument60 'or DOMDocument for older versions of Excel
Set objXML = New MSXML2.DOMDocument60
If Not objXML.LoadXML(MyXml) Then
Err.Raise objXML.parseError.ErrorCode, , objXML.parseError.reason
End If
Dim elem As IXMLDOMElement
For Each elem In objXML.getElementsByTagName("environ")
Debug.Print elem.text
Next

VB.net extract variable value inside <span> tags?

I'm trying to extract a decimal value that may vary from inside HTML using VB.net.
As sort of a test, here is the code I'm using:
Dim result As String = "<td class='fl'><label>Balance:</label></td><td nowrap class='fd'><span>$999,999.99</span></td></tr></table></td>"
Dim RegexResult = Regex.Match(result, "^(\$|)([1-9]\d{0,2}(\,\d{3})*|([1-9]\d*))(\.\d{2})?$")
Console.WriteLine(RegexResult)
FYI, I found that expression here:
In this example, the extracted result should be: $999999.99. This will then be modified to strip the dollar sign.
Regex result, when viewed in the Visual Studio console is {}. How do I modify the expression to account for the <span> tags?
Even if your regex worked now, don't use regex to parse dynamic HTML content.
Use an available HTML parser like HtmlAgilityPack, that's a much more reliable solution:
Dim html = "<td class='fl'><label>Balance:</label></td><td nowrap class='fd'><span>$999,999.99</span></td></tr></table></td>"
Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(html)
Dim td = doc.DocumentNode.SelectSingleNode("//*[contains(@class,'fd')]")
Dim spanText = td.Descendants("span").First().InnerText
Dim balance As Decimal
Dim usCulture = New CultureInfo("en-us")
Dim valid = Decimal.TryParse(spanText, NumberStyles.Currency, usCulture, balance)

vb net + getting content from a div with htmlagilitypack

Flow:
1. (OK) I download a JSON file
2. (OK) I parse a value from the JSON object that contains HTML
3. (NOT OK) I display the values inside div.countries
My code:
Dim webClient As New System.Net.WebClient
Dim result As String = webClient.DownloadString("http://example.com/countries.json")
Dim values As JObject = JObject.Parse(result)
Dim finalHTML As String = values.GetValue("countries_html")
Basically, the finalHTML variable looks like this:
<div class="country_name">USA</div>
<div class="country_name">Ireland</div>
<div class="country_name">Australia</div>
I'm stuck and don't know how to move on.
I need to go over every div.country_name and get its inner text. Hope that makes sense.
Since the finalHTML string already contains only the target div elements, you can simply load the string into an HtmlDocument object and use a bit of LINQ to project the divs into a collection (IEnumerable, List<T>, or whatever best suits your need) of InnerText strings:
....
Dim finalHTML As String = values.GetValue("countries_html")
Dim doc = New HtmlDocument()
doc.LoadHtml(finalHTML)
Dim countries = doc.DocumentNode.Elements("div").Select(Function(o) o.InnerText.Trim())
'print the result as comma separated text to console:
Console.WriteLine(String.Join(",", countries))
output :
USA,Ireland,Australia
Here's a nice article on using the HAP: http://www.mikesdotnetting.com/article/273/using-the-htmlagilitypack-to-parse-html-in-asp-net

How to extract text content from tags in .NET?

I'm trying to write a VB.NET function to extract specific text content from tags. I wrote this function:
Public Function GetTagContent(ByRef instance_handler As String, ByRef start_tag As String, ByRef end_tag As String) As String
Dim s As String = ""
Dim content() As String = instance_handler.Split(start_tag)
If content.Count > 1 Then
Dim parts() As String = content(1).Split(end_tag)
If parts.Count > 0 Then
s = parts(0)
End If
End If
Return s
End Function
But it doesn't work. For example, with the following debug code
Dim testString As String = "<body>my example <div style=""margin-top:20px""> text to extract </div> <br /> another line.</body>"
txtOutput.Text = testString.GetTagContent("<div style=""margin-top:20px"">", "</div>")
I get only the string "body>my example" instead of "text to extract".
Can anyone help me? Thanks in advance.
I wrote a new routine and the following code works; however, I would like to know whether there is a better-performing way to do it:
Dim s As New StringBuilder()
Dim i As Integer = instance_handler.IndexOf(start_tag, 0)
If i < 0 Then
Return ""
Else
i = i + start_tag.Length
End If
Dim j As Integer = instance_handler.IndexOf(end_tag, i)
If j < 0 Then
s.Append(instance_handler.Substring(i))
Else
s.Append(instance_handler.Substring(i, j - i))
End If
Return s.ToString
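The routine above can be packaged as an extension method so that the call in the question, `testString.GetTagContent(...)`, compiles. This is only a sketch, assuming ordinal (case-sensitive) matching is acceptable; the module name `TagExtensions` is hypothetical. It returns the substring directly, since a StringBuilder adds nothing when only one piece is appended.

```vb
Imports System
Imports System.Runtime.CompilerServices

Module TagExtensions
    ' Returns the text between the first startTag and the next endTag.
    ' Returns "" when startTag is missing, and the rest of the string
    ' when endTag is missing (same behaviour as the routine above).
    <Extension()>
    Public Function GetTagContent(source As String, startTag As String, endTag As String) As String
        Dim i As Integer = source.IndexOf(startTag, StringComparison.Ordinal)
        If i < 0 Then Return ""
        i += startTag.Length

        Dim j As Integer = source.IndexOf(endTag, i, StringComparison.Ordinal)
        If j < 0 Then Return source.Substring(i)
        Return source.Substring(i, j - i)
    End Function
End Module
```

With the test string from the question, the call returns " text to extract " including the surrounding spaces, so a `.Trim()` on the result may be wanted.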
XPath is one way of accomplishing this task. I'm sure others will suggest LINQ. Here's an example using XPath:
Dim testString As String = "<body>my example <div style=""margin-top:20px""> text to extract </div> <br /> another line.</body>"
Dim doc As XmlDocument = New XmlDocument()
doc.LoadXml(testString)
MessageBox.Show(doc.SelectSingleNode("/body/div").InnerText)
Obviously, a more complex document may require a more complex xpath than simply "/body/div", but it's still pretty simple.
If you need to get a list of multiple elements that match the path, you can use doc.SelectNodes.

VB.NET ~ how does one navigate to a website and download the html then parse out code to only display input elements?

I have tried a few things like converting HTML to XML and then using an XML navigator to get input elements but I get lost whenever I start this process.
What I am trying to do is navigate to a website whose address is taken from textbox1.text,
then download the HTML and parse out the input elements (username, password, etc.) and place each element by type (id or name) into the richtextbox, with the attribute beside the name.
Example.
Username id="username"
Password id="password"
Any clues on how to properly use an HTML-to-XML converter, reader, or parser?
Thanks
It sounds like you just need a good HTML parsing library (instead of trying to use an XML parser). The HTML Agility Pack often fits this need. There are other options as well.
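As a rough sketch of that approach, assuming the HtmlAgilityPack package is referenced and the page source has already been downloaded into a string: `DescribeInputs` and the module name are hypothetical, chosen only for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module InputListingSketch
    ' Hypothetical helper: lists every <input> element with its id and name attributes.
    Public Function DescribeInputs(html As String) As List(Of String)
        Dim lines As New List(Of String)()
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim inputs As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//input")
        If inputs Is Nothing Then Return lines

        For Each node As HtmlNode In inputs
            ' The second argument is the default used when the attribute is absent.
            Dim id As String = node.GetAttributeValue("id", "")
            Dim name As String = node.GetAttributeValue("name", "")
            lines.Add(String.Format("id=""{0}"" name=""{1}""", id, name))
        Next
        Return lines
    End Function
End Module
```

In a Windows Forms project the result could be shown with something like `RichTextBox1.Lines = DescribeInputs(html).ToArray()`, where `RichTextBox1` and `html` stand in for your own control and downloaded page source.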
Something like the code below uses a StreamReader to read the source of the page into the string result:
' Requires Imports System.Net and Imports System.IO
Dim uri As String = "https://www.yourUrl.com"
Dim objRequest As HttpWebRequest = CType(WebRequest.Create(uri), HttpWebRequest)
Dim result As String
objRequest.Method = "GET"
Dim objResponse As HttpWebResponse = CType(objRequest.GetResponse(), HttpWebResponse)
Dim sr As StreamReader
sr = New StreamReader(objResponse.GetResponseStream())
result = sr.ReadToEnd()
sr.Close()
Then use a regular expression (regex) to extract the attributes needed, for example something like this:
' Requires Imports System.Text.RegularExpressions
Dim pattern As String = "(?<=Username id="")\w+"
Dim m0 As MatchCollection = Regex.Matches(result, pattern, RegexOptions.Singleline)
Dim strUserID As String = ""
For Each m As Match In m0
'extract the value for the username id (the last match wins)
strUserID = m.Value
Next
You'll need to change the pattern so it picks up the other attributes you want to find, but this shouldn't be difficult.