Is there a better way to extract HTML code using Visual Basic - html

I'm trying to extract some HTML code here, I only want the final String to say
'Entity B'. Is there a better way to do this than what I have done here?
Also this is a format for many entries, so Entity B wont always be Entity B and same for Entity C
SMethod = "<b>Entity B<br/>Entity C</b>"
SMethod = SMethod.Replace("</b>", "</c>")
SMethod = SMethod.Replace("<br/>", "</b><c>")
SMethod = "<a>" & SMethod & "</a>"
Dim ShippingMethod As XDocument = XDocument.Parse(SMethod)
SMethod = ShippingMethod.Element("a").Element("b").Value.Trim

I'm not 100% clear what your end game is, but as for the first sentence of your question where you want to take a string with HTML code in it and remove all the code, this function will remove any tag enclosed in <>:
Public Function RemoveHTML(ByVal input As String) As String
While InStr(input, "<") > 0
Dim tagStart As Integer = InStr(input, "<")
Dim tagEnd As Integer = InStr(input, ">")
input = Left(input, tagStart - 1) & Right(input, Len(input) - tagEnd)
End While
Return input
End Function
And if you're also trying to trim off anything after the <br/> tag, this will do that:
Public Function OneEntity(ByVal input As String) As String
If InStr(input, "<br/>" Then
Dim parts() As String = Split(input, "<br/>")
Return RemoveHTML(parts(0))
Else
Return RemoveHTML(input)
End If
End Function

Related

Removing a portion of HTML within a HTML page

I am trying to remove some tags with content while loading a page to restrict not sending few tags.
I was doing with search string and its not helpful for larger data set.
string startTag = "<section>"+Environment.NewLine+
" <div id=\"nonPrintable123\">";
var startIndex = htmlString.IndexOf(startTag);
var html = htmlString.Substring(0, startIndex) + "</div></form> </body></html>";
Is there any way so I could use Regex and remove /replace a whole div- child with empty string?
The Data within <Section> {data} </Section>
should be replaced with empty or any other suppression.
using String.Replace has worked for me in the past.
https://learn.microsoft.com/en-us/dotnet/api/system.string.replace?view=netframework-4.7.2
startString &= startString.Replace("<div>HTML you want to replace</div>", "")
I did with the following piece of code using vb.net:
Private Sub removehtml()
Dim str As String = " <div id=nonPrintable123> <!--# Start --> hjhjhty iuh hwjkednjkb dvhv xcaisfdchascjk bkasj df kh <!--End #-->"
Dim sindex As Integer = 0
Dim eindex As Integer = 0
sindex = str.IndexOf("<!--#")
eindex = str.IndexOf("#-->")
Dim substr As String = String.Empty
substr = str.Substring(sindex, (eindex - sindex) + 4)
str = str.Replace(substr, String.Empty)
End Sub
By this way I have removed all the non required data from given string

Parsing HTML to recreate tables in a Word Document using VBA

Is there a way of taking html code for a table and printing out the same table in a word document using VBA (VBA should be able to parse the html code block for a table)?
It is possible to take the contents of the table and copy them into a new table created in Word, however is it possible to recreate a table using the html code and vba?
For any of this, where can one begin to research?
EDIT:
Thanks to R3uK: here is the first portion of the VBA script which reads a line of html code from a file and uses R3uK's code to print it to the excel worksheet:
Private Sub button1_Click()
Dim the_string As String
the_string = Trim(ImportTextFile("path\to\file.txt"))
' still working on removing new line characters
Call PrintHTML_Table(the_string)
End Sub
Public Function ImportTextFile(strFile As String) As String
' http://mrspreadsheets.com/1/post/2013/09/vba-code-snippet-22-read-entire-text-file-into-string-variable.html
Open strFile For Input As #1
ImportTextFile = Input$(LOF(1), 1)
Close #1
End Function
' Insert R3uK's portion of the code here
This could be a good place to start, you will only need to check content after to see if there is any problem and then copy it to word.
Sub PrintHTML_Table(ByVal StrTable as String)
Dim TA()
Dim Table_String as String
Table_String = " " & StrTable & " "
TA = SplitTo2DArray(Table_String, "</tr>", "</td>")
For i = LBound(TA, 1) To UBound(TA, 1)
For j = LBound(TA, 2) To UBound(TA, 2)
ActiveSheet.Cells(i + 1, j + 1) = Trim(Replace(Replace(TA(i, j), "<td>", ""), "<tr>", ""))
Next j
Next i
End Sub
Public Function SplitTo2DArray(ByRef StringToSplit As String, ByRef RowSep As String, ByRef ColSep As String) As String()
Dim Rows As Variant
Dim rowNb As Long
Dim Columns() As Variant
Dim i As Long
Dim maxlineNb As Long
Dim lineNb As Long
Dim asCells() As String
Dim j As Long
' Split up the table value by rows, get the number of rows, and dim a new array of Variants.
Rows = Split(StringToSplit, RowSep)
rowNb = UBound(Rows)
ReDim Columns(0 To rowNb)
' Iterate through each row, and split it into columns. Find the maximum number of columns.
maxlineNb = 0
For i = 0 To rowNb
Columns(i) = Split(Rows(i), ColSep)
lineNb = UBound(Columns(i))
If lineNb > maxlineNb Then
maxlineNb = lineNb
End If
Next i
' Create a 2D string array to contain the data in <Columns>.
ReDim asCells(0 To maxlineNb, 0 To rowNb)
' Copy all the data from Columns() to asCells().
For i = 0 To rowNb
For j = 0 To UBound(Columns(i))
asCells(j, i) = Columns(i)(j)
Next j
Next i
SplitTo2DArray = asCells()
End Function

How to extract text content from tags in .NET?

I'm trying to code a vb.net function to extract specific text content from tags; I wrote this function
Public Function GetTagContent(ByRef instance_handler As String, ByRef start_tag As String, ByRef end_tag As String) As String
Dim s As String = ""
Dim content() As String = instance_handler.Split(start_tag)
If content.Count > 1 Then
Dim parts() As String = content(1).Split(end_tag)
If parts.Count > 0 Then
s = parts(0)
End If
End If
Return s
End Function
But it doesn't work, for example with the following debug code
Dim testString As String = "<body>my example <div style=""margin-top:20px""> text to extract </div> <br /> another line.</body>"
txtOutput.Text = testString.GetTagContent("<div style=""margin-top:20px"">", "</div>")
I get only "body>my example" string, instead of "text to extract"
can anyone help me? tnx in advance
I wrote a new routine and the following code works however I would know if exists a better code for performance:
Dim s As New StringBuilder()
Dim i As Integer = instance_handler.IndexOf(start_tag, 0)
If i < 0 Then
Return ""
Else
i = i + start_tag.Length
End If
Dim j As Integer = instance_handler.IndexOf(end_tag, i)
If j < 0 Then
s.Append(instance_handler.Substring(i))
Else
s.Append(instance_handler.Substring(i, j - i))
End If
Return s.ToString
XPath is one way of accomplishing this task. I'm sure others will suggest LINQ. Here's an example using XPath:
Dim testString As String = "<body>my example <div style=""margin-top:20px""> text to extract </div> <br /> another line.</body>"
Dim doc As XmlDocument = New XmlDocument()
doc.LoadXml(testString)
MessageBox.Show(doc.SelectSingleNode("/body/div").InnerText)
Obviously, a more complex document may require a more complex xpath than simply "/body/div", but it's still pretty simple.
If you need to get a list of multiple elements that match the path, you can use doc.SelectNodes.

Is it possible to write this VBA code any better?

Am I reinventing the wheel here? Is there a better way to do this? This VBA function looks for the first instance of a string in the comment field of a form in Access containing 20 characters or less, no spaces, surrounded by (~) tildes, then returns it.
Public Function ParseComment(strComment As String) As String
' This function parses the comment field of the job entry dialog for (~) tilde
' surrounded text, then returns that text.
Dim intCounter As Integer
Dim intFirstChar As Integer
Dim intLastChar As Integer
Dim strResult As String
intFirstChar = 0
intLastChar = 0
intCounter = 0
Do While (intLastChar = 0) And (intCounter < Len(strComment))
intCounter = intCounter + 1
strCharacter = Mid(strComment, intCounter, 1)
If (strCharacter = "~") Then
If intFirstChar Then
intLastChar = intCounter
Else
intFirstChar = intCounter + 1
End If
End If
Loop
strResult = Mid(strComment, intFirstChar, intLastChar - intFirstChar)
If (intLastChar - intFirstChar <= 20) And (intFirstChar <> 0 Or intLastChar <> 0) And Not InStr(strResult, " ") Then
ParseComment = strResult
End If
End Function
Thanks much.
I would use InStr to find the first and second occurences of the ~ character, something like this, rather than looping manually:
Public Function ParseComment(strComment As String) As String
' This function parses the comment field of the job entry dialog for (~) tilde
' surrounded text, then returns that text.
Dim firstTilde As Integer
Dim secondTilde As Integer
Dim strResult As String
firstTilde = 0
secondTilde = 0
strResult = ""
firstTilde = InStr(strComment, "~")
If firstTilde > 0 Then
secondTilde = InStr(firstTilde + 1, strComment, "~")
If (secondTilde > 0) And (secondTilde < 20) Then
strResult = Mid(strComment, firstTilde, secondTilde)
If InStr(strResult, " ") = 0 Then
ParseComment = strResult
End If
End If
End If
End Function
[Disclaimer, I haven't tested this!]
Using the built-in functions might be a little quicker, but don't imagine it will make a critical difference...
Something like:
Public Function getTildeDelimStringPart(inputstring As String) As String
Dim commentStart As Long, commentEnd As Long
commentStart = InStr(1, inputstring, "~")
If commentStart = 0 Then ' no tilde
getTildeDelimStringPart = vbNullString
Exit Function
End If
commentEnd = InStr(1 + commentStart, inputstring, "~")
If commentEnd = 0 Then
getTildeDelimStringPart = vbNullString
Exit Function
End If
getTildeDelimStringPart = Mid(inputstring, commentStart, commentEnd - commentStart + 1)
End Function
I see everyone has given you some more ways to do this (instr is a great way, see Vicky's answer!), so I'll just list up some tips on optimizing your code:
Use Long instead of Integer. VBA will convert them to Long every time.
Default value for Int and Long is 0 in VBA, so no need to declare them so.
Use Mid$ instead of Mid
Using Instr() would be a very effecient way to find location of ~
Fun Tip: If you do want to evaluate each character, fastest way is numeric comparision:
if Asc(Mid$(strComment, intCounter, 1)) = 126 Then
This worked for me:
Public Function ParseComment(strComment As String) As String
Dim regex As Object ' VBScript_RegExp_55.RegExp
Dim regexmatch As Object ' VBScript_RegExp_55.MatchCollection
Set regex = CreateObject("VBScript_RegExp_55.RegExp")
With regex
.MultiLine = False
.Global = False
.IgnoreCase = True
.Pattern = "(~[^ ~]{1,20}~)"
End With
Set regexmatch = regex.Execute(strComment)
If regexmatch.Count > 0 Then
ParseComment = regexmatch(0)
End If
End Function
You can add additional parsing at the end if you want to remove the tilde characters.
I tested it on the following string:
ABC~123aA%dwdD~CBA
the function returns ~123aA%dwdD~
Forgot to mention that this code requires VBScript Regular Expressions 5.5 which is located in %windir%\system32\vbscript.dll\3, although the code is late bound so you should just be able to drop it into your project.

Return multiple values from a function, sub or type?

So I was wondering, how can I return multiple values from a function, sub or type in VBA?
I've got this main sub which is supposed to collect data from several functions, but a function can only return one value it seems. So how can I return multiple ones to a sub?
You might want want to rethink the structure of you application, if you really, really want one method to return multiple values.
Either break things apart, so distinct methods return distinct values, or figure out a logical grouping and build an object to hold that data that can in turn be returned.
' this is the VB6/VBA equivalent of a struct
' data, no methods
Private Type settings
root As String
path As String
name_first As String
name_last As String
overwrite_prompt As Boolean
End Type
Public Sub Main()
Dim mySettings As settings
mySettings = getSettings()
End Sub
' if you want this to be public, you're better off with a class instead of a User-Defined-Type (UDT)
Private Function getSettings() As settings
Dim sets As settings
With sets ' retrieve values here
.root = "foo"
.path = "bar"
.name_first = "Don"
.name_last = "Knuth"
.overwrite_prompt = False
End With
' return a single struct, vb6/vba-style
getSettings = sets
End Function
You could try returning a VBA Collection.
As long as you dealing with pair values, like "Version=1.31", you could store the identifier as a key ("Version") and the actual value (1.31) as the item itself.
Dim c As New Collection
Dim item as Variant
Dim key as String
key = "Version"
item = 1.31
c.Add item, key
'Then return c
Accessing the values after that it's a breeze:
c.Item("Version") 'Returns 1.31
or
c("Version") '.Item is the default member
Does it make sense?
Ideas :
Use pass by reference (ByRef)
Build a User Defined Type to hold the stuff you want to return, and return that.
Similar to 2 - build a class to represent the information returned, and return objects of that class...
You can also use a variant array as the return result to return a sequence of arbitrary values:
Function f(i As Integer, s As String) As Variant()
f = Array(i + 1, "ate my " + s, Array(1#, 2#, 3#))
End Function
Sub test()
result = f(2, "hat")
i1 = result(0)
s1 = result(1)
a1 = result(2)
End Sub
Ugly and bug prone because your caller needs to know what's being returned to use the result, but occasionally useful nonetheless.
A function returns one value, but it can "output" any number of values. A sample code:
Function Test (ByVal Input1 As Integer, ByVal Input2 As Integer, _
ByRef Output1 As Integer, ByRef Output2 As Integer) As Integer
Output1 = Input1 + Input2
Output2 = Input1 - Input2
Test = Output1 + Output2
End Function
Sub Test2()
Dim Ret As Integer, Input1 As Integer, Input2 As Integer, _
Output1 As integer, Output2 As Integer
Input1 = 1
Input2 = 2
Ret = Test(Input1, Input2, Output1, Output2)
Sheet1.Range("A1") = Ret ' 2
Sheet1.Range("A2") = Output1 ' 3
Sheet1.Range("A3") = Output2 '-1
End Sub
you can return 2 or more values to a function in VBA or any other visual basic stuff but you need to use the pointer method called Byref. See my example below. I will make a function to add and subtract 2 values say 5,6
sub Macro1
' now you call the function this way
dim o1 as integer, o2 as integer
AddSubtract 5, 6, o1, o2
msgbox o2
msgbox o1
end sub
function AddSubtract(a as integer, b as integer, ByRef sum as integer, ByRef dif as integer)
sum = a + b
dif = b - 1
end function
Not elegant, but if you don't use your method overlappingly you can also use global variables, defined by the Public statement at the beginning of your code, before the Subs.
You have to be cautious though, once you change a public value, it will be held throughout your code in all Subs and Functions.
I always approach returning more than one result from a function by always returning an ArrayList. By using an ArrayList I can return only one item, consisting of many multiple values, mixing between Strings and Integers.
Once I have the ArrayList returned in my main sub, I simply use ArrayList.Item(i).ToString where i is the index of the value I want to return from the ArrayList
An example:
Public Function Set_Database_Path()
Dim Result As ArrayList = New ArrayList
Dim fd As OpenFileDialog = New OpenFileDialog()
fd.Title = "Open File Dialog"
fd.InitialDirectory = "C:\"
fd.RestoreDirectory = True
fd.Filter = "All files (*.*)|*.*|All files (*.*)|*.*"
fd.FilterIndex = 2
fd.Multiselect = False
If fd.ShowDialog() = DialogResult.OK Then
Dim Database_Location = Path.GetFullPath(fd.FileName)
Dim Database_Connection_Var = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=""" & Database_Location & """"
Result.Add(Database_Connection_Var)
Result.Add(Database_Location)
Return (Result)
Else
Return (Nothing)
End If
End Function
And then call the Function like this:
Private Sub Main_Load()
Dim PathArray As ArrayList
PathArray = Set_Database_Path()
My.Settings.Database_Connection_String = PathArray.Item(0).ToString
My.Settings.FilePath = PathArray.Item(1).ToString
My.Settings.Save()
End Sub
you could connect all the data you need from the file to a single string, and in the excel sheet seperate it with text to column.
here is an example i did for same issue, enjoy:
Sub CP()
Dim ToolFile As String
Cells(3, 2).Select
For i = 0 To 5
r = ActiveCell.Row
ToolFile = Cells(r, 7).Value
On Error Resume Next
ActiveCell.Value = CP_getdatta(ToolFile)
'seperate data by "-"
Selection.TextToColumns Destination:=Range("C3"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=True, _
Semicolon:=False, Comma:=False, Space:=False, Other:=True, OtherChar _
:="-", FieldInfo:=Array(Array(1, 1), Array(2, 1)), TrailingMinusNumbers:=True
Cells(r + 1, 2).Select
Next
End Sub
Function CP_getdatta(ToolFile As String) As String
Workbooks.Open Filename:=ToolFile, UpdateLinks:=False, ReadOnly:=True
Range("A56000").Select
Selection.End(xlUp).Select
x = CStr(ActiveCell.Value)
ActiveCell.Offset(0, 20).Select
Selection.End(xlToLeft).Select
While IsNumeric(ActiveCell.Value) = False
ActiveCell.Offset(0, -1).Select
Wend
' combine data to 1 string
CP_getdatta = CStr(x & "-" & ActiveCell.Value)
ActiveWindow.Close False
End Function