Get URL from HTML string

I have the following code that grabs a div element:
For Each ele As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
If ele.GetAttribute("className").Contains("description") Then
Dim content As String = ele.InnerHtml
If content.Contains("http://myserver.com/image/check.png") Then
'Do stuff if image exists
Else
'Do stuff if image doesn't exist
End If
End If
Next
The div element looks like this:
<DIV class=headline><SPAN class=blue-title-lg>TITLE_HERE
</SPAN> LOCATION1_HERE, LOCATION2_HERE</DIV>DESCRIPTION_HERE<BR>
<DIV class=about><A class=link href="viewprofile.aspx?profile_id=00000000">USERNAME</A> 20 FSM -
Friends <FONT color=green>Online Today</FONT></DIV>
When the tick image doesn't exist, I want to grab the url that's in:
<a class=link href="viewprofile.aspx?profile_id=00000000"></a>
and put it into a string. This is where I've hit a brick wall and I need some help. I'd imagine a regex solution would resolve my issue, but regex is one of my weak spots. Can someone put me out of my misery?

Solved it!
I slept on it and came up with a really simple way of solving it. The UI of my app now looks like a mess, but I'll sort that later. I have the information I need.
Here's how I did it:
Dim PageElement As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
For Each CurElement As HtmlElement In PageElement
    Dim linkunverified As String = CurElement.GetAttribute("href")
    'Collect each profile link once
    If linkunverified.Contains("viewprofile.aspx") Then
        If Not ListBox1.Items.Contains(linkunverified) Then
            ListBox1.Items.Add(linkunverified)
        End If
    End If
Next
For Each ele As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
    If ele.GetAttribute("className").Contains("description") Then
        Dim content As String = ele.InnerHtml
        'Only handle descriptions without the tick image
        If Not content.Contains("http://pics.myserver.com/image/check.png") Then
            For i As Integer = 0 To ListBox1.Items.Count - 1
                'Remove(0, 24) strips the leading 24 characters, assumed to be "http://www.myserver.com/"
                Dim relativeLink As String = ListBox1.Items(i).ToString().Remove(0, 24)
                If content.Contains(relativeLink) Then
                    ListBox2.Items.Add("http://www.myserver.com/" & relativeLink)
                End If
            Next
        End If
    End If
Next
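Since the question asked about a regex, here is a minimal sketch of that route. It is not what the code above does: instead of matching against the ListBox and relying on Remove(0, 24), it pulls the first href containing viewprofile.aspx straight out of a string such as ele.InnerHtml. The module and function names (ProfileLinkDemo, ExtractProfileHref) are made up for illustration, and the pattern assumes the href value is quoted.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ProfileLinkDemo
    ' Returns the first href value that contains viewprofile.aspx,
    ' or Nothing when the HTML holds no such link.
    Function ExtractProfileHref(html As String) As String
        ' Matches href="..." or href='...' and captures the quoted value.
        Dim pattern As String = "href\s*=\s*[""']([^""']*viewprofile\.aspx[^""']*)[""']"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Sample markup taken from the div shown in the question.
        Dim sample As String = "<DIV class=about><A class=link href=""viewprofile.aspx?profile_id=00000000"">USERNAME</A></DIV>"
        Console.WriteLine(ExtractProfileHref(sample))
    End Sub
End Module
```

In the loop above, this could replace the inner For over ListBox1: call ExtractProfileHref(content) in the branch where the tick image is missing and check the result for Nothing before using it.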

Related

Extract text from html with substring methods

I want to extract text from HTML.
I'm already getting the HTML source with a WebRequest.
How can I extract text like in the following example?
class="btn btn-success btn-lg" href="I want to get this link that is changing every time" rel="nofollow noopener">Click</a><
Can I do it using substring methods like StartsWith and EndsWith?
Thanks
I found a solution using String.IndexOf. I was struggling a bit with the "" escapes in the HTML string, but this now does what it is supposed to do:
Dim allInputText As String = RichTextBox1.Text
Dim textBefore As String = "class=""btn btn-success btn-lg"" href="""
Dim textAfter As String = """ rel=""nofollow noopener"
Dim textFound As String = Nothing
Dim startPosition As Integer = allInputText.IndexOf(textBefore)
'Only continue if the text before was found
If startPosition >= 0 Then
    'Move the start position to the end of the text before, rather than the beginning
    startPosition += textBefore.Length
    'Find the first occurrence of the text after the link
    Dim endPosition As Integer = allInputText.IndexOf(textAfter, startPosition)
    'Only take the substring if the text after was found
    If endPosition >= 0 Then
        textFound = allInputText.Substring(startPosition, endPosition - startPosition)
    End If
End If
'textFound stays Nothing (shown as an empty box) when either marker is missing
TextBox4.Text = textFound
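If the same lookup is needed in more than one place, the IndexOf approach above can be wrapped in a small helper. This is only a sketch: GetTextBetween and the sample URL are hypothetical names and values, not part of the original code.

```vbnet
Imports System

Module SubstringDemo
    ' Returns the text between textBefore and textAfter,
    ' or Nothing if either marker cannot be found.
    Function GetTextBetween(source As String, textBefore As String, textAfter As String) As String
        Dim startPosition As Integer = source.IndexOf(textBefore, StringComparison.Ordinal)
        If startPosition < 0 Then Return Nothing
        ' Skip past the leading marker so the result starts right after it.
        startPosition += textBefore.Length
        Dim endPosition As Integer = source.IndexOf(textAfter, startPosition, StringComparison.Ordinal)
        If endPosition < 0 Then Return Nothing
        Return source.Substring(startPosition, endPosition - startPosition)
    End Function

    Sub Main()
        ' Hypothetical input; the real page source would come from the WebRequest.
        Dim html As String = "<a class=""btn btn-success btn-lg"" href=""http://example.com/abc"" rel=""nofollow noopener"">Click</a>"
        Dim link As String = GetTextBetween(html, "class=""btn btn-success btn-lg"" href=""", """ rel=""nofollow noopener")
        Console.WriteLine(If(link, "(not found)"))
    End Sub
End Module
```

Returning Nothing leaves it to the caller to decide what a missing marker means, which matches the intent of the "return Nothing" comments in the original snippet.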

Selecting a HTML button in Excel VBA that does not have an id

I have been working on this issue for a day now. I have a web form with one set of standard data, followed by line items for a purchase requisition. I am trying to enter all the data in Excel and use VBA to transfer it to the site. I am stuck on how to click "update part" (the text on the button that adds another line item on the web page). I have also tried the SendKeys method to Shift+Tab into the correct location (plain tabbing runs into an error with one of the fields). I am fine with any solution that works; this is my first attempt at linking Excel to HTML, so it's been fun.
From what I can find the button does not have an id so I have not been successful in calling it.
Here is my code (with the web url deleted):
Sub Login_2_Website()
Dim oHTML_Element As IHTMLElement
Dim oHTML_Element1 As IHTMLElement
Dim sURL As String
Dim aURL As String
Dim nodeList As Object
On Error GoTo Err_Clear
sURL = URL Can't be Shared
oBrowser.Silent = True
oBrowser.timeout = 60
oBrowser.Navigate sURL
oBrowser.Visible = True
Do
' Wait till the Browser is loaded
Loop Until oBrowser.ReadyState = READYSTATE_COMPLETE
Set HTMLDoc = oBrowser.Document
Set nodeList = HTMLDoc.querySelectorAll("a[onclick*='UpdatePartRow']")
HTMLDoc.all.UserName.Value = ThisWorkbook.Sheets("sheet1").Range("I1")
HTMLDoc.all.Password.Value = ThisWorkbook.Sheets("sheet1").Range("I2")
For Each oHTML_Element In HTMLDoc.getElementsByTagName("input")
If oHTML_Element.Type = "submit" Then oHTML_Element.Click: Exit For
HTMLDoc.all.reason.Value = ThisWorkbook.Sheets("sheet1").Range("B1") ' selects the reason for the requisition
HTMLDoc.all.Comments.Value = ThisWorkbook.Sheets("sheet1").Range("B2") ' selects the comments to purchasing
HTMLDoc.forms("_PurchaseRequisition").getElementsByTagName("select")("RequiredMonth").Value = ThisWorkbook.Sheets("sheet1").Range("B3")
HTMLDoc.forms("_PurchaseRequisition").FireEvent ("onchange")
HTMLDoc.forms("_PurchaseRequisition").getElementsByTagName("select")("RequiredDay").Value = ThisWorkbook.Sheets("sheet1").Range("B4")
HTMLDoc.forms("_PurchaseRequisition").FireEvent ("onchange")
HTMLDoc.forms("_PurchaseRequisition").getElementsByTagName("select")("RequiredYear").Value = ThisWorkbook.Sheets("sheet1").Range("B5")
HTMLDoc.forms("_PurchaseRequisition").FireEvent ("onchange")
HTMLDoc.forms("_PurchaseRequisition").getElementsByTagName("select")("CommodityMain").Value = ThisWorkbook.Sheets("sheet1").Range("B9")
HTMLDoc.forms("_PurchaseRequisition").FireEvent ("onchange") 'Selects the commodity group
HTMLDoc.all.Quantity.Value = ThisWorkbook.Sheets("sheet1").Range("B11")
HTMLDoc.all.Description.Value = ThisWorkbook.Sheets("sheet1").Range("B12")
HTMLDoc.all.ChargedDepartment.Value = ThisWorkbook.Sheets("sheet1").Range("B13")
HTMLDoc.all.SubJobNumber.Value = ThisWorkbook.Sheets("sheet1").Range("B14")
HTMLDoc.all.AccountNumber.Value = ThisWorkbook.Sheets("sheet1").Range("B15")
HTMLDoc.all.UnitPrice.Value = ThisWorkbook.Sheets("sheet1").Range("B16")
HTMLDoc.all.CommodityMainSub.Value = ThisWorkbook.Sheets("sheet1").Range("B17")
Set nodeList = HTMLDoc.querySelectorAll("a[onclick*='UpdatePartRow']")
nodeList.Item(0).Click
nodeList.Item(0).FireEvent "onclick"
Next
' oBrowser.Refresh ' Refresh If Needed
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
End Sub
For the login portion I modified code from: http://vbadud.blogspot.com/2009/08/how-to-login-to-website-using-vba.html#ZiqYAtAQMHzl7x1k.99
That works perfectly, so does entering the fields.
The segment of HTML that is associated with this button is:
<a onclick="UpdatePartRow();
chkKeepSubmitPR();
return false;" href=""></a>
<a onclick="var doc = window.document.forms[0];
UpdatePartRow();
chkKeepSubmitPR();
if (doc.OrgMatrixYes.value == "Y") {
VerifyDeptOrgMatrix();
}
return false;" href=""><img src="/Web/purchreq.nsf/UpdatePart.gif?OpenImageResource" width="72" height="25" border="0"></a>
I am taking this on because this system is a pain. I could just have multiple macros and have the user hit the button between each line item, but I want to try to offer a full solution. I am a mechanical engineer by trade and my coding experience is limited to what I have picked up on making tools to ease my job. Any help or suggestions would be super helpful. If there is more info needed, please let me know and I can try to help anyway I can. Thank you!
Update: I have tried (See Code) to make the changes that have been suggested. I am still a fairly complete newbie when it comes to coding, so please bear with me and thank you for trying to teach me!
You have two a elements there, each with an onclick attribute.
You can match both with a CSS attribute selector, using the "*=" (contains) operator to search for a substring in the attribute value:
a[onclick*='UpdatePartRow']
You can grab both with the querySelectorAll method of the HTMLDocument object:
Dim nodeList As Object
Set nodeList = HTMLDoc.querySelectorAll("a[onclick*='UpdatePartRow']")
The two matches, for your sample, are as follows:
index 0
<a onclick="UpdatePartRow(); chkKeepSubmitPR(); return false;" href="">
index 1
<a onclick="var doc = window.document.forms[0]; UpdatePartRow(); chkKeepSubmitPR(); if (doc.OrgMatrixYes.value == "Y") { VerifyDeptOrgMatrix(); } return false;" href="">
You can access the nodeList by index e.g.
nodeList.item(0).Click
nodeList.item(0).FireEvent "onclick"

VB.net Fill Textbox with HTML string

I have a string from an HTML enabled email of something like so:
</div><span color="red">Hi how are you?!</span></div><table><tr><td>Company Information</td></tr></table>
and so on (it's a long string, but you get the idea: there are <div>, <span>, <table> tags and so forth).
I want to display the text in the text box formatted as HTML, that is, laid out according to the <table>, <span> and so forth, without the literal tag text appearing in the textbox.
I need this because I get a page error: the <span> and similar tags are treated as a security issue.
My current way of reading the whole text and removing the issues are like so:
If Not DsAds.Tables(0).Rows(0).Item(0) Is DBNull.Value Then
Dim bodyInitial As String = DsAds.Tables(0).Rows(0).Item(0).ToString()
Dim newbody As String = bodyInitial.Replace("<br>", vbNewLine)
newbody = newbody.Replace("<b>", "")
newbody = newbody.Replace("</b>", "")
Bodylisting.Text = newbody
End If
I tried to incorporate the following:
Dim bodyInitial As String = DsAds.Tables(0).Rows(0).Item(0).ToString()
Dim myEncodedString As String
' Encode the string.
myEncodedString = bodyInitial.HttpUtility.HtmlEncode(bodyInitial)
Dim myWriter As New StringWriter()
' Decode the encoded string.
HttpUtility.HtmlDecode(bodyInitial, myWriter)
but I get errors with HttpUtility and strings.
Question:
So my question is, is there a way to actually see the HTML formatting and fill the textbox that way, or do I have to stick with my .Replace method?
<asp:TextBox ID="Bodylisting" runat="server" style="float:left; margin:10px; padding-bottom:500px;" Width="95%" TextMode="MultiLine" ></asp:TextBox>
I suggest you investigate HtmlAgilityPack. This library includes a parser giving you the ability to 'select' the <span> tags. Once you have them, you can strip them out, or grab the InnerHtml and do further processing. This is an example of how I do something similar with it.
Private Shared Function StripHtml(html As String, xPath As String) As String
Dim htmlDoc As New HtmlDocument()
htmlDoc.LoadHtml(html)
If xPath.Length > 0 Then
Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes(xPath)
If Not invalidNodes Is Nothing Then
For Each node As HtmlNode In invalidNodes
node.ParentNode.RemoveChild(node, True)
Next
End If
End If
Return htmlDoc.DocumentNode.WriteContentTo()
End Function
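If adding a library is not an option, a rough plain-text conversion is possible with the framework alone. The sketch below is a swapped-in technique, not the HtmlAgilityPack approach: it assumes that dropping all formatting is acceptable, uses Regex to remove tags (a crude filter, not a real HTML parser) and WebUtility.HtmlDecode from System.Net to turn entities back into characters. HtmlToPlainText is a made-up name.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module PlainTextDemo
    ' Naive conversion of an HTML fragment to plain text for a multiline TextBox.
    Function HtmlToPlainText(html As String) As String
        ' Turn <br>, <br/> and <br /> into line breaks before removing other tags.
        Dim text As String = Regex.Replace(html, "<br\s*/?>", Environment.NewLine, RegexOptions.IgnoreCase)
        ' Drop every remaining tag; the text between tags is kept.
        text = Regex.Replace(text, "<[^>]+>", String.Empty)
        ' Convert entities such as &amp; back to their characters.
        Return WebUtility.HtmlDecode(text)
    End Function

    Sub Main()
        ' Sample based on the string in the question, with an entity added.
        Dim body As String = "<div><span color=""red"">Hi how are you?!</span></div><table><tr><td>Company &amp; Information</td></tr></table>"
        Console.WriteLine(HtmlToPlainText(body))
    End Sub
End Module
```

Note that HtmlEncode and HtmlDecode are shared methods called on the class (HttpUtility or WebUtility), not on the string itself, which is why bodyInitial.HttpUtility.HtmlEncode(...) in the question does not compile.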

Invoking a click on a class which is only used once

I'm trying to use a loop, but the application freezes (without fully crashing).
Dim theElementCollection As HtmlElementCollection
theElementCollection = WebBrowser1.Document.GetElementsByTagName("DIV")
For Each curElement As HtmlElement In theElementCollection
Dim controlName As String = curElement.GetAttribute("name").ToString
curElement.InvokeMember("click")
Next
In my case there should be only one div with that class name, and I just want to invoke a click on it.
I found a workaround like so
For Each h As HtmlElement In WebBrowser1.Document.GetElementsByTagName("input")
If Not Object.ReferenceEquals(h.GetAttribute("className"), Nothing) AndAlso h.GetAttribute("className").Equals("numeroCustomer") Then
h.InnerText = loginid
Exit For
End If
Next
Hope this helps anybody with a similar problem.
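The workaround above fills an input rather than clicking a div, but the same className check carries over to the original goal. A minimal sketch, assuming a WinForms project that references System.Windows.Forms; ClickFirstByClass and the class name passed to it are hypothetical:

```vbnet
Imports System
Imports System.Windows.Forms

Module ClickByClassDemo
    ' Clicks the first element of the given tag whose class attribute equals className.
    ' Returns True if such an element was found and clicked.
    Function ClickFirstByClass(doc As HtmlDocument, tagName As String, className As String) As Boolean
        If doc Is Nothing Then Return False
        For Each el As HtmlElement In doc.GetElementsByTagName(tagName)
            ' The WebBrowser control exposes the class attribute as "className".
            If String.Equals(el.GetAttribute("className"), className, StringComparison.Ordinal) Then
                el.InvokeMember("click")
                ' Stop after the first match instead of clicking every element.
                Return True
            End If
        Next
        Return False
    End Function
End Module
```

It would be called as ClickFirstByClass(WebBrowser1.Document, "div", "yourClassName") once the document has finished loading. Unlike the loop in the question, it clicks at most one element, which avoids firing a click on every div in the page.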

ListBox with html element

Can anyone offer me some advice? I currently have a ListBox that holds a list of images from any website. They are grabbed from the website via this method:
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
Dim PageElements As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("img")
For Each CurElement As HtmlElement In PageElements
imagestxt.Items.Add(imagestxt.Text & CurElement.GetAttribute("src") & Environment.NewLine)
Next
Timer1.Enabled = True
End Sub
I then use the picture control method to get the image and display it.
pic1.Image = New Bitmap(New MemoryStream(New WebClient().DownloadData(imagestxt.SelectedItem.ToString)))
This method pulls the images and title from the HTML.
Private Function StrHTML12() As Boolean
Dim htmlDocument As HtmlDocument = WebBrowser1.Document
ListBox1.Items.Clear()
For Each element As HtmlElement In htmlDocument.All
ListBox1.Items.Add(element.TagName)
If element.TagName.ToUpper = "IMG" Then
imgtags.Items.Add(element.OuterHtml.ToString)
End If
If element.TagName.ToUpper = "TITLE" Then
titletags.Items.Add(element.OuterHtml.ToString)
Timer1.Enabled = False
End If
Next
End Function
This is a counting method to count how many empty alt="" or alt='' attributes there are on the page.
Basically, what I am looking to do is this:
I want a program that checks each image and looks at its alt attribute. If the developer hasn't put anything in the alt attribute, I want the image to show in a picture box with its alt text next to it or underneath it, but I have no idea how.
counter = InStr(counter + 1, strHTML, "<img alt=''")
counter = InStr(counter + 1, strHTML, "alt=''")
counter = InStr(counter + 1, strHTML, "alt=""")
The above seems really slow and messy. Is there a better way of doing it?
I do not have VB installed so I have not been able to test the code. I'm also not familiar with the datagridview component so have not attempted to integrate my code with it.
The code below should get you the title of the page, and loop through all the img tags that do not have (or have empty) alt-text
HtmlElement.GetAttribute(sAttr) returns the value of the attribute or an empty string.
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    Dim Title As String
    Dim ImSrc As String
    Dim PageElements As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("img")
    ' HtmlDocument.Title returns the text of the page's title element
    Title = WebBrowser1.Document.Title
    For Each CurElement As HtmlElement In PageElements
        If CurElement.GetAttribute("alt") = "" Then
            ' CurElement has no alt text (missing or empty)
            ImSrc = CurElement.GetAttribute("src")
        Else
            ' CurElement has alt text
        End If
    Next
    Timer1.Enabled = True
End Sub
The title comes from the HtmlDocument.Title property, so there is no need to index into a collection returned by GetElementsByTagName.