Read multiple web classes, add them to listview - html

Hey, I have the following problem:
I need to get several specific values from a website.
Here's an example of the website's code:
<div class="content">
<div class="all-items">
<div class="item1">
Example Item
</div>
<div class="itemsize">
" 103 "
<span> cm </span>
</div>
There are more "item1" classes with the same name under the first one, and I need to add each of them to the listview
until there are no more "item1" classes.
I tried the following, but it's not throwing an error or doing anything...
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
  For Each Element As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
    If Element.GetAttribute("className") = "content" Then
      For Each Element0 As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
        If Element0.GetAttribute("className") = "all-items" Then
          For Each Element1 As HtmlElement In Element.GetElementsByTagName("div")
            If Element1.GetAttribute("className") = "item" Then
              For Each Element2 As HtmlElement In Element.GetElementsByTagName("a")
                If Element2.GetAttribute("className") = "href" Then
                  Dim vLink As String = Element2.InnerText
                  For Each Element3 As HtmlElement In Element.GetElementsByTagName("a")
                    If Element3.GetAttribute("className") = "Example Item" Then
                      Dim vTitle As String = Element3.InnerText
                      For Each Element4 As HtmlElement In Element.GetElementsByTagName("div")
                        If Element4.GetAttribute("className") = "itemsize" Then
                          Dim vSize As String = Element4.InnerText
                          For Each Element5 As HtmlElement In Element.GetElementsByTagName("span")
                            If Element5.GetAttribute("className") = "span" Then
                              Dim vUnit As String = Element5.InnerText
                              With lvList.Items.Add(vTitle)
                                .SubItems(0).Text = (vLink)
                                .SubItems(1).Text = (vSize + " " + vUnit)
                              End With
                            End If
                          Next
                        End If
                      Next
                    End If
                  Next
                End If
              Next
            End If
          Next
        End If
      Next
    End If
  Next
End Sub
It's messy, but it should work in theory, except for the size part,
and I'm not sure whether I get all the "item1" classes with it.
I'm out of ideas at this point, especially since I'm failing
to retrieve even one value.
Any suggestions or help?
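One possible way to flatten the nested loops above, offered as an untested sketch rather than a definitive fix: walk the div elements once, remember the text of each "item1" div, and add a row when the matching "itemsize" div is reached. It assumes the collection is returned in document order and that each "itemsize" div follows its "item1" div, as in the sample markup. ItemScraper and FillList are hypothetical names; lvList is the ListView from the question.

```vbnet
Imports System.Windows.Forms

Module ItemScraper
    ' Adds one ListView row per "item1"/"itemsize" pair found in the document.
    ' Assumption: every "itemsize" div comes after the "item1" div it belongs to.
    Public Sub FillList(doc As HtmlDocument, lvList As ListView)
        Dim pendingTitle As String = Nothing

        For Each div As HtmlElement In doc.GetElementsByTagName("div")
            Select Case div.GetAttribute("className")
                Case "item1"
                    ' Remember the title until its size div shows up
                    pendingTitle = If(div.InnerText, "").Trim()
                Case "itemsize"
                    If pendingTitle IsNot Nothing Then
                        ' InnerText includes the text of the nested <span> (the unit)
                        Dim size As String = If(div.InnerText, "").Trim()
                        Dim row As ListViewItem = lvList.Items.Add(pendingTitle)
                        row.SubItems.Add(size)
                        pendingTitle = Nothing
                    End If
            End Select
        Next
    End Sub
End Module
```

It could be called from the button handler as ItemScraper.FillList(WebBrowser1.Document, lvList) once the page has finished loading. For the size to show up as a second column, lvList would need View set to Details and two columns defined.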

Related

Get "href=link" from html page and navigate to that link using vba

I am writing code in Excel VBA to get the href value of an element with a given class and navigate to that link,
i.e. I want to write the href value into a particular Excel sheet and then navigate to that link automatically from my VBA code. Here is the link in question:
How to make the word invisible when it's checked without js
At the moment I only get the text of the tag with that class: "How to make the word invisible when it's checked without js" is the title that ends up in my sheet. What I want is the href that this title holds, /questions/51509457/how-to-make-the-word-invisible-when-its-checked-without-js, so that I can navigate to it from my code.
Please help me out. Thanks in advance.
Below is the entire code:
Sub useClassnames()
Dim element As IHTMLElement
Dim elements As IHTMLElementCollection
Dim ie As InternetExplorer
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate "https://stackoverflow.com/questions"
'Wait until IE has loaded the web page
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set html = ie.document
Set elements = html.getElementsByClassName("question-hyperlink")
Dim count As Long
Dim erow As Long
count = 0
For Each element In elements
If element.className = "question-hyperlink" Then
erow = Sheets("Exec").Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
Sheets("Exec").Cells(erow, 1) = html.getElementsByClassName("question-hyperlink")(count).innerText
count = count + 1
End If
Next element
Range("H10").Select
End Sub
I can't find an answer to this anywhere on this website. Please don't mark this question as a duplicate.
<div class="row hoverSensitive">
<div class="column summary-column summary-column-icon-compact ">
<img src="images/app/run32.png" alt="" width="32" height="32">
</div>
<div class="column summary-column ">
<div class="summary-title summary-title-compact text-ppp">
MMDA
</div>
<div class="summary-description-compact text-secondary text-ppp">
By on 7/9/2018 </div>
</div>
<div class="column summary-column summary-column-bar ">
<div class="table">
<div class="column">
<div class="chart-bar ">
<div class="chart-bar-custom link-tooltip" tooltip-position="left" style="background: #4dba0f; width: 125px" tooltip-text="100% Passed (11/11 tests)"></div>
</div>
</div>
<div class="column chart-bar-percent chart-bar-percent-compact">
100%'
Method ①
Use XHR to make the initial request using the questions homepage URL; apply a CSS selector to retrieve the links, then pass those links to IE to navigate to.
CSS selectors to select elements:
You want the href attribute of the element. You have been given an example already. You can use getAttribute, or, as pointed out by @Santosh, combine an href attribute CSS selector with other CSS selectors to target the elements.
CSS selector:
a.question-hyperlink[href]
Matches a elements that have the class question-hyperlink and an href attribute.
You then apply the CSS selector combination with the querySelectorAll method of document to gather a nodeList of the links.
XHR to get initial list of links:
I would issue this first as an XHR, as it is much faster, and gather the links into a nodeList that you can later loop over with your IE browser.
Option Explicit
Public Sub GetLinks()
Dim sResponse As String, HTML As New HTMLDocument, linkList As Object, i As Long
Const BASE_URL As String = "https://stackoverflow.com"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://stackoverflow.com/questions", False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
With HTML
.body.innerHTML = sResponse
Set linkList = .querySelectorAll("a.question-hyperlink[href]")
For i = 0 To linkList.Length - 1
Debug.Print Replace$(linkList.item(i), "about:", BASE_URL)
Next i
End With
'Code using IE and linkList
End Sub
In the above linkList is a nodeList holding all the matched elements from the homepage i.e. all the hrefs on the question landing page. You can loop the .Length of the nodeList and index into it to retrieve a particular href e.g. linkList.item(i). As the link returned is relative, you need to replace the relative about: part of the path with the protocol + domain i.e. "https://stackoverflow.com".
Now that you have quickly obtained that list, and can access items, you can pass any given updated href onto IE.Navigate.
Navigating to questions using IE and nodeList
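For i = 0 To linkList.Length - 1
    IE.Navigate Replace$(linkList.item(i).getAttribute("href"), "about:", BASE_URL)
    'Wait for the page to finish loading before moving on to the next link
    While IE.Busy Or IE.readyState < 4: DoEvents: Wend
Next i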
Method ②
Use XHR to make the initial GET request, searching for the question title; apply a CSS selector to retrieve the links, then pass those links to IE to navigate to.
Option Explicit
Public Sub GetLinks()
Dim sResponse As String, HTML As New HTMLDocument, linkList As Object, i As Long
Const BASE_URL As String = "https://stackoverflow.com"
Const TARGET_QUESTION As String = "How to make the word invisible when it's checked without js"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://stackoverflow.com/search?q=" & URLEncode(TARGET_QUESTION), False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
With HTML
.body.innerHTML = sResponse
Set linkList = .querySelectorAll("a.question-hyperlink[href]")
For i = 0 To linkList.Length - 1
Debug.Print Replace$(linkList.item(i).getAttribute("href"), "about:", BASE_URL)
Next i
End With
If linkList Is Nothing Then Exit Sub
'Code using IE and linkList
End Sub
'https://stackoverflow.com/questions/218181/how-can-i-url-encode-a-string-in-excel-vba @Tomalak
Public Function URLEncode( _
StringVal As String, _
Optional SpaceAsPlus As Boolean = False _
) As String
Dim StringLen As Long: StringLen = Len(StringVal)
If StringLen > 0 Then
ReDim result(StringLen) As String
Dim i As Long, CharCode As Integer
Dim Char As String, Space As String
If SpaceAsPlus Then Space = "+" Else Space = "%20"
For i = 1 To StringLen
Char = Mid$(StringVal, i, 1)
CharCode = Asc(Char)
Select Case CharCode
Case 97 To 122, 65 To 90, 48 To 57, 45, 46, 95, 126
result(i) = Char
Case 32
result(i) = Space
Case 0 To 15
result(i) = "%0" & Hex(CharCode)
Case Else
result(i) = "%" & Hex(CharCode)
End Select
Next i
URLEncode = Join(result, "")
End If
End Function
The check If element.className = "question-hyperlink" Then is redundant: it is always true, because getElementsByClassName("question-hyperlink") only returns elements of class question-hyperlink. The If statement can be removed.
You already have each link in the variable element, so you don't need count. Instead of html.getElementsByClassName("question-hyperlink")(count).innerText, use element.innerText.
So it should look like this:
Set elements = html.getElementsByClassName("question-hyperlink")
Dim erow As Long
For Each element In elements
erow = Worksheets("Exec").Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
Worksheets("Exec").Cells(erow, 1) = element.innerText
Worksheets("Exec").Cells(erow, 2) = element.GetAttribute("href") 'this should give you the URL
Next element

Identify HtmlElement under cursor in WebBrowser

I'm trying to set up a small extension of WebBrowser control as an HtmlTextBox, with limited formatting possibilities. It works for basic formatting (bold, italic, underline). But I also wanted to allow indentation in one single level, and ideally call this in a "toggle" fashion.
I noticed that when I run Document.ExecCommand("Indent", False, Nothing) it converts the <p> element into a <blockquote> element, which is exactly what I need. But a second call to the same command just adds to the indent margin, but I want to make it so that, if cursor is already inside a <blockquote> element, it will perform an "outdent" instead.
For that, I tried to query Document.ActiveElement before performing my action, but this returns always the whole <body> element, and not the specific block element in which cursor rests at that moment.
How could I accomplish that?
This is my code:
Public Class HtmlTextBox
Inherits WebBrowser
Public Sub New()
WebBrowserShortcutsEnabled = False
IsWebBrowserContextMenuEnabled = False
DocumentText = "<html><body></body></html>"
If Document IsNot Nothing Then
Dim doc = Document.DomDocument
If doc IsNot Nothing Then
doc.designMode = "On"
If Me.ContextMenuStrip Is Nothing Then
AddHandler Document.ContextMenuShowing, Sub(sender As Object, e As HtmlElementEventArgs) Application.DoEvents()
End If
End If
End If
End Sub
Private Sub HtmlTextBox_PreviewKeyDown(sender As Object, e As PreviewKeyDownEventArgs) Handles Me.PreviewKeyDown
If e.Control Then
If e.KeyData.HasFlag(Keys.B) OrElse e.KeyData.HasFlag(Keys.N) Then BoldToggle()
If e.KeyData.HasFlag(Keys.I) Then ItalicToggle()
If e.KeyData.HasFlag(Keys.S) OrElse e.KeyData.HasFlag(Keys.U) Then UnderlineToggle()
If e.KeyData.HasFlag(Keys.M) Then BlockQuoteToggle()
End If
End Sub
Public Sub BoldToggle()
Document.ExecCommand("Bold", False, Nothing)
End Sub
Public Sub ItalicToggle()
Document.ExecCommand("Italic", False, Nothing)
End Sub
Public Sub UnderlineToggle()
Document.ExecCommand("Underline", False, Nothing)
End Sub
Public Sub BlockQuoteToggle()
If Document.ActiveElement.TagName.ToLower = "blockquote" Then
Document.ExecCommand("Outdent", False, Nothing)
Else
Document.ExecCommand("Indent", False, Nothing)
End If
End Sub
End Class
The method ElementAtSelectionStart is designed to return the element containing the start of the current selection. This code is for a WebBrowser control in edit mode. Hopefully it will work for your needs.
Public Class mshtmlUtilities
Public Enum C_Bool
[False] = 0
[True] = 1
End Enum
Public Shared Function ElementAtSelectionStart(ByVal wb As System.Windows.Forms.WebBrowser) As System.Windows.Forms.HtmlElement
Dim el As System.Windows.Forms.HtmlElement = Nothing
If wb IsNot Nothing AndAlso _
wb.Document IsNot Nothing AndAlso _
DirectCast(wb.Document.DomDocument, mshtml.IHTMLDocument2).designMode.Equals("on", StringComparison.InvariantCultureIgnoreCase) Then
Dim doc As mshtml.IHTMLDocument2 = DirectCast(wb.Document.DomDocument, mshtml.IHTMLDocument2)
Dim sel As mshtml.IHTMLSelectionObject = DirectCast(doc.selection, mshtml.IHTMLSelectionObject)
Select Case sel.type.ToLowerInvariant
Case "text"
Dim rng As mshtml.IHTMLTxtRange = DirectCast(sel.createRange(), mshtml.IHTMLTxtRange)
rng.collapse(True)
el = MakeWinFormHTMLElement(rng.parentElement, wb)
Case "control"
Dim rng As mshtml.IHTMLControlRange = DirectCast(sel.createRange(), mshtml.IHTMLControlRange)
el = MakeWinFormHTMLElement(rng.item(0).parentElement, wb)
Case "none"
Dim ds As mshtml.IDisplayServices = DirectCast(doc, mshtml.IDisplayServices)
Dim caret As mshtml.IHTMLCaret = Nothing
ds.GetCaret(caret)
Dim pt As mshtml.tagPOINT
caret.GetLocation(pt, C_Bool.False)
el = wb.Document.GetElementFromPoint(New Point(pt.x, pt.y))
End Select
End If
Return el
End Function
Private Shared Function MakeWinFormHTMLElement(ByVal el As mshtml.IHTMLElement, ByVal wb As System.Windows.Forms.WebBrowser) As System.Windows.Forms.HtmlElement
Dim shimInfo As Reflection.PropertyInfo = wb.GetType.GetProperty("ShimManager", Reflection.BindingFlags.NonPublic Or Reflection.BindingFlags.Instance)
Dim shimManager As Object = shimInfo.GetValue(wb, Nothing)
Dim ciElement As Reflection.ConstructorInfo() _
= wb.Document.Body.GetType().GetConstructors(Reflection.BindingFlags.Instance Or Reflection.BindingFlags.NonPublic)
Return CType(ciElement(0).Invoke(New Object() {shimManager, el}), HtmlElement)
End Function
End Class
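To connect this back to the toggle in the question: the element returned for a caret inside a blockquote may be a child of the blockquote (for example a p or b element) rather than the blockquote itself, so it can help to walk up the parent chain. The following is a minimal, untested sketch under that assumption; HtmlElementHelpers and IsInsideTag are hypothetical names, not part of the answer above.

```vbnet
Imports System
Imports System.Windows.Forms

Module HtmlElementHelpers
    ' Returns True if el itself, or any of its ancestors, has the given tag name.
    Public Function IsInsideTag(el As HtmlElement, tagName As String) As Boolean
        Dim current As HtmlElement = el
        While current IsNot Nothing
            ' TagName casing can vary, so compare case-insensitively
            If String.Equals(current.TagName, tagName, StringComparison.OrdinalIgnoreCase) Then
                Return True
            End If
            current = current.Parent
        End While
        Return False
    End Function
End Module
```

BlockQuoteToggle could then test IsInsideTag(mshtmlUtilities.ElementAtSelectionStart(Me), "blockquote") and run "Outdent" when it returns True, "Indent" otherwise.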

VB.NET/GetElementByClass how to div class click?

</div></div></div><div class="f u" id="m_more_item"><span>Diğerlerini Gör</span></div></div></div></div></div></div></div></body></html>
DOCUMENT
CODE:
Dim h1 As HtmlElementCollection = Nothing
h1 = W.Document.GetElementsByTagName("div")
For Each curElement As HtmlElement In h1
    If InStr(curElement.GetAttribute("classname").ToString, "f u") > 0 Then
        curElement.InvokeMember("Click")
    End If
Next
But the code does not work. Can anyone help?
Based on my example: VB.Net - select a class using GetElementByClass and then click the item programmatically
The problem is that you're trying to click the DIV instead of the A HREF tag. You will want to click that instead.
Since there is no "class" or anything on that element, then you could do something like...
Dim h1 As HtmlElementCollection = Nothing
'Collect the A tags rather than the DIVs
h1 = W.Document.GetElementsByTagName("a")
For Each h1Element As HtmlElement In h1
    Dim NameStr As String = h1Element.GetAttribute("href")
    'IsNullOrEmpty avoids a NullReferenceException when href is missing
    If Not String.IsNullOrEmpty(NameStr) Then
        h1Element.InvokeMember("Click")
    End If
Next
Dim theElementCollection As HtmlElementCollection = Nothing
For Each Element As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
If Element.GetAttribute("id") = "m_more_item" And Element.OuterHtml.Contains("Diğerlerini Gör") Then
Element.InvokeMember("click")
End If
Next

get url from HTML string

I have the following code that grabs a div element:
For Each ele As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
    If ele.GetAttribute("className").Contains("description") Then
        Dim content As String = ele.InnerHtml
        If content.Contains("http://myserver.com/image/check.png") Then
            'Do stuff if image exists
        Else
            'Do stuff if image doesn't exist
        End If
    End If
Next
The div element looks like this:
<DIV class=headline><SPAN class=blue-title-lg>TITLE_HERE
</SPAN> LOCATION1_HERE, LOCATION2_HERE</DIV>DESCRIPTION_HERE<BR>
<DIV class=about><A class=link href="viewprofile.aspx?
profile_id=00000000">USERNAME</A> 20 FSM -
Friends <FONT color=green>Online Today</FONT></DIV>
When the tick image doesn't exist, I want to grab the url that's in:
<a class=link href="viewprofile.aspx?profile_id=00000000"></a>
and put it into a string. This is where I've hit a brick wall and I need some help. I'd imagine a regex solution would resolve my issue, but regex is one of my weak spots. Can someone put me out of my misery?
Solved it!
I slept on it and came up with a really simple way of solving it. The UI of my app now looks like a mess, but I'll sort that later. I have the information I need.
Here's how I did it:
Dim PageElement As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
For Each CurElement As HtmlElement In PageElement
Dim linkunverified As String
linkunverified = CurElement.GetAttribute("href")
If linkunverified.Contains("viewprofile.aspx") Then
If ListBox1.Items.Contains(linkunverified) Then
Else
ListBox1.Items.Add(linkunverified)
End If
End If
Next
For Each ele As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
If ele.GetAttribute("className").Contains("description") Then
Dim content As String = ele.InnerHtml
If content.Contains("http://pics.myserver.com/image/check.png") Then
Else
Dim i As Integer
For i = 0 To ListBox1.Items.Count - 1
If content.Contains(ListBox1.Items(i).Remove(0, 24)) Then
ListBox2.Items.Add("http://www.myserver.com/" & ListBox1.Items(i).Remove(0, 24))
End If
Next
End If
End If
Next

ListBox with html element

Can anyone offer me some advice? I currently have a listbox that holds a list of images from any website. They are grabbed from the website via this method:
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
Dim PageElements As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("img")
For Each CurElement As HtmlElement In PageElements
imagestxt.Items.Add(imagestxt.Text & CurElement.GetAttribute("src") & Environment.NewLine)
Next
Timer1.Enabled = True
End Sub
I then use the following to download the selected image and display it in a picture box:
pic1.Image = New Bitmap(New MemoryStream(New WebClient().DownloadData(imagestxt.SelectedItem.ToString)))
This method pulls the images and title from the HTML.
Private Function StrHTML12() As Boolean
Dim htmlDocument As HtmlDocument = WebBrowser1.Document
ListBox1.Items.Clear()
For Each element As HtmlElement In htmlDocument.All
ListBox1.Items.Add(element.TagName)
If element.TagName.ToUpper = "IMG" Then
imgtags.Items.Add(element.OuterHtml.ToString)
End If
If element.TagName.ToUpper = "TITLE" Then
titletags.Items.Add(element.OuterHtml.ToString)
Timer1.Enabled = False
End If
Next
End Function
Basically, what I am looking to do is this:
have a program that checks each image's alt attribute (alt='' or img alt=''). If the developer of the website hasn't put anything in the alt attribute, I want the image to show in a picture box, with the alt tag next to it or underneath it, but I have no idea how.
This is the counting code I use to count how many empty alt="" or img alt='' there are on the page:
counter = InStr(counter + 1, strHTML, "<img alt=''")
counter = InStr(counter + 1, strHTML, "alt=''")
counter = InStr(counter + 1, strHTML, "alt=""")
The above seems really slow and messy. Is there a better way of doing it?
I do not have VB installed so I have not been able to test the code. I'm also not familiar with the datagridview component so have not attempted to integrate my code with it.
The code below should get you the title of the page, and loop through all the img tags that do not have (or have empty) alt-text
HtmlElement.GetAttribute(sAttr) returns the value of the attribute or an empty string.
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    Dim Title As String
    Dim ImSrc As String
    Dim PageElements As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("img")
    ' This line might need to be adjusted, see below
    Title = WebBrowser1.Document.GetElementsByTagName("title")(0).InnerText
    For Each CurElement As HtmlElement In PageElements
        If CurElement.GetAttribute("alt") = "" Then
            ' CurElement does not have alt-text
            ImSrc = CurElement.GetAttribute("src") ' This image has no alt text
        Else
            ' CurElement has alt-text
        End If
    Next
    Timer1.Enabled = True
End Sub
The line that gets the title might need to be changed as I'm unsure how collections can be accessed. You want the first (hopefully only) element returned from the GetElementsByTagName function.