Getting all the innerText from a <div> object - html

I have code that goes to a website, fills in a form, and then takes me to this webpage: http://www.stf.jus.br/portal/jurisprudencia/listarJurisprudencia.asp?s1=%28ICMS+BASE+DE+CALCULO+PIS+COFINS%29&base=baseAcordaos&url
On that page, I need the content of all these "tables". They are marked up as: div class="processosJurisprudenciaAcordaos".
Inside these "tables" there are several types of information and I need them all.
Here's the code so far (it only gets as far as the webpage):
Sub tese()
Dim ie As InternetExplorer
Dim pesquisa As String
pesquisa = InputBox("Digite os termos que quer pesquisar: ", "", "")
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate "http://www.stf.jus.br/portal/jurisprudencia/pesquisarJurisprudencia.asp"
ieBusy ie
ie.document.getElementById("txtPesquisaLivre").innerText = pesquisa
ie.document.getElementById("pesquisar").Click
ieBusy ie
Dim elemUnique, elemCollection As Object
Set elemCollection = ie.document.getElementsByTagName("a")
For Each elemUnique In elemCollection
If elemUnique.className Like "linkPagina" Then
elemUnique.Click
Exit For
End If
Next elemUnique
ieBusy ie
End Sub

You can assign the matching elements to an IHTMLElementCollection, then loop through the collection one element at a time, grabbing the information you need from each. In my example below, I use a MsgBox to show that each table was grabbed.
The top half of this code was used for testing purposes and to show that this is working code; the section below the marked line is what you need.
Code:
Sub Test()
Dim ie As New InternetExplorer, Url As String
Url = "http://www.stf.jus.br/portal/jurisprudencia/listarJurisprudencia.asp?s1=%28ICMS+BASE+DE+CALCULO+PIS+COFINS%29&base=baseAcordaos&url"
With ie
.navigate Url
Do While .Busy Or .readyState < READYSTATE_COMPLETE
DoEvents
Loop
.Visible = True
End With
'<------- Above was to Test, you want below this line ------->
Dim tblColl As IHTMLElementCollection, tbl As IHTMLElement
Set tblColl = ie.document.getElementsByClassName("processosJurisprudenciaAcordaos")
For Each tbl In tblColl
'Do what you need to with your table
'Using a MsgBox to show it works
MsgBox tbl.innerText
Next tbl
End Sub
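If you would rather collect the text on a worksheet than click through message boxes, something like the following might work. This is only a sketch: WriteDivsToSheet is a hypothetical helper name, it uses late binding so it needs no extra references, and it assumes each div's text fits in a single cell.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer): writes the text of
' every div with the class from the question down column A of the given sheet.
Public Sub WriteDivsToSheet(ByVal doc As Object, ByVal ws As Worksheet)
    Dim divs As Object
    Dim i As Long

    Set divs = doc.getElementsByClassName("processosJurisprudenciaAcordaos")

    ' The collection is zero-based; worksheet rows start at 1
    For i = 0 To divs.Length - 1
        ws.Cells(i + 1, 1).Value = divs.Item(i).innerText
    Next i
End Sub
```

You would call it once the page has finished loading, for example: WriteDivsToSheet ie.document, ThisWorkbook.Worksheets(1)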

Related

get specific data from database and print it to a cell

I'm trying to write a simple code for studying vocabulary and want this code to look up the words in column "A" using my favorite online dictionary "Cambridge" automatically and then print the definitions to the cells next to the words. I have written the code below so far and it goes to the site and searches the word. The question is what code is needed to get the definitions and print them to the cells?
Sub SearchWords()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLInput As MSHTML.IHTMLElement
Dim HTMLButtons As MSHTML.IHTMLElementCollection
Dim HTMLButton As MSHTML.IHTMLElement
IE.Visible = True
IE.Navigate "www.dictionary.cambridge.org"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.Document
Set HTMLInput = HTMLDoc.getElementById("cdo-search-input")
HTMLInput.Value = ThisWorkbook.Sheets(1).Range("A1").Value
Set HTMLButtons = HTMLDoc.getElementsByClassName("cdo-search__button")
HTMLButtons(0).Click
End Sub
Thanks in advance.
The result appears to be in an element with the class name entry. I read your column A search words into an array and loop over it to look up each word. The result is written back out to the sheet. I mostly use CSS selectors, as they are a more flexible and faster method for selecting elements. CSS selectors, in this instance, are applied via the querySelector method of HTMLDocument (i.e. ie.Document).
Proper page-load waits are used throughout.
Option Explicit
'entry
Public Sub SearchWords()
Dim IE As SHDocVw.InternetExplorer, lookups(), dataSheet As Worksheet, iRow As Long
Set dataSheet = ThisWorkbook.Worksheets("Sheet1")
Set IE = New SHDocVw.InternetExplorer
lookups = Application.Transpose(dataSheet.Range("A2:A3").Value) '<Read words to lookup into a 2d array and transpose into 1D
With IE
.Visible = True
.Navigate2 "www.dictionary.cambridge.org"
While .Busy Or .readyState <> 4: DoEvents: Wend
For iRow = LBound(lookups) To UBound(lookups)
.document.getElementById("cdo-search-input").Value = lookups(iRow) 'work off .document to avoid stale elements
.document.querySelector(".cdo-search__button").Click
While .Busy Or .readyState <> 4: DoEvents: Wend 'wait for page reload
Application.Wait Now + TimeSerial(0, 0, 1)
Do
DoEvents 'yield so the page can finish rendering the results
Loop While .document.querySelectorAll(".entry").Length = 0
dataSheet.Cells(iRow + 1, 2) = .document.querySelector(".entry").innerText
Next
.Quit
End With
End Sub
Done! Perfectly working. (Since this post is too long for a comment, I had to post it as an answer.) Now I am trying to get some more data from the page, since I need the other explanations and the Turkish definitions as well. When I inspect the page, I see that the full descriptions are placed in the "di $ entry-body__el entry-body__el--smalltop clrd js-share-holder" class. I added "/turkish" to the URL and tried to get the related element using the class name I mentioned instead of ".def-block", but it didn't work. Then I tried a different way using this code:
Sub GetMeaningsFromCambridgeDictionary()
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Meanings")
Dim sourceWord As String
sourceWord = ws.Range("A2").Value
Dim i As Integer
Dim çeviri As String
Dim ilkSatir As Integer
ilkSatir = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row + 1
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
Dim URL As String
Dim countElement As Integer
Range("B2:B1000").Delete
IE.Visible = False
URL = "https://dictionary.cambridge.org/dictionary/turkish/" & sourceWord
IE.Navigate URL
Do While IE.Busy: DoEvents: Loop
Application.Wait (Now + TimeValue("0:00:01"))
Do While IE.readyState <> 4
Application.Wait (Now + TimeValue("0:00:01"))
Loop
countElement = IE.document.getElementsByClassName("di $ entry-body__el entry-body__el--smalltop clrd js-share-holder").Length
For i = 0 To countElement - 1
çeviri = IE.document.getElementsByClassName("di $ entry-body__el entry-body__el--smalltop clrd js-share-holder")(i).innerText
Range("B" & i + 2).Value = çeviri
Range("B" & i + 2).Rows.AutoFit
Next i
Columns(2).AutoFit
IE.Quit
MsgBox "All meanings have been copied."
End Sub
This code also works, and I see all the definitions in detail, but this time the problem is that only the first word is processed. What should I do to do the same thing for the other words?
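One way to cover the remaining words might be to wrap the navigation in a loop over column A. The following is a minimal sketch, not a tested solution: it assumes the words sit in A2 downward on the "Meanings" sheet, that the URL pattern from the code above is right, and that selecting on the single class entry-body__el (one of the classes quoted above) is enough. GetMeaningsForAllWords is a made-up name.

```vba
Option Explicit

' Sketch only: looks up every word in column A and writes all of its
' entry blocks, joined by line breaks, into the cell next to it.
Public Sub GetMeaningsForAllWords()
    Dim ws As Worksheet, ie As Object, entries As Object
    Dim r As Long, lastRow As Long, i As Long
    Dim meaning As String

    Set ws = ThisWorkbook.Worksheets("Meanings")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False

    For r = 2 To lastRow
        ie.Navigate "https://dictionary.cambridge.org/dictionary/turkish/" & ws.Cells(r, "A").Value
        Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop

        ' Assumption: one class from the quoted class list identifies the blocks
        Set entries = ie.document.querySelectorAll(".entry-body__el")

        meaning = vbNullString
        For i = 0 To entries.Length - 1
            meaning = meaning & entries.Item(i).innerText & vbLf
        Next i
        ws.Cells(r, "B").Value = meaning
    Next r

    ie.Quit
End Sub
```

Writing each word's result to its own row keeps the meanings lined up with the words, instead of spilling them down column B as the single-word version does.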

Excel VBA code to click web button

I need help creating Excel VBA code for this.
I need the code so I can complete my macro.
Thanks in advance
First, you will need to create a reference to:
Microsoft Internet Controls
Microsoft HTML Object Library
In VBE, click Tools > References
Sub clickLink()
Dim ie As New InternetExplorer, Url$, doc As HTMLDocument
Url = "http://UrlToYourLink.com"
With ie
.navigate Url
Do While .Busy Or .readyState < READYSTATE_COMPLETE
DoEvents
Loop
Set doc = .document 'object assignment requires Set
.Visible = True
End With
Dim myBtn As Object
Set myBtn = doc.getElementsByClassName("button rounded")(0)
myBtn.Click
End Sub
The Internet control is used to browse the webpage and the HTML Objects are used to identify the username and password textboxes and submit the text using the control button.
Dim HTMLDoc As HTMLDocument
Dim oBrowser As InternetExplorer
Sub Login_2_Website()
Dim oHTML_Element As IHTMLElement
Dim sURL As String
On Error GoTo Err_Clear
sURL = "https://www.google.com/accounts/Login"
Set oBrowser = New InternetExplorer
oBrowser.Silent = True
'oBrowser.timeout = 60 'disabled: the InternetExplorer object has no timeout property
oBrowser.navigate sURL
oBrowser.Visible = True
Do
' Wait till the Browser is loaded
Loop Until oBrowser.readyState = READYSTATE_COMPLETE
Set HTMLDoc = oBrowser.Document
HTMLDoc.all.Email.Value = "sample@vbadud.com"
HTMLDoc.all.passwd.Value = "*****"
For Each oHTML_Element In HTMLDoc.getElementsByTagName("input")
If oHTML_Element.Type = "submit" Then oHTML_Element.Click: Exit For
Next
' oBrowser.Refresh ' Refresh If Needed
Err_Clear:
If Err <> 0 Then
Debug.Assert Err = 0
Err.Clear
Resume Next
End If
End Sub
The program requires references to the following:
1. Microsoft Internet Controls
2. Microsoft HTML Object Library
Microsoft Internet Controls are a great way to do this, but if you aren't allowed to add new references, here is another way to go about web scraping, using late binding.
This method isn't as 'clean' as using the Microsoft Internet Controls and HTML Object Library references, but it gets the job done.
Sub GoogleSearch()
Dim ie As Object
Dim objSearchBnt As Object
Dim objCollection As Object
Dim i As Integer
'initialize counter
i = 0
'Create InternetExplorer Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
'navigate to the url
ie.navigate "Www.google.com"
'Status bar shows in the bottom corner of Excel
Application.StatusBar = "Loading, please wait..."
'Wait until page is ready
Do While ie.busy
Application.Wait DateAdd("s", 1, Now)
Loop
'Store all the elements with input tag
Set objCollection = ie.Document.getElementsByTagName("input")
'Go through all input elements
While i < objCollection.Length
'input search field
If objCollection(i).Name = "q" Then
objCollection(i).Value = "Hello World"
End If
'search button
If objCollection(i).Type = "submit" Then
Set objSearchBnt = objCollection(i)
End If
i = i + 1
Wend
objSearchBnt.Click
'Clean up
Set objSearchBnt = Nothing
Set objCollection = Nothing
Set ie = Nothing
'Give Excel control over the status bar again
Application.StatusBar = ""
End Sub
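The loop above waits on ie.Busy alone and will spin forever if the page never finishes loading. A small wait helper with a timeout is one possible alternative. This is just a sketch under late binding: WaitForPage and timeoutSeconds are names I made up, and 4 is the numeric value of READYSTATE_COMPLETE.

```vba
Option Explicit

' Hypothetical helper: returns True once the page has loaded,
' or False if it has not loaded within timeoutSeconds.
Public Function WaitForPage(ByVal ie As Object, _
                            Optional ByVal timeoutSeconds As Long = 30) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)

    Do While ie.Busy Or ie.readyState <> 4
        DoEvents                              ' keep Excel responsive
        If Now > deadline Then Exit Function  ' give up, result stays False
    Loop

    WaitForPage = True
End Function
```

It could be called right after navigate, for example: If Not WaitForPage(ie) Then MsgBox "Page did not load in time"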

Web Query where there are multiple Frames

My goal is to scrape the source code of a web page.
The site seems to have different Frames which is why my code won't work properly.
I tried to modify some code I found online that should solve the frame issue.
The following code creates an error (object required) at:
Set profileFrame = .document.getElementById("profileFrame")
Public Sub IE_Automation()
'Needs references to Microsoft Internet Controls and Microsoft HTML Object Library
Dim baseURL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim profileFrame As HTMLIFrame
Dim slotsDiv As HTMLDivElement
'example URL with multiple frames
baseURL = "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
Set IE = New InternetExplorer
With IE
.Visible = True
'Navigate to the main page
.navigate baseURL & "/publictrophy/index.htm?onlinename=ace_anubis"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
'Get the profileFrame iframe and navigate to it
Set profileFrame = .document.getElementById("profileFrame")
.navigate baseURL & profileFrame.src
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = .document
End With
'Display all the text in the profileFrame iframe
MsgBox HTMLdoc.body.innerText
'Display just the text in the slots_container div
Set slotsDiv = HTMLdoc.getElementById("slots_container")
MsgBox slotsDiv.innerText
End Sub
Hummmm, I'm not exactly sure what you are doing here, but can you try the code below?
Option Explicit
Sub Sample()
Dim ie As Object
Dim links As Variant, lnk As Variant
Dim rowcount As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
'Wait for site to fully load
'ie.Navigate2 URL
Do While ie.Busy = True
DoEvents
Loop
Set links = ie.document.getElementsByTagName("a")
rowcount = 1
With Sheets("Sheet1")
For Each lnk In links
'Debug.Print lnk.innerText
'If lnk.classname Like "*Real Statistics Examples Part 1*" Then
.Range("A" & rowcount) = lnk.innerText
rowcount = rowcount + 1
'Exit For
'End If
Next
End With
End Sub
General:
I think in your research you may have come across this question and misunderstood how it relates/doesn't relate to your circumstance.
I don't think iFrames are relevant to your query. If you are after the list of names, their details and the URLs to their pages you can use the code below.
CSS Selectors
To target the elements of interest I use the following two CSS selectors. These use style information on the page to target the elements:
.SearchResults-link
.SearchResults-item
"." means class, which is like saying .getElementsByClassName. The first gets the links, and the second gets the description information on the first page.
With respect to the first CSS selector: The actual link required is dynamically constructed, but we can use the fact that the actual profile URLs have a common base string of "https://www.xing.com/profile/", which is then followed by the profileName. So, in function GetURL, we parse the outerHTML returned by the CSS selector to get the profileName and concatenate it with the BASESTRING constant to get our actual profile link.
Code:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.xing.com/publicsearch/query?search%5Bq%5D=IT"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim a As Object, exitTime As Date, linksNodeList As Object, profileNodeList As Object
' exitTime = Now + TimeSerial(0, 0, 5) '<== uncomment this section if timing problems
'
' Do
' DoEvents
' On Error Resume Next
' Set linksNodeList = .document.querySelectorAll(".SearchResults-link")
' On Error GoTo 0
' If Now > exitTime Then Exit Do
' Loop While linksNodeList Is Nothing
Set linksNodeList = .document.querySelectorAll(".SearchResults-link") '<== comment this out if uncommented section above
Set profileNodeList = .document.querySelectorAll(".SearchResults-item")
Dim i As Long
For i = 0 To profileNodeList.Length - 1
Debug.Print "Profile link: " & GetURL(linksNodeList.item(i).outerHTML)
Debug.Print "Basic info: " & profileNodeList.item(i).innerText
Next i
End With
End Sub
Public Function GetURL(ByVal htmlSection As String) As String
Const BASESTRING As String = "https://www.xing.com/profile/"
Dim arr() As String
arr = Split(htmlSection, "/")
GetURL = BASESTRING & Replace$(Split((arr(UBound(arr) - 1)), ">")(0), Chr$(34), vbNullString)
End Function

Pull value from website (HTML div class) using Excel VBA

I'm trying to automate going to a website and pulling the ratings from several apps.
I've figured out how to navigate and login to the page.
How do I pull the element - the number "3.3" in this case - from this specific section into Excel.
Being unfamiliar with HTML in VBA, I got this far following tutorials/other questions.
Rating on website and the code behind it
Sub PullRating()
Dim HTMLDoc As HTMLDocument
Dim ie As InternetExplorer
Dim oHTML_Element As IHTMLElement
Dim sURL As String
On Error GoTo Err_Clear
sURL = "https://www.appannie.com/account/login/xxxxxxxxxx"
Set ie = New InternetExplorer
ie.Silent = True
ie.navigate sURL
ie.Visible = True
Do
'Wait until the Browser is loaded
Loop Until ie.readyState = READYSTATE_COMPLETE
Set HTMLDoc = ie.Document
HTMLDoc.all.Email.Value = "xxxxxxxxx@xxx.com"
HTMLDoc.all.Password.Value = "xxxxx"
For Each oHTML_Element In HTMLDoc.getElementById("login-form")
If oHTML_Element.Type = "submit" Then oHTML_Element.Click: Exit For
Next
Dim rating As Variant
Set rating = HTMLDoc.getElementsByClassName("rating-number ng-binding")
Range("A1").Value = rating
'ie.Refresh 'Refresh if required
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
End Sub
The code below extracts the text from the first element with the class name "rating-number ng-binding" in the HTML document. By the way, getElementsByClassName is only supported from IE 9 onward, so my example uses code that is also compatible with older versions.
Dim htmlEle1 as IHTMLElement
For Each htmlEle1 in HTMLDoc.getElementsByTagName("div")
If htmlEle1.className = "rating-number ng-binding" then
Range("A1").Value = htmlEle1.InnerText
Exit For
End if
Next htmlEle1
While Ryszard's code should do the trick, if you want to use the code you have already written, then here are the alterations I believe you need to make.
For Each oHTML_Element In HTMLDoc.getElementById("login-form")
If oHTML_Element.Type = "submit" Then oHTML_Element.Click: Exit For
Next
'Need to wait for page to load before collecting the value
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
'Re-read the document, since the login navigated to a new page
Set HTMLDoc = ie.Document
Dim rating As IHTMLElement
'getElementsByClassName returns a collection, so take its first item
Set rating = HTMLDoc.getElementsByClassName("rating-number ng-binding")(0)
'Need to get the innerhtml of the element
Range("A1").Value = rating.innerhtml

Cycling Through List of URLs Using Excel VBA

I am much more familiar with Excel now, but one thing is still baffling me - how to cycle through URLs in a loop. My current conundrum is that I have this list of URLs of packages, and need to obtain the status of each package on each page using its HTML. What I currently have to cycle through the list is:
Sub TrackingDeliveryStatusResults()
Dim IE As Object
Dim URL As Range
Dim wb1 As Workbook, ws1 As Worksheet
Dim filterRange As Range
Dim copyRange As Range
Dim lastRow As Long
Set wb1 = Application.Workbooks.Open("\\S51\******\Folders\******\TrackingDeliveryStatus.xls")
Set ws1 = wb1.Worksheets("TrackingDeliveryStatusResults")
Set IE = New InternetExplorer
With IE
.Visible = True
For Each URL In Range("C2:C & lastRow")
.Navigate URL.Value
While .Busy Or .ReadyState <> 4: DoEvents: Wend
MsgBox .Document.body.innerText
Next
End With
End Sub
And the list of URLs
My goal here is:
Cycle through each URL (inserts URL in IE and keeps going without opening new tabs)
Obtain the status of the item for each URL from the HTML element
FedEx: Delivered (td class="status")
UPS: Delivered (id="tt_spStatus")
USPS: Arrived at USPS Facility (class="info-text first")
Finish the loop and save as a csv if at all possible (I've already done that, so I'm just posting the code portion I'm having a problem with).
My understanding is that I have to code a different if statement for each different url, since all of them have different HTML tags for their delivery status. Loops are simple, but to loop through webpages is new to me. The code has been throwing me errors no matter what changes I make.
The IE object opens up but then Excel hits an error and the code stops running.
OK, I'll start with the proper syntax to get your code going, and I will edit this answer with further code.
Sub Sample()
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.EnableEvents = True
Dim wb As Workbook, wsSheet As Worksheet, Rows As Long, links As Variant, IE As Object, link As Variant
Set wb = ThisWorkbook
Set wsSheet = wb.Sheets("Sheet1")
Set IE = New InternetExplorer
Rows = wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row
links = wsSheet.Range("A1:A" & Rows)
With IE
.Visible = True
For Each link In links
.navigate (link)
While .Busy Or .ReadyState <> 4: DoEvents: Wend
MsgBox .Document.body.innerText
Next link
End With
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub
This will get you looping. I think you had some general syntax issues, which you can see by comparing your code with mine. To loop with For Each, the loop variable (link) has to be of type Object or Variant. I declared links as Variant so it can hold the array of cell values read from the range, and each link then holds one URL string.
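For the part about a different lookup per carrier, one option might be a small function that picks the element based on the URL. The selectors below are taken straight from the question (td class="status", id="tt_spStatus", class="info-text first") and may not match the live pages. GetDeliveryStatus and the URL substrings it tests for are my own assumptions, and querySelector needs the page to render in an IE 8+ document mode.

```vba
Option Explicit

' Hypothetical helper: returns the status text for one loaded tracking page.
' doc is the loaded document (e.g. IE.Document); url is the address that was opened.
Public Function GetDeliveryStatus(ByVal doc As Object, ByVal url As String) As String
    Dim node As Object

    If InStr(1, url, "fedex", vbTextCompare) > 0 Then
        Set node = doc.querySelector("td.status")          ' FedEx: td class="status"
    ElseIf InStr(1, url, "ups.com", vbTextCompare) > 0 Then
        Set node = doc.getElementById("tt_spStatus")       ' UPS: id="tt_spStatus"
    ElseIf InStr(1, url, "usps", vbTextCompare) > 0 Then
        Set node = doc.querySelector(".info-text.first")   ' USPS: class="info-text first"
    End If

    If node Is Nothing Then
        GetDeliveryStatus = "Status not found"
    Else
        GetDeliveryStatus = Trim$(node.innerText)
    End If
End Function
```

Inside the loop you could then replace the MsgBox with a call such as GetDeliveryStatus(.Document, link) and write the result to the cell next to the URL.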