Extracting data from website using VBA - html

I want to extract the project status of a project, which I can find on a website. The HTML below shows how the page is structured. I want to extract the text Start, which is the text between <td> and </td>.
<div id="ProjectStatus">
<tr>
<th>
<span id="ProjectStatus_Label1" title="De status van het project">Projectstatus</span>
</th>
<td>Start</td>
</tr>
Below you'll find the code that I have at this moment. This code only gives me the string "Projectstatus", which is not what I want. How can I extract the word "Start"?
Private Sub btnClick()
Dim ieApp As InternetExplorer
Set ieApp = New InternetExplorer
Set ieApp = CreateObject("internetexplorer.application")
With ieApp
.Navigate "url"
.Visible = True
End With
Do While ieApp.Busy
DoEvents
Loop
Set getStatus = ieApp.Document.getElementById("ProjectStatus_Label1")
strStatus = getStatus.innerText
MsgBox (strStatus) 'gives me the text "Projectstatus", but I need the text "Start"
ieApp.Quit
Set ieApp = Nothing
End Sub

Achieving this, starting from the ProjectStatus_Label1, will require some DOM navigation.
Use the following:
Do While ieApp.Busy
    DoEvents
Loop
Set labelSpan = ieApp.Document.getElementById("ProjectStatus_Label1")
Set tableHeader = labelSpan.parentElement 'span -> th
Set tableRow = tableHeader.parentElement 'th -> tr
For Each child In tableRow.Children
    If child.tagName = "TD" Then 'This is the element you're looking for
        Debug.Print child.innerText
        Exit For
    End If
Next
Of course, I highly recommend you revise this code and use explicit declarations and Option Explicit, but you haven't in your question so I won't in my answer.
Also, I've used a number of intermediate variables (labelSpan, tableHeader) for demonstrative purposes. You can use Set tableRow = ieApp.Document.getElementById("ProjectStatus_Label1").parentElement.parentElement and remove those other variables.
Or you can use the code-golfy, harder-to-understand approach, starting from the ProjectStatus div:
Debug.Print ieApp.Document.getElementById("ProjectStatus").GetElementsByTagName("td")(0).innerText
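To try the traversal without opening a browser, here is a minimal sketch that loads the question's markup into an in-memory document. It assumes the row sits inside a proper <table> (the snippet in the question omits it). The name DemoProjectStatus and the late-bound "htmlfile" document are illustrative choices, not part of the original code.

```vba
Option Explicit

Sub DemoProjectStatus()
    Dim doc As Object, labelSpan As Object, tableRow As Object, child As Object

    ' In-memory HTML document; no Internet Explorer window is needed
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = _
        "<div id=""ProjectStatus""><table><tr>" & _
        "<th><span id=""ProjectStatus_Label1"">Projectstatus</span></th>" & _
        "<td>Start</td></tr></table></div>"

    Set labelSpan = doc.getElementById("ProjectStatus_Label1")
    ' Walk up two levels: span -> th -> tr
    Set tableRow = labelSpan.parentElement.parentElement

    ' The first TD child of the row holds the status text
    For Each child In tableRow.Children
        If child.tagName = "TD" Then
            Debug.Print child.innerText
            Exit For
        End If
    Next child
End Sub
```

Against the live page, the same traversal should apply once doc is replaced by ieApp.Document, provided the real markup matches the snippet in the question.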

How to get excel VBA to find a name then click the hyperlink

This is the piece I am stuck on, everything else is working.
Sub Avidie()
Dim i As Long
Dim url As String
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
Set links = .document.getelementbyclass("invoice-number-hyperlink").getelementsbyname("25407")(0).Click
I have reached the point where I have the web page's HTML, shown below, and I want my code to find the link that has that class and name, and click it. How can I fix this? The class name is the same in multiple places on the page, but the name (the invoice number shown as the link text) is what I'm looking up, so I need to find the link by that.
<a class="invoice-number-hyperlink" onclick="avid.navigateHelper['PAQ-Invoice'](this);" href="#/invoices/70c2373a-ac71-43f2-a9e0-8d0332f1a19b?fromQueue=true">25407</a>
Based on the part of the code you provided, as well as the methods that other members of the community have mentioned, you can loop over all <a> elements that have the class, filter them by their text content, and then click the one that matches. I have created a simple example that I hope will be helpful:
Dim appIE As InternetExplorerMedium
Set appIE = New InternetExplorerMedium
sURL = "https://www.example.com/"
With appIE
.navigate sURL
.Visible = True
End With
Do While appIE.Busy Or appIE.readyState <> 4
DoEvents
Loop
'appIE.document.getElementsByClassName("invoice-number-hyperlink")(0).Click
For Each link In appIE.document.getElementsByClassName("invoice-number-hyperlink")
If link.Text = "25407" Then
link.Click
Exit For
End If
Next
And page:
<a class="invoice-number-hyperlink" onclick="window.alert('jump to google...')" href="https://www.google.com/">25407</a>

How can I pull data from website using vba

I am new to VBA coding for pulling data from websites. I generally use the code below to connect to a site and inspect the items I want to pull, but with my firm's web app I cannot inspect the data through a watch in VBA: the watch shows nothing when I add it for the class. What should I do?
(Screenshots: HTML code from my firm's web app 1 and 2)
Sub Connect_web()
Dim ie As InternetExplorer
Dim doc As HTMLdocument
Dim ele As IHTMLElement
Dim col As IHTMLElementCollection
Dim ele_tmp As IHTMLElement
Set ie = New InternetExplorer
URL = "" ' Cannot provide
ie.Visible = True
ie.navigate URL
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Loading Page..."
DoEvents
End If
Loop
Set doc = ie.Document
Set ele = doc.getElementByClassName("GDB3EHGDHLC")
end sub
Let's start with four things:
1) Instead of .Navigate use .Navigate2
2) Use a proper wait
While ie.Busy Or ie.readyState < 4: DoEvents: Wend
3) Correct the syntax of your Set ele line. You are using getElementByClassName, but the method is getElementsByClassName: it returns a collection and is therefore plural. You are missing the s at the end of Element.
As you have declared ele as singular (element), perhaps first set the collection into a separate variable and index into that collection.
Dim eles As Object, ele As Object
Set eles = doc.getElementsByClassName("GDB3EHGDHLC")
Set ele = eles(0)
4) You should always use id over other attributes, if possible, as id is usually quicker for retrieval. There is an id against that class name in your image (highlighted element). I am not going to try and type it all out. Please share your HTML using the snippet tool, by editing your question, so we can relate to your html in answer easily.
Set ele = doc.getElementById("gwt-debug-restOfIdStringGoesHere")
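The wait in point 2 loops forever if the page never finishes loading. As a sketch of one way around that, the hypothetical helper below (WaitForPage and its timeoutSeconds parameter are names made up for this example) adds a timeout to the same Busy/readyState check.

```vba
Option Explicit

' Hypothetical helper: waits until the page has loaded or the timeout expires.
' Returns True if the page finished loading in time, False otherwise.
Function WaitForPage(ByVal ie As Object, _
                     Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    startTime = Timer   ' seconds since midnight; this sketch ignores the midnight reset

    Do While ie.Busy Or ie.readyState < 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
        If Timer - startTime > timeoutSeconds Then Exit Function   ' leaves result False
    Loop

    WaitForPage = True
End Function
```

It could be called as If WaitForPage(ie) Then Set doc = ie.Document, so the rest of the macro only runs when the page actually loaded.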

How to get a particular link from a web page's source code?

I need a VBA macro that extracts all the HTML source code from a URL contained in an Excel cell and puts it, line by line, into separate Excel cells.
I've already searched for solutions online but haven't found the right one.
Thanks for helping ;)
EDIT:
Thanks to the library references I just added, I could also test another macro that I had previously found online:
Sub Naviga()
Dim texto As String
Dim objIE As Object
Dim DestUrl As String
DestUrl = "http://www.google.it"
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = False
objIE.Navigate2 DestUrl
Do
DoEvents
Loop Until objIE.ReadyState = READYSTATE_COMPLETE
Range("A" & 1).Value = objIE.document.body.innerHTML
End Sub
It works, but I would like the link to be read directly from a cell in Excel, and each line of the source to be copied into the next cell down.
How can I modify the macro?
EDIT 2:
The solution is near. I've just fixed the code, and it is now cleaner:
Sub EstrSorgPag()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.navigate Range("H1")
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Range("A" & 1).Value = IE.document.body.innerHTML
End Sub
But it still lacks the last part, where the macro should copy the content cell by cell (A1, A2, A3, A4, and so on).
EDIT 3:
Hello guys, I wrote this short code that extracts all links from a web page's source code:
Sub EstraiURLdaWeb()
Dim doc As HTMLDocument
Dim output As Object
Set IE = New InternetExplorer
IE.Visible = False
IE.navigate Range("L1")
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set doc = IE.document
Set output = doc.getElementsByTagName("a")
i = 5
For Each link In output
Range("A" & i).Value = link
i = i + 1
Next
MsgBox "Fatto!"
End Sub
But I need to extract this one in particular:
<li class="bubble"><span>Main</span></li>
How can I do that?
Verify the <a>'s InnerHTML or InnerText
If you already have all the <a> elements, you can loop through them (you do this already) and add a condition that checks whether each element contains the keyword you are looking for.
Set output = doc.getElementsByTagName("a")
For Each link In output
    If link.InnerHTML = "Main" Then
        Range("A" & i).Value2 = link
        i = i + 1 'move to the next row so matches don't overwrite each other
    End If
Next
Combine more GetElement(s) methods
To narrow the collection of HTML elements, you can combine multiple GetElement(s) methods, like so:
First, get all the HTML elements with a specific class:
Set BubbleCollection = doc.getElementsByClassName("bubble")
getElementsByTagName is a method of a single element (or of the document), not of a collection, so pick one element from the collection and scan it for <a> tags:
Set output = BubbleCollection(0).getElementsByTagName("a")
Check how many elements you've got (optional, for debugging or refining the search):
Debug.Print output.length
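Note that the markup in the question is an <li> with class "bubble" wrapping a <span>, not an <a>. A minimal sketch for that case follows, assuming an in-memory test document. DemoFindBubble and the sample list are made up for illustration, and the check uses className on each <li> in case getElementsByClassName is not available in the document mode in use.

```vba
Option Explicit

Sub DemoFindBubble()
    Dim doc As Object, listItem As Object

    ' In-memory test document with one matching and one non-matching <li>
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = _
        "<ul><li class=""bubble""><span>Main</span></li><li>Other</li></ul>"

    ' Walk every <li> and keep only those whose class attribute is exactly "bubble"
    For Each listItem In doc.getElementsByTagName("li")
        If listItem.className = "bubble" Then
            Debug.Print listItem.innerText
        End If
    Next listItem
End Sub
```

In the macro from the question, Debug.Print could be replaced by writing to Range("A" & i) and incrementing i, as in the loop over the links.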

Excel VB Searching for text on a webpage and Copying information in the same Element

I am relatively new to VBA.
I am trying to use this code to grab a bit of information from a website. When I do it by Element I have to search for the tag name which is tr and use a number next to it to define which one I want to use. The problem with that is it changes frequently with the position on the website. Currently the Keyword I want to search for and the information it contains is like so:
<tr>
<td class="nt">Operations</td>
<td>Windows</td>
</tr>
So if I can search for the text "Operations" and get the information "Windows", that would help. Also, I am currently getting this error:
Next without For
If possible, is there a way I can use this to do multiple searches before I close the page? So I look for multiple specific words and input that data into different cells before moving onto the next column where it would repeat until completed at the end of the x value. I currently only have it set to x=2 To 5 but I would like to increase that to 10 or higher in the future.
The current code looks like this.
Private Sub Worksheet_Change(ByVal Target As Range)
For x = 2 To 5
If Target.Row = Cells(x, 35).Row And _
Target.Column = Cells(x, 35).Column Then
'If Target.Row = Range("ManufacturerPartNumber").Row And _
'Target.Column = Range("ManufacturerPartNumber").Column Then
Dim IE As New InternetExplorer
'IE.Visible = True
'For x = 2 To 5
'IE.navigate "" & Range("Website_1").Value
IE.navigate "" & Cells(x, 35).Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim sDD As String
sDD = InStr(1, IE.document.body.innerHTML, "Processor Model")
'sDD = Trim(Doc.getElementsByTagName("Processor Model")(1).innerText) 'Use this with tag like dd and number for which it appears like 0 or 1
IE.Quit
Dim aDD As Variant
aDD = Split(sDD, ",")
Cells(x, 44).Value = aDD(0)
'Range("ProcessorNumberCd").Value = aDD(0)
'Range("OSProvided").Value = aDD(0)
Next x
End If
'MsgBox "Complete"
End Sub
I think you want to grab the 'inner text'. Take a look at the example below.
Sub Scraper()
Dim item As Long
Dim priceStr As String
Dim priceTag As Object
Dim priceTable As Object
Dim objIE As Object
item = 10011 'this will eventually be placed in a loop for multiple searches
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
' navigate and download the web page
objIE.Navigate "www.google.com"
Do While objIE.ReadyState <> 4 Or objIE.Busy
DoEvents
Loop
'objIE.Document.getElementsByTagName("input")(0).Value = item
'objIE.Document.getElementByID("FDI").Click
Set priceTable = objIE.Document.getElementByID("price_FGC")
Set priceTag = priceTable.getElementsByTagName("u")(3)
priceStr = priceTag.innerText
Sheet1.Range("A1").Value = priceStr
objIE.Quit
End Sub
Also, see this link for several other ways to do similar things.
http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm
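For the label/value rows in the question, a sketch along the following lines may be closer to what is needed. GetValueByLabel and the demo markup are hypothetical. It assumes each row of interest holds the label in its first <td> and the value in its second. Separately, the "Next without For" error in the question's code comes from Next x appearing before the End If that closes the If block; swapping those two lines should clear it.

```vba
Option Explicit

' Hypothetical helper: returns the text of the second cell in the first row
' whose first cell matches labelText, or "" if no such row exists.
Function GetValueByLabel(ByVal doc As Object, ByVal labelText As String) As String
    Dim tableRow As Object, tds As Object

    For Each tableRow In doc.getElementsByTagName("tr")
        Set tds = tableRow.getElementsByTagName("td")
        If tds.Length >= 2 Then
            If Trim$(tds(0).innerText) = labelText Then
                GetValueByLabel = Trim$(tds(1).innerText)
                Exit Function
            End If
        End If
    Next tableRow
End Function

Sub DemoGetValueByLabel()
    Dim doc As Object

    ' In-memory test document mirroring the row from the question
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = _
        "<table><tr><td class=""nt"">Operations</td><td>Windows</td></tr></table>"

    Debug.Print GetValueByLabel(doc, "Operations")
End Sub
```

Because the lookup goes by label text rather than by row position, it should keep working when the row moves. Several labels can be looked up on the same page by calling the function repeatedly with IE.document before IE.Quit.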

Take data from next HTML tag

Using this HTML code for example:
<table class="table-grid">
<tr>
<th>auto.model</th>
<td>
<pre>'Toyota Avensis Wagon'</pre>
</td>
</tr>
<tr>
<th>auto.year</th>
<td>
<pre>2005</pre>
</td>
</tr>
</table>
Given the parameter "auto.model" between the <th></th> tags, I want to retrieve "Toyota Avensis Wagon", i.e. the next expression between <pre></pre>. Ideally I'd like to have a function that does this.
Thank you @Jeeped, but the code raises a "Type mismatch" error and points to Set el = Param.PreviousSibling:
Sub Extract_TD_text()
Dim URL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim Params As IHTMLElementCollection
Dim Param As HTMLTableCell
Dim Val As HTMLTableCell
Dim r As Long
Dim el As HTMLTableCell
URL = "My URL"
Set IE = New InternetExplorer
With IE
.navigate URL
.Visible = False
'Wait for page to load
While .Busy Or .READYSTATE <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = .document
End With
Set Params = HTMLdoc.getElementsByTagName("tr")
For Each Param In Params
If Param.innerText Like "*auto.model*" Then
Set el = Param.PreviousSibling
Exit For
End If
Next
If Not el Is Nothing Then Debug.Print el.innerText
IE.Quit
Set IE = Nothing
End Sub
Instead of using previousSibling, I'd suggest nextElementSibling.
From the way your HTML and VBA code are currently set up, the current Param value should be the <th> tag. I think previousSibling would check the node that comes before it, and since the <th> is the first element within the <tr> (the parent element), there shouldn't be anything there (except maybe an invisible text node, which previousSibling can find but which we don't need).
I think nextElementSibling should be able to find your <td> tag, since it comes after your <th> tag.
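Since the question asks for a function, here is a minimal sketch of one, assuming you loop over the <th> cells themselves rather than over the rows. GetParamValue and the demo markup are hypothetical. It reaches the value through the header's parent row instead of nextElementSibling, in case that property is unavailable in the document mode in use.

```vba
Option Explicit

' Hypothetical helper: returns the text of the <td> that shares a row with
' the <th> whose text equals paramName, or "" if there is no such header.
Function GetParamValue(ByVal doc As Object, ByVal paramName As String) As String
    Dim th As Object, valueCells As Object

    For Each th In doc.getElementsByTagName("th")
        If Trim$(th.innerText) = paramName Then
            ' The value cell lives in the same <tr> as the header cell
            Set valueCells = th.parentElement.getElementsByTagName("td")
            If valueCells.Length > 0 Then
                GetParamValue = Trim$(valueCells(0).innerText)
            End If
            Exit Function
        End If
    Next th
End Function

Sub DemoGetParamValue()
    Dim doc As Object

    ' In-memory test document mirroring the table from the question
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = _
        "<table class=""table-grid"">" & _
        "<tr><th>auto.model</th><td><pre>'Toyota Avensis Wagon'</pre></td></tr>" & _
        "<tr><th>auto.year</th><td><pre>2005</pre></td></tr>" & _
        "</table>"

    Debug.Print GetParamValue(doc, "auto.model")
End Sub
```

With the code from the question, the call would be GetParamValue(HTMLdoc, "auto.model"). Declaring the loop variable As Object also avoids tying it to one specific element type such as HTMLTableCell.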