HTML Page Title in Excel VBA - html

Given a URL, how can I get the title of the HTML page using VBA in Excel?
For example, suppose I have three URLs like:
http://url1.com/somepage.html
http://url2.com/page.html
http://url3.com/page.html
Now I need to get the titles of these HTML pages in another column. How do I do it?

Remou's answer was VERY helpful for me, but it caused a problem: it doesn't close the Internet Explorer process. Since I needed to run this dozens of times, I ended up with too many IE instances open, and my computer couldn't handle it.
So just add
wb.Quit
and everything will be fine.
This is the code that works for me:
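Function GetTitleFromURL(sURL As String) As String
    Dim wb As Object
    Set wb = CreateObject("InternetExplorer.Application")
    wb.Navigate sURL
    ' Wait until the page has finished loading (READYSTATE_COMPLETE = 4)
    Do While wb.Busy Or wb.readyState <> 4
        DoEvents
    Loop
    GetTitleFromURL = wb.Document.Title
    ' Close the IE process so instances do not pile up
    wb.Quit
    Set wb = Nothing
End Function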
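To get the titles into another column, as the question asks, the function can be called in a loop over the URL cells. This is only a sketch: the name FillTitles is made up, and it assumes the URLs sit in column A of the active sheet starting at row 1, with GetTitleFromURL defined as above.

```vba
' Hypothetical driver routine: reads URLs from column A and writes
' the page titles to column B of the active sheet.
' Assumes GetTitleFromURL (shown above) lives in the same project.
Sub FillTitles()
    Dim ws As Worksheet
    Dim r As Long, lastRow As Long

    Set ws = ActiveSheet
    ' Last non-empty row in column A
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 1 To lastRow
        ' Skip empty cells so IE is not started for nothing
        If Len(ws.Cells(r, "A").Value) > 0 Then
            ws.Cells(r, "B").Value = GetTitleFromURL(CStr(ws.Cells(r, "A").Value))
        End If
    Next r
End Sub
```

Each call starts and quits its own IE instance, so this will be slow for long lists, but it should not leave processes behind.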

I am not sure what you mean by title, but here is an idea:
Dim wb As Object
Dim doc As Object
Dim sURL As String
Set wb = CreateObject("InternetExplorer.Application")
sURL = "http://lessthandot.com"
wb.Navigate sURL
While wb.Busy
DoEvents
Wend
''HTML Document
Set doc = wb.document
''Title
Debug.Print doc.Title
Set wb = Nothing

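If you use Selenium (this assumes the SeleniumBasic wrapper, with Chrome and a matching chromedriver installed):
Sub Get_Title()
    Dim driver As New WebDriver
    driver.Start "chrome"
    ' Load the page first; the title is empty until a page is open
    driver.Get "http://url1.com/somepage.html"
    Debug.Print driver.Title
    driver.Quit
End Sub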

Related

VBA web scraping issue - how to navigate specific web using html structure (href / child/ )

Hello dear VBA colleagues :)
Sub login()
'test
Const URL$ = "https://kwm.kromi.de/cgi-bin/kwm?HTML=frontend/login.htm"
Dim UserName As String, Password As String, LoginData As Worksheet
Set LoginData = ThisWorkbook.Worksheets("Sheet1")
UserName = LoginData.Cells(1, "B").Value
Password = LoginData.Cells(2, "B").Value
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = False
.Navigate URL
ieBusy IE
.Visible = True
Dim oLogin As Object, oPassword As Object
Set oLogin = .document.getElementsByName("VS_LOGIN")(0)
Set oPassword = .document.getElementsByName("VS_PASSWORD")(0)
oLogin.Value = UserName
oPassword.Value = Password
.document.forms(0).submit
ieBusy IE
Stop
'.document.getElementsByTagName("a")(2).href
'.document.getElementsByClassName("link3").Click
.Navigate2 ""
ieBusy IE
Stop
End With
'''
End Sub
Sub ieBusy(IE As Object)
Do While IE.Busy Or IE.readyState < 4
DoEvents
Loop
End Sub
The first part works: the macro logs in to the website. Now I need to go deeper and click something, but the structure of the page is too much for me. I have looked at examples online, but nothing works. The structure of the page is shown below. I need to click the "statystyka" button.
/html/body/div[1]/div[1]/a[2] - XPath address
[link picture] https://ibb.co/2Pgx2tn
Could you give me some help, please? :)
Edit:
I tried something like this:
'.document.getElementsByTagName("a")(2).href, but this is not the right approach.
You need to move into the appropriate frame. Here I .Navigate to the frame's src and add a wait; then you can target the link by a substring of its onclick attribute:
ie.navigate ie.document.querySelector("[name=Navigator]").src
ieBusy ie
ie.document.querySelector("[onclick*=statistic]").click
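If you want to navigate by tag, you need to get the frame's document first, as below (requires a reference to the Microsoft HTML Object Library):
Dim doc As HTMLDocument
Dim doc2 As HTMLDocument
' <a> elements are HTMLAnchorElement; HTMLLinkElement is for <link> tags
Dim lnk As HTMLAnchorElement
Set doc = IE.document
Set doc2 = doc.frames("Navigator").document
' Index is zero-based, so (1) is the second link, matching a[2] in the XPath
Set lnk = doc2.getElementsByTagName("A")(1)
lnk.Click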
@QHarr @Raymond Wu :) Thank you for trying to help; maybe this will be a solution for others.

How to fetch iframe data using Excel VBA

I am using the code below in Excel VBA for IE navigation. I get the following error while fetching data from an iframe.
Error detail:
Object does not support this property or method
Option Explicit
Public Sub Cgg_Click()
Dim Ie As New InternetExplorer
Dim WebURL
Dim Docx As HTMLDocument
Dim productDesc
Dim productTitle
Dim price
Dim RcdNum
Dim N As Long ' loop counter; must be declared under Option Explicit
Ie.Visible = True
WebURL = "https://www.google.com/maps/place/parlour+beauty+parlour+beauty/@40.7314166,-74.13182,11z/data=!4m8!1m2!2m1!1sParlour+NY!3m4!1s0x89c2599bd4c1d2e7:0x20873676f6334189!8m2!3d40.7314166!4d-73.9917443"
Ie.Navigate2 WebURL
Do Until Ie.readyState = READYSTATE_COMPLETE
DoEvents
Loop
Application.Wait (Now + TimeValue("00:00:25"))
For N = 0 To Ie.document.getElementsByClassName("section-subheader-header GLOBAL__gm2-subtitle-alt-1").Length - 1
If Ie.document.getElementsByClassName("section-subheader-header GLOBAL__gm2-subtitle-alt-1").Item(N).innerText = "Web results" Then
Ie.document.getElementsByClassName("section-subheader-header GLOBAL__gm2-subtitle-alt-1").Item(N).ScrollIntoView (False)
End If
Next N
Application.Wait (Now + TimeValue("00:00:25"))
Set Docx = Ie.document
productDesc = Docx.Window.frames("section-iframe-iframe").contentWindow.document.getElementsByClassName("trex")(0).outerHTML
End Sub
Here is the HTML:
Please help me resolve this error.
I want to extract the HTML of the element with class name "trex" from the URL above.
Thanks.
You can change the line that extracts the "trex" element to either of the following; both should work:
Use the getElementsByTagName method to get the iframe first, then use its contentDocument property to reach the element by class name:
productDesc = Docx.getElementsByTagName("iframe")(0).contentDocument.getElementsByClassName("trex")(0).outerHTML
Use the querySelector method to get the iframe by its class, then reach the element the same way:
productDesc = Docx.querySelector(".section-iframe-iframe").contentDocument.getElementsByClassName("trex")(0).outerHTML

Getting object error when getting getElementById value on webpage

I am automating web scraping and I'm getting error 438, "Object doesn't support this property or method", in VBA when I get to the last line of code. If I run it in the Internet Explorer console, I get the value, but in VBA I get the error. Any help?
Dim shellWins As ShellWindows
Dim IE As Object
Set shellWins = New ShellWindows
If shellWins.Count > 0 Then
' Get IE
Set IE = shellWins.Item(0)
Else
' Create IE
Exit Sub
End If
IE.Navigate "https://mywebpage.com"
While IE.Busy
DoEvents
Wend
Do Until IE.ReadyState = 4
DoEvents
Loop
Dim rtn As String
rtn = IE.getElementById("myID").getAttribute("value") ' << I get the error here
I saw your question in this comment,
is there a way to set an object to an already existing instance of IE ?
so here's what you can do to grab an already-existing IE browser:
Dim IE As Object
Set IE = GetIE("https://mywebpage.com")
Function GetIE(sLocation As String) As Object
Dim objShell As Object, objShellWindows As Object, o As Object
Dim sURL As String
Dim RetVal As Object
Set objShell = CreateObject("shell.application")
Set objShellWindows = objShell.Windows
For Each o In objShellWindows
sURL = ""
On Error Resume Next
sURL = o.document.Location
On Error GoTo 0
If sURL Like "*" & sLocation & "*" Then
Set RetVal = o
Exit For
End If
Next o
Set GetIE = RetVal
End Function
Now to the actual question. It's impossible to help you precisely without seeing HTML code or the website itself.
However, you can try either of these properties to see if they'd work (Try in the order listed here):
Dim rtn As String
rtn = IE.Document.getElementById("myID").Value
Dim rtn As String
rtn = IE.Document.getElementById("myID").innerText
Dim rtn As String
rtn = IE.Document.getElementById("myID").outerText
getElementById is a method of the .document (HTMLDocument) object, not of the IE object:
IE.document.getElementById("myID")
To create an IE instance:
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
Going to agree with some of the above answers. Get off IE unless it's a constraint.
You're not creating an IE object at all. So you're trying to use a non-existent object.
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
First of all: don't use IE. Firefox and Chrome are much more viable and are actually HTML conformant.
In case this doesn't fix it: if your script runs on startup and is declared before the HTML, it will try to search the current buffer and won't find the object, since it hasn't been loaded yet. Try placing the script after the HTML declaration.
I'm sorry if I misunderstood the Code above since I don't actually speak VB ^^'

VBA: handling data in Document Object Model

I am currently trying to scrape data from a website using VBA. I am following this tutorial, and hence my code is the following:
Sub Foo()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "https://www.ishares.com/it/investitore-privato/it/prodotti/251843/ishares-euro-high-yield-corporate-bond-ucits-etf"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Loop
Set allRowOfData = appIE.document.getElementsByClassName("visible-data totalNetAssets")
Dim myValue As String
myValue = allRowOfData.Cells(1).innerHTML
MsgBox myValue
End Sub
Unfortunately, the data I want to scrape differ from those used in the example. This line
myValue = allRowOfData.Cells(1).innerHTML
fails according to the VBA debugger.
Could anyone explain why it doesn't work, and how I should pick the right method for scraping HTML pages?
Try the change below, which should solve your issue. In brief, getElementsByClassName returns a collection, so you need to index into allRowOfData before accessing a row's cells:
myValue = allRowOfData(0).Cells(1).innerHTML

How to get META keywords content with VBA from source code in an EXCEL file

I have to download the source code of several hundred websites to an Excel file (for example, to Cells(1, 1) in Worksheets(1)) and then extract the content of the META keywords tag into, say, Cells(1, 2).
For downloading I use the following code in VBA:
Dim htm As Object
Set htm = CreateObject("HTMLfile")
URL = "https://www.insolvenzbekanntmachungen.de/cgi-bin/bl_aufruf.pl?PHPSESSID=8ecbeb942c887974468b9010531fc7ab&datei=gerichte/nw/agkoeln/16/0071_IN00181_16/2016_06_10__11_53_26_Anordnung_Sicherungsmassnahmen.htm"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
htm.body.innerHTML = .responseText
Cells(1, 1) = .responseText
End With
I've found the following code on this website but, unfortunately, I'm unable to adapt it to solve my problem:
Sub GetData()
Dim ie As New InternetExplorer
Dim str As String
Dim wk As Worksheet
Dim webpage As New HTMLDocument
Dim item As HTMLHtmlElement
Set wk = Worksheets(1)
str = "https://www.insolvenzbekanntmachungen.de/cgi-bin/bl_aufruf.pl?PHPSESSID=8ecbeb942c887974468b9010531fc7ab&datei=gerichte/nw/agkoeln/16/0071_IN00181_16/2016_06_10__11_53_26_Anordnung_Sicherungsmassnahmen.htm"
ie.Visible = True
ie.navigate str
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
'Find the proper meta element --------------
Const META_TAG As String = "META"
Const META_NAME As String = "keywords"
Dim Doc As HTMLDocument
Dim metaElements As Object
Dim element As Object
Dim kwd As String
Set Doc = ie.Document
Set metaElements = Doc.all.tags(META_TAG)
For Each element In metaElements
If element.Name = META_NAME Then
kwd = element.Content
End If
Next
MsgBox kwd
End Sub
I think I have to modify this line, but don't know how:
Set Doc = ie.Document
Can you please help me out?
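One possible way to get the keywords straight from the downloaded source, without driving Internet Explorer at all, is to run a regular expression over the .responseText string. This is only a sketch under assumptions: GetMetaKeywords is a made-up name, it uses the late-bound VBScript.RegExp object, and it assumes the attribute values are quoted and contain no quote characters or ">" themselves.

```vba
Option Explicit

' Hypothetical helper: returns the content attribute of the first
' <meta name="keywords" ...> tag in an HTML source string,
' or "" if no such tag (or no content attribute) is found.
Function GetMetaKeywords(ByVal html As String) As String
    Dim reTag As Object, reName As Object, reContent As Object
    Dim m As Object, c As Object

    ' Matches every <meta ...> tag
    Set reTag = CreateObject("VBScript.RegExp")
    reTag.Global = True
    reTag.IgnoreCase = True
    reTag.Pattern = "<meta\b[^>]*>"

    ' Matches name="keywords" (single or double quotes, any case)
    Set reName = CreateObject("VBScript.RegExp")
    reName.IgnoreCase = True
    reName.Pattern = "name\s*=\s*[""']keywords[""']"

    ' Captures the value of content="..."
    Set reContent = CreateObject("VBScript.RegExp")
    reContent.IgnoreCase = True
    reContent.Pattern = "content\s*=\s*[""']([^""']*)[""']"

    For Each m In reTag.Execute(html)
        If reName.Test(m.Value) Then
            Set c = reContent.Execute(m.Value)
            If c.Count > 0 Then GetMetaKeywords = c(0).SubMatches(0)
            Exit Function
        End If
    Next m
End Function
```

Inside the With CreateObject("MSXML2.XMLHTTP") block from the question, it could be called as Cells(1, 2) = GetMetaKeywords(.responseText). Passing .responseText directly, rather than reading it back from Cells(1, 1), avoids Excel's 32,767-character limit per cell cutting the source short.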
Embed a WebBrowser control into an Excel spreadsheet or UserForm
How to add a WebBrowser to Excel
Set up references to the HTML Object Library
How to add VBA References – Internet Controls, HTML Object Library
Grab Greg Truby's code from this post: WebBrowser Control
You'll have access to the Document Object Model (DOM). This will expose most of the HTML elements' properties and events.
Option Explicit
Private WithEvents htmDocument As HTMLDocument
Private WithEvents MyButton As HTMLButtonElement

Private Function MyButton_onclick() As Boolean
    MsgBox "Somebody clicked MyButton on WebBrowser1"
End Function

Private Sub WebBrowser1_NavigateComplete2(ByVal pDisp As Object, URL As Variant)
    ' getElementsByTagName returns an HTML element collection, not Excel Hyperlinks
    Dim aTags As Object
    Do Until WebBrowser1.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop
    ' The document must be assigned before it is queried
    Set htmDocument = WebBrowser1.Document
    Set MyButton = htmDocument.getElementById("MyButtonID")
    Set aTags = htmDocument.getElementsByTagName("a")
End Sub
Google Web Api, HTA, MDN (https://developer.mozilla.org/en-US/docs/Web/API), and if you get stuck, try to refactor JavaScript code to VBScript. It's