Parse Saved HTML File VBA - html

I have a HTML file that is saved locally on the desktop which contains a table of statistics from which I need to pull specific data, paste it into a excel workbook table and then email it.
I've got the rest of the process working, I'm just struggling to figure out how to parse the html file and all other examples I've seen are parsing a website rather than a locally saved html file.
Apologies if this is a bit of beginner question but I'm finding it hard to make sense of the other examples I've seen.
thank you for any assistance.

Thank you to everyone for your examples and pointing me in the right direction ! The example posted below copies the data from a HTML file stored on the users desktop and pastes it into a new worksheet in Excel.
Option Explicit
Sub ParseHTML()
Dim URL As String
Dim IE As InternetExplorer
Dim htmldoc As MSHTML.IHTMLDocument 'Document object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim htmlTables As MSHTML.IHTMLElementCollection 'Element collection for table tags
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Dim wksOut As Worksheet
Dim rngOut As Range
Dim intTableIndex As Integer
Dim intRowIndex As Integer
Dim intColIndex As Integer
URL = Environ("userProfile") & "\desktop\FileName.HTML"
'Open InternetExplorer.
Set IE = New InternetExplorer
'Navigate to URL.
With IE
.navigate URL
.Visible = False
'Extract html information to objects.
Set htmldoc = IE.document
Set htmlTables = htmldoc.getElementsByTagName("table")
Set eleColtr = htmlTables(intTableIndex).getElementsByTagName("tr")
'Extract table to a new blank worksheet.
On Error Resume Next
Set wksOut = ThisWorkbook.Worksheets("WorksheetName")
If Err.Number <> 0 Then
Set wksOut = ThisWorkbook.Worksheets.Add(After:=Worksheets(Worksheets.Count))
wksOut.Name = "WorksheetName"
End If
With wksOut
.Cells.Clear
.Cells.NumberFormat = "General"
.Cells.ColumnWidth = 2
End With
On Error GoTo 0
'This section populates Excel
intRowIndex = 0
For Each eleRow In eleColtr
Set eleColtd = htmlTables(intTableIndex).getElementsByTagName("tr")(intRowIndex).getElementsByTagName("td") 'get all the td elements in that specific tr
Set rngOut = wksOut.Range("A1000000").End(xlUp).Offset(1, 0)
intColIndex = 0
For Each eleCol In eleColtd
rngOut.Offset(0, intColIndex) = eleCol.innerText
intColIndex = intColIndex + 1
Next eleCol
intRowIndex = intRowIndex + 1
Next eleRow
wksOut.Cells.EntireColumn.AutoFit
'Cleanup
IE.Quit
Set IE = Nothing
Set htmldoc = Nothing
Set htmlTables = Nothing
Set eleColtr = Nothing
Set eleColtd = Nothing
Set wksOut = Nothing
Set rngOut = Nothing
End With
End Sub
Please note that excel may throw a Runtime Error Automation Error on line:
Set IE = New InternetExplorer
If this happens try setting InternetExplorer integrity to Medium:
Set IE = New InternetExplorerMedium
If you need more information regarding InternetExplorer Integrity please see
https://blogs.msdn.microsoft.com/ieinternals/2011/08/03/default-integrity-level-and-automation/
As Tim mentioned I could open the file in excel and copy and paste the values which runs a lot faster:
Sub CopyHTML()
dim Wb as Workbook
dim Ws as Worksheet
Set Wb = ActiveWorkbook
Set Ws = Wb.Sheets("Sheet1")
'Opens html file and copies range
Workbooks.Open (Environ("userProfile") & "\desktop\FileName.html")
Range("A1:AJ21").Select
Selection.Copy
'pastes range in cell B5 on active workbook
Wb.Activate
Range("B5").Select
Ws.Paste
Application.CutCopyMode = False
Workbooks("FileName.html").Close
Thanks for the advice Tim !

Related

Accessing a website's table with a WinHTTPRequest in Excel VBA

I have written code that grabs a table from a website and pulls out each cell from that table and then drops them into an excel spreadsheet. The code works flawlessly when the website loads correctly.
The issue is the website does not play nice with internet explorer, therefore the code only executes successfully about half of the time. I could write a routine that checks to see if the website loaded successfully and repeat if it did not, However I want to see if I can get it to work with a WinHTTPRequest.
The lines below are how I access the table using internet explorer based webscraping, with the last line being how i load the table into a variable.
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://weather.com/weather/tenday/l/12345:4:US"
IE.Visible = True
Application.Wait (Now() + TimeValue("00:02:00"))
Set doc = IE.document
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
I can load the website in question via WinHTTPRequest using the code below.
Set doc = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "https://weather.com/weather/tenday/l/12345:4:US", False
.send
doc.body.innerHTML = .responseText
End With
However when I try and grab the table using the line below i get "Run-time error '438': Object doesn't support this property or method.
Set WeatherTable = doc.getElementByclassname("twc-table")(0)
Basically i need the equivalent of this line for WinHTTP webscraping.
I have looked at descending down through the html document(doc.body.all.item(1), etc) but I don't get very far before I run into errors. I have also looked at the Selenium addon, but I don't remember being able to download and install it successfully, and I am not sure if it is even still maintained for current versions of chrome / firefox.
Here is the full code that allows me to get the table via internet explorer webscraping and then drop it onto an excel spreadsheet.
Any help is appreciated.
Sub GetTable2()
Dim IE As Object
Dim doc As HTMLDocument
Dim WeatherTable As HTMLTable
Dim WeatherTableRows As HTMLTableRow
Dim HTMLTableCell As HTMLTableCell
Dim HeaderRow As Boolean
Dim RowCount As Long
Dim ColumnCount As Long
Dim i As Long
RowCount = 1
ColumnCount = 1
HeaderRow = True
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://weather.com/weather/tenday/l/12345:4:US"
IE.Visible = True
'Application.Wait (Now() + TimeValue("00:02:00"))
Set doc = IE.document
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
For Each WeatherTableRows In WeatherTable.Rows
i = 1
For Each HTMLTableCell In WeatherTableRows.Cells
If HeaderRow = True Then
ThisWorkbook.Sheets("Sheet5").Cells(RowCount, ColumnCount).Value = HTMLTableCell.innerText
ColumnCount = ColumnCount + 1
Else
If i = 1 Then
i = i + 1
Else
ThisWorkbook.Sheets("Sheet5").Cells(RowCount, ColumnCount).Value = HTMLTableCell.innerText
ColumnCount = ColumnCount + 1
End If
End If
Next HTMLTableCell
HeaderRow = False
ColumnCount = 1
RowCount = RowCount + 1
Next WeatherTableRows
IE.Quit
Set IE = Nothing
Set doc = Nothing
End Sub
You missed an s. It is plural as you are getting a collection of elements by className.
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
To make your approach slightly cleaner, you can try this way as well.
Sub FetchTabularData()
Dim elem As Object, trow As Object, S$, R&, C&
[B1:G1] = [{"Day","Description","High/Low","Precip","Wind","Humidity"}]
With New WinHttp.WinHttpRequest
.Open "GET", "https://weather.com/weather/tenday/l/12345:4:US", False
.send
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
For Each elem In .querySelector(".twc-table").getElementsByTagName("tr")
For Each trow In elem.getElementsByTagName("td")
C = C + 1: Cells(R + 1, C) = trow.innerText
Next trow
C = 0: R = R + 1
Next elem
End With
End Sub
Reference to add:
Microsoft HTML Object Library
Microsoft WinHTTP Services, version 5.1

Automatically export HTML Table from Outlook to Excel w/ VBA

I'd like to export an email that contains many tables in HTML format.
Each table is something like this:
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="100%" style="width:100.0%;background:green">...</table>
I've added a New Rule in Outlook, so everytime I receive an email with 'specific word' in the Subject, the macro runs and saves all the tables from this email to a .xlsm file. The rule itself seems to work fine, but i'm having issues to make the macro work.
I've found many topics about exporting data from Outlook to Excel and I managed to copy email's TextBody using split (in rows), but it only worked with text, not with tables.
So I started searching the web for topics about exporting Tables, and I did find one. Although, it talks about importing Tables from Outlook using Excel VBA, not exactly what i'm trying to do. I tried to edit this code in order to work when running from Outlook, but it didn't work.
References:
Here's the code:
Option Explicit
Public Sub SalvaExcel()
'This macro writes an Outlook email's body to an Excel workbook
Dim olApp As Outlook.Application
Dim olMail As Outlook.MailItem
Dim olFolder As Outlook.Folder
Dim olItems As Outlook.Items
Dim olNameSpace As Outlook.NameSpace
Dim olHTML As MSHTML.HTMLDocument: Set olHTML = New MSHTML.HTMLDocument
Dim olEleColl As MSHTML.IHTMLElementCollection
Dim xlApp As Excel.Application
Dim ExcelWkBk As Excel.Workbook
Dim FileName As String
'Dim TextBody As String
'Dim iArr() As String
Dim eRow As Integer
Dim xlUp As Integer
Dim i As Long
Dim j As Long
xlUp = -4162
'set email to be saved
Set olApp = Outlook.Application
Set olNameSpace = Application.GetNamespace("MAPI")
Set olFolder = olNameSpace.GetDefaultFolder(olFolderInbox)
Set olItems = olFolder.Items
'olItems.Sort ("[ReceivedTime]")
Set olMail = olItems(olItems.Count)
'save Outlook email's html body (tables)
With olHTML
.Body.innerHTML = olMail.HtmlBody
Set olEleColl = .getElementsByTagName("table")
End With
'set excel file to be opened
FileName = "C:\Users\rafael.kobayashi\Desktop\projeto_licitacoes\Palavras-Chave.xlsm"
'create an Excel instance
Set xlApp = Application.CreateObject("Excel.Application")
'in this instance
With xlApp
.Visible = True 'this slows down the macro, but helps during debugging
.ScreenUpdating = False 'reduces flash and increases speed
'open workbook
Set ExcelWkBk = xlApp.Workbooks.Open(FileName)
'in this workbook
With ExcelWkBk
'in [email] worksheet
With .Worksheets("email")
'find first empty row
'eRow = .Range("B" & .Rows.Count).End(xlUp).Row + 1
'write table in excel
Debug.Print olEleColl(0)
For i = 0 To olEleColl(0).Rows.Length - 1
For j = 0 To olEleColl(0).Rows(i).Cells.Length - 1
.Range("A1").Offset(i, j).Value = olEleColl(0).Rows(i).Cells(j).innerText
Next j
Next i
'resize columns (DO NOT)
'.Columns("B:C").AutoFit
End With
'close Workbook and save changes
.Close SaveChanges:=True
End With
'quit excel
.Quit
End With
Set xlApp = Nothing
Set ExcelWkBk = Nothing
Set olMail = Nothing
Set olHTML = Nothing
Set olEleColl = Nothing
End Sub
EDIT: There was a typo in the code, now it seems to be running, I can see that Excel opens then closes very quickly when I run the macro. However, when I open the workbook, the sheet where the tables were supposed to be is blank :(
EDIT2: I have tested the macro in an mail item where i inserted a random table and it worked, but it won't work with the tables in the mail that i showed.
EDIT3: I've found out that it wasn't working because the first table found didn't have any text in innerText, so I tested a macro that gets all the tables and it worked!
Change that line to this instead
For i = 0 To olEleColl(0).Rows.Length - 1
(You spelled Length wrong)
I've found out that it wasn't working because the first table found didn't have any text in innerText, so I tested a macro that gets all the tables and it worked!
Here's the code:
Public Sub SalvaExcel(item As Outlook.MailItem)
'This macro writes an Outlook email's tables to an Excel workbook
Dim olApp As Outlook.Application
Dim olMail As Outlook.MailItem
Dim olFolder As Outlook.Folder
Dim olItems As Outlook.Items
Dim olNameSpace As Outlook.NameSpace
Dim olHTML As MSHTML.HTMLDocument: Set olHTML = New MSHTML.HTMLDocument
Dim olEleColl As MSHTML.IHTMLElementCollection
Dim xlApp As Excel.Application
Dim ExcelWkBk As Excel.Workbook
Dim FileName As String
Dim eRow As Long
Dim i As Long
Dim j As Long
Dim t
Dim posicao As String
'set email to be saved
'Set olApp = Outlook.Application
'Set olNameSpace = Application.GetNamespace("MAPI")
'Set olFolder = olNameSpace.GetDefaultFolder(olFolderInbox)
'Set olItems = olFolder.Items
'olItems.Sort ("[ReceivedTime]")
'the most recent one
'Set olMail = olItems(olItems.Count)
'save Outlook email's html body (tables)
With olHTML
.Body.innerHTML = item.HtmlBody
Set olEleColl = .getElementsByTagName("table")
End With
'set excel file to be opened
FileName = "C:\Users\rafael.kobayashi\Desktop\projeto_licitacoes\Palavras-Chave.xlsm"
'create an Excel instance
Set xlApp = Application.CreateObject("Excel.Application")
'in this instance
With xlApp
.Visible = True 'if True, this slows down the macro, but helps during debugging
.ScreenUpdating = False 'if False, this reduces flash and increases speed
'open workbook
Set ExcelWkBk = xlApp.Workbooks.Open(FileName)
'in this workbook
With ExcelWkBk
'in [email] worksheet
With .Worksheets("email")
'which row to start
eRow = 1
posicao = "A" & eRow
'write each table in excel
For Each t In olEleColl
For i = 0 To t.Rows.Length - 1
For j = 0 To t.Rows(i).Cells.Length - 1
'ignore any problems with merged cells etc
On Error Resume Next
.Range(posicao).Offset(i, j).Value = t.Rows(i).Cells(j).innerText
On Error GoTo 0
Next j
Next i
'define from which row the next table will be written
eRow = eRow + t.Rows.Length + 1
posicao = "A" & eRow
Next t
End With
'close Workbook and save changes
.Close SaveChanges:=True
End With
'quit excel
.Quit
End With
Set xlApp = Nothing
Set ExcelWkBk = Nothing
'Set olMail = Nothing
Set olHTML = Nothing
Set olEleColl = Nothing
End Sub
It exports all the tables from the last received email in the Outlook Inbox to an Excel file. It skips 1 row between one table and the next. Since it gets the most recent email and it runs from Outlook, it's useful to use in a New Rule, so it will be automatic, according to a defined criteria. I hope it helps other people!
edit: in order to run this macro in an Outlook Rule, it's necessary to give the following argument to the Sub, otherwise the macro won't be shown in the list of macros to be chosen for the Rule:
Public Sub SalvaExcel(item As Outlook.MailItem)
I have updated the code in this answer.
Thanks for sharing the code.
Have rectified your code to make it finally work ;)
Public Sub SalvaExcel()
'Public Sub SalvaExcel(item As Outlook.MailItem)
'This macro writes an Outlook email's tables to an Excel workbook
Dim olApp As Outlook.Application
Dim olMail As Outlook.MailItem
Dim olFoldersDefault As Outlook.Folders
Dim olFolder As Outlook.Folder
Dim olItems As Outlook.Items
Dim olNameSpace As Outlook.NameSpace
Dim olHTML As MSHTML.HTMLDocument: Set olHTML = New MSHTML.HTMLDocument
Dim olEleColl As MSHTML.IHTMLElementCollection
Dim xlApp As Excel.Application
Dim ExcelWkBk As Excel.Workbook
Dim FileName As String
Dim eRow As Long
Dim i As Long
Dim j As Long
Dim t
Dim posicao As String
'set email to be saved
'Set olApp = Outlook.Application
'Set olNameSpace = Application.GetNamespace("MAPI")
'Set olFolder = olNameSpace.GetDefaultFolder(olFolderInbox)
'Set olItems = olFolder.Items
'olItems.Sort ("[ReceivedTime]")
'Set olApp = Outlook.Application
Set olNameSpace = Application.GetNamespace("MAPI")
Set newFolder = olNameSpace.GetDefaultFolder(olFolderInbox)
Set olFolder = newFolder.Folders("Projects").Folders("Management").Folders("Notifications")
Set olItems = olFolder.Items
olItems.Sort ("[ReceivedTime]")
'the most recent one
Set olMail = olItems(olItems.Count)
'MsgBox olMail
'MsgBox olMail.HTMLBody
'save Outlook email's html body (tables)
With olHTML
.Body.innerHTML = olMail.HTMLBody
Set olEleColl = .getElementsByTagName("table")
End With
'set excel file to be opened
FileName = "D:\OutlookEmails.xlsm"
'create an Excel instance
Set xlApp = Application.CreateObject("Excel.Application")
'in this instance
With xlApp
.Visible = True 'if True, this slows down the macro, but helps during debugging
.ScreenUpdating = False 'if False, this reduces flash and increases speed
'open workbook
Set ExcelWkBk = xlApp.Workbooks.Open(FileName)
'in this workbook
With ExcelWkBk
'in [email] worksheet
With .Worksheets("emails")
'which row to start
eRow = 1
posicao = "A" & eRow
'write each table in excel
For Each t In olEleColl
For i = 0 To t.Rows.Length - 1
For j = 0 To t.Rows(i).Cells.Length - 1
'ignore any problems with merged cells etc
On Error Resume Next
.Range(posicao).Offset(i, j).Value = t.Rows(i).Cells(j).innerText
On Error GoTo 0
Next j
Next i
'define from which row the next table will be written
eRow = eRow + t.Rows.Length + 1
posicao = "A" & eRow
Next t
End With
'close Workbook and save changes
.Close SaveChanges:=True
End With
'quit excel
.Quit
End With
Set xlApp = Nothing
Set ExcelWkBk = Nothing
'Set olMail = Nothing
Set olHTML = Nothing
Set olEleColl = Nothing
End Sub

Unable to click at hyperlink on webpage with anchor tag

After testing of different logic's, finally I stuck in Visual Basic for Applications to find out the right way to trigger the below attribute:
I want to click on hyperlink which does not remain same, it shows different numbers with hyperlink on every next attempt and below is my VBA code:
Dim MyBrowser As InternetExplore
Dim MyHTML_Element As IHTMLElement
Dim myURL As String
Dim htmlInput As HTMLInputElement
Dim htmlColl As IHTMLElementCollection
Dim p As String
Dim link As Object
Dim I As Integer
Dim ie As SHDocVw.InternetExplorer
Dim doc As MSHTML.HTMLDocument
myURL = "url............."
Set MyBrowser = New InternetExplorer
MyBrowser.Silent = True
MyBrowser.navigate myURL
MyBrowser.Visible = True
Do
Loop Until MyBrowser.readyState = READYSTATE_COMPLETE
Set HTMLDoc = MyBrowser.Document
If htmldoc.all.item(i).innerText = Range("K20").Value Then ' Range is equal to cell value "4000123486736"
htmldoc.all.item(i).Click <------- not woking both lines
Please also see inspects on IE appended below:
Of course this cannot work
If htmldoc.all.item(i).innerText = Range("K20").Value Then ' Range is equalto cell value "4000123486736"
htmldoc.all.item(i).Click <------- not woking both lines
because there is no loop that defines i.
I suggest to loop through all link tags <a> only:
Dim LinkItem As Variant
For Each LinkItem In HTMLDoc.getElementsByTagName("a")
If LinkItem.innerText = Range("K20").Value Then
LinkItem.Click
Exit For 'stop looping when link was found
End If
Next LinkItem

Getting information from HTML page via VBA

From VBA, I am trying to access to the "username" cell from a web page so that I could type in the appropriate username.
The problem is that in the HTML code from the page we have more than one element with the same name which is "LOGON_USERID" and I can't figure out how to access to the right one.
As you can see on the image "part of the HTML code", the line I'm trying to access to is the highlighted one, but there are also 2 other elements which have the same name above it.
part of the HTML code
I tried lots of different ways (using different methods or variable types etc), but since I'm not familiar with HTML I can't manage to get what I want.
Sub Pum()
Dim ie As New InternetExplorer
'Dim IEDoc As IHTMLElementCollection
Dim IEDoc As HTMLDocument
Dim name As Object
Dim nameList As HTMLInputElement
Dim WRONGS As DispHTMLElementCollection
Dim Elems As HTMLElementCollection
Dim i As Integer
ie.navigate "thewebsiteinquestion"
ie.Visible = False
WaitIE ie
Set IEDoc = ie.document
'MsgBox IEDoc.DocumentElement.
'Elems = IEDoc.getElementsByTagName("INPUT")
MsgBox TypeName(IEDoc.getElementById("LOGON_USERID").all)
Set Elems = IEDoc.getElementById("LOGON_USERID")
'For i = 0 To 5
MsgBox Elems.Length
'Next i
For Each name In Elems.Children
MsgBox name.nodeName
MsgBox name.Attributes
MsgBox name.all
Next
'If ((NameStr Isnot Nothing And (NameStr.Length <> 0)) Then
'If NameStr = "LOGON_USERID" Then
'If TypeName(IEDoc.all("LOGON_USERID")) = "HTMLInputElement" Then
'MsgBox TypeName(IEDoc.all("LOGON_USERID"))
'Set names = IEDoc.all.Item("text")
'TypeName (InputUsernameTextzone)
'Dim Question As IHTMLElement
'Question = InputUsernameTextzone.parentElement
'MsgBox TypeName(InputUsernameTextzone.parentElement.getAttribute("name"))
'InputUsernameTextzone.parentElement
'CELLULE.value = "qtc2464"
WaitIE ie
Set ie = Nothing
Set IEDoc = Nothing
End Sub
I tried two other similar codes using different methods but I still have no results. Hopefully you can help me.
If you need more information, let me know.
The other two input elements are of different type (they are hidden) so you could use querySelector with attribute type=text to find your desired element.
Dim userid As HTMLInputElement
Set userid = IEDoc.querySelector("input[name='LOGON_USERID'][type='text']")
If Not userid Is Nothing Then
' Continue with user id element
Else
MsgBox "LOGON_USERID not found on the page"
End If
I am a newbie at this but if this could help anyone, here's the simplified version of the macro I made :
Sub Access_Puma()
Dim ie As New InternetExplorer
Dim IEDoc As HTMLDocument
Dim userid As HTMLInputElement
Dim userpwd As HTMLInputElement
ie.navigate "thewebsitetoaccess"
ie.Visible = True
WaitIE ie
Set IEDoc = ie.document
Set userid = IEDoc.querySelector("input[name='LOGON_USERID'][type='text']")
If Not userid Is Nothing Then
userid.value = "myusername"
Else
MsgBox "LOGON_USERID not found on the page"
End If
Set userpwd = IEDoc.querySelector("input[name='LOGON_PASSWD'][type='password']")
If Not userpwd Is Nothing Then
userpwd.value = "mypassword"
Else
MsgBox "LOGON_PASSWD not found on the page"
End If
End Sub

Copy data from HTML

I am trying to learn how to parse data from HTML using Excel VBA. So I found one example online which works fine but when I change URL address from www.yahoo.com to local HTML file on C it gives me error i.e. Method 'busy' of object 'IwebBrowser2' failed. Code is:
Sub GetBodyText()
Dim URL As String
Dim Data As String
URL = "file:///C:/test.html"
Dim ie As Object
Dim ieDoc As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.navigate URL
Do Until (ie.readyState = 4 And Not ie.Busy)
DoEvents
Loop
Set ieDoc = ie.Document
Data = ieDoc.body.innerText
'Split Data into separate lines
'or just use Range("A1")=data
Dim myarray As Variant
myarray = Split(Data, vbCrLf)
For i = 0 To UBound(myarray)
'Start writing in cell A1
Cells(i + 1, 1) = myarray(i)
Next
ie.Quit
Set ie = Nothing
Set ieDoc = Nothing
End Sub
For IE, just use:
URL = "c:\test.html"