I am trying to automate IE through VBA to click a button that is embedded in an "iframe", as shown in the screenshot below:
However my code is unsuccessful in referencing the button (the last line doesn't print any value):
Sub intercept()
Dim my_title As String
my_title = "Accounting"
marker = 0
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
For x = 0 To (IE_count - 1)
On Error Resume Next ' sometimes more web pages are counted than are open
page_url = objShell.Windows(x).Document.Location
page_title = objShell.Windows(x).Document.Title
If page_title Like my_title & "*" Then 'compare to find if the desired web page is already open
Set IE = objShell.Windows(x)
marker = 1
Exit For
Else
End If
Next
If marker = 0 Then
MsgBox ("A matching webpage was NOT found")
Else
MsgBox ("A matching webpage was found")
End If
Set mytable = IE.Document.frames(1).Document.tables(1)
Debug.Print mytable.getElementsByTagName("td").Length
End Sub
I have also tried: IE.Document.frames(0).Document.tables(0).
Any help on how to click on this element is appreciated.
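For context, this is the general pattern I am trying to get working - a minimal sketch only, since I cannot inspect the frame's markup; the frame index, the use of getElementsByTagName and the assumption that the button is the first input element are all guesses on my part:
Dim frameDoc As Object
Dim btn As Object
'reach into the iframe's own document (a cross-domain frame would raise an access-denied error here)
Set frameDoc = IE.Document.frames(1).Document
'hypothetical: take the first input element inside the frame and click it
Set btn = frameDoc.getElementsByTagName("input")(0)
btn.Click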
Related
I am using Internet Explorer to click on a file.
I get to a point where an Internet Explorer pop-up appears saying "Do you want to open or save the file?":
I want to write a VBA code that clicks on the save button.
I realized it is not possible to right-click and "inspect element" to view the HTML, because the pop-up is not part of the Internet Explorer webpage.
So I tried the SendKeys method, even though it is not reliable. I tried different options such as:
Application.SendKeys "%S"
Application.SendKeys "%s"
Application.SendKeys "%{S}"
Application.SendKeys "%{s}"
SendKeys ("%S")
SendKeys ("%s")
SendKeys ("%{S}")
SendKeys ("%{s}")
Application.SendKeys "%{S}"
When I run the code, none of them save the file.
Where is my error?
Are there other propositions to click on that "Save" button?
Maybe the object to which I am applying SendKeys should not be "Application"?
If you wish to use the UIAutomationCore.dll and reference it, you can do something like:
Public Function AutoSave() As Boolean
On Error GoTo handler
Dim sysAuto As New UIAutomationClient.CUIAutomation
Dim ieWindow As UIAutomationClient.IUIAutomationElement
Dim cond As IUIAutomationCondition
Set cond = sysAuto.CreateAndCondition(sysAuto.CreatePropertyCondition(UIA_NamePropertyId, "Notification"), _
sysAuto.CreatePropertyCondition(UIA_PropertyIds.UIA_ControlTypePropertyId, UIA_ToolBarControlTypeId))
Set ieWindow = sysAuto.GetRootElement.FindFirst(TreeScope_Descendants, cond)
Dim tField As UIAutomationClient.IUIAutomationElement
Dim tFieldCond As IUIAutomationCondition
Set tFieldCond = sysAuto.CreatePropertyCondition(UIA_ControlTypePropertyId, UIA_ControlTypeIds.UIA_SplitButtonControlTypeId)
Set tField = ieWindow.FindFirst(TreeScope_Descendants, tFieldCond)
Dim invPattern As UIAutomationClient.IUIAutomationInvokePattern
Set invPattern = tField.GetCurrentPattern(UIA_InvokePatternId)
invPattern.Invoke
AutoSave = True
Exit Function
handler:
End Function
And call that routine after clicking on the item - perhaps give it a Hard Wait to allow the Notification bar to show.
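For example, a minimal usage sketch - ieDoc is assumed to hold the loaded IE Document, and the element id "lnkDownload" is purely hypothetical; adjust both, and the wait, to suit your page:
ieDoc.getElementById("lnkDownload").Click      'hypothetical link that starts the download
Application.Wait Now + TimeValue("0:00:02")    'hard wait so the notification bar can appear
If Not AutoSave() Then MsgBox "The Save button could not be clicked."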
EDIT
To invoke the Close Button:
Set cond = sysAuto.CreateAndCondition(sysAuto.CreatePropertyCondition(UIA_NamePropertyId, "Close"), _
sysAuto.CreatePropertyCondition(UIA_ControlTypePropertyId, UIA_ControlTypeIds.UIA_ButtonControlTypeId))
Dim closeButton As IUIAutomationElement
Set closeButton = WaitForElement(ieWindow, cond, 10)
If closeButton Is Nothing Then Exit Sub
Dim clickButtonPattern As IUIAutomationInvokePattern
Set clickButtonPattern = closeButton.GetCurrentPattern(UIA_InvokePatternId)
clickButtonPattern.Invoke
........
Helper function:
Function WaitForElement(rootElement As IUIAutomationElement, condition As IUIAutomationCondition, timeout As Long) As IUIAutomationElement
Dim startTime As Date
startTime = Now
Dim element As IUIAutomationElement
Set element = rootElement.FindFirst(TreeScope_Descendants, condition)
While element Is Nothing And 86400 * (Now - startTime) < timeout ' 86400 converts elapsed days to seconds
Application.Wait Now + TimeValue("00:00:01")
Set element = rootElement.FindFirst(TreeScope_Descendants, condition)
Wend
Set WaitForElement = element
End Function
You would typically wait until the 'Notification bar Text' element text had changed to have "download has completed" at the end.
This can be done with a couple of helper functions:
Function WaitForTextValue(textElement As IUIAutomationElement, text As String, timeout As Long, Optional exactMatch As Boolean = False) As Boolean
Dim startTime As Date
startTime = Now
Dim result As String
result = ReadValue(textElement)
Dim isMatch As Boolean
If exactMatch Then
isMatch = result = text
Else
isMatch = InStr(1, result, text, vbTextCompare) > 0
End If
'keep reading the element until we have a match, or the timeout has expired
While Not isMatch And 86400 * (Now - startTime) < timeout ' 86400 converts elapsed days to seconds
Application.Wait Now + TimeValue("00:00:01")
result = ReadValue(textElement)
If exactMatch Then
isMatch = result = text
Else
isMatch = InStr(1, result, text, vbTextCompare) > 0
End If
Wend
WaitForTextValue = isMatch
End Function
Function ReadValue(element As IUIAutomationElement) As String
Dim valPattern As IUIAutomationValuePattern
Set valPattern = element.GetCurrentPattern(UIA_ValuePatternId)
If Not valPattern Is Nothing Then
ReadValue = element.GetCurrentPropertyValue(UIA_ValueValuePropertyId)
Else
' raise error here if element's value cannot be read - err.Raise
End If
End Function
And you would call this check after clicking the 'Save' button like so:
Set cond = sysAuto.CreateAndCondition(sysAuto.CreatePropertyCondition(UIA_NamePropertyId, "Notification bar Text"), _
sysAuto.CreatePropertyCondition(UIA_ControlTypePropertyId, UIA_ControlTypeIds.UIA_TextControlTypeId))
Dim barText As IUIAutomationElement
Set barText = WaitForElement(ieWindow, cond, 10)
If barText Is Nothing Then Exit Sub
If Not WaitForTextValue(barText, "download has completed", 30, False) Then Exit Sub
'if we get to here, then text has changed, and we can go ahead and click close button
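Putting the pieces together, the whole sequence might look like the sketch below. It is only a sketch: it assumes the UIAutomationClient reference is set and that the element names ("Notification", "Notification bar Text", "Close") match your IE version, as in the snippets above.
Public Sub SaveDownloadAndClose()
    Dim sysAuto As New UIAutomationClient.CUIAutomation
    Dim cond As IUIAutomationCondition
    Dim ieWindow As IUIAutomationElement
    'locate the IE notification (download) toolbar
    Set cond = sysAuto.CreateAndCondition( _
        sysAuto.CreatePropertyCondition(UIA_NamePropertyId, "Notification"), _
        sysAuto.CreatePropertyCondition(UIA_ControlTypePropertyId, UIA_ToolBarControlTypeId))
    Set ieWindow = WaitForElement(sysAuto.GetRootElement, cond, 10)
    If ieWindow Is Nothing Then Exit Sub
    'click the Save split button
    If Not AutoSave() Then Exit Sub
    'wait for the bar text to report that the download has completed
    Set cond = sysAuto.CreateAndCondition( _
        sysAuto.CreatePropertyCondition(UIA_NamePropertyId, "Notification bar Text"), _
        sysAuto.CreatePropertyCondition(UIA_ControlTypePropertyId, UIA_TextControlTypeId))
    Dim barText As IUIAutomationElement
    Set barText = WaitForElement(ieWindow, cond, 10)
    If barText Is Nothing Then Exit Sub
    If Not WaitForTextValue(barText, "download has completed", 30) Then Exit Sub
    'invoke the Close button on the notification bar
    Set cond = sysAuto.CreateAndCondition( _
        sysAuto.CreatePropertyCondition(UIA_NamePropertyId, "Close"), _
        sysAuto.CreatePropertyCondition(UIA_ControlTypePropertyId, UIA_ButtonControlTypeId))
    Dim closeButton As IUIAutomationElement
    Set closeButton = WaitForElement(ieWindow, cond, 10)
    If closeButton Is Nothing Then Exit Sub
    Dim clickPattern As IUIAutomationInvokePattern
    Set clickPattern = closeButton.GetCurrentPattern(UIA_InvokePatternId)
    clickPattern.Invoke
End Sub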
I reproduced your problem on my side. It seems that before clicking the Save button, we need to give the file download prompt time to appear, so please try using the Application.Wait method to wait for the prompt to display.
Code like this:
Sub Test()
Dim ie As Object
Dim Rank As Object
Set ie = CreateObject("InternetExplorer.application")
ie.Visible = True
ie.Navigate ("http://localhost:54382/HtmlPage47.html")
Do While ie.ReadyState <> 4
    DoEvents 'yield so the page can finish loading
Loop
Set doc = ie.document
doc.getElementsByTagName("tr")(1).getElementsByTagName("td")(5).getElementsByTagName("a")(0).Click
Application.Wait (Now + TimeValue("0:00:02"))
Application.SendKeys "%{S}"
End Sub
The screenshot is below:
Here is a piece of code I have been working on to print the title of a window.
Dim my_title2 as Variant
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
MsgBox ("The number of pages is: " & IE_count)
For x = 0 To (IE_count - 1)
On Error Resume Next
my_url = objShell.Windows(x).document.Location
my_title = objShell.Windows(x).document.Title
If my_title Like "F-Engine" & "*" Then
Set ie = objShell.Windows(x)
my_title2 = ie.document.Title
'my_title2 = objShell.Windows(x).document.Title
MsgBox ("The wanted title for the page should corrrespond. " & my_title2)
Exit For
Else
End If
Next
I am having trouble printing the title of the window after Set ie = objShell.Windows(x).
When my_title2 = ie.document.Title, the MsgBox displays:
"The wanted title for the page should correspond."
It prints nothing after this sentence. So the title assigned to "ie" is not being displayed.
If my_title2 = objShell.Windows(x).document.title, the MsgBox displays:
"The wanted title for the page should correspond. F-Engine"
Why am I not able to print the title of the page with the first declaration of my_title2?
I am doing this to verify if the page is being correctly picked up after a title "F-Engine" is found. To do so, I am trying to print the value of the title of the Internet Explorer window. It seems like nothing has been set and passed.
Not every object in objShell.Windows represents an IE page/tab - they might be instances of Windows Explorer. In those cases there is no document property to access.
You can test for this instead of using On Error Resume Next:
Dim w As Object, myUrl, myTitle, ie
For Each w In CreateObject("Shell.Application").Windows
If w.Name = "Internet Explorer" Then
myUrl = w.document.Location
myTitle = w.document.Title
Debug.Print myUrl, myTitle
If myTitle Like "F-Engine*" Then
Set ie = w
Debug.Print "Found: " & myTitle
Exit For
End If
End If
Next w
I am in need of editing HTML code using VBA. I actually got this working as far as editing the values of text boxes. My problem is that when I simulate clicking the "submit" button, new tables come up. The web address stays the same, but new HTML code is generated for the tables. I am trying to read data from these tables, but it seems as if they don't exist when I try to query them.
So I am guessing that I need to update or refresh the IE HTML code after I press the "submit" button, and I cannot seem to figure out how to do this. Any help is greatly appreciated. Here is my code so far:
Sub ImportStackOverflowData()
Dim SearchFor As IHTMLElement
Dim RowNumber As Long
RowNumber = 4
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = True
ie.Navigate "http://google.com"
'Wait until IE is done loading page
Do While ie.ReadyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to TRSDataBase ..."
DoEvents
Loop
Set html = ie.Document
Application.StatusBar = ""
'clear old data out and put titles in
Cells.Clear
Set SearchFor = html.getElementById("ddl_process")
'if this is the tag containing the question details, process it
If SearchFor.ID = "ddl_process" Then
'Replace the value of dl-process with copperhead name
Call SearchFor.setAttribute("value", "CPHB_FAT")
Cells(RowNumber, 1).Value = "Successfully replaced ddl_process to : " &
SearchFor.getAttribute("value")
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
Set SearchFor = html.getElementById("txt_startdate")
If SearchFor.ID = "txt_startdate" Then
'Replace the value of dl-process with copperhead name
Call SearchFor.setAttribute("value", "07-07-17")
Cells(RowNumber, 1).Value = "Successfully replaced startdate to : " &
SearchFor.getAttribute("value")
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
Set SearchFor = html.getElementById("txt_enddate")
If SearchFor.ID = "txt_enddate" Then
'Replace the value of dl-process with copperhead name
Call SearchFor.setAttribute("value", "07-14-17")
Cells(RowNumber, 1).Value = "Successfully replaced enddate to : " &
SearchFor.getAttribute("value")
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
'find view button and click it
Set SearchFor = html.getElementById("btn_header")
If SearchFor.ID = "btn_header" Then
SearchFor.Click
Cells(RowNumber, 1).Value = "The View Button has been clicked."
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
'Now get data from table after it loads
Application.Wait (Now + TimeValue("0:00:20"))
Set html = ie.Document '<---- This is where I am trying to update or refresh my reference after the page loads the new tables
Debug.Print ie.Document.body.innerHTML
Range("L5").Value = ie.Document.getElementsByTag("table")
(1).Rows(1).Cells(2).innerText
Try getting a new pointer to the window. Sometimes that does the trick.
Public Function FindWindow(SearchBy As String, SearchCriteria As String) As Object
Dim Window As Object
For Each Window In CreateObject("Shell.Application").Windows
If SearchBy = "URL" And Window.LocationURL Like "*" & SearchCriteria & "*" Then
Set FindWindow = Window
Exit Function
ElseIf SearchBy = "Name" And Window.LocationName Like "*" & SearchCriteria & "*" Then
Set FindWindow = Window
Exit Function
End If
Next
Set FindWindow = Nothing
End Function
Sub getNewPointer()
Dim ie As InternetExplorer
Set ie = New InternetExplorer 'the object must be created before it can navigate
ie.Navigate "www.google.com"
'Wait for the page to load and do actions etc
'Your code here
'
'
'Clear the IE reference
Set ie = Nothing
'get a new pointer to the window of interest
'Keep in mind it finds the first matching Window based on your criteria!
Set ie = FindWindow("URL", "www.google.ca")
'Try getting the property you want
Debug.Print ie.LocationURL 'e.g. confirm we re-attached to the right window
End Sub
In the code below I'm trying to click on the "About" link (href) on the www.google.co.in website. This worked on IE11 (Windows 10), but is not working on IE10 (Windows 7). Is this in any way machine-dependent? If not, what is the right code?
Please remember I'm trying to click on a link in an already opened browser window.
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
For x = 0 To (IE_count - 1)
On Error Resume Next ' sometimes more web pages are counted than are open
my_url = objShell.Windows(x).Document.Location
my_title = objShell.Windows(x).Document.Title
'You can use my_title or my_url, whichever you want
If my_title Like "Google" & "*" Then 'identify the existing web page
Set ie = objShell.Windows(x)
Exit For
Else
End If
Next
Dim LinkHref
Dim a
LinkHref = "//www.google.co.in/intl/en/about.html?fg=1"
For Each a In ie.Document.GetElementsByTagName("A")
If LCase(a.GetAttribute("href")) = LCase(LinkHref) Then
a.Click
Exit For ''# to stop after the first hit
End If
Next
You can achieve the goal with descriptive programming in QTP (if you don't want to use the object repository for some reason). This code should give you an example of what you can do:
Dim oDesc ' create a Description object for objects of class Link
Set oDesc = Description.Create
oDesc("micclass").value = "Link"
'Find all the Links in the browser using ChildObjects
Set obj = Browser("title=Google").Page("title=Google").ChildObjects(oDesc)
Dim i
'obj.Count value has the number of links in the page
For i = 0 to obj.Count - 1 ' indexed from zero, so use 0 to Count -1
'get the name of all the links in the page
If obj(i).GetROProperty("innerhtml")= LinkHref Then
obj(i).Click 'click the link if it matches the href you specified
Exit For ' no need to carry on the loop if we found the right link
End If
Next
If you just need to use VBScript, you can do it like this:
Dim oShell : Set oShell = CreateObject("Shell.Application")
Dim oWindow
For Each oWindow In oShell.Windows
If InStr(oWindow.FullName, "iexplore") > 0 Then
If InStr(1, oWindow.Document.Title, "Google", vbTextCompare) > 0 Then
Set ieApp = oWindow
Exit For
End If
End If
Next
LinkHref = "//www.google.co.in/intl/en/about.html?fg=1"
For Each linky In ieApp.Document.GetElementsbyTagName("a")
If LCase(linky.GetAttribute("href")) = LCase(LinkHref) Then
linky.Click
Exit For
End If
Next
This is pretty much the answer given above by Ansgar, but with a little extra to fix the object error. Only a browser window has Document.Title, and the loop works through every window that is open, so you get the error when the loop hits a non-IE window. This version fixes that by only checking Document.Title once the window has been identified as an IE instance.
Don't know about QTP, but VBScript doesn't have a Like operator.
This is the usual way to attach to an IE window with a specific title in plain VBScript:
Set app = CreateObject("Shell.Application")
For Each wnd In app.Windows
If wnd.Name = "Internet Explorer" Then
If InStr(1, wnd.Document.Title, "Google", vbTextCompare) > 0 Then
Set ie = wnd
Exit For
End If
End If
Next
Aim
I am looking to scrape 20/20 cricket scorecard data from the Cricinfo website, ideally into CSV form for data analysis in Excel.
As an example, the current Australian Big Bash 2011/12 scorecards are available from:
Game 1: http://www.espncricinfo.com/big-bash-league-2011/engine/match/524915.html
Last Game: http://www.espncricinfo.com/big-bash-league-2011/engine/match/524935.html
Background
I am proficient in using VBA (either automating IE or using XMLHTTP and then regular expressions) to scrape data from websites, for example:
Extract values from HTML TD and Tr
In that same question a comment was posted suggesting HTML parsing - which I hadn't come across before - so I have taken a look at questions such as RegEx match open tags except XHTML self-contained tags.
Query
While I could write a regex to parse the cricket data below, I would like advice on how I could efficiently retrieve these results with HTML parsing.
Please bear in mind that my preference is a repeatable CSV format containing:
the date/name of the match
Team 1 name
the output should dump up to 11 records for Team 1 (blank records where players haven't batted, i.e. "Did Not Bat")
Team 2 name
the output should dump up to 11 records for Team 2 (blank records where players haven't batted)
Nirvana for me would be a solution that I could deploy using VBA or VBScript so I could fully automate my analysis, but I presume I will have to use a separate tool for the HTML parsing.
Sample Site links and Data to be Extracted
There are two techniques that I use with VBA. I will describe them one by one.
1) Using FireFox / Firebug Addon / Fiddler
2) Using Excel's inbuilt facility to get data from the web
Since this post will be read by many, I will even cover the obvious. Please feel free to skip whatever parts you already know.
1) Using FireFox / Firebug Addon / Fiddler
FireFox : http://en.wikipedia.org/wiki/Firefox
Free download (http://www.mozilla.org/en-US/firefox/new/)
Firebug Addon: http://en.wikipedia.org/wiki/Firebug_%28software%29
Free download (https://addons.mozilla.org/en-US/firefox/addon/firebug/)
Fiddler : http://en.wikipedia.org/wiki/Fiddler_%28software%29
Free download (http://www.fiddler2.com/fiddler2/)
Once you have installed Firefox, install the Firebug Addon. The Firebug Addon lets you inspect the different elements in a webpage. For example, if you want to know the name of a button, simply right-click on it and click "Inspect Element with Firebug", and it will give you all the details you need for that button.
Another example would be finding the name of a table on a website which has the data that you need scraped.
I use Fiddler only when I am using XMLHTTP. It helps me see the exact info being passed when you click a button. Because of the increase in the number of bots that scrape sites, most sites now capture your mouse coordinates and pass that information to prevent automatic scraping, and Fiddler actually helps you debug the info being passed. I will not go into much detail here, as this info can be used maliciously.
Now let's take a simple example of how to scrape the URL posted in your question:
http://www.espncricinfo.com/big-bash-league-2011/engine/match/524915.html
First let's find the name of the table which has that info. Simply right-click on the table and click "Inspect Element with Firebug" and it will give you the snapshot below.
So now we know that our data is stored in a table called "inningsBat1". If we can extract the contents of that table to an Excel file, then we can definitely work with the data for our analysis. Here is sample code which will dump that table into Sheet1.
Before we proceed, I would recommend closing all open Excel instances and starting a fresh one.
Launch the VBA editor and insert a UserForm. Place a command button and a WebBrowser control on it. Your UserForm might look like this:
Paste this code in the Userform code area
Option Explicit
'~~> Set Reference to Microsoft HTML Object Library
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Private Sub CommandButton1_Click()
Dim URL As String
Dim oSheet As Worksheet
Set oSheet = Sheets("Sheet1")
URL = "http://www.espncricinfo.com/big-bash-league-2011/engine/match/524915.html"
PopulateDataSheets oSheet, URL
MsgBox "Data Scrapped. Please check " & oSheet.Name
End Sub
Public Sub PopulateDataSheets(wsk As Worksheet, URL As String)
Dim tbl As HTMLTable
Dim tr As HTMLTableRow
Dim insertRow As Long, Row As Long, col As Long
On Error GoTo whoa
WebBrowser1.navigate URL
WaitForWBReady
Set tbl = WebBrowser1.Document.getElementById("inningsBat1")
With wsk
.Cells.Clear
insertRow = 0
For Row = 0 To tbl.Rows.Length - 1
Set tr = tbl.Rows(Row)
If Trim(tr.innerText) <> "" Then
If tr.Cells.Length > 2 Then
If tr.Cells(1).innerText <> "Total" Then
insertRow = insertRow + 1
For col = 0 To tr.Cells.Length - 1
.Cells(insertRow, col + 1) = tr.Cells(col).innerText
Next
End If
End If
End If
Next
End With
whoa:
Unload Me
End Sub
Private Sub Wait(ByVal nSec As Long)
nSec = nSec + Timer
While Timer < nSec
DoEvents
Sleep 100
Wend
End Sub
Private Sub WaitForWBReady()
Wait 1
While WebBrowser1.ReadyState <> 4
Wait 3
Wend
End Sub
Now run your Userform and click on the Command button. You will notice that the data is dumped in Sheet1. See snapshot
Similarly you can scrape other info as well.
2) Using Excel's inbuilt facility to get data from the web
I believe you are using Excel 2007, so I will take that as an example to scrape the above-mentioned link.
Navigate to Sheet2. Now go to the Data tab and click the "From Web" button on the extreme right. See snapshot.
Enter the URL in the "New Web Query" window and click "Go".
Once the page has loaded, select the relevant table that you want to import by clicking on the small arrow as shown in the snapshot. Once done, click "Import".
Excel will then ask you where you want the data to be imported. Select the relevant cell and click OK. And you are done! The data will be imported to the cell which you specified.
If you wish you can record a macro and automate this as well :)
Here is the macro that I recorded.
Sub Macro1()
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;http://www.espncricinfo.com/big-bash-league-2011/engine/match/524915.html" _
, Destination:=Range("$A$1"))
.Name = "524915"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = """inningsBat1"""
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
End Sub
Hope this helps. Let me know if you still have some queries.
Sid
For anyone else interested in this, I ended up using the code below, based on Siddhart Rout's earlier answer:
XMLHTTP was significantly quicker than automating IE
the code generates a CSV file for each series to be downloaded (held in the X variable)
the code dumps each match to a regular 29-row range (regardless of how many players batted) to facilitate easier analysis later on
Public Sub PopulateDataSheets_XML()
Dim URL As String
Dim ws As Worksheet
Dim lngRow As Long
Dim lngRecords As Long
Dim lngWrite As Long
Dim lngSpare As Long
Dim lngInnings As Long
Dim lngRow1 As Long
Dim X(1 To 15, 1 To 4) As String
Dim objFSO As Object
Dim objTF As Object
Dim xmlHttp As Object
Dim htmldoc As HTMLDocument
Dim htmlbody As htmlbody
Dim tbl As HTMLTable
Dim tr As HTMLTableRow
Dim strInnings As String
s = Timer()
Set xmlHttp = CreateObject("MSXML2.ServerXMLHTTP")
Set objFSO = CreateObject("scripting.filesystemobject")
X(1, 1) = "http://www.espncricinfo.com/indian-premier-league-2011/engine/match/"
X(1, 2) = 501198
X(1, 3) = 501271
X(1, 4) = "indian-premier-league-2011"
X(2, 1) = "http://www.espncricinfo.com/big-bash-league-2011/engine/match/"
X(2, 2) = 524915
X(2, 3) = 524945
X(2, 4) = "big-bash-league-2011"
X(3, 1) = "http://www.espncricinfo.com/ausdomestic-2010/engine/match/"
X(3, 2) = 461028
X(3, 3) = 461047
X(3, 4) = "big-bash-league-2010"
Set htmldoc = New HTMLDocument
Set htmlbody = htmldoc.body
For lngRow = 1 To UBound(X, 1)
If Len(X(lngRow, 1)) = 0 Then Exit For
Set objTF = objFSO.createtextfile("c:\temp\" & X(lngRow, 4) & ".csv")
For lngRecords = X(lngRow, 2) To X(lngRow, 3)
URL = X(lngRow, 1) & lngRecords & ".html"
xmlHttp.Open "GET", URL
xmlHttp.send
Do While xmlHttp.Status <> 200
DoEvents
Loop
htmlbody.innerHTML = xmlHttp.responseText
objTF.writeline X(lngRow, 1) & lngRecords & ".html"
For lngInnings = 1 To 2
strInnings = "Innings " & lngInnings
objTF.writeline strInnings
Set tbl = Nothing
On Error Resume Next
Set tbl = htmlbody.Document.getElementById("inningsBat" & lngInnings)
On Error GoTo 0
If Not tbl Is Nothing Then
lngWrite = 0
For lngRow1 = 0 To tbl.Rows.Length - 1
Set tr = tbl.Rows(lngRow1)
If Trim(tr.innerText) <> vbNewLine Then
If tr.Cells.Length > 2 Then
If tr.Cells(1).innerText <> "Extras" Then
If Len(tr.Cells(1).innerText) > 0 Then
objTF.writeline strInnings & "-" & lngWrite & "," & Trim(tr.Cells(1).innerText) & "," & Trim(tr.Cells(3).innerText)
lngWrite = lngWrite + 1
End If
Else
objTF.writeline strInnings & "-" & lngWrite & "," & Trim(tr.Cells(1).innerText) & "," & Trim(tr.Cells(3).innerText)
lngWrite = lngWrite + 1
Exit For
End If
End If
End If
Next
For lngSpare = 12 To lngWrite Step -1
objTF.writeline strInnings & "-" & lngWrite + (12 - lngSpare)
Next
Else
For lngSpare = 1 To 13
objTF.writeline strInnings & "-" & lngWrite + (12 - lngSpare)
Next
End If
Next
Next
Next
'Call ConsolidateSheets
End Sub
RegEx is not a complete solution for parsing HTML because HTML is not a regular language.
You should use the HtmlAgilityPack to query the HTML. It lets you query the document with CSS-style selectors, similar to how you would with jQuery.
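HtmlAgilityPack is a .NET library, so it is not directly usable from plain VBA or VBScript. A rough VBA-side equivalent of the "query with CSS selectors" idea is MSHTML's own querySelector/querySelectorAll; here is a minimal sketch (it assumes a reference to Microsoft HTML Object Library and that the batting table still carries the id inningsBat1, as shown elsewhere in this thread):
Public Sub CssSelectorSketch()
    Dim xhr As Object, html As MSHTML.HTMLDocument
    Dim cells As Object, i As Long
    'fetch the page without automating IE
    Set xhr = CreateObject("MSXML2.XMLHTTP")
    xhr.Open "GET", "http://www.espncricinfo.com/big-bash-league-2011/engine/match/524915.html", False
    xhr.send
    'load the response into an HTML document we can query
    Set html = New MSHTML.HTMLDocument
    html.body.innerHTML = xhr.responseText
    'CSS selector: every cell of the first-innings batting table
    Set cells = html.querySelectorAll("#inningsBat1 td")
    For i = 0 To cells.Length - 1
        Debug.Print cells.Item(i).innerText
    Next i
End Sub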
As quite a few people may see this, I thought I would use it as a chance to demonstrate a few features I rarely see people using in VBA web-scraping: deleteRow, querySelector, and use of the clipboard to write out a table (complete with formatting and hyperlinks) to a sheet based on the table's outerHTML.
deleteRow is used to remove the unwanted rows. querySelector is used to apply faster CSS selectors to match on nodes. Modern browsers/HTML parsers are optimized for CSS, and class selectors (which I use) are the second-fastest selector type (after id).
Using CSS selectors and understanding HTMLTable methods/properties will allow for much greater flexibility in your web-scraping endeavours. Understanding the use of the clipboard gives you a simple copy-paste method for transferring a table to Excel.
Execution could easily be tied to a button push, with the URL read in from a cell (see the sketch after the code below).
VBA:
Option Explicit
Public Sub test()
WriteOutTable "https://www.espncricinfo.com/series/8044/scorecard/524935/hobart-hurricanes-vs-melbourne-stars-big-bash-league-2011-12"
End Sub
Public Sub WriteOutTable(ByVal url As String)
'required VBE (Alt+F11) > Tools > References > Microsoft HTML Object Library ; Microsoft XML, v6 (your version may vary)
Dim hTable As MSHTML.HTMLTable, clipboard As Object
Dim xhr As MSXML2.xmlhttp60, html As MSHTML.htmlDocument
Set xhr = New MSXML2.xmlhttp60
Set html = New MSHTML.htmlDocument
With xhr
.Open "GET", url, False
.Send
html.body.innerHTML = .responseText
End With
Set hTable = html.querySelector(".batsman")
Dim rowCount As Long, i As Long
rowCount = hTable.Rows.Length - 1
For i = rowCount To 0 Step -1
Select Case True
Case i = rowCount Or i = rowCount - 1 Or InStr(hTable.Rows(i).outerHTML, "wicket-details") > 0
hTable.deleteRow i
End Select
Next
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ActiveSheet.Cells(1, 1).PasteSpecial
End Sub
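And a minimal sketch of the button-push idea mentioned above - the sheet name, cell address and sub name are placeholders of my own; assign the sub to a button and type the scorecard URL into the cell:
Public Sub ScrapeFromButton()
    Dim url As String
    url = ThisWorkbook.Worksheets("Sheet1").Range("A1").Value 'URL typed into a cell
    If Len(url) > 0 Then WriteOutTable url
End Sub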