I have the VBA code below to automate IE, extract the figures from an HTML table, and populate an Excel table with the data. Is it possible to do the same thing by automating the Edge browser? Since my company doesn't allow us to install any third-party applications, Selenium is not an option. As I am not very familiar with coding, I would highly appreciate it if someone could offer some sample code.
Sub sfc_esg_list()
Dim IE As New InternetExplorer
Dim doc As New MSHTML.HTMLDocument
IE.Visible = True
'use IE browser to navigate SFC website
IE.navigate "https://www.sfc.hk/en/Regulatory-functions/Products/List-of-ESG-funds"
Do
DoEvents
'Application.Wait (Now() + TimeValue("00:00:04"))
Loop Until IE.readyState = 4
Set doc = IE.Document
Set TRs = doc.getElementsByTagName("tr")
Sheets("ESG list_SFC").Activate
'copy and paste the ESG fund list from SFC website to sheets<ESG list_SFC>
With Sheets("ESG list_SFC")
.Cells.Clear
For Each TR In TRs
r = r + 1
For Each Cell In TR.Children
C = C + 1
.Cells(r, C).NumberFormat = "#"
.Cells(r, C) = Cell.innerText
Next Cell
C = 0
Next TR
End With
IE.Quit
Set doc = Nothing
Set IE = Nothing
'Save the file
Application.ScreenUpdating = False
Application.DisplayAlerts = False
'ActiveWorkbook.Save
End Sub
IE is pretty much dead at this point. I think it should be something like this.
Sub TryMe()
Dim request As Object
Set request = CreateObject("MSXML2.XMLHTTP")
Dim oHtml As HTMLDocument
Dim oElement As Object
Dim htmlText As String
Set oHtml = New HTMLDocument
request.Open "GET", "https://www.sfc.hk/en/Regulatory-functions/Products/List-of-ESG-funds/", False
request.send
oHtml.body.innerHTML = request.responseText
htmlText = oHtml.getElementsByClassName("tablesorter tablesorter-default tablesorterfcd4c178102ad8")(0).outerhtml
With CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") 'Clipboard
.SetText htmlText
.PutInClipboard
Sheets(1).Range("A1").Select
Sheets(1).PasteSpecial Format:="Unicode Text"
End With
End Sub
I thought the class name was 'tablesorter tablesorter-default tablesorterfcd4c178102ad8' but it doesn't seem to work, and I'm not sure why. Can you play around with some other class names? When you hit F12, you will see the HTML code behind the page.
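One possible explanation, offered as an assumption rather than something verified against this page: class names such as tablesorter-default and the random-looking tablesorterfcd4c178102ad8 are typically added by the tablesorter script after the page has loaded, so they would not appear in the raw response that an XMLHTTP request receives. Below is a minimal sketch that avoids class names by taking the first table element instead. The procedure name TryFirstTable is hypothetical, and the sketch assumes the table is present in the static HTML at all.

```vba
Sub TryFirstTable()
    ' Requires a reference to Microsoft HTML Object Library.
    Dim request As Object
    Dim oHtml As MSHTML.HTMLDocument
    Dim tables As Object, tr As Object, td As Object
    Dim r As Long, c As Long

    Set request = CreateObject("MSXML2.XMLHTTP")
    request.Open "GET", "https://www.sfc.hk/en/Regulatory-functions/Products/List-of-ESG-funds/", False
    request.send

    Set oHtml = New MSHTML.HTMLDocument
    oHtml.body.innerHTML = request.responseText

    ' Select by tag name, since script-generated class names may be missing
    Set tables = oHtml.getElementsByTagName("table")
    If tables.Length = 0 Then
        MsgBox "No table found in the static HTML; the data may be loaded by script."
        Exit Sub
    End If

    ' Write each cell of the first table to the first worksheet
    For Each tr In tables(0).Rows
        r = r + 1
        c = 0
        For Each td In tr.Cells
            c = c + 1
            Sheets(1).Cells(r, c).Value = td.innerText
        Next td
    Next tr
End Sub
```

If the message box appears, the table is probably built by script, and a plain request will not be enough.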
Related
I am trying to get historical data from Nasdaq directly into excel. I have managed to call up the website, change the "FromDate" input and hit the search button, but I can't scrape the table "historical Output" in a nice format to Excel - please help.
MY VBA CODE
Sub OMX_data()
Dim URL As String
Dim ie As InternetExplorer
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate ("http://www.nasdaqomxnordic.com/indexes/historical_prices?Instrument=DK0060368991")
Do
DoEvents
Loop Until ie.readyState = 4
ie.document.all("FromDate").Value = "2018-01-01"
Set search_button = ie.document.getElementsByClassName("doSearch")
search_button(0).Click
End Sub
MY HTML
<div id="historicalOutput"><table id="historicalTable" class="tablesorter tablesorter-default" border="0" cellpadding="0" cellspacing="0" role="grid">
Adding on to your code supplied, something like so
......search_button(0).Click
Dim d As MSHTML.HTMLDocument
Set d = ie.document
Dim e As MSHTML.HTMLTable
Set e = d.getElementById("historicalTable")
You can then look at the intellisense for this object e and get properties like
e.rows(100).innertext
e.rows.length
Have a look and build a loop out of it.
I also changed the wait condition to Loop Until ie.readyState = 4 And Not ie.Busy
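A loop along those lines might look like the sketch below. It assumes ie is the InternetExplorer instance from the question and that the search results have finished loading. The procedure name DumpHistoricalTable is hypothetical.

```vba
Sub DumpHistoricalTable(ByVal ie As Object)
    Dim tbl As Object
    Dim r As Long, c As Long

    Set tbl = ie.document.getElementById("historicalTable")
    If tbl Is Nothing Then Exit Sub   ' table not on the page (yet)

    ' rows and cells are zero-based; worksheet cells are one-based
    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows(r).Cells.Length - 1
            ActiveSheet.Cells(r + 1, c + 1).Value = tbl.Rows(r).Cells(c).innerText
        Next c
    Next r
End Sub
```

Call it after search_button(0).Click and the wait loop, for example DumpHistoricalTable ie.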
Try the script below to get the tabular content from that webpage, using the date you want to start from.
Sub FetchTable()
Const postUrl$ = "http://www.nasdaqomxnordic.com/webproxy/DataFeedProxy.aspx"
Dim Http As New XMLHTTP60, Html As New HTMLDocument
Dim sPostData$, dateFrom$, dateupto$, C&, R&
dateFrom = "2018-01-01"
dateupto = "2020-04-02"
sPostData = "xmlquery=%3Cpost%3E%0A%3Cparam+name%3D%22Exchange%22+value%3D%22NMF%22%2F%3E%0A%3Cparam+name%3D%22SubSystem%22+value%3D%22History%22%2F%3E%0A%3Cparam+name%3D%22Action%22+value%3D%22GetDataSeries%22%2F%3E%0A%3Cparam+name%3D%22AppendIntraDay%22+value%3D%22no%22%2F%3E%0A%3Cparam+name%3D%22Instrument%22+value%3D%22DK0060368991%22%2F%3E%0A%3Cparam+name%3D%22FromDate%22+value%3D%22" & dateFrom & "%22%2F%3E%0A%3Cparam+name%3D%22ToDate%22+value%3D%22" & dateupto & "%22%2F%3E%0A%3Cparam+name%3D%22hi__a%22+value%3D%220%2C1%2C2%2C4%2C21%2C8%2C10%2C11%2C12%22%2F%3E%0A%3Cparam+name%3D%22ext_xslt%22+value%3D%22%2FnordicV3%2Fhi_table.xsl%22%2F%3E%0A%3Cparam+name%3D%22ext_xslt_lang%22+value%3D%22en%22%2F%3E%0A%3Cparam+name%3D%22ext_xslt_hiddenattrs%22+value%3D%22%2Cip%2Civ%2C%22%2F%3E%0A%3Cparam+name%3D%22ext_xslt_tableId%22+value%3D%22historicalTable%22%2F%3E%0A%3Cparam+name%3D%22app%22+value%3D%22%2Findexes%2Fhistorical_prices%22%2F%3E%0A%3C%2Fpost%3E"
With Http
.Open "POST", postUrl, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send sPostData
Html.body.innerHTML = .responseText
End With
For Each elem In Html.getElementById("historicalTable").Rows
For Each trow In elem.Cells
C = C + 1: Cells(R + 1, C) = trow.innerText
Next trow
C = 0: R = R + 1
Next elem
End Sub
Make sure to add the following references before executing the above script:
Microsoft XML, v6.0
Microsoft HTML Object Library
I am trying to extract the name of the first actor from the URL I am passing.
For my URL, I need to extract "Will Smith" from the HTML page.
Web Page
I know how to extract elements from an HTML page by tag, class name, etc.
But the problem I am facing is that when I pass the URL "https://ssl.ofdb.de/film/138627,I-Am-Legend",
the response text does not contain the full HTML page, so I am not able to extract the content "Will Smith".
I tried other methods such as MSXML2.XMLHTTP60, and both return only a partial HTML page.
I have attached my code here. Can anyone please help?
Sub Fetch_Info()
Dim ie As New InternetExplorer
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Top = 0
ie.Left = 700
ie.Width = 1000
ie.Height = 750
ie.AddressBar = 0
ie.StatusBar = 0
ie.Toolbar = 0
ie.navigate "https://ssl.ofdb.de/film/138627,I-Am-Legend"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Application.Wait Now + TimeValue("00:00:04")
Dim doc As HTMLDocument
Set doc = ie.document
doc.Focus
Debug.Print doc.DocumentElement.innerHTML
End Sub
You can use the following CSS selector. querySelector returns the first node matching the CSS pattern. The pattern is [itemprop=actor] span, which looks for a span that is a descendant of an element whose itemprop attribute has the value actor. Note that I am working off the ie.document node.
Debug.Print ie.document.querySelector("[itemprop=actor] span").innerText
That content is static, so you could use XHR and avoid the overhead of a browser. The response header specifies no charset, so you need to read the response body and convert it.
Option Explicit
Public Sub GetActor()
Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
'required VBE (Alt+F11) > Tools > References > Microsoft HTML Object Library ; Microsoft XML, v6 (your version may vary)
Set xhr = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://ssl.ofdb.de/film/138627,I-Am-Legend", False
.send
html.body.innerHTML = StrConv(.responseBody, vbUnicode)
End With
ActiveSheet.Cells(1, 1) = html.querySelector("[itemprop=actor] span").innerText
End Sub
I have 15 different URLs, and I need to fetch the price from each website into a particular column in Excel. Can you please help me out? It's my first VBA program; I tried, but it shows a syntax error.
Sub myfile()
Dim IE As New InternetExplorer
Dim url As String
Dim item As HTMLHtmlElement
Dim Doc As HTMLDocument
Dim tagElements As Object
Dim element As Object
Dim lastRow
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
IE.navigate url
IE.Visible = True
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
lastRow = Sheet1.UsedRange.Rows.Count + 1
Set tagElements = Doc.all.tags("tr")
For Each element In tagElements
If InStr(element.innerText, "ciq-price") > 0 And InStr(element.className, "ciq-product-name") > 0 Then
Sheet1.Cells(lastRow, 1).Value = element.innerText
' Exit the for loop once you get the temperature to avoid unnecessary processing
Exit For
End If
Next
IE.Quit
Set IE = Nothing
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
You can't simply copy a web scraping macro for your purposes. Every page has its own HTML structure, so you must write a separate web scraping macro for each page.
I can't explain everything about web scraping with VBA here. Please start your research with the search terms "excel vba web scraping" and "document object model". You also need knowledge of HTML and CSS, and ideally of JavaScript.
The error message "user-defined type not defined" occurs because you use early binding without a reference to the libraries Microsoft HTML Object Library and Microsoft Internet Controls. You can set a reference via Tools -> References... For the differences between early and late binding, see "Early Binding v/s Late Binding" and, for more depth, Microsoft's "Using early binding and late binding in Automation".
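As a rough illustration of the difference, the sketch below shows both styles side by side. The procedure name BindingExamples is hypothetical. The early-bound lines are left commented out so the code compiles without any reference set.

```vba
Sub BindingExamples()
    ' Early binding: compiles only with a reference to Microsoft Internet Controls.
    ' Without that reference, the Dim line raises "User-defined type not defined".
    ' Dim ieEarly As InternetExplorer
    ' Set ieEarly = New InternetExplorer

    ' Late binding: the type is resolved at run time, so no reference is required
    Dim ieLate As Object
    Set ieLate = CreateObject("InternetExplorer.Application")

    ieLate.Quit
    Set ieLate = Nothing
End Sub
```

With late binding you lose IntelliSense and named constants such as READYSTATE_COMPLETE, which is why the macro below compares readyState with the literal 4.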
To get the prices from the shown url you can use the following macro. I use late binding:
Option Explicit
Sub myfile()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim element As Object
Dim item As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
Set tagElements = IE.document.getElementsByClassName("ciq-online-offer-item ")
For Each element In tagElements
Set item = element.getElementsByTagName("td")(1)
ActiveSheet.Cells(lastRow, 1).Value = Trim(item.innerText)
lastRow = lastRow + 1
Next
IE.Quit
Set IE = Nothing
End Sub
Edit for a second Example:
The new link leads to an offer. I assume the price of the product is to be fetched. No loop is needed for this. You just have to find out in which HTML segment the price is and then you can decide how to get it. In the end there are only two lines of VBA that write the price into the Excel spreadsheet.
I'm in Germany and Excel has automatically set the currency sign from Dollar to Euro. This is of course wrong. Depending on where you are, this may have to be intercepted.
Sub myfile2()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://www.wayfair.com/kitchen-tabletop/pdx/cuisinart-air-fryer-toaster-oven-cui3490.html"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
'Break for 3 seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
Set tagElements = IE.document.getElementsByClassName("BasePriceBlock BasePriceBlock--highlight")(0)
ActiveSheet.Cells(lastRow, 1).Value = Trim(tagElements.innerText)
IE.Quit
Set IE = Nothing
End Sub
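Regarding the currency sign mentioned above: one way to stop Excel from reinterpreting the scraped string is to format the target cell as text before writing to it. This is only a sketch. The helper name WritePriceAsText is hypothetical, and the price is stored as text, so it must be converted before any calculation.

```vba
Sub WritePriceAsText(ByVal target As Range, ByVal priceText As String)
    ' Format the cell as text first so Excel keeps the string exactly as scraped
    target.NumberFormat = "@"
    target.Value = Trim(priceText)
End Sub
```

In myfile2 it could replace the line that writes the price, for example WritePriceAsText ActiveSheet.Cells(lastRow, 1), tagElements.innerText.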
I have written code that grabs a table from a website and pulls out each cell from that table and then drops them into an excel spreadsheet. The code works flawlessly when the website loads correctly.
The issue is that the website does not play nicely with Internet Explorer, so the code only executes successfully about half of the time. I could write a routine that checks whether the website loaded successfully and retries if it did not. However, I want to see if I can get it to work with a WinHTTPRequest.
The lines below are how I access the table using Internet Explorer based web scraping, with the last line being how I load the table into a variable.
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://weather.com/weather/tenday/l/12345:4:US"
IE.Visible = True
Application.Wait (Now() + TimeValue("00:02:00"))
Set doc = IE.document
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
I can load the website in question via WinHTTPRequest using the code below.
Set doc = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "https://weather.com/weather/tenday/l/12345:4:US", False
.send
doc.body.innerHTML = .responseText
End With
However, when I try to grab the table using the line below, I get "Run-time error '438': Object doesn't support this property or method".
Set WeatherTable = doc.getElementByclassname("twc-table")(0)
Basically, I need the equivalent of this line for WinHTTP web scraping.
I have looked at descending through the HTML document (doc.body.all.item(1), etc.), but I don't get very far before I run into errors. I have also looked at the Selenium add-on, but I don't remember being able to download and install it successfully, and I am not sure whether it is still maintained for current versions of Chrome / Firefox.
Here is the full code that gets the table via Internet Explorer web scraping and then drops it onto an Excel spreadsheet.
Any help is appreciated.
Sub GetTable2()
Dim IE As Object
Dim doc As HTMLDocument
Dim WeatherTable As HTMLTable
Dim WeatherTableRows As HTMLTableRow
Dim HTMLTableCell As HTMLTableCell
Dim HeaderRow As Boolean
Dim RowCount As Long
Dim ColumnCount As Long
Dim i As Long
RowCount = 1
ColumnCount = 1
HeaderRow = True
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://weather.com/weather/tenday/l/12345:4:US"
IE.Visible = True
'Application.Wait (Now() + TimeValue("00:02:00"))
Set doc = IE.document
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
For Each WeatherTableRows In WeatherTable.Rows
i = 1
For Each HTMLTableCell In WeatherTableRows.Cells
If HeaderRow = True Then
ThisWorkbook.Sheets("Sheet5").Cells(RowCount, ColumnCount).Value = HTMLTableCell.innerText
ColumnCount = ColumnCount + 1
Else
If i = 1 Then
i = i + 1
Else
ThisWorkbook.Sheets("Sheet5").Cells(RowCount, ColumnCount).Value = HTMLTableCell.innerText
ColumnCount = ColumnCount + 1
End If
End If
Next HTMLTableCell
HeaderRow = False
ColumnCount = 1
RowCount = RowCount + 1
Next WeatherTableRows
IE.Quit
Set IE = Nothing
Set doc = Nothing
End Sub
You missed an s. It is plural as you are getting a collection of elements by className.
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
To make your approach slightly cleaner, you can try this way as well.
Sub FetchTabularData()
Dim elem As Object, trow As Object, S$, R&, C&
[B1:G1] = [{"Day","Description","High/Low","Precip","Wind","Humidity"}]
With New WinHttp.WinHttpRequest
.Open "GET", "https://weather.com/weather/tenday/l/12345:4:US", False
.send
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
For Each elem In .querySelector(".twc-table").getElementsByTagName("tr")
For Each trow In elem.getElementsByTagName("td")
C = C + 1: Cells(R + 1, C) = trow.innerText
Next trow
C = 0: R = R + 1
Next elem
End With
End Sub
Reference to add:
Microsoft HTML Object Library
Microsoft WinHTTP Services, version 5.1
I'm trying to automate going to a website and pulling the ratings from several apps.
I've figured out how to navigate and login to the page.
How do I pull the element - the number "3.3" in this case - from this specific section into Excel.
Being unfamiliar with HTML in VBA, I got this far following tutorials/other questions.
Rating on website and the code behind it
Sub PullRating()
Dim HTMLDoc As HTMLDocument
Dim ie As InternetExplorer
Dim oHTML_Element As IHTMLElement
Dim sURL As String
On Error GoTo Err_Clear
sURL = "https://www.appannie.com/account/login/xxxxxxxxxx"
Set ie = New InternetExplorer
ie.Silent = True
ie.navigate sURL
ie.Visible = True
Do
'Wait until the Browser is loaded
Loop Until ie.readyState = READYSTATE_COMPLETE
Set HTMLDoc = ie.Document
HTMLDoc.all.Email.Value = "xxxxxxxxx#xxx.com"
HTMLDoc.all.Password.Value = "xxxxx"
For Each oHTML_Element In HTMLDoc.getElementById("login-form")
If oHTML_Element.Type = "submit" Then oHTML_Element.Click: Exit For
Next
Dim rating As Variant
Set rating = HTMLDoc.getElementsByClassName("rating-number ng-binding")
Range("A1").Value = rating
'ie.Refresh 'Refresh if required
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
End Sub
The code below extracts the text from the first element with class name "rating-number ng-binding" in the HTML document. By the way, getElementsByClassName is only supported from IE 9.0 onward, so my example uses code that is also compatible with older versions.
Dim htmlEle1 as IHTMLElement
For Each htmlEle1 in HTMLDoc.getElementsByTagName("div")
If htmlEle1.className = "rating-number ng-binding" then
Range("A1").Value = htmlEle1.InnerText
Exit For
End if
Next htmlEle1
While Ryszard's code should do the trick, if you want to keep the code you have already written, here are the alterations I believe you need to make.
For Each oHTML_Element In HTMLDoc.getElementById("login-form")
If oHTML_Element.Type = "submit" Then oHTML_Element.Click: Exit For
Next
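'Need to wait for page to load before collecting the value
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
'Re-read the document in case the login navigated to a new page
Set HTMLDoc = ie.Document
Dim rating As IHTMLElement
'getElementsByClassName returns a collection, so take its first item
Set rating = HTMLDoc.getElementsByClassName("rating-number ng-binding")(0)
'Need to get the innerhtml of the element
Range("A1").Value = rating.innerhtml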