Scrape values from website using VBA - html

I need help scraping some data from a website.
As a first step I managed to open the website and pass in my variables, but:
1. I don't know how to press the "Convert currencies" button
2. and afterwards how to get "Converted Amount" and "Rate" into Excel.
Any help will be appreciated!
Sub Test()
Dim IE As InternetExplorer
Dim Amount As String
Dim Source As String
Dim Target As String
Dim Datestring As String
Amount = 10000
Source = "Euro"
Target = "UK pound sterling"
Datestring = "03-08-2018"
'Open Browser and download data
Set IE = New InternetExplorer
With IE
.Visible = True
.Navigate "http://sdw.ecb.europa.eu/curConverter.do?sourceAmount=" & _
Amount & _
"&sourceCurrency=" & _
Source & _
"&targetCurrency=" & _
Target & _
"&inputDate=" & _
Datestring & _
"&submitConvert.x=209&submitConvert.y=10"
submitConvert.Click
While .Busy Or .readyState < 4: DoEvents: Wend
End With
End Sub

XMLHttpRequest (XHR):
XHR is faster because no browser has to be opened.
Option Explicit
Public Sub GetRates()
Dim sResponse As String, i As Long, html As New HTMLDocument, clipboard As Object
Dim sourceAmount As String, sourceCurrency As String, targetCurrency As String, inputDate As String
sourceAmount = "10000"
sourceCurrency = "EUR"
targetCurrency = "GBP"
inputDate = "03-08-2018"
Dim url As String
url = "http://sdw.ecb.europa.eu/curConverter.do?sourceAmount=" & sourceAmount & "&sourceCurrency=" & sourceCurrency & _
"&targetCurrency=" & targetCurrency & "&inputDate=" & inputDate & "&submitConvert.x=52&submitConvert.y=8"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
With html
.body.innerHTML = sResponse
Set clipboard = New MSForms.DataObject
clipboard.SetText .querySelectorAll("table.tableopenpage").item(1).outerHTML
clipboard.PutInClipboard
ActiveSheet.Cells(1, 1).PasteSpecial
End With
End Sub
Less robust, but if you just want the converted amount:
.querySelectorAll("table.tableopenpage").item(1).getElementsByTagName("td")(7).innerText
And the rate:
.querySelectorAll("table.tableopenpage").item(1).getElementsByTagName("td")(10).innerText
References:
Microsoft Forms 2.0 Object Library (or add a userform to your project, which adds this reference for you)
Microsoft HTML Object Library
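If the query-string URL is needed in more than one place, it could be wrapped in a small helper. This is only a sketch: BuildConverterUrl is a hypothetical name, and the parameter names and the submitConvert.x/y values are simply copied from the URL used above.

```vba
Option Explicit

' Hypothetical helper: assembles the converter URL from its four inputs.
' Parameter names and the submitConvert.x/y values mirror the URL shown above.
Public Function BuildConverterUrl(ByVal sourceAmount As String, _
                                  ByVal sourceCurrency As String, _
                                  ByVal targetCurrency As String, _
                                  ByVal inputDate As String) As String
    Const BASE_URL As String = "http://sdw.ecb.europa.eu/curConverter.do"
    BuildConverterUrl = BASE_URL & _
        "?sourceAmount=" & sourceAmount & _
        "&sourceCurrency=" & sourceCurrency & _
        "&targetCurrency=" & targetCurrency & _
        "&inputDate=" & inputDate & _
        "&submitConvert.x=52&submitConvert.y=8"
End Function
```

It would be called as url = BuildConverterUrl("10000", "EUR", "GBP", "03-08-2018").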
Internet Explorer:
The data is already there because of the query string you are using in the URL, so there is no need for a click.
Just use the correct three-letter currency codes (EUR and GBP rather than "Euro" and "UK pound sterling").
Option Explicit
Public Sub Test()
Dim IE As InternetExplorer, Amount As String, Source As String, Target As String
Dim Datestring As String, hTable As HTMLTable
Amount = 10000
Source = "EUR"
Target = "GBP"
Datestring = "03-08-2018"
Dim url As String
url = "http://sdw.ecb.europa.eu/curConverter.do?sourceAmount=" & _
Amount & _
"&sourceCurrency=" & _
Source & _
"&targetCurrency=" & _
Target & _
"&inputDate=" & _
Datestring & _
"&submitConvert.x=209&submitConvert.y=10"
Set IE = New InternetExplorer
With IE
.Visible = True
.navigate url
While .Busy Or .readyState < 4: DoEvents: Wend
Dim clipboard As Object
Set clipboard = New MSForms.DataObject
clipboard.SetText .document.getElementsByClassName("tableopenpage")(1).outerHTML
clipboard.PutInClipboard
ActiveSheet.Cells(1, 1).PasteSpecial
End With
End Sub
If interested in how to click though:
1) Use the correct 3 letter abbreviations for the currencies.
2) You can click the submit button with:
.document.querySelector("input[name=submitConvert]").Click
It uses the CSS selector
input[name=submitConvert]
which matches an input element whose name attribute has the value submitConvert.
3) You then need a
While .Busy Or .readyState < 4: DoEvents: Wend
to allow the page to refresh.
4) You can then grab the results table with:
.document.querySelectorAll("table.tableopenpage").item(1)
This collects all elements with a tag table and class tableopenpage. You want the second of these, which is 1 on a 0-based index system.
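Put together, the four steps above might look like the following. This is a rough sketch rather than tested code: it uses late binding so no references are needed, ClickConvertAndPrintTable is a hypothetical name, and it assumes the page pre-fills the form from the query string when the submitConvert coordinates are left out.

```vba
Option Explicit

Public Sub ClickConvertAndPrintTable()
    Dim ie As Object, resultTable As Object
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        ' Assumption: the form is pre-filled from these query parameters
        .navigate "http://sdw.ecb.europa.eu/curConverter.do?sourceAmount=10000" & _
                  "&sourceCurrency=EUR&targetCurrency=GBP&inputDate=03-08-2018"
        While .Busy Or .readyState < 4: DoEvents: Wend

        ' Step 2: click the submit button located by the CSS selector
        .document.querySelector("input[name=submitConvert]").Click

        ' Step 3: wait for the page to refresh
        While .Busy Or .readyState < 4: DoEvents: Wend

        ' Step 4: second table with class tableopenpage (index 1, 0-based)
        Set resultTable = .document.querySelectorAll("table.tableopenpage").item(1)
        Debug.Print resultTable.innerText
        .Quit
    End With
End Sub
```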
References required:
Microsoft Internet Controls
Microsoft HTML Object Library
Microsoft Forms 2.0 Object Library
Other:
I find it simpler to grab the table in one go but you could target the rate, for example, more specifically with a CSS selector of:
a[target*=quickview]
Be aware that Excel may swap the date from dd/mm/yyyy to mm/dd/yyyy on output, so check for this and correct it if necessary.
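One way to sidestep the day/month ambiguity is to split the date string yourself and build a real Date with DateSerial before writing it to the sheet. The helper below is a minimal sketch; ParseDmy is a hypothetical name and it assumes the input is always day-month-year separated by "-" or "/".

```vba
Option Explicit

' Hypothetical helper: converts "dd-mm-yyyy" or "dd/mm/yyyy" text into a Date
' without relying on the regional settings of the machine.
Public Function ParseDmy(ByVal dateText As String) As Date
    Dim parts() As String
    parts = Split(Replace$(dateText, "/", "-"), "-")
    ' DateSerial takes year, month, day in that order
    ParseDmy = DateSerial(CInt(parts(2)), CInt(parts(1)), CInt(parts(0)))
End Function
```

For example, ActiveSheet.Cells(1, 1).Value = ParseDmy("03-08-2018") writes 3 August 2018 as a true date, and the cell's number format then only controls how it is displayed.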

Related

Grabbing a single piece of data from a website's HTML and assign it to a variable

I'm working on a project where I grab data that I stored in an excel sheet and search a specific website that can be seen in the code below. Once the website completes the search, I want to grab the "worth" from the top right of the page. I'm fairly new to using VBA with HTML, so I'm not sure how to take the element (worth) that I'm looking for from the web page, and assign it to a variable in VBA so I can paste it into my excel sheet.
Right now I'm able to open IE, insert my data into the search bar of the specific website that I'm using, and click search. What I have is seen below. Thank you in advance!
Sub BrowsetoSite()
Dim IE As New SHDocVw.InternetExplorer
Dim website As String
Dim i As Integer
i = 2
'Set ie = New SHDocVw.InternetExplorer
website = "https://cardmavin.com/category/football"
IE.navigate website
IE.Visible = False
Do While IE.readyState <> READYSTATE_COMPLETE
'assign info to variable to enter into the search bar
Loop
Dim idoc As MSHTML.HTMLDocument
Set idoc = IE.document
Dim Brand As String
Dim Year As String
Dim Num As String
Dim Name As String
Dim search As String
Dim value As Variant
Brand = Range("A" & i).value
Year = Range("B" & i).value
Num = Range("D" & i).value
Name = Range("E" & i).value
search = (Year & " " & Brand & " " & Name & " " & Num)
i=i+1
idoc.getElementById("search-field").value = search
idoc.getElementById("to-mavin").Click
Do While IE.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
value = idoc.getElementsByTagName("h4")(0).innerText
MsgBox value
IE.Quit
End Sub
The issue that I'm having is the value = idoc.getElementsByTagName("h4")(0).innerText. I've tried to get the element a few different ways, but have been unsuccessful so far.
You need Set idoc = IE.document after you've submitted the search, to get a reference to that new page. Otherwise you're still trying to access the previous page.
i=i+1
idoc.getElementById("search-field").value = search
idoc.getElementById("to-mavin").Click
Do While IE.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set idoc = IE.document '<<<<<<<<<<<<<<
Dim value As Variant
value = idoc.getElementById("worthBox") _
.getElementsByTagName("h4")(0).innerText
MsgBox value
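Because the wait-then-refresh pattern is needed after every navigation or click, it may be convenient to factor the wait into a helper. The following is a sketch under assumptions: WaitForLoad and its timeout parameter are hypothetical, and 4 is the numeric value of READYSTATE_COMPLETE so the code also works with late binding.

```vba
Option Explicit

' Hypothetical helper: blocks until the browser reports the page as loaded,
' or until the timeout expires, whichever comes first.
Public Sub WaitForLoad(ByVal browser As Object, _
                       Optional ByVal timeoutSeconds As Long = 30)
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)
    Do While browser.Busy Or browser.readyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
        If Now > deadline Then Exit Do
    Loop
End Sub
```

After the click you would then call WaitForLoad IE followed by Set idoc = IE.document, so that idoc always points at the page currently loaded.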
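Try this approach. Suppose cell A1 contains the string 2008 Topps Thomas DeCoud. The code needs the VBA-JSON JsonConverter module imported into the project, plus references to Microsoft XML, v6.0 and Microsoft HTML Object Library.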
Sub Test()
Const sURL As String = "https://mavin.io/search?q="
Dim json As Object
Set json = GetJSONFromHTMLHead(sURL & Application.WorksheetFunction.EncodeURL(Range("A1").Value))
Debug.Print json("offers")("priceCurrency")
Debug.Print json("offers")("price")
End Sub
Function GetJSONFromHTMLHead(ByVal sURL As String) As Object
Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument, re As Object, json As Object
Set http = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "<head>([\s\S]+)<\/head>"
With http
.Open "GET", sURL, False
.send
html.body.innerHTML = Replace$(Replace$(re.Execute(.responseText)(0), "<head>", "<body>"), "</head>", "</body>")
End With
Set json = JSONConverter.ParseJson(html.querySelector("script[type='application/ld+json']").innerHTML)
Set GetJSONFromHTMLHead = json
End Function

VBA Web Scraping: Object turns out empty (getelementbyID)

I intend to extract the information from the website below (website is in Malay) containing information on tender awards.
https://tender.selangor.my/results?type=tenders
My code is below, but the 'tenders' object comes back as Nothing, so I cannot proceed any further. I hope you can share some guidance on what I am doing wrong.
Worksheets("Data").Cells.Clear
Dim xhr As Object
Dim html As New HTMLDocument
Dim tenders As Object, item As Object, item2 As Object
Dim tender As Object
Dim i As Integer
Set xhr = CreateObject("MSXML2.XMLHTTP")
With xhr
.Open "GET", "https://tender.selangor.my/results?type=tenders", False
.send
If .readyState = 4 And .Status = 200 Then
html.body.innerHTML = .responseText
Else
MsgBox "Error" & vbNewLine & "Ready state: " & .readyState & _
vbNewLine & "HTTP request status: " & .Status
End If
End With
Set tenders = html.getElementById("DataTables_Table_0")
The tabular content that you are interested in is generated dynamically, so you can either use Internet Explorer or issue GET HTTP requests with the appropriate parameters and parse the JSON response using a third-party library. As the first option is the easier one, I've created an example using it:
Sub GetInformation()
Dim tenders As Object
With CreateObject("InternetExplorer.Application")
.Visible = True
.navigate "https://tender.selangor.my/results?type=tenders"
While .Busy Or .readyState < 4: DoEvents: Wend
Application.Wait Now + TimeValue("00:00:05")
Set tenders = .document.getElementById("DataTables_Table_0")
Debug.Print tenders.innerText
.Quit
End With
End Sub
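The fixed five-second Application.Wait above works, but it either wastes time or is too short on a slow connection. A possible alternative is to poll for the element until it appears. This is only a sketch: WaitForElementById is a hypothetical helper, and it only checks that the element exists, not that the script on the page has finished filling it.

```vba
Option Explicit

' Hypothetical helper: polls the browser's current document for an element id.
' Returns the element, or Nothing if it did not appear before the timeout.
Public Function WaitForElementById(ByVal browser As Object, _
                                   ByVal elementId As String, _
                                   Optional ByVal timeoutSeconds As Long = 10) As Object
    Dim deadline As Date, found As Object
    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        DoEvents
        On Error Resume Next            ' the document may not be ready yet
        Set found = browser.document.getElementById(elementId)
        On Error GoTo 0
        If Not found Is Nothing Then Exit Do
    Loop While Now < deadline
    Set WaitForElementById = found
End Function
```

Inside the With block it would replace the Application.Wait line, for example Set tenders = WaitForElementById(.Application, "DataTables_Table_0"), followed by a check that tenders is not Nothing before using it.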

VBA Web Scraping, SPAN issue

I'm currently trying to scrape some data from a website using VBA, but I'm completely stuck.
My VBA code is as follows:
Sub WISE()
Dim IE As InternetExplorer
Dim HTML As HTMLDocument
Dim WPage As String
WPage = "www.thencc.org.uk/Our_Members/MemDetails.aspx?CompID=AAA-01-01"
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate WPage
Do While IE.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to " & WPage
DoEvents
Loop
Dim hlpe As String
'Clearly missing something
hlpe = _
HTML.getElementsByTagName("span").getElementsByTagName("b").innerText
Range("a5").Value = hlpe
Set IE = Nothing
Application.StatusBar = ""
End Sub
The HTML containing the details I'm trying to get is shown below.
Essentially I was going to pull the lot and then parse it in Excel; ideally I want the address and phone number.
<span id="MainContent_lblDetails"><table class=tabLayout width='90%'> <tr><td style='height:20px'><b>AA Autovan Leisure Ltd</b><br/><br/>Servicing and repairs – mobile specialists in servicing and repairing touring caravans and motorhomes. Contact us for more information<br/><br/>7 Sycamore Lane, Holmfirth, Huddersfield, HD9 7RT<br/>West Yorkshire, England<br/><br/><b>Tel - </b>01484 683636<br/><b>Web - </b><a href='http://www.aaautovanleisure.com' style='color:#0061a0' target='_blank'>www.aaautovanleisure.com</div></td></tr><tr><td align='right'><a href='javascript:history.go(-1)' style='color:#0061a0'> Back </a></td></tr></table></span>
We can parse text related to the element, but we want to use its html so as to have useful delimiters to split on. Also, we can do away with the slow browser opening and issue an XMLHTTP GET request.
Option Explicit
Public Sub GetInfo()
Dim sResponse As String, HTML As New HTMLDocument, arrayItems() As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://www.thencc.org.uk/Our_Members/MemDetails.aspx?CompID=AAA-01-01", False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
With HTML
.body.innerHTML = sResponse
arrayItems = Split(.querySelector("#MainContent_lblDetails").outerHTML, "<BR>")
Debug.Print "Address: " & arrayItems(4) & Chr$(32) & arrayItems(5)
Debug.Print Replace$(Replace$(Replace$(arrayItems(7), "<B>", vbNullString), "</B", vbNullString), "- >", vbNullString)
End With
End Sub
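The splitting step can be tried on its own, without any network request. The sketch below assumes the markup has the same layout as the span shown in the question; ExtractAddress is a hypothetical name, and the case-insensitive split is there because the outerHTML returned by the HTML library may upper-case the tags (the code above splits on "<BR>").

```vba
Option Explicit

' Hypothetical helper: pulls the two address lines out of the details markup
' by splitting on line-break tags, ignoring the case of the tag.
Public Function ExtractAddress(ByVal detailsHtml As String) As String
    Dim parts() As String
    parts = Split(detailsHtml, "<br>", -1, vbTextCompare)
    ' Layout assumed: 0 name, 1 blank, 2 description, 3 blank, 4 street, 5 region
    If UBound(parts) >= 5 Then
        ExtractAddress = Trim$(parts(4)) & " " & Trim$(parts(5))
    End If
End Function

Public Sub DemoExtractAddress()
    Dim sample As String
    sample = "<B>AA Autovan Leisure Ltd</B><BR><BR>Servicing and repairs<BR><BR>" & _
             "7 Sycamore Lane, Holmfirth, Huddersfield, HD9 7RT<BR>" & _
             "West Yorkshire, England<BR><BR><B>Tel - </B>01484 683636"
    Debug.Print ExtractAddress(sample)
End Sub
```

If the page layout changes, the fixed indexes 4 and 5 will point at the wrong pieces, which is the main weakness of this approach.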

Copy data from Web to excel using VBA

I have a webpage and need my code to copy all the data from the page into an Excel sheet, which is not happening right now. My Excel sheet ends up completely blank. I think the ^a shortcut is not working in IE to select the data before copying it.
Any help is much appreciated. Below is the code I am using.
Sub Webdata()
Dim assetname As String, country As String, area As String, region As String, pth As String, folname As Variant, assetname1 As String
Website = "http://website.com/"
Set myIE = CreateObject("InternetExplorer.Application")
myIE.Navigate source
myIE.Visible = True
Application.Wait Now + TimeSerial(0, 0, 10)
SendKeys "^a"
Application.Wait Now + TimeSerial(0, 0, 2)
SendKeys "^c"
Application.Wait Now + TimeSerial(0, 0, 2)
Sheets.Add
ActiveSheet.Name = "Webdata"
ActiveSheet.Paste
Application.Wait Now + TimeSerial(0, 0, 2)
Range("A1").Select
Cells.Find(What:="Api Number", After:=ActiveCell, LookIn:= _
xlFormulas, LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:= _
xlNext, MatchCase:=False, SearchFormat:=False).Activate
ActiveCell.Offset(1, 0).Select
Selection.Copy
Sheets("Sheet1").Activate
Range("C2").Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
Application.CutCopyMode = False
myIE.Quit
Set myIE = Nothing
Err.Clear
Sheets("Webdata").Select
ActiveSheet.Delete
End Sub
That table is a mess, so rather than spending time perfecting how to write the table out to the sheet the way I normally would (looping over the table rows and the cells within each row), I will stick with your idea of copying the table, but use the clipboard with .SetText rather than SendKeys.
The table of interest is within nested frames so you have to negotiate those first.
Set hTable = .frames(2).document.getElementsByTagName("table")(0)
Code:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer, html As HTMLDocument, hTable As HTMLTable, clipboard As Object
Application.ScreenUpdating = False
With IE
.Visible = True
.navigate "http://pipeline.wyo.gov/Wellapi.cfm?oops=IDxxxxx&nAPINO=xxxxxx" '<==Input your personal URL here
While .Busy Or .readyState < 4: DoEvents: Wend
Set html = .document
With html
Set hTable = .frames(2).document.getElementsByTagName("table")(0)
Set clipboard = New MSForms.DataObject
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ActiveSheet.Cells(1, 1).PasteSpecial
End With
.Quit
End With
Application.ScreenUpdating = True
End Sub
References:
VBE > Tools > References:
Microsoft Forms 2.0 Object Library
Microsoft HTML Object Library
Microsoft Internet Controls
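For comparison, the row-and-cell loop mentioned at the start of this answer could be sketched as follows. WriteTableToSheet is a hypothetical helper; it takes the table as a late-bound object and writes each cell's text starting at A1, with no handling of merged cells (rowspan or colspan).

```vba
Option Explicit

' Hypothetical helper: copies the text of every cell in an HTML table
' to the given worksheet, one sheet cell per table cell.
Public Sub WriteTableToSheet(ByVal hTable As Object, ByVal ws As Worksheet)
    Dim r As Long, c As Long
    Dim tableRow As Object
    For r = 0 To hTable.Rows.Length - 1
        Set tableRow = hTable.Rows.Item(r)
        For c = 0 To tableRow.Cells.Length - 1
            ' HTML collections are 0-based, worksheet cells are 1-based
            ws.Cells(r + 1, c + 1).Value = tableRow.Cells.Item(c).innerText
        Next c
    Next r
End Sub
```

With the code above it would be called as WriteTableToSheet hTable, ActiveSheet in place of the three clipboard lines.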

Web Query where there are multiple Frames

My goal is to scrape the source code of a web page.
The site seems to have different Frames which is why my code won't work properly.
I tried to modify a code which I found online which should solve the Frame issue.
The following code creates an error (object required) at:
Set profileFrame = .document.getElementById("profileFrame")
Public Sub IE_Automation()
'Needs references to Microsoft Internet Controls and Microsoft HTML Object Library
Dim baseURL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim profileFrame As HTMLIFrame
Dim slotsDiv As HTMLDivElement
'example URL with multiple frames
baseURL = "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
Set IE = New InternetExplorer
With IE
.Visible = True
'Navigate to the main page
.navigate baseURL & "/publictrophy/index.htm?onlinename=ace_anubis"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
'Get the profileFrame iframe and navigate to it
Set profileFrame = .document.getElementById("profileFrame")
.navigate baseURL & profileFrame.src
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = .document
End With
'Display all the text in the profileFrame iframe
MsgBox HTMLdoc.body.innerText
'Display just the text in the slots_container div
Set slotsDiv = HTMLdoc.getElementById("slots_container")
MsgBox slotsDiv.innerText
End Sub
Hummmm, I'm not exactly sure what you are doing here, but can you try the code below?
Option Explicit
Sub Sample()
Dim ie As Object
Dim links As Variant, lnk As Variant
Dim rowcount As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
'Wait for site to fully load
'ie.Navigate2 URL
Do While ie.Busy Or ie.readyState <> 4
DoEvents
Loop
Set links = ie.document.getElementsByTagName("a")
rowcount = 1
With Sheets("Sheet1")
For Each lnk In links
'Debug.Print lnk.innerText
'If lnk.classname Like "*Real Statistics Examples Part 1*" Then
.Range("A" & rowcount) = lnk.innerText
rowcount = rowcount + 1
'Exit For
'End If
Next
End With
End Sub
General:
I think in your research you may have come across this question and misunderstood how it relates/doesn't relate to your circumstance.
I don't think iFrames are relevant to your query. If you are after the list of names, their details and the URLs to their pages you can use the code below.
CSS Selectors
To target the elements of interest I use the following two CSS selectors. These use style information on the page to target the elements:
.SearchResults-link
.SearchResults-item
"." means class, which is like saying .getElementsByClassName. The first gets the links, and the second gets the description information on the first page.
With respect to the first CSS selector: The actual link required is dynamically constructed, but we can use the fact that the actual profile URLs have a common base string of "https://www.xing.com/profile/", which is then followed by the profileName. So, in function GetURL, we parse the outerHTML returned by the CSS selector to get the profileName and concatenate it with the BASESTRING constant to get our actual profile link.
Code:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.xing.com/publicsearch/query?search%5Bq%5D=IT"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim a As Object, exitTime As Date, linksNodeList As Object, profileNodeList As Object
' exitTime = Now + TimeSerial(0, 0, 5) '<== uncomment this section if timing problems
'
' Do
' DoEvents
' On Error Resume Next
' Set linksNodeList = .document.querySelectorAll(".SearchResults-link")
' On Error GoTo 0
' If Now > exitTime Then Exit Do
' Loop While linksNodeList Is Nothing
Set linksNodeList = .document.querySelectorAll(".SearchResults-link") '<== comment this out if uncommented section above
Set profileNodeList = .document.querySelectorAll(".SearchResults-item")
Dim i As Long
For i = 0 To profileNodeList.Length - 1
Debug.Print "Profile link: " & GetURL(linksNodeList.item(i).outerHTML)
Debug.Print "Basic info: " & profileNodeList.item(i).innerText
Next i
End With
End Sub
Public Function GetURL(ByVal htmlSection As String) As String
Const BASESTRING As String = "https://www.xing.com/profile/"
Dim arr() As String
arr = Split(htmlSection, "/")
GetURL = BASESTRING & Replace$(Split((arr(UBound(arr) - 1)), ">")(0), Chr$(34), vbNullString)
End Function
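To see how the parsing in GetURL works, it helps to run the same steps on a sample string. The markup below is made up for illustration (the real outerHTML on the site may look different), and ProfileUrlFromHtml is a hypothetical copy of GetURL with the intermediate values named.

```vba
Option Explicit

' Same logic as GetURL, written out step by step.
Public Function ProfileUrlFromHtml(ByVal htmlSection As String) As String
    Const BASESTRING As String = "https://www.xing.com/profile/"
    Dim arr() As String, segment As String, profileName As String
    arr = Split(htmlSection, "/")
    ' The second-to-last piece holds the profile name plus trailing markup
    segment = arr(UBound(arr) - 1)
    ' Keep the text before the first ">" and strip the closing quote
    profileName = Replace$(Split(segment, ">")(0), Chr$(34), vbNullString)
    ProfileUrlFromHtml = BASESTRING & profileName
End Function

Public Sub DemoProfileUrl()
    Dim sample As String
    ' Hypothetical link markup, not copied from the site
    sample = "<a class=""SearchResults-link"" href=""/profile/Jane_Doe"">Jane Doe</a>"
    Debug.Print ProfileUrlFromHtml(sample)
End Sub
```

For this sample the second-to-last piece is Jane_Doe">Jane Doe<, so the function returns https://www.xing.com/profile/Jane_Doe. Any extra "/" inside the link text would shift the pieces and break the result.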