I'm creating a program that visits each link in Column A and downloads the page as HTML/PDF to my local storage. So far, the program can visit each link listed in the column.
Code for opening the links one by one:
Sub visitLinks()
    Dim extractedLinks As Range
    Dim urls As String
    Dim appIE As Object
    Dim LastRow As Long
    Dim rCell As Range
    Dim rRng As Range
    Dim sht As Worksheet
    Application.ScreenUpdating = False
    Set appIE = CreateObject("InternetExplorer.Application")
    Set sht = ThisWorkbook.Worksheets("s1")
    LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
    Set rRng = sht.Range("A2:A" & LastRow)
    For Each rCell In rRng.Cells
        With appIE
            .navigate rCell.Value
            .Visible = True
        End With
        Do While appIE.readyState <> 4: Wait 5: Loop
        Application.StatusBar = "Downloading as HTML..."
        DoEvents
        On Error Resume Next
    Next rCell
    appIE.Quit
    Set appIE = Nothing
    Application.StatusBar = ""
    Application.ScreenUpdating = True
End Sub
------------------------
Private Sub Wait(ByVal nSec As Long)
    nSec = nSec + Timer
    While nSec > Timer
        DoEvents
    Wend
End Sub
What I'm trying to do is: after the program visits a link, it should save that webpage as HTML/PDF into C:......\folder\[name].html/.pdf.
If possible, I would also like it to take the file name from column B.
Column A        Column B
google.com      google
facebook.com    facebook
The files located in C:......\folder\ should be google.html/.pdf and facebook.html/.pdf
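A minimal sketch of one way the HTML part could work (assumptions: the page has finished loading, the folder C:\Downloads\ is a placeholder for your own path, and the file name is read from column B, one cell to the right of the URL):
Private Sub SavePageAsHtml(ByVal appIE As Object, ByVal rCell As Range)
    'Placeholder folder - replace with your own target path
    Const TARGET_FOLDER As String = "C:\Downloads\"
    Dim f As Integer
    Dim fileName As String
    'File name taken from column B, one cell to the right of the URL in column A
    fileName = TARGET_FOLDER & rCell.Offset(0, 1).Value & ".html"
    f = FreeFile()
    Open fileName For Output As #f
    Print #f, appIE.document.documentElement.outerHTML 'rendered HTML of the loaded page
    Close #f
End Sub
It could be called from inside the loop, right after the readyState check, as SavePageAsHtml appIE, rCell. Saving as PDF would need a different route (for example printing to a PDF printer), which isn't sketched here.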
I am having a problem converting a class element into a table.
My code copies the information but pastes it all into one cell, and I would like to transform it into a table identical to the one on the website.
Issue: Sheet1.Range("A1").Value = objIE.document.getElementsByClassName("niwa-table regionalIndices")(0).innerText
I am selecting just A1 but I can't figure out how to transform this element into a table.
Thanks in advance.
My Code:
Sub ExtractLastValue()
    Dim objIE As Object
    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Visible = True
    objIE.navigate "https://fireweather.niwa.co.nz/region/Otago"
    Dim t As Date, ele As Object
    Const MAX_WAIT_SEC As Long = 8 '<==Adjust wait time
    While objIE.Busy Or objIE.readyState < 4: DoEvents: Wend
    t = Timer
    Do
        DoEvents
        On Error Resume Next
        Set ele = objIE.document.getElementById("")
        If Timer - t > MAX_WAIT_SEC Then Exit Do
        On Error GoTo 0
    Loop While ele Is Nothing
    If Not ele Is Nothing Then
        'do something
    End If
    Sheet1.Range("A1").Value = objIE.document.getElementsByClassName("niwa-table regionalIndices")(0).innerText
End Sub
Result:
The easiest thing to do is to wrap the table's HTML in an <html> tag, copy it to the clipboard, and paste it into the target range.
Sub ExtractLastValue()
    Dim objIE As Object
    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Visible = True
    objIE.navigate "https://fireweather.niwa.co.nz/region/Otago"
    Dim t As Date, ele As Object
    Const MAX_WAIT_SEC As Long = 8 '<==Adjust wait time
    While objIE.Busy Or objIE.readyState < 4: DoEvents: Wend
    t = Timer
    Do
        DoEvents
        On Error Resume Next
        Set ele = objIE.document.getElementById("")
        If Timer - t > MAX_WAIT_SEC Then Exit Do
        On Error GoTo 0
    Loop While ele Is Nothing
    If Not ele Is Nothing Then
        'do something
    End If
    'Wrap the table's outerHTML in an <html> tag so Excel parses it as HTML on paste
    Dim HTML As String
    HTML = "<html>" & objIE.document.getElementsByClassName("niwa-table regionalIndices")(0).outerHTML & "</html>"
    With CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") ' New DataObject (MSForms)
        .SetText HTML
        .PutInClipboard
    End With
    'Pasting HTML from the clipboard rebuilds the table across rows and columns
    With Sheet1
        .Range("A1").PasteSpecial
        With .Range("A1").CurrentRegion
            .WrapText = False
            .Columns.AutoFit
        End With
    End With
End Sub
Result
This is the program I created. The goal of this program is to visit each link in a specific cell range and get the "a href" of each listed link.
I used a list of links, but there's a certain link that ends with .pdf, and from there I get a type mismatch. Is there a way I could make my program continue and just skip the error it gets from that specific link?
This is the link that causes the error: https://ir-web-assets-v.s3.amazonaws.com/uploads/nuggets/5d40644eafe17554cf969aab/Islands_Locals_Program_Guest_FAQ.pdf
Sub extensiveScrape()
    Dim extractedLinks As Range 'Links taken from RUN
    Dim urls As String 'Links taken from Extensive Search
    Dim appIE As Object
    Dim LastRow As Long 'Number of rows
    Dim rCell As Range
    Dim rRng As Range
    Dim sht As Worksheet
    Dim html2 As HTMLDocument
    Dim itemEle As Object
    Dim linkurl As Object
    Dim y As Integer
    Application.ScreenUpdating = False
    Set appIE = CreateObject("InternetExplorer.Application")
    Set sht = ThisWorkbook.Worksheets("results")
    LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
    Set rRng = sht.Range("A1:A" & LastRow)
    For Each rCell In rRng.Cells
        With appIE
            .navigate rCell.Value
            .Visible = True
        End With
        Do While appIE.readyState <> 4: Wait 5: Loop
        Application.StatusBar = "Scraping Extensively..."
        DoEvents
        Set html2 = appIE.document
        Set itemEle = html2.getElementsByTagName("a")
        y = 1
        For Each linkurl In itemEle
            Sheets("results").Range("B" & y).Value = linkurl
            y = y + 1
        Next
        'rCell.Offset(0, 1).Value = itemEle
    Next rCell
    appIE.Quit
    Set appIE = Nothing
    Application.StatusBar = ""
    Application.ScreenUpdating = True
End Sub
Private Sub Wait(ByVal nSec As Long)
    nSec = nSec + Timer
    While nSec > Timer
        DoEvents
    Wend
End Sub
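The mismatch most likely comes from reading appIE.document into an HTMLDocument variable when the target is a PDF rather than an HTML page. A rough sketch of one way to skip such links (a sketch only, with the document taken late-bound rather than the original declaration):
Sub extensiveScrapeSkipErrors()
    'Sketch only: same sheet and columns as above, but the document is taken late-bound
    'and the link is skipped when the loaded document is not an HTML page (e.g. a .pdf)
    Dim appIE As Object, sht As Worksheet
    Dim rCell As Range, rRng As Range
    Dim html2 As Object, linkurl As Object
    Dim LastRow As Long, y As Long
    Set appIE = CreateObject("InternetExplorer.Application")
    appIE.Visible = True
    Set sht = ThisWorkbook.Worksheets("results")
    LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
    Set rRng = sht.Range("A1:A" & LastRow)
    y = 1
    For Each rCell In rRng.Cells
        appIE.navigate rCell.Value
        Do While appIE.Busy Or appIE.readyState <> 4: DoEvents: Loop
        Set html2 = Nothing
        On Error Resume Next            'a PDF viewer document has no HTML DOM to read
        Set html2 = appIE.document
        On Error GoTo 0
        'For real HTML pages TypeName is typically "HTMLDocument"; anything else is skipped
        If Not html2 Is Nothing Then
            If TypeName(html2) = "HTMLDocument" Then
                For Each linkurl In html2.getElementsByTagName("a")
                    sht.Range("B" & y).Value = linkurl.href
                    y = y + 1
                Next linkurl
            End If
        End If
    Next rCell
    appIE.Quit
End Sub
Unlike the original, y is not reset for each page here, so the links from every page accumulate down column B rather than overwriting the previous list.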
I am pulling data from the NSE site.
The URL is: https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=28MAY2020#
I can successfully extract the item using Internet Explorer; however, this method is slow, so I moved to the MSXML2.XMLHTTP60 method, but that method returns a null string.
Please find my code below.
Method 1: works fine
Sub OI_Slow_Method()
    Dim ie As New InternetExplorer
    Set ie = CreateObject("InternetExplorer.Application")
    Dim Link As String
    Link = ActiveSheet.Range("C4").Value
    ie.Visible = False
    ie.navigate Link
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE
    Dim doc As HTMLDocument
    Set doc = ie.document
    Dim objElement As HTMLObjectElement
    Dim sDD As String
    doc.Focus
    ActiveSheet.Cells(1, 1).Value = doc.getElementById("openInterest").innerText 'Open Interest Value
    ie.Quit
    ie.Visible = True
    Set doc = Nothing
    Set ie = Nothing
End Sub
'--------------------------
Method 2: help required in this method only
Sub OI_Fast_Method()
    Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument
    With xhr
        .Open "GET", "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=30APR2020#", False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With
    Debug.Print html.getElementById("openInterest").innerText
    'The output of this is "<SPAN id=openInterest>??</SPAN>" - only a question mark is returned inside the SPAN
End Sub
I think Tim hit the nail on the head, as always. You are getting some raw XML and the stuff you want is not in that XML. You can do a data dump and get what you want.
Sub DumpData()
    Dim ie As Object, URL As String
    Dim RowCount As Long, itm As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    URL = "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=28MAY2020#"
    'Wait for site to fully load
    ie.Navigate2 URL
    Do While ie.Busy = True
        DoEvents
    Loop
    RowCount = 1
    With Sheets("Sheet1")
        .Cells.ClearContents
        RowCount = 1
        For Each itm In ie.Document.all
            .Range("B" & RowCount) = Left(itm.innerText, 1024)
            RowCount = RowCount + 1
        Next itm
    End With
End Sub
Then you would have to parse the text. It's not hard, but it will be a little extra labor.
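As a rough illustration of that parsing step (assuming only that the label "Open Interest" appears somewhere in the dumped text), you could scan column B for the label and inspect the matching rows:
Sub FindOpenInterestRows()
    'Sketch only: scans the dump written by DumpData above and prints any row
    'whose text contains the label "Open Interest", so you can see where the value sits
    Dim ws As Worksheet, lastRow As Long, r As Long
    Set ws = Sheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    For r = 1 To lastRow
        If InStr(1, ws.Cells(r, "B").Value, "Open Interest", vbTextCompare) > 0 Then
            Debug.Print "Row " & r & ": " & ws.Cells(r, "B").Value
        End If
    Next r
End Sub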
Another option may be to download the entire contents of the website, save it as a text file, import the data, and then parse that data.
Sub Sample()
    Dim ie As Object
    Dim retStr As String
    Set ie = CreateObject("internetexplorer.application")
    With ie
        .Navigate "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=VOLTAS&instrument=FUTSTK&type=-&strike=-&expiry=28MAY2020#"
        .Visible = True
    End With
    Do While ie.readystate <> 4: Wait 5: Loop
    DoEvents
    retStr = ie.document.body.innerText
    '~> Write the above to a text file
    Dim filesize As Integer
    Dim FlName As String
    '~~> Change this to the relevant path
    FlName = "C:\Users\ryans\OneDrive\Desktop\Sample.Txt"
    filesize = FreeFile()
    Open FlName For Output As #filesize
    Print #filesize, retStr
    Close #filesize
End Sub
Private Sub Wait(ByVal nSec As Long)
    nSec = nSec + Timer
    While nSec > Timer
        DoEvents
    Wend
End Sub
I couldn't get either of your code samples to run on my machine.
I am setting up a spreadsheet that will get the NAV of specific index funds every day from the BlackRock website. There are a number of these index funds, however, and the URL changes for each one. I am unsure how to work my way down the spreadsheet table I have made and place the target value in another cell.
I have tried listing every possible URL one after another and it seems to work; however, I keep receiving a run-time error, so I assume there is a more efficient way of doing this.
Sub GetCurrentPrices()
    Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")
    Dim text As String
    With CreateObject("internetexplorer.application")
        .navigate "https://www.blackrock.com/uk/individual/products/xxxxxx/"
        Do While .Busy And .readyState <> 4: DoEvents: Loop
        text = .document.getElementsByClassName("header-nav-data")(0).innerText
        .Quit
    End With
    ws.Cells(32, 1).Value = text
    With CreateObject("internetexplorer.application")
        .navigate "https://www.blackrock.com/uk/individual/products/yyyyyy/"
        Do While .Busy And .readyState <> 4: DoEvents: Loop
        text = .document.getElementsByClassName("header-nav-data")(0).innerText
        .Quit
    End With
    ws.Cells(33, 1).Value = text
End Sub
Currently, this code is copied about 22 times for each different URL. I have also tried something like this:
Sub GetCurrentPrices2()
    Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")
    Dim text As String
    Dim i As Integer
    i = 32
    With CreateObject("internetexplorer.application")
        .navigate "https://www.blackrock.com/uk/individual/products/" & (ws.Range("H2:H24").Value) & "/"
        Do While .Busy And .readyState <> 4: DoEvents: Loop
        text = .document.getElementsByClassName("header-nav-data")(0).innerText
        .Quit
    End With
    Do Until i > 46
        ws.Cells(i, 1).Value = text
    Loop
End Sub
Assuming that Column A contains the index fund of interest, and that the data starts at Row 2, your code can be re-written as follows...
Sub GetCurrentPrices()
    Dim ws As Worksheet
    Dim text As String
    Dim lastRow As Long
    Dim i As Long
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With ws
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
    End With
    With CreateObject("InternetExplorer.Application")
        For i = 2 To lastRow
            .navigate "https://www.blackrock.com/uk/individual/products/" & ws.Cells(i, 1).Value & "/"
            Do While .Busy And .readyState <> 4: DoEvents: Loop
            text = .document.getElementsByClassName("header-nav-data")(0).innerText
            ws.Cells(i, 2).Value = text
        Next i
        .Quit
    End With
    Set ws = Nothing
End Sub
I had been successfully pulling mutual fund performance data from Marketwatch.com using the following code:
Dim A As Long
Dim B As Long
Dim C As Long
Dim Z As Long
For Z = 1 To 35
    Range("A1").Select
    ActiveCell.Offset((37 + (Z * 10)), 0).Select
    If ActiveCell.Value = "" Then
        Exit For
    Else
    End If
    Dim oHTML As Object
    Dim oTable As Object
    Dim x As Long
    Dim Y As Long
    Dim vData As Variant
    Set oHTML = CreateObject("HTMLFile")
    With CreateObject("WinHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.marketwatch.com/investing/fund/vfinx", False
        .send
        oHTML.body.innerhtml = .responsetext
    End With
    For Each oTable In oHTML.Getelementsbytagname("table")
        If oTable.classname = "fundstable" Then
            ReDim vData(1 To oTable.Rows.Length, 1 To oTable.Rows(1).Cells.Length)
            For x = 1 To UBound(vData)
                For Y = 1 To UBound(vData, 2)
                    vData(x, Y) = oTable.Rows(x - 1).Cells(Y - 1).innertext
                Next Y
            Next x
            With ActiveCell.Offset(1, 0)
                .Resize(UBound(vData), UBound(vData, 2)).Value = vData
            End With
            Exit For
        End If
    Next oTable
Next Z
Unfortunately, Marketwatch has added a Captcha to stop bots (i.e. me) from scraping their data. I don't know of any way around this, so I figured I'd try another site.
I looked at Morningstar: http://performance.morningstar.com/fund/performance-return.action?t=VFINX&region=usa&culture=en_US
It appears that the table I want on that page would be: "table.r_table3 width955px print97" or just "r_table3 width955px print97", but neither one seems to work for me.
Any ideas?
Thanks!
The data is loaded by JavaScript and won't be available via an XMLHTTP request, as scripts won't have run to load the content.
You can use that second link, for example, with IE and introduce a wait to ensure the info is loaded. I show getting the table with that class name at index 1. You can change the index here:
ele.item(1).outerHTML
So, for the next table, use clipboard.SetText ele.item(2).outerHTML.
You can also loop over ele.Length to get each table, but make sure you write out to a different cell when you paste (a fuller sketch of this follows after the main code below):
Dim i As Long
For i = 0 To ele.Length - 1
    clipboard.SetText ele.item(i).outerHTML
    'Etc
Next
VBA:
Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer, clipboard As Object 'requires a reference to Microsoft Internet Controls
    Dim ele As Object, ws As Worksheet, t As Date, tableCount As Long
    Const MAX_WAIT_SEC As Long = 5
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    With IE
        .Visible = True
        .navigate "http://performance.morningstar.com/fund/performance-return.action?t=VFINX&region=usa&culture=en_US"
        While .Busy Or .readyState < 4: DoEvents: Wend
        With .document
            t = Timer
            Do
                DoEvents
                On Error Resume Next
                Set ele = .querySelectorAll(".r_table3.print97")
                tableCount = ele.Length
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While tableCount < 3
            If Not ele Is Nothing Then
                clipboard.SetText ele.item(1).outerHTML
                clipboard.PutInClipboard
                ws.Cells(1, 1).PasteSpecial
            End If
        End With
        .Quit
    End With
End Sub
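And if you want every matched table rather than only index 1, here is a rough sketch of the looping mentioned earlier (same page and selector; the fixed five-second wait and the two-row gap between pasted tables are assumptions, not part of the original answer):
Public Sub GetAllTables()
    Dim IE As New InternetExplorer, clipboard As Object 'requires a reference to Microsoft Internet Controls
    Dim ele As Object, ws As Worksheet, destCell As Range, i As Long
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    With IE
        .Visible = True
        .navigate "http://performance.morningstar.com/fund/performance-return.action?t=VFINX&region=usa&culture=en_US"
        While .Busy Or .readyState < 4: DoEvents: Wend
        Application.Wait Now + TimeSerial(0, 0, 5) 'crude fixed wait for the javascript-built tables
        Set ele = .document.querySelectorAll(".r_table3.print97")
        Set destCell = ws.Cells(1, 1)
        For i = 0 To ele.Length - 1
            clipboard.SetText ele.item(i).outerHTML
            clipboard.PutInClipboard
            destCell.PasteSpecial
            'next paste goes two rows below the last used row in column A
            Set destCell = ws.Cells(ws.Rows.Count, 1).End(xlUp).Offset(2, 0)
        Next i
        .Quit
    End With
End Sub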