Trying to only extract a piece of innertext - html

Option Explicit
Sub VBAWebscraping2()
Dim IEObject As Object
Set IEObject = New InternetExplorer
IEObject.Visible = True
IEObject.navigate url:="https://streeteasy.com/building/" & Cells(2, 4).Value
Do While IEObject.Busy = True Or IEObject.readyState <> READYSTATE_COMPLETE
Application.Wait Now + TimeValue("00:00:01")
Loop
Dim IEDocument As HTMLDocument
Set IEDocument = IEObject.document
'GRAB by classname'
Dim IEElements As IHTMLElementCollection
Dim IEElement As IHTMLElement
Set IEElements = IEDocument.getElementsByClassName("details")
For Each IEElement In IEElements
If IEElement.innerText = "price" Then
Debug.Print (IEElement.innerText)
End If
Exit For
Next
'Dim lastRow As Long
'lastRow = Range("A" & Rows.count).End(xlUp).row
End Sub
It navigates to the desired page, but when it tries to extract the price it either prints nothing in the Immediate window or grabs too much. I'm trying to figure out a more precise way to grab only the price. A lot of the tutorials mention id tags, but there aren't many on the website.
This is the website I'm trying to scrape https://streeteasy.com/building/the-cambridge-500-east-85-street-new_york/15l?card=1
Also trying to retrieve number of rooms, baths, and neighborhood

For the price you can use the class:
IEDocument.querySelector(".price").innerText
Same for rooms:
IEDocument.querySelector(".first_detail_cell").innerText
Beds:
IEDocument.querySelector("[class='detail_cell ']").innerText
Baths:
IEDocument.querySelector(".last_detail_cell").innerText
Looking at how the classes are named, your mileage with different pages will almost certainly vary if any of these items are missing or in a different order. You will then need to loop the element list returned by selecting for class .detail_cell and test the .innerText for the presence of the text "room", "bed" or "bath", and assign accordingly.
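The loop described above might look like the following sketch. The names ClassifyDetail and PrintDetails are illustrative and not part of the original code, and the keywords assume the cells read something like "4 rooms", "2 beds", "2 baths". "bed" is tested before "room" so that a cell such as "2 bedrooms" is not misfiled as rooms.

```vba
Option Explicit

' Hypothetical helper: maps a detail cell's text to "beds", "baths", "rooms" or "".
Public Function ClassifyDetail(ByVal cellText As String) As String
    Dim t As String
    t = LCase$(cellText)
    ' Test "bed" before "room" so that "2 bedrooms" counts as beds.
    If InStr(t, "bed") > 0 Then
        ClassifyDetail = "beds"
    ElseIf InStr(t, "bath") > 0 Then
        ClassifyDetail = "baths"
    ElseIf InStr(t, "room") > 0 Then
        ClassifyDetail = "rooms"
    Else
        ClassifyDetail = vbNullString
    End If
End Function

' Loops every .detail_cell node and assigns its text by keyword,
' so missing or reordered cells no longer matter.
Public Sub PrintDetails(ByVal IEDocument As Object)
    Dim nodes As Object, i As Long
    Dim rooms As String, beds As String, baths As String
    Set nodes = IEDocument.querySelectorAll(".detail_cell")
    For i = 0 To nodes.Length - 1
        Select Case ClassifyDetail(nodes.item(i).innerText)
            Case "rooms": rooms = Trim$(nodes.item(i).innerText)
            Case "beds": beds = Trim$(nodes.item(i).innerText)
            Case "baths": baths = Trim$(nodes.item(i).innerText)
        End Select
    Next i
    Debug.Print rooms, beds, baths
End Sub
```

Call it as PrintDetails IEDocument once the page has finished loading. Any item that is absent from the page simply stays an empty string.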

Related

How can I scrape multiple pages/links at once using VBA?

I'm currently trying to scrape info from this Reddit page. My goal is to make Excel open all the posts in new tabs, and then I want to scrape information from each of those pages, since the starting page doesn't have as much information.
I've been trying for the last few hours to figure this out, but I'm admittedly pretty confused about how to do it, just overall unsure what to do next, so any pointers would be greatly appreciated!
Here is my current code, it works decently enough but as I said, I'm not sure what I should do next to open the links it finds one by one and scrape each page for data.
The links are scraped off that first page and then added to my spreadsheet right now, but if possible I'd like to just skip that step and scrape them all at once.
Thanks! :)
Sub GetData()
Dim objIE As InternetExplorer
Dim itemEle As Object
Dim upvote As Integer, awards As Integer, animated As Integer
Dim postdate As String, upvotepercent As String, oc As String, filetype As String, linkurl As String, myhtmldata As String, visiComments As String, totalComments As String, removedComments As String
Dim y As Integer
Set objIE = New InternetExplorer
objIE.Visible = False
objIE.navigate (ActiveCell.Value)
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
y = 1
For Each itemEle In objIE.document.getElementsByClassName("flat-list buttons")
visiComments = itemEle.getElementsByTagName("a")(0).innerText
linkurl = itemEle.getElementsByTagName("a")(0).href
Sheets("Sheet1").Range("A" & y).Value = visiComments
Sheets("Sheet1").Range("B" & y).Value = linkurl
y = y + 1
Next
End Sub
You should be able to gather the urls, then visit each one in a loop, write the results from each visited page to an array, and finally write the array to the sheet.
Note: Opening every post in its own tab would, at best, save time on the page loads, as VBA is single threaded. To do that you would need to store a reference to each tab, or open all first, then loop through relevant open windows to do the scrape. My preference would be to keep in same tab to be honest.
Add this after your existing line
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Add:
Dim nodeList As Object, i As Long, urls(), results()
Set nodeList = objIE.document.querySelectorAll(".comments")
ReDim urls(0 To nodeList.Length - 1)
ReDim results(1 To nodeList.Length, 1 To 3)
'Store all urls in an array to later loop
For i = 0 To nodeList.Length - 1
urls(i) = nodeList.item(i).href
Next
For i = LBound(urls) To UBound(urls)
objIE.Navigate2 urls(i)
While objIE.Busy Or objIE.readyState <> 4: DoEvents: Wend
'may need a pause here
results(i + 1, 1) = objIE.document.querySelector("a.title").innerText 'title
results(i + 1, 2) = objIE.document.querySelector(".number").innerText 'upvotes
results(i + 1, 3) = objIE.document.querySelector(".word").NextSibling.nodeValue '%
Next
ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results

VBA Get HTML Element Info for Changing Id

I am trying to create an Excel web scraper that logs into my company's ticket tracking system and logs certain information on the sheet (Lead assigned, Desired Date for the project, etc.). I was doing fine until I had to pull a field off the website that has a changing ID.
For example, on two pages the same field will have the IDs:
"cq_widget_CqFilteringSelect_32"
"cq_widget_CqFilteringSelect_9"
Can somebody provide guidance on how I should find the "IT Lead" value and paste it into Excel?
HTML snippet of div
Snippet of actual website
Setup in excel
Below is what I have so far
I get confused in this area:
lead = objCollection(i).Value
Sub CQscrub()
Dim i As Long
Dim objElement As Object
Dim objCollection As Object
Dim objCollection2 As Object
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim numbers() As String
Dim size As Integer
Dim row As Integer
Dim objLead As Object
Dim objLead2 As Object
Dim lead As String
Dim counter As Integer
size = WorksheetFunction.CountA(Worksheets(1).Columns(1)) - 4
ReDim numbers(size)
For row = 10 To (size + 10)
numbers(row - 10) = Cells(row, 1).Value
'Cells(row, 2) = numbers(row - 10)
Next row
Set ie = New InternetExplorer
ie.Height = 1000
ie.Width = 1000
ie.Visible = True
ie.navigate "http://clearquest/cqweb/"
Application.StatusBar = "Loading http://clearquest/cqweb"
Do While ie.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Application.StatusBar = "Searching form. Please wait..."
'Had these below as comment
Dim WRnumber1 As String
WRnumber1 = Range("A10").Value
'Range("A6").Value = WRnumber1
Dim iLastRow As Integer
Dim Rng As Range
iLastRow = Cells(Rows.Count, "a").End(xlUp).row 'last row of A
'Set objCollection = ie.document.getElementsByTagName("input") originally here
For counter = 0 To size - 1
Set objCollection = ie.document.getElementsByTagName("input")
i = 0
While i < objCollection.Length
If objCollection(i).Name = "cqFindRecordString" Then
objCollection(i).Value = numbers(counter)
End If
i = i + 1
Wend
'''''''''''''''''' Find Label ''''''''''''''''''''''''''''
Set objCollection = ie.document.getElementsByTagName("label")
i = 0
While i < objCollection.Length
If objCollection(i).innerText = "IT Lead/Assigned To" Then
lead = objCollection(i).Value
'Set objLead = objCollection(i)
End If
i = i + 1
Wend
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Range("B" & (iLastRow - (size - counter - 1))).Value = lead
Set objElement = ie.document.getElementById("cqFindRecordButton")
objElement.Click
Do While ie.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Application.Wait (Now + TimeValue("0:00:02"))
Next counter
ie.Quit
Set ie = Nothing
Set objElement = Nothing
Set objCollection = Nothing
Application.StatusBar = ""
MsgBox "Done!"
End Sub
Note: Website is internal only
Goal: Select Name under "IT Lead/Assigned To" field and paste to Excel
Thanks
Regarding the supplied code, tl;dr.
But if you are wanting the scratched out portion you supplied in your HTML snippet, the following may work (I can't test something that I don't have access to :D).
There are many different ways to grab an element; with this method you are grabbing the first instance of the class name dijitReset dijitInputField dijitInputContainer. Class names are not always unique, but given how specific this class name is, I feel fairly safe that in your case it is.
You could have used one line to Set yourObj... but for demonstration purposes I decided to break it up. 1-liner method to Set your obj:
Set yourObj = doc.getElementsByClassName("dijitReset dijitInputField dijitInputContainer")(0).getElementsByTagName("input")(1)
Code Snippet:
Sub getElementFromIE()
Dim ie As InternetExplorer
' ... your above code pulls up webpage ...
'''''''''''''''''' Find Label ''''''''''''''''''''''''''''
Dim doc As HTMLDocument, yourObj As Object, lead As String
Set doc = ie.document
' I assume the class name is unique? If so, just append (0) as I did below
Set yourObj = doc.getElementsByClassName("dijitReset dijitInputField dijitInputContainer")(0)
Set yourObj = yourObj.getElementsByTagName("input")(1)
lead = yourObj.Value
End Sub
The reason for the (1) on Set yourObj = yourObj.getElementsByTagName("input")(1) is because there are 2 input tags after your class dijitReset.... You are wanting the 2nd instance of this tag, which contains your value; and as you are probably already aware, you are using Base 0, meaning the 2nd instance is actually the number 1.
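Since the numeric suffix of the ID changes from page to page, another option is to start from the label text, which the question's code already locates. A label element has no value of its own, so the value has to be read from the field it points to. The following is only a sketch, assuming the label carries a for attribute naming the field's ID (the snippet does not confirm this); FindValueByLabel is a hypothetical name.

```vba
Option Explicit

' Hypothetical helper: returns the value of the field whose <label> text matches.
' Assumes the label has a "for" attribute holding the field's (changing) id.
Public Function FindValueByLabel(ByVal doc As Object, ByVal labelText As String) As String
    Dim labels As Object, target As Object, i As Long
    Set labels = doc.getElementsByTagName("label")
    For i = 0 To labels.Length - 1
        If Trim$(labels(i).innerText) = labelText Then
            ' htmlFor exposes the label's "for" attribute.
            If Len(labels(i).htmlFor) > 0 Then
                Set target = doc.getElementById(labels(i).htmlFor)
                If Not target Is Nothing Then FindValueByLabel = target.Value
            End If
            Exit Function
        End If
    Next i
End Function
```

Usage would be along the lines of lead = FindValueByLabel(ie.document, "IT Lead/Assigned To"). If the label turns out to have no for attribute, the function returns an empty string and the class-name approach above remains the fallback.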

Excel Macros - Using Excel Data to scrape HTML page

As with many other questions, I'm really new to using Excel macros and HTML.
I'm building a table with entries of items from Dungeons and Dragons (DnD) that I want to get descriptions of from a particular web page: http://www.5esrd.com/gamemastering/magic-items/ . For example, An entry in my table would be Adamantine Armor, so I would want the macro to search the web page for that listing, scrape the description that's entered on the webpage and save that to the excel document. In this case, that specific description (as per the 5esrd webpage) is as follows:
Armor (medium or heavy, but not hide), uncommon
This suit of armor is reinforced with adamantine, one of the hardest substances in existence. While you’re wearing it, any critical hit against you becomes a normal hit.
On this page, all the items are links to pages that have them grouped by type (Sword, wand, etc.). I would like to iterate through my row of items in Excel, search for that item on the page and scrape its description into my excel table.
I was following along this page: https://www.wiseowl.co.uk/blog/s393/scrape-website-html.htm, which scrapes data from stackoverflow's home page. It directly targets some of the tables that the web page uses to organize its entries. The DnD page I'm trying to use doesn't organize its entries into tables like this, so I'm a little lost with how to proceed.
If anyone could help point me in the right direction, I would be very grateful!
This is the code I use for scraping a web page on our company site. I put all my data in column B, so make adjustments accordingly. This should get you started.
Sub TestScrape()
' SCRAPE A WEB PAGE
Dim ieDoc As New HTMLDocument 'ieDocDocument
Dim tdCollection As Object 'table that has the javascript attributes and contains the element I want to click
Dim AnchorLinks As Object
Dim tdElements As Object
Dim tdElement As Object
Dim AnchorLink As Object
Dim lRow As Long
Dim ie As InternetExplorer
Dim cls As IHTMLElementCollection
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Left = 0
.TheaterMode = True '<<-- Comment this out if you don't want Theater Mode
.Navigate url:="Enter your URL here"
While .ReadyState < 4 Or .Busy: DoEvents: Wend
End With
Application.Wait Now + TimeSerial(0, 0, 4)
ieDoc.body.innerHTML = ie.Document.body.innerHTML
With ieDoc.body
Set AnchorLinks = .getElementsByTagName("table")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 2).Value = tdElement.innerText
lRow = lRow + 1
Next
Application.Wait (Now + TimeValue("0:00:1"))
With ie
.TheaterMode = False
.Quit
End With
Set ie = Nothing
End Sub
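Because the items on the 5esrd page are links rather than table cells, one way to adapt the code above is to search the anchors instead of the td elements. A rough sketch, assuming each item's link text equals the name stored in the sheet; FindLinkByText is an illustrative name, and the linked page still has to be inspected to work out where the description sits.

```vba
Option Explicit

' Hypothetical helper: returns the href of the first link whose text matches
' itemName (case-insensitive), or an empty string when no link matches.
Public Function FindLinkByText(ByVal doc As Object, ByVal itemName As String) As String
    Dim links As Object, i As Long
    Set links = doc.getElementsByTagName("a")
    For i = 0 To links.Length - 1
        If StrComp(Trim$(links(i).innerText), itemName, vbTextCompare) = 0 Then
            FindLinkByText = links(i).href
            Exit Function
        End If
    Next i
End Function
```

With item names in column B, something like Cells(lRow, 3).Value = FindLinkByText(ie.Document, Cells(lRow, 2).Value) would record the link for each row, which can then be visited in turn.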

Excel VB Searching for text on a webpage and Copying information in the same Element

I am relatively new to VBA.
I am trying to use this code to grab a bit of information from a website. When I do it by Element I have to search for the tag name which is tr and use a number next to it to define which one I want to use. The problem with that is it changes frequently with the position on the website. Currently the Keyword I want to search for and the information it contains is like so:
<tr>
<td class="nt">Operations</td>
<td>Windows</td>
</tr>
So if I can search for the text "Operations" (in the cell with class "nt") and get the information "Windows", that would help. Also, I am currently getting the error
Next without For
If possible, is there a way I can use this to do multiple searches before I close the page? So I look for multiple specific words and input that data into different cells before moving onto the next column where it would repeat until completed at the end of the x value. I currently only have it set to x=2 To 5 but I would like to increase that to 10 or higher in the future.
The current code looks like this.
Private Sub Worksheet_Change(ByVal Target As Range)
For x = 2 To 5
If Target.Row = Cells(x, 35).Row And _
Target.Column = Cells(x, 35).Column Then
'If Target.Row = Range("ManufacturerPartNumber").Row And _
'Target.Column = Range("ManufacturerPartNumber").Column Then
Dim IE As New InternetExplorer
'IE.Visible = True
'For x = 2 To 5
'IE.navigate "" & Range("Website_1").Value
IE.navigate "" & Cells(x, 35).Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim sDD As String
sDD = InStr(1, IE.document.body.innerHTML, "Processor Model")
'sDD = Trim(Doc.getElementsByTagName("Processor Model")(1).innerText) 'Use this with tag like dd and number for which it appears like 0 or 1
IE.Quit
Dim aDD As Variant
aDD = Split(sDD, ",")
Cells(x, 44).Value = aDD(0)
'Range("ProcessorNumberCd").Value = aDD(0)
'Range("OSProvided").Value = aDD(0)
Next x
End If
'MsgBox "Complete"
End Sub
I think you want to grab the 'inner text'. Take a look at the example below.
Sub Scraper()
Dim objIE As Object
Dim item As Long
Dim priceStr As String
Dim priceTag As Object
Dim priceTable As Object
item = 10011 'this will eventually be placed in a loop for multiple searches
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
' navigate and download the web page
objIE.Navigate "www.google.com"
Do While objIE.ReadyState <> 4 Or objIE.Busy
DoEvents
Loop
'objIE.Document.getElementsByTagName("input")(0).Value = item
'objIE.Document.getElementByID("FDI").Click
Set priceTable = objIE.Document.getElementByID("price_FGC")
Set priceTag = priceTable.getElementsByTagName("u")(3)
priceStr = priceTag.innerText
Sheet1.Range("A1").Value = priceStr
objIE.Quit
End Sub
Also, check out this link for several other ways of how to do other, similar things.
http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm
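For the row shown in the question, one option is to match the label cell's text and read the cell beside it, which does not depend on where the row sits on the page. This is a sketch, assuming every label cell has class nt and is followed by its value cell in the same row; ValueForLabel is a hypothetical name. The Next without For error, incidentally, comes from Next x appearing before End If: the If block has to be closed inside the loop.

```vba
Option Explicit

' Hypothetical helper: finds the <td class="nt"> whose text equals labelText
' and returns the text of the second cell in the same row.
Public Function ValueForLabel(ByVal doc As Object, ByVal labelText As String) As String
    Dim tds As Object, rowCells As Object, i As Long
    Set tds = doc.getElementsByTagName("td")
    For i = 0 To tds.Length - 1
        If tds(i).className = "nt" And Trim$(tds(i).innerText) = labelText Then
            ' parentElement is the <tr>; its second <td> holds the value.
            Set rowCells = tds(i).parentElement.getElementsByTagName("td")
            If rowCells.Length > 1 Then ValueForLabel = Trim$(rowCells(1).innerText)
            Exit Function
        End If
    Next i
End Function
```

Calling it several times before IE.Quit, for example Cells(x, 44).Value = ValueForLabel(Doc, "Operations") followed by further calls with other labels and target columns, covers the multiple-search case on a single page load.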

Search on website conditional of changing different cells

I am currently trying to write a code in VBA, the purpose of this code is in general:
Type input into a cell "A" in a defined range.
To the right of this particular cell (within the range) I want different information from a website search to be printed next to the input-cell "A".
More specifically, I want to write code where I can type specific stock tickers into a column (e.g. "IBM"), and when I do this, a procedure begins which goes to finance.yahoo.com, collects different information about this specific stock (e.g. "International Business Machines"), and prints this information "to the right" of the cell where I typed the stock ticker in the first place.
The goal is to be able to type in 20-30 stock tickers, and make it retrieve the information for every ticker and print it to the right of those stock tickers.
I have already figured out how to type in one ticker in a specific cell, and make it retrieve the desired data from the website, and then print it into specific cells "to the right" of the input-cell.
My challenge now, is to be able to do this for a large range of cells below the first input-cell.
My code as of now:
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("Sheet1!$A$2").Row And _
Target.Column = Range("Sheet1!$A$2").Column Then
Dim IE As New InternetExplorer
IE.Visible = False
IE.navigate "http://finance.yahoo.com/q;_ylt=AiMiBWm16z_q5Ai0SlNb3jaiuYdG;_ylu=X3oDMTBxdGVyNzJxBHNlYwNVSCAzIERlc2t0b3AgU2VhcmNoIDEx;_ylg=X3oDMTBsdWsyY2FpBGxhbmcDZW4tVVMEcHQDMgR0ZXN0Aw--;_ylv=3?s=" & Range("Sheet1!$A$2").Value 'This types in the value from my input-cell into the website, so i get directed to the webpage for this particular company.
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim Name_001 As String 'Info-cell with name of the company
Dim Ticker_001 As String 'Info-cell with current price of the company
Name_001 = Trim(Doc.getElementsByClassName("title")(0).innerText)
Ticker_001 = Trim(Doc.getElementsByClassName("yfi_rt_quote_summary_rt_top sigfig_promo_1")(0).innerText)
IE.Quit
Dim Nam_001 As Variant
Dim Tic_001 As Variant
Nam_001 = Split(Name_001, "(")
Tic_001 = Split(Ticker_001, " ")
Range("Sheet1!$B$2").Value = Nam_001(0)
Range("Sheet1!$C$2").Value = Tic_001
End If
End Sub
"Sheet1!$A$2": The input-cell
"Sheet1!$B$2": First output cell for the name of the stock ticker.
"Sheet1!$C$2": Second output cell for the current price of the stock ticker.
Right now my code only works for one row, i want it to work for all the rows beneath the current input-cell:
I hope my question is clear, and that you can help me with my problem (I am not allowed to post any pictures, so I cant show you my work book, however I have tickers in column "A", name in column "B" and price in column "C").
Thanks in advance - Juhlers.
I modified your code: (1) changed the sheet name; (2) removed the anchored row ($2) and replaced it with 'Target.Row'; (3) added an error trap, since I sometimes get an error; (4) changed the cursor to 'Busy', since it takes a few seconds. Try the following:
Option Explicit
Private Sub Worksheet_Change(ByVal Target As Range)
Dim IE As New InternetExplorer
Dim Doc As HTMLDocument
Dim Name_001 As String 'Info-cell with name of the company
Dim Ticker_001 As String 'Info-cell with current price of the company
Dim Nam_001 As Variant
Dim Tic_001 As Variant
On Error GoTo Error_Trap
If Target.column <> 1 Then
Exit Sub
End If
Application.Cursor = xlWait
IE.Visible = False
IE.navigate "http://finance.yahoo.com/q;_ylt=AiMiBWm16z_q5Ai0SlNb3jaiuYdG;_ylu=X3oDMTBxdGVyNzJxBHNlYwNVSCAzIERlc2t0b3AgU2VhcmNoIDEx;_ylg=X3oDMTBsdWsyY2FpBGxhbmcDZW4tVVMEcHQDMgR0ZXN0Aw--;_ylv=3?s=" & Range("Stocks!$A" & Target.row).value 'This types in the value from my input-cell into the website, so i get directed to the webpage for this particular company.
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
Name_001 = Trim(Doc.getElementsByClassName("title")(0).innerText)
Ticker_001 = Trim(Doc.getElementsByClassName("yfi_rt_quote_summary_rt_top sigfig_promo_1")(0).innerText)
IE.Quit
Nam_001 = Split(Name_001, "(")
Tic_001 = Split(Ticker_001, " ")
Range("Stocks!$B" & Target.row).value = Nam_001(0)
Range("Stocks!$C" & Target.row).value = Tic_001
Application.Cursor = xlNormal
Exit Sub
Error_Trap:
Application.Cursor = xlNormal
MsgBox "Error: " & Err.Number & vbTab & Err.Description
Exit Sub
End Sub
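The handler above reads Target.Row, which covers a single edited cell. If 20 to 30 tickers are pasted in one go, Target spans several rows. One way to handle that, sketched here under assumptions, is to loop over the changed cells in column A. LookupTicker is a hypothetical stand-in for the Internet Explorer code above, and events are switched off while writing so the handler does not trigger itself again.

```vba
Option Explicit

Private Sub Worksheet_Change(ByVal Target As Range)
    Dim changed As Range, cell As Range
    ' Only react to edits in column A (the ticker column).
    Set changed = Intersect(Target, Me.Columns(1))
    If changed Is Nothing Then Exit Sub

    On Error GoTo CleanUp
    Application.EnableEvents = False
    For Each cell In changed.Cells
        If Len(Trim$(cell.Value)) > 0 Then
            LookupTicker cell
        End If
    Next cell
CleanUp:
    ' Always re-enable events, even after an error.
    Application.EnableEvents = True
End Sub

' Hypothetical placeholder: replace the body with the scraping code above,
' writing the name to cell.Offset(0, 1) and the price to cell.Offset(0, 2).
Private Sub LookupTicker(ByVal cell As Range)
    cell.Offset(0, 1).Value = "name for " & cell.Value
    cell.Offset(0, 2).Value = "price for " & cell.Value
End Sub
```

Creating one InternetExplorer instance and reusing it for every ticker, rather than opening and quitting it per cell, would likely be faster, though that is a design choice beyond the original answer.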