I'm trying to write a simple code for studying vocabulary and want this code to look up the words in column "A" using my favorite online dictionary "Cambridge" automatically and then print the definitions to the cells next to the words. I have written the code below so far and it goes to the site and searches the word. The question is what code is needed to get the definitions and print them to the cells?
Sub SearchWords()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLInput As MSHTML.IHTMLElement
Dim HTMLButtons As MSHTML.IHTMLElementCollection
Dim HTMLButton As MSHTML.IHTMLElement
IE.Visible = True
IE.Navigate "www.dictionary.cambridge.org"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.Document
Set HTMLInput = HTMLDoc.getElementById("cdo-search-input")
HTMLInput.Value = ThisWorkbook.Sheets(1).Range("A1").Value
Set HTMLButtons = HTMLDoc.getElementsByClassName("cdo-search__button")
HTMLButtons(0).Click
End Sub
Thanks in advance.
The result appears to be in an element with the class name entry. I read your column A search words into an array and loop over it to look up each word. The results are written back to the sheet. I mostly use CSS selectors, as they are a more flexible and faster way of selecting elements. In this instance, the CSS selectors are applied via the querySelector method of HTMLDocument (i.e. ie.Document).
Proper page-load waits are used throughout.
Option Explicit
'entry
Public Sub SearchWords()
Dim IE As SHDocVw.InternetExplorer, lookups(), dataSheet As Worksheet, iRow As Long
Set dataSheet = ThisWorkbook.Worksheets("Sheet1")
Set IE = New SHDocVw.InternetExplorer
lookups = Application.Transpose(dataSheet.Range("A2:A3").Value) '<Read words to lookup into a 2d array and transpose into 1D
With IE
.Visible = True
.Navigate2 "www.dictionary.cambridge.org"
While .Busy Or .readyState <> 4: DoEvents: Wend
For iRow = LBound(lookups) To UBound(lookups)
.document.getElementById("cdo-search-input").Value = lookups(iRow) 'work off .document to avoid stale elements
.document.querySelector(".cdo-search__button").Click
While .Busy Or .readyState <> 4: DoEvents: Wend 'wait for page reload
Application.Wait Now + TimeSerial(0, 0, 1)
Do
DoEvents 'yield to Excel while polling for the result element
Loop While .document.querySelectorAll(".entry").Length = 0
dataSheet.Cells(iRow + 1, 2) = .document.querySelector(".entry").innerText
Next
.Quit
End With
End Sub
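The last wait in the loop above has no upper bound, so the macro never returns if no .entry element appears (for example, for a word the site cannot find). A minimal sketch of a bounded wait follows. WaitForSelector and its timeoutSeconds parameter are hypothetical names, not part of the answer above, and the 10-second default is an arbitrary assumption.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer): polls the page until
' at least one element matches cssSelector or until timeoutSeconds have elapsed.
' Returns True if a match appeared in time, False otherwise.
Public Function WaitForSelector(ByVal ie As Object, ByVal cssSelector As String, _
                                Optional ByVal timeoutSeconds As Long = 10) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        DoEvents 'keep Excel responsive while polling
        If ie.document.querySelectorAll(cssSelector).Length > 0 Then
            WaitForSelector = True
            Exit Function
        End If
    Loop While Now < deadline
    'falls through with the default return value False on timeout
End Function
```

Inside the loop it could be used as: If WaitForSelector(IE, ".entry") Then dataSheet.Cells(iRow + 1, 2) = IE.document.querySelector(".entry").innerText. Words that time out are then simply skipped instead of hanging the macro.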
Done! Perfectly working. (Since this post is too long for a comment, I had to post it as an answer.) Now I am trying to get some more data from the page, since I need the other explanations and the Turkish definitions as well. When I inspect the page, I see that the full descriptions are placed in the "di $ entry-body__el entry-body__el--smalltop clrd js-share-holder" class. I added "/turkish" to the URL and tried to get the related element using the class name I mentioned instead of ".def-block", but it didn't work. Then I tried a different way using this code:
Sub GetMeaningsFromCambridgeDictionary()
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Meanings")
Dim sourceWord As String
sourceWord = ws.Range("A2").Value
Dim i As Integer
Dim çeviri As String
Dim ilkSatir As Integer
ilkSatir = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row + 1
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
Dim URL As String
Dim countElement As Integer
Range("B2:B1000").Delete
IE.Visible = False
URL = "https://dictionary.cambridge.org/dictionary/turkish/" & sourceWord
IE.Navigate URL
Do While IE.Busy: DoEvents: Loop
Application.Wait (Now + TimeValue("0:00:01"))
Do While IE.readyState <> 4
Application.Wait (Now + TimeValue("0:00:01"))
Loop
countElement = IE.document.getElementsByClassName("di $ entry-body__el entry-body__el--smalltop clrd js-share-holder").Length
For i = 0 To countElement - 1
çeviri = IE.document.getElementsByClassName("di $ entry-body__el entry-body__el--smalltop clrd js-share-holder")(i).innerText
Range("B" & i + 2).Value = çeviri
Range("B" & i + 2).Rows.AutoFit
Next i
Columns(2).AutoFit
IE.Quit
MsgBox "All meanings have been copied."
End Sub
This code also works, and I see all the definitions in detail, but now the problem is that only the first word is processed. What should I do to repeat this for the other words?
My script runs for a few rows and then I get an "Object variable or With block variable not set" error.
I am using the script below to extract values 5, 6 and 7 from the NSEIndia website.
I read the stock symbol from an Excel workbook and update the same workbook with the values from the NSEIndia website.
Sub Stock_Basic_Update_NSE()
Dim ie As InternetExplorer
Dim webpage As HTMLDocument
Dim ws As Worksheet
For Item = 23 To 1505
Set ws = ThisWorkbook.Worksheets("NSE Stocks Details")
sSearch = ws.Range("A" & Item).Value
'sSearch = Filestk.Worksheets("Sheet1").Range("E1").Value
Set ie = New InternetExplorer
'ie.Visible = True
ie.navigate ("https://www.nseindia.com/get-quotes/equity?symbol=" & sSearch)
Do While ie.readyState = 4: DoEvents: Loop
Do Until ie.readyState = 4: DoEvents: Loop
While ie.Busy
DoEvents
Wend
Set webpage = ie.document
ws.Cells(Item, 3).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(5).innerText
ws.Cells(Item, 4).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(6).innerText
ws.Cells(Item, 5).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(7).innerText
ie.Quit
Set ie = Nothing
Next Item
End Sub
Your code had some errors, and you didn't wait for the full document to load. Try the following code. I have commented it so you can see what I changed and why. I tested it with the top 50 symbols.
Sub Stock_Basic_Update_NSE()
'Always declare all variables
Dim ie As Object 'I switched this from early to late binding (not required)
Dim nodeTable As Object
Dim ws As Worksheet
Dim item As Long
Dim sSearch As String
'Use this outside the loop. You only need it once
Set ws = ThisWorkbook.Worksheets("NSE Stocks Details")
For item = 23 To 1505
sSearch = ws.Range("A" & item).Value
Set ie = CreateObject("internetexplorer.application")
ie.Visible = False
'Encode symbols that are restricted for using in URLs. Like &, : or ?
ie.navigate ("https://www.nseindia.com/get-quotes/equity?symbol=" & WorksheetFunction.EncodeURL(sSearch))
'It's not "While = 4" because 4 stands for "readystate = complete"
'If you want to use "= 4" you must use "Until" instead of "While"
'Either form works
Do While ie.readyState <> 4: DoEvents: Loop
'Manual break to load dynamic content after the IE reports the page load was complete
'This was your main problem
Application.Wait (Now + TimeSerial(0, 0, 2))
'The needed HTML table has an ID. Whenever possible, use the ID instead of class names
'because an HTML ID is unique if the page follows the standard
'Also store the element in a variable, so the long document
'expression only has to be written once
Set nodeTable = ie.document.getElementByID("equityInfo")
ws.Cells(item, 3).Value = nodeTable.getElementsByTagName("td")(5).innerText
ws.Cells(item, 4).Value = nodeTable.getElementsByTagName("td")(6).innerText
ws.Cells(item, 5).Value = nodeTable.getElementsByTagName("td")(7).innerText
'Clean up
ie.Quit
Set ie = Nothing
Next item
End Sub
I have 15 different URLs, and I need to fetch the price from each website into a particular column in Excel. Can you please help me out? It's my first VBA program; I tried, but it shows me a syntax error.
Sub myfile()
Dim IE As New InternetExplorer
Dim url As String
Dim item As HTMLHtmlElement
Dim Doc As HTMLDocument
Dim tagElements As Object
Dim element As Object
Dim lastRow
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
IE.navigate url
IE.Visible = True
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
lastRow = Sheet1.UsedRange.Rows.Count + 1
Set tagElements = Doc.all.tags("tr")
For Each element In tagElements
If InStr(element.innerText, "ciq-price") > 0 And InStr(element.className, "ciq-product-name") > 0 Then
Sheet1.Cells(lastRow, 1).Value = element.innerText
' Exit the for loop once you get the temperature to avoid unnecessary processing
Exit For
End If
Next
IE.Quit
Set IE = Nothing
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
You can't simply copy an arbitrary web scraping macro for your purposes. Every page has its own HTML structure, so you must write a separate web scraping macro for every page.
I can't explain everything about web scraping with VBA here. Please start your research with the search terms "excel vba web scraping" and "document object model". You also need knowledge of HTML and CSS and, ideally, JavaScript.
The error message "User-defined type not defined" occurs because you use early binding without a reference to the libraries Microsoft HTML Object Library and Microsoft Internet Controls. You can set a reference via Tools -> References... For the differences between early and late binding, see "Early Binding v/s Late Binding" and, for more depth from Microsoft, "Using early binding and late binding in Automation".
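As a rough illustration of the difference, here is a minimal sketch of both binding styles. The procedure names EarlyBoundExample and LateBoundExample are made up for this example. Note that the early-bound procedure only compiles once the Microsoft Internet Controls reference is set, so leave it out of the module if you don't want to set the reference.

```vba
Option Explicit

' Early binding: needs Tools -> References... -> "Microsoft Internet Controls".
' Without that reference, this procedure fails to compile with
' "User-defined type not defined".
Public Sub EarlyBoundExample()
    Dim ie As InternetExplorer
    Set ie = New InternetExplorer
    ie.Quit
    Set ie = Nothing
End Sub

' Late binding: no reference needed. The object is created from its ProgID at
' run time, so named constants such as READYSTATE_COMPLETE are not available
' and the literal value 4 is used instead.
Public Sub LateBoundExample()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Quit
    Set ie = Nothing
End Sub
```

Early binding gives you IntelliSense and compile-time checks; late binding makes the workbook easier to share because nobody has to set references first.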
To get the prices from the shown url you can use the following macro. I use late binding:
Option Explicit
Sub myfile()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim element As Object
Dim item As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
Set tagElements = IE.document.getElementsByClassName("ciq-online-offer-item ")
For Each element In tagElements
Set item = element.getElementsByTagName("td")(1)
ActiveSheet.Cells(lastRow, 1).Value = Trim(item.innerText)
lastRow = lastRow + 1
Next
IE.Quit
Set IE = Nothing
End Sub
Edit for a second Example:
The new link leads to an offer. I assume the price of the product is to be fetched. No loop is needed for this. You just have to find out in which HTML segment the price is and then you can decide how to get it. In the end there are only two lines of VBA that write the price into the Excel spreadsheet.
I'm in Germany and Excel has automatically set the currency sign from Dollar to Euro. This is of course wrong. Depending on where you are, this may have to be intercepted.
Sub myfile2()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://www.wayfair.com/kitchen-tabletop/pdx/cuisinart-air-fryer-toaster-oven-cui3490.html"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
'Break for 3 seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
Set tagElements = IE.document.getElementsByClassName("BasePriceBlock BasePriceBlock--highlight")(0)
ActiveSheet.Cells(lastRow, 1).Value = Trim(tagElements.innerText)
IE.Quit
Set IE = Nothing
End Sub
I'm using IE11 with the HTML Object Library and Internet Controls references activated.
There's no element ID on the button, but I am able to use ie.Document.getElementsByClassName by adding some HTML and XML declarations, thanks to this post.
I'm taking a name and city, state from Excel and plugging it into the website then clicking the search button.
This is where my error occurs.
Run-time error '438': Object doesn't support this property or method.
HTML:
VBA:
Option Explicit
Sub HGScrape()
'Application.ScreenUpdating = False
'we define the essential variables
Dim ie As Object
Dim my_url As String
Dim SearchButton, NameBox, AddressBox
Dim ele As Object
Dim x As Integer
Dim y As Integer
Dim IsOdd As Integer
Dim html_doc As Object 'HTMLDocument
Dim xml_obj As Object 'MSXML2.DOMDocument
my_url = "https://www.healthgrades.com/"
'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
Set ie = New InternetExplorer
ie.Visible = True 'False ''''''''''''''''''''''
ie.Navigate my_url '"13.33.74.92" '("https://www.healthgrades.com/")
While ie.ReadyState <> 4
DoEvents
Wend
Set NameBox = ie.Document.getElementById("search-term-selector-child")
NameBox.Value = ActiveSheet.Range("A2")
Set AddressBox = ie.Document.getElementById("search-location-selector-child")
AddressBox.Value = ActiveSheet.Range("B2")
Set html_doc = CreateObject("htmlfile")
Set xml_obj = CreateObject("MSXML2.XMLHTTP")
xml_obj.Open "GET", my_url, False
xml_obj.send
html_doc.body.innerHTML = xml_obj.responseText
Set SearchButton = ie.Document.getElementsByClassName("autosuggest_submiter") 'id of the button control (HTML Control)
SearchButton.Click
While ie.ReadyState <> 4
DoEvents
Wend
I condensed your code a bit. You really do not need to set every element to a variable. This just wastes resources in the long run.
Use the class name submiter__text to grab your submit button, with an index of 0.
Sub HGScrape()
Const sURL As String = "https://www.healthgrades.com/"
Dim ie As New InternetExplorer
With ie
.Visible = True
.Navigate sURL
While .Busy Or .ReadyState < 4: DoEvents: Wend
.Document.getElementById("search-term-selector-child"). _
Value = ActiveSheet.Range("A2")
.Document.getElementById("search-location-selector-child"). _
Value = ActiveSheet.Range("B2")
.Document.getElementsByClassName("submiter__text")(0).Click
While .Busy Or .ReadyState < 4: DoEvents: Wend
End With
End Sub
"..Why was the "submitter_text" class the correct one?"
The best way to explain it is to show you. If you are unsure what selection to make, then right-click the element and choose "Inspect Element" and look around the highlighted line.
I am trying to extract all the hyperlinks that contain "http://www.bursamalaysia.com/market/listed-companies/company-announcements/" from the web pages I input.
At first the code ran well, but now I can no longer extract the URLs I need. They are missing every time I run the sub.
Link: http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=SH&sub_category=all&alphabetical=All
Sub scrapeHyperlinks()
Dim IE As InternetExplorer
Dim html As HTMLDocument
Dim ElementCol As Object
Dim Link As Object
Dim erow As Long
Application.ScreenUpdating = False
Set IE = New InternetExplorer
For u = 1 To 50
IE.Visible = False
IE.navigate Cells(u, 2).Value
Do While IE.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to websitehahaha"
DoEvents
Loop
Set html = IE.document
Set ElementCol = html.getElementsByTagName("a")
For Each Link In ElementCol
erow = Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = Link
Cells(erow, 1).Columns.AutoFit
Next
Next u
ActiveSheet.Range("$A$1:$A$152184").AutoFilter Field:=1, Criteria1:="http://www.bursamalaysia.com/market/listed-companies/company-announcements/???????", Operator:=xlAnd
For k = 1 To [A65536].End(xlUp).Row
If Rows(k).Hidden = True Then
Rows(k).EntireRow.Delete
k = k - 1
End If
Next k
Set IE = Nothing
Application.StatusBar = ""
Application.ScreenUpdating = True
End Sub
Just to get the qualifying hrefs that you mention from the URL given I would use the following. It uses a CSS selector combination to target the URLs of interest from the specified page.
The CSS selector combination is
#bm_ajax_container [href^='/market/listed-companies/company-announcements/']
This is a descendant selector looking for elements with attribute href whose value starts with /market/listed-companies/company-announcements/, and having a parent element with id of bm_ajax_container. That parent element is the ajax container div. The "#" is an id selector and the "[] " indicates an attribute selector. The "^" means starts with.
Example of container div and first matching href:
As more than one element is to be matched the CSS selector combination is applied via querySelectorAll method. This returns a nodeList whose .Length can be traversed to access individual items by index.
The full set of qualifying links are written out to the worksheet.
Example CSS query results from page using selector (sample):
VBA:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
Application.ScreenUpdating = False
With IE
.Visible = True
.navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=SH&sub_category=all&alphabetical=All"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim links As Object, i As Long
Set links = .document.querySelectorAll("#bm_ajax_container [href^='/market/listed-companies/company-announcements/']")
For i = 0 To links.Length - 1
With ThisWorkbook.Worksheets("Sheet1")
.Cells(i + 1, 1) = links.item(i)
End With
Next i
.Quit
End With
Application.ScreenUpdating = True
End Sub
My goal is to scrape the source code of a web page.
The site seems to have different Frames which is why my code won't work properly.
I tried to modify a code which I found online which should solve the Frame issue.
The following code creates an error (object required) at:
Set profileFrame = .document.getElementById("profileFrame")
Public Sub IE_Automation()
'Needs references to Microsoft Internet Controls and Microsoft HTML Object Library
Dim baseURL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim profileFrame As HTMLIFrame
Dim slotsDiv As HTMLDivElement
'example URL with multiple frames
baseURL = "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
Set IE = New InternetExplorer
With IE
.Visible = True
'Navigate to the main page
.navigate baseURL & "/publictrophy/index.htm?onlinename=ace_anubis"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
'Get the profileFrame iframe and navigate to it
Set profileFrame = .document.getElementById("profileFrame")
.navigate baseURL & profileFrame.src
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = .document
End With
'Display all the text in the profileFrame iframe
MsgBox HTMLdoc.body.innerText
'Display just the text in the slots_container div
Set slotsDiv = HTMLdoc.getElementById("slots_container")
MsgBox slotsDiv.innerText
End Sub
Hummmm, I'm not exactly sure what you are doing here, but can you try the code below?
Option Explicit
Sub Sample()
Dim ie As Object
Dim links As Variant, lnk As Variant
Dim rowcount As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
'Wait for site to fully load
'ie.Navigate2 URL
Do While ie.Busy = True
DoEvents
Loop
Set links = ie.document.getElementsByTagName("a")
rowcount = 1
With Sheets("Sheet1")
For Each lnk In links
'Debug.Print lnk.innerText
'If lnk.classname Like "*Real Statistics Examples Part 1*" Then
.Range("A" & rowcount) = lnk.innerText
rowcount = rowcount + 1
'Exit For
'End If
Next
End With
End Sub
General:
I think in your research you may have come across this question and misunderstood how it relates/doesn't relate to your circumstance.
I don't think iFrames are relevant to your query. If you are after the list of names, their details and the URLs to their pages you can use the code below.
CSS Selectors
To target the elements of interest I use the following two CSS selectors. These use style information on the page to target the elements:
.SearchResults-link
.SearchResults-item
"." means class, which is like saying .getElementsByClassName. The first gets the links, and the second gets the description information on the first page.
With respect to the first CSS selector: The actual link required is dynamically constructed, but we can use the fact that the actual profile URLs have a common base string of "https://www.xing.com/profile/", which is then followed by the profileName. So, in function GetURL, we parse the outerHTML returned by the CSS selector to get the profileName and concatenate it with the BASESTRING constant to get our actual profile link.
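To make the parsing step concrete, here is a sketch that walks through the same split logic one step at a time. The markup in sampleHtml and the profile name John_Doe are made up for illustration; the real outerHTML on the page may differ, and the approach assumes the link text itself contains no "/".

```vba
Option Explicit

' Walks through the parsing steps on a hypothetical outerHTML string.
Public Sub ShowProfileNameParsing()
    Const BASESTRING As String = "https://www.xing.com/profile/"
    Dim sampleHtml As String, parts() As String
    Dim segment As String, profileName As String

    ' Hypothetical markup, not copied from the site
    sampleHtml = "<a class=""SearchResults-link"" " & _
                 "href=""https://www.xing.com/profile/John_Doe"">John Doe</a>"

    parts = Split(sampleHtml, "/")          'split the markup on every "/"
    segment = parts(UBound(parts) - 1)      'second-to-last piece holds the profile name
    profileName = Split(segment, ">")(0)    'keep only the text before the first ">"
    profileName = Replace$(profileName, Chr$(34), vbNullString) 'strip the trailing quote

    Debug.Print BASESTRING & profileName    'base string plus the extracted name
End Sub
```

Taking the second-to-last piece works because the final "/" in the markup belongs to the closing </a> tag, so the piece before it starts with the last path segment of the href.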
Code:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.xing.com/publicsearch/query?search%5Bq%5D=IT"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim a As Object, exitTime As Date, linksNodeList As Object, profileNodeList As Object
' exitTime = Now + TimeSerial(0, 0, 5) '<== uncomment this section if timing problems
'
' Do
' DoEvents
' On Error Resume Next
' Set linksNodeList = .document.querySelectorAll(".SearchResults-link")
' On Error GoTo 0
' If Now > exitTime Then Exit Do
' Loop While linksNodeList Is Nothing
Set linksNodeList = .document.querySelectorAll(".SearchResults-link") '<== comment this out if uncommented section above
Set profileNodeList = .document.querySelectorAll(".SearchResults-item")
Dim i As Long
For i = 0 To profileNodeList.Length - 1
Debug.Print "Profile link: " & GetURL(linksNodeList.item(i).outerHTML)
Debug.Print "Basic info: " & profileNodeList.item(i).innerText
Next i
End With
End Sub
Public Function GetURL(ByVal htmlSection As String) As String
Const BASESTRING As String = "https://www.xing.com/profile/"
Dim arr() As String
arr = Split(htmlSection, "/")
GetURL = BASESTRING & Replace$(Split((arr(UBound(arr) - 1)), ">")(0), Chr$(34), vbNullString)
End Function
Example return information: