Extract hyperlink from website using VBA facing error - html

I am trying to extract all the hyperlinks that contain "http://www.bursamalaysia.com/market/listed-companies/company-announcements/" from the web pages I input.
At first the code ran well, but now I can no longer extract the URL links I need. They are missing every time I run the sub.
Link: http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=SH&sub_category=all&alphabetical=All
Sub scrapeHyperlinks()
Dim IE As InternetExplorer
Dim html As HTMLDocument
Dim ElementCol As Object
Dim Link As Object
Dim erow As Long
Application.ScreenUpdating = False
Set IE = New InternetExplorer
For u = 1 To 50
IE.Visible = False
IE.navigate Cells(u, 2).Value
Do While IE.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to websitehahaha"
DoEvents
Loop
Set html = IE.document
Set ElementCol = html.getElementsByTagName("a")
For Each Link In ElementCol
erow = Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = Link
Cells(erow, 1).Columns.AutoFit
Next
Next u
ActiveSheet.Range("$A$1:$A$152184").AutoFilter Field:=1, Criteria1:="http://www.bursamalaysia.com/market/listed-companies/company-announcements/???????", Operator:=xlAnd
For k = 1 To [A65536].End(xlUp).Row
If Rows(k).Hidden = True Then
Rows(k).EntireRow.Delete
k = k - 1
End If
Next k
Set IE = Nothing
Application.StatusBar = ""
Application.ScreenUpdating = True
End Sub

Just to get the qualifying hrefs that you mention from the URL given I would use the following. It uses a CSS selector combination to target the URLs of interest from the specified page.
The CSS selector combination is
#bm_ajax_container [href^='/market/listed-companies/company-announcements/']
This is a descendant selector: it matches elements whose href attribute value starts with /market/listed-companies/company-announcements/ and that sit inside an element with an id of bm_ajax_container. That ancestor element is the ajax container div. The "#" is an id selector, the "[]" indicates an attribute selector, and the "^" means "starts with".
As more than one element is to be matched, the CSS selector combination is applied via the querySelectorAll method. This returns a nodeList; loop from 0 to .Length - 1 to access the individual items by index.
The full set of qualifying links is written out to the worksheet.
VBA:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
Application.ScreenUpdating = False
With IE
.Visible = True
.navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=SH&sub_category=all&alphabetical=All"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim links As Object, i As Long
Set links = .document.querySelectorAll("#bm_ajax_container [href^='/market/listed-companies/company-announcements/']")
For i = 0 To links.Length - 1
With ThisWorkbook.Worksheets("Sheet1")
.Cells(i + 1, 1) = links.item(i)
End With
Next i
.Quit
End With
Application.ScreenUpdating = True
End Sub

Related

get specific data from database and print it to a cell

I'm trying to write a simple code for studying vocabulary and want this code to look up the words in column "A" using my favorite online dictionary "Cambridge" automatically and then print the definitions to the cells next to the words. I have written the code below so far and it goes to the site and searches the word. The question is what code is needed to get the definitions and print them to the cells?
Sub SearchWords()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLInput As MSHTML.IHTMLElement
Dim HTMLButtons As MSHTML.IHTMLElementCollection
Dim HTMLButton As MSHTML.IHTMLElement
IE.Visible = True
IE.Navigate "www.dictionary.cambridge.org"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.Document
Set HTMLInput = HTMLDoc.getElementById("cdo-search-input")
HTMLInput.Value = ThisWorkbook.Sheets(1).Range("A1").Value
Set HTMLButtons = HTMLDoc.getElementsByClassName("cdo-search__button")
HTMLButtons(0).Click
End Sub
Thanks in advance.
The result appears to be in an element with the class name entry. I read the search words from column A into an array and loop over it to look up each word. The result is written back out to the sheet. I mostly use CSS selectors, as they are a more flexible and faster way of selecting elements. In this instance they are applied via the querySelector method of HTMLDocument (i.e. ie.Document).
Proper page-load waits are used throughout.
Option Explicit
'entry
Public Sub SearchWords()
Dim IE As SHDocVw.InternetExplorer, lookups(), dataSheet As Worksheet, iRow As Long
Set dataSheet = ThisWorkbook.Worksheets("Sheet1")
Set IE = New SHDocVw.InternetExplorer
lookups = Application.Transpose(dataSheet.Range("A2:A3").Value) '<Read words to lookup into a 2d array and transpose into 1D
With IE
.Visible = True
.Navigate2 "www.dictionary.cambridge.org"
While .Busy Or .readyState <> 4: DoEvents: Wend
For iRow = LBound(lookups) To UBound(lookups)
.document.getElementById("cdo-search-input").Value = lookups(iRow) 'work off .document to avoid stale elements
.document.querySelector(".cdo-search__button").Click
While .Busy Or .readyState <> 4: DoEvents: Wend 'wait for page reload
Application.Wait Now + TimeSerial(0, 0, 1)
Do
DoEvents 'yield while polling so Excel stays responsive
Loop While .document.querySelectorAll(".entry").Length = 0
dataSheet.Cells(iRow + 1, 2) = .document.querySelector(".entry").innerText
Next
.Quit
End With
End Sub
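The wait on ".entry" in the code above keeps polling until a match appears, so it never ends if the page fails to load or the class name changes. A minimal sketch of a timed version follows. WaitForSelector and its parameters are hypothetical names, not part of the original answer, and the 10 second default is an arbitrary assumption.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): polls the page until at
' least one element matches cssSelector, or gives up after timeoutSeconds.
' Returns True if a match was found in time, otherwise False.
Public Function WaitForSelector(ByVal ie As Object, ByVal cssSelector As String, _
                                Optional ByVal timeoutSeconds As Long = 10) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        DoEvents
        ' The document can be briefly unavailable during navigation, so a failed
        ' lookup is treated as "no match yet" rather than as an error.
        On Error Resume Next
        WaitForSelector = (ie.document.querySelectorAll(cssSelector).Length > 0)
        On Error GoTo 0
        If WaitForSelector Then Exit Function
    Loop While Now < deadline
End Function
```

Inside the With IE block it could be called as: If Not WaitForSelector(IE, ".entry") Then Exit For. The function works off ie.document on every pass, for the same stale-element reason noted in the answer.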
Done! Perfectly working. (Since this post is too long for a comment, I had to post it as an answer.) Now I am trying to get some more data from the page, since I need the other explanations and the Turkish definitions as well. When I inspect the page, I see that the full descriptions are placed in the "di $ entry-body__el entry-body__el--smalltop clrd js-share-holder" class. I added "/turkish" to the URL and tried to get the related element using the class name I mentioned instead of ".def-block", but it didn't work. Then I tried a different way using this code:
Sub GetMeaningsFromCambridgeDictionary()
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Meanings")
Dim sourceWord As String
sourceWord = ws.Range("A2").Value
Dim i As Integer
Dim çeviri As String
Dim ilkSatir As Integer
ilkSatir = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row + 1
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
Dim URL As String
Dim countElement As Integer
Range("B2:B1000").Delete
IE.Visible = False
URL = "https://dictionary.cambridge.org/dictionary/turkish/" & sourceWord
IE.Navigate URL
Do While IE.Busy: DoEvents: Loop
Application.Wait (Now + TimeValue("0:00:01"))
Do While IE.readyState <> 4
Application.Wait (Now + TimeValue("0:00:01"))
Loop
countElement = IE.document.getElementsByClassName("di $ entry-body__el entry-body__el--smalltop clrd js-share-holder").Length
For i = 0 To countElement - 1
çeviri = IE.document.getElementsByClassName("di $ entry-body__el entry-body__el--smalltop clrd js-share-holder")(i).innerText
Range("B" & i + 2).Value = çeviri
Range("B" & i + 2).Rows.AutoFit
Next i
Columns(2).AutoFit
IE.Quit
MsgBox "All meanings have been copied."
End Sub
This code also works, and I see all the definitions in detail, but this time only the first word is processed. What should I do to do the same for the other words?

using excel VBA to grab data from a web page that runs scripts to show the table data

Day two of researching this. I'm just not getting it. The web page is public:
https://register.fca.org.uk/ShPo_FirmDetailsPage?id=001b000000MfF1EAAV
Manually, I press Page Down twice to reach the [+] Individuals button and click it, then press Page Down once more to reach the "results per page" drop-down and change it to 500. Then I copy and paste the results into Excel.
This is the code I found on this site in "Selecting a dropdown list when inserting data from web (VBA)", answered by QHarr, which I tried to adapt and failed miserably. I put "HELP" where I think I should be making the changes, but I'm just guessing.
Public Sub MakeSelectiongGetData()
Dim IE As New InternetExplorer
Const URL = "https://register.fca.org.uk/ShPo_FirmDetailsPage?id=001b000000Mfe5TAAR#ShPo_FirmDetailsPage"
'Const optionText As String = "RDVT11"
Application.ScreenUpdating = False
With IE
.Visible = True
.navigate URL
While .Busy Or .readyState < 4: DoEvents: Wend
Dim a As Object
Set a = .document.getElementById("HELP")
Dim currentOption As Object
For Each currentOption In a.getElementsByTagName("HELP")
If InStr(currentOption.innerText, optionText) > 0 Then
currentOption.Selected = "HELP"
Exit For
End If
Next currentOption
.document.getElementById("HELP").Click
While .Busy Or .readyState < 4: DoEvents: Wend
Dim nTable As HTMLTable
Do: On Error Resume Next: Set nTable = .document.getElementById("HELP"): On Error GoTo 0: DoEvents: Loop While nTable Is Nothing
Dim nRow As Object, nCell As Object, r As Long, c As Long
With ActiveSheet
Dim nBody As Object
Set nBody = nTable.getElementsByTagName("tbody")(0).getElementsByTagName("tr")
.Cells(1, 1) = nBody(0).innerText
For r = 2 To nBody.Length - 1
Set nRow = nBody(r)
For Each nCell In nRow.Cells
c = c + 1: .Cells(r + 1, c) = nCell.innerText
Next nCell
c = 0
Next r
End With
.Quit
End With
Application.ScreenUpdating = True
End Sub
So I have included your changes and am here.
Public Sub MakeSelections()
Dim IE As New InternetExplorer
With IE
.Visible = True
.Navigate2 "https://register.fca.org.uk/ShPo_FirmDetailsPage?id=001b000000MfF1EAAV"
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector("[href*=FirmIndiv]").Click '<==click the + for indiv
.document.querySelector("#IndividualSearchResults_length[value='500']").Selected = True
End With
Dim nTable As HTMLTable
Do: On Error Resume Next: Set nTable =IE.document.getElementById("IndividualSearchResults"): On Error GoTo 0: DoEvents: Loop While nTable Is Nothing
Dim nRow As Object, nCell As Object, r As Long, c As Long
With ActiveSheet
Dim nBody As Object
Set nBody = nTable.getElementsByTagName("Name")(0) _
.getElementsByTagName("ShG1_IRN_c") _
.getElementsByTagName("ShGl_IndividualStatus__c") _
.getElementsByTagName("ShPo_Registerstatus__c") _
.getElementsByTagName("Id") _
.getElementsByTagName("RecordTypeId") _
.getElementsByTagName("CurrencyIsoCode") _
.Cells(1, 1) = nBody(0).innerText
For r = 2 To nBody.Length - 1
Set nRow = nBody(r)
For Each nCell In nRow.Cells
c = c + 1: .Cells(r + 1, c) = nCell.innerText
Next nCell
c = 0
Next r
End With
End Sub
You can use CSS attribute = value selectors to target the + for individuals and also to make the option selection for 500.
Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub MakeSelections()
Dim IE As New InternetExplorer
With IE
.Visible = True
.Navigate2 "https://register.fca.org.uk/ShPo_FirmDetailsPage?id=001b000000MfF1EAAV"
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector("[href*=FirmIndiv]").Click '<==click the + for indiv
.document.querySelector("#IndividualSearchResults_length [value='500']").Selected = True
Dim event_onchange As Object
Set event_onchange = .document.createEvent("HTMLEvents")
event_onchange.initEvent "change", True, False
.document.querySelector("[name=IndividualSearchResults_length]").dispatchEvent event_onchange
Application.Wait Now + TimeSerial(0, 0, 5)
Dim clipboard As Object, ws As Worksheet
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
Set ws = ThisWorkbook.Worksheets("Sheet1")
clipboard.SetText .document.querySelector("#IndividualSearchResults").outerHTML
clipboard.PutInClipboard
ws.Cells(1, 1).PasteSpecial
.Quit
End With
End Sub
This selector, [href*=FirmIndiv], is an attribute = value selector with the contains (*) modifier. It matches elements whose href attribute value contains the substring FirmIndiv. The querySelector method of HTMLDocument (i.e. ie.Document) returns the first match found.
The selector for the option tag element (the parent select tag for result counts contains child option tag elements):
#IndividualSearchResults_length [value='500']
It uses an id (#) selector to target the div parent, of the parent select element, by its id value IndividualSearchResults_length, then uses a descendant combinator (" ") followed by attribute = value selector to specify the option element with value = 500.
Selenium basic version:
Option Explicit
Public Sub MakeChanges()
'VBE > Tools > References > Selenium Type Library
'Download: https://github.com/florentbr/SeleniumBasic/releases/tag/v2.0.9.0
Dim d As WebDriver
Set d = New ChromeDriver
Const url = "https://register.fca.org.uk/ShPo_FirmDetailsPage?id=001b000000MfF1EAAV"
With d
.Start "Chrome"
.get url
.FindElementByCss("[href*=FirmIndiv]").Click
.FindElementByCss("[name=IndividualSearchResults_length]").WaitDisplayed True, 10000
.FindElementByCss("[name=IndividualSearchResults_length]").AsSelect.SelectByValue "500"
Stop '<==delete me later
.Quit
End With
End Sub

Excel VBA extracting href value

I have a macro that tries to extract all the href values from a page but it only seems to get the first one. If someone could help me out that would be greatly appreciated.
The URL I used is https://www.facebook.com/marketplace/vancouver/entertainment
Screenshot of HTML
<div class="_3-98" data-testid="marketplace_home_feed">
<div>
<div>
<div class="_65db">
<a class="_1oem" href="/marketplace/item/920841554781924" data-testid="marketplace_feed_item">
<a class="_1oem" href="/marketplace/item/580124349088759" data-testid="marketplace_feed_item">
<a class="_1oem" href="/marketplace/item/1060730340772072" data-testid="marketplace_feed_item">
Sub Macro1()
marker = 0
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
For x = 0 To (IE_count - 1)
On Error Resume Next ' sometimes more web pages are counted than are open
my_url = objShell.Windows(x).document.Location
my_title = objShell.Windows(x).document.Title
If my_title Like "Facebook" & "*" Then 'compare to find if the desired web page is already open
Set ie = objShell.Windows(x)
marker = 1
Exit For
Else
End If
Next
Set my_data = ie.document.getElementsByClassName("_3-98")
Dim link
i = 1
For Each elem In my_data
Set link = elem.getElementsByTagName("a")(0)
i = i + 1
'copy the data to the excel sheet
ActiveSheet.Cells(i, 4).Value = link.href
Next
End Sub
You can use a CSS selector combination to get the elements. If you provide the actual HTML as text rather than as an image, it will be easier to test and determine the best combination. The selector is applied via the querySelectorAll method to return a nodeList of all matching elements. You loop over the nodeList by index, from 0 to .Length - 1, to access the items.
VBA:
Dim aNodeList As Object, i As Long
Set aNodeList = ie.document.querySelectorAll("._1oem[href]")
For i = 0 To aNodeList.Length-1
Activesheet.Cells(i + 2,4) = aNodeList.item(i)
Next
The CSS selector combination is ._1oem[href], which selects elements with a class of _1oem that also have an href attribute. The "." is a class selector and the [] an attribute selector. It is a fast and robust method.
The above assumes there are no parent form/frame/iframe tags to negotiate.
An alternative selector that matches on the two attributes, rather than the class would be:
html.querySelectorAll("[data-testid='marketplace_feed_item'][href]")
Full example:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.facebook.com/marketplace/vancouver/entertainment"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim aNodeList As Object, i As Long
Set aNodeList = IE.document.querySelectorAll("._1oem[href]")
For i = 0 To aNodeList.Length - 1
ActiveSheet.Cells(i + 2, 4) = aNodeList.item(i)
Next
'.Quit '<== Remember to quit application
End With
End Sub
You only ask for the first anchor element within each element with a _3-98 class. Iterate through the collection of anchor elements within the parent element.
...
Dim j As Long
Set my_data = ie.document.getElementsByClassName("_65db")
For Each elem In my_data
For i = 0 To elem.getElementsByTagName("a").Length - 1 'HTML collections expose .Length, not .Count
j = j + 1
ActiveSheet.Cells(j, 4).Value = elem.getElementsByTagName("a")(i).href
Next i
Next elem
...

VBA extracting only select info between <div> tags

I'm trying to check if the html tag:
<nobr>Target</nobr>
exists on the page, and if it does, search for the text between the html tag:
<div style='width: 555px; -ms-overflow-x: auto; -ms-overflow-y: hidden;> ... </div>
The text between the div tags looks messy, like:
ABC [HSA:
...
] [KO:
...
]
<br />
GHI-JK [JKI:
...
And I want to get and print to my spreadsheet however many items there are, but I only want the item name (in the above example, there are 2 items: ABC and GHI-JK).
Of course my code below doesn't work. I don't think I'm using querySelector correctly, and I'm also not sure how to grab only the item names instead of everything between the tags.
If IE.document.querySelector("nobr").innerHTML = "Target" Then
If IE.document.querySelector("div[style^='width: 555px; -ms-overflow-x: auto; -ms-overflow-y: hidden;']") <> 0 Then
Cells(1, 15).Value = IE.document.querySelector("div[style^='width: 555px; -ms-overflow-x: auto; -ms-overflow-y: hidden;']").innerText
End If
End If
CSS selector:
You can use a CSS selector combination to target the element of interest.
The data is in a div, that is inside an element with class td51.
You can write a CSS selector combination to target this pattern of:
.td51 div
This selects div elements that are descendants of an element with the class td51, where "." is a class selector.
The space between the two parts is known as a descendant combinator.
This pattern matches multiple elements, and you want the item at index 6.
As multiple items are retrieved, you use querySelectorAll to apply the CSS selector and retrieve a nodeList that you index into to get the item of interest.
As you only want part of the information retrieved you can use split to "slice" out the required info. Note that Kit is not Kit alone but is Kit (CD117).
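The split step on its own can be sketched as a small helper that needs no browser or network call. ExtractItemNames is a hypothetical name, and the sample text in the usage note is the shortened example from the question, not real page content.

```vba
Option Explicit

' Hypothetical helper: returns the text that precedes the first "[" on each
' line of rawText, trimmed, skipping lines where nothing precedes the "[".
Public Function ExtractItemNames(ByVal rawText As String) As Collection
    Dim result As New Collection
    Dim lines() As String, i As Long, itemName As String

    ' Normalise CRLF to LF, then cut the text into lines.
    lines = Split(Replace$(rawText, vbCr, vbNullString), vbLf)

    For i = LBound(lines) To UBound(lines)
        ' Appending "[" guarantees Split returns at least one element,
        ' even for an empty line.
        itemName = Trim$(Split(lines(i) & "[", "[")(0))
        If Len(itemName) > 0 Then result.Add itemName
    Next i

    Set ExtractItemNames = result
End Function
```

For example, ExtractItemNames("ABC [HSA: 1] [KO: 2]" & vbLf & "GHI-JK [JKI: 3]") gives a collection holding ABC and GHI-JK. Returning a Collection rather than printing keeps the parsing separate from whatever writes to the sheet.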
XMLHTTPRequest XHR:
Option Explicit
Public Sub GetInfo()
Dim sResponse As String, i As Long, html As New HTMLDocument, arr() As String, ele As Object
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.kegg.jp/dbget-bin/www_bget?dr:D01441", False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
With html
.body.innerHTML = sResponse
On Error Resume Next
Set ele = .querySelectorAll(".td51 div")(6)
On Error GoTo 0
If ele Is Nothing Then Exit Sub
arr = Split(ele.innerText, Chr$(10))
End With
For i = LBound(arr) To UBound(arr)
Debug.Print Split(arr(i), "[")(0)
Next i
End Sub
References (VBE > Tools > References):
Microsoft HTML Object Library
Internet Explorer:
Option Explicit
Public Sub GetInfo()
Dim ie As New InternetExplorer, html As HTMLDocument, arr() As String, ele As Object, i As Long
With ie
.Visible = True
.navigate "https://www.kegg.jp/dbget-bin/www_bget?dr:D01441"
While .Busy Or .readyState < 4: DoEvents: Wend
Set html = .document
On Error Resume Next
Set ele = html.querySelectorAll(".td51 div")(6)
On Error GoTo 0
If ele Is Nothing Then Exit Sub
arr = Split(ele.innerText, Chr$(10))
For i = LBound(arr) To UBound(arr)
Debug.Print Split(arr(i), "[")(0)
Next i
'.Quit '<== Remember to quit application
End With
End Sub
References:
Microsoft Internet Controls
Microsoft HTML Object Library
EDIT:
This has become rather long, but following our debugging, here it is merged with your other code:
Option Explicit
Public Sub ht()
Dim ie As Object, ele As Object, i As Long
Dim sourceSheet As Worksheet, lastRow As Long, rawString() As String, rowIndex As Long
Dim arrayOfValues() As Variant, html As HTMLDocument, arr() As String
Const URL As String = "https://www.genome.jp/kegg/drug/"
Set sourceSheet = Worksheets("Sheet1")
lastRow = sourceSheet.Range("A30000").End(xlUp).Row
arrayOfValues = sourceSheet.Range("A1:A" & lastRow)
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
For rowIndex = 1 To lastRow
.navigate URL
Do While .readyState <> 4 Or .Busy: DoEvents: Loop
rawString = VBA.Strings.Split(VBA.Strings.LCase$(arrayOfValues(rowIndex, 1)), ": ", -1, vbBinaryCompare)
'MsgBox rawString(1)
.document.querySelector("input[name=q]").Value = rawString(1)
.document.querySelector("input[value=Go]").Click
Do While .readyState <> 4 Or .Busy: DoEvents: Loop
Dim ele2 As Object
On Error Resume Next
Set ele2 = .document.querySelector("a[href^='/dbget-bin/www_bget?dr:']")
On Error GoTo 0
If ele2 Is Nothing Then GoTo NextLink
ele2.Click
Do While .readyState <> 4 Or .Busy: DoEvents: Loop
Set html = .document
On Error Resume Next
Set ele = html.querySelectorAll(".td51 div")(6)
On Error GoTo 0
If Not ele Is Nothing Then
arr = Split(ele.innerText, Chr$(10))
For i = LBound(arr) To UBound(arr)
Debug.Print Split(arr(i), "[")(0)
Next i
End If
NextLink:
Next rowIndex
.Quit
End With
End Sub

Web Query where there are multiple Frames

My goal is to scrape the source code of a web page.
The site seems to have different Frames which is why my code won't work properly.
I tried to modify a code which I found online which should solve the Frame issue.
The following code creates an error (object required) at:
Set profileFrame = .document.getElementById("profileFrame")
Public Sub IE_Automation()
'Needs references to Microsoft Internet Controls and Microsoft HTML Object Library
Dim baseURL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim profileFrame As HTMLIFrame
Dim slotsDiv As HTMLDivElement
'example URL with multiple frames
baseURL = "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
Set IE = New InternetExplorer
With IE
.Visible = True
'Navigate to the main page
.navigate baseURL & "/publictrophy/index.htm?onlinename=ace_anubis"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
'Get the profileFrame iframe and navigate to it
Set profileFrame = .document.getElementById("profileFrame")
.navigate baseURL & profileFrame.src
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = .document
End With
'Display all the text in the profileFrame iframe
MsgBox HTMLdoc.body.innerText
'Display just the text in the slots_container div
Set slotsDiv = HTMLdoc.getElementById("slots_container")
MsgBox slotsDiv.innerText
End Sub
Hummmm, I'm not exactly sure what you are doing here, but can you try the code below?
Option Explicit
Sub Sample()
Dim ie As Object
Dim links As Variant, lnk As Variant
Dim rowcount As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
'Wait for site to fully load
'ie.Navigate2 URL
Do While ie.Busy = True
DoEvents
Loop
Set links = ie.document.getElementsByTagName("a")
rowcount = 1
With Sheets("Sheet1")
For Each lnk In links
'Debug.Print lnk.innerText
'If lnk.classname Like "*Real Statistics Examples Part 1*" Then
.Range("A" & rowcount) = lnk.innerText
rowcount = rowcount + 1
'Exit For
'End If
Next
End With
End Sub
General:
I think in your research you may have come across this question and misunderstood how it relates/doesn't relate to your circumstance.
I don't think iFrames are relevant to your query. If you are after the list of names, their details and the URLs to their pages you can use the code below.
CSS Selectors
To target the elements of interest I use the following two CSS selectors. These use style information on the page to target the elements:
.SearchResults-link
.SearchResults-item
"." means class, which is like saying .getElementsByClassName. The first gets the links, and the second gets the description information on the first page.
With respect to the first CSS selector: The actual link required is dynamically constructed, but we can use the fact that the actual profile URLs have a common base string of "https://www.xing.com/profile/", which is then followed by the profileName. So, in function GetURL, we parse the outerHTML returned by the CSS selector to get the profileName and concatenate it with the BASESTRING constant to get our actual profile link.
Code:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.xing.com/publicsearch/query?search%5Bq%5D=IT"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim a As Object, exitTime As Date, linksNodeList As Object, profileNodeList As Object
' exitTime = Now + TimeSerial(0, 0, 5) '<== uncomment this section if timing problems
'
' Do
' DoEvents
' On Error Resume Next
' Set linksNodeList = .document.querySelectorAll(".SearchResults-link")
' On Error GoTo 0
' If Now > exitTime Then Exit Do
' Loop While linksNodeList Is Nothing
Set linksNodeList = .document.querySelectorAll(".SearchResults-link") '<== comment this out if uncommented section above
Set profileNodeList = .document.querySelectorAll(".SearchResults-item")
Dim i As Long
For i = 0 To profileNodeList.Length - 1
Debug.Print "Profile link: " & GetURL(linksNodeList.item(i).outerHTML)
Debug.Print "Basic info: " & profileNodeList.item(i).innerText
Next i
End With
End Sub
Public Function GetURL(ByVal htmlSection As String) As String
Const BASESTRING As String = "https://www.xing.com/profile/"
Dim arr() As String
arr = Split(htmlSection, "/")
GetURL = BASESTRING & Replace$(Split((arr(UBound(arr) - 1)), ">")(0), Chr$(34), vbNullString)
End Function
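To make the parsing in GetURL easier to follow, here is a sketch that repeats the same steps with one named variable per step. ProfileUrlFromOuterHtml is a hypothetical stand-alone copy, and the anchor markup in the demo is made up for illustration; the real outerHTML on the page may be shaped differently.

```vba
Option Explicit

' Hypothetical step-by-step copy of the GetURL logic, for illustration only.
Public Function ProfileUrlFromOuterHtml(ByVal htmlSection As String) As String
    Const BASESTRING As String = "https://www.xing.com/profile/"
    Dim parts() As String, tail As String, profileName As String

    parts = Split(htmlSection, "/")              ' cut the markup at every "/"
    If UBound(parts) < 1 Then Exit Function      ' no "/" found: nothing to parse
    tail = parts(UBound(parts) - 1)              ' the piece just before the closing "/a>"
    profileName = Split(tail & ">", ">")(0)      ' keep what precedes the opening tag's ">"
    profileName = Replace$(profileName, Chr$(34), vbNullString) ' drop the trailing quote
    ProfileUrlFromOuterHtml = BASESTRING & profileName
End Function

Public Sub DemoProfileUrl()
    ' Made-up markup; Jane_Doe is a placeholder profile name.
    Debug.Print ProfileUrlFromOuterHtml( _
        "<a class=""SearchResults-link"" href=""/profile/Jane_Doe"">Jane Doe</a>")
End Sub
```

With the made-up markup, the second-to-last piece after splitting on "/" is Jane_Doe">Jane Doe<, and cutting it at ">" and removing the quote leaves Jane_Doe. Note that the approach depends on the profile name being the last path segment of the href, and on the link text containing no "/".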