Web-Scraping for Table Format - HTML

I am trying to copy data held in TR and TD elements. I am new to data scraping, so I did not find any useful results on Google.
I want to copy each unit size (e.g. 5*5) into column A, the Regular Price into B, the Cash Price into D, the 'First 3 months 50% Off' offer into F, and Reserve into column G.
http://westgateselfstorage.com/index.php?page=estimator
Frankly speaking, I'm not able to frame the code, so it would be really helpful if someone could help me build it.

Clipboard
Do you care about layout? You can copy directly via the clipboard and keep the same layout as the page.
Option Explicit
'VBE > Tools > References:
' Microsoft HTML Object Library
Public Sub GetTable()
    Dim html As HTMLDocument, clipboard As Object, ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") 'late-bound MSForms DataObject for clipboard access
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://westgateselfstorage.com/index.php?page=estimator", False
        .send
        html.body.innerHTML = .responseText
    End With
    clipboard.SetText html.querySelector("#hiderow").outerHTML
    clipboard.PutInClipboard
    ws.Range("A1").PasteSpecial
End Sub
QuerySelector and surrogate:
If you do care more about the individual values, then we can use our usual method of finding the rows and looping them, using a surrogate HTMLDocument variable to house the HTML so we can leverage querySelector at a more granular level, given that we can't chain querySelector calls here. The only difference is that if we push HTML from the tr level into the surrogate HTMLDocument, we need to add table tags around it to ensure querySelectorAll can pick up the tds within the row, i.e. the columns.
Note that I haven't split the old price from the reduced price in the Cash Price column. If you need that, just let me know; for now, I have left both in.
Option Explicit
Public Sub GetTable()
    Dim html As HTMLDocument, html2 As HTMLDocument, ws As Worksheet, i As Long
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New HTMLDocument
    Set html2 = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://westgateselfstorage.com/index.php?page=estimator", False
        .send
        html.body.innerHTML = .responseText
    End With
    html.body.innerHTML = html.querySelector("#hiderow").outerHTML

    Dim headers(), rows As Object, results(), columns As Object
    headers = Array("Size", "Reg price", vbNullString, "Cash price", vbNullString, "Offers", "Reserve")
    'grab the rows
    Set rows = html.querySelectorAll("tr")
    ReDim results(1 To rows.Length, 1 To UBound(headers) + 1)
    For i = 1 To rows.Length - 1 'skip headers row
        html2.body.innerHTML = "<table>" & rows.item(i).outerHTML & "</table>" 'add table tags so the tds within the row can be selected
        Set columns = html2.querySelectorAll("td")
        results(i, 1) = columns.item(0).innerText
        results(i, 2) = columns.item(3).innerText
        results(i, 4) = columns.item(4).innerText
        results(i, 6) = columns.item(5).innerText
        results(i, 7) = "Reserve this unit"
    Next
    ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
    ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
getElement(s)By methods and chaining:
We could also be more traditional and loop the tr/td elements within the table, as the getElement(s)By methods allow us to chain to some extent (though I store results in variables to make it easy to compare with the above).
Note:
rows(i).getElementsByTagName("td")
is basically chaining onto an individual element (row) within
Set rows = hTable.getElementsByTagName("tr")
e.g.
hTable.getElementsByTagName("tr")(0).getElementsByTagName("td")
All columns in first row via chaining.
VBA:
Option Explicit
Public Sub GetTable()
    Dim html As HTMLDocument, hTable As HTMLTable, ws As Worksheet, i As Long
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://westgateselfstorage.com/index.php?page=estimator", False
        .send
        html.body.innerHTML = .responseText
    End With
    Set hTable = html.getElementById("hiderow")

    Dim headers(), rows As Object, results(), columns As Object
    headers = Array("Size", "Reg price", vbNullString, "Cash price", vbNullString, "Offers", "Reserve")
    'grab the rows
    Set rows = hTable.getElementsByTagName("tr")
    ReDim results(1 To rows.Length, 1 To UBound(headers) + 1)
    For i = 1 To rows.Length - 1 'skip headers row
        Set columns = rows(i).getElementsByTagName("td")
        results(i, 1) = columns(0).innerText
        results(i, 2) = columns(3).innerText
        results(i, 4) = columns(4).innerText
        results(i, 6) = columns(5).innerText
        results(i, 7) = "Reserve this unit"
    Next
    ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
    ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

You can use the Python library pandas for this:
import pandas as pd
tabs = pd.read_html('http://westgateselfstorage.com/index.php?page=estimator', header=0)
tabs[0].drop(tabs[0].columns[2], axis=1)
You may also want to drop column G since all rows there are identical.

Related

How to extract data from HTML divs into Excel

I am trying to extract the details on this webpage; they seem to be under certain divs with the classes "selection-left" and "selection-right". I haven't found a way to successfully pull them yet.
This is the URL - https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/
I want to copy the competition name and each participant and score.
I have tried using QHar's approach from this link - How to extract values from nested divs using VBA - but I'm getting an error on this line:
ReDim results(1 To countries.Length / 2, 1 To 4)
Here is the code I've been trying to make work
Option Explicit
Public Sub GetData()
    Dim html As HTMLDocument, ws As Worksheet, countries As Object, scores As Object, results(), i As Long, r As Long
    Set ws = ThisWorkbook.Worksheets("Sheet1"): Set html = New HTMLDocument: r = 1
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/", False
        .send
        html.body.innerHTML = .responseText
    End With
    Set participant = html.querySelectorAll(".market-content .selection-left"): Set scores = html.querySelectorAll("..market-content .selection-right")
    ReDim results(1 To countries.Length / 2, 1 To 4)
    For i = 0 To participant.Length - 1 Step 2
        results(r, 1) = participant.item(i).innerText: results(r, 2) = "'" & scores.item(i).innerText
        r = r + 1
    Next
    ws.Cells(1, 1).Resize(1, 4) = Array("Competition", "Participant", "Score")
    ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
I will need help to make this code work
Content is dynamically added, so it will not be present in the response to your current type of request; hence your error, as you have a nodeList of length 0. You could try making the POST requests the page makes, but it doesn't look like a quick and easy bit of coding. As this is a small project, I would go with browser automation instead, so that JavaScript can run on the page and you can click the 'show more' button. You will need a wait condition for the page to have properly loaded; I use the presence of the 'show more' button.
Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub GetOddsIE()
    Dim d As InternetExplorer, odds As Object, names As Object, i As Long
    Dim ws As Worksheet, results(), competition As String
    Set d = New InternetExplorer
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Const URL = "https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/"
    With d
        .Visible = False
        .Navigate2 URL
        While .Busy Or .ReadyState <> 4: DoEvents: Wend
        With .Document.getElementsByClassName("expandable-below-container-button")
            Do
                DoEvents
            Loop While .Length = 0 'wait for element to be present
            .Item(0).Click 'click on show more
        End With
        Set names = .Document.getElementsByClassName("selection-left-selection-name")
        Set odds = .Document.getElementsByClassName("odds-convert")
        competition = .Document.getElementsByClassName("league")(0).innerText
        ReDim results(1 To names.Length, 1 To 3)
        For i = 0 To names.Length - 1
            results(i + 1, 1) = competition
            results(i + 1, 2) = names.Item(i).innerText
            results(i + 1, 3) = "'" & odds.Item(i).innerText
        Next
        .Quit
    End With
    ws.Cells(1, 1).Resize(1, 3) = Array("Competition", "Participant", "Score")
    ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

Inner Loop design for webscraping

I want to import restaurant data such as restaurant name, phone number, website and address into Excel, but unfortunately I am getting sponsored results, and I am also not getting the website and full address, as those sit on the inner page you reach when you click on the hotel name. With some help from various platforms I have created the code below, but it is not working out. Please rectify the issue in my code. Website: https://www.yelp.com/search?cflt=restaurants&find_loc=San%20Francisco%2C%20CA&start=
Here is my code:
Sub GetInfo()
    Const URL$ = "https://www.yelp.com/search?cflt=restaurants&find_loc=San%20Francisco%2C%20CA&start="
    Dim Http As New XMLHTTP60, Html As New HTMLDocument, Htmldoc As New HTMLDocument, page&, I&
    For page = 0 To 1 ' this is where you change the last number for the pages to traverse
        With Http
            .Open "GET", URL & page * 30, False
            .send
            Html.body.innerHTML = .responseText
        End With
        With Html.querySelectorAll("[class*='searchResult']")
            For I = 0 To .Length - 1
                Htmldoc.body.innerHTML = .Item(I).outerHTML
                On Error Resume Next
                r = r + 1: Cells(r, 1) = Htmldoc.querySelector("[class*='heading--h3'] > a").innerText
                Cells(r, 2) = Htmldoc.querySelector("[class*='container'] > [class*='display--inline-block']").innerText
                ' Cells(r, 3) = Htmldoc.querySelector("[class*='container'] > address").innerText
                'Cells(r, 4) = Htmldoc.querySelector("[class*='container'] > address").NextSibling.innerText
                'Inner loop creation
                Cells(r, 5) = Htmldoc.querySelector("[class*='container'] > website").href ' Extract from window after clicking on hotel name
                Cells(r, 6) = Htmldoc.querySelector("[class*='container'] > fulladdress").innerText ' Extract from window after clicking on hotel name
                On Error GoTo 0
            Next I
        End With
    Next page
End Sub
You can use the free Yelp API to get the top 50 results from the business_search endpoint. Pass the sort_by parameter in the query string to get the top rated.
Use a JSON parser, such as jsonconverter.bas, to handle the response. After installing the code from that link in a standard module called JsonConverter, go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
The API instructions are here. You need to set up a test app, which requires some basic user info, and to verify your email. You will then receive an API key for authentication, which is passed in the Authorization header as shown below.
There is other info returned which you can parse if you want.
Option Explicit
Public Sub GetTopRestuarants()
    Dim json As Object, headers(), r As Long, c As Long
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://api.yelp.com/v3/businesses/search?term=restuarant&location=san-francisco&limit=50&sort_by=rating", False
        .setRequestHeader "Authorization", "Bearer yourAPIkey"
        .send
        Set json = JsonConverter.ParseJson(.responseText)("businesses")
        headers = Array("Restaurant name", "phone", "website", "address")
        Dim results(), item As Object
        ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
        For Each item In json
            r = r + 1
            results(r, 1) = item("name")
            results(r, 2) = item("phone")
            results(r, 3) = item("url")
            Dim subItem As Variant, address As String
            address = vbNullString
            For Each subItem In item("location")("display_address")
                address = address & Chr$(32) & subItem
            Next
            results(r, 4) = Trim$(address)
        Next
    End With
    With ActiveSheet
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With
End Sub
Example top 20 of 50 returned:
Caveat emptor
Note that specifying the sort_by is a suggestion (not strictly enforced) to Yelp's search, which considers multiple input parameters to return the most relevant results. For example, the rating sort is not strictly sorted by the rating value, but by an adjusted rating value that takes into account the number of ratings, similar to a Bayesian average. This is to prevent skewing results to businesses with a single review.
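For intuition only (this is an illustration, not Yelp's published formula), a Bayesian-style average blends an item's own mean rating with a prior mean, weighted by how many reviews it has:
'Illustrative only: a generic Bayesian average, not Yelp's actual ranking.
'r = item's mean rating, n = number of reviews,
'm = prior mean rating, c = prior weight (how many reviews the prior is "worth").
Public Function BayesianAverage(ByVal r As Double, ByVal n As Double, _
                                ByVal m As Double, ByVal c As Double) As Double
    BayesianAverage = (c * m + n * r) / (c + n)
End Function
With this kind of adjustment, a business with a single 5-star review ends up ranked below one with hundreds of 4.5-star reviews.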
This is one of the ways which will let you parse the results from its inner pages. I can't access that webpage anymore to assist you further; however, give it a shot. I suppose it will work:
Sub GetInfo()
    Const URL$ = "https://www.yelp.com/search?cflt=restaurants&find_loc=San%20Francisco%2C%20CA&start="
    Const base$ = "https://www.yelp.com"
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim oTitle$, oPhone As Object, Htmldoc As New HTMLDocument
    Dim R&, newUrl$, I&, oWeb As Object, page&, oAddress As Object

    [A1:D1] = [{"Name","Phone","Address","Website"}]

    For page = 1 To 3 'this is where you change the last number for this script to traverse
        With Http
            .Open "GET", URL & page * 30, False
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .send
            Html.body.innerHTML = .responseText
        End With
        With Html.querySelectorAll("[class*='searchResult'] [class*='heading--h3'] > a")
            For I = 0 To .Length - 1
                If Not InStr(.item(I).getAttribute("href"), "/adredir?") > 0 Then
                    oTitle = .item(I).innerText
                    newUrl = Replace(.item(I).getAttribute("href"), "about:", base)
                    With Http
                        .Open "GET", newUrl, False
                        .setRequestHeader "User-Agent", "Mozilla/5.0"
                        .send
                        Htmldoc.body.innerHTML = .responseText
                    End With
                    R = R + 1: Cells(R + 1, 1) = oTitle
                    Set oPhone = Htmldoc.querySelector(".biz-phone")
                    If Not oPhone Is Nothing Then
                        Cells(R + 1, 2) = oPhone.innerText
                    End If
                    Set oAddress = Htmldoc.querySelector(".map-box-address")
                    If Not oAddress Is Nothing Then
                        Cells(R + 1, 3) = WorksheetFunction.Clean(oAddress.innerText)
                    End If
                    Set oWeb = Htmldoc.querySelector(".biz-website > a")
                    If Not oWeb Is Nothing Then
                        Cells(R + 1, 4) = oWeb.innerText
                    End If
                End If
            Next I
        End With
    Next page
End Sub
By the way, the ads (sponsored results) have been filtered out.

Loop the header array with elements

I am looking for code which can loop a header array of class names, but it must not rely on the tag name or id. This is to ensure that if any class does not exist, the corresponding cell is left blank and the next element is copied.
I tried to add the header array like
headers = Array("size", "features", "promo", "in store", "web")
But it needs to be looped together with the tag name, which I don't want.
I also want the promo (class name "promo_offers"), e.g. '1st Month Free!', in row 2. The problem is that this promo is given for specific cells only, so the data is misleading: I get the promo in the first 4 cells and then an error.
However, I want to copy the promo only for those units where promo information is given; otherwise the cell should be blank or set to some other value. Below is the code...
Please suggest how to frame the code.
Sub GetClassNames()
    Dim html As HTMLDocument
    Dim objIE As Object
    Dim element As IHTMLElement
    Dim ie As InternetExplorer
    Dim elements As IHTMLElementCollection
    Dim result As String 'string variable that will hold our result link
    Dim count As Long
    Dim erow As Long

    'initiating a new instance of Internet Explorer and assigning it to objIE
    Set objIE = New InternetExplorer
    'make IE browser visible (False would allow IE to run in the background)
    objIE.Visible = True
    'navigate IE to this web page (a pretty neat search engine really)
    objIE.navigate "https://www.allstorageonline.com/storage-units/texas/amarillo/all-storage-hardy-115423/#utm_source=GoogleLocal&utm_medium=WWLocal&utm_campaign=115423"
    'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    count = 0
    Set html = objIE.document
    Set elements = html.getElementsByClassName("unit_size medium")
    For Each element In elements
        If element.className = "unit_size medium" Then
            erow = Sheet2.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
            Cells(erow, 1) = html.getElementsByClassName("unit_size medium")(count).innerText
            Cells(erow, 2) = html.getElementsByClassName("promo_offers")(count).innerText
            count = count + 1
        End If
    Next element
End Sub
If anything, e.g. the promo, is null, then the corresponding cell should be left blank and the next element should be copied.
You can get all that info using XMLHTTP.
I grab all the li elements for the boxes and loop those, putting the HTML of each li into a new HTMLDocument. I then use that object's querySelector method to get the other items within each row via CSS selectors. I wrap the selections in On Error Resume Next / On Error GoTo 0 to mask errors when attempting to access elements that are not present, e.g. some rows do not have a promo. Those entries are then left blank as requested.
Option Explicit
Public Sub GetInfo()
    Dim ws As Worksheet, html As HTMLDocument, s As String
    Const URL As String = "https://www.allstorageonline.com/storage-units/texas/amarillo/all-storage-hardy-115423"
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        s = .responseText
        html.body.innerHTML = s
    End With

    Dim headers(), results(), listings As Object
    headers = Array("Size", "Features", "Promo", "In store", "Web")
    Set listings = html.querySelectorAll(".li_unit_listing")

    Dim rowCount As Long, numColumns As Long, r As Long, c As Long, item As Long
    rowCount = listings.Length
    numColumns = UBound(headers) + 1
    ReDim results(1 To rowCount, 1 To numColumns)

    Dim html2 As HTMLDocument
    Set html2 = New HTMLDocument
    For item = 0 To listings.Length - 1
        r = r + 1
        html2.body.innerHTML = listings.item(item).innerHTML
        On Error Resume Next
        results(r, 1) = Trim$(html2.querySelector(".unit_size").innerText)
        results(r, 2) = Trim$(html2.querySelector(".features").innerText)
        results(r, 3) = Trim$(html2.querySelector(".promo_offers").innerText)
        results(r, 4) = html2.querySelector(".board_rate").innerText
        results(r, 5) = html2.querySelector("[itemprop=price]").getAttribute("content")
        On Error GoTo 0
    Next
    ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
    ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

Data Scraping Elements By ClassName

I am trying to pull data from a website. I want to copy the 10' x 5' unit size (class name "unit_size medium") into row 1, which I am able to do successfully, but I also want the promo (class name "promo_offers"), '1st Month Free!', in row 2. The problem is that this promo is given for specific cells only, so the data is misleading: I get the promo in the first 4 cells and then an error. However, I want to copy the promo only for those units where promo information is given; otherwise the cell should be blank or set to some other value. Below is the code...
Please suggest how to frame the code.
Sub GetClassNames()
    Dim html As HTMLDocument
    Dim objIE As Object
    Dim element As IHTMLElement
    Dim ie As InternetExplorer
    Dim elements As IHTMLElementCollection
    Dim result As String 'string variable that will hold our result link
    Dim count As Long
    Dim erow As Long

    'initiating a new instance of Internet Explorer and assigning it to objIE
    Set objIE = New InternetExplorer
    'make IE browser visible (False would allow IE to run in the background)
    objIE.Visible = True
    'navigate IE to this web page (a pretty neat search engine really)
    objIE.navigate "https://www.allstorageonline.com/storage-units/texas/amarillo/all-storage-hardy-115423/#utm_source=GoogleLocal&utm_medium=WWLocal&utm_campaign=115423"
    'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    count = 0
    Set html = objIE.document
    Set elements = html.getElementsByClassName("unit_size medium")
    For Each element In elements
        If element.className = "unit_size medium" Then
            erow = Sheet2.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
            Cells(erow, 1) = html.getElementsByClassName("unit_size medium")(count).innerText
            Cells(erow, 2) = html.getElementsByClassName("promo_offers")(count).innerText
            count = count + 1
        End If
    Next element
End Sub
I would simply wrap the attempt to access the element in an On Error Resume Next. Have a place already reserved for it in an output array, so that if it is not present the place simply remains empty.
Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub GetData()
    Dim ie As New InternetExplorer, ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With ie
        .Visible = True
        .Navigate2 "https://www.allstorageonline.com/storage-units/texas/amarillo/all-storage-hardy-115423/#utm_source=GoogleLocal&utm_medium=WWLocal&utm_campaign=115423"
        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim listings As Object, listing As Object, headers(), results(), r As Long, c As Long
        headers = Array("size", "features", "promo", "in store", "web")
        Set listings = .document.getElementById("small_units_accordion_panel").getElementsByTagName("li")
        '.unit_size medium, .features, .promo_offers, .board_rate_wrapper p, .board_rate
        ReDim results(1 To listings.Length, 1 To UBound(headers) + 1)
        For Each listing In listings
            r = r + 1
            On Error Resume Next
            results(r, 1) = listing.getElementsByClassName("unit_size medium")(0).innerText
            results(r, 2) = listing.getElementsByClassName("features")(0).innerText
            results(r, 3) = listing.getElementsByClassName("promo_offers")(0).innerText
            results(r, 4) = listing.getElementsByClassName("board_rate")(0).innerText
            results(r, 5) = listing.getElementsByClassName("price")(0).innerText
            On Error GoTo 0
        Next
        ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
        .Quit
    End With
End Sub
All boxes:
Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub GetData()
    Dim ie As New InternetExplorer, ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With ie
        .Visible = True
        .Navigate2 "https://www.allstorageonline.com/storage-units/texas/amarillo/all-storage-hardy-115423/#utm_source=GoogleLocal&utm_medium=WWLocal&utm_campaign=115423"
        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim listings As Object, listing As Object, headers(), results()
        Dim r As Long, list As Object, item As Object
        headers = Array("size", "features", "promo", "in store", "web")
        Set list = .document.getElementsByClassName("main_unit")
        '.unit_size medium, .features, .promo_offers, .board_rate_wrapper p, .board_rate
        Dim rowCount As Long
        rowCount = .document.querySelectorAll(".main_unit li").Length
        ReDim results(1 To rowCount, 1 To UBound(headers) + 1)
        For Each listing In list
            For Each item In listing.getElementsByTagName("li")
                r = r + 1
                On Error Resume Next
                results(r, 1) = item.getElementsByClassName("unit_size medium")(0).innerText
                results(r, 2) = item.getElementsByClassName("features")(0).innerText
                results(r, 3) = item.getElementsByClassName("promo_offers")(0).innerText
                results(r, 4) = item.getElementsByClassName("board_rate")(0).innerText
                results(r, 5) = item.getElementsByClassName("price")(0).innerText
                On Error GoTo 0
            Next
        Next
        ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
        .Quit
    End With
End Sub

How do I clean up objects in Excel VBA?

Public Sub D_Galoplar()
    Application.ScreenUpdating = False
    Dim Asay(1 To 250)
    Dim Jsay(1 To 100)
    For q = 2 To Sheets("Y").Columns("A:A").Find(What:="boş").Row - 1
        Asay(q - 1) = Sheets("Y").Range("A" & q)
    Next q
    For q = 2 To Sheets("Y").Columns("C:C").Find(What:="boş").Row - 1
        Jsay(q - 1) = Sheets("Y").Range("C" & q)
    Next q
    For w = 1 To 250
        Cells.Delete Shift:=xlUp
        Range("A1").Select
        If Asay(w) < 1 Then Exit For
        Dim elem As Object, trow As Object
        Dim R&, C&, s$
        With New XMLHTTP60
            .Open "POST", "https://yenibeygir.com/at/getatdetaytab", False
            .setRequestHeader "content-type", "application/x-www-form-urlencoded; charset=UTF-8"
            .send "tab=galopTab&id=" & Asay(w)
            s = .responseText
        End With
        With New HTMLDocument
            .body.innerHTML = s
            For Each elem In .getElementsByClassName("at_Galoplar")(0).Rows
                For Each trow In elem.Cells
                    C = C + 1: Cells(R + 1, C) = trow.innerText
                Next trow
                C = 0: R = R + 1
            Next elem
        End With
        Cells.UnMerge
        Range("A1").Select
        If Range("A1048576").End(xlUp).Row < 2 Then GoTo ATLA2
        Columns("A:A").Insert
        For i = 2 To Range("B1048576").End(xlUp).Row - 1
            Range("A" & i) = Asay(w)
        Next i
        Range("O2").FormulaR1C1 = "=IF(ISBLANK(RC[-9]),""-"",IF(ISNUMBER(RC[-9]),RC[-9]/4,((LEFT(RC[-9],1)*6000)+(MID(RC[-9],3,2)*100)+(RIGHT(RC[-9],1)*10))/400))"
        Range("P2").FormulaR1C1 = "=IF(ISBLANK(RC[-9]),""-"",IF(ISNUMBER(RC[-9]),RC[-9]/6,((LEFT(RC[-9],1)*6000)+(MID(RC[-9],3,2)*100)+(RIGHT(RC[-9],1)*10))/600))"
        Range("Q2").FormulaR1C1 = "=IF(ISBLANK(RC[-9]),""-"",IF(ISNUMBER(RC[-9]),RC[-9]/8,((LEFT(RC[-9],1)*6000)+(MID(RC[-9],3,2)*100)+(RIGHT(RC[-9],1)*10))/800))"
        Range("R2").FormulaR1C1 = "=IF(ISBLANK(RC[-9]),""-"",IF(ISNUMBER(RC[-9]),RC[-9]/10,((LEFT(RC[-9],1)*6000)+(MID(RC[-9],3,2)*100)+(RIGHT(RC[-9],1)*10))/1000))"
        Range("S2").FormulaR1C1 = "=IF(ISBLANK(RC[-9]),""-"",IF(ISNUMBER(RC[-9]),RC[-9]/12,((LEFT(RC[-9],1)*6000)+(MID(RC[-9],3,2)*100)+(RIGHT(RC[-9],1)*10))/1200))"
        Range("T2").FormulaR1C1 = "=IF(ISBLANK(RC[-9]),""-"",IF(ISNUMBER(RC[-9]),RC[-9]/14,((LEFT(RC[-9],1)*6000)+(MID(RC[-9],3,2)*100)+(RIGHT(RC[-9],1)*10))/1400))"
        Range("O2:T2").Copy
        Range("O2:O" & Range("A1048576").End(xlUp).Row).PasteSpecial Paste:=xlPasteFormulas
        Cells.Copy
        Cells.PasteSpecial Paste:=xlPasteValues
        Columns("O:T").Cut Columns("F:K")
        Range("A2:N" & Range("A1048576").End(xlUp).Row).Copy
        Sheets("Galop").Range("A" & Sheets("Galop").Range("A1048576").End(xlUp).Row + 1).PasteSpecial Paste:=xlPasteValues
ATLA2:
        Cells.Delete Shift:=xlUp
    Next w
End Sub
I want to fetch a lot of data with the For...Next loop, but after a while the page hangs. How can I reset the objects at the end of each cycle?
Asay numbers
10182
10221
10279
10303
10316
10325
10360
10370
10680
11598
11629
11715
11745
12335
12385
12533
12559
13154
13393
13635
13641
13669
13673
14027
14057
14062
14228
14619
14674
14687
14743
14770
14778
15197
15217
15323
15382
15507
15775
15828
16077
16335
16510
17149
17513
17867
18532
37964
60176
66067
66255
66581
66582
66896
66998
67056
67309
67356
67379
67473
68008
68012
68162
68298
68312
68320
68332
68333
68353
68383
68545
68702
68775
68922
69445
69606
69817
69963
69968
69985
69986
70048
70202
71372
(boş)
The slowdown may be due to throttling of the network if you are trying to hit the site too many times in quick succession, which is particularly likely given your access method. It would be better to see if an API is available for bulk access to the info. You are also likely going through many networks to reach this page; it may be possible to get some basic info about delays with the TRACERT command from a command prompt (e.g. tracert yenibeygir.com).
You are doing a POST, so remember there is a fair amount of server-side work going on as well.
You don't need to set elem to Nothing as it only exists during your For loop. The same goes for trow.
Putting .getElementsByClassName("at_Galoplar")(0).Rows into a variable will provide faster referencing.
Writing the results to an array first and then dumping the array out to the sheet in one go will provide a significant improvement in speed.
Using the New keyword can lead to unexpected behaviour. You can create one instance of HTMLDocument and work with that, provided you have good error handling in place. I have had occasional cases in a loop where I have had to set the HTMLDocument to Nothing before looping back round.
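For illustration, a minimal sketch of reusing one surrogate document inside a loop; the loop bounds, the innerHTML assignment and the parsing step are placeholders, not your actual code:
Dim doc As HTMLDocument, w As Long
Set doc = New HTMLDocument 'one instance created up front
For w = 1 To 250
    doc.body.innerHTML = "<html><body></body></html>" 'placeholder: assign the response text here
    '... parse doc into an output array here ...
    'if the surrogate misbehaves between iterations, release and recreate it:
    Set doc = Nothing
    Set doc = New HTMLDocument
Next w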
Personally, I would cheat and re-write this to leverage the fact that you can issue GET requests to get the same info. I use a class to hold the XMLHTTP object and an array to hold the results, and I write the results out in one go. This takes a few seconds to run for me. The asay numbers are in Sheet1, range A1:A84.
Class module clsHTTP
Option Explicit
Private http As Object

Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetString(ByVal url As String) As String
    Dim sResponse As String
    With http
        .Open "GET", url, False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
        GetString = sResponse
    End With
End Function
Standard module 1
Option Explicit
Public Sub DGaloplar()
    Dim asays(), ws As Worksheet, asay As Long, html As HTMLDocument
    Dim http As clsHTTP, url As String, headers(), numberOfRequests As Long
    headers = Array("Asay", "Tarih", "Sehir", "Kg", "Jokey", "400", "600", "800", "1000", "1200", "1400", "Ç", "Pist", "Durum")
    Set http = New clsHTTP
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New HTMLDocument
    asays = Application.Transpose(ws.Range("A1:A84").Value) 'Load asay values from sheet 1

    Const numTableRows As Long = 11
    Const numTableColumns As Long = 15
    Const BASE_URL As String = "https://yenibeygir.com/at/getatdetaytab/?tab=galopTab&id="
    numberOfRequests = UBound(asays)

    Dim results(), headerRow As Boolean, tRows As Object, tRow As Object, iRow As Long
    Dim tCells As Object, tCell As Object, r As Long, c As Long, hTable As HTMLTable
    ReDim results(1 To numTableRows * numberOfRequests, 1 To numTableColumns)

    Application.ScreenUpdating = False
    For asay = 1 To numberOfRequests
        headerRow = True
        url = BASE_URL & asays(asay)
        html.body.innerHTML = http.GetString(url)
        Set hTable = html.querySelector(".at_Galoplar")
        Set tRows = hTable.getElementsByTagName("tr")
        For Each tRow In tRows
            If Not headerRow Then
                c = 2: r = r + 1
                results(r, 1) = asays(asay)
                Set tCells = tRow.getElementsByTagName("td")
                For Each tCell In tCells
                    results(r, c) = tCell.innerText
                    c = c + 1
                Next
            End If
            headerRow = False
        Next
    Next
    With ws
        .Cells(1, 3).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 3).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With
    Application.ScreenUpdating = True
End Sub
References:
Microsoft HTML Object Library
In general, Set elem = Nothing is what you need.
In your code you are assigning the variables within a For Each loop, so even if you set them to Nothing afterwards, there would not be a performance bonus.
Try setting those objects to Nothing, like below:
Set elem = Nothing
Set trow = Nothing
I am not sure whether you need the variable declarations inside your loop; you could take them out of the loop, which might save some time.
But I think it is your HTTP requests that are taking so long, not any VBA code.
UPDATE
Try setting Application.EnableEvents and Application.ScreenUpdating to False at the beginning of the macro and setting them back to True at the end.
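A minimal sketch of that wrapper, with the body left as a placeholder:
Public Sub ExampleWrapper()
    Application.ScreenUpdating = False
    Application.EnableEvents = False
    '... your scraping and sheet-writing code goes here ...
    Application.EnableEvents = True
    Application.ScreenUpdating = True
End Sub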