My code below extracts a value for each hour of the day.
However, the webpage I'm scraping can change, so I want a way to assign the location of the element to a variable so that the code knows its index every time. I found the current index, "116", by trial and error.
I have included the HTML structure below as well. Any suggestions?
Sub scrape()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.application")
With IE
.Visible = False
.navigate "web address"
Do Until .readyState = 4
DoEvents
Loop
.document.all.item("Login1_UserName").Value = "user"
.document.all.item("Login1_Password").Value = "pw"
.document.all.item("Login1_LoginButton").Click
Do Until .readyState = 4
DoEvents
Loop
End With
Dim htmldoc As Object
Dim r
Dim c
Dim aTable As Object
Dim TDelement As Object
Set htmldoc = IE.document
Dim td As Object
For Each td In htmldoc.getElementsByTagName("td")
On Error Resume Next
If td.Children(0).id = "ctl00_PageContent_grdReport_ctl08_Label50" Then
ThisWorkbook.Sheets("sheet1").Range("j8").Offset(r, c).Value = td.Children(1).innerText
End If
On Error GoTo 0
Next td
End Sub
HTML:
<form name="aspnetForm" id="aspnetForm" action="./MinMaxReport.aspx"
method="post">
<div>
</div>
<script type="text/javascript">...</script>
<div>
</div>
<table class="header-table">...</table>
<table class="page-area">
<tbody>
<tr>
<table id="ctl00_PageContent_Table1" border="0">...</table>
<table id="ctl00_PageContent_Table2" border="0">
<tbody>
<tr>
<td>
<div id="ctl00_PageContent_grdReport_div">
<tbody>
<tr style="background-color: beige;">
<td>...</td>
<td>
<span id="ctl00_PageContent_grdReport_ctl08_Label50">Most Restrictive
Capacity Maximum</span>
</td>
<td>
<span id="ctl00_PageContent_grdReport_ctl08_Label51">159</span>
</td>
</tr>
</tbody>
</div>
</td>
</tr>
</tbody>
</table>
</table>
</tr>
</tbody>
</table>
You could loop through all the TDs and check whether the first child's id is "ctl00_PageContent_grdReport_ctl08_Label50", for example:
For Each td In htmldoc.getElementsByTagName("td")
On Error Resume Next
If td.Children(0).ID = "ctl00_PageContent_grdReport_ctl08_Label50" Then
ThisWorkbook.Sheets("sheet1").Range("j8").Offset(r, c).Value = td.Children(1).innerText
End If
On Error GoTo 0
Next td
Children(0) picks the first HTML element contained in the table cell. On Error Resume Next covers the case where a td element has no children.
It is possible that more than one element on your page has this id. In that case, you must identify the table or table row first. I cannot do that for you because I can't see your whole HTML code.
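One thing to check against the HTML you posted: the value 159 appears to sit in the next <td>, not in a second child of the label's cell, so td.Children(1) may come back empty. Below is a minimal sketch of an alternative. It assumes the Label50 id stays stable and that the value is always in the cell immediately to the right; both assumptions rest only on the posted snippet. The procedure name DemoNextCell and the inline HTML string are made up for illustration. In real code you would use IE.document instead of the test document.

```vba
Sub DemoNextCell()
    Dim doc As Object, lbl As Object, cell As Object, rowEl As Object

    ' Build a small test document from the posted snippet (illustration only)
    Set doc = CreateObject("htmlfile")
    doc.write "<table><tr><td>...</td>" & _
              "<td><span id='ctl00_PageContent_grdReport_ctl08_Label50'>Most Restrictive Capacity Maximum</span></td>" & _
              "<td><span id='ctl00_PageContent_grdReport_ctl08_Label51'>159</span></td>" & _
              "</tr></table>"
    doc.Close

    ' Find the label span by id, then step from its cell to the neighbouring cell
    Set lbl = doc.getElementById("ctl00_PageContent_grdReport_ctl08_Label50")
    If lbl Is Nothing Then Exit Sub

    Set cell = lbl.parentElement      ' the <td> that holds the label
    Set rowEl = cell.parentElement    ' the <tr> that holds that <td>

    ' Guard against the label being in the last cell of the row
    If cell.cellIndex + 1 < rowEl.Cells.Length Then
        Debug.Print rowEl.Cells(cell.cellIndex + 1).innerText
    End If
End Sub
```

This avoids the page-wide index entirely, so the lookup should keep working if rows are added above it, as long as the id itself does not change.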
I'm writing code to automatically fill in a website form with cell values:
Sub prueba()
Dim oIE As InternetExplorer: Set oIE = New InternetExplorer
Dim oDocument As HTMLDocument
Dim ECICOR As HTMLSelectElement
Dim i As Integer, j As Integer
Dim x As Long
oIE.Visible = True
oIE.Navigate "http://sirem.eci.geci/smcfs/console/login.jsp"
Do While oIE.readyState <> 4: DoEvents: Loop
With oDocument
Set oDocument = oIE.Document
End With
Call oDocument.parentWindow.execScript("window.parent.sc.postDummyFormForWindow('/smcfs/console/inventory.search');", "JScript")
Set ECICOR = oDocument.getElementById("enterpriseFieldObj")
ECICOR.Focus
ECICOR.Click
ECICOR.Value = "ECICOR"
ECICOR.FireEvent ("onChange")
oDocument.getElementsByClassName("unprotectedinput")(0).Value = Cells(i, 1)
oDocument.getElementsByTagName("a")(0).Click
oDocument.getElementsbyClassName("evenrow")(1).click
End Sub
My problem is that the program does nothing after the last line of the code, and I don't know why, because it worked before.
Here you can see the HTML code:
<TR class=evenrow><TD class=checkboxcolumn><INPUT type=checkbox value=%3CInventoryItem+ItemID%3D%22000000000152030052%22+OrganizationCode%3D%22ECICOR%22+ProductClass%3D%22%22+UnitOfMeasure%3D%22%22%2F%3E name=EntityKey oldChecked="false"> <INPUT type=hidden value=000000000152030052 name=ItemID_1> <INPUT type=hidden name=UOM_1> <INPUT type=hidden name=PC_1> <INPUT type=hidden value=ECICOR name=OrgCode_1> </TD>
<TD class=tablecolumn><A onclick="javascript:showDetailFor('%3CInventoryItem+ItemID%3D%22000000000152030052%22+OrganizationCode%3D%22ECICOR%22+ProductClass%3D%22%22+UnitOfMeasure%3D%22%22%2F%3E');return false;" href="">000000000152030052</A> </TD>
<TD class=tablecolumn></TD>
<TD class=tablecolumn></TD>
<TD class=tablecolumn>001097578527174</TD></TR>
How can I find a solution?
document.getElementsByClassName() returns a collection of elements, not a single element. Even if only one element has the unprotectedinput class, you need to take the first item (index 0) of the collection that document.getElementsByClassName() returns.
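The code in the question already indexes the collections with (0), so two other things may be worth checking. First, i is never assigned, so Cells(i, 1) evaluates to Cells(0, 1), which is not a valid cell. Second, the page may not have finished loading after execScript or Click when the next line runs. Below is a sketch of a wait helper for the second point. The name WaitForPage and the 30-second default are assumptions for illustration, not part of the original code.

```vba
' Hypothetical helper: wait until IE reports the page as loaded,
' or give up after timeoutSeconds. Returns True if the page loaded in time.
Public Function WaitForPage(ByVal browser As Object, _
                            Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    startTime = Timer   ' seconds since midnight; the midnight rollover is ignored in this sketch

    Do While browser.Busy Or browser.readyState <> 4
        DoEvents
        If Timer - startTime > timeoutSeconds Then Exit Function   ' leaves the result as False
    Loop

    WaitForPage = True
End Function
```

You could call it after each navigation or click, for example `If Not WaitForPage(oIE) Then MsgBox "Timed out"`, before touching elements on the new page.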
I need help scraping the contents of these tags into Excel from an internal company website.
This is the source code.
<br />
<span class="RptTitle"><input id="chkPromisDataLog" type="checkbox" name="chkPromisDataLog" checked="checked" onclick="showOnOffPromisLog();" /><label for="chkPromisDataLog">Promis Processing data log [83508442.1].</label></span>
<div id="divPromisDataLog" style="display: none;">
<table id="tblPromisDataLog" cellspacing="0" cellpadding="0" width="100%" border="0" class="table">
<tr>
<td width="60%"></td>
<td>
<a class="textnormal" href="javascript:popwnd=window.open('../Tools/ExportExcel.aspx?KEY=LOT_GEN_PROMIS','popwnd','status=no,toolbar=Yes,menubar=Yes,location=no,scrollbars=yes,resizable=Yes');popwnd.focus()">
Export to Excel
</a>
</td>
</tr>
<tr>
<td colspan="2">
<table cellspacing="0" rules="all" border="1" id="dgPromisDataLog" style="border-color: Black; border-collapse: collapse;">
<tr class="rptDetailsHeaderMgt" align="center">
<td>LotID</td>
<td>Hist Stage</td>
<td>Datein</td>
<td>Dateout</td>
<td>Qtyin</td>
<td>Qtyout</td>
<td>M/C ID</td>
<td>Emp TrackOut</td>
<td>Hold Code</td>
<td>Hold Reason</td>
<td>Staging (Hrs)</td>
</tr>
<tr class="rptDetailsItemMgt" align="center" style="white-space: nowrap;">
<td>83508442.1</td>
<td>
<a
href="javascript:popwnd=window.open('LotGen_Dtl.aspx?iDate=04/09/2021 09:07:07 PM&oDate=04/10/2021 03:47:59 PM&oLotid=83508442.1&oStage=C-WFRPROCS&oLastRow=N','popwnd','width=900,height=600,status=no,toolbar=no,menubar=no,location=no,scrollbars=yes,top=100,right=50,left=50');popwnd.focus();"
>
C-WFRPROCS
</a>
</td>
<td>4/9/2021 9:07:07 PM</td>
<td>4/10/2021 3:47:59 PM</td>
<td>0</td>
<td>9</td>
<td></td>
<td>10911700</td>
<td> </td>
<td> </td>
<td>18.68</td>
</tr>
</table>
</td>
</tr>
</table>
</div>
This is roughly my code
Sub Lotsearch()
Dim ie As InternetExplorer
Dim htmlEle As IHTMLElement
Dim i As Integer
Set ie = New InternetExplorer 'start new IE page
ie.Visible = True 'View what is happening in IE
ie.navigate "www.internalcompanywebsite.aspx" 'Open link in IE
While ie.readyState <> 4 'Waits for IE to finish loading
DoEvents
Wend
i = 1
'ie.document.getElementById("tblPromisDataLog") = Cells(2, 1).Value
'ie.document.getElementsByTagName("td").Value = Cells(5, 1).Value
'Set Data = ie.document.getElementByTagName("rptDetailsItemMgt")
'Dim myValue As String
'myValue = allRowOfData.Cells(0).innerHTML
'Cells(3, 13) = myValue
'Range("L1").Value = myValue
'For Each htmlEle In ie.document.getElementById("tblPromisDataLog")(0).getElementsByClassName("rptDetailsItemMgt")
With ActiveSheet
.Range("A" & i).Value = htmlEle.Children(0).textContent
' .Range("B" & i).Value = htmlEle.Children(1).textContent
' .Range("C" & i).Value = htmlEle.Children(2).textContent
' .Range("D" & i).Value = htmlEle.Children(3).textContent
' .Range("E" & i).Value = htmlEle.Children(4).textContent
' .Range("F" & i).Value = htmlEle.Children(5).textContent
' .Range("G" & i).Value = htmlEle.Children(6).textContent
' .Range("H" & i).Value = htmlEle.Children(7).textContent
' .Range("I" & i).Value = htmlEle.Children(8).textContent
' .Range("J" & i).Value = htmlEle.Children(9).textContent
' .Range("K" & i).Value = htmlEle.Children(10).textContent
' .Range("L" & i).Value = htmlEle.Children(11).textContent
End With
i = i + 1
Next htmlEle
ie.Quit
End Sub
As you can see, I have tried various methods but to no avail.
getElementbyID not working
getElementsbyTagName not working
getElementsByClassName not working
Any help would be appreciated. Thanks.
It may not be the most efficient way to handle HTML extraction, but you might consider regex matching. Raw Coding on YouTube recently published a very good regex tutorial; I remembered seeing this question and thought regex might be a good alternative if you don't like dealing with the HTML explicitly.
Regex Tutorial for Beginners from Raw Coding on YouTube
For example, if you only wanted the plain text between td tags, you could search with this regex:
(?<OpenTag>[\<]+td[\>]+)(?<Contents>[\w\/\(\)\[\]\.\&\:\;\s]*?)(?<CloseTag>[\<]+[\/]+[td]+[\>]+)
Here's an example at Regex101:
Regex101 example using your HTML
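Dim ht As HTMLDocument
Dim i As Integer
Dim htmltable As MSHTML.htmltable
Dim myValue As String
Set ht = ie.document ' assumption: ie is the InternetExplorer object from the question
' Fetch the table by id first, then query inside it in a separate statement
Set htmltable = ht.getElementById("dgPromisDataLog")
myValue = htmltable.getElementsByClassName("rptDetailsItemMgt")(0).getElementsByTagName("td")(0).innerText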
After experimenting with it for a few days, I found that the code works if I separate the getElementById call from the other getElements calls.
I changed htmlEle As IHTMLElement to ht As HTMLDocument and added htmltable As MSHTML.htmltable.
For some reason, the code raises an error if I chain all the getElement calls together in one expression. I hope this helps someone else with the same problem.
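Building on the same idea (get the table by id first, then work inside it), here is a sketch that copies every data row to the active sheet instead of reading a single cell. It assumes the data rows carry exactly the class rptDetailsItemMgt, as in the posted HTML. The procedure name DemoReadRows and the shortened inline HTML are made up for illustration. With a live page you would use ie.document in place of the test document.

```vba
Sub DemoReadRows()
    Dim doc As Object, tbl As Object, rw As Object, cl As Object
    Dim r As Long, c As Long

    ' Small stand-in for the real page (illustration only)
    Set doc = CreateObject("htmlfile")
    doc.write "<table id='dgPromisDataLog'>" & _
              "<tr class='rptDetailsHeaderMgt'><td>LotID</td><td>Qtyout</td></tr>" & _
              "<tr class='rptDetailsItemMgt'><td>83508442.1</td><td>9</td></tr>" & _
              "</table>"
    doc.Close

    ' Step 1: fetch the table on its own
    Set tbl = doc.getElementById("dgPromisDataLog")
    If tbl Is Nothing Then Exit Sub

    ' Step 2: walk its rows and cells, skipping the header row
    r = 1
    For Each rw In tbl.Rows
        If rw.className = "rptDetailsItemMgt" Then
            c = 1
            For Each cl In rw.Cells
                ActiveSheet.Cells(r, c).Value = Trim$(cl.innerText)
                c = c + 1
            Next cl
            r = r + 1
        End If
    Next rw
End Sub
```

Using the Rows and Cells collections of the table avoids chaining several getElements calls in one expression, which is where the error was showing up.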
I'm trying to scrape a website and need only one value.
How do I retrieve the purchase method using the code below? The HTML is shown after the code.
Private Sub CommandButton1_Click()
Dim IE As Object
' Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
' Set the next line to True to see the form results
IE.Visible = False
' URL to get data from
IE.Navigate Cells(1, 1)
' Statusbar
Application.StatusBar = "Loading, Please wait..."
' Wait while IE loading...
Do While IE.Busy
Application.Wait DateAdd("s", 5, Now)
Loop
Application.StatusBar = "Searching for value. Please wait..."
Dim tr As Object, td As Object, tb As Object
Dim value As String
Set tb = IE.Document.getElementById("prop_desc clearfix")
For Each tr In tb.Rows 'loop through the <tr> rows of your table
For Each td In tr.Cells 'loop through the <th> cells of your row
value = td.outerText 'your value is now in the variable "value"
MsgBox value
Next td
Next tr
' Show IE
IE.Visible = True
' Clean up
Set IE = Nothing
Application.StatusBar = ""
End Sub
</div>
</div>
<div class="prop_desc clearfix"><div class = "span-half">
<h3>RV Park/Campground for Sale</h3>
<table>
<tr>
<td>Number of RV Lots: </td>
<td>270</td>
</tr>
<tr>
<td>Size:</td>
<td>
157 acre(s)
</td>
</tr>
<tr>
<td>Purchase Method:</td>
<td>Cash, New Loan</td>
</tr>
<tr>
<td>Status:</td>
<td>
Active
</td>
</tr>
<tr>
<td>Property ID:</td>
<td>966994</td>
</tr>
<tr>
<td>Posted on:</td>
<td>Jul 10, 2018</td>
</tr>
<tr>
<td>Updated on:</td>
<td>Jul 10, 2018</td>
</tr>
<tr>
You can target the container by class name and index, and then the td by index:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.rvparkstore.com/rv-parks/902077--2843-lake-frontage-42-acres-for-sale-in-north-central-us"
While .Busy Or .readyState < 4: DoEvents: Wend
Debug.Print .document.getElementsByClassName("span-half")(0).getElementsByTagName("td")(5).innerText
.Quit '<== Remember to quit application
End With
End Sub
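The fixed index (5) depends on the row order staying the same. If that is a concern, a sketch like the one below looks up the value by its label text instead. It assumes each label cell is immediately followed by its value cell, as in the posted HTML. The function name ValueForLabel is made up for illustration.

```vba
' Hypothetical helper: return the text of the <td> that follows
' the <td> whose text equals labelText, or "" if no such label is found.
Public Function ValueForLabel(ByVal doc As Object, ByVal labelText As String) As String
    Dim tds As Object
    Dim i As Long

    Set tds = doc.getElementsByTagName("td")

    ' Stop one short of the end so that i + 1 is always a valid index
    For i = 0 To tds.Length - 2
        If Trim$(tds.Item(i).innerText) = labelText Then
            ValueForLabel = Trim$(tds.Item(i + 1).innerText)
            Exit Function
        End If
    Next i
End Function
```

Inside the With block above you could then write `Debug.Print ValueForLabel(.document, "Purchase Method:")`.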
UPDATE:
Url https://www.rvparkstore.com/rv-parks/902077--2843-lake-frontage-42-acres-for-sale-in-north-central-us
HTML Here https://docs.google.com/document/d/1J5tDV99IbzucCB_z8QX8lDa4X3ecxQbQOWeVy5B7Irg/edit?usp=sharing
Please use the links above for more information.
I am working on an Excel VBA project to scrape some specific information from a website. The view of this data on the website is as such:
Website View:
What I am looking to do is extract text based on two criteria: Name and post date. For example, I have the name Kaelan and the post date of 11/16/2016. I want to extract the amount of $365.
This is the HTML code:
<div class="familyLedgerAmountCategory" id="id_4541278">
<table>
<tr>
<td class="tdCategoryRow">
<div class="cmFloatLeft divExpandToggle expanded" id="divCategoryToggle_id_4541278"></div>
<div class="cmFloatLeft" id="divCategoryLabel_id_4541278" style="width: 430px;">
Kaelan
</div><span style="margin-left: 5px;">$ 465.00</span>
</td>
</tr>
<tbody>
<tr class="trListTableBody LedgerExisting" id="CamperFamilyLedgerRowControl_14816465">
<td class="tdCamperFamilyLedgerTableColumnDescription tdBorderTop" id="tdCamperFamilyLedgerTableColumnDescription_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnDescriptionCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
<a class="aColumnDescriptionCell" id="aColumnDescriptionCell_CamperFamilyLedgerRowControl_14816465" name="aColumnDescriptionCell_CamperFamilyLedgerRowControl_14816465" target="_self" title="Click to view details">2017 Super Early Bird Teen Camp - Tuition</a>
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnPostDate tdBorderTop" id="tdCamperFamilyLedgerTableColumnPostDate_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnPostDateCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
11/16/2016
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnEffective tdBorderTop" id="tdCamperFamilyLedgerTableColumnEffective_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnEffectiveCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
11/15/2016
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnQty tdBorderTop" id="tdCamperFamilyLedgerTableColumnQty_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnQtyCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
1
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnAmount tdBorderTop" id="tdCamperFamilyLedgerTableColumnAmount_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnAmountCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
$ 365.00
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnAction tdBorderTop" id="tdCamperFamilyLedgerTableColumnAction_CamperFamilyLedgerRowControl_14816465"></td>
</tr>
</tbody>
</table>
</div>
My attempt to pull the amount is as follows:
Sub Test()
Dim ie As Object
Dim oElement As Object
Dim wsTarget As Worksheet
Dim i As Integer
Dim NewWB As Workbook
Set NewWB = ActiveWorkbook
Set wsTarget = NewWB.Sheets(1)
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate website...
Wait 6
ie.document.All.Item("txtUserName").Value = "User"
ie.document.All.Item("pswdPassword").Value = "Pass"
Wait 1
ie.document.getElementById("btnLogin").Click
Wait 5
ie.navigate website...
i = 1
For Each oElement In ie.document.getElementsByClassName("cmFloatLeft")
If oElement.innerText = "Kaelan" Then
extract1 = oElement.getElementsByClassName("divListTableBodyLabel").innerText
MsgBox extract1
Else
End If
Next
However, I get an error when running the code above. Can I find the cmFloatLeft element that I am looking for and then go straight to the divListTableBodyLabel class, even though that class is not nested directly below the cmFloatLeft element?
Sorry, I'm still pretty new to scraping web data.
Thanks
That structure is a bit difficult to scrape. You could try going "up" from the "Kaelan" node to the parent table, and then looping over that table to extract the various pieces of information. If the structure of each post is consistent, that could provide one approach.
Set doc = IE.document
Set els = doc.getElementsByClassName("cmFloatLeft")
i = 1
For Each oElement In els
Debug.Print oElement.innerText
If Trim(oElement.innerText) = "Kaelan" Then
Set tbl = GetParent(oElement, "table") '<< find the parent table
If Not tbl Is Nothing Then
'loop over the parent table
For Each rw In tbl.Rows
For Each cl In rw.Cells
Debug.Print cl.innerText
Next cl
Next rw
End If
End If
Next
Function to find a named parent (by tag name):
Function GetParent(el, tagParent)
Dim rv As Object
Set rv = el
Do While Not rv.parentElement Is Nothing
Set rv = rv.parentElement
If UCase(rv.tagName) = UCase(tagParent) Then
Set GetParent = rv
Exit Function
End If
Loop
Set GetParent = Nothing
End Function
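Once you have the parent table, you still need to pick the amount for a given post date. Below is a sketch of one way to do that. It assumes the post-date and amount cells can be recognised by the substrings "ColumnPostDate" and "ColumnAmount" in their class names, as in the posted HTML. The function names AmountForDate and CleanText are made up for illustration.

```vba
' Hypothetical helper: strip line breaks and surrounding spaces from cell text.
Private Function CleanText(ByVal s As String) As String
    CleanText = Trim$(Replace(Replace(s, vbCr, ""), vbLf, ""))
End Function

' Hypothetical helper: return the amount text of the first row in tbl
' whose post-date cell equals postDate, or "" if no row matches.
Public Function AmountForDate(ByVal tbl As Object, ByVal postDate As String) As String
    Dim rw As Object, cl As Object
    Dim dateText As String, amountText As String

    For Each rw In tbl.Rows
        dateText = ""
        amountText = ""
        For Each cl In rw.Cells
            ' Identify the cells by a fragment of their class name
            If InStr(cl.className, "ColumnPostDate") > 0 Then dateText = CleanText(cl.innerText)
            If InStr(cl.className, "ColumnAmount") > 0 Then amountText = CleanText(cl.innerText)
        Next cl
        If dateText = postDate And Len(amountText) > 0 Then
            AmountForDate = amountText
            Exit Function
        End If
    Next rw
End Function
```

Inside the `If Not tbl Is Nothing Then` block you could call `Debug.Print AmountForDate(tbl, "11/16/2016")`.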
I am using VBA automation to get some information from a ticket system at my job. I am trying to write the values into a generated table, but the only information that doesn't reach column "A" on sheet "Plan1" is in the <td> elements that have the overflow: hidden CSS attribute. I don't know whether the two are related, but coincidentally those are the only data that don't appear. Can someone help me?
HTML code:
<div id="posicionamentoContent">
<table class="grid">
<thead>...</thead>
<tbody>
<tr id="937712" class="gridrow">
<td width="200px"> Leonardo Peixoto </td>
<td width="200px"> 23/12/2015 09:45 </td>
<td width="200px"> SIM </td>
<td width="200px"> Telhado da loja com pontos de vazamento.</td>
<td width="200px" align="center"></td>
<td width="200px" align="center"></td>
</tr>
...
...
...
The complete code: http://i.stack.imgur.com/4BsFo.png
I need to get the text of the first 4 <td> elements (Leonardo Peixoto, 23/12/2015 09:45, SIM and Telhado da loja com pontos de vazamento.), but those are exactly the texts I can't get.
Note: When I use the developer tools (F12) to inspect each element, they show the information I need inside the <td> elements perfectly. But when I open the page's source code to check the HTML, it looks like this:
<div id="tabPosicionamento" style="padding: 5px 0 5px 0;" class="ui-tabs-hide">
<div id="posicionamentoContent"></div>
</div>
Example VBA:
Sub extractTablesData1()
'we define the essential variables
Dim IE As Object, obj As Object
Dim ticket As String
Set IE = CreateObject("InternetExplorer.Application")
ticket= InputBox("Enter the ticket code")
With IE
.Visible = False
.navigate ("https://www.example.com/details/") & ticket
While IE.ReadyState <> 4
DoEvents
Wend
ThisWorkbook.Sheets("Plan1").Range("A1:K500").ClearContents
Set data = IE.document.getElementsByClassName("thead")(0).getElementsByTagName("td")
i = 0
For Each elemCollection In data
ThisWorkbook.Sheets("Plan1").Range("A" & i + 1) = data(i).innerText
i = i + 1
Next elemCollection
End With
IE.Quit
Set IE = Nothing
....
....
End Sub
This procedure writes only <td class="info3"></td> and <td class="info4"></td> to column "A" of sheet Plan1, but I also need <td class="info1"></td> and <td class="info2"></td>.
I wasn't able to read the page code because the proxy is blocking me, but I faced a similar issue a while ago. The solution I found was to put all the data on the clipboard and paste it. After that, I clean up the data on the sheet.
Here is the code I used to do that:
Set ieTable = ie.document.getElementById("ID")
If Not ieTable Is Nothing Then
Set clip = New DataObject
clip.SetText "<html>" & ieTable.outerHTML & "</html>"
clip.PutInClipboard
Sheet1.Range("A1").Select
ActiveSheet.PasteSpecial Format:="Unicode Text", link:=False, DisplayAsIcon:=False, NoHTMLFormatting:=True
End If
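Because the page source shows an empty posicionamentoContent div, the rows are probably filled in by script after the page itself has loaded, so readyState = 4 may be reached before the data exists. That is a guess from the posted source, not something confirmed. Below is a sketch of a polling helper for that case. The name WaitForContent and the 20-second default are assumptions for illustration.

```vba
' Hypothetical helper: poll until the element with the given id exists and
' has some content, or until timeoutSeconds have passed.
' Returns the element, or Nothing on timeout.
Public Function WaitForContent(ByVal browser As Object, ByVal elementId As String, _
                               Optional ByVal timeoutSeconds As Double = 20) As Object
    Dim el As Object
    Dim startTime As Double
    startTime = Timer   ' the midnight rollover is ignored in this sketch

    Do
        DoEvents
        Set el = Nothing
        On Error Resume Next            ' the document may not be available yet
        Set el = browser.document.getElementById(elementId)
        On Error GoTo 0

        If Not el Is Nothing Then
            If Len(Trim$(el.innerHTML)) > 0 Then
                Set WaitForContent = el
                Exit Function
            End If
        End If
    Loop While Timer - startTime < timeoutSeconds

    Set WaitForContent = Nothing
End Function
```

You could then call `Set container = WaitForContent(IE, "posicionamentoContent")` and, if it is not Nothing, read `container.getElementsByTagName("td")`. If the tab only loads its content when clicked, you may need to click it first.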
Considering that you need to isolate the four td elements, you can do that with a loop for each search.
In your sample, the loop enumerates the data but never uses the loop variable. Also, the cell assignment should be Cells(x, y).Value. Here is the working code.
Sub extractTablesData1()
'we define the essential variables
Dim IE As Object, Data As Object
Dim ticket As String
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = False
.navigate ("put your data url here")
While IE.ReadyState <> 4
DoEvents
Wend
Set Data = IE.document.getElementsByTagName("tr")(0).getElementsByTagName("td")
i = 1
For Each elemCollection In Data
ActiveWorkbook.Sheets(1).Cells(1, i).Value = elemCollection.innerHTML
i = i + 1
Next elemCollection
End With
IE.Quit
Set IE = Nothing
End Sub
It doesn't return the information I need (the last <td> elements):
<div id="posicionamentoContent">
<table class="grid">
<thead>...</thead>
<tbody>
<tr id="937712" class="gridrow">
<td width="200px"> Leonardo Peixoto </td>
<td width="200px"> 23/12/2015 09:45 </td>
<td width="200px"> SIM </td>
<td width="200px"> Telhado da loja com pontos de vazamento.</td>
<td width="200px" align="center"></td>
<td width="200px" align="center"></td>
</tr>