I'm trying to scrape a website and need only one value.
How do I retrieve the purchase method using the code below? The relevant HTML follows the code.
Private Sub CommandButton1_Click()
Dim IE As Object
' Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
' Set Visible to True on the next line to watch IE while it loads
IE.Visible = False
' URL to get data from
IE.Navigate Cells(1, 1)
' Statusbar
Application.StatusBar = "Loading, Please wait..."
' Wait while IE loading...
Do While IE.Busy
Application.Wait DateAdd("s", 5, Now)
Loop
Application.StatusBar = "Searching for value. Please wait..."
Dim tr As Object, td As Object, tb As Object
Dim value As String
Set tb = IE.Document.getElementById("prop_desc clearfix")
For Each tr In tb.Rows 'loop through the <tr> rows of your table
For Each td In tr.Cells 'loop through the <td> cells of your row
value = td.outerText 'your value is now in the variable "value"
MsgBox value
Next td
Next tr
' Show IE
IE.Visible = True
' Clean up
Set IE = Nothing
Application.StatusBar = ""
End Sub
</div>
</div>
<div class="prop_desc clearfix"><div class = "span-half">
<h3>RV Park/Campground for Sale</h3>
<table>
<tr>
<td>Number of RV Lots: </td>
<td>270</td>
</tr>
<tr>
<td>Size:</td>
<td>
157 acre(s)
</td>
</tr>
<tr>
<td>Purchase Method:</td>
<td>Cash, New Loan</td>
</tr>
<tr>
<td>Status:</td>
<td>
Active
</td>
</tr>
<tr>
<td>Property ID:</td>
<td>966994</td>
</tr>
<tr>
<td>Posted on:</td>
<td>Jul 10, 2018</td>
</tr>
<tr>
<td>Updated on:</td>
<td>Jul 10, 2018</td>
</tr>
<tr>
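You can target the container by class name and index, and then the td by index. In the HTML shown, the purchase method value is the sixth td (index 5) inside the first span-half element. The code needs a reference to Microsoft Internet Controls (VBE > Tools > References).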
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.rvparkstore.com/rv-parks/902077--2843-lake-frontage-42-acres-for-sale-in-north-central-us"
While .Busy Or .readyState < 4: DoEvents: Wend
Debug.Print .document.getElementsByClassName("span-half")(0).getElementsByTagName("td")(5).innerText
.Quit '<== Remember to quit application
End With
End Sub
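A fixed index such as (5) breaks if the site adds or removes a row. As a minimal sketch, assuming the label text "Purchase Method" stays stable, you could search for the label cell and read the cell after it. CellAfterLabel and DemoCellAfterLabel are hypothetical names, and the demo loads a fragment of the question's HTML into a late-bound "htmlfile" document instead of driving IE.

```vba
Option Explicit

' Hypothetical helper: returns the text of the <td> that follows the first <td>
' whose text starts with labelText, or "" if no such cell exists.
Public Function CellAfterLabel(ByVal tds As Object, ByVal labelText As String) As String
    Dim i As Long
    For i = 0 To tds.Length - 2
        If InStr(1, Trim$(tds.Item(i).innerText), labelText, vbTextCompare) = 1 Then
            CellAfterLabel = Trim$(tds.Item(i + 1).innerText)
            Exit Function
        End If
    Next i
End Function

Public Sub DemoCellAfterLabel()
    ' Late-bound MSHTML document holding a fragment of the question's table
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table>" & _
        "<tr><td>Size:</td><td>157 acre(s)</td></tr>" & _
        "<tr><td>Purchase Method:</td><td>Cash, New Loan</td></tr>" & _
        "</table>"
    Debug.Print CellAfterLabel(doc.getElementsByTagName("td"), "Purchase Method")
End Sub
```

With a live page you would pass .document.getElementsByClassName("span-half")(0).getElementsByTagName("td") as the first argument.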
UPDATE:
URL: https://www.rvparkstore.com/rv-parks/902077--2843-lake-frontage-42-acres-for-sale-in-north-central-us
HTML: https://docs.google.com/document/d/1J5tDV99IbzucCB_z8QX8lDa4X3ecxQbQOWeVy5B7Irg/edit?usp=sharing
Please use the links for more information.
Related
I'm writing code to automatically fill in a website with cell values:
Sub prueba()
Dim oIE As InternetExplorer: Set oIE = New InternetExplorer
Dim oDocument As HTMLDocument
Dim ECICOR As HTMLSelectElement
Dim i, j As Integer
Dim x As Long
oIE.Visible = True
oIE.Navigate "http://sirem.eci.geci/smcfs/console/login.jsp"
Do While oIE.readyState <> 4: DoEvents: Loop
With oDocument
Set oDocument = oIE.Document
End With
Call oDocument.parentWindow.execScript("window.parent.sc.postDummyFormForWindow('/smcfs/console/inventory.search');", "JScript")
Set ECICOR = oDocument.getElementById("enterpriseFieldObj")
ECICOR.Focus
ECICOR.Click
ECICOR.Value = "ECICOR"
ECICOR.FireEvent ("onChange")
oDocument.getElementsByClassName("unprotectedinput")(0).Value = Cells(i, 1)
oDocument.getElementsByTagName("a")(0).Click
oDocument.getElementsbyClassName("evenrow")(1).click
End Sub
My problem is that the program does nothing on the last line of the code, and I don't know why, because it worked before.
Here is the HTML:
<TR class=evenrow><TD class=checkboxcolumn><INPUT type=checkbox value=%3CInventoryItem+ItemID%3D%22000000000152030052%22+OrganizationCode%3D%22ECICOR%22+ProductClass%3D%22%22+UnitOfMeasure%3D%22%22%2F%3E name=EntityKey oldChecked="false"> <INPUT type=hidden value=000000000152030052 name=ItemID_1> <INPUT type=hidden name=UOM_1> <INPUT type=hidden name=PC_1> <INPUT type=hidden value=ECICOR name=OrgCode_1> </TD>
<TD class=tablecolumn><A onclick="javascript:showDetailFor('%3CInventoryItem+ItemID%3D%22000000000152030052%22+OrganizationCode%3D%22ECICOR%22+ProductClass%3D%22%22+UnitOfMeasure%3D%22%22%2F%3E');return false;" href="">000000000152030052</A> </TD>
<TD class=tablecolumn></TD>
<TD class=tablecolumn></TD>
<TD class=tablecolumn>001097578527174</TD></TR>
How can I find a solution?
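document.getElementsByClassName() returns a collection of elements, not a single Element. Even if only one element has the unprotectedinput class, you still need to index into the collection, for example with (0) for the first element. Indexes are zero-based, so getElementsByClassName("evenrow")(1) refers to the second matching row; if the page contains only one such row, there is nothing at that index to click.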
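One way to make the last step less fragile is to look the row up first and only click when it exists. The sketch below is an illustration under assumptions, not the page's real structure: NthRowWithClass and DemoNthRowWithClass are hypothetical names, and the demo uses a one-row table in a late-bound "htmlfile" document. It filters rows by className instead of calling getElementsByClassName, because that method may not be available in every document mode.

```vba
Option Explicit

' Hypothetical helper: returns the n-th (zero-based) <tr> whose class equals
' wantedClass, or Nothing when there are not enough matching rows.
Public Function NthRowWithClass(ByVal doc As Object, ByVal wantedClass As String, _
                                ByVal n As Long) As Object
    Dim tr As Object
    Dim seen As Long
    For Each tr In doc.getElementsByTagName("tr")
        If StrComp(tr.className, wantedClass, vbTextCompare) = 0 Then
            If seen = n Then
                Set NthRowWithClass = tr
                Exit Function
            End If
            seen = seen + 1
        End If
    Next tr
End Function

Public Sub DemoNthRowWithClass()
    Dim doc As Object
    Dim row As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table><tr class=evenrow><td>000000000152030052</td></tr></table>"

    Set row = NthRowWithClass(doc, "evenrow", 1)      ' second match: absent in this demo
    If row Is Nothing Then
        Debug.Print "No second evenrow; using the first one instead"
        Set row = NthRowWithClass(doc, "evenrow", 0)
    End If
    If Not row Is Nothing Then Debug.Print row.innerText
End Sub
```

In the real macro you would call row.Click (or click the link inside the row) once the lookup returns something other than Nothing.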
I need help scraping the table cell values from an internal company website into Excel.
This is the page source:
<br />
<span class="RptTitle"><input id="chkPromisDataLog" type="checkbox" name="chkPromisDataLog" checked="checked" onclick="showOnOffPromisLog();" /><label for="chkPromisDataLog">Promis Processing data log [83508442.1].</label></span>
<div id="divPromisDataLog" style="display: none;">
<table id="tblPromisDataLog" cellspacing="0" cellpadding="0" width="100%" border="0" class="table">
<tr>
<td width="60%"></td>
<td>
<a class="textnormal" href="javascript:popwnd=window.open('../Tools/ExportExcel.aspx?KEY=LOT_GEN_PROMIS','popwnd','status=no,toolbar=Yes,menubar=Yes,location=no,scrollbars=yes,resizable=Yes');popwnd.focus()">
Export to Excel
</a>
</td>
</tr>
<tr>
<td colspan="2">
<table cellspacing="0" rules="all" border="1" id="dgPromisDataLog" style="border-color: Black; border-collapse: collapse;">
<tr class="rptDetailsHeaderMgt" align="center">
<td>LotID</td>
<td>Hist Stage</td>
<td>Datein</td>
<td>Dateout</td>
<td>Qtyin</td>
<td>Qtyout</td>
<td>M/C ID</td>
<td>Emp TrackOut</td>
<td>Hold Code</td>
<td>Hold Reason</td>
<td>Staging (Hrs)</td>
</tr>
<tr class="rptDetailsItemMgt" align="center" style="white-space: nowrap;">
<td>83508442.1</td>
<td>
<a
href="javascript:popwnd=window.open('LotGen_Dtl.aspx?iDate=04/09/2021 09:07:07 PM&oDate=04/10/2021 03:47:59 PM&oLotid=83508442.1&oStage=C-WFRPROCS&oLastRow=N','popwnd','width=900,height=600,status=no,toolbar=no,menubar=no,location=no,scrollbars=yes,top=100,right=50,left=50');popwnd.focus();"
>
C-WFRPROCS
</a>
</td>
<td>4/9/2021 9:07:07 PM</td>
<td>4/10/2021 3:47:59 PM</td>
<td>0</td>
<td>9</td>
<td></td>
<td>10911700</td>
<td> </td>
<td> </td>
<td>18.68</td>
</tr>
</table>
</td>
</tr>
</table>
</div>
This is roughly my code
Sub Lotsearch()
Dim ie As InternetExplorer
Dim htmlEle As IHTMLElement
Dim i As Integer
Set ie = New InternetExplorer 'start new IE page
ie.Visible = True 'View what is happening in IE
ie.navigate "www.internalcompanywebsite.aspx" 'Open link in IE
While ie.readyState <> 4 'Waits for IE to finish loading
DoEvents
Wend
i = 1
'ie.document.getElementById("tblPromisDataLog") = Cells(2, 1).Value
'ie.document.getElementsByTagName("td").Value = Cells(5, 1).Value
'Set Data = ie.document.getElementByTagName("rptDetailsItemMgt")
'Dim myValue As String
'myValue = allRowOfData.Cells(0).innerHTML
'Cells(3, 13) = myValue
'Range("L1").Value = myValue
'For Each htmlEle In ie.document.getElementById("tblPromisDataLog")(0).getElementsByClassName("rptDetailsItemMgt")
With ActiveSheet
.Range("A" & i).Value = htmlEle.Children(0).textContent
' .Range("B" & i).Value = htmlEle.Children(1).textContent
' .Range("C" & i).Value = htmlEle.Children(2).textContent
' .Range("D" & i).Value = htmlEle.Children(3).textContent
' .Range("E" & i).Value = htmlEle.Children(4).textContent
' .Range("F" & i).Value = htmlEle.Children(5).textContent
' .Range("G" & i).Value = htmlEle.Children(6).textContent
' .Range("H" & i).Value = htmlEle.Children(7).textContent
' .Range("I" & i).Value = htmlEle.Children(8).textContent
' .Range("J" & i).Value = htmlEle.Children(9).textContent
' .Range("K" & i).Value = htmlEle.Children(10).textContent
' .Range("L" & i).Value = htmlEle.Children(11).textContent
End With
i = i + 1
Next htmlEle
ie.Quit
End Sub
As you can see, I have tried various methods, but to no avail:
getElementById: not working
getElementsByTagName: not working
getElementsByClassName: not working
Any help would be appreciated. Thanks.
It may not be the most efficient way to extract data from HTML, but you might consider regex matching. Raw Coding on YouTube recently published an excellent regex tutorial; I remembered this question and thought regex might be a good alternative if you'd rather not work with the HTML explicitly.
Regex Tutorial for Beginners from Raw Coding on YouTube
For example, if you only wanted plain text between td tags, you could search for:
(?<OpenTag>[\<]+td[\>]+)(?<Contents>[\w\/\(\)\[\]\.\&\:\;\s]*?)(?<CloseTag>[\<]+[\/]+td[\>]+)
Here's an example at Regex101:
Regex101 example using your HTML
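Note that the VBScript.RegExp engine available from VBA does not support named groups such as (?<Contents>...), so the pattern above cannot be used from Excel as written. A rough equivalent with a numbered group might look like the sketch below. TdTexts and DemoTdTexts are hypothetical names, and the simplified pattern only captures cells that contain plain text (a cell holding a nested tag such as <a> is skipped).

```vba
Option Explicit

' Hypothetical helper: collects the plain-text contents of every <td>...</td>
' pair in html. Cells that contain nested tags do not match the pattern.
Public Function TdTexts(ByVal html As String) As Collection
    Dim result As New Collection
    Dim re As Object
    Dim m As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "<td[^>]*>([^<]*)</td>"   ' group 1 = text between the tags

    For Each m In re.Execute(html)
        result.Add Trim$(m.SubMatches(0))
    Next m
    Set TdTexts = result
End Function

Public Sub DemoTdTexts()
    Dim cellText As Variant
    For Each cellText In TdTexts("<tr><td>83508442.1</td><td>0</td><td>9</td></tr>")
        Debug.Print cellText
    Next cellText
End Sub
```

For anything beyond a quick extraction, walking the DOM is usually more robust than matching HTML with a regex.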
Dim ht As HTMLDocument
Dim i As Integer
Dim htmltable As MSHTML.htmltable
Dim myValue As String
Set ht = ie.document ' ie is the InternetExplorer object from the question
Set htmltable = ht.getElementById("dgPromisDataLog")
myValue = htmltable.getElementsByClassName("rptDetailsItemMgt")(0).getElementsByTagName("td")(0).innerText
After experimenting for a few days, I found that the code works if I split the getElementById call from the other getElements... calls.
I changed htmlEle As IHTMLElement to ht As HTMLDocument and added htmltable As MSHTML.htmltable.
For some reason the code raises an error if I chain all the getElement... calls together. I hope this helps someone else with the same problem.
My code below extracts a value for each hour of the day.
However, the web page I'm scraping can change, so I want a way to assign the location of the <td> to a variable so that the code finds the right index every time. I found the current index, 116, by trial and error.
I have included the HTML structure below as well. Any suggestions?
Sub scrape()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.application")
With IE
.Visible = False
.navigate "web address"
Do Until .readyState = 4
DoEvents
Loop
.document.all.item("Login1_UserName").Value = "user"
.document.all.item("Login1_Password").Value = "pw"
.document.all.item("Login1_LoginButton").Click
Do Until .readyState = 4
DoEvents
Loop
End With
Dim htmldoc As Object
Dim r
Dim c
Dim aTable As Object
Dim TDelement As Object
Set htmldoc = IE.document
Dim td As Object
For Each td In htmldoc.getElementsByTagName("td")
On Error Resume Next
If td.Children(0).id = "ctl00_PageContent_grdReport_ctl08_Label50" Then
ThisWorkbook.Sheets("sheet1").Range("j8").Offset(r, c).Value = td.Children(1).innerText
End If
On Error GoTo 0
Next td
End Sub
HTML:
<form name="aspnetForm" id="aspnetForm" action="./MinMaxReport.aspx"
method="post">
<div>
</div>
<script type="text/javascript">...</script>
<div>
</div>
<table class="header-table">...</table>
<table class="page-area">
<tbody>
<tr>
<table id="ctl00_PageContent_Table1" border="0">...</table>
<table id="ctl00_PageContent_Table2" border="0">
<tbody>
<tr>
<td>
<div id="ctl00_PageContent_grdReport_div">
<tbody>
<tr style="background-color: beige;">
<td>...</td>
<td>
<span id="ctl00_PageContent_grdReport_ctl08_Label50">Most Restrictive
Capacity Maximum</span>
</td>
<td>
<span id="ctl00_PageContent_grdReport_ctl08_Label51">159</span>
</td>
</tr>
</tbody>
</div>
</td>
</tr>
</tbody>
</table>
</table>
</tr>
</tbody>
</table>
You could loop through all the td elements and check whether the first child's id is "ctl00_PageContent_grdReport_ctl08_Label50", for example:
For Each td In htmldoc.getElementsByTagName("td")
On Error Resume Next
If td.Children(0).ID = "ctl00_PageContent_grdReport_ctl08_Label50" Then
ThisWorkbook.Sheets("sheet1").Range("j8").Offset(r, c).Value = td.Children(1).innerText
End If
On Error GoTo 0
Next td
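Children(0) picks the first HTML element contained in the table cell. On Error Resume Next covers the case where a td element has no children. Note that in the HTML shown, the value span (ctl00_PageContent_grdReport_ctl08_Label51) sits in the following td rather than in the same td as the label, so Children(1) only returns the value if both spans share one cell.
It is possible that more than one element on your page matches. In that case you must identify the table or table row first. I cannot do that for you because I can't see your whole HTML.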
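Because the label and the value live in neighbouring cells in the posted HTML, another option is to find the label span by id and step to the next cell of the same row. This is only a sketch under the assumption that the label id stays stable between page loads; ValueAfterLabelSpan and the demo markup are hypothetical.

```vba
Option Explicit

' Hypothetical helper: finds the span with id labelId, then returns the text of
' the cell that follows the span's own <td> within the same <tr>.
Public Function ValueAfterLabelSpan(ByVal doc As Object, ByVal labelId As String) As String
    Dim labelSpan As Object
    Dim labelCell As Object
    Dim row As Object

    Set labelSpan = doc.getElementById(labelId)
    If labelSpan Is Nothing Then Exit Function       ' id not present on the page

    Set labelCell = labelSpan.parentElement          ' the <td> holding the label
    Set row = labelCell.parentElement                ' the <tr> holding that cell
    If labelCell.cellIndex + 1 < row.Cells.Length Then
        ValueAfterLabelSpan = Trim$(row.Cells.Item(labelCell.cellIndex + 1).innerText)
    End If
End Function

Public Sub DemoValueAfterLabelSpan()
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table><tr style=""background-color: beige;"">" & _
        "<td><span id=""ctl00_PageContent_grdReport_ctl08_Label50"">Most Restrictive Capacity Maximum</span></td>" & _
        "<td><span id=""ctl00_PageContent_grdReport_ctl08_Label51"">159</span></td>" & _
        "</tr></table>"
    Debug.Print ValueAfterLabelSpan(doc, "ctl00_PageContent_grdReport_ctl08_Label50")
End Sub
```

This avoids both the position index and the On Error Resume Next block; with the real page you would pass IE.document as the first argument.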
I'm trying to get a certain value from a web page, but nothing I have tried works. My HTML is:
<tr class="TRLinha1">
<td class="TDTitulo">Adv. Funcionário</td>
<td class="TDLinha2"><span id="C2">Fulano De tal </span></td>
<td class="TDTitulo">Adv. Credenciado</td>
<td class="TDLinha2"><span id="C3">Escritorio Identidade</span></td>
<td class="TDTitulo" colspan="2">
<img id="imgAdvogadoCredenciado" onclick="fcnShowAdvogadoCredenciado(1)" onmouseover="this.style.cursor='hand'" src="../../../../../../../Portal/Imagens/arquivo.png" alt="Exibir Advogado Credenciado" style="border-width:0px;">
</td>
The value I'm trying to get is the one with id="C2", that is, Fulano De tal. I will then write this value to a cell. I already have the VBA code that logs in to the web page.
Can anyone help me?
This is my VBA code:
Sub FazerLoginSite()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.application")
Dim node, nodeList
NumeroPasta = InputBox("Pasta")
With IE
.Visible = True
.Navigate ("https:mywebsite")
While .Busy Or .ReadyState <> 4: DoEvents: Wend
On Error Resume Next
.document.getElementById("username").Focus
.document.getElementById("username").Value = "login"
.document.getElementById("password").Focus
.document.getElementById("password").Value = "password"
.document.All("button1").Click
On Error GoTo 0
While .Busy Or .ReadyState <> 4: DoEvents: Wend
While .Busy Or .ReadyState <> 4: DoEvents: Wend
.document.getElementById("pasta").Focus
.document.getElementById("pasta").Value = NumeroPasta
While .Busy Or .ReadyState <> 4: DoEvents: Wend
While .Busy Or .ReadyState <> 4: DoEvents: Wend
IE.document.parentWindow.execScript "localizar();", "javascript"
While .Busy Or .ReadyState <> 4: DoEvents: Wend
' This is where I need code that gets the desired value
End With
End Sub
This answer assumes basic knowledge of web development technologies; I've included links for background reading.
You have two options:
1. Use an HTML form and put the value into an input field.
<form action="YOUR_HANDLER" method="post">
<table>
<tr class="TRLinha1">
<td class="TDTitulo">Adv. Funcionário</td>
<td class="TDLinha2"><input type="text" id="C2" value="Fulano De tal" /></td>
<td class="TDTitulo">Adv. Credenciado</td>
<td class="TDLinha2"><span id="C3">Escritorio Identidade</span></td>
<td class="TDTitulo" colspan="2">
<img id="imgAdvogadoCredenciado" onclick="fcnShowAdvogadoCredenciado(1)" onmouseover="this.style.cursor='hand'" src="../../../../../../../Portal/Imagens/arquivo.png" alt="Exibir Advogado Credenciado" style="border-width:0px;">
</td>
</tr>
</table>
</form>
http://www.w3schools.com/html/html_forms.asp
2. Use AJAX. With AJAX you can send data to your server without using an HTML form.
http://www.w3schools.com/ajax/
IE.document.getElementById("C2").innerText
should give you what you need
You could try using Split on the page's HTML to get that string. The document object itself has no innerHTML property, so read it from the body, i.e.
Split(Split(.document.body.innerHTML, "id=""C2"">")(1), " <")(0)
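The nested Split raises a "Subscript out of range" error when the marker is missing from the HTML. A slightly more defensive variant of the same string-slicing idea could look like this sketch; TextBetween is a hypothetical helper name, and the demo string is copied from the question's markup.

```vba
Option Explicit

' Hypothetical helper: returns the text found between startMarker and the next
' endMarker in source, or "" if either marker is missing.
Public Function TextBetween(ByVal source As String, ByVal startMarker As String, _
                            ByVal endMarker As String) As String
    Dim startPos As Long
    Dim endPos As Long

    startPos = InStr(1, source, startMarker, vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(startMarker)

    endPos = InStr(startPos, source, endMarker, vbTextCompare)
    If endPos = 0 Then Exit Function

    TextBetween = Trim$(Mid$(source, startPos, endPos - startPos))
End Function

Public Sub DemoTextBetween()
    Dim html As String
    html = "<td class=""TDLinha2""><span id=""C2"">Fulano De tal </span></td>"
    Debug.Print TextBetween(html, "id=""C2"">", "<")
End Sub
```

Keep in mind that the browser may serialize attributes differently from the page source (for example without quotes), so the start marker may need adjusting; reading the element through getElementById avoids that problem altogether.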