Excel VBA Web Scraping Table Elements from a <frameset> and a <frame>

I am trying to scrape some table-looking items from a website into Excel.
I'm no stranger to coding in general, though I'm pretty new to VBA in an Excel sense :)
I have tried using Excel's Data > From Web interface, but it doesn't recognize the table. I'm guessing that's because the page is built using <frameset> and <frame> (or at least that's what my Google-Fu has led me to understand).
Snippet of what the second table looks like:
<html>
<frame title="links" ...>...</frame>
<frame title="queue">
#document
<head>...</head>
<body>
<div id="container">
<script>...</script>
<div>
<table id="oTable">
<colgroup>...</colgroup>
<thead>...</thead>
<tbody>
<tr onclick="changeHighlight( 'eid0' )" id="eid0" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.5599976.5599976');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap="">12345</td>
<td nowrap="">28/08/2018 17:00:49</td>
<td nowrap="">11/09/2018 16:28:39</td>
<td nowrap="">5,599,976</td>
<td nowrap="">dijm</td></tr>
<tr onclick="changeHighlight( 'eid1' )" id="eid1" class="queryunshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443276.6443276');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443276.6443276','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap="">67890</td>
<td nowrap="">25/06/2019 11:01:01</td>
<td nowrap="">09/07/2019 10:32:32</td>
<td nowrap="">6,443,276</td>
<td nowrap=""></td></tr>
<tr onclick="changeHighlight( 'eid2' )" id="eid2" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443287.6443287');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443287.6443287','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap="">23456</td>
<td nowrap="">25/06/2019 11:01:24</td>
<td nowrap="">09/07/2019 10:35:30</td>
<td nowrap="">6,443,287</td>
<td nowrap=""></td></tr>
<tr onclick="changeHighlight( 'eid3' )" id="eid3" class="queryunshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443339.6443339');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443339.6443339','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap="">78901</td>
<td nowrap="">25/06/2019 11:06:02</td>
<td nowrap="">09/07/2019 10:40:39</td>
<td nowrap="">6,443,339</td>
<td nowrap=""></td></tr>
<tr onclick="changeHighlight( 'eid4' )" id="eid4" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443344.6443344');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443344.6443344','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap="">34567</td>
<td nowrap="">25/06/2019 11:06:17</td>
<td nowrap="">09/07/2019 10:40:43</td>
<td nowrap="">6,443,344</td>
<td nowrap=""></td></tr>
I have tried various solutions that look somewhat like these:
https://www.ozgrid.com/forum/forum/other-software-applications/excel-and-web-browsers-help/131683-extracting-data-from-a-grid-on-webpage
and
Scraping data from website using vba
I also tried referencing the frames themselves to get the info from there
(again: new to Excel VBA):
'set myHTMLDoc to the main pages IE document
Dim myHTMLDoc As HTMLDocument
Set myHTMLDoc = ie.Document
'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
Dim myHTMLFrame2 As HTMLDocument
Set myHTMLFrame2 = myHTMLDoc.Frames(1).Document
With the above block of code I'm getting "Run-time error '438'" (Object doesn't support this property or method).
Without the above block I'm getting "Run-time error '1004'".
The info I eventually want is in each row:
</td><td scope="row" nowrap="">67890</td>
<td nowrap="">25/06/2019 11:01:01</td>
<td nowrap="">09/07/2019 10:32:32</td>
<td nowrap="">6,443,276</td>
Ideally I'd like to dump each element into a cell:
67890 | 25/06/2019 11:01:01 | 09/07/2019 10:32:32 | 6,443,276
There are 20 of these rows on each page (there's a button to press to get to the next page, which I'll figure out later... hopefully haha).
Massive pre-emptive thank you to anyone who can help :)
-EDIT-
This is the code that I'm currently working with (not precious about it :P )
Private Sub CommandButton1_Click()
Dim ie As Object
Dim html As Object
Dim objElementTR As Object
Dim objTR As Object
Dim objElementsTD As Object
Dim objTD As Object
Dim result As String
Dim intRow As Long
Dim intCol As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Navigate "URL"
ie.Visible = True ' loop until page is loaded
Do Until (ie.ReadyState = 4 And Not ie.Busy)
DoEvents
Loop
'set myHTMLDoc to the main pages IE document
Dim myHTMLDoc As HTMLDocument
Set myHTMLDoc = ie.Document
'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
Dim myHTMLFrame2 As HTMLDocument
Set myHTMLFrame2 = ie.Document.querySelector("[title=queue]").contentDocument.getElementById("oTable")
result = myHTMLFrame2
Set html = CreateObject("htmlfile")
myHTMLFrame2 = result
Set objElementTR = html.getElementsByTagName("tr")
ReDim myarray(0 To objElementTR.Length, 0 To 10)
For Each objTR In objElementTR
intRow = intRow + 1
Set objElementsTD = objTR.getElementsByTagName("td")
For Each objTD In objElementsTD
myarray(intRow, intCol) = objTD.innerText
intCol = intCol + 1
Next objTD
intCol = 0
Next objTR
With Sheets(1).Cells(1, 1).Cells(Rows.Count, "A").End(xlUp).Offset(1, 0)
.Resize(UBound(myarray), UBound(myarray, 2)).Value = myarray
End With
End Sub

You could try isolating the frame by its title attribute, then going via contentDocument to get the table by id:
ie.document.querySelector("[title=queue]").contentDocument.querySelector("#oTable")
The final .querySelector("#oTable") can be swapped for .getElementById("oTable").
I would then copy the table's .outerHTML to the clipboard so that the table can be pasted directly into the sheet.
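Once the table is reached, the row loop needs to skip the icon cell and keep the next four cells of each row. Here is a rough Python sketch of just that step, using only the standard library's html.parser on a trimmed copy of two rows from the snippet; TableRowParser and extract_rows are hypothetical names, and in VBA the equivalent would be a loop over the table's .Rows and each row's .Cells.

```python
from html.parser import HTMLParser

# Trimmed copy of two rows from the oTable snippet above; the icon links in
# the first cell are shortened, since only the later cells are extracted.
SAMPLE = """
<table id="oTable"><tbody>
<tr id="eid0" class="queryshaded">
<td nowrap=""><a href="javascript:doNothing();"><img src="URL.gif"></a>
</td><td scope="row" nowrap="">12345</td>
<td nowrap="">28/08/2018 17:00:49</td>
<td nowrap="">11/09/2018 16:28:39</td>
<td nowrap="">5,599,976</td>
<td nowrap="">dijm</td></tr>
<tr id="eid1" class="queryunshaded">
<td nowrap=""><a href="javascript:doNothing();"><img src="URL.gif"></a>
</td><td scope="row" nowrap="">67890</td>
<td nowrap="">25/06/2019 11:01:01</td>
<td nowrap="">09/07/2019 10:32:32</td>
<td nowrap="">6,443,276</td>
<td nowrap=""></td></tr>
</tbody></table>
"""


class TableRowParser(HTMLParser):
    """Collect the stripped text of every <td>, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text chunks of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Skip the icon cell and keep the next four: id, two dates, number."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [row[1:5] for row in parser.rows if len(row) >= 5]


for row in extract_rows(SAMPLE):
    print(" | ".join(row))
```

Each printed line has the "67890 | 25/06/2019 11:01:01 | ..." shape asked for above, one value per future Excel column.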

Related

How to extract text from HTML using BeautifulSoup?

I want to extract some words from HTML like this:
<tr class="BgSilver" style="border-color:Gray;border-width:1px;border-style:Solid;">
<td align="right" style="width:75px;" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblRowNum" style="display:inline-block;width:50px;">124</span>
</td>
<td align="left" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblAwardBasicNumber" style="display:inline-block;width:150px;"><img alt="PDF Document" border="0" height="16" hspace="2" src="https://www.dibbs.bsm.dla.mil/app_themes/images/icons/IconPdf.gif" width="16"/>SP450017D0007</span>
</td>
<td align="left" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblDeliveryOrder" style="display:inline-block;width:175px;"><img alt="-spacer-" border="0" height="16" hspace="1" src="https://www.dibbs.bsm.dla.mil/app_themes/images/common/space.gif" width="16"/>0243 <br/><img alt="-spacer-" border="0" height="16" hspace="1" src="https://www.dibbs.bsm.dla.mil/app_themes/images/common/space.gif" width="16"/><span style="font-size: 9px;">» Delivery Order Package View</span></span>
</td>
<td align="right" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblDeliveryOrderCounter" style="display:inline-block;width:50px;"> </span>
</td>
<td align="left" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblLastModPostingDate" style="display:inline-block;width:75px;">04-12-2018</span>
</td>
</tr>
This is the section of my code that generates the HTML above:
import requests
from bs4 import BeautifulSoup
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
import urllib3
import numpy as np
import re
from datetime import datetime, timedelta
containers = pagesoup.find_all('tr', {'class': ['BgWhite', 'BgSilver']})
for batch in containers:
    for item in range(53)[2:]:
        try:
            # batch is the html above
            print(batch)
            uid = "ctl00_cph1_grdAwardSearch_ctl"+str(item)+"_lblAwardBasicNumber"
            print("uid id ", uid)
            awardid = batch.find_all("span", text = re.compile("_lblAwardBasicNumber"))
            print("award id is")
            print(awardid)
        except Exception as e:
            print(colorama.Fore.MAGENTA + "award error.."+ str(e) )
            # print(container1)
            continue
        except Exception as e:
            raise e
print(batch) is what produces the HTML above. I want to obtain the number SP450017D0007 from this span:
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblAwardBasicNumber" style="display:inline-block;width:150px;"><img alt="PDF Document" border="0" height="16" hspace="2" src="https://www.dibbs.bsm.dla.mil/app_themes/images/icons/IconPdf.gif" width="16"/>SP450017D0007</span>
but awardid comes back empty. How can I extract SP450017D0007?
Solution:
find_all("span", text=re.compile("_lblAwardBasicNumber")) returns an empty list because text= is matched against the tag's string content, not its id, and "_lblAwardBasicNumber" only appears in the id. Match the id attribute instead and read the text with get_text(strip=True), which skips the <img> and returns SP450017D0007. Searching inside batch (not pagesoup) keeps each row's values together, so the inner range(53) loop that builds uid is no longer needed.
Note:
You have the following extra lines in your code above that should be taken out; the first except clause already catches every Exception, so this one never runs:
except Exception as e:
    raise e
Code:
import re
from bs4 import BeautifulSoup
data = '''
<tr class="BgSilver" style="border-color:Gray;border-width:1px;border-style:Solid;">
<td align="right" style="width:75px;" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblRowNum" style="display:inline-block;width:50px;">124</span>
</td>
<td align="left" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblAwardBasicNumber" style="display:inline-block;width:150px;"><img alt="PDF Document" border="0" height="16" hspace="2" src="https://www.dibbs.bsm.dla.mil/app_themes/images/icons/IconPdf.gif" width="16"/>SP450017D0007</span>
</td>
<td align="left" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblDeliveryOrder" style="display:inline-block;width:175px;"><img alt="-spacer-" border="0" height="16" hspace="1" src="https://www.dibbs.bsm.dla.mil/app_themes/images/common/space.gif" width="16"/>0243 <br/><img alt="-spacer-" border="0" height="16" hspace="1" src="https://www.dibbs.bsm.dla.mil/app_themes/images/common/space.gif" width="16"/><span style="font-size: 9px;">» Delivery Order Package View</span></span>
</td>
<td align="right" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblDeliveryOrderCounter" style="display:inline-block;width:50px;"> </span>
</td>
<td align="left" valign="top">
<span id="ctl00_cph1_grdAwardSearch_ctl26_lblLastModPostingDate" style="display:inline-block;width:75px;">04-12-2018</span>
</td>
</tr>
'''
pagesoup = BeautifulSoup(data, 'html.parser')
containers = pagesoup.find_all('tr', {'class': ['BgWhite', 'BgSilver']})
for batch in containers:
    # Match on the id attribute; text= would search the tag's string content instead
    award = batch.find('span', id=re.compile(r'_lblAwardBasicNumber$'))
    date = batch.find('span', id=re.compile(r'_lblLastModPostingDate$'))
    if award is None or date is None:
        continue  # skip rows that lack the expected spans
    print("award id is")
    print(award.get_text(strip=True))
    print("date id is")
    print(date.get_text(strip=True))
Output:
award id is
SP450017D0007
date id is
04-12-2018

Web Scraping with Internet Explorer VBA - Get data from an unknown variable?

I am working on an Excel VBA project to scrape some specific information from a website.
What I am looking to do is extract text based on two criteria: Name and post date. For example, I have the name Kaelan and the post date of 11/16/2016. I want to extract the amount of $365.
This is the HTML code:
<div class="familyLedgerAmountCategory" id="id_4541278">
<table>
<tr>
<td class="tdCategoryRow">
<div class="cmFloatLeft divExpandToggle expanded" id="divCategoryToggle_id_4541278"></div>
<div class="cmFloatLeft" id="divCategoryLabel_id_4541278" style="width: 430px;">
Kaelan
</div><span style="margin-left: 5px;">$ 465.00</span>
</td>
</tr>
<tbody>
<tr class="trListTableBody LedgerExisting" id="CamperFamilyLedgerRowControl_14816465">
<td class="tdCamperFamilyLedgerTableColumnDescription tdBorderTop" id="tdCamperFamilyLedgerTableColumnDescription_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnDescriptionCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
<a class="aColumnDescriptionCell" id="aColumnDescriptionCell_CamperFamilyLedgerRowControl_14816465" name="aColumnDescriptionCell_CamperFamilyLedgerRowControl_14816465" target="_self" title="Click to view details">2017 Super Early Bird Teen Camp - Tuition</a>
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnPostDate tdBorderTop" id="tdCamperFamilyLedgerTableColumnPostDate_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnPostDateCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
11/16/2016
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnEffective tdBorderTop" id="tdCamperFamilyLedgerTableColumnEffective_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnEffectiveCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
11/15/2016
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnQty tdBorderTop" id="tdCamperFamilyLedgerTableColumnQty_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnQtyCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
1
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnAmount tdBorderTop" id="tdCamperFamilyLedgerTableColumnAmount_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnAmountCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
$ 365.00
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnAction tdBorderTop" id="tdCamperFamilyLedgerTableColumnAction_CamperFamilyLedgerRowControl_14816465"></td>
</tr>
</tbody>
</table>
</div>
My attempt to pull the amount is as follows:
Sub Test()
Dim ie As Object
Dim oElement As Object
Dim wsTarget As Worksheet
Dim i As Integer
Dim NewWB As Workbook
Set NewWB = ActiveWorkbook
Set wsTarget = NewWB.Sheets(1)
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate website...
Wait 6
ie.document.All.Item("txtUserName").Value = "User"
ie.document.All.Item("pswdPassword").Value = "Pass"
Wait 1
ie.document.getElementById("btnLogin").Click
Wait 5
ie.navigate website...
i = 1
For Each oElement In ie.document.getElementsByClassName("cmFloatLeft")
If oElement.innerText = "Kaelan" Then
extract1 = oElement.getElementsByClassName("divListTableBodyLabel").innerText
MsgBox extract1
Else
End If
Next
End Sub
However, I get an error when running the code above. Can I find the cmFloatLeft element I am looking for and then immediately get the divListTableBodyLabel element, even though that class is not nested directly below the cmFloatLeft element?
Sorry, I'm still pretty new to scraping web data.
Thanks
That structure is a bit difficult to scrape. You could try going "up" from the "Kaelan" node to the parent table, and then looping over that table to extract the various pieces of information. If the post structures are consistent, that could provide one approach.
Set doc = IE.document
Set els = doc.getElementsByClassName("cmFloatLeft")
i = 1
For Each oElement In els
Debug.Print oElement.innerText
If Trim(oElement.innerText) = "Kaelan" Then
Set tbl = GetParent(oElement, "table") '<< find the parent table
If Not tbl Is Nothing Then
'loop over the parent table
For Each rw In tbl.Rows
For Each cl In rw.Cells
Debug.Print cl.innerText
Next cl
Next rw
End If
End If
Next
Function to find a named parent (by tag name):
Function GetParent(el, tagParent)
Dim rv As Object
Set rv = el
Do While Not rv.parentElement Is Nothing
Set rv = rv.parentElement
If UCase(rv.tagName) = UCase(tagParent) Then
Set GetParent = rv
Exit Function
End If
Loop
Set GetParent = Nothing
End Function
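The same "walk up to the parent table, then scan its rows" idea, extended to check both criteria from the question (name and post date), can be sketched in Python with the standard library's xml.etree.ElementTree. This is illustrative only: it works on a trimmed copy of the snippet and relies on that copy being well-formed XML, which real pages usually are not. find_amount, get_parent and cell_text are hypothetical names, and matching columns by a fragment of the td class (ColumnPostDate, ColumnAmount) is an assumption based on the markup shown.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the ledger snippet above. ElementTree needs well-formed
# XML, which this snippet happens to be; real pages usually are not.
SAMPLE = """
<div class="familyLedgerAmountCategory" id="id_4541278">
  <table>
    <tr>
      <td class="tdCategoryRow">
        <div class="cmFloatLeft divExpandToggle expanded" id="divCategoryToggle_id_4541278"></div>
        <div class="cmFloatLeft" id="divCategoryLabel_id_4541278">
          Kaelan
        </div><span>$ 465.00</span>
      </td>
    </tr>
    <tbody>
      <tr class="trListTableBody LedgerExisting">
        <td class="tdCamperFamilyLedgerTableColumnPostDate tdBorderTop">
          <div class="divListTableBodyLabel">11/16/2016</div>
        </td>
        <td class="tdCamperFamilyLedgerTableColumnAmount tdBorderTop">
          <div class="divListTableBodyLabel">$ 365.00</div>
        </td>
      </tr>
    </tbody>
  </table>
</div>
"""


def get_parent(parents, el, tag):
    """Walk up from el to the nearest ancestor with the given tag (like GetParent)."""
    el = parents.get(el)
    while el is not None:
        if el.tag == tag:
            return el
        el = parents.get(el)
    return None


def cell_text(row, column_marker):
    """Text of the label div in the row's <td> whose class contains column_marker."""
    for td in row.findall("td"):
        if column_marker in td.get("class", ""):
            label = td.find(".//div[@class='divListTableBodyLabel']")
            if label is not None:
                return "".join(label.itertext()).strip()
    return None


def find_amount(html, name, post_date):
    """Hypothetical helper: amount of the ledger row matching name and post date."""
    root = ET.fromstring(html.strip())
    # ElementTree has no parentElement, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for div in root.iter("div"):
        if "cmFloatLeft" in div.get("class", "") and "".join(div.itertext()).strip() == name:
            table = get_parent(parents, div, "table")
            if table is None:
                continue
            for row in table.iter("tr"):
                if cell_text(row, "ColumnPostDate") == post_date:
                    return cell_text(row, "ColumnAmount")
    return None


print(find_amount(SAMPLE, "Kaelan", "11/16/2016"))  # $ 365.00
```

The category row is visited too, but it has no post-date cell, so only the ledger row can match.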

How to get HTML element with VBA in Excel?

I have the following HTML, which I'm working with from Excel:
<form name="scform" action="online_range.aspx" autocomplete="off">
<input name="AcctNo" type="hidden" value="3949067512">
<table width="100%" border="0" cellpadding="3">
<tbody><tr>
<td width="6%"></td>
<td width="18%" align="center" valign="middle"><font color="#ffffff" face="verdana" size="1"><b>Numbers</b>
<td width="18%" align="right" valign="middle"><font color="#000000" face="verdana" size="1">000,000,000,000.00</font></td>
<td width="18%" align="right" valign="middle"><font color="#000000" face="verdana" size="1">100,100,100,100.00</font>
<td width="5%" align="center" valign="middle"><font color="#000000" face="verdana" size="1">
<!--<a href="javascript:document.scform.submit();" onmouseover="sctest('0479281963'); window.status='Account Details'; return true;">-->
<!-- INSERT BUILDMENU - APSMITH -->
<script>BuildMenu_SCPHP(0,'')</script>
<a onmouseover="showmenu(event,linksetSCPHP[0]); sctest(479281963, 'IM'); window.status='Account Details';" onmouseout="delayhidemenu()" href="javascript:document.scform.submit();">
<!-- END BUILDMENU - APSMITH -->
<img width="21" height="17" src="/images/detail2.gif" border="0"></a>
</font>
</td>
</tr>
</tbody></table></td>
<td width="3%"></td>
</tr></tbody></table></form>
I want to get the values from the td elements, which are 000,000,000,000.00 and 100,100,100,100.00, but have had no luck.
Here's what i tried:
Dim IE As New InternetExplorer
Dim Doc As HTMLDocument
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
'Navigate to Website
IE.navigate "https://secure1.bpiexpressonline.com/AuthFiles/login.aspx?URL=/direct_signin.htm"
'Loop until page load complete
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
Doc.getElementById("UserID").Value = Range("E23").Value
Doc.getElementById("Password").Value = Range("E24").Value
Doc.getElementById("login").submit
'Loop until page load complete
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
'Dim tb As Object, tr As Object, th As Object
Dim tb As Object
Set tb = Doc.getElementsByTagName("AcctNo")
What should I do here? I tried getElementsById(td)(1) and so on, but had no luck.
Using getElementsById(td)(n) gives no error, but the output is wrong. Can someone help me or teach me how to parse this form?
Thanks in advance.
As I understand it, you have difficulty constructing the path to the element you want?
1) Add an id attribute to your table row element, so it becomes:
<table width="100%" border="0" cellpadding="3">
<tbody><tr id="row_1">
2) Now you can use:
Dim row As Object
Set row = Doc.getElementById("row_1")
3) Now you can retrieve your values like this (the collections are zero-based, so the third and fourth cells are indexes 2 and 3, and each cell holds a single font element):
row.getElementsByTagName("td")(2).getElementsByTagName("font")(0).innerText
row.getElementsByTagName("td")(3).getElementsByTagName("font")(0).innerText
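Adding an id only works if you control the page's HTML. As an alternative sketch, the amounts can be picked out by an attribute their cells already have, here align="right". This Python example uses the standard library's html.parser on a trimmed copy of the row; the align-based selection rule and the names RightAlignedCellParser and right_aligned_values are assumptions that fit only this snippet. It treats a new <td> as ending the previous cell, because the question's HTML leaves some cells unclosed.

```python
from html.parser import HTMLParser

# Trimmed copy of the row from the question (script and menu markup removed).
# As in the original, the "Numbers" cell and the second amount cell are not closed.
SAMPLE = """
<table width="100%" border="0" cellpadding="3"><tbody><tr>
<td width="6%"></td>
<td width="18%" align="center" valign="middle"><font color="#ffffff" face="verdana" size="1"><b>Numbers</b>
<td width="18%" align="right" valign="middle"><font color="#000000" face="verdana" size="1">000,000,000,000.00</font></td>
<td width="18%" align="right" valign="middle"><font color="#000000" face="verdana" size="1">100,100,100,100.00</font>
<td width="5%" align="center" valign="middle"></td>
</tr></tbody></table>
"""


class RightAlignedCellParser(HTMLParser):
    """Collect the text of <td align="right"> cells.

    A new <td> start tag also ends the previous cell, because some <td>
    elements in the sample are never closed."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._buf = None  # text chunks of the right-aligned cell being read

    def _flush(self):
        if self._buf is not None:
            text = "".join(self._buf).strip()
            if text:
                self.values.append(text)
            self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._flush()
            if dict(attrs).get("align") == "right":
                self._buf = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def right_aligned_values(html):
    parser = RightAlignedCellParser()
    parser.feed(html)
    parser.close()
    return parser.values


print(right_aligned_values(SAMPLE))
```

In VBA the same rule would be a loop over the row's td elements that checks each cell's align attribute before reading innerText.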

How to use getElementsByTagName with <td> with overflow: hidden on VBA?

I am using VBA automation to get some information from a ticket system at my job. I am trying to get the values from the generated table, but the only information that doesn't reach column "A" on sheet "Plan1" is in the <td> elements that have the overflow: hidden CSS attribute. I don't know if the two are related, but coincidentally those are the only data that don't appear. Can someone help me?
HTML code:
<div id="posicionamentoContent">
<table class="grid">
<thead>...</thead>
<tbody>
<tr id="937712" class="gridrow">
<td width="200px"> Leonardo Peixoto </td>
<td width="200px"> 23/12/2015 09:45 </td>
<td width="200px"> SIM </td>
<td width="200px"> Telhado da loja com pontos de vazamento.</td>
<td width="200px" align="center"></td>
<td width="200px" align="center"></td>
</tr>
...
...
...
The complete code: http://i.stack.imgur.com/4BsFo.png
I need to get the text of the first 4 <td> elements (Leonardo Peixoto, 23/12/2015 09:45, SIM and Telhado da loja com pontos de vazamento.), but they are the only texts I can't get.
Note: when I use the developer tools (F12) to inspect each element, they show exactly the information I need inside the <td> elements. But when I open the page source to check the HTML, the code looks like this:
<div id="tabPosicionamento" style="padding: 5px 0 5px 0;" class="ui-tabs-hide">
<div id="posicionamentoContent"></div>
</div>
Example VBA:
Sub extractTablesData1()
'we define the essential variables
Dim IE As Object, obj As Object
Dim ticket As String
Set IE = CreateObject("InternetExplorer.Application")
ticket= InputBox("Enter the ticket code")
With IE
.Visible = False
.navigate ("https://www.example.com/details/") & ticket
While IE.ReadyState <> 4
DoEvents
Wend
ThisWorkbook.Sheets("Plan1").Range("A1:K500").ClearContents
Set data = IE.document.getElementsByClassName("thead")(0).getElementsByTagName("td")
i = 0
For Each elemCollection In data
ThisWorkbook.Sheets("Plan1").Range("A" & i + 1) = data(i).innerText
i = i + 1
Next elemCollection
End With
IE.Quit
Set IE = Nothing
....
....
End Sub
This function returns only <td class="info3"></td> and <td class="info4"></td> in column "A" of sheet Plan1, but I also need <td class="info1"></td> and <td class="info2"></td>.
I wasn't able to read the page code because the proxy blocks me, but I faced a similar issue a while ago, and the solution I found was to put all the data on the clipboard and paste it into the sheet. After that, I clean up the data on the sheet.
Here the code I used to do that:
Set ieTable = ie.document.getElementById("ID")
If Not ieTable Is Nothing Then
' DataObject requires a reference to the Microsoft Forms 2.0 Object Library
Set clip = New DataObject
clip.SetText "<html>" & ieTable.outerHTML & "</html>"
clip.PutInClipboard
Sheet1.Range("A1").Select
ActiveSheet.PasteSpecial Format:="Unicode Text", link:=False, DisplayAsIcon:=False, NoHTMLFormatting:=True
End If
Considering that you need to isolate the 4 td lines, you can do that with a loop for every search.
Your sample enumerates Data but never uses the loop variable. Also, the cell assignment should use Cells(row, column).Value. Here is the working code:
Sub extractTablesData1()
'we define the essential variables
Dim IE As Object, Data As Object
Dim ticket As String
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = False
.navigate ("put your data url here")
While IE.ReadyState <> 4
DoEvents
Wend
Set Data = IE.document.getElementsByTagName("tr")(0).getElementsByTagName("td")
i = 1
For Each elemCollection In Data
ActiveWorkbook.Sheets(1).Cells(1, i).Value = elemCollection.innerHTML
i = i + 1
Next elemCollection
End With
IE.Quit
Set IE = Nothing
End Sub
It doesn't bring back the information I need (the last <td> elements).
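In the question's markup the values sit in <tbody> rows with class gridrow rather than in <thead>, so one option is to target those rows and keep their first four cells. A minimal Python sketch with the standard library's html.parser; the placeholder header row stands in for the elided <thead>, and GridRowParser and first_four_cells are hypothetical names. Since the page source shows an empty posicionamentoContent div, the rows are probably filled in by script after load, so in IE the table would need to be read only after that content appears.

```python
from html.parser import HTMLParser

# Copy of the question's row; the header row is a placeholder for the elided <thead>.
SAMPLE = """
<div id="posicionamentoContent">
<table class="grid">
<thead><tr><td>Header</td></tr></thead>
<tbody>
<tr id="937712" class="gridrow">
<td width="200px"> Leonardo Peixoto </td>
<td width="200px"> 23/12/2015 09:45 </td>
<td width="200px"> SIM </td>
<td width="200px"> Telhado da loja com pontos de vazamento.</td>
<td width="200px" align="center"></td>
<td width="200px" align="center"></td>
</tr>
</tbody></table></div>
"""


class GridRowParser(HTMLParser):
    """Collect the stripped <td> texts of every <tr class="gridrow">."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the gridrow currently open
        self._cell = None  # text chunks of the <td> currently open

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" and "gridrow" in classes:
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def first_four_cells(html):
    parser = GridRowParser()
    parser.feed(html)
    parser.close()
    return [row[:4] for row in parser.rows]


for row in first_four_cells(SAMPLE):
    print(row)
```

The header row is ignored because it lacks the gridrow class, which is also why a selector based on thead cannot reach these values.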

ASP: Either EOF or BOF is True, but I have records and they are displaying

I have a database with around 40 records, and I am trying to display them in a table with 3 columns and as many rows as the records require.
All of the records display, but at the very end of the record list I get what looks like another cell with the message:
ADODB.Field error '800a0bcd' Either BOF or EOF is True, or the current record has been deleted. Requested operation requires a current record. /products/index1.asp, line 668
I'll post the code below. Can anyone help? I've searched the web but can't find anything. This is the only way I could find to display the records in a 3-wide table; if there is a better way, even using CSS, it would be greatly appreciated.
<table border="0">
<tr><td class="product_title">Our Products</td></tr>
<tr><td colspan="5" height="7"></td></tr>
<% While ((Repeat1__numRows <> 3) AND (NOT products_page.EOF)) %>
<tr>
<td align="center" valign="middle">
<div class="thumbgrey" align="center">
<a href="/products/<%=(products_page.Fields.Item("" & lang & "_URL").Value)%>" title="<%=(products_page.Fields.Item("" & lang & "_title").Value)%>">
<img src="/images/product_page/<%=(products_page.Fields.Item("" & lang & "_image").Value)%>" alt="<%=(products_page.Fields.Item("" & lang & "_title").Value)%>" width="230" height="97" border="0" />
</a>
</div>
</td>
<%
Repeat1__index=Repeat1__index+1
Repeat1__numRows=Repeat1__numRows-1
products_page.MoveNext()
%>
<td align="center" valign="middle">
<div class="thumbgrey" align="center">
<a href="/products/<%=(products_page.Fields.Item("" & lang & "_URL").Value)%>" title="<%=(products_page.Fields.Item("" & lang & "_title").Value)%>">
<img src="/images/product_page/<%=(products_page.Fields.Item("" & lang & "_image").Value)%>" alt="<%=(products_page.Fields.Item("" & lang & "_title").Value)%>" width="230" height="97" border="0" />
</a>
</div>
</td>
<%
Repeat1__index=Repeat1__index+1
Repeat1__numRows=Repeat1__numRows-1
products_page.MoveNext()
%>
<td align="center" valign="middle">
<div class="thumbgrey" align="center">
<a href="/products/<%=(products_page.Fields.Item("" & lang & "_URL").Value)%>" title="<%=(products_page.Fields.Item("" & lang & "_title").Value)%>">
<img src="/images/product_page/<%=(products_page.Fields.Item("" & lang & "_image").Value)%>" alt="<%=(products_page.Fields.Item("" & lang & "_title").Value)%>" width="230" height="97" border="0" />
</a>
</div>
</td>
<%
Repeat1__index=Repeat1__index+1
Repeat1__numRows=Repeat1__numRows-1
products_page.MoveNext()
Wend
%>
</tr>
</table>
Hi, you should put this into a loop that starts a new row after every third entry. That way you're not writing the same code three times, and it will save work when you update the code.
<table border="0">
<tr><td class="product_title">Our Products</td></tr>
<tr><td colspan="3" height="7"></td></tr>
<tr>
<%
Repeat1__index = 1
Do While (NOT products_page.EOF)
%>
<td align="center" valign="middle">
<div class="thumbgrey" align="center">
<a href="/products/<%=(products_page.Fields.Item("" & lang & "_URL").Value)%>" title="<%=(products_page.Fields.Item("" & lang & "_title").Value)%>">
<img src="/images/product_page/<%=(products_page.Fields.Item("" & lang & "_image").Value)%>" alt="<%=(products_page.Fields.Item("" & lang & "_title").Value)%>" width="230" height="97" border="0" />
</a>
</div>
</td>
<%
Repeat1__index = Repeat1__index+1
If Repeat1__index = 4 Then
Response.Write("</tr><tr>")
Repeat1__index = 1
End If
products_page.MoveNext()
Loop
If Repeat1__index > 1 Then
Response.Write("<td colspan='" & 4-Repeat1__index & "'></td>")
End If
%>
</tr>
</table>
The last If statement (If Repeat1__index > 1 Then) tidies up the remaining columns if needed. For example, if 4 entries are returned, it creates one full row of three, and the last row has only one entry, so it adds a final table cell spanning the remaining two columns to even out the row.
NB: Don't forget to close products_page and set it to Nothing at the end:
products_page.Close()
Set products_page = Nothing
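The row-breaking logic of this answer, stripped of ASP so it can be run on its own, can be sketched in Python: chunk the records into groups of three and pad a short final row with one spanning cell. build_rows is a hypothetical name and plain strings stand in for the image markup. Unlike the counter version, this never emits an empty trailing <tr></tr> when the record count is a multiple of three.

```python
def build_rows(cells, columns=3):
    """Group cell markup into <tr> rows of `columns` cells.

    A short final row is padded with one <td colspan=...> so every row
    spans the same number of columns."""
    rows = []
    for start in range(0, len(cells), columns):
        chunk = cells[start:start + columns]
        tds = "".join(f"<td>{cell}</td>" for cell in chunk)
        missing = columns - len(chunk)
        if missing:
            tds += f"<td colspan='{missing}'></td>"
        rows.append(f"<tr>{tds}</tr>")
    return rows


# Four records: one full row of three, then one record plus a spanning cell.
for row in build_rows(["p1", "p2", "p3", "p4"]):
    print(row)
```

Because the loop only ever reads records that exist, it cannot touch a record past EOF, which is what triggers the 800a0bcd error in the original three-copies layout.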
This happens because your recordset does not contain a multiple of 3 rows.
You get the error while trying to write the second or third column.
Try something like this (I omitted the actual writing of the images because it's not relevant):
<%While Not products_page.EOF%>
<tr>
<td>
<%' first image%>
<div class="thumbgrey" align="center">...</div>
<%products_page.MoveNext()%>
</td>
<td>
<%If Not products_page.EOF Then%>
<%' second image%>
<div class="thumbgrey" align="center">...</div>
<%products_page.MoveNext()%>
<%Else%>
<%End If%>
</td>
<td>
<%If Not products_page.EOF Then%>
<%' third image%>
<div class="thumbgrey" align="center">...</div>
<%products_page.MoveNext()%>
<%Else%>
<%End If%>
</td>
</tr>
<%Wend%>