I'm having a weird problem that seems tied to the Excel build version and a macro.
Basically, the macro reads an HTML page and calls querySelectorAll for a specific table ID and tag.
The code runs smoothly on my machine and for other users.
We all have version 16.0 (Office 365) and build 13801 (from Application.Version and Application.Build).
The user for whom the macro doesn't work has build 14326.
Did anything change between the two builds? I'm confused...
The error comes from the inputBoxes object, which remains Null.
All users can print the page with Debug.Print html.body.innerHTML just fine.
I've tried a different selector, html.querySelectorAll("td"), but the error remains the same.
The macro is as follows:
Public Sub getGDIN(cp12 As String, adl As Integer, frm As UserForm)
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
Dim response As String
With http
.Open "GET", "http://me.intra-dmu-13/gdi.asp?Type=P&cp12=" & cp12, False
.send
If .Status <> 200 Then
MsgBox "Unable to load the requested page (GDI)", vbCritical
Exit Sub
End If
response = StrConv(.responseBody, vbUnicode)
End With
html.body.innerHTML = response
Dim inputBoxes As Object
Dim i As Long
Dim ctrl As Control
Set inputBoxes = html.querySelectorAll("#Table3 td")
'Set inputBoxes2 = html.querySelectorAll("td") <--- same error
Debug.Print html.body.innerHTML ' <--- print the whole html (both build version)
If inputBoxes.Length < 32 Then ' <--- ERROR NULL
MsgBox "Nothing has been found with this file number.", vbInformation
frm.btnGenerer.Enabled = False
For Each ctrl In frm.Controls
If ctrl.Name Like "tb*adl" & adl And Not ctrl.Name = "tbCp12_adl" & adl Then
ctrl = vbNullString
End If
Next ctrl
Exit Sub
End If
' ...
End Sub
In addition, here is the generated HTML (pasted here since the page is on an intranet):
<TABLE id=Table3 cellSpacing=0 cellPadding=0 width="100%" border=0>
<TBODY>
<TR height=10>
<TD class=cell-login height=10><IMG border=0 alt="" src="about:../../Images/Spacer.gif" width=1 height=10></TD>
</TR>
<TR>
<TD class=cell-login>
<TABLE style="FONT: 11px Arial, Helvetica, sans-serif" cellSpacing=2 cellPadding=2 border=0>
<TBODY>
<TR>
<TD><B>CP-12</B></TD>
<TD>:</TD>
<TD>BBBB01018097<INPUT type=hidden value=BBBB01018097 name=CP12></TD>
</TR>
<TR>
<TD><B>Nom</B></TD>
<TD>:</TD>
<TD>Byer<INPUT type=hidden value=Byer name=Nom></TD>
</TR>
<TR>
<TD><B>Prenom</B></TD>
<TD>:</TD>
<TD>Joe<INPUT type=hidden value=Joe name=Prenom></TD>
</TR>
<TR>
<TD><B>Titre</B></TD>
<TD>:</TD>
<TD>monsieur<INPUT type=hidden value=monsieur name=Sexe></TD>
</TR>
<TR>
<TD><B>Date of birth</B></TD>
<TD>:</TD>
<TD>1980-01-01<INPUT type=hidden value=1980-01-01 name=DB></TD>
</TR>
<TR>
<TD><B>SIN</B></TD>
<TD>:</TD>
<TD>121-111-111<INPUT type=hidden value=121-111-111 name=SIN></TD>
</TR>
<TR>
<TD><B>Langue</B></TD>
<TD>:</TD>
<TD>Anglais<INPUT type=hidden value=Anglais name=Langue></TD>
</TR>
<TR>
<TD><B>Adresse</B></TD>
<TD>: </TD>
<TD>9999, CH DE LA COTE-SAINT-LUC APP.9
<BR>MONTREAL (QUEBEC) H1X 7B8
<INPUT type=hidden value="9999, CH DE LA COTE-SAINT-LUC APP.9" name=Adresse>
<INPUT type=hidden value=MONTREAL name=Ville>
<INPUT type=hidden value="H1X 7B8" name=CP>
</TD>
</TR>
<TR>
<TD><B>Telephone</B></TD>
<TD>:</TD>
<TD>(000) 000-0000<INPUT type=hidden value="(000) 000-0000" name=Telephone></TD>
</TR>
<!--
<TR>
<TD colspan=3 align=center><img src="../../Images/fleche-connect.gif" alt="" align="absmiddle" border="0"> <a class="lien" href="javascript:AncAdr('BYEJ09027497');">Plus...</a></TD>
</TR>
-->
</TBODY>
</TABLE>
</TD>
</TR>
</FORM>
</TBODY>
</TABLE>
The variable cp12 passed into the sub is simply a string that contains a file number used to query our ministry database.
If anyone can shed some light on this, or maybe suggest a different approach, I'd appreciate it.
Thank you.
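One different approach, sketched here under assumptions rather than as a confirmed fix for build 14326: avoid querySelectorAll entirely and walk the DOM with getElementById and getElementsByTagName, which return an ordinary element collection. The helper name GetTableCells is hypothetical; it assumes the same Microsoft HTML Object Library reference the macro already uses.

```vba
' Hypothetical helper (not part of the original macro).
' Returns every <td> under the table with the given id, or Nothing
' when the table is missing. Nested tables are included, just as
' with the "#Table3 td" selector.
Function GetTableCells(html As HTMLDocument, tableId As String) As Object
    Dim tbl As Object
    Set tbl = html.getElementById(tableId)
    If tbl Is Nothing Then Exit Function
    Set GetTableCells = tbl.getElementsByTagName("td")
End Function

' Usage inside getGDIN, replacing the querySelectorAll line:
'     Set inputBoxes = GetTableCells(html, "Table3")
'     If inputBoxes Is Nothing Then
'         ' table not found: treat as "nothing found"
'     ElseIf inputBoxes.Length < 32 Then
'         ' same handling as before
'     End If
```

Whether this sidesteps the Null result on the affected build is untested here; it only removes the dependency on querySelectorAll.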
I am trying to pull the data from specific columns of a table on a webpage into my Excel file using VBA. I can open the webpage, log in and navigate to the table area, but I'm unable to get the specific columns from the table. I don't know how to pull only one column from a table within the web page.
I use Chrome for the automation. Below is the sample HTML for your reference.
<table class="Performed-Detailes-Mac">
<thead class="table-head-basic">
<tr>
<th>File</th>
<th>Name</th>
<th>Date</th>
<th>Wait 1</th>
<th>Wait 2</th>
<th>Status</th>
<th class="text-right">Machines</th>
<th class="text-right">Usage</th>
</tr>
</thead>
<tbody>
<tr class="table-row">
<td data-bind="text: id">File12</td>
<td data-bind="text: Name">JCB</td>
<td data-bind="text: Date">02/01/2022</td>
<td data-bind="text: check1">10:55 </td>
<td data-bind="text: check2">12:30</td>
<td data-bind="text: Status">Completed</td>
<td class="text-right" data-bind="text: Machines">2</td>
<td class="text-right" data-bind="text: Str">100 Percent</td>
</tr>
<tr class="table-row" data-bind="visible : $root.isEditItemsOnDetailsEnabled || $root.Items().length > 0">
<td class="text-right" data-bind="text: TotalDuration">1.75</td>
</tr>
</tbody>
</table>
For reference, I have provided only one data row (tr) along with the header details.
From the above HTML I would like to extract only the "Date" and "Machines" columns, for all rows.
The code I tried is provided below. I made a few changes here and there in the For loop, but no luck so far.
Sub GetTable()
Dim Dr As New Selenium.ChromeDriver
Dim hTable, hBody, hTR, hTD, tb As Object
Dim bb, tr, td As Object
Dr.Get "My Webpage Url"
Dr.Wait 2000
With Sheet1
Set hTable = Dr.FindElementsByCss(".Performed-Detailes-Mac")
For Each tb In hTable
Set hBody = tb.FindElementsByTag("tbody")
For Each bb In hBody
Set hTR = bb.FindElementsByTag("tr")
For r = 1 To hTR.Count - 2
Set hTD = hTR(r).FindElementsByTag("td")
If hTD.Count = 0 Then Set hTD = hTD(r).FindElementsByTag("th")
Lastrow = .Cells(Rows.Count, 1).End(xlUp).Row + 1
For c = 1 To hTD.Count
.Cells(Lastrow, c).Value = hTD(c - 1).Text
Next c
Next r
Next bb
Exit For
Next tb
End With
End Sub
This is my first question, so my apologies if I got anything wrong.
Thanks, Gold
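A minimal sketch of one way to keep only the "Date" and "Machines" columns, assuming the SeleniumBasic reference from the question and assuming those columns are always the 3rd and 7th cells of a data row, as in the posted header. The sub name GetDateAndMachines is hypothetical, and the cells are counted inside a For Each loop so the sketch does not depend on how the element collections are indexed.

```vba
Sub GetDateAndMachines()
    ' Assumes the SeleniumBasic library used in the question is referenced.
    Dim Dr As New Selenium.ChromeDriver
    Dim rowEl As Object, cellEl As Object
    Dim colIdx As Long, outRow As Long
    Dim dateText As String, machinesText As String

    Dr.Get "My Webpage Url"          ' placeholder from the question
    Dr.Wait 2000

    outRow = 1
    For Each rowEl In Dr.FindElementsByCss(".Performed-Detailes-Mac tbody tr")
        colIdx = 0
        dateText = ""
        machinesText = ""
        For Each cellEl In rowEl.FindElementsByTag("td")
            colIdx = colIdx + 1
            If colIdx = 3 Then dateText = cellEl.Text       ' "Date" column
            If colIdx = 7 Then machinesText = cellEl.Text   ' "Machines" column
        Next cellEl
        ' Rows with fewer than 7 cells (such as the TotalDuration row) are skipped.
        If colIdx >= 7 Then
            Sheet1.Cells(outRow, 1).Value = dateText
            Sheet1.Cells(outRow, 2).Value = machinesText
            outRow = outRow + 1
        End If
    Next rowEl

    Dr.Quit
End Sub
```

If the column order can change, the positions 3 and 7 would have to be looked up from the th texts first; that step is left out here.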
My code below extracts a value for each hour of the day.
However, the webpage I'm scraping can change, so I want a way to assign the location of the <td> to a variable so that the code knows its index every time. I found the current number, "116", by trial and error.
I have included the HTML structure below as well. Any suggestions?
Sub scrape()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.application")
With IE
.Visible = False
.navigate "web address"
Do Until .readyState = 4
DoEvents
Loop
.document.all.item("Login1_UserName").Value = "user"
.document.all.item("Login1_Password").Value = "pw"
.document.all.item("Login1_LoginButton").Click
Do Until .readyState = 4
DoEvents
Loop
End With
Dim htmldoc As Object
Dim r
Dim c
Dim aTable As Object
Dim TDelement As Object
Set htmldoc = IE.document
Dim td As Object
For Each td In htmldoc.getElementsByTagName("td")
On Error Resume Next
If td.Children(0).id = "ctl00_PageContent_grdReport_ctl08_Label50" Then
ThisWorkbook.Sheets("sheet1").Range("j8").Offset(r, c).Value = td.Children(1).innerText
End If
On Error GoTo 0
Next td
End Sub
HTML:
<form name="aspnetForm" id="aspnetForm" action="./MinMaxReport.aspx"
method="post">
<div>
</div>
<script type="text/javascript">...</script>
<div>
</div>
<table class="header-table">...</table>
<table class="page-area">
<tbody>
<tr>
<table id="ctl00_PageContent_Table1" border="0">...</table>
<table id="ctl00_PageContent_Table2" border="0">
<tbody>
<tr>
<td>
<div id="ctl00_PageContent_grdReport_div">
<tbody>
<tr style="background-color: beige;">
<td>...</td>
<td>
<span id="ctl00_PageContent_grdReport_ctl08_Label50">Most Restrictive
Capacity Maximum</span>
</td>
<td>
<span id="ctl00_PageContent_grdReport_ctl08_Label51">159</span>
</td>
</tr>
</tbody>
</div>
</td>
</tr>
</tbody>
</table>
</table>
</tr>
</tbody>
</table>
You could loop through all the TDs and check whether the first child's id is "ctl00_PageContent_grdReport_ctl08_Label50", for example:
For Each td In htmldoc.getElementsByTagName("td")
On Error Resume Next
If td.Children(0).ID = "ctl00_PageContent_grdReport_ctl08_Label50" Then
ThisWorkbook.Sheets("sheet1").Range("j8").Offset(r, c).Value = td.Children(1).innerText
End If
On Error GoTo 0
Next td
Children(0) picks the first HTML element contained in the table cell. On Error Resume Next covers the case where a td element has no child.
It is possible that more than one element on your page has this id. In that case you must identify the table or table row first. I cannot do that for you because I can't see your whole HTML.
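As a hedged alternative to looping over every td: in the HTML posted in the question, the value span (Label51) sits in the cell after the one holding Label50, not as a second child of the same cell. A minimal sketch, assuming that layout and assuming the label id is unique, looks the label up by id and reads the following cell of the same row. The function name ValueNextToLabel is hypothetical.

```vba
' Hypothetical helper: returns the text of the cell that follows the
' cell containing the element with the given id, or "" if not found.
Function ValueNextToLabel(doc As Object, labelId As String) As String
    Dim lbl As Object, cell As Object, rw As Object
    Set lbl = doc.getElementById(labelId)
    If lbl Is Nothing Then Exit Function

    Set cell = lbl.parentElement    ' assumed to be the <td> holding the label
    Set rw = cell.parentElement     ' assumed to be the enclosing <tr>

    ' cellIndex is zero-based; make sure a following cell exists.
    If cell.cellIndex + 1 < rw.cells.Length Then
        ValueNextToLabel = Trim$(rw.cells(cell.cellIndex + 1).innerText)
    End If
End Function

' Usage:
'     ThisWorkbook.Sheets("sheet1").Range("j8").Value = _
'         ValueNextToLabel(htmldoc, "ctl00_PageContent_grdReport_ctl08_Label50")
```

This removes the need for a hard-coded index such as 116, at the cost of relying on the label id staying stable.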
I am working on an Excel VBA project to scrape some specific information from a website. The view of this data on the website is as such:
Website View:
What I am looking to do is extract text based on two criteria: Name and post date. For example, I have the name Kaelan and the post date of 11/16/2016. I want to extract the amount of $365.
This is the HTML code:
<div class="familyLedgerAmountCategory" id="id_4541278">
<table>
<tr>
<td class="tdCategoryRow">
<div class="cmFloatLeft divExpandToggle expanded" id="divCategoryToggle_id_4541278"></div>
<div class="cmFloatLeft" id="divCategoryLabel_id_4541278" style="width: 430px;">
Kaelan
</div><span style="margin-left: 5px;">$ 465.00</span>
</td>
</tr>
<tbody>
<tr class="trListTableBody LedgerExisting" id="CamperFamilyLedgerRowControl_14816465">
<td class="tdCamperFamilyLedgerTableColumnDescription tdBorderTop" id="tdCamperFamilyLedgerTableColumnDescription_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnDescriptionCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
<a class="aColumnDescriptionCell" id="aColumnDescriptionCell_CamperFamilyLedgerRowControl_14816465" name="aColumnDescriptionCell_CamperFamilyLedgerRowControl_14816465" target="_self" title="Click to view details">2017 Super Early Bird Teen Camp - Tuition</a>
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnPostDate tdBorderTop" id="tdCamperFamilyLedgerTableColumnPostDate_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnPostDateCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
11/16/2016
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnEffective tdBorderTop" id="tdCamperFamilyLedgerTableColumnEffective_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnEffectiveCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
11/15/2016
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnQty tdBorderTop" id="tdCamperFamilyLedgerTableColumnQty_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnQtyCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
1
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnAmount tdBorderTop" id="tdCamperFamilyLedgerTableColumnAmount_CamperFamilyLedgerRowControl_14816465">
<div class="divListTableBodyCell" id="tdColumnAmountCell">
<table class="tblListTableBodyCell">
<tr>
<td>
<div class="divListTableBodyLabel">
$ 365.00
</div>
</td>
</tr>
</table>
</div>
</td>
<td class="tdCamperFamilyLedgerTableColumnAction tdBorderTop" id="tdCamperFamilyLedgerTableColumnAction_CamperFamilyLedgerRowControl_14816465"></td>
</tr>
</tbody>
</table>
</div>
My attempt to pull the amount is as follows:
Sub Test()
Dim ie As Object
Dim oElement As Object
Dim wsTarget As Worksheet
Dim i As Integer
Dim NewWB As Workbook
Set NewWB = ActiveWorkbook
Set wsTarget = NewWB.Sheets(1)
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate website...
Wait 6
ie.document.All.Item("txtUserName").Value = "User"
ie.document.All.Item("pswdPassword").Value = "Pass"
Wait 1
ie.document.getElementById("btnLogin").Click
Wait 5
ie.navigate website...
i = 1
For Each oElement In ie.document.getElementsByClassName("cmFloatLeft")
If oElement.innerText = "Kaelan" Then
extract1 = oElement.getElementsByClassName("divListTableBodyLabel").innerText
MsgBox extract1
Else
End If
Next
However, I get an error when running the code above. Can I find the cmFloatLeft element I am looking for and then immediately look up the divListTableBodyLabel class, even though that class does not sit directly below the cmFloatLeft element?
Sorry, I'm still pretty new to scraping web data.
Thanks
That structure is a bit difficult to scrape. You could try going "up" from the "Kaelan" node to the parent table, and then looping over that table to extract the various pieces of information. If the post structures are consistent, that could provide one approach.
Set doc = IE.document
Set els = doc.getElementsByClassName("cmFloatLeft")
i = 1
For Each oElement In els
    Debug.Print oElement.innerText
    If Trim(oElement.innerText) = "Kaelan" Then
        Set tbl = GetParent(oElement, "table") '<< find the parent table
        If Not tbl Is Nothing Then
            'loop over the parent table
            For Each rw In tbl.Rows
                For Each cl In rw.Cells
                    Debug.Print cl.innerText
                Next cl
            Next rw
        End If
    End If
Next
Function to find a named parent (by tag name):
Function GetParent(el, tagParent)
    Dim rv As Object
    Set rv = el
    Do While Not rv.parentElement Is Nothing
        Set rv = rv.parentElement
        If UCase(rv.tagName) = UCase(tagParent) Then
            Set GetParent = rv
            Exit Function
        End If
    Loop
    Set GetParent = Nothing
End Function
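Building on the parent-table idea, here is a minimal sketch of picking out the amount for a given post date. It assumes tbl is the table returned by GetParent and that the class names match the HTML posted in the question (LedgerExisting, tdCamperFamilyLedgerTableColumnPostDate, tdCamperFamilyLedgerTableColumnAmount). The names AmountForPostDate and StripBreaks are hypothetical, and getElementsByClassName on an element is assumed to be available, as it is in the question's own code.

```vba
' Hypothetical helper: removes line breaks and surrounding spaces.
Function StripBreaks(s As String) As String
    StripBreaks = Trim$(Replace(Replace(s, vbCr, ""), vbLf, ""))
End Function

' Hypothetical helper: returns the amount text of the first ledger row
' whose post-date cell contains postDate, or "" if no row matches.
Function AmountForPostDate(tbl As Object, postDate As String) As String
    Dim rw As Object, dateCells As Object, amountCells As Object
    For Each rw In tbl.getElementsByClassName("LedgerExisting")
        Set dateCells = rw.getElementsByClassName("tdCamperFamilyLedgerTableColumnPostDate")
        Set amountCells = rw.getElementsByClassName("tdCamperFamilyLedgerTableColumnAmount")
        If dateCells.Length > 0 And amountCells.Length > 0 Then
            If InStr(dateCells(0).innerText, postDate) > 0 Then
                AmountForPostDate = StripBreaks(amountCells(0).innerText)
                Exit Function
            End If
        End If
    Next rw
End Function

' Usage, inside the loop once tbl has been found:
'     MsgBox AmountForPostDate(tbl, "11/16/2016")
```

InStr is used instead of an exact comparison because the date cell's innerText may carry extra whitespace from the nested table.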
I am using VBA automation to pull some information from a ticket system at work. I am trying to copy the values from the generated table, but the only information that doesn't reach column "A" on sheet "Plan1" is in the <td> cells that have the overflow: hidden CSS attribute. I don't know whether the two are related, but those happen to be the only values that don't appear. Can someone help me?
HTML code:
<div id="posicionamentoContent">
<table class="grid">
<thead>...</thead>
<tbody>
<tr id="937712" class="gridrow">
<td width="200px"> Leonardo Peixoto </td>
<td width="200px"> 23/12/2015 09:45 </td>
<td width="200px"> SIM </td>
<td width="200px"> Telhado da loja com pontos de vazamento.</td>
<td width="200px" align="center"></td>
<td width="200px" align="center"></td>
</tr>
...
...
...
The complete code: http://i.stack.imgur.com/4BsFo.png
I need the text of the first 4 <td> cells (Leonardo Peixoto, 23/12/2015 09:45, SIM and Telhado da loja com pontos de vazamento.), but those are exactly the texts I can't get.
Note: When I use the developer tools (F12) to inspect each element, the information I need shows up perfectly inside the <td>. But when I open the page's "source code" to check the HTML, the code looks like this:
<div id="tabPosicionamento" style="padding: 5px 0 5px 0;" class="ui-tabs-hide">
<div id="posicionamentoContent"></div>
</div>
Example VBA:
Sub extractTablesData1()
'we define the essential variables
Dim IE As Object, obj As Object
Dim ticket As String
Set IE = CreateObject("InternetExplorer.Application")
ticket= InputBox("Enter the ticket code")
With IE
.Visible = False
.navigate ("https://www.example.com/details/") & ticket
While IE.ReadyState <> 4
DoEvents
Wend
ThisWorkbook.Sheets("Plan1").Range("A1:K500").ClearContents
Set data = IE.document.getElementsByClassName("thead")(0).getElementsByTagName("td")
i = 0
For Each elemCollection In data
ThisWorkbook.Sheets("Plan1").Range("A" & i + 1) = data(i).innerText
i = i + 1
Next elemCollection
End With
IE.Quit
Set IE = Nothing
....
....
End Sub
This routine writes only <td class="info3"></td> and <td class="info4"></td> to column "A" of sheet Plan1, but I also need <td class="info1"></td> and <td class="info2"></td>.
I wasn't able to read the page code because the proxy is blocking me, but I faced a similar issue a while ago, and the solution I found was to put all the data on the clipboard and paste it. After that, I clean up the data on the sheet.
Here is the code I used to do that:
' DataObject requires a reference to the Microsoft Forms 2.0 Object Library
Set ieTable = ie.document.getElementById("ID")
If Not ieTable Is Nothing Then
    Set clip = New DataObject
    clip.SetText "<html>" & ieTable.outerHTML & "</html>"
    clip.PutInClipboard
    Sheet1.Range("A1").Select
    ActiveSheet.PasteSpecial Format:="Unicode Text", link:=False, DisplayAsIcon:=False, NoHTMLFormatting:=True
End If
Since you need to isolate the 4 td cells, you can do that afterwards with a loop.
Your sample enumerates Data but never uses the loop variable. Also, the cell assignment should be Cells(x, y).Value. Here is the working code.
Sub extractTablesData1()
    'we define the essential variables
    Dim IE As Object, Data As Object
    Dim elemCollection As Object
    Dim i As Long
    Dim ticket As String
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = False
        .navigate ("put your data url here")
        While IE.ReadyState <> 4
            DoEvents
        Wend
        'cells of the first table row on the page
        Set Data = IE.document.getElementsByTagName("tr")(0).getElementsByTagName("td")
        i = 1
        For Each elemCollection In Data
            ActiveWorkbook.Sheets(1).Cells(1, i).Value = elemCollection.innerHTML
            i = i + 1
        Next elemCollection
    End With
    IE.Quit
    Set IE = Nothing
End Sub
It doesn't bring back the information I need (the last <td> elements)
<div id="posicionamentoContent">
<table class="grid">
<thead>...</thead>
<tbody>
<tr id="937712" class="gridrow">
<td width="200px"> Leonardo Peixoto </td>
<td width="200px"> 23/12/2015 09:45 </td>
<td width="200px"> SIM </td>
<td width="200px"> Telhado da loja com pontos de vazamento.</td>
<td width="200px" align="center"></td>
<td width="200px" align="center"></td>
</tr>
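Because the page source shows an empty posicionamentoContent div while the developer tools show it filled, the rows are probably added by script after the page reports ReadyState 4. A minimal sketch under that assumption polls the container until it holds td cells (or a timeout passes) and only then reads the first 4 cells. The sub name ReadDynamicRows and the 15-second limit are hypothetical choices.

```vba
' Hypothetical helper: waits for script-generated cells, then copies the
' first 4 <td> texts to column A of sheet Plan1.
Sub ReadDynamicRows(IE As Object)
    Dim container As Object, tds As Object
    Dim deadline As Date, c As Long

    deadline = Now + TimeSerial(0, 0, 15)   ' give up after about 15 seconds
    Do
        DoEvents
        Set container = IE.document.getElementById("posicionamentoContent")
        If Not container Is Nothing Then
            Set tds = container.getElementsByTagName("td")
            If tds.Length > 0 Then Exit Do
        End If
    Loop While Now < deadline

    If tds Is Nothing Then Exit Sub          ' container never appeared

    For c = 0 To 3
        If c < tds.Length Then
            ThisWorkbook.Sheets("Plan1").Cells(c + 1, 1).Value = Trim$(tds(c).innerText)
        End If
    Next c
End Sub

' Usage, after the existing ReadyState loop and before IE.Quit:
'     ReadDynamicRows IE
```

If the content lives in a hidden tab that is only filled after a click, the tab would have to be activated first; that is outside this sketch.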
I'm looking for an AppleScript routine or subroutine to find this HTML tag string:
<td width="487">
in this HTML code:
<h1><span id="profile-name-94461" >Jan Schlatter</span></h1>
</span>
<table width="100%" border="0" cellspacing="0" cellpadding="0" id="profile-table">
<tr>
<th width="163" scope="col">Introduction</th>
<td width="487">Education :
<br />Management and support on responsibilities in finances and accounting.</td>
</tr>
<tr>
<th>Role</th>
<td>
<p>Portfolio Management</p><p>Senior Management</p> </td>
</tr>
<tr>
<th>Organisation Type</th>
<td>
<p>Family Office</p> </td>
</tr>
<tr>
<th>Email</th>
<td><a href="mailto:jan.schlatter#bohnetschlatter.ch" title="jan.schlatter#bohnetschlatter.ch" >jan.schlatter#bohnetschlatter.ch</a></td>
</tr>
<tr>
<th>Website</th>
<td><a href="http://bohnetschlatter.ch" target="_new" title="http://bohnetschlatter.ch" >http://bohnetschlatter.ch</a></td>
</tr>
<tr>
<th>Phone</th>
<td>+41 41 727 61 61</td>
</tr>
<tr>
<th>Fax</th>
<td>+41 41 727 61 62</td>
</tr>
<tr>
<th>Mailing Address</th>
<td>Gartenstrasse 2<br>Postfach 42</td>
</tr>
<tr>
<th>City</th>
<td>Zurich</td>
</tr>
<tr>
<th>State</th>
<td></td>
</tr>
<tr>
<th>Country</th>
<td>Switzerland</td>
</tr>
<tr>
<th class="lastrow" >Zip/ Postal Code</th>
<td class="lastrow" >6301</td>
</tr>
</table>
Because the HTML tag is not always in every HTML file that I would like to process, I would like it to return a boolean value to be used in an if, then, else statement, to then complete an action if the value returns "true".
The AppleScript that I've started with is:
set intoTag to "<td width=" & quote & "487" & quote & ">"
on stripLastWordBeforeLogoEndTag(theText)
set text item delimiters to introTag
set a to text items of theText
set b to item 1 of a
set text item delimiters to space
set item 1 of a to (text items 1 thru -2 of b) as text
set text item delimiters to "</Logo>"
set fixedText to a as text
set text item delimiters to ""
return fixedText
if infoTag = fixedText then set bool to true
else set bool to false
end if
if true then (do action[[set extractText_INTRODUCTION to extractBetween(extractText, "<td width=" & quote & "487" & quote & ">", "</td>")]])
else (do not do action)
end if
I would rather not use a shell script because I have almost no knowledge in how to edit shell scripts. Text delimiters would be the best solution in my point of view, although any answers are welcome. Thanks
The simplest way is to use the is in operator:
set introTag to "<td width=\"487\">"
set existTag to introTag is in theText
if existTag then
-- true
else
-- false
end if
If you don't want to use a shell script, you could use the offset command from Standard Additions, which will search for one piece of text inside another. If the text is not found, the result will be 0, which can be used in your if statement, for example:
set theText to "...<table width=\"100%\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\" id=\"profile-table\">
<tr>
<th width=\"163\" scope=\"col\">Introduction</th>
<td width=\"487\">Education :..."
set here to offset of "<td width=\"487\">" in theText
if here is not 0 then
log "text found at " & here -- do your stuff
end if