VB.NET StringBuilder to create HTML file

I am using a StringBuilder to create an HTML file from my DataTable. The file is created, but when I open it in the web browser I have to scroll all the way down to see the table. In other words, the page starts with a large blank area containing nothing at all.
Public Function ConvertToHtmlFile(ByVal myTable As DataTable) As String
Dim myBuilder As New StringBuilder
If myTable Is Nothing Then
Throw New System.ArgumentNullException("myTable")
Else
'Open tags and write the top portion.
myBuilder.Append("<html xmlns='http://www.w3.org/1999/xhtml'>")
myBuilder.Append("<head>")
myBuilder.Append("<title>")
myBuilder.Append("Page-")
myBuilder.Append("CLAS Archive")
myBuilder.Append("</title>")
myBuilder.Append("</head>")
myBuilder.Append("<body>")
myBuilder.Append("<br /><table border='1px' cellpadding='5' cellspacing='0' ")
myBuilder.Append("style='border: solid 1px Silver; font-size: x-small;'>")
myBuilder.Append("<br /><tr align='left' valign='top'>")
For Each myColumn As DataColumn In myTable.Columns
myBuilder.Append("<br /><td align='left' valign='top' style='border: solid 1px blue;'>")
myBuilder.Append(myColumn.ColumnName)
myBuilder.Append("</td><p>")
Next
myBuilder.Append("</tr><p>")
'Add the data rows.
For Each myRow As DataRow In myTable.Rows
myBuilder.Append("<br /><tr align='left' valign='top'>")
For Each myColumn As DataColumn In myTable.Columns
myBuilder.Append("<br /><td align='left' valign='top' style='border: solid 1px blue;'>")
myBuilder.Append(myRow(myColumn.ColumnName).ToString())
myBuilder.Append("</td><p>")
Next
Next
myBuilder.Append("</tr><p>")
End If
'Close tags.
myBuilder.Append("</table><p>")
myBuilder.Append("</body>")
myBuilder.Append("</html>")
'Get the string for return. myHtmlFile = myBuilder.ToString();
Dim myHtmlFile As String = myBuilder.ToString()
Return myHtmlFile
End Function

A sample HTML table (from the MDN docs):
<table>
<thead>
<tr>
<th colspan="2">The table header</th>
</tr>
</thead>
<tbody>
<tr>
<td>The table body</td>
<td>with two columns</td>
</tr>
</tbody>
</table>
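If you study the "permitted content" of the various table elements (and of their children, for instance <tr>), you will see that a <br> or <p> cannot appear directly between the <table>, <tr> and <td> elements; only table-related elements are allowed there. Browsers typically move such misplaced content out of the table and render it before the table, which is why you see the empty space above it.
A <tr> is already a row in the table, so you don't need breaks or paragraphs to move it onto a separate line.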
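As a rough illustration of that structure, here is a minimal sketch in Python rather than the VB.NET from the question. The function name table_to_html and the sample data are made up for the example. It emits only table-related elements, and it closes each <tr> inside the row loop, whereas the code in the question appends </tr> once after the loop has finished.

```python
from html import escape


def table_to_html(columns, rows):
    """Build an HTML table string using only table-related elements."""
    parts = ["<table border='1' cellpadding='5' cellspacing='0'>"]
    # Header row: one <th> per column name, no <br> or <p> in between.
    parts.append("<thead><tr>")
    for name in columns:
        parts.append(f"<th>{escape(str(name))}</th>")
    parts.append("</tr></thead>")
    # Data rows: every <tr> is opened and closed inside the loop.
    parts.append("<tbody>")
    for row in rows:
        parts.append("<tr>")
        for value in row:
            # escape() keeps characters such as & and < from breaking the markup.
            parts.append(f"<td>{escape(str(value))}</td>")
        parts.append("</tr>")
    parts.append("</tbody></table>")
    return "".join(parts)


# Hypothetical sample data standing in for the DataTable.
html_text = table_to_html(["Id", "Name"], [[1, "A & B"], [2, "C"]])
print(html_text)
```

The same loop shape carries over to StringBuilder.Append calls: drop the <br /> and <p> fragments and move the closing </tr> inside the For Each over the rows.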

FWIW, I find using XElement to build HTML pages easier than using strings.
Dim myHtml As XElement
'XML literals
' https://learn.microsoft.com/en-us/dotnet/standard/linq/xml-literals
'note lang and xmlns missing. see below
myHtml = <html>
<head>
<meta charset="utf-8"/>
<title>Put title here</title>
</head>
<body>
<table border="1px" cellpadding="5" cellspacing="0" style="border: solid 1px Silver; font-size: x-small;">
<thead>
<tr>
<th colspan="4">The table header</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</body>
</html>
'test. five rows, four columns
For r As Integer = 1 To 5
Dim tr As XElement = <tr align="left" valign="top"></tr>
For c As Integer = 1 To 4
Dim td As XElement
' XML embedded expressions
' https://learn.microsoft.com/en-us/dotnet/standard/linq/xml-literals#use-embedded-expressions-to-create-content
td = <td align="left" valign="top"><%= "Row:" & r.ToString & " Col:" & c.ToString %></td>
tr.Add(td)
Next
myHtml.<body>.<table>.<tbody>.LastOrDefault.Add(tr)
Next
Dim s As String = myHtml.ToString
'add lang and xmlns to string!!
s = s.Replace("<html>", "<html lang='en' xmlns='http://www.w3.org/1999/xhtml'>")
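For comparison, the same build-a-tree-then-serialize idea can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an analogue of the XElement approach, not a translation of it, and the attribute values are trimmed down for brevity.

```python
import xml.etree.ElementTree as ET

# Build the document skeleton as a tree instead of concatenating strings.
html = ET.Element("html", {"lang": "en"})
head = ET.SubElement(html, "head")
ET.SubElement(head, "meta", {"charset": "utf-8"})
ET.SubElement(head, "title").text = "Put title here"
body = ET.SubElement(html, "body")
table = ET.SubElement(body, "table", {"border": "1"})
thead = ET.SubElement(table, "thead")
header_row = ET.SubElement(thead, "tr")
ET.SubElement(header_row, "th", {"colspan": "4"}).text = "The table header"
tbody = ET.SubElement(table, "tbody")

# Test data: five rows, four columns.
for r in range(1, 6):
    tr = ET.SubElement(tbody, "tr")
    for c in range(1, 5):
        ET.SubElement(tr, "td").text = f"Row:{r} Col:{c}"

# method="html" writes void elements such as <meta> without a closing tag.
s = ET.tostring(html, encoding="unicode", method="html")
print(s)
```

Because every element is created through the tree API, a row cannot end up without its closing tag, which is the kind of mistake string concatenation makes easy.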

Related

VBA: Get specific column from webpage using seleniumbasic

I am trying to pull specific columns from a table on a webpage into my Excel file using VBA. I can open the webpage, log in, and navigate to the table, but I can't work out how to extract only particular columns from it.
I use Chrome for the automation. Below is sample HTML for reference.
<table class="Performed-Detailes-Mac">
<thead class="table-head-basic">
<tr>
<th>File</th>
<th>Name</th>
<th>Date</th>
<th>Wait 1</th>
<th>Wait 2</th>
<th>Status</th>
<th class="text-right">Machines</th>
<th class="text-right">Usage</th>
</tr>
</thead>
<tbody>
<tr class="table-row">
<td data-bind="text: id">File12</td>
<td data-bind="text: Name">JCB</td>
<td data-bind="text: Date">02/01/2022</td>
<td data-bind="text: check1">10:55 </td>
<td data-bind="text: check2">12:30</td>
<td data-bind="text: Status">Completed</td>
<td class="text-right" data-bind="text: Machines">2</td>
<td class="text-right" data-bind="text: Str">100 Percent</td>
</tr>
<tr class="table-row" data-bind="visible : $root.isEditItemsOnDetailsEnabled || $root.Items().length > 0">
<td class="text-right" data-bind="text: TotalDuration">1.75</td>
</tr>
</tbody>
</table>
For reference, I have included only one data row (tr) along with the header.
From the HTML above, I would like to extract only the "Date" and "Machines" columns for all rows.
The code I tried is below. I have experimented with the For loop, but no luck so far.
Sub GetTable()
Dim Dr As New Selenium.ChromeDriver
Dim hTable, hBody, hTR, hTD, tb As Object
Dim bb, tr, td As Object
Dr.Get "My Webpage Url"
Dr.Wait 2000
With Sheet1
Set hTable = Dr.FindElementsByCss(".Performed-Detailes-Mac")
For Each tb In hTable
Set hBody = tb.FindElementsByTag("tbody")
For Each bb In hBody
Set hTR = bb.FindElementsByTag("tr")
For r = 1 To hTR.Count - 2
Set hTD = hTR(r).FindElementsByTag("td")
If hTD.Count = 0 Then Set hTD = hTD(r).FindElementsByTag("th")
Lastrow = .Cells(Rows.Count, 1).End(xlUp).Row + 1
For c = 1 To hTD.Count
.Cells(Lastrow, c).Value = hTD(c - 1).Text
Next c
Next r
Next bb
Exit For
Next tb
End With
End Sub
This is my first question, so my apologies if I have done anything wrong.
Thanks Gold

how to add class in table in python using Dominate library

I have created a table using the Dominate library, but now I want to set my table's class. Can someone help me do that?
doc1 = dominate.document(title='Dominate your HTML')
with doc1:
    with div():
        attr(cls='body')
        h1('Survey Report : Survey Report')
oc = dominate.document(title="whatever")
with doc1:
    tags.style(".calendar_table{width:880px;}")
    tags.style("body{font-family:Helvetica}")
    tags.style("h1{font-size:x-large}")
    tags.style("h2{font-size:large}")
    tags.style("table{border-collapse:collapse}")
    tags.style("th{font-size:small;border:1px solid gray;padding:4px;background-color:#DDD}")
    tags.style("td{font-size:small;text-align:center;border:1px solid gray;padding:4px}")
    with tags.table():
        with tags.thead():
            tags.th("Nominee", style = "color:#ffffff;background-color:#6A75F2")
            tags.th("counts", style = "color:#ffffff;background-color:#6A75F2")
        with tags.tbody():
            for i in range(0,len(nom)):
                with tags.tr(): #Row 1
                    tags.td(nom[i], style = "font-size:small;text-align:center;padding:4px")
                    if int(count_nom[i]) > 1:
                        tags.td(count_nom[i], style = "font-size:small;text-align:center;padding:4px;background-color:#F4D8D2")
                    else:
                        tags.td(count_nom[i], style = "font-size:small;text-align:center;padding:4px")
            with tags.tr(): #Row 1
                tags.td(b("Grand Total"), style = "font-size:small;text-align:center;padding:4px")
                tags.td(b(sum(count_nom)), style = "font-size:small;text-align:center;padding:4px")
with open('/root/survey/'+'survey'+'.html', 'w') as f:
    f.write(doc1.render())
With this I am able to create the table in HTML:
<div class="body">
<h1>Survey Report</h1>
</div>
<style>.calendar_table{width:880px;}</style>
<style>body{font-family:Helvetica}</style>
<style>h1{font-size:x-large}</style>
<style>h2{font-size:large}</style>
<style>table{border-collapse:collapse}</style>
<style>th{font-size:small;border:1px solid gray;padding:4px;background-color:#DDD}</style>
<style>td{font-size:small;text-align:center;border:1px solid gray;padding:4px}</style>
<table>
<thead>
<th style="color:#ffffff;background-color:#6A75F2">Nominee</th>
<th style="color:#ffffff;background-color:#6A75F2">counts</th>
</thead>
<tbody>
<tr>
<td style="font-size:small;text-align:center;padding:4px">Deepesh Ahuja</td>
<td style="font-size:small;text-align:center;padding:4px">1</td>
</tr>
<tr>
<td style="font-size:small;text-align:center;padding:4px">Sabyasachi Mallick</td>
<td style="font-size:small;text-align:center;padding:4px">1</td>
</tr>
<tr>
<td style="font-size:small;text-align:center;padding:4px">Raju Singh</td>
<td style="font-size:small;text-align:center;padding:4px">1</td>
</tr>
<tr>
<td style="font-size:small;text-align:center;padding:4px">Abarna Ravi</td>
<td style="font-size:small;text-align:center;padding:4px;background-color:#F4D8D2">2</td>
</tr>
<tr>
<td style="font-size:small;text-align:center;padding:4px">
<b>Grand Total</b>
</td>
<td style="font-size:small;text-align:center;padding:4px">
<b>5</b>
</td>
</tr>
</tbody>
</table><br><br><br>
Now, how do I set the table class in the Python code, like this:
<table class='calendar_table'>
Can someone help me set the class of the table and other tags using the Python dominate library?
Using the example syntax from the documentation on GitHub:
from dominate.tags import *
testTable = table(border = 1)
print(testTable)
which will print:
<table border="1"></table>
However, since class is a reserved word in Python, you can't pass it directly as a keyword argument to set the HTML attribute. One way around this is set_attribute:
testTable.set_attribute('class','my_class_name')
Adding the above to the original instance of testTable will result in:
<table border="1" class="my_class_name"></table>
dominate also accepts cls as a keyword for the class attribute (the attr(cls='body') call in the question relies on this), so table(cls='my_class_name') should work as well.

Get content from table by webbrowser?

I have the following table.
<table class="table1">
<tbody>
<tr>
<th></th>
<th>SEQ</th>
<th>LOGIN</th>
<th>WHATSAPP</th>
<th>E-MAIL</th>
</tr>
<tr>
<td>1</td>
<td></td>
<td>name</td>
<td>99 999999999</td>
<td>xxxxxxx#hotmail.com</td>
</tr>
</tbody>
</table>
I would like to know how to get the content of each TR so that I can write it to an Access database.
So far I have only managed to get to this code:
Dim PageElement As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("table")
For Each CurElement As HtmlElement In PageElement
If (CurElement.GetAttribute("className") = "table1") Then
TextBox1.Text = CurElement.InnerHtml
End If
Next

Ignoring tags in XPATH using html agility pack

I am using the following code to parse html tables from an html file into a dataset:
Public Function GetDataSet(html As String) As DataSet
Dim ds As DataSet = New DataSet
Dim htmldoc As New HtmlAgilityPack.HtmlDocument
htmldoc.LoadHtml(html)
Dim tables = htmldoc.DocumentNode.SelectNodes("//table/tr") _
.GroupBy(Function(x) x.ParentNode)
For i As Integer = 0 To tables.Count - 1
Dim rows = tables(i).ToList()
ds.Tables.Add(String.Format("Table {0}", i))
Dim headers = rows(0).Elements("th").Select(Function(x) x.InnerText.Trim).ToList()
For Each Hr In headers
ds.Tables(i).Columns.Add(Hr)
Next
For j As Integer = 1 To rows.Count - 1
Dim row = rows(j)
Dim dr = row.Elements("td").Select(Function(x) x.InnerText.Trim).ToArray()
ds.Tables(i).Rows.Add(dr)
Next
Next
Return ds
End Function
and it works fine. But when a tag is placed inside the <table> tag before the <tr> tags, the table is not parsed.
Simple example:
<html>
<head><title>Test</title></head>
<body>
<div>Contents:</div>
<table>
<tr>
<th>Column1</th> <th>Column2</th>
</tr>
<tr>
<td>1</td> <td>11</td>
</tr>
<tr>
<td>2</td> <td>22</td>
</tr>
</table>
<table>
<tbody>
<tr>
<th>Column1</th> <th>Column2</th> <th>Column3</th>
</tr>
<tr>
<td>a</td> <td>aa</td> <td>aaa</td>
</tr>
<tr>
<td>b</td> <td>bb</td> <td>bbb</td>
</tr>
</tbody>
</table>
<table>
<div>
<tr>
<th>Column1</th> <th>Column2</th> <th>Column3</th>
</tr>
<tr>
<td>a</td> <td>aa</td> <td>aaa</td>
</tr>
<tr>
<td>b</td> <td>bb</td> <td>bbb</td>
</tr>
</div>
</table>
</body>
</html>
In this example only the first table is parsed.
My question is: how can I ignore any tag between the <table> tag and the <tr> tags in the following line of code, so that all the tables are parsed?
Dim tables = htmldoc.DocumentNode.SelectNodes("//table/tr") _
    .GroupBy(Function(x) x.ParentNode)
You can use // to select tr elements at any depth below each table:
Dim rows = htmldoc.DocumentNode.SelectNodes("//table//tr")
Given your requirement, it is also better to group the rows by their first ancestor table rather than by their parent, because the parent of a tr may be a tbody or thead (or, as in your third table, a div):
Dim tables = htmldoc.DocumentNode.SelectNodes("//table//tr") _
    .GroupBy(Function(x) x.Ancestors("table").First())
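To see the difference between the child step (/) and the descendant step (//) outside of Html Agility Pack, here is a small sketch using Python's xml.etree.ElementTree. It assumes the markup is well-formed XML, which real-world HTML often is not, and the shortened sample markup is invented for the example. Rows are collected per table by walking each table's descendants, which has the same effect as grouping by the ancestor table.

```python
import xml.etree.ElementTree as ET

# Three tables: rows directly under <table>, under <tbody>, and under a stray <div>.
markup = """
<body>
  <table><tr><th>A</th></tr><tr><td>1</td></tr></table>
  <table><tbody><tr><th>B</th></tr><tr><td>2</td></tr></tbody></table>
  <table><div><tr><th>C</th></tr><tr><td>3</td></tr></div></table>
</body>
"""

root = ET.fromstring(markup)

# Child step only: finds rows whose direct parent is a <table>.
direct_rows = root.findall(".//table/tr")

tables = []
for table in root.iter("table"):
    # iter("tr") visits every descendant row, whatever wraps it.
    rows = [
        [cell.text.strip() for cell in tr if cell.tag in ("th", "td")]
        for tr in table.iter("tr")
    ]
    tables.append(rows)

print(len(direct_rows))
print(tables)
```

With nested tables this per-table walk would also pick up the inner table's rows for the outer table, so it is only a sketch of the idea, not a general parser.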

Find HTML tag string in text using applescript and return value true or false

I'm looking for an applescript routine or subroutine to find this HTML tag string:
<td width="487">
in this HTML code:
<h1><span id="profile-name-94461" >Jan Schlatter</span></h1>
</span>
<table width="100%" border="0" cellspacing="0" cellpadding="0" id="profile-table">
<tr>
<th width="163" scope="col">Introduction</th>
<td width="487">Education :
<br />Management and support on responsibilities in finances and accounting.</td>
</tr>
<tr>
<th>Role</th>
<td>
<p>Portfolio Management</p><p>Senior Management</p> </td>
</tr>
<tr>
<th>Organisation Type</th>
<td>
<p>Family Office</p> </td>
</tr>
<tr>
<th>Email</th>
<td><a href="mailto:jan.schlatter#bohnetschlatter.ch" title="jan.schlatter#bohnetschlatter.ch" >jan.schlatter#bohnetschlatter.ch</a></td>
</tr>
<tr>
<th>Website</th>
<td><a href="http://bohnetschlatter.ch" target="_new" title="http://bohnetschlatter.ch" >http://bohnetschlatter.ch</a></td>
</tr>
<tr>
<th>Phone</th>
<td>+41 41 727 61 61</td>
</tr>
<tr>
<th>Fax</th>
<td>+41 41 727 61 62</td>
</tr>
<tr>
<th>Mailing Address</th>
<td>Gartenstrasse 2<br>Postfach 42</td>
</tr>
<tr>
<th>City</th>
<td>Zurich</td>
</tr>
<tr>
<th>State</th>
<td></td>
</tr>
<tr>
<th>Country</th>
<td>Switzerland</td>
</tr>
<tr>
<th class="lastrow" >Zip/ Postal Code</th>
<td class="lastrow" >6301</td>
</tr>
</table>
Because this tag is not present in every HTML file that I would like to process, I would like the routine to return a boolean value that I can use in an if/then/else statement, so that an action runs only when the value is true.
The AppleScript that I've started with is:
set intoTag to "<td width=" & quote & "487" & quote & ">"
on stripLastWordBeforeLogoEndTag(theText)
set text item delimiters to introTag
set a to text items of theText
set b to item 1 of a
set text item delimiters to space
set item 1 of a to (text items 1 thru -2 of b) as text
set text item delimiters to "</Logo>"
set fixedText to a as text
set text item delimiters to ""
return fixedText
if infoTag = fixedText then set bool to true
else set bool to false
end if
if true then (do action[[set extractText_INTRODUCTION to extractBetween(extractText, "<td width=" & quote & "487" & quote & ">", "</td>")]])
else (do not do action)
end if
I would rather not use a shell script because I have almost no experience editing shell scripts. Text item delimiters would be the best solution from my point of view, although any answers are welcome. Thanks
The simplest approach is to use the is in operator:
set introTag to "<td width=\"487\">"
set existTag to introTag is in theText
if existTag then
-- true
else
-- false
end if
If you don't want to use a shell script, you could use the offset command from Standard Additions, which will search for one piece of text inside another. If the text is not found, the result will be 0, which can be used in your if statement, for example:
set theText to "...<table width=\"100%\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\" id=\"profile-table\">
<tr>
<th width=\"163\" scope=\"col\">Introduction</th>
<td width=\"487\">Education :..."
set here to offset of "<td width=\"487\">" in theText
if here is not 0 then
log "text found at " & here -- do your stuff
end if
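The same check-then-extract pattern, sketched in Python for comparison: the in operator plays the role of is in, and str.find plays the role of offset, except that find returns -1 (not 0) when the text is missing and counts from 0 rather than 1. The helper extract_between is a hypothetical stand-in for the extractBetween handler mentioned in the question, and the sample text is shortened.

```python
def extract_between(text, start_tag, end_tag):
    """Return the text between start_tag and end_tag, or None if either is missing."""
    start = text.find(start_tag)  # -1 when start_tag is not present
    if start == -1:
        return None
    start += len(start_tag)
    end = text.find(end_tag, start)  # search only after the start tag
    if end == -1:
        return None
    return text[start:end]


# Shortened sample standing in for the HTML file contents.
the_text = '<th width="163" scope="col">Introduction</th>\n<td width="487">Education :</td>'
intro_tag = '<td width="487">'

# Boolean test first, then act only when the tag exists.
has_tag = intro_tag in the_text
introduction = extract_between(the_text, intro_tag, "</td>") if has_tag else None
print(has_tag, introduction)
```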