Web scraping in PowerShell to monitor a page

I want to be able to monitor my printer's status web page and have a script email me when the ink level falls below 25%. I'm pretty sure this can be done in PowerShell, but I'm at a loss on how to do it.
This is the page HTML in question:
<h2>Supply Status</h2>
<table class="matrix">
<thead>
<tr>
<th>Supply Information</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Black Toner</td>
<td>End of life</td>
</tr>
<tr>
<td>Cyan Toner</td>
<td>Under 25%</td>
</tr>
<tr>
<td>Magenta Toner</td>
<td>Under 25%</td>
</tr>
<tr>
<td>Yellow Toner</td>
<td>Under 25%</td>
</tr>
</tbody>
</table>
Thanks.
Adam

Building on @Joey's answer, give this a whirl with the HTML Agility Pack.
$web = New-Object HtmlAgilityPack.HtmlWeb
# HtmlWeb.Load() fetches a URL; for a saved file use $html = New-Object HtmlAgilityPack.HtmlDocument; $html.Load("C:\path\to\file.htm")
$html = $web.Load("http://full/path/to/file.htm")
$colors = $html.DocumentNode.SelectNodes("//table[@class='matrix']//tbody/tr")
$result = $colors | ForEach-Object {
    $color = $_.SelectSingleNode("td[1]").InnerText
    $level = $_.SelectSingleNode("td[2]").InnerText
    New-Object PSObject -Property @{ Color = $color; Level = $level } |
        Select-Object Color, Level
}
$result | Sort-Object Level | Format-Table -AutoSize
This assumes you already have the HTML Agility Pack loaded into PowerShell. Mine is loaded in my profile as:
[System.Reflection.Assembly]::LoadFrom((Join-Path $profileDirectory 'HtmlAgilityPack\HtmlAgilityPack.dll')) | Out-Null
Using the example HTML provided, this produces a two-column table listing each supply (Color) next to its status (Level).
At this point, you have the data and can email it out.
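For readers who want to prototype the check outside PowerShell, the same steps (pick the rows out of the `matrix` table, keep the low ones, build a mail) can be sketched in Python with only the standard library. This is an illustrative sketch, not the answer's method: it assumes a supply needs attention when its status reads exactly `Under 25%` or `End of life` (the two values in the sample page), and the sender, recipient, URL and SMTP host are hypothetical placeholders. Fetching and sending are only shown as comments.

```python
from email.message import EmailMessage
from html.parser import HTMLParser

# Assumption: these exact status strings mean "needs attention"
LOW_STATUSES = ("Under 25%", "End of life")


class SupplyTableParser(HTMLParser):
    """Collects the <td> texts of each row in <table class="matrix">."""

    def __init__(self):
        super().__init__()
        self.in_matrix = False  # inside the status table?
        self.in_td = False      # inside a <td> cell?
        self.row = []           # cells of the row being read
        self.rows = []          # finished rows (header rows have no <td>)

    def handle_starttag(self, tag, attrs):
        if tag == "table" and ("class", "matrix") in attrs:
            self.in_matrix = True
        elif self.in_matrix and tag == "tr":
            self.row = []
        elif self.in_matrix and tag == "td":
            self.in_td = True
            self.row.append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_matrix = False
        elif self.in_matrix and tag == "td":
            self.in_td = False
        elif self.in_matrix and tag == "tr" and self.row:
            self.rows.append([cell.strip() for cell in self.row])

    def handle_data(self, data):
        if self.in_td:
            self.row[-1] += data


def low_supplies(html):
    """Return (supply, status) pairs whose status is in LOW_STATUSES."""
    parser = SupplyTableParser()
    parser.feed(html)
    parser.close()
    return [(r[0], r[1]) for r in parser.rows if len(r) >= 2 and r[1] in LOW_STATUSES]


def build_alert(rows, sender="printer@example.com", to="admin@example.com"):
    """Build (but do not send) a plain-text alert listing the low supplies."""
    msg = EmailMessage()
    msg["Subject"] = f"Printer: {len(rows)} supplies low"
    msg["From"] = sender
    msg["To"] = to
    msg.set_content("\n".join(f"{name}: {status}" for name, status in rows))
    return msg


# Usage (hypothetical URL and SMTP host, not executed here):
#   from urllib.request import urlopen
#   import smtplib
#   rows = low_supplies(urlopen("http://printer.example/status.html").read().decode())
#   if rows:
#       with smtplib.SMTP("smtp.example.com") as smtp:
#           smtp.send_message(build_alert(rows))
```

Separating parsing from mail building keeps the threshold logic testable without a live printer or mail server; a scheduled task can then run the check periodically.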

The easiest way would probably be the HTML Agility Pack, which you can load into PowerShell. Lee Holmes has a short article demonstrating a simple example with it. Essentially, you're using an XML-like API to access the HTML DOM.

Related

Insert Data from CSV to HTML table using powershell

I am new to PowerShell, and I want to insert the data from a CSV file into an HTML table that I created separately. This is my CSV:
Sitename,EmailAddress
Test,example@gmail.com
How should I insert this data into my HTML table, so that when I add rows to the CSV they are automatically added to the HTML table too?
test.ps1 script:
$kfxteam = Get-Content ('.\template\teamnotif.html')
$notifteam = '' #result html
$teamlist = Import-Csv ".\list\teamlist.csv" | select 'SiteName','EmailAddress'
For($a=0; $a -lt $kfxteam.Length; $a++) {
# If the "<table class=content>" matches on what written on $kfxteam it will show the result`
if($kfxteam -match "<table class=content >"){
# should be replacing the data came from csv to html and also adding new row
write-host $teamlist[$a].SiteName
}
}
HTML template:
<table class=content >
<tr>
<td class=c1 nowrap>
Remote Sitenames
</td>
<td class=c1 nowrap >
Users Email Address
</td>
</tr>
<tr>
<td class=c1 nowrap>
usitename
</td>
<td class=c1 nowrap>
[uemail]
</td>
</tr>
</table>
The output HTML table should be:
Remote Sitenames Email Address
Test example@gmail.com

If I were you, I'd change the table in the HTML template file to something like this:
<table class=content>
<tr>
<td class=c1 nowrap>Remote Sitenames</td>
<td class=c1 nowrap>Users Email Address</td>
</tr>
##TABLEROWSHERE##
</table>
Now you have a placeholder that you can replace with the table rows you build from the CSV file, like this:
# create a template for each of the rows to insert
# with two placeholders to fill in using the -f Format operator
$row = @"
<tr>
<td class=c1 nowrap>{0}</td>
<td class=c1 nowrap>{1}</td>
</tr>
"@
# import the csv, loop over the records and use the $row template to create the table rows
$tableRows = Import-Csv -Path '.\list\teamlist.csv' | ForEach-Object {
$row -f $_.Sitename, $_.EmailAddress
}
# then combine it all in the html
$result = (Get-Content -Path '.\template\teamnotif.html' -Raw) -replace '##TABLEROWSHERE##', ($tableRows -join [Environment]::NewLine)
# save the completed HTML
$result | Set-Content -Path '.\list\teamlist.html'
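For comparison, here is the same placeholder technique sketched in Python with the standard `csv` module. It assumes the template contains the literal `##TABLEROWSHERE##` marker, the CSV has `Sitename` and `EmailAddress` headers, and the files are UTF-8. Unlike the PowerShell version, this sketch also HTML-escapes each value with `html.escape`, so a site name containing `&` or `<` cannot break the table.

```python
import csv
import html
import io

# Row template with two positional placeholders, mirroring the $row -f approach
ROW_TEMPLATE = (
    "<tr>\n"
    "<td class=c1 nowrap>{0}</td>\n"
    "<td class=c1 nowrap>{1}</td>\n"
    "</tr>"
)


def fill_template(template_text, csv_text, marker="##TABLEROWSHERE##"):
    """Replace marker in template_text with one <tr> per CSV record."""
    records = csv.DictReader(io.StringIO(csv_text))
    rows = [
        ROW_TEMPLATE.format(html.escape(rec["Sitename"]), html.escape(rec["EmailAddress"]))
        for rec in records
    ]
    return template_text.replace(marker, "\n".join(rows))


# Usage with the question's relative paths (not executed here):
#   with open("template/teamnotif.html", encoding="utf-8") as t, \
#        open("list/teamlist.csv", newline="", encoding="utf-8") as c:
#       result = fill_template(t.read(), c.read())
#   with open("list/teamlist.html", "w", encoding="utf-8") as out:
#       out.write(result)
```

Because the rows are regenerated from the whole CSV on every run, new CSV lines show up in the HTML the next time the script runs.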

How can I add a new row to a COM HTML object in PowerShell?

I have a table that I'm trying to add more rows to with PowerShell, and then export it as a new HTML file.
Here's the body of the HTML I'm trying to add rows to:
<BODY>
<TABLE style="WIDTH: 100%" cellPadding=5>
<TBODY>
<TR>
<TH>Bruger</TH>
<TH>Windows</TH>
<TH>Installations dato</TH>
<TH>Model</TH>
<TH>Sidst slukket</TH></TR>
<TR>
<TD>Users name</TD>
<TD>Windows 10 Pro</TD>
<TD>23-01-2020</TD>
<TD>ThinkPad</TD>
<TD>7 dage</TD></TR></TBODY></TABLE>
<TABLE>
<TBODY></TBODY></TABLE></BODY>
I figured I'd need to change the innerHTML of an element, but it just throws an error.
Here's my code:
$src = [IO.File]::ReadAllText($outPath)
$doc = New-Object -com "HTMLFILE"
$doc.IHTMLDocument2_write($src)
$elm = $doc.getElementsByTagName('tr')[0]
$elm.innerHTML = "<TR>New row!</TR>"
When I check $elm.innerHTML, I get the HTML I would expect, so it's grabbing the correct element, but for whatever reason I can't assign anything to it.
Here's the error:
Exception from HRESULT: 0x800A0258
At line:1 char:1
+ $elm.innerHTML = "<TH>User</TH>"
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], COMException
+ FullyQualifiedErrorId : System.Runtime.InteropServices.COMException

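In MSHTML, innerHTML is read-only on table-structure elements such as TABLE, TBODY and TR, which is why the assignment fails. Instead of modifying the innerHTML contents of an existing <tr> element, you'll want to:
1. Create a new <tr> element
2. Create any requisite <td> child element(s)
3. Append the <td> element(s) to your new row
4. Append the new row to the existing <tbody>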
Try something like this:
$html = @'
<BODY>
<TABLE style="WIDTH: 100%" cellPadding=5>
<TBODY>
<TR>
<TH>Bruger</TH>
<TH>Windows</TH>
<TH>Installations dato</TH>
<TH>Model</TH>
<TH>Sidst slukket</TH></TR>
<TR>
<TD>Users name</TD>
<TD>Windows 10 Pro</TD>
<TD>23-01-2020</TD>
<TD>ThinkPad</TD>
<TD>7 dage</TD></TR></TBODY></TABLE>
<TABLE>
<TBODY></TBODY></TABLE></BODY>
'@
# Create HTML document object
$doc = New-Object -ComObject HTMLFile
# Load existing HTML
$doc.IHTMLDocument2_write($html)
# Create new row element
$newRow = $doc.createElement('tr')
# Create new cell element
$newCell = $doc.createElement('td')
$newCell.innerHTML = "New row!"
$newCell.colSpan = 5
# Append cell to row
$newRow.appendChild($newCell)
# Append row to table body
$tbody = $doc.getElementsByTagName('tbody')[0]
$tbody.appendChild($newRow)
# Inspect resulting HTML
$tbody.outerHtml
You should expect to see the new row appended to the table body:
<TBODY><TR>
<TH>Bruger</TH>
<TH>Windows</TH>
<TH>Installations dato</TH>
<TH>Model</TH>
<TH>Sidst slukket</TH></TR>
<TR>
<TD>Users name</TD>
<TD>Windows 10 Pro</TD>
<TD>23-01-2020</TD>
<TD>ThinkPad</TD>
<TD>7 dage</TD></TR>
<TR>
<TD colSpan=5>New row!</TD></TR></TBODY>
You could create a nice little helper function for adding new rows:
function New-HTMLFileTableRow {
param(
[Parameter(Mandatory)]
[mshtml.HTMLDocumentClass]$Document,
[Parameter(Mandatory)]
[string[]]$Property,
[Parameter(Mandatory, ValueFromPipeline)]
$InputObject
)
process {
$newRow = $Document.createElement('tr')
foreach($propName in $Property){
$newCell = $Document.createElement('td')
$newCell.innerHtml = $InputObject.$propName
[void]$newRow.appendChild($newCell)
}
return $newRow
}
}
Then use it like this:
Import-Csv .\path\to\user-os-list.csv |New-HTMLFileTableRow -Property User,OSVersion,InstallDate,Model,LastActive -Document $doc |ForEach-Object {
[void]$tbody.appendChild($_)
}
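The create-element / append-child pattern is not specific to MSHTML. As a rough cross-check, here it is with Python's `xml.etree.ElementTree`, which only accepts well-formed XML (quoted attributes, closed tags), so markup like the question's `cellPadding=5` would need cleaning first. The helper name `append_rows` and the column names are made up for this sketch; each record is assumed to be a dict keyed by column name, such as a row from `csv.DictReader`.

```python
import xml.etree.ElementTree as ET


def append_rows(tbody, records, properties):
    """Append one <TR> per record to tbody, with one <TD> per property name."""
    for record in records:
        new_row = ET.Element("TR")                 # create the new row element
        for prop in properties:
            cell = ET.Element("TD")                # create a cell for this column
            cell.text = str(record.get(prop, ""))  # plain text, escaped on output
            new_row.append(cell)                   # append the cell to the row
        tbody.append(new_row)                      # append the row to the table body
    return tbody


# Usage on a well-formed fragment (hypothetical column names):
doc = ET.fromstring("<TABLE><TBODY><TR><TH>Bruger</TH><TH>Windows</TH></TR></TBODY></TABLE>")
append_rows(doc.find("TBODY"), [{"User": "alice", "OSVersion": "Windows 10 Pro"}], ["User", "OSVersion"])
print(ET.tostring(doc, encoding="unicode"))
```

Setting `.text` rather than splicing in markup mirrors the advice above: build real elements and let the serializer produce the HTML.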

TreeBuilder Get embedded nodes

Basically, I need to get the names and emails from all of these people in the HTML code.
<thead>
<tr>
<th scope="col" class="rgHeader" style="text-align:center;">Name</th><th scope="col" class="rgHeader" style="text-align:center;">Email Address</th><th scope="col" class="rgHeader" style="text-align:center;">School Phone</th>
</tr>
</thead><tbody>
<tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__0">
<td>
Michael Bowen
</td><td>mbowen@cpcisd.net</td><td>903-488-3671 ext3200</td>
</tr><tr class="rgAltRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__1">
<td>
Christian Calixto
</td><td>calixtoc@cpcisd.net</td><td>903-488-3671 x 3430</td>
</tr><tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__2">
<td>
Rachel Claxton
</td><td>claxtonr@cpcisd.net</td><td>903-488-3671 x 3450</td>
</tr>
</tbody>
</table><input id="ctl00_ContentPlaceHolder1_rg_People_ClientState" name="ctl00_ContentPlaceHolder1_rg_People_ClientState" type="hidden" autocomplete="off"> </div>
<br>
I know how to use HTML::TreeBuilder with nodes and such, and I'm already using this code in some of my scripts:
my ($file) = @_;
my $html = path($file)->slurp;    # path() is presumably Path::Tiny's
my $tree = HTML::TreeBuilder->new_from_content($html);
my @nodes = $tree->look_down(_tag => 'input');
my $val;
foreach my $node (@nodes) {
    $val = $node->look_down('name', qr/\$txt_Website/)->attr('value');
}
return $val;
I was going to use the same code for this function, but I realized that I don't have much to search for, since <td> tags appear in so many other places in the HTML. I'm sure there's a better way to approach this problem, but I can't seem to find it.
Link to the HTML code: http://pastebin.com/qLwu80ZW
My code: https://pastebin.com/wGb0eXmM
Note: I searched Google as much as possible, but I'm not quite sure what I should search for.

The table element that encloses the data you need has a unique class, rgMasterTable, so you can search for that with look_down.
I've written this to demonstrate. It pulls the HTML directly from your pastebin:
use strict;
use warnings 'all';
use LWP::Simple 'get';
use HTML::TreeBuilder;
use constant URL => 'http://pastebin.com/raw/qLwu80ZW';
my $tree = HTML::TreeBuilder->new_from_content(get URL);
my ($table) = $tree->look_down(_tag => 'table', class => 'rgMasterTable');
for my $tr ( $table->look_down(_tag => 'tr') ) {
next unless my @td = $tr->look_down(_tag => 'td');
my ($name, $email) = map { $_->as_trimmed_text } @td[0,1];
printf "%-17s %s\n", $name, $email;
}
Output:
Michael Bowen     mbowen@cpcisd.net
Christian Calixto calixtoc@cpcisd.net
Rachel Claxton    claxtonr@cpcisd.net
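Here is a rough Python equivalent of that approach using the standard `html.parser`: anchor on a table whose `class` attribute contains `rgMasterTable`, then keep the first two cells of every row that has `<td>` cells. It is an illustrative sketch rather than a port of HTML::TreeBuilder; it assumes the target table contains no nested tables, and the class and function names are made up.

```python
from html.parser import HTMLParser


class ClassTableRows(HTMLParser):
    """Collects <td> texts per row inside the table whose class list contains wanted_class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.in_table = False  # assumes the target table has no nested tables
        self.in_td = False
        self.cells = []        # <td> texts of the current row
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self.in_table = True
        elif self.in_table and tag == "tr":
            self.cells = []
        elif self.in_table and tag == "td":
            self.in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif self.in_table and tag == "td":
            self.in_td = False
        elif self.in_table and tag == "tr" and self.cells:
            # trim and collapse internal whitespace, roughly like as_trimmed_text
            self.rows.append([" ".join(c.split()) for c in self.cells])

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data


def names_and_emails(html):
    parser = ClassTableRows("rgMasterTable")
    parser.feed(html)
    parser.close()
    # header rows use <th> only, so they never reach parser.rows
    return [(row[0], row[1]) for row in parser.rows if len(row) >= 2]


# Mirror the Perl printf "%-17s %s\n" (page_html is a placeholder):
#   for name, email in names_and_emails(page_html):
#       print(f"{name:<17} {email}")
```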

How to extract certain data from HTML using RegEx?

I've got the following code:
<tr class="even">
<td>
Title1
</td>
<td>
Name1
</td>
<td>
Email1
</td>
<td>
Postcode1
</td>
I want to use a regex to output the data between the tags, like so:
Title1
Name1
Email1
Postcode1
Title2
Name2
Email2
Postcode2
...

You shouldn't use a regex to parse HTML; use an HTML parser instead.
Anyway, if you really want a regex you can use this one:
>\s+<|>\s*(.*?)\s*<
Match information:
MATCH 1
1. [51-57] `Title1`
MATCH 2
1. [109-114] `Name1`
MATCH 3
1. [166-172] `Email1`
MATCH 4
1. [224-233] `Postcode1`
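To see what the capture group returns, here is the same pattern run through Python's `re.findall`. The first alternative (`>\s+<`) only consumes whitespace between adjacent tags, so its matches come back as empty strings and are filtered out. Because `.` does not match a newline by default, cell text spread over several lines is not captured at all, which is one concrete way the regex approach breaks. The function name is hypothetical.

```python
import re

# Pattern from the answer: alternative 1 swallows whitespace between tags,
# alternative 2 captures the trimmed text between '>' and the next '<'.
PATTERN = re.compile(r">\s+<|>\s*(.*?)\s*<")


def cell_texts(html):
    # findall returns group 1 for every match ('' when alternative 1 matched)
    return [text for text in PATTERN.findall(html) if text]
```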

This removes the tags themselves (plus any whitespace around them) and outputs the remaining text, space-separated:
$text =
@'
<tr class="even">
<td>
Title1
</td>
<td>
Name1
</td>
<td>
Email1
</td>
<td>
Postcode1
</td>
'@
$text -split '\s*<.+?>\s*' -match '\S' -as [string]
Title1 Name1 Email1 Postcode1

Don't use a regex. HTML isn't a regular language, so it can't be properly parsed with a regex. It will succeed most of the time, but other times will fail. Spectacularly.
Use the Internet Explorer COM object to read your HTML from a file:
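$ie = new-object -com "InternetExplorer.Application"
$ie.visible = $false
$ie.navigate("F:\BuildOutput\rt.html")
# navigate() returns immediately; wait until the page has finished loading
while ($ie.Busy) { Start-Sleep -Milliseconds 100 }
$document = $ie.Document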
# This will return all the tables
$document.getElementsByTagName('table')
# This will return a table with a specific ID
$document.getElementById('employees')
Here's the MSDN reference for the document class.
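The Internet Explorer COM object is only available on Windows. As a cross-platform illustration of the same "use a real parser" advice, Python's standard `html.parser` can collect the text of every `<td>`; the class and function names here are hypothetical. Unlike a regex, it handles cell text that spans several lines.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collects the text of every <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self.in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data


def td_texts(html):
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    # trim each cell and collapse internal whitespace runs to single spaces
    return [" ".join(cell.split()) for cell in parser.cells]
```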

How to read the values of a table in HTML file and Store them in Perl?

I've read many questions and answers, but I couldn't find a straight answer to my question. All the answers were either very general or different from what I want to do. I've gotten as far as knowing that I need to use HTML::TableExtract or HTML::TreeBuilder::XPath, but I couldn't get them to store the values. I did manage to get the table row values and show them with Dumper.
Something like this:
foreach my $ts ($tree->table_states) {
    foreach my $row ($ts->rows) {
        push(@fir, (Dumper $row));
    }
}
print @fir;
But this is not really doing what I'm looking for. Here is the structure of the HTML table whose values I want to store:
<table><caption><b>Table 1 </b>bla bla bla</caption>
<tbody>
<tr>
<th ><p>Foo</p>
</th>
<td ><p>Bar</p>
</td>
</tr>
<tr>
<th ><p>Foo-1</p>
</th>
<td ><p>Bar-1</p>
</td>
</tr>
<tr>
<th ><p>Formula</p>
</th>
<td><p>Formula1-1</p>
<p>Formula1-2</p>
<p>Formula1-3</p>
<p>Formula1-4</p>
<p>Formula1-5</p>
</td>
</tr>
<tr>
<th><p>Foo-2</p>
</th>
<td ><p>Bar-2</p>
</td>
</tr>
<tr>
<th ><p>Foo-3</p>
</th>
<td ><p>Bar-3</p>
<p>Bar-3-1</p>
</td>
</tr>
</tbody>
</table>
It would be convenient if I could store the row values together as pairs.
The expected output would be something like an array with the values:
(Foo , Bar , Foo-1 , Bar-1 , Formula , Formula1-1 Formula1-2 Formula1-3 Formula1-4 Formula1-5 , ...)
The important thing for me is to learn how to store the values of each tag and how to move around in the tag tree.

Learn XPath and DOM manipulation.
use strictures;
use HTML::TreeBuilder::XPath qw();
my $dom = HTML::TreeBuilder::XPath->new;
$dom->parse_file('10280979.html');
my %extract;
@extract{$dom->findnodes_as_strings('//th')} =
map {[$_->findvalues('p')]} $dom->findnodes('//td');
__END__
# %extract = (
# Foo => [qw(Bar)],
# 'Foo-1' => [qw(Bar-1)],
# 'Foo-2' => [qw(Bar-2)],
# 'Foo-3' => [qw(Bar-3 Bar-3-1)],
# Formula => [qw(Formula1-1 Formula1-2 Formula1-3 Formula1-4 Formula1-5)],
# )
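If XPath isn't available, the same pairing (each `<th>` label mapped to the list of `<p>` texts in the cell beside it) can be sketched with Python's standard `html.parser`. This assumes each row is one `<th>` followed by one `<td>`, and that the text of interest sits inside `<p>` elements, as in the question's table; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeaderCellParser(HTMLParser):
    """Maps each row's <th> text to the list of <p> texts in the following <td>."""

    def __init__(self):
        super().__init__()
        self.result = {}
        self.cell = None    # "th" or "td" while inside one of them
        self.in_p = False
        self.key = None     # header text of the current row
        self.values = []    # <p> texts collected from the current <td>

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.cell, self.key = "th", ""
        elif tag == "td":
            self.cell, self.values = "td", []
        elif tag == "p" and self.cell:
            self.in_p = True
            if self.cell == "td":
                self.values.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False
        elif tag == "th":
            self.cell = None
        elif tag == "td":
            if self.key is not None:  # pair the cell with the preceding <th>
                self.result[self.key.strip()] = [v.strip() for v in self.values]
            self.cell, self.key = None, None

    def handle_data(self, data):
        if not self.in_p:
            return
        if self.cell == "th":
            self.key += data
        elif self.cell == "td":
            self.values[-1] += data


def header_to_paragraphs(html):
    parser = HeaderCellParser()
    parser.feed(html)
    parser.close()
    return parser.result
```

Since Python dicts keep insertion order, `[x for k, v in result.items() for x in (k, *v)]` flattens the result into the document-order list asked for in the question (duplicate `<th>` labels would collapse into one key).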
