How to parse HTML table with Powershell Core 7? - html

I have the following code:
$html = New-Object -ComObject "HTMLFile"
$source = Get-Content -Path $FilePath -Raw
try
{
$html.IHTMLDocument2_write($source) 2> $null
}
catch
{
$encoded = [Text.Encoding]::Unicode.GetBytes($source)
$html.write($encoded)
}
$t = $html.getElementsByTagName("table") | Where-Object {
$cells = $_.tBodies[0].rows[0].cells
$cells[0].innerText -eq "Name" -and
$cells[1].innerText -eq "Description" -and
$cells[2].innerText -eq "Default Value" -and
$cells[3].innerText -eq "Release"
}
The code works fine on Windows PowerShell 5.1, but on PowerShell Core 7 $_.tBodies[0].rows returns null.
So, how does one access the rows of an HTML table in PS 7?

PowerShell (Core), as of 7.3.1, does not come with a built-in HTML parser - and this may never change.
You must rely on a third-party solution, such as the PowerHTML module that wraps the HTML Agility Pack.
The object model works differently than the Internet Explorer-based one available in Windows PowerShell; it is similar to the XML DOM provided by the standard System.Xml.XmlDocument type ([xml])[1]; see the documentation and the sample code below.
# Install the module on demand
If (-not (Get-Module -ErrorAction Ignore -ListAvailable PowerHTML)) {
Write-Verbose "Installing PowerHTML module for the current user..."
Install-Module PowerHTML -ErrorAction Stop
}
Import-Module -ErrorAction Stop PowerHTML
# Create a sample HTML file with a table with 2 columns.
Get-Item $HOME | Select-Object Name, Mode | ConvertTo-Html > sample.html
# Parse the HTML file into an HTML DOM.
$htmlDom = ConvertFrom-Html -Path sample.html
# Find a specific table by its column names, using an XPath
# query to iterate over all tables.
$table = $htmlDom.SelectNodes('//table') | Where-Object {
$headerRow = $_.Element('tr') # or @($_.Elements('tr'))[0]
# Filter by column names
$headerRow.ChildNodes[0].InnerText -eq 'Name' -and
$headerRow.ChildNodes[1].InnerText -eq 'Mode'
}
# Print the table's HTML text.
$table.InnerHtml
# Extract the first data row's first column value.
# Note: @(...) is required around .Elements() for indexing to work.
@($table.Elements('tr'))[1].ChildNodes[0].InnerText
A Windows-only alternative is to use the HTMLFile COM object, as shown in this answer, and as used in your own attempt - I'm unclear on why it didn't work in your specific case.
[1] Notably with respect to supporting XPath queries via the .SelectSingleNode() and .SelectNodes() methods, exposing child nodes via a .ChildNodes collection, and providing .InnerHtml / .OuterHtml / .InnerText properties. Instead of an indexer that supports child element names, methods .Element(<name>) and .Elements(<name>) are provided.
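If installing a module is not an option and the markup happens to be well-formed XML (every tag closed, no HTML-only entities such as &nbsp;), the built-in [xml] cast can sometimes stand in for an HTML parser. This is a different technique from the PowerHTML approach above and will throw on typical real-world HTML; the sample table and its column names below are made up for illustration.

```powershell
# Hypothetical sample markup; assumed to be well-formed XML.
$fragment = @'
<table>
<tr><th>Name</th><th>Description</th></tr>
<tr><td>Timeout</td><td>Seconds to wait</td></tr>
<tr><td>Retries</td><td>Attempts before failing</td></tr>
</table>
'@

# The [xml] cast throws if the markup is not well-formed.
$doc = [xml]$fragment

# Skip the header row and turn each data row into an object.
$rows = @($doc.SelectNodes('//tr')) | Select-Object -Skip 1 | ForEach-Object {
    $cells = @($_.SelectNodes('td'))
    [pscustomobject]@{
        Name        = $cells[0].InnerText
        Description = $cells[1].InnerText
    }
}
$rows
```

The same XPath methods (.SelectNodes(), .SelectSingleNode()) are what the footnote above refers to, which is why code written against [xml] tends to port easily to the HTML Agility Pack object model.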

I used the answer above for my solution. I installed PowerHTML.
I wanted to extract the data table from https://www.dicomlibrary.com/dicom/dicom-tags/ and convert each of its rows.
From this:
<tr><td>(0002,0000)</td><td>UL</td><td>File Meta Information Group Length</td><td></td></tr>
To this:
{"00020000", "ULFile Meta Information Group Length"}
$page = Invoke-WebRequest https://www.dicomlibrary.com/dicom/dicom-tags/
$htmldom = ConvertFrom-Html $page
$table = $htmlDom.SelectNodes('//table') | Where-Object {
$headerRow = $_.Element('tr') # or @($_.Elements('tr'))[0]
# Filter by column names
$headerRow.ChildNodes[0].InnerText -eq 'Tag'
}
foreach ($row in $table.SelectNodes('tr'))
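{
# Tag cell: collapse whitespace, then turn (0002,0000) into {"00020000",
$a = $row.SelectSingleNode('td[1]').InnerText.Trim() -replace '\s+', ' ' -replace '\(', '{"' -replace ',', '' -replace '\)', '",'
# VR cell: strip all whitespace
$b = $row.SelectSingleNode('td[2]').InnerText.Trim() -replace '\s+', ''
# Name cell: collapse whitespace, then prefix the VR and close the entry
$c = $row.SelectSingleNode('td[3]').InnerText.Trim() -replace '\s+', ' '
$c = '"' + $b + $c + '"},'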
$row = New-Object -TypeName psobject
$row | Add-Member -MemberType NoteProperty -Name Tag -Value $a
$row | Add-Member -MemberType NoteProperty -Name Value -Value $c
[array]$data += $row
}
$data | Out-File c:\scripts\dd.txt

Related

Powershell not returning correct value

As some background, this should take an excel file, and convert it to PDF (and place the PDF into a temporary folder).
E.g. 'C:\Users\gjacobs\Desktop\test\stock.xlsx'
becomes
'C:\Users\gjacobs\Desktop\test\pdf_merge_tmp\stock.pdf'
However, the new file path does not return correctly.
If I echo the string $export_name from within the function, I can see that it has the correct value: "C:\Users\gjacobs\Desktop\test\pdf_merge_tmp\stock.pdf".
But once $export_name is returned, it has a different (incorrect value): "C:\Users\gjacobs\Desktop\test\pdf_merge_tmp C:\Users\gjacobs\Desktop\test\pdf_merge_tmp\stock.pdf".
function excel_topdf{
param(
$file
)
#Get the parent path
$parent = Split-Path -Path $file
#Get the filename (no ext)
$leaf = (Get-Item $file).Basename
#Add them together.
$export_name = $parent + "\pdf_merge_tmp\" + $leaf + ".pdf"
echo ($export_name) #prints without issue.
#Create tmp dir
New-Item -Path $parent -Name "pdf_merge_tmp" -ItemType "Directory" -Force
$objExcel = New-Object -ComObject excel.application
$objExcel.visible = $false
$workbook = $objExcel.workbooks.open($file, 3)
$workbook.Saved = $true
$xlFixedFormat = "Microsoft.Office.Interop.Excel.xlFixedFormatType" -as [type]
$workbook.ExportAsFixedFormat($xlFixedFormat::xlTypePDF, $export_name)
$objExcel.Workbooks.close()
$objExcel.Quit()
return $export_name
}
$a = excel_topdf -file 'C:\Users\gjacobs\Desktop\test\stock.xlsx'
echo ($a)
The issue you're experiencing is caused by how PowerShell returns output from functions. It is not limited to the New-Item cmdlet: any command inside a function that produces output adds that output to the function's return value.
As an example, let's take function with one cmdlet, which returns an object:
function a {
Get-Item -Path .
}
$outputA = a
$outputA
#### RESULT ####
Directory:
Mode LastWriteTime Length Name
---- ------------- ------ ----
d--hs- 12/01/2021 10:47 C:\
If you want to avoid that, these are the most popular options (as pointed out by Lasse V. Karlsen in the comments):
# Assignment to $null (or any other variable)
$null = Get-Item -Path .
# Piping to Out-Null
Get-Item -Path . | Out-Null
NOTE: The behavior described above doesn't apply to Write-Host, which does not write to the function's output stream:
function b {
Write-Host "bbbbbb"
}
$outputB = b
$outputB
# Nothing displayed
Interesting thread to check if you want to learn more.
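Applied to the function in the question, the fix could look roughly like the sketch below. The function name Get-PdfTargetPath and the temporary folder are made up for illustration, and the Excel export is left out so that the example runs without Office installed. Note that the echo call in the original function adds to the output in the same way New-Item does, so it is omitted here as well.

```powershell
function Get-PdfTargetPath {
    param([string]$File)

    $parent = Split-Path -Path $File
    $leaf   = [IO.Path]::GetFileNameWithoutExtension($File)

    # New-Item emits a DirectoryInfo object; assigning it to $null keeps it
    # out of the function's output.
    $null = New-Item -Path $parent -Name 'pdf_merge_tmp' -ItemType Directory -Force

    # This string is now the only object the function outputs.
    Join-Path (Join-Path $parent 'pdf_merge_tmp') "$leaf.pdf"
}

# Hypothetical working folder under the temp directory.
$work = Join-Path ([IO.Path]::GetTempPath()) 'function_output_demo'
$null = New-Item -Path $work -ItemType Directory -Force

$result = Get-PdfTargetPath -File (Join-Path $work 'stock.xlsx')
$result
```

With the output of New-Item discarded, $result holds a single string rather than an array containing the directory object followed by the path.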

Can't call piped properties in a function. Powershell

So I'm trying to create a "download" function that uses a piped object property to determine a download method (sftp or http). Then either create an sftp script for putty/winscp or curl the http url. I am defining objects as follows:
#WinSCP
$winscp = new-object psobject
$winscp | add-member noteproperty name "WinSCP"
$winscp | add-member noteproperty dltype "http"
$winscp | add-member noteproperty file "winscp.exe"
$winscp | add-member noteproperty url "https://cdn.winscp.net/files/WinSCP-5.17.8-Setup.exe"
$winscp | add-member noteproperty path "$env:ProgramFiles(x86)\WinSCP"
$winscp | add-member noteproperty install 'msiexec /i "$DataPath\$winscp.file" /quiet /norestart'
#Database
$db = new-object psobject
$db | add-member noteproperty name "Client Database"
$db | add-member noteproperty dltype "sftp"
$db | add-member noteproperty file "database_"
$db | add-member noteproperty ver "check"
$db | add-member noteproperty ext ".csv"
$db | add-member noteproperty dir "db"
#DatabaseVersion
$db_ver = new-object psobject
$db_ver | add-member noteproperty name "Database Version File"
$db_ver | add-member noteproperty dltype "sftp"
$db_ver | add-member noteproperty file "current_version.txt"
$db_ver | add-member noteproperty dir "db"
Currently I'm having issues with the $Input variable within the function. It can only be used once and does not work in an if statement. Since it contains an object with multiple properties, I think it needs to be converted to a new object within the function first. I'm new to PowerShell and haven't found a way of doing this yet. Here is the function I made and am trying to use:
function Download () {
#HTTP Download Method
if ($input.dltype -eq "http") {
curl $input.url -O $DataPath\$input.file
#HTTP Success or Error
$curlResult = $LastExitCode
if ($curlResult -eq 0)
{
Write-Host "Successfully downloaded $input.name"
}
else
{
Write-Host "Error downloading $input.name"
}
pause
}
#SFTP Download Method
if ($input.dltype -eq "sftp") {
sftpPassCheck
#Detect if version required
if ($input.ver = "check") {
#Download the objects version file
"$+$Input+_ver" | Download
#Update the object's ver property
$input.ver = [IO.File]::ReadAllText("$DataPath\current_version.txt")
#Build the new filename
$input.file = "$input.file"+"$input.ver"+"$input.ext"
#Delete the version file
Remove-Item "$DataPath\current_version.txt"
}
& "C:\Program Files (x86)\WinSCP\WinSCP.com" `
/log="$DataPath\SFTP.log" /ini=nul `
/command `
"open sftp://ftpconnector:$script:sftp_pass@$input.ip/ -hostkey=`"`"ssh-ed25519 255 SETvoRlAT0/eJJpRhRRpBO5vLfrhm5L1mRrMkOiPS70=`"`" -rawsettings ProxyPort=0" `
"cd /$input.dir" `
"lcd $DataPath" `
"get $input.file" `
"exit"
#SFTP Success or Error
$winscpResult = $LastExitCode
if ($winscpResult -eq 0)
{
Write-Host "Successfully downloaded $input.name"
}
else
{
Write-Host "Error downloading $input.name"
}
}
}
I'm probably missing something simple but I'm clueless at this point. Oh usage should be:
WinSCP | download
The proper way to bind input from the pipeline to a function's parameters is to declare an advanced function - see about_Functions_Advanced_Parameters and the implementation in the bottom section of this answer.
However, in simple cases a filter will do, which is a simplified form of a function that implicitly binds pipeline input to the automatic $_ variable and is called for each input object:
filter Download {
if ($_.dltype -eq "http") {
# ...
}
}
$input is another automatic variable, which in simple (non-advanced) functions is an enumerator for all pipeline input being received and must therefore be looped over.
That is, the following simple function is the equivalent of the above filter:
function Download {
# Explicit looping over $input is required.
foreach ($obj in $input) {
if ($obj.dltype -eq "http") {
# ...
}
}
}
If you do want to turn this into an advanced function (note that I've changed the name to conform to PowerShell's verb-noun naming convention):
function Invoke-Download {
param(
# Declare a parameter explicitly and mark it as
# as pipeline-binding.
[Parameter(ValueFromPipeline, Mandatory)]
$InputObject # Not type-constraining the parameter implies [object]
)
# The `process` block is called for each pipeline input object
# with $InputObject referencing the object at hand.
process {
if ($InputObject.dltype -eq "http") {
# ...
}
}
}
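To see the advanced-function form working end to end, here is a small runnable sketch. The two sample objects only mimic the name and dltype properties from the question, and the function returns a message string instead of downloading anything. It also uses $(...) inside the strings, because a plain "$InputObject.name" would expand only $InputObject and append ".name" as literal text.

```powershell
function Invoke-Download {
    param(
        [Parameter(ValueFromPipeline, Mandatory)]
        $InputObject
    )
    process {
        # Runs once per pipeline object; $(...) expands the property access.
        switch ($InputObject.dltype) {
            'http'  { "HTTP download of $($InputObject.name)" }
            'sftp'  { "SFTP download of $($InputObject.name)" }
            default { "Unknown download type for $($InputObject.name)" }
        }
    }
}

# Stand-ins for the objects defined in the question.
$winscp = [pscustomobject]@{ name = 'WinSCP'; dltype = 'http' }
$db     = [pscustomobject]@{ name = 'Client Database'; dltype = 'sftp' }

$messages = $winscp, $db | Invoke-Download
$messages
```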
mklement0 is spot on - $input is not really meant to be used directly, and you're probably much better off explicitly declaring your input parameters!
In addition to the $InputObject pattern shown in that answer, you can also bind input object property values to parameters by name:
function Download
{
param(
[Parameter(ValueFromPipelineByPropertyName = $true)]
[Alias('dltype')]
[string]$Protocol = 'http'
)
process {
Write-Host "Choice of protocol: $Protocol"
}
}
Notice that although the name of this parameter is $Protocol, the [Alias('dltype')] attribute will ensure that the value of the dltype property on the input object is bound.
The effect of this is:
PS ~> $WinSCP,$db |Download
Choice of protocol: http
Choice of protocol: sftp
Keep repeating this pattern for any required input parameter - declare a named parameter mapped to property names (if necessary), and you might end up with something like:
function Download
{
[CmdletBinding()]
param(
[Parameter(ValueFromPipelineByPropertyName = $true)]
[ValidateSet('sftp', 'http')]
[Alias('dltype')]
[string]$Protocol,
[Parameter(ValueFromPipelineByPropertyName = $true)]
[Alias('dir')]
[string]$Path = $PWD,
[Parameter(Mandatory = $true, ValueFromPipelineByPropertyName = $true)]
[Alias('url','file')]
[string]$Uri
)
process {
Write-Host "Downloading $Uri to $Path over $Protocol"
}
}
Now you can do:
PS ~> $WinSCP,$db |Download
Downloading https://cdn.winscp.net/files/WinSCP-5.17.8-Setup.exe to C:\Program Files(x86)\WinSCP over http
Downloading database_ to db over sftp
We're no longer dependent on direct access to $input, $InputObject or $_, nice and clean.
Please see the about_Functions_Advanced_Parameters help file for more information about parameter declaration.

powershell ConvertTo-Html add class

My script pulls information from a server, then converts it to HTML and sends the report by email.
Snippet:
$sourceFile = "log.log"
$targetFile = "log.html"
$file = Get-Content $sourceFile
$fileLine = @()
foreach ($Line in $file) {
$MyObject = New-Object -TypeName PSObject
Add-Member -InputObject $MyObject -Type NoteProperty -Name Load -Value $Line
$fileLine += $MyObject
}
$fileLine | ConvertTo-Html -Property Load -Head '<style> .tdclass{color:red;} </style>' | Out-File $targetFile
Current HTML report snippet:
<table>
<colgroup><col/></colgroup>
<tr><th>Load on servers</th></tr>
<tr><td>Server1 load is 2442</td></tr>
<tr><td>Server2 load is 6126</td></tr>
<tr><td>Server3 load is 6443</td></tr>
<tr><td> </td></tr>
<tr><td>Higher than 4000:</td></tr>
<tr><td>6126</td></tr>
<tr><td>6443</td></tr>
</table>
This will generate an HTML report containing a table with tr and td.
Is there a way to make it generate td elements with classes, so that I can put the class styles into the -Head parameter and make the td elements under "Higher than 4000:" red?
I know this is an old post, but I stumbled across it while looking to do something similar.
I was able to add CSS styling by doing a replace.
Here is an example:
$Report = $Report -replace '<td>PASS</td>','<td class="GreenStatus">PASS ✔</td>'
You can then output $Report to a file as normal, with the relevant CSS code in the header.
You would need some additional logic to find values over 4000.
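One way that additional logic might look is sketched below: a regex replace with a script block as the match evaluator, which tags every cell whose trailing number exceeds a threshold. The sample lines stand in for the log file, the class name tdclass comes from the question, and the $threshold variable and the pattern are assumptions that would need adjusting to the real cell contents.

```powershell
# Stand-in for the lines read from the log file.
$lines = 'Server1 load is 2442', 'Server2 load is 6126', 'Server3 load is 6443'
$objects = $lines | ForEach-Object { [pscustomobject]@{ Load = $_ } }

$html = $objects | ConvertTo-Html -Property Load -Head '<style>.tdclass{color:red;}</style>'

$threshold = 4000

# Group 1 is the text before the number, group 2 is the number that ends the cell.
$report = [regex]::Replace(($html -join "`n"), '<td>([^<]*?)(\d+)</td>', {
    param($m)
    if ([int]$m.Groups[2].Value -gt $threshold) {
        '<td class="tdclass">' + $m.Groups[1].Value + $m.Groups[2].Value + '</td>'
    }
    else {
        $m.Value   # leave cells at or below the threshold untouched
    }
})
$report
```

$report is a single string, so it can be passed to Out-File or used as an email body directly.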
Running Get-Help ConvertTo-Html lists all parameters of the ConvertTo-Html command. Below is the output:
ConvertTo-Html [[-Property] <Object[]>] [[-Head] <String[]>] [[-Title] <String>] [[-Body] <String[]>] [-As <String>] [-CssUri <Uri>] [-InputObject <PSObject>] [-PostContent <String[]>] [-PreContent <String[]>] [<CommonParameters>]
You can create an external CSS file and pass its path to the -CssUri parameter.

ConvertFrom-JSON to object

It looks like the way I am expecting this to work doesn't. I want multiple objects returned, but it seems to be returning just one. It is beyond me how I do it.
A very simple JSON file:
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"storageAccountName": {
"value": "sa01"
},
"virtualNetworkName": {
"value": "nvn01"
}
}
}
I want to dynamically add the parameters and their values into a nice pscustomobject (that would look like the following with the above data):
ParameterName | Value
===========================
storageAccountName | sa01
virtualNetworkName | nvn01
What I don't understand is why the following returns one object:
$TemplateParametersFile = "C:\Temp\deploy-Project-Platform.parameters.json"
$content = Get-Content $TemplateParametersFile -Raw
$JsonParameters = ConvertFrom-Json -InputObject $content
$JsonParameters.parameters | Measure-Object
Whilst writing this, I eventually found a solution that gets what I want, which I'll post in the answer section. Feel free to school me and improve...
I would do things a little differently, skipping the hashtable, and using the hidden PSObject property. So, picking up after you have the JSON data stored in $content, I would do something like this:
#Convert JSON file to an object
$JsonParameters = ConvertFrom-Json -InputObject $content
#Create new PSObject with no properties
$oData = New-Object PSObject
#Loop through properties of the $JsonParameters.parameters object, and add them to the new blank object
$JsonParameters.parameters.psobject.Properties.Name |
ForEach{
Add-Member -InputObject $oData -NotePropertyName $_ -NotePropertyValue $JsonParameters.parameters.$_.Value
}
$oData
By the way, ConvertFrom-Json had issues with the JSON as you posted it; I had to add quotes around the two values, as in "value": "sa01".
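If the ParameterName / Value layout from the question is what you are after, the same hidden psobject property can produce one object per parameter. The sketch below is a minimal version that uses an inline, trimmed copy of the JSON from the question instead of reading the file. It also shows why Measure-Object reports a count of 1: parameters is a single object whose properties are the individual parameters.

```powershell
# Inline, trimmed copy of the JSON from the question.
$content = @'
{
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountName": { "value": "sa01" },
    "virtualNetworkName": { "value": "nvn01" }
  }
}
'@

$JsonParameters = ConvertFrom-Json -InputObject $content

# Emit one object per property of the single "parameters" object.
$rows = $JsonParameters.parameters.psobject.Properties | ForEach-Object {
    [pscustomobject]@{
        ParameterName = $_.Name
        Value         = $_.Value.value
    }
}
$rows
```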
Using the same JSON file as shown above:
<#
# Read in JSON from file on disk
$TemplateParametersFile = "C:\Temp\deploy-Project-Platform.parameters.json"
$content = Get-Content $TemplateParametersFile -Raw
#>
#Retrieve JSON file from Azure storage account.
$TemplateParametersFile = "https://{storageAccountName}.blob.core.windows.net/{SomeContainer}/deploy-Project-Platform.parameters.json"
$oWc = New-Object System.Net.WebClient
$webpage = $oWc.DownloadData($TemplateParametersFile)
$content = [System.Text.Encoding]::ASCII.GetString($webpage)
#Convert JSON file to an object (IMHO- Sort of!)
$JsonParameters = ConvertFrom-Json -InputObject $content
#Build hashtable - easier to add new items - the whole purpose of this script
$oDataHash = @{}
$JsonParameters.parameters | Get-Member -MemberType NoteProperty | ForEach-Object{
$oDataHash += @{
$_.name = $JsonParameters.parameters."$($_.name)" | Select -ExpandProperty Value
}
}
#Example: adding a single item to the hashtable
$oDataHash.Add("VirtualMachineName","aDemoAdd")
#Convert hashtable to pscustomobject
$oData = New-Object -TypeName PSCustomObject
$oData | Add-Member -MemberType ScriptMethod -Name AddNote -Value {
Add-Member -InputObject $this -MemberType NoteProperty -Name $args[0] -Value $args[1]
}
$oDataHash.Keys | Sort-Object | ForEach-Object{
$oData.AddNote($_,$oDataHash.$_)
}
$oData
And the result:
storageAccountName VirtualMachineName virtualNetworkName
------------------ ------------------ ------------------
sa01 aDemoAdd nvn01
Agreed, the question asked for a Parameter / Value pair, and this results in the parameter's name being assigned as the noteproperty, but I think it will be easier to use it this way. Of course, $oDataHash returns it as a Key/value pair.
This script also pulls the JSON file directly from an Azure storage account. No need to save to disk. If you want to save to disk, change $oWc.DownloadData() to $oWc.DownloadFile() . The commented bit at the top, reads from disk.
I am sure there are much more succinct ways to achieve the same result, and I'd love to hear them. For me, at the moment, this works.

Get AD distinguished name

I'm trying to take input from a CSV file, which has a list of group names (canonical names) and get the Distinguished Name from it, then output to another CSV file. The code:
#get input file if passed
Param($InputFile)
#Set global variable to null
$WasError = $null
#Prompt for file name if not already provided
If ($InputFile -eq $NULL) {
$InputFile = Read-Host "Enter the name of the input CSV file (file must have header of 'Group')"
}
#Import Active Directory module
Import-Module -Name ActiveDirectory -ErrorAction SilentlyContinue
$DistinguishedNames = Import-Csv -Path $InputFile -Header Group | foreach-Object {
$GN = $_.Group
$DN = Get-ADGroup -Identity $GN | Select DistinguishedName
}
$FileName = "RESULT_Get-DistinguishedNames" + ".csv"
#Export list to CSV
$DNarray | Export-Csv -Path $FileName -NoTypeInformation
I've tried multiple solutions, and none have seemed to work. Currently, it throws an error because
Cannot validate argument on parameter 'Identity'. The argument is null. Supply a non-null argument and try the command again.
I tried using -Filter also, and in a previous attempt I used this code:
Param($InputFile)
#Set global variable to null
$WasError = $null
#Prompt for file name if not already provided
If ($InputFile -eq $NULL) {
$InputFile = Read-Host "Enter the name of the input CSV file(file must have header of 'GroupName')"
}
#Import Active Directory module
Import-Module -Name ActiveDirectory -ErrorAction SilentlyContinue
$DistinguishedNames = Import-Csv -Path $InputFile | foreach {
$strFilter = "*"
$Root = [ADSI]"GC://$($objDomain.Name)"
$objSearcher = New-Object System.DirectoryServices.DirectorySearcher($root)
$objSearcher.Filter = $strFilter
$objSearcher.PageSize = 1000
$objsearcher.PropertiesToLoad.Add("distinguishedname") | Out-Null
$objcolresults = $objsearcher.FindAll()
$objitem = $objcolresults.Properties
[string]$objDomain = [System.DirectoryServices.ActiveDirectory.Domain]::GetCurrentDomain()
[string]$DN = $objitem.distinguishedname
[string]$GN = $objitem.groupname
#Get group info and add mgr ID and Display Name
$props = @{'Group Name'= $GN;'Domain' = $objDomain;'Distinguished Name' = $DN;}
$DNS = New-Object psobject -Property $props
}
$FileName = "RESULT_Get-DistinguishedNames" + ".csv"
#Export list to CSV
$DistinguishedNames | Sort Name | Export-Csv $FileName -NoTypeInformation
The filter isn't the same one I was using then; I can't find the one I was using, and the one I currently have is a broken attempt.
Anyway, the main issue I was having is that it will get the group name, but search for it in the wrong domain (it wouldn't include Organizational Units) which caused none of them to be found. When I search for a group in PowerShell though (using Get-ADGroup ADMIN) they show up with the correct DN and everything. Any hints or code samples are appreciated.
You seem to be missing how the $variable = cmdlet | foreach {script-block} assignment works. For objects to end up in $variable, the script block must output them (pass them through to the pipeline). In both of your loops, the last statement has the form $someVar = expectedOutput, where expectedOutput is either a New-Object psobject or a Get-ADGroup call. Assigning to $someVar captures that output, so the script block emits nothing and $variable remains null. To fix this, do not assign the result of the call that should produce the object to a variable inside the script block:
$DistinguishedNames = Import-Csv -Path $InputFile -Header Group | foreach-Object {
$GN = $_.Group
Get-ADGroup -Identity $GN | Select DistinguishedName # dropped the '$DN =' assignment
}
$DistinguishedNames | Export-CSV -Path $FileName -NoTypeInformation
The same issue applies to the second script.
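The effect is easy to reproduce without Active Directory. In the sketch below, the group names and the DN format are made up; the only difference between the two loops is whether the last statement is an assignment.

```powershell
# Hypothetical group names; no AD module is needed for this demonstration.
$groups = 'Admins', 'Operators'

# Assignment inside the script block: the value is captured, nothing is emitted.
$suppressed = $groups | ForEach-Object {
    $dn = "CN=$_,OU=Groups,DC=example,DC=com"
}

# Bare expression: each value is emitted to the pipeline and collected.
$collected = $groups | ForEach-Object {
    "CN=$_,OU=Groups,DC=example,DC=com"
}

"Suppressed is null: $($null -eq $suppressed)"
"Collected count: $(@($collected).Count)"
```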