Why does my PowerShell script not run as expected? - html

I have created a script to crawl the IMDB website. It takes a list of IMDB URLs, extracts data such as the movie title, release year, and plot summary, and exports it to a text file in CSV format. The script is below.
$listToCrawl = "imdb_link_list.txt"
$pathOfFile = "K:\MY DOCUMENTS\POWERSHELL\IMDB FILE\"
$fileName = "plot_summary.txt"
New-Item ($pathOfFile + $fileName) -ItemType File
Set-Content ($pathOfFile + $fileName) '"Title","Year","URL","Plot Summary"'
Get-Content ($pathOfFile + $listToCrawl) | ForEach-Object {
$url = $_
$Result = Invoke-WebRequest -Uri $url
$movieTitleSelector = "#title-overview-widget > div.vital > div.title_block > div > div.titleBar > div.title_wrapper > h1"
$movieTitleNode = $Result.ParsedHtml.querySelector( $movieTitleSelector)
$movieTitle = $movieTitleNode.innerText
$movieYearSelector = "#titleYear"
$movieYearNode = $Result.ParsedHtml.querySelector($movieYearSelector)
$movieYear = $movieYearNode.innerText
$plotSummarySelector = "#titleStoryLine > div:nth-child(3) > p > span"
$plotSummaryNode = $Result.ParsedHtml.querySelector($plotSummarySelector)
$plotSummary = $plotSummaryNode.innerText
$movieDataEntry = '"' + $movieTitle + '","' + $movieYear + '","' + $url + '","' + $plotSummary + '"'
Add-Content ($pathOfFile + $fileName) $movieDataEntry
}
The list of URLs to extract from is saved in the "K:\MY DOCUMENTS\POWERSHELL\IMDB FILE\imdb_link_list.txt" file, with the following content.
https://www.imdb.com/title/tt0472033/
https://www.imdb.com/title/tt0478087/
https://www.imdb.com/title/tt0285331/
https://www.imdb.com/title/tt0453562/
https://www.imdb.com/title/tt0120577/
https://www.imdb.com/title/tt0416449/
I run the script, but it does not work as expected. The following error is thrown.
Invalid argument.
At K:\MY DOCUMENTS\POWERSHELL\IMDB_Plot_Summar_ Extract.ps1:20 char:1
+ $plotSummaryNode = $Result.ParsedHtml.querySelector($plotSummarySelec ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], ArgumentException
+ FullyQualifiedErrorId : System.ArgumentException
I think the problem is the CSS selector I use to select the data, but I don't know what's wrong with it. As far as I can tell, it follows the CSS selector rules.
$plotSummarySelector = "#titleStoryLine > div:nth-child(3) > p > span"
Does anyone know what's wrong with it?

The ParsedHtml property is specific to Windows PowerShell and doesn't exist in PowerShell Core, so if you want to future-proof your code you're better off using something like the HTML Agility Pack.
# install the HTML Agility Pack nuget package
Invoke-WebRequest -Uri "https://dist.nuget.org/win-x86-commandline/latest/nuget.exe" -OutFile ".\nuget.exe";
.\nuget.exe install "HtmlAgilityPack" -Version "1.11.21";
# import the HTML Agility Pack
Add-Type -Path ".\HtmlAgilityPack.1.11.21\lib\Net40\HtmlAgilityPack.dll";
# get the web page content and load it into a HtmlDocument
$response = Invoke-WebRequest -Uri "https://www.imdb.com/title/tt0472033/" -UseBasicParsing;
$html = $response.Content;
$doc = new-object HtmlAgilityPack.HtmlDocument;
$doc.LoadHtml($html);
Then you can extract nodes using XPath syntax, e.g. for the title:
# extract the title
$titleHtml = $doc.DocumentNode.SelectSingleNode("//div[@class='title_wrapper']/h1/text()[1]").InnerText;
$titleText = [System.Net.WebUtility]::HtmlDecode($titleHtml).Trim();
write-host "'$titleText'"; # '9'
I'll leave the rest of the document elements as an exercise for the reader :-).
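Once the fields are extracted, the CSV line still has to be written. Concatenating quotes by hand, as in the question, generally breaks as soon as a plot summary contains a double quote. A minimal sketch, assuming the values are already in variables; the sample values and the names $row and $csvLines are placeholders, not scraped data:

```powershell
# Placeholder values standing in for what the scraper would extract.
$row = [pscustomobject]@{
    Title          = '9'
    Year           = '2009'
    URL            = 'https://www.imdb.com/title/tt0472033/'
    'Plot Summary' = 'A "sample" plot summary, with a comma.'
}

# ConvertTo-Csv quotes the fields and doubles embedded quotes.
$csvLines = $row | ConvertTo-Csv -NoTypeInformation
$csvLines
```

Inside the loop, piping each row to Export-Csv with -NoTypeInformation and -Append would add one line per URL to the output file.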

How to use ADODB in PowerShell

I thought I'd post these code snippets for others who may find themselves trying to make ADODB calls from a PowerShell script. I inherited this convoluted mess, and had to make some changes to it.
We're using PlanetPress as part of a Docuware Document Imaging system. A PlanetPress workflow called a VBScript, which in turn launched a PowerShell script. The PowerShell script made two database queries: one was an update, and the other was a select. I'm not that great with PowerShell, and there may be a cmdlet out there to simplify this, but the code was creating ADODB.Connection, ADODB.Command, and ADODB.Recordset objects directly. The problem is there are no good resources for the syntax required to use these objects. Hopefully these code snippets will help some poor soul in a similar situation.
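Using ADODB.Command:
# ADO enum values used below; PowerShell does not predefine them
$adStateClosed = 0   # ObjectStateEnum
$adCmdText = 1       # CommandTypeEnum
$oConnection = New-Object -comobject "ADODB.Connection"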
# Use correct ODBC driver
if ([Environment]::Is64BitProcess) {
$oConnection.Open("DSN=DW64")
} else {
$oConnection.Open("DSN=DW")
}
if ($oConnection.State -eq $adStateClosed) {
Write-Output "Connection not established"
Write-Output $oConnection
}
$UpdQuery = "Update dwdata.Purchasing `
set Status='Processing' `
WHERE DOCUMENT_TYPE = 'Check' `
AND STATUS in ('Approved')"
$ra=-1
$oCommand = New-Object -comobject "ADODB.Command"
$oCommand.ActiveConnection = $oConnection
$oCommand.CommandText = $UpdQuery
$oCommand.CommandType = $adCmdText
$rs=$oCommand.Execute([ref]$ra)
Write-Output ("Count of Row[s] updated: " + $ra)
Using ADODB.Recordset:
$oRS = New-Object -comobject "ADODB.Recordset"
$query = "SELECT DWDOCID, DOCUMENT_DATE, CHECK_NUMBER, PAYEE_NAME, CHECK_AMOUNT, STATUS `
FROM dwdata.Purchasing `
WHERE DOCUMENT_TYPE = 'Check' `
AND STATUS = 'Processing' `
ORDER BY CHECK_NUMBER;"
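# $oConnection object created in ADODB.Command snippet above
# ADO enum values; PowerShell does not predefine them
$adUseClient = 3        # CursorLocationEnum
$adOpenStatic = 3       # CursorTypeEnum
$adLockOptimistic = 3   # LockTypeEnum
$oConnection.CursorLocation = $adUseClient
$oRS.Open($query, $oConnection, $adOpenStatic, $adLockOptimistic)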
$reccount = "Number of queried records: " + $oRS.RecordCount
write-output $reccount
If (-not ($oRS.EOF)) {
# Move to the first record returned, and loop
$oRS.MoveFirst()
$reccount = "Number of loop records: " + $oRS.RecordCount
write-output $reccount
do {
$outString = '"' + $oRS.Fields.Item("DOCUMENT_DATE").Value.ToString("MM/dd/yyyy") + '"' + ','
$outString += '"' + $oRS.Fields.Item("CHECK_NUMBER").Value + '"' + ','
$outString += '"' + $oRS.Fields.Item("PAYEE_NAME").Value + '"' + ','
$outString += '"' + $oRS.Fields.Item("CHECK_AMOUNT").Value + '"' + ','
$outString | Out-File $bankfile -Append -Encoding ASCII
$oRS.MoveNext()
} until ($oRS.EOF -eq $True)
} Else{
Write-Output "No records returned from database query."
}
$oRS.Close()
$oConnection.Close()
Some of this code is ugly (using do instead of while), but the idea is to help you get the right syntax for $oCommand.Execute and how to get a record count from the Recordset. $oRS.MoveFirst() needs to be called before the record count is available.
ss64.com and other resources usually give VBScript snippets. In VBScript, variables are not preceded with a $, and it is unclear when or whether you need parentheses. This code does run and work.

Convert and format XML to JSON on Azure Storage Account in Powershell

I'm trying to convert XML files in an Azure Storage container to JSON files in the same container.
That way I can read the information into an Azure SQL Database via Azure Data Factory.
I'd like to steer clear of Logic Apps if possible.
The JSON files need to be formatted.
All of this should be done with PowerShell scripting.
Here is what I've got so far, after some searching on the web and shamelessly copying and pasting PowerShell code:
#Connect-AzAccount
# Helper function that converts a *simple* XML document to a nested hashtable
# with ordered keys.
function ConvertFrom-Xml {
param([parameter(Mandatory, ValueFromPipeline)] [System.Xml.XmlNode] $node)
process {
if ($node.DocumentElement) { $node = $node.DocumentElement }
$oht = [ordered] @{}
$name = $node.Name
if ($node.FirstChild -is [system.xml.xmltext]) {
$oht.$name = $node.FirstChild.InnerText
} else {
$oht.$name = New-Object System.Collections.ArrayList
foreach ($child in $node.ChildNodes) {
$null = $oht.$name.Add((ConvertFrom-Xml $child))
}
}
$oht
}
}
function Format-Json
{
<#
.SYNOPSIS
Prettifies JSON output.
.DESCRIPTION
Reformats a JSON string so the output looks better than what ConvertTo-Json outputs.
.PARAMETER Json
Required: [string] The JSON text to prettify.
.PARAMETER Minify
Optional: Returns the json string compressed.
.PARAMETER Indentation
Optional: The number of spaces (1..1024) to use for indentation. Defaults to 4.
.PARAMETER AsArray
Optional: If set, the output will be in the form of a string array, otherwise a single string is output.
.EXAMPLE
$json | ConvertTo-Json | Format-Json -Indentation 2
#>
[CmdletBinding(DefaultParameterSetName = 'Prettify')]
Param(
[Parameter(Mandatory = $true, Position = 0, ValueFromPipeline = $true)]
[string]$Json,
[Parameter(ParameterSetName = 'Minify')]
[switch]$Minify,
[Parameter(ParameterSetName = 'Prettify')]
[ValidateRange(1, 1024)]
[int]$Indentation = 4,
[Parameter(ParameterSetName = 'Prettify')]
[switch]$AsArray
)
if ($PSCmdlet.ParameterSetName -eq 'Minify')
{
return ($Json | ConvertFrom-Json) | ConvertTo-Json -Depth 100 -Compress
}
# If the input JSON text has been created with ConvertTo-Json -Compress
# then we first need to reconvert it without compression
if ($Json -notmatch '\r?\n')
{
$Json = ($Json | ConvertFrom-Json) | ConvertTo-Json -Depth 100
}
$indent = 0
$regexUnlessQuoted = '(?=([^"]*"[^"]*")*[^"]*$)'
$result = $Json -split '\r?\n' |
ForEach-Object {
# If the line contains a ] or } character,
# we need to decrement the indentation level unless it is inside quotes.
if ($_ -match "[}\]]$regexUnlessQuoted")
{
$indent = [Math]::Max($indent - $Indentation, 0)
}
# Replace all colon-space combinations by ": " unless it is inside quotes.
$line = (' ' * $indent) + ($_.TrimStart() -replace ":\s+$regexUnlessQuoted", ': ')
# If the line contains a [ or { character,
# we need to increment the indentation level unless it is inside quotes.
if ($_ -match "[\{\[]$regexUnlessQuoted")
{
$indent += $Indentation
}
$line
}
if ($AsArray) { return $result }
return $result -Join [Environment]::NewLine
}
# Storage account details
$resourceGroup = "insert resource group here"
$storageAccountName = "insert storage account name here"
$container = "insert container here"
$storageAccountKey = (Get-AzStorageAccountKey -ResourceGroupName $resourceGroup -Name $storageAccountName).Value[0]
$storageAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccountName
# Creating Storage context for Source, destination and log storage accounts
#$context = New-AzStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey
$context = New-AzStorageContext -ConnectionString "insert connection string here"
$blob_list = Get-AzStorageBlob -Container $container -Context $context
foreach($blob_iterator in $blob_list){
[XML](Get-AzStorageBlobContent $blob_iterator.name -Container $container -Context $context) | ConvertFrom-Xml | ConvertTo-Json -Depth 11 | Format-Json | Set-Content ($blob_iterator.name + '.json')
}
Output =
Cannot convert value "Microsoft.WindowsAzure.Commands.Common.Storage.ResourceModel.AzureStorageBlob" to type "System.Xml.XmlDocument". Error: "The specified node cannot
be inserted as the valid child of this node, because the specified node is the wrong type."
At C:\Users\.....\Convert XML to JSON.ps1:116 char:6
+ [XML](Get-AzStorageBlobContent $blob_iterator.name -Container $c ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastToXmlDocument
When I run the code the script asks me if I want to download the xml file to a local folder on my laptop.
This is not what I want, I want the conversion to be done in Azure on the Storage container.
And I think that I'm adding ".json" to the .xml file name.
So the output would become something like filename.xml.json instead of just filename.json
What's going wrong here?
And how can it be fixed?
Thank you in advance for your help.
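On the file-name part of the question: appending '.json' to the blob name does yield names like filename.xml.json. A minimal sketch of swapping the extension instead, using the .NET System.IO.Path class; the blob name is a made-up example, and this does not address the download prompt or the cast error:

```powershell
# Hypothetical blob name; in the script this would be $blob_iterator.Name.
$blobName = 'orders/filename.xml'

# Appending keeps the old extension.
$appended = $blobName + '.json'

# ChangeExtension replaces whatever follows the last dot.
$jsonName = [System.IO.Path]::ChangeExtension($blobName, '.json')

$appended   # orders/filename.xml.json
$jsonName   # orders/filename.json
```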

Get location (lat/long) from googlemap with address - Using powershell

I would like to create a script that gets the coordinates of some locations from Google Maps using their addresses.
To do this I use the following PowerShell code:
clear-host
$address = "Place+de+la+concorde"
$city = "Paris"
$cp = "75000"
$country = "France"
$url = "https://maps.googleapis.com/maps/api/geocode/json?address=" + $address + "+" + $city + "+" + $cp + "+" + $country
$result = Invoke-WebRequest -Uri $url
Write-Host $result
Unfortunately I cannot retrieve the fields I'm interested in, i.e.:
results> geometry> location> lat
and
results> geometry> location> lng
Any idea how to get a specific piece of information, considering that the number of lines could change?
Thank you!
ConvertFrom-JSON is the cmdlet you need.
$json = Invoke-WebRequest $url | ConvertFrom-JSON
$json.results.geometry.location.lat
$json.results.geometry.location.lng
Now you are looking at a JSON object (as intended) rather than lines so you won't have to worry about line position, etc.
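To see the property navigation without calling the API, here is a minimal sketch using a trimmed, hand-written sample shaped like a geocoding response. It contains only the fields named in the question, and the coordinate values are illustrative:

```powershell
# Hand-written sample; a real response carries many more fields.
$sample = @'
{
  "results": [
    {
      "geometry": {
        "location": { "lat": 48.8656, "lng": 2.3212 }
      }
    }
  ],
  "status": "OK"
}
'@

$json = $sample | ConvertFrom-Json

# Property access replaces any line counting.
$lat = $json.results[0].geometry.location.lat
$lng = $json.results[0].geometry.location.lng
"$lat, $lng"
```

Indexing with [0] picks the first match explicitly, which matters if the service returns more than one result for an address.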

PowerShell match names with user email addresses and format as mailto

I have the script below, which scans a drive for folders, pulls in a CSV with folder names and folder owners, matches them, and outputs the result to HTML.
I am looking for a way to use PowerShell to look up the user names from the CSV, grab their email addresses from AD, and put them into the HTML output as mailto links.
function name($filename, $folderowners, $directory, $output){
$server = hostname
$date = Get-Date -format "dd-MMM-yyyy HH:mm"
$a = "<style>"
$a = $a + "TABLE{border-width: 1px;border-style: solid;border-color:black;}"
$a = $a + "Table{background-color:#ffffff;border-collapse: collapse;}"
$a = $a + "TH{border-width:1px;padding:0px;border-style:solid;border-color:black;}"
$a = $a + "TR{border-width:1px;padding-left:5px;border-style:solid;border-color:black;}"
$a = $a + "TD{border-width:1px;padding-left:5px;border-style:solid;border-color:black;}"
$a = $a + "body{ font-family:Calibri; font-size:11pt;}"
$a = $a + "</style>"
$c = " <br></br> Content"
$b = Import-Csv $folderowners
$mappings = @{}
$b | % { $mappings.Add($_.FolderName, $_.Owner) }
Get-ChildItem $directory | where {$_.PSIsContainer -eq $True} | select Name,
@{n="Owner";e={$mappings[$_.Name]}} | sort -property Name |
ConvertTo-Html -head $a -PostContent $c |
Out-File $output
}
name "gdrive" "\\server\location\gdrive.csv" "\\server\location$" "\\server\location\gdrive.html"
Try adding something like this to the select:
@{n="email";e={"mailto:"+((Get-ADUser $mappings[$_.Name] -Properties mail).mail)}}
You need to load the ActiveDirectory module before you can use the Get-ADUser cmdlet:
Import-Module ActiveDirectory
On server versions this module can be installed via Server Manager or dism. On client versions you have to install the Remote Server Administration Tools before you can add the module under "Programs and Features".
Edit: I would have expected ConvertTo-Html to automatically create clickable links from mailto:user@example.com URIs, but apparently it doesn't. Since ConvertTo-Html automatically encodes angle brackets as HTML entities and I haven't found a way to prevent that, you also can't just pre-create the property as an HTML snippet. Something like this should work, though:
ConvertTo-Html -head $a -PostContent $c | % {
$_ -replace '(mailto:)([^<]*)', '<a href="$1$2">$2</a>'
} | Out-File $output
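A -replace of this kind can be checked in isolation on one table row before wiring it into the pipeline. A sketch with a made-up row (the folder name and address are placeholders), using a replacement string that wraps the address in an anchor tag:

```powershell
# Made-up row as ConvertTo-Html might emit it for one folder.
$row = '<tr><td>Finance</td><td>mailto:jdoe@example.com</td></tr>'

# $1 is "mailto:", $2 is everything up to the next "<".
$linked = $row -replace '(mailto:)([^<]*)', '<a href="$1$2">$2</a>'
$linked
```

The replacement string is single-quoted so that PowerShell passes $1 and $2 to the regex engine instead of expanding them as variables.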
Here's how I would do it (avoiding the use of the AD Module, only because it's not on all of my workstations and this works just the same), and assuming you know the user name already:
#Setup Connection to Active Directory
$de = [ADSI]"LDAP://example.org:389/OU=Users,dc=example,dc=org"
$sr = New-Object System.DirectoryServices.DirectorySearcher($de)
After I set up a connection to AD, I set my LDAP search filter. This takes standard LDAP query syntax.
#Set Properties of Search
$sr.SearchScope = [System.DirectoryServices.SearchScope]"Subtree"
$sr.Filter = "(&(ObjectClass=user)(samaccountname=$Username))"
I then execute the search.
#Grab user's information from OU. If search returns nothing, they are not a user and the script exits.
$SearchResults = $sr.FindAll()
if($SearchResults.Count -gt 0){
$emailAddr = $SearchResults[0].Properties["mail"]
$mailto = "<a href='mailto:$emailAddr'>Contact User</a>"
}
You can of course send the $mailto variable anywhere you want, and change its HTML, but hopefully this gets you started.

Passing an array of URLs as an argument to Powershell

I am trying to write a script that will take a text file containing URL links to documents and download them. I am having a hard time understanding how to pass the arguments and manipulate them in powershell. Here is what I got so far. I think I should be using the param method of taking an argument so I can require it for the script, but $args seemed easier on face value... A little help would be much appreciated.
**UPDATE
$script = ($MyInvocation.MyCommand.Name)
$scriptName = ($MyInvocation.MyCommand.Name -replace "(.ps1)" , "")
$scriptPath = ($MyInvocation.MyCommand.Definition)
$scriptDirectory = ($scriptPath.Replace("$script" , ""))
## ##################################
## begin code for directory creation.
## ##################################
## creates a directory based on the name of the script.
do {
$scriptFolderTestPath = Test-Path $scriptDirectory\$scriptName -PathType container
$scriptDocumentFolderTestPath = Test-Path $scriptFolder\$scriptName"_Script_Documents" -PathType container
$scriptLogFolderTestPath = Test-Path $scriptFolder\$scriptName"_Script_Logs" -PathType container
if ($scriptFolderTestPath -match "False") {
$scriptFolder = New-Item $scriptDirectory\$scriptName -ItemType directory
}
elseif ($scriptDocumentFolderTestPath -match "False") {
New-Item $scriptFolder\$scriptName"_Script_Documents" -ItemType directory
}
elseif ($scriptLogFolderTestPath -match "False") {
New-Item $scriptFolder\$scriptName"_Script_Logs" -ItemType directory
}
} Until (($scriptFolderTestPath -match "True") -and ($scriptDocumentFolderTestPath -match "True") -and ($scriptLogFolderTestPath -match "True"))
## variables for downloading and renaming code.
$date = (Get-Date -Format yyyy-MM-dd)
## ################################
## begin code for link downloading.
## ################################
## gets contents of the argument variable.
Get-Content $linkList
## downloads the linked file.
Invoke-WebRequest $linkList
Resulting Errors
PS C:\Windows\system32> C:\Users\Steve\Desktop\Website_Download.ps1
cmdlet Website_Download.ps1 at command pipeline position 1
Supply values for the following parameters:
linkList: C:\Users\Steve\Desktop\linkList.txt
Directory: C:\Users\Steve\Desktop\Website_Download
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 10/27/2012 3:59 PM Website_Download_Script_Documents
d---- 10/27/2012 3:59 PM Website_Download_Script_Logs
Get-Content : Cannot find path 'C:\Users\Steve\Desktop\linkList.txt' because it does not exist.
At C:\Users\Steve\Desktop\Website_Download.ps1:42 char:1
+ Get-Content $linkList
+ ~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (C:\Users\Steve\Desktop\linkList.txt:String) [Get-Content], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
Invoke-WebRequest : Could not find file 'C:\Users\Steve\Desktop\linkList.txt'.
At C:\Users\Steve\Desktop\Website_Download.ps1:45 char:1
+ Invoke-WebRequest $linkList
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (System.Net.FileWebRequest:FileWebRequest) [Invoke-WebRequest], WebException
+ FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
There is no difference in passing an array of arguments in Powershell, compared to any other type of arguments. See here for how it's done. Considering you have a text file, you don't need to pass an array of arguments, and only need to pass a file name, so just a string.
I don't have any experience with Powershell 3.0, which is what you are using (judging by the presence of Invoke-WebRequest in your code), but I would start with something like this:
$URLFile = @"
http://www.google.ca
http://www.google.com
http://www.google.co.uk/
"@
$URLs = $URLFile -split "`n";
$savedPages = @();
foreach ($url in $URLs) {
$savedPages += Invoke-WebRequest $url
}
That is, you have a single file, all in one place, and make sure you receive your content correctly. Not sure why you would need Start-BitsTransfer, since Invoke-WebRequest will already get you page contents. Note that I did not do anything with $savedPages, so my code is effectively useless.
After that, the contents of $URLFile go into a file, and you replace the here-string with
gc "Path_To_Your_File"
If it still works, introduce a $Path parameter to your script like this:
param([string]$Path)
test again, and so on. If you are new to Powershell, always start with smaller code pieces and keep growing to include all functionality you need. If you start with a big piece, chances are you will never finish.
Figured this out with the link from Neolisk about handling params. Then changed some code at the end to create another variable and handle things as I normally would. Just some confusion with passing params.
## parameter passed to the script.
param (
[parameter(Position=0 , Mandatory=$true)]
[string]$linkList
)
## variables for dynamic naming.
$script = ($MyInvocation.MyCommand.Name)
$scriptName = ($MyInvocation.MyCommand.Name -replace "(.ps1)" , "")
$scriptPath = ($MyInvocation.MyCommand.Definition)
$scriptDirectory = ($scriptPath.Replace("$script" , ""))
## ##################################
## begin code for directory creation.
## ##################################
## creates a directory based on the name of the script.
do {
$scriptFolderTestPath = Test-Path $scriptDirectory\$scriptName -PathType container
$scriptDocumentFolderTestPath = Test-Path $scriptFolder\$scriptName"_Script_Documents" -PathType container
$scriptLogFolderTestPath = Test-Path $scriptFolder\$scriptName"_Script_Logs" -PathType container
if ($scriptFolderTestPath -match "False") {
$scriptFolder = New-Item $scriptDirectory\$scriptName -ItemType directory
}
elseif ($scriptDocumentFolderTestPath -match "False") {
New-Item $scriptFolder\$scriptName"_Script_Documents" -ItemType directory
}
elseif ($scriptLogFolderTestPath -match "False") {
New-Item $scriptFolder\$scriptName"_Script_Logs" -ItemType directory
}
} Until (($scriptFolderTestPath -match "True") -and ($scriptDocumentFolderTestPath -match "True") -and ($scriptLogFolderTestPath -match "True"))
## variables for downloading and renaming code.
$date = (Get-Date -Format yyyy-MM-dd)
## ################################
## begin code for link downloading.
## ################################
## gets contents of the argument variable.
$webTargets = Get-Content $linkList
## downloads the linked file.
Invoke-WebRequest $webTargets
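Note that the -Uri parameter of Invoke-WebRequest takes a single URI, so a list read with Get-Content generally has to be looped over. A minimal sketch of that loop; Get-ValidUrl is a hypothetical helper (not a built-in cmdlet), the sample list stands in for the file contents, and the download call is left commented out so the snippet runs without network access:

```powershell
# Hypothetical helper: keeps only lines that parse as absolute http(s) URLs.
function Get-ValidUrl {
    param([string[]]$Line)
    foreach ($l in $Line) {
        $candidate = "$l".Trim()
        $uri = $null
        if ([uri]::TryCreate($candidate, [System.UriKind]::Absolute, [ref]$uri) -and
            $uri.Scheme -in 'http', 'https') {
            $candidate
        }
    }
}

# Stand-in for: $webTargets = Get-Content $linkList
$webTargets = 'http://www.google.ca', '', 'not a url', 'https://www.google.com'

$valid = @(Get-ValidUrl -Line $webTargets)
foreach ($target in $valid) {
    "Would download $target"
    # Invoke-WebRequest -Uri $target   # add -OutFile <path> to save to disk
}
```

Filtering first means a blank or malformed line in the text file is skipped rather than stopping the loop with an error partway through.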