Get specific data from html using Powershell - html

I would like to automate a task at work using PowerShell. The code below logs in to the website, and it works fine.
$username = "usern"
$password = "pass"
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate("http://www.exemple.com")
while($ie.ReadyState -ne 4) {start-sleep -m 100}
$ie.document.IHTMLDocument3_getElementByID("textfield").value = $username
$ie.document.IHTMLDocument3_getElementByID("textfield2").value = $password
$ie.document.IHTMLDocument3_getElementByID("btnLogin").Click();
Now, in order to download the report, I need to extract a number from the HTML body and store it in a variable, because this number changes every time I access the page. The number is located inside the HTML body of the web page and is always 12 digits long.
This is my problem: I cannot get this number into a variable. If I could, I would finish the PowerShell code with the script below.
$output = "C:\Users\AlexSnake\Desktop\WeeklyReport\ReportName.pdf"
Invoke-WebRequest -Uri http://www.exemple.com.br/pdf_pub/xxxxxxxxxxxx.pdf -OutFile $output
Where you see 'xxx..', I would substitute the variable and download the report.

After this bit of your code
while($ie.ReadyState -ne 4) {start-sleep -m 100}
Try this:
$($ie.Document.getElementsByTagName("a")).href | ForEach {
# The next line isn't necessary, but just to demonstrate iterating through all the anchor tags in the page (feel free to comment it out)
Write-Host "This is the href tag that I'm enumerating through: $_"
# And this bit checks for that number you're looking for and returns it:
if( $_ -match "javascript:openwindow\('/\.\./\.\./(\d+)\.pdf'.*\)" )
{
$matches[1]
}
}
This should work.

The code below is what answered my question.
$($ie.Document.getElementsByTagName("a")).href | ForEach {
if( $_ -match '(\d+)\.pdf' )
{
$matches[1]
}
}
Thanks!
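To tie the extracted number back to the download step, here is a minimal sketch. The sample href is made up for illustration, and the base URL is the placeholder from the question, so both are assumptions rather than the real page's values.

```powershell
# Hypothetical href value, standing in for one of the anchors on the real page.
$href = "javascript:openwindow('/../../123456789012.pdf')"

# (\d{12}) captures twelve digits that sit directly before ".pdf".
if ($href -match '(\d{12})\.pdf') {
    $reportId = $Matches[1]
    # Placeholder base URL from the question; replace it with the real one.
    $reportUrl = "http://www.exemple.com.br/pdf_pub/${reportId}.pdf"
}

$reportUrl
```

With the variable in place, the download line from the question would become something like Invoke-WebRequest -Uri $reportUrl -OutFile $output.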

Related

Check certain text from a webpage via powershell

I am trying to get the HTML code from an intranet web page and check whether certain texts or titles exist. This PowerShell code will be used by my monitoring program to trigger alerts when the page is down, that is, when those texts or titles are missing.
For now, I'm just using Write-Host to see if my piece of code works. I can now extract the HTML source to $output, and I am sure 'Create!' can be found inside. However, I'm not getting a 'YES'.
May I know if $output can be checked by using -contains?
Thank you very much for your help!
$targetUrl = 'https://myUrl/'
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate($targetUrl)
while($ie.Busy) {
Start-Sleep -m 2000
}
$output = $ie.Document.body.innerHTML
if($output -contains '*Create!*')
{Write-Host 'YES'}
else
{Write-Host 'NO'}
The operator -contains is used to search collections. The IE's innerHTML is just a string:
$output = $ie.Document.body.innerHTML
$output.GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     String                                   System.Object
Use pattern matching operators like, well, -like and -match.
By the way, if IE is not mandatory, try Invoke-WebRequest cmdlet.
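As a rough illustration of the difference, the sketch below uses a made-up HTML snippet in place of the real innerHTML (an assumption, since the actual page is not shown):

```powershell
# Stand-in for $ie.Document.body.innerHTML.
$output = '<button id="go">Create!</button>'

# -contains tests collection membership; on a single string it compares the whole value.
$viaContains = $output -contains '*Create!*'

# -like does wildcard matching against the string.
$viaLike = $output -like '*Create!*'

# -match does regex matching; Escape() guards against special characters in the search text.
$viaMatch = $output -match [regex]::Escape('Create!')

"contains: $viaContains, like: $viaLike, match: $viaMatch"
```

Only the -contains test comes back False here, which matches the 'NO' seen in the question.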

Replace blank characters from a file line by line

I would like to find all blanks in a CSV file. If a blank character is found on a line, that line should be shown on the screen and I should be asked whether to keep the entire line or remove it.
Let's say the directory is C:\Cr\Powershell\test. In there there is one CSV file abc.csv.
Tried doing it like this but in PowerShell ISE the $_.PSObject.Properties isn't recognized.
$csv = Import-Csv C:\Cr\Powershell\test\*.csv | Foreach-Object {
$_.PSObject.Properties | Foreach-Object {$_.Value = $_.Value.Trim()}
}
I apologize for not including more code or more of what I have tried so far, but those were silly attempts since I have only just begun.
This looks helpful but I don't know exactly how to adapt it for my problem.
Ok man here you go:
$yes = New-Object System.Management.Automation.Host.ChoiceDescription "&Yes", "Retain line."
$no = New-Object System.Management.Automation.Host.ChoiceDescription "&No", "Delete line."
$n = @()
$f = Get-Content .\test.csv
foreach($item in $f) {
if($item -like "* *"){
$res = $host.ui.PromptForChoice("Title", "want to keep this line? `n $item", [System.Management.Automation.Host.ChoiceDescription[]]($yes, $no), 0)
switch ($res)
{
0 {$n+=$item}
1 {}
}
} else {
$n+=$item
}
}
$n | Set-Content .\test.csv
if you have questions please post in the comments and i will explain
Get-Content is probably a better approach than Import-Csv, because that'll allow you to check an entire line for spaces instead of having to check each individual field. For fully automated processing you'd just use a Where-Object filter to remove non-matching lines from the output:
Get-Content 'C:\Cr\Powershell\test\input.csv' |
Where-Object { $_ -notlike '* *' } |
Set-Content 'C:\Cr\Powershell\test\output.csv'
However, since you want to prompt for each individual line that contains spaces you need a ForEach-Object (or a similar construct) and a nested conditional, like this:
Get-Content 'C:\Cr\Powershell\test\input.csv' | ForEach-Object {
if ($_ -notlike '* *') { $_ }
} | Set-Content 'C:\Cr\Powershell\test\output.csv'
The simplest way to prompt a user for input is Read-Host:
$answer = Read-Host -Prompt 'Message'
if ($answer -eq 'y') {
# do one thing
} else {
# do another
}
In your particular case you'd probably do something like this for any matching line:
$answer = Read-Host "$_`nKeep the line? [y/n] "
if ($answer -ne 'n') { $_ }
The above checks if the answer is not n to make removal of the line a conscious decision.
Other ways to prompt for user input are choice.exe (which has the additional advantage of allowing a timeout and a default answer):
choice.exe /c YN /d N /t 10 /m "$_`nKeep the line"
if ($LastExitCode -ne 2) { $_ }
or the host UI:
$title = $_
$message = 'Keep the line?'
$yes = New-Object Management.Automation.Host.ChoiceDescription '&Yes'
$no = New-Object Management.Automation.Host.ChoiceDescription '&No'
$options = [Management.Automation.Host.ChoiceDescription[]]($yes, $no)
$answer = $Host.UI.PromptForChoice($title, $message, $options, 1)
if ($answer -ne 1) { $_ }
I'm leaving it as an exercise for you to integrate whichever prompting routine you chose with the rest of the code.
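For reference, one possible way to combine the filter with the Read-Host prompt is sketched below. The function name Select-LineToKeep and its -Decide parameter are hypothetical, introduced only so the prompt can be swapped out (for example, when trying the logic without typing answers).

```powershell
function Select-LineToKeep {
    param(
        [Parameter(ValueFromPipeline)]
        [string] $Line,

        # Decides whether a line containing a space is kept.
        # Default: ask the user, and drop the line only on an explicit 'n'.
        [scriptblock] $Decide = {
            param($l)
            (Read-Host "$l`nKeep the line? [y/n]") -ne 'n'
        }
    )
    process {
        if ($Line -notlike '* *') {
            $Line                      # no space: pass through silently
        }
        elseif (& $Decide $Line) {
            $Line                      # space found: keep only if confirmed
        }
    }
}

# Non-interactive trial run: drop every line that contains a space.
$kept = 'a,b', 'c d,e', 'f,g' | Select-LineToKeep -Decide { param($l) $false }
$kept
```

In the interactive case the pipeline would read Get-Content input.csv | Select-LineToKeep | Set-Content output.csv, writing to a different file than the one being read.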

Extraneous data returned from Invoke-Command

I'm working with PowerShell to gather data from a list of remote servers which I then turn into a JSON object. Everything is working fine, but I get some really weird output that I can't seem to exclude.
I've tried piping the Invoke-Command results and excluding properties. I've also tried removing the items manually from the returned hashtable, but I can't seem to make them go away.
What am I missing?
EDIT:
For the sake of figuring out what's wrong here is a simplified, but still broken, script:
$returnedServer = @{}
$pass = cat "C:\...\securestring.txt" | convertto-securestring
$mycred = new-object -typename System.Management.Automation.PSCredential -argumentlist "UserName",$pass
$s = @("xx.xxx.xxx.xxx","xx.xxx.xxx.xxx")
foreach($server in $s)
{
$returnedServer.$server += ,(Invoke-Command -ComputerName $server -ScriptBlock {
1
} -credential $mycred | select -ExcludeProperty PSComputerName,RunSpaceID,PSShowComputerName)
}
$returnedServer | ConvertTo-Json
Which outputs:
{
"xx.xxx.xxx.xxx": [
{
"value": 1,
"PSComputerName": "xx.xxx.xxx.xxx",
"RunspaceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"PSShowComputerName": xxxx
}
],
"xx.xxx.xxx.xxx": [
{
"value": 1,
"PSComputerName": "xx.xxx.xxx.xxx",
"RunspaceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"PSShowComputerName": xxxx
}
]
}
This post is really old, but I was unable to find an acceptable answer 6 years later, so I wrote my own.
$invokeCommandResults | ForEach-Object {
$_.PSObject.Properties.Remove('PSComputerName')
$_.PSObject.Properties.Remove('RunspaceId')
$_.PSObject.Properties.Remove('PSShowComputerName')
}
You need to use Select-Object to limit the result to just the properties you want to show up in the JSON output:
$returnedServers.$server += ,(Invoke-Command -ComputerName $server -ScriptBlock {
...$serverHash = various look ups and calculations...
$serverHash
} | select PropertyA, PropertyB, ...)
For a more thorough answer you need to go into far more detail about your "various look ups and calculations" as well as the actual conversion to JSON.
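One detail that may explain why -ExcludeProperty appeared to do nothing: in Windows PowerShell 5.1 it takes effect only when -Property is also given (PowerShell 6 and later drop that requirement). The sketch below uses a locally built object as a stand-in for a remote result, so the property values are made up, and it assumes the script block returns an object with named properties rather than a bare value such as 1.

```powershell
# Local stand-in for an object returned by Invoke-Command (values are illustrative).
$result = [pscustomobject]@{
    Value              = 1
    PSComputerName     = 'server01'
    RunspaceId         = [guid]::NewGuid()
    PSShowComputerName = $true
}

# -Property * selects everything, then -ExcludeProperty removes the remoting metadata.
$clean = $result |
    Select-Object -Property * -ExcludeProperty PSComputerName, RunspaceId, PSShowComputerName

$clean | ConvertTo-Json
```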
After some testing, it seems the problem is the object type. I was able to get your test script to work by explicitly casting the returned result.
$returnedServer = @{}
$pass = cat "C:\...\securestring.txt" | convertto-securestring
$mycred = new-object -typename System.Management.Automation.PSCredential -argumentlist "UserName",$pass
$s = @("xx.xxx.xxx.xxx","xx.xxx.xxx.xxx")
foreach($server in $s)
{
$returnedServer.$server += ,[int](Invoke-Command -ComputerName $server -ScriptBlock {1} -credential $mycred)
}
$returnedServer| ConvertTo-Json
You could try this... instead of attempting to exclude extraneous property values, just be specific and "call" or "grab" the one(s) you want.
Quick Code Shortcut Tip! BTW, the Invoke-Command -ComputerName $server -ScriptBlock {command} can be greatly simplified using: icm $server {command}
Now, getting back on track...
Using your original post/example, it appears that you are attempting to keep one "value" by excluding all other values, i.e. with -ExcludeProperty (which is ultra-frustrating).
Let's start by removing and replacing the only exclusion section:
select -ExcludeProperty PSComputerName,RunSpaceID,PSShowComputerName
And instead, attempt to use one of the following:
1st Method: using the modified original command...
$returnedServer.$server += ,(Invoke-Command -ComputerName $server -ScriptBlock {1} -credential $mycred).value
2nd Method: using the "icm" version...
$returnedServer.$server += ,(icm $server {1} -credential $mycred).value
Essentially, you are "picking out" the value(s) you need (vs. excluding property values, which is, again, pretty frustrating when it does NOT work).
Related Example(s) follows:
Here is a typical system Powershell/WMIC command call:
icm ServerNameGoesHere {Get-CimInstance -ClassName win32_operatingsystem}
But what if I only want the "version" from the object glob:
(icm ServerNameGoesHere {Get-CimInstance -ClassName win32_operatingsystem}).version
But, hold on, now I only want the "lastbootuptime" from the object glob:
(icm ServerNameGoesHere {Get-CimInstance -ClassName win32_operatingsystem}).lastbootuptime
Indecisively, I want to be more flexible:
$a=icm ServerNameGoesHere {Get-CimInstance -ClassName win32_operatingsystem}
$a.version
$a.lastbootuptime
$a.csname
(Makes sense?)
Good luck,
~PhilC

In Powershell Use -replace or .replace To Change TD value to a Link

I am trying to generate HTML reports of data collected in Powershell. I want to eventually have the HTML link to more specific information about the different elements in the table. In particular, the first column contains names of Virtual Machines, but this isn't particularly important. I am attempting to replace the name value with a link.
for($i=1; $i -le $xml.table.tr.Count-1; $i++){
$xml.table.tr[$i].td[0].replace($xml.table.tr[$i].td[0],"myLink")
}
$file = Join-Path C:\users\myname\Documents "VMSpecs.html"
ConvertTo-Html -Title "VM Specs" `
-CssUri C:\Users\myname\Documents\style.css `
-Body $($xml.InnerXml) | Out-file $file
Invoke-item $file
Based on the output, the value of $xml.table.tr[$i].td[0] is being replaced with the link information, but not permanently. When the file is written to HTML the link information is nowhere to be found, and the original value is still held in the td element.
So, big picture, I would like to know how to append a link into an HTML fragment at a particular location. Any samples or resources would be much appreciated.
Both .Replace and -replace produce a new string. What you'll want to do is reassign the results:
for($i=1; $i -le $xml.table.tr.Count-1; $i++){
$xml.table.tr[$i].td[0] = $xml.table.tr[$i].td[0].replace($xml.table.tr[$i].td[0],"myLink")
}
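A quick way to see this outside the XML context, using a plain string with a made-up VM name:

```powershell
$name = 'VM01'

# The method returns a new string; discarding it leaves $name untouched.
$name.Replace('VM01', 'myLink') | Out-Null
$afterDiscard = $name

# Assigning the result back is what actually changes the variable.
$name = $name -replace 'VM01', 'myLink'
$afterAssign = $name

"after discard: $afterDiscard, after assign: $afterAssign"
```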
Although, in this particular case, there doesn't seem to be any need for the replace at all. Since you're replacing the entire content of the td, just directly assign it:
for($i=1; $i -le $xml.table.tr.Count-1; $i++){
$xml.table.tr[$i].td[0] = "myLink"
}
I initially misread your question as being about replace, rather than assigning to a node. It looks like when you get a particular element as above, PowerShell is converting it to a string, so you're not actually updating the node. Try this instead:
for($i=1; $i -le $xml.table.tr.Count-1; $i++){
($xml.table.tr[$i].GetElementsByTagName("td") | Select-Object -First 1)."#text" = "myLink"
}
The "#text" property may vary, I'm not sure. I did a test with a very simple HTML file, and by doing a Get-Member on each level of the resulting XML object, I found that that was the property I needed to update.
Hmmm ... okay, try this. Create a new XML element containing the link, and append it to the td element:
for($i=1; $i -le $xml.table.tr.Count-1; $i++){
$link = $xml.CreateElement("a")
$link.SetAttribute("href", "http://stackoverflow.com")
$link.InnerText = "myLink"
$xml.table.tr[$i].FirstChild."#text" = ""
$xml.table.tr[$i].FirstChild.AppendChild($link)
}
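For anyone who wants to try the CreateElement approach in isolation, here is a self-contained sketch. The table markup and the details/NAME.html link target are invented for the example; the real report would come from the asker's own data.

```powershell
# Small stand-in table: one header row, two data rows.
[xml]$xml = '<table><tr><th>Name</th></tr><tr><td>VM01</td></tr><tr><td>VM02</td></tr></table>'

# Collect the data rows first so the document is not modified while it is being enumerated.
$rows = @($xml.SelectNodes('//tr[td]'))

foreach ($row in $rows) {
    $cell = $row.SelectSingleNode('td[1]')
    $name = $cell.InnerText

    $link = $xml.CreateElement('a')
    $link.SetAttribute('href', "details/${name}.html")   # hypothetical link target
    $link.InnerText = $name

    $cell.RemoveAll()                 # drop the old text node
    [void]$cell.AppendChild($link)    # [void] suppresses the returned node
}

$html = $xml.OuterXml
$html
```

Using SelectSingleNode and AppendChild on the node itself avoids the string conversion that PowerShell's property-style access performs on simple elements.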

Powershell: Download or Save source code for whole ie page

I have this PS script that logs in to a site and then navigates to another page.
I want to save the whole source of that page, but for some reason some parts of the source code are not coming across.
$username = "myuser"
$password = "mypass"
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate("http://www.example.com/login.shtml")
while($ie.ReadyState -ne 4) {start-sleep -m 100}
$ie.document.getElementById("username").value = "$username"
$ie.document.getElementById("pass").value = "$password"
$ie.document.getElementById("frmLogin").submit()
start-sleep 5
$ie.navigate("http://www.example.com/thislink.shtml")
$ie.Document.body.outerHTML | Out-File -FilePath c:\sourcecode.txt
Here is pastebin of code which is not coming across
http://pastebin.com/Kcnht6Ry
After you navigate, check for the Ready State again instead of using a sleep. The same code that you had will work.
It appears after running the code, the sleep may not be long enough if the site is slow to load.
while($ie.ReadyState -ne 4) {start-sleep -m 100}
There is also another post about this, "innerHTML converts CDATA to comments", where someone wrote a function that cleans it up. Once you have that function declared in your code, the call would look something like this:
htmlWithCDATASectionsToHtmlWithout($ie.Document.body.outerHTML) | Out-File -FilePath c:\sourcecode.txt
I agree with @tkrn about using the while loop to wait for the IE document to be ready. For that, I recommend sleeping at least 2 seconds inside the loop.
while($ie.ReadyState -ne 4) {start-sleep -s 2}
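A polling loop like this never exits if the page never reaches ReadyState 4. One way to bound the wait is sketched below; Wait-Until and its parameters are hypothetical names for this example, not built-in cmdlets.

```powershell
function Wait-Until {
    param(
        [Parameter(Mandatory)]
        [scriptblock] $Condition,          # re-evaluated until it returns $true
        [int] $TimeoutSeconds = 30,
        [int] $PollMilliseconds = 100
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        if ((Get-Date) -gt $deadline) { return $false }   # gave up
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    return $true
}

# Trial run with a counter standing in for the browser state.
$state = @{ Polls = 0 }
$ready = Wait-Until -PollMilliseconds 1 -Condition {
    $state.Polls++
    $state.Polls -ge 3
}
"ready: $ready after $($state.Polls) polls"
```

With the IE object from the question, the call would look something like Wait-Until { (-not $ie.Busy) -and ($ie.ReadyState -eq 4) }, and a $false result signals that the page did not finish loading in time.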
Still I found an easier way to get the whole HTML source page exactly from the URL. Here it is:
$ie.Document.parentWindow.execScript("var JSIEVariable = new XMLSerializer().serializeToString(document);", "javascript")
$obj = $ie.Document.parentWindow.GetType().InvokeMember("JSIEVariable", 4096, $null, $ie.Document.parentWindow, $null)
$HTMLDoc = $obj.ToString()
Now, $HTMLDoc has the whole HTML source page intact and you can save it as html file.