Check certain text from a webpage via powershell - html

I am trying to get the HTML code from an intranet webpage and monitor whether certain texts or titles exist. This PowerShell code will be used by my monitoring program to trigger alerts when the webpage is down, i.e. when those texts or titles can no longer be found.
For now, I'm just using Write-Host to see if my piece of code works. I can now extract the HTML source to $output, and I am sure 'Create!' can be found inside. However, I'm not getting a 'YES'.
Can $output be checked using -contains?
Thank you very much for your help!
$targetUrl = 'https://myUrl/'
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate($targetUrl)
while($ie.Busy) {
Start-Sleep -m 2000
}
$output = $ie.Document.body.innerHTML
if($output -contains '*Create!*')
{Write-Host 'YES'}
else
{Write-Host 'NO'}

The -contains operator is used to search a collection for an exact element. IE's innerHTML is just a string:
$output = $ie.Document.body.innerHTML
$output.GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     String                                   System.Object
Use pattern matching operators like, well, -like and -match.
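A minimal sketch of the difference, using a made-up HTML string in place of the real innerHTML (the sample markup is an assumption, not the page from the question):

```powershell
# Hypothetical stand-in for $ie.Document.body.innerHTML
$output = '<div><h1>Dashboard</h1><button>Create!</button></div>'

# -contains compares whole elements of a collection; a single string is
# treated as a one-element collection, so this is $false
$containsResult = $output -contains '*Create!*'

# -like does wildcard matching against the whole string
$likeResult = $output -like '*Create!*'

# -match does a regular-expression search; [regex]::Escape guards
# against any regex metacharacters in the search text
$matchResult = $output -match [regex]::Escape('Create!')

# String.Contains does a plain, case-sensitive substring test
$dotNetResult = $output.Contains('Create!')

if ($likeResult) { Write-Host 'YES' } else { Write-Host 'NO' }
```

Any of the last three works for a simple check; -like and -match are case-insensitive by default, while .Contains() is case-sensitive.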
By the way, if IE is not mandatory, try the Invoke-WebRequest cmdlet.
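A sketch of the same monitoring check without Internet Explorer, assuming the page can be fetched anonymously; Test-PageText is a hypothetical helper name, and an unreachable page or HTTP error is treated the same as "text not found":

```powershell
# Hypothetical helper: returns $true if the page at $Url contains $Text,
# $false if the text is missing or the request fails.
function Test-PageText {
    param(
        [Parameter(Mandatory)] [string] $Url,
        [Parameter(Mandatory)] [string] $Text,
        [int] $TimeoutSec = 30
    )
    try {
        # -UseBasicParsing avoids the IE-based DOM in Windows PowerShell;
        # PowerShell 6+ accepts the switch and always parses this way
        $response = Invoke-WebRequest -Uri $Url -UseBasicParsing -TimeoutSec $TimeoutSec -ErrorAction Stop
        # Plain substring test on the raw HTML; no wildcard or regex rules
        return $response.Content.Contains($Text)
    }
    catch {
        # DNS failure, timeout, or HTTP error status: report as "down"
        return $false
    }
}
```

With the question's variables this would be used as `if (Test-PageText -Url $targetUrl -Text 'Create!') { Write-Host 'YES' } else { Write-Host 'NO' }`. If the intranet site requires Windows authentication, adding `-UseDefaultCredentials` to the Invoke-WebRequest call may be needed.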

Related

How to parse the HTML of a website with PowerShell

I am trying to retrieve some information about a website: I want to look for a specific tag/class and then return the contained text value (innerHTML). This is what I have so far:
$request = Invoke-WebRequest -Uri $url -UseBasicParsing
$HTML = New-Object -Com "HTMLFile"
$src = $request.RawContent
$HTML.write($src)
foreach ($obj in $HTML.all) {
$obj.getElementsByClassName('some-class-name')
}
I think there is a problem with converting the HTML into the HTML object, since I see a lot of undefined properties and empty results when I'm trying to "Select-Object" them.
So after spending two days, how am I supposed to parse HTML with PowerShell?
I can't use IHTMLDocument2 methods, since I don't have Office installed (see "Unable to use IHTMLDocument2").
I can't use Invoke-WebRequest without -UseBasicParsing, since PowerShell hangs and spawns additional windows while accessing the ParsedHtml property (see "parsedhtml doesnt respond anymore" and "Using Invoke-Webrequest in PowerShell 3.0 spawns a Windows Security Warning").
So since parsing HTML with regex is such a big no-no, how do I do it otherwise? Nothing seems to work.
Since no one else has posted an answer, here is the working solution I managed to get:
$request = Invoke-WebRequest -Uri $URL -UseBasicParsing
$HTML = New-Object -Com "HTMLFile"
[string]$htmlBody = $request.Content
$HTML.write([ref]$htmlBody)
$filter = $HTML.getElementsByClassName($htmlClassName)
With some URLs the $filter variable was empty, while for other URLs it was populated. All in all this might work for your situation, but PowerShell doesn't seem to be the way to go for more complex parsing.
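In 2020, with Windows PowerShell 5.x, you can do it like this (ParsedHtml relies on Internet Explorer and is not available in PowerShell 6 and later):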
$searchClass = "banana" <# in this example we parse all elements of class "banana" but you can use any class name you wish #>
$myURI = "url.com" <# replace url.com with any website you want to scrape from #>
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12 <# using TLS 1.2 is vitally important #>
$req = Invoke-Webrequest -URI $myURI
$req.ParsedHtml.getElementsByClassName($searchClass) | %{Write-Host $_.innerhtml}
#for extra credit we can parse all the links
$req.ParsedHtml.getElementsByTagName('a') | %{Write-Host $_.href} #outputs all the links

Get specific data from html using Powershell

I would like to automate a task from my work using MS PowerShell. Please see my code below, which logs in to the website. This code is working fine.
$username = "usern"
$password = "pass"
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate("http://www.exemple.com")
while($ie.ReadyState -ne 4) {start-sleep -m 100}
$ie.document.IHTMLDocument3_getElementByID("textfield").value = $username
$ie.document.IHTMLDocument3_getElementByID("textfield2").value = $password
$ie.document.IHTMLDocument3_getElementByID("btnLogin").Click();
Now, in order to download the report, I need to extract a number from the HTML body and store it in a variable. I need to do that because this number changes every time I access the page. Please see the following image, where the number is located inside the HTML body of the webpage. It's always 12 digits:
This is my problem: I cannot get this number into a variable. If I could, I would finish the PowerShell code with the script below.
$output = "C:\Users\AlexSnake\Desktop\WeeklyReport\ReportName.pdf"
Invoke-WebRequest -Uri http://www.exemple.com.br/pdf_pub/xxxxxxxxxxxx.pdf -OutFile $output
Where you see 'xxx..', I would substitute the variable and download the report.
After this bit of your code:
while($ie.ReadyState -ne 4) {start-sleep -m 100}
Try this:
$($ie.Document.getElementsByTagName("a")).href | ForEach {
# The next line isn't necessary, but just to demonstrate iterating through all the anchor tags in the page (feel free to comment it out)
Write-Host "This is the href tag that I'm enumerating through: $_"
# And this bit checks for that number you're looking for and returns it:
if( $_ -match "javascript:openwindow\('/\.\./\.\./(\d+)\.pdf'.*\)" )
{
$matches[1]
}
}
This should work.
Here is the code that answered my question:
$($ie.Document.getElementsByTagName("a")).href | ForEach {
if( $_ -match '(\d+)\.pdf' )
{
$matches[1]
}
}
Thanks!
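The working pattern above can be tightened to the 12 digits the question mentions, and the result dropped straight into the download URL. A sketch with made-up href values standing in for the page's anchors (their exact format is an assumption, since the original screenshot isn't available):

```powershell
# Hypothetical href values; in the real script they come from
# $ie.Document.getElementsByTagName("a")
$hrefs = @(
    'http://www.exemple.com/home',
    "javascript:openwindow('/../../123456789012.pdf')"
)

# Keep the first 12-digit number that sits right before ".pdf"
$reportId = $null
foreach ($href in $hrefs) {
    if ($href -match '(\d{12})\.pdf') {
        $reportId = $Matches[1]
        break
    }
}

if ($reportId) {
    # "$reportId.pdf" expands the variable, then appends the literal ".pdf"
    $pdfUrl = "http://www.exemple.com.br/pdf_pub/$reportId.pdf"
    $output = "C:\Users\AlexSnake\Desktop\WeeklyReport\ReportName.pdf"
    # Download step from the question, left commented out in this sketch:
    # Invoke-WebRequest -Uri $pdfUrl -OutFile $output
}
```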

Extraneous data returned from Invoke-Command

I'm working with PowerShell to gather data from a list of remote servers which I then turn into a JSON object. Everything is working fine, but I get some really weird output that I can't seem to exclude.
I've tried piping the Invoke-Command results and excluding properties. I've also tried removing the items manually from the returned hashtable, but I can't seem to make them go away.
What am I missing?
EDIT:
For the sake of figuring out what's wrong, here is a simplified but still broken script:
$returnedServer = @{}
$pass = cat "C:\...\securestring.txt" | convertto-securestring
$mycred = new-object -typename System.Management.Automation.PSCredential -argumentlist "UserName",$pass
$s = @("xx.xxx.xxx.xxx","xx.xxx.xxx.xxx")
foreach($server in $s)
{
$returnedServer.$server += ,(Invoke-Command -ComputerName $server -ScriptBlock {
1
} -credential $mycred | select -ExcludeProperty PSComputerName,RunSpaceID,PSShowComputerName)
}
$returnedServer| ConvertTo-Json
Which outputs:
{
"xx.xxx.xxx.xxx": [
{
"value": 1,
"PSComputerName": "xx.xxx.xxx.xxx",
"RunspaceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"PSShowComputerName": xxxx
}
],
"xx.xxx.xxx.xxx": [
{
"value": 1,
"PSComputerName": "xx.xxx.xxx.xxx",
"RunspaceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"PSShowComputerName": xxxx
}
]
}
This post is really old, but I was unable to find an acceptable answer 6 years later, so I wrote my own.
$invokeCommandResults | ForEach-Object {
$_.PSObject.Properties.Remove('PSComputerName')
$_.PSObject.Properties.Remove('RunspaceId')
$_.PSObject.Properties.Remove('PSShowComputerName')
}
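A self-contained sketch of that loop, with [pscustomobject] results simulating what Invoke-Command returns (the Value property and server names are made up for illustration):

```powershell
# Simulated results; on a real run these come back from Invoke-Command
$invokeCommandResults = @(
    [pscustomobject]@{ Value = 1; PSComputerName = 'server1'; RunspaceId = [guid]::NewGuid(); PSShowComputerName = $true }
    [pscustomobject]@{ Value = 2; PSComputerName = 'server2'; RunspaceId = [guid]::NewGuid(); PSShowComputerName = $true }
)

# Strip the remoting note properties from each object in place
$invokeCommandResults | ForEach-Object {
    $_.PSObject.Properties.Remove('PSComputerName')
    $_.PSObject.Properties.Remove('RunspaceId')
    $_.PSObject.Properties.Remove('PSShowComputerName')
}

# Only Value is left to serialize
$json = $invokeCommandResults | ConvertTo-Json
```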
You need to use Select-Object to limit the result to just the properties you want to show up in the JSON output:
$returnedServers.$server += ,(Invoke-Command -ComputerName $server -ScriptBlock {
...$serverHash = various look ups and calculations...
$serverHash
} | select PropertyA, PropertyB, ...)
For a more thorough answer you need to go into far more detail about your "various look ups and calculations" as well as the actual conversion to JSON.
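If you would rather exclude than list the wanted properties: in Windows PowerShell 5.1, Select-Object's -ExcludeProperty only takes effect when -Property is also given (PowerShell 6 and later dropped that requirement), which may be why the question's `select -ExcludeProperty ...` changed nothing. A sketch with a simulated result object (PropertyA/PropertyB are placeholders):

```powershell
# Simulated remote result; the real one comes from Invoke-Command
$result = [pscustomobject]@{
    PropertyA          = 'a'
    PropertyB          = 'b'
    PSComputerName     = 'server1'
    RunspaceId         = [guid]::NewGuid()
    PSShowComputerName = $true
}

# -Property * keeps everything, then -ExcludeProperty drops the remoting fields
$clean = $result | Select-Object -Property * -ExcludeProperty PSComputerName, RunspaceId, PSShowComputerName

$clean | ConvertTo-Json
```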
After some testing, it seems the problem is the object type. I was able to get your test script to work by explicitly casting the returned result.
$returnedServer = @{}
$pass = cat "C:\...\securestring.txt" | convertto-securestring
$mycred = new-object -typename System.Management.Automation.PSCredential -argumentlist "UserName",$pass
$s = @("xx.xxx.xxx.xxx","xx.xxx.xxx.xxx")
foreach($server in $s)
{
$returnedServer.$server += ,[int](Invoke-Command -ComputerName $server -ScriptBlock {1} -credential $mycred)
}
$returnedServer| ConvertTo-Json
You could try this... instead of attempting to exclude extraneous property values, just be specific and "call" or "grab" the one(s) you want.
Quick Code Shortcut Tip! BTW, Invoke-Command -ComputerName $server -ScriptBlock {command} can be greatly simplified to: icm $server {command}
Now, getting back on track...
Using your original post/example, it appears that you are attempting to keep one "value" by excluding all the other values, i.e. -ExcludeProperty (which is ultra-frustrating).
Let's start by removing and replacing the only exclusion section:
select -ExcludeProperty PSComputerName,RunSpaceID,PSShowComputerName
And instead, attempt to use one of the following:
1st Method: using the modified original command...
$returnedServer.$server += ,(Invoke-Command -ComputerName $server -ScriptBlock {1} -credential $mycred).value
2nd Method: using the "icm" version...
$returnedServer.$server += ,(icm $server {1} -credential $mycred).value
Essentially, you are "picking out" the value(s) you need (vs. excluding property values, which is, again, pretty frustrating when it does NOT work).
Related Example(s) follows:
Here is a typical system Powershell/WMIC command call:
icm ServerNameGoesHere {Get-CimInstance -ClassName win32_operatingsystem}
But what if I only want the "version" from the object glob:
(icm ServerNameGoesHere {Get-CimInstance -ClassName win32_operatingsystem}).version
But, hold on, now I only want the "lastbootuptime" from the object glob:
(icm ServerNameGoesHere {Get-CimInstance -ClassName win32_operatingsystem}).lastbootuptime
Indecisively, I want to be more flexible:
$a=icm ServerNameGoesHere {Get-CimInstance -ClassName win32_operatingsystem}
$a.version
$a.lastbootuptime
$a.csname
(Makes sense?)
Good luck,
~PhilC

Powershell: Download or Save source code for whole ie page

I have this PS script that logs in to a site and then navigates to another page.
I want to save the whole source of that page, but for some reason some parts of the source code are not coming across.
$username = "myuser"
$password = "mypass"
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate("http://www.example.com/login.shtml")
while($ie.ReadyState -ne 4) {start-sleep -m 100}
$ie.document.getElementById("username").value = "$username"
$ie.document.getElementById("pass").value = "$password"
$ie.document.getElementById("frmLogin").submit()
start-sleep 5
$ie.navigate("http://www.example.com/thislink.shtml")
$ie.Document.body.outerHTML | Out-File -FilePath c:\sourcecode.txt
Here is a pastebin of the code that is not coming across:
http://pastebin.com/Kcnht6Ry
After you navigate, check for the Ready State again instead of using a sleep. The same code that you had will work.
After running the code, it appears the sleep may not be long enough if the site is slow to load.
while($ie.ReadyState -ne 4) {start-sleep -m 100}
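An unbounded wait loop can hang forever if the page never finishes loading. A sketch of a polling helper with a timeout; Wait-Until is a hypothetical name, and checking `-not $ie.Busy` alongside ReadyState is an assumption borrowed from the `$ie.Busy` loop in the first question:

```powershell
# Polls $Condition until it returns a truthy value or $TimeoutSec elapses.
# Returns $true on success and $false on timeout.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSec = 60,
        [int] $IntervalMs = 500
    )
    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    while ($stopwatch.Elapsed.TotalSeconds -lt $TimeoutSec) {
        if (& $Condition) { return $true }
        Start-Sleep -Milliseconds $IntervalMs
    }
    return $false
}
```

After `$ie.navigate(...)`, it could be called as `if (-not (Wait-Until { $ie.ReadyState -eq 4 -and -not $ie.Busy })) { throw 'Page did not finish loading' }` before reading outerHTML.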
It also looks like there is another post regarding this, "innerHTML converts CDATA to comments". Someone created a function on that page that can clean it up; once you have that function declared in your code, it would be something like this:
htmlWithCDATASectionsToHtmlWithout($ie.Document.body.outerHTML) | Out-File -FilePath c:\sourcecode.txt
I agree with @tkrn about using the while loop to wait for the IE document to be ready, and for that I recommend sleeping at least 2 seconds inside the loop.
while($ie.ReadyState -ne 4) {start-sleep -s 2}
Still, I found an easier way to get the whole HTML source of the page exactly as loaded from the URL. Here it is:
$ie.Document.parentWindow.execScript("var JSIEVariable = new XMLSerializer().serializeToString(document);", "javascript")
$obj = $ie.Document.parentWindow.GetType().InvokeMember("JSIEVariable", 4096, $null, $ie.Document.parentWindow, $null)
$HTMLDoc = $obj.ToString()
Now $HTMLDoc holds the whole HTML source of the page intact, and you can save it as an .html file.
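The number 4096 passed to InvokeMember above is the value of the .NET `System.Reflection.BindingFlags.GetProperty` flag, i.e. "read the named property". A sketch of spelling it out for readability; the `$window` shorthand is introduced here and the InvokeMember lines need the IE session from the answer, so they are left as comments:

```powershell
# 4096 (0x1000) is BindingFlags.GetProperty
$getProperty = [System.Reflection.BindingFlags]::GetProperty
[int]$getProperty

# Equivalent to the InvokeMember call above, with the flag named:
# $window = $ie.Document.parentWindow
# $obj = $window.GetType().InvokeMember('JSIEVariable', $getProperty, $null, $window, $null)
```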

Search Variable for String then Run Conditional Statement in PowerShell

I am pretty new to PowerShell. I was recently tasked with making an error message popup that would help a local user determine whether an on-demand MS SQL DB merge worked. I wrote a script that would do the following:
Run the batch file that conducted the merge
Read the results of a text log file into a variable
Check the variable for any instances of the word "ERROR" and pop a success or fail dialog box depending on whether or not it found the word error in the log file.
Quick and simple, I thought, but I appear to be struggling with the conditional statement. Here is the script:
cmd /c c:\users\PERSON\desktop\merge.bat
$c = get-content c:\replmerg.log
if ($c -contains ("ERROR"))
{
$a = new-object -comobject wscript.shell
$b = $a.popup("ERROR - Database Merge",0,"Please Contact Support",1)
}
else
{
$a = new-object -comobject wscript.shell
$b = $a.popup("SUCCESS - Database Merge",0,"Good Job!",1)
}
Right now what happens is that the script runs and just skips to the Success message. I can confirm that simply running the 'get-content' command in powershell will on its own produce a variable that I can then call and show the content of the log file. My script however does not appear as though it is actually checking the $c variable for the word and then popping the error message as intended. What am I missing here?
Christian's answer is correct. You could also use the -match operator. For example:
if ((Get-Content c:\replmerg.log) -match "ERROR")
{
'do error stuff'
}
else
{
'do success stuff'
}
You can use -cmatch if you want a case sensitive comparison.
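Why -contains fails here: Get-Content returns an array of lines, so `-contains "ERROR"` is only true when some line is exactly `ERROR`. With an array on the left, -match instead returns the matching lines, and a non-empty result is truthy in an if (for a one-line file Get-Content returns a single string and -match returns $true or $false, which works the same way). A sketch with made-up log lines standing in for replmerg.log:

```powershell
# Hypothetical log content; Get-Content c:\replmerg.log yields a similar string array
$c = @(
    'Merge started',
    'ERROR: conflict on table Orders',
    'Merge finished'
)

# $false: no line is exactly "ERROR"
$exactLine = $c -contains 'ERROR'

# The matching lines themselves (here, one line)
$errorLines = $c -match 'ERROR'

# Same idea, but case-sensitive
$caseSensitiveLines = $c -cmatch 'ERROR'

if ($errorLines) { 'do error stuff' } else { 'do success stuff' }
```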
You actually don't need to use Get-Content at all: Select-String can take a -Path parameter. I created two very simple text files, one that has the word ERROR and one that does not:
PS C:\> cat .\noerror.txt
not in here
PS C:\> cat .\haserror.txt
ERROR in here
this has ERROR in here
PS C:\> if ( Select-String -Path .\noerror.txt -Pattern ERROR) {"Has Error"}
PS C:\> if ( Select-String -Path .\haserror.txt -Pattern ERROR) {"Has Error"}
Has Error
PS C:\>
The one thing that might trip you up is that -Pattern actually takes a regular expression, so be careful about what you use for your pattern. This will find ERROR anywhere in the log file, even if there are multiple instances, like in my "haserror.txt" file.
The -contains operator is used for looking for an exact match in a list (or array). As the other answers indicate, you should use -match, -like, or -eq to compare strings.
You can use the -Quiet switch of Select-String if you just want to test for the presence of a string in a file.
Select-String -Path c:\replmerg.log -Pattern "ERROR" -CaseSensitive -Quiet
Will return $true if the string is found in the file, and $false if it is not.
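A self-contained sketch of that approach driving the success/error decision, using two throwaway temp files in place of c:\replmerg.log. It adds -SimpleMatch so the pattern is treated as literal text rather than a regular expression, and tests the result for truthiness rather than comparing it to $false:

```powershell
# Two throwaway files standing in for replmerg.log
$withError = [System.IO.Path]::GetTempFileName()
$withoutError = [System.IO.Path]::GetTempFileName()
Set-Content -Path $withError -Value 'step 1 ok', 'ERROR in step 2'
Set-Content -Path $withoutError -Value 'step 1 ok', 'step 2 ok'

# -Quiet gives a yes/no answer instead of MatchInfo objects;
# -SimpleMatch treats the pattern as plain text
$hasError = Select-String -Path $withError -Pattern 'ERROR' -SimpleMatch -CaseSensitive -Quiet
$noError = Select-String -Path $withoutError -Pattern 'ERROR' -SimpleMatch -CaseSensitive -Quiet

# Pick the popup text the original script would show
$message = if ($hasError) { 'ERROR - Database Merge' } else { 'SUCCESS - Database Merge' }

Remove-Item -Path $withError, $withoutError
```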
The -contains operator only tests for an identical value (not part of a value).
You can try this
$c = get-content c:\replmerg.log | select-string "ERROR" -casesensitive
if ($c.length -gt 0)