PowerShell: Download or save the source code of a whole IE page (HTML)

I have this PS script: it logs in to a site and then navigates to another page.
I want to save the whole source of that page, but for some reason some parts of the source code are not coming across.
$username = "myuser"
$password = "mypass"
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate("http://www.example.com/login.shtml")
while($ie.ReadyState -ne 4) {start-sleep -m 100}
$ie.document.getElementById("username").value = "$username"
$ie.document.getElementById("pass").value = "$password"
$ie.document.getElementById("frmLogin").submit()
start-sleep 5
$ie.navigate("http://www.example.com/thislink.shtml")
$ie.Document.body.outerHTML | Out-File -FilePath c:\sourcecode.txt
Here is a pastebin of the code that is not coming across:
http://pastebin.com/Kcnht6Ry

After you navigate, check the ReadyState again instead of using a fixed sleep. The same loop you already have will work.
After running the code, it appears the sleep may not be long enough if the site is slow to load.
while($ie.ReadyState -ne 4) {start-sleep -m 100}
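The same wait loop can be wrapped in a small helper with a timeout, so a page that never finishes loading does not hang the script forever. This is a minimal sketch, not part of the original answer: the function name `Wait-IEReady` and the 30-second default are illustrative assumptions. It only reads the `Busy` and `ReadyState` properties of the IE COM object (4 = READYSTATE_COMPLETE), so it can be called right after each `$ie.navigate(...)`.

```powershell
# Hypothetical helper: wait until an InternetExplorer.Application object
# reports that it is idle and fully loaded, or give up after a timeout.
function Wait-IEReady {
    param(
        [Parameter(Mandatory)] $Browser,   # e.g. $ie from New-Object -ComObject InternetExplorer.Application
        [int] $TimeoutSeconds = 30,        # assumed default; raise it for slow sites
        [int] $PollMilliseconds = 100
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    # ReadyState 4 = READYSTATE_COMPLETE; Busy is $true while a navigation is in progress
    while ($Browser.Busy -or $Browser.ReadyState -ne 4) {
        if ((Get-Date) -gt $deadline) {
            throw "Page did not finish loading within $TimeoutSeconds seconds."
        }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
}

# Usage in the question's script:
#   $ie.navigate("http://www.example.com/thislink.shtml")
#   Wait-IEReady -Browser $ie
#   $ie.Document.body.outerHTML | Out-File -FilePath c:\sourcecode.txt
```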
There also appears to be another post about this, "innerHTML converts CDATA to comments". Someone created a function on that page that cleans it up. Once you have declared that function in your code, it would look something like this:
htmlWithCDATASectionsToHtmlWithout($ie.Document.body.outerHTML) | Out-File -FilePath c:\sourcecode.txt

I agree with @tkrn about using the while loop to wait for the IE document to be ready, and I recommend sleeping at least 2 seconds inside the loop.
while($ie.ReadyState -ne 4) {start-sleep -s 2}
Still, I found an easier way to get the whole HTML of the page, serialized from the document as it is currently loaded in IE. Here it is:
$ie.Document.parentWindow.execScript("var JSIEVariable = new XMLSerializer().serializeToString(document);", "javascript")
$obj = $ie.Document.parentWindow.GetType().InvokeMember("JSIEVariable", [System.Reflection.BindingFlags]::GetProperty, $null, $ie.Document.parentWindow, $null)
$HTMLDoc = $obj.ToString()
Now $HTMLDoc holds the whole HTML of the page intact, and you can save it as an HTML file.
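When saving $HTMLDoc, it can help to set the encoding explicitly: in Windows PowerShell 5.1, Out-File writes UTF-16 ("Unicode") by default, which is an unusual choice for an .html file. A minimal sketch, assuming the markup is already in a string; the helper name `Save-HtmlString` and the example path are illustrative, not from the answer.

```powershell
# Hypothetical helper: write an HTML string to disk as UTF-8 without a byte-order mark.
function Save-HtmlString {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $Path
    )
    # Resolve relative paths against PowerShell's current location,
    # because .NET file APIs use the process working directory instead
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)
    [System.IO.File]::WriteAllText($fullPath, $Html, $utf8NoBom)
    return $fullPath
}

# Usage: Save-HtmlString -Html $HTMLDoc -Path 'C:\sourcecode.html'
```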

Related

PowerShell IE automation: How to fetch delayed-loading values?

I am not sure the title describes this correctly, but I will try to explain.
I am trying to do a search in a website, and then based on the result, I am trying to fetch some values.
The website is https://dotdb.com/.
When I go to the website manually, search for something (let's say 'facebook.com') and click 'Search', I get a result that looks like this:
If you search manually, you will notice that a loading icon initially appears in three places (highlighted in yellow in the image above), and then the values get populated.
Now, when I try to perform the same operation using PowerShell IE automation, I always see the loading icon and the values never get set, so I am not able to fetch them. Here is a screenshot of the output from my PowerShell script:
Finally, here is my PowerShell script:
$ie = New-Object -com internetexplorer.application
$ie.visible = $true
$ie.navigate("https://dotdb.com")
while ($ie.Busy -eq $true) {
Start-Sleep -Seconds 1;
} #wait for browser idle
($ie.document.getElementsByName("keyword") | select -first 1).value = "facebook.com"
($ie.document.getElementsByTagName('button')[2]).click()
while($ie.Busy -eq $true) {
Start-Sleep -Seconds 1;
} #wait for browser idle
Please help me fetch those values.
Thanks in advance!
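The thread has no answer for this one. One common approach, sketched here under assumptions rather than confirmed for this site, rests on the idea that `$ie.Busy` likely tracks only the page navigation, not values that the page's own scripts fill in afterwards. You can then poll the DOM until the target element's text replaces the loading placeholder. The helper `Wait-Until`, the timeouts, and the element id `resultValue` in the usage comment are all hypothetical; inspect the page to find the real element.

```powershell
# Hypothetical helper: poll a script block until it returns a truthy value or the timeout expires.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 30,
        [int] $PollMilliseconds = 500
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        $result = & $Condition
        if ($result) { return $result }   # hand back whatever the condition produced
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    throw "Condition not met within $TimeoutSeconds seconds."
}

# Usage after clicking 'Search' ('resultValue' is a made-up id):
#   $value = Wait-Until -TimeoutSeconds 60 -Condition {
#       $el = $ie.Document.getElementById('resultValue')
#       if ($el -and $el.innerText -and $el.innerText.Trim()) { $el.innerText.Trim() }
#   }
```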

Check for certain text on a webpage via PowerShell

I am trying to get the HTML code of an intranet webpage and check whether certain text or titles exist. My monitoring program will use this PowerShell code to trigger an alert when the webpage is down, that is, when that text or those titles cannot be found.
For now, I'm just using Write-Host to see whether my code works. I can extract the HTML source into $output, and I am sure 'Create!' is in it. However, I'm not getting a 'YES'.
Can $output be checked using -contains?
Thank you very much for your help!
$targetUrl = 'https://myUrl/'
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate($targetUrl)
while($ie.Busy) {
Start-Sleep -m 2000
}
$output = $ie.Document.body.innerHTML
if($output -contains '*Create!*')
{Write-Host 'YES'}
else
{Write-Host 'NO'}
The -contains operator is used to search collections, but IE's innerHTML is just a string:
$output = $ie.Document.body.innerHTML
$output.GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     String                                   System.Object
Use pattern-matching operators like, well, -like and -match.
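A quick comparison of these operators on a plain string; the sample markup is made up for illustration. Note that -like needs the wildcards because it must match the whole string, while -match takes a regular expression, so characters such as `.` or `(` in the search text would need escaping, for example with [regex]::Escape.

```powershell
# Sample markup standing in for $ie.Document.body.innerHTML
$output = '<div class="toolbar"><button>Create!</button></div>'

# -contains tests collection membership, so on a single string it only
# succeeds when the whole string equals the right-hand side
$output -contains '*Create!*'                 # False

# -like uses wildcards and must match the entire string, hence the surrounding *
$output -like '*Create!*'                     # True

# -match uses a regular expression; escape the search text to match it literally
$output -match [regex]::Escape('Create!')     # True

# .Contains() is a case-sensitive substring test on the .NET string
$output.Contains('Create!')                   # True
```

In the original script, that makes the check `if ($output -like '*Create!*') {Write-Host 'YES'} else {Write-Host 'NO'}`.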
By the way, if IE is not mandatory, try the Invoke-WebRequest cmdlet.

Get specific data from HTML using PowerShell

I would like to automate a task at work using PowerShell. Below is my code that logs in to the website; it works fine.
$username = "usern"
$password = "pass"
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate("http://www.exemple.com")
while($ie.ReadyState -ne 4) {start-sleep -m 100}
$ie.document.IHTMLDocument3_getElementByID("textfield").value = $username
$ie.document.IHTMLDocument3_getElementByID("textfield2").value = $password
$ie.document.IHTMLDocument3_getElementByID("btnLogin").Click();
Now, to download the report, I need to extract a number from the HTML body and store it in a variable, because this number changes every time I access the page. The following image shows where the number is located in the HTML body of the webpage. It is always 12 digits:
This is my problem: I cannot get this number into a variable. If I could, I would finish the PowerShell code with the script below.
$output = "C:\Users\AlexSnake\Desktop\WeeklyReport\ReportName.pdf"
Invoke-WebRequest -Uri http://www.exemple.com.br/pdf_pub/xxxxxxxxxxxx.pdf -OutFile $output
Where you see 'xxxxxxxxxxxx', I would substitute the variable and download the report.
After this part of your code:
while($ie.ReadyState -ne 4) {start-sleep -m 100}
Try this:
$($ie.Document.getElementsByTagName("a")).href | ForEach {
# The next line isn't necessary, but just to demonstrate iterating through all the anchor tags in the page (feel free to comment it out)
Write-Host "This is the href tag that I'm enumerating through: $_"
# And this bit checks for that number you're looking for and returns it:
if( $_ -match "javascript:openwindow\('/\.\./\.\./(\d+)\.pdf'" )
{
$matches[1]
}
}
This should work.
Here is the code that answered my question:
$($ie.Document.getElementsByTagName("a")).href | ForEach {
if( $_ -match '(\d+)\.pdf' )
{
$matches[1]
}
}
Thanks!
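Since the report number is always 12 digits, the pattern can be tightened to exactly 12 digits and the result dropped straight into the download URL from the question. A sketch under assumptions: the sample href values below are made up; in the real script they would come from $($ie.Document.getElementsByTagName("a")).href.

```powershell
# Made-up href values standing in for $($ie.Document.getElementsByTagName("a")).href
$hrefs = @(
    "http://www.exemple.com.br/help.html",
    "javascript:openwindow('/../../123456789012.pdf')"
)

# Keep the first 12-digit number that directly precedes ".pdf"
$reportNumber = $null
foreach ($href in $hrefs) {
    if ($href -match '(\d{12})\.pdf') {
        $reportNumber = $Matches[1]
        break
    }
}

if ($reportNumber) {
    # "$reportNumber.pdf" expands only the variable; ".pdf" stays literal text
    $uri = "http://www.exemple.com.br/pdf_pub/$reportNumber.pdf"
    $uri
    # Invoke-WebRequest -Uri $uri -OutFile "C:\Users\AlexSnake\Desktop\WeeklyReport\ReportName.pdf"
}
```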

Embedded HTML PowerShell email compresses the length of the picture [duplicate]

I have a PowerShell script that embeds (not attaches) a picture and sends an email. The picture has now grown to 1500x5000 pixels, and I'm seeing that its length gets compressed, which distorts the picture. However, when I insert the picture manually via Outlook and send an email, it looks fine.
If I save the picture and open it in Paint or a similar program, it opens fine. It only looks compressed in the email. Does anyone know what may be going on?
{
$Application = "C:\Autobatch\Spotfire.Dxp.Automation.ClientJobSender.exe"
$Arguments = "http://s.net:8070/spotfireautomation/JobExecutor.asmx C:\Autobatch\HourlyStats.xml"
$CommandLine = "{0} {1}" -f $Application,$Arguments
invoke-expression $CommandLine
$file = "C:\Autobatch\HourlyStats.png"
$smtpServer = "smtp.staom.sec.s.net"
$att = new-object Net.Mail.Attachment($file)
$att.ContentType.MediaType = "image/png"
$att.ContentId = "pict"
$att.TransferEncoding = [System.Net.Mime.TransferEncoding]::Base64
$msg = new-object Net.Mail.MailMessage
$smtp = new-object Net.Mail.SmtpClient($smtpServer)
$msg.Attachments.Add($att)
$msg.From = "d.k@s.com"
$msg.To.Add("r.p@p.com")
$msg.Subject = "Voice and Data Hourly Stats"
$msg.Body = "<p style='font-family: Calibri, sans-serif'>
Voice and data hourly stats details<br />
</p>
<img src='cid:pict'/>"
$msg.IsBodyHTML = $true
$smtp.Send($msg)
$att.Dispose()
Remove-Item -Path $file
}
Here is what the picture looks like in the email:
Try adding
$att.ContentDisposition.Inline = $true
I suspect some default behavior is happening under the covers and it's just not consistent between the script and Outlook.
More info here
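Another way to embed the image with System.Net.Mail, swapped in here rather than taken from the answer above, is an HTML AlternateView with a LinkedResource. This attaches the image as a related inline part referenced by its Content-ID. The sketch builds the message without sending it; the four placeholder PNG bytes and the example.com addresses are assumptions. In the real script you would create the LinkedResource from $file and call $smtp.Send($msg).

```powershell
# Build (but do not send) a message whose HTML body references an inline image by Content-ID.
$html = "<p style='font-family: Calibri, sans-serif'>Voice and data hourly stats details<br /></p>" +
        "<img src='cid:pict' />"

$msg = New-Object System.Net.Mail.MailMessage
$msg.From = 'sender@example.com'          # placeholder address
$msg.To.Add('recipient@example.com')      # placeholder address
$msg.Subject = 'Voice and Data Hourly Stats'

# HTML view of the body
$view = [System.Net.Mail.AlternateView]::CreateAlternateViewFromString($html, [System.Text.Encoding]::UTF8, 'text/html')

# Placeholder bytes; in the real script: New-Object System.Net.Mail.LinkedResource($file, 'image/png')
$bytes = [byte[]](0x89, 0x50, 0x4E, 0x47)
$pngStream = [System.IO.MemoryStream]::new($bytes)
$image = New-Object System.Net.Mail.LinkedResource($pngStream, 'image/png')
$image.ContentId = 'pict'                 # must match cid:pict in the HTML
$image.TransferEncoding = [System.Net.Mime.TransferEncoding]::Base64

$view.LinkedResources.Add($image)
$msg.AlternateViews.Add($view)
# $smtp.Send($msg); afterwards $msg.Dispose() releases the image stream
```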
It seems your email client shrinks content to a certain maximum size. Try wrapping <img src='cid:pict'/> in a <div>:
<div style="overflow: scroll">
<img src='cid:pict'/>
</div>
Also, if you have any way to retrieve the actual pixel width of the image, you can try to set the CSS of the <img> tag accordingly.
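A minimal sketch of that idea that avoids System.Drawing: a PNG file starts with an 8-byte signature followed by the IHDR chunk, whose width and height are big-endian 32-bit integers at byte offsets 16 and 20. The helper name `Get-PngSize` and the `max-width:none` style are assumptions, and whether a given mail client honours explicit sizes is not guaranteed.

```powershell
# Hypothetical helper: read pixel dimensions from a PNG header.
function Get-PngSize {
    param([Parameter(Mandatory)] [byte[]] $Bytes)
    # PNG signature: 89 50 4E 47 0D 0A 1A 0A, then the IHDR length (4 bytes) and type (4 bytes)
    $signature = [byte[]](0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)
    if ($Bytes.Length -lt 24) { throw 'Not a PNG file (too short).' }
    for ($i = 0; $i -lt 8; $i++) {
        if ($Bytes[$i] -ne $signature[$i]) { throw 'Not a PNG file (bad signature).' }
    }
    # Width and height are big-endian unsigned 32-bit integers at offsets 16 and 20
    $width  = ([long]$Bytes[16] -shl 24) -bor ([long]$Bytes[17] -shl 16) -bor ([long]$Bytes[18] -shl 8) -bor $Bytes[19]
    $height = ([long]$Bytes[20] -shl 24) -bor ([long]$Bytes[21] -shl 16) -bor ([long]$Bytes[22] -shl 8) -bor $Bytes[23]
    [pscustomobject]@{ Width = $width; Height = $height }
}

# Hand-built header bytes for a 1500 x 5000 image (only the 24 bytes the helper reads).
# In the real script: $size = Get-PngSize -Bytes ([System.IO.File]::ReadAllBytes($file))
$header = [byte[]](0x89,0x50,0x4E,0x47,0x0D,0x0A,0x1A,0x0A, 0,0,0,13, 0x49,0x48,0x44,0x52, 0,0,0x05,0xDC, 0,0,0x13,0x88)
$size = Get-PngSize -Bytes $header

# Put the real dimensions on the <img> tag used in $msg.Body
$imgTag = "<img src='cid:pict' width='$($size.Width)' height='$($size.Height)' style='max-width:none' />"
```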
This may sound like a noob question, but just out of curiosity: if you can send the email manually via Outlook, why not write a script that sends an automated email with the desired screenshot?
I don't know whether this will help, but I made this script a long time ago for my daily reporting, and it fits the bill. I'm sharing it here for your views.
#In this segment, I navigate IE to the specific page whose screen I want to capture.
$ie = New-Object -ComObject InternetExplorer.Application
$ie.Visible = $true
$ie.navigate('https://put.your.URL.here')
# navigate() returns nothing, so wait on the IE object itself until the page has finished loading
while ($ie.Busy -or $ie.ReadyState -ne 4) {Start-Sleep -Seconds 5}
#In this segment, the script captures the screen once all the data loading is over.
$file = "C:\Users\Desktop\$(Get-Date -Format dd-MM-yyyy-HHmm).bmp"
#P.S. I made it save the screenshot with the current date and time in the file name, in .BMP format.
Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing
$Screen = [System.Windows.Forms.SystemInformation]::VirtualScreen
$width = $Screen.Width
$Height = $Screen.Height
$Left = $Screen.Left
$Top = $Screen.Top
$bitmap = New-Object System.Drawing.Bitmap $width, $Height
$Graphics = [System.Drawing.Graphics]::FromImage($bitmap)
$Graphics.CopyFromScreen($Left, $Top, 0, 0, $bitmap.Size)
# Pass the format explicitly; without it, Save() falls back to the PNG encoder for an in-memory bitmap
$bitmap.Save($file, [System.Drawing.Imaging.ImageFormat]::Bmp)
$Graphics.Dispose()
$bitmap.Dispose()
Write-Output "Screenshot saved to:"
Write-Output $file
Start-Sleep -Seconds 5
#Sending an Email
$Outlook = New-Object -ComObject Outlook.Application
$mail = $Outlook.CreateItem(0)
$mail.To = "your.designated@emailid.com"
$mail.Subject = "Outstanding data as on $(Get-Date -Format dd-MM-yyyy)"
$mail.Body = "PFA screenshot of all outstanding information as on $(Get-Date -Format dd-MM-yyyy-HHmm)"
$mail.Attachments.Add($file)
$mail.Send()
I am posting this as an answer because I tried to comment above, but my reputation score is too low to do that.
I hope this helps you find a workaround.
Happy coding. :)
HTML code: should <img src='cid:pict'/> be <img src='cid:pict'>, i.e. just remove the forward slash?
Added: this link about embedding pictures in email might help: "base64 encoded images in email signatures". You can try generating the base64 code and putting it in the email body HTML.
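Generating that base64 markup from PowerShell takes one .NET call. A sketch with four placeholder bytes standing in for [System.IO.File]::ReadAllBytes($file). Note that support for data: URIs in email varies by client, and some clients block or strip them, so the cid: approach may still be the more portable one.

```powershell
# Placeholder bytes (the PNG signature prefix); in the real script:
#   $bytes = [System.IO.File]::ReadAllBytes($file)
$bytes = [byte[]](0x89, 0x50, 0x4E, 0x47)

# Encode the image and embed it directly in the HTML as a data: URI
$base64 = [System.Convert]::ToBase64String($bytes)
$imgTag = "<img src='data:image/png;base64,$base64' />"
$imgTag
```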
