Get values from a website - html

My problem is this: I need to pull all hotel names and their corresponding prices from a web portal. Without a script, this is a tedious manual process for me.
For example, from the following URL I need the names of all hotels with their corresponding prices: http://hotel.makemytrip.com/makemytrip/site/hotels/search?session_cId=1403778791562&city=SLV&country=IN&checkin=06282014&checkout=06302014&area=&roomStayQualifier=1e0e&type=&sortName=&searchText=&isBaitNWait=null&fullSearch=false
Desired Output :
Hotel Name Price
Oberoi Wildflower Hall 16,500
Hotel Chaman Palace 1,879
I am doing this in PowerShell. Basically, I need to understand how to get the value of one placeholder (hotel name or price). So far I have tried this:
$surl="http://hotel.makemytrip.com/makemytrip/site/hotels/search?session_cId=1403778791562&city=SLV&country=IN&checkin=06282014&checkout=06302014&area=&roomStayQualifier=1e0e&type=&sortName=&searchText=&isBaitNWait=null&fullSearch=false"
$ie = new-object -com "InternetExplorer.Application"
$ie.visible = $true
$ie.navigate($surl)
$doc = $ie.Document
$element = $doc.getElementsByClassName("hotelImgLkflL")
$element > d:\element.txt
However, I am getting the following error message:
You cannot call a method on a null-valued expression.
Update: I am now trying to do it via $web.DownloadString and have figured out that the source has the following pattern for all hotel names:
id="200701171240402395" title="Oberoi Wildflower Hall" href="/makemytrip/site/hotels/detail?
id="201111211716292072" title="Hotel Chaman Palace" href="/makemytrip/site/hotels/detail?
id="200701121106345886" title="Hotel Baljees Regency" href="/makemytrip/site/hotels/detail?
How can I proceed from here? I'd appreciate any guidance. Thanks.

Navigate() runs asynchronously, so you need to wait until the website is loaded completely before you can work on it:
...
$ie.navigate($surl)
while ( $ie.ReadyState -ne 4 ) { Start-Sleep -Milliseconds 100 }
$doc = $ie.Document
...
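For the DownloadString route mentioned in the update, a regular expression over the downloaded source is one way to pull out the names. This is only a minimal sketch, assuming the markup really follows the three-line pattern quoted in the question. The `<a ...>` wrapper, the text after `detail?` and the variable names are made up for illustration, and the price is not covered because its markup is not shown:

```powershell
# Sample markup built from the pattern quoted in the question (wrapper and query
# strings are assumptions). A real run would fill $html with something like
# (New-Object System.Net.WebClient).DownloadString($surl).
$html = @'
<a id="200701171240402395" title="Oberoi Wildflower Hall" href="/makemytrip/site/hotels/detail?id=1">
<a id="201111211716292072" title="Hotel Chaman Palace" href="/makemytrip/site/hotels/detail?id=2">
<a id="200701121106345886" title="Hotel Baljees Regency" href="/makemytrip/site/hotels/detail?id=3">
'@

# Capture the title attribute of every link that points at a hotel detail page.
$pattern = 'title="([^"]+)"\s+href="/makemytrip/site/hotels/detail\?'
$hotelNames = [regex]::Matches($html, $pattern) | ForEach-Object { $_.Groups[1].Value }

$hotelNames
```

Anchoring the pattern on the detail-page href keeps other title attributes on the page out of the result.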

Related

Powershell IE Automation: How to fetch delayed loading values?

I am not sure if I mentioned the correct subject. But I will try to explain.
I am trying to do a search in a website, and then based on the result, I am trying to fetch some values.
The website is https://dotdb.com/.
When I go to the website manually, search for something (let's say 'facebook.com') and hit 'Search', I get a result which looks like this:
If you search manually, you will notice that there is initially a loading icon in three places (highlighted in yellow in the above image), and then the values get populated.
Now, when I try to perform the same operation using PowerShell IE automation, I always see the loading icon and the values never get set. Therefore, I am not able to fetch them. Here is the screenshot of the output from my PowerShell script:
Finally here is my powershell script:
$ie = New-Object -com internetexplorer.application
$ie.visible = $true
$ie.navigate("https://dotdb.com")
while ($ie.Busy -eq $true) {
Start-Sleep -Seconds 1;
} #wait for browser idle
($ie.document.getElementsByName("keyword") | select -first 1).value = "facebook.com"
($ie.document.getElementsByTagName('button')[2]).click()
while($ie.Busy -eq $true) {
Start-Sleep -Seconds 1;
} #wait for browser idle
Please guide me to fetch those values.
Thanks in advance!

Check certain text from a webpage via powershell

I am trying to get the HTML code from an intranet webpage and check whether certain texts or titles exist. This PowerShell code will be used by my monitoring program to trigger alerts when the webpage is down, that is, when those texts or titles can no longer be found.
For now, I'm just using Write-Host to see if my piece of code works. I can now extract the HTML source to $output, and I am sure 'Create!' can be found inside. However, I'm not getting a 'YES'.
May I know if $output can be checked by using -contains?
Thank you very much for your help!
$targetUrl = 'https://myUrl/'
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate($targetUrl)
while($ie.Busy) {
Start-Sleep -m 2000
}
$output = $ie.Document.body.innerHTML
if($output -contains '*Create!*')
{Write-Host 'YES'}
else
{Write-Host 'NO'}
The -contains operator tests whether a collection includes a given element; it does not do substring or wildcard matching. IE's innerHTML is just a single string:
$output = $ie.Document.body.innerHTML
$output.GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True String System.Object
Use a pattern-matching operator instead: -like (wildcards) or -match (regular expressions). In your script that would be $output -like '*Create!*'.
By the way, if IE is not mandatory, try the Invoke-WebRequest cmdlet.
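To see the difference between the operators without starting IE, here is a small sketch. The $output string is a made-up stand-in for $ie.Document.body.innerHTML, and $result is just an illustrative variable:

```powershell
# Made-up stand-in for $ie.Document.body.innerHTML
$output = '<div><button id="btn">Create!</button></div>'

# -contains compares whole elements for equality, so this is False
$output -contains '*Create!*'

# -like does wildcard matching against the string, so this is True
$output -like '*Create!*'

# -match does a regular expression search, so this is True as well
$output -match 'Create!'

# What -contains is meant for: membership in a collection (True)
@('Create!', 'Delete') -contains 'Create!'

# The check from the question, rewritten with -like
if ($output -like '*Create!*') { $result = 'YES' } else { $result = 'NO' }
$result
```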

Get specific data from html using Powershell

I would like to automate a task from my work using MS PowerShell. Please see my code below, which logs in to the website. This code is working fine.
$username = "usern"
$password = "pass"
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate("http://www.exemple.com")
while($ie.ReadyState -ne 4) {start-sleep -m 100}
$ie.document.IHTMLDocument3_getElementByID("textfield").value = $username
$ie.document.IHTMLDocument3_getElementByID("textfield2").value = $password
$ie.document.IHTMLDocument3_getElementByID("btnLogin").Click();
Now, in order to download the report, I need to extract a number from the HTML body and store it in a variable, because this number changes every time I access the page. Please see the following image, which shows where the number is located inside the HTML body of the webpage. It's always 12 digits:
This is my problem: I cannot get this number into a variable. If I could, I would finish the PowerShell code with the script below.
$output = "C:\Users\AlexSnake\Desktop\WeeklyReport\ReportName.pdf"
Invoke-WebRequest -Uri http://www.exemple.com.br/pdf_pub/xxxxxxxxxxxx.pdf -OutFile $output
Where you see 'xxx..', I would substitute the variable and download the report.
After this bit of your code:
while($ie.ReadyState -ne 4) {start-sleep -m 100}
Try this:
$($ie.Document.getElementsByTagName("a")).href | ForEach {
# The next line isn't necessary, but just to demonstrate iterating through all the anchor tags in the page (feel free to comment it out)
Write-Host "This is the href tag that I'm enumerating through: $_"
# And this bit checks for that number you're looking for and returns it.
# The literal parentheses of openwindow(...) are escaped and (\d+) captures the digits:
if( $_ -match "javascript:openwindow\('/\.\./\.\./(\d+)\.pdf'.*\)" )
{
$matches[1]
}
}
This should work.
See the code below with the answer for my question.
$($ie.Document.getElementsByTagName("a")).href | ForEach {
if( $_ -match '(\d+)\.pdf' )
{
$matches[1]
}
}
Thanks!
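To get from that match to the download, the captured digits can go straight into a variable. This is a sketch under assumptions: the sample href strings are invented (the real ones are only visible in the screenshot), $reportId and $reportUrl are illustrative names, and \d{12} relies on the number always being 12 digits, as stated in the question.

```powershell
# Invented stand-ins for $($ie.Document.getElementsByTagName("a")).href
$hrefs = @(
    'http://www.exemple.com/home'
    "javascript:openwindow('/../../123456789012.pdf')"
)

# Keep the first 12-digit number that is directly followed by .pdf
$reportId = $hrefs |
    ForEach-Object { if ($_ -match '(\d{12})\.pdf') { $Matches[1] } } |
    Select-Object -First 1

# Build the download URL from the address pattern used in the question
$reportUrl = 'http://www.exemple.com.br/pdf_pub/{0}.pdf' -f $reportId
$reportUrl

# Against the real site the download would then be:
# Invoke-WebRequest -Uri $reportUrl -OutFile $output
```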

Reuse parameterized (prepared) SQL Query

I wrote an Active Directory logging system a couple of years ago.
It never got past beta status, but it is still in use.
An issue was reported to me and I found out what is happening:
several fields in such an Active Directory event are user input, so I have to validate them, which of course I didn't.
So after the first user got the brilliant idea to use single quotes in a specific folder name, my scripts crashed. Easy injection is possible.
So I'd like to make an update using prepared statements, as I do in PHP and other languages.
This is a PowerShell script, and I'd like to do something like this:
$MySqlCommand.CommandText = 'INSERT INTO `table-name` (i1,i2,i3) VALUES (@k1,@k2,@k3)'
$MySqlCommand.Parameters.AddWithValue("@k1","value 1")
$MySqlCommand.Parameters.AddWithValue("@k2","value 2")
$MySqlCommand.Parameters.AddWithValue("@k3","value 3")
$MySqlCommand.ExecuteNonQuery()
This works fine, but only once.
My script runs endlessly as a service and does everything inside a while($true) loop.
PowerShell complains that the parameter is already set:
Exception calling "AddWithValue" with "2" argument(s): "Parameter '@k1' has already been defined."
How can I reset this binding without closing the database connection?
I'd like to leave the connection open, because the script is faster when it doesn't close and reopen the connection every time an event fires (10+ per second).
Example code
(shortened and not tested)
##start
function db_prepare {
# $script: scope keeps the connection and command visible outside the function
$script:MySqlConnection = New-Object MySql.Data.MySqlClient.MySqlConnection
$script:MySqlConnection.ConnectionString = "server=$MySQLServerName;user id=$Username;password=$Password;database=$MySQLDatenbankName;pooling=false"
$script:MySqlConnection.Open()
$script:MySqlCommand = New-Object MySql.Data.MySqlClient.MySqlCommand
$script:MySqlCommand.Connection = $script:MySqlConnection
# Single quotes, because inside double quotes PowerShell treats the backticks as escape characters
$script:MySqlCommand.CommandText = 'INSERT INTO `whatever` (col1,col2...) VALUES (@va1,@va2...)'
}
while($true){
# (Re)connect on the first pass and whenever the connection is no longer open
if($MySqlConnection.State -ne 'Open'){ db_prepare }
## do the event reading and data formatting stuff
## build some variables to set as sql param values
$MySQLCommand.Parameters.AddWithValue("@va1",$variable_for_1)
$MySQLCommand.Parameters.AddWithValue("@va2",$variable_for_2)
.
.
.
Try{ $MySqlCommand.ExecuteNonQuery() | Out-Null }
Catch{ <# error handling #> }
}
Change your logic so that db_prepare initializes the MySQL connection and a MySQL command whose parameters are declared once. Inside the loop, only assign new values to those pre-declared parameter names. Like so:
function db_prepare {
# ...
# Add named parameters once
$MySQLCommand.Parameters.Add("@val1", <datatype>)
$MySQLCommand.Parameters.Add("@val2", <datatype>)
}
while($true) {
# ...
# Set values for the named parameters through the collection's indexer
$MySQLCommand.Parameters["@val1"].Value = <value>
$MySQLCommand.Parameters["@val2"].Value = <value>
$MySqlCommand.ExecuteNonQuery()
# ...
}

Accessing data returned by PowerShell Import-CSV

I am trying to write a PowerShell script that uses a CSV file as input that will turn off the clutter feature in Office 365. The CSV file has only 1 column and that has the 2 target email addresses that I am using for testing. When I run this script with a read-host line and enter a valid email address it works with no errors. When I use the CSV file errors follow.
Import-Module MSOnline
$LiveCred = Get-Credential
$Session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://ps.outlook.com/PowerShell -Credential $LiveCred -Authentication Basic -AllowRedirection
Import-PSSession -allowclobber $Session
Connect-MsolService -Credential $LiveCred
cd c:\scripts
Write-Host "This tool removes the Clutter feature from O365 "
$Clutter = Import-Csv .\Clutteroff.csv
foreach ($user in $Clutter){
Set-Clutter -Identity $User -Enable $false
}
When I run this I get the following error :
Cannot process argument transformation on parameter 'Identity'. Cannot convert value "@{UserID=xxxxx@myCompany.com}" to type "Microsoft.Exchange.Configuration.Tasks.MailboxIdParameter". Error: "Cannot
convert hashtable to an object of the following type: Microsoft.Exchange.Configuration.Tasks.MailboxIdParameter. Hashtable-to-Object conversion is not supported in restricted language mode or a Data
section."
+ CategoryInfo : InvalidData: (:) [Set-Clutter], ParameterBindin...mationException
+ FullyQualifiedErrorId : ParameterArgumentTransformationError,Set-Clutter
+ PSComputerName : ps.outlook.com
Any help would be appreciated and explanations will get extra credit :)
CSV file = User, XXXX@MyCompany.com, YYYY@MyCompany.com
Email addresses are valid.
Putting all of the items in one line like that is not going to work well with Import-CSV. Import-CSV is suited to a table structure (columns and rows), whereas you are just using a comma-separated list (one row, with an unknown number of columns). If in fact you do have the items on different lines, then please correct the question and I'll change the answer.
To work with the data from a file formatted like that, I would just split it into an ArrayList and remove the first item, because it is "User" and not an email address:
[System.Collections.ArrayList]$Clutter = (get-content .\Clutteroff.csv).split(",")
$Clutter.RemoveAt(0)
Then you can iterate through the array:
foreach ($user in $Clutter){
$address = $user.trim()
# No braces around the call: wrapping it in { } would only create a script block without running it
Set-Clutter -Identity $address -Enable $false
}
For the extra credit: $user in your script was a whole row, an object with one property per column (the column header is the property name, the cell is the value). Your error message shows @{UserID=xxxxx@myCompany.com}, so to pass just the email address you would use $user.UserID, which returns the value of the UserID column.
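The row-versus-value distinction can be tried without a mailbox or a file. This is a minimal sketch, assuming a one-column CSV with a UserID header and one address per line; ConvertFrom-Csv stands in for Import-Csv, and the addresses and the $addresses variable are placeholders:

```powershell
# Hypothetical file contents: a UserID header followed by one address per line
$csvLines = 'UserID', 'xxxx@MyCompany.com', 'yyyy@MyCompany.com'

# ConvertFrom-Csv yields the same kind of row objects as Import-Csv .\Clutteroff.csv
$Clutter = $csvLines | ConvertFrom-Csv

foreach ($user in $Clutter) {
    # $user is the whole row; $user.UserID is the plain string a cmdlet expects
    $user.UserID
    # With a live Exchange Online session this would be:
    # Set-Clutter -Identity $user.UserID -Enable $false
}

# Collect just the addresses as strings
$addresses = $Clutter | ForEach-Object { $_.UserID }
```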
I GOT THIS WORKING TO PULL FROM CSV SO ONLY THOSE USERS ARE MODIFIED!! SORRY FOR THE CAPS BUT I AM A TOTAL NOOB AND I COULDN'T BELIEVE I GOT THIS TO WORK!!! I am beyond STOKED!! :)
The CSV requires no headers, just the email addresses of the users you want to modify in one column:
$Clutter = Get-Content "pathofyourcsv.csv"
foreach ($User in $Clutter) {
Get-Mailbox -Identity $User | Set-Clutter -Enable $false
}