Search embedded webpage source in VB.NET - HTML

I wrote a program that includes an embedded web browser which loads a website with a changing part (the part changes about twice a week, with no regular timing pattern). I want to search for a particular part in the opened webpage's source code after refreshing the webpage at a specified time interval.
I found many questions similar to mine, but none of them cover what I want:
searching an embedded webpage's source (they search the webpage without embedding it, and I had to embed it because I have to log in before I can see the particular page)
so this is the procedure I'm trying to do:
1- open a website in the embedded web browser
2- after the user has logged in, at the press of a button in the program, hide the embedded web browser and start refreshing the page at a time interval (like every minute), checking whether the particular code has changed in the source of that opened webpage
Any other/better ideas are appreciated.
Thanks

Many years ago I wrote an app to reintegrate forum posts from several pages into one, and I struggled with the login issue too; I thought it was only possible using an embedded browser. As it turns out, it's possible to use System.Net in .NET to handle web pages that need a login, as you can pull the cookies out and keep them on hand. I would suggest you do that and move away from the embedded browser.
Unfortunately I wrote the code in C# originally, but as it's .NET and mostly class-based, it shouldn't be too difficult to port over.
The Basic Principle
Find out what information is included in the POST when you log in, which you can do in Chrome with the developer tools open (F12). Convert that to a byte array, POST it to the page, store the cookies, and make another call with the cookie data later on. You will need a class variable to hold the cookies.
Code:
private void Login()
{
    byte[] byteArray = Encoding.UTF8.GetBytes("username=" + username + "&password=" + password + "&autologin=on&login=Log+in"); // Found by investigation
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("yourURL");
    request.AllowAutoRedirect = false;
    request.CookieContainer = new CookieContainer();
    request.Method = "POST";
    request.ContentLength = byteArray.Length;
    request.ContentType = "application/x-www-form-urlencoded";
    Stream dataStream = request.GetRequestStream();
    dataStream.Write(byteArray, 0, byteArray.Length);
    dataStream.Close();
    WebResponse response = request.GetResponse();
    if (((HttpWebResponse)response).StatusCode == HttpStatusCode.Found)
    {
        // Well done, your login has been accepted
        loginDone = true;
        cookies = request.CookieContainer;
    }
    else
    {
        // If at first you don't succeed...
    }
    response.Close();
}
private string GetResponseHTML(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AllowAutoRedirect = false;
    // Add cookies from Login()
    request.CookieContainer = cookies;
    request.ContentType = "application/x-www-form-urlencoded";
    WebResponse response = request.GetResponse();
    string sResponse = "";
    StreamReader reader = null;
    if (((HttpWebResponse)response).StatusCode == HttpStatusCode.OK)
    {
        reader = new StreamReader(response.GetResponseStream());
        sResponse = reader.ReadToEnd();
        reader.Close();
    }
    response.Close();
    return sResponse;
}
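To tie this back to the original question, here is a rough sketch of how you might log in once and then poll the page on a timer to detect when the interesting part changes. It assumes the Login() and GetResponseHTML() methods above; the marker strings and the System.Timers.Timer interval are placeholders you'd adapt to your page.
// Minimal polling sketch (assumes the Login()/GetResponseHTML() above).
// The marker strings are hypothetical - adapt them to your page's HTML.
private System.Timers.Timer pollTimer;
private string lastSeenPart;

private void StartPolling()
{
    Login(); // log in once and keep the cookies

    pollTimer = new System.Timers.Timer(60000); // every minute
    pollTimer.Elapsed += (s, e) =>
    {
        string html = GetResponseHTML("yourURL");

        // Crude example: grab whatever sits between two known markers.
        int start = html.IndexOf("<!-- start marker -->");
        int end = html.IndexOf("<!-- end marker -->");
        if (start < 0 || end < 0) return;

        string interestingPart = html.Substring(start, end - start);
        if (lastSeenPart != null && interestingPart != lastSeenPart)
        {
            // The part has changed - notify the user, log it, etc.
        }
        lastSeenPart = interestingPart;
    };
    pollTimer.Start();
}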
Hope that helps.

I had to change to C#, and I found what I was looking for:
string webPageSource = webBrowser1.DocumentText;
That gave me the source of the web page opened in the webBrowser1 control.
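For completeness, a minimal sketch of the refresh-and-check loop using the WebBrowser control itself; the button handler, the marker string and the timer interval are placeholders, not from the original question:
// Poll the embedded WebBrowser control and check its DocumentText on each reload.
private System.Windows.Forms.Timer refreshTimer;

private void buttonStart_Click(object sender, EventArgs e)
{
    webBrowser1.Hide(); // hide the browser once the user has logged in

    refreshTimer = new System.Windows.Forms.Timer { Interval = 60000 }; // every minute
    // Navigate to the current URL again so DocumentCompleted fires after each reload.
    refreshTimer.Tick += (s, args) => webBrowser1.Navigate(webBrowser1.Url);
    refreshTimer.Start();
}

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    string webPageSource = webBrowser1.DocumentText;
    if (webPageSource.Contains("the particular code")) // placeholder marker
    {
        // Found it - react here.
    }
}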

Related

HTML5 audio seek is not working properly. Throws Response Content-Length mismatch Exception

I'm trying to stream an audio file to an Angular application, where an HTML5 audio element has its src set to my API endpoint (for example /audio/234). My backend is implemented with .NET Core 2.0. I have already implemented this kind of streaming: .NET Core | MVC pass audio file to html5 player. Enable seeking.
Seeking works if I don't seek to the end of the file immediately after the audio starts playing. I use the audio element's autoplay attribute to start playing as soon as the element has enough data. So in my situation the audio element does not yet have all the data when I seek, so it makes a new GET request to my API. In that situation this exception appears in my backend log:
fail: Microsoft.AspNetCore.Server.Kestrel[13]
[1] Connection id "0HL9V370HAF39", Request id "0HL9V370HAF39:00000001": An unhandled exception was thrown by the application.
[1] System.InvalidOperationException: Response Content-Length mismatch: too few bytes written (0 of 6126919).
Here is my audio controller GET method.
byte[] audioArray = new byte[0];
// Here I load the audio file from the cloud
long fSize = audioArray.Length;
long startbyte = 0;
long endbyte = fSize - 1;
int statusCode = 200;
var rangeRequest = Request.Headers["Range"].ToString();
_logger.LogWarning(rangeRequest);
if (rangeRequest != "")
{
    string[] range = Request.Headers["Range"].ToString().Split(new char[] { '=', '-' });
    startbyte = Convert.ToInt64(range[1]);
    if (range.Length > 2 && range[2] != "") endbyte = Convert.ToInt64(range[2]);
    if (startbyte != 0 || endbyte != fSize - 1 || range.Length > 2 && range[2] == "")
    { statusCode = 206; }
}
_logger.LogWarning(startbyte.ToString());
long desSize = endbyte - startbyte + 1;
_logger.LogWarning(desSize.ToString());
_logger.LogWarning(fSize.ToString());
Response.StatusCode = statusCode;
Response.ContentType = "audio/mp3";
Response.Headers.Add("Content-Accept", Response.ContentType);
Response.Headers.Add("Content-Length", desSize.ToString());
Response.Headers.Add("Content-Range", string.Format("bytes {0}-{1}/{2}", startbyte, endbyte, fSize));
Response.Headers.Add("Accept-Ranges", "bytes");
Response.Headers.Remove("Cache-Control");
var stream = new MemoryStream(audioArray, (int)startbyte, (int)desSize);
return new FileStreamResult(stream, Response.ContentType)
{
    FileDownloadName = track.Name
};
Am I missing some header, or what?
I didn't get this exception with .NET Core 1.1, but I'm not sure whether that is just coincidence and/or bad testing. If anybody knows whether something related to streaming changed in .NET Core, I would appreciate that info.
After researching more I found this: https://learn.microsoft.com/en-us/aspnet/core/aspnetcore-2.0 (see the "Enhanced HTTP header support" heading). It says this:
If an application visitor requests content with a Range Request header, ASP.NET will recognize that and handle that header. If the requested content can be partially delivered, ASP.NET will appropriately skip and return just the requested set of bytes. You do not need to write any special handlers into your methods to adapt or handle this feature; it is automatically handled for you.
So all I need is some cleanup when I move from .NET Core 1.1 to 2.0, because there is already a handler for those headers.
byte[] audioArray = new byte[0];
// Here I get my MP3 file from the cloud
var stream = new MemoryStream(audioArray);
return new FileStreamResult(stream, "audio/mp3")
{
    FileDownloadName = track.Name
};
The problem was in the headers. I don't know exactly whether a header was incorrect or my stream initialization was incorrect, but now it's working. I used this: https://stackoverflow.com/a/35920244/8081009 . The only change I made was renaming it to AudioStreamResult. Then I used it like this:
Response.ContentType = "audio/mp3";
Response.Headers.Add("Content-Accept", Response.ContentType);
Response.Headers.Remove("Cache-Control");
var stream = new MemoryStream(audioArray);
return new AudioStreamResult(stream, Response.ContentType)
{
    FileDownloadName = track.Name
};
Notice that I pass the full stream to AudioStreamResult:
var stream = new MemoryStream(audioArray);
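As a side note, here is a minimal sketch of what the action can look like on later ASP.NET Core versions (2.1 and up), where FileStreamResult exposes range handling through an EnableRangeProcessing property; this is an assumption about newer framework versions, not part of the original answer, so verify it against the docs for your version:
// Sketch only: on ASP.NET Core 2.1+ FileResult has EnableRangeProcessing,
// so the framework parses the Range header and writes the partial response itself.
byte[] audioArray = new byte[0];
// ...load the MP3 bytes from the cloud here...
var stream = new MemoryStream(audioArray);
return new FileStreamResult(stream, "audio/mp3")
{
    FileDownloadName = track.Name,
    EnableRangeProcessing = true // property added in ASP.NET Core 2.1; not available in 2.0
};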

Flickr API image unavailable on Windows Phone

I'm trying to use Flickr's API to import images into a Windows Phone app and display them in the phone's panoramic display.
I'm new to Flickr's API and am stuck ATM.
I've tried the following call:
// original string flickString = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=cc9babb2754c1d29837bea480c97013e&text=game+of+thrones&format=json&nojsoncallback=1&api_sig=bb86a60e9e42f31950bf53d25fc45f08";
string flickString = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=cc9babb2754c1d29837bea480c97013e&text=game+of+thrones&extras=url_sq%2C+url_t%2C+url_s%2C+url_q%2C+url_m%2C+url_n%2C+url_z%2C+url_c%2C+url_l%2C+url_o+&format=json&nojsoncallback=1&api_sig=9e74e094d8c6a7496fc66e070f5c0898";
var baseUrl = string.Format(flickString, flickrAPIKey);
string flickrResult = await client.GetStringAsync(baseUrl);
FlickrData flickrApiData = JsonConvert.DeserializeObject<FlickrData>(flickrResult);
if (flickrApiData.stat == "ok")
{
    foreach (Photo data in flickrApiData.photos.photo)
    {
        // To retrieve one photo:
        // http://farm{farmid}.staticflickr.com/{server-id}/{id}_{secret}{size}.jpeg
        //string photoUrl = "http://farm{0}.staticflickr.com/{1}/{2}_{3}_o.jpeg";
        //string photoUrl = "http://farm{0}.staticflickr.com/{1}/{2}_{3}_b.jpeg";
        //string photoUrl = "http://farm{0}.staticflickr.com/{1}/{2}_{3}_n.jpg";
        string photoUrl = "http://farm{0}.staticflickr.com/{1}/{2}_{3}";
        string baseFlickrUrl = string.Format(photoUrl,
            data.farm,
            data.server,
            data.id,
            data.secret);
        flickr1Image.Source = new BitmapImage(new Uri(baseFlickrUrl));
        break;
    }
}
When I deploy and run the app, I get an image saying that the image is unavailable, every time. I've tried changing the search terms etc. and still get the same message, which is making me wonder if I missed something when setting up my account with Flickr that I'm not aware of. It's very frustrating - help please.
Thanks to card_master for his help so far
I'm also integrating with Flickr. I'm creating a web site that uses their API.
I'm using FlickrNet. This is an open-source .NET library that you can use to call the Flickr services. It is a C# library.
The benefit of using it in a mobile application is that you can take advantage of the caching. It allows you to store images in the phone's storage. This won't work in a web application, though.
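A rough sketch of what a FlickrNet search could look like; the class and property names below are from memory of the library and should be checked against the FlickrNet documentation:
// Sketch of a photo search with FlickrNet (names from memory - verify against the library docs).
using FlickrNet;

var flickr = new Flickr("yourApiKey", "yourSharedSecret");
var options = new PhotoSearchOptions
{
    Text = "game of thrones",
    PerPage = 20,
    Extras = PhotoSearchExtras.MediumUrl // ask Flickr to include a ready-made image URL
};

PhotoCollection photos = flickr.PhotosSearch(options);
foreach (Photo photo in photos)
{
    // MediumUrl saves you from assembling the farm/server/id/secret URL by hand.
    string url = photo.MediumUrl;
    // e.g. flickr1Image.Source = new BitmapImage(new Uri(url));
    break;
}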

Writing a full website to a socket with a microcontroller

I'm using a web server to control devices in the house with a microcontroller running .NET MF (Netduino Plus 2). The code below writes a simple HTML page to a device that connects to the microcontroller over the internet.
while (true)
{
    Socket clientSocket = listenerSocket.Accept();
    bool dataReady = clientSocket.Poll(5000000, SelectMode.SelectRead);
    if (dataReady && clientSocket.Available > 0)
    {
        byte[] buffer = new byte[clientSocket.Available];
        int bytesRead = clientSocket.Receive(buffer);
        string request = new string(System.Text.Encoding.UTF8.GetChars(buffer));
        if (request.IndexOf("ON") >= 0)
        {
            outD7.Write(true);
        }
        else if (request.IndexOf("OFF") >= 0)
        {
            outD7.Write(false);
        }
        string statusText = "Light is " + (outD7.Read() ? "ON" : "OFF") + ".";
        string response = WebPage.startHTML(statusText, ip);
        clientSocket.Send(System.Text.Encoding.UTF8.GetBytes(response));
    }
    clientSocket.Close();
}

public static string startHTML(string ledStatus, string ip)
{
    string code = "<html><head><title>Netduino Home Automation</title></head><body> <div class=\"status\"><p>" + ledStatus + " </p></div> <div class=\"switch\"><p>On</p><p>Off</p></div></body></html>";
    return code;
}
This works great, so I wrote a full jQuery Mobile website to use instead of the simple HTML. This website is stored on the SD card of the device and, using the code below, should be written out in place of the simple HTML above.
However, my problem is that the Netduino only writes the single HTML page to the browser, with none of the JS/CSS style files that are referenced in the HTML. How can I make sure the browser reads all of these files, as a full website?
The code I wrote to read the website from the SD card is:
private static string getWebsite()
{
    string text = "";
    try
    {
        using (StreamReader reader = new StreamReader(@"\SD\index.html"))
        {
            text = reader.ReadToEnd();
        }
    }
    catch (Exception e)
    {
        throw new Exception("Failed to read " + e.Message);
    }
    return text;
}
I replaced the string code = "..." bit with:
string code = getWebsite();
How can I make sure the browser reads all of these files, as a full website?
Isn't it already? Use an HTTP debugging tool like Fiddler. As I read from your code, your listenerSocket is supposed to listen on port 80. Your browser will first retrieve the results of the getWebsite call and parse the HTML.
Then it'll fire more requests, as it finds CSS and JS references in your HTML (none shown). These requests will, as far as we can see from your code, again receive the results of the getWebsite call.
You'll need to parse the incoming HTTP request to see what resource is being requested. It'll become a lot easier if the .NET implementation you run supports the HttpListener class (and it seems to).
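A minimal sketch of that idea, assuming the CSS/JS files sit next to index.html on the SD card; the request parsing and content-type mapping are illustrative, and the method is meant to be called from the accept loop in the question in place of the single WebPage.startHTML response:
// Illustrative helper: serve whatever file the browser asked for from the SD card.
using System.IO;
using System.Net.Sockets;

public static void ServeRequestedFile(Socket clientSocket, string request)
{
    // The first request line looks like "GET /app.css HTTP/1.1".
    string firstLine = request.Substring(0, request.IndexOf('\n'));
    string path = firstLine.Split(' ')[1];
    if (path == "/") path = "/index.html";

    // Very small content-type map; extend as needed (images, fonts, ...).
    string contentType = "text/html";
    if (path.IndexOf(".css") >= 0) contentType = "text/css";
    else if (path.IndexOf(".js") >= 0) contentType = "application/javascript";

    byte[] body;
    using (FileStream fs = new FileStream(@"\SD" + path.Replace('/', '\\'), FileMode.Open, FileAccess.Read))
    {
        body = new byte[(int)fs.Length];
        fs.Read(body, 0, body.Length);
    }

    // Send a minimal HTTP response header followed by the file contents.
    string header = "HTTP/1.1 200 OK\r\nContent-Type: " + contentType +
                    "\r\nContent-Length: " + body.Length + "\r\nConnection: close\r\n\r\n";
    clientSocket.Send(System.Text.Encoding.UTF8.GetBytes(header));
    clientSocket.Send(body);
}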

Getting WP8 web requests to be synchronous

I am trying to port some code from a Windows Forms application to WP8, and have run into some issues regarding asynchronous calls.
The basic idea is to do some UAG authentication. In the Windows Forms code, I do a GET on the portal homepage and wait for the cookies. I then pass these cookies into a POST request to the validation URL of the UAG server. It all works fine in the Forms app, since all the steps are sequential and synchronous.
Now, when I started porting this to WP8, the first thing I noticed was that GetResponse() isn't available; instead I had to use BeginGetResponse(), which is asynchronous and calls a callback function. This is no good for me, since I need to ensure this step finishes before I do the POST.
My Windows Forms code looks like this (taken from http://usingnat.net/sharepoint/2011/2/23/how-to-programmatically-authenticate-to-uag-protected-sharep.html):
private void Connect()
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(this.Url);
    request.CookieContainer = new CookieContainer();
    request.UserAgent = this.UserAgent;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        // Get the UAG generated cookies from the response
        this.Cookies = response.Cookies;
    }
}

private void ValidateCredentials()
{
    // Some code to construct the headers goes here...
    HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create(this.ValidationUrl);
    postRequest.ContentType = "application/x-www-form-urlencoded";
    postRequest.CookieContainer = new CookieContainer();
    foreach (Cookie cookie in this.Cookies)
    {
        postRequest.CookieContainer.Add(cookie);
    }
    postRequest.Method = "POST";
    postRequest.AllowAutoRedirect = true;
    using (Stream newStream = postRequest.GetRequestStream())
    {
        newStream.Write(data, 0, data.Length);
    }
    using (HttpWebResponse response = (HttpWebResponse)postRequest.GetResponse())
    {
        this.Cookies = response.Cookies;
    }
}

public CookieCollection Authenticate()
{
    this.Connect();
    this.ValidateCredentials();
    return this.Cookies;
}
The thing is, this code relies on synchronous operation (first call Connect(), then ValidateCredentials()), and it seems WP8 does not support that for web requests. I could combine the two functions into one, but that won't fully solve my problem, since later on this needs to be expanded to access resources behind the UAG, so it needs a modular design.
Is there a way to "force" synchronization?
Thanks
You can still continue your steps in the callback function using the asynchronous model. Or you can use the new HttpClient, which can be used with the await keyword so you can write your code in a synchronous style.
You can get HttpClient through NuGet:
install-package Microsoft.Net.Http
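A rough sketch of what the two steps could look like with HttpClient and await; the Url/ValidationUrl properties come from the question, while the form field names are placeholders, not the real UAG parameter names:
// Sketch: the Connect + ValidateCredentials flow with HttpClient and await.
// A shared CookieContainer on the handler replaces the manual cookie copying.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

private readonly CookieContainer cookies = new CookieContainer();

public async Task<CookieCollection> AuthenticateAsync()
{
    var handler = new HttpClientHandler { CookieContainer = cookies };
    using (var client = new HttpClient(handler))
    {
        // Step 1: GET the portal page so UAG sets its cookies.
        await client.GetAsync(this.Url);

        // Step 2: POST the credentials; the handler sends the stored cookies automatically.
        var form = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("user", username),     // placeholder field names
            new KeyValuePair<string, string>("password", password)
        });
        await client.PostAsync(this.ValidationUrl, form);
    }
    return cookies.GetCookies(new Uri(this.Url));
}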

Load full website in WinRT

I want to load the Kepler reference page with HttpClient like this:
string resourceAddress = _url;
HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, resourceAddress);
HttpClient httpClient = new HttpClient();
// Do not buffer the response:
HttpResponseMessage response = new HttpResponseMessage();
response = await httpClient.SendAsync(request, HttpCompletionOption.ResponseContentRead);
using (Stream responseStream = await response.Content.ReadAsStreamAsync())
{
    int read = 0;
    byte[] responseBytes = new byte[(Int32)responseStream.Length];
    do
    {
        read = await responseStream.ReadAsync(responseBytes, 0, responseBytes.Length);
    } while (read != 0);
}
But I think the page won't be loaded completely, i.e. without all images, iframes, etc.
Downloading just the first piece of HTML is rarely going to be enough to give you all the elements of the page, even if you parse it and fetch all the linked images etc. There are also CSS and JavaScript that bring new content into view when you open a page in a browser, and getting all of this yourself is going to be an effort similar to implementing your own browser. Your best bet would be either to just load the page once in a WebView control and let it cache its content, or to use a WebView and scan the DOM to try to get all the elements. You could also write a web service that downloads the page for you and delivers the whole package... assuming that the page doesn't require authentication.
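If you go the WebView route, a minimal sketch of pulling the rendered DOM out of the control; this assumes the Windows 8.1 XAML WebView with InvokeScriptAsync (on the original Windows 8 WebView the call is the synchronous InvokeScript instead):
// Sketch: navigate a WebView and read back the rendered DOM once navigation completes.
webView.NavigationCompleted += async (sender, args) =>
{
    if (!args.IsSuccess) return;

    // "eval" lets us run arbitrary script in the page and get the result back as a string.
    string html = await sender.InvokeScriptAsync("eval",
        new[] { "document.documentElement.outerHTML;" });

    // html now contains the DOM after the rendering engine has applied images/CSS/JS.
};
webView.Navigate(new Uri(_url));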