VB.net HTTP GET in chunks - html

I want to get some HTTP source with VB.net. The page is huge though, so it would suit me to pull it in chunks. I'm also already reading about the VB.net background worker.
So far I'm using this code: HTTP GET in VB.NET but my program stalls while it loads the page, for a decent amount of time. The page to get is http://ftp.drupal.org/files/projects/, but don't go there unless you absolutely have to.
Should I stick with the background worker, and/or is there a way to split up the HTTP request?

You have to use WebClient.DownloadStringAsync method to read without blocking the calling thread. (MSDN reference link).

Related

Is there a way to scrape data from a website that is not available in the page's source?

What are the few things that I'll have to include in my code that will point me in the right direction?
For Example this website
Open your browser's debugger on Network tab and observe what are the requests when site is loading dynamic content (when you click). You'll see it's getting all the data using some API, for example: https://www.bestfightodds.com/api?f=ggd&b=3&m=16001&p=2
You can download all the data by changing parameters in this URL.
Usually that's enough but here it's more tricky as the data returned by the server is somehow encoded and not easily readable. You'd have to debug its javascript to find function which is used to decode this data before you can parse it.

how to get dynamic table data in html in reponse in JMeter

I am using the JMeter3.0. In my project I visited pages where the dynamic table contents are displayed as in the response.
Actually tabular format is showing, but data are not showing, and I would require data to extract values from those.
Can someone help me our here?
If you don't see the full response data it may mean that the data is being populated using i.e. AJAX technology by secondary JavaScript request(s).
As per JMeter Project main page:
JMeter is not a browser, it works at protocol level. As far as web-services and remote services are concerned, JMeter looks like a browser (or rather, multiple browsers); however JMeter does not perform all the actions supported by browsers. In particular, JMeter does not execute the Javascript found in HTML pages. Nor does it render the HTML pages as a browser does (it's possible to view the response as HTML etc., but the timings are not included in any samples, and only one sample in one thread is ever displayed at a time).
so if your table is being populated via AJAX requests you need to simulate these requests somehow and get data from their responses. AJAX requests can be recorded using HTTP(S) Test Script Recorder, but when it comes to replaying them you need to do it a little bit differently comparing to "normal" sequential HTTP Requests, see How to Load Test AJAX/XHR Enabled Sites With JMeter article to learn how AJAX requests can be handled in JMeter tests.

Where is the Data stored on Website

I am at this website -
http://www.zoominfo.com/s/#!search/company/1.64.eyJjb21wYW55TmFtZSI6xIB2YWx1xIw6ImEiLCJpc1VzZWTEjXRyxJN9fQ%3D%3D
If you see the company name - Agilent Technologies Inc.
Its neither there in page source, nor in any json format.
But it does show in the Dom of Chrome Developer tool.
I have looked and analysed almost every requests that it sent, but still couldn't find where this data is saved.
By where the data is saved - I am looking to find where I can scrape that data from?
If by using python-requests and BeautifulSoup
I do see an XMLHTTPREQUEST made, not sure what that means, or if that is the clue to my answer.
I am still learning python, and it would be a very useful information if someone helps me with this.
Thanks in advance.
After the HTML is loaded, js requests for the data through an XMLHTTPREQUEST which is loaded right after the request is received on your client. That's why you see the DOM element right there using element inspector.
You didn't mention what goal you want to achieve or what tool you are using. Please be specific on your question. If you do not have any idea about this kind of pattern, google out angularjs, see some example.
do see an XMLHTTPREQUEST made, not sure what that means, or if that is the clue to my answer.
It means that javascript embedded in the page is sending an extra HHTP request to the web server. It is likely that the "Agilent Technologies Inc." text is being returned in the server's response to that request, and the javascript in the page is then injecting the text into the DOM in the appropriate place.
Where is the Data stored on Website
That is a completely different question ...
(You have already noted that the data (e.g. the company name) gets injected into the page displayed by your browser.)
On the server side, the data could be stored in the web server (or its back-end systems) in a variety of ways. Or it might not be stored at all. There is no way of knowing ... without looking at the server-side code and configurations.

Web Request Listener

I would like to write a plugin that calculates a cryptographic hash of every file rendered in the browser (images, html, css, etc), without causing a second get request for the various files. Ideally I could listen for each resource being loaded and get a copy of the bytestream that comes back.
Does anyone know if such a hook exists / what it would be called?
Thanks!
Not at the moment.
There is a feature request to allow (at least) read access to response body in chrome.webRequest.onCompleted, but it's not currently implemented.

Drupal 7 (VERY) Custom Preview

I have a drupal site that is being used strictly as a CMS that produces JSON feeds using services and services_views, which are consumed by a separate site. What I would like to do (and I have a working proof of concept of this) is allow for a "live preview" on the real site, by intercepting the node form preview / submit, encoding the node as JSON, and loading a special page on the live site that consumes that JSON and displays the page accordingly.
The problem with this JSONized node is, it's different from the JSON being produced by my view (using services_views). My end goal is to produce JSON that is identical for both previewed and non-previewed objects, without having to maintain separate output methods (I could easily hand-customize the json but then when my view for the public api changes I have to make the same changes to the preview json. Trying to avoid this).
I'm looking for feedback on this approach. Is what I'm attempting even possible? The ideas I've been able to come up with so far are:
being able to (conditionally) drive my view with data from a non-databse source
sneakily inserting data into the view object during one of the stages of execution? Kludgy but I'm not above that :)
saving a "clone" node (or revision?) of the node being previewed and let the view use that to display the preview JSON?
Maybe this is the wrong approach altogether and there's something better? (Trying to intercept and format the services output in my module... maybe avoid services_views altogether?)
If anyone can offer some advice, insight or opinions on how to best proceed here, I'd be really grateful.
in a custom module, you could set up a page that grabs the json output from the view page.
$JSON = file_get_contents($url);
that way the preview stays bound to the view, even if the view changes.
First I think it's not an easy task what you are trying to achieve. So before all, good luck.
I think you could intercept the node submission data, then create a node programatically, then render that node, and then export the rendered node to JSON. Inmediately after you get the JSON, delete this node, because the programmatically created node is only for preview.
This task could be more CPU demanding but think that previewing content exactly as the content will look is difficult.
Your rss feeds that your site reads could be filtered with some parameter to avoid programmatically created nodes (prewiew nodes), despite these nodes will be available for a very short time.