I am working on a project that involves finding out which HTTP requests were made by the user.
I have all the HTTP request and response headers (but not the data), and I need to determine which content was requested by the user and which content was fetched automatically (e.g. ad pages, background streaming, and all sorts of irrelevant content).
When recording network traffic (even for a short period), a lot of content gets generated, and most of it is not relevant.
Since I'm no expert in HTTP, I'd like some directions as to which headers I can safely rely on (assuming most web pages send them), and which headers might be omitted, so that it would not be safe to rely on them.
My current idea:
Find all the HTML files, determine which of them were the main HTML files (no referrer, or a search-engine referrer), then recursively mark every file requested by those HTML files onward as relevant, and discard the rest.
The problem with this is that I've been told the Referer header can't be trusted, and I have no idea how to identify which HTML files were actually clicked by the user.
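For concreteness, the marking pass I have in mind would look roughly like the sketch below. The request records and their field names (:url, :referer, :content_type) are just placeholders for whatever I end up parsing the recorded headers into, so please treat it as untested pseudocode rather than a working tool.

# Hypothetical log format: an array of hashes built from the recorded headers.
# requests = [{ url: '...', referer: '...', content_type: 'text/html' }, ...]
def relevant_urls(requests)
  # Roots: HTML responses with no referrer, or a search-engine referrer.
  roots = requests.select do |r|
    r[:content_type].to_s.include?('text/html') &&
      (r[:referer].nil? || r[:referer] =~ /google|bing|duckduckgo/i)
  end

  relevant = roots.map { |r| r[:url] }

  # Recursively mark everything referenced (via Referer) by an already
  # relevant URL; stop when a pass adds nothing new.
  loop do
    newly_marked = requests
      .select { |r| relevant.include?(r[:referer]) && !relevant.include?(r[:url]) }
      .map { |r| r[:url] }
    break if newly_marked.empty?
    relevant.concat(newly_marked)
  end

  relevant
end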
Any kind of help will be appreciated. Sorry if the post is not formatted well; this is my first question here.
EDIT:
I've been told the question isn't clear enough, so to restate: all I'm asking for is some way to determine which requests were triggered by the user and which requests were made automatically.
To determine which request was sent by the user themselves, you should look at the first request sent over the connection and examine its response body.
All external files referenced in that first body, which are then subsequently sent to the user, are most likely fetched automatically, without the user's interaction.
The time elapsed between requests could also be a factor worth looking at.
Another thing, which you already mentioned yourself, would be looking at the Referer header. As far as RFC 2616, section 14.36, goes it can be trusted: the Referer header must not be sent if the Request-URI was obtained from user input. However, automatically fetched content may also arrive without a Referer header, since the header is optional.
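If you want to combine the timing idea with the Referer check, a heuristic along the following lines might be a starting point. The two-second threshold and the record format (:time, :referer) are assumptions on my part, not tested values, so adjust them against your own captures.

# Treat a request as user-initiated if it starts a new "burst": it arrives
# more than GAP seconds after the previous request and carries no Referer.
# `requests` is assumed to be sorted by time.
GAP = 2.0

def user_initiated?(request, previous)
  return true if previous.nil?                 # the very first request
  (request[:time] - previous[:time]) > GAP && request[:referer].nil?
end

def user_requests(requests)
  requests.each_with_index
          .select { |req, i| user_initiated?(req, i.zero? ? nil : requests[i - 1]) }
          .map(&:first)
end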
I am trying to get started with REST API calls by seeing how to format the API calls using a browser. Most examples I have found online use SDKs or just return all fields for a request.
For example, I am trying to use the Soundcloud API to view track information.
To start, I've made a simple request in the browser as follows http://api.soundcloud.com/tracks/13158665.json?client_id=31a9f4a3314c219bd5c79393a8a569ec which returns a bunch of info about the track in JSON format
(e.g. {"kind":"track","id":13158665,"created_at":"2011/04/06 15:37:43 ...})
Is it possible to get back only the "created_at" value using the browser? I apologize if this question is basic, but I don't know what keywords to search for online. Links to basic guides would be nice, although I would prefer to avoid using a specific SDK for the time being.
In fact, it's really hard to answer such a question, since it depends on the Web API. If the API supports returning only a subset of fields, you can do it; if not, you will receive the full content. From what I saw in the documentation, it's not possible here: the filters only allow you to select a subset of elements, not to control the list of fields returned within each element.
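If you just need that one value while experimenting, a workaround is to fetch the full JSON and pick the field out yourself outside the browser. A small Ruby sketch, reusing the exact URL from your question (untested; the field extraction happens on your side, not on the API's):

require 'open-uri'
require 'json'

# Full track JSON, as returned by the endpoint from the question.
url = 'http://api.soundcloud.com/tracks/13158665.json?client_id=31a9f4a3314c219bd5c79393a8a569ec'
track = JSON.parse(open(url).read)

# Keep only the field you care about.
puts track['created_at']   # => "2011/04/06 15:37:43 ..."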
Note that there is a great application for executing HTTP (and REST) requests in Chrome: Postman. It lets you execute all HTTP methods, not only GET, control the headers and the sent content, and see what comes back.
If you use Firefox, Firebug provides something similar.
Finally, you could have a look at this link for hints about the way Web APIs work and are designed: https://templth.wordpress.com/2014/12/15/designing-a-web-api/.
Hope this helps and answers your question,
Thierry
Straight from the browser bar you can use REST endpoints that respond to a GET message. That is what you are doing when you hit that URI: you are sending an HTTP GET message to the server, and it is sending back JSON.
You are not always guaranteed JSON, or anything at all, when hitting a known REST endpoint. What each endpoint returns for a GET is specific to how it was built. In this case it is built to return JSON, but some endpoints return an HTML page instead. In my personal experience, most endpoints that return JSON expect you to process that object programmatically and don't give you many options for extracting a specific field of the JSON. Here is a good link on how to process JSON using JavaScript.
You can use REST clients (such as the Advanced REST Client for Chrome) to craft HTTP POST and PUT requests if a specific REST endpoint is built to receive data and do something with it. For example, a lot of wiki-style REST endpoints will let you create a page with a specifically crafted HTTP POST carrying particular header information, URI parameters, or a JSON body.
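Outside the browser, crafting that kind of request is also easy to script. Here is a rough Ruby illustration of a POST with custom headers and a JSON body; the endpoint, token, and payload are invented for the example, not taken from any real API:

require 'net/http'
require 'json'
require 'uri'

# Hypothetical wiki-style endpoint that accepts a JSON payload.
uri = URI('https://example.com/api/pages')

request = Net::HTTP::Post.new(uri)
request['Content-Type']  = 'application/json'
request['Authorization'] = 'Bearer YOUR_TOKEN'          # placeholder
request.body = { title: 'New page', body: 'Hello' }.to_json

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
puts response.code
puts response.body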
You can install the DHC client app in Chrome and send requests such as PUT or GET.
I've seen articles and posts all over (including SO) on this topic, and the prevailing commentary is that same-origin policy prevents a form POST across domains. The only place I've seen someone suggest that same-origin policy does not apply to form posts, is here.
I'd like to have an answer from a more "official" or formal source. For example, does anyone know the RFC that addresses how same-origin does or does not affect a form POST?
Clarification: I am not asking whether a GET or POST can be constructed and sent to any domain. I am asking:
whether Chrome, IE, or Firefox will allow content from domain 'Y' to send a POST to domain 'X';
whether the server receiving the POST will actually see any form values at all. I say this because the majority of online discussion records testers saying the server received the POST, but the form values were all empty / stripped out;
which official document (i.e. RFC) explains the expected behavior (regardless of what the browsers currently implement).
Incidentally, if same-origin does not affect form POSTs, then it makes it somewhat more obvious why anti-forgery tokens are necessary. I say "somewhat" because it seems too easy to believe that an attacker could simply issue an HTTP GET to retrieve a form containing the anti-forgery token, and then make an illicit POST containing that same token. Comments?
The same-origin policy is applicable only to browser-side programming languages. So if you try to post to a different server than the origin server using JavaScript, the same-origin policy comes into play, but if you post directly from the form, i.e. the action points to a different server, like:
<form action="http://someotherserver.com" method="post">
and there is no JavaScript involved in posting the form, then the same-origin policy is not applicable.
See Wikipedia for more information.
It is possible to build an arbitrary GET or POST request and send it to any server accessible to a victim's browser. This includes devices on your local network, such as printers and routers.
There are many ways of building a CSRF exploit. A simple POST-based CSRF attack can be sent using the .submit() method. More complex attacks, such as cross-site file upload CSRF attacks, exploit CORS usage of the xhr.withCredentials behavior.
CSRF does not violate the same-origin policy for JavaScript, because the SOP is concerned with JavaScript reading the server's response to a client's request. CSRF attacks don't care about the response; they care about a side effect, or state change, produced by the request, such as adding an administrative user or executing arbitrary code on the server.
Make sure your requests are protected using one of the methods described in the OWASP CSRF Prevention Cheat Sheet. For more information about CSRF, consult the OWASP page on CSRF.
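As one illustration of those methods, here is a minimal sketch of the synchronizer-token pattern in Ruby/Sinatra. The route names and form field are hypothetical, and it leaves out everything except the token check; refer to the cheat sheet for the full set of recommendations.

require 'sinatra'
require 'securerandom'

enable :sessions

get '/form' do
  # Generate a per-session token and embed it in the form.
  session[:csrf] ||= SecureRandom.hex(32)
  <<~HTML
    <form action="/transfer" method="post">
      <input type="hidden" name="csrf_token" value="#{session[:csrf]}">
      <input type="submit" value="Transfer">
    </form>
  HTML
end

post '/transfer' do
  # Reject the request unless the submitted token matches the one in the session.
  submitted = params['csrf_token'].to_s
  halt 403, 'CSRF token mismatch' unless Rack::Utils.secure_compare(session[:csrf].to_s, submitted)
  'State-changing action performed'
end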
The same-origin policy has nothing to do with sending a request to another URL (different protocol, domain, or port).
It is all about restricting access to (reading) the response data from another URL.
So JavaScript code within a page can post to an arbitrary domain, or submit forms within that page to anywhere (unless the form is in an iframe with a different URL).
But what makes these POST requests ineffective is that they lack anti-forgery tokens, so they are ignored by the other URL. Moreover, if the JavaScript tries to obtain those security tokens by sending an AJAX request to the victim URL, it is prevented from accessing that data by the same-origin policy.
A good example: here
And good documentation from Mozilla: here
Given a page like this, I am trying to extract all the answer text with a ruby web crawler.
I am using Nokogiri and search('div[#class="answer_content"]').inner_text to access the answers, but I can't seem to access all the text, even though I am logged in. About 200 words in, I get the message "sign up or log in to read full content."
Also, is this div class the correct one to use?
It seems to me that you need to authenticate your crawler. I did this a few weeks ago. I used a Firefox extension called Tamper Data, which allowed me to see the requests made between the browser and the server. In my case, the authentication was handled by a session id; I just had to grab it and pass it along with each request I made to the server.
But in your case, the authentication might be done a different way; you'll have to check for yourself. Anyway, I can give more detail if this isn't clear enough.
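Here is a rough sketch of the session-cookie approach with net/http. The login path, form parameters, and the div.answer_content selector are guesses based on your question; replace them with whatever Tamper Data (or your own inspection) shows for the real site.

require 'net/http'
require 'nokogiri'

base = URI('https://example.com')

Net::HTTP.start(base.host, base.port, use_ssl: true) do |http|
  # 1. Log in and capture the session cookie from the response.
  login = Net::HTTP::Post.new('/login')
  login.set_form_data('user' => 'me', 'password' => 'secret')
  resp = http.request(login)
  session_cookie = resp['Set-Cookie'].to_s.split(';').first   # e.g. "sessid=abc123"

  # 2. Replay that cookie on every subsequent request.
  page = Net::HTTP::Get.new('/questions/some-question')
  page['Cookie'] = session_cookie
  body = http.request(page).body

  doc = Nokogiri::HTML(body)
  puts doc.search('div.answer_content').map(&:inner_text)
end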
I have a form that is posted to an MVC3 controller, which then has to POST it on to an external URL. The browser needs to end up at that URL permanently, so I thought a permanent redirect would be perfect.
However, how do I send the form POST data with the redirect?
I don't really want to send another page down to the browser to do it.
Thanks
A redirect will always result in a GET, not a POST.
If the second POST doesn't need to come from the client, you can make the POST using HttpWebRequest from the server. Beware that the secondary POST may hold up the return of the client's request if the external server is down or running slowly.
A permanent redirect is wholly inappropriate here. First, it will not cause the form values to be resubmitted. Second, the semantics are all wrong: you would be telling the browser "do not request this URL again; instead, go here". However, you do want future submissions to keep going to your same URL.
Gaz's idea could work, and it involves only your server.
Alternatively, send back a form populated with the same submitted values and the external URL as its action, and use client-side code to submit it automatically.
I'm trying to parse a bunch of webpages from an adult website using Ruby:
require 'hpricot'
require 'open-uri'
doc = Hpricot(open('random page on an adult website'))
However, what I end up getting instead is that initial 'Site Agreement' page making sure that you're 18+, etc.
How do I get past the Site Agreement and pull the webpages I want? (If there's a way to do it, any language is fine.)
You're going to have to figure out how the site detects that a visitor has accepted the agreement.
The most obvious choice would be cookies. Most likely, when a visitor accepts the agreement, a cookie is sent to their browser, which is then passed back to the site on every subsequent request.
You'll have to make your script act like a visitor by accepting the cookie and sending it with every subsequent request. This will require some programming on your part: request the "accept agreement" page first, find the cookie, and store it for later use. It's likely that they don't use a specific cookie for the agreement, but rather store it in a session, in which case you just need to find the session cookie.
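A rough sketch of that, sticking with open-uri to stay close to the snippet in the question; the agreement URL, the 'agree' parameter, and the cookie handling below are guesses, so inspect the real site to see what it actually sets.

require 'hpricot'
require 'net/http'
require 'open-uri'

# 1. Hit the agreement endpoint (or submit its "I agree" form) and grab the cookie.
agreement = URI('http://example-adult-site.com/agreement')
response  = Net::HTTP.post_form(agreement, 'agree' => 'yes')
cookie    = response['Set-Cookie'].to_s.split(';').first   # e.g. "agreed=1" or a session id

# 2. Send that cookie back with every page you actually want to parse.
doc = Hpricot(open('http://example-adult-site.com/some/page', 'Cookie' => cookie))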
The 'Site Agreement' page probably has a link you have to click, or a form you have to submit, to tell the server you accept before it lets you proceed. Read the source of that page to be sure. You could send that same response from your application. I don't know how to do that in Ruby, but I've seen similar tasks done using cURL and libcurl, which can probably be used from Ruby.
Install the LiveHTTPHeaders plugin for Firefox and visit the site. Watch the headers and see what happens when you accept the agreement. You'll probably see that the browser sends some request (possibly a POST) and accepts some cookies. Then you'll have to repeat whatever the browser does in your Ruby script.