Scraping - catch json response with python - json

I need to scrape a website that has a "load more" button.
I need to catch the JSON response (which is not visible in the HTML source) and parse it to build URLs.
The data arrives as the response to a JSON POST request.
I'm using Selenium with Python.
How do I do this?
Thanks

You can skip clicking the "load more" button altogether: find the API call the website sends when the button is clicked, then send that call yourself through Selenium. Because the request runs inside the browser session, you can capture the response. Here's what I've been using on an Angular website. You'll have to adapt it to the website you're scraping, but it should get you started.
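# `driver` is an existing Selenium WebDriver with the target page already loaded
call = """
var $http = angular.element(document.body).injector().get('$http');
// With no extra arguments, Selenium's completion callback is arguments[0]
var done = arguments[0];
$http({
    method: 'POST',
    headers: {
        "Content-Type": "application/json"
    },
    data: {
        foo: "bar"
    },
    url: "https://request.url/"
}).then(response => done(response.data));  // hand only the parsed JSON body back
"""
json_response = driver.execute_async_script(call)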
The execute_async_script method will make the call and wait for a JSON response.
You can also right-click the XHR request in Chrome DevTools and copy the API call, which should make it easier to recreate with Selenium.
Let me know if you have follow-up questions.
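Once `json_response` is back in Python, building URLs from it is plain dictionary work. A minimal sketch, assuming the payload has an `items` list whose entries carry a relative `path` field; both key names and the base URL are hypothetical and must be matched to what the site actually returns:

```python
from urllib.parse import urljoin

def build_urls(payload, base_url):
    """Return absolute URLs for every entry in payload["items"].

    The "items" and "path" keys are assumptions about the response shape.
    """
    urls = []
    for item in payload.get("items", []):
        path = item.get("path")
        if path:  # skip entries that carry no path
            urls.append(urljoin(base_url, path))
    return urls

# Stand-in for the value returned by driver.execute_async_script(call)
sample_response = {"items": [{"path": "/product/1"}, {"path": "/product/2"}, {}]}
urls = build_urls(sample_response, "https://request.url/")
```

Entries without the expected key are skipped rather than raising, which is usually more convenient while the response format is still being explored.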

Related

Flask API can receive posts from AJAX but not Postman

We have a Flask API that talks to multiple sources: a web app and an external source. In the web app, we use AJAX to send a JSON POST to the API, which succeeds. From an external source, whether it's Postman or the VaRest Unreal Engine plugin, we get a 400 Error: Bad Request even though we use the correct Content-Type header.
If anyone can help us figure out why the posts we are sending aren't properly identified we would really appreciate it.
Thanks
This is the JS code from our web app, used to create the JSON which is sent through AJAX (this is the successful code)
var but1 = document.getElementById('but1');
const data1 = {
    number: 1,
    type: 1,
    value: 100
};
but1.addEventListener("click", function() {
    $.post(url, data1);
});
This is a POST route in our Python API that takes the input and saves it to a file:
from flask import Flask, request

app = Flask(__name__)

@app.route('/button', methods=['POST'])
def button():
    # Overwrite the log file with the submitted "number" field
    with open("buttonLog.txt", "w") as buttonLog:
        buttonLog.write(request.form['number'])
    typeOf = int(request.form['type'])
    value = int(request.form['value'])
    return "success"
Here is our JSON POST from Postman, with headers (screenshots: Postman JSON, Postman Headers).
The AJAX POST works as intended, but the Postman and Unreal Engine POSTs are not being recognized as POSTs by the API.
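A likely cause, offered as a sketch rather than a confirmed diagnosis: `$.post(url, data1)` sends the object form-encoded (`application/x-www-form-urlencoded`), which is what `request.form` reads, whereas a JSON body never populates `request.form`, and Flask answers a missing form key with 400 Bad Request. The standard-library snippet below shows how the two body encodings differ and how a handler could accept either; `parse_body` is a hypothetical helper, not part of Flask:

```python
import json
from urllib.parse import parse_qs

def parse_body(content_type, raw_body):
    """Decode a request body (given as text) into a dict, based on its media type."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(raw_body)
    if media_type == "application/x-www-form-urlencoded":
        # parse_qs returns a list per key; keep the first value of each
        return {key: values[0] for key, values in parse_qs(raw_body).items()}
    raise ValueError("unsupported content type: " + content_type)

# What jQuery's $.post(url, data1) puts on the wire
form_fields = parse_body("application/x-www-form-urlencoded", "number=1&type=1&value=100")
# What a JSON POST from Postman puts on the wire
json_fields = parse_body("application/json", '{"number": 1, "type": 1, "value": 100}')
```

Note that form values arrive as strings while JSON values keep their types. In Flask itself, a JSON body is read with `request.get_json()` rather than `request.form`.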

How to update json file in AngularJS

I want to update the existing json file in AngularJS
JSON file:
{
    "text1": "Click here to edit!",
    "text2": "Click here to edit!",
    "text3": "Click here to edit!",
    "text4": "Click here to edit!"
}
I want to change one value in this JSON file, for example:
text1: "Abc"
and save the change back to the JSON file.
You cannot update a JSON file on the server from browser-side code alone; you need a server-side language such as PHP or Python. Browsers restrict this for security reasons. For background, see
https://docs.angularjs.org/guide/security
https://docs.angularjs.org/api/ng/directive/ngCsp
and
https://developer.mozilla.org/en-US/docs/Web/Security/CSP
Imagine you have getjson.php and savejson.php in the server which work exactly as their names suggest.
Now use $http service of Angular to retrieve your json from the server.
$http.get("getjson.php").then(function(response){
    $scope.myJsonObject = response.data;
    // Your JSON becomes a JS object here. Change it the way you want
    $scope.myJsonObject.text1 = "Abc";
});
Use $http service again to send your json back to the server.
$http({
    method: "post",
    url: "savejson.php",
    data: $scope.myJsonObject,
    headers: { 'Content-Type': 'application/json; charset=utf-8' }
});
These are the basics. Note that you still need to write the PHP part that saves and loads your JSON file. You should also handle errors from the $http service.
Please see how $http service and promises work.
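The thread leaves the server side to the reader. As a rough Python counterpart to the imagined `getjson.php` and `savejson.php`, the sketch below loads a JSON file, applies an update, and writes it back. The function names and the file name are assumptions, and wiring the functions to HTTP endpoints is left out:

```python
import json
import os
import tempfile

def load_json(path):
    """Read and parse a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def save_json(path, data):
    """Write to a temporary file first, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    os.replace(tmp_path, path)

# Demo in a throwaway directory; "texts.json" is a made-up file name
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "texts.json")
save_json(demo_path, {"text1": "Click here to edit!", "text2": "Click here to edit!"})

texts = load_json(demo_path)
texts["text1"] = "Abc"
save_json(demo_path, texts)
```

Writing through a temporary file means a failed write leaves the previous version of the file untouched.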

AngularJS POST json to SilverStripe API

I wrote a pretty basic API using a SilverStripe module (here) and while building it I was testing using Postman and Advanced REST Client chrome extension so I know the endpoints work.
Now, when I try to reach the endpoints from Angular (POSTing JSON), the API tells me that required values aren't set. I did a little digging and used Firebug to compare the request headers from Postman with the ones from Angular. The only noticeable discrepancy is that in Postman the Content-Type header is:
"application/json"
and for Angular it's:
"application/json; charset=UTF-8"
Here's the code for the Angular POST (pretty basic):
notebookFactory.addNotebook = function(title) {
    var notebook = "Testing Title 23!";
    var message = {
        Title: notebook
    };
    return $http({
        url: 'api/notebook',
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        data: message
    });
};
The API error says: "The JSON property Title is required"
Is this something that could make a difference? I've tried adding charset=UTF-8 to the end of the Content-Type in Postman and receive the same error. Is there any way to remove the charset from the Angular POST header?
Let me know if you need any more info and thanks in advance!
This turned out to be a small issue in the RESTful API module's JSON handling code.
The developer has already made the fix; see: https://github.com/pstaender/silverstripe-restful-api/issues/1
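The thread does not say what the fix was. Independent of that module, the two header values in the question name the same media type; `charset` is only a parameter. A sketch of how a server can compare the media type while ignoring parameters (`media_type` and `is_json` are hypothetical helpers, not part of any library mentioned here):

```python
def media_type(content_type):
    """Return the lowercase type/subtype of a Content-Type value, without parameters."""
    return content_type.split(";", 1)[0].strip().lower()

def is_json(content_type):
    """True when the header names application/json, with or without a charset."""
    return media_type(content_type) == "application/json"

postman_header = "application/json"
angular_header = "application/json; charset=UTF-8"
both_are_json = is_json(postman_header) and is_json(angular_header)
```

A server that compares the raw header string against "application/json" instead would treat the two requests differently.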

Playframework handling post request

In my routes:
POST /forms/FormValidator1/validateForm controllers.FormValidator1.validateForm(jsonForm:String)
There is a controller method defined for that route:
def validateForm(jsonForm:String) = Action { ...
Then I try to send a POST request with the Postman Chrome plugin.
I use:
url: http://localhost:9000/forms/FormValidator1/validateForm
headers: Content-Type: application/json
json data: {name: "me", surname: "my"}
When I send this POST request, it never reaches the controller method through the route/URL above. Why?
UPDATE:
Interestingly, after I got it working on my laptop (see my answer below), pushed it to GitHub and pulled it onto another machine, it started behaving differently. Now it complains with Bad Request [Invalid XML], even though I use the "application/json" header and did not change a single line of code after the commit. I wonder whether it is a bug.
It seems I got it.
Here:
https://groups.google.com/forum/#!topic/play-framework/XH3ulCys_co
And here:
https://groups.google.com/forum/#!msg/play-framework/M97vBcvvL58/216pTqm22HcJ
The wrong way and the correct way are explained there:
Doesn't work: curl -d "name=sam" http://localhost:9000/test
Works: curl -d "" http://localhost:9000/test?name=sam
This is how POST parameters are passed in Play (the second link explains why):
Sometimes you have to make compromises. In Play 1 you could bind your
action parameters from any parameter extracted from the URL path,
query string or even the request body. It was highly productive but
you had no way to control the way the form was uploaded. I mean, if a
user uploads a big file you needed to load the entire request in
memory to be able to handle it.
In Play 2 you can control the request body submission. You can reject
it early if something is wrong with the user, you can process big
files or streams without filling your memory with more than one HTTP
chunk… You gain a high control of what happens and it can help you to
scale your service. But, the other side of the coin is that when a
request is routed, Play 2 only uses the request header to make its
decision: the request body is not available yet, hence the inability
to directly bind an action parameter from a parameter extracted from
the request body.
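The two curl commands differ only in where `name=sam` travels. The sketch below uses Python's standard library to build both requests without sending them (the URL is the one from the curl examples). In the first, the parameter sits in the body, which per the quote above is not available when Play 2 routes the request; in the second, it sits in the query string, which is part of the request line and therefore available at routing time.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = urlencode({"name": "sam"})

# Equivalent of: curl -d "name=sam" http://localhost:9000/test
body_request = Request(
    "http://localhost:9000/test",
    data=params.encode("ascii"),
    method="POST",
)

# Equivalent of: curl -d "" "http://localhost:9000/test?name=sam"
query_request = Request(
    "http://localhost:9000/test?" + params,
    data=b"",
    method="POST",
)

# Show where the parameter ended up in each request
print(body_request.full_url, body_request.data)
print(query_request.full_url, query_request.data)
```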
UPDATE 2
So I fixed it like this:
On the Angular side (dataType and headers can even be commented out):
var data = $scope.fields
$http({
    url: '/forms/FormValidator1/validateForm',
    method: "POST",
    //dataType: "json",
    data: data,
    //headers: {'Content-Type': 'application/json'}
}).success(function (data, status, headers, config) {
    console.log("good")
}).error(function (data, status, headers, config) {
    console.log("something wrong")
});
On the Play Framework side (use a BodyParser):
def validateForm = Action { request =>
  val body: AnyContent = request.body
  val jsonBody: Option[JsValue] = body.asJson
  // Expecting a JSON body
  jsonBody.map { jsValue =>
    val name = (jsValue \ "name")
    val surname = (jsValue \ "surname")
    ....
  }
}
Routes (don't define any parameters at all!):
POST /forms/FormValidator1/validateForm controllers.FormValidator1.validateForm

Chrome API responseHeaders

Based on this documentation: https://developer.chrome.com/extensions/webRequest.html#event-onHeadersReceived
I tried to display the response via the console like:
console.log(info.responseHeaders);
But it's returning undefined.
This works, though:
console.log("Type: " + info.type);
Please help, I really need to get the responseHeaders data.
You have to request the response headers like this:
chrome.webRequest.onHeadersReceived.addListener(function(details){
    console.log(details.responseHeaders);
},
{urls: ["http://*/*"]},["responseHeaders"]);
An example of use. This is one instance of how I use the webRequest api in my extension. (Only showing partial incomplete code)
I need to access some server data indirectly, and I do that by making use of a 302 redirect page. I send a HEAD request to the desired URL like this:
$.ajax({
    url: url,
    type: "HEAD",
    success: function(data,status,jqXHR){
        //If this was not a HEAD request, `data` would contain the response
        //But in my case all I need are the headers so `data` is empty
        comparePosts(jqXHR.getResponseHeader('redirUrl')); //where I handle the data
    }
});
And then I silently kill the redirect while scraping the location header for my own uses using the webRequest api:
chrome.webRequest.onHeadersReceived.addListener(function(details){
    if(details.method == "HEAD"){
        var redirUrl;
        //Filter instead of splicing inside forEach, which would skip the next header
        var headers = details.responseHeaders.filter(function(v){
            //Header names are case-insensitive
            if(v.name.toLowerCase() == "location"){
                redirUrl = v.value;
                return false;
            }
            return true;
        });
        headers.push({name:"redirUrl",value:redirUrl});
        return {responseHeaders:headers}; //I kill the redirect
    }
},
{urls: ["http://*/*"]},["responseHeaders","blocking"]);
I actually handle the data inside the onHeadersReceived listener, but this way shows where the response data would be.
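The header rewrite in the listener does not depend on the Chrome API and is easy to check in isolation. A Python sketch of the same transformation, assuming headers arrive as a list of name/value pairs like `details.responseHeaders`; it matches `Location` case-insensitively, since HTTP header names are case-insensitive, and builds a new list instead of removing entries while iterating (`rewrite_redirect` is a hypothetical name):

```python
def rewrite_redirect(headers):
    """Move the Location header's value into a custom redirUrl header."""
    redirect_url = None
    kept = []
    for header in headers:
        if header["name"].lower() == "location":
            redirect_url = header["value"]  # remember the target, drop the header
        else:
            kept.append(header)
    if redirect_url is not None:
        kept.append({"name": "redirUrl", "value": redirect_url})
    return kept

# Made-up sample headers for illustration
original = [
    {"name": "Content-Type", "value": "text/html"},
    {"name": "location", "value": "http://example.com/target"},
]
rewritten = rewrite_redirect(original)
```

Unlike the listener, this version adds `redirUrl` only when a redirect target was actually found.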