I tried to cache a JSON response from an API request using the ETag.
I'm calling something like http://localhost:3000/api/config and getting:
Response Headers:
Cache-Control:public, max-age=31557600
Connection:keep-alive
Content-Length:11
Content-Type:application/json; charset=utf-8
Date:Wed, 13 May 2015 11:41:52 GMT
ETag:"94d52736bcd99b1ac771f13b1bbdf622"
X-Powered-By:Express
Response: {id: 1}
I expected the browser to cache the response and to send the ETag with the next request triggered by F5, but this isn't the case.
Request Headers 2nd request:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate, sdch
Accept-Language:de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4
Cache-Control:no-cache
Connection:keep-alive
Host:localhost:3000
Pragma:no-cache
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36
So is it impossible to cache a pure JSON response returned by a direct API request? Or am I missing something?
The API is a Node.js test implementation done with Express:
var crypto = require('crypto');

router.get('/config', function(req, res) {
    var eTag = crypto.createHash('md5').update(JSON.stringify(config)).digest('hex');
    res.setHeader('ETag', '"' + eTag + '"');
    res.setHeader('Content-Type', 'application/json');
    res.setHeader('Cache-Control', 'public, max-age=31557600');
    res.send(JSON.stringify(config));
});
Tested with Chrome (42.x) and Firefox (37.x).
Thanks for any response.
Hi, this code seems to work for me:
router.get('/config', function(req, res) {
    var eTag = crypto.createHash('md5').update(JSON.stringify(config)).digest('hex');
    if (req.headers['if-none-match'] && req.headers['if-none-match'] === '"' + eTag + '"') {
        res.status(304);
        res.end();
    } else {
        res.setHeader('ETag', '"' + eTag + '"');
        res.setHeader('Content-Type', 'application/json');
        res.setHeader('Cache-Control', 'public, max-age=31557600');
        res.send(JSON.stringify(config));
    }
});
Calling the API using the browser URL bar: http://localhost:3000/api/config
Looks like you might be using Chrome.
Chrome should include the following header in the request after pressing F5:
If-None-Match:"94d52736bcd99b1ac771f13b1bbdf622"
If you don't see this, open the Chrome DevTools settings (General section) and make sure that "Disable cache (while DevTools is open)" is not checked.
With jQuery, we can use the ifModified option:
$.ajax({
    type: "GET",
    ifModified: true,
    url: "http://localhost:3000/api/config"
}).then(function(data) {
    . . .
});
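If the API is called with fetch instead of jQuery, the validator can be sent by hand. Below is a minimal sketch, assuming you keep the last ETag and the last body in your own variables (cachedETag and cachedBody are made-up names for this example):
var cachedETag = null;
var cachedBody = null;

function loadConfig() {
    // Send If-None-Match only once we have an ETag from a previous response.
    var headers = cachedETag ? { 'If-None-Match': cachedETag } : {};
    return fetch('http://localhost:3000/api/config', { headers: headers })
        .then(function(res) {
            if (res.status === 304) {
                // Not modified: reuse the body cached from the earlier 200 response.
                return cachedBody;
            }
            cachedETag = res.headers.get('ETag');
            return res.json().then(function(body) {
                cachedBody = body;
                return body;
            });
        });
}
This only matters when the request bypasses the browser's HTTP cache; for a normal navigation the browser is supposed to attach If-None-Match itself, as described above.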
Related
I was unable to use the requests library's get() function to scrape data from this specific website, as running the code block below results in a status code of 403 (unsuccessful).
import requests

# Using headers in order to emulate a browser
headers = {'user-agent': 'Chrome/55.0.2883.87'}
url = "https://www.rumah.com/properti-dijual"

# Make a request to the website and retrieve the data
response = requests.get(url, headers=headers)

# Check if the request was successful
if response.status_code == 200:
    print("Request was successful", response.status_code)
    # Print the page source
    print(response.text)
else:
    print("Request was not successful", response.status_code)
However, when I tried the same source code on a different website, the request was successful (status code 200).
import requests

# Using headers in order to emulate a browser
headers = {'user-agent': 'Chrome/55.0.2883.87'}
url = "https://www.subscene.com"

# Make a request to the website and retrieve the data
response = requests.get(url, headers=headers)

# Check if the request was successful
if response.status_code == 200:
    print("Request was successful", response.status_code)
    # Print the page source
    print(response.text)
else:
    print("Request was not successful", response.status_code)
I'm trying to scrape housing data from the website by getting a successful request to it. I realize that some websites prevent scraping and that the pages concerned are listed in the robots.txt file. However, I can't find the specific page I want to scrape in the robots.txt file, so I thought I should be able to scrape this website.
Here is the robots.txt file for the specific webpage (screenshot not reproduced here).
This is my first question on Stack Overflow. Any help would be appreciated!
Your URL https://www.rumah.com/properti-dijual is using Cloudflare protection, and https://www.subscene.com is as well.
But maybe https://www.rumah.com has the stricter policy of the two.
In case you're getting error 403,
provide all the headers, as follows:
import requests

headers = {
    'authority': 'www.rumah.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'de,de-DE;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6,fr;q=0.5,de-CH;q=0.4',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not_A Brand";v="99", "Microsoft Edge";v="109", "Chromium";v="109"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.70',
}

response = requests.get('https://www.rumah.com/properti-dijual', headers=headers)
If that doesn't work, try using JavaScript:
fetch("https://www.rumah.com/properti-dijual", {
"headers": {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"accept-language": "de,de-DE;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6,fr;q=0.5,de-CH;q=0.4",
"cache-control": "no-cache",
"pragma": "no-cache",
"sec-ch-ua": "\"Not_A Brand\";v=\"99\", \"Microsoft Edge\";v=\"109\", \"Chromium\";v=\"109\"",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "\"Windows\"",
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "none",
"sec-fetch-user": "?1",
"upgrade-insecure-requests": "1"
},
"body": null,
"method": "GET",
"mode": "cors",
});
You can also run the site's JavaScript from Python by driving a real browser with Selenium or Selenium-Profiles (undetected, uses Chrome).
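Just as an illustration (not a guarantee of getting past the protection), a minimal sketch with the Node selenium-webdriver bindings; it assumes Chrome plus a matching chromedriver are installed, and the Python bindings follow the same pattern:
const { Builder } = require('selenium-webdriver');

(async function scrape() {
    // Starts a real Chrome instance, so the site's JavaScript runs normally.
    const driver = await new Builder().forBrowser('chrome').build();
    try {
        await driver.get('https://www.rumah.com/properti-dijual');
        const html = await driver.getPageSource();
        console.log(html.slice(0, 500)); // peek at the rendered markup
    } finally {
        await driver.quit();
    }
})();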
I'm trying to upload a binary file (to Amazon S3) from my localhost Vue page, using Amazon API Gateway with CORS enabled.
The actual POST request is issued after the preflight request,
and the file upload succeeds.
But the POST request catches the error below.
I don't know why I got the error.
Chrome (Version 79.0.3945.79) shows the following message:
Access to XMLHttpRequest at 'https://XXXXXXXXXXX.execute-api.ap-northeast-1.amazonaws.com/dev/upload' from origin 'http://192.168.0.20:8080' has been blocked by CORS policy:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
AXIOS ERROR: Error: Network Error
at createError (createError.js?2d83:16)
at XMLHttpRequest.handleError (xhr.js?b50d:81)
Source code
async upload() {
    console.log("file:", this.file);
    const axiosConfig = {
        headers: {
            "Content-Type": "image/png"
        }
    };
    axios
        .post("https://XXXXXXXXXX.execute-api.ap-northeast-1.amazonaws.com/dev/upload", this.file, axiosConfig)
        .then(res => {
            console.log("RESPONSE RECEIVED: ", res);
        })
        .catch(err => {
            console.log("AXIOS ERROR: ", err);
        });
}
Header(Preflight Request)
Request
:authority: XXXXXXXXXX.execute-api.ap-northeast-1.amazonaws.com
:method: OPTIONS
:path: /dev/upload
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9,ja;q=0.8
access-control-request-headers: content-type
access-control-request-method: POST
origin: http://192.168.0.20:8080
referer: http://192.168.0.20:8080/
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36
Response
access-control-allow-headers: Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token
access-control-allow-methods: DELETE,GET,HEAD,OPTIONS,PATCH,POST,PUT
access-control-allow-origin: *
content-length: 0
content-type: application/json
date: Fri, 13 Dec 2019 12:39:40 GMT
status: 200
via: 1.1 88c2e4442XXX3f0dXXX7df6fcXXX37ff.cloudfront.net (CloudFront)
x-amz-apigw-id: EpH19E9sNjMFhOg=
x-amz-cf-id: PEXXXH0x8_mlAspmv-xhi3X3XXXn_LSBswhXXXyqnCGZmVPkXXXYhw==
x-amz-cf-pop: NRT51-C1
x-amzn-requestid: 47XXc915-3b44-4XX7-959a-3XXX62150b3d
x-cache: Miss from cloudfront
Header(Actual POST)
Request
:authority: XXXXXXXXXX.execute-api.ap-northeast-1.amazonaws.com
:method: POST
:path: /dev/upload
:scheme: https
accept: application/json, text/plain, */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9,ja;q=0.8
content-length: 6849
content-type: image/png
origin: http://192.168.0.20:8080
referer: http://192.168.0.20:8080/
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36
Response
content-length: 47
content-type: application/json
date: Fri, 13 Dec 2019 12:39:40 GMT
status: 200
via: 1.1 88c2e44426XX3f0db837df6fc92437ff.cloudfront.net (CloudFront)
x-amz-apigw-id: EpH1_EeptjMFXqw=
x-amz-cf-id: XXqDis00oJqvh8wY-a0sugE6tuhwPHiJLs7ucXX5OdPC0uoCql7-nQ==
x-amz-cf-pop: NRT51-C1
x-amzn-requestid: 9XXX54a0-0a71-4cda-9d91-ae90a3322c9f
x-amzn-trace-id: Root=1-5XXX868c-fXXXa33dd82751efXXX547d;Sampled=0
x-cache: Miss from cloudfront
I solved it myself.
Why did I get the error?
Because the response header included NO 'access-control-allow-origin',
the browser couldn't read the response body (CORB, Cross-Origin Read Blocking).
After adding the header to the response in the Lambda function, it works.
// Inside the Lambda handler: s3 is an AWS.S3 client, requestBody holds the
// decoded upload, and callback is the handler's callback.
s3.putObject({
    Body: requestBody,
    Bucket: "xxxxxx.com",
    ContentType: "image/png",
    Key: "uploadTest/logo.png"
})
    .promise()
    .then(result => {
        const message = JSON.stringify(result);
        callback(null, {
            body: message,
            statusCode: 200,
            headers: {
                "Access-Control-Allow-Origin": "*"
            }
        });
    });
I am making a Python module that interacts with Carousell using the requests module. Now I am trying to send a POST request with a JSON payload, but I keep getting HTTP error code 422 (UNPROCESSABLE ENTITY). I don't know what's wrong with my JSON payload or the Python dict (before it's converted to JSON), or perhaps I am missing something in my request headers.
I tried taking the raw JSON string (from the POST request that I captured using Chrome dev tools), converting it to a dict, copying that dict (printed out), and using it in the program. It didn't work.
login_session = requests.session()
login_session.headers.update({"DNT":"1", "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36", "Origin":"https://sg.carousell.com"})
login_payload = {'requests': {'g0': {'resource': 'sso', 'operation': 'create', 'params': {'loginToken': cookies["login-token"]}, 'body': {}}}, 'context': {'_csrf': cookies["_csrf"]}}
login_cookies = {"__cfduid": cookies["__cfduid"], "_csrf": cookies["_csrf"], "gtkprId": cookies["gtkprId"], "login-token": cookies["login-token"], "redirect":"redirect"}
login_headers = {'accept':'*/*','accept-encoding':'gzip, deflate, br','accept-language':'en-GB,en;q=0.9,zh-CN;q=0.8,zh;q=0.7,en-US;q=0.6', 'x-requested-with': 'XMLHttpRequest', 'content-type': 'application/json'}
login_data = login_session.post(query_url, cookies=login_cookies, data=json.dumps(login_payload), headers=login_headers)
Here's the output from the debugging logger:
DEBUG:urllib3.connectionpool:https://sg.carousell.com:443 "POST /ui/iso?_csrf=TNZTMZpBdQYgRFFouCF4ELVB HTTP/1.1" 422 0
Edit:
Here's the JSON payload which was sent to the server; I am trying to replicate it.
{"requests":{"g0":{"resource":"sso","operation":"create","params":{"loginToken":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1NTg1MDQyMjQsImlzcyI6ImxvZ2luLmNhcm91c2VsbC5jb20iLCJzc29pZCI6IkRacG1rd1l1SXAxdDF5U3A2M1RXWExPUTJnWmRFRzBOSHd3d0ZGSm9PSkFvVFFOdGFyNWt0MDMzNm5EVHRudHoiLCJ1c2VyaWQiOiIxNDczMjI3NCJ9.x7YxdLLk1ID6_jWy4trtLzbrPnZZ0eI7g_cQN1BilF8"},"body":{}}},"context":{"_csrf":"hPPhgajp-1GMLSbgjZBNBD7z2EGPVGCuA_mU"}}
Note that login token and _csrf are data from the cookies.
I've got a problem which I don't understand.
I try to post data to my API from a form using the following code:
formSubmit() {
    const req = this.http.post('http://[ip]/api/login', {
        id: '7',
        username: 'PostTest',
        password: 'studp123lan',
        matrikelnr: 'winf303666',
        email: 'winf303666#example.de',
        email_verified: '1'
    })
    .subscribe(
        res => {
            console.log(res);
        },
        err => {
            console.log("Error occured");
        }
    );
}
When I inspect it in the Chrome Developer Tools, this is what I get:
Failed to load http://[ip]/api/login: Response for preflight has
invalid HTTP status code 404
register.component.ts:42 Error occured
And this is what I get in the network tab:
General:
Request URL:http://[ip]/api/login
Request Method:OPTIONS
Status Code:404 Not Found
Remote Address:[ip]:80
Referrer Policy:no-referrer-when-downgrade
Response Header:
Access-Control-Allow-Headers:Content-Type
Access-Control-Allow-Methods:GET, POST, PUT, DELETE, OPTIONS
Access-Control-Allow-Origin:*
Content-Length:0
Date:Thu, 21 Dec 2017 09:00:35 GMT
Server:Kestrel
X-Powered-By:ASP.NET
Request Header:
Accept:*/*
Accept-Encoding:gzip, deflate
Accept-Language:de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
Access-Control-Request-Headers:content-type
Access-Control-Request-Method:POST
Connection:keep-alive
Host:[ip]
Origin:http://localhost:4200
User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36
Somehow, this doesn't work. But when I POST the same data via a Postman POST request to the same URL, it works like a charm.
Can anyone explain and help?
Thanks.
404 is a "page not found" error: it means your endpoint isn't available at this address. Note that the 404 here is returned for the OPTIONS preflight request, which Postman never sends; that's why the plain POST works from Postman.
Sure, you make a Postman call and it works: but did you create your Postman call by hand, or use the interceptor to make it? (The interceptor is a Chrome plugin that lets you record all calls made by Chrome into Postman.)
There must be something you have forgotten. Could you post your Postman call, and if you can, try with the interceptor?
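For illustration only (the backend here is ASP.NET Core on Kestrel, which this sketch does not reproduce): the preflight has to reach a route that answers OPTIONS with a 2xx, roughly like this hypothetical Express endpoint.
// Hypothetical Express routes, only to show the shape of a passing preflight.
const express = require('express');
const app = express();

app.options('/api/login', (req, res) => {
    res.set({
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS',
        'Access-Control-Allow-Headers': 'Content-Type'
    });
    res.sendStatus(204); // any 2xx lets the browser send the real POST
});

app.post('/api/login', express.json(), (req, res) => {
    res.set('Access-Control-Allow-Origin', '*');
    res.json({ ok: true, received: req.body });
});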
I'm trying to stream out an audio/wav file using the HTML5 audio feature in the following way:
<audio type="audio/wav" src="/sound/10/audio/full" sound-id="10" version="full" controls="">
</audio>
This works quite well in Chrome, except that it's impossible to replay or reposition via the currentTime attribute:
var audioElement = $('audio').get(0)
audioElement.currentTime
> 1.2479820251464844
audioElement.currentTime = 0
audioElement.currentTime
> 1.2479820251464844
I'm serving the audio file from a Grails controller, using the following code:
def audio() {
    File file = soundService.getAudio(...)
    response.setContentType('audio/wav')
    response.setContentLength(file.bytes.length)
    response.setHeader("Content-disposition", "attachment;filename=${file.getName()}")
    response.outputStream << file.newInputStream() // Performing a binary stream copy
    response.status = 206
    return false
}
It seems, though, that Grails is giving back an HTTP 200 response instead of 206 (Partial Content), as you can see from the following output from Chrome:
Request URL:http://localhost:8080/sound/10/audio/full
Request Method:GET
Status Code:200 OK
Request Headers:
Accept:*/*
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:identity;q=1, *;q=0
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Cookie:JSESSIONID=9395D8FFF34B7455F937190F521AA1BC
Host:localhost:8080
Range:bytes=0-3189
Referer:http://localhost:8080/cms/sound/10/edit
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11
Response Headers:
Content-disposition:attachment;filename=full.wav
Content-Length:3190
Content-Type:audio/wav
Date:Wed, 19 Dec 2012 13:58:44 GMT
Server:Apache-Coyote/1.1
Any idea what might be wrong?
Thanks, Amit.
ADDITION:
Changing the controller logic to:
response.status = 206
response.setContentType(version.mime)
response.setContentLength (file.bytes.length)
response.setHeader("Content-disposition", "attachment;filename=${file.getName()}")
response.outputStream << file.newInputStream() // Performing a binary stream copy
This did help with returning HTTP 206 (Partial Content); however, both Chrome and Firefox won't play the audio file (mentioning again that Chrome did play it when it got the file with a 200...),
with the following info in the response:
Request URL:http://localhost:8080/sound/10/audio/full
Request Headers:
Accept-Encoding:identity;q=1, *;q=0
Range:bytes=0-
Referer:http://localhost:8080/cms/sound/10/edit
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11
You might want to try:
response.reset();
response.setStatus(206);
response.setHeader("Accept-Ranges", "bytes");
response.setHeader("Content-length", Integer.toString(length + 1));
response.setHeader("Content-range", "bytes " + start.toString() + "-" + end.toString() + "/" + Long.toString(f.size()));
response.setContentType(...);
And this type of output should only be done if the client specifically asked for a range. You can check by using:
String range = request.getHeader("range");
If range is not null, then you'll have to parse it for the start and end byte requests. Note that you can get "0-" as a range. In some cases, you'll see "0-1" as a request to check whether your service knows how to handle range requests.
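The same flow sketched in Node/Express purely for illustration (the Grails controller needs the equivalent steps in Groovy; the route and file path here are made up):
var fs = require('fs');

router.get('/sound/:id/audio/full', function(req, res) {
    var path = '/data/sounds/' + req.params.id + '/full.wav'; // hypothetical location
    var total = fs.statSync(path).size;
    var range = req.headers.range; // e.g. "bytes=0-" or "bytes=0-3189"

    if (!range) {
        // No Range header: plain 200 with the whole file.
        res.set({ 'Content-Type': 'audio/wav', 'Content-Length': total, 'Accept-Ranges': 'bytes' });
        fs.createReadStream(path).pipe(res);
        return;
    }

    var parts = range.replace('bytes=', '').split('-');
    var start = parseInt(parts[0], 10);
    var end = parts[1] ? parseInt(parts[1], 10) : total - 1;

    res.status(206).set({
        'Content-Type': 'audio/wav',
        'Accept-Ranges': 'bytes',
        'Content-Range': 'bytes ' + start + '-' + end + '/' + total,
        'Content-Length': end - start + 1
    });
    fs.createReadStream(path, { start: start, end: end }).pipe(res);
});
The key points are the same in any stack: only answer with 206 when a Range header was actually sent, echo the requested range back in Content-Range, and make Content-Length match the slice rather than the whole file.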