I work in networking and am new to Python. I am trying to call the REST API on one of my devices from Python 3 using the requests library.
import requests
url = 'https://erspan/api/?type=op&cmd=<show><system>'  # ...not a complete URL
rs = requests.get(url, verify=False)
print(rs.headers)
print(rs.text)
{'Date': 'Thu, 09 May 2019 21:34:39 GMT', 'Content-Type': 'application/xml; charset=UTF-8', 'Content-Length': '1891', 'Connection': 'keep-alive', 'ETag':.....}
<response status="success"><result><system><hostname>erspan</hostname><ip-a.......
The object type of the response is:
print(type(rs))
<class 'requests.models.Response'>
I want to get this as JSON.
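Since the device returns XML (see the Content-Type header above), one possible approach is to parse rs.text and convert the element tree into a dict before dumping it as JSON. The following is a minimal sketch using only the standard library. The helper name element_to_dict and the shortened sample_xml are assumptions for illustration (the real output above is truncated), and the sketch ignores XML attributes such as status="success".

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an XML element into nested dicts and strings."""
    children = list(element)
    if not children:
        # Leaf node: keep its text (empty string if there is none).
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list so nothing is overwritten.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical, shortened stand-in for rs.text.
sample_xml = (
    '<response status="success"><result><system>'
    "<hostname>erspan</hostname>"
    "</system></result></response>"
)

root = ET.fromstring(sample_xml)
as_dict = {root.tag: element_to_dict(root)}
as_json = json.dumps(as_dict, indent=2)
print(as_json)
```

With a real response, sample_xml would be replaced by rs.text.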
I was unable to scrape data from this specific website with the requests library's get() function: running the code block below results in a status code of 403 (Forbidden).
import requests
#using headers in order to emulate a browser
headers = {'user-agent': 'Chrome/55.0.2883.87'}
url = "https://www.rumah.com/properti-dijual"
# Make a request to the website and retrieve the data
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code == 200:
    print("Request was successful", response.status_code)
    # Print the page source
    print(response.text)
else:
    print("Request was not successful", response.status_code)
However, when I used the same code to scrape a different website, the request was successful (status code 200).
import requests
#using headers in order to emulate a browser
headers = {'user-agent': 'Chrome/55.0.2883.87'}
url = "https://www.subscene.com"
# Make a request to the website and retrieve the data
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code == 200:
    print("Request was successful", response.status_code)
    # Print the page source
    print(response.text)
else:
    print("Request was not successful", response.status_code)
I'm trying to scrape housing data from the website by getting a successful request to the website. I realized that some websites prevent scraping and those specific pages are listed in the robots.txt file. However, I can't find the specific page that I want to scrape in the robots.txt file, therefore I thought that I should be able to scrape this website.
Here is the robots.txt file for the site:
[screenshot of the robots.txt file]
This is my first question in StackOverflow. Any help would be appreciated!
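For checking whether a path is disallowed by robots.txt, the standard library has a parser. The sketch below feeds it a hypothetical robots.txt (the rules and the example.com URLs are assumptions, not the real file from the screenshot) and asks whether two paths may be fetched. Note that robots.txt is only a convention for crawlers, so a server can still answer 403 for a path that the file allows.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# A path that no rule matches is allowed; a path under /private/ is not.
allowed = parser.can_fetch("*", "https://www.example.com/properti-dijual")
blocked = parser.can_fetch("*", "https://www.example.com/private/page")
print(allowed, blocked)
```

To check a live site, parser.set_url(...) followed by parser.read() would load the real file instead of the hard-coded lines.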
Your URL https://www.rumah.com/properti-dijual is behind Cloudflare protection, and so is https://www.subscene.com.
But https://www.rumah.com probably has a stricter policy, since only that request is rejected.
If you are getting a 403 error, send a full set of browser-like headers, as follows:
import requests
headers = {
    'authority': 'www.rumah.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'de,de-DE;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6,fr;q=0.5,de-CH;q=0.4',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not_A Brand";v="99", "Microsoft Edge";v="109", "Chromium";v="109"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.70',
}
response = requests.get('https://www.rumah.com/properti-dijual', headers=headers)
If that doesn't work, try making the same request from JavaScript:
fetch("https://www.rumah.com/properti-dijual", {
  "headers": {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-language": "de,de-DE;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6,fr;q=0.5,de-CH;q=0.4",
    "cache-control": "no-cache",
    "pragma": "no-cache",
    "sec-ch-ua": "\"Not_A Brand\";v=\"99\", \"Microsoft Edge\";v=\"109\", \"Chromium\";v=\"109\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1"
  },
  "body": null,
  "method": "GET",
  "mode": "cors",
});
You can also run JavaScript from Python by driving a real browser with Selenium or Selenium-Profiles (undetected, uses Chrome).
I am getting a Bad Request response to my request. I checked my dictionary data with an online JSON validator, and everything seems fine.
My code is the following:
// Parse datetime to timestamp and include data in a dict
let data_dict = {
    "stop_date": Date.parse(sup_limit.value),
    "start_date": Date.parse(inf_limit.value)
}
// Send the Ajax request
let request = $.ajax({
    url: url,
    type: 'POST',
    data: data_dict,
    contentType: 'application/json;charset=UTF-8',
});
Backend receive endpoint:
@dashboard_bp.route('/download_last_test_influx<mode>', methods=['GET', 'POST'])
@login_required
def download_last_test_influx(mode: str):
    # Check if request comes from a custom or test event
    if mode == 'custom':
        start_date = int(request.json.get('start_date'))
        stop_date = int(request.json.get('stop_date'))
        # Check if time range is valid, if not return server internal error
        if stop_date - start_date <= 0:
            return jsonify({'message': 'Time range must be greater than 0'}), 500
        # Create response header
        response = make_response(send_file(spock_comm_mgr
                                           .test_backup_influx_manager
                                           .get_last_test_influx_record(start_date=start_date, stop_date=stop_date)))
        response.headers['Content-Type'] = 'application/gzip'
        response.headers['Content-Encoding'] = 'gzip'
        return response
Request header:
POST /download_last_test_influxcustom HTTP/1.1
Host: 0.0.0.0:5000
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/json;charset=UTF-8
X-Requested-With: XMLHttpRequest
Content-Length: 48
Origin: http://0.0.0.0:5000
Connection: keep-alive
Referer: http://0.0.0.0:5000/influx_management
Cookie: *********************
Request payload:
stop_date=1623758400000&start_date=1623708000000
Response message:
Bad Request
The browser (or proxy) sent a request that this server could not understand.
You are telling your server that you are sending JSON data, but the request body is not a JSON string. It is a URL-encoded string, because that is the default behaviour of $.ajax() when you pass an object as data.
Use JSON.stringify to send a proper JSON body:
let request = $.ajax({
    url: url,
    type: 'POST',
    data: JSON.stringify(data_dict),
    contentType: 'application/json;charset=UTF-8',
});
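The difference between the two bodies can be reproduced on the Python side. The sketch below builds both encodings from the timestamps in the request payload shown in the question and checks which one a JSON parser accepts. The helper is_valid_json is a hypothetical name introduced for illustration, and json.dumps stands in for JSON.stringify (same JSON, different whitespace).

```python
import json
from urllib.parse import urlencode

# Same values as the request payload shown in the question.
data_dict = {"stop_date": 1623758400000, "start_date": 1623708000000}

# URL-encoded form body: what $.ajax() sends for a plain object.
form_body = urlencode(data_dict)
# JSON body: the Python counterpart of JSON.stringify(data_dict).
json_body = json.dumps(data_dict)


def is_valid_json(body):
    """Return True if the string can be decoded as JSON."""
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True


print(form_body, is_valid_json(form_body))
print(json_body, is_valid_json(json_body))
```

The URL-encoded string is 48 characters long, which matches the Content-Length of the failing request, and it is the one that fails to decode as JSON.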
I am trying to request this API: https://iso19139echnap.geocat.live/geonetwork/srv/api/0.1/groups/11864
It works in the browser if I enter it as a direct URL, but it doesn't work with the Python requests module or in Postman.
The Python code looks like the following:
import requests
payload = {
    'username': 'corey',
    'password': 'testing'
}
r = requests.get('https://iso19139echnap.geocat.live/geonetwork/srv/api/0.1/groups/11864')
print(r)
--------------------------------------------------------
returns
<Response [400]>
import requests
url = "https://iso19139echnap.geocat.live/geonetwork/srv/api/0.1/groups/11864"
payload = {}
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-IE,en-GB;q=0.9,en-US;q=0.8,en;q=0.7',
}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)
You have to add the Accept-Encoding header, as in the code above.
Add the same header in Postman and the request works there too.
I have a requirement to use Python to deploy some things to an OpenShift cluster (I've been pre-empted from any other solution), so I'm attempting to exercise the kubernetes module:
from kubernetes import client, config
configuration = client.Configuration()
configuration.username='admin'
configuration.password='redacted'
configuration.host='https://api.cluster.example.com:6443'
configuration.verify_ssl = False
v1 = client.CoreV1Api(client.ApiClient(configuration))
ns = {}
ns['kind'] = 'Namespace'
ns['apiVersion'] = 'v1'
ns['metadata'] = {}
ns['metadata']['name'] = 'mynamespace'
v1.create_namespace(ns)
Sadly, the v1 object doesn't authenticate to the cluster with the username / password I've given it:
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'd59bb32d-a114-4b42-90ca-86c6315809d0', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '58a1456a-ff43-47a6-9a08-4a682ad5a509', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'efbe11a3-861b-46ec-8e4a-1eadb766e284', 'Date': 'Thu, 21 Jan 2021 19:17:31 GMT', 'Content-Length': '273'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"namespaces is forbidden: User \"system:anonymous\" cannot create resource \"namespaces\" in API group \"\" at the cluster scope","reason":"Forbidden","details":{"kind":"namespaces"},"code":403}
I'm looking for a way to throw configuration YAML at the OCP cluster and have it stick...
I have the following code to send a POST request using the requests module:
api_path = r'/DeviceCategory/create'
api_server = (self.base_url + api_path)
logging.info("Triggered API : %s", api_server)
arguments = {"name": "WrongTurn", "vendor": "Cupola", "protocolType": "LWM2M"}
headers = {'content-type': 'application/json;charset=utf-8', 'Accept': '*'}
test_response = requests.post(api_server, headers=headers, cookies=self.jessionid,
                              timeout=30, json=arguments)
logging.info(test_response.headers)
logging.info(test_response.request)
logging.info(test_response.json())
logging.info(test_response.url)
logging.info(test_response.reason)
I got the following response headers:
2017-08-22 12:03:12,811 - INFO - {'Server': 'Apache-Coyote/1.1', 'X-FRAME-OPTIONS': 'SAMEORIGIN, SAMEORIGIN', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'GET, POST, DELETE, PUT', 'Access-Control-Allow-Headers': 'Content-Type', 'Content-Type': 'text/html;charset=utf-8', 'Content-Language': 'en', 'Transfer-Encoding': 'chunked', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding', 'Date': 'Tue, 22 Aug 2017 06:33:12 GMT', 'Connection': 'close'}
And this JSON decoding error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Can someone please help me out? The status code I got is 500.
It means that the server didn't return JSON: the body was either empty or, judging by the Content-Type of text/html in the response headers, an HTML error page, so json() had nothing it could decode. Given the 500 error, that probably means that your API call was bad. Not being familiar with the API, I cannot say more, but my guess is that you are passing the arguments incorrectly. Try something like:
test_response = requests.post(api_server, headers=headers, cookies=self.jessionid,
                              timeout=30, data=arguments)
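Whichever way the arguments end up being passed, it may help to guard the json() call so that a non-JSON error page does not raise. The sketch below shows the idea with the standard library only. The function name parse_json_body and the sample HTML body are assumptions for illustration; with requests, the two inputs would come from test_response.headers.get("Content-Type", "") and test_response.text.

```python
import json


def parse_json_body(content_type, body):
    """Return the decoded JSON body, or None if the body is not JSON."""
    if "json" not in content_type.lower():
        # e.g. text/html error pages such as the one in the headers above
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# Hypothetical HTML error page, similar in kind to what a 500 might return.
html_error = "<html><body><h1>HTTP Status 500</h1></body></html>"

print(parse_json_body("text/html;charset=utf-8", html_error))
print(parse_json_body("application/json;charset=utf-8", '{"name": "WrongTurn"}'))
```

When the function returns None, logging the status code and the raw body text usually says more about the failure than the decoding traceback does.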