Python 3 urllib HTTP Error 412: Precondition Failed - html

I'm trying to parse HTML data of a website. I wrote this code:
import urllib.request
def parse(url):
    response = urllib.request.urlopen(url)
    html = response.read()
    strHTML = html.decode()
    return strHTML
website = "http://www.manarat.ac.bd/"
string = parse(website)
But it raises this error:
Traceback (most recent call last):
  File "C:\Users\pupewekate\Videos\RAW\2.py", line 11, in
    string = parse(website)
  File "C:\Users\pupewekate\Videos\RAW\2.py", line 5, in parse
    response = urllib.request.urlopen(url)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 412: Precondition Failed
Any solution?

This website checks the User-Agent header. If it doesn't recognize the value, it returns status code 412:
import requests
print(requests.get('http://www.manarat.ac.bd/'))
# <Response [412]>
print(requests.get('http://www.manarat.ac.bd/', headers={'User-Agent': 'Chrome'}))
# <Response [200]>
See this answer for how to set the User-Agent in urllib.
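To reproduce this behaviour without depending on the real site, here is a minimal sketch with a hypothetical local server (UserAgentCheckHandler is invented for illustration) that, as an assumption, rejects urllib's default Python-urllib/3.x User-Agent with 412. It shows that urllib raises HTTPError for the 412 response, and that passing a User-Agent header through urllib.request.Request gets a 200 instead.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UserAgentCheckHandler(BaseHTTPRequestHandler):
    """Hypothetical server that answers 412 to an unrecognised User-Agent."""

    def do_GET(self):
        user_agent = self.headers.get("User-Agent", "")
        # Assumption: only urllib's default "Python-urllib/3.x" value is rejected here
        status = 412 if user_agent.startswith("Python-urllib") else 200
        body = b"precondition failed" if status == 412 else b"<html>ok</html>"
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines


def fetch_status(url, headers=None):
    """Return the status code; urllib raises HTTPError for 4xx/5xx, so catch it."""
    request = urllib.request.Request(url, headers=headers or {})
    # ProxyHandler({}) ignores proxy environment variables so 127.0.0.1 is reached directly
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


server = HTTPServer(("127.0.0.1", 0), UserAgentCheckHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

default_status = fetch_status(url)  # urllib sends User-Agent: Python-urllib/3.x
browser_status = fetch_status(url, {"User-Agent": "Mozilla/5.0"})
server.shutdown()
server.server_close()
print(default_status, browser_status)  # 412 200
```

The real site may accept or reject different User-Agent strings; this server only mimics the pattern described in the answer.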

You could use the requests module, which is easier to work with. If you are determined to use urllib, you can use this:
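import urllib.request
def parse(url):
    # urlopen() has no headers argument, so attach the headers to a Request object
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3;Win64;x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read().decode()
website = "http://www.manarat.ac.bd/"
string = parse(website)
print(string)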

Related

Sonoff 3.6.0 DIY v2.0

I bought a Sonoff Basic R3 to use in DIY mode with a Raspberry Pi. I've confirmed the firmware is 3.6.0, so DIY v2.0. The DIY connection process works fine using 10.10.7.1: the device connects to the router, I have assigned it a static IP address, and I can ping it with no problem.
Ping statistics for 10.0.0.35:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 3ms, Maximum = 97ms, Average = 53ms
But when I run this HTTP POST JSON code on the Raspberry Pi (connected to the same router, on the same subnet etc.):
import requests
import json
#API details
url = "https://10.0.0.35:8081/zeroconf/switch"
body = {"deviceid": "", "data": {"switch": "on"} }
headers = {'Content-Type': 'application/json'}
#Making http post request
response = requests.post(url, headers=headers, data=body, verify=False)
print(response)
It just hangs until I interrupt it with Ctrl-C. I'm pretty sure it is some sort of connection/port issue, as the outcome is the same whether or not the device is powered up (I've also tried a number of variations of the JSON POST code, using info, switch: off etc., all with the same outcome).
^CTraceback (most recent call last):
File "/home/pi/Programmes/sonoff3.py", line 10, in <module>
response = requests.post(url, headers=headers, data=body, verify=False)
File "/usr/lib/python3/dist-packages/requests/api.py", line 119, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.9/http/client.py", line 1347, in getresponse
response.begin()
File "/usr/lib/python3.9/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.9/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.9/socket.py", line 704, in readinto
return self._sock.recv_into(b)
KeyboardInterrupt
After 8 hours of trying yesterday and a few more today, I'm at a loss, so any pointers on what else to try would be great. Thanks.
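Two details in the code above are worth checking, offered as assumptions rather than a confirmed fix. requests sends data=body as form fields, not JSON (requests.post(url, json=body) would send a JSON body), and without a timeout requests waits indefinitely, which matches the hang. The URL also uses https, while the DIY-mode API is, as far as I can tell, served over plain http. The sketch below uses only the standard library (urllib.request, swapped in for requests); build_switch_request and send_switch are hypothetical helper names, and the endpoint path and body are taken from the question.

```python
import json
import urllib.request


def build_switch_request(ip, state, port=8081, device_id=""):
    """Build a POST for the DIY-mode /zeroconf/switch endpoint used in the question.

    Assumption: the device serves this endpoint over plain http, not https.
    """
    url = "http://%s:%d/zeroconf/switch" % (ip, port)
    payload = {"deviceid": device_id, "data": {"switch": state}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # a JSON body, not form fields
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_switch(request, timeout=5):
    """Send the request; with a timeout an unresponsive device raises instead of hanging."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_switch_request("10.0.0.35", "on")
print(request.get_method(), request.full_url)  # POST http://10.0.0.35:8081/zeroconf/switch
```

Calling send_switch(build_switch_request("10.0.0.35", "on")) then raises an exception after the timeout instead of blocking until Ctrl-C, which at least tells you whether the device answers on that port at all.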

How to parse JSON data from Digital Ocean API

I am trying to build an app on top of the DigitalOcean API. I send a request to https://api.digitalocean.com/v2/droplets using the Python requests library. Here's my code:
import json
import requests
host = "https://api.digitalocean.com"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer MYTOKEN"
}
dataa = {}
api = requests.post(f"{host}/v2/droplets", headers=headers, data=json.dumps(dataa))
Then I try to access the information returned by the API with data = api.json(), and when I try to print the ID with print(data['droplet']['id']) I run into this error:
Traceback (most recent call last):
File "/home/gogamic/code/gogamic-website/functions.py", line 276, in <module>
create_new_server('mai#gogamic.com', 2)
File "/home/gogamic/code/gogamic-website/functions.py", line 260, in create_new_server
server_info = json.loads(infoo.json())
File "/usr/lib/python3/dist-packages/requests/models.py", line 898, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 525, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
This is the JSON returned by the API:
That API method is a GET, not a POST:
https://developers.digitalocean.com/documentation/v2/#list-all-droplets
Your code works for me after replacing:
api = requests.post(f"{host}/v2/droplets",
                    headers=headers,
                    data=json.dumps(dataa))
With:
api = requests.get(f"{host}/v2/droplets",
                   headers=headers,
                   data=json.dumps(dataa))
And, per @fixatd, adding:
print(api)
Yields:
<Response [200]>
Note: I have no droplets to list.
For completeness, create a droplet and re-run:
doctl compute droplet create stackoverflow-65092533 \
--region sfo3 \
--size s-1vcpu-2gb \
--ssh-keys ${KEY} \
--tag-names stackoverflow \
--image ubuntu-20-10-x64
Then:
Using:
# resp is the requests Response object (named api in the code above)
if resp.status_code != 200:
    print("Unexpected status code: {}".format(resp.status_code))
    quit()
content = resp.json()  # decode only after the status check
for droplet in content["droplets"]:
    print("ID: {}\tName: {}".format(droplet["id"], droplet["name"]))
Yields:
ID: 219375538 Name: stackoverflow-65092533
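The message Expecting value: line 1 column 1 (char 0) means the text being decoded does not begin with a JSON value at all, for example an empty body or an HTML page. Here is a minimal sketch of checking the status code and the body before decoding, assuming status_code and body come from response.status_code and response.text (parse_droplets is a hypothetical helper name, and the sample body is shaped like the droplet listed above):

```python
import json


def parse_droplets(status_code, body):
    """Return (id, name) pairs from a /v2/droplets response, failing with a readable message."""
    if status_code != 200:
        raise RuntimeError("Unexpected status code %d: %r" % (status_code, body[:200]))
    if not body.strip():
        raise RuntimeError("Empty response body, nothing to decode")
    content = json.loads(body)
    return [(d["id"], d["name"]) for d in content.get("droplets", [])]


# An empty body is enough to reproduce the error message from the question
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)

sample = '{"droplets": [{"id": 219375538, "name": "stackoverflow-65092533"}]}'
print(parse_droplets(200, sample))  # [(219375538, 'stackoverflow-65092533')]
```

Calling it as parse_droplets(api.status_code, api.text) turns an unexpected response into an error that shows what the server actually sent.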

urllib2.URLError: <urlopen error [Errno 8]

import urllib2
import urllib
import json
url = "http://ajax/googleapis.com/ajax/services/search/web?v=1.0&"
query = raw_input ("What do you want to search for ? >> ")
query = urllib.urlencode({'q': query})
response = urllib2.urlopen (url + query).read()
data = json.loads (response)
results = data ['responseData'] ['results']
for result in results:
    title = result['title']
    url = result['url']
    print (title + ';' + url)
ERROR
/System/Library/Frameworks/Python.framework/Versions/2.6/bin/python2.6 /Users/dragonleo/PycharmProjects/untitled2/googleapi
What do you want to search for ? >> apple
Traceback (most recent call last):
File "/Users/dragonleo/PycharmProjects/untitled2/googleapi", line 8, in
response = urllib2.urlopen (url + query).read()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 1181, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 1156, in do_open
raise URLError(err)
urllib2.URLError:
I'd appreciate it if an expert could explain why I am getting this error.
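Two problems stand out immediately:
First, there are multiple typos in the code above. There are spaces between names and their brackets and parentheses (e.g. urllib2.urlopen (url + query)), which is legal Python but unidiomatic. More importantly, the URL should be ajax.googleapis.com: in ajax/googleapis.com the slash ends the host name, so urllib2 tries to resolve a host called just ajax, which fails and produces the URLError.
Second, the Google Web Search API is no longer available. You should migrate to the Google Custom Search API.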
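The typo explains why the error is a name-resolution failure rather than, say, a 404: the first / after the scheme ends the host name. A small sketch in Python 3 (urllib.parse is the Python 3 home of urlsplit and urlencode; the question itself uses Python 2's urllib2) shows how both URLs are split, without making any network request:

```python
from urllib.parse import urlencode, urlsplit

bad = "http://ajax/googleapis.com/ajax/services/search/web?v=1.0&"
good = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&"

# The first "/" after "http://" ends the host, so the typo asks the resolver for a host named "ajax"
print(urlsplit(bad).hostname)   # ajax
print(urlsplit(good).hostname)  # ajax.googleapis.com

# Python 3 equivalent of urllib.urlencode({'q': query}) from the question
query = urlencode({"q": "apple"})
print(good + query)  # http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=apple
```

Fixing the host only removes the URLError; the old Web Search endpoint itself is gone, so the request still needs to move to the Custom Search API.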

urllib & python3: HTTP Error 405: Method Not Allowed

I am trying to do a simple authentication using Python3 and urllib on an API that should return account balances.
The code I have is the following:
import urllib
import urllib.request
import json
id = "nkkhuz6" # fake
secret = "s9MeR0J9yxtndLBPVA" # fake
auth_str = id + ":" + secret
def getBalances():
    values = {'u' : auth_str}
    data = urllib.parse.urlencode(values)
    data = data.encode('utf-8') # data should be bytes
    request = urllib.request.Request(url = "https://api.com", data = data)
    with urllib.request.urlopen(request) as f:
        print(json.loads(f.read().decode('utf-8')))
However when I run getBalances() I get the following errors:
Adriaans-MacBook-Pro:Documents adriaanjoubert$ python3 main.py
Traceback (most recent call last):
File "main.py", line 96, in <module>
getBalances()
File "main.py", line 19, in getBalances
with urllib.request.urlopen(request) as f:
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 161, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 469, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 579, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 507, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 441, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 587, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 405: Method Not Allowed
I am sure the URL is correct: if I append a trailing /, I get the error urllib.error.HTTPError: HTTP Error 404: Not Found.
When I run the following code I do get my account balances:
import os
cmd = """curl -u """ + auth_str + """ https://api.com/"""
os.system(cmd)
What am I doing wrong? I would like to use urllib so that I can store the stuff I get back from the API in a variable.
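A likely explanation, not confirmed in the question, is that the two calls send different requests. Passing data= to urllib.request.Request makes urllib send a POST with a form body u=..., while curl -u id:secret sends a GET with an HTTP Basic Authorization header. A 405 Method Not Allowed is consistent with an endpoint that accepts GET only. Here is a minimal sketch of the curl-equivalent request with the standard library, assuming the API uses plain Basic authentication (build_balance_request and get_balances are hypothetical names; https://api.com is the question's placeholder):

```python
import base64
import json
import urllib.request


def build_balance_request(url, api_id, secret):
    """Build the request that `curl -u api_id:secret url` would send."""
    credentials = ("%s:%s" % (api_id, secret)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    # No data= argument, so urllib uses GET (passing data would switch it to POST)
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


def get_balances(url, api_id, secret):
    """Fetch and decode the JSON response so it can be stored in a variable."""
    with urllib.request.urlopen(build_balance_request(url, api_id, secret)) as f:
        return json.loads(f.read().decode("utf-8"))


request = build_balance_request("https://api.com", "user", "pass")
print(request.get_method(), request.get_header("Authorization"))  # GET Basic dXNlcjpwYXNz
```

balances = get_balances("https://api.com", id, secret) then keeps the parsed JSON in a variable, which is what the question asks for.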

How to pass urls from CSV list into a python GET request

I have a CSV file, which contains a list of Google extension IDs.
I'm writing code that reads the extension IDs, adds the webstore URL, then performs a basic GET request:
import csv
import requests
with open('small.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        urls = "https://chrome.google.com/webstore/detail/" + row[0]
        print urls
        r = requests.get([urls])
Running this code results in the following Traceback:
Traceback (most recent call last):
File "C:\Users\tom\Dropbox\Python\panya\test.py", line 9, in <module>
r = requests.get([urls])
File "C:\Python27\lib\site-packages\requests\api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 567, in send
adapter = self.get_adapter(url=request.url)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 641, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
InvalidSchema: No connection adapters were found for '['https://chrome.google.com/webstore/detail/blpcfgokakmgnkcojhhkbfbldkacnbeo']'
How can I revise the code so that it accepts the URLs in the list and makes the GET request?
requests.get expects a string, but you're creating and passing a list, [urls]:
r = requests.get([urls])
Change it to just
r = requests.get(urls)
and it should work.
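The code in the question is Python 2 (print urls, and opening the file in 'rb' mode). Here is a minimal Python 3 sketch of the loop that keeps each URL as a plain string; read_extension_urls is a hypothetical helper, and io.StringIO stands in for the real small.csv so the example runs on its own:

```python
import csv
import io

WEBSTORE_PREFIX = "https://chrome.google.com/webstore/detail/"


def read_extension_urls(csv_file):
    """Yield one webstore URL per non-empty row, built from the ID in the first column."""
    for row in csv.reader(csv_file):
        if row:  # csv.reader yields [] for blank lines
            yield WEBSTORE_PREFIX + row[0].strip()


# Stand-in for small.csv: one extension ID per line
sample = io.StringIO("blpcfgokakmgnkcojhhkbfbldkacnbeo\n")
urls = list(read_extension_urls(sample))
print(urls)  # ['https://chrome.google.com/webstore/detail/blpcfgokakmgnkcojhhkbfbldkacnbeo']
```

With the real file, open it in text mode with open('small.csv', newline='') (Python 3's csv module raises an error on a file opened in 'rb' mode) and call requests.get(url) once for each string that read_extension_urls produces.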