error in python web scraping code: HTTP error 303 - json

I am writing code to search the New York Times API for news articles and to print information from those articles.
I have been able to get the URLs for the news articles, but when I try to read the information from them, I am getting the following error:
urllib.error.HTTPError: HTTP Error 303: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
See Other
My code is pasted below and I commented the line where I am getting the error. The thing that confuses me is that urlopen worked fine the first time I used it to read information from NYT's API, but resulted in an error the second time when I try to read information from the URL of the specific article.
from urllib.request import urlopen
import json

url = 'http://api.nytimes.com/svc/search/v2/articlesearch.json?q=olympics&api-key=mykey'
request = urlopen(url)  # no problems here
request = request.readall().decode('utf-8')
json_obj = json.loads(request)
for i, item in enumerate(json_obj):
    title = json_obj.get('response').get('docs')[i].get('headline').get('main')
    web_url = json_obj.get('response').get('docs')[i].get('web_url')
    print(i+1)
    print(title)
    print(web_url)
    print()
    request = urlopen(web_url)  # This is where I am getting the error
    request = request.read().decode('utf-8')
    print(request)
I am using Portable Python 3.2.5.1.
I am also new to Python. Most of my programming experience is with Java.
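Separately from the redirect error, one detail in the loop is worth checking: enumerate(json_obj) walks the top-level keys of the decoded dictionary, not the list of articles, so the loop runs once per key rather than once per article. A minimal sketch of iterating the docs list directly follows. The sample payload is made up for illustration and only mirrors the fields the question's code reads (response, docs, headline, main, web_url); extract_articles is a hypothetical helper name.

```python
import json

# Hypothetical sample payload, shaped like the fields read in the question.
sample = '''
{
  "status": "OK",
  "response": {
    "docs": [
      {"headline": {"main": "First headline"}, "web_url": "http://example.com/a"},
      {"headline": {"main": "Second headline"}, "web_url": "http://example.com/b"}
    ]
  }
}
'''

def extract_articles(json_text):
    """Return a list of (title, web_url) pairs from an article-search style response."""
    json_obj = json.loads(json_text)
    docs = json_obj.get('response', {}).get('docs', [])
    return [(doc.get('headline', {}).get('main'), doc.get('web_url')) for doc in docs]

articles = extract_articles(sample)
for i, (title, web_url) in enumerate(articles, start=1):
    print(i)
    print(title)
    print(web_url)
    print()
```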

Related

pytest - accessing flask response parameters [duplicate]

I am writing code, in order to test a flask REST server.
One of the responses of the server is:
return make_response(json.dumps({'myName': userName}), 200)
I can have access to the response's status code with:
response.status_code
But how do I access the myName parameter of the server's response?
This question has very little to do with pytest. All you're doing is parsing a JSON body into a dictionary and accessing its items, which is doable with the requests library and Python itself.
I recommend reading the requests documentation; the first code block there is enough for your example.
So a solution for your problem is:
import requests
my_name_property = requests.get("server_url").json()["myName"]
or in more steps:
import requests
response = requests.get("server_url")
response_body = response.json()
my_name_property = response_body["myName"]
You can't use response.myName, because that's not how you access the items of a dictionary in Python. That would work with e.g. a namedtuple in Python, or with an object in JavaScript, but not with a Python dictionary, which is what the json() method returns in the examples above.
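To make the difference concrete, here is a small sketch that needs no running server. It assumes the body looks like the one built by json.dumps({'myName': userName}) in the question; the value 'alice' is a made-up placeholder.

```python
import json

# Stand-in for the response body the Flask view would send (value is hypothetical).
body = json.dumps({'myName': 'alice'})

# Parsing the body gives a plain dict, so items are read with [] or .get().
parsed = json.loads(body)
my_name = parsed['myName']
print(my_name)

# Attribute-style access fails on a dict.
try:
    parsed.myName
    attribute_access_works = True
except AttributeError:
    attribute_access_works = False
print(attribute_access_works)
```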
If something doesn't work, I recommend using Postman or a similar client to figure out what the API is doing: how to call it, what response you get, and in what structure. Once you understand that, write the tests in Python.

How can I work around a website blocking my requests call?

I am trying to grab the data table at url = 'https://quotes.wsj.com/index/HK/XHKG/HSI/historical-prices/download?num_rows=150&range_days=150&endDate=02/29/2020'. When I click the link in a browser, a CSV file is downloaded, so the link is correct. However, response.content contains the message "The request could not be satisfied.Request blocked." Can the server distinguish a manual click from a Python request and block the latter? Is there any way to work around it?
My code:
import requests
url = 'https://quotes.wsj.com/index/HK/XHKG/HSI/historical-prices/download?num_rows=150&range_days=150&endDate=02/29/2020'
response = requests.get(url)
print(response.content)
open('wsj.csv', 'wb').write(response.content)

Getting data from API using json and requests module

I am trying to fetch the data shown in the attached image with Python, using the json and requests modules.
If I use the following link the code works:
https://min-api.cryptocompare.com/data/price?fsym=ETH&tsyms=BTC,USD,EUR
However when I use the desired path as in the attached image I get the following error:
raise JSONDecodeError("Expecting value", s, err.value) from None
JSONDecodeError: Expecting value
Does anyone know what is going wrong? Code is below:
import json
import requests
url = "https://dpssdata.coherent.com/rest.aspx?products&version=2&breaks=1.json"
r = requests.get(url)
json_data = r.json()
It looks like you're getting redirected to login.
>>> r = requests.get(url, allow_redirects=False)
>>> r
<Response [302]>
>>> r.text
'<html><head><title>Object moved</title></head><body>\r\n<h2>Object moved to here.</h2>\r\n</body></html>\r\n'
Presumably you have logged in in your browser. So you'll need to securely store the login token and use that in your auth header. It looks like the site supports X.509 Certificates also which would probably be the way to go for automated access.
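A check along these lines can surface the problem before r.json() raises. This is only a sketch: describe_response is a hypothetical helper, and it takes the status code, the Location header and the body as plain values (with requests these would presumably come from r.status_code, r.headers.get('Location') and r.text). The '/login' target in the usage lines is made up.

```python
import json

def describe_response(status_code, location, text):
    """Return the parsed JSON body, or raise ValueError with a hint about what went wrong."""
    if 300 <= status_code < 400:
        raise ValueError(
            'redirected (%d) to %r; the endpoint may require a login' % (status_code, location)
        )
    try:
        return json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        raise ValueError('body is not JSON, starts with: %r' % text[:40])

# Usage with values shaped like the 302 response shown above.
html_body = '<html><head><title>Object moved</title></head><body></body></html>'
try:
    describe_response(302, '/login', html_body)
except ValueError as exc:
    message = str(exc)
print(message)

# A JSON body with a 200 status parses normally.
ok = describe_response(200, None, '{"USD": 1.0}')
print(ok)
```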

How to get JSON data in an Odoo controller?

I am trying to send some JSON data to an Odoo controller, but when I send the request, I always get 404 as response.
This is the code of my controller:
import openerp.http as http
import logging
_logger = logging.getLogger(__name__)

class Controller(http.Controller):
    @http.route('/test/result', type='json', auth='public')
    def index(self, **args):
        _logger.info('The controller is called.')
        return '{"response": "OK"}'
Now, I type the URL (http://localhost:8069/test/result) in the browser to check whether it is available, and I get <function index at 0x7f04a28>, /test/result: Function declared as capable of handling request of type 'json' but called with a request of type 'http'. This tells me that the controller is listening at that URL and is expecting JSON data.
So I open a Python console and type:
import json
import requests
data = {'test': 'Hello'}
data_json = json.dumps(data)
r = requests.get('http://localhost:8069/test/result', data=data_json)
When I print r in the console, it returns <Response [404]>, and I cannot see any message in the log (I was expecting The controller is called.).
There is a similar question here, but it is not exactly the same case:
OpenERP @http.route('demo_json', type="json") URL not displaying JSON Data
Can anyone help me? What am I doing wrong?
I have just solved the problem.
First, as @techsavvy pointed out, I had to modify the decorator to use type='http' instead of type='json'.
After that, the request from the console still returned a 404 error because the server did not know which database the data was meant for: I had more than one database at localhost:8069. Once only one database was available at that port, it worked.
To manage that without removing any of the other databases, I modified the db_filter parameter in the config file and set it to a regular expression that matches only my current database.
I have gone through your issue and noticed that you have written a JSON route, which is meant to be called from JavaScript. If you want to call it by entering the URL in the browser, you have to define the route with the type="http" and auth="public" arguments:
@http.route('/', type='http', auth="public", website=True)
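If the route is kept as type='json' instead, the client side presumably has to change: as far as I know, Odoo treats such routes as JSON-RPC endpoints, so they expect a POST with a Content-Type of application/json and the arguments under a "params" key. The sketch below only builds such a request with the standard library and does not send it. The envelope fields and the build_jsonrpc_request name are assumptions to verify against your Odoo version.

```python
import json
from urllib.request import Request

def build_jsonrpc_request(url, params):
    """Build (but do not send) a JSON-RPC style POST request."""
    # Assumed envelope: the route's arguments travel under "params".
    payload = {'jsonrpc': '2.0', 'method': 'call', 'params': params}
    return Request(
        url,
        data=json.dumps(payload).encode('utf-8'),  # supplying data makes this a POST
        headers={'Content-Type': 'application/json'},
    )

req = build_jsonrpc_request('http://localhost:8069/test/result', {'test': 'Hello'})
print(req.get_method())
print(req.data.decode('utf-8'))
```

Passing req to urllib.request.urlopen would actually send it; with several databases on the same port, the database selection issue described above still applies.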

Where do I find the Google Places API Client Library for Python?

It's not under the supported libraries here:
https://developers.google.com/api-client-library/python/reference/supported_apis
Is it just not available with Python? If not, what language is it available for?
Andre's answer points you at a correct place to reference the API. Since your question was python specific, allow me to show you a basic approach to building your submitted search URL in python. This example will get you all the way to search content in just a few minutes after you sign up for Google's free API key.
ACCESS_TOKEN = 'YOUR_API_KEY'  # Get one of these following the directions on the places page
import urllib

def build_URL(search_text='', types_text=''):
    base_url = 'https://maps.googleapis.com/maps/api/place/textsearch/json'  # Can change json to xml to change output type
    key_string = '?key=' + ACCESS_TOKEN  # First thing after the base_url starts with ? instead of &
    query_string = '&query=' + urllib.quote(search_text)  # urllib.quote is the Python 2 spelling
    sensor_string = '&sensor=false'  # Presumably you are not getting location from device GPS
    type_string = ''
    if types_text != '':
        type_string = '&types=' + urllib.quote(types_text)  # More on types: https://developers.google.com/places/documentation/supported_types
    url = base_url + key_string + query_string + sensor_string + type_string
    return url

print(build_URL(search_text='Your search string here'))
This code will build and print a URL searching for whatever you put in the last line replacing "Your search string here". You need to build one of those URLs for each search. In this case I've printed it so you can copy and paste it into your browser address bar, which will give you a return (in the browser) of a JSON text object the same as you will get when your program submits that URL. I recommend using the python requests library to get that within your program and you can do that simply by taking the returned URL and doing this:
response = requests.get(url)
Next up you need to parse the returned response JSON, which you can do by converting it with the json library (look for json.loads for example). After running that response through json.loads you will have a nice python dictionary with all your results. You can also paste that return (e.g. from the browser or a saved file) into an online JSON viewer to understand the structure while you write code to access the dictionary that comes out of json.loads.
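As a rough sketch of that parsing step: the payload below is a made-up, trimmed stand-in for what the API returns. The results, name, formatted_address and status keys are assumed from the Places text search format, so check them against a real response. With requests, the text would come from response.text.

```python
import json

# Hypothetical, heavily trimmed response text for illustration only.
response_text = '''
{
  "status": "OK",
  "results": [
    {"name": "Example Cafe", "formatted_address": "1 Example St"},
    {"name": "Sample Diner", "formatted_address": "2 Sample Ave"}
  ]
}
'''

# json.loads turns the text into ordinary dicts and lists.
data = json.loads(response_text)
names = [place['name'] for place in data.get('results', [])]
print(data['status'])
print(names)
```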
Please feel free to post more questions if part of this isn't clear.
Somebody has written a wrapper for the API: https://github.com/slimkrazy/python-google-places
Basically it's just HTTP with JSON responses. It's easier to access through JavaScript but it's just as easy to use urllib and the json library to connect to the API.
Ezekiel's answer worked great for me and all of the credit goes to him. I had to change his code in order for it to work with python3. Below is the code I used:
import urllib.parse

# ACCESS_TOKEN is defined as in the answer above.
def build_URL(search_text='', types_text=''):
    base_url = 'https://maps.googleapis.com/maps/api/place/textsearch/json'
    key_string = '?key=' + ACCESS_TOKEN
    query_string = '&query=' + urllib.parse.quote(search_text)
    type_string = ''
    if types_text != '':
        type_string = '&types=' + urllib.parse.quote(types_text)
    url = base_url + key_string + query_string + type_string
    return url
The changes: urllib.quote became urllib.parse.quote, and sensor was removed because Google is deprecating it.
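A possibly tidier variant, offered as a sketch rather than a drop-in replacement: urllib.parse.urlencode can assemble the whole query string, which avoids tracking which parameter gets the ? and which get the &. The build_url name and the api_key argument are hypothetical, and note that urlencode writes spaces as + where quote writes %20.

```python
from urllib.parse import urlencode

BASE_URL = 'https://maps.googleapis.com/maps/api/place/textsearch/json'

def build_url(api_key, search_text='', types_text=''):
    """Build a text search URL; urlencode handles the separators and the escaping."""
    params = {'key': api_key, 'query': search_text}
    if types_text:
        params['types'] = types_text
    return BASE_URL + '?' + urlencode(params)

url = build_url('YOUR_API_KEY', search_text='pizza in New York')
print(url)
```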