Getting AttributeError error 'str' object has no attribute 'get' - json

I am getting an error while working with JSON response:
Error: AttributeError: 'str' object has no attribute 'get'
What could be the issue?
I am also getting the following errors for the rest of the values:
***TypeError: 'builtin_function_or_method' object is not subscriptable
'Phone': value['_source']['primaryPhone'],
KeyError: 'primaryPhone'***
# -*- coding: utf-8 -*-
import scrapy
import json
class MainSpider(scrapy.Spider):
name = 'main'
start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']
def parse(self, response):
resp = json.loads(response.body)
values = resp['hits']['hits']
for value in values:
yield {
'Full Name': value['_source']['fullName'],
'Phone': value['_source']['primaryPhone'],
"Email": value['_source']['primaryEmail'],
"City": value.get['_source']['city'],
"Zip Code": value.get['_source']['zipcode'],
"Website": value['_source']['websiteURL'],
"Facebook": value['_source']['facebookURL'],
"LinkedIn": value['_source']['LinkedIn_URL'],
"Twitter": value['_source']['Twitter'],
"BIO": value['_source']['Bio']
}

It's nested deeper than what you think it is. That's why you're getting an error.
Code Example
import scrapy
import json
class MainSpider(scrapy.Spider):
name = 'test'
start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']
def parse(self, response):
resp = json.loads(response.body)
values = resp['hits']['hits']
for value in values:
yield {
'Full Name': value['_source']['fullName'],
'Primary Phone':value['_source']['primaryPhone']
}
Explanation
The resp variable is creating a python dictionary, but there is no resp['hits']['hits']['fullName'] within this JSON data. The data you're looking for, for fullName is actually resp['hits']['hits'][i]['_source']['fullName']. i being an number because resp['hits']['hits'] is a list.
resp['hits'] is a dictionary and therefore the values variable is fine.
But resp['hits']['hits'] is a list, therefore you can't use the get request, and it's only accepts numbers as values within [], not strings. Hence the error.
Tips
Use response.json() instead of json.loads(response.body), since Scrapy v2.2, scrapy now has support for json internally. Behind the scenes it already imports json.
Also check the json data, I used requests for ease and just getting nesting down till I got the data you needed.
Yielding a dictionary is fine for this type of data as it's well structured, but any other data that needs modifying or changing or is wrong in places. Use either Items dictionary or ItemLoader. There's a lot more flexibility in those two ways of yielding an output than yielding a dictionary. I almost never yield a dictionary, the only time is when you have highly structured data.
Updated Code
Looking at the JSON data, there are quite a lot of missing data. This is part of web scraping you will find errors like this. Here we use a try and except block, for when we get a KeyError which means python hasn't been able to recognise the key associated with a value. We have to handle that exception, which we do here by saying to yield a string 'No XXX'
Once you start getting gaps etc it's better to consider an Items dictionary or Itemloaders.
Now it's worth looking at the Scrapy docs about Items. Essentially Scrapy does two things, it extracted data from websites, and it provides a mechanism for storing this data. The way it does this is storing it in a dictionary called Items. The code isn't that much different from yielding a dictionary but Items dictionary allows you to manipulate the extracted data more easily with extra things scrapy can do. You need to edit your items.py first with the fields you want. We create a class called TestItem, we define each field using scrapy.Field(). We then can import this class in our spider script.
items.py
import scrapy
class TestItem(scrapy.Item):
# define the fields for your item here like:
# name = scrapy.Field()
full_name = scrapy.Field()
Phone = scrapy.Field()
Email = scrapy.Field()
City = scrapy.Field()
Zip_code = scrapy.Field()
Website = scrapy.Field()
Facebook = scrapy.Field()
Linkedin = scrapy.Field()
Twitter = scrapy.Field()
Bio = scrapy.Field()
Here we're specifying what we want the fields to be, you can't use a string with spaces unfortunately hence why full name is full_name. The field() creates the field of the item dictionary for us.
We import this item dictionary into our spider script with from ..items import TestItem. The from ..items means we're taking the items.py from the parent folder to the spider script and we're importing the class TestItem. That way our spider can populate the items dictionary with our json data.
Note that just before the for loop we instantiate the class TestItem by item = TestItem(). Instantiate means to call upon the class, in this case it makes a dictionary. This means we are creating the item dictionary and then we populate that dictionary with keys and values. You have to does this before you add your keys and values as you can see from within the for loop.
Spider script
import scrapy
import json
from ..items import TestItem
class MainSpider(scrapy.Spider):
name = 'test'
start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']
def parse(self, response):
resp = json.loads(response.body)
values = response.json()['hits']['hits']
item = TestItem()
for value in values:
try:
item['full_name'] = value['_source']['fullName']
except KeyError:
item['full_name'] = 'No Name'
try:
item['Phone'] = value['_source']['primaryPhone']
except KeyError:
item['Phone'] = 'No Phone number'
try:
item["Email"] = value['_source']['primaryEmail']
except KeyError:
item['Email'] = 'No Email'
try:
item["City"] = value['_source']['activeLocations'][0]['city']
except KeyError:
item['City'] = 'No City'
try:
item["Zip_code"] = value['_source']['activeLocations'][0]['zipcode']
except KeyError:
item['Zip_code'] = 'No Zip code'
try:
item["Website"] = value['AgentMarketingCenter'][0]['Website']
except KeyError:
item['Website'] = 'No Website'
try:
item["Facebook"] = value['_source']['AgentMarketingCenter'][0]['Facebook_URL']
except KeyError:
item['Facebook'] = 'No Facebook'
try:
item["Linkedin"] = value['_source']['AgentMarketingCenter'][0]['LinkedIn_URL']
except KeyError:
item['Linkedin'] = 'No Linkedin'
try:
item["Twitter"] = value['_source']['AgentMarketingCenter'][0]['Twitter']
except KeyError:
item['Twitter'] = 'No Twitter'
try:
item["Bio"]: value['_source']['AgentMarketingCenter'][0]['Bio']
except KeyError:
item['Bio'] = 'No Bio'
yield item

Related

Django View: Return Queryset in JSON Format

i am trying to make the following view with return JsonResponse() at the end work correctly:
def get_data(request):
full_data = Fund.objects.all()
data = {
"test2": full_data.values('investment_strategy').annotate(sum=Sum('commitment')),
}
return JsonResponse(data)
However, I get an error message saying "Object of type QuerySet is not JSON serializable".
When I put the above Queryset in a view with return render() at the end:
def get_more_data(request):
full_data = Fund.objects.all()
data = {"test2": full_data.values('investment_strategy').annotate(sum=Sum('commitment'))}
return render (request, 'test.html', data)
I get the the following result: <QuerySet [{'investment_strategy': 'Buyout', 'sum': 29}, {'investment_strategy': 'Growth', 'sum': 13}, {'investment_strategy': 'Miscellaneous', 'sum': 14}, {'investment_strategy': 'Venture Capital', 'sum': 23}, {'investment_strategy': 'n/a', 'sum': 36}]>
So the queryset works fine, I just have no clue how to return the data in proper Json format (which I would need to use the data charts.js)
I looked through answers for similar questions such as:
TypeError: object is not JSON serializable in DJango 1.8 Python 3.4
Output Django queryset as JSON
etc.
but could not find a meaningful solution for my problem.
Any help would be much appreciated!
So I managed to find a solution, that worked for me - in case anyone else has the same problem. I changed my view to the following:
def get_data(request):
full_data = Fund.objects.all()
full_data_filtered = full_data.values('investment_strategy').annotate(sum=Sum('commitment'))
labels = []
values = []
for d in full_data_filtered:
labels.append(d['investment_strategy'])
values.append(d['sum'])
data = {
"labels": labels,
"values": values,
}
return JsonResponse(data)
So basically I iterate over the Queryset and assign the values I need to lists, which can be passed to JsonResponse. I don't know if this is the most elegant way to do this (sure not), but it works and I can render my data in charts.js
JsonResponse(list(data)) will evaluate the Queryset (actually perform the query to the database) and turn it into a list that can be passed to JsonResponse.
This works because you use values and annotate, so the list is a list of dictionaries containing serializable fields.
In the example you mentioned, it didn't work because the Queryset was just returning a list of model instances, so wrapping in list() isn't enough. If you hadn't added values, you'd have had a list of Fund instances which are not serializable.
The best way I found was to create a custom QuerySet and Manager, it's not a lot of code and it is reusable!
I began by creating the custom QuerySet:
# managers.py but you can do that in the models.py too
from django.db import models
class DictQuerySet(models.QuerySet):
def dict(self):
values = self.values()
result = {}
for value in values:
id = value['id']
result[id] = value # you can customize the content of your dictionary here
return result
Then I created a custom Manager, it is optional but i prefer this way.
# managers.py, also optional
class DictManager(models.Manager):
def get_queryset(self):
return DictQuerySet(self.model, using=self._db)
Change the default manager in your model:
# models.py
from .managers import DictManager
class Fund(models.Model):
# ...
objects = DictManager()
# ...
And now you can call the dict() method from the query
# views.py
def get_data(request):
full_data = Fund.objects.all().dict()
return JsonResponse(full_data)
The response will be the full_data as a dictionary of dictionaries and each key is the primary key of the corresponding object.
If you intend to keep the same format for your JSONs, then you can use the same custom manager for all your models.

Return proper JSON from Django JsonResponse

My Model:
class Person(models.Model):
first_name = models.CharField(max_length=30)
last_name = models.CharField(max_length=30)
phone = models.CharField(max_length=20)
email = models.EmailField()
My View:
def users(request):
people = Person.objects.all()
data = serializers.serialize('json', people)
return JsonResponse(data, safe=False)
All I want back is the data in JSON format. What I'm getting back is this:
"[{\"model\": \"myapp.person\", \"pk\": 1, \"fields\": {\"first_name\": \"ahmet\", \"last_name\": \"arsan\", \"phone\": \"xxx-xxx-xxxx\", \"email\": \"aarsan#xxxxxxxx.com\"}}]"
While technically that is valid JSON, there are 2 problems (for me) with this response:
I don't want those double quotes escaped.
I don't need the model name (myapp.person).
I don't know if/what I'm doing wrong, but it seems like something is off here. Perhaps my query should be returning a dict but I don't know how to get it to do that. I am using Django 1.10.1, Python 3.4.
You already have an answer, but what you're doing wrong is double encoding. JsonResponse serialises to json, but you already have json as that's what's returned from the serialiser.
Either serialise to "python" or use a standard HttpResponse.
I am assuming you are asking this question for an API response. I would suggest using Rest Framework for that as it makes things very easy. You can select your own fields by writing your own serializers for the model.
from rest_framework import serializers
class PersonSerializer(serializers.ModelSerializer):
class Meta:
model = Person
fields = ('first_name', 'last_name', 'phone', 'email')

HTTPResponse' object has no attribute 'decode

I was getting the following error initially when I was trying to run the code below-
Error:-the JSON object must be str, not 'bytes'
import urllib.request
import json
search = '230 boulder lane cottonwood az'
search = search.replace(' ','%20')
places_api_key = 'AIzaSyDou2Q9Doq2q2RWJWncglCIt0kwZ0jcR5c'
url = 'https://maps.googleapis.com/maps/api/place/textsearch/json?query='+search+'&key='+places_api_key
json_obj = urllib.request.urlopen(url)
data = json.load(json_obj)
for item in data ['results']:
print(item['formatted_address'])
print(item['types'])
After making some troubleshooting changes like:-
json_obj = urllib.request.urlopen(url)
obj = json.load(json_obj)
data = json_obj .readall().decode('utf-8')
Error - 'HTTPResponse' object has no attribute 'decode'
I am getting the error above, I have tried multiple posts on stackoverflow nothing seem to work. I have uploaded the entire working code if anyone can get it to work I'll be very grateful. What I don't understand is that why the same thing worked for others and not me.
Thanks!
urllib.request.urlopen returns an HTTPResponse object which cannot be directly json decoded (because it is a bytestream)
So you'll instead want:
# Convert from bytes to text
resp_text = urllib.request.urlopen(url).read().decode('UTF-8')
# Use loads to decode from text
json_obj = json.loads(resp_text)
However, if you print resp_text from your example, you'll notice it is actually xml, so you'll want an xml reader:
resp_text = urllib.request.urlopen(url).read().decode('UTF-8')
(Pdb) print(resp_text)
<?xml version="1.0" encoding="UTF-8"?>
<PlaceSearchResponse>
<status>OK</status>
...
update (python3.6+)
In python3.6+, json.load can take a byte stream (and json.loads can take a byte string)
This is now valid:
json_obj = json.load(urllib.request.urlopen(url))

Return QuerySet as JSON?

I'm working in Django 1.8 and having trouble finding the modern way to do this.
This is what I've got, based on Googling and this blog post:
results = PCT.objects.filter(code__startswith='a')
json_res = []
for result in results:
json_res.append(result.as_dict())
return HttpResponse(json.dumps(json_res), content_type='application/json')
However this gives me 'PCT' object has no attribute 'as_dict'.
Surely there must be a neater way by now?
I was wondering if it was possible to use JSONResponse but frustratingly, the docs give no example of how to use JSONRespose with a queryset, which must be the most common use case. I have tried this:
results = PCT.objects.filter(code__startswith='a')
return JsonResponse(results, safe=False)
This gives [<PCT: PCT object>, <PCT: PCT object>] is not JSON serializable.
Simplest solution without any additional framework:
results = PCT.objects.filter(code__startswith='a').values('id', 'name')
return JsonResponse({'results': list(results)})
returns {'results': [{'id': 1, 'name': 'foo'}, ...]}
or if you only need the values:
results = PCT.objects.filter(code__startswith='a').values_list('id', 'name')
return JsonResponse({'results': list(results)})
returns {'results': [[1, 'foo'], ...]}
use values() to return a querydict, and pass that to json.dumps
values = PCT.objects.filter(code__startswith='a').values()
return HttpResponse(json.dumps(values), content_type='application/json')
https://docs.djangoproject.com/en/1.8/ref/models/querysets/#values
Most of these answers are out of date. Here's what I use:
views.py (returns HTML)
from django.shortcuts import render
from django.core import serializers
def your_view(request):
data = serializers.serialize('json', YourModel.objects.all())
context = {"data":data}
return render(request, "your_view.html", context)
views.py (returns JSON)
from django.core import serializers
from django.http import HttpResponse
def your_view(request):
data = serializers.serialize('json', YourModel.objects.all())
return HttpResponse(data, content_type='application/json')
Take a look at Django's serialization framework. It allows not only the XML format, but also JSON and YAML.
The accepted answer, using JsonResponse, is nice and simple. However, it does not return complete objects.
An alternative is to use Django's serializers. Here's an example copied verbatim from the admin actions documentation:
...
response = HttpResponse(content_type="application/json")
serializers.serialize("json", queryset, stream=response)
return response
This is very similar to what happens in Django's JsonResponse, as can be seen in the source.
The main difference is that JsonResponse calls json.dumps() directly, and does not know how to handle querysets, whereas the example above uses serializers.serialize('json', ...), which does know how to handle querysets, and returns complete objects that can also be de-serialized later on.
If you want to save directly to file (using content-disposition: attachment to open a save dialog in the browser), you could use a FileResponse, for example:
...
data = serializers.serialize('json', queryset)
return FileResponse(
io.BytesIO(data.encode('utf-8')),
content_type='application/json',
as_attachment=True,
filename=f'{queryset.model.__name__.lower()}-objects.json'
)

How to save JSON from Vk.com API response to MongoDB?

I am trying to get list of wall post from Vk.com using Vontakte, mongoengine, Django
Here is my view, where wallposts = vk.get('wall.get', owner_id=237897731, offset=0, count=10) is Vk.com API call:
import vkontakte
vk = vkontakte.API(token=access_token)
class VkWallPostListView(ListView):
model = VkWallPost
context_object_name = "vk_list"
def get_template_names(self):
return ["blog/vk_list.html"]
def get_queryset(self):
wallposts = VkWallPost.objects
if 'all_posts' not in self.request.GET:
wallposts = vk.get('wall.get', owner_id=237897731, offset=0, count=10)
for wallpost in wallposts:
wallpost.save()
#wallposts = wallposts.filter(text__startswith='RT')
tag = self.request.GET.get('tag', None)
if tag:
wallposts = wallposts.filter(tags=tag)
return wallposts
Also in this view i am trying to save results of API call to MongoDB right after the actual call:
for wallpost in wallposts:
wallpost.save()
But in browser i see an error:
Exception Value:
'int' object has no attribute 'save'
Exception Location: c:\Users\JOOMLER\BitNami_DjangoStack\django_mongo_test\blog\views.py in get_queryset, line 109
If i remove this two strings for cycle all works fine and shows on the fly data from Vk.com in browser. But i want to save it for later use. So, i assume the problem is how to save JSON response to MongoDB ?
That was simple and i am surprised that no answers:
i need to use raw Pymongo:
wallpost.save()
is not correct, because save django model with predefined fields
VkWallPost._get_collection().insert(vk_post)
this is correct - we use raw pymongo insert
Well, did you see what do you have in wallposts variable?
Cause, I think, error message is pretty clear: in cycle variable wallpost you have int value. And, of course, trying to call save() will throw exception. May be, objects in wallposts have integer indices, do you think?
Try to print what you have in wallposts.