Here is the sample JSON reply that jQuery UI autocomplete needs. It looks like only label and value are required in my case.
I have the following code:
class City(db.Model):
    '''Storage for cities ids.

    Index
        key_name: id of the city
        parent: Country of the city
    '''
    city_name = db.StringProperty()

term = self.request.get('term')
query = City.all()
query.filter('city_name >=', term)
query.filter('city_name <=', unicode(term) + u"\ufffd")
cities = query.fetch(20, 0)
How can I format the result as JSON, with value = city_name and id = key_name?
I've also seen the following code somewhere, but it doesn't work for me:
map(lambda x: x.city_name(), cities)
You can use simplejson which is included in django.utils:
from django.utils import simplejson as json
Then create an array of dictionaries and json encode it:
city_array = []
for city in cities:
    city_array.append({'value': city.city_name,
                       'label': city.city_name,
                       'id': city.key().name()})
json_message = json.dumps(city_array)
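As a self-contained illustration of the same idea, here is a minimal sketch that runs without App Engine. FakeCity and FakeKey are hypothetical stand-ins for the datastore entities (they are not part of any library), and the standard-library json module is used in place of django.utils.simplejson; dumps() is called the same way.

```python
import json


class FakeKey(object):
    """Hypothetical stand-in for a datastore key."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class FakeCity(object):
    """Hypothetical stand-in for a City entity."""

    def __init__(self, key_name, city_name):
        self._key = FakeKey(key_name)
        self.city_name = city_name  # a plain attribute, not a method

    def key(self):
        return self._key


def cities_to_autocomplete_json(cities):
    """Build the list of label/value/id dicts and encode it as JSON."""
    city_array = [
        {'value': city.city_name,
         'label': city.city_name,
         'id': city.key().name()}
        for city in cities
    ]
    return json.dumps(city_array)


cities = [FakeCity('101', 'Moscow'), FakeCity('102', 'Mosul')]
json_message = cities_to_autocomplete_json(cities)
print(json_message)
```

Note that city_name is read as an attribute here, which is why calling it as x.city_name() in the map() attempt from the question fails.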
I have been trying to scrape some data but keep getting a blank value or None. I tried using the following sibling and failed (I probably did it wrong). Any and all help is greatly appreciated. Thank you in advance.
Website to scrape (final): https://www.unegui.mn/azhild-avna/ulan-bator/
Website to test (current, has less listings): https://www.unegui.mn/azhild-avna/mt-hariltsaa-holboo/slzhee-tehnik-hangamzh/ulan-bator/
Code Snippet:
def parse(self, response, **kwargs):
    cards = response.xpath("//li[contains(@class,'announcement-container')]")

    # parse details
    for card in cards:
        company = card.xpath(".//*[@class='announcement-block__company-name']/text()").extract_first()
        date_block = card.xpath("normalize-space(.//div[contains(@class,'announcement-block__date')]/text())").extract_first().split(',')
        date = date_block[0]
        city = date_block[1]
        item = {'date': date,
                'city': city,
                'company': company
                }
HTML Snippet:
<div class="announcement-block__date">
<span class="announcement-block__company-name">Электро экспресс ХХК</span>
, Өчигдөр 13:05, Улаанбаатар</div>
Expected Output:
date = Өчигдөр 13:05
city = Улаанбаатар
UPDATE: I figured out how to get my date and city data. I ended up using following-sibling to get the text after the span, splitting it by comma, and taking the 2nd and 3rd values.
date_block = card.xpath("normalize-space(.//div[contains(@class,'announcement-block__date')]/span/following-sibling::text())").extract_first().split(',')
date = date_block[1]
city = date_block[2]
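To show why the indices shift by one, here is a small plain-Python sketch. It assumes the text after the span is exactly the string from the HTML snippet above, and it imitates normalize-space() with a whitespace collapse rather than running XPath.

```python
# Text node that follows the <span> in the HTML snippet above (assumed input).
raw_text = "\n, Өчигдөр 13:05, Улаанбаатар"

# Rough imitation of XPath normalize-space(): trim and collapse whitespace.
normalized = " ".join(raw_text.split())

# The text starts with a comma, so the first piece is an empty string.
parts = [part.strip() for part in normalized.split(",")]

date = parts[1]
city = parts[2]
print(parts)
print(date, city)
```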
Extra:
I would also greatly appreciate it if anyone could tell me, or point me to, how to set up my pipeline file. Is it correct to use a pipeline, or should I use items.py? Currently I have 3 spiders in the same project folder: apartments, jobs, cars. I need to clean and transform my data. For example, for the jobs spider shown above, I want to apply the following manipulations:
if salary is < 1000, replace it with the string 'Negotiable'
if date contains the text "Өчигдөр", replace that text with 'Yesterday' without deleting the time
if employer contains the value 'Хувь хүн', change the company value to 'Хувь хүн'
my pipelines.py file:
from itemadapter import ItemAdapter

class ScrapebooksPipeline:
    def process_item(self, item, spider):
        return item
my items.py file:
import scrapy

class ScrapebooksItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass
I narrowed your XPath to a smaller scope.
extract_first() returns only the first match, so use getall() instead.
To get the date I had to use a regex (most of the results have a time but no date, so a blank date is perfectly fine).
I can't read the language, so I had to guess (kind of) which part is the city, but even if it's wrong you can get the point.
import scrapy
import re

class TempSpider(scrapy.Spider):
    name = 'temp_spider'
    allowed_domains = ['unegui.mn']
    start_urls = ['https://www.unegui.mn/azhild-avna/ulan-bator/']

    def parse(self, response, **kwargs):
        cards = response.xpath('//div[@class="announcement-block__date"]')

        # parse details
        for card in cards:
            company = card.xpath('.//span/text()').get()
            date_block = card.xpath('./text()').getall()
            date = date_block[1].strip()
            date = re.findall(r'(\d+-\d+-\d+)', date)
            if date:
                date = date[0]
            else:
                date = ''
            city = date_block[1].split(',')[2].strip()

            item = {'date': date,
                    'city': city,
                    'company': company
                    }
            yield item
Output:
[scrapy.core.scraper] DEBUG: Scraped from <200 https://www.unegui.mn/azhild-avna/ulan-bator/>
{'date': '2021-11-07', 'city': 'Улаанбаатар', 'company': 'Arirang'}
[scrapy.core.scraper] DEBUG: Scraped from <200 https://www.unegui.mn/azhild-avna/ulan-bator/>
{'date': '2021-11-11', 'city': 'Улаанбаатар', 'company': 'Altangadas'}
[scrapy.core.scraper] DEBUG: Scraped from <200 https://www.unegui.mn/azhild-avna/ulan-bator/>
...
...
...
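For the pipeline part of the question: cleaning and transforming items is what process_item in pipelines.py is for, while items.py only declares the fields. Below is a minimal sketch, not a tested project file. The class name JobsCleanupPipeline and the item keys 'salary' and 'employer' are assumptions (the spider above only yields date, city and company), and it assumes the items are plain dicts.

```python
class JobsCleanupPipeline:
    """Hypothetical pipeline applying the three clean-up rules from the question."""

    def process_item(self, item, spider):
        # Rule 1: a numeric salary below 1000 becomes the string 'Negotiable'.
        salary = item.get('salary')
        if isinstance(salary, (int, float)) and salary < 1000:
            item['salary'] = 'Negotiable'

        # Rule 2: replace 'Өчигдөр' with 'Yesterday' but keep the time.
        date = item.get('date')
        if date and 'Өчигдөр' in date:
            item['date'] = date.replace('Өчигдөр', 'Yesterday')

        # Rule 3: an employer value of 'Хувь хүн' overrides the company name.
        employer = item.get('employer')
        if employer and 'Хувь хүн' in employer:
            item['company'] = 'Хувь хүн'

        return item


# Quick check outside Scrapy; the spider argument is unused here.
sample = {'salary': 500, 'date': 'Өчигдөр 13:05',
          'employer': 'Хувь хүн', 'company': 'Arirang'}
cleaned = JobsCleanupPipeline().process_item(sample, spider=None)
print(cleaned)
```

To apply the rules only to the jobs spider, one option is to check spider.name at the top of process_item. The pipeline also has to be listed in ITEM_PIPELINES in settings.py before Scrapy will call it.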
It looks like you are missing indentation.
Instead of:
def parse(self, response, **kwargs):
    cards = response.xpath("//li[contains(@class,'announcement-container')]")

    # parse details
    for card in cards:
    date_block = card.xpath("normalize-space(.//div[contains(@class,'announcement-block__date')]/text())").extract_first().split(',')
    date = date_block[0]
    city = date_block[1]
Try this:
def parse(self, response, **kwargs):
    cards = response.xpath("//li[contains(@class,'announcement-container')]")

    # parse details
    for card in cards:
        date_block = card.xpath("normalize-space(.//div[contains(@class,'announcement-block__date')]/text())").extract_first().split(',')
        date = date_block[0]
        city = date_block[1]
I'm using marshmallow to validate some fields of a schema.
class PlantDetailsSchema(Schema):
    name: fields.Str(required=True, validate=validate.Length(min=3)),
    sprout-time: fields.Str(required=True, validate=validate.Length(min=3)),
    full-growth: fields.Str(required=True, validate=validate.Length(min=5)),
    edible: fields.Bool(required=True)

class PlantInfoSchema(Schema):
    plant = fields.Nested(PlantDetailsSchema)
    num = fields.Int(required=True, validate=validate.Range(min=1))

def validate_json(json):
    try:
        schema = PlantInfoSchema().load(json)
        ret = schema.dump()
        return ret
    except ValidationError:
        return None
The json element to validate is:
json = {'plant': {'name': 'Bonsai', 'sprout-time': '3 months', 'full-growth': '2 years', 'edible': False}, 'num': 3}
My issue is how to validate fields that contain a dash in the middle of the name (like 'sprout-time' and 'full-growth'). Do you have any ideas for solving this?
You can use the 'data_key' argument to specify a key name that contains a dash. Give the field a valid Python name such as 'sprout_time' (it could be anything), and pass the actual key, 'sprout-time', as data_key.
class PlantDetailsSchema(Schema):
    name = fields.Str(required=True, validate=validate.Length(min=3))
    sprout_time = fields.Str(
        data_key='sprout-time',
        required=True, validate=validate.Length(min=3)
    )
    full_growth = fields.Str(
        data_key='full-growth',
        required=True,
        validate=validate.Length(min=5)
    )
    edible = fields.Bool(required=True)
I have a JSON which looks like this
{"name":"Michael", "cities":["palo alto", "menlo park"], "schools":[{"sname":"stanford", "year":2010}, {"sname":"berkeley","year":2012}]}
I want to store output in a csv file like this:
Michael,{"sname":"stanford", "year":2010}
Michael,{"sname":"berkeley", "year":2012}
I have tried the following:
val people = sqlContext.read.json("people.json")
val flattened = people.select($"name", explode($"schools").as("schools_flat"))
The above code does not give schools_flat as a JSON string.
Any ideas on how to get the expected output?
Thanks
You need to specify schema explicitly to read the json file in the desired way.
In this case it would be like this:
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType
case class json_schema_class( cities: String, name : String, schools: Array[String])
var json_schema = ScalaReflection.schemaFor[json_schema_class].dataType.asInstanceOf[StructType]
var people = sqlContext.read.schema( json_schema ).json("people.json")
var flattened = people.select($"name", explode($"schools").as("schools_flat"))
The 'flattened' dataframe is like this:
+-------+--------------------+
| name| schools_flat|
+-------+--------------------+
|Michael|{"sname":"stanfor...|
|Michael|{"sname":"berkele...|
+-------+--------------------+
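For comparison, the target output itself can be sketched in plain Python without Spark. This is not the Spark approach above, only an illustration of the one-row-per-school shape; it assumes the record is small enough to handle in memory and uses only the standard json module.

```python
import json

record_text = ('{"name":"Michael", "cities":["palo alto", "menlo park"], '
               '"schools":[{"sname":"stanford", "year":2010}, '
               '{"sname":"berkeley","year":2012}]}')

record = json.loads(record_text)

# One output line per school: the name, a comma, then the school as JSON text.
lines = [
    "{},{}".format(record["name"], json.dumps(school))
    for school in record["schools"]
]

for line in lines:
    print(line)
```

Note that the JSON text itself contains commas, so a strict CSV reader would need the second column quoted.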
I am working on a project in which I am required to store information entered into a form in a database column as JSON. The form does not have a model of its own; all its values will be stored as JSON in a column of another model. Here is the model:
class Document(models.Model):
    user = models.ForeignKey(User)
    document = models.JSONField(default = {})
    category = models.CharField(max_length=255)
Now I am required to store JSON data from different forms (different categories) in the document column. Here is one such form:
class InformalLetterForm(forms.Form):
    sender_name = forms.CharField(max_length=45)
    sender_address = forms.CharField(max_length=255)
    date = forms.DateTimeField()
    message_body = forms.CharField()
    receiver_name = forms.CharField(max_length=255)
How do I serialize the data entered in such a form to a JSON object to be stored in a database column (i.e. the document column above)?
I have searched online, but I have only seen serialization done for data from model forms.
Thanks for any help.
You can read the form's .cleaned_data attribute, which is available once is_valid() has been called and holds a dictionary of the form's data as Python objects. You can then pass that dictionary to the dumps() function from Python's json library. Let's take an example from the docs:
>>> data = {'subject': 'hello',
...         'message': 'Hi there',
...         'sender': 'foo@example.com',
...         'cc_myself': True}
>>> f = ContactForm(data)
>>> f.is_valid()
True
>>> f.cleaned_data
{'cc_myself': True, 'message': 'Hi there', 'sender': 'foo@example.com', 'subject': 'hello'}
Here you have a dictionary with your data, now let's make it a json:
import json
# An example simple dict
d = {'a': 1, 'b': 2}
json.dumps(d)
# '{"a": 1, "b": 2}'
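One caveat for InformalLetterForm: its DateTimeField puts a datetime object into cleaned_data, and json.dumps() raises TypeError on datetime values by default. Here is a minimal sketch of one way around that, using only the standard library. The dictionary below is a hand-written stand-in for form.cleaned_data, and encode_extra is a hypothetical helper name.

```python
import datetime
import json

# Hand-written stand-in for form.cleaned_data after is_valid() returned True.
cleaned_data = {
    'sender_name': 'Ann',
    'sender_address': '1 Main Street',
    'date': datetime.datetime(2021, 11, 7, 13, 5),
    'message_body': 'Hello!',
    'receiver_name': 'Bob',
}


def encode_extra(value):
    """Fallback for json.dumps: turn dates and datetimes into ISO 8601 strings."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    raise TypeError('Cannot serialize %r' % (value,))


document_json = json.dumps(cleaned_data, default=encode_extra)
print(document_json)
```

Django also ships django.core.serializers.json.DjangoJSONEncoder, which can be passed as cls= to json.dumps() and handles dates among other types.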
I have a function which returns json data as history from Version of reversion.models.
from django.http import HttpResponse
from reversion.models import Version
from django.contrib.admin.models import LogEntry
import json

def history_list(request):
    history_list = Version.objects.all().order_by('-revision__date_created')
    data = []
    for i in history_list:
        data.append({
            'date_time': str(i.revision.date_created),
            'user': str(i.revision.user),
            'object': i.object_repr,
            'field': i.revision.comment.split(' ')[-1],
            'new_value_field': str(i.field_dict),
            'type': i.content_type.name,
            'comment': i.revision.comment
        })
    data_ser = json.dumps(data)
    return HttpResponse(data_ser, content_type="application/json")
When I run the above snippet I get the output json as
[{"type": "fruits", "field": "colour", "object": "anyobject", "user": "anyuser", "new_value_field": "{'price': $23, 'weight': 2kgs, 'colour': 'red'}", "comment": "Changed colour."}]
From the function above,
'comment': i.revision.comment
produces "comment": "Changed colour." in the JSON, and colour is the field name, which my function extracts from the comment with
'field': i.revision.comment.split(' ')[-1]
But I assume that getting the field name and value from field_dict is a better approach.
Problem: from the JSON list above I would like to extract new_value_field and the old value. In new_value_field I want only the value of colour.
Getting the changed fields isn't as easy as checking the comment, as this can be overridden.
Django-reversion just takes care of storing each version, not comparing.
Your best option is to look at the django-reversion-compare module and its admin.py code.
The majority of the code in there is designed to produce a neat side-by-side HTML diff page, but the code should be able to be re-purposed to generate a list of changed fields per object (as there can be more than one changed field per version).
The code should* include a view-independent way to get the changed fields at some point, but this should get you started:
from django.db import models
from reversion_compare.admin import CompareObjects
from reversion.revisions import default_revision_manager

def changed_fields(obj, version1, version2):
    """
    Create a generic html diff from the obj between version1 and version2:
    A diff of every changes field values.
    This method should be overwritten, to create a nice diff view
    coordinated with the model.
    """
    diff = []

    # Create a list of all normal fields and append many-to-many fields
    fields = [field for field in obj._meta.fields]
    concrete_model = obj._meta.concrete_model
    fields += concrete_model._meta.many_to_many

    # This gathers the related reverse ForeignKey fields, so we can do ManyToOne compares
    reverse_fields = []
    # From: http://stackoverflow.com/questions/19512187/django-list-all-reverse-relations-of-a-model
    changed_fields = []
    for field_name in obj._meta.get_all_field_names():
        f = getattr(
            obj._meta.get_field_by_name(field_name)[0],
            'field',
            None
        )
        if isinstance(f, models.ForeignKey) and f not in fields:
            reverse_fields.append(f.rel)
    fields += reverse_fields

    for field in fields:
        try:
            field_name = field.name
        except AttributeError:
            # is a reverse FK field
            field_name = field.field_name

        is_reversed = field in reverse_fields
        obj_compare = CompareObjects(field, field_name, obj, version1, version2, default_revision_manager, is_reversed)
        if obj_compare.changed():
            changed_fields.append(field)

    return changed_fields
This can then be called like so:
changed_fields(MyModel, history_list_item1, history_list_item2)
Where history_list_item1 and history_list_item2 correspond to various actual Version items.
*: Said as a contributor, I'll get right on it.
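If only the field names and their old and new values are needed, a simpler sketch is to compare the field_dict of two consecutive versions of the same object directly. The helper name diff_field_dicts is hypothetical, and the two dictionaries below are made-up stand-ins for the field_dict of an older and a newer Version. It does not handle related objects the way CompareObjects does.

```python
def diff_field_dicts(old_fields, new_fields):
    """Return {field_name: {'old': ..., 'new': ...}} for every changed field."""
    changes = {}
    # Look at every key present in either dictionary.
    for name in sorted(set(old_fields) | set(new_fields)):
        old_value = old_fields.get(name)
        new_value = new_fields.get(name)
        if old_value != new_value:
            changes[name] = {'old': old_value, 'new': new_value}
    return changes


# Made-up stand-ins for two Version.field_dict values.
older = {'price': 23, 'weight': 2, 'colour': 'green'}
newer = {'price': 23, 'weight': 2, 'colour': 'red'}

changes = diff_field_dicts(older, newer)
print(changes)
```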