Converting mongoengine objects to JSON - json

i tried to fetch data from mongodb using mongoengine with flask. query is work perfect the problem is when i convert query result into json its show only fields name.
here is my code
view.py
from model import Users
result = Users.objects()
print(dumps(result))
model.py
class Users(DynamicDocument):
meta = {'collection' : 'users'}
user_name = StringField()
phone = StringField()
output
[["id", "user_name", "phone"], ["id", "user_name", "phone"]]
why its show only fields name ?

Your query returns a queryset. Use the .to_json() method to convert it.
Depending on what you need from there, you may want to use something like json.loads() to get a python dictionary.
For example:
from model import Users
# This returns <class 'mongoengine.queryset.queryset.QuerySet'>
q_set = Users.objects()
json_data = q_set.to_json()
# You might also find it useful to create python dictionaries
import json
dicts = json.loads(json_data)

Related

Json string written to Kafka using Spark is not converted properly on reading

I read a .csv file to create a data frame and I want to write the data to a kafka topic. The code is the following
df = spark.read.format("csv").option("header", "true").load(f'{file_location}')
kafka_df = df.selectExpr("to_json(struct(*)) AS value").selectExpr("CAST(value AS STRING)")
kafka_df.show(truncate=False)
And the data frame looks like this:
value
"{""id"":""d215e9f1-4d0c-42da-8f65-1f4ae72077b3"",""latitude"":""-63.571457254062715"",""longitude"":""-155.7055842710919""}"
"{""id"":""ca3d75b3-86e3-438f-b74f-c690e875ba52"",""latitude"":""-53.36506636464281"",""longitude"":""30.069167069917597""}"
"{""id"":""29e66862-9248-4af7-9126-6880ceb3b45f"",""latitude"":""-23.767505281795835"",""longitude"":""174.593140405442""}"
"{""id"":""451a7e21-6d5e-42c3-85a8-13c740a058a9"",""latitude"":""13.02054867061598"",""longitude"":""20.328402498420786""}"
"{""id"":""09d6c11d-7aae-4d17-8cd8-183157794893"",""latitude"":""-81.48976715040848"",""longitude"":""1.1995769642056189""}"
"{""id"":""393e8760-ef40-482a-a039-d263af3379ba"",""latitude"":""-71.73949722379649"",""longitude"":""112.59922770487054""}"
"{""id"":""d6db8fcf-ee83-41cf-9ec2-5c2909c18534"",""latitude"":""-4.034680969008576"",""longitude"":""60.59645511854336""}"
After I wrote it to Kafka I want to read it and transform the binary data from column "value" back to json string but the result is that the value contains only the id, not the whole string. Any ideea why?
from pyspark.sql import functions as F
df = consume_from_event_hub(topic, bootstrap_servers, config, consumer_group)
string_df = df.select(F.col("value").cast("string"))
string_df.display()
value
794541bc-30e6-4c16-9cd0-3c5c8995a3a4
20ea5b50-0baa-47e3-b921-f9a3ac8873e2
598d2fc1-c919-4498-9226-dd5749d92fc5
86cd5b2b-1c57-466a-a3c8-721811ab6959
807de968-c070-4b8b-86f6-00a865474c35
e708789c-e877-44b8-9504-86fd9a20ef91
9133a888-2e8d-4a5a-87ce-4a53e63b67fc
cd5e3e0d-8b02-45ee-8634-7e056d49bf3b
the CSV the format is this
id,latitude,longitude
bd6d98e1-d1da-4f41-94ba-8dbd8c8fce42,-86.06318155350924,-108.14300138138589
c39e84c6-8d7b-4cc5-b925-68a5ea406d52,74.20752175171859,-129.9453606091319
011e5fb8-6ab7-4ee9-97bb-acafc2c71e15,19.302250885973592,-103.2154291337162
You need to remove selectExpr("CAST(value AS STRING)") since to_json already returns a string column
from pyspark.sql.functions import col, to_json, struct
df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(f'{file_location}')
kafka_df = df.select(to_json(struct(col("*"))).alias("value"))
kafka_df.show(truncate=False)
I'm not sure what's wrong with the consumer. That should have worked unless consume_from_event_hub does something specifically to extract the ID column

Django MySQL Query Json field

I have a MySQL database with a table containing a JSON field called things. The JSON looks like this
things = {"value1": "phil", "value2": "jill"}
I have collection of objects that I have pulled from the database via
my_things = Name_table.objects.values
Now, I'd like to filter the my_things collection by one of the JSON fields. I've tried this
my_things =
my_things.filter(things__contains={'value': 'phil'})
which returned an empty collection. I've also tried
my_things = my_things.filter(things={'value': 'phil'})
and
my_things = my_things.filter(things__exact={'value': 'phil'})
I'n using Django 1.10 and MySQL 5.7
Thoughts?
It depends on how exactly do you store JSON in field. If you use django-jsonfield, then your things will be string without spaces, with strings inside of quotation marks: '{"value1":"phil","value2":"jill"}'.
Then, via docs:
my_things = my_things.filter(things__contains='"value1":"phil"')
should return your filtered QuerySet, because
>>> tmp_str = '{"value1":"phil","value2":"jill"}'
>>> '"value1":"phil"' in tmp_str
True

filter particular field name and value from field_dict of package django-reversion

I have a function which returns json data as history from Version of reversion.models.
from django.http import HttpResponse
from reversion.models import Version
from django.contrib.admin.models import LogEntry
import json
def history_list(request):
history_list = Version.objects.all().order_by('-revision__date_created')
data = []
for i in history_list:
data.append({
'date_time': str(i.revision.date_created),
'user': str(i.revision.user),
'object': i.object_repr,
'field': i.revision.comment.split(' ')[-1],
'new_value_field': str(i.field_dict),
'type': i.content_type.name,
'comment': i.revision.comment
})
data_ser = json.dumps(data)
return HttpResponse(data_ser, content_type="application/json")
When I run the above snippet I get the output json as
[{"type": "fruits", "field": "colour", "object": "anyobject", "user": "anyuser", "new_value_field": "{'price': $23, 'weight': 2kgs, 'colour': 'red'}", "comment": "Changed colour."}]
From the function above,
'comment': i.revision.comment
returns json as "comment": "changed colour" and colour is the field which I have written in the function to retrieve it from comment as
'field': i.revision.comment.split(' ')[-1]
But i assume getting fieldname and value from field_dict is a better approach
Problem: from the above json list I would like to filter new_field_value and old_value. In the new_filed_value only value of colour.
Getting the changed fields isn't as easy as checking the comment, as this can be overridden.
Django-reversion just takes care of storing each version, not comparing.
Your best option is to look at the django-reversion-compare module and its admin.py code.
The majority of the code in there is designed to produce a neat side-by-side HTML diff page, but the code should be able to be re-purposed to generate a list of changed fields per object (as there can be more than one changed field per version).
The code should* include a view independent way to get the changed fields at some point, but this should get you started:
from reversion_compare.admin import CompareObjects
from reversion.revisions import default_revision_manager
def changed_fields(obj, version1, version2):
"""
Create a generic html diff from the obj between version1 and version2:
A diff of every changes field values.
This method should be overwritten, to create a nice diff view
coordinated with the model.
"""
diff = []
# Create a list of all normal fields and append many-to-many fields
fields = [field for field in obj._meta.fields]
concrete_model = obj._meta.concrete_model
fields += concrete_model._meta.many_to_many
# This gathers the related reverse ForeignKey fields, so we can do ManyToOne compares
reverse_fields = []
# From: http://stackoverflow.com/questions/19512187/django-list-all-reverse-relations-of-a-model
changed_fields = []
for field_name in obj._meta.get_all_field_names():
f = getattr(
obj._meta.get_field_by_name(field_name)[0],
'field',
None
)
if isinstance(f, models.ForeignKey) and f not in fields:
reverse_fields.append(f.rel)
fields += reverse_fields
for field in fields:
try:
field_name = field.name
except:
# is a reverse FK field
field_name = field.field_name
is_reversed = field in reverse_fields
obj_compare = CompareObjects(field, field_name, obj, version1, version2, default_revision_manager, is_reversed)
if obj_compare.changed():
changed_fields.append(field)
return changed_fields
This can then be called like so:
changed_fields(MyModel,history_list_item1, history_list_item2)
Where history_list_item1 and history_list_item2 correspond to various actual Version items.
*: Said as a contributor, I'll get right on it.

Removing Unnecessary JSON fields using SPARK (SQL)

I'm a new spark user currently playing around with Spark and some big data and I have a question related to Spark SQL or more formally the SchemaRDD. I'm reading a JSON file containing data about some weather forecasts and I'm not really interested in all of the fields that I have ... I only want 10 fields out of 50+ fields returned for each record. Is there a way (similar to filter) that I can use to specify the names of some fields that I want remove from spark.
Just a small descriptive example. Consider I have the Schema "Person" with 3 fields "Name", "Age", and "Gender" and I'm not interested in the "Age" field and wold like to remove it. Can I use spark some how to do that. ? Thanks
If you are using Spark 1.2, you can do the following (using Scala)...
If you already know what fields you want to use, you can construct the schema for these fields and apply this schema to the JSON dataset. Spark SQL will return a SchemaRDD. Then, you can register it and query it as a table. Here is a snippet...
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
// The schema is encoded in a string
val schemaString = "name gender"
// Import Spark SQL data types.
import org.apache.spark.sql._
// Generate the schema based on the string of schema
val schema =
StructType(
schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
// Create the SchemaRDD for your JSON file "people" (every line of this file is a JSON object).
val peopleSchemaRDD = sqlContext.jsonFile("people.txt", schema)
// Check the schema of peopleSchemaRDD
peopleSchemaRDD.printSchema()
// Register peopleSchemaRDD as a table called "people"
peopleSchemaRDD.registerTempTable("people")
// Only values of name and gender fields will be in the results.
val results = sqlContext.sql("SELECT * FROM people")
When you look at the schema of peopleSchemaRDD (peopleSchemaRDD.printSchema()), you will only see name and gender field.
Or, if you want to explore the dataset and determine what fields you want after you see all fields, you can ask Spark SQL to infer the schema for you. Then, you can register the SchemaRDD as a table and use projection to remove unneeded fields. Here is a snippet...
// Spark SQL will infer the schema of the given JSON file.
val peopleSchemaRDD = sqlContext.jsonFile("people.txt")
// Check the schema of peopleSchemaRDD
peopleSchemaRDD.printSchema()
// Register peopleSchemaRDD as a table called "people"
peopleSchemaRDD.registerTempTable("people")
// Project name and gender field.
sqlContext.sql("SELECT name, gender FROM people")
You can specify what fields you would like to have in the schemaRDD. Below is an example. Create a case class, with only the fields that you need. Read the data into an rdd, then specify the only the fileds that you need(in the same order as you have specified the schema in the case class).
Sample Data: People.txt
foo,25,M
bar,24,F
Code:
case class Person(name: String, gender: String)
val people = sc.textFile("People.txt").map(_.split(",")).map(p => Person(p(0), p(2)))
people.registerTempTable("people")

How to create a JSON file from a NDB database using Python

I would like to generate a simple json file from an database.
I am not an expert in parsing json files using python nor NDB database engine nor GQL.
What is the right query to search the data? see https://developers.google.com/appengine/docs/python/ndb/queries
How should I write the code to generate the JSON using the same schema as the json described here below?
Many thanks for your help
Model Class definition using NDB:
# coding=UTF-8
from google.appengine.ext import ndb
import logging
class Albums(ndb.Model):
"""Models an individual Event entry with content and date."""
SingerName = ndb.StringProperty()
albumName = ndb.StringProperty()
Expected output:
{
"Madonna": ["Madonna Album", "Like a Virgin", "True Blue", "Like a Prayer"],
"Lady Gaga": ["The Fame", "Born This Way"],
"Bruce Dickinson": ["Iron Maiden", "Killers", "The Number of the Beast", "Piece of Mind"]
}
For consistency, model names should by singular (Album not Albums), and property names should be lowercase_with_underscores:
class Album(ndb.Model):
singer_name = ndb.StringProperty()
album_name = ndb.StringProperty()
To generate the JSON as described in your question:
1) Query the Album entities from the datastore:
albums = Album.query().fetch(100)
2) Iterate over them to form a python data structure:
albums_dict = {}
for album in albums:
if not album.singer_name in albums_dict:
albums_dict[album.singer_name] = []
albums_dict[album.singer_name].append(album.album_name)
3) use json.dumps() method to encode to JSON.
albums_json = json.dumps(albums_dict)
Alternatively, you could use the built in to_dict() method:
albums = Album.query().fetch(100)
albums_json = json.dumps([a.to_dict() for a in albums])