Is it possible to load printed mongo document? - json

I know how to dump mongo document and load the dumped string.
from bson import json_util
ds = json_util.dumps(doc1)
doc2 = json_util.loads(ds)
But if the document is printed, how can I load the string as document or json?
"{'_id': ObjectId('62ece11feab8b0600f0d3f6e'), 'requestId': '1660645299359', 'ts': datetime.datetime(2022, 8, 5, 17, 21, 20, 303000)}"

Related

How to push the data from rds to kafka queue in json format

I use a Kafka topic to receive messages from a MySQL database. I need to write Python code to push the data in JSON format from MySQL to the Kafka topic. My requirement is to get the output as JSON objects, not as raw strings.
Below is the Python code that dumps the MySQL table data to the Kafka topic in JSON format.
Code:
import json
from time import sleep

import mysql.connector
from kafka import KafkaProducer

connection = mysql.connector.connect(host='xyz.us-east-1.rds.amazonaws.com', database='testdb', user='stdnt', password='pssw123')
cursor = connection.cursor()
statement = 'SELECT * FROM patients_vital_info'
cursor.execute(statement)
data = cursor.fetchall()
# Each value passed to send() is serialized with json.dumps and encoded as UTF-8.
producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         api_version=(0, 11, 5),
                         value_serializer=lambda x: json.dumps(x).encode('utf-8'))
for i in data:
    producer.send('test', i)
    sleep(1)
Output from kafka topic in raw string format:
[3, 69, 175]
[4, 68, 171]
[5, 72, 177]
[1, 78, 162]
[2, 66, 157]
[3, 72, 156]
The output should be pushed in json format while writing the message to kafka queue.
Expected output:
{"bp":140,"heartBeat":73,"Customerid":1}
cursor.fetchall() returns a list of row tuples, not dictionaries mapping column names to values. Your output is also, correctly, valid JSON: each row is serialized as a JSON array.
You'd need to build the JSON objects yourself if you want to include column names, or use Kafka Connect JDBC source / Debezium rather than Python to do exactly what you're looking for.
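As a minimal sketch of building the JSON yourself, each row can be paired with the column names before it is serialized. The column order and sample rows below are assumptions made for illustration; with a DB-API cursor, the names can usually be read from cursor.description after execute().

```python
import json


def rows_to_json_messages(column_names, rows):
    """Turn row tuples into one JSON object string per row."""
    return [json.dumps(dict(zip(column_names, row))) for row in rows]


# Hypothetical column order and sample rows; in the real script these would
# come from [col[0] for col in cursor.description] and cursor.fetchall().
column_names = ["Customerid", "heartBeat", "bp"]
rows = [(3, 69, 175), (4, 68, 171)]

messages = rows_to_json_messages(column_names, rows)
for message in messages:
    print(message)
```

Because the producer in the question already applies json.dumps in its value_serializer, passing dict(zip(column_names, row)) to producer.send() should be enough there; the explicit json.dumps above only keeps the sketch self-contained.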

Convert a pipeline_pb2.TrainEvalPipelineConfig to JSON or YAML file for tensorflow object detection API

I want to convert a pipeline_pb2.TrainEvalPipelineConfig to a JSON or YAML file for the TensorFlow Object Detection API. I tried converting the protobuf object using:
import json

import tensorflow as tf
from google.protobuf import text_format
import yaml
from object_detection.protos import pipeline_pb2

def get_configs_from_pipeline_file(pipeline_config_path, config_override=None):
    '''
    read .config and convert it to proto_buffer_object
    '''
    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
    with tf.gfile.GFile(pipeline_config_path, "r") as f:
        proto_str = f.read()
        text_format.Merge(proto_str, pipeline_config)
    if config_override:
        text_format.Merge(config_override, pipeline_config)
    #print(pipeline_config)
    return pipeline_config

def create_configs_from_pipeline_proto(pipeline_config):
    '''
    Returns the configurations as dictionary
    '''
    configs = {}
    configs["model"] = pipeline_config.model
    configs["train_config"] = pipeline_config.train_config
    configs["train_input_config"] = pipeline_config.train_input_reader
    configs["eval_config"] = pipeline_config.eval_config
    configs["eval_input_configs"] = pipeline_config.eval_input_reader
    # Keeps eval_input_config only for backwards compatibility. All clients should
    # read eval_input_configs instead.
    if configs["eval_input_configs"]:
        configs["eval_input_config"] = configs["eval_input_configs"][0]
    if pipeline_config.HasField("graph_rewriter"):
        configs["graph_rewriter_config"] = pipeline_config.graph_rewriter
    return configs

configs = get_configs_from_pipeline_file('pipeline.config')
config_as_dict = create_configs_from_pipeline_proto(configs)
But when I try converting this returned dictionary to YAML with yaml.dump(config_as_dict) it says
TypeError: can't pickle google.protobuf.pyext._message.RepeatedCompositeContainer objects
For json.dumps(config_as_dict) it says:
Traceback (most recent call last):
File "config_file_parsing.py", line 48, in <module>
config_as_json = json.dumps(config_as_dict)
File "/usr/lib/python3.5/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.5/json/encoder.py", line 179, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: label_map_path: "label_map.pbtxt"
shuffle: true
tf_record_input_reader {
input_path: "dataset.record"
}
is not JSON serializable
Would appreciate some help here.
JSON can only dump a subset of the Python primitives, plus dict and list collections (with limitations on self-referencing).
YAML is more powerful, and can be used to dump arbitrary Python objects. But only if those objects can be "investigated" during the representation phase of the dump, which essentially limits that to instances of pure Python classes. For objects created at the C level, one can make explicit dumpers, and if none is available the YAML library will try to use the pickle protocol to dump the data to YAML.
Inspecting protobuf on PyPI shows me that there are non-generic wheels available, which is always an indication of some C code optimization. Inspecting one of these files indeed shows a pre-compiled shared object.
Although you make a dict out of the config, this dict can of course only be dumped when all its keys and all its values can be dumped. Since your keys are strings (necessary for JSON), you need to look at each of the values, to find the one that doesn't dump, and convert that to a dumpable object structure (dict/list for JSON, pure Python class for YAML).
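A rough way to find the offending values is to try dumping each one on its own and collect the keys that fail. The sketch below uses only the standard library; the Opaque class is a hypothetical stand-in for a protobuf message, and find_undumpable is a helper name invented for this example.

```python
import json


def find_undumpable(mapping):
    """Return {key: type name} for every value that json.dumps rejects."""
    failures = {}
    for key, value in mapping.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            failures[key] = type(value).__name__
    return failures


class Opaque:
    """Hypothetical stand-in for an object the json module cannot serialize."""


sample = {"model": {"name": "ssd"}, "train_config": Opaque(), "steps": [1, 2]}
problem_keys = find_undumpable(sample)
print(problem_keys)
```

Each key reported this way is a value that still has to be converted to plain dicts, lists, and primitives before the whole dictionary can be dumped.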
You might want to take a look at the json_format module (google.protobuf.json_format).

How to save twitterscraper output as json file

I read the documentation, but the documentation only mentions saving output as .txt file. I tried to modify the code to save output as JSON.
save as .txt:
from twitterscraper import query_tweets
if __name__ == '__main__':
    list_of_tweets = query_tweets("Trump OR Clinton", 10)
    #print the retrieved tweets to the screen:
    for tweet in query_tweets("Trump OR Clinton", 10):
        print(tweet)
    #Or save the retrieved tweets to file:
    file = open("output.txt","w")
    for tweet in query_tweets("Trump OR Clinton", 10):
        file.write(tweet.encode('utf-8'))
    file.close()
I tried to modify this to save as JSON:
output = query_tweets("Trump OR Clinton", 10)
jsonfile = open("tweets.json","w")
for tweet in output:
    json.dump(tweet,jsonfile)
jsonfile.close()
TypeError: Object of type Tweet is not JSON serializable
But I get the type error above.
How can I save the output as JSON?
I know that running the command in the terminal creates a JSON file, but I want to write a Python version.
We'll need to convert each tweet to a dict first, as Python class objects are not serializable as JSON. Looking at the first object we can see the available methods and attributes like this: help(list_of_tweets[0]). Accessing the __dict__ of the first object we see:
# print(list_of_tweets[0].__dict__)
{'user': 'foobar',
'fullname': 'foobar',
'id': '143846459132929',
'url': '/foobar/status/1438420459132929',
'timestamp': datetime.datetime(2011, 12, 5, 23, 59, 53),
'text': 'blah blah',
'replies': 0,
'retweets': 0,
'likes': 0,
'html': '<p class="TweetTextSize...'}
Before we can dump it to JSON we'll need to convert the datetime objects to strings.
tweets = [t.__dict__ for t in list_of_tweets]
for t in tweets:
    t['timestamp'] = t['timestamp'].isoformat()
Then we can use the json module to dump the data to a file.
import json
with open('data.json', 'w') as f:
    json.dump(tweets, f)
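An alternative sketch, not part of the answer above, is to leave the tweet objects untouched and give json.dumps a default hook that converts anything it cannot serialize. The Tweet class here is a hypothetical stand-in with a few of the attributes shown earlier, and to_jsonable is a name invented for this example.

```python
import json
from datetime import datetime


class Tweet:
    """Hypothetical stand-in for twitterscraper's Tweet class."""

    def __init__(self, user, text, timestamp):
        self.user = user
        self.text = text
        self.timestamp = timestamp


def to_jsonable(obj):
    """Called by json.dumps for every object it cannot serialize itself."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if hasattr(obj, "__dict__"):
        return vars(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


tweets = [Tweet("foobar", "blah blah", datetime(2011, 12, 5, 23, 59, 53))]
encoded = json.dumps(tweets, default=to_jsonable)
print(encoded)
```

The same default argument works with json.dump when writing to a file. Unlike assigning to t['timestamp'], this approach does not modify the attributes of the original objects.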

Python bytes to geojson Point

I have a MySQL database where I have Point type location data and a Django (Django Rest Framework) backend where I am trying to retrieve that data. If I try to get that location data from phpMyAdmin, the returned location is something like POINT(23.89826 90.267535). In my Django backend, however, I get a bytes object as the returned location. The returned value is something like this:
b'\x00\x00\x00\x00\x01\x01\x00\x00\x00\x12N\x0b^\xf4\xe57@C\xe2\x1eK\x1f\x91V@'
The database uses utf8mb4_unicode_ci collation.
If I try to convert the returned bytes to a string with .decode('utf-8'), I get a UnicodeDecodeError:
>>> s = b'\x00\x00\x00\x00\x01\x01\x00\x00\x00\x12N\x0b^\xf4\xe57@C\xe2\x1eK\x1f\x91V@'
>>> s.decode('utf-8')
Traceback (most recent call last):
File "<console>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 13: invalid continuation byte
I get the same bytes array even if I perform a raw query from Django with the MySQL function St_AsGeoJson(location).
I then tried geojson. When I feed those bytes to geojson.Point(), I get a GeoJSON object back, but instead of 2 floats the coordinates array consists of 25 integer values.
>>> s = b'\x00\x00\x00\x00\x01\x01\x00\x00\x00\x12N\x0b^\xf4\xe57@C\xe2\x1eK\x1f\x91V@'
>>> geojson.Point(s)
{"coordinates": [0, 0, 0, 0, 1, 1, 0, 0, 0, 18, 78, 11, 94, 244, 229, 55, 64, 67, 226, 30, 75, 31, 145, 86, 64], "type": "Point"}
How can I retrieve the Point data from the bytes or this geojson?
I had this problem because I was using plain Django, and Django models don't have a field type that deals with geo data. I was using a CharField with max_length=255 and then tried to parse whatever that CharField retrieved from the database. I solved the problem by using GeoDjango and Django REST Framework GIS. Django REST Framework GIS is not strictly necessary; I used it because I am using Django REST Framework and it outputs the geo data in a nice format.
Steps were to:
1. Install GDAL (Geospatial Data Abstraction Library):
   sudo apt-get install gdal-bin
   sudo apt-get install python3-gdal
2. Add django.contrib.gis and rest_framework_gis to settings.INSTALLED_APPS.
3. Set GDAL_LIBRARY_PATH in settings; in my case it's GDAL_LIBRARY_PATH = os.getenv('GDAL_LIBRARY_PATH').
4. Update the model import from "from django.db import models" to "from django.contrib.gis.db import models".
5. Update the model to use a Geo field. More: https://docs.djangoproject.com/en/2.1/ref/contrib/gis/model-api/
Links
https://docs.djangoproject.com/en/2.1/ref/contrib/gis/
https://github.com/djangonauts/django-rest-framework-gis
https://github.com/domlysz/BlenderGIS/wiki/How-to-install-GDAL
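For reference, the raw bytes from the question can also be decoded by hand. The sketch below assumes the value is in MySQL's internal geometry layout (a 4-byte little-endian SRID followed by standard WKB) and only handles a single Point; decode_mysql_point is a helper name invented for this example. The input is rebuilt from the 25 integers that geojson.Point() printed in the question.

```python
import struct

# The 25 byte values shown in the geojson.Point() output of the question.
raw = bytes([0, 0, 0, 0, 1, 1, 0, 0, 0, 18, 78, 11, 94, 244, 229, 55, 64,
             67, 226, 30, 75, 31, 145, 86, 64])


def decode_mysql_point(blob):
    """Decode an SRID-prefixed WKB Point into (srid, x, y)."""
    srid = struct.unpack_from("<I", blob, 0)[0]       # bytes 0-3: SRID
    endian = "<" if blob[4] == 1 else ">"             # byte 4: WKB byte order
    geom_type = struct.unpack_from(endian + "I", blob, 5)[0]
    if geom_type != 1:                                # 1 is the WKB code for Point
        raise ValueError("not a WKB Point")
    x, y = struct.unpack_from(endian + "dd", blob, 9) # two 8-byte doubles
    return srid, x, y


srid, x, y = decode_mysql_point(raw)
print(srid, round(x, 6), round(y, 6))
```

The decoded coordinates match the POINT(23.89826 90.267535) value that phpMyAdmin shows. A geometry-aware field, as in the GeoDjango approach above, remains the more robust choice for anything beyond a single Point.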

Php json array into Python3

I have a php script that outputs a json array that looks like this...
[{"year":"2016","Month":"Apr","the_days":"16, 29, 30"},
{"year":"2016","Month":"May","the_days":"13, 27"},
{"year":"2016","Month":"Jun","the_days":"10, 11, 24"},
{"year":"2016","Month":"Jul","the_days":"08, 22, 23"},
{"year":"2016","Month":"Aug","the_days":"06, 20"},
{"year":"2016","Month":"Sep","the_days":"02, 03, 16, 17, 30"},
{"year":"2016","Month":"Oct","the_days":"01, 14, 15, 29"},
{"year":"2016","Month":"Nov","the_days":"25"},
{"year":"2016","Month":"Dec","the_days":"09, 10, 23, 24"}]
I'm trying to put together some Python that will (eventually) output something like....
Apr: 16, 29, 30
May: 13, 27
//etc
...but I'm not having any luck pulling the array out.
This is code that I'm using in Python3 (that I've pulled together from other Stack questions that I've searched for).
import urllib.request
import json
response = urllib.request.urlopen('http://www.captainobviousobviously.co.uk/private/Apijson.php')
content = response.read()
data = json.load(content.decode('utf-8'))
print(data)
This is the error that I'm getting...
Traceback (most recent call last):
File "/home/pi/Python/availableDates.py", line 6, in <module>
data = json.load(content.decode('utf-8'))
File "/usr/lib/python3.4/json/__init__.py", line 265, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
I'm not really sure how to fix it.
Replace
data = json.load(content.decode('utf-8'))
with
data = json.loads(content.decode('utf-8'))
'load' is for files and 'loads' for strings.
Refer to What is the difference between json.dumps and json.load?.
As for the code for your problem:
for i in data:
    print (str(i['Month'])+":"+str(i['the_days']))
Use json.loads instead. load is for loading from a stream, such as a file, whereas loads loads from a string.
data = json.loads(content.decode('utf-8'))
From the Python documentation:
json.load
Deserialize fp (a .read()-supporting file-like object containing a JSON document) to a Python object using this conversion table.
A string isn't a "file-like object", which is why you get your error: the json module is trying to call .read() on the string, but that method doesn't exist.
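The difference can be reproduced without a network call. In this small sketch, io.StringIO stands in for a file-like object and the JSON text is a shortened copy of the data in the question.

```python
import io
import json

text = '[{"year": "2016", "Month": "Apr", "the_days": "16, 29, 30"}]'

# loads() takes the JSON text itself.
from_string = json.loads(text)

# load() takes an object with a .read() method, such as an open file.
from_file_like = json.load(io.StringIO(text))

# Passing a plain string to load() fails because str has no .read().
try:
    json.load(text)
    load_error = None
except AttributeError as exc:
    load_error = str(exc)

print(from_string == from_file_like)
print(load_error)
```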
You need to use json.loads(<json str>). If you want, you can do the following:
content = response.read().decode()
data = json.loads(content)
for d in data:
    print(d["Month"], d["the_days"], sep=":")
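The output format asked for in the question ("Apr: 16, 29, 30") has a space after the colon, which sep=":" does not produce. A minimal sketch of that formatting follows, using an inline sample of the PHP output instead of the live URL so that it runs on its own.

```python
import json

# Two entries copied from the PHP script's output in the question.
payload = '''[{"year":"2016","Month":"Apr","the_days":"16, 29, 30"},
{"year":"2016","Month":"May","the_days":"13, 27"}]'''

data = json.loads(payload)

# One "Month: days" line per entry.
lines = [f"{entry['Month']}: {entry['the_days']}" for entry in data]
print("\n".join(lines))
```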