I need to create some JSON files for exporting data from a Django system to Google Big Query.
The problem is that Google BQ imposes some characteristics in the JSON file, for example, that each object must be in a different line.
json.dumps writes a stringified version of the JSON, so it is not useful for me.
Django serializes writes better JSON, but it put all in one line. All the information I found about pretty-printing is about json.dumps, which I cannot use.
I will like to know if anyone knows a way to create a JSON file in the format required by Big Query.
Example:
JSONSerializer = serializers.get_serializer("json")
json_serializer = JSONSerializer()
data_objects = DataObject.objects.all()
with open("dataobjects.json", "w") as out:
json_serializer.serialize(data_objects, stream=out)
json.dumps is OK. You have to use indent like this.
import json
myjson = '{"latitude":48.858093,"longitude":2.294694}'
mydata = json.loads(myjson)
print(json.dumps(mydata, indent=4, sort_keys=True))
Output:
{
"latitude": 48.858093,
"longitude": 2.294694
}
Related
I have uploaded a *.mat file that contains a 'struct' to my jupyter lab using:
from pymatreader import read_mat
data = read_mat(mat_file)
Now I have a multi-dimensional dictionary, for example:
data['Forces']['Ss1']['flap'].keys()
Gives the output:
dict_keys(['lf', 'rf', 'lh', 'rh'])
I want to convert this into a JSON file, exactly by the keys that already exist, without manually do so because I want to perform it to many *.mat files with various key numbers.
EDIT:
Unfortunately, I no longer have access to MATLAB.
An example for desired output would look something like this:
json_format = {
"Forces": {
"Ss1": {
"flap": {
"lf": [1,2,3,4],
"rf": [4,5,6,7],
"lh": [23 ,5,6,654,4],
"rh": [4 ,34 ,35, 56, 66]
}
}
}
}
ANOTHER EDIT:
So after making lists of the subkeys (I won't elaborate on it), I did this:
FORCES = []
for ind in individuals:
for force in forces:
for wing in wings:
FORCES.append({
ind: {
force: {
wing: data['Forces'][ind][force][wing].tolist()
}
}
})
Then, to save:
with open(f'{ROOT_PATH}/Forces.json', 'w') as f:
json.dump(FORCES, f)
That worked but only because I looked manually for all of the keys... Also, for some reason, I have squared brackets at the beginning and at the end of this json file.
The json package will output dictionaries to JSON:
import json
with open('filename.json', 'w') as f:
json.dump(data, f)
If you are using MATLAB-R2016b or later, and want to go straight from MATLAB to JSON check out JSONENCODE and JSONDECODE. For your purposes JSONENCODE
encodes data and returns a character vector in JSON format.
MathWorks Docs
Here is a quick example that assumes your data is in the MATLAB variable test_data and writes it to a file specified in the variable json_file
json_data = jsonencode(test_data);
writematrix(json_data,json_file);
Note: Some MATLAB data formats cannot be translate into JSON data due to limitations in the JSON specification. However, it sounds like your data fits well with the JSON specification.
I have the file log.txt with following data:
{"__TIMESTAMP":"2020-07-09T19:05:20.858013","__LABEL":"web_channel","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"Port web_channel/diagnose_client not connected!"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229737","__LABEL":"context_logging_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"startup component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229761","__LABEL":"context_logging_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"activate component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229793","__LABEL":"context_monitoring_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"startup component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229805","__LABEL":"context_monitoring_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"activate component"}
If I define a single row, I can convert in real JSON type:
import json
import datetime
from json import JSONEncoder
log = {
"__TIMESTAMP":"2020-07-09T19:05:21.229737",
"__LABEL":"context_logging_addon",
"__LEVEL":4,
"__DIAGNOSE_SLOT":"",
"msg":"Port web_channel/diagnose_client not connected!"}
class DateTimeEncoder(JSONEncoder):
#Override the default method
def default(self,obj):
if isinstance(obj,(datetime.date,datetime.datetime)):
return obj.isoformat()
print("Printing to check how it will look like")
print(DateTimeEncoder().encode(log))
I have the following output, which format is perfect JSON.
Printing to check how it will look like
{"__TIMESTAMP": "2020-07-09T19:05:21.229737", "__LABEL": "context_logging_addon", "__LEVEL": 4, "__DIAGNOSE_SLOT": "", "msg": "Port web_channel/diagnose_client not connected!"}
But I don't know how should I open the log.txt file, read the data to convert into JSON without any failure.
Could you help me please? Thanks in advance.
Let us say your log.txt file is in the same directory than your .py file.
Just open it with with open(... and then parse your file according to your syntax to create a list of dictionaries (each item corresponding to a row, then parse each dictionary as you're currently doing).
Here is how you could open and parse your file:
with open("log.txt","r") as file:
all_text = file.readlines()
parsed_line = list()
for text in all_text:
parsed_line.append(dict([item.split('":"') for item in text[2:-2].split('","')]))
If you have any question about the parsing let me know. This one is pretty straightforward.
Hope this helped you.
Try it this way:
logs = """[your log file above]"
for log in logs.splitlines():
print(DateTimeEncoder().encode(log))
Output:
"{\"__TIMESTAMP\":\"2020-07-09T19:05:20.858013\",\"__LABEL\":\"web_channel\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"Port web_channel/diagnose_client not connected!\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229737\",\"__LABEL\":\"context_logging_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"startup component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229761\",\"__LABEL\":\"context_logging_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"activate component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229793\",\"__LABEL\":\"context_monitoring_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"startup component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229805\",\"__LABEL\":\"context_monitoring_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"activate component\"}"
I store a blob of Json in the datastore using JsonProperty.
I don't know the structure of the json data.
I am using endpoints proto datastore in order to retrieve my data.
The probleme is the json property is encoded in base64 and I want a plain json object.
For the example, the json data will be:
{
first: 1,
second: 2
}
My code looks something like:
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel
class Model(EndpointsModel):
data = ndb.JsonProperty()
#endpoints.api(name='myapi', version='v1', description='My Sample API')
class DataEndpoint(remote.Service):
#Model.method(path='mymodel2', http_method='POST',
name='mymodel.insert')
def MyModelInsert(self, my_model):
my_model.data = {"first": 1, "second": 2}
my_model.put()
return my_model
#Model.method(path='mymodel/{entityKey}',
http_method='GET',
name='mymodel.get')
def getMyModel(self, model):
print(model.data)
return model
API = endpoints.api_server([DataEndpoint])
When I call the api for getting a model, I get:
POST /_ah/api/myapi/v1/mymodel2
{
"data": "eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ=="
}
where eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ== is the base64 encoded of {"second": 2, "first": 1}
And the print statement give me: {u'second': 2, u'first': 1}
So, in the method, I can explore the json blob data as a python dict.
But, in the api call, the data is encoded in base64.
I expeted the api call to give me:
{
'data': {
'second': 2,
'first': 1
}
}
How can I get this result?
After the discussion in the comments of your question, let me share with you a sample code that you can use in order to store a JSON object in Datastore (it will be stored as a string), and later retrieve it in such a way that:
It will show as plain JSON after the API call.
You will be able to parse it again to a Python dict using eval.
I hope I understood correctly your issue, and this helps you with it.
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel
class Sample(EndpointsModel):
column1 = ndb.StringProperty()
column2 = ndb.IntegerProperty()
column3 = ndb.StringProperty()
#endpoints.api(name='myapi', version='v1', description='My Sample API')
class MyApi(remote.Service):
# URL: .../_ah/api/myapi/v1/mymodel - POSTS A NEW ENTITY
#Sample.method(path='mymodel', http_method='GET', name='Sample.insert')
def MyModelInsert(self, my_model):
dict={'first':1, 'second':2}
dict_str=str(dict)
my_model.column1="Year"
my_model.column2=2018
my_model.column3=dict_str
my_model.put()
return my_model
# URL: .../_ah/api/myapi/v1/mymodel/{ID} - RETRIEVES AN ENTITY BY ITS ID
#Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
def MyModelGet(self, my_model):
if not my_model.from_datastore:
raise endpoints.NotFoundException('MyModel not found.')
dict=eval(my_model.column3)
print("This is the Python dict recovered from a string: {}".format(dict))
return my_model
application = endpoints.api_server([MyApi], restricted=False)
I have tested this code using the development server, but it should work the same in production using App Engine with Endpoints and Datastore.
After querying the first endpoint, it will create a new Entity which you will be able to find in Datastore, and which contains a property column3 with your JSON data in string format:
Then, if you use the ID of that entity to retrieve it, in your browser it will show the string without any strange encoding, just plain JSON:
And in the console, you will be able to see that this string can be converted to a Python dict (or also a JSON, using the json module if you prefer):
I hope I have not missed any point of what you want to achieve, but I think all the most important points are covered with this code: a property being a JSON object, store it in Datastore, retrieve it in a readable format, and being able to use it again as JSON/dict.
Update:
I think you should have a look at the list of available Property Types yourself, in order to find which one fits your requirements better. However, as an additional note, I have done a quick test working with a StructuredProperty (a property inside another property), by adding these modifications to the code:
#Define the nested model (your JSON object)
class Structured(EndpointsModel):
first = ndb.IntegerProperty()
second = ndb.IntegerProperty()
#Here I added a new property for simplicity; remember, StackOverflow does not write code for you :)
class Sample(EndpointsModel):
column1 = ndb.StringProperty()
column2 = ndb.IntegerProperty()
column3 = ndb.StringProperty()
column4 = ndb.StructuredProperty(Structured)
#Modify this endpoint definition to add a new property
#Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
def MyModelGet(self, my_model):
if not my_model.from_datastore:
raise endpoints.NotFoundException('MyModel not found.')
#Add the new nested property here
dict=eval(my_model.column3)
my_model.column4=dict
print(json.dumps(my_model.column3))
print("This is the Python dict recovered from a string: {}".format(dict))
return my_model
With these changes, the response of the call to the endpoint looks like:
Now column4 is a JSON object itself (although it is not printed in-line, I do not think that should be a problem.
I hope this helps too. If this is not the exact behavior you want, maybe should play around with the Property Types available, but I do not think there is one type to which you can print a Python dict (or JSON object) without previously converting it to a String.
I have the following JSON file.
{
"reviewerID": "ABC1234",
"productID": "ABCDEF",
"reviewText": "GOOD!",
"rating": 5.0,
},
{
"reviewerID": "ABC5678",
"productID": "GFMKDS",
"reviewText": "Not bad!",
"rating": 3.0,
}
I want to parse without SparkSQL and use a JSON parser.
The result of parse that i want is textfile.
ABC1234::ABCDEF::5.0
ABC5678::GFMKDS::3.0
How to parse the json file by using json parser in spark scala?
tl;dr Spark SQL supports JSONs in the format of one JSON per file or per line. If you'd like to parse multi-line JSONs that can appear together in a single file, you'd have to write your own Spark support as it's not currently possible.
A possible solution is to ask the "writer" (the process that writes the files to be nicer and save one JSON per file) that would make your life much sweeter.
If that does not give you much, you'd have to use mapPartitions transformation with your parser and somehow do the parsing yourself.
val input: RDD[String] = // ... load your JSONs here
val jsons = jsonRDD.mapPartitions(json => // ... use your JSON parser here)
I have a json object that gets passed into a save function as
{
"markings": {
"headMarkings": "Brindle",
"leftForeMarkings": "",
"rightForeMarkings": "sock",
"leftHindMarkings": "sock",
"rightHindMarkings": "",
"otherMarkings": ""
}
** EDIT **
The system parses it and passes it to my function as a mapping. I don't actually have the JSON, although it wouldn't be difficult to build up the JSON myself, it just seems like overkill
* END EDIT **
The toString() function ends up putting the results into the database as
"[rightForeMarkings:, otherMarkings:, leftForeMarkings:sock, leftHindMarkings:sock, rightHindMarkings:, headMarkings:brindle]"
I then want to save that as a string (fairly easy) by calling
params.markings.toString()
From here, I save the info and return the updated information.
My issue is that since I am storing the object in the DB as a string, I can't seem to get the markings back out as a map (to then be converted to JSON).
I have tried a few different things to no avail, although it is completely possible that I went about something incorrectlywith these...
Eval.me(Item.markings)
evaluate(Item.markings)
Item.markings.toList()
Thanks in advance for the help!
Throwing my tests.
Using JSON converters in Grails, I think this should be the approach: (synonymous to #JamesKleeh and #GrailsGuy)
def json = '''{
"markings": {
"headMarkings": "Brindle",
"leftForeMarkings": "",
"rightForeMarkings": "sock",
"leftHindMarkings": "sock",
"rightHindMarkings": "",
"otherMarkings": ""
}
}'''
def jsonObj = grails.converters.JSON.parse(json)
//This is your JSON object that should be passed in to the method
print jsonObj //[markings:[rightForeMarkings:sock, otherMarkings:, leftForeMarkings:, leftHindMarkings:sock, rightHindMarkings:, headMarkings:Brindle]]
def jsonStr = jsonObj.toString()
//This is the string which should be persisted in db
assert jsonStr == '{"markings":{"rightForeMarkings":"sock","otherMarkings":"","leftForeMarkings":"","leftHindMarkings":"sock","rightHindMarkings":"","headMarkings":"Brindle"}}'
//Get back json obj from json str
def getBackJsobObj = grails.converters.JSON.parse(jsonStr)
assert getBackJsobObj.markings.leftHindMarkings == 'sock'
If I understand correctly, you want to convert a String to a JSON object? You can actually bypass converting it to a map, and parse it directly as a JSON object:
import grails.converters.JSON
def json = JSON.parse(Item.markings)
This will give you your entire JSON object, and then you can just reference the values as you would a map.
Edit #2:
So apparently there is no "safe" way to convert that string back to a map without something custom. I would recommend saving the structure in the database as it originally comes in. If you can do that, then all you would need is JSON.parse()