Reading JSON data in a shell script

I have a JSON file containing data about some images:
{
    "imageHeight": 1536,
    "sessionID": "4340cc80cb532ecf106a7077fc2a166cb84e2c21",
    "bottomHeight": 1536,
    "imageID": 1,
    "crops": 0,
    "viewPortHeight": 1296,
    "imageWidth": 2048,
    "topHeight": 194,
    "totalHeight": 4234
}
I wish to process these values in a simple manner in a shell script. I searched online but could not find any beginner-friendly material on this.
EDIT: What do I wish to do with the values?
I'm using convert (ImageMagick) to process the images, so the whole workflow is something like: read an entry, say crops, from the JSON file, then use the value when cropping the image:
convert -crop [image width from json]x[image height from json]+0+[crop value from json] [session_id from json]-[imageID from json].png [sessionID]-[ImageID]-cropped.png

I would recommend using jq. For example, to get the imageHeight, you can use:
jq ".imageHeight" data.json
Output:
1536
If you want to store the value in a shell variable use:
variable_name=$(jq ".imageHeight" data.json)
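Putting it together with the convert command from the question, a minimal sketch (assuming the file is named data.json and that crops holds the vertical offset, since the question only says "crop value"; jq's -r flag prints strings without surrounding quotes, which matters for sessionID):
width=$(jq -r '.imageWidth' data.json)
height=$(jq -r '.imageHeight' data.json)
offset=$(jq -r '.crops' data.json)
session=$(jq -r '.sessionID' data.json)
image_id=$(jq -r '.imageID' data.json)

convert -crop "${width}x${height}+0+${offset}" \
    "${session}-${image_id}.png" "${session}-${image_id}-cropped.png"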

Python solution
import json
from pprint import pprint

# open the file with a context manager so it is closed automatically
with open('json_file') as json_data:
    data = json.load(json_data)

pprint(data)
print(data['bottomHeight'])
Output (an IPython session; the u prefixes are Python 2 unicode strings and do not appear under Python 3):
In [28]: pprint(data)
{u'bottomHeight': 1536,
u'crops': 0,
u'imageHeight': 1536,
u'imageID': 1,
u'imageWidth': 2048,
u'sessionID': u'4340cc80cb532ecf106a7077fc2a166cb84e2c21',
u'topHeight': 194,
u'totalHeight': 4234,
u'viewPortHeight': 1296}
In [29]: data['bottomHeight']
Out[29]: 1536

Related

How to convert a text file of JSON to a JSON array in Python?

I'm a bit stumped on how to convert a text file of JSON entries to a JSON array in Python.
I have a text file that is constantly being written to with JSON entries, but it lacks the surrounding brackets that would make it a JSON array:
{"apple": "2", "orange": "3"}, {"apple": "5", "orange": "6"}, {"apple": "9", "orange": "10"} ...
How can I put that into a JSON array? My end goal is to read the latest entry every time a new one is added. The file is created by another program that I have no control over, and it is being written to constantly, so I can't just slap brackets onto the start and end of the file.
Thanks
After you read in the file, you can treat the contents as a string, add brackets around it, and pass the result to the json library to decode as JSON for you.
import json

with open('data.txt') as f:
    raw_data = f.read().splitlines()[-1]

list_data = f'[{raw_data}]'
json_data = json.loads(list_data)
print(json_data)
# [{'apple': '2', 'orange': '3'}, {'apple': '5', 'orange': '6'}, {'apple': '9', 'orange': '10'}]
This assumes that each line of the file should become its own array; the code extracts the last line of the file and converts it.
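Since the stated end goal is the latest entry, a short follow-up sketch (assuming the wrapped line parses cleanly): the newest record is the last element of the decoded array:
latest = json_data[-1]
print(latest)  # {'apple': '9', 'orange': '10'}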

Dask how to open json with list of dicts

I'm trying to open a bunch of JSON files using read_json in order to get a DataFrame like the following:
ddf.compute()
id owner pet_id
0 1 "Charlie" "pet_1"
1 2 "Charlie" "pet_2"
3 4 "Buddy" "pet_3"
but I'm getting an error. Here is my code:
_meta = pd.DataFrame(
    columns=["id", "owner", "pet_id"]
).astype({
    "id": int,
    "owner": "object",
    "pet_id": "object"
})
ddf = dd.read_json("mypets/*.json", meta=_meta)
ddf.compute()
*** ValueError: Metadata mismatch found in `from_delayed`.
My JSON files looks like
[
    {
        "id": 1,
        "owner": "Charlie",
        "pet_id": "pet_1"
    },
    {
        "id": 2,
        "owner": "Charlie",
        "pet_id": "pet_2"
    }
]
As far as I understand, the problem is that I'm passing a list of dicts, so I'm looking for the right way to describe that via the meta= argument.
PS:
I also tried structuring the data in the following way:
{
    "id": [1, 2],
    "owner": ["Charlie", "Charlie"],
    "pet_id": ["pet_1", "pet_2"]
}
But Dask interprets that data incorrectly:
ddf.compute()
id owner pet_id
0 [1, 2] ["Charlie", "Charlie"] ["pet_1", "pet_2"]
1 [4] ["Buddy"] ["pet_3"]
The invocation you want is the following:
dd.read_json("data.json", meta=meta,
             blocksize=None, orient="records",
             lines=False)
which can be largely gleaned from the docstring:
- meta looks OK from your code
- blocksize must be None, since you have one whole JSON object per file and the file cannot be split
- orient="records" means a list of objects
- lines=False means this is not a line-delimited JSON file, which is the more common case for Dask (you are not assuming that a newline character means a new record)
So why the error? Probably Dask split your file on some newline character, and so a partial record got parsed, which therefore did not match your given meta.
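Put together as a minimal runnable sketch (assuming the files live under mypets/ as in the question):
import dask.dataframe as dd
import pandas as pd

# the metadata from the question, with the stray bracket removed
meta = pd.DataFrame(
    columns=["id", "owner", "pet_id"]
).astype({"id": int, "owner": "object", "pet_id": "object"})

ddf = dd.read_json("mypets/*.json", meta=meta,
                   blocksize=None, orient="records", lines=False)
print(ddf.compute())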

Problem printing JSON data from a Python script

I have a Python script that should print JSON data.
This is what I have in my script:
finaldata = {
    "date": datetime.datetime.utcnow().isoformat(),
    "voltage_mv": emeter["voltage_mv"],
    "current_ma": emeter["current_ma"],
    "power_mw": emeter["power_mw"],
    "energy_wh": emeter["total_wh"],
}
print(finaldata)
I am running the script from Node-RED because I need to send the data to a storage account (in JSON format, of course). The problem is that the data being sent looks like this:
{'power_mw': 0, 'date': '2019-04-16T07:12:19.858159', 'energy_wh': 2, 'voltage_mv': 225045, 'current_ma': 20}
when it should look like this in order to be stored correctly in my storage account:
{"power_mw": 0, "date": "2019-04-16T07:12:19.858159", "energy_wh": 2, "voltage_mv": 225045, "current_ma": 20}
(important for later use, since I already get errors in the storage account).
Does anyone know why this is happening and how I can fix it? Thanks in advance
You should use the Python json module and dump your dict into JSON. Printing a dict directly uses Python's repr, which produces single quotes; json.dumps emits valid JSON with double quotes:
import json
finaldata = {"power_mw": 0, "date": '2019-04-16T07:12:19.858159',
             "energy_wh": 2, "voltage_mv": 225045, "current_ma": 20}
print(json.dumps(finaldata))
See the json module reference. If key order matters, use an OrderedDict from the collections module.
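To see the difference directly, a quick sketch comparing the dict repr with the json.dumps output:
import json

d = {"power_mw": 0}
print(d)              # {'power_mw': 0}  - Python repr, single quotes
print(json.dumps(d))  # {"power_mw": 0}  - valid JSON, double quotes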

How to take any CSV file and convert it to JSON? (with Python as a script engine) [Novice user trying to learn NiFi]

1) There is a CSV file containing the following information (the first row is the header):
first,second,third,total
1,4,9,14
7,5,2,14
3,8,7,18
2) I would like to find the sum of individual rows and generate a final file with a modified header. The final file should look like this:
[
    {
        "first": 1,
        "second": 4,
        "third": 9,
        "total": 14
    },
    {
        "first": 7,
        "second": 5,
        "third": 2,
        "total": 14
    },
    {
        "first": 3,
        "second": 8,
        "third": 7,
        "total": 18
    }
]
But it does not work, and I am not sure how to fix it. Can anyone help me understand how to approach this problem?
NiFi flow: (screenshot omitted)
Although I'm not into Python, just from Googling around I think this might do it:
import csv
import json

with open("YOURFILE.csv") as f:
    reader = csv.DictReader(f)
    data = [r for r in reader]

with open('result.json', 'w') as outfile:
    json.dump(data, outfile)
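One hedged caveat: csv.DictReader yields every value as a string, so the output would contain "first": "1" rather than the integers in the desired result. Converting each value fixes that (replace the data = ... line inside the with block), and json.dump's indent parameter reproduces the pretty layout:
data = [{k: int(v) for k, v in r.items()} for r in reader]
json.dump(data, outfile, indent=4)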
You can use the QueryRecord processor and add a new property named total with the value:
select first, second, third, first + second + third as total from FLOWFILE
Configure the CSVReader controller service with a matching Avro schema that uses int as the datatype for all fields, and configure a JsonRecordSetWriter controller service. Include the total field in the writer's schema so that the output of the QueryRecord processor contains all the columns plus the sum of the columns as total.
Connect the total relationship from the QueryRecord processor for further processing.
Refer to the NiFi documentation on QueryRecord and on configuring record readers and writers.

Reading a JSON file into an RDD (not a DataFrame) using PySpark

I have the following file, test.json:
{
    "id": 1,
    "name": "A green door",
    "price": 12.50,
    "tags": ["home", "green"]
}
I want to load this file into an RDD. This is what I tried:
import json

rddj = sc.textFile('test.json')
rdd_res = rddj.map(lambda x: json.loads(x))
I got an error:
Expecting object: line 1 column 1 (char 0)
I don't completely understand what json.loads does.
How can I resolve this problem ?
textFile reads data line by line. Individual lines of your input are not syntactically valid JSON.
Just use the JSON reader:
spark.read.json("test.json", multiLine=True)
or (not recommended) wholeTextFiles:
sc.wholeTextFiles("test.json").values().map(json.loads)
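If you specifically need an RDD rather than a DataFrame, one more hedged sketch: the DataFrame returned by spark.read.json exposes an .rdd attribute of Row objects:
df = spark.read.json("test.json", multiLine=True)
rdd = df.rdd  # an RDD of Row objects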