Sort and Select Top 5 JSON values - json

I have a two-fold issue and am looking for clues as to how to approach it.
I have a json file that is formatted as such:
{
    "code": 2000,
    "data": {
        "1": {
            "attribute1": 40,
            "attribute2": 1.4,
            "attribute3": 5.2,
            "attribute4": 124,
            "attribute5": "65.53%"
        },
        "94": {
            "attribute1": 10,
            "attribute2": 4.4,
            "attribute3": 2.2,
            "attribute4": 12,
            "attribute5": "45.53%"
        },
        "96": {
            "attribute1": 17,
            "attribute2": 9.64,
            "attribute3": 5.2,
            "attribute4": 62,
            "attribute5": "51.53%"
        }
    },
    "message": "SUCCESS"
}
My goals are to:
1. Sort the data by any of the attributes.
2. There are around 100 of these; grab the top 5 (depending on how they are sorted), then...
3. Output the data in a table, e.g.:
These are sorted by: attribute5
---
attribute1 | attribute2 | attribute3 | attribute4 | attribute5
40         | 1.4        | 5.2        | 124        | 65.53%
17         | 9.64       | 5.2        | 62         | 51.53%
10         | 4.4        | 2.2        | 12         | 45.53%
*Also, attribute5 above is a string value.
Admittedly, my knowledge here is very limited.
I attempted to mimic the method used here:
python sort list of json by value
I managed to open the file and I can extract the key values from a sample row:
import json
jsonfile = "path-to-my-file.json"
with open(jsonfile) as j:
    data = json.load(j)
k = data["data"]["1"].keys()
print(k)
total = data["data"]
for row in total:
    v = data["data"][str(row)].values()
    print(v)
This outputs:
dict_keys(['attribute1', 'attribute2', 'attribute3', 'attribute4', 'attribute5'])
dict_values([1, 40, 1.4, 5.2, 124, '65.53%'])
dict_values([94, 10, 4.4, 2.2, 12, '45.53%'])
dict_values([96, 17, 9.64, 5.2, 62, '51.53%'])
Any point in the right direction would be GREATLY appreciated.
Thanks!

If you don't mind using pandas, you could do it like this:
import pandas as pd
rows = list(data["data"].values())
df = pd.DataFrame(rows)
# to get the top 5 rows by an attribute, choose the order with the ascending
# keyword (ascending=False puts the largest values first); head() returns the first 5 rows
df.sort_values('attribute1', ascending=True).head()
This lets you sort by any attribute at any time and print the result as a table, which produces output like this (the row order depends on what you sort by):
attribute1 attribute2 attribute3 attribute4 attribute5
0 40 1.40 5.2 124 65.53%
1 10 4.40 2.2 12 45.53%
2 17 9.64 5.2 62 51.53%
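A minimal sketch building on that answer, assuming pandas is installed and the data has the question's shape. Because attribute5 is a string like "65.53%", sorting that column directly compares text rather than numbers, so this converts it to a float first (attribute5_num is a hypothetical helper column name) and uses nlargest to take the top 5, largest first. DataFrame.from_dict with orient="index" keeps the original keys ("1", "94", "96") as the row index.

```python
import pandas as pd

# Sample data in the same shape as the question's data["data"]
data = {"data": {
    "1":  {"attribute1": 40, "attribute2": 1.4,  "attribute3": 5.2, "attribute4": 124, "attribute5": "65.53%"},
    "94": {"attribute1": 10, "attribute2": 4.4,  "attribute3": 2.2, "attribute4": 12,  "attribute5": "45.53%"},
    "96": {"attribute1": 17, "attribute2": 9.64, "attribute3": 5.2, "attribute4": 62,  "attribute5": "51.53%"},
}}

# Each inner dict becomes one row; the outer keys become the index
df = pd.DataFrame.from_dict(data["data"], orient="index")

# Hypothetical helper column: "65.53%" -> 65.53 so it sorts numerically
df["attribute5_num"] = df["attribute5"].str.rstrip("%").astype(float)

# Top 5 rows by attribute5, largest first (only 3 rows exist in this sample)
top5 = df.nlargest(5, "attribute5_num").drop(columns="attribute5_num")
print(top5.to_string())
```

For a numeric column such as attribute4, df.nlargest(5, "attribute4") works without the conversion step.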

I'll leave this answer here in case you don't want to use pandas, but the answer from @MatthewBarlowe is much simpler and I recommend it.
For sorting by a specific attribute, this should work:
import json
SORT_BY = "attribute4"
with open("test.json") as j:
    data = json.load(j)
items = data["data"]
sorted_keys = sorted(items, key=lambda key: items[key][SORT_BY], reverse=True)
Now, sorted_keys is a list of the keys ordered by the chosen attribute, largest first.
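One caveat with this key: attribute5 is a string, so sorting by it compares text character by character, and a value like "9.5%" would land above "65.53%". A minimal sketch of a numeric sort key, assuming percentages always end in "%" (sort_value is a hypothetical helper, and "9.5%" is a made-up sample value chosen to show the problem):

```python
import json

# Sample file contents in the question's shape; "9.5%" is a hypothetical value
raw = '''{"code": 2000, "data": {
  "1":  {"attribute1": 40, "attribute4": 124, "attribute5": "65.53%"},
  "94": {"attribute1": 10, "attribute4": 12,  "attribute5": "9.5%"},
  "96": {"attribute1": 17, "attribute4": 62,  "attribute5": "51.53%"}
}, "message": "SUCCESS"}'''


def sort_value(value):
    """Hypothetical helper: '65.53%' -> 65.53; any other value is returned unchanged."""
    if isinstance(value, str) and value.endswith("%"):
        return float(value.rstrip("%"))
    return value


items = json.loads(raw)["data"]
SORT_BY = "attribute5"

# Plain string comparison: "9.5%" > "65.53%" because '9' > '6'
text_order = sorted(items, key=lambda key: items[key][SORT_BY], reverse=True)
# Numeric comparison: 65.53 > 51.53 > 9.5
numeric_order = sorted(items, key=lambda key: sort_value(items[key][SORT_BY]), reverse=True)

print(text_order)     # ['94', '1', '96']
print(numeric_order)  # ['1', '96', '94']
```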
Then, to print this as a table, I used the tabulate library. The final code for me looked like this:
from tabulate import tabulate
import json
SORT_BY = "attribute4"
with open("test.json") as j:
    data = json.load(j)
items = data["data"]
sorted_keys = sorted(items, key=lambda key: items[key][SORT_BY], reverse=True)
print(f"\nSorted by: {SORT_BY}")
print(
    tabulate(
        # only the first 5 keys, i.e. the top 5 rows
        [[key, *items[key].values()] for key in sorted_keys[:5]],
        # take the attribute names from the first entry instead of hard-coding "1"
        headers=["Column", *next(iter(items.values())).keys()],
    )
)
When sorting by 'attribute5', this outputs:
Sorted by: attribute5
Column attribute1 attribute2 attribute3 attribute4 attribute5
-------- ------------ ------------ ------------ ------------ ------------
1 40 1.4 5.2 124 65.53%
96 17 9.64 5.2 62 51.53%
94 10 4.4 2.2 12 45.53%
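If you would rather avoid the extra tabulate dependency, here is a standard-library sketch under the same assumptions (data already loaded into items). heapq.nlargest returns at most TOP_N keys, largest first, and the table is padded by hand in the "a | b | c" layout from the question; TOP_N and the padding logic are illustrative choices, not part of the answer above.

```python
import heapq

# Sample data in the question's shape (normally items = data["data"])
items = {
    "1":  {"attribute1": 40, "attribute2": 1.4,  "attribute3": 5.2, "attribute4": 124, "attribute5": "65.53%"},
    "94": {"attribute1": 10, "attribute2": 4.4,  "attribute3": 2.2, "attribute4": 12,  "attribute5": "45.53%"},
    "96": {"attribute1": 17, "attribute2": 9.64, "attribute3": 5.2, "attribute4": 62,  "attribute5": "51.53%"},
}
SORT_BY = "attribute4"
TOP_N = 5

# At most TOP_N keys, largest value first, without fully sorting every entry
top_keys = heapq.nlargest(TOP_N, items, key=lambda key: items[key][SORT_BY])

headers = list(next(iter(items.values())).keys())
rows = [[str(items[key][h]) for h in headers] for key in top_keys]

# Each column is as wide as its widest cell, header included
widths = [max([len(h)] + [len(row[i]) for row in rows]) for i, h in enumerate(headers)]

lines = [f"These are sorted by: {SORT_BY}", "---",
         " | ".join(h.ljust(w) for h, w in zip(headers, widths))]
lines += [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows]
table = "\n".join(lines)
print(table)
```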

Related

Pandas Dataframe row as a formatted JSON output

I have a DataFrame that I am trying to group by customer and print as output, but to_json does not give me the format I need. I also need to create a separate JSON file for each customer. I think custom JSON formatting is not possible with the generic pandas methods, so what direction should I be looking in?
I tried grouping by customer_id, first_name and last_name, setting them as the index and using orient='index', but that didn't really work out.
import pandas as pd
data = [{'customer_id': 1, 'first_name':'John', 'last_name':'Doe', 'amount':100, 'sub_amount':50,'total': 150,'product':'tool box'},
{'customer_id': 1, 'first_name':'John', 'last_name':'Doe', 'amount':50, 'sub_amount':50,'total': 100,'product':'light'},
{'customer_id': 2, 'first_name':'Jane', 'last_name':'Doe', 'amount':200, 'sub_amount':50,'total': 250,'product':'iron box'},
{'customer_id': 2, 'first_name':'Jane', 'last_name':'Doe', 'amount':50, 'sub_amount':50,'total': 100,'product':'led'}
]
df = pd.DataFrame(data)
df
customer_id first_name last_name amount sub_amount total product
0 1 John Doe 100 50 150 tool box
1 1 John Doe 50 50 100 light
2 2 Jane Doe 200 50 250 iron box
3 2 Jane Doe 50 50 100 led
Expected output:
{
    "first_name": "John",
    "last_name": "Doe",
    "Product_Details": {
        "tool box": {
            "total": 150,
            "amount": 100
        },
        "light": {
            "total": 100,
            "amount": 50
        }
    }
}
import json
clients = {}
for index, row in df.iterrows():
    # create the customer entry the first time this customer_id is seen
    clients.setdefault(row['customer_id'], {'first_name': row['first_name'],
                                            'last_name': row['last_name']})
    # add one entry per product under Product_Details
    clients[row['customer_id']].setdefault('Product_Details', {})[row['product']] = \
        {'total': row['total'], 'amount': row['amount']}
print(json.dumps(clients[1], indent=4))
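The question also asks for a separate JSON file per customer. A minimal sketch of that step, building the same nested structure straight from the list of row dicts (with a DataFrame, df.to_dict(orient="records") gives this list) and writing one file per customer. The customer_<id>.json naming scheme and the temporary output directory are hypothetical choices for illustration.

```python
import json
import os
import tempfile

# Same rows as the question's `data` list
data = [
    {'customer_id': 1, 'first_name': 'John', 'last_name': 'Doe', 'amount': 100, 'sub_amount': 50, 'total': 150, 'product': 'tool box'},
    {'customer_id': 1, 'first_name': 'John', 'last_name': 'Doe', 'amount': 50, 'sub_amount': 50, 'total': 100, 'product': 'light'},
    {'customer_id': 2, 'first_name': 'Jane', 'last_name': 'Doe', 'amount': 200, 'sub_amount': 50, 'total': 250, 'product': 'iron box'},
    {'customer_id': 2, 'first_name': 'Jane', 'last_name': 'Doe', 'amount': 50, 'sub_amount': 50, 'total': 100, 'product': 'led'},
]

clients = {}
for row in data:
    client = clients.setdefault(row['customer_id'],
                                {'first_name': row['first_name'], 'last_name': row['last_name']})
    client.setdefault('Product_Details', {})[row['product']] = {'total': row['total'], 'amount': row['amount']}

# Hypothetical output location and file naming: one customer_<id>.json per customer
out_dir = tempfile.mkdtemp()
for customer_id, client in clients.items():
    path = os.path.join(out_dir, f"customer_{customer_id}.json")
    with open(path, "w") as f:
        json.dump(client, f, indent=4)

print(sorted(os.listdir(out_dir)))  # ['customer_1.json', 'customer_2.json']
```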

Pandas converts string-typed JSON value to INT

I have a list of objects as JSON. Each object has two properties: id (string) and arg (number).
When I use pandas.read_json(...), the resulting DataFrame has id interpreted as a number as well, which causes problems, since information is lost.
import pandas as pd
json = '[{ "id" : "1", "arg": 1 },{ "id" : "1_1", "arg": 2}, { "id" : "11", "arg": 2}]'
df = pd.read_json(json)
I'd expect to have a DataFrame like this:
id arg
0 "1" 1
1 "1_1" 2
2 "11" 2
I get
id arg
0 1 1
1 11 2
2 11 2
and suddenly, the once unique id is not so unique anymore.
How can I tell pandas to stop doing that?
My search so far has only yielded results where people were trying to achieve the opposite: having string columns interpreted as numbers. I definitely don't want that in this case!
If you set the dtype parameter to False, read_json will not infer the types automatically:
df = pd.read_json(json, dtype=False)
Use the dtype parameter to prevent id from being cast to numbers:
df = pd.read_json(json, dtype={'id':str})
print (df)
id arg
0 1 1
1 1_1 2
2 11 2
print (df.dtypes)
id object
arg int64
dtype: object
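A small variation on the dtype={'id': str} answer, assuming pandas is installed: recent pandas versions deprecate passing a literal JSON string to read_json, so this sketch wraps the text in io.StringIO. It also names the variable json_text so that it does not shadow the standard json module.

```python
from io import StringIO

import pandas as pd

json_text = '[{ "id" : "1", "arg": 1 },{ "id" : "1_1", "arg": 2}, { "id" : "11", "arg": 2}]'

# StringIO makes the literal string look like a file to read_json;
# dtype={"id": str} keeps the id column as strings instead of inferring numbers
df = pd.read_json(StringIO(json_text), dtype={"id": str})
print(df["id"].tolist())  # ['1', '1_1', '11']
```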

Json to pandas dataframe with slight modification

I have JSON data as below:
{
    "X": "abc",
    "Y": 1,
    "Z": 4174,
    "t_0": {
        "M": "bm",
        "T": "sp",
        "CUD": 4,
        "t_1": {
            "CUD": "1",
            "BBC": "09",
            "CPR": -127
        },
        "EVV": "10.7000",
        "BBC": -127,
        "CMIX": "25088"
    },
    "EYR": "sp"
}
The problem is that converting this to a Python DataFrame creates two columns with the same name, CUD: one under t_0 and another under t_1, even though they are different events. How can I add the parent JSON key to each column name so that I can tell the two columns apart, e.g. t_0_CUD and t_1_CUD?
My code is below:
df = pd.json_normalize(json_data)
df.columns = df.columns.map(lambda x: x.split(".")[-1])
If you use only the first part of your solution, it returns what you need, except that . is used instead of _:
df = pd.json_normalize(json_data)
print (df)
X Y Z EYR t_0.M t_0.T t_0.CUD t_0.t_1.CUD t_0.t_1.BBC t_0.t_1.CPR \
0 abc 1 4174 sp bm sp 4 1 09 -127
t_0.EVV t_0.BBC t_0.CMIX
0 10.7000 -127 25088
If you need _ instead:
df.columns = df.columns.str.replace('.', '_', regex=False)
print (df)
X Y Z EYR t_0_M t_0_T t_0_CUD t_0_t_1_CUD t_0_t_1_BBC t_0_t_1_CPR \
0 abc 1 4174 sp bm sp 4 1 09 -127
t_0_EVV t_0_BBC t_0_CMIX
0 10.7000 -127 25088
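pd.json_normalize also accepts a sep argument, so pd.json_normalize(json_data, sep="_") should produce the underscore names directly. To show how those names are built, here is a standard-library sketch of the same idea: recursively walk nested dicts and join the key path with a separator (flatten is a hypothetical helper, not a pandas function).

```python
def flatten(obj, parent_key="", sep="_"):
    """Hypothetical helper: flatten nested dicts, joining the key path with `sep`."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse so t_0 -> t_1 -> CUD becomes t_0_t_1_CUD
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


json_data = {
    "X": "abc", "Y": 1, "Z": 4174,
    "t_0": {
        "M": "bm", "T": "sp", "CUD": 4,
        "t_1": {"CUD": "1", "BBC": "09", "CPR": -127},
        "EVV": "10.7000", "BBC": -127, "CMIX": "25088",
    },
    "EYR": "sp",
}

row = flatten(json_data)
print(row["t_0_CUD"], row["t_0_t_1_CUD"])  # 4 1
```

The two CUD values now live under distinct keys, and a list of such flattened dicts can be passed to pd.DataFrame as rows.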

Python csv conversion to specific nested json

I have a DataFrame (a CSV file loaded into pandas) as below:
col1 col2 col3 col4 col5 name amount
1 USA 4000 Air 60 Education 200
1 USA 4000 Air 60 Car 100
1 USA 4000 Air 60 Restaurant 100
2 UK 5000 Cash 50 Government 125
2 UK 5000 Cash 50 Restaurant 135
Now I need to convert it into a nested JSON format, producing one record per group (group by col1, col2, col3 and col4).
The expected JSON output is:
{
    "col5": 60,
    "col4": [
        {
            "name": "Air"
        }
    ],
    "expenses": [
        {
            "amount": 200,
            "name": "Education"
        },
        {
            "amount": 100,
            "name": "Car"
        },
        {
            "amount": 100,
            "name": "Restaurant"
        }
    ],
    "col1": 1,
    "col2": "USA",
    "col3": "4000"
}
I understand it's going to be somewhat complex code... but can someone help?
Thanks in advance!
I believe you need:
For a dictionary:
d = (df.groupby(['col1','col2','col3','col4','col5'])
.apply(lambda x: dict(zip(x['name'], x['amount'])))
.reset_index(name='expenses')
.to_dict(orient='records')
)
print (d)
For JSON:
j = (df.groupby(['col1','col2','col3','col4','col5'])
.apply(lambda x: dict(zip(x['name'], x['amount'])))
.reset_index(name='expenses')
.to_json(orient='records')
)
print (j)
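Note that the answer above stores expenses as a {name: amount} dict, while the question's expected output uses a list of {"amount", "name"} objects plus a one-element col4 list. A standard-library sketch that produces that exact shape, assuming the rows are available as a list of dicts (df.to_dict(orient="records") gives that from the DataFrame). col3 is kept as a number here; wrap it in str() if it must be the string "4000".

```python
import json

# The question's rows, as df.to_dict(orient="records") would return them
rows = [
    {"col1": 1, "col2": "USA", "col3": 4000, "col4": "Air", "col5": 60, "name": "Education", "amount": 200},
    {"col1": 1, "col2": "USA", "col3": 4000, "col4": "Air", "col5": 60, "name": "Car", "amount": 100},
    {"col1": 1, "col2": "USA", "col3": 4000, "col4": "Air", "col5": 60, "name": "Restaurant", "amount": 100},
    {"col1": 2, "col2": "UK", "col3": 5000, "col4": "Cash", "col5": 50, "name": "Government", "amount": 125},
    {"col1": 2, "col2": "UK", "col3": 5000, "col4": "Cash", "col5": 50, "name": "Restaurant", "amount": 135},
]

grouped = {}
for r in rows:
    key = (r["col1"], r["col2"], r["col3"], r["col4"], r["col5"])
    if key not in grouped:
        # first row of a group: create the record in the expected key order
        grouped[key] = {"col5": r["col5"], "col4": [{"name": r["col4"]}], "expenses": [],
                        "col1": r["col1"], "col2": r["col2"], "col3": r["col3"]}
    # every row contributes one expense entry
    grouped[key]["expenses"].append({"amount": r["amount"], "name": r["name"]})

records = list(grouped.values())
print(json.dumps(records, indent=4))
```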

Read JSONs in R to data.frame

I have a list of JSON values (actually, it's a text file where every line is one JSON object), like this:
{ "id": 1, "name": "john", "age": 18, "education": "master" }
{ "id": 2, "name": "jack", "job": "clerk" }
...
Some of the values can be missing (e.g. the first item doesn't have a "job" value, and the second item doesn't have "education" or "age").
I need to create a data frame in R and fill all missing column values with NA (a column should exist if a field with that name appears in at least one row). What is the easiest way to achieve this?
What I've already done: I installed the "rjson" package and parsed these lines into R lists. Let's assume that the lines variable is a character vector of lines.
library(rjson)
# initialize "lines" here (a character vector, one JSON object per line)
jsons <- sapply(lines, fromJSON)
The "jsons" variable became a "list of lists" (every JSON object is converted to a list, in R terminology). How do I convert it to a data.frame?
I want to see the following data frame for the example I provided:
"id" | "name" | "age" | "education" | "job"
-------------------------------------------
1 | "john" | 18 | "master" | NA
2 | "jack" | NA | NA | "clerk"
From plyr you can use rbind.fill to add the NAs for you
library(plyr)
rbind.fill(lapply(jsons, data.frame))
# id name age education job
# 1 1 john 18 master <NA>
# 2 2 jack NA <NA> clerk
or from data.table
library(data.table)
rbindlist(jsons, fill = TRUE)
and dplyr
library(dplyr)
bind_rows(lapply(jsons, data.frame))
Future me, correcting past me's mistakes. It would make more sense to use jsonlite's stream_in
library(jsonlite)
# stream_in expects a connection; txtfile is the path to your file
stream_in(file(txtfile))
# To test on `txt` from below, try:
# stream_in(textConnection(txt))
# Found 2 records...
# Imported 2 records. Simplifying...
# id name age education job
#1 NA john 18 master <NA>
#2 2 jack NA <NA> clerk
Use the jsonlite package's fromJSON function, after making a few inline edits to your original text data (I've also edited the first piece of id data to include an explicit null value, to show that it deals with this):
fromJSON(paste0("[", gsub("}\n", "},\n", txt), "]"))
# id name age education job
#1 NA john 18 master <NA>
#2 2 jack NA <NA> clerk
All I did was add a little formatting to wrap all the JSON lines together in [ and ] and add a comma at the end of each closing } - resulting in an output like the below which can be processed all at once by jsonlite::fromJSON:
[{"1":"one"},{"2":"two"}]
Where txt was your lines of data as presented, with a null in the id variable:
txt <- "{ \"id\": null, \"name\": \"john\", \"age\": 18, \"education\": \"master\" }
{ \"id\": 2, \"name\": \"jack\", \"job\": \"clerk\" }"