R data.frame to JSON with child nodes / hierarchical - json

I am trying to write a data.frame from R into a JSON file, but in a hierarchical structure with child nodes within them. I found some examples and the RJSONIO package, but I wasn't able to apply them to my case.
This is the data.frame in R:
> DF
Date_by_Month CCG Year Month refYear name OC_5a OC_5b OC_5c
1 2010-01-01 MyTown 2010 01 2009 2009/2010 0 15 27
2 2010-02-01 MyTown 2010 02 2009 2009/2010 1 14 22
3 2010-03-01 MyTown 2010 03 2009 2009/2010 1 6 10
4 2010-04-01 MyTown 2010 04 2010 2010/2011 0 10 10
5 2010-05-01 MyTown 2010 05 2010 2010/2011 1 16 7
6 2010-06-01 MyTown 2010 06 2010 2010/2011 0 13 25
In addition to writing the data by month, I would also like to create an aggregate child, the 'yearly' one, which holds the sum (for example) of all the months that fall in this year. This is how I would like the JSON file to look:
[
{
"ccg":"MyTown",
"data":[
{"period":"yearly",
"scores":[
{"name":"2009/2010","refYear":"2009","OC_5a":2, "OC_5b": 35, "OC_5c": 59},
{"name":"2010/2011","refYear":"2010","OC_5a":1, "OC_5b": 39, "OC_5c": 42}
]
},
{"period":"monthly",
"scores":[
{"name":"2009/2010","refYear":"2009","month":"01","year":"2010","OC_5a":0, "OC_5b": 15, "OC_5c": 27},
{"name":"2009/2010","refYear":"2009","month":"02","year":"2010","OC_5a":1, "OC_5b": 14, "OC_5c": 22},
{"name":"2009/2010","refYear":"2009","month":"03","year":"2010","OC_5a":1, "OC_5b": 6, "OC_5c": 10},
{"name":"2010/2011","refYear":"2010","month":"04","year":"2010","OC_5a":0, "OC_5b": 10, "OC_5c": 10},
{"name":"2010/2011","refYear":"2010","month":"05","year":"2010","OC_5a":1, "OC_5b": 16, "OC_5c": 7},
{"name":"2010/2011","refYear":"2010","month":"06","year":"2010","OC_5a":0, "OC_5b": 13, "OC_5c": 25}
]
}
]
}
]
Thank you so much for your help!

Expanding on my comment:
The jsonlite package has a lot of features, but what you're describing doesn't really map to a data frame anymore, so I doubt any canned routine has this functionality. Your best bet is probably to convert the data frame to a more general list (FYI, data frames are stored internally as lists of columns) with a structure that matches the structure of the JSON exactly, then just use the converter to translate it.
This is complicated in general but in your case should be fairly simple. The list will be structured exactly like the JSON data:
list(
list(
ccg = "Town1",
data = list(
list(
period = "yearly",
scores = yearly_data_frame_town1
),
list(
period = "monthly",
scores = monthly_data_frame_town1
)
)
),
list(
ccg = "Town2",
data = list(
list(
period = "yearly",
scores = yearly_data_frame_town2
),
list(
period = "monthly",
scores = monthly_data_frame_town2
)
)
)
)
Constructing this list should be a straightforward case of looping over unique(DF$CCG) and using aggregate at each step, to construct the yearly data.
If you need performance, look to either the data.table or dplyr packages to do the looping and aggregating all at once. The former is flexible and performant but a little esoteric. The latter has relatively easy syntax and is similarly performant, but is designed specifically around building pipelines for data frames so it might take some hacking to get it to produce the right output format.
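The grouping-and-nesting step described above does not depend on the language. As a rough illustration in Python rather than R (the `RAW` rows are copied from the question's data frame, while `build_hierarchy` and the other names are made up for this sketch), it could look like this:

```python
import json
from collections import defaultdict

COLUMNS = ["CCG", "Year", "Month", "refYear", "name", "OC_5a", "OC_5b", "OC_5c"]
SCORES = ("OC_5a", "OC_5b", "OC_5c")

# The six rows of the question's data frame (Date_by_Month omitted).
RAW = [
    ("MyTown", "2010", "01", "2009", "2009/2010", 0, 15, 27),
    ("MyTown", "2010", "02", "2009", "2009/2010", 1, 14, 22),
    ("MyTown", "2010", "03", "2009", "2009/2010", 1, 6, 10),
    ("MyTown", "2010", "04", "2010", "2010/2011", 0, 10, 10),
    ("MyTown", "2010", "05", "2010", "2010/2011", 1, 16, 7),
    ("MyTown", "2010", "06", "2010", "2010/2011", 0, 13, 25),
]
rows = [dict(zip(COLUMNS, r)) for r in RAW]


def build_hierarchy(rows):
    """Group flat rows by CCG, then attach yearly sums and monthly records."""
    by_ccg = defaultdict(list)
    for row in rows:
        by_ccg[row["CCG"]].append(row)

    result = []
    for ccg, ccg_rows in by_ccg.items():
        # Yearly: sum each score column per (name, refYear) pair.
        yearly = {}
        for row in ccg_rows:
            key = (row["name"], row["refYear"])
            entry = yearly.setdefault(
                key, {"name": key[0], "refYear": key[1], **dict.fromkeys(SCORES, 0)}
            )
            for col in SCORES:
                entry[col] += row[col]
        # Monthly: one record per input row, without the CCG column.
        monthly = [
            {"name": r["name"], "refYear": r["refYear"], "month": r["Month"],
             "year": r["Year"], **{c: r[c] for c in SCORES}}
            for r in ccg_rows
        ]
        result.append({"ccg": ccg, "data": [
            {"period": "yearly", "scores": list(yearly.values())},
            {"period": "monthly", "scores": monthly},
        ]})
    return result


hierarchy = build_hierarchy(rows)
print(json.dumps(hierarchy, indent=2))
```

In R the same shape would be built with nested `list()` calls as shown above; the sketch only makes the loop and the aggregation explicit.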

Looks like ssdecontrol has you covered, but here's my solution. You need to loop over the unique CCG and Year values to create the entire data set.
df <- read.table(textConnection("Date_by_Month CCG Year Month refYear name OC_5a OC_5b OC_5c
2010-01-01 MyTown 2010 01 2009 2009/2010 0 15 27
2010-02-01 MyTown 2010 02 2009 2009/2010 1 14 22
2010-03-01 MyTown 2010 03 2009 2009/2010 1 6 10
2010-04-01 MyTown 2010 04 2010 2010/2011 0 10 10
2010-05-01 MyTown 2010 05 2010 2010/2011 1 16 7
2010-06-01 MyTown 2010 06 2010 2010/2011 0 13 25"), stringsAsFactors=FALSE, header=TRUE)
library(RJSONIO)
to_list <- function(ccg, year){
  # Monthly rows for this CCG and calendar year
  df_monthly <- subset(df, CCG==ccg & Year==year)
  # Yearly totals: sum the score columns per (name, refYear) group
  df_yearly <- aggregate(df_monthly[,c("OC_5a", "OC_5b", "OC_5c")], df_monthly[,c("name", "refYear")], sum)
  l <- list("ccg"=ccg,
            data=list(list("period" = "yearly",
                           "scores" = as.list(df_yearly)
                      ),
                      list("period" = "monthly",
                           "scores" = as.list(df_monthly[,c("name", "refYear", "OC_5a", "OC_5b", "OC_5c")])
                      )
            )
  )
  return(l)
}
toJSON(to_list("MyTown", "2010"), pretty=TRUE)
Which returns this:
{
"ccg" : "MyTown",
"data" : [
{
"period" : "yearly",
"scores" : {
"name" : [
"2009/2010",
"2010/2011"
],
"refYear" : [
2009,
2010
],
"OC_5a" : [
2,
1
],
"OC_5b" : [
35,
39
],
"OC_5c" : [
59,
42
]
}
},
{
"period" : "monthly",
"scores" : {
"name" : [
"2009/2010",
"2009/2010",
"2009/2010",
"2010/2011",
"2010/2011",
"2010/2011"
],
"refYear" : [
2009,
2009,
2009,
2010,
2010,
2010
],
"OC_5a" : [
0,
1,
1,
0,
1,
0
],
"OC_5b" : [
15,
14,
6,
10,
16,
13
],
"OC_5c" : [
27,
22,
10,
10,
7,
25
]
}
}
]
}
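Note that the output above is column-oriented (one array per column), whereas the question asked for one object per row. As a hedged illustration of the reshaping involved, written in Python rather than R and using a hypothetical `columns_to_records` helper, the column-wise `scores` can be turned into row-wise records like this:

```python
import json


def columns_to_records(columns):
    """Turn {"col": [v1, v2], ...} into [{"col": v1, ...}, {"col": v2, ...}]."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# The "yearly" scores exactly as they appear in the output above.
yearly_columns = {
    "name": ["2009/2010", "2010/2011"],
    "refYear": [2009, 2010],
    "OC_5a": [2, 1],
    "OC_5b": [35, 39],
    "OC_5c": [59, 42],
}
yearly_records = columns_to_records(yearly_columns)
print(json.dumps(yearly_records))
```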

Related

Sort and Select Top 5 JSON values

I have a two-fold issue and looking for clues as to how to approach it.
I have a json file that is formatted as such:
{
"code": 2000,
"data": {
"1": {
"attribute1": 40,
"attribute2": 1.4,
"attribute3": 5.2,
"attribute4": 124,
"attribute5": "65.53%"
},
"94": {
"attribute1": 10,
"attribute2": 4.4,
"attribute3": 2.2,
"attribute4": 12,
"attribute5": "45.53%"
},
"96": {
"attribute1": 17,
"attribute2": 9.64,
"attribute3": 5.2,
"attribute4": 62,
"attribute5": "51.53%"
}
},
"message": "SUCCESS"
}
My goals are to:
Sort the data by any of the attributes.
Grab the top 5 of the roughly 100 entries (depending on how they are sorted), then...
Output the data in a table, e.g.:
These are sorted by: attribute5
---
attribute1 | attribute2 | attribute3 | attribute4 | attribute5
40 |1.4 |5.2|124|65.53%
17 |9.64|5.2|62 |51.53%
10 |4.4 |2.2|12 |45.53%
*also, attribute5 above is a string value
Admittedly, my knowledge here is very limited.
I attempted to mimic the method used here:
python sort list of json by value
I managed to open the file and I can extract the key values from a sample row:
import json
jsonfile = "path-to-my-file.json"
with open(jsonfile) as j:
    data = json.load(j)
k = data["data"]["1"].keys()
print(k)
total = data["data"]
for row in total:
    v = data["data"][str(row)].values()
    print(v)
this outputs:
dict_keys(['attribute1', 'attribute2', 'attribute3', 'attribute4', 'attribute5'])
dict_values([1, 40, 1.4, 5.2, 124, '65.53%'])
dict_values([94, 10, 4.4, 2.2, 12, '45.53%'])
dict_values([96, 17, 9.64, 5.2, 62, '51.53%'])
Any point in the right direction would be GREATLY appreciated.
Thanks!
If you don't mind using pandas you could do it like this
import pandas as pd
rows = [v for k,v in data["data"].items()]
df = pd.DataFrame(rows)
# sort by any attribute; pass ascending=False to put the largest values first
# head() then keeps the first 5 rows of the sorted frame
df.sort_values('attribute1', ascending=True).head()
This will allow you to sort by any attribute you need at any time and print out a table.
Which will produce output like this depending on what you sort by
attribute1 attribute2 attribute3 attribute4 attribute5
0 40 1.40 5.2 124 65.53%
1 10 4.40 2.2 12 45.53%
2 17 9.64 5.2 62 51.53%
I'll leave this answer here in case you don't want to use pandas, but the answer from @MatthewBarlowe is way less complicated and I recommend that.
For sorting by a specific attribute, this should work:
import json
SORT_BY = "attribute4"
with open("test.json") as j:
    data = json.load(j)
items = data["data"]
sorted_keys = list(sorted(items, key=lambda key: items[key][SORT_BY], reverse=True))
Now, sorted_keys is a list of the keys in order of the attribute they were sorted by.
Then, to print this as a table, I used the tabulate library. The final code for me looked like this:
from tabulate import tabulate
import json
SORT_BY = "attribute4"
with open("test.json") as j:
    data = json.load(j)
items = data["data"]
sorted_keys = list(sorted(items, key=lambda key: items[key][SORT_BY], reverse=True))
print(f"\nSorted by: {SORT_BY}")
print(
    tabulate(
        [
            [sorted_keys[i], *items[sorted_keys[i]].values()]
            for i, _ in enumerate(items)
        ],
        headers=["Column", *items["1"].keys()],
    )
)
When sorting by 'attribute5', this outputs:
Sorted by: attribute5
Column attribute1 attribute2 attribute3 attribute4 attribute5
-------- ------------ ------------ ------------ ------------ ------------
1 40 1.4 5.2 124 65.53%
96 17 9.64 5.2 62 51.53%
94 10 4.4 2.2 12 45.53%
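One caveat: `attribute5` is a string, so the sort above compares text, not numbers (for instance, `"9.5%"` would rank above `"65.53%"`). Below is a minimal sketch of a numeric sort limited to the top 5, assuming every value of that attribute ends in `%`. The `sort_value` and `top_n` helpers are hypothetical names, and the row with key `"7"` is made up to show the difference:

```python
def sort_value(value):
    """Return a number for sorting; strip a trailing '%' from percent strings."""
    if isinstance(value, str) and value.endswith("%"):
        return float(value[:-1])
    return value


def top_n(items, sort_by, n=5):
    """Return the n (key, row) pairs with the largest value of sort_by."""
    ranked = sorted(
        items.items(), key=lambda kv: sort_value(kv[1][sort_by]), reverse=True
    )
    return ranked[:n]


# Rows from the question (two attributes omitted) plus one made-up row, "7".
items = {
    "1": {"attribute1": 40, "attribute4": 124, "attribute5": "65.53%"},
    "94": {"attribute1": 10, "attribute4": 12, "attribute5": "45.53%"},
    "96": {"attribute1": 17, "attribute4": 62, "attribute5": "51.53%"},
    "7": {"attribute1": 3, "attribute4": 8, "attribute5": "9.5%"},
}

for key, row in top_n(items, "attribute5"):
    print(key, row)
```

The list returned by `top_n` can be passed to `tabulate` in the same way as `sorted_keys` above.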

How to insert object of array into JSON format?

Suppose the JSON output should look like this:
{
"Bank_Name":"This is bank name",
"ACC_Name":"Tummy",
"ACC_No":"1122XXXX115",
"Date_Active":"Jan 31 2019 2:16PM",
"Date_Expired":"Nov 17 2020 1:14PM",
"Bank_Status":"Expired",
"email_Notif":[
{"Verification":[{"User_Email":"tfe.master@gmail.com","Send_DateTime":"2020-11-03T13:30:59.7036152"},
{"User_Email":"the.user@outlook.com","Send_DateTime":"2020-11-03T13:31:02.1563596"}]
},{"Verified":[{"DateTime": "2020-11-03T13:31:02.1563596"}]
},
{ "Updating":[{"User_Email":"the.spv@gmail.com","Send_DateTime":"2020-11-03T13:30:59.7036152"},
{"User_Email":"the.officer@outlook.com","Send_DateTime":"2020-11-03T13:31:02.1563596"}]
}
],
"rejection_Statuses":[
{"Verification":"Nov 3 2020 01:31:02 PM"} ,
{"Verified":"Nov 7 2020 01:12:03 PM"} ,
{"Updating":"Nov 17 2020 01:18:03 PM"} ,
{"Re_run":"Nov 27 2020 05:18:03 PM"}
]
}
Questions:
How do I use JSON_MODIFY in SQL Server to insert email_Notif (as an array of objects) if the JSON input looks like this:
{"Bank_Name": "BPD SULAWESI SELATAN",
"ACC_Name": "Tutik",
"ACC_No": "1122000115",
"Date_Active": "Jan 31 2019 2:16PM",
"Date_Expired": "Nov 17 2020 1:14PM",
"Bank_Status": "Expired",
"rejection_Statuses":[
{"Verification":"Nov 3 2020 01:31:02 PM"} ,
{"Verified":"Nov 7 2020 01:12:03 PM"} ,
{"Updating":"Nov 17 2020 01:18:03 PM"} ,
{"Re_run":"Nov 27 2020 05:18:03 PM"}]
}
How do I get the values from "email_Notif" (Verification, Verified and Updating) as JSON in a SQL Server query using a SELECT statement?
If I understand correctly, one way to achieve this is:
JSON:
DECLARE
@email_Verfication nvarchar(max) = N'{
"Verification":[
{"User_Email":"tfe.master@gmail.com", "Send_DateTime":"2020-11-03T13:30:59.7036152"},
{"User_Email":"the.user@outlook.com", "Send_DateTime":"2020-11-03T13:31:02.1563596"}
]
}',
@email_Verified nvarchar(max) = N'{
"Verified":[
{"DateTime":"2020-11-03T13:31:02.1563596"}
]
}',
@email_Updating nvarchar(max) = N'{"Updating":[
{"User_Email":"the.spv@gmail.com", "Send_DateTime":"2020-11-03T13:30:59.7036152"},
{"User_Email":"the.officer@outlook.com", "Send_DateTime":"2020-11-03T13:31:02.1563596"}
]
}',
@detail NVARCHAR (MAX) = N'{
"Bank_Name":"BPD SULAWESI SELATAN",
"ACC_Name":"Tutik",
"ACC_No":"1122000115",
"Date_Active":"Jan 31 2019 2:16PM",
"Date_Expired":"-",
"Bank_Status":"Active",
"rejection_Statuses":[
{"Verification":"Nov 3 2020 01:31:02 PM"},
{"Verified":"Nov 7 2020 01:12:03 PM"},
{"Updating":"Nov 17 2020 01:18:03 PM"},
{"Re_run":"Nov 27 2020 05:18:03 PM"}
]
}'
Modify JSON:
SET @detail = JSON_MODIFY (@detail, 'append $.email_Notif', JSON_QUERY(@email_Verfication, '$'))
SET @detail = JSON_MODIFY (@detail, 'append $.email_Notif', JSON_QUERY(@email_Verified, '$'))
SET @detail = JSON_MODIFY (@detail, 'append $.email_Notif', JSON_QUERY(@email_Updating, '$'))
Parse JSON:
SELECT j2.[key], j2.[value]
FROM OPENJSON(@detail, '$.email_Notif') j1
CROSS APPLY OPENJSON(j1.[value], '$') j2
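For readers less familiar with the T-SQL JSON functions, here is roughly what the `append` step and the nested `OPENJSON` calls do, mimicked in Python purely as an illustration. This is not SQL Server code, and the shortened `detail` document and single-entry arrays are assumptions made for brevity:

```python
import json

# Shortened stand-ins for the @detail and @email_* variables.
detail = {"Bank_Name": "BPD SULAWESI SELATAN", "Bank_Status": "Active"}
email_verified = {"Verified": [{"DateTime": "2020-11-03T13:31:02.1563596"}]}
email_updating = {
    "Updating": [
        {"User_Email": "the.spv@gmail.com",
         "Send_DateTime": "2020-11-03T13:30:59.7036152"}
    ]
}

# Comparable to 'append $.email_Notif': add each object to the array,
# creating the array on first use.
for part in (email_verified, email_updating):
    detail.setdefault("email_Notif", []).append(part)

# Comparable to OPENJSON ... CROSS APPLY OPENJSON: one (key, value) pair
# per property of each array element, with the value kept as JSON text.
pairs = [
    (key, json.dumps(value))
    for element in detail["email_Notif"]
    for key, value in element.items()
]
for key, value in pairs:
    print(key, value)
```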

Should sequence nature of input data within a mini-batch be maintained?

Suppose a multivariate time series prediction problem for the following data
import numpy as np
rows = 2000
cols = 6
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
print(data[0:20])
print('\n {} \n'.format(data.shape))
print(data[-20:])
which prints out following data
[[ 0 2000 4000 6000 8000 10000]
[ 1 2001 4001 6001 8001 10001]
[ 2 2002 4002 6002 8002 10002]
[ 3 2003 4003 6003 8003 10003]
[ 4 2004 4004 6004 8004 10004]
[ 5 2005 4005 6005 8005 10005]
[ 6 2006 4006 6006 8006 10006]
[ 7 2007 4007 6007 8007 10007]
[ 8 2008 4008 6008 8008 10008]
[ 9 2009 4009 6009 8009 10009]
[ 10 2010 4010 6010 8010 10010]
[ 11 2011 4011 6011 8011 10011]
[ 12 2012 4012 6012 8012 10012]
[ 13 2013 4013 6013 8013 10013]
[ 14 2014 4014 6014 8014 10014]
[ 15 2015 4015 6015 8015 10015]
[ 16 2016 4016 6016 8016 10016]
[ 17 2017 4017 6017 8017 10017]
[ 18 2018 4018 6018 8018 10018]
[ 19 2019 4019 6019 8019 10019]]
(2000, 6)
[[ 1980 3980 5980 7980 9980 11980]
[ 1981 3981 5981 7981 9981 11981]
[ 1982 3982 5982 7982 9982 11982]
[ 1983 3983 5983 7983 9983 11983]
[ 1984 3984 5984 7984 9984 11984]
[ 1985 3985 5985 7985 9985 11985]
[ 1986 3986 5986 7986 9986 11986]
[ 1987 3987 5987 7987 9987 11987]
[ 1988 3988 5988 7988 9988 11988]
[ 1989 3989 5989 7989 9989 11989]
[ 1990 3990 5990 7990 9990 11990]
[ 1991 3991 5991 7991 9991 11991]
[ 1992 3992 5992 7992 9992 11992]
[ 1993 3993 5993 7993 9993 11993]
[ 1994 3994 5994 7994 9994 11994]
[ 1995 3995 5995 7995 9995 11995]
[ 1996 3996 5996 7996 9996 11996]
[ 1997 3997 5997 7997 9997 11997]
[ 1998 3998 5998 7998 9998 11998]
[ 1999 3999 5999 7999 9999 11999]]
Consider the last column to be the target time series to be predicted. If the batch size is 20, can I randomly skip some points, e.g. 1009, 1012, 1015, from the first batch during training?
If yes, does this mean we can randomly choose points within a time series to be used for training and testing?

converting R dataframes to json object

Say I have the following dataframes:
df1 <- data.frame(Name = c("Harry","George"), color=c("#EA0001", "#EEEEEE"))
Name color
1 Harry #EA0001
2 George #EEEEEE
df.details <- data.frame(Name = c(rep("Harry",each=3), rep("George", each=3)),
age=21:23,
total=c(14,19,24,1,9,4)
)
Name age total
1 Harry 21 14
2 Harry 22 19
3 Harry 23 24
4 George 21 1
5 George 22 9
6 George 23 4
I know how to convert each df to json like this:
library(jsonlite)
toJSON(df.details)
[{"Name":"Harry","age":21,"total":14},{"Name":"Harry","age":22,"total":19},{"Name":"Harry","age":23,"total":24},{"Name":"George","age":21,"total":1},{"Name":"George","age":22,"total":9},{"Name":"George","age":23,"total":4}]
However, I am looking to get the following structure to my JSON data:
{
"myjsondata": [
{
"Name": "Harry",
"color": "#EA0001",
"details": [
{
"age": 21,
"total": 14
},
{
"age": 22,
"total": 19
},
{
"age": 23,
"total": 24
}
]
},
{
"Name": "George",
"color": "#EEEEEE",
"details": [
{
"age": 21,
"total": 1
},
{
"age": 22,
"total": 9
},
{
"age": 23,
"total": 4
}
]
}
]
}
I think the answer may be in how I store the data in a list in R before converting, but not sure.
Try this format:
df1$details <- split(df.details[-1], df.details$Name)[df1$Name]
df1
# Name color details
#1 Harry #EA0001 21, 22, 23, 14, 19, 24
#2 George #EEEEEE 21, 22, 23, 1, 9, 4
toJSON(df1)
#[{
#"Name":"Harry",
#"color":"#EA0001",
#"details":[
# {"age":21,"total":14},
# {"age":22,"total":19},
# {"age":23,"total":24}]},
#{
#"Name":"George",
#"color":"#EEEEEE",
#"details":[
# {"age":21,"total":1},
# {"age":22,"total":9},
# {"age":23,"total":4}]}
#]
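The idea behind the `split()` call is to attach each person's detail rows as a nested list before serializing. The same shape can be sketched in Python (an illustration only, not R; `people` and `details` mirror `df1` and `df.details`, and the wrapping `myjsondata` key is added here to match the requested output):

```python
import json

people = [
    {"Name": "Harry", "color": "#EA0001"},
    {"Name": "George", "color": "#EEEEEE"},
]
details = [
    {"Name": "Harry", "age": 21, "total": 14},
    {"Name": "Harry", "age": 22, "total": 19},
    {"Name": "Harry", "age": 23, "total": 24},
    {"Name": "George", "age": 21, "total": 1},
    {"Name": "George", "age": 22, "total": 9},
    {"Name": "George", "age": 23, "total": 4},
]

# Group detail rows by Name, dropping the Name column from each nested record.
grouped = {}
for row in details:
    nested = {k: v for k, v in row.items() if k != "Name"}
    grouped.setdefault(row["Name"], []).append(nested)

# Attach each group to its person and wrap everything in the top-level key.
payload = {
    "myjsondata": [
        {**person, "details": grouped.get(person["Name"], [])}
        for person in people
    ]
}
print(json.dumps(payload, indent=2))
```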

Constructing request payload in R using rjson/jsonlite

My current code as seen below attempts to construct a request payload (body), but isn't giving me the desired result.
library(df2json)
library(rjson)
y = rjson::fromJSON((df2json::df2json(dataframe)))
globalparam = ""
req = list(
Inputs = list(
input1 = y
)
,GlobalParameters = paste("{",globalparam,"}",sep="")#globalparam
)
body = enc2utf8((rjson::toJSON(req)))
body currently turns out to be
{
"Inputs": {
"input1": [
{
"X": 7,
"Y": 5,
"month": "mar",
"day": "fri",
"FFMC": 86.2,
"DMC": 26.2,
"DC": 94.3,
"ISI": 5.1,
"temp": 8.2,
"RH": 51,
"wind": 6.7,
"rain": 0,
"area": 0
}
]
},
"GlobalParameters": "{}"
}
However, I need it to look like this:
{
"Inputs": {
"input1": [
{
"X": 7,
"Y": 5,
"month": "mar",
"day": "fri",
"FFMC": 86.2,
"DMC": 26.2,
"DC": 94.3,
"ISI": 5.1,
"temp": 8.2,
"RH": 51,
"wind": 6.7,
"rain": 0,
"area": 0
}
]
},
"GlobalParameters": {}
}
So basically global parameters have to be {}, but not hardcoded. It seemed like a fairly simple problem, but I couldn't fix it. Please help!
EDIT:
This is the dataframe
X Y month day FFMC DMC DC ISI temp RH wind rain area
1 7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0
2 7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0
3 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0
4 8 6 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 0
This is an example of another data frame
> a = data.frame("col1" = c(81, 81, 81, 81), "col2" = c(72, 69, 79, 84))
Using this sample data
dd<-read.table(text=" X Y month day FFMC DMC DC ISI temp RH wind rain area
1 7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0", header=T)
You can do
globalparam = setNames(list(), character(0))
req = list(
Inputs = list(
input1 = dd
)
,GlobalParameters = globalparam
)
body = enc2utf8((rjson::toJSON(req)))
Note that globalparam looks a bit funny because we need to force it to a named list for rjson to treat it properly. We only have to do this when it's empty.
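The underlying issue is the difference between serializing a string that happens to contain braces and serializing an empty object. The same distinction shows up in other JSON encoders; for example, in Python's standard `json` module (an analogy only, not rjson):

```python
import json

# A string value is quoted, which is what the original paste() call produced.
as_string = json.dumps({"GlobalParameters": "{}"})
# An empty mapping becomes an empty JSON object.
as_object = json.dumps({"GlobalParameters": {}})
# An empty sequence becomes an empty JSON array.
as_array = json.dumps({"GlobalParameters": []})

print(as_string)
print(as_object)
print(as_array)
```

In R the empty named list plays the role of the empty mapping here.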