I'm looking to take a nice tidy data frame and turn it into a deeply nested JSON using R. So far I haven't been able to find any other resources that directly address this task - most seem to be trying to take it in the other direction (un-nesting a JSON).
Here's a small dummy version of the data frame I'm starting with. Imagine a survey was given to two audiences within a company, one for managers and a separate one for employees. The surveys have different sets of questions with different IDs but many questions overlap and I want to compare the responses between the two groups. The end goal is to make a JSON that matches up section IDs, question IDs, and option IDs/text from two surveys in the correct hierarchy. Some questions have subquestions that require a further level of nesting, which is what I’m having difficulty doing.
library(dplyr)
library(tidyr)
library(jsonlite)
dummyDF <- data_frame(sectionId = c(rep(1,9),rep(2,3)),
questionId = c(rep(1,3),rep(2,6),rep(3,3)),
subquestionId = c(rep(NA,3),rep("2a",3),rep("2b",3),rep(NA,3)),
deptManagerQId = c(rep("m1",3),rep("m2",3),rep("m3",3),rep("m4",3)),
deptEmployeeQId = c(rep("e1",3),rep("e3",3),rep("e4",3),rep("e7",3)),
optionId = rep(c(1,2,3),4),
text = rep(c("yes","neutral","no"),4))
And here’s the end result I’m trying to achieve:
theGoal <- fromJSON('{
"sections": [
{
"sectionId": "1",
"questions": [
{
"questionId": "1",
"deptManagerQId": "m1",
"deptEmployeeQId": "e1",
"options": [
{
"optionId": 1,
"text": "yes"
},
{
"optionId": 2,
"text": "neutral"
},
{
"optionId": 3,
"text": "no"
}
]
},
{
"questionId": "2",
"options": [
{
"optionId": 1,
"text": "yes"
},
{
"optionId": 2,
"text": "neutral"
},
{
"optionId": 3,
"text": "no"
}
],
"subquestions": [
{
"subquestionId": "2a",
"deptManagerQId": "m2",
"deptEmployeeQId": "e3"
},
{
"subquestionId": "2b",
"deptManagerQId": "m3",
"deptEmployeeQId": "e4"
}
]
},
{
"questionId": "3",
"deptManagerQId": "m4",
"deptEmployeeQId": "e7",
"options": [
{
"optionId": 1,
"text": "yes"
},
{
"optionId": 2,
"text": "neutral"
},
{
"optionId": 3,
"text": "no"
}
]
}
]
}
]
}')
Here are a few approaches I’ve tried using nest from tidyr that end up either only getting me part of the way there or throwing an error message.
1
list1 <- dummyDF %>% nest(-sectionId, .key=questions) %>%
mutate(questions = lapply(seq_along(.$questions), function(x) nest(.$questions[[x]], optionId, text, .key = options))) %>%
list(sections = .)
2
nested1 <- dummyDF %>% nest(-sectionId, .key=questions) %>%
mutate(questions = lapply(seq_along(.$questions), function(x) nest(.$questions[[x]], optionId, text, .key = options)))
nested2 <- nested1 %>% mutate(questions = lapply(seq_along(.$questions), function(x) nest(.$questions[[x]], subquestionId, .key = subquestions)))
#Gives this error: cannot group column options, of class 'list'
3
list2 <- dummyDF %>% nest(-sectionId, .key=questions) %>%
mutate(questions = lapply(seq_along(.$questions),
function(x) {ifelse(is.na(.$questions[[x]]$subquestionId),
function(x) {.$questions[[x]] %>% select(-subquestionId) %>% nest(optionId, text, .key = options)},
function(x) {.$questions[[x]] %>% nest(subquestion_id, .key = subquestions)})})) %>%
list(sections = .)
#Gives this error: attempt to replicate an object of type 'closure'
Any ideas would be greatly appreciated. I’m open to any approaches. I took the issue to a local R user group meet-up but wasn’t able to come up with any solutions so I’ve got my fingers crossed here. I realize R might not be the best tool to accomplish this but it’s the one I know so I’m giving it a shot. Thanks.
jsonlite::toJSON looks like a good fit for your problem.
The round trip below works seamlessly apart from column types and column order, which I corrected to show that the objects are identical. If you need any other type of JSON structure, I would recommend restructuring the data_frame on the front end first using something like dplyr or tidyr.
library(jsonlite)
library(dplyr)
dummyDF <- data_frame(sectionId = c(rep(1,9),rep(2,3)),
questionId = c(rep(1,3),rep(2,6),rep(3,3)),
subquestionId = c(rep(NA,3),rep("2a",3),rep("2b",3),rep(NA,3)),
deptManagerQId = c(rep("m1",3),rep("m2",3),rep("m3",3),rep("m4",3)),
deptEmployeeQId = c(rep("e1",3),rep("e3",3),rep("e4",3),rep("e7",3)),
optionId = rep(c(1,2,3),4),
text = rep(c("yes","neutral","no"),4))
## Convert to a JSON object
json <- jsonlite::toJSON(dummyDF)
theGoal <- fromJSON(json) %>% tbl_df() %>% select_(.dots=names(dummyDF)) %>%
## Convert integer columns to numeric
mutate_if(function(x) {if (typeof(x)=='integer') {TRUE} else {FALSE}},as.numeric)
## Compare the objects
all.equal(theGoal,dummyDF)
# TRUE
identical(theGoal,dummyDF)
# TRUE
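For comparison, here is a minimal sketch of the "restructure first, then serialise" idea in Python rather than R. It is not the asker's or the answerer's method; the helper names make_rows and nest are made up for illustration, and None stands in for NA. It follows sectionId exactly as given in the data frame, so question 3 lands in section 2.

```python
import json

def make_rows():
    """Rebuild the dummy data as a list of flat dicts (None stands in for NA)."""
    spec = [  # (sectionId, questionId, subquestionId, managerQId, employeeQId)
        (1, 1, None, "m1", "e1"),
        (1, 2, "2a", "m2", "e3"),
        (1, 2, "2b", "m3", "e4"),
        (2, 3, None, "m4", "e7"),
    ]
    options = [(1, "yes"), (2, "neutral"), (3, "no")]
    return [
        {"sectionId": s, "questionId": q, "subquestionId": sub,
         "deptManagerQId": m, "deptEmployeeQId": e, "optionId": o, "text": t}
        for (s, q, sub, m, e) in spec
        for (o, t) in options
    ]

def nest(rows):
    """Group flat rows into sections -> questions -> options / subquestions."""
    sections = {}
    for r in rows:
        section = sections.setdefault(
            r["sectionId"], {"sectionId": str(r["sectionId"]), "questions": {}})
        question = section["questions"].setdefault(
            r["questionId"], {"questionId": str(r["questionId"]), "options": []})
        ids = {"deptManagerQId": r["deptManagerQId"],
               "deptEmployeeQId": r["deptEmployeeQId"]}
        if r["subquestionId"] is None:
            question.update(ids)  # IDs sit directly on the question
        else:
            subs = question.setdefault("subquestions", [])
            sub = {"subquestionId": r["subquestionId"], **ids}
            if sub not in subs:
                subs.append(sub)
        option = {"optionId": r["optionId"], "text": r["text"]}
        if option not in question["options"]:  # options repeat per subquestion
            question["options"].append(option)
    # Replace the lookup dicts with plain lists for serialisation.
    return {"sections": [
        {**s, "questions": list(s["questions"].values())}
        for s in sections.values()
    ]}

nested = nest(make_rows())
print(json.dumps(nested, indent=2))
```

The two-level setdefault grouping does the job of the nested nest() calls: each question is created once, and the subquestion branch is only taken for rows that actually carry a subquestionId.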
How can I retrieve every name and id from a JSON file? This is a short version of my JSON file. I want to retrieve all the names and ids so that I can match them against my variable and then trigger some work on the match. I searched on Google but couldn't find anything; every example dealt with a single JSON object.
[
{
"id": 707860,
"name": "Hurzuf",
"country": "UA",
"coord": {
"lon": 34.283333,
"lat": 44.549999
}
},
{
"id": 519188,
"name": "Novinki",
"country": "RU",
"coord": {
"lon": 37.666668,
"lat": 55.683334
}
},
{
"id": 1283378,
"name": "Gorkhā",
"country": "NP",
"coord": {
"lon": 84.633331,
"lat": 28
}
}
]
Here's my code:
import json
with open('city.list.json') as f:
    data = json.load(f)
for p_id in data:
    hay = p_id.get('name')
Suppose I have the word delhi and I am comparing it with each name in the dictionaries above. When it matches, I want to retrieve its id:
    if hay == 'delhi':
        ga = # retrieve delhi's id
You need to check the name and apply a condition:
for p_id in data:
    u_id = p_id.get('id')
    u_name = p_id.get('name')
    if u_id == 1283378 and u_name == "Gorkhā":
        pass  # do something with the match
I'm not sure exactly what output you want, but this collects each id and name into a new list.
ids = []
for p_id in data:
    ids.append((p_id['id'], p_id['name']))
print(ids)
Output:
[(707860, 'Hurzuf'), (519188, 'Novinki'), (1283378, 'Gorkhā')]
I would suggest a different approach, process the JSON data into a dict and get the information you want from that. For example:
import json
with open('city.list.json') as f:
    data = json.load(f)
name_by_id = dict([(str(p['id']), p['name']) for p in data])
id_by_name = dict([(p['name'], p['id']) for p in data])
And the results:
>>> print(id_by_name['Hurzuf'])
707860
>>> print(name_by_id['519188'])
Novinki
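Building on the dict approach, here is a small sketch of a lookup that ignores case and returns None for a name that is not in the list (such as the "delhi" case from the question). The function name find_id is made up for illustration, and the sample is inlined so the snippet runs without city.list.json.

```python
import json

# Inline copy of the sample so the sketch runs without city.list.json.
sample = '''[
  {"id": 707860, "name": "Hurzuf", "country": "UA"},
  {"id": 519188, "name": "Novinki", "country": "RU"},
  {"id": 1283378, "name": "Gorkhā", "country": "NP"}
]'''
data = json.loads(sample)

# Key on the case-folded name so "novinki" and "Novinki" match.
id_by_name = {city["name"].casefold(): city["id"] for city in data}

def find_id(name):
    """Return the id for a city name, or None if it is not in the list."""
    return id_by_name.get(name.casefold())

print(find_id("novinki"))  # 519188
print(find_id("Delhi"))    # None, since the sample has no such city
```

One caveat: if two cities share a name, the later entry overwrites the earlier one in the dict. Mapping each name to a list of ids would avoid that.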
import json
with open('city.list.json') as f:
    data = json.load(f)
list1 = [p_id.get('id') for p_id in data if p_id.get('name') == "Novinki"]
# you can put this in print statement,
# but since goal is to save and not just print,
# you can store in a variable
print(*list1, sep="\n")
gives
519188
I have a .txt file with this structure
section1#[{"p": "0.999834", "tag": "MA"},{"p": "1", "tag": "MO"},...etc...}]
section1#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"},...etc...}]
...
section2#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"},...etc...}]
I am trying to read it by using R with the commands
library(jsonlite)
data <- fromJSON("myfile.txt")
But I get this
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
lexical error: invalid char in json text.
section2#[{"p": "0.99
(right here) ------^
How can I read it even by splitting by sections?
Remove the prefix and bind the flattened JSON arrays together into a data frame:
raw_dat <- readLines(textConnection('section1#[{"p": "0.999834", "tag": "MA"},{"p": "1", "tag": "MO"}]
section1#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]
section2#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]'))
library(stringi)
library(purrr)
library(jsonlite)
stri_replace_first_regex(raw_dat, "^section[[:digit:]]+#", "") %>%
map_df(fromJSON)
## p tag
## 1 0.999834 MA
## 2 1 MO
## 3 0.9995 NC
## 4 1 FL
## 5 0.9995 NC
## 6 1 FL
Remove the section prefix (everything up to and including the #) from each line. Each line is then a JSON array, so parsing the lines one by one gives you a 2D array with a JSON object at each index.
You can then access elements as foo[0][0] for the first object on the first line, up to foo[m][n], where m is the number of lines minus 1 and n is the number of objects on that line minus 1.
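The 2D-array idea can be sketched in Python as follows. This is only an illustration of the indexing described above, not the R solution; the names foo and labels are placeholders, and the three lines are the trimmed sample from the earlier answer.

```python
import json

lines = [
    'section1#[{"p": "0.999834", "tag": "MA"},{"p": "1", "tag": "MO"}]',
    'section1#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]',
    'section2#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]',
]

foo = []     # foo[m][n]: n-th object on the m-th line
labels = []  # section label of each line, kept alongside
for line in lines:
    label, _, payload = line.partition("#")  # split at the first '#'
    labels.append(label)
    foo.append(json.loads(payload))

# p is stored as a string in the file, so convert it when a number is needed.
print(labels[2], foo[0][0]["tag"], float(foo[2][1]["p"]))
```

Keeping the labels in a parallel list preserves the section each line came from, which the prefix-stripping approach otherwise throws away.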
I have data that shows a series of actions (column Action) performed by several users (column Id). The order of the data frame is important: it is the order in which the actions were performed. For each Id, the first action performed is start. Consecutive identical actions are possible (for example, the sequence start -> D -> D -> D is valid). This is some code to generate the data:
set.seed(10)
i <- 0
all_id <- NULL
all_vals <- NULL
while (i < 5) {
i <- i + 1
print(i)
size <- sample(3:5, size = 1)
tmp_id <- rep(i, times = size + 1)
tmp_vals <- c("start",sample(LETTERS, size = size) )
all_id <- c(all_id, tmp_id)
all_vals <- c(all_vals, tmp_vals)
}
df <- data.frame(Id = all_id,
Action = all_vals)
Goal: transform this data into a JSON nested on multiple levels that will be used in a D3.js visualization. I would also like a counter for how many times each child appears under its parent (and maybe even a percentage of the parent's total appearances), but I hope I can do that myself.
Expected output below - this is generic, not from the data I generated above, and real data will have quite a lot of nested values ( count and percentage are optional at this point in time):
{
"action": "start",
"parent": "null",
"count": "10",
"percentage": "100",
"children": [
{
"action": "H",
"parent": "start",
"count": "6",
"percentage": "60",
"children": [
{
"action": "D",
"parent": "H",
"count": "5",
"percentage": "83.3"
},
{
"action": "B",
"parent": "H",
"count": "3",
"percentage": "50"
}
]
},
{
"action": "R",
"parent": "start",
"count": "4",
"percentage": "40"
}
]
}
I know I am supposed to post something I've tried, but I really don't have anything remotely worthy of being shown.
I have just started writing some R -> d3.js converters in https://github.com/timelyportfolio/d3r that should work well in these types of situations. I will work up an example later today with your data.
The internal hierarchy builder in https://github.com/timelyportfolio/sunburstR also might work well here.
I'll add to the answer as I explore both of these paths.
example 1
set.seed(10)
i <- 0
all_id <- NULL
all_vals <- NULL
while (i < 5) {
i <- i + 1
print(i)
size <- sample(3:5, size = 1)
tmp_id <- rep(i, times = size + 1)
tmp_vals <- c("start",sample(LETTERS, size = size) )
all_id <- c(all_id, tmp_id)
all_vals <- c(all_vals, tmp_vals)
}
df <- data.frame(Id = all_id,
Action = all_vals)
# not sure I completely understand what this is
# supposed to become but here is a first try
# find position of start
start_pos <- which(df$Action=="start")
# get the sequences
# surely there is a better way but do this for now
sequences <- paste(
start_pos+1,
c(start_pos[-1],nrow(df))-1,
sep=":"
)
paths <- lapply(
sequences,
function(x){
data.frame(
t(as.character(df[eval(parse(text=x)),]$Action)),
stringsAsFactors=FALSE
)
}
)
paths_df <- dplyr::bind_rows(paths)
# use d3r
# devtools::install_github("timelyportfolio/d3r")
library(d3r)
d3_nest(paths_df) # if want list, then json=FALSE
# visualize with listviewer
# devtools::install_github("timelyportfolio/listviewer")
listviewer::jsonedit(d3_nest(paths_df))
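To make the count and percentage part concrete, here is a rough sketch in Python. It is not what d3_nest does internally. The sequences are hypothetical (one list of actions per Id, each starting with "start"), and the helper names new_node, build_tree and finalise are made up for illustration.

```python
import json

# Hypothetical action sequences, one list per Id, each beginning with "start".
sequences = [
    ["start", "H", "D"],
    ["start", "H", "B"],
    ["start", "H", "D"],
    ["start", "R"],
]

def new_node(action, parent):
    return {"action": action, "parent": parent, "count": 0, "children": {}}

def build_tree(sequences):
    """Walk each sequence from the root, counting visits to every node."""
    root = new_node("start", None)
    for seq in sequences:
        node = root
        node["count"] += 1
        for action in seq[1:]:
            child = node["children"].setdefault(
                action, new_node(action, node["action"]))
            child["count"] += 1
            node = child
    return root

def finalise(node, parent_count=None):
    """Turn child dicts into lists and add a percentage of the parent's count."""
    out = {"action": node["action"], "parent": node["parent"],
           "count": node["count"]}
    if parent_count is None:
        out["percentage"] = 100.0
    else:
        out["percentage"] = round(100 * node["count"] / parent_count, 1)
    kids = [finalise(c, node["count"]) for c in node["children"].values()]
    if kids:
        out["children"] = kids
    return out

tree = finalise(build_tree(sequences))
print(json.dumps(tree, indent=2))
```

Children are keyed by action while the tree is being built so that repeated paths increment an existing node, and are only converted to the list form that D3 hierarchies expect at the end.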
I am using Python, and I need to iterate through JSON objects and retrieve nested values. A snippet of my data follows:
"bills": [
{
"url": "http:\/\/maplight.org\/us-congress\/bill\/110-hr-195\/233677",
"jurisdiction": "us",
"session": "110",
"prefix": "H",
"number": "195",
"measure": "H.R. 195 (110\u003csup\u003eth\u003c\/sup\u003e)",
"topic": "Seniors' Health Care Freedom Act of 2007",
"last_update": "2011-08-29T20:47:44Z",
"organizations": [
{
"organization_id": "22973",
"name": "National Health Federation",
"disposition": "support",
"citation": "The National Health Federation (n.d.). \u003ca href=\"http:\/\/www.thenhf.com\/government_affairs_federal.html\"\u003e\u003ccite\u003e Federal Legislation on Consumer Health\u003c\/cite\u003e\u003c\/a\u003e. Retrieved August 6, 2008, from The National Health Federation.",
"catcode": "J3000"
},
{
"organization_id": "27059",
"name": "A Christian Perspective on Health Issues",
"disposition": "support",
"citation": "A Christian Perspective on Health Issues (n.d.). \u003ca href=\"http:\/\/www.acpohi.ws\/page1.html\"\u003e\u003ccite\u003ePart E - Conclusion\u003c\/cite\u003e\u003c\/a\u003e. Retrieved August 6, 2008, from .",
"catcode": "X7000"
},
{
"organization_id": "27351",
"name": "Natural Health Roundtable",
"disposition": "support",
"citation": "Natural Health Roundtable (n.d.). \u003ca href=\"http:\/\/naturalhealthroundtable.com\/reform_agenda\"\u003e\u003ccite\u003eNatural Health Roundtable SUPPORTS the following bills\u003c\/cite\u003e\u003c\/a\u003e. Retrieved August 6, 2008, from Natural Health Roundtable.",
"catcode": "J3000"
}
]
},
I need to go through each object in "bills" and get "session", "prefix", etc., and I also need to go through each entry in "organizations" and get "name", "disposition", etc. I have the following code:
import csv
import json
path = 'E:/Thesis/thesis_get_data'
with open(path + "/" + 'maplightdata110congress.json', "r") as f:
    data = json.load(f)
a = data['bills']
b = data['bills'][0]["prefix"]
c = data['bills'][0]["number"]
h = data['bills'][0]['organizations'][0]
e = data['bills'][0]['organizations'][0]['name']
f = data['bills'][0]['organizations'][0]['catcode']
g = data['bills'][0]['organizations'][0]['catcode']
for i in a:
    for index in e:
        print('name')
and it returns the string 'name' a bunch of times.
Suggestions?
This might help you.
def func1(data):
    for key, value in data.items():
        print(str(key) + '->' + str(value))
        if type(value) == type(dict()):
            func1(value)
        elif type(value) == type(list()):
            for val in value:
                if type(val) == type(str()):
                    pass
                elif type(val) == type(list()):
                    pass
                else:
                    func1(val)
func1(data)
All you have to do is pass the JSON object to the function as a dictionary.
There is also a Python library that might help you with this: JsonJ.
I found the solution on another forum and wanted to share with everyone here in case this comes up again for someone.
import csv
import json
path = 'E:/Thesis/thesis_get_data'
with open(path + "/" + 'maplightdata110congress.json', "r") as f:
    data = json.load(f)
for bill in data['bills']:
    for organization in bill['organizations']:
        print(organization.get('name'))
Refining @Joish's answer:
def func1(data):
    for key, value in data.items():
        print(str(key) + '->' + str(value))
        if isinstance(value, dict):
            func1(value)
        elif isinstance(value, list):
            for val in value:
                if isinstance(val, str):
                    pass
                elif isinstance(val, list):
                    pass
                else:
                    func1(val)
func1(data)
Same as implemented here
This question's data is nested two levels deep, so two for loops make sense.
Here's an extract from Pluralsight using their GraphQL API, with an example that goes three levels deep to get either Progress, User or Course info:
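Since the question imports csv, here is a sketch of how the two loops could feed a CSV with one row per (bill, organization) pair. The data dict is a trimmed stand-in for the parsed file, the choice of columns is an assumption, and the output goes to an in-memory buffer rather than a real file.

```python
import csv
import io

# Trimmed stand-in for the parsed file; field names follow the snippet in the question.
data = {"bills": [{
    "session": "110", "prefix": "H", "number": "195",
    "organizations": [
        {"name": "National Health Federation", "disposition": "support", "catcode": "J3000"},
        {"name": "Natural Health Roundtable", "disposition": "support", "catcode": "J3000"},
    ],
}]}

bill_fields = ["session", "prefix", "number"]
org_fields = ["name", "disposition", "catcode"]

rows = []
for bill in data["bills"]:
    for org in bill.get("organizations", []):  # a bill may have no organizations
        row = {k: bill.get(k) for k in bill_fields}
        row.update({k: org.get(k) for k in org_fields})
        rows.append(row)

out = io.StringIO()  # swap for open("bills.csv", "w", newline="") to write a file
writer = csv.DictWriter(out, fieldnames=bill_fields + org_fields)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Repeating the bill-level fields on every organization row keeps the output flat, which is usually what a spreadsheet or statistics package wants.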
{
"data": {
"courseProgress": {
"nodes": [
{
"user": {
"id": "1",
"email": "a@a.com",
"startedOn": "2019-07-26T05:00:50.523Z"
},
"course": {
"id": "22",
"title": "Building Machine Learning Models in Python with scikit-learn"
},
"percentComplete": 34.0248,
"lastViewedClipOn": "2019-07-26T05:26:54.404Z"
}
]
}
}
}
The code to parse this JSON:
for item in items["data"]["courseProgress"]["nodes"]:
    print(item["user"].get('email'))
    print(item["course"].get('title'))
    print(item.get('percentComplete'))
    print(item.get('lastViewedClipOn'))
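When a response goes several levels deep, chained indexing raises as soon as one level is missing. A small helper can make that safer. The function dig below is made up for illustration and is not part of any library, and the items dict is a cut-down copy of the extract.

```python
def dig(obj, *keys, default=None):
    """Follow a path of dict keys / list indexes; return default if any step is missing."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj

# Cut-down copy of the extract, kept inline so the sketch is self-contained.
items = {"data": {"courseProgress": {"nodes": [
    {"user": {"id": "1"},
     "course": {"id": "22",
                "title": "Building Machine Learning Models in Python with scikit-learn"},
     "percentComplete": 34.0248}
]}}}

for item in dig(items, "data", "courseProgress", "nodes", default=[]):
    print(dig(item, "course", "title"), dig(item, "percentComplete"))

# A missing level yields the default instead of raising KeyError.
print(dig(items, "data", "missing", "nodes", default=[]))
```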
I'm having trouble with a very nasty nested JSON.
The format is like this
{
"matches": [
{
"matchId": 1,
"region": "BR",
"participants": [
{
"participantId": 0,
"teamId": 200,
"stats": {
"winner": true,
"champLevel": 16,
"item0": 3128,
}
{
"matchId": 2,
"region": "BR",
"participants": [
{
"participantId": 0,
"teamId": 201,
"stats": {
"winner": false,
"champLevel": 18,
"item0": 3128,
"item1": 3157,
"item2": 3158,
}
As you can see, the second match has more items than the first, but in the data frame both rows get the same columns:
MatchId region ... stats.winner stats.champLevel stats.item0 stats.item1 stats.item2
1 BR TRUE 16 3128 1 BR
1 BR TRUE 16 3128 3157 3158
The first record is shorter than the second, so R recycles its values to fill the remaining columns.
If you want the full data you can grab it at:
http://pastebin.com/HQDf2ase
How I parsed the json to data.frame:
json.matchData <- fromJSON(file="file.json")
Unlist the elements of the Json and convert it to a data frame
matchData.i <- lapply(json.matchData$matches, function(x){ unlist(x)})
Transform into Data Frame
matchData <- do.call("rbind", matchData.i)
matchData <- as.data.frame(matchData)
But the dataframe is messed up, because some fields should be NA but they are filled with wrong values.
I think the plyr rbind.fill() function would be helpful here. How about this:
library(plyr)
matchData <- rbind.fill(lapply(matchData.i,
function(x) do.call("data.frame", as.list(x))
))
The lapply() call turns the intermediate lists into data frames, which rbind.fill() requires.
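The underlying idea (take the union of all column names and fill the gaps with a missing value instead of recycling) can be sketched in Python like this. It only illustrates the technique and is not how rbind.fill() is implemented; the flatten helper is made up, and the matches are simplified to drop the participants level.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into one dict with dotted keys."""
    flat = {}
    items = obj.items() if isinstance(obj, dict) else enumerate(obj)
    for key, value in items:
        name = f"{prefix}{key}"
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# Simplified matches: the second one has two extra item fields.
matches = [
    {"matchId": 1, "region": "BR",
     "stats": {"winner": True, "champLevel": 16, "item0": 3128}},
    {"matchId": 2, "region": "BR",
     "stats": {"winner": False, "champLevel": 18, "item0": 3128,
               "item1": 3157, "item2": 3158}},
]

flat_rows = [flatten(m) for m in matches]

# Union of all column names, in first-seen order.
columns = list(dict.fromkeys(k for row in flat_rows for k in row))

# Missing fields become None instead of being recycled from other columns.
table = [[row.get(col) for col in columns] for row in flat_rows]

print(columns)
for line in table:
    print(line)
```

The first row ends with None for stats.item1 and stats.item2, which corresponds to the NA values the question expects.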