I'm messing around with tidyjson (latest from GitHub, published by Jeremy Stanley). I want to automate searching for and extracting the nested arrays. The following examples produce the output I want:
'{"name": {"first": "bob", "last": "jones"}, "age": 32}' %>%
enter_object("name") %>%
gather_keys %>%
append_values_string
'{"name": {"first": "bob", "last": "jones"}, "age": 32}' %>%
enter_object(name) %>%
gather_keys %>%
append_values_string
These both give the same output:
# A tbl_json: 2 x 3 tibble with a "JSON" attribute
`attr(., "JSON")` document.id key string
<chr> <int> <chr> <chr>
1 "bob" 1 first bob
2 "jones" 1 last jones
However, if I declare a character variable beforehand and pass it in, it fails:
object_name <- "name"
'{"name": {"first": "bob", "last": "jones"}, "age": 32}' %>%
enter_object(list(name="name")) %>%
gather_keys %>%
append_values_string
Error: Path components must be single names or character strings
Any ideas why this would happen?
If you are familiar with Hadley's book Advanced R, this is a case of non-standard evaluation (NSE), and unfortunately pure tidyjson does not currently offer a workaround (I would prefer an enter_object_ that uses standard evaluation, like the underscore verbs in dplyr). I hope that functionality becomes available at some point because, as you suggest, it would be nice to vectorize and automate these sorts of programs.
Non-standard evaluation is the "magic" that lets you pass in the unquoted name and still get the right result in your second example (instead of the program looking for an object called name). The catch is that it does not resolve variables such as object_name in your case.
That said, it seems you can work around this with do.call and a list of arguments (I fixed your example, as I think it went a bit awry):
library(tidyjson)
json <- "{\"name\": {\"first\": \"bob\", \"last\": \"jones\"}, \"age\": 32}"
object_name <- "name"
do.call(enter_object, args = list(json, object_name)) %>% gather_object %>%
append_values_string
#> # A tbl_json: 2 x 3 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id name string
#> <chr> <int> <chr> <chr>
#> 1 "\"bob\"" 1 first bob
#> 2 "\"jones\"" 1 last jones
I definitely recommend checking out the new features in the development version of tidyjson, installable with devtools::install_github('jeremystan/tidyjson'), although it unfortunately does not yet support standard evaluation in paths either.
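To see why object_name is not resolved, here is a base-R sketch of how an NSE function captures its arguments. capture_path() is a hypothetical stand-in written for illustration, not tidyjson's actual source; it only mimics the behaviour described above, including the error message from the question.

```r
# Hypothetical NSE function (NOT tidyjson's code): it captures the
# unevaluated expressions passed via ..., much like an NSE path argument.
capture_path <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # drop the `list` head
  vapply(exprs, function(e) {
    if (is.name(e)) {
      as.character(e)          # bare name: use the symbol's spelling
    } else if (is.character(e)) {
      e                        # string literal: use as-is
    } else {
      stop("Path components must be single names or character strings")
    }
  }, character(1))
}

object_name <- "name"

capture_path(name)           # bare name works
capture_path("name")         # string literal works
capture_path(object_name)    # gives "object_name": the symbol, not its value

# do.call() evaluates its argument list first, so the function receives
# the string "name" instead of the symbol object_name.
do.call(capture_path, list(object_name))

# Passing a call such as list(name = "name") hits the error branch.
tryCatch(capture_path(list(name = "name")), error = conditionMessage)
```

This is why the do.call workaround helps: by the time the NSE function sees its arguments, they are already plain character strings.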
I want a way to convert the result of a pipeline into a table so it can be rendered as an HTML table in R Markdown.
Sample data:
Category <- sample(1:6, 394400, replace = TRUE)
Category <- factor(Category,
levels = c(1,2,3,4,5,6),
labels = c("First",
"Second",
"Third",
"Fourth",
"Fifth",
"Sixth"))
data <- data.frame(Category)
Then I build a frequency table using a dplyr pipeline:
library(dplyr)
Table <- data %>%
group_by(Category) %>%
summarise(N = n(), Percent = n()/NROW(data)*100) %>%
mutate(C.Percent = cumsum(Percent))
Which gives me this nice little summary table here:
# A tibble: 6 × 4
Category N Percent C.Percent
<fctr> <int> <dbl> <dbl>
1 First 65853 16.69701 16.69701
2 Second 66208 16.78702 33.48403
3 Third 65730 16.66582 50.14985
4 Fourth 65480 16.60243 66.75228
5 Fifth 65674 16.65162 83.40390
6 Sixth 65455 16.59610 100.00000
However, if I try to convert that to a table and then to HTML, I get an error saying it cannot coerce Table to a table. The same happens with data frames.
Does anyone know a way to do this? I'd like to be able to customise the appearance of the output.
There are several packages for that. Here are some:
knitr::kable(Table)
htmlTable::htmlTable(Table)
ztable::ztable(as.data.frame(Table))
DT::datatable(Table)
stargazer::stargazer(Table, type = "html")
Each of these has different customization options.
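As a minimal sketch of the first option, the block below builds an equivalent frequency table in base R (so it runs without dplyr, on a smaller illustrative sample) and hands the plain data frame to knitr::kable() with format = "html". The sample size, seed, and digits = 2 are assumptions for the example; kable() is only called if knitr is installed.

```r
# Illustrative data only: a smaller sample than in the question.
set.seed(1)
Category <- factor(sample(1:6, 1000, replace = TRUE),
                   levels = 1:6,
                   labels = c("First", "Second", "Third",
                              "Fourth", "Fifth", "Sixth"))

# Base-R equivalent of the group_by/summarise/mutate pipeline.
counts <- table(Category)
freq <- data.frame(
  Category = names(counts),
  N        = as.integer(counts),
  Percent  = as.numeric(prop.table(counts)) * 100,
  stringsAsFactors = FALSE
)
freq$C.Percent <- cumsum(freq$Percent)

# kable() accepts an ordinary data frame (a tibble works too) and can emit
# an HTML table directly; digits controls rounding of numeric columns.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(freq, format = "html", digits = 2))
}
```

In an R Markdown chunk you would normally just call knitr::kable(Table) and let knitr pick the output format.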
I want to extract data from the JSON object in R.
R packages used: tidyjson, magrittr, jsonlite
trial <- '[{ "KEYS": {"USER_ID": "1266", "MOBILE_NO": "9000000000"}}]'
trial %>%
gather_array %>% # stack as an array
spread_values(USER_ID = jstring("KEYS.USER_ID"),
MOBILE_NO = jstring("KEYs.MOBILE_NO") )
The output of this code is not what I need. Any suggestions?
document.id array.index USER_ID MOBILE_NO
1 1 1 <NA> <NA>
Expected output:
document.id array.index USER_ID MOBILE_NO
1 1 1266 9000000000
tidyjson takes paths as multiple arguments (one per level) rather than as the "dot-separated" paths you attempted. You can tackle this in two ways:
Recommended, as it does not throw away the rest of the object:
trial <- '[{ "KEYS": {"USER_ID": "1266", "MOBILE_NO": "9000000000"}}]'
trial %>%
gather_array %>% # stack as an array
spread_values(USER_ID = jstring('KEYS','USER_ID'),
MOBILE_NO = jstring('KEYS','MOBILE_NO'))
You can also use enter_object if preferred or necessary:
trial <- '[{ "KEYS": {"USER_ID": "1266", "MOBILE_NO": "9000000000"}}]'
trial %>%
gather_array %>% # stack as an array
enter_object('KEYS') %>%
spread_values(USER_ID = jstring('USER_ID'),
MOBILE_NO = jstring('MOBILE_NO'))
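To see what a multi-argument path means, here is a base-R sketch that walks a nested list (a hand-built stand-in for the parsed JSON) one key per argument. walk_path() is a hypothetical helper for illustration, not part of tidyjson; it returns NULL where tidyjson would give NA.

```r
# Hypothetical helper (NOT a tidyjson function): follow a path through a
# nested list, one component per argument, returning NULL if any step is missing.
walk_path <- function(x, ...) {
  Reduce(function(node, key) if (is.null(node)) NULL else node[[key]],
         list(...), x)
}

# Hand-built equivalent of one element of the trial array.
doc <- list(KEYS = list(USER_ID = "1266", MOBILE_NO = "9000000000"))

walk_path(doc, "KEYS", "USER_ID")     # one component per level: found
walk_path(doc, "KEYS.USER_ID")        # NULL: no key literally named "KEYS.USER_ID"
walk_path(doc, "KEYs", "MOBILE_NO")   # NULL: keys are case-sensitive

# A dot-separated path has to be split into components first.
do.call(walk_path, c(list(doc), strsplit("KEYS.MOBILE_NO", ".", fixed = TRUE)[[1]]))
```

Note that the original attempt also had a case typo ("KEYs"), which would fail even with the multi-argument form.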
I have a list of variable-length vectors. The first value of each vector is the key, and the remaining values are the array entries. It looks something like this:
[[1]]
[1] "Bob" "Apple"
[[2]]
[1] "Cindy" "Apple" "Banana" "Orange" "Pear" "Raspberry"
[[3]]
[1] "Mary" "Orange" "Strawberry"
[[4]]
[1] "George" "Banana"
I've extracted the keys and entries as follows:
keys <- lapply(x, '[', 1)
entries <- lapply(x, '[', -1)
Now that I have these, I don't know how to associate them as key:value pairs in R without first creating a matrix, which seems silly since my data don't fit in a rectangle anyway (every example I've seen uses a matrix's column names as the keys).
This is my crappy method: build a matrix, assign column names, and then use jsonlite to export to JSON.
library(jsonlite)
#Create a matrix from entries, without recycling
#I found this function on StackOverflow which seems to work...
cbind.fill <- function(...){
nm <- list(...)
nm <- lapply(nm, as.matrix)
n <- max(sapply(nm, nrow))
do.call(cbind, lapply(nm, function (x)
rbind(x, matrix(, n-nrow(x), ncol(x)))))
}
#Call said function
matrix <- cbind.fill(entries)
#Transpose the thing
matrix <- t(matrix)
#Set column names
colnames(matrix) <- keys
#Export to json
json<-toJSON(matrix)
The result is good, but the implementation sucks. Result:
[{"Bob":["Apple"],"Cindy":["Apple","Banana","Orange","Pear","Raspberry"],"Mary":["Orange","Strawberry"],"George":["Banana"]}]
Please let me know of better ways that might exist to accomplish this.
How about:
names(entries) <- unlist(keys)
toJSON(entries)
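A self-contained sketch of the same idea, rebuilding the example list by hand: vapply() returns the keys as a plain character vector (so no unlist() is needed), and x[-1] keeps working when a vector holds only a key. The jsonlite call is only made if that package is installed.

```r
# The example data from the question, rebuilt by hand.
x <- list(c("Bob", "Apple"),
          c("Cindy", "Apple", "Banana", "Orange", "Pear", "Raspberry"),
          c("Mary", "Orange", "Strawberry"),
          c("George", "Banana"))

# First element of each vector becomes the name, the rest becomes the value.
keys   <- vapply(x, `[`, character(1), 1)
nested <- setNames(lapply(x, `[`, -1), keys)

str(nested)

# A named list is serialised as a JSON object, one key per name.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  print(jsonlite::toJSON(nested))
}
```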
Consider the following lapply() approach:
library(jsonlite)
entries <- list(c('Bob', 'Apple'),
c('Cindy', 'Apple', 'Banana', 'Orange','Pear','Raspberry'),
c('Mary', 'Orange', 'Strawberry'),
c('George', 'Banana'))
# ITERATE ALL CONTENTS EXCEPT FIRST
nestlist <- lapply(entries,
function(i) {
# i[-1] drops the key; unlike i[2:length(i)], it is also safe for length-1 vectors
return(i[-1])
})
# NAME EACH ELEMENT WITH FIRST ELEMENT
names(nestlist) <- sapply(entries, function(i) i[1])
#$Bob
#[1] "Apple"
#$Cindy
#[1] "Apple" "Banana" "Orange" "Pear" "Raspberry"
#$Mary
#[1] "Orange" "Strawberry"
#$George
#[1] "Banana"
x <- toJSON(list(nestlist), pretty=TRUE)
x
#[
# {
# "Bob": ["Apple"],
# "Cindy": ["Apple", "Banana", "Orange", "Pear", "Raspberry"],
# "Mary": ["Orange", "Strawberry"],
# "George": ["Banana"]
# }
#]
I think this has already been sufficiently answered, but here is a method using purrr and jsonlite.
library(purrr)
library(jsonlite)
sample_data <- list(
list("Bob","Apple"),
list("Cindy","Apple","Banana","Orange","Pear","Raspberry"),
list("Mary","Orange","Strawberry"),
list("George","Banana")
)
sample_data %>%
map(~set_names(list(.x[-1]),.x[1])) %>%
toJSON(auto_unbox=TRUE, pretty=TRUE)
I am trying to convert the following multi-document JSON file into a data.frame.
x = '[
{"name": "Bob","groupIds": ["kwt6x61", "yiahf43"]},
{"name": "Sally","groupIds": "yiahf43"}
]'
I'm almost there using:
y = x %>% gather_array() %>%
spread_values(
name = jstring("name"),
groupIds = jstring("groupIds")
)
print(y)
Which returns:
document.id array.index name groupIds
1 1 1 Bob list("kwt6x61", "yiahf43")
2 1 2 Sally yiahf43
Can someone help me spread the groupIds into additional rows?
This is an interesting problem. The issue stems from the fact that an array of length 1 is stored as a plain string. Otherwise, enter_object('groupIds') %>% gather_array %>% append_values_string would work nicely, but tidyjson does not seem to handle this mixed case well. The JSON itself is still valid (JSON does not require a key to hold the same type in every object), but it is awkward to parse, since groupIds is a string in one case and an array in the other.
In any case, although this is not an ideal solution, you can use json_types() to expose the difference and then treat each type conditionally. I converted the result to a tbl_df (i.e. dropped the JSON component) for further processing once parsing is done.
library(tidyjson)
library(dplyr)
library(tidyr)
x = '[
{"name": "Bob","groupIds": ["kwt6x61", "yiahf43"]},
{"name": "Sally","groupIds": "yiahf43"}
]'
## Show the different types
z <- x %>% gather_array() %>% spread_values(
name=jstring('name')
) %>% enter_object('groupIds') %>% json_types()
## Conditionally treat each
final <- bind_rows(
z[z$type=='array',] %>% gather_array('id') %>% append_values_string('groupId')
, z[z$type=='string',] %>% append_values_string('groupId') %>% mutate(id=1)
) %>% tbl_df
## Spread them out, maybe? Depends on what you're looking for
final %>% spread('id','groupId')
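For comparison, here is a base-R sketch of the same unnesting outside tidyjson. It assumes the JSON has already been parsed into nested lists; the docs object is hand-built to mirror what jsonlite::fromJSON(x, simplifyVector = FALSE) should return (arrays become lists, a lone string stays a string). unlist() flattens both shapes to a character vector, so the string/array distinction disappears before the rows are expanded.

```r
# Hand-built stand-in for the parsed JSON (assumed shape, see lead-in).
docs <- list(
  list(name = "Bob",   groupIds = list("kwt6x61", "yiahf43")),
  list(name = "Sally", groupIds = "yiahf43")
)

# Both a list of strings and a single string become a character vector.
ids <- lapply(docs, function(d) unlist(d$groupIds))
n   <- lengths(ids)   # number of groupIds per document

# Repeat the document-level values once per groupId: one row per id.
long <- data.frame(
  document.id = rep(seq_along(docs), n),
  name        = rep(vapply(docs, function(d) d$name, character(1)), n),
  groupId     = unlist(ids),
  stringsAsFactors = FALSE
)
long
```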
I'm using the tidyjson package to parse a JSON string and extract the key values into columns. The JSON is nested, and while I can drill down into a node, I can't figure out a way to go back up to the previous level. The code is below:
library(tidyjson)
library(data.table)
library(dplyr)
input <- '{
"name": "Bob",
"age": 30,
"social": {
"married": "yes",
"kids": "no"
},
"work": {
"title": "engineer",
"salary": 5000
}
}'
output <- input %>% as.tbl_json() %>%
spread_values(name = jstring("name"),
age = jnumber("age")) %>%
enter_object("social") %>%
spread_values(married = jstring("married"),
kids = jstring("kids")) %>%
#### I would need an exit_object() here
enter_object("work") %>%
spread_values(title = jstring("title"),
salary = jnumber("salary"))
There's a note in the documentation:
"Note that there are often situations where there are multiple arrays
or objects of differing types that exist at the same level of the JSON
hierarchy. In this case, you need to use enter_object() to enter each
of them in separate pipelines to create separate data.frames that can
then be joined relationally."
As such I've been staging my tidyjson commands and putting the outputs together with merge, e.g.:
# first the high-level values
output_table <- input_tbl_json %>%
spread_values(val1 = jstring('val1'),
val2 = jnumber('val2'))
# then enter an object and get something from inside, merging it as a new column
output_table <- merge(output_table,
input_tbl_json %>%
enter_object('thing') %>%
spread_values(val3 = jstring('thing1')),
by = c('document.id'))
The output table's columns should then be: document.id | val1 | val2 | val3
That workflow may fall over with operations like gather_keys() that add rows, but I haven't had occasion to test it.
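The concern about row-adding operations can be illustrated with plain data frames. In this sketch the tables are hand-made and illustrative (the second document and all values are invented), standing in for the outputs of separate tidyjson pipelines: a one-row-per-document table joins cleanly onto a gathered table, but joining two gathered tables on document.id alone pairs every row with every row.

```r
# Document-level values: one row per document (illustrative data).
top <- data.frame(document.id = 1:2, name = c("Bob", "Ann"),
                  stringsAsFactors = FALSE)

# Output of a row-adding step, e.g. gathering the keys of "social":
# two rows per document.
social <- data.frame(document.id = c(1, 1, 2, 2),
                     key   = c("married", "kids", "married", "kids"),
                     value = c("yes", "no", "no", "yes"),
                     stringsAsFactors = FALSE)

# A second gathered table, e.g. the keys of "work".
work <- data.frame(document.id = c(1, 1, 2, 2),
                   key   = c("title", "salary", "title", "salary"),
                   value = c("engineer", "5000", "analyst", "4000"),
                   stringsAsFactors = FALSE)

# Fine: name is simply repeated on each gathered row (4 rows).
joined <- merge(top, social, by = "document.id")

# Fan-out: 2 x 2 = 4 rows per document, 8 rows in total.
# Overlapping key/value columns get .x/.y suffixes.
fanout <- merge(social, work, by = "document.id")

joined
nrow(fanout)
```

So the staged-merge approach is safe as long as at most one side of each join has more than one row per document.id.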
I think an overlooked piece of functionality within tidyjson is the ability to use more complex paths in the jnumber, jstring, etc. functions.
You can do something like the following without "entering an object". For the most part I find this a very satisfying solution, perhaps more so than multiple enter/exit steps.
input <- '{
"name": "Bob",
"age": 30,
"social": {
"married": "yes",
"kids": "no"
},
"work": {
"title": "engineer",
"salary": 5000
}
}'
output <- input %>% as.tbl_json() %>%
spread_values(
name = jstring('name')
, age=jnumber('age')
, married=jstring('social','married')
, kids = jstring('social','kids')
, title= jstring('work','title')
, salary = jnumber('work','salary')
)