R to hierarchical JSON using jsonlite?

My end game is to create a tree visualization from a hierarchical JSON file using D3js.
The hierarchy I need to represent is the diagram below, where A has children B, C, and D; B has children E, F, and G; C has children H and I; and D has no children. The nodes will have multiple key:value pairs. I've only listed 3 for simplicity.
name:A, type:colors, id:001
|-- name:B, type:blue, id:002
|   |-- name:E, type:dkBlue, id:005
|   |-- name:F, type:medBlue, id:006
|   |-- name:G, type:ltBlue, id:007
|-- name:C, type:red, id:003
|   |-- name:H, type:dkRed, id:008
|   |-- name:I, type:medRed, id:009
|-- name:D, type:green, id:004
My source data in R looks like:
nodes <-read.table(header = TRUE, text = "
ID name type
001 A colors
002 B blue
003 C red
004 D green
005 E dkBlue
006 F medBlue
007 G ltBlue
008 H dkRed
009 I medRed
")
links <- read.table(header = TRUE, text = "
startID relation endID
001 hasSubCat 002
001 hasSubCat 003
001 hasSubCat 004
002 hasSubCat 005
002 hasSubCat 006
002 hasSubCat 007
003 hasSubCat 008
003 hasSubCat 009
")
I must convert it to the following JSON:
{"name": "A",
 "type": "colors",
 "id": "001",
 "children": [
   {"name": "B",
    "type": "blue",
    "id": "002",
    "children": [
      {"name": "E",
       "type": "dkBlue",
       "id": "005"},
      {"name": "F",
       "type": "medBlue",
       "id": "006"},
      {"name": "G",
       "type": "ltBlue",
       "id": "007"}
    ]},
   {"name": "C",
    "type": "red",
    "id": "003",
    "children": [
      {"name": "H",
       "type": "dkRed",
       "id": "008"},
      {"name": "I",
       "type": "medRed",
       "id": "009"}
    ]},
   {"name": "D",
    "type": "green",
    "id": "004"}
 ]}
I would appreciate any help you may be able to offer!
[UPDATE 2017-04-18]
Based on Ian's references I looked into R's data.tree. I can recreate my hierarchy if I restructure my data as shown below. Note that I've lost the type of relation (hasSubCat) between each node, the value of which can vary for each link/edge in real life. I am willing to let that go (for now) if I can get a workable hierarchy. The revised data for data.tree:
library(data.tree)
df <- read.table(header = TRUE, text = "
paths type id
A colors 001
A/B blue 002
A/B/E dkBlue 005
A/B/F medBlue 006
A/B/G ltBlue 007
A/C red 003
A/C/H dkRed 008
A/C/I medRed 009
A/D green 004
")
myPaths <- as.Node(df, pathName = "paths")
myPaths$leafCount / (myPaths$totalCount - myPaths$leafCount)
print(myPaths, "type", "id", limit = 25)
The print displays the hierarchy I sketched out in the original post and even contains the key:values for each node. Nice!
levelName type id
1 A colors 1
2 ¦--B blue 2
3 ¦ ¦--E dkBlue 5
4 ¦ ¦--F medBlue 6
5 ¦ °--G ltBlue 7
6 ¦--C red 3
7 ¦ ¦--H dkRed 8
8 ¦ °--I medRed 9
9 °--D green 4
Once again I am at a loss for how to translate this from the tree to nested JSON. The example here https://ipub.com/data-tree-to-networkd3/ , like most examples, assumes key:value pairs only on leaf nodes, not branch nodes. I think the answer is in creating a nested list to feed into RJSONIO or jsonlite, and I have no idea how to do that.

data.tree is very helpful and probably the better way to accomplish your objective. For fun, I will submit a more roundabout way to achieve your nested JSON using igraph and d3r.
nodes <-read.table(header = TRUE, text = "
ID name type
001 A colors
002 B blue
003 C red
004 D green
005 E dkBlue
006 F medBlue
007 G ltBlue
008 H dkRed
009 I medRed
")
links <- read.table(header = TRUE, text = "
startID relation endID
001 hasSubCat 002
001 hasSubCat 003
001 hasSubCat 004
002 hasSubCat 005
002 hasSubCat 006
002 hasSubCat 007
003 hasSubCat 008
003 hasSubCat 009
")
library(d3r)
library(dplyr)
library(igraph)
# make it an igraph
gf <- graph_from_data_frame(links[,c(1,3,2)],vertices = nodes)
# if we know that this is a tree with root as "A"
# we can do something like this
df_tree <- dplyr::bind_rows(
lapply(
all_shortest_paths(gf,from="A")$res,
function(x){data.frame(t(names(unclass(x))), stringsAsFactors=FALSE)}
)
)
# we can discard the first column
df_tree <- df_tree[,-1]
# then make df_tree[1,1] as 1 (A)
df_tree[1,1] <- "A"
# now add node attributes to our data.frame
df_tree <- df_tree %>%
# let's get the last non-NA in each row so we can join with nodes
mutate(
last_non_na = apply(df_tree, MARGIN=1, function(x){tail(na.exclude(x),1)})
) %>%
# now join with nodes
left_join(
nodes,
by = c("last_non_na" = "name")
) %>%
# now remove last_non_na column
select(-last_non_na)
# use d3r to nest as we would like
nested <- df_tree %>%
d3_nest(value_cols = c("ID", "type"))

Consider walking down the levels iteratively converting dataframe columns to a multi-nested list:
library(jsonlite)
...
df2list <- function(i) as.vector(nodes[nodes$name == i,])
# GRANDPARENT LEVEL
jsonlist <- as.list(nodes[nodes$name=='A',])
# PARENT LEVEL
jsonlist$children <- lapply(c('B','C','D'), function(i) as.list(nodes[nodes$name == i,]))
# CHILDREN LEVEL
jsonlist$children[[1]]$children <- lapply(c('E','F','G'), df2list)
jsonlist$children[[2]]$children <- lapply(c('H','I'), df2list)
toJSON(jsonlist, pretty=TRUE)
However, with this approach you will notice that every one-element value is enclosed in brackets. An R atomic vector cannot hold nested structures, so the whole object must be a list, and toJSON writes each length-one vector inside it as a one-element JSON array.
Hence, consider cleaning up the extra brackets with nested gsub calls, which still renders valid JSON:
output <- toJSON(jsonlist, pretty=TRUE)
gsub('"\\]\n', '"\n', gsub('"\\],\n', '",\n', gsub('": \\["', '": "', output)))
Final Output
{
"ID": "001",
"name": "A",
"type": "colors",
"children": [
{
"ID": "002",
"name": "B",
"type": "blue",
"children": [
{
"ID": "005",
"name": "E",
"type": "dkBlue"
},
{
"ID": "006",
"name": "F",
"type": "medBlue"
},
{
"ID": "007",
"name": "G",
"type": "ltBlue"
}
]
},
{
"ID": "003",
"name": "C",
"type": "red",
"children": [
{
"ID": "008",
"name": "H",
"type": "dkRed"
},
{
"ID": "009",
"name": "I",
"type": "medRed"
}
]
},
{
"ID": "004",
"name": "D",
"type": "green"
}
]
}
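As an alternative to the gsub cleanup, jsonlite's toJSON has an auto_unbox argument that writes length-one vectors as scalars instead of one-element arrays. Below is a minimal sketch on a cut-down list (only nodes A and B, with the hypothetical name `small`, chosen here for brevity), not the full jsonlist from above:

```r
library(jsonlite)

# Cut-down stand-in for jsonlist: root A with a single child B.
small <- list(
  ID = "001",
  name = "A",
  children = list(list(ID = "002", name = "B"))
)

boxed   <- toJSON(small)                     # length-one values become arrays
unboxed <- toJSON(small, auto_unbox = TRUE)  # length-one values become scalars

cat(boxed, "\n")
cat(unboxed, "\n")
```

The unnamed list under `children` is still written as a JSON array, which is what a D3 hierarchy expects, so only the scalar fields lose their brackets.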

A nice, if a bit difficult to wrap one's head around, way of doing this is with a self-referential (recursive) function, as in the following:
nodes <- read.table(header = TRUE, colClasses = "character", text = "
ID name type
001 A colors
002 B blue
003 C red
004 D green
005 E dkBlue
006 F medBlue
007 G ltBlue
008 H dkRed
009 I medRed
")
links <- read.table(header = TRUE, colClasses = "character", text = "
startID relation endID
001 hasSubCat 002
001 hasSubCat 003
001 hasSubCat 004
002 hasSubCat 005
002 hasSubCat 006
002 hasSubCat 007
003 hasSubCat 008
003 hasSubCat 009
")
convert_hier <- function(linksDf, nodesDf, sourceId = "startID",
targetId = "endID", nodesID = "ID") {
makelist <- function(nodeid) {
child_ids <- linksDf[[targetId]][which(linksDf[[sourceId]] == nodeid)]
if (length(child_ids) == 0)
return(as.list(nodesDf[nodesDf[[nodesID]] == nodeid, ]))
c(as.list(nodesDf[nodesDf[[nodesID]] == nodeid, ]),
children = list(lapply(child_ids, makelist)))
}
ids <- unique(c(linksDf[[sourceId]], linksDf[[targetId]]))
rootid <- ids[! ids %in% linksDf[[targetId]]]
jsonlite::toJSON(makelist(rootid), pretty = TRUE, auto_unbox = TRUE)
}
convert_hier(links, nodes)
A few notes:
I added colClasses = "character" to your read.table commands so that the ID numbers are not coerced to integers with no leading zeros and so that the strings are not converted into factors.
I wrapped everything in the convert_hier function to make it easier to adapt to other scenarios, but the real magic is in the makelist function.


Powershell - Convert an ordered nested dictionary to html

I need to report the number of VM snapshots based on their age. For this, I constructed an ordered dictionary that contain another dictionary like this (json output):
{
"Less than 24h": {
"Prod": 15,
"Other": 11
},
"1 day": {
"Prod": 29,
"Other": 12
},
"2 days": {
"Prod": 11,
"Other": 0
},
"3 days and more": {
"Prod": 0,
"Other": 0
}
}
I need to convert it to html to be included in a mail.
I found how to convert a "simple" dictionary:
$Body1 = $dict.GetEnumerator() | Select-Object Key,Value | ConvertTo-Html -Fragment | Out-String
$Body1 = $Body1.Replace('Key','Days').Replace('Value','Number of Snapshot')
That works fine, but not if the values are nested dictionaries.
For a nested dictionary the output looks like this:
| Days | Number of Snapshot |
|-----------------|------------------------------------------------|
| Less than 24h |`System.Collections.Specialized.OrderedDictionary`|
| 1 day |`System.Collections.Specialized.OrderedDictionary`|
| 2 days |`System.Collections.Specialized.OrderedDictionary`|
| 3 days and more |`System.Collections.Specialized.OrderedDictionary`|
Is there a way to have a html output like this?
| Days | Prod | Other |
|-----------------|------|-------|
| Less than 24h | 15 | 11 |
| 1 day | 29 | 12 |
| 2 days | 11 | 0 |
| 3 days and more | 0 | 0 |
One option is to pre-process your dictionary and convert it into something else that can be turned into html a bit easier:
$data = $dict.GetEnumerator() | foreach-object {
new-object pscustomobject -property ([ordered] @{
"Days" = $_.Key
"Prod" = $_.Value.Prod
"Other" = $_.Value.Other
})
}
This will flatten your nested structure into this json-equivalent:
[
{
"Days": "2 days",
"Prod": 11,
"Other": 0
},
{
"Days": "1 day",
"Prod": 29,
"Other": 12
},
{
"Days": "Less than 24h",
"Prod": 15,
"Other": 11
},
{
"Days": "3 days and more",
"Prod": 0,
"Other": 0
}
]
And then you can just convert the whole thing in one go without -Fragment:
$Body1 = $data | ConvertTo-Html
which gives:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>HTML TABLE</title>
</head><body>
<table>
<colgroup><col/><col/><col/></colgroup>
<tr><th>Days</th><th>Prod</th><th>Other</th></tr>
<tr><td>2 days</td><td>11</td><td>0</td></tr>
<tr><td>1 day</td><td>29</td><td>12</td></tr>
<tr><td>Less than 24h</td><td>15</td><td>11</td></tr>
<tr><td>3 days and more</td><td>0</td><td>0</td></tr>
</table>
</body></html>
or
| Days            | Prod | Other |
|-----------------|------|-------|
| 2 days          | 11   | 0     |
| 1 day           | 29   | 12    |
| Less than 24h   | 15   | 11    |
| 3 days and more | 0    | 0     |

Problem with retrieving a table from Chinook dataset due to character

I have created a database in MySQL with data from the Chinook dataset, which has fictitious information on customers that buy music.
One of the tables ("Invoice"), has the billing addresses, which has characters in diverse languages:
InvoiceId CustomerId InvoiceDate BillingAddress
1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34
2 4 2009-01-02 00:00:00 Ullevålsveien 14
3 8 2009-01-03 00:00:00 Grétrystraat 63
4 14 2009-01-06 00:00:00 8210 111 ST NW
I tried to retrieve the data using R, with the following code:
library(DBI)
library(RMySQL)
library(dplyr)
library(magrittr)
library(lubridate)
library(stringi)
# Step 1 - Connect to the database ----------------------------------------
con <- DBI::dbConnect(MySQL(),
dbname = Sys.getenv("DB_CHINOOK"),
host = Sys.getenv("HST_CHINOOK"),
user = Sys.getenv("USR_CHINOOK"),
password = Sys.getenv("PASS_CHINOOK"),
port = XXXX)
invoices_tbl <- tbl(con, "Invoice") %>%
collect()
The connection is ok, but when trying to visualize the data, I can't see the special characters:
> head(invoices_tbl[,1:4])
# A tibble: 6 x 4
InvoiceId CustomerId InvoiceDate BillingAddress
<int> <int> <chr> <chr>
1 1 2 2009-01-01 00:00:00 "Theodor-Heuss-Stra\xdfe 34"
2 2 4 2009-01-02 00:00:00 "Ullev\xe5lsveien 14"
3 3 8 2009-01-03 00:00:00 "Gr\xe9trystraat 63"
4 4 14 2009-01-06 00:00:00 "8210 111 ST NW"
5 5 23 2009-01-11 00:00:00 "69 Salem Street"
6 6 37 2009-01-19 00:00:00 "Berger Stra\xdfe 10"
My question is, should I change something in the configuration inside MySQL? Or is it an issue with R? How can I see the special characters? What is the meaning of \xdfe?
Please, any help will be greatly appreciated.
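Sequences such as \xdf are single bytes printed in hexadecimal. The column arrives in Latin-1 encoding, where byte 0xDF is ß, so "Stra\xdfe" is "Straße" (the trailing e is simply the next letter), but R has not been told which encoding the strings use. They can be converted with iconv: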
invoices_tbl$BillingAddress <- iconv(invoices_tbl$BillingAddress,
"latin1", "utf-8")
-output
invoices_tbl
InvoiceId CustomerId InvoiceDate BillingAddress
1 1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34
2 2 4 2009-01-02 00:00:00 Ullevålsveien 14
3 3 8 2009-01-03 00:00:00 Grétrystraat 63
4 4 14 2009-01-06 00:00:00 8210 111 ST NW
5 5 23 2009-01-11 00:00:00 69 Salem Street
6 6 37 2009-01-19 00:00:00 Berger Straße 10
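To see what iconv is doing without needing the database, the following base-R sketch builds the Latin-1 bytes of "Straße" by hand and converts them. The variable names are illustrative only, and it assumes the source bytes really are Latin-1, as in the output above:

```r
# Bytes for "Stra", then 0xDF (ß in Latin-1), then "e".
latin1_bytes <- as.raw(c(0x53, 0x74, 0x72, 0x61, 0xdf, 0x65))
x <- rawToChar(latin1_bytes)

# A lone 0xDF byte is not a valid UTF-8 sequence, which is why
# R falls back to printing it as an escape.
valid_before <- validUTF8(x)

# Re-encode from Latin-1 to UTF-8, as in the answer above.
y <- iconv(x, from = "latin1", to = "UTF-8")
valid_after <- validUTF8(y)

# Code points of the converted string; 223 is U+00DF (ß).
code_points <- utf8ToInt(y)
print(code_points)
```

If the conversion is needed for every text column, the same iconv call can be applied column by column after collect().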
data
invoices_tbl <- structure(list(InvoiceId = 1:6, CustomerId = c(2L, 4L, 8L, 14L,
23L, 37L), InvoiceDate = c("2009-01-01 00:00:00", "2009-01-02 00:00:00",
"2009-01-03 00:00:00", "2009-01-06 00:00:00", "2009-01-11 00:00:00",
"2009-01-19 00:00:00"), BillingAddress = c("Theodor-Heuss-Stra\xdfe 34",
"Ullev\xe5lsveien 14", "Gr\xe9trystraat 63", "8210 111 ST NW",
"69 Salem Street", "Berger Stra\xdfe 10")), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")

How can I summarize multiple string values by a column value in R?

OK, so this question isn't as simple as the title may sound. I've got a table that's structured like this:
| Brand | First Name | Last Name | Amount | e-mail |
|-------|------------|-----------|---------|---------------------|
| A | John | Smith | 920 USD | johnsmith@email.com |
| A | Mary | Smith | 650 USD | johnsmith@email.com |
| A | Margaret | Smith | 400 USD | johnsmith@email.com |
| B | Eric | Davis | 120 USD | jdavis@email.com |
| B | Wanda | Davis | 500 USD | jdavis@email.com |
| B | Jean | Davis | 300 USD | jdavis@email.com |
| A | Daniel | Barnes | 400 USD | dbarnes@email.com |
What I'm ultimately trying to do is generate emails to be sent to inform customers of their credit balance, and in the above example, I'd like to send one email to johnsmith@email.com that says something like "You have credits with Brand A. John Smith has 920 USD, Mary Smith has 650 USD, Margaret Smith has 400 USD."
I don't need to get all the way there with this question, but what I would like to do is have one row for each e-mail which somehow includes the information for each row with that email. Maybe some kind of generated concatenated field? It seems simple in theory, but in practice I'm having a tough time coming up with how exactly to do this in R. Any help would be much appreciated!
Bonus: I'm also fairly experienced with MySQL, so if there's a better way to do it in SQL, that'd be great!
Edit: Dput output (with names and emails edited)
structure(list(BRAND = c("R", "C", "C", "C", "C", "R", "R", "C",
"C", "C"), GUEST_S_LAST_NAME = c("Stockman", "Ericson", "Ericson",
"Alcin", "Andrews", "Smith", "Smith", "Brown", "Brown", "Brown"
), GUEST_S_FIRST_NAME = c("Margaret", "Abraham", "Naomi", "Dina",
"Arthur", "Laura", "Alan", "Gregory", "Marina", "Viktoria"),
COMPENSATIONAMOUNT_OR_PERCENT = c("920 USD", "1363 USD",
"1363 USD", "452 USD", "452 USD", "250 USD", "250 USD", "1019 USD",
"1019 USD", "323 USD"), EXPIRATION_DATE = c("04/30/2022 12:00:00 00 am",
"12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am",
"12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am",
"04/30/2022 12:00:00 00 am", "04/30/2022 12:00:00 00 am",
"12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am",
"12/31/2021 12:00:00 00 am"), EMAIL = c("email1@email.com",
"email2@email.com", "email2@email.com", "email3@email.com",
"email3@email.com", "email4@email.com", "email4@email.com",
"email5@email.com", "email5@email.com", "email5@email.com"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
Here is my approach with dplyr:
library(dplyr)
your_data %>%
group_by(BRAND, EMAIL) %>%
summarize(text = paste0(
sprintf("You have credits with Brand %s. ", BRAND),
paste(sprintf("%s %s has %s",
GUEST_S_FIRST_NAME,
GUEST_S_LAST_NAME,
COMPENSATIONAMOUNT_OR_PERCENT),
collapse = ", "), "."))
Returns:
# A tibble: 10 x 3
# Groups: BRAND, EMAIL [5]
BRAND EMAIL text
<chr> <chr> <chr>
1 C email2@email… You have credits with Brand C. Abraham Ericson has 1363 …
2 C email2@email… You have credits with Brand C. Abraham Ericson has 1363 …
3 C email3@email… You have credits with Brand C. Dina Alcin has 452 USD, A…
4 C email3@email… You have credits with Brand C. Dina Alcin has 452 USD, A…
5 C email5@email… You have credits with Brand C. Gregory Brown has 1019 US…
6 C email5@email… You have credits with Brand C. Gregory Brown has 1019 US…
7 C email5@email… You have credits with Brand C. Gregory Brown has 1019 US…
8 R email1@email… You have credits with Brand R. Margaret Stockman has 920…
9 R email4@email… You have credits with Brand R. Laura Smith has 250 USD, …
10 R email4@email… You have credits with Brand R. Laura Smith has 250 USD, …
# Data used:
your_data <- structure(list(BRAND = c("R", "C", "C", "C", "C", "R", "R", "C", "C", "C"), GUEST_S_LAST_NAME = c("Stockman", "Ericson", "Ericson", "Alcin", "Andrews", "Smith", "Smith", "Brown", "Brown", "Brown"), GUEST_S_FIRST_NAME = c("Margaret", "Abraham", "Naomi", "Dina", "Arthur", "Laura", "Alan", "Gregory", "Marina", "Viktoria"), COMPENSATIONAMOUNT_OR_PERCENT = c("920 USD", "1363 USD", "1363 USD", "452 USD", "452 USD", "250 USD", "250 USD", "1019 USD", "1019 USD", "323 USD"), EXPIRATION_DATE = c("04/30/2022 12:00:00 00 am", "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am", "04/30/2022 12:00:00 00 am", "04/30/2022 12:00:00 00 am", "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am"), EMAIL = c("email1@email.com", "email2@email.com", "email2@email.com", "email3@email.com", "email3@email.com", "email4@email.com", "email4@email.com", "email5@email.com", "email5@email.com", "email5@email.com")), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
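Note that the result printed above still has one row per input row: sprintf(..., BRAND) yields one string for every row in a group, so the summary is repeated rather than collapsed. If one row per brand and e-mail is wanted, a base-R sketch along these lines may help. It uses a three-row stand-in (the hypothetical `credits` data frame) with the same column names as the dput, and builds the sentence only after aggregating:

```r
# Small stand-in for the real data, same column names as the dput output.
credits <- data.frame(
  BRAND = c("R", "C", "C"),
  GUEST_S_FIRST_NAME = c("Margaret", "Abraham", "Naomi"),
  GUEST_S_LAST_NAME = c("Stockman", "Ericson", "Ericson"),
  COMPENSATIONAMOUNT_OR_PERCENT = c("920 USD", "1363 USD", "1363 USD"),
  EMAIL = c("email1@email.com", "email2@email.com", "email2@email.com"),
  stringsAsFactors = FALSE
)

# One fragment per guest, e.g. "Naomi Ericson has 1363 USD".
credits$line <- sprintf("%s %s has %s",
                        credits$GUEST_S_FIRST_NAME,
                        credits$GUEST_S_LAST_NAME,
                        credits$COMPENSATIONAMOUNT_OR_PERCENT)

# Collapse the fragments to a single string per BRAND/EMAIL pair.
summary_df <- aggregate(line ~ BRAND + EMAIL, data = credits,
                        FUN = paste, collapse = ", ")

# Build the sentence once per group, after aggregation.
summary_df$text <- sprintf("You have credits with Brand %s. %s.",
                           summary_df$BRAND, summary_df$line)
print(summary_df[, c("BRAND", "EMAIL", "text")])
```

In the dplyr version, the equivalent change would be to reduce BRAND to a single value per group (for example with first(BRAND)) before pasting.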

Convert JSON to data.frame with more than 2 columns

I am trying to properly convert a JSON to a data.frame with 3 columns.
This is a simplification of my data
# simplification of my real data
my_data <- '{"Bag 1": [["bananas", 1], ["oranges", 2]],"Bag 2": [["bananas", 3], ["oranges", 4], ["apples", 5]]}'
library(jsonlite)
my_data <- fromJSON(my_data)
> my_data
$`Bag 1`
[,1] [,2]
[1,] "bananas" "1"
[2,] "oranges" "2"
$`Bag 2`
[,1] [,2]
[1,] "bananas" "3"
[2,] "oranges" "4"
[3,] "apples" "5"
I try to convert that to a data.frame
# this returns an error about "arguments imply differing number of rows: 2, 3"
my_data <- as.data.frame(my_data)
> my_data <- as.data.frame(my_data)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 2, 3
This is my solution to create the data.frame
# my solution
my_data <- data.frame(fruit = do.call(c, my_data),
bag_number = rep(1:length(my_data),
sapply(my_data, length)))
# how it looks
my_data
> my_data
fruit bag_number
Bag 11 bananas 1
Bag 12 oranges 1
Bag 13 1 1
Bag 14 2 1
Bag 21 bananas 2
Bag 22 oranges 2
Bag 23 apples 2
Bag 24 3 2
Bag 25 4 2
Bag 26 5 2
But my idea is to obtain something like this to avoid problems like doing my_data[a:b,1] when I want to use ggplot2 and others.
fruit | quantity | bag_number
oranges | 2 | 1
bananas | 1 | 1
oranges | 4 | 2
bananas | 3 | 2
apples | 5 | 2
library(plyr)
# import data; my_data must still be the original JSON string here
# (note that the rjson package parses this differently than the jsonlite package)
data.import <- jsonlite::fromJSON(my_data)
# combine all data using plyr
df <- ldply(data.import, rbind)
# clean up column names
colnames(df) <- c('bag_number', 'fruit', 'quantity')
bag_number fruit quantity
1 Bag 1 bananas 1
2 Bag 1 oranges 2
3 Bag 2 bananas 3
4 Bag 2 oranges 4
5 Bag 2 apples 5
purrr / tidyverse version. You also get proper column types with this and get rid of the "Bag " prefix:
library(jsonlite)
library(purrr)
library(readr)
fromJSON(my_data, flatten=TRUE) %>%
map_df(~as.data.frame(., stringsAsFactors=FALSE), .id="bag") %>%
type_convert() %>%
setNames(c("bag_number", "fruit", "quantity")) -> df
df$bag_number <- gsub("Bag ", "", df$bag_number)
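For comparison, the same reshaping can be sketched in base R alone. The list below is typed in by hand to mirror what fromJSON returned in the question (one character matrix per bag), so `bags` and the helper code are illustrative rather than part of either answer:

```r
# Hand-written copy of the fromJSON result: one character matrix per bag.
bags <- list(
  "Bag 1" = matrix(c("bananas", "oranges", "1", "2"), ncol = 2),
  "Bag 2" = matrix(c("bananas", "oranges", "apples", "3", "4", "5"), ncol = 2)
)

# Turn each matrix into a small data frame tagged with its bag number,
# then stack the pieces with rbind.
df <- do.call(rbind, lapply(names(bags), function(bag) {
  data.frame(
    fruit = bags[[bag]][, 1],
    quantity = as.integer(bags[[bag]][, 2]),
    bag_number = as.integer(sub("Bag ", "", bag, fixed = TRUE)),
    stringsAsFactors = FALSE
  )
}))
print(df)
```

Because each matrix is converted separately, bags with different numbers of rows no longer trigger the "differing number of rows" error.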

search in json values from mysql tables

I have some data like :
id name ccode json
1 john 231 {"age": 12,"score": 90}
2 danny 231 {"age": 22,"score": 87}
3 danniel 231 {"age": 18,"score": 48}
4 sara 431 {"age": 16,"score": 67}
Now I want to get all fields of all users whose age is between 15 and 24 and whose ccode is 231.
The result must be something like:
2 danny 231 {"age": 22,"score": 87}
3 danniel 231 {"age": 18,"score": 48}
You can use the following query:
select id,name,ccode,json, CAST(SUBSTRING(SUBSTRING_INDEX(json, ',', 1) FROM 8) AS UNSIGNED) as val
from events
where ccode=231 having val>15 and val<24;