Find full paths to all leaf nodes? - json

I'm using QuickGraph to create a graph of (P)roducts and the properties associated in them, (T)ype, (S)ubtype and (F)requency.
So in this example, I have 2 products P1 & P2:
P1 is assigned properties T1, S1 & S2, F1
P2 is assigned properties T1, S1, F1 & F2
The directed, unweighted graph looks like this:
Is there any way to use this generate a JSON object to hold the full paths to all products? Something like:
{T1: [
{S1: [
{F1: [
P1,P2
]},
{F2: [
P2
]}
]},
{S2: [
{F1: [
P1
]}
]}
]}
I initially looked at DepthFirstSearchAlgorithm and its DiscoverVertex event, which will walk the graph by depth but the this event is only triggered when a new vertex is discovered so I get T1,S1,F1,P1,P2,F2,S2.
Any help appreciated.

Related

Table from nested list, struct

I have this json data:
consumption_json = """
{
"count": 48,
"next": null,
"previous": null,
"results": [
{
"consumption": 0.063,
"interval_start": "2018-05-19T00:30:00+0100",
"interval_end": "2018-05-19T01:00:00+0100"
},
{
"consumption": 0.071,
"interval_start": "2018-05-19T00:00:00+0100",
"interval_end": "2018-05-19T00:30:00+0100"
},
{
"consumption": 0.073,
"interval_start": "2018-05-18T23:30:00+0100",
"interval_end": "2018-05-18T00:00:00+0100"
}
]
}
"""
and I would like to covert the results list to an Arrow table.
I have managed this by first converting it to python data structure, using python's json library, and then converting that to an Arrow table.
import json
consumption_python = json.loads(consumption_json)
results = consumption_python['results']
table = pa.Table.from_pylist(results)
print(table)
pyarrow.Table
consumption: double
interval_start: string
interval_end: string
----
consumption: [[0.063,0.071,0.073]]
interval_start: [["2018-05-19T00:30:00+0100","2018-05-19T00:00:00+0100","2018-05-18T23:30:00+0100"]]
interval_end: [["2018-05-19T01:00:00+0100","2018-05-19T00:30:00+0100","2018-05-18T00:00:00+0100"]]
But, for reasons of performance, I'd rather just use pyarrow exclusively for this.
I can use pyarrow's json reader to make a table.
reader = pa.BufferReader(bytes(consumption_json, encoding='ascii'))
table_from_reader = pa.json.read_json(reader)
And 'results' is a struct nested inside a list. (Actually, everything seems to be nested).
print(table_from_reader['results'].type)
list<item: struct<consumption: double, interval_start: timestamp[s], interval_end: timestamp[s]>>
How do I turn this into a table directly?
following this https://stackoverflow.com/a/72880717/3617057
I can get closer...
import pyarrow.compute as pc
flat = pc.list_flatten(table_from_reader["results"])
print(flat)
[
-- is_valid: all not null
-- child 0 type: double
[
0.063,
0.071,
0.073
]
-- child 1 type: timestamp[s]
[
2018-05-18 23:30:00,
2018-05-18 23:00:00,
2018-05-18 22:30:00
]
-- child 2 type: timestamp[s]
[
2018-05-19 00:00:00,
2018-05-18 23:30:00,
2018-05-17 23:00:00
]
]
flat is a ChunkedArray whose underlying arrays are StructArray. To convert it to a table, you need to convert each chunks to a RecordBatch and concatenate them in a table:
pa.Table.from_batches(
[
pa.RecordBatch.from_struct_array(s)
for s in flat.iterchunks()
]
)
If flat is just a StructArray (not a ChunkedArray), you can call:
pa.Table.from_batches(
[
pa.RecordBatch.from_struct_array(flat)
]
)

Analysing and formatting JSON using PostgreSQL

I have a table called api_details where i dump the below JSON value into the JSON column raw_data.
Now i need to make a report from this JSON string and the expected output is something like below,
action_name. sent_timestamp Sent. Delivered
campaign_2475 1600416865.928737 - 1601788183.440805. 7504. 7483
campaign_d_1084_SUN15_ex 1604220248.153903 - 1604222469.087918. 63095. 62961
Below is the sample JSON OUTPUT
{
"header": [
"#0 action_name",
"#1 sent_timestamp",
"#0 Sent",
"#1 Delivered"
],
"name": "campaign - lifetime",
"rows": [
[
"campaign_2475",
"1600416865.928737 - 1601788183.440805",
7504,
7483
],
[
"campaign_d_1084_SUN15_ex",
"1604220248.153903 - 1604222469.087918",
63095,
62961
],
[
"campaign_SUN15",
"1604222469.148829 - 1604411016.029794",
63303,
63211
]
],
"success": true
}
I tried like below, but is not getting the results.I can do it using python by lopping through all the elements in row list.
But is there an easy solution in PostgreSQL(version 11).
SELECT raw_data->'rows'->0
FROM api_details
You can use JSONB_ARRAY_ELEMENTS() function such as
SELECT (j.value)->>0 AS action_name,
(j.value)->>1 AS sent_timestamp,
(j.value)->>2 AS Sent,
(j.value)->>3 AS Delivered
FROM api_details
CROSS JOIN JSONB_ARRAY_ELEMENTS(raw_data->'rows') AS j
Demo
P.S. in this case the data type of raw_data is assumed to be JSONB, otherwise the argument within the function raw_data->'rows' should be replaced with raw_data::JSONB->'rows' in order to perform explicit type casting.

Which R object does return this JSON structure?

After R to JSON conversion, this should be the output:
{
"alpha": [100,120,140,150,160],
"beta": [0.6, 1, 1.5, 2],
"gamma": [
[
0.018429082998491217,
-0.1973461380810494,
0.6373366343601572,
0.1533790888325718,
0.014712015654254968
],
[
0.012075950866910893,
-0.14585424179257,
0.6591589092698342,
0.2571689477155383,
0.010925520086793088
],
[
0.0159193430322232,
-0.146917626129837,
0.4710901890006199,
0.15728143658310957,
0.012566273548505473
],
[
0.017317835334994967,
-0.1549043092753231,
0.4882454969264185,
0.1300951912298256,
0.013437976685378085
]
]
}
This describes a matrix: alpha and beta are arrays which index the matrix depicted below by columns.
rjson::toJSON() function from rjson takes vector or list. However, it doesn't split an R matrix (with named rows and columns) in arrays; instead, it generates an array of row values and then names each column by its column name.
I cannot really figure out which R data structure allows producing such a file format.
Could you show me the R code that uses rjson::toJSON() function and generates that output?

get information from dataset (json format) in r

I created a datatable from mongodb collection. Data in this datatable is in JSON format but I cant get to extract the information from it..
{"place":{"bounding_box":{
"type":"Polygon",
"coordinates":[
[
[
-119.932568,
36.648905
],
[
-119.632419,
36.648905
]
]
]
}}}
I need the first two values of the coordinates: lat = 36.648905 and lon = -119.932568
But cant seems to extract that info:
my_lon <- myBigDF$place.bounding_box.coordinates[1[1[1]]]
I have tried few combination but I'm always getting NULL.
Thank you for any help..
--EDIT-- Including the code on how I'm connecting to db and creating dataframe from it..
mongo <- mongo.create(host="localhost" , db="mydb")
library(plyr)
## create the empty data frame
myDF = data.frame(stringsAsFactors = FALSE)
## create the cursor we will iterate over, basically a select * in SQL
cursor = mongo.find(mongo, namespace)
## create the counter
i = 1
## iterate over the cursor
while (mongo.cursor.next(cursor)) {
# iterate and grab the next record
tmp = mongo.bson.to.list(mongo.cursor.value(cursor))
# make it a dataframe
tmp.df = as.data.frame(t(unlist(tmp)), stringsAsFactors = F)
# bind to the master dataframe
myDF = rbind.fill(myDF, tmp.df)
}
It's hard to tell exactly how you are going from the JSON string to an R object. There are different libraries that parse thing differently. If I assume for a moment use "rjson", then you would have something like
x <- rjson::fromJSON('{"place":{"bounding_box":{ "type":"Polygon", "coordinates":[ [ [ -119.932568, 36.648905 ], [ -119.632419, 36.648905 ] ] ] }}}')
And because your data seems to have an excessive number of square brackets, things are a bit messy. You can get to the coordinates section with
x$place$bounding_box$coordinates
# [1]]
# [[1]][[1]]
# [1] -119.9326 36.6489
#
# [[1]][[2]]
# [1] -119.6324 36.6489
which is a list of lists of vectors. To make a nice matrix of lat/long coordinates you can do
do.call(rbind, x$place$bounding_box$coordinates[[1]])

R list(structure(list())) to data frame

I have a JSON data source providing a list of hashes:
[
{ "a": "foo",
"b": "sdfshk"
},
{ "a": "foo",
"b": "ihlkyhul"
}
]
I use fromJSON() in the rjson package to convert that to an R data structure. It returns:
list(
structure(list(a = "foo", b = "sdfshk"), .Names = c("a", "b")),
structure(list(a = "foo", b = "ihlkyhul"), .Names = c("a", "b"))
)
I need to get this into an R data frame, but data.frame() turns that into a single-row data frame with four columns instead of a 2x2 data frame as expected. I lack the R-fu to do the transform from one to the other, though it looks like it should be straightforward.
Bonus points:
The actual problem is a bit more complex, because the JSON data source isn't as regular as I show above. The objects it returns vary in type. That is, the field set in each can be one of a few different types:
[
{ "a": "foo",
"b": "asdfhalsdhfla"
},
{ "a": "bar",
"c": "akjdhflakjhsdlfkah",
"d": "jfhglskhfglskd",
},
{ "a": "foo",
"b": "dfhlkhldsfg"
}
]
As you can see, the "a" field in each object is a type tag, indicating which other fields the object will have.
I'm not too particular how the solution copes with this.
It wouldn't be horrible if the two object types were just mooshed together, so you get columns a, b, c, and d, and the rows simply have N/A or NULL values where the JSON source object doesn't have a value for a given field. I believe I can clean the resulting data frame with subset(df, a == "foo"). I'll end up with some empty columns that way, but it won't matter to my program.
It would be better if the solution provides a way to select which JSON source rows go into the data frame and which get rejected, so the result has only the columns and rows actually required.
If you have a jagged list you want converted to a data.frame, you could use Hadley's plyr's rbind.fill. Saved my neck on a couple of occasions. Let me know if this is what you're looking for. Notice that I modified your first example to include only "b" in the third element to make it jagged.
> x <- list(
+ structure(list(a = "foo", b = "sdfshk"), .Names = c("a", "b")),
+ structure(list(a = "foo", b = "ihlkyhul"), .Names = c("a", "b")),
+ structure(list(b = "asdf"), .Names = "b")
+ )
>
> library(plyr)
> do.call("rbind.fill", lapply(x, as.data.frame))
a b
1 foo sdfshk
2 foo ihlkyhul
3 <NA> asdf