What is the difference between #> and ->> operator in PostgreSQL? - json

We can access any JSON element in PostgreSQL 9.3 using the -> and ->> operators. Seems to me that the #> along with #>> only provide a shorter form of writing the JSON path. Or is there a bigger picture behind the #> operator? Does it serve a special purpose/provide any advantage over the arrow notation? Which one is the preffered method of writing paths?
It all comes to the question: why should I use the #> and #>> operator over the -> and ->>?
The docs are a bit enigmatic about this.
Both queries below give the same result:
=> select '{"a":[1,2,3],"b":[4,5,6]}'::json#>'{a,2}';
?column?
----------
3
=> select '{"a":[1,2,3],"b":[4,5,6]}'::json->'a'->>2;
?column?
----------
3

Consider nesting.
{
"a" : {
"b" : {
"c" : 1,
"d" : 2
}
}
}
Imagine that you have a json document, and you don't know in advance how it will be nested. If you knew you needed a three level path, you could write:
SELECT '{
"a" : {
"b" : {
"c" : 1,
"d" : 2
}
}
}'::json -> 'a' -> 'b' -> 'c';
but what if you want to write a query that doesn't know that in advance? That's where the path-based operators are useful; the path can be supplied along with the document, and there's no longer any assumption about the document structure in the query.
SELECT '{
"a" : {
"b" : {
"c" : 1,
"d" : 2
}
}
}'::json #>> ARRAY['a','b','c']

Related

Postgresql JsonPath: how to identify an array? [duplicate]

I'm struggling with JSONB_PATH_EXISTS Postgres function
I'm using PG 12 and following this documentation : https://www.postgresql.org/docs/12/functions-json.html
With the following request (test it on DBFiddle: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=d5aa984182852438c6f71cf5fa70324e) :
select
json
from (
select '{
"fields": {
"foo": true,
"number": 3,
"listnb": [3, 4],
"listenb2": ["3", "4"],
"txt": "hello how are you",
"listtxt": ["hello","how","are", "you", "3"],
"nullval": null
}
}'::jsonb as json
) t
where 1=1
-- Works with 'strict'
AND JSONB_PATH_EXISTS(json -> 'fields' -> 'listtxt', 'strict $ ? (#.type() == "array")')
-- Doesn't work without 'strict'. Why ?
--AND JSONB_PATH_EXISTS(json -> 'fields' -> 'listtxt', '$ ? (#.type() == "array")')
-- Can't add a nested condition on an array element value (syntax error)
--AND JSONB_PATH_EXISTS(json -> 'fields' -> 'listtxt', 'strict $ ? (#.type() == "array" && #[*] ? (# == "how"))')
;
#1 - I can't get type() function work without strict mode
It could be related to the lax mode unwrapping arrays automatically, but the documentation explicitly states that it is not done when type() function is called :
The lax mode facilitates matching of a JSON document structure and path expression if the JSON data does not conform to the expected schema. [...] Automatic unwrapping is not performed only when:
The path expression contains type() or size() methods that return the type and the number of elements in the array, respectively.
[...]
So I don't understand why we have a difference in the result
#2 I can't get the nested condition work (3rd AND in the sample request)
According to the examples in the documentation, the syntax looks OK but I have a syntax error that I don't understand.
Thank you for your help
If you pass the complete JSON value to the function, then the following works:
where jsonb_path_exists(json, '$ ? (#.fields.listtxt.type() == "array")')
However I would probably simply use jsonb_typeof() without a path query
where jsonb_typeof(json -> 'fields' -> 'listtxt') = 'array'

Postgres : can't make JSONB_PATH_EXISTS work correctly

I'm struggling with JSONB_PATH_EXISTS Postgres function
I'm using PG 12 and following this documentation : https://www.postgresql.org/docs/12/functions-json.html
With the following request (test it on DBFiddle: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=d5aa984182852438c6f71cf5fa70324e) :
select
json
from (
select '{
"fields": {
"foo": true,
"number": 3,
"listnb": [3, 4],
"listenb2": ["3", "4"],
"txt": "hello how are you",
"listtxt": ["hello","how","are", "you", "3"],
"nullval": null
}
}'::jsonb as json
) t
where 1=1
-- Works with 'strict'
AND JSONB_PATH_EXISTS(json -> 'fields' -> 'listtxt', 'strict $ ? (#.type() == "array")')
-- Doesn't work without 'strict'. Why ?
--AND JSONB_PATH_EXISTS(json -> 'fields' -> 'listtxt', '$ ? (#.type() == "array")')
-- Can't add a nested condition on an array element value (syntax error)
--AND JSONB_PATH_EXISTS(json -> 'fields' -> 'listtxt', 'strict $ ? (#.type() == "array" && #[*] ? (# == "how"))')
;
#1 - I can't get type() function work without strict mode
It could be related to the lax mode unwrapping arrays automatically, but the documentation explicitly states that it is not done when type() function is called :
The lax mode facilitates matching of a JSON document structure and path expression if the JSON data does not conform to the expected schema. [...] Automatic unwrapping is not performed only when:
The path expression contains type() or size() methods that return the type and the number of elements in the array, respectively.
[...]
So I don't understand why we have a difference in the result
#2 I can't get the nested condition work (3rd AND in the sample request)
According to the examples in the documentation, the syntax looks OK but I have a syntax error that I don't understand.
Thank you for your help
If you pass the complete JSON value to the function, then the following works:
where jsonb_path_exists(json, '$ ? (#.fields.listtxt.type() == "array")')
However I would probably simply use jsonb_typeof() without a path query
where jsonb_typeof(json -> 'fields' -> 'listtxt') = 'array'

Deep JSON query with partial path in MySQL 5.7?

Given a number of JSON document like this:
{
id: some_id,
l1: {
f1: [
{
c1: foo,
c2: bar
},
{
c1: foo1,
c2: bar1
}
],
f2: [
{
c3: baz,
c4: bar
}
]
}
}
How can I query MySQL 5.7 for f1....c1: foo1 -- ie lX is not given nor is the list position of the c1-c2 subdocument.
This is not a duplicate of Deep JSON query with partial path in PGSQL JSONB? since that is about PostgreSQL and this one is about MySQL.
This should do it:
SELECT JSON_CONTAINS(JSON_EXTRACT(Doc, '$.*.f1[*].c1'), '"foo1"') FROM table;
If you're using 5.7.9 or later, you can replace the JSON_EXTRACT function with the -> operator:
SELECT JSON_CONTAINS(Doc->'$.*.f1[*].c1', '"foo1"') FROM table;

Dataframe in R to be converted to sequence of JSON objects

I had asked the same question after editing 2 times of a previous question I had posted. I am sorry for the bad usage of this website. I have flagged it for deletion and I am posting a proper new question on the same here. Please look into this.
I am basically working on a recommender system code. The output has to be converted to sequence of JSON objects. I have a matrix that has a look up table for every item ID, with the list of the closest items it is related to and the the similarity scores associated with their combinations.
Let me explain through a example.
Suppose I have a matrix
In the below example, Item 1 is similar to Items 22 and 23 with similarity scores 0.8 and 0.5 respectively. And the remaining rows follow the same structure.
X1 X2 X3 X4 X5
1 22 23 0.8 0.5
34 4 87 0.4 0.4
23 7 92 0.6 0.5
I want a JSON structure for every item (every X1 for every row) along with the recommended items and the similarity scores for each combination as a separate JSON entity and this being done in sequence. I don't want an entire JSON object containing these individual ones.
Assume there is one more entity called "coid" that will be given as input to the code. I assume it is XYZ and it is same for all the rows.
{ "_id" : { "coid" : "XYZ", "iid" : "1"}, "items" : [ { "item" : "22", "score" : 0.8},{ "item": "23", "score" : 0.5}] }
{ "_id" : { "coid" : "XYZ", "iid" : "34"},"items" : [ { "item" : "4", "score" : 0.4},{ "item": "87", "score" : 0.4}] }
{ "_id" : { "coid" : "XYZ", "iid" : "23"},"items" : [ { "item" : "7", "score" : 0.6},{ "item": "92", "score" : 0.5}] }
As in the above, each entity is a valid JSON structure/object but they are not put together into a separate JSON object as a whole.
I appreciate all the help done for the previous question but somehow I feel this new alteration I have here is not related to them because in the end, if you do a toJSON(some entity), then it converts the entire thing to one JSON object. I don't want that.
I want individual ones like these to be written to a file.
I am very sorry for my ignorance and inconvenience. Please help.
Thanks.
library(rjson)
## Your matrix
mat <- matrix(c(1,34,23,
22, 4, 7,
23,87,92,
0.8, 0.4, 0.6,
0.5, 0.4, 0.5), byrow=FALSE, nrow=3)
I use a function (not very interesting name makejson) that takes a row of the matrix and returns a JSON object. It makes two list objects, _id and items, and combines them to a JSON object
makejson <- function(x, coid="ABC") {
`_id` <- list(coid = coid, iid=x[1])
nitem <- (length(x) - 1) / 2 # Number of items
items <- list()
for(i in seq(1, nitem)) {
items[[i]] <- list(item = x[i + 1], score = x[i + 1 + nitem])
}
toJSON(list(`_id`=`_id`, items=items))
}
Then using apply (or a for loop) I use the function for each row of the matrix.
res <- apply(mat, 1, makejson, coid="XYZ")
cat(res, sep = "\n")
## {"_id":{"coid":"XYZ","iid":1},"items":[{"item":22,"score":0.8},{"item":23,"score":0.5}]}
## {"_id":{"coid":"XYZ","iid":34},"items":[{"item":4,"score":0.4},{"item":87,"score":0.4}]}
## {"_id":{"coid":"XYZ","iid":23},"items":[{"item":7,"score":0.6},{"item":92,"score":0.5}]}
The result can be saved to a file with cat by specifying the file argument.
## cat(res, sep="\n", file="out.json")
There is a small difference in your output and mine, the numbers are in quotes ("). If you want to have it like that, mat has to be character.
## mat <- matrix(as.character(c(1,34,23, ...
Hope it helps,
alex

Ragged list or data frame to JSON

I am trying to create a ragged list in R that corresponds to the D3 tree structure of flare.json. My data is in a data.frame:
path <- data.frame(P1=c("direct","direct","organic","direct"),
P2=c("direct","direct","end","end"),
P3=c("direct","organic","",""),
P4=c("end","end","",""), size=c(5,12,23,45))
path
P1 P2 P3 P4 size
1 direct direct direct end 5
2 direct direct organic end 12
3 organic end 23
4 direct end 45
but it could also be a list or reshaped if necessary:
path <- list()
path[[1]] <- list(name=c("direct","direct","direct","end"),size=5)
path[[2]] <- list(name=c("direct","direct","organic","end"), size=12)
path[[3]] <- list(name=c("organic", "end"), size=23)
path[[4]] <- list(name=c("direct", "end"), size=45)
The desired output is:
rl <- list()
rl <- list(name="root", children=list())
rl$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[[1]]$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[[1]]$children[[1]]$children[1] <- list(list(name="end", size=5))
rl$children[[1]]$children[[1]]$children[2] <- list(list(name="organic", children=list()))
rl$children[[1]]$children[[1]]$children[[2]]$children[1] <- list(list(name="end", size=12))
rl$children[[1]]$children[2] <- list(list(name="end", size=23))
rl$children[2] = list(list(name="organic", children=list()))
rl$children[[2]]$children[1] <- list(list(name="end", size=45))
So when I print to json it's:
require(RJSONIO)
cat(toJSON(rl, pretty=T))
{
"name" : "root",
"children" : [
{
"name" : "direct",
"children" : [
{
"name" : "direct",
"children" : [
{
"name" : "direct",
"children" : [
{
"name" : "end",
"size" : 5
}
]
},
{
"name" : "organic",
"children" : [
{
"name" : "end",
"size" : 12
}
]
}
]
},
{
"name" : "end",
"size" : 23
}
]
},
{
"name" : "organic",
"children" : [
{
"name" : "end",
"size" : 45
}
]
}
]
}
I am having a hard time wrapping my head around the recursive steps that are necessary to create this list structure in R. In JS I can pretty easily move around the nodes and at each node determine whether to add a new node or keep moving down the tree by using push as needed, eg: new = {"name": node, "children": []}; or new = {"name": node, "size": size}; as in this example. I tried to split the data.frame as in this example:
makeList<-function(x){
if(ncol(x)>2){
listSplit<-split(x,x[1],drop=T)
lapply(names(listSplit),function(y){list(name=y,children=makeList(listSplit[[y]]))})
} else {
lapply(seq(nrow(x[1])),function(y){list(name=x[,1][y],size=x[,2][y])})
}
}
jsonOut<-toJSON(list(name="root",children=makeList(path)))
but it gives me an error
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
The function given in the linked Q&A is essentially what you need, however it was failing on your data set because of the null values for some rows in the later columns. Instead of just blindly repeating the recursion until you run out of columns, you need to check for your "end" value, and use that to switch to making leaves:
makeList<-function(x){
listSplit<-split(x[-1],x[1], drop=TRUE);
lapply(names(listSplit),function(y){
if (y == "end") {
l <- list();
rows = listSplit[[y]];
for(i in 1:nrow(rows) ) {
l <- c(l, list(name=y, size=rows[i,"size"] ) );
}
l;
}
else {
list(name=y,children=makeList(listSplit[[y]]))
}
});
}
I believe this does what you want, though it has some limitations. In particular, it is assumed that every branch in your network is unique (i.e. there can't be two rows in your data frame that are equal for every column other than size):
df.split <- function(p.df) {
p.lst.tmp <- unname(split(p.df, p.df[, 1]))
p.lst <- lapply(
p.lst.tmp,
function(x) {
if(ncol(x) == 2L && nrow(x) == 1L) {
return(list(name=x[1, 1], size=unname(x[, 2])))
} else if (isTRUE(is.na(unname(x[ ,2])))) {
return(list(name=x[1, 1], size=unname(x[, ncol(x)])))
}
list(name=x[1, 1], children=df.split(x[, -1, drop=F]))
}
)
p.lst
}
all.equal(rl, df.split(path)[[1]])
# [1] TRUE
Though note you had the organic size switched, so I had to fix your rl to get this result (rl has it as 45, but your path as 23). Also, I modified your path data.frame slightly:
path <- data.frame(
root=rep("root", 4),
P1=c("direct","direct","organic","direct"),
P2=c("direct","direct","end","end"),
P3=c("direct","organic",NA,NA),
P4=c("end","end",NA,NA),
size=c(5,12,23,45),
stringsAsFactors=F
)
WARNING: I haven't tested this with other structures, so it's possible it will hit corner cases that you'll need to debug.