Alternative ways to extract the contents of a JSON string - json

Consider the following query:
select '"{\"foo\":\"bar\"}"'::json;
This will return a single record of a single element containing a JSON string. See:
test=# select json_typeof('"{\"foo\":\"bar\"}"'::json); json_typeof
-------------
string
(1 row)
It is possible to extract the contents of the string as follows:
=# select ('"{\"foo\":\"bar\"}"'::json) #>>'{}';
json
---------------
{"foo":"bar"}
(1 row)
From this point onward, the result can be cast as a JSON object:
test=# select json_typeof((('"{\"foo\":\"bar\"}"'::json) #>>'{}')::json);
json_typeof
-------------
object
(1 row)
This way seems magical.
I define no path within the extraction operator, yet what is returned is not what I passed. This seems like passing no index to an array accessor, and getting an element back.
I worry that I will confuse the next maintainer to look at this logic.
Is there a less magical way to do this?

But you did define a path. Defining "root" as path is just another path. And that's just what the #>> operator is for:
Extracts JSON sub-object at the specified path as text.
Rendering as text effectively applies the escape characters in the string. When casting back to json the special meaning of double-quotes (not escaped any more) kicks in. Nothing magic there. No better way to do it.
If you expect it to be confusing to the afterlife, add comments explaining what you are doing there.
Maybe, in the spirit of clarity, you might use the equivalent function json_extract_path_text() instead. The manual:
Extracts JSON sub-object at the specified path as text. (This is functionally equivalent to the #>> operator.)
Now, the function has a VARIADIC parameter, and you typically enter path elements one-by-one, like the example in the manual demonstrates:
json_extract_path_text('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}',
'f4', 'f6') → foo
You cannot enter the "root" path this way. But (what the manual does not add at this point) you can alternatively provide an actual array after adding the keyword VARIADIC. See:
Pass multiple values in single parameter
So this does the trick:
SELECT json_extract_path_text('"{\"foo\":\"bar\"}"'::json, VARIADIC '{}')::json;
And since we are all about being explicit, use the verbose SQL standard cast syntax:
SELECT cast(json_extract_path_text('"{\"foo\":\"bar\"}"'::json, VARIADIC '{}') AS json)
Any clearer, yet? (I would personally prefer your shorter original, but I may be biased, being a "native speaker" of Postgres..)
The question is, why do you have that odd JSON literal including escapes as JSON string in the first place?

Related

How to marshal a predicate from JSON in Prolog?

In Python it is common to marshal objects from JSON. I am seeking similar functionality in Prolog, either swi-prolog or scryer.
For instance, if we have JSON stating
{'predicate':
{'mortal(X)', ':-', 'human(X)'}
}
I'm hoping to find something like load_predicates(j) and have that data immediately consulted. A version of json.dumps() and loads() would also be extremely useful.
EDIT: For clarity, this will allow interoperability with client applications which will be collecting rules from users. That application is probably not in Prolog, but something like React.js.
I agree with the commenters that it would be easier to convert the JSON data to a .pl file in the proper format first and then load that.
However, you can load the predicates from JSON directly, convert them to a representation that Prolog understands, and use assertz to add them to the knowledge base.
If indeed the data contains all the syntax needed for a predicate (as is the case in the example data in the question) then converting the representation is fairly simple as you just need to concatenate the elements of the list into a string and then create a term out of the string. Note that this assumption skips step 2 in the first comment by Guy Coder.
Note that the Prolog JSON library is rather strict in which format it accepts: only double quotes are valid as string delimiters, and lists with singleton values (i.e., not key-value pairs) need to use the notation [a,b,c] instead of {a,b,c}. So first the example data needs to be rewritten:
{"predicate":
["mortal(X)", ":-", "human(X)"]
}
Then you can load it in SWI-Prolog. Minimal working example:
:- use_module(library(http/json)).
% example fact for testing
human(aristotle).
load_predicate(J) :-
% open the file
open(J, read, JSONstream, []),
% parse the JSON data
json_read(JSONstream, json(L)),
% check for an occurrence of the predicate key with value L2
member(predicate=L2, L),
% concatenate the list into a string
atomics_to_string(L2, S),
% create a term from the string
term_string(T, S),
% add to knowledge base
assertz(T).
Example run:
?- consult('mwe.pl').
true.
?- load_predicate('example_predicate.json').
true.
?- mortal(X).
X = aristotle.
Detailed explanation:
The predicate json_read stores the data in the following form:
json([predicate=['mortal(X)', :-, 'human(X)']])
This is a list inside a json term with one element for each key-value pair. The element has the syntax key=value. In the call to json_read you can already strip the json() term and store the list directly in the variable L.
Then member/2 is used to search for the compound term predicate=L2. If you have more than one predicate in the JSON file then you should turn this into a foreach or in a recursive call to process all predicates in the list.
Since the list L2 already contains a syntactically well-formed Prolog predicate it can just be concatenated, turned into a term using term_string/2 and asserted. Note that in case the predicate is not yet in the required format, you can construct a predicate out of the various pieces using built-in predicate manipulation functionality, see https://www.swi-prolog.org/pldoc/doc_for?object=copy_predicate_clauses/2 for some pointers.

Is there a JOLT documentation? What's the meaning of the &, # etc. operators? (NiFi, JoltTransformJSON)

Yeah there is! I made this question to share my knowledge, Q&A style since I had a hard time finding it myself :)
Thanks to https://stackoverflow.com/a/67821482/1561441 (Barbaros Özhan, see comments) for pointing me into the correct direction
The answer is: look here and here
Correct me if I'm wrong, but: Wow, currently to my knowledge a single .java file on GitHub, last commit in 2017, holds relevant parts of the official documentation of the JOLT syntax. I had to use its syntax since I'm working with NiFi and applied its JoltTransformJSON processor (hence the SEO abuses in my question, so more people find the answer)
Here are some of the most relevant parts copied from https://github.com/bazaarvoice/jolt/blob/master/jolt-core/src/main/java/com/bazaarvoice/jolt/Shiftr.java and slightly edited. The documentation itself is more extensive and also shows examples.
'*' Wildcard
Valid only on the LHS ( input JSON keys ) side of a Shiftr Spec
The '*' wildcard can be used by itself or to match part of a key.
'&' Wildcard
Valid on the LHS (left hand side - input JSON keys) and RHS (output data path)
Means, dereference against a "path" to get a value and use that value as if were a literal key.
The canonical form of the wildcard is "&(0,0)".
The first parameter is where in the input path to look for a value, and the second parameter is which part of the key to use (used with * key).
There are syntactic sugar versions of the wildcard, all of the following mean the same thing; Sugar : '&' = '&0' = '&(0)' = '&(0,0)
The syntactic sugar versions are nice, as there are a set of data transforms that do not need to use the canonical form, eg if your input data does not have any "prefixed" keys.
'$' Wildcard
Valid only on the LHS of the spec.
The existence of this wildcard is a reflection of the fact that the "data" of the input JSON, can be both in the "values" and the "keys" of the input JSON
The base case operation of Shiftr is to copy input JSON "values", thus we need a way to specify that we want to copy the input JSON "key" instead.
Thus '$' specifies that we want to use an input key, or input key derived value, as the data to be placed in the output JSON.
'$' has the same syntax as the '&' wildcard, and can be read as, dereference to get a value, and then use that value as the data to be output.
There are two cases where this is useful
when a "key" in the input JSON needs to be a "id" value in the output JSON, see the ' "$": "SecondaryRatings.&1.Id" ' example above.
you want to make a list of all the input keys.
'#' Wildcard
Valid both on the LHS and RHS, but has different behavior / format on either side.
The way to think of it, is that it allows you to specify a "synthentic" value, aka a value not found in the input data.
On the RHS of the spec, # is only valid in the the context of an array, like "[#2]".
What "[#2]" means is, go up the three levels and ask that node how many matches it has had, and then use that as an index in the arrays.
This means that, while Shiftr is doing its parallel tree walk of the input data and the spec, it tracks how many matches it has processed at each level of the spec tree.
This useful if you want to take a JSON map and turn it into a JSON array, and you do not care about the order of the array.
On the LHS of the spec, # allows you to specify a hard coded String to be place as a value in the output.
The initial use-case for this feature was to be able to process a Boolean input value, and if the value is boolean true write out the string "enabled". Note, this was possible before, but it required two Shiftr steps.
'#' Wildcard
Valid on both sides of the spec.
The basic '#' on the LHS.
This wildcard is necessary if you want to put both the input value and the input key somewhere in the output JSON.
Thus the '#' wildcard is the mean "copy the value of the data at this level in the tree, to the output".
Advanced '#' sign wildcard
The format is lools like "#(3,title)", where "3" means go up the tree 3 levels and then lookup the key "title" and use the value at that key.
I would love to know if there is an alternative to JoltTransformJSON simply because I'm struggling a lot with understanding it (not coming from a programming background myself). When it works (thanks to all the help here) it does simplify things a lot!
Here are a few other sites that help:
https://intercom.help/godigibee/en/articles/4044359-transformer-getting-to-know-jolt
https://erbalvindersingh.medium.com/applying-jolttransform-on-json-object-array-and-fetching-specific-fields-48946870b4fc
https://cool-cheng.blogspot.com/2019/12/json-jolt-tutorial.html

Convert JSONB to minified (no spaces) String

If I convert a text value like {"a":"b"} to JSONB and then back to text a space () is added between the : and the ".
psql=> select '{"a":"b"}'::jsonb::text;
text
------------
{"a": "b"}
(1 row)
How can I convert a text to a jsonb, so I can use jsonb functions, and back to text to store it?
The JSON standard, RFC 8259, says "... Insignificant whitespace is allowed before or after any of the six structural characters". In other words, the cast from jsonb to text has no universal canonical form. The PostgreSQL cast convention (using spaces) is arbitrary.
So, we must to agree with the PostgreSQL's convention for CAST(var_jsonb AS text). When you need another cast convention, for example to debug or human-readable output, the built-in jsonb_pretty() function is a good choice.
Unfortunately PostgreSQL not offers other choices, like the compact one. So, you can overload jsonb_pretty() with a compact option:
CREATE or replace FUNCTION jsonb_pretty(
jsonb, -- input
compact boolean -- true for compact format
) RETURNS text AS $$
SELECT CASE
WHEN $2=true THEN json_strip_nulls($1::json)::text
ELSE jsonb_pretty($1)
END
$$ LANGUAGE SQL IMMUTABLE;
SELECT jsonb_pretty( jsonb_build_object('a',1, 'bla','bla bla'), true );
-- results {"a":1,"bla":"bla bla"}
See a complete discussion at this similar question.
From the docs:
https://www.postgresql.org/docs/12/datatype-json.html
"Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept."
So:
create table json_test(fld_json json, fld_jsonb jsonb);
insert into json_test values('{"a":"b"}', '{"a":"b"}');
select * from json_test ;
fld_json | fld_jsonb
-----------+------------
{"a":"b"} | {"a": "b"}
(1 row)
If you want to maintain your white space or lack of it use json. Otherwise you will get a pretty print version on output with jsonb. You can json functions/operators on json type though not the jsonb operators/functions. More detail here:
https://www.postgresql.org/docs/12/functions-json.html
Modifying your example:
select '{"a":"b"}'::json::text;
text
-----------
{"a":"b"}
The way your question and comments are phrased, it really looks like you want replace().
We need to make the search as specific as possible to avoid messing with potentially embedded ': ' within the json payload, so it seems safer to match on the surrounding double quotes too, like:
replace('{"a":"b"}'::jsonb::text, '": "', '":"')

Deal with long numbers in scientific notation in json string - Freemarker

I have a json string which contains a long number but in scientific notation (like 1.559101974041E12 instead of 1559101974041). Due to this, I am not able to parse it using ?eval as this value must be in double quotes in order to get parsed.
I thought of one solution like putting double quotes around them using regex and get them evaluated. After that, use some free marker method to convert value into long. But this solution is very risky and can alter other values as well.
I'm not sure how your template looks, but if you have variable s that contains the string "1.559101974041E12" (the quotation marks aren't part of the string value itself), then you can parse it like s?number. s?eval doesn't work because scientific notation is not part of the FreeMarker syntax (but ?number can parse more formats).
If you will re-print the number in the template, note that depending on locale and configuration settings, it might will look like 1,559,101,974,041. You can prevent that with ?c (for example like ${s?number?c}), in which case it will always look like 1559101974041.

Regex: Is it possible to do a substitution within a capture group?

I have this one line JSON text:
{"schemaText":{"fields":[{"name":"AX_SND_TYPE","type":"string"},{"name":"BWORK","type":"int"}],"name":"XXXSchema","type":"record"},"description":"Autogenerated by NiFi"}
As can be seen there is a property called "schemaText" that contains an object, I want to convert it to a string, so the 'only' thing I need to do is add quotes at the beginning and end of the property and escape the quotes inside.
Using the regular expression bellow (not that my regex knowledge is really low), I am able to do the first step:
({"schemaText":)(\{"fields":\[.*)(,"description.*)
Using the substitution
$1"$2"$3
gives the result:
{"schemaText":"{"fields":[{"name":"AX_SND_TYPE","type":"string"},{"name":"BWORK","type":"int"}],"name":"XXXSchema","type":"record"}","description":"Autogenerated by NiFi"}
But still remains to escape the quotes to get this:
{"schemaText":"{\"fields\":[{\"name\":\"AX_SND_TYPE\",\"type\":\"string\"},{\"name\":\"BWORK\",\"type\":\"int\"}],"name":"XXXSchema","type":"record"}","description":"Autogenerated by NiFi"}
That is have valid JSON format.
The question is: is there a way to escape the quotes inside $2 capture group in the same regular expression?
Thanks in advance.
The answer to your question is no, it's not possible. You're really trying to do two different, unrelated substitutions in a single regular expression. This is a feature that no regular expression engine supports.
Think about it: Your first requirement is for the engine to perform a substitution on the whole text (the quotes), and then, for your second requirement, the engine has to somehow backtrack and perform more substitutions on text which may or may not have already changed: e.g.: It would need to perform a new match on the already substituted text, which, depending on what the first substitution did, may not even exist anymore!
If, as you say, you already have an aproach that works, keep that. A single regular expression is simply not a good fit for what you are trying to do.
I'd recommend tackling this problem using code e.g. with vanilla JavaScript:
let json = '{"schemaText":{"fields":[{"name":"AX_SND_TYPE","type":"string"},{"name":"BWORK","type":"int"}],"name":"XXXSchema","type":"record"},"description":"Autogenerated by NiFi"}';
let obj = JSON.parse(json);
let schemaTextAsString = JSON.stringify(obj.schemaText)
obj.schemaText = schemaTextAsString
var result = JSON.stringify(obj)
You can see this working here.
Note that in your desired output you were not escaping the quotes in schemaText's name field, but this code does.
Finally whenever I use regular expressions I always think of this classic article "Regular Expressions: Now You Have Two Problems"!
Just for your information, you can actually match at every position where a substitution should occur, using an expression such as the following:
/({"schemaText":)|}(,"description")(.*)|([^"]*)"/g
The only issue, as others have mentioned, is that you want to do more than match; you want to perform a "conditional replacement" because there does not exist a single catch-all substitution that will cover all 3 cases you're dealing with (insert starting ", insert \ before quotes, and insert ending ").
You can in fact accomplish this with a single replace() call:
var test = "{\"schemaText\":{\"fields\":[{\"name\":\"AX_SND_TYPE\",\"type\":\"string\"},{\"name\":\"BWORK\",\"type\":\"int\"}],\"name\":\"XXXSchema\",\"type\":\"record\"},\"description\":\"Autogenerated by NiFi\"}";
window.alert(test.replace(/({"schemaText":)|}(,"description")(.*)|([^"]*)"/g, function(a,b,c,d,e){ return (b=="{\"schemaText\":"?b+"\"":(c==",\"description\""?"}\""+c+d:e+"\\\"")) })));
So it's technically "the same regex", but the substitution parameter uses an inline function as replacement rather than a static string.