Convert JSONB to minified (no spaces) String - json

If I convert a text value like {"a":"b"} to JSONB and then back to text, a space is added between the : and the ".
psql=> select '{"a":"b"}'::jsonb::text;
text
------------
{"a": "b"}
(1 row)
How can I convert a text to a jsonb, so I can use jsonb functions, and back to text to store it?

The JSON standard, RFC 8259, says "... Insignificant whitespace is allowed before or after any of the six structural characters". In other words, the cast from jsonb to text has no universal canonical form. The PostgreSQL cast convention (using spaces) is arbitrary.
So we have to accept PostgreSQL's convention for CAST(var_jsonb AS text). When you need a different output convention, for example for debugging or human-readable output, the built-in jsonb_pretty() function is a good choice.
Unfortunately, PostgreSQL does not offer other built-in choices, such as a compact form. You can, however, overload jsonb_pretty() with a compact option:
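CREATE or replace FUNCTION jsonb_pretty(
  jsonb,            -- input
  compact boolean   -- true for compact format
) RETURNS text AS $$
  SELECT CASE
    -- json_strip_nulls() re-serializes the value without whitespace;
    -- note that it also drops object fields whose value is null
    WHEN $2=true THEN json_strip_nulls($1::json)::text
    ELSE jsonb_pretty($1)
  END
$$ LANGUAGE SQL IMMUTABLE;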
SELECT jsonb_pretty( jsonb_build_object('a',1, 'bla','bla bla'), true );
-- results {"a":1,"bla":"bla bla"}
See a complete discussion at this similar question.

From the docs:
https://www.postgresql.org/docs/12/datatype-json.html
"Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept."
So:
create table json_test(fld_json json, fld_jsonb jsonb);
insert into json_test values('{"a":"b"}', '{"a":"b"}');
select * from json_test ;
fld_json | fld_jsonb
-----------+------------
{"a":"b"} | {"a": "b"}
(1 row)
If you want to maintain your white space (or lack of it), use json. Otherwise you will get a pretty-printed version on output with jsonb. You can use the json functions/operators on the json type, though not the jsonb operators/functions. More detail here:
https://www.postgresql.org/docs/12/functions-json.html
Modifying your example:
select '{"a":"b"}'::json::text;
text
-----------
{"a":"b"}

The way your question and comments are phrased, it really looks like you want replace().
We need to make the search as specific as possible to avoid messing with potentially embedded ': ' within the json payload, so it seems safer to match on the surrounding double quotes too, like:
replace('{"a":"b"}'::jsonb::text, '": "', '":"')
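Note that the replace() above only touches the separator in front of string values; the spaces after commas and in front of numbers, booleans, or nested values stay. If the text passes through application code anyway, re-serializing there is an alternative. A minimal Python sketch, assuming the value is small enough to parse client-side (minify_json is a hypothetical helper name, and numbers with a fraction or exponent go through Python floats, so very precise numerics may be reformatted):

```python
import json

def minify_json(text: str) -> str:
    """Re-serialize JSON text with no whitespace between tokens."""
    # separators=(",", ":") drops the default ", " and ": " padding.
    # ensure_ascii=False keeps non-ASCII characters instead of \uXXXX escapes.
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)

# Text shaped like the output of '...'::jsonb::text
print(minify_json('{"a": "b", "n": 1, "list": [1, 2]}'))
# prints {"a":"b","n":1,"list":[1,2]}
```

Because the text is parsed rather than pattern-matched, a ": " inside a string value is left alone.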

Related

Alternative ways to extract the contents of a JSON string

Consider the following query:
select '"{\"foo\":\"bar\"}"'::json;
This will return a single record of a single element containing a JSON string. See:
test=# select json_typeof('"{\"foo\":\"bar\"}"'::json);
json_typeof
-------------
string
(1 row)
It is possible to extract the contents of the string as follows:
=# select ('"{\"foo\":\"bar\"}"'::json) #>>'{}';
json
---------------
{"foo":"bar"}
(1 row)
From this point onward, the result can be cast as a JSON object:
test=# select json_typeof((('"{\"foo\":\"bar\"}"'::json) #>>'{}')::json);
json_typeof
-------------
object
(1 row)
This way seems magical.
I define no path within the extraction operator, yet what is returned is not what I passed. This seems like passing no index to an array accessor, and getting an element back.
I worry that I will confuse the next maintainer to look at this logic.
Is there a less magical way to do this?
But you did define a path. Defining "root" as path is just another path. And that's just what the #>> operator is for:
Extracts JSON sub-object at the specified path as text.
Rendering as text effectively applies the escape characters in the string. When casting back to json the special meaning of double-quotes (not escaped any more) kicks in. Nothing magic there. No better way to do it.
If you expect it to be confusing to the afterlife, add comments explaining what you are doing there.
Maybe, in the spirit of clarity, you might use the equivalent function json_extract_path_text() instead. The manual:
Extracts JSON sub-object at the specified path as text. (This is functionally equivalent to the #>> operator.)
Now, the function has a VARIADIC parameter, and you typically enter path elements one-by-one, like the example in the manual demonstrates:
json_extract_path_text('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}',
'f4', 'f6') → foo
You cannot enter the "root" path this way. But (what the manual does not add at this point) you can alternatively provide an actual array after adding the keyword VARIADIC. See:
Pass multiple values in single parameter
So this does the trick:
SELECT json_extract_path_text('"{\"foo\":\"bar\"}"'::json, VARIADIC '{}')::json;
And since we are all about being explicit, use the verbose SQL standard cast syntax:
SELECT cast(json_extract_path_text('"{\"foo\":\"bar\"}"'::json, VARIADIC '{}') AS json)
Any clearer, yet? (I would personally prefer your shorter original, but I may be biased, being a "native speaker" of Postgres..)
The question is, why do you have that odd JSON literal including escapes as JSON string in the first place?
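For readers who find the two-step nature easier to see outside SQL, the same "decode the string, then parse its contents" sequence can be sketched in Python. This is only an analogy to #>> '{}' followed by the cast to json, not what Postgres does internally; the variable names are made up for the illustration:

```python
import json

# A JSON *string* whose content happens to be JSON text (double-encoded).
raw = r'"{\"foo\":\"bar\"}"'

inner_text = json.loads(raw)   # first decode: JSON string -> Python str
obj = json.loads(inner_text)   # second decode: JSON text -> dict

print(type(inner_text).__name__, inner_text)  # str {"foo":"bar"}
print(type(obj).__name__, obj)                # dict {'foo': 'bar'}
```

The first decode is what removes the escapes, which is the step that looked "magical" in the SQL version.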

Is there a JOLT documentation? What's the meaning of the &, # etc. operators? (NiFi, JoltTransformJSON)

Yeah there is! I made this question to share my knowledge, Q&A style since I had a hard time finding it myself :)
Thanks to https://stackoverflow.com/a/67821482/1561441 (Barbaros Özhan, see comments) for pointing me into the correct direction
The answer is: look here and here
Correct me if I'm wrong, but: Wow, currently to my knowledge a single .java file on GitHub, last commit in 2017, holds relevant parts of the official documentation of the JOLT syntax. I had to use its syntax since I'm working with NiFi and applied its JoltTransformJSON processor (hence the SEO abuses in my question, so more people find the answer)
Here are some of the most relevant parts copied from https://github.com/bazaarvoice/jolt/blob/master/jolt-core/src/main/java/com/bazaarvoice/jolt/Shiftr.java and slightly edited. The documentation itself is more extensive and also shows examples.
'*' Wildcard
Valid only on the LHS ( input JSON keys ) side of a Shiftr Spec
The '*' wildcard can be used by itself or to match part of a key.
'&' Wildcard
Valid on the LHS (left hand side - input JSON keys) and RHS (output data path)
Means, dereference against a "path" to get a value and use that value as if it were a literal key.
The canonical form of the wildcard is "&(0,0)".
The first parameter is where in the input path to look for a value, and the second parameter is which part of the key to use (used with * key).
There are syntactic sugar versions of the wildcard, all of the following mean the same thing; Sugar : '&' = '&0' = '&(0)' = '&(0,0)'
The syntactic sugar versions are nice, as there are a set of data transforms that do not need to use the canonical form, eg if your input data does not have any "prefixed" keys.
'$' Wildcard
Valid only on the LHS of the spec.
The existence of this wildcard is a reflection of the fact that the "data" of the input JSON, can be both in the "values" and the "keys" of the input JSON
The base case operation of Shiftr is to copy input JSON "values", thus we need a way to specify that we want to copy the input JSON "key" instead.
Thus '$' specifies that we want to use an input key, or input key derived value, as the data to be placed in the output JSON.
'$' has the same syntax as the '&' wildcard, and can be read as, dereference to get a value, and then use that value as the data to be output.
There are two cases where this is useful
when a "key" in the input JSON needs to be a "id" value in the output JSON, see the ' "$": "SecondaryRatings.&1.Id" ' example above.
you want to make a list of all the input keys.
'#' Wildcard
Valid both on the LHS and RHS, but has different behavior / format on either side.
The way to think of it, is that it allows you to specify a "synthetic" value, aka a value not found in the input data.
On the RHS of the spec, # is only valid in the context of an array, like "[#2]".
What "[#2]" means is, go up the three levels and ask that node how many matches it has had, and then use that as an index in the arrays.
This means that, while Shiftr is doing its parallel tree walk of the input data and the spec, it tracks how many matches it has processed at each level of the spec tree.
This is useful if you want to take a JSON map and turn it into a JSON array, and you do not care about the order of the array.
On the LHS of the spec, # allows you to specify a hard coded String to be placed as a value in the output.
The initial use-case for this feature was to be able to process a Boolean input value, and if the value is boolean true write out the string "enabled". Note, this was possible before, but it required two Shiftr steps.
'@' Wildcard
Valid on both sides of the spec.
The basic '@' on the LHS.
This wildcard is necessary if you want to put both the input value and the input key somewhere in the output JSON.
Thus the '@' wildcard means "copy the value of the data at this level in the tree, to the output".
Advanced '@' sign wildcard
The format looks like "@(3,title)", where "3" means go up the tree 3 levels and then lookup the key "title" and use the value at that key.
I would love to know if there is an alternative to JoltTransformJSON simply because I'm struggling a lot with understanding it (not coming from a programming background myself). When it works (thanks to all the help here) it does simplify things a lot!
Here are a few other sites that help:
https://intercom.help/godigibee/en/articles/4044359-transformer-getting-to-know-jolt
https://erbalvindersingh.medium.com/applying-jolttransform-on-json-object-array-and-fetching-specific-fields-48946870b4fc
https://cool-cheng.blogspot.com/2019/12/json-jolt-tutorial.html

Repair Bad Json with Unescaped Quote in Field Name

Kissmetrics exports apparently produce invalid json when there is a quote in the field name, for example, the following is one of the events produced:
{
"ab test group native dialogs on mobile":"Control",
"ab test group "interested" button copy":"Interested",
"_t":1412633724,
"_p":"hk5yxuxcqe/935mkbj+pz8xi0a8="
}
(Newlines were added to clarify the issue, we can't use those to repair the JSON).
I am looking for a mechanism for repairing such broken JSON.
There are some assumptions I believe we can take advantage of:
We can assume that the JSON being produced is flat (no nested objects or arrays), so I think we can take advantage of that.
I believe all fields are strings, except for _t, but not 100% sure.
I don't think we can assume the bad unescaped quotes will be balanced.
I believe KM removes commas and colons from field names, but not 100% sure -- they are not removed from values (though I believe values to be properly encoded).
Solution I am using now, in python, which I'm sure is imperfect:
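import regex  # third-party "regex" module; match.captures() does not exist in the stdlib "re"

# s is one exported event (a single-line JSON object) as a str;
# match is None when the line does not fit the pattern
match = regex.match(r'^{("(?P<fieldName>([^:]*))":(?P<fieldValue>([0-9]*\.?[0-9]+)|("(([^"])|(\\"))*"))(,|}))*$', s)
fieldNames = match.captures('fieldName')
fieldValues = match.captures('fieldValue')
newJson = "{%s}" % (
    ",".join(
        "\"%s\":%s" % (
            fieldName.replace("\"", "\\\""),  # escape the stray quotes in the key
            fieldValue,
        )
        for fieldName, fieldValue
        in zip(fieldNames, fieldValues)
    )
)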
This assumes there are no colons in the keys.
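For comparison, here is a stdlib-only sketch of the same idea that lets the json module parse the values instead of a regex. It is not a tested fix for real Kissmetrics exports; it relies on the assumptions listed above (flat object, no colons in keys, values properly encoded) plus one more: no whitespace between a key's closing quote and the colon. repair_flat_json is a hypothetical helper name.

```python
import json

def repair_flat_json(s: str) -> dict:
    """Parse a flat JSON object whose keys may contain unescaped quotes.

    Assumes: keys contain no colon, values are valid JSON, no whitespace
    between a key's closing quote and the colon.
    """
    decoder = json.JSONDecoder()

    def skip_ws(i: int) -> int:
        while i < len(s) and s[i] in " \t\r\n":
            i += 1
        return i

    result = {}
    pos = skip_ws(0)
    if not s.startswith("{", pos):
        raise ValueError("expected '{'")
    pos = skip_ws(pos + 1)
    while not s.startswith("}", pos):
        if not s.startswith('"', pos):
            raise ValueError(f"expected key at offset {pos}")
        # With no colons in keys, the first '":' after the opening quote ends the key.
        end = s.index('":', pos + 1)
        key = s[pos + 1:end]  # taken literally, inner quotes included
        # raw_decode does not skip leading whitespace, so do it explicitly.
        value, pos = decoder.raw_decode(s, skip_ws(end + 2))
        result[key] = value
        pos = skip_ws(pos)
        if s.startswith(",", pos):
            pos = skip_ws(pos + 1)
        elif not s.startswith("}", pos):
            raise ValueError(f"expected ',' or '}}' at offset {pos}")
    return result

broken = '{"ab test group "interested" button copy":"Interested","_t":1412633724}'
repaired = repair_flat_json(broken)
print(json.dumps(repaired))  # json.dumps escapes the inner quotes on output
```

Keys are kept verbatim, so a key that was already correctly escaped would keep its backslashes; whether that can happen in these exports is unknown.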

Extract an int, string, boolean, etc. as its corresponding PostgreSQL type from JSON [duplicate]

I feel like I must just be missing something simple here, but I've looked through PostgreSQL's documentation on JSON and the JSON operators and functions and don't see anything explaining it.
It's easy to turn things into JSON in PostgreSQL:
SELECT *, pg_typeof(j) FROM (VALUES
(to_json(5)),
(to_json(true)),
(to_json('foo'::TEXT))
) x (j);
will give me back a nice result set full of jsons:
j | pg_typeof
-------+-----------
5 | json
true | json
"foo" | json
But how do I convert these json values back into their original types? I don't expect to be able to do that all in one result set, of course, since the types aren't consistent. I just mean individually.
Lots of stuff I tried
Casting sure doesn't work:
SELECT to_json(5)::NUMERIC;
gives
ERROR: cannot cast type json to numeric
If I try to abuse the json_populate_record function like so:
SELECT json_populate_record(null::INTEGER, to_json(5));
I get
ERROR: first argument of json_populate_record must be a row type
In PG 9.4, I can pretty easily tell the type: SELECT json_typeof(to_json(5)); gives number, but that doesn't help me actually extract it.
Neither does json_to_record (also 9.4):
SELECT * FROM json_to_record(to_json(5)) x (i INT);
gets me another error:
ERROR: cannot call json_to_record on a scalar
So how do you convert json "scalars" (as PG calls them, apparently) into the corresponding PG type?
I'm interested in 9.3 and 9.4; 9.2 would just be a bonus.
The simplest way for booleans and numbers seems to be to first cast to TEXT and then cast to the appropriate type:
SELECT j::TEXT::NUMERIC
FROM (VALUES ('5.4575e6'::json)) x (j)
-- Result is 5457500, with column type NUMERIC
SELECT j::TEXT::BOOLEAN
FROM (VALUES ('true'::json)) x (j)
-- Result is t, with column type BOOLEAN
This leaves strings, where you instead get back a quoted value if you try this:
SELECT j::TEXT
FROM (VALUES (to_json('foo'::TEXT))) x (j)
-- Result is "foo"
Apparently, that particular part of my question has already been addressed. You can get around it by wrapping the text value in an array and then extracting it:
SELECT array_to_json(array[j])->>0
FROM (VALUES (to_json('foo'::TEXT))) x (j)
-- Result is foo, with column type TEXT.
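If the conversion can happen in the client instead of in SQL, a JSON parser already maps each scalar to a native type. A small Python sketch for comparison (decode_scalar is a hypothetical helper name; parse_float=Decimal is used here so that non-integer numbers are not rounded through a float, loosely mirroring the cast to NUMERIC):

```python
import json
from decimal import Decimal

def decode_scalar(text: str):
    """Decode one JSON scalar into the corresponding Python value."""
    return json.loads(text, parse_float=Decimal)

for text in ('5', '5.4575e6', 'true', '"foo"', 'null'):
    value = decode_scalar(text)
    print(f"{text!r:12} -> {value!r} ({type(value).__name__})")
```

Integers come back as int, other numbers as Decimal, booleans as bool, strings unquoted as str, and null as None.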
First step: if your values are contained within structures (which is usually the case), you need to use the correct operators / functions to extract your data's string representation: ->> (9.3+), #>> (9.3+), json_each_text() (9.3+), json_array_elements_text() (9.4+).
To select json array elements' text representation in 9.3, you need something like this:
select json_array ->> indx
from generate_series(0, json_array_length(json_array) - 1) indx
For scalar values, you can use this little trick:
select ('[' || json_scalar || ']')::json ->> 0 -- ...
At this point, strings and nulls are handled (json nulls are converted to SQL NULLs by these methods). To select numbers, cast to numeric (which is fully compatible with json's number [1]). To select booleans, cast to boolean (both true and false are supported as input representations). But note that casts can make your query fail if their input representation is not accepted. For example, if you have a json object in some of your columns, and that object usually has some key whose value is usually a number (but not always), this query can fail:
select (json_object ->> 'key')::numeric
from your_table
If you have such data, you need to filter your selects with json_typeof() (9.4+):
select (json_object ->> 'key')::numeric
from your_table
where json_typeof(json_object -> 'key') = 'number'
[1] I haven't checked their full syntaxes, but numeric also accepts scientific notation, so in theory, all json numbers should be handled correctly.
For 9.2+, this function can test a json value's type:
create or replace function json_typeof(json)
returns text
language sql
immutable
strict
as $func$
select case left(trim(leading E'\x20\x09\x0A\x0D' from $1::text), 1)
when 'n' then 'null'
when 't' then 'boolean'
when 'f' then 'boolean'
when '"' then 'string'
when '[' then 'array'
when '{' then 'object'
else 'number'
end
$func$;
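The function above works because valid JSON can be classified by its first non-whitespace character alone. The same rule written out in Python, purely as an illustration (it does no validation, just like the SQL version, so anything unrecognized falls through to 'number'):

```python
def json_typeof(text: str) -> str:
    """Classify already-valid JSON text by its first non-whitespace character."""
    # Same four whitespace characters the SQL version trims: space, tab, LF, CR.
    first = text.lstrip(" \t\n\r")[:1]
    kinds = {
        "n": "null",
        "t": "boolean",
        "f": "boolean",
        '"': "string",
        "[": "array",
        "{": "object",
    }
    return kinds.get(first, "number")

for sample in ('null', ' true', '"foo"', '[1]', '{"a":1}', '-5.4e6'):
    print(sample, "->", json_typeof(sample))
```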
This question is similar to yours. Essentially, the underlying bit-level representations of the data types are incompatible, and transforming a scalar into the native type is not something that has been implemented because of the ambiguities involved. JSON has a very strict spec that corresponds tightly to JavaScript objects and primitives.
It is possible, but I do not think it has been implemented yet.

Changing array type representation to use square brackets? Possible?

The answer to my question here: Detecting column changes in a postgres update trigger has me converting rows in my database into their hstore equivalent. It's clever, but leads to serialization / deserialization issues with array column types.
I have a few array-typed columns, so for example:
select hstore_to_json(hstore(documents.*)) from documents where id=283;
gives me (abbreviated form):
{"id": "283", "tags": "{potato,rutabaga}", "reply_parents": "{7}"}
What I'd really like is
"tags": ["potato", "rutabaga"], "reply_parents": [7]
as this is well-formed JSON. Technically, the first response is also well-formed JSON, as the array has been stringified and is sent down the wire as "{potato,rutabaga}". This requires me to fiddle with the parsing of responses I get, and while that's not the end of the world, it is a bit of a pain if it turns out to be unnecessary.
Calling row_to_json on the row first converts the array types into their proper json-array representation, but it doesn't seem like there's a set subtraction type operator on json objects (hstore - hstore in the question asked above is how I'm sending "these columns changed" events down my websocket wire). So, any suggestions as to how to get this working properly right in the database are welcome. (either futzing with the way arrays get stringified into hstore, or by doing set subtraction on json objects).
If you cannot find any natural solution, you can always trust in regex.
create or replace function hj(json text)
returns text language plpgsql immutable
as $$
begin
return
regexp_replace(
regexp_replace(
regexp_replace(
json, '([^"{]+?),', '"\1", ', 'g'),
'([^"{ ]+?)}"', '"\1"]', 'g'),
'"{"', '["', 'g');
end $$;
select hj('{"id": "283", "tags": "{potato,rutabaga}", "reply_parents": "{7}"}');
-- gives:
-- {"id": "283", "tags": ["potato", "rutabaga"], "reply_parents": ["7"]}