Postgres regex not matching comma as expected - json

I have a text field containing JSON and I need to replace and eliminate some fields. Below is an example of the JSON format. I would like to remove certain fields suffixed by '-op' and the trailing comma but the comma is not being picked up for some reason.
{
"miscId":[],
"otherActivityData":{"activityDate-op":"eq","activityDate":"11/28/2017"}
}
I worked with a nice online tools to show my pattern should work for most languages => regexpal
The pattern is:
"activityDate-op":".+?",?
It picks everything up except the comma. I did a regexp_match and the printed it out via raise notice and it produced
{"\"activityDate-op\":\"eq\""}
Can anyone help point out how I can pick up the comma?
Sometimes the field-op is last in the array so I need to have the 0 or 1 question mark quantifier in place. If I remove the ? then it picks the comma up sometimes but also causes issues.

You don't need a regex. There is an operator to remove an element from a JSONB based on the element's path:
select j #- array['otherActivityData', 'activityDate-op']
from (
values ( '{"miscId":[],
"otherActivityData":{"activityDate-op":"eq","activityDate":"11/28/2017"}
}'::jsonb)
) as t(j);
returns:
{"miscId": [], "otherActivityData": {"activityDate": "11/28/2017"}}

Related

How to get PostgreSQL to escape text from jsonb_array_element?

I'm loading some JSON from Postgres 13 into Elasticsearch using Logstash and ran into some errors caused by text not being escaped with reverse solidus. I tracked my problem down to this behavior:
SELECT
json_build_object(
'literal_text', 'abc\ndef'::text,
'literal_text_type', pg_typeof('abc\ndef'::text),
'text_from_jsonb_array_element', a->>0,
'jsonb_array_element_type', pg_typeof(a->>0)
)
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
{
"literal_text": "abc\\ndef",
"literal_text_type": "text",
"text_from_jsonb_array_element": "abc\ndef",
"jsonb_array_element_type":"text"
}
db-fiddle
json_build_object encodes the literal text as expected (turning \n into \\n); however, it doesn't encode the text retrieved via jsonb_array_element even though both are text.
Why is the text extracted from jsonb_array_element being treated differently (not getting escaped by jsonb_build_object)? I've tried casting, using jsonb_array_elements_text (though my actual use case involves an array of arrays, so I need to split to a set of jsonb), and various escaping/encoding/formatting functions, but haven't found a solution yet.
Is there a trick to cast text pulled from jsonb_array_element so it will get properly encoded by jsonb_build_object?
Thanks for any hints or solutions.
Those strings look awfully similar, but they're actually different. When you create a string literal like '\n', that's a backslash character followed by an "n" character. So when you put that into json_build_object, it needs to add a backslash to escape the backslash you're giving it.
On the other hand, when you call jsonb_array_elements('["abc\ndef"]'), you're saying that the JSON has precisely a \n encoded in it with no second backslash, and therefore when it's converted to text, that \n is interpreted as a newline character, not two separate characters. You can see this easily by running the following:
SELECT a->>0 FROM jsonb_array_elements('["abc\ndef"]') a;
?column?
----------
abc +
def
(1 row)
On encoding that back into a JSON, you get a single backslash again, because it's once again encoding a newline character.
If you want to escape it with an extra backslash, I suggest a simple replace:
SELECT
json_build_object(
'text_from_jsonb_with_replace', replace(a->>0, E'\n', '\n')
)
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
json_build_object
------------------------------------------------
{"text_from_jsonb_with_replace" : "abc\\ndef"}

Regexp to match JSON key:value pairs with commas in value [duplicate]

Can't get why this regex (regex101)
/[\|]?([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures all the input, while this (regex101)
/[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures only |Func
Input string is |Func(param1, param2, param32, param54, param293, par13am, param)|
Also how can i match repeated capturing group in normal way? E.g. i have regex
/\(\(\s*([a-z\_]+){1}(?:\s+\,\s+(\d+)*)*\s*\)\)/gui
And input string is (( string , 1 , 2 )).
Regex101 says "a repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations...". I've tried to follow this tip, but it didn't helped me.
Your /[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g regex does not match because you did not define a pattern to match the words inside parentheses. You might fix it as \|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?, but all the values inside parentheses would be matched into one single group that you would have to split later.
It is not possible to get an arbitrary number of captures with a PCRE regex, as in case of repeated captures only the last captured value is stored in the group buffer.
What you may do is get mutliple matches with preg_match_all capturing the initial delimiter.
So, to match the second string, you may use
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\()\K\w+
See the regex demo.
Details:
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\() - either the end of the previous match (\G(?!\A)) and a comma enclosed with 0+ whitespaces (\s*,\s*), or 1+ | symbols (\|+), followed with 1+ alphanumeric chars (captured into Group 1, ([a-z0-9A-Z]+)) and a ( symbol (\()
\K - omit the text matched so far
\w+ - 1+ word chars.

Select JSON values with special characters

I am looking to detect anomalies in my JSON values.
Here's an example of the data queries via jq
"2014-03-26 01:58:00"
"9019549360"
"109092812_20150626"
"134670164"
""
"97695498"
"680561513"
I would like to display all the values that contain a - or a _ or is blank.
In other words, I'd like to display the following output
"2014-03-26 01:58:00"
"109092812_20150626"
""
Now, I have tried the following:
select (. | contains("-","_"," "))'
This appears to work, but in order to make it more robust, I'd like to expand this to include all special characters.
Your query won't detect empty strings, and will possibly emit the same string more than once. It would be easier to use test, e.g.:
select( length==0 or test("[-_ ]") )
Note also that the preliminary '.' in your query is unnecessary.
Addendum
From one of the comments, it awould appear that you will want to specify "[^a-zA-Z0-9]" or similar as the argument of test.

Why does Jekyll/Liquid split filter ignore trailing delimiters?

The jekyll/liquid split string filter is briefly documented here. When applying split to URL's I realized that split has a quite surprising behavior when the string being split has as first or last character the split delimiter. See examples on this test page.
--> split of '/a/b/c' by '/' gives ["","a","b","c"]
--> split of 'a/b/c/' by '/' gives ["a","b","c"]
A leading delimiter produces an empty leading string list element
A trailing delimiter seems to be ignored.
That's why split:'/' | join:'/' does not reproduce the original string
as one might naively expect.
Questions:
Is that, to me surprising, asymmetric behavior intended ?
If yes, why ?
If yes, why isn't it clearly documented ?

problems using replaceText for special characters: [ ]

I want to replace "\cite{foo123a}" with "[1]" and backwards. So far I was able to replace text with the following command
body.replaceText('.cite{foo}', '[1]');
but I did not manage to use
body.replaceText('\cite{foo}', '[1]');
body.replaceText('\\cite{foo}', '[1]');
Why?
The back conversion I cannot get to work at all
body.replaceText('[1]', '\\cite{foo}');
this will replace only the "1" not the [ ], this means the [] are interpreted as regex character set, escaping them will not help
body.replaceText('\[1\]', '\\cite{foo}');//no effect, still a char set
body.replaceText('/\[1\]/', '\\cite{foo}');//no matches
The documentation states
A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.
Can I find a full description of what is supported and what not somewhere?
I'm not familiar with Google Apps Script, but this looks like ordinary regular expression troubles.
Your second conversion is not working because the string literal '\[1\]' is just the same as '[1]'. You want to quote the text \[1\] as a string literal, which means '\\[1\\]'. Slashes inside of a string literal have no relevant meaning; in that case you have written a pattern which matches the text /1/.
Your first conversion is not working because {...} denotes a quantifier, not literal braces, so you need \\\\cite\\{foo\\}. (The four backslashes are because to match a literal \ in a regular expression is \\, and to make that a string literal it is \\\\ — two escaped backslashes.)