I am doing a select from a table and I want the output to be in JSON. For this I am using the json_build_object function.
I am getting some unexpected behavior when the selected value contains a backslash.
A simple example would be:
select json_build_object('test', 'a\b');
This outputs:
{ "test": "a\\b" }
I would like to get an output of:
{ "test": "a\b" } // without extra backslash
Since your input is three characters, the two letters and a literal backslash between them, "a\\b" is the correct way to represent that in JSON.
If your input were made up of two characters, the letter a and the 'backspace' control character, that would correctly be represented in JSON as "a\b". But that is not the input you showed us. If your input is not what you want it to be, you shouldn't expect json_build_object to read your mind and fix it for you.
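To make the difference between the two inputs concrete, here is a rough analogue using Python's json module instead of PostgreSQL (a substitution for illustration only). It assumes PostgreSQL's default standard_conforming_strings = on, under which 'a\b' is the three-character string. Any correct JSON encoder doubles a literal backslash and writes the backspace character as \b.

```python
import json

three_chars = "a\\b"  # 'a', a literal backslash, 'b' (what 'a\b' is in PostgreSQL)
two_chars = "a\b"     # 'a' followed by the backspace control character U+0008

print(json.dumps({"test": three_chars}))  # {"test": "a\\b"}
print(json.dumps({"test": two_chars}))    # {"test": "a\b"}

# Decoding the escaped form gives back exactly the original three characters.
assert json.loads(json.dumps({"test": three_chars}))["test"] == three_chars
```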
Using regular expressions (in Notepad++), I want to find all JSON sections that contain the string foo. Note that the JSON just happens to be embedded within a limited set of HTML source code which is loaded into Notepad++.
I've written the following regex to accomplish this task:
({[^}]*foo[^}]*})
This works as expected on all possible input.
I want to improve my workflow, so instead of just finding all such JSON sections, I want to write a regex to remove all the HTML & JSON that does not match this expression. The result will be only JSON sections that contain foo.
I tried using the Notepad++ regex Replace functionality with this find expression:
(?:({[^}]*?foo[^}]*?})|.)+
and this replace expression:
$1\n\n$2\n\n$3\n\n$4\n\n$5\n\n$6\n\n$7\n\n$8\n\n$9\n\n
This works for the last occurrence of foo within the JSON, but does not find the other occurrences.
How can I improve my code to find all the occurrences?
Here is a simplified minimal example of input and desired output. I hope I haven't simplified it too much for it to be useful:
Simplified input:
<!DOCTYPE html>
<html>
<div dat="{example foo1}"> </div>
<div dat="{example bar}"> </div>
<div dat="{example foo2}"> </div>
</html>
Desired output:
{example foo1}
{example foo2}
You can use
{[^}]*foo[^}]*}|((?s:.))
Replace with (?1:$0\n). Details:
{[^}]*foo[^}]*} - {, zero or more chars other than }, foo, zero or more chars other than } and then a }
| - or
((?s:.)) - Capturing group 1: any one char ((?s:...) is an inline modifier group in which . matches all chars, including line break chars, the same as enabling the . matches newline option).
The (?1:$0\n) replacement pattern replaces with an empty string if Group 1 was matched, else the replacement is the match text + a newline.
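Python's re module has no equivalent of Notepad++'s conditional replacement (?1:$0\n), but the same idea can be sketched with a replacement function: the alternation is unchanged, and the function returns an empty string when group 1 (a stray character) matched, otherwise the matched block plus a newline. The function name keep_foo_objects is hypothetical, and the sample is the simplified input from the question.

```python
import re

# A {...} block containing "foo", OR any single character captured in group 1.
# (?s:.) is a scoped inline flag, so "." also matches line breaks.
PATTERN = re.compile(r"\{[^}]*foo[^}]*\}|((?s:.))")

def keep_foo_objects(text):
    def repl(m):
        # Group 1 matched: a character outside any foo block, so drop it.
        if m.group(1) is not None:
            return ""
        # Otherwise the whole match is a foo block: keep it and add a newline.
        return m.group(0) + "\n"
    return PATTERN.sub(repl, text)

html = (
    '<!DOCTYPE html>\n<html>\n'
    '<div dat="{example foo1}"> </div>\n'
    '<div dat="{example bar}"> </div>\n'
    '<div dat="{example foo2}"> </div>\n'
    '</html>\n'
)
print(keep_foo_objects(html))
```

Because [^}] cannot cross a closing brace, {example bar} never matches the first alternative and is removed character by character.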
See the demo and search and replace dialog settings:
Updates
The comment section was full, so I am suggesting some code here.
Let me know if this is close to your intended result:
Find: ({.+?[\n]*foo[ \d]*})|.*?
Replace all: $1
Also added Toto's example
I have a JSON text full of these:
"order": "Commande",
"#order": }
"description": "order word",
"type": "texte"
},
As you can see, there is an error after "#order": where } is used instead of {.
How can I replace all of them with an opening brace without changing the }, that closes each object? (I need a regex to use in my text editor's search.)
^\s\}{1}\s didn't work
Maybe this can help you
(?<=\"#\w+\":\s+)}
https://regex101.com/r/A4Yv6X/1
Your ^\s\}\s pattern matches a whitespace, } and another whitespace at the start of the string, while the } you want to replace is at the end of the string (or line).
You may consider using the \}$, \}(?=\s*$), or \}(?=\h*$) patterns to match } only at the end of a string/line: $ is the end of the string/line, (?=\s*$) matches a location that is immediately followed by zero or more whitespaces and then the end of the string/line, and \h allows only horizontal whitespace.
However, if there is a colon before the } you need to replace, you may consider a more sophisticated pattern like
(:\h*)\}(\h*$)
(:[^\S\r\n]*)\}([^\S\r\n]*$)
Replace with $1{$2 (or \1{\2 depending on the environment).
See the regex demo. Details:
(:\h*) - Capturing group 1 ($1): a colon and zero or more horizontal whitespaces
\} - a } char
(\h*$) - Group 2 ($2): zero or more horizontal whitespaces and then the end of the line.
Note that [^\S\r\n] is a rough equivalent to \h, it matches any whitespace char but CR and LF chars.
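Python's re does not support \h, so this sketch uses the [^\S\r\n] variant of the pattern with re.MULTILINE (so $ matches at each line end) and the \1{\2 replacement form. The sample is a reconstruction of the JSON fragment from the question, and fix_stray_braces is a hypothetical name.

```python
import re

# Group 1: a colon plus horizontal whitespace; then the stray }; group 2:
# trailing horizontal whitespace up to the end of the line.
STRAY_BRACE = re.compile(r"(:[^\S\r\n]*)\}([^\S\r\n]*$)", re.MULTILINE)

def fix_stray_braces(text):
    # Put an opening brace back between the two captured groups.
    return STRAY_BRACE.sub(r"\1{\2", text)

sample = (
    '"order": "Commande",\n'
    '"#order": }\n'
    '"description": "order word",\n'
    '"type": "texte"\n'
    '},\n'
)
print(fix_stray_braces(sample))
```

The closing }, line is left alone because no colon precedes its brace and a comma follows it.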
Try this, which matches braces at the ends of lines:
(?m)\}$
See live demo.
Depending on your input, this might be enough.
I'm attempting to parse and format some text from an HTML file into Word. I'm doing this by capturing each paragraph into an array and then writing it into the word document one paragraph at a time. However, there are superscripted references sprinkled throughout the text. I'm looking for a way to superscript these references in the new Word file and thought I would use regex and split to make this work. Here is an example paragraph:
$p = "This is an example sentence.1 The number is a reference note that should be superscripted and can be one or two digits long."
Here is the code I tried to split and select the digit(s):
[regex]::Split($p,"(\d{1,2})")
This works for one- and two-digit numbers. However, if a number has more than two digits, it is still split, and the extra digits end up in separate items on the following lines. Like so:
This is an example sentence.
10
0
The number is a reference note that should be superscripted and can be one or two digits long.
This is important because there are sometimes larger numbers (3-10 digits) in the text that I don't want to split on. My goal is to take a block of text with reference note numbers and separate out the notes so I can format them when I write them out to the Word file. Something like this (untested):
$paragraphs | % {
    # @(...) makes sure $a is always an array, even with a single item
    $a = @([regex]::Split($_, "(\d{1,2})"))
    $a | % {
        $text = $_
        # Anchored, so only a piece that is entirely 1-2 digits is a reference
        if ($text -match "^\d{1,2}$")
        {
            $objSelection.Font.SuperScript = 1
            $objSelection.TypeText("$text")
            $objSelection.Font.SuperScript = 0
        }
        Else
        {
            $objSelection.Style = "Normal"
            $objSelection.TypeText("$text")
        }
    }
    $text = "`v"
    $objSelection.TypeText("$text")
    $objSelection.TypeParagraph()
}
EDIT:
The following regex works when I test it with the above loop in its own script:
"(?<![\d\s])(\d{1,2})(?!\d)"
However, when I run it in the parent script, I get the following error:
Cannot find an overload for "Split" and the argument count: "2"
$a = [regex]::Split($_,"(?<![\d\s])(\d{1,2})(?!\d)")
How would I go about troubleshooting this error?
You may use
[regex]::Split($p,"(?<![\d\s])(\d{1,2})(?!\d)\s*")
It only matches and captures one or two digits that are neither preceded nor followed by another digit, and not preceded by any whitespace char. Any trailing whitespace is matched with \s* and is thus removed from the items added to the resulting array.
See this regex demo:
Details
(?<![\d\s]) - a negative lookbehind that fails the match if, immediately to the left of the current position, there is a digit or a whitespace
(\d{1,2}) - Group 1: one or two digits
(?!\d) - not followed by another digit (a negative lookahead that fails the match if its pattern matches immediately to the right of the current location)
\s* - 0+ whitespaces.
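To check the pattern outside PowerShell, here is a rough Python equivalent. Like .NET's Regex.Split, Python's re.split includes captured groups in its result, so the reference digits come back as their own items at odd indexes. The function split_refs and its (text, is_reference) output format are hypothetical, chosen to mirror the superscript/normal branch in the question's loop.

```python
import re

# 1-2 digits not preceded by a digit or whitespace and not followed by a digit,
# plus any trailing whitespace (which is dropped from the output).
REF_PATTERN = re.compile(r"(?<![\d\s])(\d{1,2})(?!\d)\s*")

def split_refs(paragraph):
    """Return (text, is_reference) pairs; references are the captured digits."""
    pieces = REF_PATTERN.split(paragraph)
    result = []
    # re.split puts captured groups at odd indexes and plain text at even ones.
    for i, piece in enumerate(pieces):
        if piece:  # skip empty strings produced at the edges
            result.append((piece, i % 2 == 1))
    return result

p = ("This is an example sentence.1 The number is a reference note that "
     "should be superscripted and can be one or two digits long.")
print(split_refs(p))
print(split_refs("Sentence.100 More text."))  # three digits: not split
```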
So the JSON is like:
"foo": {
"points": 23.67
},
I'd like a regex to just match 23.67.
I've tried \"foo\":{\"points\":([^}"]*) but it doesn't work.
There are multiple lines which contain "points": so just \"points\":([^}"]*) won't work.
You are ignoring whitespace.
Try this instead:
\"foo\":\s*{\s*\"points\":\s*(\d+(?:\.\d+)?)\s*}
Your solution does not take into account a few details:
Between "foo": and { there can be spaces.
After { there can be a newline and spaces.
After "points": there can also be spaces.
Between the string to capture (the capturing group) and the terminating } there can also be a newline and spaces.
So, including the above missed details, and taking into account that \s also matches a newline, the whole regex can be as follows:
\"foo\":\s*{\s*\"points\":\s*([^}"]*)\s*}
Actually, your capturing group can be more restrictive. As the text to capture contains only digits and a dot, it can be written as [\d\.]+. Note that I changed * to +, because the content cannot be empty.
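The restrictive group matters for more than style: [^}"]* also matches whitespace, so with the loose pattern the greedy group swallows the newline before the closing } and the trailing \s* matches nothing. This Python sketch (with a hypothetical sample modeled on the question's JSON) compares the two capturing groups.

```python
import re

# Hypothetical sample: two objects that both contain "points".
SAMPLE = '"foo": {\n    "points": 23.67\n},\n"bar": {\n    "points": 5\n},\n'

# Loose capture: [^}"]* also consumes the newline before the closing }.
LOOSE = re.compile(r'"foo":\s*\{\s*"points":\s*([^}"]*)\s*\}')
# Restrictive capture: one or more digits or dots only.
STRICT = re.compile(r'"foo":\s*\{\s*"points":\s*([\d.]+)\s*\}')

print(repr(LOOSE.search(SAMPLE).group(1)))   # '23.67\n'
print(repr(STRICT.search(SAMPLE).group(1)))  # '23.67'
```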
Why do I get 0 when running this expression?
SELECT 'Nr. 1700-902-8423. asdasdasd' REGEXP '1+[ ,.-/\]*7+[ ,.-/\]*0+[ ,.-/\]*0+[ ,.-/\]*9+[ ,.-/\]*0+[ ,.-/\]*2+[ ,.-/\]*8+[ ,.-/\]*4+[ ,.-/\]*2+[ ,.-/\]*3+';
I need to get true when the text contains the specified number (17009028423). There can be spaces or the symbols ,.-/\ between the digits.
For example, for the number 17009028423, I need to get true when the text contains:
1700-902-8423
17-00,902-84.23
170/09-0.28\423
1700..902 842-3
17,.009028 4//2\3
etc.
Thanks.
There are two problems with your regular expression. First, the single backslash in \] is consumed as an escape instead of being matched as a literal backslash, so you need to escape the backslash itself: \\]. Second, - between two characters inside [ and ] denotes a range (e.g. [a-zA-Z]), so in [ ,.-/\] the .-/ part is a range and - itself is not in the class. You need to escape it or put it at the end, like [a-zA-Z-] (as @tenub said). Finally, because MySQL string literals also use \ as an escape character, every regex backslash must be doubled again, which gives:
SELECT 'Nr. 1700-902-8423. asdasdasd' REGEXP '1[ ,./\\\\-]*7[ ,./\\\\-]*0[ ,./\\\\-]*0[ ,./\\\\-]*9[ ,./\\\\-]*0[ ,./\\\\-]*2[ ,./\\\\-]*8[ ,./\\\\-]*4[ ,./\\\\-]*2[ ,./\\\\-]*3'
You can check for yourself.
I also removed the + signs, in case you want each digit to be matched only once.
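Here is a rough Python sketch of the same idea, building the pattern from the digit string instead of writing it out by hand. Python's re is not MySQL's regex engine, and a Python raw string needs only one level of backslash escaping (the SQL literal needs two), so treat this as an illustration of the character class rather than a drop-in query. The helper names are hypothetical.

```python
import re

# Allowed separators: space , . / \ and a literal - placed last so it is not a range.
SEPARATORS = r"[ ,./\\-]*"

def number_pattern(number):
    # Allow any run of separators between consecutive digits.
    return re.compile(SEPARATORS.join(re.escape(d) for d in number))

def contains_number(text, number):
    return number_pattern(number).search(text) is not None

samples = ["1700-902-8423", "17-00,902-84.23", "170/09-0.28\\423",
           "1700..902 842-3", "17,.009028 4//2\\3",
           "Nr. 1700-902-8423. asdasdasd"]
print([contains_number(s, "17009028423") for s in samples])
```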