Regex: Is it possible to do a substitution within a capture group?

I have this one line JSON text:
{"schemaText":{"fields":[{"name":"AX_SND_TYPE","type":"string"},{"name":"BWORK","type":"int"}],"name":"XXXSchema","type":"record"},"description":"Autogenerated by NiFi"}
As can be seen, there is a property called "schemaText" that contains an object. I want to convert it to a string, so the only thing I need to do is add quotes at the beginning and end of the property value and escape the quotes inside.
Using the regular expression below (note that my regex knowledge is really low), I am able to do the first step:
({"schemaText":)(\{"fields":\[.*)(,"description.*)
Using the substitution
$1"$2"$3
gives the result:
{"schemaText":"{"fields":[{"name":"AX_SND_TYPE","type":"string"},{"name":"BWORK","type":"int"}],"name":"XXXSchema","type":"record"}","description":"Autogenerated by NiFi"}
What still remains is to escape the quotes to get this:
{"schemaText":"{\"fields\":[{\"name\":\"AX_SND_TYPE\",\"type\":\"string\"},{\"name\":\"BWORK\",\"type\":\"int\"}],"name":"XXXSchema","type":"record"}","description":"Autogenerated by NiFi"}
That is, to have valid JSON.
The question is: is there a way to escape the quotes inside $2 capture group in the same regular expression?
Thanks in advance.

The answer to your question is no, it's not possible. You're really trying to do two different, unrelated substitutions in a single regular expression. This is a feature that no regular expression engine supports.
Think about it: your first requirement is for the engine to perform a substitution on the whole text (adding the quotes). Then, for your second requirement, the engine would have to somehow go back and perform more substitutions on text that may or may not have already changed. That is, it would need to perform a new match on the already substituted text, which, depending on what the first substitution did, may not even exist anymore!
If, as you say, you already have an approach that works, keep that. A single regular expression is simply not a good fit for what you are trying to do.

I'd recommend tackling this problem using code, e.g. with vanilla JavaScript:
// Parse the one-line JSON, then re-serialize only the schemaText value so it becomes a string.
const json = '{"schemaText":{"fields":[{"name":"AX_SND_TYPE","type":"string"},{"name":"BWORK","type":"int"}],"name":"XXXSchema","type":"record"},"description":"Autogenerated by NiFi"}';
const obj = JSON.parse(json);
const schemaTextAsString = JSON.stringify(obj.schemaText);
obj.schemaText = schemaTextAsString;
const result = JSON.stringify(obj);
console.log(result);
Note that in your desired output you were not escaping the quotes in schemaText's name field, but this code does.
Finally whenever I use regular expressions I always think of this classic article "Regular Expressions: Now You Have Two Problems"!

Just for your information, you can actually match at every position where a substitution should occur, using an expression such as the following:
/({"schemaText":)|}(,"description")(.*)|([^"]*)"/g
The only issue, as others have mentioned, is that you want to do more than match; you want to perform a "conditional replacement" because there does not exist a single catch-all substitution that will cover all 3 cases you're dealing with (insert starting ", insert \ before quotes, and insert ending ").
You can in fact accomplish this with a single replace() call:
var test = "{\"schemaText\":{\"fields\":[{\"name\":\"AX_SND_TYPE\",\"type\":\"string\"},{\"name\":\"BWORK\",\"type\":\"int\"}],\"name\":\"XXXSchema\",\"type\":\"record\"},\"description\":\"Autogenerated by NiFi\"}";
window.alert(test.replace(/({"schemaText":)|}(,"description")(.*)|([^"]*)"/g, function(a,b,c,d,e){ return (b=="{\"schemaText\":"?b+"\"":(c==",\"description\""?"}\""+c+d:e+"\\\"")) }));
So it's technically "the same regex", but the substitution parameter uses an inline function as replacement rather than a static string.
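A more readable variant of the same idea, offered as a minimal sketch rather than a definitive solution: it reuses the three-group pattern from the question and lets the replacement callback escape the second group. Using JSON.stringify to quote and escape that group is an assumption of this sketch (it does not appear in the answers above), and the names input, pattern and output are illustrative.

```javascript
// The one-line JSON from the question.
const input = '{"schemaText":{"fields":[{"name":"AX_SND_TYPE","type":"string"},{"name":"BWORK","type":"int"}],"name":"XXXSchema","type":"record"},"description":"Autogenerated by NiFi"}';

// Same three capture groups as in the question.
const pattern = /(\{"schemaText":)(\{"fields":\[.*)(,"description.*)/;

// JSON.stringify on a string adds the surrounding quotes and escapes the inner ones,
// so the callback only has to glue the three groups back together.
const output = input.replace(pattern, (match, head, body, tail) => head + JSON.stringify(body) + tail);

console.log(output);
```

Because the callback's return value is inserted literally, no second pass over the text is needed. The pattern is still tied to this exact document shape, so parsing the JSON remains the more robust route.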

Related

Alternative ways to extract the contents of a JSON string

Consider the following query:
select '"{\"foo\":\"bar\"}"'::json;
This will return a single record of a single element containing a JSON string. See:
test=# select json_typeof('"{\"foo\":\"bar\"}"'::json);
json_typeof
-------------
string
(1 row)
It is possible to extract the contents of the string as follows:
test=# select ('"{\"foo\":\"bar\"}"'::json) #>>'{}';
json
---------------
{"foo":"bar"}
(1 row)
From this point onward, the result can be cast as a JSON object:
test=# select json_typeof((('"{\"foo\":\"bar\"}"'::json) #>>'{}')::json);
json_typeof
-------------
object
(1 row)
This way seems magical.
I define no path within the extraction operator, yet what is returned is not what I passed. This seems like passing no index to an array accessor, and getting an element back.
I worry that I will confuse the next maintainer to look at this logic.
Is there a less magical way to do this?
But you did define a path. Defining "root" as path is just another path. And that's just what the #>> operator is for:
Extracts JSON sub-object at the specified path as text.
Rendering as text effectively applies the escape characters in the string. When casting back to json the special meaning of double-quotes (not escaped any more) kicks in. Nothing magic there. No better way to do it.
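The same two-step unwrapping can be illustrated outside the database. The following is only an analogy in JavaScript, assuming JSON.parse plays the role of "extract as text, then cast back to json"; it is not a description of how Postgres implements #>>, and the variable names are illustrative.

```javascript
// A JSON document whose top-level value is a string that itself contains JSON text.
// (Backslashes are doubled here only because this is JavaScript source.)
const doubleEncoded = '"{\\"foo\\":\\"bar\\"}"';

// First parse: the JSON string is decoded, which applies the \" escapes.
const asText = JSON.parse(doubleEncoded);

// Second parse: the decoded text is itself JSON, so it now yields an object.
const asObject = JSON.parse(asText);

console.log(typeof asText);   // string
console.log(asText);          // {"foo":"bar"}
console.log(typeof asObject); // object
console.log(asObject.foo);    // bar
```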
If you expect it to confuse whoever maintains the code after you, add comments explaining what you are doing there.
Maybe, in the spirit of clarity, you might use the equivalent function json_extract_path_text() instead. The manual:
Extracts JSON sub-object at the specified path as text. (This is functionally equivalent to the #>> operator.)
Now, the function has a VARIADIC parameter, and you typically enter path elements one-by-one, like the example in the manual demonstrates:
json_extract_path_text('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}',
'f4', 'f6') → foo
You cannot enter the "root" path this way. But (what the manual does not add at this point) you can alternatively provide an actual array after adding the keyword VARIADIC. See:
Pass multiple values in single parameter
So this does the trick:
SELECT json_extract_path_text('"{\"foo\":\"bar\"}"'::json, VARIADIC '{}')::json;
And since we are all about being explicit, use the verbose SQL standard cast syntax:
SELECT cast(json_extract_path_text('"{\"foo\":\"bar\"}"'::json, VARIADIC '{}') AS json)
Any clearer yet? (I would personally prefer your shorter original, but I may be biased, being a "native speaker" of Postgres.)
The question is, why do you have that odd JSON literal including escapes as JSON string in the first place?

Is it good to always use string template literals with ES2015+/ES6+? [duplicate]

Is there a reason (performance or other) not to use backtick template literal syntax for all strings in a javascript source file? If so, what?
Should I prefer this:
var str1 = 'this is a string';
over this?
var str2 = `this is another string`;
Code-wise, there is no specific disadvantage. JS engines are smart enough to not have performance differences between a string literal and a template literal without variables.
In fact, I might even argue that it is good to always use template literals:
You can already use single quotes or double quotes to make strings. Choosing which one is largely arbitrary, and you just stick with one. However, it is encouraged to use the other quote if your string contains your chosen string marker, i.e. if you chose ', you would still do "don't argue" instead of 'don\'t argue'. However, backticks are very rare in normal language and strings, so you would actually more rarely have to either use another string literal syntax or use escape codes, which is good.
For example, you'd be forced to use escape sequences to have the string she said: "Don't do this!" with either double or single quotes, but you wouldn't have to when using backticks.
You don't have to convert if you want to use a variable in the string in the future.
However, those are very weak advantages. But still more than none, so I would mainly use template literals.
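As a small sketch of the quoting point, here is the example sentence written with each of the three literal styles. The variable names are illustrative only.

```javascript
// Double quotes: the inner double quotes must be escaped.
const viaDouble = "she said: \"Don't do this!\"";

// Single quotes: the apostrophe must be escaped.
const viaSingle = 'she said: "Don\'t do this!"';

// Backticks: neither kind of quote needs escaping.
const viaBacktick = `she said: "Don't do this!"`;

// Adding a variable later needs no change of delimiter.
const who = 'she';
const interpolated = `${who} said: "Don't do this!"`;

console.log(viaDouble === viaSingle && viaSingle === viaBacktick); // true
console.log(interpolated === viaBacktick);                         // true
```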
A real but in my opinion ignorable objection is the one of having to support environments where template literals are not supported. If you have those, you would know and wouldn't be asking this question.
The most significant reason not to use them is that ES6 is not supported in all environments.
Of course that might not affect you at all, but still: YAGNI. Don't use template literals unless you need interpolation, multiline literals, or unescaped quotes and apostrophes. Much of the arguments from When to use double or single quotes in JavaScript? carry over as well. As always, keep your code base consistent and use only one string literal style where you don't need a special one.
Always use template literals. In this case YAGNI is not correct. You absolutely will need it. At some point, you will have to add a variable or new line to your string, at which point you will either need to change single quotes to backticks, or use the dreaded '+'.
Be careful when the values are for external use. We work with Tealium for marketing analysis, and it currently does not support ES6 template literals. Event data containing template literals aka string templates will cause the Tealium script to error.
I'm fairly convinced by other answers that there's no serious downside to using them exclusively, but one additional counterpoint is that template strings are also used in advanced "tagged template" syntax, and as illustrated in this Reddit comment, if you try to rely exclusively on JavaScript's automatic semicolon insertion or just forget to include a semicolon, you can run into parsing issues with statements that begin with a template string.
// OK (single (or double) quotes)
logger = console.log
'123'.split('').forEach(logger)
// OK (semicolon)
logger = console.log;
`123`.split('').forEach(logger)
// Not OK
logger = console.log
`123`.split('').forEach(logger) // Error

MySQL Escape characters

I'm having a really hard time figuring out how to replace a special character with another in SQL (MySQL syntax). I've already tried with REPLACE function without success. What I would like to do is:
From this string:
"C:\foo\bar\file.txt"
Obtain this string:
"C:\\foo\\bar\\file.txt"
As I thought, this is an XY problem. MySQL does not require anything from the path. What it does require is that its input be syntactically valid. In input, a string literal interprets a backslash followed by another character as an escape sequence, which changes how the next character is read. Since backslash is itself such a special character, it can be escaped to remove its special significance: one writes \\ to get a string with a single backslash.
What this means is, if you write 'C:\\foo\\bar\\file.txt' in an SQL command, MySQL will understand it as the string 'C:\foo\bar\file.txt' (like in my comment under your question). If you write 'C:\foo\bar\file.txt', MySQL will treat each backslash as the start of an escape sequence: \f is not a recognized sequence, so the backslash is simply dropped and the f is kept, while \b is the escape sequence for a backspace character. Either way the backslashes are gone, and the string it ends up with is roughly 'C:foobarfile.txt' (with a backspace character in place of the b), not the path you meant.
Once the string is inside MySQL, it is correct, no replacements are necessary. Thus, you cannot use MySQL's REPLACE to prepare the string for input to MySQL - it is way too late for this. It is kind of like punching the baby in the stomach to pre-chew its food after it has already eaten it, it doesn't work that way and it hurts the baby.
Rather than that, use the language that you use to interface with the database (you didn't tag it, so I can't give you the details) to properly handle the strings. Many languages have functions that will correctly escape strings for you for use by MySQL. Even better, learn about prepared statements and parametrised queries, which completely remove the need for explicit escaping.
The best reference on parametrised queries I can recommend, with remedies for multiple languages, is the Bobby Tables site.
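To make the "do it before the string reaches MySQL" point concrete, here is a minimal sketch in JavaScript. The helper escapeBackslashes is hypothetical and only doubles backslashes; it is not a complete escaping routine and not a substitute for your driver's own escaping or for parametrised queries.

```javascript
// Hypothetical helper: doubles each backslash so the text can be written inside
// a MySQL string literal. It does NOT handle quotes or other special characters.
function escapeBackslashes(text) {
  return text.replace(/\\/g, '\\\\');
}

// Backslashes are doubled in JavaScript source too, so this value is C:\foo\bar\file.txt.
const windowsPath = 'C:\\foo\\bar\\file.txt';
const literalBody = escapeBackslashes(windowsPath);

console.log(windowsPath); // C:\foo\bar\file.txt
console.log(literalBody); // C:\\foo\\bar\\file.txt
```

With a parametrised query the original windowsPath value would be passed as a bound parameter and no helper like this would be needed.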
REPLACE function should do the job for you - https://dev.mysql.com/doc/refman/8.0/en/replace.html.
How are you passing the string into REPLACE function?

Regex for matching with and without quotes for dynamic JSON

I have the following text strings:
"Name":"John"}]
"Age":36
"Address":"ABC,PQR234[]/.,#ANYCHARACTERS"
"Gender":null
I need to get two groups (key value pair) from this such that the output would be only:
Key|Value
Name|John
Age|36
Address|ABC,PQR234[]/.,#ANYCHARACTERS
The requirement is to have a single regex to grab everything in the double quotes if the double quotes are present. If not, take the value without the quotes.
In our example above, 36 and null are the one without the quotes and they need to be captured as well.
I have tried a lot but have failed to do so.
UPDATE:
I don't know why I am getting down votes for this question. Yes this is JSON that I am trying to parse but there is a reason behind why I am doing this and not using any document parser.
I am supposed to use Talend for getting a dynamic JSON converted into Key Value Pair. What I mean by dynamic is the fields of the JSON can vary and hence I do not have a fixed schema and hence cannot use a document parser (which demands a fixed structure of JSON). I am devising a solution to get around this using Normalizer (on comma) and then extracting the key value pair which will be in double quotes using Regular Expressions. I tried many things on my own and since I am not an expert in Regular expressions, I have come here to get inputs.
If you know any better solution to this, I would be very happy to get your inputs.
How about this?
/"?([^\n"]*)"?:"?([^\n"]*)"?/
Explained in detail at:
https://regex101.com/r/UM0rl2/1/
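As a quick check, the pattern above can be run against the sample lines from the question. This is a sketch in JavaScript rather than Talend, and the names pairPattern, lines and pairs are illustrative.

```javascript
// The suggested pattern: optional quotes around the key and around the value.
const pairPattern = /"?([^\n"]*)"?:"?([^\n"]*)"?/;

// Sample fragments from the question, one per line after normalizing on commas.
const lines = [
  '"Name":"John"}]',
  '"Age":36',
  '"Address":"ABC,PQR234[]/.,#ANYCHARACTERS"',
  '"Gender":null',
];

// Collect [key, value] pairs, skipping lines that do not match.
const pairs = [];
for (const line of lines) {
  const match = pairPattern.exec(line);
  if (match) {
    pairs.push([match[1], match[2]]);
  }
}

for (const [key, value] of pairs) {
  console.log(key + '|' + value);
}
```

Note that an unquoted null comes back as the text null, so it has to be filtered out separately if it should not appear in the output.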

How can I populate a query string variable to a text box which contains &,\ and $ in it

I have a variable like say A= drug & medicare $12/$15.
I need to assign it to a text box, but only 'drug' is posted to the server. The rest of the data gets truncated.
this.textbox.text= request.querystring["A"].tostring();
The following is not valid for a="foo&bar$12":
http://example.com?a=foo&bar$12
The & symbol is a reserved character; it separates query string variables. You will need to percent-encode a value before sending it to that page.
Also & is a reserved character in HTML/XML. I suggest reading up on percent encoding and html encoding.
I believe you have problems with HTML entities. You need to read up on HTML escaping in your tool of choice. & cannot stand in HTML, since it begins an entity sequence - it needs to be replaced with &amp;. Without specifying at least which toolchain you're using (as per @Richard's comment), we can't really suggest the best way to do it.
EDIT: Now that I reread your question, it seems A is not a variable but a query parameter :) Reading comprehension fail. Anyway, in this case a similar problem exists: & is not a valid character for a query parameter, and it needs URL escaping. Again, how exactly to do it is in the documentation for your toolchain, but in essence & will need to be replaced by %26. Plus sign is also not permitted (or rather it has another meaning); others are tolerated (but there are nicer ways to write them).
That looks more or less like ASP.NET pseudocode, so I'm going to diagnose your problem as the query string needing to be URL encoded. Key/value pairs in the query string are separated by an ampersand (&), and ASP.NET (along with other web platforms) automatically parses out the key/value pairs for you.
In this case, the ampersand terminates the value of the "A=..." key/value pair. The problem will be solved if you can URL encode the link that brings the user into your page. If actually using ASP.NET, you can use the HttpUtility.UrlEncode() method for that:
string myValue = Server.UrlEncode("drug & medicare $12/$15");
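You'll end up with a query string along these lines instead: A=drug%20%26%20medicare%20%2412%2F%2415 (UrlEncode itself writes spaces as + rather than %20; both decode to a space in a query string).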
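The truncation and the fix can both be reproduced in a few lines. This is a sketch in JavaScript for illustration, not the ASP.NET code itself; it assumes the standard encodeURIComponent and URLSearchParams built-ins, and the variable names are illustrative.

```javascript
const value = 'drug & medicare $12/$15';

// Unencoded: the parser splits on "&", so the value of A is cut short.
const broken = new URLSearchParams('A=' + value).get('A');

// Percent-encoded: the whole value survives the round trip.
const encoded = encodeURIComponent(value);
const restored = new URLSearchParams('A=' + encoded).get('A');

console.log(JSON.stringify(broken)); // "drug "
console.log(encoded);                // drug%20%26%20medicare%20%2412%2F%2415
console.log(restored === value);     // true
```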