Is JSON safe to use as a command line argument or does it need to be sanitized first? - json

Is the following dangerous?
$ myscript '<somejsoncreatedfromuserdata>'
If so, what can I do to make it not dangerous?
I realize that this can depend on the shell, OS, utility used for making system calls (if being done inside a programming language), etc. However, I'd just like to know what kind of things I should watch out for.

Yes. That is dangerous.
JSON can include single quotes in string values (they do not need to be escaped). See "the tracks" at json.org.
Imagine the data is:
{"pwned": "you' & kill world;"}
Happy coding.
I would consider piping the data in to the program in question (e.g. use "popen" or even a version of "exec" that passes arguments directly) -- this can avoid issues that result from passing through the shell, for instance. Just as with SQL: using placeholders eliminates the need to trifle with "escaping".
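As a rough sketch of that advice in Python (assuming Python 3, with the interpreter itself standing in for myscript), the JSON can be handed over as one element of an argument list or on standard input, so that no shell ever parses it:

```python
import json
import subprocess
import sys

# Hostile-looking user data; json.dumps leaves the single quote unescaped.
payload = json.dumps({"pwned": "you' & kill world;"})

# Stand-in for "myscript" (an assumption for this sketch): a child Python
# process that echoes its first argument, or its stdin when it has none.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read())",
]

# 1. Argument list, no shell: the payload arrives as a single argv entry.
via_argv = subprocess.run(
    child + [payload], capture_output=True, text=True, check=True
).stdout

# 2. Standard input: the payload never appears on a command line at all.
via_stdin = subprocess.run(
    child, input=payload, capture_output=True, text=True, check=True
).stdout

print(via_argv == payload, via_stdin == payload)
```

Neither call uses shell=True, which is what keeps the quote, the ampersand and the semicolon inert.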
If passing through a shell is the only way, then this may be an option (it is not tested, but something similar holds for a "<script>" context):
For every character in the JSON that is either outside the ASCII range "space" to "~" or has a special meaning in the '' context of the shell, such as \ and ', encode the character using the \uXXXX JSON form. Exclude " and any other character (such as digits) that can appear outside of "string" data; that exclusion is a limitation of this trivial approach. (Given these limitations, this should only encode potentially harmful characters appearing within the "strings" in the JSON, and the result should contain no \\ pairs, no trailing \, no 's, etc.)
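A minimal Python sketch of the \uXXXX idea, assuming the JSON is produced by json.dumps (the function name is made up for illustration). It departs from the description above in one respect: it leaves backslashes alone, because a POSIX shell treats them literally inside single quotes and re-encoding them would change escapes such as \n that are already in the JSON.

```python
import json

def shell_single_quote_safe_json(value):
    """Serialize value to JSON containing no single quote and no non-ASCII."""
    # ensure_ascii=True (the default) writes every character outside the
    # range space..~ as an escape sequence.
    text = json.dumps(value, ensure_ascii=True)
    # In valid JSON a single quote can only occur inside a string, so
    # swapping it for its \u0027 escape does not change the data.
    return text.replace("'", "\\u0027")

original = {"pwned": "you' & kill world;", "name": "caf\u00e9"}
encoded = shell_single_quote_safe_json(original)
print(encoded)
```

The result can be wrapped in single quotes as it stands, and any JSON parser on the receiving side turns \u0027 back into a quote.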

It's ok. Just escape the character you use to wrap the string:
' should become '\''
So the JSON string
{"pwned": "you' & kill world;"}
becomes
{"pwned": "you'\'' & kill world;"}
and your final command, as the shell sees it, will be:
$ myscript '{"pwned": "you'\'' & kill world;"}'
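For illustration, the same replacement written in Python (single_quote is a hypothetical helper name), with shlex.split used as a quick round-trip check because it follows POSIX shell quoting rules:

```python
import shlex

def single_quote(arg):
    """Wrap arg in single quotes for a POSIX shell, turning each ' into '\\''."""
    return "'" + arg.replace("'", "'\\''") + "'"

payload = '{"pwned": "you\' & kill world;"}'
quoted = single_quote(payload)

# Parse the command line back into words the way a POSIX shell would.
round_trip = shlex.split("myscript " + quoted)
print(quoted)
print(round_trip)
```

The standard library's shlex.quote does the same job with a slightly different but equivalent quoting, so in practice it is probably the safer thing to call.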

Related

MySQL search and replace specific character combinations

NOTE: This is hilarious - I've had to update this post multiple times because it doesn't properly display combinations of the \ character :)
I need to be able to search and replace a specific set of characters without compromising specific allowed combinations or the redundancy of those characters. Let's take a core escape character \ for example.
A string that needs processing may look like this:
"This is \ a test! \\r\\n Let's have \r \n \\\ Some fun!"
Now I need to address each \ for translation to JSON (each \ needs to be \\\\), but at the same time, I don't want to touch the \r or \n. I also want to be able to differentiate between \r and \n vs \\r and \\n. Is there a method using REGEXP I could adopt that would take the above and convert it to:
"This is \\\\ a test! \\r\\n Let's have \\r \\n \\\\\\\\\\\\ Some fun!"
Note I want each escape \ to be 4x \, and also to ensure special escaped characters are double escaped, but not compound them (the 4x only needs to be applied to stand alone \'s).
What it comes down to is I can't control the data that's coming in, but I can control scrubbing it. I could get weird data with \/////\ by somebody who was just having fun. I need to be able to scrub that as a TEXT value and prepare it for insertion into the database via a dynamically created SQL statement that's executed, which means a single \ needs to be \\\\ and the / is ignored (for example).
I'm thinking I first need to do a specific scan of special escape sequences (such as \', \b, \\, \r, etc.) but at the same time verify they aren't already double escaped. I also need to ensure the additional scans of \ don't meet any special standards (and are just escaped on their own).
I'm hoping somebody has already dealt with this and there's an existing function or SP designed to do this sort of thing so I'm not reinventing the wheel.
Thanks!
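The scan described above can be sketched with a single regular expression. This is in Python rather than MySQL, and it assumes the sample strings display as intended and that the "special" escapes are limited to the letters listed in ESCAPE_LETTERS (both are assumptions, as are all the names):

```python
import re

# One alternative per case, tried left to right at each backslash:
#   1. two backslashes + escape letter  (already double-escaped: keep)
#   2. one backslash + escape letter    (special escape: double it)
#   3. any other backslash              (stand-alone: make it four)
ESCAPE_LETTERS = "rnbtf'"  # assumed set of special escapes
TOKEN = re.compile(r"\\\\[%s]|\\[%s]|\\" % (ESCAPE_LETTERS, ESCAPE_LETTERS))

def scrub(text):
    def replace(match):
        token = match.group(0)
        if len(token) == 3:   # case 1
            return token
        if len(token) == 2:   # case 2
            return "\\" + token
        return "\\" * 4       # case 3
    return TOKEN.sub(replace, text)

sample = r"This is \ a test! \\r\\n Let's have \r \n \\\ Some fun!"
scrubbed = scrub(sample)
print(scrubbed)
```

A sequence such as \\r is inherently ambiguous (it could also be an escaped backslash followed by the letter r); the sketch always reads it as already double-escaped. If the final goal is only to get the text into the database intact, a parameterized statement avoids the scrubbing altogether.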

Tcl quoting proc to sanitise string to pass to other shells

I want to pass a dict value to another shell (in my application it passes through a few 'shell' levels), and the dict contains characters (space, double quotes, etc) that cause issues.
I can use something like ::base64::encode -wrapchar $dict and the corresponding ::base64::decode $str and it works as expected but the result is, of course, pretty much unreadable.
However, for debugging & presentation reasons I would prefer an encoded/sanitised string that resembled the original dict value inasmuch as reasonable and used a character set that avoids spaces, quotes, etc.
So, I am looking for something like ::base64 mapping procs but with a lighter touch.
Any suggestions would be appreciated.
You can make lighter-touch quoting schemes using either string map or regsub to do the main work.
Here's an example of string map:
set input "O'Donnell's Bait Shop"
set quoted '[string map {' {'\''}} $input]' ; #'# This comment just because of stupid Stack Overflow syntax highlighter
puts $quoted
# ==> 'O'\''Donnell'\''s Bait Shop'
Here's an example of regsub:
set input "This uses a hypothetical quoting of some letters"
set quoted <[regsub -all {[pqr]} $input {«&»}]>
puts $quoted
# ==> <This uses a hy«p»othetical «q»uoting of some lette«r»s>
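One ready-made lighter-touch scheme is percent-encoding. A small sketch in Python rather than Tcl (a swapped-in technique, not something from the examples above) shows how much of the original text stays readable:

```python
from urllib.parse import quote, unquote

original = "O'Donnell's Bait Shop"

# safe="" means that only letters, digits and the characters _ . - ~
# pass through unchanged; everything else becomes %XX.
encoded = quote(original, safe="")
decoded = unquote(encoded)

print(encoded)  # O%27Donnell%27s%20Bait%20Shop
```

The encoded form contains no spaces or quotes, yet the words are still recognizable when debugging.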
You'll need to decide what sort of quoting you really want to use. For myself, if I was going through several shells, I'd be wanting to avoid quoting at all (because it is difficult to get right) and instead find ways to send the data in some other way, perhaps over a pipeline or in a temporary file. At a pinch, I'd use an environment variable, as shells tend to not mess around with those nearly as much as arguments.
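The environment-variable route might look like this in Python, assuming a POSIX sh is available; PAYLOAD is a made-up variable name:

```python
import os
import subprocess

payload = "O'Donnell's \"Bait\" Shop; echo $HOME \\n"

# The command string is fixed and contains no user data. The shell only
# expands "$PAYLOAD" inside double quotes, so the value is not re-parsed.
result = subprocess.run(
    ["sh", "-c", 'printf "%s" "$PAYLOAD"'],
    env={**os.environ, "PAYLOAD": payload},
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout == payload)
```

Because exported variables are inherited by child processes, the same value remains available to further shell levels without any extra quoting.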

Escape single and double quote in TCL

I am using the following script, but it throws an error message:
tcl;
eval {
add command "Audit Param"\
setting "Error : Part's and Spec's desc contains \"OBS\" or \"REPLACE\"" "(Reference No)"\
user all;
}
It shows the error: Expected word got 'and'.
I tried Part\'s, but that still does not work. How can I escape both single and double quotes when the string contains both?
Single quote and Tcl
In Tcl itself, the single quote character (') has no special meaning at all. It's just an ordinary character like comma (,) or period (.). (Well, except commas have special meaning in expressions and periods are used in floating point values and Tk widget names. Single quote has no meaning at all by comparison.)
With what you have written, any special meaning (and hence any need to quote) is limited to the add command.
Complex quoting situations are often resolved in Tcl by using a different quoting strategy. In particular, putting things in braces disables all substitutions (except backslash-newline-whitespace collapsing). This lets me write the equivalent to what you've written as:
add command "Audit Param" \
setting {Error : Part's and Spec's desc contains "OBS" or "REPLACE"} \
"(Reference No)" user all
Any complaint here is coming from inside that code and is not in the code as written per se. (The eval { ... } adds nothing. Nor does it incur a penalty other than making your code slightly harder to read.)
The real problem
At a very loose guess, that problem string is being used inside an SQL statement with direct string substitution instead of prepared parameters; that could produce that sort of error message. Check the contents of the global errorInfo variable after the failure happens to get a stack trace that can help pin down what went wrong; that might help you see where inside things the code is failing. If it is a piece of naughty SQL, there is code to fix because you've got something that is vulnerable to SQL injection problems (which might or might not be a security problem, depending on the exposure of that command). And if that's the case, doubling up each single quote (changing ' to '') ought to work around the problem in the short run.
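To illustrate what "prepared parameters" buys you, here is a small sketch using Python's sqlite3 module rather than whatever database sits behind the add command (the table and column names are invented):

```python
import sqlite3

message = 'Error : Part\'s and Spec\'s desc contains "OBS" or "REPLACE"'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_param (setting TEXT)")  # hypothetical table

# The ? placeholder sends the value separately from the SQL text,
# so the quotes inside it are just data and need no escaping.
conn.execute("INSERT INTO audit_param (setting) VALUES (?)", (message,))
stored = conn.execute("SELECT setting FROM audit_param").fetchone()[0]
conn.close()

print(stored == message)
```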

Are multi-line strings allowed in JSON?

Is it possible to have multi-line strings in JSON?
It's mostly for visual comfort so I suppose I can just turn word wrap on in my editor, but I'm just kinda curious.
I'm writing some data files in JSON format and would like to have some really long string values split over multiple lines. Using Python's json module I get a whole lot of errors, whether I use \ or \n as an escape.
JSON does not allow real line-breaks. You need to replace all the line breaks with \n.
eg:
"first line
second line"
can be saved with:
"first line\nsecond line"
Note:
For Python, this should be written as:
"first line\\nsecond line"
where \\ escapes the backslash; otherwise Python will treat \n as the control character "new line".
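When the JSON is generated from Python rather than typed by hand, json.dumps does this replacement itself. A minimal sketch:

```python
import json

text = "first line\nsecond line"   # a real newline character in memory
encoded = json.dumps(text)          # JSON text: newline written as backslash + n
decoded = json.loads(encoded)       # back to the original string

print(encoded)
```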
Unfortunately many of the answers here address the question of how to put a newline character in the string data. The question is how to make the code look nicer by splitting the string value across multiple lines of code. (And even the answers that recognize this provide "solutions" that assume one is free to change the data representation, which in many cases one is not.)
And the worse news is, there is no good answer.
In many programming languages, even if they don't explicitly support splitting strings across lines, you can still use string concatenation to get the desired effect; and as long as the compiler isn't awful this is fine.
But JSON is not a programming language; it's just a data representation. You can't tell it to concatenate strings. Nor does its (fairly small) grammar include any facility for representing a string on multiple lines.
Short of devising a pre-processor of some kind (and I, for one, don't feel like effectively making up my own language to solve this issue), there isn't a general solution to this problem. If you can change the data format, then you can substitute an array of strings. Otherwise, this is one of the numerous ways that JSON isn't designed for human-readability.
I have had to do this for a small Node.js project and found this work-around: store multiline strings as an array of lines to make them more human-readable (at the cost of extra code to convert them to a string later):
{
"modify_head": [
"<script type='text/javascript'>",
"<!--",
" function drawSomeText(id) {",
" var pjs = Processing.getInstanceById(id);",
" var text = document.getElementById('inputtext').value;",
" pjs.drawText(text);}",
"-->",
"</script>"
],
"modify_body": [
"<input type='text' id='inputtext'></input>",
"<button onclick=drawSomeText('ExampleCanvas')></button>"
]
}
Once parsed, I just use myData.modify_head.join('\n') or myData.modify_head.join(''), depending upon whether I want a line break after each string or not. (Called with no argument, join() would insert commas between the lines.)
This looks quite neat to me, apart from having to use double quotes everywhere. Alternatively I could, perhaps, use YAML, but that has other pitfalls and is not supported natively.
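The same array-of-lines convention is easy to consume from other languages. A sketch in Python, reusing part of the data above:

```python
import json

document = """
{
  "modify_body": [
    "<input type='text' id='inputtext'></input>",
    "<button onclick=drawSomeText('ExampleCanvas')></button>"
  ]
}
"""

data = json.loads(document)

# Join with a newline to get one line per array element, or with an
# empty string to glue the pieces together.
with_breaks = "\n".join(data["modify_body"])
without_breaks = "".join(data["modify_body"])

print(with_breaks)
```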
Check out the specification! The JSON grammar's char production can take the following values:
any-Unicode-character-except-"-or-\-or-control-character
\"
\\
\/
\b
\f
\n
\r
\t
\u four-hex-digits
Newlines are "control characters" so, no, you may not have a literal newline within your string. However you may encode it using whatever combination of \n and \r you require.
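Python's json module enforces this rule by default, which makes for a quick check. The strict=False switch shown at the end is a parser-specific leniency, not part of JSON itself:

```python
import json

raw_newline = '"first line\nsecond line"'       # contains an actual newline
escaped_newline = '"first line\\nsecond line"'   # contains backslash + n

try:
    json.loads(raw_newline)
    rejected = False
except json.JSONDecodeError:
    rejected = True   # literal control characters are refused

parsed = json.loads(escaped_newline)             # the escape is accepted
lenient = json.loads(raw_newline, strict=False)  # opt-in leniency

print(rejected, parsed == lenient)
```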
JSON doesn't allow breaking lines for readability.
Your best bet is to use an IDE that will line-wrap for you.
This is a really old question, but I came across this on a search and I think I know the source of your problem.
JSON does not allow "real" newlines in its data; it can only have escaped newlines. See the answer from #YOU. According to the question, it looks like you attempted to escape line breaks in Python two ways: by using the line continuation character ("\") or by using "\n" as an escape.
But keep in mind: if you are using a string in python, special escaped characters ("\t", "\n") are translated into REAL control characters! The "\n" will be replaced with the ASCII control character representing a newline character, which is precisely the character that is illegal in JSON. (As for the line continuation character, it simply takes the newline out.)
So what you need to do is prevent Python from interpreting the escape sequences. You can do this by using a raw string (put r in front of the string, as in r"abc\ndef") or by including an extra backslash in front of the escape ("abc\\ndef").
Instead of replacing "\n" with the real newline ASCII control character, both of the above leave "\n" as two literal characters, which JSON can then interpret as a newline escape.
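A short check that the two spellings really are the same text, and that the JSON parser is the one that finally produces the newline:

```python
import json

doubled = '"abc\\ndef"'   # backslash doubled in the Python literal
raw = r'"abc\ndef"'       # raw string: the backslash is kept as typed

same_text = (doubled == raw)   # both hold the characters \ and n
value = json.loads(raw)        # the JSON parser turns \n into a newline

print(same_text, repr(value))
```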
Write the property value as an array of strings, as in the example given at https://gun.io/blog/multi-line-strings-in-json/. This will help.
We can always use an array of strings for multiline strings, like the following.
{
"singleLine": "Some singleline String",
"multiline": ["Line one", "line Two", "Line Three"]
}
And we can easily iterate over the array to display the content in multi-line fashion.
While not standard, I found that some JSON libraries have options to support multiline strings. I say this with the caveat that it will hurt your interoperability.
However, in the specific scenario I ran into, I needed to make a config file that was only ever used by one system readable and manageable by humans, so I opted for this solution in the end.
Here is how this works out in Java with Jackson:
JsonMapper mapper = JsonMapper.builder()
    .enable(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS)
    .build();
This is a very old question, but I had the same question when I wanted to improve the readability of our Vega JSON specification code, which uses complex conditional expressions. The code is like this.
As this answer says, JSON is not designed for humans. I understand that this is a historical decision and that it makes sense for data exchange purposes. However, JSON is still used as source code in cases like this one, so I asked our engineers to write the source in Hjson and process it into JSON.
For example, in a Git for Windows environment, you can download the Hjson CLI binary and put it in the git/bin directory to use it.
Then convert (transpile) the Hjson source to JSON. An automation tool such as Make is useful for generating the JSON.
$ which hjson
/c/Program Files/git/bin/hjson
$ cat example.hjson
{
md:
'''
First line.
Second line.
This line is indented by two spaces.
'''
}
$ hjson -j example.hjson > example.json
$ cat example.json
{
"md": "First line.\nSecond line.\n This line is indented by two spaces."
}
In case of using the transformed JSON in programming languages, language-specific libraries like hjson-js will be useful.
I noticed the same idea was posted in a duplicated question but I would share a bit more information.
You can encode at client side and decode at server side. This will take care of \n and \t characters as well
e.g. I needed to send multiline xml through json
{
"xml": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiID8+CiAgPFN0cnVjdHVyZXM+CiAgICAgICA8aW5wdXRzPgogICAgICAgICAgICAgICAjIFRoaXMgcHJvZ3JhbSBhZGRzIHR3byBudW1iZXJzCgogICAgICAgICAgICAgICBudW0xID0gMS41CiAgICAgICAgICAgICAgIG51bTIgPSA2LjMKCiAgICAgICAgICAgICAgICMgQWRkIHR3byBudW1iZXJzCiAgICAgICAgICAgICAgIHN1bSA9IG51bTEgKyBudW0yCgogICAgICAgICAgICAgICAjIERpc3BsYXkgdGhlIHN1bQogICAgICAgICAgICAgICBwcmludCgnVGhlIHN1bSBvZiB7MH0gYW5kIHsxfSBpcyB7Mn0nLmZvcm1hdChudW0xLCBudW0yLCBzdW0pKQogICAgICAgPC9pbnB1dHM+CiAgPC9TdHJ1Y3R1cmVzPg=="
}
then decode it on server side
using System.Text; // for Encoding

public class XMLInput
{
    public string xml { get; set; }

    // Decode the Base64 payload back to the original XML text.
    public string DecodeBase64()
    {
        var valueBytes = System.Convert.FromBase64String(this.xml);
        return Encoding.UTF8.GetString(valueBytes);
    }
}

public async Task<string> PublishXMLAsync([FromBody] XMLInput xmlInput)
{
    string data = xmlInput.DecodeBase64();
    return data; // hand the decoded XML back to the caller
}
once decoded you'll get your original xml
<?xml version="1.0" encoding="utf-8" ?>
<Structures>
<inputs>
# This program adds two numbers
num1 = 1.5
num2 = 6.3
# Add two numbers
sum = num1 + num2
# Display the sum
print('The sum of {0} and {1} is {2}'.format(num1, num2, sum))
</inputs>
</Structures>
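The same encode/decode round trip, sketched in Python with a shortened XML document (the variable names are illustrative):

```python
import base64
import json

xml = (
    '<?xml version="1.0" encoding="utf-8" ?>\n'
    "<Structures>\n"
    "\t<inputs>num1 = 1.5</inputs>\n"
    "</Structures>"
)

# Client side: Base64-encode the UTF-8 bytes and place them in the JSON body.
body = json.dumps({"xml": base64.b64encode(xml.encode("utf-8")).decode("ascii")})

# Server side: parse the JSON, then decode the Base64 payload.
decoded = base64.b64decode(json.loads(body)["xml"]).decode("utf-8")

print(decoded == xml)
```

A JSON serializer would also escape the newlines and tabs on its own, so Base64 here is a choice (at the cost of readability and size) rather than a necessity.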
\n\r\n worked for me !!
\n for single line break and \n\r\n for double line break
Many of the answers here will not work in every case. If you just want to output what you wrote in a JSON file (for example, language translations where a single key should render as more than one line on the client), the easiest solution may be to put a marker of your choice that JSON allows, such as \\, before each new line and use some JS to split the text on it, like this:
Example:
File (text.json)
{"text": "some JSON text. \\ Next line of JSON text"}
import text from 'text.json'
{text.text.split('\\')
  .map((line, index) => {
    return (
      <div key={index}>
        {line}
        <br />
      </div>
    );
  })}
Assuming the question has to do with easily editing text files and then manually converting them to json, there are two solutions I found:
hjson (that was mentioned in this previous answer), in which case you can convert your existing json file to hjson format by executing hjson source.json > target.hjson, edit in your favorite editor, and convert back to json hjson -j target.hjson > source.json. You can download the binary here or use the online conversion here.
jsonnet, which does the same, but with a slightly different format (single and double quoted strings are simply allowed to span multiple lines). Conveniently, the homepage has editable input fields so you can simply insert your multiple line json/jsonnet files there and they will be converted online to standard json immediately. Note that jsonnet supports much more goodies for templating json files, so it may be useful to look into, depending on your needs.
The reason OP asked is the same reason I ended up here. Had a json file with long text.
In VS Code it's just ALT+Z to turn on word wrapping in a json file. Changing the actual data isn't what you want, if all you really want is to read the contents of the file as a developer.
If it's just for presentation in your editor, and the object lives in JavaScript source rather than in a .json file, you may use ` instead of " or '
const obj = {
myMultiLineString: `This is written in a \
multiline way. \
The downside of it is that you \
can't use indentation on every new \
line because it would be included in \
your string. \
The backslash after each line escapes the line break.
`
}
Examples:
console.log(`First line \
Second line`);
will put in console:
First line Second line
console.log(`First line
second line`);
will put in console:
First line
second line
Hope this answered your question.

iconv gives "Illegal Character" with smart quotes -- how to get rid of them?

I have a MySQL table with 120,000 lines stored in UTF-8 format. There is one field, product name, that contains text with many accents. I need to fill a second field with this same name after converting it to a url-friendly form (ASCII).
Since PHP doesn't directly handle UTF-8, I'm using:
$value = iconv ('UTF-8', 'ISO-8859-1', $value);
to convert the name to ISO-8859-1, followed by a massive strtr statement to replace any accented character by its unaccented equivalent (à becomes a, for example).
However, the original text names were entered with smart quotes, and iconv chokes whenever it comes across one -- I get:
Unknown error type: [8]
iconv() [function.iconv]: Detected an illegal character in input string
To get rid of the smart quotes before using iconv, I have tried using three statements like:
$value = str_replace('’', "'", $value);
(’ is the raw value of a UTF-8 smart single quote)
Because the text file is so long, these str_replace's cause the script to time out every single time.
What is the fastest way to strip out the smart quotes (or any invalid characters) from a UTF-8 string, prior to running iconv?
Or, is there an easier solution to this whole problem? What is the fastest way to convert a name with many accents, in UTF-8, to a name with no accents, spelled correctly, in ASCII?
Glibc (and the GNU libiconv) supports //TRANSLIT and //IGNORE suffixes.
Thus, on Linux, this works just fine:
$ echo $'\xe2\x80\x99'
’
$ echo $'\xe2\x80\x99' | iconv -futf8 -tiso8859-1
iconv: illegal input sequence at position 0
$ echo $'\xe2\x80\x99' | iconv -futf8 -tiso8859-1//translit
'
I'm not sure what iconv is in use by PHP, but the documentation implies that //TRANSLIT and //IGNORE will work there too.
What do you mean by "link-friendly"? Only way that makes sense to me, since the text between <a>...</a> tags can be anything, is actually "URL-friendly", similar to SO's URLs where everything is converted to [a-z-].
If that's what you're going for, you'll need a transliteration library, not a character set conversion library. (I've had no luck getting iconv() to do the work in the past, but I haven't tried in a while.) There's a beta PHP extension translit that probably does the job.
If you can't add extensions to your PHP install, you'll have to look for a PHP library that does the same thing. I haven't used it, but the PHP UTF-8 library implements a utf8_to_ascii function that I assume does something like what you need.
(Also, if iconv() is failing like you said, it means that your input isn't actually valid UTF-8, so no amount of replacing valid UTF-8 with anything else will help the problem. EDIT: I may take that back: if ephemient's answer is correct, the iconv error you're seeing may very well be because there's no direct representation of the character in the destination character set. So, never mind.)
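To make the distinction between character-set conversion and transliteration concrete, here is a rough sketch in Python rather than PHP. It uses Unicode NFKD decomposition plus a small hand-written quote table, which is a swapped-in technique and not what iconv's //TRANSLIT does internally; to_ascii and SMART_QUOTES are invented names.

```python
import unicodedata

# Typographic quotes have no Unicode decomposition, so map them by hand.
SMART_QUOTES = str.maketrans(
    {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}
)

def to_ascii(text):
    text = text.translate(SMART_QUOTES)
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding with errors="ignore" then drops the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# A plain conversion fails on the smart quote, much like iconv does,
# because ISO-8859-1 has no such character.
try:
    "\u2019".encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

name = "Caf\u00e9 d\u2019\u00e9t\u00e9 \u00e0 la cr\u00e8me"
ascii_name = to_ascii(name)
print(fits_latin1, ascii_name)
```

Characters that neither decompose nor appear in the table are silently dropped by this sketch, so a real transliteration library will usually give better results.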
Have you considered using MySQL's REPLACE string function to change the offending strings into apostrophes, or whatever? You may be able to put together the "string to be replaced" part e.g. by using CONCAT on CHAR calls...