BigQuery failing to import CSV from tshark - csv

Currently, I have tshark logging all packets matching a certain messaging criteria and outputting them into a CSV. The CSVs are then stored on Google Cloud Storage, ready for import into BigQuery.
This is one example line from the CSV that tshark outputs.
"1380106851.793056000",
"1.1.1.1",
"2.2.2.2",
"99999",
"1111",
"raw:ip",
"324",
"af:00:21:9a",
"880",
"102",
"74:00",
"ORIG",
"It's text or !\x0a\" 's not D",
"0x00",
"0",
BigQuery will not import this line, reporting the error "Data between close double quote (") and field separator: field starts with: ". I assume the 13th column ("It's text or !\x0a\" 's not D") is causing the issue, but I'm unsure how to work around it. This column contains the message text, and it is reasonable to assume it will not always contain balanced quoting.
The only remedy that I can think of is running awk over the file and replacing any non-syntax double-quotes with single quotes.
Is there anything I've missed?

I'm not sure why tshark escapes double quotes with a backslash, but according to RFC 4180, they should be escaped with another double quote:
"A (double) quote character in a field must be represented by two
(double) quote characters."
BigQuery will happily ingest a quote escaped in this way:
Doesn't work: "It's text or !\x0a\" 's not D"
Works: "It's text or !\x0a"" 's not D"
Is there a way to tell tshark how to escape CSV appropriately? Otherwise I bet it would be a welcome patch, if it cited the RFC. Also, if necessary, this alternate escape mechanism could be implemented as a BigQuery feature (I guess votes on this question can act as a measure of how much it's needed).
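In the meantime, a pre-processing step can rewrite the escapes before the load. Here is a minimal Python sketch; the script name is just an example, and it assumes the only problematic sequence is the \" that tshark emits:
import sys

# Rewrite tshark's backslash-escaped quotes (\") as RFC 4180 doubled quotes ("")
# so BigQuery accepts the CSV.
# Usage: python fix_quotes.py < capture.csv > fixed.csv
for line in sys.stdin:
    sys.stdout.write(line.replace('\\"', '""'))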


Snowflake: how to escape all special characters in a string in an array of objects before parsing it as JSON?

We are loading data into Snowflake using a JavaScript procedure.
The script loops over an array of objects to load the data. These objects contain strings that may have special characters,
e.g.:
"Description": "This file contain "sensitive" information."
The double quotes around the word "sensitive" become:
"Description": "This file contain \"sensitive\" information."
Which broke the loading script.
The same issue happened when we used HTML tags within the Description key:
"Description": "Please use <b>specific fonts</b> to update the file".
This is another example on the Snowflake community site.
Also, this post recommended setting FIELD_OPTIONALLY_ENCLOSED_BY equal to the special characters, but I am handling a large data set which might contain all of the special characters.
How can we escape special characters automatically, without updating the script to loop over the whole array in JavaScript and replace each special character with something else?
EDIT
I tried using JSON_EXTRACT_PATH_TEXT:
select JSON_EXTRACT_PATH_TEXT(parse_json('{
"description": "Please use \"Custom\" fonts"
}'), 'description');
and got the following error:
Error parsing JSON: missing comma, line 2, pos 33.
I think the escape characters generated by the JS procedure are consumed when the string is passed to the SQL functions.
'{"description": "Please use \"Custom\" fonts"}'
becomes
'{"description": "Please use "Custom" fonts"}'
Therefore parsing it as JSON (or fetching a field from it) fails. To avoid the error, the JavaScript procedure should generate a double backslash instead of a single backslash:
'{"description": "Please use \\"Custom\\" fonts"}'
I do not think there is a way to prevent this error without modifying the JavaScript procedure.
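Not Snowflake-specific, but here is a minimal Python sketch of the layering problem: the SQL string literal consumes one level of escaping before PARSE_JSON ever sees the text, so each backslash has to be doubled. Letting a JSON serializer build the inner text avoids hand-escaping; the names below are illustrative only.
import json

description = 'Please use "Custom" fonts'

# Layer 1: the JSON serializer escapes the quotes inside the JSON text itself.
json_text = json.dumps({"description": description})
print(json_text)
# {"description": "Please use \"Custom\" fonts"}

# Layer 2: embedding that text in a single-quoted SQL literal consumes one level
# of backslashes, so they must be doubled (and single quotes doubled) first.
sql_literal = "'" + json_text.replace("\\", "\\\\").replace("'", "''") + "'"
print("select parse_json(" + sql_literal + "):description;")
# select parse_json('{"description": "Please use \\"Custom\\" fonts"}'):description;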
I came across this today. Gokhan is right: you need the double backslashes to properly escape the quote.
Here are a couple links that explain it a little more:
https://community.snowflake.com/s/article/Escaping-new-line-character-in-JSON-to-avoid-data-loading-errors
https://community.snowflake.com/s/article/Unable-to-Insert-Data-Containing-Back-Slash-from-Stored-Procedure
For my case, I found that I could address this challenge by disabling the escaping (using $$ string delimiters) and then manually replacing characters with the replace function where needed.
For your example the replace is not necessary:
select parse_json($${"description": "Please use \"Custom\" fonts"}$$);
select parse_json($${"description": "Please use \"Custom\" fonts"}$$):description;

Is CSV data with missing leading quotations considered malformed?

I am using OpenCSV to read CSV files. Looking over the docs, I don't see guidelines on how to handle malformed data.
I have a CSV File. Comes with all the expected features: each field is separated by a comma, and each field is surrounded by quotes in case one of the values may contain a comma. However, every line (except the headers) is missing a leading quote. Here is an example
"Header 1","Header2"
value1","value2"
value1","value2"
The CSV parser ended up skipping every other line due to the way the quotes were lined up, which obviously causes problems.
I would consider this to be an error, because I know what the data should look like and the first column is missing its quotation marks. But as far as the CSV spec is concerned, might this be considered valid? If so, I suppose I would have to build extra checks myself to make sure that I am not missing any lines, despite the file containing otherwise valid CSV data.
According to the RFC for CSV files:
While there are various specifications and implementations for the
CSV format, there is no formal
specification in existence, which allows for a wide variety of
interpretations of CSV files.
So simply put, malformed? No. Informal? No. Even this article (linked in the RFC) mentions that lines can be mixed, with quotes and without quotes.
For the data you show:
"Header 1","Header2"
value1","value2"
value1","value2"
we could argue the data is not malformed if the fields are considered unquoted, the fields never contain a separator, and there are no multi-line fields, which would give the values:
"Header 1" "Header2"
value1" "value2"
value1" "value2"
Of course it's obvious this data was meant to have quoted fields. In that case the data is certainly malformed, and could be parsed differently with different parsers (maybe even as multiline fields).
Valid options would be:
value1,value2 // no quotes at all
"value1","value2" // all quoted
value1,"value2,more data" // only quoted when there is a separator inside

Choosing between tsv and csv

I have a program that outputs a table, and I was wondering if there are any advantages/disadvantages between the csv and tsv formats.
TSV is very efficient for JavaScript/Perl/Python to process, without losing
any typing information, and it is also easy for humans to read.
The format has been supported in 4store since its public release, and
it's reasonably widely used.
The way I look at it is: CSV is for loading into spreadsheets, TSV is
for processing by bespoke software.
You can see the technical specification of each format here.
The choice depends on the application. In a nutshell, if your fields don't contain commas, use CSV; otherwise TSV is the way to go.
TL;DR
In both formats, the problem arises when the delimiter can appear within the fields, so it is necessary to indicate that the delimiter is not working as a field separator but as a value within the field, which can be somewhat painful.
For example, using CSV: Kalman, Rudolf, von Neumann, John, Gabor, Dennis
Some basic approaches are:
Delete all the delimiters that appear within the field.
E.g. Kalman Rudolf, von Neumann John, Gabor Dennis
Escape the character (usually by prepending a backslash \).
E.g. Kalman\, Rudolf, von Neumann\, John, Gabor\, Dennis
Enclose each field in another character (usually double quotes ").
E.g. "Kalman, Rudolf", "von Neumann, John", "Gabor, Dennis"
CSV
The fields are separated by a comma ,.
For example:
Name,Score,Country
Peter,156,GB
Piero,89,IT
Pedro,31415,ES
Advantages:
It is more generic and useful when sharing with non-technical people,
as most software packages can read it without playing with the
settings.
Disadvantages:
Escaping the comma within the fields can be frustrating because not
everybody follows the standards.
All the extra escaping characters and quotes add weight to the final file size.
TSV
The fields are separated by a tab character, <TAB> or \t.
For example:
Name<TAB>Score<TAB>Country
Peter<TAB>156<TAB>GB
Piero<TAB>89<TAB>IT
Pedro<TAB>31415<TAB>ES
Advantages:
It is not necessary to escape the delimiter as it is not usual to have the tab-character within a field. Otherwise, it should be removed.
Disadvantages:
It is less widespread.
The tsv-utils project makes an interesting comparison, copied hereafter. In a nutshell, use TSV.
Comparing TSV and CSV formats
The differences between TSV and CSV formats can be confusing. The obvious distinction is the default field delimiter: TSV uses TAB, CSV uses comma. Both use newline as the record delimiter.
By itself, using different field delimiters is not especially significant. Far more important is the approach to delimiters occurring in the data. CSV uses an escape syntax to represent comma and newlines in the data. TSV takes a different approach, disallowing TABs and newlines in the data.
The escape syntax enables CSV to fully represent common written text. This is a good fit for human-edited documents, notably spreadsheets. This generality has a cost: reading it requires programs to parse the escape syntax. While not overly difficult, it is still easy to do incorrectly, especially when writing one-off programs. It is good practice to use a CSV parser when processing CSV files. Traditional Unix tools like cut, sort, awk, and diff do not process CSV escapes; alternate tools are needed.
By contrast, parsing TSV data is simple. Records can be read using the typical readline routines found in most programming languages. The fields in each record can be found using split routines. Unix utilities can be called by providing the correct field delimiter, e.g. awk -F "\t", sort -t $'\t'. No special parser is needed. This is much more reliable. It is also faster, no CPU time is used parsing the escape syntax.
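A small Python sketch of the difference: the CSV row needs a real parser to undo the quote escaping, while the TSV row can be split directly (assuming fields contain no TABs or newlines):
import csv
import io

csv_row = 'ghi,"Say ""hello, world!""",jkl\n'
tsv_row = 'ghi\tSay "hello, world!"\tjkl\n'

# CSV: a parser is required to interpret the quoting and doubled quotes.
print(next(csv.reader(io.StringIO(csv_row))))
# ['ghi', 'Say "hello, world!"', 'jkl']

# TSV: a plain split is enough, since TAB and newline cannot occur in fields.
print(tsv_row.rstrip("\n").split("\t"))
# ['ghi', 'Say "hello, world!"', 'jkl']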
The speed advantages are especially pronounced for record-oriented operations: record counts (wc -l), deduplication (uniq, tsv-uniq), file splitting (head, tail, split), shuffling (GNU shuf, tsv-sample), etc. TSV is faster because record boundaries can be found using highly optimized newline search routines (e.g. memchr). Identifying CSV record boundaries requires fully parsing each record.
These characteristics make the TSV format well suited for the large tabular data sets common in data mining and machine learning environments. These data sets rarely need TAB and newline characters in the fields.
The most common CSV escape format uses quotes to delimit fields containing delimiters. Quotes must also be escaped; this is done by using a pair of quotes to represent a single quote. Consider the data in this table:
Field-1    Field-2                Field-3
abc        hello, world!          def
ghi        Say "hello, world!"    jkl
In Field-2, the first value contains a comma, and the second value contains both quotes and a comma. Here is the CSV representation, using escapes to represent commas and quotes in the data.
Field-1,Field-2,Field-3
abc,"hello, world!",def
ghi,"Say ""hello, world!""",jkl
In the above example, only fields with delimiters are quoted. It is also common to quote all fields whether or not they contain delimiters. The following CSV file is equivalent:
"Field-1","Field-2","Field-3"
"abc","hello, world!","def"
"ghi","Say ""hello, world!""","jkl"
Here's the same data in TSV. It is much simpler as no escapes are involved:
Field-1    Field-2                Field-3
abc        hello, world!          def
ghi        Say "hello, world!"    jkl
The similarity between TSV and CSV can lead to confusion about which tools are appropriate. Furthering this confusion, it is somewhat common to have data files using comma as the field delimiter, but without comma, quote, or newlines in the data. No CSV escapes are needed in these files, with the implication that traditional Unix tools like cut and awk can be used to process these files. Such files are sometimes referred to as "simple CSV". They are equivalent to TSV files with comma as a field delimiter. Traditional Unix tools and tsv-utils tools can process these files correctly by specifying the field delimiter. However, "simple csv" is a very ad hoc and ill defined notion. A simple precaution when working with these files is to run a CSV-to-TSV converter like csv2tsv prior to other processing steps.
Note that many CSV-to-TSV conversion tools don't actually remove the CSV escapes. Instead, many tools replace comma with TAB as the field delimiter, but still use CSV escapes to represent TAB, newline, and quote characters in the data. Such data cannot be reliably processed by Unix tools like sort, awk, and cut. The csv2tsv tool in tsv-utils avoids escapes by replacing TAB and newline with a space (customizable). This works well in the vast majority of data mining scenarios.
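A minimal Python sketch of that replacement approach, only to illustrate the idea rather than reproduce the actual csv2tsv tool:
import csv
import io

def csv_to_tsv(text, replacement=" "):
    # Read CSV with a real parser, then emit TSV after replacing any
    # TAB or newline characters inside field values.
    out = io.StringIO()
    for row in csv.reader(io.StringIO(text)):
        fields = [f.replace("\t", replacement).replace("\n", replacement) for f in row]
        out.write("\t".join(fields) + "\n")
    return out.getvalue()

print(csv_to_tsv('ghi,"Say ""hello, world!""",jkl\n'), end="")
# ghi<TAB>Say "hello, world!"<TAB>jkl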
To see what a specific CSV-to-TSV conversion tool does, convert CSV data containing quotes, commas, TABs, newlines, and double-quoted fields. For example:
$ echo $'Line,Field1,Field2\n1,"Comma: |,|","Quote: |""|"\n"2","TAB: |\t|","Newline: |\n|"' | <csv-to-tsv-converter>
Approaches that generate CSV escapes will enclose a number of the output fields in double quotes.
References:
Wikipedia: Tab-separated values - Useful description of TSV format.
IANA TSV specification - Formal definition of the tab-separated-values mime type.
Wikipedia: Comma-separated-values - Describes CSV and related formats.
RFC 4180 - IETF CSV format description, the closest thing to an actual standard for CSV.
brendano/tsvutils: The philosophy of tsvutils - Brendan O'Connor's discussion of the rationale for using TSV format in his open source toolkit.
So You Want To Write Your Own CSV code? - Thomas Burette's humorous, and accurate, blog post describing the troubles with ad-hoc CSV parsing. Of course, you could use TSV and avoid these problems!
You can use any delimiter you want, but tabs and commas are supported by many applications, including Excel, MySQL, PostgreSQL. Commas are common in text fields, so if you escape them, more of them need to be escaped. If you don't escape them and your fields might contain commas, then you can't confidently run "sort -k2,4" on your file. You might need to escape some characters in fields anyway (null bytes, newlines, etc.). For these reasons and more, my preference is to use TSVs, and escape tabs, null bytes, and newlines within fields. Additionally, it is usually easier to work with TSVs. Just split each line by the tab delimiter. With CSVs there are quoted fields, possibly fields with newlines, etc. I only use CSVs when I'm forced to.
I think that, generally, CSV is supported more often than the TSV format.

Are multi-line strings allowed in JSON?

Is it possible to have multi-line strings in JSON?
It's mostly for visual comfort so I suppose I can just turn word wrap on in my editor, but I'm just kinda curious.
I'm writing some data files in JSON format and would like to have some really long string values split over multiple lines. Using Python's JSON module, I get a whole lot of errors, whether I use \ or \n as an escape.
JSON does not allow real line-breaks. You need to replace all the line breaks with \n.
eg:
"first line
second line"
can be saved with:
"first line\nsecond line"
Note:
for Python, this should be written as:
"first line\\nsecond line"
where \\ escapes the backslash; otherwise Python would treat \n as the control character "newline".
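If the data is produced from Python in the first place, letting the json module do the escaping sidesteps the issue; a quick sketch:
import json

# json.dumps turns the real newline into the two-character escape \n,
# producing valid single-line JSON.
print(json.dumps({"text": "first line\nsecond line"}))
# {"text": "first line\nsecond line"}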
Unfortunately many of the answers here address the question of how to put a newline character in the string data. The question is how to make the code look nicer by splitting the string value across multiple lines of code. (And even the answers that recognize this provide "solutions" that assume one is free to change the data representation, which in many cases one is not.)
And the worse news is, there is no good answer.
In many programming languages, even if they don't explicitly support splitting strings across lines, you can still use string concatenation to get the desired effect; and as long as the compiler isn't awful this is fine.
But json is not a programming language; it's just a data representation. You can't tell it to concatenate strings. Nor does its (fairly small) grammar include any facility for representing a string on multiple lines.
Short of devising a pre-processor of some kind (and I, for one, don't feel like effectively making up my own language to solve this issue), there isn't a general solution to this problem. IF you can change the data format, then you can substitute an array of strings. Otherwise, this is one of the numerous ways that json isn't designed for human-readability.
I had to do this for a small Node.js project and found this workaround: store multiline strings as an array of lines to make them more human-readable (at the cost of extra code to convert them back to a string later):
{
    "modify_head": [
        "<script type='text/javascript'>",
        "<!--",
        " function drawSomeText(id) {",
        " var pjs = Processing.getInstanceById(id);",
        " var text = document.getElementById('inputtext').value;",
        " pjs.drawText(text);}",
        "-->",
        "</script>"
    ],
    "modify_body": [
        "<input type='text' id='inputtext'></input>",
        "<button onclick=drawSomeText('ExampleCanvas')></button>"
    ]
}
Once parsed, I just use myData.modify_head.join('\n') or myData.modify_head.join(), depending upon whether I want a line break after each string or not.
This looks quite neat to me, apart from the fact that I have to use double quotes everywhere. Otherwise, I could perhaps use YAML, but that has other pitfalls and is not supported natively.
Check out the specification! The JSON grammar's char production can take the following values:
any-Unicode-character-except-"-or-\-or-control-character
\"
\\
\/
\b
\f
\n
\r
\t
\u four-hex-digits
Newlines are "control characters" so, no, you may not have a literal newline within your string. However you may encode it using whatever combination of \n and \r you require.
JSON doesn't allow breaking lines for readability.
Your best bet is to use an IDE that will line-wrap for you.
This is a really old question, but I came across this on a search and I think I know the source of your problem.
JSON does not allow "real" newlines in its data; it can only have escaped newlines. See the answer from #YOU. According to the question, it looks like you attempted to escape line breaks in Python two ways: by using the line continuation character ("\") or by using "\n" as an escape.
But keep in mind: if you are using a string in python, special escaped characters ("\t", "\n") are translated into REAL control characters! The "\n" will be replaced with the ASCII control character representing a newline character, which is precisely the character that is illegal in JSON. (As for the line continuation character, it simply takes the newline out.)
So what you need to do is prevent Python from escaping characters. You can do this by using a raw string (put r in front of the string, as in r"abc\ndef") or by including an extra backslash in front of the newline escape ("abc\\ndef").
Both of the above will, instead of replacing "\n" with the real newline ASCII control character, leave "\n" as two literal characters, which JSON can then interpret as a newline escape.
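A short Python sketch of that point; the extra backslash (or a raw string) parses, while a real newline does not:
import json

# Escaped backslash: the string handed to json.loads contains the two
# characters \ and n, which JSON interprets as a newline escape.
print(json.loads('{"text": "abc\\ndef"}')["text"])
# abc
# def

# Raw string: equivalent, since Python does not translate \n here either.
print(json.loads(r'{"text": "abc\ndef"}')["text"])

# A real newline inside the JSON string is rejected:
try:
    json.loads('{"text": "abc\ndef"}')
except json.JSONDecodeError as err:
    print("invalid JSON:", err)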
Write the property value as an array of strings, like the example given here: https://gun.io/blog/multi-line-strings-in-json/. This will help.
We can always use an array of strings for multiline strings, like the following.
{
"singleLine": "Some singleline String",
"multiline": ["Line one", "line Two", "Line Three"]
}
And we can easily iterate over the array to display the content in a multi-line fashion.
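The join works the same way outside JavaScript; for example, a small Python sketch:
import json

doc = json.loads('{"multiline": ["Line one", "line Two", "Line Three"]}')

# Re-assemble the lines with real newlines once the JSON is parsed.
print("\n".join(doc["multiline"]))
# Line one
# line Two
# Line Three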
While not standard, I found that some JSON libraries have options to support multiline strings. I say this with the caveat that it will hurt your interoperability.
However, in the specific scenario I ran into, I needed to make a config file, only ever used by one system, readable and manageable by humans, and opted for this solution in the end.
Here is how this works out in Java with Jackson:
JsonMapper mapper = JsonMapper.builder()
    .enable(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS)
    .build();
This is a very old question, but I had the same question when I wanted to improve the readability of our Vega JSON specification code, which uses complex conditional expressions. The code is like this.
As this answer says, JSON is not designed for humans. I understand that is a historical decision and it makes sense for data-exchange purposes. However, JSON is still used as source code in such cases. So I asked our engineers to use Hjson for the source code and process it into JSON.
For example, in a Git for Windows environment, you can download the Hjson CLI binary and put it in the git/bin directory. Then convert (transpile) the Hjson source to JSON. Automation tools such as Make are useful for generating the JSON.
$ which hjson
/c/Program Files/git/bin/hjson
$ cat example.hjson
{
  md:
    '''
    First line.
    Second line.
      This line is indented by two spaces.
    '''
}
$ hjson -j example.hjson > example.json
$ cat example.json
{
"md": "First line.\nSecond line.\n This line is indented by two spaces."
}
If you use the transformed JSON in programming languages, language-specific libraries like hjson-js will be useful.
I noticed the same idea was posted in a duplicate question, but I wanted to share a bit more information.
You can encode on the client side and decode on the server side. This will take care of \n and \t characters as well.
For example, I needed to send multiline XML through JSON:
{
"xml": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiID8+CiAgPFN0cnVjdHVyZXM+CiAgICAgICA8aW5wdXRzPgogICAgICAgICAgICAgICAjIFRoaXMgcHJvZ3JhbSBhZGRzIHR3byBudW1iZXJzCgogICAgICAgICAgICAgICBudW0xID0gMS41CiAgICAgICAgICAgICAgIG51bTIgPSA2LjMKCiAgICAgICAgICAgICAgICMgQWRkIHR3byBudW1iZXJzCiAgICAgICAgICAgICAgIHN1bSA9IG51bTEgKyBudW0yCgogICAgICAgICAgICAgICAjIERpc3BsYXkgdGhlIHN1bQogICAgICAgICAgICAgICBwcmludCgnVGhlIHN1bSBvZiB7MH0gYW5kIHsxfSBpcyB7Mn0nLmZvcm1hdChudW0xLCBudW0yLCBzdW0pKQogICAgICAgPC9pbnB1dHM+CiAgPC9TdHJ1Y3R1cmVzPg=="
}
Then decode it on the server side:
public class XMLInput
{
    public string xml { get; set; }

    public string DecodeBase64()
    {
        var valueBytes = System.Convert.FromBase64String(this.xml);
        return Encoding.UTF8.GetString(valueBytes);
    }
}

public async Task<string> PublishXMLAsync([FromBody] XMLInput xmlInput)
{
    string data = xmlInput.DecodeBase64();
    return data;
}
Once decoded, you'll get your original XML:
<?xml version="1.0" encoding="utf-8" ?>
<Structures>
<inputs>
# This program adds two numbers
num1 = 1.5
num2 = 6.3
# Add two numbers
sum = num1 + num2
# Display the sum
print('The sum of {0} and {1} is {2}'.format(num1, num2, sum))
</inputs>
</Structures>
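For completeness, a Python sketch of the same round trip (including the encoding half that the C# code above omits); the XML here is just a shortened stand-in for the example above:
import base64
import json

xml = '<?xml version="1.0" encoding="utf-8" ?>\n<Structures>\n  <inputs>...</inputs>\n</Structures>'

# Client side: base64-encode the multiline text so the JSON value is a single line.
payload = json.dumps({"xml": base64.b64encode(xml.encode("utf-8")).decode("ascii")})

# Server side: decode it back to the original multiline text.
decoded = base64.b64decode(json.loads(payload)["xml"]).decode("utf-8")
assert decoded == xml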
\n\r\n worked for me!
\n for a single line break and \n\r\n for a double line break.
Many answers here may not work in most cases, but perhaps the easiest solution, if you want to output what you wrote inside a JSON file (for example, for language translations where you want just one key whose value spans more than one line on the client), is to add a special character of your choice that is allowed in JSON, such as \\, before the new line, and use some JS to parse the text, like:
Example:
File (text.json)
{"text": "some JSON text. \\ Next line of JSON text"}
import text from 'text.json'
{text.split('\\')
  .map(line => {
    return (
      <div>
        {line}
        <br />
      </div>
    );
  })}
Assuming the question has to do with easily editing text files and then manually converting them to json, there are two solutions I found:
hjson (which was mentioned in this previous answer), in which case you can convert your existing json file to hjson format by executing hjson source.json > target.hjson, edit it in your favorite editor, and convert back to json with hjson -j target.hjson > source.json. You can download the binary here or use the online conversion here.
jsonnet, which does the same, but with a slightly different format (single and double quoted strings are simply allowed to span multiple lines). Conveniently, the homepage has editable input fields so you can simply insert your multiple line json/jsonnet files there and they will be converted online to standard json immediately. Note that jsonnet supports much more goodies for templating json files, so it may be useful to look into, depending on your needs.
The reason OP asked is the same reason I ended up here. Had a json file with long text.
In VS Code it's just ALT+Z to turn on word wrapping in a json file. Changing the actual data isn't what you want, if all you really want is to read the contents of the file as a developer.
If it's just for presentation in your editor you may use ` instead of " or '
const obj = {
  myMultiLineString: `This is written in a \
multiline way. \
The backside of it is that you \
can't use indentation on every new \
line because it would be included in \
your string. \
The backslash after each line escapes the carriage return.
`
}
Examples:
console.log(`First line \
Second line`);
will print to the console:
First line Second line
console.log(`First line
second line`);
will print to the console:
First line
second line
Hope this answered your question.