Finding a string between two strings in a file - json

This is a bit of a .json file I need to find information in:
"title":
"Spring bank holiday","date":"2012-06-04","notes":"Substitute day","bunting":true},
{"title":"Queen\u2019s Diamond Jubilee","date":"2012-06-05","notes":"Extra bank holiday","bunting":true},
{"title":"Summer bank holiday","date":"2012-08-27","notes":"","bunting":true},
{"title":"Christmas Day","date":"2012-12-25","notes":"","bunting":true},
{"title":"Boxing Day","date":"2012-12-26","notes":"","bunting":true},
{"title":"New Year\u2019s Day","date":"2013-01-01","notes":"","bunting":true},
{"title":"Good Friday","date":"2013-03-29","notes":"","bunting":false},
{"title":"
The file is much longer, but it is one long line of text.
I would like to display what bank holiday it is after a certain date, and also if it involves bunting.
I've tried grep and sed but I can't figure it out.
I'd like something like this:
[command] between [date] and [}] display [title] and [bunting]/[no bunting]
[title] should be just "Christmas Day" or something else
Forgot to mention:
I would like to achieve this in bash shell, either from the prompt or from a short bit of code.

You should use a proper JSON parser in a decent programming language, then you can do a lot of work in a safe way without too much code. How about this little Python code:
#!/usr/bin/env python
import json
with open('my.json') as jsonFile:
holidays = json.load(jsonFile)
for holiday in holidays:
if holiday['date'] > '2012-05-06':
print holiday['date'], ':', holiday['title'], \
("bunting" if holiday['bunting'] else "no bunting")
break # in case you only want one line of output
I could not figure out what exactly the output should be; if you can be more specific, I can adjust my example.

You can try this with awk:
awk -F"}," '{for(i=1;i<=NF;i++){print $i}}' file.json | awk -F"\"[:,]\"?" '$4>"2013-01-01"{printf "%s:%s:%s\n" ,$2,$4,$8}'
Seeing that the json file is one long string we first split this line into multiple json records on },. Then each individual record is split on a combination of ":, characters with an optional closing ". We then only output the line if its after a certain date.
This will find all records after Jan 1 2013.
EDIT:
The 2nd awk splits each individual json record into key-value pairs using a sub-string starting with ", followed by either a : or ,, and an optional ending ".
So in your example it will split on either ",", ":" or ":.
All odd fields are keys, and all even fields are values (hence $4 being the date in your example). We then check if $4(date) is after 2013-01-01.
I noticed i made a mistake on the optional " (should be followed by ? instead of *) in the split which i have now corrected and i also used printf function to display the values.

Related

how can i make my code indented like this for better readability and ease [duplicate]

Is it possible to have multi-line strings in JSON?
It's mostly for visual comfort so I suppose I can just turn word wrap on in my editor, but I'm just kinda curious.
I'm writing some data files in JSON format and would like to have some really long string values split over multiple lines. Using python's JSON module I get a whole lot of errors, whether I use \ or \n as an escape.
JSON does not allow real line-breaks. You need to replace all the line breaks with \n.
eg:
"first line
second line"
can be saved with:
"first line\nsecond line"
Note:
for Python, this should be written as:
"first line\\nsecond line"
where \\ is for escaping the backslash, otherwise python will treat \n as
the control character "new line"
Unfortunately many of the answers here address the question of how to put a newline character in the string data. The question is how to make the code look nicer by splitting the string value across multiple lines of code. (And even the answers that recognize this provide "solutions" that assume one is free to change the data representation, which in many cases one is not.)
And the worse news is, there is no good answer.
In many programming languages, even if they don't explicitly support splitting strings across lines, you can still use string concatenation to get the desired effect; and as long as the compiler isn't awful this is fine.
But json is not a programming language; it's just a data representation. You can't tell it to concatenate strings. Nor does its (fairly small) grammar include any facility for representing a string on multiple lines.
Short of devising a pre-processor of some kind (and I, for one, don't feel like effectively making up my own language to solve this issue), there isn't a general solution to this problem. IF you can change the data format, then you can substitute an array of strings. Otherwise, this is one of the numerous ways that json isn't designed for human-readability.
I have had to do this for a small Node.js project and found this work-around to store multiline strings as array of lines to make it more human-readable (at a cost of extra code to convert them to string later):
{
"modify_head": [
"<script type='text/javascript'>",
"<!--",
" function drawSomeText(id) {",
" var pjs = Processing.getInstanceById(id);",
" var text = document.getElementById('inputtext').value;",
" pjs.drawText(text);}",
"-->",
"</script>"
],
"modify_body": [
"<input type='text' id='inputtext'></input>",
"<button onclick=drawSomeText('ExampleCanvas')></button>"
],
}
Once parsed, I just use myData.modify_head.join('\n') or myData.modify_head.join(), depending upon whether I want a line break after each string or not.
This looks quite neat to me, apart from that I have to use double quotes everywhere. Though otherwise, I could, perhaps, use YAML, but that has other pitfalls and is not supported natively.
Check out the specification! The JSON grammar's char production can take the following values:
any-Unicode-character-except-"-or-\-or-control-character
\"
\\
\/
\b
\f
\n
\r
\t
\u four-hex-digits
Newlines are "control characters" so, no, you may not have a literal newline within your string. However you may encode it using whatever combination of \n and \r you require.
JSON doesn't allow breaking lines for readability.
Your best bet is to use an IDE that will line-wrap for you.
This is a really old question, but I came across this on a search and I think I know the source of your problem.
JSON does not allow "real" newlines in its data; it can only have escaped newlines. See the answer from #YOU. According to the question, it looks like you attempted to escape line breaks in Python two ways: by using the line continuation character ("\") or by using "\n" as an escape.
But keep in mind: if you are using a string in python, special escaped characters ("\t", "\n") are translated into REAL control characters! The "\n" will be replaced with the ASCII control character representing a newline character, which is precisely the character that is illegal in JSON. (As for the line continuation character, it simply takes the newline out.)
So what you need to do is to prevent Python from escaping characters. You can do this by using a raw string (put r in front of the string, as in r"abc\ndef", or by including an extra slash in front of the newline ("abc\\ndef").
Both of the above will, instead of replacing "\n" with the real newline ASCII control character, will leave "\n" as two literal characters, which then JSON can interpret as a newline escape.
Write property value as a array of strings. Like example given over here https://gun.io/blog/multi-line-strings-in-json/. This will help.
We can always use array of strings for multiline strings like following.
{
"singleLine": "Some singleline String",
"multiline": ["Line one", "line Two", "Line Three"]
}
And we can easily iterate array to display content in multi line fashion.
While not standard, I found that some of the JSON libraries have options to support multiline Strings. I am saying this with the caveat, that this will hurt your interoperability.
However in the specific scenario I ran into, I needed to make a config file that was only ever used by one system readable and manageable by humans. And opted for this solution in the end.
Here is how this works out on Java with Jackson:
JsonMapper mapper = JsonMapper.builder()
.enable(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS)
.build()
This is a very old question, but I had the same question when I wanted to improve readability of our Vega JSON Specification code which uses complex conditoinal expressions. The code is like this.
As this answer says, JSON is not designed for human. I understand that is a historical decision and it makes sense for data exchange purposes. However, JSON is still used as source code for such cases. So I asked our engineers to use Hjson for source code and process it to JSON.
For example, in Git for Windows environment,
you can download the Hjson cli binary and put it in git/bin directory to use.
Then, convert (transpile) Hjson source to JSON. To use automation tools such as Make will be useful to generate JSON.
$ which hjson
/c/Program Files/git/bin/hjson
$ cat example.hjson
{
md:
'''
First line.
Second line.
This line is indented by two spaces.
'''
}
$ hjson -j example.hjson > example.json
$ cat example.json
{
"md": "First line.\nSecond line.\n This line is indented by two spaces."
}
In case of using the transformed JSON in programming languages, language-specific libraries like hjson-js will be useful.
I noticed the same idea was posted in a duplicated question but I would share a bit more information.
You can encode at client side and decode at server side. This will take care of \n and \t characters as well
e.g. I needed to send multiline xml through json
{
"xml": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiID8+CiAgPFN0cnVjdHVyZXM+CiAgICAgICA8aW5wdXRzPgogICAgICAgICAgICAgICAjIFRoaXMgcHJvZ3JhbSBhZGRzIHR3byBudW1iZXJzCgogICAgICAgICAgICAgICBudW0xID0gMS41CiAgICAgICAgICAgICAgIG51bTIgPSA2LjMKCiAgICAgICAgICAgICAgICMgQWRkIHR3byBudW1iZXJzCiAgICAgICAgICAgICAgIHN1bSA9IG51bTEgKyBudW0yCgogICAgICAgICAgICAgICAjIERpc3BsYXkgdGhlIHN1bQogICAgICAgICAgICAgICBwcmludCgnVGhlIHN1bSBvZiB7MH0gYW5kIHsxfSBpcyB7Mn0nLmZvcm1hdChudW0xLCBudW0yLCBzdW0pKQogICAgICAgPC9pbnB1dHM+CiAgPC9TdHJ1Y3R1cmVzPg=="
}
then decode it on server side
public class XMLInput
{
public string xml { get; set; }
public string DecodeBase64()
{
var valueBytes = System.Convert.FromBase64String(this.xml);
return Encoding.UTF8.GetString(valueBytes);
}
}
public async Task<string> PublishXMLAsync([FromBody] XMLInput xmlInput)
{
string data = xmlInput.DecodeBase64();
}
once decoded you'll get your original xml
<?xml version="1.0" encoding="utf-8" ?>
<Structures>
<inputs>
# This program adds two numbers
num1 = 1.5
num2 = 6.3
# Add two numbers
sum = num1 + num2
# Display the sum
print('The sum of {0} and {1} is {2}'.format(num1, num2, sum))
</inputs>
</Structures>
\n\r\n worked for me !!
\n for single line break and \n\r\n for double line break
I see many answers here that may not works in most cases but may be the easiest solution if let's say you wanna output what you wrote down inside a JSON file (for example: for language translations where you wanna have just one key with more than 1 line outputted on the client) can be just adding some special characters of your choice PS: allowed by the JSON files like \\ before the new line and use some JS to parse the text ... like:
Example:
File (text.json)
{"text": "some JSON text. \\ Next line of JSON text"}
import text from 'text.json'
{text.split('\\')
.map(line => {
return (
<div>
{line}
<br />
</div>
);
})}}
Assuming the question has to do with easily editing text files and then manually converting them to json, there are two solutions I found:
hjson (that was mentioned in this previous answer), in which case you can convert your existing json file to hjson format by executing hjson source.json > target.hjson, edit in your favorite editor, and convert back to json hjson -j target.hjson > source.json. You can download the binary here or use the online conversion here.
jsonnet, which does the same, but with a slightly different format (single and double quoted strings are simply allowed to span multiple lines). Conveniently, the homepage has editable input fields so you can simply insert your multiple line json/jsonnet files there and they will be converted online to standard json immediately. Note that jsonnet supports much more goodies for templating json files, so it may be useful to look into, depending on your needs.
If it's just for presentation in your editor you may use ` instead of " or '
const obj = {
myMultiLineString: `This is written in a \
multiline way. \
The backside of it is that you \
can't use indentation on every new \
line because is would be included in \
your string. \
The backslash after each line escapes the carriage return.
`
}
Examples:
console.log(`First line \
Second line`);
will put in console:
First line Second line
console.log(`First line
second line`);
will put in console:
First line
second line
Hope this answered your question.

Extract numeric values from awk return

For supervision system, I need to return 2 values about latency to my supervisor server thru nrpe.
Here the values that I'm working on (I put this in a file : test.txt) :
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"project_site":"AUB"},"value":[1575277537.052,"0.3889104875437488"]},{"metric":{"project_site":"VDR"},"value":[1575277537.052,"0.2267407994117705"]}]}}
I need to extract 0.3889104875437488 and 0.2267407994117705
I'm using this :
for i in $(""cat test.txt | awk -F ',' '{print $5 $NF}' | grep -o '[0.0001-9999.9]\+'""); do echo $i; done
I'm not sure that's the best method, especially since I have to add this : "AUB" for row 1 and "VDR" for row 2 before each line. Like :
AUB : 0.3889104875437488 seconds
VDR : 0.2267407994117705 seconds
Use jq for parsing JSON, for example:
$ jq -r '.data.result[] | "\(.metric.project_site) : \(.value[1]) seconds"' file
AUB : 0.3889104875437488 seconds
VDR : 0.2267407994117705 seconds
I have upvoted the answer by #oguzismail, and will repeat their suggestion to use jq instead if at all feasible.
If your input is not valid JSON, there are several things wrong with your approach, several of them related more to efficiency and common practice than outright erroneous.
Your regex is wrong. See below.
Avoid the useless cat.
If you are using Awk anyway, you don't need grep. See useless use of grep.
Quote your variable.
Only in this case, you want to remove the useless echo entirely. Capturing standard output so that you can echo it to standard output is simply a waste of processes (unless you specifically wanted to break the quoting, as a special case of the previous item; but that is not the case here).
It is unclear what you hope for the empty string "" to accomplish. After the shell is done with quote removal, ""cat is simply cat.
In some more detail, [0.0001-9999.9] matches a single character which is 0 or . or 0 (oh we mentioned that already, didn't we?) or 0 (ditto) or between 1 and 9 or 9 (etc etc). In short, grep is not at all the right tool for searching for number ranges; fortunately, Awk can do that easily too.
Here, then, is an attempt to refactor to remove these problems.
awk -F ',' '{ split("5:" NF, a, ":"); split("AUB:VDR", l, ":")
for (i=1; i<=2; i++) {
n = $a[i]; gsub(/[]}"]+/, "", n);
if (n >= 0.0001 && n <= 9999.9)
print l[i] ": " n " seconds"} }' test.txt
This is extremely brittle because it hard-codes the locations of the strings within the surface structure of the (not?) JSON data, which could change without warning.
The split is a hack to get the numbers 5 and NF into an array a. We create a second array with the same length for the corresponding labels. We then loop over the first array and use the numbers as indices into the current record's fields. We trim off any quoting and brackets, and then perform the numeric comparison on the thus extracted field. At the end, we add the corresponding label from the other array in front of the printed text.

Expect: extract specific string from output

I am navigating a Java-based CLI menu on a remote machine with expect inside a bash script and I am trying to extract something from the output without leaving the expect session.
Expect command in my script is:
expect -c "
spawn ssh user#host
expect \"#\"
send \"java cli menu command here\r\"
expect \"java cli prompt\"
send \"java menu command\"
"
###I want to extract a specific string from the above output###
Expect output is:
Id Name
-------------------
abcd 12 John Smith
I want to extract abcd 12 from the above output into another expect variable for further use within the expect script. So that's the 3rd line, first field by using a double-space delimiter. The awk equivalent would be: awk -F ' ' 'NR==3 {$1}'
The big issue is that the environment through which I am navigating with Expect is, as I stated above, a Java CLI based menu so I can't just use awk or anything else that would be available from a bash shell.
Getting out from the Java menu, processing the output and then getting in again is not an option as the login process lasts for 15 seconds so I need to remain inside and extract what I need from the output using expect internal commands only.
You can use regexp in expect itself directly with the use of -re flag. Thanks to Donal on pointing out the single quote and double quote issues. I have given solution using both ways.
I have created a file with the content as follows,
Id Name
-------------------
abcd 12 John Smith
This is nothing but your java program's console output. I have tested this in my system with this. i.e. I just simulated your program's output with cat. You just replace the cat code with your program commands. Simple. :)
Double Quotes :
#!/bin/bash
expect -c "
spawn ssh user#domain
expect \"password\"
send \"mypassword\r\"
expect {\\\$} { puts matched_literal_dollar_sign}
send \"cat input_file\r\"; # Replace this code with your java program commands
expect -re {-\r\n(.*?)\s\s}
set output \$expect_out(1,string)
#puts \$expect_out(1,string)
puts \"Result : \$output\"
"
Single Quotes :
#!/bin/bash
expect -c '
spawn ssh user#domain
expect "password"
send "mypasswordhere\r"
expect "\\\$" { puts matched_literal_dollar_sign}
send "cat input_file\r"; # Replace this code with your java program commands
expect -re {-\r\n(.*?)\s\s}
set output $expect_out(1,string)
#puts $expect_out(1,string)
puts "Result : $output"
'
As you can see, I have used {-\r\n(.*?)\s\s}. Here the braces prevent any variable substitutions. In your output, we have a 2nd line with full of hyphens. Then a newline. Then your 3rd line content. Let's decode the regex used.
-\r\n is to match one literal hyphen and a new line together. This will match the last hyphen in the 2nd line and the newline which in turn make it to 3rd line now. So, .*? will match the required output (i.e. abcd 12) till it encounters double space which is matched by \s\s.
You might be wondering why I need parenthesis which is used to get the sub-match patterns.
In general, expect will save the expect's whole match string in expect_out(0,string) and buffer all the matched/unmatched input to expect_out(buffer). Each sub match will be saved in subsequent numbering of string such as expect_out(1,string), expect_out(2,string) and so on.
As Donal pointed out, it is better to use single quote's approach since it looks less messy. :)
It is not required to escape the \r with the backslash in case of double quotes.
Update :
I have changed the regexp from -\r\n(\w+\s+\w+)\s\s to -\r\n(.*?)\s\s.
With this way - your requirement - such as match any number of letters and single spaces until you encounter first occurrence of double spaces in the output
Now, let's come to your question. You have mentioned that you have tried -\r\n(\w+)\s\s. But, there is a problem here with \w+. Remember \w+ will not match space character. Your output has some spaces in it till double spaces.
The use of regexp will matter based on your requirements on the input string which is going to get matched. You can customize the regular expressions based on your needs.
Update version 2 :
What is the significance of .*?. If you ask separately, I am going to repeat what you commented. In regular expressions, * is a greedy operator and ? is our life saver. Let us consider the string as
Stackoverflow is already overflowing with number of users.
Now, see the effect of the regular expression .*flow as below.
* matches any number of characters. More precisely, it matches the longest string possible while still allowing the pattern itself to match. So, due to this, .* in the pattern matched the characters Stackoverflow is already over and flow in pattern matched the text flow in the string.
Now, in order to prevent the .* to match only up to the first occurrence of the string flow, we are adding the ? to it. It will help the pattern to behave as non-greedy manner.
Now, again coming back to your question. If we have used .*\s\s, then it will match the whole line since it is trying to match as much as possible. This is common behavior of regular expressions.
Update version 3:
Have your code in the following way.
x=$(expect -c "
spawn ssh user#host
expect \"password\"
send \"password\r\"
expect {\\\$} { puts matched_literal_dollar_sign}
send \"cat input\r\"
expect -re {-\r\n(.*?)\s\s}
if {![info exists expect_out(1,string)]} {
puts \"Match did not happen :(\"
exit 1
}
set output \$expect_out(1,string)
#puts \$expect_out(1,string)
puts \"Result : \$output\"
")
y=$?
# $x now contains the output from the 'expect' command, and $y contains the
# exit status
echo $x
echo $y;
If the flow happened properly, then exit code will have value as 0. Else, it will have 1. With this way, you can check the return value in bash script.
Have a look at here to know about the info exists command.

Can awk deal with CSV file that contains comma inside a quoted field?

I am using awk to perform counting the sum of one column in the csv file. The data format is something like:
id, name, value
1, foo, 17
2, bar, 76
3, "I am the, question", 99
I was using this awk script to count the sum:
awk -F, '{sum+=$3} END {print sum}'
Some of the value in name field contains comma and this break my awk script.
My question is: can awk solve this problem? If yes, and how can I do that?
Thank you.
One way using GNU awk and FPAT
awk 'BEGIN { FPAT = "([^, ]+)|(\"[^\"]+\")" } { sum+=$3 } END { print sum }' file.txt
Result:
192
I am using
`FPAT="([^,]+)|(\"[^\"]+\")" `
to define the fields with gawk. I found that when the field is null this doesn't recognize correct number of fields. Because "+" requires at least 1 character in the field.
I changed it to:
`FPAT="([^,]*)|(\"[^\"]*\")"`
and replace "+" with "*". It works correctly.
I also find that GNU Awk User Guide also has this problem.
https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
You're probably better off doing it in perl with Text::CSV, since that's a fast and robust solution.
You can help awk work with data fields that contain commas (or newlines) by using a small script I wrote called csvquote. It replaces the offending commas inside quoted fields with nonprinting characters. If you need to, you can later restore those commas - but in this case, you don't need to.
Here is the command:
csvquote inputfile.csv | awk -F, '{sum+=$3} END {print sum}'
see https://github.com/dbro/csvquote for the code
For as simple an input file as that you can just write a small function to convert all of the real FSs outside of the quotes to some other value (I chose RS since the record separator cannot be part of the record) and then use that as the FS, e.g.:
$ cat decsv.awk
BEGIN{ fs=FS; FS=RS }
{
decsv()
for (i=1;i<=NF;i++) {
printf "Record %d, Field %d is <%s>\n" ,NR,i,$i
}
print ""
}
function decsv( curr,head,tail)
{
tail = $0
while ( match(tail,/"[^"]+"/) ) {
head = substr(tail, 1, RSTART-1);
gsub(fs,RS,head)
curr = curr head substr(tail, RSTART, RLENGTH)
tail = substr(tail, RSTART + RLENGTH)
}
gsub(fs,RS,tail)
$0 = curr tail
}
$ cat file
id, name, value
1, foo, 17
2, bar, 76
3, "I am the, question", 99
$ awk -F", " -f decsv.awk file
Record 1, Field 1 is <id>
Record 1, Field 2 is <name>
Record 1, Field 3 is <value>
Record 2, Field 1 is <1>
Record 2, Field 2 is <foo>
Record 2, Field 3 is <17>
Record 3, Field 1 is <2>
Record 3, Field 2 is <bar>
Record 3, Field 3 is <76>
Record 4, Field 1 is <3>
Record 4, Field 2 is <"I am the, question">
Record 4, Field 3 is <99>
It only becomes complicated when you have to deal with embedded newlines and embedded escaped quotes within the quotes and even then it's not too hard and it's all been done before...
See What's the most robust way to efficiently parse CSV using awk? for more information.
You can always tackle the problem from the source. Put quotes around the name field, just like the field of "I am the, question". This is much easier than spending your time coding workarounds for that.
Update(as Dennis requested). A simple example
$ s='id, "name1,name2", value 1, foo, 17 2, bar, 76 3, "I am the, question", 99'
$ echo $s|awk -F'"' '{ for(i=1;i<=NF;i+=2) print $i}'
id,
, value 1, foo, 17 2, bar, 76 3,
, 99
$ echo $s|awk -F'"' '{ for(i=2;i<=NF;i+=2) print $i}'
name1,name2
I am the, question
As you can see, by setting the delimiter to double quote, the fields that belong to the "quotes" are always on even number. Since OP doesn't have the luxury of modifying the source data, this method will not be appropriate to him.
This article did help me solve this same data field issue. Most CSV will put a quote around fields with spaces or commas within them. This messes up the field count for awk unless you filter them out.
If you need the data within those fields that contain the garbage, this is not for you. ghostdog74 provided the answer, which empties that field but maintains the total field count in the end, which is key to keeping the data output consistent. I did not like how this solution introduced new lines. This is the version of this solution I used. The fist three fields never had this problem in the data. The fourth field containing customer name often did, but I needed that data. The remaining fields that exhibit the problem I could throw away without issue because it was not needed in my report output. So I first sed out the 4th field's garbage very specifically and remove the first two instances of quotes. Then I apply what ghostdog74gave to empty the remaining fields that have commas within them - this also removes the quotes, but I use printfto maintain the data in a single record. I start off with 85 fields and end up with 85 fields in all cases from my 8000+ lines of messy data. A perfect score!
grep -i $1 $dbfile | sed 's/\, Inc.//;s/, LLC.//;s/, LLC//;s/, Ltd.//;s/\"//;s/\"//' | awk -F'"' '{ for(i=1;i<=NF;i+=2) printf ($i);printf ("\n")}' > $tmpfile
The solution that empties the fields with commas within them but also maintains the record, of course is:
awk -F'"' '{ for(i=1;i<=NF;i+=2) printf ($i);printf ("\n")}
Megs of thanks to ghostdog74 for the great solution!
NetsGuy256/
FPAT is the elegant solution because it can handle the dreaded commas within quotes problem, but to sum a column of numbers in the last column regardless of the number of preceding separators, $NF works well:
awk -F"," '{sum+=$NF} END {print sum}'
To access the second to last column, you would use this:
awk -F"," '{sum+=$(NF-1)} END {print sum}'
If you know for sure that the 'value' column is always the last column:
awk -F, '{sum+=$NF} END {print sum}'
NF represents the number of fields, so $NF is the last column
Fully fledged CSV parsers such as Perl's Text::CSV_XS are purpose-built to handle that kind of weirdness.
perl -MText::CSV_XS -lne 'BEGIN{$csv=Text::CSV_XS->new({allow_whitespace => 1})} if($csv->parse($_)){#f=$csv->fields();$sum+=$f[2]} END{print $sum}' file
allow_whitespace is needed since the input data has whitespace surrounding the comma separators. Very old versions of Text::CSV_XS may not support this option.
I provided more explanation of Text::CSV_XS within my answer here: parse csv file using gawk
you could try piping the file through a perl regex to convert the quoted , into something else like a |.
cat test.csv | perl -p -e "s/(\".+?)(,)(.+?\")/\1\|\3/g" | awk -F, '{...
The above regex assumes there is always a comma within the double quotes. so more work would be needed to make the comma optional
you write a function in awk like below:
$ awk 'func isnum(x){return(x==x+0)}BEGIN{print isnum("hello"),isnum("-42")}'
0 1
you can incorporate in your script this function and check whether the third field is numeric or not.if not numeric then go for the 4th field and if the 4th field inturn is not numberic go for 5th ...till you reach a numeric value.probably a loop will help here, and add it to the sum.

Are multi-line strings allowed in JSON?

Is it possible to have multi-line strings in JSON?
It's mostly for visual comfort so I suppose I can just turn word wrap on in my editor, but I'm just kinda curious.
I'm writing some data files in JSON format and would like to have some really long string values split over multiple lines. Using python's JSON module I get a whole lot of errors, whether I use \ or \n as an escape.
JSON does not allow real line-breaks. You need to replace all the line breaks with \n.
eg:
"first line
second line"
can be saved with:
"first line\nsecond line"
Note:
for Python, this should be written as:
"first line\\nsecond line"
where \\ is for escaping the backslash, otherwise python will treat \n as
the control character "new line"
Unfortunately many of the answers here address the question of how to put a newline character in the string data. The question is how to make the code look nicer by splitting the string value across multiple lines of code. (And even the answers that recognize this provide "solutions" that assume one is free to change the data representation, which in many cases one is not.)
And the worse news is, there is no good answer.
In many programming languages, even if they don't explicitly support splitting strings across lines, you can still use string concatenation to get the desired effect; and as long as the compiler isn't awful this is fine.
But json is not a programming language; it's just a data representation. You can't tell it to concatenate strings. Nor does its (fairly small) grammar include any facility for representing a string on multiple lines.
Short of devising a pre-processor of some kind (and I, for one, don't feel like effectively making up my own language to solve this issue), there isn't a general solution to this problem. IF you can change the data format, then you can substitute an array of strings. Otherwise, this is one of the numerous ways that json isn't designed for human-readability.
I have had to do this for a small Node.js project and found this work-around to store multiline strings as array of lines to make it more human-readable (at a cost of extra code to convert them to string later):
{
"modify_head": [
"<script type='text/javascript'>",
"<!--",
" function drawSomeText(id) {",
" var pjs = Processing.getInstanceById(id);",
" var text = document.getElementById('inputtext').value;",
" pjs.drawText(text);}",
"-->",
"</script>"
],
"modify_body": [
"<input type='text' id='inputtext'></input>",
"<button onclick=drawSomeText('ExampleCanvas')></button>"
],
}
Once parsed, I just use myData.modify_head.join('\n') or myData.modify_head.join(), depending upon whether I want a line break after each string or not.
This looks quite neat to me, apart from that I have to use double quotes everywhere. Though otherwise, I could, perhaps, use YAML, but that has other pitfalls and is not supported natively.
Check out the specification! The JSON grammar's char production can take the following values:
any-Unicode-character-except-"-or-\-or-control-character
\"
\\
\/
\b
\f
\n
\r
\t
\u four-hex-digits
Newlines are "control characters" so, no, you may not have a literal newline within your string. However you may encode it using whatever combination of \n and \r you require.
JSON doesn't allow breaking lines for readability.
Your best bet is to use an IDE that will line-wrap for you.
This is a really old question, but I came across this on a search and I think I know the source of your problem.
JSON does not allow "real" newlines in its data; it can only have escaped newlines. See the answer from #YOU. According to the question, it looks like you attempted to escape line breaks in Python two ways: by using the line continuation character ("\") or by using "\n" as an escape.
But keep in mind: if you are using a string in python, special escaped characters ("\t", "\n") are translated into REAL control characters! The "\n" will be replaced with the ASCII control character representing a newline character, which is precisely the character that is illegal in JSON. (As for the line continuation character, it simply takes the newline out.)
So what you need to do is to prevent Python from escaping characters. You can do this by using a raw string (put r in front of the string, as in r"abc\ndef", or by including an extra slash in front of the newline ("abc\\ndef").
Both of the above will, instead of replacing "\n" with the real newline ASCII control character, will leave "\n" as two literal characters, which then JSON can interpret as a newline escape.
Write property value as a array of strings. Like example given over here https://gun.io/blog/multi-line-strings-in-json/. This will help.
We can always use array of strings for multiline strings like following.
{
"singleLine": "Some singleline String",
"multiline": ["Line one", "line Two", "Line Three"]
}
And we can easily iterate array to display content in multi line fashion.
While not standard, I found that some of the JSON libraries have options to support multiline Strings. I am saying this with the caveat, that this will hurt your interoperability.
However in the specific scenario I ran into, I needed to make a config file that was only ever used by one system readable and manageable by humans. And opted for this solution in the end.
Here is how this works out on Java with Jackson:
JsonMapper mapper = JsonMapper.builder()
.enable(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS)
.build()
This is a very old question, but I had the same question when I wanted to improve readability of our Vega JSON Specification code which uses complex conditoinal expressions. The code is like this.
As this answer says, JSON is not designed for human. I understand that is a historical decision and it makes sense for data exchange purposes. However, JSON is still used as source code for such cases. So I asked our engineers to use Hjson for source code and process it to JSON.
For example, in Git for Windows environment,
you can download the Hjson cli binary and put it in git/bin directory to use.
Then, convert (transpile) Hjson source to JSON. To use automation tools such as Make will be useful to generate JSON.
$ which hjson
/c/Program Files/git/bin/hjson
$ cat example.hjson
{
md:
'''
First line.
Second line.
This line is indented by two spaces.
'''
}
$ hjson -j example.hjson > example.json
$ cat example.json
{
"md": "First line.\nSecond line.\n This line is indented by two spaces."
}
In case of using the transformed JSON in programming languages, language-specific libraries like hjson-js will be useful.
I noticed the same idea was posted in a duplicated question but I would share a bit more information.
You can encode at client side and decode at server side. This will take care of \n and \t characters as well
e.g. I needed to send multiline xml through json
{
"xml": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiID8+CiAgPFN0cnVjdHVyZXM+CiAgICAgICA8aW5wdXRzPgogICAgICAgICAgICAgICAjIFRoaXMgcHJvZ3JhbSBhZGRzIHR3byBudW1iZXJzCgogICAgICAgICAgICAgICBudW0xID0gMS41CiAgICAgICAgICAgICAgIG51bTIgPSA2LjMKCiAgICAgICAgICAgICAgICMgQWRkIHR3byBudW1iZXJzCiAgICAgICAgICAgICAgIHN1bSA9IG51bTEgKyBudW0yCgogICAgICAgICAgICAgICAjIERpc3BsYXkgdGhlIHN1bQogICAgICAgICAgICAgICBwcmludCgnVGhlIHN1bSBvZiB7MH0gYW5kIHsxfSBpcyB7Mn0nLmZvcm1hdChudW0xLCBudW0yLCBzdW0pKQogICAgICAgPC9pbnB1dHM+CiAgPC9TdHJ1Y3R1cmVzPg=="
}
then decode it on server side
public class XMLInput
{
public string xml { get; set; }
public string DecodeBase64()
{
var valueBytes = System.Convert.FromBase64String(this.xml);
return Encoding.UTF8.GetString(valueBytes);
}
}
public async Task<string> PublishXMLAsync([FromBody] XMLInput xmlInput)
{
string data = xmlInput.DecodeBase64();
}
once decoded you'll get your original xml
<?xml version="1.0" encoding="utf-8" ?>
<Structures>
<inputs>
# This program adds two numbers
num1 = 1.5
num2 = 6.3
# Add two numbers
sum = num1 + num2
# Display the sum
print('The sum of {0} and {1} is {2}'.format(num1, num2, sum))
</inputs>
</Structures>
\n\r\n worked for me !!
\n for single line break and \n\r\n for double line break
I see many answers here that may not works in most cases but may be the easiest solution if let's say you wanna output what you wrote down inside a JSON file (for example: for language translations where you wanna have just one key with more than 1 line outputted on the client) can be just adding some special characters of your choice PS: allowed by the JSON files like \\ before the new line and use some JS to parse the text ... like:
Example:
File (text.json)
{"text": "some JSON text. \\ Next line of JSON text"}
import text from 'text.json'
{text.split('\\')
.map(line => {
return (
<div>
{line}
<br />
</div>
);
})}}
Assuming the question has to do with easily editing text files and then manually converting them to json, there are two solutions I found:
hjson (that was mentioned in this previous answer), in which case you can convert your existing json file to hjson format by executing hjson source.json > target.hjson, edit in your favorite editor, and convert back to json hjson -j target.hjson > source.json. You can download the binary here or use the online conversion here.
jsonnet, which does the same, but with a slightly different format (single and double quoted strings are simply allowed to span multiple lines). Conveniently, the homepage has editable input fields so you can simply insert your multiple line json/jsonnet files there and they will be converted online to standard json immediately. Note that jsonnet supports much more goodies for templating json files, so it may be useful to look into, depending on your needs.
The reason OP asked is the same reason I ended up here. Had a json file with long text.
In VS Code it's just ALT+Z to turn on word wrapping in a json file. Changing the actual data isn't what you want, if all you really want is to read the contents of the file as a developer.
If it's just for presentation in your editor you may use ` instead of " or '
const obj = {
myMultiLineString: `This is written in a \
multiline way. \
The backside of it is that you \
can't use indentation on every new \
line because is would be included in \
your string. \
The backslash after each line escapes the carriage return.
`
}
Examples:
console.log(`First line \
Second line`);
will put in console:
First line Second line
console.log(`First line
second line`);
will put in console:
First line
second line
Hope this answered your question.