How to convert linefeed into literal "\n" - json

I'm having some trouble converting my file to a properly formatted json string.
Have been fiddling with sed for ages now, but it seems to mock me.
Am working on RHEL 6, if that matters.
I'm trying to convert this file (content):
Hi there...
foo=bar
tomàto=tomáto
url=http://www.stackoverflow.com
Into this json string:
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com"}
How would I replace the actual line feeds with the literal '\n' characters? This is where I'm utterly stuck!
I've been trying to convert the line feeds into ";" first and then into a literal "\n", and I've tried loops over each row in the file, but I can't make it work...
Some help is much appreciated!
Thanks!

sed is for simple substitutions on individual lines, that is all. Since sed works line by line, your sed script never sees the line endings, so you can't get it to change them without jumping through hoops using arcane language constructs and convoluted logic that hasn't been necessary since the mid-1970s, when awk was invented.
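You can see the limitation directly: sed removes the trailing newline before applying your script, so there is nothing for \n to match (a quick demo):
$ printf 'a\nb\n' | sed 's/\n/\\n/'
a
b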
This will change all newlines in your input file to the string \n:
$ awk -v ORS='\\n' '1' file
Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com\n
and this will do the rest:
$ awk -v ORS='\\n' 'BEGIN{printf "{\"text\":\""} 1; END{printf "\"}\n"}' file
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com\n"}
or this if you have a newline at the end of your input file but don't want it to become a \n string in the output:
$ awk -v ORS='\\n' '{rec = (NR>1 ? rec ORS : "") $0} END{printf "{\"text\":\"%s\"}\n", rec}' file
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com"}

With GNU sed (the :a;N;…;ta loop appends each next line with N and replaces the embedded newline with a literal \n until the input is exhausted):
sed ':a;N;s/\n/\\n/;ta' file | sed 's/.*/{"text":"&"}/'
Output:
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com"}

Use awk for this (RS=^$ is a GNU awk idiom for slurping the whole file into a single record):
awk -v RS=^$ '{gsub(/\n/,"\\n");sub(/^/,"{\"text\":\"");sub(/\\n$/,"\"}")}1' file
Output:
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com"}

awk to the rescue! (RS='\0' makes gawk read the whole file as one record, provided the input contains no NUL bytes)
$ awk -vRS='\0' '{gsub("\n","\\n");
print "{\"text\":\"" $0 "\"}"}' file
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com\n"}

This might work for you (GNU sed):
sed '1h;1!H;$!d;x;s/.*/"text":"&"/;s/\n/\\n/g' file
Slurp the file into memory and use pattern matching to manipulate the file to the desired format.

The simplest (and most elegant?) solution :)
#!/bin/bash
in=$(perl -pe 's/\n/\\n/' "$1")
cat<<EOF
{"text":"$in"}
EOF
Usage:
./script.sh file.txt
Output:
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com\n"}


prefix every header column with string using awk

I have a bunch of big CSVs and I want to prefix every header column with a fixed string. There are more than 500 columns in every file.
suppose my header is:
number;date;customer;key;amount
I tried this awk line:
awk -F';' 'NR==1{gsub(/[^a-z_]/,"input_file.")} { print }'
but I get this (note the first column is missing the prefix and the separators are removed):
numberinput_file.dateinput_file.customerinput_file.keyinput_file.amount
expected output:
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
In any awk that'd be:
$ awk 'NR==1{gsub(/^|;/,"&input_file.")} 1' file
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
but sed exists to do simple substitutions like that, e.g. using a sed that has -E to enable EREs (e.g. GNU and BSD sed):
$ sed -E '1s/^|;/&input_file./g' file
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
If you're using GNU tools then you could use either of the above to change all of your CSV files at once with either of these:
awk -i inplace 'NR==1{gsub(/^|;/,"&input_file.")} 1' *.csv
sed -i -E '1s/^|;/&input_file./g' *.csv
Your gsub would brutally replace every character that isn't a lowercase letter or underscore anywhere in the input with the prefix, including your column separators.
The print can be abbreviated to the common idiom 1 at the very end of your script; this simply means "this condition is true; perform the default action (print) for every line". That's just a stylistic change, though.
awk -F';' 'NR==1{
sub(/^/, "input_file."); gsub(/;/, ";input_file."); }
1' filename
If you want to perform this on multiple files, put a shell loop around it, as sketched below. If you only want to concatenate everything to standard output, you can give all the files to Awk in one go (in which case you probably don't want to print the header line for any file after the first; maybe change the 1 to NR==1 || FNR != 1).
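A minimal sketch of such a loop (written for any POSIX awk, going through a temp file since -i inplace is GNU-only):
for f in *.csv; do
  awk 'NR==1{sub(/^/,"input_file."); gsub(/;/,";input_file.")} 1' "$f" > "$f.tmp" &&
  mv -- "$f.tmp" "$f"
done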
I would use GNU AWK in the following way. Let file.txt content be
number;date;customer;key;amount
1;2;3;4;5
6;7;8;9;10
then
awk 'BEGIN{FS=";";OFS=";input_file."}NR==1{$1="input_file." $1}{print}' file.txt
output
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
1;2;3;4;5
6;7;8;9;10
Explanation: I set OFS to ; followed by the prefix. Then in the first line I add the prefix to the first column, which triggers rebuilding of the record. No other line is modified, so they are printed as-is.
(tested in GNU Awk 5.0.1)
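The rebuilding trick is easy to see in isolation: assigning to any field makes awk rejoin the record with OFS, for example:
$ echo 'a;b;c' | awk 'BEGIN{FS=";"; OFS=";input_file."} {$1="input_file." $1} 1'
input_file.a;input_file.b;input_file.c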
Also with awk, using a for loop and printf:
awk 'BEGIN{FS=OFS=";"} NR==1{for (i=1; i<=NF; i++) printf "%s%s", "input_file." $i, (i<NF ? OFS : ORS)}' file
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount

How to format a TXT file into a structured CSV file in bash?

I wanted to get some information about my CPU temperatures on my Linux server (openSUSE Leap 15.2), so I wrote a script which collects data every 20 seconds and writes it into a text file. I have now removed all the garbage data I don't need (like the "CPU Temp" labels).
Now I have a file like this:
47
1400
75
3800
The first two lines are one reading of the CPU temperature in C and the fan speed in RPM, respectively. The next two lines are another reading of the same measurements.
In the end I want this structure:
47,1400
75,3800
My question is: can a Bash script do this for me? I tried something with sed and awk but nothing worked perfectly for me. Furthermore I want a CSV file to make a graph, but I think it isn't a problem to convert a text file into a CSV file.
You could use paste
paste -d, - - < file.txt
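Each - consumes one line of standard input per output row, so with the sample data this prints:
47,1400
75,3800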
With pr (-t: no headers, -a: fill across, -2: two columns, -s,: join with a comma)
pr -ta2s, file.txt
with ed
ed -s file.txt <<-'EOF'
g/./s/$/,/\
;.+1j
,p
Q
EOF
You can use awk:
awk 'NR%2{printf "%s,",$0;next;}1' file.txt > file.csv
Another awk:
$ awk -v OFS=, '{printf "%s%s",$0,(NR%2?OFS:ORS)}' file
Output:
47,1400
75,3800
Explained:
$ awk -v OFS=, '{ # set output field delimiter to a comma
printf "%s%s", # using printf to control newline in output
$0, # output line
(NR%2?OFS:ORS) # and either a comma or a newline
}' file
Since you asked if a bash script can do this, here's a solution in pure bash. ;o]
c=0
while read -r line; do
if (( c++ % 2 )); then
echo "$line"
else printf "%s," "$line"
fi
done < file
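A variant that makes the pairing explicit by reading two lines per iteration (a sketch; it assumes the file always holds complete temperature/RPM pairs):
while read -r temp && read -r rpm; do
  printf '%s,%s\n' "$temp" "$rpm"
done < file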
Take a look at 'paste'. This will join multiple lines of text together into a single line and should work for what you want.
echo "${DATA}"
Name
SANISGA01CI
5WWR031
P59CSADB01
CPDEV02
echo "${DATA}"|paste -sd ',' -
Name,SANISGA01CI,5WWR031,P59CSADB01,CPDEV02

How to insert a new line with content in front of each line in json?

I think sed should be the command to do this, but I haven't figured out the proper command yet.
My json file looks like this:
{"LAST_MODIFIED_BY":"david","LAST_MODIFIED_DATE":"2018-06-26 12:02:03.0","CLASS_NAME":"/SC/Trade/HTS_CA/1234abcd","DECISION":"AGREE","TASK_TYPE_ID":"abcd1234-832b-43b6-afa6-361253ffe1d5","NAME":"something"}
{"LAST_MODIFIED_BY":"sarah","LAST_MODIFIED_DATE":"2018-08-26 12:02:03.0","CLASS_NAME":"/SC/Import/HTS_US/9876abcd","DECISION":"DISAGREE","TASK_TYPE_ID":"abcd1234-832b-43b6-afa6-361253ffe1d5","NAME":"nicename"}
... more rows to follow
What I want to achieve is a json document with the contents below:
{"index":{}}
{"LAST_MODIFIED_BY":"david","LAST_MODIFIED_DATE":"2018-06-26 12:02:03.0","CLASS_NAME":"/SC/Trade/HTS_CA/1234abcd","DECISION":"AGREE","TASK_TYPE_ID":"abcd1234-832b-43b6-afa6-361253ffe1d5","NAME":"something"}
{"index":{}}
{"LAST_MODIFIED_BY":"sarah","LAST_MODIFIED_DATE":"2018-08-26 12:02:03.0","CLASS_NAME":"/SC/Import/HTS_US/9876abcd","DECISION":"DISAGREE","TASK_TYPE_ID":"abcd1234-832b-43b6-afa6-361253ffe1d5","NAME":"nicename"}
... more rows to follow
so that I could run bulk load API against Elasticsearch.
The closest one is this one: Elasticsearch Bulk JSON Data, but it splits my json file into broken items instead of my desired format.
Any ideas how I can achieve this would be greatly appreciated!
Using sed:
sed 's/^/{"index":{}}\
/'
The trick here is the \.
Alternatively, if your shell supports it:
sed $'s/^/{"index":{}}\n/'
or (as per @sundeep's suggestion):
sed $'i\\\n{"index":{}}\n'
Using jq:
jq -nc 'inputs | {"index":{}}, .'
Here, -n keeps jq from consuming the first value on its own so that inputs streams every one, and the key is the -c option to produce JSON Lines output.
Using awk:
awk '{print "{\"index\":{}}"; print;}'
Etc.
This might work for you (GNU sed):
sed 'i{"index":{}}' file
Insert {"index":{}} before each line.

Why does appending text to the end of each line replace the first characters instead?

I searched everywhere but I haven't seen anyone have the same issue, let alone a solution to this. I am trying to add text at the end of each line like so:
"Name1";"2913"
"Name2";"2914"
into:
"Name1";"2913";""
"Name2";"2914";""
I have tried it with sed, awk (with gsub) and perl commands, but each time, instead of adding the ;"" to the end of each line, it just replaces the first 3 characters of each line with it:
"Name1";"2913"
becomes
;""me1";"2913"
It is not limited to just ;""; it happens with anything I try to add at the end of the line.
Code I tried:
cat list | sed 's/$/;""/'
cat list | awk '{gsub(/$/,";\"\"")}1'
each with the same outcome of:
;""me1";"2913"
;""me2";"2914"
Why is this happening?
Looks like OP may have control-M (carriage return) characters in OP's Input_file; in that case, could you please try the following.
awk -v s1="\"" 'BEGIN{FS=OFS=";"} {gsub(/\r/,"");$(NF+1)=s1 s1} 1' Input_file
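The same logic broken out with comments:
awk -v s1="\"" '
  BEGIN { FS = OFS = ";" }   # keep the semicolon-separated layout
  {
    gsub(/\r/, "")           # strip any DOS carriage returns
    $(NF+1) = s1 s1          # append a new field holding an empty quoted string: ""
  }
  1                          # always-true condition: print the rebuilt record
' Input_file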
2nd solution: With sed:
sed 's/\r//g;s/$/;""/' Input_file
Suggestions for OP's code:
We need not use cat with awk or sed; they are capable of reading the Input_file by themselves.
You could have control-M characters in your file; you could remove them by running tr -d '\r' < Input_file > temp && mv temp Input_file, or directly run the commands mentioned above to get rid of the carriage returns and get your output too.
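To confirm the carriage returns are actually there, GNU cat -A shows them as ^M before the end-of-line marker $ (sketched for the sample file):
$ cat -A list
"Name1";"2913"^M$
"Name2";"2914"^M$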

awk change datetime format

I have huge amount of files where each string is a json with incorrect date format. The format I have for now is 2011-06-02 21:43:59 and what I need to do is to add T in between to transform it to ISO format 2011-06-02T21:43:59.
Can somebody, please, point me to some one liner solution? Was struggling with this for 2 hours, but no luck.
sed will come to your rescue, with a simple regex:
sed 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\) /\1T/g' file > file.new
or, to modify the file in place:
sed -i 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\) /\1T/g' file
Example
echo '2011-06-02 21:43:59' | sed 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\) /\1T/g'
2011-06-02T21:43:59
Read more about regexes here: Regex Tag Info
The following seems to be the working solution:
sed -i -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})/\1T\2/g' myfiles
-i to edit the files in place
-r to switch on extended regular expressions
([0-9]{4}-[0-9]{2}-[0-9]{2}) - is for the date
the literal space - is the separator between date and time in the source data
([0-9]{2}:[0-9]{2}:[0-9]{2}) - is for the time
Also with awk, you can use capture groups with gensub:
awk '{
print gensub(/([0-9]{4}-[0-9]{2}-[0-9]{2})\s+([0-9]{2}:[0-9]{2}:[0-9]{2})/,
"\\1T\\2",
"g");
}' data.txt
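A quick check on the sample timestamp (gensub and the \s shorthand are GNU awk extensions):
$ echo '2011-06-02 21:43:59' | awk '{print gensub(/([0-9]{4}-[0-9]{2}-[0-9]{2})\s+([0-9]{2}:[0-9]{2}:[0-9]{2})/, "\\1T\\2", "g")}'
2011-06-02T21:43:59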
echo '2011-06-02 21:43:59' | awk 'sub(/ /,"T")'
2011-06-02T21:43:59
Note that sub(/ /,"T") replaces only the first space on each line and prints a line only if a substitution happened, so it is only safe when the timestamp's space is the first space on the line.