Insert comma into string at the place you want - json

I'm trying to extract the balance from this string (which I've already done) and add a comma to it, like 6841,12691421 (also already done), BUT there's a problem with doing it the way I did.
{
"address": "NKNXyCmatuYuAnMFufdDnLL82qmvgB4uAYt6",
"count_transactions": 59606,
"first_transaction": "2020-08-07 17:25:51",
"last_transaction": "2021-05-02 09:09:24",
"balance": 684112691421,
"name": []
}
I did it with (excuse the noob code):
sed -n -r 's/(^.*balance":)([^"]+)".*/\2/p' | sed -e 's/[",]//g' | sed 's/./&,/4'
The problem:
The sed 's/./&,/4' is static: it always inserts the comma after the 4th character. When the balance is one digit shorter, the output is wrong; for example, for 68411269142 the result should be 684,11269142.
I need a solution that counts the comma insertion place from the right, 8 characters in.

Two jq-only solutions:
a) without any regex overhead:
jq -r '.balance | tostring | .[:-8] + "," + .[-8:]'
b) with regex:
jq -r '.balance | tostring | sub("(?<tail>[0-9]{8}$)"; ",\(.tail)")'
Caveat
Unfortunately these jq-only solutions will only work reliably for integers with fewer than 16 digits unless you have a sufficiently recent version of jq (after 1.6).
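The reason is that older jq parses integers as IEEE 754 doubles, so values that need more than about 16 significant digits are silently rounded before tostring ever sees them. A quick way to test your jq, using a 17-digit integer that has no exact double representation:
$ jq -n '99999999999999999'
100000000000000000
If you see the rounded value shown above, your jq is affected; jq 1.7 and later print the number back unchanged.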

You may use this single sed:
sed -En 's/.*"balance": *([^",]+).*/\1/; s/[0-9]{8}$/,&/p' file
6841,12691421
s/[0-9]{8}$/,&/ matches the trailing 8 digits and inserts a comma before them; the -n/p combination prints only the line where that substitution succeeded

With jq and sed:
jq '.balance' file.json | sed -E 's/.{8}$/,&/'
Output:
6841,12691421
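If you prefer doing the splitting in awk rather than sed, substr can count eight characters from the right (a sketch; it assumes the extracted balance always has more than eight digits):
jq '.balance' file.json | awk '{print substr($0, 1, length($0)-8) "," substr($0, length($0)-7)}'
Output:
6841,12691421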

find and extract substring from string [duplicate]

This command
ubus -S call system board
gives me this output
{"kernel":"4.14.195","hostname":"OpenWrt","system":"ARMv7 Processor rev 1 (v7l)","model":"Linksys WRT32X","board_name":"linksys,venom","release":{"distribution":"OpenWrt","version":"19.07.4","revision":"r11208-ce6496d796","target":"mvebu/cortexa9","description":"OpenWrt 19.07.4 r11208-ce6496d796"}}
I want to just extract the model and, if there's a space, replace it with an underscore so I end up with
Linksys_WRT32X
Your command's output is JSON; you can extract the value of the "model" field with jq and then use sed to replace any spaces with underscores:
<command> | jq -r '.model' | sed 's/ /_/g'
or using one jq command to select and replace the text value (thanks @Cyrus):
<command> | jq -r '.model | sub(" "; "_")'
If you don't have jq, here is an awk for this:
awk -v RS=, -F: '$1 ~ /model/{gsub(/\"/,""); gsub(" ","_"); print $2}'
Bear in mind that one awk or sed for this specific output is fine, but it could break on arbitrary JSON: JSON is not plain text; it can be spread over multiple lines, it can contain extra whitespace in various places, and so on.
There are many ways to do this. Here's another one:
ubus -S call system board | sed 's/.*"model":"\([^"]*\)".*$/\1/' | tr ' ' _
I would attempt to answer this with my own version. Since GNU grep supports Perl-compatible regex, we can get the result using:
ubus -S call system board | grep -oP '"model":"\K[^"]+' | tr ' ' _
This takes advantage of JSON key-value pairs having the form "key-name":"<value>": the pattern matches "model":" and \K then discards that prefix from the reported match, so grep prints only the <value> part.
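Given the ubus output above, this prints:
Linksys_WRT32X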

Extract json value with sed

I have a json result and I would like to extract a string without double quotes
{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[]}
With this regex I can extract the value3 (2019-10-24T15:26:00.000Z) correctly:
sed -e 's/^.*"value3":"\([^"]*\)".*$/\1/'
How can I extract the "value2" result, a string without double quotes?
I need to do this with sed because I can't install jq. That's my problem.
With GNU sed for -E to enable EREs:
$ sed -E 's/.*"value3":"?([^,"]*)"?.*/\1/' file
2019-10-24T15:26:00.000Z
$ sed -E 's/.*"value2":"?([^,"]*)"?.*/\1/' file
2.5
With any POSIX sed:
$ sed 's/.*"value3":"\{0,1\}\([^,"]*\)"\{0,1\}.*/\1/' file
2019-10-24T15:26:00.000Z
$ sed 's/.*"value2":"\{0,1\}\([^,"]*\)"\{0,1\}.*/\1/' file
2.5
The above assumes you never have commas inside quoted strings.
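To illustrate that caveat, a quoted value that itself contains a comma gets silently truncated, because [^,"]* stops at the first comma:
$ echo '{"value3":"Oct 24, 2019"}' | sed -E 's/.*"value3":"?([^,"]*)"?.*/\1/'
Oct 24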
Just run jq, a command-line JSON processor:
$ json_data='{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[]}'
$ jq '.value2' <(echo "$json_data")
2.5
with the key .value2 to access the value you are interested in.
This link summarizes why you should NOT use regex for parsing JSON (the same goes for XML/HTML and other data structures that can, in theory, be nested infinitely deep):
Regex for parsing single key: values out of JSON in Javascript
If you do not have jq available, you can use the following GNU grep command:
$ echo '{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[]}' | grep -zoP '"value2":\s*\K[^\s,]*(?=\s*,)'
2.5
using the regex detailed here:
"value2":\s*\K[^\s,]*(?=\s*,)
demo: https://regex101.com/r/82J6Cb/1/
This will even work if the JSON is not linearized!
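One caveat: with -z, GNU grep also terminates the match output with a NUL byte rather than a newline, which can confuse downstream tools; piping through tr converts it back:
$ echo '{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[]}' | grep -zoP '"value2":\s*\K[^\s,]*(?=\s*,)' | tr '\0' '\n'
2.5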
With Python it is also pretty direct, and you should have it installed by default on your machine; even if it is not Python 3, this should work:
$ cat data.json
{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[]}
$ cat extract_value2.py
import json

with open('data.json') as f:
    data = json.load(f)

print(data["value2"])
$ python extract_value2.py
2.5
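The same extraction also fits in a one-liner, with no script file needed (works under both Python 2 and Python 3):
$ python -c 'import json, sys; print(json.load(sys.stdin)["value2"])' < data.json
2.5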
You can try this:
creds=$(eval aws secretsmanager get-secret-value --region us-east-1 --secret-id dpi/dev/hivemetastore --query SecretString --output text )
passwd=$(/bin/echo "${creds}" | /bin/sed -n 's/.*"password":"\(.*\)",/\1/p' | awk -F"\"" '{print $1}')
It is definitely possible to remove the awk part, though...
To extract all values in proper list form to a file, using sed (Linux):
sed 's/["{}\]//g' <your_file.json> | sed 's/,/\n/g' >> <your_new_file_to_save>
sed 's/regexp/replacement/g' inputFileName > outputFileName
In some versions of sed, the expression must be preceded by -e to indicate that an expression follows.
The s stands for substitute, while the g stands for global, which means that all matching occurrences in the line would be replaced.
The bracket expression holds the characters you want to remove from the .json file (here: double quotes, braces, and backslashes).
The pipe character | is used to connect the output from one command to the input of another.
Then, the last thing I did is substitute each , with \n, known as a line break.
If you want to show a single value, see the command below:
sed 's/["{}\]//g' <your_file.json> | sed 's/,/\n/g' | sed 's/<ur_value>//p'
The trailing p prints each line on which the substitution matched, i.e. the lines that contained <ur_value> (with the matched text itself removed); note that without -n those lines are printed twice, so sed -n 's/<ur_value>//p' is usually what you want.
If your data is in file 'd', try GNU sed:
sed -E 's/[{,]"\w+":([^,"]+)/\1\n/g ;s/(.*\n).*".*\n/\1/' d

How to use non-displaying characters like newline (\n) and tab (\t) with jq's "join" function

I couldn't find this anywhere on the internet, so figured I'd add it as documentation.
I wanted to join a json array around the non-displaying character \30 ("RecordSeparator") so I could safely iterate over it in bash, but I couldn't quite figure out how to do it. I tried echo '["one","two","three"]' | jq 'join("\30")' and a couple permutations of that, but it didn't work.
Turns out the solution is pretty simple.... (See answer)
Use jq -j to eliminate literal newlines between records and use only your own delimiter. This works in your simple case:
#!/usr/bin/env bash
data='["one","two","three"]'
sep=$'\x1e' # works only for non-NUL characters, see NUL version below
while IFS= read -r -d "$sep" rec || [[ $rec ]]; do
printf 'Record: %q\n' "$rec"
done < <(jq -j --arg sep "$sep" 'join($sep)' <<<"$data")
...but it also works in a more interesting scenario where naive answers fail:
#!/usr/bin/env bash
data='["two\nlines","*"]'
while IFS= read -r -d $'\x1e' rec || [[ $rec ]]; do
printf 'Record: %q\n' "$rec"
done < <(jq -j 'join("\u001e")' <<<"$data")
returns (when run on Cygwin, hence the CRLF):
Record: $'two\r\nlines'
Record: \*
That said, if using this in anger, I would suggest using NUL delimiters, and filtering them out from the input values:
#!/usr/bin/env bash
data='["two\nlines","three\ttab-separated\twords","*","nul\u0000here"]'
while IFS= read -r -d '' rec || [[ $rec ]]; do
printf 'Record: %q\n' "$rec"
done < <(jq -j '[.[] | gsub("\u0000"; "#NUL#")] | join("\u0000")' <<<"$data")
NUL is a good choice because it's a character that can't be stored in C strings (like the ones bash uses) at all, so there's no loss in the range of data which can be faithfully conveyed when it's excised -- if a NUL did make it through to the shell, bash would (depending on version) either discard it or truncate the string at the point where it first appears.
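As a side note, on bash 4.4 or newer the read loop can be replaced with mapfile, which collects the NUL-delimited records into an array in one call (a sketch under the same assumptions as the loop above):
#!/usr/bin/env bash
data='["two\nlines","three\ttab-separated\twords","*","nul\u0000here"]'
# -d '' splits on NUL bytes; -t drops the delimiter from each element
mapfile -d '' -t recs < <(jq -j '[.[] | gsub("\u0000"; "#NUL#")] | join("\u0000")' <<<"$data")
printf 'Record: %q\n' "${recs[@]}"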
The recommended way to solve the problem is to use the -c command-line
option, e.g. as follows:
echo "$data" | jq -c '.[]' |
while read -r rec
do
echo "Record: $rec"
done
Output:
Record: "one"
Record: "two"
Record: "three"
Problems with the OP's proposed answer
There are several problems with the proposal in the OP's answer based on $'\30':
First, it doesn't work reliably, e.g. using bash on a Mac
the output is: Record: "one\u0018two\u0018three";
this is because jq correctly converts octal 30 to \u0018
within the JSON string.
Second, RS is ASCII decimal 30, i.e. octal 36, which
would be written as $'\36' in the shell.
If you use this value instead, the program produces:
Record: "one\u001etwo\u001ethree" because that is
the correct JSON string with embedded RS characters. (For the record $'\30' is Control-X.)
Third, as noted by Charles Duffy, "for rec in $(...) is inherently buggy."
Fourth, any approach which assumes jq will in future accept
illegal JSON strings is brittle in the sense that in the
future, jq might disallow them or at least require a command-line
switch to allow them.
Fifth, unset IFS is not guaranteed to restore IFS to its state beforehand.
The RS character is special in jq when used with the --seq command-line option. For example, with a JSON array stored in a shell variable named data we could invoke jq as follows:
$ jq -n --seq --argjson arg '[1,2]' '$arg | .[]'
Here is a transcript:
$ data='["one","two","three"]'
$ jq -n --seq --argjson arg "$data" '$arg | .[]' | tr $'\36' X
X"one"
X"two"
X"three"
$
You simply use bash's $'\30' syntax to insert the special character in-line, like so: echo '["one","two","three"]' | jq '. | join("'$'\30''")'.
Here's the whole working example:
data='["one","two","three"]'
IFS=$'\30'
for rec in $(echo "$data" | jq '. | join("'$'\30''")'); do
echo "Record: $rec"
done
unset IFS
This prints
Record: one
Record: two
Record: three
as expected.
NOTE: It's important not to quote the subshell in the for loop. If you quote it, it will be taken as a single argument, regardless of the RecordSeparator characters. If you don't quote it, it will work as expected.

awk change datetime format

I have a huge number of files where each line is a JSON document with an incorrect date format. The format I have for now is 2011-06-02 21:43:59, and what I need to do is add a T in between to transform it to the ISO format 2011-06-02T21:43:59.
Can somebody please point me to a one-liner solution? I was struggling with this for two hours, but no luck.
sed will come to your rescue, with a simple regex:
sed 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\) /\1T/g' file > file.new
or, to modify the file in place:
sed -i 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\) /\1T/g' file
Example
echo '2011-06-02 21:43:59' | sed 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\) /\1T/g'
2011-06-02T21:43:59
Read more about regexes here: Regex Tag Info
The following seems to be the working solution:
sed -i -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})/\1T\2/g' myfiles
-i edits the files in place
-r switches on extended regular expressions
([0-9]{4}-[0-9]{2}-[0-9]{2}) matches the date
the literal space matches the separator between date and time in the source data
([0-9]{2}:[0-9]{2}:[0-9]{2}) matches the time
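For example, applied to a JSON line (ts is a hypothetical field name):
$ echo '{"ts":"2011-06-02 21:43:59"}' | sed -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})/\1T\2/g'
{"ts":"2011-06-02T21:43:59"}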
Also with GNU awk, you can match groups with gensub:
awk '{
print gensub(/([0-9]{4}-[0-9]{2}-[0-9]{2})\s+([0-9]{2}:[0-9]{2}:[0-9]{2})/,
"\\1T\\2",
"g");
}' data.txt
echo '2011-06-02 21:43:59' | awk 'sub(/ /,"T")'
2011-06-02T21:43:59
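Since each line is JSON anyway, jq is another option; this sketch assumes a hypothetical field name date holding the timestamp:
$ echo '{"date":"2011-06-02 21:43:59"}' | jq -c '.date |= sub(" "; "T")'
{"date":"2011-06-02T21:43:59"}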

removing commas from numbers in CSV file

I have a file that has many columns and I only need two of those columns. I am getting the columns I need using
cut -f 2-3 -d, file1.csv > file2.csv
The issue I am having is that the first column is an ID, and once it gets past 999 it becomes 1,000, so it is then treated as an extra column. I can't get rid of all commas because I need them to separate the data. Is there a way to use sed to remove only the commas that appear between digits?
I'd use a real CSV parser, and count backwards from the end of the line:
ruby -rcsv -ne '
row = $_.parse_csv
puts row[-5..-4].to_csv :force_quotes => true
' <<END
999,"someone#example.com","Doe, John","Doe","555-1212","address"
1,234,"email#email.com","name","lastname","phone","address"
END
"someone#example.com","Doe, John"
"email#email.com","name"
This works for the example in the comments:
awk -F'"?,"' '{print $2, $3}' file
The field separator is zero or one " followed by ,". This means that the comma inside the first number, which is not followed by a quote, doesn't count as a separator.
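For instance, with the second sample line from the Ruby example above:
$ echo '1,234,"email@email.com","name","lastname","phone","address"' | awk -F'"?,"' '{print $2, $3}'
email@email.com name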
To separate the two fields with a comma instead of a space, you can change the OFS variable like this:
awk -F'"?,"' -v OFS=',' '{print $2, $3}' file
Or like this:
awk -F'"?,"' 'BEGIN{OFS=","}{print $2, $3}' file
Alternatively, if you want the quotes as well, you can use printf:
awk -F'"?,"' '{printf "\"%s\",\"%s\"\n", $2, $3}' file
From your comments, it sounds like there is a comma and a space (', ') pattern between tokens.
If this is the case, you can do this easily with sed. The strategy is to first replace all occurrences of ', ' (comma-space) with some unique character sequence (like ||):
's:, :||:g'
From there you can remove all commas:
's:,::g'
Finally, replace the double pipes with comma-space again.
's:||:, :g'
Putting it into one statement:
sed -i -e 's:, :||:g;s:,::g;s:||:, :g' your_odd_file.csv
And a command-line example to try before you buy:
bash$ sed -e 's:, :||:g;s:,::g;s:||:, :g' <<< "1,200,000, hello world, 123,456"
1200000, hello world, 123456
If you are in the unfortunate situation where there is no space between fields in the CSV, you can attempt to 'fake it' by detecting changes in data type, like where a numeric field is followed by a text field:
's:,\([^0-9]\):, \1:g' # numeric followed by non-numeric
's:\([^0-9]\),:\1, :g' # non-numeric field followed by something (anything)
You can put this all together into one statement, but you are venturing into dangerous waters here: this will definitely be a one-off solution and should be taken with a large grain of salt.
sed -e 's:,\([^0-9]\):, \1:g;s:\([^0-9]\),:\1, :g' \
-e 's:, :||:g;s:,::g;s:||:, :g' file1.csv > file2.csv
And another example:
bash$ sed -e 's:,\([^0-9]\):, \1:g;s:\([^0-9]\),:\1, :g' \
-e 's:, :||:g;s:,::g;s:||:, :g' <<< "1,200,000,hello world,123,456"
1200000, hello world, 123456