How can I extract specific lines from a JSON document?

I have a JSON file with thousands of lines and 90 JSON objects.
Each object has the following structure:
{
"country_codes": [
"GB"
],
"institution_id": "ins_118309",
"name": "Barclaycard (UK) - Online Banking: Personal", // I want to extract this line only
"oauth": true,
"products": [
"assets",
"auth",
"balance",
"transactions",
"identity",
"standing_orders"
],
"routing_numbers": []
},
For all ninety objects, I would like to delete every line except the one with the name of the institution.
I guess I will have to use a regex here?
I'm happy to use Vim, Sublime, VS Code or any other code editor that will allow me to do so.
How can I extract these lines so that I am left with the following 90 lines?
"name": "Barclaycard (UK) - Online Banking: Personal",
"name": "Metro Bank - Commercial and Business Online Plus",
...
...
"name": "HSBC (UK) - Business",

If you must use a code editor, then in Vim you can delete all lines not matching a pattern with:
:v/^\s*"name":/d
(:v is the same as :g!, so it runs d on every line that does not match.)
The above pattern says:
^ line begins with
\s* zero or more whitespace characters
"name": (pretty self-explanatory)
That said, it's better to use a dedicated tool for parsing JSON files rather than a regex, as JSON is not a 'regular language'.
Bonus
If you do end up doing it in Vim, you can finish up by left-aligning all the lines with :%left, or even just :%le.

That doesn't sound like the job for a text editor or even for regular expressions. How about using the right tool for the job™?
# print only the desired fields to stdout
$ jq '.[] | .name' < in.json
# write only the desired fields to file
$ jq '.[] | .name' < in.json > out.json
See https://stedolan.github.io/jq/.
If you really want to do it from a text editor, the simplest is still to filter the current buffer through a specialized external tool. In Vim, it would look like this:
:%!jq '.[] | .name'
See :help filter.
FWIW, here it is with another right tool for the job™:
:%!jj \\#.name -l
See https://github.com/tidwall/jj.
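If jq isn't available, the same parse-then-select idea behind .[] | .name can be sketched with Python's standard json module. This is a minimal sketch for illustration, not part of the answer above; it assumes the file is a top-level JSON array of objects that each have a "name" key (the sample is a cut-down version of the question's data) and reproduces the question's "name": "...", output format.

```python
import json


def institution_name_lines(text):
    """Return one '"name": "...",' line per object in a top-level JSON array.

    Assumes the document is a JSON array of objects that each have a "name"
    key, which is also what jq's '.[] | .name' expects.
    """
    institutions = json.loads(text)
    # json.dumps re-adds the quotes (and any escaping) around each name;
    # ensure_ascii=False keeps accented characters readable.
    return [
        '"name": ' + json.dumps(inst["name"], ensure_ascii=False) + ","
        for inst in institutions
    ]


sample = """[
  {"institution_id": "ins_118309",
   "name": "Barclaycard (UK) - Online Banking: Personal",
   "oauth": true},
  {"name": "HSBC (UK) - Business", "routing_numbers": []}
]"""

for line in institution_name_lines(sample):
    print(line)
```

To run it on the real file, pass in the file's contents, e.g. institution_name_lines(open("in.json", encoding="utf-8").read()), with in.json as in the jq commands.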

You could also use grep:
grep '^\s*"name":' your_file.json
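The Vim :v command and this grep call both keep lines by their shape rather than by their place in the JSON structure. The Python sketch below (an illustration, not taken from either answer) applies the same ^\s*"name": pattern line by line and left-aligns the result like :%left. The nested "parent" object in the sample is hypothetical; it shows the caveat that a line-based regex also keeps "name" keys that are not institution names, which a JSON parser would not.

```python
import re

# Same pattern as the Vim :v command and the grep call: optional leading
# whitespace, then the literal key "name":
NAME_LINE = re.compile(r'^\s*"name":')


def keep_name_lines(text):
    """Keep only lines that start with "name": and left-align them (like :%left)."""
    return [line.lstrip() for line in text.splitlines() if NAME_LINE.match(line)]


# "parent" is a hypothetical nested object, added only to show the caveat.
sample = """[
  {
    "institution_id": "ins_118309",
    "name": "Barclaycard (UK) - Online Banking: Personal",
    "parent": {
      "name": "nested object, not an institution"
    }
  }
]"""

print(keep_name_lines(sample))
```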

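In VS Code:
select "name": (the key whose lines you want to keep)
execute Selection > Select All Occurrences
Arrow Left
Ctrl+X (with nothing selected, this cuts every line that has a cursor)
Esc
Ctrl+A
Ctrl+V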

In VS Code (although I would expect the same to work in any regex-capable editor), enable regular expressions in the Find widget and use this Find:
^(?!\s*"name":.*).*\n?|^\s*
and replace with nothing.
^(?!\s*"name":.*).*\n? : matches every line that does not start with (optional whitespace and) "name":, including its newline, so that line is discarded completely.
^\s* : matches the whitespace before "name":..., which is also discarded since all matches are replaced with nothing.
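To see what the Find pattern does outside the editor, here is a Python sketch that runs the same regex with re.MULTILINE (so ^ matches at every line start) and replaces every match with nothing. It assumes Python's re treats the pattern the way VS Code's JavaScript regex engine does, which should hold for these constructs on plain \n-terminated text; the sample is a shortened, indented version of the question's data.

```python
import re

# The Find pattern from the answer; re.MULTILINE makes ^ match at every line start.
FIND = re.compile(r'^(?!\s*"name":.*).*\n?|^\s*', re.MULTILINE)


def keep_only_name_lines(text):
    """Replace every match with nothing, as the editor's Replace All would."""
    return FIND.sub("", text)


sample = (
    '[\n'
    '  {\n'
    '    "institution_id": "ins_118309",\n'
    '    "name": "Barclaycard (UK) - Online Banking: Personal",\n'
    '    "oauth": true\n'
    '  },\n'
    '  {\n'
    '    "name": "HSBC (UK) - Business",\n'
    '    "routing_numbers": []\n'
    '  }\n'
    ']\n'
)

print(keep_only_name_lines(sample), end="")
```

Non-"name" lines are removed together with their newline by the first alternative, while on "name" lines the lookahead fails and the second alternative only strips the indentation.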

Parsing JSON in Vim natively:
call getline(1, '$')
\ ->join("\n")
\ ->json_decode()
\ ->map({_, v -> printf('"name": "%s",', v.name)})
\ ->append('$')
NB: line continuation (the leading \) only works when sourcing the script from a file. If you run it interactively, type the command on a single line.

Related

Only want filename from url in json file

I have the following json file below:
{"cloud":"https://cloudfronturl/folder/folder",
"env": "int"
"sources":["https://www.example.com/some.tar.gz","https://www.example2.com/folder1/folder2/another.tar.gz"],
"owner": "some manager"
}
How can I modify the file to read like below, where only the file names stripped from sources url? Don't want to touch cloud value
{"cloud":"https://cloudfronturl/folder/folder",
"env": "int"
"sources":["some.tar.gz","another.tar.gz"],
"owner": "some manager"
}
Assuming your JSON snippet is fixed (it is missing a comma after "int") and using jq is an option, you could do:
jq '.sources[] |= ( split("/") | last )'
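The jq filter parses the document and rewrites only the sources array. A Python sketch of the same structural approach (an equivalent written for illustration, not part of the answer) splits each URL on "/" and keeps the last piece, as split("/") | last does; the sample is the question's JSON with the missing comma after "int" added.

```python
import json


def strip_source_urls(text):
    """Keep only the file name of each URL in "sources"; leave other keys alone."""
    config = json.loads(text)
    # Like jq's split("/") | last: keep everything after the final "/".
    config["sources"] = [url.split("/")[-1] for url in config["sources"]]
    return config


# The question's JSON, with the missing comma after "int" added.
fixed = """{"cloud":"https://cloudfronturl/folder/folder",
"env": "int",
"sources":["https://www.example.com/some.tar.gz","https://www.example2.com/folder1/folder2/another.tar.gz"],
"owner": "some manager"
}"""

print(json.dumps(strip_source_urls(fixed)))
```

Like jq, this prints new JSON rather than editing the file in place, and json.dumps uses its own spacing, so the original layout is not preserved.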
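Use this Perl one-liner (the if /"sources"/ guard limits the substitution to the line containing sources, so the cloud URL is left alone):
perl -i.bak -pe 's{http[^"]+/}{}g if /"sources"/' in_file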
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
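The regex uses this modifier:
/g : Match the pattern repeatedly.
The pattern itself:
http[^"]+/ : literal http, followed by one or more characters other than ", followed by a literal /. Because + is greedy, the match runs to the last / before the closing quote, so everything up to the file name is removed.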
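Applied to every line, this pattern also matches the cloud URL (https://cloudfronturl/folder/ is removed, leaving "folder"), which the question wanted left alone. The Python sketch below mirrors the regex with re.sub to show both that effect and a guard that only touches the line containing "sources"; the guard assumes, as in the question's file, that sources sits on its own line.

```python
import re

# Same pattern as the Perl one-liner: "http", one or more non-quote characters,
# then "/". The greedy + backs off only as far as the last "/" before a quote.
URL_PREFIX = re.compile(r'http[^"]+/')


def strip_all(line):
    """Apply the substitution to any line, like the unguarded s{...}{}g."""
    return URL_PREFIX.sub("", line)


def strip_sources_only(line):
    """Apply it only to the line containing "sources" (assumes one key per line)."""
    return strip_all(line) if '"sources"' in line else line


sources = '"sources":["https://www.example.com/some.tar.gz","https://www.example2.com/folder1/folder2/another.tar.gz"],'
cloud = '{"cloud":"https://cloudfronturl/folder/folder",'

print(strip_all(sources))         # file names only
print(strip_all(cloud))           # the cloud URL gets shortened too
print(strip_sources_only(cloud))  # left untouched
```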
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start

How can I format a json file into a bash environment variable?

I'm trying to take the contents of a config file (JSON format), strip out extraneous new lines and spaces to be concise and then assign it to an environment variable before starting my application.
This is where I've got so far:
pwr_config=`echo "console.log(JSON.stringify(JSON.parse(require('fs').readFileSync(process.argv[2], 'utf-8'))));" | node - config.json | xargs -0 printf '%q\n'` npm run start
This pipes a short node.js app into the node runtime taking an argument of the file name and it parses and stringifies the JSON file to validate it and remove any unnecessary whitespace. So far so good.
The result of this is then piped to printf, or at least it would be but printf doesn't support input in this way, apparently, so I'm using xargs to pass it in in a way it supports.
I'm using the %q formatter to format the string escaping any characters that would be a problem as part of a command, but when calling printf through xargs, printf claims it doesn't support %q. I think this is perhaps because there is more than one version of printf but I'm not exactly sure how to resolve that.
Any help would be appreciated, even if the solution is completely different from what I've started :) Thanks!
Update
Here's the output I get on MacOS:
$ cat config.json | xargs -0 printf %q
printf: illegal format character q
My JSON file looks like this:
{
"hue_host": "192.168.1.2",
"hue_username": "myUsername",
"port": 12000,
"player_group_config": [
{
"name": "Family Room",
"player_uuid": "ATVUID",
"hue_group": "3",
"on_events": ["media.play", "media.resume"],
"off_events": ["media.stop", "media.pause"]
},
{
"name": "Lounge",
"player_uuid": "STVUID",
"hue_group": "1",
"on_events": ["media.play", "media.resume"],
"off_events": ["media.stop", "media.pause"]
}
]
}
Two ways:
Have xargs run bash so that bash's printf builtin (which supports %q) is used instead of the printf(1) executable, probably /usr/bin/printf (thanks to @GordonDavisson). The "$@" hands xargs' argument on to printf, and the _ fills bash's $0 slot:
pwr_config=`echo "console.log(JSON.stringify(JSON.parse(require('fs').readFileSync(process.argv[2], 'utf-8'))));" | node - config.json | xargs -0 bash -c 'printf "%q\n" "$@"' _` npm run start
Simpler: you don't have to escape the output of a command if you quote it. In the same way that echo "<|>" is OK in bash, this should also work:
pwr_config="$(echo "console.log(JSON.stringify(JSON.parse(require('fs').readFileSync(process.argv[2], 'utf-8'))));" | node - config.json )" npm run start
This uses the newer $(...) form instead of backticks; the result of the command is stored as-is, as a single word, in the pwr_config variable.*
Even simpler: if your npm run start script cares about the whitespace in your JSON, it's fundamentally broken :). Just do:
pwr_config="$(< config.json)" npm run start
The $(<...) returns the contents of config.json. They are all stored as a single word ("") into pwr_config, newlines and all.* If something breaks, either config.json has an error and should be fixed, or the code you're running has an error and needs to be fixed.
* You actually don't need the "" around $(). E.g., foo=$(echo a b c) and foo="$(echo a b c)" have the same effect. However, I like to include the "" to remind myself that I am specifically asking for all the text to be kept together.
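If you would rather keep the validate-and-minify step without node, here is a Python sketch of what the JSON.parse/JSON.stringify pair does: json.loads rejects invalid JSON, and json.dumps with separators=(",", ":") writes it back without spaces. This is an alternative for illustration, not part of the answer above; the sample is a trimmed copy of the question's config.json.

```python
import json


def minify_json(text):
    """Parse (which validates) and re-serialise JSON without extra whitespace.

    separators=(",", ":") drops the spaces json.dumps adds by default, matching
    the compact output of JSON.stringify in the node one-liner.
    """
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


config = """{
  "hue_host": "192.168.1.2",
  "port": 12000,
  "player_group_config": [{"name": "Lounge", "hue_group": "1"}]
}"""

print(minify_json(config))
```

From the shell, the same thing fits on one line (assuming python3 is on your PATH):
pwr_config="$(python3 -c 'import json, sys; print(json.dumps(json.load(open(sys.argv[1])), separators=(",", ":")))' config.json)" npm run start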

map, but with newline characters between keypairs

Say I have input like
{"DESCRIPTION": "Need to run script to do stuff", "PRIORITY": "Medium"}
but also get input like
{"STACK_NAME": "applecakes", "BACKEND_OR_INTEGRATIONS": "integrations", "PRIORITY": "Medium"}
ie, the parameters can be completely different.
I need to get the output in a format more friendly to send to Jira to make tickets. Specifically, I would like to strip the json formatting away, and insert a \n between each keypair. Here's what the above samples should look like:
DESCRIPTION: Need to run script to do stuff\nPRIORITY: Medium
STACK_NAME: applecakes\nBACKEND_OR_INTEGRATIONS: integrations\nPRIORITY: Medium
There can be a little flexibility in that if, for example, more spaces were needed or whatever.
So far I've worked this out (assuming my input is stored in a variable called description):
echo $description | jq -r "to_entries|map(\"\(.key)=\(.value|tostring)\")|.[]"
This works to strip away the JSON formatting, but doesn't handle newlines. I'm stumped on how to make sure I split only on each keypair, not on say every space or anything equally messy. What do I need to add to include newlines? Is a map even my best choice?
Just join the array of strings with \\n (the \ character, which we need to escape, followed by the n character) and use raw output:
jq --raw-output 'to_entries | map("\(.key) : \(.value)") | join("\\n")'
Or more efficiently and more simply:
jq -r 'to_entries[] | "\(.key) : \(.value)"'
This produces one line per key-value pair.
The two-character sequence \n as a join-string
With your sample JSON, the invocation:
jq -j -r 'to_entries[] | "\(.key) : \(.value)", "\\n" '
would produce:
STACK_NAME : applecakes\nBACKEND_OR_INTEGRATIONS : integrations\nPRIORITY : Medium\n
Notice the trailing "\n".
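The same formatting written out in Python, as a sketch for readers who build the Jira text outside jq (not part of either answer): it joins "KEY: value" pairs with the two characters \ and n, and mimics jq's string interpolation by leaving strings raw and writing other values as JSON. The ": " separator follows the question's desired output rather than the " : " used in the jq commands.

```python
import json


def jira_text(raw):
    """Join "KEY: value" pairs with the two characters backslash and n."""
    pairs = json.loads(raw)
    parts = []
    for key, value in pairs.items():
        # jq's string interpolation uses tostring: strings stay raw,
        # anything else (numbers, booleans, arrays) is written as JSON.
        text = value if isinstance(value, str) else json.dumps(value)
        parts.append(f"{key}: {text}")
    return "\\n".join(parts)


print(jira_text('{"STACK_NAME": "applecakes", "BACKEND_OR_INTEGRATIONS": "integrations", "PRIORITY": "Medium"}'))
```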

Is there a `jq` command line tool or wrapper which lets you interactively explore `jq` similar to `jmespath.terminal`

jq is a lightweight and flexible command-line JSON processor.
https://stedolan.github.io/jq/
Is there a jq command line tool or wrapper which lets you pipe output into it and interactively explore jq, with the JSON input in one pane and your interactively updating result in another pane, similar to jmespath.terminal?
I'm looking for something similar to the JMESPath Terminal jpterm
"JMESPath exploration tool in the terminal"
https://github.com/jmespath/jmespath.terminal
I found this project jqsh but it's not maintained and it appears to produce a lot of errors when I use it.
https://github.com/bmatsuo/jqsh
I've used https://jqplay.org/ and it's a great web based jq learning tool. However, I want to be able to, in the shell, pipe the json output of a command into an interactive jq which allows me to explore and experiment with jq commands.
Thanks in advance!
I've been using jiq and I'm pretty happy with it.
https://github.com/fiatjaf/jiq
It's jid with jq.
You can drill down interactively by using jq filtering queries.
jiq uses jq internally, and it requires you to have jq in your PATH.
Using the AWS CLI:
aws ec2 describe-regions --region-names us-east-1 us-west-1 | jiq
jiq output
[Filter]> .Regions
{
"Regions": [
{
"Endpoint": "ec2.us-east-1.amazonaws.com",
"RegionName": "us-east-1"
},
{
"Endpoint": "ec2.us-west-1.amazonaws.com",
"RegionName": "us-west-1"
}
]
}
https://github.com/simeji/jid
n.b. I'm not clear how strictly it follows jq syntax and feature set
You may have to roll your own.
Of course, jq itself is interactive in the sense that if you invoke it without specifying any JSON input, it will process STDIN interactively.
If you want to feed the same data to multiple programs, you could easily write your own wrapper. Over at GitHub, there's a bash script named jqplay that has a few bells and whistles. For example, if the input command begins with | then the most recent result is used as input.
Example 1
./jqplay -c spark.json
Enter a jq filter (possibly beginning with "|"), or blank line to terminate:
.[0]
{"name":"Paddington","lovesPandas":null,"knows":{"friends":["holden","Sparky"]}}
.[1]
{"name":"Holden"}
| .name
"Holden"
| .[0:1]
"H"
| length
1
.[1].name
"Holden"
Bye.
Example 2
./jqplay -n
Enter a jq filter (possibly beginning and/or ending with "|"), or blank line to terminate:
?
An initial | signifies the filter should be applied to the previous jq
output.
A terminating | causes the next line that does not trigger a special
action to be appended to the current line.
Special action triggers:
:exit # exit this script, also triggered by a blank line
:help # print this help
:input PATHNAME ...
:options OPTIONS
:save PN # save the most recent output in the named file provided
it does not exist
:save! PN # save the most recent output in the named file
:save # save to the file most recently specified by a :save command
:show # print the OPTIONS and PATHNAMEs currently in effect
:! PN # equivalent to the sequence of commands
:save! PN
:input PN
? # print this help
# # ignore this line
1+2
3
:exit
Bye.
If you're using Emacs (or willing to) then JQ-mode allows you to run JQ filters interactively on the current JSON document buffer:
https://github.com/ljos/jq-mode
There is a new one: https://github.com/PaulJuliusMartinez/jless
JLess is a command-line JSON viewer designed for reading, exploring, and searching through JSON data.
JLess will pretty print your JSON and apply syntax highlighting.
Expand and collapse Objects and Arrays to grasp the high- and low-level structure of a JSON document. JLess has a large suite of vim-inspired commands that make exploring data a breeze.
JLess supports full text regular-expression based search. Quickly find the data you're looking for in long String values, or jump between values for the same Object key.

Alter log file date with the command sed?

I have the following line multiple times in a log file, along with other data.
I'd like to analyze this data by first importing the JSON part into MongoDB and then running selected queries over it.
DEBUG 2015-04-18 23:13:23,374 [TEXT] (Class.java:19) - {"a":"1", "b":"2", ...}
To extract just the JSON part I use:
cat mylog.log | sed "s/DEBUG.*19) - //g" > mylog.json
The main problem is that I'd like to add the date and time as well, as additional JSON values, to get something like this:
{"date": "2015-04-18", "time":"23:13:23,374", "a":"1", "b":"2", ...}
Here is the main question: how can I do this from the Linux console with sed, or with an alternative console command?
thx in advance
Since this appears to be a very rigid format, you could probably use sed like so:
sed 's/DEBUG \([^ ]*\) \([^ ]*\).*19) - {/{ "date": "\1", "time": "\2", /' mylog.log
Where [^ ]* matches a sequence of non-space characters and \(regex\) is a capturing group that makes a matched string available for use in the replacement as \1, \2, and so forth depending on its position. You can see these used in the replacement part.
If it were me, though, I'd use Perl for its ability to split a line into fields and match non-greedily:
perl -ape 's/.*?{/{ "date": "$F[1]", "time": "$F[2]", /' mylog.log
The latter replaces everything up to the first { (because .*? matches non-greedily) with the string you want. $F[1] and $F[2] are the second and third whitespace-delimited fields in the line; -a makes Perl split the line into the @F array this way.
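For comparison, here is a Python sketch of the same transformation that swaps the text substitution for actual JSON parsing: it takes the date and time from the second and third whitespace-separated fields (like $F[1] and $F[2]) and merges them into the decoded payload. It assumes the payload starts at the first { on the line and is valid JSON, which is why the sample drops the question's ... placeholder.

```python
import json


def log_line_to_json(line):
    """Turn 'DEBUG <date> <time> ... - {json}' into one JSON object with date and time first.

    Assumes, like the Perl answer, that the date and time are the 2nd and 3rd
    whitespace-separated fields and that the JSON payload starts at the first "{".
    """
    fields = line.split()
    payload = json.loads(line[line.index("{"):])  # trailing newline is allowed
    record = {"date": fields[1], "time": fields[2]}
    record.update(payload)  # a "date"/"time" key in the payload would win
    return json.dumps(record)


line = 'DEBUG 2015-04-18 23:13:23,374 [TEXT] (Class.java:19) - {"a":"1", "b":"2"}\n'
print(log_line_to_json(line))
```

Printing one object per line, e.g. for line in open("mylog.log"): print(log_line_to_json(line)), gives newline-delimited JSON, which is the format mongoimport reads by default. Since the question says the log also contains other data, lines without a JSON payload would need filtering out first (line.index raises ValueError when there is no {).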