How can I process one-line JSON files using `jq`?

I have a file with one JSON object per line that looks similar to this:
{"fieldA":1, "fieldB":"foo"}
{"fieldA":2, "fieldB":"bar"}
{"fieldA":4, "fieldB":"foobar"}
...
How can I properly read this file using jq?
I tried:
cat myFile.json | jq '[.]'
but this returns something like:
[{
"fieldA":1,
"fieldB":"foo"
}]
[{
"fieldA":2,
"fieldB":"bar"
}]
[{
"fieldA":4,
"fieldB":"foobar"
}]
...
but I would like to receive this instead:
[{
"fieldA":1,
"fieldB":"foo"
},
{
"fieldA":2,
"fieldB":"bar"
},
{
"fieldA":4,
"fieldB":"foobar"
},
...]
Thanks in advance!

Are you sure you want that? What's your end goal? You can merge all of the inputs into a single array using jq -n '[inputs]' (-n stops jq from reading the first input into . as it normally would, so every input is available through inputs). However, that means jq can't produce any output, or do any further processing, until the entire input has been read, which may or may not be what you want.
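A minimal sketch of the difference, using the three records from the question. --slurp (-s) is another way to collect all inputs into one array, and the last command shows the streaming behaviour you keep if you don't collect at all. -c only compacts the output; drop it to get the pretty-printed layout from the question.

```shell
#!/usr/bin/env bash
# The three records from the question, one JSON object per line.
records='{"fieldA":1, "fieldB":"foo"}
{"fieldA":2, "fieldB":"bar"}
{"fieldA":4, "fieldB":"foobar"}'

# -n: don't read the first input into "."; [inputs] then collects every input.
printf '%s\n' "$records" | jq -c -n '[inputs]'

# --slurp (-s) reads all inputs into one array before the filter runs.
printf '%s\n' "$records" | jq -c -s '.'

# Without collecting, each object is handled as soon as it has been read.
printf '%s\n' "$records" | jq -c '.fieldB'
```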


Non-JSON output from gcloud ai-platform predict: parsing non-JSON output

I am using gcloud ai-platform predict with --json-request to call an endpoint and get predictions, as below:
gcloud ai-platform predict --json-request instances.json
However, the response is not JSON and hence cannot be processed further, which causes other complications. Below is the response:
VAL HS
0.5 {'hs_1': [[-0.134501, -0.307326, -0.151994, -0.065352, -0.14138]], 'hs_2' : [[-0.134501, -0.307326, -0.151994, -0.065352, 0.020759]]}
Can gcloud ai-platform predict return JSON instead, or can its output be parsed differently?
Thanks for your help.
Apparently, your output is a table with a header row and two columns: a score and the (alleged) JSON content. You should extract the second column of whichever data row you want (your example has only one, but in general you might receive several score/JSON pairs). Maybe your API already offers functionality to extract a certain 'state', e.g. the one with the highest score. If not, a simple awk or sed script can do the job easily.
Then, the only remaining issue before you have proper JSON (which can then be queried with jq) is the quoting style. Your output encloses field names in ' instead of " ('lstm_1' instead of "lstm_1"). Correcting this is unfortunately not an easy task if you can expect to receive arbitrarily complex data (such as strings containing quotation marks). However, if your output will always look as simple as in the example provided, simply substituting the wrong quotes with the right ones becomes an easy task again for tools like awk or sed.
For instance, use sed on your example output to select the second line (the first data row), drop everything from the beginning up to but not including the first opening curly brace (which marks the beginning of the second column), make the substitutions, and pipe the result into jq:
... | sed -n "2{s/^[^{]\+//;s/'/\"/g;p;q}" | jq .
{
"lstm_1": [
[
-0.13450142741203308,
-0.3073260486125946,
-0.15199440717697144,
-0.06535257399082184,
-0.1413831114768982
]
],
"lstm_2": [
[
-0.13450142741203308,
-0.3073260486125946,
-0.15199440717697144,
-0.06535257399082184,
0.02075939252972603
]
]
}
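The \+ in that sed expression is a GNU sed extension. A sketch of the same step that should also work with BSD/macOS sed, assuming (as in the example) that the first { on the row starts the JSON column and that no value contains a quote. The table below is a shortened, hypothetical sample shaped like the question's output.

```shell
#!/usr/bin/env bash
# Hypothetical sample mimicking the gcloud table: a header row, then "score {dict}".
table="VAL HS
0.5 {'hs_1': [[-1.5, 2]], 'hs_2' : [[0.25]]}"

# Line 2 only: drop everything before the first "{", turn ' into ", print, quit.
# [^{]* avoids GNU's \+, and ";}" closes the brace group the way BSD sed expects.
printf '%s\n' "$table" | sed -n "2{s/^[^{]*//;s/'/\"/g;p;q;}" | jq -c .
```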
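[Edited in response to a comment]
If you want to use the score as well, let jq handle it. Without the prefix-stripping step, the data row reaches jq as two separate JSON values (the score and the object), so --slurp (-s) collects them into a two-element array whose first and last elements are the score and the object. For instance: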
... | sed -n "2{s/'/\"/g;p;q}" | jq -s '{score:first,status:last}'
{
"score": 0.548,
"status": {
"lstm_1": [
[
-0.13450142741203308,
-0.3073260486125946,
-0.15199440717697144,
-0.06535257399082184,
-0.1413831114768982
]
],
"lstm_2": [
[
-0.13450142741203308,
-0.3073260486125946,
-0.15199440717697144,
-0.06535257399082184,
0.02075939252972603
]
]
}
}
[Edited to reflect changes in the question]
Since the changes affected only names and values, not the structure, the previous approach still works:
... | sed -n "2{s/'/\"/g;p;q}" | jq -s '{val:first,hs:last}'
{
"val": 0.5,
"hs": {
"hs_1": [
[
-0.134501,
-0.307326,
-0.151994,
-0.065352,
-0.14138
]
],
"hs_2": [
[
-0.134501,
-0.307326,
-0.151994,
-0.065352,
0.020759
]
]
}
}
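The answer above mentions that you might receive several score/JSON pairs and might want the one with the highest score. A minimal sketch of that general case, using a hypothetical two-row table and a hypothetical helper named to_pairs: once the header is dropped, jq sees a flat stream (score, object, score, object, ...), which range can regroup into pairs before max_by picks the best one.

```shell
#!/usr/bin/env bash
# Hypothetical two-row sample; the question's real output has a single data row.
table="VAL HS
0.5 {'hs_1': [[1, -2]]}
0.7 {'hs_1': [[3, -4]]}"

# Hypothetical helper. 1d drops the header; every ' becomes " (assumes no quotes
# inside strings). [inputs] is then [score, object, score, object, ...].
to_pairs() {
  sed "1d;s/'/\"/g" | jq -c -n '[inputs] as $v
    | [range(0; $v | length; 2) as $i | {val: $v[$i], hs: $v[$i + 1]}]'
}

printf '%s\n' "$table" | to_pairs
# Keep only the row with the highest score:
printf '%s\n' "$table" | to_pairs | jq -c 'max_by(.val)'
```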

How to find something in a json file using Bash

I would like to search a JSON file for some key or value, and have it print where it was found.
For example, when using jq to print out my Firefox extensions.json, I get something like this (using "..." here to skip long parts):
{
"schemaVersion": 31,
"addons": [
{
"id": "wetransfer#extensions.thunderbird.net",
"syncGUID": "{e6369308-1efc-40fd-aa5f-38da7b20df9b}",
"version": "2.0.0",
...
},
{
...
}
]
}
Say I would like to search for "wetransfer#extensions.thunderbird.net", and would like an output which shows me where it was found with something like this:
{ "addons": [ {"id": "wetransfer#extensions.thunderbird.net"} ] }
Is there a way to get that with jq or with some other json tool?
I also tried to simply list the various ids in that file, and hoped that I would get it with jq '.id', but that just returned null, because it apparently needs the full path.
In other words, I'm looking for a command-line JSON parser that I could use in a way similar to XPath tools.
The path() function comes in handy:
$ jq -c 'path(.. | select(. == "wetransfer#extensions.thunderbird.net"))' input.json
["addons",0,"id"]
The resulting path is interpreted as "In the addons field of the initial object, the first array element's id field matches". You can use it with getpath(), setpath(), delpaths(), etc. to get or manipulate the value it describes.
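A minimal sketch of combining path() with getpath(), assuming a trimmed-down, hypothetical stand-in for extensions.json: $p[:-1] drops the last step of the path, so getpath() returns the object that contains the matching id.

```shell
#!/usr/bin/env bash
# Hypothetical, trimmed-down stand-in for extensions.json.
json='{"schemaVersion":31,"addons":[{"id":"wetransfer#extensions.thunderbird.net","version":"2.0.0"},{"id":"other@example.org"}]}'
needle='wetransfer#extensions.thunderbird.net'

# path(...) yields the location of each match as an array; $p[:-1] is the
# path of the enclosing object, which getpath() then fetches.
printf '%s' "$json" | jq -c --arg s "$needle" '
  path(.. | select(. == $s)) as $p
  | {path: $p, parent: getpath($p[:-1])}'
```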
Using your example with modifications to make it valid JSON:
< input.json jq -c --arg s wetransfer#extensions.thunderbird.net '
paths as $p | select(getpath($p) == $s) | null | setpath($p;$s)'
produces:
{"addons":[{"id":"wetransfer#extensions.thunderbird.net"}]}
Note
If there are N paths to the given value, the above will produce N lines. If you want only the first, you could wrap everything in first(...).
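A sketch of both options on a hypothetical input in which the value occurs twice: first(...) keeps only the first match, while reduce folds every matching path into one skeleton object instead of printing N lines.

```shell
#!/usr/bin/env bash
# Hypothetical input in which the value "x" occurs twice.
json='{"a":"x","b":["x"]}'

# first(...) stops after the first matching path.
printf '%s' "$json" | jq -c --arg s x '
  first(paths as $p | select(getpath($p) == $s) | null | setpath($p; $s))'

# reduce applies setpath for every matching path to a single accumulator.
printf '%s' "$json" | jq -c --arg s x '
  reduce (paths as $p | select(getpath($p) == $s) | $p) as $p (null; setpath($p; $s))'
```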
Listing all the "id" values
I also tried to simply list the various ids in that file
Assuming that "id" values of false and null are of no interest, you can print all the "id" values of interest using the jq filter:
.. | .id? // empty
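A quick check of that filter on a small, hypothetical input: entries whose id is null or missing produce nothing, and .id? silently skips values that cannot be indexed by a key, such as the addons array itself or strings.

```shell
#!/usr/bin/env bash
# Hypothetical input: two real ids, one null id, one object without an id.
json='{"addons":[{"id":"a@example.org"},{"id":null},{"name":"no-id"},{"id":"b@example.org"}]}'

# .. visits every value; .id? yields null (or nothing) where there is no usable id,
# and // empty drops those results.
printf '%s' "$json" | jq -r '.. | .id? // empty'
```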

Search and extract value using JQ command line processor

I have a JSON file very similar to the following:
[
{
"uuid": "832390ed-58ed-4338-bf97-eb42f123d9f3",
"name": "Nacho"
},
{
"uuid": "5b55ea5e-96f4-48d3-a258-75e152d8236a",
"name": "Taco"
},
{
"uuid": "a68f5249-828c-4265-9317-fc902b0d65b9",
"name": "Burrito"
}
]
I am trying to figure out how to use the jq command-line processor to find the UUID that I input and, based on that, output the name of the associated item. For example, if I input the UUID a68f5249-828c-4265-9317-fc902b0d65b9, it should search the JSON file, find the matching UUID, and return the name Burrito. I am doing this in Bash. I realize it may require some outside logic in addition to jq. I will keep thinking about it and post an update here in a bit. I know I could do it in an overly complicated way, but there is probably a really simple jq method of doing this in one or two lines. Please help me.
https://shapeshed.com/jq-json/#how-to-find-a-key-and-value
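You can use select. The command below looks up the UUID for a given name; swap .name and .uuid to look up the name for a given UUID, as the question asks: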
jq -r --arg query Burrito '.[] | select( .name == $query ) | .uuid ' tst.json
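The question asks for the opposite direction (UUID in, name out). A minimal sketch with a hypothetical helper name_for_uuid, using the sample data from the question; --arg hands the UUID to jq as a string, so the filter needs no shell quoting tricks.

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
[
  {"uuid": "832390ed-58ed-4338-bf97-eb42f123d9f3", "name": "Nacho"},
  {"uuid": "5b55ea5e-96f4-48d3-a258-75e152d8236a", "name": "Taco"},
  {"uuid": "a68f5249-828c-4265-9317-fc902b0d65b9", "name": "Burrito"}
]
EOF

# Hypothetical helper: name_for_uuid UUID FILE prints the matching name(s),
# or nothing if no element has that UUID.
name_for_uuid() {
  jq -r --arg uuid "$1" '.[] | select(.uuid == $uuid) | .name' "$2"
}

name_for_uuid a68f5249-828c-4265-9317-fc902b0d65b9 "$sample"
```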

How to get a subobject out of JSON using jq, keeping final key in the result without Bash processing?

I'm writing a Bash function to get a portion of a JSON object. The API for the function is:
GetSubobject()
{
local Filter="$1" # Filter is of the form .<key>.<key> ... .<key>
local File="$2" # File is the JSON to get the subobject
# Code to get subobject using jq
# ...
}
To illustrate what I mean by a subobject, consider the Bash function call:
GetSubobject .b.x.y example.json
where the file example.json contains:
{
"a": { "p": 1, "q": 2 },
"b":
{
"x":
{
"y": { "j": true, "k": [1,2,3] },
"z": [4,5,6]
}
}
}
The result from the function call would be emitted to stdout:
{
"y": {
"j": true,
"k": [
1,
2,
3
]
}
}
Note that the code jq -r "$Filter" "$File" would not give the desired answer. It would give:
{ "j": true, "k": [1,2,3] }
Please note that the answer I'm looking for needs to be something I can use in the Bash function API above. So, the answer should use the Filter and File variables as shown above and not be specific to the example above.
I have come up with a solution; however, it relies on Bash to do part of the job. I am hoping that the solution can be pure jq without reliance on Bash processing.
#!/bin/bash
GetSubobject()
{
local Filter="$1"
local File="$2"
# General case: separate:
# .<key1>.<key2> ... .<keyN-1>.<keyN>
# into:
# Prefix=.<key1>.<key2> ... .<keyN-1>
# Suffix=<keyN>
local Suffix="${Filter##*.}"
local Prefix="${Filter%.$Suffix}"
# Edge case: where Filter = .<key>
# Set:
# Prefix=.
# Suffix=<key>
if [[ -z $Prefix ]]; then
Prefix='.'
Suffix="${Filter#.}"
fi
jq -r "$Prefix|to_entries|map(select(.key==\"$Suffix\"))|from_entries" "$File"
}
GetSubobject "$@"
How would I complete the above Bash function using jq to obtain the desired result, hopefully in a less brute-force way that takes advantage of jq's capabilities without having to do pre-processing in Bash?
Simplifying the jq part somewhat further, but with the same general constraints as JawguyChooser's answer, how about this much simpler Bash function:
GetSubobject () {
local newroot=${1##*.}
jq -r "{$newroot: $1}" "$2"
}
I may be overlooking some nuances of your more-complex Bash processing, but this seems to work for the example you provided.
If I understand what you're trying to do correctly, it doesn't seem possible to do it in "pure jq", having read the docs (and being a regular jq user myself). The closest I could come to helping here was to simplify the jq part itself:
jq -r "$Prefix| { $Suffix }" "$File"
This has the same behavior as your example (on this limited set of cases):
GetSubobject '.b.x.y' example.json
{
"y": {
"j": true,
"k": [
1,
2,
3
]
}
}
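The { $Suffix } part relies on jq's object-construction shorthand, in which {y} means {y: .y}. A small sketch with the Prefix/Suffix values that the question's Bash code would compute for .b.x.y:

```shell
#!/usr/bin/env bash
# The question's example.json, inlined.
json='{"a":{"p":1,"q":2},"b":{"x":{"y":{"j":true,"k":[1,2,3]},"z":[4,5,6]}}}'
Prefix='.b.x'   # what the question's code derives from .b.x.y
Suffix='y'

# Bash expands this to the jq program ".b.x| { y }", i.e. ".b.x | {y: .y}".
printf '%s' "$json" | jq -c "$Prefix| { $Suffix }"
```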
This is really a case of metaprogramming: you want to operate programmatically on a jq program. It makes sense (to me) that jq takes its program as input but doesn't let you alter the program itself. Bash seems like an appropriate choice for doing the metaprogramming here: convert one jq program into another, then run jq with that.
If the goal is to do as little as possible in bash, then maybe the following bash function will fill the bill:
function GetSubobject {
local Filter="$1" # Filter is of the form .<key>.<key> ... .<key>
local File="$2" # File is the JSON to get the subobject
jq '(null|path('"$Filter"')) as $path
| {($path[-1]): '"$Filter"'}' "$File"
}
An alternative would be to pass $Filter in as a string (e.g. --arg filter "$Filter"), have jq do the parsing, and then use getpath.
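A sketch of that alternative, assuming Filter is a plain dotted path such as .b.x.y (identifier-like keys, no brackets or quoted keys, at least one key). jq strips the leading dot, splits on the remaining dots, and uses the resulting array with getpath(). -c is used only to keep the output on one line; drop it for the pretty-printed form in the question.

```shell
#!/usr/bin/env bash
# The question's example.json, written to a temporary file.
example=$(mktemp)
trap 'rm -f "$example"' EXIT
cat > "$example" <<'EOF'
{
  "a": { "p": 1, "q": 2 },
  "b": { "x": { "y": { "j": true, "k": [1,2,3] }, "z": [4,5,6] } }
}
EOF

# Assumes Filter looks like .key1.key2...keyN (plain identifiers only).
GetSubobject() {
  local Filter="$1" File="$2"
  jq -c --arg filter "$Filter" '
    ($filter | ltrimstr(".") | split(".")) as $path
    | {($path[-1]): getpath($path)}' "$File"
}

GetSubobject .b.x.y "$example"
```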
It would of course be simplest if GetSubobject could be called with the path separated from the field of interest, like this:
GetSubobject .b.x y filename
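A sketch of that two-argument form; the parameter names Path and Key are hypothetical. The path is still spliced into the program text, but the key is passed via --arg, so jq builds {($key): .[$key]} without any further string manipulation in Bash.

```shell
#!/usr/bin/env bash
# The question's example.json, written to a temporary file.
example=$(mktemp)
trap 'rm -f "$example"' EXIT
cat > "$example" <<'EOF'
{
  "a": { "p": 1, "q": 2 },
  "b": { "x": { "y": { "j": true, "k": [1,2,3] }, "z": [4,5,6] } }
}
EOF

# Usage: GetSubobject PATH KEY FILE, e.g. GetSubobject .b.x y example.json
GetSubobject() {
  local Path="$1" Key="$2" File="$3"
  # \$key stays literal for jq; only $Path is expanded by Bash.
  jq -c --arg key "$Key" "$Path | {(\$key): .[\$key]}" "$File"
}

GetSubobject .b.x y "$example"
```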

jq construct with value strings spanning multiple lines

I am trying to form a JSON construct using jq that should ideally look like below:-
{
"api_key": "XXXXXXXXXX-7AC9-D655F83B4825",
"app_guid": "XXXXXXXXXXXXXX",
"time_start": 1508677200,
"time_end": 1508763600,
"traffic": [
"event"
],
"traffic_including": [
"unattributed_traffic"
],
"time_zone": "Australia/NSW",
"delivery_format": "csv",
"columns_order": [
"attribution_attribution_action",
"attribution_campaign",
"attribution_campaign_id",
"attribution_creative",
"attribution_date_adjusted",
"attribution_date_utc",
"attribution_matched_by",
"attribution_matched_to",
"attribution_network",
"attribution_network_id",
"attribution_seconds_since",
"attribution_site_id",
"attribution_site_id",
"attribution_tier",
"attribution_timestamp",
"attribution_timestamp_adjusted",
"attribution_tracker",
"attribution_tracker_id",
"attribution_tracker_name",
"count",
"custom_dimensions",
"device_id_adid",
"device_id_android_id",
"device_id_custom",
"device_id_idfa",
"device_id_idfv",
"device_id_kochava",
"device_os",
"device_type",
"device_version",
"dimension_count",
"dimension_data",
"dimension_sum",
"event_name",
"event_time_registered",
"geo_city",
"geo_country",
"geo_lat",
"geo_lon",
"geo_region",
"identity_link",
"install_date_adjusted",
"install_date_utc",
"install_device_version",
"install_devices_adid",
"install_devices_android_id",
"install_devices_custom",
"install_devices_email_0",
"install_devices_email_1",
"install_devices_idfa",
"install_devices_ids",
"install_devices_ip",
"install_devices_waid",
"install_matched_by",
"install_matched_on",
"install_receipt_status",
"install_san_original",
"install_status",
"request_ip",
"request_ua",
"timestamp_adjusted",
"timestamp_utc"
]
}
What I have tried unsuccessfully thus far is below:-
json_construct=$(cat <<EOF
{
"api_key": "6AEC90B5-4169-59AF-7AC9-D655F83B4825",
"app_guid": "komacca-s-rewards-app-au-ios-production-cv8tx71",
"time_start": 1508677200,
"time_end": 1508763600,
"traffic": ["event"],
"traffic_including": ["unattributed_traffic"],
"time_zone": "Australia/NSW",
"delivery_format": "csv"
"columns_order": ["attribution_attribution_action","attribution_campaign","attribution_campaign_id","attribution_creative","attribution_date_adjusted","attribution_date_utc","attribution_matched_by","attribution_matched_to","attributio
network","attribution_network_id","attribution_seconds_since","attribution_site_id","attribution_tier","attribution_timestamp","attribution_timestamp_adjusted","attribution_tracker","attribution_tracker_id","attribution_tracker_name","
unt","custom_dimensions","device_id_adid","device_id_android_id","device_id_custom","device_id_idfa","device_id_idfv","device_id_kochava","device_os","device_type","device_version","dimension_count","dimension_data","dimension_sum","ev
t_name","event_time_registered","geo_city","geo_country","geo_lat","geo_lon","geo_region","identity_link","install_date_adjusted","install_date_utc","install_device_version","install_devices_adid","install_devices_android_id","install_
vices_custom","install_devices_email_0","install_devices_email_1","install_devices_idfa","install_devices_ids","install_devices_ip","install_devices_waid","install_matched_by","install_matched_on","install_receipt_status","install_san_
iginal","install_status","request_ip","request_ua","timestamp_adjusted","timestamp_utc"]
}
EOF)
followed by:-
echo "$json_construct" | jq '.'
I get the following error:-
parse error: Expected separator between values at line 10, column 15
I am guessing jq is unable to parse it because the string literal spans multiple lines.
Use jq itself:
my_formatted_json=$(jq -n '{
"api_key": "XXXXXXXXXX-7AC9-D655F83B4825",
"app_guid": "XXXXXXXXXXXXXX",
"time_start": 1508677200,
"time_end": 1508763600,
"traffic": ["event"],
"traffic_including": ["unattributed_traffic"],
"time_zone": "Australia/NSW",
"delivery_format": "csv",
"columns_order": [
"attribution_attribution_action",
"attribution_campaign",
...,
"timestamp_utc"
]
}')
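If some of those values come from shell variables, they can be passed in with --arg (a string) and --argjson (any JSON value, here a number) instead of being pasted into the program text. A sketch with hypothetical variable names and a shortened column list; -R -n together with [inputs] turns one column name per line into a JSON array of strings.

```shell
#!/usr/bin/env bash
# Hypothetical shell variables standing in for whatever the script computes.
key='XXXXXXXXXX-7AC9-D655F83B4825'
start=1508677200
columns=(attribution_campaign attribution_tracker timestamp_utc)  # shortened list

# -R: read raw lines; -n: don't consume the first line as "."; [inputs]: all lines.
printf '%s\n' "${columns[@]}" | jq -c -R -n \
  --arg api_key "$key" --argjson time_start "$start" '
  {api_key: $api_key,
   time_start: $time_start,
   traffic: ["event"],
   columns_order: [inputs]}'
```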
Your input "JSON" is not valid JSON, as indicated by the error message.
The first error is a missing comma after the key/value pair "delivery_format": "csv" (line 10 in the error message is the "columns_order" line that follows it), but there are others; notably, JSON strings cannot be split across lines. Once you add the missing comma and rejoin the strings that were split across lines, jq . will work with your text. (Note that once your input is corrected, the longest JSON string is quite short, 50 characters or so, whereas jq can process strings of length 10^8 quite speedily.)
Generally, jq is rather permissive with JSON-like input, but if you're ever in doubt, it makes sense to use a validator such as the online one at jsonlint.com.
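jq itself can also serve as a local validator: jq empty parses all of its input and prints nothing, so a non-zero exit status means the input was not valid JSON (the parse error goes to stderr). A sketch with a hypothetical helper check_json, fed a cut-down version of the question's input with and without the missing comma:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads JSON on stdin and reports whether jq could parse it.
# Remove 2>/dev/null to see jq's parse error message.
check_json() {
  if jq empty 2>/dev/null; then echo valid; else echo invalid; fi
}

# Missing comma after "csv", as in the question:
printf '{\n"delivery_format": "csv"\n"columns_order": ["a"]\n}\n' | check_json
# Comma restored:
printf '{\n"delivery_format": "csv",\n"columns_order": ["a"]\n}\n' | check_json
```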
By the way, the jq FAQ does suggest various ways for handling input that isn't strictly JSON -- see https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json
Along the lines of chepner's suggestion: since jq can read raw text (-R), you could use a jq filter to generate a valid JSON object from your script variables. For example:
#!/bin/bash
# whatever logic you have to obtain bash variables goes here
key=XXXXXXXXXX-7AC9-D655F83B4825
guid=XXXXXXXXXXXXXX
# now use jq filter to read raw text and construct legal json object
json_construct=$(jq -MRn '[inputs]|map(split(" ")|{(.[0]):.[1]})|add' <<EOF
api_key $key
app_guid $guid
EOF
)
echo $json_construct
Sample run (assuming the executable script is saved as script.sh):
$ ./script.sh
{ "api_key": "XXXXXXXXXX-7AC9-D655F83B4825", "app_guid": "XXXXXXXXXXXXXX" }
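One limitation of split(" ") is that a value containing a space would be cut at that space, since only .[1] is kept. A sketch that splits each line at its first space only, assuming every line has the form <key> <value>; the note variable is a hypothetical value with a space in it. It also puts the closing ) of the command substitution on its own line after EOF.

```shell
#!/usr/bin/env bash
key=XXXXXXXXXX-7AC9-D655F83B4825
zone=Australia/NSW
note='two words'   # hypothetical value containing a space

# index(" ") finds the first space; the text before it becomes the key and
# everything after it becomes the value. add merges the one-key objects.
json_construct=$(jq -c -R -n '[inputs | index(" ") as $i | {(.[:$i]): .[$i + 1:]}] | add' <<EOF
api_key $key
time_zone $zone
note $note
EOF
)
echo "$json_construct"
```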