How to convert JSON from one format to another using jq

I would like to convert 500+ JSON files to CSV format using the solution provided in "How to convert arbitrary simple JSON to CSV using jq?", but my JSON files are not in the format that solution expects.
The following is a sample JSON file:
[
{
"jou_entry": {
"id": 655002886,
"units": 2
}
},
{
"jou_entry": {
"id": 655002823,
"units": 4
}
},
{
"jou_entry": {
"id": 657553949,
"units": 6
}
}
]
Whereas the proposed solution requires the JSON in the following format:
[
{
"id": 655002886,
"units": 2
},
{
"id": 655002823,
"units": 4
},
{
"id": 657553949,
"units": 6
}
]
I am able to convert the JSON from the source format to the required format using the following jq filter:
jq -r '[.[] | ."jou_entry"]'
But I don't like hard-coding the key "jou_entry" in the filter, because that would require defining the key individually for so many files. I would like to do the conversion without the hard-coded value.
How can I do this?

This gets the desired output:
jq '[.[] | .[]]'
Explanation from the manual: when .[] is used on an array, it returns all of the elements of the array. When it is used on an object, it returns all the values of the object.
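As a minimal sketch of the two behaviours of .[] described above, the following runs the filter on an inline, shortened copy of the question's sample. It assumes every array element is an object with exactly one wrapper key; with several keys per element, each value would come out as its own array entry. The variable names (input, unwrapped, csv) are illustrative only.

```shell
# Inline sample in the question's source format (one wrapper key per element).
input='[{"jou_entry":{"id":655002886,"units":2}},{"jou_entry":{"id":655002823,"units":4}}]'

# The outer .[] iterates the array; the inner .[] yields each object's values,
# so the wrapper key never has to be named.
unwrapped=$(printf '%s' "$input" | jq -c '[.[] | .[]]')
echo "$unwrapped"

# Assumption: the unwrapped records are flat and have "id" and "units" fields,
# so they can be turned into CSV rows directly.
csv=$(printf '%s' "$input" | jq -r '.[] | .[] | [.id, .units] | @csv')
echo "$csv"
```

The first command prints the array in the format required by the CSV solution; the second prints one CSV row per record.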

Related

Merge and Sort JSON using JQ

I have a file containing the following structure and unknown number of results:
{
"results": [
[
{
"field": "AccountID",
"value": "5177497"
},
{
"field": "Requests",
"value": "50900"
}
],
[
{
"field": "AccountID",
"value": "pro"
},
{
"field": "Requests",
"value": "251"
}
]
],
"statistics": {
"Matched": 51498,
"Scanned": 8673577,
"ScannedByte": 2.72400814E10
},
"status": "HOLD"
}
{
"results": [
[
{
"field": "AccountID",
"value": "5577497"
},
{
"field": "Requests",
"value": "51900"
}
],
"statistics": {
"Matched": 51498,
"Scanned": 8673577,
"ScannedByte": 2.72400814E10
},
"status": "HOLD"
}
There are multiple such objects, each holding an array under the "results" key. The objects are not separated by commas.
I am trying to print just the "AccountID" values sorted by "Requests" in zsh using jq. I have tried flattening them and using:
jq -r '.results[][0] |.value ' filename
jq -r '.results[][1] |.value ' filename
to get the account IDs and the request counts separately and then sort them. I don't think bash has a dictionary that can be used. The problem lies in the file: the field and the value are not stored as one key/value pair but as two separate pairs. Extracting them with the two commands above into separate arrays and sorting by the second array therefore seems too long-winded. I was wondering if there is a way to combine both operations.
The other way is to combine everything into a string and sort it. Python would probably have the best solution, but the code has to be a zsh or bash script.
Solutions that use sed, jq or any other tool available from zsh are welcome. If there is a way to create a dictionary in bash, please do let me know.
The expected output is just the account ID against the number of requests:
5577497 has 51900 requests
5177497 has 50900 requests
pro has 251 requests
If you don't mind learning a little jq, it will probably be best to write a small jq program to do what you want.
To get you started, consider the following jq program, which assumes your input is a stream of valid JSON objects with a "results" key similar to your sample:
[inputs | .results[] | map( { (.field) : .value} ) | add]
After making minor changes to your input so that it consists of valid JSON objects, an invocation of jq with the -n option produces an array of AccountID/Requests objects:
[
{
"AccountID": "5177497",
"Requests": "50900"
},
{
"AccountID": "pro",
"Requests": "251"
},
{
"AccountID": "5577497",
"Requests": "51900"
}
]
You could (for example) now use jq's group_by to group these objects by AccountID, and thereby produce the result you want.
jq -S '.results[] | map( { (.field) : .value} ) | add' query-results-aggregate \
| jq -s -c 'group_by(.Requests | tonumber) | .[]'
This does the trick. Thanks to peak for the guidance.
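For the exact report format requested in the question, a single jq invocation can merge, sort, and format in one pass. The sketch below is one possible way to do it, not the answerer's own code. It assumes the input has been repaired into a stream of valid JSON objects, and that "Requests" always holds a numeric string (hence tonumber). The variable names input and report are illustrative.

```shell
# Stream of result objects, as in the question (made valid JSON).
input='{"results":[[{"field":"AccountID","value":"5177497"},{"field":"Requests","value":"50900"}],[{"field":"AccountID","value":"pro"},{"field":"Requests","value":"251"}]]}
{"results":[[{"field":"AccountID","value":"5577497"},{"field":"Requests","value":"51900"}]]}'

# -n plus "inputs" reads every object in the stream; each inner array of
# field/value pairs is merged into one object, then the objects are sorted
# numerically by Requests (descending) and printed as text lines.
report=$(printf '%s\n' "$input" | jq -rn '
  [inputs | .results[] | map({(.field): .value}) | add]
  | sort_by(.Requests | tonumber) | reverse
  | .[] | "\(.AccountID) has \(.Requests) requests"')
echo "$report"
```

Sorting on the string value directly would compare the digits lexically, which is why the sketch converts to a number first.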

Why is JQ treating arrays as a single field in CSV output?

With the following input file:
{
"events": [
{
"mydata": {
"id": "123456",
"account": "21234"
}
},
{
"mydata": {
"id": "123457",
"account": "21234"
}
}
]
}
When I run it through this JQ filter,
jq ".events[] | [.mydata.id, .mydata.account]" events.json
I get a set of arrays:
[
"123456",
"21234"
]
[
"123457",
"21234"
]
When I put this output through the @csv filter to create CSV output:
jq ".events[] | [.mydata.id, .mydata.account] | @csv" events.json
I get a CSV file with a single field per row:
"\"123456\",\"21234\""
"\"123457\",\"21234\""
I would like a CSV file with two fields per row, like this:
"123456","21234"
"123457","21234"
What am I doing wrong?
Use the -r flag.
Here is the explanation in the manual:
--raw-output / -r: With this option, if the filter's result is a string then it will be written directly to standard output rather than being formatted as a JSON string with quotes.
jq -r '.events[] | [.mydata.id, .mydata.account] | @csv' events.json
yields:
"123456","21234"
"123457","21234"

Using jq to search value of a property and return another value

Sorry if this sounds too simple, but I am still learning and have spent a few hours looking for a solution. I have a large JSON file, and I would like to search for a specific value in one property and return the value of another property.
For example, from the data below, I would like to find all objects whose unique_number equals "123456" and return this value along with the IP address.
jq should return something like: 123456, 127.0.0.1
Since the file is going to be about 300 MB with many IP addresses, will there be any performance issues?
Partial JSON:
{
"ip": "127.0.0.1",
"data": {
"tls": {
"status": "success",
"protocol": "tls",
"result": {
"handshake_log": {
"server_hello": {
"version": {
"name": "TLSv1.2",
"value": 1111
},
"random": "dGVzdA==",
"session_id": "dGVzdA==",
"cipher_suite": {
"name": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
"value": 1122
},
"compression_method": 0
},
"server_certificates": {
"certificate": {
"raw": "dGVzdA==",
"parsed": {
"version": 3,
"unique_number": "123456",
"signature_algorithm": {
"name": "SHA256-RSA",
"oid": "1.2.4.5.6"
},
The straightforward way is to use the select filter (either standalone on multiple values or with map on an array) to keep all objects matching your criterion (e.g. unique_number equal to "123456"), and then transform them into your required output format (e.g. using string interpolation). The filters below assume the file is a top-level array of such objects.
jq -r '.[]
| select(.data.tls.result.handshake_log.server_certificates.certificate.parsed.unique_number=="123456")
| "\(.ip), \(.data.tls.result.handshake_log.server_certificates.certificate.parsed.unique_number)"'
Because the unique_number property is nested quite deeply and cumbersome to write twice, it makes sense to first transform your object into something simpler, then filter, and finally output in the desired format:
jq -r '.[]
| { ip, unique_number: .data.tls.result.handshake_log.server_certificates.certificate.parsed.unique_number }
| select(.unique_number=="123456")
| "\(.ip), \(.unique_number)"'
Alternatively using join:
.[]
| { ip, unique_number: .data.tls.result.handshake_log.server_certificates.certificate.parsed.unique_number }
| select(.unique_number=="123456")
| [.ip, .unique_number]
| join(", ")
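To avoid editing the filter for every search value, the value can be passed in from the shell with --arg. The following is a minimal sketch under the same assumption as the filters above (a top-level array of objects). The second record, its IP address, and the names wanted, input and match are made up for illustration. The output order follows the "123456, 127.0.0.1" form requested in the question.

```shell
# Two trimmed-down records; the second one is hypothetical test data.
input='[
  {"ip": "127.0.0.1",
   "data": {"tls": {"result": {"handshake_log": {"server_certificates":
     {"certificate": {"parsed": {"unique_number": "123456"}}}}}}}},
  {"ip": "10.0.0.1",
   "data": {"tls": {"result": {"handshake_log": {"server_certificates":
     {"certificate": {"parsed": {"unique_number": "999999"}}}}}}}}
]'

# --arg binds the shell value to $wanted as a JSON string, so it compares
# equal to the string-valued unique_number property.
match=$(printf '%s' "$input" | jq -r --arg wanted 123456 '
  .[]
  | { ip, unique_number: .data.tls.result.handshake_log.server_certificates.certificate.parsed.unique_number }
  | select(.unique_number == $wanted)
  | "\(.unique_number), \(.ip)"')
echo "$match"
```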

Perform string manipulation on a value and return the original JSON document with jq

In my JSON document I have a string that I need manipulated and then have the entire document returned with the 'fixed' values.
The input document is:
{
"records" : [
{
"time": "123456789000"
},
{
"time": "123456789000"
}
]
}
I want to find the "time" key and replace the string by dropping off the last 3 chars. The resulting document would be:
{
"records" : [
{
"time": "123456789"
},
{
"time": "123456789"
}
]
}
I've been trying to understand the jq query syntax but I'm not coming right. I'm still struggling to return the whole document when filtering on a specific value. All I have so far is:
.records[] | select(.time | contains("123456789000"))
Here is a solution using |= and string slicing
.records[].time |= .[:-3]
Sample Run (assuming data in data.json)
$ jq -M '.records[].time |= .[:-3]' data.json
{
"records": [
{
"time": "123456789"
},
{
"time": "123456789"
}
]
}
With jq's sub() function:
jq '.records[].time |= sub("[0-9]{3}$";"")' file
The output:
{
"records": [
{
"time": "123456789"
},
{
"time": "123456789"
}
]
}
Or, even simpler, divide the time value by 1000 (this matches the slicing result only when the last three digits are zeros):
jq '.records[].time |= (tonumber / 1000 | tostring)' file
The following works with jq version 1.4 or later:
jq '.records[].time |= .[:-3]' file.json
(The expression .[:-3] is short for .[0:-3]; the negative integer here counts from the right.)
With jq 1.3, the following filter would work in your particular case:
.records[].time |= (tonumber | ./1000 | tostring)
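The slicing and division approaches agree on the sample because its values end in "000". As a hedged sketch of how to keep them in agreement for other values, the variant below adds floor to the division. The second record ("123456789123") and the variable names are made up for illustration, and string slicing is assumed to be available in the installed jq.

```shell
# The second record is hypothetical: its last three digits are not zeros.
input='{"records":[{"time":"123456789000"},{"time":"123456789123"}]}'

# String slicing drops the last three characters regardless of their value.
sliced=$(printf '%s' "$input" | jq -c '.records[].time |= .[:-3]')
echo "$sliced"

# Division needs floor to discard the fractional part before converting
# back to a string; without it the second value would keep a fraction.
divided=$(printf '%s' "$input" | jq -c '.records[].time |= (tonumber / 1000 | floor | tostring)')
echo "$divided"
```

Both commands print the same document, with each "time" reduced to "123456789".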

Using jq to list keys in a JSON object

I have a hierarchically deep JSON object created by a scientific instrument, so the file is somewhat large (1.3MB) and not readily readable by people. I would like to get a list of keys, up to a certain depth, for the JSON object. For example, given an input object like this
{
"acquisition_parameters": {
"laser": {
"wavelength": {
"value": 632,
"units": "nm"
}
},
"date": "02/03/2525",
"camera": {}
},
"software": {
"repo": "github.com/username/repo",
"commit": "a7642f",
"branch": "develop"
},
"data": [{},{},{}]
}
I would like output like this:
{
"acquisition_parameters": [
"laser",
"date",
"camera"
],
"software": [
"repo",
"commit",
"branch"
]
}
This is mainly so that I can enumerate what is in a JSON object. After processing, the JSON objects from the instrument begin to diverge: for example, some may have a field like .frame.cross_section.stats.fwhm, while others may have .sample.species, so it would be convenient to be able to interrogate the JSON object on the command line.
The following should do what you want (note that keys returns the key names in sorted order):
jq '[(keys - ["data"])[] as $key | { ($key): .[$key] | keys }] | add'
This will give the following output, using the input you described above:
{
"acquisition_parameters": [
"camera",
"date",
"laser"
],
"software": [
"branch",
"commit",
"repo"
]
}
Given your purpose you might have an easier time using the paths builtin to list all the paths in the input and then truncate at the desired depth:
$ echo '{"a":{"b":{"c":{"d":true}}}}' | jq -c '[paths|.[0:2]]|unique'
[["a"],["a","b"]]
Here is another variation using reduce and setpath, which assumes you have a specific set of top-level keys you want to examine:
. as $v
| reduce ("acquisition_parameters", "software") as $k (
{}; setpath([$k]; $v[$k] | keys)
)
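When the top-level keys are not known in advance, a variation that names no keys at all may be more convenient. The sketch below is one possible approach, not part of the answers above: it keeps only the top-level entries whose value is an object, and uses keys_unsorted (assumed to be available in the installed jq) so the key names appear in their original order, as in the question's desired output. The variable names input and listing are illustrative.

```shell
# The question's sample input, compacted onto one line.
input='{"acquisition_parameters":{"laser":{"wavelength":{"value":632,"units":"nm"}},"date":"02/03/2525","camera":{}},"software":{"repo":"github.com/username/repo","commit":"a7642f","branch":"develop"},"data":[{},{},{}]}'

# with_entries visits each top-level key/value pair: non-object values
# (such as the "data" array) are dropped by select, and each remaining
# value is replaced by the list of its own keys in original order.
listing=$(printf '%s' "$input" | jq -c '
  with_entries(select(.value | type == "object") | .value |= keys_unsorted)')
echo "$listing"
```

Replacing keys_unsorted with keys would give the alphabetically sorted lists shown in the first answer.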