How to convert JSON to TSV using jq in Unix?

I need to convert this JSON to a TSV format. I've a source file like this:
{
"event": "log",
"timestamp": 1535306331840,
"tags": [
"info"
],
"data": {
"_id": "A301180827005852329209020",
"msisdn": "6282134920902",
"method": "get",
"url": "/api/tcash/balance",
"timeTaken": 32,
"channelid": "UX"
},
"pid": 7920
}
Then I want to convert it to TSV with the following columns:
event, timestamp, tags, _id, msisdn, method, url, timeTaken, channelID, pid

You just have to construct an array of atomic values. Since .tags is not atomic, in the following I'll assume (as suggested by @chepner) that we can use .tags|join(","), though you might want to use something else, such as .tags|@csv:
[.event, .timestamp, (.tags | join(","))]
+ (.data|[._id, .msisdn, .method, .url, .timeTaken, .channelid])
+ [.pid]
| @tsv
Note that the key in the sample input is spelled channelid (all lowercase), so that is what the filter uses; jq keys are case-sensitive.
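For readers who want to check the row construction without jq, here is a minimal Python sketch of the same idea: build a flat list of atomic values, joining the tags with commas, then write it tab-separated. The names DATA_KEYS, to_row and to_tsv are made up for this illustration, and the csv module's quoting rules are not identical to jq's TSV escaping, so treat it as an approximation rather than an exact equivalent.

```python
import csv
import io

# Sample record copied from the question.
record = {
    "event": "log",
    "timestamp": 1535306331840,
    "tags": ["info"],
    "data": {
        "_id": "A301180827005852329209020",
        "msisdn": "6282134920902",
        "method": "get",
        "url": "/api/tcash/balance",
        "timeTaken": 32,
        "channelid": "UX",
    },
    "pid": 7920,
}

# Keys taken from the nested "data" object, in output order (illustrative name).
DATA_KEYS = ["_id", "msisdn", "method", "url", "timeTaken", "channelid"]


def to_row(rec):
    """Flatten one record into a list of atomic values; tags are comma-joined."""
    data = rec["data"]
    return (
        [rec["event"], rec["timestamp"], ",".join(rec["tags"])]
        + [data.get(key) for key in DATA_KEYS]
        + [rec["pid"]]
    )


def to_tsv(rows):
    """Render rows as tab-separated text using the csv module."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


tsv_text = to_tsv([to_row(record)])
print(tsv_text, end="")
```

Using data.get(key) means a misspelled key silently yields an empty column instead of raising, which is roughly how a missing key behaves in jq (it yields null).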

Related

How to denormalise this JSON structure

I have a JSON-formatted overview of backups, generated using pgbackrest. For simplicity I removed a lot of clutter so that only the main structures remain. The list can contain multiple backup structures; I reduced it here to just one.
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
],
"name": "dbname1"
}
]
Using jq I tried to generate a simpler format out of this, so far without any luck.
What I would like to see is the backup.archive, backup.info, backup.label, backup.type, name combined in one simple structure, without getting into a cartesian product. I would be very happy to get the following output:
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"name": "dbname1",
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"name": "dbname1",
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
]
}
]
where name is redundantly added to the list. How can I use jq to convert the shown input to the requested output? In the end I just want to generate a simple csv from the data. Even with the simplified structure using
'.[].backup[].name + ":" + .[].backup[].type'
I get a cartesian product:
"dbname1:full"
"dbname1:full"
"dbname1:incr"
"dbname1:incr"
How can I solve that?
So, for each object in the top-level array you want to pull in .name into each of its .backup array's elements, right? Then try
jq 'map(.backup[] += {name} | del(.name))'
Then, generating a CSV output using jq is easy: There is a builtin called @csv which transforms an array into a string of its values with quotes (if they are stringy) and separated by commas. So, all you need to do is to iteratively compose your desired values into arrays. At this point, removing .name is not necessary anymore as we are piecing together the array for CSV output anyway. And we're giving the -r flag to jq in order to make the output raw text rather than JSON.
jq -r '.[]
| .backup[] + {name}
| [(.archive | .start, .stop), .name, .info.size, .label, .type]
| @csv
'
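First navigate to backup, and only then “print” the stuff you’re interested in. Using .[] twice in one expression iterates the input twice independently, which is what produces the cartesian product.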
.[].backup[] | .name + ":" + .type
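The difference between one pass and two independent passes can be sketched in Python. This is only an illustration of the reasoning in the answers above, using the data from the question; variable names such as overview, rows, cartesian and paired are invented here, and QUOTE_NONNUMERIC is used to approximate the way jq's CSV output quotes strings but not numbers.

```python
import csv
import io

# Input copied from the question.
overview = [
    {
        "backup": [
            {
                "archive": {"start": "000000090000000200000075",
                            "stop": "000000090000000200000075"},
                "info": {"size": 1200934840},
                "label": "20220103-122051F",
                "type": "full",
            },
            {
                "archive": {"start": "00000009000000020000007D",
                            "stop": "00000009000000020000007D"},
                "info": {"size": 1168586300},
                "label": "20220103-153304F_20220104-081304I",
                "type": "incr",
            },
        ],
        "name": "dbname1",
    }
]

# One pass: visit each backup once and carry the parent's name along.
rows = [
    [b["archive"]["start"], b["archive"]["stop"], entry["name"],
     b["info"]["size"], b["label"], b["type"]]
    for entry in overview
    for b in entry["backup"]
]
paired = [f"{row[2]}:{row[5]}" for row in rows]

# Two independent passes: every name is combined with every type.
names = [entry["name"] for entry in overview for _ in entry["backup"]]
types = [b["type"] for entry in overview for b in entry["backup"]]
cartesian = [f"{n}:{t}" for t in types for n in names]

# Write the rows as CSV, quoting strings but not numbers.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n").writerows(rows)
csv_text = buf.getvalue()

print(paired)     # two entries, one per backup
print(cartesian)  # four entries, the unwanted product
print(csv_text, end="")
```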

How to parse this boolean contained JSON output with jq?

The JSON output I am trying to parse:
{
"success": true,
"data": {
"aa": [
{
"timestamp": 123456,
"price": 1
},
{
"timestamp": 123457,
"price": 2
}
],
"bb": [
{
"timestamp": 123456,
"price": 3
},
{
"timestamp": 123457,
"price": 4
}
]
}
}
So after banging my head against the wall a million times, I just removed the "success": true, line from the output and I could easily do jq stuff with it. Otherwise, if I ran for example:
cat jsonfile.json | jq -c .[].aa
I would get:
Cannot index boolean with string "aa"
Which makes sense, since the first value is a boolean. But I have no clue how to skip it while processing with jq.
The goal is to extract only the timestamp and price of "aa", without caring about the "success": true key/value pair.
You need to select the data field first: jq '.data.aa[]'
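The same distinction can be shown with a small Python sketch, assuming the JSON from the question has already been parsed into a dict (doc, failed and aa_points are names chosen for this example). Indexing every top-level value fails on the boolean, while going through "data" first works.

```python
# Parsed form of the JSON from the question.
doc = {
    "success": True,
    "data": {
        "aa": [{"timestamp": 123456, "price": 1},
               {"timestamp": 123457, "price": 2}],
        "bb": [{"timestamp": 123456, "price": 3},
               {"timestamp": 123457, "price": 4}],
    },
}

# Comparable to `.[].aa`: indexing each top-level value fails on the boolean.
try:
    [value["aa"] for value in doc.values()]
    failed = False
except TypeError:
    failed = True

# Comparable to `.data.aa[]`: select "data" first, then iterate the "aa" array.
aa_points = [(p["timestamp"], p["price"]) for p in doc["data"]["aa"]]

print(failed, aa_points)
```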

InfluxDB query in json format transform to csv with jq including tags and fields

I want to process data with a bash script but have trouble getting the InfluxDB output into the desired CSV format with all tags and fields.
Below is an example output from an influx query:
{
"results": [
{
"series": [
{
"name": "tickerPrice",
"tags": {
"symbol": "AAVE",
"symbolTo": "EUR"
},
"columns": [
"time",
"priceMean"
],
"values": [
[
1614402874120627200,
282.398263888889
]
]
},
{
"name": "tickerPrice",
"tags": {
"symbol": "BTC",
"symbolTo": "EUR"
},
"columns": [
"time",
"priceMean"
],
"values": [
[
1614402874120627200,
39189.756944444445
]
]
}
]
}
]
}
And I would like to transform it to:
"name","symbol","symbolTo","time","priceMean"
"tickerPrice","AAVE","EUR",1614402874120627200,282.398263888889
"tickerPrice","BTC","EUR",1614402874120627200,39189.756944444445
I have managed (with some googling) to get the fields into a CSV format, but so far I have not managed to get all the data into the CSV. Here is the command that I use for that:
$ jq -r '(.results[0].series[0].columns), (.results[0].series[].values[])'
Because this is not the only query I want to run, it would be nice if the solution were generic with respect to the content, so that the number of fields and tags could differ.
Why not specify the CSV format directly in the InfluxDB CLI? See https://docs.influxdata.com/influxdb/v1.8/tools/shell/:
-format 'json|csv|column' Specifies the format of the server responses.
So you won't need any result post processing.
The following produces the required output in a way that allows for multiple values of "time" in each .values array, but does not refer to the specific headers except for "name":
def headers:
(.tags | keys_unsorted) as $tags
| (["name"] + $tags + .columns);
.results[0]
| (.series[0] | headers),
(.series[] | ([.name, .tags[]] + .values[]))
| @csv
This of course assumes that the separate "series" are conformal, i.e. that they all have the same tags and columns in the same order.
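As a cross-check of that logic, here is a rough Python sketch that builds the same header and rows from a parsed response. It rests on the same assumption that all series share the tags and columns of the first one; series_to_rows is a hypothetical helper name, and unlike the jq filter it looks tag values up by the first series' tag names rather than iterating each series' tags in their own order.

```python
import csv
import io

# Response copied from the question.
response = {
    "results": [
        {
            "series": [
                {
                    "name": "tickerPrice",
                    "tags": {"symbol": "AAVE", "symbolTo": "EUR"},
                    "columns": ["time", "priceMean"],
                    "values": [[1614402874120627200, 282.398263888889]],
                },
                {
                    "name": "tickerPrice",
                    "tags": {"symbol": "BTC", "symbolTo": "EUR"},
                    "columns": ["time", "priceMean"],
                    "values": [[1614402874120627200, 39189.756944444445]],
                },
            ]
        }
    ]
}


def series_to_rows(result):
    """Return a header row followed by one row per entry in each series' values."""
    series_list = result["series"]
    first = series_list[0]
    tag_names = list(first["tags"])  # keeps the order found in the first series
    rows = [["name"] + tag_names + first["columns"]]
    for series in series_list:
        prefix = [series["name"]] + [series["tags"][tag] for tag in tag_names]
        for values in series["values"]:
            rows.append(prefix + values)
    return rows


rows = series_to_rows(response["results"][0])

# Quote strings but not numbers, similar in spirit to the desired output.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n").writerows(rows)
csv_text = buf.getvalue()
print(csv_text, end="")
```

Because nothing except "name" is hard-coded, the same function should cope with a different number of tags or columns, as long as every series in the result has the same shape.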

Using jq to search value of a property and return another value

Sorry if this sounds too simple, but I am still learning and have spent a few hours trying to find a solution. I have a large JSON file and I would like to search for a specific value in one object and return a value from another object.
For example, from the data below, I would like to search the JSON file for all objects whose unique_number matches "123456" and return this value along with the IP address.
jq should return something like - 123456, 127.0.0.1
Since the file is going to be about 300 MB with many IP addresses, will there be any performance issues?
Partial json -
{
"ip": "127.0.0.1",
"data": {
"tls": {
"status": "success",
"protocol": "tls",
"result": {
"handshake_log": {
"server_hello": {
"version": {
"name": "TLSv1.2",
"value": 1111
},
"random": "dGVzdA==",
"session_id": "dGVzdA==",
"cipher_suite": {
"name": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
"value": 1122
},
"compression_method": 0,
},
"server_certificates": {
"certificate": {
"raw": "dGVzdA==",
"parsed": {
"version": 3,
"unique_number": "123456",
"signature_algorithm": {
"name": "SHA256-RSA",
"oid": "1.2.4.5.6"
},
The straightforward way would be to use the select filter (either standalone on multiple values or with map on an array) to keep all objects matching your criterion (e.g. equal to "123456"), and then transform them into your required output format (e.g. using string interpolation). The examples assume the file contains a top-level array of such objects.
jq -r '.[]
| select(.data.tls.result.handshake_log.server_certificates.certificate.parsed.unique_number=="123456")
| "\(.ip), \(.data.tls.result.handshake_log.server_certificates.certificate.parsed.unique_number)"'
Because the unique_number property is nested quite deeply and cumbersome to write twice, it makes sense to first transform your object into something simpler, then filter, and finally output in the desired format:
jq -r '.[]
| { ip, unique_number: .data.tls.result.handshake_log.server_certificates.certificate.parsed.unique_number }
| select(.unique_number=="123456")
| "\(.ip), \(.unique_number)"'
Alternatively using join:
.[]
| { ip, unique_number: .data.tls.result.handshake_log.server_certificates.certificate.parsed.unique_number }
| select(.unique_number=="123456")
| [.ip, .unique_number]
| join(", ")
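The simplify, filter, format sequence can also be sketched in Python. The records list below is a cut-down stand-in for the real file: the second entry (192.0.2.10 with "654321") is invented purely to show a non-match, and dig, PATH and find_matches are hypothetical names. It assumes, like the jq filters above, that the data is an array of objects.

```python
# Path to the deeply nested property, as used in the jq filters above.
PATH = ["data", "tls", "result", "handshake_log", "server_certificates",
        "certificate", "parsed", "unique_number"]


def nest(path, value):
    """Build nested dicts so that following path leads to value (test-data helper)."""
    for key in reversed(path):
        value = {key: value}
    return value


# Stand-in data: first entry follows the question, second is made up.
records = [
    {"ip": "127.0.0.1", **nest(PATH, "123456")},
    {"ip": "192.0.2.10", **nest(PATH, "654321")},
]


def dig(obj, path):
    """Follow path through nested dicts; return None if any step is missing."""
    for key in path:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def find_matches(items, wanted):
    """Yield 'ip, unique_number' strings for items whose unique_number equals wanted."""
    for item in items:
        number = dig(item, PATH)
        if number == wanted:
            yield f"{item['ip']}, {number}"


matches = list(find_matches(records, "123456"))
print(matches)
```

Writing find_matches as a generator means results are produced one at a time, although the whole file still has to be parsed into memory first in this sketch.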

Why does parsing with jq return null?

I am trying to extract the values of 3 fields (status, id, name) from my JSON file using the jq tool. Here is my JSON:
cat parse.json
{
"stream": {
"_id": 65675798730520654496,
"broadcast_platform": "live",
"community_id": "",
"community_ids": [],
"average_fps": 60.0247524752,
"delay": 0,
"created_at": "2018-09-26T07:25:38Z",
"is_playlist": false,
"stream_type": "live",
"preview": {
"small": "https://static-cdn.jtvnw.net/previews-ttv/live_user_versuta-80x4512wdfqf.jpg",
},
"channel": {
"mature": true,
"status": "status",
"broadcaster_language": "ru",
"broadcaster_software": "",
"_id": 218025408945123423423445,
"name": "djvbsdhvsdvasdv",
"created_at": "2011-04-17T17:31:36.091604Z",
"updated_at": "2018-09-26T09:49:04.434245Z",
"partner": true,
"video_banner": null,
"profile_banner": "https://static-cdn.jtvnw.net/jtv_user_pictures/25c2bec3-95b8-4347-aba0-128b3b913b0d-profile_banner-480.png",
"profile_banner_background_color": "",
"views": 103911737,
"followers": 446198,
"broadcaster_type": "",
"description": "",
"private_video": false,
"privacy_options_enabled": false
}
}
}
Online JSON validators say that it is valid, but when I try to get some field it returns null:
cat parse.json | jq '.channel'
null
cat parse.json | jq '.channel.status'
null
What am I doing wrong?
Your JSON object has a top-level field "stream". You need to go through "stream" to access the other sub-properties, e.g. channel:
jq '.stream.channel.status' parse.json
You can also do cat parse.json | jq '.stream.channel.status'.
Your example JSON is also invalid because the stream.preview.small property has a trailing comma. Simply removing that comma will make it valid, though.
To deal with the invalid JSON, you could use a JSON rectifier such as hjson; to avoid any hassles associated with identifying the relevant paths, you could use ..|objects. Thus for example:
$ hjson -j parse.json | jq '..|objects|select(.status) | .status, ._id, .name'
"status"
218025408945123440000000
"djvbsdhvsdvasdv"