How can I convert nested JSON to CSV on command line? - json

I have a JSON file that I am trying to convert to CSV using jq, but I've been having a lot of problems. This is the JSON:
{
"nhits": 2,
"parameters": {
"dataset": "real-time-bezettingen-fietsenstallingen-gent",
"rows": 10,
"start": 0,
"facet": [
"facilityname"
],
"format": "json",
"timezone": "UTC"
},
"records": [
{
"datasetid": "real-time-bezettingen-fietsenstallingen-gent",
"recordid": "d471594688a931ba8d81f8d883874a08cee84775",
"fields": {
"id": "48-2",
"freeplaces": 71,
"facilityname": "Braunplein",
"geo_point_2d": [
51.05406845807926,
3.723722319130363
],
"time": "2022-11-10T12:18:01+00:00",
"totalplaces": 116,
"occupiedplaces": 45,
"bezetting": 38
},
"geometry": {
"type": "Point",
"coordinates": [
3.723722319130363,
51.05406845807926
]
},
"record_timestamp": "2022-11-10T12:18:04.838Z"
},
{
"datasetid": "real-time-bezettingen-fietsenstallingen-gent",
"recordid": "d0121748cf31c7e1c02d99712bdf07cb33156689",
"fields": {
"id": "48-1",
"freeplaces": 65,
"facilityname": "Korenmarkt",
"geo_point_2d": [
51.05388288288933,
3.7214177570400473
],
"time": "2022-11-10T12:18:01+00:00",
"totalplaces": 235,
"occupiedplaces": 170,
"bezetting": 72
},
"geometry": {
"type": "Point",
"coordinates": [
3.7214177570400473,
51.05388288288933
]
},
"record_timestamp": "2022-11-10T12:18:04.838Z"
}
],
"facet_groups": [
{
"name": "facilityname",
"facets": [
{
"name": "Braunplein",
"count": 1,
"state": "displayed",
"path": "Braunplein"
},
{
"name": "Korenmarkt",
"count": 1,
"state": "displayed",
"path": "Korenmarkt"
}
]
}
]
}
I only want the columns facilityname, time, totalplaces, occupiedplaces and bezetting. I tried converting using the following command:
jq -r '["naam", "tijd", "totaalAantalPlaatsen", "bezettePlaatsen", "bezetting"] , .records[] | (.fields[] | [.facilityname, .time, .totalplaces, .occupiedplaces, .bezetting]) | #csv' data.json
But I get the error:
jq: error (at data.json:0): Cannot index array with string "fields"
Does anyone know what I'm doing wrong?

You just need some parentheses around the .records[] ... part. Without them, the comma binds tighter than the pipe, so the header array is also fed into the .fields lookup, which is what produces the "Cannot index array" error.
jq -r '
["naam", "tijd", "totaalAantalPlaatsen", "bezettePlaatsen", "bezetting"],
(.records[].fields | [.facilityname, .time, .totalplaces, .occupiedplaces, .bezetting])
| @csv
' file.json
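With the sample data above, this should produce CSV along these lines (strings are quoted by @csv, numbers are not):
"naam","tijd","totaalAantalPlaatsen","bezettePlaatsen","bezetting"
"Braunplein","2022-11-10T12:18:01+00:00",116,45,38
"Korenmarkt","2022-11-10T12:18:01+00:00",235,170,72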


Cannot get jq to query json object [duplicate]

This question was closed as a duplicate of: How to use jq when the variable has reserved characters?
I have a JSON file that I am trying to query with jq, but I am unable to retrieve the observations. I am trying to retrieve each of the "observations" entries using the following command and am not able to get the result:
cat sample3.json | jq .dataSets[0].series.0:0:0:0:0.observations.0[0]
I am able to retrieve up to the series using:
cat sample3.json | jq .dataSets[0].series
But once I try to drill down further I am getting a compile error:
$ cat sample3.json | jq .dataSets[0].series.0:0:0:0:0
jq: error: syntax error, unexpected LITERAL, expecting end of file (Unix shell quoting issues?) at <top-level>, line 1:
.dataSets[0].series.0:0:0:0:0
jq: 1 compile error
I am not sure what I am doing wrong here....
The input file is:
{
"header": {
"id": "b8be2cd5-33bf-4687-9e81-eb032f6f8a71",
"test": false,
"prepared": "2022-09-01T13:30:57.013+02:00",
"sender": {
"id": "ECB"
}
},
"dataSets": [
{
"action": "Replace",
"validFrom": "2022-09-01T13:30:57.013+02:00",
"series": {
"0:0:0:0:0": {
"attributes": [
0,
null,
0,
null,
null,
null,
null,
null,
null,
null,
null,
null,
0,
null,
0,
null,
0,
0,
0,
0
],
"observations": {
"0": [
1.4529,
0,
0,
null,
null
],
"1": [
1.4472,
0,
0,
null,
null
],
"2": [
1.4591,
0,
0,
null,
null
]
}
}
}
}
],
"structure": {
"links": [
{
"title": "Exchange Rates",
"rel": "dataflow",
"href": "https://sdw-wsrest.ecb.europa.eu:443/service/dataflow/ECB/EXR/1.0"
}
],
"name": "Exchange Rates",
"dimensions": {
"series": [
{
"id": "FREQ",
"name": "Frequency",
"values": [
{
"id": "D",
"name": "Daily"
}
]
},
{
"id": "CURRENCY",
"name": "Currency",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "CURRENCY_DENOM",
"name": "Currency denominator",
"values": [
{
"id": "EUR",
"name": "Euro"
}
]
},
{
"id": "EXR_TYPE",
"name": "Exchange rate type",
"values": [
{
"id": "SP00",
"name": "Spot"
}
]
},
{
"id": "EXR_SUFFIX",
"name": "Series variation - EXR context",
"values": [
{
"id": "A",
"name": "Average"
}
]
}
],
"observation": [
{
"id": "TIME_PERIOD",
"name": "Time period or range",
"role": "time",
"values": [
{
"id": "2022-08-29",
"name": "2022-08-29",
"start": "2022-08-29T00:00:00.000+02:00",
"end": "2022-08-29T23:59:59.999+02:00"
},
{
"id": "2022-08-30",
"name": "2022-08-30",
"start": "2022-08-30T00:00:00.000+02:00",
"end": "2022-08-30T23:59:59.999+02:00"
},
{
"id": "2022-08-31",
"name": "2022-08-31",
"start": "2022-08-31T00:00:00.000+02:00",
"end": "2022-08-31T23:59:59.999+02:00"
}
]
}
]
},
"attributes": {
"series": [
{
"id": "TIME_FORMAT",
"name": "Time format code",
"values": [
{
"name": "P1D"
}
]
},
{
"id": "BREAKS",
"name": "Breaks",
"values": []
},
{
"id": "COLLECTION",
"name": "Collection indicator",
"values": [
{
"id": "A",
"name": "Average of observations through period"
}
]
},
{
"id": "COMPILING_ORG",
"name": "Compiling organisation",
"values": []
},
{
"id": "DISS_ORG",
"name": "Data dissemination organisation",
"values": []
},
{
"id": "DOM_SER_IDS",
"name": "Domestic series ids",
"values": []
},
{
"id": "PUBL_ECB",
"name": "Source publication (ECB only)",
"values": []
},
{
"id": "PUBL_MU",
"name": "Source publication (Euro area only)",
"values": []
},
{
"id": "PUBL_PUBLIC",
"name": "Source publication (public)",
"values": []
},
{
"id": "UNIT_INDEX_BASE",
"name": "Unit index base",
"values": []
},
{
"id": "COMPILATION",
"name": "Compilation",
"values": []
},
{
"id": "COVERAGE",
"name": "Coverage",
"values": []
},
{
"id": "DECIMALS",
"name": "Decimals",
"values": [
{
"id": "4",
"name": "Four"
}
]
},
{
"id": "NAT_TITLE",
"name": "National language title",
"values": []
},
{
"id": "SOURCE_AGENCY",
"name": "Source agency",
"values": [
{
"id": "4F0",
"name": "European Central Bank (ECB)"
}
]
},
{
"id": "SOURCE_PUB",
"name": "Publication source",
"values": []
},
{
"id": "TITLE",
"name": "Title",
"values": [
{
"name": "Australian dollar/Euro"
}
]
},
{
"id": "TITLE_COMPL",
"name": "Title complement",
"values": [
{
"name": "ECB reference exchange rate, Australian dollar/Euro, 2:15 pm (C.E.T.)"
}
]
},
{
"id": "UNIT",
"name": "Unit",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "UNIT_MULT",
"name": "Unit multiplier",
"values": [
{
"id": "0",
"name": "Units"
}
]
}
],
"observation": [
{
"id": "OBS_STATUS",
"name": "Observation status",
"values": [
{
"id": "A",
"name": "Normal value"
}
]
},
{
"id": "OBS_CONF",
"name": "Observation confidentiality",
"values": [
{
"id": "F",
"name": "Free"
}
]
},
{
"id": "OBS_PRE_BREAK",
"name": "Pre-break observation value",
"values": []
},
{
"id": "OBS_COM",
"name": "Observation comment",
"values": []
}
]
}
}
}
The .foo syntax cannot be used if the key name contains anything other than alphanumeric characters and underscores, or if the first character of the key name is a digit.
Assuming you are using a sufficiently recent version of jq, you can always use the form ."foo", which is an abbreviation of the basic form .["foo"].
So your query could begin with:
.dataSets[0].series."0:0:0:0:0"
If you are presenting the jq query on a command line, then you may have to escape the double-quotes appropriately, e.g. in a bash shell, by enclosing the jq query in single-quotes.
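For example, to drill all the way down to the first value of the first observation in the sample above (assuming the file is named sample3.json, as in the question), a query along these lines should work:
jq '.dataSets[0].series."0:0:0:0:0".observations."0"[0]' sample3.json
which prints 1.4529 for this input.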

jq: return a JSON array in a very specific way

I have this JSON (it is a test database; none of the data here is real):
{
"pguid": "4EA979A2-E578-4DA3-89DB-24082F3092AA",
"lastEnrollTguid": "EA98B161-04D3-4F0A-920A-58DBFF3C2274",
"timestamp": 1016086888000,
"keys": [
{
"id": "gr",
"value": "1907971"
}
],
"biographics": [
{
"id": "localNascimento",
"value": "JOINVILLE SC"
},
{
"id": "dataNascimento",
"value": "1859-03-08"
},
{
"id": "mae",
"value": "ANTA MARCIA PINHEAD"
},
{
"id": "nome",
"value": "MIR PINHEAD"
}
],
"biometric": [
{
"source": "ORIGINAL",
"type": "FACE",
"format": "JPEG",
"properties": {
"width": 0,
"height": 0,
"resolution": 500,
"ratio": 0,
"matcherId": 0,
"extractorId": 0
},
"index": 10,
"content": "5215421547"
}
],
"labels": [
"SC",
"CIVIL",
"MASCULINO",
"JOINVILLE"
],
"history": {
"events": [
{
"type": "ENROLL",
"tguid": "3C1B0D1F-9143-4C24-A351-E88A19317AC9",
"timestamp": 1014086658288
},
{
"type": "UPDATE",
"tguid": "EA98B161-04D3-4F0A-920A-58DBFF3C2274",
"timestamp": 1016786888028
}
]
}
}
I want to retrieve only the tguid values in the history array and, if there is a way to do this, use the index of the array to accomplish that.
Here is how I tried to accomplish that (and miserably failed):
example (and it does not work):
jq '.[].history.events.tguid[1]' /tmp/teste.json
I want to retrieve the tguid at a given index so I can work with it.
Does anyone have any ideas?
Try this:
jq '.history.events | .[1].tguid' /tmp/teste.json
Thanks everyone; this is what worked for me:
jq '.[].history.events | .[0].tguid' /tmp/teste1.json
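If you also want the index of each event alongside its tguid, one possible variation (a sketch using to_entries, not taken from the answers above) is:
jq -r '.history.events | to_entries[] | "\(.key)\t\(.value.tguid)"' /tmp/teste.json
which, for the sample above, prints each index followed by its tguid.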

Iterate over array and output TSV report

I have a file with 30,000 JSON lines delimited by newlines. I am using jq to process it.
Below is the schema of each line (new.json):
{
"indexed": {
"date-parts": [
[
2020,
8,
13
]
],
"date-time": "2020-08-13T06:27:26Z",
"timestamp": 1597300046660
},
"reference-count": 42,
"publisher": "American Chemical Society (ACS)",
"issue": "3",
"content-domain": {
"domain": [],
"crossmark-restriction": false
},
"short-container-title": [
"Org. Lett."
],
"published-print": {
"date-parts": [
[
2005,
2
]
]
},
"DOI": "10.1021/ol047829t",
"type": "journal-article",
"created": {
"date-parts": [
[
2005,
1,
27
]
],
"date-time": "2005-01-27T05:53:29Z",
"timestamp": 1106805209000
},
"page": "383-386",
"source": "Crossref",
"is-referenced-by-count": 38,
"title": [
"Liquid-Crystalline [60]Fullerene-TTF Dyads"
],
"prefix": "10.1021",
"volume": "7",
"author": [
{
"given": "Emmanuel",
"family": "Allard",
"affiliation": []
},
{
"given": "Frédéric",
"family": "Oswald",
"affiliation": []
},
{
"given": "Bertrand",
"family": "Donnio",
"affiliation": []
},
{
"given": "Daniel",
"family": "Guillon",
"affiliation": []
}
],
"member": "316",
"container-title": [
"Organic Letters"
],
"original-title": [],
"link": [
{
"URL": "https://pubs.acs.org/doi/pdf/10.1021/ol047829t",
"content-type": "unspecified",
"content-version": "vor",
"intended-application": "similarity-checking"
}
],
"deposited": {
"date-parts": [
[
2020,
4,
7
]
],
"date-time": "2020-04-07T13:39:55Z",
"timestamp": 1586266795000
},
"score": null,
"subtitle": [],
"short-title": [],
"issued": {
"date-parts": [
[
2005,
2
]
]
},
"references-count": 42,
"alternative-id": [
"10.1021/ol047829t"
],
"URL": "http://dx.doi.org/10.1021/ol047829t",
"relation": {},
"ISSN": [
"1523-7060",
"1523-7052"
],
"issn-type": [
{
"value": "1523-7060",
"type": "print"
},
{
"value": "1523-7052",
"type": "electronic"
}
],
"subject": [
"Physical and Theoretical Chemistry",
"Organic Chemistry",
"Biochemistry"
]
}
For every DOI, I need the values of the given and family keys combined into a single cell on that DOI's row, in CSV/TSV format.
The expected output for the above JSON is (in CSV/TSV format):
|DOI| givenName|familyName|
|10.1021/ol047829t|Emmanuel; Frédéric; Bertrand; Daniel;|Allard; Oswald; Donnio; Guillon|
I am using the command line below, but it throws an error, and when I try to alter it I am unable to get CSV/TSV output at all.
cat new.json | jq -r "[.DOI, .publisher, .author[] | .given] | @tsv" > manage.tsv
The same logic applies to the subject key. I am using the command line below to output the values of the subject key to CSV, but it returns only the first element (in this case just "Physical and Theoretical Chemistry"):
cat new.json | jq -c -r "[.DOI, .publisher, .subject[0]] | @csv" > manage.csv
Any pointers for right jq command line will be of great help.
Join the given and family names with semicolons separately, then pass the resulting strings as fields to the TSV filter:
["DOI", "givenName", "familyName"],
(inputs | [.DOI, (.author | map(.given), map(.family) | join("; "))])
| @tsv
Note that you need to invoke jq with the -r and -n flags for this to work and produce valid TSV output.
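Assembled into a full invocation (assuming the input file is new.json, as in the question), that could look like:
jq -rn '["DOI", "givenName", "familyName"],
  (inputs | [.DOI, (.author | map(.given), map(.family) | join("; "))])
  | @tsv' new.json > manage.tsv
For the sample record this yields a header row, then the DOI followed by "Emmanuel; Frédéric; Bertrand; Daniel" and "Allard; Oswald; Donnio; Guillon", separated by tabs.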

How to print nested JSON array data in a tabular format?

I want to read the status of the clusters and of the servers inside them.
Below is the sample JSON file:
"data": [{
"id": 7865,
"timeCreated": 1602589399294,
"timeUpdated": 1602748892149,
"name": "gw-ext-1",
"type": "CLUSTER",
"status": "RUNNING",
"multicastEnabled": false,
"primaryNodeId": 546,
"servers": [{
"id": 768,
"timeCreated": 1602589028419,
"timeUpdated": 1602747941321,
"name": "gw-jpg208765-1",
"type": "SERVER",
"serverType": "GATEWAY",
"status": "RUNNING",
"addresses": [{
"networkInterface": "eng123"
},
{
"networkInterface": "eng124"
}],
"clusterId": 098,
"clusterName": "gw-ext-1",
"currentClusteringPort": 897,
"runtimeInformation": {
"Information": {
"runtime": {
"name": "abctech",
"version": "1.6.8"
},
"specification": {
"vendor": "rrr",
"name": "rrrt",
"version": "1.8.89"
}
},
"osInformation": {
"name": "LX",
"version": "35",
"architecture": "klh"
},
"mExpirationDate": 098765589283662
}
},
{
"id": 876,
"timeCreated": 1602589007370,
"timeUpdated": 1602748894901,
"name": "gw-jpg208765-2",
"type": "SERVER",
"serverType": "GATEWAY",
"mVersion": "3.9.1",
"gaVersion": "3.9.1",
"agentVersion": "1.9.5",
"ExpirationDate": 32521996800000,
"ExpirationDate": 1665661007000,
"status": "DISCONNECTED",
"addresses": [{
"networkInterface": "engg"
},
{
"networkInterface": "engg"
}],
"clusterId": 768,
"clusterName": "gw-ext-1",
"serverPort": 987,
"currentClusteringPort": 987,
"runtimeInformation": {
"abcInfo": {
"runtime": {
"name": "abc",
"version": "1.2.3"
},
"specification": {
"vendor": "RRR",
"name": "RTR",
"version": "1.8.0"
}
},
"osInformation": {
"name": "LX",
"version": "4.78",
"architecture": "eng"
},
"ExpirationDate": 8765478999765
}
}],
"visibilityMap": {
"mapNodes": [{
"serverId": 765,
"visibleNodeIds": [765,
876],
"unknownNodeIps": []
},
{
"serverId": 876,
"visibleNodeIds": [765,
876],
"unknownNodeIps": []
}]
}
},
{
"id": 7865,
"timeCreated": 1602589399294,
"timeUpdated": 1602748892149,
"name": "gw-ext-2",
"type": "CLUSTER",
"status": "RUNNING",
"multicastEnabled": false,
"primaryNodeId": 546,
"servers": [{
"id": 768,
"timeCreated": 1602589028419,
"timeUpdated": 1602747941321,
"name": "gw-jpg208766-1",
"type": "SERVER",
"serverType": "GATEWAY",
"status": "RUNNING",
"addresses": [{
"networkInterface": "eng123"
},
{
"networkInterface": "eng124"
}],
"clusterId": 098,
"clusterName": "gw-ext-2",
"currentClusteringPort": 897,
"runtimeInformation": {
"Information": {
"runtime": {
"name": "abctech",
"version": "1.6.8"
},
"specification": {
"vendor": "rrr",
"name": "rrrt",
"version": "1.8.89"
}
},
"osInformation": {
"name": "LX",
"version": "35",
"architecture": "klh"
},
"mExpirationDate": 098765589283662
}
},
{
"id": 876,
"timeCreated": 1602589007370,
"timeUpdated": 1602748894901,
"name": "gw-jpg208766-2",
"type": "SERVER",
"serverType": "GATEWAY",
"mVersion": "3.9.1",
"gaVersion": "3.9.1",
"agentVersion": "1.9.5",
"ExpirationDate": 32521996800000,
"ExpirationDate": 1665661007000,
"status": "DISCONNECTED",
"addresses": [{
"networkInterface": "engg"
},
{
"networkInterface": "engg"
}],
"clusterId": 768,
"clusterName": "gw-ext-2",
"serverPort": 987,
"currentClusteringPort": 987,
"runtimeInformation": {
"abcInfo": {
"runtime": {
"name": "abc",
"version": "1.2.3"
},
"specification": {
"vendor": "RRR",
"name": "RTR",
"version": "1.8.0"
}
},
"osInformation": {
"name": "LX",
"version": "4.78",
"architecture": "eng"
},
"ExpirationDate": 8765478999765
}
}],
"visibilityMap": {
"mapNodes": [{
"serverId": 765,
"visibleNodeIds": [765,
876],
"unknownNodeIps": []
},
{
"serverId": 876,
"visibleNodeIds": [765,
876],
"unknownNodeIps": []
}]
}
}]
}
So each cluster has two servers, and this JSON continues for around 15 clusters.
I want to filter out the status of each cluster and server in the format below:
name cluster/server status
gw-ext-1 CLUSTER RUNNING
gw-jpg208765-1 SERVER RUNNING
gw-jpg208765-2 SERVER DISCONNECTED
similarly for other clusters also.
I tried a few things, but it's not giving me the servers; it gives only the cluster's details.
target_id=echo \$targetIdResponse | ${env.WORKSPACE}/jq -r '.data[] | [.name, .type, .status]'
OR
target_id=echo \$targetIdResponse | ${env.WORKSPACE}/jq -r '.data[] | [.name, .type, .status, .servers.name, .servers.type, .servers.status]'
where $targetIdResponse contains my JSON data.
I want to know how I can filter the above JSON to get the required data.
You need to have the header array and the required fields in separate arrays, and put them together in tabular format using @tsv:
jq -r '[ "name", "cluster/server", "status" ],
( .data[] | [.name, .type, .status] ),
( .data[].servers[] | [ .name, .type, .status ] ) | @tsv'
The requirement was modified after the question was originally posted, so that the server information appears directly below the corresponding cluster information:
jq -r '[ "name", "cluster/server", "status" ],
( .data[] | [.name, .type, .status], ( .servers[] | [.name, .type, .status] ) ) | @tsv'
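Once the sample above is valid JSON, the second filter should produce a tab-separated table along these lines:
name            cluster/server  status
gw-ext-1        CLUSTER         RUNNING
gw-jpg208765-1  SERVER          RUNNING
gw-jpg208765-2  SERVER          DISCONNECTED
gw-ext-2        CLUSTER         RUNNING
gw-jpg208766-1  SERVER          RUNNING
gw-jpg208766-2  SERVER          DISCONNECTED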

Mac Terminal: Delete Entire Column From Json File

How can I use the macOS Terminal to delete an entire column from a JSON file?
The JSON structure is as follows:
[{
"recordid": "6a0a9c66f8e0292a54c9f023c93732f1b41d8943",
"fields": {
"city": "Cove",
"zip": "71937",
"dst": 1,
"geopoint": [
34.398483,
-94.39398
],
"longitude": -94.39398,
"state": "AR",
"latitude": 34.398483,
"timezone": -6
},
"geometry": {
"type": "Point",
"coordinates": [
-94.39398,
34.398483
]
},
"record_timestamp": "2018-02-09T09:33:38.603-07:00"
},
{
"recordid": "37e2c801aafc7befde9734bcb1b1f83a5645ad0f",
"fields": {
"city": "Edgemont",
"zip": "72044",
"dst": 1,
"geopoint": [
35.624351,
-92.16056
],
"longitude": -92.16056,
"state": "AR",
"latitude": 35.624351,
"timezone": -6
},
"geometry": {
"type": "Point",
"coordinates": [
-92.16056,
35.624351
]
},
"record_timestamp": "2018-02-09T09:33:38.603-07:00"
}]
Using Terminal, how can I remove the geopoint and geometry attributes from every record, while saving the file with the rest of the data, which I want to keep?
Use jq: map over the JSON, deleting .geometry and .fields.geopoint:
jq 'map(del(.fields.geopoint, .geometry))'
Result (other fields omitted here for brevity):
[
{
"recordid": "6a0a9c66f8e0292a54c9f023c93732f1b41d8943",
"fields": {
"city": "Cove"
}
},
{
"recordid": "6a0a9c66f8e0292a54c9f023c93732f1b41d8342",
"fields": {
"city": "Edgemont"
}
}
]
cat json.json | jq 'map(del(.fields.geopoint, .geometry))' > new.json
mv new.json json.json # Overwrites original file
cat mainzip.json | jq 'map(del(.datasetid, .fields.city, .fields.dst, .fields.geopoint, .fields.state, .fields.timezone, .type.point, .geometry, .record_timestamp))' > temporary_mainzip.json
mv temporary_mainzip.json mainzip.json
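As an aside: if the moreutils package is installed (for example via Homebrew), sponge lets you overwrite the input file without an explicit temporary file. A sketch with the filter from the answer above:
jq 'map(del(.fields.geopoint, .geometry))' json.json | sponge json.json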