I was able to transform the data, but I'm not getting the desired output.
Filter I tried:
[inputs | {author , totalpages : .pages , books : [{"title": .title, "year" : .year }] } ] | sort
Input:
{"title":"War of the worlds","author":"H G Wells","year":1896,"pages":203}
{"title":"The invisible man","author":"H G Wells","year":1895,"pages":2136}
{"title":"The Lost World","author":"A C Doyle","year":1912,"pages":185}
{"title":"A Study in Scarlet","author":"A C Doyle","year":1887,"pages":251}
{"title":"20,000 leagues under the sea","author":"J Verne","year":1870,"pages":450}
The output should be:
{
"author": "A C Doyle",
"totalpages": 436,
"books": [
{
"title": "The Lost World",
"year": 1912
},
{
"title": "A Study in Scarlet",
"year": 1887
}
]
}
{
"author": "H G Wells",
"totalpages": 2339,
"books": [
{
"title": "War of the worlds",
"year": 1896
},
{
"title": "The invisible man",
"year": 1895
}
]
}
{
"author": "J Verne",
"totalpages": 450,
"books": [
{
"title": "20,000 leagues under the sea",
"year": 1870
}
]
}
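All that's required is a series of transformations, starting with group_by():
jq -n '
  [ inputs ]
  | group_by(.author)
  | map(
      {
        author: .[0].author,
        totalpages: ( map(.pages) | add ),
        books: ( map( { title, year } ) )
      }
    )
  # .[] emits one object per author, as in the expected output;
  # remove it to get a single array instead
  | .[]'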
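For readers who want to sanity-check the logic outside jq, here is a rough Python equivalent of the same pipeline. It is a sketch, not jq and not part of the original answer, and group_books is a hypothetical name. It relies on the same two properties as the jq version: grouping sorts by author, and a stable sort keeps each author's books in input order.

```python
import json
from itertools import groupby

def group_books(records):
    """Mimic jq's group_by(.author) followed by the per-group object construction."""
    # sorted() is stable, so each author's books keep their input order,
    # as they do with jq's group_by.
    ordered = sorted(records, key=lambda r: r["author"])
    result = []
    for author, group in groupby(ordered, key=lambda r: r["author"]):
        books = list(group)
        result.append({
            "author": author,
            "totalpages": sum(b["pages"] for b in books),  # map(.pages) | add
            "books": [{"title": b["title"], "year": b["year"]} for b in books],
        })
    return result

lines = [
    '{"title":"War of the worlds","author":"H G Wells","year":1896,"pages":203}',
    '{"title":"The invisible man","author":"H G Wells","year":1895,"pages":2136}',
    '{"title":"The Lost World","author":"A C Doyle","year":1912,"pages":185}',
    '{"title":"A Study in Scarlet","author":"A C Doyle","year":1887,"pages":251}',
    '{"title":"20,000 leagues under the sea","author":"J Verne","year":1870,"pages":450}',
]

# Like the final .[] in the jq program: print one object per author.
for obj in group_books(json.loads(line) for line in lines):
    print(json.dumps(obj))
```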
I have a deeply nested JSON file.
I want to extract and parse only the subset I'm interested in; in my case, all the content under the 'node' key.
How can I:
extract the subset of this JSON file that contains "edges[].node" (edges is the parent key of node)?
Within 'node', I'm interested in these key/value pairs:
.url,
.headline.default (this one is a grandchild of the 'node' key),
.firstPublished
I want to keep only these 3 items inside the 'node' key.
How can I print out the slimmed-down version of the JSON file that I need?
A nice-to-have option: can I still keep the structure, i.e. the full path that leads from the JSON root key to the embedded 'node' subset I'm interested in?
Here is the full content of my JSON file:
{
"data": {
"legacyCollection": {
"longDescription": "The latest news, analysis and investigations from Europe.",
"section": {
"name": "world",
"url": "/section/world"
},
"collectionsPage": {
"stream": {
"pageInfo": {
"hasNextPage": true,
"__typename": "PageInfo"
},
"__typename": "AssetsConnection",
"edges": [
{
"node": {
"url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
"firstPublished": "2022-04-27T23:28:33.241Z",
"headline": {
"default": "I.C.C. Joins Investigation of War Crimes in Ukraine",
"__typename": "CreativeWorkHeadline"
},
"summary": "Karim Khan, the chief prosecutor of the International Criminal Court, said that his organization would participate in a joint effort — with Ukraine, Poland and Lithuania — to investigate war crimes committed since Russia’s invasion.",
"promotionalMedia": {
"__typename": "Image",
"id": "SW1hZ2U6bnl0Oi8vaW1hZ2UvYTY3MTVhNDUtZDE0NS01OWZjLThkZWItNzYxMWViN2UyODhk"
},
"embedded": false
},
"__typename": "AssetsEdge"
},
{
"node": {
"__typename": "Article",
"url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
"firstPublished": "2022-04-27T19:42:17.000Z",
"typeOfMaterials": [
"News"
],
"archiveProperties": {
"lede": "",
"__typename": "ArticleArchiveProperties"
},
"headline": {
"default": "Endgame Nears in Bidding for Chelsea F.C.",
"__typename": "CreativeWorkHeadline"
},
"summary": "The American bank selling the English soccer team on behalf of its Russian owner could name its preferred suitor by the end of the week. But the drama isn’t over.",
"translations": []
},
"__typename": "AssetsEdge"
}
],
"totalCount": 52559
}
},
"sourceId": "100000004047788",
"tagline": "",
"__typename": "LegacyCollection"
}
}
}
Here is the filter I tried:
.data.legacyCollection.collectionsPage.stream.edges[].node |= with_entries(select([.key] | inside(["default","url","firstPublished"])))
And here is the output I got:
{
"data": {
"legacyCollection": {
"longDescription": "The latest news, analysis and investigations from Europe.",
"section": {
"name": "world",
"url": "/section/world"
},
"collectionsPage": {
"stream": {
"pageInfo": {
"hasNextPage": true,
"__typename": "PageInfo"
},
"__typename": "AssetsConnection",
"edges": [
{
"node": {
"url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
"firstPublished": "2022-04-27T23:28:33.241Z"
},
"__typename": "AssetsEdge"
},
{
"node": {
"url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
"firstPublished": "2022-04-27T19:42:17.000Z"
},
"__typename": "AssetsEdge"
}
],
"totalCount": 52559
}
},
"sourceId": "100000004047788",
"tagline": "",
"__typename": "LegacyCollection"
}
}
}
Here is the output I expect:
{
"data": {
"legacyCollection": {
"collectionsPage": {
"stream": {
"edges": [
{
"node": {
"url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
"firstPublished": "2022-04-27T23:28:33.241Z"
}
},
{
"node": {
"url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
"firstPublished": "2022-04-27T19:42:17.000Z"
}
}
]
}
}
}
}
}
Here's a (somewhat) declarative solution:
(.data.legacyCollection.collectionsPage.stream.edges
 | map( {node: (.node
                | {url,
                   firstPublished,
                   headline: {default: .headline.default} })})) as $edges
| {data: {
     legacyCollection: {
       collectionsPage: {
         stream: {
           $edges
         }
       }
     }
   }
  }
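The same "rebuild only the path you need" idea, sketched in Python for comparison (not jq; slim and the trimmed sample document are hypothetical, with values copied from the question). As with jq's {url, firstPublished} and .headline.default, a missing key comes out as null (None) rather than being omitted.

```python
import json

def slim(doc):
    """Rebuild only the path down to edges[].node, keeping url, firstPublished
    and headline.default in each node (mirrors the declarative jq answer)."""
    edges = doc["data"]["legacyCollection"]["collectionsPage"]["stream"]["edges"]
    slim_edges = []
    for edge in edges:
        node = edge["node"]
        slim_edges.append({
            "node": {
                # Like jq's {url, firstPublished}: missing keys become None (null).
                "url": node.get("url"),
                "firstPublished": node.get("firstPublished"),
                # Like .headline.default: None if headline is absent.
                "headline": {"default": (node.get("headline") or {}).get("default")},
            }
        })
    # Like {data: {legacyCollection: {collectionsPage: {stream: {$edges}}}}}
    return {"data": {"legacyCollection": {"collectionsPage": {"stream": {"edges": slim_edges}}}}}

# Trimmed-down sample shaped like the question's input.
sample = {"data": {"legacyCollection": {
    "tagline": "",
    "collectionsPage": {"stream": {
        "totalCount": 52559,
        "edges": [{
            "node": {
                "url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
                "firstPublished": "2022-04-27T23:28:33.241Z",
                "headline": {"default": "I.C.C. Joins Investigation of War Crimes in Ukraine",
                             "__typename": "CreativeWorkHeadline"},
                "embedded": False,
            },
            "__typename": "AssetsEdge",
        }],
    }},
}}}

print(json.dumps(slim(sample), indent=2))
```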
Here's one way to make the selection while ensuring that the structure is preserved. This solution may be of interest because it can easily be adapted for use with jq's "--stream" option.
def array_startswith($head): .[: $head|length] == $head;
. as $in
| ["data", "legacyCollection", "collectionsPage", "stream", "edges"] as $head
| ($head|length) as $len
| reduce (paths
          | select( array_startswith($head) and .[1+$len] == "node" )) as $p
    (null;
     if ((($p|length) == $len + 3) and ($p[-1] | IN("url", "firstPublished")))
        or ((($p|length) == $len + 4) and $p[-2:] == ["headline", "default"])
     then setpath($p; $in | getpath($p))
     else .
     end)
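To make the path-based approach easier to follow, here is a rough Python transliteration (not jq). The helpers iter_paths, getpath, setpath and select_nodes are hypothetical stand-ins for jq's paths, getpath and setpath builtins and the reduce; the sample document is made up. Each path is a tuple, and only paths of the right length under edges[i].node are copied into an initially empty (None) result, so the original structure is rebuilt around the kept leaves.

```python
HEAD = ("data", "legacyCollection", "collectionsPage", "stream", "edges")

def iter_paths(value, prefix=()):
    """Yield every path (as a tuple) into value, root excluded, like jq's `paths`."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return
    for key, child in items:
        path = prefix + (key,)
        yield path
        yield from iter_paths(child, path)

def getpath(value, path):
    for key in path:
        value = value[key]
    return value

def setpath(target, path, leaf):
    """Like jq's setpath: build missing objects/arrays; arrays are padded with None."""
    if not path:
        return leaf
    key, rest = path[0], path[1:]
    if isinstance(key, int):
        container = target if isinstance(target, list) else []
        container.extend([None] * (key + 1 - len(container)))
        current = container[key]
    else:
        container = target if isinstance(target, dict) else {}
        current = container.get(key)
    container[key] = setpath(current, rest, leaf)
    return container

def select_nodes(doc):
    n = len(HEAD)
    out = None  # like reduce ... (null; ...)
    for p in iter_paths(doc):
        # array_startswith($head) and .[1+$len] == "node"
        if p[:n] != HEAD or len(p) <= n + 1 or p[n + 1] != "node":
            continue
        if (len(p) == n + 3 and p[-1] in ("url", "firstPublished")) or \
           (len(p) == n + 4 and p[-2:] == ("headline", "default")):
            out = setpath(out, p, getpath(doc, p))
    return out

sample = {"data": {"legacyCollection": {"tagline": "", "collectionsPage": {"stream": {
    "totalCount": 2,
    "edges": [
        {"node": {"url": "u1", "firstPublished": "t1",
                  "headline": {"default": "h1", "__typename": "CreativeWorkHeadline"},
                  "summary": "s1"}, "__typename": "AssetsEdge"},
        {"node": {"__typename": "Article", "url": "u2", "firstPublished": "t2",
                  "headline": {"default": "h2"}, "translations": []}, "__typename": "AssetsEdge"},
    ]}}}}}

print(select_nodes(sample))
```

One design difference from the declarative answer: because only existing paths are copied, a node without a headline simply has no headline key here, instead of getting headline: {default: null}.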
This is how my input looks:
{
"text" : "Some text here"
}
{
"usage": {
"text_units": 1,
"text_characters": 101,
"features": 1
},
"language": "en",
"categories": [
{
"score": 0.655041,
"label": "/technology law, govt and politics/espionage and intelligence/surveillance"
},
{
"score": 0.639809,
"label": "/technology and computing/computer security/network security"
},
{
"score": 0.624533,
"label": "/business and industrial/business operations"
}
]
}
Using jq, if the label of the first element of the categories array in the second object contains /technology, I want to add a new field named relevant with 1 as its value (which I managed to do), and copy the text field from the first object.
So, the expected output is:
{
"usage": {
"text_units": 1,
"text_characters": 101,
"features": 1
},
"language": "en",
"categories": [
{
"score": 0.655041,
"label": "/technology law, govt and politics/espionage and intelligence/surveillance"
},
{
"score": 0.639809,
"label": "/technology and computing/computer security/network security"
},
{
"score": 0.624533,
"label": "/business and industrial/business operations"
}
],
"relevant": 1,
"text": "Some text here"
}
And this is what I have done so far:
if .categories[0].label | test("/technology"; "i") then . |=( . + {"relevant": 1} + {"text": .text}) else . |= . + {"relevant": 0} end
Your input consists of two separate objects. To access the first while processing the second, you can save the first into a variable (run jq without -n, so that . is the first object and input reads the second):
. as {$text} | input | if .categories[0].label | test("/technology"; "i") then . + {relevant: 1, $text} else . + {relevant: 0} end
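The same two-object logic, sketched in Python for comparison (not jq; parse_stream and tag_relevance are hypothetical names). json.JSONDecoder.raw_decode is used to read several whitespace-separated JSON values from one string, roughly the way jq consumes its input, and re.IGNORECASE plays the role of the "i" flag in test.

```python
import json
import re

def parse_stream(text):
    """Split a string of whitespace-separated JSON values, the way jq reads its input."""
    decoder = json.JSONDecoder()
    values, pos = [], 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return values
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)

def tag_relevance(first, second):
    """Mirror of: . as {$text} | input | if ... then . + {relevant: 1, $text} else . + {relevant: 0} end"""
    text = first.get("text")  # like . as {$text}: None if missing
    if re.search("/technology", second["categories"][0]["label"], re.IGNORECASE):
        return {**second, "relevant": 1, "text": text}
    return {**second, "relevant": 0}

stream = '''
{ "text" : "Some text here" }
{ "language": "en",
  "categories": [
    { "score": 0.655041,
      "label": "/technology law, govt and politics/espionage and intelligence/surveillance" } ] }
'''
first, second = parse_stream(stream)
print(json.dumps(tag_relevance(first, second), indent=2))
```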
I'm trying to convert the inspection_date field from a string to a date for every object in my database.
Every object looks like this:
{
"name": "$1 STORE",
"address": "5573 ROSEMEAD BLVD",
"city": "TEMPLE CITY",
"zipcode": "91780",
"state": "California",
"violations": [{
"inspection_date": "2015-09-29",
"description": " points ... violation_status\n62754 1 ... OUT OF COMPLIANCE\n62755 1 ... OUT OF COMPLIANCE\n62756 2 ... OUT OF COMPLIANCE\n\n[3 rows x 5 columns]",
"risk": "Risk 3 (Low)"
}, {
"inspection_date": "2016-08-18",
"description": " points ... violation_status\n338879 2 ... OUT OF COMPLIANCE\n\n[1 rows x 5 columns]",
"risk": "Risk 3 (Low)"
}]
}
The violations array may contain more or fewer than two objects.
How can I convert all of the inspection_date fields without doing it by hand, one by one?
As suggested by @turivishal, you need to use the $map and $dateFromString operators:
db.collection.aggregate([
  {
    "$addFields": {
      "violations": {
        "$map": {
          "input": "$violations",
          "in": {
            "$mergeObjects": [
              "$$this",
              {
                "inspection_date": {
                  "$dateFromString": {
                    "dateString": "$$this.inspection_date",
                    "format": "%Y-%m-%d",
                    "onError": null,
                    "onNull": null
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
])
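To illustrate what the stage does to each document, here is a rough Python analogue (a sketch, not MongoDB code; parse_date and convert_violations are hypothetical names, and it assumes violations is always an array). $mergeObjects with "$$this" first means every other field of a violation is kept and only inspection_date is overwritten; onError/onNull: null correspond to returning None. One difference: MongoDB produces a UTC date, while strptime here returns a naive datetime.

```python
from datetime import datetime

def parse_date(value, fmt="%Y-%m-%d"):
    """Return a datetime, or None when the value is missing or unparseable
    (analogous to onNull: null and onError: null)."""
    if value is None:
        return None
    try:
        return datetime.strptime(value, fmt)
    except (TypeError, ValueError):
        return None

def convert_violations(doc):
    """Rough analogue of $addFields + $map + $mergeObjects: copy each violation
    and overwrite only its inspection_date. Assumes violations is a list."""
    return {
        **doc,
        "violations": [
            {**v, "inspection_date": parse_date(v.get("inspection_date"))}
            for v in doc["violations"]
        ],
    }

doc = {
    "name": "$1 STORE",
    "violations": [
        {"inspection_date": "2015-09-29", "risk": "Risk 3 (Low)"},
        {"inspection_date": "2016-08-18", "risk": "Risk 3 (Low)"},
    ],
}
print(convert_violations(doc)["violations"][0]["inspection_date"])
```

Keep in mind that aggregate, like this sketch, returns converted documents without changing what is stored. To persist the conversion you would write the results back, for example with an update that takes an aggregation pipeline (MongoDB 4.2+) or a $merge stage.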
I have two files, and I need to merge the elements of the second file into the details array of the matching objects in the first file, matching on the reference field.
The first file:
[
{
"reference": 25422,
"order_number": "10_1",
"details" : []
},
{
"reference": 25423,
"order_number": "10_2",
"details" : []
}
]
The second file:
[
{
"record_id" : 1,
"reference": 25422,
"row_description": "descr_1_0"
},
{
"record_id" : 2,
"reference": 25422,
"row_description": "descr_1_1"
},
{
"record_id" : 3,
"reference": 25423,
"row_description": "descr_2_0"
}
]
I would like to get:
[
{
"reference": 25422,
"order_number": "10_1",
"details" : [
{
"record_id" : 1,
"reference": 25422,
"row_description": "descr_1_0"
},
{
"record_id" : 2,
"reference": 25422,
"row_description": "descr_1_1"
}
]
},
{
"reference": 25423,
"order_number": "10_2",
"details" :[
{
"record_id" : 3,
"reference": 25423,
"row_description": "descr_2_0"
}
]
}
]
Below is my code in the file es_func.jq, launched with this command:
jq -n --argfile f1 es_file1.json --argfile f2 es_file2.json -f es_func.jq
INDEX($f2[] ; .reference) as $details
| $f1
| map( ($details[.reference|tostring]| .row_description) as $vn
| if $vn then .details = [{"row_description" : $vn}] else . end)
I only get the last record for reference 25422 ("row_description": "descr_1_1"); "row_description": "descr_1_0" is missing:
[
{
"reference": 25422,
"order_number": "10_1",
"details": [
{
"row_description": "descr_1_1"
}
]
},
{
"reference": 25423,
"order_number": "10_2",
"details": [
{
"row_description": "descr_2_0"
}
]
}
]
I think I'm close to the solution, but something is still missing. Thank you.
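This would be much easier with reduce. INDEX keeps only one object per key (the last one wins), which is why "descr_1_0" is lost; reduce lets you append every matching record instead:
jq 'reduce inputs[] as $rec (INDEX(.reference);
      .[$rec.reference | tostring].details += [$rec]
    ) | map(.)' es_file1.json es_file2.json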
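The difference between indexing and accumulating can be seen in a short Python sketch (not jq; index_by and attach_details are hypothetical names, and the data is copied from the question). index_by behaves like INDEX, so the second 25422 record overwrites the first; attach_details follows the reduce answer and appends every record to the matching order's details.

```python
def index_by(records, key):
    """Like jq's INDEX(stream; .key): one entry per key, so the last record wins."""
    return {str(r[key]): r for r in records}

def attach_details(orders, records):
    """Like the reduce answer: index orders by reference, then append every record."""
    by_ref = {str(o["reference"]): {**o, "details": list(o.get("details") or [])} for o in orders}
    for rec in records:
        # .[$rec.reference | tostring].details += [$rec]
        by_ref.setdefault(str(rec["reference"]), {"details": []})["details"].append(rec)
    return list(by_ref.values())  # map(.) turns the object back into an array

orders = [
    {"reference": 25422, "order_number": "10_1", "details": []},
    {"reference": 25423, "order_number": "10_2", "details": []},
]
records = [
    {"record_id": 1, "reference": 25422, "row_description": "descr_1_0"},
    {"record_id": 2, "reference": 25422, "row_description": "descr_1_1"},
    {"record_id": 3, "reference": 25423, "row_description": "descr_2_0"},
]

print(index_by(records, "reference")["25422"]["row_description"])
print(attach_details(orders, records))
```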
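Here's a straightforward, reduce-free solution. Note the order of the files: the records file (here 2.json) is read first and grouped, then input reads the orders file (1.json):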
jq '
group_by(.reference)
| INDEX(.[]; .[0]|.reference|tostring) as $dict
| input
| map_values(. + {details: $dict[.reference|tostring]})
' 2.json 1.json
Given the input of sizes:
[
{
"stock": 1,
"sales": 0,
"sizes": [
{
"countries": ["at", "be", "ch", "cy", "de", "ee", "es", "fi", "gr", "ie", "lu", "lv", "nl", "pl", "pt", "se", "si", "sk"],
"size": "EU 45,5"
},
{
"countries": ["it"],
"size": "EU 45,5"
},
{
"countries": ["fr"],
"size": "EU 45,5"
},
{
"countries": ["gb"],
"size": "EU 45,5"
}
]
}
]
I would like to get the same structure, but without the entries whose countries array doesn't contain "de" (Germany), and with the countries field removed completely. I expect something like this:
[
{
"stock": 1,
"sizes": [
{
"size": "EU 45,5"
}
]
}
]
I tried this:
map(.sizes[] |= select(.countries | join(",") | contains("de"))) | map({ stock, sizes })
But the filter doesn't work; it throws: jq: error (at <stdin>:48): Cannot iterate over null (null).
I tried has, in, contains and inside, and nothing seems to work.
Also, how can I choose which fields appear? With map({ stock, sizes }), countries is still there. Can I do something like map({ stock, sizes: { size } })?
Here's a one-liner that answers your main question. If you can't see how it works, try breaking it up into separate pieces:
map( .sizes |= map( select(.countries | index("de") ) | del(.countries) ))
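The one-liner's logic, sketched in Python for comparison (not jq; keep_country is a hypothetical name, and the countries list in the sample is shortened). On an array, jq's index("de") returns the position of "de" or null, so select keeps an entry exactly when "de" is a member, which is what the in test does here. Like the jq one-liner, this keeps the sales field; in jq, appending | map({stock, sizes}) would drop it and match the expected output.

```python
import json

def keep_country(items, country="de"):
    """Keep only size entries whose countries list contains `country`,
    then drop the countries field (like select(.countries | index("de")) | del(.countries))."""
    return [
        {**item, "sizes": [
            {k: v for k, v in size.items() if k != "countries"}
            for size in item["sizes"]
            if country in size["countries"]
        ]}
        for item in items
    ]

data = [{"stock": 1, "sales": 0, "sizes": [
    {"countries": ["at", "be", "ch", "cy", "de", "ee"], "size": "EU 45,5"},
    {"countries": ["it"], "size": "EU 45,5"},
    {"countries": ["fr"], "size": "EU 45,5"},
    {"countries": ["gb"], "size": "EU 45,5"},
]}]
print(json.dumps(keep_country(data)))
```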
Regarding the selection of fields, you can use del/1 as above; sometimes simply using an expression such as {key1, key2} will do the trick. Consider also this function and the following example:
def query(queryobject):
with_entries( select( .key as $key | queryobject | has( $key ) ));
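Example (the def is included in the program so that the command runs on its own):
$ jq -c -n 'def query(queryobject): with_entries( select( .key as $key | queryobject | has( $key ) ));
  {"a": 1, "b": null, "c":3} | query( {a,b,d} )'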
{"a":1,"b":null}
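For comparison, a minimal Python version of the same idea (query here is a hypothetical Python function, not the jq def). In the jq def, queryobject is only checked with has, so only its keys matter: {a,b,d} evaluates to an object with keys a, b and d, and their values are irrelevant. Key b is kept even though its value is null, because the test is on the key, not the value.

```python
def query(obj, template):
    """Keep the entries of obj whose keys also appear in template; template values are ignored."""
    return {k: v for k, v in obj.items() if k in template}

# jq's {a,b,d} would have keys a, b and d; only those keys matter here.
print(query({"a": 1, "b": None, "c": 3}, {"a": None, "b": None, "d": None}))
```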