Leveling select fields - json

I am fetching a JSON response with the following structure:
{
"data": {
"children": [
{
"data": {
"id": "abcdef",
"preview": {
"images": [
{
"source": {
"url": "https://example.com/somefiles_1.jpg"
}
}
]
},
"title": "Boring Title One"
}
},
{
"data": {
"id": "ghijkl",
"preview": {
"images": [
{
"source": {
"url": "https://example.com/somefiles_2.jpg"
}
}
]
},
"title": "Boring Title Two"
}
},
{
"data": {
"id": "mnopqr",
"preview": {
"images": [
{
"source": {
"url": "https://example.com/somefiles_3.jpg"
}
}
]
},
"title": "Boring Title Three"
}
},
{
"data": {
"id": "stuvwx",
"preview": {
"images": [
{
"source": {
"url": "https://example.com/somefiles_4.jpg"
}
}
]
},
"title": "Boring Title Four"
}
}
]
}
}
Ideally I would like to have a shortened json like this:
{
"data": [
{
"id": "abcdef",
"title": "Boring Title One",
"url": "https://example.com/somefiles_1.jpg"
},
{
"id": "ghijkl",
"title": "Boring Title Two",
"url": "https://example.com/somefiles_2.jpg"
},
{
"id": "mnopqr",
"title": "Boring Title Three",
"url": "https://example.com/somefiles_3.jpg"
},
{
"id": "stuvwx",
"title": "Boring Title Four",
"url": "https://example.com/somefiles_4.jpg"
}
]
}
If this is not possible, I can work with joining those three values into a single string and splitting it later when necessary, like this:
abcdef#Boring Title One#https://example.com/somefiles_1.jpg
ghijkl#Boring Title Two#https://example.com/somefiles_2.jpg
mnopqr#Boring Title Three#https://example.com/somefiles_3.jpg
stuvwx#Boring Title Four#https://example.com/somefiles_4.jpg
This is where I am. I was using jq with select() and then piping the results to to_entries like this:
jq -r '.data.children[] | select(.data.post_type|test("image")?) | .data | to_entries[] | [ .value.title , .value.preview.images[0].source.url ] | join("#")' ~/Documents/json/sample.json
I don't understand what goes after to_entries[]. I have tried multiple variations of .key and .values; mostly I don't get any result, but sometimes I get key/value pairs I did not intend to select. How can I learn the proper syntax for it?
Is creating a flat JSON out of a nested JSON like this a good idea, or is it better to create the string outputs? I feel the string approach might be error-prone, especially when the values contain spaces or special characters.

Apparently what you're looking for is the {field} syntax. You don't need to resort to string outputs.
{ data: [
.data.children[].data
| select(has("post_type") and (.post_type | index("image")))
| {id, title} + (.preview.images[].source | {url})
# or, if images array always contains one element:
# | {id, title, url: .preview.images[0].source.url}
]
}

A simple solution to the main question is:
{data: [.data.children[]
| .data
| {id, title, url: .preview.images[0].source.url} ]}
(The "post_type" seems to have disappeared, but hopefully if it's relevant, you will be able to adapt the above as required. Likewise if .images[1] and beyond are relevant.)
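For readers who want to cross-check the jq result in another language, here is a minimal Python sketch of the same flattening. It assumes, like the filter above, that every child has at least one entry in preview.images; the function name flatten and the two-item sample are illustrative only.

```python
import json

# Shortened sample with the same shape as the response in the question.
response = {
    "data": {
        "children": [
            {"data": {"id": "abcdef",
                      "preview": {"images": [{"source": {"url": "https://example.com/somefiles_1.jpg"}}]},
                      "title": "Boring Title One"}},
            {"data": {"id": "ghijkl",
                      "preview": {"images": [{"source": {"url": "https://example.com/somefiles_2.jpg"}}]},
                      "title": "Boring Title Two"}},
        ]
    }
}


def flatten(resp):
    """Keep only id, title and the first image URL of each child."""
    return {
        "data": [
            {
                "id": child["data"]["id"],
                "title": child["data"]["title"],
                # Assumption: the images list is never empty.
                "url": child["data"]["preview"]["images"][0]["source"]["url"],
            }
            for child in resp["data"]["children"]
        ]
    }


flat = flatten(response)
print(json.dumps(flat, indent=2))
```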
String Output
If you want linear output, you should probably consider CSV or TSV, both of which are supported by jq via @csv and @tsv respectively.
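To illustrate why a real CSV encoder is safer than joining with "#", here is a small Python sketch using the standard csv module. The second title is a made-up value containing the separator, a comma and quotes; it is not from the question.

```python
import csv
import io

rows = [
    {"id": "abcdef", "title": "Boring Title One",
     "url": "https://example.com/somefiles_1.jpg"},
    # Hypothetical awkward title, added only to exercise the quoting rules.
    {"id": "ghijkl", "title": 'Title with # and, "quotes"',
     "url": "https://example.com/somefiles_2.jpg"},
]


def to_csv(records):
    """Serialize id, title and url of each record as one CSV line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for rec in records:
        writer.writerow([rec["id"], rec["title"], rec["url"]])
    return buf.getvalue()


csv_text = to_csv(rows)
print(csv_text, end="")

# Reading the text back recovers the original three fields per row,
# even though one title contains a comma and double quotes.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

A plain split on "#" would break the second row into four pieces, whereas the CSV round trip keeps three fields per row.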

Related

How to search within sections of a JSON File?

So, let's say I had a JSON file like this:
{
"content": [
{
"word": "cat",
"adjectives": [
{
"type": "textile",
"adjective": "fluffy"
},
{
"type": "visual",
"adjective": "small"
}
]
},
{
"word": "dog",
"adjectives": [
{
"type": "textile",
"adjective": "fluffy"
},
{
"type": "visual",
"adjective": "big"
}
]
},
{
"word": "chocolate",
"adjectives": [
{
"type": "visual",
"adjective": "small"
},
{
"type": "gustatory",
"adjective": "sweet"
}
]
}
]
}
Now, say I wanted to search for two adjectives, for example "fluffy" and "small". The problem is that more than one word has the adjective "small", so I would have to check manually which of them also has "fluffy". How would I do this more quickly?
In other words, how would I find the word(s) with both "fluffy" and "small"?
EDIT: Sorry, new asker. Anything that works in a terminal is fair game. jq is a really great JSON searcher, so it is preferred; sorry for the confusion. I also fixed the JSON.
A command-line solution would be to use jq:
jq -r '.content[] | select(any(.adjectives[]; .adjective == "fluffy") and any(.adjectives[]; .adjective == "small")) | .word' /pathToJsonFile.json
Output:
cat
Are you looking for something like this? Do you need a solution that uses other programming languages?
(P.S. your JSON example appears to be invalid)
Since jq is now fair game (this was only clarified later in the comments), here is one solution using jq.
First, fix the JSON to be actually valid:
{
"content": [
{
"word": "cat",
"adjectives": [
{
"type": "textile",
"adjective": "fluffy"
},
{
"type": "visual",
"adjective": "small"
}
]
},
{
"word": "dog",
"adjectives": [
{
"type": "textile",
"adjective": "fluffy"
},
{
"type": "visual",
"adjective": "big"
}
]
},
{
"word": "chocolate",
"adjectives": [
{
"type": "visual",
"adjective": "small"
},
{
"type": "gustatory",
"adjective": "sweet"
}
]
}
]
}
Then, the following jq filter returns an array containing the words which contain both adjectives:
.content
| map(
select(
.adjectives as $adj
| all("small","fluffy"; IN($adj[].adjective))
)
| .word
)
If a non-array output is required, and only one word per line, use .[] instead of map (either after content or as a final filter), e.g.:
jq -r '.content[]
| select(
.adjectives as $adj
| all("small","fluffy"; IN($adj[].adjective))
)
| .word'
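The same "must have all of these adjectives" test can be sketched in Python as a subset check, which may help when reasoning about what the all(...; IN(...)) expression does. The helper name words_with_all is illustrative; the data is the sample from the question.

```python
data = {
    "content": [
        {"word": "cat", "adjectives": [
            {"type": "textile", "adjective": "fluffy"},
            {"type": "visual", "adjective": "small"}]},
        {"word": "dog", "adjectives": [
            {"type": "textile", "adjective": "fluffy"},
            {"type": "visual", "adjective": "big"}]},
        {"word": "chocolate", "adjectives": [
            {"type": "visual", "adjective": "small"},
            {"type": "gustatory", "adjective": "sweet"}]},
    ]
}


def words_with_all(doc, wanted):
    """Return the words whose adjectives include every entry of `wanted`."""
    wanted = set(wanted)
    return [
        entry["word"]
        for entry in doc["content"]
        # Set inclusion: every wanted adjective must be present.
        if wanted <= {adj["adjective"] for adj in entry["adjectives"]}
    ]


matches = words_with_all(data, ["small", "fluffy"])
print(matches)
```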

Extract a subset of deeply nested JSON and print only the key/value pairs I am interested in

I have a deeply nested JSON file.
I want to extract and parse only the subset I am interested in, in my case all the content under the 'node' key.
How can I:
extract the subset of this JSON file under "edges[].node" (edges is the parent key of node)
keep only these three key/value pairs inside each 'node':
.url,
.headline.default, (*this one is a 'grandchild' of the 'node' key*)
.firstPublished
How can I print the slimmed-down version of the JSON file?
Nice to have: can I still keep the full path that leads from the JSON root down to the nested 'node' subset I am interested in?
Here is the full content of my JSON file (also on jqplay):
{
"data": {
"legacyCollection": {
"longDescription": "The latest news, analysis and investigations from Europe.",
"section": {
"name": "world",
"url": "/section/world"
},
"collectionsPage": {
"stream": {
"pageInfo": {
"hasNextPage": true,
"__typename": "PageInfo"
},
"__typename": "AssetsConnection",
"edges": [
{
"node": {
"url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
"firstPublished": "2022-04-27T23:28:33.241Z",
"headline": {
"default": "I.C.C. Joins Investigation of War Crimes in Ukraine",
"__typename": "CreativeWorkHeadline"
},
"summary": "Karim Khan, the chief prosecutor of the International Criminal Court, said that his organization would participate in a joint effort — with Ukraine, Poland and Lithuania — to investigate war crimes committed since Russia’s invasion.",
"promotionalMedia": {
"__typename": "Image",
"id": "SW1hZ2U6bnl0Oi8vaW1hZ2UvYTY3MTVhNDUtZDE0NS01OWZjLThkZWItNzYxMWViN2UyODhk"
},
"embedded": false
},
"__typename": "AssetsEdge"
},
{
"node": {
"__typename": "Article",
"url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
"firstPublished": "2022-04-27T19:42:17.000Z",
"typeOfMaterials": [
"News"
],
"archiveProperties": {
"lede": "",
"__typename": "ArticleArchiveProperties"
},
"headline": {
"default": "Endgame Nears in Bidding for Chelsea F.C.",
"__typename": "CreativeWorkHeadline"
},
"summary": "The American bank selling the English soccer team on behalf of its Russian owner could name its preferred suitor by the end of the week. But the drama isn’t over.",
"translations": []
},
"__typename": "AssetsEdge"
}
],
"totalCount": 52559
}
},
"sourceId": "100000004047788",
"tagline": "",
"__typename": "LegacyCollection"
}
}
}
Here is the command I have (jqplay demo):
.data.legacyCollection.collectionsPage.stream.edges[].node|= with_entries(select([.key]|inside(["default","url","firstPublished"])))
And here is the output I got
{
"data": {
"legacyCollection": {
"longDescription": "The latest news, analysis and investigations from Europe.",
"section": {
"name": "world",
"url": "/section/world"
},
"collectionsPage": {
"stream": {
"pageInfo": {
"hasNextPage": true,
"__typename": "PageInfo"
},
"__typename": "AssetsConnection",
"edges": [
{
"node": {
"url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
"firstPublished": "2022-04-27T23:28:33.241Z"
},
"__typename": "AssetsEdge"
},
{
"node": {
"url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
"firstPublished": "2022-04-27T19:42:17.000Z"
},
"__typename": "AssetsEdge"
}
],
"totalCount": 52559
}
},
"sourceId": "100000004047788",
"tagline": "",
"__typename": "LegacyCollection"
}
}
}
Here is the output I expect to have
{
"data": {
"legacyCollection": {
"collectionsPage": {
"stream": {
"edges": [
{
"node": {
"url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
"firstPublished": "2022-04-27T23:28:33.241Z"
}
},
{
"node": {
"url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
"firstPublished": "2022-04-27T19:42:17.000Z"
}
}
]
}
}
}
}
}
Here's a (somewhat) declarative solution:
(.data.legacyCollection.collectionsPage.stream.edges
| map( {node: (.node
| {url,
firstPublished,
headline: {default: .headline.default} })})) as $edges
| {data: {
legacyCollection: {
collectionsPage: {
stream: {
$edges
}
}
}
}
}
Here's one way to make the selection while ensuring that the structure is preserved. This solution may be of interest because
it can easily be adapted for use with jq's "--stream" option.
def array_startswith($head): .[: $head|length] == $head;
. as $in
| ["data", "legacyCollection", "collectionsPage", "stream", "edges"] as $head
| ($head|length) as $len
| reduce (paths
| select( array_startswith($head) and .[1+$len] == "node" )) as $p
(null;
if ((($p|length) == $len + 3) and ($p[-1] | IN("url", "firstPublished")))
or ((($p|length) == $len + 4) and $p[-2:] == ["headline", "default"])
then setpath($p; $in | getpath($p))
else .
end)
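As a cross-check of what the slimmed result should look like, here is a rough Python sketch of the first (declarative) approach: rebuild the path from the root and keep only url, firstPublished and headline.default in each node. The sample is a shortened copy of the question's data with a single edge, and the function name slim is an illustrative choice.

```python
import json

doc = {
    "data": {
        "legacyCollection": {
            "longDescription": "The latest news, analysis and investigations from Europe.",
            "collectionsPage": {
                "stream": {
                    "__typename": "AssetsConnection",
                    "edges": [
                        {
                            "node": {
                                "url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
                                "firstPublished": "2022-04-27T19:42:17.000Z",
                                "headline": {
                                    "default": "Endgame Nears in Bidding for Chelsea F.C.",
                                    "__typename": "CreativeWorkHeadline",
                                },
                                "translations": [],
                            },
                            "__typename": "AssetsEdge",
                        }
                    ],
                    "totalCount": 52559,
                }
            },
        }
    }
}


def slim(full):
    """Keep only the path down to edges[].node and three fields per node."""
    edges = full["data"]["legacyCollection"]["collectionsPage"]["stream"]["edges"]
    slim_edges = [
        {
            "node": {
                "url": edge["node"]["url"],
                "firstPublished": edge["node"]["firstPublished"],
                "headline": {"default": edge["node"]["headline"]["default"]},
            }
        }
        for edge in edges
    ]
    # Rebuild the enclosing objects explicitly so siblings are dropped.
    return {"data": {"legacyCollection": {"collectionsPage": {"stream": {"edges": slim_edges}}}}}


slimmed = slim(doc)
print(json.dumps(slimmed, indent=2))
```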

Using jq to fetch and show key value with quotes

I have a file that looks as below:
{
"Job": {
"Name": "sample_job",
"Description": "",
"Role": "arn:aws:iam::00000000000:role/sample_role",
"CreatedOn": "2021-10-21T23:35:23.660000-03:00",
"LastModifiedOn": "2021-10-21T23:45:41.771000-03:00",
"ExecutionProperty": {
"MaxConcurrentRuns": 1
},
"Command": {
"Name": "glueetl",
"ScriptLocation": "s3://aws-sample-s3/scripts/sample.py",
"PythonVersion": "3"
},
"DefaultArguments": {
"--TempDir": "s3://aws-sample-s3/temporary/",
"--class": "GlueApp",
"--enable-continuous-cloudwatch-log": "true",
"--enable-glue-datacatalog": "true",
"--enable-metrics": "true",
"--enable-spark-ui": "true",
"--job-bookmark-option": "job-bookmark-enable",
"--job-insights-byo-rules": "",
"--job-language": "python",
"--spark-event-logs-path": "s3://aws-sample-s3/logs"
},
"MaxRetries": 0,
"AllocatedCapacity": 100,
"Timeout": 2880,
"MaxCapacity": 100.0,
"WorkerType": "G.1X",
"NumberOfWorkers": 100,
"GlueVersion": "2.0"
}
}
I want to get key/value from "Name", "--enable-continuous-cloudwatch-log": "" and "--enable-metrics": "". So, I need to show the info like this:
"Name" "sample_job"
"--enable-continuous-cloudwatch-log" ""
"--enable-metrics" ""
UPDATE
Following the tips from @Inian and @0stone0, I came close to it:
jq -r '(.Job ) + (.Job.DefaultArguments | { "--enable-continuous-cloudwatch-log", "--enable-metrics"}) | to_entries[] | "\"\(.key)\" \"\(.value)\""'
This extracts the values I need but also shows all the other key/values.
Since your JSON isn't valid, I've converted it into:
{
"Job": {
"Name": "sample_job",
"Role": "sample_role_job"
},
"DefaultArguments": {
"--enable-continuous-cloudwatch-log": "test_1",
"--enable-metrics": ""
},
"Timeout": 2880,
"NumberOfWorkers": 10
}
Using the following filter:
"Name \(.Job.Name)\n--enable-continuous-cloudwatch-log \(.DefaultArguments."--enable-continuous-cloudwatch-log")\n--enable-metrics \(.DefaultArguments."--enable-metrics")"
We use string interpolation to show the desired output:
Name sample_job
--enable-continuous-cloudwatch-log test_1
--enable-metrics
jq --raw-output '"Name \(.Job.Name)\n--enable-continuous-cloudwatch-log \(.DefaultArguments."--enable-continuous-cloudwatch-log")\n--enable-metrics \(.DefaultArguments."--enable-metrics")"'
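The question asked for the keys and values wrapped in double quotes, which the output above does not show. As a hedged illustration of that format, here is a short Python sketch. It assumes the layout from the original question (DefaultArguments nested under Job), uses a trimmed sample, and the helper name quoted_pairs is hypothetical.

```python
import json

# Trimmed sample following the layout of the file in the question.
doc = {
    "Job": {
        "Name": "sample_job",
        "DefaultArguments": {
            "--enable-continuous-cloudwatch-log": "true",
            "--enable-metrics": "true",
            "--job-language": "python",
        },
    }
}


def quoted_pairs(config):
    """Return '"key" "value"' lines for Name and two selected arguments."""
    job = config["Job"]
    args = job["DefaultArguments"]
    picked = [("Name", job["Name"])]
    picked += [(key, args[key])
               for key in ("--enable-continuous-cloudwatch-log", "--enable-metrics")]
    # json.dumps adds the surrounding double quotes and escapes the content.
    return [f"{json.dumps(key)} {json.dumps(value)}" for key, value in picked]


lines = quoted_pairs(doc)
print("\n".join(lines))
```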

Change subelement with jq

I have a structure that looks like so
[
[
{
"ID": "grp1-001",
},
{
"ID": "grp1-002",
},
{
"ID": "grp1-003",
},
{
"ID": "grp1-004",
},
{
"ID": "grp1-005",
},
{
"ID": "grp1-006",
}
],
[
{
"ID": "grp2-001",
},
{
"ID": "grp2-002",
},
{
"ID": "grp2-003",
},
{
"ID": "grp2-004",
},
{
"ID": "grp2-005",
},
{
"ID": "grp2-006",
}
.......
what I need to get as a result of the modification is this
[
[
["1", "grp1-001"],
["2", "grp1-002"],
["3", "grp1-003"],
["4", "grp1-004"],
["5", "grp1-005"],
["6", "grp1-006"],
],
[
["1", "grp2-001"],
["2", "grp2-002"],
["3", "grp2-003"],
["4", "grp2-004"],
["5", "grp2-005"],
["6", "grp2-006"],
],
Which means I need to keep the external structure (outside array and an internal grouping) but convert the inner dict to an array and replace the "ID" key with a value (that will come from external source like --argjson). I am not even sure how to start - any ideas/resources are highly appreciated.
Assuming you're just taking the objects and transforming them to pairs of the index in the array and the ID value, you could do this:
map([to_entries[] | [.key + 1, .value.ID | tostring]])
https://jqplay.org/s/RBac7SPfdG
Using to_entries/0 on an array gives you an array of key/value (index/value) pairs. You could then shift the indices by 1 and convert to strings.
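The index-plus-value pairing can also be pictured with Python's enumerate, which plays the role that to_entries plays on an array. This is only a sketch on a shortened sample; number_ids is an illustrative name.

```python
groups = [
    [{"ID": "grp1-001"}, {"ID": "grp1-002"}, {"ID": "grp1-003"}],
    [{"ID": "grp2-001"}, {"ID": "grp2-002"}],
]


def number_ids(nested):
    """Replace each {"ID": ...} object with ["<1-based position>", ID]."""
    return [
        # enumerate(..., start=1) restarts the numbering in every inner group.
        [[str(position), item["ID"]] for position, item in enumerate(group, start=1)]
        for group in nested
    ]


result = number_ids(groups)
print(result)
```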

GraphQL Query returns null objects both in GraphiQL and App from JSON data source

I'm trying to get my mocked JSON data via GraphQL in Gatsby. The response shows the correct data, but also two null objects. Why is this happening?
I'm using the gatsby-transformer-json plugin to query my data and gatsby-source-filesystem to point the transformer to my JSON files.
categories.json
the mock file I'm trying to get to work :)
{
"categories": [
{
"title": "DEZERTY",
"path": "/dezerty",
"categoryItems": [
{
"categoryName": "CUKRIKY",
"image": "../../../../static/img/dessertcategories/cukriky.jpg"
},
{
"categoryName": "NAHODNE",
"image": "../../../../static/img/dessertcategories/nahodne.jpg"
},
]
},
{
"title": "CANDY BAR",
"path": "/candy-bar",
"categoryItems": [
{
"categoryName": "CHEESECAKY",
"image": "../../../../static/img/dessertcategories/cheesecaky.jpg"
},
{
"categoryName": "BEZLEPKOVÉ TORTY",
"image": "../../../../static/img/dessertcategories/bezlepkove-torty.jpg"
},
]
}
]
}
GraphQL query in GraphiQL
query Collections {
allMockJson {
edges {
node {
categories {
categoryItems {
categoryName
image
}
title
path
}
}
}
}
}
And the response GraphiQL gives me
{
"data": {
"allMockJson": {
"edges": [
{
"node": {
"categories": null
}
},
{
"node": {
"categories": null
}
},
{
"node": {
"categories": [
{
"categoryItems": [
{
"categoryName": "CHEESECAKY",
"image": "../../../../static/img/dessertcategories/cheesecaky.jpg"
},
{
"categoryName": "BEZLEPKOVÉ TORTY",
"image": "../../../../static/img/dessertcategories/bezlepkove-torty.jpg"
}
],
"title": "DEZERTY",
"path": "/dezerty"
},
{
"categoryItems": [
{
"categoryName": "CUKRIKY",
"image": "../../../../static/img/dessertcategories/CUKRIKY.jpg"
},
{
"categoryName": "NAHODNE",
"image": "../../../../static/img/dessertcategories/NAHODNE.jpg"
}
],
"title": "CANDY BAR",
"path": "/candy-bar"
}
]
}
}
]
}
}
}
I expected only to get the DEZERTY and CANDY BAR sections. Why are there null categories and how do I fix it?
Thanks in advance
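Your JSON contains syntax errors in the objects DEZERTY and CANDY BAR: the last item of each categoryItems array is followed by a trailing comma, which JSON does not allow. It silently fails without telling you. Try this json linter.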
Error: Parse error on line 12: },
Error: Parse error on line 25: },
Try again. Your query should work now.
You should look into an IDE that highlights these types of errors and saves you time and frustration.
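If no linter is at hand, any strict JSON parser will flag the same problem. Below is a minimal Python sketch using the standard json module on a reduced fragment of the mock file; the helper name check is made up, and the exact error message varies between Python versions, so only its presence is relied on.

```python
import json

# Reduced fragment: the trailing comma after the last array item is the bug.
broken = '{"categoryItems": [{"categoryName": "CUKRIKY"},]}'
fixed = '{"categoryItems": [{"categoryName": "CUKRIKY"}]}'


def check(text):
    """Return None if text is valid JSON, else a short error description."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"


broken_report = check(broken)
fixed_report = check(fixed)
print("broken:", broken_report)
print("fixed:", fixed_report)
```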