I have a curious problem sending concatenated files to JQ. I'm using Windows 10, but the problem is surfacing with Bash (via Git Bash) as well.
I have several input JSON files named input-1.json, input-2.json, input-3.json, etc. Each one contains a nested object that looks like this:
{
"blah": "blah",
"foo": [{
"bar": {…}
}, {
"bar": {…}
}]
}
I want to concatenate the files and extract all the values of "bar" across all the files into one array. (Note that "foo" is an array of objects, each with "bar" containing its own object.) In other words, I want to end up with (placed on one line for readability here):
[{…}, {…}, {…}, {…}, {…}]
Note that these are the foo…bars across all the files!
I start with this:
type input-*.json | jq ".foo[].bar" > output.json
That gives me
{…}, {…}, {…}, {…}, {…}
Close! Now I just have to wrap them in […], right? So I do this:
type input-*.json | jq "[.foo[].bar]" > output.json
Uh, oh; it gives me the following:
[{…}, {…}, {…}], [{…}, {…}]
But why? At the point of .foo[].bar, JQ just sees a stream of objects, right? How does JQ "remember" that some of those objects came from different inputs?
Note that "How to convert a JSON object stream into an array with jq" and "jq: output array of json objects" seem to be similar questions. They say to use JQ's --slurp mode. But won't that prevent streaming? I.e., if the output of all the files is really huge, won't this have the potential of running out of memory by loading the entire input into memory?
Besides (and this is the crux of my confusion), if jq ".foo[].bar" gives me exactly the {…}, {…}, {…}, {…}, {…} array contents I want, how does JQ "remember" that some of those objects came from different inputs? You'll see that JQ isn't wrapping each {…} in an array, but is actually wrapping several objects based upon which input-*.json file they came from. Why?
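jq applies the whole filter separately to each top-level JSON entity in its input, so [.foo[].bar] builds one array per top-level object, i.e. one per file. One way to collect the values across all the files would be to "slurp" them, like so: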
jq -s 'map( .foo[].bar )' input-*.json
If you want to save memory, you could go with:
jq -n '[inputs | .foo[].bar]' input-*.json
If your shell does not support filename globbing, then you would have to specify the file names in some other way, e.g. by piping the concatenated files into jq on standard input, as in the question.
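A minimal sketch of why the per-file arrays appear, assuming a POSIX shell with jq 1.5 or later (inputs was added in 1.5); the file contents and the n values are made up for illustration. jq runs the whole filter once for each top-level entity it reads, so the array constructor only ever sees one file's values unless -n with inputs (or -s) is used.

```shell
# Two hypothetical input files shaped like the ones in the question.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '{"blah":"blah","foo":[{"bar":{"n":1}},{"bar":{"n":2}}]}' > "$dir/input-1.json"
printf '%s\n' '{"blah":"blah","foo":[{"bar":{"n":3}}]}' > "$dir/input-2.json"

# The filter runs once per top-level input, so each run builds its own array.
# Prints [{"n":1},{"n":2}] and then [{"n":3}]
cat "$dir"/input-*.json | jq -c '[.foo[].bar]'

# With -n, the filter runs once and inputs pulls in every entity.
# Prints [{"n":1},{"n":2},{"n":3}]
cat "$dir"/input-*.json | jq -nc '[inputs | .foo[].bar]'

# -s first reads every entity into one array; same output as above.
jq -sc 'map(.foo[].bar)' "$dir"/input-*.json
```

The -n/inputs form avoids materializing an array of all the input objects, although the result array itself is still built in memory.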
Given an invalid JSON string such as { foo: bar }, is it possible to get jq to process it and format it correctly as { "foo": "bar" }?
No, or at least not without complex programming, though jq can handle objects with unquoted key names, e.g. {foo: "bar"}. (Hint: read the quasi-JSON as a jq program.)
The jq FAQ, however, does have a section giving details about a number of command-line tools that can be recommended for this kind of task, e.g. any-json and hjson. That page provides links as well.
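The hint above can be illustrated as follows, assuming a POSIX shell with jq installed: jq's object-construction syntax accepts unquoted key names, so quasi-JSON such as {foo: "bar"} can be passed as the program itself rather than as data. This only helps when the values are valid jq expressions.

```shell
# Evaluate the quasi-JSON as a jq program (no input needed, hence -n).
# Prints:
# {
#   "foo": "bar"
# }
jq -n '{foo: "bar"}'

# Compact output; prints {"foo":"bar","n":1}
jq -nc '{foo: "bar", n: 1}'

# An unquoted value like bar is not a string in jq: it is read as a call to a
# function named bar, so jq -n '{foo: bar}' fails to compile.
```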
I am a rank beginner with jq, and I've been going through the tutorial, but I think there is a conceptual difference I don't understand. A common problem I encounter is that a large JSON file will contain many objects, each of which is quite big, and I'd like to view the first complete object, to see which fields exist, what types, how much nesting, etc.
In the tutorial, they do this:
# We can use jq to extract just the first commit.
$ curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq '.[0]'
Here is an example with one object - here, I'd like to return the whole array (just like my_array=['foo']; my_array[0] would return foo in Python).
wget https://hacker-news.firebaseio.com/v0/item/8863.json
I can access and pretty-print the whole thing with the identity filter (.):
$ cat 8863.json | jq '.'
{
"by": "dhouston",
"descendants": 71,
"id": 8863,
"kids": [
9224,
...
8876
],
"score": 104,
"time": 1175714200,
"title": "My YC app: Dropbox - Throw away your USB drive",
"type": "story",
"url": "http://www.getdropbox.com/u/2/screencast.html"
}
But trying to get the first element fails:
$ cat 8863.json | jq '.[0]'
jq: error (at <stdin>:0): Cannot index object with number
I get the same error with jq '.[0]' 8863.json, but strangely echo 8863.json | jq '.[0]' gives me parse error: Invalid numeric literal at line 2, column 0. What is the difference? Also, is this not the correct way to get the zeroth member of the JSON?
I've looked at other SO posts with this error message and at the manual, but I'm still confused. I think of the file as an array of JSON objects, and I'd like to get the first. But it looks like jq works with something called a "stream", and does operations on all of it (say, return one given field from every object).
Clarification:
Let's say I have 2 objects in my JSON:
{
"by": "pg",
"id": 160705,
"poll": 160704,
"score": 335,
"text": "Yes, ban them; I'm tired of seeing Valleywag stories on News.YC.",
"time": 1207886576,
"type": "pollopt"
}
{
"by": "dpapathanasiou",
"id": 16070,
"kids": [
16078
],
"parent": 16069,
"text": "Dividends don't mean that much: Microsoft in its dominant years (when they had 40%-plus margins and were raking in the cash) never paid a dividend (they did so only recently).",
"time": 1177355133,
"type": "comment"
}
How would I get the entire first object (lines 1-9) with jq?
Cannot index object with number
This error message says it all: you can't index objects with numbers. If you want to get the value of the by field, you need to do
jq '.by' file
Regarding:
echo 8863.json | jq '.[0]' gives me parse error: Invalid numeric literal at line 2, column 0.
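That's expected: echo sends the literal text 8863.json (the file name, not the file's contents) to jq. Without the -R/--raw-input flag, jq tries to parse that text as JSON, and 8863.json is not valid JSON, hence the parse error. With -R, jq would read the line as the JSON string "8863.json", but .[0] would still fail, because array indexing cannot be applied to JSON strings. (To get the first character as a string, you'd write .[0:1].)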
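A short sketch of these cases, assuming a POSIX shell with jq installed; the last command only runs if 8863.json has been downloaded as in the question.

```shell
# echo sends the file name, not the file's contents.
# Without -R, jq tries to parse the text 8863.json as JSON: parse error.
echo 8863.json | jq '.[0]'

# With -R the line becomes the JSON string "8863.json".
# Numeric indexing still fails (a string cannot be indexed with a number):
echo 8863.json | jq -R '.[0]'
# ...but slicing works on strings; prints "8"
echo 8863.json | jq -R '.[0:1]'

# To query the file's contents, give jq the file name instead.
# For the file shown in the question this prints "dhouston".
[ -f 8863.json ] && jq '.by' 8863.json
```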
If your input file consists of several separate entities, to get the first one:
jq -n 'input' file
or,
jq -n 'first(inputs)' file
To get the nth one (let's say the 5th, for example; nth counts from 0):
jq -n 'nth(4; inputs)' file
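A sketch of these commands on a made-up stream of five objects, assuming a POSIX shell and jq 1.5 or later (for inputs and nth). Because nth is zero-based, the 5th entity is nth(4; inputs).

```shell
# A hypothetical stream of five separate JSON entities (not an array).
stream='{"n":1} {"n":2} {"n":3} {"n":4} {"n":5}'

# First entity; both print {"n":1}
printf '%s\n' "$stream" | jq -nc 'input'
printf '%s\n' "$stream" | jq -nc 'first(inputs)'

# Fifth entity (index 4); prints {"n":5}
printf '%s\n' "$stream" | jq -nc 'nth(4; inputs)'
```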
a large JSON file will contain many objects, each of which is quite big, and I'd like to view the first complete object, to see which fields exist, what types, how much nesting, etc.
As implied in @OguzIsmail's response, there are important differences between:
- a JSON file (i.e., a file containing exactly one JSON entity);
- a file containing a sequence (i.e., stream) of JSON entities;
- a file containing an array of JSON entities.
In the first two cases, you can write jq -n input to select the first entity, and in the case of an array of entities, jq '.[0]' will suffice.
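A minimal sketch contrasting a stream with an array, assuming a POSIX shell with jq 1.5 or later; the two small objects are made up for illustration.

```shell
# A stream of two separate entities: -n with input reads only the first.
# Prints {"a":1}
printf '%s\n' '{"a":1}' '{"b":2}' | jq -nc 'input'

# Without -n, the filter is applied to each entity in turn.
# Prints {"a":1} and then {"b":2}
printf '%s\n' '{"a":1}' '{"b":2}' | jq -c '.'

# .[0] on a stream of objects fails: objects cannot be indexed with numbers.
printf '%s\n' '{"a":1}' '{"b":2}' | jq '.[0]'

# The same two objects wrapped in an array form ONE entity, so .[0] works.
# Prints {"a":1}
printf '%s\n' '[{"a":1},{"b":2}]' | jq -c '.[0]'
```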
(In JSON-speak, a "JSON object" is a kind of dictionary, and is not to be confused with JSON entities in general.)
If you have a bunch of JSON objects (whether as a stream or array or whatever), just looking at the first often doesn't really give an accurate picture of all of them. For getting a bird's eye view of a bunch of objects, using a "schema inference engine" is often the way to go. For this purpose, you might like to consider my schema.jq schema inference engine. It's usually very simple to use but of course how you use it will depend on whether you have a stream or array of JSON entities. For basic details, see https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed; for related topics (e.g. verification), see the entry for JESS at https://github.com/stedolan/jq/wiki/Modules
Please note that schema.jq infers a structural schema that mirrors the entities under consideration. Such structural schemas have little in common with JSON Schema schemas, which you might also like to consider.
I'm trying to use jq to get a value from the JSON that cURL returns.
This is the JSON cURL passes to jq (and, FTR, I want jq to return "VALUE-I-WANT" without the quotation marks):
[
{
"success":{
"username":"VALUE-I-WANT"
}
}
]
I initially tried this:
jq ' . | .success | .username'
and got
jq: error (at <stdin>:0): Cannot index array with string "success"
I then tried a bunch of variations, with no luck.
With a bunch of searching the web, I found this SE entry, and thought it might have been my saviour (spoiler, it wasn't). But it led me to try these:
jq -r '.[].success.username'
jq -r '.[].success'
They didn't return an error; they returned "null", which may or may not be an improvement.
Can anybody tell me what I'm doing wrong here? And why it's wrong?
You need to pipe the output of .[] into the next filter.
jq -r '.[] | .success.username' tmp.json
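A sketch of the fix on the question's sample input, assuming a POSIX shell with jq installed; the input is written as a one-line string instead of the tmp.json file.

```shell
# The sample input from the question.
json='[{"success":{"username":"VALUE-I-WANT"}}]'

# Original attempt: . passes the whole array along, and an array cannot be
# indexed with the string "success", so this fails.
printf '%s\n' "$json" | jq '. | .success | .username'

# .[] emits each array element; the pipe feeds each one to .success.username.
# -r prints the string without quotes: VALUE-I-WANT
printf '%s\n' "$json" | jq -r '.[] | .success.username'
```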
tl;dr
# Extract .success.username from ALL array elements.
# .[] enumerates all array elements
# -r produces raw (unquoted) output
jq -r '.[].success.username' file.json
# Extract .success.username only from the 1st array element.
jq -r '.[0].success.username' file.json
Your input is an array, so in order to access its elements you need .[], the array/object-value iterator (as the name suggests, it can also enumerate the property values of an object).
Just . | sends the input (.) array as a whole through the pipeline, and an array only has numerical indices, so the attempt to index (access) it with .success.username fails.
Thus, simply replacing . | with .[] | in your original attempt, combined with -r to get raw (unquoted output), should solve your problem, as shown in chepner's helpful answer.
However, peak points out that since at least jq 1.3 (current as of this writing is jq 1.5) you don't strictly need a pipeline, as demonstrated in the commands at the top.
So the 2nd command in your question should work with your sample input, unless you're using an older version.
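To check this on your own machine, a sketch assuming a POSIX shell: print the installed jq version, then run the question's 2nd command (and the first-element variant) on the sample input.

```shell
# Prints something like jq-1.6
jq --version

json='[{"success":{"username":"VALUE-I-WANT"}}]'

# The 2nd command from the question, no explicit pipe; prints VALUE-I-WANT
printf '%s\n' "$json" | jq -r '.[].success.username'

# Only the first array element; also prints VALUE-I-WANT
printf '%s\n' "$json" | jq -r '.[0].success.username'
```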
I have a JSON-file which consists of multiple JSON-"elements", e.g.
{
"name": "Name 1",
"foo": "Bar"
}
{
"id": 123,
"bar": "Foo"
}
I'm only interested in the second element and I need to query by the 'index' of the element (i.e. I do not know what fields the element will contain).
How do I achieve this with jq?
There are several possible answers, depending on which version of jq you have, so here I'll focus on a generic and generally useful answer.
Use the -s ("slurp") option to read all the entities into a single array and then index it (indices start at 0), as in jq -s '.[1]' for the second entity.
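A sketch on the two entities from the question, assuming a POSIX shell. The nth(1; inputs) variant assumes jq 1.5 or later and is shown only as a streaming alternative to slurping, not as part of the original answer.

```shell
# The two separate entities from the question, as one stream.
stream='{"name":"Name 1","foo":"Bar"} {"id":123,"bar":"Foo"}'

# -s collects every entity into one array; index 1 is the second entity.
# Prints {"id":123,"bar":"Foo"}
printf '%s\n' "$stream" | jq -sc '.[1]'

# Streaming alternative: no array of all entities is built.
# Prints {"id":123,"bar":"Foo"}
printf '%s\n' "$stream" | jq -nc 'nth(1; inputs)'
```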
In jq 1.4 and later, the jq filter .[] when used on objects preserves the order of the keys. (Using jq 1.3, you may be out of luck if you do not know anything about the key names.) For example, using jq 1.4 or later:
$ jq '.[]'
{"b":1, "a":2}
1
2
I want to get value from JSON file:
Example:
{"name":"ghprbActualCommitAuthorEmail","value":"test#gmail.com"},{"name":"ghprbPullId","value":"226"},{"name":"ghprbTargetBranch","value":"master"},
What I expect:
I want to get test#gmail.com, 226 and master.
sed is the wrong tool for processing JSON.
Assuming you have a file tmp.json with valid JSON like
[{"name":"ghprbActualCommitAuthorEmail","value":"test#gmail.com"},
{"name":"ghprbPullId","value":"226"},
{"name":"ghprbTargetBranch","value":"master"}]
you can use jq -r '.[].value' tmp.json (-r prints the values without quotes).
If the file instead contains
{"name":"ghprbActualCommitAuthorEmail","value":"test#gmail.com"}
{"name":"ghprbPullId","value":"226"}
{"name":"ghprbTargetBranch","value":"master"}
(i.e., just a stream of 3 separate JSON objects), you could use jq -r '.value' tmp.json, as jq will apply the filter to each object in succession. You can also use jq -s -r '.[].value' tmp.json, where the -s flag tells jq to read the entire input into an array first. This lets you use the same filter in both cases.
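A sketch of both input layouts, assuming a POSIX shell with jq installed; the data is the sample from the question, held in shell variables instead of tmp.json. -r is used so the values print without quotes, matching the expected output.

```shell
# Array layout (valid JSON).
array='[{"name":"ghprbActualCommitAuthorEmail","value":"test#gmail.com"},{"name":"ghprbPullId","value":"226"},{"name":"ghprbTargetBranch","value":"master"}]'

# Stream layout: three separate objects.
stream='{"name":"ghprbActualCommitAuthorEmail","value":"test#gmail.com"}
{"name":"ghprbPullId","value":"226"}
{"name":"ghprbTargetBranch","value":"master"}'

# Each of these three commands prints:
# test#gmail.com
# 226
# master
printf '%s\n' "$array"  | jq -r '.[].value'
printf '%s\n' "$stream" | jq -r '.value'
# -s turns the stream into an array, so the array filter works too.
printf '%s\n' "$stream" | jq -sr '.[].value'
```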