Retrieving the first entity out of several ones - json

I am a rank beginner with jq, and I've been going through the tutorial, but I think there is a conceptual difference I don't understand. A common problem I encounter is that a large JSON file will contain many objects, each of which is quite big, and I'd like to view the first complete object, to see which fields exist, what types, how much nesting, etc.
In the tutorial, they do this:
# We can use jq to extract just the first commit.
$ curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq '.[0]'
Here is an example with one object - here, I'd like to return the whole object (just like my_array=['foo']; my_array[0] would return foo in Python).
wget https://hacker-news.firebaseio.com/v0/item/8863.json
I can access and pretty-print the whole thing with .
$ cat 8863.json | jq '.'
{
"by": "dhouston",
"descendants": 71,
"id": 8863,
"kids": [
9224,
...
8876
],
"score": 104,
"time": 1175714200,
"title": "My YC app: Dropbox - Throw away your USB drive",
"type": "story",
"url": "http://www.getdropbox.com/u/2/screencast.html"
}
But trying to get the first element fails:
$ cat 8863.json | jq '.[0]'
jq: error (at <stdin>:0): Cannot index object with number
I get the same error with jq '.[0]' 8863.json, but strangely echo 8863.json | jq '.[0]' gives me parse error: Invalid numeric literal at line 2, column 0. What is the difference? Also, is this not the correct way to get the zeroth member of the JSON?
I've looked at other SO posts with this error message and at the manual, but I'm still confused. I think of the file as an array of JSON objects, and I'd like to get the first. But it looks like jq works with something called a "stream", and does operations on all of it (say, return one given field from every object).
Clarification:
Let's say I have 2 objects in my JSON:
{
"by": "pg",
"id": 160705,
"poll": 160704,
"score": 335,
"text": "Yes, ban them; I'm tired of seeing Valleywag stories on News.YC.",
"time": 1207886576,
"type": "pollopt"
}
{
"by": "dpapathanasiou",
"id": 16070,
"kids": [
16078
],
"parent": 16069,
"text": "Dividends don't mean that much: Microsoft in its dominant years (when they had 40%-plus margins and were raking in the cash) never paid a dividend (they did so only recently).",
"time": 1177355133,
"type": "comment"
}
How would I get the entire first object (lines 1-9) with jq?

Cannot index object with number
This error message says it all: you can't index objects with numbers. If you want to get the value of the by field, you need to do
jq '.by' file
Regarding
echo 8863.json | jq '.[0]' gives me parse error: Invalid numeric literal at line 2, column 0.
That's expected: echo passes the literal text 8863.json to jq, not the contents of the file, and that text is not valid JSON, hence the parse error. Even with the -R/--raw-input flag, which makes jq read the line as a JSON string, .[0] would fail, because array indexing cannot be applied to JSON strings. (To get the first character as a string, you'd write .[0:1].)
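A minimal sketch of what jq receives from echo, assuming a POSIX shell and jq 1.5 or later. Nothing is read from disk here; the string is just the file name from the question.

```shell
# echo sends the literal text 8863.json to jq, not the contents of the file.
# With -R, jq reads that line as a JSON string instead of parsing it as JSON.
as_string=$(echo 8863.json | jq -R '.')

# Strings can be sliced but not indexed, so .[0:1] yields the first character.
first_char=$(echo 8863.json | jq -R '.[0:1]')

echo "$as_string"    # "8863.json"
echo "$first_char"   # "8"
```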
If your input file consists of several separate entities, to get the first one:
jq -n 'input' file
or,
jq -n 'first(inputs)' file
To get the nth one (let's say the 5th, for example), keep in mind that nth counts from zero:
jq -n 'nth(4; inputs)' file
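Here is a small sketch of these commands on a two-object stream, assuming jq 1.5 or later. The objects are shortened versions of the ones in the question and are fed through a pipe rather than a file.

```shell
# Two JSON objects back to back form a stream, not an array.
stream='{"by": "pg", "id": 160705} {"by": "dpapathanasiou", "id": 16070}'

# -n keeps jq from consuming the first entity itself, so input reads it.
first=$(printf '%s\n' "$stream" | jq -nc 'input')

# first(inputs) stops after the first entity as well.
also_first=$(printf '%s\n' "$stream" | jq -nc 'first(inputs)')

# nth counts from zero, so nth(1; inputs) is the second entity.
second_id=$(printf '%s\n' "$stream" | jq -n 'nth(1; inputs) | .id')

echo "$first"       # {"by":"pg","id":160705}
echo "$second_id"   # 16070
```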

a large JSON file will contain many objects, each of which is quite big, and I'd like to view the first complete object, to see which fields exist, what types, how much nesting, etc.
As implied in @OguzIsmail's response, there are important differences between:
- a JSON file (i.e, a file containing exactly one JSON entity);
- a file containing a sequence (i.e., stream) of JSON entities;
- a file containing an array of JSON entities.
In the first two cases, you can write jq -n input to select the first entity, and in the case of an array of entities, jq '.[0]' will suffice.
(In JSON-speak, a "JSON object" is a kind of dictionary, and is not to be confused with JSON entities in general.)
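To make the distinction concrete, here is a rough sketch with made-up sample data, assuming jq 1.5 or later. It covers the single-entity and array cases; a stream behaves like the single-entity case. The keys call is only one simple way to peek at the top-level field names and is not part of the answer above.

```shell
# A single JSON entity (here an object): there is nothing to index numerically.
single='{"id": 8863, "type": "story"}'
# An array of entities: .[0] selects the first element.
array='[{"id": 1}, {"id": 2}]'

from_single=$(printf '%s\n' "$single" | jq -nc 'input')
from_array=$(printf '%s\n' "$array" | jq -c '.[0]')

# Top-level field names of the first entity, sorted alphabetically by keys.
field_names=$(printf '%s\n' "$single" | jq -nc 'input | keys')

echo "$from_single"   # {"id":8863,"type":"story"}
echo "$from_array"    # {"id":1}
echo "$field_names"   # ["id","type"]
```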
If you have a bunch of JSON objects (whether as a stream or array or whatever), just looking at the first often doesn't really give an accurate picture of all of them. For getting a bird's eye view of a bunch of objects, using a "schema inference engine" is often the way to go. For this purpose, you might like to consider my schema.jq schema inference engine. It's usually very simple to use but of course how you use it will depend on whether you have a stream or array of JSON entities. For basic details, see https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed; for related topics (e.g. verification), see the entry for JESS at https://github.com/stedolan/jq/wiki/Modules
Please note that schema.jq infers a structural schema that mirrors the entities under consideration. Such structural schemas have little in common with JSON Schema schemas, which you might also like to consider.

Related

Finding the location (line, column) of a field value in a JSON file

Consider the following JSON file example.json:
{
"key1": ["arr value 1", "arr value 2", "arr value 3"],
"key2": {
"key2_1": ["a1", "a2"],
"key2_2": {
"key2_2_1": 1.43123123,
"key2_2_2": 456.3123,
"key2_2_3": "string1"
}
}
}
The following jq command extracts a value from the above file:
jq ".key2.key2_2.key2_2_1" example.json
Output:
1.43123123
Is there an option in jq that, instead of printing the value itself, prints the location (line and column, start and end position) of the value within a (valid) JSON file, given an Object Identifier-Index (.key2.key2_2.key2_2_1 in the example)?
The output could be something like:
some_utility ".key2.key2_2.key2_2_1" example.json
Output:
(6,25) (6,35)
Given JSON data and a query, there is no option in jq that, instead of printing the value itself, prints the location of possible matches.
This is because JSON parsers providing an interface to developers usually focus on processing the logical structure of a JSON input, not the textual stream conveying it. You would have to instruct it to explicitly treat its input as raw text, while properly parsing it at the same time in order to extract the queried value. In the case of jq, the former can be achieved using the --raw-input (or -R) option, the latter then by parsing the read-in JSON-encoded string using fromjson.
The -R option alone reads the input line by line as a stream of strings, which would have to be collected and concatenated (e.g. using -n with [inputs] | add) in order to provide the whole input at once to fromjson. Alternatively, you could add the --slurp (or -s) option, which in combination with -R reads the whole input as a single string. After parsing that string with fromjson, you would split it into lines again (e.g. using /"\n") in order to obtain row numbers. I found the latter to be more convenient.
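The difference between the two reading modes can be seen on a tiny three-line document. This is only an illustration with made-up input, assuming jq 1.5 or later.

```shell
# A tiny JSON document spread over three lines.
doc='{
"a": 1
}'

# -R alone: every line arrives as its own string, so the filter runs three times.
per_line=$(printf '%s\n' "$doc" | jq -R -c '.')

# -R together with -s: the whole input arrives as one string that fromjson can parse.
value=$(printf '%s\n' "$doc" | jq -Rs 'fromjson | .a')

# Splitting the slurped text on newlines gives the lines back; the final
# newline of the input produces one extra empty string at the end.
line_count=$(printf '%s\n' "$doc" | jq -Rs './"\n" | length')

echo "$value"        # 1
echo "$line_count"   # 4
```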
That said, this could give you a starting point (the --raw-output (or -r) option outputs raw text instead of JSON):
jq -Rrs '
"\(fromjson.key2.key2_2.key2_2_1)" as $query # save the query value as string
| ($query | length) as $length # save its length by counting its characters
| ./"\n" | to_entries[] # split into lines and provide 0-based line numbers
| {row: .key, col: .value | indices($query)[]} # find occurrences of the query
| "(\(.row),\(.col)) (\(.row),\(.col + $length))" # format the output
' example.json
(5,24) (5,34)
Now, this works for the sample query, but how about the general case? Your example queried a number (1.43123123), which is an easy target because it has the same textual representation when encoded as JSON. Therefore, a simple string search and length count did a fairly good job (not a perfect one, because it would still find any occurrence of that character sequence, not just "values"). For more precision, and especially when more complex JSON datatypes are queried, you would need to develop a more sophisticated search, probably involving more JSON conversions, whitespace stripping and other normalization. So, unless your goal is to rebuild a full JSON parser within another one, you should narrow it down to the kind of queries you expect, and compose an appropriately tailored search. This solution provides you with the concepts to process the input textually and structurally at the same time, together with a simple search and output integration.

Replace value of object property in multiple JSON files

I'm working with multiple JSON files that are located in the same folder.
Files contain objects with the same properties and they are such as:
{
"identifier": "cameraA",
"alias": "a",
"rtsp": "192.168.1.1"
}
I want to replace a property for all the objects in the JSON files at the same time for a certain condition.
For example, let's say that I want to replace all the rtsp values of the objects with identifier equal to "cameraA".
I've been trying with something like:
jq 'if .identifier == \"cameraA" then .rtsp=\"cameraX" else . end' -c *.json
But it isn't working.
Is there a simple way to replace the property of an object among multiple JSON files?
jq can only write to STDOUT and STDERR, so the simplest approach would be to process one file at a time, e.g. by putting your jq program inside a shell loop. sponge is often used when employing this approach.
However, there is an alternative that has the advantage of efficiency. It requires only one invocation of jq, the output of which would include the filename information (obtained from input_filename). This output would then be the input of an auxiliary process, e.g. awk.
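Both approaches might look roughly like this. It is a sketch under several assumptions: a scratch directory with two made-up camera files, exactly one JSON object per file, file names without newlines, and jq 1.5 or later for input_filename. The .tmp and .new suffixes are arbitrary choices, and a temporary file plus mv stands in for sponge.

```shell
# Work in a scratch directory with two sample files (hypothetical data).
dir=$(mktemp -d)
printf '%s\n' '{"identifier": "cameraA", "alias": "a", "rtsp": "192.168.1.1"}' > "$dir/a.json"
printf '%s\n' '{"identifier": "cameraB", "alias": "b", "rtsp": "192.168.1.2"}' > "$dir/b.json"

# The condition from the question, without the stray backslashes.
filter='if .identifier == "cameraA" then .rtsp = "cameraX" else . end'

# Approach 1: one jq call per file, written to a temporary file first so the
# original is never truncated while jq is still reading it.
for f in "$dir"/*.json; do
  jq "$filter" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

rtsp_a=$(jq -r '.rtsp' "$dir/a.json")
rtsp_b=$(jq -r '.rtsp' "$dir/b.json")
echo "$rtsp_a $rtsp_b"   # cameraX 192.168.1.2

# Approach 2: a single jq call that prints, for each file, its name on one
# line and the updated object on the next (-c keeps the object on one line).
# awk then writes every object to "<name>.new".
jq -rc "input_filename, ($filter)" "$dir"/*.json |
  awk 'NR % 2 == 1 { out = $0 ".new"; next } { print > out; close(out) }'
```

The second form leaves the originals untouched, so the .new files still have to be moved into place afterwards.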

Using jq to concatenate directory of JSON files

I have a directory of about 100 JSON files, each an array of 100 simple records, that I want to concatenate into one file for inclusion as static data in an app, so I don't have to make repeated API calls to retrieve small pieces. (I'm limited to downloading only 100 records at a time; that's why I have 100 short files.)
Here's a sample file, shortened to two records for display here:
[
{
"id": 11531,
"title": "category 1",
"count": 5
},
{
"id": 11532,
"title": "category 2",
"count": 5
}
]
My research led to a solution that works but only for two files with two records each:
jq -s '.[0] + .[1]' file1.json file2.json > output.json
This source also suggested this line would work to handle a directory (right now only two files in it):
jq -s 'reduce .[] as $item ({}; . * $item)' json_files/* > output.json
but I get an error:
jq: error (at json_files/categories-11-20.json:0): object ({}) and array ([{"id":1153...) cannot be multiplied
I thought maybe the problem was the * trying to multiply, so I tried + in that place, but I get a ... cannot be added. message.
Is there a way to do this through jq or is there a better tool?
The simplest and perfectly reasonable approach would be to use the -s command-line option and add along the following lines:
jq -s add json_files/*
Of course you may wish to specify the list of files differently. The order in which they are specified is also significant.
Notes:
- This question is really just a variant of "Use jq to concatenate JSON arrays in multiple files".
- reduce can also be used, but you would need to start with null or [] rather than {}.
- The operator '*' is (not surprisingly) quite different from '+'!
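As a quick check of both forms, here is a sketch using two small made-up files in a scratch directory (the file names are hypothetical), assuming jq 1.5 or later.

```shell
dir=$(mktemp -d)
printf '%s\n' '[{"id": 11531}, {"id": 11532}]' > "$dir/categories-1.json"
printf '%s\n' '[{"id": 11533}]' > "$dir/categories-2.json"

# -s wraps the two arrays in an outer array; add then joins them with +.
with_add=$(jq -sc 'add' "$dir"/*.json)

# The reduce form from the question, seeded with [] and using + instead of *.
with_reduce=$(jq -sc 'reduce .[] as $item ([]; . + $item)' "$dir"/*.json)

echo "$with_add"      # [{"id":11531},{"id":11532},{"id":11533}]
echo "$with_reduce"   # [{"id":11531},{"id":11532},{"id":11533}]
```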

Fix JSON Formatting with jq

Given an invalid JSON string such as: { foo: bar } is it possible to get jq to process and format correctly as { "foo": "bar" }
No, or at least not without complex programming, though jq can handle objects with unquoted key names, e.g. {foo: "bar"}. (Hint: read the quasi-JSON as a jq program.)
The jq FAQ, however, does have a section giving details about a number of command-line tools that can be recommended for this kind of task, e.g. any-json and hjson. That page provides links as well.
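The hint about reading quasi-JSON as a jq program can be tried like this. It is a minimal sketch that only covers the unquoted-key case; { foo: bar } itself would not compile, because jq would treat bar as an undefined function.

```shell
# Quasi-JSON with an unquoted key but a properly quoted string value.
quasi='{foo: "bar"}'

# With -n there is no input; the text itself is the jq program,
# and jq prints the object it constructs as valid JSON.
fixed=$(jq -nc "$quasi")

echo "$fixed"   # {"foo":"bar"}
```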

Pulling data out of JSON Object via jq to CSV in Bash

I'm working on a bash script (running via gitBash on Windows technically but I don't think that matters) that will convert some JSON API data into CSV files. Most of it has gone fairly well, especially since I'm not particularly familiar with JQ as this is my first time using it.
I've got some JSON data that looks like the array below. What I'm trying to do is select the cardType,MaskedPan,amount and datetime out of the data.
This is probably the first time in my life that my Google searching has failed me. I know (or should I say think) that this is actually an object and not just a simple array.
I've not really found anything that helps me know how to grab that data I need and export it into a CSV file. I've had no issue grabbing the other data that I need but these few pieces are proving to be a big problem for me.
The script I'm trying basically can be boiled down to this:
jq='/c/jq-win64.exe -r';
header='("cardType")';
fields='[.TransactionDetails[0].Value[0].cardType]';
$jq ''$header',(.[] | '$fields' | @csv)' < /t/API_Data/JSON/GetByDate-082719.json > /t/API_Data/CSV/test.csv;
If I do .TransactionDetails[0].Value I can get that whole chunk of data. But that is problematic in a CSV as it contains commas.
I suppose I could make this a TSV and import it into the database as one big string and sub string it out. But that isn't the "right" solution. I'm sure there is a way JQ can give me what I need.
"TransactionDetails": [
{
"TransactionId": 123456789,
"Name": "BlacklinePaymentDetail",
"Value": "{\"cardType\":\"Visa\",\"maskedPan\":\"1234\",\"paymentDetails\":{\"reference\":\"123456789012\",\"amount\":99.99,\"dateTime\":\"2019/08/27 08:41:09\"}}",
"ShowOnTill": false,
"PrintOnOrder": false,
"PrintOnReceipt": false
}
]
Ideally I'd be able to just have a field in the CSV for cardType,MaskedPan,amount and datetime instead of pulling the "Value" that contains all of it.
Any advice would be appreciated.
The ingredient you're missing is fromjson, which converts stringified JSON into JSON. Adding enclosing braces around your sample input, the invocation:
jq -r -f program.jq input.json
produces:
"Visa","1234",99.99,"2019/08/27 08:41:09"
where program.jq is:
.TransactionDetails[0].Value
| fromjson
| [.cardType, .maskedPan] + (.paymentDetails | [.amount, .dateTime])
| @csv
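If a header row and several entries are wanted, the same idea might be extended along these lines. This is a sketch, assuming jq 1.5 or later; the input is a shortened version of the sample wrapped in braces, and the header names are simply the field names.

```shell
# A shortened version of the sample, wrapped in braces to make it valid JSON.
input='{"TransactionDetails": [{"TransactionId": 123456789, "Value": "{\"cardType\":\"Visa\",\"maskedPan\":\"1234\",\"paymentDetails\":{\"amount\":99.99,\"dateTime\":\"2019/08/27 08:41:09\"}}"}]}'

csv=$(printf '%s\n' "$input" | jq -r '
  ["cardType", "maskedPan", "amount", "dateTime"],   # header row
  ( .TransactionDetails[]                            # every detail entry
    | .Value | fromjson                              # decode the embedded JSON
    | [.cardType, .maskedPan] + (.paymentDetails | [.amount, .dateTime]) )
  | @csv                                             # applies to header and rows
')

echo "$csv"
```

Because the comma binds more tightly than the pipe, @csv formats the header array and every data row alike.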