Fix JSON Formatting with jq

Given an invalid JSON string such as { foo: bar }, is it possible to get jq to process it and format it correctly as { "foo": "bar" }?

No, or at least not without complex programming, though jq can handle objects with unquoted key names, e.g. {foo: "bar"}. (Hint: read the quasi-JSON as a jq program.)
The jq FAQ, however, does have a section giving details about a number of command-line tools that can be recommended for this kind of task, e.g. any-json and hjson. That page provides links as well.
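To illustrate the hint about reading the quasi-JSON as a jq program: when only the key names are unquoted and the values are already valid JSON, jq itself can re-emit proper JSON, e.g.:

$ jq -n '{foo: "bar"}'
{
  "foo": "bar"
}

This does not rescue the original { foo: bar }, though, because the bare value bar is not a defined jq filter.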

Related

How to make string replacement in a JSON template?

If I have a JSON template in which some variables should be replaced with their actual values, is there a good way to handle the proper escaping?
For example, $value may be replaced with a string that contains characters like " that should be treated specially in JSON.
{ "x": $value }
The template could be arbitrarily complex, so it is not a good solution to code the template in some programming language like Python, perform the replacement in that language, and then dump the JSON output.
Could anybody show me a generic but succinct way to perform the replacement?
Note that I tagged this question with jq. I am not sure it is strictly relevant to this question. If not, please remove the tag. I tagged jq because people who know jq may also know solutions to my question, although jq is just for transforming a JSON file. An elegant solution may be similar to jq in the sense that a domain-specific language is defined.
jq works quite nicely as a template engine, but there are choices to be made, e.g. depending on whether the "template" itself is valid JSON.
In the example you gave, the template is not valid JSON, but it is potentially valid jq, so the strategy using jq "$-variables" would make sense, e.g. along the lines of:
jq -n --arg value "someValue" -f template.jq
where template.jq is your template.
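For instance, a minimal sketch (the sample value below is made up purely for illustration):

# template.jq
{ "x": $value }

$ jq -n --arg value 'a "tricky" value' -f template.jq
{
  "x": "a \"tricky\" value"
}

Because $value is bound with --arg, jq takes care of all the JSON escaping.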
Three different strategies for using jq as a template engine are explained at some length in the jq Cookbook:
https://github.com/stedolan/jq/wiki/Cookbook#using-jq-as-a-template-engine

Using jq to edit key:value in a .conf file from shell

I'm trying to write a shell script that passes an env variable into a .conf file so that I can manipulate the log_file and log_level keys programmatically.
Actual file, station.conf:
{
  "SX1301_conf": {
    "lorawan_public": true,
    "clksrc": 1,
    "radio_0": {
      "type": "SX1257",
      "rssi_offset": -166.0,
      "tx_enable": true,
      "antenna_gain": 0
    },
    "radio_1": {
      "type": "SX1257",
      "rssi_offset": -166.0,
      "tx_enable": false
    }
  },
  "station_conf": {
    "log_file": "stderr",
    "log_level": "DEBUG",
    /* XDEBUG,DEBUG,VERBOSE,INFO,NOTICE,WARNING,ERROR,CRITICAL */
    "log_size": 10000000,
    "log_rotate": 3,
    "CUPS_RESYNC_INTV": "1s"
  }
}
I wanted to test manually before passing shell variables, so I tried jq '".station_conf.log_level="ERROR"' station.conf, but I keep getting errors, including shell quoting errors and invalid numeric literal errors (which, by the way, seems to be an open bug: https://github.com/stedolan/jq/issues/501).
Any tips on how to do this? Ideally I'd be able to replace log_level value with a $LOG_LEVEL from my env. Thanks!
Assuming the input is valid JSON, for robustness, you could start with:
jq '.station_conf.log_level="ERROR"' station.conf
To pass in a shell variable, consider:
jq --arg v "$LOG_LEVEL" '.station_conf.log_level = $v' station.conf
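jq cannot modify a file in place, so a common pattern is to write to a temporary file and move it back. A minimal sketch, assuming station.conf has been made valid JSON (i.e. the comment removed) and LOG_LEVEL is set in the environment:

jq --arg v "$LOG_LEVEL" '.station_conf.log_level = $v' station.conf > station.conf.tmp &&
  mv station.conf.tmp station.conf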
You are getting the invalid numeric literal error because your example input is not valid JSON: it contains a /* comment */, which jq does not support. You have several options here:
- keep using jq and make your input files valid JSON;
- use another tool instead of jq, one that supports comments and/or other non-standard features.
If you choose the second route, i.e. a different tool, you can find some alternatives on the jq FAQ page (https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json), or there is also scout (https://github.com/ABridoux/scout).
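If you would rather keep jq and only need to cope with comments like the one in the sample (each starting and ending on the same line), one rough sketch is to strip them before piping to jq; the naive sed below is an assumption that happens to work for this example, not a general JSON-with-comments parser:

sed 's|/\*.*\*/||' station.conf | jq --arg v "$LOG_LEVEL" '.station_conf.log_level = $v'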

Retrieving the first entity out of several ones

I am a rank beginner with jq, and I've been going through the tutorial, but I think there is a conceptual difference I don't understand. A common problem I encounter is that a large JSON file will contain many objects, each of which is quite big, and I'd like to view the first complete object, to see which fields exist, what types, how much nesting, etc.
In the tutorial, they do this:
# We can use jq to extract just the first commit.
$ curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq '.[0]'
Here is an example with one object; here, I'd like to get that whole object back (just as my_array=['foo']; my_array[0] would return 'foo' in Python).
wget https://hacker-news.firebaseio.com/v0/item/8863.json
I can access and pretty-print the whole thing with .
$ cat 8863.json | jq '.'
{
  "by": "dhouston",
  "descendants": 71,
  "id": 8863,
  "kids": [
    9224,
    ...
    8876
  ],
  "score": 104,
  "time": 1175714200,
  "title": "My YC app: Dropbox - Throw away your USB drive",
  "type": "story",
  "url": "http://www.getdropbox.com/u/2/screencast.html"
}
But trying to get the first element fails:
$ cat 8863.json | jq '.[0]'
jq: error (at <stdin>:0): Cannot index object with number
I get the same error with jq '.[0]' 8863.json, but strangely echo 8863.json | jq '.[0]' gives me parse error: Invalid numeric literal at line 2, column 0. What is the difference? Also, is this not the correct way to get the zeroth member of the JSON?
I've looked at other SO posts with this error message and at the manual, but I'm still confused. I think of the file as an array of JSON objects, and I'd like to get the first. But it looks like jq works with something called a "stream", and does operations on all of it (say, return one given field from every object).
Clarification:
Let's say I have 2 objects in my JSON:
{
  "by": "pg",
  "id": 160705,
  "poll": 160704,
  "score": 335,
  "text": "Yes, ban them; I'm tired of seeing Valleywag stories on News.YC.",
  "time": 1207886576,
  "type": "pollopt"
}
{
  "by": "dpapathanasiou",
  "id": 16070,
  "kids": [
    16078
  ],
  "parent": 16069,
  "text": "Dividends don't mean that much: Microsoft in its dominant years (when they had 40%-plus margins and were raking in the cash) never paid a dividend (they did so only recently).",
  "time": 1177355133,
  "type": "comment"
}
How would I get the entire first object (lines 1-9) with jq?
Cannot index object with number
This error message says it all: you can't index objects with numbers. If you want to get the value of the by field, you need to do
jq '.by' file
Wrt
echo 8863.json | jq '.[0]' gives me parse error: Invalid numeric literal at line 2, column 0.
That error is expected: echo sends the literal text 8863.json to jq, which tries to parse it as JSON and fails (8863 starts a number, so the trailing .json makes it an invalid numeric literal). If you specify the -R/--raw-input flag, jq instead reads the line as the JSON string "8863.json", and one cannot apply array indexing to JSON strings anyway. (To get the first character as a string, you'd write .[0:1].)
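For example, with -R the line becomes the JSON string "8863.json", so slicing works where numeric indexing does not:

$ echo 8863.json | jq -R '.[0:1]'
"8"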
If your input file consists of several separate entities, to get the first one:
jq -n 'input' file
or,
jq -n 'first(inputs)' file
To get the nth entity (counting from 0), e.g. the entity at index 5:
jq -n 'nth(5; inputs)' file
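For example, if the two objects shown in the clarification above are saved as a stream in a file (call it two.json, a name assumed here for illustration):

jq -n 'input' two.json           # prints the entire first object (the "pollopt" one)
jq -n 'nth(1; inputs)' two.json  # prints the second object (the "comment" one)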
a large JSON file will contain many objects, each of which is quite big, and I'd like to view the first complete object, to see which fields exist, what types, how much nesting, etc.
As implied in @OguzIsmail's response, there are important differences between:
- a JSON file (i.e., a file containing exactly one JSON entity);
- a file containing a sequence (i.e., stream) of JSON entities;
- a file containing an array of JSON entities.
In the first two cases, you can write jq -n 'input' to select the first entity, and in the case of an array of entities, jq '.[0]' will suffice.
(In JSON-speak, a "JSON object" is a kind of dictionary, and is not to be confused with JSON entities in general.)
If you have a bunch of JSON objects (whether as a stream or array or whatever), just looking at the first often doesn't really give an accurate picture of all of them. For getting a bird's-eye view of a bunch of objects, using a "schema inference engine" is often the way to go. For this purpose, you might like to consider my schema.jq schema inference engine. It's usually very simple to use, but of course how you use it will depend on whether you have a stream or array of JSON entities. For basic details, see https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed; for related topics (e.g. verification), see the entry for JESS at https://github.com/stedolan/jq/wiki/Modules
Please note that schema.jq infers a structural schema that mirrors the entities under consideration. Such structural schemas have little in common with JSON Schema schemas, which you might also like to consider.

Convert CSV to Grouped JSON

I have several large CSVs that I would like to export to a particular JSON format, but I'm not really sure how to convert them over. It's a list of usernames and urls.
b00nw33,harrypotter788.flv
b00nw33,harrypotter788.mov
b00nw33,levitation271.avi
b01spider,schimbvalutar109.avi
...
I want to export them to JSON grouped by the username like the following
{
  "b00nw33": [
    "harrypotter788.flv",
    "harrypotter788.mov",
    "levitation271.avi"
  ],
  "b01spider": [
    "schimbvalutar109.avi"
  ]
}
What is the JQ to do this? Thank you!
The key to a simple solution is the generic function aggregate_by:
# In this formulation, f must either always evaluate to a string or
# always to an integer, it being understood that negative integers
# might be problematic
def aggregate_by(s; f; g):
  reduce s as $x (null; .[$x|f] += [$x|g]);
If the CSV can be accurately parsed by simply splitting on commas, then the desired transformation can be accomplished using the following jq filter:
aggregate_by(inputs | split(","); .[0]; .[1])
This assumes jq is invoked with the -R (raw) and -n options.
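Putting the pieces together, a complete invocation might look like this (input.csv is an assumed filename):

jq -R -n '
  def aggregate_by(s; f; g):
    reduce s as $x (null; .[$x|f] += [$x|g]);
  aggregate_by(inputs | split(","); .[0]; .[1])
' input.csv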
Output
With the given CSV input, the output would be:
{
  "b00nw33": [
    "harrypotter788.flv",
    "harrypotter788.mov",
    "levitation271.avi"
  ],
  "b01spider": [
    "schimbvalutar109.avi"
  ]
}
Handling non-trivial CSV
The above solution assumes that the CSV is as uncomplicated as the sample. If, on the contrary, the CSV cannot be accurately parsed by simply splitting at commas, a more general parser will be needed.
One approach would be to use the very robust and fast csv2json parser at https://github.com/fadado/CSV
Alternatively, you could use one of the many available "csv2tsv" parsers to generate TSV, which jq can handle directly (by splitting on tabs, i.e. split("\t") rather than split(",")).
In any case, once the CSV has been converted to JSON, the filter aggregate_by defined above can be used.
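For example, if the converter emits a JSON array of [username, filename] rows (an assumption about its output format), the call would simply become the following, with jq now invoked without -R and -n since the input is already JSON:

aggregate_by(.[]; .[0]; .[1])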
If you are interested in a jq parser for CSV, you might want to look at fromcsvfile (https://gist.github.com/pkoppstein/bbbbdf7489c8c515680beb1c75fa59f2); see also the definitions for fromcsv being proposed at https://github.com/stedolan/jq/issues/1650#issuecomment-448050902

Basic jq usage. How to get nested value

This must be incredibly simple but the man page makes no sense to me.
curl example.com/json gives me
{
  "stats": {
    "storage_server.disk_total": XXXXXXXXXX
  },
  "counters": {}
}
and I want to extract the value XXXXXXXXXX of the disk_total. What is the syntax to do this?
To get deeply nested values by their key:
$ jq '.. |."storage_server.disk_total"? | select(. != null)'
.. is a shortcut for the zero-argument recurse -- an analog of the XPath // operator.
For learning how to construct jq queries, it is more useful to look at the tutorial and manual than the "man" page. There's also a FAQ.
The inner key name has a period in it, and therefore the .keyname shorthand cannot be used for it. So you could write:
.stats["storage_server.disk_total"]
or if your jq allows it:
.stats."storage_server.disk_total"
These are both abbreviations for:
.stats | .["storage_server.disk_total"]
The dot in "storage_server.disk_total" needs to be quoted to prevent it from being interpreted as an object key separator, so you can use:
jq '.stats."storage_server.disk_total"'
assuming that XXXXXXXXXX is a valid JSON number in your real JSON.