I can't seem to use JSON::XS's OO interface properly. The following croaks with an error I can't track down:
use JSON::XS;
my $array = ['foo', 'bar'];
my $coder = JSON::XS->new->utf8->pretty;
print $coder->encode_json($array);
This croaks with the following: Usage: JSON::XS::encode_json(scalar) at test.pl line 5. I have been combing through the code for JSON::XS and I can't find a "Usage:" warning anywhere. My usage seems to be pretty well matched with the examples in the documentation. Can anyone tell me where I have gone wrong?
JSON::XS has two interfaces: functional and OO.
In the functional interface, the function name is encode_json.
In the OO interface, the method is simply encode, not encode_json. (Incidentally, the "Usage:" message isn't in the Perl source at all; it is generated by the compiled XS glue. Method-call syntax passes the invocant as an extra first argument, so the one-argument encode_json croaks with its generated usage line.)
Both of the following snippets work:
# Functional | # OO
------------------------------+-----------------------------------------
|
use JSON::XS; | use JSON::XS;
my $array = ['foo', 'bar']; | my $array = [ 'foo', 'bar' ];
|
print encode_json($array); | my $coder = JSON::XS->new->utf8->pretty;
| print $coder->encode($array);
|
# ["foo","bar"] | # [
| # "foo",
| # "bar"
| # ]
I am attempting to parse a JSON structure to extract a dependency path, for use in an automation script.
The structure of this JSON, once extracted, looks like this:
[
  {
    "Id": "abc",
    "Dependencies": []
  },
  {
    "Id": "def",
    "Dependencies": [
      "abc"
    ]
  },
  {
    "Id": "ghi",
    "Dependencies": [
      "def"
    ]
  }
]
Note: Lots of other irrelevant fields removed.
The plan is to pass the Id of one of these into my jq command and get back a list of its dependencies.
E.g.:
Input: abc
Expected Output: []
Input: def
Expected Output: ["abc"]
Input: ghi
Expected Output: ["abc", "def"]
I currently have a jq script like this (https://jqplay.org/s/NAhuXNYXXO):
jq '. as $original | .[]
  | select(.Id == "INPUTVARIABLE")
  | [.Dependencies[]] as $level1Dep | [$original[] | select([ .Id == $level1Dep[] ] | any)] as $level1Full | $level1Full[]
  | [.Dependencies[]] as $level2Dep | [$original[] | select([ .Id == $level2Dep[] ] | any)] as $level2Full
  | [$level1Dep[], $level2Dep[]]'
Input: abc
Output: empty
Input: def
Output: ["abc"]
Input: ghi
Output: ["def","abc"]
Great! However, as you can see, this is not particularly scalable and will only handle two dependency levels (https://jqplay.org/s/Zs0xIvJ2Zn), and it also falls apart horribly when there are multiple dependencies on an item (https://jqplay.org/s/eB9zHQSH2r).
Is there a way of constructing this within JQ or do I need to move out to a different language?
I know that the data cannot have circular dependencies; it is pulled from a database that enforces this.
It's trivial then. Reduce your input JSON down to an object where each Id and corresponding Dependencies array are paired, and walk through it aggregating dependencies using a recursive function.
def deps($depdb; $id):
  def _deps($id):
    $depdb[$id] // empty
    | . + map(_deps(.)[]);
  _deps($id);

deps(map({(.Id): .Dependencies}) | add; $fid)
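Here map({(.Id): .Dependencies}) | add builds the lookup object that deps receives as $depdb; for the sample input above it is:

{"abc": [], "def": ["abc"], "ghi": ["def"]}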
Invocation:
jq -c --arg fid 'ghi' -f prog.jq file
Online demo - arbitrary dependency levels
Online demo - multiple dependencies per Id
Here's a short program that handles circular dependencies efficiently and illustrates how a subfunction can be defined after the creation of a local variable (here, $next) for efficiency:
def dependents($x):
  (map({(.Id): .Dependencies}) | add) as $next
  # Input: array of dependents computed so far
  # Output: array of all dependents
  | def tc($x):
      ($next[$x] - .) as $new
      | if $new == [] then .
        else (. + $new | unique)
          # avoid calling unique again:
          | . + ([tc($new[])[]] - .)
        end;
  [] | tc($x);
dependents($start)
Usage
With the given input and an invocation such as
jq --arg start START -f program.jq input.json
the output for various values of START is:
START   output
abc     []
def     ["abc"]
ghi     ["def", "abc"]
If the output must be sorted, then simply add a call to sort.
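For example, the final line of the program would then read:

dependents($start) | sort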
I am trying to join() a relatively big array (20k elements) of objects with a character ('\n' in this particular case). I have a few operations upfront which complete in about 8 seconds (acceptable), but when I add '| join("\n")' at the end, the runtime jumps to 3+ minutes.
Is there any reason for join() to be that slow? Is there another way of producing the same output without join()?
I am currently using jq-1.5 (latest stable).
Here is the JQ file
json2csv.jq
def json2csv:
  def tonull: if . == "null" then null else . end;
  (.[0] | keys) as $headers
  | [
      ($headers | join("\t")),
      ([ .[] as $row | [ $headers[] as $h | $row[$h] | tostring | tonull ] | join("\t") ] | join("\n"))
    ]
  | join("\n");

json2csv
Considering:
$ jq 'length' test.json
23717
With the script as I want it (shown above):
$ time jq -rf json2csv.jq test.json > test.csv
real 3m46.721s
user 1m48.660s
sys 1m57.698s
With the same script, removing the join("\n")
$ time jq -rf json2csv.jq test.json > test.csv
real 0m8.564s
user 0m8.301s
sys 0m0.242s
(Note: I removed the outer join as well, because otherwise jq cannot concatenate an array and a string, which makes sense; but that join operates on an array of only 2 elements anyway, so it isn't the problem.)
You don't need to use join at all. Rather than thinking of converting the whole file to a single string, think of it as converting each row to strings. The way jq outputs streams of results will give you the desired result in the end (assuming you take the raw output).
Try something more like this:
def json2csv:
  def tonull: if . == "null" then null else . end;
  (.[0] | keys) as $headers
  # output headers followed by rows of values as arrays
  | (
      $headers
    ),
    (
      .[] | [ .[$headers[]] | tostring | tonull ]
    )
  # convert the arrays to tab-separated-values strings
  | @tsv;

json2csv
After thinking about it, I remembered that jq automatically prints a newline ('\n') after each result when you stream an array (.[]), which means that in this particular case I can just do this:
def json2csv:
  def tonull: if . == "null" then null else . end;
  (.[0] | keys) as $headers
  | [
      ($headers | join("\t")),
      ([ .[] as $row | [ $headers[] as $h | $row[$h] | tostring | tonull ] | join("\t") ] | .[])
    ]
  | .[];

json2csv
And this solved my problem:
$ time jq -rf json2csv.jq test.json > test.csv
real 0m6.725s
user 0m6.454s
sys 0m0.245s
I'm leaving the question up, because if I had wanted to use any character other than '\n', this wouldn't have solved the issue.
When producing output such as CSV or TSV, the idea is to stream the data as much as possible. The last thing you want to do is run join on an array containing all the data. If you did want to use a delimiter other than \n, you'd add it to each item in the stream, and then use the -j command-line option.
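As a minimal sketch (assuming a CRLF delimiter is wanted, and reusing the question's header logic), each TSV line is emitted followed by the delimiter explicitly, and -j keeps jq from adding its own newlines:

jq -j '(.[0] | keys) as $headers
       | ($headers, (.[] | [.[$headers[]] | tostring]))
       | @tsv, "\r\n"' test.json > test.csv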
Also, I think your diagnosis is probably not quite right as joining an array with a large number of small strings is quite fast. Below are timings comparing joining an array with two strings and one with 100,000 strings. In case you're wondering, my machine is rather slow.
$ ./join.sh 2
3
real 0.03
user 0.02
sys 0.00
1896448 maximum resident set size
$ ./join.sh 100000
588889
real 2.20
user 2.05
sys 0.13
21188608 maximum resident set size
$ cat join.sh
#!/bin/bash
/usr/bin/time -lp jq -n --argjson n "$1" '[range(0;$n)|tostring]|join(".")|length'
The above runs used jq 1.6, but using jq 1.5 produces very similar results.
On the other hand, joining a large number (20,000) of very long strings (1K) is noticeably slow, so evidently the current jq implementation is not designed for such operations.
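That case can be reproduced with a variant of the join.sh script above; this sketch uses the same timing harness with 20,000 strings of 1,024 characters each ("x" * 1024 repeats the string in jq):

/usr/bin/time -lp jq -n '[range(0;20000) | "x" * 1024] | join(".") | length'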
I'm trying to find all common keys in a JSON file, given that we don't know the names of the keys in the file.
The JSON file looks like:
{
  "DynamicKey1" : {
    "foo" : 1,
    "bar" : 2
  },
  "DynamicKey2" : {
    "bar" : 3
  },
  "DynamicKey3" : {
    "foo" : 5,
    "zyx" : 5
  }
}
Expected result:
{
  "foo"
}
I was trying to apply reduce/foreach logic here but I am not sure how to write it in jq. I appreciate any help!
jq '[.[]] as $ss | reduce range(1; $ss|length) as $i ([]; . + reduce ($ss[$i] | keys[]) as $key ([]; if $ss[$i - 1] | has($key) then . + [$key] else . end))' file.json
There are some inconsistencies in the Q as posted: there are no keys common to all the objects, and if one looks at the pair-wise intersection of keys, the result would include both "foo" and "bar".
In the following, I'll present solutions for both these problems.
Keys in more than one object
[.[] | keys_unsorted[]] | group_by(.)[] | select(length>1)[0]
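With the sample input, this emits the two keys that occur in more than one object (group_by sorts them):

"bar"
"foo"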
Keys in all the objects
Here's a solution using a similar approach:
length as $length
| [.[] | keys_unsorted[]] | group_by(.)[]
| select(length==$length)
| .[0]
This involves group_by/1, which is implemented using a sort.
Here is an alternative approach that relies on the built-in function keys to do the sorting (the point being that nk ln(nk) - n(k ln(k)) = nk ln(n), i.e. having n small sorts of k items each is better than one large sort of n*k items):
# The intersection of an arbitrary number of sorted arrays
def intersection_of_sorted_arrays:
  # intersecting/2 returns a stream
  def intersecting($A; $B):
    def pop:
      .[0] as $i
      | .[1] as $j
      | if $i == ($A|length) or $j == ($B|length) then empty
        elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
        elif $A[$i] < $B[$j] then [$i+1, $j] | pop
        else [$i, $j+1] | pop
        end;
    [0,0] | pop;
  reduce .[1:][] as $x (.[0]; [intersecting(.; $x)]);
To compute the keys common to all the objects:
[.[] | keys] | intersection_of_sorted_arrays
Here is a sort-free and time-efficient answer that relies on the efficiency of jq's implementation of lookups in a JSON dictionary. Since keys are strings, we can simply use the concept of a "bag of words" (bow):
def bow(stream):
  reduce stream as $word ({}; .[$word|tostring] += 1);
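For the sample input, bow(.[] | keys_unsorted[]) produces the tally:

{"foo": 2, "bar": 2, "zyx": 1}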
We can now solve the "Keys common to all objects" problem as follows:
length as $length
| bow(.[] | keys_unsorted[])
| to_entries[]
| select(.value==$length).key
And similarly for the "Keys in more than one object" problem.
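For instance, a sketch of that variant, changing only the selection condition:

bow(.[] | keys_unsorted[])
| to_entries[]
| select(.value > 1).key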
Of course, to achieve the time-efficiency, there is the usual space-time tradeoff.
Is there a way to refactor jq filters into functions?
Prior to refactor:
jq ' .them ."keyName" ' ./some.json
After refactor:
def getThese(x): .them .$x;
in ~/.jq
and then call it with...
jq ' getThese("keyName") as $i | $i ' ./some.json
The above refactor does not appear to work (is there a way?)
The abbreviation '.x.y' will not work if y is a variable. Use the syntax '.x | .[ y ]' instead.
'E as $i| $i' can be written as 'E' in this case.
Your definition should be either:
def getThese(x): .them | .[x];
or with different semantics (and requiring a sufficiently recent version of jq):
def getThese($x): .them | .[$x];
One alternative would be to define getThem as:
def getThem(f): .them | f;
This would allow you to write: getThem(.keyName) for keys with unexceptional names.
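For example (assuming the definition has been added to ~/.jq as above):

jq 'getThem(.keyName)' ./some.json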
Input:
{"Timestamp":140,
"DateTime":"2014-06-02 14:32:34.440 PDT",
"CustomerId":"01",
"VisitorId":"78"}
Desired output:
Timestamp; DateTime; CustomerId; VisitorId
140; 2014-06-02 14:32:34.440 PDT; 01; 78
I tried the following code:
results.txt
| (map(keys) | add | unique) as $cols
| map(. as $row | $cols | map($row[.])) as $rows
| $cols, $rows[] | @csv
Error:
'add' is not recognized as an internal or external command,
operable program or batch file."
I don't know what is wrong. I am using the Windows platform with Cygwin.
With your input, and the following program in tocsv.jq:
(keys_unsorted | join(",")),
([.[]] | #csv)
the command:
$ jq -r -f tocsv.jq input.json
produces:
Timestamp,DateTime,CustomerId,VisitorId
140,"2014-06-02 14:32:34.440 PDT","01","78"
Eliminating the quotation marks in the second line is left as an exercise for the interested reader :-) [Hint: use join(",") again.]
WARNING: the above program is intended only for jq version 1.5 or later. When using an earlier version of jq, using to_entries or explicitly specifying the key names may be required.
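As for the exercise, one possible solution (a sketch) is to convert each value to a string first, so that @csv's quoting is bypassed:

(keys_unsorted | join(",")),
([.[] | tostring] | join(","))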