Let's say I have the following:
jq 'map(select(. >= 5))'
given [1,2,3,4,5,6,7] it returns:
[5,6,7]
I also have
jq 'map(select(. < 5))'
which given the same data, returns [1,2,3,4]. How can I do these complementary queries at the same time, so that I receive for example:
[1,2,3,4], [5,6,7]
jq has a built-in filter for grouping by some (possibly multi-valued) criterion:
jq -nc '[1,2,3,4,5,6,7] | group_by(. < 5)'
produces:
[[5,6,7],[1,2,3,4]]
One option is to use reduce:
reduce .[] as $x
([]; if $x < 5 then .[0] += [$x] else .[1] += [$x] end)
This will produce:
[[1,2,3,4],[5,6,7]]
Related
To be more clear, look at the below text file.
https://brianbrandt.dk/web/var/www/public_html/.htpasswd
https://brianbrandt.dk/web/var/www/public_html/wp-config.php
https://briannajackson1.wordpress.org/high-entropy-misc.txt
https://briannajackson1.wordpress.org/Homestead.yaml
https://brickellmiami.centric.hyatt.com/dev
https://brickellmiami.centric.hyatt.com/django.log
https://brickellmiami.centric.hyatt.com/.dockercfg
https://brickellmiami.centric.hyatt.com/docker-compose.yml
https://brickellmiami.centric.hyatt.com/.docker/config.json
https://brickellmiami.centric.hyatt.com/Dockerfile
https://brideonashoestring.wordpress.org/web/var/www/public_html/config.php
https://brideonashoestring.wordpress.org/web/var/www/public_html/wp-config.php
https://brideonashoestring.wordpress.org/wp-config.php
https://brideonashoestring.wordpress.org/.wp-config.php.swp
https://brideonashoestring.wordpress.org/_wpeprivate/config.json
https://brideonashoestring.wordpress.org/yarn-debug.log
https://brideonashoestring.wordpress.org/yarn-error.log
https://brideonashoestring.wordpress.org/yarn.lock
https://brideonashoestring.wordpress.org/.yarnrc
https://bridgehome.adobe.com/etc/shadow
https://bridgehome.adobe.com/phpinfo.php
https://bridgetonema.wordpress.org/manifest.json
https://bridgetonema.wordpress.org/manifest.yml
https://bridge.twilio.com/.wp-config.php.swp
https://bridge.twilio.com/wp-content/themes/.git/config
https://bridge.twilio.com/_wpeprivate/config.json
https://bridge.twilio.com/yarn-debug.log
https://bridge.twilio.com/yarn-error.log
https://bridge.twilio.com/yarn.lock
https://bridge.twilio.com/.yarnrc
https://brightside.mtn.co.za/config.lua
https://brightside.mtn.co.za/config.php
https://brightside.mtn.co.za/config.php.txt
https://brightside.mtn.co.za/config.rb
https://brightside.mtn.co.za/config.ru
https://brightside.mtn.co.za/_config.yml
https://brightside.mtn.co.za/console
https://brightside.mtn.co.za/.credentials
https://brightside.mtn.co.za/CVS/Entries
https://brightside.mtn.co.za/CVS/Root
https://brightside.mtn.co.za/dasbhoard/
https://brightside.mtn.co.za/data
https://brightside.mtn.co.za/data.txt
https://brightside.mtn.co.za/db/dbeaver-data-sources.xml
https://brightside.mtn.co.za/db/dump.sql
https://brightside.mtn.co.za/db/.pgpass
https://brightside.mtn.co.za/db/robomongo.json
https://brightside.mtn.co.za/README.txt
https://brightside.mtn.co.za/RELEASE_NOTES.txt
https://brightside.mtn.co.za/.remote-sync.json
https://brightside.mtn.co.za/Resources.zip.manifest
https://brightside.mtn.co.za/.rspec
https://br.infinite.sx/db/dump.sql
https://br.infinite.sx/graphiql
The domain name brightside.mtn.co.za and other domains repeated more than 10 times now i want to drop brightside.mtn.co.za and other domains that are repeated more than 10 times and then the output the results the output should look like.
https://br.infinite.sx/db/dump.sql
https://br.infinite.sx/graphiql
https://bridgetonema.wordpress.org/manifest.json
https://bridgetonema.wordpress.org/manifest.yml
[The following is a response to the original question, which was premised on JSON input.]
Since you need to count the items in a group, it would appear that you will find group_by( sub("/[^/]*$";"") ) useful.
For example, if you wanted to omit large groups entirely, as one interpretation of the stated requirements would seem to imply, you could use the following filter:
[.results[] | select(.status==301) | .url]
| group_by( sub("/[^/]*$";"") )
| map(select(length < 10) )
| .[][]
If the text input is in input.txt, then one solution using jq at the bash command line would be:
< input.txt jq -Rr '[inputs]
| group_by( sub("/[^/]*$";"") )
| map(select(length < 10) )
| .[][]'
(If you want the output as JSON strings, omit the -r option.)
A more efficient solution
The above solution uses the built-in filter group_by/1 and is thus somewhat inefficient. For a very large number of input lines, a more efficient solution would be:
< input.txt jq -Rr '
def GROUPS_BY(stream; f):
reduce stream as $x ({}; .[$x|f] += [$x] ) | .[] ;
GROUPS_BY(inputs; sub("/[^/]*$";""))
| select(length < 10)
| .[]'
I'm not understanding how to_entries works in jq.
I have the following json payload in payload.json
{"REGION":"us-east-1","EMAIL":"contain","UPDATE":1}
which I want to convert into = delimited keypairs, like so;
REGION=us-east-1
EMAIL=contain
UPDATE=1
I was using
jq -r 'to_entries | .[] | .key + "=" + .value' < payload.json
But I get an error
jq: error (at <stdin>:0): string ("UPDATE=") and number (1) cannot be added
If I understand correctly, the issue is that the update value is a number, not a string (ie, having them not match types is an issue) so I tried the following, both with the same error;
string interpolation:
jq -r 'to_entries | .[] | (.key) + "=" + (.value)' < payload.json
tostring:
jq -r 'to_entries | .[] | .key + "=" + .value|tostring' < payload.json
What am I missing?
what am I missing?
A pair of parentheses:
.key + "=" + ( .value|tostring )
Alternatively, you could use string interpolation, e.g.
"\(.key)=\(.value)"
I have the following JSON records stored in a container
{"memberId":"123","city":"New York"}
{"memberId":"234","city":"Chicago"}
{"memberId":"345","city":"San Francisco"}
{"memberId":"123","city":"New York"}
{"memberId":"345","city":"San Francisco"}
I am looking to check if there is any duplication of the memberId - ideally return a true/false and then also return the duplicated values.
Desired Output:
true
123
345
Here's an efficient approach using inputs. It requires invoking jq with the -n command-line option. The idea is to create a dictionary that keeps count of each memberId string value.
The dictionary can be created as follows:
reduce (inputs|.memberId|tostring) as $id ({}; .[$id] += 1)
Thus, to produce a true/false indicator, followed by the duplicates if any, you could write:
reduce (inputs|.memberId|tostring) as $id ({}; .[$id] += 1)
| to_entries
| map(select(.value > 1))
| (length > 0), .[].key
(If all the .memberId values are known to be strings, then of course the call to tostring can be dropped. Conversely, if .memberId is both string and integer-valued, then the above program won't differentiate between occurrences of 1 and "1", for example.)
bow
The aforementioned dictionary is sometimes called a "bag of words" (https://en.wikipedia.org/wiki/Bag-of-words_model). This leads to the generic function:
def bow(stream):
reduce stream as $word ({}; .[($word|tostring)] += 1);
The solution can now be written more concisely:
bow(inputs.memberId)
| to_entries
| map(select(.value > 1))
| (length > 0), .[].key
For just the values which have duplicates, one could write the more efficient query:
bow(inputs.memberId)
| keys_unsorted[] as $k
| select(.[$k] > 1)
| $k
I have data in this format in a file:
{"field1":249449,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249448,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249447,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249443,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249449,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
Here, each entry represents a row. I want to have a count of the rows with respect to the value in field one, like:
249449 : 2
249448 : 1
249447 : 1
249443 : 1
How can I get that?
with awk
$ awk -F'[,:]' -v OFS=' : ' '{a[$2]++} END{for(k in a) print k, a[k]}' file
You can use the jq command line tool to interpret JSON data. uniq -c counts the number of occurences.
% jq .field1 < $INPUTFILE | sort | uniq -c
1 249443
1 249447
1 249448
2 249449
(tested with jq 1.5-1-a5b5cbe on linux xubuntu 18.04 with zsh)
Here's an efficient jq-only solution:
reduce inputs.field1 as $x ({}; .[$x|tostring] += 1)
| to_entries[]
| "\(.key) : \(.value)"
Invocation: jq -nrf program.jq input.json
(Note in particular the -n option.)
Of course if an object-representation of the counts is satisfactory, then
one could simply write:
jq -n 'reduce inputs.field1 as $x ({}; .[$x|tostring] += 1)' input.json
Using datamash and some shell utils, change the non-data delimiters to squeezed tabs, count field 3, (it'd be field 2, but there's a leading tab), reverse, then pretty print as per OP spec:
tr -s '{":,}' '\t' < file | datamash -sg 3 count 3 | tac | xargs printf '%s : %s\n'
Output:
249449 : 2
249448 : 1
249447 : 1
249443 : 1
Say we have the following JSON:
[
{
"dir-1": [
"file-1.1",
"file-1.2"
]
},
"dir-1",
{
"dir-2": [
"file-2.1"
]
}
]
And we want to get the next output:
"dir-1/file-1.1"
"dir-1/file-1.2"
"dir-1"
"dir-2/file-2.1"
i.e. to get the paths to all leafs, joining items with /. Is there a way to do that on JQ?
I tried something like this:
cat source-file | jq 'path(..) | [ .[] | tostring ] | join("/")'
But it doesn't produce what I need even close.
You could take advantage of how streams work by merging the path with their values. Streams will only emit path, value pairs for leaf values. Just ignore the numbered indices.
$ jq --stream '
select(length == 2) | [(.[0][] | select(strings)), .[1]] | join("/")
' source-file
returns:
"dir-1/file-1.1"
"dir-1/file-1.2"
"dir-1"
"dir-2/file-2.1"
Here is a solution similar to Jeff Mercado's which uses tostream and flatten
tostream | select(length==2) | .[0] |= map(strings) | flatten | join("/")
Try it online at jqplay.org
Another way is to use a recursive function to walk the input such as
def slashpaths($p):
def concat($p;$k): if $p=="" then $k else "\($p)/\($k)" end;
if type=="array" then .[] | slashpaths($p)
elif type=="object" then
keys_unsorted[] as $k
| .[$k] | slashpaths(concat($p;$k))
else concat($p;.) end;
slashpaths("")
Try it online at tio.run!
Using --stream is good but the following is perhaps less esoteric:
paths(scalars) as $p
| getpath($p) as $v
| ($p | map(strings) + [$v])
| join("/")
(If using jq 1.4 or earlier, and if any of the leaves might be numeric or boolean or null, then [$v] above should be replaced by [$v|tostring].)
Whether the result should be regarded as "paths to leaves" is another matter...