Get count based on value (bash, JSON)

I have data in this format in a file:
{"field1":249449,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249448,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249447,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249443,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
{"field1":249449,"field2":116895,"field3":1,"field4":"apple","field5":42,"field6":"2019-07-01T00:00:10","metadata":"","frontend":""}
Here, each line represents a row. I want to count the rows for each value of field1, like this:
249449 : 2
249448 : 1
249447 : 1
249443 : 1
How can I get that?

With awk:
$ awk -F'[,:]' -v OFS=' : ' '{a[$2]++} END{for(k in a) print k, a[k]}' file
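`for (k in a)` visits the keys in an unspecified order, so the awk output may not come out in the order shown in the question. A minimal sketch that wraps the same awk program in a function reading standard input (the name count_field1 is hypothetical) and sorts the result numerically, largest value first. Like the original, it assumes field1 is the first key on every line.

```shell
#!/usr/bin/env bash

# count_field1 (hypothetical helper): reads one JSON object per line on stdin
# and prints "value : count" for field1.
count_field1() {
  # Splitting on ',' and ':' makes $2 the value of the first key (field1).
  # 'for (k in a)' has no defined order, so sort numerically, descending,
  # on the text before the first ':'.
  awk -F'[,:]' -v OFS=' : ' '{a[$2]++} END{for (k in a) print k, a[k]}' |
    sort -t: -k1,1nr
}

# Usage with three rows shaped like the question's data:
printf '%s\n' \
  '{"field1":249449,"field2":116895}' \
  '{"field1":249448,"field2":116895}' \
  '{"field1":249449,"field2":116895}' | count_field1
```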

You can use the jq command-line tool to parse the JSON, then sort | uniq -c to count the occurrences (uniq only merges adjacent duplicate lines, which is why the sort comes first).
% jq .field1 < "$INPUTFILE" | sort | uniq -c
1 249443
1 249447
1 249448
2 249449
(tested with jq 1.5-1-a5b5cbe on linux xubuntu 18.04 with zsh)
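uniq -c puts the count first and pads it with spaces. To get the "value : count" layout asked for in the question, the two columns can be swapped with awk. In this sketch printf stands in for the jq .field1 step so it runs without a JSON file; the function name to_value_count is hypothetical.

```shell
#!/usr/bin/env bash

# to_value_count (hypothetical helper): reads one value per line on stdin
# and prints "value : count", sorted by value.
to_value_count() {
  # uniq -c only merges adjacent duplicates, hence the sort first.
  # awk's default field splitting ignores uniq's leading padding,
  # so $1 is the count and $2 is the value.
  sort | uniq -c | awk '{ print $2 " : " $1 }'
}

# Usage: these values would normally come from: jq .field1 < "$INPUTFILE"
printf '%s\n' 249449 249448 249447 249443 249449 | to_value_count
```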

Here's an efficient jq-only solution:
reduce inputs.field1 as $x ({}; .[$x|tostring] += 1)
| to_entries[]
| "\(.key) : \(.value)"
Invocation: jq -nrf program.jq input.json
(Note in particular the -n option.)
Of course, if an object representation of the counts is satisfactory, one could simply write:
jq -n 'reduce inputs.field1 as $x ({}; .[$x|tostring] += 1)' input.json

Using datamash and some shell utilities: squeeze the non-data delimiters into single tabs, count by field 3 (it would be field 2, but there's a leading tab), reverse the order, then pretty-print as per the OP's spec:
tr -s '{":,}' '\t' < file | datamash -sg 3 count 3 | tac | xargs printf '%s : %s\n'
Output:
249449 : 2
249448 : 1
249447 : 1
249443 : 1

Related

Bash script with jq won't get date difference from strings, and runs quite slowly on i7 16GB RAM

I need to find the difference between TradeCloseTime and TradeOpenTime in dd:hh:mm format for the Exposure column in the following script.
Also, the script runs super slow (~4 min for 800 rows of JSON on a Core i7 machine with 16 GB of RAM).
#!/bin/bash
echo "TradeNo, TradeOpenType, TradeCloseType, TradeOpenSource, TradeCloseSource, TradeOpenTime, TradeCloseTime, PNL, Exposure" > tradelist.csv
tradecount=$(jq -r '.performance.numberOfTrades|tonumber' D.json)
for ((i=0; i<$tradecount; i++))
do
tradeNo=$(jq -r '.trades['$i']|[.tradeNo][]|tonumber' D.json)
entrySide=$(jq -r '.trades['$i'].orders[0]|[.side][]' D.json)
exitSide=$(jq -r '.trades['$i'].orders[1]|[.side][]' D.json)
entrySource=$(jq -r '.trades['$i'].orders[0]|[.source][]' D.json)
exitSource=$(jq -r '.trades['$i'].orders[1]|[.source][]' D.json)
tradeEntryTime=$(jq -r '.trades['$i'].orders[0]|[.placedTime][]' D.json | tr -d 'Z' | tr -s 'T' ' ')
tradeExitTime=$(jq -r '.trades['$i'].orders[1]|[.placedTime][]' D.json | tr -d 'Z' | tr -s 'T' ' ')
profitPercentage=$(jq -r '(.trades['$i']|[.profitPercentage][0]|tonumber)*(100)' D.json)
echo $tradeNo","$entrySide","$exitSide","$entrySource","$exitSource","$tradeEntryTime","$tradeExitTime","$profitPercentage | tr -d '"' >> tradelist.csv
done
The JSON file looks like this:
{"market":{"exchange":"BINANCE_FUTURES","coinPair":"BTC_USDT"},"strategy":{"name":"","type":"BACKTEST","candleSize":15,"lookbackDays":6,"leverageLong":1.00000000,"leverageShort":1.00000000,"strategyName":"ABC","strategyVersion":35,"runNo":"002","source":"Personal"},"strategyParameters":[{"name":"DurationInput","value":"87.0"}],"openPositionStrategy":{"actionTime":"CANDLE_CLOSE","maxPerSignal":1.00000000},"closePositionStrategy":{"actionTime":"CANDLE_CLOSE","minProfit":"NaN","stopLossValue":0.07000000,"stopLossTrailing":true,"takeProfit":0.01290000,"takeProfitDeviation":"NaN"},"performance":{"startTime":"2019-01-01T00:00:00Z","endTime":"2021-11-24T00:00:00Z","startAllocation":1000.00000000,"endAllocation":3478.58904150,"absoluteProfit":2478.58904150,"profitPerc":2.47858904,"buyHoldRatio":0.62426630,"buyHoldReturn":4.57228387,"numberOfTrades":744,"profitableTrades":0.67833109,"maxDrawdown":-0.20924885,"avgMonthlyProfit":0.05242718,"profitableMonths":0.70370370,"avgWinMonth":0.09889897,"avgLoseMonth":-0.05275563,"startPrice":null,"endPrice":57623.08000000},"trades":[{"tradeNo":0,"profit":-5.48836165,"profitPercentage":-0.00549085,"accumulatedBalance":994.51163835,"compoundProfitPerc":-0.00548836,"orders":[{"side":"Long","placedTime":"2019-09-16T21:15:00Z","placedAmount":0.09700000,"filledTime":"2019-09-16T21:15:00Z","filledAmount":0.09700000,"filledPrice":10300.49000000,"commissionPaid":0.39965901,"source":"SIGNAL"},{"side":"CloseLong","placedTime":"2019-09-17T19:15:00Z","placedAmount":0.09700000,"filledTime":"2019-09-17T19:15:00Z","filledAmount":0.09700000,"filledPrice":10252.13000000,"commissionPaid":0.39778264,"source":"SIGNAL"}]},{"tradeNo":1,"profit":-3.52735800,"profitPercentage":-0.00356403,"accumulatedBalance":990.98428035,"compoundProfitPerc":-0.00901572,"orders":[{"side":"Long","placedTime":"2019-09-19T06:00:00Z","placedAmount":0.10000000,"filledTime":"2019-09-19T06:00:00Z","filledAmount":0.10000000,"filledPrice":9893.16000000,"commissionPaid":0.39572640,"source":"SIGNAL"},{"side":"CloseLong","placedTime":"2019-09-19T06:15:00Z","placedAmount":0.10000000,"filledTime":"2019-09-19T06:15:00Z","filledAmount":0.10000000,"filledPrice":9865.79000000,"commissionPaid":0.39463160,"source":"SIGNAL"}]},{"tradeNo":2,"profit":-5.04965308,"profitPercentage":-0.00511770,"accumulatedBalance":985.93462727,"compoundProfitPerc":-0.01406537,"orders":[{"side":"Long","placedTime":"2019-09-25T10:15:00Z","placedAmount":0.11700000,"filledTime":"2019-09-25T10:15:00Z","filledAmount":0.11700000,"filledPrice":8430.00000000,"commissionPaid":0.39452400,"source":"SIGNAL"},{"side":"CloseLong","placedTime":"2019-09-25T10:30:00Z","placedAmount":0.11700000,"filledTime":"2019-09-25T10:30:00Z","filledAmount":0.11700000,"filledPrice":8393.57000000,"commissionPaid":0.39281908,"source":"SIGNAL"}]}
You can do it all (extracts, conversions and formatting) with one jq call:
#!/bin/sh
# One jq call does the extraction, the date conversions and the CSV formatting.
query='
.trades[]
| [
.tradeNo,
.orders[0].side,
.orders[1].side,
.orders[0].source,
.orders[1].source,
(.orders[0].placedTime | fromdate | strftime("%Y-%m-%d %H:%M:%S")),
(.orders[1].placedTime | fromdate | strftime("%Y-%m-%d %H:%M:%S")),
.profitPercentage * 100,
(
(.orders[1].placedTime | fromdate) - (.orders[0].placedTime | fromdate)
| (. / 86400 | floor | tostring) + (. % 86400 | strftime(":%H:%M"))
)
]
| @csv
'
# Write the header line and the jq output to tradelist.csv together.
{
echo 'TradeNo,TradeOpenType,TradeCloseType,TradeOpenSource,TradeCloseSource,TradeOpenTime,TradeCloseTime,PNL,Exposure'
jq -r "$query" < D.json
} > tradelist.csv
Example JSON (with all irrelevant keys removed):
{
"trades": [
{
"tradeNo": 0,
"profitPercentage": -0.00549085,
"orders": [
{
"side": "Long",
"placedTime": "2018-12-16T21:34:46Z",
"source": "SIGNAL"
},
{
"side": "CloseLong",
"placedTime": "2019-09-17T19:15:00Z",
"source": "SIGNAL"
}
]
}
]
}
Output:
TradeNo,TradeOpenType,TradeCloseType,TradeOpenSource,TradeCloseSource,TradeOpenTime,TradeCloseTime,PNL,Exposure
0,"Long","CloseLong","SIGNAL","SIGNAL","2018-12-16 21:34:46","2019-09-17 20:15:00",-0.549085,"274:22:40"
If you want to get rid of the double quotes that jq adds when generating CSV (they are completely valid, but reading such a file then needs a real CSV parser), you can replace @csv with @tsv and post-process the output with tr '\t' ',', like this:
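query='
...
| @tsv
'
{
echo 'TradeNo,TradeOpenType,TradeCloseType,TradeOpenSource,TradeCloseSource,TradeOpenTime,TradeCloseTime,PNL,Exposure'
jq -r "$query" < D.json | tr '\t' ','
} > tradelist.csv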
and you'll get:
TradeNo,TradeOpenType,TradeCloseType,TradeOpenSource,TradeCloseSource,TradeOpenTime,TradeCloseTime,PNL,Exposure
0,Long,CloseLong,SIGNAL,SIGNAL,2018-12-16 21:34:46,2019-09-17 20:15:00,-0.549085,274:22:40
Note: this way of getting rid of the " characters in the CSV is only accurate when the input data contains no \n, \t, \r, \, comma or " characters.
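The Exposure expression in the query above takes the difference in seconds, uses floor(seconds / 86400) for the whole days and formats the remainder with :%H:%M. To spot-check individual values outside jq, the same arithmetic can be written in plain bash. A minimal sketch; the function name exposure is hypothetical, and like strftime(":%H:%M") it drops any leftover seconds.

```shell
#!/usr/bin/env bash

# exposure (hypothetical helper): formats a non-negative number of seconds
# as days:hh:mm, the same layout as the Exposure column.
exposure() {
  local secs=$1
  local days=$(( secs / 86400 ))          # whole days
  local hours=$(( secs % 86400 / 3600 ))  # hours left over after the days
  local mins=$(( secs % 3600 / 60 ))      # minutes left over (seconds dropped)
  printf '%d:%02d:%02d\n' "$days" "$hours" "$mins"
}

exposure 79200   # tradeNo 0 in the question's data: 21:15 to 19:15 the next day
exposure 900     # tradeNo 1: a 15-minute trade
```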
Regarding the main question (computing time differences), you're in luck, as jq provides the built-in function fromdateiso8601 for converting ISO 8601 times to "the number of seconds since the Unix epoch (1970-01-01T00:00:00Z)".
With your JSON sample,
.trades[]
| [ .orders[1].placedTime, .orders[0].placedTime]
| map(fromdateiso8601)
| .[0] - .[1]
produces the three differences:
79200
900
900
And here's a function for converting seconds to "hh:mm:ss" format:
def hhmmss:
def l: tostring | if length < 2 then "0\(.)" else . end;
(. % 60) as $ss
| ((. / 60) | floor) as $mm
| (($mm / 60) | floor) as $hh
| ($mm % 60) as $mm
| [$hh, $mm, $ss] | map(l) | join(":");
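The arithmetic in hhmmss can be mirrored in bash, which may help when checking what the jq function should return for a given number of seconds. A sketch (the bash function is a hypothetical stand-in named after the jq one); as in the jq version, the hour field is not capped at 24.

```shell
#!/usr/bin/env bash

# hhmmss: bash mirror of the jq function, seconds -> hh:mm:ss,
# each field zero-padded to at least two digits.
hhmmss() {
  local secs=$1
  local ss=$(( secs % 60 ))       # leftover seconds
  local mm=$(( secs / 60 % 60 ))  # leftover minutes
  local hh=$(( secs / 3600 ))     # total hours, may exceed 24
  printf '%02d:%02d:%02d\n' "$hh" "$mm" "$ss"
}

hhmmss 79200   # first difference computed above
hhmmss 900
```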
I prefer building an intermediate object holding the "entry" and "exit" orders, which makes the jq commands easier to debug. The code is formatted for readability rather than performance:
#!/usr/bin/env bash
echo "TradeNo,TradeOpenType,TradeCloseType,TradeOpenSource,TradeCloseSource,TradeOpenTime,TradeCloseTime,PNL,Exposure" > tradelist.csv
jq -r '
.trades[]
|{tradeNo,
profitPercentage,
entry:.orders[0],
exit:.orders[1],
entryTS:.orders[0].placedTime|fromdate,
exitTS:.orders[1].placedTime|fromdate}
|[.tradeNo,
.entry.side,
.exit.side,
.entry.source,
.exit.source,
(.entry.placedTime|strptime("%Y-%m-%dT%H:%M:%SZ")|strftime("%Y-%m-%d %H:%M:%S")),
(.exit.placedTime|strptime("%Y-%m-%dT%H:%M:%SZ")|strftime("%Y-%m-%d %H:%M:%S")),
(.profitPercentage*100),
(.exitTS-.entryTS|todate|strptime("%Y-%m-%dT%H:%M:%SZ")|strftime("%d:%H:%M"))]|@csv
' D.json | tr -d '"' >> tradelist.csv
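WARNING: This formatting assumes Exposure is LESS THAN 1 MONTH. Also, %d is the day of the month, which starts at 01, so the day field is one more than the number of whole days (a 22-hour trade comes out as 01:22:00). Good luck with that!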

Dump JSON response to a bash variable

I have the following output:
[
"notimportant",
[
"val1",
"val2",
...,
"valn"
]
]
I'm trying to store every value in a bash string. Using jq, I tried this:
out=''
req=$(curl -s $url)
len=$(echo $req | jq length )
for (( i = 0; i < $len; i++ )); do
element=$(echo $req | jq '.[1]' | jq --argjson i "$i" '.[$i]')
out=${element}\n${out}
done
This feels clunky and is also slow. I'm trying to dump all the values at once without looping over the elements.
With an array:
mapfile -t arr < <(curl -s "$url" | jq -r '.[1] | .[]')
declare -p arr
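If a single string is still wanted afterwards (which is what the loop in the question was building), it can be produced from the array without calling jq again. A minimal sketch, with a literal array standing in for the mapfile result:

```shell
#!/usr/bin/env bash

# Stand-in for: mapfile -t arr < <(curl -s "$url" | jq -r '.[1] | .[]')
arr=(val1 'val 2' valn)

# Newline-separated: printf repeats its format for every element,
# and $( ) strips the final trailing newline.
out_lines=$(printf '%s\n' "${arr[@]}")

# Tab-separated: "${arr[*]}" joins the elements with the first character of IFS.
# IFS is changed only inside the command substitution's subshell.
out_tabs=$(IFS=$'\t'; printf '%s' "${arr[*]}")

printf '%s\n' "$out_lines"
printf '%s\n' "$out_tabs"
```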
Do you want the values separated by TAB or NEWLINE characters in a single variable? The @tsv filter is useful for controlling the output:
outTABS=$(curl -s "$url" | jq -r '.[1] | @tsv')
outLINE=$(curl -s "$url" | jq -r '.[1] | .[] | [.] | @tsv')
> echo "$outTABS"
val1 val2 valn
> echo "$outLINE"
val1
val2
valn

Filter results using bash

To be clearer, look at the text file below:
https://brianbrandt.dk/web/var/www/public_html/.htpasswd
https://brianbrandt.dk/web/var/www/public_html/wp-config.php
https://briannajackson1.wordpress.org/high-entropy-misc.txt
https://briannajackson1.wordpress.org/Homestead.yaml
https://brickellmiami.centric.hyatt.com/dev
https://brickellmiami.centric.hyatt.com/django.log
https://brickellmiami.centric.hyatt.com/.dockercfg
https://brickellmiami.centric.hyatt.com/docker-compose.yml
https://brickellmiami.centric.hyatt.com/.docker/config.json
https://brickellmiami.centric.hyatt.com/Dockerfile
https://brideonashoestring.wordpress.org/web/var/www/public_html/config.php
https://brideonashoestring.wordpress.org/web/var/www/public_html/wp-config.php
https://brideonashoestring.wordpress.org/wp-config.php
https://brideonashoestring.wordpress.org/.wp-config.php.swp
https://brideonashoestring.wordpress.org/_wpeprivate/config.json
https://brideonashoestring.wordpress.org/yarn-debug.log
https://brideonashoestring.wordpress.org/yarn-error.log
https://brideonashoestring.wordpress.org/yarn.lock
https://brideonashoestring.wordpress.org/.yarnrc
https://bridgehome.adobe.com/etc/shadow
https://bridgehome.adobe.com/phpinfo.php
https://bridgetonema.wordpress.org/manifest.json
https://bridgetonema.wordpress.org/manifest.yml
https://bridge.twilio.com/.wp-config.php.swp
https://bridge.twilio.com/wp-content/themes/.git/config
https://bridge.twilio.com/_wpeprivate/config.json
https://bridge.twilio.com/yarn-debug.log
https://bridge.twilio.com/yarn-error.log
https://bridge.twilio.com/yarn.lock
https://bridge.twilio.com/.yarnrc
https://brightside.mtn.co.za/config.lua
https://brightside.mtn.co.za/config.php
https://brightside.mtn.co.za/config.php.txt
https://brightside.mtn.co.za/config.rb
https://brightside.mtn.co.za/config.ru
https://brightside.mtn.co.za/_config.yml
https://brightside.mtn.co.za/console
https://brightside.mtn.co.za/.credentials
https://brightside.mtn.co.za/CVS/Entries
https://brightside.mtn.co.za/CVS/Root
https://brightside.mtn.co.za/dasbhoard/
https://brightside.mtn.co.za/data
https://brightside.mtn.co.za/data.txt
https://brightside.mtn.co.za/db/dbeaver-data-sources.xml
https://brightside.mtn.co.za/db/dump.sql
https://brightside.mtn.co.za/db/.pgpass
https://brightside.mtn.co.za/db/robomongo.json
https://brightside.mtn.co.za/README.txt
https://brightside.mtn.co.za/RELEASE_NOTES.txt
https://brightside.mtn.co.za/.remote-sync.json
https://brightside.mtn.co.za/Resources.zip.manifest
https://brightside.mtn.co.za/.rspec
https://br.infinite.sx/db/dump.sql
https://br.infinite.sx/graphiql
The domain brightside.mtn.co.za and other domains are repeated more than 10 times. I want to drop brightside.mtn.co.za and every other domain that is repeated more than 10 times, then output the remaining results. The output should look like this:
https://br.infinite.sx/db/dump.sql
https://br.infinite.sx/graphiql
https://bridgetonema.wordpress.org/manifest.json
https://bridgetonema.wordpress.org/manifest.yml
[The following is a response to the original question, which was premised on JSON input.]
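Since you need to count the items in each group, you will probably find group_by( sub("/[^/]*$";"") ) useful. Note that this key is everything before the last slash of each URL, so https://brightside.mtn.co.za/db/dump.sql and https://brightside.mtn.co.za/README.txt fall into different groups; grouping strictly by domain would need a different key.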
For example, if you wanted to omit large groups entirely, as one interpretation of the stated requirements would seem to imply, you could use the following filter:
[.results[] | select(.status==301) | .url]
| group_by( sub("/[^/]*$";"") )
| map(select(length < 10) )
| .[][]
If the text input is in input.txt, then one solution using jq at the bash command line would be:
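< input.txt jq -nRr '[inputs]
| group_by( sub("/[^/]*$";"") )
| map(select(length < 10) )
| .[][]'
(The -n option matters: without it, jq reads the first line as its input . and [inputs] collects only the remaining lines. If you want the output as JSON strings, omit the -r option.)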
A more efficient solution
The solution above uses the built-in filter group_by/1, which sorts its input, and is thus somewhat inefficient. For a very large number of input lines, a more efficient solution would be:
< input.txt jq -nRr '
def GROUPS_BY(stream; f):
reduce stream as $x ({}; .[$x|f] += [$x] ) | .[] ;
GROUPS_BY(inputs; sub("/[^/]*$";""))
| select(length < 10)
| .[]'
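For plain-text input, the filtering can also be done without jq. A sketch in awk, with two differences from the jq programs above that should be checked against the requirement: it groups by host name (the third /-separated field) rather than by everything before the last slash, and it drops a host only when it has more than max lines, following the question's wording ("repeated more than 10 times"), whereas the jq filters keep groups of fewer than 10. The function name drop_frequent_hosts is hypothetical.

```shell
#!/usr/bin/env bash

# drop_frequent_hosts (hypothetical helper): reads URLs on stdin and prints,
# in their original order, only those whose host occurs at most $1 times.
drop_frequent_hosts() {
  awk -F/ -v max="$1" '
    # "https://host/path" splits into "https:", "", "host", ...
    { line[NR] = $0; host[NR] = $3; count[$3]++ }
    # Stdin can only be read once, so lines are buffered until the counts are known.
    END { for (i = 1; i <= NR; i++) if (count[host[i]] <= max) print line[i] }'
}

# Usage for the question: drop_frequent_hosts 10 < input.txt
printf '%s\n' https://a.example/x https://a.example/y https://a.example/z \
  https://b.example/db/dump.sql | drop_frequent_hosts 2
```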

jq: create output in many separate files

Given the following JSON:
[
{"_id":{"$oid":"6d2"},"jlo":"ΕΙ AJSB","dd":"d5f"},
{"_id":{"$oid":"c6d3"},"jlo":"ΕΙ ALKSB","dd":"5d9"},
{"_id":{"$oid":"b0cc6d4"},"jlo":"ΕΙ AGHTSB","dd":"1b1"},
{"_id":{"$oid":"6d2"},"jlo":"ΕPOWΙ AJSB","dd":"d5f"},
{"_id":{"$oid":"c6d3"},"jlo":"ΕGTΙ ALKSB","dd":"5d9"},
{"_id":{"$oid":"b0cc6d4"},"jlo":"ΕLKΙ AGHTSB","dd":"1b1"}
]
What I need is, for each distinct value of the dd element, the unique values of jlo, each set in a separate file named after a one-to-one mapping in which each dd code is replaced by a human-readable name:
d5f:departmentone
5d9:departmentalt
1b1:departshort
Desired output: on each row, a unique value of jlo with the number of times it was found for that dd value, so that in the end we get something like this:
First file, named departmentone.txt:
ΕΙ AJSB 1
ΕPOWΙ AJSB 1
Second file, named departmentalt.txt:
ΕΙ ALKSB 1
ΕGTΙ ALKSB 1
Third file, named departshort.txt:
ΕΙ AGHTSB 2
I have tried map, reduce, group_by and sort_by, with really poor results.
Only one invocation of jq is necessary. To allocate the output to the separate files, you can combine this one invocation with a single invocation of awk, or use a shell loop as illustrated below.
First, here's an illustration of how the shell pipeline would look:
jq -r --rawfile dd2name dd2name.tsv -f group.jq input.json |
while IFS=$'\t' read -r f v ; do echo "$v" >> "$f" ; done
This assumes that the mapping to filenames is in a TSV file named dd2name.tsv, and that the following jq program is in group.jq:
def dict:
split("\n") | map(select(length>0) | split("\t"))
| INDEX(.[0]) | map_values(.[1]);
($dd2name | dict) as $dict
| ($dict | keys_unsorted[]) as $dd
| map(select(.dd == $dd))
| group_by(.jlo)
| map("\($dict[$dd])\t\(.[0].jlo) \(length)")[]
As the name suggests, the dict function creates a dictionary mapping the .dd values to the filenames. It assumes the availability of INDEX. If your jq does not have INDEX, then now would be an excellent time to upgrade your jq; otherwise, its def can easily be copied from builtin.jq (search for: builtin.jq "def INDEX"), or you could replace the last line of dict with: | reduce .[] as $p ({}; .[$p[0]] = $p[1]);
awk-based solution
The following invocation of awk can be used instead of the while ... done command above:
awk -F\\t 'fn && (fn!=$1) {close(fn)}; {fn=$1; print $2 >> fn}'
Season to taste
If the dd2name.tsv mapping file does not contain the ".txt" suffix, it can easily be added in any of a variety of ways, according to taste.
Note also that the proposed solutions above make some assumptions, notably that the .jlo values do not contain tabs, newlines, or NULs. If any of those assumptions is violated, then some tweaking will be required.
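To try the awk splitter on its own, it can be fed hand-written records in the shape the jq program emits (file name, a tab, then the line to write). A sketch that runs in a fresh temporary directory, because print ... >> fn appends to files that already exist; the records here are written by hand rather than produced by group.jq.

```shell
#!/usr/bin/env bash

# Work in a throwaway directory so the appends cannot touch existing files.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Records in the shape produced by group.jq: filename<TAB>line
printf '%s\t%s\n' \
  departmentone.txt 'ΕΙ AJSB 1' \
  departmentone.txt 'ΕPOWΙ AJSB 1' \
  departshort.txt 'ΕΙ AGHTSB 2' |
  awk -F\\t 'fn && (fn!=$1) {close(fn)}; {fn=$1; print $2 >> fn}'

echo "files written to $dir"
cat departmentone.txt departshort.txt
```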
I'd do it in three steps: filter the array by the desired dd, group by jlo, then output the jlo of each group's first item (every group is guaranteed to have at least one) together with the group's length:
map(select(.dd == "d5f")) | group_by(.jlo) | map("\(.[0].jlo) \(length)") | .[]
Full bash run:
jq --arg dd d5f --raw-output 'map(select(.dd == $dd)) | group_by(.jlo) | map("\(.[0].jlo) \(length)") | .[]' yourJsonFile > departmentone.txt
jq --arg dd 5d9 --raw-output 'map(select(.dd == $dd)) | group_by(.jlo) | map("\(.[0].jlo) \(length)") | .[]' yourJsonFile > departmentalt.txt
jq --arg dd 1b1 --raw-output 'map(select(.dd == $dd)) | group_by(.jlo) | map("\(.[0].jlo) \(length)") | .[]' yourJsonFile > departshort.txt
Supposing you have a file named "mapping.txt" with the following content:
d5f:departmentone
5d9:departmentalt
1b1:departshort
You could extract those codes and labels to generate the files:
while IFS=: read -r code label; do
jq --arg dd "$code" --raw-output 'map(select(.dd == $dd)) | group_by(.jlo) | map("\(.[0].jlo) \(length)") | .[]' yourJsonFile > "$label".txt
done < mapping.txt

Read MySQL result set with multiple columns and spaces

Suppose I have a MySQL table test that looks like this:
+----+---------------------+
| id | value |
+----+---------------------+
| 1 | Hello World |
| 2 | Foo Bar |
| 3 | Goodbye Cruel World |
+----+---------------------+
And I execute the query SELECT id, value FROM test.
How would I assign each column to a variable in Bash using read?
read -a truncates everything after the first space in value:
mysql -D "jimmy" -NBe "SELECT id, value FROM test" | while read -a row;
do
id="${row[0]}"
value="${row[1]}"
echo "$id : $value"
done;
and output looks like:
1 : Hello
2 : Foo
3 : Goodbye
but I need it to look like:
1 : Hello World
2 : Foo Bar
3 : Goodbye Cruel World
I'm aware there are args I could pass to MySQL to format the results in table format, but I need to parse each value in each row. This is just a simplified example of my problem.
Use individual fields in the read loop instead of the array:
mysql -D "jimmy" -NBe "SELECT id, value FROM test" | while read -r id value;
do
echo "$id : $value"
done
This reads the first field into id and everything else into value: when the input has more fields than there are variables, read assigns all the remaining text to the last variable. If there are more columns to read, selecting them joined by a delimiter that doesn't clash with the actual data (such as #) helps:
mysql -D "jimmy" -NBe "SELECT CONCAT(id, '#', value, '#', column3) FROM test" | while IFS='#' read -r id value column3;
do
echo "$id : $value : $column3"
done
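With -B (--batch), mysql separates columns with tab characters, so setting IFS to a tab lets read split multi-column rows directly, spaces included, without the CONCAT delimiter. In this sketch a hypothetical fake_mysql function stands in for the mysql command, and column3 is the same placeholder column as above. One caveat: tab counts as IFS whitespace, so consecutive tabs are merged and an empty column would shift the ones after it.

```shell
#!/usr/bin/env bash

# Stand-in for: mysql -D "jimmy" -NBe "SELECT id, value, column3 FROM test"
# (batch output: one row per line, columns separated by tabs)
fake_mysql() {
  printf '%s\t%s\t%s\n' \
    1 'Hello World' a \
    2 'Foo Bar' b \
    3 'Goodbye Cruel World' c
}

# Split each row on tabs only, so spaces inside a column are preserved.
# IFS=$'\t' applies to this read command alone.
print_rows() {
  while IFS=$'\t' read -r id value column3; do
    echo "$id : $value : $column3"
  done
}

fake_mysql | print_rows
```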
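You can do it like this. Also, avoid piping a command into a while read loop if possible: in bash each part of a pipeline runs in a subshell, so variables set inside the loop are lost when it ends. Process substitution avoids that: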
while read -r line; do
# first whitespace-separated field is the id; the rest of the line is the value
id=$(printf '%s\n' "$line" | awk '{print $1}')
value=$(printf '%s\n' "$line" | awk '{$1=""; sub(/^[ \t]+/, ""); print}')
echo "ID: $id"
echo "VALUE: $value"
done< <(mysql -D "jimmy" -NBe "SELECT id, value FROM test")
If you want to store all the id's and values in an array for later use, you can modify it to look like this.
#!/bin/bash
declare -A -g arr
while read -r line; do
id=$(printf '%s\n' "$line" | awk '{print $1}')
value=$(printf '%s\n' "$line" | awk '{$1=""; sub(/^[ \t]+/, ""); print}')
arr[$id]=$value
done< <(mysql -D "jimmy" -NBe "SELECT id, value FROM test")
# "${!arr[@]}" expands to all keys of the associative array (order not guaranteed)
for key in "${!arr[@]}"; do
echo "$key: ${arr[$key]}"
done
This gives the following output:
dumbledore@ansible1a [OPS]:~/tmp/tmp > bash test.sh
1: Hello World
2: Foo Bar
3: Goodbye Cruel World
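The loop above starts several processes (command substitutions running awk) for every row. A sketch that fills the associative array with read alone, assuming mysql's tab-separated -B output, with a hypothetical rows function standing in for mysql. Since "${!arr[@]}" returns keys in no guaranteed order, the keys are sorted before printing here; this needs bash 4 or later for declare -A.

```shell
#!/usr/bin/env bash

declare -A arr

# Stand-in for: mysql -D "jimmy" -NBe "SELECT id, value FROM test"
rows() {
  printf '%s\t%s\n' 1 'Hello World' 2 'Foo Bar' 3 'Goodbye Cruel World'
}

# Process substitution keeps the loop in the current shell,
# so arr is still populated after "done".
while IFS=$'\t' read -r id value; do
  arr[$id]=$value
done < <(rows)

# Print in ascending id order (ids are plain integers, so word splitting is safe).
for key in $(printf '%s\n' "${!arr[@]}" | sort -n); do
  echo "$key: ${arr[$key]}"
done
```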