Remove mismatched values and add missing ones with jq - json

Hi, I have two files, for example:
file1.json
{
"id": "001",
"name" : "my_policy",
"list_1": ["111111111", "22222222","33333333"],
"list_2": ["a", "b","c"],
.....
}
Then I have file2.json (it does not always have the same fields as file1):
{
"list_1": ["111111111","111111122","33333333"],
"list_2": ["a", "b","c","d","e"],
.....
}
How can I use jq to merge the values of matching keys across the two JSON files and, in addition to the merge, remove from file1's arrays any values not present in file2? So that I get this result:
{
"id": "001",
"policy" : "my_policy",
"list_1": ["111111111","111111122","33333333"],
"list_2": ["a", "b","c","d","e"],
.....
}
I solved the merge operation via:
jq -s 'reduce .[] as $item ({}; reduce ($item | keys_unsorted[]) as $key (.; $item[$key] as $val | ($val | type) as $type | .[$key] = if ($type == "array") then (.[$key] + $val | unique) elif ($type == "object") then (.[$key] + $val) else $val end))' file1.json file2.json
How can I solve this? Or is it impossible with jq?

It is quite simple once you figure out how to find the difference between two lists and add/unique the results. One way would be to
jq --slurpfile s 2.json -f script.jq 1.json
Where my script contents are
#!/usr/bin/env jq -f
# backup list_1, list_2 from 1.json
.list_1 as $l1 |
.list_2 as $l2 |
# Perform the operation of removing file1 keys not present in 2.json
# for both list_1 and list_2
( ( $l1 - ( $l1 - $s[].list_1 ) ) + $s[].list_1 | unique ) as $f1 |
( ( $l2 - ( $l2 - $s[].list_2 ) ) + $s[].list_2 | unique ) as $f2 |
# Update the original result 1.json with the modified content
.list_1 |= $f1 |
.list_2 |= $f2
or directly from the command line as
jq --slurpfile s 2.json '
.list_1 as $l1 |
.list_2 as $l2 |
( ( $l1 - ( $l1 - $s[].list_1 ) ) + $s[].list_1 | unique ) as $f1 |
( ( $l2 - ( $l2 - $s[].list_2 ) ) + $s[].list_2 | unique ) as $f2 |
.list_1 |= $f1 |
.list_2 |= $f2
' 1.json
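The trick that does the removal is that, for jq arrays, $a - ($a - $b) yields exactly the elements of $a that also appear in $b, i.e. the intersection. A minimal sketch you can run to convince yourself (values made up for illustration):
jq -nc '["111111111","22222222","33333333"] as $a | ["111111111","111111122","33333333"] as $b | $a - ($a - $b)'
["111111111","33333333"]
Appending the file2 list and piping through unique then produces the merged, de-duplicated result.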


Bash script with jq won't get date difference from strings, and runs quite slowly on i7 16GB RAM

Need to find the difference between TradeCloseTime and TradeOpenTime in dd:hh:mm format for the Exposure column in the following script.
Also, the script runs super slow (~4 minutes for 800 rows of JSON on a Core i7, 16 GB RAM machine).
#!/bin/bash
echo "TradeNo, TradeOpenType, TradeCloseType, TradeOpenSource, TradeCloseSource, TradeOpenTime, TradeCloseTime, PNL, Exposure" > tradelist.csv
tradecount=$(jq -r '.performance.numberOfTrades|tonumber' D.json)
for ((i=0; i<$tradecount; i++))
do
tradeNo=$(jq -r '.trades['$i']|[.tradeNo][]|tonumber' D.json)
entrySide=$(jq -r '.trades['$i'].orders[0]|[.side][]' D.json)
exitSide=$(jq -r '.trades['$i'].orders[1]|[.side][]' D.json)
entrySource=$(jq -r '.trades['$i'].orders[0]|[.source][]' D.json)
exitSource=$(jq -r '.trades['$i'].orders[1]|[.source][]' D.json)
tradeEntryTime=$(jq -r '.trades['$i'].orders[0]|[.placedTime][]' D.json | tr -d 'Z' | tr -s 'T' ' ')
tradeExitTime=$(jq -r '.trades['$i'].orders[1]|[.placedTime][]' D.json | tr -d 'Z' | tr -s 'T' ' ')
profitPercentage=$(jq -r '(.trades['$i']|[.profitPercentage][0]|tonumber)*(100)' D.json)
echo $tradeNo","$entrySide","$exitSide","$entrySource","$exitSource","$tradeEntryTime","$tradeExitTime","$profitPercentage | tr -d '"' >> tradelist.csv
done
json file looks like this
{"market":{"exchange":"BINANCE_FUTURES","coinPair":"BTC_USDT"},"strategy":{"name":"","type":"BACKTEST","candleSize":15,"lookbackDays":6,"leverageLong":1.00000000,"leverageShort":1.00000000,"strategyName":"ABC","strategyVersion":35,"runNo":"002","source":"Personal"},"strategyParameters":[{"name":"DurationInput","value":"87.0"}],"openPositionStrategy":{"actionTime":"CANDLE_CLOSE","maxPerSignal":1.00000000},"closePositionStrategy":{"actionTime":"CANDLE_CLOSE","minProfit":"NaN","stopLossValue":0.07000000,"stopLossTrailing":true,"takeProfit":0.01290000,"takeProfitDeviation":"NaN"},"performance":{"startTime":"2019-01-01T00:00:00Z","endTime":"2021-11-24T00:00:00Z","startAllocation":1000.00000000,"endAllocation":3478.58904150,"absoluteProfit":2478.58904150,"profitPerc":2.47858904,"buyHoldRatio":0.62426630,"buyHoldReturn":4.57228387,"numberOfTrades":744,"profitableTrades":0.67833109,"maxDrawdown":-0.20924885,"avgMonthlyProfit":0.05242718,"profitableMonths":0.70370370,"avgWinMonth":0.09889897,"avgLoseMonth":-0.05275563,"startPrice":null,"endPrice":57623.08000000},"trades":[{"tradeNo":0,"profit":-5.48836165,"profitPercentage":-0.00549085,"accumulatedBalance":994.51163835,"compoundProfitPerc":-0.00548836,"orders":[{"side":"Long","placedTime":"2019-09-16T21:15:00Z","placedAmount":0.09700000,"filledTime":"2019-09-16T21:15:00Z","filledAmount":0.09700000,"filledPrice":10300.49000000,"commissionPaid":0.39965901,"source":"SIGNAL"},{"side":"CloseLong","placedTime":"2019-09-17T19:15:00Z","placedAmount":0.09700000,"filledTime":"2019-09-17T19:15:00Z","filledAmount":0.09700000,"filledPrice":10252.13000000,"commissionPaid":0.39778264,"source":"SIGNAL"}]},{"tradeNo":1,"profit":-3.52735800,"profitPercentage":-0.00356403,"accumulatedBalance":990.98428035,"compoundProfitPerc":-0.00901572,"orders":[{"side":"Long","placedTime":"2019-09-19T06:00:00Z","placedAmount":0.10000000,"filledTime":"2019-09-19T06:00:00Z","filledAmount":0.10000000,"filledPrice":9893.16000000,"commissionPaid":0.39572640,"source":"SIGNAL"},{"side":"CloseLong","placedTime":"2019-09-19T06:15:00Z","placedAmount":0.10000000,"filledTime":"2019-09-19T06:15:00Z","filledAmount":0.10000000,"filledPrice":9865.79000000,"commissionPaid":0.39463160,"source":"SIGNAL"}]},{"tradeNo":2,"profit":-5.04965308,"profitPercentage":-0.00511770,"accumulatedBalance":985.93462727,"compoundProfitPerc":-0.01406537,"orders":[{"side":"Long","placedTime":"2019-09-25T10:15:00Z","placedAmount":0.11700000,"filledTime":"2019-09-25T10:15:00Z","filledAmount":0.11700000,"filledPrice":8430.00000000,"commissionPaid":0.39452400,"source":"SIGNAL"},{"side":"CloseLong","placedTime":"2019-09-25T10:30:00Z","placedAmount":0.11700000,"filledTime":"2019-09-25T10:30:00Z","filledAmount":0.11700000,"filledPrice":8393.57000000,"commissionPaid":0.39281908,"source":"SIGNAL"}]}
You can do it all (extracts, conversions and formatting) with one jq call:
#!/bin/sh
echo 'TradeNo,TradeOpenType,TradeCloseType,TradeOpenSource,TradeCloseSource,TradeOpenTime,TradeCloseTime,PNL,Exposure' > tradelist.csv
query='
.trades[]
| [
.tradeNo,
.orders[0].side,
.orders[1].side,
.orders[0].source,
.orders[1].source,
(.orders[0].placedTime | fromdate | strftime("%Y-%m-%d %H:%M:%S")),
(.orders[1].placedTime | fromdate | strftime("%Y-%m-%d %H:%M:%S")),
.profitPercentage * 100,
(
(.orders[1].placedTime | fromdate) - (.orders[0].placedTime | fromdate)
| (. / 86400 | floor | tostring) + (. % 86400 | strftime(":%H:%M"))
)
]
|@csv
'
jq -r "$query" < D.json > tradelist.csv
example of JSON (cleaned of all irrelevant keys):
{
"trades": [
{
"tradeNo": 0,
"profitPercentage": -0.00549085,
"orders": [
{
"side": "Long",
"placedTime": "2018-12-16T21:34:46Z",
"source": "SIGNAL"
},
{
"side": "CloseLong",
"placedTime": "2019-09-17T19:15:00Z",
"source": "SIGNAL"
}
]
}
]
}
output:
TradeNo,TradeOpenType,TradeCloseType,TradeOpenSource,TradeCloseSource,TradeOpenTime,TradeCloseTime,PNL,Exposure
0,"Long","CloseLong","SIGNAL","SIGNAL","2018-12-16 21:34:46","2019-09-17 20:15:00",-0.549085,"274:22:40"
If you want to get rid of the double quotes that jq adds when generating a CSV (they are completely valid, but you need a real CSV parser to read the file), you can replace @csv with @tsv and post-process the output with tr '\t' ',', like this:
query='
...
|@tsv
'
jq -r "$query" < D.json | tr '\t' ',' > tradelist.csv
and you'll get:
TradeNo,TradeOpenType,TradeCloseType,TradeOpenSource,TradeCloseSource,TradeOpenTime,TradeCloseTime,PNL,Exposure
0,Long,CloseLong,SIGNAL,SIGNAL,2018-12-16 21:34:46,2019-09-17 19:15:00,-0.549085,274:21:40
note: This method of getting rid of the " in the CSV is only accurate when there are no \n, \t, \r, \, comma, or " characters in the input data.
Regarding the main question (computing time differences), you're in luck, as jq provides the built-in function fromdateiso8601 for converting ISO 8601 times to "the number of seconds since the Unix epoch (1970-01-01T00:00:00Z)".
With your JSON sample,
.trades[]
| [ .orders[1].placedTime, .orders[0].placedTime]
| map(fromdateiso8601)
| .[0] - .[1]
produces the three differences:
79200
900
900
And here's a function for converting seconds to "hh:mm:ss" format:
def hhmmss:
def l: tostring | if length < 2 then "0\(.)" else . end;
(. % 60) as $ss
| ((. / 60) | floor) as $mm
| (($mm / 60) | floor) as $hh
| ($mm % 60) as $mm
| [$hh, $mm, $ss] | map(l) | join(":");
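For example, with the definition above in scope, the first difference from the sample converts as expected:
79200 | hhmmss
# => "22:00:00"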
I prefer using an intermediate structure holding the "entry" and "exit" order objects. This helps with debugging the jq commands. Formatted for readability over performance:
#!/usr/bin/env bash
echo "TradeNo,TradeOpenType,TradeCloseType,TradeOpenSource,TradeCloseSource,TradeOpenTime,TradeCloseTime,PNL,Exposure" > tradelist.csv
jq -r '
.trades[]
|{tradeNo,
profitPercentage,
entry:.orders[0],
exit:.orders[1],
entryTS:.orders[0].placedTime|fromdate,
exitTS:.orders[1].placedTime|fromdate}
|[.tradeNo,
.entry.side,
.exit.side,
.entry.source,
.exit.source,
(.entry.placedTime|strptime("%Y-%m-%dT%H:%M:%SZ")|strftime("%Y-%m-%d %H:%M:%S")),
(.exit.placedTime|strptime("%Y-%m-%dT%H:%M:%SZ")|strftime("%Y-%m-%d %H:%M:%S")),
(.profitPercentage*100),
(.exitTS-.entryTS|todate|strptime("%Y-%m-%dT%H:%M:%SZ")|strftime("%d:%H:%M"))]|@csv
' D.json | tr -d '"' >> tradelist.csv
WARNING: This formatting assumes Exposure is LESS THAN 1 MONTH. Good luck with that!

JSON jq: filter by date older than 30 minutes in bash

I have JSON data in this format in a text.json file:
[
{
"name": "page/page1.html",
"properties": {
"lastModified": "2021-08-10T18:00:45+00:00",
}
},
{
"name": "page/page2.html",
"properties": {
"lastModified": "2021-08-10T19:24:23+00:00",
}
},
{
"name": "page/page3.html",
"properties": {
"lastModified": "2021-08-10T20:36:21+00:00",
}
}
]
I want to make a list of all the names of files which were last modified more than 30 minutes ago. This is my query at the moment to get a list of file names as a variable which I can use later:
file_names=`cat text.json | jq -r '.[].name'`
How can I use jq to filter for lastModified more than 30 minutes ago based on the timestamp in the properties so I only get the relevant file names?
I'd typically calculate the target date in native bash.
#!/usr/bin/env bash
# make sure we have bash new enough for printf %(...)T time-formatting
# this makes our script work even without GNU date
case $BASH_VERSION in
''|[123].*|4.[012].*) echo "ERROR: bash 4.3+ required" >&2; exit 1;;
esac
export TZ=UTC # force all timestamps to be in UTC (+00:00 / Z)
# faster, bash-builtin equivalent of: now=$(date +%s)
printf -v now '%(%s)T' -1
# faster, bash-builtin equivalent of: start_date_epoch=$(date -d '30 minutes ago' +%s)
start_date_epoch=$((now - 60*30))
printf -v start_date_iso8601 '%(%Y-%m-%dT%H:%M:%S+00:00)T' "$start_date_epoch"
# read our resulting names into an array (not a string)
# jq -j suppresses newlines so we can use NUL delimiters
while IFS= read -r -d '' name; do
names+=( "$name" )
done < <(
jq -j --arg start_date "$start_date_iso8601" '
.[] |
select(.properties.lastModified < $start_date) |
(.name, "\u0000")
' <text.json
)
# print the content of the array we just read the names into
printf 'Matching name: %q\n' "${names[@]}"
This seems to work:
date=`date +%Y-%m-%d'T'%H:%M'Z' -d "15 min ago"`
file_names=`jq -r --arg date "$date" '.[] | select(.properties.lastModified < $date) | .name' < text.json`
Let jq do all date computations:
With bash 4.4 and above (where mapfile supports -d):
mapfile -d '' last_modified < <(
jq --join-output '(now - 1800) as $date | .[] | select((.properties.lastModified | .[:19] + "Z" | fromdate) < $date) | .name + "\u0000"' input_file.json
)
# For debug purpose
declare -p last_modified
Without mapfile, records are delimited with the ASCII RS control character rather than a NUL byte:
IFS=$'\36' read -ra last_modified < <(jq -j '(now - 1800) as $date | .[] | select((.properties.lastModified | .[:19] + "Z" | fromdate) < $date) | .name + "\u001e"' input_file.json)
Here is the stand-alone jq script with comments:
#!/usr/bin/env -S jq -jf
# Store current timestamp minus 30 minutes (1800 seconds) as $date
(now - 1800) as $date |
.[] |
#
select(
(
# Strip the numerical timezone offset out from the timestamp
# and replace it with the Z for UTC iso8601
# to make it an iso8601 date string that jq understands
.properties.lastModified | .[:19] + "Z" | fromdate
) < $date
) |
.name + "\u0000"

Dump JSON response to a bash variable

I have the following output:
[
"notimportant",
[
"val1",
"val2",
...,
"valn"
]
]
I'm trying to store every value in a bash string. Using jq, I tried this:
out=''
req=$(curl -s $url)
len=$(echo $req | jq length )
for (( i = 0; i < $len; i++ )); do
element=$(echo $req | jq '.[1]' | jq --argjson i "$i" '.[$i]')
out=${element}\n${out}
done
which feels clunky and performs slowly. I'm trying to dump the values all at once, without looping over all the elements.
With an array:
mapfile -t arr < <(curl -s "$url" | jq -r '.[1] | .[]')
declare -p arr
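If you then need everything in a single string variable, as in the question, you can join the array afterwards, for example:
out=$(printf '%s\n' "${arr[@]}")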
Do you want the values separated by TAB or NEWLINE characters in a single variable? The @tsv filter is useful for controlling the output:
outTABS=$(curl -s "$url" | jq -r '.[1]|.|@tsv')
outLINE=$(curl -s "$url" | jq -r '.[1]|.[]|[.]|@tsv')
> echo "$outTABS"
val1 val2 valn
> echo "$outLINE"
val1
val2
valn

How to convert JSON values to Prometheus format?

After running this command:
mtr -jnbz www.google.com |jq .report.hubs|jq -r 'keys_unsorted[] as $k | "\(.[$k])"'
I get this result:
{"count":"1","host":"1.1.1.1","ASN":"AS???","Loss%":0,"Snt":10,"Last":36.28,"Avg":39.43,"Best":34.77,"Wrst":62.37,"StDev":8.15}
{"count":"2","host":"2.2.2.2","ASN":"AS???","Loss%":100,"Snt":10,"Last":0,"Avg":0,"Best":0,"Wrst":0,"StDev":0}
How can I get this result (Prometheus format):
mtr_loss{"count"="1","host"="1.1.1.1","ASN"="AS???"} 0
mtr_snt{"count"="1","host"="1.1.1.1","ASN"="AS???"} 10
mtr_last{"count"="1","host"="1.1.1.1","ASN"="AS???"} 36.28
mtr_avg{"count"="1","host"="1.1.1.1","ASN"="AS???"} 39.43
mtr_best{"count"="1","host"="1.1.1.1","ASN"="AS???"} 34.77
mtr_wrst{"count"="1","host"="1.1.1.1","ASN"="AS???"} 62.37
mtr_stdev{"count"="1","host"="1.1.1.1","ASN"="AS???"} 8.15
And so on for the second record.
I will be grateful to you for any tips and tricks
Regards
The following is perhaps easier to follow, maintain, and reuse:
# produce the {"k"="value", ...} representation:
def kv:
. as $in
| reduce keys_unsorted[] as $k ([]; . + ["\"\($k)\"=\"\($in[$k])\""] )
| join(",")
| "{" + . + "}" ;
# downcase and remove %
def ht($s):
keys_unsorted[] as $key
| .[$key] as $value
| "mtr_\($key|ascii_downcase|gsub("%";""))\($s) \($value)";
({count,host,ASN} | kv) as $s
| {"Loss%", Snt, Last, Avg, Best, Wrst, StDev}
| ht($s)
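Assuming the definitions above are stored in a file, say to_prom.jq (a hypothetical name), one way to feed each hub object through them would be:
mtr -jnbz www.google.com | jq '.report.hubs[]' | jq -rf to_prom.jq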
It was quite easy :)
mtr -jnbz www.google.com |
tr -d "%" |
jq -r '.report.hubs[]
| "mtr_loss{count=\"\(.count)\",host=\"\(.host)\",asn=\"\(.ASN)\"} \(.Loss)\nmtr_sent{count=\"\(.count)\",host=\"\(.host)\",asn=\"\(.ASN)\"} \(.Snt)\nmtr_last{count=\"\(.count)\",host=\"\(.host)\",asn=\"\(.ASN)\"} \(.Last)\nmtr_avg{count=\"\(.count)\",host=\"\(.host)\",asn=\"\(.ASN)\"} \(.Avg)\nmtr_best{count=\"\(.count)\",host=\"\(.host)\",asn=\"\(.ASN)\"} \(.Best)\nmtr_wrst{count=\"\(.count)\",host=\"\(.host)\",asn=\"\(.ASN)\"} \(.Wrst)\nmtr_stdev{count=\"\(.count)\",host=\"\(.host)\",asn=\"\(.ASN)\"} \(.StDev)" '

Calculate accumulated times from json with bash script

I have data in json format that logs timestamps (hh:mm in 24h format) with an event (In/Out). My goal is to add up all the time differences between an "IN" event and the next "OUT" event.
For simplification I assume that there are no inconsistencies (The first element is always an "IN" and each "IN" is followed by an "OUT"). Exception: If the last element is an "IN", the calculation has to be done between the current time and the timestamp from the last "IN" event.
This is my script so far, which calculates all the timespans, including those between an OUT and the next IN event. But I only need the ones between an IN and the following OUT event.
Any tips on what might be more useful here are welcome!
#!/bin/bash
JSON='{ "times": [ [ "7:43", "IN" ], [ "8:26", "OUT" ], [ "8:27", "IN" ], [ "9:12", "OUT" ], [ "9:14", "IN" ], [ "9:22", "OUT" ], [ "9:23", "IN " ], [ "12:12", "OUT" ], [ "13:12", "IN" ] ]}'
IN_TIMES=$(jq '.times | to_entries | .[] | select(.value[1]| tostring | contains("IN")) | .value[0]' <<< "$JSON")
OUT_TIMES=$(jq '.times | to_entries | .[] | select(.value[1]| tostring | contains("OUT")) | .value[0]' <<< "$JSON")
ALL_TIMES=$(jq -r '.times| to_entries | .[] | .value[0]' <<< "$JSON")
prevtime=0
count=0
for i in $(echo $ALL_TIMES | sed "s/ / /g")
do
if [[ "$count" -eq 0 ]]; then
(( count++ ))
prevtime=$i
continue
else
(( count++ ))
fi
time1=`date +%s -d ${prevtime}`
time2=`date +%s -d ${i}`
diffsec=`expr ${time2} - ${time1}`
echo From $prevtime to $i: `date +%H:%M -ud #${diffsec}`
prevtime=$i
done
Here's a jq-only solution that calls jq just once.
Please note, though, that it may need tweaking to take into account time zone considerations, error-handling, and potentially other complications:
def mins: split(":") | map(tonumber) | .[0] * 60 + .[1];
def diff: (.[1] - .[0]) | if . >= 0 then . else 24*60 + . end;
def now_mins: now | gmtime | .[3] * 60 + .[4];
def pairs:
range(0; length; 2) as $i | [.[$i], .[$i+1] ];
def sigma(s): reduce s as $s (0; . + $s);
.times
| map( .[0] |= mins )
| if .[-1][1] == "IN" then . + [ [now_mins, "OUT"] ] else . end
| sigma(pairs | map(.[0]) | diff)
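With the sample $JSON from the question, and assuming the program above is saved as accumulate.jq (a name chosen for illustration), the four closed IN/OUT pairs sum to 265 minutes, plus however long the trailing 13:12 IN has been open at run time:
jq -f accumulate.jq <<< "$JSON"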
Since you measure times only to the minute, it is enough to compute minutes, without messing about with the date command. I have an awk solution:
awk -F: -vfmt="From %5s to %5s: %4u minutes\n" \
'{this=$1*60+$2}a{printf(fmt,at,$0,this-a);a=0;next}{a=this;at=$0}\
END{if(a){$0=strftime("%H:%M");printf(fmt,at,$0,$1*60+$2-a)}}' <<<"$ALL_TIMES"
which works by using a colon as the field separator, so that with the default newline record separator we get a separate record with two fields for each time. Then
{this=$1*60+$2} : We compute how many minutes there are in the current record and put them in the variable this.
a{printf(fmt,at,$0,this-a);a=0;next} : If the (initially empty) variable a is neither null nor zero, we are reading an OUT entry, so we print what we want, set a to zero because the next entry will be an IN, and continue with the next record.
{a=this;at=$0} : Otherwise, we are reading an IN entry, so we set a to its minutes and at to its string representation (needed when we print it, as in the previous case).
END{if(a){$0=strftime("%H:%M");printf(fmt,at,$0,$1*60+$2-a)}} : At the end, if we still have some dangling IN data, we set $0 to the properly formatted current time and print what we want.
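With the sample data, the output should look along these lines, with a final line for the dangling 13:12 IN computed against the current time:
From  7:43 to  8:26:   43 minutes
From  8:27 to  9:12:   45 minutes
From  9:14 to  9:22:    8 minutes
From  9:23 to 12:12:  169 minutes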
All done.
With Xidel and a little XQuery magic this is rather simple:
#!/bin/bash
JSON='{"times": [["7:43", "IN"], ["8:26", "OUT"], ["8:27", "IN"], ["9:12", "OUT"], ["9:14", "IN"], ["9:22", "OUT"], ["9:23", "IN "], ["12:12", "OUT"], ["13:12", "IN"]]}'
xidel -s - --xquery '
let $in:=$json/(times)()[contains(.,"IN")](1) ! time(
substring(
"00:00:00",
1,
8-string-length(.)
)||.
),
$out:=$json/(times)()[contains(.,"OUT")](1) ! time(
substring(
"00:00:00",
1,
8-string-length(.)
)||.
)
for $x at $i in $out return
concat(
"From ",
$in[$i],
" to ",
$x,
": ",
$x - $in[$i] + time("00:00:00")
)
' <<< "$JSON"
$in:
00:07:43
00:08:27
00:09:14
00:09:23
00:13:12
$out:
00:08:26
00:09:12
00:09:22
00:12:12
Output:
From 00:07:43 to 00:08:26: 00:00:43
From 00:08:27 to 00:09:12: 00:00:45
From 00:09:14 to 00:09:22: 00:00:08
From 00:09:23 to 00:12:12: 00:02:49