How to export additional calculated columns - json

I'm trying to solve this with jq for JSON.
Right now my command looks like this:
curl -s 'https://api.test-foo.com' | jq -r '.[0:4] | @csv' > lineJSON.csv
It exports lineJSON.csv successfully, and its contents look like this:
1656118800000 6.41 6.54 6.37 6.49
1656122400000 6.49 6.49 6.37 6.41
1656126000000 6.4 6.49 6.4 6.46
1656129600000 6.46 6.49 6.41 6.45
1656133200000 6.46 6.69 6.43 6.62
Now I want the average of the 2nd and 3rd columns appended to the end of each row
(for the 1st row, 6.455 should be appended, since (6.54 + 6.37) / 2 = 6.455),
and
I also want the 1st column (Unix time in milliseconds) converted to our local time zone (Tokyo, UTC+9),
in this style: 2022/06/25 10:00:00.
Can anyone show me how to modify my command to add "Average price" and "DateTimeTokyo" to the end of each row?
Input (this is the original JSON):
[
[
1656057600000,
"6.34000000",
"6.46000000",
"6.32000000",
"6.40000000",
"357905.78000000",
1656061199999,
"2288895.56780000",
4948,
"159142.65000000",
"1019093.46560000",
"0"
],
[
1656061200000,
"6.40000000",
"6.43000000",
"6.32000000",
"6.36000000",
"289049.78000000",
1656064799999,
"1843763.98200000",
3894,
"118557.64000000",
"756429.84070000",
"0"
],
[
1656064800000,
"6.36000000",
"6.37000000",
"6.29000000",
"6.37000000",
"285129.01000000",
1656068399999,
"1807541.57600000",
3334,
"103341.13000000",
"655180.37320000",
"0"
],
[
1656068400000,
"6.37000000",
"6.48000000",
"6.35000000",
"6.41000000",
"518232.95000000",
1656071999999,
"3324943.41850000",
5783,
"238676.31000000",
"1531735.31810000",
"0"
],
[
1656072000000,
"6.41000000",
"6.50000000",
"6.36000000",
"6.41000000",
"433692.04000000",
1656075599999,
"2792577.58310000",
4879,
"208006.61000000",
"1338394.97480000",
"0"
],
[
1656075600000,
"6.41000000",
"6.46000000",
"6.38000000",
"6.38000000",
"331641.55000000",
1656079199999,
"2129404.72680000",
3572,
"129553.56000000",
"832084.11410000",
"0"
],
[
1656079200000,
"6.39000000",
"6.46000000",
"6.31000000",
"6.33000000",
"367138.99000000",
1656082799999,
"2345811.81770000",
4138,
"155818.18000000",
"996639.28720000",
"0"
],
[
1656082800000,
"6.33000000",
"6.34000000",
"6.25000000",
"6.32000000",
"277765.44000000",
1656086399999,
"1748712.60040000",
3229,
"105937.04000000",
"667653.50140000",
"0"
],
[
1656086400000,
"6.31000000",
"6.34000000",
"6.21000000",
"6.33000000",
"292571.62000000",
1656089999999,
"1838415.38530000",
3322,
"125106.86000000",
"786627.83740000",
"0"
],
[
1656090000000,
"6.33000000",
"6.39000000",
"6.30000000",
"6.32000000",
"256547.72000000",
1656093599999,
"1629535.25120000",
3111,
"142450.25000000",
"905145.04640000",
"0"
],
[
1656093600000,
"6.31000000",
"6.37000000",
"6.29000000",
"6.36000000",
"145670.56000000",
1656097199999,
"922043.36350000",
1874,
"64248.58000000",
"406818.31590000",
"0"
],
[
1656097200000,
"6.36000000",
"6.43000000",
"6.33000000",
"6.41000000",
"166864.05000000",
1656100799999,
"1065420.11920000",
2283,
"91270.24000000",
"582815.21460000",
"0"
],
[
1656100800000,
"6.42000000",
"6.47000000",
"6.39000000",
"6.41000000",
"263666.61000000",
1656104399999,
"1694938.34150000",
2981,
"134116.12000000",
"862431.12740000",
"0"
],
[
1656104400000,
"6.41000000",
"6.58000000",
"6.39000000",
"6.49000000",
"333943.30000000",
1656107999999,
"2173180.47910000",
3467,
"173197.33000000",
"1127178.67370000",
"0"
],
[
1656108000000,
"6.48000000",
"6.59000000",
"6.47000000",
"6.48000000",
"275831.12000000",
1656111599999,
"1799221.74850000",
3021,
"135880.84000000",
"886416.11690000",
"0"
],
[
1656111600000,
"6.48000000",
"6.58000000",
"6.45000000",
"6.58000000",
"212810.34000000",
1656115199999,
"1384445.00940000",
2780,
"100044.84000000",
"651205.64950000",
"0"
],
[
1656115200000,
"6.58000000",
"6.80000000",
"6.39000000",
"6.41000000",
"1132685.69000000",
1656118799999,
"7446281.09020000",
13348,
"550911.19000000",
"3625786.19610000",
"0"
],
[
1656118800000,
"6.41000000",
"6.54000000",
"6.37000000",
"6.49000000",
"222382.87000000",
1656122399999,
"1436781.15290000",
3073,
"115733.77000000",
"747659.36930000",
"0"
],
[
1656122400000,
"6.49000000",
"6.49000000",
"6.37000000",
"6.41000000",
"175230.64000000",
1656125999999,
"1123960.98650000",
2096,
"82402.98000000",
"529043.58300000",
"0"
],
[
1656126000000,
"6.40000000",
"6.49000000",
"6.40000000",
"6.46000000",
"82505.41000000",
1656129599999,
"532169.91250000",
1568,
"42924.41000000",
"276746.49050000",
"0"
],
[
1656129600000,
"6.46000000",
"6.49000000",
"6.41000000",
"6.45000000",
"94275.69000000",
1656133199999,
"608332.20580000",
1543,
"45898.91000000",
"296161.88110000",
"0"
],
[
1656133200000,
"6.46000000",
"6.69000000",
"6.43000000",
"6.54000000",
"471454.85000000",
1656136799999,
"3099237.66700000",
6029,
"248054.66000000",
"1630171.24030000",
"0"
],
[
1656136800000,
"6.54000000",
"6.55000000",
"6.46000000",
"6.51000000",
"225240.12000000",
1656140399999,
"1464238.69720000",
3053,
"100045.45000000",
"650888.68290000",
"0"
],
[
1656140400000,
"6.51000000",
"6.61000000",
"6.51000000",
"6.52000000",
"233901.49000000",
1656143999999,
"1537312.84570000",
2919,
"119864.29000000",
"787784.96020000",
"0"
]
]

If jq -r '.[] | .[:5] | @tsv' produced your original output, try changing it to:
jq -r '
.[] | .[:5]
| .[0] |= (./1000 | strflocaltime("%Y/%m/%d %H:%M:%S"))
| .[2,3] |= tonumber
| .[5] = (.[2] + .[3]) / 2
| @tsv
'
This updates column 0 by dividing it by 1000 to turn milliseconds into seconds, then applying strflocaltime with your desired format.
It then converts columns 2 and 3 into numbers, which are used to calculate column 5 by adding them up and dividing the sum by 2.
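One thing to watch: strflocaltime formats using the machine's local time zone, which jq reads from the TZ environment variable, so if the machine is not already on Tokyo time you can pin it per invocation. A sketch of the full pipeline, with the URL from the question standing in for your real endpoint:
curl -s 'https://api.test-foo.com' | TZ=Asia/Tokyo jq -r '
  .[] | .[:5]
  | .[0] |= (./1000 | strflocaltime("%Y/%m/%d %H:%M:%S"))
  | .[2,3] |= tonumber
  | .[5] = (.[2] + .[3]) / 2
  | @tsv
' > lineJSON.csv
With the sample input, the row starting 1656118800000 then comes out as 2022/06/25 10:00:00, with 6.455 appended at the end.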
Note: I assumed your original filter contained .[], .[0:5] and @tsv rather than only .[0:4] and @csv, as your input is wrapped in another array, and the output shown has five columns, not four, separated by tabs, not commas.

Related

How to use `select` within a jq --stream command?

I have a very large json document (~100 GB) that I am trying to use jq to parse out specific objects that meet a given criteria. Because it is so large, I won't be able to read it into memory, and will need to utilize the --stream option.
I understand how to run a select to extract what I need when I'm not streaming, but could use some assistance in figuring out how to configure my command correctly.
Here's a sample of my document named example.json.
{
"reporting_entity_name" : "INSURANCE COMPANY",
"reporting_entity_type" : "INSURER",
"last_updated_on" : "2022-12-01",
"version" : "1.0.0",
"in_network" : [ {
"negotiation_arrangement" : "ffs",
"name" : "ER VISIT",
"billing_code_type" : "CPT",
"billing_code_type_version" : "2022",
"billing_code" : "99285",
"description" : "HIGHEST LEVEL ER VISIT",
"negotiated_rates" : [ {
"provider_groups" : [ {
"npi" : [ 111111111, 222222222],
"tin" : {
"type" : "ein",
"value" : "99-9999999"
}
} ],
"negotiated_prices" : [ {
"negotiated_type" : "negotiated",
"negotiated_rate" : 550.50,
"expiration_date" : "9999-12-31",
"service_code" : [ "23" ],
"billing_class" : "institutional"
} ]
} ]
}
]
}
I am trying to grab the in_network object where billing_code is equal to 99285.
If I was able to do this without streaming, here's how I would approach it:
jq '.in_network[] | select(.billing_code == "99285")' example.json
Expected output:
{
"negotiation_arrangement": "ffs",
"name": "ER VISIT",
"billing_code_type": "CPT",
"billing_code_type_version": "2022",
"billing_code": "99285",
"description": "HIGHEST LEVEL ER VISIT",
"negotiated_rates": [
{
"provider_groups": [
{
"npi": [
111111111,
222222222
],
"tin": {
"type": "ein",
"value": "99-9999999"
}
}
],
"negotiated_prices": [
{
"negotiated_type": "negotiated",
"negotiated_rate": 550.5,
"expiration_date": "9999-12-31",
"service_code": [
"23"
],
"billing_class": "institutional"
}
]
}
]
}
Any help on how I could configure this with the --stream option would be greatly appreciated!
If the objects from the .in_network array alone do fit into memory, truncate at the array items (two levels deep):
jq --stream -n '
fromstream(2|truncate_stream(inputs | select(.[0][0] == "in_network")))
| select(.billing_code == "99285")
' example.json
{
"negotiation_arrangement": "ffs",
"name": "ER VISIT",
"billing_code_type": "CPT",
"billing_code_type_version": "2022",
"billing_code": "99285",
"description": "HIGHEST LEVEL ER VISIT",
"negotiated_rates": [
{
"provider_groups": [
{
"npi": [
111111111,
222222222
],
"tin": {
"type": "ein",
"value": "99-9999999"
}
}
],
"negotiated_prices": [
{
"negotiated_type": "negotiated",
"negotiated_rate": 550.5,
"expiration_date": "9999-12-31",
"service_code": [
"23"
],
"billing_class": "institutional"
}
]
}
]
}
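If the two-level truncation is hard to picture, here is a toy illustration of my own (not from the original answer) showing what survives it:
echo '{"in_network":[{"a":1}]}' | jq -nc --stream '
  fromstream(2|truncate_stream(inputs | select(.[0][0] == "in_network")))'
which prints {"a":1}: the two outer path components ("in_network" and the array index) are stripped, and only the array items themselves are rebuilt.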
You will find jq --stream excruciatingly slow even for 10GB. Since jq is intended to complement other shell tools, I would recommend using jstream (https://github.com/bcicen/jstream), or my own jm or jm.py (https://github.com/pkoppstein/jm), to "splat" the array, and pipe the result to jq.
E.g. to achieve the same effect as your jq filter:
jm --pointer /in_network example.json |
jq 'select(.billing_code == "99285")'

parsing jq returns null

I have JSON output:
{
"7": [
{
"devices": [
"/dev/sde"
],
"name": "osd-block-dcc9b386-529c-451e-9d84-8ccc4091102b",
"tags": {
"ceph.crush_device_class": "None",
"ceph.db_device": "/dev/nvme0n1p5",
"ceph.wal_device": "/dev/nvme0n1p6",
},
"type": "block",
"vg_name": "ceph-c4de9e90-853e-4569-b04f-8677ef9a8c7a"
},
{
"path": "/dev/nvme0n1p5",
"tags": {
"PARTUUID": "69712eb4-be52-4618-ba46-e317d6d3d76e"
},
"type": "db"
}
],
"41": [
{
"devices": [
"/dev/nvme1n1p13"
],
"name": "osd-block-97bce07f-ae98-4fdb-83a9-9fa2f35cee60",
"tags": {
"ceph.crush_device_class": "None",
},
"type": "block",
"vg_name": "ceph-c1d48671-2a33-4615-95e3-cc1b18783f0c"
}
],
"9": [
{
"devices": [
"/dev/sdf"
],
"name": "osd-block-35323eb8-17c1-460d-8cc5-565f549e6991",
"tags": {
"ceph.crush_device_class": "None",
"ceph.db_device": "/dev/nvme0n1p7",
"ceph.wal_device": "/dev/nvme0n1p8",
},
"type": "block",
"vg_name": "ceph-9488e8b8-ec18-4860-93d3-6a1ad91c698c"
},
{
"path": "/dev/nvme0n1p7",
"tags": {
"PARTUUID": "ef0e9588-2a20-4c2c-8b62-d73945e01322"
},
"type": "db"
}
]
}
Required output:
osd.7 /dev/sde /dev/nvme0n1p5 /dev/nvme0n1p6
osd.41 /dev/nvme1n1p13 n/a n/a
osd.9 /dev/sdf /dev/nvme0n1p7 /dev/nvme0n1p8
Problems:
When I try parsing using jq .[][].devices, I get null values:
$ cat json | jq .[][].devices
[
"/dev/sde"
]
null
[
"/dev/nvme1n1p13"
]
null
[
"/dev/sdf"
]
null
I can solve it via jq .[][].devices[]?.
However, this trick doesn't help me when I do want to see where there's no value (to print n/a instead):
$ cat json | jq '.[][].tags | ."ceph.db_device"'
"/dev/nvme0n1p5"
null
"/dev/nvme0n1p3"
null
null
"/dev/nvme0n1p7"
null
And finally, I try to create a table:
$ cat json | jq -r '["osd."+keys[]], [.[][].devices[]?], [.[][].tags."ceph.db_device" // ""] | @csv' | column -t -s,
"osd.7" "osd.41" "osd.9"
"/dev/sde" "/dev/nvme0n1p13" "/dev/sdf"
"/dev/nvme0n1p5" "/dev/nvme0n1p7"
So the obvious problem is that the 3rd row doesn't match the correct values.
And the final problem is how do I transpose it from columns to rows, as detailed in the required output?
Would this do what you want?
jq --raw-output '
to_entries[] | [
"osd." + .key,
( .value[0]
| .devices[],
( .tags
| ."ceph.db_device" // "n/a",
."ceph.wal_device" // "n/a"
)
)
]
| @tsv
'
osd.7 /dev/sde /dev/nvme0n1p5 /dev/nvme0n1p6
osd.41 /dev/nvme1n1p13 n/a n/a
osd.9 /dev/sdf /dev/nvme0n1p7 /dev/nvme0n1p8
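As a side note on the middle problem above: jq's // operator substitutes a default whenever the left-hand side yields null (or false), so the standalone tag query can print n/a directly, e.g.:
jq -r '.[][].tags."ceph.db_device" // "n/a"' json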

Need to convert a mathematical string to its calculated value

I have a query like:
SELECT equation FROM eqs;
and the results are:
[equation]
[ 1x1 ]
[ 2x2 ]
[ 3x3 ]
[ 4x4 ]
[ 5x5 ]
[ 6x6 ]
How can I make it calculate the string as an equation instead, giving me results in this form:
[equation]
[ 1 ]
[ 4 ]
[ 9 ]
[ 16 ]
[ 25 ]
[ 36 ]
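SQL has no general way to evaluate a string as an expression, but if every equation is exactly two integers joined by a literal x, you can split the string and multiply the parts. A sketch assuming MySQL (SUBSTRING_INDEX is MySQL-specific; the table and column names are taken from the question):
SELECT CAST(SUBSTRING_INDEX(equation, 'x', 1) AS UNSIGNED)
     * CAST(SUBSTRING_INDEX(equation, 'x', -1) AS UNSIGNED) AS equation
FROM eqs;
For '5x5' this yields 25; supporting other operators or more than two operands would need dynamic SQL or evaluation in the application layer.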

Convert json fetched into dataframe using R

I have JSON like the one below, which I fetched from a URL:
{
"info" : {
"1484121600" : [
212953175.053333,212953175.053333,null
],
"1484125200" : [
236203014.133333,236203014.133333,236203014.133333
],
"1484128800" : [
211414832.968889,null,211414832.968889
],
"1484132400" : [
208604573.791111,208604573.791111,208604573.791111
],
"1484136000" : [
231358374.288889,231358374.288889,231358374.288889
],
"1484139600" : [
210529301.097778,210529301.097778,210529301.097778
],
"1484143200" : [
212009682.04,null,212009682.04
],
"1484146800" : [
232364759.566667,232364759.566667,232364759.566667
],
"1484150400" : [
218138788.524444,218138788.524444,218138788.524444
],
"1484154000" : [
218883301.282222,218883301.282222,null
],
"1484157600" : [
237874583.771111,237874583.771111,237874583.771111
],
"1484161200" : [
216227081.924444,null,216227081.924444
],
"1484164800" : [
227102054.082222,227102054.082222,null
]
},
"summary" : "data",
"end" : 1484164800,
"start": 1484121600
}
I'm fetching this JSON from that URL using the jsonlite package in R, like below:
library(jsonlite)
input_data <- fromJSON(url)
timeseries <- input_data[['info']] # till here code is fine
abc <- data.frame(ds = names(timeseries[[1]]),
y = unlist(timeseries[[1]]), stringsAsFactors = FALSE)
(something is wrong in the above line)
I need to convert the data in the timeseries variable into a data frame whose index column is the epoch time; the number of columns in the data frame will depend on the number of values in each array. All arrays are guaranteed to have the same number of values, but that number is not fixed; it can be 1, 2, and so on. In the example above it is 3 for all.
For example, the data frame should look like:
index y1 y2 y3
1484121600 212953175.053333 212953175.053333 null
1484125200 236203014.133333 236203014.133333 236203014.133333
Please suggest how I can do this in R; I'm new to it.
JSON with only 1 item in array:
{
"info": {
"1484121600": [
212953175.053333
],
"1484125200": [
236203014.133333
],
"1484128800": [
211414832.968889
],
"1484132400": [
208604573.791111
],
"1484136000": [
231358374.288889
],
"1484139600": [
210529301.097778
],
"1484143200": [
212009682.04
],
"1484146800": [
232364759.566667
],
"1484150400": [
218138788.524444
],
"1484154000": [
218883301.282222
],
"1484157600": [
237874583.771111
],
"1484161200": [
216227081.924444
],
"1484164800": [
227102054.082222
]
},
"summary": "data",
"end": 1484164800,
"start": 1484121600
}
Consider binding the list of JSON values to a matrix with sapply(), then transposing columns to rows with t(), and finally converting to a data frame with data.frame():
abc <- data.frame(t(sapply(timeseries, c)))
colnames(abc) <- gsub("X", "y", colnames(abc))
abc
# y1 y2 y3
# 1484121600 212953175 212953175 NA
# 1484125200 236203014 236203014 236203014
# 1484128800 211414833 NA 211414833
# 1484132400 208604574 208604574 208604574
# 1484136000 231358374 231358374 231358374
# 1484139600 210529301 210529301 210529301
# 1484143200 212009682 NA 212009682
# 1484146800 232364760 232364760 232364760
# 1484150400 218138789 218138789 218138789
# 1484154000 218883301 218883301 NA
# 1484157600 237874584 237874584 237874584
# 1484161200 216227082 NA 216227082
# 1484164800 227102054 227102054 NA
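One caveat with the sapply()/t() approach (my observation, not part of the original answer): when every array holds a single value, as in your second JSON, sapply() simplifies to a plain vector and t() then yields one row of thirteen columns rather than thirteen rows of one. A shape-stable sketch that covers both cases:
library(jsonlite)

input_data <- fromJSON(url)              # 'url' as in the question
timeseries <- input_data[["info"]]

# rbind stacks one row per epoch key, whatever the array length
abc <- data.frame(do.call(rbind, timeseries))
colnames(abc) <- gsub("X", "y", colnames(abc))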

Cypher query JSON formatted result

On the Actor/Movie demo graph, Cypher returns column names in a separate array.
MATCH (n:Person) RETURN n.name as Name, n.born as Born ORDER BY n.born LIMIT 5
results:
{ "columns" : [ "Name", "Born" ], "data" : [ [ "Max von Sydow", 1929 ], [ "Gene Hackman", 1930 ], [ "Richard Harris", 1930 ], [ "Clint Eastwood", 1930 ], [ "Mike Nichols", 1931 ] ]}
Is it possible to get each node's properties tagged instead?
{ "nodes" : [ { "Name" : "Max von Sydow", "Born" : 1929 }, ... ] }
If I return the node instead of selected properties, I get way too many properties.
MATCH (n:Person) RETURN n LIMIT 5
results:
{ "columns" : [ "n" ], "data" : [ [ { "outgoing_relationships" : "http://localhost:7474/db/data/node/58/relationships/out", "labels" : "http://localhost:7474/db/data/node/58/labels", "data" : { "born" : 1929, "name" : "Max von Sydow" }, "all_typed_relationships" : "http://localhost:7474/db/data/node/58/relationships/all/{-list|&|types}", "traverse" : "http://localhost:7474/db/data/node/58/traverse/{returnType}", "self" : "http://localhost:7474/db/data/node/58", "property" : "http://localhost:7474/db/data/node/58/properties/{key}", "outgoing_typed_relationships" : "http://localhost:7474/db/data/node/58/relationships/out/{-list|&|types}", "properties" : "http://localhost:7474/db/data/node/58/properties", "incoming_relationships" : "http://localhost:7474/db/data/node/58/relationships/in", "extensions" : { }, "create_relationship" : "http://localhost:7474/db/data/node/58/relationships", "paged_traverse" : "http://localhost:7474/db/data/node/58/paged/traverse/{returnType}{?pageSize,leaseTime}", "all_relationships" : "http://localhost:7474/db/data/node/58/relationships/all", "incoming_typed_relationships" : "http://localhost:7474/db/data/node/58/relationships/in/{-list|&|types}" } ], ... ]}
You can use the new literal map syntax in Neo4j 2.0 and do something like:
MATCH (n:Person)
RETURN { Name: n.name , Born: n.born } as Person
ORDER BY n.born
LIMIT 5
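Each row of the result is then a single map, so over the same REST endpoint the response shape becomes roughly (a sketch, not verbatim server output):
{ "columns" : [ "Person" ], "data" : [ [ { "Name" : "Max von Sydow", "Born" : 1929 } ], ... ] }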