Looking for a little assistance working with PowerShell and JSON

I have next to zero programming experience, but I've done a tiny bit of work with PowerShell, so I've picked it for a little project I'm working on. I'm trying to pull some information from JSON and so far have been able to figure it out as I go, but the result is formatted strangely, which is throwing me for a loop. Here's a snippet of the JSON result:
"data": {
"reviews": [
{
"available_at": "2021-06-18T12:00:00.000000Z",
"subject_ids": [
7572,
3428,
759,
732,
712,
718,
731
]
},
{
"available_at": "2021-06-18T22:00:00.000000Z",
"subject_ids": [
730,
710,
854,
1029,
2938,
736,
734
]
},
{
"available_at": "2021-06-19T03:00:00.000000Z",
"subject_ids": [
3493,
3086,
3091,
2847
]
}
]
}
}
What I want to do is get a count of each set of data.reviews.subject_ids. By using ($summary.data.reviews.subject_ids | Measure-Object).count I get a result of 18, which is to be expected, since there are 18 subject_ids. I only want the count of the first group of 7 though and I can't figure out how to get this from the first set but not the second, third etc. How can I do this?

$summary.data.reviews is an array of 3 children (i.e. the available_at and subject_ids blocks).
At the moment you're counting the number of items in the subject_ids property for all 3 children.
If you want to just select the first one you can use:
$summary.data.reviews[0]
Note that arrays are zero-indexed in PowerShell so the first index is [0], the second is [1] and the Nth is [N-1].
Your code then becomes:
($summary.data.reviews[0].subject_ids | Measure-Object).Count
#7

Today, you're using the syntax $summary.data.reviews.subject_ids, which says "give me all of the subject_ids of all reviews inside of summary.data".
The item you want, though, is the first member of the array of reviews. We can get that by using indexing notation like this: $summary.data.reviews[0], which says "give me the first item in reviews".
PowerShell starts counting from 0, like a lot of programming languages, so that's why we say 0 to mean the first one in the list.
You can then dereference the specific fields you want, just like before, with this syntax $summary.data.reviews[0].subject_ids.
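For comparison, the same distinction between "all reviews" and "the first review" can be sketched in Python. This is only an illustration and not part of the PowerShell solution: the variable names (summary, total, first_count, per_group) are made up for the example, and the JSON is the snippet from the question wrapped in an outer object so that it parses on its own.

```python
import json

# The question's snippet, wrapped in an outer object so it is valid JSON.
raw = """
{
  "data": {
    "reviews": [
      {"available_at": "2021-06-18T12:00:00.000000Z",
       "subject_ids": [7572, 3428, 759, 732, 712, 718, 731]},
      {"available_at": "2021-06-18T22:00:00.000000Z",
       "subject_ids": [730, 710, 854, 1029, 2938, 736, 734]},
      {"available_at": "2021-06-19T03:00:00.000000Z",
       "subject_ids": [3493, 3086, 3091, 2847]}
    ]
  }
}
"""
summary = json.loads(raw)
reviews = summary["data"]["reviews"]

# Count across every review (what the un-indexed expression does).
total = sum(len(r["subject_ids"]) for r in reviews)

# Count only the first review: index 0 selects the first element.
first_count = len(reviews[0]["subject_ids"])

# Count each review separately.
per_group = [len(r["subject_ids"]) for r in reviews]

print(total, first_count, per_group)  # 18 7 [7, 7, 4]
```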

Use the position of the array you want to count.
From about Arrays:
You can refer to the elements in an array by using an index, beginning at position 0. Enclose the index number in brackets.
PS /> $json.data.reviews
available_at subject_ids
------------ -----------
2021-06-18T12:00:00.000000Z {7572, 3428, 759, 732...}
2021-06-18T22:00:00.000000Z {730, 710, 854, 1029...}
2021-06-19T03:00:00.000000Z {3493, 3086, 3091, 2847}
PS /> $json.data.reviews.GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
PS /> $json.data.reviews[0]
available_at subject_ids
------------ -----------
2021-06-18T12:00:00.000000Z {7572, 3428, 759, 732...}
PS /> $json.data.reviews[0].subject_ids.Count
7
If you want to iterate over all elements and get the count for each one, you can do it like this:
$i = 1
foreach ($object in $json.data.reviews)
{
    "- Group $i has {0}" -f $object.subject_ids.Count
    $i++
}
Which yields:
- Group 1 has 7
- Group 2 has 7
- Group 3 has 4


Dask how to open json with list of dicts

I'm trying to open a bunch of JSON files using read_json in order to get a DataFrame as follows:
ddf.compute()
id owner pet_id
0 1 "Charlie" "pet_1"
1 2 "Charlie" "pet_2"
3 4 "Buddy" "pet_3"
but the following code raises an error:
import pandas as pd
import dask.dataframe as dd

# Empty frame describing the expected column names and dtypes
_meta = pd.DataFrame(
    columns=["id", "owner", "pet_id"]
).astype({
    "id": int,
    "owner": "object",
    "pet_id": "object"
})
ddf = dd.read_json("mypets/*.json", meta=_meta)
ddf.compute()
*** ValueError: Metadata mismatch found in `from_delayed`.
My JSON files look like this:
[
{
"id": 1,
"owner": "Charlie",
"pet_id": "pet_1"
},
{
"id": 2,
"owner": "Charlie",
"pet_id": "pet_2"
}
]
As far as I understand, the problem is that I'm passing a list of dicts, so I'm looking for the right way to specify that in the meta= argument.
P.S.:
I also tried structuring the files in the following way:
{
"id": [1, 2],
"owner": ["Charlie", "Charlie"],
"pet_id": ["pet_1", "pet_2"]
}
But Dask is wrongly interpreting the data
ddf.compute()
id owner pet_id
0 [1, 2] ["Charlie", "Charlie"] ["pet_1", "pet_2"]
1 [4] ["Buddy"] ["pet_3"]
The invocation you want is the following:
dd.read_json("data.json", meta=meta,
blocksize=None, orient="records",
lines=False)
which can be largely gleaned from the docstring.
meta: looks OK from your code.
blocksize: must be None, since you have a whole JSON object per file and cannot split the file.
orient: "records" means a list of objects.
lines: False means this is not a line-delimited JSON file, which is the more common case for Dask (that is, you are not assuming that a newline character means a new record).
So why the error? Probably Dask split your file on some newline character, and so a partial record got parsed, which therefore did not match your given meta.
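The effect of splitting on newlines can be illustrated without Dask at all. The sketch below uses only the standard json module and does not reproduce Dask's internals; it simply shows that a pretty-printed array parses fine as a whole, while no individual line of it is a complete record. The helper name parses_as_record is made up for the example.

```python
import json

# A pretty-printed JSON array, like the files in the question.
text = """[
  {
    "id": 1,
    "owner": "Charlie",
    "pet_id": "pet_1"
  },
  {
    "id": 2,
    "owner": "Charlie",
    "pet_id": "pet_2"
  }
]"""

# Whole-file parsing (the lines=False, orient="records" case) works.
records = json.loads(text)


def parses_as_record(line):
    """Return True only if this single line is a complete JSON object."""
    try:
        return isinstance(json.loads(line), dict)
    except json.JSONDecodeError:
        return False


# Treating each line as its own record (the line-delimited case) does not:
# no single line of this file is a complete record.
complete_lines = [ln for ln in text.splitlines() if parses_as_record(ln)]

print(len(records))         # 2
print(len(complete_lines))  # 0
```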

How can I use jq to create a CSV with multiple headers and detail lines?

I'd like to use jq to output in CSV format, but for multiple headers, followed by multiple details. The solutions that I've already seen on Stack Overflow provide a way to insert a single header, but I haven't found anything for multiple headers.
To give you an idea of what I'm talking about, here is some sample JSON input:
[
{
"HDR": [1, "abc"],
"DTL": [ [101,"Descr A"], [102,"Descr B"] ]
}, {
"HDR": [2, "def"],
"DTL": [ [103,"Descr C"], [104,"Descr D"] ]
}
]
Desired output:
HDR|1|abc
DTL|101|Descr A
DTL|102|Descr B
HDR|2|def
DTL|103|Descr C
DTL|104|Descr D
I don't know if it's possible, but my approach so far has been to try to create a filter to give me the following, since transforming this to what I need would be trivial:
["HDR", 1, "abc"]
["DTL", 101, "Descr A"]
["DTL", 102, "Descr B"]
["HDR", 2, "def"]
["DTL", 103, "Descr C"]
["DTL", 104, "Descr D"]
To be clear, I know how to do this in any number of scripting languages, but I'm really trying to stick with a single jq filter, if it's at all possible.
Edit: I should clarify that I don't necessarily need to copy the "HDR" and "DTL" keys into the CSV (I can hard-code those), so the sample JSON could look like this, if it makes the problem easier.
[
[
[1, "abc"],
[[101,"Descr A"], [102,"Descr B"]]
], [
[2, "def"],
[[103,"Descr C"], [104,"Descr D"]]
]
]
Edit: This filter technically answers the question with the second sample data I provided (the last one, which contains only arrays and no objects), but I would still appreciate a better answer, if for no other reason than that the header length has to be hard-coded, and putting the HDR into two sets of arrays so that it can be flatten()'d later feels wrong. But I'll leave it here for reference.
.[] | flatten(1) | [[["HDR"] + .[0:2]]] as $hdr | .[2:] as $dtl | $dtl | map([["DTL"] + .]) as $dtl | $hdr + $dtl | flatten(1) | .[] | join("|")
This works for your original input, assuming you chose | as the delimiter because none of your fields can contain |.
jq -r 'map(["HDR"]+.HDR, ["DTL"] + .DTL[])[] | join("|")' data.json
map produces multiple array elements per object.
.DTL[] ensures "DTL" is prefixed to each sublist.
The trailing [] unpacks the array produced by map, emitting one row at a time for join.
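For readers who find the jq one-liner dense, the same "one header row, then one row per detail" logic can be sketched in Python. This is only an illustration of what the filter does, not a replacement for it; the function name to_rows is made up, and the input is the first sample from the question.

```python
import json

data = json.loads("""
[
  {"HDR": [1, "abc"], "DTL": [[101, "Descr A"], [102, "Descr B"]]},
  {"HDR": [2, "def"], "DTL": [[103, "Descr C"], [104, "Descr D"]]}
]
""")


def to_rows(records):
    """Yield one HDR row per object, followed by one DTL row per detail."""
    for rec in records:
        yield ["HDR", *rec["HDR"]]
        for detail in rec["DTL"]:
            yield ["DTL", *detail]


# Join every row with "|" (assumes no field contains the delimiter).
lines = ["|".join(str(field) for field in row) for row in to_rows(data)]
print("\n".join(lines))
```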

Merge JSON files with identical structure into JSON file containing lists

I have some JSON files, all with identical structure (same keys everywhere, corresponding values might differ for some keys). I would like to collect the values associated with certain keys into lists and store those lists as the values associated with those keys in a new JSON file.
As example, consider these three files, where I'm interested in the key number_items and the corresponding values. First file —
[
{
"box_id": 1,
"number_items": 4
},
{
"box_id": 3,
"number_items": 15
},
{
"box_id": 6,
"number_items": 2
}
]
Second file —
[
{
"box_id": 1,
"number_items": 7
},
{
"box_id": 3,
"number_items": 15
},
{
"box_id": 6,
"number_items": 4
}
]
Third file —
[
{
"box_id": 1,
"number_items": 5
},
{
"box_id": 3,
"number_items": 9
},
{
"box_id": 6,
"number_items": 0
}
]
These should be merged into something that looks like this —
[
{
"box_id": 1,
"number_items": [
4,
7,
5
]
},
{
"box_id": 3,
"number_items": [
15,
15,
9
]
},
{
"box_id": 6,
"number_items": [
2,
4,
0
]
}
]
Can this be done using jq? If not, what would be a good way to do this? Note that the actual scenario consists of 150+ files with 3 keys whose values I would like to merge into lists.
You can merge files with similar structures by simply passing them all in as input. Their contents are streamed in the order in which the files are given.
You can then read them into a single array, group the objects by box_id, and map out the results.
$ jq -n '
[inputs[]] | group_by(.box_id)
| map({box_id:.[0].box_id, number_items:map(.number_items)})
' input{1,2,3}.json
produces:
[
{
"box_id": 1,
"number_items": [
4,
7,
5
]
},
{
"box_id": 3,
"number_items": [
15,
15,
9
]
},
{
"box_id": 6,
"number_items": [
4,
2,
0
]
}
]
Note that on some platforms the original order does not seem to be preserved within a group: the output above, with 4, 2, 0 for box 6 instead of 2, 4, 0, is what I get from the 64-bit Windows version. Be aware of that if you want to use group_by. There are of course other approaches you could take to avoid this filter, but it is much more convenient to use.
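If the ordering matters and a scripting language is acceptable, one alternative is to do the grouping in Python, where a plain dict keeps insertion order (Python 3.7+). This is a sketch under assumptions, not part of the jq answer: the function name merge_files and its key/field parameters are made up, and the three documents are inlined here rather than read with json.load from the actual files.

```python
def merge_files(documents, key="box_id", field="number_items"):
    """Collect `field` values per `key`, in the order the documents are given."""
    merged = {}  # insertion-ordered, so values stay in file order
    for doc in documents:
        for obj in doc:
            merged.setdefault(obj[key], []).append(obj[field])
    return [{key: k, field: values} for k, values in merged.items()]


# Stand-ins for the parsed contents of the three input files.
file1 = [{"box_id": 1, "number_items": 4},
         {"box_id": 3, "number_items": 15},
         {"box_id": 6, "number_items": 2}]
file2 = [{"box_id": 1, "number_items": 7},
         {"box_id": 3, "number_items": 15},
         {"box_id": 6, "number_items": 4}]
file3 = [{"box_id": 1, "number_items": 5},
         {"box_id": 3, "number_items": 9},
         {"box_id": 6, "number_items": 0}]

result = merge_files([file1, file2, file3])
print(result[2])  # {'box_id': 6, 'number_items': [2, 4, 0]}
```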
"I would like to collect the values associated with certain keys"
Here is a solution which treats all keys, except for the grouping key, in the same way. It also handles missing keys gracefully and does NOT depend on the stability of jq's sort. The solution is based on a generic filter, merge/0, defined as follows:
# Combine an array of objects into a single object, ans, with array-valued keys,
# such that for every key, k, in the i-th object of the input array, a,
# ans[k][i] = a[i][k]
# null is used as padding if a value is missing.
# Example:
# [{a:1, b:2}, {b:3, c:4}] | merge
# produces:
# {"a":[1,null],"b":[2,3],"c":[null,4]}
def merge:
  def allkeys: map(keys) | add | unique;
  allkeys as $allkeys
  | reduce .[] as $in ({};
      reduce $allkeys[] as $k (.;
        . + {($k): (.[$k] + [$in[$k]]) } ));
The solution to the given problem can then be formulated as:
transpose | map(merge) | map( .box_id |= .[0] )
Invocation:
jq -s -f merge.jq input{1,2,3}.json
Output: as shown in the question.
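To make the merge-then-transpose idea easier to follow, here is a rough Python analogue. It is a sketch, not a translation the author provided: merge mirrors the jq filter's null padding using None, and zip stands in for transpose under the assumption that every file has the same number of objects (jq's transpose pads uneven rows with null, zip would truncate them).

```python
def merge(objects):
    """Combine a list of dicts into one dict of lists, padding gaps with None."""
    all_keys = sorted({k for obj in objects for k in obj})
    return {k: [obj.get(k) for obj in objects] for k in all_keys}


print(merge([{"a": 1, "b": 2}, {"b": 3, "c": 4}]))
# {'a': [1, None], 'b': [2, 3], 'c': [None, 4]}

# Stand-ins for the parsed contents of the three input files.
file1 = [{"box_id": 1, "number_items": 4},
         {"box_id": 3, "number_items": 15},
         {"box_id": 6, "number_items": 2}]
file2 = [{"box_id": 1, "number_items": 7},
         {"box_id": 3, "number_items": 15},
         {"box_id": 6, "number_items": 4}]
file3 = [{"box_id": 1, "number_items": 5},
         {"box_id": 3, "number_items": 9},
         {"box_id": 6, "number_items": 0}]

result = []
for column in zip(file1, file2, file3):  # i-th object from every file
    merged = merge(list(column))
    merged["box_id"] = merged["box_id"][0]  # collapse the repeated id
    result.append(merged)

print(result[0])  # {'box_id': 1, 'number_items': [4, 7, 5]}
```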
More robust solution
The above solution assumes that the objects are ordered by box_id in the same way within each file. This assumption seems warranted by the OP's requirements, but for safety and robustness, the objects should first be sorted:
map(sort_by(.box_id)) | transpose | map( merge | (.box_id |= .[0]) )
Note that this still assumes that there are no missing values of box_id in any of the input files.
Still more robust solution
If there is a possibility that some of the box_id values might be missing in any of the input files, then it would be appropriate to add the missing values. This can be done with the help of the following filter:
# Input: a matrix of objects (that is, an array of rows of objects),
# each of which is assumed to have a distinguished field, f,
# with distinct values on each row;
# Output: a rectangular matrix such that every row, r, of the output
# matrix includes the elements of the corresponding row of the input
# matrix, with additional elements as necessary so that (r |
# map(f) | sort) is the same for all rows r.
#
def rectanglize(f):
  def ids: [.[][] | f] | unique;
  def it: . as $in | {} | (f = $in);
  ids as $ids
  | map( . + ( $ids - [.[]|f] | map(it) ) )
  ;
Putting everything together, the main pipeline becomes:
rectanglize(.box_id)
| map(sort_by(.box_id))
| transpose
| map( merge | .box_id |= .[0] )
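The padding step can be sketched in Python as well. This is an illustrative analogue of the rectanglize idea rather than the jq filter itself: the field name is passed as a string, and the two small rows below are made-up data in which the second file lacks box 1.

```python
def rectanglize(rows, field):
    """Pad every row with stub objects so all rows share the same set of ids."""
    all_ids = sorted({obj[field] for row in rows for obj in row})
    padded = []
    for row in rows:
        present = {obj[field] for obj in row}
        stubs = [{field: i} for i in all_ids if i not in present]
        padded.append(row + stubs)
    return padded


rows = [
    [{"box_id": 1, "number_items": 4}, {"box_id": 3, "number_items": 15}],
    [{"box_id": 3, "number_items": 9}],  # box 1 is missing from this file
]

# Pad, then sort each row by id so the rows line up column by column.
aligned = [sorted(r, key=lambda o: o["box_id"])
           for r in rectanglize(rows, "box_id")]

print(aligned[1])  # [{'box_id': 1}, {'box_id': 3, 'number_items': 9}]
```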
Depending on where you want to save the new file (locally or on a server), there are several different approaches. As far as I know, there is no way to save a file locally from JavaScript without using one of the available plugins (see "How to write data to a JSON file using Javascript"). Saving it to a server is not possible with client-side JavaScript alone and is best done with a back-end language.
Here is a way to combine the content of several JSON files into your desired format.
// pass the json files you want combined, and a new file path and name (path/to/filename.json)
function combineJsonFiles(files, newFileName) {
    var combinedJson = [];
    var loaded = 0; // number of files combined so far
    // iterate through each file
    $.each(files, function(key, fileName) {
        // load json file
        // wait to combine until loaded. without this 'when().done()', boxes would return 'undefined'
        $.when(loadJsonFile(fileName)).done(function(boxes) {
            // combine json from file with combinedJson array
            combinedJson = combineJson(boxes, combinedJson);
            loaded++;
            // check if every file has been combined (requests may finish out of order)
            if (loaded == files.length) {
                // puts into json format
                combinedJson = JSON.stringify(combinedJson);
                // your json is now ready to be saved to a file
            }
        });
    });
}
function loadJsonFile(fileName) {
    return $.getJSON(fileName);
}
function combineJson(boxes, combinedJson) {
    // iterate through each box
    $.each(boxes, function(key, box) {
        // use grep to search if this box's id is already included
        var matches = $.grep(combinedJson, function(e) { return e.box_id == box.box_id; });
        // if there are no matches, add box to the combined file
        if (matches.length == 0) {
            var newBox = { box_id: box.box_id };
            // iterate through properties of box
            for (var property in box) {
                // check to ensure that properties are not inherited from base class
                if (box.hasOwnProperty(property)) {
                    // will ignore if property is box_id
                    if (property !== 'box_id') {
                        // box is reformatted to make the property type into array
                        newBox[property] = [box[property]];
                    }
                }
            }
            combinedJson.push(newBox);
        } else {
            // select first match (there should never be more than one)
            var match = matches[0];
            // iterate through properties of box
            for (var property in box) {
                // check to ensure that properties are not inherited from base class
                if (box.hasOwnProperty(property)) {
                    // will ignore if property is box_id
                    if (property !== 'box_id') {
                        // add property to the already existing box in the combined file
                        match[property].push(box[property]);
                    }
                }
            }
        }
    });
    return combinedJson;
}
var jsonFiles = ['path/to/data.json', 'path/to/data2.json', 'path/to/data3.json'];
combineJsonFiles(jsonFiles, 'combined_json.json');
The JSON output of this will look like:
[{"box_id":1,"number_items":[4,7,5]},{"box_id":3,"number_items":[15,15,9]},{"box_id":6,"number_items":[2,4,0]}]
Hope this helps!

Dataframe in R to be converted to sequence of JSON objects

I had asked the same question after editing 2 times of a previous question I had posted. I am sorry for the bad usage of this website. I have flagged it for deletion and I am posting a proper new question on the same here. Please look into this.
I am working on code for a recommender system. The output has to be converted to a sequence of JSON objects. I have a matrix that serves as a lookup table for every item ID, with the list of the closest items it is related to and the similarity scores associated with their combinations.
Let me explain through an example.
Suppose I have the matrix below. In it, Item 1 is similar to Items 22 and 23 with similarity scores 0.8 and 0.5 respectively, and the remaining rows follow the same structure.
X1  X2  X3  X4   X5
 1  22  23  0.8  0.5
34   4  87  0.4  0.4
23   7  92  0.6  0.5
I want a JSON structure for every item (every X1 for every row) along with the recommended items and the similarity scores for each combination as a separate JSON entity and this being done in sequence. I don't want an entire JSON object containing these individual ones.
Assume there is one more entity called "coid" that will be given as input to the code. I assume it is XYZ and it is same for all the rows.
{ "_id" : { "coid" : "XYZ", "iid" : "1"}, "items" : [ { "item" : "22", "score" : 0.8},{ "item": "23", "score" : 0.5}] }
{ "_id" : { "coid" : "XYZ", "iid" : "34"},"items" : [ { "item" : "4", "score" : 0.4},{ "item": "87", "score" : 0.4}] }
{ "_id" : { "coid" : "XYZ", "iid" : "23"},"items" : [ { "item" : "7", "score" : 0.6},{ "item": "92", "score" : 0.5}] }
As in the above, each entity is a valid JSON structure/object but they are not put together into a separate JSON object as a whole.
I appreciate all the help done for the previous question but somehow I feel this new alteration I have here is not related to them because in the end, if you do a toJSON(some entity), then it converts the entire thing to one JSON object. I don't want that.
I want individual ones like these to be written to a file.
I am very sorry for my ignorance and inconvenience. Please help.
Thanks.
library(rjson)
## Your matrix
mat <- matrix(c(1,34,23,
22, 4, 7,
23,87,92,
0.8, 0.4, 0.6,
0.5, 0.4, 0.5), byrow=FALSE, nrow=3)
I use a function (with the not very interesting name makejson) that takes a row of the matrix and returns a JSON object. It builds two list objects, _id and items, and combines them into a JSON object.
makejson <- function(x, coid="ABC") {
  `_id` <- list(coid = coid, iid = x[1])
  nitem <- (length(x) - 1) / 2  # Number of items
  items <- list()
  for (i in seq_len(nitem)) {
    # item IDs come first, followed by their scores in the same order
    items[[i]] <- list(item = x[i + 1], score = x[i + 1 + nitem])
  }
  toJSON(list(`_id` = `_id`, items = items))
}
Then using apply (or a for loop) I use the function for each row of the matrix.
res <- apply(mat, 1, makejson, coid="XYZ")
cat(res, sep = "\n")
## {"_id":{"coid":"XYZ","iid":1},"items":[{"item":22,"score":0.8},{"item":23,"score":0.5}]}
## {"_id":{"coid":"XYZ","iid":34},"items":[{"item":4,"score":0.4},{"item":87,"score":0.4}]}
## {"_id":{"coid":"XYZ","iid":23},"items":[{"item":7,"score":0.6},{"item":92,"score":0.5}]}
The result can be saved to a file with cat by specifying the file argument.
## cat(res, sep="\n", file="out.json")
There is a small difference between your output and mine: in yours, the numbers are in quotes ("). If you want that, mat has to be a character matrix.
## mat <- matrix(as.character(c(1,34,23, ...
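The same "one JSON object per line" output can be sketched in Python for comparison. This is an illustration under assumptions, not the R answer's method: make_record is a made-up name, the rows are the example matrix from the question, and item IDs are converted to strings to match the quoted IDs in the question's desired output.

```python
import json

# The example matrix: item ID, two related item IDs, then their two scores.
rows = [
    [1, 22, 23, 0.8, 0.5],
    [34, 4, 87, 0.4, 0.4],
    [23, 7, 92, 0.6, 0.5],
]


def make_record(row, coid):
    """Build the {_id, items} structure for one row of the matrix."""
    iid, rest = row[0], row[1:]
    n = len(rest) // 2  # number of related items
    items = [{"item": str(rest[i]), "score": rest[i + n]} for i in range(n)]
    return {"_id": {"coid": coid, "iid": str(iid)}, "items": items}


# One independent JSON document per line, not one enclosing array.
lines = [json.dumps(make_record(r, "XYZ")) for r in rows]
print("\n".join(lines))
```

To save the result, the joined lines can be written to a file opened in text mode.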
Hope it helps,
alex

How to model boolean expressions in JSON tree structure

I've spent a few hours on Google and Stack Overflow, but I have yet to come to a conclusion on just how to model nested boolean data.
Let's say I have the following expression:
123 and 321 and (18 or 19 and (20 or 21))
How could I model this in a JSON tree structure so that I could rebuild the expression as you see it above by simply traversing the tree? I don't need to actually evaluate the logic, but simply structure it in such a way that it portrays the logic in tree-form.
Thanks in advance.
For the record, this is the type of system I'm trying to accomplish and how I'm guessing the tree should be structured based on the answer below.
ANY OF THESE:
    13
    14
    ALL OF THESE:
        18
        19
        20

        or
       /  \
     or    13
    /  \
  14    and
       /   \
     and    18
    /   \
  20     19
My ConditionSet in JSON format:
"FilterCondition": {
"LogicalOperator": "AND",
"Conditions": [
{
"Field": "age",
"Operator": ">",
"Value": "8"
},
{
"LogicalOperator": "OR",
"Conditions": [
{
"Field": "gender",
"Operator": "=",
"Value": "female"
},
{
"Field": "occupation",
"Operator": "IN",
"Value": ["business","service"]
}
]
}
]
}
Reference : https://zebzhao.github.io/Angular-QueryBuilder/demo/
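A structure like this can be turned back into a readable expression with a short recursive walk. The sketch below is only an illustration: the function name render and the output format are assumptions, not something defined by the query-builder library referenced above.

```python
# The FilterCondition from above, as a Python dict.
filter_condition = {
    "LogicalOperator": "AND",
    "Conditions": [
        {"Field": "age", "Operator": ">", "Value": "8"},
        {
            "LogicalOperator": "OR",
            "Conditions": [
                {"Field": "gender", "Operator": "=", "Value": "female"},
                {"Field": "occupation", "Operator": "IN",
                 "Value": ["business", "service"]},
            ],
        },
    ],
}


def render(node):
    """Rebuild a parenthesized expression string from a condition tree."""
    if "Conditions" in node:  # group node: join the children with its operator
        joiner = " " + node["LogicalOperator"] + " "
        return "(" + joiner.join(render(c) for c in node["Conditions"]) + ")"
    # leaf node: a single field/operator/value test
    return "{} {} {!r}".format(node["Field"], node["Operator"], node["Value"])


text = render(filter_condition)
print(text)
# (age > '8' AND (gender = 'female' OR occupation IN ['business', 'service']))
```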
Think about the order in which a programming language would evaluate the parts of your statement. Depending on the precedence of and and or and their left or right associativity, it has to pick some part that is the 'deepest', which must be evaluated first. That result is then given to its 'parent' (the nearest operator that binds less tightly) as one of its fully evaluated operands; once the parent is evaluated, its result goes to its own parent, and so on.
So, you would have a tree where the root is reached after full evaluation, and the leaf nodes are the parts of the expression that can be evaluated first (they don't rely on any other evaluations to come to their result).
As a simple example, 1 and (2 or 3) would be modelled as
  and
 /   \
1     or
     /  \
    2    3
If operators at the same precedence are evaluated left to right and AND has higher precedence than OR (this is true in C++, for example: http://en.cppreference.com/w/cpp/language/operator_precedence ) then
123 and 321 and (18 or 19 and (20 or 21))
becomes
        and
       /   \
    and     \
   /   \     \
 123   321    \
               \
                or
               /  \
             18    and
                  /   \
                or     19
               /  \
             20    21
And to evaluate the result of this tree, you would evaluate the deepest nodes first, replacing each node with the result of applying its operator to its left and right children, until there is only one value left in the root.
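As a rough sketch of how such a tree could be stored as JSON, rebuilt into text, and evaluated bottom-up, consider the Python below. The node layout ({"op", "left", "right"} with bare numbers as leaves), the function names, and the idea of treating each number as an ID that is either true or false are all assumptions made for illustration. Operands are kept in the order in which they appear in the expression so that the traversal reproduces it.

```python
import json

# 123 and 321 and (18 or 19 and (20 or 21)), with AND binding tighter than OR.
tree = {
    "op": "and",
    "left": {"op": "and", "left": 123, "right": 321},
    "right": {
        "op": "or",
        "left": 18,
        "right": {"op": "and", "left": 19,
                  "right": {"op": "or", "left": 20, "right": 21}},
    },
}

PRECEDENCE = {"or": 1, "and": 2}


def to_expression(node, parent_prec=0):
    """In-order traversal; parenthesize a child that binds looser than its parent."""
    if not isinstance(node, dict):
        return str(node)
    prec = PRECEDENCE[node["op"]]
    text = "{} {} {}".format(to_expression(node["left"], prec),
                             node["op"],
                             to_expression(node["right"], prec))
    return "(" + text + ")" if prec < parent_prec else text


def evaluate(node, true_ids):
    """Evaluate children first, then combine them with this node's operator."""
    if not isinstance(node, dict):
        return node in true_ids
    left = evaluate(node["left"], true_ids)
    right = evaluate(node["right"], true_ids)
    return (left and right) if node["op"] == "and" else (left or right)


as_json = json.dumps(tree)  # the tree is plain dicts and ints, so it serializes directly
print(to_expression(json.loads(as_json)))
# 123 and 321 and (18 or 19 and (20 or 21))
print(evaluate(tree, {123, 321, 18}))  # True
print(evaluate(tree, {123, 18}))       # False
```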
To go from a boolean expression to a boolean expression tree programmatically, you need to write a parser. In Python, for example, you could write it using PLY (http://www.dabeaz.com/ply/); most languages have their own popular third-party parser-construction library.
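For a grammar this small, a hand-written recursive-descent parser is also an option. The sketch below deliberately does not use PLY; it is a minimal stand-in that assumes the only tokens are integers, and, or, and parentheses, with AND binding tighter than OR and both operators left-associative. The node layout ({"op", "left", "right"}) and all function names are made up for the example.

```python
import re

TOKEN = re.compile(r"\s*(\(|\)|and|or|\d+)", re.IGNORECASE)


def tokenize(text):
    """Split the input into '(', ')', 'and', 'or' and number tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected input at position {}".format(pos))
        tokens.append(match.group(1).lower())
        pos = match.end()
    return tokens


def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():    # or_expr := and_expr ("or" and_expr)*
        node = parse_and()
        while peek() == "or":
            take()
            node = {"op": "or", "left": node, "right": parse_and()}
        return node

    def parse_and():   # and_expr := atom ("and" atom)*
        node = parse_atom()
        while peek() == "and":
            take()
            node = {"op": "and", "left": node, "right": parse_atom()}
        return node

    def parse_atom():  # atom := NUMBER | "(" or_expr ")"
        tok = take()
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok is None or not tok.isdigit():
            raise ValueError("expected a number or '(', got {!r}".format(tok))
        return int(tok)

    tree = parse_or()
    if peek() is not None:
        raise ValueError("unexpected token {!r}".format(peek()))
    return tree


print(parse("1 and (2 or 3)"))
# {'op': 'and', 'left': 1, 'right': {'op': 'or', 'left': 2, 'right': 3}}
```

The resulting nested dicts contain only strings and integers, so they can be passed straight to json.dumps.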