Can I validate with JSON Schema that the nodes that edges in a graph point to exist? - json

I want to describe a network graph of vertices and edges with JSON Schema.
An example JSON could look like this:
{
  "V": [
    "1",
    "2",
    "3"
  ],
  "E": [
    {
      "v1": "1",
      "v2": "2"
    },
    {
      "v1": "2",
      "v2": "3"
    }
  ]
}
I have a set of 3 vertices and 2 edges to connect them. I want all vertices to have an arbitrary string identifier, so it could also be "node1" or "panda". However, is there a way to validate that the endpoints of my edges only point to existing vertices?
I.e.: Should NOT pass:
{
  "V": [
    "n1",
    "n2",
    "n3"
  ],
  "E": [
    {
      "v1": "n1",
      "v2": "IdThatDoesNotExistAbove"
    }
  ]
}
I looked at enums; however, I struggle to have them point at data in the JSON instance I want to validate rather than at values fixed in the schema itself.

With jq this task can be solved.
jq -r '([.E[] | to_entries[].value] | unique) - .V |
if length == 0
then "all vertices defined"
else "undefined vertices: \(.)\n" | halt_error(1)
end
' "$FILE"
echo "exit code: $?"
Output valid file
all vertices defined
exit code: 0
Output invalid file
undefined vertices: ["IdThatDoesNotExistAbove"]
exit code: 1
If you are not interested in which vertices are undefined, you can use a shorter version:
jq -e '([.E[] | to_entries[].value]) - .V | length == 0' "$FILE"
echo "exit code: $?"
Output valid file
true
exit code: 0
Output invalid file
false
exit code: 1
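If you want to use the check as a gate in a shell script, a minimal wrapper could look like the following sketch (graph.json is an assumed file name; adapt it to your setup):
#!/bin/sh
# exit non-zero if any edge endpoint is not listed in .V
if jq -e '([.E[] | to_entries[].value]) - .V | length == 0' graph.json > /dev/null; then
  echo "all vertices defined"
else
  echo "undefined vertices present" >&2
  exit 1
fi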

JSON Schema doesn't define a way to reference data like this, but it does have extension vocabularies, which allow the definition of custom keywords. I have created a data vocabulary that does precisely what you're looking to do.
{
  "$schema": "https://json-everything.net/meta/data-2022",
  "type": "object",
  "$defs": {
    "user-defined-vertex": {
      "data": {
        "enum": "/V"
      }
    }
  },
  "properties": {
    "V": {
      "type": "array",
      "items": { "type": "string" }
    },
    "E": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "v1": { "$ref": "#/$defs/user-defined-vertex" },
          "v2": { "$ref": "#/$defs/user-defined-vertex" }
        },
        "required": ["v1", "v2"],
        "additionalProperties": false
      }
    }
  },
  "additionalProperties": false
}
The key part of this is the data keyword in #/$defs.
data takes an object with schema keywords as keys and JSON Pointers or URIs as values. If you want to extract values from the instance data, you'll use JSON Pointers. For anything else, you'll use a URI.
So for this case, I have
{
  "data": {
    "enum": "/V"
  }
}
which says to take the value from /V in the instance data and use that as the value for the enum keyword.
In #/properties/V you define that /V must be an array with string values.
However, to my knowledge, this vocabulary is only implemented for my library, JsonSchema.Net, and you'll need the extension package JsonSchema.Net.Data.

Related

Documentation based on JSON Schema

I'd like to use my JSON schemas to generate the documentation.
In the example below, I want to list all combinations ErrorNumber/ErrorMessage available in my output messages in JSON.
But I can't find a way to do this at the object level; my attempts with "examples" or "enum" failed.
Does anyone have a solution?
{
  "type": "object",
  "required": [
    "ErrorNumber",
    "ErrorMessage"
  ],
  "properties": {
    "ErrorNumber": {
      "$id": "#root/ErrorNumber",
      "type": "integer"
    },
    "ErrorMessage": {
      "$id": "#root/ErrorMessage",
      "type": "string"
    }
  }
}
Did you mean to write "$ref" where you use "$id" in the example?
Where exactly did you have problems with enum? The following works fine for me with a draft-2020-12 Validator (and after removing your "$id"!):
{
  // ... your JSON here ...
  "enum": [
    { "ErrorNumber": 200, "ErrorMessage": "OK" },
    { "ErrorNumber": 404, "ErrorMessage": "Not found." }
    // ...
  ]
}
Different approaches, in case you can still change the format:
If your error numbers start at 0 and are contiguous, then an Array of messages might serve your purpose.
Alternatively an object with numerical keys might:
{
  "200": "OK",
  "404": "Not found."
}
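If you go with the numerically-keyed object, a minimal schema sketch for it could look like this (assuming every key is a numeric string and every value is a message string):
{
  "type": "object",
  "propertyNames": { "pattern": "^[0-9]+$" },
  "additionalProperties": { "type": "string" }
}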

Create merged JSON array from multiple files using jq

I have multiple JSON files (one.json, two.json, three.json) in the format below, and I want to create a consolidated array from them using jq. From all the files I want to extract the Name and Value fields inside Parameters and use them to build an array in which the id value is taken from the Name field and the value field is taken from the Value field.
input:
one.json:
{
  "Parameters": [
    {
      "Name": "id1",
      "Value": "one",
      "Version": 2,
      "LastModifiedDate": 1581663187.36
    }
  ]
}
two.json:
{
  "Parameters": [
    {
      "Name": "id2",
      "Value": "xyz",
      "Version": 2,
      "LastModifiedDate": 1581663187.36
    }
  ]
}
three.json:
{
  "Parameters": [
    {
      "Name": "id3",
      "Value": "xyz",
      "Version": 2,
      "LastModifiedDate": 1581663187.36
    }
  ]
}
output:
[
  {
    "id": "id1",
    "value": "one"
  },
  {
    "id": "id2",
    "value": "xyz"
  },
  {
    "id": "id3",
    "value": "xyz"
  }
]
How can I achieve this using jq?
You can use a reduce expression instead of slurping the whole input into memory (-s): iterate over the contents of the input files and append the required fields one object at a time.
jq -n 'reduce inputs.Parameters[] as $d (.; . + [ { id: $d.Name, value: $d.Value } ])' one.json two.json three.json
The -n flag ensures that we construct the output JSON from scratch, with the input file contents made available through the inputs function. Since reduce works iteratively, for each object in the input we append the desired key/value pair, building up the final array.
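For comparison, the slurp-based variant that this avoids would read all files into memory at once and could look roughly like this:
jq -s '[ .[].Parameters[] | { id: .Name, value: .Value } ]' one.json two.json three.json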

Elegant way to select nested objects with the associated key based on a specific criteria

Given an example document in JSON similar to this:
{
  "id": "post-1",
  "type": "blog-post",
  "tags": [
    {
      "id": "tag-1",
      "name": "Tag 1"
    },
    {
      "id": "tag-2",
      "name": "Tag 2"
    }
  ],
  "heading": "Post 1",
  "body": "this is my first blog post",
  "links": [
    {
      "id": "post-2",
      "heading": "Post 2",
      "tags": [
        {
          "id": "tag-1",
          "name": "Tag 1"
        },
        {
          "id": "tag-3",
          "name": "Tag 3"
        }
      ]
    }
  ],
  "metadata": {
    "user": {
      "social": [
        {
          "id": "twitter",
          "handle": "#user"
        },
        {
          "id": "facebook",
          "handle": "123456"
        },
        {
          "id": "youtube",
          "handle": "ABC123xyz"
        }
      ]
    },
    "categories": [
      {
        "name": "Category 1"
      },
      {
        "name": "Category 2"
      }
    ]
  }
}
I would like to select any object (regardless of depth) that has an attribute "id", as well as the attribute name of the parent object. The above example should be taken as just that, an example. The actual data, which I'm not at liberty to share, can have any depth and just about any structure. Attributes can be introduced and removed at any time. Using the blog-post style is just because it is quite popular for examples and I have very limited imagination.
The attribute signifies a particular type within the domain, which might also be (but is not necessarily) encoded in the value of the attribute.
If an object does not have the "id" attribute it is not interesting and should not be selected.
A very important special case is when the value of an attribute is an array of objects, in that case I need to keep the attribute name and associate it with each element in the array.
An example of the desired output would be:
[
  {
    "type": "tags",
    "node": {
      "id": "tag-1",
      "name": "Tag 1"
    }
  },
  {
    "type": "tags",
    "node": {
      "id": "tag-2",
      "name": "Tag 2"
    }
  },
  {
    "type": "links",
    "node": {
      "id": "post-2",
      "heading": "Post 2",
      "tags": [
        {
          "id": "tag-1",
          "name": "Tag 1"
        },
        {
          "id": "tag-3",
          "name": "Tag 3"
        }
      ]
    }
  },
  {
    "type": "tags",
    "node": {
      "id": "tag-1",
      "name": "Tag 1"
    }
  },
  {
    "type": "tags",
    "node": {
      "id": "tag-3",
      "name": "Tag 3"
    }
  },
  {
    "type": "social",
    "node": {
      "id": "twitter",
      "handle": "#user"
    }
  },
  {
    "type": "social",
    "node": {
      "id": "facebook",
      "handle": "123456"
    }
  },
  {
    "type": "social",
    "node": {
      "id": "youtube",
      "handle": "ABC123xyz"
    }
  }
]
It isn't strictly necessary that the output is identical; order, for instance, is irrelevant for my use case, and the results could be grouped as well. Since the top-level object has an attribute "id", it could be included under a special name, but I'd prefer it not be included at all.
I've tried to use walk, reduce and recurse to no avail; I'm afraid my jq skills are too limited. But I imagine that a good solution would make use of at least one of them.
I would like an expression something like
to_entries[] | .value | .. | select(has("id")?)
which would select the correct objects, but with .. I'm no longer able to keep the associated attribute name.
The best I've come up with is
. as $document
| [paths | if length > 1 and .[-1] == "id" then .[0:-1] else empty end]
| map(. as $path
| $document
| { "type": [$path[] | if type == "string" then . else empty end][-1],
"node": getpath($path) })
This works, but feels quite complicated: it first extracts all paths, ignores any path that does not have "id" as its last element, then removes the "id" segment to get the path to the actual object, and keeps the (now last) string segment, which corresponds to the parent object's attribute containing the interesting object. Finally, the actual object is selected through getpath.
Is there a more elegant, or at the least shorter way to express this?
I should note that I'd like to use jq for the convenience of having bindings to other languages as well as being able to run the program on the command line.
For the scope of this question, I'm not really interested in alternatives to jq as I can imagine how to solve this differently using other tooling, but I would really like to "just" use jq.
Since the actual requirements aren’t clear to me, I’ll assume that the given implementation defines the functional requirements, and propose a shorter and hopefully sleeker version:
. as $document
| paths
| select(length > 1 and .[-1] == "id")
| .[0:-1] as $path
| { "type": last($path[] | strings),
"node": $document | getpath($path) }
This produces a stream, so if you want an array, you could simply enclose the above in square brackets.
last(stream) emits null if the stream is empty, which accords with the behavior of .[-1].
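For reference, run from the command line and collected into an array, this might look like the following (document.json is an assumed file name):
jq '[ . as $document
      | paths
      | select(length > 1 and .[-1] == "id")
      | .[0:-1] as $path
      | { type: last($path[] | strings),
          node: ($document | getpath($path)) } ]' document.json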
This works:
[
  foreach (paths | select(.[-1] == "id" and length > 1)[:-1]) as $path ({i:.};
    .o = {
      type: last($path[] | strings),
      node: (.i | getpath($path))
    };
    .o
  )
]
The trick is to know that any numbers in the path indicate the value is part of an array. You'll have to adjust the path to get the parent name. But using last/1 with a string filter makes it simpler.
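As a quick illustration of the last/1 trick: for a path like ["metadata","user","social",0,"id"], the sliced path is ["metadata","user","social",0], and last(.[] | strings) picks the nearest string segment rather than the array index:
jq -n '["metadata","user","social",0] | last(.[] | strings)'
"social"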

parsing JSON with jq to return value of element where another element has a certain value

I have some JSON output I am trying to parse with jq. I read some examples on filtering, but I don't really understand them, and my output is more complicated than the examples. I have no idea where to even begin beyond jq '.[]', as I don't understand jq's syntax beyond that, and the hierarchy and terminology are challenging as well. My JSON output is below. I want to return the value of Valid where ItemName equals Item_2. How can I do this?
"1"
[
  {
    "GroupId": "1569",
    "Title": "My_title",
    "Logo": "logo.jpg",
    "Tags": [
      "tag1",
      "tag2",
      "tag3"
    ],
    "Owner": [
      {
        "Name": "John Doe",
        "Id": "53335"
      }
    ],
    "ItemId": "209766",
    "Item": [
      {
        "Id": 47744,
        "ItemName": "Item_1",
        "Valid": false
      },
      {
        "Id": 47872,
        "ItemName": "Item_2",
        "Valid": true
      },
      {
        "Id": 47872,
        "ItemName": "Item_3",
        "Valid": false
      }
    ]
  }
]
"Browse"
"8fj9438jgge9hdfv0jj0en34ijnd9nnf"
"v9er84n9ogjuwheofn9gerinneorheoj"
Except for the initial and trailing JSON scalars, you'd simply write:
.[] | .Item[] | select( .ItemName == "Item_2" ) | .Valid
In your particular case, to ensure the top-level JSON scalars are ignored, you could prefix the above with:
arrays |
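Putting the two together, and assuming the stream of JSON values is stored in a file called output.json, the full invocation could be:
jq 'arrays | .[] | .Item[] | select(.ItemName == "Item_2") | .Valid' output.json
true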

How to add properties to topojson file?

Given a data.tsv file such as:
id   code  name
1    AL    Alabama
2    AK    Alaska
4    AZ    Arizona
5    AR    Arkansas
6    CA    California
...  ...   ...
Given a topojson.json file such as this (the structure is correct, the numeric values are random):
{
  "type": "Topology",
  "transform": {
    "scale": [0.0015484881821515486, 0.0010301030103010299],
    "translate": [-5.491666666666662, 41.008333333333354]
  },
  "objects": {
    "states": {
      "type": "GeometryCollection",
      "geometries": [
        { "type": "Polygon", "arcs": [[0]], "properties": { "code_2": "AL" } },
        { "type": "Polygon", "arcs": [[1]], "properties": { "code_2": "AK" } }
      ]
    }
  },
  "arcs": [
    [[2466,9916],[-25,-5],[3,-13]],
    [[2357,9852],[1,-2],[1,-2]]
  ]
}
How can I use the common fields (1) to inject the values of another field (2) into the JSON file?
(1): data.tsv#code and topojson.json objects.states.geometries.properties.code_2
(2): data.tsv#name
The end result should contain:
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AL", "name":"Alabama" }},
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AK", "name":"Alaska" }},
EDIT: Accepted answer:
topojson -o final.json -e data.tsv --id-property=code_2,code -p code_2,state=name -- topojson.json
Try using this:
topojson -o final.json -e data.tsv \
--id-property=code_2,code -p code_2,state=name \
-- topojson.json
Which should output:
{
"type": "Topology",
"transform": {
"scale": [
0.000016880209206372492,
0.000007005401010148724
],
"translate": [ -1.8418800213354616, 51.15278777877789 ]
},
"objects": {
"states": {
"type": "GeometryCollection",
"geometries": [
{
"type": "Polygon",
"arcs": [
[ 0 ]
],
"id": "AK",
"properties": {
"code_2": "AK",
"state": "Alaska"
}
}
]
}
},
"arcs": [
[[2466,9916],[-25,-5],[3,-13]],
[[2357,9852],[1,-2],[1,-2]]
]
}
From the Command Line Reference wiki:
--id-property name of feature property to promote to geometry id
By using the code_2 property with this option, you promote it as the feature ID.
Prepend a + in front of the input property name to coerce its value to a number.
Plus:
If the properties referenced by --id-property are null or undefined,
they are omitted from the output geometry object. Thus, the generated
objects may not have a defined ID if the input features did not have a
property with the specified name.
So, when you are using +code and +code_2, they are probably undefined, as you can't convert the AK string value to a number.
Here, the input property "FIPS" is coerced to a number and used as the
feature identifier; likewise, the column named "FIPS" is used as the
identifier in the CSV file. (If your CSV file uses a different column
name for the feature identifier, you can specify multiple id
properties, such as --id-property=+FIPS,+id.)
That's why you have to add the code to the --id-property=code_2,code option. This is how the mapping is made (the code_2 from topojson.json and the code column from data.tsv).
Then, the output property "unemployment" is generated from the
external data file, unemployment.tsv, which defines the input property
"rate"
In our case, -p code_2,state=name specifies that we will preserve the code_2 property and we will rename the name property to state. The Properties and External Properties sections in the aforementioned documentation wiki are pretty informative on the matter.
The topojson package has been deprecated. The following steps are based on the command-line cartography workflow. These interfaces are more flexible, but a little bit more complicated to use.
Install dependencies:
npm install d3-dsv ndjson-cli
Add the node_modules/.bin directory to the path so that you can easily run the commands:
PATH=$(npm bin):$PATH
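Note that newer versions of npm no longer ship the npm bin command; in that case you can skip the PATH change and invoke each tool through npx instead, for example:
npx tsv2json data.tsv -n > data.ndjson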
Convert the tsv file into a newline-delimited json file:
tsv2json data.tsv -n > data.ndjson
{"id":"1","code":"AL","name":"Alabama"}
{"id":"2","code":"AK","name":"Alaska"}
Parse the id column as a number:
ndjson-map '{id: +d.id, code: d.code, name: d.name}' < data.ndjson > data_parsed.ndjson
{"id":1,"code":"AL","name":"Alabama"}
{"id":2,"code":"AK","name":"Alaska"}
Extract the geometries of the topojson file:
ndjson-cat topojson.json | ndjson-split 'd.objects.states.geometries' > topojson_geometries.ndjson
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AK"}}
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AL"}}
Join both newline-delimited json files:
ndjson-join 'd.properties.code_2' 'd.code' topojson_geometries.ndjson data_parsed.ndjson > geometries_data_join.ndjson
[{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AK"}},{"id":2,"code":"AK","name":"Alaska"}]
[{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AL"}},{"id":1,"code":"AL","name":"Alabama"}]
Add the name column to the topojson properties and only keep the topojson geometries:
ndjson-map 'd[0].properties.name = d[1].name, d[0]' < geometries_data_join.ndjson > geometries_data_merge.ndjson
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AK","name":"Alaska"}}
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AL","name":"Alabama"}}
Convert the previous result into an array and concat it with the original topojson file:
ndjson-join <(ndjson-cat topojson.json) <(ndjson-reduce < geometries_data_merge.ndjson) > topojson_concat.ndjson
[{
"type": "Topology",
"transform": {
"scale": [0.0015484881821515486, 0.0010301030103010299],
"translate": [-5.491666666666662, 41.008333333333354]
},
"objects": {
"states": {
"type": "GeometryCollection",
"geometries": [{
"type": "Polygon",
"arcs": [[0]],
"properties": {
"code_2": "AK"
}
}, {
"type": "Polygon",
"arcs": [[1]],
"properties": {
"code_2": "AL"
}
}
]
}
},
"arcs": [[[2466, 9916], [-25, -5], [3, -13]], [[2357, 9852], [1, -2], [1, -2]]]
}, [{
"type": "Polygon",
"arcs": [[0]],
"properties": {
"code_2": "AK",
"name": "Alaska"
}
}, {
"type": "Polygon",
"arcs": [[1]],
"properties": {
"code_2": "AL",
"name": "Alabama"
}
}
]
]
Overwrite the geometries of the original topojson file and save it as a normal json file:
ndjson-map 'd[0].objects.states.geometries = d[1], d[0]' < topojson_concat.ndjson > topojson_data.json
{
"type": "Topology",
"transform": {
"scale": [0.0015484881821515486, 0.0010301030103010299],
"translate": [-5.491666666666662, 41.008333333333354]
},
"objects": {
"states": {
"type": "GeometryCollection",
"geometries": [{
"type": "Polygon",
"arcs": [[0]],
"properties": {
"code_2": "AK",
"name": "Alaska"
}
}, {
"type": "Polygon",
"arcs": [[1]],
"properties": {
"code_2": "AL",
"name": "Alabama"
}
}
]
}
},
"arcs": [[[2466, 9916], [-25, -5], [3, -13]], [[2357, 9852], [1, -2], [1, -2]]]
}
All commands in one line:
ndjson-join <(ndjson-cat topojson.json) <(ndjson-join 'd.properties.code_2' 'd.code' <(ndjson-cat topojson.json | ndjson-split 'd.objects.states.geometries') <(tsv2json data.tsv -n | ndjson-map '{id: +d.id, code: d.code, name: d.name}') | ndjson-map 'd[0].properties.name = d[1].name, d[0]' | ndjson-reduce) | ndjson-map 'd[0].objects.states.geometries = d[1], d[0]' > topojson_data.json
Notes:
I swapped "AK" and "AL" in the topojson file to check if the join really works.
The last command (before the one-liner) only works on the original output and not on the given pretty-printed version, which has newlines in it.
I tested the workflow on the Windows Subsystem for Linux since ndjson-map does not seem to work properly on Windows currently.