Are there recommended ways to structure multiple JSON schemas? - json

I just wrote my first JSON schemas, great! However, I am now looking at how to structure multiple JSON schemas, both during development and later how to host them. There seems to be very limited guidance on this.
Of course, I looked up the official json-schema.org documentation on how to structure a complex schema, and even got something to work. I now have three schemas, organized during development in a folder structure as follows:
json-schemas
- /common
  - common-schemas.json
- /data
  - data-schemas.json
- /DataStreamServiceRequest
  - DataStreamServiceRequest-schema.json
Out of these three, only DataStreamServiceRequest-schema.json contains a single schema (it is a schema for all possible requests to an application service endpoint). It refers to types defined in data-schemas.json and common-schemas.json by using relative references. The intent here was to have all types for a single subsystem (e.g., data) available in one file.
I gave all three .json files an $id containing the absolute URI corresponding to the directory they are in.
For example, here is common-schemas.json:
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$id": "https://carp.cachet.dk/schemas/common",
  "NamespacedId": {
    "type": "string",
    "pattern": "^([a-z_0-9]+\\.?)+[a-z_0-9]$"
  }
}
And, from data-schemas.json, I refer to NamespacedId in common-schemas.json using:
"dataType": { "$ref": "common#/NamespacedId" }
I'm glad this works, but is this recommended? Am I overlooking something fundamental by structuring my schemas this way? Is there any particular reason to prefer "one schema per type"? Is there an idiomatic structure I am overlooking, e.g., an equivalent of Java's convention that file location corresponds to namespace?
If there isn't (I would argue that in Java there is), that would also constitute an answer.
Perhaps additional relevant context: I'm using networknt/json-schema-validator, and for development/testing purposes, it is very convenient to not have too many absolute URIs. I need to map each URI to a local file when initializing the validator.

Like @Ether said, there's no one right answer, but here are the guidelines I use.
The most important guidepost is to treat your schemas like you would any other code in your system. Generally, that means each "thing" that you are describing ("Person", "Product", "Address", etc.) should have its own schema.
Definitions ($defs) should only be referenced ($ref) from within the same schema. Use definitions to improve readability or reduce duplication within a schema. Any reference to an external schema should not have a JSON Pointer URI fragment (e.g., the #/$defs/NamespacedId fragment you use). If you need to reference a definition in an external schema, it's probably a sign it should be in its own schema.
An exception to this might be if you have a bunch of tiny schemas you want to put in a "common.schema.json" schema. In that case, each of your schemas should be defined as a definition, but given an anchor ($anchor). It needs to be a definition so you can use standard tools to verify that your schemas are valid. Using an anchor lets you reference it more easily, but more importantly, it's a signal that some external schema might be depending on it. Definitions without an anchor are effectively private to the schema, and definitions with an anchor are effectively public. You can freely refactor unanchored definitions, but you have to be wary of breaking other schemas if you refactor anchored definitions.
Don't use $ids. When a schema doesn't have an $id, it's identified by the URI that was used to retrieve the schema. That can be a file URI (file:///path/to/schemas/person.schema.json). Then all of your references can be relative, meaning there are no absolute URIs in your schemas and you don't need configuration to map https URIs to file locations. Unfortunately, not all implementations support file-based retrieval or even retrieval-URI identification, so this isn't always possible.
Organize your schemas however makes sense. Follow the same instincts and guidelines you would for any other code. If you have to use $ids, make sure the path matches the path on the file system (e.g., https://example.com/schemas/person => file:///local/path/to/app/schemas/person.schema.json). This just makes it easier to find the schemas you're looking for.
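To make the anchored-definitions idea concrete, here is a minimal sketch using Python's jsonschema and referencing libraries (the same layout should carry over to networknt/json-schema-validator); the https://example.com URIs and schema contents are made up for illustration:

# A minimal sketch: a "common" schema whose tiny schemas live under $defs, each
# with an $anchor so external schemas may reference them; a registry maps URIs
# to local contents instead of fetching them over HTTP.
from jsonschema import Draft201909Validator
from referencing import Registry, Resource

common = {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://example.com/schemas/common",
    "$defs": {
        "NamespacedId": {
            "$anchor": "NamespacedId",
            "type": "string",
            "pattern": "^([a-z_0-9]+\\.?)+[a-z_0-9]$",
        }
    },
}

data = {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://example.com/schemas/data",
    "type": "object",
    # Relative reference to the anchored definition in the common schema.
    "properties": {"dataType": {"$ref": "common#NamespacedId"}},
}

# The registry plays the role of the "map each URI to a local file" configuration.
registry = Registry().with_resources([
    ("https://example.com/schemas/common", Resource.from_contents(common)),
    ("https://example.com/schemas/data", Resource.from_contents(data)),
])

validator = Draft201909Validator(data, registry=registry)
validator.validate({"dataType": "dk.cachet.carp.heartbeat"})  # passes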

Like code organization, there is no one right answer. If you are going to reuse a particular schema in a number of different environments, putting it in a separate file can make sense. But if you're only going to use it in one place, it may be more convenient to put it under a $defs in the same file. You could even give that schema a nicer name to reference it, via $anchor.

Related

Can an rdflib program use a user-defined ontology?

Not sure if this is a dumb question, but I am looking for an example program using rdflib which works with a local ontology. I see lots of examples using standard ontologies like FOAF, but I want to write a Python program that works with a user-defined ontology file on the local machine and creates a graph with nodes and arcs from the definitions in that locally available ontology file. Is it possible? Are there any guidelines, etc.?
RDF data can exist independently of its ontology; indeed, it must be able to exist independently.
Ontology in terms of RDF is a description of the entities, properties and classes. It can specify human-readable labels or comments so that users of it know how to use it. It can link to other vocabularies so that tools that work with other ontologies might pick the meaning of your data without understanding your exact vocabulary. It can store implications and constraints so that reasoners can infer additional facts from a dataset, or check if a dataset is consistent in the first place.
That being said, you don't need to explicitly write it down to create a dataset. You most definitely should attempt to write it, but your data can live without it, and some things can be inferred from the data itself.
<a> a <c> .
<a> <p> <d> .
Already you can infer that <p> is a property, that <c> is a class, and <a> is probably neither.
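If you want a concrete starting point for the original question, here is a minimal rdflib sketch, assuming a locally stored ontology file and a data file (the file names are placeholders):

# Load a user-defined ontology plus instance data with rdflib and walk the graph.
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL

g = Graph()
g.parse("my-ontology.ttl", format="turtle")  # your user-defined ontology
g.parse("data.ttl", format="turtle")         # instance data using that vocabulary

# List the classes the ontology declares...
for cls in g.subjects(RDF.type, OWL.Class):
    print("class:", cls, "label:", g.value(cls, RDFS.label))

# ...and every node/arc (triple) in the combined graph.
for s, p, o in g:
    print(s, p, o)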

What alternatives are there for creating a RESTful web service API based on JSON?

We're creating a web service and we'd like two things:
- to be JSON-based
- to be RESTful - how much so, we haven't decided
We've already implemented custom APIs but now we'd like to follow some standards, since at some point it gets a little crazy to remember all the rules, all the exceptions, and all the undocumented parts that the creator also forgot.
Are any of you using some standards that you've found useful? Or at least, what are some alternatives?
So far I know of jsonapi and HAL.
These don't seem to be good enough though, since what we'd optimally like is to be able to:
+ define, expose and update entities and relations between them
+ define, expose and invoke operations
+ small numbers of requests are preferable, at least where it "makes sense" (I'll leave that as a blank check)
[EDIT]
Apparently, there's OData too: http://www.odata.org/
Are any of you using some standards that you've found useful? Or at least, what are some alternatives?
Between your own question and the comments, most of the big names have been mentioned. I'd just like to also add JSON Hyper-Schema:
"JSON Schema is a JSON based format for defining the structure of JSON data. This document specifies hyperlink- and hypermedia-related keywords of JSON Schema."
http://json-schema.org/latest/json-schema-hypermedia.html
It's an extension to JSON schema and fulfils a very similar role to the others mentioned above.
I've been using json-hal for a while and like it a lot, but I'm increasingly drawn to the JSON Schema family of schemas which also handle data model definition and validation. These schemas are also the basis of the excellent Swagger REST API standard:
http://swagger.io/specification/
Hope this helps.

Application Input Specification: Drawing input data of method

Does anyone know a good way to draw the exact structure of input data for a method? In my case I have to specify the correct input data for a server application. The server gets an HTTP POST with data. Because this data is a very complex JSON data structure, I want to draw it, so the next developer can easily check the drawing and understand what data is needed for the HTTP POST. It would be nice if I could also draw HTTP headers and mark data as mandatory or nice-to-have.
I don't need a data flow diagram or something like that. What I need is a drawing of how to build a valid JSON document for the server method.
Please, if anyone has an idea, just answer or comment on this question; even if you just have ideas for buzzwords, I can google them myself.
To describe the data structure, consider (1) using a UML class diagram with multiplicities, ownership and "named association ends". Kirill Fakhroutdinov's examples uml-diagrams.org: Online Shopping and uml-diagrams.org: Sentinel HASP Licensing Domain illustrate what your drawing might look like.
Since you specifically need to describe a JSON structure, (2) google "json schema" to see how others have approached the same problem.
Personally, besides providing the UML diagram, I'd (3) consider writing a TypeScript definition file, which can actually describe a JSON structure including simple types, nested structures, optional parts, etc.; moreover, the next developer can validate examples of data structures (unit tests) against the definition by writing a simple TypeScript script and trying to compile it.
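As a rough Python analogue of that TypeScript-definition-file idea (the field names here are invented), a TypedDict can document the same kind of structure, and a type checker such as mypy can verify example payloads against it:

# Documents the expected JSON structure; mypy can check example payloads against it.
from typing import List, TypedDict


class Address(TypedDict):
    street: str
    city: str


class _OrderRequired(TypedDict):
    customer_id: str          # mandatory fields
    items: List[str]


class CreateOrderRequest(_OrderRequired, total=False):
    shipping_address: Address  # optional ("nice to have") field


# An example payload that a unit test (run through mypy) can validate:
example: CreateOrderRequest = {
    "customer_id": "c-42",
    "items": ["sku-1", "sku-2"],
}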

can json-ld be used to build a unique hash signature of a json object?

This is a near duplicate of How to reliably hash JavaScript objects?, where someone wants to reliably hash JavaScript objects.
Now that the JSON-LD specification has been validated, I saw that there is a normalization procedure that they advertise as a potential way to normalize a JSON object:
normalize the data using the RDF Dataset normalization algorithm, and then dump the output to normalized NQuads format. The NQuads can then be processed via SHA-256, or similar algorithm, to get a deterministic hash of the contents of the Dataset.
Building a hash of a json object has always been a pain because something like
sha1(JSON.stringify(object))
does not work or is not guaranteed to work the same across implementations (the order of the keys is not defined, for example).
Does JSON-LD work as advertised? Is it safe to use it as a universal JSON normalization procedure for hashing objects? Can those objects be standard JSON objects, or do they need some JSON-LD decorations (@context, ...) to be normalized?
Yes, normalization works with JSON-LD, but the objects do need to be given context (via the @context property) in order for them to produce any RDF. It is the RDF that is deterministically output in NQuads format (and that can then be hashed, for example).
If a property in a JSON-LD document is not defined via @context, then it will be dropped during processing. JSON-LD requires that you provide global meaning (semantics) to the properties in your document by associating them with URLs. These URLs may provide further machine-readable information about the meaning of the properties, their range, domain, etc. In this way data becomes "linked" -- you can both understand the meaning of a JSON document from one API in the context of another, and you can traverse documents (via HTTP) to find more information.
So the short answer to the main question is "Yes, you can use JSON-LD normalization to build a unique hash for a JSON object", however, the caveat is that the JSON object must be a JSON-LD object, which really constitutes a subset of JSON. One of the main reasons for the invention of the normalization algorithm was for hashing and digitally-signing graphs (JSON-LD documents) for comparison.
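For reference, here is roughly what that looks like with the pyld library; the document and its @context are made-up examples:

# Normalize a JSON-LD document to canonical N-Quads, then hash that string.
import hashlib
from pyld import jsonld

doc = {
    "@context": {"name": "http://schema.org/name"},
    "@id": "http://example.org/people/alice",
    "name": "Alice",
}

# URDNA2015 is the RDF Dataset Normalization algorithm the spec refers to.
normalized = jsonld.normalize(
    doc, {"algorithm": "URDNA2015", "format": "application/n-quads"}
)

digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
print(digest)  # stable regardless of the original key order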

Should persistent objects validate data upon set?

If one has an object which can persist itself across executions (whether to a DB using an ORM, using something like Python's shelve module, etc.), should validation of that object's attributes be placed within the class representing it, or outside?
Or, rather: should the persistent object be dumb and expect whatever is setting its values to be benevolent, or should it be smart and validate the data being assigned to it?
I'm not talking about type validation or user input validation, but rather things that affect the persistent object, such as ensuring that links/references to other objects exist, that numbers are unsigned, that dates aren't out of range, etc.
Validation is part of encapsulation: an object is responsible for its internal state, and validation is part of maintaining that state.
It's like asking "should I let an object do a function and set its own variables, or should I use getters to get them all, do the work in an external function, and then use setters to set them back?"
Of course you should use a library to do most of the validation: you don't want to implement the "check unsigned values" function in every model, so you implement it in one place and let each model use it in its own code as it sees fit.
The object should validate the data input. Otherwise every part of the application which assigns data has to apply the same set of tests, and every part of the application which retrieves the persisted data will need to handle the possibility that some other module hasn't done their checks properly.
Incidentally I don't think this is an object-oriented thang. It applies to any data persistence construct which takes input. Basically, you're talking Design By Contract preconditions.
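As a sketch of that precondition idea in Python (the class and field names are purely illustrative), the persistent object can reject bad values at assignment time, so no caller can push it into an invalid state:

# Validation happens in the setters, so every assignment path is checked.
from datetime import date


class Invoice:
    def __init__(self, amount_cents: int, due: date):
        self.amount_cents = amount_cents   # goes through the setters below
        self.due = due

    @property
    def amount_cents(self) -> int:
        return self._amount_cents

    @amount_cents.setter
    def amount_cents(self, value: int) -> None:
        if value < 0:  # "numbers are unsigned"
            raise ValueError("amount_cents must be non-negative")
        self._amount_cents = value

    @property
    def due(self) -> date:
        return self._due

    @due.setter
    def due(self, value: date) -> None:
        if value.year < 2000:  # "dates aren't out of range"
            raise ValueError("due date is out of range")
        self._due = value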
My policy is that, for the code as a whole to be robust, each object A should check as much as possible, as early as possible. But "as much as possible" needs explanation:
The internal coherence of each field B in A (type, range within the type, etc.) should be checked by the field type B itself. If it is a primitive field, or a reused class, that is not possible, so object A should check it.
The coherence of related fields (if that B field is null, then C must also be) is the typical responsibility of object A.
The coherence of a field B with other code that is external to A is another matter. This is where the "POJO" approach (in Java, but applicable to any language) comes into play.
The POJO approach says that with all the responsibilities/concerns that we have in modern software (persistence and validation are only two of them), domain models end up being messy and hard to understand. The problem is that these domain objects are central to understanding the whole application, to communicating with domain experts and so on. Each time you have to read a domain object's code, you have to handle the complexity of all these concerns, while you might care about none, or only one...
So, in the POJO approach, your domain objects must not carry code related to these concerns (which usually means an interface to implement, or a superclass to extend).
All concerns except the domain one are kept out of the object (though some simple information can still be provided, in Java usually via annotations, to parameterize generic external code that handles one concern).
Also, the domain objects relate only to other domain objects, not to framework classes related to one concern (such as validation, or persistence). So the domain model, with all its classes, can be put in a separate "package" (project or whatever), without dependencies on technical or concern-related code. This makes it much easier to understand the heart of a complex application, without all the complexity of these secondary aspects.
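To sketch what that separation might look like in Python (all names are illustrative): the domain class carries only domain rules, including the related-fields coherence check mentioned above, while persistence lives in a separate module:

# Plain domain object: no framework interfaces, only domain rules.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shipment:
    order_id: str
    carrier: Optional[str] = None
    tracking_number: Optional[str] = None

    def __post_init__(self) -> None:
        # Coherence of related fields: a tracking number only makes sense
        # once a carrier has been chosen.
        if self.tracking_number is not None and self.carrier is None:
            raise ValueError("tracking_number requires a carrier")


# Persistence lives outside the domain object, e.g. in a repository module:
def save_shipment(shipment: Shipment, connection) -> None:
    connection.execute(
        "INSERT INTO shipments (order_id, carrier, tracking_number) VALUES (?, ?, ?)",
        (shipment.order_id, shipment.carrier, shipment.tracking_number),
    )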