Pros and cons of Pydantic compared to JSON Schema

As far as I understand, Pydantic and JSON Schema provide similar functionality: both can be used for validating data.
I am interested in the pros and cons of using each one. A few questions I have:
Are there any differences in accuracy between them?
Which one is faster to implement in terms of development time?
Is there any functionality difference between the two, i.e. features one supports that the other doesn't?
These are only examples of the questions I am thinking about; I would love to hear about other pros and cons as well.

While both Pydantic and JSON Schema are used to verify that data adheres to a certain format, they serve different use cases:
JSON Schema: a specification for defining JSON structures, independent of any implementation or programming language.
Pydantic: a Python-specific library for validating input data against Pydantic-specific definitions.
You can find implementations of JSON Schema validators in many languages; those are the tools you might want to check out in a 1:1 comparison with Pydantic. However, Pydantic understands JSON Schema: you can generate Pydantic code from a JSON Schema and also export a Pydantic definition to JSON Schema, and the two should be equivalent from a functional perspective. You can find a type mapping in the Pydantic docs.
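For illustration, here is a minimal sketch of that round trip, assuming Pydantic v2 (where the export method is model_json_schema(); v1 used .schema() instead):

```python
from typing import Optional

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    email: Optional[str] = None  # optional field, not required in the schema


# Validating on the Python side: bad input raises a ValidationError.
user = User(id=1, name="Ada")

# Exporting the same definition as a language-independent JSON Schema dict.
schema = User.model_json_schema()
print(schema["required"])  # ['id', 'name']
```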
So, which should you use? Your use case matters, but most likely it's not either/or. If you're Python-only and prefer to define your schema directly in Python, definitely go for Pydantic. If you need to exchange schemas across languages or want to handle schemas generated elsewhere, you can add JSON Schema on top and Pydantic will be able to handle it.
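And the exported schema can be consumed by any standalone validator in any language. As a sketch, here is how the Python jsonschema package (one of many compliant implementations; any other would do) checks the same data:

```python
import jsonschema  # pip install jsonschema

# A schema like the one exported from the Pydantic model above.
schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id", "name"],
}

jsonschema.validate(instance={"id": 1, "name": "Ada"}, schema=schema)  # passes

try:
    jsonschema.validate(instance={"id": "x", "name": "Ada"}, schema=schema)
except jsonschema.ValidationError as exc:
    print(exc.message)  # 'x' is not of type 'integer'
```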

Related

How should I process nested data structures (e.g. JSON, XML, Parquet) with Dask?

We often work with scientific datasets distributed as small (<10G compressed), individual, but complex files (xml/json/parquet). UniProt is one example, and here is a schema for it.
We typically process data like this using Spark, since it is well supported. I wanted to see, though, what might exist for doing this kind of work with the Dask DataFrame or Bag APIs. A few specific questions I had:
Does anything exist for this other than writing custom Python functions for Bag.map or DataFrame/Series.apply? (A minimal sketch of that baseline appears after these questions.)
Given any dataset compatible with Parquet, are there any secondary ecosystems of more generic (possibly JIT compiled) functions for at least doing simple things like querying individual fields along an xml/json path?
Has anybody done work to efficiently infer a nested schema from XML/JSON? Even if that schema were an object that Dask/Pandas can't use, simply knowing it would be helpful for figuring out how to write functions for something like Bag.map. I know there are a ton of Python JSON schema inference libraries, but none of them appear to be compiled or otherwise built for performance when applied to thousands or millions of individual JSON objects.
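For reference, the "custom Python functions for Bag.map" baseline from the first question looks roughly like this sketch (the file pattern and field names are hypothetical, and it assumes newline-delimited JSON input):

```python
import json

import dask.bag as db

# Read line-delimited JSON files in parallel, one record per line.
records = db.read_text("data/*.json").map(json.loads)


# A hand-written extraction function applied to every record.
def extract(rec):
    return {
        "accession": rec.get("accession"),           # hypothetical field
        "n_features": len(rec.get("features", [])),  # hypothetical nested list
    }


# Flatten the nested records into a Dask DataFrame for further analysis.
df = records.map(extract).to_dataframe()
print(df.head())
```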

What alternatives are there for creating a RESTful web service API based on JSON?

We're creating a web service and we'd like 2 things:
- to be JSON based
- to be RESTful - how much so, we haven't decided
We've already implemented custom APIs but now we'd like to follow some standards, since at some point it gets a little crazy to remember all the rules, all the exceptions, and all the undocumented parts that the creator also forgot.
Are any of you using some standards that you've found useful? Or at least, what are some alternatives?
So far I know of jsonapi and HAL.
These don't seem to be good enough though, since what we'd optimally like is to be able to:
+ define, expose and update entities and relations between them
+ define, expose and invoke operations
+ keep the number of requests small, at least where it "makes sense" (I'll leave that as a blank check)
[EDIT]
Apparently, there's OData too: http://www.odata.org/
Between your own question and the comments, most of the big names have been mentioned. I'd just like to also add JSON Hyper-Schema:
"JSON Schema is a JSON based format for defining the structure of JSON data. This document specifies hyperlink- and hypermedia-related keywords of JSON Schema."
http://json-schema.org/latest/json-schema-hypermedia.html
It's an extension to JSON Schema and fulfils a very similar role to the others mentioned above.
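As a rough, made-up illustration (not taken from the spec), a hyper-schema adds "links" entries with rel/href URI templates alongside the usual validation keywords:

```json
{
  "$schema": "http://json-schema.org/draft-04/hyper-schema#",
  "title": "Order",
  "type": "object",
  "properties": {
    "id": { "type": "integer" }
  },
  "links": [
    { "rel": "self", "href": "/orders/{id}" },
    { "rel": "customer", "href": "/orders/{id}/customer" }
  ]
}
```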
I've been using json-hal for a while and like it a lot, but I'm increasingly drawn to the JSON Schema family of schemas, which also handles data model definition and validation. These schemas are also the basis of the excellent Swagger REST API standard:
http://swagger.io/specification/
Hope this helps.

Drawing a JSON data schema

I'm writing the documentation for my project.
I need to represent a nested JSON schema, but I don't want to include the raw code.
Is there a standard way to represent a JSON data schema as a graph?
There is no standard way, and I doubt one will ever exist. Given its graph nature, you can leverage any representation of graphs.
For JSON Schema, you might have a look at any of the following libraries: matic, docson, or jsonary.

Tools for describing JSON schemas

I'm writing a spec and need to describe some JSON objects. Big JSON documents tend to get too confusing with text and tabs alone. Is there any (preferably online) tool to create diagrams like the ones on http://www.json.org/ or http://www.sqlite.org/lang_altertable.html? Those sites use them to describe syntax, but is there anything like them for describing JSON objects? They are great for representing objects that are required, optional, arrays, etc.
These types of syntax diagrams are known as "railroad diagrams".
There is an online tool at http://bottlecaps.de/rr/ui that you can use to generate your own diagrams. You must specify your grammar in EBNF notation.
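For example, a fragment of my own (not from the tool's docs) describing a JSON object in the W3C-style EBNF that the tool accepts might look like:

```
object ::= '{' (member (',' member)*)? '}'
member ::= string ':' value
value  ::= string | number | object | array | 'true' | 'false' | 'null'
```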

Is CouchDB best suited for dynamic languages?

I'm familiar with CouchDB, and the idea of mapping its results to Scala objects, as well as finding some natural way to interact with it, came immediately.
But I see that dynamic languages such as Ruby and JavaScript work very well with the JSON/document-centric/schema-free approach of CouchDB.
Is there any good approach to doing things with Couch in static languages?
I understand that CouchDB works purely with JSON objects. Since JSON is untyped, it's tempting to believe that it's more naturally suited for dynamic languages. However, XML is generally untyped too, and Scala has very good library support for creating and manipulating XML. For an exploration of Scala's XML features, see: http://www.ibm.com/developerworks/library/x-scalaxml/
Likewise with JSON. With the proper library support, dealing with JSON can feel natural even in static languages. For one approach to dealing with JSON data in Scala, see this article: http://technically.us/code/x/weaving-tweed-with-scala-and-json/
With object databases in general, sometimes it's convenient to define a "model" (using, for example, a class in the language) and use JSON or XML or some other untyped document language to be a serialized representation of the class. Proper library support can then translate between the serialized form (like JSON) and the in-memory data structures, with static typing and all the goodies that come with it. For one example of this approach, see Lift's Record which has added conversions to and from JSON: http://groups.google.com/group/liftweb/msg/63bb390a820d11ba
I wonder if you asked the right question. Why are you using Scala, and not a dynamic language? Probably because of some goodness that Scala provides you that is important to you and, I assume, to your code quality. Then why aren't you also using a "statically typed" (i.e. schema-based) database? Once again I'm just assuming, but the ability to respond to change comes to mind. Production SQL databases have a horrible tendency to be very difficult to change and refactor.
So, your data is weakly typed and your code is strongly typed, but somewhere you'll need to make the transition. This means that somewhere you'll have a "schema" for your data even though the database has none. This schema is defined by the classes you're mapping Couch documents onto. This makes perfect sense; most uses of Couch that I've seen have a key such as "type" and, for each type, at least some common set of keys. Whether to hand-map the JSON onto these Scala classes, to use e.g. fancy reflection tools (slower but pretty), or to use some even fancier Scala feature that I'm still new to is a detail. Start with the easy-but-slow one, then see if it's fast enough.
The big win comes when your classes, i.e. your schema, change. Instead of ALTER'ing your tables, you can just change the class, make sure you do something smart if a key you expect is missing from some document (because it was written against an older version of the class), and off you go. Responding to change has never been easier, and still your code is as statically typed as it can get.
If this is not good enough for you, and you want no schema at all, then you're effectively saying that you don't want to use classes to define and manipulate your data. That's fine too (though I can't imagine a use), but then the question is not about dynamic vs static languages, but about whether to use class-based OO languages at all.