How to declare an optional field in Google Pub/Sub Topic Schema?

I'm trying to create a Pub/Sub topic schema in Avro, following the documentation's guidance to mark optional fields with "default": null.
I declared my optional field this way:
{
"name": "myField",
"type": ["null","string"],
"default": null
}
The error I get :
Incorrect token in the stream. Expected: Object start, found String
Do you have any idea how to solve this?

Not sure if this is exactly what you would like, but I would probably try something like this:
{
"type": "record",
"name": "SomeName",
"fields": [
{
"name": "myField",
"type": ["string", "null"]
}
]
}
as defined in Apache Avro specification and described in PubSub Schema creation

@Snowfire, you mentioned in a comment that you chose to continue with a schema-less topic.
You can follow this document to create a Pub/Sub schema.
To create a schema of Avro type, you can try this:
{
"type": "record",
"name": "Avro",
"fields": [
{
"name": "testname",
"type": ["null", "string"],
"default": null
},
{
"name": "testId",
"type": ["null", "int"],
"default": null
}
]
}
I tested the message below against the schema and it works.
{
"testname": {
"string": "Jack"
},
"testId": {
"int": 101
}
}
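The message above follows Avro's JSON encoding, where a non-null value of a union is wrapped in an object keyed by its branch type and a null value is written as a bare null. A minimal Python sketch of that wrapping follows. The helper names wrap_nullable and encode_message are hypothetical, and the mapping from Python types to Avro branch names (for example float to "double") is an assumption that has to match the branches declared in your own schema.

```python
import json

# Assumed mapping from Python types to Avro branch names; adjust it to the
# branches your schema declares. bool comes before int because bool is a
# subclass of int in Python.
_AVRO_BRANCH = [(bool, "boolean"), (int, "int"), (float, "double"), (str, "string")]


def wrap_nullable(value):
    """Encode one value for a ["null", T] union in Avro's JSON encoding."""
    if value is None:
        return None  # the null branch is written as a bare JSON null
    for py_type, avro_name in _AVRO_BRANCH:
        if isinstance(value, py_type):
            return {avro_name: value}  # non-null branches are wrapped
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_message(record):
    """Encode a flat record in which every field is a nullable union."""
    return json.dumps({name: wrap_nullable(v) for name, v in record.items()})


print(encode_message({"testname": "Jack", "testId": 101}))
print(encode_message({"testname": None, "testId": None}))
```

The first call reproduces the tested message; the second shows how both fields look when they are absent.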

Related

Azure Data Factory json dataset missing property in typeProperties

I am creating a dataset from json-formatted data in an Azure Data Factory (v1). When using the following code, I get a Property expected error with the infotext Property specific to this data set type on the typeProperties object. From what I can see, I am using the same properties as in the example docs. What property am I missing?
Dataset definition:
{
"name": "JsonDataSetData",
"properties": {
"type": "AzureDataLakeStore",
"linkedServiceName": "TestAzureDataLakeStoreLinkedService",
"structure": [
{
"name": "timestamp",
"type": "String"
},
{
"name": "value",
"type": "Double"
}
],
"typeProperties": {
"folderPath": "root_folder/sub_folder",
"format": {
"type": "JsonFormat",
"filePattern": "setOfObjects",
"jsonPathDefinition": {
"spid": "$.timestamp",
"value": "$.value"
}
},
},
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
Have you tried adding encodingName and nestingSeparator with the default values? The documentation has mistakes sometimes and a property that is documented as not required might be giving you this error.

Json-schema doesn't validate json with $ref-references

I have this json:
{
"categories": [
{
"id": 1,
"name": "cat1",
"topics": [
{
"$ref": "#/topics/1"
}
]
}
],
"topics": [
{
"id": 1,
"name": "top1"
}
]
}
And I've written the next schema to validate it:
{
"definitions": {
"category": {
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
},
"topics": {
"type": "array",
"items": { "$ref": "#/definitions/topic" }
}
}
},
"topic": {
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
}
}
}
},
"type": "object",
"properties": {
"categories": {
"items": { "$ref": "#/definitions/category" },
"type": "array"
},
"topics": {
"items": { "$ref": "#/definitions/topic" },
"type": "array"
}
}
}
When I use this schema in popular online validators, it doesn't catch invalid references such as #/topics/5 or #/ttt/555.
Can I use this schema to validate references? Can you suggest a library or service that can do it?
Currently this is outside the scope of JSON Schema. The proposal @erosb mentions is still under consideration, but not for the soon-to-be-forthcoming draft-07. With enough demand it may be considered for draft-08. It would be a significant expansion of the project's scope, which is why it has been on hold while other things are addressed.
Some validators make it easy to define your own extension keywords, which may be a good way to do what you want. There are definitely libraries that will apply a JSON Pointer and let you find out if it points to anything or not. If you implement @erosb's proposal somewhere, it would be great if you could comment on the issue and let us know how it works out.
I'm not sure I properly understand what you are trying to achieve. I assume you want to state that the items of the "topics" array should be JSON references ("$ref" with a JSON Pointer) and that the referenced object should match the schema "#/definitions/topic".
If this is the case, then there is currently no way to express it with JSON Schema. Even with the latest version, you can only state that a string should be a JSON Pointer; you can't restrict the type of the referenced object.
Last year I made a suggestion addressing this problem, but due to mixed feedback it got somewhat stuck.
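Since the schema itself cannot check where a reference points, one option is a separate pass over the data that applies each JSON Pointer and reports the ones that resolve to nothing. The following is a minimal Python sketch of that idea, not a JSON Schema feature. The function names resolve_pointer and dangling_refs are hypothetical, and it only handles same-document references (starting with "#") without percent-decoding the fragment.

```python
def resolve_pointer(document, ref):
    """Resolve a same-document reference such as "#/topics/0" (RFC 6901)."""
    if not ref.startswith("#"):
        raise LookupError(f"not a same-document reference: {ref}")
    pointer = ref[1:]
    if pointer == "":
        return document  # "#" refers to the whole document
    if not pointer.startswith("/"):
        raise LookupError(f"malformed pointer: {ref}")
    node = document
    for token in pointer.split("/")[1:]:
        # Unescape in this order, as RFC 6901 requires.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            # Array members are addressed by zero-based index.
            if not (token.isascii() and token.isdigit()) or int(token) >= len(node):
                raise LookupError(ref)
            node = node[int(token)]
        elif isinstance(node, dict) and token in node:
            node = node[token]
        else:
            raise LookupError(ref)
    return node


def dangling_refs(document):
    """Return every "$ref" value in the document that points at nothing."""
    missing = []

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str):
                try:
                    resolve_pointer(document, ref)
                except LookupError:
                    missing.append(ref)
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(document)
    return missing
```

Note that a JSON Pointer addresses array members by zero-based position, not by an "id" property, so "#/topics/1" in the question's data points at a second topic that does not exist; the only topic is "#/topics/0". Checking that the target also matches "#/definitions/topic" would be a further step on top of this.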

JSON Schema definition for array of objects

I've seen this other question but it's not quite the same, and I feel like my issue is simpler, but just isn't working.
My data would look like this:
[
{ "loc": "a value 1", "toll" : null, "message" : "message is sometimes null"},
{ "loc": "a value 2", "toll" : "toll is sometimes null", "message" : null}
]
I want to use AJV for JSON validation in a Node.js project. I've tried several schemas to describe my data, but I always get this error:
[ { keyword: 'type',
dataPath: '',
schemaPath: '#/type',
params: { type: 'array' },
message: 'should be array' } ]
The schema I've tried looks like this:
{
"type": "array",
"items": {
"type": "object",
"properties": {
"loc": {
"type": "string"
},
"toll": {
"type": "string"
},
"message": {
"type": "string"
}
},
"required": [
"loc"
]
}
}
I've also tried generating the schema with this online tool, but that doesn't work either. To check that the generated schema should produce the correct result, I validated it at jsonschemavalidator.net, which gives me a similar error:
Found 1 error(s)
Message:
Invalid type. Expected Array but got Object.
Schema path:
#/type
You have defined your schema correctly, except that it doesn't match the data you say you are validating. If you change the property names to match the schema, you still have one issue. If you want to allow "toll" and "message" to be null, you can do the following.
{
"type": "array",
"items": {
"type": "object",
"properties": {
"loc": {
"type": "string"
},
"toll": {
"type": ["string", "null"]
},
"message": {
"type": ["string", "null"]
}
},
"required": [
"loc"
]
}
}
However, that isn't related to the error message you are getting. That message means that data you are validating is not an array. The example data you posted should not result in this error. Are you running the validator on some data other than what is posted in the question?
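One way to check this is to look at the top-level JSON type of the exact value handed to the validator. The sketch below is in Python rather than Node.js and uses only the standard library; the helper name json_type is hypothetical. It shows two plausible ways (assumptions, since the question does not show the calling code) in which the posted array could stop being an array: wrapping it in an object, or passing the unparsed text.

```python
import json


def json_type(value):
    """Return the JSON Schema type name of an already-parsed value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before numbers: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


raw = """[
  {"loc": "a value 1", "toll": null, "message": "message is sometimes null"},
  {"loc": "a value 2", "toll": "toll is sometimes null", "message": null}
]"""

print(json_type(json.loads(raw)))            # the data as posted
print(json_type({"data": json.loads(raw)}))  # the same data wrapped in an object
print(json_type(raw))                        # the text passed without parsing
```

Only the first call yields "array"; the other two would both make a validator report that the top-level type is wrong.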

JSON Hyper-Schema: different schemas for GET and POST

I want to describe an API in which some fields accept values in several different forms when POSTing an item, but are only ever output in one specific form.
For example, I might want to describe an API where an item can be created or updated like this: {"name": "Task", "due": "2014-12-31"} or like this: {"name": "Task", "due": {"$date": 1419984000000}}, but it is only ever returned from the API in the first way.
The schema for POST/PUT could therefore be:
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"due": {
"oneOf": [
{
"type": "string",
"format": "date"
},
{
"type": "object",
"properties": {
"$date": {
"type": "number"
}
},
"required": ["$date"],
"additionalProperties": false
}
]
}
}
}
Whereas the schema for access via GET would be much simpler:
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"due": {
"type": "string",
"format": "date"
}
}
}
It would be good for consumers of the API to know that they only have to account for one possible output form rather than all of them.
Is there any accepted standard approach to specifying the different schemas within the context of JSON Hyper-Schema? I've thought about specifying these differences via the "links" property, but I do not know what "rel" I would define these schemas under, and it seems very non-standard.
If I understood correctly and you want to specify one schema per operation, you can do it with standard hyper-schema. Let's see an example for a POST operation:
{
"description": "create an item.",
"href": "/items",
"method": "POST",
"rel": "create",
"schema": {
"$ref": "#/api/createitem"
},
"title": "Create an item"
}
The actual schema that is required is referenced in "schema" property through "$ref".
If you also want to describe the response types, you can use the "targetSchema" property. Be aware that this is advisory only, as explained in the docs.
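To illustrate how "schema" and "targetSchema" can sit side by side, here is a small Python sketch that builds two link description objects and prints them as JSON. It assumes the same hyper-schema keywords used in the example above; the helper make_link, the "#/api/item" reference, and the "/items/{id}" href are hypothetical names chosen for illustration.

```python
import json


def make_link(rel, href, method, schema_ref=None, target_schema_ref=None):
    """Build one link description object for a hyper-schema "links" array."""
    link = {"rel": rel, "href": href, "method": method}
    if schema_ref is not None:
        # Schema for the data the client submits (the permissive input form).
        link["schema"] = {"$ref": schema_ref}
    if target_schema_ref is not None:
        # Advisory schema for what the server returns (the single output form).
        link["targetSchema"] = {"$ref": target_schema_ref}
    return link


links = [
    make_link("create", "/items", "POST",
              schema_ref="#/api/createitem", target_schema_ref="#/api/item"),
    make_link("self", "/items/{id}", "GET", target_schema_ref="#/api/item"),
]

print(json.dumps({"links": links}, indent=2))
```

With this layout the oneOf variants live only in the schema referenced by "schema", while every response points at the simpler output schema.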

Schema definition generated online Schema-Generator is not accepted by BigQuery while using Load Table API

Any relevant help will be appreciated.
I have several different JSON docs which need to be inserted into BigQuery. To avoid writing the schema manually, I am using online JSON Schema generation tools, but the schemas they generate are not accepted by the BigQuery Load Data wizard.
For example, for JSON data like this:
{"_id":100,"actor":"KK","message":"CCD is good in Pune",
"comment":[{"actor":"Subho","message":"CCD is not as good in Kolkata."},
{"actor":"bisu","message":"CCD is costly too in Kolkata"}]
}
the schema generated by the online tool is:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"description": "Generated from c:jsonccd.json with shasum a003286a350a6889b152b3e33afc5458f3771e9c",
"type": "object",
"required": [
"_id",
"actor",
"message",
"comment"
],
"properties": {
"_id": {
"type": "integer"
},
"actor": {
"type": "string"
},
"message": {
"type": "string"
},
"comment": {
"type": "array",
"minItems": 1,
"uniqueItems": true,
"items": {
"type": "object",
"required": [
"actor",
"message"
],
"properties": {
"actor": {
"type": "string"
},
"message": {
"type": "string"
}
}
}
}
}
}
But when I put it into BigQuery in the Load Data wizard, it fails with errors.
How can this be mitigated?
Thanks.
The schema generated by that tool is way more complex than what BigQuery requires.
Look at the sample in the docs:
"schema": {
"fields": [
{"name":"f1", "type":"STRING"},
{"name":"f2", "type":"INTEGER"}
]
},
https://developers.google.com/bigquery/loading-data-into-bigquery?hl=en#loaddatapostrequest
Meanwhile, the tool mentioned in the question adds fields like $schema, description, type, required, and properties, which are unnecessary and confuse the BigQuery schema parser.
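If you want to keep using the generator, its output can be reduced to the flat field list BigQuery expects. The following Python sketch shows one possible conversion for the subset of JSON Schema that appears in the question. The function names are hypothetical, and the type and mode mapping (for example "number" to FLOAT, "required" to REQUIRED, arrays to REPEATED) is an assumption you should check against your data.

```python
# Assumed mapping from JSON Schema scalar types to BigQuery column types.
_SCALARS = {"string": "STRING", "integer": "INTEGER",
            "number": "FLOAT", "boolean": "BOOLEAN"}


def to_bq_field(name, schema, required):
    """Convert one JSON Schema property into a BigQuery field definition."""
    mode = "REQUIRED" if required else "NULLABLE"
    json_type = schema.get("type")
    if json_type == "array":
        # BigQuery models arrays as a REPEATED field of the item type.
        schema = schema.get("items", {})
        json_type = schema.get("type")
        mode = "REPEATED"
    if json_type == "object":
        return {"name": name, "type": "RECORD", "mode": mode,
                "fields": to_bq_fields(schema)}
    if not isinstance(json_type, str) or json_type not in _SCALARS:
        raise ValueError(f"unsupported type for {name!r}: {json_type!r}")
    return {"name": name, "type": _SCALARS[json_type], "mode": mode}


def to_bq_fields(schema):
    """Convert an object schema's properties into a list of BigQuery fields."""
    required = set(schema.get("required", []))
    return [to_bq_field(name, sub, name in required)
            for name, sub in schema.get("properties", {}).items()]
```

Calling to_bq_fields on the generated schema from the question drops $schema, description, minItems, and uniqueItems and yields four fields, with "comment" as a REPEATED RECORD containing "actor" and "message".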